ChatGPT API Cost Optimization in Practice: How to Accurately Calculate the Cost of Every Token

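While using the ChatGPT API in a recent project, I ran into a very practical problem: the cost of each API call is hard to estimate in advance. Once the conversation history grows or the requests become complex, the number on the bill can be alarming. That unpredictability puts real pressure on the project budget.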

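The ChatGPT API is billed by the number of tokens. A token here is not a word in the traditional sense but the smallest unit of text the language model processes. As a rough guide, one English word is about 1 to 2 tokens, while a Chinese character is usually one token or more, depending on the tokenizer.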

At the time of writing, OpenAI offers several models at different prices:

gpt-3.5-turbo: $0.002 per 1,000 tokens
gpt-4: $0.03 per 1,000 tokens
gpt-4-32k: $0.06 per 1,000 tokens

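Note that both the input tokens and the output tokens of an API call are billed. For some models, including gpt-4, output tokens are priced higher than input tokens, so the per-1,000 prices listed above are best read as input prices and checked against the current pricing page.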

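To calculate the cost precisely, you first need to know how many tokens each call consumed. The OpenAI API response includes token usage information. Here is a Python example, written against the pre-1.0 openai Python package, which exposes openai.ChatCompletion: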

import openai
from typing import Dict, Any

def count_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> Dict[str, Any]:
    try:
        response = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=1000,
            temperature=0.7
        )

        input_tokens = response["usage"]["prompt_tokens"]
        output_tokens = response["usage"]["completion_tokens"]
        total_tokens = response["usage"]["total_tokens"]

        return {
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": total_tokens,
            "cost": calculate_cost(total_tokens, model)
        }
    except Exception as e:
        print(f"Error counting tokens: {str(e)}")
        raise

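def calculate_cost(token_count: int, model: str) -> float:
    """Return the cost in USD for the given number of tokens."""
    # Price in USD per 1,000 tokens; one rate is applied to input and output alike
    prices = {
        "gpt-3.5-turbo": 0.002,
        "gpt-4": 0.03,
        "gpt-4-32k": 0.06
    }
    # Unknown models fall back to the gpt-3.5-turbo price
    return (token_count / 1000) * prices.get(model, 0.002)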
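
The calculate_cost function above applies one rate to the combined token count. Where a model bills input and output tokens at different rates, the two counts can be priced separately. The following is a minimal sketch: the function name estimate_cost and the PRICES_PER_1K table are illustrative and not part of the original code, and the rates in the table are example values that should be replaced with the figures from the current pricing page.

```python
from typing import Dict, Tuple

# Example rates in USD per 1,000 tokens, stored as (input, output).
# These values are assumptions for illustration; check current pricing.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "gpt-3.5-turbo": (0.002, 0.002),
    "gpt-4": (0.03, 0.06),
}


def estimate_cost(input_tokens: int, output_tokens: int, model: str) -> float:
    """Price input and output tokens separately for one API call."""
    if model not in PRICES_PER_1K:
        # Fail loudly instead of silently using another model's rate.
        raise KeyError(f"No price configured for model: {model}")
    input_price, output_price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price


if __name__ == "__main__":
    # 1,500 input tokens and 500 output tokens at the example gpt-4 rates
    print(f"${estimate_cost(1500, 500, 'gpt-4'):.4f}")
```

The prompt_tokens and completion_tokens fields from the API response map directly onto the two arguments, so this can stand in for the single-rate calculation wherever the split matters.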

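Whenever an API call includes earlier conversation history, those past messages count toward token consumption as well. A long-running conversation therefore gets more expensive with every turn.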
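
To see how quickly this adds up, consider a conversation in which every turn resends the full history. The sketch below assumes, purely for illustration, that each user message and each reply has a fixed token count and ignores any per-message overhead; the function name and the numbers are hypothetical.

```python
def cumulative_prompt_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens billed over a conversation that resends full history.

    On turn n the prompt holds n user messages and n - 1 earlier replies.
    """
    total = 0
    for n in range(1, turns + 1):
        total += n * user_tokens + (n - 1) * reply_tokens
    return total


if __name__ == "__main__":
    for turns in (1, 5, 10, 20):
        print(turns, cumulative_prompt_tokens(turns, user_tokens=50, reply_tokens=150))
```

Under these assumptions the billed input grows roughly with the square of the number of turns: going from 10 to 20 turns raises the total from 9,500 to 39,000 input tokens.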

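Effective strategies include:

Periodically clearing conversation history that is no longer needed
Summarizing past messages instead of keeping them in full
Sending historical context only when it is actually required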
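
One way to apply the first and third strategies is to cap the history at a token budget, keeping any system message plus the most recent messages that fit. This is a sketch under assumptions: trim_history and rough_token_count are hypothetical helpers, and the word-based counter is only a stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_token_count(text: str) -> int:
    # Crude stand-in: counts whitespace-separated words, not real tokens.
    return len(text.split())


def trim_history(
    messages: List[Message],
    max_tokens: int,
    count: Callable[[str], int] = rough_token_count,
) -> List[Message]:
    """Keep system messages and the newest other messages within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(count(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards from the newest message and stop once the budget is spent.
    for m in reversed(others):
        cost = count(m["content"])
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    kept.reverse()
    return system + kept
```

Passing a tokenizer-backed function as count would make the budget match what the API actually bills; the default here only approximates it.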

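The prompt itself can be trimmed as well:

Streamline the prompt and remove redundant information
Use concise but unambiguous instructions
Avoid repeating content
Consider splitting a long prompt into several API calls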

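Setting a sensible max_tokens value matters a great deal. Too small a value can cut the answer short, while too large a value allows longer replies than needed and adds unnecessary cost.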

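import openai

# Example conversation; replace with your own messages
messages = [{"role": "user", "content": "Summarize the plot of Hamlet in three sentences."}]

# Choose a max_tokens value that fits the use case
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=messages,
    max_tokens=500  # adjust to actual needs
)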
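
Because max_tokens caps the length of the reply, it also gives an upper bound on what a single call can cost. A minimal sketch, assuming the prompt token count is already known and a single per-1,000-token price applies; max_call_cost is a hypothetical helper name:

```python
def max_call_cost(prompt_tokens: int, max_tokens: int, price_per_1k: float) -> float:
    """Upper bound on one call's cost: the prompt plus a reply of max_tokens."""
    return (prompt_tokens + max_tokens) / 1000 * price_per_1k


if __name__ == "__main__":
    # A 300-token prompt at the gpt-3.5-turbo price of $0.002 per 1,000 tokens
    for cap in (500, 1000):
        print(cap, round(max_call_cost(300, cap, 0.002), 6))
```

The actual charge is usually lower, since the model may finish its answer before reaching the cap.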

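Common misconceptions include:

Assuming that only output is billed (input is billed too)
Ignoring the price differences between models
Underestimating the cumulative cost of long conversations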

To keep spending under control:

Rate-limit API calls
Set daily and monthly budget caps
Use cheaper models for simple tasks
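
A budget cap can be enforced in application code before each call is made. The sketch below is one possible shape, not a prescribed design: BudgetGuard and BudgetExceeded are hypothetical names, and the guard tracks spending per calendar day in memory, so it resets on restart and is not shared across processes.

```python
import datetime
from typing import Optional


class BudgetExceeded(Exception):
    """Raised when a call would push spending past the daily limit."""


class BudgetGuard:
    def __init__(self, daily_limit: float):
        self.daily_limit = daily_limit
        self._day: Optional[datetime.date] = None
        self._spent = 0.0

    def _roll_over(self, today: datetime.date) -> None:
        # Start a fresh total when the calendar day changes.
        if self._day != today:
            self._day = today
            self._spent = 0.0

    def charge(self, cost: float, today: Optional[datetime.date] = None) -> float:
        """Record a cost, or raise if it would exceed the limit.

        Returns the running total for the day.
        """
        today = today or datetime.date.today()
        self._roll_over(today)
        if self._spent + cost > self.daily_limit:
            raise BudgetExceeded(
                f"Daily limit ${self.daily_limit:.2f} would be exceeded"
            )
        self._spent += cost
        return self._spent
```

Calling charge with an estimated cost before each request lets the application refuse or downgrade the request instead of discovering the overrun on the bill.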

You can use Prometheus and Grafana to build a simple monitoring setup that records the token consumption and cost of each API call.

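from prometheus_client import start_http_server, Gauge

# Metrics holding the most recent call's values, labelled by model
token_gauge = Gauge('api_token_usage', 'Token usage per API call', ['model'])
cost_gauge = Gauge('api_call_cost', 'Cost per API call', ['model'])

# Call start_http_server(port) once at application startup so that
# Prometheus can scrape these metrics; the port is up to you.

def track_usage(model: str, tokens: int, cost: float):
    token_gauge.labels(model=model).set(tokens)
    cost_gauge.labels(model=model).set(cost)

def send_alert(message: str):
    # Placeholder: replace with email, chat, or another notification channel
    print(f"ALERT: {message}")

def check_budget(monthly_spend: float, threshold: float = 100.0):
    if monthly_spend > threshold:
        send_alert(f"API spending exceeded ${threshold} this month")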

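By measuring token consumption precisely and applying sensible optimizations, we can keep ChatGPT API costs under control. This raises a further question, though: while pursuing cost efficiency, how do we balance user experience against functional completeness? In scenarios that need long context or complex reasoning in particular, is there a better compromise? Feel free to share your experience and ideas.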
