Calling the ChatGPT API in Practice: From Integration to Production Optimization

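In AI application development today, calling the ChatGPT API has become a go-to way to add conversational capabilities quickly. In practice, though, developers run into a recurring set of problems. This article shares a Python implementation that has been validated in production and addresses those pain points.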

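Developers calling the ChatGPT API typically run into the following problems:

Complex credential management: API keys must be stored securely and rotated regularly; hard-coding them in source code is a security risk
Rate limiting: free-tier accounts have strict per-minute request limits, so HTTP 429 errors are easy to trigger
Long-input truncation: input that exceeds the model's maximum token limit has to be split manually
Difficult error recovery: network glitches and server-side errors call for a sensible retry strategy
High response latency: model inference time can cause client-side timeouts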
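For the first point, a minimal sketch of keeping the key out of source code is to read it from the environment at startup. The helper name `load_api_key` and the variable name `OPENAI_API_KEY` are assumptions made for illustration; a secrets manager would serve the same purpose.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The helper and the default variable name are illustrative assumptions.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail fast at startup instead of sending unauthenticated requests later.
        raise RuntimeError(f"Environment variable {env_var} is not set or is empty")
    return key
```

Rotating the key then only requires changing the environment and restarting the process, with no code change.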

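Compared with calling the HTTP API directly, the official SDK simplifies some operations, but it has limitations of its own:

SDK releases can lag behind new API features
Custom error handling and logging are less flexible
Advanced features such as request batching and streaming are harder to implement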

Below is a wrapper class with the basic features, implemented in Python:

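import logging
from functools import lru_cache
from typing import Optional

import openai  # targets the legacy openai<1.0 SDK (openai.ChatCompletion)
from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

class ChatGPTClient:
    def __init__(self, api_key: str, organization: Optional[str] = None):
        self.api_key = api_key
        self.organization = organization

    @lru_cache(maxsize=1)
    def _get_client(self):
        """Configure the openai module once and cache the result."""
        # The key is set on the module globally, so this suits one client per process.
        openai.api_key = self.api_key
        if self.organization:
            openai.organization = self.organization
        return openai

    # Up to 3 attempts in total; waits between attempts are clamped to 4-10 seconds.
    @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    async def async_completion(self, prompt: str, model: str = "gpt-3.5-turbo", **kwargs):
        """Call the chat completion API asynchronously and return the reply text."""
        client = self._get_client()
        try:
            response = await client.ChatCompletion.acreate(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                **kwargs
            )
            return response.choices[0].message.content
        except Exception as e:
            logger.warning("API call failed: %s", e)
            raise  # re-raise so that tenacity can retry

    def split_long_text(self, text: str, max_tokens: int = 4000):
        """Split long text into segments (placeholder)."""
        # Logic for splitting the text by token count goes here
        raise NotImplementedError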

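Client caching: the @lru_cache decorator caches the configured OpenAI client so it is not re-initialized on every call
Error retries: the tenacity library provides exponential-backoff retries, with up to 3 attempts in total and waits clamped to between 4 and 10 seconds
Async support: async_completion is a coroutine, so it fits asyncio-based services; a synchronous variant can be built the same way on ChatCompletion.create
Long-text handling: the split_long_text method is reserved for input that exceeds the token limit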
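The split_long_text placeholder could be filled in along the following lines. This is only a sketch under stated assumptions: it splits on blank lines, and `rough_token_count` is a hypothetical stand-in that estimates about four characters per token rather than using the model's real tokenizer. For accurate limits, pass in a counter built on an actual tokenizer library instead.

```python
from typing import Callable, List


def rough_token_count(text: str) -> int:
    """Crude estimate of about 4 characters per token (an assumption, not the real tokenizer)."""
    return max(1, len(text) // 4)


def split_long_text(
    text: str,
    max_tokens: int = 4000,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[str]:
    """Greedily pack paragraphs into chunks whose estimated size stays within max_tokens.

    A single paragraph that alone exceeds max_tokens is kept whole as its own chunk;
    splitting inside a paragraph is left out of this sketch.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_size = 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        size = count_tokens(para)
        # Close the current chunk when adding this paragraph would exceed the budget.
        if current and current_size + size > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_size = [], 0
        current.append(para)
        current_size += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned chunk can then be sent as a separate request; how the partial answers are recombined depends on the task.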

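For long generations, the streaming API delivers the response in chunks, so the client can process output as it arrives instead of waiting for and buffering the full reply: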

def stream_response(self, prompt: str, callback=None, **kwargs):
    """Stream the API response chunk by chunk (a method of ChatGPTClient)."""
    client = self._get_client()
    response = client.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        **kwargs
    )

    parts = []
    for chunk in response:
        # Some chunks carry no text (for example the role-only first chunk),
        # so fall back to an empty string when "content" is missing.
        content = chunk.choices[0].delta.get("content") or ""
        if callback and content:
            callback(content)  # handle each chunk as it arrives
        parts.append(content)
    return "".join(parts)

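When handling several similar requests, you can cut the number of API calls as follows:

Merge multiple questions into one longer prompt
Separate the questions with "\n\n"
Split the response during post-processing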
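The three steps above can be sketched as two small helpers. The function names, the instruction wording, and the numbering scheme are all assumptions for illustration; the model is not guaranteed to follow the requested format, which is why the splitter raises when the count does not match so the caller can fall back to individual requests.

```python
import re
from typing import List

SEPARATOR = "\n\n"


def build_batched_prompt(questions: List[str]) -> str:
    """Number the questions and join them into one prompt separated by blank lines."""
    header = (
        "Answer each numbered question below. Start every answer with its number "
        "in the form 'N.' and separate answers with a blank line."
    )
    body = SEPARATOR.join(f"{i}. {q.strip()}" for i, q in enumerate(questions, start=1))
    return header + SEPARATOR + body


def split_batched_answer(answer: str, expected: int) -> List[str]:
    """Split a combined reply on blank lines and strip the leading 'N.' markers."""
    parts = [p.strip() for p in answer.split(SEPARATOR) if p.strip()]
    if len(parts) != expected:
        # The model did not follow the format; let the caller retry or fall back.
        raise ValueError(f"expected {expected} answers, got {len(parts)}")
    return [re.sub(r"^\d+\.\s+", "", p) for p in parts]
```

Batching trades fewer requests for a longer prompt and a less predictable output format, so it fits short, independent questions best.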

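Monitor the following key metrics:

Request success rate (HTTP 200 vs. non-200 responses)
Response latency (P50/P90/P99)
Token usage (input/output)

These can be recorded with the Prometheus client: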

from prometheus_client import Counter, Histogram

# Calls are counted by outcome; latency is recorded in seconds.
API_CALLS = Counter('chatgpt_api_calls', 'API call count', ['status'])
RESPONSE_TIME = Histogram('chatgpt_response_time', 'API response time in seconds')

@RESPONSE_TIME.time()  # times every call, including failed ones
def call_api(self, prompt: str):
    try:
        # _raw_api_call is assumed to be defined elsewhere in the class
        response = self._raw_api_call(prompt)
        API_CALLS.labels(status='success').inc()
        return response
    except Exception:
        API_CALLS.labels(status='fail').inc()
        raise
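The Prometheus snippet covers success counts and latency but not token usage. As a rough illustration of how all three metrics relate, the sketch below aggregates them in-process with the standard library only. The `CallStats` class and its field names are hypothetical; the token counts are assumed to come from the `usage` field of each chat completion response.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CallStats:
    """In-process aggregate of success rate, latency percentiles and token usage."""
    latencies: List[float] = field(default_factory=list)
    successes: int = 0
    failures: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record(self, ok: bool, latency_s: float,
               prompt_tokens: int = 0, completion_tokens: int = 0) -> None:
        """Record one API call and its token counts."""
        self.latencies.append(latency_s)
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    def summary(self) -> Dict[str, float]:
        """Return success rate, P50/P90/P99 latency and token totals."""
        total = self.successes + self.failures
        if len(self.latencies) >= 2:
            # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
            cuts = statistics.quantiles(self.latencies, n=100)
            p50, p90, p99 = cuts[49], cuts[89], cuts[98]
        else:
            # quantiles() needs at least two data points.
            p50 = p90 = p99 = self.latencies[0] if self.latencies else 0.0
        return {
            "success_rate": self.successes / total if total else 0.0,
            "p50": p50, "p90": p90, "p99": p99,
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
        }
```

In a multi-process deployment this per-process view is not enough on its own, which is where an external metrics system such as Prometheus comes in.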

Choose a deployment option based on your QPS requirements: