Hands-On ChatGPT Integration for Developers: From API Calls to Production Optimization


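When integrating ChatGPT into an application, developers commonly run into the following core problems:

Inaccurate token counting:
Text that mixes Chinese with other scripts skews token counts
Truncating long inputs silently loses content
Estimated costs diverge from actual billed usage
Lost conversation context:
Multi-turn dialogue state is hard to maintain
Older messages are dropped once the conversation exceeds the model's context window (prompt tokens plus the max_tokens reserved for the reply)
Session consistency across instances in a distributed deployment
Traffic bursts:
Sudden spikes trip the API rate limits (RPM/TPM)
Error retries cascade into a retry storm
Cold-start latency hurts the user experience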
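Retry storms typically happen when many clients fail at the same moment and then retry immediately and in lockstep. A common mitigation is capped exponential backoff with full jitter. The sketch below assumes a hypothetical `call_with_backoff` helper and a hypothetical `RetryableError` marker for retry-worthy failures such as HTTP 429 or 5xx; neither comes from an OpenAI SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. HTTP 429 or 5xx)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying RetryableError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the error instead of retrying forever
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)]
            # so clients that failed together do not all retry at the same instant.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")  # loop always returns or raises
```

Passing `sleep` as a parameter keeps the helper testable: a test can inject a no-op or a recorder and inspect the chosen delays without actually waiting.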

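Approach	QPS capacity	Maintenance cost	Flexibility	Best suited for
Direct API calls	Medium	Low	High	Simple needs / rapid prototyping
Official SDK	High	Medium	Medium	Standard business scenarios
Self-built proxy layer	Very high	High	Very high	Enterprise / customized needs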

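import asyncio
import logging
from typing import Dict, List, Optional

import aiohttp

logger = logging.getLogger(__name__)

API_URL = "https://api.openai.com/v1/chat/completions"


class ChatGPTBatchProcessor:
    """Runs several independent conversations concurrently.

    The Chat Completions endpoint takes one `messages` list per request,
    so a "batch" is sent as concurrent requests, one per conversation.
    """

    def __init__(self, api_key: str, max_concurrency: int = 5):
        self.api_key = api_key
        self.max_concurrency = max_concurrency  # caps in-flight requests (RPM/TPM)
        # Created lazily inside the running event loop, not in __init__.
        self.session: Optional[aiohttp.ClientSession] = None

    def _get_session(self) -> aiohttp.ClientSession:
        if self.session is None or self.session.closed:
            self.session = aiohttp.ClientSession(
                headers={"Authorization": f"Bearer {self.api_key}"})
        return self.session

    async def _request_one(self, semaphore: asyncio.Semaphore,
                           messages: List[Dict[str, str]],
                           model: str, temperature: float) -> str:
        async with semaphore:
            async with self._get_session().post(
                API_URL,
                json={"model": model, "messages": messages,
                      "temperature": temperature},
            ) as response:
                if response.status != 200:
                    body = await response.text()
                    logger.error("API error %s: %s", response.status, body)
                    raise RuntimeError(f"API request failed: {response.status}")
                data = await response.json()
                return data["choices"][0]["message"]["content"]

    async def batch_request(self,
                            messages_list: List[List[Dict[str, str]]],
                            model: str = "gpt-3.5-turbo",
                            temperature: float = 0.7) -> List[str]:
        """
        messages_list: one message list per conversation; results keep the same order.

        temperature (this API accepts 0.0 to 2.0):
        - 0.0: near-deterministic output
        - 0.7: balanced creativity
        - higher values: more random, more diverse output
        """
        semaphore = asyncio.Semaphore(self.max_concurrency)
        try:
            return list(await asyncio.gather(*(
                self._request_one(semaphore, messages, model, temperature)
                for messages in messages_list)))
        except Exception:
            logger.exception("Batch request failed")
            raise

    async def close(self) -> None:
        """Close the session once, when the processor is no longer needed."""
        if self.session is not None and not self.session.closed:
            await self.session.close()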

Session ID generation:
Combine a UUID, a timestamp, and a hash of the user ID
Example: ses_<user_id>_<timestamp>_<random_str>
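The scheme above can be sketched in Python as follows. The function name `generate_session_id`, the choice of SHA-256 truncated to 12 hex characters, and the use of `uuid4().hex` as the random part are assumptions for illustration; the hash fills the `<user_id>` slot so the raw user ID is not embedded in the session ID.

```python
import hashlib
import time
import uuid


def generate_session_id(user_id: str) -> str:
    """Hypothetical helper: build ses_<user_hash>_<timestamp>_<random_str>."""
    # Hash the user ID so the raw identifier never appears in the session ID.
    user_hash = hashlib.sha256(user_id.encode("utf-8")).hexdigest()[:12]
    # Unix timestamp in whole seconds.
    timestamp = int(time.time())
    # Random part from a UUID4 (32 hex characters, no hyphens).
    random_str = uuid.uuid4().hex
    return f"ses_{user_hash}_{timestamp}_{random_str}"
```

The user-hash part is stable for a given user, while the UUID part keeps IDs unique even when the same user opens two sessions within the same second.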

Storage structure (timestamps are Unix seconds):

{
  "session_id": "ses_abc123",
  "message_history": [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hello!"}
  ],
  "created_at": 1689292800,
  "last_accessed": 1689292850
}
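When `message_history` grows past the model's context window, the oldest turns have to go. A minimal sketch, assuming the session dict above and hypothetical helpers `append_and_trim` and `rough_token_count`: it drops the oldest non-system messages until the history fits a token budget. `rough_token_count` only counts characters and is a placeholder; accurate counts for mixed Chinese/English text need the model's actual tokenizer (for example the `tiktoken` library).

```python
import time
from typing import Callable, Dict, List

Message = Dict[str, str]


def rough_token_count(message: Message) -> int:
    """Placeholder estimate (assumption): characters plus a small per-message overhead.
    Replace with the model's real tokenizer for accurate budgeting."""
    return len(message["content"]) + 4


def append_and_trim(
    session: Dict,
    message: Message,
    max_context_tokens: int,
    count_tokens: Callable[[Message], int] = rough_token_count,
) -> Dict:
    """Append a message, then drop the oldest non-system messages until the
    history fits max_context_tokens. Also refreshes last_accessed."""
    history: List[Message] = session["message_history"]
    history.append(message)
    total = sum(count_tokens(m) for m in history)

    # Walk from the oldest message; keep system prompts and always keep the newest message.
    i = 0
    while total > max_context_tokens and i < len(history) - 1:
        if history[i]["role"] == "system":
            i += 1
        else:
            total -= count_tokens(history.pop(i))

    session["last_accessed"] = int(time.time())
    return session
```

Set `max_context_tokens` to the model's context length minus the `max_tokens` you reserve for the reply. In a multi-instance deployment, the updated dict would be written back to a shared store (for example serialized with `json.dumps` into Redis) so every instance sees the same history.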

Request batch	gpt-3.5-turbo latency (ms)	gpt-4 latency (ms)
1-100	320	650
101-500	350	700
501-1000	380	750

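import time
import uuid

import redis


class RateLimiter:
    """Sliding-window-log rate limiter backed by a Redis sorted set."""

    def __init__(self, redis_conn: redis.Redis, max_requests: int, window_seconds: int):
        self.redis = redis_conn
        self.max_requests = max_requests
        self.window = window_seconds

    def is_allowed(self, key: str) -> bool:
        now = time.time()
        window_start = now - self.window
        # Unique member so requests with identical timestamps are not merged.
        member = f"{now}-{uuid.uuid4().hex}"

        pipe = self.redis.pipeline()
        pipe.zremrangebyscore(key, 0, window_start)  # drop entries older than the window
        pipe.zcard(key)                              # requests already in the window
        pipe.zadd(key, {member: now})                # record this request
        pipe.expire(key, self.window)                # let idle keys expire
        _, count, _, _ = pipe.execute()

        # count excludes the current request, so allow while it is below the limit.
        if count < self.max_requests:
            return True
        # Rejected requests should not consume quota (small race window under contention).
        self.redis.zrem(key, member)
        return False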
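For unit tests or a single-process service, the same sliding-window-log logic can run without Redis. The `InMemoryRateLimiter` class below is a hypothetical illustration, not a replacement for the Redis version: its state lives in one process, so it cannot enforce a global limit across multiple instances.

```python
import threading
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class InMemoryRateLimiter:
    """Single-process sliding-window-log limiter (same idea as the sorted-set version)."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable for tests
        self._log: Dict[str, Deque[float]] = defaultdict(deque)
        self._lock = threading.Lock()

    def is_allowed(self, key: str) -> bool:
        with self._lock:
            now = self.clock()
            log = self._log[key]
            # Evict timestamps at or before the window start (mirrors zremrangebyscore).
            while log and log[0] <= now - self.window:
                log.popleft()
            if len(log) < self.max_requests:
                log.append(now)  # only accepted requests consume quota
                return True
            return False
```

Injecting `clock` lets a test advance time deterministically instead of sleeping.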