How to Call the ChatGPT API Efficiently: A Practical Guide from Authentication to Streaming Responses

Developers calling the ChatGPT API tend to run into a few typical problems:

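Network timeouts on long responses: when the API returns a large body, a single request can easily time out and fail because of network jitter
Hard-to-handle streaming data: with stream=True enabled, the chunks have to be parsed correctly and stitched back into a complete message
Complicated API key management: when rotating across multiple keys, switching by hand is slow and error-prone
Imprecise token accounting: without real-time monitoring of token consumption, usage can exceed the budget and lead to unexpected charges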

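Four techniques address these problems:

Asynchronous calls: use aiohttp instead of the requests library to issue many requests concurrently
Automatic retries: retry with exponential backoff on status codes such as 429 and 502
Streaming pipeline: buffer incoming chunks in a queue and verify that each event is complete before parsing it
Authentication management: automatic JWT refresh plus round-robin load balancing across multiple API keys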
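
To make the retry item concrete, here is a minimal sketch of a capped exponential backoff schedule. It is written independently of any retry library, so the names backoff_delays, should_retry and RETRYABLE_STATUS, and the default base and cap values, are illustrative assumptions rather than part of an existing API.

```python
import random

# Status codes named in the list above; treating only these as retryable
# is an assumption of this sketch.
RETRYABLE_STATUS = {429, 502}


def should_retry(status):
    """Return True if the HTTP status is worth retrying."""
    return status in RETRYABLE_STATUS


def backoff_delays(max_retries, base=1.0, cap=10.0, jitter=False, rng=None):
    """Return the wait time in seconds before each retry.

    The delay doubles on every attempt (base, 2*base, 4*base, ...) and is
    capped at `cap`. With jitter=True each delay is drawn uniformly from
    [0, delay] so that many clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        if jitter:
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays
```

Without jitter, backoff_delays(5) gives waits of 1, 2, 4, 8 and 10 seconds. Jitter is usually worth enabling when several workers share one rate limit.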

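import json

import aiohttp
import tiktoken
from tenacity import retry, stop_after_attempt, wait_exponential

API_URL = "https://api.openai.com/v1/chat/completions"


class ChatGPTClient:
    def __init__(self, api_keys):
        if not api_keys:
            raise ValueError("at least one API key is required")
        self.api_keys = list(api_keys)
        self.current_key_idx = 0
        self.encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

    def _headers(self):
        # Always authenticate with the key currently selected by the rotation.
        return {
            "Authorization": f"Bearer {self.api_keys[self.current_key_idx]}",
            "Content-Type": "application/json",
        }

    def _rotate_api_key(self):
        # Not defined in the original listing; assumed to be a simple
        # round-robin that wraps around at the end of the key list.
        self.current_key_idx = (self.current_key_idx + 1) % len(self.api_keys)

    # Up to 3 attempts with exponential waits clamped to 2-10 seconds.
    # reraise=True surfaces the last real error instead of a RetryError.
    # Note: any exception triggers a retry, not only 429/502 responses.
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        reraise=True,
    )
    async def _make_request(self, session, payload):
        async with session.post(API_URL, json=payload, headers=self._headers()) as response:
            if response.status == 429:
                # Rate limited: switch keys so the retry uses a different one.
                self._rotate_api_key()
                raise RuntimeError("Rate limit exceeded")

            response.raise_for_status()
            return await response.json()

    async def stream_response(self, messages):
        payload = {"model": "gpt-3.5-turbo", "messages": messages, "stream": True}

        async with aiohttp.ClientSession() as session:
            async with session.post(API_URL, json=payload, headers=self._headers()) as response:
                response.raise_for_status()
                buffer = ""
                # response.content yields the body line by line; server-sent
                # events are separated by a blank line ("\n\n").
                async for chunk in response.content:
                    buffer += chunk.decode("utf-8")
                    while "\n\n" in buffer:
                        event, buffer = buffer.split("\n\n", 1)
                        if event.strip() == "data: [DONE]":
                            return
                        text = self._parse_stream_data(event)
                        if text:
                            yield text

    @staticmethod
    def _parse_stream_data(event):
        # Not defined in the original listing; assumed to return the text
        # delta carried by one "data: {...}" event, or "" if there is none.
        for line in event.splitlines():
            if line.startswith("data:"):
                choices = json.loads(line[len("data:"):]).get("choices") or []
                if choices:
                    return choices[0].get("delta", {}).get("content") or ""
        return ""

    def calculate_tokens(self, text):
        # Counts tokens in the raw text only; the per-message overhead of the
        # chat format is not included.
        return len(self.encoding.encode(text))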
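
The client above handles one request at a time. To get the concurrency that motivates using aiohttp, several calls can be fanned out with a limit on how many are in flight. The sketch below shows one way to do that with asyncio.Semaphore. The helper gather_limited and the stand-in fake_request are hypothetical names introduced for illustration; in real use the stand-in would be replaced by a call such as client._make_request(session, payload).

```python
import asyncio


async def gather_limited(coros, limit=5):
    """Run coroutines concurrently, at most `limit` at a time.

    Results are returned in the same order as the input coroutines.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))


async def fake_request(i):
    # Stand-in for a real API call (assumption for this sketch): sleeps
    # briefly to simulate network latency, then returns a value.
    await asyncio.sleep(0.01)
    return i * 2
```

A call such as asyncio.run(gather_limited([fake_request(i) for i in range(10)], limit=3)) runs ten requests with no more than three active at once. Keeping the limit below the account's rate limit reduces how often the 429 retry path is hit.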