如何高效调用ChatGPT API：从认证到流式响应的全链路实践

10次阅读

没有评论

共计 3589 个字符，预计需要花费 9 分钟才能阅读完成。

在实际开发中直接调用 OpenAI 原生 API 时，开发者常遇到以下几个典型问题：

认证令牌过期 ：API 密钥有效期通常较短，需要频繁刷新，手动处理会增加开发复杂度。
长文本响应截断 ：默认的 API 响应限制可能导致长文本被截断，影响用户体验。
并发限制 ：高并发场景下容易触发速率限制，导致请求失败。

这些问题不仅影响开发效率，还可能降低服务的稳定性。

REST：
优点：简单易用，兼容性强，适合大多数 Web 应用。
缺点：性能较低，尤其在处理流式响应时延迟较高。
gRPC：
优点：高性能，支持双向流式通信，适合高并发场景。
缺点：配置复杂，对客户端和服务端的要求较高。

对于大多数开发者来说，REST 协议是更稳妥的选择，除非有明确的性能需求。

以下是 Python 和 Node.js 的示例代码：

import requests
from datetime import datetime, timedelta
import jwt

class AuthManager:
    def __init__(self, api_key):
        self.api_key = api_key
        self.token = None
        self.expires_at = None

    def get_token(self):
        if self.token and datetime.now() < self.expires_at:
            return self.token

        # 生成新的 JWT 令牌
        payload = {
            'iss': 'your-service',
            'exp': datetime.now() + timedelta(minutes=30),
            'api_key': self.api_key
        }
        self.token = jwt.encode(payload, 'your-secret-key', algorithm='HS256')
        self.expires_at = datetime.now() + timedelta(minutes=25)  # 提前 5 分钟刷新
        return self.token

const jwt = require('jsonwebtoken');

class AuthManager {constructor(apiKey) {
        this.apiKey = apiKey;
        this.token = null;
        this.expiresAt = null;
    }

    getToken() {if (this.token && new Date() < this.expiresAt) {return this.token;}

        const payload = {
            iss: 'your-service',
            exp: Math.floor(Date.now() / 1000) + 1800, // 30 分钟后过期
            api_key: this.apiKey
        };
        this.token = jwt.sign(payload, 'your-secret-key');
        this.expiresAt = new Date(Date.now() + 25 * 60 * 1000); // 提前 5 分钟刷新
        return this.token;
    }
}

使用 Server-Sent Events（SSE）可以高效处理流式响应。以下是一个事件驱动架构的示例：

import requests

def stream_response(prompt):
    headers = {'Authorization': f'Bearer {AuthManager.get_token()}',
        'Accept': 'text/event-stream'
    }
    response = requests.post(
        'https://api.openai.com/v1/chat/completions',
        headers=headers,
        json={'prompt': prompt, 'stream': True},
        stream=True
    )

    for chunk in response.iter_lines():
        if chunk:
            yield chunk.decode('utf-8')

import time
from requests.exceptions import RequestException

def exponential_backoff(retries=3):
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return func(*args, **kwargs)
                except RequestException as e:
                    if attempt == retries - 1:
                        raise
                    wait_time = (2 ** attempt) * 0.1
                    time.sleep(wait_time)
        return wrapper
    return decorator

from contextlib import contextmanager

@contextmanager
def api_session():
    session = requests.Session()
    session.headers.update({'Authorization': f'Bearer {AuthManager.get_token()}'})
    try:
        yield session
    finally:
        session.close()

import re

def filter_sensitive_data(text):
    # 过滤 API 密钥
    text = re.sub(r'sk-[a-zA-Z0-9]{48}', '[REDACTED]', text)
    # 过滤 JWT 令牌
    text = re.sub(r'eyJ[a-zA-Z0-9-_]+\.[a-zA-Z0-9-_]+\.[a-zA-Z0-9-_]+', '[REDACTED]', text)
    return text

将多个请求合并为一个批量请求，可以减少 API 调用次数，从而降低计费成本。例如：

batch_prompts = ['prompt1', 'prompt2', 'prompt3']
responses = [stream_response(prompt) for prompt in batch_prompts]

使用 Prometheus 监控 API 调用情况：

from prometheus_client import Counter, Gauge

api_calls_total = Counter('api_calls_total', 'Total API calls', ['endpoint', 'status'])
api_latency = Gauge('api_latency_seconds', 'API latency in seconds', ['endpoint'])

@exponential_backoff()
def call_api(endpoint, data):
    start_time = time.time()
    try:
        response = requests.post(endpoint, json=data)
        api_calls_total.labels(endpoint=endpoint, status='success').inc()
        return response
    except RequestException:
        api_calls_total.labels(endpoint=endpoint, status='failure').inc()
        raise
    finally:
        api_latency.labels(endpoint=endpoint).set(time.time() - start_time)

确保遵守欧盟 GDPR 法规：

用户数据加密存储
提供数据删除接口
记录数据访问日志

使用 Locust 进行每秒 100 次调用的压力测试：

from locust import HttpUser, task, between

class ApiUser(HttpUser):
    wait_time = between(0.1, 0.5)

    @task
    def call_chatgpt(self):
        self.client.post('/chat', json={'prompt': 'Hello, world!'})

对比原生 SDK 与封装方案的 CPU/ 内存占用：