A Practical Guide to Server-Side ChatGPT API Integration: From Authentication to High-Concurrency Optimization

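Calling the ChatGPT API directly from the front end causes three serious problems:

Key exposure: front-end code cannot keep an API key secret, so the key can be scraped and abused
Rate limits: OpenAI rate-limits API usage at the organization and project level, so uncoordinated client calls all draw on one shared quota with nothing to queue or throttle them
Fragmented requests: when every client connects directly, identical content is requested again and again, wasting token quota

Server-side integration gives us:

Unified authentication and request management
Request merging and result caching
Enterprise-grade security controls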
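
As a rough illustration of request merging, the sketch below lets concurrent requests with the same key share a single upstream call. It uses only asyncio; the SingleFlight class and its do method are hypothetical names introduced here, and the upstream call is whatever coroutine function the caller passes in.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class SingleFlight:
    """Share one in-flight upstream call among concurrent identical requests."""

    def __init__(self) -> None:
        self._inflight: Dict[str, asyncio.Future] = {}

    async def do(self, key: str, fn: Callable[[], Awaitable[str]]) -> str:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the upstream call.
            task = asyncio.ensure_future(fn())
            self._inflight[key] = task
            # Forget the entry once the call finishes, whether it succeeded or failed.
            task.add_done_callback(lambda _t: self._inflight.pop(key, None))
        # shield() keeps one cancelled caller from cancelling the shared call.
        return await asyncio.shield(task)
```

A handler could call `await flight.do(prompt_hash, lambda: call_openai(prompt))`, where `prompt_hash` and `call_openai` stand in for your own key function and API wrapper. Merging only helps for requests that overlap in time; repeated requests over a longer window need a cache.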

Option 1: environment variables (suited to small and medium projects)

# .env file (add it to .gitignore)
OPENAI_API_KEY=sk-xxxxxxxx

# Reading the key in Python
from dotenv import load_dotenv
import os

load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

Option 2: a KMS service (recommended for production)

// AWS KMS example (Node.js)
const { KMSClient, DecryptCommand } = require('@aws-sdk/client-kms');

async function getApiKey() {
  const client = new KMSClient({ region: 'us-west-2' });
  const response = await client.send(new DecryptCommand({
    CiphertextBlob: Buffer.from(process.env.ENCRYPTED_API_KEY, 'base64')
  }));
  // Plaintext is a Uint8Array in SDK v3, so wrap it in a Buffer before decoding
  return Buffer.from(response.Plaintext).toString('utf-8');
}

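import asyncio
import random

import openai  # legacy SDK (openai<1.0); ChatCompletion was removed in 1.0

# Transient errors worth retrying; authentication or invalid-request errors are not
RETRYABLE = (
    openai.error.RateLimitError,
    openai.error.APIConnectionError,
    openai.error.Timeout,
)

async def chat_with_retry(messages, max_retries=3):
    base_delay = 1
    for attempt in range(max_retries):
        try:
            # acreate is the awaitable variant; create() is synchronous
            response = await openai.ChatCompletion.acreate(
                model="gpt-3.5-turbo",
                messages=messages
            )
            return response
        except RETRYABLE:
            if attempt == max_retries - 1:
                raise

            # Exponential backoff plus jitter; asyncio.sleep does not block the event loop
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            await asyncio.sleep(delay)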
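
To make the backoff arithmetic concrete, here is a small helper that lists the delays the loop above would sleep between attempts. The function name backoff_delays and its jitter and rng parameters are illustrative assumptions; the formula is the same base_delay * 2 ** attempt plus a uniform jitter.

```python
import random
from typing import List


def backoff_delays(max_retries: int, base_delay: float = 1.0,
                   jitter: float = 1.0, rng=random) -> List[float]:
    """Delays slept between attempts: base_delay * 2**attempt + U(0, jitter).

    The final attempt re-raises instead of sleeping, so there are
    max_retries - 1 delays in total.
    """
    return [
        base_delay * (2 ** attempt) + rng.uniform(0, jitter)
        for attempt in range(max_retries - 1)
    ]
```

With max_retries=3 and the defaults, the two delays fall in the ranges 1 to 2 seconds and 2 to 3 seconds, so a request that fails every time waits at most about 5 seconds before the error surfaces.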

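Example aiohttp connection-pool configuration:

import asyncio
import aiohttp

async def main():
    # Create the connector inside the running event loop
    conn = aiohttp.TCPConnector(
        limit=100,  # maximum total connections
        limit_per_host=50,  # maximum connections per host
        enable_cleanup_closed=True  # clean up closed connections automatically
    )

    async with aiohttp.ClientSession(connector=conn) as session:
        ...  # business logic goes here; reuse this one session for all requests

asyncio.run(main())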

Load-test comparison (JMeter, 100 concurrent users):

Configuration	Avg. response time	Error rate
No connection pool	1200ms	15%
With connection pool	450ms	0.2%

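When using stream=True, you must:

Make sure the TCP connection is not dropped by a timeout (Nginx's proxy_read_timeout defaults to 60s)
Implement heartbeat detection on the client
Set a reasonable keepalive_timeout on the server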
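
One way to keep an idle stream alive is for the server to emit a heartbeat whenever the upstream has been silent for a while. The sketch below assumes the stream is relayed to the client as Server-Sent Events, where a line starting with a colon is a comment that clients ignore; with_heartbeat and HEARTBEAT are hypothetical names, and source is any async iterator of text chunks.

```python
import asyncio
from typing import AsyncIterator

HEARTBEAT = ": keep-alive\n\n"  # SSE comment line, ignored by SSE clients
_DONE = object()  # sentinel marking the end of the upstream stream


async def with_heartbeat(source: AsyncIterator[str],
                         interval: float) -> AsyncIterator[str]:
    """Relay chunks from source, yielding HEARTBEAT after each idle interval."""
    queue: asyncio.Queue = asyncio.Queue()

    async def pump() -> None:
        # Read the upstream in a separate task so a timeout never cancels it.
        try:
            async for chunk in source:
                await queue.put(chunk)
        finally:
            await queue.put(_DONE)

    task = asyncio.create_task(pump())
    try:
        while True:
            try:
                item = await asyncio.wait_for(queue.get(), timeout=interval)
            except asyncio.TimeoutError:
                yield HEARTBEAT  # nothing arrived in time; keep the connection warm
                continue
            if item is _DONE:
                await task  # re-raise any upstream error
                break
            yield item
    finally:
        task.cancel()
```

The interval should sit comfortably below the shortest proxy or load-balancer read timeout on the path, for example 15 to 30 seconds against a 60 second limit.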

Filter PII (personally identifiable information) out of user input
Never log complete conversation content
Provide a data-deletion endpoint
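
A minimal sketch of the PII filtering step, assuming simple regex redaction before the text is sent upstream or written to logs. The two patterns (email addresses and mainland China mobile numbers) are illustrative assumptions and far from complete; a production filter would need many more identifier types or a dedicated detection service.

```python
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")  # 11-digit mainland China mobile


def redact_pii(text: str) -> str:
    """Replace matched emails and phone numbers with fixed placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Running the same function over anything destined for logs also helps with the second rule, since the stored text no longer contains the raw identifiers.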

Recommended combination:

Prometheus to record token consumption per minute
Grafana to raise cost alerts
A daily audit report of call volume

import redis
import hashlib
import json

r = redis.Redis(host='localhost', port=6379)

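def get_cache_key(prompt):
    # Hash the prompt so keys stay short and fixed-length (MD5 is fine here; this is not a security use)
    return f"gpt_cache:{hashlib.md5(prompt.encode()).hexdigest()}"

def get_cached_response(prompt):
    key = get_cache_key(prompt)
    cached = r.get(key)  # bytes on a hit, None on a miss
    return json.loads(cached) if cached else None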
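
The snippet above covers only the read path. A possible read-through version with a write path and expiry is sketched below; cached_chat, make_cache_key, and the call_api callback are hypothetical names, and store is assumed to be a redis.Redis client (only its get and setex methods are used). Keying on the model as well as the messages is an added assumption, so that answers from different models do not collide.

```python
import hashlib
import json
from typing import Callable


def make_cache_key(model: str, messages: list) -> str:
    """Derive a stable key from the model name and the full message list."""
    payload = json.dumps({"model": model, "messages": messages},
                         sort_keys=True, ensure_ascii=False)
    return "gpt_cache:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_chat(store, model: str, messages: list,
                call_api: Callable[[], dict], ttl_seconds: int = 3600) -> dict:
    """Return a cached reply if present; otherwise call the API and cache it."""
    key = make_cache_key(model, messages)
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: no tokens spent
    result = call_api()            # cache miss: one upstream call
    store.setex(key, ttl_seconds, json.dumps(result))  # expire after ttl_seconds
    return result
```

Caching suits deterministic or low-temperature prompts; for creative output, returning the same cached answer to every user may not be what you want.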

graph TD
    A[Request arrives] --> B{Remaining tokens > 0?}
    B -->|Yes| C[Deduct tokens and process]
    B -->|No| D[Return 429 error]
    C --> E[Record tokens consumed]
    F[Scheduled job] --> G[Reset token quota]
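
The flow in the diagram could be sketched in Python roughly as follows. This is an in-memory, single-process illustration; TokenQuota, handle_request, and run_completion are hypothetical names, and a multi-instance deployment would keep the counters in a shared store instead. As in the diagram, a request is admitted while any quota remains and the actual usage is deducted afterwards, so the balance can dip below zero on the last admitted request.

```python
import threading
from typing import Callable, Dict, Optional, Tuple


class TokenQuota:
    """Per-user token budget that a scheduled job resets each period."""

    def __init__(self, limit_per_period: int) -> None:
        self._limit = limit_per_period
        self._remaining: Dict[str, int] = {}
        self._lock = threading.Lock()

    def admit(self, user_id: str) -> bool:
        # "Remaining tokens > 0?" branch of the diagram.
        with self._lock:
            return self._remaining.setdefault(user_id, self._limit) > 0

    def record_usage(self, user_id: str, tokens_used: int) -> None:
        # Deduct what the completion actually consumed.
        with self._lock:
            current = self._remaining.setdefault(user_id, self._limit)
            self._remaining[user_id] = current - tokens_used

    def reset(self) -> None:
        # Called by the scheduled job to restore every user's full quota.
        with self._lock:
            self._remaining.clear()


def handle_request(quota: TokenQuota, user_id: str,
                   run_completion: Callable[[], Tuple[str, int]]
                   ) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, reply); 429 when the user's quota is exhausted."""
    if not quota.admit(user_id):
        return 429, None
    reply, tokens_used = run_completion()
    quota.record_usage(user_id, tokens_used)
    return 200, reply
```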

Test environment: AWS t3.xlarge (same region)