Deploying a ChatGPT Mirror on a Cloud Server: From Scratch to Performance Tuning

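While deploying a ChatGPT service on a cloud server recently, I ran into several pain points:

High latency: calling the official OpenAI API directly from mainland China often takes more than 500 ms per request because of where the servers are located.
API rate limits: free accounts are limited to 3 requests per minute, and paid accounts can still be throttled during traffic bursts.
Data compliance: some business scenarios require conversation data to stay on servers we own.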
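
To make the rate-limit pain point concrete, here is a minimal sketch of a client-side sliding-window limiter that keeps a caller under a quota such as 3 requests per minute. The class name SlidingWindowLimiter and its parameters are illustrative assumptions, not part of the original deployment.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window (hypothetical helper)."""

    def __init__(self, max_calls=3, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.calls = deque()    # timestamps of recently accepted calls

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if it is allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def try_acquire(self):
        """Record a call and return True if the quota allows it, otherwise return False."""
        if self.wait_time() > 0.0:
            return False
        self.calls.append(self.clock())
        return True
```

A caller would check try_acquire() before each upstream request and sleep for wait_time() seconds when it returns False.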

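For container orchestration, we compared three mainstream options:

Option	Learning curve	Resource usage	Best suited for
Docker Swarm	Low	Low	Small clusters
Kubernetes	High	Medium	Large-scale production environments
Docker Compose	Very low	Lowest	Quick single-host deployment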

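We settled on Docker Compose because:

Our ChatGPT mirror runs mainly on a single cloud server.
We do not need sophisticated service discovery or autoscaling.
Configuration is simple and debugging is easy.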

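The base image is nvidia/cuda:11.8.0-base-ubuntu20.04. CUDA 11.8 is one of the versions for which PyTorch publishes official wheels (the cu118 index used in the Dockerfile).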

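# Dockerfile
FROM nvidia/cuda:11.8.0-base-ubuntu20.04

# Avoid interactive prompts (e.g. tzdata) while building
ARG DEBIAN_FRONTEND=noninteractive

# Install Python 3.9 and basic dependencies
# (curl is needed if a curl-based HEALTHCHECK is added)
RUN apt-get update && apt-get install -y \
    python3.9 \
    python3-pip \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Point python3 at python3.9. The -f flag is required because
# /usr/bin/python3 already exists (Python 3.8 on Ubuntu 20.04).
RUN ln -sf /usr/bin/python3.9 /usr/bin/python3

# Install PyTorch with CUDA support
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install the remaining dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Copy model files and application code
COPY . /app
WORKDIR /app

# Expose the HTTP port
EXPOSE 8000

# Start command
CMD ["python3", "app.py"]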

# Reverse proxy configuration

# Rate limiting (10 requests per second per client IP).
# limit_req_zone is only valid in the http context, so it sits outside the server block.
limit_req_zone $binary_remote_addr zone=chatlimit:10m rate=10r/s;

server {
    # http2 is required for the gRPC proxy
    listen 443 ssl http2;
    server_name chat.yourdomain.com;

    # SSL certificates (renewed automatically by Certbot)
    ssl_certificate /etc/letsencrypt/live/chat.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.yourdomain.com/privkey.pem;

    location / {
        limit_req zone=chatlimit burst=20 nodelay;
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    # gRPC proxy
    location /grpc/ {
        grpc_pass grpc://localhost:9000;
    }
}
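
To see what rate=10r/s with burst=20 nodelay means in practice, the following is a simplified Python model of the leaky-bucket accounting that limit_req performs for a single client. It is a sketch for building intuition, not nginx's implementation (nginx tracks the excess per key in shared memory at millisecond resolution); the class name LimitReqModel is made up for this example.

```python
class LimitReqModel:
    """Simplified single-client model of nginx limit_req with nodelay (illustrative only)."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s
        self.burst = burst
        self.excess = 0.0   # requests currently "queued" above the steady rate
        self.last = None    # time of the last accepted request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) is accepted."""
        if self.last is None:
            # The first request from a client is always accepted
            self.last = now
            return True
        # The bucket drains at `rate` per second; each request adds 1
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False    # nginx would reject this request (503 by default)
        self.excess = excess
        self.last = now
        return True


model = LimitReqModel(rate_per_s=10, burst=20)
accepted_at_once = sum(model.allow(0.0) for _ in range(25))
accepted_one_second_later = sum(model.allow(1.0) for _ in range(15))
print(accepted_at_once, accepted_one_second_later)  # prints "21 10"
```

In this model a client can send 1 + burst requests instantly, after which capacity refills at the configured rate; one second later only 10 more requests fit.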

# api_client.py
import asyncio
import aiohttp
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),  # give up after 3 attempts
    wait=wait_exponential(multiplier=1, min=1, max=10),  # exponential backoff, 1 to 10 s
)
async def chat_completion(prompt, api_key):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
    }

    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.openai.com/v1/chat/completions",
            json=payload,
            headers=headers,
        ) as response:
            # Any non-200 status raises, which triggers a retry
            if response.status != 200:
                raise RuntimeError(f"API request failed: {response.status}")
            return await response.json()
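
The retry policy above waits longer after each failure. As a rough illustration of exponential backoff with clamping, the sketch below computes the delay before each retry. The helper backoff_delays is hypothetical, and the exact exponent offset tenacity uses may differ between versions, so treat the numbers as indicative.

```python
def backoff_delays(attempts, multiplier=1.0, min_wait=1.0, max_wait=10.0):
    """Delay before each retry: multiplier * 2**n, clamped to [min_wait, max_wait]."""
    delays = []
    for n in range(attempts - 1):   # there is no wait after the final attempt
        raw = multiplier * (2 ** n)
        delays.append(max(min_wait, min(raw, max_wait)))
    return delays


print(backoff_delays(3))  # prints [1.0, 2.0]
print(backoff_delays(6))  # prints [1.0, 2.0, 4.0, 8.0, 10.0]
```

With 3 attempts the caller waits at most about 3 seconds in total before giving up; the 10-second cap only matters if the attempt limit is raised.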

# Add Prometheus monitoring support
RUN pip3 install prometheus-client

# Health check (curl must be installed in the image)
HEALTHCHECK --interval=30s --timeout=3s \
    CMD curl -f http://localhost:8000/health || exit 1

# Expose the metrics port
EXPOSE 9090
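
The health check above assumes the service answers GET /health. The original app.py is not shown, so the following is only a standard-library sketch of what such an endpoint could look like; ModelState, health_response, make_handler and serve are hypothetical names, and a FastAPI route would serve the same purpose.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ModelState:
    """Thread-safe flag that flips once the model has been loaded."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    @property
    def ready(self):
        return self._ready.is_set()


def health_response(state):
    """Return (status_code, body) for GET /health."""
    if state.ready:
        return 200, {"status": "ready"}
    # A 503 makes `curl -f` exit non-zero, so the HEALTHCHECK command fails
    return 503, {"status": "loading"}


def make_handler(state):
    """Build a request handler class bound to the given ModelState."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            code, body = health_response(state)
            data = json.dumps(body).encode("utf-8")
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return HealthHandler


def serve(state, host="0.0.0.0", port=8000):
    """Serve /health until interrupted (blocks the calling thread)."""
    ThreadingHTTPServer((host, port), make_handler(state)).serve_forever()
```

Returning a non-2xx status while the model is still loading lets Docker distinguish a container that is up from one that is actually able to serve requests.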

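#!/bin/bash
# preload_model.sh

# Warm up the service by loading the model
curl -X POST "http://localhost:8000/load_model" \
    -H "Content-Type: application/json" \
    -d '{"model_name":"gpt-3.5-turbo"}'

# Wait until the model has finished loading
while true; do
    response=$(curl -s "http://localhost:8000/health")
    # Spaces inside [[ ]] are required; the pattern matches any body containing "ready"
    if [[ $response == *"ready"* ]]; then
        echo "Model warm-up complete"
        break
    fi
    sleep 5
done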
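
The shell loop above polls forever if the model never becomes ready. A possible Python variant with a timeout is sketched below; the function names, the 300-second default and the injectable clock and sleep parameters are assumptions made for illustration and testing.

```python
import time
import urllib.request


def fetch_health(url="http://localhost:8000/health"):
    """Return the response body as text, or "" if the service is unreachable or unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:   # URLError, HTTPError and socket timeouts all derive from OSError
        return ""


def wait_until_ready(fetch=fetch_health, timeout=300.0, interval=5.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll until the health body contains "ready"; return False if `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if "ready" in fetch():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A deployment script could call wait_until_ready() after triggering the model load and exit with a non-zero status when it returns False.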

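# middleware.py
from fastapi import Request, HTTPException
from datetime import datetime, timedelta
import jwt

# Placeholder only: load the real key from an environment variable or a secret store
SECRET_KEY = "your-secret-key"
ALGORITHM = "HS256"

async def auth_middleware(request: Request):
    auth_header = request.headers.get("Authorization")
    if not auth_header:
        raise HTTPException(status_code=401, detail="Missing authentication token")

    # Expect "Bearer <token>"; a malformed header would otherwise raise IndexError
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise HTTPException(status_code=401, detail="Invalid token")

    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        request.state.user_id = payload.get("sub")
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token has expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")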
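
For readers curious what the HS256 check in the middleware involves, here is a standard-library sketch of signing and verifying a token. It is meant for understanding only; sign_hs256 and verify_hs256 are illustrative names, and a real service should keep using a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWTs strip
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: str) -> str:
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url_encode(_signature(signing_input, secret))


def verify_hs256(token: str, secret: str, now=None) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        sig = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ValueError("malformed token")
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(sig, _signature(f"{header_b64}.{payload_b64}", secret)):
        raise ValueError("bad signature")
    exp = claims.get("exp")
    if exp is not None and (time.time() if now is None else now) >= exp:
        raise ValueError("token expired")
    return claims
```

A token issued with sign_hs256({"sub": "user-1", "exp": ...}, secret) would be sent by the client as "Authorization: Bearer <token>", which is the header format the middleware expects.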

Test results on an Alibaba Cloud GPU instance (4 vCPUs, 8 GB RAM, T4 GPU):