Open WebUI 深度整合 ChatGPT：从本地部署到生产环境优化

1次阅读

没有评论

共计 1968 个字符，预计需要花费 5 分钟才能阅读完成。

直接使用 ChatGPT API 存在几个明显问题：

速率限制 ：官方 API 有严格的每分钟请求上限，高峰期容易触发限流
数据隐私 ：对话内容需传输到第三方服务器，不符合企业数据合规要求
成本不可控 ：按 token 计费方式在长期使用时成本难以预测

主流自托管方案包括 Open WebUI 和 FastChat：

Open WebUI 优势 ：
原生支持 Docker 一键部署
内置用户权限管理系统
可视化对话历史管理
支持插件扩展机制
FastChat 特点 ：
更适合研究场景
需要手动搭建 Gradio 界面
模型权重加载更灵活

确保已安装：
– Docker 20.10+
– NVIDIA 驱动（GPU 加速场景）
– 至少 16GB 内存

创建 docker-compose.yml 文件：

version: '3.8'

services:
  webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:3000"
    volumes:
      - ./data:/app/data
    environment:
      - OPENAI_API_KEY=your_api_key
      - CACHE_ENABLED=true
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 8G

关键参数说明：
– volumes 挂载确保数据持久化
– deploy.resources 限制容器资源占用

修改 config.yml 添加本地模型：

models:
  - name: "my-llama"
    base_url: "http://localhost:5000"
    api_key: ""
    parameters:
      temperature: 0.7

实现 Redis 缓存层：

from redis import Redis
from functools import wraps

redis = Redis(host='localhost', port=6379)

def cache_response(ttl=300):
    def decorator(f):
        @wraps(f)
        async def wrapper(*args, **kwargs):
            cache_key = f"{f.__name__}:{str(kwargs)}"
            cached = redis.get(cache_key)
            if cached:
                return cached.decode()

            result = await f(*args, **kwargs)
            redis.setex(cache_key, ttl, result)
            return result
        return wrapper
    return decorator

使用 FastAPI 的异步端点：

from fastapi import FastAPI
import httpx

app = FastAPI()

@app.post("/chat")
async def chat_endpoint(prompt: str):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://webui:3000/api/chat",
            json={"prompt": prompt}
        )
    return response.json()

通过 Nginx 添加基础认证：

location / {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://webui:3000;
}

创建过滤中间件：

from fastapi import Request

banned_words = ["credit_card", "password"]

async def filter_middleware(request: Request):
    body = await request.body()
    if any(word in body.decode() for word in banned_words):
        raise HTTPException(status_code=403)