Integrating ChatGPT with Spring AI in Practice: A Complete Guide to Building Efficient AI Service Endpoints


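Developers who call the ChatGPT API directly tend to run into a few typical problems:

Rate limits: OpenAI enforces strict per-minute request limits on API calls, so bursts of traffic easily trigger HTTP 429 errors
Response latency: when GPT-3.5/GPT-4 models generate long text, server-side processing can take more than 10 seconds and cause client timeouts
Complex error handling: network jitter, content filtering, exceeded token limits, and other failure scenarios all have to be handled
Context management: multi-turn conversations require maintaining session state, which adds complexity to the business logic

These pain points make bare API calls a poor fit for production, so a more robust integration approach is needed.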

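There are three common ways to integrate ChatGPT:

Raw HTTP calls
Pros: simple to implement, no extra dependencies
Cons: retries, rate limiting, and similar logic must be written by hand, so maintenance cost is high
Spring Cloud OpenFeign
Pros: declarative interfaces, built-in load balancing
Cons: limited async support, coarse-grained error handling
Spring AI (recommended)
Advantages:
- Modular design with pluggable AI provider support
- Built-in prompt templates and context management
- A unified exception-handling model
- Support for the reactive programming model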

First, add the dependency to pom.xml:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <version>0.8.0</version>
</dependency>

Then configure it in application.yml:

spring:
  ai:
    openai:
      api-key: ${OPENAI_KEY}
      chat:
        options:
          model: gpt-3.5-turbo
          temperature: 0.7

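It is a good idea to abstract the AI calls into a dedicated service layer:

// Package names below are as of Spring AI 0.8.0.
import java.util.Map;
import java.util.concurrent.CompletableFuture;

import lombok.RequiredArgsConstructor;
import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
@RequiredArgsConstructor
public class AIChatService {
    private final ChatClient chatClient;

    // @Async only takes effect when @EnableAsync is declared on a configuration class.
    @Async
    public CompletableFuture<String> generateResponse(String prompt) {
        PromptTemplate template = new PromptTemplate("""
          You are a professional AI assistant. Please answer in Chinese. Question: {question}
          """);

        // Fill the {question} placeholder with the user's input.
        Prompt engineeredPrompt = template.create(Map.of("question", prompt));

        String content = chatClient.call(engineeredPrompt).getResult().getOutput().getContent();
        return CompletableFuture.completedFuture(content);
    }
}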

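Spring AI works well with reactive programming, which suits long-running AI calls because request threads are not held while the model is generating:

@GetMapping("/chat")
public Mono<ResponseEntity<String>> chat(@RequestParam String message) {
    // generateResponse returns a CompletableFuture, so bridge it with fromFuture;
    // fromCallable would emit the future object itself instead of its result.
    return Mono.fromFuture(() -> aIChatService.generateResponse(message))
        .timeout(Duration.ofSeconds(30))
        .onErrorResume(e -> {
            log.error("AI API call failed", e);
            return Mono.just("The system is busy, please try again later");
        })
        .map(response -> ResponseEntity.ok(response));
}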
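
The timeout-and-fallback behavior of the endpoint above can also be sketched with only the JDK (Java 9+), which may help when reasoning about it outside a reactive stack. This is a minimal sketch, not the article's implementation: `TimeoutFallback`, `callWithTimeout`, and the `modelCall` supplier are hypothetical names standing in for the real model call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Timeout-with-fallback sketch using only the JDK (Java 9+). */
final class TimeoutFallback {

    static final String FALLBACK = "The system is busy, please try again later";

    /**
     * Runs the (hypothetical) blocking model call on another thread and
     * completes with FALLBACK if it fails or does not finish in time.
     */
    static CompletableFuture<String> callWithTimeout(Supplier<String> modelCall, long timeoutMillis) {
        return CompletableFuture.supplyAsync(modelCall)
                // Completes the future with a TimeoutException after the deadline.
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                // Covers both timeouts and exceptions thrown by the call itself.
                .exceptionally(e -> FALLBACK);
    }
}
```

Note that the timeout only abandons the result; the underlying call keeps running on its worker thread until it returns, so the HTTP client's own timeout still needs to be configured.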

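For batch-style queries, several questions can be sent in one request through the messages array of a ChatCompletionRequest. Note that the ChatMessage and builder-style ChatCompletionRequest types used here appear to come from the community openai-java client (com.theokanning.openai) rather than from Spring AI's ChatClient API, and that the model returns one completion for the whole message list, not one answer per question:

// One "user" message per question.
List<ChatMessage> messages = questions.stream()
    .map(q -> new ChatMessage("user", q))
    .collect(Collectors.toList());

// All messages travel in a single chat completion request.
ChatCompletionRequest request = ChatCompletionRequest.builder()
    .model("gpt-3.5-turbo")
    .messages(messages)
    .build();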
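
If a separate answer per question is needed, one alternative is to issue one call per question and run the calls concurrently. The following is a minimal JDK-only sketch under that assumption; `BatchAsk`, `askAll`, and the `askOne` function are hypothetical names, with `askOne` standing in for a single-prompt model call.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

final class BatchAsk {
    /**
     * Sends each question as its own call and returns the answers in the
     * same order as the questions. askOne is a hypothetical stand-in for a
     * single-prompt model call.
     */
    static List<String> askAll(List<String> questions, Function<String, String> askOne) {
        // Start every call first so they run concurrently...
        List<CompletableFuture<String>> futures = questions.stream()
                .map(q -> CompletableFuture.supplyAsync(() -> askOne.apply(q)))
                .collect(Collectors.toList());
        // ...then wait for each result in the original order.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

Fanning out this way multiplies the request count, so it should be combined with rate limiting to stay within the provider's per-minute limits.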

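A two-level caching strategy is recommended:

Local cache: use Caffeine to cache frequently asked questions
Distributed cache: use Redis to store conversation history

// Key on the prompt text itself; hashCode() values can collide and return the wrong answer.
@Cacheable(value = "aiResponses", key = "#prompt")
public String getCachedResponse(String prompt) {
    return chatClient.call(new Prompt(prompt)).getResult().getOutput().getContent();
}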
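
The annotation above shows a single cache. The two-level lookup order itself can be sketched with plain JDK collections: check the local cache, then the shared store, and only then call the model. In this sketch an access-ordered `LinkedHashMap` stands in for Caffeine and a plain `Map` stands in for Redis; `TwoLevelCache` and `loader` are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Two-level lookup sketch: small in-process LRU first, shared store second. */
final class TwoLevelCache {
    private final Map<String, String> local;        // stands in for Caffeine
    private final Map<String, String> shared;       // stands in for Redis
    private final Function<String, String> loader;  // hypothetical model call

    TwoLevelCache(int localCapacity, Map<String, String> shared, Function<String, String> loader) {
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.local = Collections.synchronizedMap(new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > localCapacity;
            }
        });
        this.shared = shared;
        this.loader = loader;
    }

    String get(String prompt) {
        String hit = local.get(prompt);
        if (hit != null) {
            return hit;                    // level 1 hit
        }
        hit = shared.get(prompt);
        if (hit == null) {                 // miss in both levels: call the model
            hit = loader.apply(prompt);
            shared.put(prompt, hit);
        }
        local.put(prompt, hit);            // promote to level 1
        return hit;
    }
}
```

The sketch does not deduplicate concurrent misses for the same prompt, and it omits expiry; a real setup would rely on Caffeine and Redis TTL settings for both.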

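Transient failures can be handled with a RetryTemplate:

@Bean
public RetryTemplate retryTemplate() {
    return new RetryTemplateBuilder()
        .maxAttempts(3)                      // 1 initial call plus up to 2 retries
        .exponentialBackoff(1000, 2, 5000)   // start at 1 s, double each time, cap at 5 s
        .retryOn(OpenAiApiException.class)
        .build();
}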
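
To make the backoff parameters concrete, the waits they produce can be worked out by hand. The sketch below assumes the usual exponential rule (first wait equals the initial interval, each later wait is multiplied, and nothing exceeds the cap); `BackoffSchedule` and `waits` are hypothetical names, not part of Spring Retry.

```java
import java.util.ArrayList;
import java.util.List;

final class BackoffSchedule {
    /**
     * Waits (in ms) between attempts: the first wait is initialMs, each
     * later wait is multiplied by multiplier, and no wait exceeds maxMs.
     * maxAttempts counts the initial call, so there are maxAttempts - 1 waits.
     */
    static List<Long> waits(long initialMs, double multiplier, long maxMs, int maxAttempts) {
        List<Long> result = new ArrayList<>();
        double next = initialMs;
        for (int i = 1; i < maxAttempts; i++) {
            result.add(Math.min((long) next, maxMs));
            next = Math.min(next * multiplier, maxMs);
        }
        return result;
    }
}
```

With the settings above, `waits(1000, 2, 5000, 3)` yields waits of 1000 ms and 2000 ms, so the 5000 ms cap only comes into play if `maxAttempts` is raised to 5 or more.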

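Use Guava's RateLimiter to cap QPS:

private final RateLimiter rateLimiter = RateLimiter.create(50); // 50 permits per second

public String rateLimitedCall(String prompt) {
    // tryAcquire() does not block: reject immediately when no permit is available.
    if (!rateLimiter.tryAcquire()) {
        throw new BusyException("The system is busy"); // BusyException is an application-defined exception
    }
    return chatClient.call(prompt);
}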
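
For intuition about what a non-blocking permit check does, here is a simplified token bucket written against the JDK only. It is not Guava's actual algorithm, just a sketch of the same idea: permits refill at a steady rate and a call is rejected when none are left. `TokenBucket` and its injectable `nanoClock` are hypothetical names.

```java
import java.util.function.LongSupplier;

/** Simplified token bucket; not Guava's actual algorithm. */
final class TokenBucket {
    private final double permitsPerSecond;
    private final double capacity;
    private final LongSupplier nanoClock; // injectable so the sketch can be tested
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double permitsPerSecond, double capacity, LongSupplier nanoClock) {
        this.permitsPerSecond = permitsPerSecond;
        this.capacity = capacity;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; never blocks. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double elapsedSeconds = (now - lastRefillNanos) / 1_000_000_000.0;
        // Refill in proportion to elapsed time, never above capacity.
        tokens = Math.min(capacity, tokens + elapsedSeconds * permitsPerSecond);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production the clock would be `System::nanoTime`, for example `new TokenBucket(50, 50, System::nanoTime)` for roughly 50 requests per second with bursts of up to 50.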

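The following key metrics are worth monitoring:

Request success rate
Average response time
Token consumption
Number of rate-limit hits

These can be exposed as metrics through Micrometer. For ordinary service methods, the annotations below generally take effect only when Micrometer's TimedAspect and CountedAspect are registered as beans and Spring AOP is on the classpath:

@Timed(value = "ai.request.time", description = "AI request latency")
@Counted(value = "ai.request.count", description = "AI request count")
public String monitoredCall(String prompt) {/*...*/}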
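
To show how the four metrics relate to what is recorded per call, here is a small in-memory tally written with the JDK only. It is purely illustrative and not a replacement for Micrometer; `AiCallStats` and its method names are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

/** In-memory tallies for the four metrics listed above (illustrative only). */
final class AiCallStats {
    private final LongAdder total = new LongAdder();
    private final LongAdder succeeded = new LongAdder();
    private final LongAdder totalLatencyMs = new LongAdder();
    private final LongAdder tokens = new LongAdder();
    private final LongAdder rateLimited = new LongAdder();

    /** Records one finished call, successful or not. */
    void recordCall(boolean success, long latencyMs, long tokensUsed) {
        total.increment();
        if (success) {
            succeeded.increment();
        }
        totalLatencyMs.add(latencyMs);
        tokens.add(tokensUsed);
    }

    /** Records a request rejected by the local rate limiter. */
    void recordRateLimited() {
        rateLimited.increment();
    }

    double successRate() {
        return total.sum() == 0 ? 0.0 : (double) succeeded.sum() / total.sum();
    }

    double averageLatencyMs() {
        return total.sum() == 0 ? 0.0 : (double) totalLatencyMs.sum() / total.sum();
    }

    long tokensConsumed() {
        return tokens.sum();
    }

    long rateLimitHits() {
        return rateLimited.sum();
    }
}
```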

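Solutions when requests are rate-limited (HTTP 429):

Check whether OpenAI's RPM limit has been exceeded
Implement retries with exponential backoff
Consider upgrading to an API plan with higher limits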
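
The second item, retrying with exponential backoff, can be sketched as a plain JDK loop. This is a minimal sketch under stated assumptions: `BackoffRetry`, `call`, and `RateLimitedException` are hypothetical names, and `RateLimitedException` merely stands in for however the HTTP client in use surfaces a 429.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

final class BackoffRetry {
    /** Hypothetical exception type standing in for an HTTP 429 response. */
    static final class RateLimitedException extends RuntimeException {
        RateLimitedException(String message) {
            super(message);
        }
    }

    /**
     * Calls task up to maxAttempts times. A failure is retried only when
     * retryable accepts it; the wait doubles each time, up to maxDelayMs.
     */
    static <T> T call(Supplier<T> task, Predicate<RuntimeException> retryable,
                      int maxAttempts, long initialDelayMs, long maxDelayMs) {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e; // out of attempts, or not worth retrying
                }
                sleep(delay);
                delay = Math.min(delay * 2, maxDelayMs);
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted during backoff", ie);
        }
    }
}
```

A call might look like `BackoffRetry.call(() -> client.ask(prompt), e -> e instanceof BackoffRetry.RateLimitedException, 3, 1000, 5000)`, where `client.ask` is again a placeholder. Adding random jitter to each wait is a common refinement when many clients retry at once.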

Possible causes:

The API endpoint is unavailable
The client timeout is set too short

Handling:

@Retryable(value = {ServiceUnavailableException.class},
           maxAttempts = 2,
           backoff = @Backoff(delay = 1000))
public String reliableCall(String prompt) {/*...*/}

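The current approach is still based on a synchronous polling mechanism. When a large number of concurrent requests must be handled, consider: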