SpringAI + DeepSeek in Practice: Building an Efficient ChatGPT-Style Application Development Framework


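In recent years, application development on large models has faced three main challenges:

Response latency: the traditional API call pattern requires a network round trip, which makes response times unpredictable
Compute cost: per-token billing by cloud services keeps long-term operating costs high
Concurrency bottleneck: a synchronous, blocking architecture struggles to absorb traffic spikes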

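Dependency management
SpringAI simplifies model service initialization through auto-configuration
With traditional Servlets, the lifecycle of each model instance has to be maintained by hand
Concurrency handling
WebFlux's non-blocking model is better suited to long-running large-model requests
The Servlet thread-pool model tends to exhaust its resources under high concurrency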
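
The difference between the two concurrency models can be sketched with the JDK alone. The example below is a minimal illustration, not WebFlux itself: `slowModelCall` is a hypothetical stand-in for a long-running model request, and it completes after a delay without parking one thread per request while it waits.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class NonBlockingSketch {

    // Hypothetical stand-in for a slow model call. The delay is handled by a
    // scheduler, so no worker thread sits blocked while the "model" is busy.
    static CompletableFuture<String> slowModelCall(int requestId, long delayMs) {
        Executor delayed = CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS);
        return CompletableFuture.supplyAsync(() -> "reply-" + requestId, delayed);
    }

    // Starts all requests up front, then collects the replies in request order.
    // join() blocks only this demo caller; a reactive endpoint would return
    // the pending result to the framework instead of waiting for it.
    static List<String> handleAll(int requestCount, long delayMs) {
        List<CompletableFuture<String>> pending = IntStream.range(0, requestCount)
            .mapToObj(id -> slowModelCall(id, delayMs))
            .collect(Collectors.toList());
        return pending.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }
}
```

Because every request is in flight at once, the total wait is close to a single delay rather than the sum of all delays, which is the property that makes the non-blocking style attractive for slow model calls.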

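Native Chinese support: trained against the GB18030 standard, with Chinese tokenization 40% more efficient than GPT-3
Quantized deployment: with 8-bit quantization, only 16 GB of GPU memory is needed to run the model
Long-text processing: supports context lengths of up to 32k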
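
The memory figure for quantized deployment follows from simple arithmetic: weight memory is roughly the parameter count multiplied by the bytes stored per weight. The sketch below shows that calculation. The article does not state the model's parameter count, so any count passed in is a hypothetical input, and the estimate covers weights only (no activations or KV cache).

```java
final class VramEstimate {

    /**
     * Approximate memory taken by model weights, in GiB.
     * Covers weights only; runtime buffers and the KV cache come on top.
     */
    static double weightGiB(long parameterCount, int bitsPerWeight) {
        double bytes = parameterCount * (bitsPerWeight / 8.0);
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }
}
```

For a hypothetical 16-billion-parameter model, `weightGiB(16_000_000_000L, 8)` comes to about 14.9 GiB, and the same model at 16 bits per weight needs twice that.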

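import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
@EnableModelServers(basePackages = "com.example.ai")
public class AiApplication {
    public static void main(String[] args) {
        SpringApplication.run(AiApplication.class, args);
    }
}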

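import org.springframework.beans.factory.annotation.Autowired;

@ModelServer
public class DeepSeekService {
    private final DeepSeekModel model;

    // Load the model once at construction time and reuse it across requests
    @Autowired
    public DeepSeekService(ModelLoader loader) {
        this.model = loader.load("deepseek-v2");
    }

    // Synchronous completion: returns once the full result is available
    @ModelMethod
    public CompletionResult generate(CompletionRequest request) {
        return model.generate(request);
    }
}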

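import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxSink;

@RestController
@RequestMapping("/api/v1/chat")
public class ChatController {

    @Autowired
    private DeepSeekService modelService;

    // Streams generated tokens to the client as Server-Sent Events
    @PostMapping(produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamChat(@RequestBody ChatRequest request) {
        return Flux.create(sink -> {
            // generateAsync is assumed to be a streaming counterpart of generate() on DeepSeekService
            modelService.generateAsync(request)
                .onNext(token -> {
                    // Stop forwarding tokens once the client has disconnected
                    if (!sink.isCancelled()) {
                        sink.next(token);
                    }
                })
                .onComplete(() -> {
                    if (!sink.isCancelled()) {
                        sink.complete();
                    }
                });
        }, FluxSink.OverflowStrategy.BUFFER);
    }
}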
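
Because the endpoint produces `text/event-stream`, each token emitted by the `Flux<String>` reaches the client as one Server-Sent Event. WebFlux performs this framing itself. The sketch below only illustrates the wire format a client should expect, using a hypothetical helper named `SseFrames`.

```java
final class SseFrames {

    // Formats one payload as a Server-Sent Event: every line of the payload
    // gets its own "data: " prefix, and a blank line terminates the event.
    static String toFrame(String payload) {
        StringBuilder frame = new StringBuilder();
        for (String line : payload.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }
}
```

A token containing a line break is therefore split across several `data:` lines within the same event, and the client joins them back together with newlines.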

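import java.util.Map;

public class PromptTemplateFactory {

    // Scenario name -> prompt template; placeholders use the {name} form
    private static final Map<String, String> TEMPLATES = Map.of(
        "customer_service", "You are a professional customer service representative. Answer the user's question in a friendly but professional manner.\nQuestion: {question}",
        "technical_support", "As a technical support engineer, resolve the problem using concise technical language.\nError description: {error}"
    );

    public String buildPrompt(String scenario, Map<String, String> params) {
        String template = TEMPLATES.get(scenario);
        if (template == null) {
            throw new IllegalArgumentException("Unknown scenario: " + scenario);
        }

        // Substitute each {key} placeholder with its value
        String prompt = template;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            prompt = prompt.replace("{" + entry.getKey() + "}", entry.getValue());
        }

        return prompt;
    }
}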
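
One property of the replace-based substitution above is that a placeholder with no matching parameter is silently left in the prompt. A small check can catch that before the prompt is sent. The sketch below is one possible way to do it: `PromptTemplateCheck` and its methods are hypothetical names, and it assumes placeholders consist of word characters inside braces.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PromptTemplateCheck {

    // Matches {name} placeholders made of letters, digits, or underscores
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Same substitution strategy as buildPrompt: plain string replacement per key
    static String fill(String template, Map<String, String> params) {
        String prompt = template;
        for (Map.Entry<String, String> entry : params.entrySet()) {
            prompt = prompt.replace("{" + entry.getKey() + "}", entry.getValue());
        }
        return prompt;
    }

    // Returns the placeholder names in the template that params does not cover,
    // in order of first appearance
    static Set<String> missingParams(String template, Map<String, String> params) {
        Set<String> missing = new LinkedHashSet<>();
        Matcher matcher = PLACEHOLDER.matcher(template);
        while (matcher.find()) {
            String name = matcher.group(1);
            if (!params.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Calling `missingParams` before `fill` lets the caller reject a request with incomplete parameters instead of sending a prompt that still contains a literal `{error}`.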

Request type	Average latency (ms)	P99 latency (ms)
Local API call	120	250
Cloud HTTP call	450	1200
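
For readers who want to reproduce this kind of table from their own measurements, the two columns can be computed as sketched below. The article does not say how its numbers were obtained, so this is only one common approach: an arithmetic mean and a nearest-rank percentile. `LatencyStats` is a hypothetical helper name.

```java
import java.util.Arrays;

final class LatencyStats {

    // Arithmetic mean of the samples, or 0.0 when there are none
    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(0.0);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

With only a handful of samples the P99 is simply the slowest request, so a meaningful P99 needs at least a few hundred measurements.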

hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 5000
      circuitBreaker:
        requestVolumeThreshold: 20
        errorThresholdPercentage: 50
        sleepWindowInMilliseconds: 10000
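
The three `circuitBreaker` settings work together: the circuit opens once at least `requestVolumeThreshold` requests have been seen and the error share reaches `errorThresholdPercentage`, and after `sleepWindowInMilliseconds` a trial request is let through to probe for recovery. The sketch below models that logic in a deliberately simplified form. It is not Hystrix's implementation: `SimpleCircuitBreaker` is a hypothetical class, it counts since the last reset instead of using a rolling window, and it does not model the execution timeout.

```java
import java.util.function.LongSupplier;

final class SimpleCircuitBreaker {
    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMs;
    private final LongSupplier clock; // injected so the sleep window is testable

    private int total;
    private int failures;
    private boolean open;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                         long sleepWindowMs, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    // A call may proceed when the circuit is closed, or as a trial call
    // once the sleep window has elapsed
    synchronized boolean allowRequest() {
        return !open || sleepWindowElapsed();
    }

    synchronized void recordSuccess() {
        if (open) {
            if (sleepWindowElapsed()) {
                // Trial call succeeded: close the circuit and start counting afresh
                open = false;
                total = 0;
                failures = 0;
            }
            return;
        }
        total++;
    }

    synchronized void recordFailure() {
        if (open) {
            if (sleepWindowElapsed()) {
                // Trial call failed: stay open and restart the sleep window
                openedAt = clock.getAsLong();
            }
            return;
        }
        total++;
        failures++;
        boolean enoughVolume = total >= requestVolumeThreshold;
        boolean tooManyErrors = failures * 100 >= errorThresholdPercentage * total;
        if (enoughVolume && tooManyErrors) {
            open = true;
            openedAt = clock.getAsLong();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }

    private boolean sleepWindowElapsed() {
        return clock.getAsLong() - openedAt >= sleepWindowMs;
    }
}
```

With the values from the configuration (20, 50, 10000), ten failures among the first twenty requests open the circuit, and calls are rejected for the next ten seconds.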

Hold model instances through a WeakReference
Explicitly trigger GC before loading a new model
Add the JVM options: -XX:+UseG1GC -XX:MaxGCPauseMillis=200
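
The first recommendation can be sketched as a small holder that keeps only a weak reference to the model and reloads it on demand. This is a minimal illustration under stated assumptions: `WeakModelHolder` is a hypothetical class, and the `Supplier` stands in for whatever actually loads the model.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

final class WeakModelHolder<M> {
    private final Supplier<M> loader;
    // Starts out empty; get() on this reference returns null until a model is loaded
    private WeakReference<M> ref = new WeakReference<>(null);

    WeakModelHolder(Supplier<M> loader) {
        this.loader = loader;
    }

    // Returns the cached model if it is still reachable, otherwise reloads it.
    // The caller's local variable is a strong reference, so the model cannot
    // be collected while a request is using it.
    synchronized M get() {
        M model = ref.get();
        if (model == null) {
            model = loader.get();
            ref = new WeakReference<>(model);
        }
        return model;
    }
}
```

A weakly referenced model becomes eligible for collection as soon as no request holds it, so this trades memory pressure for possible reloads. Where reloads are expensive, `java.lang.ref.SoftReference` is a commonly used alternative that the JVM clears only when memory runs low.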

Preprocess text with jieba word segmentation

Compute the token count dynamically:

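// Assumes the jieba-analysis library (package com.huaban.analysis.jieba)
public int calculateTokens(String text) {
    List<SegToken> segments = new JiebaSegmenter().process(text, JiebaSegmenter.SegMode.INDEX);
    return segments.stream()
        // Heuristic: about 0.8 tokens per character, rounded up per segmented word
        .mapToInt(segment -> (int) Math.ceil(segment.word.length() * 0.8))
        .sum();
}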
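
A token estimate like this is typically used to keep a conversation within the model's context window. The sketch below shows one way to do that with the JDK alone. It is a simplification: it applies the 0.8 factor per message at character level instead of per jieba segment, and `ContextBudget` with its methods is a hypothetical helper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class ContextBudget {

    // Character-level variant of the 0.8-tokens-per-character heuristic
    static int estimateTokens(String text) {
        return (int) Math.ceil(text.length() * 0.8);
    }

    // Walks the history from newest to oldest and keeps messages until the
    // next one would exceed maxTokens; the result preserves the original order
    static List<String> trimToBudget(List<String> messages, int maxTokens) {
        Deque<String> kept = new ArrayDeque<>();
        int used = 0;
        for (int i = messages.size() - 1; i >= 0; i--) {
            int cost = estimateTokens(messages.get(i));
            if (used + cost > maxTokens) {
                break;
            }
            kept.addFirst(messages.get(i));
            used += cost;
        }
        return new ArrayList<>(kept);
    }
}
```

Dropping the oldest messages first keeps the most recent turns intact, which usually matter most for the next reply. The budget passed in should leave headroom below the context limit for the model's own output.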