Claude Coding in Practice: How to Work Around Context Window Limits in LLM Applications


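When building applications on top of Claude or other large language models, the context window limit is a common and thorny problem. Suppose you are building a legal contract analysis system, where contracts often run to dozens or even hundreds of pages. When text this long is sent to an LLM, whatever exceeds the context window is typically truncated (or the request is rejected outright, depending on the API), so key clauses can be lost and the accuracy of the analysis suffers badly. The same problem shows up in long-document QA, technical manual parsing, and similar applications.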

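Traditional solutions usually rely on a sliding window: the long text is split into fixed-size chunks, which are then processed separately. The approach is simple, but it has clear drawbacks:

Semantic incoherence: hard splits can cut through the meaning of a sentence or paragraph
Redundant information: the overlap between adjacent windows is processed more than once
Low efficiency: the model must be called multiple times, which raises compute cost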
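
To make the redundancy concrete, here is a minimal sketch of the sliding-window baseline described above. The function name `sliding_window_chunks` and the defaults `window_size=200` and `overlap=50` are illustrative assumptions, not values from the original system. It splits by characters, so a window boundary can fall in the middle of a sentence.

```python
def sliding_window_chunks(text, window_size=200, overlap=50):
    """Split text into fixed-size character windows with a given overlap."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    step = window_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + window_size])
        if start + window_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "The tenant shall pay rent monthly. " * 20
    chunks = sliding_window_chunks(sample, window_size=200, overlap=50)
    processed = sum(len(c) for c in chunks)
    # Total characters processed exceed the original length because of the overlap
    print(len(sample), processed, len(chunks))
```

Every overlapping region is sent to the model twice, and each window is a separate model call, which is where the redundancy and the extra cost come from.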

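We propose a scheme based on semantic chunking and dynamic reassembly. The core ideas are:

Use a Sentence-BERT model to split the text into semantic chunks
Build a vector index for fast retrieval
Dynamically reassemble the relevant context fragments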

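First, the long document has to be split into semantically coherent passages. Splitting by character count or paragraph count is not smart enough, so we instead use Sentence-BERT to compute the semantic similarity between adjacent sentences: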

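from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('all-MiniLM-L6-v2')

def semantic_chunking(text, threshold=0.85, min_chunk_size=3):
    # Naive sentence split on '.'; drop empty fragments and surrounding whitespace
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    if not sentences:
        return []
    # Unit-normalize the embeddings so the dot product below equals cosine similarity
    embeddings = model.encode(sentences, normalize_embeddings=True)

    chunks = []
    current_chunk = [sentences[0]]  # start with the first sentence so it is not dropped

    for i in range(1, len(sentences)):
        similarity = np.dot(embeddings[i-1], embeddings[i])
        # Keep extending while the chunk is below the minimum size
        # or the next sentence is similar enough to the previous one
        if similarity >= threshold or len(current_chunk) < min_chunk_size:
            current_chunk.append(sentences[i])
        else:
            chunks.append('. '.join(current_chunk) + '.')
            current_chunk = [sentences[i]]

    # Flush the last chunk
    chunks.append('. '.join(current_chunk) + '.')

    return chunks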

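Time complexity: the chunking loop is O(n), where n is the number of sentences, because only the n - 1 pairs of adjacent sentences are compared (an O(n^2) cost would arise only if every sentence were compared with every other). In practice, a maximum window size can be set to keep individual chunks from growing too large.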
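
One way to apply a maximum window size is a post-processing pass over the chunks. The sketch below is an assumption about how that could look, not the author's implementation: `cap_chunk_size` and `max_sentences` are hypothetical names, and it assumes chunks are kept as lists of sentences before being joined into strings.

```python
def cap_chunk_size(sentence_groups, max_sentences=8):
    """Split any group of sentences longer than max_sentences into smaller groups.

    sentence_groups is a list of chunks, each chunk being a list of sentences.
    This helper is hypothetical and only illustrates the size cap.
    """
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")
    capped = []
    for group in sentence_groups:
        # Walk the group in strides of max_sentences; short groups pass through unchanged
        for start in range(0, len(group), max_sentences):
            capped.append(group[start:start + max_sentences])
    return capped


if __name__ == "__main__":
    groups = [["s1", "s2", "s3", "s4", "s5"], ["s6"]]
    print(cap_chunk_size(groups, max_sentences=2))
```

The pass is linear in the total number of sentences, so it does not change the overall cost of chunking.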

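When handling a user query, we do not simply send every chunk. We first retrieve the most relevant chunks and then dynamically reassemble the context from them. In the code below, relevance is measured by cosine similarity between the query embedding and the chunk embeddings: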

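from sklearn.metrics.pairwise import cosine_similarity

class ContextManager:
    # Relies on the SentenceTransformer `model` loaded in the chunking step above
    def __init__(self, chunks):
        self.chunks = chunks
        self.embeddings = model.encode(chunks)  # one embedding per chunk, computed once

    def get_relevant_context(self, query, top_k=3):
        query_embedding = model.encode([query])
        similarities = cosine_similarity(query_embedding, self.embeddings)[0]
        # Indices of the top_k most similar chunks, best match first
        top_indices = similarities.argsort()[-top_k:][::-1]

        # Reassemble in original document order to preserve coherence
        sorted_indices = sorted(top_indices)
        context = ' '.join([self.chunks[i] for i in sorted_indices])

        # Cap at 4000 characters (not tokens) as a rough way to leave room for the prompt
        return context[:4000]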
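
The `context[:4000]` slice counts characters and can cut the last chunk off mid-sentence. As an alternative, the sketch below selects whole chunks under a token budget. It is only an illustration: `estimate_tokens`, `select_within_budget`, and the 4-characters-per-token ratio are assumptions, and a real tokenizer would give different counts. `ranked_indices` plays the role of `top_indices` in `get_relevant_context`.

```python
def estimate_tokens(text, chars_per_token=4):
    """Very rough token estimate; chars_per_token=4 is an assumed heuristic, not a tokenizer."""
    return max(1, len(text) // chars_per_token)


def select_within_budget(chunks, ranked_indices, max_tokens=1000):
    """Pick whole chunks in relevance order until the budget is spent,
    then return them joined in original document order."""
    selected = []
    used = 0
    for idx in ranked_indices:
        cost = estimate_tokens(chunks[idx])
        if used + cost > max_tokens:
            continue  # skip chunks that do not fit; a smaller one may still fit
        selected.append(idx)
        used += cost
    # Sorting the indices restores the original document order
    return " ".join(chunks[i] for i in sorted(selected))


if __name__ == "__main__":
    chunks = ["Clause one text.", "A very long clause " * 50, "Clause three text."]
    # Indices ordered from most to least relevant (hypothetical ranking)
    print(select_within_budget(chunks, ranked_indices=[2, 1, 0], max_tokens=20))
```

Skipping an oversized chunk instead of stopping lets smaller, lower-ranked chunks still use the remaining budget. Whether that trade-off is desirable depends on the application.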

We tested the approach systematically, using legal contracts of varying lengths as test data: