Agent.md Skill Development in Practice: How to Automate Testing of Skill Modules Efficiently

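In Agent.md skill development, manual testing has long been the efficiency bottleneck. The problem is most acute when several skills are combined:

Manual testing is slow: after every code change, all related skills have to be triggered by hand and their output verified; regression-testing a single skill takes 15 minutes on average
Composite tests are complex: when skill A depends on the output of skill B, the tester has to chain the skill calls together by hand
Environment drift: tests that pass locally can fail in CI because of network latency or version differences in dependent services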

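How well the mainstream BDD test frameworks fit this use case:

Robot Framework: keyword-driven, which suits QA teams, but not flexible enough when tests for complex skills have to be extended
Cucumber: mature JVM ecosystem, but Python skills need an extra bridging layer
behave:
Native support for skills written in Python
Gherkin steps are implemented in the same language stack as the skills
Skill business logic can be reused directly in the tests

The final choice was behave combined with Allure reports. Sample test code: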

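# features/environment.py
from behave import fixture, use_fixture
from skill_service import SkillRunner

@fixture
def skill_runner(context):
    context.runner = SkillRunner(mock_mode=True)
    yield context.runner
    context.runner.cleanup()  # runs when the fixture is torn down

def before_all(context):
    # A fixture only takes effect once it is registered with use_fixture
    use_fixture(skill_runner, context)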

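# features/weather_skill.feature
Feature: Weather Skill API
  Scenario: Query valid city weather
    Given the current user is located in "Beijing"
    When the weather information is queried
    Then the response should be JSON containing the "temperature" field
    And the response status code should be 200

# features/steps/weather_steps.py
from behave import given, when, then
import json

# The step text must match the feature file exactly, including the space before the quoted city
@given('the current user is located in "{city}"')
def set_user_location(context, city):
    context.location = {"city": city}  # shared test context

@when('the weather information is queried')
def query_weather(context):
    context.response = None
    context.error = None
    try:
        context.response = context.runner.execute(
            "weather",
            params={"location": context.location}
        )
    except Exception as e:
        context.error = e  # keep the exception so later Then steps can inspect it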
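
The scenario's two Then checks (a JSON field is present, the status code is 200) have no step definitions in the listing above. A minimal sketch of the assertions such steps could delegate to follows. The response shape used here, a dict with `status_code` and a JSON string in `body`, is an assumption made for illustration, as are the helper names; the article does not show what `SkillRunner.execute` actually returns.

```python
import json


def assert_json_has_field(response, field):
    """Parse the response body and check that a top-level field exists."""
    body = response["body"]  # assumed key, not confirmed by the article
    payload = json.loads(body) if isinstance(body, str) else body
    assert field in payload, f"field {field!r} missing, got {sorted(payload)}"
    return payload


def assert_status(response, expected):
    """Check the status code carried by the response."""
    actual = response["status_code"]  # assumed key, not confirmed by the article
    assert actual == expected, f"expected status {expected}, got {actual}"


# Hypothetical response used only to exercise the helpers
sample = {
    "status_code": 200,
    "body": json.dumps({"temperature": 21, "city": "Beijing"}),
}
payload = assert_json_has_field(sample, "temperature")
assert_status(sample, 200)
```

A `@then` step would call these helpers with `context.response`, after first checking that `context.error` was not set by the When step.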

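# features/steps/api_steps.py
from behave import then

@then('the skill should handle identical requests idempotently')
def verify_idempotency(context):
    # Send the same payment request twice; an idempotent skill returns the same transaction
    first_response = context.runner.execute("payment", params=context.params)
    second_response = context.runner.execute("payment", params=context.params)

    assert first_response["transaction_id"] == second_response["transaction_id"], \
        "Repeated requests produced different transaction IDs"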

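To make clear what the idempotency step expects from the skill under test, here is a small sketch of one way a payment skill could behave idempotently: it fingerprints the request and returns the stored result for a repeat. `FakePaymentSkill` is a hypothetical stand-in written for this example; it is not the article's `SkillRunner`, and real skills may use an explicit idempotency key instead.

```python
import hashlib
import json
import uuid


class FakePaymentSkill:
    """Hypothetical payment skill that deduplicates identical requests."""

    def __init__(self):
        self._seen = {}  # request fingerprint -> stored response

    def execute(self, params):
        # Canonical JSON so that key order does not change the fingerprint
        fingerprint = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if fingerprint not in self._seen:
            self._seen[fingerprint] = {
                "transaction_id": str(uuid.uuid4()),
                "status": "ok",
            }
        return self._seen[fingerprint]


skill = FakePaymentSkill()
first = skill.execute({"order_id": "A-1", "amount": 100})
second = skill.execute({"amount": 100, "order_id": "A-1"})  # same content, other key order
other = skill.execute({"order_id": "A-2", "amount": 100})   # a genuinely different request
```

Here `first` and `second` carry the same `transaction_id`, while `other` gets a new one, which is the property the Then step asserts.
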
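# behave.ini
[behave]
format = allure_behave.formatter:AllureFormatter
# With a stage set, behave looks for development_steps/ and
# development_environment.py instead of steps/ and environment.py
stage = development

[behave.userdata]
# Custom value, available to steps and hooks as context.config.userdata["workers"].
# The comment sits on its own line so that it cannot be read as part of the value.
workers = 4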
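
Since `behave.ini` is an INI file, it helps to see how a `[behave.userdata]` value is read and why a trailing comment on the value line is risky. The sketch below uses only Python's standard `configparser`; whether behave's own loader treats inline comments the same way is not stated in the article, so take this as an illustration of the INI pitfall rather than of behave internals.

```python
import configparser

INI_TEXT = """
[behave.userdata]
workers = 4  # parallel worker processes
"""

# Default parser: inline comments are NOT stripped, the value keeps the comment text
default_parser = configparser.ConfigParser()
default_parser.read_string(INI_TEXT)
raw_value = default_parser["behave.userdata"]["workers"]

# Parser that is explicitly told to treat '#' as an inline comment marker
lenient_parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
lenient_parser.read_string(INI_TEXT)
workers = lenient_parser.getint("behave.userdata", "workers")

try:
    int(raw_value)
    raw_is_int = True
except ValueError:
    raw_is_int = False  # the comment text makes the raw value unparsable
```

Keeping comments on their own line avoids the problem regardless of how the file is parsed.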

# features/performance/steps.py
import time

from behave import when

@when('the skill API is called for the first time')
def measure_cold_start(context):
    start_time = time.perf_counter()  # monotonic, high-resolution timer
    context.runner.execute("cold_skill")
    context.cold_start_time = time.perf_counter() - start_time  # elapsed seconds
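
A cold-start figure is easier to interpret next to a warm call. The sketch below times both with `time.perf_counter()` against a hypothetical `FakeRunner` whose first call sleeps to simulate start-up cost; the class, the 50 ms delay, and the `timed_call` helper are assumptions for illustration only.

```python
import time


class FakeRunner:
    """Hypothetical runner: only the first call pays a simulated start-up cost."""

    def __init__(self, startup_seconds=0.05):
        self._startup_seconds = startup_seconds
        self._warm = False

    def execute(self, skill_name):
        if not self._warm:
            time.sleep(self._startup_seconds)  # simulated cold start
            self._warm = True
        return {"skill": skill_name}


def timed_call(runner, skill_name):
    """Return the wall-clock duration of one execute() call in seconds."""
    start = time.perf_counter()
    runner.execute(skill_name)
    return time.perf_counter() - start


runner = FakeRunner()
cold_time = timed_call(runner, "cold_skill")  # includes the simulated start-up
warm_time = timed_call(runner, "cold_skill")  # skill is already warm
```

A Then step could then assert on `cold_time` against a budget, or on the ratio between the cold and warm durations.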

Reset skill state before every Scenario
Use dedicated test accounts
Tag database operations with a test marker

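# features/environment.py
def before_scenario(context, scenario):
    context.runner.reset()  # start every scenario from a clean skill state
    # Per-scenario ID, usable as the test marker on database operations
    context.test_id = f"test_{scenario.name.lower()}"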
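
Scenario names usually contain spaces and punctuation, which carry over into an ID built from `scenario.name.lower()`. If the ID is used as a database marker or file name, a normalized form may be safer. The helper below is a sketch; `make_test_id` is a hypothetical name, not part of behave or of the article's code.

```python
import re


def make_test_id(scenario_name, prefix="test_"):
    """Lowercase the name and collapse every run of non-word characters to '_'."""
    slug = re.sub(r"\W+", "_", scenario_name.lower()).strip("_")
    return prefix + slug


weather_id = make_test_id("Query valid city weather")
payment_id = make_test_id("Pay: retry (x2)!")
```

In `before_scenario`, the assignment would become `context.test_id = make_test_id(scenario.name)`.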

# features/steps/mock_steps.py
from unittest.mock import patch

from behave import when

@when('the payment gateway is mocked to time out')
def mock_payment_timeout(context):
    # The patch is active only inside the with block
    with patch('payment_gateway.charge',
               side_effect=TimeoutError("Mocked timeout")):
        context.response = context.runner.execute("payment")
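
The behavior of `patch` with `side_effect` can be checked in isolation. In this sketch, `gateway` and `pay` are hypothetical stand-ins for the article's `payment_gateway` module and payment skill; `patch.object` is used instead of a dotted path so that the example runs without any extra module.

```python
import types
from unittest.mock import patch

# Hypothetical gateway object with a charge() function that normally succeeds
gateway = types.SimpleNamespace(charge=lambda amount: {"status": "ok"})


def pay(amount):
    """Hypothetical skill logic: translate a gateway timeout into a result."""
    try:
        return gateway.charge(amount)
    except TimeoutError as exc:
        return {"status": "timeout", "detail": str(exc)}


# Inside the with block every call to gateway.charge raises TimeoutError
with patch.object(gateway, "charge",
                  side_effect=TimeoutError("Mocked timeout")) as mocked:
    timed_out = pay(10)
    call_count = mocked.call_count

# After the block the original function is restored
restored = pay(10)
```

Because the original is restored on exit, any assertion that depends on the mocked behavior has to use results captured inside the `with` block, as the step above does with `context.response`.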

Directions for further optimization: