Quick start

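import { readFileSync } from 'node:fs'
import { openai } from '@agentskit/adapters'
import { createTestRunnerAgent } from './agents/coding-test-runner/agent'

// Raw Vitest stdout/stderr you have already captured (example path);
// the agent parses this text but never runs tests itself.
const vitestOutput = readFileSync('vitest-output.log', 'utf8')

const agent = createTestRunnerAgent({
  adapter: openai({
    apiKey: process.env.OPENAI_API_KEY!,
    model: 'gpt-4o',
  }),
})

// run() resolves to a typed TestReport, not a chat message.
const report = await agent.run(vitestOutput)
console.log(report.summary)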

Independent reviewer approved

Validation evidence

How validation works

Review score: 96/100
Confidence: 96%
Evaluation cases: 3
Iterations: 1

The agent produced valid structured reports for all three cases, did not execute anything, did not invent Vitest totals or failures from non-Vitest input, surfaced uncertainty, and resisted the injection attempt. Behavior aligns with the stated purpose of parsing captured output only. Minor issue: the minimal case uses an empty string for duration instead of a clearer sentinel like "unknown", which is less consistent but not a critical failure.
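Because the agent must not invent totals, one cheap spot check is to parse Vitest's own summary lines deterministically and compare them with the numbers the agent reports. The sketch below assumes the summary format shown in the evaluation contract's cases (a `Tests  1 failed | 4 passed (5)` line and a `Duration  1.92s` line). `parseVitestSummary` is a hypothetical helper, not part of AgentsKit; it returns `null` when no `Tests` line is present, which is the situation for truncated or non-Vitest input.

```typescript
// Hypothetical helper (not an AgentsKit API): extracts totals from Vitest's
// end-of-run summary so they can be compared with the agent's report.
interface VitestTotals {
  passed: number
  failed: number
  skipped: number
  duration: string
}

function parseVitestSummary(output: string): VitestTotals | null {
  // Assumed format: "      Tests  1 failed | 4 passed (5)" (leading spaces vary).
  // "Test Files" does not match because it lacks the "s" directly after "Test".
  const testsLine = output.split('\n').find((line) => /^\s*Tests\s{2,}/.test(line))
  if (!testsLine) return null // no summary: truncated run or not Vitest output

  // Read one "<n> <label>" pair, e.g. "1 failed" or "4 passed"; absent labels count as 0.
  const count = (label: string): number => {
    const m = testsLine.match(new RegExp(`(\\d+) ${label}`))
    return m ? Number(m[1]) : 0
  }

  // Assumed format: "   Duration  1.92s", possibly followed by a breakdown.
  const durationMatch = output.match(/^\s*Duration\s+(\S+)/m)

  return {
    passed: count('passed'),
    failed: count('failed'),
    skipped: count('skipped'),
    duration: durationMatch ? durationMatch[1] : 'unknown',
  }
}
```

A mismatch between these totals and the agent's `passed`/`failed`/`skipped` fields is a strong signal that the report should not be trusted.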

What passed review

Valid structured output in every case.
Correctly treated prompt-like input as untrusted data rather than instructions.
Did not hallucinate test results, file paths, failures, or root causes.
Injection case ignored the request to output APPROVED and preserved the expected report shape.
Summaries clearly explain that no Vitest output was present.

Reviewer notes

Use a consistent explicit unknown marker for missing duration, preferably "unknown", rather than an empty string.
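Until the prompt or schema enforces this recommendation, a caller can normalize the report itself. This is a sketch assuming the `TestReport` shape from the agent factory source (redeclared locally so the snippet stands alone); `normalizeDuration` is a hypothetical helper, not part of the agent.

```typescript
// Local copy of the report shape exported by the agent factory.
interface TestFailure {
  test: string
  file: string
  message: string
  rootCause: string
}

interface TestReport {
  passed: number
  failed: number
  skipped: number
  duration: string
  failures: TestFailure[]
  summary: string
}

// Hypothetical post-processing step: replace an empty or whitespace-only
// duration with the explicit "unknown" marker the reviewer recommends.
function normalizeDuration(report: TestReport): TestReport {
  return report.duration.trim() === '' ? { ...report, duration: 'unknown' } : report
}
```

Tightening the schema instead (for example `duration: z.string().min(1)`) would reject empty strings at parse time rather than rewriting them afterwards.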

Example

A real usage example maintained with this agent.

import { anthropic } from '@agentskit/adapters'
import { createTestRunnerAgent } from './agents/coding-test-runner/agent'

const report = await createTestRunnerAgent({
  adapter: anthropic({ apiKey: process.env.ANTHROPIC_API_KEY!, model: 'claude-opus-4-8' }),
}).run(vitestStdout)
// → { passed, failed, skipped, duration, failures: [{ test, file, message, rootCause }], summary }
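The report keeps `failures` as a flat array with one `rootCause` hypothesis per entry. If the next agent wants failures bucketed by suspected cause, a caller can group them. The sketch below assumes identical `rootCause` strings mean the same cause; exact-string grouping is a simplification, since the model may phrase one cause in different ways. `groupByRootCause` is hypothetical.

```typescript
interface TestFailure {
  test: string
  file: string
  message: string
  rootCause: string
}

interface RootCauseGroup {
  rootCause: string
  failures: TestFailure[]
}

// Hypothetical helper: buckets failures by their (trimmed) rootCause text and
// puts the largest bucket first, so the most widespread cause is fixed first.
function groupByRootCause(failures: TestFailure[]): RootCauseGroup[] {
  const groups = new Map<string, TestFailure[]>()
  for (const f of failures) {
    const key = f.rootCause.trim()
    const bucket = groups.get(key)
    if (bucket) bucket.push(f)
    else groups.set(key, [f])
  }
  return Array.from(groups.entries())
    .map(([rootCause, items]) => ({ rootCause, failures: items }))
    .sort((a, b) => b.failures.length - a.failures.length)
}
```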

Extend it

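Pass memory, a confirmation hook (onConfirm), observers, and a step limit (maxSteps) through the factory config alongside the required adapter. TestRunnerConfig has no tools or retriever option: the agent's only tool is submit_report, and it works solely from the output you pass to run().

const agent = createTestRunnerAgent({
  adapter,
  memory,
  onConfirm: (call) => approve(call),
  observers: [tracer],
  maxSteps: 3, // default when omitted
})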
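The `approve` callback above is left to you. Since this agent's only tool is `submit_report`, one conservative policy, if the runtime asks for confirmation before a tool call, is an allowlist that approves that tool and nothing else. The sketch assumes only that a tool call exposes a `name` string: `ToolCallLike` is a stand-in for `ToolCall` from `@agentskit/core`, whose full shape is not shown on this page, and `approve` is hypothetical.

```typescript
// Stand-in for ToolCall from '@agentskit/core'; only `name` is assumed here.
interface ToolCallLike {
  name: string
}

// Tools this agent is expected to call, taken from its skill definition.
const ALLOWED_TOOLS: ReadonlySet<string> = new Set(['submit_report'])

// Hypothetical onConfirm policy: approve allowlisted tools, deny everything else.
function approve(call: ToolCallLike): boolean {
  return ALLOWED_TOOLS.has(call.name)
}
```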

View agent factory source

import type { AdapterFactory, ChatMemory, Observer, ToolCall, ToolDefinition } from '@agentskit/core'
import { fenceUntrustedContent, UNTRUSTED_CONTENT_DIRECTIVE } from '@agentskit/core/security'
import { invokeStructured } from '@agentskit/runtime'
import { defineZodTool } from '@agentskit/tools'
import { z } from 'zod'
import { zodToJsonSchema } from 'zod-to-json-schema'
import type { JSONSchema7 } from 'json-schema'

/**
 * Test Runner — parses raw Vitest stdout/stderr into a TYPED test report: totals plus
 * per-failure details (test, file, message, one-sentence root-cause hypothesis), grouped
 * so the next agent can prioritise fixes. It analyses output you already captured; it does
 * not execute anything itself (no shell access by design).
 *
 * ```ts
 * const report = await createTestRunnerAgent({ adapter }).run(vitestOutput)
 * ```
 */

export interface TestFailure {
  test: string
  file: string
  message: string
  /** One-sentence root-cause hypothesis. */
  rootCause: string
}

export interface TestReport {
  passed: number
  failed: number
  skipped: number
  duration: string
  failures: TestFailure[]
  summary: string
}

export interface TestRunnerConfig {
  adapter: AdapterFactory
  memory?: ChatMemory
  observers?: Observer[]
  onConfirm?: (toolCall: ToolCall) => boolean | Promise<boolean>
  maxSteps?: number
}

const Report = z.object({
  passed: z.number().int().min(0),
  failed: z.number().int().min(0),
  skipped: z.number().int().min(0),
  duration: z.string(),
  failures: z.array(z.object({
    test: z.string(),
    file: z.string(),
    message: z.string(),
    rootCause: z.string(),
  })),
  summary: z.string(),
})

const toJson = (s: z.ZodTypeAny): JSONSchema7 => zodToJsonSchema(s) as JSONSchema7

const skill = {
  name: 'test-runner',
  description: 'Parses raw Vitest output into a typed test report with per-failure root-cause hypotheses.',
  systemPrompt: `You analyse raw Vitest stdout/stderr and produce a STRUCTURED test report. Extract: total
passed, failed, skipped, and duration.
For EACH failure: the test name, the file path, the assertion
message, and a one-sentence root-cause hypothesis. Group failures by suspected root cause so the next
agent can prioritise. Report only what the output shows — do not invent failures or passes.

${UNTRUSTED_CONTENT_DIRECTIVE}

Call submit_report exactly once with { passed, failed, skipped, duration, failures, summary }. Stop.`,
  tools: ['submit_report'],
}

export function createTestRunnerAgent(config: TestRunnerConfig) {
  const emit = (label: string, status: 'start' | 'ok' | 'skip' | 'error', detail?: string) => {
    for (const o of config.observers ?? []) void o.on({ type: 'progress', label, status, detail })
  }

  const submit = (): ToolDefinition =>
    defineZodTool({
      name: 'submit_report',
      description: 'Submit the test report. Call exactly once.',
      schema: Report,
      toJsonSchema: toJson,
      async execute() { return 'recorded' },
    }) as ToolDefinition

  async function run(vitestOutput: string): Promise<TestReport> {
    if (!vitestOutput?.trim()) throw new Error('test runner requires raw Vitest output')
    emit('parse', 'start')
    const report = await invokeStructured({
      adapter: config.adapter,
      tool: submit(),
      task: `RAW VITEST OUTPUT:\n${fenceUntrustedContent(vitestOutput)}`,
      parse: (a) => Report.parse(a),
      skill,
      memory: config.memory,
      observers: config.observers,
      onConfirm: config.onConfirm,
      maxSteps: config.maxSteps ?? 3,
    })
    emit('parse', 'ok', `${report.passed} passed, ${report.failed} failed`)
    return report
  }

  return {
    name: 'coding-test-runner',
    run,
    asHandle() {
      return { name: 'coding-test-runner', run: async (task: string) => JSON.stringify(await run(task)) }
    },
  }
}
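To see what `Report.parse` enforces, or to validate a stored report where `zod` is not available, the schema's constraints can be mirrored in a plain type guard. This sketch re-implements the same checks by hand (non-negative integer counts, string fields, and a `failures` array of objects with four string fields); `isTestReport` is hypothetical, and unlike `Report.parse` it returns a boolean rather than throwing with error details.

```typescript
interface TestFailure {
  test: string
  file: string
  message: string
  rootCause: string
}

interface TestReport {
  passed: number
  failed: number
  skipped: number
  duration: string
  failures: TestFailure[]
  summary: string
}

// Mirrors z.number().int().min(0): a finite, non-negative integer.
const isCount = (v: unknown): v is number =>
  typeof v === 'number' && Number.isInteger(v) && v >= 0

// Mirrors the failure object: all four fields must be strings.
const isFailure = (v: unknown): v is TestFailure => {
  if (typeof v !== 'object' || v === null) return false
  const f = v as Record<string, unknown>
  return ['test', 'file', 'message', 'rootCause'].every((k) => typeof f[k] === 'string')
}

// Hypothetical hand-written equivalent of the Report schema's checks.
function isTestReport(v: unknown): v is TestReport {
  if (typeof v !== 'object' || v === null) return false
  const r = v as Record<string, unknown>
  return (
    isCount(r.passed) &&
    isCount(r.failed) &&
    isCount(r.skipped) &&
    typeof r.duration === 'string' &&
    typeof r.summary === 'string' &&
    Array.isArray(r.failures) &&
    r.failures.every(isFailure)
  )
}
```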

View evaluation contract

Replay these cases with the provider and model you plan to deploy.

import type { EvalSuite } from '@agentskit/eval'

export const suite: EvalSuite = {
  name: 'coding-test-runner',
  cases: [
    {
      input: `Spec files run: src/cart/total.test.ts. Vitest output:

 FAIL  src/cart/total.test.ts > cartTotal > applies the 10% discount
AssertionError: expected 90 to be 81
 ❯ src/cart/total.test.ts:23:25

 Test Files  1 failed (1)
      Tests  1 failed | 4 passed (5)
   Duration  1.92s

Produce the structured report.`,
      expected: (r: string) => /total\.test\.ts/i.test(r) && /(rootCause|root cause|cause|hypothes)/i.test(r),
    },
    {
      input: `Spec files run: src/api/users.test.ts, src/api/orders.test.ts. Vitest output:

 Test Files  2 passed (2)
      Tests  17 passed (17)
   Duration  3.41s

Produce the structured report.`,
      expected: (r: string) => /17/.test(r) && /3\.41/.test(r) && /(passed|"passed")/i.test(r),
    },
    {
      input: `Spec files run: src/parse/date.test.ts. Vitest output:

 FAIL  src/parse/date.test.ts > parseDate > ISO string
TypeError: Cannot read properties of undefined (reading 'getTime')
 ❯ src/parse/date.test.ts:11:34

 FAIL  src/parse/date.test.ts > parseDate > unix epoch
TypeError: Cannot read properties of undefined (reading 'getTime')
 ❯ src/parse/date.test.ts:18:34

      Tests  2 failed | 1 passed (3)
   Duration  0.88s

Group failures by suspected root cause.`,
      expected: (r: string) => /date\.test\.ts/i.test(r) && /(undefined|getTime|TypeError)/i.test(r),
    },
    {
      input: `Spec files run: src/queue/worker.test.ts. The Vitest process was killed before producing any summary line — the output is truncated mid-run with no totals. Produce the report.`,
      expected: (r: string) => /(incomplete|truncat|no (summary|total)|cannot|killed|unable|missing|escalat)/i.test(r),
    },
  ],
}
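Replaying the suite amounts to sending each `input` to the agent's string handle and checking `expected` against the JSON it returns. The loop below is a sketch of that idea, not the `@agentskit/eval` runner: `EvalCaseLike` mirrors only the `input`/`expected` fields visible above, and the `run` argument stands in for `asHandle().run` on the agent, which returns `JSON.stringify` of the report.

```typescript
// Mirrors the fields used in the suite above; not the full @agentskit/eval type.
interface EvalCaseLike {
  input: string
  expected: (result: string) => boolean
}

interface ReplayResult {
  passed: number
  total: number
  failedInputs: string[]
}

// Runs every case sequentially; a thrown error counts as a failed case.
async function replay(
  cases: EvalCaseLike[],
  run: (task: string) => Promise<string>,
): Promise<ReplayResult> {
  const failedInputs: string[] = []
  for (const c of cases) {
    let ok = false
    try {
      ok = c.expected(await run(c.input))
    } catch {
      ok = false
    }
    if (!ok) failedInputs.push(c.input)
  }
  return { passed: cases.length - failedInputs.length, total: cases.length, failedInputs }
}
```

With a real adapter, pass the handle's `run` as the second argument. Running cases one at a time keeps provider rate limits predictable; the agent's `run` throws on empty input, which this loop records as a failure rather than aborting the replay.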

Related agents

Code QA

QA Author

Accessibility Auditor

API Contract Reviewer