Regtrace

Examples Overview

Ready-to-run Regtrace examples organized by evaluation pattern

The examples/ directory contains ready-to-run Regtrace projects, grouped into four categories (A through D) by evaluation pattern.

How to run

cd examples/<name>
regtrace run             # zero-setup, LLM-judge, and rag-legal examples
regtrace run --generate  # generate-mode examples, including rag-product

Each example's README specifies which command to use and whether it needs ANTHROPIC_API_KEY in .env.

All LLM-powered examples use a single Claude model (claude-haiku-4-5-20251001), so no API keys for other providers are needed.

Examples

A: Zero-Setup (no API key needed)

These examples run deterministic format checks, such as markdown structure, JSON validity, and forbidden content. They run immediately, at zero cost, with no external calls.

| Example | Scenario | Format sub-checks |
|---|---|---|
| code-generation | AI coding assistant output | markdown_structure, required_fields, forbidden_content |
| content-moderation | Toxicity filtering pipeline | required_fields, forbidden_content, length |
| intent-classification | Chatbot intent routing | json_validity, json_schema |
| data-extraction | Invoice extraction from text | json_validity, json_schema |
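To make the idea of a deterministic sub-check concrete, here is a minimal Python sketch of what checks like json_validity, required_fields, forbidden_content, length, and markdown_structure might test. The function names and pass criteria are hypothetical and chosen for illustration; they are not Regtrace's implementation or configuration API. The point is only that each check is a pure function of the output text, which is why this category needs no API key.

```python
import json
import re

# Hypothetical helpers illustrating deterministic format sub-checks.
# They are NOT Regtrace's API; each is a pure function of the output text.


def check_json_validity(output: str) -> bool:
    """Pass if the whole output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def check_required_fields(output: str, fields: list[str]) -> bool:
    """Pass if the output is a JSON object that contains every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(field in data for field in fields)


def check_forbidden_content(output: str, patterns: list[str]) -> bool:
    """Pass if none of the forbidden regex patterns matches (case-insensitive)."""
    return not any(re.search(p, output, re.IGNORECASE) for p in patterns)


def check_length(output: str, min_chars: int, max_chars: int) -> bool:
    """Pass if the character count lies within [min_chars, max_chars]."""
    return min_chars <= len(output) <= max_chars


def check_markdown_structure(output: str) -> bool:
    """Pass if there is at least one heading and code fences are balanced."""
    has_heading = re.search(r"^#{1,6} \S", output, re.MULTILINE) is not None
    fence_lines = re.findall(r"^`{3}", output, re.MULTILINE)
    return has_heading and len(fence_lines) % 2 == 0


invoice = '{"invoice_number": "INV-001", "total": 120.5}'
results = {
    "json_validity": check_json_validity(invoice),
    "required_fields": check_required_fields(invoice, ["invoice_number", "total"]),
    "forbidden_content": check_forbidden_content(invoice, [r"\bTODO\b"]),
}
print(results)  # {'json_validity': True, 'required_fields': True, 'forbidden_content': True}
```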

B: Generate Mode (needs ANTHROPIC_API_KEY)

Regtrace calls Claude to generate the actual_output for each test case, then evaluates it against expected_output. Run these examples with regtrace run --generate.

| Example | Metrics | What gets generated |
|---|---|---|
| customer-support | format | Support email responses |
| email-drafting | format, tone | Sales & support emails |
| translation | format | Spanish product descriptions |
| summarization | format | News article summaries |
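Conceptually, generate mode adds one step before scoring: produce actual_output, then compare it with expected_output. The sketch below shows that ordering with a stand-in generator so it runs offline; in Regtrace the generation step is a Claude call configured by the generator block. The Case type, its prompt field, and the evaluate callback are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Case:
    """Hypothetical test-case shape; only the output field names come from the docs."""
    prompt: str
    expected_output: str
    actual_output: Optional[str] = None


def run_generate_mode(
    cases: list[Case],
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], float],
) -> list[float]:
    """Fill actual_output for each case, then score it against expected_output."""
    scores = []
    for case in cases:
        # In Regtrace this step would be a Claude call; here it is a stand-in.
        case.actual_output = generate(case.prompt)
        scores.append(evaluate(case.actual_output, case.expected_output))
    return scores


def exact_match(actual: str, expected: str) -> float:
    """Toy metric: 1.0 on an exact match, else 0.0."""
    return 1.0 if actual == expected else 0.0


cases = [Case("hola", "HOLA"), Case("adios", "bye")]
scores = run_generate_mode(cases, generate=str.upper, evaluate=exact_match)
print(scores)  # [1.0, 0.0]
```

Passing the generator in as a function keeps the sketch testable without network access; swapping in a real model client changes only the generate argument.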

C: LLM Judge Evaluation (needs ANTHROPIC_API_KEY)

These examples ship with pre-provided outputs, which are scored on each example's enabled metrics: format plus LLM-judged tone or factuality. Run them with regtrace run.

| Example | Enabled metrics | What the judge evaluates |
|---|---|---|
| content-generation | format, tone | Brand voice: formality, sentiment, persona consistency |
| rag-documentation | format, factuality | RAG faithfulness to API documentation |

D: Advanced Config (needs ANTHROPIC_API_KEY)

These examples showcase deeper configuration: strict factuality mode, per-metric quality gates, and generate mode combined with RAG context.

| Example | Pattern | Config highlight |
|---|---|---|
| rag-legal | Strict factuality + per-metric gates | factuality.mode: strict, metric_score_minimums |
| rag-product | Generate mode + RAG context | interaction_type: rag + generator block |
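A per-metric gate can be read as "every listed metric must reach its own floor", independent of the suite average. Here is a minimal sketch under that assumption; the 0-to-1 score scale and the function name are hypothetical, and only the metric_score_minimums name comes from the rag-legal config.

```python
def failing_metrics(
    metric_scores: dict[str, float],
    metric_score_minimums: dict[str, float],
) -> list[str]:
    """Return every gated metric whose score is missing or below its minimum.

    Hypothetical helper; assumes scores on a 0-1 scale.
    """
    failures = []
    for metric, minimum in metric_score_minimums.items():
        score = metric_scores.get(metric)
        # A missing score counts as a failure so a gate cannot pass silently.
        if score is None or score < minimum:
            failures.append(metric)
    return failures


scores = {"format": 0.95, "factuality": 0.82}
gates = {"format": 0.90, "factuality": 0.90}
print(failing_metrics(scores, gates))  # ['factuality']
```

Returning the names of the failing metrics, rather than a single boolean, shows which floor was missed when a strict factuality run fails.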

Feature coverage

  • Quality gates: all examples configure suite_score_minimum, max_failed_test_cases, and regression_gate; rag-legal adds per-metric gates
  • Regression tracking: every run stores a baseline for delta comparison
  • Generate mode: five examples use the generator config block with regtrace run --generate
  • Format sub-checks: length, JSON validity, JSON schema, markdown structure, required fields, forbidden content
  • Tone evaluation: two examples use LLM-judged tone with five sub-dimensions
  • Factuality evaluation: three examples use LLM-judged factuality for RAG faithfulness and claim verification
  • RAG interaction type: three examples demonstrate context.documents with retrieval scores and context-based evaluation
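The three suite-level gates listed under quality gates can be sketched as independent pass/fail checks. This is an illustrative model, not Regtrace's logic: the 0-to-1 scale is assumed, and in particular regression_gate is assumed here to mean the largest allowed drop in suite score relative to the stored baseline. Only the three setting names come from the documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuiteGates:
    suite_score_minimum: float    # floor for the mean case score (0-1 scale assumed)
    max_failed_test_cases: int    # how many cases may fail before the suite fails
    regression_gate: float        # ASSUMED: largest allowed drop vs. the baseline score


def check_suite_gates(
    case_scores: list[float],
    case_passed: list[bool],
    baseline_score: Optional[float],
    gates: SuiteGates,
) -> dict[str, bool]:
    """Hypothetical evaluation of the three suite-level gates."""
    if not case_scores:
        raise ValueError("need at least one test case score")
    suite_score = sum(case_scores) / len(case_scores)
    results = {
        "suite_score_minimum": suite_score >= gates.suite_score_minimum,
        "max_failed_test_cases": case_passed.count(False) <= gates.max_failed_test_cases,
    }
    if baseline_score is None:
        # First run: nothing to compare against yet.
        results["regression_gate"] = True
    else:
        delta = suite_score - baseline_score
        results["regression_gate"] = delta >= -gates.regression_gate
    return results


gates = SuiteGates(suite_score_minimum=0.7, max_failed_test_cases=1, regression_gate=0.1)
print(check_suite_gates([1.0, 0.75, 0.5], [True, True, False], 0.875, gates))
# {'suite_score_minimum': True, 'max_failed_test_cases': True, 'regression_gate': False}
```

In this run the suite clears its absolute floor (0.75 against 0.7) yet still fails the regression gate, because it dropped 0.125 from the baseline. Treating a first run with no baseline as passing is a design choice of this sketch.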

Next steps

  • Pick a Zero-Setup example to try without any API key
  • Add ANTHROPIC_API_KEY and try a Generate Mode example
  • Run regtrace run --format json for machine-readable output
  • Clone an example as a starting point for your own evaluation project
