Examples Overview
Ready-to-run Regtrace examples organized by evaluation pattern
The examples/ directory contains ready-to-run Regtrace projects organized into four categories based on how they operate.
How to run
cd examples/<name>
regtrace run # for zero-setup and LLM-judge examples
regtrace run --generate # for generate-mode examples
Each example's README specifies which command to use and whether it needs ANTHROPIC_API_KEY in .env.
All LLM-powered examples use a single Claude model (claude-haiku-4-5-20251001), so no other provider API keys are needed.
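Before running an LLM-powered example, it can help to confirm that the key is actually present in .env. The following is a minimal sketch, assuming the conventional KEY=value .env layout. The helper name has_api_key is hypothetical and is not part of Regtrace.

```python
def has_api_key(env_text: str, name: str = "ANTHROPIC_API_KEY") -> bool:
    """Return True if `name` has a non-empty value in .env-style text.

    Hypothetical helper for illustration; not part of Regtrace.
    """
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export ".
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if sep and key.strip() == name:
            # Strip surrounding quotes before testing for emptiness.
            return bool(value.strip().strip("'\""))
    return False
```

It could be called as has_api_key(open(".env").read()) from inside an example directory before choosing between a Zero-Setup example and one that needs the key.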
Examples
A: Zero-Setup (no API key needed)
Deterministic format checks: markdown structure, JSON validity, forbidden content. These examples run immediately, at zero cost, with no external calls.
| Example | Scenario | Format sub-checks |
|---|---|---|
| code-generation | AI coding assistant output | markdown_structure, required_fields, forbidden_content |
| content-moderation | Toxicity filtering pipeline | required_fields, forbidden_content, length |
| intent-classification | Chatbot intent routing | json_validity, json_schema |
| data-extraction | Invoice extraction from text | json_validity, json_schema |
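To give a feel for what "deterministic" means here, the sketch below shows how three of the sub-checks named in the table might work in principle. It is an illustration under assumed semantics, not Regtrace's actual implementation; the function names and parameters are hypothetical.

```python
import json
from typing import Optional, Sequence


def check_json_validity(output: str) -> bool:
    """Pass if the output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def check_forbidden_content(output: str, forbidden: Sequence[str]) -> bool:
    """Pass if none of the forbidden phrases occur (case-insensitive match, assumed)."""
    lowered = output.lower()
    return not any(phrase.lower() in lowered for phrase in forbidden)


def check_length(output: str, min_chars: int = 0,
                 max_chars: Optional[int] = None) -> bool:
    """Pass if the character count is within the given bounds (assumed unit: characters)."""
    n = len(output)
    return n >= min_chars and (max_chars is None or n <= max_chars)
```

Because checks of this kind depend only on the output string, they give the same result on every run and need no API key.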
B: Generate Mode (needs ANTHROPIC_API_KEY)
Regtrace calls Claude to generate actual_output for each test case, then
evaluates against expected_output. Uses regtrace run --generate.
| Example | Metrics | What gets generated |
|---|---|---|
| customer-support | format | Support email responses |
| email-drafting | format, tone | Sales & support emails |
| translation | format | Spanish product descriptions |
| summarization | format | News article summaries |
C: LLM Judge Evaluation (needs ANTHROPIC_API_KEY)
Pre-provided outputs are scored by the LLM judge for tone, factuality, and
format. Uses regtrace run.
| Example | Enabled metrics | What the judge evaluates |
|---|---|---|
| content-generation | format, tone | Brand voice: formality, sentiment, persona consistency |
| rag-documentation | format, factuality | RAG faithfulness to API documentation |
D: Advanced Config (needs ANTHROPIC_API_KEY)
Showcases deeper configuration: strict factuality modes, per-metric quality gates, and combining generate mode with RAG context.
| Example | Pattern | Config highlight |
|---|---|---|
| rag-legal | Strict factuality + per-metric gates | factuality.mode: strict, metric_score_minimums |
| rag-product | Generate mode + RAG context | interaction_type: rag + generator block |
Feature coverage
- Quality gates: all examples configure suite_score_minimum, max_failed_test_cases, and regression_gate; rag-legal adds per-metric gates
- Regression tracking: every run stores a baseline for delta comparison
- Generate mode: five examples use the generator config block with regtrace run --generate
- Format sub-checks: length, JSON validity, JSON schema, markdown structure, required fields, forbidden content
- Tone evaluation: two examples use LLM-judged tone with five sub-dimensions
- Factuality evaluation: three examples use LLM-judged factuality for RAG faithfulness and claim verification
- RAG interaction type: three examples demonstrate context.documents with retrieval scores and context-based evaluation
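The quality gates and regression tracking listed above can be pictured as a few threshold comparisons. The sketch below is only a mental model under assumed semantics: the gate names come from the list, but the function evaluate_gates, the max_drop parameter, and the exact comparison rules are hypothetical and may differ from how Regtrace evaluates them.

```python
from typing import Dict, Optional


def evaluate_gates(
    suite_score: float,
    failed_cases: int,
    suite_score_minimum: float,
    max_failed_test_cases: int,
    baseline_score: Optional[float] = None,
    max_drop: float = 0.0,  # hypothetical: allowed drop below the baseline
) -> Dict[str, bool]:
    """Return a pass/fail flag per gate (assumed semantics, for illustration)."""
    gates = {
        # The overall score must reach the configured minimum.
        "suite_score_minimum": suite_score >= suite_score_minimum,
        # No more than the allowed number of test cases may fail.
        "max_failed_test_cases": failed_cases <= max_failed_test_cases,
    }
    if baseline_score is None:
        # First run: there is no stored baseline to compare against.
        gates["regression_gate"] = True
    else:
        gates["regression_gate"] = (baseline_score - suite_score) <= max_drop
    return gates
```

Under this model a run passes only if every value in the returned dictionary is True, for example all(evaluate_gates(0.82, 1, 0.8, 2).values()).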
Next steps
- Pick a Zero-Setup example to try without any API key
- Add ANTHROPIC_API_KEY and try a Generate Mode example
- Run regtrace run --format json for machine-readable output
- Clone an example as a starting point for your own evaluation project
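For scripting these steps, a small wrapper around the commands above might look like the following. It is a sketch with several assumptions: that --generate and --format json can be combined, that the JSON report is the only thing written to stdout, and that the examples live under examples/. The helper names build_command and run_example are hypothetical, and the structure of the JSON report is not specified here.

```python
import json
import subprocess
from pathlib import Path
from typing import Any, List, Tuple


def build_command(generate: bool = False, json_output: bool = False) -> List[str]:
    """Assemble the regtrace invocation from the flags shown in this overview."""
    cmd = ["regtrace", "run"]
    if generate:
        cmd.append("--generate")
    if json_output:
        cmd += ["--format", "json"]
    return cmd


def run_example(name: str, generate: bool = False,
                examples_dir: str = "examples") -> Tuple[int, Any]:
    """Run one example in its directory; return (exit code, parsed JSON).

    Assumes stdout contains only JSON; json.loads raises otherwise.
    """
    result = subprocess.run(
        build_command(generate=generate, json_output=True),
        cwd=Path(examples_dir) / name,
        capture_output=True,
        text=True,
    )
    return result.returncode, json.loads(result.stdout)
```

A call such as run_example("intent-classification") would then correspond to cd examples/intent-classification followed by regtrace run --format json.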