Examples Overview

The `examples/` directory contains ready-to-run Regtrace projects, organized into four categories by how they operate.

How to run

```bash
cd examples/<name>
regtrace run              # for zero-setup and LLM-judge examples
regtrace run --generate   # for generate-mode examples
```

Each example's README specifies which command to use and whether it needs an `ANTHROPIC_API_KEY` in `.env`.

All LLM-powered examples use a single Claude model (`claude-haiku-4-5-20251001`), so no API keys from other providers are needed.

Zero-setup examples run deterministic format checks such as markdown structure, JSON validity, and forbidden content. They run immediately, at zero cost, with no external calls.

| Example | Scenario | Format sub-checks |
|---|---|---|
| code-generation | AI coding assistant output | `markdown_structure`, `required_fields`, `forbidden_content` |
| content-moderation | Toxicity filtering pipeline | `required_fields`, `forbidden_content`, `length` |
| intent-classification | Chatbot intent routing | `json_validity`, `json_schema` |
| data-extraction | Invoice extraction from text | `json_validity`, `json_schema` |
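
The format sub-checks in the table above are deterministic, so each one can be thought of as a plain pass/fail predicate over the output text. Below is a minimal Python sketch built on that assumption. The function names mirror the sub-check names, but their signatures, the simplified schema check, and the pass rules are hypothetical illustrations, not Regtrace's implementation.

```python
import json
import re

# Hypothetical stand-ins for Regtrace's format sub-checks; the real
# implementations and their configuration options may differ.

FENCE = "`" * 3  # a Markdown code fence, built at runtime to keep it out of the source


def json_validity(output: str) -> bool:
    """Pass if the output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def json_schema(output: str, required: dict) -> bool:
    """Simplified schema check: the top level is an object and every
    required key is present with the expected Python type."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(key in data and isinstance(data[key], typ)
               for key, typ in required.items())


def required_fields(output: str, fields: list) -> bool:
    """Pass if every required marker (e.g. a section label) appears."""
    return all(field in output for field in fields)


def forbidden_content(output: str, patterns: list) -> bool:
    """Pass if no forbidden regex pattern matches (case-insensitive)."""
    return not any(re.search(p, output, re.IGNORECASE) for p in patterns)


def length(output: str, min_chars: int = 0, max_chars: int = 10_000) -> bool:
    """Pass if the character count falls within the allowed range."""
    return min_chars <= len(output) <= max_chars


def markdown_structure(output: str) -> bool:
    """Pass if there is at least one heading and one complete code fence."""
    has_heading = re.search(r"^#{1,6} \S", output, re.MULTILINE) is not None
    has_code_block = output.count(FENCE) >= 2
    return has_heading and has_code_block


reply = '{"intent": "refund", "confidence": 0.92}'
print(json_validity(reply))                                      # True
print(json_schema(reply, {"intent": str, "confidence": float}))  # True
print(forbidden_content("Call 555-0100", [r"\d{3}-\d{4}"]))      # False
```

A full JSON Schema validator, such as the `jsonschema` package, would handle nested objects and constraints that this type-only check skips.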

In generate-mode examples, Regtrace calls Claude to produce `actual_output` for each test case and then evaluates it against `expected_output`. Run these with `regtrace run --generate`.

| Example | Metrics | What gets generated |
|---|---|---|
| customer-support | format | Support email responses |
| email-drafting | format, tone | Sales & support emails |
| translation | format | Spanish product descriptions |
| summarization | format | News article summaries |
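
The generate-then-evaluate loop described for generate mode can be sketched as follows. This is a minimal Python illustration, not Regtrace's code. `Case`, `run_generate`, and `exact_match` are hypothetical names. The generator is a stub with canned replies that stands in for the Claude call, so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Case:
    """Hypothetical test case shape: an input plus a reference answer."""
    input_text: str
    expected_output: str
    actual_output: Optional[str] = None


def exact_match(actual: str, expected: str) -> float:
    """Toy scorer for illustration; the generate-mode examples use their configured metrics."""
    return 1.0 if actual.strip() == expected.strip() else 0.0


def run_generate(cases: List[Case],
                 generate: Callable[[str], str],
                 score: Callable[[str, str], float]) -> List[float]:
    """Fill in actual_output for each case, then score it against expected_output."""
    scores = []
    for case in cases:
        case.actual_output = generate(case.input_text)  # a model call in real generate mode
        scores.append(score(case.actual_output, case.expected_output))
    return scores


# Stub generator: canned replies instead of a live Claude call.
canned = {
    "Translate: Hello world": "Hola mundo",
    "Translate: Good night": "Buenas tardes",
}
cases = [Case("Translate: Hello world", "Hola mundo"),
         Case("Translate: Good night", "Buenas noches")]
print(run_generate(cases, lambda prompt: canned[prompt], exact_match))  # [1.0, 0.0]
```

Passing the generator in as a function keeps the loop testable without network access. In a real run, the stub's place would be taken by a call to the configured model.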

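In LLM-judge examples, the test cases ship with pre-written outputs. These are checked for format and scored by the LLM judge on their other enabled metrics (tone or factuality). Run these with `regtrace run`.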

| Example | Enabled metrics | What the judge evaluates |
|---|---|---|
| content-generation | format, tone | Brand voice: formality, sentiment, persona consistency |
| rag-documentation | format, factuality | RAG faithfulness to API documentation |

The remaining examples showcase deeper configuration: strict factuality modes, per-metric quality gates, and generate mode combined with RAG context.

| Example | Pattern | Config highlight |
|---|---|---|
| rag-legal | Strict factuality + per-metric gates | `factuality.mode: strict`, `metric_score_minimums` |
| rag-product | Generate mode + RAG context | `interaction_type: rag` + `generator` block |
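
A per-metric gate like rag-legal's `metric_score_minimums` can be pictured as a check that each metric clears its own floor, independently of the suite-level score. Here is a minimal Python sketch. It assumes scores on a 0 to 1 scale and treats a metric with no score as failing; both are assumptions, and `failing_metrics` is a hypothetical helper.

```python
from typing import Dict, List


def failing_metrics(metric_scores: Dict[str, float],
                    metric_score_minimums: Dict[str, float]) -> List[str]:
    """Return the metrics that fall below their configured minimum.

    Assumes scores are on a 0-1 scale; a metric with no score counts as failing.
    """
    failures = []
    for metric, minimum in metric_score_minimums.items():
        score = metric_scores.get(metric)
        if score is None or score < minimum:
            failures.append(metric)
    return failures


scores = {"format": 0.97, "factuality": 0.81}
print(failing_metrics(scores, {"format": 0.90, "factuality": 0.90}))  # ['factuality']
```

Here a strong format score cannot mask weak factuality. An averaged suite score alone could hide that kind of gap.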

Features demonstrated across the examples:

- Quality gates: all examples configure `suite_score_minimum`, `max_failed_test_cases`, and `regression_gate`; rag-legal adds per-metric gates.
- Regression tracking: every run stores a baseline for delta comparison.
- Generate mode: five examples use the `generator` config block with `regtrace run --generate`.
- Format sub-checks: length, JSON validity, JSON schema, markdown structure, required fields, and forbidden content.
- Tone evaluation: two examples use LLM-judged tone with five sub-dimensions.
- Factuality evaluation: three examples use LLM-judged factuality for RAG faithfulness and claim verification.
- RAG interaction type: three examples demonstrate `context.documents` with retrieval scores and context-based evaluation.
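
The suite-level gates and the baseline comparison listed above can be combined into a single pass/fail decision. The Python sketch below is one possible reading, not Regtrace's logic. `SuiteResult` and `gate_failures` are hypothetical, and scores are assumed to be on a 0 to 1 scale. `max_score_drop` is an assumed stand-in for whatever threshold `regression_gate` actually configures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SuiteResult:
    """Hypothetical summary of one run."""
    suite_score: float        # assumed aggregate score on a 0-1 scale
    failed_test_cases: int


def gate_failures(current: SuiteResult,
                  suite_score_minimum: float,
                  max_failed_test_cases: int,
                  max_score_drop: float,
                  baseline: Optional[SuiteResult] = None) -> List[str]:
    """Return the reasons the run fails its gates; an empty list means it passes."""
    reasons = []
    if current.suite_score < suite_score_minimum:
        reasons.append(
            f"suite score {current.suite_score:.2f} below {suite_score_minimum:.2f}")
    if current.failed_test_cases > max_failed_test_cases:
        reasons.append(
            f"{current.failed_test_cases} failed test cases exceeds {max_failed_test_cases}")
    # Regression check: compare against the stored baseline, if one exists.
    if baseline is not None:
        delta = current.suite_score - baseline.suite_score
        if -delta > max_score_drop:
            reasons.append(
                f"score dropped {-delta:.2f} from baseline (limit {max_score_drop:.2f})")
    return reasons


baseline = SuiteResult(suite_score=0.90, failed_test_cases=1)
current = SuiteResult(suite_score=0.84, failed_test_cases=2)
print(gate_failures(current, 0.80, 3, 0.05, baseline))
# ['score dropped 0.06 from baseline (limit 0.05)']
```

Without a baseline, as on a first run, only the absolute gates apply. Once a baseline exists, a run can clear every absolute floor and still fail on regression, as in the usage above.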