How Regtrace Works

Regtrace is a CLI tool that compares LLM outputs against golden set expectations and reports whether quality is maintained or degraded.

High-level flow

Each run is:

Load — read config and golden set files
Validate — check schema, semantic rules, no all-null actual_output
Evaluate — for each test case, run each enabled metric
Aggregate — combine scores using weighted averaging
Check gates — compare quality metrics against configured thresholds
Report — print to terminal (stderr), JSON (stdout), or Markdown (file)
Persist — save the run record to .regtrace/runs/

Evaluation pipeline

For each test case:

Each evaluator produces a score (0.0–1.0), a pass/fail decision, confidence level, and an explanation.

Regression comparison

After evaluation, Regtrace checks for an existing baseline:

Quality gates

Quality gates are final pass/fail checks:

Run records are JSON files stored in .regtrace/runs/. Each file contains the full evaluation result, config snapshot, and metadata. Stored runs allow listing, history review, and regression comparison.

How Regtrace Works

High-level flow

Evaluation pipeline

Regression comparison

Quality gates

Run storage

Architecture layers

On this page