RegtraceRegtrace

Config File Reference

Complete schema for regtrace.config.yaml

Location

regtrace.config.yaml in the project root. Specify a custom path with --config.

Schema

# Project metadata (optional)
project:
  name: my-eval-project
  version: "1.0.0"

# Golden set entries
golden_sets:
  - path: golden-sets/qa.yaml
    enabled: true
    weight: 1
    store_in_db: true

# Metric configuration
metrics:
  enabled: [factuality, format, tone, regression]
  default_threshold: 0.7
  default_weight: 1

  factuality:
    mode: strict           # strict | lenient | json_structural
    claim_extraction_depth: shallow
    rag_faithfulness_only: false

  format:
    sub_checks:
      length: true
      json_validity: false
      json_schema: false
      markdown_structure: false
      required_fields: true
      forbidden_content: true
      regex_match: false
    length_tolerance: 0.3
    strict_json: false

  tone:
    sub_dimensions:
      formality: true
      sentiment: true
      assertiveness: true
      persona_consistency: true
      verbosity: true

  regression:
    enabled: true
    baseline_strategy: last_passing
    tolerance: 0.05
    critical_threshold: 0.15
    exclude_new_test_cases: true
    branch_aware: true
    fallback_baseline: main

# Judge provider configuration
judge:
  primary:
    provider: anthropic
    model: claude-haiku-4-5-20251001
    temperature: 0.1
    max_tokens: 4096
    timeout_ms: 30000
    retry_attempts: 3
  fallback:
    provider: openai
    model: gpt-5.4-mini-2026-03-17
    temperature: 0.1
    max_tokens: 4096
    timeout_ms: 30000
    retry_attempts: 2

# Generator provider configuration (optional)
# Falls back to judge.primary when absent
generator:
  provider: anthropic
  model: claude-haiku-4-5-20251001
  temperature: 0.4
  max_tokens: 4096
  timeout_ms: 60000
  retry_attempts: 3

# Runtime configuration
run:
  concurrency: 4

# NFR gates (optional)
nfr_gates:
  max_latency_ms: 30000
  max_cost_usd: 0.50
  min_coverage: 0.8

# Quality gates
quality_gates:
  suite_score_minimum: 0.7
  max_failed_test_cases: 0
  max_low_confidence_ratio: 0.1
  regression_gate: true

# Output configuration
output:
  run_history_limit: 50
  default_format: terminal
  color: auto
  ci_mode_auto_detect: true

# Storage configuration (optional)
storage:
  db:
    enabled: false
    path: .regtrace/regtrace.db

Generator block

FieldTypeDefaultDescription
providerstringProvider name: anthropic, openai, gemini, groq, ollama
modelstringModel identifier (e.g. claude-haiku-4-5-20251001)
temperaturenumber0.1Generation temperature (0–2). Higher values increase creativity. Use 0.1 for factual tasks, 0.4 for creative writing.
max_tokensinteger4096Maximum response tokens
timeout_msinteger30000Request timeout in milliseconds
retry_attemptsinteger3Retries on failure

When generator is absent, regtrace run --generate falls back to judge.primary. Generated output is stored in the run record only; golden set YAML files are never modified.

Golden set entry

|---|---|---|---| | path | string | — | Path to golden set YAML file | | enabled | boolean | true | Include this set in evaluations | | weight | number | 1 | Contribution weight in multi-set runs | | store_in_db | boolean | true | Persist runs to SQLite database |

Run configuration

FieldTypeDefaultDescription
concurrencyinteger4Number of test cases evaluated in parallel per batch. Higher values speed up suites with many LLM-judged metrics. Max: 20.

Quality gates reference

GateTypeDefaultDescription
suite_score_minimumnumber0.7Minimum suite aggregate score (0.0–1.0)
metric_score_minimumsobjectPer-metric minimum thresholds. See example below.
max_failed_test_casesinteger0Maximum allowed failed test cases
max_low_confidence_rationumber0.1Max fraction of results with confidence < 0.6
regression_gatebooleantrueFail on critical regression

Example — factuality must score 80%+ and format 60%+:

quality_gates:
  suite_score_minimum: 0.7
  metric_score_minimums:
    factuality: 0.8
    format: 0.6
  max_failed_test_cases: 1
  max_low_confidence_ratio: 0.1
  regression_gate: true

NFR gates

Non-functional requirement gates expand pass/fail beyond output quality to latency, API cost, and test coverage. When any NFR gate fails, the suite fails — same as quality gates.

GateTypeDescription
max_latency_msintegerMaximum wall-clock duration for the full evaluation run (milliseconds)
max_cost_usdnumberMaximum total API cost across all LLM-judged metrics (USD)
min_coveragenumberMinimum fraction of passing test cases (0.0–1.0)

NFR gates are optional. Omit the block entirely to skip all NFR checks.

Output configuration

FieldTypeDefaultDescription
run_history_limitinteger50Number of run records to retain
default_formatstringterminalOutput format: terminal, json, markdown
colorstringautoColor mode: auto, always, never
ci_mode_auto_detectbooleantrueAuto-detect CI environments from env vars
report_pathstringDefault report output path. Overridden by --output flag.

Tone configuration

FieldTypeDefaultDescription
tone_profilestring or nullExpected tone description passed to the LLM judge. E.g. "confident, approachable, professional".
sub_dimensionsobjectall truePer-dimension toggle: formality, sentiment, assertiveness, persona_consistency, verbosity
sub_dimension_weightsobjectCustom weight per dimension. E.g. { formality: 2.0, verbosity: 0.5 }

Example — custom profile per dimension weights:

tone:
  tone_profile: "friendly but professional, concise"
  sub_dimensions:
    formality: true
    sentiment: true
    persona_consistency: true
  sub_dimension_weights:
    formality: 1.5
    verbosity: 0.3

Storage configuration

FieldTypeDefaultDescription
db.enabledbooleanfalseEnable SQLite run record persistence
db.pathstring.regtrace/regtrace.dbDatabase file path

See the database reference for the full schema.

Weighting

Weights form a three-level cascade:

All weights default to 1.

Regression configuration

FieldTypeDefaultDescription
enabledbooleantrueEnable regression detection
baseline_strategystringlast_passinglast_passing or pinned
pinned_run_idstringRun ID to pin as baseline (required when strategy: pinned)
tolerancenumber0.05Score delta threshold for clean regression (±5%)
metric_tolerancesobjectPer-metric tolerance overrides. E.g. {format: 0, factuality: 0.1} means zero format drift allowed but 10% factuality variance OK
critical_thresholdnumber0.15Score delta threshold for critical regression (±15%)
exclude_new_test_casesbooleantrueExclude new test case IDs from regression comparison
branch_awarebooleantrueAuto-detect current git branch. Feature branches fall back to fallback_baseline.
fallback_baselinestringmainBranch name to use as fallback baseline when no baseline exists for the current branch

Required blocks

These blocks must always be present in the config. regtrace init populates them with defaults:

BlockWhy required
metrics.toneZod schema requires it. Disable all sub-dimensions to skip tone evaluation.
metrics.factualityRequired. Defaults: strict mode, shallow extraction depth.
metrics.formatRequired. Defaults: all 7 sub-checks enabled.
metrics.regressionRequired. Defaults: last_passing strategy.

Validation

Regtrace validates the config file at startup:

  • golden_sets must be an array
  • Each golden set entry must have path (string) and enabled (boolean)
  • provider must be one of: anthropic, openai, gemini, groq, ollama
  • default_format must be one of: terminal, json, markdown
  • Numbers outside expected ranges produce helpful error messages
  • Missing required blocks produce clear error messages naming the missing field

On this page