Regtrace

Golden Set Reference

Complete schema for golden set YAML files

Synopsis

Golden sets define evaluation test cases in YAML format. Each file is loaded by a golden_sets entry in regtrace.config.yaml.

File-level fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | | Human-readable identifier |
| version | string | Yes | | Semantic version (e.g. "1.0.0") |
| description | string | Yes | | What this golden set evaluates |
| interaction_type | string | Yes | | single_turn or rag |
| tags | string[] | Yes | [] | For filtering during runs |
| author | string | Yes | | Who created this set |
| created_at | string (date) | Yes | | ISO 8601 date |
| updated_at | string (date) | Yes | | ISO 8601 date |
| test_cases | array | Yes | | Array of test case objects |

Test case fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | | Unique within golden set, stable across versions |
| description | string | Yes | | One-line summary of what this tests |
| input | string | Yes | | Prompt or user message sent to the model |
| system_prompt | string or null | Yes | | System instruction in effect during generation |
| expected_output | string | Yes | | Ground truth ideal response |
| actual_output | string or null | Yes | | Model's actual output; null at authoring time |
| metrics | string[] | Yes | | Metrics to evaluate (e.g. [factuality, format]) |
| tags | string[] | Yes | [] | Case-level tags for filtering |
| weight | number | Yes | 1 | Relative importance in suite scoring |
| thresholds | object | No | {} | Per-metric threshold overrides |

Threshold overrides

- id: qa-001
  input: "..."
  expected_output: "..."
  metrics: [factuality, format, tone]
  thresholds:
    factuality: 0.9

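This sets the factuality threshold for qa-001 to 0.9, overriding the global default_threshold for this case only. The format and tone metrics have no override here, so they keep the global default.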
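The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the override-then-default behavior, not Regtrace's actual implementation: the helper name `effective_threshold` and the global default value of 0.7 are assumptions made for the example.

```python
def effective_threshold(test_case: dict, metric: str, default_threshold: float) -> float:
    """Return the per-case override for `metric` if one exists, else the global default."""
    # `thresholds` is optional, so treat a missing or null value as "no overrides".
    overrides = test_case.get("thresholds") or {}
    return overrides.get(metric, default_threshold)


# The qa-001 case from the example above, reduced to the relevant keys.
case = {
    "id": "qa-001",
    "metrics": ["factuality", "format", "tone"],
    "thresholds": {"factuality": 0.9},
}

# 0.7 is a made-up global default_threshold, used only for illustration.
resolved = {m: effective_threshold(case, m, 0.7) for m in case["metrics"]}
print(resolved)  # {'factuality': 0.9, 'format': 0.7, 'tone': 0.7}
```

Only factuality picks up the case-level value; the other two metrics fall through to the default.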

RAG test case fields

RAG test cases (when interaction_type: rag) require a context block:

- id: rag-001
  input: "What is the API rate limit?"
  context:
    documents:
      - source: "docs/api-reference.md"
        content: "The API rate limit is 500 requests per minute."
        retrieval_score: 0.94
  expected_output: "500 requests per minute"
  actual_output: null

Context document fields

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| source | string | Yes | | Document identifier (URL, file path) |
| content | string | Yes | | The retrieved text |
| retrieval_score | number | No | | Retriever similarity score |

Validation rules

  • id must be unique within the file
  • metrics values must be recognized metric names (factuality, format, tone, regression)
  • interaction_type must be single_turn or rag
  • RAG test cases must have a non-null context with at least one document
  • weight must be a positive number
  • thresholds keys must match metric names

If any validation rule is violated, Regtrace exits with a specific error message identifying the file and the exact problem.
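To catch these problems before a run, the rules above can be approximated with a small pre-check. The sketch below is a hypothetical helper, not part of Regtrace: the function name `validate_golden_set` and the error wording are invented for illustration, it operates on an already-parsed golden set (a plain dict), and it assumes that "thresholds keys must match metric names" refers to the recognized metric names listed above.

```python
KNOWN_METRICS = {"factuality", "format", "tone", "regression"}
INTERACTION_TYPES = {"single_turn", "rag"}


def validate_golden_set(golden_set: dict) -> list:
    """Return a list of rule violations; an empty list means every check passed."""
    errors = []

    interaction_type = golden_set.get("interaction_type")
    if interaction_type not in INTERACTION_TYPES:
        errors.append(f"interaction_type must be single_turn or rag, got {interaction_type!r}")

    seen_ids = set()
    for case in golden_set.get("test_cases", []):
        case_id = case.get("id")

        # id must be unique within the file
        if case_id in seen_ids:
            errors.append(f"{case_id}: duplicate id")
        seen_ids.add(case_id)

        # metrics values must be recognized metric names
        for metric in case.get("metrics", []):
            if metric not in KNOWN_METRICS:
                errors.append(f"{case_id}: unknown metric {metric!r}")

        # weight must be a positive number (defaults to 1; booleans are rejected)
        weight = case.get("weight", 1)
        if isinstance(weight, bool) or not isinstance(weight, (int, float)) or weight <= 0:
            errors.append(f"{case_id}: weight must be a positive number")

        # thresholds keys must match metric names (assumed: the recognized names)
        for key in case.get("thresholds") or {}:
            if key not in KNOWN_METRICS:
                errors.append(f"{case_id}: threshold key {key!r} is not a metric name")

        # RAG test cases need a non-null context with at least one document
        if interaction_type == "rag":
            documents = (case.get("context") or {}).get("documents")
            if not documents:
                errors.append(f"{case_id}: RAG case needs a context with at least one document")

    return errors


golden_set = {
    "interaction_type": "rag",
    "test_cases": [
        {
            "id": "rag-001",
            "metrics": ["factuality"],
            "context": {"documents": [{"source": "docs/api-reference.md",
                                       "content": "The API rate limit is 500 requests per minute."}]},
        },
        # Deliberately broken: duplicate id, unknown metric, zero weight, null context.
        {"id": "rag-001", "metrics": ["accuracy"], "weight": 0, "context": None},
    ],
}

errors = validate_golden_set(golden_set)
for message in errors:
    print(message)
```

The second test case is deliberately malformed and trips four of the rules, while the first passes cleanly. Collecting all violations in a list, rather than stopping at the first, is a design choice of this sketch; Regtrace itself may report errors differently.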
