
Evaluation Methods

Spec27 supports several ways to determine whether an output passes.

Strict equality

Use strict equality when the output must match the expected answer exactly.

Best for:

  • deterministic responses
  • exact answer checks
  • baseline validation
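The following Python sketch shows what a strict-equality check means in practice. The function name strict_equality is hypothetical and not part of Spec27. The sketch assumes that no normalization (trimming, case folding) happens before comparison, so any difference fails.

```python
def strict_equality(output: str, expected: str) -> bool:
    """Hypothetical check: pass only if output is character-for-character identical to expected."""
    # Assumption: no trimming or case folding, so "Paris\n" and "paris" both fail.
    return output == expected


print(strict_equality("Paris", "Paris"))    # True
print(strict_equality("paris", "Paris"))    # False: case differs
print(strict_equality("Paris\n", "Paris"))  # False: trailing newline
```

If your outputs can carry harmless formatting differences, this is usually the point where strict equality becomes too rigid.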

Permitted values

Use permitted values when more than one output is acceptable but the acceptable outputs still form a fixed, known set.

Best for:

  • fixed labels
  • multiple approved variants
  • simple classification-style outputs
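A permitted-values check can be sketched as an exact match against any member of an approved set. The function permitted_values and the sentiment labels below are hypothetical examples, not Spec27 identifiers. The sketch also assumes each candidate is compared exactly, as in strict equality.

```python
# Hypothetical label set for a sentiment-classification prompt.
PERMITTED = frozenset({"positive", "negative", "neutral"})


def permitted_values(output: str, permitted: frozenset[str]) -> bool:
    """Hypothetical check: pass if output exactly matches one of the approved values."""
    # Set membership is still an exact comparison; the set only widens
    # the number of acceptable answers.
    return output in permitted


print(permitted_values("neutral", PERMITTED))  # True
print(permitted_values("mixed", PERMITTED))    # False: not an approved label
print(permitted_values("Neutral", PERMITTED))  # False: case differs
```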

Judge-based scoring

Use judge-based scoring when correctness depends on interpretation rather than exact matching.

Best for:

  • rubric-based reviews
  • nuanced policy checks
  • outputs that need a score and an explanation, not just pass/fail

Judge-based runs can include a structured score, explanation, and vote details.
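The Python sketch below shows one way a structured judge result could be assembled from several votes. Spec27's actual result format is not described here. The names JudgeVote, JudgeResult, and aggregate, the 0 to 1 score scale, and the majority-vote rule are all assumptions for illustration. The judges themselves (for example, model or human reviewers) are not implemented. Only the aggregation of their votes is shown.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class JudgeVote:
    passed: bool      # one judge's verdict against the rubric
    score: float      # assumed 0-1 rubric score
    explanation: str  # that judge's reasoning


@dataclass
class JudgeResult:
    passed: bool
    score: float
    explanation: str
    votes: list[JudgeVote]


def aggregate(votes: list[JudgeVote]) -> JudgeResult:
    """Hypothetical rule: a strict majority decides pass/fail; scores are averaged."""
    if not votes:
        raise ValueError("at least one judge vote is required")
    passed = sum(v.passed for v in votes) * 2 > len(votes)
    # Keep the reasoning from the judges that agree with the final verdict.
    reasons = [v.explanation for v in votes if v.passed == passed]
    return JudgeResult(
        passed=passed,
        score=mean(v.score for v in votes),
        explanation=" | ".join(reasons),
        votes=votes,
    )


result = aggregate([
    JudgeVote(True, 0.9, "Cites the policy correctly."),
    JudgeVote(True, 0.8, "Meets rubric items 1-3."),
    JudgeVote(False, 0.4, "Misses the refund exception."),
])
print(result.passed)           # True: 2 of 3 judges passed it
print(round(result.score, 2))  # 0.7
```

Keeping the individual votes alongside the aggregate makes it possible to see why a borderline output passed or failed.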

How to choose

  • Start with strict equality when you can.
  • Use permitted values when exact matching is too rigid.
  • Use judge-based scoring when human-like judgment is required.
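The guidance above prefers the simplest method that fits. That ordering can be written as a small decision helper. This is a hypothetical illustration, not a Spec27 feature. The Method enum and the two boolean flags are assumed names.

```python
from enum import Enum


class Method(Enum):
    STRICT_EQUALITY = "strict equality"
    PERMITTED_VALUES = "permitted values"
    JUDGE = "judge-based scoring"


def choose_method(has_single_exact_answer: bool, answers_form_fixed_set: bool) -> Method:
    """Hypothetical helper: try the simplest method first, fall back to a judge."""
    if has_single_exact_answer:
        return Method.STRICT_EQUALITY
    if answers_form_fixed_set:
        return Method.PERMITTED_VALUES
    # Outputs that cannot be enumerated need interpretation.
    return Method.JUDGE


print(choose_method(False, True).value)  # permitted values
```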