Mental Model

Spec27 is easiest to understand when you think of it as a chain of reusable assets.

Organization

An organization is the top-level shared workspace. Membership, roles, and invite links are managed here.

Project

A project is the main container for evaluation work. It holds the assets and results for one stream of work.

Datasets hold test cases, examples, or rules.
Agents hold runnable logic.
Secrets provide protected values for agents.
Judges score outputs when simple equality checks are not enough.
Specifications define what to test and how to generate or include adversarial inputs.
Evals connect agents and specifications into a runnable evaluation setup.

A run is a single execution of an eval or playground workflow.
Results are the recorded outputs, scores, statuses, and logs produced by that run.
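
The relationship between a run and its results can be sketched as a small data model. This is a minimal illustration, not the Spec27 API: the class names, field names, and status values below are all hypothetical, chosen only to mirror the "outputs, scores, statuses, and logs" described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class RunStatus(Enum):
    # Hypothetical status values; the real set is not listed in this document.
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class Result:
    """One recorded outcome within a run (illustrative shape only)."""
    output: str
    score: float
    status: RunStatus
    logs: list[str] = field(default_factory=list)


@dataclass
class Run:
    """A single execution of an eval or playground workflow."""
    eval_name: str
    results: list[Result] = field(default_factory=list)

    def mean_score(self) -> float:
        """Average score across results; 0.0 when the run has no results."""
        if not self.results:
            return 0.0
        return sum(r.score for r in self.results) / len(self.results)


# Example: one run that recorded two results.
run = Run(eval_name="refund-policy-eval")
run.results.append(Result("Refund approved", 1.0, RunStatus.PASSED))
run.results.append(
    Result("Refund denied", 0.5, RunStatus.FAILED, ["judge: partial match"])
)
print(run.mean_score())
```

The point of the sketch is that results belong to exactly one run, so re-executing the same eval produces a new run with its own results rather than overwriting earlier ones.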

A specification is the reusable definition of the test surface for an eval: what to test, and how to generate or include adversarial inputs.

The same specification can be reused across multiple evals.

Organization -> Project -> Datasets / Agents / Secrets / Judges -> Specification -> Eval -> Run -> Results
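
One way to picture this chain is as nested containers in code. The sketch below is an assumption-laden illustration rather than Spec27's actual schema: every class and field name is hypothetical, assets are reduced to plain names, and an eval is simplified to one agent plus one specification.

```python
from dataclasses import dataclass, field


@dataclass
class Specification:
    """Reusable definition of what an eval tests (fields are illustrative)."""
    name: str


@dataclass
class Eval:
    """Connects an agent and a specification into a runnable setup.

    Simplification: one agent and one specification per eval.
    """
    name: str
    agent_name: str
    specification: Specification


@dataclass
class Project:
    """Main container for evaluation work; assets are shown as plain names."""
    name: str
    datasets: list[str] = field(default_factory=list)
    agents: list[str] = field(default_factory=list)
    secrets: list[str] = field(default_factory=list)
    judges: list[str] = field(default_factory=list)
    specifications: list[Specification] = field(default_factory=list)
    evals: list[Eval] = field(default_factory=list)


@dataclass
class Organization:
    """Top-level shared workspace that owns projects."""
    name: str
    projects: list[Project] = field(default_factory=list)


# One specification reused by two evals that target different agents.
spec = Specification(name="prompt-injection-checks")
project = Project(
    name="support-bot",
    agents=["bot-v1", "bot-v2"],
    specifications=[spec],
)
project.evals = [
    Eval("eval-bot-v1", "bot-v1", spec),
    Eval("eval-bot-v2", "bot-v2", spec),
]
org = Organization(name="acme", projects=[project])

# Both evals point at the same specification object.
shared = all(e.specification is spec for e in project.evals)
print(shared)
```

Holding the specification by reference, rather than copying it into each eval, is what makes reuse meaningful in this model: a change to the specification is seen by every eval that uses it.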