Mental Model
Spec27 is easiest to understand when you think of it as a chain of reusable assets.
Organization
An organization is the top-level shared workspace. Membership, roles, and invite links are managed here.
Project
A project is the main container for evaluation work. It holds the assets and results for one stream of work.
Assets inside a project
- Datasets hold test cases, examples, or rules.
- Agents hold runnable logic.
- Secrets provide protected values for agents.
- Judges score outputs when simple equality checks are not enough.
- Specifications define what to test and how to generate or include adversarial inputs.
- Evals connect agents and specifications into a runnable evaluation setup.
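To make the containment relationships concrete, here is a minimal sketch of a project as a container of named assets. It assumes, for illustration only, that evals refer to their agent and specification by name and that agents list the secret names they need. The classes `Project`, `Agent`, `Eval` and the helper `missing_references` are hypothetical and are not Spec27's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model for illustration only; not Spec27's real schema or API.


@dataclass
class Agent:
    name: str
    secret_names: list[str] = field(default_factory=list)  # protected values the agent needs


@dataclass
class Eval:
    name: str
    agent_name: str          # the agent under test
    specification_name: str  # the reusable test surface


@dataclass
class Project:
    name: str
    datasets: dict[str, list[str]] = field(default_factory=dict)  # name -> test cases/rules
    agents: dict[str, Agent] = field(default_factory=dict)
    secrets: dict[str, str] = field(default_factory=dict)         # name -> protected value
    judges: dict[str, str] = field(default_factory=dict)          # name -> scoring instructions
    specifications: dict[str, str] = field(default_factory=dict)  # name -> primary dataset name
    evals: dict[str, Eval] = field(default_factory=dict)

    def missing_references(self, eval_name: str) -> list[str]:
        """List assets an eval depends on that this project does not contain."""
        ev = self.evals[eval_name]
        missing: list[str] = []
        agent = self.agents.get(ev.agent_name)
        if agent is None:
            missing.append(f"agent:{ev.agent_name}")
        else:
            # Secrets are resolved through the agent that uses them.
            missing += [f"secret:{s}" for s in agent.secret_names if s not in self.secrets]
        if ev.specification_name not in self.specifications:
            missing.append(f"specification:{ev.specification_name}")
        return missing


# Usage: an eval whose agent needs a secret the project has not stored yet.
p = Project("checkout-bot")
p.agents["bot"] = Agent("bot", secret_names=["API_TOKEN"])
p.specifications["refunds"] = "refund-cases"
p.evals["nightly"] = Eval("nightly", "bot", "refunds")
print(p.missing_references("nightly"))  # ['secret:API_TOKEN']
```

Keeping every asset inside one project is what lets an eval be checked as "runnable" from the project alone.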
Runs and results
- A run is a single execution of an eval or playground workflow.
- Results are the recorded outputs, scores, statuses, and logs produced by that run.
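A run and its results can be pictured as one execution record holding a list of per-case entries. The sketch below is hypothetical: the `Run` and `Result` classes, the status strings, and the `status_counts` helper are illustrative assumptions, not Spec27's actual result format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical shapes for illustration; Spec27's actual run/result schema may differ.


@dataclass
class Result:
    case_id: str
    output: str
    score: float | None  # None when nothing scored this case
    status: str          # assumed values such as "passed", "failed", "error"
    logs: list[str] = field(default_factory=list)


@dataclass
class Run:
    run_id: str
    source: str  # the eval or playground workflow that was executed
    results: list[Result] = field(default_factory=list)

    def status_counts(self) -> dict[str, int]:
        """Tally result statuses to summarize one run."""
        counts: dict[str, int] = {}
        for r in self.results:
            counts[r.status] = counts.get(r.status, 0) + 1
        return counts


run = Run("run-1", source="eval:nightly", results=[
    Result("c1", "ok", 1.0, "passed"),
    Result("c2", "refund issued twice", 0.0, "failed", logs=["judge: duplicate action"]),
    Result("c3", "", None, "error", logs=["timeout"]),
])
print(run.status_counts())  # {'passed': 1, 'failed': 1, 'error': 1}
```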
The most important idea: Specifications are reusable
A specification is the reusable definition of the test surface for an eval. It can include:
- a primary dataset
- one or more attack methods
- one or more adversarial dataset selections
- an optional judge-based evaluation setup
The same specification can be reused across multiple evals, for example to run one set of tests against several different agents.
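The reuse idea can be sketched as one specification object shared by several evals. Field names below mirror the list above, but the classes, the `evals_using` helper, and the example dataset, attack method, and judge names are hypothetical and do not reflect Spec27's API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch: field names mirror the documented parts of a
# specification but are not Spec27's API.


@dataclass(frozen=True)
class Specification:
    name: str
    primary_dataset: str
    attack_methods: tuple[str, ...] = ()
    adversarial_datasets: tuple[str, ...] = ()
    judge: str | None = None  # judge-based evaluation is optional


@dataclass(frozen=True)
class Eval:
    name: str
    agent: str
    specification: Specification


def evals_using(spec: Specification, evals: list[Eval]) -> list[str]:
    """Names of evals that share this exact specification object."""
    return [e.name for e in evals if e.specification is spec]


spec = Specification(
    name="refund-abuse",
    primary_dataset="refund-cases",
    attack_methods=("prompt-injection",),
    adversarial_datasets=("jailbreak-prompts",),
    judge="policy-judge",
)
evals = [
    Eval("agent-a-vs-refunds", "agent-a", spec),
    Eval("agent-b-vs-refunds", "agent-b", spec),
]
print(evals_using(spec, evals))  # ['agent-a-vs-refunds', 'agent-b-vs-refunds']
```

Because both evals hold the same specification, their results are directly comparable: only the agent differs.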
Visual flow
Organization -> Project -> Datasets / Agents / Secrets / Judges -> Specification -> Eval -> Run -> Results