Mental Model

Spec27 is easiest to understand as a chain of reusable assets, where each asset builds on the ones before it.

Organization

An organization is the top-level shared workspace. Membership, roles, and invite links are managed here.

Project

A project is the main container for evaluation work. It holds the assets and results for one stream of work.

Assets inside a project

  • Datasets hold test cases, examples, or rules.
  • Agents hold runnable logic.
  • Secrets provide protected values for agents.
  • Judges score outputs when simple equality checks are not enough.
  • Specifications define what to test and how to generate or include adversarial inputs.
  • Evals connect agents and specifications into a runnable evaluation setup.
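
The difference between a simple equality check and a judge can be sketched in Python. This is an illustration only: `exact_match` and `keyword_judge` are hypothetical names, not part of Spec27, and a real judge may use very different scoring logic than the toy keyword rule shown here.

```python
from typing import Callable

# Hypothetical scorer signature: (expected, actual) -> score in [0.0, 1.0].
Scorer = Callable[[str, str], float]


def exact_match(expected: str, actual: str) -> float:
    """Simple equality check: full credit only for identical strings."""
    return 1.0 if expected == actual else 0.0


def keyword_judge(expected: str, actual: str) -> float:
    """Toy stand-in for a judge: fraction of expected words found in the output."""
    wanted = set(expected.lower().split())
    found = set(actual.lower().split())
    return len(wanted & found) / len(wanted) if wanted else 0.0


expected = "refund issued"
actual = "Your refund was issued today"

print(exact_match(expected, actual))    # 0.0
print(keyword_judge(expected, actual))  # 1.0
```

The output is acceptable to a reader but fails the equality check, which is the kind of case where a judge is useful.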

Runs and results

  • A run is a single execution of an eval or playground workflow.
  • Results are the recorded outputs, scores, statuses, and logs produced by that run.
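
The relationship between a run and its results might be modeled as below. This is a minimal sketch, not the Spec27 data model: the class names, the field names, and the three status values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    # Hypothetical status values; the real set is not specified here.
    PASSED = "passed"
    FAILED = "failed"
    ERROR = "error"


@dataclass
class Result:
    """One recorded outcome: output, score, status, and logs."""
    case_id: str
    output: str
    score: float
    status: Status
    logs: List[str] = field(default_factory=list)


@dataclass
class Run:
    """A single execution of an eval, collecting the results it produced."""
    eval_name: str
    results: List[Result] = field(default_factory=list)

    def pass_rate(self) -> float:
        """Fraction of results with PASSED status (0.0 for an empty run)."""
        if not self.results:
            return 0.0
        passed = sum(1 for r in self.results if r.status is Status.PASSED)
        return passed / len(self.results)


run = Run(eval_name="support-bot-eval")
run.results.append(Result("case-1", "refund issued", 1.0, Status.PASSED))
run.results.append(Result("case-2", "", 0.0, Status.ERROR, logs=["agent timed out"]))

print(run.pass_rate())  # 0.5
```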

The most important idea: Specifications are reusable

A specification is the reusable definition of the test surface for an eval. It can include:

  • a primary dataset
  • one or more attack methods
  • one or more adversarial dataset selections
  • an optional judge-based evaluation setup

The same specification can be reused across multiple evals.
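
Reuse can be pictured as two evals pointing at the same specification object. The sketch below is illustrative only: `Specification` and `Eval` are hypothetical Python classes, and the dataset, attack method, judge, and agent names are placeholders rather than real Spec27 identifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Specification:
    """Reusable definition of what to test (hypothetical shape)."""
    name: str
    primary_dataset: str
    attack_methods: Tuple[str, ...] = ()
    adversarial_datasets: Tuple[str, ...] = ()
    judge: Optional[str] = None  # optional judge-based evaluation setup


@dataclass(frozen=True)
class Eval:
    """Connects one agent to one specification."""
    name: str
    agent: str
    specification: Specification


# Define the test surface once...
spec = Specification(
    name="prompt-injection-v1",
    primary_dataset="support-tickets",
    attack_methods=("role-play",),
    adversarial_datasets=("jailbreak-samples",),
    judge="policy-judge",
)

# ...and reuse it for two different agents.
eval_a = Eval("agent-a-eval", agent="agent-a", specification=spec)
eval_b = Eval("agent-b-eval", agent="agent-b", specification=spec)

print(eval_a.specification is eval_b.specification)  # True
```

Because both evals share one specification, the two agents are tested against the same datasets, attack methods, and judge setup, which keeps their results comparable.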

Visual flow

Organization -> Project -> Datasets / Agents / Secrets / Judges -> Specification -> Eval -> Run -> Results