Run Your First Evaluation

Purpose

Complete the first end-to-end Spec27 workflow and produce a run you can inspect later.

When to use it

Use this guide when you want a dependable baseline evaluation, not just a quick one-off preview.

Prerequisites

  • You have a project.
  • You have at least one Dataset and one Agent.
  • You understand what a Specification is at a high level.

Steps

  1. Open your project and create or confirm a Dataset with the test inputs you want to run.
  2. Create an Agent that can process those inputs.
  3. If the agent depends on protected values, add the required Secrets first.
  4. Create a Specification that references the primary dataset, along with any attack methods or adversarial dataset selections you want to include.
  5. Create an Eval that links the agent and the specification.
  6. Open the eval detail page and start a run.
  7. Wait for the run to complete, then open the run detail page.
  8. Review output, correctness, status, and any console logs captured during execution.
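The dependency order in the steps above matters: secrets before the agent, the dataset before the specification, and both before the eval. Here is a minimal sketch of that object graph in Python. All class and field names here are illustrative assumptions, not Spec27's actual API; they only model how the pieces link together.

```python
from dataclasses import dataclass, field

# Hypothetical models -- names and fields are assumptions, not Spec27's real API.

@dataclass
class Dataset:
    name: str                     # step 1: test inputs to run

@dataclass
class Agent:
    name: str
    secrets: dict = field(default_factory=dict)   # step 3: protected values added first

@dataclass
class Specification:
    primary_dataset: Dataset                      # step 4: points at the primary dataset
    attack_methods: list = field(default_factory=list)

@dataclass
class Eval:
    agent: Agent                                  # step 5: links agent and specification
    spec: Specification

# Steps 1-5, built in dependency order.
dataset = Dataset(name="baseline-inputs")
agent = Agent(name="my-agent", secrets={"API_KEY": "placeholder"})
spec = Specification(primary_dataset=dataset, attack_methods=["prompt-injection"])
evaluation = Eval(agent=agent, spec=spec)

# The eval resolves back through the specification to the dataset.
print(evaluation.spec.primary_dataset.name)
```

Because the eval only references the agent and specification, you can reuse one specification across several agents without recreating the dataset.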

Expected result

You have a completed run with persisted results that can be reviewed from the project’s results views.
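Step 7 (waiting for the run to complete) is the part most worth scripting if you check results programmatically. The polling helper below is a sketch under stated assumptions: `get_status` stands in for whatever status call Spec27 actually exposes, and the status strings are invented for illustration.

```python
import time

def wait_for_run(get_status, poll_seconds=0.0, max_polls=100):
    """Poll a status callable until the run reaches a terminal state.

    `get_status` is a placeholder for Spec27's real status call; the
    status names "completed" / "failed" are assumptions.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("run did not finish within the polling budget")

# Simulated status source standing in for the real API: two
# intermediate states, then a terminal one.
statuses = iter(["pending", "running", "completed"])
print(wait_for_run(lambda: next(statuses)))
```

Only read output, correctness, and console logs after a terminal status; results read mid-run may be incomplete.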

If you want a faster dry run instead

Use Playground when you want to try a configuration quickly before starting a more formal eval run.