Run Your First Evaluation
Purpose
Complete the first end-to-end Spec27 workflow and produce a run you can inspect later.
When to use it
Use this guide when you want a dependable baseline evaluation, not just a quick one-off preview.
Prerequisites
- You have a project.
- You have at least one Dataset and one Agent.
- You understand what a Specification is at a high level.
Steps
- Open your project and create or confirm a Dataset with the test inputs you want to run.
- Create an Agent that can process those inputs, or confirm that an existing Agent can.
- If the Agent depends on protected values, add the required Secrets before you start a run.
- Create a Specification that points at the primary Dataset and includes any attack methods or adversarial dataset selections you want.
- Create an Eval that links the agent and the specification.
- Open the eval detail page and start a run.
- Wait for the run to complete, then open the run detail page.
- Review output, correctness, status, and any console logs captured during execution.
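The steps above imply a chain of dependencies: an Eval links an Agent to a Specification, and the Specification points at a primary Dataset. The sketch below models that chain with plain Python dataclasses to show what must be in place before a run is started. It is an illustration only and not Spec27's API: every class, field, and function name here (including `preflight` and `required_secrets`) is a hypothetical stand-in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-ins for the objects named in the steps above.
# These are NOT Spec27 classes; they only mirror the relationships.
@dataclass
class Dataset:
    name: str
    inputs: list[str]


@dataclass
class Agent:
    name: str
    required_secrets: list[str] = field(default_factory=list)


@dataclass
class Specification:
    primary_dataset: Dataset
    attack_methods: list[str] = field(default_factory=list)


@dataclass
class Eval:
    agent: Agent
    specification: Specification


def preflight(ev: Eval, project_secrets: set[str]) -> list[str]:
    """Return a list of problems that would block a meaningful run."""
    problems = []
    if not ev.specification.primary_dataset.inputs:
        problems.append("primary dataset has no test inputs")
    for name in ev.agent.required_secrets:
        if name not in project_secrets:
            problems.append(f"missing secret: {name}")
    return problems


dataset = Dataset("smoke-tests", ["What is 2 + 2?"])
agent = Agent("baseline-agent", required_secrets=["API_KEY"])
spec = Specification(primary_dataset=dataset)
ev = Eval(agent=agent, specification=spec)

print(preflight(ev, project_secrets=set()))        # ['missing secret: API_KEY']
print(preflight(ev, project_secrets={"API_KEY"}))  # []
```

The point of the sketch is the ordering: the Dataset and any Secrets have to exist before the Eval that depends on them can produce a useful run.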
Expected result
You have a completed run with persisted results that can be reviewed from the project’s results views.
If you want a faster dry run instead
Use Playground when you want to try a configuration quickly before starting a more formal eval run.