Glossary
Agent
Runnable logic that processes an input.
Dataset
A collection of test entries, examples, or rules used during evaluation.
Eval
A named evaluation setup that connects agents and specifications.
Judge
A scoring configuration used when correctness requires interpretation.
Organization
The top-level shared workspace for members, projects, and collaboration.
Playground
An exploratory run surface for quickly trying configurations and generating derivative datasets.
Project
The main container for assets and results.
Result
A recorded output, score, or status produced during a run.
Run
A single execution of an eval or playground workflow.
Secret
A protected project-level value used by an agent at runtime.
Specification
A reusable evaluation recipe that defines the datasets, attack methods, and related scoring setup for an eval.
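
The relationships among these terms can be sketched as plain data types. This is a minimal illustration only, not the product's actual schema or API: every class, field, and function name below (including `handler`, `rubric`, and `run_eval`) is a hypothetical chosen to mirror the glossary wording. It assumes an organization holds projects, a project holds secrets, evals, and runs, an eval connects agents to specifications, and a run collects results.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# All names here are hypothetical, derived from the glossary wording only.

@dataclass
class Dataset:
    name: str
    entries: list[str] = field(default_factory=list)

@dataclass
class Judge:
    name: str
    rubric: str  # assumed: free-text guidance for interpretive scoring

@dataclass
class Specification:
    name: str
    datasets: list[Dataset]
    attack_methods: list[str]
    judge: Optional[Judge] = None  # the "related scoring setup"

@dataclass
class Agent:
    name: str
    handler: Callable[[str], str]  # runnable logic that processes an input

@dataclass
class Eval:
    name: str
    agents: list[Agent]
    specifications: list[Specification]

@dataclass
class Result:
    output: str
    score: Optional[float] = None
    status: str = "completed"

@dataclass
class Run:
    eval_name: str
    results: list[Result] = field(default_factory=list)

@dataclass
class Project:
    name: str
    secrets: dict[str, str] = field(default_factory=dict)
    evals: list[Eval] = field(default_factory=list)
    runs: list[Run] = field(default_factory=list)

@dataclass
class Organization:
    name: str
    members: list[str] = field(default_factory=list)
    projects: list[Project] = field(default_factory=list)

def run_eval(ev: Eval) -> Run:
    """Execute every agent on every dataset entry and record one result each."""
    run = Run(eval_name=ev.name)
    for agent in ev.agents:
        for spec in ev.specifications:
            for dataset in spec.datasets:
                for entry in dataset.entries:
                    run.results.append(Result(output=agent.handler(entry)))
    return run

# Usage: one agent, one specification, one two-entry dataset.
echo = Agent(name="echo", handler=lambda text: text.upper())
spec = Specification(
    name="smoke",
    datasets=[Dataset(name="greetings", entries=["hi", "hello"])],
    attack_methods=[],
)
demo_run = run_eval(Eval(name="demo", agents=[echo], specifications=[spec]))
print([result.output for result in demo_run.results])
```

The sketch leaves scoring out on purpose: the glossary says a judge is used when correctness requires interpretation, but does not say how scores are computed, so `score` stays unset here.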