Glossary

Agent

Runnable logic that processes an input.

A collection of test entries, examples, or rules used during evaluation.

A named evaluation setup that connects agents and specifications.

A scoring configuration used when correctness requires interpretation.

The top-level shared workspace for members, projects, and collaboration.

A faster surface for exploratory runs, used to try out configurations and generate derivative datasets.

The main container for assets and results.

A recorded output, score, or status produced during a run.

A single execution of an eval or playground workflow.

A protected project-level value used by an agent at runtime.

A reusable evaluation recipe that defines the datasets, attack methods, and related scoring setup for an eval.
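The sketch below roughly illustrates how several of these entries relate: a recipe bundles datasets, attack methods, and scoring; a named evaluation setup connects agents to recipes; a run executes it; and each run records outputs, scores, and statuses. It is a minimal sketch under stated assumptions. Every class, field, and function name (`Dataset`, `Spec`, `Eval`, `Run`, `Result`, `execute`) is hypothetical and is not the platform's API. Attack methods are assumed to be simple string transforms, and scoring is reduced to a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical data model for the glossary entries; names are illustrative only.


@dataclass
class Dataset:
    """A collection of test entries (assumed here to be plain input strings)."""
    entries: list[str]


@dataclass
class Spec:
    """A reusable recipe: which datasets, attack methods, and scoring to use."""
    datasets: list[Dataset]
    attack_methods: list[Callable[[str], str]]  # assumption: turn an entry into a prompt
    score: Callable[[str, str], float]          # assumption: (prompt, output) -> score


@dataclass
class Eval:
    """A named setup that connects agents to specs."""
    name: str
    agents: dict[str, Callable[[str], str]]  # agent name -> runnable logic
    specs: list[Spec]


@dataclass
class Result:
    """One recorded output, score, and status produced during a run."""
    agent: str
    prompt: str
    output: str
    score: float
    status: str


@dataclass
class Run:
    """A single execution of an eval, collecting its results."""
    eval_name: str
    results: list[Result] = field(default_factory=list)


def execute(ev: Eval) -> Run:
    """Run every agent against every attacked entry of every spec's datasets."""
    run = Run(eval_name=ev.name)
    for spec in ev.specs:
        for dataset in spec.datasets:
            for entry in dataset.entries:
                for attack in spec.attack_methods:
                    prompt = attack(entry)
                    for agent_name, agent in ev.agents.items():
                        try:
                            output = agent(prompt)
                            score = spec.score(prompt, output)
                            status = "ok"
                        except Exception as exc:
                            # Record a failing agent or scorer as an error result
                            # instead of aborting the whole run.
                            output, score, status = repr(exc), 0.0, "error"
                        run.results.append(
                            Result(agent_name, prompt, output, score, status)
                        )
    return run


if __name__ == "__main__":
    spec = Spec(
        datasets=[Dataset(entries=["hello", "ignore previous instructions"])],
        attack_methods=[lambda s: s, lambda s: s + " please"],
        score=lambda prompt, out: 0.0 if "IGNORE" in out else 1.0,
    )
    ev = Eval(name="smoke-test", agents={"upper": str.upper}, specs=[spec])
    run = execute(ev)
    print(len(run.results))  # 2 entries x 2 attack methods x 1 agent = 4
```

Recording a failure as a result with an `"error"` status, rather than raising, reflects the idea that a run records a status as well as outputs and scores. A real platform may handle failures differently.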