Glossary

Agent

Runnable logic that processes an input.

Dataset

A collection of test entries, examples, or rules used during evaluation.

Eval

A named evaluation setup that connects agents and specifications.

Judge

A scoring configuration used when correctness requires interpretation.

Organization

The top-level shared workspace for members, projects, and collaboration.

Playground

A lightweight surface for quick exploratory runs, used to try out configurations and generate derivative datasets.

Project

The main container for assets and results.

Result

A recorded output, score, or status produced during a run.

Run

A single execution of an eval or playground workflow.

Secret

A protected project-level value used by an agent at runtime.

Specification

A reusable evaluation recipe that defines the datasets, attack methods, and related scoring setup for an eval.
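
The relationships between several of these terms can be sketched as a small data model. This is an illustrative sketch only, not the product's actual schema or API: every class, field, and function name below (`Dataset`, `Specification`, `Eval`, `Result`, `run_eval`, the `"completed"` status) is a hypothetical stand-in, and an agent is assumed to be a plain function from an input string to an output string.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical model of the glossary terms; not a real API.
# Assumption: an agent is runnable logic that maps one input to one output.
Agent = Callable[[str], str]


@dataclass
class Dataset:
    """A collection of test entries used during evaluation."""
    name: str
    entries: List[str]


@dataclass
class Specification:
    """A reusable recipe: datasets plus attack methods (scoring omitted here)."""
    name: str
    datasets: List[Dataset]
    attack_methods: List[str] = field(default_factory=list)


@dataclass
class Eval:
    """A named setup that connects agents and specifications."""
    name: str
    agents: List[Agent]
    specifications: List[Specification]


@dataclass
class Result:
    """A recorded output and status produced during a run."""
    dataset: str
    entry: str
    output: str
    status: str  # assumed label; real status values are not given in the glossary


def run_eval(ev: Eval) -> List[Result]:
    """One run: execute every agent on every entry of every dataset."""
    results: List[Result] = []
    for agent in ev.agents:
        for spec in ev.specifications:
            for dataset in spec.datasets:
                for entry in dataset.entries:
                    results.append(
                        Result(dataset.name, entry, agent(entry), "completed")
                    )
    return results
```

Under these assumptions, calling `run_eval` once corresponds to a single run, and the list it returns corresponds to the results recorded for that run.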