CLI: grade
vally eval --output jsonl | vally grade --eval-spec eval.yaml [options]vally grade --eval-spec eval.yaml [options] < outcomes.jsonlDescription
Section titled “Description”Grade trajectories read from stdin. Accepts trial-result JSONL from vally eval --output jsonl, legacy EvalOutcome JSONL, a single EvalOutcome JSON object from eval, ATIF, or a session run from an agent like Copilot CLI. The input format is auto-detected from the JSON shape. This lets you iterate on graders without re-running the (expensive) agent execution.
The command reads from stdin and matches each trajectory’s stimulus to the graders defined in the eval spec.
Options
Section titled “Options”| Flag | Type | Required | Description |
|---|---|---|---|
--eval-spec <path> | string | Yes | Path to eval spec file |
--stimulus <name> | string | No | Stimulus name to grade against (ATIF or session run input only; required when prompt matches multiple stimuli) |
--workspace <path> | string | No | Path to the agent workspace directory for file-based graders (ATIF or session run input only; defaults to the directory in the session run, or process.cwd() for ATIF) |
--output jsonl | string | No | Emit graded results as JSONL (one outcome per line) |
--judge-model <model> | string | No | Model for LLM judge graders |
--grader-plugin <specifier> | string | No | Grader plugin to load (npm package name or local path) |
--verbose | boolean | No | Show detailed grader evidence |
Exit codes
Section titled “Exit codes”| Code | Meaning |
|---|---|
0 | All graders passed |
1 | One or more graders failed, or an error occurred |
Examples
Section titled “Examples”# Pipe directly from evalvally eval --eval-spec eval.yaml --skip-grade --output jsonl \ | vally grade --eval-spec eval.yaml
# Grade a saved JSONL filevally grade --eval-spec eval.yaml < results/outcomes.jsonl
# Grade with verbose outputvally grade --eval-spec eval.yaml --verbose < results/outcomes.jsonl
# Grade and re-emit as JSONL (for downstream processing)vally grade --eval-spec eval.yaml --output jsonl < results/outcomes.jsonl > graded.jsonl
# Grade an ATIF trajectory (auto-detected from schema_version)vally grade --eval-spec eval.yaml < trajectory.json
# Grade ATIF with explicit stimulus and workspacevally grade --eval-spec eval.yaml --stimulus basic --workspace /path/to/workspace < trajectory.json
# Grade a Copilot CLI session run, by id (auto-resolves from ~/.copilot/session-state/)echo '{"sessionKind":"copilot-cli","sessionId":"abc123"}' | vally grade --eval-spec eval.yaml
# Grade a Copilot CLI session run by explicit pathecho '{"sessionKind":"copilot-cli","sessionPath":"/path/to/events.jsonl"}' | vally grade --eval-spec eval.yaml
# Grade a Copilot CLI session run with explicit stimulus and workspaceecho '{"sessionKind":"copilot-cli","sessionId":"abc123"}' | \ vally grade --eval-spec eval.yaml --stimulus task-name --workspace /path/to/workspaceInput formats
Section titled “Input formats”The grade command auto-detects the input shape from the JSON content and accepts:
- trial-result JSONL — current
vally eval --output jsonloutput, includingtrial-resultrecords and typed metadata such asrun-summary - Legacy EvalOutcome JSONL — one JSON object per line, as emitted by older
eval --output jsonlflows - Single EvalOutcome JSON — a whole-file JSON object
- ATIF JSON — an Agent Trajectory Interchange Format document (identified by
schema_version: "ATIF-...") - Session Run JSON — an explicit session run reference (
sessionKind: "copilot-cli"with eithersessionId, set to the session ID orsessionPath, an explicit path to a copilot session log)
Session Run format
Section titled “Session Run format”A session run provides a reference to Copilot CLI event logs:
{ "sessionKind": "copilot-cli", "sessionId": "abc123"}Or with an explicit path:
{ "sessionKind": "copilot-cli", "sessionPath": "/path/to/events.jsonl"}When sessionId is provided, the events file is auto-resolved from ~/.copilot/session-state/<sessionId>/events.jsonl. Specify sessionPath to provide an explicit location. Do not set both sessionId and sessionPath.
Mixing formats
Section titled “Mixing formats”ATIF documents and EvalOutcome records must not be mixed in the same stream; grade will exit with an error if both shapes appear together. Session runs cannot be mixed with EvalOutcomes or ATIF.
Output format
Section titled “Output format”━━━ basic-test-generation ━━━✅ basic-test-generation (2/2 graders passed) ✓ [file-exists] Files matching 'add.test.js' found: add.test.js ✓ [output-contains] 'test' found in output
Score: 100.0% | PASSEDOn failure:
━━━ basic-test-generation ━━━❌ basic-test-generation (1/2 graders passed, 1 failed) ✓ [output-contains] 'test' found in output ✗ [file-exists] No files matching 'add.test.js' found
Score: 33.3% | FAILEDWorkflow
Section titled “Workflow”The typical workflow with grade:
- Run eval with
--skip-grade --output jsonl > outcomes.jsonlto capture trajectories - Iterate on your
eval.yamlgraders - Re-grade:
vally grade --eval-spec eval.yaml < outcomes.jsonl - Repeat until graders work correctly, then run the full eval