CLI: experiment

    vally experiment run <experiment-file> [options]

Description

Resolve and execute an experiment file: a declarative spec that runs the same eval set across multiple variants (e.g. different models, skill sets, MCP server configurations) for controlled A/B comparison.
Every (eval × variant) combination flows through a single shared worker pool, so one slow variant does not block the others. Each variant gets its own output directory containing results.jsonl (one trial-result record per trial, with an experiment provenance block) and run-summary.jsonl. A single experiment-level markdown report (report.md) sits alongside the per-variant subdirectories.
See Experiment File Schema for the YAML format.
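
The shared-pool behaviour described above can be illustrated with a minimal Python sketch. This is not vally's actual scheduler: `run_trial`, `EVALS`, and `VARIANTS` are hypothetical stand-ins, and the sketch only shows the general idea of submitting every (eval × variant) cell to one pool so that results arrive in completion order rather than variant by variant.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product

# Hypothetical eval and variant names, for illustration only.
EVALS = ["eval-a", "eval-b"]
VARIANTS = ["baseline", "treatment"]


def run_trial(eval_name: str, variant: str) -> dict:
    """Stand-in for executing one trial; a real runner would do the work here."""
    return {"eval": eval_name, "variant": variant, "passed": True}


def run_experiment(evals, variants, workers=5):
    """Submit every (eval x variant) cell to one pool and group results by variant."""
    results = {variant: [] for variant in variants}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_trial, e, v) for e, v in product(evals, variants)]
        # Collect in completion order, so a slow cell never delays the fast ones.
        for future in as_completed(futures):
            record = future.result()
            results[record["variant"]].append(record)
    return results


if __name__ == "__main__":
    for variant, records in run_experiment(EVALS, VARIANTS).items():
        print(variant, len(records))
```

The default of `workers=5` mirrors the documented `--workers` default; everything else in the sketch is an assumption made for the example.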
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `<experiment-file>` | path | — | Required positional argument. Path to the experiment YAML file. |
| `--variant <name>` | string | — | Run only the named variant. Useful for CI splits or partial reruns. |
| `--output-dir <path>` | path | `./vally-experiment-results` | Directory for output files. A timestamped subdirectory is created inside. |
| `--workers <n>` | integer | 5 | Max concurrent trials across all variants. |
| `--dry-run` | flag | false | Resolve the experiment and print the plan without executing. |
Exit codes
| Code | Meaning |
|---|---|
| 0 | All variants completed and every trial passed. |
| 1 | Any trial failed, the experiment file did not resolve, or `--variant` named an unknown variant. |
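
For scripting around these exit codes, a small Python wrapper might look like the sketch below. It assumes `vally` is on `PATH` and uses only the flags documented in the Options table; the helper names `build_command` and `run_and_check` are hypothetical and not part of vally.

```python
import subprocess
from typing import List, Optional


def build_command(experiment_file: str,
                  variant: Optional[str] = None,
                  output_dir: Optional[str] = None,
                  workers: Optional[int] = None,
                  dry_run: bool = False) -> List[str]:
    """Assemble an argv list from the documented flags; omitted flags keep their defaults."""
    cmd = ["vally", "experiment", "run", experiment_file]
    if variant is not None:
        cmd += ["--variant", variant]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_and_check(cmd: List[str]) -> bool:
    """Run the command; True means exit code 0 (all variants completed, every trial passed)."""
    completed = subprocess.run(cmd)
    return completed.returncode == 0


if __name__ == "__main__":
    # Only prints the command; call run_and_check(cmd) to actually execute it.
    cmd = build_command("experiments/skill-comparison.yaml", variant="treatment")
    print(" ".join(cmd))
```

Because exit code 1 covers trial failures, an unresolved experiment file, and an unknown variant name alike, a wrapper like this cannot tell those cases apart from the code alone.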
Output layout

    <output-dir>/
    └── 2026-06-09T20-41-10-885Z/      # timestamped run directory
        ├── report.md                  # experiment-level markdown report
        ├── baseline/                  # one subdirectory per variant
        │   ├── results.jsonl          # trial-result records + experiment block
        │   └── run-summary.jsonl
        └── treatment/
            ├── results.jsonl
            └── run-summary.jsonl

Each `results.jsonl` line is a `TrialResultRecord` (the same shape `vally eval` writes) with an extra `experiment` field carrying the run ID, variant name, and content hashes of the eval and resolved config.
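
As a rough illustration of consuming this layout, the Python sketch below locates the newest run directory and loads each variant's `results.jsonl`. It relies only on the directory structure shown above. The helper names are hypothetical, and picking the newest run by sorting directory names is an assumption that holds only while every run directory uses the same fixed-width timestamp format. The key names inside each record are not specified here, so the sketch treats records as opaque dictionaries.

```python
import json
from pathlib import Path
from typing import Dict, List


def latest_run_dir(output_dir: Path) -> Path:
    """Pick the newest run directory, assuming timestamped names sort lexicographically."""
    runs = sorted(p for p in output_dir.iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no run directories under {output_dir}")
    return runs[-1]


def load_results(run_dir: Path) -> Dict[str, List[dict]]:
    """Map each variant subdirectory name to its parsed results.jsonl records."""
    results = {}
    for results_file in sorted(run_dir.glob("*/results.jsonl")):
        with results_file.open(encoding="utf-8") as handle:
            # One JSON object per non-empty line.
            records = [json.loads(line) for line in handle if line.strip()]
        results[results_file.parent.name] = records
    return results


if __name__ == "__main__":
    out = Path("./vally-experiment-results")
    if out.is_dir():
        for variant, records in load_results(latest_run_dir(out)).items():
            print(f"{variant}: {len(records)} trial records")
```

Using the subdirectory name as the variant label avoids depending on the exact field names inside the `experiment` block.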
Examples

    # Run every variant
    vally experiment run experiments/skill-comparison.yaml

    # Run a single variant (e.g. for a CI matrix split)
    vally experiment run experiments/skill-comparison.yaml --variant treatment

    # Print the resolved plan without executing
    vally experiment run experiments/skill-comparison.yaml --dry-run

    # Custom output location
    vally experiment run experiments/skill-comparison.yaml \
      --output-dir ./runs/skill-comparison

    # Run a single cell of a matrix experiment (names are generated as axis=label,…)
    vally experiment run experiments/model-x-skills.yaml \
      --variant "model=gpt-5.5,skills=none"

Workflow
- Author an experiment YAML file declaring the eval set, the `vary` axis, and the named variants, or a `matrix` of axes that expands into them automatically. See Experiment File Schema.
- Run `vally experiment run <file>` (optionally with `--dry-run` first to inspect the plan).
- Inspect per-variant `results.jsonl` directly, ingest the run directory with `vally ingest`, or browse it via `vally serve`.
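
To make the matrix naming pattern concrete, here is a small Python sketch that expands axes into cell names of the form `axis=label,…`, matching the `model=gpt-5.5,skills=none` example. It is only an illustration: the function name is hypothetical, the labels `other-model` and `all` are invented for the example, and the real expansion order and label rules are defined by the Experiment File Schema, not by this code.

```python
from itertools import product
from typing import Dict, List


def matrix_variant_names(axes: Dict[str, List[str]]) -> List[str]:
    """Expand axes into one name per cell: axis=label pairs joined by commas."""
    axis_names = list(axes)  # assumption: axes keep their declaration order
    names = []
    for labels in product(*(axes[name] for name in axis_names)):
        pairs = (f"{axis}={label}" for axis, label in zip(axis_names, labels))
        names.append(",".join(pairs))
    return names


if __name__ == "__main__":
    # "other-model" and "all" are hypothetical labels used only for this example.
    axes = {"model": ["gpt-5.5", "other-model"], "skills": ["none", "all"]}
    for name in matrix_variant_names(axes):
        print(name)
```

A name produced this way is the kind of value passed to `--variant` when running a single cell; quoting it in the shell keeps the comma-separated string as one argument.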