CLI: experiment

vally experiment run <experiment-file> [options]

Resolve and execute an experiment file — a declarative spec that runs the same eval set across multiple variants (e.g. different models, skill sets, MCP server configurations) for controlled A/B comparison.

Every (eval × variant) combination flows through a single shared worker pool, so one slow variant does not block the others. Each variant gets its own output directory containing results.jsonl (one trial-result record per trial, with an experiment provenance block) and run-summary.jsonl. A single experiment-level markdown report (report.md) sits alongside the per-variant subdirectories.
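The shared-pool behaviour described above can be pictured with a small Python sketch. This is only an illustration of the scheduling idea, not vally's implementation: `run_trial`, `run_all`, and the eval and variant names are hypothetical placeholders, and the default of 5 workers simply mirrors the documented `--workers` default.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product


def run_trial(eval_name: str, variant: str) -> dict:
    """Hypothetical stand-in for executing one eval under one variant."""
    return {"eval": eval_name, "variant": variant, "passed": True}


def run_all(evals: list[str], variants: list[str], workers: int = 5) -> list[dict]:
    """Submit every (eval x variant) cell to one shared pool.

    Because all cells share the same pool, a slow variant only occupies the
    workers it is currently using; cells from other variants keep running.
    """
    cells = list(product(evals, variants))
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_trial, e, v) for e, v in cells]
        # Collect results in completion order, not submission order.
        for future in as_completed(futures):
            results.append(future.result())
    return results
```

Calling `run_all(["eval-a", "eval-b"], ["baseline", "treatment"])` yields four records, one per cell, in whatever order the trials finish.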

See Experiment File Schema for the YAML format.

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| <experiment-file> | path | | Required positional argument. Path to the experiment YAML file. |
| --variant <name> | string | | Run only the named variant. Useful for CI splits or partial reruns. |
| --output-dir <path> | path | ./vally-experiment-results | Directory for output files. A timestamped subdirectory is created inside. |
| --workers <n> | integer | 5 | Max concurrent trials across all variants. |
| --dry-run | flag | false | Resolve the experiment and print the plan without executing. |

| Code | Meaning |
| --- | --- |
| 0 | All variants completed and every trial passed. |
| 1 | Any trial failed, the experiment file did not resolve, or --variant named an unknown variant. |

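For scripted use, the flags and exit codes above can be wrapped in a thin Python helper. The sketch below is a hedged example: `build_command`, `describe_exit`, and `run` are hypothetical helper names, only the documented flags are used, and any exit code other than 0 or 1 is reported as undocumented rather than interpreted.

```python
import subprocess
import sys


def build_command(experiment_file, variant=None, output_dir=None,
                  workers=None, dry_run=False):
    """Assemble an argv list using only the documented flags."""
    cmd = ["vally", "experiment", "run", experiment_file]
    if variant is not None:
        cmd += ["--variant", variant]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def describe_exit(code: int) -> str:
    """Map an exit code to the meaning given in the exit-code table."""
    if code == 0:
        return "all variants completed and every trial passed"
    if code == 1:
        return ("a trial failed, the experiment file did not resolve, "
                "or --variant named an unknown variant")
    return f"undocumented exit code {code}"


def run(cmd) -> int:
    """Run the command and return its exit code without raising on failure."""
    return subprocess.run(cmd).returncode
```

A CI step could call `run(build_command("experiments/skill-comparison.yaml", variant="treatment"))` and pass the returned code to `sys.exit` so the job fails whenever the experiment does.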
<output-dir>/
└── 2026-06-09T20-41-10-885Z/        # timestamped run directory
    ├── report.md                    # experiment-level markdown report
    ├── baseline/                    # one subdirectory per variant
    │   ├── results.jsonl            # trial-result records + experiment block
    │   └── run-summary.jsonl
    └── treatment/
        ├── results.jsonl
        └── run-summary.jsonl
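
Given this layout, a script can locate the most recent run and its variant subdirectories. The following Python sketch rests on two assumptions that the page does not state explicitly: that every subdirectory of the output directory is a run directory, and that the timestamp names sort chronologically as plain strings (true for the fixed-width format shown). `latest_run_dir` and `variant_dirs` are hypothetical helper names.

```python
from pathlib import Path


def latest_run_dir(output_dir: Path) -> Path:
    """Return the run directory whose timestamped name sorts last."""
    runs = sorted((p for p in output_dir.iterdir() if p.is_dir()),
                  key=lambda p: p.name)
    if not runs:
        raise FileNotFoundError(f"no run directories under {output_dir}")
    return runs[-1]


def variant_dirs(run_dir: Path) -> list[Path]:
    """Return the subdirectories of a run that contain a results.jsonl."""
    return sorted(
        (p for p in run_dir.iterdir()
         if p.is_dir() and (p / "results.jsonl").is_file()),
        key=lambda p: p.name,
    )
```

For the tree above, `variant_dirs(latest_run_dir(Path("./vally-experiment-results")))` would return the `baseline` and `treatment` directories, while `report.md` is skipped because it is a file.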

Each results.jsonl line is a TrialResultRecord (the same shape vally eval writes) with an extra experiment field carrying the run ID, variant name, and content hashes of the eval and resolved config.
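
As a rough illustration of consuming these files, the Python sketch below loads every variant's results.jsonl from a run directory and groups the records by variant directory name. It relies only on what is stated above: one JSON object per line and a top-level `experiment` field. The keys inside that field are not listed on this page, so the sketch does not read them. `load_records` and `records_by_variant` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_records(results_path: Path) -> list[dict]:
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with results_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def records_by_variant(run_dir: Path) -> dict[str, list[dict]]:
    """Group trial records by the variant subdirectory they were written to."""
    grouped = {}
    for results_path in sorted(run_dir.glob("*/results.jsonl")):
        records = load_records(results_path)
        # Every record from an experiment run should carry provenance.
        if any("experiment" not in record for record in records):
            raise ValueError(f"record without experiment block in {results_path}")
        grouped[results_path.parent.name] = records
    return grouped
```

Counting trials per variant is then `{name: len(recs) for name, recs in records_by_variant(run_dir).items()}`.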

# Run every variant
vally experiment run experiments/skill-comparison.yaml

# Run a single variant (e.g. for a CI matrix split)
vally experiment run experiments/skill-comparison.yaml --variant treatment

# Print the resolved plan without executing
vally experiment run experiments/skill-comparison.yaml --dry-run

# Custom output location
vally experiment run experiments/skill-comparison.yaml \
  --output-dir ./runs/skill-comparison

# Run a single cell of a matrix experiment (names are generated as axis=label,…)
vally experiment run experiments/model-x-skills.yaml \
  --variant "model=gpt-5.5,skills=none"

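
To make the generated matrix names concrete, here is a small Python sketch that expands a set of axes into axis=label names of the kind passed to --variant above. It is an assumption-laden illustration: `expand_matrix` is a hypothetical helper, the labels "other-model" and "all" are placeholders, and the axis ordering and exact naming rules used by vally are not specified on this page, so the --dry-run output remains the authoritative source for the real names.

```python
from itertools import product


def expand_matrix(axes: dict[str, list[str]]) -> list[str]:
    """Expand axes into one name per cell, joined as axis=label pairs.

    Axis order follows the insertion order of the dict (an assumption).
    """
    axis_names = list(axes)
    names = []
    for labels in product(*(axes[name] for name in axis_names)):
        pairs = (f"{axis}={label}" for axis, label in zip(axis_names, labels))
        names.append(",".join(pairs))
    return names


# Placeholder axes; only "gpt-5.5" and "none" appear in the examples above.
cells = expand_matrix({
    "model": ["gpt-5.5", "other-model"],
    "skills": ["none", "all"],
})
```

Here `cells` holds four names, the first of which is "model=gpt-5.5,skills=none".
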
  1. Author an experiment YAML file declaring the eval set, the vary axis, and the named variants — or a matrix of axes that expands into them automatically. See Experiment File Schema.
  2. Run vally experiment run <file> (optionally with --dry-run first to inspect the plan).
  3. Inspect per-variant results.jsonl directly, ingest the run directory with vally ingest, or browse it via vally serve.