Skip to content

Vally

Vally

Stop guessing.
Start grading.

Vally is the modular platform for agent evaluation.
Define stimuli, capture trajectories, grade behavior — and catch regressions before they reach production.

See it in action

One CLI. Every platform

Terminal — vally eval
Dashboard & API

Explore results. Spot regressions.

Spin up the built-in dashboard to explore scores, catch regressions, and track trends over time. Or go further — the Results API puts every eval metric within reach of your pipelines, your workflows, and your agents.

Vally dashboard showing pass rate stats and duration chartVally dashboard score matrix and all outcomes table

Why Vally?


Fast by default

Static graders finish in under 2 seconds. LLM judges are opt-in. You get instant feedback during development and full confidence in CI.

Everything is a Grader

File checks, string matching, LLM judges, custom scripts — one interface. Write a grader once, compose it into any eval.

One pipeline, every loop

Same model from local dev to CI to production. Define stimuli, capture trajectories, grade with any combination.

Extend, don’t fork

Custom graders, executors, and reporters plug in cleanly. Bring your own scoring logic without changing the framework.

Ready to grade your agent?