Vally
Stop guessing.
Start grading.
Vally is the modular platform for agent evaluation.
Define stimuli, capture trajectories, grade behavior, and catch regressions before they reach production.
One CLI. Every platform.
Explore results. Spot regressions.
Spin up the built-in dashboard to explore scores, catch regressions, and track trends over time. Or go further — the Results API puts every eval metric within reach of your pipelines, your workflows, and your agents.


Why Vally?
Fast by default
Static graders finish in under 2 seconds. LLM judges are opt-in. You get instant feedback during development and full confidence in CI.
Everything is a Grader
File checks, string matching, LLM judges, custom scripts — one interface. Write a grader once, compose it into any eval.
One pipeline, every loop
The same pipeline runs from local dev to CI to production. Define stimuli, capture trajectories, grade with any combination.
Extend, don’t fork
Custom graders, executors, and reporters plug in cleanly. Bring your own scoring logic without changing the framework.
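
To make the "one interface" idea concrete, here is a minimal Python sketch of how a single grader interface might look. It is an illustration only, not Vally's actual API: every name in it (Trajectory, Score, Grader, ContainsText, FileExists, CustomCheck, run_eval) is hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Trajectory:
    """Hypothetical record of one agent run: final output plus files it wrote."""
    output: str
    files: dict[str, str]


@dataclass
class Score:
    """Hypothetical result of a single grader."""
    name: str
    passed: bool


class Grader(Protocol):
    """The single interface every grader in this sketch satisfies."""
    name: str

    def grade(self, trajectory: Trajectory) -> Score: ...


@dataclass
class ContainsText:
    """String matching: passes if the output contains `needle`."""
    needle: str
    name: str = "contains_text"

    def grade(self, trajectory: Trajectory) -> Score:
        return Score(self.name, self.needle in trajectory.output)


@dataclass
class FileExists:
    """File check: passes if the agent produced a file at `path`."""
    path: str
    name: str = "file_exists"

    def grade(self, trajectory: Trajectory) -> Score:
        return Score(self.name, self.path in trajectory.files)


@dataclass
class CustomCheck:
    """Custom logic: wraps any user-supplied predicate as a grader."""
    check: Callable[[Trajectory], bool]
    name: str = "custom_check"

    def grade(self, trajectory: Trajectory) -> Score:
        return Score(self.name, self.check(trajectory))


def run_eval(trajectory: Trajectory, graders: list[Grader]) -> list[Score]:
    """Apply every grader to the same trajectory, in order."""
    return [g.grade(trajectory) for g in graders]
```

Because each grader in the sketch exposes the same grade method, a built-in check and a custom one can sit side by side in one list, for example run_eval(trajectory, [ContainsText("succeeded"), FileExists("README.md")]). That is the sense in which a grader written once can be reused in any eval.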