Add to CI
This quickstart adds skill validation to your CI pipeline so every PR gets checked automatically.
Run vally lint as an initial CI gate on every PR — it also belongs in your dev loop (pre-commit, save-time, or IDE integration). Once lint passes, run vally eval for behavioral checks before merge.
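For the pre-commit part of the dev loop, one possible setup is a local hook in the pre-commit framework. This is a minimal sketch, not something vally ships: it assumes the pre-commit framework is installed and that vally is already on your PATH, and the hook id and name are illustrative.

```yaml
# .pre-commit-config.yaml (assumes the pre-commit framework; hook id and name are illustrative)
repos:
  - repo: local
    hooks:
      - id: vally-lint
        name: vally lint
        entry: vally lint .
        language: system        # use the vally binary already on PATH
        pass_filenames: false   # lint the whole repo once instead of per staged file
        files: '(SKILL\.md|\.ya?ml)$'   # only trigger when skill or YAML files are staged
```

Running the same command locally that CI runs keeps the two in agreement, so a commit that passes the hook should also pass the lint gate.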
Add the lint step (every PR)
Create .github/workflows/skill-lint.yml:

    name: Skill Lint
    on:
      pull_request:
        paths:
          - "**/SKILL.md"
          - "**/*.yaml"
    jobs:
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: "22"
          - run: npm install -g @microsoft/vally-cli
          - name: Lint skills
            run: vally lint .
          - name: Validate eval specs
            run: vally lint --eval-spec eval.yaml --strict

This lint pass catches:
- Invalid SKILL.md format (name, description, frontmatter)
- Broken file references
- Orphaned reference files
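For orientation, the first check in the list concerns the YAML frontmatter at the top of SKILL.md. The sketch below assumes only the two fields named in that check. The values are placeholders, and the exact rules vally enforces on them are not specified here.

```yaml
---
# Frontmatter at the top of SKILL.md (placeholder values, hypothetical skill)
name: example-skill
description: One-sentence summary of what the skill does and when to use it.
---
```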
Add eval for behavioral checks (every PR)
Once lint passes, run agent-level evals before merge:
.github/workflows/skill-eval.yml:

    name: Skill Eval
    on:
      pull_request:
        paths:
          - "**/SKILL.md"
          - "**/eval.yaml"
    jobs:
      eval:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: "22"
          - run: npm install -g @microsoft/vally-cli
          - name: Run evals
            run: |
              vally eval \
                --eval-spec eval.yaml \
                --skill-dir . \
                --output-dir ./results
            env:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          - name: Upload trajectories
            if: always()
            uses: actions/upload-artifact@v4
            with:
              name: eval-results
              path: ./results/

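Because the lint and eval workflows live in separate files, GitHub Actions triggers them independently, so the eval run does not wait for lint to finish. If you want eval to start only after lint succeeds, one option is a single workflow that links the jobs with `needs:`. This is a minimal sketch using the same commands as above. The file name skill-ci.yml and the job layout are illustrative, not something vally requires.

```yaml
# .github/workflows/skill-ci.yml (hypothetical file name)
name: Skill CI
on:
  pull_request:
    paths:
      - "**/SKILL.md"
      - "**/*.yaml"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - run: vally lint .
  eval:
    needs: lint   # skipped unless the lint job succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - run: vally eval --eval-spec eval.yaml --skill-dir . --output-dir ./results
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

The trade-off is that eval no longer runs in parallel with lint, which lengthens the PR check but avoids spending eval time on skills that fail validation.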
Use exit codes for gating
Both lint and eval set process.exitCode = 1 on failure, so GitHub Actions will automatically mark the step as failed.
- vally lint → exits 1 if any skill fails validation
- vally eval → exits 1 if any eval verdict fails, unless you intentionally lower the threshold for that run (for example, --threshold 0); execution and tooling errors still exit 1
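As an illustration of the threshold behavior described above, the step fragments below sketch two ways to run evals without blocking a PR on failed verdicts. The first uses the `--threshold 0` flag mentioned above. The second relies on the GitHub Actions `continue-on-error` step setting rather than on anything in vally. The step names are illustrative.

```yaml
# Option 1: report-only run. Failed verdicts do not fail the step,
# but execution and tooling errors still exit 1.
- name: Run evals (report only)
  run: vally eval --eval-spec eval.yaml --skill-dir . --output-dir ./results --threshold 0

# Option 2: let the step fail without failing the job (GitHub Actions feature).
# Note that this also hides execution and tooling errors from the job status.
- name: Run evals (non-blocking)
  continue-on-error: true
  run: vally eval --eval-spec eval.yaml --skill-dir . --output-dir ./results
```

Option 1 is usually the safer choice while a new eval spec is settling in, since genuine tooling failures still surface as a failed step.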
Running Specific Suites
Define suites in .vally.yaml to run different eval subsets in different CI contexts:

    suites:
      ci-gate:
        filter: { priority: [p0, p1] }
      nightly:
        filter: { priority: [p0, p1, p2] }

    # .github/workflows/ci.yml: fast suite on every PR
    - name: Run CI gate evals
      run: vally eval --suite ci-gate

    # .github/workflows/nightly.yml: comprehensive nightly run
    - name: Run nightly evals
      run: vally eval --suite nightly

This keeps your CI fast while ensuring comprehensive coverage in scheduled runs.
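The nightly step above needs a workflow that runs on a schedule rather than on pull requests. The sketch below shows what that surrounding workflow might look like. The cron time, the manual trigger, and the job layout are assumptions. Only the `vally eval --suite nightly` command comes from this guide.

```yaml
# .github/workflows/nightly.yml (schedule and job layout are assumptions)
name: Nightly Skill Eval
on:
  schedule:
    - cron: "0 3 * * *"   # every day at 03:00 UTC
  workflow_dispatch:       # also allow manual runs from the Actions tab
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - name: Run nightly evals
        run: vally eval --suite nightly
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Scheduled workflows run against the repository's default branch, so the nightly suite reflects what has already been merged.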
Next steps
- Grader taxonomy: pick graders by determinism and cost
- Writing eval specs: design better stimuli