Add to CI

This quickstart adds skill validation to your CI pipeline so every PR gets checked automatically.

Run vally lint as the first CI gate on every PR. It also belongs in your dev loop (pre-commit, save-time, or IDE integration). Once lint passes, run vally eval for behavioral checks before merge.
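
For the pre-commit part of the dev loop, one option is a local hook for the pre-commit framework. This is a minimal sketch, not an official integration: it assumes vally is already installed on the developer's PATH (for example via the same npm install used in CI), and the hook id and name are illustrative.

```yaml
# .pre-commit-config.yaml (assumed setup; hook id and name are illustrative)
repos:
  - repo: local
    hooks:
      - id: vally-lint
        name: vally lint
        entry: vally lint .          # same command the CI lint step runs
        language: system             # use the vally binary already on PATH
        pass_filenames: false        # lint the whole repo, not only staged files
        files: '(SKILL\.md|\.yaml)$' # trigger only when skills or specs change
```

With pass_filenames set to false, the hook runs the same whole-repo lint as CI, so a clean local commit should also pass the lint gate.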

  1. Add the lint step (every PR)

    Create .github/workflows/skill-lint.yml:

    .github/workflows/skill-lint.yml
    name: Skill Lint
    on:
      pull_request:
        paths:
          - "**/SKILL.md"
          - "**/*.yaml"
    jobs:
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: "22"
          - run: npm install -g @microsoft/vally-cli
          - name: Lint skills
            run: vally lint .
          - name: Validate eval specs
            run: vally lint --eval-spec eval.yaml --strict

    This lint pass catches:

    • Invalid SKILL.md format (name, description, frontmatter)
    • Broken file references
    • Orphaned reference files
  2. Add eval for behavioral checks (every PR)

    Once lint passes, run agent-level evals before merge:

    .github/workflows/skill-eval.yml
    name: Skill Eval
    on:
      pull_request:
        paths:
          - "**/SKILL.md"
          - "**/eval.yaml"
    jobs:
      eval:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-node@v4
            with:
              node-version: "22"
          - run: npm install -g @microsoft/vally-cli
          - name: Run evals
            run: |
              vally eval \
                --eval-spec eval.yaml \
                --skill-dir . \
                --output-dir ./results
            env:
              GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          - name: Upload trajectories
            if: always()
            uses: actions/upload-artifact@v4
            with:
              name: eval-results
              path: ./results/
  3. Use exit codes for gating

    Both lint and eval set process.exitCode = 1 on failure, so GitHub Actions will automatically mark the step as failed.

    • vally lint → exits 1 if any skill fails validation
    • vally eval → exits 1 if any eval verdict fails, unless you intentionally lower the threshold for that run (for example, --threshold 0); execution and tooling errors still exit 1
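
Because both commands report failure through their exit code, a job dependency can enforce the lint-then-eval order. The two workflows above run independently, so the following is a sketch of one possible alternative that merges them into a single file. The file name skill-ci.yml and the job names are assumptions; the commands and flags are the ones used in steps 1 and 2.

```yaml
# .github/workflows/skill-ci.yml (hypothetical combined workflow)
name: Skill CI
on:
  pull_request:
    paths:
      - "**/SKILL.md"
      - "**/*.yaml"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - run: vally lint .            # a non-zero exit fails this job
  eval:
    needs: lint                      # skipped unless the lint job succeeded
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - run: vally eval --eval-spec eval.yaml --skill-dir . --output-dir ./results
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

With needs in place, a lint failure stops the run before the slower eval job starts.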

Define suites in .vally.yaml to run different eval subsets in different CI contexts:

.vally.yaml
suites:
  ci-gate:
    filter: { priority: [p0, p1] }
  nightly:
    filter: { priority: [p0, p1, p2] }

# .github/workflows/ci.yml: fast suite on every PR
- name: Run CI gate evals
  run: vally eval --suite ci-gate
# .github/workflows/nightly.yml: comprehensive nightly run
- name: Run nightly evals
  run: vally eval --suite nightly

This keeps your CI fast while ensuring comprehensive coverage in scheduled runs.
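
The nightly step needs a workflow with a schedule trigger around it. A minimal sketch of what nightly.yml might look like follows. The cron time, the manual trigger, and the workflow name are assumptions, and the GITHUB_TOKEN setting simply mirrors the PR eval workflow.

```yaml
# .github/workflows/nightly.yml (sketch; schedule and names are illustrative)
name: Nightly Skill Eval
on:
  schedule:
    - cron: "0 3 * * *"   # 03:00 UTC daily; pick any off-peak time
  workflow_dispatch:       # also allow manual runs from the Actions tab
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - name: Run nightly evals
        run: vally eval --suite nightly
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Scheduled workflows run against the default branch, so this covers whatever has already merged rather than open PRs.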