Add to CI

This quickstart adds skill validation to your CI pipeline so every PR gets checked automatically.

Run vally lint as an initial CI gate on every PR — it also belongs in your dev loop (pre-commit, save-time, or IDE integration). Once lint passes, run vally eval for behavioral checks before merge.
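
For the pre-commit part of that loop, one option is the pre-commit framework (pre-commit.com). vally does not require it; it is an assumption of this sketch. The sketch registers a local hook that runs the same `vally lint .` command as CI, assuming `vally` is already on your PATH (for example, from `npm install -g @microsoft/vally-cli`). The hook id is hypothetical.

```yaml
# .pre-commit-config.yaml (pre-commit framework; assumed, not required by vally)
repos:
  - repo: local
    hooks:
      - id: vally-lint              # hypothetical hook id; any id works
        name: vally lint
        entry: vally lint .         # same command the CI lint job runs
        language: system            # use the vally binary already installed on PATH
        pass_filenames: false       # lint the whole tree instead of only the staged files
        files: (SKILL\.md|\.yaml)$  # run only when skill or YAML files are staged
```

Run `pre-commit install` once per clone to register the hook with git.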

Add the lint step (every PR)

Create .github/workflows/skill-lint.yml:

```
name: Skill Lint
on:
  pull_request:
    paths:
      - "**/SKILL.md"
      - "**/*.yaml"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: "22"

      - run: npm install -g @microsoft/vally-cli

      - name: Lint skills
        run: vally lint .

      - name: Validate eval specs
        run: vally lint --eval-spec eval.yaml --strict
```

This lint pass catches:

- Invalid SKILL.md format (name, description, frontmatter)
- Broken file references

Add eval for behavioral checks (every PR)

Once lint passes, run agent-level evals before merge.

An eval runs a real agent, so the eval step needs credentials for your executor’s model provider. The default copilot-sdk executor reads the COPILOT_GITHUB_TOKEN env var — it must be a fine-grained PAT with the Account › Copilot Requests permission. The auto-generated secrets.GITHUB_TOKEN can’t access Copilot, and no permissions: block will change that.

Create the credential, store it as a repository secret (any name — this example uses COPILOT_CLI_TOKEN), then pass it through the env var your executor reads:

| Executor | Credential | Env var |
|---|---|---|
| `copilot-sdk` (default) | Fine-grained PAT with the Copilot Requests permission | `COPILOT_GITHUB_TOKEN` |
| `claude-cli` | Anthropic API key | `ANTHROPIC_API_KEY` |

Getting the tokens:
- GitHub PAT — go to github.com/settings/tokens → Fine-grained tokens → Generate new token. Under Account permissions, set Copilot Requests to Read and write. Then add the token as a repository secret.
- Anthropic API key — go to console.anthropic.com/settings/keys → Create key. Then add it as a repository secret.
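
If you use the claude-cli executor instead, only the credential changes. The Run evals step passes the Anthropic key through `ANTHROPIC_API_KEY` rather than `COPILOT_GITHUB_TOKEN`. The following sketch of that step makes two assumptions. It assumes the key is stored as a repository secret named `ANTHROPIC_API_KEY`; any secret name works. It also assumes your configuration already selects claude-cli. How the executor is selected is outside this sketch.

```yaml
- name: Run evals
  run: |
    vally eval \
      --eval-spec eval.yaml \
      --skill-dir . \
      --output-dir ./results
  env:
    # claude-cli reads the Anthropic key from this env var;
    # the secret name on the right is an assumption
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```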

Create .github/workflows/skill-eval.yml:
```
name: Skill Eval
on:
  pull_request:
    paths:
      - "**/SKILL.md"
      - "**/eval.yaml"

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: "22"

      - run: npm install -g @microsoft/vally-cli

      - name: Run evals
        run: |
          vally eval \
            --eval-spec eval.yaml \
            --skill-dir . \
            --output-dir ./results
        env:
          COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }}

      - name: Upload trajectories
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: ./results/
```
Always upload trajectory artifacts — they let you re-grade offline with vally grade without re-running the agent.
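
The lint and eval workflows are separate, so GitHub Actions runs them in parallel. As a result, an eval can start, and spend model requests, even when lint fails. One way to enforce "lint first, then eval" is a single workflow in which the eval job declares `needs: lint`. This sketch assumes you replace both files with one. The file name `skills.yml` is hypothetical. Every vally command comes from the workflows shown in this section.

```yaml
# .github/workflows/skills.yml (hypothetical name; replaces the two files above)
name: Skills
on:
  pull_request:
    paths:
      - "**/SKILL.md"
      - "**/*.yaml"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - run: vally lint .

  eval:
    needs: lint                  # skipped unless the lint job succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "22"
      - run: npm install -g @microsoft/vally-cli
      - name: Run evals
        run: vally eval --eval-spec eval.yaml --skill-dir . --output-dir ./results
        env:
          COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }}
      - name: Upload trajectories
        if: always()             # upload even when the eval step fails
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: ./results/
```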

Use exit codes for gating

Both lint and eval set process.exitCode = 1 on failure, so GitHub Actions will automatically mark the step as failed.
- vally lint → exits 1 if any skill fails validation
- vally eval → exits 1 if any eval verdict fails, unless you intentionally lower the threshold for that run (for example, --threshold 0); execution and tooling errors still exit 1
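
Because gating depends only on the exit code, you can run evals in a report-only mode where failing verdicts should not block, such as an exploratory branch. The following step sketch applies the `--threshold 0` example above. Combining that flag with the other flags on one command line is an assumption. Execution and tooling errors still exit 1, so a broken run still fails the step.

```yaml
- name: Run evals (report only)
  # --threshold 0 keeps failing verdicts from failing the step;
  # execution and tooling errors still exit 1
  run: vally eval --eval-spec eval.yaml --skill-dir . --threshold 0 --output-dir ./results
  env:
    COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }}
```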

Run specific suites

Define suites in .vally.yaml to run different eval subsets in different CI contexts:

```
suites:
  ci-gate:
    filter: { priority: [p0, p1] }
  nightly:
    filter: { priority: [p0, p1, p2] }
```

```
# .github/workflows/ci.yml: fast suite on every PR
- name: Run CI gate evals
  run: vally eval --suite ci-gate

# .github/workflows/nightly.yml: comprehensive nightly run
- name: Run nightly evals
  run: vally eval --suite nightly
```

This keeps your CI fast while ensuring comprehensive coverage in scheduled runs.
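
The nightly fragment needs a workflow with a schedule trigger. The following is a sketch of a complete nightly.yml. It reuses the install, credential, and upload steps from the PR eval workflow. It assumes that `--suite` can be combined with `--output-dir`. The 03:00 UTC cron time and the artifact name are arbitrary choices. GitHub Actions runs scheduled workflows on the default branch.

```yaml
# .github/workflows/nightly.yml
name: Nightly Skill Eval
on:
  schedule:
    - cron: "0 3 * * *"   # daily at 03:00 UTC (arbitrary time)
  workflow_dispatch:       # also allow manual runs from the Actions tab

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: "22"

      - run: npm install -g @microsoft/vally-cli

      - name: Run nightly evals
        run: vally eval --suite nightly --output-dir ./results
        env:
          COPILOT_GITHUB_TOKEN: ${{ secrets.COPILOT_CLI_TOKEN }}

      - name: Upload trajectories
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: nightly-eval-results
          path: ./results/
```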

Next steps

- Grader taxonomy: pick graders by determinism and cost
- Writing eval specs: design better stimuli