CI/CD Integration
Integrate waza evaluations into your GitHub Actions CI/CD pipeline.
GitHub Actions Workflow
Waza scaffolds a ready-to-use workflow with waza init:

waza init my-project

This creates .github/workflows/eval.yml:

name: Evaluation

on:
  push:
    branches: [main]
  pull_request:

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install waza
        run: |
          curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash
          echo "$HOME/bin" >> $GITHUB_PATH

      - name: Run evaluations
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: waza run -v -o results.json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: results.json

Custom Workflow
Create your own workflow in .github/workflows/eval.yml:

name: Run Evaluations

on:
  push:
    branches: [main, develop]
  pull_request:
    types: [opened, synchronize]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        model: [gpt-4o, claude-sonnet-4.6]

    steps:
      - uses: actions/checkout@v4

      - name: Install waza
        run: |
          curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash
          # Make the installed binary visible to later steps
          echo "$HOME/bin" >> "$GITHUB_PATH"

      - name: Run evals with ${{ matrix.model }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          waza run \
            --model "${{ matrix.model }}" \
            -o "results-${{ matrix.model }}.json" \
            -v

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: results-${{ matrix.model }}
          path: results-${{ matrix.model }}.json

Multi-Model Matrix
Test across multiple models in parallel:

strategy:
  matrix:
    model:
      - gpt-4o
      - claude-sonnet-4.6
      - claude-opus-4
  max-parallel: 3

steps:
  - name: Run evals for ${{ matrix.model }}
    run: waza run --model "${{ matrix.model }}" -o "results-${{ matrix.model }}.json"

Filtering Tasks
Run a subset of tasks in CI to save time:

- name: Run fast tests
  run: waza run --tags "smoke" -v

- name: Run comprehensive tests (nightly)
  if: github.event_name == 'schedule'
  run: waza run -v

Parallel Execution
Run tasks in parallel:

- name: Run evaluations in parallel
  run: waza run --parallel --workers 8 -v

Result Artifacts
Save results for later analysis:

- name: Upload evaluation results
  uses: actions/upload-artifact@v4
  with:
    name: eval-results-${{ github.run_id }}
    path: results.json
    retention-days: 30

Download the results and open them in the dashboard:

gh run download <run-id> -n eval-results-<run-id>
waza serve

PR Comments
Post results as a comment on the pull request:

- name: Run evaluations
  run: waza run -o results.json --format github-comment > comment.md

- name: Post comment
  uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      const comment = fs.readFileSync('comment.md', 'utf8');
      // Await the request so a failed API call fails the step
      await github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: comment
      });

Conditional Execution
Run evaluations only under specific conditions:

- name: Check if eval config changed
  id: check
  run: |
    # Needs HEAD's parent commit: set fetch-depth: 2 on actions/checkout,
    # otherwise the default shallow clone makes this diff come back empty
    if git diff-tree --no-commit-id --name-only -r HEAD | grep -q "evals/"; then
      echo "EVAL_CHANGED=true" >> "$GITHUB_OUTPUT"
    fi

- name: Run evaluations
  if: steps.check.outputs.EVAL_CHANGED == 'true' || github.event_name == 'workflow_dispatch'
  run: waza run -v

Caching
Cache evaluation results to speed up repeated runs:

- name: Cache waza results
  uses: actions/cache@v4
  with:
    path: .waza-cache
    key: waza-cache-${{ hashFiles('evals/**') }}
    restore-keys: waza-cache-

- name: Run evaluations with cache
  run: waza run --cache --cache-dir .waza-cache -v

Scheduled Runs
Run evaluations on a schedule (e.g., daily):

on:
  schedule:
    - cron: '0 0 * * *' # Daily at midnight UTC

Secrets and Environment Variables
Pass credentials safely:

- name: Run evaluations
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    API_KEY: ${{ secrets.API_KEY }}
  run: waza run -v

Setup Guide
1. Create Workflow File
Create .github/workflows/eval.yml in your repository.
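As a starting point, the file could look like the minimal sketch below. It reuses the install script and waza run invocation shown earlier on this page. The workflow_dispatch trigger is an assumption added here so the workflow can also be started manually; it is not part of the scaffolded file.

```yaml
# Minimal sketch of .github/workflows/eval.yml; adapt triggers and steps as needed
name: Evaluation

on:
  pull_request:
  workflow_dispatch: # assumption: manual trigger, not in the scaffolded file

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install waza
        run: |
          curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash
          # Expose the installed binary to later steps
          echo "$HOME/bin" >> "$GITHUB_PATH"

      - name: Run evaluations
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: waza run -v
```

With workflow_dispatch present, the github.event_name == 'workflow_dispatch' condition used under Conditional Execution has a trigger that can actually fire.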
2. Configure Permissions
If the workflow calls github.rest.issues.createComment, grant it write access to issues and pull requests:

permissions:
  contents: read
  issues: write
  pull-requests: write

3. Set Up Credentials
GitHub Actions automatically provides GITHUB_TOKEN. For other APIs, add secrets in repository settings.
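A secret that has not been configured expands to an empty string instead of failing the job, so a short guard step can surface the problem early. The sketch below assumes a repository secret named API_KEY, matching the example under Secrets and Environment Variables; substitute whatever name your provider requires.

```yaml
# Sketch: fail fast when a required secret is missing (API_KEY is an assumed name)
- name: Check required secrets
  env:
    API_KEY: ${{ secrets.API_KEY }}
  run: |
    if [ -z "$API_KEY" ]; then
      # ::error:: is a workflow command that annotates the run with an error
      echo "::error::API_KEY secret is not set"
      exit 1
    fi

- name: Run evaluations
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    API_KEY: ${{ secrets.API_KEY }}
  run: waza run -v
```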
4. Test Locally
Test the workflow locally with act:

# Install act
brew install act

# Run the workflow
act push

Status Checks
Fail the workflow if an evaluation fails:

- name: Run evaluations
  # Exit code 1 if tasks fail, which fails the workflow
  run: waza run -v

Best Practices
- Cache fixtures: Speed up repeated runs
- Matrix testing: Test multiple models in parallel
- Artifact retention: Keep results for dashboard analysis
- Conditional runs: Skip unnecessary evaluations
- Timeout: Set reasonable timeouts to catch hanging tasks
- Notify on failure: Post comments or create issues
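The last two practices have no example elsewhere on this page, so here is one possible sketch. The timeout values are placeholders, and opening an issue is only one way to notify; the install step is omitted for brevity.

```yaml
# Sketch: job and step timeouts plus an issue on failure (values are assumptions)
jobs:
  evaluate:
    runs-on: ubuntu-latest
    timeout-minutes: 30 # upper bound for the whole job
    permissions:
      contents: read
      issues: write # needed to create the issue below
    steps:
      - uses: actions/checkout@v4

      - name: Run evaluations
        timeout-minutes: 20 # upper bound for this step alone
        run: waza run -v

      - name: Open an issue on failure
        if: failure() # runs only when an earlier step has failed
        uses: actions/github-script@v7
        with:
          script: |
            const runUrl = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Evaluation failed in run ${context.runId}`,
              body: `See ${runUrl} for logs.`
            });
```

timeout-minutes is a standard GitHub Actions key at both job and step level, so it bounds the run independently of any timeout setting in waza itself.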
Troubleshooting
“waza: command not found”
Ensure $HOME/bin is in PATH:

- name: Install waza
  run: |
    curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash
    echo "$HOME/bin" >> $GITHUB_PATH

“eval.yaml not found”
Check the working directory:

- name: Run from correct directory
  run: |
    ls -la
    waza run evals/my-skill/eval.yaml -v

Tests time out
Increase the timeout or use --parallel:

- name: Run with longer timeout
  run: waza run --config.timeout_seconds 600 -v

Next Steps
- Writing Eval Specs: Create evaluation benchmarks
- CLI Reference: All commands and flags
- GitHub Repository: Examples and issues