Quick Start
Get from zero to running your first evaluation benchmark in 5 minutes.
Prerequisites
- Go 1.26 or later (for binary install), or
- GitHub Copilot access (for copilot login)
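A quick way to confirm the prerequisites is to check that the relevant commands are on your PATH. This is a minimal sketch: check_tool is a hypothetical helper written for this page, not part of waza, and it only tests whether a command exists, not its version or whether you are logged in.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Go is only needed for the `go install` method; copilot is used for `copilot login`.
check_tool go || true
check_tool copilot || true
```

If either line reports "missing", install that tool first or pick an install method that does not need it.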
1. Install
Choose one method:

Install script:

    curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash
    waza --version

Go install:

    go install github.com/microsoft/waza/cmd/waza@latest
    waza --version

azd extension:

    azd ext source add -n waza -t url -l https://raw.githubusercontent.com/microsoft/waza/main/registry.json
    azd ext install microsoft.azd.waza
    azd waza --version

All commands below use waza. If you installed the azd extension, replace waza with azd waza.
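If you switch between the standalone binary and the azd extension, a small shell wrapper can save you from editing every command by hand. The sketch below is an assumption-laden convenience, not something waza provides: WAZA_BIN and waza_cmd are hypothetical names, and the unquoted expansion relies on word splitting as done by sh and bash.

```shell
# WAZA_BIN is a hypothetical variable; waza itself does not read it.
# Leave it unset for a standalone install, or set it to "azd waza" for the azd extension.
waza_cmd() {
  # Intentionally unquoted so that "azd waza" splits into two words (sh/bash behavior).
  ${WAZA_BIN:-waza} "$@"
}

# Example usage (not executed here):
#   waza_cmd --version
#   WAZA_BIN="azd waza" waza_cmd --version
```

With this in your shell session, the later steps can be run as waza_cmd init, waza_cmd run and so on, regardless of install method.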
2. Authenticate
Waza needs GitHub Copilot access for running evaluations:

    copilot login

This opens your browser to authenticate. After login, you’re ready to go.
3. Create Your First Skill
Initialize a project and create a skill:

    mkdir my-eval-suite
    cd my-eval-suite
    waza init
    waza new skill my-skill

You’ll see:

    ✓ Created skill: skills/my-skill/
    ├── skill.yaml          # Skill definition
    ├── evals/
    │   └── eval.yaml       # Evaluation spec
    └── fixtures/
        ├── input.txt       # Sample task input
        └── README.md       # How to add more fixtures

4. Write Your First Eval
Open skills/my-skill/evals/eval.yaml and replace its contents with this minimal spec:

    name: my-skill-eval
    description: Test my skill
    config:
      model: claude-sonnet-4.6
      timeout_seconds: 30

    graders:
      - type: text
        name: has_response
        config:
          pattern: "\\w+"

    tasks:
      - name: test-task-1
        description: Simple test
        input: "Hello, world!"
        expected: "Should say hello"

5. Run It
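Before running, it can help to see what the has_response grader's pattern accepts. In YAML, the double-quoted string "\\w+" decodes to the regular expression \w+, which matches one or more word characters. The sketch below assumes the text grader passes when the pattern is found anywhere in the response; that is an assumption about waza's behavior, not something confirmed on this page. has_word_chars is a hypothetical helper, and [[:alnum:]_] is used as a rough POSIX stand-in for \w.

```shell
# has_word_chars is a hypothetical helper: it succeeds when the text
# contains at least one letter, digit, or underscore.
has_word_chars() {
  printf '%s' "$1" | grep -Eq '[[:alnum:]_]+'
}

# A normal sentence contains word characters, so the check succeeds.
has_word_chars "Hello! I'm an AI assistant. How can I help you?" && echo "would pass"

# Punctuation alone contains none, so the check fails.
has_word_chars "?!" || echo "would fail"
```

In other words, this grader is a very loose check that only confirms the response is not empty or purely punctuation.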
    waza run skills/my-skill/evals/eval.yaml -v

You’ll see live execution:

    Running evaluation: my-skill-eval
    ──────────────────────────────────

    Task: test-task-1
    Prompt: Hello, world!

    Agent Response:
    Hello! I'm an AI assistant. How can I help you?

    Grading...
    ✓ has_response [PASS]

    Task Summary:
      Passed: 1/1
      Score: 100%

6. View Results
Serve the interactive dashboard:

    waza serve

Open your browser to http://localhost:3000. You’ll see:
- Dashboard — overview of all runs
- Run Details — task-by-task breakdown with pass/fail
- Scoring — individual grader results and weights
- Trends — historical performance across runs
Workflow Diagram
    graph LR
        A["Create Skill<br/>waza new skill"] --> B["Write Eval YAML<br/>eval.yaml"]
        B --> C["Run Evaluation<br/>waza run eval.yaml"]
        C --> D["View Results<br/>waza serve"]
        D --> E["Dashboard<br/>localhost:3000"]
        style A fill:#e1f5ff
        style B fill:#f3e5f5
        style C fill:#e8f5e9
        style D fill:#fff3e0
        style E fill:#fce4ec

Next Steps
- Getting Started — Complete reference with project structure and workflow
- Eval YAML Reference — Full spec for writing eval files
- Validators & Graders — All 11 grader types with examples
- Web Dashboard Guide — Features and navigation
- CI/CD Integration — Automate evaluations in GitHub Actions
Stuck? Open an issue on GitHub.