Graders: metric thresholds
Five graders that check trajectory metrics against a configurable `max` budget. All of them read values the pipeline already computes, so they add no analysis cost.
token-budget
Checks that total tokens (input + output) do not exceed `max`.
```yaml
graders:
  - type: token-budget
    config:
      max: 50000
```

Example evidence strings: `1500 tokens (within budget of 50000)` when within budget, or `75000 tokens exceeds max of 50000` when over it.
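The check itself is a sum and a comparison. The following is a minimal sketch, not the pipeline's code: the function `check_token_budget` and its `input_tokens`, `output_tokens`, and `max_tokens` parameters are hypothetical names, and the returned strings simply mirror the evidence examples shown above.

```python
def check_token_budget(input_tokens: int, output_tokens: int, max_tokens: int) -> tuple[bool, str]:
    """Hypothetical sketch: compare total tokens (input + output) to the budget.

    Returns (within_budget, evidence_string).
    """
    total = input_tokens + output_tokens
    if total <= max_tokens:
        return True, f"{total} tokens (within budget of {max_tokens})"
    return False, f"{total} tokens exceeds max of {max_tokens}"


print(check_token_budget(1000, 500, 50000))
# (True, '1500 tokens (within budget of 50000)')
print(check_token_budget(60000, 15000, 50000))
# (False, '75000 tokens exceeds max of 50000')
```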
tool-call-count
Checks that the number of tool calls does not exceed `max`.
```yaml
graders:
  - type: tool-call-count
    config:
      max: 20
```

turn-count
Checks that the number of agent turns does not exceed `max`.
```yaml
graders:
  - type: turn-count
    config:
      max: 10
```

error-count
Checks that the number of error events does not exceed `max`. Use `max: 0` to require zero errors.
```yaml
graders:
  - type: error-count
    config:
      max: 0
```

wall-time
Checks that wall-clock execution time does not exceed `max`. Accepts duration strings (`"30s"`, `"2m"`, `"1h"`).
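A duration string has to be converted to a number before it can be compared with a measured time. The sketch below is illustrative only: it assumes the grammar is a whole number followed by one of the three units named above (`s`, `m`, `h`), and the helper `parse_duration_seconds` is a hypothetical name. The real grader may accept other units or fractional amounts.

```python
import re

# Assumption: only the units listed in the docs, with whole-number amounts.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_DURATION_RE = re.compile(r"(\d+)([smh])")


def parse_duration_seconds(text: str) -> int:
    """Hypothetical helper: convert "30s" / "2m" / "1h" into a number of seconds."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a duration string: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


print(parse_duration_seconds("2m"))  # 120
```

Converting everything to one unit (seconds here) lets the same threshold comparison used by the count-based graders apply to wall time as well.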
```yaml
graders:
  - type: wall-time
    config:
      max: "2m"
```

Shared behavior
Config
All five require a `max` field:
| Field | Type | Required | Description |
|---|---|---|---|
| `max` | integer or duration string | yes | Upper bound. `wall-time` requires a duration string (e.g. `"30s"`, `"2m"`); the other four require non-negative integers. |

`max` must be a finite, non-negative value. Omitting it is a validation error.
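The validation rules can be sketched as a single check on the shared `max` field. This is a hypothetical illustration, not the pipeline's validator: `validate_max` is an invented name, and the duration regex assumes the `<number><s|m|h>` form seen in the examples on this page.

```python
import re

# The four count-based grader types named on this page.
COUNT_GRADERS = {"token-budget", "tool-call-count", "turn-count", "error-count"}
# Assumed duration syntax: a whole number followed by s, m, or h.
DURATION_RE = re.compile(r"\d+[smh]")


def validate_max(grader_type: str, config: dict) -> None:
    """Hypothetical validator for the shared `max` field; raises ValueError on bad config."""
    if "max" not in config:
        raise ValueError(f"{grader_type}: missing required field 'max'")
    value = config["max"]
    if grader_type == "wall-time":
        if not isinstance(value, str) or DURATION_RE.fullmatch(value) is None:
            raise ValueError(f"wall-time: max must be a duration string, got {value!r}")
    elif grader_type in COUNT_GRADERS:
        # bool is a subclass of int in Python, so reject it explicitly.
        # Python ints are always finite; floats such as inf fail the int check.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{grader_type}: max must be a non-negative integer, got {value!r}")
    else:
        raise ValueError(f"unknown grader type: {grader_type}")


validate_max("error-count", {"max": 0})   # passes silently
validate_max("wall-time", {"max": "2m"})  # passes silently
```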
Scoring
When within budget (`value ≤ max`): score = 1.
When over budget, the score degrades linearly and is clamped at 0. Writing M for the configured `max` threshold:

score = max(0, 1 − (value − M) / max(M, 1))

The `max(M, 1)` denominator avoids division by zero when M is 0. This means:
- At exactly the threshold → score 1 (pass)
- At 1.5× the threshold → score 0.5 (for thresholds ≥ 1)
- At 2× the threshold or above → score 0 (for thresholds ≥ 1)
- For `max: 0`, any value above 0 immediately scores 0
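The scoring rule translates directly into code. This is a minimal sketch of the formula as stated, not the pipeline's implementation; `threshold_score` and `max_value` are hypothetical names, with `max_value` standing in for the configured `max` so it does not shadow Python's built-in `max`. For `wall-time`, both arguments would need to be in the same unit (for example seconds), which is an assumption here.

```python
def threshold_score(value: float, max_value: float) -> float:
    """Score for the threshold graders, following the formula above.

    Within budget (value <= max_value) scores 1. Over budget, the score falls
    linearly and is clamped at 0. Dividing by max(max_value, 1) avoids a
    division by zero when max_value is 0.
    """
    if value <= max_value:
        return 1.0
    return max(0.0, 1 - (value - max_value) / max(max_value, 1))


print(threshold_score(75000, 50000))  # 0.5  (1.5x the threshold)
print(threshold_score(1, 0))          # 0.0  (max: 0 and one error)
```

The explicit within-budget branch matters: without it, values below the threshold would yield scores above 1.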
Taxonomy
All five share identical taxonomy metadata:
| Property | Value |
|---|---|
| Determinism | static |
| Cost | free |
| Portability | t1-universal |
| Reference | reference-free |
| Temporal scope | trajectory-level |
| Score kind | code |
Metadata
Every result includes structured metadata with the measured `value` and the configured `max`:
```json
{ "value": 1500, "max": 50000 }
```
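Because the metadata records both numbers, a consumer can derive secondary figures, such as how much of the budget a run used, without re-running the grader. The sketch below uses Python's standard `json` module; `budget_utilization` is a hypothetical helper, and it assumes both fields are plain numbers as in the example above (the exact metadata shape for `wall-time` is not specified on this page).

```python
import json


def budget_utilization(metadata_json: str) -> float:
    """Hypothetical helper: fraction of the budget consumed (value / max).

    Results above 1.0 mean the run was over budget. When max is 0, returns
    0.0 if value is also 0, otherwise infinity.
    """
    meta = json.loads(metadata_json)
    value, limit = meta["value"], meta["max"]
    if limit == 0:
        return 0.0 if value == 0 else float("inf")
    return value / limit


print(budget_utilization('{ "value": 1500, "max": 50000 }'))  # 0.03
```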