Grader: tool-calls
Taxonomy
| Property | Value |
|---|---|
| Determinism | static |
| Cost | low |
| Portability | t1-universal |
| Reference | reference-free |
| Temporal scope | trajectory-level |
| Score kind | code |
Config
```yaml
graders:
  - type: tool-calls
    config:
      required:
        - create
        - name: bash
          command: "npm test"
      disallowed:
        - name: view
          path: "secret\\.env$"
      sequence:
        - name: validate
        - name: create
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| required | (ToolMatch \| string)[] | No | none | Tools that must be called. Fails if any are missing. |
| disallowed | (ToolMatch \| string)[] | No | none | Tools that must NOT be called. Fails if any are found. |
| sequence | (ToolMatch \| string)[] | No | none | Tools that must be called in this relative order (see below). |
At least one of required, disallowed, or sequence must be provided.
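A rough TypeScript model of this config can make its shape concrete. This is a sketch under assumptions: the type names ToolCallsConfig and MatchEntry and the validateConfig helper are hypothetical (not the grader's own source), and only the field names come from the tables on this page. The check only looks for the presence of at least one list key; whether an empty list counts as "provided" is not specified here.

```typescript
// Hypothetical types mirroring the documented fields; not the grader's real source.
interface ToolMatch {
  name: string;
  command?: string;
  path?: string;
  args?: Record<string, string>;
  result?: string;
  min_count?: number;
  final?: boolean;
  at_step?: number;
  before_step?: number;
}

// A plain string is shorthand for { name: <string> }.
type MatchEntry = ToolMatch | string;

interface ToolCallsConfig {
  required?: MatchEntry[];
  disallowed?: MatchEntry[];
  sequence?: MatchEntry[];
}

// Enforces the documented rule that at least one of the three lists is set.
function validateConfig(config: ToolCallsConfig): void {
  if (!config.required && !config.disallowed && !config.sequence) {
    throw new Error("tool-calls: at least one of required, disallowed, or sequence must be provided");
  }
}

// The example config above, written as a TypeScript literal.
const example: ToolCallsConfig = {
  required: ["create", { name: "bash", command: "npm test" }],
  disallowed: [{ name: "view", path: "secret\\.env$" }],
  sequence: [{ name: "validate" }, { name: "create" }],
};

validateConfig(example); // no error: all three lists are present
```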
ToolMatch
Each entry can be a plain string (shorthand for { name: ... }, so the string is itself a regex matched against the tool name) or an object with additional argument filters:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Regex matched against the tool name. Unanchored; use ^name$ for an exact match. |
| command | string | No | Regex matched against the tool’s command argument. |
| path | string | No | Regex matched against the tool’s path argument. |
| args | Record<string, string> | No | Regex patterns matched against arbitrary call arguments, keyed by argument name (a generalization of command/path). |
| result | string | No | Regex matched against the tool’s result/observation (stringified). Not supported on sequence entries. |
| min_count | number | No | Minimum number of matching calls required (default 1). Only for required entries. |
| final | boolean | No | When true, the matching call must be the final tool call in the trajectory. Only for required entries. |
| at_step | number | No | Positional constraint: the call must occur in exactly this agent step (0-based). Only for required entries. |
| before_step | number | No | Positional constraint: the call must occur before this agent step (i.e. steps 0 .. before_step - 1); must be >= 1. Only for required entries. |
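The string shorthand and the unanchored name matching can be sketched as follows. This assumes JavaScript RegExp semantics with no flags (so matching is case-sensitive); normalize and nameMatches are hypothetical helper names used only for illustration.

```typescript
// Hypothetical shape: the string form is shorthand for { name: ... }.
interface ToolMatch {
  name: string;
  command?: string;
  path?: string;
}

function normalize(entry: ToolMatch | string): ToolMatch {
  return typeof entry === "string" ? { name: entry } : entry;
}

// name is an unanchored regex, so "create" also matches "create_file".
function nameMatches(entry: ToolMatch | string, toolName: string): boolean {
  return new RegExp(normalize(entry).name).test(toolName);
}

console.log(nameMatches("create", "create_file")); // true
console.log(nameMatches("^create$", "create_file")); // false
console.log(nameMatches({ name: "^(bash|powershell)$" }, "powershell")); // true
```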
Matching arguments
command and path are convenient shortcuts for the most common arguments. args generalizes this to any argument: args: { query: "vally" } matches calls whose query argument matches the regex vally. Every listed argument must be present and match.
The two forms differ when a referenced argument is absent from a call: command/path raise a configuration error (they are expected to exist on the tools they describe), while args simply treats the call as a non-match, so the same matcher can be applied across heterogeneous tools.
Only string-valued arguments are matchable. Numeric, boolean, and object/array argument values are treated as absent, so a matcher against them never matches (and an args matcher against them is a non-match rather than an error).
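A minimal sketch of these argument rules, assuming call arguments arrive as a plain object. The matches helper and its shapes are hypothetical. It also assumes the name is checked first, so the command/path configuration error only arises for calls to the named tool, and it treats a present but non-string command/path value as a plain non-match, a case this page does not pin down.

```typescript
// Hypothetical shapes; only the matching rules come from this page.
interface ToolCall {
  name: string;
  args: { [key: string]: unknown };
}

interface ArgFilters {
  name: string;
  command?: string;
  path?: string;
  args?: { [key: string]: string };
}

// Only string-valued arguments are matchable; anything else reads as absent.
function stringArg(call: ToolCall, key: string): string | undefined {
  const value = call.args[key];
  return typeof value === "string" ? value : undefined;
}

function matches(filter: ArgFilters, call: ToolCall): boolean {
  if (!new RegExp(filter.name).test(call.name)) return false;

  // command/path: a call that lacks the argument is a configuration error.
  const shortcuts: Array<"command" | "path"> = ["command", "path"];
  for (const key of shortcuts) {
    const pattern = filter[key];
    if (pattern === undefined) continue;
    if (!Object.prototype.hasOwnProperty.call(call.args, key)) {
      throw new Error(`config error: tool "${call.name}" has no ${key} argument`);
    }
    const value = stringArg(call, key);
    // Assumption: a present but non-string value simply does not match.
    if (value === undefined || !new RegExp(pattern).test(value)) return false;
  }

  // args: a missing or non-string argument is a plain non-match, never an error.
  const generic = filter.args || {};
  for (const key of Object.keys(generic)) {
    const value = stringArg(call, key);
    if (value === undefined || !new RegExp(generic[key]).test(value)) return false;
  }
  return true;
}

const search: ToolCall = { name: "web_search", args: { query: "vally release notes", limit: 5 } };
console.log(matches({ name: "^web_search$", args: { query: "vally" } }, search)); // true
console.log(matches({ name: "^web_search$", args: { limit: "5" } }, search)); // false: limit is a number
console.log(matches({ name: "^web_search$", args: { path: "x" } }, search)); // false: no path argument
```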
Counting and result matching
- min_count requires at least N matching (completed) calls instead of just one, which is useful for “the agent tried at least twice” style checks.
- result matches against the tool’s output/observation. Non-string results are JSON-stringified before matching. It is evaluated on required and disallowed entries; it is not supported on sequence entries, which match at invocation time before the result is known.
- final asserts that the matching call is the last tool call in the trajectory, which is useful for “the agent finished by reporting its result” checks.
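The three options above can be combined in a single check, sketched here under assumptions: the shapes and the requiredSatisfied helper are hypothetical, and how final interacts with min_count is not specified on this page (this sketch requires both the last-call condition and the count).

```typescript
// Hypothetical shapes; a sketch of min_count, result and final, not the real grader.
interface CompletedCall {
  name: string;
  result: unknown;
}

interface RequiredEntry {
  name: string;
  result?: string;
  min_count?: number;
  final?: boolean;
}

// Non-string results are JSON-stringified before the regex runs.
function resultText(result: unknown): string {
  if (typeof result === "string") return result;
  return result === undefined ? "" : JSON.stringify(result);
}

function callMatches(entry: RequiredEntry, call: CompletedCall): boolean {
  if (!new RegExp(entry.name).test(call.name)) return false;
  if (entry.result !== undefined && !new RegExp(entry.result).test(resultText(call.result))) {
    return false;
  }
  return true;
}

function requiredSatisfied(entry: RequiredEntry, calls: CompletedCall[]): boolean {
  // final: the last tool call in the trajectory must itself match the entry.
  if (entry.final) {
    const last = calls[calls.length - 1];
    if (last === undefined || !callMatches(entry, last)) return false;
  }
  // min_count: at least N matching completed calls (default 1).
  const count = calls.filter((call) => callMatches(entry, call)).length;
  return count >= (entry.min_count ?? 1);
}

const calls: CompletedCall[] = [
  { name: "upload", result: { ok: false } },
  { name: "upload", result: { ok: true } },
  { name: "report_result", result: "done" },
];
console.log(requiredSatisfied({ name: "^upload$", min_count: 2 }, calls)); // true
console.log(requiredSatisfied({ name: "^upload$", result: '"ok":true' }, calls)); // true
console.log(requiredSatisfied({ name: "^report_result$", final: true }, calls)); // true
console.log(requiredSatisfied({ name: "^upload$", final: true }, calls)); // false
```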
A step is an agent turn, counted by turn_start events in the trajectory (the first turn is step 0). The granularity depends on the executor: the ATIF adapter emits one turn per model response, while the Copilot adapter emits one turn per agent turn. Trajectories without any turn boundaries are treated as a single step 0.
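Assigning a step to each call might look like the following sketch. The event shape (an id on tool_call/tool_result events) and the stepsByCall helper are assumptions made for illustration; only the counting rule (each turn_start begins a new step, the first is step 0, no boundaries means a single step 0) comes from the text above.

```typescript
// Hypothetical event shape; the real trajectory format may use other fields.
type TrajectoryEvent =
  | { type: "turn_start" }
  | { type: "tool_call"; id: string; name: string }
  | { type: "tool_result"; id: string; result: unknown };

// Maps each tool_call id to the step (turn) in which the call was made.
function stepsByCall(events: TrajectoryEvent[]): { [id: string]: number } {
  const steps: { [id: string]: number } = {};
  let step = -1; // becomes 0 at the first turn_start
  for (const event of events) {
    if (event.type === "turn_start") {
      step += 1;
    } else if (event.type === "tool_call") {
      // Assumption: calls seen before any turn_start are placed in step 0,
      // which also covers trajectories with no turn boundaries at all.
      steps[event.id] = Math.max(step, 0);
    }
    // tool_result events are ignored: a call's step is where it was invoked.
  }
  return steps;
}

const events: TrajectoryEvent[] = [
  { type: "turn_start" },
  { type: "tool_call", id: "a", name: "validate" },
  { type: "turn_start" },
  { type: "tool_result", id: "a", result: "ok" }, // result arrives in step 1
  { type: "tool_call", id: "b", name: "create" },
];
console.log(stepsByCall(events)); // { a: 0, b: 1 }
```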
at_step and before_step are only honored on required entries; using them on disallowed or sequence entries raises a configuration error. When both are set on the same entry, at_step must be strictly less than before_step; otherwise no step could satisfy both, so the entry raises a configuration error.
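The configuration checks and the positional test for a single call can be sketched like this. validatePositional and stepAllowed are hypothetical names; the rules mirror the two paragraphs above and the before_step >= 1 constraint from the ToolMatch table.

```typescript
// Hypothetical types; only the rules come from this page.
type EntryKind = "required" | "disallowed" | "sequence";

interface PositionalMatch {
  name: string;
  at_step?: number;
  before_step?: number;
}

// Rejects the configurations the text describes as errors.
function validatePositional(kind: EntryKind, entry: PositionalMatch): void {
  const positional = entry.at_step !== undefined || entry.before_step !== undefined;
  if (positional && kind !== "required") {
    throw new Error(`at_step/before_step are only honored on required entries, not ${kind}`);
  }
  if (entry.before_step !== undefined && entry.before_step < 1) {
    throw new Error("before_step must be >= 1");
  }
  if (entry.at_step !== undefined && entry.before_step !== undefined && entry.at_step >= entry.before_step) {
    throw new Error("at_step must be strictly less than before_step");
  }
}

// True when a call made in `step` meets the entry's positional constraints.
function stepAllowed(entry: PositionalMatch, step: number): boolean {
  if (entry.at_step !== undefined && step !== entry.at_step) return false;
  if (entry.before_step !== undefined && step >= entry.before_step) return false;
  return true;
}

const early = { name: "^load_skill$", before_step: 3 };
validatePositional("required", early); // no error
console.log(stepAllowed(early, 2)); // true
console.log(stepAllowed(early, 3)); // false
```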
Behavior
The grader scans trajectory events for tool_call and tool_result pairs. For each completed tool invocation, it checks the tool name and arguments against the required and disallowed matchers.
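One way to collect completed invocations from a trajectory is to pair calls and results by an identifier. The id field and the completedCalls helper are assumptions for this sketch; the actual trajectory format may pair events differently.

```typescript
// Hypothetical event shape and id-based pairing; an illustration only.
type Args = { [key: string]: unknown };

type TrajectoryEvent =
  | { type: "turn_start" }
  | { type: "tool_call"; id: string; name: string; args: Args }
  | { type: "tool_result"; id: string; result: unknown };

interface CompletedCall {
  id: string;
  name: string;
  args: Args;
  result: unknown;
}

// Pairs each tool_call with the tool_result carrying the same id, in call order.
// Calls that never receive a result are not completed and are left out.
function completedCalls(events: TrajectoryEvent[]): CompletedCall[] {
  const calls: Array<{ id: string; name: string; args: Args }> = [];
  const results: { [id: string]: unknown } = {};
  for (const event of events) {
    if (event.type === "tool_call") {
      calls.push({ id: event.id, name: event.name, args: event.args });
    } else if (event.type === "tool_result") {
      results[event.id] = event.result;
    }
  }
  // hasOwnProperty keeps calls whose result value is itself undefined.
  return calls
    .filter((call) => Object.prototype.hasOwnProperty.call(results, call.id))
    .map((call) => ({ id: call.id, name: call.name, args: call.args, result: results[call.id] }));
}

const trajectory: TrajectoryEvent[] = [
  { type: "turn_start" },
  { type: "tool_call", id: "1", name: "bash", args: { command: "npm test" } },
  { type: "tool_result", id: "1", result: "ok" },
  { type: "tool_call", id: "2", name: "view", args: { path: "a.ts" } },
];
console.log(completedCalls(trajectory).map((call) => call.name)); // [ 'bash' ]
```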
- name, command, path, args values, and result are matched as unanchored regex patterns. To require an exact match, anchor the pattern: ^value$.
- A required matcher is satisfied when at least one tool call matches it (or at least min_count calls, when set). When a required entry also has at_step/before_step, the matching call must additionally occur in the specified step; with final: true, the matching call must be the last tool call in the trajectory.
- A required entry with at_step: N is satisfied only if a matching call occurs in step N; with before_step: N, only if a matching call occurs in one of steps 0 .. N-1. The step is the turn in which the tool was invoked (the tool_call), not the turn in which its result arrived.
- A disallowed matcher is violated when any tool call matches it (including its args/result filters).
- sequence matchers must be satisfied as an ordered subsequence: for [A, B, C], the trajectory must contain a tool call matching A, then a later one matching B, then a later one matching C. Other tool calls may be interleaved between them, and repeated entries (e.g. [poll, poll]) require that many distinct calls in order.
Passes when all required matchers are satisfied, no disallowed matchers are violated, and the sequence (if any) is satisfied in order. Fails otherwise.
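The ordered-subsequence rule for sequence can be checked greedily: walk the calls once and advance to the next entry whenever the current call matches it. Taking the earliest match for each entry never rules out a later one, so a single pass is enough. This hypothetical sequenceSatisfied helper compares tool names only; command/path/args filters would be tested at the same point.

```typescript
// Hypothetical helper: checks tool names only, in trajectory order.
function sequenceSatisfied(patterns: string[], toolNames: string[]): boolean {
  let next = 0; // index of the next sequence entry still waiting for a match
  for (const name of toolNames) {
    if (next < patterns.length && new RegExp(patterns[next]).test(name)) {
      next += 1; // each call satisfies at most one entry, so [poll, poll] needs two calls
    }
  }
  return next === patterns.length;
}

const order = ["read", "validate_schema", "view", "create_file"];
console.log(sequenceSatisfied(["^validate_.*", "^(create|update)_.*"], order)); // true
console.log(sequenceSatisfied(["^(create|update)_.*", "^validate_.*"], order)); // false
console.log(sequenceSatisfied(["^poll$", "^poll$"], ["poll", "bash"])); // false
```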
Use cases
```yaml
# Agent must use the create tool at some point
- type: tool-calls
  config:
    required:
      - create

# Agent must run npm test via bash
- type: tool-calls
  config:
    required:
      - name: bash
        command: "npm test"

# Agent must use either bash or powershell (regex name with alternation)
- type: tool-calls
  config:
    required:
      - name: "^(bash|powershell)$"

# Agent must not view sensitive files
- type: tool-calls
  config:
    disallowed:
      - name: view
        path: "\\.env$"

# Combine required and disallowed
- type: tool-calls
  config:
    required:
      - name: edit
        path: "src/.*\\.ts$"
    disallowed:
      - name: bash
        command: "rm -rf"

# Agent must validate before it mutates (ordered subsequence)
- type: tool-calls
  config:
    sequence:
      - name: "^validate_.*"
      - name: "^(create|update)_.*"

# Agent must load the skill early, in the first three steps
- type: tool-calls
  config:
    required:
      - name: "^load_skill$"
        before_step: 3

# Agent must validate in the very first step
- type: tool-calls
  config:
    required:
      - name: "^validate$"
        at_step: 0

# Agent must try the upload at least twice
- type: tool-calls
  config:
    required:
      - name: "^upload$"
        min_count: 2

# Agent must search for a specific query (generic argument matching)
- type: tool-calls
  config:
    required:
      - name: "^web_search$"
        args:
          query: "release notes"

# Agent's build must succeed (match on tool output)
- type: tool-calls
  config:
    required:
      - name: "^bash$"
        command: "npm run build"
        result: "BUILD SUCCEEDED"

# Agent must finish by reporting its result (final tool call)
- type: tool-calls
  config:
    required:
      - name: "^report_result$"
        final: true
```