Skip to content

Grader: tool-calls

PropertyValue
Determinismstatic
Costlow
Portabilityt1-universal
Referencereference-free
Temporal scopetrajectory-level
Score kindcode
graders:
- type: tool-calls
config:
required:
- create
- name: bash
command: "npm test"
disallowed:
- name: view
path: "secret\\.env$"
sequence:
- name: validate
- name: create
FieldTypeRequiredDefaultDescription
required(ToolMatch | string)[]NoTools that must be called. Fails if any are missing
disallowed(ToolMatch | string)[]NoTools that must NOT be called. Fails if any are found
sequence(ToolMatch | string)[]NoTools that must be called in this relative order (see below)

At least one of required, disallowed, or sequence must be provided.

Each entry can be a plain string (shorthand for { name: ... } — also a regex matched against the tool name) or an object with additional argument filters:

FieldTypeRequiredDescription
namestringYesRegex matched against the tool name. Unanchored — use ^name$ for an exact match.
commandstringNoRegex matched against the tool’s command argument
pathstringNoRegex matched against the tool’s path argument
argsRecord<string, string>NoRegex patterns matched against arbitrary call arguments, keyed by argument name (a generalization of command/path).
resultstringNoRegex matched against the tool’s result/observation (stringified). Not supported on sequence entries.
min_countnumberNoMinimum number of matching calls required (default 1). required entries only.
finalbooleanNoWhen true, the matching call must be the final tool call in the trajectory. required entries only.
at_stepnumberNoPositional constraint: the call must occur in exactly this agent step (0-based). required entries only.
before_stepnumberNoPositional constraint: the call must occur before this agent step (i.e. step 0 .. before_step - 1); must be >= 1. required entries only.

command and path are convenient shortcuts for the most common arguments. args generalizes this to any argument: args: { query: "vally" } matches calls whose query argument matches the regex vally. Every listed argument must be present and match.

The two forms differ when a referenced argument is absent from a call: command/path raise a configuration error (they are expected to exist on the tools they describe), while args simply treats the call as a non-match, so the same matcher can be applied across heterogeneous tools.

Only string-valued arguments are matchable. Numeric, boolean, and object/array argument values are treated as absent, so a matcher against them never matches (and an args matcher against them is a non-match rather than an error).

  • min_count requires at least N matching (completed) calls instead of just one — useful for “the agent retried at least twice” style checks.
  • result matches against the tool’s output/observation. Non-string results are JSON-stringified before matching. It is evaluated on required and disallowed entries; it is not supported on sequence entries, which match at invocation time before the result is known.
  • final asserts that the matching call is the last tool call in the trajectory — useful for “the agent finished by reporting its result” checks.

A step is an agent turn, counted by turn_start events in the trajectory (the first turn is step 0). The granularity depends on the executor: the ATIF adapter emits one turn per model response, while the Copilot adapter emits one turn per agent turn. Trajectories without any turn boundaries are treated as a single step 0.

at_step and before_step are only honored on required entries; using them on disallowed or sequence entries raises a configuration error. When both are set on the same entry, at_step must be strictly less than before_step (otherwise no step could satisfy both, which raises a configuration error).

Scans trajectory events for tool_call and tool_result pairs. For each completed tool invocation, checks the tool name and arguments against the required and disallowed matchers.

  • name, command, path, args values, and result are matched as unanchored regex patterns. To require an exact match, anchor the pattern: ^value$.
  • A required matcher is satisfied when at least one tool call matches it (or min_count calls when set). When a required entry also has at_step / before_step, the matching call must additionally occur in the specified step; with final: true, the matching call must be the last tool call in the trajectory.
  • A required entry with at_step: N is satisfied only if a matching call occurs in step N; with before_step: N, only if a matching call occurs in one of steps 0 .. N-1. The step is the turn in which the tool was invoked (the tool_call), not where its result arrived.
  • A disallowed matcher is violated when any tool call matches it (including its args / result filters).
  • sequence matchers must be satisfied as an ordered subsequence: for [A, B, C], the trajectory must contain a tool call matching A, then a later one matching B, then a later one matching C. Other tool calls may be interleaved between them, and repeated entries (e.g. [poll, poll]) require that many distinct calls in order.

Passes when all required matchers are satisfied, no disallowed matchers are violated, and the sequence (if any) is satisfied in order. Fails otherwise.

# Agent must use the create tool at some point
- type: tool-calls
config:
required:
- create
# Agent must run npm test via bash
- type: tool-calls
config:
required:
- name: bash
command: "npm test"
# Agent must use either bash or powershell (regex name with alternation)
- type: tool-calls
config:
required:
- name: "^(bash|powershell)$"
# Agent must not view sensitive files
- type: tool-calls
config:
disallowed:
- name: view
path: "\\.env$"
# Combine required and disallowed
- type: tool-calls
config:
required:
- name: edit
path: "src/.*\\.ts$"
disallowed:
- name: bash
command: "rm -rf"
# Agent must validate before it mutates (ordered subsequence)
- type: tool-calls
config:
sequence:
- name: "^validate_.*"
- name: "^(create|update)_.*"
# Agent must load the skill early, in the first three steps
- type: tool-calls
config:
required:
- name: "^load_skill$"
before_step: 3
# Agent must validate in the very first step
- type: tool-calls
config:
required:
- name: "^validate$"
at_step: 0
# Agent must try the upload at least twice
- type: tool-calls
config:
required:
- name: "^upload$"
min_count: 2
# Agent must search for a specific query (generic argument matching)
- type: tool-calls
config:
required:
- name: "^web_search$"
args:
query: "release notes"
# Agent's build must succeed (match on tool output)
- type: tool-calls
config:
required:
- name: "^bash$"
command: "npm run build"
result: "BUILD SUCCEEDED"
# Agent must finish by reporting its result (final tool call)
- type: tool-calls
config:
required:
- name: "^report_result$"
final: true