
Grader: run-command

| Property | Value |
| --- | --- |
| Determinism | complex-static |
| Cost | low |
| Portability | t2-domain |
| Reference | reference-free |
| Temporal scope | trajectory-level |
| Score kind | code |
graders:
  - type: run-command
    config:
      command: "node"
      args: ["validate.js", "--strict"]
      expected_exit_code: 0
      stdout_contains: "OK"
      stdout_matches: "(?i)success"
      sub_path: "src/app"
      timeout: 30s
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| command | string | Yes | | Binary to run (when args provided) or shell command string (when args omitted) |
| args | string[] | No | | Arguments passed directly to the binary. When provided, the command runs without a shell for reliable exit codes and cross-platform behavior |
| expected_exit_code | number | No | 0 | Expected exit code for a passing result |
| stdout_contains | string | No | | Substring that must appear in stdout |
| stderr_contains | string | No | | Substring that must appear in stderr |
| stdout_matches | string | No | | Regex pattern that must match stdout. Supports inline flags: (?i) for case-insensitive, (?m) for multiline, (?s) for dotAll |
| stdout_not_matches | string | No | | Regex pattern that must not match stdout. Same inline flag support as stdout_matches |
| sub_path | string | No | | Subdirectory under the workspace root to use as the working directory. Must stay within the workspace root. |
| timeout | duration | No | 30s | Timeout (e.g. 30s, 2m) |
| env | Record<string, string> | No | | Additional environment variables passed to the subprocess. Merged with the process environment. |
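The inline flags accepted by stdout_matches and stdout_not_matches behave like ordinary regex modifiers. The sketch below uses Python's re module, which happens to understand the same (?i), (?m), and (?s) prefixes, purely to illustrate their effect. It is not the grader's implementation, and the sample stdout text is made up.

```python
import re

# Hypothetical stdout captured from a command (illustration only).
stdout = "Build finished\nSUCCESS: 3 checks passed\n"

# (?i): case-insensitive, so "success" matches "SUCCESS".
case_insensitive = re.search(r"(?i)success", stdout) is not None

# Without the flag, the same pattern finds nothing in this text.
case_sensitive = re.search(r"success", stdout) is not None

# (?m): ^ and $ anchor at every line boundary, not only at the ends of the text.
multiline = re.search(r"(?m)^SUCCESS", stdout) is not None

# (?s): "." also matches newlines, so one pattern can span several lines.
dot_all = re.search(r"(?s)Build.*passed", stdout) is not None
no_dot_all = re.search(r"Build.*passed", stdout) is not None
```

Flags can be combined in one prefix in Python, for example (?im). Whether the grader accepts combined prefixes is not stated here, so treat that as something to verify.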

Executes the command in the trajectory’s workDir as a child process. Checks exit code and optionally validates stdout/stderr content.
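Two of the fields shape the child process's surroundings: sub_path selects a working directory that must stay inside the workspace root, and env is merged with the process environment. A minimal sketch of both ideas follows. The function names resolve_workdir and merged_env are hypothetical, and letting configured values override inherited ones is an assumption, since the merge precedence is not documented here.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def resolve_workdir(workspace_root: str, sub_path: Optional[str] = None) -> Path:
    """Return the directory to run in, refusing paths that escape the root."""
    root = Path(workspace_root).resolve()
    if not sub_path:
        return root
    # Resolve ".." segments before checking containment.
    candidate = (root / sub_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"sub_path escapes the workspace root: {sub_path}")
    return candidate


def merged_env(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the current process environment and overlay the configured values.

    Assumption: configured values win over inherited ones.
    """
    env = dict(os.environ)
    env.update(extra or {})
    return env
```

With this sketch, resolve_workdir(root, "src/app") returns a path under root, while resolve_workdir(root, "../outside") raises ValueError.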

When args is omitted, command is passed to the system shell (/bin/sh on Unix, cmd.exe on Windows) for parsing — shell built-ins and metacharacters work in this mode. When args is provided, command is the binary path and is executed directly without a shell, giving reliable exit codes and cross-platform arg handling.
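The difference between the two modes can be sketched with Python's subprocess module. This is an analogy under stated assumptions, not the grader's code: the helper run is hypothetical, and the 30-second default simply mirrors the documented timeout default.

```python
import subprocess
import sys
from typing import List, Optional


def run(command: str, args: Optional[List[str]] = None, timeout: float = 30.0):
    """Run in shell mode (args omitted) or direct-exec mode (args given)."""
    if args is None:
        # Shell mode: the whole string goes to the system shell, so pipes,
        # redirects, and built-ins are interpreted by it.
        return subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    # Direct mode: no shell is involved; each argument reaches the binary as-is.
    return subprocess.run([command, *args], shell=False, capture_output=True,
                          text=True, timeout=timeout)


# Direct exec: "&&" is an ordinary argument here, not a shell operator.
direct = run(sys.executable, ["-c", "import sys; print(sys.argv[1])", "&&"])

# Shell mode: the shell parses the string before anything runs.
shell = run("echo hello")
```

Direct mode avoids quoting differences between /bin/sh and cmd.exe, which is why it suits commands whose arguments contain spaces, quotes, or metacharacters.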

Passes when all configured checks pass (exit code match + any stdout/stderr assertions). Fails otherwise.
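The pass rule can be read as a conjunction over whichever checks are configured. The sketch below is one possible rendering of that logic. The function evaluate and its failure messages are hypothetical, and Python's re stands in for whatever regex engine the grader actually uses.

```python
import re
from typing import Any, Dict, List, Tuple


def evaluate(config: Dict[str, Any], exit_code: int,
             stdout: str, stderr: str) -> Tuple[bool, List[str]]:
    """Return (passed, failures); only checks present in config are applied."""
    failures: List[str] = []

    # The exit code is always checked; it defaults to 0 when not configured.
    expected = config.get("expected_exit_code", 0)
    if exit_code != expected:
        failures.append(f"exit code {exit_code} (expected {expected})")

    if "stdout_contains" in config and config["stdout_contains"] not in stdout:
        failures.append("stdout_contains not found")
    if "stderr_contains" in config and config["stderr_contains"] not in stderr:
        failures.append("stderr_contains not found")
    if "stdout_matches" in config and not re.search(config["stdout_matches"], stdout):
        failures.append("stdout_matches did not match")
    if "stdout_not_matches" in config and re.search(config["stdout_not_matches"], stdout):
        failures.append("stdout_not_matches matched")

    return (not failures, failures)


config = {"expected_exit_code": 0, "stdout_contains": "OK",
          "stdout_not_matches": "ERROR|FATAL|WARN"}
ok, _ = evaluate(config, 0, "validation OK\n", "")
bad, reasons = evaluate(config, 1, "ERROR: invalid\n", "")
```

In the second call all three configured checks fail at once, so collecting every failure rather than stopping at the first gives a more useful report.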

This is a complex-static grader — deterministic but with side effects and I/O. Because of the I/O cost, it’s typically appropriate for CI and outer-loop evals rather than fast inner-loop checks.

# Run tests to verify agent's code works
- type: run-command
  config:
    command: "npm test"
    expected_exit_code: 0

# Check a file is valid JSON (shell command string mode)
- type: run-command
  config:
    command: 'node -e ''JSON.parse(require("fs").readFileSync("output.json"))'''
    expected_exit_code: 0

# Compile TypeScript
- type: run-command
  config:
    command: "npx tsc --noEmit"
    expected_exit_code: 0

# Direct exec with explicit args (cross-platform, reliable exit codes)
- type: run-command
  config:
    command: "node"
    args: ["validate.js", "--strict"]
    expected_exit_code: 0
    stdout_contains: "OK"

# Regex match on stdout (case-insensitive)
- type: run-command
  config:
    command: "dotnet"
    args: ["run"]
    stdout_matches: "(?i)hello world"

# Assert stdout does NOT contain error patterns
- type: run-command
  config:
    command: "node"
    args: ["app.js"]
    stdout_not_matches: "ERROR|FATAL|WARN"

# Run tests from a subdirectory within the workspace
- type: run-command
  config:
    command: "npm test"
    sub_path: "packages/core"

# Pass custom environment variables to the command
- type: run-command
  config:
    command: pytest
    args: [tests/test_output.py, -x]
    env:
      TEST_MODE: "eval"
      PYTHONPATH: "/app/src"
✔ Command 'npm test' exited with code 0 (expected 0)
✘ Command 'npm test' exited with code 1 (expected 0)