
Grader: program

| Property | Value |
| --- | --- |
| Determinism | static |
| Cost | low |
| Portability | t3a-scenario |
| Reference | reference-free |
| Temporal scope | trajectory-level |
| Score kind | code |
graders:
  - type: program
    config:
      program: "python"
      args: ["graders/check_output.py"]
      timeout: 60s
      sub_path: "src/app"
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| program | string | Yes | | The program to run (e.g. python, node, bash) |
| args | string[] | No | | Command-line arguments passed to the program |
| shell | boolean | No | false | Run the program using a shell instead of exec |
| sub_path | string | No | | Subdirectory under the workspace root to use as the working directory. Must stay within the workspace root. |
| timeout | duration | No | 60s | Maximum time the process can run before being terminated (e.g. 60s, 2m) |
| env | Record<string, string> | No | | Additional environment variables passed to the subprocess. Merged with the process environment. EVALUATE_WORKSPACE and EVALUATE_GRADER_INPUT are always set by the grader and cannot be specified in config.env. |

The grader executes the program in a child process and sets two environment variables so that your program can load the full evaluation context:

| Environment variable | Value |
| --- | --- |
| EVALUATE_WORKSPACE | Path to the workspace root (always the top-level workspace, even when sub_path changes the working directory) |
| EVALUATE_GRADER_INPUT | Path to a temporary JSON file containing the serialized GraderInput |

When sub_path is set, the child process’s working directory (cwd) is the resolved subdirectory, but EVALUATE_WORKSPACE still points to the workspace root.
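As a minimal sketch of how a Python grader script might pick up this context, the example below reads both variables and loads the GraderInput file. The helper names (`load_context`, `main`) and the README.md check are hypothetical and exist only for illustration. The GraderInput schema is not described here, so the sketch treats it as opaque JSON.

```python
import json
import os
import sys
from pathlib import Path


def load_context(environ=os.environ):
    """Return (workspace_root, grader_input) from the variables the grader sets."""
    workspace = Path(environ["EVALUATE_WORKSPACE"])
    # EVALUATE_GRADER_INPUT holds a path to a JSON file, not the JSON itself.
    with open(environ["EVALUATE_GRADER_INPUT"], encoding="utf-8") as fh:
        grader_input = json.load(fh)
    return workspace, grader_input


def main():
    workspace, grader_input = load_context()
    # With sub_path set, cwd is the subdirectory; the workspace root is not.
    print(f"cwd={Path.cwd()} workspace={workspace}", file=sys.stderr)
    # Hypothetical check: pass only if the workspace root has a README.md.
    return 0 if (workspace / "README.md").is_file() else 1


# Run as a grader only when the grader has set its variables.
if __name__ == "__main__" and "EVALUATE_GRADER_INPUT" in os.environ:
    sys.exit(main())
```

Resolving paths against EVALUATE_WORKSPACE rather than the current directory keeps the script working whether or not sub_path is configured.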

The reserved variables EVALUATE_WORKSPACE and EVALUATE_GRADER_INPUT are rejected if they appear in env (case-insensitive), because the grader owns their values.
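To make the case-insensitive rule concrete, here is a rough sketch of how such a merge and check could behave. It is not the grader's actual implementation: the function name `build_child_env` is hypothetical, and letting env entries override the process environment is an assumption, since the text above only says the two are merged.

```python
import os

RESERVED = {"EVALUATE_WORKSPACE", "EVALUATE_GRADER_INPUT"}


def build_child_env(config_env, base_env=None):
    """Merge config env over a base environment, rejecting reserved names."""
    base = dict(os.environ if base_env is None else base_env)
    for name in config_env:
        # Case-insensitive match: "evaluate_workspace" is rejected too.
        if name.upper() in RESERVED:
            raise ValueError(f"{name} is reserved and cannot be set in env")
    # Assumption: values from config env win over the process environment.
    base.update(config_env)
    return base
```

Under this sketch, `{"API_ENDPOINT": "https://test.example.com"}` is merged in, while `{"evaluate_workspace": "/tmp"}` raises an error.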

Your program communicates its result back in one of two ways:

Print nothing to stdout (stderr is fine). Exit 0 to pass, non-zero to fail. The score is 1 on pass, 0 on fail.

Print a JSON object conforming to the GraderResult schema to stdout. This lets you return a custom score (between 0 and 1 inclusive), evidence text, and metadata. Do not print anything else to stdout — use stderr for diagnostics.

{
  "name": "my-custom-check",
  "passed": true,
  "score": 0.85,
  "evidence": "14 of 16 assertions passed",
  "kind": "code"
}

If stdout contains text that is not valid JSON, or the JSON does not match the GraderResult schema, the grader fails with a 0 score.
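The JSON mode could be sketched in Python as follows. Only the five fields from the example above are used. The helper names, the 0.8 pass threshold, and the sample checks are assumptions made for illustration, and the full GraderResult schema may allow more fields than shown.

```python
import json
import sys


def build_result(passed_count, total, threshold=0.8):
    """Build a dict with the GraderResult fields shown in the example above."""
    score = passed_count / total if total else 0.0
    return {
        "name": "my-custom-check",
        # Assumption: the check passes at or above a hypothetical threshold.
        "passed": score >= threshold,
        "score": round(score, 4),  # always within [0, 1]
        "evidence": f"{passed_count} of {total} assertions passed",
        "kind": "code",
    }


def emit(result, out=sys.stdout):
    # stdout carries the JSON object and nothing else.
    out.write(json.dumps(result))


def main():
    # Hypothetical checks standing in for real assertions.
    checks = [1 + 1 == 2, "a".upper() == "A", len([]) == 0]
    # Diagnostics go to stderr so stdout stays valid JSON.
    print(f"ran {len(checks)} checks", file=sys.stderr)
    emit(build_result(sum(checks), len(checks)))


if __name__ == "__main__":
    main()
```

Keeping every diagnostic on stderr is the important design choice here: a single stray print to stdout would make the output unparseable and the grader would score 0.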

# Python script that inspects the workspace
- type: program
  config:
    program: "python"
    args: ["graders/validate_schema.py"]
# Bash script using exit-code mode
- type: program
  config:
    program: "bash"
    args: ["graders/check.sh"]
    shell: true
# Node.js script returning a GraderResult JSON
- type: program
  config:
    program: "node"
    args: ["graders/score.js"]
    timeout: 120s
# Pass custom environment variables to the grader
- type: program
  config:
    program: python
    args: [-m, my_checker]
    env:
      API_ENDPOINT: "https://test.example.com"
      EXPECTED_STATUS: "active"
✔ Grader exited successfully
✘ Grader exited with exit code 1
✘ Grader timed out
✘ Failed to start grader program: ENOENT (spawn)
✘ Grader returned unparseable JSON output on stdout: SyntaxError: …
✘ Grader output did not match schema