Trajectory Format
A trajectory is the complete record of a single agent run. Saved as JSON via --output-dir, trajectories can be re-graded, compared, and analyzed.
Top-level structure
Section titled “Top-level structure”interface Trajectory { id: string; // Unique run identifier stimulus: Stimulus; // The prompt + config that produced this events: TrajectoryEvent[]; // Flat array of typed events metrics: TrajectoryMetrics; // Computed aggregates output: string; // Final agent output text workDir: string; // Workspace path during run metadata: TrajectoryMetadata; // Execution context workspaceStatus?: WorkspaceStatus; // How the workspace was produced}workspaceStatus
Section titled “workspaceStatus”Indicates how the trajectory’s workDir was produced. Used by the pipeline to determine whether file-based graders can run.
| Value | Meaning |
|---|---|
"local" | Executor wrote files directly to the local workDir (default when absent) |
"materialized" | Remote executor synced files back via finalizeWorkspace() |
"remote" | Files only exist on the remote executor; file-based graders will error |
Events
Section titled “Events”Events are a discriminated union on the type field. Every event has a timestamp.
tool_call
Section titled “tool_call”Agent invoked a tool.
{ "type": "tool_call", "timestamp": "2025-01-15T10:30:00.000Z", "data": { "toolName": "write_file", "toolCallId": "call_abc123", "arguments": { "path": "add.test.js", "content": "..." } }}tool_result
Section titled “tool_result”Tool returned a result.
{ "type": "tool_result", "timestamp": "2025-01-15T10:30:01.000Z", "data": { "toolName": "write_file", "toolCallId": "call_abc123", "success": true, "result": "File written successfully" }}token_usage
Section titled “token_usage”LLM token counts from a single API call.
{ "type": "token_usage", "timestamp": "2025-01-15T10:30:00.500Z", "data": { "inputTokens": 1500, "outputTokens": 350, "model": "gpt-5.5", "cacheReadTokens": 200, "cacheWriteTokens": 0 }}turn_start / turn_end
Section titled “turn_start / turn_end”Conversation turn boundaries.
{ "type": "turn_start", "timestamp": "...", "data": { "turnId": "turn-1" } }{ "type": "turn_end", "timestamp": "...", "data": { "turnId": "turn-1" } }assistant_message / user_message
Section titled “assistant_message / user_message”Text messages in the conversation.
{ "type": "assistant_message", "timestamp": "...", "data": { "content": "I'll write tests for..." } }{ "type": "user_message", "timestamp": "...", "data": { "content": "Write tests for add()" } }skill_activation
Section titled “skill_activation”A skill was loaded by the agent.
{ "type": "skill_activation", "timestamp": "...", "data": { "name": "test-writer", "path": "/skills/test-writer/SKILL.md", "pluginName": "copilot", "allowedTools": ["write_file", "read_file"] }}Something went wrong during the run.
{ "type": "error", "timestamp": "...", "data": { "message": "Request timed out", "type": "TimeoutError", "code": 408 }}Metrics
Section titled “Metrics”Computed from events after the run completes:
interface TrajectoryMetrics { tokenUsage: { inputTokens: number; outputTokens: number; totalTokens: number; cacheReadTokens: number; cacheWriteTokens: number; callCount: number; byModel: Record< string, { inputTokens: number; outputTokens: number; callCount: number; } >; }; toolCallCount: number; toolCallBreakdown: Record<string, number>; // { "write_file": 3, "read_file": 1 } skillActivationCount: number; skillActivationBreakdown: Record<string, number>; // { "test-writer": 1 } turnCount: number; wallTimeMs: number; errorCount: number;}Metadata
Section titled “Metadata”interface TrajectoryMetadata { model: string; // Model used for execution skillsLoaded: string[]; // Names of skills that were loaded startedAt: Date; completedAt: Date; executor: string; // Which executor ran this sessionID: string;}Working with trajectories
Section titled “Working with trajectories”# Save trajectories (one trial-result record per trial, in results.jsonl)vally eval --eval-spec eval.yaml --output-dir ./results
# Re-grade trajectories from a previous runcat ./results/*/results.jsonl | vally grade --eval-spec eval.yaml
# Inspect trial trajectories with jq. `results.jsonl` interleaves# `trial-result` records (with a nested `trajectory` field) with a final# `run-summary` line, so filter on `.type == "trial-result"` first.cat ./results/*/results.jsonl | jq 'select(.type == "trial-result") | .trajectory.metrics'cat ./results/*/results.jsonl | jq 'select(.type == "trial-result") | .trajectory.events[] | select(.type == "tool_call") | .data.toolName'cat ./results/*/results.jsonl | jq 'select(.type == "trial-result") | .trajectory.events | length'