Evaluation Logs

The evaluation log stores the task completion assessment produced by the EvaluationAgent. It is saved as `evaluation.log` in JSON format and contains a single entry that evaluates the entire session.

Log Structure

The evaluation log contains the following fields:

| Field | Description | Type |
| --- | --- | --- |
| `complete` | Overall completion status: `yes`, `no`, or `unsure` | String |
| `sub_scores` | Breakdown of the evaluation into sub-goals, each with a name and an evaluation status | List of Dictionaries |
| `reason` | Detailed justification based on screenshots and the execution trajectory | String |
| `level` | Evaluation scope (e.g., `session`) | String |
| `request` | Original user request being evaluated | String |
| `type` | Log entry type, set to `evaluation_result` | String |
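
As a rough illustration of how these fields might be consumed, the sketch below parses one entry from JSON text and checks that the documented fields are present. The function name `parse_evaluation_entry` and the validation rules are assumptions made for this example; they are not part of the EvaluationAgent itself.

```python
import json

# Field names are taken from the table above.
# The strictness of the checks is an assumption made for this sketch.
REQUIRED_FIELDS = {"complete", "sub_scores", "reason", "level", "request", "type"}
ALLOWED_STATUSES = {"yes", "no", "unsure"}


def parse_evaluation_entry(text: str) -> dict:
    """Parse one evaluation log entry from JSON text and check its fields."""
    entry = json.loads(text)
    if not isinstance(entry, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(entry)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if entry["complete"] not in ALLOWED_STATUSES:
        raise ValueError(f"unexpected 'complete' value: {entry['complete']!r}")
    return entry
```

To use it, pass in the contents of `evaluation.log`, for example the string returned by `pathlib.Path(...).read_text(encoding="utf-8")`. The location of the file depends on where the session logs are written.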

Sub-score Structure

Each item in `sub_scores` contains:

| Field | Description | Type |
| --- | --- | --- |
| `name` | Name of the sub-goal being evaluated | String |
| `evaluation` | Completion status: `yes`, `no`, or `unsure` | String |
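
Given that structure, one possible way to summarize the sub-goals is to count the statuses and collect the names of any sub-goal not marked `yes`. This is a minimal sketch: `summarize_sub_scores` is a hypothetical helper, and it assumes the entry has already been loaded into a Python dictionary.

```python
from collections import Counter


def summarize_sub_scores(entry: dict) -> tuple[Counter, list[str]]:
    """Count sub-goal statuses and list sub-goals not evaluated as 'yes'."""
    sub_scores = entry.get("sub_scores", [])
    # Tally how many sub-goals fall under each status (yes / no / unsure).
    counts = Counter(item["evaluation"] for item in sub_scores)
    # Keep the names of sub-goals that failed or could not be confirmed.
    unresolved = [item["name"] for item in sub_scores if item["evaluation"] != "yes"]
    return counts, unresolved
```

The unresolved names give a quick view of which parts of a request need attention when `complete` is `no` or `unsure`.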

Example

```json
{
    "complete": "yes",
    "sub_scores": [
        {
            "name": "Open application",
            "evaluation": "yes"
        },
        {
            "name": "Complete data entry",
            "evaluation": "yes"
        }
    ],
    "reason": "All sub-tasks completed successfully. Screenshots show the application was opened and data was correctly entered.",
    "level": "session",
    "request": "Open the application and enter data",
    "type": "evaluation_result"
}
```