Evaluation Logs

The evaluation log stores the task-completion assessment produced by the EvaluationAgent. It is saved as `evaluation.log` in JSON format and contains a single entry that evaluates the entire session.

Log Structure

The evaluation log contains the following fields:

| Field | Description | Type |
|---|---|---|
| `complete` | Overall completion status: `"yes"`, `"no"`, or `"unsure"` | String |
| `sub_scores` | Breakdown of the evaluation into sub-goals, each with a name and an evaluation status | List of dictionaries |
| `reason` | Detailed justification based on screenshots and the execution trajectory | String |
| `level` | Evaluation scope (e.g., `"session"`) | String |
| `request` | Original user request being evaluated | String |
| `type` | Log entry type, always `"evaluation_result"` | String |
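
As a rough illustration of the field list above, the following Python sketch checks a parsed entry for the expected fields, types, and status values. The names `validate_evaluation_entry`, `EXPECTED_FIELDS`, and `ALLOWED_STATUS` are hypothetical helpers written for this page, not part of the framework, and the checks assume the field table is complete.

```python
from typing import Any

# Hypothetical schema transcribed by hand from the field table; not exported by the framework.
EXPECTED_FIELDS: dict[str, type] = {
    "complete": str,
    "sub_scores": list,
    "reason": str,
    "level": str,
    "request": str,
    "type": str,
}

# Status values documented for the "complete" field.
ALLOWED_STATUS = {"yes", "no", "unsure"}


def validate_evaluation_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in an evaluation entry (empty if none)."""
    problems: list[str] = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    # Only check the status value when it is present and is a string.
    if isinstance(entry.get("complete"), str) and entry["complete"] not in ALLOWED_STATUS:
        problems.append("complete must be yes, no, or unsure")
    if isinstance(entry.get("type"), str) and entry["type"] != "evaluation_result":
        problems.append("type must be evaluation_result")
    return problems
```

An empty list means the entry matches the documented structure; any other result names the fields that differ.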

Sub-score Structure

Each item in sub_scores contains:

| Field | Description | Type |
|---|---|---|
| `name` | Name of the sub-goal being evaluated | String |
| `evaluation` | Completion status: `"yes"`, `"no"`, or `"unsure"` | String |
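
A small helper like the one below could tally the `evaluation` values in `sub_scores` and list the sub-goals that were not marked `yes`. `summarize_sub_scores` is a hypothetical name used only for illustration, and treating a missing `evaluation` value as `unsure` is an assumption made for this sketch, not documented behavior.

```python
from collections import Counter
from typing import Any


def summarize_sub_scores(sub_scores: list[dict[str, Any]]) -> tuple[Counter, list[str]]:
    """Count sub-goal statuses and collect the names of sub-goals not marked 'yes'."""
    counts: Counter = Counter()
    not_done: list[str] = []
    for item in sub_scores:
        # Assumption for this sketch: a missing status is treated as "unsure".
        status = item.get("evaluation", "unsure")
        counts[status] += 1
        if status != "yes":
            not_done.append(item.get("name", "<unnamed>"))
    return counts, not_done
```

The second return value is a quick way to see which sub-goals may explain a `complete` value of `"no"` or `"unsure"`.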

Example

```json
{
  "complete": "yes",
  "sub_scores": [
    {
      "name": "Open application",
      "evaluation": "yes"
    },
    {
      "name": "Complete data entry",
      "evaluation": "yes"
    }
  ],
  "reason": "All sub-tasks completed successfully. Screenshots show the application was opened and data was correctly entered.",
  "level": "session",
  "request": "Open the application and enter data",
  "type": "evaluation_result"
}
```
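
Because the file holds a single JSON entry, it can be read with Python's standard `json` module. This sketch assumes the whole file is one JSON object (pretty-printed or on one line); `parse_evaluation_log`, `load_evaluation_log`, and `describe_evaluation` are hypothetical names used only here.

```python
import json
from pathlib import Path
from typing import Any, Union


def parse_evaluation_log(text: str) -> dict[str, Any]:
    """Parse the text of evaluation.log, assumed to hold exactly one JSON object."""
    entry = json.loads(text)
    if not isinstance(entry, dict):
        raise ValueError("expected a single JSON object in evaluation.log")
    return entry


def load_evaluation_log(path: Union[str, Path]) -> dict[str, Any]:
    """Read evaluation.log from disk and parse its single entry."""
    return parse_evaluation_log(Path(path).read_text(encoding="utf-8"))


def describe_evaluation(entry: dict[str, Any]) -> str:
    """Build a short human-readable summary of an evaluation entry."""
    lines = [f"Request: {entry['request']}", f"Complete: {entry['complete']}"]
    for item in entry.get("sub_scores", []):
        lines.append(f"  - {item['name']}: {item['evaluation']}")
    return "\n".join(lines)
```

For instance, `describe_evaluation(load_evaluation_log("path/to/evaluation.log"))` would return a short text summary of the request, its overall status, and each sub-goal; the path shown is a placeholder.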