Screenshot Logs

UFO captures screenshots at every step for debugging and evaluation purposes. All screenshots are stored in the logs/{task_name}/ directory.

Screenshot Types

1. Clean Screenshots

Unmodified screenshots of the desktop or application window.

File naming:

  • Step screenshots: action_step{step_number}.png
  • Subtask completion: action_round_{round_id}_sub_round_{sub_task_id}_final.png
  • Round completion: action_round_{round_id}_final.png
  • Session completion: action_step_final.png
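The naming scheme above can be recognized mechanically, which helps when post-processing a log directory. The following is a minimal sketch, not part of UFO: the function name `classify_clean_screenshot` and the kind labels are hypothetical, and it assumes the step, round, and subtask identifiers are written as plain decimal integers.

```python
import re
from typing import Optional, Tuple

# Patterns mirror the naming list above. The labels ("step", "round_final", ...)
# are illustrative names chosen for this sketch, not UFO identifiers.
# Assumption: step, round, and subtask identifiers are decimal integers.
CLEAN_PATTERNS = [
    ("session_final", re.compile(r"^action_step_final\.png$")),
    (
        "subtask_final",
        re.compile(
            r"^action_round_(?P<round_id>\d+)_sub_round_(?P<sub_task_id>\d+)_final\.png$"
        ),
    ),
    ("round_final", re.compile(r"^action_round_(?P<round_id>\d+)_final\.png$")),
    ("step", re.compile(r"^action_step(?P<step_number>\d+)\.png$")),
]


def classify_clean_screenshot(filename: str) -> Optional[Tuple[str, dict]]:
    """Return (kind, ids) for a clean screenshot file name, or None if it is not one."""
    for kind, pattern in CLEAN_PATTERNS:
        match = pattern.match(filename)
        if match:
            ids = {key: int(value) for key, value in match.groupdict().items()}
            return kind, ids
    return None
```

File names that carry a suffix such as _annotated do not match any of these patterns, so the function returns None for them.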

Example:

Clean Screenshot

2. Annotated Screenshots

Screenshots with UI controls labeled using the Set-of-Mark paradigm. Each interactive control is marked with a number for reference.

File naming: action_step{step_number}_annotated.png

Example:

Annotated Screenshot

Only control types configured in CONTROL_LIST (in config_dev.yaml) are annotated. Different control types use different colors, configurable via ANNOTATION_COLORS.

3. Concatenated Screenshots

Clean and annotated screenshots placed side-by-side for comparison.

File naming: action_step{step_number}_concat.png

Example:

Concatenated Screenshot

Set CONCAT_SCREENSHOT in config_dev.yaml to choose whether the LLM receives the concatenated screenshot or the clean and annotated screenshots as separate images.
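As a rough illustration of this switch, the sketch below selects the image files for one step from a boolean that stands in for the CONCAT_SCREENSHOT value. The function name is hypothetical, and reading "separate" as the clean plus the annotated image is an assumption drawn from the descriptions in this section, not from UFO's code.

```python
from typing import List


def screenshots_for_llm(step_number: int, concat_screenshot: bool) -> List[str]:
    """Pick the screenshot file names for one step (hypothetical helper).

    concat_screenshot stands in for the CONCAT_SCREENSHOT setting.
    """
    if concat_screenshot:
        # One side-by-side image that already contains both views.
        return [f"action_step{step_number}_concat.png"]
    # Assumption: "separate" means the clean and annotated images together.
    return [
        f"action_step{step_number}.png",
        f"action_step{step_number}_annotated.png",
    ]
```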

4. Selected Control Screenshots

Close-up view of the control element selected for interaction in the previous step.

File naming: action_step{step_number}_selected_controls.png

Example:

Selected Control Screenshot

Set INCLUDE_LAST_SCREENSHOT in config_dev.yaml to enable or disable sending the selected control screenshot to the LLM.
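When reviewing a run, it can be convenient to collect the four per-step variants described in this section under their step number. The sketch below does this from the file names alone. The helper names `group_by_step` and `group_log_dir` and the "clean" label are hypothetical, and the code assumes step numbers are decimal integers.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable

# Matches action_step{n}.png plus the three suffixed variants named above.
# Assumption: {n} is a decimal integer. Final and round screenshots are skipped.
STEP_FILE = re.compile(
    r"^action_step(?P<step>\d+)"
    r"(?:_(?P<variant>annotated|concat|selected_controls))?\.png$"
)


def group_by_step(filenames: Iterable[str]) -> Dict[int, Dict[str, str]]:
    """Map step number -> {variant: file name}; unsuffixed files are 'clean'."""
    steps: Dict[int, Dict[str, str]] = defaultdict(dict)
    for name in filenames:
        match = STEP_FILE.match(name)
        if match is None:
            continue  # not a per-step screenshot
        variant = match.group("variant") or "clean"
        steps[int(match.group("step"))][variant] = name
    return dict(steps)


def group_log_dir(task_dir: str) -> Dict[int, Dict[str, str]]:
    """Apply group_by_step to the PNG files in a logs/{task_name}/ directory."""
    return group_by_step(path.name for path in Path(task_dir).glob("*.png"))
```

For example, calling `group_log_dir("logs/my_task")` (with a task name of your own) returns one dictionary entry per step that has at least one screenshot on disk.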