Round
A Round is a single request-response cycle within a Session, orchestrating agents through a state machine to execute commands until the user's request is fulfilled.
Quick Reference:
- Lifecycle? See Round Lifecycle
- State machine? See State Machine
- Agent switching? See Agent Orchestration
- Snapshots? See Snapshot Capture
Overview
A Round represents one complete request-response interaction:
- Input: User request (e.g., "Send an email to John")
- Processing: Agent state machine execution
- Output: Request fulfilled or error state
Round in Context
Round Lifecycle
State Machine Overview
Core Execution Loop
async def run(self) -> None:
    """
    Run the round asynchronously.
    """
    while not self.is_finished():
        # 1. Agent processes current state
        await self.agent.handle(self.context)
        # 2. State machine transitions
        self.state = self.agent.state.next_state(self.agent)
        # 3. Agent switching (HostAgent ↔ AppAgent)
        self.agent = self.agent.state.next_agent(self.agent)
        self.agent.set_state(self.state)
        # 4. Snapshot capture at subtask boundaries
        if self.state.is_subtask_end():
            time.sleep(configs["SLEEP_TIME"])
            await self.capture_last_snapshot(sub_round_id=self.subtask_amount)
            self.subtask_amount += 1
    # 5. Add request to blackboard
    self.agent.blackboard.add_requests(
        {f"request_{self.id}": self.request}
    )
    # 6. Final snapshot
    if self.application_window is not None:
        await self.capture_last_snapshot()
    # 7. Evaluation (optional)
    if self._should_evaluate:
        await self.evaluation()
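The loop above can be exercised without any UI or LLM. The following is a minimal sketch, not UFO's implementation: `StubAgent`, `run_round`, and the dict-based context are hypothetical stand-ins, and the agent switching, snapshot, and evaluation steps are left out.

```python
import asyncio
from enum import Enum


class AgentState(Enum):
    START = "START"
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"


class StubAgent:
    """Hypothetical agent that reports FINISH after a fixed number of steps."""

    def __init__(self, steps_needed: int) -> None:
        self.steps_needed = steps_needed
        self.steps_done = 0

    async def handle(self, context: dict) -> None:
        # Stand-in for observe/reason/act: just record the step in the context.
        self.steps_done += 1
        context.setdefault("history", []).append(f"step {self.steps_done}")

    def next_state(self) -> AgentState:
        if self.steps_done >= self.steps_needed:
            return AgentState.FINISH
        return AgentState.CONTINUE


async def run_round(agent: StubAgent, context: dict) -> AgentState:
    state = AgentState.START
    # Same shape as the loop above: handle, then transition, until a terminal state.
    while state not in (AgentState.FINISH, AgentState.ERROR):
        await agent.handle(context)
        state = agent.next_state()
    return state


context: dict = {}
final_state = asyncio.run(run_round(StubAgent(steps_needed=3), context))
print(final_state.name, context["history"])
```

The stub makes the termination condition explicit: the loop only ends when a transition yields FINISH or ERROR.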
Lifecycle Stages
1. Initialization
Created by session's create_new_round():
round = Round(
    task="email_task",
    context=session.context,
    request="Send an email to John",
    id=0  # Round number
)
Initialization sets:
| Property | Source | Description |
|---|---|---|
| task | Session | Task name for logging |
| context | Session | Shared context object |
| request | User input | Natural language request |
| id | Round counter | Sequential round number |
| agent | Initial agent | Usually HostAgent (Windows) or LinuxAgent |
| state | Initial state | Usually START state |
2. Agent Handle
Each loop iteration calls agent.handle(context):
await self.agent.handle(self.context)
What happens:
- Observation: Agent observes UI state
- Reasoning: LLM generates plan and actions
- Action: Commands sent to dispatcher
- Execution: Commands executed locally or remotely
- Results: Results stored in context
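The five phases listed above can be pictured as a small pipeline. This is only a sketch under stated assumptions: `Command`, `observe`, `reason`, and `dispatch` are hypothetical names, the LLM call is replaced by a fixed plan, and the context is a plain dict rather than the real Context object.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Command:
    """Hypothetical command record; the real dispatcher payload may differ."""

    name: str
    args: dict[str, Any] = field(default_factory=dict)


async def observe(context: dict[str, Any]) -> str:
    # Observation: read the current UI state (stubbed as a context entry).
    return context.get("ui_state", "unknown")


async def reason(observation: str, request: str) -> list[Command]:
    # Reasoning: a real agent would query an LLM; this stub returns a fixed plan.
    return [Command("click", {"target": "New Email"})]


async def dispatch(commands: list[Command]) -> list[str]:
    # Action and execution: pretend every command succeeds.
    return [f"{command.name}: ok" for command in commands]


async def handle(context: dict[str, Any]) -> None:
    observation = await observe(context)
    commands = await reason(observation, context["request"])
    # Results: store the outcomes where later steps can read them.
    context["results"] = await dispatch(commands)


context = {"request": "Send an email to John", "ui_state": "Outlook inbox"}
asyncio.run(handle(context))
print(context["results"])
```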
3. State Transition
After agent handling, state machine transitions:
self.state = self.agent.state.next_state(self.agent)
State Transitions:
| Current State | Condition | Next State |
|---|---|---|
| START | Initial | CONTINUE |
| CONTINUE | More actions needed | CONTINUE |
| CONTINUE | Task complete | FINISH |
| CONTINUE | Error occurred | ERROR |
| FINISH | Always | Round ends |
| ERROR | Always | Round ends |
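The transition table can be encoded directly as data. The sketch below assumes a hypothetical `Outcome` enum that summarizes the agent's last step; the real `next_state` derives this from the agent itself.

```python
from enum import Enum


class AgentState(Enum):
    START = "START"
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"


class Outcome(Enum):
    """Hypothetical summary of what the agent decided in the last step."""

    MORE_ACTIONS = "more_actions"
    TASK_COMPLETE = "task_complete"
    ERROR = "error"


TERMINAL_STATES = {AgentState.FINISH, AgentState.ERROR}

# Rows of the table whose current state is CONTINUE.
_FROM_CONTINUE = {
    Outcome.MORE_ACTIONS: AgentState.CONTINUE,
    Outcome.TASK_COMPLETE: AgentState.FINISH,
    Outcome.ERROR: AgentState.ERROR,
}


def next_state(current: AgentState, outcome: Outcome) -> AgentState:
    if current in TERMINAL_STATES:
        raise ValueError(f"{current.name} is terminal; the round has ended")
    if current is AgentState.START:
        return AgentState.CONTINUE  # START always moves to CONTINUE
    return _FROM_CONTINUE[outcome]


print(next_state(AgentState.CONTINUE, Outcome.TASK_COMPLETE).name)
```

Keeping the rows in a dictionary makes it easy to check that every outcome from CONTINUE has exactly one target state.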
4. Agent Switching
Determine which agent handles next step:
self.agent = self.agent.state.next_agent(self.agent)
self.agent.set_state(self.state)
Agent Switching Logic (Windows):
| Current Agent | Condition | Next Agent |
|---|---|---|
| HostAgent | Application selected | AppAgent |
| AppAgent | Need different app | HostAgent |
| AppAgent | Same app continues | AppAgent |
| HostAgent | Task complete | HostAgent (finish) |
Agent Switching Logic (Linux):
| Current Agent | Condition | Next Agent |
|---|---|---|
| LinuxAgent | Always | LinuxAgent (no switching) |
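Both switching tables fit in one small function. This is an illustrative sketch: agents are represented by their names as strings, and `Decision` is a hypothetical record of the conditions in the tables, not a UFO type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical flags mirroring the 'Condition' column of the tables."""

    app_selected: bool = False
    needs_different_app: bool = False


def next_agent(current: str, decision: Decision) -> str:
    if current == "LinuxAgent":
        return "LinuxAgent"  # single tier: no switching
    if current == "HostAgent":
        # Hand over once an application is selected; otherwise stay (e.g. to finish).
        return "AppAgent" if decision.app_selected else "HostAgent"
    if current == "AppAgent":
        # Go back to the host only when a different application is needed.
        return "HostAgent" if decision.needs_different_app else "AppAgent"
    raise ValueError(f"unknown agent: {current}")


print(next_agent("HostAgent", Decision(app_selected=True)))
print(next_agent("AppAgent", Decision(needs_different_app=True)))
```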
5. Subtask Boundary Capture
Capture snapshot when subtask ends:
if self.state.is_subtask_end():
    time.sleep(configs["SLEEP_TIME"])  # Let UI settle
    await self.capture_last_snapshot(sub_round_id=self.subtask_amount)
    self.subtask_amount += 1
Subtask End Conditions:
- Agent switched (HostAgent ↔ AppAgent)
- Major UI change detected
- Explicit subtask boundary in plan
Captured Data:
- Window screenshot: action_round_{id}_sub_round_{sub_id}_final.png
- UI tree (if enabled): ui_tree_round_{id}_sub_round_{sub_id}_final.json
- Desktop screenshot (if enabled): desktop_round_{id}_sub_round_{sub_id}_final.png
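The file patterns differ only in their prefix and in whether a sub-round segment is present. A hypothetical helper (`snapshot_names` is not part of the documented API) shows how the names could be derived:

```python
from typing import Optional


def snapshot_names(round_id: int, sub_round_id: Optional[int] = None) -> dict[str, str]:
    """Build the artifact file names for a round or for one of its subtasks."""
    suffix = f"round_{round_id}"
    if sub_round_id is not None:
        # Subtask snapshots carry an extra sub_round segment.
        suffix += f"_sub_round_{sub_round_id}"
    return {
        "window": f"action_{suffix}_final.png",
        "ui_tree": f"ui_tree_{suffix}_final.json",
        "desktop": f"desktop_{suffix}_final.png",
    }


print(snapshot_names(0, 1)["window"])  # action_round_0_sub_round_1_final.png
print(snapshot_names(0)["ui_tree"])    # ui_tree_round_0_final.json
```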
6. Finish Check
def is_finished(self) -> bool:
    """Check if round is complete."""
    return self.state in [AgentState.FINISH, AgentState.ERROR]
Loop continues until state is FINISH or ERROR.
7. Final Snapshot
After loop exits:
if self.application_window is not None:
    await self.capture_last_snapshot()
Final snapshot captures the end state of the application for logging and evaluation.
8. Evaluation
Optional evaluation of round success:
if self._should_evaluate:
    await self.evaluation()
Evaluation checks:
- Was the request fulfilled?
- Quality of actions taken
- Efficiency metrics
State Machine
AgentState Enum
class AgentState(Enum):
    START = "START"
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"
State Behaviors
| State | Meaning | Transitions To |
|---|---|---|
| START | Initial state | CONTINUE |
| CONTINUE | Actively processing | CONTINUE, FINISH, ERROR |
| FINISH | Successfully complete | Round ends |
| ERROR | Fatal error occurred | Round ends |
State Methods
Each state implements:
class StateInterface:
    def next_state(self, agent) -> AgentState:
        """Determine next state based on agent's decision."""
        pass
    def next_agent(self, agent) -> Agent:
        """Determine next agent to handle the request."""
        pass
    def is_subtask_end(self) -> bool:
        """Check if current state marks subtask boundary."""
        pass
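A concrete state might look like the following. This is a sketch rather than one of UFO's state classes: it assumes the agent exposes a hypothetical `status` string, and it treats an unrecognized status as an error, which is a design choice of this example only.

```python
from enum import Enum
from types import SimpleNamespace


class AgentState(Enum):
    START = "START"
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"


class ContinueState:
    """Illustrative state that implements the three interface methods."""

    def next_state(self, agent) -> AgentState:
        try:
            # Map the agent's reported status (assumed attribute) onto the enum.
            return AgentState(agent.status)
        except ValueError:
            # Unknown status: fail closed instead of looping forever.
            return AgentState.ERROR

    def next_agent(self, agent):
        return agent  # this state never switches agents

    def is_subtask_end(self) -> bool:
        return False  # an ordinary step is not a subtask boundary


agent = SimpleNamespace(status="FINISH")
print(ContinueState().next_state(agent).name)
```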
Agent Orchestration
Windows Two-Tier Architecture
Linux Single-Tier Architecture
Snapshot Capture
capture_last_snapshot()
async def capture_last_snapshot(self, sub_round_id: Optional[int] = None) -> None
Purpose: Capture UI state for logging, debugging, and evaluation.
Captured Artifacts:
| Artifact | File Pattern | Purpose |
|---|---|---|
| Window Screenshot | action_round_{id}_final.png | Visual state |
| Subtask Screenshot | action_round_{id}_sub_round_{sub_id}_final.png | Subtask boundary |
| UI Tree | ui_tree_round_{id}_final.json | Control structure |
| Desktop Screenshot | desktop_round_{id}_final.png | Full desktop (if enabled) |
Example Output:
logs/task_name/
├── action_round_0_sub_round_0_final.png ← After HostAgent selects Outlook
├── action_round_0_sub_round_1_final.png ← After AppAgent composes email
├── action_round_0_final.png ← Final state after sending
├── ui_trees/
│ ├── ui_tree_round_0_sub_round_0_final.json
│ ├── ui_tree_round_0_sub_round_1_final.json
│ └── ui_tree_round_0_final.json
└── desktop_round_0_final.png
save_ui_tree()
async def save_ui_tree(self, save_path: str)
Saves the control tree as JSON for analysis:
{
  "root": {
    "control_type": "Window",
    "name": "Outlook",
    "children": [
      {
        "control_type": "Button",
        "name": "New Email",
        "automation_id": "btn_new_email",
        "bounding_box": [100, 50, 150, 30]
      }
    ]
  }
}
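A saved tree in this shape can be analyzed with a short recursive walk. The helper name `iter_controls` is hypothetical, and the sketch assumes every node has `control_type` and `name` keys with an optional `children` list, as in the sample above.

```python
import json
from typing import Iterator

UI_TREE_JSON = """
{
  "root": {
    "control_type": "Window",
    "name": "Outlook",
    "children": [
      {
        "control_type": "Button",
        "name": "New Email",
        "automation_id": "btn_new_email",
        "bounding_box": [100, 50, 150, 30]
      }
    ]
  }
}
"""


def iter_controls(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.get("children", []):
        yield from iter_controls(child)


tree = json.loads(UI_TREE_JSON)
buttons = [
    control["name"]
    for control in iter_controls(tree["root"])
    if control["control_type"] == "Button"
]
print(buttons)  # ['New Email']
```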
Properties
Auto-Syncing Properties
Properties that sync with context automatically:
@property
def step(self) -> int:
    """Current step number in this round."""
    return self._context.get(ContextNames.ROUND_STEP).get(self.id, 0)
@property
def cost(self) -> float:
    """Total cost for this round."""
    return self._context.get(ContextNames.ROUND_COST).get(self.id, 0)
@property
def subtask_amount(self) -> int:
    """Number of subtasks completed."""
    return self._context.get(ContextNames.ROUND_SUBTASK_AMOUNT).get(self.id, 0)
@subtask_amount.setter
def subtask_amount(self, value: int) -> None:
    """Set subtask amount in context."""
    self._context.current_round_subtask_amount = value
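The point of these properties is that the round holds no counters of its own; every read goes through the shared context. A minimal sketch of that pattern, using a hypothetical dict-backed `MiniContext` in place of the real Context and ContextNames:

```python
class MiniContext:
    """Hypothetical dict-backed context keyed by round id."""

    def __init__(self) -> None:
        self._data: dict[str, dict[int, int]] = {"ROUND_SUBTASK_AMOUNT": {}}

    def get(self, key: str) -> dict[int, int]:
        return self._data[key]


class MiniRound:
    def __init__(self, context: MiniContext, round_id: int) -> None:
        self._context = context
        self._id = round_id

    @property
    def subtask_amount(self) -> int:
        # Read through to the shared context on every access; nothing is cached.
        return self._context.get("ROUND_SUBTASK_AMOUNT").get(self._id, 0)

    @subtask_amount.setter
    def subtask_amount(self, value: int) -> None:
        self._context.get("ROUND_SUBTASK_AMOUNT")[self._id] = value


shared = MiniContext()
first, second = MiniRound(shared, 0), MiniRound(shared, 1)
first.subtask_amount += 1
first.subtask_amount += 1
print(first.subtask_amount, second.subtask_amount)  # 2 0
```

Because the values live in the context, two rounds that share it keep separate counters, and anything else holding the context sees the updates immediately.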
Static Properties
@property
def request(self) -> str:
    """User request for this round."""
    return self._request
@property
def id(self) -> int:
    """Round number (sequential)."""
    return self._id
@property
def context(self) -> Context:
    """Shared context object."""
    return self._context
Cost Tracking
print_cost()
Display round cost after completion:
def print_cost(self) -> None:
    """Print the total cost of the round."""
    total_cost = self.cost
    if isinstance(total_cost, float):
        formatted_cost = "${:.2f}".format(total_cost)
        console.print(
            f"💰 Request total cost for current round is {formatted_cost}",
            style="yellow",
        )
Output Example:
💰 Request total cost for current round is $0.42
Cost Components:
- LLM API calls (HostAgent + AppAgent)
- Vision model calls (screenshot analysis)
- Embedding model calls (if used)
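The formatting rule in print_cost() can be checked in isolation. In this sketch `format_cost` is a hypothetical helper and the per-component figures are made-up numbers chosen to reproduce the example output. Note that, as the snippets are written here, the `cost` property falls back to the integer 0 when nothing has been recorded, and an integer does not pass the `isinstance(..., float)` check, so nothing would be printed in that case.

```python
import math
from typing import Optional


def format_cost(total_cost: object) -> Optional[str]:
    """Return the formatted cost, or None when the value is not a float."""
    if isinstance(total_cost, float):
        return "${:.2f}".format(total_cost)
    return None


# Hypothetical per-component costs in US dollars.
components = {"llm": 0.30, "vision": 0.10, "embedding": 0.02}
total = math.fsum(components.values())

print(format_cost(total))  # $0.42
print(format_cost(0))      # None: the integer default 0 is not a float
```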
Error Handling
Error States
Rounds can end in error state:
if agent_fails:
    self.state = AgentState.ERROR
    # Round exits loop with ERROR state
Common Error Scenarios
| Error Type | Trigger | Handling |
|---|---|---|
| Timeout | Command execution timeout | Set ERROR state |
| Agent Failure | LLM returns invalid plan | Set ERROR state |
| UI Not Found | Element doesn't exist | Retry or ERROR |
| Connection Lost | Dispatcher disconnected | Set ERROR state |
Error Recovery
try:
    await self.agent.handle(self.context)
except AgentError as e:
    logger.error(f"Agent handle failed: {e}")
    self.state = AgentState.ERROR
    # Loop exits
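The same pattern can be tried end to end with stub handlers. This is a sketch only: `AgentError` is defined locally as a hypothetical exception type, `handle_step` is an illustrative wrapper, and states are plain strings to keep the example short.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("round")


class AgentError(Exception):
    """Hypothetical exception raised when an agent step fails."""


async def handle_step(
    handle: Callable[[dict], Awaitable[None]], context: dict
) -> str:
    try:
        await handle(context)
    except AgentError as exc:
        # Log and convert the failure into the terminal ERROR state.
        logger.error("Agent handle failed: %s", exc)
        return "ERROR"
    return "CONTINUE"


async def ok_handle(context: dict) -> None:
    context["done"] = True


async def failing_handle(context: dict) -> None:
    raise AgentError("LLM returned an invalid plan")


print(asyncio.run(handle_step(ok_handle, {})))
print(asyncio.run(handle_step(failing_handle, {})))
```

Only `AgentError` is caught, so unexpected exception types still propagate to the caller instead of being silently mapped to ERROR.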
Configuration
Round Behavior Settings
| Setting | Type | Purpose |
|---|---|---|
| eva_round | bool | Enable round evaluation |
| SLEEP_TIME | float | Wait time before snapshot (seconds) |
| save_ui_tree | bool | Save UI trees |
| save_full_screen | bool | Save desktop screenshots |
Example Configuration:
# config/ufo/config.yaml
system:
  eva_round: true
  SLEEP_TIME: 0.5
  save_ui_tree: true
  save_full_screen: false
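Once the YAML is parsed into a mapping, the four settings can be collected into a typed object. The sketch below is an assumption-laden illustration: `RoundSettings` is a hypothetical class, the fallback defaults are this example's choice rather than UFO's, and the input is a plain dict so that no YAML library is needed.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RoundSettings:
    eva_round: bool
    sleep_time: float
    save_ui_tree: bool
    save_full_screen: bool

    @classmethod
    def from_mapping(cls, system: Mapping[str, Any]) -> "RoundSettings":
        sleep_time = float(system.get("SLEEP_TIME", 0.0))
        if sleep_time < 0:
            # A negative wait makes no sense before a snapshot.
            raise ValueError("SLEEP_TIME must be non-negative")
        return cls(
            eva_round=bool(system.get("eva_round", False)),
            sleep_time=sleep_time,
            save_ui_tree=bool(system.get("save_ui_tree", False)),
            save_full_screen=bool(system.get("save_full_screen", False)),
        )


# Equivalent of the "system" section in the example configuration.
system_section = {
    "eva_round": True,
    "SLEEP_TIME": 0.5,
    "save_ui_tree": True,
    "save_full_screen": False,
}
settings = RoundSettings.from_mapping(system_section)
print(settings)
```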
Best Practices
Efficient Round Execution
Performance Tips
- ✅ Keep agent prompts concise
- ✅ Use appropriate timeouts for commands
- ✅ Disable full desktop screenshots unless needed
- ✅ Capture UI trees only for debugging
- ❌ Don't set SLEEP_TIME too high; the round waits that long before every subtask snapshot
- ❌ Don't enable all logging in production
State Machine Design
Clean State Management
- ✅ Each state should have clear purpose
- ✅ Transitions should be deterministic
- ✅ Error states should be terminal
- ✅ Subtask boundaries should be meaningful
- ❌ Don't create circular state loops
- ❌ Don't mix state logic with business logic
Reference
BaseRound
Bases: ABC
A round of a session in UFO. A round manages a single user request and consists of multiple steps. A session may consist of multiple rounds of interactions.
Initialize a round.
Source code in module/basic.py
agent
property
writable
Get the agent of the round. return: The agent of the round.
application_window
property
writable
Get the application of the session. return: The application of the session.
application_window_info
property
writable
Get the application window info of the session. return: The application window info of the session.
context
property
Get the context of the round. return: The context of the round.
cost
property
Get the cost of the round. return: The cost of the round.
id
property
Get the id of the round. return: The id of the round.
log_path
property
Get the log path of the round.
return: The log path of the round.
request
property
Get the request of the round. return: The request of the round.
state
property
writable
Get the status of the round. return: The status of the round.
step
property
Get the local step of the round. return: The step of the round.
subtask_amount
property
writable
Get the subtask amount of the round. return: The subtask amount of the round.
capture_last_snapshot(sub_round_id=None)
async
Capture the last snapshot of the application, including the screenshot and the XML file if configured.
Source code in module/basic.py
evaluation()
TODO: Evaluate the round.
Source code in module/basic.py
is_finished()
Check if the round is finished. return: True if the round is finished, otherwise False.
Source code in module/basic.py
print_cost()
Print the total cost of the round.
Source code in module/basic.py
run()
async
Run the round.
Source code in module/basic.py
save_ui_tree(save_path)
async
Save the UI tree of the current application window.
Source code in module/basic.py
See Also
- Session - Multi-round conversation management
- Context - Shared state across rounds
- Dispatcher - Command execution
- Overview - Module system architecture