Round

A Round is a single request-response cycle within a Session. It orchestrates agents through a state machine, executing commands until the user's request is fulfilled, an error ends the round, or the step limit is reached.

Overview

A Round represents one complete request-response interaction:

  • Input: User request (e.g., "Send an email to John")
  • Processing: Agent state machine execution
  • Output: Request fulfilled or error state

Round in Context

graph TB
    subgraph "Session Scope"
        SESS[Session]
        REQ1[Request 1]
        REQ2[Request 2]
        REQ3[Request 3]
    end
    subgraph "Round Scope (One Request)"
        ROUND[Round Instance]
        CTX[Shared Context]
        INIT[Initialize]
        LOOP[Execution Loop]
        FINISH[Finish Condition]
    end
    subgraph "Execution Loop Detail"
        HANDLE[agent.handle<br/>Generate & Execute]
        NEXT_STATE[next_state<br/>State Transition]
        NEXT_AGENT[next_agent<br/>Agent Switching]
        SUBTASK{Subtask End?}
        SNAPSHOT[capture_last_snapshot]
    end
    SESS --> REQ1
    SESS --> REQ2
    SESS --> REQ3
    REQ1 --> ROUND
    ROUND --> CTX
    ROUND --> INIT
    INIT --> LOOP
    LOOP --> HANDLE
    HANDLE --> NEXT_STATE
    NEXT_STATE --> NEXT_AGENT
    NEXT_AGENT --> SUBTASK
    SUBTASK -->|Yes| SNAPSHOT
    SNAPSHOT --> FINISH
    SUBTASK -->|No| FINISH
    FINISH -->|Not finished| HANDLE
    FINISH -->|Finished| REQ2
    style ROUND fill:#e1f5ff
    style HANDLE fill:#f0ffe1
    style SNAPSHOT fill:#fff4e1
    style FINISH fill:#ffe1f5

Round Lifecycle

State Machine Overview

stateDiagram-v2
    [*] --> Initialized: create_new_round()
    Initialized --> Running: run()
    Running --> AgentHandle: agent.handle(context)
    AgentHandle --> StateTransition: generate actions
    StateTransition --> AgentSwitch: determine next
    AgentSwitch --> SubtaskCheck: update agent
    SubtaskCheck --> CaptureSnapshot: if subtask_end
    SubtaskCheck --> FinishCheck: if not subtask_end
    CaptureSnapshot --> FinishCheck: snapshot saved
    FinishCheck --> AgentHandle: not finished
    FinishCheck --> FinalSnapshot: finished
    FinalSnapshot --> Evaluation: if enabled
    Evaluation --> [*]: round complete
    FinalSnapshot --> [*]: skip evaluation

Core Execution Loop

async def run(self) -> None:
    """
    Run the round asynchronously.
    """

    while not self.is_finished():
        # 1. Agent observes, reasons, and executes commands for the current step
        await self.agent.handle(self.context)

        # 2. State machine transition
        self.state = self.agent.state.next_state(self.agent)

        # 3. Agent switching (e.g., HostAgent ↔ AppAgent); the new agent receives the new state
        self.agent = self.agent.state.next_agent(self.agent)
        self.agent.set_state(self.state)

        # 4. Snapshot capture at subtask boundaries
        if self.state.is_subtask_end():
            time.sleep(ufo_config.system.sleep_time)  # let the UI settle (blocking wait)
            await self.capture_last_snapshot(sub_round_id=self.subtask_amount)
            self.subtask_amount += 1

    # 5. Record the request on the blackboard
    self.agent.blackboard.add_requests(
        {f"request_{self.id}": self.request}
    )

    # 6. Final snapshot (capture_last_snapshot() skips capture when no application window is set)
    await self.capture_last_snapshot()

    # 7. Optional evaluation (evaluation() is synchronous, so it is not awaited)
    if self._should_evaluate:
        self.evaluation()
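To see the loop mechanics in isolation, the sketch below runs the same loop shape against scripted stand-ins. `ScriptedAgent`, `ScriptedState`, and the scripted decisions are hypothetical; only the order of calls (handle, `next_state`, `next_agent`, `set_state`, subtask snapshot, final snapshot) and the step cap mirror `run()` and `is_finished()`.

```python
import asyncio
from typing import Dict, List, Optional, Tuple

TERMINAL = {"FINISH", "ERROR"}


class ScriptedState:
    """Hypothetical state object exposing the methods the round loop calls."""

    def __init__(self, name: str, subtask_end: bool = False) -> None:
        self.name = name
        self.subtask_end = subtask_end

    def next_state(self, agent: "ScriptedAgent") -> "ScriptedState":
        return agent.decision[0]  # the agent's latest decision picks the next state

    def next_agent(self, agent: "ScriptedAgent") -> "ScriptedAgent":
        return agent.peers[agent.decision[1]]  # ...and the agent for the next step

    def is_round_end(self) -> bool:
        return self.name in TERMINAL

    def is_subtask_end(self) -> bool:
        return self.subtask_end


class ScriptedAgent:
    """Hypothetical agent replaying a fixed list of (next state, next agent name) decisions."""

    def __init__(self, name: str, script: List[Tuple[ScriptedState, str]]) -> None:
        self.name = name
        self.script = list(script)
        self.state = ScriptedState("START")
        self.peers: Dict[str, "ScriptedAgent"] = {}
        self.decision: Optional[Tuple[ScriptedState, str]] = None

    async def handle(self, context: dict) -> None:
        self.decision = self.script.pop(0)
        context["trace"].append(self.name)

    def set_state(self, state: ScriptedState) -> None:
        self.state = state


async def run_round(agent: ScriptedAgent, context: dict, max_steps: int = 20) -> int:
    """Same loop shape as run(); returns the number of subtask snapshots taken."""
    state, steps, subtasks = agent.state, 0, 0
    while not (state.is_round_end() or steps >= max_steps):
        await agent.handle(context)
        steps += 1
        state = agent.state.next_state(agent)
        agent = agent.state.next_agent(agent)
        agent.set_state(state)
        if state.is_subtask_end():
            context["snapshots"].append(f"sub_round_{subtasks}")
            subtasks += 1
    context["snapshots"].append("final")
    return subtasks


host = ScriptedAgent("HostAgent", [(ScriptedState("CONTINUE", subtask_end=True), "AppAgent")])
app = ScriptedAgent("AppAgent", [
    (ScriptedState("CONTINUE"), "AppAgent"),
    (ScriptedState("FINISH", subtask_end=True), "AppAgent"),
])
host.peers = app.peers = {"HostAgent": host, "AppAgent": app}

context = {"trace": [], "snapshots": []}
subtasks = asyncio.run(run_round(host, context))
```

Note that `next_agent()` is called on the outgoing agent's state, and `set_state()` then hands the new state to the incoming agent, the same order `run()` uses.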

Lifecycle Stages

1. Initialization

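Created by the session's create_new_round(), using the BaseRound constructor arguments:

new_round = Round(
    request="Send an email to John",
    agent=host_agent,          # initial agent, e.g. a HostAgent instance
    context=session.context,   # shared with the session
    should_evaluate=False,     # whether to run evaluation() after the loop
    id=0,                      # round number
)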

Initialization sets:

| Property | Source | Description |
| --- | --- | --- |
| request | User input | Natural-language request |
| agent | Initial agent | Usually HostAgent (Windows) or LinuxAgent (Linux) |
| context | Session | Shared context object |
| should_evaluate | Caller | Whether to evaluate the round after the loop |
| id | Round counter | Sequential round number |
| state | agent.state | State of the initial agent, usually START |
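The initialization rules can be sketched as follows. `MiniSession`, `MiniRound`, and `StubAgent` are hypothetical; the behavior taken from `BaseRound.__init__` is that the round copies `agent.state` into its own state and stores the other arguments unchanged. Numbering rounds by how many already exist is an assumption about how `create_new_round()` assigns ids.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StubAgent:
    """Hypothetical agent: only what a round reads when it is constructed."""
    name: str
    state: str = "START"


class MiniRound:
    """Copies the attribute assignments made in BaseRound.__init__ (minus logging and context init)."""

    def __init__(self, request: str, agent: StubAgent, context: Dict[str, Any],
                 should_evaluate: bool, id: int) -> None:
        self._request = request
        self._context = context
        self._agent = agent
        self._state = agent.state  # the round starts in its initial agent's state
        self._id = id
        self._should_evaluate = should_evaluate


class MiniSession:
    """Hypothetical session: one shared context, rounds numbered 0, 1, 2, ..."""

    def __init__(self, should_evaluate: bool = False) -> None:
        self.context: Dict[str, Any] = {}
        self.rounds: List[MiniRound] = []
        self.should_evaluate = should_evaluate

    def create_new_round(self, request: str, agent: StubAgent) -> MiniRound:
        new_round = MiniRound(request, agent, self.context, self.should_evaluate, id=len(self.rounds))
        self.rounds.append(new_round)
        return new_round


session = MiniSession()
r0 = session.create_new_round("Send an email to John", StubAgent("HostAgent"))
r1 = session.create_new_round("Reply to the latest email", StubAgent("HostAgent"))
```

Every round receives the same context object, which is what lets later rounds see results recorded by earlier ones.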

2. Agent Handle

Each loop iteration calls agent.handle(context):

await self.agent.handle(self.context)

What happens:

  1. Observation: Agent observes UI state
  2. Reasoning: LLM generates plan and actions
  3. Action: Commands sent to dispatcher
  4. Execution: Commands executed locally or remotely
  5. Results: Results stored in context
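The five steps can be sketched as one coroutine. Every name here (`FakeDispatcher`, `fake_observe`, `fake_plan`, the `"results"` context key) is hypothetical, and a keyword rule stands in for the LLM; the real agents use UFO's own prompts, processors, and command dispatcher.

```python
import asyncio
from typing import Any, Callable, Dict, List


class FakeDispatcher:
    """Hypothetical dispatcher: returns one result per command it is given."""

    async def execute_commands(self, commands: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [{"tool_name": c["tool_name"], "status": "SUCCESS"} for c in commands]


async def handle(context: Dict[str, Any],
                 observe: Callable[[], Dict[str, Any]],
                 plan: Callable[[str, Dict[str, Any]], List[Dict[str, Any]]],
                 dispatcher: FakeDispatcher) -> None:
    observation = observe()                                # 1. Observation
    commands = plan(context["request"], observation)       # 2. Reasoning (LLM stand-in)
    results = await dispatcher.execute_commands(commands)  # 3-4. Action and execution
    context.setdefault("results", []).extend(results)      # 5. Results stored in context


def fake_observe() -> Dict[str, Any]:
    return {"desktop_icons": ["Outlook", "Edge"]}


def fake_plan(request: str, observation: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Rule instead of an LLM: open Outlook if the request mentions email and the icon is visible.
    if "email" in request.lower() and "Outlook" in observation["desktop_icons"]:
        return [{"tool_name": "open_application", "parameters": {"name": "Outlook"}}]
    return []


context = {"request": "Send an email to John"}
asyncio.run(handle(context, fake_observe, fake_plan, FakeDispatcher()))
```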

Example Flow:

sequenceDiagram
    participant R as Round
    participant A as Agent (HostAgent)
    participant LLM as Language Model
    participant D as Dispatcher
    participant UI as UI System
    R->>A: handle(context)
    A->>UI: Observe desktop
    UI-->>A: Screenshot + control tree
    A->>LLM: Generate plan
    Note over LLM: Request: "Send email to John"<br/>Observation: Desktop with Outlook icon
    LLM-->>A: Action: open_application("Outlook")
    A->>D: execute_commands([open_app_cmd])
    D->>UI: Click Outlook icon
    UI-->>D: Result: Outlook opened
    D-->>A: ResultStatus.SUCCESS
    A->>R: Update context with results

3. State Transition

After the agent handles the step, the state machine transitions:

self.state = self.agent.state.next_state(self.agent)

State Transitions:

| Current State | Condition | Next State |
| --- | --- | --- |
| START | Initial | CONTINUE |
| CONTINUE | More actions needed | CONTINUE |
| CONTINUE | Task complete | FINISH |
| CONTINUE | Error occurred | ERROR |
| FINISH | Always | Round ends |
| ERROR | Always | Round ends |
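The table is small enough to encode as a lookup. This is a sketch of the documented transitions, not UFO's implementation (which delegates to `state.next_state(agent)` on state objects); the condition strings are hypothetical labels for the "Condition" column.

```python
from enum import Enum
from typing import List


class AgentState(Enum):
    START = "START"
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"


# (current state, condition) -> next state, one entry per non-terminal row of the table.
TRANSITIONS = {
    (AgentState.START, "initial"): AgentState.CONTINUE,
    (AgentState.CONTINUE, "more_actions"): AgentState.CONTINUE,
    (AgentState.CONTINUE, "task_complete"): AgentState.FINISH,
    (AgentState.CONTINUE, "error"): AgentState.ERROR,
}
TERMINAL = {AgentState.FINISH, AgentState.ERROR}


def next_state(current: AgentState, condition: str) -> AgentState:
    if current in TERMINAL:
        raise ValueError(f"{current.value} is terminal: the round has already ended")
    try:
        return TRANSITIONS[(current, condition)]
    except KeyError:
        raise ValueError(f"no transition from {current.value} on {condition!r}") from None


def replay(conditions: List[str]) -> List[AgentState]:
    """Walk the table from START and return every state visited."""
    state = AgentState.START
    path = [state]
    for condition in conditions:
        state = next_state(state, condition)
        path.append(state)
    return path
```

A lookup makes an undocumented transition fail loudly instead of silently keeping the round in CONTINUE.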

State Diagram:

stateDiagram-v2
    [*] --> START
    START --> CONTINUE: First action
    CONTINUE --> CONTINUE: More actions
    CONTINUE --> FINISH: Task complete
    CONTINUE --> ERROR: Error occurred
    FINISH --> [*]
    ERROR --> [*]

4. Agent Switching

Determine which agent handles the next step:

self.agent = self.agent.state.next_agent(self.agent)
self.agent.set_state(self.state)

Agent Switching Logic (Windows):

| Current Agent | Condition | Next Agent |
| --- | --- | --- |
| HostAgent | Application selected | AppAgent |
| AppAgent | Need different app | HostAgent |
| AppAgent | Same app continues | AppAgent |
| HostAgent | Task complete | HostAgent (finish) |

Agent Switching Logic (Linux):

| Current Agent | Condition | Next Agent |
| --- | --- | --- |
| LinuxAgent | Always | LinuxAgent (no switching) |
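Both switching tables can be written as plain functions. Agents are represented by their names, and the keyword flags (`selected_app`, `needs_other_app`, `task_complete`) are hypothetical encodings of the "Condition" column; the HostAgent case with no application selected is not in the table, so keeping control there is an assumption.

```python
from typing import Optional


def next_agent_windows(current: str, selected_app: Optional[str] = None,
                       needs_other_app: bool = False, task_complete: bool = False) -> str:
    """Two-tier rules from the Windows table."""
    if current == "HostAgent":
        if task_complete:
            return "HostAgent"  # the round finishes on HostAgent
        # Assumption: with no application selected yet, HostAgent keeps control.
        return "AppAgent" if selected_app else "HostAgent"
    if current == "AppAgent":
        # Hand control back to HostAgent only when a different application is needed.
        return "HostAgent" if needs_other_app else "AppAgent"
    raise ValueError(f"unknown Windows agent: {current}")


def next_agent_linux(current: str) -> str:
    """Single-tier rule from the Linux table: LinuxAgent never switches."""
    if current != "LinuxAgent":
        raise ValueError(f"unknown Linux agent: {current}")
    return "LinuxAgent"
```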

Switching Example:

sequenceDiagram
    participant R as Round
    participant H as HostAgent
    participant A as AppAgent
    R->>H: handle() - Select app
    H-->>R: Application: Outlook
    Note over R: Agent switch: HostAgent → AppAgent
    R->>A: handle() - Compose email
    A-->>R: Commands executed
    R->>A: handle() - Send email
    A-->>R: Task complete
    Note over R: State: FINISH

5. Subtask Boundary Capture

Capture a snapshot when a subtask ends:

if self.state.is_subtask_end():
    time.sleep(ufo_config.system.sleep_time)  # let the UI settle (blocking wait)
    await self.capture_last_snapshot(sub_round_id=self.subtask_amount)
    self.subtask_amount += 1
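Because `run()` is a coroutine, the blocking `time.sleep` pauses the whole event loop while the UI settles, so no other task on that loop can make progress during the wait. The sketch contrasts it with `await asyncio.sleep`, which yields control; the `heartbeat` task is a hypothetical stand-in for concurrent work such as a connection keep-alive. Switching to `asyncio.sleep` is a design option, not what the current code does.

```python
import asyncio
import time


async def settle_blocking(seconds: float) -> None:
    time.sleep(seconds)           # blocks the event loop, as run() currently does


async def settle_nonblocking(seconds: float) -> None:
    await asyncio.sleep(seconds)  # yields to other tasks while waiting


async def ticks_during(settle, seconds: float = 0.05) -> int:
    """Count how often a concurrent heartbeat task runs while `settle` waits."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(heartbeat())
    await settle(seconds)
    observed = ticks
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return observed


blocking_ticks = asyncio.run(ticks_during(settle_blocking))
nonblocking_ticks = asyncio.run(ticks_during(settle_nonblocking))
```

With the blocking wait the heartbeat never gets a turn; with the awaited sleep it runs several times during the same interval.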

Subtask End Conditions:

  • Agent switched (HostAgent ↔ AppAgent)
  • Major UI change detected
  • Explicit subtask boundary in plan

Captured Data:

  1. Window screenshot: action_round_{id}_sub_round_{sub_id}_final.png
  2. UI tree (if enabled): ui_tree_round_{id}_sub_round_{sub_id}_final.json
  3. Desktop screenshot (if enabled): desktop_round_{id}_sub_round_{sub_id}_final.png

6. Finish Check

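def is_finished(self) -> bool:
    """Check if the round is complete."""
    return (
        self.state.is_round_end()
        or self.context.get(ContextNames.SESSION_STEP) >= ufo_config.system.max_step
    )

The loop continues until the state signals the end of the round (FINISH or ERROR) or the session's step count reaches max_step.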
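The two exit conditions can be checked with plain values. `RoundView` is a hypothetical stand-in for the round's state and the context's `SESSION_STEP` entry, and the terminal-state test mirrors what `is_round_end()` is documented to mean rather than UFO's code.

```python
from dataclasses import dataclass


@dataclass
class RoundView:
    """Hypothetical stand-in for the two values is_finished() reads."""
    state_name: str    # plays the role of self.state
    session_step: int  # plays the role of context.get(ContextNames.SESSION_STEP)


def is_round_end(state_name: str) -> bool:
    return state_name in ("FINISH", "ERROR")


def is_finished(view: RoundView, max_step: int) -> bool:
    return is_round_end(view.state_name) or view.session_step >= max_step
```

The third test case is the one to watch: a round can stop while its state is still CONTINUE because the step counter hit the cap. Since the counter is read from `SESSION_STEP` rather than the round's own step, steps used by earlier rounds in the same session presumably count toward the limit.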

7. Final Snapshot

After the loop exits:

await self.capture_last_snapshot()

The final snapshot captures the end state of the application for logging and evaluation. capture_last_snapshot() itself does nothing unless application_window or application_window_info is set.

8. Evaluation

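Optional evaluation of round success:

if self._should_evaluate:
    self.evaluation()  # synchronous, so not awaited

In the current BaseRound, evaluation() is a stub marked TODO. Evaluation is meant to check:

  • Was the request fulfilled?
  • Quality of the actions taken
  • Efficiency metrics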


State Machine

AgentState Enum

from enum import Enum


class AgentState(Enum):
    START = "START"
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"

State Behaviors

| State | Meaning | Transitions To |
| --- | --- | --- |
| START | Initial state | CONTINUE |
| CONTINUE | Actively processing | CONTINUE, FINISH, ERROR |
| FINISH | Successfully completed | Round ends |
| ERROR | Fatal error occurred | Round ends |

State Methods

Each state implements:

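from abc import ABC, abstractmethod


class StateInterface(ABC):
    @abstractmethod
    def next_state(self, agent: "BasicAgent") -> "AgentState":
        """Determine the next state based on the agent's decision."""

    @abstractmethod
    def next_agent(self, agent: "BasicAgent") -> "BasicAgent":
        """Determine the next agent to handle the request."""

    @abstractmethod
    def is_subtask_end(self) -> bool:
        """Check whether the current state marks a subtask boundary."""

    @abstractmethod
    def is_round_end(self) -> bool:
        """Check whether the current state ends the round (used by is_finished())."""

    @abstractmethod
    def name(self) -> str:
        """Return the state's name (used in run()'s transition log)."""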
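A minimal sketch of concrete states following this interface, using the classic State pattern. `StubAgent` and its `status` field are hypothetical (the real agents carry far more), and deriving `name()` from the class name is an illustrative choice, not UFO's.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class StubAgent:
    status: str  # hypothetical: the agent's latest decision ("CONTINUE", "FINISH", "ERROR")


class State(ABC):
    """Hypothetical base class following the interface above."""

    @abstractmethod
    def next_state(self, agent: StubAgent) -> "State": ...

    def next_agent(self, agent: StubAgent) -> StubAgent:
        return agent  # default: no agent switching

    def is_round_end(self) -> bool:
        return False

    def is_subtask_end(self) -> bool:
        return False

    def name(self) -> str:
        return type(self).__name__.replace("State", "").upper()


class TerminalState(State):
    """FINISH and ERROR: the round loop stops once it reaches one of these."""

    def next_state(self, agent: StubAgent) -> State:
        return self

    def is_round_end(self) -> bool:
        return True


class FinishState(TerminalState):
    pass


class ErrorState(TerminalState):
    pass


class ContinueState(State):
    def next_state(self, agent: StubAgent) -> State:
        # Map the agent's decision onto the next state object.
        return {"CONTINUE": ContinueState, "FINISH": FinishState, "ERROR": ErrorState}[agent.status]()


class StartState(State):
    def next_state(self, agent: StubAgent) -> State:
        return ContinueState()  # START always moves to CONTINUE


agent = StubAgent(status="CONTINUE")
state = StartState().next_state(agent)
agent.status = "FINISH"
state = state.next_state(agent)
```

Putting the transition logic on state objects keeps the round loop generic: it only ever calls `next_state`, `next_agent`, and the two boundary checks.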

Agent Orchestration

Windows Two-Tier Architecture

sequenceDiagram
    participant U as User Request
    participant R as Round
    participant H as HostAgent
    participant A as AppAgent
    participant UI as UI System
    U->>R: "Send email to John"
    R->>H: handle() - Select application
    H->>UI: Observe desktop
    UI-->>H: Screenshot of desktop
    H->>H: Decide: Outlook
    H-->>R: Switch to AppAgent for Outlook
    R->>A: handle() - Compose email
    A->>UI: Observe Outlook window
    UI-->>A: Screenshot + control tree
    A->>A: Plan: Click "New Email"
    A->>UI: Click command
    UI-->>A: New email window opened
    A-->>R: Continue
    R->>A: handle() - Fill recipient
    A->>UI: Type "john@example.com"
    UI-->>A: Recipient filled
    A-->>R: Continue
    R->>A: handle() - Click Send
    A->>UI: Click "Send" button
    UI-->>A: Email sent
    A-->>R: Finish
    R-->>U: Request complete

Linux Single-Tier Architecture

sequenceDiagram
    participant U as User Request
    participant R as Round
    participant L as LinuxAgent
    participant UI as UI System
    U->>R: "Open gedit and type Hello"
    R->>L: handle() - Open application
    L->>UI: Observe desktop
    UI-->>L: Desktop state
    L->>L: Plan: Open gedit
    L->>UI: Launch gedit command
    UI-->>L: gedit opened
    L-->>R: Continue
    R->>L: handle() - Type text
    L->>UI: Type "Hello"
    UI-->>L: Text typed
    L-->>R: Finish
    R-->>U: Request complete

Snapshot Capture

capture_last_snapshot()

async def capture_last_snapshot(self, sub_round_id: Optional[int] = None) -> None

Purpose: Capture UI state for logging, debugging, and evaluation.

Captured Artifacts:

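| Artifact | File Pattern | Purpose |
| --- | --- | --- |
| Window Screenshot | action_round_{id}_final.png | Visual state at the end of the round |
| Subtask Screenshot | action_round_{id}_sub_round_{sub_id}_final.png | State at a subtask boundary |
| UI Tree | ui_trees/ui_tree_round_{id}_final.json (subtask: ui_tree_round_{id}_sub_round_{sub_id}_final.json) | Control structure (if save_ui_tree is enabled) |
| Desktop Screenshot | desktop_round_{id}_sub_round_{sub_id}_final.png | Full desktop (if save_full_screen is enabled); for the final capture, sub_id is rendered as None |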

Example Output:

logs/task_name/
├── action_round_0_sub_round_0_final.png  ← After HostAgent selects Outlook
├── action_round_0_sub_round_1_final.png  ← After AppAgent composes email
├── action_round_0_final.png               ← Final state after sending
├── ui_trees/
│   ├── ui_tree_round_0_sub_round_0_final.json
│   ├── ui_tree_round_0_sub_round_1_final.json
│   └── ui_tree_round_0_final.json
└── desktop_round_0_sub_round_None_final.png  ← Only if save_full_screen is enabled
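The naming rules in capture_last_snapshot() can be reproduced without touching the UI. `snapshot_paths` is a hypothetical helper that only builds strings; it keeps two details of the source: screenshot paths are built with `+`, so `log_path` must end with a separator, and the desktop file name always includes `_sub_round_{sub_round_id}`, even when that id is `None`.

```python
import os
from typing import List, Optional


def snapshot_paths(log_path: str, round_id: int, sub_round_id: Optional[int] = None,
                   save_ui_tree: bool = False, save_full_screen: bool = False) -> List[str]:
    """Return the file paths capture_last_snapshot() would write (no files are created)."""
    suffix = "" if sub_round_id is None else f"_sub_round_{sub_round_id}"
    paths = [log_path + f"action_round_{round_id}{suffix}_final.png"]
    if save_ui_tree:
        paths.append(os.path.join(log_path, "ui_trees", f"ui_tree_round_{round_id}{suffix}_final.json"))
    if save_full_screen:
        # As in the source, the sub-round part is always present, even for the final capture.
        paths.append(log_path + f"desktop_round_{round_id}_sub_round_{sub_round_id}_final.png")
    return paths


subtask = snapshot_paths("logs/task_name/", 0, 1, save_ui_tree=True)
final = snapshot_paths("logs/task_name/", 0, save_full_screen=True)
```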

save_ui_tree()

async def save_ui_tree(self, save_path: str)

Saves the control tree as JSON for analysis:

{
  "root": {
    "control_type": "Window",
    "name": "Outlook",
    "children": [
      {
        "control_type": "Button",
        "name": "New Email",
        "automation_id": "btn_new_email",
        "bounding_box": [100, 50, 150, 30]
      }
    ]
  }
}
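A saved tree can be inspected with a short recursive walk. This sketch assumes the layout shown above (a `root` node whose descendants are nested under `children`, each with `control_type` and `name`); the key names in UFO's actual saved trees may differ.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple


def iter_controls(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Depth-first walk yielding (depth, control_type, name) for every node."""
    yield depth, node.get("control_type", ""), node.get("name", "")
    for child in node.get("children", []):
        yield from iter_controls(child, depth + 1)


def controls_of_type(tree: Dict[str, Any], control_type: str) -> List[str]:
    return [name for _, ctype, name in iter_controls(tree["root"]) if ctype == control_type]


saved = """
{"root": {"control_type": "Window", "name": "Outlook",
          "children": [{"control_type": "Button", "name": "New Email",
                        "automation_id": "btn_new_email",
                        "bounding_box": [100, 50, 150, 30]}]}}
"""
tree = json.loads(saved)  # for a saved file: with open(path) as f: tree = json.load(f)
controls = list(iter_controls(tree["root"]))
```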

Properties

Auto-Syncing Properties

Properties that sync with context automatically:

@property
def step(self) -> int:
    """Current step number in this round."""
    return self._context.get(ContextNames.ROUND_STEP).get(self.id, 0)

@property
def cost(self) -> float:
    """Total cost for this round."""
    return self._context.get(ContextNames.ROUND_COST).get(self.id, 0)

@property
def subtask_amount(self) -> int:
    """Number of subtasks completed."""
    return self._context.get(ContextNames.ROUND_SUBTASK_AMOUNT).get(self.id, 0)

@subtask_amount.setter
def subtask_amount(self, value: int) -> None:
    """Set subtask amount in context."""
    self._context.current_round_subtask_amount = value
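The point of these properties is that the round stores no counters itself: each read looks the value up in the shared context by round id, so updates made elsewhere (for example by agents) are visible through the round. The sketch uses a hypothetical `MiniContext` keyed by plain strings; unlike the real setter, which goes through `current_round_subtask_amount`, it writes the round's own entry directly.

```python
from typing import Any, Dict


class MiniContext:
    """Hypothetical context: each counter is a dict keyed by round id."""

    def __init__(self) -> None:
        self._values: Dict[str, Dict[int, Any]] = {"ROUND_STEP": {}, "ROUND_SUBTASK_AMOUNT": {}}

    def get(self, key: str) -> Dict[int, Any]:
        return self._values[key]


class MiniRound:
    def __init__(self, context: MiniContext, round_id: int) -> None:
        self._context = context
        self._id = round_id

    @property
    def step(self) -> int:
        return self._context.get("ROUND_STEP").get(self._id, 0)

    @property
    def subtask_amount(self) -> int:
        return self._context.get("ROUND_SUBTASK_AMOUNT").get(self._id, 0)

    @subtask_amount.setter
    def subtask_amount(self, value: int) -> None:
        self._context.get("ROUND_SUBTASK_AMOUNT")[self._id] = value


ctx = MiniContext()
r0, r1 = MiniRound(ctx, 0), MiniRound(ctx, 1)
r0.subtask_amount += 1
r0.subtask_amount += 1
ctx.get("ROUND_STEP")[1] = 5  # e.g. an agent advancing round 1's step counter
```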

Static Properties

@property
def request(self) -> str:
    """User request for this round."""
    return self._request

@property
def id(self) -> int:
    """Round number (sequential)."""
    return self._id

@property
def context(self) -> Context:
    """Shared context object."""
    return self._context

Cost Tracking

Display round cost after completion:

def print_cost(self) -> None:
    """Print the total cost of the round."""

    total_cost = self.cost
    if isinstance(total_cost, float):
        formatted_cost = "${:.2f}".format(total_cost)
        console.print(
            f"💰 Request total cost for current round is {formatted_cost}",
            style="yellow",
        )

Output Example:

💰 Request total cost for current round is $0.42

Cost Components:

  • LLM API calls (HostAgent + AppAgent)
  • Vision model calls (screenshot analysis)
  • Embedding model calls (if used)
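A sketch of how per-call costs roll up and how print_cost() formats them. The per-call categories and amounts are hypothetical. One detail worth knowing from the source: `cost` falls back to the integer `0` when nothing has been recorded for the round, and print_cost() only prints floats, so such a round prints nothing at all.

```python
from typing import Dict, Union


def round_cost(call_costs: Dict[str, float]) -> float:
    """Sum per-call costs for one round (the categories are illustrative)."""
    return sum(call_costs.values(), 0.0)


def format_cost(total_cost: Union[int, float]) -> str:
    """Formatting used by print_cost(); returns '' where print_cost() would print nothing."""
    if isinstance(total_cost, float):
        return "${:.2f}".format(total_cost)
    return ""


total = round_cost({"host_agent_llm": 0.12, "app_agent_llm": 0.25, "vision": 0.05})
```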

Error Handling

Error States

A round ends in the ERROR state when its state machine transitions there (pseudocode):

if agent_fails:
    self.state = AgentState.ERROR
    # is_finished() now returns True, so the loop exits

Common Error Scenarios

| Error Type | Trigger | Handling |
| --- | --- | --- |
| Timeout | Command execution timeout | Set ERROR state |
| Agent Failure | LLM returns invalid plan | Set ERROR state |
| UI Not Found | Element doesn't exist | Retry or ERROR |
| Connection Lost | Dispatcher disconnected | Set ERROR state |

Error Recovery

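BaseRound.run() (see the Reference section) does not wrap agent.handle() in a try/except, so an exception raised during a step propagates out of run(). A subclass that prefers to end the round in the ERROR state could wrap the call like this (AgentError is a placeholder for whatever exception type the agent raises):

try:
    await self.agent.handle(self.context)
except AgentError as e:  # placeholder exception type
    self.logger.error(f"Agent handle failed: {e}")
    self.state = AgentState.ERROR  # the loop exits on the next is_finished() check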
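The timeout and agent-failure rows of the table can both be handled by one wrapper around a single step. This is a hypothetical sketch, not part of UFO: `guarded_handle` maps a timeout or any exception to ERROR, and returning CONTINUE on success is a simplification (in the real loop, `next_state()` decides what follows a successful step). The three handlers are stand-ins.

```python
import asyncio
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class AgentState(Enum):
    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"


async def guarded_handle(handle, context: dict, timeout: float) -> AgentState:
    """Run one handle() step; map a timeout or exception to ERROR instead of crashing the round."""
    try:
        await asyncio.wait_for(handle(context), timeout=timeout)
        return AgentState.CONTINUE
    except asyncio.TimeoutError:
        logger.error("Agent handle timed out after %.2fs", timeout)
        return AgentState.ERROR
    except Exception as e:  # broad on purpose: any failure ends this round
        logger.error("Agent handle failed: %s", e)
        return AgentState.ERROR


async def ok_handle(context: dict) -> None:
    context["handled"] = True


async def slow_handle(context: dict) -> None:
    await asyncio.sleep(1.0)


async def broken_handle(context: dict) -> None:
    raise RuntimeError("LLM returned an invalid plan")


ctx: dict = {}
results = [asyncio.run(guarded_handle(h, ctx, timeout=0.05))
           for h in (ok_handle, slow_handle, broken_handle)]
```

`asyncio.wait_for` cancels the slow step when the timeout expires, so a hung command does not keep the round alive.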

Configuration

Round Behavior Settings

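| Setting | Type | Purpose |
| --- | --- | --- |
| eva_round | bool | Enable round evaluation |
| SLEEP_TIME | float | Wait time before a subtask snapshot (seconds) |
| save_ui_tree | bool | Save UI trees |
| save_full_screen | bool | Save desktop screenshots |
| max_step | int | Session-wide step limit checked by is_finished() |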

Example Configuration:

# config/ufo/config.yaml
system:
  eva_round: true
  SLEEP_TIME: 0.5
  save_ui_tree: true
  save_full_screen: false

Best Practices

Efficient Round Execution

Performance Tips

  • ✅ Keep agent prompts concise
  • ✅ Use appropriate timeouts for commands
  • ✅ Disable full desktop screenshots unless needed
  • ✅ Capture UI trees only for debugging
  • ❌ Don't set SLEEP_TIME too high
  • ❌ Don't enable all logging in production

State Machine Design

Clean State Management

  • ✅ Each state should have clear purpose
  • ✅ Transitions should be deterministic
  • ✅ Error states should be terminal
  • ✅ Subtask boundaries should be meaningful
  • ❌ Don't create transition cycles that can never reach FINISH or ERROR
  • ❌ Don't mix state logic with business logic

Reference

BaseRound

Bases: ABC

A round of a session in UFO. A round manages a single user request and consists of multiple steps. A session may consist of multiple rounds of interactions.

Initialize a round.

Parameters:
  • request (str) –

    The request of the round.

  • agent (BasicAgent) –

    The initial agent of the round.

  • context (Context) –

    The shared context of the round.

  • should_evaluate (bool) –

    Whether to evaluate the round.

  • id (int) –

    The id of the round.

Source code in module/basic.py
def __init__(
    self,
    request: str,
    agent: BasicAgent,
    context: Context,
    should_evaluate: bool,
    id: int,
) -> None:
    """
    Initialize a round.
    :param request: The request of the round.
    :param agent: The initial agent of the round.
    :param context: The shared context of the round.
    :param should_evaluate: Whether to evaluate the round.
    :param id: The id of the round.
    """

    self._request = request
    self._context = context
    self._agent = agent
    self._state = agent.state
    self._id = id
    self._should_evaluate = should_evaluate
    self.logger = logging.getLogger(__name__)

    self._init_context()

agent property writable

Get the agent of the round. return: The agent of the round.

application_window property writable

Get the application of the session. return: The application of the session.

application_window_info property writable

Get the application window info of the session. return: The application window info of the session.

context property

Get the context of the round. return: The context of the round.

cost property

Get the cost of the round. return: The cost of the round.

id property

Get the id of the round. return: The id of the round.

log_path property

Get the log path of the round.

return: The log path of the round.

request property

Get the request of the round. return: The request of the round.

state property writable

Get the status of the round. return: The status of the round.

step property

Get the local step of the round. return: The step of the round.

subtask_amount property writable

Get the subtask amount of the round. return: The subtask amount of the round.

capture_last_snapshot(sub_round_id=None) async

Capture the last snapshot of the application, including the screenshot and the XML file if configured.

Parameters:
  • sub_round_id (Optional[int], default: None ) –

    The id of the sub-round, default is None.

Source code in module/basic.py
async def capture_last_snapshot(self, sub_round_id: Optional[int] = None) -> None:
    """
    Capture the last snapshot of the application, including the screenshot and the XML file if configured.
    :param sub_round_id: The id of the sub-round, default is None.
    """

    # Capture the final screenshot
    if sub_round_id is None:
        screenshot_save_path = self.log_path + f"action_round_{self.id}_final.png"
    else:
        screenshot_save_path = (
            self.log_path
            + f"action_round_{self.id}_sub_round_{sub_round_id}_final.png"
        )

    if (
        self.application_window is not None
        or self.application_window_info is not None
    ):

        try:

            result = await self.context.command_dispatcher.execute_commands(
                [
                    Command(
                        tool_name="capture_window_screenshot",
                        parameters={},
                        tool_type="data_collection",
                    )
                ]
            )

            image = result[0].result
            utils.save_image_string(image, screenshot_save_path)
            self.logger.info(
                f"Captured application window screenshot at final: {screenshot_save_path}"
            )

        except Exception as e:
            self.logger.warning(
                f"The last snapshot capture failed, due to the error: {e}"
            )
        if ufo_config.system.save_ui_tree:
            # Get session data manager from context

            ui_tree_path = os.path.join(self.log_path, "ui_trees")
            ui_tree_file_name = (
                f"ui_tree_round_{self.id}_final.json"
                if sub_round_id is None
                else f"ui_tree_round_{self.id}_sub_round_{sub_round_id}_final.json"
            )
            ui_tree_save_path = os.path.join(ui_tree_path, ui_tree_file_name)

            await self.save_ui_tree(ui_tree_save_path)

        if ufo_config.system.save_full_screen:

            desktop_save_path = (
                self.log_path
                + f"desktop_round_{self.id}_sub_round_{sub_round_id}_final.png"
            )

            result = await self.context.command_dispatcher.execute_commands(
                [
                    Command(
                        tool_name="capture_desktop_screenshot",
                        parameters={"all_screens": True},
                        tool_type="data_collection",
                    )
                ]
            )

            desktop_screenshot_url = result[0].result
            utils.save_image_string(desktop_screenshot_url, desktop_save_path)
            self.logger.info(f"Desktop screenshot saved to {desktop_save_path}")

evaluation()

TODO: Evaluate the round.

Source code in module/basic.py
def evaluation(self) -> None:
    """
    TODO: Evaluate the round.
    """
    pass

is_finished()

Check if the round is finished. return: True if the round is finished, otherwise False.

Source code in module/basic.py
def is_finished(self) -> bool:
    """
    Check if the round is finished.
    return: True if the round is finished, otherwise False.
    """
    return (
        self.state.is_round_end()
        or self.context.get(ContextNames.SESSION_STEP) >= ufo_config.system.max_step
    )

print_cost()

Print the total cost of the round.

Source code in module/basic.py
def print_cost(self) -> None:
    """
    Print the total cost of the round.
    """

    total_cost = self.cost
    if isinstance(total_cost, float):
        formatted_cost = "${:.2f}".format(total_cost)
        console.print(
            f"💰 Request total cost for current round is {formatted_cost}",
            style="yellow",
        )

run() async

Run the round.

Source code in module/basic.py
async def run(self) -> None:
    """
    Run the round.
    """

    while not self.is_finished():

        await self.agent.handle(self.context)

        # Take action

        self.state = self.agent.state.next_state(self.agent)
        self.agent = self.agent.state.next_agent(self.agent)

        self.logger.info(
            f"Agent {self.agent.name} transitioned to state {self.state.name()}"
        )

        self.agent.set_state(self.state)

        # If the subtask ends, capture the last snapshot of the application.
        if self.state.is_subtask_end():
            time.sleep(ufo_config.system.sleep_time)
            await self.capture_last_snapshot(sub_round_id=self.subtask_amount)
            self.subtask_amount += 1

    self.agent.blackboard.add_requests(
        {"request_{i}".format(i=self.id): self.request}
    )

    await self.capture_last_snapshot()

    if self._should_evaluate:
        self.evaluation()

    return self.context.get(ContextNames.ROUND_RESULT)

save_ui_tree(save_path) async

Save the UI tree of the current application window.

Source code in module/basic.py
async def save_ui_tree(self, save_path: str):
    """
    Save the UI tree of the current application window.
    """
    if self.application_window is not None:
        result = await self.context.command_dispatcher.execute_commands(
            [
                Command(
                    tool_name="get_ui_tree",
                    parameters={},
                    tool_type="data_collection",
                )
            ]
        )
        step_ui_tree = result[0].result

        if step_ui_tree:

            save_dir = os.path.dirname(save_path)
            if not os.path.exists(save_dir):
                os.makedirs(save_dir)

            with open(save_path, "w") as file:
                json.dump(step_ui_tree, file, indent=4)
                self.logger.info(f"UI tree saved to {save_path}")

See Also