HostAgent: Desktop Orchestrator

HostAgent serves as the centralized control plane of UFO². It interprets user-specified goals, decomposes them into structured subtasks, instantiates and dispatches AppAgent modules, and coordinates their progress across the system. HostAgent provides system-level services for introspection, planning, application lifecycle management, and multi-agent synchronization.


Architecture Overview

Operating atop the native Windows substrate, HostAgent monitors active applications, issues shell commands to spawn new processes as needed, and manages the creation and teardown of application-specific AppAgent instances. All coordination occurs through a persistent state machine, which governs the transitions across execution phases.

Figure: HostAgent architecture showing the finite state machine, processing pipeline, and interactions with AppAgents through the Blackboard pattern.

Core Responsibilities

Task Decomposition

Given a user's natural language input, HostAgent identifies the underlying task goal and decomposes it into a dependency-ordered subtask graph.

Example: User request "Extract data from Word and create an Excel chart" becomes:

  1. Extract table from Word document
  2. Create chart in Excel with extracted data

Figure: HostAgent decomposes user requests into sequential subtasks, assigns each to the appropriate application, and orchestrates AppAgents to complete them in dependency order.
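
The dependency ordering described above can be illustrated with a small sketch. This is not HostAgent's actual planner: the `subtask_dependencies` mapping and the `dependency_order` helper are hypothetical names introduced here, and the ordering is delegated to the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical subtask graph: each subtask maps to the set of subtasks
# that must complete before it can start.
subtask_dependencies = {
    "Extract table from Word document": set(),
    "Create chart in Excel with extracted data": {
        "Extract table from Word document"
    },
}


def dependency_order(dependencies):
    """Return subtasks so that each one appears after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


ordered_subtasks = dependency_order(subtask_dependencies)
for position, subtask in enumerate(ordered_subtasks, start=1):
    print(f"{position}. {subtask}")
```

For the two-step Word/Excel example, the extraction step is scheduled first because the chart step lists it as a prerequisite.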

Application Lifecycle Management

For each subtask, HostAgent inspects system process metadata (via UIA APIs) to determine whether the target application is running. If not, it launches the program and registers it with the runtime.
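
A minimal sketch of this check-then-launch logic follows. It does not call UIA or spawn real processes: `ensure_running`, the `launch` callable, and the `registry` dictionary are all hypothetical stand-ins, and the process names are examples only.

```python
from typing import Callable, Dict, Iterable


def ensure_running(
    app_name: str,
    running_apps: Iterable[str],
    launch: Callable[[str], None],
    registry: Dict[str, str],
) -> bool:
    """Launch app_name if it is not running, then register it.

    Returns True when a launch was needed, False when the app was
    already running and is simply attached to.
    """
    launched = app_name not in set(running_apps)
    if launched:
        launch(app_name)  # assumption: the caller supplies the real launcher
    registry[app_name] = "launched" if launched else "attached"
    return launched


# Usage with a fake launcher that only records what it was asked to start.
launch_log = []
registry = {}
ensure_running("EXCEL.EXE", ["WINWORD.EXE"], launch_log.append, registry)
ensure_running("WINWORD.EXE", ["WINWORD.EXE"], launch_log.append, registry)
print(launch_log, registry)
```

Passing the launcher in as a callable keeps the decision logic testable without touching the desktop.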

AppAgent Instantiation

HostAgent spawns the corresponding AppAgent for each active application, providing it with task context, memory references, and relevant toolchains (e.g., APIs, documentation).
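
The Summary notes that AppAgent instances are created, cached, and reused. One way such reuse could look is a get-or-create lookup keyed by agent name, sketched below. `AppAgentCache` and `fake_factory` are hypothetical; the real class stores agents in `appagent_dict` and builds them through `AgentFactory`.

```python
from typing import Callable, Dict


class AppAgentCache:
    """Hypothetical get-or-create cache keyed by agent name."""

    def __init__(self, factory: Callable[[str], object]) -> None:
        self._factory = factory
        self._agents: Dict[str, object] = {}

    def get_or_create(self, agent_name: str) -> object:
        # Build the agent only on first request; reuse it afterwards.
        if agent_name not in self._agents:
            self._agents[agent_name] = self._factory(agent_name)
        return self._agents[agent_name]

    def __len__(self) -> int:
        return len(self._agents)


created = []


def fake_factory(agent_name: str) -> dict:
    """Stand-in for a real agent factory; records each construction."""
    created.append(agent_name)
    return {"name": agent_name}


cache = AppAgentCache(fake_factory)
word_first = cache.get_or_create("WINWORD.EXE")
word_again = cache.get_or_create("WINWORD.EXE")
excel = cache.get_or_create("EXCEL.EXE")
print(created)
```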

Task Scheduling and Control

The global execution plan is serialized into a finite state machine (FSM), allowing HostAgent to enforce execution order, detect failures, and resolve dependencies across agents. See State Machine Details for the FSM architecture.
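
To make the FSM idea concrete, here is a deliberately reduced sketch. It models only the three statuses named on this page (CONTINUE, ASSIGN, FINISH), whereas the real machine has seven states; the `ALLOWED_TRANSITIONS` table is an assumption made for illustration, not the documented transition set.

```python
from enum import Enum


class HostStatus(Enum):
    CONTINUE = "CONTINUE"
    ASSIGN = "ASSIGN"
    FINISH = "FINISH"


# Assumed transition table for this sketch only.
ALLOWED_TRANSITIONS = {
    HostStatus.CONTINUE: {HostStatus.CONTINUE, HostStatus.ASSIGN, HostStatus.FINISH},
    HostStatus.ASSIGN: {HostStatus.CONTINUE},  # control returns after the AppAgent runs
    HostStatus.FINISH: set(),                  # terminal state
}


def transition(current: HostStatus, requested: HostStatus) -> HostStatus:
    """Return the requested state, or raise if the move is not allowed."""
    if requested not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Illegal transition: {current.name} -> {requested.name}")
    return requested


def run(requests, start=HostStatus.CONTINUE):
    """Apply a sequence of requested states and return every state visited."""
    visited = [start]
    for requested in requests:
        visited.append(transition(visited[-1], requested))
    return visited


trace = run([HostStatus.ASSIGN, HostStatus.CONTINUE, HostStatus.FINISH])
print(" -> ".join(state.name for state in trace))
```

Rejecting illegal moves at the transition function is what lets an orchestrator enforce execution order and surface failures early.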

Shared State Communication

HostAgent reads from and writes to a global blackboard, enabling inter-agent communication and system-level observability for debugging and replay.
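
The blackboard pattern can be sketched as an append-only log with keyed reads. `SimpleBlackboard` and its `write`, `read`, and `history` methods are hypothetical and do not mirror the real `Blackboard` class; the table values are made-up sample data.

```python
from typing import Any, Dict, List, Optional


class SimpleBlackboard:
    """Hypothetical shared memory: append-only entries, latest value wins."""

    def __init__(self) -> None:
        self._entries: List[Dict[str, Any]] = []

    def write(self, author: str, key: str, value: Any) -> None:
        self._entries.append({"author": author, "key": key, "value": value})

    def read(self, key: str) -> Optional[Any]:
        # Scan newest-first so later writes shadow earlier ones.
        for entry in reversed(self._entries):
            if entry["key"] == key:
                return entry["value"]
        return None

    def history(self) -> List[Dict[str, Any]]:
        # A copy of the full log, useful for debugging and replay.
        return list(self._entries)


board = SimpleBlackboard()
board.write("HostAgent", "subtask_1", "Extract table from Word document")
board.write("WordAppAgent", "result_1", [["Quarter", "Revenue"], ["Q1", 10]])
excel_input = board.read("result_1")  # what an Excel agent would pick up
print(excel_input)
```

Keeping every write in the log, rather than overwriting in place, is what makes replay possible in this sketch.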


Key Characteristics

  • Scope: Desktop-level orchestrator (system-wide, not application-specific)
  • Lifecycle: Single instance per session, persists throughout task execution
  • Hierarchy: Parent agent that manages multiple child AppAgents
  • Communication: Owns and coordinates the shared Blackboard
  • Control: 7-state finite state machine with 4-phase processing pipeline
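
The 4-phase pipeline (DATA_COLLECTION, LLM, ACTION, MEMORY, as listed in the Summary) can be pictured as four handlers run in a fixed order over a shared context. The sketch below is an assumption about shape only: `run_pipeline` and the lambda handlers are hypothetical and do no real screenshot capture, model call, or dispatch.

```python
from typing import Callable, Dict

PHASES = ("DATA_COLLECTION", "LLM", "ACTION", "MEMORY")


def run_pipeline(handlers: Dict[str, Callable[[dict], dict]], context: dict) -> dict:
    """Run one handler per phase, in order, threading the context through."""
    for phase in PHASES:
        context = handlers[phase](context)
        context.setdefault("trace", []).append(phase)
    return context


# Toy handlers: each returns a new context with one extra field.
handlers = {
    "DATA_COLLECTION": lambda ctx: {**ctx, "apps": ["Microsoft Word - Document1"]},
    "LLM": lambda ctx: {**ctx, "status": "ASSIGN", "target": ctx["apps"][0]},
    "ACTION": lambda ctx: {**ctx, "dispatched_to": ctx["target"]},
    "MEMORY": lambda ctx: {**ctx, "history": [ctx["dispatched_to"]]},
}

result = run_pipeline(handlers, {"request": "Extract Word table, create Excel chart"})
print(result["trace"])
```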

Execution Workflow

sequenceDiagram
    participant User
    participant HostAgent
    participant Blackboard
    participant AppAgent1
    participant AppAgent2

    User->>HostAgent: "Extract Word table, create Excel chart"
    HostAgent->>HostAgent: Decompose into subtasks
    HostAgent->>Blackboard: Write subtask 1
    HostAgent->>AppAgent1: Create/Get Word AppAgent
    AppAgent1->>AppAgent1: Execute Word task
    AppAgent1->>Blackboard: Write result 1
    AppAgent1-->>HostAgent: Return FINISH
    HostAgent->>Blackboard: Read result 1
    HostAgent->>Blackboard: Write subtask 2
    HostAgent->>AppAgent2: Create/Get Excel AppAgent
    AppAgent2->>Blackboard: Read result 1
    AppAgent2->>AppAgent2: Execute Excel task
    AppAgent2->>Blackboard: Write result 2
    AppAgent2-->>HostAgent: Return FINISH
    HostAgent->>HostAgent: Verify completion
    HostAgent-->>User: Task completed

Input and Output

HostAgent Input

| Input | Description | Type |
| --- | --- | --- |
| User Request | Natural language task description | String |
| Application Information | Active application metadata | List of Dicts |
| Desktop Screenshots | Visual context of desktop state | Image |
| Previous Sub-Tasks | Completed subtask history | List of Dicts |
| Previous Plan | Planned future subtasks | List of Strings |
| Blackboard | Shared memory space | Dictionary |

HostAgent Output

| Output | Description | Type |
| --- | --- | --- |
| Observation | Desktop screenshot analysis | String |
| Thought | Reasoning process | String |
| Current Sub-Task | Active subtask description | String |
| Message | Information for AppAgent | String |
| ControlLabel | Selected application index | String |
| ControlText | Selected application name | String |
| Plan | Future subtask sequence | List of Strings |
| Status | Agent state (CONTINUE/ASSIGN/FINISH/etc.) | String |
| Comment | User-facing information | String |
| Questions | Clarification requests | List of Strings |
| Bash | System command to execute | String |

Example Output:

{
    "Observation": "Desktop shows Microsoft Word with document open containing a table",
    "Thought": "User wants to extract data from Word first",
    "Current Sub-Task": "Extract the table data from the document",
    "Message": "Starting data extraction from Word document",
    "ControlLabel": "0",
    "ControlText": "Microsoft Word - Document1",
    "Plan": ["Extract table from Word", "Create chart in Excel"],
    "Status": "ASSIGN",
    "Comment": "Delegating table extraction to Word AppAgent",
    "Questions": [],
    "Bash": ""
}
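
As a rough illustration of how a caller might consume this output, the sketch below parses the JSON and branches on `Status`. `parse_host_response`, `describe_next_step`, and the choice of `REQUIRED_FIELDS` are assumptions made for this example; the real processor may validate and dispatch differently.

```python
import json

# Assumption: these fields are treated as mandatory for the sketch.
REQUIRED_FIELDS = {"Status", "Current Sub-Task", "ControlLabel", "ControlText", "Plan"}


def parse_host_response(raw: str) -> dict:
    """Decode the JSON response and check that the assumed fields exist."""
    response = json.loads(raw)
    missing = REQUIRED_FIELDS - set(response)
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return response


def describe_next_step(response: dict) -> str:
    """Summarize what the orchestrator would do for a given Status."""
    status = response["Status"]
    if status == "ASSIGN":
        return f"assign '{response['Current Sub-Task']}' to {response['ControlText']}"
    if status == "FINISH":
        return "finish"
    return "continue"


raw_output = """
{
    "Current Sub-Task": "Extract the table data from the document",
    "ControlLabel": "0",
    "ControlText": "Microsoft Word - Document1",
    "Plan": ["Extract table from Word", "Create chart in Excel"],
    "Status": "ASSIGN"
}
"""
next_step = describe_next_step(parse_host_response(raw_output))
print(next_step)
```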


API Reference

Bases: BasicAgent

The HostAgent class, the manager of AppAgents.

Initialize the HostAgent.

Parameters:
  • name (str) –

    The name of the agent.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Source code in agents/agent/host_agent.py
def __init__(
    self,
    name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> None:
    """
    Initialize the HostAgent.
    :param name: The name of the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    """
    super().__init__(name=name)
    self.prompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None
    self.agent_factory = AgentFactory()
    self.appagent_dict = {}
    self._active_appagent = None
    self._blackboard = Blackboard()
    self.set_state(self.default_state)

    self._context_provision_executed = False

blackboard property

Get the blackboard.

default_state property

Get the default state.

status_manager property

Get the status manager.

sub_agent_amount property

Get the amount of sub agents.

Returns:
  • int

    The amount of sub agents.

context_provision(context) async

Provide the context for the agent.

Parameters:
  • context (Context) –

    The context for the agent.

Source code in agents/agent/host_agent.py
async def context_provision(self, context: Context) -> None:
    """
    Provide the context for the agent.
    :param context: The context for the agent.
    """
    await self._load_mcp_context(context)

create_subagent(context=None)

Orchestrate creation of the appropriate sub-agent. Decides between third-party agent and built-in app/operator agent.

Parameters:
  • context (Optional['Context'], default: None ) –

    The context for the agent and session.

Source code in agents/agent/host_agent.py
def create_subagent(self, context: Optional["Context"] = None) -> AppAgent:
    """
    Orchestrate creation of the appropriate sub-agent.
    Decides between third-party agent and built-in app/operator agent.
    :param context: The context for the agent and session.
    :return: The newly created sub-agent, which also becomes the active one.
    """
    mode = RunningMode(context.get(ContextNames.MODE))

    assigned_third_party_agent = self.processor.processing_context.get_local(
        "assigned_third_party_agent"
    )
    # if self.processor.assigned_third_party_agent:
    if assigned_third_party_agent:
        config = AgentConfigResolver.resolve_third_party_config(
            assigned_third_party_agent, mode
        )
    else:
        window_name = context.get(ContextNames.APPLICATION_PROCESS_NAME)
        root_name = context.get(ContextNames.APPLICATION_ROOT_NAME)

        if mode in {
            RunningMode.NORMAL,
            RunningMode.BATCH_NORMAL,
            RunningMode.FOLLOWER,
        }:
            config = AgentConfigResolver.resolve_app_agent_config(
                root_name, window_name, mode
            )
        elif mode in {RunningMode.NORMAL_OPERATOR, RunningMode.BATCH_OPERATOR}:
            config = AgentConfigResolver.resolve_operator_agent_config(
                root_name, window_name, mode
            )
        else:
            raise ValueError(f"Unsupported mode: {mode}")

    agent_name = config.get("name")
    agent_type = config.get("agent_type")
    process_name = config.get("process_name")

    self.logger.info(f"Creating sub agent with config: {config}")

    app_agent = self.agent_factory.create_agent(**config)
    self.appagent_dict[agent_name] = app_agent
    app_agent.host = self
    self._active_appagent = app_agent

    self.logger.info(
        f"Created sub agent: {agent_name} with type {agent_type} and process name {process_name}, class {app_agent.__class__.__name__}"
    )

    return app_agent

get_active_appagent()

Get the active app agent.

Returns:
  • AppAgent

    The active app agent.

Source code in agents/agent/host_agent.py
def get_active_appagent(self) -> AppAgent:
    """
    Get the active app agent.
    :return: The active app agent.
    """
    return self._active_appagent

get_prompter(is_visual, main_prompt, example_prompt, api_prompt)

Get the prompter for the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Returns:
  • HostAgentPrompter

    The prompter instance.

Source code in agents/agent/host_agent.py
def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> HostAgentPrompter:
    """
    Get the prompter for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The prompter instance.
    """
    return HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)

message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)

Construct the message.

Parameters:
  • image_list (List[str]) –

    The list of screenshot images.

  • os_info (str) –

    The OS information.

  • prev_subtask (List[Dict[str, str]]) –

    The previous subtask.

  • plan (List[str]) –

    The plan.

  • request (str) –

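    The request.

  • blackboard_prompt (List[Dict[str, str]]) –

    The blackboard prompt, prepended to the user message when non-empty.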

Returns:
  • List[Dict[str, Union[str, List[Dict[str, str]]]]]

    The message.

Source code in agents/agent/host_agent.py
def message_constructor(
    self,
    image_list: List[str],
    os_info: str,
    plan: List[str],
    prev_subtask: List[Dict[str, str]],
    request: str,
    blackboard_prompt: List[Dict[str, str]],
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
    :param image_list: The list of screenshot images.
    :param os_info: The OS information.
    :param prev_subtask: The previous subtask.
    :param plan: The plan.
    :param request: The request.
    :param blackboard_prompt: The blackboard prompt, prepended to the user message when non-empty.
    :return: The message.
    """
    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
    hostagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=os_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
    )

    if blackboard_prompt:
        hostagent_prompt_user_message = (
            blackboard_prompt + hostagent_prompt_user_message
        )

    hostagent_prompt_message = self.prompter.prompt_construction(
        hostagent_prompt_system_message, hostagent_prompt_user_message
    )

    return hostagent_prompt_message
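
The blackboard handling in `message_constructor` reduces to a conditional list concatenation. The sketch below isolates that step; `merge_user_content` is a hypothetical helper, and the `{"type": "text", ...}` item shape is an assumption about the content format rather than something this page specifies.

```python
from typing import Dict, List


def merge_user_content(
    blackboard_prompt: List[Dict[str, str]],
    user_content: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Put blackboard items ahead of the user content when any exist."""
    if blackboard_prompt:
        return blackboard_prompt + user_content
    return user_content


# Assumed item shape, for illustration only.
user_content = [{"type": "text", "text": "User request: create an Excel chart"}]
blackboard_prompt = [{"type": "text", "text": "Blackboard: result_1 is available"}]

merged = merge_user_content(blackboard_prompt, user_content)
print([item["text"] for item in merged])
```

Placing the blackboard items first means the model reads shared state before the request-specific content.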

print_response(response)

Print the response using the presenter.

Parameters:
  • response (HostAgentResponse) –

    The response object to print.

Source code in agents/agent/host_agent.py
def print_response(self, response: HostAgentResponse) -> None:
    """
    Print the response using the presenter.
    :param response: The response object to print.
    """
    # Format the action string using get_command_string and pass to presenter
    function = response.function
    arguments = response.arguments

    action_str = None
    if function:
        action_str = self.get_command_string(function, arguments)

    # Pass formatted action string as parameter instead of modifying response
    self.presenter.present_host_agent_response(response, action_str=action_str)

process(context) async

Process the agent.

Parameters:
  • context (Context) –

    The context.

Source code in agents/agent/host_agent.py
async def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    # from ufo.agents.processors.host_agent_processor import HostAgentProcessor

    if not self._context_provision_executed:
        await self.context_provision(context=context)
        self._context_provision_executed = True
    self.processor = HostAgentProcessor(agent=self, global_context=context)
    # self.processor = HostAgentProcessor(agent=self, context=context)
    await self.processor.process()

    # Sync the status with the processor.
    # self.status = self.processor.status
    self.status = self.processor.processing_context.get_local("status")
    self.logger.info(f"Host agent status updated to: {self.status}")
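
The `_context_provision_executed` flag implements a run-once guard: context provisioning happens on the first `process` call and is skipped afterwards. The following sketch reproduces only that pattern with `asyncio`; `OnceProvisionedAgent` and its counters are hypothetical and stand in for the real MCP context loading.

```python
import asyncio


class OnceProvisionedAgent:
    """Hypothetical agent showing one-time async setup inside process()."""

    def __init__(self) -> None:
        self._provisioned = False
        self.provision_calls = 0
        self.rounds = 0

    async def _provision(self) -> None:
        # Stand-in for expensive setup such as loading tool context.
        self.provision_calls += 1

    async def process(self) -> None:
        if not self._provisioned:
            await self._provision()
            self._provisioned = True
        self.rounds += 1


async def main() -> OnceProvisionedAgent:
    agent = OnceProvisionedAgent()
    for _ in range(3):
        await agent.process()
    return agent


agent = asyncio.run(main())
print(agent.provision_calls, agent.rounds)
```

Setting the flag only after the awaited setup returns means a failed provisioning attempt is retried on the next round in this sketch.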

process_confirmation()

TODO: Process the confirmation.

Source code in agents/agent/host_agent.py
def process_confirmation(self) -> None:
    """
    TODO: Process the confirmation.
    """
    pass

Summary

HostAgent is the desktop-level orchestrator that:

  • Decomposes tasks and coordinates AppAgents
  • Operates at system level, not application level
  • Uses a 7-state FSM; a typical flow is CONTINUE → ASSIGN → AppAgent → CONTINUE → FINISH
  • Executes a 4-phase pipeline: DATA_COLLECTION → LLM → ACTION → MEMORY
  • Creates, caches, and reuses AppAgent instances
  • Provides shared Blackboard memory for all agents
  • Maintains single instance per session managing multiple AppAgents

Next Steps:

  1. Read State Machine for FSM details
  2. Read Processing Strategy for pipeline architecture
  3. Read Command System for available desktop operations
  4. Read AppAgent for application-level execution