HostAgent: Desktop Orchestrator

HostAgent serves as the centralized control plane of UFO². It interprets user-specified goals, decomposes them into structured subtasks, instantiates and dispatches AppAgent modules, and coordinates their progress across the system. HostAgent provides system-level services for introspection, planning, application lifecycle management, and multi-agent synchronization.

Architecture Overview

Operating atop the native Windows substrate, HostAgent monitors active applications, issues shell commands to spawn new processes as needed, and manages the creation and teardown of application-specific AppAgent instances. All coordination occurs through a persistent state machine, which governs the transitions across execution phases.

**Figure:** HostAgent architecture showing the finite state machine, processing pipeline, and interactions with AppAgents through the Blackboard pattern.

Core Responsibilities

Task Decomposition

Given a user's natural language input, HostAgent identifies the underlying task goal and decomposes it into a dependency-ordered subtask graph.

Example: User request "Extract data from Word and create an Excel chart" becomes:

Extract table from Word document
Create chart in Excel with extracted data

**Figure:** HostAgent decomposes user requests into sequential subtasks, assigns each to the appropriate application, and orchestrates AppAgents to complete them in dependency order.
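The dependency ordering described above can be illustrated with a minimal sketch. It assumes the subtask graph is a plain mapping from each subtask to its prerequisites; the `subtasks` mapping and `dependency_order` helper are hypothetical and are not part of UFO². The standard-library `graphlib.TopologicalSorter` stands in for whatever ordering logic HostAgent actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical subtask graph: each key maps to the set of subtasks it depends on.
subtasks = {
    "Extract table from Word document": set(),
    "Create chart in Excel with extracted data": {
        "Extract table from Word document"
    },
}


def dependency_order(graph: dict[str, set[str]]) -> list[str]:
    """Return subtasks so that every subtask comes after its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


order = dependency_order(subtasks)
print(order)
```

With only two subtasks and one dependency, the Word extraction is always scheduled before the Excel chart.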

Application Lifecycle Management

For each subtask, HostAgent inspects system process metadata (via UIA APIs) to determine whether the target application is running. If not, it launches the program and registers it with the runtime.
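A minimal sketch of this check-then-launch step follows. To stay self-contained it takes the list of running processes and the launcher as arguments instead of querying UIA or spawning a real process; `ensure_running` and the process names are assumptions made for illustration only.

```python
from typing import Callable, Iterable


def ensure_running(
    app: str,
    running: Iterable[str],
    launch: Callable[[str], None],
) -> bool:
    """Launch `app` unless it already appears in `running`.

    Returns True if a launch was requested, False if the app was already up.
    """
    if app in set(running):
        return False
    launch(app)
    return True


# Record launch requests in a list instead of starting real processes.
launched: list[str] = []
started_word = ensure_running("WINWORD.EXE", ["WINWORD.EXE"], launched.append)
started_excel = ensure_running("EXCEL.EXE", ["WINWORD.EXE"], launched.append)
```

Injecting the launcher keeps the decision logic testable without touching the desktop.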

AppAgent Instantiation

HostAgent spawns the corresponding AppAgent for each active application, providing it with task context, memory references, and relevant toolchains (e.g., APIs, documentation).
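The create-or-reuse behavior can be sketched as a small registry keyed by agent name. This is an assumption-laden illustration: `StubAppAgent` and `AgentRegistry` are hypothetical stand-ins, and the real HostAgent delegates construction to its agent factory.

```python
from dataclasses import dataclass, field


@dataclass
class StubAppAgent:
    """Stand-in for an AppAgent; carries only what this sketch needs."""

    name: str
    task_context: str


@dataclass
class AgentRegistry:
    """Caches one agent per application name."""

    agents: dict[str, StubAppAgent] = field(default_factory=dict)

    def get_or_create(self, name: str, task_context: str) -> StubAppAgent:
        agent = self.agents.get(name)
        if agent is None:
            # First subtask for this application: build and cache the agent.
            agent = StubAppAgent(name=name, task_context=task_context)
            self.agents[name] = agent
        else:
            # Later subtask: reuse the cached agent with fresh task context.
            agent.task_context = task_context
        return agent


registry = AgentRegistry()
first = registry.get_or_create("Word", "Extract table")
second = registry.get_or_create("Word", "Format heading")
```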

Task Scheduling and Control

The global execution plan is serialized into a finite state machine (FSM), allowing HostAgent to enforce execution order, detect failures, and resolve dependencies across agents. See State Machine Details for the FSM architecture.
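As a rough sketch of how an FSM can enforce execution order, the snippet below models only the three statuses named on this page (CONTINUE, ASSIGN, FINISH). The transition table is a hypothetical simplification of the happy path and does not reproduce the full 7-state machine.

```python
from enum import Enum


class Status(Enum):
    # Only the statuses named on this page; the real FSM defines more.
    CONTINUE = "CONTINUE"
    ASSIGN = "ASSIGN"
    FINISH = "FINISH"


# Hypothetical transition table covering the happy path only.
TRANSITIONS: dict[Status, set[Status]] = {
    Status.CONTINUE: {Status.CONTINUE, Status.ASSIGN, Status.FINISH},
    Status.ASSIGN: {Status.CONTINUE},
    Status.FINISH: set(),
}


def step(current: Status, requested: Status) -> Status:
    """Move to `requested` if the table allows it, otherwise raise."""
    if requested not in TRANSITIONS[current]:
        raise ValueError(
            f"Illegal transition: {current.name} -> {requested.name}"
        )
    return requested


# Walk one delegation cycle: plan, assign, resume, finish.
final = Status.CONTINUE
for requested in (Status.ASSIGN, Status.CONTINUE, Status.FINISH):
    final = step(final, requested)
```

Rejecting transitions that are not in the table is what lets the orchestrator detect an out-of-order step instead of silently continuing.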

Shared State Communication

HostAgent reads from and writes to a global blackboard, enabling inter-agent communication and system-level observability for debugging and replay.
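The blackboard pattern can be illustrated with a dictionary plus an append-only log. `SimpleBlackboard` and its method names are assumptions for this sketch and do not mirror the actual `Blackboard` class.

```python
from typing import Any


class SimpleBlackboard:
    """Shared key-value store with an append-only write log for replay."""

    def __init__(self) -> None:
        self._entries: dict[str, Any] = {}
        self._log: list[tuple[str, str, Any]] = []

    def write(self, author: str, key: str, value: Any) -> None:
        self._entries[key] = value
        self._log.append((author, key, value))

    def read(self, key: str, default: Any = None) -> Any:
        return self._entries.get(key, default)

    def history(self) -> list[tuple[str, str, Any]]:
        # Return a copy so callers cannot mutate the log.
        return list(self._log)


board = SimpleBlackboard()
board.write("HostAgent", "subtask_1", "Extract table from Word")
board.write("AppAgent1", "result_1", [["Q1", 10], ["Q2", 12]])
excel_input = board.read("result_1")
```

Keeping the log separate from the current entries is one way to support the debugging and replay use case mentioned above.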

Key Characteristics

Scope: Desktop-level orchestrator (system-wide, not application-specific)
Lifecycle: Single instance per session, persists throughout task execution
Hierarchy: Parent agent that manages multiple child AppAgents
Communication: Owns and coordinates the shared Blackboard
Control: 7-state finite state machine with 4-phase processing pipeline

Execution Workflow

sequenceDiagram
    participant User
    participant HostAgent
    participant Blackboard
    participant AppAgent1
    participant AppAgent2
    User->>HostAgent: "Extract Word table, create Excel chart"
    HostAgent->>HostAgent: Decompose into subtasks
    HostAgent->>Blackboard: Write subtask 1
    HostAgent->>AppAgent1: Create/Get Word AppAgent
    AppAgent1->>AppAgent1: Execute Word task
    AppAgent1->>Blackboard: Write result 1
    AppAgent1-->>HostAgent: Return FINISH
    HostAgent->>Blackboard: Read result 1
    HostAgent->>Blackboard: Write subtask 2
    HostAgent->>AppAgent2: Create/Get Excel AppAgent
    AppAgent2->>Blackboard: Read result 1
    AppAgent2->>AppAgent2: Execute Excel task
    AppAgent2->>Blackboard: Write result 2
    AppAgent2-->>HostAgent: Return FINISH
    HostAgent->>HostAgent: Verify completion
    HostAgent-->>User: Task completed

Deep Dive Topics

State Machine: 7-state FSM architecture and transitions
Processing Strategy: 4-phase processing pipeline
Command System: Desktop-level MCP commands

Input and Output

HostAgent Input

Input	Description	Type
User Request	Natural language task description	String
Application Information	Active application metadata	List of Dicts
Desktop Screenshots	Visual context of desktop state	Image
Previous Sub-Tasks	Completed subtask history	List of Dicts
Previous Plan	Planned future subtasks	List of Strings
Blackboard	Shared memory space	Dictionary

HostAgent Output

Output	Description	Type
Observation	Desktop screenshot analysis	String
Thought	Reasoning process	String
Current Sub-Task	Active subtask description	String
Message	Information for AppAgent	String
ControlLabel	Selected application index	String
ControlText	Selected application name	String
Plan	Future subtask sequence	List of Strings
Status	Agent state (CONTINUE/ASSIGN/FINISH/etc.)	String
Comment	User-facing information	String
Questions	Clarification requests	List of Strings
Bash	System command to execute	String

Example Output:

{
    "Observation": "Desktop shows Microsoft Word with document open containing a table",
    "Thought": "User wants to extract data from Word first",
    "Current Sub-Task": "Extract the table data from the document",
    "Message": "Starting data extraction from Word document",
    "ControlLabel": "0",
    "ControlText": "Microsoft Word - Document1",
    "Plan": ["Extract table from Word", "Create chart in Excel"],
    "Status": "ASSIGN",
    "Comment": "Delegating table extraction to Word AppAgent",
    "Questions": [],
    "Bash": ""
}
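A consumer of this output might validate the fields and route on `Status`. The sketch below is hypothetical: `parse_host_response` and `next_step` are illustrative helpers, and the routing strings are invented for the example.

```python
import json

# Field names taken from the HostAgent Output table.
REQUIRED_KEYS = {
    "Observation", "Thought", "Current Sub-Task", "Message",
    "ControlLabel", "ControlText", "Plan", "Status",
    "Comment", "Questions", "Bash",
}


def parse_host_response(raw: str) -> dict:
    """Decode the JSON response and check that every field is present."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise KeyError(f"Missing fields: {sorted(missing)}")
    return data


def next_step(data: dict) -> str:
    """Hypothetical routing on the Status field."""
    if data["Status"] == "ASSIGN":
        return f"dispatch '{data['Current Sub-Task']}' to {data['ControlText']}"
    if data["Status"] == "FINISH":
        return "done"
    return "continue planning"
```

For the example output above, `next_step` would return a dispatch instruction targeting "Microsoft Word - Document1".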

Related Documentation

Architecture & Design:

Windows Agent Overview: Module architecture and hierarchy
AppAgent: Application automation agent
Blackboard: Inter-agent communication
Memory System: Execution history

Configuration:

Configuration System Overview: System configuration structure
Agents Configuration: LLM and agent settings
System Configuration: Runtime and execution settings
MCP Reference: MCP server configuration

System Integration:

Session Management: Session lifecycle
Round Management: Execution rounds

API Reference

Bases: BasicAgent

The HostAgent class is the manager of AppAgents.

Initialize the HostAgent.

Parameters:	`name` (`str`) – The name of the agent. `is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `main_prompt` (`str`) – The main prompt file path. `example_prompt` (`str`) – The example prompt file path. `api_prompt` (`str`) – The API prompt file path.

Source code in agents/agent/host_agent.py

def __init__(
    self,
    name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> None:
    """
    Initialize the HostAgent.
    :param name: The name of the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    """
    super().__init__(name=name)
    self.prompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None
    self.agent_factory = AgentFactory()
    self.appagent_dict = {}
    self._active_appagent = None
    self._blackboard = Blackboard()
    self.set_state(self.default_state)

    self._context_provision_executed = False

`blackboard` `property`

Get the blackboard.

`default_state` `property`

Get the default state.

`status_manager` `property`

Get the status manager.

`sub_agent_amount` `property`

Get the amount of sub agents.

Returns:	`int` – The amount of sub agents.

`context_provision(context)` `async`

Provide the context for the agent.

Parameters:	`context` (`Context`) – The context for the agent.

Source code in agents/agent/host_agent.py

async def context_provision(self, context: Context) -> None:
    """
    Provide the context for the agent.
    :param context: The context for the agent.
    """
    await self._load_mcp_context(context)

`create_subagent(context=None)`

Orchestrate creation of the appropriate sub-agent. Decides between third-party agent and built-in app/operator agent.

Parameters:	`context` (`Optional['Context']`, default: `None` ) – The context for the agent and session.

Returns:	`AppAgent` – The created sub-agent, which also becomes the active agent.

Source code in agents/agent/host_agent.py

def create_subagent(self, context: Optional["Context"] = None) -> AppAgent:
    """
    Orchestrate creation of the appropriate sub-agent.
    Decides between third-party agent and built-in app/operator agent.
    :param context: The context for the agent and session.
    :return: The created sub-agent, which also becomes the active agent.
    """
    mode = RunningMode(context.get(ContextNames.MODE))

    assigned_third_party_agent = self.processor.processing_context.get_local(
        "assigned_third_party_agent"
    )
    # if self.processor.assigned_third_party_agent:
    if assigned_third_party_agent:
        config = AgentConfigResolver.resolve_third_party_config(
            assigned_third_party_agent, mode
        )
    else:
        window_name = context.get(ContextNames.APPLICATION_PROCESS_NAME)
        root_name = context.get(ContextNames.APPLICATION_ROOT_NAME)

        if mode in {
            RunningMode.NORMAL,
            RunningMode.BATCH_NORMAL,
            RunningMode.FOLLOWER,
        }:
            config = AgentConfigResolver.resolve_app_agent_config(
                root_name, window_name, mode
            )
        elif mode in {RunningMode.NORMAL_OPERATOR, RunningMode.BATCH_OPERATOR}:
            config = AgentConfigResolver.resolve_operator_agent_config(
                root_name, window_name, mode
            )
        else:
            raise ValueError(f"Unsupported mode: {mode}")

    agent_name = config.get("name")
    agent_type = config.get("agent_type")
    process_name = config.get("process_name")

    self.logger.info(f"Creating sub agent with config: {config}")

    app_agent = self.agent_factory.create_agent(**config)
    self.appagent_dict[agent_name] = app_agent
    app_agent.host = self
    self._active_appagent = app_agent

    self.logger.info(
        f"Created sub agent: {agent_name} with type {agent_type} and process name {process_name}, class {app_agent.__class__.__name__}"
    )

    return app_agent
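The branching in `create_subagent` reduces to a three-way decision. The sketch below restates it in isolation; the `Mode` enum reuses the mode names from the source, but its values, the `OTHER` member, and the returned labels are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    NORMAL = auto()
    BATCH_NORMAL = auto()
    FOLLOWER = auto()
    NORMAL_OPERATOR = auto()
    BATCH_OPERATOR = auto()
    OTHER = auto()  # hypothetical placeholder for any unsupported mode


def choose_resolver(mode: Mode, third_party: Optional[str]) -> str:
    """Pick which config resolver applies, mirroring the branch order."""
    if third_party:
        # An assigned third-party agent takes precedence over the mode.
        return "third_party"
    if mode in {Mode.NORMAL, Mode.BATCH_NORMAL, Mode.FOLLOWER}:
        return "app_agent"
    if mode in {Mode.NORMAL_OPERATOR, Mode.BATCH_OPERATOR}:
        return "operator_agent"
    raise ValueError(f"Unsupported mode: {mode}")
```

Note that the third-party check runs first, so an assigned third-party agent is resolved the same way in every mode.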

`get_active_appagent()`

Get the active app agent.

Returns:	`AppAgent` – The active app agent.

Source code in agents/agent/host_agent.py

def get_active_appagent(self) -> AppAgent:
    """
    Get the active app agent.
    :return: The active app agent.
    """
    return self._active_appagent

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt)`

Get the prompter for the agent.

Parameters:	`is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `main_prompt` (`str`) – The main prompt file path. `example_prompt` (`str`) – The example prompt file path. `api_prompt` (`str`) – The API prompt file path.

Returns:	`HostAgentPrompter` – The prompter instance.

Source code in agents/agent/host_agent.py

def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> HostAgentPrompter:
    """
    Get the prompter for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The prompter instance.
    """
    return HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)

`message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)`

Construct the message.

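Parameters:	`image_list` (`List[str]`) – The list of screenshot images. `os_info` (`str`) – The OS information. `plan` (`List[str]`) – The plan. `prev_subtask` (`List[Dict[str, str]]`) – The previous subtask. `request` (`str`) – The request. `blackboard_prompt` (`List[Dict[str, str]]`) – The blackboard prompt messages, prepended to the user content when non-empty.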

Returns:	`List[Dict[str, Union[str, List[Dict[str, str]]]]]` – The message.

Source code in agents/agent/host_agent.py

def message_constructor(
    self,
    image_list: List[str],
    os_info: str,
    plan: List[str],
    prev_subtask: List[Dict[str, str]],
    request: str,
    blackboard_prompt: List[Dict[str, str]],
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
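    :param image_list: The list of screenshot images.
    :param os_info: The OS information.
    :param plan: The plan.
    :param prev_subtask: The previous subtask.
    :param request: The request.
    :param blackboard_prompt: The blackboard prompt messages, prepended to the user content when non-empty.
    :return: The message.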
    """
    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
    hostagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=os_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
    )

    if blackboard_prompt:
        hostagent_prompt_user_message = (
            blackboard_prompt + hostagent_prompt_user_message
        )

    hostagent_prompt_message = self.prompter.prompt_construction(
        hostagent_prompt_system_message, hostagent_prompt_user_message
    )

    return hostagent_prompt_message
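The blackboard handling in `message_constructor` is plain list concatenation, which the sketch below isolates. The entry shape (`{"type": "text", "text": ...}`) is an assumption for illustration; the actual content format is produced by the prompter.

```python
from typing import Dict, List

Content = List[Dict[str, str]]


def prepend_blackboard(blackboard_prompt: Content, user_content: Content) -> Content:
    """Put blackboard entries ahead of the user content when any exist."""
    if blackboard_prompt:
        # `+` builds a new list, so neither input list is mutated.
        return blackboard_prompt + user_content
    return user_content


user_content = [{"type": "text", "text": "User request: create a chart"}]
blackboard = [{"type": "text", "text": "Blackboard: result 1 available"}]
merged = prepend_blackboard(blackboard, user_content)
unchanged = prepend_blackboard([], user_content)
```

Placing the blackboard entries first means shared state is presented to the model before the request-specific content.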

`print_response(response)`

Print the response using the presenter.

Parameters:	`response` (`HostAgentResponse`) – The response object to print.

Source code in agents/agent/host_agent.py

def print_response(self, response: HostAgentResponse) -> None:
    """
    Print the response using the presenter.
    :param response: The response object to print.
    """
    # Format the action string using get_command_string and pass to presenter
    function = response.function
    arguments = response.arguments

    action_str = None
    if function:
        action_str = self.get_command_string(function, arguments)

    # Pass formatted action string as parameter instead of modifying response
    self.presenter.present_host_agent_response(response, action_str=action_str)

`process(context)` `async`

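Process the agent: run a HostAgentProcessor over the context and sync the resulting status back to the agent. Context provisioning runs once, on the first call.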

Parameters:	`context` (`Context`) – The context.

Source code in agents/agent/host_agent.py

async def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    # from ufo.agents.processors.host_agent_processor import HostAgentProcessor

    if not self._context_provision_executed:
        await self.context_provision(context=context)
        self._context_provision_executed = True
    self.processor = HostAgentProcessor(agent=self, global_context=context)
    # self.processor = HostAgentProcessor(agent=self, context=context)
    await self.processor.process()

    # Sync the status with the processor.
    # self.status = self.processor.status
    self.status = self.processor.processing_context.get_local("status")
    self.logger.info(f"Host agent status updated to: {self.status}")
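The `_context_provision_executed` flag makes context provisioning a one-time step across rounds. A stripped-down sketch of that guard follows; `OneTimeSetupAgent` and its counters are hypothetical and exist only to show the behavior.

```python
import asyncio


class OneTimeSetupAgent:
    """Mimics the run-once guard around context provisioning."""

    def __init__(self) -> None:
        self._context_provision_executed = False
        self.setup_calls = 0
        self.rounds = 0

    async def context_provision(self) -> None:
        # Stand-in for the expensive setup performed on the first round.
        self.setup_calls += 1

    async def process(self) -> None:
        if not self._context_provision_executed:
            await self.context_provision()
            self._context_provision_executed = True
        self.rounds += 1


async def run_rounds(n: int) -> OneTimeSetupAgent:
    agent = OneTimeSetupAgent()
    for _ in range(n):
        await agent.process()
    return agent


agent = asyncio.run(run_rounds(3))
```

Setting the flag only after the awaited call returns means a failed provisioning attempt is retried on the next round.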

`process_confirmation()`

TODO: Process the confirmation.

Source code in agents/agent/host_agent.py

def process_confirmation(self) -> None:
    """
    TODO: Process the confirmation.
    """
    pass

Summary

HostAgent is the desktop-level orchestrator that:

Decomposes tasks and coordinates AppAgents
Operates at system level, not application level
Uses a 7-state FSM; a typical path is CONTINUE → ASSIGN → AppAgent → CONTINUE → FINISH
Executes a 4-phase pipeline: DATA_COLLECTION → LLM → ACTION → MEMORY
Creates, caches, and reuses AppAgent instances
Provides shared Blackboard memory for all agents
Maintains single instance per session managing multiple AppAgents

Next Steps:

Read State Machine for FSM details
Read Processing Strategy for pipeline architecture
Read Command System for available desktop operations
Read AppAgent for application-level execution
