HostAgent 🤖

The HostAgent assumes three primary responsibilities:

Task Decomposition. Given a user's natural language input, HostAgent identifies the underlying task goal and decomposes it into a dependency-ordered subtask graph.
Application Lifecycle Management. For each subtask, HostAgent inspects system process metadata (via UIA APIs) to determine whether the target application is running. If not, it launches the program and registers it with the runtime.
AppAgent Instantiation. HostAgent spawns the corresponding AppAgent for each active application, providing it with task context, memory references, and relevant toolchains (e.g., APIs, documentation).
Task Scheduling and Control. The global execution plan is serialized into a finite state machine (FSM), allowing HostAgent to enforce execution order, detect failures, and resolve dependencies across agents.
Shared State Communication. HostAgent reads from and writes to a global blackboard, enabling inter-agent communication and system-level observability for debugging and replay.

Below is a diagram illustrating the HostAgent architecture and its interactions with other components:

The HostAgent activates its Processor to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent for execution. The HostAgent monitors the progress of the AppAgents and ensures the successful completion of the user's request.

HostAgent Input

The HostAgent receives the following inputs:

Input	Description	Type
User Request	The user's request in natural language.	String
Application Information	Information about the existing active applications.	List of Strings
Desktop Screenshots	Screenshots of the desktop to provide context to the `HostAgent`.	Image
Previous Sub-Tasks	The previous sub-tasks and their completion status.	List of Strings
Previous Plan	The previous plan for the following sub-tasks.	List of Strings
Blackboard	The shared memory space for storing and sharing information among the agents.	Dictionary

By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions.

HostAgent Output

With the inputs provided, the HostAgent generates the following outputs:

Output	Description	Type
Observation	The observation of current desktop screenshots.	String
Thought	The logical reasoning process of the `HostAgent`.	String
Current Sub-Task	The current sub-task to be executed by the `AppAgent`.	String
Message	The message to be sent to the `AppAgent` for the completion of the sub-task.	String
ControlLabel	The index of the selected application to execute the sub-task.	String
ControlText	The name of the selected application to execute the sub-task.	String
Plan	The plan for the following sub-tasks after the current sub-task.	List of Strings
Status	The status of the agent, mapped to the `AgentState`.	String
Comment	Additional comments or information provided to the user.	String
Questions	The questions to be asked to the user for additional information.	List of Strings
Bash	The bash command to be executed by the `HostAgent`. It can be used to open applications or execute system commands.	String

Below is an example of the HostAgent output:

{
    "Observation": "Desktop screenshot",
    "Thought": "Logical reasoning process",
    "Current Sub-Task": "Sub-task description",
    "Message": "Message to AppAgent",
    "ControlLabel": "Application index",
    "ControlText": "Application name",
    "Plan": ["Sub-task 1", "Sub-task 2"],
    "Status": "AgentState",
    "Comment": "Additional comments",
    "Questions": ["Question 1", "Question 2"],
    "Bash": "Bash command"
}

Info

The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.

HostAgent State

The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include:

State	Description
`CONTINUE`	Default state for action planning and execution.
`PENDING`	Invoked for safety-critical actions (e.g., destructive operations); requires user confirmation.
`FINISH`	Task completed; execution ends.
`FAIL`	Irrecoverable failure detected (e.g., application crash, permission error).

The state machine diagram for the HostAgent is shown below:

The HostAgent transitions between these states based on the user's request, the application information, and the progress of the AppAgents in executing the sub-tasks.

Task Decomposition

Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:

Creating and Registering AppAgents

When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent, by calling the create_subagent method:

def create_subagent(
        self,
        agent_type: str,
        agent_name: str,
        process_name: str,
        app_root_name: str,
        is_visual: bool,
        main_prompt: str,
        example_prompt: str,
        api_prompt: str,
        *args,
        **kwargs,
    ) -> BasicAgent:
        """
        Create an SubAgent hosted by the HostAgent.
        :param agent_type: The type of the agent to create.
        :param agent_name: The name of the SubAgent.
        :param process_name: The process name of the app.
        :param app_root_name: The root name of the app.
        :param is_visual: The flag indicating whether the agent is visual or not.
        :param main_prompt: The main prompt file path.
        :param example_prompt: The example prompt file path.
        :param api_prompt: The API prompt file path.
        :return: The created SubAgent.
        """
        app_agent = self.agent_factory.create_agent(
            agent_type,
            agent_name,
            process_name,
            app_root_name,
            is_visual,
            main_prompt,
            example_prompt,
            api_prompt,
            *args,
            **kwargs,
        )
        self.appagent_dict[agent_name] = app_agent
        app_agent.host = self
        self._active_appagent = app_agent

        return app_agent

The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress.

Reference

Bases: BasicAgent

The HostAgent class the manager of AppAgents.

Initialize the HostAgent. :name: The name of the agent.

Parameters:	`is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `main_prompt` (`str`) – The main prompt file path. `example_prompt` (`str`) – The example prompt file path. `api_prompt` (`str`) – The API prompt file path.

Source code in agents/agent/host_agent.py

def __init__(
    self,
    name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> None:
    """
    Initialize the HostAgent.
    :name: The name of the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    """
    super().__init__(name=name)
    self.prompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None
    self.agent_factory = AgentFactory()
    self.appagent_dict = {}
    self._active_appagent = None
    self._blackboard = Blackboard()
    self.set_state(self.default_state)
    self.Puppeteer = self.create_puppeteer_interface()

`blackboard` `property`

Get the blackboard.

`default_state` `property`

Get the default state.

`status_manager` `property`

Get the status manager.

`sub_agent_amount` `property`

Get the amount of sub agents.

Returns:	`int` – The amount of sub agents.

`create_app_agent(application_window_name, application_root_name, request, mode)`

Create the app agent for the host agent.

Parameters:	`application_window_name` (`str`) – The name of the application window. `application_root_name` (`str`) – The name of the application root. `request` (`str`) – The user request. `mode` (`str`) – The mode of the session.

Returns:	`AppAgent` – The app agent.

Source code in agents/agent/host_agent.py

def create_app_agent(
    self,
    application_window_name: str,
    application_root_name: str,
    request: str,
    mode: str,
) -> AppAgent:
    """
    Create the app agent for the host agent.
    :param application_window_name: The name of the application window.
    :param application_root_name: The name of the application root.
    :param request: The user request.
    :param mode: The mode of the session.
    :return: The app agent.
    """

    if configs.get("ACTION_SEQUENCE", False):
        example_prompt = configs["APPAGENT_EXAMPLE_PROMPT_AS"]
    else:
        example_prompt = configs["APPAGENT_EXAMPLE_PROMPT"]

    if mode in ["normal", "batch_normal", "follower"]:

        agent_name = (
            "AppAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
            if mode == "normal"
            else "BatchAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
        )

        app_agent: AppAgent = self.create_subagent(
            agent_type="app",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
            is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
            main_prompt=configs["APPAGENT_PROMPT"],
            example_prompt=example_prompt,
            api_prompt=configs["API_PROMPT"],
            mode=mode,
        )

    elif mode in ["normal_operator", "batch_normal_operator"]:

        agent_name = (
            "OpenAIOperator/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
            if mode == "normal_operator"
            else "BatchOpenAIOperator/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
        )

        app_agent: OpenAIOperatorAgent = self.create_subagent(
            "operator",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
        )

    else:
        raise ValueError(f"The {mode} mode is not supported.")

    # Create the COM receiver for the app agent.
    if configs.get("USE_APIS", False):
        app_agent.Puppeteer.receiver_manager.create_api_receiver(
            application_root_name, application_window_name
        )

    # Provision the context for the app agent, including the all retrievers.
    app_agent.context_provision(request)

    return app_agent

`create_puppeteer_interface()`

Create the Puppeteer interface to automate the app.

Returns:	`AppPuppeteer` – The Puppeteer interface.

Source code in agents/agent/host_agent.py

def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
    """
    Create the Puppeteer interface to automate the app.
    :return: The Puppeteer interface.
    """
    return puppeteer.AppPuppeteer("", "")

`create_subagent(agent_type, agent_name, process_name, app_root_name, *args, **kwargs)`

Create an SubAgent hosted by the HostAgent.

Parameters:	`agent_type` (`str`) – The type of the agent to create. `agent_name` (`str`) – The name of the SubAgent. `process_name` (`str`) – The process name of the app. `app_root_name` (`str`) – The root name of the app.

Returns:	`BasicAgent` – The created SubAgent.

Source code in agents/agent/host_agent.py

def create_subagent(
    self,
    agent_type: str,
    agent_name: str,
    process_name: str,
    app_root_name: str,
    *args,
    **kwargs,
) -> BasicAgent:
    """
    Create an SubAgent hosted by the HostAgent.
    :param agent_type: The type of the agent to create.
    :param agent_name: The name of the SubAgent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :return: The created SubAgent.
    """
    app_agent = self.agent_factory.create_agent(
        agent_type,
        agent_name,
        process_name,
        app_root_name,
        # is_visual,
        # main_prompt,
        # example_prompt,
        # api_prompt,
        *args,
        **kwargs,
    )
    self.appagent_dict[agent_name] = app_agent
    app_agent.host = self
    self._active_appagent = app_agent

    return app_agent

`get_active_appagent()`

Get the active app agent.

Returns:	`AppAgent` – The active app agent.

Source code in agents/agent/host_agent.py

def get_active_appagent(self) -> AppAgent:
    """
    Get the active app agent.
    :return: The active app agent.
    """
    return self._active_appagent

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt)`

Get the prompt for the agent.

Parameters:	`is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `main_prompt` (`str`) – The main prompt file path. `example_prompt` (`str`) – The example prompt file path. `api_prompt` (`str`) – The API prompt file path.

Returns:	`HostAgentPrompter` – The prompter instance.

Source code in agents/agent/host_agent.py

def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> HostAgentPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The prompter instance.
    """
    return HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)

`message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)`

Construct the message.

Parameters:	`image_list` (`List[str]`) – The list of screenshot images. `os_info` (`str`) – The OS information. `prev_subtask` (`List[Dict[str, str]]`) – The previous subtask. `plan` (`List[str]`) – The plan. `request` (`str`) – The request.

Returns:	`List[Dict[str, Union[str, List[Dict[str, str]]]]]` – The message.

Source code in agents/agent/host_agent.py

def message_constructor(
    self,
    image_list: List[str],
    os_info: str,
    plan: List[str],
    prev_subtask: List[Dict[str, str]],
    request: str,
    blackboard_prompt: List[Dict[str, str]],
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
    :param image_list: The list of screenshot images.
    :param os_info: The OS information.
    :param prev_subtask: The previous subtask.
    :param plan: The plan.
    :param request: The request.
    :return: The message.
    """
    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
    hostagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=os_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
    )

    if blackboard_prompt:
        hostagent_prompt_user_message = (
            blackboard_prompt + hostagent_prompt_user_message
        )

    hostagent_prompt_message = self.prompter.prompt_construction(
        hostagent_prompt_system_message, hostagent_prompt_user_message
    )

    return hostagent_prompt_message

`print_response(response_dict)`

Print the response.

Parameters:	`response_dict` (`Dict`) – The response dictionary to print.

Source code in agents/agent/host_agent.py

def print_response(self, response_dict: Dict) -> None:
    """
    Print the response.
    :param response_dict: The response dictionary to print.
    """

    application = response_dict.get("ControlText")
    if not application:
        application = "[The required application needs to be opened.]"
    observation = response_dict.get("Observation")
    thought = response_dict.get("Thought")
    bash_command = response_dict.get("Bash", None)
    subtask = response_dict.get("CurrentSubtask")

    # Convert the message from a list to a string.
    message = list(response_dict.get("Message", ""))
    message = "\n".join(message)

    # Concatenate the subtask with the plan and convert the plan from a list to a string.
    plan = list(response_dict.get("Plan"))
    plan = [subtask] + plan
    plan = "\n".join([f"({i+1}) " + str(item) for i, item in enumerate(plan)])

    status = response_dict.get("Status")
    comment = response_dict.get("Comment")

    utils.print_with_color(
        "Observations👀: {observation}".format(observation=observation), "cyan"
    )
    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
    if bash_command:
        utils.print_with_color(
            "Running Bash Command🔧: {bash}".format(bash=bash_command), "yellow"
        )
    utils.print_with_color(
        "Plans📚: {plan}".format(plan=plan),
        "cyan",
    )
    utils.print_with_color(
        "Next Selected application📲: {application}".format(
            application=application
        ),
        "yellow",
    )
    utils.print_with_color(
        "Messages to AppAgent📩: {message}".format(message=message), "cyan"
    )
    utils.print_with_color("Status📊: {status}".format(status=status), "blue")

    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")

`process(context)`

Process the agent.

Parameters:	`context` (`Context`) – The context.

Source code in agents/agent/host_agent.py

def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    self.processor = HostAgentProcessor(agent=self, context=context)
    self.processor.process()

    # Sync the status with the processor.
    self.status = self.processor.status

`process_comfirmation()`

TODO: Process the confirmation.

Source code in agents/agent/host_agent.py

def process_comfirmation(self) -> None:
    """
    TODO: Process the confirmation.
    """
    pass

HostAgent 🤖

HostAgent Input

HostAgent Output

HostAgent State

Task Decomposition

Creating and Registering AppAgents

Reference

blackboard property

default_state property

status_manager property

sub_agent_amount property

create_app_agent(application_window_name, application_root_name, request, mode)

create_puppeteer_interface()

create_subagent(agent_type, agent_name, process_name, app_root_name, *args, **kwargs)

get_active_appagent()

get_prompter(is_visual, main_prompt, example_prompt, api_prompt)

message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)

print_response(response_dict)

process(context)

process_comfirmation()

`blackboard` `property`

`default_state` `property`

`status_manager` `property`

`sub_agent_amount` `property`

`create_app_agent(application_window_name, application_root_name, request, mode)`

`create_puppeteer_interface()`

`create_subagent(agent_type, agent_name, process_name, app_root_name, *args, **kwargs)`

`get_active_appagent()`

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt)`

`message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)`

`print_response(response_dict)`

`process(context)`

`process_comfirmation()`