HostAgent 🤖

The HostAgent assumes three primary responsibilities:

  • Task Decomposition. Given a user's natural language input, HostAgent identifies the underlying task goal and decomposes it into a dependency-ordered subtask graph.

  • Application Lifecycle Management. For each subtask, HostAgent inspects system process metadata (via UIA APIs) to determine whether the target application is running. If not, it launches the program and registers it with the runtime.

  • AppAgent Instantiation. HostAgent spawns the corresponding AppAgent for each active application, providing it with task context, memory references, and relevant toolchains (e.g., APIs, documentation).

  • Task Scheduling and Control. The global execution plan is serialized into a finite state machine (FSM), allowing HostAgent to enforce execution order, detect failures, and resolve dependencies across agents.

  • Shared State Communication. HostAgent reads from and writes to a global blackboard, enabling inter-agent communication and system-level observability for debugging and replay.

Below is a diagram illustrating the HostAgent architecture and its interactions with other components:

Blackboard Image

The HostAgent activates its Processor to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent for execution. The HostAgent monitors the progress of the AppAgents and ensures the successful completion of the user's request.

HostAgent Input

The HostAgent receives the following inputs:

Input Description Type
User Request The user's request in natural language. String
Application Information Information about the existing active applications. List of Strings
Desktop Screenshots Screenshots of the desktop to provide context to the HostAgent. Image
Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings
Previous Plan The previous plan for the following sub-tasks. List of Strings
Blackboard The shared memory space for storing and sharing information among the agents. Dictionary

By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions.

HostAgent Output

With the inputs provided, the HostAgent generates the following outputs:

Output Description Type
Observation The observation of current desktop screenshots. String
Thought The logical reasoning process of the HostAgent. String
Current Sub-Task The current sub-task to be executed by the AppAgent. String
Message The message to be sent to the AppAgent for the completion of the sub-task. String
ControlLabel The index of the selected application to execute the sub-task. String
ControlText The name of the selected application to execute the sub-task. String
Plan The plan for the following sub-tasks after the current sub-task. List of Strings
Status The status of the agent, mapped to the AgentState. String
Comment Additional comments or information provided to the user. String
Questions The questions to be asked to the user for additional information. List of Strings
Bash The bash command to be executed by the HostAgent. It can be used to open applications or execute system commands. String

Below is an example of the HostAgent output:

{
    "Observation": "Desktop screenshot",
    "Thought": "Logical reasoning process",
    "Current Sub-Task": "Sub-task description",
    "Message": "Message to AppAgent",
    "ControlLabel": "Application index",
    "ControlText": "Application name",
    "Plan": ["Sub-task 1", "Sub-task 2"],
    "Status": "AgentState",
    "Comment": "Additional comments",
    "Questions": ["Question 1", "Question 2"],
    "Bash": "Bash command"
}

Info

The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.

HostAgent State

The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include:

State Description
CONTINUE Default state for action planning and execution.
PENDING Invoked for safety-critical actions (e.g., destructive operations); requires user confirmation.
FINISH Task completed; execution ends.
FAIL Irrecoverable failure detected (e.g., application crash, permission error).

The state machine diagram for the HostAgent is shown below:

The HostAgent transitions between these states based on the user's request, the application information, and the progress of the AppAgents in executing the sub-tasks.

Task Decomposition

Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:

Task Decomposition Image

Creating and Registering AppAgents

When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent, by calling the create_subagent method:

def create_subagent(
        self,
        agent_type: str,
        agent_name: str,
        process_name: str,
        app_root_name: str,
        is_visual: bool,
        main_prompt: str,
        example_prompt: str,
        api_prompt: str,
        *args,
        **kwargs,
    ) -> BasicAgent:
        """
        Create an SubAgent hosted by the HostAgent.
        :param agent_type: The type of the agent to create.
        :param agent_name: The name of the SubAgent.
        :param process_name: The process name of the app.
        :param app_root_name: The root name of the app.
        :param is_visual: The flag indicating whether the agent is visual or not.
        :param main_prompt: The main prompt file path.
        :param example_prompt: The example prompt file path.
        :param api_prompt: The API prompt file path.
        :return: The created SubAgent.
        """
        app_agent = self.agent_factory.create_agent(
            agent_type,
            agent_name,
            process_name,
            app_root_name,
            is_visual,
            main_prompt,
            example_prompt,
            api_prompt,
            *args,
            **kwargs,
        )
        self.appagent_dict[agent_name] = app_agent
        app_agent.host = self
        self._active_appagent = app_agent

        return app_agent

The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress.

Reference

Bases: BasicAgent

The HostAgent class the manager of AppAgents.

Initialize the HostAgent. :name: The name of the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Source code in agents/agent/host_agent.py
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
def __init__(
    self,
    name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> None:
    """
    Initialize the HostAgent.
    :name: The name of the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    """
    super().__init__(name=name)
    self.prompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None
    self.agent_factory = AgentFactory()
    self.appagent_dict = {}
    self._active_appagent = None
    self._blackboard = Blackboard()
    self.set_state(self.default_state)
    self.Puppeteer = self.create_puppeteer_interface()

blackboard property

Get the blackboard.

default_state property

Get the default state.

status_manager property

Get the status manager.

sub_agent_amount property

Get the amount of sub agents.

Returns:
  • int –

    The amount of sub agents.

create_app_agent(application_window_name, application_root_name, request, mode)

Create the app agent for the host agent.

Parameters:
  • application_window_name (str) –

    The name of the application window.

  • application_root_name (str) –

    The name of the application root.

  • request (str) –

    The user request.

  • mode (str) –

    The mode of the session.

Returns:
  • AppAgent –

    The app agent.

Source code in agents/agent/host_agent.py
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
def create_app_agent(
    self,
    application_window_name: str,
    application_root_name: str,
    request: str,
    mode: str,
) -> AppAgent:
    """
    Create the app agent for the host agent.
    :param application_window_name: The name of the application window.
    :param application_root_name: The name of the application root.
    :param request: The user request.
    :param mode: The mode of the session.
    :return: The app agent.
    """

    if configs.get("ACTION_SEQUENCE", False):
        example_prompt = configs["APPAGENT_EXAMPLE_PROMPT_AS"]
    else:
        example_prompt = configs["APPAGENT_EXAMPLE_PROMPT"]

    if mode in ["normal", "batch_normal", "follower"]:

        agent_name = (
            "AppAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
            if mode == "normal"
            else "BatchAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
        )

        app_agent: AppAgent = self.create_subagent(
            agent_type="app",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
            is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
            main_prompt=configs["APPAGENT_PROMPT"],
            example_prompt=example_prompt,
            api_prompt=configs["API_PROMPT"],
            mode=mode,
        )

    elif mode in ["normal_operator", "batch_normal_operator"]:

        agent_name = (
            "OpenAIOperator/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
            if mode == "normal_operator"
            else "BatchOpenAIOperator/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
        )

        app_agent: OpenAIOperatorAgent = self.create_subagent(
            "operator",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
        )

    else:
        raise ValueError(f"The {mode} mode is not supported.")

    # Create the COM receiver for the app agent.
    if configs.get("USE_APIS", False):
        app_agent.Puppeteer.receiver_manager.create_api_receiver(
            application_root_name, application_window_name
        )

    # Provision the context for the app agent, including the all retrievers.
    app_agent.context_provision(request)

    return app_agent

create_puppeteer_interface()

Create the Puppeteer interface to automate the app.

Returns:
  • AppPuppeteer –

    The Puppeteer interface.

Source code in agents/agent/host_agent.py
209
210
211
212
213
214
def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
    """
    Create the Puppeteer interface to automate the app.
    :return: The Puppeteer interface.
    """
    return puppeteer.AppPuppeteer("", "")

create_subagent(agent_type, agent_name, process_name, app_root_name, *args, **kwargs)

Create an SubAgent hosted by the HostAgent.

Parameters:
  • agent_type (str) –

    The type of the agent to create.

  • agent_name (str) –

    The name of the SubAgent.

  • process_name (str) –

    The process name of the app.

  • app_root_name (str) –

    The root name of the app.

Returns:
  • BasicAgent –

    The created SubAgent.

Source code in agents/agent/host_agent.py
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
def create_subagent(
    self,
    agent_type: str,
    agent_name: str,
    process_name: str,
    app_root_name: str,
    *args,
    **kwargs,
) -> BasicAgent:
    """
    Create an SubAgent hosted by the HostAgent.
    :param agent_type: The type of the agent to create.
    :param agent_name: The name of the SubAgent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :return: The created SubAgent.
    """
    app_agent = self.agent_factory.create_agent(
        agent_type,
        agent_name,
        process_name,
        app_root_name,
        # is_visual,
        # main_prompt,
        # example_prompt,
        # api_prompt,
        *args,
        **kwargs,
    )
    self.appagent_dict[agent_name] = app_agent
    app_agent.host = self
    self._active_appagent = app_agent

    return app_agent

get_active_appagent()

Get the active app agent.

Returns:
  • AppAgent –

    The active app agent.

Source code in agents/agent/host_agent.py
146
147
148
149
150
151
def get_active_appagent(self) -> AppAgent:
    """
    Get the active app agent.
    :return: The active app agent.
    """
    return self._active_appagent

get_prompter(is_visual, main_prompt, example_prompt, api_prompt)

Get the prompt for the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Returns:
  • HostAgentPrompter –

    The prompter instance.

Source code in agents/agent/host_agent.py
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> HostAgentPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The prompter instance.
    """
    return HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)

message_constructor(image_list, os_info, plan, prev_subtask, request, blackboard_prompt)

Construct the message.

Parameters:
  • image_list (List[str]) –

    The list of screenshot images.

  • os_info (str) –

    The OS information.

  • prev_subtask (List[Dict[str, str]]) –

    The previous subtask.

  • plan (List[str]) –

    The plan.

  • request (str) –

    The request.

Returns:
  • List[Dict[str, Union[str, List[Dict[str, str]]]]] –

    The message.

Source code in agents/agent/host_agent.py
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
def message_constructor(
    self,
    image_list: List[str],
    os_info: str,
    plan: List[str],
    prev_subtask: List[Dict[str, str]],
    request: str,
    blackboard_prompt: List[Dict[str, str]],
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
    :param image_list: The list of screenshot images.
    :param os_info: The OS information.
    :param prev_subtask: The previous subtask.
    :param plan: The plan.
    :param request: The request.
    :return: The message.
    """
    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
    hostagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=os_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
    )

    if blackboard_prompt:
        hostagent_prompt_user_message = (
            blackboard_prompt + hostagent_prompt_user_message
        )

    hostagent_prompt_message = self.prompter.prompt_construction(
        hostagent_prompt_system_message, hostagent_prompt_user_message
    )

    return hostagent_prompt_message

print_response(response_dict)

Print the response.

Parameters:
  • response_dict (Dict) –

    The response dictionary to print.

Source code in agents/agent/host_agent.py
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
def print_response(self, response_dict: Dict) -> None:
    """
    Print the response.
    :param response_dict: The response dictionary to print.
    """

    application = response_dict.get("ControlText")
    if not application:
        application = "[The required application needs to be opened.]"
    observation = response_dict.get("Observation")
    thought = response_dict.get("Thought")
    bash_command = response_dict.get("Bash", None)
    subtask = response_dict.get("CurrentSubtask")

    # Convert the message from a list to a string.
    message = list(response_dict.get("Message", ""))
    message = "\n".join(message)

    # Concatenate the subtask with the plan and convert the plan from a list to a string.
    plan = list(response_dict.get("Plan"))
    plan = [subtask] + plan
    plan = "\n".join([f"({i+1}) " + str(item) for i, item in enumerate(plan)])

    status = response_dict.get("Status")
    comment = response_dict.get("Comment")

    utils.print_with_color(
        "Observations👀: {observation}".format(observation=observation), "cyan"
    )
    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
    if bash_command:
        utils.print_with_color(
            "Running Bash Command🔧: {bash}".format(bash=bash_command), "yellow"
        )
    utils.print_with_color(
        "Plans📚: {plan}".format(plan=plan),
        "cyan",
    )
    utils.print_with_color(
        "Next Selected application📲: {application}".format(
            application=application
        ),
        "yellow",
    )
    utils.print_with_color(
        "Messages to AppAgent📩: {message}".format(message=message), "cyan"
    )
    utils.print_with_color("Status📊: {status}".format(status=status), "blue")

    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")

process(context)

Process the agent.

Parameters:
  • context (Context) –

    The context.

Source code in agents/agent/host_agent.py
198
199
200
201
202
203
204
205
206
207
def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    self.processor = HostAgentProcessor(agent=self, context=context)
    self.processor.process()

    # Sync the status with the processor.
    self.status = self.processor.status

process_comfirmation()

TODO: Process the confirmation.

Source code in agents/agent/host_agent.py
294
295
296
297
298
def process_comfirmation(self) -> None:
    """
    TODO: Process the confirmation.
    """
    pass