HostAgent 🤖

The HostAgent assumes three primary responsibilities:

  1. User Engagement: The HostAgent engages with the user to understand their request and analyze their intent. It also conversates with the user to gather additional information when necessary.
  2. AppAgent Management: The HostAgent manages the creation and registration of AppAgents to fulfill the user's request. It also orchestrates the interaction between the AppAgents and the application.
  3. Task Management: The HostAgent analyzes the user's request, to decompose it into sub-tasks and distribute them among the AppAgents. It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents to ensure the successful completion of the user's request.
  4. Bash Command Execution: The HostAgent can execute bash commands to open applications or execute system commands to support the user's request and the AppAgents' execution.
  5. Communication: The HostAgent communicates with the AppAgents to exchange information. It also manages the Blackboard to store and share information among the agents, as shown below:

Blackboard Image

The HostAgent activates its Processor to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent for execution. The HostAgent monitors the progress of the AppAgents and ensures the successful completion of the user's request.

HostAgent Input

The HostAgent receives the following inputs:

Input Description Type
User Request The user's request in natural language. String
Application Information Information about the existing active applications. List of Strings
Desktop Screenshots Screenshots of the desktop to provide context to the HostAgent. Image
Previous Sub-Tasks The previous sub-tasks and their completion status. List of Strings
Previous Plan The previous plan for the following sub-tasks. List of Strings
Blackboard The shared memory space for storing and sharing information among the agents. Dictionary

By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions.

HostAgent Output

With the inputs provided, the HostAgent generates the following outputs:

Output Description Type
Observation The observation of current desktop screenshots. String
Thought The logical reasoning process of the HostAgent. String
Current Sub-Task The current sub-task to be executed by the AppAgent. String
Message The message to be sent to the AppAgent for the completion of the sub-task. String
ControlLabel The index of the selected application to execute the sub-task. String
ControlText The name of the selected application to execute the sub-task. String
Plan The plan for the following sub-tasks after the current sub-task. List of Strings
Status The status of the agent, mapped to the AgentState. String
Comment Additional comments or information provided to the user. String
Questions The questions to be asked to the user for additional information. List of Strings
Bash The bash command to be executed by the HostAgent. It can be used to open applications or execute system commands. String

Below is an example of the HostAgent output:

{
    "Observation": "Desktop screenshot",
    "Thought": "Logical reasoning process",
    "Current Sub-Task": "Sub-task description",
    "Message": "Message to AppAgent",
    "ControlLabel": "Application index",
    "ControlText": "Application name",
    "Plan": ["Sub-task 1", "Sub-task 2"],
    "Status": "AgentState",
    "Comment": "Additional comments",
    "Questions": ["Question 1", "Question 2"],
    "Bash": "Bash command"
}

Info

The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.

HostAgent State

The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include:

State Description
CONTINUE The HostAgent is ready to process the user's request and emloy the Processor to decompose it into sub-tasks.
ASSIGN The HostAgent is assigning the sub-tasks to the AppAgents for execution.
FINISH The overall task is completed, and the HostAgent is ready to return the results to the user.
ERROR An error occurred during the processing of the user's request, and the HostAgent is unable to proceed.
FAIL The HostAgent believes the task is unachievable and cannot proceed further.
PENDING The HostAgent is waiting for additional information from the user to proceed.

The state machine diagram for the HostAgent is shown below:

The HostAgent transitions between these states based on the user's request, the application information, and the progress of the AppAgents in executing the sub-tasks.

Task Decomposition

Upon receiving the user's request, the HostAgent decomposes it into sub-tasks and assigns each sub-task to an AppAgent for execution. The HostAgent determines the appropriate application to fulfill the user's request based on the application information and the user's request. It then orchestrates the AppAgents to execute the necessary actions to complete the sub-tasks. We show the task decomposition process in the following figure:

Task Decomposition Image

Creating and Registering AppAgents

When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent, by calling the create_subagent method:

def create_subagent(
        self,
        agent_type: str,
        agent_name: str,
        process_name: str,
        app_root_name: str,
        is_visual: bool,
        main_prompt: str,
        example_prompt: str,
        api_prompt: str,
        *args,
        **kwargs,
    ) -> BasicAgent:
        """
        Create an SubAgent hosted by the HostAgent.
        :param agent_type: The type of the agent to create.
        :param agent_name: The name of the SubAgent.
        :param process_name: The process name of the app.
        :param app_root_name: The root name of the app.
        :param is_visual: The flag indicating whether the agent is visual or not.
        :param main_prompt: The main prompt file path.
        :param example_prompt: The example prompt file path.
        :param api_prompt: The API prompt file path.
        :return: The created SubAgent.
        """
        app_agent = self.agent_factory.create_agent(
            agent_type,
            agent_name,
            process_name,
            app_root_name,
            is_visual,
            main_prompt,
            example_prompt,
            api_prompt,
            *args,
            **kwargs,
        )
        self.appagent_dict[agent_name] = app_agent
        app_agent.host = self
        self._active_appagent = app_agent

        return app_agent

The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress.

Reference

Bases: BasicAgent

The HostAgent class the manager of AppAgents.

Initialize the HostAgent. :name: The name of the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Source code in agents/agent/host_agent.py
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
def __init__(
    self,
    name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> None:
    """
    Initialize the HostAgent.
    :name: The name of the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    """
    super().__init__(name=name)
    self.prompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None
    self.agent_factory = AgentFactory()
    self.appagent_dict = {}
    self._active_appagent = None
    self._blackboard = Blackboard()
    self.set_state(ContinueHostAgentState())
    self.Puppeteer = self.create_puppeteer_interface()

blackboard property

Get the blackboard.

status_manager: HostAgentStatus property

Get the status manager.

sub_agent_amount: int property

Get the amount of sub agents.

Returns:
  • int –

    The amount of sub agents.

create_app_agent(application_window_name, application_root_name, request, mode)

Create the app agent for the host agent.

Parameters:
  • application_window_name (str) –

    The name of the application window.

  • application_root_name (str) –

    The name of the application root.

  • request (str) –

    The user request.

  • mode (str) –

    The mode of the session.

Returns:
  • AppAgent –

    The app agent.

Source code in agents/agent/host_agent.py
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
def create_app_agent(
    self,
    application_window_name: str,
    application_root_name: str,
    request: str,
    mode: str,
) -> AppAgent:
    """
    Create the app agent for the host agent.
    :param application_window_name: The name of the application window.
    :param application_root_name: The name of the application root.
    :param request: The user request.
    :param mode: The mode of the session.
    :return: The app agent.
    """

    if mode == "normal" or "batch_normal":

        agent_name = (
            "AppAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
            if mode == "normal"
            else "BatchAgent/{root}/{process}".format(
                root=application_root_name, process=application_window_name
            )
        )

        app_agent: AppAgent = self.create_subagent(
            agent_type="app",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
            is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
            main_prompt=configs["APPAGENT_PROMPT"],
            example_prompt=configs["APPAGENT_EXAMPLE_PROMPT"],
            api_prompt=configs["API_PROMPT"],
        )

    elif mode == "follower":

        # Load additional app info prompt.
        app_info_prompt = configs.get("APP_INFO_PROMPT", None)

        agent_name = "FollowerAgent/{root}/{process}".format(
            root=application_root_name, process=application_window_name
        )

        # Create the app agent in the follower mode.
        app_agent = self.create_subagent(
            agent_type="follower",
            agent_name=agent_name,
            process_name=application_window_name,
            app_root_name=application_root_name,
            is_visual=configs["APP_AGENT"]["VISUAL_MODE"],
            main_prompt=configs["FOLLOWERAHENT_PROMPT"],
            example_prompt=configs["APPAGENT_EXAMPLE_PROMPT"],
            api_prompt=configs["API_PROMPT"],
            app_info_prompt=app_info_prompt,
        )

    else:
        raise ValueError(f"The {mode} mode is not supported.")

    # Create the COM receiver for the app agent.
    if configs.get("USE_APIS", False):
        app_agent.Puppeteer.receiver_manager.create_api_receiver(
            application_root_name, application_window_name
        )

    # Provision the context for the app agent, including the all retrievers.
    app_agent.context_provision(request)

    return app_agent

create_puppeteer_interface()

Create the Puppeteer interface to automate the app.

Returns:
  • AppPuppeteer –

    The Puppeteer interface.

Source code in agents/agent/host_agent.py
215
216
217
218
219
220
def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
    """
    Create the Puppeteer interface to automate the app.
    :return: The Puppeteer interface.
    """
    return puppeteer.AppPuppeteer("", "")

create_subagent(agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs)

Create an SubAgent hosted by the HostAgent.

Parameters:
  • agent_type (str) –

    The type of the agent to create.

  • agent_name (str) –

    The name of the SubAgent.

  • process_name (str) –

    The process name of the app.

  • app_root_name (str) –

    The root name of the app.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Returns:
  • BasicAgent –

    The created SubAgent.

Source code in agents/agent/host_agent.py
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
def create_subagent(
    self,
    agent_type: str,
    agent_name: str,
    process_name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    *args,
    **kwargs,
) -> BasicAgent:
    """
    Create an SubAgent hosted by the HostAgent.
    :param agent_type: The type of the agent to create.
    :param agent_name: The name of the SubAgent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The created SubAgent.
    """
    app_agent = self.agent_factory.create_agent(
        agent_type,
        agent_name,
        process_name,
        app_root_name,
        is_visual,
        main_prompt,
        example_prompt,
        api_prompt,
        *args,
        **kwargs,
    )
    self.appagent_dict[agent_name] = app_agent
    app_agent.host = self
    self._active_appagent = app_agent

    return app_agent

get_active_appagent()

Get the active app agent.

Returns:
  • AppAgent –

    The active app agent.

Source code in agents/agent/host_agent.py
152
153
154
155
156
157
def get_active_appagent(self) -> AppAgent:
    """
    Get the active app agent.
    :return: The active app agent.
    """
    return self._active_appagent

get_prompter(is_visual, main_prompt, example_prompt, api_prompt)

Get the prompt for the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Returns:
  • HostAgentPrompter –

    The prompter instance.

Source code in agents/agent/host_agent.py
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
) -> HostAgentPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The prompter instance.
    """
    return HostAgentPrompter(is_visual, main_prompt, example_prompt, api_prompt)

message_constructor(image_list, os_info, plan, prev_subtask, request)

Construct the message.

Parameters:
  • image_list (List[str]) –

    The list of screenshot images.

  • os_info (str) –

    The OS information.

  • prev_subtask (List[Dict[str, str]]) –

    The previous subtask.

  • plan (List[str]) –

    The plan.

  • request (str) –

    The request.

Returns:
  • List[Dict[str, Union[str, List[Dict[str, str]]]]] –

    The message.

Source code in agents/agent/host_agent.py
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
def message_constructor(
    self,
    image_list: List[str],
    os_info: str,
    plan: List[str],
    prev_subtask: List[Dict[str, str]],
    request: str,
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
    :param image_list: The list of screenshot images.
    :param os_info: The OS information.
    :param prev_subtask: The previous subtask.
    :param plan: The plan.
    :param request: The request.
    :return: The message.
    """
    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
    hostagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=os_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
    )

    if not self.blackboard.is_empty():
        blackboard_prompt = self.blackboard.blackboard_to_prompt()
        hostagent_prompt_user_message = (
            blackboard_prompt + hostagent_prompt_user_message
        )

    hostagent_prompt_message = self.prompter.prompt_construction(
        hostagent_prompt_system_message, hostagent_prompt_user_message
    )

    return hostagent_prompt_message

print_response(response_dict)

Print the response.

Parameters:
  • response_dict (Dict) –

    The response dictionary to print.

Source code in agents/agent/host_agent.py
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
def print_response(self, response_dict: Dict) -> None:
    """
    Print the response.
    :param response_dict: The response dictionary to print.
    """

    application = response_dict.get("ControlText")
    if not application:
        application = "[The required application needs to be opened.]"
    observation = response_dict.get("Observation")
    thought = response_dict.get("Thought")
    bash_command = response_dict.get("Bash", None)
    subtask = response_dict.get("CurrentSubtask")

    # Convert the message from a list to a string.
    message = list(response_dict.get("Message", ""))
    message = "\n".join(message)

    # Concatenate the subtask with the plan and convert the plan from a list to a string.
    plan = list(response_dict.get("Plan"))
    plan = [subtask] + plan
    plan = "\n".join([f"({i+1}) " + str(item) for i, item in enumerate(plan)])

    status = response_dict.get("Status")
    comment = response_dict.get("Comment")

    utils.print_with_color(
        "Observations👀: {observation}".format(observation=observation), "cyan"
    )
    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
    if bash_command:
        utils.print_with_color(
            "Running Bash Command🔧: {bash}".format(bash=bash_command), "yellow"
        )
    utils.print_with_color(
        "Plans📚: {plan}".format(plan=plan),
        "cyan",
    )
    utils.print_with_color(
        "Next Selected application📲: {application}".format(
            application=application
        ),
        "yellow",
    )
    utils.print_with_color(
        "Messages to AppAgent📩: {message}".format(message=message), "cyan"
    )
    utils.print_with_color("Status📊: {status}".format(status=status), "blue")

    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")

process(context)

Process the agent.

Parameters:
  • context (Context) –

    The context.

Source code in agents/agent/host_agent.py
204
205
206
207
208
209
210
211
212
213
def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    self.processor = HostAgentProcessor(agent=self, context=context)
    self.processor.process()

    # Sync the status with the processor.
    self.status = self.processor.status

process_comfirmation()

TODO: Process the confirmation.

Source code in agents/agent/host_agent.py
297
298
299
300
301
def process_comfirmation(self) -> None:
    """
    TODO: Process the confirmation.
    """
    pass