HostAgent 🤖

The HostAgent assumes four primary responsibilities:

  1. User Engagement: The HostAgent engages with the user to understand their request and analyze their intent. It also converses with the user to gather additional information when necessary.
  2. AppAgent Management: The HostAgent manages the creation and registration of AppAgents to fulfill the user's request. It also orchestrates the interaction between the AppAgents and the application.
  3. Task Management: The HostAgent analyzes the user's request, decomposes it into sub-tasks, and distributes them among the AppAgents. It also manages the scheduling, orchestration, coordination, and monitoring of the AppAgents to ensure the successful completion of the user's request.
  4. Communication: The HostAgent communicates with the AppAgents to exchange information. It also manages the Blackboard to store and share information among the agents, as shown below:

Blackboard Image

The HostAgent activates its Processor to process the user's request and decompose it into sub-tasks. Each sub-task is then assigned to an AppAgent for execution. The HostAgent monitors the progress of the AppAgents and ensures the successful completion of the user's request.

HostAgent Input

The HostAgent receives the following inputs:

| Input | Description | Type |
| --- | --- | --- |
| User Request | The user's request in natural language. | String |
| Application Information | Information about the existing active applications. | List of Strings |
| Desktop Screenshots | Screenshots of the desktop to provide context to the HostAgent. | Image |
| Previous Sub-Tasks | The previous sub-tasks and their completion status. | List of Strings |
| Previous Plan | The previous plan for the following sub-tasks. | List of Strings |
| Blackboard | The shared memory space for storing and sharing information among the agents. | Dictionary |

By processing these inputs, the HostAgent determines the appropriate application to fulfill the user's request and orchestrates the AppAgents to execute the necessary actions.

HostAgent Output

With the inputs provided, the HostAgent generates the following outputs:

| Output | Description | Type |
| --- | --- | --- |
| Observation | The observation of current desktop screenshots. | String |
| Thought | The logical reasoning process of the HostAgent. | String |
| Current Sub-Task | The current sub-task to be executed by the AppAgent. | String |
| Message | The message to be sent to the AppAgent for the completion of the sub-task. | String |
| ControlLabel | The index of the selected application to execute the sub-task. | String |
| ControlText | The name of the selected application to execute the sub-task. | String |
| Plan | The plan for the following sub-tasks after the current sub-task. | List of Strings |
| Status | The status of the agent, mapped to the AgentState. | String |
| Comment | Additional comments or information provided to the user. | String |
| Questions | The questions to be asked to the user for additional information. | List of Strings |
| AppsToOpen | The application to be opened to execute the sub-task if it is not already open. | Dictionary |

Below is an example of the HostAgent output:

{
    "Observation": "Desktop screenshot",
    "Thought": "Logical reasoning process",
    "Current Sub-Task": "Sub-task description",
    "Message": "Message to AppAgent",
    "ControlLabel": "Application index",
    "ControlText": "Application name",
    "Plan": ["Sub-task 1", "Sub-task 2"],
    "Status": "AgentState",
    "Comment": "Additional comments",
    "Questions": ["Question 1", "Question 2"],
    "AppsToOpen": {"APP": "powerpnt", "file_path": ""}
}

Info

The HostAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.
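As a minimal sketch of that parsing step, the snippet below feeds the placeholder example from above to json.loads and reads two fields back. The helper parse_host_response is a hypothetical name introduced here for illustration; it is not part of the UFO code base.

```python
import json

# Placeholder response text, copied from the example output above.
raw_response = """
{
    "Observation": "Desktop screenshot",
    "Thought": "Logical reasoning process",
    "Current Sub-Task": "Sub-task description",
    "Message": "Message to AppAgent",
    "ControlLabel": "Application index",
    "ControlText": "Application name",
    "Plan": ["Sub-task 1", "Sub-task 2"],
    "Status": "AgentState",
    "Comment": "Additional comments",
    "Questions": ["Question 1", "Question 2"],
    "AppsToOpen": {"APP": "powerpnt", "file_path": ""}
}
"""


def parse_host_response(text: str) -> dict:
    """Hypothetical helper: parse the LLM output and check it is a JSON object."""
    try:
        response = json.loads(text)
    except json.JSONDecodeError as exc:
        # LLM output is not guaranteed to be valid JSON, so fail with a clear error.
        raise ValueError(f"Response is not valid JSON: {exc}") from exc
    if not isinstance(response, dict):
        raise ValueError("Expected a JSON object at the top level")
    return response


response = parse_host_response(raw_response)
plan = response.get("Plan", [])           # list of remaining sub-tasks
apps_to_open = response.get("AppsToOpen")  # dict, or None if the key is absent
print(plan)
print(apps_to_open["APP"])
```

Reading fields with .get keeps the caller robust when the model omits an optional key such as AppsToOpen.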

HostAgent State

The HostAgent progresses through different states, as defined in the ufo/agents/states/host_agent_states.py module. The states include:

| State | Description |
| --- | --- |
| CONTINUE | The HostAgent is ready to process the user's request and employ the Processor to decompose it into sub-tasks and assign them to the AppAgents. |
| FINISH | The overall task is completed, and the HostAgent is ready to return the results to the user. |
| ERROR | An error occurred during the processing of the user's request, and the HostAgent is unable to proceed. |
| FAIL | The HostAgent believes the task is unachievable and cannot proceed further. |
| PENDING | The HostAgent is waiting for additional information from the user to proceed. |

The state machine diagram for the HostAgent is shown below:

The HostAgent transitions between these states based on the user's request, the application information, and the progress of the AppAgents in executing the sub-tasks.
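The five state names can be modeled as a small enumeration to make the control flow concrete. This is only a sketch: the HostStateName enum, the TERMINAL_STATES set, and the next_step function are hypothetical names, and treating FINISH, ERROR, and FAIL as terminal is an assumption drawn from their descriptions, not from the host_agent_states.py module.

```python
from enum import Enum


class HostStateName(Enum):
    """State names from the HostAgent state list; illustrative only."""

    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"
    FAIL = "FAIL"
    PENDING = "PENDING"


# Assumption: these states end the session, based on their descriptions.
TERMINAL_STATES = {HostStateName.FINISH, HostStateName.ERROR, HostStateName.FAIL}


def next_step(status: str) -> str:
    """Map the "Status" string of a response to a coarse next action."""
    state = HostStateName(status)  # raises ValueError for an unknown name
    if state in TERMINAL_STATES:
        return "stop"
    if state is HostStateName.PENDING:
        return "ask_user"
    return "process"


for name in ("CONTINUE", "PENDING", "FINISH"):
    print(name, "->", next_step(name))
```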

Task Decomposition

Upon receiving the user's request, the HostAgent decomposes it into sub-tasks. For each sub-task, it uses the application information and the request to select the appropriate application, assigns the sub-task to the corresponding AppAgent, and orchestrates the AppAgents as they execute the necessary actions. We show the task decomposition process in the following figure:

Task Decomposition Image
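To make the idea of distributing sub-tasks concrete, here is a small sketch that groups sub-tasks by the application chosen for each one. In UFO the decomposition itself is produced by the LLM; the SubTask dataclass, the group_by_application function, and the sample request are all hypothetical and only illustrate the resulting assignment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubTask:
    """Hypothetical record: one sub-task and the application selected for it."""

    description: str
    application: str


def group_by_application(subtasks: List[SubTask]) -> Dict[str, List[str]]:
    """Collect sub-task descriptions per application, preserving their order."""
    assignments: Dict[str, List[str]] = {}
    for subtask in subtasks:
        assignments.setdefault(subtask.application, []).append(subtask.description)
    return assignments


# Made-up decomposition of a request that spans two applications.
subtasks = [
    SubTask("Copy the summary paragraph", "Word"),
    SubTask("Create a new message", "Outlook"),
    SubTask("Paste the summary and send", "Outlook"),
]
assignments = group_by_application(subtasks)
print(assignments)
```

Each key in the result corresponds to one AppAgent that would receive the listed sub-tasks.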

Creating and Registering AppAgents

When the HostAgent determines the need for a new AppAgent to fulfill a sub-task, it creates an instance of the AppAgent and registers it with the HostAgent, by calling the create_subagent method:

def create_subagent(
        self,
        agent_type: str,
        agent_name: str,
        process_name: str,
        app_root_name: str,
        is_visual: bool,
        main_prompt: str,
        example_prompt: str,
        api_prompt: str,
        *args,
        **kwargs,
    ) -> BasicAgent:
        """
        Create a SubAgent hosted by the HostAgent.
        :param agent_type: The type of the agent to create.
        :param agent_name: The name of the SubAgent.
        :param process_name: The process name of the app.
        :param app_root_name: The root name of the app.
        :param is_visual: The flag indicating whether the agent is visual or not.
        :param main_prompt: The main prompt file path.
        :param example_prompt: The example prompt file path.
        :param api_prompt: The API prompt file path.
        :return: The created SubAgent.
        """
        app_agent = self.agent_factory.create_agent(
            agent_type,
            agent_name,
            process_name,
            app_root_name,
            is_visual,
            main_prompt,
            example_prompt,
            api_prompt,
            *args,
            **kwargs,
        )
        self.appagent_dict[agent_name] = app_agent
        app_agent.host = self
        self._active_appagent = app_agent

        return app_agent

The HostAgent then assigns the sub-task to the AppAgent for execution and monitors its progress.
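The bookkeeping done by create_subagent (storing the agent under its name, setting the back-reference to the host, and marking the agent as active) can be tried out with stand-in classes. StubHostAgent and StubAppAgent below are hypothetical and omit the factory and prompt arguments; the agent and process names are made up for the example.

```python
from typing import Dict, Optional


class StubAppAgent:
    """Stand-in for an AppAgent; stores only what this sketch needs."""

    def __init__(self, name: str, process_name: str) -> None:
        self.name = name
        self.process_name = process_name
        self.host: Optional["StubHostAgent"] = None


class StubHostAgent:
    """Stand-in host that mirrors the registration steps of create_subagent."""

    def __init__(self) -> None:
        self.appagent_dict: Dict[str, StubAppAgent] = {}
        self._active_appagent: Optional[StubAppAgent] = None

    def create_subagent(self, agent_name: str, process_name: str) -> StubAppAgent:
        app_agent = StubAppAgent(agent_name, process_name)
        self.appagent_dict[agent_name] = app_agent  # register by name
        app_agent.host = self                       # back-reference to the host
        self._active_appagent = app_agent           # newest agent becomes active
        return app_agent


host = StubHostAgent()
word_agent = host.create_subagent("WINWORD.EXE/report", "WINWORD.EXE")
excel_agent = host.create_subagent("EXCEL.EXE/budget", "EXCEL.EXE")
print(len(host.appagent_dict))
print(host._active_appagent.name)
```

Because agents are keyed by agent_name, registering a second agent under an existing name replaces the earlier entry in this sketch.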

Reference

Bases: BasicAgent

The HostAgent class, the manager of AppAgents.

Initialize the HostAgent.

Parameters:
  • name (str) –

    The name of the agent.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Source code in agents/agent/host_agent.py
def __init__(
    self,
    name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    allow_openapp=False,
) -> None:
    """
    Initialize the HostAgent.
    :param name: The name of the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    """
    super().__init__(name=name)
    self.prompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt, allow_openapp
    )
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None
    self.agent_factory = AgentFactory()
    self.appagent_dict = {}
    self._active_appagent = None
    self._blackboard = Blackboard()
    self.set_state(ContinueHostAgentState())

blackboard property

Get the blackboard.

status_manager: HostAgentStatus property

Get the status manager.

sub_agent_amount: int property

Get the amount of sub agents.

Returns:
  • int –

    The amount of sub agents.

app_file_manager(app_file_info)

Open the application or file for the user.

Parameters:
  • app_file_info (Dict[str, str]) –

    The information of the application or file. {'APP': name of app, 'file_path': path}

Returns:
  • UIAWrapper –

    The window of the application.

Source code in agents/agent/host_agent.py
def app_file_manager(self, app_file_info: Dict[str, str]) -> UIAWrapper:
    """
    Open the application or file for the user.
    :param app_file_info: The information of the application or file. {'APP': name of app, 'file_path': path}
    :return: The window of the application.
    """

    utils.print_with_color("Opening the required application or file...", "yellow")
    file_manager = openfile.FileController()
    results = file_manager.execute_code(app_file_info)
    time.sleep(configs.get("SLEEP_TIME", 5))
    desktop_windows_dict = ControlInspectorFacade(
        configs["CONTROL_BACKEND"]
    ).get_desktop_app_dict(remove_empty=True)
    if not results:
        self.status = "ERROR in opening the application or file."
        return None
    app_window = file_manager.find_window_by_app_name(desktop_windows_dict)
    app_name = app_window.window_text()

    utils.print_with_color(
        f"The application {app_name} has been opened successfully.", "green"
    )

    return app_window

create_subagent(agent_type, agent_name, process_name, app_root_name, is_visual, main_prompt, example_prompt, api_prompt, *args, **kwargs)

Create a SubAgent hosted by the HostAgent.

Parameters:
  • agent_type (str) –

    The type of the agent to create.

  • agent_name (str) –

    The name of the SubAgent.

  • process_name (str) –

    The process name of the app.

  • app_root_name (str) –

    The root name of the app.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Returns:
  • BasicAgent –

    The created SubAgent.

Source code in agents/agent/host_agent.py
def create_subagent(
    self,
    agent_type: str,
    agent_name: str,
    process_name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    *args,
    **kwargs,
) -> BasicAgent:
    """
    Create a SubAgent hosted by the HostAgent.
    :param agent_type: The type of the agent to create.
    :param agent_name: The name of the SubAgent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The created SubAgent.
    """
    app_agent = self.agent_factory.create_agent(
        agent_type,
        agent_name,
        process_name,
        app_root_name,
        is_visual,
        main_prompt,
        example_prompt,
        api_prompt,
        *args,
        **kwargs,
    )
    self.appagent_dict[agent_name] = app_agent
    app_agent.host = self
    self._active_appagent = app_agent

    return app_agent

get_active_appagent()

Get the active app agent.

Returns:
  • AppAgent –

    The active app agent.

Source code in agents/agent/host_agent.py
def get_active_appagent(self) -> AppAgent:
    """
    Get the active app agent.
    :return: The active app agent.
    """
    return self._active_appagent

get_prompter(is_visual, main_prompt, example_prompt, api_prompt, allow_openapp=False)

Get the prompter for the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • api_prompt (str) –

    The API prompt file path.

Returns:
  • HostAgentPrompter –

    The prompter instance.

Source code in agents/agent/host_agent.py
def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    allow_openapp=False,
) -> HostAgentPrompter:
    """
    Get the prompter for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :return: The prompter instance.
    """
    return HostAgentPrompter(
        is_visual, main_prompt, example_prompt, api_prompt, allow_openapp
    )

message_constructor(image_list, os_info, plan, prev_subtask, request)

Construct the message.

Parameters:
  • image_list (List[str]) –

    The list of screenshot images.

  • os_info (str) –

    The OS information.

  • prev_subtask (List[Dict[str, str]]) –

    The previous subtask.

  • plan (List[str]) –

    The plan.

  • request (str) –

    The request.

Returns:
  • List[Dict[str, Union[str, List[Dict[str, str]]]]] –

    The message.

Source code in agents/agent/host_agent.py
def message_constructor(
    self,
    image_list: List[str],
    os_info: str,
    plan: List[str],
    prev_subtask: List[Dict[str, str]],
    request: str,
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
    :param image_list: The list of screenshot images.
    :param os_info: The OS information.
    :param prev_subtask: The previous subtask.
    :param plan: The plan.
    :param request: The request.
    :return: The message.
    """
    hostagent_prompt_system_message = self.prompter.system_prompt_construction()
    hostagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=os_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
    )

    if not self.blackboard.is_empty():
        blackboard_prompt = self.blackboard.blackboard_to_prompt()
        hostagent_prompt_user_message = (
            blackboard_prompt + hostagent_prompt_user_message
        )

    hostagent_prompt_message = self.prompter.prompt_construction(
        hostagent_prompt_system_message, hostagent_prompt_user_message
    )

    return hostagent_prompt_message

print_response(response_dict)

Print the response.

Parameters:
  • response_dict (Dict) –

    The response dictionary to print.

Source code in agents/agent/host_agent.py
def print_response(self, response_dict: Dict) -> None:
    """
    Print the response.
    :param response_dict: The response dictionary to print.
    """

    application = response_dict.get("ControlText")
    if not application:
        application = "[The required application needs to be opened.]"
    observation = response_dict.get("Observation")
    thought = response_dict.get("Thought")
    subtask = response_dict.get("CurrentSubtask")

    # Convert the message from a list to a string.
    message = list(response_dict.get("Message", ""))
    message = "\n".join(message)

    # Concatenate the subtask with the plan and convert the plan from a list to a string.
    plan = list(response_dict.get("Plan"))
    plan = [subtask] + plan
    plan = "\n".join([f"({i+1}) " + str(item) for i, item in enumerate(plan)])

    status = response_dict.get("Status")
    comment = response_dict.get("Comment")

    utils.print_with_color(
        "Observations👀: {observation}".format(observation=observation), "cyan"
    )
    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
    utils.print_with_color(
        "Plans📚: {plan}".format(plan=plan),
        "cyan",
    )
    utils.print_with_color(
        "Next Selected application📲: {application}".format(
            application=application
        ),
        "yellow",
    )
    utils.print_with_color(
        "Messages to AppAgent📩: {message}".format(message=message), "cyan"
    )
    utils.print_with_color("Status📊: {status}".format(status=status), "blue")

    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")
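The plan formatting inside print_response can be tried in isolation. The sketch below reproduces only the numbering step (the current sub-task first, then the remaining plan); format_plan is a hypothetical standalone function and the sample sub-tasks are made up.

```python
from typing import List


def format_plan(subtask: str, plan: List[str]) -> str:
    """Prepend the current sub-task to the plan and number every step."""
    steps = [subtask] + list(plan)
    return "\n".join(f"({i + 1}) {item}" for i, item in enumerate(steps))


formatted = format_plan("Open the report", ["Insert the chart", "Save the file"])
print(formatted)
```

The current sub-task always appears as step (1), so the printed plan shows the full remaining sequence rather than only the future steps.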

process(context)

Process the agent.

Parameters:
  • context (Context) –

    The context.

Source code in agents/agent/host_agent.py
def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    self.processor = HostAgentProcessor(agent=self, context=context)
    self.processor.process()
    self.status = self.processor.status

process_comfirmation()

TODO: Process the confirmation.

Source code in agents/agent/host_agent.py
def process_comfirmation(self) -> None:
    """
    TODO: Process the confirmation.
    """
    pass