AppAgent 👾

An AppAgent iteratively executes actions within a selected application until its assigned sub-task is complete. It is created by the HostAgent to fulfill a sub-task within a Round, and it carries out the actions inside the application that are needed to satisfy the user's request. The AppAgent has the following features:

ReAct with the Application - The AppAgent recursively interacts with the application in a workflow of observation->thought->action, leveraging the multi-modal capabilities of Visual Language Models (VLMs) to comprehend the application UI and fulfill the user's request.
Comprehension Enhancement - The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries, making the agent an application "expert".
Versatile Skill Set - The AppAgent is equipped with a diverse set of skills to support comprehensive automation, such as mouse, keyboard, native APIs, and "Copilot".

Tip

You can find how to enhance the AppAgent with external knowledge bases and demonstration libraries in the Reinforcing AppAgent documentation.

We show the framework of the AppAgent in the following diagram:

AppAgent Input

To interact with the application, the AppAgent receives the following inputs:

Input	Description	Type
User Request	The user's request in natural language.	String
Sub-Task	The sub-task description to be executed by the `AppAgent`, assigned by the `HostAgent`.	String
Current Application	The name of the application to be interacted with.	String
Control Information	Index, name and control type of available controls in the application.	List of Dictionaries
Application Screenshots	Screenshots of the application, including a clean screenshot, an annotated screenshot with labeled controls, and a screenshot with a rectangle around the selected control at the previous step (optional).	List of Strings
Previous Sub-Tasks	The previous sub-tasks and their completion status.	List of Strings
Previous Plan	The previous plan for the following steps.	List of Strings
HostAgent Message	The message from the `HostAgent` for the completion of the sub-task.	String
Retrieved Information	The retrieved information from external knowledge bases or demonstration libraries.	String
Blackboard	The shared memory space for storing and sharing information among the agents.	Dictionary

Below is an example of the annotated application screenshot with labeled controls, which follows the Set-of-Mark paradigm.

By processing these inputs, the AppAgent determines the necessary actions to fulfill the user's request within the application.
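
To make the control information input more concrete, here is a minimal sketch of a list of dictionaries, one per labeled control, and a lookup by label. The keys `label`, `control_text`, and `control_type`, the sample values, and the helper `find_control` are assumptions made for illustration. They are not taken from the UFO source.

```python
from typing import Dict, List, Optional

# Hypothetical control information: one dictionary per labeled control.
# The keys "label", "control_text", and "control_type" are assumed for illustration.
control_info: List[Dict[str, str]] = [
    {"label": "1", "control_text": "File", "control_type": "MenuItem"},
    {"label": "2", "control_text": "Save", "control_type": "Button"},
    {"label": "3", "control_text": "Document", "control_type": "Edit"},
]


def find_control(
    controls: List[Dict[str, str]], label: str
) -> Optional[Dict[str, str]]:
    """Return the control whose label matches, or None if there is no match."""
    for control in controls:
        if control["label"] == label:
            return control
    return None


selected = find_control(control_info, "2")
print(selected)
```

The label plays the same role as the mark drawn on the annotated screenshot: the model refers to a control by its label, and the label is resolved back to a concrete control.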

Tip

Whether to concatenate the clean screenshot and annotated screenshot can be configured in the CONCAT_SCREENSHOT field in the config_dev.yaml file.

Tip

Whether to include the screenshot with a rectangle around the selected control at the previous step can be configured in the INCLUDE_LAST_SCREENSHOT field in the config_dev.yaml file.

AppAgent Output

With the inputs provided, the AppAgent generates the following outputs:

Output	Description	Type
Observation	The observation of the current application screenshots.	String
Thought	The logical reasoning process of the `AppAgent`.	String
ControlLabel	The index of the selected control to interact with.	String
ControlText	The name of the selected control to interact with.	String
Function	The function to be executed on the selected control.	String
Args	The arguments required for the function execution.	List of Strings
Status	The status of the agent, mapped to the `AgentState`.	String
Plan	The plan for the following steps after the current action.	List of Strings
Comment	Additional comments or information provided to the user.	String
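SaveScreenshot	Whether to save the screenshot of the application to the `blackboard` for future reference, together with the reason for saving it. The `print_response` method reads the `save` and `reason` keys from this field.	Dictionary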

Below is an example of the AppAgent output:

{
    "Observation": "Application screenshot",
    "Thought": "Logical reasoning process",
    "ControlLabel": "Control index",
    "ControlText": "Control name",
    "Function": "Function name",
    "Args": ["arg1", "arg2"],
    "Status": "AgentState",
    "Plan": ["Step 1", "Step 2"],
    "Comment": "Additional comments",
    "SaveScreenshot": {"save": true, "reason": "Reason for saving"}
}

Info

The AppAgent output is formatted as a JSON object by LLMs and can be parsed by the json.loads method in Python.
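
As a minimal sketch of that parsing step, the snippet below loads a response with `json.loads` and checks that a few core fields are present. The sample values and the helper `parse_response` are hypothetical, and so is the choice of required fields. Only the field names come from the output table above.

```python
import json

# Hypothetical raw LLM response; the values are made up for illustration.
raw_response = """
{
    "Observation": "The document is open and empty.",
    "Thought": "I need to type the requested text into the document body.",
    "ControlLabel": "12",
    "ControlText": "Document",
    "Function": "set_edit_text",
    "Args": ["Hello"],
    "Status": "CONTINUE",
    "Plan": ["Save the document"],
    "Comment": "Typing the text now."
}
"""


def parse_response(text: str) -> dict:
    """Parse the JSON response and verify that core fields are present."""
    response = json.loads(text)
    required = {"Observation", "Thought", "Function", "Status"}
    missing = required - response.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    return response


response = parse_response(raw_response)
print(response["Function"], response["Args"])
```

Validating the parsed dictionary before acting on it is a defensive choice, since an LLM may omit a field or return malformed JSON. In the malformed case `json.loads` raises `json.JSONDecodeError`.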

AppAgent State

The AppAgent state is managed by a state machine that determines the next action to be executed based on the current state, as defined in the ufo/agents/states/app_agent_states.py module. The states include:

State	Description
`CONTINUE`	The `AppAgent` continues executing the current action.
`FINISH`	The `AppAgent` has completed the current sub-task.
`ERROR`	The `AppAgent` encountered an error during execution.
`FAIL`	The `AppAgent` believes the current sub-task is unachievable.
`CONFIRM`	The `AppAgent` is confirming the user's input or action.
`SCREENSHOT`	The `AppAgent` believes the current screenshot is not clear in annotating the control and requests a new screenshot.

The state machine diagram for the AppAgent is shown below:

The AppAgent progresses through these states to execute the necessary actions within the application and fulfill the sub-task assigned by the HostAgent.
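
The states in the table can be modeled as an enumeration. The sketch below is illustrative only: `AppAgentStatus`, `TERMINAL`, and `should_keep_running` are hypothetical names, and treating `FINISH`, `ERROR`, and `FAIL` as the statuses that end work on the sub-task is an assumption. The actual transitions are defined in ufo/agents/states/app_agent_states.py.

```python
from enum import Enum


class AppAgentStatus(Enum):
    """Status values taken from the state table above."""

    CONTINUE = "CONTINUE"
    FINISH = "FINISH"
    ERROR = "ERROR"
    FAIL = "FAIL"
    CONFIRM = "CONFIRM"
    SCREENSHOT = "SCREENSHOT"


# Assumption: these statuses end the AppAgent's work on the current sub-task.
TERMINAL = {AppAgentStatus.FINISH, AppAgentStatus.ERROR, AppAgentStatus.FAIL}


def should_keep_running(status_text: str) -> bool:
    """Map the 'Status' string of a response to a continue/stop decision."""
    status = AppAgentStatus(status_text.upper())  # raises ValueError if unknown
    return status not in TERMINAL


print(should_keep_running("CONTINUE"))
print(should_keep_running("FINISH"))
```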

Knowledge Enhancement

The AppAgent is enhanced by Retrieval Augmented Generation (RAG) from heterogeneous sources, including external knowledge bases and demonstration libraries. The AppAgent leverages this knowledge to enhance its comprehension of the application and learn from demonstrations to improve its performance.

Learning from Help Documents

Users can provide help documents to the AppAgent to enhance its comprehension of the application and improve its performance. The use of help documents is configured in the config.yaml file.

Tip

You can find details of the configuration in the documentation.

Tip

You may also refer to the related documentation for how to provide help documents to the AppAgent.

The AppAgent calls build_offline_docs_retriever to build a help document retriever, and uses retrived_documents_prompt_helper to construct the prompt for the AppAgent.

Learning from Bing Search

Since help documents may not cover all the information or the information may be outdated, the AppAgent can also leverage Bing search to retrieve the latest information. You can activate Bing search and configure the search engine in the config.yaml file.

Tip

You can find details of the configuration in the documentation.

Tip

You may also refer to the related documentation for the implementation of Bing search in the AppAgent.

The AppAgent calls build_online_search_retriever to build a Bing search retriever, and uses retrived_documents_prompt_helper to construct the prompt for the AppAgent.

Learning from Self-Demonstrations

You may save successful action trajectories in the AppAgent to learn from self-demonstrations and improve its performance. After the completion of a session, the AppAgent will ask the user whether to save the action trajectories for future reference. You may configure the use of self-demonstrations in the config.yaml file.

Tip

You can find details of the configuration in the documentation.

Tip

You may also refer to the related documentation for the implementation of self-demonstrations in the AppAgent.

The AppAgent calls build_experience_retriever to build a self-demonstration retriever, and uses rag_experience_retrieve to retrieve the demonstrations for the AppAgent.

Learning from Human Demonstrations

In addition to self-demonstrations, you can also provide human demonstrations to the AppAgent to enhance its performance by using the Step Recorder tool built in the Windows OS. The AppAgent will learn from the human demonstrations to improve its performance and achieve better personalization. The use of human demonstrations can be configured in the config.yaml file.

Tip

You can find details of the configuration in the documentation.

Tip

You may also refer to the related documentation for the implementation of human demonstrations in the AppAgent.

The AppAgent calls build_human_demonstration_retriever to build a human demonstration retriever, and uses rag_demonstration_retrieve to retrieve the demonstrations for the AppAgent.

Skill Set for Automation

The AppAgent is equipped with a versatile skill set to support comprehensive automation within the application by calling the create_puppeteer_interface method. The skills include:

Skill	Description
UI Automation	Mimicking user interactions with the application UI controls using the `UI Automation` and `Win32` API.
Native API	Accessing the application's native API to execute specific functions and actions.
In-App Agent	Leveraging the in-app agent to interact with the application's internal functions and features.

By utilizing these skills, the AppAgent can efficiently interact with the application and fulfill the user's request. You can find more details in the Automator documentation and the code in the ufo/automator module.

Reference

Bases: BasicAgent

The AppAgent class that manages the interaction with the application.

Initialize the AppAgent.

Parameters:

name (str) –

The name of the agent.
process_name (str) –

The process name of the app.
app_root_name (str) –

The root name of the app.
is_visual (bool) –

The flag indicating whether the agent is visual or not.
main_prompt (str) –

The main prompt file path.
example_prompt (str) –

The example prompt file path.
api_prompt (str) –

The API prompt file path.
skip_prompter (bool, default: False ) –

The flag indicating whether to skip the prompter initialization.

Source code in agents/agent/app_agent.py

def __init__(
    self,
    name: str,
    process_name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    skip_prompter: bool = False,
) -> None:
    """
    Initialize the AppAgent.
    :param name: The name of the agent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :param skip_prompter: The flag indicating whether to skip the prompter initialization.
    """
    super().__init__(name=name)
    if not skip_prompter:
        self.prompter = self.get_prompter(
            is_visual, main_prompt, example_prompt, api_prompt, app_root_name
        )
    self._process_name = process_name
    self._app_root_name = app_root_name
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None

    self.Puppeteer = self.create_puppeteer_interface()

    control_detection_backend = configs.get("CONTROL_BACKEND", ["uia"])

    if "omniparser" in control_detection_backend:
        omniparser_endpoint = configs.get("OMNIPARSER", {}).get("ENDPOINT", "")
        omniparser_service = OmniParser(endpoint=omniparser_endpoint)
        self.grounding_service: Optional[BasicGrounding] = OmniparserGrounding(
            service=omniparser_service
        )
    else:
        self.grounding_service: Optional[BasicGrounding] = None

    self.set_state(ContinueAppAgentState())
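
The backend check in the constructor can be isolated as a small predicate. This is a sketch rather than part of the UFO code base: `uses_omniparser` is a hypothetical helper, and the `configs` dictionaries below are made-up examples. The key `CONTROL_BACKEND` and its default `["uia"]` are taken from the listing above.

```python
from typing import Any, Dict


def uses_omniparser(configs: Dict[str, Any]) -> bool:
    """Mirror the constructor's check for the OmniParser grounding backend."""
    backends = configs.get("CONTROL_BACKEND", ["uia"])
    return "omniparser" in backends


print(uses_omniparser({}))
print(uses_omniparser({"CONTROL_BACKEND": ["uia", "omniparser"]}))
```

Note that `in` performs a membership test on a list but a substring test on a string, so keeping `CONTROL_BACKEND` a list avoids accidental matches.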

`status_manager` `property`

Get the status manager.

`build_experience_retriever(db_path)`

Build the experience retriever.

Parameters:	`db_path` (`str`) – The path to the experience database.

Returns:	`None` – The experience retriever.

Source code in agents/agent/app_agent.py

def build_experience_retriever(self, db_path: str) -> None:
    """
    Build the experience retriever.
    :param db_path: The path to the experience database.
    :return: The experience retriever.
    """
    self.experience_retriever = self.retriever_factory.create_retriever(
        "experience", db_path
    )

`build_human_demonstration_retriever(db_path)`

Build the human demonstration retriever.

Parameters:	`db_path` (`str`) – The path to the human demonstration database.

Returns:	`None` – The human demonstration retriever.

Source code in agents/agent/app_agent.py

def build_human_demonstration_retriever(self, db_path: str) -> None:
    """
    Build the human demonstration retriever.
    :param db_path: The path to the human demonstration database.
    :return: The human demonstration retriever.
    """
    self.human_demonstration_retriever = self.retriever_factory.create_retriever(
        "demonstration", db_path
    )

`build_offline_docs_retriever()`

Build the offline docs retriever.

Source code in agents/agent/app_agent.py

def build_offline_docs_retriever(self) -> None:
    """
    Build the offline docs retriever.
    """
    self.offline_doc_retriever = self.retriever_factory.create_retriever(
        "offline", self._app_root_name
    )

`build_online_search_retriever(request, top_k)`

Build the online search retriever.

Parameters:	`request` (`str`) – The request for online Bing search. `top_k` (`int`) – The number of documents to retrieve.

Source code in agents/agent/app_agent.py

def build_online_search_retriever(self, request: str, top_k: int) -> None:
    """
    Build the online search retriever.
    :param request: The request for online Bing search.
    :param top_k: The number of documents to retrieve.
    """
    self.online_doc_retriever = self.retriever_factory.create_retriever(
        "online", request, top_k
    )

`context_provision(request='')`

Provision the context for the app agent.

Parameters:	`request` (`str`, default: `''` ) – The request sent to the Bing search retriever.

Source code in agents/agent/app_agent.py

def context_provision(self, request: str = "") -> None:
    """
    Provision the context for the app agent.
    :param request: The request sent to the Bing search retriever.
    """

    # Load the offline document indexer for the app agent if available.
    if configs["RAG_OFFLINE_DOCS"]:
        utils.print_with_color(
            "Loading offline help document indexer for {app}...".format(
                app=self._process_name
            ),
            "magenta",
        )
        self.build_offline_docs_retriever()

    # Load the online search indexer for the app agent if available.

    if configs["RAG_ONLINE_SEARCH"] and request:
        utils.print_with_color("Creating a Bing search indexer...", "magenta")
        self.build_online_search_retriever(
            request, configs["RAG_ONLINE_SEARCH_TOPK"]
        )

    # Load the experience indexer for the app agent if available.
    if configs["RAG_EXPERIENCE"]:
        utils.print_with_color("Creating an experience indexer...", "magenta")
        experience_path = configs["EXPERIENCE_SAVED_PATH"]
        db_path = os.path.join(experience_path, "experience_db")
        self.build_experience_retriever(db_path)

    # Load the demonstration indexer for the app agent if available.
    if configs["RAG_DEMONSTRATION"]:
        utils.print_with_color("Creating an demonstration indexer...", "magenta")
        demonstration_path = configs["DEMONSTRATION_SAVED_PATH"]
        db_path = os.path.join(demonstration_path, "demonstration_db")
        self.build_human_demonstration_retriever(db_path)
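
The following sketch mirrors the branching in `context_provision` without building any retriever, which may help when checking what a given configuration would enable. The function `planned_retrievers` and the sample configuration values are hypothetical. The configuration keys are the ones read in the listing above.

```python
from typing import Any, Dict, List


def planned_retrievers(configs: Dict[str, Any], request: str = "") -> List[str]:
    """List which retrievers context_provision would build for these settings."""
    planned = []
    if configs.get("RAG_OFFLINE_DOCS"):
        planned.append("offline")
    # The online retriever additionally requires a non-empty request.
    if configs.get("RAG_ONLINE_SEARCH") and request:
        planned.append("online")
    if configs.get("RAG_EXPERIENCE"):
        planned.append("experience")
    if configs.get("RAG_DEMONSTRATION"):
        planned.append("demonstration")
    return planned


# Made-up settings for illustration.
example_configs = {
    "RAG_OFFLINE_DOCS": True,
    "RAG_ONLINE_SEARCH": True,
    "RAG_EXPERIENCE": False,
    "RAG_DEMONSTRATION": True,
}

print(planned_retrievers(example_configs))
print(planned_retrievers(example_configs, request="insert a table"))
```

With an empty request the online search retriever is skipped even when `RAG_ONLINE_SEARCH` is enabled, which matches the `and request` condition in the original method.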

`create_puppeteer_interface()`

Create the Puppeteer interface to automate the app.

Returns:	`AppPuppeteer` – The Puppeteer interface.

Source code in agents/agent/app_agent.py

def create_puppeteer_interface(self) -> puppeteer.AppPuppeteer:
    """
    Create the Puppeteer interface to automate the app.
    :return: The Puppeteer interface.
    """
    return puppeteer.AppPuppeteer(self._process_name, self._app_root_name)

`demonstration_prompt_helper(request)`

Get the examples and tips for the AppAgent using the demonstration retriever.

Parameters:	`request` – The request for the AppAgent.

Returns:	`Tuple[List[Dict[str, Any]]]` – The examples and tips for the AppAgent.

Source code in agents/agent/app_agent.py

def demonstration_prompt_helper(self, request) -> Tuple[List[Dict[str, Any]]]:
    """
    Get the examples and tips for the AppAgent using the demonstration retriever.
    :param request: The request for the AppAgent.
    :return: The examples and tips for the AppAgent.
    """

    # Get the examples and tips for the AppAgent using the experience and demonstration retrievers.
    if configs["RAG_EXPERIENCE"]:
        experience_results = self.rag_experience_retrieve(
            request, configs["RAG_EXPERIENCE_RETRIEVED_TOPK"]
        )
    else:
        experience_results = []

    if configs["RAG_DEMONSTRATION"]:
        demonstration_results = self.rag_demonstration_retrieve(
            request, configs["RAG_DEMONSTRATION_RETRIEVED_TOPK"]
        )
    else:
        demonstration_results = []

    return experience_results, demonstration_results

`external_knowledge_prompt_helper(request, offline_top_k, online_top_k)`

Retrieve the external knowledge and construct the prompt.

Parameters:	`request` (`str`) – The request. `offline_top_k` (`int`) – The number of offline documents to retrieve. `online_top_k` (`int`) – The number of online documents to retrieve.

Returns:	`Tuple[str, str]` – The prompt message for the external_knowledge.

Source code in agents/agent/app_agent.py

def external_knowledge_prompt_helper(
    self, request: str, offline_top_k: int, online_top_k: int
) -> Tuple[str, str]:
    """
    Retrieve the external knowledge and construct the prompt.
    :param request: The request.
    :param offline_top_k: The number of offline documents to retrieve.
    :param online_top_k: The number of online documents to retrieve.
    :return: The prompt message for the external_knowledge.
    """

    # Retrieve offline documents and construct the prompt
    if self.offline_doc_retriever:
        offline_docs = self.offline_doc_retriever.retrieve(
            "How to {query} for {app}".format(
                query=request, app=self._process_name
            ),
            offline_top_k,
            filter=None,
        )
        offline_docs_prompt = self.prompter.retrived_documents_prompt_helper(
            "Help Documents",
            "Document",
            [doc.metadata["text"] for doc in offline_docs],
        )
    else:
        offline_docs_prompt = ""

    # Retrieve online documents and construct the prompt
    if self.online_doc_retriever:
        online_search_docs = self.online_doc_retriever.retrieve(
            request, online_top_k, filter=None
        )
        online_docs_prompt = self.prompter.retrived_documents_prompt_helper(
            "Online Search Results",
            "Search Result",
            [doc.page_content for doc in online_search_docs],
        )
    else:
        online_docs_prompt = ""

    return offline_docs_prompt, online_docs_prompt
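
To illustrate the offline branch, the sketch below swaps in a stub retriever and a stand-in formatter. `StubDoc`, `StubRetriever`, `format_documents`, and `offline_prompt` are all hypothetical, and the output layout of `format_documents` is an assumption; the real `retrived_documents_prompt_helper` may format documents differently. Only the query template `"How to {query} for {app}"` and the `metadata["text"]` access come from the listing above.

```python
from typing import Any, Dict, List, Optional


class StubDoc:
    """Hypothetical document object exposing a metadata dictionary."""

    def __init__(self, text: str) -> None:
        self.metadata: Dict[str, Any] = {"text": text}


class StubRetriever:
    """Hypothetical retriever that records the query and returns canned docs."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts
        self.last_query: Optional[str] = None

    def retrieve(self, query: str, top_k: int, filter: Any = None) -> List[StubDoc]:
        self.last_query = query
        return [StubDoc(text) for text in self.texts[:top_k]]


def format_documents(header: str, separator: str, documents: List[str]) -> str:
    """Assumed layout: a header line followed by one numbered line per document."""
    lines = [f"{header}:"]
    for index, document in enumerate(documents, start=1):
        lines.append(f"{separator} {index}: {document}")
    return "\n".join(lines)


def offline_prompt(
    retriever: Optional[StubRetriever], request: str, process_name: str, top_k: int
) -> str:
    """Build the help-document prompt, or return "" when no retriever is set."""
    if not retriever:
        return ""
    query = "How to {query} for {app}".format(query=request, app=process_name)
    docs = retriever.retrieve(query, top_k, filter=None)
    return format_documents(
        "Help Documents", "Document", [doc.metadata["text"] for doc in docs]
    )


retriever = StubRetriever(["Press Ctrl+S to save.", "Use File > Export for PDF."])
prompt = offline_prompt(retriever, "save the file", "WINWORD.EXE", top_k=1)
print(prompt)
```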

`get_prompter(is_visual, main_prompt, example_prompt, api_prompt, app_root_name)`

Get the prompter for the agent.

Parameters:	`is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `main_prompt` (`str`) – The main prompt file path. `example_prompt` (`str`) – The example prompt file path. `api_prompt` (`str`) – The API prompt file path. `app_root_name` (`str`) – The root name of the app.

Returns:	`AppAgentPrompter` – The prompter instance.

Source code in agents/agent/app_agent.py

def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
    app_root_name: str,
) -> AppAgentPrompter:
    """
    Get the prompter for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param api_prompt: The API prompt file path.
    :param app_root_name: The root name of the app.
    :return: The prompter instance.
    """
    return AppAgentPrompter(
        is_visual, main_prompt, example_prompt, api_prompt, app_root_name
    )

`message_constructor(dynamic_examples, dynamic_knowledge, image_list, control_info, prev_subtask, plan, request, subtask, current_application, host_message, blackboard_prompt, last_success_actions, include_last_screenshot)`

Construct the prompt message for the AppAgent.

Parameters:

dynamic_examples (str) –

The dynamic examples retrieved from the self-demonstration and human demonstration.
dynamic_knowledge (str) –

The dynamic knowledge retrieved from the external knowledge base.
image_list (List) –

The list of screenshot images.
control_info (str) –

The control information.
prev_subtask (List[Dict[str, str]]) –

The previous sub-tasks and their completion status.
plan (List[str]) –

The plan list.
request (str) –

The overall user request.
subtask (str) –

The subtask for the current AppAgent to process.
current_application (str) –

The current application name.
host_message (List[str]) –

The message from the HostAgent.
blackboard_prompt (List[Dict[str, str]]) –

The prompt message from the blackboard.
last_success_actions (List[Dict[str, Any]]) –

The list of successful actions in the last step.
include_last_screenshot (bool) –

The flag indicating whether to include the last screenshot.

Returns:	`List[Dict[str, Union[str, List[Dict[str, str]]]]]` – The prompt message.

Source code in agents/agent/app_agent.py

def message_constructor(
    self,
    dynamic_examples: str,
    dynamic_knowledge: str,
    image_list: List,
    control_info: str,
    prev_subtask: List[Dict[str, str]],
    plan: List[str],
    request: str,
    subtask: str,
    current_application: str,
    host_message: List[str],
    blackboard_prompt: List[Dict[str, str]],
    last_success_actions: List[Dict[str, Any]],
    include_last_screenshot: bool,
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the prompt message for the AppAgent.
    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
    :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base.
    :param image_list: The list of screenshot images.
    :param control_info: The control information.
    :param prev_subtask: The previous sub-tasks and their completion status.
    :param plan: The plan list.
    :param request: The overall user request.
    :param subtask: The subtask for the current AppAgent to process.
    :param current_application: The current application name.
    :param host_message: The message from the HostAgent.
    :param blackboard_prompt: The prompt message from the blackboard.
    :param last_success_actions: The list of successful actions in the last step.
    :param include_last_screenshot: The flag indicating whether to include the last screenshot.
    :return: The prompt message.
    """
    appagent_prompt_system_message = self.prompter.system_prompt_construction(
        dynamic_examples
    )

    appagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=control_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
        subtask=subtask,
        current_application=current_application,
        host_message=host_message,
        retrieved_docs=dynamic_knowledge,
        last_success_actions=last_success_actions,
        include_last_screenshot=include_last_screenshot,
    )

    if blackboard_prompt:
        appagent_prompt_user_message = (
            blackboard_prompt + appagent_prompt_user_message
        )

    appagent_prompt_message = self.prompter.prompt_construction(
        appagent_prompt_system_message, appagent_prompt_user_message
    )

    return appagent_prompt_message
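
The blackboard step in `message_constructor` is a plain list concatenation: blackboard entries, when present, are placed in front of the user content. Here is a minimal sketch, assuming each content item is a dictionary with `type` and `text` keys. That item shape, the sample texts, and the helper `merge_user_content` are hypothetical.

```python
from typing import Dict, List

Content = List[Dict[str, str]]


def merge_user_content(blackboard_prompt: Content, user_message: Content) -> Content:
    """Prepend blackboard entries to the user content when any exist."""
    if blackboard_prompt:
        return blackboard_prompt + user_message
    return user_message


# Made-up content items for illustration.
blackboard = [{"type": "text", "text": "[Blackboard] Total from sheet: 42"}]
user = [{"type": "text", "text": "Sub-task: paste the total into the report"}]

merged = merge_user_content(blackboard, user)
print([item["text"] for item in merged])
```

Placing the blackboard entries first means information shared by other agents is read before the current sub-task description.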

`print_response(response_dict, print_action=True)`

Print the response.

Parameters:	`response_dict` (`Dict[str, Any]`) – The response dictionary to print. `print_action` (`bool`, default: `True` ) – The flag indicating whether to print the action.

Source code in agents/agent/app_agent.py

def print_response(
    self, response_dict: Dict[str, Any], print_action: bool = True
) -> None:
    """
    Print the response.
    :param response_dict: The response dictionary to print.
    :param print_action: The flag indicating whether to print the action.
    """

    control_text = response_dict.get("ControlText")
    control_label = response_dict.get("ControlLabel")
    if not control_text and not control_label:
        control_text = "[No control selected.]"
        control_label = "[No control label selected.]"
    observation = response_dict.get("Observation")
    thought = response_dict.get("Thought")
    plan = response_dict.get("Plan")
    status = response_dict.get("Status")
    comment = response_dict.get("Comment")
    function_call = response_dict.get("Function")
    args = utils.revise_line_breaks(response_dict.get("Args"))

    # Generate the function call string
    action = self.Puppeteer.get_command_string(function_call, args)

    utils.print_with_color(
        "Observations👀: {observation}".format(observation=observation), "cyan"
    )
    utils.print_with_color("Thoughts💡: {thought}".format(thought=thought), "green")
    if print_action:
        utils.print_with_color(
            "Selected item🕹️: {control_text}, Label: {label}".format(
                control_text=control_text, label=control_label
            ),
            "yellow",
        )
        utils.print_with_color(
            "Action applied⚒️: {action}".format(action=action), "blue"
        )
        utils.print_with_color("Status📊: {status}".format(status=status), "blue")
    utils.print_with_color(
        "Next Plan📚: {plan}".format(plan="\n".join(plan)), "cyan"
    )
    utils.print_with_color("Comment💬: {comment}".format(comment=comment), "green")

    screenshot_saving = response_dict.get("SaveScreenshot", {})

    if screenshot_saving.get("save", False):
        utils.print_with_color(
            "Notice: The current screenshot📸 is saved to the blackboard.",
            "yellow",
        )
        utils.print_with_color(
            "Saving reason: {reason}".format(
                reason=screenshot_saving.get("reason")
            ),
            "yellow",
        )

`process(context)`

Process the agent.

Parameters:	`context` (`Context`) – The context.

Source code in agents/agent/app_agent.py

def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    if configs.get("ACTION_SEQUENCE", False):
        self.processor = AppAgentActionSequenceProcessor(
            agent=self, context=context
        )
    else:
        self.processor = AppAgentProcessor(
            agent=self, context=context, ground_service=self.grounding_service
        )
    self.processor.process()
    self.status = self.processor.status

`process_comfirmation()`

Process the user confirmation.

Returns:	`bool` – The decision.

Source code in agents/agent/app_agent.py

def process_comfirmation(self) -> bool:
    """
    Process the user confirmation.
    :return: The decision.
    """
    action = self.processor.actions
    control_text = self.processor.control_text

    decision = interactor.sensitive_step_asker(action, control_text)

    if not decision:
        utils.print_with_color("The user has canceled the action.", "red")

    return decision

`rag_demonstration_retrieve(request, demonstration_top_k)`

Retrieving demonstration examples for the user request.

Parameters:	`request` (`str`) – The user request. `demonstration_top_k` (`int`) – The number of documents to retrieve.

Returns:	`List[Dict[str, Any]]` – The retrieved examples and tips.

Source code in agents/agent/app_agent.py

def rag_demonstration_retrieve(
    self, request: str, demonstration_top_k: int
) -> List[Dict[str, Any]]:
    """
    Retrieving demonstration examples for the user request.
    :param request: The user request.
    :param demonstration_top_k: The number of documents to retrieve.
    :return: The retrieved examples and tips, one dictionary per demonstration.
    """

    retrieved_docs = []

    # Retrieve demonstration examples.
    demonstration_docs = self.human_demonstration_retriever.retrieve(
        request, demonstration_top_k
    )

    # Collect one entry per demonstration; the list stays empty if none is found.
    if demonstration_docs:
        for doc in demonstration_docs:
            example_request = doc.metadata.get("request", "")
            response = doc.metadata.get("example", {})
            subtask = doc.metadata.get("Sub-task", "")
            tips = doc.metadata.get("Tips", "")
            retrieved_docs.append(
                {
                    "Request": example_request,
                    "Response": response,
                    "Sub-task": subtask,
                    "Tips": tips,
                }
            )

    # Return the collected entries, matching rag_experience_retrieve.
    return retrieved_docs

`rag_experience_retrieve(request, experience_top_k)`

Retrieving experience examples for the user request.

Parameters:	`request` (`str`) – The user request. `experience_top_k` (`int`) – The number of documents to retrieve.

Returns:	`List[Dict[str, Any]]` – The retrieved examples and tips dictionary.

Source code in agents/agent/app_agent.py

def rag_experience_retrieve(
    self, request: str, experience_top_k: int
) -> List[Dict[str, Any]]:
    """
    Retrieving experience examples for the user request.
    :param request: The user request.
    :param experience_top_k: The number of documents to retrieve.
    :return: The retrieved examples and tips dictionary.
    """

    retrieved_docs = []

    # Retrieve experience examples. Only retrieve the examples that are related to the current application.
    experience_docs = self.experience_retriever.retrieve(
        request,
        experience_top_k,
        filter=lambda x: self._app_root_name.lower()
        in [app.lower() for app in x["app_list"]],
    )

    if experience_docs:
        for doc in experience_docs:
            example_request = doc.metadata.get("request", "")
            response = doc.metadata.get("example", {})
            tips = doc.metadata.get("Tips", "")
            subtask = doc.metadata.get("Sub-task", "")
            retrieved_docs.append(
                {
                    "Request": example_request,
                    "Response": response,
                    "Sub-task": subtask,
                    "Tips": tips,
                }
            )

    return retrieved_docs
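
The `filter` argument above keeps only experiences recorded for the current application, compared case-insensitively. The sketch below isolates that predicate so it can be tried on plain dictionaries. `make_app_filter` and the sample records are hypothetical. The `app_list` key and the lowercase comparison come from the listing above.

```python
from typing import Any, Callable, Dict, List


def make_app_filter(app_root_name: str) -> Callable[[Dict[str, Any]], bool]:
    """Build a predicate that accepts metadata whose app_list contains the app."""
    target = app_root_name.lower()

    def _filter(metadata: Dict[str, Any]) -> bool:
        return target in [app.lower() for app in metadata["app_list"]]

    return _filter


# Made-up experience records for illustration.
records: List[Dict[str, Any]] = [
    {"request": "Bold the title", "app_list": ["WINWORD.EXE"]},
    {"request": "Sum a column", "app_list": ["EXCEL.EXE"]},
    {"request": "Copy a chart", "app_list": ["EXCEL.EXE", "WINWORD.EXE"]},
]

word_filter = make_app_filter("winword.exe")
matches = [record["request"] for record in records if word_filter(record)]
print(matches)
```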
