AppAgent: Application Execution Agent

AppAgent is the core execution runtime in UFO, responsible for carrying out individual subtasks within a specific Windows application. Each AppAgent functions as an isolated, application-specialized worker process launched and orchestrated by the central HostAgent.


What is AppAgent?

AppAgent Architecture

AppAgent Architecture: Application-specialized worker process for subtask execution

AppAgent operates as a child agent under the HostAgent's orchestration:

  • Isolated Runtime: Each AppAgent is dedicated to a single Windows application
  • Subtask Executor: Executes specific subtasks delegated by HostAgent
  • Application Expert: Tailored with deep knowledge of the target app's API surface, control semantics, and domain logic
  • Hybrid Execution: Leverages both GUI automation and API-based actions through MCP commands

Unlike monolithic Computer-Using Agents (CUAs) that treat all GUI contexts uniformly, each AppAgent is tailored to a single application and operates with specialized knowledge of its interface and capabilities.


Core Responsibilities

graph TB
    subgraph "AppAgent Core Responsibilities"
        SR[Sense:<br/>Capture Application State]
        RE[Reason:<br/>Analyze Next Action]
        EX[Execute:<br/>GUI or API Action]
        RP[Report:<br/>Write Results to Blackboard]
    end
    SR --> RE
    RE --> EX
    EX --> RP
    RP --> SR
    style SR fill:#e3f2fd
    style RE fill:#fff3e0
    style EX fill:#f1f8e9
    style RP fill:#fce4ec
| Responsibility | Description | Example |
| --- | --- | --- |
| State Sensing | Capture application UI, detect controls, understand current state | Screenshot Word window → Detect 50 controls → Annotate UI elements |
| Reasoning | Analyze state and determine next action using LLM | "Table visible with Export button [12] → Click to export data" |
| Action Execution | Execute GUI clicks or API calls via MCP commands | click_input(control_id=12) or execute_word_command("export_table") |
| Result Reporting | Write execution results to shared Blackboard | Write extracted data to subtask_result_1 for HostAgent |

ReAct-Style Control Loop

Upon receiving a subtask and execution context from the HostAgent, the AppAgent enters a ReAct-style control loop in which it iteratively:

  1. Observes the current application state (screenshot + control detection)
  2. Thinks about the next step (LLM reasoning)
  3. Acts by executing either a GUI or API-based action (MCP commands)
sequenceDiagram
    participant HostAgent
    participant AppAgent
    participant Application
    participant Blackboard
    HostAgent->>AppAgent: Delegate subtask<br/>"Extract table from Word"
    loop ReAct Loop
        AppAgent->>Application: Observe (screenshot + controls)
        Application-->>AppAgent: UI state
        AppAgent->>AppAgent: Think (LLM reasoning)
        AppAgent->>Application: Act (click/API call)
        Application-->>AppAgent: Action result
    end
    AppAgent->>Blackboard: Write result
    AppAgent->>HostAgent: Return control

The MCP command system enables reliable control over dynamic and complex UIs: the agent favors structured API commands whenever they are available and falls back to GUI-based interaction commands when necessary.
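
To make the loop concrete, here is a minimal, self-contained sketch. The observe, think, and act helpers are hypothetical stand-ins for the AppAgent's data-collection, LLM, and MCP-dispatch phases; the real logic lives in the AppAgent processor.

import asyncio
from typing import Any, Dict

async def observe() -> Dict[str, Any]:
    # Stand-in for screenshot capture + control detection.
    return {"controls": [{"id": "12", "name": "Export"}]}

async def think(subtask: str, state: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for LLM reasoning over the observation.
    return {"Status": "FINISH", "Function": "click_input",
            "Args": {"control_id": "12", "button": "left"}}

async def act(decision: Dict[str, Any]) -> None:
    # Stand-in for dispatching an MCP command (GUI or API).
    print(f"dispatching {decision['Function']}({decision['Args']})")

async def react_loop(subtask: str, max_steps: int = 20) -> Dict[str, Any]:
    """Observe -> think -> act until the subtask finishes or the budget runs out."""
    for _ in range(max_steps):
        state = await observe()
        decision = await think(subtask, state)
        if decision["Status"] != "CONTINUE":
            return decision
        await act(decision)
    return {"Status": "FAIL"}

asyncio.run(react_loop("Extract table from Word"))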


Execution Architecture

Finite State Machine

AppAgent uses a finite state machine with 7 states to control its execution flow:

  • CONTINUE: Continue processing the current subtask
  • FINISH: Successfully complete the subtask
  • ERROR: Encounter an unrecoverable error
  • FAIL: Fail to complete the subtask
  • PENDING: Wait for user input or clarification
  • CONFIRM: Request user confirmation for sensitive actions
  • SCREENSHOT: Capture and re-annotate the application screenshot
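
For intuition, such a state machine can be sketched with an Enum and an explicit transition table. The transition sets below are illustrative only; the authoritative definitions live in the State Machine Documentation.

from enum import Enum, auto

class AgentStatus(Enum):
    CONTINUE = auto()
    FINISH = auto()
    ERROR = auto()
    FAIL = auto()
    PENDING = auto()
    CONFIRM = auto()
    SCREENSHOT = auto()

# Illustrative transition table: which states may follow each state.
TRANSITIONS = {
    AgentStatus.CONTINUE: set(AgentStatus),                        # any state may follow
    AgentStatus.SCREENSHOT: {AgentStatus.CONTINUE},                # re-annotate, then resume
    AgentStatus.PENDING: {AgentStatus.CONTINUE, AgentStatus.FAIL},
    AgentStatus.CONFIRM: {AgentStatus.CONTINUE, AgentStatus.FAIL},
}

def transition(current: AgentStatus, nxt: AgentStatus) -> AgentStatus:
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt

state = transition(AgentStatus.CONTINUE, AgentStatus.SCREENSHOT)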

State Details: See State Machine Documentation for complete state definitions and transitions.

4-Phase Processing Pipeline

Each execution round follows a 4-phase pipeline:

graph LR
    DC[Phase 1:<br/>DATA_COLLECTION<br/>Screenshot + Controls] --> LLM[Phase 2:<br/>LLM_INTERACTION<br/>Reasoning]
    LLM --> AE[Phase 3:<br/>ACTION_EXECUTION<br/>GUI/API Action]
    AE --> MU[Phase 4:<br/>MEMORY_UPDATE<br/>Record Action]
    style DC fill:#e1f5ff
    style LLM fill:#fff4e6
    style AE fill:#e8f5e9
    style MU fill:#fce4ec
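
Schematically, one round can be read as four method calls in sequence. The class below is a sketch with placeholder bodies, not the actual AppAgentProcessor implementation.

from typing import Any, Dict

class RoundSketch:
    """One AppAgent execution round, phase by phase (placeholder bodies)."""

    def data_collection(self) -> Dict[str, Any]:
        # Phase 1: screenshot the window, detect and annotate controls.
        return {"screenshot": b"", "controls": []}

    def llm_interaction(self, observation: Dict[str, Any]) -> Dict[str, Any]:
        # Phase 2: prompt the LLM with the observation, parse its JSON reply.
        return {"Function": "click_input", "Args": {}, "Status": "CONTINUE"}

    def action_execution(self, decision: Dict[str, Any]) -> str:
        # Phase 3: dispatch the chosen MCP command (GUI or API).
        return "ok"

    def memory_update(self, decision: Dict[str, Any], result: str) -> None:
        # Phase 4: record the step so later rounds (and the Blackboard) can use it.
        pass

    def run(self) -> None:
        obs = self.data_collection()
        decision = self.llm_interaction(obs)
        result = self.action_execution(decision)
        self.memory_update(decision, result)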

Strategy Details: See Processing Strategy Documentation for complete pipeline implementation.


Hybrid GUI–API Execution

AppAgent executes actions through the MCP (Model Context Protocol) command system, which provides a unified interface for both GUI automation and native API calls:

# GUI-based command (fallback)
command = Command(
    tool_name="click_input",
    parameters={"control_id": "12", "button": "left"}
)
await command_dispatcher.execute_commands([command])

# API-based command (preferred when available)
command = Command(
    tool_name="word_export_table",
    parameters={"format": "csv", "path": "output.csv"}
)
await command_dispatcher.execute_commands([command])
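
The "preferred when available" policy can be expressed as a small fallback wrapper around the dispatcher from the snippet above. This is a sketch only: api_registry is a hypothetical lookup from intent to API command, and the real dispatcher's error handling is richer.

from typing import Dict, Optional

async def execute_with_fallback(
    intent: str,
    gui_command: Command,
    api_registry: Dict[str, Command],
):
    # Prefer a structured API command when the application exposes one.
    api_command: Optional[Command] = api_registry.get(intent)
    if api_command is not None:
        try:
            return await command_dispatcher.execute_commands([api_command])
        except Exception:
            pass  # API path failed; fall back to GUI automation.
    # Fallback: drive the UI directly.
    return await command_dispatcher.execute_commands([gui_command])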

Implementation: See Hybrid Actions for details on the MCP command system.


Knowledge Enhancement

AppAgent is enhanced with Retrieval Augmented Generation (RAG) from heterogeneous sources:

| Knowledge Source | Purpose | Configuration |
| --- | --- | --- |
| Help Documents | Application-specific documentation | Learning from Help Documents |
| Bing Search | Latest information and updates | Learning from Bing Search |
| Self-Demonstrations | Successful action trajectories | Experience Learning |
| Human Demonstrations | Expert-provided workflows | Learning from Demonstrations |
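
Putting these sources to work uses the retriever-builder and prompt-helper methods documented in the API reference below. A rough usage flow, assuming agent is an initialized AppAgent and with illustrative request text and top-k values:

# Build retrievers (normally done by context_provision based on config).
agent.build_offline_docs_retriever()
agent.build_online_search_retriever("export a Word table to CSV", top_k=5)

# Retrieve knowledge and demonstrations for prompt construction.
offline_prompt, online_prompt = agent.external_knowledge_prompt_helper(
    request="export a Word table to CSV", offline_top_k=3, online_top_k=5
)
experience, demos = agent.demonstration_prompt_helper(
    request="export a Word table to CSV"
)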

Knowledge Substrate Overview: See Knowledge Substrate for the complete RAG architecture.


Command System

AppAgent executes actions through the MCP (Model Context Protocol) command system:

Application-Level Commands:

  • capture_window_screenshot - Capture application window
  • get_control_info - Detect UI controls via UIA/OmniParser
  • click_input - Click on UI control
  • set_edit_text - Type text into input field
  • annotation - Annotate screenshot with control labels
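
A single sense-then-act round chains these commands through the dispatcher, reusing the Command pattern shown earlier. Parameters beyond those already shown are assumptions:

# Sense: capture the window and detect controls.
observations = await command_dispatcher.execute_commands([
    Command(tool_name="capture_window_screenshot", parameters={}),
    Command(tool_name="get_control_info", parameters={}),
])

# ...the LLM selects control "12" from the annotated observation...

# Act: click the selected control.
await command_dispatcher.execute_commands([
    Command(tool_name="click_input", parameters={"control_id": "12", "button": "left"}),
])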

Command Details: See Command System Documentation for complete command reference.


Control Detection Backends

AppAgent supports multiple control detection backends for comprehensive UI understanding:

UIA (UI Automation):
Native Windows UI Automation API for standard controls

  • ✅ Fast and accurate
  • ✅ Works with most Windows applications
  • ❌ May miss custom controls

OmniParser (Visual Detection):
Vision-based grounding model for visual elements

  • ✅ Detects icons, images, custom controls
  • ✅ Works with web content
  • ❌ Requires external service

Hybrid (UIA + OmniParser):
Combines both backends for maximum coverage

  • ✅ Native controls + visual elements
  • ✅ Comprehensive UI understanding
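
Conceptually, the hybrid backend merges the two detection results and drops visual detections that duplicate native controls. A simplified sketch, assuming each control carries a rect bounding box and using an illustrative IoU threshold:

from typing import Dict, List

def merge_controls(
    uia: List[Dict], omniparser: List[Dict], iou_threshold: float = 0.5
) -> List[Dict]:
    """Keep all UIA controls; add visual detections that overlap none of them."""

    def iou(a: Dict, b: Dict) -> float:
        ax1, ay1, ax2, ay2 = a["rect"]
        bx1, by1, bx2, by2 = b["rect"]
        inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
        inter_h = max(0, min(ay2, by2) - max(ay1, by1))
        inter = inter_w * inter_h
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union if union else 0.0

    merged = list(uia)
    for visual in omniparser:
        if all(iou(visual, native) < iou_threshold for native in uia):
            merged.append(visual)
    return merged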

Control Detection Details: See Control Detection Overview.


Input and Output

AppAgent Input

| Input | Description | Source |
| --- | --- | --- |
| User Request | Original user request in natural language | HostAgent |
| Sub-Task | Specific subtask to execute | HostAgent delegation |
| Application Context | Target app name, window info | HostAgent |
| Control Information | Detected UI controls with labels | Data collection phase |
| Screenshots | Clean, annotated, previous step images | Data collection phase |
| Blackboard | Shared memory for inter-agent communication | Global context |
| Retrieved Knowledge | Help docs, demos, search results | RAG system |

AppAgent Output

| Output | Description | Consumer |
| --- | --- | --- |
| Observation | Current UI state description | LLM context |
| Thought | Reasoning about next action | Execution log |
| ControlLabel | Selected control to interact with | Action executor |
| Function | MCP command to execute (click_input, set_edit_text, etc.) | Command dispatcher |
| Args | Command parameters | Command dispatcher |
| Status | Agent state (CONTINUE, FINISH, etc.) | State machine |
| Blackboard Update | Execution results | HostAgent |

Example Output:

{
    "Observation": "Word document with table, Export button at [12]",
    "Thought": "Click Export to extract table data",
    "ControlLabel": "12",
    "Function": "click_input",
    "Args": {"button": "left"},
    "Status": "CONTINUE"
}
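
A reply in this shape translates directly into a dispatchable command. A hedged sketch of the mapping, reusing the Command class from the earlier snippets:

import json

def response_to_command(raw: str) -> "Command":
    # Parse the LLM's JSON reply and attach the chosen control, if any.
    reply = json.loads(raw)
    args = dict(reply.get("Args", {}))
    if reply.get("ControlLabel"):
        args["control_id"] = reply["ControlLabel"]
    return Command(tool_name=reply["Function"], parameters=args)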



API Reference

Bases: BasicAgent

The AppAgent class that manages the interaction with the application.

Initialize the AppAgent.

Parameters:
  • name (str) –

    The name of the agent.

  • process_name (str) –

    The process name of the app.

  • app_root_name (str) –

    The root name of the app.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

  • skip_prompter (bool, default: False ) –

    The flag indicating whether to skip the prompter initialization.

  • mode (str, default: 'normal' ) –

    The mode of the agent.

Source code in agents/agent/app_agent.py
def __init__(
    self,
    name: str,
    process_name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    skip_prompter: bool = False,
    mode: str = "normal",
) -> None:
    """
    Initialize the AppAgent.
    :param name: The name of the agent.
    :param process_name: The process name of the app.
    :param app_root_name: The root name of the app.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :param skip_prompter: The flag indicating whether to skip the prompter initialization.
    :param mode: The mode of the agent.
    """
    super().__init__(name=name)
    if not skip_prompter:
        self.prompter = self.get_prompter(is_visual, main_prompt, example_prompt)
    self._process_name = process_name
    self._app_root_name = app_root_name
    self.offline_doc_retriever = None
    self.online_doc_retriever = None
    self.experience_retriever = None
    self.human_demonstration_retriever = None

    self._mode = mode

    self.set_state(self.default_state)

    self._context_provision_executed = False
    self.logger = logging.getLogger(__name__)

    self._processor: Optional[AppAgentProcessor] = None
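
For orientation, direct construction might look like the following; in practice the HostAgent creates AppAgents, and the prompt paths here are placeholders:

app_agent = AppAgent(
    name="word_agent",
    process_name="WINWORD.EXE",
    app_root_name="WINWORD.EXE",
    is_visual=True,
    main_prompt="path/to/app_agent_prompt.yaml",       # placeholder path
    example_prompt="path/to/app_agent_examples.yaml",  # placeholder path
)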

default_state property

Get the default state.

mode property

Get the mode of the session.

status_manager property

Get the status manager.

tools_info property writable

Get the tools information.

Returns:
  • List[MCPToolInfo]

    The list of MCPToolInfo objects.

build_experience_retriever(db_path)

Build the experience retriever.

Parameters:
  • db_path (str) –

    The path to the experience database.


Source code in agents/agent/app_agent.py
def build_experience_retriever(self, db_path: str) -> None:
    """
    Build the experience retriever.
    :param db_path: The path to the experience database.
    """
    self.experience_retriever = self.retriever_factory.create_retriever(
        "experience", db_path
    )

build_human_demonstration_retriever(db_path)

Build the human demonstration retriever.

Parameters:
  • db_path (str) –

    The path to the human demonstration database.


Source code in agents/agent/app_agent.py
def build_human_demonstration_retriever(self, db_path: str) -> None:
    """
    Build the human demonstration retriever.
    :param db_path: The path to the human demonstration database.
    """
    self.human_demonstration_retriever = self.retriever_factory.create_retriever(
        "demonstration", db_path
    )

build_offline_docs_retriever()

Build the offline docs retriever.

Source code in agents/agent/app_agent.py
def build_offline_docs_retriever(self) -> None:
    """
    Build the offline docs retriever.
    """
    self.offline_doc_retriever = self.retriever_factory.create_retriever(
        "offline", self._app_root_name
    )

build_online_search_retriever(request, top_k)

Build the online search retriever.

Parameters:
  • request (str) –

    The request for online Bing search.

  • top_k (int) –

    The number of documents to retrieve.

Source code in agents/agent/app_agent.py
def build_online_search_retriever(self, request: str, top_k: int) -> None:
    """
    Build the online search retriever.
    :param request: The request for online Bing search.
    :param top_k: The number of documents to retrieve.
    """
    self.online_doc_retriever = self.retriever_factory.create_retriever(
        "online", request, top_k
    )

context_provision(request='', context=None) async

Provision the context for the app agent.

Parameters:
  • request (str, default: '' ) –

    The request sent to the Bing search retriever.

  • context (Context, default: None ) –

    The context used to load the MCP tool information.

Source code in agents/agent/app_agent.py
async def context_provision(
    self, request: str = "", context: Context = None
) -> None:
    """
    Provision the context for the app agent.
    :param request: The request sent to the Bing search retriever.
    :param context: The context used to load the MCP tool information.
    """

    ufo_config = get_ufo_config()

    # Load the offline document indexer for the app agent if available.
    if ufo_config.rag.offline_docs:
        console.print(
            f"📚 Loading offline help document indexer for {self._process_name}...",
            style="magenta",
        )
        self.build_offline_docs_retriever()

    # Load the online search indexer for the app agent if available.

    if ufo_config.rag.online_search and request:
        console.print("🔍 Creating a Bing search indexer...", style="magenta")
        self.build_online_search_retriever(
            request, ufo_config.rag.online_search_topk
        )

    # Load the experience indexer for the app agent if available.
    if ufo_config.rag.experience:
        console.print("📖 Creating an experience indexer...", style="magenta")
        experience_path = ufo_config.rag.experience_saved_path
        db_path = os.path.join(experience_path, "experience_db")
        self.build_experience_retriever(db_path)

    # Load the demonstration indexer for the app agent if available.
    if ufo_config.rag.demonstration:
        console.print("🎬 Creating an demonstration indexer...", style="magenta")
        demonstration_path = ufo_config.rag.demonstration_saved_path
        db_path = os.path.join(demonstration_path, "demonstration_db")
        self.build_human_demonstration_retriever(db_path)

    await self._load_mcp_context(context)

demonstration_prompt_helper(request)

Get the examples and tips for the AppAgent using the experience and demonstration retrievers.

Parameters:
  • request (str) –

    The request for the AppAgent.

Returns:
  • Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]

    The examples and tips for the AppAgent.

Source code in agents/agent/app_agent.py
def demonstration_prompt_helper(
    self, request: str
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """
    Get the examples and tips for the AppAgent using the experience and demonstration retrievers.
    :param request: The request for the AppAgent.
    :return: The examples and tips for the AppAgent.
    """

    ufo_config = get_ufo_config()

    # Get the examples and tips for the AppAgent using the experience and demonstration retrievers.
    if ufo_config.rag.experience:
        experience_results = self.rag_experience_retrieve(
            request, ufo_config.rag.experience_retrieved_topk
        )
    else:
        experience_results = []

    if ufo_config.rag.demonstration:
        demonstration_results = self.rag_demonstration_retrieve(
            request, ufo_config.rag.demonstration_retrieved_topk
        )
    else:
        demonstration_results = []

    return experience_results, demonstration_results

external_knowledge_prompt_helper(request, offline_top_k, online_top_k)

Retrieve the external knowledge and construct the prompt.

Parameters:
  • request (str) –

    The request.

  • offline_top_k (int) –

    The number of offline documents to retrieve.

  • online_top_k (int) –

    The number of online documents to retrieve.

Returns:
  • Tuple[str, str]

    The prompt message for the external_knowledge.

Source code in agents/agent/app_agent.py
def external_knowledge_prompt_helper(
    self, request: str, offline_top_k: int, online_top_k: int
) -> Tuple[str, str]:
    """
    Retrieve the external knowledge and construct the prompt.
    :param request: The request.
    :param offline_top_k: The number of offline documents to retrieve.
    :param online_top_k: The number of online documents to retrieve.
    :return: The prompt message for the external_knowledge.
    """

    # Retrieve offline documents and construct the prompt
    if self.offline_doc_retriever:

        offline_docs = self.offline_doc_retriever.retrieve(
            request,
            offline_top_k,
            filter=None,
        )

        format_string = "[Similar Requests]: {question}\nStep: {answer}\n"

        offline_docs_prompt = self.prompter.retrieved_documents_prompt_helper(
            "[Help Documents]",
            "",
            [
                format_string.format(
                    question=doc.metadata.get("title", ""),
                    answer=doc.metadata.get("text", ""),
                )
                for doc in offline_docs
            ],
        )
    else:
        offline_docs_prompt = ""

    # Retrieve online documents and construct the prompt
    if self.online_doc_retriever:
        online_search_docs = self.online_doc_retriever.retrieve(
            request, online_top_k, filter=None
        )
        online_docs_prompt = self.prompter.retrieved_documents_prompt_helper(
            "Online Search Results",
            "Search Result",
            [doc.page_content for doc in online_search_docs],
        )
    else:
        online_docs_prompt = ""

    return offline_docs_prompt, online_docs_prompt

get_prompter(is_visual, main_prompt, example_prompt)

Get the prompt for the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt file path.

  • example_prompt (str) –

    The example prompt file path.

Returns:
  • AppAgentPrompter

    The prompter instance.

Source code in agents/agent/app_agent.py
def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
) -> AppAgentPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt file path.
    :param example_prompt: The example prompt file path.
    :return: The prompter instance.
    """
    return AppAgentPrompter(is_visual, main_prompt, example_prompt)

message_constructor(dynamic_examples, dynamic_knowledge, image_list, control_info, prev_subtask, plan, request, subtask, current_application, host_message, blackboard_prompt, last_success_actions, include_last_screenshot)

Construct the prompt message for the AppAgent.

Parameters:
  • dynamic_examples (str) –

    The dynamic examples retrieved from the self-demonstration and human demonstration.

  • dynamic_knowledge (str) –

    The dynamic knowledge retrieved from the external knowledge base.

  • image_list (List) –

    The list of screenshot images.

  • control_info (str) –

    The control information.

  • prev_subtask (List[Dict[str, str]]) –

    The list of previous subtasks.

  • plan (List[str]) –

    The plan list.

  • request (str) –

    The overall user request.

  • subtask (str) –

    The subtask for the current AppAgent to process.

  • current_application (str) –

    The current application name.

  • host_message (List[str]) –

    The message from the HostAgent.

  • blackboard_prompt (List[Dict[str, str]]) –

    The prompt message from the blackboard.

  • last_success_actions (List[Dict[str, Any]]) –

    The list of successful actions in the last step.

  • include_last_screenshot (bool) –

    The flag indicating whether to include the last screenshot.

Returns:
  • List[Dict[str, Union[str, List[Dict[str, str]]]]]

    The prompt message.

Source code in agents/agent/app_agent.py
def message_constructor(
    self,
    dynamic_examples: str,
    dynamic_knowledge: str,
    image_list: List,
    control_info: str,
    prev_subtask: List[Dict[str, str]],
    plan: List[str],
    request: str,
    subtask: str,
    current_application: str,
    host_message: List[str],
    blackboard_prompt: List[Dict[str, str]],
    last_success_actions: List[Dict[str, Any]],
    include_last_screenshot: bool,
) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the prompt message for the AppAgent.
    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
    :param dynamic_knowledge: The dynamic knowledge retrieved from the external knowledge base.
    :param image_list: The list of screenshot images.
    :param control_info: The control information.
    :param prev_subtask: The list of previous subtasks.
    :param plan: The plan list.
    :param request: The overall user request.
    :param subtask: The subtask for the current AppAgent to process.
    :param current_application: The current application name.
    :param host_message: The message from the HostAgent.
    :param blackboard_prompt: The prompt message from the blackboard.
    :param last_success_actions: The list of successful actions in the last step.
    :param include_last_screenshot: The flag indicating whether to include the last screenshot.
    :return: The prompt message.
    """
    appagent_prompt_system_message = self.prompter.system_prompt_construction(
        dynamic_examples
    )

    appagent_prompt_user_message = self.prompter.user_content_construction(
        image_list=image_list,
        control_item=control_info,
        prev_subtask=prev_subtask,
        prev_plan=plan,
        user_request=request,
        subtask=subtask,
        current_application=current_application,
        host_message=host_message,
        retrieved_docs=dynamic_knowledge,
        last_success_actions=last_success_actions,
        include_last_screenshot=include_last_screenshot,
    )

    if blackboard_prompt:
        appagent_prompt_user_message = (
            blackboard_prompt + appagent_prompt_user_message
        )

    appagent_prompt_message = self.prompter.prompt_construction(
        appagent_prompt_system_message, appagent_prompt_user_message
    )

    return appagent_prompt_message

print_response(response, print_action=True)

Print the response using the presenter.

Parameters:
  • response (AppAgentResponse) –

    The response object to print.

  • print_action (bool, default: True ) –

    The flag indicating whether to print the action.

Source code in agents/agent/app_agent.py
def print_response(
    self, response: AppAgentResponse, print_action: bool = True
) -> None:
    """
    Print the response using the presenter.
    :param response: The response object to print.
    :param print_action: The flag indicating whether to print the action.
    """
    self.presenter.present_app_agent_response(response, print_action=print_action)

process(context) async

Process the agent.

Parameters:
  • context (Context) –

    The context.

Source code in agents/agent/app_agent.py
async def process(self, context: Context) -> None:
    """
    Process the agent.
    :param context: The context.
    """
    if not self._context_provision_executed:
        await self.context_provision(context=context)
        self._context_provision_executed = True

    if not self._processor_cls:
        raise ValueError(f"{self.__class__.__name__} has no processor assigned.")

    self.processor: ProcessorTemplate = self._processor_cls(
        agent=self, global_context=context
    )
    await self.processor.process()

    self.status = self.processor.processing_context.get_local("status")

process_confirmation()

Process the user confirmation.

Returns:
  • bool

    The decision.

Source code in agents/agent/app_agent.py
def process_confirmation(self) -> bool:
    """
    Process the user confirmation.
    :return: The decision.
    """
    action = self.processor.actions
    control_text = self.processor.control_text

    decision = interactor.sensitive_step_asker(action, control_text)

    if not decision:
        console.print("❌ The user has canceled the action.", style="red")

    return decision

rag_demonstration_retrieve(request, demonstration_top_k)

Retrieve demonstration examples for the user request.

Parameters:
  • request (str) –

    The user request.

  • demonstration_top_k (int) –

    The number of documents to retrieve.

Returns:
  • List[Dict[str, Any]]

    The retrieved examples and tips.

Source code in agents/agent/app_agent.py
def rag_demonstration_retrieve(
    self, request: str, demonstration_top_k: int
) -> List[Dict[str, Any]]:
    """
    Retrieve demonstration examples for the user request.
    :param request: The user request.
    :param demonstration_top_k: The number of documents to retrieve.
    :return: The retrieved examples and tips.
    """

    retrieved_docs = []

    # Retrieve demonstration examples.
    demonstration_docs = self.human_demonstration_retriever.retrieve(
        request, demonstration_top_k
    )

    if demonstration_docs:
        for doc in demonstration_docs:
            example_request = doc.metadata.get("request", "")
            response = doc.metadata.get("example", {})
            subtask = doc.metadata.get("Sub-task", "")
            tips = doc.metadata.get("Tips", "")
            retrieved_docs.append(
                {
                    "Request": example_request,
                    "Response": response,
                    "Sub-task": subtask,
                    "Tips": tips,
                }
            )

        return retrieved_docs
    else:
        return []

rag_experience_retrieve(request, experience_top_k)

Retrieve experience examples for the user request.

Parameters:
  • request (str) –

    The user request.

  • experience_top_k (int) –

    The number of documents to retrieve.

Returns:
  • List[Dict[str, Any]]

    The retrieved examples and tips dictionary.

Source code in agents/agent/app_agent.py
def rag_experience_retrieve(
    self, request: str, experience_top_k: int
) -> List[Dict[str, Any]]:
    """
    Retrieve experience examples for the user request.
    :param request: The user request.
    :param experience_top_k: The number of documents to retrieve.
    :return: The retrieved examples and tips dictionary.
    """

    retrieved_docs = []

    # Retrieve experience examples. Only retrieve the examples that are related to the current application.
    experience_docs = self.experience_retriever.retrieve(
        request,
        experience_top_k,
        filter=lambda x: self._app_root_name.lower()
        in [app.lower() for app in x["app_list"]],
    )

    if experience_docs:
        for doc in experience_docs:
            example_request = doc.metadata.get("request", "")
            response = doc.metadata.get("example", {})
            tips = doc.metadata.get("Tips", "")
            subtask = doc.metadata.get("Sub-task", "")
            retrieved_docs.append(
                {
                    "Request": example_request,
                    "Response": response,
                    "Sub-task": subtask,
                    "Tips": tips,
                }
            )

    return retrieved_docs

Summary

AppAgent Key Characteristics:

  • Application-Specialized Worker: Dedicated to a single Windows application
  • ReAct Control Loop: Iterative observe → think → act execution
  • Hybrid Execution: GUI automation + API calls via MCP commands
  • 7-State FSM: Robust state management for execution control
  • 4-Phase Pipeline: Structured data collection → reasoning → action → memory
  • Knowledge-Enhanced: RAG from docs, demos, and search
  • Orchestrated by HostAgent: Child agent in a hierarchical architecture

Next Steps:

  1. Deep Dive: Read State Machine and Processing Strategy for implementation details
  2. Learn Features: Explore Core Features for advanced capabilities
  3. Hands-On Tutorial: Follow Creating AppAgent guide