AppAgent Command System
AppAgent executes application-level commands through the MCP (Model Context Protocol) system. MCP servers provide the commands dynamically, and AppAgent runs them through the CommandDispatcher interface. This document describes the MCP configuration for AppAgent commands.
Command Execution Architecture
Dynamic Commands
AppAgent commands are not hardcoded. They are dynamically discovered from configured MCP servers. The available commands depend on:
- MCP server configuration in config/ufo/mcp.yaml
- Application context (e.g., Word, Excel, PowerPoint)
- Installed MCP servers (local, HTTP, or stdio)
MCP Server Configuration
Configuration File
AppAgent commands are configured in config/ufo/mcp.yaml:
# Default configuration for all applications
AppAgent:
  default:
    data_collection:
      - namespace: UICollector
        type: local
        start_args: []
        reset: false
    action:
      - namespace: AppUIExecutor
        type: local
        start_args: []
        reset: false
      - namespace: CommandLineExecutor
        type: local
        start_args: []
        reset: false
  # Application-specific configurations
  WINWORD.EXE:
    action:
      - namespace: AppUIExecutor
        type: local
      - namespace: WordCOMExecutor
        type: local
        reset: true # Reset on document switch
  EXCEL.EXE:
    action:
      - namespace: AppUIExecutor
        type: local
      - namespace: ExcelCOMExecutor
        type: local
        reset: true
  POWERPNT.EXE:
    action:
      - namespace: AppUIExecutor
        type: local
      - namespace: PowerPointCOMExecutor
        type: local
        reset: true
  explorer.exe:
    action:
      - namespace: AppUIExecutor
        type: local
      - namespace: PDFReaderExecutor
        type: local
        reset: true
MCP Servers Used by AppAgent
| Server | Namespace | Type | Purpose | Command Categories |
|---|---|---|---|---|
| UICollector | UICollector | Local | Data collection | Screenshot capture, control detection, UI tree |
| AppUIExecutor | AppUIExecutor | Local | UI automation | Mouse clicks, keyboard input, text entry |
| CommandLineExecutor | CommandLineExecutor | Local | Shell execution | PowerShell, Bash commands |
| WordCOMExecutor | WordCOMExecutor | Local | Word automation | Document creation, text manipulation, formatting |
| ExcelCOMExecutor | ExcelCOMExecutor | Local | Excel automation | Workbook creation, data entry, charts |
| PowerPointCOMExecutor | PowerPointCOMExecutor | Local | PowerPoint automation | Presentation creation, slides, shapes |
| PDFReaderExecutor | PDFReaderExecutor | Local | PDF operations | Text extraction, page navigation |
When AppAgent works with specific applications (Word, Excel, PowerPoint), additional COM executor servers are automatically loaded to provide native API access alongside UI automation commands. These servers have reset: true to prevent state leakage between documents.
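To make the per-application lookup concrete, here is a minimal Python sketch that resolves which servers apply to a given process name. It is not the framework's loader: `MCP_CONFIG`, `resolve_servers`, and `namespaces_to_reset` are hypothetical names, the dictionary mirrors only a fragment of the configuration above, and the fallback rule (a category missing from the application entry is taken from `default`) is an assumption that the real implementation may not share.

```python
# Hypothetical in-memory mirror of a fragment of config/ufo/mcp.yaml.
MCP_CONFIG = {
    "AppAgent": {
        "default": {
            "data_collection": [
                {"namespace": "UICollector", "type": "local", "reset": False},
            ],
            "action": [
                {"namespace": "AppUIExecutor", "type": "local", "reset": False},
                {"namespace": "CommandLineExecutor", "type": "local", "reset": False},
            ],
        },
        "WINWORD.EXE": {
            "action": [
                {"namespace": "AppUIExecutor", "type": "local"},
                {"namespace": "WordCOMExecutor", "type": "local", "reset": True},
            ],
        },
    }
}


def resolve_servers(config, process_name, agent="AppAgent"):
    """Return {category: [server entries]} for one application.

    Assumption: a category missing from the application-specific entry
    falls back to the `default` entry; the real loader may differ.
    """
    agent_cfg = config[agent]
    default = agent_cfg.get("default", {})
    specific = agent_cfg.get(process_name, {})
    categories = sorted(set(default) | set(specific))
    return {cat: specific.get(cat, default.get(cat, [])) for cat in categories}


def namespaces_to_reset(servers):
    """List namespaces whose entries set reset: true (a missing key counts as false here)."""
    return [
        entry["namespace"]
        for entries in servers.values()
        for entry in entries
        if entry.get("reset", False)
    ]


word = resolve_servers(MCP_CONFIG, "WINWORD.EXE")
print([s["namespace"] for s in word["action"]])
print(namespaces_to_reset(word))
```

Under these assumptions, an application with no entry of its own (for example a hypothetical `notepad.exe`) resolves entirely to the `default` servers.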
Command Discovery
Listing Available Commands
AppAgent dynamically discovers available commands from MCP servers:
from aip.messages import Command

# Get all available tools from MCP servers
result = await command_dispatcher.execute_commands([
    Command(tool_name="list_tools", parameters={})
])
tools = result[0].result
# Returns list of all available commands with their schemas
Command Categories
Commands are categorized by purpose:
| Category | Server | Examples |
|---|---|---|
| Data Collection | UICollector | capture_window_screenshot, get_app_window_controls_target_info, get_ui_tree |
| Mouse Actions | AppUIExecutor | click_input, click_on_coordinates, drag_on_coordinates, wheel_mouse_input |
| Keyboard Actions | AppUIExecutor | set_edit_text, keyboard_input |
| Data Retrieval | AppUIExecutor | texts, get_text |
| Document API | WordCOMExecutor | create_document, insert_text, save_document |
| Spreadsheet API | ExcelCOMExecutor | create_workbook, insert_data, create_chart |
| Presentation API | PowerPointCOMExecutor | create_presentation, add_slide, insert_shape |
| Shell Execution | CommandLineExecutor | execute_command |
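The table above can also be read as a lookup from command name to category and server. The following Python sketch encodes it that way; `COMMAND_CATEGORIES` and `locate_command` are hypothetical names, and the command names are only the examples listed in the table, so at runtime the output of `list_tools` remains the authoritative source.

```python
# The category table from this section, keyed by (category, server).
# Command names are the examples listed above and may change over time.
COMMAND_CATEGORIES = {
    ("Data Collection", "UICollector"): [
        "capture_window_screenshot",
        "get_app_window_controls_target_info",
        "get_ui_tree",
    ],
    ("Mouse Actions", "AppUIExecutor"): [
        "click_input",
        "click_on_coordinates",
        "drag_on_coordinates",
        "wheel_mouse_input",
    ],
    ("Keyboard Actions", "AppUIExecutor"): ["set_edit_text", "keyboard_input"],
    ("Data Retrieval", "AppUIExecutor"): ["texts", "get_text"],
    ("Document API", "WordCOMExecutor"): [
        "create_document",
        "insert_text",
        "save_document",
    ],
    ("Spreadsheet API", "ExcelCOMExecutor"): [
        "create_workbook",
        "insert_data",
        "create_chart",
    ],
    ("Presentation API", "PowerPointCOMExecutor"): [
        "create_presentation",
        "add_slide",
        "insert_shape",
    ],
    ("Shell Execution", "CommandLineExecutor"): ["execute_command"],
}


def locate_command(tool_name):
    """Return (category, server) for a listed command name, or None if it is not listed."""
    for (category, server), names in COMMAND_CATEGORIES.items():
        if tool_name in names:
            return category, server
    return None


print(locate_command("click_input"))
print(locate_command("create_chart"))
```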
Command Execution
Execution Flow
Example: Execute UI Command
from aip.messages import Command

# Create command
command = Command(
    tool_name="click_input",
    parameters={
        "id": "12",
        "name": "Export",
        "button": "left",
        "double": False
    },
    tool_type="action",
)

# Execute command
results = await command_dispatcher.execute_commands([command])

# Check result
if results[0].status == "SUCCESS":
    print(f"Command executed: {results[0].result}")
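The example checks only the first result. When several commands are sent in one call, it can help to separate the ones that succeeded from the ones that did not. The sketch below shows one way to do that and runs without the framework installed: `Command`, `Result`, and `StubDispatcher` are hypothetical stand-ins rather than the real `aip.messages` types, the `"FAILURE"` status string and the `error` field are assumptions (only `"SUCCESS"` appears in this document), and the code assumes results come back one per command, in order.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical stand-ins so the sketch is self-contained. Only tool_name,
# parameters, tool_type, status and result are taken from this document.
@dataclass
class Command:
    tool_name: str
    parameters: dict = field(default_factory=dict)
    tool_type: str = "action"


@dataclass
class Result:
    status: str
    result: Any = None
    error: Optional[str] = None  # assumed field, not confirmed by the source


class StubDispatcher:
    """Fake dispatcher: succeeds for known tools, fails for everything else."""

    KNOWN_TOOLS = {"click_input", "set_edit_text"}

    async def execute_commands(self, commands):
        results = []
        for cmd in commands:
            if cmd.tool_name in self.KNOWN_TOOLS:
                results.append(Result("SUCCESS", result=f"{cmd.tool_name} done"))
            else:
                # "FAILURE" is an assumed status value.
                results.append(Result("FAILURE", error=f"unknown tool {cmd.tool_name}"))
        return results


async def run_batch(dispatcher, commands):
    """Execute commands and split their tool names into (succeeded, failed)."""
    results = await dispatcher.execute_commands(commands)
    succeeded, failed = [], []
    # Assumes one result per command, returned in the same order.
    for cmd, res in zip(commands, results):
        (succeeded if res.status == "SUCCESS" else failed).append(cmd.tool_name)
    return succeeded, failed


commands = [
    Command("click_input", {"id": "12", "name": "Export", "button": "left", "double": False}),
    Command("not_a_real_tool"),
]
succeeded, failed = asyncio.run(run_batch(StubDispatcher(), commands))
print(succeeded, failed)
```

Treating every status other than `"SUCCESS"` as a failure keeps the sketch independent of whatever other status values the real dispatcher defines.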
Configuration Resources
For detailed MCP configuration, server setup, and command reference:
Quick References:
- MCP Configuration Reference - Quick MCP settings reference
- MCP Overview - MCP architecture and concepts
Configuration Guides:
- MCP Configuration Guide - Complete configuration documentation
- Local Servers - Built-in MCP servers
- Remote Servers - HTTP and stdio servers
- Creating MCP Servers - Creating custom MCP servers
Server Type Documentation:
- Action Servers - Action server documentation
- Data Collection Servers - Data collection server documentation
Detailed Server Documentation
Each MCP server has comprehensive documentation:
| Server | Documentation | Command Details |
|---|---|---|
| UICollector | UICollector Server | Screenshot, control detection, UI tree commands |
| AppUIExecutor | AppUIExecutor Server | UI automation commands with parameters |
| WordCOMExecutor | Word COM Executor | Microsoft Word API commands |
| ExcelCOMExecutor | Excel COM Executor | Microsoft Excel API commands |
| PowerPointCOMExecutor | PowerPoint COM Executor | Microsoft PowerPoint API commands |
| PDFReaderExecutor | PDF Reader Executor | PDF reading commands |
| CommandLineExecutor | CommandLine Executor | Shell command execution |
Command Details Subject to Change
Specific command parameters, names, and behaviors may change as MCP servers evolve. Always refer to the server-specific documentation for the most up-to-date command reference.
Agent Configuration Settings
AppAgent Configuration
# config/ufo/app_agent_config.yaml
system:
  # Control detection backend
  control_backend:
    - "uia" # Windows UI Automation
    - "omniparser" # Vision-based detection
  # Screenshot settings
  save_full_screen: true # Also capture desktop
  save_ui_tree: true # Save UI tree JSON
  include_last_screenshot: true # Include previous step
  concat_screenshot: true # Concatenate clean + annotated
  # Window behavior
  maximize_window: false # Maximize on selection
  show_visual_outline_on_screen: true # Draw red outline
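As an illustration of how these settings might be sanity-checked before use, the Python sketch below validates a dictionary with the same keys as the system block shown above. `validate_system_settings`, `KNOWN_BACKENDS`, and `BOOLEAN_KEYS` are hypothetical helpers, not part of the framework, and the set of accepted backends is limited to the two named in this section; the framework may accept others.

```python
# Values mirror the system block shown above. A real run would load them from
# config/ufo/app_agent_config.yaml with a YAML parser instead of a literal.
SYSTEM_SETTINGS = {
    "control_backend": ["uia", "omniparser"],
    "save_full_screen": True,
    "save_ui_tree": True,
    "include_last_screenshot": True,
    "concat_screenshot": True,
    "maximize_window": False,
    "show_visual_outline_on_screen": True,
}

# Assumption: only the two backends named in this section are treated as known.
KNOWN_BACKENDS = {"uia", "omniparser"}
BOOLEAN_KEYS = (
    "save_full_screen",
    "save_ui_tree",
    "include_last_screenshot",
    "concat_screenshot",
    "maximize_window",
    "show_visual_outline_on_screen",
)


def validate_system_settings(settings):
    """Return a list of readable problems; an empty list means nothing was flagged."""
    problems = []
    backends = settings.get("control_backend", [])
    if not backends:
        problems.append("control_backend is empty")
    for name in backends:
        if name not in KNOWN_BACKENDS:
            problems.append(f"unknown control backend: {name}")
    for key in BOOLEAN_KEYS:
        if key in settings and not isinstance(settings[key], bool):
            problems.append(f"{key} must be true or false")
    return problems


print(validate_system_settings(SYSTEM_SETTINGS))
```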
See Configuration Overview and System Configuration for complete configuration options.
Related Documentation
Architecture & Design:
- AppAgent Overview - High-level AppAgent architecture
- State Machine - State machine documentation
- Processing Strategy - 4-phase processing pipeline
- HostAgent Commands - Desktop-level commands
Core Features:
- Hybrid Actions - MCP command system architecture
- Control Detection - UIA and OmniParser backends
- Command Dispatcher - Command routing
Summary
Key Takeaways:
✅ MCP-Based: All commands provided by MCP servers configured in mcp.yaml
✅ Dynamic Discovery: Commands discovered at runtime via list_tools
✅ Application-Specific: COM executors auto-loaded for Word, Excel, PowerPoint
✅ Hybrid Approach: UI automation + native API commands
✅ Configurable: Extensive MCP server configuration options
✅ Documented: Each server has detailed command reference
Next Steps:
- Review MCP Configuration: MCP Configuration Reference
- Explore Server Documentation: See the server documentation listed above for command details
- Understand Processing: Processing Strategy shows commands in action
- Learn State Machine: State Machine explains when commands execute