HostAgent Command System
HostAgent executes desktop-level commands through the MCP (Model Context Protocol) system. Commands are dynamically provided by MCP servers and executed through the CommandDispatcher interface. This document describes the MCP configuration for HostAgent commands.
Command Execution Architecture
Dynamic Commands
HostAgent commands are not hardcoded. They are dynamically discovered from configured MCP servers. Available commands depend on MCP server configuration in config/ufo/mcp.yaml, installed MCP servers, and active MCP connections.
MCP Server Configuration
Configuration File
HostAgent commands are configured in config/ufo/mcp.yaml:
HostAgent:
  default:
    data_collection:
      - namespace: UICollector
        type: local
        start_args: []
        reset: false
    action:
      - namespace: HostUIExecutor
        type: local
        start_args: []
        reset: false
      - namespace: CommandLineExecutor
        type: local
        start_args: []
        reset: false
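To make the layout above concrete, here is a minimal sketch of how the configured servers could be looked up per role. It mirrors the YAML as a plain Python dict so that it runs without a YAML parser; `MCP_CONFIG` and `servers_for` are hypothetical names used only for illustration, and the way UFO itself loads this file may differ.

```python
# Illustrative mirror of the HostAgent section of config/ufo/mcp.yaml.
# In practice the file would be parsed with a YAML library; the dict is
# inlined here so the sketch has no external dependencies.
MCP_CONFIG = {
    "HostAgent": {
        "default": {
            "data_collection": [
                {"namespace": "UICollector", "type": "local",
                 "start_args": [], "reset": False},
            ],
            "action": [
                {"namespace": "HostUIExecutor", "type": "local",
                 "start_args": [], "reset": False},
                {"namespace": "CommandLineExecutor", "type": "local",
                 "start_args": [], "reset": False},
            ],
        }
    }
}


def servers_for(config, agent, role, profile="default"):
    """Return the namespaces configured for one agent and role (hypothetical helper)."""
    entries = config.get(agent, {}).get(profile, {}).get(role, [])
    return [entry["namespace"] for entry in entries]


print(servers_for(MCP_CONFIG, "HostAgent", "data_collection"))
print(servers_for(MCP_CONFIG, "HostAgent", "action"))
```

An agent or role that is not present in the configuration yields an empty list rather than an error, which keeps the lookup safe to call before any server is configured.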
MCP Servers Used by HostAgent
| Server | Namespace | Type | Purpose | Command Categories |
|---|---|---|---|---|
| UICollector | UICollector | Local | Data collection | Desktop screenshot, window enumeration |
| HostUIExecutor | HostUIExecutor | Local | Desktop actions | Window selection, application launch |
| CommandLineExecutor | CommandLineExecutor | Local | Shell execution | PowerShell, Bash commands |
Command Discovery
Listing Available Commands
HostAgent dynamically discovers available commands from MCP servers:
from aip.messages import Command

# Get all available tools from MCP servers
results = await command_dispatcher.execute_commands([
    Command(tool_name="list_tools", parameters={})
])
# tools holds the list of all available commands with their schemas
tools = results[0].result
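Because the command set is discovered at runtime, it can help to confirm that a tool exists before dispatching it. The following is a minimal sketch of that check which runs without UFO installed. `Command`, `Result`, and `FakeDispatcher` are stand-ins written for this example, and the assumption that each `list_tools` entry is a dict with a `name` key is not confirmed by this document; check the server documentation for the real schema.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Command:
    """Stand-in for aip.messages.Command, limited to fields used in this document."""
    tool_name: str
    parameters: dict = field(default_factory=dict)
    tool_type: str = ""


@dataclass
class Result:
    """Stand-in result; only the `result` attribute is taken from the text above."""
    result: Any = None


class FakeDispatcher:
    """Hypothetical dispatcher that answers list_tools from a fixed catalogue."""

    def __init__(self, tools):
        self._tools = tools

    async def execute_commands(self, commands):
        # One result per command, in order; only list_tools returns data here.
        return [
            Result(self._tools if c.tool_name == "list_tools" else None)
            for c in commands
        ]


async def available_tool_names(dispatcher):
    """Run list_tools and collect names (assumes each entry has a 'name' key)."""
    results = await dispatcher.execute_commands(
        [Command(tool_name="list_tools", parameters={})]
    )
    return {tool["name"] for tool in results[0].result}


dispatcher = FakeDispatcher([
    {"name": "capture_desktop_screenshot"},
    {"name": "select_application_window"},
])
names = asyncio.run(available_tool_names(dispatcher))
print("execute_command" in names)
```

With a real dispatcher, only `available_tool_names` would be reused; the stand-in classes exist solely to make the sketch executable in isolation.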
Command Categories
Commands are categorized by purpose:
| Category | Server | Examples |
|---|---|---|
| Data Collection | UICollector | capture_desktop_screenshot, get_desktop_app_target_info, get_desktop_window_info |
| Window Management | HostUIExecutor | select_application_window, launch_application |
| Process Control | HostUIExecutor | close_application, get_process_info |
| Shell Execution | CommandLineExecutor | execute_command |
| Tool Discovery | All Servers | list_tools |
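The table above, combined with the roles in `mcp.yaml`, implies which `tool_type` each command takes: UICollector is configured under `data_collection`, and the two executors under `action`. A minimal sketch of that lookup follows. `TOOL_SERVER`, `SERVER_ROLE`, and `tool_type_for` are hypothetical names, and a hardcoded table like this is only a snapshot; the live source of truth remains `list_tools`.

```python
# Role of each server, as configured in config/ufo/mcp.yaml.
SERVER_ROLE = {
    "UICollector": "data_collection",
    "HostUIExecutor": "action",
    "CommandLineExecutor": "action",
}

# Tool-to-server mapping copied from the category table.
TOOL_SERVER = {
    "capture_desktop_screenshot": "UICollector",
    "get_desktop_app_target_info": "UICollector",
    "get_desktop_window_info": "UICollector",
    "select_application_window": "HostUIExecutor",
    "launch_application": "HostUIExecutor",
    "close_application": "HostUIExecutor",
    "get_process_info": "HostUIExecutor",
    "execute_command": "CommandLineExecutor",
}


def tool_type_for(tool_name):
    """Return the tool_type implied by the server that provides the tool."""
    server = TOOL_SERVER.get(tool_name)
    if server is None:
        raise KeyError(f"unknown tool: {tool_name}")
    return SERVER_ROLE[server]


print(tool_type_for("capture_desktop_screenshot"))
print(tool_type_for("execute_command"))
```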
Command Execution
Execution Flow
Example: Capture Desktop Screenshot
from aip.messages import Command

# Create command
command = Command(
    tool_name="capture_desktop_screenshot",
    parameters={"all_screens": True},
    tool_type="data_collection",
)

# Execute command
results = await command_dispatcher.execute_commands([command])

# Access result
screenshot_data = results[0].result  # Base64-encoded image
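Since the screenshot is returned as a Base64-encoded image, a caller usually needs to decode it before saving or inspecting it. The sketch below shows one way to do that with the standard library. `decode_screenshot` and `save_screenshot` are hypothetical helpers, and the handling of a `data:` URI prefix is a defensive assumption, since this document does not say whether the payload carries one.

```python
import base64
from pathlib import Path


def decode_screenshot(payload: str) -> bytes:
    """Decode a Base64 image string, tolerating an optional data-URI prefix."""
    if payload.startswith("data:") and "," in payload:
        # Drop a header such as "data:image/png;base64," (assumed, not confirmed).
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload)


def save_screenshot(payload: str, path: str) -> int:
    """Write the decoded image to `path` and return the number of bytes written."""
    return Path(path).write_bytes(decode_screenshot(payload))


# Stand-in for results[0].result: the 8-byte PNG signature, Base64-encoded.
sample = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
image_bytes = decode_screenshot(sample)
print(image_bytes[:4])
```

In real use, `screenshot_data` from the example above would be passed in place of `sample`.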
Example: Select Application Window
# Select and focus application window
command = Command(
tool_name="select_application_window",
parameters={
"id": "0",
"name": "Microsoft Word - Document1"
},
tool_type="action",
)
results = await command_dispatcher.execute_commands([command])
app_info = results[0].result
Configuration Resources
For detailed MCP configuration, server setup, and command reference:
Quick References:
- MCP Configuration Reference - Quick MCP settings reference
- MCP Overview - MCP architecture and concepts
Configuration Guides:
- MCP Configuration Guide - Complete configuration documentation
- Local Servers - Built-in MCP servers
- Remote Servers - HTTP and stdio servers
- Creating MCP Servers - Creating custom MCP servers
Server Type Documentation:
- Action Servers - Action server documentation
- Data Collection Servers - Data collection server documentation
Detailed Server Documentation
Each MCP server has comprehensive documentation:
| Server | Documentation | Command Details |
|---|---|---|
| UICollector | UICollector Server | Screenshot, window info, control detection commands |
| HostUIExecutor | HostUIExecutor Server | Window management and desktop automation commands |
| CommandLineExecutor | CommandLine Executor | Shell command execution |
Command Details Subject to Change
Specific command parameters, names, and behaviors may change as MCP servers evolve. Always refer to the server-specific documentation for the most up-to-date command reference.
Agent Configuration Settings
HostAgent Configuration
# config/ufo/host_agent_config.yaml
system:
  # Control detection backend
  control_backend:
    - "uia"          # Windows UI Automation
    - "omniparser"   # Vision-based detection

  # Screenshot settings
  save_full_screen: true          # Capture desktop screenshots
  save_ui_tree: true              # Save UI tree JSON
  include_last_screenshot: true   # Include previous step
  concat_screenshot: true         # Concatenate clean + annotated

  # Window behavior
  maximize_window: false               # Maximize on selection
  show_visual_outline_on_screen: true  # Draw red outline
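A small sanity check on these settings can catch typos in backend names before an agent starts. The following is a minimal sketch under two assumptions: the settings are nested under `system` as shown above, and `uia` and `omniparser` are the only valid backends (they are the only ones this document names). `HOST_AGENT_CONFIG`, `KNOWN_BACKENDS`, and `unknown_backends` are hypothetical names, and the dict stands in for the parsed YAML file.

```python
# The two control detection backends named in this document.
KNOWN_BACKENDS = {"uia", "omniparser"}

# Illustrative mirror of config/ufo/host_agent_config.yaml.
HOST_AGENT_CONFIG = {
    "system": {
        "control_backend": ["uia", "omniparser"],
        "save_full_screen": True,
        "save_ui_tree": True,
        "include_last_screenshot": True,
        "concat_screenshot": True,
        "maximize_window": False,
        "show_visual_outline_on_screen": True,
    }
}


def unknown_backends(config):
    """Return configured control backends that are not in KNOWN_BACKENDS."""
    configured = config.get("system", {}).get("control_backend", [])
    return [name for name in configured if name not in KNOWN_BACKENDS]


print(unknown_backends(HOST_AGENT_CONFIG))
```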
See Configuration Overview and System Configuration for complete configuration options.
Related Documentation
Architecture & Design:
- HostAgent Overview - High-level HostAgent architecture
- State Machine - 7-state FSM documentation
- Processing Strategy - 4-phase processing pipeline
- AppAgent Commands - Application-level commands
Core Features:
- Hybrid Actions - MCP command system architecture
- Control Detection - UIA and OmniParser backends
- Command Dispatcher - Command routing
Summary
Key Takeaways:
- MCP-Based: All commands provided by MCP servers configured in mcp.yaml
- Dynamic Discovery: Commands discovered at runtime via list_tools
- Desktop-Level: System-wide operations (screenshots, window management)
- Configurable: Extensive MCP server configuration options
- Documented: Each server has detailed command reference
Warning
Command details are subject to change. Refer to the server documentation for the latest information.
Next Steps:
- Review MCP Configuration: MCP Configuration Reference
- Explore Server Documentation: See the server documentation listed above for command details
- Understand Processing: Processing Strategy shows commands in action
- Learn State Machine: State Machine explains when commands execute