HostAgent Command System

HostAgent executes desktop-level commands through the Model Context Protocol (MCP). MCP servers provide the commands dynamically, and HostAgent runs them through the CommandDispatcher interface. This document describes the MCP configuration for HostAgent commands.

Command Execution Architecture

graph TB
    HostAgent[HostAgent] --> Dispatcher[CommandDispatcher]
    Dispatcher --> MCPClient[MCP Client]
    MCPClient --> UICollector[UICollector Server]
    MCPClient --> HostUIExecutor[HostUIExecutor Server]
    MCPClient --> CLIExecutor[CommandLine Executor]

    UICollector --> DataCollection[Desktop Screenshot<br/>Window Info]
    HostUIExecutor --> DesktopActions[Window Selection<br/>App Launch]
    CLIExecutor --> ShellActions[Shell<br/>Commands]

    style HostAgent fill:#e3f2fd
    style Dispatcher fill:#fff3e0
    style MCPClient fill:#f1f8e9
    style UICollector fill:#c8e6c9
    style HostUIExecutor fill:#fff9c4
    style CLIExecutor fill:#d1c4e9

Dynamic Commands

HostAgent commands are not hardcoded. They are discovered at runtime from the configured MCP servers. The set of available commands depends on three things: the MCP server configuration in config/ufo/mcp.yaml, the MCP servers that are installed, and the MCP connections that are active.

MCP Server Configuration

Configuration File

HostAgent commands are configured in config/ufo/mcp.yaml:

HostAgent:
  default:
    data_collection:
      - namespace: UICollector
        type: local
        start_args: []
        reset: false
    action:
      - namespace: HostUIExecutor
        type: local
        start_args: []
        reset: false
      - namespace: CommandLineExecutor
        type: local
        start_args: []
        reset: false
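
To make the structure of this file concrete, the sketch below mirrors the YAML above as a plain Python dictionary and lists which server namespaces are registered for each tool type. The helper `servers_by_tool_type` is hypothetical and not part of UFO, and treating the `default` key as a profile name is an assumption. Loading the real file would additionally need a YAML parser.

```python
from typing import Dict, List

# Plain-dict mirror of the HostAgent section of config/ufo/mcp.yaml shown above.
MCP_CONFIG = {
    "HostAgent": {
        "default": {
            "data_collection": [
                {"namespace": "UICollector", "type": "local", "start_args": [], "reset": False},
            ],
            "action": [
                {"namespace": "HostUIExecutor", "type": "local", "start_args": [], "reset": False},
                {"namespace": "CommandLineExecutor", "type": "local", "start_args": [], "reset": False},
            ],
        }
    }
}


def servers_by_tool_type(config: dict, agent: str, profile: str = "default") -> Dict[str, List[str]]:
    """Return {tool_type: [namespace, ...]} for one agent (hypothetical helper).

    Assumption: the key under the agent name ("default") selects a profile.
    """
    section = config.get(agent, {}).get(profile, {})
    return {
        tool_type: [server["namespace"] for server in servers]
        for tool_type, servers in section.items()
    }


if __name__ == "__main__":
    for tool_type, namespaces in servers_by_tool_type(MCP_CONFIG, "HostAgent").items():
        print(tool_type, "->", ", ".join(namespaces))
```

An agent that is absent from the configuration yields an empty mapping rather than an error, which keeps the lookup safe to call before any servers are configured.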

MCP Servers Used by HostAgent

Server	Namespace	Type	Purpose	Command Categories
UICollector	`UICollector`	Local	Data collection	Desktop screenshot, window enumeration
HostUIExecutor	`HostUIExecutor`	Local	Desktop actions	Window selection, application launch
CommandLineExecutor	`CommandLineExecutor`	Local	Shell execution	PowerShell, Bash commands

Command Discovery

Listing Available Commands

HostAgent dynamically discovers available commands from MCP servers:

from aip.messages import Command

# Ask the MCP servers for every tool they expose
result = await command_dispatcher.execute_commands([
    Command(tool_name="list_tools", parameters={})
])

# List of all available commands with their schemas
tools = result[0].result
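
Once the `list_tools` result is available, it is often useful to see which server offers which command. The sketch below groups tool entries by namespace. The entry shape is an assumption: each tool is taken to be a dictionary with `name` and `namespace` keys, which the real schema may not match, and `group_tools_by_namespace` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_tools_by_namespace(tools: Iterable[dict]) -> Dict[str, List[str]]:
    """Group tool names by the server namespace that provides them.

    Assumption: every entry has a "name" key and usually a "namespace" key.
    Entries without a namespace are collected under "unknown".
    """
    grouped = defaultdict(list)
    for tool in tools:
        grouped[tool.get("namespace", "unknown")].append(tool["name"])
    # Sort names so the output is stable regardless of discovery order.
    return {namespace: sorted(names) for namespace, names in grouped.items()}


# Sample data using command names from this document; the dict shape is hypothetical.
SAMPLE_TOOLS = [
    {"name": "capture_desktop_screenshot", "namespace": "UICollector"},
    {"name": "select_application_window", "namespace": "HostUIExecutor"},
    {"name": "launch_application", "namespace": "HostUIExecutor"},
    {"name": "execute_command", "namespace": "CommandLineExecutor"},
]

if __name__ == "__main__":
    for namespace, names in group_tools_by_namespace(SAMPLE_TOOLS).items():
        print(namespace, "->", ", ".join(names))
```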

Command Categories

Commands are categorized by purpose:

Category	Server	Examples
Data Collection	UICollector	`capture_desktop_screenshot`, `get_desktop_app_target_info`, `get_desktop_window_info`
Window Management	HostUIExecutor	`select_application_window`, `launch_application`
Process Control	HostUIExecutor	`close_application`, `get_process_info`
Shell Execution	CommandLineExecutor	`execute_command`
Tool Discovery	All Servers	`list_tools`

Command Execution

Execution Flow

sequenceDiagram
    participant Strategy
    participant Executor as ActionExecutor
    participant Dispatcher as CommandDispatcher
    participant MCP as MCP Server

    Strategy->>Executor: execute(action_info)
    Executor->>Dispatcher: execute_commands([Command(...)])
    Dispatcher->>MCP: Invoke tool
    MCP->>MCP: Execute command logic
    MCP-->>Dispatcher: Result
    Dispatcher-->>Executor: Result
    Executor-->>Strategy: Success/Error
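
The flow above can be approximated with a toy dispatcher, shown here only to illustrate the routing idea. This is not the UFO implementation: `Command` is a local stand-in for `aip.messages.Command`, while `Result`, `StubDispatcher`, and the `name` parameter of `launch_application` are hypothetical names chosen for this sketch.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class Command:
    """Local stand-in for aip.messages.Command with the fields used in this document."""
    tool_name: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    tool_type: str = "action"


@dataclass
class Result:
    """Hypothetical result container; the real result type may differ."""
    status: str
    result: Any = None
    error: str = ""


ToolHandler = Callable[..., Awaitable[Any]]


class StubDispatcher:
    """Toy dispatcher that routes each command to a registered coroutine."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        self._tools[name] = handler

    async def execute_commands(self, commands: List[Command]) -> List[Result]:
        results: List[Result] = []
        for command in commands:
            handler = self._tools.get(command.tool_name)
            if handler is None:
                results.append(Result("error", error=f"unknown tool: {command.tool_name}"))
                continue
            try:
                results.append(Result("success", await handler(**command.parameters)))
            except Exception as exc:  # report the failure instead of raising
                results.append(Result("error", error=str(exc)))
        return results


async def launch_application(name: str) -> str:
    # Stand-in for the command logic that would run inside an MCP server.
    return f"launched {name}"


async def main() -> List[Result]:
    dispatcher = StubDispatcher()
    dispatcher.register("launch_application", launch_application)
    return await dispatcher.execute_commands([
        Command("launch_application", {"name": "notepad"}),
        Command("not_a_tool"),
    ])


if __name__ == "__main__":
    for item in asyncio.run(main()):
        print(item)
```

Returning one result per command, including failures, mirrors the Success/Error step at the end of the diagram: the caller inspects each result instead of handling exceptions from the dispatcher.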

Example: Capture Desktop Screenshot

from aip.messages import Command

# Create command
command = Command(
    tool_name="capture_desktop_screenshot",
    parameters={"all_screens": True},
    tool_type="data_collection",
)

# Execute command
results = await command_dispatcher.execute_commands([command])

# Access result
screenshot_data = results[0].result  # Base64-encoded image
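
Because the screenshot is returned as a Base64 string, it has to be decoded before it can be written to disk or opened in an image viewer. A minimal sketch follows, assuming the string is either plain Base64 or a data URL such as `data:image/png;base64,...`. The exact format returned by UICollector is not specified here, and `save_base64_image` is a hypothetical helper.

```python
import base64
import tempfile
from pathlib import Path

DATA_URL_MARKER = ";base64,"


def save_base64_image(encoded: str, path: Path) -> int:
    """Decode a Base64 image string, write it to `path`, and return the byte count."""
    # Drop an optional data-URL prefix such as "data:image/png;base64,".
    if encoded.startswith("data:") and DATA_URL_MARKER in encoded:
        encoded = encoded.split(DATA_URL_MARKER, 1)[1]
    # validate=True rejects characters outside the Base64 alphabet.
    raw = base64.b64decode(encoded, validate=True)
    path.write_bytes(raw)
    return len(raw)


if __name__ == "__main__":
    # Stand-in payload: only the 8-byte PNG signature, not a real screenshot.
    sample = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "desktop.png"
        written = save_base64_image(sample, target)
        print(f"wrote {written} bytes to {target.name}")
```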

Example: Select Application Window

from aip.messages import Command

# Select and focus an application window
command = Command(
    tool_name="select_application_window",
    parameters={
        "id": "0",
        "name": "Microsoft Word - Document1"
    },
    tool_type="action",
)

results = await command_dispatcher.execute_commands([command])

# Information about the selected application window
app_info = results[0].result

Configuration Resources

For detailed MCP configuration, server setup, and command reference:

Quick References:

MCP Configuration Reference - Quick MCP settings reference
MCP Overview - MCP architecture and concepts

Configuration Guides:

MCP Configuration Guide - Complete configuration documentation
Local Servers - Built-in MCP servers
Remote Servers - HTTP and stdio servers
Creating MCP Servers - Creating custom MCP servers

Server Type Documentation:

Action Servers - Action server documentation
Data Collection Servers - Data collection server documentation

Detailed Server Documentation

Each MCP server has comprehensive documentation:

Server	Documentation	Command Details
UICollector	UICollector Server	Screenshot, window info, control detection commands
HostUIExecutor	HostUIExecutor Server	Window management and desktop automation commands
CommandLineExecutor	CommandLine Executor	Shell command execution

Command Details Subject to Change

Specific command parameters, names, and behaviors may change as MCP servers evolve. Always refer to the server-specific documentation for the most up-to-date command reference.
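
Since command names can change between server versions, one defensive pattern is to compare the names an agent relies on against the discovered names before issuing any command. The sketch below illustrates that check. `require_tools` and `MissingToolError` are hypothetical and not part of UFO, and the list of available names is assumed to have been extracted from a `list_tools` result.

```python
from typing import Iterable, Set


class MissingToolError(RuntimeError):
    """Raised when expected commands are not offered by any MCP server."""


def require_tools(available: Iterable[str], expected: Iterable[str]) -> Set[str]:
    """Return the set of available names, or raise if any expected name is absent."""
    available_set = set(available)
    missing = sorted(set(expected) - available_set)
    if missing:
        raise MissingToolError("missing commands: " + ", ".join(missing))
    return available_set


if __name__ == "__main__":
    discovered = ["list_tools", "capture_desktop_screenshot", "select_application_window"]
    try:
        require_tools(discovered, ["select_application_window", "launch_application"])
    except MissingToolError as exc:
        print(exc)
```

Failing at startup with the full list of missing names is usually easier to diagnose than an error in the middle of a task.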

Agent Configuration Settings

HostAgent Configuration

# config/ufo/host_agent_config.yaml
system:
  # Control detection backend
  control_backend:
    - "uia"  # Windows UI Automation
    - "omniparser"  # Vision-based detection

  # Screenshot settings
  save_full_screen: true  # Capture desktop screenshots
  save_ui_tree: true  # Save UI tree JSON
  include_last_screenshot: true  # Include previous step
  concat_screenshot: true  # Concatenate clean + annotated

  # Window behavior
  maximize_window: false  # Maximize on selection
  show_visual_outline_on_screen: true  # Draw red outline

See Configuration Overview and System Configuration for complete configuration options.

Related Documentation

Architecture & Design:

HostAgent Overview - High-level HostAgent architecture
State Machine - 7-state FSM documentation
Processing Strategy - 4-phase processing pipeline
AppAgent Commands - Application-level commands

Core Features:

Hybrid Actions - MCP command system architecture
Control Detection - UIA and OmniParser backends
Command Dispatcher - Command routing

Summary

Key Takeaways:

MCP-Based: All commands provided by MCP servers configured in mcp.yaml
Dynamic Discovery: Commands discovered at runtime via list_tools
Desktop-Level: System-wide operations (screenshots, window management)
Configurable: Extensive MCP server configuration options
Documented: Each server has detailed command reference

Warning

Command details are subject to change. Refer to the server documentation for the latest information.

Next Steps:

Review MCP Configuration: MCP Configuration Reference
Explore Server Documentation: See the server documentation listed above for command details
Understand Processing: Processing Strategy shows commands in action
Learn State Machine: State Machine explains when commands execute