Action Servers

Overview

Action Servers provide tools that modify system state by executing actions. These servers enable agents to interact with the environment, automate tasks, and implement decisions.

Action servers are the only servers whose tools can be selected by the LLM agent. At each step, the agent chooses which action tool to execute based on the task and current context.

  • LLM Decision: Agent actively selects from available action tools
  • Dynamic Selection: Different action chosen at each step based on needs
  • Tool Visibility: All action tools are presented to the LLM in the prompt

Data Collection Servers are NOT LLM-selectable - they are automatically invoked by the framework.

How Tool Metadata Becomes LLM Instructions

Every action tool's implementation directly affects what the LLM sees and understands. The UFO² framework automatically extracts:

  • Annotated type hints: Parameter types, constraints, and descriptions
  • Docstrings: Tool purpose, parameter explanations, return value descriptions
  • Function signatures: Parameter names, defaults, required vs. optional

These are automatically assembled into structured tool instructions that appear in the LLM's prompt. The LLM uses these instructions to understand what each tool does, select the appropriate tool for each step, and call the tool with correct parameters.

Therefore, developers MUST write clear, comprehensive metadata: whatever appears in the type hints, docstring, and signature is what the LLM reads.
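As an illustration of what such metadata can look like, the sketch below defines a hypothetical `click` tool with annotated type hints and a docstring, then collects the same three kinds of information using only the Python standard library. The `describe_tool` helper and the parameter names are assumptions made for this example; the framework's actual extraction code is not shown in this document and may differ.

```python
import inspect
from typing import Annotated, get_type_hints


# Hypothetical tool: names and descriptions are illustrative only.
def click(
    control_id: Annotated[str, "ID of the control to click"],
    button: Annotated[str, "Mouse button: 'left' or 'right'"] = "left",
) -> Annotated[str, "Human-readable result message"]:
    """Click a UI control in the active window."""
    return f"clicked {control_id} with {button} button"


def describe_tool(func) -> dict:
    """Collect the name, docstring, and per-parameter metadata of a tool."""
    hints = get_type_hints(func, include_extras=True)
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        hint = hints.get(name)
        # Annotated[...] exposes the base type and the extra metadata.
        base = getattr(hint, "__origin__", hint)
        metadata = getattr(hint, "__metadata__", ())
        params[name] = {
            "type": getattr(base, "__name__", str(base)),
            "description": metadata[0] if metadata else "",
            "required": param.default is inspect.Parameter.empty,
        }
    return {"name": func.__name__, "doc": inspect.getdoc(func), "params": params}


tool_info = describe_tool(click)
```

A vague docstring or a missing parameter description shows up directly as an empty or unhelpful field in this structure, which is the reason the metadata deserves the same care as the implementation.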

graph TB
    LLM["LLM Agent Decision<br/>(Selects Action Tool)"]
    Agent["Agent Decision<br/>'Click OK Button'"]
    MCP["MCP Server<br/>Action Server"]

    subgraph Tools["Available Action Tools"]
        Click["click()"]
        Type["type_text()"]
        Insert["insert_table()"]
        Shell["run_shell()"]
    end

    System["System Modified<br/>✅ Side Effects"]

    LLM --> Agent
    Agent --> MCP
    MCP --> Tools
    Tools --> System

    style LLM fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style Agent fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style MCP fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style Tools fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style System fill:#ffebee,stroke:#c62828,stroke-width:2px

Side Effects:

  • ✅ Modifies State: Can change system, files, UI
  • ⚠️ Not Idempotent: Same action may have different results
  • 🔒 Use with Caution: Always verify before executing
  • 📝 Audit Trail: Log all actions for debugging
  • 🤖 LLM-Controlled: Agent decides when and which action to execute
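One way to keep the audit trail mentioned above is to wrap each action in a logging decorator. This is a minimal sketch, not part of the framework: the `audited` decorator, the in-memory `AUDIT_LOG` list, and the stand-in `click` function are all hypothetical names chosen for illustration.

```python
import functools
import logging
import time

logger = logging.getLogger("action_audit")

# Hypothetical in-memory audit trail; a real deployment would persist this.
AUDIT_LOG: list = []


def audited(func):
    """Record every call to an action, including failures."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {
            "tool": func.__name__,
            "args": args,
            "kwargs": kwargs,
            "started": time.time(),
            "status": "interrupted",  # overwritten on normal completion or error
        }
        try:
            result = func(*args, **kwargs)
            entry["status"] = "ok"
            return result
        except Exception as exc:
            entry["status"] = f"error: {exc}"
            raise
        finally:
            AUDIT_LOG.append(entry)
            logger.info("action %s -> %s", entry["tool"], entry["status"])

    return wrapper


@audited
def click(control_id: str) -> str:
    """Stand-in for a real action tool."""
    return f"clicked {control_id}"
```

Because the entry is appended in the `finally` clause, failed actions are recorded as well as successful ones, which is usually what matters most when debugging.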

Tool Type Identifier

All action tools use the tool type:

tool_type = "action"

Tool keys follow the format:

tool_key = "action::{tool_name}"

# Examples:
"action::click"
"action::type_text"
"action::run_shell"

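The key format above is simple enough to build and parse with two small helpers. The functions below are a hypothetical sketch for illustration (they are not framework APIs); they assume only the `{tool_type}::{tool_name}` layout described here.

```python
def make_tool_key(tool_type: str, tool_name: str) -> str:
    """Join a tool type and tool name into a tool key."""
    return f"{tool_type}::{tool_name}"


def parse_tool_key(tool_key: str) -> tuple:
    """Split a tool key into (tool_type, tool_name), rejecting malformed keys."""
    tool_type, separator, tool_name = tool_key.partition("::")
    if not separator or not tool_type or not tool_name:
        raise ValueError(f"Malformed tool key: {tool_key!r}")
    return tool_type, tool_name


key = make_tool_key("action", "click")
tool_type, tool_name = parse_tool_key(key)
```

Parsing the key makes it easy to check, for example, that a call intended to modify state really targets an `action` tool rather than a `data_collection` tool.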
Built-in Action Servers

UFO² provides several built-in action servers for different automation scenarios. The tables below summarize them; each server's documentation page covers all tools, parameters, and usage examples in detail.

UI Automation Servers

| Server | Agent | Description | Documentation |
|--------|-------|-------------|---------------|
| HostUIExecutor | HostAgent | Window selection and desktop-level UI automation | Full Details → |
| AppUIExecutor | AppAgent | Application-level UI automation (clicks, typing, scrolling) | Full Details → |

Command Execution Servers

| Server | Platform | Description | Documentation |
|--------|----------|-------------|---------------|
| CommandLineExecutor | Windows | Execute shell commands and launch applications | Full Details → |
| BashExecutor | Linux | Execute Linux commands via HTTP server | Full Details → |

Office Automation Servers (COM API)

| Server | Application | Description | Documentation |
|--------|-------------|-------------|---------------|
| WordCOMExecutor | Microsoft Word | Word document automation (insert table, format text, etc.) | Full Details → |
| ExcelCOMExecutor | Microsoft Excel | Excel automation (insert data, create charts, etc.) | Full Details → |
| PowerPointCOMExecutor | Microsoft PowerPoint | PowerPoint automation (slides, formatting, etc.) | Full Details → |

Specialized Servers

| Server | Purpose | Description | Documentation |
|--------|---------|-------------|---------------|
| PDFReaderExecutor | PDF Processing | Extract text from PDFs with human simulation | Full Details → |
| ConstellationEditor | Multi-Device | Create and manage multi-device task workflows | Full Details → |
| HardwareExecutor | Hardware Control | Control Arduino, robot arms, test fixtures, mobile devices | Full Details → |

Quick Reference: Each server documentation page includes:

  • 📋 Complete tool reference with all parameters and return values
  • 💡 Code examples showing actual usage patterns
  • ⚙️ Configuration examples for different scenarios
  • Best practices with do's and don'ts
  • 🎯 Use cases with complete workflows

Configuration Examples

Action servers are configured in config/ufo/mcp.yaml. Each server's documentation provides detailed configuration examples.

Basic Configuration

HostAgent:
  default:
    action:
      - namespace: HostUIExecutor
        type: local
        reset: false
      - namespace: CommandLineExecutor
        type: local
        reset: false

App-Specific Configuration

AppAgent:
  # Default configuration for all apps
  default:
    action:
      - namespace: AppUIExecutor
        type: local
        reset: false

  # Word-specific configuration
  WINWORD.EXE:
    action:
      - namespace: AppUIExecutor
        type: local
        reset: false
      - namespace: WordCOMExecutor
        type: local
        reset: true  # Reset when switching documents

  # Excel-specific configuration
  EXCEL.EXE:
    action:
      - namespace: AppUIExecutor
        type: local
        reset: false
      - namespace: ExcelCOMExecutor
        type: local
        reset: true  # Reset when switching workbooks
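The comment "Default configuration for all apps" suggests that an application without its own entry falls back to `default`. The sketch below illustrates that lookup on a plain dictionary mirroring the YAML above. The `action_servers_for` helper is hypothetical, and the fallback rule is an assumption inferred from the configuration layout, not a description of the framework's actual resolution logic.

```python
# Dictionary mirroring part of the YAML configuration shown above.
MCP_CONFIG = {
    "AppAgent": {
        "default": {
            "action": [
                {"namespace": "AppUIExecutor", "type": "local", "reset": False},
            ]
        },
        "WINWORD.EXE": {
            "action": [
                {"namespace": "AppUIExecutor", "type": "local", "reset": False},
                {"namespace": "WordCOMExecutor", "type": "local", "reset": True},
            ]
        },
    }
}


def action_servers_for(config: dict, agent: str, process_name: str) -> list:
    """Return action server namespaces for a process, falling back to 'default'.

    Assumption: an app-specific entry replaces the default entry entirely.
    """
    agent_config = config.get(agent, {})
    entry = agent_config.get(process_name) or agent_config.get("default", {})
    return [server["namespace"] for server in entry.get("action", [])]


word_servers = action_servers_for(MCP_CONFIG, "AppAgent", "WINWORD.EXE")
notepad_servers = action_servers_for(MCP_CONFIG, "AppAgent", "NOTEPAD.EXE")
```

Under this assumption, the Word entry must repeat `AppUIExecutor` explicitly, which matches how the configuration above is written.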

Multi-Platform Configuration

# Windows agent
HostAgent:
  default:
    action:
      - namespace: HostUIExecutor
        type: local
      - namespace: CommandLineExecutor
        type: local

# Linux agent
LinuxAgent:
  default:
    action:
      - namespace: BashExecutor
        type: http
        host: "192.168.1.100"
        port: 8010
        path: "/mcp"

For complete configuration details, see:

  • MCP Configuration Guide - Complete configuration reference
  • Individual server documentation for server-specific configuration options

Best Practices

General Principles

1. Verify Before Acting

Always observe before executing actions:

# ✅ Good: Verify target exists
control_info = await computer.run_actions([
    MCPToolCall(tool_key="data_collection::get_control_info", ...)
])

if control_info[0].data and control_info[0].data["is_enabled"]:
    await computer.run_actions([
        MCPToolCall(tool_key="action::click", ...)
    ])

2. Handle Action Failures

Actions can fail for many reasons - always implement error handling and retries.

3. Validate Inputs

Never execute unsanitized commands, especially with run_shell and similar tools.
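A minimal sketch of such validation is shown below: it rejects shell metacharacters and only accepts commands from an allowlist. The allowlist contents and the `validate_shell_command` helper are hypothetical, and a real deployment would need a policy suited to its own environment.

```python
import shlex

# Hypothetical allowlist; adjust to the commands your workflow really needs.
ALLOWED_COMMANDS = {"dir", "echo", "whoami"}

# Characters that enable chaining, piping, redirection, or substitution.
FORBIDDEN_CHARS = set(";&|`$><\n")


def validate_shell_command(command: str) -> list:
    """Return the tokenized command, or raise ValueError if it is not allowed."""
    if any(char in FORBIDDEN_CHARS for char in command):
        raise ValueError("Command contains shell metacharacters")
    tokens = shlex.split(command)
    if not tokens:
        raise ValueError("Empty command")
    if tokens[0].lower() not in ALLOWED_COMMANDS:
        raise ValueError(f"Command not in allowlist: {tokens[0]!r}")
    return tokens


safe_tokens = validate_shell_command("echo hello")
```

Note that `shlex.split` follows POSIX quoting rules by default, so Windows paths containing backslashes would need separate handling; the sketch only demonstrates the allowlist idea.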

4. Wait for Action Completion

Some actions need time to complete - add appropriate delays after launching applications or triggering UI changes.
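Where a fixed delay is too slow or too fragile, polling for a condition with an upper time limit is a common alternative. The helper below is a hypothetical sketch using only `asyncio`; the `condition` callable stands in for whatever check fits the task, such as a data collection call that reports whether a window has appeared.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until(
    condition: Callable[[], Awaitable[bool]],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if await condition():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


async def demo() -> bool:
    attempts = {"count": 0}

    async def window_ready() -> bool:
        # Stand-in check: reports readiness on the third poll.
        attempts["count"] += 1
        return attempts["count"] >= 3

    return await wait_until(window_ready, timeout=2.0, interval=0.01)
```

The function returns as soon as the condition holds, so fast systems are not penalized, while the timeout bounds the wait on slow ones.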

For detailed best practices, including code examples, error handling patterns, and common pitfalls, see the individual server documentation.

Common Use Cases

For complete use case examples with detailed workflows, see the individual server documentation. Use cases fall into four categories:

  • UI Automation
  • Document Automation
  • System Automation
  • Multi-Device Workflows

Error Handling

Action servers implement robust error handling with timeouts and retries. For detailed error handling patterns specific to each server, see the individual server documentation.

General Timeout Handling

Actions are executed with a timeout (default: 6000 seconds):

import asyncio
import logging

logger = logging.getLogger(__name__)

try:
    result = await computer.run_actions([
        MCPToolCall(tool_key="action::run_shell", ...)
    ])
except asyncio.TimeoutError:
    logger.error("Action timed out after 6000 seconds")
    # Cleanup or retry logic...
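When the default limit is longer than a caller can tolerate, a tighter caller-side limit can be applied with the standard `asyncio.wait_for`. The sketch below is self-contained: `slow_action` is a stand-in coroutine, not a framework call, and `run_with_timeout` is a hypothetical helper. How the framework enforces its own timeout is not shown in this document.

```python
import asyncio


async def slow_action(delay: float) -> str:
    """Stand-in for an action that takes `delay` seconds to finish."""
    await asyncio.sleep(delay)
    return "done"


async def run_with_timeout(coro, timeout_s: float):
    """Return (True, result) on success or (False, None) if the limit is hit."""
    try:
        return True, await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying coroutine before raising.
        return False, None


fast_outcome = asyncio.run(run_with_timeout(slow_action(0.01), timeout_s=1.0))
slow_outcome = asyncio.run(run_with_timeout(slow_action(1.0), timeout_s=0.05))
```

Cancelling a coroutine does not undo side effects that already happened, so a timed-out action should be followed by an observation step before any retry.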

General Retry Pattern

import asyncio
import logging

logger = logging.getLogger(__name__)

async def retry_action(action: MCPToolCall, max_retries: int = 3):
    """Retry an action with exponential backoff (1 s, 2 s, 4 s, ...)."""
    for attempt in range(max_retries):
        try:
            result = await computer.run_actions([action])
            if not result[0].is_error:
                return result[0]
            logger.warning(f"Attempt {attempt + 1} failed: {result[0].content}")
        except Exception as e:
            logger.error(f"Exception on attempt {attempt + 1}: {e}")
        # Back off before the next attempt, but not after the last one
        if attempt < max_retries - 1:
            await asyncio.sleep(2 ** attempt)
    raise ValueError(f"Action failed after {max_retries} attempts")
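To see how long this pattern can wait in total, the delays can be computed separately. The small helper below is a hypothetical illustration that reproduces the `2 ** attempt` schedule used above, with one sleep after every failed attempt except the last.

```python
def backoff_delays(max_retries: int, base: float = 2.0) -> list:
    """Return the sleep durations (in seconds) between consecutive attempts."""
    return [base ** attempt for attempt in range(max_retries - 1)]


default_delays = backoff_delays(3)       # delays for the default max_retries=3
total_wait_for_five = sum(backoff_delays(5))
```

With the default of three attempts the function sleeps for 1 s and then 2 s; with five attempts the total sleep time grows to 15 s, not counting the time the actions themselves take.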

Integration with Data Collection

Actions should be paired with data collection for verification:

# Pattern: Observe → Act → Verify

# 1. Observe: Capture initial state
before_screenshot = await computer.run_actions([
    MCPToolCall(tool_key="data_collection::take_screenshot", ...)
])

# 2. Act: Execute action
action_result = await computer.run_actions([
    MCPToolCall(tool_key="action::click", ...)
])

# 3. Verify: Check result
await asyncio.sleep(1)  # Wait for UI update
after_screenshot = await computer.run_actions([
    MCPToolCall(tool_key="data_collection::take_screenshot", ...)
])
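The verification step needs some way to compare the two observations. One coarse option is to compare content hashes, as sketched below. This assumes the screenshot payload is available as bytes or as a string (for example, base64 text); the actual return format of `take_screenshot` is not specified here, and both helper names are hypothetical.

```python
import hashlib


def fingerprint(payload) -> str:
    """Return a SHA-256 hex digest of a bytes or string payload."""
    data = payload if isinstance(payload, bytes) else str(payload).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def state_changed(before, after) -> bool:
    """Report whether two observations differ at all."""
    return fingerprint(before) != fingerprint(after)


unchanged = state_changed(b"same pixels", b"same pixels")
changed = state_changed("frame-before", "frame-after")
```

A hash comparison only says that something changed, not what changed, and incidental updates such as a blinking cursor or a clock will also register. For a targeted check, querying the specific control's state is usually more reliable.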

For more details on agent execution patterns and on data collection, see the corresponding sections of the documentation.

Safety Reminder: Action servers can modify system state. Always:

  1. Validate inputs before execution
  2. Verify targets exist and are accessible
  3. Log all actions for audit trail
  4. Handle failures gracefully with retries
  5. Test in a safe environment before production use