Action Servers
Overview
Action Servers provide tools that modify system state by executing actions. These servers enable agents to interact with the environment, automate tasks, and implement decisions.
Action servers are the only servers whose tools can be selected by the LLM agent. At each step, the agent chooses which action tool to execute based on the task and current context.
- LLM Decision: Agent actively selects from available action tools
- Dynamic Selection: Different action chosen at each step based on needs
- Tool Visibility: All action tools are presented to the LLM in the prompt
Data Collection Servers are NOT LLM-selectable - they are automatically invoked by the framework.
How Tool Metadata Becomes LLM Instructions
Every action tool's implementation directly affects what the LLM sees and understands. The UFO² framework automatically extracts:
Annotatedtype hints: Parameter types, constraints, and descriptions- Docstrings: Tool purpose, parameter explanations, return value descriptions
- Function signatures: Parameter names, defaults, required vs. optional
These are automatically assembled into structured tool instructions that appear in the LLM's prompt. The LLM uses these instructions to understand what each tool does, select the appropriate tool for each step, and call the tool with correct parameters.
Therefore, developers MUST write clear, comprehensive metadata. For examples:
- See AppUIExecutor documentation for well-documented UI automation tools
- See WordCOMExecutor documentation for COM API tool examples
- See Creating Custom MCP Servers Tutorial for step-by-step guide on writing tool metadata
Side Effects:
- ✅ Modifies State: Can change system, files, UI
- ⚠️ Not Idempotent: Same action may have different results
- 🔒 Use with Caution: Always verify before executing
- 📝 Audit Trail: Log all actions for debugging
- 🤖 LLM-Controlled: Agent decides when and which action to execute
Tool Type Identifier
All action tools use the tool type:
tool_type = "action"
Tool keys follow the format:
tool_key = "action::{tool_name}"
# Examples:
"action::click"
"action::type_text"
"action::run_shell"
Built-in Action Servers
UFO² provides several built-in action servers for different automation scenarios. Below is a summary - click each server name for detailed documentation including all tools, parameters, and usage examples.
UI Automation Servers
| Server | Agent | Description | Documentation |
|---|---|---|---|
| HostUIExecutor | HostAgent | Window selection and desktop-level UI automation | Full Details → |
| AppUIExecutor | AppAgent | Application-level UI automation (clicks, typing, scrolling) | Full Details → |
Command Execution Servers
| Server | Platform | Description | Documentation |
|---|---|---|---|
| CommandLineExecutor | Windows | Execute shell commands and launch applications | Full Details → |
| BashExecutor | Linux | Execute Linux commands via HTTP server | Full Details → |
Office Automation Servers (COM API)
| Server | Application | Description | Documentation |
|---|---|---|---|
| WordCOMExecutor | Microsoft Word | Word document automation (insert table, format text, etc.) | Full Details → |
| ExcelCOMExecutor | Microsoft Excel | Excel automation (insert data, create charts, etc.) | Full Details → |
| PowerPointCOMExecutor | Microsoft PowerPoint | PowerPoint automation (slides, formatting, etc.) | Full Details → |
Specialized Servers
| Server | Purpose | Description | Documentation |
|---|---|---|---|
| PDFReaderExecutor | PDF Processing | Extract text from PDFs with human simulation | Full Details → |
| ConstellationEditor | Multi-Device | Create and manage multi-device task workflows | Full Details → |
| HardwareExecutor | Hardware Control | Control Arduino, robot arms, test fixtures, mobile devices | Full Details → |
Quick Reference: Each server documentation page includes:
- 📋 Complete tool reference with all parameters and return values
- 💡 Code examples showing actual usage patterns
- ⚙️ Configuration examples for different scenarios
- ✅ Best practices with do's and don'ts
- 🎯 Use cases with complete workflows
Configuration Examples
Action servers are configured in config/ufo/mcp.yaml. Each server's documentation provides detailed configuration examples.
Basic Configuration
HostAgent:
default:
action:
- namespace: HostUIExecutor
type: local
reset: false
- namespace: CommandLineExecutor
type: local
reset: false
App-Specific Configuration
AppAgent:
# Default configuration for all apps
default:
action:
- namespace: AppUIExecutor
type: local
reset: false
# Word-specific configuration
WINWORD.EXE:
action:
- namespace: AppUIExecutor
type: local
reset: false
- namespace: WordCOMExecutor
type: local
reset: true # Reset when switching documents
# Excel-specific configuration
EXCEL.EXE:
action:
- namespace: AppUIExecutor
type: local
reset: false
- namespace: ExcelCOMExecutor
type: local
reset: true # Reset when switching workbooks
Multi-Platform Configuration
# Windows agent
HostAgent:
default:
action:
- namespace: HostUIExecutor
type: local
- namespace: CommandLineExecutor
type: local
# Linux agent
LinuxAgent:
default:
action:
- namespace: BashExecutor
type: http
host: "192.168.1.100"
port: 8010
path: "/mcp"
For complete configuration details, see:
- MCP Configuration Guide - Complete configuration reference
- Individual server documentation for server-specific configuration options
Best Practices
General Principles
1. Verify Before Acting
Always observe before executing actions:
# ✅ Good: Verify target exists
control_info = await computer.run_actions([
MCPToolCall(tool_key="data_collection::get_control_info", ...)
])
if control_info[0].data and control_info[0].data["is_enabled"]:
await computer.run_actions([
MCPToolCall(tool_key="action::click", ...)
])
2. Handle Action Failures
Actions can fail for many reasons - always implement error handling and retries.
3. Validate Inputs
Never execute unsanitized commands, especially with run_shell and similar tools.
4. Wait for Action Completion
Some actions need time to complete - add appropriate delays after launching applications or triggering UI changes.
For detailed best practices including code examples, error handling patterns, and common pitfalls, see the individual server documentation:
- HostUIExecutor Best Practices
- AppUIExecutor Best Practices
- CommandLineExecutor Best Practices
- WordCOMExecutor Best Practices
- ExcelCOMExecutor Best Practices
- PowerPointCOMExecutor Best Practices
- PDFReaderExecutor Best Practices
- ConstellationEditor Best Practices
- HardwareExecutor Best Practices
- BashExecutor Best Practices
Common Use Cases
For complete use case examples with detailed workflows, see the individual server documentation:
UI Automation
- Form Filling: AppUIExecutor
- Window Management: HostUIExecutor
Document Automation
- Word Processing: WordCOMExecutor
- Excel Data Processing: ExcelCOMExecutor
- PowerPoint Generation: PowerPointCOMExecutor
- PDF Extraction: PDFReaderExecutor
System Automation
- Application Launching: CommandLineExecutor
- Linux Command Execution: BashExecutor
Multi-Device Workflows
- Task Distribution: ConstellationEditor
- Hardware Control: HardwareExecutor
Error Handling
Action servers implement robust error handling with timeouts and retries. For detailed error handling patterns specific to each server, see:
- HostUIExecutor
- AppUIExecutor
- CommandLineExecutor
- BashExecutor
- And other server-specific documentation
General Timeout Handling
Actions are executed with a timeout (default: 6000 seconds):
try:
result = await computer.run_actions([
MCPToolCall(tool_key="action::run_shell", ...)
])
except asyncio.TimeoutError:
logger.error("Action timed out after 6000 seconds")
# Cleanup or retry logic...
General Retry Pattern
async def retry_action(action: MCPToolCall, max_retries: int = 3):
"""Retry an action with exponential backoff."""
for attempt in range(max_retries):
try:
result = await computer.run_actions([action])
if not result[0].is_error:
return result[0]
logger.warning(f"Attempt {attempt + 1} failed: {result[0].content}")
if attempt < max_retries - 1:
await asyncio.sleep(2 ** attempt) # Exponential backoff
except Exception as e:
logger.error(f"Exception on attempt {attempt + 1}: {e}")
if attempt < max_retries - 1:
await asyncio.sleep(2 ** attempt)
raise ValueError(f"Action failed after {max_retries} attempts")
Integration with Data Collection
Actions should be paired with data collection for verification:
# Pattern: Observe → Act → Verify
# 1. Observe: Capture initial state
before_screenshot = await computer.run_actions([
MCPToolCall(tool_key="data_collection::take_screenshot", ...)
])
# 2. Act: Execute action
action_result = await computer.run_actions([
MCPToolCall(tool_key="action::click", ...)
])
# 3. Verify: Check result
await asyncio.sleep(1) # Wait for UI update
after_screenshot = await computer.run_actions([
MCPToolCall(tool_key="data_collection::take_screenshot", ...)
])
For more details on agent execution patterns:
- HostAgent Commands - HostAgent command patterns
- AppAgent Commands - AppAgent action patterns
- Agent Overview - UFO² agent architecture
For more details on data collection:
- Data Collection Servers - Observation tools
- UICollector Documentation - Complete data collection reference
Related Documentation
- Data Collection Servers - Observation tools
- Configuration Guide - Configure action servers
- Local Servers - Built-in action servers overview
- Remote Servers - HTTP deployment for actions
- Computer - Action execution layer
- MCP Overview - High-level MCP architecture
Safety Reminder: Action servers can modify system state. Always:
- ✅ Validate inputs before execution
- ✅ Verify targets exist and are accessible
- ✅ Log all actions for audit trail
- ✅ Handle failures gracefully with retries
- ✅ Test in safe environment before production use