UFO Client Overview

The UFO Client runs on target devices and serves as the execution layer of UFO's distributed agent system. It manages MCP (Model Context Protocol) servers, executes commands deterministically, and communicates with the Agent Server through the Agent Interaction Protocol (AIP).

Quick Start: Jump to the Quick Start Guide to connect your device. Make sure the Agent Server is running first.

🎯 What is the UFO Client?

graph LR subgraph "Agent Server (Brain)" Reasoning[High-Level Reasoning] Planning[Task Planning] Strategy[Strategy Selection] end subgraph "Agent Client (Hands)" Execution[Command Execution] Tools[Tool Management] Reporting[Status Reporting] end subgraph "Device Environment" Apps[Applications] Files[File System] UI[User Interface] end Reasoning -->|Directives| Execution Planning -->|Commands| Execution Strategy -->|Tasks| Execution Execution --> Tools Tools --> Apps Tools --> Files Tools --> UI Reporting -->|Results| Reasoning style Reasoning fill:#bbdefb style Execution fill:#c8e6c9 style Tools fill:#fff9c4

The UFO Client is a stateless execution agent that:

Capability	Description	Benefit
🔧 Executes Commands	Translates server directives into concrete actions	Deterministic, reliable execution
🛠️ Manages MCP Servers	Orchestrates local and remote tool interfaces	Extensible tool ecosystem
📊 Reports Device Info	Provides hardware and software profile to server	Intelligent task assignment
📡 Communicates via AIP	Maintains persistent WebSocket connection	Real-time bidirectional communication
🚫 Remains Stateless	Executes directives without high-level reasoning	Independent updates, simple architecture

Stateless Design Philosophy: The client focuses purely on execution. All reasoning and decision-making happens on the server, allowing independent updates to server logic and client tools, simple client architecture, intelligent orchestration of multiple clients, and resource-efficient operation.

Architecture: The UFO Client is part of UFO's distributed server-client architecture, where it handles command execution and resource access while the Agent Server handles orchestration and decision-making. See Server-Client Architecture for the complete design rationale, communication protocols, and deployment patterns.

🏗️ Architecture

The client implements a layered architecture separating communication, execution, and tool management for maximum flexibility and maintainability.

graph TB subgraph "Communication" WSC[WebSocket Client AIP Protocol] end subgraph "Orchestration" UFC[UFO Client] CM[Computer Manager] end subgraph "Execution" COMP[Computer] MCPM[MCP Manager] end subgraph "Tools" LOCAL[Local MCP Servers] REMOTE[Remote MCP Servers] end WSC --> UFC UFC --> CM CM --> COMP COMP --> MCPM MCPM --> LOCAL MCPM --> REMOTE style WSC fill:#bbdefb style UFC fill:#c8e6c9 style COMP fill:#fff9c4 style MCPM fill:#ffcdd2

Core Components

Component	Responsibility	Key Features	Documentation
WebSocket Client	AIP communication	• Connection management • Registration • Heartbeat monitoring • Message routing	Details →
UFO Client	Execution orchestration	• Command execution • Result aggregation • Error handling • Session management	Details →
Computer Manager	Multi-computer abstraction	• Computer instance management • Namespace routing • Resource isolation	Details →
Computer	Tool management	• MCP server registration • Tool registry • Execution isolation • Thread pool management	Details →
MCP Server Manager	MCP lifecycle	• Server creation • Configuration loading • Connection pooling • Health monitoring	MCP Documentation →
Device Info Provider	System profiling	• Hardware detection • Capability reporting • Platform identification • Feature enumeration	Details →

For detailed component documentation:

WebSocket Client - AIP protocol implementation
UFO Client - Execution orchestration
Computer Manager - Multi-computer management
Device Info Provider - System profiling
MCP Integration - MCP server management (comprehensive documentation)

🚀 Key Capabilities

1. Deterministic Command Execution

The client executes commands exactly as specified without interpretation or reasoning, ensuring predictable behavior.

sequenceDiagram participant Server participant Client as UFO Client participant Computer participant Tool as MCP Tool Server->>Client: COMMAND (AIP) Client->>Computer: Execute Command Computer->>Computer: Lookup Tool Computer->>Tool: Execute with Timeout Tool-->>Computer: Result Computer-->>Client: Aggregated Result Client-->>Server: COMMAND_RESULTS (AIP)

Execution Flow:

Step	Action	Purpose
1️⃣ Receive	Get structured command from server via AIP	Ensure well-formed input
2️⃣ Route	Dispatch to appropriate computer instance	Support multi-namespace execution
3️⃣ Lookup	Find tool in MCP registry	Dynamic tool resolution
4️⃣ Execute	Run tool in isolated thread pool	Fault isolation and timeout protection
5️⃣ Aggregate	Combine results from multiple tools	Structured response format
6️⃣ Return	Send results back to server via AIP	Complete the execution loop

Execution Guarantees: - Isolation: Each tool runs in separate thread pool - Timeouts: Configurable timeout (default: 6000 seconds/100 minutes) - Fault Tolerance: One failed tool doesn't crash entire client - Thread Safety: Concurrent tool execution supported - Error Reporting: Structured errors returned to server

2. MCP Server Management

The client manages a collection of MCP (Model Context Protocol) servers to provide diverse tool access for automation tasks. The client is responsible for registering, managing, and executing these tools, while the Agent Server handles command orchestration. See Server-Client Architecture for how MCP integration fits into the overall architecture.

MCP Server Categories:

Data Collection Servers gather information from the device:

Server Type	Tools Provided	Use Cases
System Info	CPU, memory, disk stats	Resource monitoring
Application State	Running apps, windows	Context awareness
Screenshot	Screen capture	Visual verification
UI Element Detection	Control trees, accessibility	UI automation

Example Tools: get_system_info(), list_running_apps(), capture_screenshot(), get_ui_tree()

Action Servers perform actions on the device:

Server Type	Tools Provided	Use Cases
GUI Automation	Keyboard, mouse, clicks	UI interaction
Application Control	Launch, close, focus	App management
File System	Read, write, delete	File operations
Command Execution	Shell commands	System automation

Example Tools: click_button(label), type_text(text), open_application(name), execute_command(cmd)

Server Types:

Type	Deployment	Pros	Cons
Local MCP Servers	Run in same process via FastMCP	Fast, no network overhead	Limited to local capabilities
Remote MCP Servers	Connect via HTTP/SSE	Scalable, shared services	Network latency, external dependency

Example MCP Server Configuration:

mcp_servers:
  data_collection:
    - name: "system_info"
      type: "local"
      class: "SystemInfoServer"
    - name: "ui_detector"
      type: "local"
      class: "UIDetectionServer"

  action:
    - name: "gui_automation"
      type: "local"
      class: "GUIAutomationServer"
    - name: "file_ops"
      type: "remote"
      url: "http://localhost:8080/mcp"

See MCP Integration for comprehensive MCP server documentation.

3. Device Profiling

The client automatically collects and reports device information to enable the server to make intelligent task routing decisions.

Device Profile Structure:

{
  "device_id": "device_windows_001",
  "platform": "windows",
  "platform_type": "computer",
  "os_version": "10.0.22631",
  "system_info": {
    "cpu_count": 8,
    "memory_total_gb": 16.0,
    "disk_total_gb": 512.0,
    "hostname": "DESKTOP-ABC123",
    "ip_address": "192.168.1.100"
  },
  "supported_features": [
    "gui_automation",
    "cli_execution",
    "browser_control",
    "office_integration",
    "windows_apps"
  ],
  "installed_applications": [
    "Chrome",
    "Excel",
    "PowerPoint",
    "VSCode"
  ],
  "screen_resolution": "1920x1080",
  "connected_at": "2025-11-05T10:30:00Z"
}

Profile Usage on Server:

Server Uses Profile For:

Use Case	Example Logic
Platform Matching	Route Excel task to Windows device
Capability Filtering	Only send browser tasks to devices with Chrome
Load Balancing	Distribute tasks based on CPU/memory
Failure Recovery	Reassign task if device disconnects

See Device Info Provider for detailed profiling documentation.

4. Resilient Communication

Robust, fault-tolerant communication with the server using strongly-typed AIP messages.

Connection Lifecycle:

stateDiagram-v2 [*] --> Disconnected Disconnected --> Connecting: Initiate Connection Connecting --> Registering: WebSocket Established Registering --> Connected: Registration Success Connecting --> Disconnected: Connection Failed Registering --> Disconnected: Registration Failed Connected --> Heartbeating: Start Heartbeat Loop Heartbeating --> Heartbeating: Send/Receive Heartbeat Heartbeating --> Disconnected: Heartbeat Timeout Heartbeating --> Disconnected: WebSocket Closed Disconnected --> Connecting: Retry (Exponential Backoff) note right of Connected • Receive commands • Execute tasks • Report results end note note right of Heartbeating Default interval: 30s Timeout: 60s end note

Connection Features:

Feature	Description	Configuration
Auto Registration	Registers with server on connect	Device ID, platform, capabilities
Exponential Backoff	Smart retry on connection failure	Max retries: 5 (default)
Heartbeat Monitoring	Keep-alive mechanism	Interval: 30s (configurable)
Graceful Reconnection	Resume operation after disconnect	Auto-reconnect on network recovery

Message Types:

Message	Direction	Purpose
`REGISTRATION`	Client → Server	Register device with capabilities
`REGISTRATION_ACK`	Server → Client	Confirm registration
`HEARTBEAT`	Client ↔ Server	Keep connection alive
`COMMAND`	Server → Client	Execute task command
`COMMAND_RESULTS`	Client → Server	Return execution results
`ERROR`	Client → Server	Report execution errors

See WebSocket Client and AIP Protocol for protocol details.

📋 Workflow Examples

Client Initialization & Registration

sequenceDiagram participant Main as Client Main participant MCP as MCP Manager participant WSC as WebSocket Client participant Server Main->>MCP: Initialize MCP Servers MCP-->>Main: Server Registry Ready Main->>WSC: Create Client & Connect WSC->>Server: WebSocket Connect Server-->>WSC: Connection Established WSC->>WSC: Collect Device Info WSC->>Server: REGISTRATION Server-->>WSC: REGISTRATION_ACK WSC->>WSC: Start Heartbeat Loop loop Every 30 seconds WSC->>Server: HEARTBEAT Server-->>WSC: HEARTBEAT_ACK end Note over WSC,Server: Ready to Execute Commands

Initialization Steps:

Step	Action	Details
1️⃣ Parse Args	Process command-line arguments	`--client-id`, `--ws-server`, `--platform`
2️⃣ Load Config	Load UFO configuration	MCP servers, tools, settings
3️⃣ Init MCP	Initialize MCP server manager	Create local/remote servers
4️⃣ Create Managers	Create computer manager	Register MCP servers with computers
5️⃣ Connect	Establish WebSocket connection	Connect to server
6️⃣ Register	Send device profile	Platform, capabilities, system info
7️⃣ Heartbeat	Start keep-alive loop	Default: 30s interval
8️⃣ Listen	Wait for commands	Ready for task execution

Command Execution Flow

sequenceDiagram participant Server participant Client as UFO Client participant Comp as Computer participant Tool as MCP Tool Server->>Client: COMMAND {type: "click_button", args: {...}} Client->>Comp: execute_command() Comp->>Comp: find_tool("click_button") alt Tool Found Comp->>Tool: execute(args) Note over Tool: Thread Pool Execution 6000s timeout Tool-->>Comp: Success Comp-->>Client: Result Client-->>Server: COMMAND_RESULTS {status: "completed"} else Tool Not Found Comp-->>Client: Error Client-->>Server: ERROR {error: "Tool not found"} end

🖥️ Platform Support

The client supports multiple platforms with platform-specific tool implementations.

Platform	Status	Features	Native Tools
Windows	✅ Full Support	• UI Automation (UIAutomation API) • COM API integration • Office automation • Windows-specific apps	PowerShell, Registry, WMI, Win32 API
Linux	✅ Full Support	• Bash automation • X11/Wayland GUI tools • Package managers • Linux applications	bash, apt/yum, systemd, xdotool
macOS	🚧 In Development	• macOS applications • Automator integration • AppleScript support	osascript, Automator, launchctl
Mobile	🔮 Planned	• Touch interface • Mobile apps • Gesture control	ADB (Android), XCTest (iOS)

Platform Detection:

Automatic: Detected via platform.system() on startup
Override: Use --platform flag to specify manually
Validation: Server validates platform matches task requirements

Platform-Specific Example:

Windows:

# Windows-specific tools
tools = [
    "open_windows_app(name='Excel')",
    "execute_powershell(script='Get-Process')",
    "read_registry(key='HKLM\\Software')"
]

Linux:

# Linux-specific tools
tools = [
    "execute_bash(command='ls -la')",
    "install_package(name='vim')",
    "control_systemd(service='nginx', action='restart')"
]

⚙️ Configuration

Command-Line Arguments

Start the UFO client with:

python -m ufo.client.client [OPTIONS]

Available Options:

Option	Type	Default	Description	Example
`--client-id`	`str`	`client_001`	Unique client identifier	`--client-id device_win_001`
`--ws-server`	`str`	`ws://localhost:5000/ws`	WebSocket server URL	`--ws-server ws://192.168.1.10:5000/ws`
`--ws`	`flag`	`False`	Enable WebSocket mode (required)	`--ws`
`--max-retries`	`int`	`5`	Connection retry limit	`--max-retries 10`
`--platform`	`str`	Auto-detect	Platform override	`--platform windows`
`--log-level`	`str`	`WARNING`	Logging verbosity	`--log-level DEBUG`

Quick Start Command:

# Minimal command (default server)
python -m ufo.client.client --ws --client-id my_device

# Production command (custom server)
python -m ufo.client.client \
  --ws \
  --client-id device_production_01 \
  --ws-server ws://ufo-server.company.com:5000/ws \
  --max-retries 10 \
  --log-level INFO

UFO Configuration

The client inherits settings from config_dev.yaml:

Key Configuration Sections:

Section	Purpose	Example
MCP Servers	Define data collection and action servers	`mcp_servers.data_collection`, `mcp_servers.action`
Tool Settings	Tool-specific parameters	Timeouts, retries, API keys
Logging	Log levels, formats, destinations	File logging, console output
Platform Settings	OS-specific configurations	Windows UI automation settings

Sample Configuration:

client:
  heartbeat_interval: 30  # seconds
  command_timeout: 6000   # seconds (100 minutes)
  max_concurrent_tools: 10

mcp_servers:
  data_collection:
    - name: system_info
      type: local
      enabled: true
  action:
    - name: gui_automation
      type: local
      enabled: true
      settings:
        click_delay: 0.5
        typing_speed: 100  # chars per minute

logging:
  level: INFO
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  file: "logs/client.log"

See Configuration Guide for comprehensive documentation.

⚠️ Error Handling

The client is designed to handle various failure scenarios gracefully without crashing.

Connection Failures

stateDiagram-v2 [*] --> Attempting Attempting --> Connected: Success Attempting --> Failed: Error Failed --> Waiting: Exponential Backoff Waiting --> Attempting: Retry (2^n seconds) Failed --> [*]: Max Retries Exceeded note right of Waiting Retry Delays: 1st: 2s 2nd: 4s 3rd: 8s 4th: 16s 5th: 32s end note

Connection Error Handling:

Scenario	Client Behavior	Configuration
Initial Connection Failed	Exponential backoff retry	`--max-retries` (default: 5)
Connection Lost	Attempt reconnection	Automatic
Max Retries Exceeded	Exit with error code	Log error, exit
Server Unreachable	Log error, retry	Backoff between retries

Tool Execution Failures

Protection Mechanisms:

Mechanism	Purpose	Default Value
Thread Pool Isolation	Prevent one tool from blocking others	Enabled
Execution Timeout	Kill hung tools	6000 seconds (100 minutes)
Exception Catching	Graceful error handling	All tools wrapped
Error Reporting	Notify server of failures	Structured error messages

Error Handling Example:

# Client automatically handles tool errors
try:
    result = tool.execute(args)
    return {"status": "success", "result": result}
except TimeoutError:
    return {"status": "error", "error": "Tool execution timeout"}
except Exception as e:
    return {"status": "error", "error": str(e)}

Server Disconnection

Graceful Shutdown Process:

Detect Disconnection - WebSocket connection lost
Stop Heartbeat - Terminate keep-alive loop
Cancel Pending Tasks - Abort in-progress commands
Attempt Reconnection - Use exponential backoff
Clean Shutdown - If max retries exceeded

✅ Best Practices

Development Best Practices

1. Use Unique Client IDs

# Bad: Generic ID
--client-id client_001

# Good: Descriptive ID
--client-id device_win_dev_john_laptop

2. Start with INFO Logging

# Development: WARNING for normal operation (default)
--log-level WARNING

# Debugging: DEBUG for troubleshooting
--log-level DEBUG

3. Test MCP Connectivity First

# Verify MCP servers are accessible before running client
from ufo.client.mcp.mcp_server_manager import MCPServerManager

manager = MCPServerManager()
# Test server creation from configuration

Production Best Practices

1. Use Descriptive Client IDs

# Include environment, location, purpose
--client-id device_windows_production_office_01
--client-id device_linux_staging_lab_02

2. Configure Automatic Restart

systemd (Linux):

[Unit]
Description=UFO Agent Client
After=network.target

[Service]
Type=simple
User=ufo
WorkingDirectory=/opt/ufo
ExecStart=/usr/bin/python3 -m ufo.client.client \
  --ws \
  --client-id device_linux_prod_01 \
  --ws-server ws://ufo-server.internal:5000/ws \
  --log-level INFO
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

PM2 (Cross-platform):

{
  "apps": [{
    "name": "ufo-client",
    "script": "python",
    "args": [
      "-m", "ufo.client.client",
      "--ws",
      "--client-id", "device_win_prod_01",
      "--ws-server", "ws://ufo-server.internal:5000/ws",
      "--log-level", "INFO"
    ],
    "cwd": "C:\\ufo",
    "restart_delay": 5000,
    "max_restarts": 10
  }]
}

3. Monitor Connection Health

# Check logs for connection status
tail -f logs/client.log | grep -E "Connected|Disconnected|ERROR"

Security Best Practices

Security Considerations

Practice	Description	Implementation
Use WSS	Encrypt WebSocket communication	`wss://server:5000/ws` instead of `ws://`
Validate Server	Verify server certificate	Configure SSL/TLS verification
Restrict Tools	Limit MCP server access	Only enable necessary tools
Least Privilege	Run with minimum permissions	Create dedicated user account
Network Isolation	Use firewalls and VPNs	Restrict server access to internal network

🎓 Documentation Map

Getting Started

Document	Purpose	When to Read
Quick Start	Connect your device quickly	First time setup
Server Quick Start	Understand server-side setup	Before running client

Component Details

Document	Component	Topics Covered
WebSocket Client	Communication layer	AIP protocol, connection management
UFO Client	Orchestration	Session tracking, command execution
Computer Manager	Multi-computer abstraction	Namespace management, routing
Computer	Tool management	MCP registry, execution
Device Info	System profiling	Hardware detection, capabilities
MCP Integration	MCP servers	Server types, configuration

Document	Topic	Relevance
Server Overview	Server architecture	Understand the other half
AIP Protocol	Communication protocol	Deep dive into messaging
Configuration	UFO configuration	Customize behavior

🔄 Client vs. Server

Understanding the clear division between client and server responsibilities is crucial for effective system design.

Responsibility Matrix:

Aspect	Client (Execution)	Server (Orchestration)
Primary Role	Execute directives deterministically	Reason about tasks, plan actions
State Management	Stateless (no session memory)	Stateful (maintains sessions)
Reasoning	None (pure execution)	Full (high-level decision-making)
Tools	MCP servers (local/remote)	Agent strategies, prompts, LLMs
Communication	Device ↔ Server (AIP)	Multi-client coordination
Updates	Tool implementation changes	Strategy and logic updates
Complexity	Low (simple execution loop)	High (complex orchestration)
Dependencies	MCP servers, system APIs	LLMs, databases, client registry

Workflow Comparison:

graph TB subgraph "Server Workflow" S1[Receive User Request] S2[Reason About Task] S3[Plan Execution Steps] S4[Select Target Device] S5[Send Commands] end subgraph "Client Workflow" C1[Receive Command] C2[Lookup Tool] C3[Execute Tool] C4[Return Result] end S1 --> S2 S2 --> S3 S3 --> S4 S4 --> S5 S5 -.->|AIP| C1 C1 --> C2 C2 --> C3 C3 --> C4 C4 -.->|AIP| S5 style S1 fill:#bbdefb style S2 fill:#bbdefb style S3 fill:#bbdefb style C1 fill:#c8e6c9 style C2 fill:#c8e6c9 style C3 fill:#c8e6c9

Decoupled Architecture Benefits: - Independent Updates: Modify server logic without touching clients - Flexible Deployment: Run clients on any platform - Scalability: Add more clients without server changes - Maintainability: Simpler client code, easier debugging - Testability: Test client and server independently

🚀 Next Steps

1. Run Your First Client

# Follow the quick start guide
python -m ufo.client.client \
  --ws \
  --client-id my_first_device \
  --ws-server ws://localhost:5000/ws

👉 Quick Start Guide

2. Understand Registration Process

Learn how clients register with the server, device profile structure, and registration acknowledgment.

👉 Server Quick Start - Start server and connect clients

3. Explore MCP Integration

Learn about MCP servers, configure custom tools, and create your own MCP servers.

👉 MCP Integration

4. Configure for Your Environment

Customize MCP servers, adjust timeouts and retries, and configure platform-specific settings.

👉 Configuration Guide

5. Master the Protocol

Deep dive into AIP messages, understand message flow, and error handling patterns.

👉 AIP Protocol