Frequently Asked Questions (FAQ)

Quick answers to common questions about UFO³ Galaxy, UFO², Linux Agents, and general troubleshooting.

🎯 General Questions

Q: What is UFO³?

A: UFO³ is the third iteration of the UFO project, encompassing three major frameworks:

UFO² - Desktop AgentOS for Windows automation
UFO³ Galaxy - Multi-device orchestration framework
Linux Agent - Server and CLI automation for Linux

Q: Why is it called UFO?

A: UFO stands for UI-Focused agent. The name was given to the first version of the project and has been kept through every iteration (UFO v1, UFO², UFO³), even as the project grew from a simple UI-focused agent into a multi-device orchestration framework.

Q: Which version should I use?

A: Choose based on your needs:

Use Case	Recommended Version
Windows desktop automation only	UFO²
Cross-device workflows (Windows + Linux)	UFO³ Galaxy
Linux server management only	Linux Agent
Multi-device orchestration	UFO³ Galaxy

Q: What's the difference between UFO² and UFO³ Galaxy?

A: UFO² is for single Windows desktop automation with:

Deep Windows OS integration (UIA, Win32, COM)
Office application automation
GUI + API hybrid execution

UFO³ Galaxy orchestrates multiple devices with:

Cross-platform support (Windows + Linux)
Distributed task execution
Device capability-based routing
Constellation-based DAG orchestration

See Migration Guide for details.

Q: Can I use UFO on Linux or macOS?

A: Yes and No:

✅ Linux: Supported via Linux Agent for server/CLI automation
❌ macOS: Not currently supported (Windows and Linux only)
✅ Windows: Full UFO² desktop automation support

🔧 Installation & Setup

Q: Which Python version do I need?

A: Python 3.10 or higher is required for all UFO³ components.

# Check your Python version
python --version
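
The same check can be done from Python, for example at the top of a setup script. The sketch below assumes only the 3.10 minimum stated above; `meets_minimum` is an illustrative helper, not part of UFO.

```python
import sys

MIN_VERSION = (3, 10)  # minimum stated in this FAQ


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if meets_minimum():
        print("Python version OK:", found)
    else:
        print("Python 3.10 or higher is required, found", found)
```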

Q: What models does UFO support?

A: UFO³ supports multiple LLM providers:

OpenAI - GPT-4o, GPT-4, GPT-3.5
Azure OpenAI - All Azure-hosted models
Google Gemini - Gemini Pro, Gemini Flash
Anthropic Claude - Claude 3.5, Claude 3
Qwen - Local or API deployment
DeepSeek - DeepSeek models
Ollama - Local model hosting
And more...

See Model Configuration Guide for the complete list and setup instructions.

Q: Can I use non-vision models in UFO?

A: Yes! You can disable visual mode:

# config/ufo/system.yaml
VISUAL_MODE: false

However, UFO² is designed for vision models. Non-vision models cannot see screenshots, so expect reduced performance on GUI automation tasks.

Q: Can I host my own LLM endpoint?

A: Yes! UFO³ supports custom endpoints:

# config/ufo/agents.yaml
HOST_AGENT:
  API_TYPE: "openai"  # Or compatible API
  API_BASE: "http://your-endpoint.com/v1/chat/completions"
  API_KEY: "your-key"
  API_MODEL: "your-model-name"

See Model Configuration for details.

Q: Do I need API keys for all agents?

A: No, only for LLM-powered agents:

Component	Requires API Key	Purpose
ConstellationAgent (Galaxy)	✅ Yes	Orchestration reasoning
HostAgent (UFO²)	✅ Yes	Task planning
AppAgent (UFO²)	✅ Yes	Action execution
LinuxAgent	✅ Yes	Command planning
Device Server	❌ No	Message routing only
MCP Servers	❌ No	Tool provider only

⚙️ Configuration

Q: Where are configuration files located?

A: UFO³ uses a modular configuration system in config/:

config/
├── ufo/                    # UFO² configuration
│   ├── agents.yaml         # LLM and agent settings
│   ├── system.yaml         # Runtime settings
│   ├── rag.yaml           # Knowledge retrieval
│   └── mcp.yaml           # MCP server configuration
└── galaxy/                 # Galaxy configuration
    ├── agent.yaml          # ConstellationAgent LLM
    ├── devices.yaml        # Device pool
    └── constellation.yaml  # Runtime settings
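
To see at a glance which of these files are still missing from a checkout, a small script along the following lines may help. It is a sketch that assumes the layout shown above and that it is run from the repository root; `missing_config_files` is a hypothetical helper, not a UFO tool. If you only use UFO², the files under galaxy/ may not be needed.

```python
from pathlib import Path

# File names taken from the directory layout above.
EXPECTED_FILES = [
    "ufo/agents.yaml",
    "ufo/system.yaml",
    "ufo/rag.yaml",
    "ufo/mcp.yaml",
    "galaxy/agent.yaml",
    "galaxy/devices.yaml",
    "galaxy/constellation.yaml",
]


def missing_config_files(config_root="config"):
    """Return the expected files that do not exist under `config_root`."""
    root = Path(config_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files()
    if missing:
        print("Missing configuration files:")
        for name in missing:
            print("  config/" + name)
    else:
        print("All expected configuration files are present.")
```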

Q: Can I still use the old `ufo/config/config.yaml`?

A: Yes, for backward compatibility, but we recommend migrating to the new modular system:

# Check current configuration
python -m ufo.tools.validate_config ufo --show-config

# Migrate from legacy to new
python -m ufo.tools.migrate_config

See Configuration Migration Guide for details.

Q: How do I protect my API keys?

A: Best practices for API key security:

Never commit .yaml files with keys - Use .template files

# Good pattern
config/ufo/agents.yaml.template  # Commit this (with placeholders)
config/ufo/agents.yaml           # DON'T commit (has real keys)

Use environment variables for sensitive data:

# In agents.yaml
HOST_AGENT:
  API_KEY: ${OPENAI_API_KEY}  # Reads from environment

Add to .gitignore:

config/**/agents.yaml
config/**/agent.yaml
!**/*.template
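
To make the `${OPENAI_API_KEY}` pattern above concrete, here is one way such a placeholder could be resolved from the environment. This is an illustrative sketch, not UFO's actual configuration loader; `expand_placeholders` is a hypothetical name.

```python
import os
import re

# Matches ${NAME}, where NAME is a valid environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text, env=None):
    """Replace each ${NAME} in `text` with env[NAME]; raise KeyError if unset."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)


line = "API_KEY: ${OPENAI_API_KEY}"
print(expand_placeholders(line, {"OPENAI_API_KEY": "sk-example"}))
# prints: API_KEY: sk-example
```

Raising on an unset variable, rather than substituting an empty string, turns a missing key into an immediate configuration error instead of a later authentication failure.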

🌌 UFO³ Galaxy Questions

Q: What's the minimum number of devices for Galaxy?

A: Galaxy needs at least one device agent (Windows or Linux). You can start with a single device and add more later.

# Minimal Galaxy setup (1 device)
devices:
  - device_id: "my_windows_pc"
    server_url: "ws://localhost:5000/ws"
    os: "windows"

Q: Can Galaxy mix Windows and Linux devices?

A: Yes! Galaxy can orchestrate heterogeneous devices:

devices:
  - device_id: "windows_desktop"
    os: "windows"
    capabilities: ["office", "excel", "outlook"]

  - device_id: "linux_server"
    os: "linux"
    capabilities: ["server", "database", "log_analysis"]

Galaxy automatically routes tasks based on device capabilities.

Q: Do all devices need to be on the same network?

A: No, devices can be distributed across networks using SSH tunneling:

Same network: Direct WebSocket connections
Different networks: Use SSH tunnels (reverse/forward)
Cloud + local: SSH tunnels with public gateways

See Linux Quick Start - SSH Tunneling for examples.

Q: How does Galaxy decide which device to use?

A: Galaxy uses capability-based routing:

Analyzes the task requirements
Matches against device capabilities in devices.yaml
Considers device metadata (OS, performance, etc.)
Selects the best-fit device(s)

Example:

# Task: "Analyze error logs on the production server"
# → Galaxy routes to device with:
capabilities:
  - "log_analysis"
  - "server_management"
os: "linux"
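
The matching step can be pictured as a set-overlap comparison between what a task needs and what each device advertises. The sketch below only illustrates that idea: it is not Galaxy's actual selection logic, and `Device` and `pick_device` are hypothetical names. The device entries reuse the example capabilities from this FAQ.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    device_id: str
    os: str
    capabilities: set = field(default_factory=set)


def pick_device(devices, required, os_name=None):
    """Return the device covering the most required capabilities, or None."""
    best, best_score = None, 0
    for device in devices:
        # Optionally restrict candidates to one operating system.
        if os_name is not None and device.os != os_name:
            continue
        score = len(device.capabilities & set(required))
        if score > best_score:
            best, best_score = device, score
    return best


devices = [
    Device("windows_desktop", "windows", {"office", "excel", "outlook"}),
    Device("linux_server", "linux", {"server", "database", "log_analysis"}),
]
chosen = pick_device(devices, ["log_analysis"], os_name="linux")
print(chosen.device_id)  # prints: linux_server
```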

🐧 Linux Agent Questions

Q: Does the Linux Agent require a GUI?

A: No! The Linux Agent is designed for headless servers:

Executes CLI commands via MCP
No X11/desktop environment needed
Works over SSH
Perfect for remote servers

Q: Can I run multiple Linux Agents on one machine?

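A: Yes. Start one server per agent on its own port, and give each client a unique client ID and the address of its own server:

# Agent 1
python -m ufo.server.app --port 5001
python -m ufo.client.client --ws --ws-server ws://localhost:5001/ws --client-id linux_1 --platform linux

# Agent 2 (same machine)
python -m ufo.server.app --port 5002
python -m ufo.client.client --ws --ws-server ws://localhost:5002/ws --client-id linux_2 --platform linux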

Q: What's the MCP service for?

A: The MCP (Model Context Protocol) service provides the actual command execution tools for the Linux Agent:

Linux Agent (LLM reasoning)
     ↓
MCP Service (tool provider)
     ↓
Bash commands (actual execution)

Without MCP, the Linux Agent can't execute commands - it can only plan them.

🪟 UFO² Questions

Q: Does UFO² work on Windows 10?

A: Yes! UFO² supports:

✅ Windows 11 (recommended)
✅ Windows 10 (fully supported)
❌ Windows 8.1 or earlier (not tested)

Q: Can UFO² automate Office apps?

A: Yes! UFO² has enhanced Office support through:

MCP Office servers - Direct API access to Excel, Word, Outlook, PowerPoint
GUI automation - Fallback for unsupported operations
Hybrid execution - Automatically chooses API or GUI

Enable MCP in config/ufo/mcp.yaml for better Office automation.

Q: Does UFO² interrupt my work?

A: It can, because UFO² runs automation tasks on your current desktop. To keep working undisturbed, run it on a separate machine or in a virtual desktop environment.

Note: Picture-in-Picture mode is planned for future releases.

Q: Can I use UFO² without MCP?

A: No. UFO² requires MCP (Model Context Protocol) servers for tool execution. MCP provides the interface between the LLM agents and system operations (Windows APIs, Office automation, etc.). Without MCP, UFO² cannot perform actions.

🐛 Common Issues & Troubleshooting

Issue: "Configuration file not found"

Error:

FileNotFoundError: config/ufo/agents.yaml not found

Solution:

# Copy template files
cp config/ufo/agents.yaml.template config/ufo/agents.yaml

# Edit with your API keys
notepad config/ufo/agents.yaml  # Windows
nano config/ufo/agents.yaml     # Linux

Issue: "API Authentication Error"

Error:

openai.AuthenticationError: Invalid API key

Solutions:

Check API key format:

API_KEY: "sk-..."  # OpenAI starts with sk-
API_KEY: "..."     # Azure uses deployment key

Verify API_TYPE matches your provider:

API_TYPE: "openai"  # For OpenAI
API_TYPE: "aoai"    # For Azure OpenAI

Check for extra spaces/quotes in YAML
For Azure: Verify API_DEPLOYMENT_ID is set

Issue: "Connection aborted / Remote end closed connection"

Error:

Error making API request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Solutions:

Check network connection (VPN, proxy, firewall)
Verify LLM endpoint is accessible: curl https://api.openai.com/v1/models
Check endpoint status (Azure, OpenAI, etc.)
Try increasing timeout in config
Verify API base URL is correct

Issue: "Device not connecting to Galaxy"

Error:

ERROR - [WS] Failed to connect to ws://localhost:5000/ws
Connection refused

Checklist:

[ ] Is the server running? (curl http://localhost:5000/api/health)
[ ] Port number correct? (Server: --port 5000, Client: ws://...:5000/ws)
[ ] Platform flag set? (--platform windows or --platform linux)
[ ] Firewall blocking? (Allow port 5000)
[ ] SSH tunnel established? (If using remote devices)
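
The first checklist items can be partly automated by testing whether anything is listening on the server port at all. The following is a minimal sketch using only the Python standard library; `port_is_open` is an illustrative helper. It confirms TCP reachability only, not that the WebSocket endpoint itself is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 5000 is the example server port used in this FAQ.
    print("server reachable:", port_is_open("localhost", 5000))
```

If this reports False on the machine running the server, the server is probably not running or is bound to a different port. If it reports True locally but False from the device, look at the firewall or the SSH tunnel.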

Issue: "device_id mismatch in Galaxy"

Error:

ERROR - Device 'linux_agent_1' not found in configuration

Cause: Mismatch between devices.yaml and client command

Solution: Ensure exact match:

Location	Field	Example
`devices.yaml`	`device_id:`	`"linux_agent_1"`
Client command	`--client-id`	`linux_agent_1`

Critical: IDs must match exactly (case-sensitive, no typos).
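
Because the comparison is exact, a wrong capital letter or a transposed character is enough to trigger this error. The sketch below shows one way to check an ID against the configured list and suggest the nearest configured ID when it does not match. It is an illustration only: `check_client_id` is a hypothetical helper, and the ID list is assumed to have been read from devices.yaml beforehand.

```python
import difflib


def check_client_id(client_id, configured_ids):
    """Return (ok, suggestion).

    ok is True only on an exact, case-sensitive match. When ok is False,
    suggestion is the closest configured ID, or None if nothing is similar.
    """
    if client_id in configured_ids:
        return True, None
    # Compare case-insensitively to catch the most common mistakes.
    lowered = {cid.lower(): cid for cid in configured_ids}
    close = difflib.get_close_matches(
        client_id.lower(), list(lowered), n=1, cutoff=0.8
    )
    return False, (lowered[close[0]] if close else None)


print(check_client_id("linux_agent_1", ["linux_agent_1"]))  # (True, None)
print(check_client_id("Linux_Agent_1", ["linux_agent_1"]))  # (False, 'linux_agent_1')
```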

Issue: "MCP service not responding (Linux)"

Error:

ERROR - Cannot connect to MCP server at http://127.0.0.1:8010

Solutions:

Check if MCP service is running:

curl http://localhost:8010/health
ps aux | grep linux_mcp_server

Restart MCP service:

pkill -f linux_mcp_server
python -m ufo.client.mcp.http_servers.linux_mcp_server

Check port conflict:

lsof -i :8010
# If port taken, use different port:
python -m ufo.client.mcp.http_servers.linux_mcp_server --port 8011

Issue: "Tasks failing after X steps"

Cause: MAX_STEP limit reached

Solution: Increase step limit in config/ufo/system.yaml:

# Default is 50
MAX_STEP: 100  # For complex tasks

# Or disable the limit instead (not recommended); set only one MAX_STEP value
# MAX_STEP: -1

Issue: "Too many LLM calls / high cost"

Solutions:

Enable action sequences (bundles actions):

# config/ufo/system.yaml
ACTION_SEQUENCE: true

Use vision-capable models for GUI tasks:

# config/ufo/agents.yaml
APP_AGENT:
  API_MODEL: "gpt-4o"  # Use vision models for GUI automation

Note: Non-vision models like gpt-3.5-turbo cannot process screenshots and should not be used for GUI automation tasks.

Enable experience learning (reuse patterns):

# config/ufo/rag.yaml
RAG_EXPERIENCE: true

Issue: "Why is the latency high?"

A: Latency depends on several factors:

LLM response time - GPT-4o typically takes 10-30 seconds per step
Network speed - API calls to OpenAI/Azure endpoints
Endpoint workload - Provider server load
Visual mode - Image processing adds overhead

To reduce latency:

Use a faster model (e.g., gpt-3.5-turbo instead of gpt-4o) for tasks that do not rely on screenshots
Enable action sequences to batch operations
Use local models (Ollama) if acceptable
Disable visual mode if not needed

Issue: "Can I use non-English requests?"

A: Yes! Most modern LLMs support multiple languages:

GPT-4o, GPT-4: Excellent multilingual support
Gemini: Good multilingual support
Qwen: Excellent for Chinese
Claude: Good multilingual support

Performance may vary by language and model. Test with your specific language and model combination.

📚 Where to Find More Help

Documentation

Topic	Link
Getting Started	UFO² Quick Start, Galaxy Quick Start, Linux Quick Start
Configuration	Configuration Overview
Troubleshooting	Quick start guides have detailed troubleshooting sections
Architecture	Project Structure
More Guidance	User & Developer Guide

Community & Support

GitHub Discussions: https://github.com/microsoft/UFO/discussions
GitHub Issues: https://github.com/microsoft/UFO/issues
Email: ufo-agent@microsoft.com

Debugging Tips

Enable debug logging:

# config/ufo/system.yaml
LOG_LEVEL: "DEBUG"

Check log files:

logs/<task-name>/
├── request.log                    # Request logs
├── response.log                   # Response logs
├── action_step*.png               # Screenshots at each step
└── action_step*_annotated.png     # Annotated screenshots

Validate configuration:

python -m ufo.tools.validate_config ufo --show-config
python -m ufo.tools.validate_config galaxy --show-config

Test LLM connectivity:

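# Test your API key (read from the environment rather than hardcoding it)
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
# A printed reply means the key and endpoint are working
print(response.choices[0].message.content)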

💡 Still have questions? Check the More Guidance page for additional resources, or reach out to the community!
