System Configuration (system.yaml)
Configure UFO²'s runtime behavior, execution limits, control backends, logging, and operational parameters. This file controls how UFO² interacts with the Windows environment.
Overview
The system.yaml file defines runtime settings that control UFO²'s behavior during task execution. Unlike agents.yaml (which configures LLMs), this file configures how UFO² operates on Windows.
File Location: config/ufo/system.yaml
Note: Unlike agents.yaml, the system.yaml file is already present in the repository with sensible defaults. You can use it as-is or customize it for your needs.
Quick Configuration
Default Configuration (Works Out of Box)
# Most users can use default settings
MAX_STEP: 50
MAX_ROUND: 1
CONTROL_BACKEND: ["uia"]
USE_MCP: True
PRINT_LOG: False
Recommended for Development
# More verbose logging for debugging
MAX_STEP: 50
MAX_ROUND: 1
PRINT_LOG: True
LOG_LEVEL: "DEBUG"
CONTROL_BACKEND: ["uia"]
Recommended for Production
# Optimized for reliability
MAX_STEP: 100
MAX_ROUND: 3
CONTROL_BACKEND: ["uia"]
USE_MCP: True
SAFE_GUARD: True
LOG_TO_MARKDOWN: True
Configuration Categories
The system.yaml file is organized into logical sections:
| Category | Purpose | Key Fields |
|---|---|---|
| LLM Parameters | API call settings | MAX_TOKENS, TEMPERATURE, TIMEOUT |
| Execution Limits | Task boundaries | MAX_STEP, MAX_ROUND, SLEEP_TIME |
| Control Backend | UI detection methods | CONTROL_BACKEND, IOU_THRESHOLD |
| Action Configuration | Interaction behavior | CLICK_API, INPUT_TEXT_API, MAXIMIZE_WINDOW |
| Logging | Output and debugging | PRINT_LOG, LOG_LEVEL, LOG_XML |
| MCP Settings | Tool server integration | USE_MCP, MCP_SERVERS_CONFIG |
| Safety | Security controls | SAFE_GUARD, CONTROL_LIST |
| Control Filtering | UI element filtering | CONTROL_FILTER_TYPE, CONTROL_FILTER_TOP_K |
LLM Parameters
These settings control how UFO² communicates with LLM APIs.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
MAX_TOKENS |
Integer | 2000 |
Maximum tokens for LLM response |
MAX_RETRY |
Integer | 20 |
Maximum retries for failed API calls |
TEMPERATURE |
Float | 0.0 |
Sampling temperature (0.0 = deterministic, 1.0 = creative) |
TOP_P |
Float | 0.0 |
Nucleus sampling threshold |
TIMEOUT |
Integer | 60 |
API call timeout (seconds) |
Example
# Conservative settings (recommended)
MAX_TOKENS: 2000
MAX_RETRY: 20
TEMPERATURE: 0.0 # Deterministic
TOP_P: 0.0
TIMEOUT: 60
# Creative settings (experimental)
# MAX_TOKENS: 4000
# TEMPERATURE: 0.7 # More creative
# TOP_P: 0.9
When to Adjust:
- Increase MAX_TOKENS if responses are getting cut off
- Increase TEMPERATURE if you want more varied responses (not recommended)
- Keep at 0.0 for consistent, repeatable automation
- Increase TIMEOUT for slow API connections
Execution Limits
Control how long and how many attempts UFO² makes for tasks.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
MAX_STEP |
Integer | 50 |
Maximum steps per task |
MAX_ROUND |
Integer | 1 |
Maximum rounds per task (retries from start) |
SLEEP_TIME |
Integer | 1 |
Wait time between steps (seconds) |
RECTANGLE_TIME |
Integer | 1 |
Duration to show visual highlights (seconds) |
Example
# Default settings
MAX_STEP: 50
MAX_ROUND: 1
SLEEP_TIME: 1
RECTANGLE_TIME: 1
# For complex tasks
# MAX_STEP: 100
# MAX_ROUND: 3
# For faster execution (risky)
# SLEEP_TIME: 0
Note on Step vs Round:
- STEP: Individual action (click, type, etc.)
- ROUND: Complete task attempt from start
Example: If MAX_ROUND: 3, UFO² will retry the entire task up to 3 times if it fails.
Control Backend
Configure how UFO² detects and interacts with UI elements.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
CONTROL_BACKEND |
List[String] | ["uia"] |
UI detection backends to use |
IOU_THRESHOLD_FOR_MERGE |
Float | 0.1 |
IoU threshold for merging overlapping controls |
Available Backends
| Backend | Description | Pros | Cons |
|---|---|---|---|
"uia" |
UI Automation | Fast, reliable, Windows native | May miss some controls |
"omniparser" |
Vision-based | Finds visual-only elements | Requires GPU, slow |
Note: win32 backend is no longer supported.
Example
# Recommended: Use UIA (default)
CONTROL_BACKEND: ["uia"]
IOU_THRESHOLD_FOR_MERGE: 0.1
# With vision-based parsing (slow)
# CONTROL_BACKEND: ["uia", "omniparser"]
Best Practice: Use ["uia"] as the default backend. Add "omniparser" only if you need vision-based control detection.
Action Configuration
Configure how UFO² performs actions on UI elements.
Core Action Settings
| Field | Type | Default | Description |
|---|---|---|---|
ACTION_SEQUENCE |
Boolean | False |
Enable multi-action sequences in one step |
SHOW_VISUAL_OUTLINE_ON_SCREEN |
Boolean | False |
Show visual highlights during execution |
MAXIMIZE_WINDOW |
Boolean | False |
Maximize application windows before actions |
JSON_PARSING_RETRY |
Integer | 3 |
Retries for parsing LLM JSON responses |
Click Settings
| Field | Type | Default | Description |
|---|---|---|---|
CLICK_API |
String | "click_input" |
Click method to use |
AFTER_CLICK_WAIT |
Integer | 0 |
Wait time after clicking (seconds) |
Input Settings
| Field | Type | Default | Description |
|---|---|---|---|
INPUT_TEXT_API |
String | "type_keys" |
Text input method |
INPUT_TEXT_ENTER |
Boolean | False |
Press Enter after typing |
INPUT_TEXT_INTER_KEY_PAUSE |
Float | 0.05 |
Pause between keystrokes (seconds) |
Example
# Recommended settings
ACTION_SEQUENCE: True # Enable multi-action for speed
SHOW_VISUAL_OUTLINE_ON_SCREEN: False
MAXIMIZE_WINDOW: False
JSON_PARSING_RETRY: 3
CLICK_API: "click_input"
AFTER_CLICK_WAIT: 0
INPUT_TEXT_API: "type_keys"
INPUT_TEXT_ENTER: False
INPUT_TEXT_INTER_KEY_PAUSE: 0.05
# For visual debugging
# SHOW_VISUAL_OUTLINE_ON_SCREEN: True
# If clicks are too fast
# AFTER_CLICK_WAIT: 1
# For automation that needs Enter key
# INPUT_TEXT_ENTER: True
Input Methods
type_keys: Simulates keyboard (slower, more realistic)set_text: Direct text insertion (faster, may not trigger events)
Logging
Control UFO²'s logging output and debugging information.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
PRINT_LOG |
Boolean | False |
Print logs to console |
LOG_LEVEL |
String | "DEBUG" |
Logging verbosity level |
LOG_TO_MARKDOWN |
Boolean | True |
Save logs as Markdown files |
LOG_XML |
Boolean | False |
Log UI tree XML at each step |
CONCAT_SCREENSHOT |
Boolean | False |
Concatenate control screenshots |
INCLUDE_LAST_SCREENSHOT |
Boolean | True |
Include previous screenshot in context |
SCREENSHOT_TO_MEMORY |
Boolean | True |
Load screenshots into memory |
REQUEST_TIMEOUT |
Integer | 250 |
Request timeout for vision models |
Log Levels
| Level | Usage | When to Use |
|---|---|---|
"DEBUG" |
Detailed debugging info | Development, troubleshooting |
"INFO" |
General information | Normal operation |
"WARNING" |
Warning messages | Production |
"ERROR" |
Errors only | Production (minimal logs) |
Example
# Development settings
PRINT_LOG: True
LOG_LEVEL: "DEBUG"
LOG_TO_MARKDOWN: True
LOG_XML: True # Useful for debugging UI detection
# Production settings
# PRINT_LOG: False
# LOG_LEVEL: "WARNING"
# LOG_TO_MARKDOWN: True
# LOG_XML: False
# Memory optimization
# SCREENSHOT_TO_MEMORY: False
Log Files Location
Logs are saved to logs/<timestamp>/ directory.
MCP Settings
Configure Model Context Protocol (MCP) tool servers.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
USE_MCP |
Boolean | True |
Enable MCP tool integration |
MCP_SERVERS_CONFIG |
String | "config/ufo/mcp.yaml" |
Path to MCP servers config |
MCP_PREFERRED_APPS |
List[String] | [] |
Apps that prefer MCP over UI automation |
MCP_FALLBACK_TO_UI |
Boolean | True |
Fall back to UI if MCP fails |
MCP_INSTRUCTIONS_PATH |
String | "ufo/config/mcp_instructions" |
MCP instruction templates path |
MCP_TOOL_TIMEOUT |
Integer | 30 |
MCP tool execution timeout (seconds) |
MCP_LOG_EXECUTION |
Boolean | False |
Log detailed MCP execution |
Example
# Recommended settings
USE_MCP: True
MCP_SERVERS_CONFIG: "config/ufo/mcp.yaml"
MCP_FALLBACK_TO_UI: True
MCP_TOOL_TIMEOUT: 30
MCP_LOG_EXECUTION: False
# Prefer MCP for VS Code and Terminal
MCP_PREFERRED_APPS:
- "Code.exe"
- "WindowsTerminal.exe"
# Debugging MCP issues
# MCP_LOG_EXECUTION: True
# MCP_TOOL_TIMEOUT: 60
What is MCP?
MCP (Model Context Protocol) provides programmatic APIs for applications, offering more reliable automation than UI-based control.
See MCP Configuration for details.
Safety
Security and safety controls to prevent dangerous operations.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
SAFE_GUARD |
Boolean | False |
Enable safety checks |
CONTROL_LIST |
List[String] | See below | Allowed UI control types |
Default CONTROL_LIST
CONTROL_LIST:
- "Button"
- "Edit"
- "TabItem"
- "Document"
- "ListItem"
- "MenuItem"
- "ScrollBar"
- "TreeItem"
- "Hyperlink"
- "ComboBox"
- "RadioButton"
- "Spinner"
- "CheckBox"
- "Group"
- "Text"
Example
# Enable safety for production
SAFE_GUARD: True
CONTROL_LIST:
- "Button"
- "Edit"
- "TabItem"
# Add only safe control types
# Disable for full automation (risky)
# SAFE_GUARD: False
Safety Warning
When SAFE_GUARD: True, UFO² will only interact with control types in CONTROL_LIST. This prevents accidental dangerous operations but may limit functionality.
Control Filtering
Advanced UI element filtering using semantic and icon similarity.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
CONTROL_FILTER_TYPE |
List[String] | [] |
Filter types to enable |
CONTROL_FILTER_TOP_K_PLAN |
Integer | 2 |
Top K plans to consider |
CONTROL_FILTER_TOP_K_SEMANTIC |
Integer | 15 |
Top K controls by text similarity |
CONTROL_FILTER_TOP_K_ICON |
Integer | 15 |
Top K controls by icon similarity |
CONTROL_FILTER_MODEL_SEMANTIC_NAME |
String | "all-MiniLM-L6-v2" |
Semantic embedding model |
CONTROL_FILTER_MODEL_ICON_NAME |
String | "clip-ViT-B-32" |
Icon embedding model |
Filter Types
| Type | Description | Use Case |
|---|---|---|
"TEXT" |
Text-based filtering | Filter by control labels |
"SEMANTIC" |
Semantic similarity | Find similar controls by meaning |
"ICON" |
Icon similarity | Find controls by icon appearance |
Example
# Disable filtering (use all controls)
CONTROL_FILTER_TYPE: []
# Enable semantic filtering (recommended)
CONTROL_FILTER_TYPE: ["SEMANTIC"]
CONTROL_FILTER_TOP_K_SEMANTIC: 15
CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2"
# Enable all filtering (most selective)
# CONTROL_FILTER_TYPE: ["TEXT", "SEMANTIC", "ICON"]
# CONTROL_FILTER_TOP_K_SEMANTIC: 20
# CONTROL_FILTER_TOP_K_ICON: 20
Performance Impact
- Filtering reduces the number of controls sent to LLM (faster, cheaper)
- But may filter out the target control (less reliable)
- Start without filtering, add if you have too many controls
API Usage Configuration
Configure native API usage for Office applications.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
USE_APIS |
Boolean | True |
Enable COM API usage for Office applications |
API_PROMPT |
String | "ufo/prompts/share/base/api.yaml" |
API prompt template |
APP_API_PROMPT_ADDRESS |
Dict | See below | App-specific API prompts |
Default APP_API_PROMPT_ADDRESS
APP_API_PROMPT_ADDRESS:
"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml"
"EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml"
"msedge.exe": "ufo/prompts/apps/web/api.yaml"
"chrome.exe": "ufo/prompts/apps/web/api.yaml"
"POWERPNT.EXE": "ufo/prompts/apps/powerpoint/api.yaml"
Example
# Enable API usage (recommended for Office)
USE_APIS: True
API_PROMPT: "ufo/prompts/share/base/api.yaml"
APP_API_PROMPT_ADDRESS:
"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml"
"EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml"
# Disable for pure UI automation
# USE_APIS: False
When to Use APIs
COM APIs are faster and more reliable for Office applications. Keep USE_APIS: True for best results with Word, Excel, PowerPoint.
Complete Example Configuration
Here's a complete, production-ready system.yaml:
# LLM Parameters
MAX_TOKENS: 2000
MAX_RETRY: 20
TEMPERATURE: 0.0
TOP_P: 0.0
TIMEOUT: 60
# Execution Limits
MAX_STEP: 100
MAX_ROUND: 3
SLEEP_TIME: 1
RECTANGLE_TIME: 1
# Control Backend
CONTROL_BACKEND: ["uia"]
IOU_THRESHOLD_FOR_MERGE: 0.1
# Action Configuration
ACTION_SEQUENCE: True
SHOW_VISUAL_OUTLINE_ON_SCREEN: False
MAXIMIZE_WINDOW: False
JSON_PARSING_RETRY: 3
CLICK_API: "click_input"
AFTER_CLICK_WAIT: 0
INPUT_TEXT_API: "type_keys"
INPUT_TEXT_ENTER: False
INPUT_TEXT_INTER_KEY_PAUSE: 0.05
# Logging
PRINT_LOG: False
LOG_LEVEL: "INFO"
LOG_TO_MARKDOWN: True
LOG_XML: False
CONCAT_SCREENSHOT: False
INCLUDE_LAST_SCREENSHOT: True
SCREENSHOT_TO_MEMORY: True
REQUEST_TIMEOUT: 250
# MCP Settings
USE_MCP: True
MCP_SERVERS_CONFIG: "config/ufo/mcp.yaml"
MCP_PREFERRED_APPS:
- "Code.exe"
- "WindowsTerminal.exe"
MCP_FALLBACK_TO_UI: True
MCP_TOOL_TIMEOUT: 30
MCP_LOG_EXECUTION: False
# Safety
SAFE_GUARD: True
CONTROL_LIST:
- "Button"
- "Edit"
- "TabItem"
- "Document"
- "ListItem"
- "MenuItem"
- "ScrollBar"
- "TreeItem"
- "Hyperlink"
- "ComboBox"
- "RadioButton"
# API Usage
USE_APIS: True
API_PROMPT: "ufo/prompts/share/base/api.yaml"
APP_API_PROMPT_ADDRESS:
"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml"
"EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml"
"msedge.exe": "ufo/prompts/apps/web/api.yaml"
# Control Filtering (disabled by default)
CONTROL_FILTER_TYPE: []
CONTROL_FILTER_TOP_K_PLAN: 2
CONTROL_FILTER_TOP_K_SEMANTIC: 15
CONTROL_FILTER_TOP_K_ICON: 15
CONTROL_FILTER_MODEL_SEMANTIC_NAME: "all-MiniLM-L6-v2"
CONTROL_FILTER_MODEL_ICON_NAME: "clip-ViT-B-32"
Programmatic Access
from config.config_loader import get_ufo_config
config = get_ufo_config()
# Access system settings
max_step = config.system.max_step
log_level = config.system.log_level
control_backends = config.system.control_backend
# Check MCP settings
if config.system.use_mcp:
mcp_config_path = config.system.mcp_servers_config
print(f"MCP enabled, config: {mcp_config_path}")
# Modify at runtime (not recommended)
# config.system.max_step = 200
Troubleshooting
Issue 1: Tasks Failing After X Steps
Error Message
Task stopped: Maximum steps (50) reached
Solution: Increase MAX_STEP
MAX_STEP: 100 # or higher
Issue 2: Controls Not Detected
Symptom: UFO² can't find UI elements
Solutions: 1. Try enabling omniparser for vision-based detection:
CONTROL_BACKEND: ["uia", "omniparser"]
CONTROL_FILTER_TYPE: []
Issue 3: Actions Too Fast
Symptom: Actions execute before UI is ready
Solution: Add delays
SLEEP_TIME: 2
AFTER_CLICK_WAIT: 1
Issue 4: Logs Too Verbose
Symptom: Too much console output
Solution: Reduce logging
PRINT_LOG: False
LOG_LEVEL: "WARNING"
```
---
## Performance Tuning
### For Speed
```yaml
MAX_STEP: 50
SLEEP_TIME: 0
CONTROL_BACKEND: ["uia"]
CONTROL_FILTER_TYPE: ["SEMANTIC"] # Reduce LLM input
ACTION_SEQUENCE: True # Multi-action in one step
For Reliability
MAX_STEP: 100
MAX_ROUND: 3
SLEEP_TIME: 2
AFTER_CLICK_WAIT: 1
CONTROL_BACKEND: ["uia"]
CONTROL_FILTER_TYPE: [] # Don't filter out controls
For Debugging
PRINT_LOG: True
LOG_LEVEL: "DEBUG"
LOG_XML: True
SHOW_VISUAL_OUTLINE_ON_SCREEN: True
MCP_LOG_EXECUTION: True
Related Documentation
- Agent Configuration - LLM and API settings
- MCP Configuration - Tool server configuration
- RAG Configuration - Knowledge retrieval
Summary
Key Takeaways:
✅ Default settings work - Start with defaults, adjust as needed
✅ Increase MAX_STEP for complex tasks
✅ Use ["uia"] for control detection
✅ Enable ACTION_SEQUENCE for faster execution
✅ Adjust logging based on dev vs production
✅ Enable MCP for better Office automation
Fine-tune system settings for optimal performance! ⚙️