Agent Configuration (agents.yaml)
Configure all LLM models and agent-specific settings for UFO². Each agent type can use different models and API configurations for optimal performance.
Overview
The agents.yaml file defines LLM settings for all agents in UFO². This is the most important configuration file as it contains your API keys and model selections.
File Location: config/ufo/agents.yaml
Initial Setup Required:
- Copy the template file: Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
- Edit config/ufo/agents.yaml with your API keys and settings
- Never commit agents.yaml to version control (it contains secrets)
Quick Start
Step 1: Create Configuration File
# Copy template to create your configuration
Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
Step 2: Configure Your LLM Provider
Choose your LLM provider and edit config/ufo/agents.yaml:
OpenAI:
HOST_AGENT:
  VISUAL_MODE: True
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_OPENAI_KEY_HERE"
  API_MODEL: "gpt-4o"
  API_VERSION: "2025-02-01-preview"
APP_AGENT:
  VISUAL_MODE: True
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_OPENAI_KEY_HERE"
  API_MODEL: "gpt-4o-mini"
  API_VERSION: "2025-02-01-preview"
Azure OpenAI:
HOST_AGENT:
  VISUAL_MODE: True
  API_TYPE: "aoai"
  API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
  API_KEY: "YOUR_AOAI_KEY"
  API_MODEL: "gpt-4o"
  API_VERSION: "2024-02-15-preview"
  API_DEPLOYMENT_ID: "gpt-4o-deployment"
APP_AGENT:
  VISUAL_MODE: True
  API_TYPE: "aoai"
  API_BASE: "https://YOUR_RESOURCE.openai.azure.com"
  API_KEY: "YOUR_AOAI_KEY"
  API_MODEL: "gpt-4o-mini"
  API_VERSION: "2024-02-15-preview"
  API_DEPLOYMENT_ID: "gpt-4o-mini-deployment"
Google Gemini:
HOST_AGENT:
  VISUAL_MODE: True
  API_TYPE: "gemini"
  API_BASE: "https://generativelanguage.googleapis.com"
  API_KEY: "YOUR_GEMINI_API_KEY"
  API_MODEL: "gemini-2.0-flash-exp"
  API_VERSION: "v1beta"
Anthropic Claude:
HOST_AGENT:
  VISUAL_MODE: True
  API_TYPE: "claude"
  API_BASE: "https://api.anthropic.com"
  API_KEY: "YOUR_CLAUDE_API_KEY"
  API_MODEL: "claude-3-5-sonnet-20241022"
  API_VERSION: "2023-06-01"
Step 3: Verify Configuration
from config.config_loader import get_ufo_config
config = get_ufo_config()
print(f"HOST_AGENT model: {config.host_agent.api_model}")
print(f"APP_AGENT model: {config.app_agent.api_model}")
Agent Types
UFO² uses different agents for different purposes. Each can be configured with different models.
| Agent | Purpose | Recommended Model | Frequency |
|---|---|---|---|
| HOST_AGENT | Task planning, app coordination | GPT-4o, GPT-4 | Low (planning) |
| APP_AGENT | Action execution, UI interaction | GPT-4o-mini, GPT-4o | High (every action) |
| BACKUP_AGENT | Fallback when others fail | GPT-4-vision-preview | Rare (errors) |
| EVALUATION_AGENT | Task completion evaluation | GPT-4o | Low (end of task) |
| OPERATOR | CUA-based automation | computer-use-preview | Optional |
Cost Optimization Tips:
- Use GPT-4o for HOST_AGENT (complex planning)
- Use GPT-4o-mini for APP_AGENT (frequent actions, 60% cheaper)
- Same model can be used for BACKUP_AGENT and EVALUATION_AGENT
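The effect of splitting models between agents can be worked through with simple arithmetic. The sketch below is illustrative only: the call counts and the per-call price are made-up placeholders, not published rates, and the only figure taken from the tips above is the 60% relative discount.

```python
# Hypothetical numbers for illustration; substitute your provider's real pricing.
HOST_CALLS = 5                           # planning calls per task (assumed)
APP_CALLS = 40                           # action calls per task (assumed)
PRICE_LARGE = 0.010                      # assumed cost per call of the larger model, USD
PRICE_SMALL = PRICE_LARGE * (1 - 0.60)   # "60% cheaper", per the tip above


def task_cost(host_price: float, app_price: float) -> float:
    """Total cost of one task given per-call prices for each agent."""
    return HOST_CALLS * host_price + APP_CALLS * app_price


all_large = task_cost(PRICE_LARGE, PRICE_LARGE)   # same model for both agents
mixed = task_cost(PRICE_LARGE, PRICE_SMALL)       # cheaper model for APP_AGENT
savings = 1 - mixed / all_large

print(f"all large model: ${all_large:.3f}")
print(f"mixed models:    ${mixed:.3f}")
print(f"savings:         {savings:.0%}")
```

Because APP_AGENT runs on every action, most of the saving comes from its model choice; the exact figure depends entirely on your own call mix and pricing.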
Configuration Fields
Common Fields (All Agents)
These fields are available for HOST_AGENT, APP_AGENT, BACKUP_AGENT, EVALUATION_AGENT, and OPERATOR.
Core Settings
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| VISUAL_MODE | Boolean | ❌ | True | Enable vision capabilities (screenshot understanding) |
| REASONING_MODEL | Boolean | ❌ | False | Whether model is a reasoning model (o1, o3, o3-mini) |
| API_TYPE | String | ✅ | "openai" | LLM provider type |
| API_BASE | String | ✅ | varies | API endpoint URL |
| API_KEY | String | ✅ | "" | API authentication key |
| API_MODEL | String | ✅ | varies | Model identifier |
| API_VERSION | String | ❌ | "2025-02-01-preview" | API version |
Legend: ✅ = Required (must be set), ❌ = Optional (has default value)
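As a rough illustration of the required-field rules in the table, the following sketch checks one agent section for missing values. It works on a plain dictionary of the shape a YAML parser would produce; `missing_required` is a hypothetical helper written for this example, not part of UFO².

```python
from typing import Dict, List

# Fields marked as required in the Core Settings table.
REQUIRED_FIELDS = ("API_TYPE", "API_BASE", "API_KEY", "API_MODEL")


def missing_required(agent_cfg: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in one agent section."""
    return [field for field in REQUIRED_FIELDS if not agent_cfg.get(field)]


# Example agent section, written as the dict a YAML parser would produce.
host_agent = {
    "VISUAL_MODE": True,
    "API_TYPE": "openai",
    "API_BASE": "https://api.openai.com/v1/chat/completions",
    "API_KEY": "",          # left empty on purpose
    "API_MODEL": "gpt-4o",
}

print(missing_required(host_agent))
```

Note that the azure_ad example in this document authenticates without an API_KEY, so a real check would likely need to treat that provider type separately.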
API_TYPE Options
| API_TYPE | Provider | Example API_BASE |
|---|---|---|
| "openai" | OpenAI | https://api.openai.com/v1/chat/completions |
| "aoai" | Azure OpenAI | https://YOUR_RESOURCE.openai.azure.com |
| "azure_ad" | Azure OpenAI (AD auth) | https://YOUR_RESOURCE.openai.azure.com |
| "gemini" | Google Gemini | https://generativelanguage.googleapis.com |
| "claude" | Anthropic Claude | https://api.anthropic.com |
| "qwen" | Alibaba Qwen | varies |
| "ollama" | Ollama (local) | http://localhost:11434 |
Azure OpenAI Additional Fields
| Field | Type | Required | Description |
|---|---|---|---|
| API_DEPLOYMENT_ID | String | ✅ (for AOAI) | Azure deployment name |
Example:
HOST_AGENT:
  API_TYPE: "aoai"
  API_BASE: "https://myresource.openai.azure.com"
  API_KEY: "abc123..."
  API_MODEL: "gpt-4o"
  API_DEPLOYMENT_ID: "gpt-4o-deployment-name"
Azure AD Authentication Fields
| Field | Type | Required | Description |
|---|---|---|---|
| AAD_TENANT_ID | String | ✅ (for azure_ad) | Azure AD tenant ID |
| AAD_API_SCOPE | String | ✅ (for azure_ad) | Azure AD API scope |
| AAD_API_SCOPE_BASE | String | ✅ (for azure_ad) | Scope base URL |
Example:
HOST_AGENT:
  API_TYPE: "azure_ad"
  API_BASE: "https://myresource.openai.azure.com"
  AAD_TENANT_ID: "your-tenant-id"
  AAD_API_SCOPE: "your-scope"
  AAD_API_SCOPE_BASE: "API://your-scope-base"
  API_MODEL: "gpt-4o"
  API_DEPLOYMENT_ID: "gpt-4o-deployment"
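The provider-specific requirements from the Azure OpenAI and Azure AD tables can be expressed as a small lookup. This is a minimal sketch, assuming the agent section is available as a plain dictionary; `missing_provider_fields` is a hypothetical helper and does not reflect how UFO² validates its configuration internally.

```python
from typing import Dict, List

# Extra fields per API_TYPE, taken from the Azure OpenAI and Azure AD tables.
PROVIDER_FIELDS = {
    "aoai": ("API_DEPLOYMENT_ID",),
    "azure_ad": ("AAD_TENANT_ID", "AAD_API_SCOPE", "AAD_API_SCOPE_BASE"),
}


def missing_provider_fields(agent_cfg: Dict[str, str]) -> List[str]:
    """Return provider-specific fields that are absent or empty for this API_TYPE."""
    api_type = agent_cfg.get("API_TYPE", "")
    required = PROVIDER_FIELDS.get(api_type, ())  # other providers need no extras
    return [field for field in required if not agent_cfg.get(field)]


azure_agent = {
    "API_TYPE": "aoai",
    "API_BASE": "https://myresource.openai.azure.com",
    "API_MODEL": "gpt-4o",
    # API_DEPLOYMENT_ID intentionally omitted
}

print(missing_provider_fields(azure_agent))
```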
Prompt Configuration
| Field | Type | Required | Description |
|---|---|---|---|
| PROMPT | String | ❌ | Path to main prompt template |
| EXAMPLE_PROMPT | String | ❌ | Path to example prompt template |
| API_PROMPT | String | ❌ | Path to API usage prompt (APP_AGENT only) |
Default Prompt Paths:
HOST_AGENT:
  PROMPT: "ufo/prompts/share/base/host_agent.yaml"
  EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/host_agent_example.yaml"
APP_AGENT:
  PROMPT: "ufo/prompts/share/base/app_agent.yaml"
  EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/app_agent_example.yaml"
  API_PROMPT: "ufo/prompts/share/base/api.yaml"
You can customize prompts by creating your own YAML files and updating these paths. See the Customization Guide for details.
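The EXAMPLE_PROMPT paths contain a `{mode}` placeholder. How UFO² fills it is not described on this page; the sketch below only shows one plausible way such a template could be resolved, using Python's `str.format`. The mode name `"visual"` is a hypothetical value chosen for the example.

```python
EXAMPLE_PROMPT = "ufo/prompts/examples/{mode}/host_agent_example.yaml"


def resolve_prompt_path(template: str, mode: str) -> str:
    """Fill the {mode} placeholder in a prompt path template."""
    return template.format(mode=mode)


# "visual" is a hypothetical mode name used only for this example.
print(resolve_prompt_path(EXAMPLE_PROMPT, "visual"))
```

When you point these fields at your own files, keeping the `{mode}` segment in the path means one setting can cover every mode-specific variant of the example prompt.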
OPERATOR-Specific Fields
| Field | Type | Required | Description |
|---|---|---|---|
| SCALER | List[int] | ❌ | Screen dimensions for visual input [width, height], default: [1024, 768] |
Example:
OPERATOR:
  SCALER: [1920, 1080]  # Full HD resolution
  API_MODEL: "computer-use-preview-20250311"
  # ... other settings
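To make the role of SCALER more concrete, here is a sketch of the coordinate arithmetic that applies if the screenshot is resized to the SCALER dimensions before the model sees it. That resizing behavior is an assumption made for this example, and `to_screen_coords` is a hypothetical helper, not a UFO² function.

```python
from typing import Tuple


def to_screen_coords(
    point: Tuple[int, int],
    scaler: Tuple[int, int],
    screen: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point in the scaled image back to native screen pixels."""
    x, y = point
    return (
        round(x * screen[0] / scaler[0]),  # scale horizontally
        round(y * screen[1] / scaler[1]),  # scale vertically
    )


# Centre of a 1024x768 image mapped onto a 1920x1080 display.
print(to_screen_coords((512, 384), (1024, 768), (1920, 1080)))
```

Under this assumption, a SCALER whose aspect ratio differs from the real display (as 1024x768 does from 1920x1080) scales the two axes by different factors, which is one reason to match SCALER to your screen.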
Complete Configuration Example
Here's a complete agents.yaml with all agent types configured:
# HOST_AGENT - Task planning and coordination
HOST_AGENT:
  VISUAL_MODE: True
  REASONING_MODEL: False
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_KEY_HERE"
  API_MODEL: "gpt-4o"
  API_VERSION: "2025-02-01-preview"
  PROMPT: "ufo/prompts/share/base/host_agent.yaml"
  EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/host_agent_example.yaml"
# APP_AGENT - Action execution
APP_AGENT:
  VISUAL_MODE: True
  REASONING_MODEL: False
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_KEY_HERE"
  API_MODEL: "gpt-4o-mini"  # Cheaper for frequent actions
  API_VERSION: "2025-02-01-preview"
  PROMPT: "ufo/prompts/share/base/app_agent.yaml"
  EXAMPLE_PROMPT: "ufo/prompts/examples/{mode}/app_agent_example.yaml"
  API_PROMPT: "ufo/prompts/share/base/api.yaml"
# BACKUP_AGENT - Fallback agent
BACKUP_AGENT:
  VISUAL_MODE: True
  REASONING_MODEL: False
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_KEY_HERE"
  API_MODEL: "gpt-4-vision-preview"
  API_VERSION: "2025-02-01-preview"
# EVALUATION_AGENT - Task evaluation
EVALUATION_AGENT:
  VISUAL_MODE: True
  REASONING_MODEL: False
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_KEY_HERE"
  API_MODEL: "gpt-4o"
  API_VERSION: "2025-02-01-preview"
# OPERATOR - OpenAI Operator (optional)
OPERATOR:
  SCALER: [1024, 768]  # Screen resolution for visual input
  VISUAL_MODE: True
  REASONING_MODEL: False
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_KEY_HERE"
  API_MODEL: "computer-use-preview-20250311"
  API_VERSION: "2025-03-01-preview"
Multi-Provider Configuration
You can use different providers for different agents:
# Use OpenAI for planning
HOST_AGENT:
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"
  API_KEY: "sk-YOUR_OPENAI_KEY"
  API_MODEL: "gpt-4o"
# Use Azure OpenAI for actions (cost control)
APP_AGENT:
  API_TYPE: "aoai"
  API_BASE: "https://mycompany.openai.azure.com"
  API_KEY: "YOUR_AZURE_KEY"
  API_MODEL: "gpt-4o-mini"
  API_DEPLOYMENT_ID: "gpt-4o-mini-deploy"
# Use Claude for evaluation
EVALUATION_AGENT:
  API_TYPE: "claude"
  API_BASE: "https://api.anthropic.com"
  API_KEY: "YOUR_CLAUDE_KEY"
  API_MODEL: "claude-3-5-sonnet-20241022"
Model Recommendations
For HOST_AGENT (Planning)
| Model | Provider | Pros | Cons |
|---|---|---|---|
| gpt-4o | OpenAI | Best overall, fast, multimodal | $$ |
| gpt-4-turbo | OpenAI | Good quality, cheaper than GPT-4 | Slower |
| claude-3-5-sonnet | Anthropic | Excellent reasoning | No vision API yet |
| gemini-2.0-flash | Google | Fast, cheap, multimodal | New, less tested |
For APP_AGENT (Execution)
| Model | Provider | Pros | Cons |
|---|---|---|---|
| gpt-4o-mini | OpenAI | 60% cheaper, fast, good quality | Slightly less capable |
| gpt-4o | OpenAI | Best quality | More expensive |
| gemini-1.5-flash | Google | Very cheap, fast | Less accurate |
For OPERATOR (CUA Mode)
| Model | Provider | Notes |
|---|---|---|
| computer-use-preview-20250311 | OpenAI | Supported model for Operator mode (Computer Use Agent) |
Reasoning Models
For models like OpenAI o1, o3, o3-mini, set REASONING_MODEL: True:
HOST_AGENT:
  REASONING_MODEL: True  # Enable for o1/o3/o3-mini
  API_TYPE: "openai"
  API_MODEL: "o3-mini"
  # ... other settings
Note: Reasoning models behave differently from standard chat models: they do not stream responses, they have different token limits, and they may be priced differently.
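Since REASONING_MODEL has to be set by hand, it is easy for the flag and the model name to drift apart. The sketch below flags such mismatches using a simple name-prefix heuristic based on the o1 and o3 families named above. Both helper functions are hypothetical, and the prefix rule is an assumption that may not hold for other or future model names.

```python
REASONING_PREFIXES = ("o1", "o3")  # model families named in this section


def looks_like_reasoning_model(api_model: str) -> bool:
    """Heuristic: treat model names starting with o1 or o3 as reasoning models."""
    return api_model.lower().startswith(REASONING_PREFIXES)


def reasoning_flag_mismatch(agent_cfg: dict) -> bool:
    """True when REASONING_MODEL disagrees with what the model name suggests."""
    expected = looks_like_reasoning_model(agent_cfg.get("API_MODEL", ""))
    actual = bool(agent_cfg.get("REASONING_MODEL", False))  # table default is False
    return actual != expected


print(reasoning_flag_mismatch({"API_MODEL": "o3-mini", "REASONING_MODEL": False}))
print(reasoning_flag_mismatch({"API_MODEL": "gpt-4o", "REASONING_MODEL": False}))
```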
Environment Variables
Instead of hardcoding API keys, you can use environment variables:
HOST_AGENT:
  API_KEY: "${OPENAI_API_KEY}"  # Reads from environment variable
APP_AGENT:
  API_KEY: "${AZURE_OPENAI_KEY}"
Setting environment variables:
Windows (PowerShell):
$env:OPENAI_API_KEY = "sk-your-key"
$env:AZURE_OPENAI_KEY = "your-azure-key"
Windows (Persistent):
[System.Environment]::SetEnvironmentVariable('OPENAI_API_KEY', 'sk-your-key', 'User')
Linux/macOS:
export OPENAI_API_KEY="sk-your-key"
export AZURE_OPENAI_KEY="your-azure-key"
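For intuition about what a `${NAME}` placeholder implies, the following sketch substitutes such placeholders from an environment mapping and fails loudly when a variable is unset. It illustrates the general technique only; the actual substitution logic in the UFO² config loader is not shown on this page, and `expand_env` is a hypothetical helper.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} with env[NAME]; raise KeyError if NAME is unset."""
    env = os.environ if env is None else env

    def _substitute(match: "re.Match") -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_substitute, value)


# An explicit mapping is passed here so the example does not depend on your shell.
print(expand_env("${OPENAI_API_KEY}", {"OPENAI_API_KEY": "sk-your-key"}))
```

Raising on an unset variable, rather than silently leaving the placeholder in place, surfaces a misconfigured shell before the literal string `${OPENAI_API_KEY}` is ever sent as a key.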
Programmatic Access
from config.config_loader import get_ufo_config
config = get_ufo_config()
# Access HOST_AGENT settings
host_model = config.host_agent.api_model
host_type = config.host_agent.api_type
host_visual = config.host_agent.visual_mode
# Access APP_AGENT settings
app_model = config.app_agent.api_model
app_key = config.app_agent.api_key
# Check if agent is configured
if config.host_agent.api_key:
    print("HOST_AGENT is configured")
else:
    print("Warning: HOST_AGENT API key not set")
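The same check can be generalized to several agents at once. The sketch below assumes only what the snippet above shows, namely that each agent is an attribute with an `api_key` field. It uses a stand-in object instead of calling `get_ufo_config()` so that it runs on its own, and `unconfigured_agents` is a hypothetical helper.

```python
from types import SimpleNamespace
from typing import Iterable, List


def unconfigured_agents(config: object, agent_names: Iterable[str]) -> List[str]:
    """Return the agent attribute names whose api_key is missing or empty."""
    missing = []
    for name in agent_names:
        agent = getattr(config, name, None)
        if agent is None or not getattr(agent, "api_key", ""):
            missing.append(name)
    return missing


# Stand-in for the object returned by get_ufo_config(), so the sketch runs alone.
fake_config = SimpleNamespace(
    host_agent=SimpleNamespace(api_key="sk-test"),
    app_agent=SimpleNamespace(api_key=""),
)

print(unconfigured_agents(fake_config, ["host_agent", "app_agent"]))
```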
Troubleshooting
Issue 1: "agents.yaml not found"
Error Message:
FileNotFoundError: config/ufo/agents.yaml not found
Solution: Copy the template file
Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
Issue 2: API Authentication Errors
Error Message:
openai.AuthenticationError: Invalid API key
Solutions:
1. Verify API key is correct
2. Check for extra spaces or quotes
3. Ensure API_TYPE matches your provider
4. For Azure, verify API_DEPLOYMENT_ID is set
Issue 3: Model Not Found
Error Message:
openai.NotFoundError: The model 'gpt-4o' does not exist
Solutions:
1. Verify model name is correct (check provider's documentation)
2. For Azure, ensure deployment exists and API_DEPLOYMENT_ID matches
3. Check if you have access to the model
Issue 4: Rate Limits
Error Message:
openai.RateLimitError: Rate limit exceeded
Solutions:
1. Add delays between requests (configure in system.yaml)
2. Upgrade your API plan
3. Use different API keys for different agents
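Outside of UFO²'s own settings in system.yaml, a common client-side way to add delays between requests is to retry with exponential backoff. The sketch below is a generic illustration of that technique and is not taken from UFO². `call_with_backoff` is a hypothetical helper, and it catches every exception for brevity, whereas real code would catch only the provider's rate-limit error.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:  # narrow this to the rate-limit error in real code
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Demo: a function that fails twice before succeeding; delays are recorded, not slept.
recorded: List[float] = []
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


result = call_with_backoff(flaky_request, sleep=recorded.append)
print(result, recorded)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in normal use the default `time.sleep` applies.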
Security Best Practices
API Key Security Guidelines:
- ✅ Never commit agents.yaml to Git
  - Add it to .gitignore
  - Only commit agents.yaml.template
- ✅ Use environment variables for production: API_KEY: "${OPENAI_API_KEY}"
- ✅ Rotate keys regularly
- ✅ Use separate keys for dev/prod environments
- ✅ Restrict key permissions (e.g., read-only for evaluation agents)
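One way to back these guidelines with a check is to scan the configuration text for API_KEY values that are literals rather than `${VAR}` references, for instance in a pre-commit hook. The sketch below is a simple line-based heuristic and not a full YAML parser; `hardcoded_key_lines` is a hypothetical helper, and placeholder values from the template would also be flagged.

```python
import re
from typing import List

# Captures the value of an API_KEY line, with or without surrounding quotes.
_KEY_LINE = re.compile(r"""^\s*API_KEY:\s*["']?([^"'#\n]*)["']?""")
# A value that is exactly one ${NAME} environment reference.
_ENV_REF = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def hardcoded_key_lines(yaml_text: str) -> List[int]:
    """Return 1-based line numbers where API_KEY holds a literal instead of ${VAR}."""
    flagged = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        match = _KEY_LINE.match(line)
        if match:
            value = match.group(1).strip()
            if value and not _ENV_REF.match(value):
                flagged.append(lineno)
    return flagged


sample = '''HOST_AGENT:
  API_KEY: "${OPENAI_API_KEY}"
APP_AGENT:
  API_KEY: "sk-live-abc123"
'''

print(hardcoded_key_lines(sample))
```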
Related Documentation
- Third-Party Agent Configuration - Configure external agents like LinuxAgent and HardwareAgent
- Creating Custom Third-Party Agents - Build your own specialized agents
- System Configuration - Runtime and execution settings
- MCP Configuration - Tool server configuration
- RAG Configuration - Knowledge retrieval settings
- Model Setup Guide - Provider-specific setup
- Migration Guide - Migrating from legacy config
Summary
Key Takeaways:
✅ Copy template first: Copy-Item config\ufo\agents.yaml.template config\ufo\agents.yaml
✅ Add your API keys: Edit agents.yaml with your credentials
✅ Choose models wisely: GPT-4o for planning, GPT-4o-mini for actions
✅ Never commit secrets: Keep agents.yaml out of version control
✅ Use environment variables: For production deployments
Your agents are now ready to work! 🚀