Supported Models
UFO supports a wide variety of LLM models and APIs. You can configure different models for HOST_AGENT, APP_AGENT, BACKUP_AGENT, and EVALUATION_AGENT in the config/ufo/agents.yaml file to optimize for performance, cost, or specific capabilities.
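A minimal sketch of the file's shape is shown below: agents.yaml holds one top-level entry per agent, and each entry at minimum selects a provider and a model. The model names here are illustrative; see the Multi-Provider Setup and Getting Started sections below for fuller examples.

```yaml
# Sketch of the overall shape of config/ufo/agents.yaml: one entry per agent,
# each selecting a provider (API_TYPE) and a model (API_MODEL).
HOST_AGENT:
  API_TYPE: "openai"
  API_MODEL: "gpt-4o"

EVALUATION_AGENT:
  API_TYPE: "openai"
  API_MODEL: "gpt-4o-mini"
```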
Available Model Integrations
| Provider | Documentation | Visual Support | Authentication |
|---|---|---|---|
| OpenAI | OpenAI API | ✅ | API Key |
| Azure OpenAI (AOAI) | Azure OpenAI API | ✅ | API Key / Azure AD |
| Google Gemini | Gemini API | ✅ | API Key |
| Anthropic Claude | Claude API | ✅ | API Key |
| Qwen (Alibaba) | Qwen API | ✅ | API Key |
| DeepSeek | DeepSeek API | ❌ | API Key |
| Ollama | Ollama API | ⚠️ Limited | Local |
| OpenAI Operator | Operator (CUA) | ✅ | Azure AD |
| Custom Models | Custom API | Depends | Varies |
Model Selection Guide
By Use Case
For Production Deployments:
- Primary: OpenAI GPT-4o or Azure OpenAI (enterprise features)
- Cost-optimized: GPT-4o-mini for APP_AGENT, GPT-4o for HOST_AGENT
- Privacy-sensitive: Ollama (local models)

For Development & Testing:
- Fast iteration: Gemini 2.0 Flash (high speed, low cost)
- Local testing: Ollama with llama2 or similar
- Budget-friendly: DeepSeek or Qwen models

For Specialized Tasks:
- Computer control: OpenAI Operator (CUA model)
- Code generation: DeepSeek-Coder or Claude
- Long context: Gemini 1.5 Pro (large context window)
By Capability
Vision Support (Screenshot Understanding):
- ✅ OpenAI GPT-4o, GPT-4-turbo
- ✅ Azure OpenAI (vision-enabled deployments)
- ✅ Google Gemini (all 1.5+ models)
- ✅ Claude 3+ (all variants)
- ✅ Qwen-VL models
- ⚠️ Ollama (llava models only)
- ❌ DeepSeek (text-only)

JSON Schema Support:
- ✅ OpenAI / Azure OpenAI
- ✅ Google Gemini
- ⚠️ Limited: Claude, Qwen, Ollama
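These capability differences map directly onto per-agent settings: vision-capable models can read screenshots, while text-only models should have visual input disabled. The sketch below illustrates this pairing; the VISUAL_MODE key and the "deepseek" API_TYPE value are assumptions based on UFO's sample configuration, so verify the exact names in the Agent Configuration Guide.

```yaml
# Sketch: match the model's capability to the agent's visual setting.
# VISUAL_MODE and the "deepseek" API_TYPE value are assumptions; verify
# the exact key names in the Agent Configuration Guide.
HOST_AGENT:
  API_TYPE: "openai"
  API_MODEL: "gpt-4o"        # vision-capable, can interpret screenshots
  VISUAL_MODE: True

APP_AGENT:
  API_TYPE: "deepseek"       # assumed provider identifier
  API_MODEL: "deepseek-chat" # text-only, so screenshot input is disabled
  VISUAL_MODE: False
```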
Configuration Architecture
Each model is implemented as a separate class in the ufo/llm directory, inheriting from the BaseService class in ufo/llm/base.py. All models implement the chat_completion method to maintain a consistent interface.
Key Configuration Files:
- config/ufo/agents.yaml: Primary agent configuration (HOST, APP, BACKUP, EVALUATION, OPERATOR)
- config/ufo/system.yaml: System-wide LLM parameters (MAX_TOKENS, TEMPERATURE, etc.)
- config/ufo/prices.yaml: Cost tracking for different models
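For orientation, here is a minimal sketch of the system-wide LLM parameters in config/ufo/system.yaml. Only MAX_TOKENS and TEMPERATURE are named above; the remaining keys and all values are illustrative assumptions, so check the System Configuration reference for the authoritative list and defaults.

```yaml
# Sketch of LLM parameters in config/ufo/system.yaml (illustrative values;
# keys other than MAX_TOKENS and TEMPERATURE are assumptions).
MAX_TOKENS: 2000    # upper bound on tokens generated per completion
TEMPERATURE: 0.0    # low temperature keeps agent actions deterministic
TOP_P: 1.0          # assumed nucleus-sampling cutoff
TIMEOUT: 60         # assumed per-request timeout in seconds
```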
Multi-Provider Setup
You can mix and match providers for different agents to optimize cost and performance:
```yaml
# Use OpenAI for planning
HOST_AGENT:
  API_TYPE: "openai"
  API_MODEL: "gpt-4o"

# Use Azure OpenAI for execution (cost control)
APP_AGENT:
  API_TYPE: "aoai"
  API_MODEL: "gpt-4o-mini"

# Use Claude for evaluation
EVALUATION_AGENT:
  API_TYPE: "claude"
  API_MODEL: "claude-3-5-sonnet-20241022"
```
Getting Started
- Choose your LLM provider from the table above
- Follow the provider-specific documentation to obtain API keys
- Configure config/ufo/agents.yaml with your credentials (see the sketch after this list)
- Refer to the Quick Start Guide to begin
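As a reference for the configuration step, here is a minimal agents.yaml sketch with credentials for an OpenAI endpoint and an Azure OpenAI deployment. The API_BASE and API_KEY field names follow UFO's sample configuration but are assumptions here; confirm the exact keys your provider requires in the Agent Configuration Guide, and keep real keys out of version control.

```yaml
# Sketch of config/ufo/agents.yaml with credentials (field names beyond
# API_TYPE and API_MODEL are assumptions based on UFO's sample configuration).
HOST_AGENT:
  API_TYPE: "openai"
  API_BASE: "https://api.openai.com/v1/chat/completions"  # assumed endpoint
  API_KEY: "sk-..."          # your OpenAI API key
  API_MODEL: "gpt-4o"

APP_AGENT:
  API_TYPE: "aoai"
  API_BASE: "https://<your-resource>.openai.azure.com"     # assumed AOAI endpoint
  API_KEY: "<azure-api-key>" # or switch to Azure AD authentication
  API_MODEL: "gpt-4o-mini"   # deployment / model name
```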
For detailed configuration options:
- Agent Configuration Guide - Complete configuration reference
- System Configuration - LLM parameters and behavior
- Quick Start Guide - Step-by-step setup