More Guidance
This page provides additional guidance and resources for different user types and use cases.
🎯 For End Users
If you want to use UFO³ to automate your tasks on Windows, Linux, or across multiple devices, here's your learning path:
1. Getting Started (5-10 minutes)
Choose your path based on your needs:
| Your Goal | Start Here | Time |
|---|---|---|
| Automate Windows desktop tasks | UFO² Quick Start | 5 min |
| Manage Linux servers | Linux Quick Start | 10 min |
| Orchestrate multiple devices | Galaxy Quick Start | 10 min |
2. Configure Your Environment (10-20 minutes)
After installation, customize UFO³ to your needs:
Essential Configuration:
- Agent Configuration - Set up LLM API keys (OpenAI, Azure, Gemini, Claude, etc.)
- System Configuration - Adjust runtime settings (step limits, timeouts, logging)
Optional Enhancements:
- RAG Configuration - Add external knowledge sources:
  - Offline help documents
  - Bing search integration
  - Experience learning from past tasks
  - User demonstrations
- MCP Configuration - Enable tool servers for:
  - Better Office automation
  - Linux command execution
  - Custom tool integration
💡 Configuration Tip: Start with default settings and adjust only what you need. See Configuration Overview for the big picture.
3. Learn Core Features (20-30 minutes)
For UFO² Users (Windows Desktop Automation):
| Feature | Documentation | What It Does |
|---|---|---|
| Hybrid GUI-API Execution | Hybrid Actions | Combines UI automation with native API calls for faster, more reliable execution |
| Knowledge Substrate | Knowledge Overview | Augments agents with external knowledge (docs, search, experience) |
| MCP Integration | MCP Overview | Extends capabilities with custom tools and Office APIs |
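The hybrid idea in the first row can be pictured as an ordered list of backends: try a native API path first, and fall back to GUI automation when no API covers the action or the API call fails. The sketch below is a conceptual illustration only. `ActionBackend`, `NativeApiBackend`, `GuiBackend`, and `HybridExecutor` are hypothetical names, not UFO²'s actual classes, and the backends return strings instead of driving real applications.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class ActionBackend(ABC):
    """Hypothetical execution backend interface (not UFO²'s real class)."""

    @abstractmethod
    def supports(self, action: str) -> bool:
        """Return True if this backend can perform the action."""

    @abstractmethod
    def execute(self, action: str) -> str:
        """Perform the action and return a short result string."""


class NativeApiBackend(ActionBackend):
    def __init__(self, known_actions: set[str]) -> None:
        self.known_actions = known_actions  # actions with a native API mapping

    def supports(self, action: str) -> bool:
        return action in self.known_actions

    def execute(self, action: str) -> str:
        return f"api:{action}"


class GuiBackend(ActionBackend):
    def supports(self, action: str) -> bool:
        return True  # GUI automation acts as the catch-all fallback

    def execute(self, action: str) -> str:
        return f"gui:{action}"


@dataclass
class HybridExecutor:
    # Backends are tried in order, so put the faster API path first.
    backends: list[ActionBackend] = field(default_factory=list)

    def run(self, action: str) -> str:
        for backend in self.backends:
            if backend.supports(action):
                try:
                    return backend.execute(action)
                except Exception:
                    continue  # on failure, try the next backend in the list
        raise RuntimeError(f"no backend could execute {action!r}")
```

For example, `HybridExecutor([NativeApiBackend({"insert_table"}), GuiBackend()]).run("insert_table")` takes the API path, while an action with no API mapping falls through to the GUI backend. The ordering of the list is the entire fallback policy, which keeps it easy to audit.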
For Galaxy Users (Multi-Device Orchestration):
| Feature | Documentation | What It Does |
|---|---|---|
| Task Constellation | Constellation Overview | Decomposes tasks into parallel DAGs across devices |
| Device Capabilities | Galaxy Devices Config | Routes tasks based on device capabilities and metadata |
| Asynchronous Execution | Constellation Overview | Executes subtasks in parallel for faster completion |
| Agent Interaction Protocol | AIP Overview | Enables persistent WebSocket communication between devices |
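To make the constellation rows more concrete, here is a minimal asyncio sketch of two of the ideas: each subtask waits only on its own dependencies (so independent branches of the DAG run concurrently), and each subtask is routed to the first device whose capability set covers what it needs. `Subtask`, `route`, and `run_constellation` are hypothetical, devices are modeled as plain capability sets, and dispatch is simulated; the real ConstellationAgent and AIP transport are not involved. The sketch assumes every dependency is named in the task list, and it rejects cycles up front with the standard `graphlib` module.

```python
import asyncio
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Subtask:
    name: str
    needs: set[str]                                # capabilities the subtask requires
    deps: list[str] = field(default_factory=list)  # prerequisite subtask names


def route(task: Subtask, devices: dict[str, set[str]]) -> str:
    """Return the first device whose capabilities cover everything the task needs."""
    for device_id, caps in devices.items():
        if task.needs <= caps:
            return device_id
    raise LookupError(f"no device can run {task.name!r}")


async def run_constellation(
    tasks: list[Subtask], devices: dict[str, set[str]]
) -> dict[str, str]:
    """Run each subtask as soon as its dependencies finish; return task -> device."""
    # Reject cycles before starting anything (raises graphlib.CycleError).
    TopologicalSorter({t.name: t.deps for t in tasks}).prepare()
    done = {t.name: asyncio.Event() for t in tasks}
    assigned: dict[str, str] = {}

    async def run_one(task: Subtask) -> None:
        for dep in task.deps:
            await done[dep].wait()  # wait only on this subtask's own prerequisites
        device = route(task, devices)
        await asyncio.sleep(0)      # stand-in for sending the subtask to the device
        assigned[task.name] = device
        done[task.name].set()

    # All subtasks start together; the events enforce the DAG ordering.
    await asyncio.gather(*(run_one(t) for t in tasks))
    return assigned
```

With `devices = {"win-1": {"windows", "office"}, "linux-1": {"linux", "shell"}}`, a `report` subtask that depends on two Linux subtasks is held until both finish and is then routed to `win-1`.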
4. Troubleshooting & Support
When Things Go Wrong:
- Check the FAQ - Common issues and solutions
- Review logs - Located in `logs/<task-name>/`, for example:
  ```text
  logs/my-task-2025-11-11/
  ├── request.log                  # Request logs
  ├── response.log                 # Response logs
  ├── action_step*.png             # Screenshots at each step
  └── action_step*_annotated.png   # Annotated screenshots
  ```
- Validate configuration:
  ```bash
  python -m ufo.tools.validate_config ufo --show-config
  ```
- Enable debug logging:
  ```yaml
  # config/ufo/system.yaml
  LOG_LEVEL: "DEBUG"
  ```
Get Help:
- GitHub Discussions - Ask questions, share tips
- GitHub Issues - Report bugs, request features
- Email: ufo-agent@microsoft.com
👨‍💻 For Developers
If you want to contribute to UFO³ or build extensions, here's your development guide:
1. Understand the Architecture (30-60 minutes)
Start with the big picture:
- Project Structure - Codebase organization and component roles
- Configuration Architecture - New modular config system design
Deep dive into core components:
| Component | Documentation | What to Learn |
|---|---|---|
| Session | Session Module | Task lifecycle management, state tracking |
| Round | Round Module | Single agent reasoning cycle |
| HostAgent | HostAgent | High-level task planning and app selection |
| AppAgent | AppAgent | Low-level action execution |
| ConstellationAgent | ConstellationAgent | Multi-device task orchestration |
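The Session and Round rows describe a loop: a session repeats reasoning rounds until the agent signals completion or the step limit set in System Configuration is reached. A stripped-down sketch of that control flow follows. The class shapes and the `"FINISH"` / `"MAX_STEP_REACHED"` status strings are assumptions for illustration, and `decide` stands in for the LLM call a real round would make.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Round:
    """One reasoning cycle (hypothetical simplification of observe, decide, act)."""
    step: int

    def run(self, decide: Callable[[int], str]) -> str:
        # A real round would capture the screen, prompt the LLM, and execute an action.
        return decide(self.step)


@dataclass
class Session:
    """Drives rounds until the agent reports completion or max_step is hit."""
    max_step: int
    history: list[str] = field(default_factory=list)
    status: str = "CONTINUE"

    def run(self, decide: Callable[[int], str]) -> str:
        for step in range(1, self.max_step + 1):
            action = Round(step).run(decide)
            self.history.append(action)
            if action == "FINISH":
                self.status = "FINISH"
                return self.status
        self.status = "MAX_STEP_REACHED"
        return self.status
```

Keeping the step limit in the session rather than in each round means a misbehaving agent is always stopped, whatever the individual rounds decide.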
2. Set Up Development Environment (15-30 minutes)
Installation:
```bash
# Clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO

# Create development environment
conda create -n ufo-dev python=3.10
conda activate ufo-dev

# Install dependencies (including dev tools)
pip install -r requirements.txt
pip install pytest pytest-cov black flake8  # Testing & linting
```
Configuration:
```bash
# Create config files from templates
cp config/ufo/agents.yaml.template config/ufo/agents.yaml
cp config/galaxy/agent.yaml.template config/galaxy/agent.yaml

# Edit with your development API keys
# (Consider using lower-cost models for testing)
```
3. Explore the Codebase (1-2 hours)
Key Directories:
```text
UFO/
├── ufo/                    # Core UFO² implementation
│   ├── agents/             # HostAgent, AppAgent
│   ├── automator/          # UI automation engines
│   ├── prompter/           # Prompt management
│   └── module/             # Core modules (Session, Round)
├── galaxy/                 # Galaxy orchestration framework
│   ├── agents/             # ConstellationAgent
│   ├── constellation/      # DAG orchestration
│   └── core/               # Core Galaxy infrastructure
├── aip/                    # Agent Interaction Protocol
│   ├── protocol/           # Message definitions
│   └── transport/          # WebSocket transport
├── ufo/client/             # Device agents (Windows, Linux)
│   ├── client.py           # Generic client
│   └── mcp/                # MCP integration
├── ufo/server/             # Device agent server
│   └── app.py              # FastAPI server
└── config/                 # Configuration system
    ├── ufo/                # UFO² configs
    └── galaxy/             # Galaxy configs
```
Entry Points:
- UFO² Main: `ufo/__main__.py`
- Galaxy Main: `galaxy/__main__.py`
- Server: `ufo/server/app.py`
- Client: `ufo/client/client.py`
4. Development Workflows
Adding a New Feature
- Identify the component to modify (Agent, Module, Automator, etc.)
- Read existing code in that component
- Check related tests in the `tests/` directory
- Implement your feature following existing patterns
- Add tests for your feature
- Update documentation if needed
Extending Configuration
See Extending Configuration for:
- Adding custom fields
- Creating new config modules
- Environment-specific overrides
- Plugin configuration patterns
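For the environment-specific overrides item, a common pattern is to keep one base configuration and layer a small per-environment dictionary over it, merging nested sections key by key. The sketch below shows that pattern on plain dictionaries. It is not UFO³'s own override mechanism (Extending Configuration covers that), and `deep_merge` and the keys in the usage note are illustrative.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where override wins; nested dicts are merged key by key."""
    merged = copy.deepcopy(base)  # never mutate the shared base config
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, `deep_merge({"system": {"max_step": 50, "log_level": "INFO"}}, {"system": {"log_level": "DEBUG"}})` keeps `max_step` and changes only `log_level`. Because the base dictionary is copied rather than modified, one base can serve several environments.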
Creating Custom MCP Servers
See Creating MCP Servers Tutorial for:
- MCP server architecture
- Tool definition and registration
- HTTP vs. local vs. stdio servers
- Integration with UFO³
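Tool definition and registration generally come down to mapping a name to a callable plus metadata (description, parameter names) that a server can advertise and dispatch on. The plain-Python sketch below illustrates only that idea. `tool`, `TOOLS`, `call_tool`, and `count_words` are hypothetical; a real server would use an MCP SDK as described in the tutorial rather than this hand-rolled registry.

```python
import inspect
from typing import Any, Callable

# Hypothetical in-process registry: tool name -> callable and metadata.
TOOLS: dict[str, dict[str, Any]] = {}


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register func under its name, recording its docstring and parameter names."""
    TOOLS[func.__name__] = {
        "func": func,
        "description": inspect.getdoc(func) or "",
        "params": list(inspect.signature(func).parameters),
    }
    return func


@tool
def count_words(text: str) -> int:
    """Count whitespace-separated words in text."""
    return len(text.split())


def call_tool(name: str, **kwargs: Any) -> Any:
    """Dispatch a call by name, as a server would for an incoming tool request."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool {name!r}")
    return TOOLS[name]["func"](**kwargs)
```

Deriving the description and parameters from the function itself keeps the advertised metadata from drifting away from the implementation.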
5. Testing & Debugging
Run Tests:
```bash
# Run all tests
pytest

# Run specific test file
pytest tests/config/test_config_system.py

# Run with coverage
pytest --cov=ufo --cov-report=html
```
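For the "Add tests for your feature" step: by default pytest collects functions whose names start with `test` from files named `test_*.py` or `*_test.py`, and a plain `assert` is enough to express each check. A minimal example follows. The file path and the `remaining_steps` helper are hypothetical; in a real change the helper would be imported from the module under test rather than defined in the test file.

```python
# tests/test_step_limit.py (hypothetical file name)


def remaining_steps(max_step: int, used: int) -> int:
    """Hypothetical helper under test: steps left before the limit, never negative."""
    return max(max_step - used, 0)


def test_remaining_steps_counts_down() -> None:
    assert remaining_steps(10, 3) == 7


def test_remaining_steps_never_negative() -> None:
    # Going past the limit should clamp at zero rather than go negative.
    assert remaining_steps(5, 8) == 0
```

Run it on its own with `pytest tests/test_step_limit.py -v`.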
Debug Logging:
```python
# Add debug logs to your code
import logging

logger = logging.getLogger(__name__)

task_name = "my-task"  # example value to include as context
logger.debug("Debug message with context: %s", task_name)
logger.info("Informational message")
logger.warning("Warning message")
logger.error("Error message")
```
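The `LOG_LEVEL: "DEBUG"` setting shown under Troubleshooting is a standard level name, which Python's `logging` module can translate. When running a script or experiment outside the framework, a sketch like the one below applies such a string to the root logger. `configure_logging` is a hypothetical helper; how UFO³ itself reads `LOG_LEVEL` is not shown here.

```python
import logging


def configure_logging(level_name: str = "INFO") -> int:
    """Apply a level name such as "DEBUG" to the root logger; return the numeric level."""
    level = logging.getLevelName(level_name.upper())
    # getLevelName returns a string like "Level VERBOSE" for unknown names.
    if not isinstance(level, int):
        raise ValueError(f"unknown log level {level_name!r}")
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )
    return level
```

The explicit type check matters because `logging.getLevelName` does not raise on an unrecognized name; without it, a typo such as `"DEBG"` would be passed on to `basicConfig` as a string.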
Interactive Debugging:
```python
# Add a breakpoint in code
import pdb; pdb.set_trace()
# On Python 3.7+, the built-in breakpoint() does the same by default
# Or use the VS Code debugger with launch.json
```
6. Code Style & Best Practices
Formatting:
```bash
# Auto-format with black
black ufo/ galaxy/

# Check style with flake8
flake8 ufo/ galaxy/
```
Best Practices:
- ✅ Use type hints: `def process(data: Dict[str, Any]) -> Optional[str]:`
- ✅ Write docstrings for public functions
- ✅ Follow existing code patterns
- ✅ Add comments for complex logic
- ✅ Keep functions focused and modular
- ✅ Handle errors gracefully
- ✅ Write tests for new features
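Several of these practices fit in one small function: type hints, a docstring, logging with context, and graceful handling of a failure that should not crash the caller. The `process` signature comes from the type-hint bullet; its body (serializing a `"result"` field to JSON) is invented purely for illustration.

```python
import json
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


def process(data: Dict[str, Any]) -> Optional[str]:
    """Serialize the "result" field of a response payload to JSON.

    Args:
        data: Parsed response payload.

    Returns:
        The JSON-encoded result, or None if the field is missing or not serializable.
    """
    if "result" not in data:
        logger.warning("payload has no 'result' field; keys=%s", sorted(data))
        return None
    try:
        return json.dumps(data["result"])
    except (TypeError, ValueError) as exc:
        # Log with context and return None instead of crashing the caller.
        logger.error("could not serialize result: %s", exc)
        return None
```

Catching only `TypeError` and `ValueError` (what `json.dumps` raises for unserializable or circular data) keeps unexpected bugs visible instead of silently swallowing them.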
Configuration Best Practices:
- ✅ Use typed config access: `config.system.max_step`
- ✅ Provide `.template` files for sensitive configs
- ✅ Document custom fields in YAML comments
- ✅ Use environment variables for secrets: `${OPENAI_API_KEY}`
- ✅ Validate configurations early: `ConfigValidator.validate()`
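The environment-variable and early-validation practices can be illustrated together. The sketch below expands `${NAME}` references from the environment and validates values when the config object is built. `expand_env`, `SystemConfig`, and `load_system_config` are hypothetical; they stand in for, rather than reproduce, the project's typed config classes and `ConfigValidator`.

```python
import os
import re
from dataclasses import dataclass
from typing import Any

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str) -> str:
    """Replace each ${NAME} with os.environ[NAME]; raise if a variable is unset."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]

    return _ENV_REF.sub(lookup, value)


@dataclass(frozen=True)
class SystemConfig:
    max_step: int
    api_key: str

    def __post_init__(self) -> None:
        # Validate once, at load time, so a bad config fails before any task runs.
        if self.max_step <= 0:
            raise ValueError("max_step must be a positive integer")
        if not self.api_key:
            raise ValueError("api_key must not be empty")


def load_system_config(raw: dict[str, Any]) -> SystemConfig:
    """Build a validated config from a parsed YAML mapping (illustrative keys)."""
    return SystemConfig(
        max_step=int(raw["max_step"]),
        api_key=expand_env(str(raw["api_key"])),
    )
```

The standard library's `os.path.expandvars` also understands `${NAME}`, but it leaves references to unset variables unchanged; raising instead makes a missing secret fail at startup rather than surfacing later as an authentication error.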
7. Contributing Guidelines
Before Submitting a PR:
- Test your changes thoroughly
- Update documentation if needed
- Follow code style (black + flake8)
- Write clear commit messages
- Reference related issues in PR description
PR Template:
```markdown
## Description
Brief description of changes

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Refactoring

## Testing
- [ ] Added tests for new functionality
- [ ] All tests pass locally
- [ ] Manual testing completed

## Checklist
- [ ] Code follows project style
- [ ] Documentation updated
- [ ] No breaking changes (or documented)
```
8. Advanced Topics
For Deep Customization:
- Prompt Engineering - Customize agent prompts
- State Management - Constellation state machine internals
- Protocol Extensions - Extend AIP message types
- Custom Automators - Implement new automation backends
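For protocol extensions, one common way to add message types is a tagged envelope: each message carries a `type` field, and the receiver looks up the matching class to decode the payload. The sketch below shows that pattern with dataclasses and JSON, the kind of text payload a WebSocket carries. It does not reproduce AIP's actual message definitions in `aip/protocol/`; `message_type`, `Heartbeat`, `TaskProgress`, `encode`, and `decode` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, ClassVar

# Hypothetical registry: type tag -> message class.
MESSAGE_TYPES: dict[str, type] = {}


def message_type(cls: type) -> type:
    """Register a message dataclass under its TYPE tag so it can be decoded later."""
    MESSAGE_TYPES[cls.TYPE] = cls
    return cls


@message_type
@dataclass
class Heartbeat:
    TYPE: ClassVar[str] = "heartbeat"  # ClassVar: not part of the payload
    device_id: str


@message_type
@dataclass
class TaskProgress:  # a new, custom message type
    TYPE: ClassVar[str] = "task_progress"
    task_id: str
    percent: int


def encode(msg: Any) -> str:
    """Wrap a message in a {"type", "payload"} envelope as JSON text."""
    return json.dumps({"type": msg.TYPE, "payload": asdict(msg)})


def decode(text: str) -> Any:
    """Rebuild the message object; raises KeyError for unregistered types."""
    envelope = json.loads(text)
    cls = MESSAGE_TYPES[envelope["type"]]
    return cls(**envelope["payload"])
```

Adding a message type then means defining one decorated dataclass; neither `encode` nor `decode` needs to change.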
🎓 Learning Paths
Path 1: Basic User → Power User
- ✅ Complete quick start for your platform
- ✅ Run 5-10 simple automation tasks
- ✅ Configure RAG for your organization's docs
- ✅ Enable MCP for better Office automation
- ✅ Set up experience learning for common tasks
- ✅ Create custom device configurations (Galaxy)
Time Investment: 2-4 hours
Outcome: Efficient automation of daily tasks
Path 2: Power User → Developer
- ✅ Understand project structure and architecture
- ✅ Read Session and Round module code
- ✅ Create a custom MCP server
- ✅ Add custom metadata to device configs
- ✅ Contribute documentation improvements
- ✅ Submit your first bug fix PR
Time Investment: 10-20 hours
Outcome: Ability to extend and customize UFO³
Path 3: Developer → Core Contributor
- ✅ Deep dive into agent implementations
- ✅ Understand Galaxy orchestration internals
- ✅ Study AIP protocol and transport layer
- ✅ Implement a new agent capability
- ✅ Add support for a new LLM provider
- ✅ Contribute major features or refactorings
Time Investment: 40+ hours
Outcome: Core contributor to UFO³ project
📚 Additional Resources
Documentation Hubs
| Topic | Link | Description |
|---|---|---|
| Getting Started | Getting Started Index | All quick start guides |
| Configuration | Configuration Overview | Complete config system documentation |
| Architecture | Galaxy Overview, UFO² Overview | System architecture and design |
| API Reference | Agent APIs | Agent interfaces and APIs |
| Tutorials | Creating Device Agents | Step-by-step guides |
Community Resources
- GitHub Repository - Source code and releases
- GitHub Discussions - Q&A and community
- GitHub Issues - Bug reports and features
- Project Website - Official website
Research Papers
- UFO v1 (Feb 2024): A UI-Focused Agent for Windows OS Interaction
- UFO² v2 (Apr 2025): A Windows Agent for Seamless OS Interaction
- UFO³ Galaxy (Nov 2025): UFO³: Weaving the Digital Agent Galaxy (Coming Soon)
🆘 Need More Help?
- Can't find what you're looking for? Check the FAQ
- Still stuck? Ask on GitHub Discussions
- Found a bug? Open an issue on GitHub Issues
- Want to contribute? Read the Contributing Guidelines
Happy automating! 🚀