Project Directory Structure
This repository implements UFO³, a multi-tier AgentOS architecture spanning from single-device automation (UFO²) to cross-device orchestration (Galaxy). This document provides an overview of the directory structure to help you understand the codebase organization.
New to UFO³? Start with the Documentation Home for an introduction and Quick Start Guide to get up and running.
Architecture Overview:
- Galaxy: Multi-device DAG-based orchestration framework that coordinates agents across different platforms
- UFO²: Single-device Windows desktop agent system that can serve as Galaxy's sub-agent
- AIP: Agent Interaction Protocol for cross-device communication
- Modular Configuration: Type-safe configs in config/galaxy/ and config/ufo/
Root Directory Structure
UFO/
├── galaxy/ # Multi-device orchestration framework
├── ufo/ # Desktop AgentOS (can be Galaxy sub-agent)
├── config/ # Modular configuration system
├── aip/ # Agent Interaction Protocol
├── documents/ # MkDocs documentation site
├── vectordb/ # Vector database for RAG
├── learner/ # Help document indexing tools
├── record_processor/ # Human demonstration parser
├── dataflow/ # Data collection pipeline
├── model_worker/ # Custom LLM deployment tools
├── logs/ # Execution logs (auto-generated)
├── scripts/ # Utility scripts
├── tests/ # Unit and integration tests
└── requirements.txt # Python dependencies
Galaxy Framework (galaxy/)
The cross-device orchestration framework that transforms natural language requests into executable DAG workflows distributed across heterogeneous devices.
Directory Structure
galaxy/
├── agents/ # Constellation orchestration agents
│   ├── agent/ # ConstellationAgent and basic agent classes
│   ├── states/ # Agent state machines
│   ├── processors/ # Request/result processing
│   └── presenters/ # Response formatting
│
├── constellation/ # Core DAG management system
│   ├── task_constellation.py # TaskConstellation - DAG container
│   ├── task_star.py # TaskStar - Task nodes
│   ├── task_star_line.py # TaskStarLine - Dependency edges
│   ├── enums.py # Enums for constellation components
│   ├── editor/ # Interactive DAG editing with undo/redo
│   └── orchestrator/ # Event-driven execution coordination
│
├── session/ # Session lifecycle management
│   ├── galaxy_session.py # GalaxySession implementation
│   └── observers/ # Event-driven observers
│
├── client/ # Device management
│   ├── constellation_client.py # Device registration interface
│   ├── device_manager.py # Device management coordinator
│   ├── config_loader.py # Configuration loading
│   ├── components/ # Device registry, connection manager, etc.
│   └── support/ # Client support utilities
│
├── core/ # Foundational components
│   ├── types.py # Type system (protocols, dataclasses, enums)
│   ├── interfaces.py # Interface definitions
│   ├── di_container.py # Dependency injection container
│   └── events.py # Event system
│
├── visualization/ # Rich console visualization
│   ├── dag_visualizer.py # DAG topology visualization
│   ├── task_display.py # Task status displays
│   └── components/ # Visualization components
│
├── prompts/ # Prompt templates
│   ├── constellation_agent/ # ConstellationAgent prompts
│   └── share/ # Shared examples
│
├── trajectory/ # Execution trajectory parsing
│
├── __main__.py # Entry point: python -m galaxy
├── galaxy.py # Main Galaxy orchestrator
├── galaxy_client.py # Galaxy client interface
├── README.md # Galaxy overview
└── README_ZH.md # Galaxy overview (Chinese)
Key Components
| Component | Description | Documentation |
|---|---|---|
| ConstellationAgent | AI-powered agent that generates and modifies task DAGs | Galaxy Overview |
| TaskConstellation | DAG container with validation and state management | Constellation |
| TaskOrchestrator | Event-driven execution coordinator | Constellation Orchestrator |
| DeviceManager | Multi-device coordination and assignment | Device Manager |
| Visualization | Rich console DAG monitoring | Galaxy Overview |
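To make the DAG idea concrete, here is a minimal sketch of how task nodes and dependency edges determine which tasks can run in parallel. It does not use the real TaskConstellation, TaskStar, or TaskStarLine classes: the `Task` dataclass, the `execution_waves` helper, and the device names are all hypothetical, and the ordering comes from Python's standard `graphlib` module rather than Galaxy's orchestrator.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    """Hypothetical stand-in for a task node (not the real TaskStar class)."""
    task_id: str
    device: str
    depends_on: list[str] = field(default_factory=list)


def execution_waves(tasks: list[Task]) -> list[list[str]]:
    """Group task ids into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter({t.task_id: t.depends_on for t in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks that could run concurrently
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Assumed example workflow: two independent tasks feed a final merge step.
tasks = [
    Task("collect_logs", device="linux_server"),
    Task("export_report", device="windows_pc"),
    Task("merge_results", device="windows_pc",
         depends_on=["collect_logs", "export_report"]),
]
print(execution_waves(tasks))
# [['collect_logs', 'export_report'], ['merge_results']]
```

The first wave holds the two independent tasks, which could be dispatched to different devices at the same time; the merge step waits until both have finished.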
Galaxy Documentation:
- Galaxy Overview - Architecture and concepts
- Quick Start - Get started with Galaxy
- Constellation Agent - AI-powered task planning
- Constellation Orchestrator - Event-driven coordination
- Device Manager - Multi-device management
UFO² Desktop AgentOS (ufo/)
Single-device desktop automation system implementing a two-tier agent architecture (HostAgent + AppAgent) with hybrid GUI-API automation.
Directory Structure
ufo/
├── agents/ # Two-tier agent implementation
│   ├── agent/ # Base agent classes (HostAgent, AppAgent)
│   ├── states/ # State machine implementations
│   ├── processors/ # Processing strategy pipelines
│   ├── memory/ # Agent memory and blackboard
│   └── presenters/ # Response presentation logic
│
├── server/ # Server-client architecture components
│   ├── websocket_server.py # WebSocket server for remote agent control
│   └── handlers/ # Request handlers
│
├── client/ # MCP client and device management
│   ├── mcp/ # MCP server manager
│   │   ├── local_servers/ # Built-in MCP servers (UI, CLI, Office COM)
│   │   └── http_servers/ # Remote MCP servers (hardware, Linux)
│   ├── ufo_client.py # UFO² client implementation
│   └── computer.py # Computer/device abstraction
│
├── automator/ # GUI and API automation layer
│   ├── ui_control/ # GUI automation (inspector, controller)
│   ├── puppeteer/ # Execution orchestration
│   └── *_automator.py # App-specific automators (Excel, Word, etc.)
│
├── prompter/ # Prompt construction engines
├── prompts/ # Jinja2 prompt templates
│   ├── host_agent/ # HostAgent prompts
│   ├── app_agent/ # AppAgent prompts
│   └── share/ # Shared components
│
├── llm/ # LLM provider integrations
├── rag/ # Retrieval-Augmented Generation
├── trajectory/ # Task trajectory parsing
├── experience/ # Self-experience learning
├── module/ # Core modules (session, round, context)
├── config/ # Legacy config support
├── logging/ # Logging utilities
├── utils/ # Utility functions
├── tools/ # CLI tools (config conversion, etc.)
│
├── __main__.py # Entry point: python -m ufo
└── ufo.py # Main UFO² orchestrator
Key Components
| Component | Description | Documentation |
|---|---|---|
| HostAgent | Desktop-level orchestration with 7-state FSM | HostAgent Overview |
| AppAgent | Application-level execution with 6-state FSM | AppAgent Overview |
| MCP System | Extensible command execution framework | MCP Overview |
| Automator | Hybrid GUI-API automation with fallback | Core Features |
| RAG | Knowledge retrieval from multiple sources | Knowledge Substrate |
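The state machines mentioned in the table can be pictured with a small sketch like the one below. The four states and the transition table are hypothetical and chosen only for illustration; the real HostAgent and AppAgent define their own state sets (seven and six states respectively) under ufo/agents/states/.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical states; the real agents define their own state sets."""
    CONTINUE = auto()
    PENDING = auto()
    FINISH = auto()
    FAIL = auto()


# Allowed transitions: current state -> states it may move to (assumed).
TRANSITIONS = {
    State.CONTINUE: {State.CONTINUE, State.PENDING, State.FINISH, State.FAIL},
    State.PENDING: {State.CONTINUE, State.FAIL},
    State.FINISH: set(),  # terminal
    State.FAIL: set(),    # terminal
}


class AgentStateMachine:
    """Tracks the current state and rejects transitions not in the table."""

    def __init__(self) -> None:
        self.state = State.CONTINUE
        self.history = [self.state]

    def transition(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state
        self.history.append(new_state)

    @property
    def is_terminal(self) -> bool:
        return not TRANSITIONS[self.state]


fsm = AgentStateMachine()
fsm.transition(State.PENDING)   # e.g. waiting for user confirmation
fsm.transition(State.CONTINUE)  # resume work
fsm.transition(State.FINISH)    # task completed
print([s.name for s in fsm.history])
# ['CONTINUE', 'PENDING', 'CONTINUE', 'FINISH']
```

Keeping the allowed transitions in one table makes illegal moves fail loudly instead of leaving an agent in an undefined state.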
UFO² Documentation:
- UFO² Overview - Architecture and concepts
- Quick Start - Get started with UFO²
- HostAgent States - Desktop orchestration states
- AppAgent States - Application execution states
- As Galaxy Device - Using UFO² as Galaxy sub-agent
- Creating Custom Agents - Build your own application agents
Agent Interaction Protocol (aip/)
Standardized message passing protocol for cross-device communication between Galaxy and UFO² agents.
aip/
├── messages.py # Message types (Command, Result, Event, Error)
├── protocol/ # Protocol definitions
├── transport/ # Transport layers (HTTP, WebSocket, MQTT)
├── endpoints/ # API endpoints
├── extensions/ # Protocol extensions
└── resilience/ # Retry and error handling
Purpose: Enables Galaxy to coordinate UFO² agents running on different devices and platforms through standardized messaging over HTTP/WebSocket.
Documentation: See AIP Overview for protocol details and Message Types for message specifications.
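As a rough illustration of standardized messaging, the sketch below round-trips a command through JSON and pairs a result with it by id. The `Message` envelope, its field names, and the `run_shell` payload are hypothetical and are not the actual AIP schema; see aip/messages.py and the Message Types documentation for the real definitions.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

# The four kinds mirror the message types listed for messages.py.
VALID_KINDS = {"command", "result", "event", "error"}


@dataclass
class Message:
    """Hypothetical envelope; field names are illustrative, not the AIP schema."""
    kind: str
    payload: dict
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    reply_to: Optional[str] = None  # id of the command that a result answers

    def __post_init__(self) -> None:
        if self.kind not in VALID_KINDS:
            raise ValueError(f"unknown message kind: {self.kind!r}")

    def to_json(self) -> str:
        """Serialize for any text-based transport."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        return cls(**json.loads(raw))


# Orchestrator side: build a command and put it on the wire.
command = Message(kind="command", payload={"action": "run_shell", "args": ["uptime"]})
wire = command.to_json()

# Device side: decode the command and answer with a correlated result.
received = Message.from_json(wire)
result = Message(kind="result", payload={"status": "ok"}, reply_to=received.message_id)

print(received == command, result.reply_to == command.message_id)
# True True
```

Carrying the original message id in `reply_to` lets the sender match results to commands even when several are in flight on the same connection.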
Linux Agent
Lightweight CLI-based agent for Linux devices that integrates with Galaxy as a third-party device agent.
Key Features:
- CLI Execution: Execute shell commands on Linux systems
- Galaxy Integration: Register as a device in Galaxy's multi-device orchestration
- Simple Architecture: Minimal dependencies, easy deployment
- Cross-Platform Tasks: Enable Windows + Linux workflows in Galaxy
Configuration: Configured in config/ufo/third_party.yaml under THIRD_PARTY_AGENT_CONFIG.LinuxAgent
Linux Agent Documentation:
- Linux Agent Overview - Architecture and capabilities
- Quick Start - Setup and deployment
- As Galaxy Device - Integration with Galaxy
Mobile Agent
Android device automation agent that enables UI automation, app control, and mobile-specific operations through ADB integration.
Key Features:
- UI Automation: Touch, swipe, and text input via ADB
- Visual Context: Screenshot capture and UI hierarchy analysis
- App Management: Launch apps, navigate between applications
- Galaxy Integration: Serve as a mobile device in cross-platform workflows
- Platform Support: Android devices (physical and emulators)
Configuration: Configured in config/ufo/third_party.yaml under THIRD_PARTY_AGENT_CONFIG.MobileAgent
Mobile Agent Documentation:
- Mobile Agent Overview - Architecture and capabilities
- Quick Start - Setup and deployment
- As Galaxy Device - Integration with Galaxy
Configuration (config/)
Modular configuration system with type-safe schemas and auto-discovery.
config/
├── galaxy/ # Galaxy configuration
│   ├── agent.yaml.template # ConstellationAgent LLM settings template
│   ├── agent.yaml # ConstellationAgent LLM settings (active)
│   ├── constellation.yaml # Constellation orchestration settings
│   ├── devices.yaml # Multi-device registry
│   └── dag_templates/ # Pre-built DAG templates (future)
│
├── ufo/ # UFO² configuration
│   ├── agents.yaml.template # Agent LLM configs template
│   ├── agents.yaml # Agent LLM configs (active)
│   ├── system.yaml # System settings
│   ├── rag.yaml # RAG settings
│   ├── mcp.yaml # MCP server configs
│   ├── third_party.yaml # Third-party agent configs (LinuxAgent, etc.)
│   └── prices.yaml # API pricing data
│
├── config_loader.py # Auto-discovery config loader
└── config_schemas.py # Pydantic validation schemas
Configuration Files:
- Template files (.yaml.template) should be copied to .yaml and edited
- Active config files (.yaml) contain API keys and should NOT be committed
- Galaxy: Uses config/galaxy/agent.yaml for ConstellationAgent LLM settings
- UFO²: Uses config/ufo/agents.yaml for HostAgent/AppAgent LLM settings
- Third-Party: Configure LinuxAgent and HardwareAgent in config/ufo/third_party.yaml
- Use python -m ufo.tools.convert_config to migrate from legacy configs
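The copy-the-template step can be scripted. The helper below is a minimal sketch and is not part of the repository: `activate_templates` is a hypothetical name, and the demo runs against a temporary directory rather than the real config/ folder. It copies every .yaml.template file to its .yaml counterpart and skips files that already exist, so an active config holding API keys is never overwritten.

```python
import shutil
import tempfile
from pathlib import Path


def activate_templates(config_root: Path) -> list[Path]:
    """Copy each *.yaml.template to *.yaml unless the active file already exists."""
    created = []
    for template in sorted(config_root.rglob("*.yaml.template")):
        active = template.with_suffix("")  # "agents.yaml.template" -> "agents.yaml"
        if active.exists():
            continue  # never overwrite a config that may already hold API keys
        shutil.copyfile(template, active)
        created.append(active)
    return created


# Demo in a throwaway directory that mimics the config/ufo/ layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ufo").mkdir()
    (root / "ufo" / "agents.yaml.template").write_text("# fill in your settings\n")

    print([p.name for p in activate_templates(root)])  # ['agents.yaml']
    print(activate_templates(root))                    # [] (nothing left to copy)
```

To apply it to a checkout, call `activate_templates(Path("config"))` from the repository root and then edit the generated files.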
Configuration Documentation:
- Configuration Overview - System architecture
- Agents Configuration - LLM and agent settings
- System Configuration - Runtime and execution settings
- RAG Configuration - Knowledge retrieval
- Third-Party Configuration - LinuxAgent and external agents
- MCP Configuration - MCP server setup
- Model Configuration - LLM provider setup
Documentation (documents/)
MkDocs documentation site with comprehensive guides and API references.
documents/
├── docs/ # Markdown documentation source
│   ├── getting_started/ # Installation and quick starts
│   ├── galaxy/ # Galaxy framework docs
│   ├── ufo2/ # UFO² architecture docs
│   ├── linux/ # Linux agent documentation
│   ├── mcp/ # MCP server documentation
│   ├── aip/ # Agent Interaction Protocol docs
│   ├── configuration/ # Configuration guides
│   ├── infrastructure/ # Core infrastructure (agents, modules)
│   ├── server/ # Server-client architecture docs
│   ├── client/ # Client components docs
│   ├── tutorials/ # Step-by-step tutorials
│   ├── modules/ # Module-specific docs
│   └── about/ # Project information
│
├── mkdocs.yml # MkDocs configuration
└── site/ # Generated static site
Documentation Sections:
| Section | Description |
|---|---|
| Getting Started | Installation, quick starts, migration guides |
| Galaxy | Multi-device orchestration, DAG workflows, device management |
| UFO² | Desktop agents, automation features, benchmarks |
| Linux | Linux agent integration, CLI executor for Galaxy |
| MCP | Server documentation, custom server development |
| AIP | Agent Interaction Protocol, message types, transport layers |
| Configuration | System settings, model configs, deployment |
| Infrastructure | Core components, agent design, server-client architecture |
| Tutorials | Creating agents, custom automators, advanced usage |
Supporting Modules
VectorDB (vectordb/)
Vector database storage for RAG knowledge sources (help documents, execution traces, user demonstrations). See RAG Configuration for setup details.
Learner (learner/)
Tools for indexing help documents into vector database for RAG retrieval. Integrates with the Knowledge Substrate feature.
Record Processor (record_processor/)
Parses human demonstrations from Windows Step Recorder for learning from user actions.
Dataflow (dataflow/)
Data collection pipeline for Large Action Model (LAM) training. See the Dataflow documentation for workflow details.
Model Worker (model_worker/)
Custom LLM deployment tools for running local models. See Model Configuration for supported providers.
Logs (logs/)
Auto-generated execution logs organized by task and timestamp, including screenshots, UI trees, and agent actions.
Galaxy vs UFO² vs Linux Agent vs Mobile Agent: When to Use What?
| Aspect | Galaxy | UFO² | Linux Agent | Mobile Agent |
|---|---|---|---|---|
| Scope | Multi-device orchestration | Single-device Windows automation | Single-device Linux CLI | Single-device Android automation |
| Use Cases | Cross-platform workflows, distributed tasks | Desktop automation, Office tasks | Server management, CLI operations | Mobile app testing, UI automation |
| Architecture | DAG-based task workflows | Two-tier state machines | Simple CLI executor | UI automation via ADB |
| Platform | Orchestrator (platform-agnostic) | Windows | Linux | Android |
| Complexity | Complex multi-step workflows | Simple to moderate tasks | Simple command execution | UI interaction and app control |
| Best For | Cross-device collaboration | Windows desktop tasks | Linux server operations | Mobile app automation |
| Integration | Orchestrates all agents | Can be Galaxy device | Can be Galaxy device | Can be Galaxy device |
Choosing the Right Framework:
- Use Galaxy when: Tasks span multiple devices/platforms, complex workflows with dependencies
- Use UFO² Standalone when: Single-device Windows automation, rapid prototyping
- Use Linux Agent when: Linux server/CLI operations needed in Galaxy workflows
- Use Mobile Agent when: Android device automation, mobile app testing, UI interactions
- Best Practice: Galaxy orchestrates UFO² (Windows) + Linux Agent (Linux) + Mobile Agent (Android) for comprehensive cross-platform tasks
Quick Start
Galaxy Multi-Device Orchestration
# Interactive mode
python -m galaxy --interactive
# Single request
python -m galaxy --request "Your cross-device task"
Documentation: Galaxy Quick Start
UFO² Desktop Automation
# Interactive mode
python -m ufo --task <task_name>
# With custom config
python -m ufo --task <task_name> --config_path config/ufo/
Documentation: UFO² Quick Start
Key Documentation Links
Getting Started
- Installation & Setup
- Galaxy Quick Start
- UFO² Quick Start
- Linux Agent Quick Start
- Mobile Agent Quick Start
- Migration Guide
Galaxy Framework
UFO² Desktop AgentOS
Linux Agent
Mobile Agent
MCP System
Agent Interaction Protocol
Configuration
- Configuration Overview
- Agents Configuration
- System Configuration
- Model Configuration
- MCP Configuration
Architecture Principles
UFO³ follows SOLID principles and established software engineering patterns:
- Single Responsibility: Each component has a focused purpose
- Open/Closed: Extensible through interfaces and plugins
- Interface Segregation: Focused interfaces for different capabilities
- Dependency Inversion: Dependency injection for loose coupling
- Event-Driven: Observer pattern for real-time monitoring
- State Machines: Well-defined states and transitions for agents
- Command Pattern: Encapsulated DAG editing with undo/redo
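The Command Pattern entry can be illustrated with a short sketch of undoable graph edits. Everything here is hypothetical: `Graph`, `AddNode`, `AddEdge`, and `Editor` are illustrative names and do not reflect the actual classes under galaxy/constellation/editor/. The point is only that each edit is an object that knows how to apply and reverse itself, which is what makes undo and redo straightforward.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Minimal DAG stand-in: node names plus (source, target) edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)


@dataclass
class AddNode:
    name: str

    def execute(self, graph: Graph) -> None:
        graph.nodes.add(self.name)

    def undo(self, graph: Graph) -> None:
        graph.nodes.discard(self.name)


@dataclass
class AddEdge:
    source: str
    target: str

    def execute(self, graph: Graph) -> None:
        if not {self.source, self.target} <= graph.nodes:
            raise ValueError("both endpoints must exist before adding an edge")
        graph.edges.add((self.source, self.target))

    def undo(self, graph: Graph) -> None:
        graph.edges.discard((self.source, self.target))


class Editor:
    """Runs commands and keeps two stacks so edits can be undone and redone."""

    def __init__(self, graph: Graph) -> None:
        self.graph = graph
        self._done = []
        self._undone = []

    def run(self, command) -> None:
        command.execute(self.graph)
        self._done.append(command)
        self._undone.clear()  # a new edit invalidates the redo history

    def undo(self) -> None:
        if self._done:
            command = self._done.pop()
            command.undo(self.graph)
            self._undone.append(command)

    def redo(self) -> None:
        if self._undone:
            command = self._undone.pop()
            command.execute(self.graph)
            self._done.append(command)


editor = Editor(Graph())
editor.run(AddNode("a"))
editor.run(AddNode("b"))
editor.run(AddEdge("a", "b"))
editor.undo()
print(editor.graph.edges)  # set()
editor.redo()
print(editor.graph.edges)  # {('a', 'b')}
```

Because the editor only sees `execute` and `undo`, new edit types can be added without touching the undo/redo logic, which also fits the Open/Closed principle listed above.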
Additional Resources
- GitHub Repository - Source code and issues
- Research Paper - UFO³ technical details
- Documentation Site - Full documentation
- Video Demo - YouTube demonstration
Next Steps:
- Start with Galaxy Quick Start for multi-device orchestration
- Or explore UFO² Quick Start for single-device automation
- Check FAQ for common questions
- Join our community and contribute!