Project Directory Structure

This repository implements UFO³, a multi-tier AgentOS architecture spanning from single-device automation (UFO²) to cross-device orchestration (Galaxy). This document provides an overview of the directory structure to help you understand the codebase organization.

New to UFO³? Start with the Documentation Home for an introduction, then follow the Quick Start Guide to get up and running.

Architecture Overview:

  • 🌌 Galaxy: Multi-device DAG-based orchestration framework that coordinates agents across different platforms
  • 🎯 UFO²: Single-device Windows desktop agent system that can serve as Galaxy's sub-agent
  • 🔌 AIP: Agent Interaction Protocol for cross-device communication
  • โš™๏ธ Modular Configuration: Type-safe configs in config/galaxy/ and config/ufo/

📦 Root Directory Structure

UFO/
├── galaxy/                 # 🌌 Multi-device orchestration framework
├── ufo/                    # 🎯 Desktop AgentOS (can be Galaxy sub-agent)
├── config/                 # ⚙️ Modular configuration system
├── aip/                    # 🔌 Agent Interaction Protocol
├── documents/              # 📖 MkDocs documentation site
├── vectordb/               # 🗄️ Vector database for RAG
├── learner/                # 📚 Help document indexing tools
├── record_processor/       # 🎥 Human demonstration parser
├── dataflow/               # 📊 Data collection pipeline
├── model_worker/           # 🤖 Custom LLM deployment tools
├── logs/                   # 📝 Execution logs (auto-generated)
├── scripts/                # 🛠️ Utility scripts
├── tests/                  # 🧪 Unit and integration tests
└── requirements.txt        # 📦 Python dependencies

🌌 Galaxy Framework (galaxy/)

The cross-device orchestration framework that transforms natural language requests into executable DAG workflows distributed across heterogeneous devices.

Directory Structure

galaxy/
├── agents/                 # 🤖 Constellation orchestration agents
│   ├── agent/              # ConstellationAgent and basic agent classes
│   ├── states/             # Agent state machines
│   ├── processors/         # Request/result processing
│   └── presenters/         # Response formatting
│
├── constellation/          # 🌟 Core DAG management system
│   ├── task_constellation.py    # TaskConstellation - DAG container
│   ├── task_star.py        # TaskStar - Task nodes
│   ├── task_star_line.py   # TaskStarLine - Dependency edges
│   ├── enums.py            # Enums for constellation components
│   ├── editor/             # Interactive DAG editing with undo/redo
│   └── orchestrator/       # Event-driven execution coordination
│
├── session/                # 📊 Session lifecycle management
│   ├── galaxy_session.py   # GalaxySession implementation
│   └── observers/          # Event-driven observers
│
├── client/                 # 📡 Device management
│   ├── constellation_client.py              # Device registration interface
│   ├── device_manager.py                    # Device management coordinator
│   ├── config_loader.py                     # Configuration loading
│   ├── components/         # Device registry, connection manager, etc.
│   └── support/            # Client support utilities
│
├── core/                   # ⚡ Foundational components
│   ├── types.py            # Type system (protocols, dataclasses, enums)
│   ├── interfaces.py       # Interface definitions
│   ├── di_container.py     # Dependency injection container
│   └── events.py           # Event system
│
├── visualization/          # 🎨 Rich console visualization
│   ├── dag_visualizer.py   # DAG topology visualization
│   ├── task_display.py     # Task status displays
│   └── components/         # Visualization components
│
├── prompts/                # 💬 Prompt templates
│   ├── constellation_agent/ # ConstellationAgent prompts
│   └── share/              # Shared examples
│
├── trajectory/             # 📈 Execution trajectory parsing
│
├── __main__.py             # 🚀 Entry point: python -m galaxy
├── galaxy.py               # Main Galaxy orchestrator
├── galaxy_client.py        # Galaxy client interface
├── README.md               # Galaxy overview
└── README_ZH.md            # Galaxy overview (Chinese)

Key Components

| Component | Description | Documentation |
|---|---|---|
| ConstellationAgent | AI-powered agent that generates and modifies task DAGs | Galaxy Overview |
| TaskConstellation | DAG container with validation and state management | Constellation |
| TaskOrchestrator | Event-driven execution coordinator | Constellation Orchestrator |
| DeviceManager | Multi-device coordination and assignment | Device Manager |
| Visualization | Rich console DAG monitoring | Galaxy Overview |
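
To make the constellation vocabulary concrete, here is a minimal sketch of a task DAG in plain Python. It is not the Galaxy implementation: the `Task` and `Dependency` classes, their fields, and the `execution_order` helper are hypothetical stand-ins for TaskStar, TaskStarLine, and TaskConstellation, and the task and device names are invented. It only illustrates how dependency edges constrain the order in which tasks can run.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task node (TaskStar)."""
    task_id: str
    device: str  # invented device label, for illustration only


@dataclass(frozen=True)
class Dependency:
    """Hypothetical stand-in for a dependency edge (TaskStarLine)."""
    upstream: str    # task that must finish first
    downstream: str  # task that waits on it


def execution_order(tasks, dependencies):
    """Return task ids in an order that respects every dependency.

    Raises ValueError if an edge names an unknown task or the graph has a cycle.
    """
    ids = {task.task_id for task in tasks}
    # Map each task id to the set of tasks it depends on.
    graph = {task_id: set() for task_id in ids}
    for dep in dependencies:
        if dep.upstream not in ids or dep.downstream not in ids:
            raise ValueError(f"unknown task in edge {dep}")
        graph[dep.downstream].add(dep.upstream)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("dependencies contain a cycle") from exc


tasks = [
    Task("collect_logs", "linux-server"),
    Task("build_report", "windows-desktop"),
    Task("send_summary", "windows-desktop"),
]
dependencies = [
    Dependency("collect_logs", "build_report"),
    Dependency("build_report", "send_summary"),
]
order = execution_order(tasks, dependencies)
print(order)  # ['collect_logs', 'build_report', 'send_summary']
```

A real orchestrator would dispatch ready tasks to devices as their dependencies complete rather than computing one static order, but the validation step (unknown nodes, cycles) has the same shape.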


🎯 UFO² Desktop AgentOS (ufo/)

Single-device desktop automation system implementing a two-tier agent architecture (HostAgent + AppAgent) with hybrid GUI-API automation.

Directory Structure

ufo/
├── agents/                 # Two-tier agent implementation
│   ├── agent/              # Base agent classes (HostAgent, AppAgent)
│   ├── states/             # State machine implementations
│   ├── processors/         # Processing strategy pipelines
│   ├── memory/             # Agent memory and blackboard
│   └── presenters/         # Response presentation logic
│
├── server/                 # Server-client architecture components
│   ├── websocket_server.py # WebSocket server for remote agent control
│   └── handlers/           # Request handlers
│
├── client/                 # MCP client and device management
│   ├── mcp/                # MCP server manager
│   │   ├── local_servers/  # Built-in MCP servers (UI, CLI, Office COM)
│   │   └── http_servers/   # Remote MCP servers (hardware, Linux)
│   ├── ufo_client.py       # UFO² client implementation
│   └── computer.py         # Computer/device abstraction
│
├── automator/              # GUI and API automation layer
│   ├── ui_control/         # GUI automation (inspector, controller)
│   ├── puppeteer/          # Execution orchestration
│   └── *_automator.py      # App-specific automators (Excel, Word, etc.)
│
├── prompter/               # Prompt construction engines
├── prompts/                # Jinja2 prompt templates
│   ├── host_agent/         # HostAgent prompts
│   ├── app_agent/          # AppAgent prompts
│   └── share/              # Shared components
│
├── llm/                    # LLM provider integrations
├── rag/                    # Retrieval-Augmented Generation
├── trajectory/             # Task trajectory parsing
├── experience/             # Self-experience learning
├── module/                 # Core modules (session, round, context)
├── config/                 # Legacy config support
├── logging/                # Logging utilities
├── utils/                  # Utility functions
├── tools/                  # CLI tools (config conversion, etc.)
│
├── __main__.py             # Entry point: python -m ufo
└── ufo.py                  # Main UFO² orchestrator

Key Components

| Component | Description | Documentation |
|---|---|---|
| HostAgent | Desktop-level orchestration with 7-state FSM | HostAgent Overview |
| AppAgent | Application-level execution with 6-state FSM | AppAgent Overview |
| MCP System | Extensible command execution framework | MCP Overview |
| Automator | Hybrid GUI-API automation with fallback | Core Features |
| RAG | Knowledge retrieval from multiple sources | Knowledge Substrate |
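
The agents above are described as finite-state machines. As a rough illustration of that idea, the sketch below shows a table-driven state machine that rejects undefined transitions. The state names (`CONTINUE`, `FINISH`, `FAIL`), the transition table, and the `AgentFSM` class are all invented for this example; the actual HostAgent and AppAgent states are not listed in this document.

```python
from enum import Enum, auto


class State(Enum):
    """Invented states; not the real HostAgent/AppAgent state set."""
    CONTINUE = auto()
    FINISH = auto()
    FAIL = auto()


# Allowed transitions: current state -> states it may move to.
TRANSITIONS = {
    State.CONTINUE: {State.CONTINUE, State.FINISH, State.FAIL},
    State.FINISH: set(),  # terminal
    State.FAIL: set(),    # terminal
}


class AgentFSM:
    """Tiny table-driven state machine that rejects undefined transitions."""

    def __init__(self, start=State.CONTINUE):
        self.state = start
        self.history = [start]

    def transition(self, target):
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {target.name}")
        self.state = target
        self.history.append(target)

    @property
    def done(self):
        # A state with no outgoing transitions is terminal.
        return not TRANSITIONS[self.state]


fsm = AgentFSM()
fsm.transition(State.CONTINUE)  # keep working on the task
fsm.transition(State.FINISH)    # task completed
print([s.name for s in fsm.history], fsm.done)
```

Keeping the transitions in one table makes the legal moves easy to audit, which is the main appeal of modelling agent behavior as an explicit FSM.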


🔌 Agent Interaction Protocol (aip/)

Standardized message-passing protocol for cross-device communication between Galaxy and UFO² agents.

aip/
├── messages.py             # Message types (Command, Result, Event, Error)
├── protocol/               # Protocol definitions
├── transport/              # Transport layers (HTTP, WebSocket, MQTT)
├── endpoints/              # API endpoints
├── extensions/             # Protocol extensions
└── resilience/             # Retry and error handling

Purpose: Enables Galaxy to coordinate UFO² agents running on different devices and platforms through standardized messaging over HTTP/WebSocket.

Documentation: See AIP Overview for protocol details and Message Types for message specifications.
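
As a rough picture of what standardized messaging between an orchestrator and a device agent can look like, the sketch below serializes a command to JSON and correlates a result with it. This is not the AIP schema: the `Message` class, its field names, and the `run_shell` action are assumptions made for illustration. Only the four message kinds (command, result, event, error) come from the directory listing above.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical envelope; field names are assumptions, not the AIP schema."""
    kind: str                          # "command", "result", "event" or "error"
    payload: dict
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    in_reply_to: Optional[str] = None  # id of the command being answered, if any

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        return cls(**json.loads(raw))


# Orchestrator side: build a command ("run_shell" is an invented action name).
command = Message(kind="command", payload={"action": "run_shell", "args": ["uptime"]})
wire = command.to_json()  # the string that would travel over the transport

# Device side: decode the command and answer it.
received = Message.from_json(wire)
result = Message(kind="result", payload={"exit_code": 0}, in_reply_to=received.message_id)

print(received == command, result.in_reply_to == command.message_id)
```

Carrying an explicit correlation id in every reply is what lets an orchestrator match results to commands when several devices answer out of order.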


🐧 Linux Agent

Lightweight CLI-based agent for Linux devices that integrates with Galaxy as a third-party device agent.

Key Features:

  • CLI Execution: Execute shell commands on Linux systems
  • Galaxy Integration: Register as device in Galaxy's multi-device orchestration
  • Simple Architecture: Minimal dependencies, easy deployment
  • Cross-Platform Tasks: Enable Windows + Linux workflows in Galaxy

Configuration: Configured in config/ufo/third_party.yaml under THIRD_PARTY_AGENT_CONFIG.LinuxAgent


📱 Mobile Agent

Android device automation agent that enables UI automation, app control, and mobile-specific operations through ADB integration.

Key Features:

  • UI Automation: Touch, swipe, and text input via ADB
  • Visual Context: Screenshot capture and UI hierarchy analysis
  • App Management: Launch apps, navigate between applications
  • Galaxy Integration: Serve as mobile device in cross-platform workflows
  • Platform Support: Android devices (physical and emulators)

Configuration: Configured in config/ufo/third_party.yaml under THIRD_PARTY_AGENT_CONFIG.MobileAgent


⚙️ Configuration (config/)

Modular configuration system with type-safe schemas and auto-discovery.

config/
├── galaxy/                 # Galaxy configuration
│   ├── agent.yaml.template     # ConstellationAgent LLM settings template
│   ├── agent.yaml              # ConstellationAgent LLM settings (active)
│   ├── constellation.yaml      # Constellation orchestration settings
│   ├── devices.yaml            # Multi-device registry
│   └── dag_templates/          # Pre-built DAG templates (future)
│
├── ufo/                    # UFO² configuration
│   ├── agents.yaml.template    # Agent LLM configs template
│   ├── agents.yaml             # Agent LLM configs (active)
│   ├── system.yaml             # System settings
│   ├── rag.yaml                # RAG settings
│   ├── mcp.yaml                # MCP server configs
│   ├── third_party.yaml        # Third-party agent configs (LinuxAgent, etc.)
│   └── prices.yaml             # API pricing data
│
├── config_loader.py        # Auto-discovery config loader
└── config_schemas.py       # Pydantic validation schemas

Configuration Files:

  • Template files (.yaml.template) should be copied to .yaml and edited
  • Active config files (.yaml) contain API keys and should NOT be committed
  • Galaxy: Uses config/galaxy/agent.yaml for ConstellationAgent LLM settings
  • UFO²: Uses config/ufo/agents.yaml for HostAgent/AppAgent LLM settings
  • Third-Party: Configure LinuxAgent, MobileAgent, and HardwareAgent in config/ufo/third_party.yaml
  • Use python -m ufo.tools.convert_config to migrate from legacy configs
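
The copy-the-template step can also be scripted. The helper below is a convenience sketch, not something shipped with the repository: `activate_templates` is a hypothetical name, and it assumes the only convention is the `.yaml.template` suffix described above. It deliberately skips any `.yaml` file that already exists so that a config holding real API keys is never overwritten.

```python
import shutil
from pathlib import Path


def activate_templates(config_dir):
    """Copy each *.yaml.template to its .yaml name unless that file already exists.

    Returns the list of files that were created.
    """
    created = []
    for template in sorted(Path(config_dir).rglob("*.yaml.template")):
        target = template.with_suffix("")  # strips the trailing ".template"
        if target.exists():
            continue  # never overwrite a config that may hold real API keys
        shutil.copyfile(template, target)
        created.append(target)
    return created
```

Calling `activate_templates("config")` from the repository root would create `config/galaxy/agent.yaml` and `config/ufo/agents.yaml` if they are missing; the copies still need to be edited by hand before use.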


📖 Documentation (documents/)

MkDocs documentation site with comprehensive guides and API references.

documents/
├── docs/                   # Markdown documentation source
│   ├── getting_started/    # Installation and quick starts
│   ├── galaxy/             # Galaxy framework docs
│   ├── ufo2/               # UFO² architecture docs
│   ├── linux/              # Linux agent documentation
│   ├── mcp/                # MCP server documentation
│   ├── aip/                # Agent Interaction Protocol docs
│   ├── configuration/      # Configuration guides
│   ├── infrastructure/     # Core infrastructure (agents, modules)
│   ├── server/             # Server-client architecture docs
│   ├── client/             # Client components docs
│   ├── tutorials/          # Step-by-step tutorials
│   ├── modules/            # Module-specific docs
│   └── about/              # Project information
│
├── mkdocs.yml              # MkDocs configuration
└── site/                   # Generated static site

Documentation Sections:

| Section | Description |
|---|---|
| Getting Started | Installation, quick starts, migration guides |
| Galaxy | Multi-device orchestration, DAG workflows, device management |
| UFO² | Desktop agents, automation features, benchmarks |
| Linux | Linux agent integration, CLI executor for Galaxy |
| MCP | Server documentation, custom server development |
| AIP | Agent Interaction Protocol, message types, transport layers |
| Configuration | System settings, model configs, deployment |
| Infrastructure | Core components, agent design, server-client architecture |
| Tutorials | Creating agents, custom automators, advanced usage |

🗄️ Supporting Modules

VectorDB (vectordb/)

Vector database storage for RAG knowledge sources (help documents, execution traces, user demonstrations). See RAG Configuration for setup details.

Learner (learner/)

Tools for indexing help documents into the vector database for RAG retrieval. Integrates with the Knowledge Substrate feature.

Record Processor (record_processor/)

Parses human demonstrations from Windows Step Recorder for learning from user actions.

Dataflow (dataflow/)

Data collection pipeline for Large Action Model (LAM) training. See the Dataflow documentation for workflow details.

Model Worker (model_worker/)

Custom LLM deployment tools for running local models. See Model Configuration for supported providers.

Logs (logs/)

Auto-generated execution logs organized by task and timestamp, including screenshots, UI trees, and agent actions.


🎯 Galaxy vs UFO² vs Linux Agent vs Mobile Agent: When to Use What?

| Aspect | Galaxy | UFO² | Linux Agent | Mobile Agent |
|---|---|---|---|---|
| Scope | Multi-device orchestration | Single-device Windows automation | Single-device Linux CLI | Single-device Android automation |
| Use Cases | Cross-platform workflows, distributed tasks | Desktop automation, Office tasks | Server management, CLI operations | Mobile app testing, UI automation |
| Architecture | DAG-based task workflows | Two-tier state machines | Simple CLI executor | UI automation via ADB |
| Platform | Orchestrator (platform-agnostic) | Windows | Linux | Android |
| Complexity | Complex multi-step workflows | Simple to moderate tasks | Simple command execution | UI interaction and app control |
| Best For | Cross-device collaboration | Windows desktop tasks | Linux server operations | Mobile app automation |
| Integration | Orchestrates all agents | Can be Galaxy device | Can be Galaxy device | Can be Galaxy device |

Choosing the Right Framework:

  • Use Galaxy when: Tasks span multiple devices or platforms, or workflows are complex and have dependencies
  • Use UFO² Standalone when: You need single-device Windows automation or rapid prototyping
  • Use Linux Agent when: Linux server or CLI operations are needed in Galaxy workflows
  • Use Mobile Agent when: You need Android device automation, mobile app testing, or UI interactions
  • Best Practice: Let Galaxy orchestrate UFO² (Windows), Linux Agent (Linux), and Mobile Agent (Android) for comprehensive cross-platform tasks

🚀 Quick Start

Galaxy Multi-Device Orchestration

# Interactive mode
python -m galaxy --interactive

# Single request
python -m galaxy --request "Your cross-device task"

Documentation: Galaxy Quick Start

UFO² Desktop Automation

# Interactive mode
python -m ufo --task <task_name>

# With custom config
python -m ufo --task <task_name> --config_path config/ufo/

Documentation: UFO² Quick Start



🏗️ Architecture Principles

UFO³ follows SOLID principles and established software engineering patterns:

  • Single Responsibility: Each component has a focused purpose
  • Open/Closed: Extensible through interfaces and plugins
  • Interface Segregation: Focused interfaces for different capabilities
  • Dependency Inversion: Dependency injection for loose coupling
  • Event-Driven: Observer pattern for real-time monitoring
  • State Machines: Well-defined states and transitions for agents
  • Command Pattern: Encapsulated DAG editing with undo/redo
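
To illustrate the last item, here is a minimal sketch of the Command pattern with undo/redo applied to a set of dependency edges. The `AddEdge` and `Editor` classes are hypothetical and much simpler than a real DAG editor; they only show how encapsulating each edit as an object with `execute` and `undo` makes history tracking straightforward.

```python
class AddEdge:
    """Command that adds one dependency edge and knows how to reverse itself.

    Simplification: assumes the edge was not already present, so undo can
    simply remove it.
    """

    def __init__(self, upstream, downstream):
        self.edge = (upstream, downstream)

    def execute(self, edges):
        edges.add(self.edge)

    def undo(self, edges):
        edges.discard(self.edge)


class Editor:
    """Hypothetical editor: applies commands and keeps undo/redo stacks."""

    def __init__(self):
        self.edges = set()
        self._undo, self._redo = [], []

    def run(self, command):
        command.execute(self.edges)
        self._undo.append(command)
        self._redo.clear()  # a new edit invalidates the redo history

    def undo(self):
        if self._undo:
            command = self._undo.pop()
            command.undo(self.edges)
            self._redo.append(command)

    def redo(self):
        if self._redo:
            command = self._redo.pop()
            command.execute(self.edges)
            self._undo.append(command)


editor = Editor()
editor.run(AddEdge("a", "b"))
editor.run(AddEdge("b", "c"))
editor.undo()
after_undo = set(editor.edges)  # only ("a", "b") remains
editor.redo()
after_redo = set(editor.edges)  # both edges are back
print(after_undo, after_redo)
```

Because the editor only talks to the `execute`/`undo` interface, new edit types (removing a node, rewiring an edge) can be added without touching the history logic.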

Additional Resources


Next Steps:

  1. Start with Galaxy Quick Start for multi-device orchestration
  2. Or explore UFO² Quick Start for single-device automation
  3. Check FAQ for common questions
  4. Join our community and contribute!