Welcome to UFO²'s Documentation!


Introduction

UFO has now evolved into UFO² (Desktop AgentOS), a new-generation agent framework that runs on the Windows desktop OS. It is designed to automate and orchestrate tasks across multiple applications, letting users interact seamlessly with their operating system through natural-language commands that go beyond UI automation alone.

✨ Key Capabilities

  • Deep OS Integration – Combines Windows UIA, Win32, and WinCOM for first‑class control detection and native commands.
  • Picture‑in‑Picture Desktop (coming soon) – Runs automation in a sandboxed virtual desktop so you can keep using your main screen.
  • Hybrid GUI + API Actions – Chooses native APIs when available and falls back to clicks/keystrokes when not, keeping execution fast and robust.
  • Speculative Multi‑Action – Bundles several predicted steps into one LLM call and validates them against the live UI, cutting LLM queries by up to 51%.
  • Continuous Knowledge Substrate – Mixes documentation, Bing search, user demonstrations, and execution traces via RAG so agents learn over time.
  • UIA + Visual Control Detection – Detects standard and custom controls with a hybrid UIA + vision pipeline.
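To make the "Hybrid GUI + API Actions" capability more concrete, here is a minimal Python sketch of API-first dispatch with a GUI fallback. It is not UFO²'s Puppeteer implementation: `HybridExecutor`, `register_api`, `gui_fallback`, and `ActionResult` are hypothetical names, and the GUI path is a placeholder string rather than real input simulation. It only illustrates the decision rule: use a registered native handler when one exists, and degrade to GUI input when there is none or the native call fails.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch of API-first action dispatch with a GUI fallback.
# None of these names come from UFO²'s codebase.


@dataclass
class ActionResult:
    action: str
    channel: str  # "api" if a native handler ran, "gui" if the fallback was used
    detail: str


@dataclass
class HybridExecutor:
    api_handlers: Dict[str, Callable[..., str]] = field(default_factory=dict)
    log: List[ActionResult] = field(default_factory=list)

    def register_api(self, action: str, handler: Callable[..., str]) -> None:
        """Register a native-API handler (e.g. a COM call) for an action name."""
        self.api_handlers[action] = handler

    def gui_fallback(self, action: str, **kwargs) -> str:
        # Placeholder: a real executor would simulate clicks/keystrokes here.
        return f"GUI input for {action} with {kwargs}"

    def _record(self, result: ActionResult) -> ActionResult:
        self.log.append(result)
        return result

    def execute(self, action: str, **kwargs) -> ActionResult:
        handler = self.api_handlers.get(action)
        if handler is not None:
            try:
                return self._record(ActionResult(action, "api", handler(**kwargs)))
            except Exception:
                pass  # the native call failed at runtime; fall through to GUI input
        return self._record(ActionResult(action, "gui", self.gui_fallback(action, **kwargs)))


if __name__ == "__main__":
    executor = HybridExecutor()
    executor.register_api("set_cell", lambda cell, value: f"COM: {cell} = {value}")
    print(executor.execute("set_cell", cell="A1", value=42).channel)  # api
    print(executor.execute("click_button", name="Save").channel)  # gui
```

Keeping the fallback inside `execute` means callers never need to know which channel was used; the `channel` field in the log records it for later inspection.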

Please refer to the UFO² paper and the corresponding sections of this documentation for more details on each capability.


🏗️ Architecture overview

UFO² architecture

UFO² operates as a Desktop AgentOS built on a multi-agent framework with the following components:

  1. HostAgent – Parses the natural‑language goal, launches the necessary applications, spins up and coordinates AppAgents, and steers a global finite‑state machine (FSM).
  2. AppAgents – One per application; each runs a ReAct loop with multimodal perception, hybrid control detection, retrieval‑augmented knowledge, and the Puppeteer executor that chooses between GUI actions and native APIs.
  3. Knowledge Substrate – Blends offline documentation, online search, demonstrations, and execution traces into a vector store that is retrieved on‑the‑fly at inference.
  4. Speculative Executor – Cuts LLM latency by predicting a batch of likely actions in a single LLM call and validating them against the live UIA state before they run.
  5. Picture‑in‑Picture Desktop (coming soon) – Runs the agent in an isolated virtual desktop so your main workspace and input devices remain untouched.
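The Speculative Executor idea can be illustrated with a small Python sketch. It assumes, purely for illustration, that one LLM call returns a list of predicted actions and that a live UIA snapshot can be modeled as a dictionary mapping control IDs to properties; `PredictedAction`, `is_valid`, `run_speculative_batch`, and `demo_execute` are hypothetical names, not UFO²'s actual API. Each action is re-checked against the current state right before it runs, and the batch stops at the first action whose target control is missing or disabled, handing the remainder back for re-planning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch of speculative multi-action execution. The UI state is a
# plain dict standing in for a live UIA snapshot: control id -> properties.
UIState = Dict[str, Dict[str, bool]]


@dataclass(frozen=True)
class PredictedAction:
    control_id: str
    operation: str  # e.g. "click" or "type"
    argument: str = ""


def is_valid(action: PredictedAction, ui_state: UIState) -> bool:
    """An action is valid only if its target control exists and is enabled right now."""
    control = ui_state.get(action.control_id)
    return control is not None and control.get("enabled", False)


def run_speculative_batch(
    batch: List[PredictedAction],
    ui_state: UIState,
    execute: Callable[[PredictedAction, UIState], None],
) -> Tuple[List[PredictedAction], List[PredictedAction]]:
    """Run the longest valid prefix of a batch predicted by a single LLM call.

    Each action is checked against the *current* state just before it runs,
    since earlier actions may have changed the UI. The first invalid action
    stops the batch; it and everything after it are returned for re-planning.
    """
    executed: List[PredictedAction] = []
    for i, action in enumerate(batch):
        if not is_valid(action, ui_state):
            return executed, batch[i:]
        execute(action, ui_state)
        executed.append(action)
    return executed, []


def demo_execute(action: PredictedAction, ui_state: UIState) -> None:
    # Toy side effect: clicking the File menu reveals a "Save As" item.
    if action.control_id == "file_menu":
        ui_state["save_as"] = {"enabled": True}


if __name__ == "__main__":
    ui = {"file_menu": {"enabled": True}}
    batch = [
        PredictedAction("file_menu", "click"),
        PredictedAction("save_as", "click"),
        PredictedAction("ok_button", "click"),
    ]
    done, pending = run_speculative_batch(batch, ui, demo_execute)
    print([a.control_id for a in done])  # ['file_menu', 'save_as']
    print([a.control_id for a in pending])  # ['ok_button']
```

Validating the whole batch up front against the initial snapshot would wrongly reject the `save_as` click, because that control only appears after the menu opens; re-checking before each step is what lets one prediction cover several dependent actions.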

For a deep dive, see our technical report.


🚀 Quick Start

Please follow the Quick Start Guide to get started with UFO.

🌐 Media Coverage

Check out our official deep dive into UFO in this YouTube video.

UFO sightings have garnered attention from various media outlets, including:

❓ Get help


📚 Citation

If you build on this work, please cite our AgentOS framework:

UFO² – The Desktop AgentOS (2025)
https://arxiv.org/abs/2504.14603

@article{zhang2025ufo2,
  title   = {{UFO2: The Desktop AgentOS}},
  author  = {Zhang, Chaoyun and Huang, He and Ni, Chiming and Mu, Jian and Qin, Si and He, Shilin and Wang, Lu and Yang, Fangkai and Zhao, Pu and Du, Chao and Li, Liqun and Kang, Yu and Jiang, Zhao and Zheng, Suzhen and Wang, Rujia and Qian, Jiaxu and Ma, Minghua and Lou, Jian-Guang and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
  journal = {arXiv preprint arXiv:2504.14603},
  year    = {2025}
}

UFO – A UI‑Focused Agent for Windows OS Interaction (2024)
https://arxiv.org/abs/2402.07939

@article{zhang2024ufo,
  title   = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
  author  = {Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
  journal = {arXiv preprint arXiv:2402.07939},
  year    = {2024}
}

📝 Roadmap

The UFO² team is actively working on the following features and improvements:

  • Picture‑in‑Picture Mode – Completed and will be available in the next release
  • AgentOS‑as‑a‑Service – Completed and will be available in the next release
  • Auto‑Debugging Toolkit – Completed and will be available in the next release
  • Integration with MCP and Agent2Agent Communication – Planned; under implementation