🧩 Setting up UFO with Windows Agent Arena (WAA)

Windows Agent Arena (WAA) is a benchmark suite designed to evaluate the performance of AI agents in executing real-world tasks on Windows operating systems. It consists of 154 tasks across 15 applications, including Microsoft Office, Edge, File Explorer, and VS Code. The tasks are designed to cover a wide range of functionalities and interactions that users typically perform on their computers.

This repository provides a modified version of WAA 🪟, a scalable platform for benchmarking and evaluating multimodal desktop AI agents. This customized fork integrates with UFO, a UI-focused automation agent for Windows.


We strongly recommend reviewing the original WAA deployment guide beforehand. The instructions below assume you are familiar with the original setup.


1. Clone the Repository

git clone https://github.com/nice-mee/WindowsAgentArena.git

💡 To run OSWorld cases, switch to the dedicated development branch:

git checkout 2020-qqtcg/dev

Create a config.json file in the repo root with a placeholder key (UFO will override this):

{
  "OPENAI_API_KEY": "placeholder"
}

2. Build the Docker Image

Navigate to the scripts directory and build the Docker image:

cd scripts
chmod +x build-container-image.sh prepare-agents.sh  # (if needed)
./build-container-image.sh --build-base-image true

This will generate the windowsarena/winarena:latest image using the latest codebase in src/.


3. Integrate UFO

  1. Configure UFO via ufo/config/config.json (see UFO repo for details).
  2. Copy the entire ufo folder into the WAA container client directory:
cp -r src/win-arena-container/vm/setup/mm_agents/UFO/ufo src/win-arena-container/client/

⚠️ Python 3.9 Compatibility Fix
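In ufo/llm/openai.py, swap the order of @staticmethod and @functools.lru_cache() so that @staticmethod is the outermost (top) decorator. On Python 3.9, staticmethod objects are not callable, so applying lru_cache on top of one fails with a TypeError.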
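The working decorator order can be illustrated with a minimal sketch. The `PriceTable` class and its `lookup` method are hypothetical and are not taken from ufo/llm/openai.py; only the decorator order matters here. With `@staticmethod` on top, `lru_cache` wraps the plain function, which works on Python 3.9 as well as newer versions.

```python
import functools

CALLS = []  # records cache misses, for illustration only


class PriceTable:  # hypothetical class, not UFO's actual code
    @staticmethod             # outermost: applied last
    @functools.lru_cache()    # innermost: wraps the plain function
    def lookup(model: str) -> float:
        CALLS.append(model)   # runs only when the result is not cached
        return len(model) * 0.5
```

Calling `PriceTable.lookup("gpt-4o")` twice should execute the function body once; the second call is served from the cache.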


4. Prepare the Windows 11 Virtual Machine

4.1 Download the ISO

  1. Go to the Microsoft Evaluation Center
  2. Accept the terms and download Windows 11 Enterprise Evaluation (English, 90-day trial) (~6GB)
  3. Rename the file to setup.iso and place it in:
WindowsAgentArena/src/win-arena-container/vm/image
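The rename-and-place step can also be scripted. The following is a minimal sketch, assuming the ISO has already been downloaded; the helper name `place_iso` and its parameters are illustrative and not part of WAA.

```python
import shutil
from pathlib import Path

# Target folder from this guide, relative to the WindowsAgentArena repo root.
IMAGE_DIR = Path("src/win-arena-container/vm/image")


def place_iso(downloaded_iso: Path, repo_root: Path) -> Path:
    """Move the downloaded ISO to <repo_root>/<IMAGE_DIR>/setup.iso."""
    if not downloaded_iso.is_file():
        raise FileNotFoundError(f"ISO not found: {downloaded_iso}")
    target_dir = repo_root / IMAGE_DIR
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "setup.iso"
    shutil.move(str(downloaded_iso), str(target))
    return target
```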

4.2 Generate the Golden Image Snapshot

Prepare the Windows VM snapshot (a fully provisioned 30GB image):

cd ./scripts
./run-local.sh --mode dev --prepare-image true

⚠️ Do not interact with the VM during preparation. It will shut down automatically when complete.

The golden image will be saved in:

WindowsAgentArena/src/win-arena-container/vm/storage

5. Initial Run (First Boot Setup)

Launch the environment:

./run-local.sh --mode dev --json-name "evaluation_examples_windows/test_custom.json" --agent UFO --agent-settings '{"llm_type": "azure", "llm_endpoint": "https://cloudgpt-openai.azure-api.net/openai/deployments/gpt-4o-20240513/chat/completions?api-version=2024-04-01-preview", "llm_auth": {"type": "api-key", "token": ""}}'
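Because the --agent-settings value is a JSON string nested inside shell quotes, it is easy to break by hand. One way to reduce quoting mistakes is to assemble the command in Python, as in the sketch below. The flags are the ones used in this guide; the helper name `build_run_command` and the `UFO_LLM_TOKEN` environment variable are assumptions made for illustration.

```python
import json
import os
import shlex


def build_run_command(json_name: str, endpoint: str, token: str) -> list[str]:
    """Return the run-local.sh argument list used in this guide."""
    agent_settings = {
        "llm_type": "azure",
        "llm_endpoint": endpoint,
        "llm_auth": {"type": "api-key", "token": token},
    }
    return [
        "./run-local.sh",
        "--mode", "dev",
        "--json-name", json_name,
        "--agent", "UFO",
        "--agent-settings", json.dumps(agent_settings),
    ]


if __name__ == "__main__":
    cmd = build_run_command(
        "evaluation_examples_windows/test_custom.json",
        "https://cloudgpt-openai.azure-api.net/openai/deployments/"
        "gpt-4o-20240513/chat/completions?api-version=2024-04-01-preview",
        os.environ.get("UFO_LLM_TOKEN", ""),  # hypothetical variable name
    )
    # Print a shell-safe command line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Reading the token from the environment keeps it out of shell history and saved scripts. The printed line can be run from the scripts directory, or the list can be passed to `subprocess.run` directly.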

Once the VM boots:

  1. Do not enter the device code (this keeps the WAA server alive indefinitely).
  2. Visit http://localhost:8006 and perform the following setup actions:
     • Disable Windows Firewall
     • Open Google Chrome and complete initial setup
     • Open VLC and complete initial setup
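If you want to know when the viewer at http://localhost:8006 becomes reachable, a small polling helper can check the port. This is a sketch only: `wait_for_port` is a hypothetical helper, and an open port indicates that something is listening, not that Windows has finished booting.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8006,
                  timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connection means a server is accepting on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```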

After setup:

  • Stop the client
  • Back up the golden image from the storage folder

🧪 Running Experiments

Before each experiment:

  1. Replace the VM image with your prepared golden snapshot
  2. Clear any previous UFO logs
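The two reset steps above can be sketched as a small helper. The function name and all three parameters are illustrative: `vm_storage` would be the src/win-arena-container/vm/storage folder from this guide, while the backup location and the UFO log folder depend on your own setup and are assumptions here.

```python
import shutil
from pathlib import Path


def reset_experiment(golden_backup: Path, vm_storage: Path, ufo_logs: Path) -> None:
    """Restore the VM storage from a backup copy and empty the UFO log folder."""
    if not golden_backup.is_dir():
        raise FileNotFoundError(f"Golden image backup not found: {golden_backup}")
    # Replace the working VM image with a fresh copy of the golden snapshot.
    if vm_storage.exists():
        shutil.rmtree(vm_storage)
    shutil.copytree(golden_backup, vm_storage)
    # Remove logs from earlier runs, then recreate an empty folder.
    if ufo_logs.exists():
        shutil.rmtree(ufo_logs)
    ufo_logs.mkdir(parents=True)
```

Copying rather than moving the backup keeps the golden snapshot intact for later runs.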

Then run:

./run-local.sh --mode dev --json-name "evaluation_examples_windows/test_full.json" --agent UFO --agent-settings '{"llm_type": "azure", "llm_endpoint": "https://cloudgpt-openai.azure-api.net/openai/deployments/gpt-4o-20240513/chat/completions?api-version=2024-04-01-preview", "llm_auth": {"type": "api-key", "token": ""}}'

Note

  • test_full.json: Contains all test cases where UIA is available.
  • test_all.json: Includes all test cases, even those incompatible with UIA.
  • Use test_full.json if you're not using OmniParser.