UICollector Server

Overview

UICollector is a data collection MCP server that provides comprehensive UI observation and information retrieval capabilities for the UFO² framework. It automatically gathers screenshots, window lists, control information, and UI trees to build the observation context for LLM decision-making.

Server Type: Data Collection
Deployment: Local (in-process)
Agent: HostAgent, AppAgent
LLM-Selectable: ❌ No (automatically invoked by framework)

Server Information

Property	Value
Namespace	`UICollector`
Server Name	`UFO UI Data MCP Server`
Platform	Windows
Backend	UIAutomation (UIA) or Win32
Tool Type	`data_collection`
Tool Key Format	`data_collection::{tool_name}`
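
The key format in the table can be composed and split with two small helpers. This is a minimal sketch: `make_tool_key` and `split_tool_key` are hypothetical names used for illustration and are not part of the UFO² API.

```python
from typing import Tuple


def make_tool_key(tool_name: str, tool_type: str = "data_collection") -> str:
    """Build a key of the form '{tool_type}::{tool_name}'."""
    if not tool_name or "::" in tool_name:
        raise ValueError(f"invalid tool name: {tool_name!r}")
    return f"{tool_type}::{tool_name}"


def split_tool_key(tool_key: str) -> Tuple[str, str]:
    """Split a key back into (tool_type, tool_name)."""
    tool_type, sep, tool_name = tool_key.partition("::")
    if not sep or not tool_type or not tool_name:
        raise ValueError(f"malformed tool key: {tool_key!r}")
    return tool_type, tool_name


key = make_tool_key("get_ui_tree")
print(key)                  # data_collection::get_ui_tree
print(split_tool_key(key))  # ('data_collection', 'get_ui_tree')
```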

Tools

1. get_desktop_app_info

Get information about all application windows currently open on the desktop.

Description

Retrieves a list of all visible application windows on the Windows desktop, including window names, types, and identifiers. This is typically the first step in UI automation workflows to discover available applications.

Parameters

Parameter	Type	Required	Default	Description
`remove_empty`	`bool`	No	`True`	Whether to remove windows with no visible content
`refresh_app_windows`	`bool`	No	`True`	Whether to refresh the list of application windows

Returns

Type: List[Dict[str, Any]]

List of window information dictionaries, each containing:

{
    "id": str,           # Unique window identifier (e.g., "1", "2", "3")
    "name": str,         # Window title/text
    "type": str,         # Control type (e.g., "Window", "Pane")
    "kind": str          # Target kind: "window"
}

Example

result = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_desktop_app_info",
        tool_name="get_desktop_app_info",
        parameters={
            "remove_empty": True,
            "refresh_app_windows": True
        }
    )
])

# Example output:
[
    {
        "id": "1",
        "name": "Visual Studio Code",
        "type": "Window",
        "kind": "window"
    },
    {
        "id": "2",
        "name": "Microsoft Edge",
        "type": "Window",
        "kind": "window"
    }
]
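
Once the window list is available as plain dictionaries, a common next step is to pick one window by title. The sketch below assumes the list has already been extracted from the tool result. `find_window` is a hypothetical helper, not a framework function.

```python
from typing import Any, Dict, List, Optional


def find_window(windows: List[Dict[str, Any]], title_part: str) -> Optional[Dict[str, Any]]:
    """Return the first window whose name contains title_part (case-insensitive)."""
    needle = title_part.casefold()
    for window in windows:
        if needle in str(window.get("name", "")).casefold():
            return window
    return None


# Sample data copied from the example output above
windows = [
    {"id": "1", "name": "Visual Studio Code", "type": "Window", "kind": "window"},
    {"id": "2", "name": "Microsoft Edge", "type": "Window", "kind": "window"},
]

match = find_window(windows, "edge")
print(match["id"] if match else "not found")  # 2
```

The returned `id` and `name` are the values a later select_application_window call would need.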

2. get_desktop_app_target_info

Get comprehensive target information for all desktop application windows.

Description

Similar to get_desktop_app_info, but returns TargetInfo objects instead of plain dictionaries. This provides a more structured representation of window information for internal framework use.

Parameters

Parameter	Type	Required	Default	Description
`remove_empty`	`bool`	No	`True`	Whether to remove windows with no visible content
`refresh_app_windows`	`bool`	No	`True`	Whether to refresh the list of application windows

Returns

Type: List[TargetInfo]

List of TargetInfo objects with properties:

id: Unique identifier
name: Window title
type: Control type
kind: TargetKind.WINDOW


3. get_app_window_info

Get detailed information about the currently selected application window.

Description

Retrieves specific fields of information for the active/selected window. You must select a window using select_application_window (HostUIExecutor) before calling this tool.

Parameters

Parameter	Type	Required	Default	Description
`field_list`	`List[str]`	Yes	-	List of field names to retrieve

Supported Fields

Common fields include:

"control_text": Window title/text
"control_type": Control type (e.g., "Window")
"control_rect": Bounding rectangle coordinates
"process_id": Process ID
"class_name": Window class name
"is_visible": Visibility status
"is_enabled": Enabled status

Returns

Type: Dict[str, Any]

Dictionary mapping field names to their values.

Example

# First select a window
await computer.run_actions([
    MCPToolCall(
        tool_key="action::select_application_window",
        tool_name="select_application_window",
        parameters={"id": "1", "name": "Calculator"}
    )
])

# Then get window info
result = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_app_window_info",
        tool_name="get_app_window_info",
        parameters={
            "field_list": ["control_text", "control_type", "control_rect"]
        }
    )
])

# Example output:
{
    "control_text": "Calculator",
    "control_type": "Window",
    "control_rect": {"x": 100, "y": 100, "width": 400, "height": 600}
}
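
A bounding rectangle is often reduced to a center point, for example to decide where a click should land. The sketch below assumes `control_rect` has the x/y/width/height shape shown in the example output. The actual shape may differ by backend, and `rect_center` and `rect_contains` are hypothetical helpers.

```python
from typing import Dict, Tuple


def rect_center(rect: Dict[str, int]) -> Tuple[int, int]:
    """Return the (x, y) center of a rectangle given as x/y/width/height."""
    return rect["x"] + rect["width"] // 2, rect["y"] + rect["height"] // 2


def rect_contains(rect: Dict[str, int], px: int, py: int) -> bool:
    """Return True if the point (px, py) lies inside the rectangle."""
    return (
        rect["x"] <= px < rect["x"] + rect["width"]
        and rect["y"] <= py < rect["y"] + rect["height"]
    )


# Rectangle copied from the example output above
window_rect = {"x": 100, "y": 100, "width": 400, "height": 600}
print(rect_center(window_rect))              # (300, 400)
print(rect_contains(window_rect, 300, 400))  # True
```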

4. get_app_window_controls_info

Get information about all UI controls in the selected application window.

Description

Scans the currently selected window and retrieves information about all interactive controls (buttons, text boxes, etc.). This is essential for understanding what actions can be performed on the window.

Parameters

Parameter	Type	Required	Default	Description
`field_list`	`List[str]`	Yes	-	List of field names to retrieve for each control

Supported Fields

"label": Control identifier/label
"control_text": Text content of the control
"control_type": Type of control (Button, Edit, etc.)
"control_rect": Bounding rectangle
"is_enabled": Whether control is enabled
"is_visible": Whether control is visible

Returns

Type: List[Dict[str, Any]]

List of dictionaries, each representing one UI control.

Example

result = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_app_window_controls_info",
        tool_name="get_app_window_controls_info",
        parameters={
            "field_list": ["label", "control_text", "control_type"]
        }
    )
])

# Example output:
[
    {
        "label": "1",
        "control_text": "Submit",
        "control_type": "Button"
    },
    {
        "label": "2",
        "control_text": "",
        "control_type": "Edit"
    }
]
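
The returned list is easier to work with when indexed by label or filtered by control type. A minimal sketch, assuming each control dictionary was requested with the "label" and "control_type" fields. `index_by_label` and `controls_of_type` are hypothetical helpers.

```python
from typing import Any, Dict, List


def index_by_label(controls: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each control's label to its dictionary; controls without a label are skipped."""
    return {c["label"]: c for c in controls if "label" in c}


def controls_of_type(controls: List[Dict[str, Any]], control_type: str) -> List[Dict[str, Any]]:
    """Return only the controls whose control_type matches exactly."""
    return [c for c in controls if c.get("control_type") == control_type]


# Sample data copied from the example output above
controls = [
    {"label": "1", "control_text": "Submit", "control_type": "Button"},
    {"label": "2", "control_text": "", "control_type": "Edit"},
]

by_label = index_by_label(controls)
print(by_label["1"]["control_text"])                              # Submit
print([c["label"] for c in controls_of_type(controls, "Edit")])  # ['2']
```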

5. get_app_window_controls_target_info

Get TargetInfo objects for all controls in the selected window.

Description

Similar to get_app_window_controls_info, but returns structured TargetInfo objects for internal framework use.

Parameters

Parameter	Type	Required	Default	Description
`field_list`	`List[str]`	Yes	-	List of field names to retrieve

Returns

Type: List[TargetInfo]

List of TargetInfo objects, each with:

kind: TargetKind.CONTROL
id: Control identifier
name: Control text
type: Control type
rect: Bounding rectangle
source: "uia"

6. capture_window_screenshot

Capture a screenshot of the currently selected application window.

Description

Takes a screenshot of the active window and returns it as base64-encoded image data. This is crucial for visual observation and LLM vision capabilities.

Parameters

None

Returns

Type: str

Base64-encoded PNG image data.

Example

result = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::capture_window_screenshot",
        tool_name="capture_window_screenshot",
        parameters={}
    )
])

# Result is base64 string: "iVBORw0KGgoAAAANSUhEUgAA..."
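
Because the tool returns either base64 PNG data or one of the error strings listed under Error Handling for this tool, the caller usually has to tell the two apart before decoding. The sketch below assumes the result string has already been extracted from the tool result. `decode_screenshot` is a hypothetical helper built only on the Python standard library.

```python
import base64
import binascii

# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(result_text: str) -> bytes:
    """Decode a screenshot result to PNG bytes, raising on error strings or bad data."""
    if result_text.startswith("Error"):
        # Error strings documented for this tool begin with "Error".
        raise RuntimeError(result_text)
    try:
        data = base64.b64decode(result_text, validate=True)
    except binascii.Error as exc:
        raise ValueError("result is not valid base64") from exc
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return data


# Stand-in payload: a PNG signature plus filler bytes, not a real image.
sample = base64.b64encode(PNG_SIGNATURE + b"\x00" * 4).decode("ascii")
png_bytes = decode_screenshot(sample)
print(len(png_bytes))  # 12
```

The decoded bytes can then be written to a `.png` file or passed to an image library.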

Error Handling

Returns an error message string if screenshot capture fails:

"Error: No window selected"
"Error capturing screenshot: {error_details}"

7. capture_desktop_screenshot

Capture a screenshot of the entire desktop or primary screen.

Description

Takes a screenshot of the desktop environment, either all monitors or just the primary screen.

Parameters

Parameter	Type	Required	Default	Description
`all_screens`	`bool`	No	`True`	Capture all screens (True) or primary screen only (False)

Returns

Type: str

Base64-encoded PNG image data of the desktop screenshot.

Example

# Capture all screens
result = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::capture_desktop_screenshot",
        tool_name="capture_desktop_screenshot",
        parameters={"all_screens": True}
    )
])

# Capture primary screen only
result = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::capture_desktop_screenshot",
        tool_name="capture_desktop_screenshot",
        parameters={"all_screens": False}
    )
])

8. get_ui_tree

Get the complete UI tree structure for the selected window.

Description

Retrieves the hierarchical structure of all UI elements in the window as a tree. This provides deep insight into the window's layout and control relationships.

Parameters

None

Returns

Type: Dict[str, Any]

UI tree structure as a nested dictionary representing the control hierarchy.

Example

result = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_ui_tree",
        tool_name="get_ui_tree",
        parameters={}
    )
])

# Example output (simplified):
{
    "control_type": "Window",
    "name": "Calculator",
    "children": [
        {
            "control_type": "Pane",
            "name": "Display",
            "children": [...]
        },
        {
            "control_type": "Button",
            "name": "1"
        }
    ]
}

Error Handling

Returns an error dictionary if UI tree extraction fails:

{"error": "No window selected"}
{"error": "Error getting UI tree: {details}"}
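
The nested dictionary can be traversed recursively to search the hierarchy. The sketch below assumes the node shape from the simplified example output ("control_type", "name", optional "children") and the error dictionary shape shown above. `iter_nodes` and `names_of_type` are hypothetical helpers.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_nodes(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (depth, node) for every node, visiting parents before children."""
    yield depth, node
    for child in node.get("children", []):
        yield from iter_nodes(child, depth + 1)


def names_of_type(tree: Dict[str, Any], control_type: str) -> List[str]:
    """Return the names of all nodes with the given control type."""
    if "error" in tree:
        # The tool reports failures as {"error": "..."} instead of a tree.
        raise RuntimeError(tree["error"])
    return [
        node.get("name", "")
        for _, node in iter_nodes(tree)
        if node.get("control_type") == control_type
    ]


# Sample tree modeled on the simplified example output
tree = {
    "control_type": "Window",
    "name": "Calculator",
    "children": [
        {"control_type": "Pane", "name": "Display", "children": []},
        {"control_type": "Button", "name": "1"},
    ],
}

print(names_of_type(tree, "Button"))                    # ['1']
print([depth for depth, _ in iter_nodes(tree)])         # [0, 1, 1]
```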

Configuration

Basic Configuration

HostAgent:
  default:
    data_collection:
      - namespace: UICollector
        type: local
        reset: false

AppAgent:
  default:
    data_collection:
      - namespace: UICollector
        type: local
        reset: false

Configuration Options

Option	Type	Description
`namespace`	`str`	Must be `"UICollector"`
`type`	`str`	Deployment type: `"local"`
`reset`	`bool`	Whether to reset server state between tasks
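
The constraints in the options table can be checked before a configuration is handed to the framework. This is a sketch only: `validate_entry` is a hypothetical helper, it works on an already-parsed entry (a plain dictionary), and treating `reset` as optional is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List


def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one data_collection entry (empty if none)."""
    problems = []
    if entry.get("namespace") != "UICollector":
        problems.append('namespace must be "UICollector"')
    if entry.get("type") != "local":
        problems.append('type must be "local"')
    # Assumption: a missing reset key is acceptable; a present one must be a bool.
    if "reset" in entry and not isinstance(entry["reset"], bool):
        problems.append("reset must be a bool")
    return problems


good = {"namespace": "UICollector", "type": "local", "reset": False}
bad = {"namespace": "UICollector", "type": "http", "reset": "no"}

print(validate_entry(good))  # []
print(validate_entry(bad))   # ['type must be "local"', 'reset must be a bool']
```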

Internal State

The UICollector maintains shared state across operations:

photographer: Screenshot capture facade
control_inspector: UI control inspection facade
selected_app_window: Currently selected window (set by HostUIExecutor)
last_app_windows: Cached list of desktop windows
control_dict: Dictionary mapping control IDs to control objects

Usage Patterns

Pattern 1: Complete Desktop Observation

# 1. Get all windows
windows = await computer.run_actions([
    MCPToolCall(tool_key="data_collection::get_desktop_app_info", ...)
])

# 2. Capture desktop screenshot
screenshot = await computer.run_actions([
    MCPToolCall(tool_key="data_collection::capture_desktop_screenshot", ...)
])

# 3. Select target window
await computer.run_actions([
    MCPToolCall(
        tool_key="action::select_application_window",
        parameters={"id": "1", "name": "Calculator"}
    )
])

# 4. Get window controls
controls = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_app_window_controls_info",
        parameters={"field_list": ["label", "control_text", "control_type"]}
    )
])

Pattern 2: Window-Specific Observation

# After window is selected by HostUIExecutor...

# Get window info
window_info = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_app_window_info",
        parameters={"field_list": ["control_text", "control_rect"]}
    )
])

# Get window screenshot
screenshot = await computer.run_actions([
    MCPToolCall(tool_key="data_collection::capture_window_screenshot", ...)
])

# Get UI controls
controls = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_app_window_controls_info",
        parameters={"field_list": ["label", "control_text"]}
    )
])

Best Practices

1. Caching Window Lists

# First call: refresh windows
windows = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_desktop_app_info",
        parameters={"refresh_app_windows": True}
    )
])

# Subsequent calls: use cached data
windows = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_desktop_app_info",
        parameters={"refresh_app_windows": False}  # Faster
    )
])

2. Selective Field Retrieval

# ✅ Good: Only request needed fields
controls = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_app_window_controls_info",
        parameters={"field_list": ["label", "control_text"]}
    )
])

# ❌ Bad: Requesting fields you don't need
controls = await computer.run_actions([
    MCPToolCall(
        tool_key="data_collection::get_app_window_controls_info",
        parameters={"field_list": [
            "label", "control_text", "control_type", "control_rect",
            "is_visible", "is_enabled", "automation_id", "class_name"
        ]}  # Too many fields slow down processing
    )
])

3. Error Handling

# Always check for window selection
window_info = await computer.run_actions([
    MCPToolCall(tool_key="data_collection::get_app_window_info", ...)
])

if "error" in window_info[0].content[0].text:
    # No window selected: select a window first, then retry
    ...

Related Documentation

Data Collection Overview - Data collection concepts
HostUIExecutor - Window selection server
AppUIExecutor - UI action execution
Local Servers - Local server deployment
