Device Agent Architecture

Device Agents are the execution engines of UFO3's multi-device orchestration system. Each device agent operates as an autonomous, intelligent controller that translates high-level user intentions into low-level system commands. The architecture is designed for extensibility, safety, and scalability across heterogeneous computing environments.

Overview

UFO3 orchestrates tasks across multiple devices through a network of Device Agents. Originally designed as a Windows automation framework (UFO2), the architecture has evolved to support diverse platforms including Linux, macOS, and embedded systems. This document describes the abstract design principles and interfaces that enable this multi-platform capability.

Key Capabilities:

Multi-Platform: Windows agents (HostAgent, AppAgent), Linux agent, extensible to macOS and embedded systems
Safe Execution: Server-client separation isolates reasoning from system-level operations
Scalable Architecture: Hierarchical agent coordination supports complex cross-device workflows
LLM-Driven Reasoning: Dynamic decision-making using large language models
Modular Design: Three-layer architecture (State, Strategy, Command) enables customization

Three-Layer Architecture

Device agents implement a three-layer framework that separates concerns, promotes modularity, and enables extensibility:

graph TB subgraph "Device Agent Architecture" subgraph "Level-1: State Layer (FSM)" S1[AgentState] S2[State Machine] S3[State Transitions] S1 --> S2 --> S3 end subgraph "Level-2: Strategy Layer (Execution Logic)" P1[ProcessorTemplate Strategy Orchestrator] P2[DATA_COLLECTION Strategies] P3[LLM_INTERACTION Strategies] P4[ACTION_EXECUTION Strategies] P5[MEMORY_UPDATE Strategies] P1 -->|manages & executes| P2 P2 --> P3 --> P4 --> P5 end subgraph "Level-3: Command Layer (System Interface)" C1[CommandDispatcher] C2[MCP Tools] C3[Atomic Commands] C1 --> C2 --> C3 end S3 -->|delegates to| P1 P5 -->|executes via| C1 end LLM[Large Language Model] P3 -.->|reasoning| LLM LLM -.->|decisions| P4

Layer Responsibilities

Layer	Level	Responsibility	Key Components	Extensibility
State	Level-1	Finite State Machine governing agent lifecycle	`AgentState`, `AgentStateManager`, `AgentStatus`	Register new states via `@AgentStateManager.register`
Strategy	Level-2	Execution logic layer: processor manages sequence of modular strategies	`ProcessorTemplate`, `ProcessingStrategy`, `ProcessingPhase`, `Middleware`	Compose custom strategies via `ComposedStrategy`, add middleware
Command	Level-3	Atomic system operations mapped to MCP tools	`BasicCommandDispatcher`, `Command`, MCP integration	Add new tools via client-side MCP server registration

Design Rationale:

The three-layer separation ensures:

State Layer (Level-1): Controls when and what to execute (state transitions, agent handoff)
Strategy Layer (Level-2): Defines how to execute (processor orchestrates modular strategies)
Command Layer (Level-3): Performs actual execution (deterministic system operations)

This separation allows replacing individual layers without affecting others.

Level-1: State Layer (FSM)

The State Layer implements a Finite State Machine (FSM) that governs the agent's execution lifecycle. Each state encapsulates:

A processor (strategy execution logic)
Transition rules (to next state)
Agent handoff logic (for multi-agent workflows)

stateDiagram-v2 [*] --> CONTINUE CONTINUE --> CONTINUE: Success CONTINUE --> PENDING: Wait for external event CONTINUE --> CONFIRM: User confirmation needed CONTINUE --> SCREENSHOT: Capture observation CONTINUE --> FINISH: Task complete CONTINUE --> FAIL: Error occurred CONTINUE --> ERROR: Critical failure PENDING --> CONTINUE: Event received CONFIRM --> CONTINUE: User confirmed CONFIRM --> FAIL: User rejected SCREENSHOT --> CONTINUE: Screenshot captured FINISH --> [*] FAIL --> [*] ERROR --> [*]

AgentStatus Enum

class AgentStatus(Enum):
    """Agent status enumeration"""
    ERROR = "ERROR"            # Critical error occurred
    FINISH = "FINISH"          # Task completed successfully
    CONTINUE = "CONTINUE"      # Normal execution
    FAIL = "FAIL"              # Task failed
    PENDING = "PENDING"        # Waiting for external event
    CONFIRM = "CONFIRM"        # Awaiting user confirmation
    SCREENSHOT = "SCREENSHOT"  # Screenshot capture needed

State Registration:

New states can be registered dynamically using the @AgentStateManager.register decorator:

@AgentStateManager.register
class CustomState(AgentState):
    async def handle(self, agent, context):
        # Custom state logic
        pass

    def next_state(self, agent):
        return AgentStateManager.get_state("CONTINUE")

See State Layer Documentation for complete details.

Level-2: Strategy Layer (Execution Logic)

The Strategy Layer implements the execution logic within each state. Each state encapsulates a processor that manages a sequence of strategies to implement step-level workflow. This layer consists of two key components:

Processor: Strategy Orchestrator

The ProcessorTemplate orchestrates the execution of strategies:

Registers Strategies: Configures which strategies execute in each phase
Manages Middleware: Wraps strategy execution with logging, metrics, error handling
Validates Dependencies: Ensures strategies have required data before execution
Controls Execution: Sequences strategies through fixed workflow phases

Processor and Strategy Relationship:

Processor: Framework that manages the sequence of strategies
Strategy: Modular, reusable execution units

Together they form Level-2: Strategy Layer, which handles: - Data collection and environment inspection - Prompt construction and LLM reasoning - Action planning and tool invocation - Memory updates and context synchronization

Strategy: Modular Execution Units

ProcessingStrategies are modular execution units with a unified execute() interface:

graph LR A[DATA_COLLECTION] --> B[LLM_INTERACTION] B --> C[ACTION_EXECUTION] C --> D[MEMORY_UPDATE] A1[Screenshots UI Info System Status] --> A B1[Prompt Construction LLM Call Response Parsing] --> B C1[Command Dispatch MCP Execution Result Handling] --> C D1[Memory Items Blackboard Update Context Sync] --> D

Four Core Strategy Types

Strategy Type	ProcessingPhase	Purpose	Examples
DATA_COLLECTION	`data_collection`	Gather contextual information	Screenshot capture, UI tree extraction, system info
LLM_INTERACTION	`llm_interaction`	Construct prompts, interact with LLM, parse responses	Prompt building, LLM reasoning, JSON parsing
ACTION_EXECUTION	`action_execution`	Execute commands from LLM/toolkits	Click, type, scroll, API calls
MEMORY_UPDATE	`memory_update`	Update short-term/long-term memory	Add memory items, update blackboard, sync context

Strategy Layer Configuration Example:

Each state configures its processor with strategies and middleware:

class AppAgentProcessor(ProcessorTemplate):
    def _setup_strategies(self):
        # Register strategies for each phase
        self.strategies[ProcessingPhase.DATA_COLLECTION] = ComposedStrategy([
            AppScreenshotCaptureStrategy(),
            AppControlInfoStrategy()
        ])
        self.strategies[ProcessingPhase.LLM_INTERACTION] = AppLLMInteractionStrategy()
        self.strategies[ProcessingPhase.ACTION_EXECUTION] = AppActionExecutionStrategy()
        self.strategies[ProcessingPhase.MEMORY_UPDATE] = AppMemoryUpdateStrategy()

    def _setup_middleware(self):
        # Add middleware for logging, metrics, error handling
        self.middleware_chain = [
            LoggingMiddleware(),
            PerformanceMetricsMiddleware(),
            ErrorHandlingMiddleware()
        ]

See Processor Documentation and Strategy Documentation for complete details.

Level-3: Command Layer (System Interface)

The Command Layer provides atomic, deterministic system operations. Each command maps to an MCP tool that executes on the device client.

sequenceDiagram participant Agent as Device Agent (Server) participant Dispatcher as CommandDispatcher participant Protocol as AIP Protocol participant Client as Device Client participant MCP as MCP Tool Agent->>Dispatcher: execute_commands([command1, command2]) Dispatcher->>Protocol: Send ServerMessage (COMMAND) Protocol->>Client: WebSocket (AIP) Client->>MCP: Route to MCP server MCP->>MCP: Execute tool function MCP->>Client: Return result Client->>Protocol: Send ClientMessage (RESULT) Protocol->>Dispatcher: Receive results Dispatcher->>Agent: Return List[Result]

Command Structure

@dataclass
class Command:
    """Atomic command to be executed on device client"""
    tool_name: str                   # MCP tool name (e.g., "click_element")
    parameters: Dict[str, Any]       # Tool arguments
    tool_type: str                   # "data_collection" or "action"
    call_id: str                     # Unique identifier

Deterministic Execution

Commands are designed to be:

Atomic: Single, indivisible operation
Deterministic: Same inputs → same outputs
Auditable: Full command history logged
Reversible: Where possible, support undo operations

Extensibility:

New commands can be added by:

Registering MCP tool on device client
LLM dynamically selects tool from available MCP registry
No server-side code changes required

See Command Layer Documentation for complete details.

Server-Client Architecture

Device agents use a server-client separation to balance safety, scalability, and functionality:

graph TB subgraph "Server Side (UFO3 Orchestrator)" Server[Device Agent Server] State[State Machine] Processor[Strategy Processor] LLM[LLM Service] Memory[Memory & Context] Server --> State Server --> Processor Server --> Memory Processor -.-> LLM end subgraph "Communication Layer" AIP[AIP Protocol WebSocket] end subgraph "Client Side (Device)" Client[Device Client] Dispatcher[Command Dispatcher] MCP[MCP Server Manager] Tools[MCP Tools] OS[Operating System] Client --> Dispatcher Dispatcher --> MCP MCP --> Tools Tools --> OS end Server <-->|Commands/Results| AIP AIP <-->|Commands/Results| Client

Separation of Concerns

Component	Location	Responsibilities	Security Boundary
Agent Server	Orchestrator	State management, reasoning, planning, memory	Untrusted (LLM-driven decisions)
Device Client	Device	Command execution, MCP tool calls, resource access	Trusted (validated operations)
AIP Protocol	Communication	Message serialization, WebSocket transport, error handling	Secure channel (authentication, encryption)

Why Server-Client Separation?

Safety: Isolates potentially unsafe LLM-generated decisions from direct system access. Clients validate all commands before execution.

Scalability: Single orchestrator server manages multiple device clients. Reduces per-device resource requirements.

Flexibility: Device clients can run on resource-constrained devices (embedded systems, mobile) while heavy reasoning occurs on server.

See Server-Client Architecture for complete details.

Supported Device Platforms

UFO3 currently supports Windows and Linux device agents, with architecture designed for extensibility to other platforms.

Windows Agents

HostAgent (Application-Level Coordinator): - Selects appropriate application(s) for user request - Decomposes tasks into application-specific subtasks - Coordinates multiple AppAgents - Manages application switching and data transfer

AppAgent (Application-Level Executor): - Controls specific Windows application (Word, Excel, browser, etc.) - Uses UI Automation for control element discovery - Executes application-specific actions (type, click, scroll) - Maintains application context and memory

Windows Agent Example

User Request: "Create a chart from sales.xlsx and insert into report.docx"

HostAgent decomposes:
- Open Excel → Create chart → Copy chart
- Open Word → Paste chart
AppAgent (Excel): Opens sales.xlsx, creates chart, copies to clipboard
AppAgent (Word): Opens report.docx, pastes chart at cursor

Linux Agent

graph TB subgraph "Linux Device (Single-Tier Architecture)" Linux[LinuxAgent Direct System Control] Shell[Shell Commands] Files[File Operations] Apps[Application Launch] Linux --> Shell Linux --> Files Linux --> Apps end User[User Request] --> Linux

LinuxAgent (System-Level Executor): - Direct shell command execution - File system operations - Application launch and management - Single-tier architecture (no application-level hierarchy)

Architecture Difference

Windows uses two-tier hierarchy (HostAgent → AppAgent) due to:

UI Automation framework's application-centric model
Distinct application contexts requiring specialized agents

Linux uses single-tier architecture because:

Shell provides unified interface to all system operations
Application control occurs through same command-line interface

Platform Comparison

Feature	Windows (UFO2)	Linux	macOS (Future)	Embedded (Future)
Agent Hierarchy	Two-tier (Host → App)	Single-tier	TBD	Single-tier
UI Control	UI Automation	X11/Wayland	Accessibility API	Platform-specific
Command Interface	MCP tools (Win32 API)	MCP tools (Shell)	MCP tools (AppleScript)	MCP tools (Custom)
Observation	Screenshot + UI tree	Screenshot + Shell output	Screenshot + UI tree	Sensor data
State Management	Shared FSM	Shared FSM	Shared FSM	Shared FSM
Strategy Layer	Processor framework	Processor framework	Processor framework	Processor framework
Current Status	✅ Production	✅ Production	🔜 Planned	🔜 Planned

Extensibility Path:

Adding a new platform requires:

Implement Agent Class: Extend BasicAgent (inherit State layer, Processor framework)
Create Processor: Subclass ProcessorTemplate, implement platform-specific strategies
Define MCP Tools: Register platform-specific MCP tools on device client
Register Agent: Use @AgentRegistry.register decorator

No changes to core State layer, Processor framework, or AIP protocol required.

See Agent Types Documentation for complete implementation details.

Agent Lifecycle

A typical device agent execution follows this lifecycle:

sequenceDiagram participant User participant Orchestrator participant Agent participant State participant Processor participant LLM participant Dispatcher participant Client User->>Orchestrator: Submit task Orchestrator->>Agent: Initialize agent (CONTINUE state) loop Until FINISH/FAIL/ERROR Agent->>State: handle(agent, context) State->>Processor: execute strategies Processor->>Processor: DATA_COLLECTION Note over Processor: Screenshot, UI info Processor->>LLM: LLM_INTERACTION LLM-->>Processor: Action decision Processor->>Dispatcher: ACTION_EXECUTION Dispatcher->>Client: Execute commands Client-->>Dispatcher: Results Dispatcher-->>Processor: Results Processor->>Processor: MEMORY_UPDATE Note over Processor: Update memory, blackboard State->>State: next_state(agent) State->>Agent: Update agent status end Agent->>Orchestrator: Task complete/failed Orchestrator->>User: Return result

Execution Phases

Initialization: Agent created with default state (CONTINUE), processor, memory
State Handling: Current state's handle() method invoked with agent and context
Strategy Execution: Processor runs strategies in sequence (DATA_COLLECTION → LLM_INTERACTION → ACTION_EXECUTION → MEMORY_UPDATE)
State Transition: State's next_state() determines next FSM state
Repeat/Terminate: Loop continues until terminal state (FINISH, FAIL, ERROR)

Multi-Agent Handoff

For multi-agent scenarios (e.g., Windows HostAgent → AppAgent), states implement next_agent():

def next_agent(self, agent: BasicAgent) -> BasicAgent:
    # HostAgent delegates to AppAgent
    if agent.status == "DELEGATE_TO_APP":
        return agent.create_app_agent(...)
    return agent

Memory and Context Management

Device agents maintain two types of memory:

Short-Term Memory (Agent Memory)

Purpose: Track agent's execution history within a session

Implementation: Memory class with MemoryItem entries

class Memory:
    """Agent's short-term memory"""
    _content: List[MemoryItem]

    def add_memory_item(self, memory_item: MemoryItem):
        """Add new memory entry"""
        self._content.append(memory_item)

Content: Actions taken, observations made, results received

Lifetime: Single session (cleared between tasks)

Long-Term Memory (Blackboard)

Purpose: Share information across agents and sessions

Implementation: Blackboard class with multiple memory types

class Blackboard:
    """Multi-agent shared memory"""
    _questions: Memory      # Q&A history
    _requests: Memory       # Request history
    _trajectories: Memory   # Action trajectories
    _screenshots: Memory    # Visual observations

Content: Common knowledge, successful action patterns, user preferences

Lifetime: Persistent across sessions (can be saved/loaded)

Blackboard Usage Example:

Scenario: HostAgent delegates to AppAgent (Excel)

HostAgent adds to blackboard:
- Request: "Create sales chart"
- Context: Previous analysis results
AppAgent reads from blackboard:
- Retrieves request and context
- Adds action trajectories as executed
- Adds screenshot after chart creation
HostAgent reads updated blackboard:
- Verifies chart creation
- Continues to next step (insert to Word)

See Memory System Documentation for complete details.

Integration with UFO3 Components

Device agents integrate with other UFO3 components:

Integration Points

Component	Relationship	Description
Session Manager	Parent	Creates agents, manages agent lifecycle, coordinates multi-agent workflows
Round Manager	Sibling	Manages round-based execution, tracks round state, synchronizes with agent steps
Global Context	Shared State	Agent reads request/config, writes results/status, shares data across components
Command Dispatcher	Execution Interface	Agent sends commands, dispatcher routes to client, returns results
AIP Protocol	Communication	Serializes commands/results, manages WebSocket, handles errors/timeouts
Device Client	Executor	Receives commands, invokes MCP tools, returns results
MCP Servers	Tool Registry	Provides available tools, executes tool functions, returns structured results

See Session Documentation, Context Documentation, and AIP Protocol for integration details.

Design Patterns

Device agent architecture leverages several design patterns:

1. State Pattern (FSM Layer)

Purpose: Encapsulate state-specific behavior, enable dynamic state transitions

Implementation: AgentState abstract class, concrete state classes

class AgentState(ABC):
    @abstractmethod
    async def handle(self, agent, context):
        """Execute state-specific logic"""
        pass

    @abstractmethod
    def next_state(self, agent):
        """Determine next state"""
        pass

2. Strategy Pattern (Strategy Layer)

Purpose: Define family of algorithms (strategies), make them interchangeable

Implementation: ProcessingStrategy protocol, concrete strategy classes

class ProcessingStrategy(Protocol):
    async def execute(self, agent, context) -> ProcessingResult:
        """Execute strategy logic"""
        pass

3. Template Method Pattern (Processor Framework)

Purpose: Define skeleton of algorithm, let subclasses override specific steps

Implementation: ProcessorTemplate abstract class

class ProcessorTemplate(ABC):
    @abstractmethod
    def _setup_strategies(self):
        """Subclass defines which strategies to use"""
        pass

    async def process(self, agent, context):
        """Template method - runs strategies in sequence"""
        for phase, strategy in self.strategies.items():
            result = await strategy.execute(agent, context)
            # Handle result, update context

4. Singleton Pattern (State Manager)

Purpose: Ensure single instance of state registry

Implementation: AgentStateManager with metaclass

class AgentStateManager(ABC, metaclass=SingletonABCMeta):
    _state_mapping: Dict[str, Type[AgentState]] = {}

    def get_state(self, status: str) -> AgentState:
        """Lazy load and return state instance"""
        pass

5. Registry Pattern (Agent Registration)

Purpose: Register agent types, enable dynamic agent creation

Implementation: AgentRegistry decorator

@AgentRegistry.register(agent_name="appagent", processor_cls=AppAgentProcessor)
class AppAgent(BasicAgent):
    pass

6. Blackboard Pattern (Multi-Agent Coordination)

Purpose: Share data across multiple agents

Implementation: Blackboard class

class Blackboard:
    _questions: Memory
    _requests: Memory
    _trajectories: Memory
    _screenshots: Memory

Best Practices

State Design

Keep states focused: Each state should have single, clear responsibility
Use rule-based transitions for deterministic flows, LLM-driven transitions for adaptive behavior
Implement error states for graceful degradation
Document state invariants and transition conditions

Strategy Design

Keep strategies atomic: Each strategy should perform one cohesive task
Declare dependencies explicitly using get_dependencies()
Use ComposedStrategy to combine multiple strategies within a phase
Implement fail-fast for critical errors, continue-on-error for optional operations

Command Design

Keep commands atomic: Single, indivisible operation
Design commands to be idempotent where possible
Validate arguments on client side before execution
Return structured results with success/failure status

Memory Management

Use short-term memory for agent-specific execution history
Use blackboard for multi-agent coordination and persistent knowledge
Clear memory between sessions to avoid context pollution
Implement memory pruning for long-running sessions

Security Considerations

Validate all commands on client side before execution
Sanitize LLM outputs before converting to commands
Limit command scope via MCP tool permissions
Audit all actions for compliance and debugging
Isolate agents to prevent unauthorized cross-agent access

Deep Dive Into Layers:

State Layer Documentation: FSM, AgentState, transitions, state registration
Processor and Strategy Documentation: ProcessorTemplate, strategies, dependency management
Command Layer Documentation: CommandDispatcher, MCP integration, atomic commands

Supporting Systems:

Memory System Documentation: Memory, MemoryItem, Blackboard patterns
Agent Types Documentation: Windows agents, Linux agent, platform-specific implementations

Integration Points:

Server-Client Architecture: Server and client separation, communication patterns
Server Architecture: Agent server, WebSocket manager, orchestration
Client Architecture: Device client, MCP servers, command execution
AIP Protocol: Agent Interaction Protocol for server-client communication
MCP Integration: Model Context Protocol for tool execution

Summary

Key Takeaways:

✅ Three-Layer Architecture: State (FSM) → Strategy (Execution Logic) → Command (System Interface)

✅ Server-Client Separation: Safe isolation of reasoning (server) from execution (client)

✅ Multi-Platform Support: Windows (two-tier), Linux (single-tier), extensible to macOS and embedded

✅ LLM-Driven Reasoning: Dynamic decision-making with structured command output

✅ Modular & Extensible: Register new states, compose strategies, add MCP tools without core changes

✅ Memory Systems: Short-term (agent memory) and long-term (blackboard) for coordination

✅ Design Patterns: State, Strategy, Template Method, Singleton, Registry, Blackboard

The Device Agent architecture provides a robust, extensible foundation for multi-device automation. By separating concerns across three layers and isolating reasoning from execution, UFO3 achieves both safety and flexibility for orchestrating complex cross-device workflows.

Reference

Below is the reference for the BasicAgent class. All device agents inherit from BasicAgent and implement platform-specific processors and states:

Bases: ABC

The BasicAgent class is the abstract class for the agent.

Initialize the BasicAgent.

Parameters:	`name` (`str`) – The name of the agent.

Source code in agents/agent/basic.py

def __init__(self, name: str) -> None:
    """
    Initialize the BasicAgent.
    :param name: The name of the agent.
    """
    self._step = 0
    self._complete = False
    self._name = name
    self._status = self.status_manager.CONTINUE.value
    self._register_self()
    self.retriever_factory = retriever.RetrieverFactory()
    self._memory = Memory()
    self._host = None
    self._processor: Optional[ProcessorTemplate] = None
    self._state = None
    self.logger = logging.getLogger(__name__)

    # Initialize presenter for output formatting
    from ufo.agents.presenters import PresenterFactory

    ufo_config = get_ufo_config()
    presenter_type = ufo_config.system.output_presenter
    self.presenter = PresenterFactory.create_presenter(presenter_type)

`blackboard` `property`

Get the blackboard.

Returns:	`Blackboard` – The blackboard.

`default_state` `property`

Get the default state of the agent.

Returns:	`AgentState` – The default state of the agent.

`host` `property` `writable`

Get the host of the agent.

Returns:	`HostAgent` – The host of the agent.

`memory` `property` `writable`

Get the memory of the agent.

Returns:	`Memory` – The memory of the agent.

`name` `property`

Get the name of the agent.

Returns:	`str` – The name of the agent.

`processor` `property` `writable`

Get the processor.

Returns:	`ProcessorTemplate` – The processor.

`state` `property`

Get the state of the agent.

Returns:	`AgentState` – The state of the agent.

`status` `property` `writable`

Get the status of the agent.

Returns:	`str` – The status of the agent.

`status_manager` `property`

Get the status manager.

Returns:	`AgentStatus` – The status manager.

`step` `property` `writable`

Get the step of the agent.

Returns:	`int` – The step of the agent.

`add_memory(memory_item)`

Update the memory of the agent.

Parameters:	`memory_item` (`MemoryItem`) – The memory item to add.

Source code in agents/agent/basic.py

def add_memory(self, memory_item: MemoryItem) -> None:
    """
    Update the memory of the agent.
    :param memory_item: The memory item to add.
    """
    self._memory.add_memory_item(memory_item)

`build_experience_retriever()`

Build the experience retriever.

Source code in agents/agent/basic.py

def build_experience_retriever(self) -> None:
    """
    Build the experience retriever.
    """
    pass

`build_human_demonstration_retriever()`

Build the human demonstration retriever.

Source code in agents/agent/basic.py

def build_human_demonstration_retriever(self) -> None:
    """
    Build the human demonstration retriever.
    """
    pass

`build_offline_docs_retriever()`

Build the offline docs retriever.

Source code in agents/agent/basic.py

def build_offline_docs_retriever(self) -> None:
    """
    Build the offline docs retriever.
    """
    pass

`build_online_search_retriever()`

Build the online search retriever.

Source code in agents/agent/basic.py

def build_online_search_retriever(self) -> None:
    """
    Build the online search retriever.
    """
    pass

`clear_memory()`

Clear the memory of the agent.

Source code in agents/agent/basic.py

def clear_memory(self) -> None:
    """
    Clear the memory of the agent.
    """
    self._memory.clear()

`context_provision()` `abstractmethod` `async`

Provide the context for the agent.

Source code in agents/agent/basic.py

@abstractmethod
async def context_provision(self) -> None:
    """
    Provide the context for the agent.
    """
    pass

`delete_memory(step)`

Delete the memory of the agent.

Parameters:	`step` (`int`) – The step of the memory item to delete.

Source code in agents/agent/basic.py

def delete_memory(self, step: int) -> None:
    """
    Delete the memory of the agent.
    :param step: The step of the memory item to delete.
    """
    self._memory.delete_memory_item(step)

`get_cls(name)` `classmethod`

Retrieves an agent class from the registry.

Parameters:	`name` (`str`) – The name of the agent class.

Returns:	`Type['BasicAgent']` – The agent class.

Source code in agents/agent/basic.py

@classmethod
def get_cls(cls, name: str) -> Type["BasicAgent"]:
    """
    Retrieves an agent class from the registry.
    :param name: The name of the agent class.
    :return: The agent class.
    """
    return AgentRegistry().get_cls(name)

`get_command_string(command_name, params)` `staticmethod`

Generate a function call string.

Parameters:	`command_name` (`str`) – The function name. `params` (`Dict[str, str]`) – The arguments as a dictionary.

Returns:	`str` – The function call string.

Source code in agents/agent/basic.py

@staticmethod
def get_command_string(command_name: str, params: Dict[str, str]) -> str:
    """
    Generate a function call string.
    :param command_name: The function name.
    :param params: The arguments as a dictionary.
    :return: The function call string.
    """
    # Format the arguments
    args_str = ", ".join(f"{k}={v!r}" for k, v in params.items())

    # Return the function call string
    return f"{command_name}({args_str})"

`get_prompter()` `abstractmethod`

Get the prompt for the agent.

Returns:	`str` – The prompt.

Source code in agents/agent/basic.py

@abstractmethod
def get_prompter(self) -> str:
    """
    Get the prompt for the agent.
    :return: The prompt.
    """
    pass

`get_response(message, namescope, use_backup_engine)` `classmethod`

Get the response for the prompt.

Parameters:	`message` (`List[dict]`) – The message for LLMs. `namescope` (`str`) – The namescope for the LLMs. `use_backup_engine` (`bool`) – Whether to use the backup engine.

Returns:	`Tuple[str, float]` – The response.

Source code in agents/agent/basic.py

@classmethod
def get_response(
    cls,
    message: List[dict],
    namescope: str,
    use_backup_engine: bool,
) -> Tuple[str, float]:
    """
    Get the response for the prompt.
    :param message: The message for LLMs.
    :param namescope: The namescope for the LLMs.
    :param use_backup_engine: Whether to use the backup engine.
    :return: The response.
    """
    response_string, cost = llm_call.get_completion(
        message, namescope, use_backup_engine=use_backup_engine
    )
    return response_string, cost

`handle(context)` `async`

Handle the agent.

Parameters:	`context` (`Context`) – The context for the agent.

Source code in agents/agent/basic.py

async def handle(self, context: Context) -> None:
    """
    Handle the agent.
    :param context: The context for the agent.
    """
    await self.state.handle(self, context)

`message_constructor()` `abstractmethod`

Construct the message.

Returns:	`List[Dict[str, Union[str, List[Dict[str, str]]]]]` – The message.

Source code in agents/agent/basic.py

@abstractmethod
def message_constructor(self) -> List[Dict[str, Union[str, List[Dict[str, str]]]]]:
    """
    Construct the message.
    :return: The message.
    """
    pass

`print_response()`

Print the response.

Source code in agents/agent/basic.py

def print_response(self) -> None:
    """
    Print the response.
    """
    pass

`process(context)` `async`

Process the agent.

Source code in agents/agent/basic.py

async def process(self, context: Context) -> None:
    """
    Process the agent.
    """
    pass

`process_asker(ask_user=True)`

Ask for the process.

Parameters:	`ask_user` (`bool`, default: `True` ) – Whether to ask the user for the questions.

Source code in agents/agent/basic.py

def process_asker(self, ask_user: bool = True) -> None:
    """
    Ask for the process.
    :param ask_user: Whether to ask the user for the questions.
    """

    _ask_message = "Could you please answer the following questions to help me understand your needs and complete the task?"
    _none_answer_message = "The answer for the question is not available, please proceed with your own knowledge or experience, or leave it as a placeholder. Do not ask the same question again."

    if self.processor:
        question_list = self.processor.processing_context.get_local("questions", [])

        if ask_user:
            console.print(
                f"❓ {_ask_message}",
                style="yellow",
            )

        for index, question in enumerate(question_list):
            if ask_user:
                answer = question_asker(question, index + 1)
                if not answer.strip():
                    continue
                qa_pair = {"question": question, "answer": answer}

                ufo_config = get_ufo_config()
                utils.append_string_to_file(
                    ufo_config.system.qa_pair_file, json.dumps(qa_pair)
                )

            else:
                qa_pair = {
                    "question": question,
                    "answer": _none_answer_message,
                }

            self.blackboard.add_questions(qa_pair)

`process_confirmation()` `abstractmethod`

Confirm the process.

Source code in agents/agent/basic.py

@abstractmethod
def process_confirmation(self) -> None:
    """
    Confirm the process.
    """
    pass

`process_resume()` `async`

Resume the process.

Source code in agents/agent/basic.py

async def process_resume(self) -> None:
    """
    Resume the process.
    """
    pass

`reflection()`

TODO: Reflect on the action.

Source code in agents/agent/basic.py

def reflection(self) -> None:
    """
    TODO:
    Reflect on the action.
    """
    pass

`response_to_dict(response)` `staticmethod`

Convert the response to a dictionary.

Parameters:	`response` (`str`) – The response.

Returns:	`Dict[str, str]` – The dictionary.

Source code in agents/agent/basic.py

@staticmethod
def response_to_dict(response: str) -> Dict[str, str]:
    """
    Convert the response to a dictionary.
    :param response: The response.
    :return: The dictionary.
    """
    return utils.json_parser(response)

`set_memory_from_list_of_dicts(data)`

Set the memory from the list of dictionaries.

Parameters:	`data` (`List[Dict[str, str]]`) – The list of dictionaries.

Source code in agents/agent/basic.py

def set_memory_from_list_of_dicts(self, data: List[Dict[str, str]]) -> None:
    """
    Set the memory from the list of dictionaries.
    :param data: The list of dictionaries.
    """

    assert isinstance(data, list), "The data should be a list of dictionaries."

    self._memory.from_list_of_dicts(data)

`set_state(state)`

Set the state of the agent.

Parameters:	`state` (`AgentState`) – The state of the agent.

Source code in agents/agent/basic.py

def set_state(self, state: AgentState) -> None:
    """
    Set the state of the agent.
    :param state: The state of the agent.
    """

    assert issubclass(
        type(self), state.agent_class()
    ), f"The state is only for agent type of {state.agent_class()}, but the current agent is {type(self)}."

    self._state = state

Device Agent Architecture

Overview

Three-Layer Architecture

Layer Responsibilities

Level-1: State Layer (FSM)

AgentStatus Enum

Level-2: Strategy Layer (Execution Logic)

Processor: Strategy Orchestrator

Strategy: Modular Execution Units

Four Core Strategy Types

Level-3: Command Layer (System Interface)

Command Structure

Server-Client Architecture

Separation of Concerns

Supported Device Platforms

Windows Agents

Linux Agent

Platform Comparison

Agent Lifecycle

Execution Phases

Memory and Context Management

Short-Term Memory (Agent Memory)

Long-Term Memory (Blackboard)

Integration with UFO3 Components

Integration Points

Design Patterns

1. State Pattern (FSM Layer)

2. Strategy Pattern (Strategy Layer)

3. Template Method Pattern (Processor Framework)

4. Singleton Pattern (State Manager)

5. Registry Pattern (Agent Registration)

6. Blackboard Pattern (Multi-Agent Coordination)

Best Practices

State Design

Strategy Design

Command Design

Memory Management

Related Documentation

Summary

Reference

blackboard property

default_state property

host property writable

memory property writable

name property

processor property writable

state property

status property writable

status_manager property

step property writable

add_memory(memory_item)

build_experience_retriever()

build_human_demonstration_retriever()

build_offline_docs_retriever()

build_online_search_retriever()

clear_memory()

context_provision() abstractmethod async

delete_memory(step)

get_cls(name) classmethod

get_command_string(command_name, params) staticmethod

get_prompter() abstractmethod

get_response(message, namescope, use_backup_engine) classmethod

handle(context) async

message_constructor() abstractmethod

print_response()

process(context) async

process_asker(ask_user=True)

process_confirmation() abstractmethod

process_resume() async

reflection()

response_to_dict(response) staticmethod

set_memory_from_list_of_dicts(data)

set_state(state)

`blackboard` `property`

`default_state` `property`

`host` `property` `writable`

`memory` `property` `writable`

`name` `property`

`processor` `property` `writable`

`state` `property`

`status` `property` `writable`

`status_manager` `property`

`step` `property` `writable`

`add_memory(memory_item)`

`build_experience_retriever()`

`build_human_demonstration_retriever()`

`build_offline_docs_retriever()`

`build_online_search_retriever()`

`clear_memory()`

`context_provision()` `abstractmethod` `async`

`delete_memory(step)`

`get_cls(name)` `classmethod`

`get_command_string(command_name, params)` `staticmethod`

`get_prompter()` `abstractmethod`

`get_response(message, namescope, use_backup_engine)` `classmethod`

`handle(context)` `async`

`message_constructor()` `abstractmethod`

`print_response()`

`process(context)` `async`

`process_asker(ask_user=True)`

`process_confirmation()` `abstractmethod`

`process_resume()` `async`

`reflection()`

`response_to_dict(response)` `staticmethod`

`set_memory_from_list_of_dicts(data)`

`set_state(state)`