Speculative Multi-Action Execution

UFO² introduces Speculative Multi-Action Execution, a feature that allows agents to bundle multiple predicted steps into a single LLM call and validate them against the live application state. This approach can reduce LLM queries by up to 51% compared to inferring each action separately.

Overview

Traditional agent execution follows a sequential pattern: think → act → observe → think → act → observe. Each cycle requires a separate LLM inference, making complex tasks slow and expensive.

Speculative multi-action execution optimizes this by predicting a batch of likely actions upfront, then validating them against the live UI Automation state in a single execution pass.

Key Benefits:

Reduced LLM Calls: Up to 51% fewer inference requests for multi-step tasks
Faster Execution: Batch prediction eliminates per-action round-trips
Lower Costs: Fewer API calls reduce operational expenses
Maintained Accuracy: Live validation ensures actions remain correct

How It Works

When enabled, the agent:

Predicts Action Sequence: Uses contextual understanding to forecast likely next steps (e.g., "Open Excel → Navigate to cell A1 → Enter value → Save")
Validates Against Live State: Checks each predicted action against current UI Automation state
Executes Valid Actions: Runs all validated actions in sequence
Handles Failures Gracefully: Falls back to single-action mode if predictions fail validation
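The four steps above can be sketched as a small loop. This is a minimal illustration, not UFO²'s actual implementation: `PredictedAction`, `run_speculative_batch`, and the toy set of live controls are all hypothetical names introduced here, and real validation would query UI Automation rather than a Python set.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class PredictedAction:
    """Hypothetical stand-in for one predicted step (not UFO²'s real class)."""
    function: str
    arguments: Dict[str, Any]
    target: str  # name of the control the action expects to find


def run_speculative_batch(
    batch: List[PredictedAction],
    control_exists: Callable[[str], bool],
    execute: Callable[[PredictedAction], None],
) -> int:
    """
    Execute predicted actions in order, validating each against live state.
    Stops at the first action whose target is missing and returns how many
    actions ran, so the caller can fall back to single-action mode.
    """
    executed = 0
    for action in batch:
        if not control_exists(action.target):
            break  # prediction no longer matches the UI; stop speculating
        execute(action)
        executed += 1
    return executed


# Toy "live UI": the Save button only appears after text has been typed.
live_controls = {"CellA1"}
log: List[str] = []


def execute(action: PredictedAction) -> None:
    log.append(f"{action.function}({action.target})")
    if action.function == "type_text":
        live_controls.add("SaveButton")


batch = [
    PredictedAction("click", {}, "CellA1"),
    PredictedAction("type_text", {"text": "Sales"}, "CellA1"),
    PredictedAction("click", {}, "SaveButton"),
    PredictedAction("click", {}, "ConfirmDialog"),  # never appears
]

executed = run_speculative_batch(batch, live_controls.__contains__, execute)
needs_fallback = executed < len(batch)  # True: the last prediction was rejected
```

In this sketch the first three actions run from a single prediction, and only the rejected fourth action would need a fresh LLM call.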

Configuration

Enable speculative multi-action execution in config/ufo/system.yaml:

# Action Configuration
ACTION_SEQUENCE: true  # Enable multi-action prediction and execution

Configuration Location: config/ufo/system.yaml (migrated from legacy config_dev.yaml)

For configuration migration details, see Configuration Migration Guide.

Implementation Details

The multi-action system is implemented through two core classes in ufo/agents/processors/schemas/actions.py:

ActionCommandInfo

Represents a single action with execution metadata:

Bases: BaseModel

The action information data.

`model_post_init(__context)`

Initialize the action string.

Source code in agents/processors/schemas/actions.py

def model_post_init(self, __context: Any) -> None:
    """
    Initialize the action string.
    """
    self.action_string = ActionCommandInfo.to_string(self.function, self.arguments)

`to_representation()`

Generate a function call representation string.

Source code in agents/processors/schemas/actions.py

def to_representation(self) -> str:
    """
    Generate a function call representation string.
    """
    components = []
    components.append(f"[Action] {self.action_string}")
    if self.target:
        target_info = ", ".join(
            f"{k}={v}"
            for k, v in self.target.model_dump(exclude_none=True).items()
            if k not in {"rect"}  # rect is not needed in representation
        )
        components.append(f"[Target] {target_info}")

    if self.result:
        components.append(f"[Status] {self.result.status}")
        if self.result.error:
            components.append(f"[Error] {self.result.error}")
        components.append(f"[Result] {self.result.result}")

    return "\n".join(components)

`to_string(command_name, params)` `staticmethod`

Generate a function call string.

Source code in agents/processors/schemas/actions.py

@staticmethod
def to_string(command_name: str, params: Dict[str, Any]) -> str:
    """
    Generate a function call string.
    """
    args_str = ", ".join(f"{k}={v!r}" for k, v in params.items())
    return f"{command_name}({args_str})"

Key Properties:

function: Action name (e.g., click, type_text)
arguments: Action parameters
target: UI element information
result: Execution result with status and error details
action_string: Human-readable representation
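To make the `action_string` format concrete, the snippet below copies the formatting rule from `to_string` into a standalone function so it runs without UFO² installed. The argument names `text` and `clear` and the `save` command are illustrative assumptions, not a documented action schema.

```python
from typing import Any, Dict


def to_string(command_name: str, params: Dict[str, Any]) -> str:
    """Same formatting rule as ActionCommandInfo.to_string shown above."""
    args_str = ", ".join(f"{k}={v!r}" for k, v in params.items())
    return f"{command_name}({args_str})"


# Values are rendered with repr(), so strings keep their quotes.
action_string = to_string("type_text", {"text": "Sales", "clear": True})
print(action_string)  # type_text(text='Sales', clear=True)

# An action without parameters renders as an empty call.
no_args = to_string("save", {})
print(no_args)  # save()
```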

ListActionCommandInfo

Manages sequences of multiple actions:

A sequence of one-step actions.

Source code in agents/processors/schemas/actions.py

def __init__(self, actions: Optional[List[ActionCommandInfo]] = None):

    if actions is None:
        actions = []

    self._actions = actions
    self._length = len(actions)

`actions` `property`

Get the actions.

Returns:	`List[ActionCommandInfo]` – The actions.

`length` `property`

Get the length of the actions.

Returns:	`int` – The length of the actions.

`status` `property`

Get the status of the actions.

Returns:	`str` – The status of the actions.

`add_action(action)`

Add an action.

Parameters:	`action` (`ActionCommandInfo`) – The action.

Source code in agents/processors/schemas/actions.py

def add_action(self, action: ActionCommandInfo) -> None:
    """
    Add an action.
    :param action: The action.
    """
    self._actions.append(action)

`color_print(success_only=False)`

Pretty-print the action sequence using presenter.

Parameters:	`success_only` (`bool`, default: `False` ) – Whether to print only successful actions.

Source code in agents/processors/schemas/actions.py

def color_print(self, success_only: bool = False) -> None:
    """
    Pretty-print the action sequence using presenter.
    :param success_only: Whether to print only successful actions.
    """
    from ufo.agents.presenters import PresenterFactory

    presenter = PresenterFactory.create_presenter("rich")
    presenter.present_action_list(self, success_only=success_only)

`count_repeat_times(target_action, previous_actions)`

Count how many times the target action repeats consecutively at the end of the previous actions. Counting stops at the first earlier action that differs.

Parameters:	`target_action` (`ActionCommandInfo`) – The target action to count. `previous_actions` (`List[ActionCommandInfo \| Dict[str, Any]]`) – The previous actions.

Returns:	`int` – The number of consecutive trailing occurrences of the target action.

Source code in agents/processors/schemas/actions.py

def count_repeat_times(
    self,
    target_action: ActionCommandInfo,
    previous_actions: List[ActionCommandInfo | Dict[str, Any]],
) -> int:
    """
    Get the times of the same action in the previous actions.
    :param target_action: The target action to count.
    :param previous_actions: The previous actions.
    :return: The times of the same action in the previous actions.
    """

    count = 0
    for action in previous_actions[::-1]:
        if self.is_same_action(action, target_action):
            count += 1
        else:
            break
    return count
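Because the loop walks the history backwards and breaks on the first mismatch, only a trailing run of identical actions is counted. The sketch below reproduces that logic on plain dictionaries so it can run standalone; it relies on the same comparison as `is_same_action` (matching `function` and `arguments`), and the sample actions are made up for illustration.

```python
from typing import Any, Dict, List


def is_same_action(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """Two actions match when both function and arguments are equal."""
    return a.get("function") == b.get("function") and a.get(
        "arguments"
    ) == b.get("arguments")


def count_repeat_times(
    target_action: Dict[str, Any], previous_actions: List[Dict[str, Any]]
) -> int:
    """Count consecutive matches, starting from the most recent action."""
    count = 0
    for action in reversed(previous_actions):
        if is_same_action(action, target_action):
            count += 1
        else:
            break  # an intervening different action ends the run
    return count


click = {"function": "click", "arguments": {"button": "left"}}
type_a = {"function": "type_text", "arguments": {"text": "a"}}
history = [click, type_a, click, click]

trailing = count_repeat_times(click, history)      # 2: the first click is not counted
interrupted = count_repeat_times(type_a, history)  # 0: the latest action is a click
```

A caller could treat a high trailing count as a sign that the agent is stuck repeating the same step.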

`get_function_calls(is_success_only=False)`

Get the function calls of the actions.

Parameters:	`is_success_only` (`bool`, default: `False` ) – Whether to get the successful actions only.

Returns:	`List[str]` – The function calls of the actions.

Source code in agents/processors/schemas/actions.py

def get_function_calls(self, is_success_only: bool = False) -> List[str]:
    """
    Get the function calls of the actions.
    :param is_success_only: Whether to get the successful actions only.
    :return: The function calls of the actions.
    """
    return [
        action.action_string
        for action in self.actions
        if not is_success_only or action.result.status == ResultStatus.SUCCESS
    ]

`get_results(success_only=False)`

Get the results of the actions.

Parameters:	`success_only` (`bool`, default: `False` ) – Whether to get the successful actions only.

Returns:	`List[Dict[str, Any]]` – The results of the actions.

Source code in agents/processors/schemas/actions.py

def get_results(self, success_only: bool = False) -> List[Dict[str, Any]]:
    """
    Get the results of the actions.
    :param success_only: Whether to get the successful actions only.
    :return: The results of the actions.
    """
    return [
        action.result.model_dump()
        for action in self.actions
        if not success_only or action.result.status == ResultStatus.SUCCESS
    ]

`get_target_info(success_only=False)`

Get the target (control) information of the actions as dictionaries. An action without a target contributes an empty dictionary.

Parameters:	`success_only` (`bool`, default: `False` ) – Whether to get the successful actions only.

Returns:	`List[Dict[str, Any]]` – The target information of the actions.

Source code in agents/processors/schemas/actions.py

def get_target_info(self, success_only: bool = False) -> List[Dict[str, Any]]:
    """
    Get the control logs of the actions.
    :param success_only: Whether to get the successful actions only.
    :return: The control logs of the actions.
    """

    target_info = []

    for action in self.actions:
        if not success_only or action.result.status == ResultStatus.SUCCESS:
            if action.target:
                target_info.append(action.target.model_dump())
            else:
                target_info.append({})

    return target_info

`get_target_objects(success_only=False)`

Get the target objects of the actions. Actions without a target are skipped.

Parameters:	`success_only` (`bool`, default: `False` ) – Whether to get the successful actions only.

Returns:	`List[TargetInfo]` – The target objects of the actions.

Source code in agents/processors/schemas/actions.py

def get_target_objects(self, success_only: bool = False) -> List[TargetInfo]:
    """
    Get the control logs of the actions.
    :param success_only: Whether to get the successful actions only.
    :return: The control logs of the actions.
    """
    target_objects = []

    for action in self.actions:
        if not success_only or action.result.status == ResultStatus.SUCCESS:
            if action.target:
                target_objects.append(action.target)

    return target_objects

`is_same_action(action1, action2)` `staticmethod`

Check whether the two actions are the same.

Parameters:	`action1` (`ActionCommandInfo \| Dict[str, Any]`) – The first action to compare. `action2` (`ActionCommandInfo \| Dict[str, Any]`) – The second action to compare.

Returns:	`bool` – Whether the two actions are the same.

Source code in agents/processors/schemas/actions.py

@staticmethod
def is_same_action(
    action1: ActionCommandInfo | Dict[str, Any],
    action2: ActionCommandInfo | Dict[str, Any],
) -> bool:
    """
    Check whether the two actions are the same.
    :param action1: The first action to compare.
    :param action2: The second action to compare.
    :return: Whether the two actions are the same.
    """

    if isinstance(action1, ActionCommandInfo):
        action_dict_1 = action1.model_dump()
    else:
        action_dict_1 = action1

    if isinstance(action2, ActionCommandInfo):
        action_dict_2 = action2.model_dump()
    else:
        action_dict_2 = action2

    return action_dict_1.get("function") == action_dict_2.get(
        "function"
    ) and action_dict_1.get("arguments") == action_dict_2.get("arguments")

`to_list_of_dicts(success_only=False, keep_keys=None, previous_actions=None)`

Convert the action sequence to a list of dictionaries.

Parameters:	`success_only` (`bool`, default: `False` ) – Whether to convert the successful actions only. `keep_keys` (`Optional[List[str]]`, default: `None` ) – If given, only these keys are kept in each action dictionary. `previous_actions` (`Optional[List[ActionCommandInfo \| Dict[str, Any]]]`, default: `None` ) – The previous actions for repeat count calculation.

Returns:	`List[Dict[str, Any]]` – The action sequence as a list of dictionaries.

Source code in agents/processors/schemas/actions.py

def to_list_of_dicts(
    self,
    success_only: bool = False,
    keep_keys: Optional[List[str]] = None,
    previous_actions: Optional[List[ActionCommandInfo | Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """
    Convert the action sequence to a dictionary.
    :param success_only: Whether to convert the successful actions only.
    :param previous_actions: The previous actions for repeat count calculation.
    :return: The dictionary of the action sequence.
    """

    action_list = []
    for action in self.actions:
        if success_only and action.result.status != ResultStatus.SUCCESS:
            continue
        action_dict = action.model_dump()
        if keep_keys:
            action_dict = {k: v for k, v in action_dict.items() if k in keep_keys}
        if previous_actions:
            repeat_time = self.count_repeat_times(action, previous_actions)
            action_dict["repeat_time"] = repeat_time
        action_list.append(action_dict)
    return action_list
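The effect of `success_only` and `keep_keys` is easiest to see on sample data. The sketch below mirrors the filtering logic on plain dictionaries rather than `ActionCommandInfo` objects; the status strings `"success"` and `"failure"` are stand-ins, since the actual `ResultStatus` values are not shown here.

```python
from typing import Any, Dict, List, Optional

# Stand-in for ResultStatus.SUCCESS; the real enum value is an assumption here.
SUCCESS = "success"


def to_list_of_dicts(
    actions: List[Dict[str, Any]],
    success_only: bool = False,
    keep_keys: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Filter and trim action dictionaries the way the method above does."""
    action_list = []
    for action in actions:
        if success_only and action["result"]["status"] != SUCCESS:
            continue  # drop actions that did not succeed
        if keep_keys:
            action = {k: v for k, v in action.items() if k in keep_keys}
        action_list.append(action)
    return action_list


actions = [
    {
        "function": "click",
        "arguments": {"button": "left"},
        "result": {"status": SUCCESS},
    },
    {
        "function": "type_text",
        "arguments": {"text": "Sales"},
        "result": {"status": "failure"},
    },
]

# Keep only successful actions, trimmed to the fields needed for a compact log.
compact = to_list_of_dicts(
    actions, success_only=True, keep_keys=["function", "arguments"]
)
```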

`to_representation(success_only=False)`

Convert the action sequence to a list of representation strings, one per action.

Parameters:	`success_only` (`bool`, default: `False` ) – Whether to convert the successful actions only.

Returns:	`List[str]` – The representation strings of the actions in the sequence.

Source code in agents/processors/schemas/actions.py

def to_representation(
    self,
    success_only: bool = False,
) -> List[str]:
    """
    Convert the action sequence to a representation string.
    :param success_only: Whether to convert the successful actions only.
    :return: The representation string of the action sequence.
    """
    representations = []
    for action in self.actions:
        if success_only and action.result.status != ResultStatus.SUCCESS:
            continue
        representations.append(action.to_representation())
    return representations

`to_string(success_only=False, previous_actions=None)`

Convert the action sequence to a string.

Parameters:	`success_only` (`bool`, default: `False` ) – Whether to convert the successful actions only. `previous_actions` (`Optional[List[ActionCommandInfo]]`, default: `None` ) – The previous actions.

Returns:	`str` – The string of the action sequence.

Source code in agents/processors/schemas/actions.py

def to_string(
    self,
    success_only: bool = False,
    previous_actions: Optional[List[ActionCommandInfo]] = None,
) -> str:
    """
    Convert the action sequence to a string.
    :param success_only: Whether to convert the successful actions only.
    :param previous_actions: The previous actions.
    :return: The string of the action sequence.
    """
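    # Pass by keyword: the second positional parameter of to_list_of_dicts
    # is keep_keys, not previous_actions.
    return json.dumps(
        self.to_list_of_dicts(
            success_only=success_only, previous_actions=previous_actions
        ),
        ensure_ascii=False,
    )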

Key Methods:

add_action(): Append action to sequence
to_list_of_dicts(): Serialize for logging/debugging
to_representation(): Generate human-readable summary
count_repeat_times(): Track repeated actions for loop detection
get_results(): Extract execution outcomes

Example Scenarios

Scenario 1: Excel Data Entry

Without multi-action:

Think → Open Excel → Observe → Think → Click A1 → Observe → Think → Type "Sales" → Observe → Think → Save → Observe

4 LLM calls (one per Think step)

With multi-action:

Think → [Open Excel, Click A1, Type "Sales", Save] → Observe

1 LLM call when every predicted action passes validation (75% reduction)

Scenario 2: Email Composition

Single-action mode:

Think → Open Outlook → Think → Click New → Think → Enter recipient → Think → Enter subject → Think → Type body → Think → Send

6 LLM calls (one per Think step)

Multi-action mode:

Think → [Open Outlook, Click New, Enter recipient, Enter subject, Type body, Send] → Observe

1 LLM call when every predicted action passes validation (about 83% reduction)
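The reduction figures follow from simple arithmetic, assuming one LLM call per Think step in the traces above (four in Scenario 1, six in Scenario 2) and a batch that passes validation in full. The helper name `call_reduction` is introduced here only for illustration.

```python
def call_reduction(single_action_calls: int, batched_calls: int) -> float:
    """Percentage of LLM calls saved by batching, rounded to one decimal."""
    if single_action_calls <= 0:
        raise ValueError("single_action_calls must be positive")
    return round(100 * (1 - batched_calls / single_action_calls), 1)


excel_reduction = call_reduction(4, 1)  # Scenario 1: 75.0
email_reduction = call_reduction(6, 1)  # Scenario 2: 83.3
```

These are best-case numbers for fully predictable sequences. Any rejected prediction adds calls, which is consistent with the lower "up to 51%" figure quoted for the feature overall.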

When to Use

Best for:

✅ Predictable workflows with clear action sequences
✅ Repetitive tasks (data entry, form filling)
✅ Applications with stable UI structures
✅ Cost-sensitive deployments requiring fewer LLM calls

Not recommended for:

❌ Highly dynamic UIs with frequent state changes
❌ Exploratory tasks requiring frequent observation
❌ Error-prone applications where validation is critical per step
❌ Tasks requiring user confirmation between actions

Related Documentation

AppAgent Processing Strategy: How agents process and execute actions
Hybrid GUI-API Actions: Combining GUI automation with native APIs
System Configuration Reference: Complete system.yaml options
Configuration Migration: Migrating from legacy config_dev.yaml

Performance Considerations

Trade-offs:

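Accuracy vs. Speed: Multi-action skips the LLM's observe-and-replan step between actions; each action is still validated against live UI state, but the model cannot adjust its plan mid-batch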
Memory Usage: Larger context windows needed to predict action sequences
Failure Recovery: Invalid predictions require full sequence rollback and retry

Optimization Tips:

Start Conservative: Test with ACTION_SEQUENCE: false before enabling
Monitor Validation Rates: High rejection rates indicate poor prediction quality
Combine with Hybrid Actions: Use API-based execution where possible for fastest performance
Tune MAX_STEP: Set appropriate MAX_STEP limits in system.yaml to prevent runaway sequences
