Speculative Multi-Action Execution

UFO² introduces Speculative Multi-Action Execution, a feature that allows agents to bundle multiple predicted steps into a single LLM call and validate them against the live application state. This approach can reduce LLM queries by up to 51% compared to inferring each action separately.

Overview

Traditional agent execution follows a sequential pattern: think → act → observe → think → act → observe. Each cycle requires a separate LLM inference, making complex tasks slow and expensive.

Speculative multi-action execution optimizes this by predicting a batch of likely actions upfront, then validating them against the live UI Automation state in a single execution pass:

[Figure: Speculative Multi-Action Execution]

Key Benefits:

  • Reduced LLM Calls: Up to 51% fewer inference requests for multi-step tasks
  • Faster Execution: Batch prediction eliminates per-action round-trips
  • Lower Costs: Fewer API calls reduce operational expenses
  • Maintained Accuracy: Live validation ensures actions remain correct

How It Works

When enabled, the agent:

  1. Predicts Action Sequence: Uses contextual understanding to forecast likely next steps (e.g., "Open Excel → Navigate to cell A1 → Enter value → Save")
  2. Validates Against Live State: Checks each predicted action against current UI Automation state
  3. Executes Valid Actions: Runs all validated actions in sequence
  4. Handles Failures Gracefully: Falls back to single-action mode if predictions fail validation
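
The four steps above can be sketched as a small loop. This is only an illustration of the predict, validate, execute, fall-back pattern, not UFO²'s actual implementation: `PredictedAction`, `is_valid`, and `run_batch` are hypothetical names, and the live UI state is modeled as a plain set of enabled control names.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class PredictedAction:
    """Hypothetical stand-in for one predicted step."""
    function: str
    control: str


def is_valid(action: PredictedAction, enabled_controls: Set[str]) -> bool:
    # Assumption: an action is valid when its target control is currently enabled.
    return action.control in enabled_controls


def run_batch(
    batch: List[PredictedAction],
    get_enabled_controls: Callable[[], Set[str]],
    execute: Callable[[PredictedAction], None],
) -> Tuple[List[PredictedAction], bool]:
    """Run predicted actions in order and stop at the first invalid one.

    Returns the executed actions plus a flag telling the caller whether it
    should fall back to single-action mode for the remaining work.
    """
    executed: List[PredictedAction] = []
    for action in batch:
        # Re-read the live state before every action, since earlier
        # actions in the batch may have changed the UI.
        if not is_valid(action, get_enabled_controls()):
            return executed, True
        execute(action)
        executed.append(action)
    return executed, False


# Simulated run: the second predicted control does not exist in the UI.
enabled = {"A1", "Save"}
log: List[str] = []
batch = [
    PredictedAction("click", "A1"),
    PredictedAction("click", "Missing"),
    PredictedAction("click", "Save"),
]
executed, fallback = run_batch(batch, lambda: enabled, lambda a: log.append(a.control))
```

In this simulated run only the first action executes, and the returned flag signals that the caller should continue in single-action mode.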

Configuration

Enable speculative multi-action execution in config/ufo/system.yaml:

# Action Configuration
ACTION_SEQUENCE: true  # Enable multi-action prediction and execution

Configuration Location: config/ufo/system.yaml (migrated from legacy config_dev.yaml)

For configuration migration details, see Configuration Migration Guide.

Implementation Details

The multi-action system is implemented through two core classes in ufo/agents/processors/schemas/actions.py:

ActionCommandInfo

Represents a single action with execution metadata:

Bases: BaseModel

The action information data.

model_post_init(__context)

Initialize the action string.

Source code in agents/processors/schemas/actions.py
def model_post_init(self, __context: Any) -> None:
    """
    Initialize the action string.
    """
    self.action_string = ActionCommandInfo.to_string(self.function, self.arguments)

to_representation()

Generate a function call representation string.

Source code in agents/processors/schemas/actions.py
def to_representation(self) -> str:
    """
    Generate a function call representation string.
    """
    components = []
    components.append(f"[Action] {self.action_string}")
    if self.target:
        target_info = ", ".join(
            f"{k}={v}"
            for k, v in self.target.model_dump(exclude_none=True).items()
            if k not in {"rect"}  # rect is not needed in representation
        )
        components.append(f"[Target] {target_info}")

    if self.result:
        components.append(f"[Status] {self.result.status}")
        if self.result.error:
            components.append(f"[Error] {self.result.error}")
        components.append(f"[Result] {self.result.result}")

    return "\n".join(components)

to_string(command_name, params) staticmethod

Generate a function call string.

Source code in agents/processors/schemas/actions.py
@staticmethod
def to_string(command_name: str, params: Dict[str, Any]) -> str:
    """
    Generate a function call string.
    """
    args_str = ", ".join(f"{k}={v!r}" for k, v in params.items())
    return f"{command_name}({args_str})"
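
To show what `to_string` produces, the sketch below copies its formatting logic into a free function so it runs without importing UFO. The argument names `text` and `clear` are made up for illustration and are not taken from the real `type_text` signature.

```python
from typing import Any, Dict


def to_string(command_name: str, params: Dict[str, Any]) -> str:
    # Same formatting logic as ActionCommandInfo.to_string: each value is
    # rendered with repr(), so strings keep their quotes.
    args_str = ", ".join(f"{k}={v!r}" for k, v in params.items())
    return f"{command_name}({args_str})"


call = to_string("type_text", {"text": "Sales", "clear": True})
print(call)  # type_text(text='Sales', clear=True)

# An action without arguments renders as an empty call.
empty = to_string("save", {})
print(empty)  # save()
```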

Key Properties:

  • function: Action name (e.g., click, type_text)
  • arguments: Action parameters
  • target: UI element information
  • result: Execution result with status and error details
  • action_string: Human-readable representation

ListActionCommandInfo

Manages sequences of multiple actions:

A sequence of one-step actions.

Source code in agents/processors/schemas/actions.py
def __init__(self, actions: Optional[List[ActionCommandInfo]] = None):

    if actions is None:
        actions = []

    self._actions = actions
    self._length = len(actions)

actions property

Get the actions.

Returns:
  • List[ActionCommandInfo]

    The actions.

length property

Get the length of the actions.

Returns:
  • int

    The length of the actions.

status property

Get the status of the actions.

Returns:
  • str

    The status of the actions.

add_action(action)

Add an action.

Parameters:
  • action (ActionCommandInfo) –

    The action.

Source code in agents/processors/schemas/actions.py
def add_action(self, action: ActionCommandInfo) -> None:
    """
    Add an action.
    :param action: The action.
    """
    self._actions.append(action)

color_print(success_only=False)

Pretty-print the action sequence using presenter.

Parameters:
  • success_only (bool, default: False ) –

    Whether to print only successful actions.

Source code in agents/processors/schemas/actions.py
def color_print(self, success_only: bool = False) -> None:
    """
    Pretty-print the action sequence using presenter.
    :param success_only: Whether to print only successful actions.
    """
    from ufo.agents.presenters import PresenterFactory

    presenter = PresenterFactory.create_presenter("rich")
    presenter.present_action_list(self, success_only=success_only)

count_repeat_times(target_action, previous_actions)

Get the times of the same action in the previous actions.

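Parameters:
  • target_action (ActionCommandInfo) –

    The target action to count.

  • previous_actions (List[ActionCommandInfo | Dict[str, Any]]) –

    The previous actions.
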
Returns:
  • int

    The times of the same action in the previous actions.

Source code in agents/processors/schemas/actions.py
def count_repeat_times(
    self,
    target_action: ActionCommandInfo,
    previous_actions: List[ActionCommandInfo | Dict[str, Any]],
) -> int:
    """
    Get the times of the same action in the previous actions.
    :param target_action: The target action to count.
    :param previous_actions: The previous actions.
    :return: The times of the same action in the previous actions.
    """

    count = 0
    for action in previous_actions[::-1]:
        if self.is_same_action(action, target_action):
            count += 1
        else:
            break
    return count
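
Note that `count_repeat_times` counts only the consecutive repeats at the end of the history, not every occurrence. The sketch below reproduces that behavior with plain dictionaries. It is a simplified, dict-only version of the logic, and the sample actions and their argument names are invented for illustration.

```python
from typing import Any, Dict, List


def is_same_action(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    # Dict-only comparison: function name and arguments must both match.
    return a.get("function") == b.get("function") and a.get("arguments") == b.get("arguments")


def count_repeat_times(target: Dict[str, Any], previous: List[Dict[str, Any]]) -> int:
    # Walk the history from newest to oldest and stop at the first action
    # that differs from the target.
    count = 0
    for action in reversed(previous):
        if not is_same_action(action, target):
            break
        count += 1
    return count


click_ok = {"function": "click", "arguments": {"name": "OK"}}
type_hi = {"function": "type_text", "arguments": {"text": "hi"}}
history = [click_ok, type_hi, click_ok, click_ok]

# The oldest click is not counted because type_hi interrupts the run.
trailing = count_repeat_times(click_ok, history)
# The most recent action is a click, so type_hi has no trailing repeats.
none_yet = count_repeat_times(type_hi, history)
print(trailing, none_yet)  # 2 0
```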

get_function_calls(is_success_only=False)

Get the function calls of the actions.

Parameters:
  • is_success_only (bool, default: False ) –

    Whether to get the successful actions only.

Returns:
  • List[str]

    The function calls of the actions.

Source code in agents/processors/schemas/actions.py
def get_function_calls(self, is_success_only: bool = False) -> List[str]:
    """
    Get the function calls of the actions.
    :param is_success_only: Whether to get the successful actions only.
    :return: The function calls of the actions.
    """
    return [
        action.action_string
        for action in self.actions
        if not is_success_only or action.result.status == ResultStatus.SUCCESS
    ]

get_results(success_only=False)

Get the results of the actions.

Parameters:
  • success_only (bool, default: False ) –

    Whether to get the successful actions only.

Returns:
  • List[Dict[str, Any]]

    The results of the actions.

Source code in agents/processors/schemas/actions.py
def get_results(self, success_only: bool = False) -> List[Dict[str, Any]]:
    """
    Get the results of the actions.
    :param success_only: Whether to get the successful actions only.
    :return: The results of the actions.
    """
    return [
        action.result.model_dump()
        for action in self.actions
        if not success_only or action.result.status == ResultStatus.SUCCESS
    ]

get_target_info(success_only=False)

Get the control logs of the actions.

Parameters:
  • success_only (bool, default: False ) –

    Whether to get the successful actions only.

Returns:
  • List[Dict[str, Any]]

    The control logs of the actions.

Source code in agents/processors/schemas/actions.py
def get_target_info(self, success_only: bool = False) -> List[Dict[str, Any]]:
    """
    Get the control logs of the actions.
    :param success_only: Whether to get the successful actions only.
    :return: The control logs of the actions.
    """

    target_info = []

    for action in self.actions:
        if not success_only or action.result.status == ResultStatus.SUCCESS:
            if action.target:
                target_info.append(action.target.model_dump())
            else:
                target_info.append({})

    return target_info

get_target_objects(success_only=False)

Get the control logs of the actions.

Parameters:
  • success_only (bool, default: False ) –

    Whether to get the successful actions only.

Returns:
  • List[TargetInfo]

    The control logs of the actions.

Source code in agents/processors/schemas/actions.py
def get_target_objects(self, success_only: bool = False) -> List[TargetInfo]:
    """
    Get the control logs of the actions.
    :param success_only: Whether to get the successful actions only.
    :return: The control logs of the actions.
    """
    target_objects = []

    for action in self.actions:
        if not success_only or action.result.status == ResultStatus.SUCCESS:
            if action.target:
                target_objects.append(action.target)

    return target_objects

is_same_action(action1, action2) staticmethod

Check whether the two actions are the same.

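Parameters:
  • action1 (ActionCommandInfo | Dict[str, Any]) –

    The first action to compare.

  • action2 (ActionCommandInfo | Dict[str, Any]) –

    The second action to compare.
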
Returns:
  • bool

    Whether the two actions are the same.

Source code in agents/processors/schemas/actions.py
@staticmethod
def is_same_action(
    action1: ActionCommandInfo | Dict[str, Any],
    action2: ActionCommandInfo | Dict[str, Any],
) -> bool:
    """
    Check whether the two actions are the same.
    :param action1: The first action to compare.
    :param action2: The second action to compare.
    :return: Whether the two actions are the same.
    """

    if isinstance(action1, ActionCommandInfo):
        action_dict_1 = action1.model_dump()
    else:
        action_dict_1 = action1

    if isinstance(action2, ActionCommandInfo):
        action_dict_2 = action2.model_dump()
    else:
        action_dict_2 = action2

    return action_dict_1.get("function") == action_dict_2.get(
        "function"
    ) and action_dict_1.get("arguments") == action_dict_2.get("arguments")

to_list_of_dicts(success_only=False, keep_keys=None, previous_actions=None)

Convert the action sequence to a dictionary.

Parameters:
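  • success_only (bool, default: False ) –

    Whether to convert the successful actions only.

  • keep_keys (Optional[List[str]], default: None ) –

    The keys to keep in each action dictionary. When omitted, all keys are kept.

  • previous_actions (Optional[List[ActionCommandInfo | Dict[str, Any]]], default: None ) –

    The previous actions for repeat count calculation.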

Returns:
  • List[Dict[str, Any]]

    The dictionary of the action sequence.

Source code in agents/processors/schemas/actions.py
def to_list_of_dicts(
    self,
    success_only: bool = False,
    keep_keys: Optional[List[str]] = None,
    previous_actions: Optional[List[ActionCommandInfo | Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """
    Convert the action sequence to a dictionary.
    :param success_only: Whether to convert the successful actions only.
    :param previous_actions: The previous actions for repeat count calculation.
    :return: The dictionary of the action sequence.
    """

    action_list = []
    for action in self.actions:
        if success_only and action.result.status != ResultStatus.SUCCESS:
            continue
        action_dict = action.model_dump()
        if keep_keys:
            action_dict = {k: v for k, v in action_dict.items() if k in keep_keys}
        if previous_actions:
            repeat_time = self.count_repeat_times(action, previous_actions)
            action_dict["repeat_time"] = repeat_time
        action_list.append(action_dict)
    return action_list
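
The interaction between `success_only` and `keep_keys` can be illustrated with plain dictionaries. This is a simplified sketch: `serialize_actions` is a hypothetical helper, the status strings stand in for `ResultStatus` values, and the repeat-count step is omitted.

```python
from typing import Any, Dict, List, Optional

SUCCESS = "success"  # hypothetical value; the real code compares against ResultStatus.SUCCESS


def serialize_actions(
    actions: List[Dict[str, Any]],
    success_only: bool = False,
    keep_keys: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    rows = []
    for action in actions:
        # Skip failed actions first, then trim the remaining ones.
        if success_only and action["result"]["status"] != SUCCESS:
            continue
        row = dict(action)
        if keep_keys:
            row = {k: v for k, v in row.items() if k in keep_keys}
        rows.append(row)
    return rows


actions = [
    {"function": "click", "arguments": {"name": "OK"}, "result": {"status": "success"}},
    {"function": "click", "arguments": {"name": "Cancel"}, "result": {"status": "failure"}},
]

trimmed = serialize_actions(actions, success_only=True, keep_keys=["function", "arguments"])
print(trimmed)  # [{'function': 'click', 'arguments': {'name': 'OK'}}]
```

In the method shown above, `repeat_time` is added after the `keep_keys` filter runs, so it appears in the output even when it is not listed in `keep_keys`.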

to_representation(success_only=False)

Convert the action sequence to a representation string.

Parameters:
  • success_only (bool, default: False ) –

    Whether to convert the successful actions only.

Returns:
  • List[str]

    The representation string of the action sequence.

Source code in agents/processors/schemas/actions.py
def to_representation(
    self,
    success_only: bool = False,
) -> List[str]:
    """
    Convert the action sequence to a representation string.
    :param success_only: Whether to convert the successful actions only.
    :return: The representation string of the action sequence.
    """
    representations = []
    for action in self.actions:
        if success_only and action.result.status != ResultStatus.SUCCESS:
            continue
        representations.append(action.to_representation())
    return representations

to_string(success_only=False, previous_actions=None)

Convert the action sequence to a string.

Parameters:
  • success_only (bool, default: False ) –

    Whether to convert the successful actions only.

  • previous_actions (Optional[List[ActionCommandInfo]], default: None ) –

    The previous actions.

Returns:
  • str

    The string of the action sequence.

Source code in agents/processors/schemas/actions.py
def to_string(
    self,
    success_only: bool = False,
    previous_actions: Optional[List[ActionCommandInfo]] = None,
) -> str:
    """
    Convert the action sequence to a string.
    :param success_only: Whether to convert the successful actions only.
    :param previous_actions: The previous actions.
    :return: The string of the action sequence.
    """
    return json.dumps(
        # Pass by keyword so previous_actions is not bound to keep_keys.
        self.to_list_of_dicts(
            success_only=success_only, previous_actions=previous_actions
        ),
        ensure_ascii=False,
    )

Key Methods:

  • add_action(): Append action to sequence
  • to_list_of_dicts(): Serialize for logging/debugging
  • to_representation(): Generate human-readable summary
  • count_repeat_times(): Track repeated actions for loop detection
  • get_results(): Extract execution outcomes

Example Scenarios

Scenario 1: Excel Data Entry

Without multi-action:

Think → Open Excel → Observe → Think → Click A1 → Observe → Think → Type "Sales" → Observe → Think → Save → Observe
4 LLM calls (one per Think step)

With multi-action:

Think → [Open Excel, Click A1, Type "Sales", Save] → Observe
1 LLM call (75% reduction, assuming every predicted action passes validation)

Scenario 2: Email Composition

Single-action mode:

Think → Open Outlook → Think → Click New → Think → Enter recipient → Think → Enter subject → Think → Type body → Think → Send
6 LLM calls (one per Think step)

Multi-action mode:

Think → [Open Outlook, Click New, Enter recipient, Enter subject, Type body, Send] → Observe
1 LLM call (about 83% reduction, assuming every predicted action passes validation)
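
The reduction figures follow from simple arithmetic on call counts. The helper below is a hypothetical illustration, not part of UFO², and the 8-step task is an invented example.

```python
def call_reduction(single_action_calls: int, multi_action_calls: int) -> float:
    """Percentage of LLM calls saved relative to single-action mode."""
    if single_action_calls <= 0:
        raise ValueError("single_action_calls must be positive")
    return 100.0 * (single_action_calls - multi_action_calls) / single_action_calls


# Hypothetical 8-step task where the whole batch passes validation.
best_case = call_reduction(8, 1)
# Same task, but validation failures force fallbacks and 4 calls are needed.
with_fallback = call_reduction(8, 4)
print(best_case, with_fallback)  # 87.5 50.0
```

A run in which some batches fail validation needs more calls than the best case, so the saving on a given task depends on how often predictions hold.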

When to Use

Best for:

✅ Predictable workflows with clear action sequences
✅ Repetitive tasks (data entry, form filling)
✅ Applications with stable UI structures
✅ Cost-sensitive deployments requiring fewer LLM calls

Not recommended for:

❌ Highly dynamic UIs with frequent state changes
❌ Exploratory tasks requiring frequent observation
❌ Error-prone applications where validation is critical per step
❌ Tasks requiring user confirmation between actions

Performance Considerations

Trade-offs:

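  • Accuracy vs. Speed: Multi-action skips per-step LLM re-planning. Each action is still validated against the live UI state, but the plan is not revised between steps
  • Context Size: Larger context windows needed to predict action sequences
  • Failure Recovery: When a predicted action fails validation, the agent falls back to single-action mode, which costs additional LLM calls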

Optimization Tips:

  1. Start Conservative: Test with ACTION_SEQUENCE: false before enabling
  2. Monitor Validation Rates: High rejection rates indicate poor prediction quality
  3. Combine with Hybrid Actions: Use API-based execution where possible for fastest performance
  4. Tune MAX_STEP: Set appropriate MAX_STEP limits in system.yaml to prevent runaway sequences
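
As a rough way to act on tip 2, the rejection rate can be computed from per-action validation outcomes. This sketch assumes you collect those outcomes yourself (for example, from logs) as booleans. `rejection_rate` and the 0.3 threshold are hypothetical and not part of UFO².

```python
from typing import Iterable


def rejection_rate(validation_outcomes: Iterable[bool]) -> float:
    """Fraction of predicted actions that failed validation (0.0 when empty)."""
    outcomes = list(validation_outcomes)
    if not outcomes:
        return 0.0
    rejected = sum(1 for passed in outcomes if not passed)
    return rejected / len(outcomes)


# True means the predicted action passed validation.
rate = rejection_rate([True, True, False, True])
# Hypothetical policy: reconsider ACTION_SEQUENCE when too many predictions fail.
too_many_rejections = rate > 0.3
print(rate, too_many_rejections)  # 0.25 False
```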