Speculative Multi-Action Execution

UFO² introduces Speculative Multi-Action Execution, which lets the agent bundle several predicted steps into a single LLM call. The agent first predicts a batch of likely actions and then validates them against the live UIA (UI Automation) state in a single shot. Compared with inferring each step separately, this can reduce the number of LLM queries by up to 51%. The figure below illustrates the process:

[Figure: Speculative Multi-Action Execution]
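
The predict-then-validate idea can be sketched as a small loop. This is only an illustration under stated assumptions, not the UFO² implementation: `PredictedAction`, `run_speculative_batch`, and the two callbacks are hypothetical names, and the live UI state is simulated with a plain dictionary that maps a control label to whether the control is currently available.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PredictedAction:
    """One step of a speculative batch (hypothetical structure)."""
    function: str
    control_label: str
    args: Dict[str, Any] = field(default_factory=dict)


def run_speculative_batch(
    batch: List[PredictedAction],
    get_live_controls: Callable[[], Dict[str, bool]],
    execute: Callable[[PredictedAction], None],
) -> int:
    """Execute predicted actions in order and stop at the first one whose
    target control is not available in the live state.
    Returns the number of actions that were executed."""
    executed = 0
    for action in batch:
        live = get_live_controls()  # re-read the UI state before every step
        if not live.get(action.control_label, False):
            break  # the prediction no longer holds; a fresh LLM call is needed
        execute(action)
        executed += 1
    return executed


# Simulated UI: control "2" only becomes available after "1" is clicked,
# and control "3" never appears.
ui_state = {"1": True, "2": False}


def fake_execute(action: PredictedAction) -> None:
    if action.control_label == "1":
        ui_state["2"] = True


batch = [
    PredictedAction("click_input", "1"),
    PredictedAction("set_edit_text", "2", {"text": "hello"}),
    PredictedAction("click_input", "3"),
]
done = run_speculative_batch(batch, lambda: dict(ui_state), fake_execute)
print(done)  # 2
```

In this sketch one prediction covers two steps that would otherwise need two separate inferences; the third step fails validation and is left for the next round.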

Configuration

To enable speculative multi-action execution, set ACTION_SEQUENCE to True in the config_dev.yaml file:

ACTION_SEQUENCE: True
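
As a rough sketch of how such a flag might be consumed, the snippet below reads it from an already-loaded configuration dictionary. The helper name `use_action_sequence` and the fallback of False when the key is missing are assumptions made for illustration; the source does not state the default.

```python
from typing import Any, Dict


def use_action_sequence(configs: Dict[str, Any]) -> bool:
    """Return True when speculative multi-action execution is switched on.
    The False fallback for a missing key is an assumption."""
    return bool(configs.get("ACTION_SEQUENCE", False))


enabled = use_action_sequence({"ACTION_SEQUENCE": True})
missing = use_action_sequence({})
print(enabled, missing)  # True False
```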

References

The implementation of speculative multi-action execution is located in the ufo/agents/processors/actions.py file. The members documented below belong to the single-step action class, which the type hints in that file refer to as OneStepAction:

Source code in agents/processors/actions.py
def __init__(
    self,
    function: str = "",
    args: Dict[str, Any] = {},
    control_label: str = "",
    control_text: str = "",
    after_status: str = "",
    results: Optional[ActionExecutionLog] = None,
    configs=Config.get_instance().config_data,
):
    self._function = function
    self._args = args
    self._control_label = control_label
    self._control_text = control_text
    self._after_status = after_status
    self._results = ActionExecutionLog() if results is None else results
    self._configs = configs
    self._control_log = BaseControlLog()

after_status property

Get the status.

Returns:
  • str

    The status.

args property

Get the arguments.

Returns:
  • Dict[str, Any]

    The arguments.

command_string property

Generate a function call string.

Returns:
  • str

    The function call string.

control_label property

Get the control label.

Returns:
  • str

    The control label.

control_log property writable

Get the control log.

Returns:
  • BaseControlLog

    The control log.

control_text property

Get the control text.

Returns:
  • str

    The control text.

function property

Get the function name.

Returns:
  • str

    The function.

results property writable

Get the results.

Returns:
  • ActionExecutionLog

    The results.

action_flow(puppeteer, control_dict, application_window)

Execute the action flow.

Parameters:
  • puppeteer (AppPuppeteer) –

    The puppeteer that controls the application.

  • control_dict (Dict[str, UIAWrapper]) –

    The control dictionary.

  • application_window (UIAWrapper) –

    The application window where the control is located.

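Returns:
  • Tuple[ActionExecutionLog, BaseControlLog]

    The action execution log. Although the annotation declares a tuple, the method body returns only self.results; the control log is stored on the instance and can be read through the control_log property.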

Source code in agents/processors/actions.py
def action_flow(
    self,
    puppeteer: AppPuppeteer,
    control_dict: Dict[str, UIAWrapper],
    application_window: UIAWrapper,
) -> Tuple[ActionExecutionLog, BaseControlLog]:
    """
    Execute the action flow.
    :param puppeteer: The puppeteer that controls the application.
    :param control_dict: The control dictionary.
    :param application_window: The application window where the control is located.
    :return: The action execution log.
    """
    control_selected: UIAWrapper = control_dict.get(self.control_label, None)

    # If the control is selected, but not available, return an error.
    if control_selected is not None and not self._control_validation(
        control_selected
    ):
        self.results = ActionExecutionLog(
            status="error",
            traceback="Control is not available.",
            error="Control is not available.",
        )
        self._control_log = BaseControlLog()

        return self.results

    # Create the control receiver.
    puppeteer.receiver_manager.create_ui_control_receiver(
        control_selected, application_window
    )

    if self.function:

        if self._configs.get("SHOW_VISUAL_OUTLINE_ON_SCREEN", True):
            if control_selected:
                control_selected.draw_outline(colour="red", thickness=3)
                time.sleep(self._configs.get("RECTANGLE_TIME", 0))

        self._control_log = self._get_control_log(
            control_selected=control_selected, application_window=application_window
        )

        try:
            return_value = self.execute(puppeteer=puppeteer)
            if not utils.is_json_serializable(return_value):
                return_value = ""

            self.results = ActionExecutionLog(
                status="success",
                return_value=return_value,
            )

        except Exception as e:

            import traceback

            self.results = ActionExecutionLog(
                status="error",
                traceback=traceback.format_exc(),
                error=str(e),
            )
        return self.results
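
The error-handling pattern in action_flow (run the command, keep the return value only if it can be serialized, and turn any exception into an "error" record) can be tried in isolation. The sketch below uses stand-ins: `ExecutionLog` mirrors only the fields used in the listing above, and `is_json_serializable` is an assumed implementation of the utils helper, which is not shown in the source.

```python
import json
import traceback as tb
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ExecutionLog:
    """Simplified stand-in for ActionExecutionLog."""
    status: str = ""
    error: str = ""
    traceback: str = ""
    return_value: Any = None


def is_json_serializable(value: Any) -> bool:
    """Assumed behaviour of utils.is_json_serializable."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


def run_and_log(command: Callable[[], Any]) -> ExecutionLog:
    """Run a command and always return a log record instead of raising."""
    try:
        return_value = command()
        if not is_json_serializable(return_value):
            return_value = ""  # drop values that cannot be written to a JSON log
        return ExecutionLog(status="success", return_value=return_value)
    except Exception as exc:
        return ExecutionLog(status="error", error=str(exc), traceback=tb.format_exc())


ok = run_and_log(lambda: {"clicked": True})
odd = run_and_log(lambda: {1, 2})  # a set is not JSON serializable
bad = run_and_log(lambda: 1 / 0)
print(ok.status, odd.return_value == "", bad.status)  # success True error
```

Returning a record rather than raising keeps a failed step from aborting the rest of the session, and the caller can inspect the status field to decide what to do next.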

count_repeat_times(previous_actions)

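Count how many times the same action appears consecutively at the end of the previous actions. Counting starts from the most recent action and stops at the first action that differs.

Parameters:
  • previous_actions (List[Dict[str, Any]]) –

    The previous actions, ordered from oldest to most recent.

Returns:
  • int

    The number of consecutive trailing actions that match the current action.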

Source code in agents/processors/actions.py
def count_repeat_times(self, previous_actions: List[Dict[str, Any]]) -> int:
    """
    Get the times of the same action in the previous actions.
    :param previous_actions: The previous actions.
    :return: The times of the same action in the previous actions.
    """

    count = 0
    for action in previous_actions[::-1]:
        if self.is_same_action(action):
            count += 1
        else:
            break
    return count
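
The loop above counts only the unbroken run of matches at the end of the history, not every match in it. A minimal standalone sketch of the same counting rule follows; `count_trailing_matches` and the predicate argument are hypothetical names, and "click_input" and "scroll" are used purely as example function names.

```python
from typing import Any, Callable, Dict, List


def count_trailing_matches(
    previous_actions: List[Dict[str, Any]],
    is_same: Callable[[Dict[str, Any]], bool],
) -> int:
    """Count matching actions from the end of the list until the first mismatch."""
    count = 0
    for action in reversed(previous_actions):
        if not is_same(action):
            break
        count += 1
    return count


history = [
    {"Function": "click_input"},
    {"Function": "scroll"},
    {"Function": "click_input"},
    {"Function": "click_input"},
]
repeats = count_trailing_matches(history, lambda a: a.get("Function") == "click_input")
print(repeats)  # 2
```

The first click_input in the history is not counted because the scroll action interrupts the run.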

execute(puppeteer)

Execute the action.

Parameters:
  • puppeteer (AppPuppeteer) –

    The puppeteer that controls the application.

Source code in agents/processors/actions.py
def execute(self, puppeteer: AppPuppeteer) -> Any:
    """
    Execute the action.
    :param puppeteer: The puppeteer that controls the application.
    """
    return puppeteer.execute_command(self.function, self.args)

get_operation_point_list()

Get the operation points of the action.

Returns:
  • List[Tuple[int]]

    The operation points of the action.

Source code in agents/processors/actions.py
def get_operation_point_list(self) -> List[Tuple[int]]:
    """
    Get the operation points of the action.
    :return: The operation points of the action.
    """

    if "path" in self.args:
        return [(point["x"], point["y"]) for point in self.args["path"]]
    elif "x" in self.args and "y" in self.args:
        return [(self.args["x"], self.args["y"])]
    else:
        return []
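
The three branches can be exercised with plain argument dictionaries. The standalone function below restates the same logic for illustration; the name `operation_points` is hypothetical. Note that it annotates the result as `List[Tuple[int, int]]`, since each point is an (x, y) pair, whereas the listing above declares `List[Tuple[int]]`.

```python
from typing import Any, Dict, List, Tuple


def operation_points(args: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Return the screen points an action touches, based on its arguments."""
    if "path" in args:
        # A path-based action lists every point along the way.
        return [(point["x"], point["y"]) for point in args["path"]]
    if "x" in args and "y" in args:
        # A single-point action carries one coordinate pair.
        return [(args["x"], args["y"])]
    # Actions without coordinates have no operation points.
    return []


path_points = operation_points({"path": [{"x": 10, "y": 20}, {"x": 30, "y": 40}]})
single_point = operation_points({"x": 5, "y": 6})
no_points = operation_points({"text": "hello"})
print(path_points, single_point, no_points)
# [(10, 20), (30, 40)] [(5, 6)] []
```

When both a path and a single coordinate pair are present, the path takes precedence because it is checked first.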

is_same_action(action_to_compare)

Check whether the two actions are the same.

Parameters:
  • action_to_compare (Dict[str, Any]) –

    The action to compare with the current action.

Returns:
  • bool

    Whether the two actions are the same.

Source code in agents/processors/actions.py
def is_same_action(self, action_to_compare: Dict[str, Any]) -> bool:
    """
    Check whether the two actions are the same.
    :param action_to_compare: The action to compare with the current action.
    :return: Whether the two actions are the same.
    """

    return (
        self.function == action_to_compare.get("Function")
        and self.args == action_to_compare.get("Args")
        and self.control_text == action_to_compare.get("ControlText")
    )
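
One consequence of this comparison is that the control label is not part of it: two actions with the same function, arguments, and control text are treated as the same even when their labels differ. The sketch below reproduces the comparison on plain dictionaries to show this; `same_action` is a hypothetical helper, and the dictionary keys follow the ones used in the listing above.

```python
from typing import Any, Dict

COMPARED_KEYS = ("Function", "Args", "ControlText")


def same_action(current: Dict[str, Any], other: Dict[str, Any]) -> bool:
    """Compare two action records on function, arguments, and control text only."""
    return all(current.get(key) == other.get(key) for key in COMPARED_KEYS)


first = {"Function": "click_input", "Args": {"button": "left"},
         "ControlText": "Save", "ControlLabel": "12"}
relabeled = dict(first, ControlLabel="37")        # same control text, new label
changed = dict(first, Args={"button": "right"})   # different arguments

print(same_action(first, relabeled), same_action(first, changed))  # True False
```

A plausible reason for ignoring the label is that labels can be reassigned between screenshots, so the control text is a more stable identifier; the source does not state the rationale.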

print_result()

Print the action execution result.

Source code in agents/processors/actions.py
def print_result(self) -> None:
    """
    Print the action execution result.
    """

    utils.print_with_color(
        "Selected item🕹️: {control_text}, Label: {label}".format(
            control_text=self.control_text, label=self.control_label
        ),
        "yellow",
    )
    utils.print_with_color(
        "Action applied⚒️: {action}".format(action=self.command_string), "blue"
    )

    result_color = "red" if self.results.status != "success" else "green"

    utils.print_with_color(
        "Execution result📜: {result}".format(result=asdict(self.results)),
        result_color,
    )

to_dict(previous_actions)

Convert the action to a dictionary.

Parameters:
  • previous_actions (Optional[List[Dict[str, Any]]]) –

    The previous actions.

Returns:
  • Dict[str, Any]

    The dictionary of the action.

Source code in agents/processors/actions.py
def to_dict(
    self, previous_actions: Optional[List[Dict[str, Any]]]
) -> Dict[str, Any]:
    """
    Convert the action to a dictionary.
    :param previous_actions: The previous actions.
    :return: The dictionary of the action.
    """

    action_dict = {
        "Function": self.function,
        "Args": self.args,
        "ControlLabel": self.control_label,
        "ControlText": self.control_text,
        "Status": self.after_status,
        "Results": asdict(self.results),
    }

    # Add the repetitive times of the same action in the previous actions if the previous actions are provided.
    if previous_actions:
        action_dict["RepeatTimes"] = self.count_repeat_times(previous_actions)

    return action_dict

to_string(previous_actions)

Convert the action to a string.

Parameters:
  • previous_actions (Optional[List[OneStepAction]]) –

    The previous actions.

Returns:
  • str

    The string of the action.

Source code in agents/processors/actions.py
def to_string(self, previous_actions: Optional[List["OneStepAction"]]) -> str:
    """
    Convert the action to a string.
    :param previous_actions: The previous actions.
    :return: The string of the action.
    """
    return json.dumps(self.to_dict(previous_actions), ensure_ascii=False)
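
The `ensure_ascii=False` argument matters when control text contains non-ASCII characters: they are written as-is instead of being escaped. The following sketch shows the effect on a hand-built record; `action_to_string` is a hypothetical helper, and the record only imitates the shape produced by to_dict.

```python
import json
from typing import Any, Dict, Optional


def action_to_string(action: Dict[str, Any], repeat_times: Optional[int] = None) -> str:
    """Serialize an action record, adding RepeatTimes only when it is supplied."""
    record = dict(action)
    if repeat_times is not None:
        record["RepeatTimes"] = repeat_times
    return json.dumps(record, ensure_ascii=False)


action = {"Function": "click_input", "Args": {}, "ControlLabel": "3", "ControlText": "保存"}

readable = action_to_string(action, repeat_times=1)
escaped = json.dumps(action)  # default ensure_ascii=True escapes non-ASCII text

print("保存" in readable, "保存" in escaped)  # True False
```

Both strings decode to the same data with `json.loads`; the difference is only in how readable the serialized form is in logs and prompts. In the listing above, RepeatTimes is added only when previous_actions is non-empty, so both None and an empty list leave the key out.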