Speculative Multi-Action Execution

UFO² introduces Speculative Multi-Action Execution. Instead of querying the LLM once per step, the agent predicts a batch of likely actions in a single LLM call and then validates them against the live UIA state in one shot before acting on them. Compared with inferring each step separately, this can reduce the number of LLM queries by up to 51%. The figure below illustrates speculative multi-action execution:
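The loop below is a minimal sketch of the idea, not UFO²'s implementation. `PredictedAction` and `run_speculative_batch` are hypothetical names, and the sketch assumes that execution stops at the first predicted action whose control is no longer available, leaving the agent to fall back to a fresh LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PredictedAction:
    # Hypothetical stand-in for one step predicted by the LLM.
    control_label: str
    function: str


def run_speculative_batch(
    predicted: List[PredictedAction],
    is_control_available: Callable[[str], bool],
    execute: Callable[[PredictedAction], str],
) -> List[Dict[str, str]]:
    """Run predicted actions in order, validating each one against the live UI first."""
    log: List[Dict[str, str]] = []
    for action in predicted:
        # Earlier actions may have changed the UI, so re-check before acting.
        if not is_control_available(action.control_label):
            log.append({"label": action.control_label, "status": "error"})
            # Assumption: the remaining predictions are treated as stale.
            break
        log.append({"label": action.control_label, "status": execute(action)})
    return log


# Usage: controls "1" and "2" exist in the live UI, "7" does not.
available = {"1", "2"}
batch = [
    PredictedAction("1", "click_input"),
    PredictedAction("2", "set_edit_text"),
    PredictedAction("7", "click_input"),
]
result = run_speculative_batch(batch, lambda label: label in available, lambda action: "success")
```

Batching saves LLM calls only for the prefix of predictions that still match the UI; validating each step keeps a wrong guess from being executed blindly.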

Configuration

To enable speculative multi-action execution, set `ACTION_SEQUENCE` to `True` in the `config_dev.yaml` file:

ACTION_SEQUENCE: True
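Once the YAML file is loaded into a dictionary (PyYAML, for example, parses `True` as a boolean), the flag can be read like any other configuration entry. The helper below is hypothetical and assumes the feature is off when the key is missing; it only illustrates how a single boolean switches between the two modes.

```python
from typing import Any, Dict


def select_execution_mode(configs: Dict[str, Any]) -> str:
    # Hypothetical helper; assumes a missing ACTION_SEQUENCE key means "off".
    if configs.get("ACTION_SEQUENCE", False):
        return "multi-action"
    return "single-action"
```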

References

The implementation of speculative multi-action execution is located in the `ufo/agents/processors/actions.py` file. The `OneStepAction` class documented below represents a single action and handles its validation, execution, and logging:

Source code in agents/processors/actions.py

def __init__(
    self,
    function: str = "",
    args: Optional[Dict[str, Any]] = None,
    control_label: str = "",
    control_text: str = "",
    after_status: str = "",
    results: Optional[ActionExecutionLog] = None,
    configs=Config.get_instance().config_data,
):
    self._function = function
    # Use a fresh dict per instance instead of a shared mutable default.
    self._args = args if args is not None else {}
    self._control_label = control_label
    self._control_text = control_text
    self._after_status = after_status
    self._results = ActionExecutionLog() if results is None else results
    self._configs = configs
    self._control_log = BaseControlLog()

`after_status` `property`

Get the status.

Returns:	`str` – The status.

`args` `property`

Get the arguments.

Returns:	`Dict[str, Any]` – The arguments.

`command_string` `property`

Generate a function call string.

Returns:	`str` – The function call string.
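The exact format produced by `command_string` is not shown on this page. As an illustration only, the hypothetical helper below renders a `function` name and its `args` dictionary as a Python-style call string, the kind of text `print_result` shows after "Action applied".

```python
from typing import Any, Dict


def make_command_string(function: str, args: Dict[str, Any]) -> str:
    # Hypothetical format: keyword arguments rendered with repr(), in insertion order.
    rendered = ", ".join(f"{key}={value!r}" for key, value in args.items())
    return f"{function}({rendered})"
```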

`control_label` `property`

Get the control label.

Returns:	`str` – The control label.

`control_log` `property` `writable`

Get the control log.

Returns:	`BaseControlLog` – The control log.

`control_text` `property`

Get the control text.

Returns:	`str` – The control text.

`function` `property`

Get the function name.

Returns:	`str` – The function.

`results` `property` `writable`

Get the results.

Returns:	`ActionExecutionLog` – The results.

`action_flow(puppeteer, control_dict, application_window)`

Execute the action flow: validate the selected control, execute the action on it, and record the result.

Parameters:
	`puppeteer` (`AppPuppeteer`) – The puppeteer that controls the application.
	`control_dict` (`Dict[str, UIAWrapper]`) – The control dictionary, mapping control labels to controls.
	`application_window` (`UIAWrapper`) – The application window where the control is located.

Returns:	`ActionExecutionLog` – The action execution log.

Source code in agents/processors/actions.py

def action_flow(
    self,
    puppeteer: AppPuppeteer,
    control_dict: Dict[str, UIAWrapper],
    application_window: UIAWrapper,
) -> ActionExecutionLog:
    """
    Execute the action flow.
    :param puppeteer: The puppeteer that controls the application.
    :param control_dict: The control dictionary.
    :param application_window: The application window where the control is located.
    :return: The action execution log.
    """
    control_selected: UIAWrapper = control_dict.get(self.control_label, None)

    # If the control is selected, but not available, return an error.
    if control_selected is not None and not self._control_validation(
        control_selected
    ):
        self.results = ActionExecutionLog(
            status="error",
            traceback="Control is not available.",
            error="Control is not available.",
        )
        self._control_log = BaseControlLog()

        return self.results

    # Create the control receiver.
    puppeteer.receiver_manager.create_ui_control_receiver(
        control_selected, application_window
    )

    if self.function:

        if self._configs.get("SHOW_VISUAL_OUTLINE_ON_SCREEN", True):
            if control_selected:
                control_selected.draw_outline(colour="red", thickness=3)
                time.sleep(self._configs.get("RECTANGLE_TIME", 0))

        self._control_log = self._get_control_log(
            control_selected=control_selected, application_window=application_window
        )

        try:
            return_value = self.execute(puppeteer=puppeteer)
            if not utils.is_json_serializable(return_value):
                return_value = ""

            self.results = ActionExecutionLog(
                status="success",
                return_value=return_value,
            )

        except Exception as e:

            import traceback

            self.results = ActionExecutionLog(
                status="error",
                traceback=traceback.format_exc(),
                error=str(e),
            )
    # Always return a log, even when no function was specified.
    return self.results
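The validate, execute, and log pattern in `action_flow` can be reproduced outside UFO² with simplified stand-ins. In the sketch below, `ActionExecutionLog` keeps only the fields used above, `is_json_serializable` is a guess at what `utils.is_json_serializable` checks, and `run_action` is a hypothetical condensation of the method that takes the validation result and a zero-argument command directly.

```python
import json
import traceback
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ActionExecutionLog:
    # Simplified stand-in with the fields used by action_flow.
    status: str = ""
    error: str = ""
    traceback: str = ""
    return_value: Any = None


def is_json_serializable(value: Any) -> bool:
    # Assumed behaviour of utils.is_json_serializable.
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


def run_action(control_available: bool, command: Callable[[], Any]) -> ActionExecutionLog:
    # 1. Refuse to act on a control that failed validation.
    if not control_available:
        return ActionExecutionLog(
            status="error",
            traceback="Control is not available.",
            error="Control is not available.",
        )
    # 2. Execute, keeping only JSON-serializable return values.
    try:
        value = command()
        if not is_json_serializable(value):
            value = ""
        return ActionExecutionLog(status="success", return_value=value)
    except Exception as exc:
        # 3. Record the failure in the log instead of propagating it.
        return ActionExecutionLog(
            status="error", traceback=traceback.format_exc(), error=str(exc)
        )
```

Recording exceptions in the log rather than re-raising them lets the caller inspect the status of each action in a speculative batch and decide whether to continue.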

`count_repeat_times(previous_actions)`

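Count how many times this action was repeated consecutively at the end of the previous actions. Counting walks backwards from the most recent action and stops at the first one that differs.

Parameters:	`previous_actions` (`List[Dict[str, Any]]`) – The previous actions.

Returns:	`int` – The number of consecutive trailing repeats of this action.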

Source code in agents/processors/actions.py

def count_repeat_times(self, previous_actions: List[Dict[str, Any]]) -> int:
    """
    Get the times of the same action in the previous actions.
    :param previous_actions: The previous actions.
    :return: The times of the same action in the previous actions.
    """

    count = 0
    for action in previous_actions[::-1]:
        if self.is_same_action(action):
            count += 1
        else:
            break
    return count

`execute(puppeteer)`

Execute the action.

Parameters:	`puppeteer` (`AppPuppeteer`) – The puppeteer that controls the application.

Returns:	`Any` – The value returned by `puppeteer.execute_command`.

Source code in agents/processors/actions.py

def execute(self, puppeteer: AppPuppeteer) -> Any:
    """
    Execute the action.
    :param puppeteer: The puppeteer that controls the application.
    """
    return puppeteer.execute_command(self.function, self.args)
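`execute` forwards the function name and arguments to the puppeteer. The `FakePuppeteer` below is a hypothetical stand-in for `AppPuppeteer` that shows this dispatch-by-name pattern: command names map to handlers, and the predicted `args` dictionary is passed as keyword arguments. How the real puppeteer resolves and runs commands is not covered here.

```python
from typing import Any, Callable, Dict


class FakePuppeteer:
    """Hypothetical stand-in for AppPuppeteer: maps command names to callables."""

    def __init__(self) -> None:
        self._commands: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._commands[name] = handler

    def execute_command(self, command_name: str, params: Dict[str, Any]) -> Any:
        # Look up the handler by name and pass the predicted args as keywords.
        if command_name not in self._commands:
            raise ValueError(f"Unknown command: {command_name}")
        return self._commands[command_name](**params)


puppeteer = FakePuppeteer()
puppeteer.register("set_edit_text", lambda text: f"typed {text}")
```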

`get_operation_point_list()`

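Get the screen coordinates the action operates on: every point of a `path` argument, or the single `x`/`y` point. Actions without coordinates return an empty list.

Returns:	`List[Tuple[int, int]]` – The operation points of the action.

Source code in agents/processors/actions.py

def get_operation_point_list(self) -> List[Tuple[int, int]]: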
    """
    Get the operation points of the action.
    :return: The operation points of the action.
    """

    if "path" in self.args:
        return [(point["x"], point["y"]) for point in self.args["path"]]
    elif "x" in self.args and "y" in self.args:
        return [(self.args["x"], self.args["y"])]
    else:
        return []
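The function below reproduces the branching of `get_operation_point_list` as a free function so it can be tried on plain dictionaries. The argument names `path`, `x`, and `y` come from the source above; the example values are made up.

```python
from typing import Any, Dict, List, Tuple


def operation_points(args: Dict[str, Any]) -> List[Tuple[int, int]]:
    # A drag-style action lists every point along its path.
    if "path" in args:
        return [(point["x"], point["y"]) for point in args["path"]]
    # A click-style action has a single coordinate pair.
    if "x" in args and "y" in args:
        return [(args["x"], args["y"])]
    # Anything else (e.g. typing text) has no operation point.
    return []
```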

`is_same_action(action_to_compare)`

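Check whether this action matches a previously logged action. Two actions match if their function name, arguments, and control text are equal; the control label is not compared.

Parameters:	`action_to_compare` (`Dict[str, Any]`) – The action to compare with, as a dictionary with `Function`, `Args`, and `ControlText` keys (the format produced by `to_dict`).

Returns:	`bool` – Whether the two actions match.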

Source code in agents/processors/actions.py

def is_same_action(self, action_to_compare: Dict[str, Any]) -> bool:
    """
    Check whether the two actions are the same.
    :param action_to_compare: The action to compare with the current action.
    :return: Whether the two actions are the same.
    """

    return (
        self.function == action_to_compare.get("Function")
        and self.args == action_to_compare.get("Args")
        and self.control_text == action_to_compare.get("ControlText")
    )
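The two helpers above work together: `count_repeat_times` walks the history backwards and calls `is_same_action` until the first mismatch. The sketch below restates both as free functions over the dictionaries produced by `to_dict`, which shows that the count covers only the trailing run of repeats and that `ControlLabel` is ignored. The example history is made up.

```python
from typing import Any, Dict, List


def is_same_action(current: Dict[str, Any], other: Dict[str, Any]) -> bool:
    # Mirrors is_same_action: ControlLabel is not part of the comparison.
    return (
        current["Function"] == other.get("Function")
        and current["Args"] == other.get("Args")
        and current["ControlText"] == other.get("ControlText")
    )


def count_repeat_times(current: Dict[str, Any], previous: List[Dict[str, Any]]) -> int:
    # Walk backwards from the most recent action; stop at the first mismatch.
    count = 0
    for action in reversed(previous):
        if not is_same_action(current, action):
            break
        count += 1
    return count


click = {"Function": "click_input", "Args": {"button": "left"}, "ControlLabel": "12", "ControlText": "Save"}
type_name = {"Function": "set_edit_text", "Args": {"text": "report"}, "ControlLabel": "4", "ControlText": "File name"}
history = [click, type_name, {**click, "ControlLabel": "15"}, click]
```

Here `count_repeat_times(click, history)` counts the last two entries (the relabelled click still matches) and stops at the typing action.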

`print_result()`

Print the selected control, the applied command, and the execution result (green on success, red otherwise).

Source code in agents/processors/actions.py

def print_result(self) -> None:
    """
    Print the action execution result.
    """

    utils.print_with_color(
        "Selected item🕹️: {control_text}, Label: {label}".format(
            control_text=self.control_text, label=self.control_label
        ),
        "yellow",
    )
    utils.print_with_color(
        "Action applied⚒️: {action}".format(action=self.command_string), "blue"
    )

    result_color = "red" if self.results.status != "success" else "green"

    utils.print_with_color(
        "Execution result📜: {result}".format(result=asdict(self.results)),
        result_color,
    )

`to_dict(previous_actions)`

Convert the action to a dictionary.

Parameters:	`previous_actions` (`Optional[List[Dict[str, Any]]]`) – The previous actions. If provided and non-empty, a `RepeatTimes` entry is added to the dictionary.

Returns:	`Dict[str, Any]` – The dictionary of the action.

Source code in agents/processors/actions.py

def to_dict(
    self, previous_actions: Optional[List[Dict[str, Any]]]
) -> Dict[str, Any]:
    """
    Convert the action to a dictionary.
    :param previous_actions: The previous actions.
    :return: The dictionary of the action.
    """

    action_dict = {
        "Function": self.function,
        "Args": self.args,
        "ControlLabel": self.control_label,
        "ControlText": self.control_text,
        "Status": self.after_status,
        "Results": asdict(self.results),
    }

    # Add the repetitive times of the same action in the previous actions if the previous actions are provided.
    if previous_actions:
        action_dict["RepeatTimes"] = self.count_repeat_times(previous_actions)

    return action_dict

`to_string(previous_actions)`

Convert the action to a JSON string.

Parameters:	`previous_actions` (`Optional[List[Dict[str, Any]]]`) – The previous actions, as dictionaries in the format produced by `to_dict`.

Returns:	`str` – The string of the action.

Source code in agents/processors/actions.py

def to_string(self, previous_actions: Optional[List[Dict[str, Any]]]) -> str:
    """
    Convert the action to a string.
    :param previous_actions: The previous actions.
    :return: The string of the action.
    """
    return json.dumps(self.to_dict(previous_actions), ensure_ascii=False)
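`to_string` is `to_dict` followed by `json.dumps(..., ensure_ascii=False)`, so non-ASCII control text stays readable in the logs instead of being escaped. The sketch below uses a simplified `ResultLog` stand-in for `ActionExecutionLog` and a hypothetical `build_action_dict` helper with made-up values, including the status string.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ResultLog:
    # Simplified stand-in for ActionExecutionLog.
    status: str = "success"
    error: str = ""


def build_action_dict(
    function: str,
    args: Dict[str, Any],
    control_label: str,
    control_text: str,
    after_status: str,
    results: ResultLog,
    repeat_times: Optional[int] = None,
) -> Dict[str, Any]:
    # Same keys as to_dict; RepeatTimes is included only when a count is given.
    action_dict: Dict[str, Any] = {
        "Function": function,
        "Args": args,
        "ControlLabel": control_label,
        "ControlText": control_text,
        "Status": after_status,
        "Results": asdict(results),
    }
    if repeat_times is not None:
        action_dict["RepeatTimes"] = repeat_times
    return action_dict


record = build_action_dict(
    "set_edit_text", {"text": "Café"}, "3", "Search", "CONTINUE", ResultLog()
)
print(json.dumps(record, ensure_ascii=False))
# With the default ensure_ascii=True, "Café" would be written as "Caf\u00e9".
```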
