Execution

The instantiated plans will be executed by a execute task. In this phase, given the task-action data, the execution process will match the real controller based on word environment and execute the plan step by step. After execution, evalution agent will evaluation the quality of the entire execution process.

ExecuteFlow

The ExecuteFlow class is designed to facilitate the execution and evaluation of tasks in a Windows application environment. It provides functionality to interact with the application's UI, execute predefined tasks, capture screenshots, and evaluate the results of the execution. The class also handles logging and error management for the tasks.

Task Execution

The task execution in the ExecuteFlow class follows a structured sequence to ensure accurate and traceable task performance:

Initialization:
Load configuration settings and log paths.
Find the application window matching the task.
Retrieve or create an ExecuteAgent for executing the task.
Plan Execution:
Loop through each step in the instantiated_plan.
Parse the step to extract information like subtasks, control text, and the required operation.
Action Execution:
Find the control in the application window that matches the specified control text.
If no matching control is found, raise an error.
Perform the specified action (e.g., click, input text) using the agent's Puppeteer framework.
Capture screenshots of the application window and selected controls for logging and debugging.
Result Logging:
Log details of the step execution, including control information, performed action, and results.
Finalization:
Save the final state of the application window.
Quit the application client gracefully.

Evaluation

The evaluation process in the ExecuteFlow class is designed to assess the performance of the executed task based on predefined prompts:

Start Evaluation:
Evaluation begins immediately after task execution.
It uses an ExecuteEvalAgent initialized during class construction.
Perform Evaluation:
The ExecuteEvalAgent evaluates the task using a combination of input prompts (e.g., main prompt and API prompt) and logs generated during task execution.
The evaluation process outputs a result summary (e.g., quality flag, comments, and task type).
Log and Output Results:
Display the evaluation results in the console.
Return the evaluation summary alongside the executed plan for further analysis or reporting.

Reference

ExecuteFlow

Bases: AppAgentProcessor

ExecuteFlow class for executing the task and saving the result.

Initialize the execute flow for a task.

Parameters:	`task_file_name` (`str`) – Name of the task file being processed. `context` (`Context`) – Context object for the current session. `environment` (`WindowsAppEnv`) – Environment object for the application being processed.

Source code in execution/workflow/execute_flow.py

def __init__(
    self, task_file_name: str, context: Context, environment: WindowsAppEnv
) -> None:
    """
    Initialize the execute flow for a task.
    :param task_file_name: Name of the task file being processed.
    :param context: Context object for the current session.
    :param environment: Environment object for the application being processed.
    """

    super().__init__(agent=ExecuteAgent, context=context)

    self.execution_time = None
    self.eval_time = None
    self._app_env = environment
    self._task_file_name = task_file_name
    self._app_name = self._app_env.app_name

    log_path = _configs["EXECUTE_LOG_PATH"].format(task=task_file_name)
    self._initialize_logs(log_path)

    self.application_window = self._app_env.find_matching_window(task_file_name)
    self.app_agent = self._get_or_create_execute_agent()
    self.eval_agent = self._get_or_create_evaluation_agent()

    self._matched_control = None  # Matched control for the current step.

`execute(request, instantiated_plan)`

Execute the execute flow: Execute the task and save the result.

Parameters:	`request` (`str`) – Original request to be executed. `instantiated_plan` (`List[Dict[str, Any]]`) – Instantiated plan containing steps to execute.

Returns:	`Tuple[List[Dict[str, Any]], Dict[str, str]]` – Tuple containing task quality flag, comment, and task type.

Source code in execution/workflow/execute_flow.py

def execute(
    self, request: str, instantiated_plan: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """
    Execute the execute flow: Execute the task and save the result.
    :param request: Original request to be executed.
    :param instantiated_plan: Instantiated plan containing steps to execute.
    :return: Tuple containing task quality flag, comment, and task type.
    """

    start_time = time.time()
    try:
        executed_plan = self.execute_plan(instantiated_plan)
    except Exception as error:
        raise RuntimeError(f"Execution failed. {error}")
    finally:
        self.execution_time = round(time.time() - start_time, 3)

    start_time = time.time()
    try:
        result, _ = self.eval_agent.evaluate(
            request=request, log_path=self.log_path
        )
        utils.print_with_color(f"Result: {result}", "green")
    except Exception as error:
        raise RuntimeError(f"Evaluation failed. {error}")
    finally:
        self.eval_time = round(time.time() - start_time, 3)

    return executed_plan, result

`execute_action()`

Execute the action.

Source code in execution/workflow/execute_flow.py

def execute_action(self) -> None:
    """
    Execute the action.
    """

    control_selected = None
    # Find the matching window and control.
    self.application_window = self._app_env.find_matching_window(
        self._task_file_name
    )
    if self.control_text == "":
        control_selected = self.application_window
    else:
        self._control_label, control_selected = (
            self._app_env.find_matching_controller(
                self.filtered_annotation_dict, self.control_text
            )
        )
        if control_selected:
            self._matched_control = control_selected.window_text()

    if not control_selected:
        # If the control is not found, raise an error.
        raise RuntimeError(f"Control with text '{self.control_text}' not found.")

    try:
        # Get the selected control item from the annotation dictionary and LLM response.
        # The LLM response is a number index corresponding to the key in the annotation dictionary.
        if control_selected:

            if _ufo_configs.get("SHOW_VISUAL_OUTLINE_ON_SCREEN", True):
                control_selected.draw_outline(colour="red", thickness=3)
                time.sleep(_ufo_configs.get("RECTANGLE_TIME", 0))

            control_coordinates = PhotographerDecorator.coordinate_adjusted(
                self.application_window.rectangle(), control_selected.rectangle()
            )

            self._control_log = {
                "control_class": control_selected.element_info.class_name,
                "control_type": control_selected.element_info.control_type,
                "control_automation_id": control_selected.element_info.automation_id,
                "control_friendly_class_name": control_selected.friendly_class_name(),
                "control_coordinates": {
                    "left": control_coordinates[0],
                    "top": control_coordinates[1],
                    "right": control_coordinates[2],
                    "bottom": control_coordinates[3],
                },
            }

            self.app_agent.Puppeteer.receiver_manager.create_ui_control_receiver(
                control_selected, self.application_window
            )

            # Save the screenshot of the tagged selected control.
            self.capture_control_screenshot(control_selected)

            self._results = self.app_agent.Puppeteer.execute_command(
                self._operation, self._args
            )
            self.control_reannotate = None
            if not utils.is_json_serializable(self._results):
                self._results = ""

                return

    except Exception:
        self.general_error_handler()

`execute_plan(instantiated_plan)`

Get the executed result from the execute agent.

Parameters:	`instantiated_plan` (`List[Dict[str, Any]]`) – Plan containing steps to execute.

Returns:	`List[Dict[str, Any]]` – List of executed steps.

Source code in execution/workflow/execute_flow.py

def execute_plan(
    self, instantiated_plan: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """
    Get the executed result from the execute agent.
    :param instantiated_plan: Plan containing steps to execute.
    :return: List of executed steps.
    """

    # Initialize the step counter and capture the initial screenshot.
    self.session_step = 0
    try:
        time.sleep(1)
        # Initialize the API receiver
        self.app_agent.Puppeteer.receiver_manager.create_api_receiver(
            self.app_agent._app_root_name, self.app_agent._process_name
        )
        # Initialize the control receiver
        current_receiver = self.app_agent.Puppeteer.receiver_manager.receiver_list[
            -1
        ]

        if current_receiver is not None:
            self.application_window = self._app_env.find_matching_window(
                self._task_file_name
            )
            current_receiver.com_object = (
                current_receiver.get_object_from_process_name()
            )

        self.init_and_final_capture_screenshot()
    except Exception as error:
        raise RuntimeError(f"Execution initialization failed. {error}")

    # Initialize the success flag for each step.
    for index, step_plan in enumerate(instantiated_plan):
        instantiated_plan[index]["Success"] = None
        instantiated_plan[index]["MatchedControlText"] = None

    for index, step_plan in enumerate(instantiated_plan):
        try:
            self.session_step += 1

            # Check if the maximum steps have been exceeded.
            if self.session_step > _configs["MAX_STEPS"]:
                raise RuntimeError("Maximum steps exceeded.")

            self._parse_step_plan(step_plan)

            try:
                self.process()
                instantiated_plan[index]["Success"] = True
                instantiated_plan[index]["ControlLabel"] = self._control_label
                instantiated_plan[index][
                    "MatchedControlText"
                ] = self._matched_control
            except Exception as ControllerNotFoundError:
                instantiated_plan[index]["Success"] = False
                raise ControllerNotFoundError

        except Exception as error:
            err_info = RuntimeError(
                f"Step {self.session_step} execution failed. {error}"
            )
            raise err_info
    # capture the final screenshot
    self.session_step += 1
    time.sleep(1)
    self.init_and_final_capture_screenshot()
    # save the final state of the app

    win_com_receiver = None
    for receiver in reversed(
        self.app_agent.Puppeteer.receiver_manager.receiver_list
    ):
        if isinstance(receiver, WinCOMReceiverBasic):
            if receiver.client is not None:
                win_com_receiver = receiver
                break

    if win_com_receiver is not None:
        win_com_receiver.save()
        time.sleep(1)
        win_com_receiver.client.Quit()

    print("Execution complete.")

    return instantiated_plan

`general_error_handler()`

Handle general errors.

Source code in execution/workflow/execute_flow.py

def general_error_handler(self) -> None:
    """
    Handle general errors.
    """

    pass

`init_and_final_capture_screenshot()`

Capture the screenshot.

Source code in execution/workflow/execute_flow.py

def init_and_final_capture_screenshot(self) -> None:
    """
    Capture the screenshot.
    """

    # Define the paths for the screenshots saved.
    screenshot_save_path = self.log_path + f"action_step{self.session_step}.png"

    self._memory_data.add_values_from_dict(
        {
            "CleanScreenshot": screenshot_save_path,
        }
    )

    self.photographer.capture_app_window_screenshot(
        self.application_window, save_path=screenshot_save_path
    )
    # Capture the control screenshot.
    control_selected = self._app_env.app_window
    self.capture_control_screenshot(control_selected)

`log_save()`

Log the constructed prompt message for the PrefillAgent.

Source code in execution/workflow/execute_flow.py

def log_save(self) -> None:
    """
    Log the constructed prompt message for the PrefillAgent.
    """

    step_memory = {
        "Step": self.session_step,
        "Subtask": self.subtask,
        "ControlLabel": self._control_label,
        "ControlText": self.control_text,
        "Action": self.action,
        "ActionType": self.app_agent.Puppeteer.get_command_types(self._operation),
        "Results": self._results,
        "Application": self.app_agent._app_root_name,
        "TimeCost": self.time_cost,
    }
    self._memory_data.add_values_from_dict(step_memory)
    self.log(self._memory_data.to_dict())

`print_step_info()`

Print the step information.

Source code in execution/workflow/execute_flow.py

def print_step_info(self) -> None:
    """
    Print the step information.
    """

    utils.print_with_color(
        "Step {step}: {subtask}".format(
            step=self.session_step,
            subtask=self.subtask,
        ),
        "magenta",
    )

`process()`

Process the current step.

Source code in execution/workflow/execute_flow.py

def process(self) -> None:
    """
    Process the current step.
    """

    step_start_time = time.time()
    self.print_step_info()
    self.capture_screenshot()
    self.execute_action()
    self.time_cost = round(time.time() - step_start_time, 3)
    self.log_save()

ExecuteAgent

Bases: AppAgent

The Agent for task execution.

Initialize the ExecuteAgent.

Parameters:	`name` (`str`) – The name of the agent. `process_name` (`str`) – The name of the process. `app_root_name` (`str`) – The name of the app root.

Source code in execution/agent/execute_agent.py

def __init__(
    self,
    name: str,
    process_name: str,
    app_root_name: str,
):
    """
    Initialize the ExecuteAgent.
    :param name: The name of the agent.
    :param process_name: The name of the process.
    :param app_root_name: The name of the app root.
    """

    self._step = 0
    self._complete = False
    self._name = name
    self._status = None
    self._process_name = process_name
    self._app_root_name = app_root_name
    self.Puppeteer = self.create_puppeteer_interface()

ExecuteEvalAgent

Bases: EvaluationAgent

The Agent for task execution evaluation.

Initialize the ExecuteEvalAgent.

Parameters:	`name` (`str`) – The name of the agent. `app_root_name` (`str`) – The name of the app root. `is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `main_prompt` (`str`) – The main prompt. `example_prompt` (`str`) – The example prompt. `api_prompt` (`str`) – The API prompt.

Source code in execution/agent/execute_eval_agent.py

def __init__(
    self,
    name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
):
    """
    Initialize the ExecuteEvalAgent.
    :param name: The name of the agent.
    :param app_root_name: The name of the app root.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    """

    super().__init__(
        name=name,
        app_root_name=app_root_name,
        is_visual=is_visual,
        main_prompt=main_prompt,
        example_prompt=example_prompt,
        api_prompt=api_prompt,
    )

`get_prompter(is_visual, prompt_template, example_prompt_template, api_prompt_template, root_name=None)`

Get the prompter for the agent.

Parameters:	`is_visual` (`bool`) – The flag indicating whether the agent is visual or not. `prompt_template` (`str`) – The prompt template. `example_prompt_template` (`str`) – The example prompt template. `api_prompt_template` (`str`) – The API prompt template. `root_name` (`Optional[str]`, default: `None` ) – The name of the root.

Returns:	`ExecuteEvalAgentPrompter` – The prompter.

Source code in execution/agent/execute_eval_agent.py

def get_prompter(
    self,
    is_visual: bool,
    prompt_template: str,
    example_prompt_template: str,
    api_prompt_template: str,
    root_name: Optional[str] = None,
) -> ExecuteEvalAgentPrompter:
    """
    Get the prompter for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param prompt_template: The prompt template.
    :param example_prompt_template: The example prompt template.
    :param api_prompt_template: The API prompt template.
    :param root_name: The name of the root.
    :return: The prompter.
    """

    return ExecuteEvalAgentPrompter(
        is_visual=is_visual,
        prompt_template=prompt_template,
        example_prompt_template=example_prompt_template,
        api_prompt_template=api_prompt_template,
        root_name=root_name,
    )

« Previous Next »