Execution

The instantiated plans are executed by an execute task. In this phase, given the task-action data, the execution process matches the real controller in the Word environment and executes the plan step by step. After execution, the evaluation agent evaluates the quality of the entire execution process.

ExecuteFlow

The ExecuteFlow class is designed to facilitate the execution and evaluation of tasks in a Windows application environment. It provides functionality to interact with the application's UI, execute predefined tasks, capture screenshots, and evaluate the results of the execution. The class also handles logging and error management for the tasks.

Task Execution

The task execution in the ExecuteFlow class follows a structured sequence to ensure accurate and traceable task performance:

  1. Initialization:
     • Load configuration settings and log paths.
     • Find the application window matching the task.
     • Retrieve or create an ExecuteAgent for executing the task.
  2. Plan Execution:
     • Loop through each step in the instantiated_plan.
     • Parse the step to extract information like subtasks, control text, and the required operation.
  3. Action Execution:
     • Find the control in the application window that matches the specified control text.
     • If no matching control is found, raise an error.
     • Perform the specified action (e.g., click, input text) using the agent's Puppeteer framework.
     • Capture screenshots of the application window and selected controls for logging and debugging.
  4. Result Logging:
     • Log details of the step execution, including control information, performed action, and results.
  5. Finalization:
     • Save the final state of the application window.
     • Quit the application client gracefully.
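
The sequence above can be sketched as a small loop. This is a simplified illustration, not the ExecuteFlow implementation: the function run_plan, its parameters, the "Function" key, and the in-memory controls dictionary are hypothetical stand-ins for the real window lookup and Puppeteer calls.

```python
from typing import Any, Callable, Dict, List


def run_plan(
    plan: List[Dict[str, Any]],
    window: Any,
    find_control: Callable[[str], Any],
    perform: Callable[[Any, Dict[str, Any]], Any],
) -> List[Dict[str, Any]]:
    """Run each step in order and record the outcome on the step itself."""
    for number, step in enumerate(plan, start=1):
        control_text = step.get("ControlText", "")
        # An empty control text means the action targets the window itself.
        control = window if control_text == "" else find_control(control_text)
        if control is None:
            step["Success"] = False
            raise RuntimeError(
                f"Step {number} execution failed. "
                f"Control with text '{control_text}' not found."
            )
        step["Results"] = perform(control, step)
        step["Success"] = True
    return plan


# Hypothetical in-memory "window": control text -> control object.
controls = {"Save": "save-button", "Bold": "bold-button"}
plan = [
    {"Subtask": "Make text bold", "ControlText": "Bold", "Function": "click_input"},
    {"Subtask": "Save the file", "ControlText": "Save", "Function": "click_input"},
]
done = run_plan(
    plan,
    window="main-window",
    find_control=controls.get,
    perform=lambda control, step: f"{step['Function']} on {control}",
)
```

Screenshot capture, logging, and finalization are left out here to keep the control flow visible.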

Evaluation

The evaluation process in the ExecuteFlow class is designed to assess the performance of the executed task based on predefined prompts:

  1. Start Evaluation:
     • Evaluation begins immediately after task execution.
     • It uses an ExecuteEvalAgent initialized during class construction.
  2. Perform Evaluation:
     • The ExecuteEvalAgent evaluates the task using a combination of input prompts (e.g., main prompt and API prompt) and logs generated during task execution.
     • The evaluation process outputs a result summary (e.g., quality flag, comments, and task type).
  3. Log and Output Results:
     • Display the evaluation results in the console.
     • Return the evaluation summary alongside the executed plan for further analysis or reporting.
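
The evaluation phase can be sketched as a timed call whose failure is reported as a RuntimeError. This is a minimal illustration under stated assumptions: timed_evaluation and fake_evaluate are hypothetical names, and the result keys ("judge", "comment", "type") are placeholders for the quality flag, comments, and task type, not the agent's actual output schema.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_evaluation(
    evaluate: Callable[..., Tuple[Dict[str, str], Any]],
    request: str,
    log_path: str,
) -> Tuple[Dict[str, str], float]:
    """Call the evaluator and measure how long it took, even on failure."""
    start = time.time()
    try:
        result, _ = evaluate(request=request, log_path=log_path)
    except Exception as error:
        raise RuntimeError(f"Evaluation failed. {error}") from error
    finally:
        # Runs on both the success and the failure path.
        elapsed = round(time.time() - start, 3)
    return result, elapsed


def fake_evaluate(request: str, log_path: str) -> Tuple[Dict[str, str], float]:
    """Hypothetical evaluator standing in for ExecuteEvalAgent.evaluate."""
    summary = {"judge": "True", "comment": f"Checked logs in {log_path}", "type": "Word"}
    return summary, 0.0


result, eval_time = timed_evaluation(fake_evaluate, "Make the title bold", "logs/task1/")
print(f"Result: {result} ({eval_time}s)")
```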

Reference

ExecuteFlow

Bases: AppAgentProcessor

ExecuteFlow class for executing the task and saving the result.

Initialize the execute flow for a task.

Parameters:
  • task_file_name (str) –

    Name of the task file being processed.

  • context (Context) –

    Context object for the current session.

  • environment (WindowsAppEnv) –

    Environment object for the application being processed.

Source code in execution/workflow/execute_flow.py
def __init__(
    self, task_file_name: str, context: Context, environment: WindowsAppEnv
) -> None:
    """
    Initialize the execute flow for a task.
    :param task_file_name: Name of the task file being processed.
    :param context: Context object for the current session.
    :param environment: Environment object for the application being processed.
    """

    super().__init__(agent=ExecuteAgent, context=context)

    self.execution_time = None
    self.eval_time = None
    self._app_env = environment
    self._task_file_name = task_file_name
    self._app_name = self._app_env.app_name

    log_path = _configs["EXECUTE_LOG_PATH"].format(task=task_file_name)
    self._initialize_logs(log_path)

    self.application_window = self._app_env.find_matching_window(task_file_name)
    self.app_agent = self._get_or_create_execute_agent()
    self.eval_agent = self._get_or_create_evaluation_agent()

    self._matched_control = None  # Matched control for the current step.

execute(request, instantiated_plan)

Execute the execute flow: Execute the task and save the result.

Parameters:
  • request (str) –

    Original request to be executed.

  • instantiated_plan (List[Dict[str, Any]]) –

    Instantiated plan containing steps to execute.

Returns:
  • Tuple[List[Dict[str, Any]], Dict[str, str]]

    Tuple containing the executed plan and the evaluation result (e.g., quality flag, comment, and task type).

Source code in execution/workflow/execute_flow.py
def execute(
    self, request: str, instantiated_plan: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], Dict[str, str]]:
    """
    Execute the execute flow: Execute the task and save the result.
    :param request: Original request to be executed.
    :param instantiated_plan: Instantiated plan containing steps to execute.
    :return: Tuple containing task quality flag, comment, and task type.
    """

    start_time = time.time()
    try:
        executed_plan = self.execute_plan(instantiated_plan)
    except Exception as error:
        raise RuntimeError(f"Execution failed. {error}")
    finally:
        self.execution_time = round(time.time() - start_time, 3)

    start_time = time.time()
    try:
        result, _ = self.eval_agent.evaluate(
            request=request, log_path=self.log_path
        )
        utils.print_with_color(f"Result: {result}", "green")
    except Exception as error:
        raise RuntimeError(f"Evaluation failed. {error}")
    finally:
        self.eval_time = round(time.time() - start_time, 3)

    return executed_plan, result

execute_action()

Execute the action.

Source code in execution/workflow/execute_flow.py
def execute_action(self) -> None:
    """
    Execute the action.
    """

    control_selected = None
    # Find the matching window and control.
    self.application_window = self._app_env.find_matching_window(
        self._task_file_name
    )
    if self.control_text == "":
        control_selected = self.application_window
    else:
        self._control_label, control_selected = self._app_env.find_matching_controller(
            self.filtered_annotation_dict, self.control_text
            )
        if control_selected:
            self._matched_control = control_selected.window_text()

    if not control_selected:
        # If the control is not found, raise an error.
        raise RuntimeError(f"Control with text '{self.control_text}' not found.")

    try:
        # Get the selected control item from the annotation dictionary and LLM response.
        # The LLM response is a number index corresponding to the key in the annotation dictionary.
        if control_selected:

            if _ufo_configs.get("SHOW_VISUAL_OUTLINE_ON_SCREEN", True):
                control_selected.draw_outline(colour="red", thickness=3)
                time.sleep(_ufo_configs.get("RECTANGLE_TIME", 0))

            control_coordinates = PhotographerDecorator.coordinate_adjusted(
                self.application_window.rectangle(), control_selected.rectangle()
            )

            self._control_log = {
                "control_class": control_selected.element_info.class_name,
                "control_type": control_selected.element_info.control_type,
                "control_automation_id": control_selected.element_info.automation_id,
                "control_friendly_class_name": control_selected.friendly_class_name(),
                "control_coordinates": {
                    "left": control_coordinates[0],
                    "top": control_coordinates[1],
                    "right": control_coordinates[2],
                    "bottom": control_coordinates[3],
                },
            }

            self.app_agent.Puppeteer.receiver_manager.create_ui_control_receiver(
                control_selected, self.application_window
            )

            # Save the screenshot of the tagged selected control.
            self.capture_control_screenshot(control_selected)

            self._results = self.app_agent.Puppeteer.execute_command(
                self._operation, self._args
            )
            self.control_reannotate = None
            if not utils.is_json_serializable(self._results):
                self._results = ""

                return

    except Exception:
        self.general_error_handler()
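
The control_coordinates logged by execute_action are derived from the window rectangle and the control rectangle. A minimal sketch of one plausible adjustment follows, assuming that coordinate_adjusted expresses the control rectangle relative to the window's top-left corner; the Rect type and to_window_coordinates function are hypothetical and are not part of the source.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Screen rectangle in pixels (hypothetical stand-in for a UI rectangle)."""

    left: int
    top: int
    right: int
    bottom: int


def to_window_coordinates(window: Rect, control: Rect) -> Tuple[int, int, int, int]:
    """Shift the control rectangle so the window's top-left corner is (0, 0)."""
    return (
        control.left - window.left,
        control.top - window.top,
        control.right - window.left,
        control.bottom - window.top,
    )


window = Rect(left=100, top=50, right=900, bottom=650)
button = Rect(left=150, top=80, right=250, bottom=110)

relative = to_window_coordinates(window, button)
control_coordinates = dict(zip(("left", "top", "right", "bottom"), relative))
print(control_coordinates)  # {'left': 50, 'top': 30, 'right': 150, 'bottom': 60}
```

Window-relative coordinates line up with a screenshot of the application window regardless of where the window sits on screen.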

execute_plan(instantiated_plan)

Get the executed result from the execute agent.

Parameters:
  • instantiated_plan (List[Dict[str, Any]]) –

    Plan containing steps to execute.

Returns:
  • List[Dict[str, Any]]

    List of executed steps.

Source code in execution/workflow/execute_flow.py
def execute_plan(
    self, instantiated_plan: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """
    Get the executed result from the execute agent.
    :param instantiated_plan: Plan containing steps to execute.
    :return: List of executed steps.
    """

    # Initialize the step counter and capture the initial screenshot.
    self.session_step = 0
    try:
        time.sleep(1)
        # Initialize the API receiver
        self.app_agent.Puppeteer.receiver_manager.create_api_receiver(
            self.app_agent._app_root_name, self.app_agent._process_name
        )
        # Initialize the control receiver
        current_receiver = self.app_agent.Puppeteer.receiver_manager.receiver_list[
            -1
        ]

        if current_receiver is not None:
            self.application_window = self._app_env.find_matching_window(
                self._task_file_name
            )
            current_receiver.com_object = (
                current_receiver.get_object_from_process_name()
            )

        self.init_and_final_capture_screenshot()
    except Exception as error:
        raise RuntimeError(f"Execution initialization failed. {error}")

    # Initialize the success flag for each step.
    for index, step_plan in enumerate(instantiated_plan):
        instantiated_plan[index]["Success"] = None
        instantiated_plan[index]["MatchedControlText"] = None

    for index, step_plan in enumerate(instantiated_plan):
        try:
            self.session_step += 1

            # Check if the maximum steps have been exceeded.
            if self.session_step > _configs["MAX_STEPS"]:
                raise RuntimeError("Maximum steps exceeded.")

            self._parse_step_plan(step_plan)

            try:
                self.process()
                instantiated_plan[index]["Success"] = True
                instantiated_plan[index]["ControlLabel"] = self._control_label
                instantiated_plan[index][
                    "MatchedControlText"
                ] = self._matched_control
            except Exception as ControllerNotFoundError:
                instantiated_plan[index]["Success"] = False
                raise ControllerNotFoundError

        except Exception as error:
            err_info = RuntimeError(
                f"Step {self.session_step} execution failed. {error}"
            )
            raise err_info
    # capture the final screenshot
    self.session_step += 1
    time.sleep(1)
    self.init_and_final_capture_screenshot()
    # save the final state of the app

    win_com_receiver = None
    for receiver in reversed(
        self.app_agent.Puppeteer.receiver_manager.receiver_list
    ):
        if isinstance(receiver, WinCOMReceiverBasic):
            if receiver.client is not None:
                win_com_receiver = receiver
                break

    if win_com_receiver is not None:
        win_com_receiver.save()
        time.sleep(1)
        win_com_receiver.client.Quit()

    print("Execution complete.")

    return instantiated_plan
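
Because execute_plan writes the "Success" field into the list it receives before re-raising, a caller that keeps a reference to that list can inspect it after catching the RuntimeError: True marks completed steps, False the step that failed, and None the steps that were never reached. The helpers below are a hypothetical sketch of such an inspection and are not part of the source.

```python
from typing import Any, Dict, List, Optional


def first_failed_step(plan: List[Dict[str, Any]]) -> Optional[int]:
    """Return the 1-based number of the step marked Success=False, if any."""
    for number, step in enumerate(plan, start=1):
        if step.get("Success") is False:
            return number
    return None


def pending_steps(plan: List[Dict[str, Any]]) -> List[int]:
    """Return the 1-based numbers of steps that were never attempted."""
    return [
        number
        for number, step in enumerate(plan, start=1)
        if step.get("Success") is None
    ]


# Example state of a plan after a failure on the second step.
plan_after_failure = [
    {"Subtask": "Open the Home tab", "Success": True},
    {"Subtask": "Click Bold", "Success": False},
    {"Subtask": "Save the file", "Success": None},
]

print(first_failed_step(plan_after_failure))  # 2
print(pending_steps(plan_after_failure))  # [3]
```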

general_error_handler()

Handle general errors.

Source code in execution/workflow/execute_flow.py
def general_error_handler(self) -> None:
    """
    Handle general errors.
    """

    pass

init_and_final_capture_screenshot()

Capture the screenshot.

Source code in execution/workflow/execute_flow.py
def init_and_final_capture_screenshot(self) -> None:
    """
    Capture the screenshot.
    """

    # Define the paths for the screenshots saved.
    screenshot_save_path = self.log_path + f"action_step{self.session_step}.png"

    self._memory_data.add_values_from_dict(
        {
            "CleanScreenshot": screenshot_save_path,
        }
    )

    self.photographer.capture_app_window_screenshot(
        self.application_window, save_path=screenshot_save_path
    )
    # Capture the control screenshot.
    control_selected = self._app_env.app_window
    self.capture_control_screenshot(control_selected)
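
The screenshot path is built by plain string concatenation, so the file lands inside the log directory only if log_path already ends with a path separator. The sketch below mirrors that concatenation and the step numbers used for the initial and final captures in execute_plan (0 before the loop, plan length plus one after it); the function names and example paths are hypothetical.

```python
from typing import Tuple


def screenshot_path(log_path: str, session_step: int) -> str:
    """Mirror the concatenation in the source: no separator is added."""
    return log_path + f"action_step{session_step}.png"


def boundary_steps(plan_length: int) -> Tuple[int, int]:
    """Step numbers of the initial and final captures for a fully executed plan."""
    return 0, plan_length + 1


initial_step, final_step = boundary_steps(plan_length=3)

print(screenshot_path("logs/task1/", initial_step))  # logs/task1/action_step0.png
print(screenshot_path("logs/task1/", final_step))  # logs/task1/action_step4.png
# Without the trailing separator the file name is glued to the directory name.
print(screenshot_path("logs/task1", initial_step))  # logs/task1action_step0.png
```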

log_save()

Log the memory data for the current step (the docstring in the source still refers to the PrefillAgent prompt message).

Source code in execution/workflow/execute_flow.py
def log_save(self) -> None:
    """
    Log the constructed prompt message for the PrefillAgent.
    """

    step_memory = {
        "Step": self.session_step,
        "Subtask": self.subtask,
        "ControlLabel": self._control_label,
        "ControlText": self.control_text,
        "Action": self.action,
        "ActionType": self.app_agent.Puppeteer.get_command_types(self._operation),
        "Results": self._results,
        "Application": self.app_agent._app_root_name,
        "TimeCost": self.time_cost,
    }
    self._memory_data.add_values_from_dict(step_memory)
    self.log(self._memory_data.to_dict())
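
The "Results" value in the step record is replaced with an empty string in execute_action when it cannot be serialized to JSON. A minimal sketch of such a check follows, assuming it amounts to attempting json.dumps; this is an illustration and not necessarily how utils.is_json_serializable is written.

```python
import json
from typing import Any


def is_json_serializable(obj: Any) -> bool:
    """Return True if json.dumps accepts the object, False otherwise."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError, OverflowError):
        return False


def loggable_results(results: Any) -> Any:
    """Keep serializable results; fall back to an empty string otherwise."""
    return results if is_json_serializable(results) else ""


print(loggable_results({"clicked": True}))  # {'clicked': True}
print(repr(loggable_results(object())))  # ''
```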

print_step_info()

Print the step information.

Source code in execution/workflow/execute_flow.py
def print_step_info(self) -> None:
    """
    Print the step information.
    """

    utils.print_with_color(
        "Step {step}: {subtask}".format(
            step=self.session_step,
            subtask=self.subtask,
        ),
        "magenta",
    )

process()

Process the current step.

Source code in execution/workflow/execute_flow.py
def process(self) -> None:
    """
    Process the current step.
    """

    step_start_time = time.time()
    self.print_step_info()
    self.capture_screenshot()
    self.execute_action()
    self.time_cost = round(time.time() - step_start_time, 3)
    self.log_save()

ExecuteAgent

Bases: AppAgent

The Agent for task execution.

Initialize the ExecuteAgent.

Parameters:
  • name (str) –

    The name of the agent.

  • process_name (str) –

    The name of the process.

  • app_root_name (str) –

    The name of the app root.

Source code in execution/agent/execute_agent.py
def __init__(
    self,
    name: str,
    process_name: str,
    app_root_name: str,
):
    """
    Initialize the ExecuteAgent.
    :param name: The name of the agent.
    :param process_name: The name of the process.
    :param app_root_name: The name of the app root.
    """

    self._step = 0
    self._complete = False
    self._name = name
    self._status = None
    self._process_name = process_name
    self._app_root_name = app_root_name
    self.Puppeteer = self.create_puppeteer_interface()

ExecuteEvalAgent

Bases: EvaluationAgent

The Agent for task execution evaluation.

Initialize the ExecuteEvalAgent.

Parameters:
  • name (str) –

    The name of the agent.

  • app_root_name (str) –

    The name of the app root.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt.

  • example_prompt (str) –

    The example prompt.

  • api_prompt (str) –

    The API prompt.

Source code in execution/agent/execute_eval_agent.py
def __init__(
    self,
    name: str,
    app_root_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
):
    """
    Initialize the ExecuteEvalAgent.
    :param name: The name of the agent.
    :param app_root_name: The name of the app root.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    """

    super().__init__(
        name=name,
        app_root_name=app_root_name,
        is_visual=is_visual,
        main_prompt=main_prompt,
        example_prompt=example_prompt,
        api_prompt=api_prompt,
    )

get_prompter(is_visual, prompt_template, example_prompt_template, api_prompt_template, root_name=None)

Get the prompter for the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • prompt_template (str) –

    The prompt template.

  • example_prompt_template (str) –

    The example prompt template.

  • api_prompt_template (str) –

    The API prompt template.

  • root_name (Optional[str], default: None ) –

    The name of the root.

Returns:
  • ExecuteEvalAgentPrompter

    The prompter.

Source code in execution/agent/execute_eval_agent.py
def get_prompter(
    self,
    is_visual: bool,
    prompt_template: str,
    example_prompt_template: str,
    api_prompt_template: str,
    root_name: Optional[str] = None,
) -> ExecuteEvalAgentPrompter:
    """
    Get the prompter for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param prompt_template: The prompt template.
    :param example_prompt_template: The example prompt template.
    :param api_prompt_template: The API prompt template.
    :param root_name: The name of the root.
    :return: The prompter.
    """

    return ExecuteEvalAgentPrompter(
        is_visual=is_visual,
        prompt_template=prompt_template,
        example_prompt_template=example_prompt_template,
        api_prompt_template=api_prompt_template,
        root_name=root_name,
    )