Instantiation

There are three key steps in the instantiation process:

  1. Choose a template file according to the specified app and instruction.
  2. Prefill the task using the current screenshot.
  3. Filter the established task.

Given the initial task, the dataflow first choose a template (Phase 1), the prefill the initial task based on word envrionment to obtain task-action data (Phase 2). Finnally, it will filter the established task to evaluate the quality of task-action data.

1. Choose Template File

Templates for your app must be defined and described in dataflow/templates/app. For instance, if you want to instantiate tasks for the Word application, place the relevant .docx files in dataflow /templates/word, along with a description.json file. The appropriate template will be selected based on how well its description matches the instruction.

The ChooseTemplateFlow uses semantic matching, where task descriptions are compared with template descriptions using embeddings and FAISS for efficient nearest neighbor search. If semantic matching fails, a random template is chosen from the available files.

2. Prefill the Task

PrefillFlow

The PrefillFlow class orchestrates the refinement of task plans and UI interactions by leveraging PrefillAgent for task planning and action generation. It automates UI control updates, captures screenshots, and manages logs for messages and responses during execution.

PrefillAgent

The PrefillAgent class facilitates task instantiation and action sequence generation by constructing tailored prompt messages using the PrefillPrompter. It integrates system, user, and dynamic context to generate actionable inputs for down-stream workflows.

3. Filter Task

FilterFlow

The FilterFlow class is designed to process and refine task plans by leveraging a FilterAgent. The FilterFlow class acts as a bridge between the instantiation of tasks and the execution of a filtering process, aiming to refine task steps and prefill task-related files based on predefined filtering criteria.

FilterAgent

The FilterAgent class is a specialized agent used to evaluate whether an instantiated task is correct. It inherits from the BasicAgent class and includes several methods and attributes to handle its functionality.

Reference

ChooseTemplateFlow

Class to select and copy the most relevant template file based on the given task context.

Initialize the flow with the given task context.

Parameters:
  • app_name (str) –

    The name of the application.

  • file_extension (str) –

    The file extension of the template.

  • task_file_name (str) –

    The name of the task file.

Source code in instantiation/workflow/choose_template_flow.py
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
def __init__(self, app_name: str, task_file_name: str, file_extension: str):
    """
    Initialize the flow with the given task context.
    :param app_name: The name of the application.
    :param file_extension: The file extension of the template.
    :param task_file_name: The name of the task file.
    """

    self._app_name = app_name
    self._file_extension = file_extension
    self._task_file_name = task_file_name
    self.execution_time = None
    self._embedding_model = self._load_embedding_model(
        model_name=_configs["CONTROL_FILTER_MODEL_SEMANTIC_NAME"]
    )

execute()

Execute the flow and return the copied template path.

Returns:
  • str

    The path to the copied template file.

Source code in instantiation/workflow/choose_template_flow.py
43
44
45
46
47
48
49
50
51
52
53
54
55
56
def execute(self) -> str:
    """
    Execute the flow and return the copied template path.
    :return: The path to the copied template file.
    """

    start_time = time.time()
    try:
        template_copied_path = self._choose_template_and_copy()
    except Exception as e:
        raise e
    finally:
        self.execution_time = round(time.time() - start_time, 3)
    return template_copied_path

PrefillFlow

Bases: AppAgentProcessor

Class to manage the prefill process by refining planning steps and automating UI interactions

Initialize the prefill flow with the application context.

Parameters:
  • app_name (str) –

    The name of the application.

  • task_file_name (str) –

    The name of the task file for logging and tracking.

  • environment (WindowsAppEnv) –

    The environment of the app.

Source code in instantiation/workflow/prefill_flow.py
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
def __init__(
    self,
    app_name: str,
    task_file_name: str,
    environment: WindowsAppEnv,
) -> None:
    """
    Initialize the prefill flow with the application context.
    :param app_name: The name of the application.
    :param task_file_name: The name of the task file for logging and tracking.
    :param environment: The environment of the app.
    """

    self.execution_time = None
    self._app_name = app_name
    self._task_file_name = task_file_name
    self._app_env = environment
    # Create or reuse a PrefillAgent for the app
    if self._app_name not in PrefillFlow._app_prefill_agent_dict:
        PrefillFlow._app_prefill_agent_dict[self._app_name] = PrefillAgent(
            "prefill",
            self._app_name,
            is_visual=True,
            main_prompt=_configs["PREFILL_PROMPT"],
            example_prompt=_configs["PREFILL_EXAMPLE_PROMPT"],
            api_prompt=_configs["API_PROMPT"],
        )
    self._prefill_agent = PrefillFlow._app_prefill_agent_dict[self._app_name]

    # Initialize execution step and UI control tools
    self._execute_step = 0
    self._control_inspector = ControlInspectorFacade(_BACKEND)
    self._photographer = PhotographerFacade()

    # Set default states
    self._status = ""

    # Initialize loggers for messages and responses
    self._log_path_configs = _configs["PREFILL_LOG_PATH"].format(
        task=self._task_file_name
    )
    os.makedirs(self._log_path_configs, exist_ok=True)

    # Set up loggers
    self._message_logger = BaseSession.initialize_logger(
        self._log_path_configs, "prefill_messages.json", "w", _configs
    )
    self._response_logger = BaseSession.initialize_logger(
        self._log_path_configs, "prefill_responses.json", "w", _configs
    )

execute(template_copied_path, original_task, refined_steps)

Start the execution by retrieving the instantiated result.

Parameters:
  • template_copied_path (str) –

    The path of the copied template to use.

  • original_task (str) –

    The original task to refine.

  • refined_steps (List[str]) –

    The steps to guide the refinement process.

Returns:
  • Dict[str, Any]

    The refined task and corresponding action plans.

Source code in instantiation/workflow/prefill_flow.py
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
def execute(
    self, template_copied_path: str, original_task: str, refined_steps: List[str]
) -> Dict[str, Any]:
    """
    Start the execution by retrieving the instantiated result.
    :param template_copied_path: The path of the copied template to use.
    :param original_task: The original task to refine.
    :param refined_steps: The steps to guide the refinement process.
    :return: The refined task and corresponding action plans.
    """

    start_time = time.time()
    try:
        instantiated_request, instantiated_plan = self._instantiate_task(
            template_copied_path, original_task, refined_steps
        )
    except Exception as e:
        raise e
    finally:
        self.execution_time = round(time.time() - start_time, 3)

    return  {
        "instantiated_request": instantiated_request,
        "instantiated_plan": instantiated_plan,
    }   

PrefillAgent

Bases: BasicAgent

The Agent for task instantialization and action sequence generation.

Initialize the PrefillAgent.

Parameters:
  • name (str) –

    The name of the agent.

  • process_name (str) –

    The name of the process.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt.

  • example_prompt (str) –

    The example prompt.

  • api_prompt (str) –

    The API prompt.

Source code in instantiation/agent/prefill_agent.py
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def __init__(
    self,
    name: str,
    process_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
):
    """
    Initialize the PrefillAgent.
    :param name: The name of the agent.
    :param process_name: The name of the process.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    """

    self._step = 0
    self._complete = False
    self._name = name
    self._status = None
    self.prompter: PrefillPrompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self._process_name = process_name

get_prompter(is_visual, main_prompt, example_prompt, api_prompt)

Get the prompt for the agent. This is the abstract method from BasicAgent that needs to be implemented.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt.

  • example_prompt (str) –

    The example prompt.

  • api_prompt (str) –

    The API prompt.

Returns:
  • str

    The prompt string.

Source code in instantiation/agent/prefill_agent.py
44
45
46
47
48
49
50
51
52
53
54
55
def get_prompter(self, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str) -> str:
    """
    Get the prompt for the agent.
    This is the abstract method from BasicAgent that needs to be implemented.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    :return: The prompt string.
    """

    return PrefillPrompter(is_visual, main_prompt, example_prompt, api_prompt)

message_constructor(dynamic_examples, given_task, reference_steps, doc_control_state, log_path)

Construct the prompt message for the PrefillAgent.

Parameters:
  • dynamic_examples (str) –

    The dynamic examples retrieved from the self-demonstration and human demonstration.

  • given_task (str) –

    The given task.

  • reference_steps (List[str]) –

    The reference steps.

  • doc_control_state (Dict[str, str]) –

    The document control state.

  • log_path (str) –

    The path of the log.

Returns:
  • List[str]

    The prompt message.

Source code in instantiation/agent/prefill_agent.py
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
def message_constructor(
    self,
    dynamic_examples: str,
    given_task: str,
    reference_steps: List[str],
    doc_control_state: Dict[str, str],
    log_path: str,
) -> List[str]:
    """
    Construct the prompt message for the PrefillAgent.
    :param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
    :param given_task: The given task.
    :param reference_steps: The reference steps.
    :param doc_control_state: The document control state.
    :param log_path: The path of the log.
    :return: The prompt message.
    """

    prefill_agent_prompt_system_message = self.prompter.system_prompt_construction(
        dynamic_examples
    )
    prefill_agent_prompt_user_message = self.prompter.user_content_construction(
        given_task, reference_steps, doc_control_state, log_path
    )
    appagent_prompt_message = self.prompter.prompt_construction(
        prefill_agent_prompt_system_message,
        prefill_agent_prompt_user_message,
    )

    return appagent_prompt_message

process_comfirmation()

Confirm the process. This is the abstract method from BasicAgent that needs to be implemented.

Source code in instantiation/agent/prefill_agent.py
88
89
90
91
92
93
94
def process_comfirmation(self) -> None:
    """
    Confirm the process.
    This is the abstract method from BasicAgent that needs to be implemented.
    """

    pass

FilterFlow

Class to refine the plan steps and prefill the file based on filtering criteria.

Initialize the filter flow for a task.

Parameters:
  • app_name (str) –

    Name of the application being processed.

  • task_file_name (str) –

    Name of the task file being processed.

Source code in instantiation/workflow/filter_flow.py
21
22
23
24
25
26
27
28
29
30
31
32
def __init__(self, app_name: str, task_file_name: str) -> None:
    """
    Initialize the filter flow for a task.
    :param app_name: Name of the application being processed.
    :param task_file_name: Name of the task file being processed.
    """

    self.execution_time = None
    self._app_name = app_name
    self._log_path_configs = _configs["FILTER_LOG_PATH"].format(task=task_file_name)
    self._filter_agent = self._get_or_create_filter_agent()
    self._initialize_logs()

execute(instantiated_request)

Execute the filter flow: Filter the task and save the result.

Parameters:
  • instantiated_request (str) –

    Request object to be filtered.

Returns:
  • Dict[str, Any]

    Tuple containing task quality flag, comment, and task type.

Source code in instantiation/workflow/filter_flow.py
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def execute(self, instantiated_request: str) -> Dict[str, Any]:
    """
    Execute the filter flow: Filter the task and save the result.
    :param instantiated_request: Request object to be filtered.
    :return: Tuple containing task quality flag, comment, and task type.
    """

    start_time = time.time()
    try:
        judge, thought, request_type = self._get_filtered_result(
            instantiated_request
        )
    except Exception as e:
        raise e
    finally:
        self.execution_time = round(time.time() - start_time, 3)
    return {
        "judge": judge,
        "thought": thought,
        "request_type": request_type,
    }

FilterAgent

Bases: BasicAgent

The Agent to evaluate the instantiated task is correct or not.

Initialize the FilterAgent.

Parameters:
  • name (str) –

    The name of the agent.

  • process_name (str) –

    The name of the process.

  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt.

  • example_prompt (str) –

    The example prompt.

  • api_prompt (str) –

    The API prompt.

Source code in instantiation/agent/filter_agent.py
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
def __init__(
    self,
    name: str,
    process_name: str,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str,
):
    """
    Initialize the FilterAgent.
    :param name: The name of the agent.
    :param process_name: The name of the process.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    """

    self._step = 0
    self._complete = False
    self._name = name
    self._status = None
    self.prompter: FilterPrompter = self.get_prompter(
        is_visual, main_prompt, example_prompt, api_prompt
    )
    self._process_name = process_name

get_prompter(is_visual, main_prompt, example_prompt, api_prompt)

Get the prompt for the agent.

Parameters:
  • is_visual (bool) –

    The flag indicating whether the agent is visual or not.

  • main_prompt (str) –

    The main prompt.

  • example_prompt (str) –

    The example prompt.

  • api_prompt (str) –

    The API prompt.

Returns:
  • FilterPrompter

    The prompt string.

Source code in instantiation/agent/filter_agent.py
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
def get_prompter(
    self,
    is_visual: bool,
    main_prompt: str,
    example_prompt: str,
    api_prompt: str
) -> FilterPrompter:
    """
    Get the prompt for the agent.
    :param is_visual: The flag indicating whether the agent is visual or not.
    :param main_prompt: The main prompt.
    :param example_prompt: The example prompt.
    :param api_prompt: The API prompt.
    :return: The prompt string.
    """

    return FilterPrompter(is_visual, main_prompt, example_prompt, api_prompt)

message_constructor(request, app)

Construct the prompt message for the FilterAgent.

Parameters:
  • request (str) –

    The request sentence.

  • app (str) –

    The name of the operated app.

Returns:
  • List[str]

    The prompt message.

Source code in instantiation/agent/filter_agent.py
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
def message_constructor(self, request: str, app: str) -> List[str]:
    """
    Construct the prompt message for the FilterAgent.
    :param request: The request sentence.
    :param app: The name of the operated app.
    :return: The prompt message.
    """

    filter_agent_prompt_system_message = self.prompter.system_prompt_construction(
        app=app
    )
    filter_agent_prompt_user_message = self.prompter.user_content_construction(
        request
    )
    filter_agent_prompt_message = self.prompter.prompt_construction(
        filter_agent_prompt_system_message, filter_agent_prompt_user_message
    )

    return filter_agent_prompt_message

process_comfirmation()

Confirm the process. This is the abstract method from BasicAgent that needs to be implemented.

Source code in instantiation/agent/filter_agent.py
80
81
82
83
84
85
86
def process_comfirmation(self) -> None:
    """
    Confirm the process.
    This is the abstract method from BasicAgent that needs to be implemented.
    """

    pass