Instantiation
There are three key steps in the instantiation process:
Choose a template
file according to the specified app and instruction.
Prefill
the task using the current screenshot.
Filter
the established task.
Given the initial task, the dataflow first choose a template (Phase 1
), the prefill the initial task based on word envrionment to obtain task-action data (Phase 2
). Finnally, it will filter the established task to evaluate the quality of task-action data.
1. Choose Template File
Templates for your app must be defined and described in dataflow/templates/app
. For instance, if you want to instantiate tasks for the Word application, place the relevant .docx
files in dataflow /templates/word
, along with a description.json
file. The appropriate template will be selected based on how well its description matches the instruction.
The ChooseTemplateFlow
uses semantic matching, where task descriptions are compared with template descriptions using embeddings and FAISS for efficient nearest neighbor search. If semantic matching fails, a random template is chosen from the available files.
2. Prefill the Task
PrefillFlow
The PrefillFlow
class orchestrates the refinement of task plans and UI interactions by leveraging PrefillAgent
for task planning and action generation. It automates UI control updates, captures screenshots, and manages logs for messages and responses during execution.
PrefillAgent
The PrefillAgent
class facilitates task instantiation and action sequence generation by constructing tailored prompt messages using the PrefillPrompter
. It integrates system, user, and dynamic context to generate actionable inputs for down-stream workflows.
3. Filter Task
FilterFlow
The FilterFlow
class is designed to process and refine task plans by leveraging a FilterAgent
. The FilterFlow
class acts as a bridge between the instantiation of tasks and the execution of a filtering process, aiming to refine task steps and prefill task-related files based on predefined filtering criteria.
FilterAgent
The FilterAgent
class is a specialized agent used to evaluate whether an instantiated task is correct. It inherits from the BasicAgent class and includes several methods and attributes to handle its functionality.
Reference
ChooseTemplateFlow
Class to select and copy the most relevant template file based on the given task context.
Initialize the flow with the given task context.
Parameters: |
-
app_name
(str )
–
The name of the application.
-
file_extension
(str )
–
The file extension of the template.
-
task_file_name
(str )
–
The name of the task file.
|
Source code in instantiation/workflow/choose_template_flow.py
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41 | def __init__(self, app_name: str, task_file_name: str, file_extension: str):
"""
Initialize the flow with the given task context.
:param app_name: The name of the application.
:param file_extension: The file extension of the template.
:param task_file_name: The name of the task file.
"""
self._app_name = app_name
self._file_extension = file_extension
self._task_file_name = task_file_name
self.execution_time = None
self._embedding_model = self._load_embedding_model(
model_name=_configs["CONTROL_FILTER_MODEL_SEMANTIC_NAME"]
)
|
execute()
Execute the flow and return the copied template path.
Returns: |
-
str
–
The path to the copied template file.
|
Source code in instantiation/workflow/choose_template_flow.py
43
44
45
46
47
48
49
50
51
52
53
54
55
56 | def execute(self) -> str:
"""
Execute the flow and return the copied template path.
:return: The path to the copied template file.
"""
start_time = time.time()
try:
template_copied_path = self._choose_template_and_copy()
except Exception as e:
raise e
finally:
self.execution_time = round(time.time() - start_time, 3)
return template_copied_path
|
PrefillFlow
Bases: AppAgentProcessor
Class to manage the prefill process by refining planning steps and automating UI interactions
Initialize the prefill flow with the application context.
Parameters: |
-
app_name
(str )
–
The name of the application.
-
task_file_name
(str )
–
The name of the task file for logging and tracking.
-
environment
(WindowsAppEnv )
–
The environment of the app.
|
Source code in instantiation/workflow/prefill_flow.py
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78 | def __init__(
self,
app_name: str,
task_file_name: str,
environment: WindowsAppEnv,
) -> None:
"""
Initialize the prefill flow with the application context.
:param app_name: The name of the application.
:param task_file_name: The name of the task file for logging and tracking.
:param environment: The environment of the app.
"""
self.execution_time = None
self._app_name = app_name
self._task_file_name = task_file_name
self._app_env = environment
# Create or reuse a PrefillAgent for the app
if self._app_name not in PrefillFlow._app_prefill_agent_dict:
PrefillFlow._app_prefill_agent_dict[self._app_name] = PrefillAgent(
"prefill",
self._app_name,
is_visual=True,
main_prompt=_configs["PREFILL_PROMPT"],
example_prompt=_configs["PREFILL_EXAMPLE_PROMPT"],
api_prompt=_configs["API_PROMPT"],
)
self._prefill_agent = PrefillFlow._app_prefill_agent_dict[self._app_name]
# Initialize execution step and UI control tools
self._execute_step = 0
self._control_inspector = ControlInspectorFacade(_BACKEND)
self._photographer = PhotographerFacade()
# Set default states
self._status = ""
# Initialize loggers for messages and responses
self._log_path_configs = _configs["PREFILL_LOG_PATH"].format(
task=self._task_file_name
)
os.makedirs(self._log_path_configs, exist_ok=True)
# Set up loggers
self._message_logger = BaseSession.initialize_logger(
self._log_path_configs, "prefill_messages.json", "w", _configs
)
self._response_logger = BaseSession.initialize_logger(
self._log_path_configs, "prefill_responses.json", "w", _configs
)
|
execute(template_copied_path, original_task, refined_steps)
Start the execution by retrieving the instantiated result.
Parameters: |
-
template_copied_path
(str )
–
The path of the copied template to use.
-
original_task
(str )
–
The original task to refine.
-
refined_steps
(List[str] )
–
The steps to guide the refinement process.
|
Returns: |
-
Dict[str, Any]
–
The refined task and corresponding action plans.
|
Source code in instantiation/workflow/prefill_flow.py
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104 | def execute(
self, template_copied_path: str, original_task: str, refined_steps: List[str]
) -> Dict[str, Any]:
"""
Start the execution by retrieving the instantiated result.
:param template_copied_path: The path of the copied template to use.
:param original_task: The original task to refine.
:param refined_steps: The steps to guide the refinement process.
:return: The refined task and corresponding action plans.
"""
start_time = time.time()
try:
instantiated_request, instantiated_plan = self._instantiate_task(
template_copied_path, original_task, refined_steps
)
except Exception as e:
raise e
finally:
self.execution_time = round(time.time() - start_time, 3)
return {
"instantiated_request": instantiated_request,
"instantiated_plan": instantiated_plan,
}
|
PrefillAgent
Bases: BasicAgent
The Agent for task instantialization and action sequence generation.
Initialize the PrefillAgent.
Parameters: |
-
name
(str )
–
-
process_name
(str )
–
-
is_visual
(bool )
–
The flag indicating whether the agent is visual or not.
-
main_prompt
(str )
–
-
example_prompt
(str )
–
-
api_prompt
(str )
–
|
Source code in instantiation/agent/prefill_agent.py
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42 | def __init__(
self,
name: str,
process_name: str,
is_visual: bool,
main_prompt: str,
example_prompt: str,
api_prompt: str,
):
"""
Initialize the PrefillAgent.
:param name: The name of the agent.
:param process_name: The name of the process.
:param is_visual: The flag indicating whether the agent is visual or not.
:param main_prompt: The main prompt.
:param example_prompt: The example prompt.
:param api_prompt: The API prompt.
"""
self._step = 0
self._complete = False
self._name = name
self._status = None
self.prompter: PrefillPrompter = self.get_prompter(
is_visual, main_prompt, example_prompt, api_prompt
)
self._process_name = process_name
|
get_prompter(is_visual, main_prompt, example_prompt, api_prompt)
Get the prompt for the agent.
This is the abstract method from BasicAgent that needs to be implemented.
Parameters: |
-
is_visual
(bool )
–
The flag indicating whether the agent is visual or not.
-
main_prompt
(str )
–
-
example_prompt
(str )
–
-
api_prompt
(str )
–
|
Source code in instantiation/agent/prefill_agent.py
44
45
46
47
48
49
50
51
52
53
54
55 | def get_prompter(self, is_visual: bool, main_prompt: str, example_prompt: str, api_prompt: str) -> str:
"""
Get the prompt for the agent.
This is the abstract method from BasicAgent that needs to be implemented.
:param is_visual: The flag indicating whether the agent is visual or not.
:param main_prompt: The main prompt.
:param example_prompt: The example prompt.
:param api_prompt: The API prompt.
:return: The prompt string.
"""
return PrefillPrompter(is_visual, main_prompt, example_prompt, api_prompt)
|
message_constructor(dynamic_examples, given_task, reference_steps, doc_control_state, log_path)
Construct the prompt message for the PrefillAgent.
Parameters: |
-
dynamic_examples
(str )
–
The dynamic examples retrieved from the self-demonstration and human demonstration.
-
given_task
(str )
–
-
reference_steps
(List[str] )
–
-
doc_control_state
(Dict[str, str] )
–
The document control state.
-
log_path
(str )
–
|
Source code in instantiation/agent/prefill_agent.py
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86 | def message_constructor(
self,
dynamic_examples: str,
given_task: str,
reference_steps: List[str],
doc_control_state: Dict[str, str],
log_path: str,
) -> List[str]:
"""
Construct the prompt message for the PrefillAgent.
:param dynamic_examples: The dynamic examples retrieved from the self-demonstration and human demonstration.
:param given_task: The given task.
:param reference_steps: The reference steps.
:param doc_control_state: The document control state.
:param log_path: The path of the log.
:return: The prompt message.
"""
prefill_agent_prompt_system_message = self.prompter.system_prompt_construction(
dynamic_examples
)
prefill_agent_prompt_user_message = self.prompter.user_content_construction(
given_task, reference_steps, doc_control_state, log_path
)
appagent_prompt_message = self.prompter.prompt_construction(
prefill_agent_prompt_system_message,
prefill_agent_prompt_user_message,
)
return appagent_prompt_message
|
process_comfirmation()
Confirm the process.
This is the abstract method from BasicAgent that needs to be implemented.
Source code in instantiation/agent/prefill_agent.py
| def process_comfirmation(self) -> None:
"""
Confirm the process.
This is the abstract method from BasicAgent that needs to be implemented.
"""
pass
|
FilterFlow
Class to refine the plan steps and prefill the file based on filtering criteria.
Initialize the filter flow for a task.
Parameters: |
-
app_name
(str )
–
Name of the application being processed.
-
task_file_name
(str )
–
Name of the task file being processed.
|
Source code in instantiation/workflow/filter_flow.py
21
22
23
24
25
26
27
28
29
30
31
32 | def __init__(self, app_name: str, task_file_name: str) -> None:
"""
Initialize the filter flow for a task.
:param app_name: Name of the application being processed.
:param task_file_name: Name of the task file being processed.
"""
self.execution_time = None
self._app_name = app_name
self._log_path_configs = _configs["FILTER_LOG_PATH"].format(task=task_file_name)
self._filter_agent = self._get_or_create_filter_agent()
self._initialize_logs()
|
execute(instantiated_request)
Execute the filter flow: Filter the task and save the result.
Parameters: |
-
instantiated_request
(str )
–
Request object to be filtered.
|
Returns: |
-
Dict[str, Any]
–
Tuple containing task quality flag, comment, and task type.
|
Source code in instantiation/workflow/filter_flow.py
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71 | def execute(self, instantiated_request: str) -> Dict[str, Any]:
"""
Execute the filter flow: Filter the task and save the result.
:param instantiated_request: Request object to be filtered.
:return: Tuple containing task quality flag, comment, and task type.
"""
start_time = time.time()
try:
judge, thought, request_type = self._get_filtered_result(
instantiated_request
)
except Exception as e:
raise e
finally:
self.execution_time = round(time.time() - start_time, 3)
return {
"judge": judge,
"thought": thought,
"request_type": request_type,
}
|
FilterAgent
Bases: BasicAgent
The Agent to evaluate the instantiated task is correct or not.
Initialize the FilterAgent.
Parameters: |
-
name
(str )
–
-
process_name
(str )
–
-
is_visual
(bool )
–
The flag indicating whether the agent is visual or not.
-
main_prompt
(str )
–
-
example_prompt
(str )
–
-
api_prompt
(str )
–
|
Source code in instantiation/agent/filter_agent.py
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40 | def __init__(
self,
name: str,
process_name: str,
is_visual: bool,
main_prompt: str,
example_prompt: str,
api_prompt: str,
):
"""
Initialize the FilterAgent.
:param name: The name of the agent.
:param process_name: The name of the process.
:param is_visual: The flag indicating whether the agent is visual or not.
:param main_prompt: The main prompt.
:param example_prompt: The example prompt.
:param api_prompt: The API prompt.
"""
self._step = 0
self._complete = False
self._name = name
self._status = None
self.prompter: FilterPrompter = self.get_prompter(
is_visual, main_prompt, example_prompt, api_prompt
)
self._process_name = process_name
|
get_prompter(is_visual, main_prompt, example_prompt, api_prompt)
Get the prompt for the agent.
Parameters: |
-
is_visual
(bool )
–
The flag indicating whether the agent is visual or not.
-
main_prompt
(str )
–
-
example_prompt
(str )
–
-
api_prompt
(str )
–
|
Source code in instantiation/agent/filter_agent.py
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58 | def get_prompter(
self,
is_visual: bool,
main_prompt: str,
example_prompt: str,
api_prompt: str
) -> FilterPrompter:
"""
Get the prompt for the agent.
:param is_visual: The flag indicating whether the agent is visual or not.
:param main_prompt: The main prompt.
:param example_prompt: The example prompt.
:param api_prompt: The API prompt.
:return: The prompt string.
"""
return FilterPrompter(is_visual, main_prompt, example_prompt, api_prompt)
|
message_constructor(request, app)
Construct the prompt message for the FilterAgent.
Parameters: |
-
request
(str )
–
-
app
(str )
–
The name of the operated app.
|
Source code in instantiation/agent/filter_agent.py
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78 | def message_constructor(self, request: str, app: str) -> List[str]:
"""
Construct the prompt message for the FilterAgent.
:param request: The request sentence.
:param app: The name of the operated app.
:return: The prompt message.
"""
filter_agent_prompt_system_message = self.prompter.system_prompt_construction(
app=app
)
filter_agent_prompt_user_message = self.prompter.user_content_construction(
request
)
filter_agent_prompt_message = self.prompter.prompt_construction(
filter_agent_prompt_system_message, filter_agent_prompt_user_message
)
return filter_agent_prompt_message
|
process_comfirmation()
Confirm the process.
This is the abstract method from BasicAgent that needs to be implemented.
Source code in instantiation/agent/filter_agent.py
| def process_comfirmation(self) -> None:
"""
Confirm the process.
This is the abstract method from BasicAgent that needs to be implemented.
"""
pass
|