Introduction
This repository contains the implementation of the data collection process for training Large Action Models (LAMs), as described in the paper *Large Action Models: From Inception to Implementation*. The data collection process is designed to streamline task processing, ensuring that all necessary steps are seamlessly integrated from initialization to execution. This module is part of the UFO project.
Dataflow
Dataflow uses UFO to implement `instantiation`, `execution`, and `dataflow` for a given task, with options for batch processing and single processing.
- Instantiation: Instantiation refers to the process of setting up and preparing a task for execution. This step typically involves choosing a template, prefilling, and filtering.
- Execution: Execution is the actual process of running the task. This step involves carrying out the actions or operations specified by the instantiation. After execution, an evaluation agent evaluates the quality of the whole execution process.
- Dataflow: Dataflow is the overarching process that combines instantiation and execution into a single pipeline. It provides an end-to-end solution for processing tasks, ensuring that all necessary steps (from initialization to execution) are seamlessly integrated.
You can use `instantiation` and `execution` independently if you only need to perform one specific part of the process. When both steps are required for a task, the `dataflow` process streamlines them, allowing you to execute tasks from start to finish in a single pipeline.
The overall processing of dataflow is shown below. Given task-plan data, the LLM instantiates it into task-action data through template selection, prefill, and filtering.
How To Use
1. Install Packages
You should install the necessary packages in the UFO root folder:
pip install -r requirements.txt
2. Configure the LLMs
Before running dataflow, you need to provide your LLM configurations individually for PrefillAgent and FilterAgent. You can create your own config file dataflow/config/config.yaml
, by copying the dataflow/config/config.yaml.template
and editing config for PREFILL_AGENT and FILTER_AGENT as follows:
OpenAI
```yaml
VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "openai", # The API type, "openai" for the OpenAI API
API_BASE: "https://api.openai.com/v1/chat/completions", # The OpenAI API endpoint
API_KEY: "sk-", # The OpenAI API key, begins with "sk-"
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The OpenAI model name
```
Azure OpenAI (AOAI)
```yaml
VISUAL_MODE: True, # Whether to use the visual mode
API_TYPE: "aoai", # The API type, "aoai" for the Azure OpenAI API
API_BASE: "YOUR_ENDPOINT", # The AOAI API address. Format: https://{your-resource-name}.openai.azure.com
API_KEY: "YOUR_KEY", # The AOAI API key
API_VERSION: "2024-02-15-preview", # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview", # The AOAI model name
API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT", # The deployment ID for the AOAI API
```
You can also use a non-visual model (e.g., GPT-4) for each agent by setting `VISUAL_MODE: False` and specifying a proper `API_MODEL` (OpenAI) and `API_DEPLOYMENT_ID` (AOAI).
Non-Visual Model Configuration
You can utilize non-visual models (e.g., GPT-4) for each agent by configuring the following settings in the `config.yaml` file:
- Set `VISUAL_MODE: False` to enable non-visual mode.
- Specify the appropriate `API_MODEL` (OpenAI) and `API_DEPLOYMENT_ID` (AOAI) for each agent.
Ensure you configure these settings accurately to leverage non-visual models effectively.
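For a quick sanity check of the agent configuration, the sketch below reads `dataflow/config/config.yaml` with PyYAML and verifies that the `PREFILL_AGENT` and `FILTER_AGENT` sections contain the keys shown above. It assumes the two agents sit under top-level `PREFILL_AGENT` and `FILTER_AGENT` keys; it is not the project's own config loader.

```python
# Illustrative sanity check for the agent configuration (not the project's own loader).
# Assumption: PREFILL_AGENT and FILTER_AGENT are top-level sections in config.yaml.
import yaml

REQUIRED_KEYS = {"VISUAL_MODE", "API_TYPE", "API_BASE", "API_KEY", "API_MODEL"}

with open("dataflow/config/config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

for agent in ("PREFILL_AGENT", "FILTER_AGENT"):
    agent_config = config.get(agent, {})
    missing = REQUIRED_KEYS - set(agent_config)
    if missing:
        print(f"{agent} is missing keys: {sorted(missing)}")
    elif agent_config.get("API_TYPE") == "aoai" and "API_DEPLOYMENT_ID" not in agent_config:
        print(f"{agent} uses AOAI but has no API_DEPLOYMENT_ID")
    else:
        print(f"{agent} looks complete")
```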
Other Configurations
`config_dev.yaml` specifies the paths of relevant files and contains default settings. The match strategy for window matching and control filtering supports the options `'contains'`, `'fuzzy'`, and `'regex'`, giving users flexible matching strategies. `MAX_STEPS` is the maximum number of steps for the execution flow and can be set by users.
Note
The specific implementation and invocation of the matching strategy can be found in windows_app_env.
Note
BE CAREFUL! If you are using GitHub or other open-source tools, do not expose your `config.yaml` online, as it contains your private keys.
3. Prepare Files
Certain files need to be prepared before running the task.
3.1. Tasks as JSON
The tasks that need to be instantiated should be organized in a folder of JSON files, with the default folder path set to `dataflow/tasks`. This path can be changed in the `dataflow/config/config.yaml` file, or you can specify it in the terminal, as mentioned in 4. Start Running. For example, a task stored in `dataflow/tasks/prefill/` may look like this:
```json
{
    // The app you want to use
    "app": "word",
    // A unique ID to distinguish different tasks
    "unique_id": "1",
    // The task and steps to be instantiated
    "task": "Type 'hello' and set the font type to Arial",
    "refined_steps": [
        "Type 'hello'",
        "Set the font to Arial"
    ]
}
```
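Before running, you can verify that a task file contains the fields used in the example above. The sketch below checks only those fields (`app`, `unique_id`, `task`, `refined_steps`); note that real task files must be plain JSON, so the `//` comments above are only explanatory.

```python
# Sketch: verify a task file has the fields shown in the example above.
# Real task files must be plain JSON without the explanatory // comments.
import json
from pathlib import Path

def check_task(path: str) -> None:
    task = json.loads(Path(path).read_text(encoding="utf-8"))
    for field in ("app", "unique_id", "task", "refined_steps"):
        if field not in task:
            raise ValueError(f"{path} is missing required field: {field}")
    if not isinstance(task["refined_steps"], list):
        raise ValueError("refined_steps should be a list of step strings")
    print(f"{path}: OK ({task['app']}, {len(task['refined_steps'])} steps)")

check_task("dataflow/tasks/prefill/bulleted.json")
```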
3.2. Templates and Descriptions
You should place an app file as a reference for instantiation in a folder named after the app. For example, if you have `template1.docx` for Word, it should be located at `dataflow/templates/word/template1.docx`.
Additionally, for each app folder, there should be a `description.json` file (e.g., located at `dataflow/templates/word/description.json`) that describes each template file in detail. It may look like this:
```json
{
    "template1.docx": "A document with a rectangle shape",
    "template2.docx": "A document with a line of text"
}
```
If a `description.json` file is not present, one template file will be selected at random.
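The selection behavior just described can be pictured with the sketch below. It is not the project's actual implementation: the LLM-based template choice is replaced by a simple word-overlap heuristic, and the fallback picks a random file when `description.json` is absent.

```python
# Sketch of the template-selection behavior described above (illustrative only).
import json
import random
from pathlib import Path

def choose_template(app: str, instruction: str, templates_root: str = "dataflow/templates") -> Path:
    app_dir = Path(templates_root) / app
    description_file = app_dir / "description.json"
    if description_file.exists():
        descriptions = json.loads(description_file.read_text(encoding="utf-8"))
        # Stand-in for the LLM choice: pick the template whose description
        # shares the most words with the instruction.
        best = max(
            descriptions,
            key=lambda name: len(
                set(descriptions[name].lower().split()) & set(instruction.lower().split())
            ),
        )
        return app_dir / best
    # No description.json: fall back to a random template file.
    candidates = [p for p in app_dir.iterdir() if p.name != "description.json"]
    return random.choice(candidates)

print(choose_template("word", "Type 'hello' and set the font type to Arial"))
```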
3.3. Final Structure
Ensure the following files are in place:
- JSON files to be instantiated
- Templates as references for instantiation
- Description file in JSON format
The structure of the files can be:
```
dataflow/
├── tasks
│   └── prefill
│       ├── bulleted.json
│       ├── delete.json
│       ├── draw.json
│       ├── macro.json
│       └── rotate.json
├── templates
│   └── word
│       ├── description.json
│       ├── template1.docx
│       ├── template2.docx
│       ├── template3.docx
│       ├── template4.docx
│       ├── template5.docx
│       ├── template6.docx
│       └── template7.docx
└── ...
```
4. Start Running
After finishing the previous steps, you can use the following commands in the command line. Both single and batch processing are supported: provide a single file path to process one task, or a folder path to process all tasks in it. The type of path you provide determines automatically whether a single task or a batch of tasks is processed.
You can also run the `instantiation` and `execution` sections individually, or run them together as a whole, which is named `dataflow`.
The default task hub is set to `TASKS_HUB` in `dataflow/config_dev.yaml`.
- Dataflow Task: `python -m dataflow -dataflow --task_path path_to_task_file`
- Instantiation Task: `python -m dataflow -instantiation --task_path path_to_task_file`
- Execution Task: `python -m dataflow -execution --task_path path_to_task_file`
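Whether a run is treated as single or batch is decided by the type of path you pass, as described above. A rough sketch of that decision (the real logic lives inside the dataflow module) could look like this:

```python
# Sketch of the single-vs-batch decision described in step 4 (illustrative only).
from pathlib import Path
from typing import List

def collect_task_files(task_path: str) -> List[Path]:
    path = Path(task_path)
    if path.is_file():
        return [path]                       # single task
    if path.is_dir():
        return sorted(path.glob("*.json"))  # batch: every JSON task in the folder
    raise FileNotFoundError(f"No such task file or folder: {task_path}")

for task_file in collect_task_files("dataflow/tasks/prefill"):
    print("Would process:", task_file)
```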
Note
- Users should be careful to save the original files while using this project; otherwise, the files will be closed when the app is shut down.
- After starting the project, users should not close the app window while the program is taking screenshots.
Workflow
Instantiation
There are three key steps in the instantiation process:
1. Choose a template file according to the specified app and instruction.
2. Prefill the task using the current screenshot.
3. Filter the established task.
Given the initial task, the dataflow first chooses a template (Phase 1), then prefills the initial task based on the Word environment to obtain task-action data (Phase 2). Finally, it filters the established task to evaluate the quality of the task-action data (Phase 3).
Note
More detailed code design documentation for instantiation can be found in instantiation.
Execution
The instantiated plans will be executed by an execution task. After execution, an evaluation agent will evaluate the quality of the entire execution process.
Note
More detailed code design documentation for execution can be found in execution.
Result
The results will be saved in the `results/` directory under `instantiation`, `execution`, and `dataflow`, and will be further stored in subdirectories based on the execution outcomes.
Note
More detailed information about the results can be found in result.
Quick Start
We provide two example cases to demonstrate the dataflow, which can be found in `dataflow/tasks/prefill`. After installing the required packages, you can run the following command in the command line:
`python -m dataflow -dataflow`
You will then see hints in the terminal, which means the dataflow is working.
Structure of related files
After the two tasks are finished, the task and output files would appear as follows:
```
UFO/
├── dataflow/
│   └── results/
│       ├── saved_document/        # Directory for saved documents
│       │   ├── bulleted.docx      # Result of the "bulleted" task
│       │   └── rotate.docx        # Result of the "rotate" task
│       ├── dataflow/              # Dataflow results directory
│       │   ├── execution_pass/    # Successfully executed tasks
│       │   │   ├── bulleted.json  # Execution result for the "bulleted" task
│       │   │   ├── rotate.json    # Execution result for the "rotate" task
│       │   │   └── ...
└── ...
```
The specific results, stored in JSON format along with example data, can be referenced in result.
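For a quick look at the produced files, the sketch below lists the execution results and their top-level JSON keys; it assumes only the directory layout shown above and makes no assumption about the result schema.

```python
# Sketch: list execution results and their top-level JSON keys (no schema assumed).
import json
from pathlib import Path

results_dir = Path("dataflow/results/dataflow/execution_pass")
for result_file in sorted(results_dir.glob("*.json")):
    data = json.loads(result_file.read_text(encoding="utf-8"))
    print(result_file.name, "->", list(data))
```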
Log files
The corresponding logs can be found in the `logs/bulleted` and `logs/rotate` directories, as shown below. Detailed logs for each workflow are recorded, capturing every step of the execution process.
Reference
AppEnum
Bases: Enum
Enum class for applications.
Initialize the application enum.
Source code in `dataflow/data_flow_controller.py`.
TaskObject
Initialize the task object.
Source code in `dataflow/data_flow_controller.py`.
DataFlowController
Flow controller class to manage the instantiation and execution process.
Initialize the flow controller.
Source code in `dataflow/data_flow_controller.py`.
`instantiated_plan: List[Dict[str, Any]]` (property, writable)
Get the instantiated plan from the task information.
`template_copied_path: str` (property)
Get the copied template path from the task information.
execute_execution(request, plan)
Execute the execution process.
Source code in `dataflow/data_flow_controller.py`.
execute_instantiation()
Execute the instantiation process.
Source code in `dataflow/data_flow_controller.py`.
init_task_info()
Initialize the task information.
Source code in `dataflow/data_flow_controller.py`.
instantiation_single_flow(flow_class, flow_type, init_params=None, execute_params=None)
Execute a single flow process in the instantiation phase.
Source code in `dataflow/data_flow_controller.py`.
run()
Run the instantiation and execution process.
Source code in `dataflow/data_flow_controller.py`.
save_result()
Validate and save the instantiated task result.
Source code in `dataflow/data_flow_controller.py`.
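As a closing illustration, a hypothetical programmatic invocation of `DataFlowController` might look like the sketch below. The module path and the constructor arguments (a task file path and a task type) are assumptions inferred from the CLI options above; check `dataflow/data_flow_controller.py` for the actual signature.

```python
# Hypothetical usage sketch; the constructor arguments (task file path, task type)
# are an assumption inferred from the CLI options, not a confirmed signature.
from dataflow.data_flow_controller import DataFlowController

controller = DataFlowController("dataflow/tasks/prefill/bulleted.json", "dataflow")
controller.run()  # runs the instantiation and execution process
```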