GUI Automator

The GUI Automator mimics mouse and keyboard operations on an application's UI controls. UFO uses the UIA or Win32 APIs to interact with controls such as buttons, edit boxes, and menus.

Configuration

Several options in the config_dev.yaml file must be set before using the UI Automator. The options related to the UI Automator are listed below:

| Configuration Option | Description | Type | Default Value |
|---|---|---|---|
| CONTROL_BACKEND | The list of backends for control actions; currently supports uia, win32, and omniparser. | List | ["uia"] |
| CONTROL_LIST | The list of widgets allowed to be selected. | List | ["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"] |
| ANNOTATION_COLORS | The colors assigned to different control types for annotation. | Dictionary | {"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"} |
| API_PROMPT | The prompt for the UI automation API. | String | "ufo/prompts/share/base/api.yaml" |
| CLICK_API | The API used for the click action; either click_input or click. | String | "click_input" |
| INPUT_TEXT_API | The API used for the input text action; either type_keys or set_text. | String | "type_keys" |
| INPUT_TEXT_ENTER | Whether to press Enter after typing the text. | Boolean | False |
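
As a rough illustration, a few of the options above can be pictured as a plain Python mapping with a small sanity check. This is only a sketch and not UFO's configuration loader: `DEFAULTS`, `ALLOWED`, and `check_automator_config` are hypothetical names, and the allowed values are simply the ones listed in the table.

```python
from typing import Any, Dict, List

# Defaults copied from the table above (subset). Names are illustrative only.
DEFAULTS: Dict[str, Any] = {
    "CLICK_API": "click_input",
    "INPUT_TEXT_API": "type_keys",
    "INPUT_TEXT_ENTER": False,
}

# Values the table lists as valid for each string option.
ALLOWED = {
    "CLICK_API": {"click_input", "click"},
    "INPUT_TEXT_API": {"type_keys", "set_text"},
}


def check_automator_config(overrides: Dict[str, Any]) -> List[str]:
    """Merge overrides onto the defaults and return a list of problems found."""
    merged = {**DEFAULTS, **overrides}
    problems: List[str] = []
    for key in ("CLICK_API", "INPUT_TEXT_API"):
        if merged[key] not in ALLOWED[key]:
            problems.append(f"{key} has unsupported value: {merged[key]}")
    if not isinstance(merged["INPUT_TEXT_ENTER"], bool):
        problems.append("INPUT_TEXT_ENTER must be a boolean")
    return problems
```

Calling `check_automator_config({})` returns an empty list, since the defaults are valid by construction.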

Receiver

The receiver of the UI Automator is the ControlReceiver class defined in the ufo/automator/ui_control/controller/control_receiver module. It is initialized with the application's window handle and control wrapper that executes the actions. The ControlReceiver provides functionalities to interact with the application's UI controls. Below is the reference for the ControlReceiver class:

Bases: ReceiverBasic

The control receiver class.

Initialize the control receiver.

Parameters:
  • control (Optional[UIAWrapper]) –

    The control element.

  • application (Optional[UIAWrapper]) –

    The application element.

Source code in automator/ui_control/controller.py
def __init__(
    self, control: Optional[UIAWrapper], application: Optional[UIAWrapper]
) -> None:
    """
    Initialize the control receiver.
    :param control: The control element.
    :param application: The application element.
    """

    self.control = control
    self.application = application

    if control:
        self.control.set_focus()
        self.wait_enabled()
    elif application:
        self.application.set_focus()

annotation(params, annotation_dict)

Take a screenshot of the current application window and annotate the control item on the screenshot.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the annotation method.

  • annotation_dict (Dict[str, UIAWrapper]) –

    The dictionary of the control labels.

Source code in automator/ui_control/controller.py
def annotation(
    self, params: Dict[str, str], annotation_dict: Dict[str, UIAWrapper]
) -> List[UIAWrapper]:
    """
    Take a screenshot of the current application window and annotate the control item on the screenshot.
    :param params: The arguments of the annotation method.
    :param annotation_dict: The dictionary of the control labels.
    :return: The controls selected by the labels in params["control_labels"].
    """
    selected_controls_labels = params.get("control_labels", [])

    control_reannotate = [
        annotation_dict[str(label)] for label in selected_controls_labels
    ]

    return control_reannotate
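
The label lookup performed by this method can be tried in isolation. The sketch below is a simplified stand-in, not the UFO implementation: `select_labeled_controls` is a hypothetical name and plain strings take the place of UIAWrapper objects.

```python
from typing import Any, Dict, List


def select_labeled_controls(
    params: Dict[str, Any], annotation_dict: Dict[str, Any]
) -> List[Any]:
    """Return the controls whose labels are listed under params["control_labels"]."""
    labels = params.get("control_labels", [])
    # Labels are normalised with str(), so 3 and "3" address the same entry.
    return [annotation_dict[str(label)] for label in labels]


# Plain strings stand in for UIAWrapper objects here.
controls = {"1": "OK button", "2": "Cancel button", "3": "Name edit box"}
picked = select_labeled_controls({"control_labels": [1, "3"]}, controls)
```

A label that is missing from the dictionary raises a KeyError in this sketch, so callers may want to validate labels first.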

atomic_execution(method_name, params)

Atomic execution of the action on the control elements.

Parameters:
  • method_name (str) –

    The name of the method to execute.

  • params (Dict[str, Any]) –

    The arguments of the method.

Returns:
  • str

    The result of the action.

Source code in automator/ui_control/controller.py
def atomic_execution(self, method_name: str, params: Dict[str, Any]) -> str:
    """
    Atomic execution of the action on the control elements.
    :param method_name: The name of the method to execute.
    :param params: The arguments of the method.
    :return: The result of the action.
    """

    import traceback

    try:
        method = getattr(self.control, method_name)
        result = method(**params)
    except AttributeError:
        message = f"{self.control} doesn't have a method named {method_name}"
        print_with_color(f"Warning: {message}", "yellow")
        result = message
    except Exception as e:
        full_traceback = traceback.format_exc()
        message = f"An error occurred: {full_traceback}"
        print_with_color(f"Warning: {message}", "yellow")
        result = message
    return result
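
The name-based dispatch used here can be sketched without pywinauto. In the example below, `FakeButton` and `dispatch` are hypothetical stand-ins. Unlike the method above, the sketch separates the attribute lookup from the call, so an AttributeError raised inside the called method is not reported as a missing method; this is one possible variation, not the UFO behavior.

```python
import traceback
from typing import Any, Dict


class FakeButton:
    """Stand-in for a pywinauto wrapper; not a real UFO or pywinauto class."""

    def click_input(self, button: str = "left", double: bool = False) -> str:
        return f"clicked with {button} button, double={double}"

    def broken(self) -> str:
        raise RuntimeError("control is gone")


def dispatch(control: Any, method_name: str, params: Dict[str, Any]) -> Any:
    """Call control.<method_name>(**params) and turn failures into messages."""
    try:
        method = getattr(control, method_name)
    except AttributeError:
        return f"{control!r} doesn't have a method named {method_name}"
    try:
        return method(**params)
    except Exception:
        # Return the traceback as text so the caller can log or inspect it.
        return f"An error occurred: {traceback.format_exc()}"
```

Returning the error text instead of raising keeps the caller's loop running, at the cost of requiring callers to inspect the result string.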

click_input(params)

Click the control element.

Parameters:
  • params (Dict[str, Union[str, bool]]) –

    The arguments of the click method.

Returns:
  • str

    The result of the click action.

Source code in automator/ui_control/controller.py
def click_input(self, params: Dict[str, Union[str, bool]]) -> str:
    """
    Click the control element.
    :param params: The arguments of the click method.
    :return: The result of the click action.
    """

    api_name = configs.get("CLICK_API", "click_input")

    if api_name == "click":
        return self.atomic_execution("click", params)
    else:
        return self.atomic_execution("click_input", params)

click_on_coordinates(params)

Click on the coordinates of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the click on coordinates method.

Returns:
  • str

    The result of the click on coordinates action.

Source code in automator/ui_control/controller.py
def click_on_coordinates(self, params: Dict[str, str]) -> str:
    """
    Click on the coordinates of the control element.
    :param params: The arguments of the click on coordinates method.
    :return: The result of the click on coordinates action.
    """

    # Get the relative coordinates fraction of the application window.
    x = float(params.get("x", 0))
    y = float(params.get("y", 0))

    button = params.get("button", "left")
    double = params.get("double", False)

    # Get the absolute coordinates of the application window.
    transformed_x, transformed_y = self.transform_point(x, y)

    # print(f"Clicking on {transformed_x}, {transformed_y}")

    self.application.set_focus()

    pyautogui.click(
        transformed_x, transformed_y, button=button, clicks=2 if double else 1
    )

    return ""

drag_on_coordinates(params)

Drag on the coordinates of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the drag on coordinates method.

Returns:
  • str

    The result of the drag on coordinates action.

Source code in automator/ui_control/controller.py
def drag_on_coordinates(self, params: Dict[str, str]) -> str:
    """
    Drag on the coordinates of the control element.
    :param params: The arguments of the drag on coordinates method.
    :return: The result of the drag on coordinates action.
    """

    start = self.transform_point(
        float(params.get("start_x", 0)), float(params.get("start_y", 0))
    )
    end = self.transform_point(
        float(params.get("end_x", 0)), float(params.get("end_y", 0))
    )

    duration = float(params.get("duration", 1))

    button = params.get("button", "left")

    key_hold = params.get("key_hold", None)

    self.application.set_focus()

    if key_hold:
        pyautogui.keyDown(key_hold)

    pyautogui.moveTo(start[0], start[1])
    pyautogui.dragTo(end[0], end[1], button=button, duration=duration)

    if key_hold:
        pyautogui.keyUp(key_hold)

    return ""
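
The method above releases key_hold only when the drag completes. One possible variation, which is not part of UFO, wraps the hold in a context manager so the key is released even if the drag raises. `hold_key` is a hypothetical helper; in real use `pyautogui.keyDown` and `pyautogui.keyUp` would be passed as the callbacks.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional


@contextmanager
def hold_key(
    key: Optional[str],
    key_down: Callable[[str], None],
    key_up: Callable[[str], None],
) -> Iterator[None]:
    """Hold key for the duration of the with-block; do nothing if key is None."""
    if key:
        key_down(key)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions alike.
        if key:
            key_up(key)


# Record the calls instead of sending real input.
log: List[str] = []
try:
    with hold_key(
        "shift",
        lambda k: log.append(f"down {k}"),
        lambda k: log.append(f"up {k}"),
    ):
        log.append("drag")
        raise RuntimeError("drag interrupted")
except RuntimeError:
    log.append("handled")
```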

key_press(params)

Key press on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the key press method.

Returns:
  • str

    The result of the key press action.

Source code in automator/ui_control/controller.py
def key_press(self, params: Dict[str, str]) -> str:
    """
    Key press on the control element.
    :param params: The arguments of the key press method.
    :return: The result of the key press action.
    """

    keys = params.get("keys", [])

    for key in keys:
        key = key.lower()
        pyautogui.keyDown(key)
    for key in keys:
        key = key.lower()
        pyautogui.keyUp(key)

    return ""
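
The press order used for key combinations can be checked with recording callbacks in place of real key events. This is a minimal sketch: `press_combination` is a hypothetical helper, and in real use `pyautogui.keyDown` and `pyautogui.keyUp` would be supplied as the callbacks.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, str]


def press_combination(
    keys: List[str],
    key_down: Callable[[str], None],
    key_up: Callable[[str], None],
) -> None:
    """Hold every key down in order, then release them in the same order."""
    normalised = [key.lower() for key in keys]
    for key in normalised:
        key_down(key)
    for key in normalised:
        key_up(key)


# Record the events instead of sending them to the operating system.
events: List[Event] = []
press_combination(
    ["Ctrl", "Shift", "S"],
    key_down=lambda k: events.append(("down", k)),
    key_up=lambda k: events.append(("up", k)),
)
```

All keys are held before any is released, which is what makes the sequence act as a shortcut rather than as three separate key presses.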

keyboard_input(params)

Keyboard input on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the keyboard input method.

Returns:
  • str

    The result of the keyboard input action.

Source code in automator/ui_control/controller.py
def keyboard_input(self, params: Dict[str, str]) -> str:
    """
    Keyboard input on the control element.
    :param params: The arguments of the keyboard input method.
    :return: The result of the keyboard input action.
    """

    control_focus = params.get("control_focus", True)
    keys = params.get("keys", "")
    keys = TextTransformer.transform_text(keys, "all")

    if control_focus:
        self.atomic_execution("type_keys", {"keys": keys})
    else:
        self.application.type_keys(keys=keys)
    return keys

mouse_move(params)

Mouse move on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the mouse move method.

Returns:
  • str

    The result of the mouse move action.

Source code in automator/ui_control/controller.py
def mouse_move(self, params: Dict[str, str]) -> str:
    """
    Mouse move on the control element.
    :param params: The arguments of the mouse move method.
    :return: The result of the mouse move action.
    """

    x = int(params.get("x", 0))
    y = int(params.get("y", 0))

    new_x, new_y = self.transform_point(x, y)

    pyautogui.moveTo(new_x, new_y, duration=0.1)

    return ""

no_action()

No action on the control element.

Returns:
  • The result of the no action.

Source code in automator/ui_control/controller.py
def no_action(self):
    """
    No action on the control element.
    :return: The result of the no action.
    """

    return ""

scroll(params)

Scroll on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the scroll method.

Returns:
  • str

    The result of the scroll action.

Source code in automator/ui_control/controller.py
def scroll(self, params: Dict[str, str]) -> str:
    """
    Scroll on the control element.
    :param params: The arguments of the scroll method.
    :return: The result of the scroll action.
    """

    x = int(params.get("x", 0))
    y = int(params.get("y", 0))

    new_x, new_y = self.transform_point(x, y)

    scroll_x = int(params.get("scroll_x", 0))
    scroll_y = int(params.get("scroll_y", 0))

    pyautogui.vscroll(scroll_y, x=new_x, y=new_y)
    pyautogui.hscroll(scroll_x, x=new_x, y=new_y)

    return ""

set_edit_text(params)

Set the edit text of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the set edit text method.

Returns:
  • str

    The result of the set edit text action.

Source code in automator/ui_control/controller.py
def set_edit_text(self, params: Dict[str, str]) -> str:
    """
    Set the edit text of the control element.
    :param params: The arguments of the set edit text method.
    :return: The result of the set edit text action.
    """

    text = params.get("text", "")
    inter_key_pause = configs.get("INPUT_TEXT_INTER_KEY_PAUSE", 0.1)

    if params.get("clear_current_text", False):
        self.control.type_keys("^a", pause=inter_key_pause)
        self.control.type_keys("{DELETE}", pause=inter_key_pause)

    if configs["INPUT_TEXT_API"] == "set_text":
        method_name = "set_edit_text"
        args = {"text": text}
    else:
        method_name = "type_keys"

        # Transform the text according to the tags.
        text = TextTransformer.transform_text(text, "all")

        args = {"keys": text, "pause": inter_key_pause, "with_spaces": True}
    try:
        result = self.atomic_execution(method_name, args)
        # method_name is "set_edit_text" when INPUT_TEXT_API is "set_text".
        if (
            method_name == "set_edit_text"
            and args["text"] not in self.control.window_text()
        ):
            raise Exception(f"Failed to use set_edit_text: {args['text']}")
        if configs["INPUT_TEXT_ENTER"] and method_name in [
            "type_keys",
            "set_edit_text",
        ]:
            self.atomic_execution("type_keys", params={"keys": "{ENTER}"})
        return result
    except Exception as e:
        if method_name == "set_edit_text":
            # The direct setter did not take effect; fall back to typing.
            print_with_color(
                f"{self.control} failed to apply {method_name}, trying default input method",
                "yellow",
            )
            clear_text_keys = "^a{BACKSPACE}"
            text_to_type = args["text"]
            keys_to_send = clear_text_keys + text_to_type
            method_name = "type_keys"
            args = {
                "keys": keys_to_send,
                "pause": inter_key_pause,
                "with_spaces": True,
            }
            return self.atomic_execution(method_name, args)
        else:
            return f"An error occurred: {e}"
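
The set-then-verify-then-fall-back idea behind this method can be sketched with a toy control. `FakeEdit` and `enter_text` are hypothetical names; the toy only imitates the three calls it needs (`set_edit_text`, `type_keys`, `window_text`) and does not reproduce real pywinauto behavior.

```python
from typing import List


class FakeEdit:
    """Stand-in edit control; set_edit_text can be made to silently fail."""

    def __init__(self, accepts_set_text: bool) -> None:
        self.accepts_set_text = accepts_set_text
        self.value = ""
        self.calls: List[str] = []

    def set_edit_text(self, text: str) -> None:
        self.calls.append("set_edit_text")
        if self.accepts_set_text:
            self.value = text

    def type_keys(self, keys: str) -> None:
        self.calls.append("type_keys")
        # In this toy model, "^a{BACKSPACE}" clears the field before typing.
        prefix = "^a{BACKSPACE}"
        if keys.startswith(prefix):
            self.value = ""
            keys = keys[len(prefix):]
        self.value += keys

    def window_text(self) -> str:
        return self.value


def enter_text(control: FakeEdit, text: str) -> str:
    """Try the direct setter, verify it, and fall back to typing if needed."""
    control.set_edit_text(text)
    if text in control.window_text():
        return "set_edit_text"
    control.type_keys("^a{BACKSPACE}" + text)
    return "type_keys"
```

Verifying the field content after the setter call is what catches controls that accept the call but ignore the new value.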

summary(params)

Visual summary of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the visual summary method. It should contain a key "text" with the text summary.

Returns:
  • str

    The result of the visual summary action.

Source code in automator/ui_control/controller.py
def summary(self, params: Dict[str, str]) -> str:
    """
    Visual summary of the control element.
    :param params: The arguments of the visual summary method. It should contain a key "text" with the text summary.
    :return: The result of the visual summary action.
    """

    return params.get("text")

texts()

Get the text of the control element.

Returns:
  • str

    The text of the control element.

Source code in automator/ui_control/controller.py
def texts(self) -> str:
    """
    Get the text of the control element.
    :return: The text of the control element.
    """
    return self.control.texts()

transform_point(fraction_x, fraction_y)

Transform the relative coordinates to the absolute coordinates.

Parameters:
  • fraction_x (float) –

    The relative x coordinate.

  • fraction_y (float) –

    The relative y coordinate.

Returns:
  • Tuple[int, int]

    The absolute coordinates.

Source code in automator/ui_control/controller.py
def transform_point(self, fraction_x: float, fraction_y: float) -> Tuple[int, int]:
    """
    Transform the relative coordinates to the absolute coordinates.
    :param fraction_x: The relative x coordinate.
    :param fraction_y: The relative y coordinate.
    :return: The absolute coordinates.
    """
    application_rect: RECT = self.application.rectangle()
    application_x = application_rect.left
    application_y = application_rect.top
    application_width = application_rect.width()
    application_height = application_rect.height()

    x = application_x + int(application_width * fraction_x)
    y = application_y + int(application_height * fraction_y)

    return x, y
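
A worked example of the same arithmetic, using a small stand-in for the window rectangle. `Rect` and `fraction_to_absolute` are hypothetical names introduced for illustration; `Rect` only mimics the `left`, `top`, `width()`, and `height()` members used above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Rect:
    """Minimal stand-in for the RECT returned by rectangle()."""

    left: int
    top: int
    right: int
    bottom: int

    def width(self) -> int:
        return self.right - self.left

    def height(self) -> int:
        return self.bottom - self.top


def fraction_to_absolute(rect: Rect, fx: float, fy: float) -> Tuple[int, int]:
    """Map window-relative fractions in [0, 1] to absolute screen pixels."""
    x = rect.left + int(rect.width() * fx)
    y = rect.top + int(rect.height() * fy)
    return x, y


window = Rect(left=100, top=50, right=900, bottom=650)  # an 800 x 600 window
center = fraction_to_absolute(window, 0.5, 0.5)  # (500, 350)
```

The fraction (0, 0) maps to the window's top-left corner and (1, 1) to its bottom-right corner.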

transform_scaled_point_to_raw(scaled_x, scaled_y, scaled_width, scaled_height, raw_width, raw_height)

Transform the scaled coordinates to the raw coordinates.

Parameters:
  • scaled_x (int) –

    The scaled x coordinate.

  • scaled_y (int) –

    The scaled y coordinate.

  • raw_width (int) –

    The raw width of the application window.

  • raw_height (int) –

    The raw height of the application window.

  • scaled_width (int) –

    The scaled width of the application window.

  • scaled_height (int) –

    The scaled height of the application window.

Source code in automator/ui_control/controller.py
def transform_scaled_point_to_raw(
    self,
    scaled_x: int,
    scaled_y: int,
    scaled_width: int,
    scaled_height: int,
    raw_width: int,
    raw_height: int,
) -> Tuple[int, int]:
    """
    Transform the scaled coordinates to the raw coordinates.
    :param scaled_x: The scaled x coordinate.
    :param scaled_y: The scaled y coordinate.
    :param raw_width: The raw width of the application window.
    :param raw_height: The raw height of the application window.
    :param scaled_width: The scaled width of the application window.
    :param scaled_height: The scaled height of the application window.
    """

    ratio = min(scaled_width / raw_width, scaled_height / raw_height)
    raw_x = scaled_x / ratio
    raw_y = scaled_y / ratio

    return int(raw_x), int(raw_y)
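
A worked example of this scaling, written as a standalone function. `scaled_to_raw` is a hypothetical name that repeats the arithmetic above outside the class.

```python
from typing import Tuple


def scaled_to_raw(
    scaled_x: int,
    scaled_y: int,
    scaled_width: int,
    scaled_height: int,
    raw_width: int,
    raw_height: int,
) -> Tuple[int, int]:
    """Map a point on a scaled screenshot back to raw window pixels."""
    # The smaller ratio is the one that keeps the aspect ratio intact.
    ratio = min(scaled_width / raw_width, scaled_height / raw_height)
    return int(scaled_x / ratio), int(scaled_y / ratio)


# A 1920 x 1080 window shown as a 960 x 540 screenshot gives ratio = 0.5.
point = scaled_to_raw(480, 270, 960, 540, 1920, 1080)  # (960, 540)
```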

transfrom_absolute_point_to_fractional(x, y)

Transform the absolute coordinates to the relative coordinates.

Parameters:
  • x (int) –

    The absolute x coordinate on the application window.

  • y (int) –

    The absolute y coordinate on the application window.

Returns:
  • Tuple[float, float]

    The relative coordinates fraction.

Source code in automator/ui_control/controller.py
def transfrom_absolute_point_to_fractional(self, x: int, y: int) -> Tuple[float, float]:
    """
    Transform the absolute coordinates to the relative coordinates.
    :param x: The absolute x coordinate on the application window.
    :param y: The absolute y coordinate on the application window.
    :return: The relative coordinates fraction.
    """
    application_rect: RECT = self.application.rectangle()
    # application_x = application_rect.left
    # application_y = application_rect.top

    application_width = application_rect.width()
    application_height = application_rect.height()

    fraction_x = x / application_width
    fraction_y = y / application_height

    return fraction_x, fraction_y
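
Because the method divides by the window size without subtracting the window origin, the inputs are best read as pixel offsets from the window's top-left corner. The round trip below illustrates this under that assumption; `to_fraction` and `to_pixels` are hypothetical helper names.

```python
from typing import Tuple


def to_fraction(x: int, y: int, width: int, height: int) -> Tuple[float, float]:
    """Window-relative pixels -> fractions of the window size."""
    return x / width, y / height


def to_pixels(fx: float, fy: float, width: int, height: int) -> Tuple[int, int]:
    """Fractions of the window size -> window-relative pixels."""
    return int(width * fx), int(height * fy)


width, height = 800, 600
fraction = to_fraction(200, 150, width, height)  # (0.25, 0.25)
restored = to_pixels(*fraction, width, height)   # (200, 150)
```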

type(params)

Type on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the type method.

Returns:
  • str

    The result of the type action.

Source code in automator/ui_control/controller.py
def type(self, params: Dict[str, str]) -> str:
    """
    Type on the control element.
    :param params: The arguments of the type method.
    :return: The result of the type action.
    """

    text = params.get("text", "")
    pyautogui.write(text, interval=0.1)

    return ""

wait_enabled(timeout=10, retry_interval=0.5)

Wait until the control is enabled.

Parameters:
  • timeout (int, default: 10 ) –

    The timeout to wait.

  • retry_interval (float, default: 0.5 ) –

    The retry interval to wait.

Source code in automator/ui_control/controller.py
def wait_enabled(self, timeout: int = 10, retry_interval: float = 0.5) -> None:
    """
    Wait until the control is enabled.
    :param timeout: The timeout to wait.
    :param retry_interval: The retry interval to wait.
    """
    while not self.control.is_enabled():
        time.sleep(retry_interval)
        timeout -= retry_interval
        if timeout <= 0:
            warnings.warn(f"Timeout: {self.control} is not enabled.")
            break

wait_visible(timeout=10, retry_interval=0.5)

Wait until the control is visible.

Parameters:
  • timeout (int, default: 10 ) –

    The timeout to wait.

  • retry_interval (float, default: 0.5 ) –

    The retry interval to wait.

Source code in automator/ui_control/controller.py
def wait_visible(self, timeout: int = 10, retry_interval: float = 0.5) -> None:
    """
    Wait until the control is visible.
    :param timeout: The timeout to wait.
    :param retry_interval: The retry interval to wait.
    """
    while not self.control.is_visible():
        time.sleep(retry_interval)
        timeout -= retry_interval
        if timeout <= 0:
            warnings.warn(f"Timeout: {self.control} is not visible.")
            break
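
Both wait methods follow the same polling pattern, which can be sketched as one generic helper. `wait_until` is a hypothetical function, not part of UFO; it returns a boolean instead of issuing a warning, and the `sleep` parameter exists only so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 10.0,
    retry_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll predicate until it is true or the time budget is used up."""
    remaining = timeout
    while not predicate():
        sleep(retry_interval)
        remaining -= retry_interval
        if remaining <= 0:
            return False
    return True
```

As in the methods above, the budget is counted in sleep intervals rather than wall-clock time, so a slow predicate makes the real wait longer than `timeout`.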

wheel_mouse_input(params)

Wheel mouse input on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the wheel mouse input method.

Returns:
  • The result of the wheel mouse input action.

Source code in automator/ui_control/controller.py
def wheel_mouse_input(self, params: Dict[str, str]):
    """
    Wheel mouse input on the control element.
    :param params: The arguments of the wheel mouse input method.
    :return: The result of the wheel mouse input action.
    """

    if self.control is not None:
        return self.atomic_execution("wheel_mouse_input", params)
    else:
        keyboard.send_keys("{VK_CONTROL up}")
        dist = int(params.get("wheel_dist", 0))
        return self.application.wheel_mouse_input(wheel_dist=dist)


Command

The command of the UI Automator is the ControlCommand class defined in the ufo/automator/ui_control/controller/ControlCommand module. It encapsulates the function and parameters required to execute the action. The ControlCommand class is a base class for all commands in the UI Automator application. Below is an example of a ClickInputCommand class that inherits from the ControlCommand class:

@ControlReceiver.register
class ClickInputCommand(ControlCommand):
    """
    The click input command class.
    """

    def execute(self) -> str:
        """
        Execute the click input command.
        :return: The result of the click input command.
        """
        return self.receiver.click_input(self.params)

    @classmethod
    def name(cls) -> str:
        """
        Get the name of the atomic command.
        :return: The name of the atomic command.
        """
        return "click_input"

Note

The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command.

Note

Each command must register with a specific ControlReceiver to be executed using the @ControlReceiver.register decorator.
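
How a register decorator can tie command classes to a receiver is sketched below. This is an assumption about the general pattern, not UFO's actual code: `Command`, `Receiver`, `ClickInput`, the `_commands` registry, and the `run` method are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class Command(ABC):
    """Illustrative base command: holds a receiver and the call parameters."""

    def __init__(self, receiver: "Receiver", params: Dict[str, Any]) -> None:
        self.receiver = receiver
        self.params = params

    @abstractmethod
    def execute(self) -> str: ...

    @classmethod
    @abstractmethod
    def name(cls) -> str: ...


class Receiver:
    """Illustrative receiver with a class-level name -> command registry."""

    _commands: Dict[str, Type[Command]] = {}

    @classmethod
    def register(cls, command_cls: Type[Command]) -> Type[Command]:
        # Store the class under its command name and return it unchanged.
        cls._commands[command_cls.name()] = command_cls
        return command_cls

    def run(self, name: str, params: Dict[str, Any]) -> str:
        return self._commands[name](self, params).execute()

    def click_input(self, params: Dict[str, Any]) -> str:
        return f"clicked with {params.get('button', 'left')} button"


@Receiver.register
class ClickInput(Command):
    def execute(self) -> str:
        return self.receiver.click_input(self.params)

    @classmethod
    def name(cls) -> str:
        return "click_input"


result = Receiver().run("click_input", {"button": "right"})
```

With a registry like this, a new command becomes available by name as soon as its class is defined and decorated, without editing the receiver.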

Below is the list of available commands in the UI Automator that are currently supported by UFO:

| Command Name | Function Name | Description |
|---|---|---|
| ClickInputCommand | click_input | Click the control item with the mouse. |
| ClickOnCoordinatesCommand | click_on_coordinates | Click on the specific fractional coordinates of the application window. |
| DragOnCoordinatesCommand | drag_on_coordinates | Drag the mouse on the specific fractional coordinates of the application window. |
| SetEditTextCommand | set_edit_text | Add new text to the control item. |
| GetTextsCommand | texts | Get the text of the control item. |
| WheelMouseInputCommand | wheel_mouse_input | Scroll the control item. |
| KeyboardInputCommand | keyboard_input | Simulate the keyboard input. |

Tip

Please refer to the ufo/prompts/share/base/api.yaml file for the detailed API documentation of the UI Automator.

Tip

You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand module.