UI Automator

The UI Automator mimics mouse and keyboard operations on an application's UI controls. UFO uses the UIA or Win32 APIs to interact with controls such as buttons, edit boxes, and menus.

Configuration

Several options must be set in the config_dev.yaml file before using the UI Automator. The options related to the UI Automator are listed below:

| Configuration Option | Description | Type | Default Value |
|---|---|---|---|
| CONTROL_BACKEND | The backend for control action, currently supporting uia and win32. | String | "uia" |
| CONTROL_LIST | The list of widgets allowed to be selected. | List | ["Button", "Edit", "TabItem", "Document", "ListItem", "MenuItem", "ScrollBar", "TreeItem", "Hyperlink", "ComboBox", "RadioButton", "DataItem"] |
| ANNOTATION_COLORS | The colors assigned to different control types for annotation. | Dictionary | {"Button": "#FFF68F", "Edit": "#A5F0B5", "TabItem": "#A5E7F0", "Document": "#FFD18A", "ListItem": "#D9C3FE", "MenuItem": "#E7FEC3", "ScrollBar": "#FEC3F8", "TreeItem": "#D6D6D6", "Hyperlink": "#91FFEB", "ComboBox": "#D8B6D4"} |
| API_PROMPT | The prompt for the UI automation API. | String | "ufo/prompts/share/base/api.yaml" |
| CLICK_API | The API used for click action, can be click_input or click. | String | "click_input" |
| INPUT_TEXT_API | The API used for input text action, can be type_keys or set_text. | String | "type_keys" |
| INPUT_TEXT_ENTER | Whether to press enter after typing the text. | Boolean | False |
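
As a rough illustration of how these options combine, the sketch below merges user-supplied values over the defaults from the table and rejects values outside the documented choices. `DEFAULTS`, `ALLOWED`, and `resolve_options` are hypothetical names used only for this example; they are not part of UFO, which reads these settings from config_dev.yaml.

```python
from typing import Any, Dict, Set

# Defaults copied from the configuration table (scalar options only).
DEFAULTS: Dict[str, Any] = {
    "CONTROL_BACKEND": "uia",
    "CLICK_API": "click_input",
    "INPUT_TEXT_API": "type_keys",
    "INPUT_TEXT_ENTER": False,
}

# Allowed values as stated in the option descriptions.
ALLOWED: Dict[str, Set[str]] = {
    "CONTROL_BACKEND": {"uia", "win32"},
    "CLICK_API": {"click_input", "click"},
    "INPUT_TEXT_API": {"type_keys", "set_text"},
}


def resolve_options(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge user settings over the defaults and validate them."""
    merged = {**DEFAULTS, **user_config}
    for key, allowed in ALLOWED.items():
        if merged[key] not in allowed:
            raise ValueError(
                f"{key} must be one of {sorted(allowed)}, got {merged[key]!r}"
            )
    return merged


options = resolve_options({"CLICK_API": "click"})
print(options["CLICK_API"], options["CONTROL_BACKEND"])
```

Options that are not overridden keep their table defaults, which matches the way the receiver code falls back to a default when it calls `configs.get("CLICK_API", "click_input")`.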

Receiver

The receiver of the UI Automator is the ControlReceiver class defined in the ufo/automator/ui_control/controller/control_receiver module. It is initialized with the control element on which actions are executed and the application window element. The ControlReceiver provides the methods for interacting with the application's UI controls. Below is the reference for the ControlReceiver class:

Bases: ReceiverBasic

The control receiver class.

Initialize the control receiver.

Parameters:
  • control (Optional[UIAWrapper]) –

    The control element.

  • application (Optional[UIAWrapper]) –

    The application element.

Source code in automator/ui_control/controller.py
def __init__(
    self, control: Optional[UIAWrapper], application: Optional[UIAWrapper]
) -> None:
    """
    Initialize the control receiver.
    :param control: The control element.
    :param application: The application element.
    """

    self.control = control
    self.application = application

    if control:
        self.control.set_focus()
        self.wait_enabled()
    elif application:
        self.application.set_focus()

annotation(params, annotation_dict)

Take a screenshot of the current application window and annotate the control item on the screenshot.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the annotation method.

  • annotation_dict (Dict[str, UIAWrapper]) –

    The dictionary of the control labels.

Source code in automator/ui_control/controller.py
def annotation(
    self, params: Dict[str, str], annotation_dict: Dict[str, UIAWrapper]
) -> List[str]:
    """
    Take a screenshot of the current application window and annotate the control item on the screenshot.
    :param params: The arguments of the annotation method.
    :param annotation_dict: The dictionary of the control labels.
    """
    selected_controls_labels = params.get("control_labels", [])

    control_reannotate = [
        annotation_dict[str(label)] for label in selected_controls_labels
    ]

    return control_reannotate

atomic_execution(method_name, params)

Atomic execution of the action on the control elements.

Parameters:
  • method_name (str) –

    The name of the method to execute.

  • params (Dict[str, Any]) –

    The arguments of the method.

Returns:
  • str

    The result of the action.

Source code in automator/ui_control/controller.py
def atomic_execution(self, method_name: str, params: Dict[str, Any]) -> str:
    """
    Atomic execution of the action on the control elements.
    :param method_name: The name of the method to execute.
    :param params: The arguments of the method.
    :return: The result of the action.
    """

    import traceback

    try:
        method = getattr(self.control, method_name)
        result = method(**params)
    except AttributeError:
        message = f"{self.control} doesn't have a method named {method_name}"
        print_with_color(f"Warning: {message}", "yellow")
        result = message
    except Exception as e:
        full_traceback = traceback.format_exc()
        message = f"An error occurred: {full_traceback}"
        print_with_color(f"Warning: {message}", "yellow")
        result = message
    return result
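
The dispatch pattern used by atomic_execution can be tried without a real window. The sketch below is a free-function variant of the method above, run against `FakeButton`, a hypothetical stand-in for a pywinauto control wrapper; it shows the three outcomes: a successful call, a missing method, and a call that raises.

```python
import traceback
from typing import Any, Dict


class FakeButton:
    """Hypothetical stand-in for a control wrapper (not a pywinauto class)."""

    def click_input(self, button: str = "left", double: bool = False) -> str:
        return f"clicked with {button}, double={double}"


def atomic_execution(control: Any, method_name: str, params: Dict[str, Any]) -> str:
    """Look up method_name on the control and call it with params as keyword arguments."""
    try:
        method = getattr(control, method_name)
        return method(**params)
    except AttributeError:
        # The control has no attribute with that name.
        return f"{control!r} doesn't have a method named {method_name}"
    except Exception:
        # Any other failure is reported as text instead of being raised.
        return f"An error occurred: {traceback.format_exc()}"


button = FakeButton()
ok = atomic_execution(button, "click_input", {"button": "left", "double": True})
missing = atomic_execution(button, "hover", {})
bad_args = atomic_execution(button, "click_input", {"colour": "red"})
print(ok)
```

Because errors come back as strings rather than exceptions, a caller has to inspect the returned text to tell success from failure.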

click_input(params)

Click the control element.

Parameters:
  • params (Dict[str, Union[str, bool]]) –

    The arguments of the click method.

Returns:
  • str

    The result of the click action.

Source code in automator/ui_control/controller.py
def click_input(self, params: Dict[str, Union[str, bool]]) -> str:
    """
    Click the control element.
    :param params: The arguments of the click method.
    :return: The result of the click action.
    """

    api_name = configs.get("CLICK_API", "click_input")

    if api_name == "click":
        return self.atomic_execution("click", params)
    else:
        return self.atomic_execution("click_input", params)

click_on_coordinates(params)

Click on the coordinates of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the click on coordinates method.

Returns:
  • str

    The result of the click on coordinates action.

Source code in automator/ui_control/controller.py
def click_on_coordinates(self, params: Dict[str, str]) -> str:
    """
    Click on the coordinates of the control element.
    :param params: The arguments of the click on coordinates method.
    :return: The result of the click on coordinates action.
    """

    # Get the relative coordinates fraction of the application window.
    x = float(params.get("x", 0))
    y = float(params.get("y", 0))

    button = params.get("button", "left")
    double = params.get("double", False)

    # Get the absolute coordinates of the application window.
    transformed_x, transformed_y = self.transform_point(x, y)

    self.application.set_focus()

    pyautogui.click(
        transformed_x, transformed_y, button=button, clicks=2 if double else 1
    )

    return ""

drag_on_coordinates(params)

Drag on the coordinates of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the drag on coordinates method.

Returns:
  • str

    The result of the drag on coordinates action.

Source code in automator/ui_control/controller.py
def drag_on_coordinates(self, params: Dict[str, str]) -> str:
    """
    Drag on the coordinates of the control element.
    :param params: The arguments of the drag on coordinates method.
    :return: The result of the drag on coordinates action.
    """

    start = self.transform_point(
        float(params.get("start_x", 0)), float(params.get("start_y", 0))
    )
    end = self.transform_point(
        float(params.get("end_x", 0)), float(params.get("end_y", 0))
    )

    duration = float(params.get("duration", 1))

    button = params.get("button", "left")

    key_hold = params.get("key_hold", None)

    self.application.set_focus()

    if key_hold:
        pyautogui.keyDown(key_hold)

    pyautogui.moveTo(start[0], start[1])
    pyautogui.dragTo(end[0], end[1], button=button, duration=duration)

    if key_hold:
        pyautogui.keyUp(key_hold)

    return ""
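
The order of operations in drag_on_coordinates can be checked without moving the real mouse. The sketch below swaps pyautogui for a hypothetical `RecordingMouse` that only logs calls; `drag` is a simplified free-function version of the method above (coordinates are assumed to be already transformed), not UFO code.

```python
from typing import List, Optional, Tuple


class RecordingMouse:
    """Hypothetical stand-in for pyautogui that records calls instead of acting."""

    def __init__(self) -> None:
        self.events: List[tuple] = []

    def keyDown(self, key: str) -> None:
        self.events.append(("keyDown", key))

    def keyUp(self, key: str) -> None:
        self.events.append(("keyUp", key))

    def moveTo(self, x: int, y: int) -> None:
        self.events.append(("moveTo", x, y))

    def dragTo(self, x: int, y: int, button: str = "left", duration: float = 0.0) -> None:
        self.events.append(("dragTo", x, y, button, duration))


def drag(
    mouse: RecordingMouse,
    start: Tuple[int, int],
    end: Tuple[int, int],
    button: str = "left",
    duration: float = 1.0,
    key_hold: Optional[str] = None,
) -> None:
    """Hold an optional key, move to the start point, drag to the end point, release the key."""
    if key_hold:
        mouse.keyDown(key_hold)
    mouse.moveTo(start[0], start[1])
    mouse.dragTo(end[0], end[1], button=button, duration=duration)
    if key_hold:
        mouse.keyUp(key_hold)


mouse = RecordingMouse()
drag(mouse, (10, 20), (110, 220), key_hold="shift")
for event in mouse.events:
    print(event)
```

The held key wraps the whole gesture, so a modifier such as Shift is down before the pointer moves and released only after the drag ends.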

keyboard_input(params)

Keyboard input on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the keyboard input method.

Returns:
  • str

    The result of the keyboard input action.

Source code in automator/ui_control/controller.py
def keyboard_input(self, params: Dict[str, str]) -> str:
    """
    Keyboard input on the control element.
    :param params: The arguments of the keyboard input method.
    :return: The result of the keyboard input action.
    """

    control_focus = params.get("control_focus", True)
    keys = params.get("keys", "")

    if control_focus:
        self.atomic_execution("type_keys", {"keys": keys})
    else:
        pyautogui.typewrite(keys)
    return keys

no_action()

No action on the control element.

Returns:
  • The result of the no action.

Source code in automator/ui_control/controller.py
def no_action(self):
    """
    No action on the control element.
    :return: The result of the no action.
    """

    return ""

set_edit_text(params)

Set the edit text of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the set edit text method.

Returns:
  • str

    The result of the set edit text action.

Source code in automator/ui_control/controller.py
def set_edit_text(self, params: Dict[str, str]) -> str:
    """
    Set the edit text of the control element.
    :param params: The arguments of the set edit text method.
    :return: The result of the set edit text action.
    """

    text = params.get("text", "")
    inter_key_pause = configs.get("INPUT_TEXT_INTER_KEY_PAUSE", 0.1)

    if configs["INPUT_TEXT_API"] == "set_text":
        method_name = "set_edit_text"
        args = {"text": text}
    else:
        method_name = "type_keys"

        # Transform the text according to the tags.
        text = TextTransformer.transform_text(text, "all")

        args = {"keys": text, "pause": inter_key_pause, "with_spaces": True}
    try:
        result = self.atomic_execution(method_name, args)
        # Verify that set_edit_text actually changed the control's text.
        if (
            method_name == "set_edit_text"
            and args["text"] not in self.control.window_text()
        ):
            raise Exception(f"Failed to use set_edit_text: {args['text']}")
        if configs["INPUT_TEXT_ENTER"] and method_name in [
            "type_keys",
            "set_edit_text",
        ]:
            self.atomic_execution("type_keys", params={"keys": "{ENTER}"})
        return result
    except Exception as e:
        if method_name == "set_edit_text":
            print_with_color(
                f"{self.control} failed to execute {method_name}, trying default input method",
                "yellow",
            )
            # Fall back to typing: select all, delete, then type the text.
            clear_text_keys = "^a{BACKSPACE}"
            text_to_type = args["text"]
            keys_to_send = clear_text_keys + text_to_type
            method_name = "type_keys"
            args = {
                "keys": keys_to_send,
                "pause": inter_key_pause,
                "with_spaces": True,
            }
            return self.atomic_execution(method_name, args)
        else:
            return f"An error occurred: {e}"
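
The fallback idea in set_edit_text can be illustrated in isolation: try the direct text API, check whether the text landed, and otherwise clear the field and type the text. The sketch below is a simplified reading of that flow under stated assumptions; `FakeEdit` and `set_text_with_fallback` are hypothetical names, and the configuration lookup, text transformation, and Enter key handling are left out.

```python
from typing import List


class FakeEdit:
    """Hypothetical edit control; set_edit_text silently does nothing when broken_api is True."""

    def __init__(self, broken_api: bool) -> None:
        self.broken_api = broken_api
        self.text = "old"
        self.typed: List[str] = []

    def set_edit_text(self, text: str) -> None:
        if not self.broken_api:
            self.text = text

    def type_keys(self, keys: str) -> None:
        # Only record the key sequence; a real control would replay it.
        self.typed.append(keys)

    def window_text(self) -> str:
        return self.text


def set_text_with_fallback(control: FakeEdit, text: str) -> str:
    """Return the name of the method that ended up being used."""
    control.set_edit_text(text)
    if text in control.window_text():
        return "set_edit_text"
    # Ctrl+A selects the existing text, Backspace deletes it, then the new text is typed.
    control.type_keys("^a{BACKSPACE}" + text)
    return "type_keys"


working = FakeEdit(broken_api=False)
broken = FakeEdit(broken_api=True)
print(set_text_with_fallback(working, "hello"))
print(set_text_with_fallback(broken, "hello"))
```

Checking window_text after the call is what detects a control that accepts the call but does not update, which an exception handler alone would miss.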

summary(params)

Visual summary of the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the visual summary method. Should contain a key "text" with the text summary.

Returns:
  • str

    The result of the visual summary action.

Source code in automator/ui_control/controller.py
def summary(self, params: Dict[str, str]) -> str:
    """
    Visual summary of the control element.
    :param params: The arguments of the visual summary method. Should contain a key "text" with the text summary.
    :return: The result of the visual summary action.
    """

    return params.get("text")

texts()

Get the text of the control element.

Returns:
  • str

    The text of the control element.

Source code in automator/ui_control/controller.py
def texts(self) -> str:
    """
    Get the text of the control element.
    :return: The text of the control element.
    """
    return self.control.texts()

transform_point(fraction_x, fraction_y)

Transform the relative coordinates to the absolute coordinates.

Parameters:
  • fraction_x (float) –

    The relative x coordinate.

  • fraction_y (float) –

    The relative y coordinate.

Returns:
  • Tuple[int, int]

    The absolute coordinates.

Source code in automator/ui_control/controller.py
def transform_point(self, fraction_x: float, fraction_y: float) -> Tuple[int, int]:
    """
    Transform the relative coordinates to the absolute coordinates.
    :param fraction_x: The relative x coordinate.
    :param fraction_y: The relative y coordinate.
    :return: The absolute coordinates.
    """
    application_rect: RECT = self.application.rectangle()
    application_x = application_rect.left
    application_y = application_rect.top
    application_width = application_rect.width()
    application_height = application_rect.height()

    x = application_x + int(application_width * fraction_x)
    y = application_y + int(application_height * fraction_y)

    return x, y
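
A worked example makes the arithmetic concrete. The sketch below repeats the same formula as a free function over `Rect`, a hypothetical stand-in for the rectangle object returned by the application window; the window position and size are made-up values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Rect:
    """Hypothetical minimal window rectangle with the attributes the formula needs."""

    left: int
    top: int
    right: int
    bottom: int

    def width(self) -> int:
        return self.right - self.left

    def height(self) -> int:
        return self.bottom - self.top


def transform_point(rect: Rect, fraction_x: float, fraction_y: float) -> Tuple[int, int]:
    """Map fractions of the window size to absolute screen coordinates."""
    x = rect.left + int(rect.width() * fraction_x)
    y = rect.top + int(rect.height() * fraction_y)
    return x, y


# An 800 x 600 window whose top-left corner sits at (100, 50) on the screen.
window = Rect(left=100, top=50, right=900, bottom=650)

center = transform_point(window, 0.5, 0.5)      # 100 + 400, 50 + 300
top_left = transform_point(window, 0.0, 0.0)    # the window origin itself
quarter = transform_point(window, 0.25, 0.25)   # 100 + 200, 50 + 150
print(center, top_left, quarter)  # (500, 350) (100, 50) (300, 200)
```

Fractions are relative to the application window, so the same (0.5, 0.5) input keeps pointing at the window center after the window is moved or resized.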

wait_enabled(timeout=10, retry_interval=0.5)

Wait until the control is enabled.

Parameters:
  • timeout (int, default: 10 ) –

    The timeout to wait.

  • retry_interval (float, default: 0.5 ) –

    The retry interval to wait.

Source code in automator/ui_control/controller.py
def wait_enabled(self, timeout: int = 10, retry_interval: float = 0.5) -> None:
    """
    Wait until the control is enabled.
    :param timeout: The timeout to wait.
    :param retry_interval: The retry interval to wait.
    """
    while not self.control.is_enabled():
        time.sleep(retry_interval)
        timeout -= retry_interval
        if timeout <= 0:
            warnings.warn(f"Timeout: {self.control} is not enabled.")
            break
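
The polling loop can be exercised with a fake control. The sketch below is a generic variant, assuming a hypothetical `wait_until` helper that takes any zero-argument condition and, unlike the method above, returns whether the condition was met; `SlowControl` is a made-up control that becomes enabled after a fixed number of checks.

```python
import time
import warnings
from typing import Callable


def wait_until(
    condition: Callable[[], bool], timeout: float = 10, retry_interval: float = 0.5
) -> bool:
    """Poll condition until it is true or the time budget is used up."""
    remaining = timeout
    while not condition():
        time.sleep(retry_interval)
        remaining -= retry_interval
        if remaining <= 0:
            warnings.warn("Timeout: condition was not met.")
            return False
    return True


class SlowControl:
    """Hypothetical control that reports enabled only after ready_after failed checks."""

    def __init__(self, ready_after: int) -> None:
        self.ready_after = ready_after
        self.checks = 0

    def is_enabled(self) -> bool:
        self.checks += 1
        return self.checks > self.ready_after


control = SlowControl(ready_after=3)
became_enabled = wait_until(control.is_enabled, timeout=1, retry_interval=0.01)
print(became_enabled, control.checks)
```

The budget is counted in sleep intervals rather than measured wall-clock time, so the time spent inside the condition check itself is not included in the timeout.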

wait_visible(timeout=10, retry_interval=0.5)

Wait until the control is visible.

Parameters:
  • timeout (int, default: 10 ) –

    The timeout to wait.

  • retry_interval (float, default: 0.5 ) –

    The retry interval to wait.

Source code in automator/ui_control/controller.py
def wait_visible(self, timeout: int = 10, retry_interval: float = 0.5) -> None:
    """
    Wait until the control is visible.
    :param timeout: The timeout to wait.
    :param retry_interval: The retry interval to wait.
    """
    while not self.control.is_visible():
        time.sleep(retry_interval)
        timeout -= retry_interval
        if timeout <= 0:
            warnings.warn(f"Timeout: {self.control} is not visible.")
            break

wheel_mouse_input(params)

Wheel mouse input on the control element.

Parameters:
  • params (Dict[str, str]) –

    The arguments of the wheel mouse input method.

Returns:
  • The result of the wheel mouse input action.

Source code in automator/ui_control/controller.py
def wheel_mouse_input(self, params: Dict[str, str]):
    """
    Wheel mouse input on the control element.
    :param params: The arguments of the wheel mouse input method.
    :return: The result of the wheel mouse input action.
    """
    return self.atomic_execution("wheel_mouse_input", params)


Command

The command of the UI Automator is the ControlCommand class defined in the ufo/automator/ui_control/controller/ControlCommand module. It encapsulates the function and parameters required to execute the action. The ControlCommand class is a base class for all commands in the UI Automator application. Below is an example of a ClickInputCommand class that inherits from the ControlCommand class:

@ControlReceiver.register
class ClickInputCommand(ControlCommand):
    """
    The click input command class.
    """

    def execute(self) -> str:
        """
        Execute the click input command.
        :return: The result of the click input command.
        """
        return self.receiver.click_input(self.params)

    @classmethod
    def name(cls) -> str:
        """
        Get the name of the atomic command.
        :return: The name of the atomic command.
        """
        return "click_input"

Note

The concrete command classes must implement the execute method to execute the action and the name method to return the name of the atomic command.

Note

Each command must be registered with a specific ControlReceiver, using the @ControlReceiver.register decorator, before it can be executed.
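
One way such a registration decorator can work is sketched below. This is an assumption-laden illustration, not UFO's implementation: `Receiver`, `Command`, the `_command_registry` attribute, and the `run` method are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class Receiver:
    """Hypothetical receiver that keeps a class-level registry of commands."""

    _command_registry: Dict[str, Type["Command"]] = {}

    @classmethod
    def register(cls, command_class: Type["Command"]) -> Type["Command"]:
        """Class decorator: store the command under its name and return it unchanged."""
        cls._command_registry[command_class.name()] = command_class
        return command_class

    def click_input(self, params: Dict[str, Any]) -> str:
        return f"click_input called with {params}"

    def run(self, command_name: str, params: Dict[str, Any]) -> str:
        """Build the registered command for command_name and execute it."""
        command = self._command_registry[command_name](self, params)
        return command.execute()


class Command(ABC):
    """Hypothetical base class: a command bundles a receiver with parameters."""

    def __init__(self, receiver: Receiver, params: Dict[str, Any]) -> None:
        self.receiver = receiver
        self.params = params

    @abstractmethod
    def execute(self) -> str:
        ...

    @classmethod
    @abstractmethod
    def name(cls) -> str:
        ...


@Receiver.register
class ClickInputCommand(Command):
    def execute(self) -> str:
        return self.receiver.click_input(self.params)

    @classmethod
    def name(cls) -> str:
        return "click_input"


result = Receiver().run("click_input", {"button": "left"})
print(result)
```

Keying the registry by the value of name() lets a caller select a command from a plain string, which is convenient when the action name comes from a prompt or a configuration file.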

Below is the list of available commands in the UI Automator that are currently supported by UFO:

| Command Name | Function Name | Description |
|---|---|---|
| ClickInputCommand | click_input | Click the control item with the mouse. |
| ClickOnCoordinatesCommand | click_on_coordinates | Click on the specific fractional coordinates of the application window. |
| DragOnCoordinatesCommand | drag_on_coordinates | Drag the mouse on the specific fractional coordinates of the application window. |
| SetEditTextCommand | set_edit_text | Add new text to the control item. |
| GetTextsCommand | texts | Get the text of the control item. |
| WheelMouseInputCommand | wheel_mouse_input | Scroll the control item. |
| KeyboardInputCommand | keyboard_input | Simulate the keyboard input. |
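
Several of these functions read their arguments with params.get and a default, so a misspelled key is silently ignored rather than rejected. The sketch below lists the keys that the receiver methods shown earlier on this page read, and flags anything else; `PARAM_KEYS` and `unknown_keys` are hypothetical helpers for illustration, and the authoritative parameter list is the api.yaml prompt file.

```python
from typing import Any, Dict, Set

# Keys read from params by the receiver methods, taken from the source listings on this page.
PARAM_KEYS: Dict[str, Set[str]] = {
    "click_on_coordinates": {"x", "y", "button", "double"},
    "drag_on_coordinates": {
        "start_x", "start_y", "end_x", "end_y", "duration", "button", "key_hold",
    },
    "set_edit_text": {"text"},
    "keyboard_input": {"keys", "control_focus"},
}


def unknown_keys(function_name: str, params: Dict[str, Any]) -> Set[str]:
    """Hypothetical check: return the keys in params that the method would not read."""
    return set(params) - PARAM_KEYS[function_name]


# Double-click the point halfway across and a quarter of the way down the window.
click = {"x": 0.5, "y": 0.25, "button": "left", "double": True}

# "end_Y" is a typo; the method would fall back to its default of 0 for end_y.
drag_with_typo = {"start_x": 0.1, "start_y": 0.1, "end_x": 0.9, "end_Y": 0.9}

print(unknown_keys("click_on_coordinates", click))
print(unknown_keys("drag_on_coordinates", drag_with_typo))
```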

Tip

Please refer to the ufo/prompts/share/base/api.yaml file for the detailed API documentation of the UI Automator.

Tip

You can customize the commands by adding new command classes to the ufo/automator/ui_control/controller/ControlCommand module.