Agents Processor

The Processor is the agent component that implements the core logic for handling the user's request. Processors are implemented as classes in the ufo/agents/processors folder, and each agent has its own Processor class within that folder.

Core Process

Once called, an agent handles the user's request by invoking the process method of its Processor class, which runs the following steps in order:

Step | Description                                  | Function
1    | Print the step information.                  | print_step_info
2    | Capture the screenshot of the application.   | capture_screenshot
3    | Get the control information of the application. | get_control_info
4    | Get the prompt message for the LLM.          | get_prompt_message
5    | Generate the response from the LLM.          | get_response
6    | Update the cost of the step.                 | update_cost
7    | Parse the response from the LLM.             | parse_response
8    | Execute the action based on the response.    | execute_action
9    | Update the memory and blackboard.            | update_memory
10   | Update the status of the agent.              | update_status

Within the process method, the Processor invokes these methods sequentially, one per step.
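The fixed ordering of steps can be illustrated with a small template-method sketch. This is not the UFO implementation: ToyProcessor, EchoProcessor, and run_step are hypothetical names, and each step only records that it was called.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyProcessor(ABC):
    """Hypothetical stand-in for a processor: runs fixed steps in order."""

    # Step names mirror the workflow table; the behavior is illustrative only.
    STEPS = (
        "print_step_info",
        "capture_screenshot",
        "get_control_info",
        "get_prompt_message",
        "get_response",
        "update_cost",
        "parse_response",
        "execute_action",
        "update_memory",
        "update_status",
    )

    def __init__(self) -> None:
        self.trace: List[str] = []

    def process(self) -> None:
        # Invoke each step sequentially, in the documented order.
        for name in self.STEPS:
            self.run_step(name)

    @abstractmethod
    def run_step(self, name: str) -> None:
        """Subclasses decide what each step does."""


class EchoProcessor(ToyProcessor):
    def run_step(self, name: str) -> None:
        # A real processor would do work here; this one only records the call.
        self.trace.append(name)


processor = EchoProcessor()
processor.process()
print(processor.trace[0], "->", processor.trace[-1])
```

In the actual base class each step is a separate method, several of them abstract, so every agent's Processor supplies its own screenshot, prompt, and action logic while the ordering stays in one place.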

The process may be paused, depending on the agent's logic and the user's request. A paused process can be continued by calling the resume method.

Reference

Below is the basic structure of the Processor class:

Bases: ABC

The base processor for the session. A session consists of multiple rounds of conversation with the user to complete a task. In each round, the HostAgent and AppAgent interact with the user and the application through the processor. Each processor is responsible for processing the user request and updating the HostAgent and AppAgent at a single step in a round.

Initialize the processor.

Parameters:
  • context (Context) –

    The context of the session.

  • agent (BasicAgent) –

    The agent who executes the processor.

Source code in agents/processors/basic.py
def __init__(self, agent: BasicAgent, context: Context) -> None:
    """
    Initialize the processor.
    :param context: The context of the session.
    :param agent: The agent who executes the processor.
    """

    self._context = context
    self._agent = agent

    self.photographer = PhotographerFacade()
    self.control_inspector = ControlInspectorFacade(BACKEND)

    self._prompt_message = None
    self._status = None
    self._response = None
    self._cost = 0
    self._control_label = None
    self._control_text = None
    self._response_json = {}
    self._memory_data = MemoryItem()
    self._question_list = []
    self._agent_status_manager = self.agent.status_manager
    self._is_resumed = False
    self._plan = None

    self._total_time_cost = 0
    self._time_cost = {}
    self._exeception_traceback = {}
    self._actions = ActionSequence()

actions property writable

Get the actions.

Returns:
  • ActionSequence

    The actions.

agent property

Get the agent.

Returns:
  • BasicAgent

    The agent.

app_root property writable

Get the application root.

Returns:
  • str

    The application root.

application_process_name property writable

Get the application process name.

Returns:
  • str

    The application process name.

application_window property writable

Get the active window.

Returns:
  • UIAWrapper

    The active window.

context property

Get the context.

Returns:
  • Context

    The context.

control_label property writable

Get the control label.

Returns:
  • str

    The control label.

control_reannotate property writable

Get the control reannotation.

Returns:
  • List[str]

    The control reannotation.

control_text property writable

Get the control text.

Returns:
  • str

    The control text.

cost property writable

Get the cost of the processor.

Returns:
  • float

    The cost of the processor.

host_message property writable

Get the host message.

Returns:
  • List[str]

    The host message.

log_path property

Get the log path.

Returns:
  • str

    The log path.

logger property

Get the logger.

Returns:
  • str

    The logger.

name property

Get the name of the processor.

Returns:
  • str

    The name of the processor.

plan property writable

Get the plan of the agent.

Returns:
  • str

    The plan.

prev_plan property

Get the previous plan.

Returns:
  • List[str]

    The previous plan of the agent.

previous_subtasks property writable

Get the previous subtasks.

Returns:
  • List[str]

    The previous subtasks.

question_list property writable

Get the question list.

Returns:
  • List[str]

    The question list.

request property

Get the request.

Returns:
  • str

    The request.

request_logger property

Get the request logger.

Returns:
  • str

    The request logger.

round_cost property writable

Get the round cost.

Returns:
  • float

    The round cost.

round_num property

Get the round number.

Returns:
  • int

    The round number.

round_step property writable

Get the round step.

Returns:
  • int

    The round step.

round_subtask_amount property

Get the round subtask amount.

Returns:
  • int

    The round subtask amount.

session_cost property writable

Get the session cost.

Returns:
  • float

    The session cost.

session_step property writable

Get the session step.

Returns:
  • int

    The session step.

status property writable

Get the status of the processor.

Returns:
  • str

    The status of the processor.

subtask property writable

Get the subtask.

Returns:
  • str

    The subtask.

ui_tree_path property

Get the UI tree path.

Returns:
  • str

    The UI tree path.

add_to_memory(data_dict)

Add the data to the memory.

Parameters:
  • data_dict (Dict[str, Any]) –

    The data dictionary to be added to the memory.

Source code in agents/processors/basic.py
def add_to_memory(self, data_dict: Dict[str, Any]) -> None:
    """
    Add the data to the memory.
    :param data_dict: The data dictionary to be added to the memory.
    """
    self._memory_data.add_values_from_dict(data_dict)

capture_screenshot() abstractmethod

Capture the screenshot.

Source code in agents/processors/basic.py
@abstractmethod
def capture_screenshot(self) -> None:
    """
    Capture the screenshot.
    """
    pass

exception_capture(func) classmethod

Decorator to capture the exception of the method.

Parameters:
  • func

    The method to be decorated.

Returns:
  • The decorated method.

Source code in agents/processors/basic.py
@classmethod
def exception_capture(cls, func):
    """
    Decorator to capture the exception of the method.
    :param func: The method to be decorated.
    :return: The decorated method.
    """

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            func(self, *args, **kwargs)
        except Exception as e:
            self._exeception_traceback[func.__name__] = {
                "type": str(type(e).__name__),
                "message": str(e),
                "traceback": traceback.format_exc(),
            }

            utils.print_with_color(f"Error Occurs at {func.__name__}", "red")
            utils.print_with_color(
                self._exeception_traceback[func.__name__]["traceback"], "red"
            )
            if self._response is not None:
                utils.print_with_color("Response: ", "red")
                utils.print_with_color(self._response, "red")
            self._status = self._agent_status_manager.ERROR.value
            self.sync_memory()
            self.add_to_memory({"error": self._exeception_traceback})
            self.add_to_memory({"Status": self._status})
            self.log_save()

            raise StopIteration("Error occurred during step.")

    return wrapper
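To show how such a decorator interacts with the process method, here is a simplified sketch. It assumes a hypothetical capture_errors function and DemoProcessor class, uses a plain string for the status, and omits the logging and memory synchronization performed by the real decorator.

```python
import traceback
from functools import wraps


def capture_errors(func):
    """Hypothetical simplified decorator: record the error, then stop the step."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception as exc:
            # Store the failure under the name of the step that raised it.
            self.errors[func.__name__] = {
                "type": type(exc).__name__,
                "message": str(exc),
                "traceback": traceback.format_exc(),
            }
            self.status = "ERROR"
            # Signal the caller to abandon the remaining steps.
            raise StopIteration("Error occurred during step.")

    return wrapper


class DemoProcessor:
    def __init__(self) -> None:
        self.errors = {}
        self.status = "CONTINUE"  # assumed initial status, for illustration

    @capture_errors
    def parse_response(self) -> None:
        raise ValueError("malformed response")

    def process(self) -> None:
        try:
            self.parse_response()
        except StopIteration:
            # The decorator already recorded the failure; stop early.
            return


demo = DemoProcessor()
demo.process()
print(demo.status, demo.errors["parse_response"]["type"])
```

Raising StopIteration from the wrapper lets process catch one exception type for every failed step, instead of handling each step's errors separately.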

execute_action() abstractmethod

Execute the action.

Source code in agents/processors/basic.py
@abstractmethod
def execute_action(self) -> None:
    """
    Execute the action.
    """
    pass

get_control_info() abstractmethod

Get the control information.

Source code in agents/processors/basic.py
@abstractmethod
def get_control_info(self) -> None:
    """
    Get the control information.
    """
    pass

get_prompt_message() abstractmethod

Get the prompt message.

Source code in agents/processors/basic.py
@abstractmethod
def get_prompt_message(self) -> None:
    """
    Get the prompt message.
    """
    pass

get_response() abstractmethod

Get the response from the LLM.

Source code in agents/processors/basic.py
@abstractmethod
def get_response(self) -> None:
    """
    Get the response from the LLM.
    """
    pass

is_application_closed()

Check if the application is closed.

Returns:
  • bool

    The boolean value indicating if the application is closed.

Source code in agents/processors/basic.py
def is_application_closed(self) -> bool:
    """
    Check if the application is closed.
    :return: The boolean value indicating if the application is closed.
    """

    if self.application_window is None:

        return True

    try:
        self.application_window.is_enabled()
        return False
    except:
        return True
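The check treats both a missing window and a window whose handle raises on access as closed. The sketch below reproduces that logic against a hypothetical FakeWindow class, so it runs without a real UI Automation window. It catches Exception rather than using a bare except, which is a deliberate substitution in this sketch.

```python
from typing import Optional


class FakeWindow:
    """Hypothetical stand-in for a window wrapper; not a real UI object."""

    def __init__(self, alive: bool) -> None:
        self.alive = alive

    def is_enabled(self) -> bool:
        if not self.alive:
            # Simulate a stale handle after the application has exited.
            raise RuntimeError("window handle is no longer valid")
        return True


def is_closed(window: Optional[FakeWindow]) -> bool:
    if window is None:
        return True
    try:
        # Any successful call means the window still exists.
        window.is_enabled()
        return False
    except Exception:
        return True


print(is_closed(None), is_closed(FakeWindow(True)), is_closed(FakeWindow(False)))
```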

is_confirm()

Check if the process is awaiting confirmation.

Returns:
  • bool

    The boolean value indicating if the process is awaiting confirmation.

Source code in agents/processors/basic.py
def is_confirm(self) -> bool:
    """
    Check if the process is confirm.
    :return: The boolean value indicating if the process is confirm.
    """

    self.agent.status = self.status

    return self.status == self._agent_status_manager.CONFIRM.value

is_error()

Check if the process is in error.

Returns:
  • bool

    The boolean value indicating if the process is in error.

Source code in agents/processors/basic.py
def is_error(self) -> bool:
    """
    Check if the process is in error.
    :return: The boolean value indicating if the process is in error.
    """

    self.agent.status = self.status
    return self.status == self._agent_status_manager.ERROR.value

is_paused()

Check if the process is paused.

Returns:
  • bool

    The boolean value indicating if the process is paused.

Source code in agents/processors/basic.py
def is_paused(self) -> bool:
    """
    Check if the process is paused.
    :return: The boolean value indicating if the process is paused.
    """

    self.agent.status = self.status

    return (
        self.status == self._agent_status_manager.PENDING.value
        or self.status == self._agent_status_manager.CONFIRM.value
    )
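The relationship between the status checks can be sketched with a small enum. ToyStatus is a hypothetical stand-in for the agent's status manager: only the member names PENDING, CONFIRM, ERROR, and FINISH appear in the source, and their string values here are assumptions.

```python
from enum import Enum


class ToyStatus(Enum):
    """Hypothetical status values; the real ones come from the status manager."""

    PENDING = "PENDING"
    CONFIRM = "CONFIRM"
    ERROR = "ERROR"
    FINISH = "FINISH"


def is_pending(status: str) -> bool:
    return status == ToyStatus.PENDING.value


def is_confirm(status: str) -> bool:
    return status == ToyStatus.CONFIRM.value


def is_paused(status: str) -> bool:
    # Paused covers both waiting states: pending and awaiting confirmation.
    return is_pending(status) or is_confirm(status)


for member in ToyStatus:
    print(member.value, is_paused(member.value))
```

Because the paused check already includes the pending state, a caller that only needs to know whether to stop the step can test is_paused alone.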

is_pending()

Check if the process is pending.

Returns:
  • bool

    The boolean value indicating if the process is pending.

Source code in agents/processors/basic.py
def is_pending(self) -> bool:
    """
    Check if the process is pending.
    :return: The boolean value indicating if the process is pending.
    """

    self.agent.status = self.status

    return self.status == self._agent_status_manager.PENDING.value

log(response_json)

Log the response JSON of the step.

Parameters:
  • response_json (Dict[str, Any]) –

    The response JSON to be logged.

Source code in agents/processors/basic.py
def log(self, response_json: Dict[str, Any]) -> None:
    """
    Set the result of the session, and log the result.
    result: The result of the session.
    response_json: The response json.
    return: The response json.
    """

    self.logger.info(json.dumps(response_json))

log_save()

Save the log.

Source code in agents/processors/basic.py
def log_save(self) -> None:
    """
    Save the log.
    """

    self._memory_data.add_values_from_dict(
        {"total_time_cost": self._total_time_cost}
    )
    self.log(self._memory_data.to_dict())

method_timer(func) classmethod

Decorator to calculate the time cost of the method.

Parameters:
  • func

    The method to be decorated.

Returns:
  • The decorated method.

Source code in agents/processors/basic.py
@classmethod
def method_timer(cls, func):
    """
    Decorator to calculate the time cost of the method.
    :param func: The method to be decorated.
    :return: The decorated method.
    """

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        start_time = time.time()
        result = func(self, *args, **kwargs)
        end_time = time.time()
        self._time_cost[func.__name__] = end_time - start_time
        return result

    return wrapper
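A usage sketch of a timing decorator of this kind follows. The names timed and TimedDemo are hypothetical, and the sketch uses time.perf_counter instead of time.time because it is monotonic. That is a substitution, not what the source does.

```python
import time
from functools import wraps


def timed(func):
    """Hypothetical timing decorator: store elapsed seconds per method name."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = func(self, *args, **kwargs)
        self.time_cost[func.__name__] = time.perf_counter() - start
        # Pass the wrapped method's return value through unchanged.
        return result

    return wrapper


class TimedDemo:
    def __init__(self) -> None:
        self.time_cost = {}

    @timed
    def get_response(self) -> str:
        time.sleep(0.01)  # stand-in for a slow LLM call
        return "ok"


demo = TimedDemo()
value = demo.get_response()
print(value, sorted(demo.time_cost))
```

Keying the timings by func.__name__ relies on functools.wraps only for the wrapper's metadata. The key itself comes from the original function object captured in the closure.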

parse_response() abstractmethod

Parse the response.

Source code in agents/processors/basic.py
@abstractmethod
def parse_response(self) -> None:
    """
    Parse the response.
    """
    pass

print_step_info() abstractmethod

Print the step information.

Source code in agents/processors/basic.py
@abstractmethod
def print_step_info(self) -> None:
    """
    Print the step information.
    """
    pass

process()

Process a single step in a round. The process includes the following steps:

  1. Print the step information.
  2. Capture the screenshot.
  3. Get the control information.
  4. Get the prompt message.
  5. Get the response.
  6. Update the cost.
  7. Parse the response.
  8. Execute the action.
  9. Update the memory.
  10. Update the step and status.
  11. Save the log.

Source code in agents/processors/basic.py
def process(self) -> None:
    """
    Process a single step in a round.
    The process includes the following steps:
    1. Print the step information.
    2. Capture the screenshot.
    3. Get the control information.
    4. Get the prompt message.
    5. Get the response.
    6. Update the cost.
    7. Parse the response.
    8. Execute the action.
    9. Update the memory.
    10. Update the step and status.
    11. Save the log.
    """

    start_time = time.time()

    try:
        # Step 1: Print the step information.
        self.print_step_info()

        # Step 2: Capture the screenshot.
        self.capture_screenshot()

        # Step 3: Get the control information.
        self.get_control_info()

        # Step 4: Get the prompt message.
        self.get_prompt_message()

        # Step 5: Get the response.
        self.get_response()

        # Step 6: Update the cost.
        self.update_cost()

        # Step 7: Parse the response, if there is no error.
        self.parse_response()

        if self.is_pending() or self.is_paused():
            # If the session is pending, update the step and memory, and return.
            if self.is_pending():
                self.update_status()
                self.update_memory()

            return

        # Step 8: Execute the action.
        self.execute_action()

        # Step 9: Update the memory.
        self.update_memory()

        # Step 10: Update the status.
        self.update_status()

        self._total_time_cost = time.time() - start_time

        # Step 11: Save the log.
        self.log_save()

    except StopIteration:
        # Error was handled and logged in the exception capture decorator.
        # Simply return here to stop the process early.

        return

resume()

Resume the process of action execution after the session is paused.

Source code in agents/processors/basic.py
def resume(self) -> None:
    """
    Resume the process of action execution after the session is paused.
    """

    self._is_resumed = True

    try:
        # Step 1: Execute the action.
        self.execute_action()

        # Step 2: Update the memory.
        self.update_memory()

        # Step 3: Update the status.
        self.update_status()

    except StopIteration:
        # Error was handled and logged in the exception capture decorator.
        # Simply return here to stop the process early.
        pass

    finally:
        self._is_resumed = False
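The interplay between pausing in process and continuing in resume can be sketched as follows. PausableProcessor is a hypothetical class, the status strings are assumptions, and the step bodies are reduced to flags. The point is only that resume runs the steps that the pause skipped.

```python
class PausableProcessor:
    """Hypothetical processor that pauses before acting when confirmation is needed."""

    def __init__(self, needs_confirmation: bool) -> None:
        self.needs_confirmation = needs_confirmation
        self.status = "CONTINUE"  # assumed initial status, for illustration
        self.executed = False

    def process(self) -> None:
        # Stand-in for parse_response: decide whether the step may proceed.
        self.status = "CONFIRM" if self.needs_confirmation else "CONTINUE"
        if self.status == "CONFIRM":
            return  # Paused: the action is deferred until resume() is called.
        self._finish_step()

    def resume(self) -> None:
        # Run only the steps skipped by the pause.
        self._finish_step()

    def _finish_step(self) -> None:
        # Stand-in for execute_action, update_memory, and update_status.
        self.executed = True
        self.status = "FINISH"


p = PausableProcessor(needs_confirmation=True)
p.process()
paused = not p.executed
p.resume()
print(paused, p.executed)
```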

string2list(string) staticmethod

Convert a string to a single-element list of strings. Any other input is returned unchanged.

Parameters:
  • string (Any) –

    The string.

Returns:
  • List[str]

    The list.

Source code in agents/processors/basic.py
@staticmethod
def string2list(string: Any) -> List[str]:
    """
    Convert a string to a list of string if the input is a string.
    :param string: The string.
    :return: The list.
    """
    if isinstance(string, str):
        return [string]
    else:
        return string
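A short usage sketch, with the same logic copied into a standalone function so that it runs without the Processor class:

```python
from typing import Any, List


def string2list(value: Any) -> List[str]:
    """Standalone copy of the logic above, for illustration only."""
    if isinstance(value, str):
        return [value]
    return value


single = string2list("Open the file")
many = string2list(["Open the file", "Save it"])
print(single, many)
```

Note that non-string inputs pass through untouched, so the List[str] return annotation only holds when the caller supplies a string or a list of strings.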

sync_memory() abstractmethod

Sync the memory of the Agent.

Source code in agents/processors/basic.py
@abstractmethod
def sync_memory(self) -> None:
    """
    Sync the memory of the Agent.
    """
    pass

update_cost()

Update the cost.

Source code in agents/processors/basic.py
def update_cost(self) -> None:
    """
    Update the cost.
    """

    self.round_cost += self.cost
    self.session_cost += self.cost

update_memory() abstractmethod

Update the memory of the Agent.

Source code in agents/processors/basic.py
@abstractmethod
def update_memory(self) -> None:
    """
    Update the memory of the Agent.
    """
    pass

update_status()

Update the status of the session.

Source code in agents/processors/basic.py
def update_status(self) -> None:
    """
    Update the status of the session.
    """
    self.agent.step += 1
    self.agent.status = self.status

    if self.status != self._agent_status_manager.FINISH.value:
        time.sleep(configs["SLEEP_TIME"])

    self.round_step += 1
    self.session_step += 1