Wrapping Your App's Native API
UFO takes actions on applications based on UI controls, but providing native API to its toolboxes can enhance the efficiency and accuracy of the actions. This document provides guidance on how to wrap your application's native API into UFO's toolboxes.
How to Wrap Your App's Native API?
Before developing the native API wrappers, we strongly recommend that you read the design of the Automator.
Step 1: Create a Receiver for the Native API
The Receiver
is a class that receives the native API calls from the AppAgent
and executes them. To wrap your application's native API, you need to create a Receiver
class that contains the methods to execute the native API calls.
To create a Receiver
class, follow these steps:
1. Create a Folder for Your Application
- Navigate to the
ufo/automator/app_api/
directory. - Create a folder named after your application.
2. Create a Python File
- Inside the folder you just created, add a Python file named after your application, for example,
{your_application}_client.py
.
3. Define the Receiver Class
- In the Python file, define a class named
{Your_Receiver}
, inheriting from theReceiverBasic
class located inufo/automator/basic.py
. - Initialize the
Your_Receiver
class with the object that executes the native API calls. For example, if your API is based on acom
object, initialize thecom
object in the__init__
method of theYour_Receiver
class.
Example of WinCOMReceiverBasic
class:
class WinCOMReceiverBasic(ReceiverBasic):
"""
The base class for Windows COM client.
"""
_command_registry: Dict[str, Type[CommandBasic]] = {}
def __init__(self, app_root_name: str, process_name: str, clsid: str) -> None:
"""
Initialize the Windows COM client.
:param app_root_name: The app root name.
:param process_name: The process name.
:param clsid: The CLSID of the COM object.
"""
self.app_root_name = app_root_name
self.process_name = process_name
self.clsid = clsid
self.client = win32com.client.Dispatch(self.clsid)
self.com_object = self.get_object_from_process_name()
4. Define Methods to Execute Native API Calls
- Define the methods in the
Your_Receiver
class to execute the native API calls.
Example of ExcelWinCOMReceiver
class:
def table2markdown(self, sheet_name: str) -> str:
"""
Convert the table in the sheet to a markdown table string.
:param sheet_name: The sheet name.
:return: The markdown table string.
"""
sheet = self.com_object.Sheets(sheet_name)
data = sheet.UsedRange()
df = pd.DataFrame(data[1:], columns=data[0])
df = df.dropna(axis=0, how="all")
df = df.applymap(self.format_value)
return df.to_markdown(index=False)
5. Create a Factory Class
- Create your Factory class inheriting from the
APIReceiverFactory
class to manage multipleReceiver
classes that share the same API type. - Implement the
create_receiver
andname
methods in theReceiverFactory
class. Thecreate_receiver
method should return theReceiver
class. - By default, the
create_receiver
takes theapp_root_name
andprocess_name
as parameters and returns theReceiver
class. - Register the
ReceiverFactory
class with the decorator@ReceiverManager.register
.
Example of the COMReceiverFactory
class:
from ufo.automator.puppeteer import ReceiverManager
@ReceiverManager.register
class COMReceiverFactory(APIReceiverFactory):
"""
The factory class for the COM receiver.
"""
def create_receiver(self, app_root_name: str, process_name: str) -> WinCOMReceiverBasic:
"""
Create the wincom receiver.
:param app_root_name: The app root name.
:param process_name: The process name.
:return: The receiver.
"""
com_receiver = self.__com_client_mapper(app_root_name)
clsid = self.__app_root_mappping(app_root_name)
if clsid is None or com_receiver is None:
print_with_color(f"Warning: Win32COM API is not supported for {process_name}.", "yellow")
return None
return com_receiver(app_root_name, process_name, clsid)
@classmethod
def name(cls) -> str:
"""
Get the name of the receiver factory.
:return: The name of the receiver factory.
"""
return "COM"
Note
The create_receiver
method should return None
if the application is not supported.
Note
You must register your ReceiverFactory
with the decorator @ReceiverManager.register
for the ReceiverManager
to manage the ReceiverFactory
.
The Receiver
class is now ready to receive the native API calls from the AppAgent
.
Step 2: Create a Command for the Native API
Commands are the actions that the AppAgent
can execute on the application. To create a command for the native API, you need to create a Command
class that contains the method to execute the native API calls.
1. Create a Command Class
- Create a
Command
class in the same Python file where theReceiver
class is located. TheCommand
class should inherit from theCommandBasic
class located inufo/automator/basic.py
.
Example:
class WinCOMCommand(CommandBasic):
"""
The abstract command interface.
"""
def __init__(self, receiver: WinCOMReceiverBasic, params=None) -> None:
"""
Initialize the command.
:param receiver: The receiver of the command.
"""
self.receiver = receiver
self.params = params if params is not None else {}
@abstractmethod
def execute(self):
pass
@classmethod
def name(cls) -> str:
"""
Get the name of the command.
:return: The name of the command.
"""
return cls.__name__
2. Define the Execute Method
- Define the
execute
method in theCommand
class to call the receiver to execute the native API calls.
Example:
def execute(self):
"""
Execute the command to insert a table.
:return: The inserted table.
"""
return self.receiver.insert_excel_table(
sheet_name=self.params.get("sheet_name", 1),
table=self.params.get("table"),
start_row=self.params.get("start_row", 1),
start_col=self.params.get("start_col", 1),
)
3. Register the Command Class:
- Register the
Command
class in the correspondingReceiver
class using the@your_receiver.register
decorator.
Example:
@ExcelWinCOMReceiver.register
class InsertExcelTable(WinCOMCommand):
...
The Command
class is now registered in the Receiver
class and available for the AppAgent
to execute the native API calls.
Step 3: Provide Prompt Descriptions for the Native API
To let the AppAgent
know the usage of the native API calls, you need to provide prompt descriptions.
1. Create an api.yaml File
- Create an `api.yaml` file in the `ufo/prompts/apps/{your_app_name}` directory.
2. Define Prompt Descriptions
- Define the prompt descriptions for the native API calls in the
api.yaml
file.
Example:
table2markdown:
summary: |-
"table2markdown" is to get the table content in a sheet of the Excel app and convert it to markdown format.
class_name: |-
GetSheetContent
usage: |-
[1] API call: table2markdown(sheet_name: str)
[2] Args:
- sheet_name: The name of the sheet in the Excel app.
[3] Example: table2markdown(sheet_name="Sheet1")
[4] Available control item: Any control item in the Excel app.
[5] Return: the markdown format string of the table content of the sheet.
Note
The table2markdown
is the name of the native API call. It MUST
match the name()
defined in the corresponding Command
class!
3. Register the Prompt Address in config_dev.yaml
- Register the prompt address by adding to the
APP_API_PROMPT_ADDRESS
field ofconfig_dev.yaml
file with the application program name as the key and the prompt file address as the value.
Example:
APP_API_PROMPT_ADDRESS: {
"WINWORD.EXE": "ufo/prompts/apps/word/api.yaml",
"EXCEL.EXE": "ufo/prompts/apps/excel/api.yaml",
"msedge.exe": "ufo/prompts/apps/web/api.yaml",
"chrome.exe": "ufo/prompts/apps/web/api.yaml"
"your_application_program_name": "YOUR_APPLICATION_API_PROMPT"
}
Note
The your_application_program_name
must match the name of the application program.
The AppAgent
can now use the prompt descriptions to understand the usage of the native API calls.
By following these steps, you will have successfully wrapped the native API of your application into UFO's toolboxes, allowing the AppAgent
to execute the native API calls on the application!