Session

A Session is a continuous conversation instance between the user and UFO, managing multiple rounds of interaction from initial request to task completion across different execution modes and platforms.

Overview

A Session represents a complete conversation workflow, containing one or more Rounds of agent execution. Sessions manage:

  1. Context: Shared state across all rounds
  2. Agents: HostAgent and AppAgent (or LinuxAgent)
  3. Rounds: Individual request-response cycles
  4. Evaluation: Optional task completion assessment
  5. Experience: Learning from successful workflows

Relationship: Session vs Round

graph TB
    subgraph "Session (Conversation)"
        S[Session Instance]
        CTX[Context<br/>Shared State]
        R1[Round 1<br/>Request 1]
        R2[Round 2<br/>Request 2]
        R3[Round 3<br/>Request 3]
        EVAL[Evaluation<br/>Optional]
    end
    subgraph "Round 1 Details"
        HOST1[HostAgent]
        APP1[AppAgent]
        CMD1[Commands]
    end
    subgraph "Round 2 Details"
        HOST2[HostAgent]
        APP2[AppAgent]
        CMD2[Commands]
    end
    S --> CTX
    S --> R1
    S --> R2
    S --> R3
    S --> EVAL
    R1 -.shares.-> CTX
    R2 -.shares.-> CTX
    R3 -.shares.-> CTX
    R1 --> HOST1
    HOST1 --> APP1
    APP1 --> CMD1
    R2 --> HOST2
    HOST2 --> APP2
    APP2 --> CMD2
    style S fill:#e1f5ff
    style CTX fill:#fff4e1
    style R1 fill:#f0ffe1
    style R2 fill:#f0ffe1
    style R3 fill:#f0ffe1
    style EVAL fill:#ffe1f5
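A minimal sketch of the Session/Round relationship, assuming a plain dict stands in for UFO's Context object; ToySession and ToyRound are hypothetical names, not UFO classes. Every round receives the same context object, so state written in one round is visible in the next.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyRound:
    """One request-response cycle that reads and writes the shared context."""
    id: int
    request: str
    context: Dict[str, Any]

    def run(self) -> str:
        # Record this round's request in the shared history.
        self.context.setdefault("history", []).append(self.request)
        return f"round {self.id} handled: {self.request}"


@dataclass
class ToySession:
    """Owns the shared context and creates rounds that reference it."""
    context: Dict[str, Any] = field(default_factory=dict)
    rounds: List[ToyRound] = field(default_factory=list)

    def create_new_round(self, request: str) -> ToyRound:
        # Pass the same dict object (not a copy) to every round.
        round_ = ToyRound(id=len(self.rounds), request=request, context=self.context)
        self.rounds.append(round_)
        return round_


session = ToySession()
for req in ["Open Word", "Save document"]:
    session.create_new_round(req).run()

print(session.context["history"])  # ['Open Word', 'Save document']
```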

Session Types

UFO supports 7 session types across Windows and Linux platforms:

| Session Type | Platform | Mode(s) | Description |
|---|---|---|---|
| Session | Windows | normal, normal_operator | Interactive with HostAgent |
| ServiceSession | Windows | service | WebSocket-controlled via AIP |
| FollowerSession | Windows | follower | Replays saved plans |
| FromFileSession | Windows | batch_normal | Executes from request files |
| OpenAIOperatorSession | Windows | operator | Pure operator mode |
| LinuxSession | Linux | normal, normal_operator | Interactive without HostAgent |
| LinuxServiceSession | Linux | service | WebSocket-controlled on Linux |
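The table above can be read as a lookup from (platform, mode) to session class. The sketch below encodes it as a dictionary; resolve_session_type is a hypothetical helper for illustration, not the actual logic of SessionFactory.

```python
from typing import Dict, Tuple

# (platform, mode) -> session class name, transcribed from the table above.
SESSION_TYPES: Dict[Tuple[str, str], str] = {
    ("windows", "normal"): "Session",
    ("windows", "normal_operator"): "Session",
    ("windows", "service"): "ServiceSession",
    ("windows", "follower"): "FollowerSession",
    ("windows", "batch_normal"): "FromFileSession",
    ("windows", "operator"): "OpenAIOperatorSession",
    ("linux", "normal"): "LinuxSession",
    ("linux", "normal_operator"): "LinuxSession",
    ("linux", "service"): "LinuxServiceSession",
}


def resolve_session_type(platform: str, mode: str) -> str:
    """Return the session class name for a platform/mode pair, or raise ValueError."""
    key = (platform.lower(), mode)
    if key not in SESSION_TYPES:
        raise ValueError(f"mode {mode!r} is not supported on {platform!r}")
    return SESSION_TYPES[key]


print(resolve_session_type("Windows", "batch_normal"))  # FromFileSession
```

Modes that only exist on Windows (follower, batch_normal, operator) raise for Linux, matching the platform split in the table.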

Class Hierarchy

graph TB
    BASE[BaseSession<br/>Abstract]
    WIN_BASE[WindowsBaseSession<br/>with HostAgent]
    LINUX_BASE[LinuxBaseSession<br/>without HostAgent]
    SESSION[Session<br/>Interactive]
    SERVICE[ServiceSession<br/>WebSocket]
    FOLLOWER[FollowerSession<br/>Plan Replay]
    FROMFILE[FromFileSession<br/>Batch]
    OPERATOR[OpenAIOperatorSession<br/>Operator]
    LINUX_SESS[LinuxSession<br/>Interactive]
    LINUX_SERVICE[LinuxServiceSession<br/>WebSocket]
    BASE --> WIN_BASE
    BASE --> LINUX_BASE
    WIN_BASE --> SESSION
    WIN_BASE --> SERVICE
    WIN_BASE --> FOLLOWER
    WIN_BASE --> FROMFILE
    WIN_BASE --> OPERATOR
    LINUX_BASE --> LINUX_SESS
    LINUX_BASE --> LINUX_SERVICE
    style BASE fill:#e1f5ff
    style WIN_BASE fill:#fff4e1
    style LINUX_BASE fill:#f0ffe1
    style SESSION fill:#e1ffe1
    style LINUX_SESS fill:#e1ffe1

Platform Base Classes

  • WindowsBaseSession: Creates HostAgent, supports two-tier architecture
  • LinuxBaseSession: Single-tier architecture with LinuxAgent only

Session Lifecycle

Standard Lifecycle

stateDiagram-v2
    [*] --> Initialized: __init__
    Initialized --> ContextReady: _init_context
    ContextReady --> Running: run()
    Running --> RoundCreate: create_new_round
    RoundCreate --> RoundExecute: round.run()
    RoundExecute --> RoundComplete: Round finishes
    RoundComplete --> CheckMore: is_finished?
    CheckMore --> RoundCreate: More requests
    CheckMore --> Snapshot: No more requests
    Snapshot --> Evaluation: capture_last_snapshot
    Evaluation --> CostPrint: evaluation() if enabled
    CostPrint --> [*]: Session complete

Core Execution Loop

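The main session logic (condensed from BaseSession.run()):

async def run(self) -> List[Dict[str, str]]:
    """
    Run the session.
    :return: The request/result pair of every round.
    """

    while not self.is_finished():
        # Create a new round for each request; None means no more requests
        round = self.create_new_round()
        if round is None:
            break

        # Execute the round and record its result
        round_result = await round.run()
        self.results.append({"request": round.request, "result": round_result})

    # Capture the final state (skipped internally if no window is tracked)
    await self.capture_last_snapshot()

    # Evaluate if configured; evaluation() is synchronous
    if self._should_evaluate and not self.is_error():
        self.evaluation()

    # Optionally export the trajectory as markdown
    if ufo_config.system.log_to_markdown:
        self.save_log_to_markdown()

    # Print cost summary
    self.print_cost()

    return self.results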
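To see the ordering of the stages in isolation, here is a self-contained asyncio sketch that mirrors the loop above with stubbed rounds. StubSession, StubRound and the events list are hypothetical; they only record which stage ran. As in the real loop, evaluation is skipped when the session ends in an error state.

```python
import asyncio
from typing import List, Optional


class StubRound:
    def __init__(self, request: str, fail: bool = False) -> None:
        self.request = request
        self.fail = fail

    async def run(self) -> str:
        await asyncio.sleep(0)  # stand-in for agent work
        return "error" if self.fail else "ok"


class StubSession:
    def __init__(self, requests: List[str], should_evaluate: bool = True,
                 fail_on: Optional[str] = None) -> None:
        self._pending = list(requests)
        self._should_evaluate = should_evaluate
        self._fail_on = fail_on
        self._error = False
        self.events: List[str] = []
        self.results: List[dict] = []

    def is_error(self) -> bool:
        return self._error

    def is_finished(self) -> bool:
        # Stop on error or when no requests remain.
        return self._error or not self._pending

    def create_new_round(self) -> Optional[StubRound]:
        if not self._pending:
            return None
        request = self._pending.pop(0)
        return StubRound(request, fail=(request == self._fail_on))

    async def run(self) -> List[dict]:
        while not self.is_finished():
            round_ = self.create_new_round()
            if round_ is None:
                break
            result = await round_.run()
            self._error = result == "error"
            self.events.append(f"round:{round_.request}")
            self.results.append({"request": round_.request, "result": result})
        self.events.append("snapshot")
        if self._should_evaluate and not self.is_error():
            self.events.append("evaluation")
        self.events.append("cost")
        return self.results


ok = StubSession(["Open Word", "Save document"])
asyncio.run(ok.run())
print(ok.events)
# ['round:Open Word', 'round:Save document', 'snapshot', 'evaluation', 'cost']
```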

Lifecycle Stages

1. Initialization

session = Session(
    task="email_task",
    should_evaluate=True,
    id=0,
    request="Send an email to John",
    mode="normal"
)

What happens:
  • Task name assigned and log folder logs/<task>/ created
  • Session ID set
  • Initial request stored (an empty request means the user is prompted)
  • Mode configured

2. Context Initialization

def _init_context(self) -> None:
    """Initialize the session context."""
    super()._init_context()

    # Create MCP server manager
    mcp_server_manager = MCPServerManager()

    # Create local dispatcher
    command_dispatcher = LocalCommandDispatcher(
        session=self,
        mcp_server_manager=mcp_server_manager
    )

    # Attach to context
    self.context.attach_command_dispatcher(command_dispatcher)

What happens:
  • Context object created
  • Command dispatcher attached (Local or WebSocket)
  • MCP servers initialized (if applicable)
  • Application window tracked
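The context holds a command dispatcher that later code calls as context.command_dispatcher.execute_commands([...]) (see capture_last_screenshot in the reference below). A minimal local dispatcher might look like the sketch below; ToyCommand, ToyResult, ToyLocalDispatcher and the in-process tool registry are hypothetical stand-ins for Command, the MCP server manager and LocalCommandDispatcher.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List


@dataclass
class ToyCommand:
    tool_name: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    tool_type: str = "action"


@dataclass
class ToyResult:
    status: str
    result: Any = None


class ToyLocalDispatcher:
    """Runs each command against an in-process tool registry, in order."""

    def __init__(self, tools: Dict[str, Callable[..., Awaitable[Any]]]) -> None:
        self._tools = tools

    async def execute_commands(self, commands: List[ToyCommand]) -> List[ToyResult]:
        results: List[ToyResult] = []
        for command in commands:
            tool = self._tools.get(command.tool_name)
            if tool is None:
                # Unknown tools produce a failure result instead of raising.
                results.append(ToyResult("failure", f"unknown tool {command.tool_name}"))
                continue
            results.append(ToyResult("success", await tool(**command.parameters)))
        return results


async def fake_window_screenshot() -> str:
    return "base64-image-data"


dispatcher = ToyLocalDispatcher({"capture_window_screenshot": fake_window_screenshot})
out = asyncio.run(dispatcher.execute_commands([
    ToyCommand("capture_window_screenshot", tool_type="data_collection"),
    ToyCommand("get_ui_tree", tool_type="data_collection"),
]))
print([r.status for r in out])  # ['success', 'failure']
```

Returning one result per command, in order, is what lets callers index result[0].result as the reference snapshot code does.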

3. Round Creation

def create_new_round(self):
    """Create a new round."""

    # Get request (first or new)
    if not self.context.get(ContextNames.REQUEST):
        request = first_request()
    else:
        request, complete = new_request()
        if complete:
            return None

    # Create round with request
    round = Round(
        task=self.task,
        context=self.context,
        request=request,
        id=self._round_num
    )

    self._round_num += 1
    return round

What happens:
  • User prompted for a request (interactive modes)
  • Or request read from a file or plan (non-interactive modes)
  • Round object created with the shared context
  • Round counter incremented
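The interactive request logic (first request, then follow-ups until the user types "N") can be simulated with an iterator standing in for console input. collect_requests is a hypothetical helper; the real first_request() and new_request() prompts differ in wording and behavior.

```python
from typing import Iterable, Iterator, List


def collect_requests(user_inputs: Iterable[str], initial: str = "") -> List[str]:
    """Return the requests a session would run, stopping when the user enters 'N'."""
    answers: Iterator[str] = iter(user_inputs)
    requests: List[str] = []
    # First round: use the provided request, otherwise "prompt" for one.
    first = initial or next(answers, "")
    if not first:
        return requests
    requests.append(first)
    # Later rounds: keep asking until the user types N or input runs out.
    for answer in answers:
        if answer.strip().upper() == "N":
            break
        requests.append(answer)
    return requests


print(collect_requests(["Open Word", "Save document", "N", "ignored"]))
# ['Open Word', 'Save document']
```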

4. Round Execution

await round.run()

What happens:
  • HostAgent selects the application (Windows)
  • AppAgent executes in the application (or LinuxAgent acts directly)
  • Commands dispatched and executed
  • Results captured in the context
  • Experience logged

5. Continuation Check

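def is_finished(self) -> bool:
    """Check if the session is complete."""
    # Explicit finish flag, or step/round budgets exhausted
    if (
        self._finish
        or self.step >= ufo_config.system.max_step
        or self.total_rounds >= ufo_config.system.max_round
    ):
        return True
    # An error in the current round also ends the session
    return self.is_error()

What happens:
  • Finish flag checked
  • Step budget (max_step) and round budget (max_round) checked
  • Error state of the current round checked
  • Whether the user wants another request is decided separately: create_new_round() returns None, which breaks the loop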
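The termination rule in BaseSession.is_finished() (see the reference below) combines an explicit finish flag, step and round budgets, and the error state. Extracted as a pure function for illustration; the function name and its explicit parameters are hypothetical, since the real method reads these values from the session and from ufo_config.system.

```python
def session_should_stop(
    finish_flag: bool,
    step: int,
    total_rounds: int,
    max_step: int,
    max_round: int,
    in_error: bool,
) -> bool:
    """Mirror BaseSession.is_finished(): stop on flag, budget exhaustion, or error."""
    if finish_flag or step >= max_step or total_rounds >= max_round:
        return True
    return in_error


print(session_should_stop(False, step=12, total_rounds=1, max_step=50, max_round=10, in_error=False))  # False
```

Note that the budgets use >=, so a session with max_round=10 stops before starting an eleventh round.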

6. Final Snapshot

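async def capture_last_snapshot(self) -> None:
    """Capture the last screenshot and, if configured, the UI tree and full desktop."""
    screenshot_save_path = self.log_path + "action_step_final.png"

    # Nothing to capture if no application window was ever selected
    if self.application_window is None and self.application_window_info is None:
        return

    # Screenshot of the active application window
    await self.capture_last_screenshot(screenshot_save_path)

    # UI tree of the final state
    if ufo_config.system.save_ui_tree:
        ui_tree_save_path = os.path.join(self.log_path, "ui_trees", "ui_tree_final.json")
        await self.capture_last_ui_tree(ui_tree_save_path)

    # Screenshot of the whole desktop
    if ufo_config.system.save_full_screen:
        await self.capture_last_screenshot(
            self.log_path + "desktop_final.png", full_screen=True
        )

What happens:
  • Application window screenshot saved as action_step_final.png
  • UI tree saved as ui_trees/ui_tree_final.json (if save_ui_tree is enabled)
  • Full-desktop screenshot saved as desktop_final.png (if save_full_screen is enabled)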
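The files written at the end of a session depend on two configuration flags. This sketch computes the resulting paths using the file names from the reference capture_last_snapshot(); final_snapshot_paths is a hypothetical helper, and its boolean arguments stand in for ufo_config.system.save_ui_tree and ufo_config.system.save_full_screen.

```python
import os
from typing import List


def final_snapshot_paths(log_path: str, save_ui_tree: bool, save_full_screen: bool) -> List[str]:
    """List the artifacts the final snapshot would write under log_path."""
    # The window screenshot is always attempted when a window is tracked.
    paths = [log_path + "action_step_final.png"]
    if save_ui_tree:
        paths.append(os.path.join(log_path, "ui_trees", "ui_tree_final.json"))
    if save_full_screen:
        paths.append(log_path + "desktop_final.png")
    return paths


print(final_snapshot_paths("logs/email_task/", save_ui_tree=False, save_full_screen=True))
# ['logs/email_task/action_step_final.png', 'logs/email_task/desktop_final.png']
```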

7. Evaluation

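def evaluation(self) -> None:
    """Evaluate the session (synchronous)."""
    evaluator = EvaluationAgent(
        name="eva_agent",
        is_visual=ufo_config.evaluation_agent.visual_mode,
        main_prompt=ufo_config.system.EVALUATION_PROMPT,
        example_prompt="",
    )
    requests = self.request_to_evaluate()

    # Try the configured screenshot setting; on failure, retry without all screenshots
    try:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=ufo_config.system.eva_all_screenshots,
            context=self.context,
        )
    except Exception:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=False,
            context=self.context,
        )

    result.update({"level": "session", "request": requests, "type": "evaluation_result"})
    self._results.append(result)
    self.cost += cost
    self.evaluation_logger.write(json.dumps(result))

What happens:
  • EvaluationAgent created
  • Requests to evaluate collected via request_to_evaluate()
  • Task completion assessed, retrying without full screenshot evaluation if the first attempt fails
  • Result appended to the session results and written to evaluation.log
  • Evaluation cost added to the session cost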
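The reference evaluation() retries once with screenshot evaluation disabled if the first attempt raises. A standalone sketch of that fallback pattern; evaluate_with_fallback is hypothetical, flaky_evaluate is a fake evaluator that rejects full-screenshot runs, and the "complete" key in its result is invented for the example.

```python
from typing import Any, Callable, Dict, Tuple

EvalFn = Callable[..., Tuple[Dict[str, Any], float]]


def evaluate_with_fallback(evaluate: EvalFn, request: str,
                           eva_all_screenshots: bool) -> Tuple[Dict[str, Any], float]:
    """Try the configured setting first; on any exception retry without all screenshots."""
    try:
        result, cost = evaluate(request=request, eva_all_screenshots=eva_all_screenshots)
    except Exception:
        result, cost = evaluate(request=request, eva_all_screenshots=False)
    # Tag the result the same way the session does before logging it.
    result.update({"level": "session", "request": request, "type": "evaluation_result"})
    return result, cost


def flaky_evaluate(request: str, eva_all_screenshots: bool) -> Tuple[Dict[str, Any], float]:
    if eva_all_screenshots:
        raise RuntimeError("too many images for the model context")
    return {"complete": "yes"}, 0.01


result, cost = evaluate_with_fallback(flaky_evaluate, "Send an email to John", eva_all_screenshots=True)
print(result["complete"], result["level"], cost)  # yes session 0.01
```

Only one retry is attempted; if the reduced evaluation also fails, the exception propagates to the caller.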

8. Cost Summary

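def print_cost(self) -> None:
    """Print the total cost of the session."""

    if isinstance(self.cost, float) and self.cost > 0:
        # Accumulated session cost (evaluation and experience summarization add to it)
        formatted_cost = "${:.2f}".format(self.cost)
        console.print(f"💰 Total request cost of the session: {formatted_cost}", style="yellow")
    else:
        # Some models do not report cost
        console.print(
            f"ℹ️  Cost is not available for the model {ufo_config.host_agent.api_model} or {ufo_config.app_agent.api_model}.",
            style="yellow",
        )
        self.logger.warning("Cost information is not available.")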
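The reference print_cost() formats the accumulated cost to two decimal places and falls back to a notice when no positive float cost is available. Extracted as a pure function (hypothetical name, simplified fallback text) so the formatting can be checked:

```python
from typing import Optional


def cost_message(cost: Optional[float]) -> str:
    """Format the session cost like print_cost(), or explain why it is missing."""
    if isinstance(cost, float) and cost > 0:
        return "Total request cost of the session: " + "${:.2f}".format(cost)
    return "Cost is not available for the configured models."


print(cost_message(0.0234))  # Total request cost of the session: $0.02
```

Because of the isinstance check, an integer 0 or a None cost both take the fallback branch, and very small costs round to two decimals.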

Execution Modes

Normal Mode

Interactive execution with user in the loop:

session = Session(
    task="document_edit",
    should_evaluate=True,
    id=0,
    request="",  # Will prompt user
    mode="normal"
)

await session.run()

Features:
  • User prompted for the initial request via first_request()
  • User prompted for each new request via new_request()
  • Commands executed locally via LocalCommandDispatcher
  • User can exit anytime by typing "N"

Flow:

1. Display welcome panel
2. User enters: "Open Word"
3. HostAgent selects Word application
4. AppAgent types content
5. User asked: "What next?"
6. User enters: "Save document"
7. AppAgent saves file
8. User asked: "What next?"
9. User enters: "N" (exit)
10. Session ends

Normal_Operator Mode

Normal mode with operator capabilities:

session = Session(
    task="complex_workflow",
    should_evaluate=True,
    id=0,
    request="Organize my files by date",
    mode="normal_operator"
)

Differences from Normal:
  • Agent can use operator-level actions
  • More powerful command set
  • Same interactive workflow

Service Mode

WebSocket-controlled remote execution:

from aip.protocol.task_execution import TaskExecutionProtocol

protocol = TaskExecutionProtocol(websocket_connection)

session = ServiceSession(
    task="remote_automation",
    should_evaluate=True,
    id="session_abc123",
    request="Click Submit button",
    task_protocol=protocol
)

await session.run()

Features:
  • No user interaction prompts
  • Single request per session
  • Commands sent via WebSocket
  • Results returned to server
  • Uses WebSocketCommandDispatcher

Flow:

1. Server sends request via WebSocket
2. ServiceSession created
3. Agent generates commands
4. Commands sent to client via WebSocket
5. Client executes locally
6. Results sent back
7. Session finishes immediately
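The service flow above is a request/response exchange: the session side sends commands and waits for the client's results. The sketch simulates both ends with asyncio.Queue objects in place of a WebSocket connection; the message fields ("type", "commands", "results") and the command string are hypothetical and are not the AIP wire format.

```python
import asyncio
from typing import Any, Dict, List


async def server_side(to_client: asyncio.Queue, to_server: asyncio.Queue,
                      commands: List[str]) -> List[Dict[str, Any]]:
    """Send one batch of commands and wait for the matching results."""
    await to_client.put({"type": "commands", "commands": commands})
    reply = await to_server.get()
    return reply["results"]


async def client_side(to_client: asyncio.Queue, to_server: asyncio.Queue) -> None:
    """Receive commands, 'execute' them locally, and send the results back."""
    message = await to_client.get()
    results = [{"command": c, "status": "success"} for c in message["commands"]]
    await to_server.put({"type": "results", "results": results})


async def main() -> List[Dict[str, Any]]:
    to_client: asyncio.Queue = asyncio.Queue()
    to_server: asyncio.Queue = asyncio.Queue()
    # Run both ends concurrently; gather returns values in argument order.
    results, _ = await asyncio.gather(
        server_side(to_client, to_server, ["click:Submit"]),
        client_side(to_client, to_server),
    )
    return results


print(asyncio.run(main()))
# [{'command': 'click:Submit', 'status': 'success'}]
```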

Key Difference:

def is_finished(self) -> bool:
    """Service session finishes after one round."""
    return self._round_num > 0

Follower Mode

Replay saved action plans:

session = FollowerSession(
    task="email_replay",
    plan_file="/plans/send_email.json",
    should_evaluate=True,
    id=0
)

await session.run()

Features:
  • No user prompts
  • Reads actions from the plan file
  • Deterministic execution
  • Good for testing and demos

Plan File Format:

{
  "request": "Send an email to John",
  "actions": [
    {
      "agent": "HostAgent",
      "action": "select_application",
      "parameters": {"app_name": "Outlook"}
    },
    {
      "agent": "AppAgent",
      "action": "click_element",
      "parameters": {"label": "New Email"}
    }
  ]
}
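Replaying a plan starts with loading it and walking its actions in order. This sketch assumes the JSON layout shown above and uses a hypothetical load_plan helper that validates each step before replay; it is not FollowerSession's actual loader.

```python
import json
from typing import Any, Dict, List, Tuple

REQUIRED_KEYS = {"agent", "action", "parameters"}


def load_plan(raw: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Parse plan JSON and check that every action carries the expected keys."""
    plan = json.loads(raw)
    actions = plan.get("actions", [])
    for index, step in enumerate(actions):
        missing = REQUIRED_KEYS - set(step)
        if missing:
            raise ValueError(f"action {index} is missing {sorted(missing)}")
    return plan["request"], actions


raw_plan = json.dumps({
    "request": "Send an email to John",
    "actions": [
        {"agent": "HostAgent", "action": "select_application", "parameters": {"app_name": "Outlook"}},
        {"agent": "AppAgent", "action": "click_element", "parameters": {"label": "New Email"}},
    ],
})

request, actions = load_plan(raw_plan)
for step in actions:
    print(f'{step["agent"]}: {step["action"]}({step["parameters"]})')
```

Validating the whole plan up front means a malformed step fails before any action has been replayed.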

Batch_Normal Mode

Execute multiple requests from files:

session = FromFileSession(
    task="batch_task",
    plan_file="/requests/task1.json",
    should_evaluate=True,
    id=0
)

await session.run()

Features:
  • Request loaded from file
  • No user interaction
  • Multiple files can be batched with SessionPool
  • Task status tracking available

Request File:

{
  "request": "Create a spreadsheet with sales data"
}
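In batch mode each request file is handled independently, and the Batch Processing practices later on this page recommend logging failures rather than stopping the batch. A minimal sketch, assuming request files use the single-key layout above; run_batch and its execute callback are hypothetical and not part of SessionPool.

```python
import json
from typing import Callable, Dict, List


def run_batch(files: Dict[str, str], execute: Callable[[str], None]) -> Dict[str, List[str]]:
    """Run every request file, recording successes and failures instead of aborting."""
    status: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    for name, raw in files.items():
        try:
            request = json.loads(raw)["request"]
            execute(request)
            status["succeeded"].append(name)
        except Exception as exc:  # keep going; record the failure for a retry
            status["failed"].append(f"{name}: {type(exc).__name__}")
    return status


files = {
    "task1.json": '{"request": "Create a spreadsheet with sales data"}',
    "task2.json": '{"req": "typo in key"}',
    "task3.json": '{"request": "Rename Sheet1 to Sales"}',
}
print(run_batch(files, execute=lambda request: None))
# {'succeeded': ['task1.json', 'task3.json'], 'failed': ['task2.json: KeyError']}
```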

Operator Mode

Pure operator-level execution:

session = OpenAIOperatorSession(
    task="system_automation",
    should_evaluate=True,
    id=0,
    request="Install and configure software"
)

await session.run()

Features:
  • Operator-level permissions
  • Can modify system settings
  • More powerful than AppAgent
  • Same interactive prompts as normal mode


Platform-Specific Sessions

Windows Sessions

Characteristics:
  • Two-tier architecture: HostAgent → AppAgent
  • Base class: WindowsBaseSession
  • Agent flow: HostAgent selects the app, AppAgent controls it
  • Automation: Uses UIA (UI Automation)

Example:

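class Session(WindowsBaseSession):
    """Windows interactive session (simplified)."""

    def _init_context(self):
        """Attach MCP servers and the LocalCommandDispatcher."""
        super()._init_context()
        self.setup_command_dispatcher()

    def _init_agents(self):
        """Create the HostAgent (provided by WindowsBaseSession)."""
        # BaseSession.__init__ resets _host_agent to None after _init_context(),
        # so agents are created here; host_agent itself is a read-only property
        self._host_agent = self.create_host_agent()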

Linux Sessions

Characteristics:
  • Single-tier architecture: LinuxAgent only (no HostAgent)
  • Base class: LinuxBaseSession
  • Agent flow: LinuxAgent controls the application directly
  • Automation: Platform-specific tools

Example:

class LinuxSession(LinuxBaseSession):
    """Linux interactive session."""

    def _init_context(self):
        """Initialize without HostAgent."""
        super()._init_context()

        # No HostAgent - direct LinuxAgent usage
        self.linux_agent = self.create_linux_agent(
            application_name=self.application_name
        )

Comparison:

| Aspect | Windows | Linux |
|---|---|---|
| Architecture | Two-tier (HostAgent + AppAgent) | Single-tier (LinuxAgent) |
| Application Selection | HostAgent decides | Pre-specified or LinuxAgent decides |
| Agent Switching | Yes (HostAgent ↔ AppAgent) | No |
| Modes Supported | normal, normal_operator, service, follower, batch_normal, operator | normal, normal_operator, service |
| UI Automation | UIA (UI Automation) | Platform tools |

See Platform Sessions for detailed comparison.


Experience Saving

Sessions can save successful workflows for future learning:

# After successful task completion
if self.configs["SAVE_EXPERIENCE"] == "ask":
    save = experience_asker()

    if save:
        self.experience_saver()

Save Modes:

| Mode | Behavior |
|---|---|
| always | Auto-save every successful session |
| ask | Prompt user after each session |
| auto | Save if evaluation score > threshold |
| always_not | Never save |
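The four save modes reduce to a small decision function. This sketch is hypothetical: the default threshold of 0.8 and the ask callback (standing in for experience_asker()) are assumptions, and the real configuration keys may differ.

```python
from typing import Callable, Optional


def should_save_experience(
    mode: str,
    success: bool,
    score: Optional[float] = None,
    threshold: float = 0.8,  # assumed value for illustration
    ask: Callable[[], bool] = lambda: False,
) -> bool:
    """Decide whether to persist a trajectory under the given save mode."""
    if mode == "always_not":
        return False
    if mode == "always":
        return success
    if mode == "ask":
        return ask()
    if mode == "auto":
        return score is not None and score > threshold
    raise ValueError(f"unknown save mode: {mode!r}")


print(should_save_experience("auto", success=True, score=0.95))  # True
```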

Saved Experience Structure:

{
  "task": "Send email",
  "request": "Send an email to John about the meeting",
  "trajectory": [
    {
      "round": 0,
      "agent": "HostAgent",
      "observation": "Desktop with Outlook icon",
      "action": "select_application",
      "parameters": {"app_name": "Outlook"}
    },
    {
      "round": 0,
      "agent": "AppAgent",
      "observation": "Outlook main window",
      "action": "click_element",
      "parameters": {"label": "New Email"}
    }
  ],
  "outcome": "success",
  "evaluation_score": 0.95,
  "cost": 0.0234,
  "tokens": 1542
}

Error Handling

Error States

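BaseSession derives its error state from the agent status of the current round (see is_error() in the reference below):

def is_error(self) -> bool:
    """Check if the session is in error state."""
    if self.current_round is not None:
        return self.current_round.state.name() == AgentStatus.ERROR.value
    return False

def set_error(self, error_message: str):
    """Record an error message in the context (illustrative helper, not a BaseSession method)."""
    self.context.set(ContextNames.ERROR, True)
    self.context.set(ContextNames.ERROR_MESSAGE, error_message)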

Error Recovery

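A recovery pattern inside the round loop might look like this (AgentError, set_error() and can_recover() are illustrative):

while not self.is_finished():
    round = self.create_new_round()
    if round is None:
        break

    try:
        await round.run()
    except AgentError as e:
        self.set_error(str(e))
        self.logger.error(f"Round {self.total_rounds} failed: {e}")

        # Decide whether to continue or abort
        if self.can_recover(e):
            continue  # Try the next round
        break  # Abort the session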

Common Errors

| Error Type | Cause | Handling |
|---|---|---|
| TimeoutError | Command execution timeout | Retry or skip |
| ConnectionError | WebSocket/MCP disconnection | Reconnect or abort |
| AgentError | Agent decision failure | Log and retry |
| ValidationError | Invalid command parameters | Skip command |
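The table can be expressed as an exception-to-policy mapping. TimeoutError and ConnectionError are Python built-ins (so subclasses such as ConnectionResetError match ConnectionError); AgentError and ValidationError are defined locally as placeholders, and handling_for plus the policy strings are hypothetical.

```python
class AgentError(Exception):
    """Placeholder for an agent decision failure."""


class ValidationError(Exception):
    """Placeholder for invalid command parameters."""


# Checked in order: the first matching entry wins.
POLICIES = [
    (TimeoutError, "retry_or_skip"),
    (ConnectionError, "reconnect_or_abort"),
    (AgentError, "log_and_retry"),
    (ValidationError, "skip_command"),
]


def handling_for(error: BaseException) -> str:
    """Return the handling policy for an error, defaulting to aborting the session."""
    for error_type, policy in POLICIES:
        if isinstance(error, error_type):
            return policy
    return "abort"


print(handling_for(ConnectionResetError()))  # reconnect_or_abort
```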

Best Practices

Session Creation

Efficient Sessions

  • ✅ Use SessionFactory.create_session() for platform-aware creation
  • ✅ Enable evaluation for quality tracking
  • ✅ Choose appropriate mode for use case
  • ✅ Set meaningful task names for logging
  • ❌ Don't create sessions directly (use factory)
  • ❌ Don't mix modes (each session has one mode)

Interactive Sessions

User Experience

  • ✅ Provide clear initial requests
  • ✅ Allow users to exit gracefully ("N" option)
  • ✅ Show progress and confirmations
  • ✅ Handle sensitive actions with confirmation
  • ❌ Don't prompt excessively
  • ❌ Don't hide errors from users

Service Sessions

WebSocket Considerations

  • ✅ Always provide task_protocol
  • ✅ Handle connection loss gracefully
  • ✅ Set appropriate timeouts
  • ✅ Validate requests before execution
  • ❌ Don't assume connection is stable
  • ❌ Don't block waiting for results indefinitely
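One way to avoid waiting indefinitely is to bound each wait with asyncio.wait_for. In this sketch slow_client_result is a hypothetical stand-in for awaiting a result over the connection; if it does not arrive in time, asyncio.TimeoutError is raised (an alias of the built-in TimeoutError since Python 3.11) and the result is treated as missing.

```python
import asyncio
from typing import Optional


async def slow_client_result(delay: float) -> str:
    await asyncio.sleep(delay)  # stand-in for waiting on the client over WebSocket
    return "done"


async def await_result(delay: float, timeout: float) -> Optional[str]:
    """Wait for a result but give up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(slow_client_result(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Treat the command as failed instead of blocking the session forever.
        return None


print(asyncio.run(await_result(delay=0.01, timeout=1.0)))  # done
print(asyncio.run(await_result(delay=1.0, timeout=0.05)))  # None
```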

Batch Sessions

Batch Processing

  • ✅ Enable task status tracking
  • ✅ Use descriptive file names
  • ✅ Group similar tasks
  • ✅ Log failures for retry
  • ❌ Don't stop batch on first failure
  • ❌ Don't run too many sessions in parallel
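To cap how many sessions run at once, an asyncio.Semaphore can gate concurrent runs. A minimal sketch; fake_session is a hypothetical placeholder for awaiting session.run(), and the function reports the peak number of sessions that were in flight simultaneously.

```python
import asyncio
from typing import List


async def run_bounded(tasks: List[str], limit: int) -> int:
    """Run all tasks with at most `limit` in flight; return the peak concurrency seen."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def fake_session(name: str) -> None:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # placeholder for `await session.run()`
            running -= 1

    await asyncio.gather(*(fake_session(t) for t in tasks))
    return peak


print(asyncio.run(run_bounded([f"task{i}" for i in range(6)], limit=2)))  # 2
```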

Examples

Example 1: Basic Interactive Session

import asyncio

from ufo.module.sessions.session import Session

# Create session
session = Session(
    task="word_editing",
    should_evaluate=True,
    id=0,
    request="",  # Will prompt user
    mode="normal"
)

# Run session (asyncio.run is needed outside an event loop)
asyncio.run(session.run())

# User interaction:
# 1. Welcome panel shown
# 2. User enters: "Open Word and type Hello World"
# 3. HostAgent selects Word
# 4. AppAgent types text
# 5. User asked for next request
# 6. User enters: "N" to exit
# 7. Session evaluates and ends

Example 2: Service Session

from ufo.module.sessions.service_session import ServiceSession
from aip.protocol.task_execution import TaskExecutionProtocol

# WebSocket established
protocol = TaskExecutionProtocol(websocket)

# Create service session
session = ServiceSession(
    task="remote_click",
    should_evaluate=False,  # Server evaluates
    id="sess_12345",
    request="Click the Submit button",
    task_protocol=protocol
)

# Run (non-blocking for client)
await session.run()

# Session finishes after one request

Example 3: Follower Session

import asyncio

from ufo.module.sessions.session import FollowerSession

# Replay saved plan
session = FollowerSession(
    task="email_demo",
    plan_file="./plans/send_email.json",
    should_evaluate=True,
    id=0
)

asyncio.run(session.run())

# Executes exactly as recorded in plan file
# No user prompts
# Deterministic execution

Example 4: Linux Session

import asyncio

from ufo.module.sessions.linux_session import LinuxSession

# Linux interactive session
session = LinuxSession(
    task="linux_task",
    should_evaluate=True,
    id=0,
    request="Open gedit and type Hello Linux",
    mode="normal",
    application_name="gedit"
)

asyncio.run(session.run())

# Single-tier architecture
# No HostAgent
# LinuxAgent controls gedit directly

Reference

BaseSession

Bases: ABC

A basic session in UFO. A session consists of multiple rounds of interactions and conversations.

Initialize a session.

Parameters:
  • task (str) –

    The name of current task.

  • should_evaluate (bool) –

    Whether to evaluate the session.

  • id (str) –

    The id of the session.

Source code in module/basic.py
def __init__(self, task: str, should_evaluate: bool, id: str) -> None:
    """
    Initialize a session.
    :param task: The name of current task.
    :param should_evaluate: Whether to evaluate the session.
    :param id: The id of the session.
    """

    self._should_evaluate = should_evaluate
    self._id = id
    self.task = task

    # Logging-related properties
    self.log_path = f"logs/{task}/"
    utils.create_folder(self.log_path)

    self._rounds: Dict[int, BaseRound] = {}

    self._context = Context()
    self._init_context()
    self._finish = False
    self._results = []
    self.logger = logging.getLogger(__name__)

    # Initialize platform-specific agents
    # Subclasses should override _init_agents() to set up their agents
    self._host_agent: Optional[HostAgent] = None
    self._init_agents()

application_window property writable

Get the application of the session. return: The application of the session.

application_window_info property writable

Get the application window info of the session. return: The application window info of the session.

context property

Get the context of the session. return: The context of the session.

cost property writable

Get the cost of the session. return: The cost of the session.

current_agent_class property

Get the class name of the current agent. return: The class name of the current agent.

current_round property

Get the current round of the session. return: The current round of the session.

evaluation_logger property

Get the file writer for evaluation. return: The file writer for evaluation.

host_agent property

Get the host agent of the session. May return None for sessions that don't use a host agent (e.g., Linux).

Returns:
  • Optional[HostAgent]

    The host agent of the session, or None if not applicable.

id property

Get the id of the session. return: The id of the session.

results property writable

Get the evaluation results of the session. return: The evaluation results of the session.

rounds property

Get the rounds of the session. return: The rounds of the session.

session_type property

Get the class name of the session. return: The class name of the session.

step property

Get the step of the session. return: The step of the session.

total_rounds property

Get the total number of rounds in the session. return: The total number of rounds in the session.

add_round(id, round)

Add a round to the session.

Parameters:
  • id (int) –

    The id of the round.

  • round (BaseRound) –

    The round to be added.

Source code in module/basic.py
def add_round(self, id: int, round: BaseRound) -> None:
    """
    Add a round to the session.
    :param id: The id of the round.
    :param round: The round to be added.
    """
    self._rounds[id] = round

capture_last_screenshot(save_path, full_screen=False) async

Capture the last window screenshot.

Parameters:
  • save_path (str) –

    The path to save the window screenshot.

  • full_screen (bool, default: False ) –

    Whether to capture the full screen or just the active window.

Source code in module/basic.py
async def capture_last_screenshot(
    self, save_path: str, full_screen: bool = False
) -> None:
    """
    Capture the last window screenshot.
    :param save_path: The path to save the window screenshot.
    :param full_screen: Whether to capture the full screen or just the active window.
    """

    try:
        if full_screen:
            command = Command(
                tool_name="capture_desktop_screenshot",
                parameters={"all_screens": True},
                tool_type="data_collection",
            )
        else:

            command = Command(
                tool_name="capture_window_screenshot",
                parameters={},
                tool_type="data_collection",
            )

        result = await self.context.command_dispatcher.execute_commands([command])
        image = result[0].result

        self.logger.info(f"Captured screenshot at final: {save_path}")
        if image:
            utils.save_image_string(image, save_path)

    except Exception as e:
        self.logger.warning(
            f"The last snapshot capture failed, due to the error: {e}"
        )

capture_last_snapshot() async

Capture the last snapshot of the application, including the screenshot and the XML file if configured.

Source code in module/basic.py
async def capture_last_snapshot(self) -> None:
    """
    Capture the last snapshot of the application, including the screenshot and the XML file if configured.
    """  # Capture the final screenshot
    screenshot_save_path = self.log_path + "action_step_final.png"

    if (
        self.application_window is not None
        or self.application_window_info is not None
    ):

        await self.capture_last_screenshot(screenshot_save_path)

        if ufo_config.system.save_ui_tree:
            ui_tree_path = os.path.join(self.log_path, "ui_trees")
            ui_tree_file_name = "ui_tree_final.json"
            ui_tree_save_path = os.path.join(ui_tree_path, ui_tree_file_name)
            await self.capture_last_ui_tree(ui_tree_save_path)

        if ufo_config.system.save_full_screen:

            desktop_save_path = self.log_path + "desktop_final.png"

            await self.capture_last_screenshot(desktop_save_path, full_screen=True)

capture_last_ui_tree(save_path) async

Capture the last UI tree snapshot.

Parameters:
  • save_path (str) –

    The path to save the UI tree snapshot.

Source code in module/basic.py
async def capture_last_ui_tree(self, save_path: str) -> None:
    """
    Capture the last UI tree snapshot.
    :param save_path: The path to save the UI tree snapshot.
    """

    result = await self.context.command_dispatcher.execute_commands(
        [
            Command(
                tool_name="get_ui_tree",
                parameters={},
                tool_type="data_collection",
            )
        ]
    )

    if result and result[0].result:
        with open(save_path, "w") as file:
            json.dump(result[0].result, file, indent=4)

create_following_round()

Create a following round. return: The following round.

Source code in module/basic.py
def create_following_round(self) -> BaseRound:
    """
    Create a following round.
    return: The following round.
    """
    pass

create_new_round() abstractmethod

Create a new round.

Source code in module/basic.py
@abstractmethod
def create_new_round(self) -> Optional[BaseRound]:
    """
    Create a new round.
    """
    pass

evaluation()

Evaluate the session.

Source code in module/basic.py
def evaluation(self) -> None:
    """
    Evaluate the session.
    """
    console.print("📊 Evaluating the session...", style="yellow")

    is_visual = ufo_config.evaluation_agent.visual_mode

    evaluator = EvaluationAgent(
        name="eva_agent",
        is_visual=is_visual,
        main_prompt=ufo_config.system.EVALUATION_PROMPT,
        example_prompt="",
    )

    requests = self.request_to_evaluate()

    # Evaluate the session, first use the default setting, if failed, then disable the screenshot evaluation.
    try:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=ufo_config.system.eva_all_screenshots,
            context=self.context,
        )
    except Exception as e:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=False,
            context=self.context,
        )

    # Add additional information to the evaluation result.
    additional_info = {
        "level": "session",
        "request": requests,
        "type": "evaluation_result",
    }
    result.update(additional_info)

    self._results.append(result)

    self.cost += cost

    evaluator.print_response(result)

    self.evaluation_logger.write(json.dumps(result))

    self.logger.info(
        f"Evaluation result saved to {os.path.join(self.log_path, 'evaluation.log')}"
    )

experience_saver()

Save the current trajectory as agent experience.

Source code in module/basic.py
def experience_saver(self) -> None:
    """
    Save the current trajectory as agent experience.
    """
    console.print(
        "📚 Summarizing and saving the execution flow as experience...",
        style="yellow",
    )

    summarizer = ExperienceSummarizer(
        ufo_config.app_agent.visual_mode,
        ufo_config.system.EXPERIENCE_PROMPT,
        ufo_config.system.APPAGENT_EXAMPLE_PROMPT,
        ufo_config.system.API_PROMPT,
    )
    experience = summarizer.read_logs(self.log_path)
    summaries, cost = summarizer.get_summary_list(experience)

    experience_path = ufo_config.system.EXPERIENCE_SAVED_PATH
    utils.create_folder(experience_path)
    summarizer.create_or_update_yaml(
        summaries, os.path.join(experience_path, "experience.yaml")
    )
    summarizer.create_or_update_vector_db(
        summaries, os.path.join(experience_path, "experience_db")
    )

    self.cost += cost
    self.logger.info(f"The experience has been saved to {experience_path}")

initialize_logger(log_path, log_filename, mode='a') staticmethod

Initialize logging. log_path: The path of the log file. log_filename: The name of the log file. return: The logger.

Source code in module/basic.py
@staticmethod
def initialize_logger(log_path: str, log_filename: str, mode="a") -> logging.Logger:
    """
    Initialize logging.
    log_path: The path of the log file.
    log_filename: The name of the log file.
    return: The logger.
    """
    # Code for initializing logging
    logger = logging.Logger(log_filename)

    if not ufo_config.system.print_log:
        # Remove existing handlers if PRINT_LOG is False
        logger.handlers = []

    log_file_path = os.path.join(log_path, log_filename)
    file_handler = logging.FileHandler(log_file_path, mode=mode, encoding="utf-8")
    formatter = logging.Formatter("%(message)s")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    logger.setLevel(ufo_config.system.log_level)

    return logger

is_error()

Check if the session is in error state. return: True if the session is in error state, otherwise False.

Source code in module/basic.py
def is_error(self):
    """
    Check if the session is in error state.
    return: True if the session is in error state, otherwise False.
    """
    if self.current_round is not None:
        return self.current_round.state.name() == AgentStatus.ERROR.value
    return False

is_finished()

Check if the session is ended. return: True if the session is ended, otherwise False.

Source code in module/basic.py
def is_finished(self) -> bool:
    """
    Check if the session is ended.
    return: True if the session is ended, otherwise False.
    """
    if (
        self._finish
        or self.step >= ufo_config.system.max_step
        or self.total_rounds >= ufo_config.system.max_round
    ):
        return True

    if self.is_error():
        return True

    return False

next_request() abstractmethod

Get the next request of the session. return: The request of the session.

Source code in module/basic.py
@abstractmethod
def next_request(self) -> str:
    """
    Get the next request of the session.
    return: The request of the session.
    """
    pass

print_cost()

Print the total cost of the session.

Source code in module/basic.py
def print_cost(self) -> None:
    """
    Print the total cost of the session.
    """

    if isinstance(self.cost, float) and self.cost > 0:
        formatted_cost = "${:.2f}".format(self.cost)
        console.print(
            f"💰 Total request cost of the session: {formatted_cost}",
            style="yellow",
        )
    else:
        console.print(
            f"ℹ️  Cost is not available for the model {ufo_config.host_agent.api_model} or {ufo_config.app_agent.api_model}.",
            style="yellow",
        )
        self.logger.warning("Cost information is not available.")

request_to_evaluate() abstractmethod

Get the request to evaluate. return: The request(s) to evaluate.

Source code in module/basic.py
@abstractmethod
def request_to_evaluate(self) -> str:
    """
    Get the request to evaluate.
    return: The request(s) to evaluate.
    """
    pass

reset() abstractmethod

Reset the session to initial state.

Source code in module/basic.py
@abstractmethod
def reset(self) -> None:
    """
    Reset the session to initial state.
    """
    pass

run() async

Run the session.

Returns:
  • List[Dict[str, str]]

    The result per session

Source code in module/basic.py
async def run(self) -> List[Dict[str, str]]:
    """
    Run the session.
    :return: The result per session
    """

    while not self.is_finished():

        round = self.create_new_round()
        if round is None:
            break

        round_result = await round.run()

        self.results.append({"request": round.request, "result": round_result})

    await self.capture_last_snapshot()

    if self._should_evaluate and not self.is_error():
        self.evaluation()

    if ufo_config.system.log_to_markdown:

        self.save_log_to_markdown()

    self.print_cost()

    return self.results
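The core of run() is a loop that creates rounds until is_finished() is true and records one result per round. The sketch below reproduces only that loop with toy classes. `ToyRound`, `ToySession`, and the "upper-case the request" behavior are hypothetical, and the snapshot, evaluation, Markdown log and cost steps are deliberately left out.

```python
import asyncio
from typing import Any, Dict, List, Optional


class ToyRound:
    """Hypothetical round that 'executes' its request by upper-casing it."""

    def __init__(self, request: str) -> None:
        self.request = request

    async def run(self) -> str:
        await asyncio.sleep(0)  # stand-in for agent execution
        return self.request.upper()


class ToySession:
    def __init__(self, requests: List[str], max_round: int) -> None:
        self._pending = list(requests)
        self.max_round = max_round  # plays the role of ufo_config.system.max_round
        self.total_rounds = 0
        self.results: List[Dict[str, Any]] = []

    def is_finished(self) -> bool:
        return not self._pending or self.total_rounds >= self.max_round

    def create_new_round(self) -> Optional[ToyRound]:
        if self.is_finished():
            return None
        self.total_rounds += 1
        return ToyRound(self._pending.pop(0))

    async def run(self) -> List[Dict[str, Any]]:
        # Same shape as BaseSession.run: loop until finished, one entry per round.
        while not self.is_finished():
            current = self.create_new_round()
            if current is None:
                break
            outcome = await current.run()
            self.results.append({"request": current.request, "result": outcome})
        return self.results
```

For example, `asyncio.run(ToySession(["a", "b", "c"], max_round=2).run())` stops after two rounds because the round cap is reached before the third request is consumed.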

save_log_to_markdown()

Save the session log as a Markdown file (output.md in the session's log directory).

Source code in module/basic.py
def save_log_to_markdown(self) -> None:
    """
    Save the log of the session to markdown file.
    """

    file_path = self.log_path
    trajectory = Trajectory(file_path)
    trajectory.to_markdown(file_path + "/output.md")
    self.logger.info(f"Trajectory saved to {file_path + '/output.md'}")

Session (Windows)

Bases: WindowsBaseSession

An interactive UFO session on Windows, driven by the HostAgent.

Initialize a session.

Parameters:
  • task (str) –

    The name of current task.

  • should_evaluate (bool) –

    Whether to evaluate the session.

  • id (int) –

    The id of the session.

  • request (str, default: '' ) –

    The initial user request of the session (optional). If it is not provided, UFO prompts the user to enter one.

  • mode (str, default: 'normal' ) –

    The mode of the task (for example, normal or normal_operator).

Source code in module/sessions/session.py
def __init__(
    self,
    task: str,
    should_evaluate: bool,
    id: int,
    request: str = "",
    mode: str = "normal",
) -> None:
    """
    Initialize a session.
    :param task: The name of current task.
    :param should_evaluate: Whether to evaluate the session.
    :param id: The id of the session.
    :param request: The user request of the session, optional. If not provided, UFO will ask the user to input the request.
    :param mode: The mode of the task.
    """

    self._mode = mode
    super().__init__(task, should_evaluate, id)

    self._init_request = request
    self.logger = logging.getLogger(__name__)

create_new_round()

Create a new round.

Source code in module/sessions/session.py
def create_new_round(self) -> Optional[BaseRound]:
    """
    Create a new round.
    """

    # Get a request for the new round.
    request = self.next_request()

    # Create a new round and return None if the session is finished.

    if self.is_finished():
        return None

    self._host_agent.set_state(self._host_agent.default_state)

    round = BaseRound(
        request=request,
        agent=self._host_agent,
        context=self.context,
        should_evaluate=ufo_config.system.eva_round,
        id=self.total_rounds,
    )

    self.add_round(round.id, round)

    return round
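Two details of create_new_round are easy to miss. First, next_request() runs before the is_finished() check, so when the user signals completion no extra round is created for that final answer. Second, the HostAgent is reset to its default state before each round, and the round id equals the number of rounds registered so far. The toy below reproduces that ordering. All class names and state strings are hypothetical, and it assumes total_rounds counts the rounds added via add_round.

```python
from typing import Dict, List, Optional, Tuple


class ToyHostAgent:
    default_state = "CONTINUE"  # hypothetical state name

    def __init__(self) -> None:
        self.state = "FINISH"  # pretend the previous round ended the agent

    def set_state(self, state: str) -> None:
        self.state = state


class ToyWindowsSession:
    def __init__(self, answers: List[Tuple[str, bool]]) -> None:
        # Each answer is (request, is_complete), like interactor.new_request().
        self._answers = list(answers)
        self._finish = False
        self._host_agent = ToyHostAgent()
        self.rounds: Dict[int, str] = {}

    @property
    def total_rounds(self) -> int:
        return len(self.rounds)

    def next_request(self) -> str:
        request, is_complete = self._answers.pop(0)
        if is_complete:
            self._finish = True
        return request

    def is_finished(self) -> bool:
        return self._finish

    def create_new_round(self) -> Optional[int]:
        request = self.next_request()  # may flip _finish
        if self.is_finished():
            return None  # no round is created for the "done" answer
        self._host_agent.set_state(self._host_agent.default_state)
        round_id = self.total_rounds  # ids run 0, 1, 2, ...
        self.rounds[round_id] = request
        return round_id
```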

next_request()

Get the request for the host agent.

Returns:
  • str

    The request for the host agent.

Source code in module/sessions/session.py
def next_request(self) -> str:
    """
    Get the request for the host agent.
    :return: The request for the host agent.
    """
    if self.total_rounds == 0:

        # If the request is provided via command line, use it directly.
        if self._init_request:
            return self._init_request
        # Otherwise, ask the user to input the request with enhanced UX.
        else:
            return interactor.first_request()
    else:
        request, iscomplete = interactor.new_request()
        if iscomplete:
            self._finish = True
        return request
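The branching in next_request can be restated as a pure function: the request passed at construction time is used only for round 0, an empty one triggers a prompt, and every later round asks the interactor for both a request and a completion flag. In this sketch `pick_request` is hypothetical, and the two callables stand in for `interactor.first_request()` and `interactor.new_request()`.

```python
from typing import Callable, Tuple


def pick_request(
    total_rounds: int,
    init_request: str,
    first_request: Callable[[], str],
    new_request: Callable[[], Tuple[str, bool]],
) -> Tuple[str, bool]:
    """Return (request, finish) following Session.next_request's branching."""
    if total_rounds == 0:
        # Round 0: a non-empty request from the command line wins over prompting.
        return (init_request or first_request()), False
    # Later rounds: the interactor also reports whether the user is done.
    request, is_complete = new_request()
    return request, is_complete
```

Returning the finish flag instead of setting `self._finish` keeps the sketch side-effect free; the real method stores it on the session so that `is_finished()` can see it.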

request_to_evaluate()

Get the request to evaluate.

Returns:
  • str

    The request(s) to evaluate.

Source code in module/sessions/session.py
def request_to_evaluate(self) -> str:
    """
    Get the request to evaluate.
    return: The request(s) to evaluate.
    """
    request_memory = self._host_agent.blackboard.requests
    return request_memory.to_json()

run() async

Run the session.

Source code in module/sessions/session.py
async def run(self) -> None:
    """
    Run the session.
    """
    await super().run()
    # Save the experience if the user asks so.

    save_experience = ufo_config.system.save_experience

    self.logger.info(f"Save experience setting: {save_experience}")

    if save_experience == "always":
        self.experience_saver()
    elif save_experience == "ask":
        if interactor.experience_asker():
            self.experience_saver()

    elif save_experience == "auto":
        task_completed = self.results.get("complete", "no")
        if task_completed.lower() == "yes":
            self.experience_saver()

    elif save_experience == "always_not":
        pass
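The save_experience setting selects one of four policies: always, ask, auto, always_not. The auto branch reads `self.results.get("complete", "no")`, so it expects a dict-like results object (presumably the evaluation outcome). BaseSession.run above, by contrast, appends per-round entries to `self.results` as a list. The sketch below restates the dispatch as a pure function and adds a guard for both shapes. That guard, and the `should_save_experience` name, are assumptions of this sketch and not UFO code.

```python
from typing import Any, Callable, Dict, List, Union


def should_save_experience(
    policy: str,
    results: Union[Dict[str, Any], List[Any]],
    ask_user: Callable[[], bool],
) -> bool:
    """Decide whether experience_saver() would be called for a given policy."""
    if policy == "always":
        return True
    if policy == "ask":
        return bool(ask_user())
    if policy == "auto":
        # Only a dict-shaped evaluation result carries a "complete" flag.
        if isinstance(results, dict):
            return str(results.get("complete", "no")).lower() == "yes"
        return False
    # "always_not" (and, in this sketch, any unrecognized value) never saves.
    return False
```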

LinuxSession

Bases: LinuxBaseSession

A session for UFO on Linux platform. Unlike Windows sessions, Linux sessions don't use a HostAgent. They work directly with application agents.

Initialize a Linux session.

Parameters:
  • task (str) –

    The name of current task.

  • should_evaluate (bool) –

    Whether to evaluate the session.

  • id (int) –

    The id of the session.

  • request (str, default: '' ) –

    The initial user request of the session (optional). If it is not provided, UFO prompts the user to enter one.

  • mode (str, default: 'normal' ) –

    The mode of the task (for example, normal or normal_operator).

Source code in module/sessions/linux_session.py
def __init__(
    self,
    task: str,
    should_evaluate: bool,
    id: int,
    request: str = "",
    mode: str = "normal",
) -> None:
    """
    Initialize a Linux session.
    :param task: The name of current task.
    :param should_evaluate: Whether to evaluate the session.
    :param id: The id of the session.
    :param request: The user request of the session.
    :param mode: The mode of the task.
    """
    self._mode = mode
    self._init_request = request
    super().__init__(task, should_evaluate, id)
    self.logger = logging.getLogger(__name__)
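Both initializers assign `self._mode` before calling `super().__init__`, and the Linux one also assigns `self._init_request` first. A plausible reason, which is not confirmed by the code shown here, is that the base initializer calls overridable hooks (for example, agent construction) that read these attributes. The toy below shows why the order matters under that assumption. `Base`, `_init_agents`, and both session classes are hypothetical.

```python
class Base:
    def __init__(self, task: str) -> None:
        self.task = task
        # The base initializer calls an overridable hook (hypothetical).
        self._agent = self._init_agents()

    def _init_agents(self) -> str:
        raise NotImplementedError


class GoodLinuxSession(Base):
    def __init__(self, task: str, mode: str = "normal") -> None:
        self._mode = mode  # set BEFORE the base initializer runs the hook
        super().__init__(task)

    def _init_agents(self) -> str:
        return f"agent[{self._mode}]"  # safe: _mode already exists


class BadLinuxSession(Base):
    def __init__(self, task: str, mode: str = "normal") -> None:
        super().__init__(task)  # hook runs here, before _mode is assigned
        self._mode = mode

    def _init_agents(self) -> str:
        return f"agent[{self._mode}]"  # raises AttributeError
```

Constructing `BadLinuxSession("t")` fails with `AttributeError`, while `GoodLinuxSession("t", "normal_operator")` builds its agent from the requested mode.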

create_new_round()

Create a new round for the Linux session. Because there is no HostAgent, the round is bound directly to the application agent.

Source code in module/sessions/linux_session.py
def create_new_round(self) -> Optional[BaseRound]:
    """
    Create a new round for Linux session.
    Since there's no host agent, directly create app-level rounds.
    """
    request = self.next_request()

    if self.is_finished():
        return None

    round = BaseRound(
        request=request,
        agent=self._agent,
        context=self.context,
        should_evaluate=ufo_config.system.eva_round,
        id=self.total_rounds,
    )

    self.add_round(round.id, round)
    return round

next_request()

Get the request for the app agent.

Returns:
  • str

    The request for the app agent.

Source code in module/sessions/linux_session.py
def next_request(self) -> str:
    """
    Get the request for the app agent.
    :return: The request for the app agent.
    """
    if self.total_rounds == 0:
        if self._init_request:
            return self._init_request
        else:
            return interactor.first_request()
    else:
        request, iscomplete = interactor.new_request()
        if iscomplete:
            self._finish = True
        return request

request_to_evaluate()

Get the request to evaluate.

Returns:
  • str

    The request(s) to evaluate.

Source code in module/sessions/linux_session.py
def request_to_evaluate(self) -> str:
    """
    Get the request to evaluate.
    :return: The request(s) to evaluate.
    """
    # For Linux session, collect requests from all rounds
    if self.current_round and hasattr(self.current_round.agent, "blackboard"):
        request_memory = self.current_round.agent.blackboard.requests
        return request_memory.to_json()
    return self._init_request
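The Linux request_to_evaluate prefers the request history held on the current round's agent blackboard and falls back to the initial request when there is no round yet or the agent has no blackboard. The sketch below reproduces that fallback with toy objects. Every class here is hypothetical, and serializing the request memory as a JSON list is an assumption about the shape of `to_json()`.

```python
import json
from typing import Any, List, Optional


class ToyRequestMemory:
    def __init__(self, requests: List[str]) -> None:
        self._requests = requests

    def to_json(self) -> str:
        return json.dumps(self._requests)  # assumed format: a JSON list


class ToyBlackboard:
    def __init__(self, requests: List[str]) -> None:
        self.requests = ToyRequestMemory(requests)


class ToyAgent:
    def __init__(self, blackboard: Optional[ToyBlackboard] = None) -> None:
        if blackboard is not None:
            self.blackboard = blackboard  # agents without one lack the attribute


class ToyLinuxRound:
    def __init__(self, agent: Any) -> None:
        self.agent = agent


def request_to_evaluate(current_round: Optional[ToyLinuxRound], init_request: str) -> str:
    # Prefer the agent's recorded request history; fall back to the initial request.
    if current_round and hasattr(current_round.agent, "blackboard"):
        return current_round.agent.blackboard.requests.to_json()
    return init_request
```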

See Also