Session

A Session is a continuous conversation instance between the user and UFO, managing multiple rounds of interaction from initial request to task completion across different execution modes and platforms.

Overview

A Session represents a complete conversation workflow, containing one or more Rounds of agent execution. Sessions manage:

  1. Context: Shared state across all rounds
  2. Agents: HostAgent and AppAgent (or LinuxAgent)
  3. Rounds: Individual request-response cycles
  4. Evaluation: Optional task completion assessment
  5. Experience: Learning from successful workflows
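
The relationship between these pieces can be sketched in a few lines of Python. This is a minimal illustration and not UFO's implementation: `ToySession`, `ToyRound`, and the plain dict used as the shared context are hypothetical stand-ins chosen only to show that every round works on the same context object.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyRound:
    """Hypothetical stand-in for a Round: one request-response cycle."""

    round_id: int
    request: str
    context: Dict[str, Any]  # shared with the owning session, not copied

    def run(self) -> str:
        # Record the request in the shared context so later rounds can see it.
        self.context.setdefault("history", []).append(self.request)
        return f"done: {self.request}"


@dataclass
class ToySession:
    """Hypothetical stand-in for a Session: owns the context and its rounds."""

    task: str
    context: Dict[str, Any] = field(default_factory=dict)
    rounds: List[ToyRound] = field(default_factory=list)

    def new_round(self, request: str) -> ToyRound:
        round_ = ToyRound(len(self.rounds), request, self.context)
        self.rounds.append(round_)
        return round_


toy_session = ToySession(task="email_task")
first_result = toy_session.new_round("Open Outlook").run()
second_result = toy_session.new_round("Send an email to John").run()
```

After both rounds run, `toy_session.context["history"]` holds both requests, because each round received a reference to the session's context rather than a copy.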

Relationship: Session vs Round

graph TB
    subgraph "Session (Conversation)"
        S[Session Instance]
        CTX[Context<br/>Shared State]
        R1[Round 1<br/>Request 1]
        R2[Round 2<br/>Request 2]
        R3[Round 3<br/>Request 3]
        EVAL[Evaluation<br/>Optional]
    end

    subgraph "Round 1 Details"
        HOST1[HostAgent]
        APP1[AppAgent]
        CMD1[Commands]
    end

    subgraph "Round 2 Details"
        HOST2[HostAgent]
        APP2[AppAgent]
        CMD2[Commands]
    end

    S --> CTX
    S --> R1
    S --> R2
    S --> R3
    S --> EVAL

    R1 -.shares.-> CTX
    R2 -.shares.-> CTX
    R3 -.shares.-> CTX

    R1 --> HOST1
    HOST1 --> APP1
    APP1 --> CMD1

    R2 --> HOST2
    HOST2 --> APP2
    APP2 --> CMD2

    style S fill:#e1f5ff
    style CTX fill:#fff4e1
    style R1 fill:#f0ffe1
    style R2 fill:#f0ffe1
    style R3 fill:#f0ffe1
    style EVAL fill:#ffe1f5

Session Types

UFO supports 7 session types across Windows and Linux platforms:

| Session Type | Platform | Mode | Description |
|---|---|---|---|
| Session | Windows | normal, normal_operator | Interactive with HostAgent |
| ServiceSession | Windows | service | WebSocket-controlled via AIP |
| FollowerSession | Windows | follower | Replays saved plans |
| FromFileSession | Windows | batch_normal | Executes from request files |
| OpenAIOperatorSession | Windows | operator | Pure operator mode |
| LinuxSession | Linux | normal, normal_operator | Interactive without HostAgent |
| LinuxServiceSession | Linux | service | WebSocket-controlled on Linux |
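
As a rough illustration of how a platform and mode pair selects a session type, the table can be transcribed into a lookup. The helper `session_type_for` and the `SESSION_TYPES` mapping are hypothetical names introduced here; they only mirror the table above and are not part of UFO's API.

```python
from typing import Dict, Tuple

# (platform, mode) -> session class name, transcribed from the table above.
SESSION_TYPES: Dict[Tuple[str, str], str] = {
    ("windows", "normal"): "Session",
    ("windows", "normal_operator"): "Session",
    ("windows", "service"): "ServiceSession",
    ("windows", "follower"): "FollowerSession",
    ("windows", "batch_normal"): "FromFileSession",
    ("windows", "operator"): "OpenAIOperatorSession",
    ("linux", "normal"): "LinuxSession",
    ("linux", "normal_operator"): "LinuxSession",
    ("linux", "service"): "LinuxServiceSession",
}


def session_type_for(platform: str, mode: str) -> str:
    """Return the session class name for a platform/mode pair (hypothetical helper)."""
    key = (platform.lower(), mode.lower())
    if key not in SESSION_TYPES:
        raise ValueError(
            f"Unsupported combination: platform={platform!r}, mode={mode!r}"
        )
    return SESSION_TYPES[key]
```

For example, `session_type_for("Windows", "follower")` yields `"FollowerSession"`, while a combination missing from the table, such as Linux with `follower`, raises `ValueError`.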

Class Hierarchy

graph TB
    BASE[BaseSession<br/>Abstract]
    WIN_BASE[WindowsBaseSession<br/>with HostAgent]
    LINUX_BASE[LinuxBaseSession<br/>without HostAgent]

    SESSION[Session<br/>Interactive]
    SERVICE[ServiceSession<br/>WebSocket]
    FOLLOWER[FollowerSession<br/>Plan Replay]
    FROMFILE[FromFileSession<br/>Batch]
    OPERATOR[OpenAIOperatorSession<br/>Operator]

    LINUX_SESS[LinuxSession<br/>Interactive]
    LINUX_SERVICE[LinuxServiceSession<br/>WebSocket]

    BASE --> WIN_BASE
    BASE --> LINUX_BASE

    WIN_BASE --> SESSION
    WIN_BASE --> SERVICE
    WIN_BASE --> FOLLOWER
    WIN_BASE --> FROMFILE
    WIN_BASE --> OPERATOR

    LINUX_BASE --> LINUX_SESS
    LINUX_BASE --> LINUX_SERVICE

    style BASE fill:#e1f5ff
    style WIN_BASE fill:#fff4e1
    style LINUX_BASE fill:#f0ffe1
    style SESSION fill:#e1ffe1
    style LINUX_SESS fill:#e1ffe1

Platform Base Classes

  • WindowsBaseSession: Creates HostAgent, supports two-tier architecture
  • LinuxBaseSession: Single-tier architecture with LinuxAgent only
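
The split between the two base classes can be pictured with a small abstract-class sketch. All class names below are hypothetical stand-ins, and agents are represented by plain strings; the only detail taken from this page is that a base session calls `_init_agents()` and that `host_agent` may be `None` on Linux.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class SketchBaseSession(ABC):
    """Hypothetical stand-in for BaseSession."""

    def __init__(self) -> None:
        self.host_agent: Optional[str] = None
        self._init_agents()

    @abstractmethod
    def _init_agents(self) -> None:
        """Set up platform-specific agents."""

    @abstractmethod
    def agent_chain(self) -> List[str]:
        """Return the agents a request passes through, in order."""


class SketchWindowsSession(SketchBaseSession):
    """Two-tier: a host agent picks the application, an app agent drives it."""

    def _init_agents(self) -> None:
        self.host_agent = "HostAgent"

    def agent_chain(self) -> List[str]:
        return ["HostAgent", "AppAgent"]


class SketchLinuxSession(SketchBaseSession):
    """Single-tier: no host agent, one agent acts directly."""

    def _init_agents(self) -> None:
        self.host_agent = None

    def agent_chain(self) -> List[str]:
        return ["LinuxAgent"]
```

Code that works against the base class can then check `host_agent is None` instead of branching on the platform name.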

Session Lifecycle

Standard Lifecycle

stateDiagram-v2
    [*] --> Initialized: __init__
    Initialized --> ContextReady: _init_context
    ContextReady --> Running: run()
    Running --> RoundCreate: create_new_round
    RoundCreate --> RoundExecute: round.run()
    RoundExecute --> RoundComplete: Round finishes
    RoundComplete --> CheckMore: is_finished?
    CheckMore --> RoundCreate: More requests
    CheckMore --> Snapshot: No more requests
    Snapshot --> Evaluation: capture_last_snapshot
    Evaluation --> CostPrint: evaluation() if enabled
    CostPrint --> [*]: Session complete

Core Execution Loop

The main session logic:

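async def run(self) -> List[Dict[str, str]]:
    """
    Run the session and return one result entry per round.
    """

    while not self.is_finished():
        # Create a new round for each request
        round = self.create_new_round()
        if round is None:
            break

        # Execute the round and record its result
        round_result = await round.run()
        self.results.append({"request": round.request, "result": round_result})

    # Capture final state (skipped internally when no application window is tracked)
    await self.capture_last_snapshot()

    # Evaluate if configured; evaluation() is a regular, non-async method
    if self._should_evaluate and not self.is_error():
        self.evaluation()

    # Optionally export the trajectory as markdown
    if ufo_config.system.log_to_markdown:
        self.save_log_to_markdown()

    # Print cost summary
    self.print_cost()

    return self.results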
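
The control flow of this loop can be exercised without UFO installed. The toy version below swaps in stub objects: `StubSession` and `StubRound` are hypothetical, and the snapshot step is reduced to a flag. It is a sketch of the loop shape only.

```python
import asyncio
from typing import Dict, List, Optional


class StubRound:
    """Hypothetical round that just echoes its request."""

    def __init__(self, request: str) -> None:
        self.request = request

    async def run(self) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return f"completed: {self.request}"


class StubSession:
    """Hypothetical session that consumes a fixed list of requests."""

    def __init__(self, requests: List[str]) -> None:
        self._pending = list(requests)
        self.results: List[Dict[str, str]] = []
        self.snapshot_taken = False

    def is_finished(self) -> bool:
        return not self._pending

    def create_new_round(self) -> Optional[StubRound]:
        if not self._pending:
            return None
        return StubRound(self._pending.pop(0))

    async def run(self) -> List[Dict[str, str]]:
        while not self.is_finished():
            round_ = self.create_new_round()
            if round_ is None:
                break
            result = await round_.run()
            self.results.append({"request": round_.request, "result": result})
        self.snapshot_taken = True  # stands in for capture_last_snapshot()
        return self.results


loop_results = asyncio.run(StubSession(["Open Word", "Save document"]).run())
```

Each request produces exactly one round, and the post-loop steps run once regardless of how many rounds executed, including zero.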

Lifecycle Stages

1. Initialization

session = Session(
    task="email_task",
    should_evaluate=True,
    id=0,
    request="Send an email to John",
    mode="normal"
)

What happens:

  • Task name assigned
  • Session ID set
  • Initial request stored
  • Mode configured

2. Context Initialization

def _init_context(self) -> None:
    """Initialize the session context."""
    super()._init_context()

    # Create MCP server manager
    mcp_server_manager = MCPServerManager()

    # Create local dispatcher
    command_dispatcher = LocalCommandDispatcher(
        session=self,
        mcp_server_manager=mcp_server_manager
    )

    # Attach to context
    self.context.attach_command_dispatcher(command_dispatcher)

What happens:

  • Context object created
  • Command dispatcher attached (Local or WebSocket)
  • MCP servers initialized (if applicable)
  • Application window tracked

3. Round Creation

def create_new_round(self):
    """Create a new round."""

    # Get request (first or new)
    if not self.context.get(ContextNames.REQUEST):
        request = first_request()
    else:
        request, complete = new_request()
        if complete:
            return None

    # Create round with request
    round = Round(
        task=self.task,
        context=self.context,
        request=request,
        id=self._round_num
    )

    self._round_num += 1
    return round

What happens:

  • User prompted for request (interactive modes)
  • Or request read from file/plan (non-interactive modes)
  • Round object created with shared context
  • Round counter incremented

4. Round Execution

await round.run()

What happens:

  • HostAgent selects application (Windows)
  • AppAgent executes in application (or LinuxAgent directly)
  • Commands dispatched and executed
  • Results captured in context
  • Experience logged

5. Continuation Check

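def is_finished(self) -> bool:
    """Check if the session is complete."""
    limits = ufo_config.system
    over_limit = self.step >= limits.max_step or self.total_rounds >= limits.max_round
    return self._finish or over_limit or self.is_error()

What happens:

  • Check whether the session has been marked finished (for example, the user declined another request, or the plan is complete in follower/batch modes)
  • Check whether the configured step or round limit has been reached
  • Check whether an error occurred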

6. Final Snapshot

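async def capture_last_snapshot(self) -> None:
    """Capture the last snapshot of the application."""

    # Nothing to capture if no application window is tracked
    if self.application_window is None and self.application_window_info is None:
        return

    # Final window screenshot
    await self.capture_last_screenshot(self.log_path + "action_step_final.png")

    # UI tree, if configured
    if ufo_config.system.save_ui_tree:
        ui_tree_path = os.path.join(self.log_path, "ui_trees", "ui_tree_final.json")
        await self.capture_last_ui_tree(ui_tree_path)

    # Full-desktop screenshot, if configured
    if ufo_config.system.save_full_screen:
        await self.capture_last_screenshot(
            self.log_path + "desktop_final.png", full_screen=True
        )

What happens:

  • Final window screenshot saved as action_step_final.png
  • UI tree saved as ui_tree_final.json (if save_ui_tree is enabled)
  • Full-desktop screenshot saved as desktop_final.png (if save_full_screen is enabled)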

7. Evaluation

def evaluation(self) -> None:
    """Evaluate the session."""

    evaluator = EvaluationAgent(
        name="eva_agent",
        is_visual=ufo_config.evaluation_agent.visual_mode,
        main_prompt=ufo_config.system.EVALUATION_PROMPT,
        example_prompt="",
    )

    requests = self.request_to_evaluate()

    result, cost = evaluator.evaluate(
        request=requests,
        log_path=self.log_path,
        eva_all_screenshots=ufo_config.system.eva_all_screenshots,
        context=self.context,
    )

    # Record the result and its cost
    self._results.append(result)
    self.cost += cost
    self.evaluation_logger.write(json.dumps(result))

What happens:

  • EvaluationAgent created
  • Task completion assessed against the session's request(s) and logs
  • Result appended to the session results and written to the evaluation log
  • Evaluation cost added to the session cost

8. Cost Summary

def print_cost(self) -> None:
    """Print the total cost of the session."""

    if isinstance(self.cost, float) and self.cost > 0:
        formatted_cost = "${:.2f}".format(self.cost)
        console.print(
            f"Total request cost of the session: {formatted_cost}", style="yellow"
        )
    else:
        # Some models do not report cost information
        console.print("Cost is not available for the configured models.", style="yellow")

Execution Modes

Normal Mode

Interactive execution with user in the loop:

session = Session(
    task="document_edit",
    should_evaluate=True,
    id=0,
    request="",  # Will prompt user
    mode="normal"
)

await session.run()

Features:

  • User prompted for initial request via first_request()
  • User prompted for each new request via new_request()
  • Commands executed locally via LocalCommandDispatcher
  • User can exit anytime by typing "N"

Flow:

1. Display welcome panel
2. User enters: "Open Word and type Hello World"
3. HostAgent selects Word application
4. AppAgent types content
5. User asked: "What next?"
6. User enters: "Save document"
7. AppAgent saves file
8. User asked: "What next?"
9. User enters: "N" (exit)
10. Session ends

Normal_Operator Mode

Normal mode with operator capabilities:

session = Session(
    task="complex_workflow",
    should_evaluate=True,
    id=0,
    request="Organize my files by date",
    mode="normal_operator"
)

Differences from Normal:

  • Agent can use operator-level actions
  • More powerful command set
  • Same interactive workflow

Service Mode

WebSocket-controlled remote execution:

from aip.protocol.task_execution import TaskExecutionProtocol
from ufo.module.sessions.service_session import ServiceSession

# websocket_connection is an already-established WebSocket connection
protocol = TaskExecutionProtocol(websocket_connection)

session = ServiceSession(
    task="remote_automation",
    should_evaluate=True,
    id="session_abc123",
    request="Click Submit button",
    task_protocol=protocol
)

await session.run()

Features:

  • No user interaction prompts
  • Single request per session
  • Commands sent via WebSocket
  • Results returned to server
  • Uses WebSocketCommandDispatcher

Flow:

1. Server sends request via WebSocket
2. ServiceSession created
3. Agent generates commands
4. Commands sent to client via WebSocket
5. Client executes locally
6. Results sent back
7. Session finishes immediately

Key Difference:

def is_finished(self) -> bool:
    """Service session finishes after one round."""
    return self._round_num > 0

Follower Mode

Replay saved action plans:

session = FollowerSession(
    task="email_replay",
    plan_file="/plans/send_email.json",
    should_evaluate=True,
    id=0
)

await session.run()

Features:

  • No user prompts
  • Reads actions from plan file
  • Deterministic execution
  • Good for testing/demos

Plan File Format:

{
  "request": "Send an email to John",
  "actions": [
    {
      "agent": "HostAgent",
      "action": "select_application",
      "parameters": {"app_name": "Outlook"}
    },
    {
      "agent": "AppAgent",
      "action": "click_element",
      "parameters": {"label": "New Email"}
    }
  ]
}
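
Before replaying a plan, it can help to check that the file has the expected shape. The sketch below validates only the fields visible in the example above; the schema is inferred from that example, so UFO's actual loader may accept or require other fields. `parse_plan` and `actions_by_agent` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List

REQUIRED_ACTION_KEYS = {"agent", "action", "parameters"}


def parse_plan(text: str) -> Dict[str, Any]:
    """Parse plan JSON and check the fields shown in the example plan."""
    plan = json.loads(text)
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    if not isinstance(plan.get("request"), str):
        raise ValueError("plan must contain a string 'request'")
    actions = plan.get("actions")
    if not isinstance(actions, list):
        raise ValueError("plan must contain a list of 'actions'")
    for index, action in enumerate(actions):
        missing = REQUIRED_ACTION_KEYS - set(action)
        if missing:
            raise ValueError(f"action {index} is missing keys: {sorted(missing)}")
    return plan


def actions_by_agent(plan: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group action names by the agent that performs them."""
    grouped: Dict[str, List[str]] = {}
    for action in plan["actions"]:
        grouped.setdefault(action["agent"], []).append(action["action"])
    return grouped


EXAMPLE_PLAN = """
{
  "request": "Send an email to John",
  "actions": [
    {"agent": "HostAgent", "action": "select_application",
     "parameters": {"app_name": "Outlook"}},
    {"agent": "AppAgent", "action": "click_element",
     "parameters": {"label": "New Email"}}
  ]
}
"""

example_plan = parse_plan(EXAMPLE_PLAN)
```

Failing early on a malformed plan keeps a replay from stopping halfway through a sequence of UI actions.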

Batch_Normal Mode

Execute multiple requests from files:

session = FromFileSession(
    task="batch_task",
    plan_file="/requests/task1.json",
    should_evaluate=True,
    id=0
)

await session.run()

Features:

  • Request loaded from file
  • No user interaction
  • Can batch multiple files with SessionPool
  • Task status tracking available

Request File:

{
  "request": "Create a spreadsheet with sales data"
}
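
When several request files sit in one folder, they can be collected up front. The following sketch assumes each file is a JSON object with a `request` key, as in the example above; `load_requests` is a hypothetical helper, and the second sample request is made up for the demonstration.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def load_requests(folder: Path) -> List[Tuple[str, str]]:
    """Return (file stem, request) pairs for every *.json file, sorted by name."""
    pairs: List[Tuple[str, str]] = []
    for path in sorted(folder.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        pairs.append((path.stem, data["request"]))
    return pairs


# Demonstration with two temporary request files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "task1.json").write_text(
        json.dumps({"request": "Create a spreadsheet with sales data"}),
        encoding="utf-8",
    )
    (folder / "task2.json").write_text(
        json.dumps({"request": "Summarize the spreadsheet"}),  # made-up sample
        encoding="utf-8",
    )
    loaded_requests = load_requests(folder)
```

Using the file stem as an identifier gives each request a stable name that could double as the session's task name.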

Operator Mode

Pure operator-level execution:

session = OpenAIOperatorSession(
    task="system_automation",
    should_evaluate=True,
    id=0,
    request="Install and configure software"
)

await session.run()

Features:

  • Operator-level permissions
  • Can modify system settings
  • More powerful than AppAgent
  • Same interactive prompts as normal mode


Platform-Specific Sessions

Windows Sessions

Characteristics:

  • Two-tier architecture: HostAgent → AppAgent
  • Base class: WindowsBaseSession
  • Agent flow: HostAgent selects app, AppAgent controls it
  • Automation: Uses UIA (UI Automation)

Example:

class Session(WindowsBaseSession):
    """Windows interactive session."""

    def _init_context(self):
        """Initialize with HostAgent."""
        super()._init_context()

        # HostAgent created by WindowsBaseSession
        self.host_agent = self.create_host_agent()

        # MCP and LocalCommandDispatcher
        self.setup_command_dispatcher()

Linux Sessions

Characteristics:

  • Single-tier architecture: LinuxAgent only (no HostAgent)
  • Base class: LinuxBaseSession
  • Agent flow: LinuxAgent controls application directly
  • Automation: Platform-specific tools

Example:

class LinuxSession(LinuxBaseSession):
    """Linux interactive session."""

    def _init_context(self):
        """Initialize without HostAgent."""
        super()._init_context()

        # No HostAgent - direct LinuxAgent usage
        self.linux_agent = self.create_linux_agent(
            application_name=self.application_name
        )

Comparison:

| Aspect | Windows | Linux |
|---|---|---|
| Architecture | Two-tier (HostAgent + AppAgent) | Single-tier (LinuxAgent) |
| Application Selection | HostAgent decides | Pre-specified or LinuxAgent decides |
| Agent Switching | Yes (HostAgent ↔ AppAgent) | No |
| Modes Supported | All modes (normal, normal_operator, service, follower, batch_normal, operator) | normal, normal_operator, service |
| UI Automation | UIA (UIAutomation) | Platform tools |

See Platform Sessions for detailed comparison.


Experience Saving

Sessions can save successful workflows for future learning:

# After successful task completion
if self.configs["SAVE_EXPERIENCE"] == "ask":
    save = experience_asker()

    if save:
        # experience_saver() is the BaseSession method documented in the Reference
        self.experience_saver()

Save Modes:

| Mode | Behavior |
|---|---|
| always | Auto-save every successful session |
| ask | Prompt user after each session |
| auto | Save if evaluation score > threshold |
| always_not | Never save |
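
The four modes reduce to a small decision function. The sketch below is one possible reading of the table, not UFO's code: `should_save_experience` is a hypothetical helper, the default threshold of 0.8 is an arbitrary assumption, and it assumes nothing is saved for an unsuccessful session.

```python
from typing import Callable, Optional


def should_save_experience(
    mode: str,
    succeeded: bool,
    score: Optional[float] = None,
    threshold: float = 0.8,  # assumed value; the real threshold is not stated here
    ask_user: Callable[[], bool] = lambda: False,
) -> bool:
    """Decide whether to save experience for a finished session."""
    if not succeeded or mode == "always_not":
        return False
    if mode == "always":
        return True
    if mode == "ask":
        # Defer to the user; ask_user stands in for an interactive prompt.
        return ask_user()
    if mode == "auto":
        # Save only when an evaluation score exists and beats the threshold.
        return score is not None and score > threshold
    raise ValueError(f"Unknown save mode: {mode!r}")
```

Passing the prompt in as a callable keeps the decision testable without any console interaction.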

Saved Experience Structure:

{
  "task": "Send email",
  "request": "Send an email to John about the meeting",
  "trajectory": [
    {
      "round": 0,
      "agent": "HostAgent",
      "observation": "Desktop with Outlook icon",
      "action": "select_application",
      "parameters": {"app_name": "Outlook"}
    },
    {
      "round": 0,
      "agent": "AppAgent",
      "observation": "Outlook main window",
      "action": "click_element",
      "parameters": {"label": "New Email"}
    }
  ],
  "outcome": "success",
  "evaluation_score": 0.95,
  "cost": 0.0234,
  "tokens": 1542
}

Error Handling

Error States

Sessions track errors through context:

def is_error(self) -> bool:
    """Check if session encountered error."""
    return self.context.get(ContextNames.ERROR, False)

def set_error(self, error_message: str):
    """Set error state."""
    self.context.set(ContextNames.ERROR, True)
    self.context.set(ContextNames.ERROR_MESSAGE, error_message)

Error Recovery

while not self.is_finished():
    round = self.create_new_round()
    if round is None:
        break

    try:
        await round.run()
    except AgentError as e:
        self.set_error(str(e))
        logger.error(f"Round {self._round_num} failed: {e}")

        # Decide whether to continue or abort
        if self.can_recover(e):
            # Try next round
            continue
        # Abort session
        break

Common Errors

| Error Type | Cause | Handling |
|---|---|---|
| TimeoutError | Command execution timeout | Retry or skip |
| ConnectionError | WebSocket/MCP disconnection | Reconnect or abort |
| AgentError | Agent decision failure | Log and retry |
| ValidationError | Invalid command parameters | Skip command |
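
One way to encode this table is a policy list that maps each exception type to a primary action and a fallback once retries are exhausted. Everything below is a sketch under assumptions: `AgentError` and `ValidationError` are defined locally as placeholders, the `Action` enum and `choose_action` helper are hypothetical, and falling back to abort for `AgentError` is an assumption the table does not state.

```python
from enum import Enum
from typing import List, Tuple, Type


class AgentError(Exception):
    """Placeholder for an agent decision failure."""


class ValidationError(Exception):
    """Placeholder for invalid command parameters."""


class Action(Enum):
    RETRY = "retry"
    RECONNECT = "reconnect"
    SKIP = "skip"
    ABORT = "abort"


# (exception type, action while attempts remain, action once they run out)
POLICY: List[Tuple[Type[BaseException], Action, Action]] = [
    (TimeoutError, Action.RETRY, Action.SKIP),
    (ConnectionError, Action.RECONNECT, Action.ABORT),
    (AgentError, Action.RETRY, Action.ABORT),  # fallback is an assumption
    (ValidationError, Action.SKIP, Action.SKIP),
]


def choose_action(error: BaseException, attempt: int, max_attempts: int = 3) -> Action:
    """Pick a handling action for an error on the given 1-based attempt."""
    for error_type, primary, fallback in POLICY:
        if isinstance(error, error_type):
            return primary if attempt < max_attempts else fallback
    # Unknown errors are treated as fatal.
    return Action.ABORT
```

Keeping the policy in data rather than in nested `except` blocks makes it easy to adjust handling per error type.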

Best Practices

Session Creation

Efficient Sessions

  • ✅ Use SessionFactory.create_session() for platform-aware creation
  • ✅ Enable evaluation for quality tracking
  • ✅ Choose appropriate mode for use case
  • ✅ Set meaningful task names for logging
  • ❌ Don't create sessions directly (use factory)
  • ❌ Don't mix modes (each session has one mode)

Interactive Sessions

User Experience

  • ✅ Provide clear initial requests
  • ✅ Allow users to exit gracefully ("N" option)
  • ✅ Show progress and confirmations
  • ✅ Handle sensitive actions with confirmation
  • ❌ Don't prompt excessively
  • ❌ Don't hide errors from users

Service Sessions

WebSocket Considerations

  • ✅ Always provide task_protocol
  • ✅ Handle connection loss gracefully
  • ✅ Set appropriate timeouts
  • ✅ Validate requests before execution
  • ❌ Don't assume connection is stable
  • ❌ Don't block waiting for results indefinitely

Batch Sessions

Batch Processing

  • ✅ Enable task status tracking
  • ✅ Use descriptive file names
  • ✅ Group similar tasks
  • ✅ Log failures for retry
  • ❌ Don't stop batch on first failure
  • ❌ Don't run too many sessions in parallel
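
The last two batch recommendations (keep going after a failure, cap parallelism) can be combined with `asyncio.Semaphore` and `asyncio.gather`. This is a generic asyncio sketch, not UFO's SessionPool: `run_batch` is a hypothetical helper and `fake_session` stands in for running a real session.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_batch(
    requests: List[str],
    run_one: Callable[[str], Awaitable[None]],
    max_parallel: int = 2,
) -> Dict[str, str]:
    """Run every request with bounded concurrency and record per-request status."""
    semaphore = asyncio.Semaphore(max_parallel)
    status: Dict[str, str] = {}

    async def guarded(request: str) -> None:
        async with semaphore:  # at most max_parallel requests run at once
            try:
                await run_one(request)
                status[request] = "ok"
            except Exception as error:
                # Record the failure and let the rest of the batch continue.
                status[request] = f"failed: {error}"

    await asyncio.gather(*(guarded(request) for request in requests))
    return status


async def fake_session(request: str) -> None:
    """Stand-in for a session run; fails for requests containing 'broken'."""
    await asyncio.sleep(0)
    if "broken" in request:
        raise RuntimeError("simulated failure")


batch_status = asyncio.run(
    run_batch(["task1", "broken task", "task3"], fake_session)
)
```

The returned status map doubles as a retry list: any entry that starts with `failed:` can be resubmitted later.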

Examples

Example 1: Basic Interactive Session

from ufo.module.sessions.session import Session

# Create session
session = Session(
    task="word_editing",
    should_evaluate=True,
    id=0,
    request="",  # Will prompt user
    mode="normal"
)

# Run session
await session.run()

# User interaction:
# 1. Welcome panel shown
# 2. User enters: "Open Word and type Hello World"
# 3. HostAgent selects Word
# 4. AppAgent types text
# 5. User asked for next request
# 6. User enters: "N" to exit
# 7. Session evaluates and ends

Example 2: Service Session

from ufo.module.sessions.service_session import ServiceSession
from aip.protocol.task_execution import TaskExecutionProtocol

# WebSocket established
protocol = TaskExecutionProtocol(websocket)

# Create service session
session = ServiceSession(
    task="remote_click",
    should_evaluate=False,  # Server evaluates
    id="sess_12345",
    request="Click the Submit button",
    task_protocol=protocol
)

# Run (non-blocking for client)
await session.run()

# Session finishes after one request

Example 3: Follower Session

from ufo.module.sessions.session import FollowerSession

# Replay saved plan
session = FollowerSession(
    task="email_demo",
    plan_file="./plans/send_email.json",
    should_evaluate=True,
    id=0
)

await session.run()

# Executes exactly as recorded in plan file
# No user prompts
# Deterministic execution

Example 4: Linux Session

from ufo.module.sessions.linux_session import LinuxSession

# Linux interactive session
session = LinuxSession(
    task="linux_task",
    should_evaluate=True,
    id=0,
    request="Open gedit and type Hello Linux",
    mode="normal",
    application_name="gedit"
)

await session.run()

# Single-tier architecture
# No HostAgent
# LinuxAgent controls gedit directly

Reference

BaseSession

Bases: ABC

A basic session in UFO. A session consists of multiple rounds of interactions and conversations.

Initialize a session.

Parameters:
  • task (str) –

    The name of current task.

  • should_evaluate (bool) –

    Whether to evaluate the session.

  • id (str) –

    The id of the session.

Source code in module/basic.py
def __init__(self, task: str, should_evaluate: bool, id: str) -> None:
    """
    Initialize a session.
    :param task: The name of current task.
    :param should_evaluate: Whether to evaluate the session.
    :param id: The id of the session.
    """

    self._should_evaluate = should_evaluate
    self._id = id
    self.task = task

    # Logging-related properties
    self.log_path = f"logs/{task}/"
    utils.create_folder(self.log_path)

    self._rounds: Dict[int, BaseRound] = {}

    self._context = Context()
    self._init_context()
    self._finish = False
    self._results = []
    self.logger = logging.getLogger(__name__)

    # Initialize platform-specific agents
    # Subclasses should override _init_agents() to set up their agents
    self._host_agent: Optional[HostAgent] = None
    self._init_agents()

application_window property writable

Returns the application window of the session.

application_window_info property writable

Returns the application window info of the session.

context property

Returns the context of the session.

cost property writable

Returns the cost of the session.

current_agent_class property

Returns the class name of the current agent.

current_round property

Returns the current round of the session.

evaluation_logger property

Returns the file writer for evaluation.

host_agent property

Get the host agent of the session. May return None for sessions that don't use a host agent (e.g., Linux).

Returns:
  • Optional[HostAgent]

    The host agent of the session, or None if not applicable.

id property

Returns the id of the session.

results property writable

Returns the evaluation results of the session.

rounds property

Returns the rounds of the session.

session_type property

Returns the class name of the session.

step property

Returns the step of the session.

total_rounds property

Returns the total number of rounds in the session.

add_round(id, round)

Add a round to the session.

Parameters:
  • id (int) –

    The id of the round.

  • round (BaseRound) –

    The round to be added.

Source code in module/basic.py
def add_round(self, id: int, round: BaseRound) -> None:
    """
    Add a round to the session.
    :param id: The id of the round.
    :param round: The round to be added.
    """
    self._rounds[id] = round

capture_last_screenshot(save_path, full_screen=False) async

Capture the last window screenshot.

Parameters:
  • save_path (str) –

    The path to save the window screenshot.

  • full_screen (bool, default: False ) –

    Whether to capture the full screen or just the active window.

Source code in module/basic.py
async def capture_last_screenshot(
    self, save_path: str, full_screen: bool = False
) -> None:
    """
    Capture the last window screenshot.
    :param save_path: The path to save the window screenshot.
    :param full_screen: Whether to capture the full screen or just the active window.
    """

    try:
        if full_screen:
            command = Command(
                tool_name="capture_desktop_screenshot",
                parameters={"all_screens": True},
                tool_type="data_collection",
            )
        else:

            command = Command(
                tool_name="capture_window_screenshot",
                parameters={},
                tool_type="data_collection",
            )

        result = await self.context.command_dispatcher.execute_commands([command])
        image = result[0].result

        self.logger.info(f"Captured screenshot at final: {save_path}")
        if image:
            utils.save_image_string(image, save_path)

    except Exception as e:
        self.logger.warning(
            f"The last snapshot capture failed, due to the error: {e}"
        )

capture_last_snapshot() async

Capture the last snapshot of the application, including the screenshot and the XML file if configured.

Source code in module/basic.py
async def capture_last_snapshot(self) -> None:
    """
    Capture the last snapshot of the application, including the screenshot and the XML file if configured.
    """  # Capture the final screenshot
    screenshot_save_path = self.log_path + "action_step_final.png"

    if (
        self.application_window is not None
        or self.application_window_info is not None
    ):

        await self.capture_last_screenshot(screenshot_save_path)

        if ufo_config.system.save_ui_tree:
            ui_tree_path = os.path.join(self.log_path, "ui_trees")
            ui_tree_file_name = "ui_tree_final.json"
            ui_tree_save_path = os.path.join(ui_tree_path, ui_tree_file_name)
            await self.capture_last_ui_tree(ui_tree_save_path)

        if ufo_config.system.save_full_screen:

            desktop_save_path = self.log_path + "desktop_final.png"

            await self.capture_last_screenshot(desktop_save_path, full_screen=True)

capture_last_ui_tree(save_path) async

Capture the last UI tree snapshot.

Parameters:
  • save_path (str) –

    The path to save the UI tree snapshot.

Source code in module/basic.py
async def capture_last_ui_tree(self, save_path: str) -> None:
    """
    Capture the last UI tree snapshot.
    :param save_path: The path to save the UI tree snapshot.
    """

    result = await self.context.command_dispatcher.execute_commands(
        [
            Command(
                tool_name="get_ui_tree",
                parameters={},
                tool_type="data_collection",
            )
        ]
    )

    if result and result[0].result:
        with open(save_path, "w") as file:
            json.dump(result[0].result, file, indent=4)

create_following_round()

Create a following round and return it.

Source code in module/basic.py
def create_following_round(self) -> BaseRound:
    """
    Create a following round.
    return: The following round.
    """
    pass

create_new_round() abstractmethod

Create a new round.

Source code in module/basic.py
@abstractmethod
def create_new_round(self) -> Optional[BaseRound]:
    """
    Create a new round.
    """
    pass

evaluation()

Evaluate the session.

Source code in module/basic.py
def evaluation(self) -> None:
    """
    Evaluate the session.
    """
    console.print(_safe_console_text("📊 Evaluating the session..."), style="yellow")

    is_visual = ufo_config.evaluation_agent.visual_mode

    evaluator = EvaluationAgent(
        name="eva_agent",
        is_visual=is_visual,
        main_prompt=ufo_config.system.EVALUATION_PROMPT,
        example_prompt="",
    )

    requests = self.request_to_evaluate()

    # Evaluate the session, first use the default setting, if failed, then disable the screenshot evaluation.
    try:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=ufo_config.system.eva_all_screenshots,
            context=self.context,
        )
    except Exception as e:
        result, cost = evaluator.evaluate(
            request=requests,
            log_path=self.log_path,
            eva_all_screenshots=False,
            context=self.context,
        )

    # Add additional information to the evaluation result.
    additional_info = {
        "level": "session",
        "request": requests,
        "type": "evaluation_result",
    }
    result.update(additional_info)

    self._results.append(result)

    self.cost += cost

    evaluator.print_response(result)

    self.evaluation_logger.write(json.dumps(result))

    self.logger.info(
        f"Evaluation result saved to {os.path.join(self.log_path, 'evaluation.log')}"
    )

experience_saver()

Save the current trajectory as agent experience.

Source code in module/basic.py
def experience_saver(self) -> None:
    """
    Save the current trajectory as agent experience.
    """
    console.print(
        _safe_console_text(
            "📚 Summarizing and saving the execution flow as experience..."
        ),
        style="yellow",
    )

    summarizer = ExperienceSummarizer(
        ufo_config.app_agent.visual_mode,
        ufo_config.system.EXPERIENCE_PROMPT,
        ufo_config.system.APPAGENT_EXAMPLE_PROMPT,
        ufo_config.system.API_PROMPT,
    )
    experience = summarizer.read_logs(self.log_path)
    summaries, cost = summarizer.get_summary_list(experience)

    experience_path = ufo_config.system.EXPERIENCE_SAVED_PATH
    utils.create_folder(experience_path)
    summarizer.create_or_update_yaml(
        summaries, os.path.join(experience_path, "experience.yaml")
    )
    summarizer.create_or_update_vector_db(
        summaries, os.path.join(experience_path, "experience_db")
    )

    self.cost += cost
    self.logger.info(f"The experience has been saved to {experience_path}")

initialize_logger(log_path, log_filename, mode='a') staticmethod

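Initialize logging.

Parameters:
  • log_path (str) –

    The path of the log file.

  • log_filename (str) –

    The name of the log file.

  • mode (str, default: 'a' ) –

    The file mode passed to the log file handler.

Returns:
  • Logger

    The logger.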

Source code in module/basic.py
@staticmethod
def initialize_logger(log_path: str, log_filename: str, mode="a") -> logging.Logger:
    """
    Initialize logging.
    log_path: The path of the log file.
    log_filename: The name of the log file.
    return: The logger.
    """
    # Code for initializing logging
    logger = logging.Logger(log_filename)

    if not ufo_config.system.print_log:
        # Remove existing handlers if PRINT_LOG is False
        logger.handlers = []

    log_file_path = os.path.join(log_path, log_filename)
    file_handler = logging.FileHandler(log_file_path, mode=mode, encoding="utf-8")
    formatter = logging.Formatter("%(message)s")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    logger.setLevel(ufo_config.system.log_level)

    return logger
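
The same pattern (a standalone `logging.Logger` with a message-only file handler) can be tried in isolation using only the standard library. In this sketch `make_file_logger` is a hypothetical name, the `ufo_config` lookups are replaced by an explicit `level` argument, and `response.log` is an arbitrary file name.

```python
import logging
import os
import tempfile


def make_file_logger(
    log_path: str, log_filename: str, mode: str = "a", level: int = logging.INFO
) -> logging.Logger:
    """Create a standalone logger that writes bare messages to a file."""
    logger = logging.Logger(log_filename)  # not registered with the root logger
    log_file_path = os.path.join(log_path, log_filename)
    file_handler = logging.FileHandler(log_file_path, mode=mode, encoding="utf-8")
    file_handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(file_handler)
    logger.setLevel(level)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    demo_logger = make_file_logger(tmp, "response.log")
    demo_logger.info('{"step": 1}')
    # Close handlers so the file is flushed and the directory can be removed.
    for handler in demo_logger.handlers:
        handler.close()
    with open(os.path.join(tmp, "response.log"), encoding="utf-8") as log_file:
        logged_text = log_file.read()
```

Because the formatter is just `%(message)s`, each log call writes exactly the string it was given plus a newline, which suits logs that store one JSON document per line.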

is_error()

Check whether the session is in an error state. Returns True if the session is in an error state, otherwise False.

Source code in module/basic.py
def is_error(self):
    """
    Check if the session is in error state.
    return: True if the session is in error state, otherwise False.
    """
    if self.current_round is not None:
        return self.current_round.state.name() == AgentStatus.ERROR.value
    return False

is_finished()

Check whether the session has ended. Returns True if the session has ended, otherwise False.

Source code in module/basic.py
def is_finished(self) -> bool:
    """
    Check if the session is ended.
    return: True if the session is ended, otherwise False.
    """
    if (
        self._finish
        or self.step >= ufo_config.system.max_step
        or self.total_rounds >= ufo_config.system.max_round
    ):
        return True

    if self.is_error():
        return True

    return False

next_request() abstractmethod

Get the next request of the session. Returns the next request.

Source code in module/basic.py
@abstractmethod
def next_request(self) -> str:
    """
    Get the next request of the session.
    return: The request of the session.
    """
    pass

print_cost()

Print the total cost of the session.

Source code in module/basic.py
def print_cost(self) -> None:
    """
    Print the total cost of the session.
    """

    if isinstance(self.cost, float) and self.cost > 0:
        formatted_cost = "${:.2f}".format(self.cost)
        console.print(
            _safe_console_text(
                f"💰 Total request cost of the session: {formatted_cost}"
            ),
            style="yellow",
        )
    else:
        console.print(
            _safe_console_text(
                f"ℹ️  Cost is not available for the model {ufo_config.host_agent.api_model} or {ufo_config.app_agent.api_model}."
            ),
            style="yellow",
        )
        self.logger.warning("Cost information is not available.")

request_to_evaluate() abstractmethod

Get the request to evaluate. Returns the request(s) to evaluate.

Source code in module/basic.py
@abstractmethod
def request_to_evaluate(self) -> str:
    """
    Get the request to evaluate.
    return: The request(s) to evaluate.
    """
    pass

reset() abstractmethod

Reset the session to initial state.

Source code in module/basic.py
@abstractmethod
def reset(self) -> None:
    """
    Reset the session to initial state.
    """
    pass

run() async

Run the session.

Returns:
  • List[Dict[str, str]]

    One entry per round, each holding that round's request and result.
Source code in module/basic.py
async def run(self) -> List[Dict[str, str]]:
    """
    Run the session.
    :return: The result per session
    """

    while not self.is_finished():

        round = self.create_new_round()
        if round is None:
            break

        round_result = await round.run()

        self.results.append({"request": round.request, "result": round_result})

    await self.capture_last_snapshot()

    if self._should_evaluate and not self.is_error():
        self.evaluation()

    if ufo_config.system.log_to_markdown:

        self.save_log_to_markdown()

    self.print_cost()

    return self.results
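
The round loop at the start of `run()` can be sketched without any agents. This is an illustration under stated assumptions: `StubRound` and `StubSession` are hypothetical classes that mimic only the loop and the shape of `results`. Snapshot capture, evaluation, logging, and cost printing are left out.

```python
import asyncio
from typing import Dict, List, Optional


class StubRound:
    """Hypothetical round that echoes its request instead of driving agents."""

    def __init__(self, request: str) -> None:
        self.request = request

    async def run(self) -> str:
        await asyncio.sleep(0)  # yield to the event loop once, then finish
        return f"done: {self.request}"


class StubSession:
    """Hypothetical session that serves a fixed queue of requests."""

    def __init__(self, requests: List[str], max_round: int = 5) -> None:
        self._pending = list(requests)
        self._max_round = max_round
        self.total_rounds = 0
        self.results: List[Dict[str, str]] = []

    def is_finished(self) -> bool:
        return self.total_rounds >= self._max_round

    def create_new_round(self) -> Optional[StubRound]:
        if not self._pending:
            return None  # nothing left to do, so the caller breaks out of the loop
        self.total_rounds += 1
        return StubRound(self._pending.pop(0))

    async def run(self) -> List[Dict[str, str]]:
        while not self.is_finished():
            current = self.create_new_round()
            if current is None:
                break
            result = await current.run()
            self.results.append({"request": current.request, "result": result})
        return self.results


results = asyncio.run(StubSession(["open notepad", "type hello"]).run())
print(results)
```

The loop has two exits: `is_finished()` turning true, or `create_new_round()` returning `None`. Either way, `results` keeps one dictionary per completed round.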

save_log_to_markdown()

Save the log of the session to markdown file.

Source code in module/basic.py
def save_log_to_markdown(self) -> None:
    """
    Save the log of the session to markdown file.
    """

    file_path = self.log_path
    trajectory = Trajectory(file_path)
    trajectory.to_markdown(file_path + "/output.md")
    self.logger.info(f"Trajectory saved to {file_path + '/output.md'}")

Session (Windows)

Bases: WindowsBaseSession

A session for UFO.

Initialize a session.

Parameters:
  • task (str) –

    The name of current task.

  • should_evaluate (bool) –

    Whether to evaluate the session.

  • id (int) –

    The id of the session.

  • request (str, default: '' ) –

    The user request of the session, optional. If not provided, UFO will ask the user to input the request.

  • mode (str, default: 'normal' ) –

    The mode of the task.

Source code in module/sessions/session.py
def __init__(
    self,
    task: str,
    should_evaluate: bool,
    id: int,
    request: str = "",
    mode: str = "normal",
) -> None:
    """
    Initialize a session.
    :param task: The name of current task.
    :param should_evaluate: Whether to evaluate the session.
    :param id: The id of the session.
    :param request: The user request of the session, optional. If not provided, UFO will ask the user to input the request.
    :param mode: The mode of the task.
    """

    self._mode = mode
    super().__init__(task, should_evaluate, id)

    self._init_request = request
    self.logger = logging.getLogger(__name__)

create_new_round()

Create a new round.

Source code in module/sessions/session.py
def create_new_round(self) -> Optional[BaseRound]:
    """
    Create a new round.
    """

    # Get a request for the new round.
    request = self.next_request()

    # Create a new round and return None if the session is finished.

    if self.is_finished():
        return None

    self._host_agent.set_state(self._host_agent.default_state)

    round = BaseRound(
        request=request,
        agent=self._host_agent,
        context=self.context,
        should_evaluate=ufo_config.system.eva_round,
        id=self.total_rounds,
    )

    self.add_round(round.id, round)

    return round

next_request()

Get the request for the host agent.

Returns:
  • str

    The request for the host agent.

Source code in module/sessions/session.py
def next_request(self) -> str:
    """
    Get the request for the host agent.
    :return: The request for the host agent.
    """
    if self.total_rounds == 0:

        # If the request is provided via command line, use it directly.
        if self._init_request:
            return self._init_request
        # Otherwise, ask the user to input the request with enhanced UX.
        else:
            return interactor.first_request()
    else:
        request, iscomplete = interactor.new_request()
        if iscomplete:
            self._finish = True
        return request
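
The request selection above depends on the round counter and on what the interactor returns. The sketch below reproduces that logic with injected callables so it can run without a console prompt. `RequestSource` and its constructor arguments are hypothetical; the two callables stand in for `interactor.first_request` and `interactor.new_request`.

```python
from typing import Callable, Tuple


class RequestSource:
    """Hypothetical reduction of the request logic in next_request()."""

    def __init__(self, init_request: str,
                 first_request: Callable[[], str],
                 new_request: Callable[[], Tuple[str, bool]]) -> None:
        self._init_request = init_request
        self._first_request = first_request
        self._new_request = new_request
        self.total_rounds = 0
        self.finished = False

    def next_request(self) -> str:
        if self.total_rounds == 0:
            # A request supplied up front takes precedence over prompting the user.
            return self._init_request or self._first_request()
        request, is_complete = self._new_request()
        if is_complete:
            self.finished = True  # mirrors setting self._finish in the session
        return request


source = RequestSource("open notepad", lambda: "prompted", lambda: ("", True))
first = source.next_request()
source.total_rounds += 1  # the session increments this when a round is added
second = source.next_request()
print(first, repr(second), source.finished)
```

Because the finish flag is set inside `next_request()`, a caller that checks for completion right after fetching the request, as `create_new_round()` does, can stop before building another round.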

request_to_evaluate()

Get the request to evaluate.

Returns:
  • str

    The request(s) to evaluate.

Source code in module/sessions/session.py
def request_to_evaluate(self) -> str:
    """
    Get the request to evaluate.
    :return: The request(s) to evaluate.
    """
    request_memory = self._host_agent.blackboard.requests
    return request_memory.to_json()

run() async

Run the session.

Source code in module/sessions/session.py
async def run(self) -> None:
    """
    Run the session.
    """
    await super().run()
    # Save the experience if the user asks so.

    save_experience = ufo_config.system.save_experience

    self.logger.info(f"Save experience setting: {save_experience}")

    if save_experience == "always":
        self.experience_saver()
    elif save_experience == "ask":
        if interactor.experience_asker():
            self.experience_saver()

    elif save_experience == "auto":
        task_completed = self.results.get("complete", "no")
        if task_completed.lower() == "yes":
            self.experience_saver()

    elif save_experience == "always_not":
        pass
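
The four `save_experience` settings reduce to one yes/no decision. The sketch below expresses that decision as a pure function. `should_save_experience` and its parameters are hypothetical names introduced for illustration, and `ask_user` stands in for `interactor.experience_asker`.

```python
from typing import Callable


def should_save_experience(setting: str, task_completed: str,
                           ask_user: Callable[[], bool]) -> bool:
    """Decide whether to save experience, following the branches shown above."""
    if setting == "always":
        return True
    if setting == "ask":
        return ask_user()
    if setting == "auto":
        # Save only when the recorded completion status is "yes", ignoring case.
        return task_completed.lower() == "yes"
    return False  # "always_not" and any unrecognised value save nothing


print(should_save_experience("always", "no", lambda: False))
print(should_save_experience("ask", "no", lambda: True))
print(should_save_experience("auto", "Yes", lambda: False))
print(should_save_experience("always_not", "yes", lambda: True))
```

Keeping the decision separate from the saving call makes each setting testable without running a session.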

LinuxSession

Bases: LinuxBaseSession

A session for UFO on Linux platform. Unlike Windows sessions, Linux sessions don't use a HostAgent. They work directly with application agents.

Initialize a Linux session.

Parameters:
  • task (str) –

    The name of current task.

  • should_evaluate (bool) –

    Whether to evaluate the session.

  • id (int) –

    The id of the session.

  • request (str, default: '' ) –

    The user request of the session.

  • mode (str, default: 'normal' ) –

    The mode of the task.

Source code in module/sessions/linux_session.py
def __init__(
    self,
    task: str,
    should_evaluate: bool,
    id: int,
    request: str = "",
    mode: str = "normal",
) -> None:
    """
    Initialize a Linux session.
    :param task: The name of current task.
    :param should_evaluate: Whether to evaluate the session.
    :param id: The id of the session.
    :param request: The user request of the session.
    :param mode: The mode of the task.
    """
    self._mode = mode
    self._init_request = request
    super().__init__(task, should_evaluate, id)
    self.logger = logging.getLogger(__name__)

create_new_round()

Create a new round for Linux session. Since there's no host agent, directly create app-level rounds.

Source code in module/sessions/linux_session.py
def create_new_round(self) -> Optional[BaseRound]:
    """
    Create a new round for Linux session.
    Since there's no host agent, directly create app-level rounds.
    """
    request = self.next_request()

    if self.is_finished():
        return None

    round = BaseRound(
        request=request,
        agent=self._agent,
        context=self.context,
        should_evaluate=ufo_config.system.eva_round,
        id=self.total_rounds,
    )

    self.add_round(round.id, round)
    return round

next_request()

Get the request for the app agent.

Returns:
  • str

    The request for the app agent.

Source code in module/sessions/linux_session.py
def next_request(self) -> str:
    """
    Get the request for the app agent.
    :return: The request for the app agent.
    """
    if self.total_rounds == 0:
        if self._init_request:
            return self._init_request
        else:
            return interactor.first_request()
    else:
        request, iscomplete = interactor.new_request()
        if iscomplete:
            self._finish = True
        return request

request_to_evaluate()

Get the request to evaluate.

Returns:
  • str

    The request(s) to evaluate.

Source code in module/sessions/linux_session.py
def request_to_evaluate(self) -> str:
    """
    Get the request to evaluate.
    :return: The request(s) to evaluate.
    """
    # Read the requests recorded on the current round's agent blackboard;
    # fall back to the initial request when no blackboard is available.
    if self.current_round and hasattr(self.current_round.agent, "blackboard"):
        request_memory = self.current_round.agent.blackboard.requests
        return request_memory.to_json()
    return self._init_request
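
The fallback in the Linux `request_to_evaluate()` can be checked with lightweight stand-ins. In this sketch, `StubRequests` is hypothetical and its `to_json()` output format is an assumption. The real blackboard's serialisation may differ.

```python
import json
from types import SimpleNamespace
from typing import List


class StubRequests:
    """Hypothetical request memory; the JSON layout here is assumed."""

    def __init__(self, items: List[str]) -> None:
        self._items = items

    def to_json(self) -> str:
        return json.dumps(self._items)


def request_to_evaluate(current_round, init_request: str) -> str:
    """Prefer the blackboard's requests, otherwise return the initial request."""
    if current_round and hasattr(current_round.agent, "blackboard"):
        return current_round.agent.blackboard.requests.to_json()
    return init_request


board = SimpleNamespace(requests=StubRequests(["open terminal"]))
with_board = SimpleNamespace(agent=SimpleNamespace(blackboard=board))
without_board = SimpleNamespace(agent=SimpleNamespace())

print(request_to_evaluate(with_board, "fallback"))
print(request_to_evaluate(without_board, "fallback"))
print(request_to_evaluate(None, "fallback"))
```

The initial request is returned both when no round has run yet and when the round's agent has no `blackboard` attribute.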

See Also