Session
A Session is a continuous conversation instance between the user and UFO, managing multiple rounds of interaction from initial request to task completion across different execution modes and platforms.
Quick Reference:
- Session types? See Session Types
- Lifecycle? See Session Lifecycle
- Mode differences? See Execution Modes
- Platform differences? See Platform-Specific Sessions
Overview
A Session represents a complete conversation workflow, containing one or more Rounds of agent execution. Sessions manage:
- Context: Shared state across all rounds
- Agents: HostAgent and AppAgent (or LinuxAgent)
- Rounds: Individual request-response cycles
- Evaluation: Optional task completion assessment
- Experience: Learning from successful workflows
Relationship: Session vs Round
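The relationship can be sketched with two toy classes: a session owns one context and a list of rounds, and every round receives a reference to that same context. `ToySession` and `ToyRound` are hypothetical stand-ins written for this illustration, not the UFO classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyRound:
    """One request-response cycle; it reads and writes the shared context."""
    id: int
    request: str
    context: Dict[str, Any]

    def run(self) -> None:
        # Record the request in the context shared with the session.
        self.context.setdefault("history", []).append(self.request)


@dataclass
class ToySession:
    """A conversation that owns the context and a list of rounds."""
    task: str
    context: Dict[str, Any] = field(default_factory=dict)
    rounds: List[ToyRound] = field(default_factory=list)

    def add_round(self, request: str) -> ToyRound:
        # Every round gets the same context object, not a copy.
        new_round = ToyRound(id=len(self.rounds), request=request, context=self.context)
        self.rounds.append(new_round)
        return new_round


session = ToySession(task="email_task")
for text in ["Open Outlook", "Send an email to John"]:
    session.add_round(text).run()
```

Because the context is passed by reference, whatever one round writes is visible to the next round and to the session itself.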
Session Types
UFO supports 7 session types across Windows and Linux platforms:
| Session Type | Platform | Mode | Description |
|---|---|---|---|
| Session | Windows | normal, normal_operator | Interactive with HostAgent |
| ServiceSession | Windows | service | WebSocket-controlled via AIP |
| FollowerSession | Windows | follower | Replays saved plans |
| FromFileSession | Windows | batch_normal | Executes from request files |
| OpenAIOperatorSession | Windows | operator | Pure operator mode |
| LinuxSession | Linux | normal, normal_operator | Interactive without HostAgent |
| LinuxServiceSession | Linux | service | WebSocket-controlled on Linux |
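As a quick way to read the table, the platform and mode pairs can be written down as a lookup. This is only a sketch: `SESSION_TYPES` and `session_type_for` are hypothetical names introduced here, and the real selection logic in UFO may be organized differently.

```python
from typing import Dict, Tuple

# (platform, mode) -> session class name, transcribed from the table above.
SESSION_TYPES: Dict[Tuple[str, str], str] = {
    ("windows", "normal"): "Session",
    ("windows", "normal_operator"): "Session",
    ("windows", "service"): "ServiceSession",
    ("windows", "follower"): "FollowerSession",
    ("windows", "batch_normal"): "FromFileSession",
    ("windows", "operator"): "OpenAIOperatorSession",
    ("linux", "normal"): "LinuxSession",
    ("linux", "normal_operator"): "LinuxSession",
    ("linux", "service"): "LinuxServiceSession",
}


def session_type_for(platform: str, mode: str) -> str:
    """Return the session class name for a platform/mode pair (hypothetical helper)."""
    key = (platform.lower(), mode.lower())
    if key not in SESSION_TYPES:
        raise ValueError(f"No session type for platform={platform!r}, mode={mode!r}")
    return SESSION_TYPES[key]
```

For example, `session_type_for("linux", "follower")` raises `ValueError`, since the table lists no follower session for Linux.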
Class Hierarchy
Platform Base Classes
- WindowsBaseSession: Creates HostAgent, supports two-tier architecture
- LinuxBaseSession: Single-tier architecture with LinuxAgent only
Session Lifecycle
Standard Lifecycle
Core Execution Loop
The main session logic:
async def run(self) -> None:
    """
    Run the session.
    """
    while not self.is_finished():
        # Create new round for each request
        round = self.create_new_round()
        if round is None:
            break
        # Execute the round
        await round.run()
    # Capture final state
    if self.application_window is not None:
        await self.capture_last_snapshot()
    # Evaluate if configured
    if self._should_evaluate and not self.is_error():
        await self.evaluation()
    # Print cost summary
    self.print_cost()
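To see the control flow in isolation, here is a minimal, runnable imitation of the loop. `ToySession` and `ToyRound` are hypothetical stand-ins; they only append to a log where the real session would run agents, evaluate, and print costs.

```python
import asyncio
from typing import List, Optional


class ToyRound:
    """Stand-in for a Round: it only records that it ran."""

    def __init__(self, request: str, log: List[str]) -> None:
        self.request = request
        self.log = log

    async def run(self) -> None:
        self.log.append(f"round: {self.request}")


class ToySession:
    """Stand-in session that mirrors the control flow of run() above."""

    def __init__(self, requests: List[str], should_evaluate: bool = True) -> None:
        self._pending = list(requests)
        self._should_evaluate = should_evaluate
        self._finished = False
        self.log: List[str] = []

    def is_finished(self) -> bool:
        return self._finished

    def is_error(self) -> bool:
        return False

    def create_new_round(self) -> Optional[ToyRound]:
        # No requests left: mark the session finished and signal the loop to stop.
        if not self._pending:
            self._finished = True
            return None
        return ToyRound(self._pending.pop(0), self.log)

    async def run(self) -> None:
        while not self.is_finished():
            new_round = self.create_new_round()
            if new_round is None:
                break
            await new_round.run()
        # Post-loop stages run once, after all rounds.
        if self._should_evaluate and not self.is_error():
            self.log.append("evaluation")
        self.log.append("cost summary")


toy = ToySession(["Open Word", "Save document"])
asyncio.run(toy.run())
```

After the run, `toy.log` holds the two rounds followed by the evaluation and cost entries, which is the ordering the lifecycle stages below describe.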
Lifecycle Stages
1. Initialization
session = Session(
    task="email_task",
    should_evaluate=True,
    id=0,
    request="Send an email to John",
    mode="normal"
)
What happens:
- Task name assigned
- Session ID set
- Initial request stored
- Mode configured
2. Context Initialization
def _init_context(self) -> None:
    """Initialize the session context."""
    super()._init_context()
    # Create MCP server manager
    mcp_server_manager = MCPServerManager()
    # Create local dispatcher
    command_dispatcher = LocalCommandDispatcher(
        session=self,
        mcp_server_manager=mcp_server_manager
    )
    # Attach to context
    self.context.attach_command_dispatcher(command_dispatcher)
What happens:
- Context object created
- Command dispatcher attached (Local or WebSocket)
- MCP servers initialized (if applicable)
- Application window tracked
3. Round Creation
def create_new_round(self):
    """Create a new round."""
    # Get request (first or new)
    if not self.context.get(ContextNames.REQUEST):
        request = first_request()
    else:
        request, complete = new_request()
        if complete:
            return None
    # Create round with request
    round = Round(
        task=self.task,
        context=self.context,
        request=request,
        id=self._round_num
    )
    self._round_num += 1
    return round
What happens:
- User prompted for request (interactive modes)
- Or request read from file/plan (non-interactive)
- Round object created with shared context
- Round counter incremented
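The `(request, complete)` pair returned by `new_request()` is what lets the session stop cleanly. The sketch below imitates that contract with scripted answers instead of console input. `make_new_request` is a hypothetical helper, and treating "N" case-insensitively is an assumption made for this example.

```python
from typing import Callable, Iterator, Tuple


def make_new_request(answers: Iterator[str]) -> Callable[[], Tuple[str, bool]]:
    """Build a new_request() stand-in that reads scripted answers instead of stdin."""

    def new_request() -> Tuple[str, bool]:
        text = next(answers)
        # Assumption: "N" (any case) means the user is done.
        complete = text.strip().upper() == "N"
        return text, complete

    return new_request


new_request = make_new_request(iter(["Save document", "N"]))
first = new_request()
second = new_request()
```

Here `first` carries a real request with `complete` set to `False`, while `second` reports completion, which is the case where `create_new_round()` returns `None`.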
4. Round Execution
await round.run()
What happens:
- HostAgent selects application (Windows)
- AppAgent executes in application (or LinuxAgent directly)
- Commands dispatched and executed
- Results captured in context
- Experience logged
5. Continuation Check
def is_finished(self) -> bool:
    """Check if session is complete."""
    return self.context.get(ContextNames.SESSION_FINISH, False)
What happens:
- Check if user wants another request
- Check if error occurred
- Check if plan is complete (follower/batch modes)
6. Final Snapshot
async def capture_last_snapshot(self) -> None:
    """Capture the last snapshot of the application."""
    last_round = self.context.get(ContextNames.ROUND_STEP)
    subtask_amount = self.context.get(ContextNames.SUBTASK_AMOUNT)
    # Capture screenshot
    screenshot = self.application_window.capture_screenshot_infor()
    # Save to logs
    self.file_writer.save_screenshot(
        screenshot,
        last_round,
        subtask_amount,
        "last"
    )
What happens:
- Screenshot captured
- Control tree logged
- Final state preserved
7. Evaluation
async def evaluation(self) -> None:
    """Evaluate the session."""
    evaluator = EvaluationAgent(
        name="evaluation",
        process_name=self.context.get(ContextNames.APPLICATION_PROCESS_NAME),
        app_root_name=self.context.get(ContextNames.APPLICATION_ROOT_NAME),
        is_visual=self.configs["EVA_SESSION"]["VIS_EVAL"],
        main_prompt=self.configs["EVA_SESSION"]["MAIN_PROMPT"],
        api_prompt=self.configs["EVA_SESSION"]["API_PROMPT"]
    )
    score = await evaluator.evaluate(
        request=self.context.get(ContextNames.REQUEST),
        trajectory=self.context.get(ContextNames.TRAJECTORY)
    )
    self.file_writer.save_evaluation(score)
What happens:
- EvaluationAgent created
- Task completion assessed
- Score logged
- Feedback saved
8. Cost Summary
def print_cost(self) -> None:
    """Print the session cost."""
    total_cost = self.context.get(ContextNames.TOTAL_COST, 0.0)
    total_tokens = self.context.get(ContextNames.TOTAL_TOKENS, 0)
    console.print("[bold green]Session Complete[/bold green]")
    console.print(f"Total Cost: ${total_cost:.4f}")
    console.print(f"Total Tokens: {total_tokens}")
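To check what the format specifiers produce without a console object, the same strings can be built by a plain function. `format_cost_summary` is a hypothetical helper for illustration, and the sample numbers are taken from the saved-experience example later on this page.

```python
from typing import List


def format_cost_summary(total_cost: float, total_tokens: int) -> List[str]:
    """Build the summary lines with the same format specifiers as print_cost()."""
    return [
        "Session Complete",
        f"Total Cost: ${total_cost:.4f}",  # cost is shown with four decimal places
        f"Total Tokens: {total_tokens}",
    ]


lines = format_cost_summary(0.0234, 1542)
```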
Execution Modes
Normal Mode
Interactive execution with user in the loop:
session = Session(
    task="document_edit",
    should_evaluate=True,
    id=0,
    request="",  # Will prompt user
    mode="normal"
)
await session.run()
Features:
- User prompted for initial request via first_request()
- User prompted for each new request via new_request()
- Commands executed locally via LocalCommandDispatcher
- User can exit anytime by typing "N"
Flow:
1. Display welcome panel
2. User enters: "Open Word"
3. HostAgent selects Word application
4. AppAgent types content
5. User asked: "What next?"
6. User enters: "Save document"
7. AppAgent saves file
8. User asked: "What next?"
9. User enters: "N" (exit)
10. Session ends
Normal_Operator Mode
Normal mode with operator capabilities:
session = Session(
    task="complex_workflow",
    should_evaluate=True,
    id=0,
    request="Organize my files by date",
    mode="normal_operator"
)
Differences from Normal:
- Agent can use operator-level actions
- More powerful command set
- Same interactive workflow
Service Mode
WebSocket-controlled remote execution:
from aip.protocol.task_execution import TaskExecutionProtocol
protocol = TaskExecutionProtocol(websocket_connection)
session = ServiceSession(
    task="remote_automation",
    should_evaluate=True,
    id="session_abc123",
    request="Click Submit button",
    task_protocol=protocol
)
await session.run()
Features:
- No user interaction prompts
- Single request per session
- Commands sent via WebSocket
- Results returned to server
- Uses WebSocketCommandDispatcher
Flow:
1. Server sends request via WebSocket
2. ServiceSession created
3. Agent generates commands
4. Commands sent to client via WebSocket
5. Client executes locally
6. Results sent back
7. Session finishes immediately
Key Difference:
def is_finished(self) -> bool:
    """Service session finishes after one round."""
    return self._round_num > 0
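The effect of this rule on the run loop is easy to verify with a toy class. `OneShotSession` below is a hypothetical stand-in that handles its request by appending it to a list.

```python
from typing import List


class OneShotSession:
    """Toy session whose is_finished() mirrors the rule above."""

    def __init__(self, request: str) -> None:
        self.request = request
        self._round_num = 0
        self.handled: List[str] = []

    def is_finished(self) -> bool:
        # True as soon as one round has been counted.
        return self._round_num > 0

    def run(self) -> None:
        while not self.is_finished():
            self.handled.append(self.request)
            self._round_num += 1


one_shot = OneShotSession("Click Submit button")
one_shot.run()
```

The loop body executes exactly once, so the request is handled a single time and the session reports itself finished.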
Follower Mode
Replay saved action plans:
session = FollowerSession(
    task="email_replay",
    plan_file="/plans/send_email.json",
    should_evaluate=True,
    id=0
)
await session.run()
Features:
- No user prompts
- Reads actions from plan file
- Deterministic execution
- Good for testing/demos
Plan File Format:
{
"request": "Send an email to John",
"actions": [
{
"agent": "HostAgent",
"action": "select_application",
"parameters": {"app_name": "Outlook"}
},
{
"agent": "AppAgent",
"action": "click_element",
"parameters": {"label": "New Email"}
}
]
}
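A plan in this shape can be inspected with the standard `json` module before it is replayed. The sketch below assumes only the keys shown in the example above. `load_plan` and `actions_by_agent` are hypothetical helpers, not part of UFO.

```python
import json
from collections import defaultdict
from typing import Dict, List

PLAN_TEXT = """
{
  "request": "Send an email to John",
  "actions": [
    {"agent": "HostAgent", "action": "select_application",
     "parameters": {"app_name": "Outlook"}},
    {"agent": "AppAgent", "action": "click_element",
     "parameters": {"label": "New Email"}}
  ]
}
"""


def load_plan(text: str) -> dict:
    """Parse a plan and check the two top-level keys shown above."""
    plan = json.loads(text)
    for key in ("request", "actions"):
        if key not in plan:
            raise ValueError(f"plan is missing required key: {key}")
    return plan


def actions_by_agent(plan: dict) -> Dict[str, List[str]]:
    """Group action names by the agent that performs them, keeping order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for step in plan["actions"]:
        grouped[step["agent"]].append(step["action"])
    return dict(grouped)


plan = load_plan(PLAN_TEXT)
grouped = actions_by_agent(plan)
```

Validating the file up front gives a clear error for a malformed plan instead of a failure partway through the replay.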
Batch_Normal Mode
Execute multiple requests from files:
session = FromFileSession(
    task="batch_task",
    plan_file="/requests/task1.json",
    should_evaluate=True,
    id=0
)
await session.run()
Features:
- Request loaded from file
- No user interaction
- Can batch multiple files with SessionPool
- Task status tracking available
Request File:
{
"request": "Create a spreadsheet with sales data"
}
Operator Mode
Pure operator-level execution:
session = OpenAIOperatorSession(
    task="system_automation",
    should_evaluate=True,
    id=0,
    request="Install and configure software"
)
await session.run()
Features:
- Operator-level permissions
- Can modify system settings
- More powerful than AppAgent
- Same interactive prompts as normal mode
Platform-Specific Sessions
Windows Sessions
Characteristics:
- Two-tier architecture: HostAgent → AppAgent
- Base class: WindowsBaseSession
- Agent flow: HostAgent selects app, AppAgent controls it
- Automation: Uses UIA (UI Automation)
Example:
class Session(WindowsBaseSession):
    """Windows interactive session."""
    def _init_context(self):
        """Initialize with HostAgent."""
        super()._init_context()
        # HostAgent created by WindowsBaseSession
        self.host_agent = self.create_host_agent()
        # MCP and LocalCommandDispatcher
        self.setup_command_dispatcher()
Linux Sessions
Characteristics:
- Single-tier architecture: LinuxAgent only (no HostAgent)
- Base class: LinuxBaseSession
- Agent flow: LinuxAgent controls application directly
- Automation: Platform-specific tools
Example:
class LinuxSession(LinuxBaseSession):
    """Linux interactive session."""
    def _init_context(self):
        """Initialize without HostAgent."""
        super()._init_context()
        # No HostAgent - direct LinuxAgent usage
        self.linux_agent = self.create_linux_agent(
            application_name=self.application_name
        )
Comparison:
| Aspect | Windows | Linux |
|---|---|---|
| Architecture | Two-tier (HostAgent + AppAgent) | Single-tier (LinuxAgent) |
| Application Selection | HostAgent decides | Pre-specified or LinuxAgent decides |
| Agent Switching | Yes (HostAgent ↔ AppAgent) | No |
| Modes Supported | All 6 modes (normal, normal_operator, service, follower, batch_normal, operator) | normal, normal_operator, service |
| UI Automation | UIA (UIAutomation) | Platform tools |
See Platform Sessions for detailed comparison.
Experience Saving
Sessions can save successful workflows for future learning:
# After successful task completion
if self.configs["SAVE_EXPERIENCE"] == "ask":
    save = experience_asker()
    if save:
        # experience_saver() is the BaseSession method listed in the Reference below
        self.experience_saver()
Save Modes:
| Mode | Behavior |
|---|---|
| always | Auto-save every successful session |
| ask | Prompt user after each session |
| auto | Save if evaluation score > threshold |
| always_not | Never save |
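The four modes amount to a small decision function. The sketch below is one way to express the table. `should_save_experience` is a hypothetical name, and the default threshold of 0.8 is an assumed value, since this page does not state the actual threshold.

```python
from typing import Callable, Optional


def should_save_experience(
    mode: str,
    score: Optional[float] = None,
    threshold: float = 0.8,  # assumed value; the real threshold is not given here
    ask_user: Callable[[], bool] = lambda: False,
) -> bool:
    """Decide whether to save, following the Save Modes table."""
    if mode == "always":
        return True
    if mode == "ask":
        return ask_user()
    if mode == "auto":
        # Without an evaluation score there is nothing to compare, so do not save.
        return score is not None and score > threshold
    if mode == "always_not":
        return False
    raise ValueError(f"Unknown save mode: {mode!r}")
```

Note that "auto" depends on an evaluation score, so in this sketch it only saves when evaluation was enabled for the session.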
Saved Experience Structure:
{
"task": "Send email",
"request": "Send an email to John about the meeting",
"trajectory": [
{
"round": 0,
"agent": "HostAgent",
"observation": "Desktop with Outlook icon",
"action": "select_application",
"parameters": {"app_name": "Outlook"}
},
{
"round": 0,
"agent": "AppAgent",
"observation": "Outlook main window",
"action": "click_element",
"parameters": {"label": "New Email"}
}
],
"outcome": "success",
"evaluation_score": 0.95,
"cost": 0.0234,
"tokens": 1542
}
Error Handling
Error States
Sessions track errors through context:
def is_error(self) -> bool:
    """Check if session encountered error."""
    return self.context.get(ContextNames.ERROR, False)

def set_error(self, error_message: str):
    """Set error state."""
    self.context.set(ContextNames.ERROR, True)
    self.context.set(ContextNames.ERROR_MESSAGE, error_message)
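These helpers rely on a context that supports `get(key, default)` and `set(key, value)`, as the snippets on this page use it. A minimal imitation follows. `ToyContext` is hypothetical, and the `ContextNames` enum here is a three-member subset defined only for the example.

```python
from enum import Enum, auto
from typing import Any, Dict


class ContextNames(Enum):
    """Toy subset of context keys; the real enum has more members."""
    SESSION_FINISH = auto()
    ERROR = auto()
    ERROR_MESSAGE = auto()


class ToyContext:
    """Dictionary-backed stand-in for the session context."""

    def __init__(self) -> None:
        self._values: Dict[ContextNames, Any] = {}

    def get(self, key: ContextNames, default: Any = None) -> Any:
        return self._values.get(key, default)

    def set(self, key: ContextNames, value: Any) -> None:
        self._values[key] = value


ctx = ToyContext()
before = ctx.get(ContextNames.ERROR, False)  # unset keys fall back to the default
ctx.set(ContextNames.ERROR, True)
ctx.set(ContextNames.ERROR_MESSAGE, "Command execution timeout")
```

Keeping the error flag in the context, rather than on the session object, means every round and agent that shares the context sees the same error state.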
Error Recovery
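# continue/break need an enclosing loop; shown here inside the round loop
while not self.is_finished():
    round = self.create_new_round()
    if round is None:
        break
    try:
        await round.run()
    except AgentError as e:
        self.set_error(str(e))
        logger.error(f"Round {self._round_num} failed: {e}")
        # Decide whether to continue or abort
        if self.can_recover(e):
            # Try next round
            continue
        else:
            # Abort session
            break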
Common Errors
| Error Type | Cause | Handling |
|---|---|---|
| TimeoutError | Command execution timeout | Retry or skip |
| ConnectionError | WebSocket/MCP disconnection | Reconnect or abort |
| AgentError | Agent decision failure | Log and retry |
| ValidationError | Invalid command parameters | Skip command |
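The table can be read as a dispatch from exception type to handling strategy. The sketch below encodes it that way. `AgentError` and `ValidationError` are defined locally as placeholders, the strategy names are labels invented for this example, and falling back to "abort" for unlisted errors is an assumption.

```python
from typing import Dict, Type


class AgentError(Exception):
    """Placeholder for the agent decision failure named in the table."""


class ValidationError(Exception):
    """Placeholder for the invalid-parameter error named in the table."""


# Exception type -> handling strategy, transcribed from the table above.
HANDLING: Dict[Type[BaseException], str] = {
    TimeoutError: "retry_or_skip",
    ConnectionError: "reconnect_or_abort",
    AgentError: "log_and_retry",
    ValidationError: "skip_command",
}


def handling_for(error: BaseException) -> str:
    """Return the strategy for an error; subclasses match their parent entry."""
    for error_type, strategy in HANDLING.items():
        if isinstance(error, error_type):
            return strategy
    return "abort"  # assumed fallback for errors the table does not list
```

Using `isinstance` means built-in subclasses such as `ConnectionResetError` are routed to the `ConnectionError` entry without being listed separately.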
Best Practices
Session Creation
Efficient Sessions
- ✅ Use SessionFactory.create_session() for platform-aware creation
- ✅ Enable evaluation for quality tracking
- ✅ Choose appropriate mode for use case
- ✅ Set meaningful task names for logging
- ❌ Don't create sessions directly (use factory)
- ❌ Don't mix modes (each session has one mode)
Interactive Sessions
User Experience
- ✅ Provide clear initial requests
- ✅ Allow users to exit gracefully ("N" option)
- ✅ Show progress and confirmations
- ✅ Handle sensitive actions with confirmation
- ❌ Don't prompt excessively
- ❌ Don't hide errors from users
Service Sessions
WebSocket Considerations
- ✅ Always provide task_protocol
- ✅ Handle connection loss gracefully
- ✅ Set appropriate timeouts
- ✅ Validate requests before execution
- ❌ Don't assume connection is stable
- ❌ Don't block waiting for results indefinitely
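One common way to avoid waiting forever is to bound each wait with `asyncio.wait_for`. The sketch below shows the pattern in isolation. `wait_for_result` and `_fake_result` are hypothetical names, and the fake coroutine stands in for a result arriving over the WebSocket.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")


async def wait_for_result(pending: Awaitable[T], timeout_s: float) -> Optional[T]:
    """Return the result, or None if it does not arrive within timeout_s seconds."""
    try:
        return await asyncio.wait_for(pending, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the pending awaitable before raising.
        return None


async def _fake_result(delay_s: float) -> str:
    # Stand-in for a result arriving over the WebSocket after delay_s seconds.
    await asyncio.sleep(delay_s)
    return "ok"


fast = asyncio.run(wait_for_result(_fake_result(0.01), timeout_s=1.0))
slow = asyncio.run(wait_for_result(_fake_result(1.0), timeout_s=0.05))
```

Returning `None` on timeout is a choice made for this example. A real service session would more likely record the error state and report the failure to the server.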
Batch Sessions
Batch Processing
- ✅ Enable task status tracking
- ✅ Use descriptive file names
- ✅ Group similar tasks
- ✅ Log failures for retry
- ❌ Don't stop batch on first failure
- ❌ Don't run too many sessions in parallel
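Two of these points, continuing past failures and limiting parallelism, can be combined with `asyncio.Semaphore` and `asyncio.gather`. The following is a sketch under stated assumptions: `run_batch` is a hypothetical helper, `_fake_session` stands in for creating and running a FromFileSession, and the limit of two parallel sessions is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_batch(
    request_files: List[str],
    run_one: Callable[[str], Awaitable[None]],
    max_parallel: int = 2,
) -> Dict[str, str]:
    """Run one session per file; record failures instead of stopping the batch."""
    limiter = asyncio.Semaphore(max_parallel)
    status: Dict[str, str] = {}

    async def guarded(path: str) -> None:
        async with limiter:  # at most max_parallel sessions run at once
            try:
                await run_one(path)
                status[path] = "ok"
            except Exception as exc:  # keep going; the failure is kept for retry
                status[path] = f"failed: {exc}"

    await asyncio.gather(*(guarded(path) for path in request_files))
    return status


async def _fake_session(path: str) -> None:
    # Stand-in for creating and running a FromFileSession for `path`.
    await asyncio.sleep(0)
    if "bad" in path:
        raise RuntimeError("could not parse request")


result = asyncio.run(run_batch(["task1.json", "bad.json", "task3.json"], _fake_session))
```

The returned mapping doubles as a simple task status record: the failed entries can be collected and retried in a later batch.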
Examples
Example 1: Basic Interactive Session
from ufo.module.sessions.session import Session
# Create session
session = Session(
    task="word_editing",
    should_evaluate=True,
    id=0,
    request="",  # Will prompt user
    mode="normal"
)
# Run session
await session.run()
# User interaction:
# 1. Welcome panel shown
# 2. User enters: "Open Word and type Hello World"
# 3. HostAgent selects Word
# 4. AppAgent types text
# 5. User asked for next request
# 6. User enters: "N" to exit
# 7. Session evaluates and ends
Example 2: Service Session
from ufo.module.sessions.service_session import ServiceSession
from aip.protocol.task_execution import TaskExecutionProtocol
# WebSocket established
protocol = TaskExecutionProtocol(websocket)
# Create service session
session = ServiceSession(
    task="remote_click",
    should_evaluate=False,  # Server evaluates
    id="sess_12345",
    request="Click the Submit button",
    task_protocol=protocol
)
# Run (non-blocking for client)
await session.run()
# Session finishes after one request
Example 3: Follower Session
from ufo.module.sessions.session import FollowerSession
# Replay saved plan
session = FollowerSession(
    task="email_demo",
    plan_file="./plans/send_email.json",
    should_evaluate=True,
    id=0
)
await session.run()
# Executes exactly as recorded in plan file
# No user prompts
# Deterministic execution
Example 4: Linux Session
from ufo.module.sessions.linux_session import LinuxSession
# Linux interactive session
session = LinuxSession(
    task="linux_task",
    should_evaluate=True,
    id=0,
    request="Open gedit and type Hello Linux",
    mode="normal",
    application_name="gedit"
)
await session.run()
# Single-tier architecture
# No HostAgent
# LinuxAgent controls gedit directly
Reference
BaseSession
Bases: ABC
A basic session in UFO. A session consists of multiple rounds of interactions and conversations.
Initialize a session.
Source code in module/basic.py
application_window
property
writable
Get the application of the session. return: The application of the session.
application_window_info
property
writable
Get the application window info of the session. return: The application window info of the session.
context
property
Get the context of the session. return: The context of the session.
cost
property
writable
Get the cost of the session. return: The cost of the session.
current_agent_class
property
Get the class name of the current agent. return: The class name of the current agent.
current_round
property
Get the current round of the session. return: The current round of the session.
evaluation_logger
property
Get the file writer for evaluation. return: The file writer for evaluation.
host_agent
property
Get the host agent of the session. May return None for sessions that don't use a host agent (e.g., Linux).
id
property
Get the id of the session. return: The id of the session.
results
property
writable
Get the evaluation results of the session. return: The evaluation results of the session.
rounds
property
Get the rounds of the session. return: The rounds of the session.
session_type
property
Get the class name of the session. return: The class name of the session.
step
property
Get the step of the session. return: The step of the session.
total_rounds
property
Get the total number of rounds in the session. return: The total number of rounds in the session.
add_round(id, round)
Add a round to the session.
Source code in module/basic.py
capture_last_screenshot(save_path, full_screen=False)
async
Capture the last window screenshot.
Source code in module/basic.py
capture_last_snapshot()
async
Capture the last snapshot of the application, including the screenshot and the XML file if configured.
Source code in module/basic.py
capture_last_ui_tree(save_path)
async
Capture the last UI tree snapshot.
Source code in module/basic.py
create_following_round()
Create a following round. return: The following round.
Source code in module/basic.py
create_new_round()
abstractmethod
Create a new round.
Source code in module/basic.py
evaluation()
Evaluate the session.
Source code in module/basic.py
experience_saver()
Save the current trajectory as agent experience.
Source code in module/basic.py
initialize_logger(log_path, log_filename, mode='a')
staticmethod
Initialize logging. log_path: The path of the log file. log_filename: The name of the log file. return: The logger.
Source code in module/basic.py
is_error()
Check if the session is in error state. return: True if the session is in error state, otherwise False.
Source code in module/basic.py
is_finished()
Check if the session is ended. return: True if the session is ended, otherwise False.
Source code in module/basic.py
next_request()
abstractmethod
Get the next request of the session. return: The request of the session.
Source code in module/basic.py
print_cost()
Print the total cost of the session.
Source code in module/basic.py
request_to_evaluate()
abstractmethod
Get the request to evaluate. return: The request(s) to evaluate.
Source code in module/basic.py
reset()
abstractmethod
Reset the session to initial state.
Source code in module/basic.py
run()
async
Run the session.
Source code in module/basic.py
save_log_to_markdown()
Save the log of the session to markdown file.
Source code in module/basic.py
Session (Windows)
Bases: WindowsBaseSession
A session for UFO.
Initialize a session.
Source code in module/sessions/session.py
create_new_round()
Create a new round.
Source code in module/sessions/session.py
next_request()
Get the request for the host agent.
Source code in module/sessions/session.py
request_to_evaluate()
Get the request to evaluate. return: The request(s) to evaluate.
Source code in module/sessions/session.py
run()
async
Run the session.
Source code in module/sessions/session.py
LinuxSession
Bases: LinuxBaseSession
A session for UFO on Linux platform. Unlike Windows sessions, Linux sessions don't use a HostAgent. They work directly with application agents.
Initialize a Linux session.
Source code in module/sessions/linux_session.py
create_new_round()
Create a new round for Linux session. Since there's no host agent, directly create app-level rounds.
Source code in module/sessions/linux_session.py
next_request()
Get the request for the app agent.
Source code in module/sessions/linux_session.py
request_to_evaluate()
Get the request to evaluate.
Source code in module/sessions/linux_session.py
See Also
- Round - Individual request-response cycles
- Context - Shared state management
- Session Factory - Session creation
- Platform Sessions - Windows vs Linux