PDFReaderExecutor Server
Overview
PDFReaderExecutor provides PDF text extraction with optional human simulation capabilities.
Server Type: Action
Deployment: Local (in-process)
Agent: AppAgent, HostAgent
LLM-Selectable: ✅ Yes
Server Information
| Property | Value |
|---|---|
| Namespace | PDFReaderExecutor |
| Server Name | UFO PDF Reader MCP Server |
| Platform | Cross-platform (Windows, Linux, macOS) |
| Dependencies | PyPDF2 |
| Tool Type | action |
Tools
extract_pdf_text
Extract text content from a single PDF file with optional human simulation.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
pdf_path |
str |
✅ Yes | - | Full path to PDF file |
simulate_human |
bool |
No | True |
Simulate human-like document review |
Returns
str - Extracted text content with page markers
Human Simulation Behavior
When simulate_human=True:
1. Opens PDF with default application
2. Waits 2-5 seconds (random) to simulate reading
3. Extracts text with page-by-page delays (0.5-1.5 seconds)
4. Closes PDF file
When simulate_human=False:
- Direct text extraction (no delays)
- No application launching
Example
# With human simulation (default)
result = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_pdf_text",
tool_name="extract_pdf_text",
parameters={
"pdf_path": "C:\\Documents\\report.pdf",
"simulate_human": True
}
)
])
# Fast extraction (no simulation)
result = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_pdf_text",
tool_name="extract_pdf_text",
parameters={
"pdf_path": "C:\\Documents\\report.pdf",
"simulate_human": False
}
)
])
Output Format
--- Page 1 ---
This is the content of page 1.
--- Page 2 ---
This is the content of page 2.
--- Page 3 ---
This is the content of page 3.
Error Handling
Returns error message string if:
- File not found: "Error: PDF file not found at {path}"
- Not a PDF: "Error: File {path} is not a PDF file"
- Read error: "Error reading PDF {path}: {details}"
list_pdfs_in_directory
List all PDF files in a specified directory.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
directory_path |
str |
✅ Yes | Directory path to scan |
Returns
List[str] - List of PDF file paths (sorted)
Example
result = await computer.run_actions([
MCPToolCall(
tool_key="action::list_pdfs_in_directory",
tool_name="list_pdfs_in_directory",
parameters={"directory_path": "C:\\Documents\\Reports"}
)
])
# Output: [
# "C:\\Documents\\Reports\\Q1_Report.pdf",
# "C:\\Documents\\Reports\\Q2_Report.pdf",
# "C:\\Documents\\Reports\\Q3_Report.pdf"
# ]
Error Handling
Returns empty list [] if:
- Directory doesn't exist
- Path is not a directory
- No PDF files found
extract_all_pdfs_text
Extract text from all PDF files in a directory with human simulation.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
directory_path |
str |
✅ Yes | - | Directory containing PDFs |
simulate_human |
bool |
No | True |
Simulate human review for each PDF |
Returns
Dict[str, str] - Dictionary mapping filenames to extracted text
Human Simulation Behavior
When simulate_human=True:
- Brief pause between files (1-3 seconds random)
- Each PDF processed with human simulation
- Progress messages logged
Example
result = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_all_pdfs_text",
tool_name="extract_all_pdfs_text",
parameters={
"directory_path": "C:\\Documents\\Reports",
"simulate_human": True
}
)
])
# Output: {
# "Q1_Report.pdf": "--- Page 1 ---\nQ1 Sales Report\n...",
# "Q2_Report.pdf": "--- Page 1 ---\nQ2 Sales Report\n...",
# "Q3_Report.pdf": "--- Page 1 ---\nQ3 Sales Report\n..."
# }
Error Handling
Returns dictionary with error key if:
- Directory not found: {"error": "Directory not found: {path}"}
- Not a directory: {"error": "Path is not a directory: {path}"}
- No PDFs found: {"message": "No PDF files found in directory: {path}"}
Configuration
AppAgent:
default:
action:
- namespace: PDFReaderExecutor
type: local
HostAgent:
default:
action:
- namespace: PDFReaderExecutor
type: local
Best Practices
1. Disable Simulation for Batch Processing
# ✅ Good: Fast batch processing
result = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_all_pdfs_text",
tool_name="extract_all_pdfs_text",
parameters={
"directory_path": "C:\\Documents",
"simulate_human": False # Faster
}
)
])
# ❌ Bad: Slow with simulation
result = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_all_pdfs_text",
tool_name="extract_all_pdfs_text",
parameters={
"directory_path": "C:\\Documents",
"simulate_human": True # 2-5 seconds per file
}
)
])
2. Verify Files Exist
# List PDFs first
pdf_list = await computer.run_actions([
MCPToolCall(
tool_key="action::list_pdfs_in_directory",
parameters={"directory_path": "C:\\Documents"}
)
])
if pdf_list[0].data:
# Extract from first PDF
text = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_pdf_text",
parameters={
"pdf_path": pdf_list[0].data[0],
"simulate_human": False
}
)
])
else:
logger.warning("No PDF files found")
3. Handle Large Documents
# Extract text
result = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_pdf_text",
parameters={"pdf_path": "large_document.pdf", "simulate_human": False}
)
])
text = result[0].data
# Process in chunks if needed
if len(text) > 100000: # Large document
chunks = [text[i:i+50000] for i in range(0, len(text), 50000)]
for chunk in chunks:
process_chunk(chunk)
Use Cases
Document Analysis Pipeline
# 1. List all PDFs
pdfs = await computer.run_actions([
MCPToolCall(
tool_key="action::list_pdfs_in_directory",
parameters={"directory_path": "C:\\Contracts"}
)
])
# 2. Extract text from each
for pdf_path in pdfs[0].data:
text = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_pdf_text",
parameters={"pdf_path": pdf_path, "simulate_human": False}
)
])
# 3. Analyze text
analyze_contract(text[0].data)
Batch Report Processing
# Extract all reports at once
reports = await computer.run_actions([
MCPToolCall(
tool_key="action::extract_all_pdfs_text",
tool_name="extract_all_pdfs_text",
parameters={
"directory_path": "C:\\Reports\\2024",
"simulate_human": False
}
)
])
# Process all reports
for filename, content in reports[0].data.items():
logger.info(f"Processing {filename}")
# Extract data from content
data = extract_report_data(content)
Limitations
- Text-only: Cannot extract images or formatting
- OCR not supported: Scanned PDFs with no text layer will return empty
- Table parsing: Complex tables may not preserve structure
- No modification: Read-only operations (cannot edit PDFs)
Performance
| Operation | simulate_human=True | simulate_human=False |
|---|---|---|
| Single PDF (10 pages) | ~10-20 seconds | ~1 second |
| Batch 10 PDFs | ~2-3 minutes | ~10 seconds |
| Large PDF (100 pages) | ~2-5 minutes | ~5-10 seconds |
Related Documentation
- Action Servers - Action server concepts
- Local Servers - Local deployment