PDFReaderExecutor Server

Overview

PDFReaderExecutor provides PDF text extraction with optional human simulation capabilities.

Server Type: Action
Deployment: Local (in-process)
Agent: AppAgent, HostAgent
LLM-Selectable: ✅ Yes

Server Information

Property	Value
Namespace	`PDFReaderExecutor`
Server Name	`UFO PDF Reader MCP Server`
Platform	Cross-platform (Windows, Linux, macOS)
Dependencies	PyPDF2
Tool Type	`action`

Tools

extract_pdf_text

Extract text content from a single PDF file with optional human simulation.

Parameters

Parameter	Type	Required	Default	Description
`pdf_path`	`str`	✅ Yes	-	Full path to PDF file
`simulate_human`	`bool`	No	`True`	Simulate human-like document review

Returns

str - Extracted text content with page markers

Human Simulation Behavior

When simulate_human=True: 1. Opens PDF with default application 2. Waits 2-5 seconds (random) to simulate reading 3. Extracts text with page-by-page delays (0.5-1.5 seconds) 4. Closes PDF file

When simulate_human=False: - Direct text extraction (no delays) - No application launching

Example

# With human simulation (default)
result = await computer.run_actions([
    MCPToolCall(
        tool_key="action::extract_pdf_text",
        tool_name="extract_pdf_text",
        parameters={
            "pdf_path": "C:\\Documents\\report.pdf",
            "simulate_human": True
        }
    )
])

# Fast extraction (no simulation)
result = await computer.run_actions([
    MCPToolCall(
        tool_key="action::extract_pdf_text",
        tool_name="extract_pdf_text",
        parameters={
            "pdf_path": "C:\\Documents\\report.pdf",
            "simulate_human": False
        }
    )
])

Output Format

--- Page 1 ---
This is the content of page 1.

--- Page 2 ---
This is the content of page 2.

--- Page 3 ---
This is the content of page 3.

Error Handling

Returns error message string if: - File not found: "Error: PDF file not found at {path}" - Not a PDF: "Error: File {path} is not a PDF file" - Read error: "Error reading PDF {path}: {details}"

list_pdfs_in_directory

List all PDF files in a specified directory.

Parameters

Parameter	Type	Required	Description
`directory_path`	`str`	✅ Yes	Directory path to scan

Returns

List[str] - List of PDF file paths (sorted)

Example

result = await computer.run_actions([
    MCPToolCall(
        tool_key="action::list_pdfs_in_directory",
        tool_name="list_pdfs_in_directory",
        parameters={"directory_path": "C:\\Documents\\Reports"}
    )
])

# Output: [
#     "C:\\Documents\\Reports\\Q1_Report.pdf",
#     "C:\\Documents\\Reports\\Q2_Report.pdf",
#     "C:\\Documents\\Reports\\Q3_Report.pdf"
# ]

Error Handling

Returns empty list [] if: - Directory doesn't exist - Path is not a directory - No PDF files found

extract_all_pdfs_text

Extract text from all PDF files in a directory with human simulation.

Parameters

Parameter	Type	Required	Default	Description
`directory_path`	`str`	✅ Yes	-	Directory containing PDFs
`simulate_human`	`bool`	No	`True`	Simulate human review for each PDF

Returns

Dict[str, str] - Dictionary mapping filenames to extracted text

Human Simulation Behavior

When simulate_human=True: - Brief pause between files (1-3 seconds random) - Each PDF processed with human simulation - Progress messages logged

Example

result = await computer.run_actions([
    MCPToolCall(
        tool_key="action::extract_all_pdfs_text",
        tool_name="extract_all_pdfs_text",
        parameters={
            "directory_path": "C:\\Documents\\Reports",
            "simulate_human": True
        }
    )
])

# Output: {
#     "Q1_Report.pdf": "--- Page 1 ---\nQ1 Sales Report\n...",
#     "Q2_Report.pdf": "--- Page 1 ---\nQ2 Sales Report\n...",
#     "Q3_Report.pdf": "--- Page 1 ---\nQ3 Sales Report\n..."
# }

Error Handling

Returns dictionary with error key if: - Directory not found: {"error": "Directory not found: {path}"} - Not a directory: {"error": "Path is not a directory: {path}"} - No PDFs found: {"message": "No PDF files found in directory: {path}"}

Configuration

AppAgent:
  default:
    action:
      - namespace: PDFReaderExecutor
        type: local

HostAgent:
  default:
    action:
      - namespace: PDFReaderExecutor
        type: local

Best Practices

1. Disable Simulation for Batch Processing

# ✅ Good: Fast batch processing
result = await computer.run_actions([
    MCPToolCall(
        tool_key="action::extract_all_pdfs_text",
        tool_name="extract_all_pdfs_text",
        parameters={
            "directory_path": "C:\\Documents",
            "simulate_human": False  # Faster
        }
    )
])

# ❌ Bad: Slow with simulation
result = await computer.run_actions([
    MCPToolCall(
        tool_key="action::extract_all_pdfs_text",
        tool_name="extract_all_pdfs_text",
        parameters={
            "directory_path": "C:\\Documents",
            "simulate_human": True  # 2-5 seconds per file
        }
    )
])

2. Verify Files Exist

# List PDFs first
pdf_list = await computer.run_actions([
    MCPToolCall(
        tool_key="action::list_pdfs_in_directory",
        parameters={"directory_path": "C:\\Documents"}
    )
])

if pdf_list[0].data:
    # Extract from first PDF
    text = await computer.run_actions([
        MCPToolCall(
            tool_key="action::extract_pdf_text",
            parameters={
                "pdf_path": pdf_list[0].data[0],
                "simulate_human": False
            }
        )
    ])
else:
    logger.warning("No PDF files found")

3. Handle Large Documents

# Extract text
result = await computer.run_actions([
    MCPToolCall(
        tool_key="action::extract_pdf_text",
        parameters={"pdf_path": "large_document.pdf", "simulate_human": False}
    )
])

text = result[0].data

# Process in chunks if needed
if len(text) > 100000:  # Large document
    chunks = [text[i:i+50000] for i in range(0, len(text), 50000)]
    for chunk in chunks:
        process_chunk(chunk)

Use Cases

Document Analysis Pipeline

# 1. List all PDFs
pdfs = await computer.run_actions([
    MCPToolCall(
        tool_key="action::list_pdfs_in_directory",
        parameters={"directory_path": "C:\\Contracts"}
    )
])

# 2. Extract text from each
for pdf_path in pdfs[0].data:
    text = await computer.run_actions([
        MCPToolCall(
            tool_key="action::extract_pdf_text",
            parameters={"pdf_path": pdf_path, "simulate_human": False}
        )
    ])

    # 3. Analyze text
    analyze_contract(text[0].data)

Batch Report Processing

# Extract all reports at once
reports = await computer.run_actions([
    MCPToolCall(
        tool_key="action::extract_all_pdfs_text",
        tool_name="extract_all_pdfs_text",
        parameters={
            "directory_path": "C:\\Reports\\2024",
            "simulate_human": False
        }
    )
])

# Process all reports
for filename, content in reports[0].data.items():
    logger.info(f"Processing {filename}")
    # Extract data from content
    data = extract_report_data(content)

Limitations

Text-only: Cannot extract images or formatting
OCR not supported: Scanned PDFs with no text layer will return empty
Table parsing: Complex tables may not preserve structure
No modification: Read-only operations (cannot edit PDFs)

Performance

Operation	simulate_human=True	simulate_human=False
Single PDF (10 pages)	~10-20 seconds	~1 second
Batch 10 PDFs	~2-3 minutes	~10 seconds
Large PDF (100 pages)	~2-5 minutes	~5-10 seconds

Action Servers - Action server concepts
Local Servers - Local deployment

PDFReaderExecutor Server

Overview

Server Information

Tools

extract_pdf_text

Parameters

Returns

Human Simulation Behavior

Example

Output Format

Error Handling

list_pdfs_in_directory

Parameters

Returns

Example

Error Handling

extract_all_pdfs_text

Parameters

Returns

Human Simulation Behavior

Example

Error Handling

Configuration

Best Practices

1. Disable Simulation for Batch Processing

2. Verify Files Exist

3. Handle Large Documents

Use Cases

Document Analysis Pipeline

Batch Report Processing

Limitations

Performance

Related Documentation