Development Best Practices for MCP Servers¶
Overview¶
Building effective MCP servers requires different thinking than traditional API development. This page provides practical guidance for designing, implementing, and testing MCP tools that agents can successfully use.
Tool Design Principles¶
The quality of your MCP tools directly impacts agent performance and user experience. Well-designed tools enable agents to accomplish tasks efficiently, while poorly designed tools lead to confusion, errors, and frustrated users.
This section covers the core principles for designing tools that agents can understand and use effectively.
Key Principle: Design for Agent Experience, Not API Completeness
Traditional APIs expose endpoints for flexibility. MCP tools should expose tasks that accomplish specific goals. Think "what does the user want to accomplish?" rather than "what operations does the system support?"
Experience-Oriented vs System APIs¶
The most common mistake when building MCP servers is treating them like REST APIs: exposing every operation as a separate tool. This overwhelms LLMs and leads to poor agent performance.
❌ Avoid: Auto-generating tools from OpenAPI specs that expose every endpoint
✅ Prefer: Task-oriented tools that encapsulate multi-step workflows
Examples:
| System API Approach | Experience-Oriented Approach |
|---|---|
| `create_ticket`, `add_comment`, `assign_ticket`, `notify_user` | `file_support_request` (does all 4 steps internally) |
| `list_deployments`, `get_deployment`, `get_logs` | `summarize_recent_deployments` (aggregates and analyzes) |
| `search_orders`, `get_order_details`, `get_customer_info` | `lookup_order` (combines all 3, returns complete context) |
Why this matters: Each tool invocation consumes tokens and requires the LLM to reason about next steps. Combining related operations into single tools reduces complexity and improves reliability.
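To make this concrete, here is a minimal sketch of the `file_support_request` tool from the table above. The `ticketing` client passed in is hypothetical; substitute your real SDK or REST calls:

```python
def file_support_request(ticketing, customer_email: str, issue_type: str, description: str) -> dict:
    """One task-oriented tool that wraps four backend operations.

    `ticketing` is a hypothetical client object for your ticketing system.
    """
    ticket = ticketing.create_ticket(email=customer_email, category=issue_type)
    ticketing.add_comment(ticket["id"], description)
    team = ticketing.assign_ticket(ticket["id"], category=issue_type)
    ticketing.notify_user(customer_email, ticket["id"])
    # Return a decision-ready summary, not four raw API responses.
    return {"ticket_id": ticket["id"], "status": "open", "assigned_team": team}
```

The agent sees a single call with a single, structured result instead of reasoning its way through four round trips.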
Key Design Guidelines¶
These five guidelines translate the core principle into concrete practices. Apply them when designing any MCP tool to maximize agent success.
1. Limit Tool Count
Why: LLMs perform significantly worse when choosing from large tool sets. Keep servers focused and specialized with a manageable number of tools.
If you have many operations:
- Split into multiple domain-specific servers (Sales, Finance, IT)
- Group related operations into composite tools
- Use tool parameters to handle variations (e.g., a `status` parameter instead of `get_active_orders`, `get_pending_orders`)
Rule of thumb: If you're exposing more than 10-15 tools from a single server, consider whether they can be consolidated or split across multiple servers.
2. Task-Focused Naming
Why: Tool names should describe what the user accomplishes, not what the system does.
Examples:
- ❌ `execute_sql_query` → ✅ `analyze_sales_trends`
- ❌ `get_tickets_by_status` → ✅ `find_open_support_issues`
- ❌ `update_record` → ✅ `change_customer_address`
3. Return Decision-Ready Data
Why: Agents should get actionable summaries, not raw data dumps they need to process.
Pattern: Instead of returning 100 rows of deployment logs, return:
```json
{
  "summary": "3 deployments in last 24h: 2 successful, 1 failed",
  "failed_deployment": {
    "name": "api-v2.1.3",
    "error": "Health check timeout",
    "recommendation": "Check database connection pool settings"
  },
  "recent_deployments": [...]
}
```
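One way a server might produce that summary, sketched against a plain list of deployment records rather than any particular logging API:

```python
from collections import Counter

def summarize_deployments(deployments: list[dict]) -> dict:
    """Collapse raw deployment records into a decision-ready summary.

    Assumes each record is a dict with 'name', 'status', and optional 'error' keys.
    """
    counts = Counter(d["status"] for d in deployments)
    summary = {
        "summary": f"{len(deployments)} deployments in last 24h: "
                   f"{counts.get('succeeded', 0)} successful, {counts.get('failed', 0)} failed",
        "recent_deployments": [d["name"] for d in deployments],
    }
    failed = next((d for d in deployments if d["status"] == "failed"), None)
    if failed:
        summary["failed_deployment"] = {"name": failed["name"], "error": failed.get("error")}
    return summary
```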
4. Embed Guardrails Server-Side
Why: Don't rely on LLMs to enforce business rules. Validate and constrain operations in your server implementation.
Examples:
- Validate refund amounts don't exceed order total
- Prevent account deletion if balance > 0
- Require manager approval for expenses > $5000
- Enforce data residency rules based on tenant
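As a sketch, the first two guardrails above can be enforced inside the tool before any backend call is made. `get_order` is a hypothetical lookup helper and the message wording is illustrative:

```python
def _tool_error(message: str) -> dict:
    return {"isError": True, "content": [{"type": "text", "text": message}]}


def process_refund(order_id: str, amount: float, get_order) -> dict:
    """Refuse out-of-policy refunds before touching the payment backend.

    `get_order` is a hypothetical callable returning a dict with a 'total' field.
    """
    order = get_order(order_id)
    if order is None:
        return _tool_error(f"Order {order_id} not found. Verify the order ID and retry.")
    if amount <= 0 or amount > order["total"]:
        return _tool_error(
            f"Refund of {amount:.2f} is outside the allowed range "
            f"(0 to order total {order['total']:.2f}). Issue a partial refund instead."
        )
    # ...call the payment backend here...
    return {"content": [{"type": "text", "text": f"Refund of {amount:.2f} approved."}]}
```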
5. Write Clear Descriptions
Why: The description is the LLM's primary signal for when and how to use each tool.
Template:
```json
{
  "name": "file_support_request",
  "description": "Creates a new support ticket with customer details, issue description, and priority. Automatically assigns to appropriate team based on category. Use when customer reports a problem or needs help.",
  "inputSchema": {...}
}
```
Best Practices:
- Start with the action verb ("Creates", "Retrieves", "Updates")
- Explain what happens automatically
- State when the tool should be used
- Keep to 2-3 sentences maximum
Common Anti-Patterns¶
Learn from these common mistakes:
Exposing Raw Database Queries
Anti-Pattern: `execute_sql(query: string)` tool

Why it's bad:

- Security risk (SQL injection, data exposure)
- LLMs aren't reliable SQL generators
- No way to enforce access control

Instead: Create purpose-specific tools (`get_customer_orders`, `search_products`)
Too Many Similar Tools
Anti-Pattern: `get_active_orders`, `get_pending_orders`, `get_completed_orders`, `get_cancelled_orders`

Why it's bad: Increases tool count and confuses LLMs

Instead: `get_orders(status: enum)`, using a parameter for the variations, as sketched below
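For illustration, a single parameterized tool definition (shown here as the Python dict a server might register) covers all four variants:

```python
# One parameterized tool instead of four near-duplicate tools.
GET_ORDERS_TOOL = {
    "name": "get_orders",
    "description": "Retrieves orders filtered by status. "
                   "Omit the status parameter to return all orders.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "status": {
                "type": "string",
                "enum": ["active", "pending", "completed", "cancelled"],
                "description": "Order status to filter by. Omit to return all orders.",
            }
        },
        "additionalProperties": False,
    },
}
```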
Returning Unstructured Text
Anti-Pattern: Returning HTML, markdown, or verbose prose
Why it's bad: Agents can't extract structured data reliably
Instead: Return JSON with clear fields the agent can reference
No Rate Limiting or Quotas
Anti-Pattern: Allowing unlimited tool invocations
Why it's bad: Agents can trigger runaway loops or exhaust backend resources
Instead: Implement rate limits per user/client in Azure APIM
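Gateway-level policies in APIM are the primary control; as defence in depth, a server can also track a per-user budget. A minimal in-memory sketch (illustrative only, and not suitable for multi-instance deployments without a shared store such as Redis):

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window limiter keyed by user ID, kept in process memory."""

    def __init__(self, max_calls: int = 100, window_seconds: int = 60):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: dict[str, deque] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = time.monotonic()
        recent = self.calls[user_id]
        # Drop timestamps that have fallen outside the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True
```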
Schema Design¶
Schemas are how you communicate with the LLM about what your tool needs and what it returns. Well-designed schemas help the LLM understand exactly what parameters to provide and what to expect back. Poorly designed schemas lead to validation errors, confused agents, and failed invocations.
Think of schemas as the contract between your tool and the agent using it.
Input Schema (Required)¶
Every tool must define an inputSchema using JSON Schema. This tells the LLM what parameters the tool expects and how to use them correctly.
MCP Requirements:
- `inputSchema` is required and must be a valid JSON Schema object (not `null`)
- Defaults to JSON Schema 2020-12 if no `$schema` field is present
- For tools with no parameters, use `{"type": "object", "additionalProperties": false}`
- Tool names should be 1-128 characters, case-sensitive, using only `A-Z`, `a-z`, `0-9`, `_`, `-`, `.`
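The naming constraint in the last requirement can be checked mechanically; a small sketch:

```python
import re

# 1-128 characters drawn from A-Z, a-z, 0-9, underscore, hyphen, and dot.
TOOL_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,128}")

def is_valid_tool_name(name: str) -> bool:
    return bool(TOOL_NAME_PATTERN.fullmatch(name))
```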
Key principle: Keep schemas simple and self-documenting. Rich descriptions help the LLM understand not just what parameters exist, but when and how to use them.
Good Input Schema
```json
{
  "type": "object",
  "properties": {
    "customer_email": {
      "type": "string",
      "format": "email",
      "description": "Customer's email address for communication and ticket tracking"
    },
    "issue_type": {
      "type": "string",
      "enum": ["billing", "technical", "account"],
      "description": "Category of the support request. Use 'billing' for payment issues, 'technical' for product problems, 'account' for login or access issues."
    },
    "priority": {
      "type": "string",
      "enum": ["low", "normal", "high", "urgent"],
      "default": "normal",
      "description": "Urgency level. Use 'urgent' only for system outages affecting multiple users. Default is 'normal' for standard requests."
    },
    "description": {
      "type": "string",
      "description": "Detailed description of the issue in the customer's own words. Include relevant error messages or symptoms."
    }
  },
  "required": ["customer_email", "issue_type", "description"]
}
```
What makes this effective:
- Enums constrain choices: Prevents invalid values and helps LLM select the right category
- Rich descriptions: Explains when and how to use each option with specific examples
- Required fields: Clearly marked so LLM knows what must be provided
- Sensible defaults: Reduces decision burden for common cases
- Helpful context: Descriptions guide the LLM toward correct usage
Output Schema (Optional)¶
Tools may optionally define an outputSchema to validate structured results. Think of this as documentation for the LLM about what to expect back, plus a validation layer to catch issues early.
When provided:
- Servers must return structured results conforming to this schema
- Clients should validate results against the schema
- Results can include both `content` (unstructured) and `structuredContent` (structured) fields
Why use output schemas?: They enable strict validation, provide type information for better integration, and help LLMs understand how to parse and use your tool's responses.
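As an illustration, an output schema for the `file_support_request` tool and a conforming result might look like the sketch below (the field names are examples, not mandated by MCP):

```python
# Illustrative outputSchema for file_support_request.
FILE_SUPPORT_REQUEST_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "ticket_id": {"type": "string", "description": "Identifier of the created ticket"},
        "status": {"type": "string", "enum": ["open", "pending", "closed"]},
        "assigned_team": {"type": "string"},
    },
    "required": ["ticket_id", "status"],
}

# A conforming result can carry both a text fallback and the structured payload.
example_result = {
    "content": [{"type": "text", "text": "Ticket 12345 created and assigned to IT."}],
    "structuredContent": {"ticket_id": "12345", "status": "open", "assigned_team": "IT"},
}
```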
Output Content Guidelines¶
Structure matters. The way you format responses directly affects whether agents can extract and use the information correctly.
The golden rule: Return structured data that agents can reason about, not HTML or verbose text that requires parsing.
Most tools should return simple JSON with clear, extractable fields. For specialized use cases, MCP supports additional content types like base64-encoded images or resource links, but start with structured JSON unless you have a specific need for richer content.
❌ Avoid: Unstructured responses that bury information in prose
```json
{
  "message": "Ticket #12345 created successfully. Assigned to IT team. Customer will be notified via email."
}
```
✅ Prefer: Structured responses with clear, extractable fields
```json
{
  "ticket_id": "12345",
  "status": "open",
  "assigned_team": "IT",
  "estimated_response_time": "2 hours",
  "next_steps": [
    "Customer notified via email",
    "IT team alerted",
    "Agent will follow up if no response in 2 hours"
  ]
}
```
Why this matters: The second format lets the agent extract specific values (ticket_id, status) and reason about next steps. The first format requires parsing prose, which is error-prone and unreliable.
Error Handling¶
Errors are inevitable, but how you handle them determines whether an agent can recover or gets stuck. The difference between a frustrating failure and a successful retry often comes down to how clearly you communicate what went wrong and what to do about it.
Think of error messages as teaching moments for the LLM. A good error message doesn't just say "something failed"—it explains the problem, provides context, and suggests a path forward.
Two Types of Errors¶
MCP distinguishes between errors that the agent can potentially fix and errors that indicate deeper problems:
Tool Execution Errors (isError: true) are for problems the LLM can understand and correct:
```json
{
  "content": [
    {
      "type": "text",
      "text": "User lacks permission to approve expenses over $5000. Required role: Finance.Approver. Your roles: Employee. Suggested action: Request approval from finance team."
    }
  ],
  "isError": true
}
```
Why this works: The error explains what failed (permission check), what was expected (Finance.Approver role), what the user has (Employee role), and what to do next (request approval). The LLM can use this information to adjust its approach or inform the user.
Protocol Errors (JSON-RPC) are for structural problems like malformed requests or unknown tools. These indicate issues the LLM is unlikely to fix, such as calling a tool that doesn't exist or passing invalid JSON.
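For contrast, a protocol error travels as a standard JSON-RPC error object rather than a tool result. Shown here as a Python dict, a call to a method the server doesn't implement might produce something like the following (the exact error code a server uses can vary by implementation):

```python
# JSON-RPC 2.0 error response for a structural problem.
protocol_error = {
    "jsonrpc": "2.0",
    "id": 42,
    "error": {
        "code": -32601,  # standard JSON-RPC "Method not found" code
        "message": "Method not found: tools/cal",
    },
}
```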
Writing Effective Error Messages¶
The best error messages follow a simple pattern: What happened → Why it happened → What to do next
Good Error Messages
Rate Limit:

```
"Rate limit exceeded: 100 requests per minute. Current usage: 103 requests.
Wait 45 seconds before retrying or reduce request frequency."
```

Validation Error:

```
"Invalid date format in 'start_date' field. Received: '2024/12/23'.
Expected format: YYYY-MM-DD (e.g., '2024-12-23')."
```

Permission Error:

```
"Insufficient permissions to delete customer records. Required role: Admin.
Your roles: Viewer, Editor. Contact your administrator to request Admin access."
```

Resource Not Found:

```
"Order #12345 not found. Verify the order ID is correct.
Recent orders: #12344, #12346, #12347."
```
What makes these effective:

- State the problem clearly
- Provide specific context (what was received, what was expected)
- Offer actionable next steps
- Include helpful details (like similar valid values)
Poor Error Messages
❌ "Error"
❌ "Invalid input"
❌ "Permission denied"
❌ "Not found"
Why these fail: No context about what went wrong, no guidance on how to fix it. The agent has no path forward.
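A small helper can bake the what/why/next pattern into every tool error. The function and field layout below are one possible convention, not part of MCP:

```python
def tool_error(what: str, why: str, next_step: str, request_id: str | None = None) -> dict:
    """Build an MCP tool-execution error following the what/why/next pattern."""
    text = f"{what}. {why}. Suggested action: {next_step}."
    if request_id:
        text += f" Request ID: {request_id}."
    return {"isError": True, "content": [{"type": "text", "text": text}]}


# Example usage:
# tool_error(
#     "Invalid date format in 'start_date'",
#     "Received '2024/12/23' but expected YYYY-MM-DD",
#     "Resend the request as '2024-12-23'",
# )
```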
Security Considerations¶
Balance Helpfulness with Information Disclosure
Error messages should help agents recover without exposing sensitive system details. Always ask: "Could an attacker use this information?"
Never expose in error messages:
- Internal file paths, stack traces, or database schemas
- Other users' data (emails, names, account details)
- Whether specific resources exist (prevents enumeration attacks)
- API keys, connection strings, or credentials
- Detailed system architecture or service names
Safe to include:
- What operation failed ("refund request", "ticket creation")
- Why it failed at a business logic level ("insufficient permissions", "invalid date format")
- What the agent should do next ("wait 45 seconds", "request approval")
- Request/correlation IDs for support teams (sanitized)
Pattern: Log detailed errors server-side for debugging, but return sanitized, actionable messages to agents.
Secure vs Insecure Error Messages
❌ Information Disclosure:

```
"User john.doe@company.com does not exist in database table 'users'
(connection: sql-prod-eastus.database.windows.net)"
```

✅ Secure Alternative:

```
"Unable to process request. Verify the email address and try again.
Request ID: req-abc-123"
```

❌ Excessive Technical Detail:

```
"NullReferenceException at OrderService.cs:142 in GetCustomerOrders().
Stack trace: [...]"
```

✅ Secure Alternative:

```
"Unable to retrieve orders at this time. Please try again or contact support.
Request ID: req-xyz-789"
```
Include Tracing Information¶
For operations teams debugging issues, include sanitized request IDs or correlation tokens:
```json
{
  "content": [
    {
      "type": "text",
      "text": "Unable to create ticket due to temporary service issue. Request ID: req-abc-123. Please retry in a few moments."
    }
  ],
  "isError": true
}
```
Log detailed context server-side only (stack traces, input parameters, user identity, internal service names). Keep error messages to agents focused on what they need to recover, not what you need to debug.
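A sketch of that split, using the standard library logger; the request ID format and message wording are illustrative:

```python
import logging
import uuid

logger = logging.getLogger("mcp_server")

def safe_tool_result(operation):
    """Run a tool operation; log full details server-side, return a sanitized error to the agent.

    `operation` is any zero-argument callable that returns an MCP tool result dict.
    """
    request_id = f"req-{uuid.uuid4().hex[:8]}"
    try:
        return operation()
    except Exception:
        # Full context (stack trace, inputs) stays in server logs only.
        logger.exception("Tool execution failed (request_id=%s)", request_id)
        return {
            "isError": True,
            "content": [{
                "type": "text",
                "text": "Unable to complete the request due to a temporary issue. "
                        f"Please retry in a few moments. Request ID: {request_id}",
            }],
        }
```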
Testing MCP Servers¶
Testing MCP servers is fundamentally different from testing traditional APIs. With REST APIs, you verify that endpoints return correct responses for given inputs. With MCP servers, you also need to verify that LLMs can understand and correctly use your tools. This means testing not just functionality, but discoverability, clarity, and agent behavior.
A tool that works perfectly in isolation can still fail if its description is ambiguous, its schema is unclear, or the LLM chooses it at the wrong time. Effective MCP testing validates the entire interaction chain: from tool discovery, through parameter selection, to result interpretation.
Three Layers of Testing¶
Think of MCP testing as a pyramid with three layers, each testing different aspects of your implementation:
```
┌─────────────────────────────────┐
│         🎯 AGENT TESTS          │
│      Validate LLM Behavior      │
│  • Real-world scenarios         │
│  • Tool selection & workflows   │
│  • Slowest, most valuable       │
└─────────────────────────────────┘
                ▲
                │
┌──────────────────────────────────────────────┐
│              ⚙️ PROTOCOL TESTS               │
│            Verify MCP Compliance             │
│  • Tool discovery & invocation               │
│  • Response format & error handling          │
│  • Moderate speed                            │
└──────────────────────────────────────────────┘
                ▲
                │
┌────────────────────────────────────────────────────────┐
│                     🔧 UNIT TESTS                      │
│                   Test Tool Logic                      │
│  • Business rules & validation                         │
│  • Input/output correctness                            │
│  • Fast, deterministic                                 │
└────────────────────────────────────────────────────────┘
```
Layer 1: Unit Tests verify your tool logic works correctly
Layer 2: Protocol Tests verify you implement MCP correctly
Layer 3: Agent Tests verify LLMs can successfully use your tools
Each layer builds on the previous one. Skip unit tests and you'll waste time debugging agent failures caused by basic logic errors. Skip agent tests and you'll ship tools that work in theory but fail in practice.
Unit Tests: Tool Logic¶
At this layer, test your tool implementation as pure functions. Mock external dependencies (databases, APIs) and verify that given specific inputs, you get expected outputs.
What to test:
- Input validation: Required fields are enforced, invalid values rejected, edge cases handled
- Business logic: Rules execute correctly (refund limits, permission checks, data transformations)
- Output structure: Results match your schema, include all required fields
- Error conditions: Graceful handling of failures (network errors, missing resources, rate limits)
Why this matters: Unit tests are fast and deterministic. They catch logic bugs before you even start the server, saving time in later testing stages.
Unit Test Examples
```python
import pytest
from unittest import mock

# file_support_request is the tool implementation under test, imported from your server module.


def test_file_support_request_creates_ticket():
    """Test basic ticket creation flow"""
    result = file_support_request(
        customer_email="test@example.com",
        issue_type="billing",
        description="Charged twice"
    )

    assert result["ticket_id"] is not None
    assert result["assigned_team"] == "Finance"
    assert result["status"] == "open"


def test_file_support_request_validates_email():
    """Test input validation"""
    with pytest.raises(ValueError, match="Invalid email format"):
        file_support_request(
            customer_email="not-an-email",
            issue_type="billing",
            description="Issue"
        )


def test_file_support_request_handles_api_timeout():
    """Test error handling"""
    with mock.patch('ticketing_api.create', side_effect=TimeoutError):
        result = file_support_request(
            customer_email="test@example.com",
            issue_type="technical",
            description="Can't login"
        )

    assert result["isError"] is True
    assert "retry" in result["content"][0]["text"].lower()
```
Protocol Tests: MCP Compliance¶
At this layer, test that your server correctly implements the MCP specification. This includes proper JSON-RPC message handling, correct tool schema format, and conformance to protocol requirements.
What to test:
- Tool discovery: `tools/list` returns correctly formatted tool definitions with valid schemas
- Tool invocation: `tools/call` accepts requests, executes tools, returns proper response format
- Error responses: Protocol errors use correct JSON-RPC error codes and structure
- Authentication: JWT validation, unauthorized requests rejected, user context propagated
Why this matters: Protocol compliance ensures your server works with any MCP client. These tests catch structural issues like malformed JSON schemas or incorrect response formats that would cause client-side failures.
Protocol Test Examples
```python
import pytest

# `client`, `MCPClient`, `server_url`, `validate_json_schema`, and `ProtocolError`
# are assumed fixtures/helpers provided by your test harness.


def test_tools_list_returns_valid_schemas():
    """Verify tool definitions conform to MCP spec"""
    response = client.request("tools/list")

    assert "tools" in response
    for tool in response["tools"]:
        assert "name" in tool
        assert "description" in tool
        assert "inputSchema" in tool
        # Verify schema is valid JSON Schema
        validate_json_schema(tool["inputSchema"])


def test_tools_call_returns_correct_format():
    """Verify tool invocation returns proper structure"""
    response = client.call_tool(
        "file_support_request",
        {
            "customer_email": "test@example.com",
            "issue_type": "billing",
            "description": "Issue"
        }
    )

    assert "content" in response
    assert isinstance(response["content"], list)
    assert all("type" in item for item in response["content"])


def test_unauthorized_request_rejected():
    """Verify authentication is enforced"""
    client_no_auth = MCPClient(server_url, auth_token=None)

    with pytest.raises(ProtocolError, match="401"):
        client_no_auth.call_tool("file_support_request", {...})
```
Tools for protocol testing:
- MCP Inspector: Visual tool for testing MCP servers interactively
- MCP SDK test utilities: Built-in helpers for protocol conformance testing
- JSON Schema validators: Verify your tool schemas are valid
Agent Tests: Real-World Usage¶
At this layer, test with actual LLM agents to validate that your tools work in practice. This is where you discover if tool descriptions are clear, if the LLM chooses the right tools, and if multi-step workflows succeed.
What to test:
- Tool selection: Given a user request, does the agent choose the right tool?
- Parameter extraction: Does the agent correctly extract parameters from user input?
- Multi-step workflows: Can the agent chain multiple tool calls to complete complex tasks?
- Error recovery: When a tool fails, does the agent retry appropriately or inform the user?
Why this matters: This is the only way to validate the user experience. A tool can be technically correct but unusable if the LLM misunderstands when to call it or how to interpret results.
Agent Test Examples
```python
# `create_test_agent` and `mock_rate_limiting` are assumed test-harness helpers.


def test_agent_files_billing_complaint():
    """Test end-to-end ticket creation from user request"""
    agent = create_test_agent(tools=["file_support_request"])

    response = agent.process(
        "Customer jane@example.com was charged twice for order #5678"
    )

    # Verify agent called the right tool
    assert "file_support_request" in response.tool_calls

    # Verify parameters extracted correctly
    call = response.tool_calls["file_support_request"]
    assert call["customer_email"] == "jane@example.com"
    assert call["issue_type"] == "billing"
    assert "charged twice" in call["description"]


def test_agent_handles_ambiguous_request():
    """Test how agent handles unclear user input"""
    agent = create_test_agent(tools=["file_support_request"])

    response = agent.process("I need help")

    # Agent should ask clarifying questions, not make assumptions
    assert any(
        keyword in response.text.lower()
        for keyword in ["what", "which", "tell me more"]
    )


def test_agent_recovers_from_rate_limit():
    """Test error recovery behavior"""
    agent = create_test_agent(tools=["lookup_order"])

    # First call succeeds, second hits rate limit, third succeeds
    with mock_rate_limiting(limit=1):
        response = agent.process(
            "Look up order #1234, then order #5678"
        )

    # Verify agent retried after delay
    assert response.success
    assert len(response.tool_calls) == 3  # Initial + retry + second order
```
Example test scenarios:
- Happy path: "File a billing issue for customer@example.com who was charged twice"
- Ambiguous input: "I have a problem" (does agent ask clarifying questions?)
- Multi-step task: "Find all open tickets assigned to IT and summarize common issues"
- Error handling: "Create a ticket for invalid-email" (does agent handle validation errors?)
- Edge cases: "File an urgent ticket for a system-wide outage affecting all users"
Testing Strategy¶
Build Quality Layer by Layer
🔧 Start with unit tests to ensure tool logic is solid
- Fast and catch basic bugs early
- Test business rules, validation, error handling
- Run on every commit
⚙️ Add protocol tests once tools work individually
- Verify MCP compliance and catch integration issues
- Test tool discovery, invocation, response format
- Run on every commit
🎯 Finish with agent tests to validate real-world usage
- Slower and more complex, but catch usability issues
- Test tool selection, parameter extraction, workflows
- Run on pull requests or nightly builds
Debugging workflow: If agent tests fail, first check if unit and protocol tests pass. If they do, the problem is likely with tool descriptions, schema clarity, or the number of tools (LLM overwhelmed by choices). If lower-layer tests fail, fix those first before returning to agent tests.
Reference Documentation¶
MCP Specification¶
- MCP Schema Design - Official JSON Schema usage guidelines for tools
- Tools Specification - Complete reference for tool definitions, schemas, and error handling
- Server Implementation - Core server features and capabilities
- Resources - How to return resource links and embedded resources from tools
Tool Design Best Practices¶
- Anthropic: Writing Tools for Agents - Comprehensive guide on designing effective tools for LLM agents
Related Security Topics¶
- MCP03 - Tool Poisoning - Malicious tool definitions and descriptions
- MCP05 - Command Injection - Unsafe tool parameter handling
- MCP06 - Prompt Injection - Manipulating agent behavior via tool responses
- MCP08 - Lack of Audit & Telemetry - Monitoring and logging best practices
- MCP10 - Context Oversharing - Information disclosure in tool outputs
Next Steps¶
- Deciding what to build? → When to Use MCP for decision framework
- Wrapping existing APIs? → Migration Guidance for adapter patterns
- Ready to deploy? → Deployment Patterns for infrastructure strategies
- Need governance controls? → Enterprise Patterns for organizational guidance
- Securing your implementation? → OWASP MCP Top 10 for security best practices