Skip to content

Anthropic Provider

Integrates Anthropic's Claude models into Amplifier. Supports streaming, tool calling, extended thinking, and ChatRequest format.

Module ID

provider-anthropic

Installation

providers:
  - module: provider-anthropic
    source: git+https://github.com/microsoft/amplifier-module-provider-anthropic@main
    config:
      default_model: claude-sonnet-4-5

Configuration

Core Options

Option Type Default Description
api_key string $ANTHROPIC_API_KEY API authentication key
base_url string (none) Custom API endpoint (for proxies, local APIs)
default_model string claude-sonnet-4-5 Default model
max_tokens int 64000 Maximum response tokens
temperature float 0.7 Sampling temperature
timeout float 600.0 API timeout in seconds
priority int 100 Provider priority for selection

Feature Toggles

Option Type Default Description
use_streaming bool true Use streaming API (required for large contexts)
enable_prompt_caching bool true Enable prompt caching (90% cost reduction on cached tokens)
enable_web_search bool false Enable native web search tool
enable_1m_context bool true Enable 1M token context window (Sonnet and Opus only)
filtered bool true Filter to curated model list
beta_headers string/list (none) Anthropic beta header(s) for experimental features

Rate Limit & Retry

Option Type Default Description
max_retries int 5 Total retry attempts before failing
retry_jitter bool/float true Jitter fraction (0.0–1.0). Also accepts true (→ 0.2) or false (→ 0.0) for backward compatibility
max_retry_delay float 60.0 Maximum wait between retries (seconds)
min_retry_delay float 1.0 Minimum delay if no retry-after header
overloaded_delay_multiplier float 10.0 Multiplier applied to delays for 529 Overloaded errors
throttle_threshold float 0.02 Pre-emptive throttle trigger (2% capacity remaining)
throttle_delay float 1.0 Throttle delay when approaching limits (seconds)
max_concurrent_requests int 5 Process-wide in-flight API call limit (0 to disable)
rate_limit_state_path string ~/.amplifier/rate-limit-state.json Path for cross-process rate-limit state sharing (empty string to disable)

Debug Options

Option Type Default Description
raw bool false Include raw payload in provider events

Overload Fallback

Option Type Default Description
fallback_on_overload bool false Temporarily downgrade to a lower-tier model when a higher-tier model stays overloaded
fallback_retry_count int 1 Overload retries before triggering downgrade
fallback_cooldown_seconds float 1800.0 Duration (seconds) the downgrade window stays active before retrying the higher model
fallback_sonnet_model string claude-sonnet-4-6 Model to use when Opus is overloaded
fallback_haiku_model string claude-haiku-4-5 Model to use when Sonnet is overloaded
persist_fallback_state bool false Persist temporary downgrade state across separate Amplifier processes
fallback_state_path string ~/.amplifier/anthropic-fallback-state.json Path for cross-process fallback state file (only used when persist_fallback_state is true)

Supported Models

Model Description
claude-sonnet-4-5 Claude Sonnet 4.5 (recommended, default)
claude-opus-4-6 Claude Opus 4.6 (most capable)
claude-haiku-4-5 Claude Haiku 4.5 (fastest, cheapest)

Extended Thinking

Enable extended thinking for complex reasoning tasks:

providers:
  - module: provider-anthropic
    config:
      default_model: claude-sonnet-4-5

Then in your request:

request = ChatRequest(
    model="claude-sonnet-4-5",
    messages=[...],
    enable_thinking=True
)

The response will include thinking blocks showing the model's reasoning process.

Prompt Caching

Anthropic's prompt caching reduces costs by 90% for repeated content. Enable it:

providers:
  - module: provider-anthropic
    config:
      enable_prompt_caching: true

The provider automatically marks eligible content for caching based on message structure.

Enable native web search (Claude 4+ models only):

providers:
  - module: provider-anthropic
    config:
      enable_web_search: true

The model can then search the web directly during conversations.

Beta Headers

Anthropic provides experimental features through beta headers. Enable these features by adding the beta_headers configuration field.

Single beta header:

providers:
  - module: provider-anthropic
    config:
      default_model: claude-sonnet-4-5
      beta_headers: "context-1m-2025-08-07"  # Enable 1M token context window

Multiple beta headers:

providers:
  - module: provider-anthropic
    config:
      default_model: claude-sonnet-4-5
      beta_headers:
        - "context-1m-2025-08-07"
        - "future-feature-header"

Notes: - Beta features are experimental and subject to change - Check Anthropic's documentation for available beta headers - Invalid beta headers will cause API errors (fail fast) - Beta header usage is logged at initialization for observability

Rate Limit Management

The provider tracks rate limits from response headers and pre-emptively throttles when approaching limits:

providers:
  - module: provider-anthropic
    config:
      throttle_threshold: 0.02  # Throttle at 2% capacity remaining
      throttle_delay: 1.0  # Fallback sleep when no reset timestamp available

Events: - provider:throttle - Emitted when pre-emptive throttling occurs - provider:retry - Emitted before each retry attempt

Overload Fallback

When fallback_on_overload is enabled, the provider temporarily routes requests to a lower-tier model if the higher-tier model remains overloaded after fallback_retry_count attempts. The downgrade window stays active for fallback_cooldown_seconds before automatically retrying the higher model.

providers:
  - module: provider-anthropic
    config:
      fallback_on_overload: true
      fallback_retry_count: 1
      fallback_cooldown_seconds: 1800

Events: - provider:fallback_open - Emitted when a temporary downgrade window opens - provider:fallback_active - Emitted on each request served through an active downgrade window

Retry Configuration

providers:
  - module: provider-anthropic
    config:
      max_retries: 5
      min_retry_delay: 1.0
      max_retry_delay: 60.0
      retry_jitter: 0.2

The provider uses exponential backoff with jitter and honors retry-after headers from the API.

Error Handling

The provider translates Anthropic SDK exceptions to kernel errors:

SDK Exception Condition Kernel Error Retryable
RateLimitError 429 RateLimitError Yes
OverloadedError 529 ProviderUnavailableError Yes (10× backoff)
APIStatusError 5xx (status >= 500) ProviderUnavailableError Yes
AuthenticationError 401 AuthenticationError No
BadRequestError context length / too many tokens ContextLengthError No
BadRequestError safety / content filter / blocked ContentFilterError No
BadRequestError other InvalidRequestError No
APIStatusError 403 AccessDeniedError No
APIStatusError 404 NotFoundError No
APIStatusError other non-5xx LLMError No
asyncio.TimeoutError LLMTimeoutError Yes
Other LLMError Yes

Debug Events

The provider emits events on every request and response. Enable the raw option to include the complete unmodified API payload in those events:

providers:
  - module: provider-anthropic
    config:
      raw: true  # Include raw API payload in provider events

Events: - llm:request - Request details (message counts, model, params; includes raw payload if raw: true) - llm:response - Response details (usage, stop reason; includes raw payload if raw: true)

Environment Variables

export ANTHROPIC_API_KEY="your-api-key-here"
export ANTHROPIC_BASE_URL="https://api.anthropic.com"  # Optional: override API endpoint

Example Configuration

providers:
  - module: provider-anthropic
    source: git+https://github.com/microsoft/amplifier-module-provider-anthropic@main
    config:
      api_key: ${ANTHROPIC_API_KEY}
      default_model: claude-sonnet-4-5
      max_tokens: 8192
      temperature: 1.0
      enable_prompt_caching: true
      enable_web_search: false
      max_retries: 5
      retry_jitter: 0.2