Anthropic Provider¶
Integrates Anthropic's Claude models into Amplifier. Supports streaming, tool calling, extended thinking, and ChatRequest format.
Module ID¶
provider-anthropic
Installation¶
providers:
- module: provider-anthropic
source: git+https://github.com/microsoft/amplifier-module-provider-anthropic@main
config:
default_model: claude-sonnet-4-5
Configuration¶
Core Options¶
| Option | Type | Default | Description |
|---|---|---|---|
api_key | string | $ANTHROPIC_API_KEY | API authentication key |
base_url | string | (none) | Custom API endpoint (for proxies, local APIs) |
default_model | string | claude-sonnet-4-5 | Default model |
max_tokens | int | 64000 | Maximum response tokens |
temperature | float | 0.7 | Sampling temperature |
timeout | float | 600.0 | API timeout in seconds |
priority | int | 100 | Provider priority for selection |
Feature Toggles¶
| Option | Type | Default | Description |
|---|---|---|---|
use_streaming | bool | true | Use streaming API (required for large contexts) |
enable_prompt_caching | bool | true | Enable prompt caching (90% cost reduction on cached tokens) |
enable_web_search | bool | false | Enable native web search tool |
enable_1m_context | bool | true | Enable 1M token context window (Sonnet and Opus only) |
filtered | bool | true | Filter to curated model list |
beta_headers | string/list | (none) | Anthropic beta header(s) for experimental features |
Rate Limit & Retry¶
| Option | Type | Default | Description |
|---|---|---|---|
max_retries | int | 5 | Total retry attempts before failing |
retry_jitter | bool/float | true | Jitter fraction (0.0–1.0). Also accepts true (→ 0.2) or false (→ 0.0) for backward compatibility |
max_retry_delay | float | 60.0 | Maximum wait between retries (seconds) |
min_retry_delay | float | 1.0 | Minimum delay if no retry-after header |
overloaded_delay_multiplier | float | 10.0 | Multiplier applied to delays for 529 Overloaded errors |
throttle_threshold | float | 0.02 | Pre-emptive throttle trigger (2% capacity remaining) |
throttle_delay | float | 1.0 | Throttle delay when approaching limits (seconds) |
max_concurrent_requests | int | 5 | Process-wide in-flight API call limit (0 to disable) |
rate_limit_state_path | string | ~/.amplifier/rate-limit-state.json | Path for cross-process rate-limit state sharing (empty string to disable) |
Debug Options¶
| Option | Type | Default | Description |
|---|---|---|---|
raw | bool | false | Include raw payload in provider events |
Overload Fallback¶
| Option | Type | Default | Description |
|---|---|---|---|
fallback_on_overload | bool | false | Temporarily downgrade to a lower-tier model when a higher-tier model stays overloaded |
fallback_retry_count | int | 1 | Overload retries before triggering downgrade |
fallback_cooldown_seconds | float | 1800.0 | Duration (seconds) the downgrade window stays active before retrying the higher model |
fallback_sonnet_model | string | claude-sonnet-4-6 | Model to use when Opus is overloaded |
fallback_haiku_model | string | claude-haiku-4-5 | Model to use when Sonnet is overloaded |
persist_fallback_state | bool | false | Persist temporary downgrade state across separate Amplifier processes |
fallback_state_path | string | ~/.amplifier/anthropic-fallback-state.json | Path for cross-process fallback state file (only used when persist_fallback_state is true) |
Supported Models¶
| Model | Description |
|---|---|
claude-sonnet-4-5 | Claude Sonnet 4.5 (recommended, default) |
claude-opus-4-6 | Claude Opus 4.6 (most capable) |
claude-haiku-4-5 | Claude Haiku 4.5 (fastest, cheapest) |
Extended Thinking¶
Enable extended thinking for complex reasoning tasks:
Then in your request:
The response will include thinking blocks showing the model's reasoning process.
Prompt Caching¶
Anthropic's prompt caching reduces costs by 90% for repeated content. Enable it:
The provider automatically marks eligible content for caching based on message structure.
Web Search¶
Enable native web search (Claude 4+ models only):
The model can then search the web directly during conversations.
Beta Headers¶
Anthropic provides experimental features through beta headers. Enable these features by adding the beta_headers configuration field.
Single beta header:
providers:
- module: provider-anthropic
config:
default_model: claude-sonnet-4-5
beta_headers: "context-1m-2025-08-07" # Enable 1M token context window
Multiple beta headers:
providers:
- module: provider-anthropic
config:
default_model: claude-sonnet-4-5
beta_headers:
- "context-1m-2025-08-07"
- "future-feature-header"
Notes: - Beta features are experimental and subject to change - Check Anthropic's documentation for available beta headers - Invalid beta headers will cause API errors (fail fast) - Beta header usage is logged at initialization for observability
Rate Limit Management¶
The provider tracks rate limits from response headers and pre-emptively throttles when approaching limits:
providers:
- module: provider-anthropic
config:
throttle_threshold: 0.02 # Throttle at 2% capacity remaining
throttle_delay: 1.0 # Fallback sleep when no reset timestamp available
Events: - provider:throttle - Emitted when pre-emptive throttling occurs - provider:retry - Emitted before each retry attempt
Overload Fallback¶
When fallback_on_overload is enabled, the provider temporarily routes requests to a lower-tier model if the higher-tier model remains overloaded after fallback_retry_count attempts. The downgrade window stays active for fallback_cooldown_seconds before automatically retrying the higher model.
providers:
- module: provider-anthropic
config:
fallback_on_overload: true
fallback_retry_count: 1
fallback_cooldown_seconds: 1800
Events: - provider:fallback_open - Emitted when a temporary downgrade window opens - provider:fallback_active - Emitted on each request served through an active downgrade window
Retry Configuration¶
providers:
- module: provider-anthropic
config:
max_retries: 5
min_retry_delay: 1.0
max_retry_delay: 60.0
retry_jitter: 0.2
The provider uses exponential backoff with jitter and honors retry-after headers from the API.
Error Handling¶
The provider translates Anthropic SDK exceptions to kernel errors:
| SDK Exception | Condition | Kernel Error | Retryable |
|---|---|---|---|
RateLimitError | 429 | RateLimitError | Yes |
OverloadedError | 529 | ProviderUnavailableError | Yes (10× backoff) |
APIStatusError | 5xx (status >= 500) | ProviderUnavailableError | Yes |
AuthenticationError | 401 | AuthenticationError | No |
BadRequestError | context length / too many tokens | ContextLengthError | No |
BadRequestError | safety / content filter / blocked | ContentFilterError | No |
BadRequestError | other | InvalidRequestError | No |
APIStatusError | 403 | AccessDeniedError | No |
APIStatusError | 404 | NotFoundError | No |
APIStatusError | other non-5xx | LLMError | No |
asyncio.TimeoutError | — | LLMTimeoutError | Yes |
| Other | — | LLMError | Yes |
Debug Events¶
The provider emits events on every request and response. Enable the raw option to include the complete unmodified API payload in those events:
providers:
- module: provider-anthropic
config:
raw: true # Include raw API payload in provider events
Events: - llm:request - Request details (message counts, model, params; includes raw payload if raw: true) - llm:response - Response details (usage, stop reason; includes raw payload if raw: true)
Environment Variables¶
export ANTHROPIC_API_KEY="your-api-key-here"
export ANTHROPIC_BASE_URL="https://api.anthropic.com" # Optional: override API endpoint
Example Configuration¶
providers:
- module: provider-anthropic
source: git+https://github.com/microsoft/amplifier-module-provider-anthropic@main
config:
api_key: ${ANTHROPIC_API_KEY}
default_model: claude-sonnet-4-5
max_tokens: 8192
temperature: 1.0
enable_prompt_caching: true
enable_web_search: false
max_retries: 5
retry_jitter: 0.2