# LLMs

srbench talks to OpenAI, Anthropic, Gemini, and Azure through a single client. This page covers everything you need to point it at the right provider.

## Provider keys

Set the API key for whichever providers you plan to use in your .env:

| Provider | Environment variable | Sign up |
| --- | --- | --- |
| OpenAI | `OPENAI_API_KEY` | platform.openai.com |
| Anthropic | `ANTHROPIC_API_KEY` | console.anthropic.com |
| Gemini | `GEMINI_API_KEY` | aistudio.google.com |
| Azure pool | `SRBENCH_AZURE_POOL_PATH` | See [Azure pool](#azure-pool) below |

You only need keys for the providers you actually plan to use.
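
For example, a minimal .env that enables the OpenAI and Anthropic providers might look like this (the key values are placeholders, substitute your own):

```bash
# .env: placeholder values, not real keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
```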

## Model name format

The `--model` flag is the same in every case — the prefix tells srbench which provider to route to:

| Provider | Format | Example |
| --- | --- | --- |
| OpenAI | `gpt-*` or `openai/gpt-*` | `gpt-4.1` |
| Anthropic | `claude-*` or `anthropic/claude-*` | `claude-sonnet-4-5` |
| Gemini | `gemini-*` or `gemini/gemini-*` | `gemini-2.5-flash` |
| Azure pool | `azure_pool/<deployment>` | `azure_pool/gpt-4.1` |

```bash
srbench benchmark calendar --model claude-sonnet-4-5 ...
srbench benchmark calendar --model azure_pool/gpt-4.1 ...
```

## Bring your own model

Any OpenAI-compatible server — vLLM, Ollama, LM Studio, a private gateway — works through the OpenAI provider. Two pieces:

1. Use the `openai/<your-model-name>` prefix so srbench routes through the OpenAI client without trying to match it against known `gpt-*` aliases.
2. Pass `--base-url` to point that client at your server.

```bash
srbench benchmark calendar \
    --model openai/meta-llama/Llama-3.1-70B-Instruct \
    --base-url http://localhost:8000/v1 \
    --data data/calendar-scheduling/small.yaml \
    --limit 2
```

`OPENAI_API_KEY` must still be set — most local servers ignore the value, so any non-empty string (e.g. `OPENAI_API_KEY=dummy`) is fine.

If you want different roles to hit different endpoints (e.g. an assistant model on vLLM and a judge on hosted OpenAI), use the per-role variants instead of the global `--base-url`: `--assistant-base-url`, `--requestor-base-url`, and `--judge-base-url` for calendar; `--buyer-base-url`, `--seller-base-url`, and `--judge-base-url` for marketplace.
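
A sketch of a split setup using those per-role flags — both endpoints and the model name are placeholders, and the remaining arguments are elided:

```bash
# Hypothetical endpoints: assistant calls hit a local vLLM server,
# judge calls hit a separate OpenAI-compatible gateway.
srbench benchmark calendar \
    --model openai/meta-llama/Llama-3.1-70B-Instruct \
    --assistant-base-url http://localhost:8000/v1 \
    --judge-base-url https://gateway.example.com/v1 \
    ...
```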

## Azure pool

The Azure pool provider load-balances requests across multiple Azure OpenAI deployments — useful when one deployment's quota isn't enough for a long sweep.

Populate the pool config from your Azure subscription:

```bash
srbench llm azure-pool populate --output-dir configs/azure_pool
```

Then point `SRBENCH_AZURE_POOL_PATH` at the output directory in .env:

```bash
SRBENCH_AZURE_POOL_PATH=configs/azure_pool
```

After that, any model name prefixed with `azure_pool/` (e.g. `azure_pool/gpt-4.1`) will be routed across all populated deployments for that model.

## Concurrency

srbench enforces per-provider concurrency limits to avoid hammering rate limits. There are two knobs:

| Variable | What it controls |
| --- | --- |
| `SRBENCH_LLM_SIZE` | Max concurrent calls per provider, total |
| `SRBENCH_LLM_TASK_SIZE` | Max concurrent calls per task |

Set defaults in .env:

```bash
SRBENCH_LLM_SIZE=64
SRBENCH_LLM_TASK_SIZE=5
```

Or override per run on the command line — these flags work on both `srbench benchmark` and `srbench experiment`:

```bash
srbench experiment experiments/v0.1.0 \
    --batch-size 200 \
    --task-concurrency 5 \
    --llm-concurrency 64
```

| Flag | What it maps to |
| --- | --- |
| `--llm-concurrency` | `SRBENCH_LLM_SIZE` |
| `--task-concurrency` | `SRBENCH_LLM_TASK_SIZE` |
| `--batch-size` | Tasks executing in parallel |

### Per-provider overrides

You can override either limit for a specific provider with a suffix:

```bash
SRBENCH_LLM_SIZE_OPENAI=128
SRBENCH_LLM_SIZE_ANTHROPIC=20
SRBENCH_LLM_TASK_SIZE_AZURE=3
```
Provider keys: `openai`, `anthropic`, `google`, `azure` (uppercased when used as a suffix).

### Tuning rule of thumb

- Start with whatever rate limit your provider enforces and set `SRBENCH_LLM_SIZE` a little under it.
- If you see lots of timeouts or 429s, lower it.
- If utilization looks low and the provider isn't rate-limiting you, raise it.
- `SRBENCH_LLM_TASK_SIZE` only matters when each task makes many parallel LLM calls (e.g. with high judge vote counts). Leave it unset if unsure.
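
As an illustration, a run that leans heavily on OpenAI but only lightly on Anthropic might use a split like this (the numbers are placeholders, not recommendations — tune against your own quotas):

```bash
# .env: hypothetical limits for one particular quota situation
SRBENCH_LLM_SIZE=32              # conservative default for everything else
SRBENCH_LLM_SIZE_OPENAI=128      # generous OpenAI quota
SRBENCH_LLM_SIZE_ANTHROPIC=16    # tight Anthropic rate limit
```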
