Social Reasoning Bench

Evaluate the social reasoning capabilities of LLM agents in multi-party environments.

Get started

Install srbench and start evaluating models.

Read the blog

Our motivation, methodology, and findings.

Browse the results

Peruse the raw data from our experiments.

Quick Start

Evaluate the social reasoning ability of your own LLM. For this example, we assume your LLM is served as my-model via vLLM on localhost.

bash

# 1. Clone and install
git clone https://github.com/microsoft/social-reasoning-bench.git srbench
cd srbench
uv sync --all-packages --all-groups --all-extras
source .venv/bin/activate

# 2. Set up env vars. Gemini is used to reproduce our results.
export GEMINI_API_KEY=<your api key>

# 3. Run the v0.1.0 experiment sweep with your model as the assistant
srbench experiment experiments/v0.1.0 \
    --output-base outputs/my-model \
    --assistant-model openai/my-model \
    --assistant-base-url http://localhost:8000/v1 \
    --assistant-api-key none
    # To test only a few examples per experiment in the sweep, also pass:
    # --set limit=10

# 4. View the results, pre-loaded with your run
srbench dashboard outputs/my-model

See Installation, Experiments, and LLMs for detailed instructions.
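
Before running step 3, it can help to confirm that the local server is actually serving the model under the name you pass to `--assistant-model`. The sketch below assumes the vLLM OpenAI-compatible server from the example above, which lists served models at `/v1/models`. The helper names (`models_url`, `has_model`, `check_model`) and the `BASE_URL` and `MODEL` variables are illustrative and are not part of srbench.

```shell
#!/usr/bin/env bash
# Pre-flight check (sketch): confirm the local OpenAI-compatible server lists the model.
# BASE_URL and MODEL mirror the Quick Start values; adjust them to your setup.
BASE_URL="${BASE_URL:-http://localhost:8000/v1}"
MODEL="${MODEL:-my-model}"

# Build the model-listing URL, tolerating a trailing slash on the base URL.
models_url() {
    printf '%s/models\n' "${1%/}"
}

# Succeed if the JSON on stdin contains a model entry with the given id.
# Plain grep keeps this dependency-free; the id is treated as a regex,
# so a JSON parser such as jq is more robust for unusual model names.
has_model() {
    grep -Eq "\"id\"[[:space:]]*:[[:space:]]*\"$1\""
}

# Query the server and report whether the model is being served.
check_model() {
    if curl -fsS "$(models_url "$BASE_URL")" | has_model "$MODEL"; then
        echo "ok: $MODEL is served at $BASE_URL"
    else
        echo "error: $MODEL not found at $BASE_URL" >&2
        return 1
    fi
}
```

After sourcing these functions, run `check_model`. If it reports an error, check that the server is running and that the served model name matches the part after `openai/` in `--assistant-model`.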
