
Social Reasoning Bench

Evaluate the social reasoning capabilities of LLM agents in multi-party environments.

Quick Start

Evaluate the social reasoning ability of your own LLM. As an example, we assume your LLM is served as my-model by vLLM on localhost.
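Before launching the sweep, it can help to confirm that the server answers and lists your model. The helper below is a hypothetical sketch, not part of srbench. It assumes the server exposes vLLM's OpenAI-compatible GET /models route under the same base URL you pass as --assistant-base-url, and that curl is installed.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of srbench).
# Assumes the model server was started with something like
#   vllm serve <hf-model-id> --served-model-name my-model
# which listens on port 8000 by default and exposes GET /v1/models.

# check_endpoint BASE_URL MODEL
# Returns 0 if BASE_URL/models answers and lists MODEL, 1 otherwise.
check_endpoint() {
  local base_url="$1" model="$2"
  local body

  # --fail makes curl return non-zero on HTTP errors;
  # --max-time bounds the wait if nothing is listening.
  if ! body="$(curl --silent --fail --max-time 5 "${base_url}/models")"; then
    echo "No OpenAI-compatible server answered at ${base_url}" >&2
    return 1
  fi

  # The listing is JSON; a fixed-string match on the quoted name is a
  # rough check that MODEL appears among the reported model ids.
  if ! printf '%s\n' "${body}" | grep -F -- "\"${model}\"" >/dev/null; then
    echo "Server at ${base_url} is up but does not list ${model}" >&2
    return 1
  fi

  echo "OK: ${model} is served at ${base_url}"
}

# Usage:
#   check_endpoint http://localhost:8000/v1 my-model
```

The name passed to check_endpoint should match the part of --assistant-model after the openai/ prefix, since that is the name the server was told to answer to.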

```bash
# 1. Clone and install
git clone https://github.com/microsoft/social-reasoning-bench.git srbench
cd srbench
uv sync --all-packages --all-groups --all-extras
source .venv/bin/activate

# 2. Set environment variables. Gemini is used to reproduce our results,
#    so export your Gemini API key.
export GEMINI_API_KEY=<your api key>

# 3. Run the v0.1.0 experiment sweep with your model as the assistant
srbench experiment experiments/v0.1.0 \
    --output-base outputs/my-model \
    --assistant-model openai/my-model \
    --assistant-base-url http://localhost:8000/v1 \
    --assistant-api-key none
# To test just a few examples per experiment in the sweep,
# add this option to the command above: --set limit=10

# 4. View the results, pre-loaded with your run
srbench dashboard outputs/my-model
```
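The Gemini key from step 2 only reaches srbench if it is exported: a plain NAME=value assignment creates a shell variable that child processes such as srbench do not inherit. The hypothetical helper below (not part of srbench) checks this by asking a child shell whether it can see the key. It assumes srbench reads GEMINI_API_KEY from its process environment.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of srbench).
# Assumes srbench reads GEMINI_API_KEY from its process environment,
# which only contains variables the parent shell has exported.

check_gemini_key() {
  # A child shell inherits exported variables only, so it sees the
  # key exactly when srbench (another child process) would.
  if sh -c '[ -n "${GEMINI_API_KEY:-}" ]'; then
    echo "GEMINI_API_KEY is exported"
  else
    echo "GEMINI_API_KEY is not exported; run: export GEMINI_API_KEY=<your api key>" >&2
    return 1
  fi
}

# Usage, after sourcing this file into the shell you will run srbench from:
#   check_gemini_key
```

Sourcing the file (rather than executing it) matters here: the function then runs in your interactive shell, so the check reflects exactly what srbench will inherit when launched from that shell.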

See Installation, Experiments, and LLMs for detailed instructions.

Released under the MIT License.