Get started
Install srbench and start evaluating models.
Evaluate the social reasoning capabilities of LLM agents in multi-party environments.
Evaluate the social reasoning ability of your own LLM. For this example, we'll assume your LLM is served as my-model via vLLM on localhost.
# 1. Clone and install
git clone https://github.com/microsoft/social-reasoning-bench.git srbench
cd srbench
uv sync --all-packages --all-groups --all-extras
source .venv/bin/activate
# 2. Set up env vars. Gemini is used to reproduce our results.
export GEMINI_API_KEY="<your api key>"
# 3. Run the v0.1.0 experiment sweep with your model as the assistant
srbench experiment experiments/v0.1.0 \
--output-base outputs/my-model \
--assistant-model openai/my-model \
--assistant-base-url http://localhost:8000/v1 \
--assistant-api-key none
# To test only a few examples per experiment in the sweep, append to the command above:
# --set limit=10
# 4. View the results, pre-loaded with your run
srbench dashboard outputs/my-model
See Installation, Experiments, and LLMs for detailed instructions.
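Before launching the sweep in step 3, it can help to confirm that the local model server is reachable. The sketch below is not part of srbench: `models_url` and `check_endpoint` are hypothetical helper names, and it assumes the server exposes the standard OpenAI-compatible model-listing route under the base URL used above (as vLLM's OpenAI-compatible server does) and that `curl` is installed.

```shell
# Hypothetical helper (not part of srbench): build the model-listing URL
# from an OpenAI-compatible base URL, tolerating one trailing slash.
models_url() {
  printf '%s/models\n' "${1%/}"
}

# Return 0 if the endpoint answers the model-listing request within
# 5 seconds, non-zero otherwise. Assumes no API key is required,
# matching `--assistant-api-key none` in step 3.
check_endpoint() {
  curl -fsS --max-time 5 "$(models_url "$1")" >/dev/null
}

# Usage, with the base URL from step 3:
#   check_endpoint http://localhost:8000/v1 && echo "endpoint is up"
```

If the check fails, the sweep would most likely fail on its first request as well, so it is cheaper to fix the server address or port before starting the run.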