Benchmark Scenarios

Benchmark scenarios are a subset of scenarios that compare the effectiveness of attacks along an axis that varies within the scenario itself. Many axes are possible, but currently the only benchmark variant is the adversarial benchmark, which varies the adversarial model used in attacks.

Adversarial Benchmark

The adversarial benchmarking scenario (AdversarialBenchmark) compares how effectively different adversarial models execute attacks against a target model.
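Conceptually, the comparison can be pictured as a grid in which every adversarial model is paired with every attack technique and every objective. The sketch below only enumerates such a grid in plain Python and does not call PyRIT. The labels, the `build_run_matrix` helper, and the assumption that each combination yields exactly one attack result are illustrative, not taken from the library.

```python
from itertools import product

# Hypothetical placeholder labels; these are not real PyRIT identifiers.
adversarial_models = ["model_a", "model_b"]
techniques = ["technique_1", "technique_2", "technique_3"]
objectives = [f"objective_{i}" for i in range(1, 9)]


def build_run_matrix(models, techniques, objectives):
    """Return every (model, technique, objective) combination.

    Assumption: one attack result per combination.
    """
    return list(product(models, techniques, objectives))


runs = build_run_matrix(adversarial_models, techniques, objectives)
print(len(runs))  # 2 models * 3 techniques * 8 objectives = 48
```

Under this assumption, a single adversarial model with three techniques and eight objectives would produce 24 combinations.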

from pyrit.output import output_scenario_async
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.scenario.scenarios.benchmark import AdversarialBenchmark
from pyrit.setup import IN_MEMORY, initialize_pyrit_async
from pyrit.setup.initializers import LoadDefaultDatasets

await initialize_pyrit_async(memory_db_type=IN_MEMORY, initializers=[LoadDefaultDatasets()])  # type: ignore

# Pass any number of adversarial PromptTarget instances (with chat-target
# capabilities — multi-turn and editable history) as a list; AdversarialBenchmark
# infers a label for each from its identifier and runs every benchmark-friendly
# attack technique against the objective target with each adversarial model.
adversarial_model = OpenAIChatTarget()

benchmark_scenario = AdversarialBenchmark(adversarial_models=[adversarial_model])

await benchmark_scenario.initialize_async(  # type: ignore
    objective_target=OpenAIChatTarget(), max_concurrency=2
)

baseline_result = await benchmark_scenario.run_async()  # type: ignore

# Resume handle: re-run with `AdversarialBenchmark(..., scenario_result_id=<this id>)` to pick
# up where this run left off (constructor args must match the original run).
print(f"Scenario result id: {baseline_result.id}")


await output_scenario_async(baseline_result)

Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']
Loaded environment file: ./.pyrit/.env
Loaded environment file: ./.pyrit/.env.local

Scenario result id: e01b35d2-c7f8-49bd-aafd-5d44ef7235f4

====================================================================================================
                              📊 SCENARIO RESULTS: AdversarialBenchmark                              
====================================================================================================

▼ Scenario Information
────────────────────────────────────────────────────────────────────────────────────────────────────
  📋 Scenario Details
    • Name: AdversarialBenchmark
    • Scenario Version: 1
    • PyRIT Version: 0.14.0.dev0
    • Description:
        Benchmarking scenario that compares the attack success rate (ASR) of several different adversarial models.

  🎯 Target Information
    • Target Type: OpenAIChatTarget
    • Target Model: gpt-4o-japan-nilfilter
    • Target Endpoint: https://pyrit-japan-test.openai.azure.com/openai/v1

  📊 Scorer Information
    ▸ Scorer Identifier
      • Scorer Type: TrueFalseInverterScorer
      • scorer_type: true_false
      • score_aggregator: OR_
        └─ Composite of 1 scorer(s):
            • Scorer Type: SelfAskRefusalScorer
            • scorer_type: true_false
            • score_aggregator: OR_
            • model_name: gpt-4o-japan-nilfilter

    ▸ Performance Metrics
      Official evaluation has not been run yet for this specific configuration

▼ Overall Statistics
────────────────────────────────────────────────────────────────────────────────────────────────────
  📈 Summary
    • Total Strategies: 3
    • Total Attack Results: 24
    • Overall Success Rate: 25%
    • Unique Objectives: 8

▼ Per-Group Breakdown
────────────────────────────────────────────────────────────────────────────────────────────────────

  🔸 Group: gpt-4o-japan-nilfilter
    • Number of Results: 24
    • Success Rate: 25%

====================================================================================================
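
The per-group success rate in the report can be read as the number of successful attack results divided by the total number of results for that group. The sketch below illustrates that arithmetic in plain Python. The `AttackOutcome` record and `success_rate_by_group` helper are hypothetical and are not PyRIT types, and the data is made up to resemble the shape of the run above (24 results, of which 6 are marked successful to give 25%).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackOutcome:
    """Hypothetical record of one attack result; not a PyRIT type."""

    group: str       # label of the adversarial model that drove the attack
    succeeded: bool  # whether the objective scorer judged the attack a success


def success_rate_by_group(outcomes):
    """Return {group: successes / total} for each group in the outcomes."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for outcome in outcomes:
        totals[outcome.group] += 1
        successes[outcome.group] += int(outcome.succeeded)
    return {group: successes[group] / totals[group] for group in totals}


# Made-up data: 24 results for one group, the first 6 marked successful.
outcomes = [AttackOutcome("gpt-4o-japan-nilfilter", i < 6) for i in range(24)]
rates = success_rate_by_group(outcomes)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%}")  # gpt-4o-japan-nilfilter: 25%
```

With several adversarial models in the benchmark, the same grouping would produce one rate per model, which is the comparison the scenario is designed to surface.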