Benchmark scenarios are a category of scenario that compares the effectiveness of attacks along an axis that varies within the scenario itself. The axis could be many things; currently, the only benchmark is the adversarial benchmark, whose axis is the adversarial model used to carry out the attacks.
Adversarial Benchmark
The adversarial benchmark scenario (AdversarialBenchmark) compares how effectively different adversarial models execute attacks against the same objective target, reporting the attack success rate (ASR) for each adversarial model.
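Conceptually, the comparison comes down to grouping attack outcomes by adversarial model and computing each group's attack success rate (ASR = successful attacks / total attacks). The plain-Python sketch below illustrates only that aggregation step. The AttackOutcome record, its fields, the helper functions, and the sample data are hypothetical stand-ins, not PyRIT's result types or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttackOutcome:
    # Hypothetical record for illustration; not a PyRIT type.
    adversarial_model: str  # label of the adversarial model used (the benchmark axis)
    technique: str          # attack technique / strategy name
    objective: str          # objective the attack tried to achieve
    success: bool           # whether the objective scorer judged the attack successful


def asr_by_model(outcomes: list[AttackOutcome]) -> dict[str, float]:
    """Return the attack success rate (successes / attempts) for each adversarial model."""
    attempts: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for outcome in outcomes:
        attempts[outcome.adversarial_model] += 1
        successes[outcome.adversarial_model] += int(outcome.success)
    return {model: successes[model] / attempts[model] for model in attempts}


def rank_models(outcomes: list[AttackOutcome]) -> list[tuple[str, float]]:
    """Sort adversarial models from most to least effective by ASR."""
    return sorted(asr_by_model(outcomes).items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data for two hypothetical adversarial models.
    sample = [
        AttackOutcome("model-a", "technique-1", "objective-1", True),
        AttackOutcome("model-a", "technique-1", "objective-2", False),
        AttackOutcome("model-b", "technique-1", "objective-1", False),
        AttackOutcome("model-b", "technique-1", "objective-2", False),
    ]
    for model, asr in rank_models(sample):
        print(f"{model}: {asr:.0%}")
```

With the sample data, this prints model-a: 50% followed by model-b: 0%. Grouping on the adversarial model label is what makes it the benchmark axis; any other axis would simply be a different grouping key.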
from pyrit.prompt_target import OpenAIChatTarget
from pyrit.scenario.printer.console_printer import ConsoleScenarioResultPrinter
from pyrit.scenario.scenarios.benchmark import AdversarialBenchmark
from pyrit.setup import IN_MEMORY, initialize_pyrit_async
from pyrit.setup.initializers import LoadDefaultDatasets
await initialize_pyrit_async(memory_db_type=IN_MEMORY, initializers=[LoadDefaultDatasets()]) # type: ignore
# Pass any number of adversarial PromptChatTargets as a list; AdversarialBenchmark
# infers a label for each from its identifier and runs every benchmark-friendly
# attack technique against the objective target with each adversarial model.
adversarial_model = OpenAIChatTarget()
benchmark_scenario = AdversarialBenchmark(adversarial_models=[adversarial_model])
await benchmark_scenario.initialize_async( # type: ignore
objective_target=OpenAIChatTarget(), max_concurrency=2
)
baseline_result = await benchmark_scenario.run_async() # type: ignore
# Resume handle: re-run with `AdversarialBenchmark(..., scenario_result_id=<this id>)` to pick
# up where this run left off (constructor args must match the original run).
print(f"Scenario result id: {baseline_result.id}")
printer = ConsoleScenarioResultPrinter()
await printer.print_summary_async(baseline_result) # type: ignore
Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']
Loaded environment file: ./.pyrit/.env
Loaded environment file: ./.pyrit/.env.local
Loading...
Scenario result id: e01b35d2-c7f8-49bd-aafd-5d44ef7235f4
====================================================================================================
SCENARIO RESULTS: AdversarialBenchmark
====================================================================================================
▼ Scenario Information
────────────────────────────────────────────────────────────────────────────────────────────────────
Scenario Details
  • Name: AdversarialBenchmark
  • Scenario Version: 1
  • PyRIT Version: 0.14.0.dev0
  • Description:
    Benchmarking scenario that compares the attack success rate (ASR) of several different adversarial models.
Target Information
  • Target Type: OpenAIChatTarget
  • Target Model: gpt-4o-japan-nilfilter
  • Target Endpoint: https://pyrit-japan-test.openai.azure.com/openai/v1
Scorer Information
  ▸ Scorer Identifier
    • Scorer Type: TrueFalseInverterScorer
    • scorer_type: true_false
    • score_aggregator: OR_
    └─ Composite of 1 scorer(s):
        • Scorer Type: SelfAskRefusalScorer
        • scorer_type: true_false
        • score_aggregator: OR_
        • model_name: gpt-4o-japan-nilfilter
  ▸ Performance Metrics
    Official evaluation has not been run yet for this specific configuration
▼ Overall Statistics
────────────────────────────────────────────────────────────────────────────────────────────────────
Summary
  • Total Strategies: 3
  • Total Attack Results: 24
  • Overall Success Rate: 25%
  • Unique Objectives: 8
▼ Per-Group Breakdown
────────────────────────────────────────────────────────────────────────────────────────────────────
Group: gpt-4o-japan-nilfilter
  • Number of Results: 24
  • Success Rate: 25%
====================================================================================================
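In the scorer configuration shown, a TrueFalseInverterScorer wraps a single SelfAskRefusalScorer, so an attack is presumably counted as successful when the target's response is not judged a refusal. The plain-Python sketch below illustrates that inversion, the OR_ aggregation, and how the summary figures fit together. The function names are hypothetical, and it assumes each (adversarial model, strategy, objective) combination produces exactly one attack result; that assumption matches the reported totals (1 × 3 × 8 = 24) but is not confirmed by the output itself. A 25% success rate on 24 results corresponds to 6 successful attacks.

```python
def or_aggregate(values: list[bool]) -> bool:
    """OR aggregation: the combined verdict is true if any constituent verdict is true."""
    return any(values)


def objective_achieved(refusal_verdicts: list[bool]) -> bool:
    """Invert the OR-aggregated refusal verdict: no refusal counts as success.

    Illustrates the idea of an inverter wrapped around a refusal scorer;
    this is not PyRIT's implementation.
    """
    return not or_aggregate(refusal_verdicts)


def expected_result_count(num_models: int, num_strategies: int, num_objectives: int) -> int:
    """Assumption: every (model, strategy, objective) combination yields one attack result."""
    return num_models * num_strategies * num_objectives


def success_rate(verdicts: list[bool]) -> float:
    """Fraction of attack results judged successful."""
    return sum(verdicts) / len(verdicts)


if __name__ == "__main__":
    # Counts taken from the summary above: 1 adversarial model, 3 strategies, 8 objectives.
    total = expected_result_count(num_models=1, num_strategies=3, num_objectives=8)
    print(total)  # 24

    # Illustrative verdicts only: 6 non-refusals out of 24 reproduce the reported 25% rate.
    refusal_verdicts = [[False]] * 6 + [[True]] * (total - 6)
    outcomes = [objective_achieved(verdicts) for verdicts in refusal_verdicts]
    print(f"{success_rate(outcomes):.0%}")  # 25%
```

Under the same assumption, adding a second adversarial model to the adversarial_models list would double the number of attack results and produce a second group in the per-group breakdown.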