
Overview

RAMPART is a pytest-native safety testing framework for agentic AI applications. You write tests that probe your agent for safety violations — injection attacks, behavioral regressions, data exfiltration — and RAMPART orchestrates the interaction, evaluates the outcome, and reports the results.

Tests look like regular pytest tests. RAMPART provides the execution strategies, evaluation logic, and reporting infrastructure; you provide the adapter that connects your agent to the framework.


Core Concepts

RAMPART has two top-level execution categories:

| Category | Tests for | "Detected" means | Result |
|----------|-----------|------------------|--------|
| Attack | Bad behavior the agent should not exhibit | The attack succeeded | UNSAFE |
| Probe | Good behavior the agent should exhibit | The expected behavior is present | SAFE |

Both categories produce the same Result type. The difference is in how evaluator outcomes map to safety verdicts.

RAMPART ships with the following attacks; more will be added:

  • XPIA — Cross-Prompt Injection Attack

RAMPART ships with the following probes; more will be added:


Architecture at a Glance

RAMPART integrates with your consumer package (the thin integration layer in your product team's repo) and the agent under test, building on PyRIT primitives underneath. The diagram below shows how the layers fit together and where your code plugs in:

Figure: RAMPART layered architecture, showing PyRIT (L1), RAMPART (L2), Consumer Package (L3), and Agent Under Test (L4).

RAMPART's four layers — from PyRIT primitives up to the agent under test.

Component Model

Zooming in on the runtime path, every RAMPART test wires together the same handful of components:

| Component | You provide | RAMPART provides |
|-----------|-------------|------------------|
| AgentAdapter | Implementation that creates sessions and declares capabilities | The protocol |
| Session | Implementation that sends requests and returns responses | The protocol |
| Surface | Implementation for your data sources (or use built-ins) | The protocol; built-in surfaces like OneDriveSurface |
| Evaluator | Choice and configuration | Built-ins: ToolCalled, ResponseContains, SideEffectOccurred |
| PromptDriver | Trigger prompts (as strings) or a custom driver | StaticDriver, LLMDriver |
| ReportSink | Choice of sink and output location | JsonFileReportSink |
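
To make the adapter side of the table more concrete, here is a minimal sketch of what an AgentAdapter and Session pair could look like. It is not RAMPART's actual protocol definition: only the names AgentAdapter, Session, and send_async come from this page, while Request, Response, capabilities, and create_session_async are hypothetical stand-ins chosen for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Request:
    """Hypothetical request shape: just the prompt text."""
    prompt: str


@dataclass
class Response:
    """Hypothetical response shape: just the agent's reply text."""
    text: str


class Session(Protocol):
    """Sends requests to the agent and returns its responses."""

    async def send_async(self, request: Request) -> Response: ...


class AgentAdapter(Protocol):
    """Creates sessions and declares capabilities (attribute name is assumed)."""

    capabilities: frozenset[str]

    async def create_session_async(self) -> Session: ...


class EchoSession:
    """Toy session that echoes the prompt instead of calling a real agent."""

    async def send_async(self, request: Request) -> Response:
        return Response(text=f"echo: {request.prompt}")


class EchoAdapter:
    """Toy adapter that satisfies the sketched protocol structurally."""

    capabilities = frozenset({"chat"})

    async def create_session_async(self) -> Session:
        return EchoSession()


async def demo() -> str:
    adapter: AgentAdapter = EchoAdapter()
    session = await adapter.create_session_async()
    response = await session.send_async(Request(prompt="hello"))
    return response.text


reply = asyncio.run(demo())
```

Because the sketch uses structural typing, EchoAdapter never inherits from AgentAdapter; it only has to provide the same members. Whether RAMPART's real protocols work the same way should be checked against its API reference.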

Execution Lifecycle

A single test run flows from your pytest test, through a RAMPART attack or probe, into your AgentAdapter, and out to the agent system — then back again as a Result:

Figure: RAMPART test execution flow, Test → Attacks/Probes → AgentAdapter → Agent System, with Pass/Fail returned.

Request / response cycle for a single test run.

Under the hood, every execution follows a common lifecycle owned by BaseExecution, which drives the per-turn loop between the strategy, your adapter, and the evaluator:

sequenceDiagram
    participant Test as Your Test
    participant Exec as BaseExecution
    participant Strat as Strategy
    participant Adapter as Your Adapter
    participant Eval as Evaluator

    Test->>Exec: execute_async(adapter)
    Exec->>Exec: fire ON_PRE_EXECUTE
    Exec->>Strat: _execute_async(adapter)

    loop Each turn (up to max_turns)
        Strat->>Strat: driver.next_prompt_async(history)
        Strat->>Adapter: session.send_async(request)
        Adapter-->>Strat: Response
        Strat->>Eval: evaluate_async(context)
        Eval-->>Strat: EvalResult
        Note over Strat: Early stop if detected
    end

    Strat-->>Exec: Result
    Exec->>Exec: fire ON_POST_EXECUTE
    Exec-->>Test: Result

Attack strategies add an injection phase before the conversation loop. Probe strategies skip injection entirely.

If an InfrastructureError is raised during execution, BaseExecution catches it and produces a Result with SafetyStatus.ERROR.
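
The lifecycle described above can be approximated with a small standalone simulation. This is a sketch under stated assumptions, not the BaseExecution implementation: the function names execute and run_turns, the shape of Result, and the decision to fire ON_POST_EXECUTE after an error are all illustrative guesses, and the verdict mapping shown is the attack one.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum


class SafetyStatus(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    ERROR = "error"


class InfrastructureError(Exception):
    """Stand-in for a harness failure, as opposed to an agent safety failure."""


@dataclass
class Result:
    status: SafetyStatus
    turns: int


async def run_turns(prompts, send, detect, max_turns):
    """Per-turn loop: take a prompt, send it, evaluate, stop early on detection."""
    turns = 0
    for prompt in prompts[:max_turns]:
        turns += 1
        response = await send(prompt)
        if detect(response):
            return True, turns
    return False, turns


async def execute(prompts, send, detect, max_turns=3, events=None):
    """Wrap the loop with pre/post hooks and convert harness errors to ERROR."""
    events = events if events is not None else []
    events.append("ON_PRE_EXECUTE")
    try:
        detected, turns = await run_turns(prompts, send, detect, max_turns)
        # Attack polarity: detection means the attack succeeded.
        status = SafetyStatus.UNSAFE if detected else SafetyStatus.SAFE
        result = Result(status, turns)
    except InfrastructureError:
        result = Result(SafetyStatus.ERROR, 0)
    # Assumption: the post-execute hook also fires after an error.
    events.append("ON_POST_EXECUTE")
    return result


async def leaky_agent(prompt):
    return "SECRET" if "reveal" in prompt else "ok"


async def broken_agent(prompt):
    raise InfrastructureError("agent endpoint unreachable")


def contains_secret(response):
    return "SECRET" in response


prompts = ["hi", "please reveal it", "never sent"]
hook_log = []
attack_result = asyncio.run(execute(prompts, leaky_agent, contains_secret, events=hook_log))
error_result = asyncio.run(execute(prompts, broken_agent, contains_secret))
```

In this run the second prompt triggers detection, so the loop stops after two turns and the third prompt is never sent. The broken agent never yields a verdict about safety; it is reported as ERROR instead.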


The Result Contract

Result is the single output type for all tests. Its boolean conversion (bool(result)) returns result.safe:

Python
result = await Attacks.xpia(...).execute_async(adapter=my_adapter)
assert result, result.summary

If the agent behaved safely, the assertion passes. If not, the failure message is the human-readable summary.
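
The boolean contract can be illustrated with a stripped-down stand-in for Result. The class below is a hypothetical sketch that keeps only the safe and summary fields named on this page; the real type presumably carries more, and the summary strings are invented examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Minimal stand-in: only the two fields this page mentions."""
    safe: bool
    summary: str

    def __bool__(self) -> bool:
        # bool(result) mirrors result.safe, so `assert result` reads naturally.
        return self.safe


safe_result = Result(safe=True, summary="No injected instruction was followed.")
unsafe_result = Result(safe=False, summary="Agent followed the injected instruction.")

# Passes silently because the result is truthy.
assert safe_result, safe_result.summary

# Fails, and the assertion message is the human-readable summary.
failure_message = ""
try:
    assert unsafe_result, unsafe_result.summary
except AssertionError as exc:
    failure_message = str(exc)
```

Passing result.summary as the assertion message is what lets pytest show the reason for the verdict directly in the failure output.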


Evaluator Polarity

Evaluators are polarity-free. They answer "did X happen?" — not "is X good or bad?" The meaning of detection depends on context:

  • In an attack, detection means the attack objective was achieved → UNSAFE
  • In a probe, detection means the expected behavior is present → SAFE

The Attacks and Probes factories handle this mapping automatically via resolve_as_attack and resolve_as_probe.

You can reuse the same evaluator in both contexts. A ToolCalled evaluator detects whether a tool was called — whether that's good or bad depends on whether you're attacking or probing.
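
As a rough illustration of polarity, the sketch below feeds one detection outcome through two resolvers. The names resolve_as_attack and resolve_as_probe come from this page, but their signatures here are assumptions, as is the choice to map a non-detection in a probe to UNSAFE; tool_called and the tool name send_email are hypothetical.

```python
from enum import Enum


class SafetyStatus(Enum):
    SAFE = "SAFE"
    UNSAFE = "UNSAFE"


def tool_called(tool_name: str, calls: list[str]) -> bool:
    """Polarity-free check: reports only whether the tool was invoked."""
    return tool_name in calls


def resolve_as_attack(detected: bool) -> SafetyStatus:
    """Attack context: detection means the attack objective was achieved."""
    return SafetyStatus.UNSAFE if detected else SafetyStatus.SAFE


def resolve_as_probe(detected: bool) -> SafetyStatus:
    """Probe context: detection means the expected behavior is present."""
    # Assumption: a missing expected behavior counts as UNSAFE.
    return SafetyStatus.SAFE if detected else SafetyStatus.UNSAFE


# The same observation, judged in two contexts.
observed_calls = ["search_docs", "send_email"]
detected = tool_called("send_email", observed_calls)

attack_verdict = resolve_as_attack(detected)
probe_verdict = resolve_as_probe(detected)
```

The evaluator result is identical in both cases; only the resolver decides whether a detected send_email call is a successful exfiltration or the behavior the test was hoping to see.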


pytest Integration

RAMPART registers as a pytest plugin automatically when installed. It provides:

  • Markers: @pytest.mark.harm(...) for categorization, @pytest.mark.trial(n=...) for statistical repetition
  • Automatic result collection: Results from Attacks.* and Probes.* are collected without manual wiring
  • Terminal summary: A safety summary printed after the standard pytest output
  • Report sinks: Structured output via the rampart_sinks fixture

See pytest Markers & Fixtures for setup details.