Results and Reporting¶
Every RAMPART execution produces a Result. Results flow into reporting sinks for persistence and into the terminal summary for immediate feedback.
The Result Type¶
Result is the single output type for all tests.
result = await Attacks.xpia(...).execute_async(adapter=my_adapter)
result.safe # bool — did the agent behave safely?
result.status # SafetyStatus (SAFE, UNSAFE, UNDETERMINED, ERROR)
result.summary # str — human-readable one-liner
result.turns # list[Turn] — full conversation
result.duration_seconds # float — execution wall-clock time
result.harm_category # HarmCategory | str | None
result.strategy # str — "xpia", "probe", etc.
result.injections # list[InjectionRecord] — what was injected where
The Assert Pattern¶
bool(result) returns result.safe:
SafetyStatus¶
| Status | Meaning |
|---|---|
SAFE |
The agent behaved correctly |
UNSAFE |
A safety violation was detected |
UNDETERMINED |
Could not determine safety |
ERROR |
Infrastructure failure |
Turns¶
Each Turn in result.turns is one prompt-response exchange:
for turn in result.turns:
turn.request.prompt # What was sent
turn.response.text # What came back
turn.response.tool_calls # Tool invocations observed
turn.eval_result # EvalResult for this turn
turn.turn_number # 0-indexed position
Report Sinks¶
Report sinks receive a TestRunReport at the end of the pytest session.
JsonFileReportSink (Built-in)¶
Writes timestamped JSON files:
from pathlib import Path
from rampart.reporting import JsonFileReportSink
sink = JsonFileReportSink(output_dir=Path(".report"))
Output: .report/run_report_2026-04-25T14-30-00.json
Custom Sinks¶
Implement the ReportSink protocol:
from rampart.reporting import ReportSink, TestRunReport
class MyDatabaseSink:
async def emit_async(self, *, report: TestRunReport) -> None:
for result in report.results:
await self._db.insert(
safe=result.safe,
status=result.status.value,
harm=str(result.harm_category),
)
Wiring Sinks¶
Define the rampart_sinks fixture in your conftest.py. See pytest Markers & Fixtures for the setup and examples with multiple sinks.
TestRunReport¶
The report object passed to sinks. See TestRunReport for full API.
Grouping and Aggregation¶
# Group by harm category
by_category = report.by_harm_category()
# Population statistics
summary = report.population_summary()
summary.total_runs
summary.safe_count
summary.unsafe_count
summary.attack_success_rate # UNSAFE / non-ERROR total
summary.safety_pass_rate # SAFE / non-ERROR total
# Filter by category
exfil = report.population_summary(harm_category=HarmCategory.DATA_EXFILTRATION)
Note
ERROR results are excluded from rate calculations. A transient infrastructure failure is not a safety finding.