API Reference — Reporting

Reporting infrastructure for persisting test results.

reporting

Reporting infrastructure.

Re-exports: ReportSink, TestRunReport, JsonFileReportSink.

ReportSink

Bases: Protocol

Receives test run reports and persists them to an external destination.

Implementations handle serialization and delivery to their target (database, metrics pipeline, file store, etc.). Terminal output is not a ReportSink concern — it is owned by the pytest_terminal_summary hook in the plugin.

emit_async async

Python
emit_async(*, report)

Emit a complete test run report.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| report | TestRunReport | The aggregated test run results. | required |

Source code in rampart/reporting/sink.py
Python
async def emit_async(self, *, report: TestRunReport) -> None:
    """Emit a complete test run report.

    Args:
        report (TestRunReport): The aggregated test run results.
    """
    ...

TestRunReport dataclass

Python
TestRunReport(
    *,
    results=list[Result](),
    total_runs=0,
    passed=0,
    failed=0,
    undetermined=0,
    errors=0,
    duration_seconds=0.0,
    metadata=dict[str, Any](),
)

Aggregated results from a complete test run.

Built by the pytest plugin at session end from all collected Result objects and standard pytest outcomes.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| results | list[Result] | All Result objects recorded during the run. | list[Result]() |
| total_runs | int | Total number of Result objects (one per execution run). | 0 |
| passed | int | Number of runs that passed. | 0 |
| failed | int | Number of runs that failed. | 0 |
| undetermined | int | Number of runs with undetermined outcomes. | 0 |
| errors | int | Number of runs with infrastructure errors. | 0 |
| duration_seconds | float | Total run duration, in seconds. | 0.0 |
| metadata | dict[str, Any] | Run-level metadata (CI job ID, commit hash, etc.). | dict[str, Any]() |

by_harm_category

Python
by_harm_category()

Group results by harm category.

HarmCategory is a StrEnum, so both built-in enum values and custom plain strings are native strings at runtime. The grouping key is always a plain string. Results without a harm category are grouped under the key "uncategorized".

Returns:

| Type | Description |
| --- | --- |
| dict[str, list[Result]] | Results grouped by harm category string. |

Source code in rampart/reporting/sink.py
Python
def by_harm_category(self) -> dict[str, list[Result]]:
    """Group results by harm category.

    HarmCategory is a StrEnum, so both built-in enum values and
    custom plain strings are native strings at runtime. The grouping
    key is always a plain string.

    Returns:
        dict[str, list[Result]]: Results grouped by harm category string.
    """
    grouped: dict[str, list[Result]] = {}
    for r in self.results:
        key = str(r.harm_category) if r.harm_category else "uncategorized"
        grouped.setdefault(key, []).append(r)
    return grouped
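
To make the string-key behaviour concrete, here is a small standalone sketch of the same grouping logic. It does not import rampart: StubResult is a stand-in for Result, the HarmCategory member shown is a hypothetical value (the real enum's members are not listed in this reference), and group_by_harm_category is an illustrative free function rather than the method itself.

```python
from dataclasses import dataclass

try:
    from enum import StrEnum
except ImportError:  # Python < 3.11: minimal equivalent for this sketch
    from enum import Enum

    class StrEnum(str, Enum):
        def __str__(self) -> str:
            return str(self.value)


class HarmCategory(StrEnum):
    # Hypothetical member, for illustration only.
    DATA_EXFILTRATION = "data_exfiltration"


@dataclass
class StubResult:
    """Stand-in for Result with just the fields this sketch needs."""

    name: str
    harm_category: "HarmCategory | str | None" = None


def group_by_harm_category(results):
    grouped: dict[str, list[StubResult]] = {}
    for r in results:
        # Falsy categories (None or an empty string) share one bucket.
        key = str(r.harm_category) if r.harm_category else "uncategorized"
        grouped.setdefault(key, []).append(r)
    return grouped


results = [
    StubResult("a", HarmCategory.DATA_EXFILTRATION),
    StubResult("b", "data_exfiltration"),  # plain string, same key as the enum member
    StubResult("c", "team_custom"),        # team-defined category
    StubResult("d"),                       # no category
]
grouped = group_by_harm_category(results)
names_by_key = {key: [r.name for r in items] for key, items in grouped.items()}
print(names_by_key)
```

In this sketch the enum member and the equal plain string land in the same group, which is the practical consequence of keying on plain strings.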

population_summary

Python
population_summary(*, harm_category=None)

Compute aggregate statistics over collected Result objects.

Each Result corresponds to one test execution — one run of one test body. For parametrized payload suites, each payload variant is one Result. For trial-marked tests, each trial clone is one Result; trial groups are aggregated separately by the plugin before this method is called.

This method does not distinguish payloads from trial repetitions. Callers that need population-level statistics (distinct payloads, not repeated trials) should filter Results to non-trial items before calling, or use the plugin-managed trial-group aggregates.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| harm_category | HarmCategory \| str \| None | Filter to a specific category. Accepts built-in HarmCategory values or plain strings for team-defined categories. None computes over all results. | None |

Returns:

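| Name | Type | Description |
| --- | --- | --- |
| PopulationSummary | PopulationSummary | Statistics including total_runs, safe_count, unsafe_count, undetermined_count, error_count, attack_success_rate, and safety_pass_rate. Both rates are computed over non-ERROR results only and are 0.0 when there are none. |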

Source code in rampart/reporting/sink.py
Python
def population_summary(
    self,
    *,
    harm_category: HarmCategory | str | None = None,
) -> PopulationSummary:
    """Compute aggregate statistics over collected Result objects.

    Each Result corresponds to one test execution — one run of one
    test body. For parametrized payload suites, each payload variant
    is one Result. For trial-marked tests, each trial clone is one
    Result; trial groups are aggregated separately by the plugin
    before this method is called.

    This method does not distinguish payloads from trial repetitions.
    Callers that need population-level statistics (distinct payloads,
    not repeated trials) should filter Results to non-trial items
    before calling, or use the plugin-managed trial-group aggregates.

    Args:
        harm_category (HarmCategory | str | None): Filter to a specific
            category. Accepts built-in HarmCategory values or plain
            strings for team-defined categories. None computes over
            all results.

    Returns:
        PopulationSummary: Statistics including total_runs, safe_count,
            unsafe_count, undetermined_count, error_count,
            attack_success_rate, and safety_pass_rate.
    """
    subset = self.results
    if harm_category is not None:
        subset = [r for r in self.results if r.harm_category == harm_category]

    total = len(subset)
    if total == 0:
        return PopulationSummary(
            total_runs=0,
            safe_count=0,
            unsafe_count=0,
            undetermined_count=0,
            error_count=0,
            attack_success_rate=0.0,
            safety_pass_rate=0.0,
        )

    safe = sum(1 for r in subset if r.status == SafetyStatus.SAFE)
    unsafe = sum(1 for r in subset if r.status == SafetyStatus.UNSAFE)
    undetermined = sum(1 for r in subset if r.status == SafetyStatus.UNDETERMINED)
    error_count = sum(1 for r in subset if r.status == SafetyStatus.ERROR)

    # ERROR results are excluded from attack success rate. A SharePoint
    # 503 is not a safety finding. Including errors in the denominator
    # dilutes the rate; including them in the numerator inflates it.
    diagnostic_total = total - error_count

    return PopulationSummary(
        total_runs=total,
        safe_count=safe,
        unsafe_count=unsafe,
        undetermined_count=undetermined,
        error_count=error_count,
        attack_success_rate=(
            unsafe / diagnostic_total if diagnostic_total > 0 else 0.0
        ),
        safety_pass_rate=(safe / diagnostic_total if diagnostic_total > 0 else 0.0),
    )
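
The rate arithmetic above can be checked with a small worked example. This sketch reimplements only the denominator rule from the listing; the rates helper and the tallies passed to it are hypothetical and chosen for illustration.

```python
def rates(*, safe: int, unsafe: int, undetermined: int, errors: int) -> tuple[float, float]:
    """Return (attack_success_rate, safety_pass_rate) using the rule shown above."""
    total = safe + unsafe + undetermined + errors
    diagnostic_total = total - errors  # ERROR runs are dropped from the denominator
    if diagnostic_total <= 0:
        return 0.0, 0.0
    return unsafe / diagnostic_total, safe / diagnostic_total


# 10 runs: 6 safe, 2 unsafe, 1 undetermined, 1 infrastructure error.
asr, pass_rate = rates(safe=6, unsafe=2, undetermined=1, errors=1)
print(f"{asr:.3f} {pass_rate:.3f}")  # 0.222 0.667

# A run where every result errored yields 0.0 for both rates.
all_errors = rates(safe=0, unsafe=0, undetermined=0, errors=4)
print(all_errors)  # (0.0, 0.0)
```

Here the two rates sum to 8/9 rather than 1, because undetermined results stay in the denominator while errors are removed from it.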

JsonFileReportSink

Python
JsonFileReportSink(*, output_dir)

Writes the test run report to a JSON file.

Each run produces a file named with the UTC time at which the report was written, for example: <output_dir>/run_report_2026-03-19T21-30-00.json

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| output_dir | Path | Directory to write report files into. Created automatically if it does not exist. | required |

Initialize with an output directory for report files.

Source code in rampart/reporting/json_file.py
Python
def __init__(self, *, output_dir: Path) -> None:
    """Initialize with an output directory for report files."""
    self._output_dir = output_dir

emit_async async

Python
emit_async(*, report)

Serialize the report to a JSON file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| report | TestRunReport | The aggregated test run results. | required |

Source code in rampart/reporting/json_file.py
Python
async def emit_async(self, *, report: TestRunReport) -> None:
    """Serialize the report to a JSON file.

    Args:
        report (TestRunReport): The aggregated test run results.
    """
    self._output_dir.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now(UTC).strftime("%Y-%m-%dT%H-%M-%S")
    filepath = self._output_dir / f"run_report_{timestamp}.json"

    data = self._serialize_report(report)
    filepath.write_text(json.dumps(data, indent=2, default=str))
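
The file-writing steps can be tried in isolation with the standalone sketch below. It is not the rampart implementation: write_report is a hypothetical helper, the payload is a plain dict standing in for the output of _serialize_report (whose behaviour is not shown in this reference), and the explicit UTF-8 encoding is an addition made for this sketch.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_report(output_dir: Path, data: dict) -> Path:
    """Write data to <output_dir>/run_report_<UTC timestamp>.json and return the path."""
    # Create the directory (and any missing parents) on first use.
    output_dir.mkdir(parents=True, exist_ok=True)

    # Hyphens replace colons in the time so the name is valid on common filesystems.
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H-%M-%S")
    filepath = output_dir / f"run_report_{timestamp}.json"

    # default=str converts values json cannot encode natively (such as Path) to strings.
    filepath.write_text(json.dumps(data, indent=2, default=str), encoding="utf-8")
    return filepath


with tempfile.TemporaryDirectory() as tmp:
    payload = {"total_runs": 3, "artifact": Path("logs") / "run.txt"}
    path = write_report(Path(tmp) / "reports" / "nightly", payload)
    written_name = path.name
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(written_name)
```

Note that a timestamp with one-second resolution means two reports written to the same directory within the same second would share a filename.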