API Reference — Reporting¶

Reporting infrastructure for persisting test results.

reporting ¶

Reporting infrastructure.

Re-exports: ReportSink, TestRunReport, JsonFileReportSink.

ReportSink ¶

Bases: Protocol

Receives test run reports and persists them to an external destination.

Implementations handle serialization and delivery to their target (database, metrics pipeline, file store, etc.). Terminal output is not a ReportSink concern — it is owned by the pytest_terminal_summary hook in the plugin.

emit_async `async` ¶

Python

emit_async(*, report)

Emit a complete test run report.

Parameters:

Name	Type	Description	Default
`report`	`TestRunReport`	The aggregated test run results.	required

Source code in rampart/reporting/sink.py

Python
async def emit_async(self, *, report: TestRunReport) -> None:
    """Emit a complete test run report.

    Args:
        report (TestRunReport): The aggregated test run results.
    """
    ...

TestRunReport `dataclass` ¶

Python

TestRunReport(
    *,
    results=list[Result](),
    total_runs=0,
    passed=0,
    failed=0,
    undetermined=0,
    errors=0,
    duration_seconds=0.0,
    metadata=dict[str, Any](),
)

Aggregated results from a complete test run.

Built by the pytest plugin at session end from all collected Result objects and standard pytest outcomes.

Parameters:

Name	Type	Description	Default
`results`	`list[Result]`	All Result objects recorded during the run.	`list[Result]()`
`total_runs`	`int`	Total number of Result objects (one per execution run).	`0`
`passed`	`int`	Number of runs that passed.	`0`
`failed`	`int`	Number of runs that failed.	`0`
`undetermined`	`int`	Number of runs with undetermined outcomes.	`0`
`errors`	`int`	Number of runs with infrastructure errors.	`0`
`duration_seconds`	`float`	Total run duration, in seconds.	`0.0`
`metadata`	`dict[str, Any]`	Run-level metadata (CI job ID, commit hash, etc.).	`dict[str, Any]()`

by_harm_category ¶

Python

by_harm_category()

Group results by harm category.

HarmCategory is a StrEnum, so both built-in enum values and custom plain strings are native strings at runtime. The grouping key is always a plain string. Results whose harm category is None or an empty string are grouped under the key "uncategorized".

Returns:

Type	Description
`dict[str, list[Result]]`	Results grouped by harm category string.

Source code in rampart/reporting/sink.py

Python
def by_harm_category(self) -> dict[str, list[Result]]:
    """Group results by harm category.

    HarmCategory is a StrEnum, so both built-in enum values and
    custom plain strings are native strings at runtime. The grouping
    key is always a plain string.

    Returns:
        dict[str, list[Result]]: Results grouped by harm category string.
    """
    grouped: dict[str, list[Result]] = {}
    for r in self.results:
        key = str(r.harm_category) if r.harm_category else "uncategorized"
        grouped.setdefault(key, []).append(r)
    return grouped

population_summary ¶

Python

population_summary(*, harm_category=None)

Compute aggregate statistics over collected Result objects.

Each Result corresponds to one test execution — one run of one test body. For parametrized payload suites, each payload variant is one Result. For trial-marked tests, each trial clone is one Result; trial groups are aggregated separately by the plugin before this method is called.

This method does not distinguish payloads from trial repetitions. Callers that need population-level statistics (distinct payloads, not repeated trials) should filter Results to non-trial items before calling, or use the plugin-managed trial-group aggregates.

Parameters:

Name	Type	Description	Default
`harm_category`	`HarmCategory \| str \| None`	Filter to a specific category. Accepts built-in HarmCategory values or plain strings for team-defined categories. None computes over all results.	`None`

Returns:

Name	Type	Description
`PopulationSummary`	`PopulationSummary`	Statistics including total_runs, safe_count, unsafe_count, undetermined_count, error_count, attack_success_rate, and safety_pass_rate.

Source code in rampart/reporting/sink.py

Python
def population_summary(
    self,
    *,
    harm_category: HarmCategory | str | None = None,
) -> PopulationSummary:
    """Compute aggregate statistics over collected Result objects.

    Each Result corresponds to one test execution — one run of one
    test body. For parametrized payload suites, each payload variant
    is one Result. For trial-marked tests, each trial clone is one
    Result; trial groups are aggregated separately by the plugin
    before this method is called.

    This method does not distinguish payloads from trial repetitions.
    Callers that need population-level statistics (distinct payloads,
    not repeated trials) should filter Results to non-trial items
    before calling, or use the plugin-managed trial-group aggregates.

    Args:
        harm_category (HarmCategory | str | None): Filter to a specific
            category. Accepts built-in HarmCategory values or plain
            strings for team-defined categories. None computes over
            all results.

    Returns:
        PopulationSummary: Statistics including total_runs, safe_count,
            unsafe_count, undetermined_count, error_count,
            attack_success_rate, and safety_pass_rate.
    """
    subset = self.results
    if harm_category is not None:
        subset = [r for r in self.results if r.harm_category == harm_category]

    total = len(subset)
    if total == 0:
        return PopulationSummary(
            total_runs=0,
            safe_count=0,
            unsafe_count=0,
            undetermined_count=0,
            error_count=0,
            attack_success_rate=0.0,
            safety_pass_rate=0.0,
        )

    safe = sum(1 for r in subset if r.status == SafetyStatus.SAFE)
    unsafe = sum(1 for r in subset if r.status == SafetyStatus.UNSAFE)
    undetermined = sum(1 for r in subset if r.status == SafetyStatus.UNDETERMINED)
    error_count = sum(1 for r in subset if r.status == SafetyStatus.ERROR)

    # ERROR results are excluded from attack success rate. A SharePoint
    # 503 is not a safety finding. Including errors in the denominator
    # dilutes the rate; including them in the numerator inflates it.
    diagnostic_total = total - error_count

    return PopulationSummary(
        total_runs=total,
        safe_count=safe,
        unsafe_count=unsafe,
        undetermined_count=undetermined,
        error_count=error_count,
        attack_success_rate=(
            unsafe / diagnostic_total if diagnostic_total > 0 else 0.0
        ),
        safety_pass_rate=(safe / diagnostic_total if diagnostic_total > 0 else 0.0),
    )
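
The rate arithmetic can be checked by hand. The sketch below reproduces only the two rate formulas from the listing above as a standalone function named rates (hypothetical), and it assumes every result carries exactly one of the four statuses so that the total is the sum of the counts.

```python
def rates(
    *, safe: int, unsafe: int, undetermined: int, errors: int
) -> tuple[float, float]:
    """Return (attack_success_rate, safety_pass_rate) as the listing computes them."""
    total = safe + unsafe + undetermined + errors
    # ERROR runs are removed from the denominator; UNDETERMINED runs stay in it.
    diagnostic_total = total - errors
    if diagnostic_total <= 0:
        return 0.0, 0.0
    return unsafe / diagnostic_total, safe / diagnostic_total


# 10 runs: 6 safe, 2 unsafe, 1 undetermined, 1 error -> denominator is 9.
asr, pass_rate = rates(safe=6, unsafe=2, undetermined=1, errors=1)
print(f"{asr:.3f} {pass_rate:.3f}")  # -> 0.222 0.667

# A run made up entirely of errors yields 0.0 for both rates.
all_errors = rates(safe=0, unsafe=0, undetermined=0, errors=3)
```

Since undetermined runs remain in the denominator, the two rates add up to less than 1 whenever any run is undetermined (here 8/9). A safety_pass_rate of 0.0 can also mean that there were no diagnostic runs at all, so it is worth reading error_count and total_runs alongside the rates.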

JsonFileReportSink ¶

Python

JsonFileReportSink(*, output_dir)

Writes the test run report to a JSON file.

Each run produces a file named with the UTC time at which the report is written, to one-second resolution, for example <output_dir>/run_report_2026-03-19T21-30-00.json

Parameters:

Name	Type	Description	Default
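`output_dir`	`Path`	Directory to write report files into. Created automatically, including missing parents, when emit_async runs if it does not exist; the constructor itself does not touch the filesystem.	required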

Initialize with an output directory for report files.

Source code in rampart/reporting/json_file.py

Python
def __init__(self, *, output_dir: Path) -> None:
    """Initialize with an output directory for report files."""
    self._output_dir = output_dir

emit_async `async` ¶

Python

emit_async(*, report)

Serialize the report to a JSON file.

Parameters:

Name	Type	Description	Default
`report`	`TestRunReport`	The aggregated test run results.	required

Source code in rampart/reporting/json_file.py

Python
async def emit_async(self, *, report: TestRunReport) -> None:
    """Serialize the report to a JSON file.

    Args:
        report (TestRunReport): The aggregated test run results.
    """
    self._output_dir.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now(UTC).strftime("%Y-%m-%dT%H-%M-%S")
    filepath = self._output_dir / f"run_report_{timestamp}.json"

    data = self._serialize_report(report)
    filepath.write_text(json.dumps(data, indent=2, default=str))
