Testing Standards

RAMPART uses pytest with pytest-asyncio for its test suite. This page covers test organization, writing guidelines, and coverage expectations. For the complete reference, see the unit test standards.

The standards on this page apply to both unit and integration tests — the underlying instruction file targets all files under tests/. Integration tests differ in scope (end-to-end across modules) and may use real components instead of mocks, but the naming, typing, and structural rules are identical.

Test Organization

Directory Structure

Tests mirror the source tree:

Text Only

tests/
├── fixtures.py              # Shared test utilities
├── unit/                    # Unit tests (run in CI)
│   ├── attacks/
│   │   └── test_xpia.py
│   ├── converters/
│   ├── core/
│   │   ├── test_execution.py
│   │   ├── test_result.py
│   │   └── ...
│   ├── drivers/
│   ├── evaluators/
│   ├── payloads/
│   ├── probes/
│   ├── pyrit_bridge/
│   ├── pytest_plugin/
│   ├── reporting/
│   └── surfaces/
└── integration/             # Integration tests (not in CI)
    └── test_smoke.py

Place unit tests at tests/unit/<module>/test_<component>.py, mirroring the rampart/ source structure.

Unit vs Integration Tests

Aspect	Unit Tests	Integration Tests
Location	`tests/unit/`	`tests/integration/`
Run in CI	✅ Yes	❌ No
External dependencies	All mocked	None today (smoke test uses `MockAdapter`); future tests may require a real agent environment
Speed	Fast (seconds)	Slow (minutes)
Command	`uv run pytest tests/unit`	`uv run pytest tests/integration`

Test Classes and Methods

Group related tests into classes with descriptive names starting with Test
Test methods must have a return type annotation of -> None
Async test methods must end with _async
asyncio_mode = "auto" is configured globally — no need for @pytest.mark.asyncio

Python

class TestXPIAExecution:
    def test_returns_safe_when_not_detected(self) -> None:
        ...

    async def test_activates_handles_async(self) -> None:
        ...

Writing Tests

Test Data Helpers

Define small private helper functions at the top of test files instead of fixtures when no setup/teardown is needed:

Python

def _make_result(*, safe: bool = True) -> Result:
    """Build a minimal Result for testing."""
    return Result(
        safe=safe,
        status=SafetyStatus.SAFE if safe else SafetyStatus.UNSAFE,
        summary="test",
        strategy="test",
    )

Mocking

Mock all external dependencies (APIs, file systems, network)
Mock at the boundary — don't mock internal implementation details
Use AsyncMock for async methods, MagicMock for sync

Python

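from unittest.mock import AsyncMock

# Response is the project's response type; its import is omitted here.
# The session mock returns a canned response when awaited.
mock_session = AsyncMock()
mock_session.send_async.return_value = Response(text="safe response")

# The adapter mock hands out the session mock above.
mock_adapter = AsyncMock()
mock_adapter.create_session_async.return_value = mock_session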
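To show how these mocks fit into a complete test, here is a minimal sketch. The function drive_once_async is hypothetical and exists only to give the mocks something to drive, and SimpleNamespace stands in for the project's Response type so that the snippet is self-contained.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def drive_once_async(adapter, prompt: str) -> str:
    """Hypothetical code under test: open a session and send one prompt."""
    session = await adapter.create_session_async()
    response = await session.send_async(prompt)
    return response.text


class TestDriveOnce:
    async def test_returns_response_text_async(self) -> None:
        # Mock at the boundary: the adapter and the session it creates
        mock_session = AsyncMock()
        mock_session.send_async.return_value = SimpleNamespace(text="safe response")
        mock_adapter = AsyncMock()
        mock_adapter.create_session_async.return_value = mock_session

        text = await drive_once_async(mock_adapter, "hello")

        assert text == "safe response"
        # AsyncMock records awaits, so the interaction can be verified
        mock_adapter.create_session_async.assert_awaited_once_with()
        mock_session.send_async.assert_awaited_once_with("hello")
```

Under pytest-asyncio's auto mode the async test is collected directly; outside pytest it can be run with asyncio.run(TestDriveOnce().test_returns_response_text_async()).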

Assertions

Use direct assert statements (not self.assertEqual)
Use is for identity checks (enums, singletons, None)
Use == for value equality
Use pytest.raises with match for error messages

Python

assert result.status is SafetyStatus.SAFE
assert result.summary == "Expected behavior detected"

with pytest.raises(ValueError, match="timeout must be positive"):
    Config(timeout=-1)
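Two details behind these rules are easy to miss, and the sketch below illustrates them using only the standard library. Enum members are singletons, which is why is works for them, and pytest.raises applies its match pattern with re.search, so the pattern is a regular expression rather than a literal string. The SafetyStatus enum and the error message here are illustrative stand-ins, not the project's own.

```python
import re
from enum import Enum


class SafetyStatus(Enum):
    """Stand-in for the project's enum (assumed members)."""

    SAFE = "safe"
    UNSAFE = "unsafe"


# Looking a member up by value returns the same object, so `is` is reliable
looked_up = SafetyStatus("safe")
assert looked_up is SafetyStatus.SAFE

# A hypothetical error message containing regex metacharacters
message = "timeout (seconds) must be positive"

# A plain substring works as a pattern because re.search scans the string
substring_hit = re.search("must be positive", message)

# Unescaped parentheses form a regex group, so this pattern does not match
unescaped_hit = re.search("timeout (seconds) must", message)

# re.escape turns the literal text into a safe pattern
escaped_hit = re.search(re.escape("timeout (seconds) must"), message)

assert substring_hit is not None
assert unescaped_hit is None
assert escaped_hit is not None
```

When the expected message contains characters such as parentheses, brackets, or dots, wrapping it in re.escape before passing it to match avoids surprising mismatches.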

Relaxed Lint Rules in Tests

Test files have relaxed lint rules (configured via per-file-ignores in pyproject.toml):

No docstrings required
No type annotations required (except -> None on test methods)
Magic values in assertions are fine
Private member access (_private) is allowed
Local imports inside test functions are acceptable

Writing Tests for New Components

Testing a New Attack

When adding a new attack, test:

Execution lifecycle — the attack calls BaseExecution.execute_async correctly
Phase orchestration — injection, session creation, prompt driving, evaluation happen in order
Result resolution — resolve_as_attack is applied (detected → UNSAFE, not detected → SAFE)
Edge cases — empty handles, max turns reached, early stopping on detection
Error handling — infrastructure errors produce SafetyStatus.ERROR
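The error-handling and result-resolution items above could be tested along the following lines. This is only a sketch: run_attack_async is a toy stand-in for an attack's execution path (it is not BaseExecution.execute_async), the SafetyStatus enum is redefined locally, and responses are plain strings for brevity.

```python
import asyncio
from enum import Enum
from unittest.mock import AsyncMock


class SafetyStatus(Enum):
    """Stand-in for the project's enum (assumed members)."""

    SAFE = "safe"
    UNSAFE = "unsafe"
    ERROR = "error"


async def run_attack_async(adapter, prompt: str) -> SafetyStatus:
    """Hypothetical stand-in for an attack's execution path."""
    try:
        session = await adapter.create_session_async()
        response = await session.send_async(prompt)
    except ConnectionError:
        # Infrastructure failure: report ERROR rather than a safety verdict
        return SafetyStatus.ERROR
    detected = "leaked" in response
    return SafetyStatus.UNSAFE if detected else SafetyStatus.SAFE


class TestAttackErrorHandling:
    async def test_connection_failure_yields_error_async(self) -> None:
        mock_adapter = AsyncMock()
        # side_effect makes the awaited call raise instead of returning
        mock_adapter.create_session_async.side_effect = ConnectionError("down")

        status = await run_attack_async(mock_adapter, "hello")

        assert status is SafetyStatus.ERROR

    async def test_detection_yields_unsafe_async(self) -> None:
        mock_session = AsyncMock()
        mock_session.send_async.return_value = "secret leaked"
        mock_adapter = AsyncMock()
        mock_adapter.create_session_async.return_value = mock_session

        status = await run_attack_async(mock_adapter, "hello")

        assert status is SafetyStatus.UNSAFE
```

Setting side_effect to an exception on the boundary mock simulates an infrastructure failure without touching any internal implementation detail.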

Testing a New Probe

Similar to attacks, but:

No injection phase to test
Result resolution uses resolve_as_probe (detected → SAFE, not detected → UNSAFE)
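The inverted polarity is the easiest thing to get wrong, so it is worth pinning down in a test. The sketch below uses two stand-in functions that encode the mappings described on this page; the real resolve_as_attack and resolve_as_probe may take different arguments, so treat the signatures here as assumptions.

```python
from enum import Enum


class SafetyStatus(Enum):
    """Stand-in for the project's enum (assumed members)."""

    SAFE = "safe"
    UNSAFE = "unsafe"


def fake_resolve_as_attack(detected: bool) -> SafetyStatus:
    """Stand-in: for an attack, detection means the target is unsafe."""
    return SafetyStatus.UNSAFE if detected else SafetyStatus.SAFE


def fake_resolve_as_probe(detected: bool) -> SafetyStatus:
    """Stand-in: for a probe, detection means the target is safe."""
    return SafetyStatus.SAFE if detected else SafetyStatus.UNSAFE


class TestResolutionPolarity:
    def test_probe_detected_is_safe(self) -> None:
        assert fake_resolve_as_probe(True) is SafetyStatus.SAFE

    def test_probe_not_detected_is_unsafe(self) -> None:
        assert fake_resolve_as_probe(False) is SafetyStatus.UNSAFE

    def test_attack_and_probe_disagree_on_every_input(self) -> None:
        for detected in (True, False):
            attack = fake_resolve_as_attack(detected)
            probe = fake_resolve_as_probe(detected)
            assert attack is not probe
```

In a real test file the stand-ins would be replaced by the project's own resolution functions, while the assertions keep the same shape.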

Testing a New Evaluator

Detection — evaluator correctly identifies the target condition
Non-detection — evaluator correctly reports absence of the condition
Edge cases — empty responses, missing data, multiple turns
Evidence — evaluator populates evidence and rationale in EvalResult
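The four areas above map naturally onto four small tests. The following sketch uses a hypothetical KeywordEvaluator and a simplified EvalResult with assumed fields (detected, evidence, rationale); the project's real evaluator interface and EvalResult shape may differ.

```python
from dataclasses import dataclass, field


@dataclass
class EvalResult:
    """Stand-in for the project's EvalResult (assumed fields)."""

    detected: bool
    evidence: list[str] = field(default_factory=list)
    rationale: str = ""


class KeywordEvaluator:
    """Hypothetical evaluator: flags any response containing a keyword."""

    def __init__(self, keyword: str) -> None:
        self._keyword = keyword

    def evaluate(self, responses: list[str]) -> EvalResult:
        hits = [text for text in responses if self._keyword in text]
        if hits:
            rationale = f"found {self._keyword!r} in {len(hits)} turn(s)"
            return EvalResult(True, hits, rationale)
        return EvalResult(False, [], f"{self._keyword!r} not found")


class TestKeywordEvaluator:
    def test_detects_keyword(self) -> None:
        result = KeywordEvaluator("token").evaluate(["here is the token"])
        assert result.detected is True

    def test_reports_absence(self) -> None:
        result = KeywordEvaluator("token").evaluate(["nothing to see"])
        assert result.detected is False
        assert result.evidence == []

    def test_empty_responses_are_not_detected(self) -> None:
        result = KeywordEvaluator("token").evaluate([])
        assert result.detected is False

    def test_collects_evidence_across_turns(self) -> None:
        turns = ["hello", "token one", "token two"]
        result = KeywordEvaluator("token").evaluate(turns)
        assert result.evidence == ["token one", "token two"]
        assert result.rationale == "found 'token' in 2 turn(s)"
```

Keeping one behaviour per test method makes a failure point directly at the broken case: detection, non-detection, an edge case, or evidence.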

Coverage

Expectations

The project enforces a minimum 80% code coverage threshold
Coverage is measured with coverage.py, configured in pyproject.toml
CI runs a dedicated coverage job on every push and pull request

Running Coverage Locally

Bash

# Run tests with coverage
uv run coverage run -m pytest tests/unit -q

# View the report
uv run coverage report

# See which lines are missing coverage
uv run coverage report --show-missing

Parallel Test Execution

The project includes pytest-xdist for parallel test execution. Passing -n auto lets xdist choose the number of workers from the available CPU cores:

Bash

uv run pytest tests/unit -n auto