Probes

A probe tests whether your agent exhibits a desired behavior. When the evaluator detects the expected behavior, the result is `SAFE` (`safe=True`).

Semantics

Probes map evaluator outcomes to safety verdicts in the inverse direction from attacks, where detection signals a problem:

| `EvalOutcome` | Safety verdict | Meaning |
|---|---|---|
| `DETECTED` | `SAFE` | The expected behavior is present. |
| `NOT_DETECTED` | `UNSAFE` | The expected behavior is missing, which indicates a regression. |
| `UNDETERMINED` | `UNDETERMINED` | The evaluator could not determine whether the behavior is present. |

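Precedence: `NOT_DETECTED` > `UNDETERMINED` > `DETECTED`. When several turns are evaluated, the highest-precedence outcome decides the verdict. If any turn failed to detect the expected behavior, the result is `UNSAFE` and the agent is non-compliant; otherwise, if any turn is `UNDETERMINED`, the result is `UNDETERMINED`; only when every turn is `DETECTED` is the result `SAFE`.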

This logic is implemented in `resolve_as_probe`.
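The precedence rule can be sketched as a small reduction over per-turn outcomes. This is a minimal illustration, not the `resolve_as_probe` implementation: the `EvalOutcome` and `Verdict` enums below are hypothetical stand-ins, `resolve_probe_verdict` is a hypothetical helper, and raising on an empty list is an assumption of this sketch.

```python
from enum import Enum


class EvalOutcome(Enum):
    # Hypothetical stand-in for the evaluator outcomes listed above.
    DETECTED = "detected"
    NOT_DETECTED = "not_detected"
    UNDETERMINED = "undetermined"


class Verdict(Enum):
    # Hypothetical stand-in for the safety verdicts.
    SAFE = "safe"
    UNSAFE = "unsafe"
    UNDETERMINED = "undetermined"


# Probe mapping: finding the expected behavior is SAFE, missing it is UNSAFE.
PROBE_MAPPING = {
    EvalOutcome.DETECTED: Verdict.SAFE,
    EvalOutcome.NOT_DETECTED: Verdict.UNSAFE,
    EvalOutcome.UNDETERMINED: Verdict.UNDETERMINED,
}


def resolve_probe_verdict(outcomes: list[EvalOutcome]) -> Verdict:
    """Reduce per-turn outcomes with NOT_DETECTED > UNDETERMINED > DETECTED."""
    if not outcomes:
        # Assumption: an empty run is treated as an error in this sketch.
        raise ValueError("at least one evaluator outcome is required")
    verdicts = {PROBE_MAPPING[outcome] for outcome in outcomes}
    if Verdict.UNSAFE in verdicts:
        # Any turn that missed the expected behavior makes the agent non-compliant.
        return Verdict.UNSAFE
    if Verdict.UNDETERMINED in verdicts:
        return Verdict.UNDETERMINED
    return Verdict.SAFE
```

With this sketch, `[DETECTED, UNDETERMINED, NOT_DETECTED]` resolves to `UNSAFE`, while `[DETECTED, UNDETERMINED]` resolves to `UNDETERMINED`.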

Common Structure

Probe executions are simpler than attacks because they have no injection phase. Each execution runs these steps:

1. Create session: open a fresh session with the agent.
2. Send prompts: drive the conversation via the prompt driver.
3. Evaluate: check whether the expected behavior is present.
4. Clean up: close the session.
5. Report: produce a `Result`.
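The five steps can be sketched as a single coroutine. Everything here is hypothetical: `FakeSession`, `ProbeReport`, and `run_probe` are illustrative stand-ins, not rampart's adapter, driver, or `Result` APIs. A fixed prompt list replaces the prompt driver, a substring check replaces the evaluator, and the verdict is simplified to pass/fail without `UNDETERMINED`. The `try`/`finally` shows why cleanup is its own step: the session is closed even if sending a prompt raises.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeSession:
    """Hypothetical session that returns canned replies and tracks closing."""

    replies: dict[str, str]
    closed: bool = False

    async def send(self, prompt: str) -> str:
        return self.replies.get(prompt, "")

    async def close(self) -> None:
        self.closed = True


@dataclass
class ProbeReport:
    """Hypothetical stand-in for a Result: pass/fail plus per-turn detail."""

    safe: bool
    turns: list[tuple[str, str, bool]] = field(default_factory=list)


async def run_probe(open_session, prompts: list[str], expected: str) -> ProbeReport:
    # 1. Create session: each probe starts from a fresh session.
    session = await open_session()
    turns: list[tuple[str, str, bool]] = []
    try:
        # 2. Send prompts: a fixed list stands in for the prompt driver.
        for prompt in prompts:
            response = await session.send(prompt)
            # 3. Evaluate: a substring check stands in for the evaluator.
            turns.append((prompt, response, expected in response))
    finally:
        # 4. Clean up: close the session even if a turn raised.
        await session.close()
    # 5. Report: pass only if every turn showed the expected behavior.
    return ProbeReport(safe=all(found for _, _, found in turns), turns=turns)


async def demo() -> tuple[ProbeReport, FakeSession]:
    session = FakeSession(replies={"What is 2 + 2?": "It is 4."})

    async def open_session() -> FakeSession:
        return session

    report = await run_probe(open_session, ["What is 2 + 2?"], expected="4")
    return report, session


if __name__ == "__main__":
    report, session = asyncio.run(demo())
    print(report.safe, session.closed)  # True True
```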

Using the Probes Factory

All probes are created through factory methods on the `Probes` class:


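```python
from rampart import Probes
from rampart.evaluators import ResponseContains


async def run_arithmetic_probe(my_adapter) -> None:
    # Build a behavioral probe: send one prompt and expect "4" in the response.
    execution = Probes.behavior(
        prompt="What is 2 + 2?",
        evaluator=ResponseContains("4"),
    )

    # Run the probe against your agent's adapter; if the result is falsy,
    # the assertion fails and reports the result's summary.
    result = await execution.execute_async(adapter=my_adapter)
    assert result, result.summary


# From synchronous code: asyncio.run(run_arithmetic_probe(my_adapter))
```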

`Probes.behavior` requires exactly one of `prompt`, `prompts`, or `driver`.
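A rule like "exactly one of `prompt`, `prompts`, or `driver`" is commonly enforced by counting which arguments are not `None`. The sketch below is a hypothetical illustration of that check, not rampart's validation code: `require_exactly_one` is an invented name, and the list passed as `prompts` is only a placeholder value.

```python
from typing import Any


def require_exactly_one(**sources: Any) -> str:
    """Hypothetical helper: return the name of the one argument that is not None.

    Raises ValueError when none or more than one of the arguments is set.
    """
    provided = [name for name, value in sources.items() if value is not None]
    if len(provided) != 1:
        allowed = ", ".join(sources)
        raise ValueError(f"provide exactly one of {allowed}; got {provided or 'none'}")
    return provided[0]


# Example: only `prompts` is set, so the helper returns "prompts".
source = require_exactly_one(prompt=None, prompts=["Hi", "What is 2 + 2?"], driver=None)
```

Failing fast on zero or multiple sources keeps the conversation input unambiguous before any session is opened.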

Available Probes

| Probe | Factory method | Description |
|---|---|---|
| Behavioral | `Probes.behavior(...)` | Verifies that the agent produces expected responses or behaviors. |

More probe types are planned; each will be added as a new factory method on `Probes`.
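To illustrate the factory pattern this describes, here is a sketch with hypothetical names (`ProbeFactory`, `BehaviorProbe`); it is not rampart's `Probes` class, and the plain callable used as an evaluator is an assumption for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BehaviorProbe:
    """Hypothetical probe execution holding a prompt and an evaluator callable."""

    prompt: str
    evaluator: Callable[[str], bool]

    def check(self, response: str) -> bool:
        # True when the expected behavior is present in the response.
        return self.evaluator(response)


class ProbeFactory:
    """Hypothetical factory: each probe type is exposed as one static method."""

    @staticmethod
    def behavior(prompt: str, evaluator: Callable[[str], bool]) -> BehaviorProbe:
        return BehaviorProbe(prompt=prompt, evaluator=evaluator)

    # A new probe type would be added here as another static method,
    # leaving existing call sites unchanged.


probe = ProbeFactory.behavior("What is 2 + 2?", lambda response: "4" in response)
```

Keeping construction behind factory methods lets new probe types extend the API without changing how existing probes are created.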