pyrit.score

Scoring functionality for evaluating AI model responses across various dimensions including harm detection, objective completion, and content classification.

Functions¶

`create_conversation_scorer`¶

create_conversation_scorer(scorer: Scorer, validator: ScorerPromptValidator | None = None) → Scorer

Create a ConversationScorer that inherits from the same type as the wrapped scorer.

This factory dynamically creates a ConversationScorer class that inherits from the wrapped scorer’s base class (FloatScaleScorer or TrueFalseScorer), ensuring the returned scorer is an instance of both ConversationScorer and the wrapped scorer’s type.

Parameter	Type	Description
`scorer`	`Scorer`	The scorer to wrap for conversation-level evaluation. Must be an instance of FloatScaleScorer or TrueFalseScorer.
`validator`	`ScorerPromptValidator	None`

Returns:

Scorer — A ConversationScorer instance that is also an instance of the wrapped scorer’s type.

Raises:

ValueError — If the scorer is not an instance of FloatScaleScorer or TrueFalseScorer.

`find_objective_metrics_by_eval_hash`¶

find_objective_metrics_by_eval_hash(eval_hash: str, file_path: Path | None = None) → ObjectiveScorerMetrics | None

Find objective scorer metrics by evaluation hash.

Parameter	Type	Description
`eval_hash`	`str`	The scorer evaluation hash to search for.
`file_path`	`Path	None`

Returns:

ObjectiveScorerMetrics | None — ObjectiveScorerMetrics if found, else None.

`get_all_harm_metrics`¶

get_all_harm_metrics(harm_category: str) → list[ScorerMetricsWithIdentity[HarmScorerMetrics]]

Load all harm scorer metrics for a specific harm category.

Returns a list of ScorerMetricsWithIdentity[HarmScorerMetrics] objects that wrap the scorer’s identity information and its performance metrics, enabling clean attribute access like entry.metrics.mean_absolute_error or entry.metrics.harm_category.

Parameter	Type	Description
`harm_category`	`str`	The harm category to load metrics for (e.g., “hate_speech”, “violence”).

Returns:

list[ScorerMetricsWithIdentity[HarmScorerMetrics]] — list[ScorerMetricsWithIdentity[HarmScorerMetrics]]: List of metrics with scorer identity. Access metrics via entry.metrics.mean_absolute_error, entry.metrics.harm_category, etc. Access scorer info via entry.scorer_identifier.class_name, etc.

`get_all_objective_metrics`¶

get_all_objective_metrics(file_path: Path | None = None) → list[ScorerMetricsWithIdentity[ObjectiveScorerMetrics]]

Load all objective scorer metrics with full scorer identity for comparison.

Returns a list of ScorerMetricsWithIdentity[ObjectiveScorerMetrics] objects that wrap the scorer’s identity information and its performance metrics, enabling clean attribute access like entry.metrics.accuracy or entry.metrics.f1_score.

Parameter	Type	Description
`file_path`	`Path	None`

Returns:

list[ScorerMetricsWithIdentity[ObjectiveScorerMetrics]] — list[ScorerMetricsWithIdentity[ObjectiveScorerMetrics]]: List of metrics with scorer identity. Access metrics via entry.metrics.accuracy, entry.metrics.f1_score, etc. Access scorer info via entry.scorer_identifier.class_name, etc.

`get_scorer_info`¶

get_scorer_info() → list[_ScorerInfo]

Retrieve metadata for every public, concrete scorer exported from pyrit.score.

Iterates the package’s public API, keeps concrete subclasses of TrueFalseScorer or FloatScaleScorer, and records each scorer’s return type and whether it uses a generative chat target. Abstract bases and non-scorer exports are skipped.

This is a temporary helper used only to render the documentation’s scorer reference table; see _ScorerInfo for why it should not be built upon.

Returns:

list[_ScorerInfo] — list[_ScorerInfo]: Scorers sorted by score type, then LLM-based scorers last within each type, then by name.

`render_category_system_prompt`¶

render_category_system_prompt(content_classifier: ContentClassifier, system_prompt_template: SeedPrompt | str | None = None) → SeedPrompt

Render a content-classification scoring system prompt from a category list.

The bundled content-classifier template is used when system_prompt_template is omitted.

Parameter	Type	Description
`content_classifier`	`ContentClassifier`	The classifier supplying categories and fallback.
`system_prompt_template`	`SeedPrompt	str

Returns:

SeedPrompt — A rendered copy of the template with its value populated.

`render_insecure_code_system_prompt`¶

render_insecure_code_system_prompt(harm_categories: Sequence[str] | str = _DEFAULT_HARM_CATEGORY, system_prompt_template: SeedPrompt | str | None = None) → SeedPrompt

Render an insecure-code scoring system prompt from a template.

Uses the bundled insecure-code template when system_prompt_template is omitted.

Parameter	Type	Description
`harm_categories`	`Sequence[str]	str`
`system_prompt_template`	`SeedPrompt	str

Returns:

SeedPrompt — A rendered copy of the template with its value populated.

Raises:

ValueError — If harm_categories is empty.

`render_likert_system_prompt`¶

render_likert_system_prompt(likert_scale: LikertScale, system_prompt_template: SeedPrompt | str | None = None) → SeedPrompt

Render a Likert scoring system prompt from a LikertScale.

The bundled Likert template is used when system_prompt_template is omitted. A supplied SeedPrompt preserves metadata such as response_json_schema; a string is treated as an inline Jinja template.

Parameter	Type	Description
`likert_scale`	`LikertScale`	The Likert scale supplying prompt parameters and bounds.
`system_prompt_template`	`SeedPrompt	str

Returns:

SeedPrompt — A rendered copy of the template.

`render_scale_system_prompt`¶

render_scale_system_prompt(scale: NumericRubric, system_prompt_template: SeedPrompt | str | None = None) → SeedPrompt

Render a numeric-scale scoring system prompt from a scale and a template.

The bundled general template is used when system_prompt_template is omitted. A supplied SeedPrompt preserves metadata such as response_json_schema; a string is treated as an inline Jinja template.

Parameter	Type	Description
`scale`	`NumericRubric`	The rubric supplying prompt parameters and normalization bounds.
`system_prompt_template`	`SeedPrompt	str

Returns:

SeedPrompt — A rendered copy of the template with its value populated.

`render_true_false_system_prompt`¶

render_true_false_system_prompt(question: TrueFalseQuestion, system_prompt_template: SeedPrompt | str | None = None) → SeedPrompt

Render a true/false scoring system prompt from a question and a template.

Uses the bundled true/false template when system_prompt_template is omitted.

Parameter	Type	Description
`question`	`TrueFalseQuestion`	The question supplying the render parameters.
`system_prompt_template`	`SeedPrompt	str

Returns:

SeedPrompt — A rendered copy of the template with its value populated.

`AnthraxKeywordScorer`¶

Bases: RegexScorer

Detect anthrax / B. anthracis weaponisation keywords in model output.

Keywords derived from the 0DIN JEF <https://github.com/0din-ai/0din-JEF>_ CBRN scoring module.

`AudioFloatScaleScorer`¶

Bases: FloatScaleScorer

A scorer that processes audio files by transcribing them and scoring the transcript.

The AudioFloatScaleScorer transcribes audio to text using Azure Speech-to-Text, then scores the transcript using a FloatScaleScorer.

Constructor Parameters:

Parameter	Type	Description
`text_capable_scorer`	`FloatScaleScorer`	A FloatScaleScorer capable of processing text. This scorer will be used to evaluate the transcribed audio content.
`validator`	`ScorerPromptValidator	None`

`AudioTrueFalseScorer`¶

Bases: TrueFalseScorer

A scorer that processes audio files by transcribing them and scoring the transcript.

The AudioTrueFalseScorer transcribes audio to text using Azure Speech-to-Text, then scores the transcript using a TrueFalseScorer.

Constructor Parameters:

Parameter	Type	Description
`text_capable_scorer`	`TrueFalseScorer`	A TrueFalseScorer capable of processing text. This scorer will be used to evaluate the transcribed audio content.
`validator`	`ScorerPromptValidator	None`

`AzureContentFilterScorer`¶

Bases: FloatScaleScorer

A scorer that uses Azure Content Safety API to evaluate text and images for harmful content.

This scorer analyzes content across multiple harm categories (hate, self-harm, sexual, violence) and returns a score for each category in the range [0, 1], where higher scores indicate more severe content. Supports both text and image inputs.

Constructor Parameters:

Parameter	Type	Description
`endpoint`	`str	None
`api_key`	`str	Callable[[], str
`harm_categories`	`list[TextCategory]	None`
`validator`	`ScorerPromptValidator	None`

Methods:

`evaluate_async`¶

evaluate_async(file_mapping: ScorerEvalDatasetFiles | None = None, num_scorer_trials: int = 3, update_registry_behavior: RegistryUpdateBehavior | None = None, max_concurrency: int = 10) → ScorerMetrics | None

Evaluate this scorer against human-labeled datasets.

AzureContentFilterScorer requires exactly one harm category to be configured for evaluation. This ensures each score corresponds to exactly one category in the ground truth dataset.

Parameter	Type	Description
`file_mapping`	`ScorerEvalDatasetFiles	None`
`num_scorer_trials`	`int`	Number of times to score each response. Defaults to 3. Defaults to `3`.
`update_registry_behavior`	`RegistryUpdateBehavior	None`
`max_concurrency`	`int`	Maximum concurrent scoring requests. Defaults to 10. Defaults to `10`.

Returns:

ScorerMetrics | None — The evaluation metrics, or None if no datasets found.

Raises:

ValueError — If more than one harm category is configured.

`BatchScorer`¶

A utility class for scoring prompts in batches in a parallelizable and convenient way.

This class provides functionality to score existing prompts stored in memory without any target interaction, making it a pure scoring utility.

Constructor Parameters:

Parameter	Type	Description
`batch_size`	`int`	The (max) batch size for sending prompts. Defaults to 10. Note: If using a scorer that takes a prompt target, and providing max requests per minute on the target, this should be set to 1 to ensure proper rate limit management. Defaults to `10`.

Methods:

`score_responses_by_filters_async`¶

score_responses_by_filters_async(scorer: Scorer, conversation_id: str | uuid.UUID | None = None, prompt_ids: list[str] | list[uuid.UUID] | None = None, labels: dict[str, str] | None = None, sent_after: datetime | None = None, sent_before: datetime | None = None, original_values: list[str] | None = None, converted_values: list[str] | None = None, data_type: str | None = None, not_data_type: str | None = None, converted_value_sha256: list[str] | None = None, objective: str = '') → list[Score]

Score the responses that match the specified filters.

Parameter	Type	Description
`scorer`	`Scorer`	The Scorer object to use for scoring.
`conversation_id`	`str	uuid.UUID
`prompt_ids`	`list[str]	list[uuid.UUID]
`labels`	`dict[str, str]	None`
`sent_after`	`datetime	None`
`sent_before`	`datetime	None`
`original_values`	`list[str]	None`
`converted_values`	`list[str]	None`
`data_type`	`str	None`
`not_data_type`	`str	None`
`converted_value_sha256`	`list[str]	None`
`objective`	`str`	A task is used to give the scorer more context on what exactly to score. A task might be the request prompt text or the original attack model’s objective. Note: the same task is applied to all matched prompts. Defaults to an empty string. Defaults to `''`.

Returns:

list[Score] — list[Score]: A list of Score objects for responses that match the specified filters.

Raises:

ValueError — If no entries match the provided filters.

`CallableResponseHandler`¶

Bases: ResponseHandler

ResponseHandler that delegates parsing to a user-supplied callable.

The escape hatch for scoring targets whose raw output is not PyRIT’s default JSON scoring shape (for example a safety classifier that emits safe or unsafe\nS1,S2). The supplied parser maps the raw target text to a score dictionary (score_value/rationale plus optional description/category/metadata); this handler then assembles the UnvalidatedScore. A missing required key raises InvalidJsonException so the standard JSON retry still applies. It intentionally imposes no response_format on the request so classifier targets remain free to return plain text.

Constructor Parameters:

Parameter	Type	Description
`parser`	`Callable[[str], dict[str, Any]]`	Maps the raw target text to a score dictionary. It may raise `InvalidJsonException` to trigger a retry.
`score_value_output_key`	`str`	Key holding the score value. Defaults to “score_value”. Defaults to `'score_value'`.
`rationale_output_key`	`str`	Key holding the rationale. Defaults to “rationale”. Defaults to `'rationale'`.
`description_output_key`	`str`	Key holding the description. Defaults to “description”. Defaults to `'description'`.
`metadata_output_key`	`str`	Key holding the metadata. Defaults to “metadata”. Defaults to `'metadata'`.
`category_output_key`	`str`	Key holding the category. Defaults to “category”. Defaults to `'category'`.

Methods:

`parse`¶

parse(response_text: str, scorer_identifier: ComponentIdentifier, scored_prompt_id: str | uuid.UUID, category: Sequence[str] | str | None = None, objective: str | None = None) → UnvalidatedScore

Parse raw target output into an UnvalidatedScore via the wrapped callable.

Parameter	Type	Description
`response_text`	`str`	The raw text returned by the scoring target.
`scorer_identifier`	`ComponentIdentifier`	Identifier of the scorer that produced the request, stored on the resulting score.
`scored_prompt_id`	`str	uuid.UUID`
`category`	`Sequence[str]	str
`objective`	`str	None`

Returns:

UnvalidatedScore — The parsed score, whose raw_score_value still needs to be normalized and validated by the caller.

Raises:

ValueError — If a category is present in both the response and the argument.
InvalidJsonException — If the parser raises it, fails, or its output is missing a required key.

`ContentClassifier`¶

Bases: BaseModel

A set of categories and the fallback category used for content classification.

Methods:

`from_yaml`¶

from_yaml(path: str | Path) → ContentClassifier

Load a content classifier from a YAML file.

Parameter	Type	Description
`path`	`str	Path`

Returns:

ContentClassifier — The loaded classifier.

Raises:

ValueError — If the YAML does not contain a mapping or fails model validation.

`ContentClassifierCategory`¶

Bases: BaseModel

One named category in a content classifier.

`ContentClassifierPaths`¶

Bases: enum.Enum

Paths to content classifier YAML files.

`ConversationScorer`¶

Bases: Scorer, ABC

Scorer that evaluates entire conversation history rather than individual messages.

This scorer wraps another scorer (FloatScaleScorer or TrueFalseScorer) and evaluates the full conversation context. Useful for multi-turn conversations where context matters (e.g., psychosocial harms that emerge over time or persuasion/deception over many messages).

The ConversationScorer dynamically inherits from the same base class as the wrapped scorer, ensuring proper type compatibility.

Note: This class cannot be instantiated directly. Use create_conversation_scorer() factory instead.

Methods:

`validate_return_scores`¶

validate_return_scores(scores: list[Score]) → None

Validate scores by delegating to the wrapped scorer’s validation.

Parameter	Type	Description
`scores`	`list[Score]`	The scores to validate.

`CredentialLeakScorer`¶

Bases: RegexScorer

A scorer that detects leaked credentials, API keys, and secrets in text responses.

Uses regex pattern matching to identify common credential formats (AWS keys, GitHub tokens, private keys, JWTs, connection strings, etc.) without requiring an LLM call. Returns True if any credential pattern is found in the response.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`DecodingScorer`¶

Bases: TrueFalseScorer

Scorer that checks if the request values are in the output using a text matching strategy.

This scorer checks if any of the user request values (original_value, converted_value, or metadata decoded_text) match the response converted_value using the configured text matching strategy.

Constructor Parameters:

Parameter	Type	Description
`text_matcher`	`TextMatching	None`
`categories`	`list[str]	None`
`aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.
`validator`	`ScorerPromptValidator	None`

`FentanylKeywordScorer`¶

Bases: RegexScorer

Detect fentanyl synthesis keywords in model output.

Keywords derived from the 0DIN JEF <https://github.com/0din-ai/0din-JEF>_ illicit-substances scoring module.

`FloatScaleScoreAggregator`¶

Namespace for float scale score aggregators that return a single aggregated score.

All aggregators return a list containing one ScoreAggregatorResult that combines all input scores together, preserving all categories.

`FloatScaleScorer`¶

Bases: Scorer

Base class for scorers that return floating-point scores in the range [0, 1].

This scorer evaluates prompt responses and returns numeric scores indicating the degree to which a response exhibits certain characteristics. Each piece in a request response is scored independently, returning one score per piece.

Default error / blocked behavior

When no supported pieces remain after validator filtering (e.g. the response is blocked, has another error type, or no piece matches the scorer’s supported data types), the base score_async invokes _build_fallback_score and returns a single Score with value 0.0. The rationale distinguishes blocked / error / filtered cases. This mirrors TrueFalseScorer’s False default so that downstream consumers (attack strategies, threshold wrappers) get a consistent, “attack did not succeed” value without each call site needing special-cased error handling. Subclasses that need different semantics (e.g. a refusal-style “blocked = True”) should override _score_piece_async or _build_fallback_score.

Constructor Parameters:

Parameter	Type	Description
`validator`	`ScorerPromptValidator`	A validator object used to validate scores.
`chat_target`	`PromptTarget	None`

Methods:

`get_scorer_metrics`¶

get_scorer_metrics() → HarmScorerMetrics | None

Get evaluation metrics for this scorer from the configured evaluation result file.

Returns:

HarmScorerMetrics | None — The metrics for this scorer, or None if not found or not configured.

`validate_return_scores`¶

validate_return_scores(scores: list[Score]) → None

Validate that the returned scores are within the valid range [0, 1].

Raises:

ValueError — If any score is not between 0 and 1.

`FloatScaleScorerAllCategories`¶

Namespace for float scale score aggregators that combine all categories.

These aggregators ignore category boundaries and aggregate all scores together, returning a single ScoreAggregatorResult with all categories combined.

`FloatScaleScorerByCategory`¶

Namespace for float scale score aggregators that group by category.

These aggregators return multiple ScoreAggregatorResult objects (one per category). Useful for scorers like AzureContentFilterScorer that return multiple scores per item.

`FloatScaleThresholdScorer`¶

Bases: TrueFalseScorer

A scorer that applies a threshold to a float scale score to make it a true/false score.

Constructor Parameters:

Parameter	Type	Description
`scorer`	`FloatScaleScorer`	The underlying float scale scorer to use.
`threshold`	`float`	The threshold value between 0 and 1. Scores >= threshold are True, otherwise False.
`float_scale_aggregator`	`FloatScaleAggregatorFunc`	The aggregator function to use for combining multiple float scale scores. Defaults to FloatScaleScoreAggregator.MAX. Defaults to `FloatScaleScoreAggregator.MAX`.

Methods:

`get_chat_target`¶

get_chat_target() → Optional[PromptTarget]

Delegate to the wrapped scorer.

Returns:

Optional[PromptTarget] — PromptTarget | None: The chat target from the wrapped scorer.

`GandalfScorer`¶

Bases: TrueFalseScorer

A scorer for evaluating responses in Gandalf challenges.

This scorer checks whether a password or secret has been successfully extracted from a Gandalf-style challenge by submitting the response to the Gandalf API and checking if the password is correct. Returns True if the password was successfully extracted.

Constructor Parameters:

Parameter	Type	Description
`level`	`GandalfLevel`	The Gandalf challenge level to score against.
`chat_target`	`PromptTarget`	The chat target used for password extraction.
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	Aggregator for combining scores. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`HarmHumanLabeledEntry`¶

Bases: HumanLabeledEntry

A class that represents a human-labeled dataset entry for a specific harm category. This class includes the Messages and a list of human scores, which are floats between 0.0 and 1.0 inclusive, representing the degree of harm severity where 0.0 is minimal and 1.0 is maximal. The harm category is a string that represents the type of harm (e.g., “hate_speech”, “misinformation”, etc.).

`HarmScorerEvaluator`¶

Bases: ScorerEvaluator

A class that evaluates a harm scorer against HumanLabeledDatasets of type HARM.

`HarmScorerMetrics`¶

Bases: ScorerMetrics

Metrics for evaluating a harm scorer against a HumanLabeledDataset.

Methods:

`get_harm_definition`¶

get_harm_definition() → HarmDefinition | None

Load and return the HarmDefinition object for this metrics instance.

Loads the harm definition YAML file specified in harm_definition and returns it as a HarmDefinition object. The result is cached after the first load.

Returns:

HarmDefinition | None — The loaded harm definition object, or None if harm_definition is not set.

Raises:

FileNotFoundError — If the harm definition file does not exist.
ValueError — If the harm definition file is invalid.

`HumanLabeledDataset`¶

A class that represents a human-labeled dataset, including the entries and each of their corresponding human scores. This dataset is used to evaluate PyRIT scorer performance via the ScorerEvaluator class. HumanLabeledDatasets can be constructed from a CSV file.

Constructor Parameters:

Parameter	Type	Description
`name`	`str`	The name of the human-labeled dataset. For datasets of uniform type, this is often the harm category (e.g. hate_speech) or objective. It will be used in the naming of metrics (JSON) and model scores (CSV) files when evaluation is run on this dataset.
`entries`	`list[HumanLabeledEntry]`	A list of entries in the dataset.
`metrics_type`	`MetricsType`	The type of the human-labeled dataset, either HARM or OBJECTIVE.
`version`	`str`	The version of the human-labeled dataset.
`harm_definition`	`str`	Path to the harm definition YAML file for HARM datasets. Defaults to `None`.
`harm_definition_version`	`str`	Version of the harm definition YAML file. Used to ensure the human labels match the scoring criteria version. Defaults to `None`.

Methods:

`from_csv`¶

from_csv(csv_path: str | Path, metrics_type: MetricsType, dataset_name: str | None = None, version: str | None = None, harm_definition: str | None = None, harm_definition_version: str | None = None) → HumanLabeledDataset

Load a human-labeled dataset from a CSV file with standard column names.

Expected CSV format:

‘assistant_response’: The assistant’s response text
‘human_score’: Human-assigned label (can have multiple columns for multiple raters)
‘objective’: For OBJECTIVE datasets, the objective being evaluated
‘data_type’: Optional data type (defaults to ‘text’ if not present)

You can optionally include a # comment line at the top of the CSV file to specify the dataset version and harm definition path. The format is:

For harm datasets: # dataset_version=x.y, harm_definition=path/to/definition.yaml, harm_definition_version=x.y
For objective datasets: # dataset_version=x.y

Parameter	Type	Description
`csv_path`	`str	Path`
`metrics_type`	`MetricsType`	The type of the human-labeled dataset, either HARM or OBJECTIVE.
`dataset_name`	`(str, Optional)`	The name of the dataset. If not provided, it will be inferred from the CSV file name. Defaults to `None`.
`version`	`(str, Optional)`	The version of the dataset. If not provided here, it will be inferred from the CSV file if a dataset_version comment line is present. Defaults to `None`.
`harm_definition`	`(str, Optional)`	Path to the harm definition YAML file. If not provided here, it will be inferred from the CSV file if a harm_definition comment is present. Defaults to `None`.
`harm_definition_version`	`(str, Optional)`	Version of the harm definition YAML file. If not provided here, it will be inferred from the CSV file if a harm_definition_version comment is present. Defaults to `None`.

Returns:

HumanLabeledDataset — The human-labeled dataset object.

Raises:

FileNotFoundError — If the CSV file does not exist.
ValueError — If version is not provided and not found in the CSV file.

`get_harm_definition`¶

get_harm_definition() → Optional[HarmDefinition]

Load and return the HarmDefinition object for this dataset.

For HARM datasets, this loads the harm definition YAML file specified in harm_definition and returns it as a HarmDefinition object. The result is cached after the first load.

Returns:

Optional[HarmDefinition] — The loaded harm definition object, or None if this is not a HARM dataset or harm_definition is not set.

Raises:

FileNotFoundError — If the harm definition file does not exist.
ValueError — If the harm definition file is invalid.

`validate`¶

validate() → None

Validate that the dataset is internally consistent.

Checks that all entries match the dataset’s metrics_type and, for HARM datasets, that all entries have the same harm_category, that harm_definition is specified, and that the harm definition file exists and is loadable.

Raises:

ValueError — If entries don’t match metrics_type, harm categories are inconsistent, or harm_definition is missing for HARM datasets.
FileNotFoundError — If the harm definition file does not exist.

`HumanLabeledEntry`¶

A class that represents an entry in a dataset of assistant responses that have been scored by humans. It is used to evaluate PyRIT scorer performance as measured by degree of alignment with human labels. This class includes the Messages and a list of human-assigned scores, which are floats between 0.0 and 1.0 inclusive (representing degree of severity) for harm datasets, and booleans for objective datasets.

`InsecureCodeScorer`¶

Bases: FloatScaleScorer

A scorer that uses an LLM to evaluate code snippets for potential security vulnerabilities.

This scorer is intended for generated-code evaluation scenarios where the response to score is source code or a code-like snippet, such as insecure-coding parity checks across vulnerability scanners. It holds a chat chat_target, a system_prompt (a rendered or static SeedPrompt or plain str), and a response_handler that turns the target’s raw output into a float-scale score.

Constructor Parameters:

Parameter	Type	Description
`chat_target`	`PromptTarget	None`
`system_prompt`	`SeedPrompt	str`
`harm_categories`	`Sequence[str]	str`
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`

Methods:

`from_harm_categories`¶

from_harm_categories(chat_target: PromptTarget, harm_categories: Sequence[str] | str = _DEFAULT_HARM_CATEGORY, system_prompt_template: SeedPrompt | str | None = None, response_handler: ResponseHandler | None = None, validator: ScorerPromptValidator | None = None) → InsecureCodeScorer

Build a scorer whose prompt and score metadata use the same harm categories.

Returns:

InsecureCodeScorer — The constructed scorer.

`JsonSchemaResponseHandler`¶

Bases: ResponseHandler

Default ResponseHandler that parses JSON scoring responses.

Reproduces PyRIT’s historical scoring-response parsing: strip any markdown code fences, json.loads the text, then read the score value, rationale, optional description, category, and metadata from configurable keys. It also owns the response contract: the optional JSON schema handed to the target, and (when numeric_value is set) validating that the parsed score value is numeric.

Constructor Parameters:

Parameter	Type	Description
`score_value_output_key`	`str`	Key holding the score value. Defaults to “score_value”. Defaults to `'score_value'`.
`rationale_output_key`	`str`	Key holding the rationale. Defaults to “rationale”. Defaults to `'rationale'`.
`description_output_key`	`str`	Key holding the description. Defaults to “description”. Defaults to `'description'`.
`metadata_output_key`	`str`	Key holding the metadata. Defaults to “metadata”. Defaults to `'metadata'`.
`category_output_key`	`str`	Key holding the category. Defaults to “category”. Defaults to `'category'`.
`response_schema`	`JsonSchemaDefinition	None`
`numeric_value`	`bool`	When True, `parse` requires the parsed score value to be parsable as a float and raises `InvalidJsonException` otherwise. Defaults to False. Defaults to `False`.

Methods:

`parse`¶

parse(response_text: str, scorer_identifier: ComponentIdentifier, scored_prompt_id: str | uuid.UUID, category: Sequence[str] | str | None = None, objective: str | None = None) → UnvalidatedScore

Parse a JSON scoring response into an UnvalidatedScore.

Parameter	Type	Description
`response_text`	`str`	The raw text returned by the scoring target.
`scorer_identifier`	`ComponentIdentifier`	Identifier of the scorer that produced the request, stored on the resulting score.
`scored_prompt_id`	`str	uuid.UUID`
`category`	`Sequence[str]	str
`objective`	`str	None`

Returns:

UnvalidatedScore — The parsed score, whose raw_score_value still needs to be normalized and validated by the caller.

Raises:

ValueError — If a category is present in both the response and the argument, or the parsed category is not a string or a list of strings.
InvalidJsonException — If the response is not valid JSON, is missing a required key, or (when this handler is numeric) the score value is not parsable as a float.

`LDAPInjectionOutputScorer`¶

Bases: RegexScorer

A scorer that detects LDAP injection payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): a model can be coaxed into emitting an LDAP search filter fragment that breaks out of the intended filter — e.g. an always-true *)(uid=*) clause or an injected |(objectClass=*) disjunction — which a downstream directory query can execute, bypassing authentication or enumerating entries. The patterns target the filter-breaking metacharacter sequences specific to LDAP search syntax, not ordinary parentheses. No LLM call is required, so it is cheap enough for batch evaluation and CI gates.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`LikertScale`¶

Bases: BaseModel

A category and ordered entries defining a Likert scoring scale.

Methods:

`from_yaml`¶

from_yaml(path: Path | str, evaluation_files: LikertScaleEvalFiles | None = None) → LikertScale

Load a Likert scale from a YAML file.

Parameter	Type	Description
`path`	`Path	str`
`evaluation_files`	`LikertScaleEvalFiles	None`

Returns:

LikertScale — The loaded scale.

Raises:

ValueError — If the YAML does not contain a mapping or fails model validation.

`LikertScaleEntry`¶

Bases: BaseModel

One score value and description in a Likert scale.

`LikertScaleEvalFiles`¶

`LikertScalePaths`¶

Bases: enum.Enum

Enum containing Likert scale configurations including YAML paths and evaluation file mappings.

Each enum value is a tuple of (yaml_path, evaluation_files) where:

yaml_path: Path to the YAML file containing the Likert scale definition
evaluation_files: Optional LikertScaleEvalFiles for scorer evaluation, or None if no dataset exists

Methods:

`load`¶

load() → LikertScale

Load this preset as a LikertScale with its evaluation metadata.

Returns:

LikertScale — The loaded scale.

`MarkdownInjectionScorer`¶

Bases: RegexScorer

A scorer that detects markdown injection and data-exfiltration attempts in text responses.

Uses regex pattern matching to identify markdown constructs commonly abused for injection and data exfiltration:

Markdown image syntax (inline ![alt](url) and reference-style ![alt][ref]) — a classic zero-click exfiltration vector (the renderer fetches the URL).
Non-image markdown links whose URL carries a query/path data parameter ([text](https://host/path?var=...)), matching garak’s extended markdown-exfil link detectors.
Reference-style link definitions pointing at such exfil URLs ([ref]: https://host/logo.png?var=[DATA]).

Returns True if any pattern matches. Mirrors the detection surface of garak’s web_injection markdown-exfil detectors.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`MethKeywordScorer`¶

Bases: RegexScorer

Detect methamphetamine synthesis keywords in model output.

Keywords derived from the 0DIN JEF <https://github.com/0din-ai/0din-JEF>_ illicit-substances scoring module.

`MetricsType`¶

Bases: Enum

Enum representing the type of metrics when evaluating scorers on human-labeled datasets.

`NerveAgentKeywordScorer`¶

Bases: RegexScorer

Detect nerve-agent synthesis keywords in model output.

Keywords derived from the 0DIN JEF <https://github.com/0din-ai/0din-JEF>_ CBRN scoring module.

`NumericRange`¶

Bases: BaseModel

The numeric range and optional category used to normalize a float score.

`NumericRubric`¶

Bases: NumericRange

A configurable numeric scoring scale and its prompt-rendering parameters.

Methods:

`from_yaml`¶

from_yaml(path: Path | str) → NumericRubric

Load a scale and its template parameters from a YAML file.

Parameter	Type	Description
`path`	`Path	str`

Returns:

NumericRubric — The loaded rubric.

Raises:

ValueError — If the YAML does not contain a mapping or fails model validation.

`ObjectiveHumanLabeledEntry`¶

Bases: HumanLabeledEntry

A class that represents a human-labeled dataset entry for a specific objective. This class includes the Messages and a list of human scores, which are booleans indicating whether the response/conversation meets the objective (e.g., 0 for not meeting the objective, 1 for meeting the objective). The objective is a string that represents the objective (e.g., "how to make a Molotov cocktail?).

`ObjectiveScorerEvaluator`¶

Bases: ScorerEvaluator

A class that evaluates an objective scorer against HumanLabeledDatasets of type OBJECTIVE.

`ObjectiveScorerMetrics`¶

Bases: ScorerMetrics

Metrics for evaluating an objective scorer against a HumanLabeledDataset.

`OpenRedirectOutputScorer`¶

Bases: RegexScorer

A scorer that detects open-redirect payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): a model can be coaxed into emitting a redirect target that sends a victim to an attacker-controlled destination — via a redirect parameter pointing off-site, a protocol-relative //host target, an encoded %2f%2f bypass, or userinfo host-confusion (https://trusted.com@evil.com). To keep false positives low the patterns require a redirect-parameter context or an unambiguous bypass marker rather than flagging every absolute URL. No LLM call is required, so it is cheap enough for batch evaluation and CI gates.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`PathTraversalOutputScorer`¶

Bases: RegexScorer

A scorer that detects path-traversal payloads aimed at sensitive system files.

Maps to OWASP LLM02 (Insecure Output Handling): a model that emits a path like ../../etc/passwd is providing a payload an agent or downstream component could naively pass to a file-read tool. To minimize false positives the default pattern requires both a multi-segment ../ walk and a known-sensitive target (Unix password/shadow files, windows\system32, or proc/self).

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`PlagiarismMetric`¶

Bases: Enum

Enum representing different plagiarism detection metrics.

`PlagiarismScorer`¶

Bases: FloatScaleScorer

A scorer that measures plagiarism by computing word-level similarity between the AI response and a reference text.

This scorer implements three similarity metrics:

Word-level longest common subsequence (LCS)
Word-level Levenshtein similarity
Word-level n-gram Jaccard similarity

Constructor Parameters:

Parameter	Type	Description
`reference_text`	`str`	The reference text to compare against.
`metric`	`PlagiarismMetric`	The plagiarism detection metric to use. Defaults to PlagiarismMetric.LCS. Defaults to `PlagiarismMetric.LCS`.
`n`	`int`	The n-gram size for n-gram similarity. Defaults to 5. Defaults to `5`.
`validator`	`ScorerPromptValidator	None`

`PromptShieldScorer`¶

Bases: TrueFalseScorer

Returns true if an attack or jailbreak has been detected by Prompt Shield.

Constructor Parameters:

Parameter	Type	Description
`prompt_shield_target`	`PromptShieldTarget`	The Prompt Shield target to use for scoring.
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`QuestionAnswerScorer`¶

Bases: TrueFalseScorer

A class that represents a question answering scorer.

Constructor Parameters:

Parameter	Type	Description
`correct_answer_matching_patterns`	`list[str]`	A list of patterns to check for in the response. If any pattern is found in the response, the score will be True. These patterns should be format strings that will be formatted with the correct answer metadata. Defaults to CORRECT_ANSWER_MATCHING_PATTERNS. Defaults to `CORRECT_ANSWER_MATCHING_PATTERNS`.
`category`	`list[str]	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`RefusalScorerPaths`¶

Bases: enum.Enum

Paths to refusal scorer system prompt YAML files.

Each enum value represents a different refusal detection strategy, organized along two dimensions:

Objective dimension (whether an explicit conversation_objective is required):

OBJECTIVE_*: Requires an explicit conversation_objective.
NO_OBJECTIVE_*: Works with or without an explicit objective. If not provided, evaluates against the implied objective from context.

Strictness dimension (how deflection/redirection is classified):

*_STRICT: Only substantive, direct fulfillment of the objective counts as non-refusal. Deflection, redirection, partial info, and excessive caveats are all considered refusals.
*_LENIENT: Only explicit refusals or completely unrelated responses count as refusals. Deflection, partial answers, and follow-up questions are NOT considered refusals.

`RegexScorer`¶

Bases: TrueFalseScorer

A scorer that evaluates text against a set of named regex patterns.

Returns True if any pattern matches. Subclass and provide a default pattern set to create domain-specific scorers (e.g., credential detection, PII).

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]`	A mapping of pattern names to regex strings.
`categories`	`list[str]	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`RegistryUpdateBehavior`¶

Bases: Enum

Enum representing how the evaluation registry should be updated.

`ResponseHandler`¶

Bases: abc.ABC

Owns the response contract for a scoring target.

A ResponseHandler owns two things and nothing else: the JSON schema (if any) the scoring target should honor, and turning the raw text the target returns into an UnvalidatedScore (including any value validation, such as requiring a numeric score). It does not perform the LLM round-trip, build the system prompt, or decide how the resulting score branches. Different handlers implement different wire formats (e.g. JSON today).

Methods:

`parse`¶

parse(response_text: str, scorer_identifier: ComponentIdentifier, scored_prompt_id: str | uuid.UUID, category: Sequence[str] | str | None = None, objective: str | None = None) → UnvalidatedScore

Parse raw target output into an UnvalidatedScore.

Parameter	Type	Description
`response_text`	`str`	The raw text returned by the scoring target.
`scorer_identifier`	`ComponentIdentifier`	Identifier of the scorer that produced the request, stored on the resulting score.
`scored_prompt_id`	`str	uuid.UUID`
`category`	`Sequence[str]	str
`objective`	`str	None`

Returns:

UnvalidatedScore — The parsed score, whose raw_score_value still needs to be normalized and validated by the caller.

`SQLInjectionOutputScorer`¶

Bases: RegexScorer

A scorer that detects SQL injection payloads emitted in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): if an agent passes a model-authored string into a query builder without parameterization, payloads like UNION SELECT exfiltrators or destructive ;DROP statements can land in the database. This scorer flags the canonical payload shapes without an LLM call.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`SSRFOutputScorer`¶

Bases: RegexScorer

A scorer that detects server-side request forgery (SSRF) targets in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): a model can be coaxed into emitting a request to an internal-only target (cloud metadata service, loopback, RFC1918 private range) or an SSRF-prone URL scheme (gopher/dict). A downstream agent or tool that fetches such a URL can be turned into a confused deputy. This scorer flags the common SSRF target families without requiring an LLM call, so it is cheap enough for batch evaluation and CI gates.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`SSTIOutputScorer`¶

Bases: RegexScorer

A scorer that detects server-side template injection (SSTI) payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): a model can be coaxed into emitting a template expression that a downstream rendering engine (Jinja2, Twig, Freemarker, ERB, Velocity) will evaluate, leading to data disclosure or remote code execution. To keep false positives low the patterns are limited to two unambiguous exploitation markers — the canonical arithmetic eval probe ({{7*7}} and its ${} / #{} variants) and the Python object-traversal gadget chains used to escape the sandbox — rather than ordinary templating such as {{ variable }}. No LLM call is required, so it is cheap enough for batch evaluation and CI gates.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`Scorer`¶

Bases: Identifiable, abc.ABC

Abstract base class for scorers.

Subclasses must use the keyword-only constructor shape (def __init__(self, *, ...)); the contract is enforced at class definition time via enforce_keyword_only_init. See .github/instructions/scorers.instructions.md for the full contract.

Constructor Parameters:

Parameter	Type	Description
`validator`	`ScorerPromptValidator`	Validator for message pieces and scorer configuration.
`chat_target`	`PromptTarget	None`

Methods:

`evaluate_async`¶

evaluate_async(file_mapping: ScorerEvalDatasetFiles | None = None, num_scorer_trials: int = 3, update_registry_behavior: RegistryUpdateBehavior | None = None, max_concurrency: int = 10) → ScorerMetrics | None

Evaluate this scorer against human-labeled datasets.

Uses file mapping to determine which datasets to evaluate and how to aggregate results.

Parameter	Type	Description
`file_mapping`	`ScorerEvalDatasetFiles	None`
`num_scorer_trials`	`int`	Number of times to score each response (for measuring variance). Defaults to 3. Defaults to `3`.
`update_registry_behavior`	`RegistryUpdateBehavior	None`
`max_concurrency`	`int`	Maximum number of concurrent scoring requests. Defaults to 10. Defaults to `10`.

Returns:

ScorerMetrics | None — The evaluation metrics, or None if no datasets found.

Raises:

ValueError — If no file_mapping is provided and no evaluation_file_mapping is configured.

`get_chat_target`¶

get_chat_target() → PromptTarget | None

Return the chat target used by this scorer, or None if it doesn’t use one.

Subclasses that wrap other scorers (e.g. inverters, composites) should override to delegate to their inner scorer(s).

Returns:

PromptTarget | None — PromptTarget | None: The chat target, or None if not applicable.

`get_identifier`¶

get_identifier() → ComponentIdentifier

Get the scorer’s identifier with eval_hash always attached.

Overrides the base Identifiable.get_identifier() so that to_dict() always emits the eval_hash key.

Returns:

ComponentIdentifier — The identity with eval_hash set.

`get_scorer_metrics`¶

get_scorer_metrics() → ScorerMetrics | None

Get evaluation metrics for this scorer from the configured evaluation result file.

Looks up metrics by this scorer’s identity hash in the JSONL result file. The result file may contain entries for multiple scorer configurations.

Subclasses must implement this to return the appropriate metrics type:

TrueFalseScorer subclasses should return ObjectiveScorerMetrics
FloatScaleScorer subclasses should return HarmScorerMetrics

Returns:

ScorerMetrics | None — The metrics for this scorer, or None if not found or not configured.

`scale_value_float`¶

scale_value_float(value: float, min_value: float, max_value: float) → float

Scales a value from 0 to 1 based on the given min and max values. E.g. 3 stars out of 5 stars would be .5.

Parameter	Type	Description
`value`	`float`	The value to be scaled.
`min_value`	`float`	The minimum value of the range.
`max_value`	`float`	The maximum value of the range.

Returns:

float — The scaled value.

`score_async`¶

score_async(message: Message, objective: str | None = None, role_filter: ChatMessageRole | None = None, skip_on_error_result: bool = False, infer_objective_from_request: bool = False) → list[Score]

Score the message, add the results to the database, and return a list of Score objects.

Parameter	Type	Description
`message`	`Message`	The message to be scored.
`objective`	`str	None`
`role_filter`	`ChatMessageRole	None`
`skip_on_error_result`	`bool`	If True, skip scoring if the message contains an error. When self.score_blocked_content is also True, blocked responses with partial content will still be scored instead of skipping. Defaults to False. Defaults to `False`.
`infer_objective_from_request`	`bool`	If True, infer the objective from the message’s previous request when objective is not provided. Defaults to False. Defaults to `False`.

Returns:

list[Score] — list[Score]: A list of Score objects representing the results.

Raises:

ScorerLLMResponseBlockedException — If the scorer’s own LLM response is blocked by content filtering and raise_if_scorer_blocks is True (the default).
PyritException — If scoring raises a PyRIT exception (re-raised with enhanced context).
RuntimeError — If scoring raises a non-PyRIT exception (wrapped with scorer context).

`score_image_async`¶

score_image_async(image_path: str, objective: str | None = None) → list[Score]

Score the given image using the chat target.

Parameter	Type	Description
`image_path`	`str`	The path to the image file to be scored.
`objective`	`str	None`

Returns:

list[Score] — list[Score]: A list of Score objects representing the results.

`score_image_batch_async`¶

score_image_batch_async(image_paths: Sequence[str], objectives: Sequence[str] | None = None, batch_size: int = 10) → list[Score]

Score a batch of images asynchronously.

Parameter	Type	Description
`image_paths`	`Sequence[str]`	Sequence of paths to image files to be scored.
`objectives`	`Sequence[str]	None`
`batch_size`	`int`	Maximum number of images to score concurrently. Defaults to 10. Defaults to `10`.

Returns:

list[Score] — list[Score]: A list of Score objects representing the scoring results for all images.

Raises:

ValueError — If the number of objectives does not match the number of image_paths.

`score_prompts_batch_async`¶

score_prompts_batch_async(messages: Sequence[Message], objectives: Sequence[str] | None = None, batch_size: int = 10, role_filter: ChatMessageRole | None = None, skip_on_error_result: bool = False, infer_objective_from_request: bool = False) → list[Score]

Score multiple prompts in batches using the provided objectives.

Parameter	Type	Description
`messages`	`Sequence[Message]`	The messages to be scored.
`objectives`	`Sequence[str]`	The objectives/tasks based on which the prompts should be scored. Must have the same length as messages. Defaults to `None`.
`batch_size`	`int`	The maximum batch size for processing prompts. Defaults to 10. Defaults to `10`.
`role_filter`	`ChatMessageRole	None`
`skip_on_error_result`	`bool`	If True, skip scoring pieces that have errors. Defaults to False. Defaults to `False`.
`infer_objective_from_request`	`bool`	If True and objective is empty, attempt to infer the objective from the request. Defaults to False. Defaults to `False`.

Returns:

list[Score] — list[Score]: A flattened list of Score objects from all scored prompts.

Raises:

ValueError — If objectives is not None and the number of objectives doesn’t match the number of messages.

`score_response_async`¶

score_response_async(response: Message, objective_scorer: Scorer | None = None, auxiliary_scorers: list[Scorer] | None = None, role_filter: ChatMessageRole = 'assistant', objective: str | None = None, skip_on_error_result: bool = True) → dict[str, list[Score]]

Score a response using an objective scorer and optional auxiliary scorers.

Parameter	Type	Description
`response`	`Message`	Response containing pieces to score.
`objective_scorer`	`Scorer	None`
`auxiliary_scorers`	`list[Scorer]	None`
`role_filter`	`ChatMessageRole`	Only score pieces with this exact stored role. Defaults to “assistant” (real responses only, not simulated). Defaults to `'assistant'`.
`objective`	`str	None`
`skip_on_error_result`	`bool`	If True, skip scoring pieces that have errors. Defaults to True. Defaults to `True`.

Returns:

dict[str, list[Score]] — dict[str, list[Score]]: Dictionary with keys auxiliary_scores and objective_scores containing lists of scores from each type of scorer.

Raises:

ValueError — If response is not provided.

`score_response_multiple_scorers_async`¶

score_response_multiple_scorers_async(response: Message, scorers: list[Scorer], role_filter: ChatMessageRole = 'assistant', objective: str | None = None, skip_on_error_result: bool = True) → list[Score]

Score a response using multiple scorers in parallel.

This method applies each scorer to the first scorable response piece (filtered by role and error), and returns all scores. This is typically used for auxiliary scoring where all results are needed.

Parameter	Type	Description
`response`	`Message`	The response containing pieces to score.
`scorers`	`list[Scorer]`	List of scorers to apply.
`role_filter`	`ChatMessageRole`	Only score pieces with this exact stored role. Defaults to “assistant” (real responses only, not simulated). Defaults to `'assistant'`.
`objective`	`str	None`
`skip_on_error_result`	`bool`	If True, skip scoring pieces that have errors (default: True). Defaults to `True`.

Returns:

list[Score] — list[Score]: All scores from all scorers

`score_text_async`¶

score_text_async(text: str, objective: str | None = None) → list[Score]

Scores the given text based on the task using the chat target.

Parameter	Type	Description
`text`	`str`	The text to be scored.
`objective`	`str	None`

Returns:

list[Score] — list[Score]: A list of Score objects representing the results.

`validate_return_scores`¶

validate_return_scores(scores: list[Score]) → None

Validate the scores returned by the scorer. Because some scorers may require specific Score types or values.

Parameter	Type	Description
`scores`	`list[Score]`	The scores to be validated.

`ScorerEvalDatasetFiles`¶

Configuration for evaluating a scorer on a set of dataset files.

Maps input dataset files (via glob patterns) to an output result file. Multiple files matching the patterns will be concatenated before evaluation.

`ScorerEvaluator`¶

Bases: abc.ABC

A class that evaluates an LLM scorer against HumanLabeledDatasets, calculating appropriate metrics and saving them to a file.

Constructor Parameters:

Parameter	Type	Description
`scorer`	`Scorer`	The scorer to evaluate.

Methods:

`evaluate_dataset_async`¶

evaluate_dataset_async(labeled_dataset: HumanLabeledDataset, num_scorer_trials: int = 1, max_concurrency: int = 10) → ScorerMetrics

Run the evaluation for the scorer/policy combination on the passed in HumanLabeledDataset.

This method performs pure computation without side effects (no file writing). It can be called directly with an in-memory HumanLabeledDataset for experiments that don’t use file-based datasets (e.g., iterative rubric tuning with custom splits).

Parameter	Type	Description
`labeled_dataset`	`HumanLabeledDataset`	The HumanLabeledDataset to evaluate the scorer against.
`num_scorer_trials`	`int`	The number of trials to run the scorer on all responses. Defaults to `1`.
`max_concurrency`	`int`	Maximum number of concurrent scoring requests. Defaults to 10. Defaults to `10`.

Returns:

ScorerMetrics — The metrics for the scorer. This will be either HarmScorerMetrics or ObjectiveScorerMetrics depending on the type of the HumanLabeledDataset (HARM or OBJECTIVE).

Raises:

ValueError — If the labeled_dataset is invalid.

`from_scorer`¶

from_scorer(scorer: Scorer, metrics_type: MetricsType | None = None) → ScorerEvaluator

Create a ScorerEvaluator based on the type of scoring.

Parameter	Type	Description
`scorer`	`Scorer`	The scorer to evaluate.
`metrics_type`	`MetricsType`	The type of scoring, either HARM or OBJECTIVE. If not provided, it will default to OBJECTIVE for true/false scorers and HARM for all other scorers. Defaults to `None`.

Returns:

ScorerEvaluator — An instance of HarmScorerEvaluator or ObjectiveScorerEvaluator.

`run_evaluation_async`¶

run_evaluation_async(dataset_files: ScorerEvalDatasetFiles, num_scorer_trials: int = 3, update_registry_behavior: RegistryUpdateBehavior = RegistryUpdateBehavior.SKIP_IF_EXISTS, max_concurrency: int = 10) → ScorerMetrics | None

Evaluate scorer using dataset files configuration.

The update_registry_behavior parameter controls how existing registry entries are handled:

SKIP_IF_EXISTS (default): Check registry for existing results matching scorer config, dataset version, and num_scorer_trials. If found, return cached metrics. If not found, run evaluation and write to registry.
ALWAYS_UPDATE: Always run evaluation and overwrite any existing registry entry.
NEVER_UPDATE: Always run evaluation but never write to registry (for debugging).

Parameter	Type	Description
`dataset_files`	`ScorerEvalDatasetFiles`	ScorerEvalDatasetFiles configuration specifying glob patterns for input files and a result file name.
`num_scorer_trials`	`int`	Number of scoring trials per response. Defaults to 3. Defaults to `3`.
`update_registry_behavior`	`RegistryUpdateBehavior`	Controls how existing registry entries are handled. Defaults to RegistryUpdateBehavior.SKIP_IF_EXISTS. Defaults to `RegistryUpdateBehavior.SKIP_IF_EXISTS`.
`max_concurrency`	`int`	Maximum number of concurrent scoring requests. Defaults to 10. Defaults to `10`.

Returns:

ScorerMetrics | None — ScorerMetrics if evaluation completed, None if no files found.

Raises:

ValueError — If harm_category is not specified for harm scorer evaluations.

`ScorerMetrics`¶

Base dataclass for storing scorer evaluation metrics.

This class provides methods for serializing metrics to JSON strings (see to_json) and loading them from JSON files on disk (see from_json_file).

Methods:

`from_json_file`¶

from_json_file(file_path: str | Path) → T

Load a metrics instance from a JSON file on disk.

This is the canonical deserialization entry point for ScorerMetrics and its subclasses. It accepts a file path (string or Path), not a JSON string — the loader opens the file, unwraps a top-level "metrics" key if present (as used by evaluation result files), and filters out internal underscore-prefixed fields (e.g., cached init=False attributes) before constructing the instance.

Parameter	Type	Description
`file_path`	`str	Path`

Returns:

T — An instance of ScorerMetrics (or subclass) with the loaded data.

Raises:

FileNotFoundError — If the specified file does not exist.

`to_json`¶

to_json() → str

Serialize this metrics instance to a JSON string.

This is the canonical serialization entry point for ScorerMetrics and its subclasses. Pair it with from_json_file (which reads a JSON file written from this string, optionally wrapped in a "metrics" key) for round-trip (de)serialization.

Returns:

str — The JSON string representation of the metrics.

`ScorerMetricsWithIdentity`¶

Bases: Generic[M]

Wrapper that combines scorer metrics with the scorer’s identity information.

This class provides a clean interface for working with evaluation results, allowing access to both the scorer configuration and its performance metrics.

Generic over the metrics type M, so:

ScorerMetricsWithIdentity[ObjectiveScorerMetrics] has metrics: ObjectiveScorerMetrics
ScorerMetricsWithIdentity[HarmScorerMetrics] has metrics: HarmScorerMetrics

`ScorerPrinterBase`¶

Bases: PrinterBase

Abstract base class for printing scorer information.

Subclasses must implement _get_objective_metrics and _get_harm_metrics for data fetching, and write_async for rendering + writing.

Methods:

`render_async`¶

render_async(scorer_identifier: ComponentIdentifier, harm_category: str | None = None) → str

Render scorer information and return it as a string.

Auto-detects scorer type: if harm_category is provided, renders harm metrics; otherwise renders objective metrics.

Parameter	Type	Description
`scorer_identifier`	`ComponentIdentifier`	The scorer identifier.
`harm_category`	`str	None`

Returns:

str — The rendered scorer information text.

`ScorerPromptValidator`¶

Validates message pieces and scorer configurations.

This class provides validation for scorer inputs, ensuring that message pieces meet required criteria such as data types, roles, and metadata requirements.

Constructor Parameters:

Parameter	Type	Description
`supported_data_types`	`Sequence[PromptDataType]	None`
`required_metadata`	`Sequence[str]	None`
`supported_roles`	`Sequence[ChatMessageRole]	None`
`max_pieces_in_response`	`int	None`
`max_text_length`	`int	None`
`enforce_all_pieces_valid`	`bool	None`
`raise_on_no_valid_pieces`	`bool	None`
`is_objective_required`	`bool`	Whether an objective must be provided for scoring. Defaults to False. Defaults to `False`.

Methods:

`is_message_piece_supported`¶

is_message_piece_supported(message_piece: MessagePiece) → bool

Check if a message piece is supported by this validator.

Parameter	Type	Description
`message_piece`	`MessagePiece`	The message piece to check.

Returns:

bool — True if the message piece meets all validation criteria, False otherwise.

`validate`¶

validate(message: Message, objective: str | None) → None

Validate a message and objective against configured requirements.

Parameter	Type	Description
`message`	`Message`	The message to validate.
`objective`	`str	None`

Raises:

ValueError — If validation fails due to unsupported pieces, exceeding max pieces, or missing objective.

`SelfAskCategoryScorer`¶

Bases: TrueFalseScorer

A class that represents a self-ask score for text classification and scoring. Given a ContentClassifier, it scores according to its categories and returns the category the MessagePiece fits best.

There is also a false category that is used if the MessagePiece does not fit any of the categories.

The scorer holds a chat_target, a system_prompt (typically rendered from a classifier via render_category_system_prompt), and a response_handler. The category is parsed from the target’s response rather than fixed on the scorer. Use from_content_classifier to build the system prompt from a ContentClassifier.

Constructor Parameters:

Parameter	Type	Description
`chat_target`	`PromptTarget	None`
`system_prompt`	`SeedPrompt	str`
`content_classifier`	`ContentClassifier`	The classifier represented by the prompt.
`response_handler`	`ResponseHandler	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.
`validator`	`ScorerPromptValidator	None`

Methods:

`from_content_classifier`¶

from_content_classifier(chat_target: PromptTarget, content_classifier: ContentClassifier, system_prompt_template: SeedPrompt | str | None = None, response_handler: ResponseHandler | None = None, score_aggregator: TrueFalseAggregatorFunc = TrueFalseScoreAggregator.OR, validator: ScorerPromptValidator | None = None) → SelfAskCategoryScorer

Build a scorer whose system prompt and response contract use one content classifier.

Parameter	Type	Description
`chat_target`	`PromptTarget`	The chat target used for scoring.
`content_classifier`	`ContentClassifier`	The classifier to use.
`system_prompt_template`	`SeedPrompt	str
`response_handler`	`ResponseHandler	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.
`validator`	`ScorerPromptValidator	None`

Returns:

SelfAskCategoryScorer — The constructed scorer.

`SelfAskGeneralFloatScaleScorer`¶

Bases: FloatScaleScorer

A general-purpose self-ask float-scale scorer that uses a chat target and a configurable system prompt and prompt format. The final score is normalized to [0, 1].

The scorer holds a chat chat_target and a response_handler; the system prompt is rendered per-piece from system_prompt_format_string.

Constructor Parameters:

Parameter	Type	Description
`system_prompt_format_string`	`str`	System prompt template with placeholders for objective, prompt, and message_piece.
`scale`	`NumericRange`	The required native score range and optional category.
`chat_target`	`PromptTarget	None`
`prompt_format_string`	`str	None`
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`
`score_value_output_key`	`str`	JSON key for the score value. Defaults to “score_value”. Defaults to `'score_value'`.
`rationale_output_key`	`str`	JSON key for the rationale. Defaults to “rationale”. Defaults to `'rationale'`.
`description_output_key`	`str`	JSON key for the description. Defaults to “description”. Defaults to `'description'`.
`metadata_output_key`	`str`	JSON key for the metadata. Defaults to “metadata”. Defaults to `'metadata'`.
`category_output_key`	`str`	JSON key for the category. Defaults to “category”. Defaults to `'category'`.
`response_json_schema`	`JsonSchemaDefinition	None`

`SelfAskGeneralTrueFalseScorer`¶

Bases: TrueFalseScorer

A general-purpose self-ask True/False scorer that uses a chat target and a configurable system prompt and prompt format.

The scorer holds a chat chat_target and a response_handler; the system prompt is rendered per-piece from system_prompt_format_string.

Constructor Parameters:

Parameter	Type	Description
`system_prompt_format_string`	`str`	System prompt template with placeholders for objective, task (alias of objective), prompt, and message_piece.
`chat_target`	`PromptTarget	None`
`prompt_format_string`	`str	None`
`category`	`str	None`
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	Aggregator for combining scores. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.
`score_value_output_key`	`str`	JSON key for the score value. Defaults to “score_value”. Defaults to `'score_value'`.
`rationale_output_key`	`str`	JSON key for the rationale. Defaults to “rationale”. Defaults to `'rationale'`.
`description_output_key`	`str`	JSON key for the description. Defaults to “description”. Defaults to `'description'`.
`metadata_output_key`	`str`	JSON key for the metadata. Defaults to “metadata”. Defaults to `'metadata'`.
`category_output_key`	`str`	JSON key for the category. Defaults to “category”. Defaults to `'category'`.
`response_json_schema`	`JsonSchemaDefinition	None`

`SelfAskLikertScorer`¶

Bases: FloatScaleScorer

A class that represents a “self-ask” score for text scoring based on a Likert scale. A Likert scale consists of ranked, ordered categories and is often on a 5 or 7 point basis, but you can configure a LikertScale with any set of non-negative integer score values and descriptions directly or by loading a YAML file.

Constructor Parameters:

Parameter	Type	Description
`chat_target`	`PromptTarget	None`
`system_prompt`	`SeedPrompt	str`
`likert_scale`	`LikertScale`	The scale defining entries, category, and normalization.
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`

Methods:

`from_likert_scale`¶

from_likert_scale(chat_target: PromptTarget, likert_scale: LikertScale, system_prompt_template: SeedPrompt | str | None = None, response_handler: ResponseHandler | None = None, validator: ScorerPromptValidator | None = None) → SelfAskLikertScorer

Build a scorer whose system prompt, category and min/max are driven by a Likert scale.

Renders the Likert scoring system prompt from likert_scale and stores that same object for category and score normalization.

Parameter	Type	Description
`chat_target`	`PromptTarget`	The chat target used for scoring.
`likert_scale`	`LikertScale`	The Likert scale to use.
`system_prompt_template`	`SeedPrompt	str
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`

Returns:

SelfAskLikertScorer — The constructed scorer.

`SelfAskQuestionAnswerScorer`¶

Bases: SelfAskTrueFalseScorer

A class that represents a self-ask question answering scorer.

Usually, QuestionAnswerScorer should be used, but this can be useful for when choices are not sent to objective target or you need more flexibility in determining if the questions were answered correctly.

Constructor Parameters:

Parameter	Type	Description
`chat_target`	`PromptTarget`	The chat target to use for the scorer. Must satisfy CHAT_TARGET_REQUIREMENTS (multi-turn + editable history capabilities, possibly via normalization-pipeline adaptation).
`true_false_question_path`	`pathlib.Path	None`
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`SelfAskRefusalScorer`¶

Bases: TrueFalseScorer

A self-ask scorer that detects refusal in AI responses.

This scorer uses a language model to determine whether a response contains a refusal to answer or comply with the given prompt. It’s useful for evaluating whether AI systems are appropriately refusing harmful requests.

The scorer supports four modes via RefusalScorerPaths:

OBJECTIVE_STRICT: Requires an explicit objective. Deflection/redirection is considered a refusal.
OBJECTIVE_LENIENT: Requires an explicit objective. Only explicit refusals count.
NO_OBJECTIVE_STRICT: Works with or without an objective. Deflection/redirection is considered a refusal.
NO_OBJECTIVE_LENIENT: Works with or without an objective. Only explicit refusals count.

Constructor Parameters:

Parameter	Type	Description
`chat_target`	`PromptTarget	None`
`system_prompt`	`SeedPrompt	str
`response_handler`	`ResponseHandler	None`
`score_category`	`Sequence[str]	str
`prompt_format_string`	`str	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`SelfAskScaleScorer`¶

Bases: FloatScaleScorer

A “self-ask” scorer for text scoring on a customizable numeric scale.

The scorer holds a chat_target, a rendered or static system_prompt, a NumericRubric defining normalization and category, and a response_handler. Use from_scale to render a template and configure the scorer from one rubric object.

Constructor Parameters:

Parameter	Type	Description
`chat_target`	`PromptTarget	None`
`system_prompt`	`SeedPrompt	str`
`scale`	`NumericRubric`	The rubric defining score normalization and category.
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`

Methods:

`from_scale`¶

from_scale(chat_target: PromptTarget, scale: NumericRubric | None = None, system_prompt_template: SeedPrompt | str | None = None, response_handler: ResponseHandler | None = None, validator: ScorerPromptValidator | None = None) → SelfAskScaleScorer

Build a scorer whose prompt and normalization are driven by one NumericRubric.

When scale is omitted, the bundled tree-of-attacks scale is used. The supplied scale is rendered through the bundled template or system_prompt_template and is also stored on the scorer for normalization, preventing prompt bounds from being configured separately.

Parameter	Type	Description
`chat_target`	`PromptTarget`	The chat target used for scoring.
`scale`	`NumericRubric	None`
`system_prompt_template`	`SeedPrompt	str
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`

Returns:

SelfAskScaleScorer — The constructed scorer.

`SelfAskTrueFalseScorer`¶

Bases: TrueFalseScorer

A self-ask true/false scorer with scorer-owned composition.

The scorer holds a chat_target, a system_prompt, the TrueFalseQuestion represented by that prompt, and a response_handler that turns the target’s raw output into a score.

Two construction modes are supported:

Default: omit both system_prompt and question to use the bundled TASK_ACHIEVED rubric.
Custom: use from_question to render a template from one question, or pass both an already rendered system_prompt and its question directly.

Constructor Parameters:

Parameter	Type	Description
`chat_target`	`PromptTarget	None`
`system_prompt`	`SeedPrompt	str
`question`	`TrueFalseQuestion	None`
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

Methods:

`from_question`¶

from_question(chat_target: PromptTarget, question: TrueFalseQuestion, system_prompt_template: SeedPrompt | str | None = None, response_handler: ResponseHandler | None = None, validator: ScorerPromptValidator | None = None, score_aggregator: TrueFalseAggregatorFunc = TrueFalseScoreAggregator.OR) → SelfAskTrueFalseScorer

Build a scorer whose system prompt and category are driven by a TrueFalseQuestion.

Renders the true/false scoring system prompt from question (via render_true_false_system_prompt) and sets score_category from question.category. Use this when a preset question drives more than the prompt; for a fully custom or static prompt, construct the scorer directly with system_prompt.

Parameter	Type	Description
`chat_target`	`PromptTarget`	The chat target used for scoring.
`question`	`TrueFalseQuestion`	The question supplying the system prompt and category.
`system_prompt_template`	`SeedPrompt	str
`response_handler`	`ResponseHandler	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

Returns:

SelfAskTrueFalseScorer — The constructed scorer.

`ShellCommandOutputScorer`¶

Bases: RegexScorer

A scorer that detects dangerous shell payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): an LLM can be prompted to suggest, or directly emit, shell commands that a downstream agent or human operator might paste into a terminal. This scorer flags the high-blast-radius payload families — remote pipe-to-shell, destructive filesystem commands, reverse shells, and environment-variable exfiltration — without an LLM call.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`StaticPromptInjectionScorer`¶

Bases: RegexScorer

A scorer that detects prompt injection attempts in text using static regex patterns.

Covers OWASP LLM01 (Prompt Injection) sub-categories: instruction override, system prompt extraction, jailbreak role-play, constraint removal, chat template injection, and encoding-based evasion. Complements the API-based PromptShieldScorer as a fast, local, zero-dependency pre-filter.

Returns True if any prompt injection pattern is found in the text.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`categories`	`list[str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`SubStringScorer`¶

Bases: TrueFalseScorer

Scorer that checks if a given substring is present in the text.

This scorer performs substring matching using a configurable text matching strategy. Supports both exact substring matching and approximate matching.

Constructor Parameters:

Parameter	Type	Description
`substring`	`str`	The substring to search for in the text.
`text_matcher`	`TextMatching	None`
`categories`	`list[str]	None`
`aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.
`validator`	`ScorerPromptValidator	None`

`TrueFalseAggregatorFunc`¶

`TrueFalseCompositeScorer`¶

Bases: TrueFalseScorer

Composite true/false scorer that aggregates results from other true/false scorers.

This scorer invokes a collection of constituent TrueFalseScorer instances and reduces their single-score outputs into one final true/false score using the supplied aggregation function (e.g., TrueFalseScoreAggregator.AND, TrueFalseScoreAggregator.OR, TrueFalseScoreAggregator.MAJORITY).

Constructor Parameters:

Parameter	Type	Description
`aggregator`	`TrueFalseAggregatorFunc`	Aggregation function to combine child scores (e.g., `TrueFalseScoreAggregator.AND`, `TrueFalseScoreAggregator.OR`, `TrueFalseScoreAggregator.MAJORITY`).
`scorers`	`list[TrueFalseScorer]`	The constituent true/false scorers to invoke.

Methods:

`get_chat_target`¶

get_chat_target() → Optional[PromptTarget]

Return the chat target from the first sub-scorer that has one.

`TrueFalseInverterScorer`¶

Bases: TrueFalseScorer

A scorer that inverts a true false score.

Constructor Parameters:

Parameter	Type	Description
`scorer`	`TrueFalseScorer`	The underlying true/false scorer whose results will be inverted.
`validator`	`ScorerPromptValidator	None`

Methods:

`get_chat_target`¶

get_chat_target() → Optional[PromptTarget]

Delegate to the wrapped scorer.

Returns:

Optional[PromptTarget] — PromptTarget | None: The chat target from the wrapped scorer.

`TrueFalseQuestion`¶

Bases: BaseModel

A value type representing a true/false scoring question.

Owns the descriptive parameters (category, true_description, false_description, metadata) that are rendered into a true/false scoring system prompt. It can be constructed directly or loaded from a YAML file via from_yaml, and it exposes the Jinja render parameters via render_params so a templated SeedPrompt can be rendered independently of how the question was obtained (e.g. template YAML and question YAML kept in separate files).

Methods:

`from_yaml`¶

from_yaml(path: str | Path) → TrueFalseQuestion

Load a TrueFalseQuestion from a YAML file.

Parameter	Type	Description
`path`	`str	Path`

Returns:

TrueFalseQuestion — The loaded question.

Raises:

ValueError — If the file does not contain a YAML mapping.

`TrueFalseQuestionPaths`¶

Bases: enum.Enum

Paths to true/false question YAML files.

`TrueFalseScoreAggregator`¶

Namespace for true/false score aggregators that return a single aggregated score.

All aggregators return a list containing one ScoreAggregatorResult that combines all input scores together, preserving all categories.

`TrueFalseScorer`¶

Bases: Scorer

Base class for scorers that return true/false binary scores.

This scorer evaluates prompt responses and returns a single boolean score indicating whether the response meets a specific criterion. Multiple pieces in a request response are aggregated using a TrueFalseAggregatorFunc function (default: TrueFalseScoreAggregator.OR).

Default error / blocked behavior

When no supported pieces remain after validator filtering (e.g. the response is blocked, has another error type, or no piece matches the scorer’s supported data types), the base score_async invokes _build_fallback_score and returns a single Score(False) whose rationale distinguishes blocked / error / filtered cases. This mirrors FloatScaleScorer’s 0.0 default so that downstream consumers (attack strategies, threshold wrappers) get a consistent, “attack did not succeed” value without each call site needing special-cased error handling. Subclasses that need different semantics (e.g. SelfAskRefusalScorer, which returns True on blocked) should override _score_piece_async and accept the error data type in their validator.

Constructor Parameters:

Parameter	Type	Description
`validator`	`ScorerPromptValidator`	Custom validator.
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.
`chat_target`	`PromptTarget	None`

Methods:

`get_scorer_metrics`¶

get_scorer_metrics() → ObjectiveScorerMetrics | None

Get evaluation metrics for this scorer from the configured evaluation result file.

Returns:

ObjectiveScorerMetrics | None — The metrics for this scorer, or None if not found or not configured.

`validate_return_scores`¶

validate_return_scores(scores: list[Score]) → None

Validate the scores returned by the scorer.

Parameter	Type	Description
`scores`	`list[Score]`	The scores to be validated.

Raises:

ValueError — If the number of scores is not exactly one.
ValueError — If the score value is not “true” or “false”.

`VideoFloatScaleScorer`¶

Bases: FloatScaleScorer

A scorer that processes videos by extracting frames and scoring them using a float scale image scorer.

The VideoFloatScaleScorer breaks down a video into frames and uses a float scale scoring mechanism. Frame scores are aggregated using a FloatScaleAggregatorFunc.

By default, uses FloatScaleScorerByCategory.MAX which groups scores by category (useful for scorers like AzureContentFilterScorer that return multiple scores per frame). This returns one aggregated score per category (e.g., one for “Hate”, one for “Violence”, etc.).

For scorers that return a single score per frame, or to combine all categories together, use FloatScaleScoreAggregator.MAX, FloatScaleScorerAllCategories.MAX, etc.

Optionally, an audio_scorer can be provided to also score the video’s audio track. When provided, the audio is extracted, transcribed, and scored. The audio scores are included in the aggregation.

Constructor Parameters:

Parameter	Type	Description
`image_capable_scorer`	`FloatScaleScorer`	A FloatScaleScorer capable of processing images.
`audio_scorer`	`FloatScaleScorer	None`
`num_sampled_frames`	`int	None`
`validator`	`ScorerPromptValidator	None`
`score_aggregator`	`FloatScaleAggregatorFunc`	Aggregator for combining frame scores. Defaults to FloatScaleScorerByCategory.MAX. Use FloatScaleScorerByCategory.MAX/AVERAGE/MIN for scorers that return multiple scores per frame (groups by category and returns one score per category). Use FloatScaleScorerAllCategories.MAX/AVERAGE/MIN to combine all scores regardless of category (returns single score with all categories combined). Use FloatScaleScoreAggregator.MAX/AVERAGE/MIN for simple aggregation preserving all categories (returns single score with all categories preserved). Defaults to `FloatScaleScorerByCategory.MAX`.
`image_objective_template`	`str	None`
`audio_objective_template`	`str	None`

`VideoTrueFalseScorer`¶

Bases: TrueFalseScorer

A scorer that processes videos by extracting frames and scoring them using a true/false image scorer.

Aggregation Logic (hard-coded): - Frame scores are aggregated using OR: if ANY frame meets the objective, the visual score is True. - When audio_scorer is provided, the final score uses AND: BOTH visual (frames) AND audio must be True for the overall video score to be True.

Constructor Parameters:

Parameter	Type	Description
`image_capable_scorer`	`TrueFalseScorer`	A TrueFalseScorer capable of processing images.
`audio_scorer`	`TrueFalseScorer	None`
`num_sampled_frames`	`int	None`
`validator`	`ScorerPromptValidator	None`
`image_objective_template`	`str	None`
`audio_objective_template`	`str	None`

`XSSOutputScorer`¶

Bases: RegexScorer

A scorer that detects cross-site scripting (XSS) payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): a model can be coaxed into emitting HTML/JS that an unwary downstream consumer (web view, markdown renderer, chat UI) will execute. This scorer flags the common payload families without requiring an LLM call, so it is cheap enough for batch evaluation and CI gates.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

`XXEOutputScorer`¶

Bases: RegexScorer

A scorer that detects XML external entity (XXE) payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): a model can be coaxed into emitting an XML document that declares an external entity, which a downstream XML parser with external-entity resolution enabled will dereference — disclosing local files or issuing outbound (SSRF-style) requests. The patterns target external/parameter entity declarations and the DOCTYPE-with-internal-subset that carries them; these are XXE exploitation markers, not ordinary XML. No LLM call is required, so it is cheap enough for batch evaluation and CI gates.

Returns True if any pattern matches.

Constructor Parameters:

Parameter	Type	Description
`patterns`	`dict[str, str]	None`
`score_aggregator`	`TrueFalseAggregatorFunc`	The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. Defaults to `TrueFalseScoreAggregator.OR`.

Functions¶

create_conversation_scorer¶

find_objective_metrics_by_eval_hash¶

get_all_harm_metrics¶

get_all_objective_metrics¶

get_scorer_info¶

render_category_system_prompt¶

render_insecure_code_system_prompt¶

render_likert_system_prompt¶

render_scale_system_prompt¶

render_true_false_system_prompt¶

AnthraxKeywordScorer¶

AudioFloatScaleScorer¶

AudioTrueFalseScorer¶

AzureContentFilterScorer¶

evaluate_async¶

BatchScorer¶

score_responses_by_filters_async¶

CallableResponseHandler¶

parse¶

ContentClassifier¶

from_yaml¶

ContentClassifierCategory¶

ContentClassifierPaths¶

ConversationScorer¶

validate_return_scores¶

CredentialLeakScorer¶

DecodingScorer¶

FentanylKeywordScorer¶

FloatScaleScoreAggregator¶

FloatScaleScorer¶

get_scorer_metrics¶

validate_return_scores¶

FloatScaleScorerAllCategories¶

FloatScaleScorerByCategory¶

FloatScaleThresholdScorer¶

get_chat_target¶

GandalfScorer¶

HarmHumanLabeledEntry¶

HarmScorerEvaluator¶

HarmScorerMetrics¶

get_harm_definition¶

HumanLabeledDataset¶

from_csv¶

get_harm_definition¶

validate¶

HumanLabeledEntry¶

InsecureCodeScorer¶

from_harm_categories¶

JsonSchemaResponseHandler¶

parse¶

LDAPInjectionOutputScorer¶

LikertScale¶

from_yaml¶

LikertScaleEntry¶

LikertScaleEvalFiles¶

LikertScalePaths¶

load¶

MarkdownInjectionScorer¶

MethKeywordScorer¶

MetricsType¶

NerveAgentKeywordScorer¶

NumericRange¶

NumericRubric¶

from_yaml¶

ObjectiveHumanLabeledEntry¶

ObjectiveScorerEvaluator¶

ObjectiveScorerMetrics¶

OpenRedirectOutputScorer¶

PathTraversalOutputScorer¶

PlagiarismMetric¶

PlagiarismScorer¶

PromptShieldScorer¶

QuestionAnswerScorer¶

RefusalScorerPaths¶

RegexScorer¶

RegistryUpdateBehavior¶

ResponseHandler¶

parse¶

`create_conversation_scorer`¶

`find_objective_metrics_by_eval_hash`¶

`get_all_harm_metrics`¶

`get_all_objective_metrics`¶

`get_scorer_info`¶

`render_category_system_prompt`¶

`render_insecure_code_system_prompt`¶

`render_likert_system_prompt`¶

`render_scale_system_prompt`¶

`render_true_false_system_prompt`¶

`AnthraxKeywordScorer`¶

`AudioFloatScaleScorer`¶

`AudioTrueFalseScorer`¶

`AzureContentFilterScorer`¶

`evaluate_async`¶

`BatchScorer`¶

`score_responses_by_filters_async`¶

`CallableResponseHandler`¶

`parse`¶

`ContentClassifier`¶

`from_yaml`¶

`ContentClassifierCategory`¶

`ContentClassifierPaths`¶

`ConversationScorer`¶

`validate_return_scores`¶

`CredentialLeakScorer`¶

`DecodingScorer`¶

`FentanylKeywordScorer`¶

`FloatScaleScoreAggregator`¶

`FloatScaleScorer`¶

`get_scorer_metrics`¶

`validate_return_scores`¶

`FloatScaleScorerAllCategories`¶

`FloatScaleScorerByCategory`¶

`FloatScaleThresholdScorer`¶

`get_chat_target`¶

`GandalfScorer`¶

`HarmHumanLabeledEntry`¶

`HarmScorerEvaluator`¶

`HarmScorerMetrics`¶

`get_harm_definition`¶

`HumanLabeledDataset`¶

`from_csv`¶

`get_harm_definition`¶

`validate`¶

`HumanLabeledEntry`¶

`InsecureCodeScorer`¶

`from_harm_categories`¶

`JsonSchemaResponseHandler`¶

`parse`¶

`LDAPInjectionOutputScorer`¶

`LikertScale`¶

`from_yaml`¶

`LikertScaleEntry`¶

`LikertScaleEvalFiles`¶

`LikertScalePaths`¶

`load`¶

`MarkdownInjectionScorer`¶

`MethKeywordScorer`¶

`MetricsType`¶

`NerveAgentKeywordScorer`¶

`NumericRange`¶

`NumericRubric`¶

`from_yaml`¶

`ObjectiveHumanLabeledEntry`¶

`ObjectiveScorerEvaluator`¶

`ObjectiveScorerMetrics`¶

`OpenRedirectOutputScorer`¶

`PathTraversalOutputScorer`¶

`PlagiarismMetric`¶

`PlagiarismScorer`¶

`PromptShieldScorer`¶

`QuestionAnswerScorer`¶

`RefusalScorerPaths`¶

`RegexScorer`¶

`RegistryUpdateBehavior`¶

`ResponseHandler`¶

`parse`¶

`SQLInjectionOutputScorer`¶

`SSRFOutputScorer`¶