pyrit.score

Scoring functionality for evaluating AI model responses across various dimensions including harm detection, objective completion, and content classification.

Functions

create_conversation_scorer

create_conversation_scorer(scorer: Scorer, validator: ScorerPromptValidator | None = None) → Scorer

Create a ConversationScorer that inherits from the same type as the wrapped scorer.

This factory dynamically creates a ConversationScorer class that inherits from the wrapped scorer’s base class (FloatScaleScorer or TrueFalseScorer), ensuring the returned scorer is an instance of both ConversationScorer and the wrapped scorer’s type.
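The dynamic-inheritance idea can be sketched with Python's built-in `type()`. This is a minimal illustration, not PyRIT's implementation: the classes below are toy stand-ins, and `make_conversation_scorer` and `KeywordScorer` are hypothetical names introduced only for this example.

```python
class Scorer:
    """Toy stand-in for the scorer base class (hypothetical)."""


class FloatScaleScorer(Scorer):
    """Toy stand-in for the float-scale base."""


class TrueFalseScorer(Scorer):
    """Toy stand-in for the true/false base."""


class ConversationScorer(Scorer):
    """Toy marker class for conversation-level scoring."""


class KeywordScorer(TrueFalseScorer):
    """Example concrete scorer to wrap (hypothetical)."""


def make_conversation_scorer(scorer: Scorer) -> Scorer:
    # Find which of the two supported base types the wrapped scorer derives from.
    for base in (FloatScaleScorer, TrueFalseScorer):
        if isinstance(scorer, base):
            break
    else:
        raise ValueError("scorer must be a FloatScaleScorer or TrueFalseScorer")

    # Create a class at runtime that inherits from both ConversationScorer
    # and the wrapped scorer's base type.
    dynamic_cls = type(f"Conversation{base.__name__}", (ConversationScorer, base), {})
    instance = dynamic_cls()
    instance.wrapped_scorer = scorer  # keep a reference to the wrapped scorer
    return instance


wrapped = make_conversation_scorer(KeywordScorer())
print(type(wrapped).__name__, isinstance(wrapped, TrueFalseScorer))
```

Because the class is built from both bases, `isinstance` checks against either type succeed on the returned object.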

| Parameter | Type | Description |
|---|---|---|
| scorer | `Scorer` | The scorer to wrap for conversation-level evaluation. Must be an instance of FloatScaleScorer or TrueFalseScorer. |
| validator | `ScorerPromptValidator \| None` | |

Returns:

Raises:

find_objective_metrics_by_eval_hash

find_objective_metrics_by_eval_hash(eval_hash: str, file_path: Path | None = None) → ObjectiveScorerMetrics | None

Find objective scorer metrics by evaluation hash.

| Parameter | Type | Description |
|---|---|---|
| eval_hash | `str` | The scorer evaluation hash to search for. |
| file_path | `Path \| None` | |

Returns:

get_all_harm_metrics

get_all_harm_metrics(harm_category: str) → list[ScorerMetricsWithIdentity[HarmScorerMetrics]]

Load all harm scorer metrics for a specific harm category.

Returns a list of ScorerMetricsWithIdentity[HarmScorerMetrics] objects that wrap the scorer’s identity information and its performance metrics, enabling clean attribute access like entry.metrics.mean_absolute_error or entry.metrics.harm_category.

| Parameter | Type | Description |
|---|---|---|
| harm_category | `str` | The harm category to load metrics for (e.g., “hate_speech”, “violence”). |

Returns:

get_all_objective_metrics

get_all_objective_metrics(file_path: Path | None = None) → list[ScorerMetricsWithIdentity[ObjectiveScorerMetrics]]

Load all objective scorer metrics with full scorer identity for comparison.

Returns a list of ScorerMetricsWithIdentity[ObjectiveScorerMetrics] objects that wrap the scorer’s identity information and its performance metrics, enabling clean attribute access like entry.metrics.accuracy or entry.metrics.f1_score.

| Parameter | Type | Description |
|---|---|---|
| file_path | `Path \| None` | |

Returns:

get_scorer_info

get_scorer_info() → list[_ScorerInfo]

Retrieve metadata for every public, concrete scorer exported from pyrit.score.

Iterates the package’s public API, keeps concrete subclasses of TrueFalseScorer or FloatScaleScorer, and records each scorer’s return type and whether it uses a generative chat target. Abstract bases and non-scorer exports are skipped.

This is a temporary helper used only to render the documentation’s scorer reference table; see _ScorerInfo for why it should not be built upon.

Returns:

AnthraxKeywordScorer

Bases: RegexScorer

Detect anthrax / B. anthracis weaponisation keywords in model output.

Keywords are derived from the 0DIN JEF (https://github.com/0din-ai/0din-JEF) CBRN scoring module.

AudioFloatScaleScorer

Bases: FloatScaleScorer

A scorer that processes audio files by transcribing them and scoring the transcript.

The AudioFloatScaleScorer transcribes audio to text using Azure Speech-to-Text, then scores the transcript using a FloatScaleScorer.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| text_capable_scorer | `FloatScaleScorer` | A FloatScaleScorer capable of processing text. This scorer will be used to evaluate the transcribed audio content. |
| validator | `ScorerPromptValidator \| None` | |

AudioTrueFalseScorer

Bases: TrueFalseScorer

A scorer that processes audio files by transcribing them and scoring the transcript.

The AudioTrueFalseScorer transcribes audio to text using Azure Speech-to-Text, then scores the transcript using a TrueFalseScorer.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| text_capable_scorer | `TrueFalseScorer` | A TrueFalseScorer capable of processing text. This scorer will be used to evaluate the transcribed audio content. |
| validator | `ScorerPromptValidator \| None` | |

AzureContentFilterScorer

Bases: FloatScaleScorer

A scorer that uses Azure Content Safety API to evaluate text and images for harmful content.

This scorer analyzes content across multiple harm categories (hate, self-harm, sexual, violence) and returns a score for each category in the range [0, 1], where higher scores indicate more severe content. Supports both text and image inputs.

Constructor Parameters:

ParameterTypeDescription
endpoint`strNone
api_key`strCallable[[], str
harm_categories`list[TextCategory]None`
validator`ScorerPromptValidatorNone`

Methods:

evaluate_async

evaluate_async(file_mapping: Optional[ScorerEvalDatasetFiles] = None, num_scorer_trials: int = 3, update_registry_behavior: RegistryUpdateBehavior | None = None, max_concurrency: int = 10) → Optional[ScorerMetrics]

Evaluate this scorer against human-labeled datasets.

AzureContentFilterScorer requires exactly one harm category to be configured for evaluation. This ensures each score corresponds to exactly one category in the ground truth dataset.

| Parameter | Type | Description |
|---|---|---|
| file_mapping | `Optional[ScorerEvalDatasetFiles]` | Optional ScorerEvalDatasetFiles configuration. If not provided, uses the mapping based on the configured harm category. Defaults to None. |
| num_scorer_trials | `int` | Number of times to score each response. Defaults to 3. |
| update_registry_behavior | `RegistryUpdateBehavior \| None` | |
| max_concurrency | `int` | Maximum concurrent scoring requests. Defaults to 10. |

Returns:

Raises:

BatchScorer

A utility class for scoring prompts in batches in a parallelizable and convenient way.

This class provides functionality to score existing prompts stored in memory without any target interaction, making it a pure scoring utility.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| batch_size | `int` | The maximum batch size for sending prompts. If the scorer takes a prompt target and that target sets a maximum number of requests per minute, set this to 1 to ensure proper rate limit management. Defaults to 10. |

Methods:

score_responses_by_filters_async

score_responses_by_filters_async(scorer: Scorer, attack_id: str | uuid.UUID | None = None, conversation_id: str | uuid.UUID | None = None, prompt_ids: list[str] | list[uuid.UUID] | None = None, labels: dict[str, str] | None = None, sent_after: datetime | None = None, sent_before: datetime | None = None, original_values: list[str] | None = None, converted_values: list[str] | None = None, data_type: str | None = None, not_data_type: str | None = None, converted_value_sha256: list[str] | None = None, objective: str = '') → list[Score]

Score the responses that match the specified filters.

| Parameter | Type | Description |
|---|---|---|
| scorer | `Scorer` | The Scorer object to use for scoring. |
| attack_id | `str \| uuid.UUID \| None` | |
| conversation_id | `str \| uuid.UUID \| None` | |
| prompt_ids | `list[str] \| list[uuid.UUID] \| None` | |
| labels | `dict[str, str] \| None` | |
| sent_after | `datetime \| None` | |
| sent_before | `datetime \| None` | |
| original_values | `list[str] \| None` | |
| converted_values | `list[str] \| None` | |
| data_type | `str \| None` | |
| not_data_type | `str \| None` | |
| converted_value_sha256 | `list[str] \| None` | |
| objective | `str` | Gives the scorer more context on what exactly to score, for example the request prompt text or the original attack model’s objective. The same objective is applied to all matched prompts. Defaults to an empty string (''). |

Returns:

Raises:

PrettyScorerMemoryPrinter

Bases: PrettyScorerPrinter

Framework pretty printer for scorer information.

Implements metrics fetching via the scorer evaluation registry (deferred import). All formatting logic lives in PrettyScorerPrinter.

Methods:

render_async

render_async(scorer_identifier: ComponentIdentifier, harm_category: str | None = None) → str

Render scorer information and return it as a string.

| Parameter | Type | Description |
|---|---|---|
| scorer_identifier | `ComponentIdentifier` | The scorer identifier. |
| harm_category | `str \| None` | |

Returns:

ContentClassifierPaths

Bases: enum.Enum

Paths to content classifier YAML files.

ConversationScorer

Bases: Scorer, ABC

Scorer that evaluates entire conversation history rather than individual messages.

This scorer wraps another scorer (FloatScaleScorer or TrueFalseScorer) and evaluates the full conversation context. Useful for multi-turn conversations where context matters (e.g., psychosocial harms that emerge over time or persuasion/deception over many messages).

The ConversationScorer dynamically inherits from the same base class as the wrapped scorer, ensuring proper type compatibility.

Note: This class cannot be instantiated directly. Use create_conversation_scorer() factory instead.

Methods:

validate_return_scores

validate_return_scores(scores: list[Score]) → None

Validate scores by delegating to the wrapped scorer’s validation.

| Parameter | Type | Description |
|---|---|---|
| scores | `list[Score]` | The scores to validate. |

CredentialLeakScorer

Bases: RegexScorer

A scorer that detects leaked credentials, API keys, and secrets in text responses.

Uses regex pattern matching to identify common credential formats (AWS keys, GitHub tokens, private keys, JWTs, connection strings, etc.) without requiring an LLM call. Returns True if any credential pattern is found in the response.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| patterns | `dict[str, str] \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

DecodingScorer

Bases: TrueFalseScorer

Scorer that checks if the request values are in the output using a text matching strategy.

This scorer checks if any of the user request values (original_value, converted_value, or metadata decoded_text) match the response converted_value using the configured text matching strategy.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| text_matcher | `TextMatching \| None` | |
| categories | `list[str] \| None` | |
| aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |
| validator | `ScorerPromptValidator \| None` | |

FentanylKeywordScorer

Bases: RegexScorer

Detect fentanyl synthesis keywords in model output.

Keywords are derived from the 0DIN JEF (https://github.com/0din-ai/0din-JEF) illicit-substances scoring module.

FloatScaleScoreAggregator

Namespace for float scale score aggregators that return a single aggregated score.

All aggregators return a list containing one ScoreAggregatorResult that combines all input scores together, preserving all categories.

FloatScaleScorer

Bases: Scorer

Base class for scorers that return floating-point scores in the range [0, 1].

This scorer evaluates prompt responses and returns numeric scores indicating the degree to which a response exhibits certain characteristics. Each piece in a request response is scored independently, returning one score per piece.

Default error / blocked behavior

When no supported pieces remain after validator filtering (e.g. the response is blocked, has another error type, or no piece matches the scorer’s supported data types), the base score_async invokes _build_fallback_score and returns a single Score with value 0.0. The rationale distinguishes blocked / error / filtered cases. This mirrors TrueFalseScorer’s False default so that downstream consumers (attack strategies, threshold wrappers) get a consistent, “attack did not succeed” value without each call site needing special-cased error handling. Subclasses that need different semantics (e.g. a refusal-style “blocked = True”) should override _score_piece_async or _build_fallback_score.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| validator | `ScorerPromptValidator` | A validator object used to validate scores. |
| chat_target | `PromptTarget \| None` | |

Methods:

get_scorer_metrics

get_scorer_metrics() → Optional[HarmScorerMetrics]

Get evaluation metrics for this scorer from the configured evaluation result file.

Returns:

validate_return_scores

validate_return_scores(scores: list[Score]) → None

Validate that the returned scores are within the valid range [0, 1].

Raises:

FloatScaleScorerAllCategories

Namespace for float scale score aggregators that combine all categories.

These aggregators ignore category boundaries and aggregate all scores together, returning a single ScoreAggregatorResult with all categories combined.

FloatScaleScorerByCategory

Namespace for float scale score aggregators that group by category.

These aggregators return multiple ScoreAggregatorResult objects (one per category). Useful for scorers like AzureContentFilterScorer that return multiple scores per item.
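The difference between combining all categories and grouping by category can be sketched as follows. This is an illustrative sketch only: scores are modeled as plain `(category, value)` tuples rather than PyRIT `Score` objects, the function names are hypothetical, and taking the maximum is just one possible aggregation choice.

```python
from collections import defaultdict

# Each score is modeled as a (category, value) pair for illustration.
ScorePair = tuple[str, float]


def aggregate_all_categories(scores: list[ScorePair]) -> float:
    """Ignore category boundaries and reduce every value to one result."""
    return max(value for _, value in scores)


def aggregate_by_category(scores: list[ScorePair]) -> dict[str, float]:
    """Group values by category and reduce each group separately."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for category, value in scores:
        grouped[category].append(value)
    return {category: max(values) for category, values in grouped.items()}


sample = [("hate", 0.2), ("violence", 0.7), ("hate", 0.5)]
print(aggregate_all_categories(sample))
print(aggregate_by_category(sample))
```

The first function yields a single result for the whole input, while the second yields one result per category.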

FloatScaleThresholdScorer

Bases: TrueFalseScorer

A scorer that applies a threshold to a float scale score to make it a true/false score.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| scorer | `FloatScaleScorer` | The underlying float scale scorer to use. |
| threshold | `float` | The threshold value between 0 and 1. Scores >= threshold are True, otherwise False. |
| float_scale_aggregator | `FloatScaleAggregatorFunc` | The aggregator function to use for combining multiple float scale scores. Defaults to FloatScaleScoreAggregator.MAX. |
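
The thresholding rule described above can be sketched in a few lines. This is a simplified illustration that works on plain floats, not the library's code; `threshold_score` is a hypothetical helper, and `max` stands in for the default MAX aggregation.

```python
def threshold_score(values: list[float], threshold: float) -> bool:
    """Aggregate float scores with max, then compare against the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if not values:
        raise ValueError("at least one score is required")
    aggregated = max(values)
    # Scores at or above the threshold count as True.
    return aggregated >= threshold


print(threshold_score([0.2, 0.7], threshold=0.7))
print(threshold_score([0.2, 0.69], threshold=0.7))
```

Note that the comparison is inclusive, so a score exactly equal to the threshold is treated as True.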

Methods:

get_chat_target

get_chat_target() → Optional[PromptTarget]

Delegate to the wrapped scorer.

Returns:

GandalfScorer

Bases: TrueFalseScorer

A scorer for evaluating responses in Gandalf challenges.

This scorer checks whether a password or secret has been successfully extracted from a Gandalf-style challenge by submitting the response to the Gandalf API and checking if the password is correct. Returns True if the password was successfully extracted.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| level | `GandalfLevel` | The Gandalf challenge level to score against. |
| chat_target | `PromptTarget` | The chat target used for password extraction. |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | Aggregator for combining scores. Defaults to TrueFalseScoreAggregator.OR. |

HarmHumanLabeledEntry

Bases: HumanLabeledEntry

A class that represents a human-labeled dataset entry for a specific harm category. This class includes the Messages and a list of human scores, which are floats between 0.0 and 1.0 inclusive, representing the degree of harm severity where 0.0 is minimal and 1.0 is maximal. The harm category is a string that represents the type of harm (e.g., “hate_speech”, “misinformation”, etc.).

HarmScorerEvaluator

Bases: ScorerEvaluator

A class that evaluates a harm scorer against HumanLabeledDatasets of type HARM.

HarmScorerMetrics

Bases: ScorerMetrics

Metrics for evaluating a harm scorer against a HumanLabeledDataset.

Methods:

get_harm_definition

get_harm_definition() → HarmDefinition | None

Load and return the HarmDefinition object for this metrics instance.

Loads the harm definition YAML file specified in harm_definition and returns it as a HarmDefinition object. The result is cached after the first load.

Returns:

Raises:

HumanLabeledDataset

A class that represents a human-labeled dataset, including the entries and each of their corresponding human scores. This dataset is used to evaluate PyRIT scorer performance via the ScorerEvaluator class. HumanLabeledDatasets can be constructed from a CSV file.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| name | `str` | The name of the human-labeled dataset. For datasets of uniform type, this is often the harm category (e.g. hate_speech) or objective. It will be used in the naming of metrics (JSON) and model scores (CSV) files when evaluation is run on this dataset. |
| entries | `list[HumanLabeledEntry]` | A list of entries in the dataset. |
| metrics_type | `MetricsType` | The type of the human-labeled dataset, either HARM or OBJECTIVE. |
| version | `str` | The version of the human-labeled dataset. |
| harm_definition | `str` | Path to the harm definition YAML file for HARM datasets. Defaults to None. |
| harm_definition_version | `str` | Version of the harm definition YAML file. Used to ensure the human labels match the scoring criteria version. Defaults to None. |

Methods:

from_csv

from_csv(csv_path: str | Path, metrics_type: MetricsType, dataset_name: str | None = None, version: str | None = None, harm_definition: str | None = None, harm_definition_version: str | None = None) → HumanLabeledDataset

Load a human-labeled dataset from a CSV file with standard column names.

Expected CSV format:

You can optionally include a # comment line at the top of the CSV file to specify the dataset version and harm definition path. The format is:

| Parameter | Type | Description |
|---|---|---|
| csv_path | `str \| Path` | |
| metrics_type | `MetricsType` | The type of the human-labeled dataset, either HARM or OBJECTIVE. |
| dataset_name | (str, Optional) | The name of the dataset. If not provided, it will be inferred from the CSV file name. Defaults to None. |
| version | (str, Optional) | The version of the dataset. If not provided here, it will be inferred from the CSV file if a dataset_version comment line is present. Defaults to None. |
| harm_definition | (str, Optional) | Path to the harm definition YAML file. If not provided here, it will be inferred from the CSV file if a harm_definition comment is present. Defaults to None. |
| harm_definition_version | (str, Optional) | Version of the harm definition YAML file. If not provided here, it will be inferred from the CSV file if a harm_definition_version comment is present. Defaults to None. |

Returns:

Raises:

get_harm_definition

get_harm_definition() → Optional[HarmDefinition]

Load and return the HarmDefinition object for this dataset.

For HARM datasets, this loads the harm definition YAML file specified in harm_definition and returns it as a HarmDefinition object. The result is cached after the first load.

Returns:

Raises:

validate

validate() → None

Validate that the dataset is internally consistent.

Checks that all entries match the dataset’s metrics_type and, for HARM datasets, that all entries have the same harm_category, that harm_definition is specified, and that the harm definition file exists and is loadable.

Raises:

HumanLabeledEntry

A class that represents an entry in a dataset of assistant responses that have been scored by humans. It is used to evaluate PyRIT scorer performance as measured by degree of alignment with human labels. This class includes the Messages and a list of human-assigned scores, which are floats between 0.0 and 1.0 inclusive (representing degree of severity) for harm datasets, and booleans for objective datasets.

InsecureCodeScorer

Bases: FloatScaleScorer

A scorer that uses an LLM to evaluate code snippets for potential security vulnerabilities.

This scorer is intended for generated-code evaluation scenarios where the response to score is source code or a code-like snippet, such as insecure-coding parity checks across vulnerability scanners. Configuration is loaded from a YAML file for dynamic prompts and instructions.

Constructor Parameters:

ParameterTypeDescription
chat_targetPromptTargetThe target to use for scoring code security.
system_prompt_path`strPath
validator`ScorerPromptValidatorNone`

LikertScaleEvalFiles

Configuration for evaluating a Likert scale scorer on a set of dataset files.

LikertScalePaths

Bases: enum.Enum

Enum containing Likert scale configurations including YAML paths and evaluation file mappings.

Each enum value is a tuple of (yaml_path, evaluation_files) where:

MarkdownInjectionScorer

Bases: TrueFalseScorer

A scorer that detects markdown injection attempts in text responses.

This scorer checks for the presence of markdown syntax patterns that could be used for injection attacks, such as links, images, or other markdown constructs that might be exploited. Returns True if markdown injection is detected.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

MethKeywordScorer

Bases: RegexScorer

Detect methamphetamine synthesis keywords in model output.

Keywords are derived from the 0DIN JEF (https://github.com/0din-ai/0din-JEF) illicit-substances scoring module.

MetricsType

Bases: Enum

Enum representing the type of metrics when evaluating scorers on human-labeled datasets.

NerveAgentKeywordScorer

Bases: RegexScorer

Detect nerve-agent synthesis keywords in model output.

Keywords are derived from the 0DIN JEF (https://github.com/0din-ai/0din-JEF) CBRN scoring module.

ObjectiveHumanLabeledEntry

Bases: HumanLabeledEntry

A class that represents a human-labeled dataset entry for a specific objective. This class includes the Messages and a list of human scores, which are booleans indicating whether the response/conversation meets the objective (e.g., 0 for not meeting the objective, 1 for meeting the objective). The objective is a string that represents the objective (e.g., “How to make a Molotov cocktail?”).

ObjectiveScorerEvaluator

Bases: ScorerEvaluator

A class that evaluates an objective scorer against HumanLabeledDatasets of type OBJECTIVE.

ObjectiveScorerMetrics

Bases: ScorerMetrics

Metrics for evaluating an objective scorer against a HumanLabeledDataset.

PathTraversalOutputScorer

Bases: RegexScorer

A scorer that detects path-traversal payloads aimed at sensitive system files.

Maps to OWASP LLM02 (Insecure Output Handling): a model that emits a path like ../../etc/passwd is providing a payload an agent or downstream component could naively pass to a file-read tool. To minimize false positives the default pattern requires both a multi-segment ../ walk and a known-sensitive target (Unix password/shadow files, windows\system32, or proc/self).

Returns True if any pattern matches.
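
To make the "both conditions" idea concrete, here is a hedged sketch of a pattern that requires a walk of at least two `../` (or `..\`) segments followed by a known-sensitive target. The regex and the `looks_like_traversal` helper are assumptions written for this illustration; they are not the scorer's actual default pattern.

```python
import re

# Illustrative pattern only: two or more "../" or "..\" segments,
# immediately followed by a known-sensitive target.
TRAVERSAL_PATTERN = re.compile(
    r"(?:\.\./|\.\.\\){2,}"
    r"(?:etc/(?:passwd|shadow)|windows\\system32|proc/self)",
    re.IGNORECASE,
)


def looks_like_traversal(text: str) -> bool:
    """Return True if the text contains a multi-segment walk to a sensitive file."""
    return TRAVERSAL_PATTERN.search(text) is not None


print(looks_like_traversal("cat ../../etc/passwd"))   # walk + sensitive target
print(looks_like_traversal("open ../notes.txt"))      # single segment, benign target
print(looks_like_traversal("see /etc/passwd"))        # sensitive target, no walk
```

Requiring both parts means a plain mention of `/etc/passwd` or a harmless relative path does not trigger a match, which is the false-positive trade-off the description refers to.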

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| patterns | `dict[str, str] \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

PlagiarismMetric

Bases: Enum

Enum representing different plagiarism detection metrics.

PlagiarismScorer

Bases: FloatScaleScorer

A scorer that measures plagiarism by computing word-level similarity between the AI response and a reference text.

This scorer implements three similarity metrics:

  1. Word-level longest common subsequence (LCS)

  2. Word-level Levenshtein similarity

  3. Word-level n-gram Jaccard similarity
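
The three metrics can be sketched in plain Python as below. This is an illustrative sketch, not the scorer's code: tokenization by lowercased whitespace splitting and the normalization of each metric into [0, 1] are assumptions made for this example, and all function names are hypothetical.

```python
def _tokens(text: str) -> list[str]:
    return text.lower().split()


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def levenshtein(a: list[str], b: list[str]) -> int:
    """Word-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def lcs_similarity(response: str, reference: str) -> float:
    # Assumed normalization: LCS length divided by reference length.
    r, ref = _tokens(response), _tokens(reference)
    return lcs_length(r, ref) / len(ref) if ref else 0.0


def levenshtein_similarity(response: str, reference: str) -> float:
    # Assumed normalization: 1 - distance / longer sequence length.
    r, ref = _tokens(response), _tokens(reference)
    longest = max(len(r), len(ref))
    return 1.0 - levenshtein(r, ref) / longest if longest else 1.0


def ngram_jaccard(response: str, reference: str, n: int = 5) -> float:
    # Jaccard index over the sets of word n-grams.
    def grams(tokens: list[str]) -> set[tuple[str, ...]]:
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    a, b = grams(_tokens(response)), grams(_tokens(reference))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


reference = "the quick brown fox jumps"
response = "the quick red fox jumps"
print(lcs_similarity(response, reference))
print(levenshtein_similarity(response, reference))
print(ngram_jaccard(response, reference, n=2))
```

All three return higher values for responses that reproduce more of the reference text, but they differ in how strictly word order and contiguity are enforced.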

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| reference_text | `str` | The reference text to compare against. |
| metric | `PlagiarismMetric` | The plagiarism detection metric to use. Defaults to PlagiarismMetric.LCS. |
| n | `int` | The n-gram size for n-gram similarity. Defaults to 5. |
| validator | `ScorerPromptValidator \| None` | |

PromptShieldScorer

Bases: TrueFalseScorer

Returns true if an attack or jailbreak has been detected by Prompt Shield.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| prompt_shield_target | `PromptShieldTarget` | The Prompt Shield target to use for scoring. |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

QuestionAnswerScorer

Bases: TrueFalseScorer

A class that represents a question answering scorer.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| correct_answer_matching_patterns | `list[str]` | A list of patterns to check for in the response. If any pattern is found in the response, the score will be True. These patterns should be format strings that will be formatted with the correct answer metadata. Defaults to CORRECT_ANSWER_MATCHING_PATTERNS. |
| category | `list[str] \| None` | |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

RefusalScorerPaths

Bases: enum.Enum

Paths to refusal scorer system prompt YAML files.

Each enum value represents a different refusal detection strategy, organized along two dimensions:

Objective dimension (whether an explicit conversation_objective is required):

Strictness dimension (how deflection/redirection is classified):

RegexScorer

Bases: TrueFalseScorer

A scorer that evaluates text against a set of named regex patterns.

Returns True if any pattern matches. Subclass and provide a default pattern set to create domain-specific scorers (e.g., credential detection, PII).
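
The "named patterns, True if any matches" behavior can be sketched with the standard `re` module. This is a minimal illustration rather than the class's implementation: `match_patterns`, `regex_score`, and the two example patterns are hypothetical and chosen only to show the shape of a domain-specific pattern set.

```python
import re

# Hypothetical pattern set for illustration (names map to regex strings).
EXAMPLE_PATTERNS: dict[str, str] = {
    "aws_access_key_id": r"AKIA[0-9A-Z]{16}",
    "pem_private_key": r"-----BEGIN [A-Z ]*PRIVATE KEY-----",
}


def match_patterns(text: str, patterns: dict[str, str]) -> dict[str, bool]:
    """Report, per pattern name, whether the regex matches anywhere in the text."""
    return {name: re.search(regex, text) is not None for name, regex in patterns.items()}


def regex_score(text: str, patterns: dict[str, str]) -> bool:
    """True if any named pattern matches."""
    return any(match_patterns(text, patterns).values())


print(regex_score("key=AKIAIOSFODNN7EXAMPLE", EXAMPLE_PATTERNS))
print(regex_score("nothing sensitive here", EXAMPLE_PATTERNS))
```

Keeping the patterns in a name-to-regex mapping makes it possible to report which pattern fired, not just whether one did.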

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| patterns | `dict[str, str]` | A mapping of pattern names to regex strings. |
| categories | `list[str] \| None` | |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

RegistryUpdateBehavior

Bases: Enum

Enum representing how the evaluation registry should be updated.

SQLInjectionOutputScorer

Bases: RegexScorer

A scorer that detects SQL injection payloads emitted in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): if an agent passes a model-authored string into a query builder without parameterization, payloads like UNION SELECT exfiltrators or destructive ;DROP statements can land in the database. This scorer flags the canonical payload shapes without an LLM call.

Returns True if any pattern matches.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| patterns | `dict[str, str] \| None` | |
| score_aggregator | `TrueFalseAggregatorFunc` | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

Scorer

Bases: Identifiable, abc.ABC

Abstract base class for scorers.

Subclasses must use the keyword-only constructor shape (def __init__(self, *, ...)); the contract is enforced at class definition time via enforce_keyword_only_init. See .github/instructions/scorers.instructions.md for the full contract.
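
One way to enforce a keyword-only constructor at class definition time is an `__init_subclass__` hook that inspects the subclass's `__init__` signature. The sketch below is an assumption about how such a check could work; it is not the `enforce_keyword_only_init` implementation, and `KeywordOnlyInit` and `GoodScorer` are hypothetical names.

```python
import inspect

_POSITIONAL_KINDS = (
    inspect.Parameter.POSITIONAL_ONLY,
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.VAR_POSITIONAL,
)


class KeywordOnlyInit:
    """Base class that rejects subclasses whose __init__ accepts positional arguments."""

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        init = cls.__dict__.get("__init__")
        if init is None:
            return  # no constructor defined on this subclass; nothing to check
        # Skip the first parameter (self) and check the rest.
        params = list(inspect.signature(init).parameters.values())[1:]
        for param in params:
            if param.kind in _POSITIONAL_KINDS:
                raise TypeError(
                    f"{cls.__name__}.__init__ parameter '{param.name}' must be keyword-only"
                )


class GoodScorer(KeywordOnlyInit):
    def __init__(self, *, threshold: float = 0.5) -> None:
        self.threshold = threshold


print(GoodScorer(threshold=0.7).threshold)
```

With this hook, a subclass declaring `def __init__(self, threshold)` fails with a `TypeError` as soon as the class body is evaluated, before any instance is created.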

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| validator | `ScorerPromptValidator` | Validator for message pieces and scorer configuration. |
| chat_target | `PromptTarget \| None` | |

Methods:

evaluate_async

evaluate_async(file_mapping: ScorerEvalDatasetFiles | None = None, num_scorer_trials: int = 3, update_registry_behavior: RegistryUpdateBehavior | None = None, max_concurrency: int = 10) → ScorerMetrics | None

Evaluate this scorer against human-labeled datasets.

Uses file mapping to determine which datasets to evaluate and how to aggregate results.

| Parameter | Type | Description |
|---|---|---|
| file_mapping | `ScorerEvalDatasetFiles \| None` | |
| num_scorer_trials | `int` | Number of times to score each response (for measuring variance). Defaults to 3. |
| update_registry_behavior | `RegistryUpdateBehavior \| None` | |
| max_concurrency | `int` | Maximum number of concurrent scoring requests. Defaults to 10. |

Returns:

Raises:

get_chat_target

get_chat_target() → PromptTarget | None

Return the chat target used by this scorer, or None if it doesn’t use one.

Subclasses that wrap other scorers (e.g. inverters, composites) should override to delegate to their inner scorer(s).

Returns:

get_identifier

get_identifier() → ComponentIdentifier

Get the scorer’s identifier with eval_hash always attached.

Overrides the base Identifiable.get_identifier() so that to_dict() always emits the eval_hash key.

Returns:

get_scorer_metrics

get_scorer_metrics() → ScorerMetrics | None

Get evaluation metrics for this scorer from the configured evaluation result file.

Looks up metrics by this scorer’s identity hash in the JSONL result file. The result file may contain entries for multiple scorer configurations.

Subclasses must implement this to return the appropriate metrics type:

Returns:

scale_value_float

scale_value_float(value: float, min_value: float, max_value: float) → float

Scale a value to the range [0, 1] based on the given minimum and maximum values. For example, 3 stars on a scale of 1 to 5 stars becomes 0.5.
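
The scaling is ordinary min-max normalization. A minimal sketch, assuming linear scaling and an arbitrary choice of 0.0 for a degenerate range (the function name and that edge-case behavior are assumptions, not taken from the library):

```python
def scale_to_unit_interval(value: float, min_value: float, max_value: float) -> float:
    """Linearly map value from [min_value, max_value] onto [0, 1]."""
    if max_value == min_value:
        # Assumption for this sketch: avoid division by zero on an empty range.
        return 0.0
    return (value - min_value) / (max_value - min_value)


print(scale_to_unit_interval(3, 1, 5))  # (3 - 1) / (5 - 1)
```

For the star-rating case, the computation is (3 - 1) / (5 - 1) = 0.5.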

| Parameter | Type | Description |
|---|---|---|
| value | `float` | The value to be scaled. |
| min_value | `float` | The minimum value of the range. |
| max_value | `float` | The maximum value of the range. |

Returns:

score_async

score_async(message: Message, objective: str | None = None, role_filter: ChatMessageRole | None = None, skip_on_error_result: bool = False, infer_objective_from_request: bool = False) → list[Score]

Score the message, add the results to the database, and return a list of Score objects.

| Parameter | Type | Description |
|---|---|---|
| message | `Message` | The message to be scored. |
| objective | `str \| None` | |
| role_filter | `ChatMessageRole \| None` | |
| skip_on_error_result | `bool` | If True, skip scoring if the message contains an error. When self.score_blocked_content is also True, blocked responses with partial content will still be scored instead of skipping. Defaults to False. |
| infer_objective_from_request | `bool` | If True, infer the objective from the message’s previous request when objective is not provided. Defaults to False. |

Returns:

Raises:

score_image_async

score_image_async(image_path: str, objective: str | None = None) → list[Score]

Score the given image using the chat target.

| Parameter | Type | Description |
|---|---|---|
| image_path | `str` | The path to the image file to be scored. |
| objective | `str \| None` | |

Returns:

score_image_batch_async

score_image_batch_async(image_paths: Sequence[str], objectives: Sequence[str] | None = None, batch_size: int = 10) → list[Score]

Score a batch of images asynchronously.

| Parameter | Type | Description |
|---|---|---|
| image_paths | `Sequence[str]` | Sequence of paths to image files to be scored. |
| objectives | `Sequence[str] \| None` | |
| batch_size | `int` | Maximum number of images to score concurrently. Defaults to 10. |

Returns:

Raises:

score_prompts_batch_async

score_prompts_batch_async(messages: Sequence[Message], objectives: Sequence[str] | None = None, batch_size: int = 10, role_filter: ChatMessageRole | None = None, skip_on_error_result: bool = False, infer_objective_from_request: bool = False) → list[Score]

Score multiple prompts in batches using the provided objectives.

| Parameter | Type | Description |
|---|---|---|
| messages | `Sequence[Message]` | The messages to be scored. |
| objectives | `Sequence[str]` | The objectives/tasks based on which the prompts should be scored. Must have the same length as messages. Defaults to None. |
| batch_size | `int` | The maximum batch size for processing prompts. Defaults to 10. |
| role_filter | `ChatMessageRole \| None` | |
| skip_on_error_result | `bool` | If True, skip scoring pieces that have errors. Defaults to False. |
| infer_objective_from_request | `bool` | If True and objective is empty, attempt to infer the objective from the request. Defaults to False. |

Returns:

Raises:

score_response_async

score_response_async(response: Message, objective_scorer: Scorer | None = None, auxiliary_scorers: list[Scorer] | None = None, role_filter: ChatMessageRole = 'assistant', objective: str | None = None, skip_on_error_result: bool = True) → dict[str, list[Score]]

Score a response using an objective scorer and optional auxiliary scorers.

| Parameter | Type | Description |
|---|---|---|
| response | `Message` | Response containing pieces to score. |
| objective_scorer | `Scorer \| None` | |
| auxiliary_scorers | `list[Scorer] \| None` | |
| role_filter | `ChatMessageRole` | Only score pieces with this exact stored role. Defaults to 'assistant' (real responses only, not simulated). |
| objective | `str \| None` | |
| skip_on_error_result | `bool` | If True, skip scoring pieces that have errors. Defaults to True. |

Returns:

Raises:

score_response_multiple_scorers_async

score_response_multiple_scorers_async(response: Message, scorers: list[Scorer], role_filter: ChatMessageRole = 'assistant', objective: str | None = None, skip_on_error_result: bool = True) → list[Score]

Score a response using multiple scorers in parallel.

This method applies each scorer to the first scorable response piece (filtered by role and error), and returns all scores. This is typically used for auxiliary scoring where all results are needed.
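
Running several scorers in parallel and collecting every result follows the usual `asyncio.gather` pattern. The sketch below illustrates that pattern with toy coroutine scorers operating on plain text; `length_scorer`, `keyword_scorer`, and `score_with_all` are hypothetical names and do not reflect the method's internals.

```python
import asyncio
from typing import Awaitable, Callable

ScoreFn = Callable[[str], Awaitable[float]]


async def length_scorer(text: str) -> float:
    await asyncio.sleep(0)  # stand-in for real asynchronous work
    return min(len(text) / 100, 1.0)


async def keyword_scorer(text: str) -> float:
    await asyncio.sleep(0)
    return 1.0 if "password" in text.lower() else 0.0


async def score_with_all(text: str, scorers: list[ScoreFn]) -> list[float]:
    """Apply every scorer to the same text concurrently and keep all results."""
    results = await asyncio.gather(*(scorer(text) for scorer in scorers))
    return list(results)


scores = asyncio.run(score_with_all("The password is hunter2", [length_scorer, keyword_scorer]))
print(scores)
```

`asyncio.gather` returns results in the same order as the scorers were passed in, so each score can be matched back to the scorer that produced it.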

| Parameter | Type | Description |
|---|---|---|
| response | `Message` | The response containing pieces to score. |
| scorers | `list[Scorer]` | List of scorers to apply. |
| role_filter | `ChatMessageRole` | Only score pieces with this exact stored role. Defaults to 'assistant' (real responses only, not simulated). |
| objective | `str \| None` | |
| skip_on_error_result | `bool` | If True, skip scoring pieces that have errors. Defaults to True. |

Returns:

score_text_async

score_text_async(text: str, objective: str | None = None) → list[Score]

Scores the given text based on the task using the chat target.

| Parameter | Type | Description |
| --- | --- | --- |
| text | str | The text to be scored. |
| objective | `str \| None` | |

Returns:

validate_return_scores

validate_return_scores(scores: list[Score]) → None

Validate the scores returned by the scorer, because some scorers may require specific Score types or values.

| Parameter | Type | Description |
| --- | --- | --- |
| scores | list[Score] | The scores to be validated. |

ScorerEvalDatasetFiles

Configuration for evaluating a scorer on a set of dataset files.

Maps input dataset files (via glob patterns) to an output result file. Multiple files matching the patterns will be concatenated before evaluation.

ScorerEvaluator

Bases: abc.ABC

A class that evaluates an LLM scorer against HumanLabeledDatasets, calculating appropriate metrics and saving them to a file.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| scorer | Scorer | The scorer to evaluate. |

Methods:

evaluate_dataset_async

evaluate_dataset_async(labeled_dataset: HumanLabeledDataset, num_scorer_trials: int = 1, max_concurrency: int = 10) → ScorerMetrics

Run the evaluation for the scorer/policy combination on the passed-in HumanLabeledDataset.

This method performs pure computation without side effects (no file writing). It can be called directly with an in-memory HumanLabeledDataset for experiments that don’t use file-based datasets (e.g., iterative rubric tuning with custom splits).

| Parameter | Type | Description |
| --- | --- | --- |
| labeled_dataset | HumanLabeledDataset | The HumanLabeledDataset to evaluate the scorer against. |
| num_scorer_trials | int | The number of trials to run the scorer on all responses. Defaults to 1. |
| max_concurrency | int | Maximum number of concurrent scoring requests. Defaults to 10. |

Returns:

Raises:

from_scorer

from_scorer(scorer: Scorer, metrics_type: MetricsType | None = None) → ScorerEvaluator

Create a ScorerEvaluator based on the type of scoring.

| Parameter | Type | Description |
| --- | --- | --- |
| scorer | Scorer | The scorer to evaluate. |
| metrics_type | MetricsType | The type of scoring, either HARM or OBJECTIVE. If not provided, it will default to OBJECTIVE for true/false scorers and HARM for all other scorers. Defaults to None. |

Returns:

run_evaluation_async

run_evaluation_async(dataset_files: ScorerEvalDatasetFiles, num_scorer_trials: int = 3, update_registry_behavior: RegistryUpdateBehavior = RegistryUpdateBehavior.SKIP_IF_EXISTS, max_concurrency: int = 10) → ScorerMetrics | None

Evaluate scorer using dataset files configuration.

The update_registry_behavior parameter controls how existing registry entries are handled:

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_files | ScorerEvalDatasetFiles | ScorerEvalDatasetFiles configuration specifying glob patterns for input files and a result file name. |
| num_scorer_trials | int | Number of scoring trials per response. Defaults to 3. |
| update_registry_behavior | RegistryUpdateBehavior | Controls how existing registry entries are handled. Defaults to RegistryUpdateBehavior.SKIP_IF_EXISTS. |
| max_concurrency | int | Maximum number of concurrent scoring requests. Defaults to 10. |

Returns:

Raises:

ScorerMetrics

Base dataclass for storing scorer evaluation metrics.

This class provides methods for serializing metrics to JSON strings (see to_json) and loading them from JSON files on disk (see from_json_file).

Methods:

from_json_file

from_json_file(file_path: str | Path) → T

Load a metrics instance from a JSON file on disk.

This is the canonical deserialization entry point for ScorerMetrics and its subclasses. It accepts a file path (string or Path), not a JSON string — the loader opens the file, unwraps a top-level "metrics" key if present (as used by evaluation result files), and filters out internal underscore-prefixed fields (e.g., cached init=False attributes) before constructing the instance.
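
The loading steps described above (open the file, unwrap an optional "metrics" key, drop underscore-prefixed fields) can be sketched with a stand-in dataclass. `DemoMetrics`, its field names, and `load_demo_metrics` are hypothetical and exist only for illustration; this is not PyRIT's code.

```python
import json
import tempfile
from dataclasses import dataclass, field, fields
from pathlib import Path
from typing import Union


@dataclass
class DemoMetrics:
    """Hypothetical metrics dataclass; field names are illustrative only."""
    accuracy: float
    f1_score: float
    _cached_summary: str = field(default="", init=False)  # internal, not an init argument


def load_demo_metrics(file_path: Union[str, Path]) -> DemoMetrics:
    data = json.loads(Path(file_path).read_text(encoding="utf-8"))
    # Unwrap the top-level "metrics" key when the file is an evaluation result.
    if isinstance(data, dict) and "metrics" in data:
        data = data["metrics"]
    # Keep only public keys that the constructor accepts.
    init_names = {f.name for f in fields(DemoMetrics) if f.init}
    kwargs = {k: v for k, v in data.items() if not k.startswith("_") and k in init_names}
    return DemoMetrics(**kwargs)


with tempfile.TemporaryDirectory() as tmp:
    result_file = Path(tmp) / "result.json"
    payload = {"metrics": {"accuracy": 0.9, "f1_score": 0.8, "_cached_summary": "x"}}
    result_file.write_text(json.dumps(payload), encoding="utf-8")
    loaded = load_demo_metrics(result_file)
```

Filtering before construction matters because passing an `init=False` field as a keyword argument would raise a `TypeError`.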

| Parameter | Type | Description |
| --- | --- | --- |
| file_path | `str \| Path` | |

Returns:

Raises:

to_json

to_json() → str

Serialize this metrics instance to a JSON string.

This is the canonical serialization entry point for ScorerMetrics and its subclasses. Pair it with from_json_file (which reads a JSON file written from this string, optionally wrapped in a "metrics" key) for round-trip (de)serialization.

Returns:

ScorerMetricsWithIdentity

Bases: Generic[M]

Wrapper that combines scorer metrics with the scorer’s identity information.

This class provides a clean interface for working with evaluation results, allowing access to both the scorer configuration and its performance metrics.

Generic over the metrics type M, so:

ScorerPrinterBase

Bases: PrinterBase

Abstract base class for printing scorer information.

Subclasses must implement _get_objective_metrics and _get_harm_metrics for data fetching, and write_async for rendering + writing.

Methods:

print_harm_scorer

print_harm_scorer(scorer_identifier: ComponentIdentifier, harm_category: str) → None

Use write_async instead. This method is deprecated.

| Parameter | Type | Description |
| --- | --- | --- |
| scorer_identifier | ComponentIdentifier | The scorer identifier. |
| harm_category | str | The harm category. |

print_objective_scorer

print_objective_scorer(scorer_identifier: ComponentIdentifier) → None

Use write_async instead. This method is deprecated.

| Parameter | Type | Description |
| --- | --- | --- |
| scorer_identifier | ComponentIdentifier | The scorer identifier. |

render_async

render_async(scorer_identifier: ComponentIdentifier, harm_category: str | None = None) → str

Render scorer information and return it as a string.

Auto-detects scorer type: if harm_category is provided, renders harm metrics; otherwise renders objective metrics.

| Parameter | Type | Description |
| --- | --- | --- |
| scorer_identifier | ComponentIdentifier | The scorer identifier. |
| harm_category | `str \| None` | |

Returns:

ScorerPromptValidator

Validates message pieces and scorer configurations.

This class provides validation for scorer inputs, ensuring that message pieces meet required criteria such as data types, roles, and metadata requirements.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| supported_data_types | `Sequence[PromptDataType] \| None` | |
| required_metadata | `Sequence[str] \| None` | |
| supported_roles | `Sequence[ChatMessageRole] \| None` | |
| max_pieces_in_response | `int \| None` | |
| max_text_length | `int \| None` | |
| enforce_all_pieces_valid | `bool \| None` | |
| raise_on_no_valid_pieces | `bool \| None` | |
| is_objective_required | bool | Whether an objective must be provided for scoring. Defaults to False. |

Methods:

is_message_piece_supported

is_message_piece_supported(message_piece: MessagePiece) → bool

Check if a message piece is supported by this validator.

| Parameter | Type | Description |
| --- | --- | --- |
| message_piece | MessagePiece | The message piece to check. |

Returns:

validate

validate(message: Message, objective: str | None) → None

Validate a message and objective against configured requirements.

| Parameter | Type | Description |
| --- | --- | --- |
| message | Message | The message to validate. |
| objective | `str \| None` | |

Raises:

SelfAskCategoryScorer

Bases: TrueFalseScorer

A class that represents a self-ask score for text classification and scoring. Given a classifier file that defines the categories, it scores the MessagePiece against those categories and returns the one it fits best.

There is also a false category that is used if the MessagePiece does not fit any of the categories.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target to interact with. |
| content_classifier_path | `str \| Path` | |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |
| validator | `ScorerPromptValidator \| None` | |

SelfAskGeneralFloatScaleScorer

Bases: FloatScaleScorer

A general-purpose self-ask float-scale scorer that uses a chat target and a configurable system prompt and prompt format. The final score is normalized to [0, 1].

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target used to score. Must satisfy CHAT_TARGET_REQUIREMENTS (multi-turn + editable history capabilities, possibly via normalization-pipeline adaptation). |
| system_prompt_format_string | str | System prompt template with placeholders for objective, prompt, and message_piece. |
| prompt_format_string | `str \| None` | |
| category | `str \| None` | |
| min_value | int | Minimum of the model’s native scale. Defaults to 0. |
| max_value | int | Maximum of the model’s native scale. Defaults to 100. |
| validator | `ScorerPromptValidator \| None` | |
| score_value_output_key | str | JSON key for the score value. Defaults to 'score_value'. |
| rationale_output_key | str | JSON key for the rationale. Defaults to 'rationale'. |
| description_output_key | str | JSON key for the description. Defaults to 'description'. |
| metadata_output_key | str | JSON key for the metadata. Defaults to 'metadata'. |
| category_output_key | str | JSON key for the category. Defaults to 'category'. |
| response_json_schema | `JsonSchemaDefinition \| None` | |
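
To make the role of min_value and max_value concrete, here is a minimal sketch of rescaling a native score to [0, 1]. It assumes a linear min-max mapping, which the text above does not spell out; `normalize_score` is a hypothetical helper, not part of PyRIT.

```python
def normalize_score(raw: float, min_value: int = 0, max_value: int = 100) -> float:
    """Linearly rescale a native-scale score to [0, 1] (assumed mapping)."""
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    if not min_value <= raw <= max_value:
        raise ValueError(f"raw score {raw} is outside [{min_value}, {max_value}]")
    return (raw - min_value) / (max_value - min_value)


on_default_scale = normalize_score(75)        # 0..100 scale
on_likert_scale = normalize_score(3, 1, 5)    # 1..5 scale
```

Under this assumption, a 75 on the default 0 to 100 scale maps to 0.75, and the midpoint of a 1 to 5 scale maps to 0.5.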

SelfAskGeneralTrueFalseScorer

Bases: TrueFalseScorer

A general-purpose self-ask True/False scorer that uses a chat target and a configurable system prompt and prompt format.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target used to score. Must satisfy CHAT_TARGET_REQUIREMENTS (multi-turn + editable history capabilities, possibly via normalization-pipeline adaptation). |
| system_prompt_format_string | str | System prompt template with placeholders for objective, task (alias of objective), prompt, and message_piece. |
| prompt_format_string | `str \| None` | |
| category | `str \| None` | |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | TrueFalseAggregatorFunc | Aggregator for combining scores. Defaults to TrueFalseScoreAggregator.OR. |
| score_value_output_key | str | JSON key for the score value. Defaults to 'score_value'. |
| rationale_output_key | str | JSON key for the rationale. Defaults to 'rationale'. |
| description_output_key | str | JSON key for the description. Defaults to 'description'. |
| metadata_output_key | str | JSON key for the metadata. Defaults to 'metadata'. |
| category_output_key | str | JSON key for the category. Defaults to 'category'. |
| response_json_schema | `JsonSchemaDefinition \| None` | |

SelfAskLikertScorer

Bases: FloatScaleScorer

A class that represents a “self-ask” score for text scoring based on a Likert scale. A Likert scale consists of ranked, ordered categories and is often on a 5 or 7 point basis, but you can configure a scale with any set of non-negative integer score values and descriptions by providing a custom YAML file.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target to use for scoring. |
| likert_scale | `LikertScalePaths \| None` | |
| custom_likert_path | `Path \| None` | |
| custom_system_prompt_path | `Path \| None` | |
| validator | `ScorerPromptValidator \| None` | |

SelfAskQuestionAnswerScorer

Bases: SelfAskTrueFalseScorer

A class that represents a self-ask question answering scorer.

Usually, QuestionAnswerScorer should be used, but this class is useful when the choices are not sent to the objective target or when you need more flexibility in determining whether the questions were answered correctly.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target to use for the scorer. Must satisfy CHAT_TARGET_REQUIREMENTS (multi-turn + editable history capabilities, possibly via normalization-pipeline adaptation). |
| true_false_question_path | `pathlib.Path \| None` | |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

SelfAskRefusalScorer

Bases: TrueFalseScorer

A self-ask scorer that detects refusal in AI responses.

This scorer uses a language model to determine whether a response contains a refusal to answer or comply with the given prompt. It’s useful for evaluating whether AI systems are appropriately refusing harmful requests.

The scorer supports four modes via RefusalScorerPaths:

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target to use for the scorer. Must satisfy CHAT_TARGET_REQUIREMENTS (multi-turn + editable history capabilities, possibly via normalization-pipeline adaptation). |
| refusal_system_prompt_path | `RefusalScorerPaths \| Path \| …` | |
| prompt_format_string | `str \| None` | |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

SelfAskScaleScorer

Bases: FloatScaleScorer

A class that represents a “self-ask” score for text scoring for a customizable numeric scale.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target to use for scoring. |
| scale_arguments_path | `Path \| str \| …` | |
| system_prompt_path | `Path \| str \| …` | |
| validator | `ScorerPromptValidator \| None` | |

SelfAskTrueFalseScorer

Bases: TrueFalseScorer

A class that represents a self-ask true/false scorer.

Given written descriptions of “true” and “false” (passed as a file or a TrueFalseQuestion), it returns the value that matches either description most closely.

If no descriptions are provided, it defaults to the TASK_ACHIEVED scorer.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| chat_target | PromptTarget | The chat target to use for the scorer. Must satisfy CHAT_TARGET_REQUIREMENTS (multi-turn + editable history capabilities, possibly via normalization-pipeline adaptation). |
| true_false_question_path | `str \| Path \| …` | |
| true_false_question | `TrueFalseQuestion \| None` | |
| true_false_system_prompt_path | `str \| Path \| …` | |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

ShellCommandOutputScorer

Bases: RegexScorer

A scorer that detects dangerous shell payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): an LLM can be prompted to suggest, or directly emit, shell commands that a downstream agent or human operator might paste into a terminal. This scorer flags the high-blast-radius payload families — remote pipe-to-shell, destructive filesystem commands, reverse shells, and environment-variable exfiltration — without an LLM call.

Returns True if any pattern matches.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| patterns | `dict[str, str] \| None` | |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |
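
As a rough illustration of regex-based payload detection, the sketch below flags three of the payload families named above. The pattern names and expressions in `DEMO_SHELL_PATTERNS` are hypothetical and deliberately simple; they are not the patterns PyRIT ships, and a production rule set would need to be far more thorough.

```python
import re

# Hypothetical name -> regex mapping, mirroring the dict[str, str] shape of `patterns`.
DEMO_SHELL_PATTERNS = {
    # curl/wget output piped straight into a shell
    "pipe_to_shell": r"\b(?:curl|wget)\b[^\n|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b",
    # rm with both -r and -f flags aimed at an absolute path
    "recursive_delete": r"\brm\s+-(?:[a-zA-Z]*r[a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*r)[a-zA-Z]*\s+/",
    # classic bash reverse shell via /dev/tcp
    "reverse_shell": r"\bbash\s+-i\s+>&\s*/dev/tcp/",
}


def find_shell_payloads(text: str) -> list:
    """Return the names of all demo patterns that match the text."""
    return [name for name, pattern in DEMO_SHELL_PATTERNS.items() if re.search(pattern, text)]


def is_dangerous(text: str) -> bool:
    # True if any pattern matches, as the scorer description states
    return bool(find_shell_payloads(text))
```

Returning the matched pattern names, rather than only a boolean, makes it easier to explain why a response was flagged.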

StaticPromptInjectionScorer

Bases: RegexScorer

A scorer that detects prompt injection attempts in text using static regex patterns.

Covers OWASP LLM01 (Prompt Injection) sub-categories: instruction override, system prompt extraction, jailbreak role-play, constraint removal, chat template injection, and encoding-based evasion. Complements the API-based PromptShieldScorer as a fast, local, zero-dependency pre-filter.

Returns True if any prompt injection pattern is found in the text.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| patterns | `dict[str, str] \| None` | |
| categories | `list[str] \| None` | |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |

SubStringScorer

Bases: TrueFalseScorer

Scorer that checks if a given substring is present in the text.

This scorer performs substring matching using a configurable text matching strategy. Supports both exact substring matching and approximate matching.
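
The difference between exact and approximate matching can be sketched with the standard library. The sliding-window `difflib` comparison below is one assumed way to do approximate matching; it is not necessarily the strategy PyRIT's text matchers use, and `contains_exact`, `contains_approx`, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def contains_exact(text: str, substring: str) -> bool:
    """Plain substring test."""
    return substring in text


def contains_approx(text: str, substring: str, threshold: float = 0.8) -> bool:
    """Slide a window of len(substring) over text and compare similarity ratios."""
    if not substring:
        return True
    n = len(substring)
    last_start = max(0, len(text) - n)
    for start in range(last_start + 1):
        window = text[start:start + n]
        if SequenceMatcher(None, window, substring).ratio() >= threshold:
            return True
    return False


typo_text = "The pasword is hunter2"
```

With these helpers, the misspelled "pasword" fails the exact check but passes the approximate one, while unrelated text fails both.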

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| substring | str | The substring to search for in the text. |
| text_matcher | `TextMatching \| None` | |
| categories | `list[str] \| None` | |
| aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |
| validator | `ScorerPromptValidator \| None` | |

TrueFalseAggregatorFunc

TrueFalseCompositeScorer

Bases: TrueFalseScorer

Composite true/false scorer that aggregates results from other true/false scorers.

This scorer invokes a collection of constituent TrueFalseScorer instances and reduces their single-score outputs into one final true/false score using the supplied aggregation function (e.g., TrueFalseScoreAggregator.AND, TrueFalseScoreAggregator.OR, TrueFalseScoreAggregator.MAJORITY).

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| aggregator | TrueFalseAggregatorFunc | Aggregation function to combine child scores (e.g., TrueFalseScoreAggregator.AND, TrueFalseScoreAggregator.OR, TrueFalseScoreAggregator.MAJORITY). |
| scorers | list[TrueFalseScorer] | The constituent true/false scorers to invoke. |
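
The reduction step can be illustrated with plain booleans. In this sketch the `demo_*` functions and `composite_result` are hypothetical stand-ins for the aggregators and the composite scorer; treating a tie as False under majority voting is an assumption, not documented behavior.

```python
from typing import Callable, List


def demo_and(values: List[bool]) -> bool:
    return all(values)


def demo_or(values: List[bool]) -> bool:
    return any(values)


def demo_majority(values: List[bool]) -> bool:
    # Assumption: strictly more than half must be True; a tie counts as False.
    return sum(values) > len(values) / 2


def composite_result(child_results: List[bool], aggregator: Callable[[List[bool]], bool]) -> bool:
    """Reduce one true/false result per child scorer into a single value."""
    if not child_results:
        raise ValueError("at least one child result is required")
    return aggregator(child_results)


children = [True, False, True]
```

The same child results give different outcomes depending on the aggregator: AND is the strictest, OR the most permissive, and majority sits between them.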

Methods:

get_chat_target

get_chat_target() → Optional[PromptTarget]

Return the chat target from the first sub-scorer that has one.

TrueFalseInverterScorer

Bases: TrueFalseScorer

A scorer that inverts a true/false score.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| scorer | TrueFalseScorer | The underlying true/false scorer whose results will be inverted. |
| validator | `ScorerPromptValidator \| None` | |

Methods:

get_chat_target

get_chat_target() → Optional[PromptTarget]

Delegate to the wrapped scorer.

Returns:

TrueFalseQuestion

A class that represents a true/false question.

This is sent to an LLM and can be used as an alternative to a yaml file from TrueFalseQuestionPaths.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| true_description | str | Description of what constitutes a “true” response. |
| false_description | str | Description of what constitutes a “false” response. If not provided (the default, ''), a generic description is used. |
| category | str | The category of the question. Defaults to ''. |
| metadata | str | Additional metadata for context. Defaults to ''. |

Methods:

get

get(key: str, default: Any = None) → Any

Return the value of the specified key, or default if absent.

TrueFalseQuestionPaths

Bases: enum.Enum

Paths to true/false question YAML files.

TrueFalseScoreAggregator

Namespace for true/false score aggregators that return a single aggregated score.

All aggregators return a list containing one ScoreAggregatorResult that combines all input scores together, preserving all categories.

TrueFalseScorer

Bases: Scorer

Base class for scorers that return true/false binary scores.

This scorer evaluates prompt responses and returns a single boolean score indicating whether the response meets a specific criterion. Multiple pieces in a request response are aggregated using a TrueFalseAggregatorFunc function (default: TrueFalseScoreAggregator.OR).

Default error / blocked behavior

When no supported pieces remain after validator filtering (e.g. the response is blocked, has another error type, or no piece matches the scorer’s supported data types), the base score_async invokes _build_fallback_score and returns a single Score(False) whose rationale distinguishes blocked / error / filtered cases. This mirrors FloatScaleScorer’s 0.0 default so that downstream consumers (attack strategies, threshold wrappers) get a consistent, “attack did not succeed” value without each call site needing special-cased error handling. Subclasses that need different semantics (e.g. SelfAskRefusalScorer, which returns True on blocked) should override _score_piece_async and accept the error data type in their validator.
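
The fallback path can be pictured with a small stand-in. `DemoPiece` and `score_pieces_or` below are hypothetical, and the keyword test substitutes for a real per-piece scorer; the sketch only shows the shape of "filter, then fall back to False when nothing is left, otherwise OR the piece results".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DemoPiece:
    """Hypothetical stand-in for a message piece."""
    data_type: str  # e.g. "text", "image_path", "error"
    value: str


def score_pieces_or(
    pieces: List[DemoPiece],
    supported_types: Tuple[str, ...] = ("text",),
    keyword: str = "secret",
) -> Tuple[bool, str]:
    """Return (score, rationale) for a response made of several pieces."""
    supported = [p for p in pieces if p.data_type in supported_types]
    if not supported:
        # Fallback: nothing scorable remains, so report False with a rationale
        return False, "no supported pieces after filtering"
    hits = [keyword in p.value for p in supported]
    # OR aggregation: one matching piece is enough
    return any(hits), f"{sum(hits)} of {len(hits)} pieces matched"


blocked_result = score_pieces_or([DemoPiece("error", "blocked")])
normal_result = score_pieces_or([DemoPiece("text", "the secret is 42"), DemoPiece("text", "hello")])
```

Callers receive a definite False for blocked or filtered responses instead of having to catch an exception at every call site.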

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| validator | ScorerPromptValidator | Custom validator. |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |
| chat_target | `PromptTarget \| None` | |

Methods:

get_scorer_metrics

get_scorer_metrics() → Optional[ObjectiveScorerMetrics]

Get evaluation metrics for this scorer from the configured evaluation result file.

Returns:

validate_return_scores

validate_return_scores(scores: list[Score]) → None

Validate the scores returned by the scorer.

| Parameter | Type | Description |
| --- | --- | --- |
| scores | list[Score] | The scores to be validated. |

Raises:

VideoFloatScaleScorer

Bases: FloatScaleScorer

A scorer that processes videos by extracting frames and scoring them using a float scale image scorer.

The VideoFloatScaleScorer breaks down a video into frames and uses a float scale scoring mechanism. Frame scores are aggregated using a FloatScaleAggregatorFunc.

By default, uses FloatScaleScorerByCategory.MAX which groups scores by category (useful for scorers like AzureContentFilterScorer that return multiple scores per frame). This returns one aggregated score per category (e.g., one for “Hate”, one for “Violence”, etc.).

For scorers that return a single score per frame, or to combine all categories together, use FloatScaleScoreAggregator.MAX, FloatScaleScorerAllCategories.MAX, etc.
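
The contrast between per-category and all-category aggregation can be shown on plain tuples. `max_by_category` and `max_all_categories` are hypothetical helpers that mimic the two behaviors described above with MAX; they are not the PyRIT aggregators themselves.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each entry is (category, score) for one frame; values are made up for the demo.
frame_scores: List[Tuple[str, float]] = [
    ("Hate", 0.1),
    ("Violence", 0.7),
    ("Hate", 0.4),
    ("Violence", 0.2),
]


def max_by_category(scores: List[Tuple[str, float]]) -> Dict[str, float]:
    """One aggregated score per category (grouped MAX)."""
    grouped = defaultdict(list)
    for category, value in scores:
        grouped[category].append(value)
    return {category: max(values) for category, values in grouped.items()}


def max_all_categories(scores: List[Tuple[str, float]]) -> float:
    """A single score across every category (ungrouped MAX)."""
    return max(value for _, value in scores)
```

Grouping keeps a separate worst-case value for each category, whereas the ungrouped form collapses everything into one number.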

Optionally, an audio_scorer can be provided to also score the video’s audio track. When provided, the audio is extracted, transcribed, and scored. The audio scores are included in the aggregation.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| image_capable_scorer | FloatScaleScorer | A FloatScaleScorer capable of processing images. |
| audio_scorer | `FloatScaleScorer \| None` | |
| num_sampled_frames | `int \| None` | |
| validator | `ScorerPromptValidator \| None` | |
| score_aggregator | FloatScaleAggregatorFunc | Aggregator for combining frame scores. Defaults to FloatScaleScorerByCategory.MAX. Use FloatScaleScorerByCategory.MAX/AVERAGE/MIN for scorers that return multiple scores per frame (groups by category and returns one score per category). Use FloatScaleScorerAllCategories.MAX/AVERAGE/MIN to combine all scores regardless of category (returns single score with all categories combined). Use FloatScaleScoreAggregator.MAX/AVERAGE/MIN for simple aggregation preserving all categories (returns single score with all categories preserved). |
| image_objective_template | `str \| None` | |
| audio_objective_template | `str \| None` | |

VideoTrueFalseScorer

Bases: TrueFalseScorer

A scorer that processes videos by extracting frames and scoring them using a true/false image scorer.

Aggregation Logic (hard-coded):

- Frame scores are aggregated using OR: if ANY frame meets the objective, the visual score is True.
- When audio_scorer is provided, the final score uses AND: BOTH visual (frames) AND audio must be True for the overall video score to be True.
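
The aggregation rule above reduces to a few lines of boolean logic. `video_true_false` is a hypothetical helper written only to illustrate the rule; frame and audio results are passed in as plain booleans.

```python
from typing import List, Optional


def video_true_false(frame_results: List[bool], audio_result: Optional[bool] = None) -> bool:
    """Combine per-frame results (OR) with an optional audio result (AND)."""
    visual = any(frame_results)  # any frame meeting the objective is enough
    if audio_result is None:
        return visual            # no audio scorer: visual result stands alone
    return visual and audio_result
```

Note that adding an audio scorer can only make the overall result stricter: a video whose frames score True still fails if the audio scores False.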

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| image_capable_scorer | TrueFalseScorer | A TrueFalseScorer capable of processing images. |
| audio_scorer | `TrueFalseScorer \| None` | |
| num_sampled_frames | `int \| None` | |
| validator | `ScorerPromptValidator \| None` | |
| image_objective_template | `str \| None` | |
| audio_objective_template | `str \| None` | |

XSSOutputScorer

Bases: RegexScorer

A scorer that detects cross-site scripting (XSS) payloads in LLM responses.

Maps to OWASP LLM02 (Insecure Output Handling): a model can be coaxed into emitting HTML/JS that an unwary downstream consumer (web view, markdown renderer, chat UI) will execute. This scorer flags the common payload families without requiring an LLM call, so it is cheap enough for batch evaluation and CI gates.

Returns True if any pattern matches.

Constructor Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| patterns | `dict[str, str] \| None` | |
| score_aggregator | TrueFalseAggregatorFunc | The aggregator function to use. Defaults to TrueFalseScoreAggregator.OR. |