Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

pyrit.message_normalizer

Functionality to normalize messages into compatible formats for targets.

ChatMessageNormalizer

Bases: MessageListNormalizer[ChatMessage], MessageStringNormalizer

Normalizer that converts a list of Messages to a list of ChatMessages.

This normalizer handles both single-part and multipart messages:

Constructor Parameters:

ParameterTypeDescription
use_developer_roleboolIf True, translates “system” role to “developer” role. Defaults to False.
system_message_behaviorSystemMessageBehaviorHow to handle system messages. Defaults to “keep”. Defaults to 'keep'.

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[ChatMessage]

Convert a list of Messages to a list of ChatMessages.

For single-piece text messages, content is a string. For multi-piece or non-text messages, content is a list of content dicts.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

Raises:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Convert a list of Messages to a JSON string representation.

This serializes the list of ChatMessages to JSON format.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

ConversationContextNormalizer

Bases: MessageStringNormalizer

Normalizer that formats conversation history as turn-based text.

This is the standard format used by attacks like Crescendo and TAP for including conversation context in adversarial chat prompts. The output format is:

Turn 1:
user: <content>
assistant: <content>

Turn 2:
user: <content>
...

Methods:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Normalize a list of messages into a turn-based context string.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

Raises:

GenericSystemSquashNormalizer

Bases: MessageListNormalizer[Message]

Normalizer that combines the first system message with the first user message using generic instruction tags.

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[Message]

Return messages with the first system message combined into the first user message.

The format uses generic instruction tags:

Instructions

{system_content}

{user_content}

ParameterTypeDescription
messageslist[Message]The list of messages to normalize.

Returns:

Raises:

HistorySquashNormalizer

Bases: MessageListNormalizer[Message]

Squashes a multi-turn conversation into a single user message.

Previous turns are formatted as labeled context and prepended to the latest message. Used by the normalization pipeline to adapt prompts for targets that do not support multi-turn conversations.

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[Message]

Combine all messages into a single user message.

When there is only one message it is returned unchanged. Otherwise all prior turns are formatted as Role: content lines under a [Conversation History] header and the last message’s content appears under a [Current Message] header.

ParameterTypeDescription
messageslist[Message]The conversation messages to squash.

Returns:

Raises:

JsonSchemaNormalizer

Bases: MessageListNormalizer[Message]

Adapts JSON-schema metadata for targets that cannot enforce it natively.

The conversation normalization pipeline invokes this normalizer when the JSON_SCHEMA capability is not natively supported by the prompt target. For every message piece carrying a schema in JSON_SCHEMA_METADATA_KEY:

The original schema metadata key is removed in both cases so downstream consumers do not attempt to enforce a schema the target cannot honor.

Callers that need a different prose wrapper around the schema (for example, to match a domain-specific style guide or to suppress a particular phrasing) can pass a custom schema_instructions_template to __init__. The template must contain a {schema_json} placeholder where the pretty-printed schema body is substituted.

Constructor Parameters:

ParameterTypeDescription
schema_instructions_templatestrA str.format-style template appended to text pieces. Must contain a {schema_json} placeholder, which is replaced with the pretty-printed JSON schema body. Defaults to DEFAULT_SCHEMA_INSTRUCTIONS_TEMPLATE. Defaults to DEFAULT_SCHEMA_INSTRUCTIONS_TEMPLATE.

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[Message]

Return messages adapted for a target that does not support JSON schemas.

New pieces and messages are constructed so the input (and any persisted metadata) is never mutated in place. Pieces without the schema key are copied through unchanged.

ParameterTypeDescription
messageslist[Message]The conversation messages to adapt.

Returns:

MessageListNormalizer

Bases: abc.ABC, Generic[T]

Abstract base class for normalizers that return a list of items.

Subclasses specify the type T (e.g., Message, ChatMessage) that the list contains. T must implement the DictConvertible protocol (have a to_dict() method).

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[T]

Normalize the list of messages into a list of items.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

normalize_to_dicts_async

normalize_to_dicts_async(messages: list[Message]) → list[dict[str, Any]]

Normalize the list of messages into a list of dictionaries.

This method uses normalize_async and calls to_dict() on each item.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

MessageStringNormalizer

Bases: abc.ABC

Abstract base class for normalizers that return a string representation.

Use this for formatting messages into text for non-chat targets or context strings.

Methods:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Normalize the list of messages into a string representation.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

TokenizerTemplateNormalizer

Bases: MessageStringNormalizer

Enable application of the chat template stored in a Hugging Face tokenizer to a list of messages. For more details, see https://huggingface.co/docs/transformers/main/en/chat_templating.

Constructor Parameters:

ParameterTypeDescription
tokenizerPreTrainedTokenizerBaseA Hugging Face tokenizer with a chat template.
system_message_behaviorTokenizerSystemBehaviorHow to handle system messages. Options: - “keep”: Keep system messages as-is (default) - “squash”: Merge system into first user message - “ignore”: Drop system messages entirely - “developer”: Change system role to developer role Defaults to 'keep'.

Methods:

from_model

from_model(model_name_or_alias: str, token: str | None = None, system_message_behavior: TokenizerSystemBehavior | None = None) → TokenizerTemplateNormalizer

Create a normalizer from a model name or alias.

This factory method simplifies creating a normalizer by handling tokenizer loading automatically. Use aliases for common models or provide a full HuggingFace model path.

ParameterTypeDescription
model_name_or_aliasstrEither a full HuggingFace model name or an alias (e.g., ‘chatml’, ‘phi3’, ‘llama3’). See MODEL_ALIASES for available aliases.
token`strNone`
system_message_behavior`TokenizerSystemBehaviorNone`

Returns:

Raises:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Apply the chat template stored in the tokenizer to a list of messages.

Handles system messages based on the configured system_message_behavior:

ParameterTypeDescription
messageslist[Message]A list of Message objects.

Returns: