Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

pyrit.message_normalizer

Functionality to normalize messages into compatible formats for targets.

ChatMessageNormalizer

Bases: MessageListNormalizer[ChatMessage], MessageStringNormalizer

Normalizer that converts a list of Messages to a list of ChatMessages.

This normalizer handles both single-part and multipart messages:

Constructor Parameters:

ParameterTypeDescription
use_developer_roleboolIf True, translates “system” role to “developer” role. Defaults to False.
system_message_behaviorSystemMessageBehaviorHow to handle system messages. Defaults to “keep”. Defaults to 'keep'.

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[ChatMessage]

Convert a list of Messages to a list of ChatMessages.

For single-piece text messages, content is a string. For multi-piece or non-text messages, content is a list of content dicts.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

Raises:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Convert a list of Messages to a JSON string representation.

This serializes the list of ChatMessages to JSON format.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

ConversationContextNormalizer

Bases: MessageStringNormalizer

Normalizer that formats conversation history as turn-based text.

This is the standard format used by attacks like Crescendo and TAP for including conversation context in adversarial chat prompts. The output format is:

Turn 1:
User: <content>
Assistant: <content>

Turn 2:
User: <content>
...

Methods:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Normalize a list of messages into a turn-based context string.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

Raises:

GenericSystemSquashNormalizer

Bases: MessageListNormalizer[Message]

Normalizer that combines the first system message with the first user message using generic instruction tags.

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[Message]

Return messages with the first system message combined into the first user message.

The format uses generic instruction tags:

Instructions

{system_content}

{user_content}

ParameterTypeDescription
messageslist[Message]The list of messages to normalize.

Returns:

Raises:

HistorySquashNormalizer

Bases: MessageListNormalizer[Message]

Squashes a multi-turn conversation into a single user message.

Previous turns are formatted as labeled context and prepended to the latest message. Used by the normalization pipeline to adapt prompts for targets that do not support multi-turn conversations.

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[Message]

Combine all messages into a single user message.

When there is only one message it is returned unchanged. Otherwise all prior turns are formatted as Role: content lines under a [Conversation History] header and the last message’s content appears under a [Current Message] header.

ParameterTypeDescription
messageslist[Message]The conversation messages to squash.

Returns:

Raises:

MessageListNormalizer

Bases: abc.ABC, Generic[T]

Abstract base class for normalizers that return a list of items.

Subclasses specify the type T (e.g., Message, ChatMessage) that the list contains. T must implement the DictConvertible protocol (have a to_dict() method).

Methods:

normalize_async

normalize_async(messages: list[Message]) → list[T]

Normalize the list of messages into a list of items.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

normalize_to_dicts_async

normalize_to_dicts_async(messages: list[Message]) → list[dict[str, Any]]

Normalize the list of messages into a list of dictionaries.

This method uses normalize_async and calls to_dict() on each item.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

MessageStringNormalizer

Bases: abc.ABC

Abstract base class for normalizers that return a string representation.

Use this for formatting messages into text for non-chat targets or context strings.

Methods:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Normalize the list of messages into a string representation.

ParameterTypeDescription
messageslist[Message]The list of Message objects to normalize.

Returns:

TokenizerTemplateNormalizer

Bases: MessageStringNormalizer

Enable application of the chat template stored in a Hugging Face tokenizer to a list of messages. For more details, see https://huggingface.co/docs/transformers/main/en/chat_templating.

Constructor Parameters:

ParameterTypeDescription
tokenizerPreTrainedTokenizerBaseA Hugging Face tokenizer with a chat template.
system_message_behaviorTokenizerSystemBehaviorHow to handle system messages. Options: - “keep”: Keep system messages as-is (default) - “squash”: Merge system into first user message - “ignore”: Drop system messages entirely - “developer”: Change system role to developer role Defaults to 'keep'.

Methods:

from_model

from_model(model_name_or_alias: str, token: Optional[str] = None, system_message_behavior: Optional[TokenizerSystemBehavior] = None) → TokenizerTemplateNormalizer

Create a normalizer from a model name or alias.

This factory method simplifies creating a normalizer by handling tokenizer loading automatically. Use aliases for common models or provide a full HuggingFace model path.

ParameterTypeDescription
model_name_or_aliasstrEither a full HuggingFace model name or an alias (e.g., ‘chatml’, ‘phi3’, ‘llama3’). See MODEL_ALIASES for available aliases.
tokenOptional[str]Optional HuggingFace token for gated models. If not provided, falls back to HUGGINGFACE_TOKEN environment variable. Defaults to None.
system_message_behaviorOptional[TokenizerSystemBehavior]Override how to handle system messages. If not provided, uses the model’s default config. Defaults to None.

Returns:

Raises:

normalize_string_async

normalize_string_async(messages: list[Message]) → str

Apply the chat template stored in the tokenizer to a list of messages.

Handles system messages based on the configured system_message_behavior:

ParameterTypeDescription
messageslist[Message]A list of Message objects.

Returns: