
pyrit.memory

Provide functionality for storing and retrieving conversation history and embeddings.

This package defines the core MemoryInterface and concrete implementations for different storage backends.

Functions

data_serializer_factory

data_serializer_factory(data_type: PromptDataType, value: str | None = None, extension: str | None = None, category: AllowedCategories) → DataTypeSerializer

Create a DataTypeSerializer instance.

| Parameter | Type | Description |
|---|---|---|
| data_type | str | The type of the data (e.g., ‘text’, ‘image_path’, ‘audio_path’). |
| value | str | The data value to be serialized. Defaults to None. |
| extension | Optional[str] | The file extension, if applicable. Defaults to None. |
| category | AllowedCategories | The category or context for the data (e.g., ‘seed-prompt-entries’). |

Returns:

Raises:

set_message_piece_sha256_async

set_message_piece_sha256_async(message_piece: MessagePiece) → None

Compute and assign SHA256 hash values for a message piece’s original and converted payloads.

Async because blob payloads may need to be fetched. Must be called explicitly after the message piece is constructed and its values are finalized.

| Parameter | Type | Description |
|---|---|---|
| message_piece | MessagePiece | The message piece to populate with SHA256 values. |
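The hashing step can be pictured with a small standalone sketch. It does not use PyRIT itself: `PieceSketch` and `_load_bytes` are hypothetical stand-ins, and the assumption that both payloads are hashed as UTF-8 bytes is made only for illustration.

```python
import asyncio
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class PieceSketch:
    """Hypothetical stand-in for a message piece; not PyRIT's MessagePiece."""

    original_value: str
    converted_value: str
    original_value_sha256: Optional[str] = None
    converted_value_sha256: Optional[str] = None


async def _load_bytes(value: str) -> bytes:
    # Placeholder for a payload fetch. A blob-backed payload would be read
    # from storage here, which is why the whole operation is async.
    await asyncio.sleep(0)
    return value.encode("utf-8")


async def set_sha256_sketch(piece: PieceSketch) -> None:
    """Hash the original and converted payloads and store both digests."""
    original = await _load_bytes(piece.original_value)
    converted = await _load_bytes(piece.converted_value)
    piece.original_value_sha256 = hashlib.sha256(original).hexdigest()
    piece.converted_value_sha256 = hashlib.sha256(converted).hexdigest()


piece = PieceSketch(original_value="hello", converted_value="HELLO")
# Called explicitly, once the values are final.
asyncio.run(set_sha256_sketch(piece))
```

Because the digests are computed on demand rather than in a constructor, changing a value afterwards would require calling the function again.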

set_seed_sha256_async

set_seed_sha256_async(seed: Seed) → None

Compute and assign the SHA256 hash value for a seed’s value.

Should be called after the seed value is serialized to text, as file paths used in the value may have changed from local to memory storage paths. Async due to blob retrieval.

| Parameter | Type | Description |
|---|---|---|
| seed | Seed | The seed to populate with its SHA256 value. |

AttackResultEntry

Bases: Base

Represents the attack result data in the database.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| entry | AttackResult | The attack result object to convert into a database entry. |

Methods:

filter_json_serializable_metadata

filter_json_serializable_metadata(metadata: dict[str, Any]) → dict[str, Any]

Filter a dictionary to only include JSON-serializable values.

This function iterates through the metadata dictionary and keeps only values that can be serialized to JSON, discarding any non-serializable objects.

| Parameter | Type | Description |
|---|---|---|
| metadata | dict[str, Any] | Dictionary with potentially non-serializable values. |
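A minimal sketch of this kind of filter is shown here. It is an illustration of the described behaviour, not the library's source; the function name `filter_json_serializable` and the choice to probe each value with `json.dumps` are assumptions.

```python
import json
from typing import Any


def filter_json_serializable(metadata: dict[str, Any]) -> dict[str, Any]:
    """Keep only the entries whose values json.dumps can encode."""
    kept: dict[str, Any] = {}
    for key, value in metadata.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            # TypeError: unsupported type; ValueError: e.g. circular reference.
            continue
        kept[key] = value
    return kept


example = {"turns": 3, "tags": ["a", "b"], "handle": object(), "blob": b"\x00"}
filtered = filter_json_serializable(example)
```

In this sketch the `handle` and `blob` entries are dropped because neither an arbitrary object nor `bytes` can be encoded by the standard `json` module.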

Returns:

get_attack_result

get_attack_result() → AttackResult

Convert this database entry back into an AttackResult object.

Returns:

AudioPathDataTypeSerializer

Bases: DataTypeSerializer

Serializer for audio path values stored on disk.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| category | str | Data category folder name. |
| prompt_text | Optional[str] | Optional existing audio path. Defaults to None. |
| extension | Optional[str] | Optional audio extension. Defaults to None. |

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether this serializer persists data on disk.

Returns:

AzureBlobStorageIO

Bases: StorageIO

Implementation of StorageIO for Azure Blob Storage.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| container_url | Optional[str] | Azure Blob container URL. Defaults to None. |
| sas_token | Optional[str] | Optional SAS token. Defaults to None. |
| blob_content_type | SupportedContentType | Blob content type for uploads. Defaults to SupportedContentType.PLAIN_TEXT. |

Methods:

create_directory_if_not_exists_async

create_directory_if_not_exists_async(directory_path: Path | str) → None

Log a no-op directory creation for Azure Blob Storage.

| Parameter | Type | Description |
|---|---|---|
| directory_path | Union[Path, str] | Requested directory path. |

is_file_async

is_file_async(path: Path | str) → bool

Check whether the path refers to a file (blob) in Azure Blob Storage.

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | Blob URL or path to test. |

Returns:

parse_blob_url

parse_blob_url(file_path: str) → tuple[str, str]

Parse a blob URL to extract the container and blob name.

| Parameter | Type | Description |
|---|---|---|
| file_path | str | Full blob URL. |
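One way to picture the split is the sketch below, which uses only the standard library. It assumes the container is the first path segment of the URL and the blob name is the rest; the function name and the use of `ValueError` for malformed input are illustrative assumptions, not the documented behaviour.

```python
from urllib.parse import urlparse


def parse_blob_url_sketch(file_path: str) -> tuple[str, str]:
    """Split a full blob URL into (container, blob_name)."""
    parsed = urlparse(file_path)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Not a full blob URL: {file_path!r}")
    # First path segment is taken as the container, the remainder as the blob.
    container, _, blob_name = parsed.path.lstrip("/").partition("/")
    if not container or not blob_name:
        raise ValueError(f"URL has no container/blob path: {file_path!r}")
    return container, blob_name


container, blob_name = parse_blob_url_sketch(
    "https://account.blob.core.windows.net/container/dir1/dir2/sample.png"
)
```

With the example URL, the sketch yields the container `container` and the blob name `dir1/dir2/sample.png`, while a bare relative path is rejected.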

Returns:

Raises:

path_exists_async

path_exists_async(path: Path | str) → bool

Check whether a given path exists in the Azure Blob Storage container.

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | Blob URL or path to test. |

Returns:

read_file_async

read_file_async(path: Path | str) → bytes

Asynchronously reads the content of a file (blob) from Azure Blob Storage.

If the provided path is a full URL (e.g., https://account.blob.core.windows.net/container/dir1/dir2/sample.png), it extracts the relative blob path (e.g., dir1/dir2/sample.png) to correctly access the blob. If a relative path is provided, it will use it as-is.

| Parameter | Type | Description |
|---|---|---|
| path | str | The path to the file (blob) in Azure Blob Storage. This can be either a full URL or a relative path. |

Returns:

write_file_async

write_file_async(path: Path | str, data: bytes) → None

Write data to Azure Blob Storage at the specified path.

If the provided path is a full URL, the blob name is extracted from it. If a relative path is provided, it is used as the blob name directly.

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | Full blob URL or relative blob path. |
| data | bytes | The data to write. |

AzureSQLMemory

Bases: MemoryInterface

A class to manage conversation memory using Azure SQL Server as the backend database. It leverages SQLAlchemy Base models for creating tables and provides CRUD operations to interact with the tables.

This class encapsulates the setup of the database connection, table creation based on SQLAlchemy models, and session management to perform database operations.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| connection_string | Optional[str] | |
| results_container_url | Optional[str] | |
| results_sas_token | Optional[str] | |
| verbose | bool | Whether to enable verbose logging for the database engine. Defaults to False. |
| skip_schema_migration | bool | Whether to skip schema migration. Defaults to False. |
| silent | bool | If True, suppresses schema migration console output. Defaults to False. |

Methods:

dispose_engine

dispose_engine() → None

Dispose the engine and clean up resources.

get_all_embeddings

get_all_embeddings() → Sequence[EmbeddingDataEntry]

Fetch all entries from the specified table and return them as model instances.

Returns:

get_conversation_stats

get_conversation_stats(conversation_ids: Sequence[str]) → dict[str, ConversationStats]

Azure SQL implementation: lightweight aggregate stats per conversation.

Executes a single SQL query that returns message count (distinct sequences), a truncated last-message preview, the first non-empty labels dict, and the earliest timestamp for each conversation_id.

| Parameter | Type | Description |
|---|---|---|
| conversation_ids | Sequence[str] | The conversation IDs to query. |

Returns:

get_session

get_session() → Session

Provide a session for database operations.

Returns:

get_unique_attack_class_names

get_unique_attack_class_names() → list[str]

Azure SQL implementation: extract unique class_name values from the atomic_attack_identifier JSON column.

Returns:

get_unique_converter_class_names

get_unique_converter_class_names() → list[str]

Azure SQL implementation: extract unique converter class_name values from the children.attack_technique.children.attack.children.request_converters array in the atomic_attack_identifier JSON column.

Returns:

BinaryPathDataTypeSerializer

Bases: DataTypeSerializer

Serializer for generic binary path values stored on disk.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| category | str | The category or context for the data. |
| prompt_text | Optional[str] | The binary file path or identifier. Defaults to None. |
| extension | Optional[str] | The file extension; ‘bin’ is used when none is given. Defaults to None. |

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether this serializer persists data on disk.

Returns:

CentralMemory

Provide a centralized memory instance across the framework. The provided memory instance will be reused for future calls.

Methods:

get_memory_instance

get_memory_instance() → MemoryInterface

Return a centralized memory instance.

Returns:

Raises:

set_memory_instance

set_memory_instance(passed_memory: MemoryInterface) → None

Set a provided memory instance as the central instance for subsequent calls.

| Parameter | Type | Description |
|---|---|---|
| passed_memory | MemoryInterface | The memory instance to set as the central instance. |
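The set-once, reuse-everywhere pattern can be sketched with a class-level attribute. The classes below are hypothetical stand-ins rather than PyRIT's own, and raising `ValueError` when nothing has been set is an assumption made for the illustration.

```python
from typing import Optional


class MemorySketch:
    """Hypothetical stand-in for a MemoryInterface implementation."""


class CentralMemorySketch:
    """Holds one shared memory instance at class level."""

    _instance: Optional[MemorySketch] = None

    @classmethod
    def set_memory_instance(cls, passed_memory: MemorySketch) -> None:
        # Later calls to get_memory_instance return this same object.
        cls._instance = passed_memory

    @classmethod
    def get_memory_instance(cls) -> MemorySketch:
        if cls._instance is None:
            # Assumed failure mode for this sketch only.
            raise ValueError("No central memory instance has been set.")
        return cls._instance


memory = MemorySketch()
CentralMemorySketch.set_memory_instance(memory)
shared = CentralMemorySketch.get_memory_instance()
```

Keeping the instance on the class means any component can reach the same memory without it being passed through every call.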

DataTypeSerializer

Bases: abc.ABC

Abstract base class for data type normalizers.

Responsible for reading and saving multi-modal data types to local disk or to an Azure Storage account.

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether the data is stored on disk.

Returns:

get_data_filename

get_data_filename(file_name: str | None = None) → Path | str

Generate or retrieve a unique filename for the data file (deprecated alias of get_data_filename_async).

| Parameter | Type | Description |
|---|---|---|
| file_name | Optional[str] | |

Returns:

get_data_filename_async

get_data_filename_async(file_name: str | None = None) → Path | str

Generate or retrieve a unique filename for the data file.

| Parameter | Type | Description |
|---|---|---|
| file_name | Optional[str] | Optional file name override. Defaults to None. |

Returns:

Raises:

get_extension

get_extension(file_path: str) → str | None

Get the file extension from the file path.

| Parameter | Type | Description |
|---|---|---|
| file_path | str | Input file path. |

Returns:

get_mime_type

get_mime_type(file_path: str) → str | None

Get the MIME type of the file path.

| Parameter | Type | Description |
|---|---|---|
| file_path | str | Input file path. |

Returns:

get_sha256

get_sha256() → str

Compute SHA256 hash for this serializer’s current value (deprecated alias of get_sha256_async).

Returns:

get_sha256_async

get_sha256_async() → str

Compute SHA256 hash for this serializer’s current value.

Returns:

Raises:

read_data

read_data() → bytes

Read data from storage (deprecated alias of read_data_async).

Returns:

read_data_async

read_data_async() → bytes

Read data from storage.

Returns:

Raises:

read_data_base64

read_data_base64() → str

Read data and return it as a base64 string (deprecated alias of read_data_base64_async).

Returns:

read_data_base64_async

read_data_base64_async() → str

Read data from storage and return it as a base64 string.

Returns:

save_b64_image

save_b64_image(data: str | bytes, output_filename: str | None = None) → None

Save a base64-encoded image to storage (deprecated alias of save_b64_image_async).

| Parameter | Type | Description |
|---|---|---|
| data | Union[str, bytes] | |
| output_filename | Optional[str] | |

save_b64_image_async

save_b64_image_async(data: str | bytes, output_filename: str | None = None) → None

Save a base64-encoded image to storage.

| Parameter | Type | Description |
|---|---|---|
| data | Union[str, bytes] | |
| output_filename | Optional[str] | Filename to store the image as. If not provided, a UUID is used. Defaults to None. |

Raises:

save_data

save_data(data: bytes, output_filename: str | None = None) → None

Save data to storage (deprecated alias of save_data_async).

| Parameter | Type | Description |
|---|---|---|
| data | bytes | The data to be saved. |
| output_filename | Optional[str] | |

save_data_async

save_data_async(data: bytes, output_filename: str | None = None) → None

Save data to storage.

| Parameter | Type | Description |
|---|---|---|
| data | bytes | The data to be saved. |
| output_filename | Optional[str] | Filename to store the data as. If not provided, a UUID is used. Defaults to None. |

Raises:

save_formatted_audio

save_formatted_audio(data: bytes, num_channels: int = 1, sample_width: int = 2, sample_rate: int = 16000, output_filename: str | None = None) → None

Save formatted audio data to storage (deprecated alias of save_formatted_audio_async).

| Parameter | Type | Description |
|---|---|---|
| data | bytes | Audio data bytes. |
| num_channels | int | Number of channels in audio data. Defaults to 1. |
| sample_width | int | Sample width in bytes. Defaults to 2. |
| sample_rate | int | Sample rate in Hz. Defaults to 16000. |
| output_filename | Optional[str] | |

save_formatted_audio_async

save_formatted_audio_async(data: bytes, num_channels: int = 1, sample_width: int = 2, sample_rate: int = 16000, output_filename: str | None = None) → None

Save PCM16 or similarly formatted audio data to storage.

| Parameter | Type | Description |
|---|---|---|
| data | bytes | Bytes with audio data. |
| output_filename | Optional[str] | Filename to store the audio as. If not provided, a UUID is used. Defaults to None. |
| num_channels | int | Number of channels in the audio data. Defaults to 1. |
| sample_width | int | Sample width in bytes. Defaults to 2. |
| sample_rate | int | Sample rate in Hz. Defaults to 16000. |
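To show how the three format parameters describe raw PCM data, here is a standalone sketch that wraps PCM bytes in a WAV container with Python's standard `wave` module. Whether the library writes WAV this way is not stated above; the function name `pcm_to_wav_bytes` and the in-memory buffer are assumptions for illustration.

```python
import io
import wave


def pcm_to_wav_bytes(
    data: bytes,
    num_channels: int = 1,
    sample_width: int = 2,
    sample_rate: int = 16000,
) -> bytes:
    """Wrap raw PCM frames in a WAV header and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes per sample = PCM16
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(data)
    return buffer.getvalue()


# One second of silence: 16000 mono frames at 2 bytes each.
silence = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav_bytes(silence)

# Read the header back to confirm the parameters were recorded.
with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    params = (
        reader.getnchannels(),
        reader.getsampwidth(),
        reader.getframerate(),
        reader.getnframes(),
    )
```

The defaults in the signature (1 channel, 2-byte samples, 16000 Hz) correspond to mono PCM16 at 16 kHz.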

Raises:

DiskStorageIO

Bases: StorageIO

Implementation of StorageIO for local disk storage.

Methods:

create_directory_if_not_exists_async

create_directory_if_not_exists_async(path: Path | str) → None

Asynchronously creates a directory if it doesn’t exist on the local disk.

| Parameter | Type | Description |
|---|---|---|
| path | Path | The directory path to create. |

is_file_async

is_file_async(path: Path | str) → bool

Check whether the given path is a file (not a directory).

| Parameter | Type | Description |
|---|---|---|
| path | Path | The path to check. |

Returns:

path_exists_async

path_exists_async(path: Path | str) → bool

Check whether a path exists on the local disk.

| Parameter | Type | Description |
|---|---|---|
| path | Path | The path to check. |

Returns:

read_file_async

read_file_async(path: Path | str) → bytes

Asynchronously reads a file from the local disk.

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | The path to the file. |

Returns:

write_file_async

write_file_async(path: Path | str, data: bytes) → None

Asynchronously writes data to a file on the local disk.

| Parameter | Type | Description |
|---|---|---|
| path | Path | The path to the file. |
| data | bytes | The content to write to the file. |

EmbeddingDataEntry

Bases: Base

Represents the embedding data associated with conversation entries in the database. Each embedding is linked to a specific conversation entry via an id.

ErrorDataTypeSerializer

Bases: DataTypeSerializer

Serializer for error payloads stored as in-memory text.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| prompt_text | str | Error payload text. |

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether this serializer persists data on disk.

Returns:

ImagePathDataTypeSerializer

Bases: DataTypeSerializer

Serializer for image path values stored on disk.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| category | str | Data category folder name. |
| prompt_text | Optional[str] | Optional existing image path. Defaults to None. |
| extension | Optional[str] | Optional image extension. Defaults to None. |

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether this serializer persists data on disk.

Returns:

MemoryEmbedding

The MemoryEmbedding class is responsible for encoding the memory embeddings.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| embedding_model | Optional[EmbeddingSupport] | |

Methods:

generate_embedding_memory_data

generate_embedding_memory_data(message_piece: MessagePiece) → EmbeddingDataEntry

Generate metadata for a message piece.

| Parameter | Type | Description |
|---|---|---|
| message_piece | MessagePiece | The message piece for which to generate a text embedding. |

Returns:

Raises:

MemoryInterface

Bases: abc.ABC

Abstract interface for conversation memory storage systems.

This interface defines the contract for storing and retrieving chat messages and conversation history. Implementations can use different storage backends such as files, databases, or cloud storage services.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| embedding_model | Optional[Any] | |

Methods:

add_attack_results_to_memory

add_attack_results_to_memory(attack_results: Sequence[AttackResult]) → None

Insert a list of attack results into the memory storage. The database model automatically calculates objective_sha256 for consistency.

Raises:

add_conversation_to_memory

add_conversation_to_memory(conversation: Conversation) → None

Register a conversation in memory, recording its conversation-scoped metadata.

A conversation is a first-class entity held with a single target. Build a Conversation when it is created and call this once (before, or independently of, adding its messages) to record the target it is held with. Message writes (add_message_to_memory / add_message_pieces_to_memory) deliberately do not take a target, so that conversation ownership is expressed in a single place rather than threaded through every write.

Registration is idempotent only for an identical conversation: re-registering the same conversation_id with the same target is a no-op (so repeated per-turn registration is safe). Re-registering an existing conversation_id with a different target is a conflict and raises ValueError -- a conversation is held with exactly one target and is never re-targeted.

| Parameter | Type | Description |
|---|---|---|
| conversation | Conversation | The conversation metadata to record, carrying the conversation_id and the target it is held with (if known). |
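The registration rule (a no-op for an identical conversation, an error for a different target) can be sketched as follows. `ConversationSketch` and `ConversationRegistrySketch` are hypothetical names, the target is reduced to a plain string, and treating any difference in target as a conflict is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConversationSketch:
    """Hypothetical stand-in for the conversation metadata."""

    conversation_id: str
    target: Optional[str] = None


class ConversationRegistrySketch:
    def __init__(self) -> None:
        self._by_id: dict[str, ConversationSketch] = {}

    def add_conversation(self, conversation: ConversationSketch) -> None:
        existing = self._by_id.get(conversation.conversation_id)
        if existing is None:
            self._by_id[conversation.conversation_id] = conversation
            return
        if existing.target != conversation.target:
            # A conversation is held with exactly one target.
            raise ValueError(
                f"Conversation {conversation.conversation_id!r} is already "
                f"registered with a different target."
            )
        # Same id and same target: nothing to do, so per-turn calls are safe.


registry = ConversationRegistrySketch()
registry.add_conversation(ConversationSketch("c1", "target-a"))
registry.add_conversation(ConversationSketch("c1", "target-a"))  # no-op
```

Registering `c1` a third time with `target-b` would raise `ValueError` in this sketch, which mirrors the conflict case described above.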

Raises:

add_message_pieces_to_memory

add_message_pieces_to_memory(message_pieces: Sequence[MessagePiece]) → None

Insert a list of message pieces into the memory storage.

Pieces flagged via MessagePiece.not_in_memory = True are silently filtered out so callers don’t need to track persistence policy themselves. Every remaining piece must carry a non-empty conversation_id (the memory layer never invents one -- see _validate_persistable_conversation_ids).

Conversation-scoped metadata (the target a conversation is held with) is not recorded here; register it once via add_conversation_to_memory when the conversation is created.

This is a template method: subclasses implement only the backend-specific _add_message_pieces_to_memory and inherit the filtering and validation steps so no subclass can forget to run them.

| Parameter | Type | Description |
|---|---|---|
| message_pieces | Sequence[MessagePiece] | The pieces to persist. |
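The template-method shape described above might look like the sketch below. All class names are hypothetical, the piece is reduced to three fields, and raising `ValueError` for a missing conversation_id is an assumption; only the ordering (filter, validate, then delegate to the backend hook) is taken from the description.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PieceSketch:
    """Hypothetical, simplified stand-in for a message piece."""

    conversation_id: str
    value: str
    not_in_memory: bool = False


class MemorySketch(ABC):
    def add_message_pieces_to_memory(self, message_pieces: Sequence[PieceSketch]) -> None:
        # Step 1: silently drop pieces flagged as not persistable.
        persistable = [p for p in message_pieces if not p.not_in_memory]
        # Step 2: every remaining piece needs a conversation_id.
        for piece in persistable:
            if not piece.conversation_id:
                raise ValueError("Persisted pieces must carry a conversation_id.")
        # Step 3: hand off to the backend-specific hook.
        self._add_message_pieces_to_memory(persistable)

    @abstractmethod
    def _add_message_pieces_to_memory(self, message_pieces: Sequence[PieceSketch]) -> None:
        """Backend-specific insert; the only method a subclass implements."""


class ListMemorySketch(MemorySketch):
    """Toy backend that stores pieces in a list."""

    def __init__(self) -> None:
        self.rows: list[PieceSketch] = []

    def _add_message_pieces_to_memory(self, message_pieces: Sequence[PieceSketch]) -> None:
        self.rows.extend(message_pieces)


store = ListMemorySketch()
store.add_message_pieces_to_memory(
    [PieceSketch("c1", "keep"), PieceSketch("", "skip", not_in_memory=True)]
)
```

Because the public method lives on the base class, the toy backend gets filtering and validation without repeating either.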

add_message_to_memory

add_message_to_memory(request: Message) → None

Insert a list of message pieces into the memory storage.

Automatically updates the sequence to be the next number in the conversation. If necessary, generates embedding data for applicable entries.

| Parameter | Type | Description |
|---|---|---|
| request | Message | The message to add to the memory. |

add_scenario_results_to_memory

add_scenario_results_to_memory(scenario_results: Sequence[ScenarioResult]) → None

Insert a list of scenario results into the memory storage.

| Parameter | Type | Description |
|---|---|---|
| scenario_results | Sequence[ScenarioResult] | Sequence of ScenarioResult objects to store in the database. |

add_scores_to_memory

add_scores_to_memory(scores: Sequence[Score]) → None

Insert a list of scores into the memory storage.

Callers that produce scores for pieces flagged via MessagePiece.not_in_memory = True should null out message_piece_id on those scores before calling this method so the score itself can still be persisted without a dangling piece linkage. Persisting the score even without a piece is intentional: aggregate analytics (e.g. refusal rate over a batch) still want the score row even when the scored content was never a real conversation turn.
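A caller-side helper for that nulling step could look like this sketch. `ScoreSketch` and `detach_unpersisted` are hypothetical names, and identifying unpersisted pieces by a set of ids is an assumption made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class ScoreSketch:
    """Hypothetical, simplified stand-in for a score."""

    score_value: str
    message_piece_id: Optional[str]


def detach_unpersisted(
    scores: Sequence[ScoreSketch], unpersisted_piece_ids: set[str]
) -> list[ScoreSketch]:
    """Null out message_piece_id on scores whose piece was never stored."""
    return [
        replace(score, message_piece_id=None)
        if score.message_piece_id in unpersisted_piece_ids
        else score
        for score in scores
    ]


scores = [ScoreSketch("true", "piece-1"), ScoreSketch("false", "piece-2")]
# piece-2 was flagged not_in_memory, so its score keeps no piece linkage.
cleaned = detach_unpersisted(scores, {"piece-2"})
```

Both scores remain available for aggregate analytics; only the dangling link is removed.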

add_seed_datasets_to_memory_async

add_seed_datasets_to_memory_async(datasets: Sequence[SeedDataset], added_by: str) → None

Insert a list of seed datasets into the memory storage.

| Parameter | Type | Description |
|---|---|---|
| datasets | Sequence[SeedDataset] | A list of seed datasets to insert. |
| added_by | str | The user who added the datasets. |

add_seed_groups_to_memory_async

add_seed_groups_to_memory_async(prompt_groups: Sequence[SeedGroup], added_by: str | None = None) → None

Insert a list of seed groups into the memory storage.

| Parameter | Type | Description |
|---|---|---|
| prompt_groups | Sequence[SeedGroup] | A list of prompt groups to insert. |
| added_by | str | The user who added the prompt groups. Defaults to None. |

Raises:

add_seeds_to_memory_async

add_seeds_to_memory_async(seeds: Sequence[Seed], added_by: str | None = None) → None

Insert a list of seeds into the memory storage.

| Parameter | Type | Description |
|---|---|---|
| seeds | Sequence[Seed] | A list of seeds to insert. |
| added_by | str | The user who added the seeds. Defaults to None. |

Raises:

cleanup

cleanup() → None

Ensure cleanup on process exit.

disable_embedding

disable_embedding() → None

Disable embedding functionality for the memory interface.

Sets the memory_embedding attribute to None, disabling any embedding operations.

dispose_engine

dispose_engine() → None

Dispose the engine and clean up resources.

duplicate_conversation

duplicate_conversation(conversation_id: str) → str

Duplicate a conversation for reuse.

This can be useful when an attack strategy requires branching out from a particular point in the conversation. One cannot continue both branches with the same conversation ID since that would corrupt the memory. Instead, one needs to duplicate the conversation and continue with the new conversation ID.

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The ID of the existing conversation to duplicate. |

Returns:

duplicate_conversation_excluding_last_turn

duplicate_conversation_excluding_last_turn(conversation_id: str) → str

Duplicate a conversation, excluding the last turn. Here the duplicate keeps everything before the last user request, so if the conversation ends in half a turn, only that half is removed.

This can be useful when an attack strategy requires backtracking the last prompt/response pair.

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The ID of the existing conversation to duplicate. |
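The trimming rule can be illustrated on a plain list of (role, text) pairs. This is a sketch of the rule only: the function name is hypothetical, roles are assumed to be the strings "user" and "assistant", and the real method also assigns a new conversation ID, which is omitted here.

```python
def drop_last_turn(messages: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the messages that come before the last user request."""
    for index in range(len(messages) - 1, -1, -1):
        if messages[index][0] == "user":
            return messages[:index]
    # No user request found: nothing to trim.
    return list(messages)


full = [("user", "q1"), ("assistant", "a1"), ("user", "q2"), ("assistant", "a2")]
half = [("user", "q1"), ("assistant", "a1"), ("user", "q2")]

trimmed_full = drop_last_turn(full)
trimmed_half = drop_last_turn(half)
```

Both inputs trim to the first request/response pair: the complete last turn loses two messages, the half turn loses one.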

Returns:

duplicate_messages

duplicate_messages(messages: Sequence[Message]) → tuple[str, Sequence[MessagePiece]]

Duplicate messages with a new conversation ID.

Each duplicated piece gets a fresh id and timestamp while preserving original_prompt_id for tracking lineage.

| Parameter | Type | Description |
|---|---|---|
| messages | Sequence[Message] | The messages to duplicate. |
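As a rough sketch of what duplication changes and what it keeps, the example below copies simplified pieces into a new conversation. `PieceSketch` is a hypothetical stand-in, and the assumption that original_prompt_id starts out equal to the piece's own id is made only so that lineage is visible in the example.

```python
import uuid
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass
class PieceSketch:
    """Hypothetical, simplified stand-in for a message piece."""

    conversation_id: str
    value: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    original_prompt_id: Optional[uuid.UUID] = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Assumption for this sketch: a new piece is its own origin.
        if self.original_prompt_id is None:
            self.original_prompt_id = self.id


def duplicate_pieces_sketch(pieces: Sequence[PieceSketch]) -> tuple[str, list[PieceSketch]]:
    """Copy pieces under a new conversation ID with fresh ids and timestamps."""
    new_conversation_id = str(uuid.uuid4())
    copies = [
        replace(
            piece,
            conversation_id=new_conversation_id,
            id=uuid.uuid4(),
            timestamp=datetime.now(timezone.utc),
        )  # original_prompt_id is carried over unchanged
        for piece in pieces
    ]
    return new_conversation_id, copies


source = PieceSketch(conversation_id="conv-1", value="hi")
new_id, copies = duplicate_pieces_sketch([source])
```

The copy can be traced back to `source` through original_prompt_id even though its id, timestamp, and conversation differ.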

Returns:

enable_embedding

enable_embedding(embedding_model: Any | None = None) → None

Enable embedding functionality for the memory interface.

| Parameter | Type | Description |
|---|---|---|
| embedding_model | Optional[Any] | |

Raises:

get_all_embeddings

get_all_embeddings() → Sequence[EmbeddingDataEntry]

Load all EmbeddingData from the memory storage handler.

get_attack_results

get_attack_results(attack_result_ids: Sequence[str] | None = None, conversation_id: str | None = None, objective: str | None = None, objective_sha256: Sequence[str] | None = None, outcome: str | None = None, attack_classes: Sequence[str] | None = None, atomic_attack_eval_hashes: Sequence[str] | None = None, converter_classes: Sequence[str] | None = None, converter_classes_match: Literal['all', 'any'] = 'all', has_converters: bool | None = None, labels: dict[str, str | Sequence[str]] | None = None, targeted_harm_categories: Sequence[str] | None = None, identifier_filters: Sequence[IdentifierFilter] | None = None, scenario_result_id: str | None = None) → Sequence[AttackResult]

Retrieve a list of AttackResult objects based on the specified filters.

| Parameter | Type | Description |
|---|---|---|
| attack_result_ids | Optional[Sequence[str]] | |
| conversation_id | Optional[str] | |
| objective | Optional[str] | |
| objective_sha256 | Optional[Sequence[str]] | |
| outcome | Optional[str] | |
| attack_classes | Optional[Sequence[str]] | |
| atomic_attack_eval_hashes | Optional[Sequence[str]] | |
| converter_classes | Optional[Sequence[str]] | |
| converter_classes_match | Literal['all', 'any'] | How to combine multiple entries in converter_classes. "all" (default) matches attacks that used every listed converter (AND, case-insensitive). "any" matches attacks that used at least one listed converter (OR, case-insensitive). Ignored when converter_classes has fewer than 2 entries or is empty. Defaults to 'all'. |
| has_converters | Optional[bool] | |
| labels | Optional[dict[str, Union[str, Sequence[str]]]] | |
| targeted_harm_categories | Optional[Sequence[str]] | |
| identifier_filters | Optional[Sequence[IdentifierFilter]] | |
| scenario_result_id | Optional[str] | |
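The "all" versus "any" semantics of converter_classes_match can be sketched in plain Python. The real filtering is presumably done in the database query; the function name and the converter names below are illustrative only.

```python
from typing import Literal, Sequence


def matches_converter_filter(
    used_converters: Sequence[str],
    converter_classes: Sequence[str],
    converter_classes_match: Literal["all", "any"] = "all",
) -> bool:
    """Case-insensitive AND/OR match of requested converters against used ones."""
    used = {name.lower() for name in used_converters}
    wanted = [name.lower() for name in converter_classes]
    if not wanted:
        return True  # empty filter: nothing to exclude
    if converter_classes_match == "any":
        return any(name in used for name in wanted)
    return all(name in used for name in wanted)


used = ["AlphaConverter", "BetaConverter"]  # illustrative names
requested = ["alphaconverter", "GammaConverter"]

all_match = matches_converter_filter(used, requested, "all")
any_match = matches_converter_filter(used, requested, "any")
```

With a single requested converter the two modes give the same answer, which is consistent with the option being ignored for fewer than two entries.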

Returns:

Raises:

get_conversation

get_conversation(conversation_id: str) → MutableSequence[Message]

Retrieve the messages for a conversation (deprecated alias).

Deprecated: use get_conversation_messages instead. The get_conversation name is being freed so that a future release can use it to return the conversation entity (currently exposed as _get_conversation).

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The conversation ID to match. |

Returns:

get_conversation_messages

get_conversation_messages(conversation_id: str) → MutableSequence[Message]

Retrieve a list of Message objects that have the specified conversation ID.

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The conversation ID to match. |

Returns:

get_conversation_stats

get_conversation_stats(conversation_ids: Sequence[str]) → dict[str, ConversationStats]

Return lightweight aggregate statistics for one or more conversations.

Computes per-conversation message count (distinct sequence numbers), a truncated last-message preview, the first non-empty labels dict, and the earliest message timestamp using efficient SQL aggregation instead of loading full pieces.

| Parameter | Type | Description |
|---|---|---|
| conversation_ids | Sequence[str] | The conversation IDs to query. |

Returns:

get_message_pieces

get_message_pieces(attack_id: str | uuid.UUID | None = None, role: str | None = None, conversation_id: str | uuid.UUID | None = None, prompt_ids: Sequence[str | uuid.UUID] | None = None, labels: dict[str, str] | None = None, prompt_metadata: dict[str, str | int] | None = None, sent_after: datetime | None = None, sent_before: datetime | None = None, original_values: Sequence[str] | None = None, converted_values: Sequence[str] | None = None, data_type: str | None = None, not_data_type: str | None = None, converted_value_sha256: Sequence[str] | None = None, identifier_filters: Sequence[IdentifierFilter] | None = None) → Sequence[MessagePiece]

Retrieve a list of MessagePiece objects based on the specified filters.

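| Parameter | Type | Description |
|---|---|---|
| attack_id | Optional[Union[str, uuid.UUID]] | |
| role | Optional[str] | |
| conversation_id | Optional[Union[str, uuid.UUID]] | |
| prompt_ids | Optional[Sequence[Union[str, uuid.UUID]]] | |
| labels | Optional[dict[str, str]] | |
| prompt_metadata | Optional[dict[str, Union[str, int]]] | |
| sent_after | Optional[datetime] | |
| sent_before | Optional[datetime] | |
| original_values | Optional[Sequence[str]] | |
| converted_values | Optional[Sequence[str]] | |
| data_type | Optional[str] | |
| not_data_type | Optional[str] | |
| converted_value_sha256 | Optional[Sequence[str]] | |
| identifier_filters | Optional[Sequence[IdentifierFilter]] | |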

Returns:

Raises:

get_prompt_scores

get_prompt_scores(attack_id: str | uuid.UUID | None = None, role: str | None = None, conversation_id: str | uuid.UUID | None = None, prompt_ids: Sequence[str | uuid.UUID] | None = None, labels: dict[str, str] | None = None, prompt_metadata: dict[str, str | int] | None = None, sent_after: datetime | None = None, sent_before: datetime | None = None, original_values: Sequence[str] | None = None, converted_values: Sequence[str] | None = None, data_type: str | None = None, not_data_type: str | None = None, converted_value_sha256: Sequence[str] | None = None) → Sequence[Score]

Retrieve scores attached to message pieces based on the specified filters.

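| Parameter | Type | Description |
|---|---|---|
| attack_id | Optional[Union[str, uuid.UUID]] | |
| role | Optional[str] | |
| conversation_id | Optional[Union[str, uuid.UUID]] | |
| prompt_ids | Optional[Sequence[Union[str, uuid.UUID]]] | |
| labels | Optional[dict[str, str]] | |
| prompt_metadata | Optional[dict[str, Union[str, int]]] | |
| sent_after | Optional[datetime] | |
| sent_before | Optional[datetime] | |
| original_values | Optional[Sequence[str]] | |
| converted_values | Optional[Sequence[str]] | |
| data_type | Optional[str] | |
| not_data_type | Optional[str] | |
| converted_value_sha256 | Optional[Sequence[str]] | |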

Returns:

get_request_from_response

get_request_from_response(response: Message) → Message

Retrieve the request that produced the given response.

| Parameter | Type | Description |
|---|---|---|
| response | Message | The response message object to match. |

Returns:

Raises:

get_scenario_results

get_scenario_results(scenario_result_ids: Sequence[str] | None = None, scenario_name: str | None = None, scenario_version: int | None = None, pyrit_version: str | None = None, added_after: datetime | None = None, added_before: datetime | None = None, labels: dict[str, str] | None = None, objective_target_endpoint: str | None = None, objective_target_model_name: str | None = None, identifier_filters: Sequence[IdentifierFilter] | None = None, limit: int | None = None) → Sequence[ScenarioResult]

Retrieve a list of ScenarioResult objects based on the specified filters.

Results are always ordered by completion_time descending (most recent first).

| Parameter | Type | Description |
|---|---|---|
| scenario_result_ids | Optional[Sequence[str]] | |
| scenario_name | Optional[str] | |
| scenario_version | Optional[int] | |
| pyrit_version | Optional[str] | |
| added_after | Optional[datetime] | |
| added_before | Optional[datetime] | |
| labels | Optional[dict[str, str]] | |
| objective_target_endpoint | Optional[str] | |
| objective_target_model_name | Optional[str] | |
| identifier_filters | Optional[Sequence[IdentifierFilter]] | |
| limit | Optional[int] | |

Returns:

get_scores

get_scores(score_ids: Sequence[str] | None = None, score_type: str | None = None, score_category: str | None = None, sent_after: datetime | None = None, sent_before: datetime | None = None, identifier_filters: Sequence[IdentifierFilter] | None = None) → Sequence[Score]

Retrieve a list of Score objects based on the specified filters.

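| Parameter | Type | Description |
|---|---|---|
| score_ids | Optional[Sequence[str]] | |
| score_type | Optional[str] | |
| score_category | Optional[str] | |
| sent_after | Optional[datetime] | |
| sent_before | Optional[datetime] | |
| identifier_filters | Optional[Sequence[IdentifierFilter]] | |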

Returns:

get_seed_dataset_names

get_seed_dataset_names() → Sequence[str]

Return a list of all seed dataset names in the memory storage.

Returns:

get_seed_groups

get_seed_groups(value: str | None = None, value_sha256: Sequence[str] | None = None, dataset_name: str | None = None, dataset_name_pattern: str | None = None, data_types: Sequence[str] | None = None, harm_categories: Sequence[str] | None = None, added_by: str | None = None, authors: Sequence[str] | None = None, groups: Sequence[str] | None = None, source: str | None = None, seed_type: SeedType | None = None, parameters: Sequence[str] | None = None, metadata: dict[str, str | int] | None = None, prompt_group_ids: Sequence[uuid.UUID] | None = None, group_length: Sequence[int] | None = None) → Sequence[SeedGroup]

Retrieve groups of seed prompts based on the provided filtering criteria.

| Parameter | Type | Description |
|---|---|---|
| value | Optional[str] | |
| value_sha256 | Optional[Sequence[str]] | |
| dataset_name | Optional[str] | |
| dataset_name_pattern | Optional[str] | |
| data_types | Optional[Sequence[str]] | |
| harm_categories | Optional[Sequence[str]] | |
| added_by | Optional[str] | |
| authors | Optional[Sequence[str]] | |
| groups | Optional[Sequence[str]] | |
| source | Optional[str] | |
| seed_type | Optional[SeedType] | |
| parameters | Optional[Sequence[str]] | |
| metadata | Optional[dict[str, Union[str, int]]] | |
| prompt_group_ids | Optional[Sequence[uuid.UUID]] | |
| group_length | Optional[Sequence[int]] | |

Returns:

get_seeds

get_seeds(value: str | None = None, value_sha256: Sequence[str] | None = None, dataset_name: str | None = None, dataset_name_pattern: str | None = None, data_types: Sequence[str] | None = None, harm_categories: Sequence[str] | None = None, added_by: str | None = None, authors: Sequence[str] | None = None, groups: Sequence[str] | None = None, source: str | None = None, seed_type: SeedType | None = None, parameters: Sequence[str] | None = None, metadata: dict[str, str | int] | None = None, prompt_group_ids: Sequence[uuid.UUID] | None = None) → Sequence[Seed]

Retrieve a list of seed prompts based on the specified filters.

| Parameter | Type | Description |
|---|---|---|
| value | str | The value to match by substring. If None, all values are returned. Defaults to None. |
| value_sha256 | Sequence[str] | The SHA256 hashes of the values to match. If None, all values are returned. Defaults to None. |
| dataset_name | str | The dataset name to match exactly. If None, all dataset names are considered. Defaults to None. |
| dataset_name_pattern | str | A pattern to match dataset names using SQL LIKE syntax. Supports wildcards: % (any characters) and _ (single character). Examples: “harm%” matches names starting with “harm”, “%test%” matches names containing “test”. If both dataset_name and dataset_name_pattern are provided, dataset_name takes precedence. Defaults to None. |
| data_types | Optional[Sequence[str]] | Defaults to None. |
| harm_categories | Sequence[str] | A list of harm categories to filter by. If None, all harm categories are considered. Defaults to None. |
| added_by | str | The user who added the prompts. Defaults to None. |
| authors | Sequence[str] | A list of authors to filter by. Note that this filters by substring, so a query for “Adam Jones” may not return results if the record is “A. Jones”, “Jones, Adam”, etc. If None, all authors are considered. Defaults to None. |
| groups | Sequence[str] | A list of groups to filter by. If None, all groups are considered. Defaults to None. |
| source | str | The source to filter by. If None, all sources are considered. Defaults to None. |
| seed_type | SeedType | The type of seed to filter by (“prompt”, “objective”, or “simulated_conversation”). Defaults to None. |
| parameters | Sequence[str] | A list of parameters to filter by. Specifying parameters effectively returns prompt templates instead of prompts. Defaults to None. |
| metadata | Optional[dict[str, Union[str, int]]] | Defaults to None. |
| prompt_group_ids | Sequence[uuid.UUID] | A list of prompt group IDs to filter by. Defaults to None. |
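
The wildcard and precedence rules described for dataset_name and dataset_name_pattern can be tried out with the standard-library sqlite3 module. This is only a sketch under stated assumptions: the `seeds` table and the `find_dataset_names` helper are hypothetical and are not part of PyRIT.

```python
import sqlite3

# Hypothetical, simplified table; not the real SeedEntry schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seeds (value TEXT, dataset_name TEXT)")
conn.executemany(
    "INSERT INTO seeds VALUES (?, ?)",
    [
        ("a", "harmbench"),
        ("b", "harm_test_v1"),
        ("c", "my_test_set"),
        ("d", "benign"),
    ],
)


def find_dataset_names(dataset_name=None, dataset_name_pattern=None):
    """Return matching dataset names; an exact name wins over a pattern."""
    if dataset_name is not None:
        sql = "SELECT dataset_name FROM seeds WHERE dataset_name = ?"
        args = (dataset_name,)
    elif dataset_name_pattern is not None:
        # % matches any run of characters, _ matches exactly one character.
        sql = "SELECT dataset_name FROM seeds WHERE dataset_name LIKE ?"
        args = (dataset_name_pattern,)
    else:
        sql = "SELECT dataset_name FROM seeds"
        args = ()
    return sorted(row[0] for row in conn.execute(sql, args))


starts_with_harm = find_dataset_names(dataset_name_pattern="harm%")
contains_test = find_dataset_names(dataset_name_pattern="%test%")
exact_wins = find_dataset_names(dataset_name="benign", dataset_name_pattern="harm%")
```

Binding the pattern as a query parameter, rather than formatting it into the SQL string, keeps the wildcard characters as data.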

Returns:

get_session

get_session() → Any

Provide a SQLAlchemy session for transactional operations.

Returns:

get_unique_attack_class_names

get_unique_attack_class_names() → list[str]

Return sorted unique attack class names from all stored attack results.

Extracts class_name from the attack_identifier JSON column via a database-level DISTINCT query.

Returns:

get_unique_attack_labels

get_unique_attack_labels() → dict[str, list[str]]

Return all unique label key-value pairs across attack results.

Labels may live on PromptMemoryEntry.labels (joined via conversation_id) or directly on AttackResultEntry.labels. Both sources are queried (OR logic, mirroring the label filter behaviour in get_attack_results), and unique key-value pairs are aggregated in Python.
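
The Python-side aggregation step could look roughly like the following. It is a minimal sketch, assuming each source yields plain label dicts (or None); the helper name `aggregate_unique_labels`, the sample rows, and the sorted output are illustrative assumptions rather than PyRIT's actual code.

```python
from collections import defaultdict


def aggregate_unique_labels(label_dicts):
    """Collect every distinct value seen for each label key."""
    values_by_key = defaultdict(set)
    for labels in label_dicts:
        # Rows without labels (None or empty) contribute nothing.
        for key, value in (labels or {}).items():
            values_by_key[key].add(str(value))
    return {key: sorted(values) for key, values in sorted(values_by_key.items())}


# Hypothetical rows from the two sources described above.
prompt_entry_labels = [{"operator": "alice", "op_name": "run1"}, None]
attack_result_labels = [{"operator": "bob"}, {"operator": "alice"}]

unique_labels = aggregate_unique_labels(prompt_entry_labels + attack_result_labels)
```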

Returns:

get_unique_converter_class_names

get_unique_converter_class_names() → list[str]

Return sorted unique converter class names used across all attack results.

Extracts class_name values from the request_converter_identifiers array within the attack_identifier JSON column via a database-level query.

Returns:

print_schema

print_schema() → None

Print the schema of all tables in the database.

Raises:

reset_database

reset_database() → None

Drop and recreate all tables in the database.

Raises:

update_attack_result

update_attack_result(conversation_id: str, update_fields: dict[str, Any]) → bool

Update specific fields of an existing AttackResultEntry identified by conversation_id.

This method queries for the raw database entry by conversation_id and updates the specified fields in place, avoiding the creation of duplicate rows.

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The conversation ID of the attack result to update. |
| update_fields | dict[str, Any] | A dictionary of column names to new values. Valid fields include ‘adversarial_chat_conversation_ids’, ‘pruned_conversation_ids’, ‘outcome’, ‘attack_metadata’, etc. |
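
To illustrate the update-in-place idea (change the existing row instead of inserting a new one), here is a sketch using the standard-library sqlite3 module. The table layout, the `ALLOWED_COLUMNS` whitelist, and the `update_in_place` helper are assumptions made for the example; the real method works on an AttackResultEntry through SQLAlchemy.

```python
import sqlite3

# Hypothetical whitelist; column names cannot be bound as SQL parameters,
# so they are validated before being formatted into the statement.
ALLOWED_COLUMNS = {"outcome", "attack_metadata"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attack_results (conversation_id TEXT, outcome TEXT, attack_metadata TEXT)"
)
conn.execute("INSERT INTO attack_results VALUES ('conv-1', 'undetermined', '{}')")


def update_in_place(conversation_id, update_fields):
    """Update the given columns on the existing row; return True if a row matched."""
    if not update_fields:
        raise ValueError("update_fields must not be empty")
    unknown = set(update_fields) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    assignments = ", ".join(f"{column} = ?" for column in update_fields)
    cursor = conn.execute(
        f"UPDATE attack_results SET {assignments} WHERE conversation_id = ?",
        [*update_fields.values(), conversation_id],
    )
    conn.commit()
    return cursor.rowcount > 0


updated = update_in_place("conv-1", {"outcome": "success"})
missing = update_in_place("no-such-conversation", {"outcome": "success"})
```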

Returns:

Raises:

update_attack_result_by_id

update_attack_result_by_id(attack_result_id: str, update_fields: dict[str, Any]) → bool

Update specific fields of an existing AttackResultEntry identified by its primary key.

| Parameter | Type | Description |
|---|---|---|
| attack_result_id | str | The UUID primary key of the AttackResultEntry. |
| update_fields | dict[str, Any] | Column names to new values. |

Returns:

update_labels_by_conversation_id

update_labels_by_conversation_id(conversation_id: str, labels: dict[str, Any]) → bool

Update the labels of prompt entries in memory for a given conversation ID.

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The conversation ID of the entries to be updated. |
| labels | dict | New dictionary of labels. |

Returns:

update_prompt_entries_by_conversation_id

update_prompt_entries_by_conversation_id(conversation_id: str, update_fields: dict[str, Any]) → bool

Update prompt entries for a given conversation ID with the specified field values.

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The conversation ID of the entries to be updated. |
| update_fields | dict | A dictionary of field names and their new values (e.g., {"labels": {"test": "value"}}). |

Returns:

Raises:

update_prompt_metadata_by_conversation_id

update_prompt_metadata_by_conversation_id(conversation_id: str, prompt_metadata: dict[str, str | int]) → bool

Update the metadata of prompt entries in memory for a given conversation ID.

| Parameter | Type | Description |
|---|---|---|
| conversation_id | str | The conversation ID of the entries to be updated. |
| prompt_metadata | dict[str, Union[str, int]] | |

Returns:

update_scenario_metadata

update_scenario_metadata(scenario_result_id: str, metadata: dict[str, Any]) → None

Replace the scenario_metadata JSON blob on an existing scenario result.

Used by the scenario layer to persist first-run state (e.g. objective_hashes) that resume needs to replay. Performs a targeted UPDATE so it doesn’t clobber other columns.

| Parameter | Type | Description |
|---|---|---|
| scenario_result_id | str | The ID of the scenario result to update. |
| metadata | dict[str, Any] | The full metadata dict to store. Pass the merged dict, not just the new keys, because this writes the whole value. |
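
Because the write replaces the whole value, callers need to merge before writing. The sketch below imitates that behaviour with a plain dict standing in for the database; `store` and `replace_metadata` are hypothetical stand-ins, not PyRIT APIs.

```python
# Hypothetical stand-in for the scenario_metadata column, keyed by result ID.
store = {"sr-1": {"objective_hashes": ["abc123"], "run_count": 1}}


def replace_metadata(scenario_result_id, metadata):
    """Imitate a whole-value write: the previous dict is discarded."""
    store[scenario_result_id] = dict(metadata)


new_keys = {"resumed": True}

# Passing only new_keys would drop objective_hashes and run_count,
# so the existing value is merged with the new keys first.
merged = {**store["sr-1"], **new_keys}
replace_metadata("sr-1", merged)
```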

Raises:

update_scenario_run_state

update_scenario_run_state(scenario_result_id: str, scenario_run_state: str, error_message: str | None = None, error_type: str | None = None) → None

Update the run state of an existing scenario result.

Performs a targeted UPDATE of only the state/error columns instead of rebuilding the entire ScenarioResultEntry row. The full-row rebuild used to read the stored row, mutate the ScenarioResult, and re-serialize every column — including attack_results_json which is being phased out and could be stale during the deprecation window. A targeted UPDATE avoids clobbering manifest data and is also cheaper.

| Parameter | Type | Description |
|---|---|---|
| scenario_result_id | str | The ID of the scenario result to update. |
| scenario_run_state | str | The new state for the scenario (e.g., “CREATED”, “IN_PROGRESS”, “COMPLETED”, “FAILED”). |
| error_message | Optional[str] | Defaults to None. |
| error_type | Optional[str] | Defaults to None. |
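
The difference between a targeted UPDATE and a full-row rebuild can be shown with the standard-library sqlite3 module. This is a sketch only: the `scenario_results` table, its columns, and the `set_run_state` helper are simplified assumptions, and raising ValueError for an unknown ID is a choice made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real ScenarioResultEntry has more columns.
conn.execute(
    "CREATE TABLE scenario_results ("
    "id TEXT PRIMARY KEY, scenario_run_state TEXT, "
    "error_message TEXT, error_type TEXT, attack_results_json TEXT)"
)
conn.execute(
    "INSERT INTO scenario_results VALUES (?, ?, ?, ?, ?)",
    ("sr-1", "IN_PROGRESS", None, None, '{"kept": true}'),
)


def set_run_state(scenario_result_id, state, error_message=None, error_type=None):
    """Write only the state/error columns; every other column is left untouched."""
    cursor = conn.execute(
        "UPDATE scenario_results "
        "SET scenario_run_state = ?, error_message = ?, error_type = ? "
        "WHERE id = ?",
        (state, error_message, error_type, scenario_result_id),
    )
    conn.commit()
    if cursor.rowcount == 0:
        raise ValueError(f"no scenario result with id {scenario_result_id!r}")


set_run_state("sr-1", "FAILED", error_message="boom", error_type="RuntimeError")
row = conn.execute(
    "SELECT scenario_run_state, error_message, error_type, attack_results_json "
    "FROM scenario_results WHERE id = 'sr-1'"
).fetchone()
```

The attack_results_json value is never read or rewritten here, which is the property the description above relies on.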

Raises:

PromptMemoryEntry

Bases: Base

Represents the prompt data.

Because of the nature of databases and SQLAlchemy, type ignores are abundant :)

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| entry | MessagePiece | The message piece to convert into a database entry. |

Methods:

get_message_piece

get_message_piece() → MessagePiece

Convert this database entry back into a MessagePiece object.

Returns:

SQLiteMemory

Bases: MemoryInterface

A memory interface that uses SQLite as the backend database.

This class provides functionality to insert, query, and manage conversation data using SQLite. It supports both file-based and in-memory databases.

Note: This replaces the old DuckDB implementation.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| db_path | Union[Path, str] | |
| verbose | bool | Whether to enable verbose logging. Defaults to False. |
| skip_schema_migration | bool | Whether to skip schema migration. Defaults to False. |
| silent | bool | If True, suppresses schema migration console output. Defaults to False. |
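
The practical difference between file-based and in-memory SQLite databases can be seen with the standard-library sqlite3 module alone. The sketch below does not use SQLiteMemory; the `notes` table and the `roundtrip` helper are hypothetical and only demonstrate general SQLite behaviour.

```python
import sqlite3
import tempfile
from pathlib import Path


def roundtrip(target):
    """Write one row, close the connection, reopen, and count the rows."""
    conn = sqlite3.connect(target)
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    conn.execute("INSERT INTO notes VALUES ('hello')")
    conn.commit()
    conn.close()

    reopened = sqlite3.connect(target)
    reopened.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    count = reopened.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    reopened.close()
    return count


with tempfile.TemporaryDirectory() as tmp:
    # A file-based database keeps its rows after the connection is closed.
    file_count = roundtrip(str(Path(tmp) / "memory.db"))

# Each ":memory:" connection gets a fresh, empty database.
memory_count = roundtrip(":memory:")
```

In-memory databases suit tests and throwaway runs; a file path is needed when results must survive the process.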

Methods:

dispose_engine

dispose_engine() → None

Dispose the engine and close all connections.

get_all_embeddings

get_all_embeddings() → Sequence[EmbeddingDataEntry]

Fetch all embedding entries and return them as model instances.

Returns:

get_all_table_models

get_all_table_models() → list[type[Base]]

Return a list of all table models used in the database by inspecting the Base registry.

Returns:

get_conversation_stats

get_conversation_stats(conversation_ids: Sequence[str]) → dict[str, ConversationStats]

SQLite implementation: lightweight aggregate stats per conversation.

Executes a single SQL query that returns message count (distinct sequences), a truncated last-message preview, the first non-empty labels dict, and the earliest timestamp for each conversation_id.

| Parameter | Type | Description |
|---|---|---|
| conversation_ids | Sequence[str] | The conversation IDs to query. |
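
A single grouped query of this kind might look like the sketch below, written against the standard-library sqlite3 module. It covers only two of the four statistics (distinct-sequence message count and earliest timestamp); the `prompt_entries` table, its columns, and the `conversation_stats` helper are assumptions, not PyRIT's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table of message pieces.
conn.execute(
    "CREATE TABLE prompt_entries (conversation_id TEXT, sequence INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO prompt_entries VALUES (?, ?, ?)",
    [
        ("c1", 0, "2024-01-01T10:00:00"),
        ("c1", 0, "2024-01-01T10:00:01"),  # second piece of the same message
        ("c1", 1, "2024-01-01T10:00:05"),
        ("c2", 0, "2024-01-02T09:00:00"),
    ],
)


def conversation_stats(conversation_ids):
    """Return message count and earliest timestamp per conversation in one query."""
    if not conversation_ids:
        return {}
    placeholders = ", ".join("?" for _ in conversation_ids)
    rows = conn.execute(
        "SELECT conversation_id, COUNT(DISTINCT sequence), MIN(timestamp) "
        f"FROM prompt_entries WHERE conversation_id IN ({placeholders}) "
        "GROUP BY conversation_id",
        list(conversation_ids),
    )
    return {
        cid: {"message_count": count, "created_at": first}
        for cid, count, first in rows
    }


stats = conversation_stats(["c1", "c2", "c3"])
```

Counting distinct sequence values means a message made of several pieces is counted once, and conversation IDs with no rows are simply absent from the result.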

Returns:

get_session

get_session() → Session

Provide a SQLAlchemy session for transactional operations.

Returns:

get_unique_attack_class_names

get_unique_attack_class_names() → list[str]

SQLite implementation: extract unique class_name values from the atomic_attack_identifier JSON column.
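
A DISTINCT query over a JSON column can be sketched with SQLite's json_extract function, assuming the SQLite build in use includes the JSON functions. The `attack_results` table and the sample class names below are hypothetical; only the column name and the class_name key come from the description above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table holding one JSON identifier per attack result.
conn.execute("CREATE TABLE attack_results (atomic_attack_identifier TEXT)")
sample_identifiers = [
    {"class_name": "AttackB"},
    {"class_name": "AttackA"},
    {"class_name": "AttackB"},
    {},  # no class_name recorded
]
conn.executemany(
    "INSERT INTO attack_results VALUES (?)",
    [(json.dumps(identifier),) for identifier in sample_identifiers],
)

# DISTINCT and ORDER BY run inside the database, so no rows are
# deduplicated or sorted in Python.
query = """
    SELECT DISTINCT json_extract(atomic_attack_identifier, '$.class_name') AS class_name
    FROM attack_results
    WHERE json_extract(atomic_attack_identifier, '$.class_name') IS NOT NULL
    ORDER BY class_name
"""
class_names = [row[0] for row in conn.execute(query)]
```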

Returns:

get_unique_converter_class_names

get_unique_converter_class_names() → list[str]

SQLite implementation: extract unique converter class_name values from the children.attack_technique.children.attack.children.request_converters array in the atomic_attack_identifier JSON column.
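
Extracting values from a nested JSON array is typically done in SQLite with the json_each table-valued function. The following is a sketch under assumptions: the `attack_results` table, the `make_identifier` helper, and the converter names are made up, and it presumes a SQLite build with JSON support. The JSON path is the one named in the description above.

```python
import json
import sqlite3

CONVERTERS_PATH = "$.children.attack_technique.children.attack.children.request_converters"


def make_identifier(converter_names):
    """Build a hypothetical identifier with converters at the nested path."""
    converters = [{"class_name": name} for name in converter_names]
    return json.dumps(
        {
            "children": {
                "attack_technique": {
                    "children": {"attack": {"children": {"request_converters": converters}}}
                }
            }
        }
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attack_results (atomic_attack_identifier TEXT)")
conn.executemany(
    "INSERT INTO attack_results VALUES (?)",
    [
        (make_identifier(["ConverterB", "ConverterA"]),),
        (make_identifier(["ConverterA"]),),
        (make_identifier([]),),
        (json.dumps({}),),  # identifier without the nested path
    ],
)

# json_each yields one row per array element; rows whose JSON lacks the
# path contribute nothing.
query = """
    SELECT DISTINCT json_extract(conv.value, '$.class_name') AS class_name
    FROM attack_results,
         json_each(attack_results.atomic_attack_identifier, ?) AS conv
    ORDER BY class_name
"""
converter_names = [row[0] for row in conn.execute(query, (CONVERTERS_PATH,))]
```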

Returns:

print_schema

print_schema() → None

Print the schema of all tables in the SQLite database.

SeedEntry

Bases: Base

Represents the raw prompt or prompt template data as found in open datasets.

Note: This is different from PromptMemoryEntry, which holds the processed prompt data. SeedPrompt merely reflects basic prompts before they are plugged into attacks, run through models with the corresponding attack strategies, and passed through converters. PromptMemoryEntry captures the prompt data before and after those steps.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| entry | Seed | The seed object to convert into a database entry. |

Methods:

get_seed

get_seed() → Seed

Convert this database entry back into a Seed object.

Returns:

StorageIO

Bases: ABC

Abstract interface for storage systems (local disk, Azure Storage Account, etc.).

Methods:

create_directory_if_not_exists

create_directory_if_not_exists(path: Path | str) → None

Create a directory if it does not exist (deprecated alias of create_directory_if_not_exists_async).

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | The directory path to create. |

create_directory_if_not_exists_async

create_directory_if_not_exists_async(path: Path | str) → None

Asynchronously creates a directory or equivalent in the storage system if it doesn’t exist.

is_file

is_file(path: Path | str) → bool

Check whether the given path is a file (deprecated alias of is_file_async).

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | The path to check. |

Returns:

is_file_async

is_file_async(path: Path | str) → bool

Asynchronously checks if the path refers to a file (not a directory or container).

path_exists

path_exists(path: Path | str) → bool

Check whether a path exists (deprecated alias of path_exists_async).

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | The path to check. |

Returns:

path_exists_async

path_exists_async(path: Path | str) → bool

Asynchronously checks if a file or blob exists at the given path.

read_file

read_file(path: Path | str) → bytes

Read a file from storage (deprecated alias of read_file_async).

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | The path to the file. |

Returns:

read_file_async

read_file_async(path: Path | str) → bytes

Asynchronously reads the file (or blob) from the given path.

write_file

write_file(path: Path | str, data: bytes) → None

Write data to storage (deprecated alias of write_file_async).

| Parameter | Type | Description |
|---|---|---|
| path | Union[Path, str] | The path to the file. |
| data | bytes | The content to write to the file. |

write_file_async

write_file_async(path: Path | str, data: bytes) → None

Asynchronously writes data to the given path.
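
To show what implementing such an abstract storage interface involves, here is a self-contained sketch. `StorageSketch` mirrors only three of the async method names listed above, and `InMemoryStorage` is a made-up backend for illustration; neither is a PyRIT class.

```python
import asyncio
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Union


class StorageSketch(ABC):
    """Hypothetical stand-in mirroring a subset of the async methods above."""

    @abstractmethod
    async def read_file_async(self, path: Union[Path, str]) -> bytes: ...

    @abstractmethod
    async def write_file_async(self, path: Union[Path, str], data: bytes) -> None: ...

    @abstractmethod
    async def path_exists_async(self, path: Union[Path, str]) -> bool: ...


class InMemoryStorage(StorageSketch):
    """Toy backend that keeps blobs in a dict instead of on disk or in the cloud."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    @staticmethod
    def _key(path: Union[Path, str]) -> str:
        # Normalise Path and str inputs to the same key.
        return Path(path).as_posix()

    async def read_file_async(self, path: Union[Path, str]) -> bytes:
        return self._blobs[self._key(path)]

    async def write_file_async(self, path: Union[Path, str], data: bytes) -> None:
        self._blobs[self._key(path)] = data

    async def path_exists_async(self, path: Union[Path, str]) -> bool:
        return self._key(path) in self._blobs


async def demo() -> bytes:
    storage = InMemoryStorage()
    await storage.write_file_async("results/out.txt", b"payload")
    if not await storage.path_exists_async(Path("results/out.txt")):
        raise RuntimeError("expected the blob to exist")
    return await storage.read_file_async("results/out.txt")


payload = asyncio.run(demo())
```

Code written against the abstract class can then switch backends without changes, which is the point of keeping the interface independent of local disk or blob storage.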

SupportedContentType

Bases: Enum

All supported content types for uploading blobs to the provided storage account container. See all options here: https://www.iana.org/assignments/media-types/media-types.xhtml.

TextDataTypeSerializer

Bases: DataTypeSerializer

Serializer for text and text-like prompt values that stay in-memory.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| prompt_text | str | Prompt value. |
| data_type | PromptDataType | Text-like prompt data type. Defaults to 'text'. |

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether this serializer persists data on disk.

Returns:

URLDataTypeSerializer

Bases: DataTypeSerializer

Serializer for URL values and URL-backed local file references.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| category | str | Data category folder name. |
| prompt_text | str | URL or path value. |
| extension | Optional[str] | Optional extension for persisted content. Defaults to None. |

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether this serializer persists data on disk.

Returns:

VideoPathDataTypeSerializer

Bases: DataTypeSerializer

Serializer for video path values stored on disk.

Constructor Parameters:

| Parameter | Type | Description |
|---|---|---|
| category | str | The category or context for the data. |
| prompt_text | Optional[str] | The video path or identifier. Defaults to None. |
| extension | Optional[str] | The file extension. Defaults to None, in which case ‘mp4’ is used. |

Methods:

data_on_disk

data_on_disk() → bool

Indicate whether this serializer persists data on disk.

Returns: