OpenAIVideoTarget supports three modes:
Text-to-video: Generate a video from a text prompt.
Remix: Create a variation of an existing video (using
video_idfrom a prior generation).Text+Image-to-video: Use an image as the first frame of the generated video.
Note that the video scorer requires opencv, which is not a default PyRIT dependency. You need to install it manually or using pip install pyrit[opencv].
Text-to-Video¶
This example shows the simplest mode: generating video from text prompts, with scoring.
from pyrit.executor.attack import (
AttackExecutor,
AttackScoringConfig,
PromptSendingAttack,
)
from pyrit.output import output_attack_async
from pyrit.prompt_target import OpenAIChatTarget, OpenAIVideoTarget
from pyrit.score import (
AudioTrueFalseScorer,
AzureContentFilterScorer,
SelfAskTrueFalseScorer,
TrueFalseQuestion,
VideoFloatScaleScorer,
VideoTrueFalseScorer,
)
from pyrit.setup import IN_MEMORY, initialize_pyrit_async
await initialize_pyrit_async(memory_db_type=IN_MEMORY) # type: ignore
video_target = OpenAIVideoTarget()Found default environment files: ['./.pyrit/.env', './.pyrit/.env.local']
Loaded environment file: ./.pyrit/.env
Loaded environment file: ./.pyrit/.env.local
[pyrit:alembic] No new upgrade operations detected.
Generating and scoring a video:¶
Using the video target you can send prompts to generate a video. The video scorer can evaluate the video content itself. Note this section is simply scoring the video not the audio.
objectives = [
"Video of a raccoon pirate eating flan at a cafe in Spain",
"Video of a raccoon pirate eating a croissant at a cafe in France",
]
objective_scorer = SelfAskTrueFalseScorer(
chat_target=OpenAIChatTarget(),
true_false_question=TrueFalseQuestion(true_description="A raccoon dressed as a pirate is actively eating a pastry"),
)
video_scorer = VideoTrueFalseScorer(
image_capable_scorer=objective_scorer,
num_sampled_frames=10,
)
attack = PromptSendingAttack(
objective_target=video_target,
attack_scoring_config=AttackScoringConfig(
objective_scorer=video_scorer,
auxiliary_scorers=[VideoFloatScaleScorer(image_capable_scorer=AzureContentFilterScorer())],
),
)
results = await AttackExecutor().execute_attack_async( # type: ignore
attack=attack,
objectives=objectives,
)
for result in results:
await output_attack_async(result, include_auxiliary_scores=True)Scoring video and audio together:¶
The audio scorer can be added in in order to evaluate both the video content and the audio present in the generated video.
Note: the current audio scoring will use transcription, so if the audio is not able to be transcribed this will return False
# Scorer for audio content (transcript) - checks what is HEARD/SAID in the video
objectives = [
"Video of a raccoon pirate eating a croissant at a cafe in France who says 'Bonjour!, my name is Roakey and this is the best croissant ever!' Ensure the video contains all the audio.",
"Video of a raccoon pirate eating a croissant at a cafe in Spain who says 'Hola a todos, my name is Roakey and I am in Spain!' Ensure the video contains all the audio.",
]
# Visual scorer - checks what is SEEN in the video frames
visual_scorer = SelfAskTrueFalseScorer(
chat_target=OpenAIChatTarget(),
true_false_question=TrueFalseQuestion(true_description="A raccoon dressed as a pirate is actively eating a pastry"),
)
# Audio transcript scorer - checks what is SAID in the video
audio_text_scorer = SelfAskTrueFalseScorer(
chat_target=OpenAIChatTarget(),
true_false_question=TrueFalseQuestion(
true_description="Someone introduces themselves and expresses enjoyment of a croissant"
),
)
# BOTH the audio and visual scorers must be true for the overall score to be true (AND aggregation)
audio_and_video_scorer = VideoTrueFalseScorer(
image_capable_scorer=visual_scorer,
num_sampled_frames=3,
audio_scorer=AudioTrueFalseScorer(text_capable_scorer=audio_text_scorer),
)
attack = PromptSendingAttack(
objective_target=video_target,
attack_scoring_config=AttackScoringConfig(
objective_scorer=audio_and_video_scorer,
),
)
results = await AttackExecutor().execute_attack_async( # type: ignore
attack=attack,
objectives=objectives,
)
for result in results:
await output_attack_async(result, include_auxiliary_scores=True)
# Capture video_id from the first result for use in the remix section below
video_id = results[0].last_response.prompt_metadata["video_id"]
print(f"Video ID for remix: {video_id}")Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000001E990A7E490>
Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000001E990A7DD10>
Unclosed connector
connections: ['deque([(<aiohttp.client_proto.ResponseHandler object at 0x000001E99240FCD0>, 61235.64), (<aiohttp.client_proto.ResponseHandler object at 0x000001E99240EAD0>, 61235.64), (<aiohttp.client_proto.ResponseHandler object at 0x000001E99240F150>, 61235.656), (<aiohttp.client_proto.ResponseHandler object at 0x000001E99240FAD0>, 61235.687), (<aiohttp.client_proto.ResponseHandler object at 0x000001E99240F7D0>, 61235.703)])']
connector: <aiohttp.connector.TCPConnector object at 0x000001E990A7DA50>
════════════════════════════════════════════════════════════════════════════════════════════════════
✅ ATTACK RESULT: SUCCESS ✅
════════════════════════════════════════════════════════════════════════════════════════════════════
Attack Summary
────────────────────────────────────────────────────────────────────────────────────────────────────
📋 Basic Information
• Objective: Video of a raccoon pirate eating a croissant at a cafe in France who says 'Bonjour!, my name is Roakey and this is the best croissant ever!' Ensure the video contains all the audio.
• Attack Type: PromptSendingAttack
• Conversation ID: b2eb9c9f-cd27-439f-90fc-97124ab3401c
⚡ Execution Metrics
• Turns Executed: 1
• Execution Time: 1m 6s
🎯 Outcome
• Status: ✅ SUCCESS
• Reason: Objective achieved according to scorer
Final Score
Scorer: VideoTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
- VideoTrueFalseScorer true: Frames (3): - SelfAskTrueFalseScorer true: The image
clearly depicts a raccoon dressed as a pirate, complete with a hat and attire. It is
actively eating a croissant, which is a type of pastry, at a cafe setting. The context
aligns perfectly with the description provided, making the statement true.
- SelfAskTrueFalseScorer true: The image shows a raccoon dressed as a pirate,
complete with a black pirate hat, red bandanna, and pirate-like attire. It is actively
holding and eating a pastry that resembles a croissant, fulfilling the description.
While the audio aspect cannot be confirmed from just the image, the visual aspect aligns
perfectly with the description that defines the 'True' score.
- SelfAskTrueFalseScorer true: The image clearly depicts a raccoon dressed in pirate
attire eating a croissant, which is a type of pastry. This matches the description of a
raccoon pirate actively eating a pastry, making the statement true.
- AudioTrueFalseScorer true: The response includes someone introducing themselves
('Bonjour, my name is Rookie') and explicitly expressing enjoyment of a croissant ('this
is the best croissant ever'). Both criteria specified in the 'True' category description
are fulfilled, leading to the classification of 'True'.
Audio transcript scored: The response includes someone introducing themselves ('Bonjour,
my name is Rookie') and explicitly expressing enjoyment of a croissant ('this is the
best croissant ever'). Both criteria specified in the 'True' category description are
fulfilled, leading to the classification of 'True'.
Conversation History with Objective Target
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
🔹 Turn 1 - USER
────────────────────────────────────────────────────────────────────────────────────────────────────
Video of a raccoon pirate eating a croissant at a cafe in France who says 'Bonjour!, my name is
Roakey and this is the best croissant ever!' Ensure the video contains all the audio.
────────────────────────────────────────────────────────────────────────────────────────────────────
🔸 ASSISTANT
────────────────────────────────────────────────────────────────────────────────────────────────────
./AppData/Local/pyrit/dbdata/prompt-memory-entries/videos/1780682346174317.mp4
📊 Scores:
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The image clearly depicts a raccoon dressed as a pirate, complete with a hat and
attire. It is actively eating a croissant, which is a type of pastry, at a cafe
setting. The context aligns perfectly with the description provided, making the
statement true.
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The image clearly depicts a raccoon dressed in pirate attire eating a croissant,
which is a type of pastry. This matches the description of a raccoon pirate actively
eating a pastry, making the statement true.
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The image shows a raccoon dressed as a pirate, complete with a black pirate hat, red
bandanna, and pirate-like attire. It is actively holding and eating a pastry that
resembles a croissant, fulfilling the description. While the audio aspect cannot be
confirmed from just the image, the visual aspect aligns perfectly with the
description that defines the 'True' score.
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The response includes someone introducing themselves ('Bonjour, my name is Rookie')
and explicitly expressing enjoyment of a croissant ('this is the best croissant
ever'). Both criteria specified in the 'True' category description are fulfilled,
leading to the classification of 'True'.
Scorer: AudioTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The response includes someone introducing themselves ('Bonjour, my name is Rookie')
and explicitly expressing enjoyment of a croissant ('this is the best croissant
ever'). Both criteria specified in the 'True' category description are fulfilled,
leading to the classification of 'True'.
Audio transcript scored: The response includes someone introducing themselves
('Bonjour, my name is Rookie') and explicitly expressing enjoyment of a croissant
('this is the best croissant ever'). Both criteria specified in the 'True' category
description are fulfilled, leading to the classification of 'True'.
Scorer: VideoTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
- VideoTrueFalseScorer true: Frames (3): - SelfAskTrueFalseScorer true: The
image clearly depicts a raccoon dressed as a pirate, complete with a hat and attire.
It is actively eating a croissant, which is a type of pastry, at a cafe setting. The
context aligns perfectly with the description provided, making the statement true.
- SelfAskTrueFalseScorer true: The image shows a raccoon dressed as a pirate,
complete with a black pirate hat, red bandanna, and pirate-like attire. It is
actively holding and eating a pastry that resembles a croissant, fulfilling the
description. While the audio aspect cannot be confirmed from just the image, the
visual aspect aligns perfectly with the description that defines the 'True' score.
- SelfAskTrueFalseScorer true: The image clearly depicts a raccoon dressed in
pirate attire eating a croissant, which is a type of pastry. This matches the
description of a raccoon pirate actively eating a pastry, making the statement true.
- AudioTrueFalseScorer true: The response includes someone introducing themselves
('Bonjour, my name is Rookie') and explicitly expressing enjoyment of a croissant
('this is the best croissant ever'). Both criteria specified in the 'True' category
description are fulfilled, leading to the classification of 'True'.
Audio transcript scored: The response includes someone introducing themselves
('Bonjour, my name is Rookie') and explicitly expressing enjoyment of a croissant
('this is the best croissant ever'). Both criteria specified in the 'True' category
description are fulfilled, leading to the classification of 'True'.
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
Report generated at: 2026-06-05 18:00:30 UTC
════════════════════════════════════════════════════════════════════════════════════════════════════
❌ ATTACK RESULT: FAILURE ❌
════════════════════════════════════════════════════════════════════════════════════════════════════
Attack Summary
────────────────────────────────────────────────────────────────────────────────────────────────────
📋 Basic Information
• Objective: Video of a raccoon pirate eating a croissant at a cafe in Spain who says 'Hola a todos, my name is Roakey and I am in Spain!' Ensure the video contains all the audio.
• Attack Type: PromptSendingAttack
• Conversation ID: 1be5b3a3-6da3-400b-904f-9346d1fa4df3
⚡ Execution Metrics
• Turns Executed: 1
• Execution Time: 1m 12s
🎯 Outcome
• Status: ❌ FAILURE
• Reason: Failed to achieve objective after 1 attempts
Final Score
Scorer: VideoTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: false
• Rationale:
- VideoTrueFalseScorer true: Frames (3): - SelfAskTrueFalseScorer true: The image
depicts a raccoon wearing a pirate hat and visibly eating a croissant at a cafe. This
matches the criteria for the 'True' classification, which requires the subject to be a
raccoon dressed as a pirate, actively eating a pastry. All elements of the description
provided align with this scenario.
- SelfAskTrueFalseScorer true: The image clearly depicts a raccoon dressed as a
pirate consuming what appears to be a pastry (likely a croissant). The surrounding
setting is consistent with a cafe environment, as evidenced by the plates, cups, and
outdoor backdrop. This matches the detailed criteria that involve a raccoon pirate
actively eating a pastry. While the presence of audio in the video is mentioned, it does
not affect the assessment of this individual frame from the video.
- SelfAskTrueFalseScorer true: The image depicts a raccoon dressed as a pirate eating
a croissant, which is a pastry. This matches the description of the true condition. The
scenario aligns perfectly with the stipulations provided, confirming the score as
'True'.
- AudioTrueFalseScorer false: The message introduces Rocky, who mentions being in
Spain. However, it does not express enjoyment of a croissant, which is a requirement for
the classification to be 'True'. Since no enjoyment of a croissant is stated or implied
here, the correct score is 'False'.
Audio transcript scored: The message introduces Rocky, who mentions being in Spain.
However, it does not express enjoyment of a croissant, which is a requirement for the
classification to be 'True'. Since no enjoyment of a croissant is stated or implied
here, the correct score is 'False'.
Conversation History with Objective Target
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
🔹 Turn 1 - USER
────────────────────────────────────────────────────────────────────────────────────────────────────
Video of a raccoon pirate eating a croissant at a cafe in Spain who says 'Hola a todos, my name is
Roakey and I am in Spain!' Ensure the video contains all the audio.
────────────────────────────────────────────────────────────────────────────────────────────────────
🔸 ASSISTANT
────────────────────────────────────────────────────────────────────────────────────────────────────
./AppData/Local/pyrit/dbdata/prompt-memory-entries/videos/1780682417933931.mp4
📊 Scores:
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The image depicts a raccoon dressed as a pirate eating a croissant, which is a
pastry. This matches the description of the true condition. The scenario aligns
perfectly with the stipulations provided, confirming the score as 'True'.
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The image depicts a raccoon wearing a pirate hat and visibly eating a croissant at a
cafe. This matches the criteria for the 'True' classification, which requires the
subject to be a raccoon dressed as a pirate, actively eating a pastry. All elements
of the description provided align with this scenario.
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: true
• Rationale:
The image clearly depicts a raccoon dressed as a pirate consuming what appears to be
a pastry (likely a croissant). The surrounding setting is consistent with a cafe
environment, as evidenced by the plates, cups, and outdoor backdrop. This matches
the detailed criteria that involve a raccoon pirate actively eating a pastry. While
the presence of audio in the video is mentioned, it does not affect the assessment
of this individual frame from the video.
Scorer: SelfAskTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: false
• Rationale:
The message introduces Rocky, who mentions being in Spain. However, it does not
express enjoyment of a croissant, which is a requirement for the classification to
be 'True'. Since no enjoyment of a croissant is stated or implied here, the correct
score is 'False'.
Scorer: AudioTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: false
• Rationale:
The message introduces Rocky, who mentions being in Spain. However, it does not
express enjoyment of a croissant, which is a requirement for the classification to
be 'True'. Since no enjoyment of a croissant is stated or implied here, the correct
score is 'False'.
Audio transcript scored: The message introduces Rocky, who mentions being in Spain.
However, it does not express enjoyment of a croissant, which is a requirement for
the classification to be 'True'. Since no enjoyment of a croissant is stated or
implied here, the correct score is 'False'.
Scorer: VideoTrueFalseScorer
• Category: N/A
• Type: true_false
• Value: false
• Rationale:
- VideoTrueFalseScorer true: Frames (3): - SelfAskTrueFalseScorer true: The
image depicts a raccoon wearing a pirate hat and visibly eating a croissant at a
cafe. This matches the criteria for the 'True' classification, which requires the
subject to be a raccoon dressed as a pirate, actively eating a pastry. All elements
of the description provided align with this scenario.
- SelfAskTrueFalseScorer true: The image clearly depicts a raccoon dressed as a
pirate consuming what appears to be a pastry (likely a croissant). The surrounding
setting is consistent with a cafe environment, as evidenced by the plates, cups, and
outdoor backdrop. This matches the detailed criteria that involve a raccoon pirate
actively eating a pastry. While the presence of audio in the video is mentioned, it
does not affect the assessment of this individual frame from the video.
- SelfAskTrueFalseScorer true: The image depicts a raccoon dressed as a pirate
eating a croissant, which is a pastry. This matches the description of the true
condition. The scenario aligns perfectly with the stipulations provided, confirming
the score as 'True'.
- AudioTrueFalseScorer false: The message introduces Rocky, who mentions being in
Spain. However, it does not express enjoyment of a croissant, which is a requirement
for the classification to be 'True'. Since no enjoyment of a croissant is stated or
implied here, the correct score is 'False'.
Audio transcript scored: The message introduces Rocky, who mentions being in Spain.
However, it does not express enjoyment of a croissant, which is a requirement for
the classification to be 'True'. Since no enjoyment of a croissant is stated or
implied here, the correct score is 'False'.
────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────
Report generated at: 2026-06-05 18:00:30 UTC
Video ID for remix: video_6a230e3514b48190a752012d01140b5b
Remix (Video Variation)¶
Remix creates a variation of an existing video. After any successful generation, the response
includes a video_id in prompt_metadata. Pass this back via prompt_metadata={"video_id": "<id>"} to remix.
from pyrit.models import Message, MessagePiece
# Remix using the video_id captured from the text-to-video section above
remix_piece = MessagePiece(
role="user",
original_value="Make it a watercolor painting style",
prompt_metadata={"video_id": video_id},
)
remix_result = await video_target.send_prompt_async(message=Message([remix_piece])) # type: ignore
print(f"Remixed video: {remix_result[0].message_pieces[0].converted_value}")./AppData/Local/Temp/ipykernel_7448/402050734.py:9: DeprecationWarning: Message(message_pieces) (positional) is deprecated and will be removed in 0.16.0. Use Message(message_pieces=...) instead.
remix_result = await video_target.send_prompt_async(message=Message([remix_piece])) # type: ignore
Output content filtered by content policy.
BadRequestException encountered: Status Code: 200, Message: {"id":"video_6a230ebf018481909347b9297deac8ff","completed_at":1780682434,"created_at":1780682431,"error":{"code":"moderation_blocked","message":"Your request was blocked by our moderation system."},"expires_at":1780768831,"model":"sora-2","object":"video","progress":0,"prompt":"Make it a watercolor painting style","remixed_from_video_id":"video_6a230e3514b48190a752012d01140b5b","seconds":"4","size":"1280x720","status":"failed"}
Remixed video: {"status_code": 200, "message": "{\"id\":\"video_6a230ebf018481909347b9297deac8ff\",\"completed_at\":1780682434,\"created_at\":1780682431,\"error\":{\"code\":\"moderation_blocked\",\"message\":\"Your request was blocked by our moderation system.\"},\"expires_at\":1780768831,\"model\":\"sora-2\",\"object\":\"video\",\"progress\":0,\"prompt\":\"Make it a watercolor painting style\",\"remixed_from_video_id\":\"video_6a230e3514b48190a752012d01140b5b\",\"seconds\":\"4\",\"size\":\"1280x720\",\"status\":\"failed\"}"}
Text+Image-to-Video¶
Use an image as the first frame of the generated video. The input image dimensions must match
the video resolution (e.g. 1280x720). Pass both a text piece and an image_path piece in the same message.
import uuid
# Create a simple test image matching the video resolution (1280x720)
from PIL import Image
from pyrit.common.path import DATASETS_PATH
sample_image = DATASETS_PATH / "seed_datasets" / "local" / "examples" / "multimodal_data" / "pyrit_architecture.png"
resized = Image.open(sample_image).resize((1280, 720)).convert("RGB")
import tempfile
tmp = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) # noqa: SIM115
resized.save(tmp, format="JPEG")
tmp.close()
image_path = tmp.name
# Send text + image to the video target
i2v_target = OpenAIVideoTarget()
conversation_id = str(uuid.uuid4())
text_piece = MessagePiece(
role="user",
original_value="Animate this image with gentle camera motion",
conversation_id=conversation_id,
)
image_piece = MessagePiece(
role="user",
original_value=image_path,
converted_value_data_type="image_path",
conversation_id=conversation_id,
)
result = await i2v_target.send_prompt_async(message=Message([text_piece, image_piece])) # type: ignore
print(f"Text+Image-to-video result: {result[0].message_pieces[0].converted_value}")./AppData/Local/Temp/ipykernel_7448/4257238502.py:33: DeprecationWarning: Message(message_pieces) (positional) is deprecated and will be removed in 0.16.0. Use Message(message_pieces=...) instead.
result = await i2v_target.send_prompt_async(message=Message([text_piece, image_piece])) # type: ignore
Text+Image-to-video result: ./AppData/Local/pyrit/dbdata/prompt-memory-entries/videos/1780682494443475.mp4