Video Alt Text
GenAIScript supports speech transcription and video frame extraction which can be combined to analyze videos.
The HTML video element does not support an alt attribute, but you can still attach an accessible description using the aria-label attribute.
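As a sketch of where the generated description ends up, the aria-label can be embedded in the video markup. The file name and description text below are hypothetical placeholders for the script's output:

```javascript
// Hypothetical example: attach a generated description to a <video> tag.
// The alt text string would come from the script's LLM output.
const altText = "A presenter walks through the login flow of a web app."
const html = `<video src="demo.mp4" aria-label="${altText}" controls></video>`
console.log(html)
```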
We will build a script that generates the description using the transcript and video frames.
Transcript
We use the transcribe function to generate the transcript. It uses the transcription model alias to compute the transcription; for OpenAI, this defaults to openai:whisper-1.
Transcripts help reduce LLM hallucinations when analyzing images, and their timestamps are good candidates for screenshotting the video stream.
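For context, the transcript's srt field is standard SubRip text, where each cue carries a start and end timestamp. A minimal sketch (not part of the GenAIScript API) of pulling start-time candidates out of an SRT string:

```javascript
// Minimal SRT timestamp extractor (illustrative only).
// Returns the start time of each cue in seconds.
function srtStartTimes(srt) {
  const re = /(\d{2}):(\d{2}):(\d{2}),(\d{3}) -->/g
  const times = []
  let m
  while ((m = re.exec(srt)) !== null) {
    const [, h, min, s, ms] = m
    times.push(Number(h) * 3600 + Number(min) * 60 + Number(s) + Number(ms) / 1000)
  }
  return times
}

const sample = `1
00:00:01,000 --> 00:00:03,500
Hello world

2
00:00:04,250 --> 00:00:06,000
Welcome to the demo`

console.log(srtStartTimes(sample)) // → [ 1, 4.25 ]
```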
const file = env.files[0]
const transcript = await transcribe(file) // OpenAI whisper
Video Frames
The next step is to use the transcript to screenshot the video stream. GenAIScript uses ffmpeg to render the frames so make sure you have it installed and configured.
const frames = await ffmpeg.extractFrames(file, { transcript })
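Under the hood, frame extraction seeks to timestamps taken from the transcript. A hypothetical helper (not part of GenAIScript) that formats a seconds offset into the HH:MM:SS.mmm form accepted by ffmpeg's -ss seek option:

```javascript
// Format a seconds offset as HH:MM:SS.mmm for ffmpeg's -ss option (illustrative).
function toFfmpegTimestamp(seconds) {
  const h = Math.floor(seconds / 3600)
  const m = Math.floor((seconds % 3600) / 60)
  const s = (seconds % 60).toFixed(3) // keep millisecond precision
  const pad = (n) => String(n).padStart(2, "0")
  return `${pad(h)}:${pad(m)}:${s.padStart(6, "0")}`
}

console.log(toFfmpegTimestamp(4.25)) // → "00:00:04.250"
```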
Context
Both the transcript and the frames are added to the prompt context. Since some videos may be silent, we ignore empty transcripts. We also use low detail for the frames to improve performance.
def("TRANSCRIPT", transcript?.srt, { ignoreEmpty: true }) // ignore silent videos
defImages(frames, { detail: "low" }) // low detail for better performance
Prompting it together
Finally, we give the task to the LLM to generate the alt text.
$`You are an expert in assistive technology.
You will analyze the video and generate a description alt text for the video.`
Using this script, you can automatically generate high-quality alt text for videos.
genaiscript run video-alt-text path_to_video.mp4
Full source
script({
    description: "Generate a description alt text for a video",
    accept: ".mp4,.webm",
    system: [
        "system.output_plaintext",
        "system.safety_jailbreak",
        "system.safety_harmful_content",
        "system.safety_validate_harmful_content",
    ],
    files: "src/audio/helloworld.mp4",
    model: "vision",
})
const file = env.files[0]
const transcript = await transcribe(file, { cache: "alt-text" }) // OpenAI whisper
const frames = await ffmpeg.extractFrames(file, { transcript }) // ffmpeg to extract frames
def("TRANSCRIPT", transcript?.srt, { ignoreEmpty: true }) // ignore silent videos
defImages(frames, { detail: "low" }) // low detail for better performance
$`You are an expert in assistive technology.
You will analyze the video and generate a description alt text for the video.

- The video is included as a set of <FRAMES> images and the <TRANSCRIPT>.
- Do not include alt text in the description.
- Keep it short but descriptive.
- Do not generate the [ character.`