Video Alt Text
GenAIScript supports speech transcription and video frame extraction, which can be combined to analyze videos.
Video Alt Text
The HTML video element does not have an alt attribute, but you can still attach an accessible description using the aria-label attribute.
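For reference, a generated description can be attached to the element from browser code; a minimal sketch (the selector and description string are placeholders, not produced by GenAIScript):
// sketch: attach an accessible description to a <video> element via aria-label
// (selector and description text are placeholders)
const video = document.querySelector("video")
video.setAttribute("aria-label", "Short description of the video content")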
We will build a script that generates the description using the transcript and video frames.
Transcript
We use the transcribe function to generate the transcript. It uses the transcription model alias to compute the transcription; for OpenAI, it defaults to openai:whisper-1.
Transcriptions help reduce LLM hallucinations when analyzing images and also provide good timestamp candidates for screenshotting the video stream.
const file = env.files[0]
const transcript = await transcribe(file) // OpenAI whisper
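Before wiring the transcript into the prompt, it can help to inspect the result; a minimal sketch, assuming the transcription result exposes text, srt, and timestamped segments fields (check the transcribe reference for the exact shape):
// sketch: inspect the transcription result (field names assumed; verify against the API reference)
console.log(transcript?.text) // full transcription text
console.log(transcript?.srt) // SRT rendering, used as prompt context below
for (const segment of transcript?.segments ?? [])
    console.log(segment.start, segment.text) // timestamped segments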
Video Frames
The next step is to use the transcript to screenshot the video stream. GenAIScript uses ffmpeg to render the frames, so make sure you have it installed and configured.
const frames = await ffmpeg.extractFrames(file, {
    transcript,
})
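For silent videos where the transcript is empty, frames can instead be sampled at fixed points; a sketch assuming extractFrames also accepts a count option (the option name is an assumption, verify it against the ffmpeg helper reference):
// sketch: sample a fixed number of frames when no transcript is available
// (the `count` option name is an assumption; check the ffmpeg helper docs)
const fallbackFrames = await ffmpeg.extractFrames(file, { count: 5 })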
Context
Both the transcript and the frames are added to the prompt context. Since some videos may be silent, we ignore empty transcripts. We also use low detail for the frames to improve performance.
def("TRANSCRIPT", transcript?.srt, { ignoreEmpty: true }) // ignore silent videosdefImages(frames, { detail: "low" }) // low detail for better performance
Prompting it together
Finally, we give the LLM the task of generating the alt text.
$`You are an expert in assistive technology.
You will analyze the video and generate a description alt text for the video.`
Using this script, you can automatically generate high-quality alt text for videos.
genaiscript run video-alt-text path_to_video.mp4
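The script can also be invoked programmatically; a minimal sketch assuming the genaiscript/api Node.js entry point and its run helper (see the automation docs for the exact signature and result shape):
// sketch: run the script from Node.js (entry point and result fields assumed; verify in the automation docs)
import { run } from "genaiscript/api"
const result = await run("video-alt-text", ["path_to_video.mp4"])
console.log(result?.text) // generated alt text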
Full source
script({
    description: "Generate a description alt text for a video",
    accept: ".mp4,.webm",
    system: [
        "system.output_plaintext",
        "system.safety_jailbreak",
        "system.safety_harmful_content",
        "system.safety_validate_harmful_content",
    ],
    files: "src/audio/helloworld.mp4",
    model: "vision",
})
const file = env.files[0]
const transcript = await transcribe(file, { cache: "alt-text" }) // OpenAI whisper
const frames = await ffmpeg.extractFrames(file, {
    transcript,
}) // ffmpeg to extract frames
def("TRANSCRIPT", transcript?.srt, { ignoreEmpty: true }) // ignore silent videosdefImages(frames, { detail: "low" }) // low detail for better performance
$`You are an expert in assistive technology.
You will analyze the video and generate a description alt text for the video.

- The video is included as a set of <FRAMES> images and the <TRANSCRIPT>.
- Do not include alt text in the description.
- Keep it short but descriptive.
- Do not generate the [ character.`