Videos as Inputs
While most LLMs do not support videos natively, you can integrate videos into scripts by rendering frames and adding them as images to the prompt. Doing this by hand is tedious, so GenAIScript provides helpers to streamline the process.
ffmpeg configuration
The video rendering and analysis features rely on ffmpeg and ffprobe.
On Debian-based Linux distributions, you can install them with:
sudo apt-get update && sudo apt-get install ffmpeg
Make sure these tools are installed locally and available in your PATH, or configure the FFMPEG_PATH / FFPROBE_PATH environment variables to point to the ffmpeg / ffprobe executables.
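The lookup order described above can be sketched as a small helper. This is an illustration only, not GenAIScript's actual resolution code; `resolveVideoTools` is a hypothetical name introduced here.

```javascript
// Hypothetical helper: mirrors the documented fallback, not GenAIScript internals.
// Uses FFMPEG_PATH / FFPROBE_PATH when set, otherwise falls back to the bare
// command name so that the operating system resolves it through PATH.
function resolveVideoTools(env = process.env) {
    return {
        ffmpeg: env.FFMPEG_PATH || "ffmpeg",
        ffprobe: env.FFPROBE_PATH || "ffprobe",
    }
}

console.log(resolveVideoTools())
```

Printing the resolved values before a long run is a cheap way to confirm which binaries would be picked up.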
Extracting frames
As mentioned above, multi-modal LLMs typically handle videos as a sequence of image frames (or screenshots).
The ffmpeg.extractFrames function renders frames from a video file and returns them as an array of file paths. You can pass the result to defImages directly.
- by default, extract keyframes (intra-frames)
const frames = await ffmpeg.extractFrames("path_to_video")
defImages(frames)
- specify a number of frames using count
const frames = await ffmpeg.extractFrames("...", { count: 10 })
- specify timestamps in seconds or percentages of the video duration using timestamps (or times)
const frames = await ffmpeg.extractFrames("...", {
    timestamps: ["00:00", "05:00"],
})
- specify the transcript computed by the transcribe function. GenAIScript will extract a frame at the start of each segment.
const transcript = await transcribe("...")
const frames = await ffmpeg.extractFrames("...", { transcript })
- specify a scene threshold (between 0 and 1)
const frames = await ffmpeg.extractFrames("...", { sceneThreshold: 0.3 })
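When you want frames spread evenly across the video, you can compute the timestamps yourself. The sketch below assumes that timestamps accepts numbers in seconds, as described in the list above; `evenTimestamps` is a hypothetical helper, not part of GenAIScript.

```javascript
// Hypothetical helper: returns `count` timestamps (in seconds), each at the
// midpoint of an equal slice of the video, so the very first and very last
// frames are avoided.
function evenTimestamps(durationSeconds, count) {
    const slice = durationSeconds / count
    return Array.from({ length: count }, (_, i) => (i + 0.5) * slice)
}

// In a GenAIScript script, the result could then be passed along as:
//   const frames = await ffmpeg.extractFrames("path_to_video", { timestamps })
const timestamps = evenTimestamps(100, 4)
console.log(timestamps) // [ 12.5, 37.5, 62.5, 87.5 ]
```

Sampling slice midpoints rather than slice boundaries tends to skip black intro and outro frames, though the right sampling strategy depends on the footage.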
Extracting audio
The ffmpeg.extractAudio function extracts the audio from a video file as a .wav file.
const audio = await ffmpeg.extractAudio("path_to_video")
The conversion to audio happens automatically for videos when using transcribe.
Extracting clips
You can extract a clip from a video file using ffmpeg.extractClip.
const clip = await ffmpeg.extractClip("path_to_video", {
    start: "00:00:10",
    duration: 5,
})
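The example above passes start as an HH:MM:SS string. If your offsets are plain seconds, a small formatter keeps the call sites readable. `toTimestamp` is a hypothetical helper introduced for illustration and is not provided by GenAIScript.

```javascript
// Hypothetical helper: formats a number of seconds as "HH:MM:SS".
// Fractional seconds are dropped.
function toTimestamp(totalSeconds) {
    const s = Math.floor(totalSeconds)
    const pad = (n) => String(n).padStart(2, "0")
    const hours = Math.floor(s / 3600)
    const minutes = Math.floor((s % 3600) / 60)
    return `${pad(hours)}:${pad(minutes)}:${pad(s % 60)}`
}

console.log(toTimestamp(75)) // 00:01:15
```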
Probing videos
You can extract metadata from a video file using ffmpeg.probe.
const info = await ffmpeg.probe("path_to_video")
const { duration } = info.streams[0]
console.log(`video duration: ${duration} seconds`)
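Depending on the container, streams[0] may be an audio stream rather than the video stream. The sketch below picks the video stream explicitly and falls back to the container duration. It assumes the probe result follows the usual ffprobe layout (streams carrying codec_type and duration, plus format.duration); `videoDuration` is a hypothetical helper and the sample object is made up for illustration.

```javascript
// Assumes an ffprobe-style result: each stream has `codec_type` and
// `duration`, and `format.duration` holds the container duration.
function videoDuration(info) {
    const video = (info.streams || []).find((s) => s.codec_type === "video")
    // Try the video stream first, then the container; skip values such as "N/A".
    for (const raw of [video?.duration, info.format?.duration]) {
        const seconds = parseFloat(raw)
        if (Number.isFinite(seconds)) return seconds
    }
    return undefined
}

// Made-up probe result; real values would come from ffmpeg.probe.
const sample = {
    streams: [
        { codec_type: "audio", duration: "12.48" },
        { codec_type: "video", duration: "12.50" },
    ],
    format: { duration: "12.50" },
}
console.log(`video duration: ${videoDuration(sample)} seconds`)
```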
Custom ffmpeg options
You can further customize the ffmpeg configuration by passing outputOptions.
const audio = await ffmpeg.extractAudio("path_to_video", {
    outputOptions: "-b:a 16k",
})
Or interact directly with the ffmpeg command builder (which is the native fluent-ffmpeg command builder).
Note that in this case, you should also provide a cache “hash” to avoid re-rendering.
const custom = await ffmpeg.run(
    "src/audio/helloworld.mp4",
    (cmd) => {
        cmd.noAudio()
        cmd.keepDisplayAspectRatio()
        cmd.autopad()
        cmd.size(`200x200`)
        return "out.mp4"
    },
    { cache: "kar-200x200" }
)
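Because the cache value decides whether a previous render is reused, it helps to derive it from every parameter that affects the output. The following is one possible convention, not something GenAIScript prescribes; `cacheKey` is a hypothetical helper.

```javascript
// Hypothetical helper: builds a stable cache key from a prefix and the
// options that influence the rendered output. Keys are sorted so that the
// result does not depend on property order.
function cacheKey(prefix, options) {
    const parts = Object.keys(options)
        .sort()
        .map((name) => `${name}_${options[name]}`)
    return [prefix, ...parts].join("-")
}

console.log(cacheKey("kar", { size: "200x200", audio: false }))
// kar-audio_false-size_200x200
```

Changing any option then yields a different key, so a stale render is less likely to be served by accident.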
CLI
The CLI supports various commands to run the video transformations.
genaiscript video probe myvid.mp4