
Videos as Inputs

While most LLMs do not support videos natively, you can still use videos in scripts by rendering frames and adding them to the prompt as images. Doing this by hand is tedious, so GenAIScript provides helpers to streamline the process.

ffmpeg configuration

The video rendering and analysis helpers rely on the ffmpeg and ffprobe command-line tools.

On Debian or Ubuntu, you can install both tools with:

sudo apt-get update && sudo apt-get install ffmpeg

Make sure both tools are installed locally and available in your PATH, or set the FFMPEG_PATH and FFPROBE_PATH environment variables to point to the ffmpeg and ffprobe executables.
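To make the lookup order concrete, here is a minimal sketch of how a tool could resolve the ffmpeg or ffprobe executable: an explicit environment variable wins, otherwise each directory on the PATH is searched in order. The `resolveTool` helper and its injectable `exists` check are hypothetical, not GenAIScript's actual code, and the sketch assumes a POSIX-style `:`-separated PATH (Windows uses `;` and `.exe` suffixes).

```js
// Hypothetical helper, not GenAIScript's actual lookup code.
// Resolves an executable by checking an override environment variable first,
// then scanning the directories of a POSIX-style PATH (":"-separated).
function resolveTool(name, envVar, env, exists) {
    // 1. An explicit override such as FFMPEG_PATH wins if it is set.
    const override = env[envVar]
    if (override) return override
    // 2. Otherwise look for the executable in each PATH directory, in order.
    const dirs = (env.PATH || "").split(":").filter((d) => d.length > 0)
    for (const dir of dirs) {
        const candidate = dir.endsWith("/") ? dir + name : `${dir}/${name}`
        if (exists(candidate)) return candidate
    }
    // 3. Not found: the caller should report a configuration error.
    return undefined
}

// Usage with a fake file system so the example is deterministic.
const fakeFiles = new Set(["/usr/local/bin/ffprobe", "/usr/bin/ffmpeg"])
const exists = (p) => fakeFiles.has(p)
const env = { PATH: "/usr/local/bin:/usr/bin" }

console.log(resolveTool("ffmpeg", "FFMPEG_PATH", env, exists)) // → /usr/bin/ffmpeg
console.log(resolveTool("ffprobe", "FFPROBE_PATH", env, exists)) // → /usr/local/bin/ffprobe
```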

Extracting frames

As mentioned above, multi-modal LLMs typically accept images rather than video, so a video has to be passed as a sequence of frames (or screenshots).

The ffmpeg.extractFrames function renders frames from a video file and returns them as an array of file paths, which you can pass directly to defImages.

By default, it extracts keyframes (intra-frames):

const frames = await ffmpeg.extractFrames("path_to_video")
defImages(frames)

Specify the number of frames to extract using count:

const frames = await ffmpeg.extractFrames("...", { count: 10 })
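A minimal sketch of one way a frame `count` can be turned into timestamps, assuming frames are spread evenly over the video and the very first and last instants are skipped. `evenlySpacedTimestamps` is a hypothetical helper; GenAIScript may choose frames differently.

```js
// Hypothetical helper: split the video into count + 1 equal intervals and
// take the interior boundaries as frame timestamps (in seconds).
function evenlySpacedTimestamps(durationSeconds, count) {
    if (!(durationSeconds > 0)) throw new RangeError("duration must be positive")
    if (!Number.isInteger(count) || count < 1) {
        throw new RangeError("count must be a positive integer")
    }
    const timestamps = []
    for (let i = 1; i <= count; i++) {
        // Multiply before dividing to keep round values exact where possible.
        timestamps.push((durationSeconds * i) / (count + 1))
    }
    return timestamps
}

console.log(evenlySpacedTimestamps(100, 4)) // → [ 20, 40, 60, 80 ]
```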

Specify timestamps in seconds, as time strings, or as percentages of the video duration using timestamps (or times):

const frames = await ffmpeg.extractFrames("...", {
    timestamps: ["00:00", "05:00"],
})
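The timestamp forms above can all be normalized to seconds. The sketch below is a hypothetical helper, `timestampToSeconds`, written under the assumption that plain numbers are seconds, strings ending in `%` are fractions of the duration, and colon-separated strings such as `"05:00"` are clock times (`mm:ss` or `hh:mm:ss`).

```js
// Hypothetical helper: converts a timestamp into seconds.
//   - a number is taken as seconds
//   - "NN%" is a percentage of the video duration
//   - "ss", "mm:ss" or "hh:mm:ss(.fff)" is a clock-style time string
function timestampToSeconds(timestamp, durationSeconds) {
    if (typeof timestamp === "number") return timestamp
    const text = String(timestamp).trim()
    if (text.endsWith("%")) {
        if (durationSeconds === undefined) {
            throw new Error("percentages need the video duration")
        }
        return (parseFloat(text) / 100) * durationSeconds
    }
    // Fold each ":"-separated field into the running total (base 60).
    return text.split(":").reduce((total, part) => {
        const value = Number(part)
        if (Number.isNaN(value)) throw new Error(`invalid timestamp: ${timestamp}`)
        return total * 60 + value
    }, 0)
}

const duration = 600 // a 10-minute video
console.log(["00:00", "05:00", "50%", 12].map((t) => timestampToSeconds(t, duration)))
// → [ 0, 300, 300, 12 ]
```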

Pass a transcript computed by the transcribe function; GenAIScript extracts a frame at the start of each segment:

const transcript = await transcribe("...")
const frames = await ffmpeg.extractFrames("...", { transcript })
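Conceptually, the transcript option turns each segment's start time into a frame timestamp. This sketch assumes a transcript shaped like `{ segments: [{ start, end, text }] }` with times in seconds; both that shape and the `segmentStartTimestamps` helper are illustrative assumptions rather than GenAIScript's API.

```js
// Hypothetical sketch: derive one frame timestamp per transcript segment.
// The transcript shape used here is an assumption made for illustration.
function segmentStartTimestamps(transcript) {
    const starts = (transcript.segments || []).map((segment) => segment.start)
    // Drop duplicates and sort so each frame is rendered once, in order.
    return [...new Set(starts)].sort((a, b) => a - b)
}

const transcript = {
    segments: [
        { start: 0, end: 4.2, text: "Welcome to the demo." },
        { start: 4.2, end: 9, text: "First, open the editor." },
        { start: 9, end: 15.5, text: "Then run the script." },
    ],
}
console.log(segmentStartTimestamps(transcript)) // → [ 0, 4.2, 9 ]
```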

Specify a scene-change threshold (between 0 and 1) using sceneThreshold:

const frames = await ffmpeg.extractFrames("...", { sceneThreshold: 0.3 })
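For reference, plain ffmpeg detects scene changes with the `select` filter and its `scene` score, which also ranges from 0 to 1. The hypothetical `sceneChangeFrameArgs` helper below builds such a command line; whether GenAIScript's `sceneThreshold` maps onto exactly this filter is an assumption. Recent ffmpeg releases also offer `-fps_mode vfr` as the newer spelling of `-vsync vfr`.

```js
// Hypothetical helper: builds a plain ffmpeg argument list that keeps only
// frames whose scene-change score exceeds the threshold. This is the standard
// ffmpeg `select` filter recipe, not necessarily what GenAIScript runs.
function sceneChangeFrameArgs(input, outputPattern, sceneThreshold) {
    if (!(sceneThreshold >= 0 && sceneThreshold <= 1)) {
        throw new RangeError("sceneThreshold must be between 0 and 1")
    }
    return [
        "-i", input,
        // gt(scene, t) selects frames whose scene-change score is above t.
        "-vf", `select='gt(scene,${sceneThreshold})'`,
        // Write only the selected frames instead of padding to a constant rate.
        "-vsync", "vfr",
        outputPattern,
    ]
}

const args = sceneChangeFrameArgs("video.mp4", "frame_%03d.jpg", 0.3)
console.log(["ffmpeg", ...args].join(" "))
// → ffmpeg -i video.mp4 -vf select='gt(scene,0.3)' -vsync vfr frame_%03d.jpg
```

Because the arguments are passed as an array rather than through a shell, the single quotes are read by ffmpeg's filtergraph parser, which keeps the comma inside `gt(...)` from being treated as a filter separator.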

Extracting audio

The ffmpeg.extractAudio function extracts the audio track from a video file as a .wav file.

const audio = await ffmpeg.extractAudio("path_to_video")
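A comparable plain-ffmpeg command for extracting audio to .wav is sketched below. The `extractAudioArgs` helper is hypothetical, and the 16 kHz mono PCM defaults are an assumption (a common choice for speech-to-text), not settings documented for GenAIScript.

```js
// Hypothetical helper: plain ffmpeg arguments that extract a video's audio
// track as an uncompressed 16-bit PCM .wav file.
function extractAudioArgs(input, output, { sampleRate = 16000, channels = 1 } = {}) {
    return [
        "-i", input,
        "-vn", // drop the video stream
        "-acodec", "pcm_s16le", // 16-bit little-endian PCM, the usual .wav codec
        "-ar", String(sampleRate), // audio sample rate in Hz
        "-ac", String(channels), // number of audio channels
        output,
    ]
}

console.log(["ffmpeg", ...extractAudioArgs("video.mp4", "audio.wav")].join(" "))
// → ffmpeg -i video.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
```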

When you call transcribe on a video file, the audio is extracted automatically.

Extracting clips

You can extract a clip from a video file using ffmpeg.extractClip.

const clip = await ffmpeg.extractClip("path_to_video", {
    start: "00:00:10",
    duration: 5,
})
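In plain ffmpeg terms, a clip defined by a start time and a duration corresponds to the `-ss` and `-t` options. The `extractClipArgs` helper below is a hypothetical sketch of that mapping, not GenAIScript's implementation.

```js
// Hypothetical helper: ffmpeg arguments for cutting a clip from a start time
// and a duration in seconds.
function extractClipArgs(input, output, { start, duration }) {
    return [
        "-ss", String(start), // seek to the clip start (placed before -i: input seeking)
        "-i", input,
        "-t", String(duration), // keep this many seconds
        output,
    ]
}

const clipArgs = extractClipArgs("video.mp4", "clip.mp4", { start: "00:00:10", duration: 5 })
console.log(["ffmpeg", ...clipArgs].join(" "))
// → ffmpeg -ss 00:00:10 -i video.mp4 -t 5 clip.mp4
```

With these values the clip covers 00:00:10 to 00:00:15.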

Probing videos

You can extract metadata from a video file using ffmpeg.probe.

const info = await ffmpeg.probe("path_to_video")
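// streams[0] is not necessarily the video stream, so look it up by codec type
const video = info.streams.find((s) => s.codec_type === "video")
console.log(`video duration: ${video?.duration} seconds`)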
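Durations are easier to work with once normalized to numbers. This sketch assumes the probe data follows ffprobe's JSON layout (as produced by `ffprobe -print_format json -show_format -show_streams`), where durations are reported as strings; the `probeDurationSeconds` helper is hypothetical. It prefers the container-level `format.duration` and falls back to the first video stream.

```js
// Hypothetical helper: reads a duration in seconds from ffprobe-style JSON.
function probeDurationSeconds(probe) {
    // The container-level duration is usually the most dependable value.
    const fromFormat = Number(probe.format?.duration)
    if (Number.isFinite(fromFormat)) return fromFormat
    // Otherwise fall back to the first video stream (streams[0] may be audio).
    const video = (probe.streams || []).find((s) => s.codec_type === "video")
    const fromStream = Number(video?.duration)
    return Number.isFinite(fromStream) ? fromStream : undefined
}

const sample = {
    streams: [
        { codec_type: "audio", duration: "12.480000" },
        { codec_type: "video", duration: "12.500000" },
    ],
    format: { duration: "12.500000" },
}
console.log(probeDurationSeconds(sample)) // → 12.5
```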

Custom ffmpeg options

You can further customize the ffmpeg invocation by passing outputOptions. For example, to set the audio bitrate to 16 kbit/s:

const audio = await ffmpeg.extractAudio("path_to_video", {
    outputOptions: "-b:a 16k",
})
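Output options are ultimately extra ffmpeg arguments. The hypothetical `normalizeOutputOptions` helper below shows one way a single string such as `"-b:a 16k"`, or an array of such strings, could be flattened into an argument list; splitting on whitespace is an assumption, and values that contain spaces are not supported.

```js
// Hypothetical helper: flattens output options into a flat ffmpeg argument list.
function normalizeOutputOptions(outputOptions) {
    if (outputOptions == null) return []
    const list = Array.isArray(outputOptions) ? outputOptions : [outputOptions]
    return list
        .flatMap((option) => option.trim().split(/\s+/)) // "-b:a 16k" → ["-b:a", "16k"]
        .filter((part) => part.length > 0) // drop empties from blank entries
}

console.log(normalizeOutputOptions("-b:a 16k")) // → [ '-b:a', '16k' ]
console.log(normalizeOutputOptions(["-b:a 16k", "-ac 1"])) // → [ '-b:a', '16k', '-ac', '1' ]
```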

You can also work directly with the ffmpeg command builder, which is the native fluent-ffmpeg command object. In this case, you should also provide a cache key (hash) so that the output is not re-rendered on every run.

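const custom = await ffmpeg.run(
    "src/audio/helloworld.mp4",
    (cmd) => {
        // configure the fluent-ffmpeg command directly
        cmd.noAudio() // drop the audio track
        cmd.keepDisplayAspectRatio() // keep the input's display aspect ratio
        cmd.autopad() // pad to the target size instead of stretching
        cmd.size(`200x200`)
        // output file name
        return "out.mp4"
    },
    // cache key identifying this transformation, to avoid re-rendering
    { cache: "kar-200x200" }
)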
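Presumably the explicit key is needed because a callback cannot be reliably fingerprinted, so the caller names the transformation instead. The in-memory `createCachedRunner` below is a hypothetical sketch of that idea: a result is rendered once per input and cache key, and re-rendered every time when no key is given. GenAIScript's real cache is not modeled here and may persist results differently.

```js
// Hypothetical in-memory sketch of keyed caching for custom renders.
function createCachedRunner() {
    const cache = new Map()
    return function run(input, render, { cache: key } = {}) {
        // Without a key there is nothing safe to cache on: always re-render.
        if (!key) return render(input)
        const cacheKey = `${input}::${key}`
        if (!cache.has(cacheKey)) cache.set(cacheKey, render(input))
        return cache.get(cacheKey)
    }
}

let renders = 0
const run = createCachedRunner()
const resize = (input) => {
    renders++ // count how often the expensive work actually happens
    return `${input}.kar-200x200.mp4`
}
run("helloworld.mp4", resize, { cache: "kar-200x200" })
run("helloworld.mp4", resize, { cache: "kar-200x200" })
console.log(renders) // → 1
```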

CLI

The CLI provides commands to run these video transformations, for example to probe a video:

genaiscript video probe myvid.mp4