
Audio Transcription
GenAIScript supports transcription and translation with OpenAI-compatible APIs.
const { text } = await transcribe("video.mp4")
Configuration
The transcription API automatically uses ffmpeg to convert videos to audio files (Opus codec in an Ogg container).
You need to install ffmpeg on your system. If the FFMPEG_PATH
environment variable is set,
GenAIScript will use it as the full path to the ffmpeg executable.
Otherwise, it will attempt to call ffmpeg directly
(so it should be in your PATH).
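The resolution order described above can be sketched as follows (an illustrative approximation, not GenAIScript's actual implementation; `resolveFfmpeg` is a hypothetical helper):

```javascript
// Sketch of how the ffmpeg executable is located:
// FFMPEG_PATH takes precedence; otherwise fall back to the system PATH.
function resolveFfmpeg(env) {
  // When set, FFMPEG_PATH is treated as the full path to the executable.
  if (env.FFMPEG_PATH) return env.FFMPEG_PATH
  // Otherwise call "ffmpeg" directly and rely on PATH lookup.
  return "ffmpeg"
}

console.log(resolveFfmpeg({ FFMPEG_PATH: "/opt/ffmpeg/bin/ffmpeg" })) // → /opt/ffmpeg/bin/ffmpeg
console.log(resolveFfmpeg({})) // → ffmpeg
```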
By default, the API uses the transcription
model alias to transcribe the audio.
You can also specify a different model alias using the model
option.
const { text } = await transcribe("...", { model: "openai:whisper-1" })
Segments
For models that support it, you can retrieve the individual segments.
const { segments } = await transcribe("...")
for (const segment of segments) {
  const { start, text } = segment
  console.log(`[${start}] ${text}`)
}
SRT and VTT
GenAIScript renders the segments to SRT and WebVTT formats as well.
const { srt, vtt } = await transcribe("...")
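To illustrate what the SRT rendering produces, here is a self-contained sketch that formats segments into SRT cues (`toSrtTime` and `segmentsToSrt` are illustrative helpers, not GenAIScript's internal renderer; segments are assumed to carry `start`/`end` in seconds):

```javascript
// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds) {
  const ms = Math.round(seconds * 1000)
  const h = String(Math.floor(ms / 3600000)).padStart(2, "0")
  const m = String(Math.floor((ms % 3600000) / 60000)).padStart(2, "0")
  const s = String(Math.floor((ms % 60000) / 1000)).padStart(2, "0")
  const frac = String(ms % 1000).padStart(3, "0")
  return `${h}:${m}:${s},${frac}`
}

// Render segments as numbered SRT cues separated by blank lines.
function segmentsToSrt(segments) {
  return segments
    .map((seg, i) => `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n${seg.text}`)
    .join("\n\n")
}

console.log(segmentsToSrt([{ start: 0, end: 2.5, text: "Hello" }]))
// prints:
// 1
// 00:00:00,000 --> 00:00:02,500
// Hello
```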
Translation
Some models also support transcribing and translating to English in a single pass. In this case, set the translate: true flag.
const { srt } = await transcribe("...", { translate: true })
Cache
You can cache the transcription results by setting the cache option to true (or a custom name).
const { srt } = await transcribe("...", { cache: true })
or a custom salt
const { srt } = await transcribe("...", { cache: "whisper" })
VTT, SRT parsers
You can parse VTT and SRT files using the parsers.transcription function.
const segments = parsers.transcription("WEBVTT...")
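To show the kind of structure such a parser recovers, here is a minimal, self-contained WebVTT cue parser (a simplified sketch, not GenAIScript's parsers.transcription; `parseVtt` is a hypothetical helper and many WebVTT features are ignored):

```javascript
// Parse WebVTT text into { start, end, text } cue objects.
// Blocks are separated by blank lines; a cue block contains a "-->" timing line.
function parseVtt(vtt) {
  const segments = []
  for (const block of vtt.split(/\n\n+/)) {
    const lines = block.trim().split("\n")
    const timing = lines.findIndex((l) => l.includes("-->"))
    if (timing < 0) continue // skip the WEBVTT header and note blocks
    const [start, end] = lines[timing].split("-->").map((s) => s.trim())
    segments.push({ start, end, text: lines.slice(timing + 1).join("\n") })
  }
  return segments
}

const segs = parseVtt("WEBVTT\n\n00:00.000 --> 00:02.500\nHello world")
console.log(segs[0].text) // → Hello world
```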