# Audio Transcription
GenAIScript supports transcription and translation from OpenAI-like APIs.

```js
const { text } = await transcribe("video.mp4")
```
## Configuration
The transcription API automatically uses ffmpeg to convert videos to audio files (Opus codec in an Ogg container).

You need to install ffmpeg on your system. If the `FFMPEG_PATH` environment variable is set, GenAIScript uses it as the full path to the ffmpeg executable. Otherwise, it attempts to call `ffmpeg` directly, so it must be in your `PATH`.
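The lookup order described above can be sketched as follows. `resolveFfmpegPath` is a hypothetical helper for illustration only; GenAIScript's actual resolution logic may differ.

```js
// Sketch of the lookup order: FFMPEG_PATH takes precedence,
// otherwise fall back to the "ffmpeg" executable on PATH.
function resolveFfmpegPath(env) {
  return env.FFMPEG_PATH && env.FFMPEG_PATH.trim() !== ""
    ? env.FFMPEG_PATH
    : "ffmpeg"
}

console.log(resolveFfmpegPath({ FFMPEG_PATH: "/usr/local/bin/ffmpeg" })) // → "/usr/local/bin/ffmpeg"
console.log(resolveFfmpegPath({})) // → "ffmpeg"
```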
## model
By default, the API uses the `transcription` model alias to transcribe the audio. You can specify a different model using the `model` option.

```js
const { text } = await transcribe("...", { model: "openai:whisper-1" })
```
## Segments
For models that support it, you can retrieve the individual segments.

```js
const { segments } = await transcribe("...")
for (const segment of segments) {
  const { start, text } = segment
  console.log(`[${start}] ${text}`)
}
```
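Segment start times are plain seconds, so they print as raw numbers in the loop above. A small hypothetical helper (not part of the GenAIScript API) can format them as readable timestamps:

```js
// Format a segment start time (in seconds) as HH:MM:SS.mmm.
function formatTimestamp(seconds) {
  const h = Math.floor(seconds / 3600)
  const m = Math.floor((seconds % 3600) / 60)
  const s = Math.floor(seconds % 60)
  const ms = Math.round((seconds - Math.floor(seconds)) * 1000)
  const pad = (n, w = 2) => String(n).padStart(w, "0")
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms, 3)}`
}

console.log(formatTimestamp(61.5)) // → "00:01:01.500"
```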
## SRT and VTT
GenAIScript also renders the segments to SRT and WebVTT formats.

```js
const { srt, vtt } = await transcribe("...")
```
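To illustrate what the rendered output looks like, here is a minimal sketch of how segments map onto SRT cues. This is for illustration only; GenAIScript returns the rendered `srt` string directly, and `toSrt` is a hypothetical helper.

```js
// Render an array of { start, end, text } segments (times in seconds) as SRT cues.
function toSrt(segments) {
  const fmt = (t) => {
    const h = Math.floor(t / 3600)
    const m = Math.floor((t % 3600) / 60)
    const s = Math.floor(t % 60)
    const ms = Math.round((t - Math.floor(t)) * 1000)
    const pad = (n, w = 2) => String(n).padStart(w, "0")
    // SRT uses a comma as the millisecond separator.
    return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`
  }
  return segments
    .map((seg, i) => `${i + 1}\n${fmt(seg.start)} --> ${fmt(seg.end)}\n${seg.text}\n`)
    .join("\n")
}

const srt = toSrt([{ start: 0, end: 2.5, text: "Hello" }])
console.log(srt)
// 1
// 00:00:00,000 --> 00:00:02,500
// Hello
```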
## Translation
Some models also support transcribing and translating to English in a single pass. In that case, set the `translate: true` flag.

```js
const { srt } = await transcribe("...", { translate: true })
```
## Cache
You can cache the transcription results by setting the `cache` option to `true` (or a custom name).

```js
const { srt } = await transcribe("...", { cache: true })
```

or use a custom cache name (salt):

```js
const { srt } = await transcribe("...", { cache: "whisper" })
```