video frame rendering using ffmpeg (#997)
* feat: 🎥 add video tasks and frame extraction support
* feat: ✨ add support for custom options in frame extraction
* feat: ✨ add options for frame extraction in CLI
* feat: 🎥 add video frame extraction and tracing support
* feat: ✨ add video transcript generation script
* refactor: ♻️ remove unused imports in ffmpeg module
* refactor: ♻️ rename and update video processing functions
* feat: ✨ add video segmentation and caching utils
* adding caching
* feat: ✨ add SRT/VTT rendering to transcription results
* feat: 📝 update transcript processing for srt format
* feat: ✨ add videoProbe function for video metadata extraction
* feat: ✨ add video support with ffmpeg utilities
* feat: 🎥 add transcript integration for video frame extraction
* feat: ✨ add transcript-based frame extraction example
* feat: ✨ add video alt text generation guide and script
* feat: ✨ add new cards for Speech To Text, Images, Videos
* docs: 📝 fix broken links in transcription and videos docs
Showing 27 changed files with 853 additions and 140 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
--- | ||
title: Video Alt Text | ||
sidebar: | ||
order: 50 | ||
description: Learn how to generate alt text for videos | ||
keywords: Video | ||
--- | ||
|
||
import { Code } from "@astrojs/starlight/components" | ||
import src from "../../../../../packages/sample/genaisrc/video-alt-text.genai.mjs?raw" | ||
|
||
GenAIScript supports [speech transcription](/genaiscript/reference/scripts/transcription) | ||
and [video frame extraction](/genaiscript/reference/scripts/videos) which can be combined to analyze videos. | ||
|
||
## Video Alt Text | ||
|
||
The HTML `video` element does not have an `alt` attribute, but you can still attach an accessible description using the `aria-label` attribute. | ||
We will build a script that generates the description using the transcript and video frames. | ||
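To make the target concrete, here is a minimal sketch of how a generated description could be attached to a `video` element through `aria-label`. The helper names `escapeAttribute` and `videoTag`, the file name, and the sample description are hypothetical and not part of GenAIScript.

```js
// Escape characters that are unsafe inside a double-quoted HTML attribute.
function escapeAttribute(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
}

// Build a <video> element whose accessible name is the generated description.
// `src` and `description` are hypothetical inputs used for illustration.
function videoTag(src, description) {
    return `<video src="${escapeAttribute(src)}" aria-label="${escapeAttribute(description)}" controls></video>`
}

console.log(videoTag("demo.mp4", 'A presenter says "hello" & waves.'))
```

Escaping matters here because LLM output can contain quotes or angle brackets that would otherwise break the attribute.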
|
||
## Transcript | ||
|
||
We use the `transcribe` function to generate the transcript. It uses the `transcription` model alias to compute the transcription. | ||
Transcripts help reduce LLM hallucinations when analyzing images, and they also provide | ||
good timestamp candidates for taking screenshots of the video stream. | ||
|
||
```js | ||
const file = env.files[0] | ||
const transcript = await transcribe(file) // OpenAI whisper | ||
``` | ||
|
||
## Video Frames | ||
|
||
The next step is to use the transcript to screenshot the video stream. GenAIScript uses [ffmpeg](https://ffmpeg.org/) to render the frames, | ||
so make sure you have it installed and configured. | ||
|
||
```js | ||
const frames = await parsers.videoFrames(file, { | ||
transcript, | ||
}) | ||
``` | ||
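When a transcript is provided, a frame is taken at the start of each segment. The sketch below shows which timestamps that would select. It assumes a transcript shaped like `{ segments: [{ start, text }] }` with `start` in seconds; that shape and the `segmentStartTimes` helper are assumptions for illustration, not the GenAIScript implementation.

```js
// Hypothetical transcript shape: { segments: [{ start, text }] }, start in seconds.
function segmentStartTimes(transcript) {
    const starts = (transcript?.segments ?? [])
        .map((segment) => segment.start)
        .filter((start) => Number.isFinite(start))
    // Remove duplicates and sort so each timestamp is captured once, in order.
    return [...new Set(starts)].sort((a, b) => a - b)
}

const sample = {
    segments: [
        { start: 0, text: "Welcome." },
        { start: 4.2, text: "Open the editor." },
        { start: 4.2, text: "Then run the script." },
    ],
}
console.log(segmentStartTimes(sample)) // [ 0, 4.2 ]
```

A silent video yields no segments, so this helper returns an empty list in that case.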
|
||
## Context | ||
|
||
Both the transcript and the frames are added to the prompt context. Since some videos may be silent, we ignore empty transcripts. | ||
We also use low detail for the frames to improve performance. | ||
|
||
```js | ||
def("TRANSCRIPT", transcript?.srt, { ignoreEmpty: true }) // ignore silent videos | ||
defImages(frames, { detail: "low" }) // low detail for better performance | ||
``` | ||
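The `srt` field used above holds the transcript in SubRip (SRT) form, which keeps the timing next to the text. As a rough illustration of what that text looks like, the sketch below renders segments as SRT. The segment shape `{ start, end, text }` and the helpers `srtTime` and `toSrt` are assumptions, not the GenAIScript renderer.

```js
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
    const totalMs = Math.round(seconds * 1000)
    const pad = (n, width = 2) => String(n).padStart(width, "0")
    const h = Math.floor(totalMs / 3600000)
    const m = Math.floor((totalMs % 3600000) / 60000)
    const s = Math.floor((totalMs % 60000) / 1000)
    const ms = totalMs % 1000
    return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`
}

// Hypothetical segment shape: { start, end, text }, times in seconds.
// Each cue is: index, time range, text, followed by a blank line.
function toSrt(segments) {
    return segments
        .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
        .join("\n")
}

console.log(toSrt([{ start: 0, end: 4.2, text: "Welcome." }]))
```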
## Prompting it together | ||
Finally, we give the task to the LLM to generate the alt text. | ||
```js | ||
$`You are an expert in assistive technology. | ||
You will analyze the video and generate a description alt text for the video. | ||
` | ||
``` | ||
Using this script, you can automatically generate high-quality alt text for videos. | ||
```sh | ||
npx --yes genaiscript run video-alt-text path_to_video.mp4 | ||
``` | ||
## Full source | ||
<Code code={src} wrap={true} lang="js" title="video-alt-text.genai.mjs" /> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
--- | ||
title: Videos | ||
description: How to use videos in scripts | ||
sidebar: | ||
order: 10.01 | ||
--- | ||
|
||
While most LLMs do not support videos natively, videos can be integrated in scripts by rendering frames | ||
and adding them as images to the prompt. This can be tedious, so GenAIScript provides helpers | ||
to streamline the process. | ||
|
||
## ffmpeg and ffprobe | ||
|
||
Video rendering and analysis rely on [ffmpeg](https://ffmpeg.org/) | ||
and [ffprobe](https://ffmpeg.org/ffprobe.html). | ||
|
||
Make sure these tools are installed locally and available in your PATH, | ||
or configure the `FFMPEG_PATH` / `FFPROBE_PATH` environment variables to point to the `ffmpeg`/`ffprobe` executable. | ||
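To check your configuration, you can mirror the lookup described above: prefer the environment variable and fall back to the bare command name, which the operating system resolves through PATH. This is a sketch under that assumption; `resolveTool` is a hypothetical helper, and how GenAIScript resolves the executables internally is not shown here.

```js
// Return the executable to invoke: the override from the environment if set,
// otherwise the bare command name, which the OS resolves through PATH.
function resolveTool(name, env = process.env) {
    const override = env[`${name.toUpperCase()}_PATH`]
    return override && override.trim() ? override.trim() : name
}

console.log(resolveTool("ffmpeg"))
console.log(resolveTool("ffprobe", { FFPROBE_PATH: "/opt/ffmpeg/bin/ffprobe" }))
```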
|
||
### ffmpeg output caching | ||
|
||
Since video processing can be slow, GenAIScript caches the results in subfolders under `.genaiscript/videos/...` | ||
where the subfolder name is a hash of the video file content and the options used to render the video. | ||
This way, you can re-run the script without having to re-render the video. | ||
|
||
You can review the `ffmpeg` console log in the `log.txt` file in the cache folder. | ||
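The idea behind such a cache key can be sketched as follows: combine the content with a stable serialization of the options, then hash the result. Everything here is an assumption made for illustration. FNV-1a stands in for whatever hash GenAIScript actually uses, the content is a plain string rather than file bytes, and `stableStringify`, `fnv1a`, and `cacheFolder` are hypothetical names.

```js
// Serialize options with sorted keys so that { a, b } and { b, a } match.
function stableStringify(value) {
    if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`
    if (value && typeof value === "object") {
        const entries = Object.keys(value)
            .sort()
            .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`)
        return `{${entries.join(",")}}`
    }
    return JSON.stringify(value)
}

// FNV-1a (32-bit) over UTF-16 code units. A stand-in for a real content hash,
// NOT the algorithm GenAIScript uses.
function fnv1a(text) {
    let hash = 0x811c9dc5
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i)
        hash = Math.imul(hash, 0x01000193) >>> 0
    }
    return hash.toString(16).padStart(8, "0")
}

// `content` is the video content (a string here for simplicity).
function cacheFolder(content, options) {
    return `.genaiscript/videos/${fnv1a(content + stableStringify(options))}`
}

console.log(cacheFolder("video-bytes", { count: 10, timestamps: ["0%"] }))
```

Because the options are part of the key, changing `count` or `timestamps` leads to a different folder and a fresh render, while re-running with the same inputs reuses the cached output.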
|
||
## Extracting frames | ||
|
||
As mentioned above, multi-modal LLMs typically accept a video as a sequence | ||
of images (frames or screenshots). | ||
|
||
The `parsers.videoFrames` function renders frames from a video file or URL | ||
and returns them as an array of file paths. You can use the result with `defImages` directly. | ||
|
||
```js | ||
const frames = await parsers.videoFrames("path_url_to_video") | ||
defImages(frames) | ||
``` | ||
|
||
- specify a number of frames using `count` | ||
|
||
```js "count: 10" | ||
const frames = await parsers.videoFrames("...", { count: 10 }) | ||
``` | ||
|
||
- specify timestamps in seconds or percentages of the video duration using `timestamps` (or `times`) | ||
|
||
```js "timestamps" | ||
const frames = await parsers.videoFrames("...", { timestamps: ["0%", "50%"] }) | ||
``` | ||
|
||
- specify the transcript computed by the [transcribe](/genaiscript/reference/scripts/transcription) function. GenAIScript | ||
will extract a frame at the start of each segment. | ||
|
||
```js "transcript" | ||
const transcript = await transcribe("...") | ||
const frames = await parsers.videoFrames("...", { transcript }) | ||
``` | ||
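The `timestamps` option above mixes seconds and percentages of the duration. The sketch below shows one way such values could be normalized to seconds. The `toSeconds` helper and the sample duration are hypothetical, and the parsing rules GenAIScript applies may differ.

```js
// Convert a timestamp to seconds. Numbers are taken as seconds; strings ending
// in "%" are taken as a percentage of the video duration.
function toSeconds(timestamp, duration) {
    if (typeof timestamp === "number") return timestamp
    const text = String(timestamp).trim()
    if (text.endsWith("%")) return (parseFloat(text) / 100) * duration
    return parseFloat(text)
}

const duration = 120 // seconds, hypothetical
console.log(["0%", "50%", 90].map((t) => toSeconds(t, duration))) // [ 0, 60, 90 ]
```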
|
||
## Extracting audio | ||
|
||
The `parsers.videoAudio` function extracts the audio from a video file or URL | ||
as a `.wav` file. | ||
|
||
```js | ||
const audio = await parsers.videoAudio("path_url_to_video") | ||
``` | ||
|
||
The conversion to audio happens automatically | ||
for videos when using [transcribe](/genaiscript/reference/scripts/transcription). | ||
|
||
## Probing videos | ||
|
||
You can extract metadata from a video file or url using `parsers.videoProbe`. | ||
|
||
```js | ||
const info = await parsers.videoProbe("path_url_to_video") | ||
const { duration } = info.streams[0] | ||
console.log(`video duration: ${duration} seconds`) | ||
``` |
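The first stream is not always the video stream, since a container can list an audio track first. Assuming the probe result mirrors ffprobe's JSON output (a `streams` array with `codec_type` and string `duration` fields, plus a `format` object), a more defensive lookup could look like the sketch below. The `videoDuration` helper and the sample object are hypothetical.

```js
// Pick the duration of the first video stream, falling back to the container
// duration. ffprobe reports durations as strings, so convert to a number.
function videoDuration(info) {
    const video = info.streams?.find((s) => s.codec_type === "video")
    const raw = video?.duration ?? info.format?.duration
    const seconds = Number(raw)
    return Number.isFinite(seconds) ? seconds : undefined
}

// Hypothetical probe result for illustration.
const probe = {
    streams: [
        { codec_type: "audio", duration: "12.4" },
        { codec_type: "video", duration: "12.5" },
    ],
    format: { duration: "12.5" },
}
console.log(`video duration: ${videoDuration(probe)} seconds`)
```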