# embedding_video

A pipeline filter node that extracts frames from a video stream and encodes each frame as a vector embedding using a Hugging Face vision model.

## What it does

Receives a video stream on its `video` input lane, buffers the full video in memory, then decodes it with **OpenCV** (`cv2.VideoCapture`) via a temporary file. Frames are sampled at a configurable interval and each is passed through a vision embedding model (CLIP by default, loaded via PyTorch and `transformers`) to produce a fixed-length numeric vector. One document is emitted per frame; all documents for a video are flushed as a single batch when processing completes.

The embedding engine is shared with the `embedding_image` node and a thread lock serializes GPU access, so concurrent video streams do not race for the model. The node is flagged **experimental** and declares a **gpu** capability.

Three safety limits apply out of the box:

- Videos larger than `maxVideoSizeMB` (default 500 MB) are rejected with a warning and produce no output.
- Frame extraction is capped at `max_frames` (default 50) per video; set to `0` for unlimited.
- If the container does not report a frame rate, a fallback of 30 fps is used when computing timestamps.

Supported containers: MP4 (`video/mp4`), AVI (`video/x-msvideo`), QuickTime (`video/quicktime`), WebM (`video/webm`). An unrecognized MIME type is still processed, treated as MP4.

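The size limit, the frame-rate fallback, and the MIME fallback described above can be sketched in plain Python. This is an illustrative sketch, not the node's actual code: the helper names, the temp-file suffixes, and the choice of 1 MB = 1024 × 1024 bytes are all assumptions.

```python
FALLBACK_FPS = 30.0  # used when the container reports no frame rate

# MIME types from the list above; the file suffixes are assumed for illustration.
MIME_SUFFIX = {
    "video/mp4": ".mp4",
    "video/x-msvideo": ".avi",
    "video/quicktime": ".mov",
    "video/webm": ".webm",
}


def within_size_limit(num_bytes: int, max_video_size_mb: float = 500) -> bool:
    """Return True if the buffered video may be processed.

    Assumes 1 MB = 1024 * 1024 bytes; the node may count megabytes differently.
    """
    return num_bytes <= max_video_size_mb * 1024 * 1024


def suffix_for(mime_type: str) -> str:
    """Pick a temp-file suffix; unrecognized MIME types are treated as MP4."""
    return MIME_SUFFIX.get(mime_type, ".mp4")


def effective_fps(reported_fps) -> float:
    """Fall back to 30 fps when the reported rate is missing or not positive."""
    if reported_fps and reported_fps > 0:
        return float(reported_fps)
    return FALLBACK_FPS


def frame_timestamp(frame_number: int, reported_fps) -> float:
    """Seconds from the start of the video for a given frame index."""
    return frame_number / effective_fps(reported_fps)
```

With these hypothetical helpers, frame 150 of a video that reports no frame rate is stamped as `frame_timestamp(150, 0)`, which uses the 30 fps fallback, while the same frame in a 25 fps video lands one second later.
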

---

## Configuration

### Lanes

| Lane in | Lane out    | Description                                         |
| ------- | ----------- | --------------------------------------------------- |
| `video` | `documents` | One document per extracted frame, with an embedding |

Each output document contains:

- `type`: `Image`, with `page_content` holding the frame as a base64-encoded PNG
- `embedding`: the frame's embedding vector (list of floats), plus `embedding_model` (the model identifier string)
- metadata: `time_stamp` (seconds from the start of the video), `frame_number` (frame index in the source), and a per-video `chunkId` counter

### Fields

| Field | Type | Description |
|---|---|---|
| `model` | string | Hugging Face model to use for frame embedding (used with the `custom` profile) |
| `profile` | string | Default "openai-patch16". Embedding model for video frames |
| `interval` | number | Default 5. Time in seconds between extracted frames |
| `max_frames` | number | Default 50. Limit the total number of frames extracted from the video. Set to 0 for unlimited. |
| `start_time` | number | Default 0. Start time in seconds for frame extraction (0 = beginning of the video) |
| `duration` | number | Default 0. Duration in seconds of the extraction window (0 = until the end of the video) |
| `maxVideoSizeMB` | number | Default 500. Maximum allowed video file size in megabytes. Videos exceeding this limit will be rejected. |

The extraction window runs from `start_time` to `start_time + duration`, clamped to the actual video length.

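As a rough illustration of how `interval`, `max_frames`, `start_time`, and `duration` interact, the sketch below computes the timestamps at which frames would be sampled. The function name `sample_times` is hypothetical, and two details are assumptions rather than documented behavior: the first frame is taken exactly at `start_time`, and the end of the window is exclusive.

```python
def sample_times(video_length, interval=5.0, max_frames=50,
                 start_time=0.0, duration=0.0):
    """Return the sample timestamps (in seconds) for one video.

    Assumptions for illustration: sampling starts at start_time, the window
    end is exclusive, and duration <= 0 means "until the end of the video".
    """
    if interval <= 0:
        raise ValueError("interval must be positive")

    start = max(0.0, float(start_time))
    end = float(video_length)
    if duration > 0:
        # Clamp the requested window to the actual video length.
        end = min(start + duration, end)

    times = []
    k = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while start + k * interval < end:
        if max_frames > 0 and len(times) >= max_frames:
            break  # max_frames == 0 means unlimited
        times.append(start + k * interval)
        k += 1
    return times
```

Under these assumptions, a 60-second video with the defaults yields 12 timestamps (0, 5, ..., 55), a 600-second video is cut off at 50 frames by `max_frames`, and `sample_times(60, start_time=50, duration=30)` is clamped to the last 10 seconds of the video.
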

---

## Profiles

| Profile ID                 | Model                          | Notes                                  |
| -------------------------- | ------------------------------ | -------------------------------------- |
| `openai-patch16` (default) | `openai/clip-vit-base-patch16` | Good performance, lower memory         |
| `openai-patch32`           | `openai/clip-vit-base-patch32` | Lower performance, better recognition  |
| `google16x224`             | `google/vit-base-patch16-224`  | Fast, accurate, general-purpose        |
| `custom`                   | user-specified via `model`     | Any Hugging Face vision model          |

---

## Schema

| Field | Type | Description | Default |
|---|---|---|---|
| `embedding.duration` | `number` | **Duration (in seconds) for frame extraction (0=end of video)** | `0` |
| `embedding.interval` | `number` | **Interval (in seconds) between frames**<br>Time in seconds between extracted frames | `5` |
| `embedding.maxVideoSizeMB` | `number` | **Maximum video file size (MB)**<br>Maximum allowed video file size in megabytes. Videos exceeding this limit will be rejected. | `500` |
| `embedding.max_frames` | `number` | **Maximum number of frames to extract (0=unlimited)**<br>Limit the total number of frames extracted from the video. Set to 0 for unlimited. | `50` |
| `embedding.model` | `string` | **Model name**<br>Hugging Face model to use for frame embedding | |
| `embedding.profile` | `string` | **Model**<br>Embedding model for video frames | `"openai-patch16"` |
| `embedding.start_time` | `number` | **Start time (in seconds) for frame extraction (0=beginning)** | `0` |

## Dependencies

- `transformers`
- `accelerate`

## Source

[View source](https://github.com/rocketride-org/rocketride-server/tree/develop/nodes/src/nodes/embedding_video)