Video

A pipeline filter node that extracts frames from a video stream and encodes each frame as a vector embedding using a Hugging Face vision model.

What it does

Receives a video stream on its video input lane, buffers the full video in memory, then decodes it with OpenCV (cv2.VideoCapture) via a temporary file. Frames are sampled at a configurable interval and each is passed through a vision embedding model (CLIP by default, loaded via PyTorch and transformers) to produce a fixed-length numeric vector. One document is emitted per frame; all documents for a video are flushed as a single batch when processing completes.

The embedding engine is shared with the embedding_image node and a thread lock serializes GPU access, so concurrent video streams do not race for the model. The node is flagged experimental and declares a gpu capability.
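The serialization described above can be pictured with a small sketch. It assumes a hypothetical `SharedEngine` class standing in for the real embedding engine; the class name, its `embed` method, and the dummy vector are illustrative and are not taken from the node's code.

```python
import threading


class SharedEngine:
    """Hypothetical stand-in for the shared embedding engine."""

    def __init__(self):
        self._lock = threading.Lock()  # one lock guards the single model instance
        self.active = 0                # callers currently inside the model
        self.max_active = 0            # highest concurrency ever observed

    def embed(self, frame):
        # Only one thread at a time may run the (pretend) GPU forward pass.
        with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            vector = [float(len(frame))]  # dummy embedding, not a real model output
            self.active -= 1
            return vector


# Eight concurrent "streams" all go through the same engine.
engine = SharedEngine()
threads = [
    threading.Thread(target=engine.embed, args=(b"frame-%d" % i,))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every call passes through the same lock, `engine.max_active` never rises above 1, which is the property the node relies on when several video streams arrive at once.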

Three safeguards apply by default:

Videos larger than maxVideoSizeMB (default 500 MB) are rejected with a warning and produce no output.
Frame extraction is capped at max_frames (default 50) per video; set to 0 for unlimited.
If the container does not report a frame rate, a fallback of 30 fps is used when computing timestamps.
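A minimal sketch of the size check and the frame-rate fallback listed above. The function names, the logger name, and the choice of 1 MB = 1024 × 1024 bytes are assumptions made for illustration; the node may measure size or perform the check differently.

```python
import logging

logger = logging.getLogger("video_node_sketch")  # hypothetical logger name

FALLBACK_FPS = 30.0


def buffer_video(chunks, max_video_size_mb=500):
    """Collect stream chunks in memory; return None if the size limit is exceeded."""
    limit = max_video_size_mb * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > limit:
            # Rejected videos produce a warning and no output.
            logger.warning("video exceeds %d MB; skipping", max_video_size_mb)
            return None
    return bytes(buf)


def effective_fps(reported_fps):
    """Use the container's frame rate, or 30 fps when none is reported."""
    return reported_fps if reported_fps and reported_fps > 0 else FALLBACK_FPS


def timestamp_for(frame_number, reported_fps):
    """Seconds from the start of the video for a given frame index."""
    return frame_number / effective_fps(reported_fps)
```

For example, `timestamp_for(150, 0)` falls back to 30 fps and yields 5.0 seconds, while `timestamp_for(50, 25.0)` uses the reported rate and yields 2.0 seconds.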

Supported containers: MP4 (video/mp4), AVI (video/x-msvideo), QuickTime (video/quicktime), WebM (video/webm). A stream with an unrecognized MIME type is still processed and is treated as MP4.
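The MP4 fallback can be sketched as a lookup with a default. The mapping to file suffixes and the `suffix_for` helper are hypothetical; they only illustrate how an unrecognized MIME type might end up being handled as MP4 when the temporary file is created.

```python
# Container MIME types listed above, mapped to a temp-file suffix.
# The suffixes themselves are an assumption made for this sketch.
MIME_TO_SUFFIX = {
    "video/mp4": ".mp4",
    "video/x-msvideo": ".avi",
    "video/quicktime": ".mov",
    "video/webm": ".webm",
}


def suffix_for(mime_type):
    """Return a file suffix for the MIME type, falling back to MP4."""
    return MIME_TO_SUFFIX.get((mime_type or "").lower(), ".mp4")
```

With this helper, `suffix_for("video/webm")` returns `".webm"`, and an unlisted type such as `"video/x-matroska"` returns `".mp4"`.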

Configuration

Lanes

Lane in	Lane out	Description
`video`	`documents`	One document per extracted frame, with an embedding

Each output document contains:

type: Image, with page_content holding the frame as a base64-encoded PNG
embedding: the frame's embedding vector (list of floats), plus embedding_model (the model identifier string)
metadata: time_stamp (seconds from the start of the video), frame_number (frame index in the source), and a per-video chunkId counter
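To make the document layout concrete, here is an illustrative sketch that assembles one output document as a plain Python dict. The helper `make_frame_document` and the exact nesting of the keys are assumptions; only the field names come from the list above.

```python
import base64


def make_frame_document(png_bytes, embedding, model_id, time_stamp, frame_number, chunk_id):
    """Assemble one output document as a plain dict (illustrative layout only)."""
    return {
        "type": "Image",
        # The frame is carried as a base64-encoded PNG string.
        "page_content": base64.b64encode(png_bytes).decode("ascii"),
        "embedding": [float(x) for x in embedding],
        "embedding_model": model_id,
        "metadata": {
            "time_stamp": time_stamp,      # seconds from the start of the video
            "frame_number": frame_number,  # frame index in the source
            "chunkId": chunk_id,           # per-video counter
        },
    }


doc = make_frame_document(
    png_bytes=b"\x89PNG\r\n\x1a\n",  # PNG signature only, a placeholder rather than a real image
    embedding=[0.1, -0.2, 0.3],      # toy vector; real embeddings are much longer
    model_id="openai/clip-vit-base-patch16",
    time_stamp=5.0,
    frame_number=150,
    chunk_id=1,
)
```

A consumer can recover the frame bytes with `base64.b64decode(doc["page_content"])`.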

Fields

Field	Type	Description
`model`	string	Hugging Face model to use for frame embedding
`profile`	string	Default "openai-patch16". Embedding model for video frames
`interval`	number	Default 5. Time in seconds between extracted frames
`max_frames`	number	Default 50. Limit the total number of frames extracted from the video. Set to 0 for unlimited.
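`start_time`	number	Default 0. Start time in seconds for frame extraction (0 = beginning of the video)
`duration`	number	Default 0. Duration in seconds of the extraction window (0 = until the end of the video)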
`maxVideoSizeMB`	number	Default 500. Maximum allowed video file size in megabytes. Videos exceeding this limit will be rejected.

The extraction window runs from start_time to start_time + duration, clamped to the actual video length. A duration of 0 extends the window to the end of the video.
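The interaction between `interval`, `max_frames`, `start_time`, and `duration` can be sketched as follows. This is a hedged approximation: the function name `sample_frame_numbers`, the rounding of timestamps to frame indices, and the half-open window are assumptions, and the node's actual sampling logic may differ in detail.

```python
def sample_frame_numbers(total_frames, fps, interval=5, max_frames=50,
                         start_time=0, duration=0):
    """Return the frame indices to extract, under this sketch's assumptions."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    fps = fps if fps and fps > 0 else 30.0   # fallback when no frame rate is reported
    video_length = total_frames / fps        # seconds
    start = min(max(start_time, 0), video_length)
    # duration == 0 means "until the end of the video"
    end = video_length if duration <= 0 else min(start + duration, video_length)

    frames = []
    k = 0
    while True:
        t = start + k * interval
        # max_frames == 0 disables the cap
        if t >= end or (max_frames and len(frames) >= max_frames):
            break
        frames.append(int(t * fps))
        k += 1
    return frames
```

With the defaults, a 60-second clip at 30 fps (1800 frames) yields 12 frame indices, one every 150 frames starting at 0. A 1000-second clip would yield 200 candidates, but the default `max_frames` of 50 stops extraction after the first 50.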

Profiles

Profile ID	Model	Notes
`openai-patch16` (default)	`openai/clip-vit-base-patch16`	Good performance, lower memory
`openai-patch32`	`openai/clip-vit-base-patch32`	Lower performance, better recognition
`google16x224`	`google/vit-base-patch16-224`	Fast, accurate, general-purpose
`custom`	user-specified via `model`	Any Hugging Face vision model
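Profile resolution can be sketched as a dictionary lookup, with `custom` deferring to the `model` field. The `resolve_model` helper and its error handling are hypothetical; the table above does not say how the node reacts to an unknown profile or to a `custom` profile without a model.

```python
# Profile IDs and model identifiers copied from the table above.
PROFILES = {
    "openai-patch16": "openai/clip-vit-base-patch16",
    "openai-patch32": "openai/clip-vit-base-patch32",
    "google16x224": "google/vit-base-patch16-224",
}


def resolve_model(profile="openai-patch16", model=None):
    """Map a profile ID to a Hugging Face model identifier."""
    if profile == "custom":
        if not model:
            # Assumed behavior: a custom profile without `model` is an error.
            raise ValueError("profile 'custom' requires the `model` field")
        return model
    try:
        return PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
```

Calling `resolve_model()` with no arguments returns the default CLIP checkpoint, `openai/clip-vit-base-patch16`.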

Schema

Field	Type	Description	Default
`embedding.duration`	`number`	Duration (in seconds) for frame extraction (0=end of video)	`0`
`embedding.interval`	`number`	Interval (in seconds) between frames. Time in seconds between extracted frames.	`5`
`embedding.maxVideoSizeMB`	`number`	Maximum video file size (MB). Maximum allowed video file size in megabytes. Videos exceeding this limit will be rejected.	`500`
`embedding.max_frames`	`number`	Maximum number of frames to extract (0=unlimited). Limit the total number of frames extracted from the video. Set to 0 for unlimited.	`50`
`embedding.model`	`string`	Model name. Hugging Face model to use for frame embedding.
`embedding.profile`	`string`	Model. Embedding model for video frames.	`"openai-patch16"`
`embedding.start_time`	`number`	Start time (in seconds) for frame extraction (0=beginning)	`0`

Dependencies

transformers
accelerate
