Image

A RocketRide embedding node that generates vector embeddings from images using local vision models.

What it does

Transforms image content into normalized embedding vectors that capture the semantic and structural characteristics of the image, enabling similarity search, clustering, and other multimodal workflows. Output documents have an embedding vector and an embedding_model name attached, ready for ingestion into a vector store.

Uses Hugging Face transformers vision models and supports two model families, selected automatically from the model name:

CLIP (model name contains clip, e.g. openai/clip-vit-base-patch16): embeds the image via get_image_features, normalized.
ViT (anything else, e.g. google/vit-base-patch16-224): embeds the image as the normalized CLS token of the last hidden state.
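The selection rule and the normalization step above can be sketched in a few lines. This is a minimal illustration, not the node's actual code: the helper names `select_family` and `l2_normalize` are hypothetical, and the case-insensitive match on `clip` is an assumption (the text only says the name contains clip).

```python
import math


def select_family(model_name: str) -> str:
    """Pick the model family from the model name (hypothetical helper)."""
    # Assumption: the substring match ignores case.
    return "clip" if "clip" in model_name.lower() else "vit"


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# CLIP models embed via get_image_features; ViT models use the CLS token
# of the last hidden state. In both cases the result is then normalized.
print(select_family("openai/clip-vit-base-patch16"))  # clip
print(select_family("google/vit-base-patch16-224"))   # vit
print(l2_normalize([3.0, 4.0]))                       # [0.6, 0.8]
```

Because the vectors are unit length, the dot product of two embeddings equals their cosine similarity, which is what most vector stores rank by.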

The model runs through a proxy that transparently routes inference either to the local machine or to the model server. No API key is required, and both paths return identical results. The node is GPU-capable, so inference is accelerated when a GPU is available.

The default model is openai/clip-vit-base-patch16.

Configuration

Lanes

Lane in	Lane out	Description
`documents`	`documents`	Embed images carried in document objects
`image`	`documents`	Embed raw image data

documents lane

Each incoming document must have type: "Image"; any other type raises a ValueError. The document's page_content is expected to be a base64-encoded image, which is decoded to a Pillow image and embedded. The enriched document (with embedding and embedding_model set) is forwarded on the documents lane; the original image is not re-routed through the raw image path.
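The validation and decoding steps on this lane might look roughly like the sketch below. It assumes a document can be modeled as a plain dict with `type` and `page_content` keys; the function names `decode_image_document` and `attach_embedding` are hypothetical, and the actual embedding call is left out.

```python
import base64


def decode_image_document(doc: dict) -> bytes:
    """Validate the document type and return the decoded image bytes."""
    if doc.get("type") != "Image":
        raise ValueError(f"Expected document type 'Image', got {doc.get('type')!r}")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(doc["page_content"], validate=True)


def attach_embedding(doc: dict, vector: list[float], model_name: str) -> dict:
    """Return a copy of the document enriched with the embedding fields."""
    return {**doc, "embedding": vector, "embedding_model": model_name}


raw = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a complete image
doc = {"type": "Image", "page_content": base64.b64encode(raw).decode("ascii")}

image_bytes = decode_image_document(doc)
# With Pillow installed, PIL.Image.open(io.BytesIO(image_bytes)) would load
# a real image from these bytes before it is passed to the model.
enriched = attach_embedding(doc, [0.6, 0.8], "openai/clip-vit-base-patch16")
```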

image lane

Raw image bytes are streamed in chunks (begin / write / end). On completion the accumulated bytes are decoded, embedded, and wrapped in a new document of type Image whose page_content is the base64-encoded image. Each image in the stream receives a unique chunkId in its metadata.
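The begin / write / end flow can be pictured with a small accumulator. This is a sketch under stated assumptions, not the node's implementation: the class name `ImageStreamAccumulator` is hypothetical, the output document is modeled as a dict, the `chunkId` is drawn from a simple counter (the source only says it is unique), and the embedding step is omitted.

```python
import base64
import itertools


class ImageStreamAccumulator:
    """Collects streamed image bytes and wraps them in an Image document."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self._ids = itertools.count()  # assumption: a counter supplies unique ids

    def begin(self) -> None:
        # Start a new image: discard any bytes left from a previous stream.
        self._buffer = bytearray()

    def write(self, chunk: bytes) -> None:
        self._buffer.extend(chunk)

    def end(self) -> dict:
        # The real node would decode and embed the image at this point.
        data = bytes(self._buffer)
        return {
            "type": "Image",
            "page_content": base64.b64encode(data).decode("ascii"),
            "metadata": {"chunkId": next(self._ids)},
        }


acc = ImageStreamAccumulator()
acc.begin()
acc.write(b"first-half-")
acc.write(b"second-half")
first = acc.end()

acc.begin()
acc.write(b"another image")
second = acc.end()
```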

Fields

Field	Type	Description
`model`	string	Hugging Face model name to use for embedding (used with the `custom` profile)
`profile`	string	Embedding model profile. Default: "openai-patch16"

Profiles

Profile key	Model	Notes
`openai-patch16` (default)	`openai/clip-vit-base-patch16`	Good performance, lower memory
`openai-patch32`	`openai/clip-vit-base-patch32`	Lower performance, better recognition
`google16x224`	`google/vit-base-patch16-224`	Fast, accurate, general-purpose
`custom`	(user-specified)	Any Hugging Face vision model, via `embedding.model`
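How a profile key maps to a model name can be sketched as a simple lookup. The mapping below is copied from the table; everything else is an assumption for illustration: the function name `resolve_model`, the representation of the `embedding` settings as a dict with `profile` and `model` keys, and the choice to raise `ValueError` on an unknown profile or a `custom` profile without a model.

```python
PROFILES = {
    "openai-patch16": "openai/clip-vit-base-patch16",
    "openai-patch32": "openai/clip-vit-base-patch32",
    "google16x224": "google/vit-base-patch16-224",
}
DEFAULT_PROFILE = "openai-patch16"


def resolve_model(embedding_config: dict) -> str:
    """Return the Hugging Face model name for the given embedding settings."""
    profile = embedding_config.get("profile", DEFAULT_PROFILE)
    if profile == "custom":
        # The custom profile takes the model name from embedding.model.
        model = embedding_config.get("model")
        if not model:
            raise ValueError("The 'custom' profile requires a model name")
        return model
    if profile not in PROFILES:
        raise ValueError(f"Unknown profile: {profile!r}")
    return PROFILES[profile]


default_model = resolve_model({})
vit_model = resolve_model({"profile": "google16x224"})
custom_model = resolve_model(
    {"profile": "custom", "model": "google/vit-base-patch16-224"}
)
```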

Schema

Field	Type	Description	Default
`embedding.model`	`string`	Model name: Hugging Face model to use for embedding	
`embedding.profile`	`string`	Model: embedding model profile	`"openai-patch16"`

Dependencies

transformers
accelerate
