Image

A RocketRide embedding node that generates vector embeddings from images using local vision models.

What it does

Transforms image content into normalized embedding vectors that capture the semantic and structural characteristics of the image, enabling similarity search, clustering, and other multimodal workflows. Output documents have an embedding vector and an embedding_model name attached, ready for ingestion into a vector store.
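Because the vectors are normalized, comparing two embeddings for similarity search comes down to a cosine similarity, which for unit-length vectors equals the dot product. The sketch below is illustrative only: it uses made-up three-dimensional vectors in place of real embeddings, and the helper name `cosine_similarity` is an assumption, not part of the node.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy unit-length vectors standing in for document embeddings.
query = [1.0, 0.0, 0.0]
candidates = {"a": [0.6, 0.8, 0.0], "b": [0.0, 1.0, 0.0]}

# Rank candidate keys from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda key: cosine_similarity(query, candidates[key]),
    reverse=True,
)
```

In practice a vector store performs this ranking at scale; the snippet only shows what "similar" means for the vectors this node produces.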

Uses Hugging Face transformers vision models and supports two model families, selected automatically from the model name:

  • CLIP (model name contains clip, e.g. openai/clip-vit-base-patch16): embeds the image via get_image_features, normalized.
  • ViT (anything else, e.g. google/vit-base-patch16-224): embeds the image as the normalized CLS token of the last hidden state.
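The family selection and the two embedding paths described above can be sketched roughly as follows. This is not the node's actual implementation: the function names `model_family` and `embed_image` are hypothetical, the case-insensitive match on "clip" is an assumption, and the sketch loads the model directly with transformers rather than through the node's proxy. It also assumes `get_image_features` returns a plain tensor, which may vary across transformers versions.

```python
def model_family(model_name: str) -> str:
    """Pick the model family from the model name.

    Assumption: the match is case-insensitive; the source only says
    the name "contains clip".
    """
    return "clip" if "clip" in model_name.lower() else "vit"


def embed_image(image, model_name: str = "openai/clip-vit-base-patch16"):
    """Return a normalized embedding (list of floats) for a Pillow image."""
    # Heavy imports are kept local so the helper above stays dependency-free.
    import torch
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        if model_family(model_name) == "clip":
            # CLIP: projected image features.
            features = model.get_image_features(**inputs)
        else:
            # ViT: CLS token (position 0) of the last hidden state.
            features = model(**inputs).last_hidden_state[:, 0]

    # L2-normalize so each embedding has unit length.
    features = torch.nn.functional.normalize(features, dim=-1)
    return features[0].tolist()
```

Calling `embed_image(some_pillow_image)` would download the default CLIP weights on first use; passing `"google/vit-base-patch16-224"` exercises the ViT branch instead.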

The model runs through a proxy that transparently routes inference either to the local machine or to the model server. No API key is required, and both paths return identical results. The node is GPU-capable: inference is GPU-accelerated when a GPU is available.

The default model is openai/clip-vit-base-patch16.


Configuration

Lanes

| Lane in | Lane out | Description |
| --- | --- | --- |
| documents | documents | Embed images carried in document objects |
| image | documents | Embed raw image data |

documents lane

Each incoming document must have type: "Image"; any other type raises a ValueError. The document's page_content is expected to be a base64-encoded image, which is decoded to a Pillow image and embedded. The enriched document (with embedding and embedding_model set) is forwarded on the documents lane; the original image is not re-routed through the raw image path.
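A minimal sketch of the type check and decoding step on this lane, assuming a document can be represented as a plain dict with `type` and `page_content` keys. The real document class and the helper names `image_bytes_from_document` and `to_pillow` are assumptions for illustration.

```python
import base64
import io


def image_bytes_from_document(doc: dict) -> bytes:
    """Validate the document type and decode its base64 payload."""
    if doc.get("type") != "Image":
        raise ValueError(
            f"expected document type 'Image', got {doc.get('type')!r}"
        )
    return base64.b64decode(doc["page_content"])


def to_pillow(raw: bytes):
    """Open decoded image bytes as a Pillow image (requires Pillow)."""
    from PIL import Image

    return Image.open(io.BytesIO(raw))
```

A document that passes the check would then be embedded and forwarded with `embedding` and `embedding_model` set; that step is omitted here.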

image lane

Raw image bytes are streamed in chunks (begin / write / end). On completion the accumulated bytes are decoded, embedded, and wrapped in a new document of type Image whose page_content is the base64-encoded image. Each image in the stream receives a unique chunkId in its metadata.
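The begin / write / end flow can be pictured with a small buffer class. This is a sketch under stated assumptions: the class name `ImageLaneBuffer`, the dict-shaped output document, and the use of an incrementing counter for `chunkId` are all hypothetical, and the embedding step itself is left out.

```python
import base64
import itertools


class ImageLaneBuffer:
    """Accumulates streamed image bytes and wraps them in a document."""

    def __init__(self):
        self._chunks = None               # None means no stream is open
        self._ids = itertools.count()     # assumed source of unique chunkIds

    def begin(self):
        """Start a new image stream."""
        self._chunks = []

    def write(self, chunk: bytes):
        """Append one chunk of raw image bytes."""
        if self._chunks is None:
            raise RuntimeError("write() called before begin()")
        self._chunks.append(chunk)

    def end(self) -> dict:
        """Close the stream and return an Image document (not yet embedded)."""
        if self._chunks is None:
            raise RuntimeError("end() called before begin()")
        raw = b"".join(self._chunks)
        self._chunks = None
        return {
            "type": "Image",
            "page_content": base64.b64encode(raw).decode("ascii"),
            "metadata": {"chunkId": next(self._ids)},
        }
```

Each begin / end pair yields one document, so two images sent over the same lane come out with different `chunkId` values.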

Fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | Hugging Face model to use for embedding |
| profile | string | Embedding model profile. Default: "openai-patch16" |

Profiles

| Profile key | Model | Notes |
| --- | --- | --- |
| openai-patch16 (default) | openai/clip-vit-base-patch16 | Good performance, lower memory |
| openai-patch32 | openai/clip-vit-base-patch32 | Lower performance, better recognition |
| google16x224 | google/vit-base-patch16-224 | Fast, accurate, general-purpose |
| custom | (user-specified) | Any Hugging Face vision model, via embedding.model |
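One way to read the profile table is as a lookup from profile key to model name, with `custom` deferring to `embedding.model`. The sketch below assumes the `embedding` section of the configuration arrives as a plain dict; the function name `resolve_model` and the error raised for a missing custom model are illustrative assumptions, not documented behavior.

```python
PROFILES = {
    "openai-patch16": "openai/clip-vit-base-patch16",
    "openai-patch32": "openai/clip-vit-base-patch32",
    "google16x224": "google/vit-base-patch16-224",
}
DEFAULT_PROFILE = "openai-patch16"


def resolve_model(embedding_config: dict) -> str:
    """Map an embedding config (profile and optional model) to a model name."""
    profile = embedding_config.get("profile", DEFAULT_PROFILE)
    if profile == "custom":
        model = embedding_config.get("model")
        if not model:
            # Assumed behavior: a custom profile without a model is an error.
            raise ValueError("the custom profile requires embedding.model")
        return model
    return PROFILES[profile]
```

For example, `resolve_model({"profile": "custom", "model": "google/vit-base-patch16-224"})` returns the user-specified model, while an empty config falls back to the default profile.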

Schema

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| embedding.model | string | Model name: Hugging Face model to use for embedding | |
| embedding.profile | string | Model: embedding model | "openai-patch16" |

Dependencies

  • transformers
  • accelerate