Image
A RocketRide embedding node that generates vector embeddings from images using local vision models.
What it does
Transforms image content into normalized embedding vectors that capture the semantic and
structural characteristics of the image, enabling similarity search, clustering, and other
multimodal workflows. Output documents have an embedding vector and an embedding_model
name attached, ready for ingestion into a vector store.
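Because the vectors are normalized, cosine similarity between two embeddings reduces to a plain dot product. The sketch below illustrates this with made-up 3-dimensional vectors standing in for real embeddings; `l2_normalize` and `cosine_similarity` are illustrative helpers, not part of the node.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def cosine_similarity(a, b):
    """For unit-length vectors, cosine similarity is just the dot product."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for image embeddings (real ones are much longer).
query = l2_normalize([1.0, 2.0, 2.0])
candidates = {
    "near": l2_normalize([1.0, 2.0, 2.1]),
    "far": l2_normalize([-2.0, 0.5, 0.0]),
}

# Rank candidates from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

A vector store typically performs the same ranking at scale, so normalized output lets dot-product and cosine indexes behave identically.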
Uses Hugging Face transformers vision models and supports two model families, selected
automatically from the model name:
- CLIP (model name contains clip, e.g. openai/clip-vit-base-patch16): embeds the image via get_image_features, normalized.
- ViT (anything else, e.g. google/vit-base-patch16-224): embeds the image as the normalized CLS token of the last hidden state.
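The selection rule above could be sketched roughly as follows. This is an illustration under stated assumptions, not the node's actual code: `select_family` and `embed_image` are hypothetical names, the case-insensitive match is an assumption, and the sketch assumes a transformers release in which `get_image_features` returns a tensor.

```python
def select_family(model_name: str) -> str:
    """Pick the embedding path from the model name (assumed case-insensitive)."""
    return "clip" if "clip" in model_name.lower() else "vit"


def embed_image(image, model_name: str = "openai/clip-vit-base-patch16"):
    """Return a normalized embedding for a Pillow image as a list of floats."""
    # Heavy dependencies are imported lazily so select_family stays cheap to import.
    import torch
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        if select_family(model_name) == "clip":
            # CLIP path: projected image features.
            features = model.get_image_features(**inputs)
        else:
            # ViT path: CLS token (position 0) of the last hidden state.
            features = model(**inputs).last_hidden_state[:, 0]

    # L2-normalize so every embedding has unit length.
    features = torch.nn.functional.normalize(features, p=2, dim=-1)
    return features[0].tolist()
```

Calling `embed_image` downloads the model weights on first use, so only `select_family` is cheap to run in isolation.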
The model runs through a proxy that transparently routes inference either locally or to the model server: no API key is required, and both paths return identical results. The node is GPU-capable, so inference is GPU-accelerated when a GPU is available.
The default model is openai/clip-vit-base-patch16.
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
| documents | documents | Embed images carried in document objects |
| image | documents | Embed raw image data |
documents lane
Each incoming document must have type: "Image"; any other type raises a ValueError.
The document's page_content is expected to be a base64-encoded image, which is decoded
to a Pillow image and embedded. The enriched document (with embedding and
embedding_model set) is forwarded on the documents lane; the original image is not
re-routed through the raw image path.
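A minimal sketch of the validation and decoding step described above, assuming a document can be modeled as a plain dict with `type` and `page_content` keys. The function names are hypothetical, and the real node's document class and error message may differ.

```python
import base64
import io


def decode_image_document(doc: dict) -> bytes:
    """Validate the document type and return the decoded image bytes."""
    if doc.get("type") != "Image":
        raise ValueError(f"expected type 'Image', got {doc.get('type')!r}")
    return base64.b64decode(doc["page_content"])


def to_pillow(image_bytes: bytes):
    """Open decoded bytes as a Pillow image (Pillow is imported lazily)."""
    from PIL import Image

    return Image.open(io.BytesIO(image_bytes))


# Placeholder bytes stand in for a real encoded image file.
doc = {
    "type": "Image",
    "page_content": base64.b64encode(b"fake-image-bytes").decode("ascii"),
}
raw = decode_image_document(doc)
```

With a real PNG or JPEG payload, `to_pillow(raw)` would yield the image object that is passed on to the embedding step.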
image lane
Raw image bytes are streamed in chunks (begin / write / end). On completion the
accumulated bytes are decoded, embedded, and wrapped in a new document of type Image
whose page_content is the base64-encoded image. Each image in the stream receives a
unique chunkId in its metadata.
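The begin / write / end flow could look roughly like the sketch below. Everything here is illustrative: the class name, the dict-shaped output document, and the use of an incrementing integer for chunkId are assumptions, and the embedding function is injected so the example runs without a model.

```python
import base64
import itertools


class ImageStreamAccumulator:
    """Collects streamed image bytes and emits one document per image."""

    def __init__(self, embed, model_name: str):
        self._embed = embed            # callable: bytes -> list of floats
        self._model_name = model_name
        self._chunks = []
        self._ids = itertools.count()  # assumed source of unique chunkId values

    def begin(self):
        """Start a new image; discard any leftover bytes."""
        self._chunks = []

    def write(self, chunk: bytes):
        """Buffer one chunk of the current image."""
        self._chunks.append(chunk)

    def end(self) -> dict:
        """Join the buffered bytes, embed them, and wrap them in a document."""
        data = b"".join(self._chunks)
        self._chunks = []
        return {
            "type": "Image",
            "page_content": base64.b64encode(data).decode("ascii"),
            "embedding": self._embed(data),
            "embedding_model": self._model_name,
            "metadata": {"chunkId": next(self._ids)},
        }


# Stub embedder: a real one would decode the bytes and run the vision model.
acc = ImageStreamAccumulator(lambda data: [0.0], "openai/clip-vit-base-patch16")
docs = []
for parts in ([b"abc", b"def"], [b"xyz"]):
    acc.begin()
    for part in parts:
        acc.write(part)
    docs.append(acc.end())
```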
Fields
| Field | Type | Description |
|---|---|---|
| model | string | Hugging Face model to use for embedding |
| profile | string | Embedding model profile. Default "openai-patch16" |
Profiles
| Profile key | Model | Notes |
|---|---|---|
| openai-patch16 (default) | openai/clip-vit-base-patch16 | Good performance, lower memory |
| openai-patch32 | openai/clip-vit-base-patch32 | Lower performance, better recognition |
| google16x224 | google/vit-base-patch16-224 | Fast, accurate, general-purpose |
| custom | (user-specified) | Any Hugging Face vision model, via embedding.model |
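As a rough illustration of how a profile key might resolve to a model name, the sketch below mirrors the table above. `resolve_model` is a hypothetical helper, the config is assumed to be a plain dict, and the error handling for missing or unknown values is an assumption rather than documented behavior.

```python
PROFILES = {
    "openai-patch16": "openai/clip-vit-base-patch16",
    "openai-patch32": "openai/clip-vit-base-patch32",
    "google16x224": "google/vit-base-patch16-224",
}
DEFAULT_PROFILE = "openai-patch16"


def resolve_model(embedding: dict) -> str:
    """Map an embedding config (profile plus optional model) to a model name."""
    profile = embedding.get("profile", DEFAULT_PROFILE)
    if profile == "custom":
        # The custom profile takes the model name from embedding.model.
        model = embedding.get("model")
        if not model:
            raise ValueError("profile 'custom' requires embedding.model")
        return model
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return PROFILES[profile]


default_model = resolve_model({})
custom_model = resolve_model(
    {"profile": "custom", "model": "google/vit-base-patch16-224"}
)
```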
Schema
| Field | Type | Description | Default |
|---|---|---|---|
| embedding.model | string | Hugging Face model to use for embedding | |
| embedding.profile | string | Embedding model profile | "openai-patch16" |
Dependencies
- transformers
- accelerate