Transformer
A RocketRide embedding node that converts text into vector representations using local sentence-transformer models.
What it does
Generates text embeddings using local sentence-transformer models. Runs on the model server, so no API key is required. GPU-accelerated when available (the node declares the gpu capability).
Uses the SentenceTransformer class to load the configured Hugging Face model at pipeline start. On load, the node reports the model's vector size and maximum token count, and streams loading progress via monitor status so long model downloads are visible in the UI.
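As a rough illustration of the load-time report described above, the sketch below collects a model's vector size and maximum token count. `describe_model`, `ModelInfo`, and `StubModel` are hypothetical names introduced here, not part of the node. The sketch assumes only that the model object exposes `get_sentence_embedding_dimension()` and `max_seq_length`, as `SentenceTransformer` instances do. A stub stands in for the real model so the example runs without a download.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    name: str
    vector_size: int
    max_tokens: int


def describe_model(name, model):
    """Collect the two values reported after a model loads (hypothetical helper)."""
    return ModelInfo(
        name=name,
        vector_size=model.get_sentence_embedding_dimension(),
        max_tokens=model.max_seq_length,
    )


class StubModel:
    """Stand-in exposing the same two attributes that describe_model reads."""

    max_seq_length = 512

    def get_sentence_embedding_dimension(self):
        return 384


info = describe_model("stub-model", StubModel())
print(f"{info.name}: {info.vector_size} dimensions, up to {info.max_tokens} tokens")
```

With the `sentence_transformers` package installed, a model created with `SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")` could be passed in place of the stub.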
Documents are encoded in batches: incoming document chunks are buffered until 64 documents accumulate, then encoded in a single batch and written downstream. Any remaining documents are flushed when the input closes. Questions are not buffered: all questions in a request are encoded immediately as one batch.
Each encoded item gets two fields set: embedding (the vector as a list of floats) and embedding_model (the model name that produced it).
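The buffering behavior and the two output fields can be sketched as follows. This is a minimal illustration, not the node's actual code: `EmbeddingBuffer`, `fake_encode`, `write_downstream`, and the `text` key on each document are assumptions made for the example, and the fake encoder stands in for a real model so the sketch runs on its own.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 64  # batch size stated in the description above

Document = Dict[str, object]


class EmbeddingBuffer:
    """Buffers documents and encodes them in fixed-size batches (hypothetical)."""

    def __init__(self,
                 encode: Callable[[List[str]], List[List[float]]],
                 model_name: str,
                 emit: Callable[[List[Document]], None],
                 batch_size: int = BATCH_SIZE) -> None:
        self._encode = encode
        self._model_name = model_name
        self._emit = emit
        self._batch_size = batch_size
        self._pending: List[Document] = []

    def add(self, doc: Document) -> None:
        """Buffer one document; encode once a full batch has accumulated."""
        self._pending.append(doc)
        if len(self._pending) >= self._batch_size:
            self._flush()

    def close(self) -> None:
        """Flush whatever is left when the input closes."""
        self._flush()

    def _flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending = self._pending, []
        vectors = self._encode([str(doc["text"]) for doc in batch])
        for doc, vector in zip(batch, vectors):
            doc["embedding"] = [float(x) for x in vector]
            doc["embedding_model"] = self._model_name
        self._emit(batch)


def fake_encode(texts: List[str]) -> List[List[float]]:
    """Stand-in encoder: a two-value vector per text, no model required."""
    return [[float(len(text)), 1.0] for text in texts]


batch_sizes: List[int] = []
written: List[Document] = []


def write_downstream(batch: List[Document]) -> None:
    batch_sizes.append(len(batch))
    written.extend(batch)


buffer = EmbeddingBuffer(fake_encode, "fake-model", write_downstream)
for i in range(130):
    buffer.add({"text": f"chunk {i}"})
buffer.close()
print(batch_sizes)  # [64, 64, 2]
```

Feeding 130 documents produces two full batches and a final flush of two, which mirrors the "flush on close" behavior described above.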
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
| documents | documents | Embed document chunks, attach vector to each document |
| questions | questions | Embed a question for vector similarity lookup |
The questions lane is used when querying a vector store: the store expects an embedded question to compare against stored document vectors.
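To make the comparison step concrete, here is a small sketch of ranking stored document vectors against an embedded question. It assumes cosine similarity as the metric; the actual vector store may use a different one. `cosine_similarity`, `rank_documents`, and the sample vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(question_vector, documents, top_k=3):
    """Return the top_k (score, document) pairs, most similar first."""
    scored = [
        (cosine_similarity(question_vector, doc["embedding"]), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy two-dimensional vectors; real embeddings have hundreds of dimensions.
stored = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [0.7, 0.7]},
]
question = [1.0, 0.1]

ranked_ids = [doc["id"] for _, doc in rank_documents(question, stored)]
print(ranked_ids)  # ['a', 'c', 'b']
```

The comparison is only meaningful when the question and the documents were embedded by the same model, which is why each item carries an `embedding_model` field.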
Fields
| Field | Type | Description |
|---|---|---|
| model | string | Hugging Face model to use for embedding |
| truncate_dim | number | Truncate embeddings to this dimensionality (0 = use model default) |
| document_prefix | string | Prefix prepended to document text before encoding (e.g. 'search_document: ', 'passage: ') |
| query_prefix | string | Prefix prepended to query text before encoding (e.g. 'search_query: ', 'query: ') |
| profile | string | Embedding model profile (default "miniLM") |
Custom model options (shown when the custom profile is selected):
| Field | Type / Default | Description |
|---|---|---|
| embedding.model | string | Any Hugging Face sentence-transformer model name |
| embedding.truncate_dim | number | Truncate embeddings to this dimensionality (0 = use model default) |
| embedding.document_prefix | string | Prefix prepended to document text before encoding (e.g. 'search_document: ', 'passage: ') |
| embedding.query_prefix | string | Prefix prepended to query text before encoding (e.g. 'search_query: ', 'query: ') |
The prefixes matter for asymmetric models (such as Nomic or E5) that were trained with distinct document/query markers; leave both blank for the bundled symmetric profiles.
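The effect of the prefix and truncation options can be sketched in a few lines. `apply_prefix` and `truncate` are hypothetical helpers written for this illustration. In particular, the sketch assumes truncation keeps the leading values of the vector, and it does not re-normalize afterwards; whether the node does so is not stated here.

```python
def apply_prefix(texts, prefix):
    """Prepend the configured prefix to each text; a blank prefix is a no-op."""
    if not prefix:
        return list(texts)
    return [prefix + text for text in texts]


def truncate(vector, truncate_dim):
    """Keep the first truncate_dim values; 0 leaves the vector unchanged."""
    if truncate_dim <= 0 or truncate_dim >= len(vector):
        return list(vector)
    return list(vector[:truncate_dim])


documents = apply_prefix(["The sky is blue."], "search_document: ")
queries = apply_prefix(["What color is the sky?"], "search_query: ")
print(documents)  # ['search_document: The sky is blue.']
print(queries)    # ['search_query: What color is the sky?']
print(truncate([0.1, 0.2, 0.3, 0.4], 2))  # [0.1, 0.2]
```

Documents and queries receive different prefixes, so an asymmetric model sees the same markers it was trained with on each side.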
The node also exposes an embedding.preprocessor combo field (default preprocessor_langchain) that selects the chunking preprocessor used ahead of embedding.
Profiles
| Profile | Model | Notes |
|---|---|---|
| miniLM (default) | sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | General use, good performance |
| miniAll | sentence-transformers/all-MiniLM-L6-v2 | General use alternative |
| mpnet | sentence-transformers/multi-qa-mpnet-base-cos-v1 | Higher quality |
| custom | (user-specified) | Any Hugging Face model |
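One way to read the table is as a lookup from profile name to model name, with `custom` deferring to the user-supplied model. The sketch below illustrates that reading; `resolve_model` and the flat dictionary with dotted keys are assumptions for the example and may not match how the node stores its configuration.

```python
PROFILES = {
    "miniLM": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
    "miniAll": "sentence-transformers/all-MiniLM-L6-v2",
    "mpnet": "sentence-transformers/multi-qa-mpnet-base-cos-v1",
}
DEFAULT_PROFILE = "miniLM"


def resolve_model(config):
    """Map a configuration dict to the model name it selects (hypothetical)."""
    profile = config.get("embedding.profile", DEFAULT_PROFILE)
    if profile == "custom":
        model = config.get("embedding.model", "")
        if not model:
            raise ValueError("the custom profile requires embedding.model")
        return model
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    return PROFILES[profile]


print(resolve_model({}))
print(resolve_model({"embedding.profile": "mpnet"}))
print(resolve_model({
    "embedding.profile": "custom",
    "embedding.model": "example-org/example-model",  # placeholder name
}))
```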
Schema
| Field | Type | Description | Default |
|---|---|---|---|
| embedding.document_prefix | string | Document prefix: prefix prepended to document text before encoding (e.g. 'search_document: ', 'passage: ') | |
| embedding.model | string | Model name: Hugging Face model to use for embedding | |
| embedding.preprocessor | "preprocessor_langchain" | | |
| embedding.profile | string | Model: embedding model | "miniLM" |
| embedding.query_prefix | string | Query prefix: prefix prepended to query text before encoding (e.g. 'search_query: ', 'query: ') | |
| embedding.truncate_dim | number | Truncate dimensions: truncate embeddings to this dimensionality (0 = use model default) | |
Dependencies
numpy