Transformer

A RocketRide embedding node that converts text into vector representations using local sentence-transformer models.

What it does

Generates text embeddings using local sentence-transformer models. The node runs on the model server, so no API key is required, and it is GPU-accelerated when a GPU is available (the node declares the gpu capability).

The node uses the SentenceTransformer class to load the configured Hugging Face model at pipeline start. On load, it reports the model's vector size and maximum token count, and it streams loading progress via monitor status so that long model downloads are visible in the UI.
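As a rough illustration of the load step, the sketch below loads a model with the sentence-transformers library and reads back the two values the node reports. The helper name describe_model is hypothetical and is not part of RocketRide. The node's own loading and monitor-status code is not shown on this page, so progress reporting is left out.

```python
def describe_model(
    model_name: str = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
) -> dict:
    """Load a sentence-transformer model and report its basic properties.

    Hypothetical helper for illustration only; not RocketRide's implementation.
    """
    # Imported lazily so this module can be imported without the library installed.
    from sentence_transformers import SentenceTransformer

    # The first call downloads the model from Hugging Face if it is not cached.
    model = SentenceTransformer(model_name)
    return {
        "model": model_name,
        "vector_size": model.get_sentence_embedding_dimension(),
        "max_tokens": model.max_seq_length,
    }
```

Calling describe_model() with no arguments loads the model behind the default miniLM profile.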

Documents are encoded in batches: incoming document chunks are buffered until 64 documents accumulate, then encoded in a single batch and written downstream. Any remaining documents are flushed when the input closes. Questions are not buffered: all questions in a request are encoded immediately, together in one batch.

Each encoded item gets two fields set: embedding (the vector as a list of floats) and embedding_model (the model name that produced it).
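The buffering and field assignment described above can be sketched roughly as follows. This is a minimal illustration, not the node's code: the DocumentBatcher class, the "text" key on each document, the written list that stands in for the downstream lane, and the fake_encode placeholder are all assumptions made for the example.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 64  # batch size stated in the text above


class DocumentBatcher:
    """Buffers documents and encodes them in batches (illustrative sketch)."""

    def __init__(
        self,
        encode: Callable[[List[str]], List[List[float]]],
        model_name: str,
        batch_size: int = BATCH_SIZE,
    ) -> None:
        self.encode = encode
        self.model_name = model_name
        self.batch_size = batch_size
        self.buffer: List[Dict] = []
        self.written: List[Dict] = []  # stands in for the downstream lane

    def add(self, doc: Dict) -> None:
        """Buffer one document and encode once the batch is full."""
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Encode whatever is buffered; also call this when the input closes."""
        if not self.buffer:
            return
        vectors = self.encode([d["text"] for d in self.buffer])
        for doc, vec in zip(self.buffer, vectors):
            doc["embedding"] = [float(x) for x in vec]  # vector as a list of floats
            doc["embedding_model"] = self.model_name    # model that produced it
            self.written.append(doc)
        self.buffer = []


def fake_encode(texts: List[str]) -> List[List[float]]:
    # Placeholder encoder: a real one would call the loaded model.
    return [[float(len(t)), 0.0] for t in texts]
```

Passing the encoder in as a callable keeps the batching logic independent of any particular model, which makes it easy to exercise with a stub such as fake_encode.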


Configuration

Lanes

| Lane in | Lane out | Description |
| --- | --- | --- |
| documents | documents | Embed document chunks, attach vector to each document |
| questions | questions | Embed a question for vector similarity lookup |

The questions lane is used when querying a vector store: the store expects an embedded question to compare against stored document vectors.
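To make the comparison concrete, here is a small sketch of ranking stored documents against an embedded question. It assumes cosine similarity as the metric and a plain in-memory list as the store. Real vector stores use their own indexes and metrics, and the function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    question_vec: Sequence[float], documents: List[Dict]
) -> List[Tuple[float, Dict]]:
    """Score each document's 'embedding' against the question, best match first."""
    scored = [(cosine_similarity(question_vec, d["embedding"]), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

The comparison is only meaningful when the question and the documents were embedded by the same model, which is what the embedding_model field on each item lets a consumer check.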

Fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | Hugging Face model to use for embedding |
| truncate_dim | number | Truncate embeddings to this dimensionality (0 = use model default) |
| document_prefix | string | Prefix prepended to document text before encoding (e.g. 'search_document: ', 'passage: ') |
| query_prefix | string | Prefix prepended to query text before encoding (e.g. 'search_query: ', 'query: ') |
| profile | string | Embedding model. Default "miniLM". |

Custom model options (shown when the custom profile is selected):

| Field | Type / Default | Description |
| --- | --- | --- |
| embedding.model | string | Any Hugging Face sentence-transformer model name |
| embedding.truncate_dim | number | Truncate embeddings to this dimensionality (0 = use model default) |
| embedding.document_prefix | string | Prefix prepended to document text before encoding (e.g. 'search_document: ', 'passage: ') |
| embedding.query_prefix | string | Prefix prepended to query text before encoding (e.g. 'search_query: ', 'query: ') |

The prefixes matter for asymmetric models (such as Nomic or E5) that were trained with distinct document/query markers; leave both blank for the bundled symmetric profiles.
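The effect of the two prefix fields can be illustrated with a short sketch. The apply_prefix helper is hypothetical and shows only the string handling, under the assumption that an empty prefix leaves the text unchanged.

```python
from typing import List


def apply_prefix(texts: List[str], prefix: str = "") -> List[str]:
    """Prepend the configured prefix to each text; a blank prefix is a no-op."""
    if not prefix:
        return list(texts)
    return [prefix + text for text in texts]


# Documents and queries get different markers for an asymmetric model.
doc_inputs = apply_prefix(["Paris is the capital of France."], "search_document: ")
query_inputs = apply_prefix(["What is the capital of France?"], "search_query: ")
```

Note that the example prefixes end with a space, so the marker and the text are not run together.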

The node also exposes an embedding.preprocessor combo field (default preprocessor_langchain) that selects the chunking preprocessor used ahead of embedding.


Profiles

| Profile | Model | Notes |
| --- | --- | --- |
| miniLM (default) | sentence-transformers/multi-qa-MiniLM-L6-cos-v1 | General use, good performance |
| miniAll | sentence-transformers/all-MiniLM-L6-v2 | General use alternative |
| mpnet | sentence-transformers/multi-qa-mpnet-base-cos-v1 | Higher quality |
| custom | (user-specified) | Any Hugging Face model |
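One way to picture how a profile maps to a model name is the sketch below. The resolve_model function and the flat config dictionary are assumptions for illustration, and the node's actual configuration handling may differ. In particular, applying truncate_dim to the bundled profiles is a guess; the page documents it only as a field where 0 means the model default.

```python
from typing import Dict, Optional, Tuple

# Bundled profiles and their models, as listed in the table above.
PROFILES: Dict[str, str] = {
    "miniLM": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
    "miniAll": "sentence-transformers/all-MiniLM-L6-v2",
    "mpnet": "sentence-transformers/multi-qa-mpnet-base-cos-v1",
}


def resolve_model(config: Dict) -> Tuple[str, Optional[int]]:
    """Return (model name, truncation dimension or None) for a config dict.

    Hypothetical helper; the dict keys mirror the field names on this page.
    """
    profile = config.get("profile", "miniLM")
    if profile == "custom":
        model = config["model"]  # user-specified Hugging Face model name
    else:
        model = PROFILES[profile]
    dim = int(config.get("truncate_dim", 0) or 0)
    # 0 means "use model default", represented here as None.
    return model, (dim if dim > 0 else None)
```

Returning None for a zero truncate_dim keeps "no truncation" distinct from any real dimension when the value is passed on to a model loader.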

Schema

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| embedding.document_prefix | string | Document prefix. Prefix prepended to document text before encoding (e.g. 'search_document: ', 'passage: ') | |
| embedding.model | string | Model name. Hugging Face model to use for embedding | |
| embedding.preprocessor | | | "preprocessor_langchain" |
| embedding.profile | string | Model. Embedding model | "miniLM" |
| embedding.query_prefix | string | Query prefix. Prefix prepended to query text before encoding (e.g. 'search_query: ', 'query: ') | |
| embedding.truncate_dim | number | Truncate dimensions. Truncate embeddings to this dimensionality (0 = use model default) | |

Dependencies

  • numpy