Vectorizer
Chunks incoming text and tables, embeds them, and writes the resulting documents to the configured store.
What it does
The vectorizer is an internal (capabilities: ["internal"]) filter in the ingestion path,
registered under the vectorizer:// protocol. As text and tables flow through, it:
- Checks whether the current object is flagged for vectorization (FLAGS.VECTORIZE). Objects without the flag pass through untouched, and empty text is skipped.
- Splits the text into chunks using the configured preprocessor.
- Builds per-chunk document metadata: chunk id, table flag and table id, deletion flag, and the object's permission id (-1 when the object carries none). Chunk and table counters reset for every new object.
- Computes embeddings for the chunks via the embedding component.
- Persists the chunks: either directly to the store (instance mode) or by writing them downstream to the endpoint store driver (transform mode).
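The steps above can be sketched roughly as follows. This is a minimal illustration, not the node's actual code: the class and field names (VectorizerSketch, SourceObject, ChunkDoc), the metadata keys, the VECTORIZE bit value, and the split/embed/persist callables are all hypothetical stand-ins for the engine's real types. How table ids are assigned is also an assumption here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

VECTORIZE = 0x1  # hypothetical bit standing in for FLAGS.VECTORIZE


@dataclass
class SourceObject:
    """Hypothetical stand-in for the engine's object record."""
    object_id: str
    flags: int = 0
    permission_id: Optional[int] = None


@dataclass
class ChunkDoc:
    """Hypothetical per-chunk document: text, metadata, and vector."""
    text: str
    metadata: dict
    embedding: List[float] = field(default_factory=list)


class VectorizerSketch:
    def __init__(self, split, embed, persist):
        self.split = split      # text -> list of chunk strings
        self.embed = embed      # list of strings -> list of vectors
        self.persist = persist  # list of ChunkDoc -> None
        self.obj = None
        self.chunk_id = 0
        self.table_id = 0

    def open(self, obj):
        # Chunk and table counters reset for every new object.
        self.obj = obj
        self.chunk_id = 0
        self.table_id = 0

    def write(self, text, is_table=False):
        obj = self.obj
        # Unflagged objects pass through untouched; empty text is skipped.
        if obj is None or not (obj.flags & VECTORIZE) or not text:
            return []
        docs = []
        for chunk in self.split(text):
            docs.append(ChunkDoc(text=chunk, metadata={
                "objectId": obj.object_id,
                "chunkId": self.chunk_id,
                "isTable": is_table,
                "tableId": self.table_id if is_table else 0,
                "isDeleted": False,
                # -1 marks an object that carries no permission id.
                "permissionId": -1 if obj.permission_id is None else obj.permission_id,
            }))
            self.chunk_id += 1
        if is_table:
            self.table_id += 1  # assumption: one id per table written
        if not docs:
            return []
        # Embed all chunk texts in one call, then attach the vectors.
        for doc, vector in zip(docs, self.embed([d.text for d in docs])):
            doc.embedding = vector
        self.persist(docs)
        return docs
```

Passing the splitter, embedder, and persistence step in as callables keeps the sketch independent of any particular provider, which mirrors how the node delegates each step to a separately configured component.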
On retrieval (renderObject), it pulls previously vectorized content back out of the
store and feeds it to the text writer, suppressing the default rendering path.
There are no user-facing config fields, no lanes, no profiles, and no classType: the
engine wires this node up for you rather than you placing it by hand.
Configuration
The node has no fields of its own. At startup it reads three optional multi-provider sections from the connection config and instantiates the matching component for each one that is present:
| Section | Resolved via | Purpose |
|---|---|---|
| preprocessor | getPreprocessor | Splits incoming text/tables into chunks |
| embedding | getEmbedding | Encodes chunks into vectors |
| store | getStore | Persists chunks and serves them back on render |
No credentials are configured here; each component carries its own provider config. In config open mode, nothing is initialized at all.
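As a rough sketch of that startup logic: the function below instantiates a component only for sections that are present and creates nothing in config mode. The function name build_components, the string "config" for the open mode, and the factory callables are assumptions for illustration; the real getPreprocessor, getEmbedding, and getStore signatures are not shown in this document.

```python
from typing import Any, Callable, Dict


def build_components(
    connection_config: Dict[str, Any],
    open_mode: str,
    factories: Dict[str, Callable[[Any], Any]],
) -> Dict[str, Any]:
    """Instantiate one component per section that is present in the config."""
    # In config open mode, nothing is initialized at all.
    if open_mode == "config":
        return {}
    components = {}
    for section, factory in factories.items():
        section_config = connection_config.get(section)
        # Every section is optional: skip the ones that are absent.
        if section_config is not None:
            components[section] = factory(section_config)
    return components
```

A caller would pass a mapping such as {"preprocessor": ..., "embedding": ..., "store": ...} whose values wrap the corresponding resolver functions.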
requirements.txt is intentionally empty: the node relies on the separately installed
AI module, which brings its own dependencies.
Modes
Behavior depends on the endpoint's open mode:
- Instance: chunks are added directly to the store via addChunks. The object's vectorBatchId is reset to 0 when processing opens and set to 1 on close, marking the object as vectorized.
- Transform: chunks are written downstream with writeDocuments, and the endpoint store driver handles persistence.
- Config: no preprocessor, embedding, or store is created.
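The mode split might look something like the sketch below. Only the names addChunks, writeDocuments, and vectorBatchId come from this document; the OpenMode enum, the dict-shaped object, and the on_open/persist_chunks/on_close helpers are hypothetical, and the real method signatures may differ.

```python
from enum import Enum


class OpenMode(Enum):
    """Hypothetical enumeration of the endpoint's open modes."""
    INSTANCE = "instance"
    TRANSFORM = "transform"
    CONFIG = "config"


def on_open(mode, obj):
    # Instance mode: reset the marker when processing opens.
    if mode is OpenMode.INSTANCE:
        obj["vectorBatchId"] = 0


def persist_chunks(mode, docs, store=None, downstream=None):
    if mode is OpenMode.INSTANCE:
        # Write straight to the store.
        store.addChunks(docs)
    elif mode is OpenMode.TRANSFORM:
        # Hand off downstream; the endpoint store driver persists.
        downstream.writeDocuments(docs)
    # Config mode: no components exist, so there is nothing to do.


def on_close(mode, obj):
    # Instance mode: mark the object as vectorized on close.
    if mode is OpenMode.INSTANCE:
        obj["vectorBatchId"] = 1
```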
Rendering
When an object is rendered, the node first checks vectorBatchId: if the object was
never vectorized (vectorBatchId == 0), it does nothing and default rendering proceeds.
Otherwise it retrieves the stored text for the object id from the store, streams it back
through the text writer, and calls preventDefault() so the content is served from the
vector store rather than re-extracted.
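The render path described above can be sketched as a short guard followed by a stream-back loop. Apart from vectorBatchId and preventDefault, the names here are assumptions: render_object, the dict-shaped object, the store's get_text method (assumed to yield stored text pieces for an object id), and the write_text callable are hypothetical stand-ins.

```python
def render_object(obj, store, write_text, prevent_default):
    """Serve stored text for a vectorized object; return True if handled."""
    # Never vectorized: do nothing and let default rendering proceed.
    if obj.get("vectorBatchId", 0) == 0:
        return False
    # Stream the stored text for this object id back through the writer.
    for piece in store.get_text(obj["objectId"]):
        write_text(piece)
    # Suppress the default path so the content is not re-extracted.
    prevent_default()
    return True
```

Returning a boolean is a convenience of this sketch so a caller can tell whether the default path was suppressed.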
Schema
No configuration fields.