# vectorizer

Chunks incoming text and tables, embeds them, and writes the resulting documents to the configured store.

## What it does

The vectorizer is an internal (`capabilities: ["internal"]`) filter in the ingestion path,
registered under the `vectorizer://` protocol. As text and tables flow through, it:

1. Checks whether the current object is flagged for vectorization (`FLAGS.VECTORIZE`).
   Objects without the flag pass through untouched, and empty text is skipped.
2. Splits the text into chunks using the configured preprocessor.
3. Builds per-chunk document metadata: chunk id, table flag and table id, deletion flag,
   and the object's permission id (`-1` when the object carries none). Chunk and table
   counters reset for every new object.
4. Computes embeddings for the chunks via the embedding component.
5. Persists the chunks: either directly to the store (instance mode) or by writing them
   downstream to the endpoint store driver (transform mode).
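
As a rough illustration of steps 1 to 3, the sketch below gates on the flag, chunks the text, and attaches per-chunk metadata. It is not the node's actual code: the `VECTORIZE` bit value, the `ChunkMeta` type, the fixed-width `split_fixed` splitter, the zero-based chunk ids, and the `table_id=0` default for non-table chunks are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical bit value standing in for FLAGS.VECTORIZE.
VECTORIZE = 0x1


@dataclass
class ChunkMeta:
    """Illustrative per-chunk metadata; field names are assumptions."""
    chunk_id: int
    is_table: bool
    table_id: int
    is_deleted: bool
    permission_id: int


def split_fixed(text: str, size: int = 20) -> List[str]:
    """Toy stand-in for the configured preprocessor: fixed-width slices."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_chunks(
    flags: int, text: str, permission_id: Optional[int] = None
) -> List[Tuple[str, ChunkMeta]]:
    """Return (chunk, metadata) pairs, or an empty list when the object is skipped."""
    # Step 1: unflagged objects and empty text produce no chunks.
    if not flags & VECTORIZE or not text:
        return []
    # Step 3: objects without a permission id are recorded as -1.
    perm = -1 if permission_id is None else permission_id
    # Chunk ids restart for every call, mirroring the per-object counter reset.
    return [
        (
            chunk,
            ChunkMeta(
                chunk_id=i,
                is_table=False,
                table_id=0,
                is_deleted=False,
                permission_id=perm,
            ),
        )
        for i, chunk in enumerate(split_fixed(text))
    ]
```

Calling `build_chunks` twice yields chunk ids that start over each time, which is the behavior the counter reset in step 3 describes.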

On retrieval (`renderObject`), it pulls previously vectorized content back out of the
store and feeds it to the text writer, suppressing the default rendering path.

There are no user-facing config fields, no lanes, no profiles, and no `classType`: the
engine wires this node up for you rather than you placing it by hand.

---

## Configuration

The node has no fields of its own. At startup it reads three optional multi-provider
sections from the connection config and instantiates the matching component for each one
that is present:

| Section        | Resolved via      | Purpose                                        |
|----------------|-------------------|------------------------------------------------|
| `preprocessor` | `getPreprocessor` | Splits incoming text/tables into chunks        |
| `embedding`    | `getEmbedding`    | Encodes chunks into vectors                    |
| `store`        | `getStore`        | Persists chunks and serves them back on render |

No credentials are configured here; each component carries its own provider config. In
config open mode, nothing is initialized at all.
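
The startup behavior described above can be pictured as follows. This is a minimal sketch under stated assumptions, not the engine's implementation: `init_components`, the `factories` mapping, and the `"config"` mode string are hypothetical stand-ins for `getPreprocessor`, `getEmbedding`, `getStore`, and the real open-mode value.

```python
from typing import Any, Callable, Dict

# The three optional sections the node looks for in the connection config.
SECTIONS = ("preprocessor", "embedding", "store")


def init_components(
    config: Dict[str, Any],
    factories: Dict[str, Callable[[Any], Any]],
    open_mode: str,
) -> Dict[str, Any]:
    """Instantiate one component per section that is present in the config."""
    # Assumed mode string: in config open mode nothing is created.
    if open_mode == "config":
        return {}
    # Absent sections are simply skipped; each present section is passed
    # its own provider config.
    return {
        name: factories[name](config[name])
        for name in SECTIONS
        if name in config
    }
```

With only an `embedding` section in the config, the result holds a single component, and nothing is built for the missing `preprocessor` and `store` sections.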

`requirements.txt` is intentionally empty because the node relies on the separately
installed AI module, which brings its own dependencies.

---

## Modes

Behavior depends on the endpoint's open mode:

- **Instance**: chunks are added directly to the store via `addChunks`. The object's
  `vectorBatchId` is reset to `0` when processing opens and set to `1` on close, marking
  the object as vectorized.
- **Transform**: chunks are written downstream with `writeDocuments` and the endpoint
  store driver handles persistence.
- **Config**: no preprocessor, embedding, or store is created.
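
The mode split amounts to a small dispatch, sketched here. The `OpenMode` enum, the `persist` function, and the two callables are hypothetical: `add_chunks` stands in for the store's `addChunks` and `write_documents` for `writeDocuments`, whose real signatures are not shown in this document.

```python
from enum import Enum
from typing import Callable, List


class OpenMode(Enum):
    """Illustrative names for the endpoint's open modes."""
    INSTANCE = "instance"
    TRANSFORM = "transform"
    CONFIG = "config"


def persist(
    mode: OpenMode,
    chunks: List[str],
    add_chunks: Callable[[List[str]], None],
    write_documents: Callable[[List[str]], None],
) -> str:
    """Route chunks by open mode and report where they went."""
    if mode is OpenMode.INSTANCE:
        # Instance mode: write straight to the store.
        add_chunks(chunks)
        return "store"
    if mode is OpenMode.TRANSFORM:
        # Transform mode: hand off downstream; the endpoint store driver persists.
        write_documents(chunks)
        return "downstream"
    # Config mode: no components exist, so there is nothing to persist.
    return "skipped"
```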

---

## Rendering

When an object is rendered, the node first checks `vectorBatchId`: if the object was
never vectorized (`vectorBatchId == 0`), it does nothing and default rendering proceeds.
Otherwise it retrieves the stored text for the object id from the store, streams it back
through the text writer, and calls `preventDefault()` so the content is served from the
vector store rather than re-extracted.
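
The render path reduces to a guard plus a replay loop, as in the sketch below. Every name here is an assumption for illustration: `render_object` is not the node's `renderObject` signature, `stored_text` is a plain dict standing in for the store lookup, `write_text` stands in for the text writer, and the boolean return value models whether `preventDefault()` would be called.

```python
from typing import Callable, Dict, List


def render_object(
    vector_batch_id: int,
    object_id: str,
    stored_text: Dict[str, List[str]],
    write_text: Callable[[str], None],
) -> bool:
    """Return True when default rendering should be suppressed."""
    # Never vectorized: do nothing and let default rendering proceed.
    if vector_batch_id == 0:
        return False
    # Replay the stored pieces for this object through the text writer.
    for piece in stored_text.get(object_id, []):
        write_text(piece)
    # Content came from the store, so the default path is skipped.
    return True
```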

---

<!-- ROCKETRIDE:GENERATED:PARAMS START -->
<!-- Generated by nodes:docs-generate. Do not edit by hand. -->

## Schema

_No configuration fields._

## Source

[<svg viewBox="0 0 16 16" width="15" height="15" fill="currentColor" aria-hidden="true" style="vertical-align:-0.15em;margin-right:0.35em"><path d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0016 8c0-4.42-3.58-8-8-8z"/></svg> View source](https://github.com/rocketride-org/rocketride-server/tree/develop/nodes/src/nodes/vectorizer)
<!-- ROCKETRIDE:GENERATED:PARAMS END -->
