Parse/Process/Embed

An internal RocketRide meta-node ("Parse/Process/Embed") that assembles the document parse, preprocess, embed, and store filter stack at pipeline build time.

What it does

This is an internal node: it is wired automatically by the pipeline engine, and most users will not add it manually.

Rather than processing data itself, this node (autopipe) inspects the task configuration during beginGlobal and inserts the appropriate filter nodes (parser, OCR, indexer, preprocessor, embedding, vector store) into the pipeline. Which filters are inserted depends on the engine's current operation mode (CONFIG, SOURCE_INDEX, INDEX, INSTANCE, or TRANSFORM) and on which sub-configurations are present.

The node registers as a filter with class type other and capability internal. It defines no lanes of its own; all data flows through the filters it inserts.


What gets inserted

| Mode | Filters inserted |
| --- | --- |
| INDEX | vector store (if store configured) → indexer |
| INSTANCE | parse → (OCR if enabled) (indexer if enabled) (preprocessor) (embedding) (store) |
| TRANSFORM | parse → (OCR if enabled) (preprocessor) (embedding) (store) |
| CONFIG, SOURCE_INDEX | nothing |

Filter instances are inserted with fixed ids: parse_1, ocr_1, indexer_1, preprocessor_1, embedding_1, vector_1.
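
The table above can be read as a small decision function. The following is a minimal sketch of that reading, not the engine's code: plan_filters and its parameters are hypothetical, and the ordering simply follows the order in which the table lists the filters. Only the mode names and the filter ids come from this page.

```python
from typing import List

# Fixed filter ids as listed on this page.
PARSE, OCR, INDEXER = "parse_1", "ocr_1", "indexer_1"
PREPROCESSOR, EMBEDDING, VECTOR = "preprocessor_1", "embedding_1", "vector_1"


def plan_filters(mode: str, autopipe_cfg: dict,
                 ocr: bool = False, index: bool = False) -> List[str]:
    """Return the filter ids to insert, in order (hypothetical helper)."""
    if mode in ("CONFIG", "SOURCE_INDEX"):
        return []  # nothing is inserted in these modes
    if mode == "INDEX":
        # Vector store only if a store is configured, then the indexer.
        plan = [VECTOR] if "store" in autopipe_cfg else []
        plan.append(INDEXER)
        return plan
    if mode not in ("INSTANCE", "TRANSFORM"):
        raise ValueError(f"unknown mode: {mode}")

    plan = [PARSE]
    if ocr:
        plan.append(OCR)
    if index and mode == "INSTANCE":
        plan.append(INDEXER)  # never inserted for transform tasks
    # These three are inserted only when their key is present in the config.
    for key, filter_id in (("preprocessor", PREPROCESSOR),
                           ("embedding", EMBEDDING),
                           ("store", VECTOR)):
        if key in autopipe_cfg:
            plan.append(filter_id)
    return plan
```

For example, plan_filters("TRANSFORM", {"embedding": {}}, ocr=True, index=True) yields parse_1, ocr_1, embedding_1: the index flag is ignored because the indexer is never inserted for transform tasks.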

The OCR and indexer toggles are not autopipe config fields. They are read from the include entries of the task configuration (service section for instance tasks, source section for transform tasks). A filter is enabled when any include entry sets the corresponding key (ocr or index) to true. The indexer is never inserted for transform tasks.
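
As an illustration of the "any include entry" rule, here is a minimal sketch. It assumes the task configuration is a dict whose service and source sections each hold an include list of dicts; that layout and the helper name toggle_enabled are assumptions made for the example, not a documented schema.

```python
from typing import Any, Dict


def toggle_enabled(task_config: Dict[str, Any], mode: str, key: str) -> bool:
    """True when any include entry sets `key` ("ocr" or "index") to true.

    Hypothetical helper; the config layout is assumed for illustration.
    """
    if mode == "TRANSFORM" and key == "index":
        return False  # the indexer is never inserted for transform tasks
    # Instance tasks read the service section, transform tasks the source section.
    section = {"INSTANCE": "service", "TRANSFORM": "source"}.get(mode)
    if section is None:
        return False
    includes = task_config.get(section, {}).get("include", [])
    return any(entry.get(key) is True for entry in includes)
```

A single entry is enough: with two include entries where only the second sets ocr to true, toggle_enabled(cfg, "INSTANCE", "ocr") returns True.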

Where autopipe reads its own configuration also depends on the mode: for INSTANCE and INDEX tasks it comes from the autopipe key of the task configuration; for TRANSFORM tasks it comes from the autopipe key of the service config parameters.
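
The mode-dependent lookup can be sketched as follows. The function name and both dict arguments are hypothetical; in particular, how the service config parameters are obtained is not described on this page, so they are passed in as a separate argument rather than derived from the task configuration.

```python
from typing import Any, Dict


def resolve_autopipe_config(mode: str,
                            task_config: Dict[str, Any],
                            service_params: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the autopipe configuration source for a mode (hypothetical helper)."""
    if mode in ("INSTANCE", "INDEX"):
        # Read from the autopipe key of the task configuration.
        return task_config.get("autopipe", {})
    if mode == "TRANSFORM":
        # Read from the autopipe key of the service config parameters.
        return service_params.get("autopipe", {})
    # CONFIG and SOURCE_INDEX insert nothing, so no configuration is needed.
    return {}
```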


Configuration

Each field is a multi-provider sub-configuration (provider key plus a provider-named config block). The pipe shape exposes remote, embedding, and store; the transform shape exposes only remote and embedding.

| Field | Default | Description |
| --- | --- | --- |
| remote | remote provider, local profile | Remote processing target (see Remote processing below) |
| preprocessor | preprocessor_langchain, default profile | Text chunking before embedding |
| embedding | embedding_transformer, miniLM profile | Embedding model used to vectorize chunks |
| store | qdrant, local profile (collection ROCKETRIDE, localhost:6333) | Vector store the embeddings are written to |

Defaults come from the default preconfig profile in services.json. The preprocessor, embedding, and store filters are each only inserted when the corresponding key is present in the resolved autopipe configuration.
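
To make the "provider key plus a provider-named config block" shape concrete, here is a minimal sketch expressed as a Python dict. The provider and profile names are taken from the defaults table; the key names provider, profile, collection, host, and port, and the helper provider_block, are assumptions for illustration and may not match the actual layout in services.json.

```python
from typing import Any, Dict, Tuple

# Illustrative values from the defaults table; key names are assumed.
autopipe_config: Dict[str, Dict[str, Any]] = {
    "preprocessor": {
        "provider": "preprocessor_langchain",
        "preprocessor_langchain": {"profile": "default"},
    },
    "embedding": {
        "provider": "embedding_transformer",
        "embedding_transformer": {"profile": "miniLM"},
    },
    "store": {
        "provider": "qdrant",
        "qdrant": {
            "profile": "local",
            "collection": "ROCKETRIDE",
            "host": "localhost",
            "port": 6333,
        },
    },
}


def provider_block(sub_config: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Split a sub-configuration into (provider name, provider-named block)."""
    name = sub_config["provider"]
    return name, sub_config.get(name, {})
```

Under these assumptions, removing the store key from autopipe_config would mean no vector store filter is inserted, since each of the three filters depends on its key being present.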


Remote processing

The implementation distinguishes local and remote filter placement, and the default profile carries a remote sub-configuration (host, port, apikey, mode: local). However, the remote-pipeline assembly path is currently commented out in IGlobal.endGlobal and the remote queue is never dispatched: all inserted filters run locally regardless of the remote setting.


Schema

No configuration fields.