MongoDB Atlas

A RocketRide vector store node that stores embedded document chunks in MongoDB Atlas and retrieves them by semantic or keyword search.

What it does

Ingests pre-embedded document chunks into a MongoDB Atlas collection and searches them in response to questions. Semantic search runs an Atlas $vectorSearch aggregation against the question's embedding; keyword search uses MongoDB full-text search ($text) against the question's text.

The node uses pymongo (with dnspython for mongodb+srv:// URI resolution), so no local MongoDB tooling is required on the machine running the engine.

Requires a MongoDB Atlas M10+ cluster or a serverless instance: Atlas Vector Search indexes are not available on free-tier (M0) clusters. On first ingest, the node automatically creates the collection, regular indexes on all metadata fields (meta.nodeId, meta.objectId, meta.parent, meta.permissionId, meta.isDeleted, meta.isTable, meta.chunkId, meta.tableId), a text index for keyword search, and a knnVector search index sized to the embedding dimensions. If Atlas rejects the search-index creation, the node logs the error and continues; in that case, create the vector search index manually in the Atlas UI.
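The first-ingest setup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the node's actual code: the text index on `content`, the `cosine` similarity, `"dynamic": True`, and the index name `vector_index` are all hypothetical choices, and `ensure_indexes` is a made-up helper name. The `collection` argument is expected to behave like a pymongo `Collection`.

```python
import logging
from typing import Any

log = logging.getLogger(__name__)

# Metadata fields that receive regular indexes, as listed in the text above.
META_INDEX_FIELDS = [
    "meta.nodeId", "meta.objectId", "meta.parent", "meta.permissionId",
    "meta.isDeleted", "meta.isTable", "meta.chunkId", "meta.tableId",
]


def knn_vector_index_definition(dimensions: int, similarity: str = "cosine") -> dict[str, Any]:
    """Build an Atlas Search index definition with a knnVector field on `embedding`."""
    return {
        "mappings": {
            "dynamic": True,  # assumption: other fields are indexed dynamically
            "fields": {
                "embedding": {
                    "type": "knnVector",
                    "dimensions": dimensions,
                    "similarity": similarity,  # assumption: the metric is configurable
                }
            },
        }
    }


def ensure_indexes(collection, dimensions: int, index_name: str = "vector_index") -> None:
    """Create regular, text, and vector indexes; log and continue if Atlas rejects the last."""
    for field in META_INDEX_FIELDS:
        collection.create_index(field)
    # Assumption: keyword search indexes the chunk text stored under `content`.
    collection.create_index([("content", "text")])
    try:
        collection.create_search_index(
            {"name": index_name, "definition": knn_vector_index_definition(dimensions)}
        )
    except Exception as exc:  # mirrors "logs the error and continues"
        log.error("Search index creation failed; create it in the Atlas UI: %s", exc)
```

With a real cluster, `collection` would come from `pymongo.MongoClient(uri)[database][name]`; `create_search_index` is available in PyMongo 4.5 and later.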

Documents must be run through an embedding node before reaching this node. A chunk without an embedding raises an error at ingest.


Configuration

Lanes

| Lane in | Lane out | Description |
|---|---|---|
| documents | (none) | Ingest pre-embedded documents into the collection |
| questions | documents | Return matching documents |
| questions | answers | Return matching documents as an answer |
| questions | questions | Enrich the question with matching documents for downstream nodes |

Fields

| Field | Type | Description |
|---|---|---|
| provider | string | Default "atlas". |

Configuration is validated when the pipeline is saved: the host must match the mongodb+srv://user:pass@cluster.xxxxx.mongodb.net/?... URI pattern and all field-length and character restrictions listed above are checked at that time.
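As an illustration of the host check, the sketch below tests a string against the URI shape quoted above. The regular expression and the function name `looks_like_atlas_uri` are hypothetical; the node's real validation rules are not shown in this document and may be stricter or looser.

```python
import re

# Hypothetical pattern approximating mongodb+srv://user:pass@cluster.xxxxx.mongodb.net/?...
ATLAS_SRV_URI = re.compile(
    r"^mongodb\+srv://"                               # SRV scheme
    r"[^:@/\s]+:[^@/\s]+@"                            # user:pass@
    r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.mongodb\.net"  # cluster host ending in .mongodb.net
    r"(/.*)?$"                                        # optional path and query options
)


def looks_like_atlas_uri(host: str) -> bool:
    """Return True if `host` has the general shape of an Atlas SRV connection string."""
    return ATLAS_SRV_URI.match(host) is not None
```

For example, `looks_like_atlas_uri("mongodb://localhost:27017")` is false because the scheme is not `mongodb+srv://`.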


Search modes

Semantic uses Atlas Vector Search against the question's embedding. The pipeline requests limit × 10 candidates (numCandidates) and returns up to limit results (default 25). A non-zero offset is not supported and raises an error.
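A minimal sketch of how such a semantic query could be assembled, assuming the vector lives in the `embedding` field and the search index is named `vector_index` (a hypothetical name). The function name and the `$project` stage are illustrative, not taken from the node's source.

```python
from typing import Any


def build_vector_search_pipeline(
    query_vector: list[float],
    limit: int = 25,
    offset: int = 0,
    index_name: str = "vector_index",  # hypothetical index name
    path: str = "embedding",
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline for an Atlas $vectorSearch query."""
    if offset != 0:
        # Semantic mode has no paging, so a non-zero offset is rejected.
        raise ValueError("semantic search does not support a non-zero offset")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": limit * 10,  # ten candidates per requested result
                "limit": limit,
            }
        },
        # Expose the raw similarity score alongside the stored fields.
        {"$project": {"content": 1, "meta": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]
```

The resulting list can be passed to `collection.aggregate(pipeline)` on a pymongo collection.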

Keyword uses MongoDB full-text search against the question's text, sorted by text score, with offset/limit paging (default limit 25).
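Keyword mode could be expressed as an aggregation along these lines. This is a sketch only: the function name is hypothetical, and the node may issue a `find` with equivalent options instead of an aggregation.

```python
from typing import Any


def build_text_search_pipeline(
    question_text: str, limit: int = 25, offset: int = 0
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline for a $text query with offset/limit paging."""
    return [
        # $text must appear in the first $match stage of the pipeline.
        {"$match": {"$text": {"$search": question_text}}},
        # Order by relevance; sorting on textScore is always descending.
        {"$sort": {"score": {"$meta": "textScore"}}},
        {"$skip": offset},
        {"$limit": limit},
        {"$project": {"content": 1, "meta": 1, "score": {"$meta": "textScore"}}},
    ]
```

Unlike the semantic mode, paging works here: `build_text_search_pipeline("atlas pricing", limit=25, offset=25)` describes the second page of results.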

Raw scores are normalized to a 0-1 range before being returned. Cosine scores are mapped with (score + 1) / 2; other metrics pass through a sigmoid. Results that normalize below 0.20 are discarded regardless of the configured score threshold.
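The normalization rules above can be written out as a short sketch. Two points are assumptions: the sigmoid is taken to be the standard logistic function, and the configured threshold is combined with the 0.20 floor by taking the larger of the two. The function names are illustrative.

```python
import math

MIN_NORMALIZED_SCORE = 0.20  # floor applied regardless of the configured threshold


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function (assumed form of "a sigmoid")."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normalize_score(raw: float, metric: str) -> float:
    """Map a raw score into the 0-1 range."""
    if metric == "cosine":
        return (raw + 1.0) / 2.0  # cosine similarity lies in [-1, 1]
    return _sigmoid(raw)


def is_kept(raw: float, metric: str, threshold: float = 0.0) -> bool:
    """Keep a result only if it clears both the configured threshold and the floor."""
    return normalize_score(raw, metric) >= max(threshold, MIN_NORMALIZED_SCORE)
```

Under these assumptions a cosine score of -0.8 normalizes to about 0.1 and is dropped even when the configured threshold is 0.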


Ingest behavior

Before inserting, any existing documents whose meta.objectId matches an incoming top-level chunk (chunkId of 0) are deleted, so re-ingesting a document replaces it rather than duplicating it. Inserts are batched: a batch is flushed at 500 documents or when its accumulated size exceeds payloadLimit. Each stored document gets a generated UUID _id and carries embedding, content, and the chunk metadata under meta.
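The replace-then-batch flow described above might look like the following sketch. The helper names, the string form of the UUID, and the use of JSON length as the "accumulated size" are assumptions made for illustration; `collection` is expected to behave like a pymongo `Collection`.

```python
import json
import uuid
from typing import Any, Iterable, Iterator

MAX_BATCH_DOCS = 500  # flush threshold stated in the text


def to_stored_document(chunk: dict[str, Any]) -> dict[str, Any]:
    """Shape a chunk for storage; chunks without an embedding are rejected."""
    if not chunk.get("embedding"):
        raise ValueError("chunk has no embedding; run it through an embedding node first")
    return {
        "_id": str(uuid.uuid4()),  # assumption: the UUID is stored as a string
        "embedding": chunk["embedding"],
        "content": chunk["content"],
        "meta": chunk["meta"],
    }


def object_ids_to_replace(chunks: Iterable[dict[str, Any]]) -> set:
    """Collect objectIds of incoming top-level chunks (chunkId == 0)."""
    return {c["meta"]["objectId"] for c in chunks if c["meta"].get("chunkId") == 0}


def batched(docs: Iterable[dict[str, Any]], payload_limit: int) -> Iterator[list]:
    """Yield batches, flushing at 500 documents or once the size exceeds payload_limit."""
    batch, size = [], 0
    for doc in docs:
        batch.append(doc)
        size += len(json.dumps(doc))  # assumption: size is measured as JSON length
        if len(batch) >= MAX_BATCH_DOCS or size > payload_limit:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch


def ingest(collection, chunks: Iterable[dict[str, Any]], payload_limit: int) -> None:
    """Delete documents being replaced, then insert the new chunks in batches."""
    chunks = list(chunks)
    ids = object_ids_to_replace(chunks)
    if ids:
        collection.delete_many({"meta.objectId": {"$in": list(ids)}})
    for batch in batched((to_stored_document(c) for c in chunks), payload_limit):
        collection.insert_many(batch)
```

Deleting by `meta.objectId` before inserting is what makes re-ingestion idempotent: the same source document never accumulates duplicate chunks.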

Documents can be marked deleted or active in place via meta.isDeleted. Filters exclude deleted documents by default. The node can also reassemble a full document from its chunks in chunkId order and stream the text to the renderData lane.
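To make the soft-delete and reassembly behavior concrete, here is a hedged sketch. The helper names are hypothetical, the default filter shape (`$ne: True`) is one plausible way to exclude deleted documents, and joining chunk text with no separator is an assumption about how the node concatenates content.

```python
from typing import Any, Iterable


def soft_delete_update(object_id: str, deleted: bool) -> tuple[dict, dict]:
    """Return the (filter, update) pair that flips meta.isDeleted in place."""
    return {"meta.objectId": object_id}, {"$set": {"meta.isDeleted": deleted}}


def active_chunks_filter(object_id: str) -> dict[str, Any]:
    """Filter for one document's chunks, excluding soft-deleted ones (assumed shape)."""
    return {"meta.objectId": object_id, "meta.isDeleted": {"$ne": True}}


def reassemble_text(chunks: Iterable[dict[str, Any]]) -> str:
    """Concatenate chunk content in chunkId order."""
    ordered = sorted(chunks, key=lambda c: c["meta"]["chunkId"])
    return "".join(c["content"] for c in ordered)  # assumption: no separator
```

With pymongo, the pair from `soft_delete_update` would be passed as `collection.update_many(flt, update)`, and the chunks for reassembly could be read with `collection.find(active_chunks_filter(object_id))`.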


Schema

| Field | Type | Description | Default |
|---|---|---|---|
| atlas.provider | string | const: "atlas" | |
| vector.cloud.host | | Enter the server IP address e.g. ..atlas.io | |
| vector.database | | | "rocketride_db" |

Dependencies

  • pymongo
  • dnspython
  • pydantic
  • urllib3