Astra DB
A RocketRide store node that persists embedded documents in DataStax Astra DB and retrieves them by semantic or keyword search.
What it does
Connects to an Astra DB collection via the astrapy DataAPIClient and exposes it as a standard RocketRide document store. Documents arriving on the documents lane are written to the collection; questions arriving on the questions lane are answered by searching that collection and emitting matching documents downstream.
Collections are created on demand: the vector dimension is taken from the first incoming embedding, the similarity metric comes from configuration, and BM25 lexical search is enabled automatically with the standard analyzer. No manual collection setup is required.
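The sketch below shows the kind of wire-level `createCollection` options the Data API accepts for a collection that supports both vector and lexical search. It models the on-demand creation step. The JSON keys (`vector.dimension`, `vector.metric`, `lexical.enabled`, `lexical.analyzer`) are our reading of the Data API and may differ between versions. The `collection_options` helper and its `cosine` default are hypothetical and do not come from the node.

```python
from typing import Any, Sequence

# Similarity metrics the Data API accepts for vector collections.
SUPPORTED_METRICS = {"cosine", "dot_product", "euclidean"}


def collection_options(first_embedding: Sequence[float], metric: str = "cosine") -> dict[str, Any]:
    """Build createCollection options from the first embedding seen (hypothetical helper)."""
    if not first_embedding:
        raise ValueError("cannot infer a vector dimension from an empty embedding")
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"unsupported similarity metric: {metric!r}")
    return {
        # The dimension is fixed by the first embedding; later vectors must match it.
        "vector": {"dimension": len(first_embedding), "metric": metric},
        # BM25 lexical search using the standard analyzer.
        "lexical": {"enabled": True, "analyzer": "standard"},
    }
```

A collection's vector dimension cannot be changed after creation. Switching to an embedding model with a different output size therefore means pointing the node at a new collection.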
Key defaults every user should know:
- Documents must be run through an embedding node before reaching this node: a chunk without an embedding raises an error at ingest time.
- Semantic search results with a similarity score below 0.20 are silently dropped.
- Inserts are batched 500 documents at a time; chunks with a near-zero vector magnitude are skipped.
- Writing chunk `0` of an object deletes all existing chunks with the same `objectId` before inserting (upsert semantics for re-ingested documents).
- Deletion is soft by default: `markDeleted` sets `meta.isDeleted: true`, and those chunks are excluded from every query unless the filter explicitly requests them.
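The ingest defaults above can be modelled in plain Python. This is a sketch under assumptions, not the node's code:

- Chunks are plain dicts with hypothetical `embedding`, `objectId` and `chunkId` keys.
- The near-zero cut-off `MIN_NORM = 1e-8` is a placeholder, because the real threshold is not documented here.
- A chunk `0` is assumed to trigger the purge even if it is then skipped for a near-zero vector.

```python
import math
from typing import Any

BATCH_SIZE = 500  # documented insert batch size
MIN_NORM = 1e-8   # hypothetical cut-off for "near-zero magnitude"


def plan_ingest(chunks: list[dict[str, Any]]) -> tuple[set[str], list[list[dict[str, Any]]]]:
    """Return (objectIds whose old chunks are purged first, insert batches)."""
    purge: set[str] = set()
    kept: list[dict[str, Any]] = []
    for chunk in chunks:
        vector = chunk.get("embedding")
        if vector is None:
            # An un-embedded chunk is an ingest error, not a silent skip.
            raise ValueError(f"chunk {chunk.get('chunkId')} of {chunk.get('objectId')} has no embedding")
        if chunk["chunkId"] == 0:
            # Chunk 0 marks a (re-)ingest: delete the object's existing chunks first
            # (assumed to apply even if this chunk is skipped below).
            purge.add(chunk["objectId"])
        if math.sqrt(sum(x * x for x in vector)) < MIN_NORM:
            continue  # near-zero vectors are skipped rather than stored
        kept.append(chunk)
    # Split the surviving chunks into insert batches of at most BATCH_SIZE.
    batches = [kept[i:i + BATCH_SIZE] for i in range(0, len(kept), BATCH_SIZE)]
    return purge, batches
```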
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
| documents | (none) | Ingest pre-embedded documents into the collection |
| questions | documents | Return matching documents |
| questions | answers | Return matching documents as an answer |
| questions | questions | Enrich the question with matching documents for downstream nodes |
Fields
| Field | Type | Description |
|---|---|---|
| api_endpoint | string | Enter the server API endpoint e.g. |
| application_token | string | Enter the server API application token |
| provider | string | Default "astra_db". |
Profiles
Two profiles are built in; cloud is the default.
| Profile | Description |
|---|---|
| cloud | Astra DB cloud server; requires api_endpoint and application_token |
| local | Local test server at http://localhost:8080 with token test-token and collection ROCKETRIDE |
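The sketch below illustrates how the two profiles differ in what they require from the user. The `PROFILES` layout and the `resolve_profile` helper are hypothetical. Only the values come from the table above: `http://localhost:8080`, `test-token`, `ROCKETRIDE`, and the two required cloud fields.

```python
from typing import Any

# Illustrative structure; the "local" values are taken from the profile table.
PROFILES: dict[str, dict[str, Any]] = {
    "cloud": {"required": ("api_endpoint", "application_token")},
    "local": {
        "api_endpoint": "http://localhost:8080",
        "application_token": "test-token",
        "collection": "ROCKETRIDE",
        "required": (),
    },
}


def resolve_profile(name: str = "cloud", **overrides: str) -> dict[str, Any]:
    """Merge user-supplied fields over a profile and check required fields (hypothetical)."""
    if name not in PROFILES:
        raise KeyError(f"unknown profile: {name}")
    base = {k: v for k, v in PROFILES[name].items() if k != "required"}
    config = {**base, **overrides}
    missing = [f for f in PROFILES[name]["required"] if not config.get(f)]
    if missing:
        raise ValueError(f"profile {name!r} requires: {', '.join(missing)}")
    return config
```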
Search modes
Both modes are available without extra configuration. The pipeline's question type determines which runs:
- Semantic: vector similarity search using the `$vector` sort on the question's embedding. Results carry similarity scores; anything scoring below 0.20 is discarded before results are returned.
- Keyword: native BM25 lexical search using the `$lexical` sort on the question's text.
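The two sort modes correspond to Data API `find` commands. The sketch below builds both command bodies as plain dicts and applies the 0.20 cut-off to returned hits. It assumes the Data API conventions `sort: {"$vector": [...]}` and `sort: {"$lexical": "..."}`, the `includeSimilarity` option, and a `$similarity` field on each semantic hit. The function names and the `limit` default of 10 are hypothetical.

```python
from typing import Any, Sequence

MIN_SIMILARITY = 0.20  # documented cut-off for semantic results


def semantic_find(embedding: Sequence[float], flt: dict[str, Any], limit: int = 10) -> dict[str, Any]:
    """find command ranked by vector similarity, with scores requested."""
    return {"find": {
        "filter": flt,
        "sort": {"$vector": list(embedding)},
        "options": {"limit": limit, "includeSimilarity": True},
    }}


def keyword_find(text: str, flt: dict[str, Any], limit: int = 10) -> dict[str, Any]:
    """find command ranked by BM25 over the collection's lexical field."""
    return {"find": {"filter": flt, "sort": {"$lexical": text}, "options": {"limit": limit}}}


def drop_low_scores(hits: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep semantic hits scoring at or above the threshold; discard the rest."""
    return [h for h in hits if h.get("$similarity", 0.0) >= MIN_SIMILARITY]
```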
Both modes honour the standard document filter: node id, parent path, permissions, object ids, table ids, chunk-id ranges, and the soft-delete flag.
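Here is a hedged sketch of turning the standard document filter into a Data API filter document. Apart from `meta.objectId`, `meta.chunkId` and `meta.isDeleted`, the `meta.*` field names are hypothetical, as are the parameter names of `build_filter`. Treating the parent path as an exact match is also an assumption. The operators `$and`, `$in`, `$gte`, `$lte` and `$ne` are standard Data API filter operators.

```python
from typing import Any, Optional, Sequence


def build_filter(
    node_id: Optional[str] = None,
    parent: Optional[str] = None,
    permissions: Optional[Sequence[int]] = None,
    object_ids: Optional[Sequence[str]] = None,
    table_ids: Optional[Sequence[int]] = None,
    chunk_range: Optional[tuple[int, int]] = None,
    include_deleted: bool = False,
) -> dict[str, Any]:
    """Combine the standard filter criteria with $and (field names partly hypothetical)."""
    clauses: list[dict[str, Any]] = []
    if node_id is not None:
        clauses.append({"meta.nodeId": node_id})          # hypothetical field name
    if parent is not None:
        clauses.append({"meta.parent": parent})           # hypothetical; exact match assumed
    if permissions:
        clauses.append({"meta.permissionId": {"$in": list(permissions)}})  # hypothetical
    if object_ids:
        clauses.append({"meta.objectId": {"$in": list(object_ids)}})
    if table_ids:
        clauses.append({"meta.tableId": {"$in": list(table_ids)}})        # hypothetical
    if chunk_range is not None:
        lo, hi = chunk_range
        clauses.append({"meta.chunkId": {"$gte": lo, "$lte": hi}})
    if not include_deleted:
        # Assumes MongoDB-style $ne semantics: chunks with no isDeleted field also match.
        clauses.append({"meta.isDeleted": {"$ne": True}})
    if not clauses:
        return {}
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```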
Document lifecycle
- Ingest: each chunk is inserted with a generated UUID `_id`, the embedding stored as `$vector`, the text stored as `content`, and all metadata stored under `meta`. Re-ingesting an object (chunk `0` received again) first deletes that object's previous chunks, then inserts the new batch.
- Soft delete / restore: `markDeleted` / `markActive` flip `meta.isDeleted` on all chunks for the given object ids. Soft-deleted chunks are hidden from default queries.
- Hard delete: `remove` permanently deletes all chunks matching the given object ids.
- Render: rebuilds a full document by fetching all non-deleted chunks for an object id, sorting them by `chunkId` in application code (Astra DB does not guarantee order), and concatenating the content in order.
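The lifecycle operations reduce to a few Data API payloads plus some client-side work. The sketch below covers four of them: the stored document shape, the `updateMany` body behind soft delete and restore, the `deleteMany` body behind hard delete, and the ordering done during render. The helper names are hypothetical. The empty-string separator used when concatenating chunks is also an assumption. The stored field names `_id`, `$vector`, `content` and `meta` are the ones described above.

```python
import uuid
from typing import Any, Sequence


def to_astra_document(text: str, embedding: Sequence[float], metadata: dict[str, Any]) -> dict[str, Any]:
    """Stored shape for one chunk: generated UUID _id, $vector, content, meta."""
    return {"_id": str(uuid.uuid4()), "$vector": list(embedding), "content": text, "meta": dict(metadata)}


def set_deleted(object_ids: Sequence[str], deleted: bool) -> dict[str, Any]:
    """updateMany body: deleted=True models markDeleted, deleted=False models markActive."""
    return {"updateMany": {
        "filter": {"meta.objectId": {"$in": list(object_ids)}},
        "update": {"$set": {"meta.isDeleted": deleted}},
    }}


def hard_delete(object_ids: Sequence[str]) -> dict[str, Any]:
    """deleteMany body that permanently removes every chunk of the given objects."""
    return {"deleteMany": {"filter": {"meta.objectId": {"$in": list(object_ids)}}}}


def render(chunks: list[dict[str, Any]], sep: str = "") -> str:
    """Rebuild a document from non-deleted chunks, ordered by chunkId in Python."""
    live = [c for c in chunks if not c["meta"].get("isDeleted", False)]
    live.sort(key=lambda c: c["meta"]["chunkId"])  # the server does not guarantee order
    return sep.join(c["content"] for c in live)
```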
All read operations return empty results when the collection does not yet exist; the collection is not created until the first ingest.
Authentication
Set api_endpoint to the database's Data API URL and application_token to the token generated in the Astra DB console. The token is passed directly to DataAPIClient at pipeline startup. No other authentication modes are supported.
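Here is a minimal connection sketch. It assumes astrapy's documented pattern: the token goes to `DataAPIClient`, and the endpoint goes to `get_database`. The environment variable names and the `read_credentials` and `connect` helpers are hypothetical. The astrapy import is deferred so that the credential check can run without astrapy installed.

```python
import os
from typing import Any, Mapping, Optional


def read_credentials(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Read endpoint and token from hypothetical env vars; fail fast if either is missing."""
    env = os.environ if env is None else env
    endpoint = env.get("ASTRA_DB_API_ENDPOINT", "").strip()
    token = env.get("ASTRA_DB_APPLICATION_TOKEN", "").strip()
    if not endpoint or not token:
        raise ValueError("both api_endpoint and application_token must be set")
    return endpoint, token


def connect(api_endpoint: str, application_token: str, collection: str) -> Any:
    """Open a collection handle; requires `pip install astrapy`."""
    from astrapy import DataAPIClient  # deferred so the module imports without astrapy

    client = DataAPIClient(application_token)      # the token is given to the client
    database = client.get_database(api_endpoint)   # the endpoint selects the database
    return database.get_collection(collection)
```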
Schema
| Field | Type | Description | Default |
|---|---|---|---|
| astra_db.api_endpoint | string | API Endpoint. Enter the server API endpoint e.g. | |
| astra_db.application_token | string | Application Token. Enter the server API application token | |
| astra_db.provider | string | const: "astra_db" | |
Dependencies
astrapy