Chroma
A RocketRide vector store node backed by ChromaDB that ingests pre-embedded documents, answers questions via semantic or keyword search, and exposes search/upsert/delete as agent-callable tools.
What it does
Stores pre-embedded document chunks in a ChromaDB collection and retrieves them against incoming questions by semantic (vector) or keyword search. The node is registered with classType: ["store", "tool"] and the invoke capability, so an agent in the same pipeline can also call it directly as a tool (for example chroma.search, chroma.upsert, chroma.delete).
Uses the chromadb-client package (the lightweight HTTP client only, not the full embedded database) and connects via chromadb.HttpClient. A ChromaDB server (self-hosted or ChromaDB Cloud) must therefore be reachable at the configured host and port.
Documents must pass through an embedding node before reaching this node; chunks without an embedding are rejected with an error. The collection is created on first write via get_or_create_collection, with the configured similarity metric stored as hnsw:space. Soft deletes are supported: documents can be marked isDeleted in metadata and are then excluded from search results unless the filter explicitly requests deleted records.
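The collection setup and the embedding requirement described above can be sketched roughly as follows. The helper names (open_collection, require_embedding) and the chunk dictionary shape are assumptions made for this illustration, not the node's actual code; only get_or_create_collection and the hnsw:space metadata key come from the description above.

```python
from typing import Any, List, Mapping

# Distance spaces accepted by ChromaDB's HNSW index.
ALLOWED_METRICS = ("cosine", "l2", "ip")


def open_collection(client: Any, name: str, metric: str = "cosine") -> Any:
    """Create the collection on first use, storing the metric as hnsw:space."""
    if metric not in ALLOWED_METRICS:
        raise ValueError(f"unsupported similarity metric: {metric!r}")
    # get_or_create_collection is part of the chromadb client API.
    return client.get_or_create_collection(
        name=name, metadata={"hnsw:space": metric}
    )


def require_embedding(chunk: Mapping[str, Any]) -> List[float]:
    """Return the chunk's embedding, or raise if it was never embedded.

    The "embedding" key is a hypothetical field name for this sketch.
    """
    embedding = chunk.get("embedding")
    if not embedding:
        raise ValueError(
            "chunk has no embedding; route it through an embedding node first"
        )
    return list(embedding)
```

With a real server running, the client argument would be something like chromadb.HttpClient(host="localhost", port=8330), matching the local profile defaults listed under Schema.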
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
| documents | (none) | Ingest pre-embedded documents into the collection |
| questions | documents | Return matching documents |
| questions | answers | Return matching documents as an answer |
| questions | questions | Enrich the question with matching documents for downstream nodes |
Fields
| Field | Type | Description |
|---|---|---|
| serverName | string | Default "chroma". Namespace for agent-facing tool names, e.g. 'chroma' exposes tools as chroma.search / chroma.upsert / chroma.delete. Change this when running multiple Chroma nodes in the same pipeline so their tool names do not collide. |
| profile | string | Default "cloud". Connect to... |
| provider | string | Constant: "chroma". |
Profiles
| Profile | Description |
|---|---|
| local | Your own ChromaDB server. Connects with plain HttpClient(host, port), no authentication. |
| cloud | ChromaDB Cloud. Requires host and apikey; authenticates using ChromaDB's TokenAuthClientProvider. |
Agent tools
When wired to an agent, the node exposes three tools via VectorStoreToolMixin. Each tool is named <serverName>.<tool> (defaults: chroma.search, chroma.upsert, chroma.delete).
| Tool | Key inputs | Description |
|---|---|---|
| search | query (required); top_k (default 10, max 100); filter (optional dict; the keys objectId, nodeId, and parent are honored) | Semantic search over stored documents; returns content, metadata, and score per result. Falls back to keyword search if semantic search fails. |
| upsert | documents array, each with content and object_id; optional metadata, embedding, and embedding_model | Add or update documents. Embeddings are computed automatically via the bound embedding provider, or pre-computed vectors can be supplied. |
| delete | object_ids (non-empty string array) | Hard-delete documents by object ID. Returns deleted_count. |
Tool calls run on the control plane and do not flow through the pipeline's embedding lanes. Semantic search in the search tool and automatic embedding in upsert require an embedding provider bound to the node (the all.embedding block in its parameters). Without one, those calls return {"success": false, "error": ...}.
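To make the input contracts in the table above concrete, here is a minimal sketch of argument builders for the three tools. The function names are hypothetical, and the validation choices (raising on an out-of-range top_k, silently dropping filter keys that are not honored) are assumptions for illustration; the node's own handling may differ.

```python
from typing import Any, Dict, List, Optional

# Filter keys the search tool is documented to honor.
HONORED_FILTER_KEYS = ("objectId", "nodeId", "parent")


def search_args(
    query: str, top_k: int = 10, flt: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Build arguments for <serverName>.search."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= top_k <= 100:
        raise ValueError("top_k must be between 1 and 100")
    args: Dict[str, Any] = {"query": query, "top_k": top_k}
    # Assumption: keys outside the honored set are dropped, not rejected.
    honored = {k: v for k, v in (flt or {}).items() if k in HONORED_FILTER_KEYS}
    if honored:
        args["filter"] = honored
    return args


def upsert_args(documents: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build arguments for <serverName>.upsert."""
    for doc in documents:
        if "content" not in doc or "object_id" not in doc:
            raise ValueError("each document needs content and object_id")
    return {"documents": list(documents)}


def delete_args(object_ids: List[str]) -> Dict[str, Any]:
    """Build arguments for <serverName>.delete."""
    if not object_ids or not all(isinstance(i, str) for i in object_ids):
        raise ValueError("object_ids must be a non-empty list of strings")
    return {"object_ids": list(object_ids)}
```

For example, search_args("refund policy", top_k=5, flt={"objectId": "doc-1"}) produces the payload an agent would pass to chroma.search under the default serverName.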
Search behavior
- Semantic search requires the question to carry an embedding; it raises an error otherwise. Non-zero result offsets are not supported in semantic search.
- Keyword search uses ChromaDB's $contains document filter and supports offset/limit paging.
- Raw distances are normalized to scores: cosine distances map to (distance + 1) / 2; l2/ip distances pass through a sigmoid. Results scoring below 0.20 are always dropped before they leave the node, regardless of the score threshold.
- Filters on nodeId, parent, objectId, tableId, chunkId ranges, and permissions are translated to ChromaDB where clauses. Documents marked deleted are excluded with $ne: true, so records that never had an isDeleted key still match (they are treated as active).
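The filter translation and soft-delete exclusion described above might look roughly like the sketch below. The function names and parameters are hypothetical, and only a subset of the listed filters is covered; the operators ($and, $eq, $ne, $contains) are standard ChromaDB filter syntax.

```python
from typing import Any, Dict, List, Optional


def build_where(
    object_id: Optional[str] = None,
    node_id: Optional[str] = None,
    parent: Optional[str] = None,
    include_deleted: bool = False,
) -> Optional[Dict[str, Any]]:
    """Translate simple filters into a ChromaDB where clause."""
    clauses: List[Dict[str, Any]] = []
    if object_id is not None:
        clauses.append({"objectId": {"$eq": object_id}})
    if node_id is not None:
        clauses.append({"nodeId": {"$eq": node_id}})
    if parent is not None:
        clauses.append({"parent": {"$eq": parent}})
    if not include_deleted:
        # $ne: true rather than $eq: false, so records that never had
        # an isDeleted key are still treated as active.
        clauses.append({"isDeleted": {"$ne": True}})
    if not clauses:
        return None
    if len(clauses) == 1:
        # ChromaDB expects $and to combine at least two expressions.
        return clauses[0]
    return {"$and": clauses}


def keyword_filter(text: str) -> Dict[str, str]:
    """Build a where_document clause for keyword search."""
    return {"$contains": text}
```

A paged keyword lookup could then be issued as collection.get(where=build_where(object_id="doc-1"), where_document=keyword_filter("invoice"), limit=10, offset=20), while a semantic lookup would pass the same where clause to collection.query alongside query_embeddings.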
Ingestion behavior
- Chunks are upserted in batches, flushed every 500 chunks or when the accumulated payload exceeds payloadLimit (32 MiB by default).
- When a chunk with chunkId: 0 arrives, all existing chunks sharing the same objectId are deleted first, so re-ingesting a document replaces it rather than duplicating it.
- Each stored chunk receives a fresh UUID as its ChromaDB record id; objectId and chunkId in metadata are the stable application-level identifiers.
- Rendering a full document re-assembles chunks in chunkId order, fetching renderChunkSize chunks per round trip and tolerating gaps in the sequence.
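The batching, replace-on-reingest, and UUID rules above can be sketched as a single loop. This is an illustration under stated assumptions, not the node's implementation: the chunk dictionary shape, the byte estimate in chunk_size, and the choice to flush pending chunks before deleting are all made up for the example. The collection.upsert and collection.delete calls are standard chromadb collection methods.

```python
import uuid
from typing import Any, Dict, Iterable, List

MAX_BATCH_CHUNKS = 500
DEFAULT_PAYLOAD_LIMIT = 32 * 1024 * 1024  # 32 MiB


def chunk_size(chunk: Dict[str, Any]) -> int:
    # Assumption: estimate the payload as UTF-8 text bytes plus
    # 4 bytes per embedding float.
    return len(chunk["content"].encode("utf-8")) + 4 * len(chunk["embedding"])


def ingest(
    collection: Any,
    chunks: Iterable[Dict[str, Any]],
    payload_limit: int = DEFAULT_PAYLOAD_LIMIT,
) -> int:
    """Upsert chunks in batches and return the number of upsert calls."""
    ids: List[str] = []
    docs: List[str] = []
    embs: List[List[float]] = []
    metas: List[Dict[str, Any]] = []
    pending_bytes = 0
    flushes = 0

    def flush() -> None:
        nonlocal pending_bytes, flushes
        if not ids:
            return
        collection.upsert(
            ids=list(ids), documents=list(docs),
            embeddings=list(embs), metadatas=list(metas),
        )
        ids.clear(); docs.clear(); embs.clear(); metas.clear()
        pending_bytes = 0
        flushes += 1

    for chunk in chunks:
        if chunk["chunkId"] == 0:
            # First chunk of a document: write anything pending, then
            # remove the previously stored chunks of that document.
            flush()
            collection.delete(where={"objectId": chunk["objectId"]})
        ids.append(str(uuid.uuid4()))  # fresh record id per stored chunk
        docs.append(chunk["content"])
        embs.append(chunk["embedding"])
        metas.append({"objectId": chunk["objectId"], "chunkId": chunk["chunkId"]})
        pending_bytes += chunk_size(chunk)
        if len(ids) >= MAX_BATCH_CHUNKS or pending_bytes > payload_limit:
            flush()
    flush()
    return flushes
```

Because record ids are random, the delete on chunkId 0 is what keeps a re-ingested document from accumulating duplicates; an upsert alone would never overwrite the earlier records.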
Authentication
Local profile
No authentication is required. The node connects with chromadb.HttpClient(host, port).
Cloud profile
Set profile to cloud and provide the ChromaDB Cloud host and your apikey. The node authenticates using chromadb.auth.token_authn.TokenAuthClientProvider, configured via ChromaDB's Settings object.
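As a rough sketch of how the two profiles map onto chromadb.HttpClient, the helper below builds the client arguments for each profile. The function names are hypothetical, and enabling ssl for the cloud profile is an assumption based on the default port 443; the Settings fields chroma_client_auth_provider and chroma_client_auth_credentials are ChromaDB's standard client-side token auth settings.

```python
from typing import Any, Dict, Optional

TOKEN_PROVIDER = "chromadb.auth.token_authn.TokenAuthClientProvider"


def client_kwargs(
    profile: str, host: str, port: int, apikey: Optional[str] = None
) -> Dict[str, Any]:
    """Return keyword arguments for chromadb.HttpClient for a profile."""
    if profile == "local":
        # Plain connection, no authentication.
        return {"host": host, "port": port}
    if profile == "cloud":
        if not apikey:
            raise ValueError("the cloud profile requires an apikey")
        return {
            "host": host,
            "port": port,
            "ssl": True,  # assumption: cloud endpoints are served over TLS
            "settings": {
                "chroma_client_auth_provider": TOKEN_PROVIDER,
                "chroma_client_auth_credentials": apikey,
            },
        }
    raise ValueError(f"unknown profile: {profile!r}")


def connect(profile: str, host: str, port: int, apikey: Optional[str] = None) -> Any:
    """Open an HttpClient; requires chromadb-client and a reachable server."""
    import chromadb
    from chromadb.config import Settings

    kwargs = client_kwargs(profile, host, port, apikey)
    if "settings" in kwargs:
        kwargs["settings"] = Settings(**kwargs["settings"])
    return chromadb.HttpClient(**kwargs)
```

Keeping the argument construction separate from the connection makes the profile logic checkable without a running server; connect("local", "localhost", 8330) would then open the client for a self-hosted instance.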
Schema
| Field | Type | Description | Default |
|---|---|---|---|
| chroma.profile | string | Type of chroma host. Connect to... | "cloud" |
| chroma.provider | string | const: "chroma" | |
| chroma.serverName | string | Tool Server Name. Namespace for agent-facing tool names, e.g. 'chroma' exposes tools as chroma.search / chroma.upsert / chroma.delete. Change this when running multiple Chroma nodes in the same pipeline so their tool names do not collide. | "chroma" |
| vector.cloud.host | | Enter the server IP address e.g. | |
| vector.cloud.port | | | 443 |
| vector.local.host | | | "localhost" |
| vector.local.port | | | 8330 |
Dependencies
chromadb-client