Chroma

A RocketRide vector store node backed by ChromaDB that ingests pre-embedded documents, answers questions via semantic or keyword search, and exposes search/upsert/delete as agent-callable tools.

What it does

The node stores pre-embedded document chunks in a ChromaDB collection and retrieves them against incoming questions by semantic (vector) or keyword search. It is registered with classType: ["store", "tool"] and the invoke capability, so an agent in the same pipeline can also call it directly as a tool (for example chroma.search, chroma.upsert, chroma.delete).

The node uses the chromadb-client package (the lightweight HTTP client only, not the full embedded database) and connects via chromadb.HttpClient. A ChromaDB server (self-hosted or ChromaDB Cloud) must therefore be reachable at the configured host and port.

Documents must pass through an embedding node before reaching this node; chunks without an embedding are rejected with an error. The collection is created on first write via get_or_create_collection, with the configured similarity metric stored as hnsw:space. Soft deletes are supported: documents can be marked isDeleted in metadata and are then excluded from search results unless the filter explicitly requests deleted records.
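The connection and collection setup described above can be sketched roughly as follows. This is a minimal illustration, not the node's actual code: the helper names `collection_metadata` and `open_collection` are hypothetical, and the set of accepted metrics is assumed to be ChromaDB's usual `cosine`, `l2`, and `ip`. The `chromadb` import is deferred so the metadata helper can be used without a server.

```python
def collection_metadata(metric: str = "cosine") -> dict:
    """Build collection metadata that records the similarity metric.

    Hypothetical helper; the metric is stored under the hnsw:space key.
    """
    allowed = {"cosine", "l2", "ip"}  # assumed set of supported metrics
    if metric not in allowed:
        raise ValueError(f"unsupported metric: {metric!r}")
    return {"hnsw:space": metric}


def open_collection(host: str, port: int, name: str, metric: str = "cosine"):
    """Connect over HTTP and return the collection, creating it if missing."""
    import chromadb  # provided by the chromadb-client package

    client = chromadb.HttpClient(host=host, port=port)
    return client.get_or_create_collection(
        name=name,
        metadata=collection_metadata(metric),
    )
```

Calling `open_collection("localhost", 8000, "docs")` would need a reachable ChromaDB server; `collection_metadata` alone has no such requirement.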

Configuration

Lanes

Lane in	Lane out	Description
`documents`	(none)	Ingest pre-embedded documents into the collection
`questions`	`documents`	Return matching documents
`questions`	`answers`	Return matching documents as an answer
`questions`	`questions`	Enrich the question with matching documents for downstream nodes

Fields

Field	Type	Description
`serverName`	string	Default "chroma". Namespace for agent-facing tool names, e.g. 'chroma' exposes tools as chroma.search / chroma.upsert / chroma.delete. Change this when running multiple Chroma nodes in the same pipeline so their tool names do not collide.
`profile`	string	Default "cloud". Connect to...
`provider`	string

Profiles

Profile	Description
`local`	Your own ChromaDB server. Connects with plain `HttpClient(host, port)`, no authentication.
`cloud`	ChromaDB Cloud. Requires `host` and `apikey`; authenticates using ChromaDB's `TokenAuthClientProvider`.

Agent tools

When wired to an agent, the node exposes three tools via VectorStoreToolMixin. Each tool is named <serverName>.<tool> (defaults: chroma.search, chroma.upsert, chroma.delete).

Tool	Key inputs	Description
`search`	`query` (required); `top_k` (default 10, max 100); `filter` (optional dict, keys `objectId`/`nodeId`/`parent` are honored)	Semantic search over stored documents; returns content, metadata, and score per result. Falls back to keyword search if semantic search fails.
`upsert`	`documents` array, each with `content` and `object_id`; optional `metadata`, `embedding`, and `embedding_model`	Add or update documents. Embeddings are computed automatically via the bound embedding provider, or pre-computed vectors can be supplied.
`delete`	`object_ids` (non-empty string array)	Hard-delete documents by object ID. Returns `deleted_count`.

Tool calls run on the control plane and do not flow through the pipeline's embedding lanes. Semantic search in the search tool and automatic embedding in upsert require an embedding provider bound to the node (the all.embedding block in its parameters). Without one, those calls return {"success": false, "error": ...}.
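To make the tool inputs in the table concrete, here is a small sketch that builds argument dictionaries for the three tools. The builder functions are hypothetical and exist only for illustration; the argument names come from the table above. How the node itself treats an out-of-range `top_k` or an unrecognized filter key is not stated here, so rejecting the former and dropping the latter on the caller's side are assumptions.

```python
from typing import Optional

HONORED_FILTER_KEYS = {"objectId", "nodeId", "parent"}


def search_args(query: str, top_k: int = 10, filter: Optional[dict] = None) -> dict:
    """Arguments for <serverName>.search (hypothetical client-side builder)."""
    if not query:
        raise ValueError("query is required")
    if not 1 <= top_k <= 100:
        # Assumption: reject rather than clamp values outside 1..100.
        raise ValueError("top_k must be between 1 and 100")
    args = {"query": query, "top_k": top_k}
    if filter:
        # Assumption: keep only the keys the tool is documented to honor.
        args["filter"] = {k: v for k, v in filter.items() if k in HONORED_FILTER_KEYS}
    return args


def upsert_args(documents: list) -> dict:
    """Arguments for <serverName>.upsert; each document needs content and object_id."""
    for doc in documents:
        if "content" not in doc or "object_id" not in doc:
            raise ValueError("each document needs 'content' and 'object_id'")
    return {"documents": documents}


def delete_args(object_ids: list) -> dict:
    """Arguments for <serverName>.delete; object_ids must be a non-empty string list."""
    if not object_ids or not all(isinstance(i, str) for i in object_ids):
        raise ValueError("object_ids must be a non-empty list of strings")
    return {"object_ids": object_ids}
```

For example, `search_args("refund policy", top_k=5, filter={"objectId": "doc-1"})` produces a payload an agent could pass to `chroma.search`.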

Search behavior

Semantic search requires the question to carry an embedding; it raises an error otherwise. Non-zero result offsets are not supported in semantic search.
Keyword search uses ChromaDB's $contains document filter and supports offset/limit paging.
Raw distances are normalized to scores: cosine distances map to (distance + 1) / 2; l2/ip distances pass through a sigmoid. Results scoring below 0.20 are always dropped before they leave the node, regardless of the score threshold.
Filters on nodeId, parent, objectId, tableId, chunkId ranges, and permissions are translated to ChromaDB where clauses. Documents marked deleted are excluded with $ne: true, so records that never had an isDeleted key still match (they are treated as active).
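The score normalization and the soft-delete filter above can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: the function names are hypothetical, the sigmoid is assumed to be the standard logistic function applied directly to the raw distance, and the caller's threshold is assumed to combine with the fixed 0.20 floor by taking the larger of the two.

```python
import math

MIN_SCORE = 0.20  # results below this are always dropped


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normalize_score(distance: float, metric: str) -> float:
    """Map a raw ChromaDB distance to a score."""
    if metric == "cosine":
        return (distance + 1.0) / 2.0
    # l2 / ip. Assumption: plain logistic sigmoid of the raw distance.
    return _sigmoid(distance)


def keep_results(results, metric: str, threshold: float = 0.0):
    """Convert (doc, distance) pairs to (doc, score), dropping low scores."""
    floor = max(threshold, MIN_SCORE)
    scored = [(doc, normalize_score(dist, metric)) for doc, dist in results]
    return [(doc, score) for doc, score in scored if score >= floor]


def active_where(object_id=None) -> dict:
    """Build a where clause that excludes soft-deleted records.

    $ne: True also matches records with no isDeleted key at all.
    """
    clauses = [{"isDeleted": {"$ne": True}}]
    if object_id is not None:
        clauses.append({"objectId": {"$eq": object_id}})
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```

A keyword query would pair such a where clause with a document filter of the form `{"$contains": "some text"}`.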

Ingestion behavior

Chunks are upserted in batches, flushed every 500 chunks or when the accumulated payload exceeds payloadLimit (32 MiB by default).
When a chunk with chunkId: 0 arrives, all existing chunks sharing the same objectId are deleted first, so re-ingesting a document replaces it rather than duplicating it.
Each stored chunk receives a fresh UUID as its ChromaDB record id; objectId and chunkId in metadata are the stable application-level identifiers.
Rendering a full document re-assembles chunks in chunkId order, fetching renderChunkSize chunks per round trip and tolerating gaps in the sequence.
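The batching and replace-on-reingest rules above might look roughly like this. It is a sketch rather than the node's code: the `ChunkBatcher` class is hypothetical, the payload size estimate (UTF-8 bytes of the content plus 4 bytes per embedding value) is an assumption, and flushing pending chunks before the delete is a design choice of this sketch. The `collection` argument is expected to offer `upsert` and `delete` methods in the style of a ChromaDB collection.

```python
import uuid

MAX_BATCH_CHUNKS = 500
DEFAULT_PAYLOAD_LIMIT = 32 * 1024 * 1024  # 32 MiB


class ChunkBatcher:
    """Accumulate chunks and upsert them in batches (illustrative only)."""

    def __init__(self, collection, payload_limit: int = DEFAULT_PAYLOAD_LIMIT):
        self.collection = collection
        self.payload_limit = payload_limit
        self._ids, self._docs, self._embs, self._metas = [], [], [], []
        self._bytes = 0

    def add(self, content: str, embedding, object_id: str, chunk_id: int) -> None:
        if embedding is None:
            raise ValueError("chunk has no embedding")
        if chunk_id == 0:
            # First chunk of a document: write anything pending, then remove
            # all existing chunks of this object so re-ingestion replaces it.
            self.flush()
            self.collection.delete(where={"objectId": object_id})
        self._ids.append(str(uuid.uuid4()))  # fresh record id per chunk
        self._docs.append(content)
        self._embs.append(list(embedding))
        self._metas.append({"objectId": object_id, "chunkId": chunk_id})
        # Assumption: rough payload estimate, not the node's real accounting.
        self._bytes += len(content.encode("utf-8")) + 4 * len(embedding)
        if len(self._ids) >= MAX_BATCH_CHUNKS or self._bytes > self.payload_limit:
            self.flush()

    def flush(self) -> None:
        if not self._ids:
            return  # nothing pending
        self.collection.upsert(
            ids=self._ids,
            documents=self._docs,
            embeddings=self._embs,
            metadatas=self._metas,
        )
        self._ids, self._docs, self._embs, self._metas = [], [], [], []
        self._bytes = 0
```

A final `flush()` is needed once the input ends, since the last batch is usually smaller than 500 chunks.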

Authentication

Local profile

No authentication is required. The node connects with chromadb.HttpClient(host, port).

Cloud profile

Set profile to cloud and provide the ChromaDB Cloud host and your apikey. The node authenticates using chromadb.auth.token_authn.TokenAuthClientProvider, configured via ChromaDB's Settings object.
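As a rough sketch of how the two profiles differ when opening a client, the snippet below builds the token-auth settings for the cloud profile and falls back to a plain client for local. The function names are hypothetical, and the node's real code may set additional options (TLS flags, headers) that are not covered here. The `chromadb` import is deferred so the settings helper can be inspected without the package installed.

```python
TOKEN_PROVIDER = "chromadb.auth.token_authn.TokenAuthClientProvider"


def auth_settings_fields(apikey: str) -> dict:
    """Fields for ChromaDB's Settings object when using token auth."""
    if not apikey:
        raise ValueError("the cloud profile requires an apikey")
    return {
        "chroma_client_auth_provider": TOKEN_PROVIDER,
        "chroma_client_auth_credentials": apikey,
    }


def connect(profile: str, host: str, port: int, apikey: str = ""):
    """Open an HTTP client for the given profile (illustrative only)."""
    import chromadb  # provided by the chromadb-client package
    from chromadb.config import Settings

    if profile == "local":
        # No authentication for a self-hosted server.
        return chromadb.HttpClient(host=host, port=port)
    if profile == "cloud":
        settings = Settings(**auth_settings_fields(apikey))
        return chromadb.HttpClient(host=host, port=port, settings=settings)
    raise ValueError(f"unknown profile: {profile!r}")
```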

Port

The port field accepts either a number or a string. Enter a plain integer such as 8000 for a local server or 443 for Cloud, or an env-var placeholder like ${ROCKETRIDE_CHROMA_PORT}. Env-var interpolation resolves to a string at run time, which is why the field accepts a string as well as a number; the node coerces the value to an integer before opening the ChromaDB connection, so a literal integer, a numeric string, and an interpolated placeholder all work. An unresolved or non-numeric placeholder falls back to 8000 rather than failing.
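The coercion rule can be illustrated with a short sketch. The function name `coerce_port` is hypothetical, and the handling of values other than integers and strings (floats, `None`) is an assumption: here they simply fall back to the default.

```python
DEFAULT_PORT = 8000


def coerce_port(value) -> int:
    """Turn an int, numeric string, or resolved placeholder into a port.

    Anything that does not parse as an integer, such as an unresolved
    "${ROCKETRIDE_CHROMA_PORT}" placeholder, falls back to DEFAULT_PORT.
    """
    try:
        return int(str(value).strip())
    except (TypeError, ValueError):
        return DEFAULT_PORT
```

With this sketch, `coerce_port(443)` and `coerce_port("443")` both give the integer 443, while `coerce_port("${ROCKETRIDE_CHROMA_PORT}")` gives 8000.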

Schema

Field	Type	Description	Default
`chroma.profile`	`string`	Type of chroma host. Connect to...	`"cloud"`
`chroma.provider`	`string`		const: `"chroma"`
`chroma.serverName`	`string`	Tool server name. Namespace for agent-facing tool names, e.g. 'chroma' exposes tools as chroma.search / chroma.upsert / chroma.delete. Change this when running multiple Chroma nodes in the same pipeline so their tool names do not collide.	`"chroma"`
`vector.cloud.host`		Enter the server IP address e.g.
`vector.cloud.port`	`number,string`	Port number. Enter a plain integer such as 443, or an env-var placeholder like ${ROCKETRIDE_CHROMA_PORT}. Placeholders resolve to a string at run time, so this field accepts both a number and a string; the node coerces the value to an integer before connecting, so either form works.	`"443"`
`vector.local.host`			`"localhost"`
`vector.local.port`	`number,string`	Port number. Enter a plain integer such as 8000, or an env-var placeholder like ${ROCKETRIDE_CHROMA_PORT}. Placeholders resolve to a string at run time, so this field accepts both a number and a string; the node coerces the value to an integer before connecting, so either form works.	`"8330"`

Dependencies

chromadb-client
