Chroma

A RocketRide vector store node backed by ChromaDB that ingests pre-embedded documents, answers questions via semantic or keyword search, and exposes search/upsert/delete as agent-callable tools.

What it does

Stores pre-embedded document chunks in a ChromaDB collection and retrieves them against incoming questions by semantic (vector) or keyword search. The node is registered with classType: ["store", "tool"] and the invoke capability, so an agent in the same pipeline can also call it directly as a tool (for example chroma.search, chroma.upsert, chroma.delete).

The node uses the chromadb-client package (the lightweight HTTP client only, not the full embedded database) and connects via chromadb.HttpClient, so a ChromaDB server (self-hosted or ChromaDB Cloud) must be reachable at the configured host and port.

Documents must pass through an embedding node before reaching this node; chunks without an embedding are rejected with an error. The collection is created on first write via get_or_create_collection, with the configured similarity metric stored as hnsw:space. Soft deletes are supported: documents can be marked isDeleted in metadata and are then excluded from search results unless the filter explicitly requests deleted records.
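
The connection and collection setup described above can be sketched roughly as follows. This is a minimal illustration, not the node's actual code: the helper names `collection_metadata` and `open_collection` are hypothetical, and the `chromadb` import is deferred so the pure helper can be used without a server.

```python
def collection_metadata(metric: str = "cosine") -> dict:
    """Build collection metadata that records the similarity metric.

    Hypothetical helper; "hnsw:space" is the metadata key named in the text.
    """
    if metric not in {"cosine", "l2", "ip"}:
        raise ValueError(f"unsupported similarity metric: {metric!r}")
    return {"hnsw:space": metric}


def open_collection(host: str, port: int, name: str, metric: str = "cosine"):
    """Connect over HTTP and create the collection if it does not exist yet."""
    import chromadb  # provided by the chromadb-client package

    client = chromadb.HttpClient(host=host, port=port)
    return client.get_or_create_collection(
        name=name,
        metadata=collection_metadata(metric),
    )
```

Calling `open_collection("localhost", 8330, "docs")` requires a reachable ChromaDB server; the collection name `"docs"` is only an example.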


Configuration

Lanes

| Lane in | Lane out | Description |
| --- | --- | --- |
| documents | (none) | Ingest pre-embedded documents into the collection |
| questions | documents | Return matching documents |
| questions | answers | Return matching documents as an answer |
| questions | questions | Enrich the question with matching documents for downstream nodes |

Fields

| Field | Type | Description |
| --- | --- | --- |
| serverName | string | Default "chroma". Namespace for agent-facing tool names, e.g. 'chroma' exposes tools as chroma.search / chroma.upsert / chroma.delete. Change this when running multiple Chroma nodes in the same pipeline so their tool names do not collide. |
| profile | string | Default "cloud". Connect to... |
| provider | string | const: "chroma" |

Profiles

| Profile | Description |
| --- | --- |
| local | Your own ChromaDB server. Connects with plain HttpClient(host, port), no authentication. |
| cloud | ChromaDB Cloud. Requires host and apikey; authenticates using ChromaDB's TokenAuthClientProvider. |

Agent tools

When wired to an agent, the node exposes three tools via VectorStoreToolMixin. Each tool is named <serverName>.<tool> (defaults: chroma.search, chroma.upsert, chroma.delete).

| Tool | Key inputs | Description |
| --- | --- | --- |
| search | query (required); top_k (default 10, max 100); filter (optional dict; keys objectId/nodeId/parent are honored) | Semantic search over stored documents; returns content, metadata, and score per result. Falls back to keyword search if semantic search fails. |
| upsert | documents array, each with content and object_id; optional metadata, embedding, and embedding_model | Add or update documents. Embeddings are computed automatically via the bound embedding provider, or pre-computed vectors can be supplied. |
| delete | object_ids (non-empty string array) | Hard-delete documents by object ID. Returns deleted_count. |

Tool calls run on the control plane and do not flow through the pipeline's embedding lanes. Semantic search in the search tool and automatic embedding in upsert require an embedding provider bound to the node (the all.embedding block in its parameters). Without one, those calls return {"success": false, "error": ...}.
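
To make the tool contract above concrete, here is a small sketch of how a caller might build tool names and `search` arguments and handle the failure shape. The helper names are hypothetical, and two behaviors are assumptions rather than documented facts: that an out-of-range `top_k` is clamped (the node may reject it instead), and that unrecognized filter keys are simply dropped.

```python
from typing import Any, Optional

TOOLS = ("search", "upsert", "delete")
HONORED_FILTER_KEYS = {"objectId", "nodeId", "parent"}


def tool_names(server_name: str = "chroma") -> list:
    """Tool names follow the <serverName>.<tool> pattern."""
    return [f"{server_name}.{tool}" for tool in TOOLS]


def search_args(query: str, top_k: int = 10,
                filter: Optional[dict] = None) -> dict:
    """Build arguments for the search tool (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    # Assumption: clamp top_k into 1..100 rather than raising.
    args: dict = {"query": query, "top_k": max(1, min(int(top_k), 100))}
    if filter:
        # Assumption: keys other than the honored ones are dropped.
        args["filter"] = {k: v for k, v in filter.items()
                          if k in HONORED_FILTER_KEYS}
    return args


def unwrap(response: dict) -> Any:
    """Raise on the documented {"success": false, "error": ...} shape."""
    if response.get("success") is False:
        raise RuntimeError(response.get("error", "unknown tool error"))
    return response
```

With two Chroma nodes in one pipeline, `tool_names("chroma_docs")` and `tool_names("chroma_logs")` would yield non-colliding names; both server names are illustrative.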


Search behavior

  • Semantic search requires the question to carry an embedding; it raises an error otherwise. Non-zero result offsets are not supported in semantic search.
  • Keyword search uses ChromaDB's $contains document filter and supports offset/limit paging.
  • Raw distances are normalized to scores: cosine distances map to (distance + 1) / 2; l2/ip distances pass through a sigmoid. Results scoring below 0.20 are always dropped before they leave the node, regardless of the score threshold.
  • Filters on nodeId, parent, objectId, tableId, chunkId ranges, and permissions are translated to ChromaDB where clauses. Documents marked deleted are excluded with $ne: true, so records that never had an isDeleted key still match (they are treated as active).
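
The scoring and filtering rules above can be sketched as follows. This is an illustration under stated assumptions, not the node's implementation: the function names are hypothetical, the exact argument and sign fed to the sigmoid are assumed (the text only says l2/ip distances "pass through a sigmoid"), and the metadata key names `objectId` and `nodeId` are assumed to match the filter names.

```python
import math
from typing import Optional

MIN_SCORE = 0.20  # hard floor applied regardless of the score threshold


def normalize_score(metric: str, distance: float) -> float:
    """Map a raw distance to a score using the formulas stated in the text."""
    if metric == "cosine":
        return (distance + 1.0) / 2.0
    if metric in ("l2", "ip"):
        # Assumption: plain logistic sigmoid of the raw distance,
        # written in a numerically stable form.
        if distance >= 0:
            return 1.0 / (1.0 + math.exp(-distance))
        e = math.exp(distance)
        return e / (1.0 + e)
    raise ValueError(f"unsupported metric: {metric!r}")


def keep_results(hits: list, metric: str, threshold: float = 0.0) -> list:
    """Score (doc, distance) pairs and drop those under the effective floor."""
    floor = max(MIN_SCORE, threshold)
    scored = [(doc, normalize_score(metric, dist)) for doc, dist in hits]
    return [(doc, score) for doc, score in scored if score >= floor]


def active_where(object_id: Optional[str] = None,
                 node_id: Optional[str] = None,
                 include_deleted: bool = False) -> Optional[dict]:
    """Build a ChromaDB where clause that excludes soft-deleted records."""
    conditions = []
    if not include_deleted:
        # $ne: true also matches records that never had an isDeleted key.
        conditions.append({"isDeleted": {"$ne": True}})
    if object_id is not None:
        conditions.append({"objectId": {"$eq": object_id}})
    if node_id is not None:
        conditions.append({"nodeId": {"$eq": node_id}})
    if not conditions:
        return None
    # $and needs at least two operands, so a single condition stands alone.
    return conditions[0] if len(conditions) == 1 else {"$and": conditions}


def keyword_get_kwargs(term: str, offset: int = 0, limit: int = 10) -> dict:
    """Keyword arguments for a paged $contains lookup via Collection.get."""
    return {
        "where": active_where(),
        "where_document": {"$contains": term},
        "offset": offset,
        "limit": limit,
    }
```

Given a collection handle, `collection.get(**keyword_get_kwargs("invoice", offset=20, limit=10))` would fetch the third page of ten keyword matches.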

Ingestion behavior

  • Chunks are upserted in batches, flushed every 500 chunks or when the accumulated payload exceeds payloadLimit (32 MiB by default).
  • When a chunk with chunkId: 0 arrives, all existing chunks sharing the same objectId are deleted first, so re-ingesting a document replaces it rather than duplicating it.
  • Each stored chunk receives a fresh UUID as its ChromaDB record id; objectId and chunkId in metadata are the stable application-level identifiers.
  • Rendering a full document re-assembles chunks in chunkId order, fetching renderChunkSize chunks per round trip and tolerating gaps in the sequence.
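
The batching and replace-on-reingest rules above might look roughly like this. It is a sketch only: `ChunkBatcher` and its callbacks are hypothetical, payload size is approximated by the UTF-8 length of the chunk text, and flushing pending chunks before deleting an object is an assumed ordering.

```python
import uuid
from typing import Callable, List

MAX_CHUNKS = 500
PAYLOAD_LIMIT = 32 * 1024 * 1024  # 32 MiB default


class ChunkBatcher:
    """Buffer chunks and flush by count or accumulated payload size."""

    def __init__(self, flush: Callable[[List[dict]], None],
                 delete_object: Callable[[str], None],
                 max_chunks: int = MAX_CHUNKS,
                 payload_limit: int = PAYLOAD_LIMIT) -> None:
        self._flush_cb = flush
        self._delete_object = delete_object
        self.max_chunks = max_chunks
        self.payload_limit = payload_limit
        self._pending: List[dict] = []
        self._bytes = 0

    def add(self, object_id: str, chunk_id: int, content: str,
            embedding: List[float]) -> None:
        if not embedding:
            raise ValueError("chunk has no embedding")
        if chunk_id == 0:
            # First chunk of a document: remove its previous chunks.
            # Assumption: pending writes are flushed before the delete.
            self.flush()
            self._delete_object(object_id)
        self._pending.append({
            "id": str(uuid.uuid4()),  # fresh record id per stored chunk
            "document": content,
            "embedding": embedding,
            "metadata": {"objectId": object_id, "chunkId": chunk_id},
        })
        self._bytes += len(content.encode("utf-8"))
        if (len(self._pending) >= self.max_chunks
                or self._bytes > self.payload_limit):
            self.flush()

    def flush(self) -> None:
        """Hand the buffered records to the callback and reset the buffer."""
        if self._pending:
            self._flush_cb(self._pending)
            self._pending = []
            self._bytes = 0
```

In practice the `flush` callback would call `collection.upsert(ids=..., embeddings=..., documents=..., metadatas=...)` and `delete_object` would call `collection.delete(where={"objectId": object_id})`; call `flush()` once more at the end of the stream to write any remainder.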

Authentication

Local profile

No authentication is required. The node connects with chromadb.HttpClient(host, port).

Cloud profile

Set profile to cloud and provide the ChromaDB Cloud host and your apikey. The node authenticates using chromadb.auth.token_authn.TokenAuthClientProvider, configured via ChromaDB's Settings object.
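
A token-authenticated connection of the kind described above could be set up along these lines. This is a sketch, not the node's code: the helper names are hypothetical, and enabling `ssl=True` is an assumption based on the default cloud port of 443.

```python
def cloud_settings_kwargs(apikey: str) -> dict:
    """Settings fields that select token authentication (hypothetical helper)."""
    if not apikey:
        raise ValueError("apikey is required for the cloud profile")
    return {
        "chroma_client_auth_provider":
            "chromadb.auth.token_authn.TokenAuthClientProvider",
        "chroma_client_auth_credentials": apikey,
    }


def connect_cloud(host: str, apikey: str, port: int = 443):
    """Open an authenticated HTTP client against a ChromaDB Cloud host."""
    import chromadb  # provided by the chromadb-client package
    from chromadb.config import Settings

    return chromadb.HttpClient(
        host=host,
        port=port,
        ssl=True,  # assumption: TLS is used on port 443
        settings=Settings(**cloud_settings_kwargs(apikey)),
    )
```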


Schema

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| chroma.profile | string | Type of chroma host. Connect to... | "cloud" |
| chroma.provider | string | const: "chroma" | |
| chroma.serverName | string | Tool Server Name. Namespace for agent-facing tool names, e.g. 'chroma' exposes tools as chroma.search / chroma.upsert / chroma.delete. Change this when running multiple Chroma nodes in the same pipeline so their tool names do not collide. | "chroma" |
| vector.cloud.host | | Enter the server IP address e.g. | |
| vector.cloud.port | | | 443 |
| vector.local.host | | | "localhost" |
| vector.local.port | | | 8330 |

Dependencies

  • chromadb-client