Weaviate
A RocketRide store node that persists embedded document chunks in a Weaviate instance and retrieves them by semantic or keyword search.
What it does
Stores pre-embedded documents in a Weaviate collection and answers searches against them. Supports both self-hosted Weaviate and Weaviate Cloud, selected via a profile.
Uses the official weaviate-client Python SDK (v4 API): connect_to_local for self-hosted instances and connect_to_weaviate_cloud for cloud clusters, with connection timeouts of 30 s (init), 60 s (query), and 120 s (insert).
Key behavior to know:
- Documents must arrive pre-embedded. Run them through an embedding node first: the collection is created with
Vectorizer.none(), so Weaviate never embeds anything itself; the pipeline supplies all vectors. A document without an embedding raises an error on ingest. - The collection is created automatically on first write if it does not exist, with an HNSW vector index using the configured distance metric.
- Re-ingesting is idempotent per document. Before inserting, all existing chunks with the same
objectIdare deleted, then the new chunks are written via Weaviate's dynamic batch API. If any batch objects fail, the node raises an error. - Deletes are soft by default. Documents can be marked deleted (
isDeleted: true) and later re-activated; soft-deleted chunks are excluded from every search and get unless the filter explicitly asks for deleted documents. Hard removal byobjectIdis also supported. - The configured
hostis normalized automatically: leadinghttp:///https://and trailing slashes are stripped, and the API key is trimmed of whitespace.
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
documents | - | Ingest pre-embedded documents into the collection |
questions | documents | Return matching documents |
questions | answers | Return matching documents as an answer |
questions | questions | Enrich the question with matching documents for downstream nodes |
The node can also render a stored object back to text: given an object id, it rehydrates all chunks in chunkId order (fetched in windows of renderChunkSize) and streams the joined text to the text lane.
Fields
| Field | Type / Default | Description |
|---|---|---|
host | string | Weaviate server address. Cloud: <your-instance-name>.weaviate.cloud. Local default: localhost. Scheme and trailing slashes are stripped automatically. |
port | int: 8080 local, 443 cloud | REST port |
grpc_port | int: 50051 | gRPC port (local profile only) |
apikey | string | API key. Required for cloud; optional for local (used only when non-empty) |
score | number: 0.5 | Minimum retrieval similarity threshold |
collection | string: ROCKETRIDE | Collection name: must start with an uppercase letter and contain only letters, numbers, and underscores |
similarity | string: cosine | Distance metric: cosine · dot · l2-squared · hamming · manhattan. Any other value raises an error at startup |
renderChunkSize | int: 33554432 | Number of chunk ids fetched per window when rendering a full document |
mode | string (set by profile) | local or cloud: selects the connection method |
Each ingested chunk is stored with these properties alongside its vector: content, objectId, nodeId, parent, permissionId, isDeleted, chunkId, isTable, tableId, vectorSize, modelName.
Profiles
| Profile | Mode | Default host | Port |
|---|---|---|---|
| Weaviate cloud server | cloud | (your Weaviate Cloud endpoint) | 443 |
| Your own Weaviate server | local | localhost | 8080 |
The preconfig default profile is cloud. The cloud profile exposes host, API key, score, and collection; the local profile exposes host, port, gRPC port, score, and collection.
Search behavior
- Semantic search runs a
near_vectorquery with the question's embedding. The question must carry an embedding (bind an embedding node), and a non-zero result offset is not supported. When the requested limit is 10 or less, the node queries with a limit of 25. - Keyword search matches the question text against chunk content with a
*query*wildcardlikefilter. - Both searches apply the document filter (node id, parent, permissions, object ids, chunk id ranges, table flags) and exclude soft-deleted chunks unless deleted documents are requested.
- Scoring: with the
cosinemetric the returned distance is mapped to(distance + 1) / 2; for all other metrics a sigmoid1 / (1 + exp(distance / -100))is used. Results scoring below0.20are discarded outright, before the configuredscorethreshold is applied.
Configuration validation
When the node config is saved, a fast probe validates it and surfaces problems as warnings:
- The collection name is checked against the official Weaviate rule (
^[A-Z][_0-9A-Za-z]*$): start with an uppercase letter; only letters, numbers, and underscores; no spaces or special characters. - Hosts of
localhost/127.*are treated as local, anything else as cloud. - Cloud: an HTTP GET to
/v1/metawith the API key as a Bearer token (3 s timeout). - Local: the SDK lists collections over REST, then verifies the gRPC port is reachable (channel-ready check, falling back to a plain TCP connect if
grpcis unavailable).
HTTP error responses are surfaced with their status code and the server's message/error body so misconfigurations are easy to diagnose.
Authentication
- Cloud profile: set
apikeyto your Weaviate Cloud API key, it is passed asAuth.api_keycredentials. - Local profile: anonymous by default. If
apikeyis set to a non-empty value, it is sent as API-key credentials to the local instance.
Upstream docs
Schema
| Field | Type | Description | Default |
|---|---|---|---|
vector.cloud.host | Enter the server IP address e.g. | ||
vector.cloud.port | 443 | ||
vector.local.grpc_port | 50051 | ||
vector.local.host | "localhost" | ||
vector.local.port | 8080 | ||
weaviate.profile | string | Type of Weaviate host Connect to... | "local" |
weaviate.provider | string | const: "weaviate" |
Dependencies
authlibgrpciogrpcio-health-checkinggrpcio-toolshttpxpydanticrequestsvalidatorsweaviate-clientnumpy