Index Search

Store and retrieve documents by keyword (BM25) or by meaning (vectors), backed by Elasticsearch or OpenSearch.

What it does

A single node with two service variants (Elasticsearch and OpenSearch) that ingests documents and retrieves them at query time. Each variant operates in one of two modes, selected by the Store Mode toggle; switching between them requires no pipeline rewiring:

  • Index mode: classic BM25 full-text search over the index, with a configurable match operator (or, and, exact phrase) and optional contextual snippet highlighting. Text arrives on the text lane and results are returned via scan/scroll with a batch size of 500 and a scroll window of 1m, so all matches are returned rather than just the first page.
  • Vector store mode: semantic similarity search over embedded documents. Documents must pass through an embedding node before reaching this one. Hits below the configured Retrieval Score threshold are dropped.
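
The scan/scroll retrieval described for index mode can be sketched as a small generator. This is an illustration only, not the node's actual code: the function name scroll_all_hits is hypothetical, and the client argument is assumed to expose search, scroll, and clear_scroll with the keyword arguments used by the elasticsearch 8.x Python client.

```python
from typing import Any, Dict, Iterator


def scroll_all_hits(
    client: Any,
    index: str,
    query: Dict[str, Any],
    batch_size: int = 500,   # batch size stated in the text above
    scroll: str = "1m",      # scroll window stated in the text above
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, fetching one scroll page at a time."""
    page = client.search(index=index, query=query, size=batch_size, scroll=scroll)
    scroll_id = page["_scroll_id"]
    try:
        # An empty page signals that the scroll is exhausted.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = page["_scroll_id"]
    finally:
        # Release the server-side scroll context even if the caller stops early.
        client.clear_scroll(scroll_id=scroll_id)
```

Because the generator keeps requesting pages until one comes back empty, the caller sees every match rather than only the first page of results.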

The node uses the official elasticsearch Python client (8.x; dense_vector fields with kNN search; cosine similarity by default, with l2_norm and dot_product also accepted) and opensearch-py (knn_vector indices using HNSW, the FAISS engine, and the cosinesimil space type). The backend is resolved automatically from the node config at startup.

The Elasticsearch variant covers self-managed, Elastic Cloud Hosted, and Elastic Cloud Serverless deployments. The OpenSearch variant covers self-managed OpenSearch. The default mode differs per variant: Elasticsearch starts in vector store mode (store_enabled: true); OpenSearch starts in index mode (mode: false).

Saving the node config runs a fast connectivity probe: the index/collection name format is checked, then the cluster is contacted (Elasticsearch: cluster.health with a 10 s timeout; OpenSearch: ping). Failures surface as warnings with the backend's inner error reason extracted.


Configuration

Lanes

| Lane in | Lane out | Description |
|---|---|---|
| text | (none) | Ingest raw text (index mode) |
| documents | (none) | Ingest pre-embedded documents (vector store mode only) |
| questions | text | Search and stream matching text |
| questions | documents | Search and stream matching documents |
| questions | answers | Search and stream matching documents as answers |
| questions | questions | Enrich question with matching documents for downstream nodes (Elasticsearch variant only) |

The documents input lane is only processed in vector store mode; documents arriving in index mode are silently ignored. Documents without an embedding are skipped in vector store mode.

Index and collection names must be 1-255 characters of lowercase letters, digits, ., _, or -; slashes and spaces are not allowed.
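
The naming rule above can be checked with a single regular expression. This is a minimal sketch of the rule as stated here; the helper name is_valid_index_name is hypothetical, and the backends themselves may apply further restrictions that this check does not cover.

```python
import re

# 1-255 characters drawn from lowercase letters, digits, '.', '_' and '-'.
_NAME_RE = re.compile(r"[a-z0-9._-]{1,255}")


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` satisfies the rule stated in the text."""
    return _NAME_RE.fullmatch(name) is not None
```

For example, is_valid_index_name("rocketride") passes, while names containing uppercase letters, slashes, or spaces are rejected.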

Elasticsearch variant

The Deployment Type field selects a connection profile:

| Field | Type | Description |
|---|---|---|
| host | string | Default "http://localhost:9200". Localhost URL for OpenSearch. |
| enabled | boolean | Default true. Enable basic authentication when connecting. |
| username | string | Default "admin". |
| password | string | Default empty. |
| collection | string | Default "rocketride". The name of the collection to use for the OpenSearch index. Only lowercase letters, numbers, and underscores are allowed. |
| mode | boolean | Default false. Toggle between index and vector store. |
| search | boolean | Default false. Customize the search behavior of the index. This option does not affect the ingestion and creation of the index. You can switch between behaviors when searching between pipeline runs. |
| matchOperator | string | Default "or". Controls how multiple query terms are matched: 'or' (default) matches documents containing ANY of the query terms, 'and' matches documents containing ALL of the query terms, 'exact' requires the exact phrase to appear in order (phrase matching). |
| slop | number | Default 0. The number of words to allow between terms when exact phrase search is enabled. |
| highlight | boolean | Default false. Use the unified highlighter to return snippets around matches. |
| fragment_size | number | Default 250. Maximum characters in the returned highlight snippet (context window) per hit. |
| score | number | Default 0.5. Minimum retrieval score for vector stores. |
| dim | integer | Default 768. Required in vector store mode; dimension of embedding vectors. |
| index_label | object | |
| vstore_label | object | |
| provider | string | Default "opensearch". |
| index | string | Default "rocketride". Enter the name of the Elasticsearch index (must be lowercase). |
| type | string | Default "vector_database". Elasticsearch operation type. |
| store_enabled | boolean | Default true. Enable document storage. |
| profile | string | Default "self-managed". Connect to... |

| Field | Type / Default | Description |
|---|---|---|
| elasticsearch.profile | enum, self-managed | self-managed / cloud-hosted / cloud-serverless |
| vector.local.host | string, localhost | Server address (self-managed) |
| vector.local.port | number, 9200 | Server port (self-managed) |
| vector.cloud.host | string, empty | Cloud host URL, e.g. <deployment-id>.es.<region>.cloud.es.io (cloud profiles) |
| vector.cloud.port | number, 9243/443 | Cloud port (9243 for cloud-hosted, 443 for cloud-serverless) |
| vector.apikey | string, empty | Elastic Cloud API key (cloud profiles) |
| vector.index | string, rocketride | Elasticsearch index name (lowercase) |
| elasticsearch.mode | boolean, true | true = vector store (semantic search); false = index (BM25) |
| vector.score | number, 0.5 | Minimum similarity threshold in vector store mode (0.0-1.0) |

OpenSearch variant

| Field | Type / Default | Description |
|---|---|---|
| opensearch.host | string, http://localhost:9200 | OpenSearch server URL |
| opensearch.collection | string, rocketride | Index name (lowercase letters, digits, underscores) |
| opensearch.auth.enabled | boolean, true | Enable basic authentication |
| opensearch.auth.username | string, admin | Basic auth username (shown when auth is enabled) |
| opensearch.auth.password | string, empty | Basic auth password (shown when auth is enabled) |
| opensearch.mode | boolean, false | true = vector store (kNN); false = index (BM25) |
| opensearch.dim | integer, 768 | Embedding dimension (required in vector store mode; must be > 0) |
| opensearch.score | number, 0.5 | Minimum similarity score to include a result (0-1) |

Modes

Index mode

Raw text from the text lane is stored in a plain content text field. Questions trigger a BM25 match query; every hit is streamed out on the answers, text, and documents lanes (using the hit _id as objectId).

The Customize Indexing Search Behavior toggle exposes additional search options. These affect querying only, never ingestion or index creation, so they can be changed between pipeline runs without re-ingesting data.

| Field | Default | Description |
|---|---|---|
| elasticsearch.matchOperator / opensearch.matchOperator | or | or matches any term; and matches all terms; exact is phrase match |
| elasticsearch.search.exact.slop / opensearch.search.exact.slop | 0 | Words allowed between terms in exact phrase match |
| elasticsearch.search.highlight / opensearch.search.highlight | false | Use the unified highlighter to return snippets around matches instead of the full document |
| elasticsearch.search.highlight.fragment_size / opensearch.search.highlight.fragment_size | 250 | Maximum characters per highlight snippet per hit |
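
To make the effect of these options concrete, the sketch below maps them onto a standard query DSL request body. It is an illustration under assumptions, not the node's implementation: the function name build_text_query is hypothetical, and the field name content is taken from the index mode description above.

```python
from typing import Any, Dict


def build_text_query(
    text: str,
    operator: str = "or",       # "or", "and", or "exact"
    slop: int = 0,              # only used for "exact"
    highlight: bool = False,
    fragment_size: int = 250,
    field: str = "content",
) -> Dict[str, Any]:
    """Build a search request body for the given match options."""
    if operator == "exact":
        # Phrase match: terms must appear in order, with up to `slop` words between them.
        query = {"match_phrase": {field: {"query": text, "slop": slop}}}
    elif operator in ("or", "and"):
        # BM25 match: "or" needs any term, "and" needs all terms.
        query = {"match": {field: {"query": text, "operator": operator}}}
    else:
        raise ValueError(f"unsupported match operator: {operator!r}")

    body: Dict[str, Any] = {"query": query}
    if highlight:
        # Ask the unified highlighter for snippets of at most `fragment_size` characters.
        body["highlight"] = {
            "fields": {field: {"type": "unified", "fragment_size": fragment_size}}
        }
    return body
```

Since all of these options live in the request body, none of them touches the stored index, which is consistent with the note that they can change between runs without re-ingesting.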

Vector store mode

Pre-embedded documents from the documents lane are upserted into a vector index. Questions are answered by kNN similarity search; hits below the Retrieval Score threshold are dropped.

For Elasticsearch, the index uses a dense_vector field with cosine similarity by default. Similarity can be changed via the similarity config value (cosine, l2_norm, or dot_product). Search dispatches through the DocumentStoreBase using kNN with num_candidates set to 10x the requested limit.
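
The Elasticsearch side can be sketched as three small helpers: a dense_vector mapping, a kNN clause with num_candidates set to ten times the limit, and a score filter. All function names and the field name embedding are assumptions made for illustration; how the node computes and compares scores internally is not specified here.

```python
from typing import Any, Dict, List, Sequence

ALLOWED_SIMILARITIES = {"cosine", "l2_norm", "dot_product"}


def dense_vector_mapping(dims: int, similarity: str = "cosine",
                         field: str = "embedding") -> Dict[str, Any]:
    """Mapping for an indexed dense_vector field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "properties": {
            field: {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": similarity,
            }
        }
    }


def knn_clause(query_vector: Sequence[float], limit: int,
               field: str = "embedding") -> Dict[str, Any]:
    """kNN clause with num_candidates = 10x the requested limit."""
    return {
        "field": field,
        "query_vector": list(query_vector),
        "k": limit,
        "num_candidates": 10 * limit,
    }


def drop_low_scores(hits: List[Dict[str, Any]], min_score: float = 0.5) -> List[Dict[str, Any]]:
    """Keep only hits whose _score meets the retrieval threshold."""
    return [hit for hit in hits if hit.get("_score", 0.0) >= min_score]
```

With the 8.x client, a clause like this would typically be passed as the knn argument of search; a larger num_candidates trades query latency for better recall.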

For OpenSearch, the vector index uses knn_vector with HNSW / FAISS / cosinesimil. The top 10 nearest neighbours are returned and filtered against the Retrieval Score threshold.
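
For OpenSearch, the index definition and the top-10 query could look like the sketch below. The function names and the field name embedding are hypothetical; the method block reflects the HNSW / FAISS / cosinesimil combination named above, which assumes an OpenSearch version that supports cosinesimil with the FAISS engine.

```python
from typing import Any, Dict, Sequence


def knn_index_body(dimension: int, field: str = "embedding") -> Dict[str, Any]:
    """Index creation body for a knn_vector field."""
    if dimension <= 0:
        raise ValueError("dimension must be > 0")
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "cosinesimil",
                    },
                }
            }
        },
    }


def knn_query_body(vector: Sequence[float], k: int = 10,
                   field: str = "embedding") -> Dict[str, Any]:
    """Search body requesting the k nearest neighbours of `vector`."""
    return {"size": k, "query": {"knn": {field: {"vector": list(vector), "k": k}}}}
```

The hits returned for such a query would then be compared against the Retrieval Score threshold before being streamed out.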

Gotcha (OpenSearch vector store): the vector index is created automatically with the configured Embedding Dimension. If an index with the same name already exists but is not a knn_vector index, or its dimension does not match the configured value, the index is deleted and recreated, and all existing data in that index is lost. Keep the dimension in sync with your embedding model.
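
Before pointing the node at an existing index, it may be worth checking whether the index would be recreated. The helper below is a hypothetical pre-flight check, not part of the node: it assumes the vector field is called embedding and takes the dictionary returned by the client's indices.get_mapping call.

```python
from typing import Any, Dict


def would_be_recreated(mapping_response: Dict[str, Any], index: str,
                       dimension: int, field: str = "embedding") -> bool:
    """Return True if the existing index is incompatible with the configured dimension.

    `mapping_response` has the shape {index: {"mappings": {"properties": {...}}}}.
    """
    properties = (
        mapping_response.get(index, {}).get("mappings", {}).get("properties", {})
    )
    spec = properties.get(field)
    # Not a knn_vector index at all.
    if spec is None or spec.get("type") != "knn_vector":
        return True
    # A knn_vector index, but built for a different embedding size.
    return spec.get("dimension") != dimension
```

A True result means the existing data is at risk, so the safer options are to choose a different collection name or to back up the index first.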


Authentication

Elasticsearch

Self-managed instances connect without credentials (http:// is assumed for localhost, 127.*, and self-managed mode). Elastic Cloud Hosted and Serverless profiles require the host URL plus an API key; connections use https://.
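
The scheme and port rules for the three profiles can be summarised in a small function. This is a sketch of the behaviour described here and in the configuration table, with a hypothetical function name; the node's actual URL handling may differ in detail.

```python
from typing import Optional

# Default port per deployment profile, as listed in the configuration table.
DEFAULT_PORTS = {"self-managed": 9200, "cloud-hosted": 9243, "cloud-serverless": 443}


def build_elasticsearch_url(profile: str, host: str, port: Optional[int] = None) -> str:
    """Return the connection URL for a deployment profile."""
    if profile not in DEFAULT_PORTS:
        raise ValueError(f"unknown profile: {profile!r}")
    # http:// for localhost, 127.* and self-managed mode; https:// otherwise.
    is_local = host == "localhost" or host.startswith("127.")
    scheme = "http" if (profile == "self-managed" or is_local) else "https"
    return f"{scheme}://{host}:{port or DEFAULT_PORTS[profile]}"
```

For the cloud profiles, the resulting URL would be combined with the API key when constructing the client.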

OpenSearch

Basic auth (username and password). When basic auth is enabled, an http:// host is automatically upgraded to https://, and TLS certificate verification is disabled (verify_certs=False), which is suitable for self-managed clusters with self-signed certificates. Both username and password are required when auth is on.
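
The rules above translate into client arguments roughly as follows. The function name is hypothetical and the logic is a sketch of the described behaviour; hosts, http_auth, use_ssl, and verify_certs are keyword arguments accepted by the opensearch-py OpenSearch client.

```python
from typing import Any, Dict


def opensearch_client_kwargs(host: str, auth_enabled: bool = True,
                             username: str = "admin", password: str = "") -> Dict[str, Any]:
    """Build keyword arguments for the OpenSearch client from the node's auth settings."""
    if not auth_enabled:
        return {"hosts": [host]}
    if not username or not password:
        raise ValueError("username and password are both required when auth is enabled")
    # With basic auth on, an http:// host is upgraded to https://.
    if host.startswith("http://"):
        host = "https://" + host[len("http://"):]
    return {
        "hosts": [host],
        "http_auth": (username, password),
        "use_ssl": True,
        # Certificate verification is off, matching the self-signed-certificate case.
        "verify_certs": False,
    }
```

The result could then be unpacked into the client constructor, for example OpenSearch(**opensearch_client_kwargs("http://localhost:9200", password="secret")).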


Schema

Elasticsearch (services.elasticsearch.json)

| Field | Type | Description | Default |
|---|---|---|---|
| elasticsearch.index | string | Index Name / Collection Name: Enter the name of the Elasticsearch index | "rocketride" |
| elasticsearch.index_label | object | Index Mode | |
| elasticsearch.matchOperator | string | Match Operator: Controls how multiple query terms are matched: 'or' (default) matches documents containing ANY of the query terms, 'and' matches documents containing ALL of the query terms, 'exact' requires the exact phrase to appear in order (phrase matching). | "or" |
| elasticsearch.mode | boolean | Store Mode: Toggle between index (text search) and vector store (semantic search). | true |
| elasticsearch.profile | string | Deployment Type: Connect to... | "self-managed" |
| elasticsearch.provider | string | const: "elasticsearch" | |
| elasticsearch.search | boolean | Customize Indexing Search Behavior: Customize the search behavior of the index. This option does not affect the ingestion and creation of the index. You can switch between behaviors when searching between pipeline runs. | false |
| elasticsearch.search.exact.slop | number | Slop: The number of words to allow between terms when exact phrase search is enabled. | 0 |
| elasticsearch.search.highlight | boolean | Return contextual snippets: Use the unified highlighter to return snippets around matches. | false |
| elasticsearch.search.highlight.fragment_size | number | Snippet size (characters): Maximum characters in the returned highlight snippet (context window) per hit. | 250 |
| elasticsearch.store_enabled | boolean | Store: Enable document storage | true |
| elasticsearch.type | string | Type: Elasticsearch operation type | "vector_database" |
| elasticsearch.vstore_label | object | Vector Store Mode | |
| vector.cloud.host | | Enter the Elastic Cloud host URL, e.g. <deployment-id>.es.<region>.cloud.es.io | |
| vector.cloud.port | | | 9243 |
| vector.index | string | Index Name / Collection Name: Enter the name of the Elasticsearch index (must be lowercase) | "rocketride" |
| vector.local.host | | | "localhost" |
| vector.local.port | | | 9200 |

OpenSearch (services.opensearch.json)

| Field | Type | Description | Default |
|---|---|---|---|
| opensearch.auth.enabled | boolean | Use basic auth: Enable basic authentication when connecting. | true |
| opensearch.auth.password | string | Password | "" |
| opensearch.auth.username | string | Username | "admin" |
| opensearch.collection | string | Collection: The name of the collection to use for the OpenSearch index. Only lowercase letters, numbers, and underscores are allowed. | "rocketride" |
| opensearch.dim | integer | Embedding Dimension: Required in vector store mode; dimension of embedding vectors. | 768 |
| opensearch.host | string | Host: Localhost URL for OpenSearch. | "http://localhost:9200" |
| opensearch.index_label | object | Index Mode | |
| opensearch.matchOperator | string | Match Operator: Controls how multiple query terms are matched: 'or' (default) matches documents containing ANY of the query terms, 'and' matches documents containing ALL of the query terms, 'exact' requires the exact phrase to appear in order (phrase matching). | "or" |
| opensearch.mode | boolean | Store Mode: Toggle between index and vector store. | false |
| opensearch.provider | string | const: "opensearch" | |
| opensearch.score | number | Retrieval Score: Minimum retrieval score for vector stores | 0.5 |
| opensearch.search | boolean | Customize Indexing Search Behavior: Customize the search behavior of the index. This option does not affect the ingestion and creation of the index. You can switch between behaviors when searching between pipeline runs. | false |
| opensearch.search.exact.slop | number | Slop: The number of words to allow between terms when exact phrase search is enabled. | 0 |
| opensearch.search.highlight | boolean | Return contextual snippets: Use the unified highlighter to return snippets around matches. | false |
| opensearch.search.highlight.fragment_size | number | Snippet size (characters): Maximum characters in the returned highlight snippet (context window) per hit. | 250 |
| opensearch.vstore_label | object | Vector Store Mode | |

Dependencies

  • elasticsearch >=8.0.0,<9.0.0
  • opensearch-py ==3.2.0
  • numpy