RAG Pipeline
RAG Pipeline
Retrieval-augmented generation (RAG) is the most common pattern in RocketRide: embed documents into a vector store, then answer questions by retrieving the relevant chunks and feeding them to an LLM. This example walks through a complete pipeline that accepts questions over HTTP, retrieves context from Qdrant, and returns answers.
What you need
- An OpenAI API key
- A running Qdrant instance (local Docker or Qdrant Cloud)
- The RocketRide engine running (self-hosted or Cloud)
The pipeline
Save this as rag.pipe:
{
"nodes": [
{
"id": "source_1",
"provider": "webhook"
},
{
"id": "embed_1",
"provider": "embedding_openai",
"config": {
"profile": "text-embedding-3-small",
"apikey": "${OPENAI_API_KEY}"
},
"input": [
{ "lane": "text", "from": "source_1" }
]
},
{
"id": "store_1",
"provider": "qdrant",
"config": {
"profile": "self-hosted",
"serverName": "localhost",
"collection": "rag-docs"
},
"input": [
{ "lane": "documents", "from": "embed_1" },
{ "lane": "questions", "from": "source_1" }
]
},
{
"id": "llm_1",
"provider": "llm_openai",
"config": {
"profile": "openai-4o",
"apikey": "${OPENAI_API_KEY}"
},
"input": [
{ "lane": "questions", "from": "store_1" }
]
},
{
"id": "target_1",
"provider": "response",
"input": [
{ "lane": "answers", "from": "llm_1" }
]
}
]
}
What each node does
| Node | Provider | Role |
|---|---|---|
source_1 | webhook | Exposes an HTTP endpoint. Incoming documents arrive on the text lane; incoming questions arrive on the questions lane. |
embed_1 | embedding_openai | Turns document text into vectors using text-embedding-3-small. Emits documents (vectors + metadata). |
store_1 | qdrant | Upserts vectors from embed_1 into the rag-docs collection. When a question arrives, it retrieves the top matching chunks and re-emits them as questions with context injected. |
llm_1 | llm_openai | Receives the question + retrieved context and generates an answer using GPT-4o. |
target_1 | response | Returns the answer to the caller. |
Start the pipeline
rocketride start --pipeline ./rag.pipe
The engine prints the webhook URL and public auth key:
Webhook ready - system is ready to accept requests
URL: http://localhost:5567/task/data
Auth: abc123...
Ingest documents
POST a document to the webhook URL. The pipeline embeds and stores it:
curl -X POST http://localhost:5567/task/data \
-H "Authorization: Bearer abc123..." \
-F "file=@./my-document.pdf"
Ask a question
Send a plain-text question to the same endpoint:
curl -X POST http://localhost:5567/task/data \
-H "Authorization: Bearer abc123..." \
-H "Content-Type: text/plain" \
-d "What does the document say about refund policy?"
The pipeline retrieves the relevant chunks from Qdrant, asks GPT-4o, and streams back the answer.
Next steps
- Swap
embedding_openaiforembedding_transformerto run embeddings locally without an API key. - Swap
qdrantforpineconeormilvuswithout changing the rest of the pipeline. - Add a
guardrailsnode between the LLM and response to validate outputs. - See the Qdrant integration guide for configuration details.