Overview

RocketRide is an open-source data pipeline builder and runtime built for AI and ML workloads. It ships with 50+ pipeline nodes spanning 13 LLM providers, 8 vector databases, OCR, NER, and more. Pipelines are defined as portable JSON, built visually in VS Code, and executed by a multithreaded C++ runtime. From real-time data processing to multimodal AI search, RocketRide can run entirely on your own infrastructure.

Key Capabilities

  • High-performance C++ runtime — Native multithreading purpose-built for the throughput demands of AI and data workloads. No bottlenecks, no compromises for production scale.
  • Visual pipeline builder — Drag, connect, and configure nodes in VS Code — no boilerplate. Real-time observability tracks token usage, LLM calls, latency, and execution.
  • 50+ pipeline nodes — 13 LLM providers, 8 vector databases, OCR, NER, PII anonymization, and more.
  • Multi-agent workflows — Orchestrate and scale agents with built-in support for CrewAI and LangChain.
  • TypeScript, Python & MCP SDKs — Integrate pipelines into native apps, expose them as callable tools for AI assistants, or build programmatic workflows into your existing codebase.
  • Zero dependency headaches — Python environments, C++ toolchains, and all node dependencies managed automatically. Clone, build, run — no manual setup.
  • One-click deploy — Run on Docker, on-prem, or RocketRide Cloud.

Core Concepts

Pipelines

A pipeline is a directed graph of nodes that processes data from input to output. Pipelines are defined as .pipe files (JSON format) and rendered visually in the IDE extension. You can run, monitor, and debug pipelines directly from the canvas.
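
To make the "directed graph of nodes" idea concrete, here is a minimal Python sketch. It is not the actual .pipe schema: the field names (`nodes`, `edges`, `id`, `type`, `from`, `to`) and the node types are assumptions chosen for illustration. It also assumes the graph is acyclic, so that an execution order exists in which every node runs after the nodes that feed it.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical pipeline description. The field names ("nodes", "edges",
# "id", "type", "from", "to") are illustrative assumptions, not the
# real .pipe schema.
pipeline = {
    "nodes": [
        {"id": "in", "type": "webhook"},
        {"id": "embed", "type": "embedding"},
        {"id": "store", "type": "vector_store"},
    ],
    "edges": [
        {"from": "in", "to": "embed"},
        {"from": "embed", "to": "store"},
    ],
}


def execution_order(pipe: dict) -> list[str]:
    """Return node ids in an order where every node follows its inputs."""
    # Map each node id to the set of node ids it depends on.
    deps = {node["id"]: set() for node in pipe["nodes"]}
    for edge in pipe["edges"]:
        deps[edge["to"]].add(edge["from"])
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(deps).static_order())


# Round-trip through JSON text, as a pipeline stored on disk would be.
text = json.dumps(pipeline, indent=2)
order = execution_order(json.loads(text))
print(order)
```

Because the description is plain JSON, it can be versioned, diffed, and moved between machines like any other text file, which is what makes a pipeline portable.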

Nodes

Nodes are the building blocks of every pipeline. Each node performs a specific operation — calling an LLM, embedding text, querying a vector store, transforming data, and more. Nodes are organized into categories by function:

| Category       | Nodes | Description                                                         |
|----------------|-------|---------------------------------------------------------------------|
| Source         | 15    | Where data enters the pipeline (webhook, chat, dropper)             |
| LLM            | 13    | Language model providers (OpenAI, Anthropic, Google, and more)      |
| Store          | 9     | Vector database integrations (Pinecone, Qdrant, Weaviate, and more) |
| Text           | 7     | Text analysis and transformation (NER, PII, sentiment, and more)    |
| Agents         | 4     | Agent framework orchestration (CrewAI, LangChain)                   |
| Embedding      | 3     | Generate vector representations                                     |
| Image          | 3     | Image processing and OCR                                            |
| Preprocessor   | 2     | Chunking and code processing                                        |
| Audio          | 2     | Transcription and playback                                          |
| Data           | 4     | Document parsing                                                    |
| Memory         | 1     | Persistent agent memory                                             |
| Search         | 1     | Web and semantic search                                             |
| Tool           | 7     | External integrations (HTTP, Python, GitHub, and more)              |
| Infrastructure | 1     | Output and export                                                   |
| Video          | 1     | Frame extraction                                                    |
| Database       | 1     | Direct database access                                              |

For a full breakdown, see the Nodes Overview.

Lanes

Lanes are the connections between nodes. Each node has typed input lanes and output lanes that define what data it accepts and produces. You wire nodes together by connecting an output lane of one node to a compatible input lane of another. Some nodes (like agents or LLMs) can also be invoked as tools by a parent node.
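
The wiring rule can be sketched as a small compatibility check. This is an illustrative model only: the `Node` class, the lane names, and the type labels (`"text"`, `"vector"`) are hypothetical, and "compatible" is assumed here to mean that both lanes carry exactly the same data type. The real runtime may apply different or looser rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model: lane name -> data type carried on that lane."""
    name: str
    inputs: dict[str, str] = field(default_factory=dict)
    outputs: dict[str, str] = field(default_factory=dict)


def can_connect(src: Node, out_lane: str, dst: Node, in_lane: str) -> bool:
    """True when both lanes exist and carry the same data type."""
    out_type = src.outputs.get(out_lane)
    in_type = dst.inputs.get(in_lane)
    return out_type is not None and out_type == in_type


# Illustrative nodes; names and lane types are assumptions.
chunker = Node("chunker", inputs={"text": "text"}, outputs={"chunks": "text"})
embedder = Node("embedder", inputs={"text": "text"}, outputs={"vectors": "vector"})
store = Node("store", inputs={"vectors": "vector"})

ok = can_connect(chunker, "chunks", embedder, "text")      # text -> text
mismatch = can_connect(chunker, "chunks", store, "vectors")  # text -> vector
print(ok, mismatch)
```

Checking lane types at wiring time lets a mismatch surface while the pipeline is being built rather than when it runs.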

Source Types

Every pipeline begins with a source node that defines how data enters:

  • Webhook — Receives data via HTTP requests
  • Chat — Interactive conversational interface
  • Dropper — File-based input via drag-and-drop
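
As an illustration of the Webhook source, the sketch below builds an HTTP POST request with Python's standard library. The URL, path, port, and payload shape are placeholders rather than documented RocketRide values; the real endpoint depends on how the webhook source node is configured.

```python
import json
import urllib.request

# Placeholder endpoint: host, port, and path are assumptions for illustration.
WEBHOOK_URL = "http://localhost:8080/webhook"


def build_webhook_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The payload shape is hypothetical.
request = build_webhook_request(WEBHOOK_URL, {"text": "hello"})
print(request.get_method(), request.full_url)

# To actually send it (requires a pipeline listening at WEBHOOK_URL):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the example runnable without a live pipeline and makes the payload easy to inspect.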

Where to Go Next

  • Quickstart — Go from zero to a running pipeline in minutes.
  • Nodes Overview — Browse all available nodes by category.