Core

The shared-services module of the RocketRide engine: it registers the built-in pipeline services (file system source, parser, fingerprinter, word indexer, ZIP target, null endpoint) and the common field libraries that other nodes merge into their own service definitions.

What it does

core is not one node; it is the module that registers RocketRide's family of shared services. Each service is declared in a services.*.json file in this directory, and you configure them inside a pipeline rather than dropping one standalone node on the canvas.

The directory holds three kinds of content:

  • Concrete service definitions: services.filesys.json, services.parse.json, services.hash.json, services.indexer.json, services.zip.json, and services.null.json each register one engine service (title, protocol, class type, capabilities, lanes, and config shape).
  • Shared field libraries: the services.common*.json files define reusable field groups (cloud-provider credentials, include/exclude path forms, vector-store settings, LLM access, anonymization, remote processing) that are merged into other service definitions as required.
  • Shared code and assets: google_access.py (the access/scope resolver used by Google tool nodes) and the SVG icons displayed in the UI for connector and processing nodes (Amazon S3, Azure Blob, Google Drive, OneDrive, SharePoint, Outlook, Gmail, Confluence, Slack, SMB, and others).

The hash/ and parser/ subdirectories carry the per-service documentation pages for the Fingerprinter and Parser services.
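
As a rough illustration of the split between concrete service definitions and shared field libraries, the sketch below sorts the services.*.json files in a directory using only the file-name patterns mentioned on this page. The function name and the directory argument are hypothetical, and grouping services.all.json with the shared libraries is an assumption; the engine's own loader is not shown here and may work differently.

```python
from pathlib import Path


def classify_service_files(directory):
    """Group services.*.json files by the naming patterns described on this page.

    Hypothetical helper for illustration; not part of the RocketRide engine.
    """
    groups = {"service_definitions": [], "shared_libraries": []}
    for path in sorted(Path(directory).glob("services.*.json")):
        # services.common*.json and services.all.json hold shared fields and selectors.
        if path.name.startswith("services.common") or path.name == "services.all.json":
            groups["shared_libraries"].append(path.name)
        else:
            groups["service_definitions"].append(path.name)
    return groups
```

Files that do not end in .json, such as google_access.py or a .json.dev variant, are ignored by the glob pattern.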


Services

| Service | File | Protocol | Class type | Lanes |
| --- | --- | --- | --- | --- |
| Local File System | services.filesys.json | filesys:// | source | _source → tags |
| Parser | services.parse.json | parse:// | data | tags → text, table, image, video, audio |
| Fingerprinter | services.hash.json | hash:// | data | tags → tags |
| Word indexer | services.indexer.json | indexer:// | (empty, internal, not selectable) | none |
| ZIP Creation | services.zip.json | zip:// | target (internal) | none |
| Null | services.null.json | null:// | (empty, internal) | source → tags |

Local File System

A source service that reads data from the local file system, ingesting files and documents from directories for further processing. Capabilities: filesystem, noremote, security, nosaas. Supported actions: export, delete, download.

| Field | Type / Default | Description |
| --- | --- | --- |
| include | array (min 1 item) | Paths included for Scan, Index, Classify, and OCR. Windows: C:\foldername or \\file.core.net\foldername; Linux: /file.core.net/foldername. Each entry carries per-path toggles (include.permissions, include.signing). |
| exclude | array, optional | Paths excluded from processing (same path formats). |
| excludeExternalDrives | boolean, default true | Skip external drives. |
| excludeEnableGlobal | boolean, default true | Skip typical OS files and directories. |
| excludeSymlinks | boolean, default true | Skip symlinks. |
| estimation | section, optional | Cost estimations: estimation.accessDelay, estimation.accessRate, estimation.storeCost, estimation.accessCost (all numbers, default 0). |

The service exposes two shape sections: Source (full form with estimation) and Pipe (the Pipe.include / Pipe.exclude variants, for example /Users/usr/Documents/product-images/*). A development variant of the definition exists as services.filesys.json.dev.
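
To make the field table concrete, here is a minimal sketch that assembles a Local File System parameter set with the documented defaults. The helper name, the dict layout (a "path" key per entry, flat top-level flags), and the example path are assumptions made for illustration; the shape the engine actually expects is defined by services.filesys.json.

```python
def build_filesys_config(include_paths, exclude_paths=None):
    """Assemble a Local File System parameter dict using the defaults listed above.

    The dict layout is an assumption for illustration, not the engine's
    confirmed schema.
    """
    if not include_paths:
        # The field table requires at least one include entry.
        raise ValueError("include needs at least one path")
    return {
        "include": [
            # Per-path toggles start from the documented defaults.
            {"path": p, "permissions": False, "signing": True}
            for p in include_paths
        ],
        "exclude": [{"path": p} for p in (exclude_paths or [])],
        "excludeExternalDrives": True,
        "excludeEnableGlobal": True,
        "excludeSymlinks": True,
    }
```

Calling build_filesys_config(["/data/docs"]) with a hypothetical path yields one include entry with signing enabled and an empty exclude list.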

Parser

Extracts structured content from a wide variety of document types. It automatically identifies embedded content and routes it to the appropriate output lane, making text, tables, images, audio, and video accessible for downstream processing. No configuration fields.

| Lane in | Lane out | Description |
| --- | --- | --- |
| tags | text | Extracted plain text |
| tags | table | Extracted tables |
| tags | image | Extracted images |
| tags | video | Extracted video streams |
| tags | audio | Extracted audio streams |

Fingerprinter

Generates a deterministic fingerprint (hash) of each document's content as it passes through the pipeline. The hash is computed from the raw or normalized text, so identical content always produces the same fingerprint regardless of metadata. Use it for deduplication, content tracking, and identity verification before indexing. Lane: tags → tags. Ships a single empty default preconfig profile and has no configuration fields.
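
The idea of a content-only fingerprint can be sketched as follows. This page does not state which hash algorithm or normalization the Fingerprinter uses, so SHA-256, NFC normalization, and whitespace collapsing are all assumptions, and both function names are hypothetical.

```python
import hashlib
import unicodedata


def fingerprint(text, normalize=True):
    """Return a hex digest that depends only on the text content.

    SHA-256 and the normalization steps are assumptions for illustration.
    """
    if normalize:
        # Unify Unicode forms and collapse whitespace so that cosmetic
        # differences do not change the digest.
        text = unicodedata.normalize("NFC", text)
        text = " ".join(text.split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def deduplicate(documents):
    """Keep the first document seen for each fingerprint.

    `documents` is an iterable of (name, text) pairs; metadata such as the
    name never enters the hash.
    """
    seen = {}
    for name, text in documents:
        seen.setdefault(fingerprint(text), name)
    return list(seen.values())
```

Because only the text is hashed, two files with different names but the same content collapse to a single entry.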

Word indexer

Enables full-text indexing inside the engine. Registered with capability internal and an empty classType, so it is not user-selectable on the canvas; it has no fields or shape of its own.

ZIP Creation

An internal target service (protocol zip://) that streams processed objects into a ZIP archive. Supported actions: export, delete, download. Its target parameters are storePath (destination folder, see path formats in the Local File System section) and url (read-only, default https://).

Null

An internal no-op endpoint registered as both a source shape and a target shape with empty parameter sections. Lane: source → tags.


Shared field libraries

These files define common fields that are merged into a service definition as required. Field names below are exact.

Basic fields and include/exclude forms (services.common.json)

  • storePath: exact folder path; format varies by backend (Windows, Linux, AWS/S3 bucketname/foldername, Azure Blob containername/foldername, SharePoint sitename/drivename/foldername, OneDrive account/foldername).
  • url: service URL, read-only, default https://.
  • include / exclude: path arrays with per-path processing toggles: include.classify (default false), include.ocr (default false), include.signing (default true; enabling it unlocks include.index and include.vectorize), include.index (default false), include.permissions (default false), include.vectorize (default false).
  • estimation: cost-estimation section (estimation.accessCost, estimation.accessDelay, estimation.accessRate, estimation.storeCost; all default 0).
  • DTC.* and Pipe.*: simplified include/exclude/feature variants of the same forms (OCR and permissions feature toggles) used by the DTC and Pipe shapes.
  • Hidden plumbing fields: sync (default true), actions, source.mode (default "Source"), target.mode (default "Target"), hideForm (default true).
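
The per-path toggle defaults and the signing dependency can be sketched as below. The short key names (without the include. prefix) and the function name are hypothetical, and treating include.index and include.vectorize as off whenever signing is off is only one reading of "unlocks"; the engine or UI may enforce the dependency differently.

```python
INCLUDE_DEFAULTS = {
    "classify": False,
    "ocr": False,
    "signing": True,
    "index": False,
    "permissions": False,
    "vectorize": False,
}


def effective_toggles(entry):
    """Fill in defaults and apply the signing dependency described above.

    Assumption: index and vectorize only take effect when signing is on.
    """
    toggles = {**INCLUDE_DEFAULTS, **entry}
    if not toggles["signing"]:
        toggles["index"] = False
        toggles["vectorize"] = False
    return toggles
```

With signing left at its default, effective_toggles({"index": True}) keeps indexing enabled; with "signing": False, both indexing and vectorizing are switched off.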

AWS credentials (services.common.aws.json)

| Field | Type | Description |
| --- | --- | --- |
| aws.accessKey | string, secure, optional | Access key used to sign requests to Amazon S3. |
| aws.secretKey | string, secure, optional | Secret key used to access AWS services. |
| aws.region | enum | AWS region (us-east-1 through sa-east-1; default empty "Select Region"). |

Google Workspace credentials (services.common.google.json)

| Field | Type | Description |
| --- | --- | --- |
| google.authType | enum service / user, default service | Selects service-account vs user OAuth flow. |
| google.customerId | string | Google Workspace Customer ID (service auth). |
| google.adminEmail | string | Administrator e-mail with admin privileges (service auth). |
| google.serviceKey | data-url (.json upload) | Service-account JSON key file (service auth). |
| google.oAuthButton | string, optional | "Login with Google" OAuth widget (user auth). |
| google.userToken | string | Long-term access token used to mint Google API access tokens (user auth). |

LLM access (services.common.llm.json)

| Field | Type / Default | Description |
| --- | --- | --- |
| llm.local.serverbase | string, default http://localhost:11434/v1 | Base URL the model is hosted under. |
| llm.cloud.apikey | string, secure | API key or token. |
| llm.cloud.project | string | LLM project or organization name. |
| llm.cloud.location | string | LLM server location. |
| llm.cloud.modelSource | string, hidden, optional | Model source. |

Remote processing (services.common.remote.json)

remote.profile selects the processing mode: local ("Process everything locally", the default) or remote ("Process CPU/GPU heavy tasks remotely"). Remote mode exposes remote.host (default pipe.rocketride.ai), remote.port (default 5565), and remote.apikey.
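
A minimal sketch of how these fields and their defaults fit together is shown below. The helper name and the flat dict keyed by field name are assumptions; how the engine stores and reads these settings is not shown on this page.

```python
def resolve_remote(config):
    """Return the processing target implied by the remote.* fields.

    Hypothetical helper: `config` is a flat dict keyed by the field names
    above, which is an assumed layout.
    """
    profile = config.get("remote.profile", "local")
    if profile == "local":
        return {"mode": "local"}
    if profile != "remote":
        raise ValueError(f"unknown remote.profile: {profile!r}")
    return {
        "mode": "remote",
        # Documented defaults apply when host or port are omitted.
        "host": config.get("remote.host", "pipe.rocketride.ai"),
        "port": config.get("remote.port", 5565),
        "apikey": config.get("remote.apikey"),
    }
```

An empty config resolves to local processing, and a config with only "remote.profile": "remote" picks up the default host and port.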

Vector stores and embeddings (services.common.vector.json)

| Field | Type / Default | Description |
| --- | --- | --- |
| vector.host / vector.port | string / number | Vector-store server address and port (with vector.cloud.* and vector.local.* variants, plus vector.local.grpc_port). |
| vector.collection | string, default ROCKETRIDE | Collection name. |
| vector.score | number 0-1, default 0.7 | Minimum retrieval score, from 0.0 "All results" to 1.0 "Almost identical". |
| vector.apikey | string, secure | API key. |
| vectorizer.embedding | combo embedding | Embedding provider selector. |
| vectorizer.store | combo store | Vector-store provider selector. |
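
The effect of vector.score as a minimum retrieval score can be illustrated with a small filter. The function name, the (document_id, score) pair shape, and the inclusive comparison are assumptions; the actual filtering happens inside whichever vector store is selected.

```python
def filter_by_score(hits, min_score=0.7):
    """Keep hits whose similarity score meets the vector.score threshold.

    `hits` is a list of (document_id, score) pairs; this shape is assumed.
    """
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("vector.score must be between 0.0 and 1.0")
    return [(doc, score) for doc, score in hits if score >= min_score]
```

With the default of 0.7, a hit scored 0.42 is dropped, while a threshold of 0.0 returns every result.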

Anonymization (services.common.anonymize.json)

| Field | Type / Default | Description |
| --- | --- | --- |
| anonymize | boolean, default false (hidden) | Master toggle: mask classified/sensitive data in the text. |
| anonymizeChar | single character, default █ | Mask character; "SSN: 064 70 6733" becomes "SSN: ███████████". |
| anonymizeAll | boolean, default false | Collapse masked runs to a fixed length ("SSN: ***" instead of "SSN: ***********"). |
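
The masking behavior in the examples above can be sketched as follows. The function name and the span-based interface are hypothetical: the sketch assumes the sensitive character ranges have already been detected, do not overlap, and that a collapsed run is three characters long, as in the "SSN: ***" example.

```python
def mask_spans(text, spans, char="█", collapse=False, collapsed_length=3):
    """Replace each (start, end) span with the mask character.

    `spans` are the character ranges flagged as sensitive; detecting them is
    out of scope here. With collapse=True every masked run has a fixed
    length, mirroring anonymizeAll.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        out.append(text[cursor:start])
        length = collapsed_length if collapse else end - start
        out.append(char * length)
        cursor = end
    out.append(text[cursor:])
    return "".join(out)
```

For "SSN: 064 70 6733" with the span (5, 16), the default call masks all eleven characters, spaces included, and collapse=True with char="*" yields "SSN: ***".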

Combined provider selectors (services.all.json)

Combines services into single selectable types for pipelines that pick one provider per slot:

| Field | Combo | Default |
| --- | --- | --- |
| all.preprocessor | preprocessor | preprocessor_langchain |
| all.embedding | embedding | embedding_transformer |
| all.store | store | qdrant |
| all.llm | llm | openai |

Google access helper (google_access.py)

A single reader that turns a Google tool node's access enum and capability toggles into one resolved object: the OAuth scopes to request, plus the write/destructive gates the node's tool functions check at invoke time.

resolve_google_access(config, spec) resolves a node config against a per-API AccessSpec and returns a GoogleAccess with the granted tier, scopes, can_write, and gate flags. Tool functions then call require_write(op) and require_flag(name, op), which raise GoogleAccessError when the operation is not enabled.

Behavior to know:

  • A blank or omitted access value falls back to the spec's default tier; any other non-string value raises.
  • can_write is derived from the granted scopes (a tier is writable if at least one scope does not end in .readonly), so it cannot drift from the actual grant.
  • Gate flags are strict: only an explicit boolean true enables a gated operation. A present non-bool value ("false", 1, "no") raises rather than coercing, and a missing flag defaults to off.
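
The two rules about can_write and gate flags can be sketched independently of the module. This is not the code in google_access.py: the function names are hypothetical, and ValueError stands in for whatever exception the real module raises on a malformed flag.

```python
READONLY_SUFFIX = ".readonly"


def can_write(scopes):
    """A grant is writable if at least one scope is not read-only."""
    return any(not scope.endswith(READONLY_SUFFIX) for scope in scopes)


def read_gate_flag(config, name):
    """Strict gate: only an explicit True enables; non-bool values raise."""
    if name not in config:
        return False  # a missing flag defaults to off
    value = config[name]
    if not isinstance(value, bool):
        # "false", 1, "no" and similar are rejected rather than coerced.
        raise ValueError(f"{name} must be a boolean, got {value!r}")
    return value
```

Deriving the write capability from the scope list means the flag cannot disagree with the scopes that were actually requested, and rejecting non-boolean flag values avoids a string such as "false" being treated as truthy.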

Bundled specs:

| Spec | Tiers | Gate flags |
| --- | --- | --- |
| GMAIL | readonly / modify / send | none (permanent delete needs the full https://mail.google.com/ scope, which no tier grants; gmail.modify only trashes) |
| DRIVE | readonly / write | allowPublicSharing, allowHardDelete |
| SHEETS | readonly / write | none |
| DOCS | readonly / write | none |
| CALENDAR | readonly / write | allowDelete |
| SLIDES | readonly / write | none |
| PEOPLE | readonly / write (contacts write + directory read-only) | allowDelete |

Running the tests

pytest nodes/test/core/test_google_access.py -v

Schema

services.all.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| all.embedding |  | Embedding | "embedding_transformer" |
| all.llm |  | LLM | "openai" |
| all.preprocessor |  | Preprocessor | "preprocessor_langchain" |
| all.store |  | Vector Store | "qdrant" |

services.common.anonymize.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| anonymize | boolean | Anonymize Classified Information. Enable it to mask any sensitive data in the text. If you leave it disabled, the text is output as is. For example, when enabled, the text "SSN: 064 70 6733" becomes "SSN: ***********"; when disabled, the text remains "SSN: 064 70 6733". | false |
| anonymizeAll | boolean | Anonymize All Data. Enable it to collapse any masked sensitive data to a fixed length. For example, when enabled, the text "SSN: 064 70 6733" becomes "SSN: ***"; when disabled, it becomes "SSN: ***********". | false |
| anonymizeChar | string | Anonymization Character. Specify the character that masks any sensitive data in the text. For example, with the character "*", the text "SSN: 064 70 6733" becomes "SSN: ***********"; with the character "?", it becomes "SSN: ???????????". | "█" |

services.common.aws.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| aws.accessKey | string | Access key. This key gives access to your AWS resources. It is provided by the service provider and is used to sign the requests you send to Amazon S3. |  |
| aws.region | string | Region. This is defined and provided by the service provider. | "" |
| aws.secretKey | string | Secret key. This key is used to access the AWS services. |  |

services.common.google.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| google.adminEmail | string | Administrator E-mail. Enter the email address of a Google Workspace administrator. This email should belong to a user with administrative privileges in your Google Workspace domain. |  |
| google.authType | string | Authentication Type | "service" |
| google.customerId | string | Customer ID. Enter your Google Workspace Customer ID. This unique identifier is assigned to your organization by Google. It is used to specify the particular Google Workspace domain you want to manage. |  |
| google.oAuthButton | string | Login with Google |  |
| google.serviceKey | string | Service Account Key File. Upload the JSON key file for your Google Workspace service account. This file contains the credentials necessary to authenticate API requests. |  |
| google.userToken | string | Access Token. A long-term token that allows you to get new access tokens for the Google API. |  |

services.common.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| DTC.exclude | array | Provide the path to exclude |  |
| DTC.exclude.path | string | Exclude path |  |
| DTC.features | object | What features do you want? Please be advised that selecting additional features will increase the time it takes to scan your data. |  |
| DTC.include | array | Provide the path to your data |  |
| DTC.include.path | string | Include path |  |
| Pipe.exclude | array | Provide the path to exclude |  |
| Pipe.exclude.path | string | Exclude path |  |
| Pipe.features | object | What features do you want? Please be advised that selecting additional features will increase the time it takes to scan your data. |  |
| Pipe.include | array | Provide the path to your data |  |
| Pipe.include.path | string | Include path |  |
| actions | object |  |  |
| estimation |  | Cost estimations |  |
| estimation.accessCost | number | Access cost. The egress cost per MB to recall a file. | 0 |
| estimation.accessDelay | number | Access delay. Elapsed time before access to a file starts (for example, S3 could have a 0-second delay, while Glacier could take hours). | 0 |
| estimation.accessRate | number | Access rate. The speed at which a file is recalled, in MB per second. | 0 |
| estimation.storeCost | number | Store cost. The cost per MB to store a file for a month. | 0 |
| exclude | array | Exclude paths |  |
| exclude.path | string | Exclude path |  |
| hideForm | boolean |  | true |
| include | array | Include paths |  |
| include.classify | boolean | Enable Classification. Classification assigns each file to one or more classification policies. Once enabled, all supported files are classified into one or more of the activated classification policies. | false |
| include.index | boolean | Enable Indexing. Indexing allows full-text search of all processed files. Once enabled, all supported files are scanned and indexed as they are processed. Once Index is enabled, other parameters like OCR and classify can be enabled too. | false |
| include.ocr | boolean | Enable OCR. Optical Character Recognition (OCR) converts typed or handwritten text found in images into text. Once enabled, all image files such as JPGs have their text extracted for use in classification and search. | false |
| include.path | string | Include path |  |
| include.permissions | boolean | Enable Permissions. Permissions allow gathering of file ownerships and permissions from all connected and scanned sources. | false |
| include.signing | boolean | Content Signature. Content Signature executes a hash algorithm on the content of every object to generate a "signature". Identical signatures indicate identical content and are an effective method to detect duplicate objects. | true |
| include.vectorize | boolean | Enable AI Embeddings. AI embeddings make text content accessible to AI technologies like semantic relevancy and generative AI chats. Once AI embeddings are enabled, other parameters like OCR and classify can be enabled too. | false |
| source.mode | string |  | "Source" |
| storePath | string | Store path. This path defines the exact folder in the filesystem. Windows: C:\foldername (local filesystem) or \\file.core.net\foldername (shared folders). Linux: /file.core.net/foldername. AWS/S3: bucketname/foldername. Azure Blob: containername/foldername. SharePoint: sitename/drivename/foldername. OneDrive: account/foldername. |  |
| sync | boolean |  | true |
| target.mode | string |  | "Target" |
| url | string | URL. URL to connect to the service, e.g. https://[dnsname].com | "https://" |

services.common.llm.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| llm.cloud.apikey | string | API key (Token). Enter your API key or token. |  |
| llm.cloud.location | string | Location. LLM server location. |  |
| llm.cloud.modelSource | string | Model source |  |
| llm.cloud.project | string | Project (Organization). LLM project or organization name. |  |
| llm.local.serverbase | string | LLM URL. Base URL the model is hosted under. | "http://localhost:11434/v1" |

services.common.remote.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| remote.apikey | string | API key. Enter your API key. |  |
| remote.host | string | Host | "pipe.rocketride.ai" |
| remote.local.mode | string | const: "local" |  |
| remote.port | number | Port | 5565 |
| remote.profile | string | Processing Mode | "local" |
| remote.provider | string | const: "remote" |  |
| remote.remote.mode | string | const: "remote" |  |

services.common.vector.json

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| vector.apikey | string | API key. Enter your API key. |  |
| vector.cloud.host | string | Host. Enter the server IP address, e.g. localhost. |  |
| vector.cloud.port | number | Port. Enter the port number. |  |
| vector.collection | string | Collection. Enter the name of the collection. | "ROCKETRIDE" |
| vector.host | string | Host. Enter the server IP address, e.g. localhost. |  |
| vector.local.grpc_port | number | gRPC Port. Enter the port number. |  |
| vector.local.host | string | Host. Enter the server IP address, e.g. localhost. |  |
| vector.local.port | number | Port. Enter the port number. |  |
| vector.port | number | Port. Enter the port number. |  |
| vector.score | number | Retrieval Score. Minimum retrieval score. | 0.7 |
| vectorizer.embedding |  | Embedding |  |
| vectorizer.store |  | Vector Store |  |

Local File System (services.filesys.json)

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| Pipe.exclude |  | Example Path: /Users/usr/Documents/product-images/* |  |
| Pipe.filesys.source.parameters |  |  |  |
| Pipe.include |  | Example Path: /Users/usr/Documents/product-images/* |  |
| exclude |  | The paths excluded for Scan, Index, Classify and OCR. Empty by default. Windows: C:\foldername (local filesystem) or \\file.core.net\foldername (shared folders). Linux: /file.core.net/foldername. |  |
| excludeEnableGlobal | boolean | Exclude typical OS files and directories | true |
| excludeExternalDrives | boolean | Exclude external drives | true |
| excludeSymlinks | boolean | Exclude symlinks | true |
| filesys.source.parameters |  | Parameters |  |
| include |  | The paths included for Scan, Index, Classify and OCR. Empty by default. Windows: C:\foldername (local filesystem) or \\file.core.net\foldername (shared folders). Linux: /file.core.net/foldername. |  |

Fingerprinter (services.hash.json)

No configuration fields.

Word indexer (services.indexer.json)

No configuration fields.

Null (services.null.json)

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| null.source.parameters |  | Parameters |  |

Parser (services.parse.json)

No configuration fields.

ZIP Creation (services.zip.json)

No configuration fields.