Guardrails

A RocketRide filter node that screens questions before they reach the LLM and answers before they reach your users.

What it does

The node sits in the pipeline as a guard filter, evaluating questions on the way in and answers on the way out. On the input side it catches prompt injection, enforces topic rules (blocked and allowed keyword lists), and caps input length or estimated token count. On the output side it checks answers for hallucination (keyword grounding against source documents), flags harmful content, detects PII leaks (emails, phones, SSNs, credit cards, IP addresses), and validates the output format.

All checks are pure stdlib and regex: the node has no external dependencies, no model calls, and adds no network latency.
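To illustrate what a stdlib-and-regex check can look like, here is a minimal sketch of a PII scan. The patterns and the find_pii helper are hypothetical stand-ins, not the node's actual rules; credit card matching is omitted for brevity.

```python
import re

# Hypothetical patterns for illustration only; the node's real patterns may differ.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone_us": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every match per PII category, omitting categories with no hits."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

Because everything runs through the re module, a scan like this needs no model call or network round trip, which is the property the node relies on.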

How the node reacts to a violation is controlled by the policy_mode field: block logs the violation and drops the offending question or answer without forwarding it, warn logs the violation and forwards the item anyway, and log records the violation silently and forwards the item. The default profile runs in warn mode, so a freshly added node never blocks traffic until you opt in.

Text that is empty or whitespace-only is forwarded without checks.


Configuration

Lanes

| Lane in | Lane out | Description |
|---|---|---|
| questions | questions | Input checks run before the question is forwarded to the LLM |
| answers | answers | Output checks run after the LLM responds |
| documents | documents | Forwarded unchanged; content is collected as ground-truth context for the hallucination check |

Question text is assembled from both the question objects and any attached context before evaluation. Collected document content resets per pipeline object.

Fields

| Field | Type | Description |
|---|---|---|
| policy_mode | string | Default "warn". How to handle violations: block (reject), warn (log + continue), log (silent) |
| enable_prompt_injection | boolean | Default true. Detect and flag prompt injection attempts in input |
| enable_content_safety | boolean | Default true. Detect harmful or unsafe content in output |
| enable_pii_detection | boolean | Default true. Detect personally identifiable information (emails, phones, SSNs, credit cards) in output |
| enable_hallucination_check | boolean | Default false. Verify that output claims are grounded in source documents |
| max_input_length | number | Default 0. Maximum character count for input text (0 = no limit) |
| max_tokens_estimate | number | Default 0. Maximum estimated token count for input text (0 = no limit) |
| expected_format | string | Default empty. Validate that output matches this format (empty = no check) |
| blocked_topics | array | Keywords for topics that should be rejected |
| allowed_topics | array | If set, input must contain at least one of these keywords |
| profile | string | Default "basic". Guardrails profile |

Profiles

Three built-in profiles control which fields are exposed in the UI and set sensible starting defaults.

| Profile | Behaviour |
|---|---|
| Basic (default) | Prompt injection + PII detection, warn mode. Only policy_mode is configurable in the UI. |
| Strict | All checks enabled, block on violation, max_input_length 50000, max_tokens_estimate 4096. Exposes policy_mode, max_tokens_estimate, and expected_format. |
| Custom | All checks enabled, warn mode. Every field is configurable individually. |
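As a rough illustration, the Strict profile's documented starting values can be written out as explicit field settings. The dict form, the "strict" profile value, and the enabled_checks helper are assumptions made for this sketch; only the field names and the numeric limits come from this page.

```python
# Hypothetical representation: the key names come from the Fields table, but
# the dict itself is only an illustration, not the node's storage format.
STRICT_PROFILE = {
    "profile": "strict",  # assumed identifier; only "basic" is documented
    "policy_mode": "block",
    "enable_prompt_injection": True,
    "enable_content_safety": True,
    "enable_pii_detection": True,
    "enable_hallucination_check": True,
    "max_input_length": 50000,
    "max_tokens_estimate": 4096,
}


def enabled_checks(config: dict) -> list[str]:
    """List the enable_* switches that are turned on, in sorted order."""
    return sorted(
        key for key, value in config.items()
        if key.startswith("enable_") and value is True
    )
```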

Input checks

Run on the questions lane before the question is forwarded:

  • Prompt injection (rule prompt_injection, critical severity): regex patterns covering instruction-override attempts ("ignore all previous instructions"), system-prompt extraction, role-play jailbreaks (DAN and similar), delimiter/token injection (<|system|>, [INST], etc.), and encoding-evasion commands; plus weighted keyword scoring (keywords such as jailbreak, bypass, ignore safety) that triggers when the combined score reaches 0.7.
  • Topic restriction (rule topic_restriction): only runs when blocked_topics or allowed_topics is non-empty. Blocked-keyword matches are high severity; failing to match any allowed keyword is medium severity. Matching is case-insensitive substring.
  • Input length (rule input_length, medium severity): only runs when a limit is set (max_input_length > 0 or max_tokens_estimate > 0). Tokens are estimated as word count times 1.3, so treat max_tokens_estimate as a rough budget rather than an exact tokenizer count.
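The topic and length rules above can be sketched as follows. The function names, return shapes, and messages are hypothetical. Truncating the token estimate to an integer and treating a value strictly above the limit as a violation are also assumptions; the page only states the word count times 1.3 rule.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated word count times 1.3."""
    return int(len(text.split()) * 1.3)  # truncation is an assumption


def check_input_length(text: str, max_input_length: int = 0,
                       max_tokens_estimate: int = 0) -> list[str]:
    """Return a message for each exceeded limit; a limit of 0 disables it."""
    violations = []
    if max_input_length > 0 and len(text) > max_input_length:
        violations.append(f"input has {len(text)} chars, limit {max_input_length}")
    if max_tokens_estimate > 0 and estimate_tokens(text) > max_tokens_estimate:
        violations.append(
            f"input has ~{estimate_tokens(text)} tokens, limit {max_tokens_estimate}"
        )
    return violations


def check_topics(text: str, blocked_topics=(), allowed_topics=()):
    """Return (severity, message) for a topic violation, or None if the text passes."""
    lowered = text.lower()
    # Blocked keywords: case-insensitive substring match, high severity.
    for keyword in blocked_topics:
        if keyword.lower() in lowered:
            return ("high", f"blocked topic: {keyword}")
    # Allowed keywords: at least one must appear, otherwise medium severity.
    if allowed_topics and not any(k.lower() in lowered for k in allowed_topics):
        return ("medium", "no allowed topic matched")
    return None
```

Since a four-word question estimates to 5 tokens under this rule, real tokenizer counts can differ noticeably, so leave headroom when choosing max_tokens_estimate.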

Output checks

Run on the answers lane before the answer is forwarded:

  • Hallucination (rule hallucination, high severity): sentence-level grounding check. Each output sentence is evaluated for keyword overlap (3+ character non-stop words) against the combined source documents; sentences with less than 30% coverage are flagged. The check is skipped when no documents have been received on the documents lane.
  • Content safety (rule content_safety, critical severity): regex patterns across three categories: self-harm, violence (weapon and explosive construction), and illegal activity (hacking, theft, counterfeiting).
  • PII leak (rule pii_leak, high severity): pattern matches for email, phone_us, ssn, credit_card, and ip_address.
  • Format compliance (rule format_compliance, medium severity): only runs when expected_format is set. json must parse cleanly; markdown requires at least one markdown element (heading, bold, code, list marker); bullet_list and numbered_list require at least half the non-empty lines to be list items.
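The sentence-level grounding check described in the Hallucination item can be approximated like this. It is a sketch under stated assumptions: the stop-word list, the sentence splitter, and the function names are hypothetical, while the 3+ character keyword rule, the 30% threshold, and the skip-when-no-documents behaviour follow the description above.

```python
import re

# Hypothetical, abbreviated stop-word list; the node's real list is not documented here.
STOP_WORDS = {"the", "and", "for", "are", "was", "with", "that", "this", "from", "has", "have"}


def keywords(text: str) -> set[str]:
    """Lower-cased words of 3+ characters that are not stop words."""
    return {w for w in re.findall(r"[a-z0-9]{3,}", text.lower()) if w not in STOP_WORDS}


def ungrounded_sentences(answer: str, documents: list[str],
                         threshold: float = 0.3) -> list[str]:
    """Return answer sentences whose keyword coverage in the documents is below threshold."""
    if not documents:
        return []  # no ground truth collected, so the check is skipped
    source = keywords(" ".join(documents))
    flagged = []
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = keywords(sentence)
        if not words:
            continue  # nothing to ground in this sentence
        coverage = len(words & source) / len(words)
        if coverage < threshold:
            flagged.append(sentence)
    return flagged
```

Keyword overlap is a coarse signal: a sentence that reuses source vocabulary but states something false would still pass a check of this kind.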

Policy modes

When any enabled check fails, policy_mode decides the outcome:

| Mode | Effect |
|---|---|
| block | Each violation is logged as a warning and the question or answer is dropped; nothing is forwarded downstream. |
| warn | Each violation is logged as a warning; the item is forwarded anyway. |
| log | The item is forwarded with no warnings emitted. |

Blocking happens silently from the pipeline's point of view: downstream nodes simply never receive the item. Check the engine logs (Guardrails input blocked: ... / Guardrails output blocked: ...) to see what was rejected and why.
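The three modes amount to a small dispatch on policy_mode, sketched below. The apply_policy function, the logger name, the "violation" message wording, and the debug-level record in log mode are assumptions for illustration; only the "Guardrails input blocked" / "Guardrails output blocked" prefixes come from the engine log messages quoted above.

```python
import logging

logger = logging.getLogger("guardrails_sketch")  # hypothetical logger name


def apply_policy(text, violations, policy_mode="warn", direction="input"):
    """Return the text to forward downstream, or None when the item is dropped."""
    if not violations:
        return text
    if policy_mode == "block":
        for violation in violations:
            logger.warning("Guardrails %s blocked: %s", direction, violation)
        return None  # downstream nodes never receive the item
    if policy_mode == "warn":
        for violation in violations:
            # Message wording here is hypothetical.
            logger.warning("Guardrails %s violation: %s", direction, violation)
    else:
        # "log" mode: no warnings; a debug-level record is an assumption.
        for violation in violations:
            logger.debug("Guardrails %s violation: %s", direction, violation)
    return text
```

A caller would forward the return value only when it is not None, which matches the observation that a blocked item simply never reaches downstream nodes.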


Schema

| Field | Type | Description | Default |
|---|---|---|---|
| allowed_topics | array | Allowed topics. If set, input must contain at least one of these keywords | |
| blocked_topics | array | Blocked topics. Keywords for topics that should be rejected | |
| enable_content_safety | boolean | Enable content safety check. Detect harmful or unsafe content in output | true |
| enable_hallucination_check | boolean | Enable hallucination check. Verify that output claims are grounded in source documents | false |
| enable_pii_detection | boolean | Enable PII detection. Detect personally identifiable information (emails, phones, SSNs, credit cards) in output | true |
| enable_prompt_injection | boolean | Enable prompt injection detection. Detect and flag prompt injection attempts in input | true |
| expected_format | string | Expected output format. Validate that output matches this format (empty = no check) | "" |
| guardrails.profile | string | Profile. Guardrails profile | "basic" |
| max_input_length | number | Max input length (chars). Maximum character count for input text (0 = no limit) | 0 |
| max_tokens_estimate | number | Max tokens (estimate). Maximum estimated token count for input text (0 = no limit) | 0 |
| policy_mode | string | Policy mode. How to handle violations: block (reject), warn (log + continue), log (silent) | "warn" |