Anomaly Detector

A RocketRide filter node that monitors numeric values flowing through a pipeline and flags statistical anomalies by severity.

What it does

Watches numeric values passing through a pipeline and classifies each one as normal, warning, or critical using one of three statistical methods: Z-Score, IQR (interquartile range), or Rolling Average percentage deviation. Use it to catch outliers, unexpected spikes, or shifts in data distribution that may need attention.

Implemented entirely with the Python standard library (math, threading, collections); no external dependencies are required.

One detector is created per pipeline execution and shared across all instances. It maintains a thread-safe sliding window of the most recent windowSize values. Each incoming value is scored against the current window contents and then appended to the window. Window state is discarded when the pipeline ends.

Non-finite inputs (NaN, positive/negative infinity) are treated as normal and skipped. Until the window holds enough data (2 values for Z-Score and Rolling Average, 4 for IQR), every value is reported as normal with details: "insufficient data", so expect a brief warm-up period at the start of every run.
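The window behaviour described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the class name `SlidingWindow`, the method `score_then_append`, the `scorer` callback, and the `"non-finite input"` details string are all hypothetical, and it assumes that skipped non-finite values are not added to the window.

```python
import math
import threading
from collections import deque


class SlidingWindow:
    """Hypothetical thread-safe window of the most recent values."""

    def __init__(self, window_size=100):
        # deque(maxlen=...) silently drops the oldest value once full.
        self._values = deque(maxlen=window_size)
        self._lock = threading.Lock()

    def score_then_append(self, value, min_points, scorer):
        # Non-finite inputs are reported as normal and skipped.
        # Assumption: they are also kept out of the window.
        if not math.isfinite(value):
            return {"severity": "normal", "details": "non-finite input"}
        with self._lock:
            # Score against the window as it was BEFORE this value arrived.
            snapshot = list(self._values)
            self._values.append(value)
        if len(snapshot) < min_points:
            # Warm-up: 2 points for Z-Score / Rolling Average, 4 for IQR.
            return {"severity": "normal", "details": "insufficient data"}
        return scorer(value, snapshot)
```

With `min_points=2`, the first two values of a run come back as `insufficient data`, and the third is the first one actually scored.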

This node is marked experimental.


Configuration

Lanes

| Lane | In → Out | Behaviour |
| --- | --- | --- |
| text | text → text | Parses a numeric value from the incoming text, scores it, and annotates anomalous text. Non-numeric text passes through unchanged. |
| documents | documents → documents | Reads the configured metric field from each document's metadata, scores it, and writes the detection result back into the metadata. |

Text lane

The node first tries to parse the entire (stripped) text as a float. If that fails, it extracts the first numeric token via regex (integers, decimals, scientific notation, optional leading minus). If no number can be found, the text is forwarded unchanged and a debug message is logged.
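The two-step parse can be approximated like this. The function name `parse_numeric` and the exact regular expression are assumptions made for illustration; the node's own pattern may differ in edge cases.

```python
import re

# Assumed pattern: optional leading minus, integer or decimal,
# optional scientific-notation exponent.
_NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def parse_numeric(text):
    """Return the parsed float, or None when no number can be found."""
    stripped = text.strip()
    try:
        # Step 1: try the whole stripped text as a float.
        return float(stripped)
    except ValueError:
        pass
    # Step 2: fall back to the first numeric token in the text.
    match = _NUMBER.search(stripped)
    return float(match.group()) if match else None
```

For example, `parse_numeric("latency: -3.5e2 ms")` yields `-350.0`, while text without any digits yields `None`, which corresponds to the pass-through case.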

When a value is anomalous (warning or critical), a tag is appended to the original text:

42.7 [ANOMALY: critical score=3.4119]

Normal values pass through unchanged.
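The tagging rule is small enough to show in full. In this sketch the helper name `annotate` is hypothetical, and formatting the score with exactly four decimal places is an assumption based on the example tag above.

```python
def annotate(text, severity, score):
    """Append an anomaly tag for warning/critical; leave normal text alone."""
    if severity not in ("warning", "critical"):
        return text
    # Assumption: the score is always printed with four decimals.
    return f"{text} [ANOMALY: {severity} score={score:.4f}]"
```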

Documents lane

Each document is deep-copied before enrichment; documents whose metadata is None pass through untouched. Four fields are added to the metadata of each processed document:

| Metadata field | Content |
| --- | --- |
| anomaly_score | Numeric anomaly score, rounded to 4 decimal places. |
| anomaly_severity | normal, warning, or critical. |
| anomaly_is_anomalous | Boolean, true when severity is warning or critical. |
| anomaly_details | Human-readable diagnostics: method internals, or an explanation of why detection was skipped. |

If the metric field is absent from a document's metadata or contains a non-numeric value, the document is marked normal with an explanatory details string and is never dropped.
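The enrichment step might look roughly like the sketch below. It assumes documents are plain dicts with a `metadata` key, which may not match the real document type; `enrich`, the `detect` callback, the score of 0.0 for skipped documents, and the wording of the skip message are likewise illustrative.

```python
import copy


def enrich(document, metric, detect):
    """Return an enriched copy; `detect` maps a float to (score, severity, details)."""
    if document.get("metadata") is None:
        return document  # passes through untouched
    enriched = copy.deepcopy(document)  # never mutate the incoming document
    raw = enriched["metadata"].get(metric)
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Missing or non-numeric metric: mark normal, never drop the document.
        score, severity = 0.0, "normal"
        details = f"metric '{metric}' missing or non-numeric"  # placeholder wording
    else:
        score, severity, details = detect(value)
    enriched["metadata"].update({
        "anomaly_score": round(score, 4),
        "anomaly_severity": severity,
        "anomaly_is_anomalous": severity in ("warning", "critical"),
        "anomaly_details": details,
    })
    return enriched
```

Because the copy is made before any field is written, downstream consumers of the original document never see the `anomaly_*` fields.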

Fields

The node is configured through a single profile selector plus per-method fields. The selected profile determines the detection method and supplies the defaults shown in the Profiles table below.

| Field | Type | Description |
| --- | --- | --- |
| method | string | Default "z_score". Statistical method used for anomaly detection. |
| sensitivity | number | Default 2.0. Detection sensitivity threshold (lower = more sensitive). |
| windowSize | integer | Default 100. Number of recent values to consider for statistical calculations. |
| metric | string | Default "value". The metadata field name containing the numeric value to monitor. |
| warningThreshold | number | Default 2.0. Threshold multiplier for warning-level anomalies. |
| criticalThreshold | number | Default 3.0. Threshold multiplier for critical-level anomalies. |
| profile | string | Default "z_score". Anomaly detection configuration. |

Profiles

The profile field (UI: "Detection Method") selects a preset that pre-fills method and the threshold defaults. The default profile is z_score.

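| Profile | Method | sensitivity | windowSize | warningThreshold | criticalThreshold |
| --- | --- | --- | --- | --- | --- |
| z_score | Z-Score | 2.0 | 100 | 2.0 | 3.0 |
| iqr | IQR | 1.5 | 100 | 1.5 | 3.0 |
| rolling_avg | Rolling Average | 2.0 | 50 | 2.0 | 3.0 |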

Detection methods

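All three methods produce a numeric score that is classified by the same rule: score >= criticalThreshold is critical, else score >= warningThreshold is warning, else normal. The one exception, described under IQR below, is that IQR flags any value outside its bounds as anomalous regardless of warningThreshold.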
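The shared classification rule translates directly into code. The function name `classify` is hypothetical; the default thresholds are those of the z_score profile.

```python
def classify(score, warning_threshold=2.0, critical_threshold=3.0):
    """Map a score to a severity, checking the critical threshold first."""
    if score >= critical_threshold:
        return "critical"
    if score >= warning_threshold:
        return "warning"
    return "normal"
```

Both comparisons are inclusive, so a score exactly equal to a threshold already counts at that level.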

Z-Score

Measures how many standard deviations the value is from the window mean: score = |value - mean| / std. Requires at least 2 values in the window. A window with zero variance (all identical values) yields normal with details: "zero variance". The sensitivity field has no effect on this method.
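A minimal sketch of the Z-Score calculation follows. The source does not say whether the node uses the population or the sample standard deviation; this example assumes the population form, and the function name `z_score` and the diagnostic string are illustrative.

```python
import math


def z_score(value, window):
    """Return (score, details) for `value` against the window contents."""
    if len(window) < 2:
        return 0.0, "insufficient data"
    mean = sum(window) / len(window)
    # Assumption: population variance (divide by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    std = math.sqrt(variance)
    if std == 0:
        return 0.0, "zero variance"
    score = abs(value - mean) / std
    return score, f"mean={mean:.4f}, std={std:.4f}"
```

For the window `[10, 12, 14, 16]` the mean is 13 and the population standard deviation is √5, so a new value of 20 scores 7 / √5 (about 3.13), which is critical under the default thresholds.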

IQR

Computes Q1 and Q3 by linear interpolation over the sorted window and defines outlier bounds at Q1 - sensitivity * IQR and Q3 + sensitivity * IQR. The score is the distance from the nearer bound expressed in IQR units; any value outside the bounds (score > 0) is flagged as anomalous regardless of warningThreshold. Requires at least 4 values. A zero-IQR window yields normal with details: "zero IQR".
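The IQR method can be illustrated as below. Several details are assumptions rather than confirmed behaviour: the quartile position is taken as `q * (n - 1)` (one common linear-interpolation convention), a value outside the bounds but below criticalThreshold is reported as warning, and the names `quantile` and `iqr_score` plus the diagnostic strings are hypothetical.

```python
import math


def quantile(sorted_vals, q):
    """Linear interpolation between the two nearest ranks (assumed convention)."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def iqr_score(value, window, sensitivity=1.5, critical_threshold=3.0):
    """Return (score, severity, details) for `value` against the window."""
    if len(window) < 4:
        return 0.0, "normal", "insufficient data"
    ordered = sorted(window)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    if iqr == 0:
        return 0.0, "normal", "zero IQR"
    lower, upper = q1 - sensitivity * iqr, q3 + sensitivity * iqr
    # Distance beyond the nearer bound, in IQR units; 0 when inside the bounds.
    score = max(lower - value, value - upper, 0.0) / iqr
    if score >= critical_threshold:
        severity = "critical"
    elif score > 0:
        severity = "warning"  # outside the bounds, whatever warningThreshold is
    else:
        severity = "normal"
    return score, severity, f"bounds=[{lower:.4f}, {upper:.4f}]"
```

With the window `[1, 2, 3, 4, 5]` and sensitivity 1.5, Q1 is 2, Q3 is 4, and the bounds are -1 and 7. A value of 8 lies one unit above the upper bound, giving a score of 0.5, which is still flagged even though it is below the iqr profile's warningThreshold of 1.5.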

Rolling Average

Computes a moving average over the most recent half of the window (minimum 2 values) and measures the value's percentage deviation from that local mean. No standard-deviation normalization is applied, making it intuitive for business metrics where "a 10% deviation" has a clear meaning.

The score is pct_deviation / (sensitivity * 10), so a severity level triggers once the percentage deviation reaches sensitivity * 10 * threshold. With default values (sensitivity 2.0, warningThreshold 2.0, criticalThreshold 3.0), warning fires at 2.0 * 10 * 2.0 = 40% deviation and critical at 2.0 * 10 * 3.0 = 60%. A zero local mean yields normal with details: "zero mean".
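The Rolling Average scoring can be sketched as follows. Taking "the most recent half of the window (minimum 2 values)" to mean `max(2, len(window) // 2)` trailing values is an interpretation, and the function name `rolling_avg_score` and the diagnostic string are hypothetical.

```python
def rolling_avg_score(value, window, sensitivity=2.0):
    """Return (score, details) from the percentage deviation off the local mean."""
    window = list(window)
    if len(window) < 2:
        return 0.0, "insufficient data"
    # Assumed reading: most recent half of the window, but at least 2 values.
    recent = window[-max(2, len(window) // 2):]
    local_mean = sum(recent) / len(recent)
    if local_mean == 0:
        return 0.0, "zero mean"
    pct_deviation = abs(value - local_mean) * 100 / abs(local_mean)
    score = pct_deviation / (sensitivity * 10)
    return score, f"local_mean={local_mean:.4f}, deviation={pct_deviation:.2f}%"
```

Against a steady window of 100s with the default sensitivity, a value of 140 deviates by 40% and scores 2.0 (the default warning threshold), while 160 deviates by 60% and scores 3.0 (the default critical threshold).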


Schema

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| anomaly_detector.criticalThreshold | number | Critical threshold. Threshold multiplier for critical-level anomalies. | 3 |
| anomaly_detector.method | string | Detection method. Statistical method used for anomaly detection. | "z_score" |
| anomaly_detector.metric | string | Metric field. The metadata field name containing the numeric value to monitor. | "value" |
| anomaly_detector.profile | string | Detection Method. Anomaly detection configuration. | "z_score" |
| anomaly_detector.sensitivity | number | Sensitivity. Detection sensitivity threshold (lower = more sensitive). | 2 |
| anomaly_detector.warningThreshold | number | Warning threshold. Threshold multiplier for warning-level anomalies. | 2 |
| anomaly_detector.windowSize | integer | Window size. Number of recent values to consider for statistical calculations. | 100 |