Anonymize

A RocketRide filter node that detects and redacts sensitive entities in text flowing through a pipeline.

What it does

Scans text for sensitive entities (names, emails, phone numbers, organizations, and more) using a locally run GLiNER zero-shot NER model, then replaces each detected span. You control which entity types are detected (the entityTypes field) and how matches are replaced (the redactionStyle field):

mask (default) overwrites every character of the span with a configurable masking character (default █, U+2588), preserving text length and structure.
token replaces each span with a labelled placeholder tag such as [PERSON] or [EMAIL].

Example (mask):

Input:  John Smith is a patient at St. Mary's Hospital.
Output: ████ █████ is a patient at ██ █████████████████.

Example (token):

Input:  John Smith is a patient at St. Mary's Hospital.
Output: [PERSON] is a patient at [ORGANIZATION].

Overlapping spans are merged before replacement. Mask style also merges directly adjacent spans, since the masked output is identical either way; token style keeps adjacent spans separate so that each entity keeps its own tag.
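The merge-then-replace rules above can be sketched in Python as follows. This is a minimal illustration, not the node's implementation: `Span`, `merge_spans`, and `redact` are hypothetical names, keeping the earlier span's label when spans overlap is an assumption, and so is the tag format (upper-cased label, spaces turned into underscores). The mask branch keeps whitespace inside a span, following the John Smith example; whether the node masks spaces is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type reported by the model, e.g. "person"


def merge_spans(spans, merge_adjacent):
    """Merge overlapping spans; if merge_adjacent, also merge spans that touch."""
    merged = []
    for span in sorted(spans, key=lambda s: (s.start, s.end)):
        if merged:
            last = merged[-1]
            overlaps = span.start < last.end
            touches = span.start == last.end
            if overlaps or (merge_adjacent and touches):
                # Assumption: the earlier span's label is kept for the merged span.
                merged[-1] = Span(last.start, max(last.end, span.end), last.label)
                continue
        merged.append(span)
    return merged


def redact(text, spans, style="mask", mask_char="\u2588"):
    """Replace each merged span with mask characters or a [LABEL] tag."""
    # Mask style merges touching spans; token style keeps them separate.
    merged = merge_spans(spans, merge_adjacent=(style == "mask"))
    out, cursor = [], 0
    for span in merged:
        out.append(text[cursor:span.start])
        if style == "mask":
            # Assumption: whitespace inside a span is kept, so length is preserved.
            out.append("".join(c if c.isspace() else mask_char
                               for c in text[span.start:span.end]))
        else:
            # Assumption: tag is the upper-cased label with spaces as underscores.
            out.append("[" + span.label.upper().replace(" ", "_") + "]")
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)


text = "John Smith is a patient at St. Mary's Hospital."
spans = [Span(0, 10, "person"), Span(27, 46, "organization")]
print(redact(text, spans, style="token"))
# [PERSON] is a patient at [ORGANIZATION].
print(redact(text, spans, style="mask"))
```

Because mask style never prints a label, merging touching spans changes nothing visible; in token style the same merge would collapse two tags into one, which is why the two styles treat adjacency differently.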

Models are loaded via ai.common.models.GLiNER, which runs inference locally by default and routes it to a model server instead when the engine is started with the --modelserver flag. Models are downloaded from Hugging Face on first use; no API key is required. The node declares the gpu capability, so inference uses the GPU when one is available.

Large documents are split into 1024-character chunks with a 128-character overlap so entities at chunk boundaries are not missed. Chunks are processed in parallel using up to 4 threads, entity labels are batched in groups of 32, and entities found in the overlap regions are de-duplicated before the final replacement pass.
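A sketch of the chunking scheme, under stated assumptions: `make_chunks`, `batched`, `detect_all`, and the toy `find_secret` detector are hypothetical, the detector stands in for a GLiNER call, and de-duplication here only drops entities reported with identical offsets and label. The constants mirror the numbers above.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024  # characters per chunk
OVERLAP = 128      # characters shared by consecutive chunks
MAX_WORKERS = 4    # chunks processed in parallel
LABEL_BATCH = 32   # labels sent to the model per call


def make_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Return (offset, chunk) pairs; each chunk starts size - overlap after the last."""
    step = size - overlap
    chunks, start = [], 0
    while True:
        chunks.append((start, text[start:start + size]))
        if start + size >= len(text):
            return chunks
        start += step


def batched(labels, n=LABEL_BATCH):
    """Split the label list into groups of at most n."""
    return [labels[i:i + n] for i in range(0, len(labels), n)]


def detect_all(text, labels, detector):
    """Run a hypothetical detector(chunk, labels) -> [(start, end, label)] over all chunks."""
    def run(item):
        offset, chunk = item
        found = []
        for batch in batched(labels):
            for start, end, label in detector(chunk, batch):
                # Shift chunk-relative offsets to document offsets.
                found.append((offset + start, offset + end, label))
        return found

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        per_chunk = list(pool.map(run, make_chunks(text)))
    # An entity inside an overlap region is reported by both chunks with the
    # same document offsets; collecting into a set drops the duplicate.
    return sorted({span for spans in per_chunk for span in spans})


def find_secret(chunk, labels):
    """Toy detector: flags every 'SECRET' when the 'secret' label is requested."""
    if "secret" not in labels:
        return []
    hits, i = [], chunk.find("SECRET")
    while i != -1:
        hits.append((i, i + len("SECRET"), "secret"))
        i = chunk.find("SECRET", i + 1)
    return hits


doc = "x" * 1000 + "SECRET" + "y" * 1000  # SECRET sits in the chunk 0/1 overlap
print(detect_all(doc, ["person", "secret"], find_secret))
# [(1000, 1006, 'secret')]
```

A partially detected entity at a chunk edge (for example a truncated name at the end of one chunk and the full name in the next) is not an exact duplicate; in a scheme like this, the overlap merge applied before replacement folds it into the longer span.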

Note: AI-based detection cannot guarantee 100% accuracy. Review the results before using the node in production.

Configuration

Lanes

Lane	Direction	Behaviour
`text`	in -> out	Incoming text chunks are buffered for the whole object (downstream delivery is suspended via `preventDefault`). At object close, the buffered text is anonymized once and forwarded downstream as a single write.
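The buffering behaviour of the text lane can be pictured with the hypothetical class below. `BufferingTextLane`, `write_text`, and `close` are illustrative names, not the RocketRide node API; a comment marks where the real node calls `preventDefault`. The usage shows one plausible benefit of buffering, assumed here rather than documented: a name split across two incoming writes is still seen as a single entity.

```python
class BufferingTextLane:
    """Hypothetical sketch: buffer text writes, anonymize once when the object closes."""

    def __init__(self, anonymize, forward):
        self._anonymize = anonymize  # callable str -> str (the redaction pass)
        self._forward = forward      # callable str -> None (downstream write)
        self._parts = []

    def write_text(self, chunk):
        # Hold the chunk instead of passing it on; the real node suspends
        # downstream delivery via preventDefault at this point.
        self._parts.append(chunk)

    def close(self):
        text = "".join(self._parts)
        self._parts = []
        # One downstream write carrying the whole anonymized object.
        self._forward(self._anonymize(text))


sent = []
lane = BufferingTextLane(anonymize=lambda t: t.replace("John Smith", "[PERSON]"),
                         forward=sent.append)
lane.write_text("John ")
lane.write_text("Smith was admitted.")
lane.close()
print(sent)  # ['[PERSON] was admitted.']
```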

When an upstream classifier node is present, the node also receives classifications and adjusts its behaviour (see "Entity labels" below).

Fields

The node is configured by choosing a model profile. Each profile exposes the entity-type, redaction-style, and masking-character fields; the custom profile additionally exposes a free-form model name field.

Field	Type	Description
`entityTypes`	array	Entity types to detect. Pre-filled with 15 common PII types; remove any you don't want or add your own (the model is zero-shot, so any label works). An empty value falls back to the defaults.
`redactionStyle`	string	How matches are replaced: `mask` (default) overwrites with the masking character; `token` replaces with a labelled tag like `[PERSON]`.
`anonymizeChar`	string	Character used for masking (mask style only).
`model`	string	GLiNER model name to use for anonymization (exposed by the `custom` profile only).
`profile`	string	Model profile to use (see Model profiles). Field default: `glinerMergedLarge`.

Model profiles

Profile key	Model	Best for
`glinerSmall`	`urchade/gliner_small-v2.1`	General English PII, fastest
`glinerMedium`	`urchade/gliner_medium-v2.1`	General English PII, balanced
`glinerLarge`	`urchade/gliner_large-v2.1`	General English PII, highest accuracy
`glinerPIILarge`	`knowledgator/gliner-pii-large-v1.0`	High-accuracy English PII
`glinerMergedLarge` (default)	`xomad/gliner-model-merge-large-v1.0`	Combined from multiple datasets, broad coverage
`glinerMulti`	`urchade/gliner_multi`	Multilingual text
`glinerMultiPII`	`urchade/gliner_multi_pii-v1`	Multilingual PII
`gretelSmall`	`gretelai/gretel-gliner-bi-small-v1.0`	Business-oriented NER, compact
`gretelLarge`	`gretelai/gretel-gliner-bi-large-v1.0`	Business-oriented NER, large scale
`glinerKo`	`taeminlee/gliner_ko`	Korean
`glinerIt`	`DeepMount00/GLiNER_PII_ITA`	Italian
`glinerAr`	`NAMAA-Space/gliner_arabic-v2.1`	Arabic
`glinerCommunitySmall`	`gliner-community/gliner_small-v2.5`	Community general, compact
`glinerCommunityMedium`	`gliner-community/gliner_medium-v2.5`	Community general, balanced
`glinerCommunityLarge`	`gliner-community/gliner_large-v2.5`	Community general, largest
`glinerBiomedSmall`	`Ihor/gliner-biomed-small-v1.0`	Biomedical and clinical text, compact
`glinerBiomedLarge`	`Ihor/gliner-biomed-large-v1.0`	Biomedical and clinical text, high accuracy
`custom`	user-supplied	Any Hugging Face GLiNER model name

Note the two different defaults: a newly added node starts with the glinerSmall profile (set in preconfig.default), while the anonymize.profile field itself defaults to glinerMergedLarge in the UI.
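The profile table and the two defaults can be expressed as a simple lookup, sketched below. `PROFILE_MODELS`, `resolve_model`, and the two default constants are hypothetical names; rejecting a custom profile that has no model name is an assumption about how an empty field would be handled.

```python
# Profile key -> Hugging Face model name, copied from the Model profiles table.
PROFILE_MODELS = {
    "glinerSmall": "urchade/gliner_small-v2.1",
    "glinerMedium": "urchade/gliner_medium-v2.1",
    "glinerLarge": "urchade/gliner_large-v2.1",
    "glinerPIILarge": "knowledgator/gliner-pii-large-v1.0",
    "glinerMergedLarge": "xomad/gliner-model-merge-large-v1.0",
    "glinerMulti": "urchade/gliner_multi",
    "glinerMultiPII": "urchade/gliner_multi_pii-v1",
    "gretelSmall": "gretelai/gretel-gliner-bi-small-v1.0",
    "gretelLarge": "gretelai/gretel-gliner-bi-large-v1.0",
    "glinerKo": "taeminlee/gliner_ko",
    "glinerIt": "DeepMount00/GLiNER_PII_ITA",
    "glinerAr": "NAMAA-Space/gliner_arabic-v2.1",
    "glinerCommunitySmall": "gliner-community/gliner_small-v2.5",
    "glinerCommunityMedium": "gliner-community/gliner_medium-v2.5",
    "glinerCommunityLarge": "gliner-community/gliner_large-v2.5",
    "glinerBiomedSmall": "Ihor/gliner-biomed-small-v1.0",
    "glinerBiomedLarge": "Ihor/gliner-biomed-large-v1.0",
}

PRECONFIG_DEFAULT = "glinerSmall"       # profile a newly added node starts with
FIELD_UI_DEFAULT = "glinerMergedLarge"  # default of the anonymize.profile field


def resolve_model(profile, model=None):
    """Map a profile key (or 'custom' plus a model name) to a Hugging Face model."""
    if profile == "custom":
        if not model:
            # Assumption: an empty custom model name is treated as an error.
            raise ValueError("the custom profile needs a Hugging Face model name")
        return model
    if profile not in PROFILE_MODELS:
        raise ValueError(f"unknown profile: {profile!r}")
    return PROFILE_MODELS[profile]


print(resolve_model(PRECONFIG_DEFAULT))  # urchade/gliner_small-v2.1
```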

Entity labels

What gets detected and redacted depends on whether an upstream classifier node feeds this node.

Standalone (no upstream classifier)

The node runs GLiNER with the labels from the entityTypes field. This is pre-filled with a default set of 15 common PII labels, which you can edit freely (the model is zero-shot, so any label works):

person, name, email, phone number, address, social security number, credit card number, date of birth, organization, company, location, ip address, bank account, passport number, driver license
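A minimal sketch of the label fallback for the entityTypes field (an empty value falls back to the defaults). `DEFAULT_ENTITY_TYPES` and `resolve_labels` are hypothetical names; trimming whitespace and dropping blank entries before the fallback check is an assumption.

```python
# The 15 default labels listed above.
DEFAULT_ENTITY_TYPES = [
    "person", "name", "email", "phone number", "address",
    "social security number", "credit card number", "date of birth",
    "organization", "company", "location", "ip address",
    "bank account", "passport number", "driver license",
]


def resolve_labels(entity_types):
    """Return the user's labels, or the defaults when none are usable."""
    # Assumption: whitespace-only entries count as empty.
    cleaned = [t.strip() for t in (entity_types or []) if t and t.strip()]
    # Zero-shot model: any non-empty label is passed through unchanged.
    return cleaned if cleaned else list(DEFAULT_ENTITY_TYPES)


print(resolve_labels(["medical record number"]))  # ['medical record number']
```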

With an upstream classifier

When classification data arrives before the object closes, the node:

1. Redacts the exact character spans (offset, length) reported in the classification textMatches. In token style these carry no entity type, so they are tagged [REDACTED] unless a more specific NER detection covers the same span.
2. Resolves classification rule idRef values to English names via the Nucleuz rule pack (nucleuz/rulePack.dat under the engine path).
3. Extracts keyword <Term> entries from the classification rules as additional GLiNER labels.
4. Runs GLiNER with the combined label set and merges the results with the spans from step 1.
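Step 1 and the merge in step 4 can be sketched as below. `combine_spans` is a hypothetical helper, and treating an NER span as covering a classifier match only when it fully contains the match is an assumption about what "covers the same span" means. Steps 2 and 3 are omitted because the rule pack and rule formats are not specified.

```python
def combine_spans(text_matches, ner_spans):
    """Merge classifier matches with NER spans.

    text_matches: (offset, length) pairs from the classification textMatches.
    ner_spans: (start, end, label) tuples from GLiNER.
    """
    combined = list(ner_spans)
    for offset, length in text_matches:
        start, end = offset, offset + length
        # Assumption: an NER span that fully contains the match supplies the label.
        covered = any(s <= start and end <= e for s, e, _ in ner_spans)
        if not covered:
            # Classifier matches carry no entity type, hence the generic tag.
            combined.append((start, end, "REDACTED"))
    return sorted(combined)


matches = [(0, 10), (30, 5)]
ner = [(0, 10, "person")]
print(combine_spans(matches, ner))  # [(0, 10, 'person'), (30, 35, 'REDACTED')]
```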

If nucleuz/rulePack.dat is not present, rule-name resolution silently produces no results and the node falls back to GLiNER-only mode using the labels extracted from the classification rules.

Running the tests

The node ships automated test cases in services.json. The standard test runs against the glinerSmall profile; the full test exercises every model profile. Server-free unit tests for the pure redaction logic live in nodes/test/test_anonymize_logic.py.

# Standard test (glinerSmall profile)
builder nodes:test

# Full test across all model profiles
builder nodes:test-full

Schema

Field	Type	Description	Default
`anonymize.model`	`string`	Model name: GLiNER model to use for anonymization (custom profile).
`anonymize.profile`	`string`	Model: anonymize model profile.	`"glinerMergedLarge"`
`anonymizeChar`	`string`	Character: character used for masking.
`entityTypes`	`array`	Entity types to anonymize: PII / entity types to detect and mask. Pre-filled with common types; remove any you don't want, or add your own (the model is zero-shot, so any label works).	`["person","name","email","phone number","address","social security number","credit card number","date of birth","organization","company","location","ip address","bank account","passport number","driver license"]`
`redactionStyle`	`string`	Redaction style: how detected entities are replaced. `mask` overwrites each entity with the anonymization character (████); `token` replaces each entity with a labelled tag like [PERSON] or [EMAIL].	`"mask"`

Dependencies

gliner
onnxruntime-gpu ==1.20.1; platform_system != 'Darwin'
onnxruntime ==1.20.1; platform_system == 'Darwin'
