Anonymize
A RocketRide filter node that detects and masks sensitive entities in text flowing through a pipeline.
What it does
Scans text for personally identifiable information (names, emails, phone numbers, addresses, and more) using a locally run GLiNER zero-shot NER model, then replaces every character of each detected entity span with a configurable masking character (default █, U+2588). Text length and structure are preserved: only characters inside entity spans are replaced, and overlapping or adjacent spans are merged before masking.
Example:
Input: John Smith is a patient at St. Mary's Hospital.
Output: ████ █████ is a patient at ██ █████████████████.
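The merge-then-mask step can be sketched in a few lines of Python. This is a minimal illustration, not the node's code: `merge_spans` and `mask_text` are hypothetical names, spans are assumed to be `(start, end)` pairs with an exclusive end, and whitespace inside a span is masked too (a literal reading of "every character"; the node itself may treat whitespace differently).

```python
def merge_spans(spans):
    """Merge overlapping or adjacent (start, end) spans; end is exclusive."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_text(text, spans, mask_char="\u2588"):
    """Replace every character inside the merged spans with mask_char."""
    chars = list(text)
    for start, end in merge_spans(spans):
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


text = "Contact John Smith at john@example.com."
# Hypothetical detections: "John", "Smith", "John Smith", and the email.
spans = [(8, 12), (13, 18), (8, 18), (22, 38)]
masked = mask_text(text, spans)
print(masked)
```

Because characters are replaced one for one, `len(masked) == len(text)` always holds, which is what keeps downstream offsets valid.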
Models are loaded via ai.common.models.GLiNER, which runs inference locally by default and automatically routes to a model server when the engine is started with the --modelserver flag. Models are downloaded from Hugging Face on first use; no API key is required. The node declares the gpu capability, so GPU acceleration is used when available.
Large documents are split into 1024-character chunks with a 128-character overlap so entities at chunk boundaries are not missed. Chunks are processed in parallel using up to 4 threads, entity labels are batched in groups of 32, and entities found in the overlap regions are de-duplicated before the final masking pass.
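The chunking scheme described above can be sketched as follows. This is an illustration under stated assumptions: `split_chunks`, `detect_all`, and the `detect` callable are hypothetical names, `detect` stands in for the GLiNER call and returns chunk-relative `(start, end)` spans, and de-duplication is done by exact span equality after converting to document offsets. Label batching is left out for brevity.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024   # characters per chunk
OVERLAP = 128       # characters shared by consecutive chunks
MAX_WORKERS = 4     # upper bound on worker threads


def split_chunks(text, size=CHUNK_SIZE, overlap=OVERLAP):
    """Return (offset, chunk) pairs covering the whole text."""
    step = size - overlap
    chunks = []
    offset = 0
    while True:
        chunks.append((offset, text[offset:offset + size]))
        if offset + size >= len(text):
            break
        offset += step
    return chunks


def detect_all(text, detect):
    """Run detect on every chunk in parallel; return sorted, de-duplicated spans."""
    chunks = split_chunks(text)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = list(pool.map(lambda c: detect(c[1]), chunks))
    spans = set()
    for (offset, _), found in zip(chunks, results):
        for start, end in found:
            # Shift chunk-relative positions to document positions.
            spans.add((offset + start, offset + end))
    return sorted(spans)


def find_secret(chunk):
    """Toy detector standing in for the model: finds the literal word SECRET."""
    spans, i = [], chunk.find("SECRET")
    while i != -1:
        spans.append((i, i + 6))
        i = chunk.find("SECRET", i + 6)
    return spans


# The word sits in the region shared by the first two chunks.
doc = "x" * 950 + "SECRET" + "x" * 1044
found = detect_all(doc, find_secret)
print(found)
```

In this toy run the entity is seen by two chunks but reported once, which is the purpose of the de-duplication pass.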
Note: AI-based detection cannot guarantee 100% accuracy. Review results before using in production.
Configuration
Lanes
| Lane | Direction | Behaviour |
|---|---|---|
| text | in -> out | Incoming text chunks are buffered for the whole object (downstream delivery is suspended via preventDefault). At object close, the buffered text is anonymized once and forwarded downstream as a single write. |
When an upstream classifier node is present, the node also receives classifications and adjusts its behaviour (see "Entity labels" below).
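The buffer-until-close behaviour of the text lane can be pictured with a small stand-in class. Everything here is hypothetical (`BufferingFilter`, `write_text`, `close`, and the two callables are illustrative names, not the engine's API); the sketch only shows why buffering matters when an entity straddles two incoming chunks.

```python
class BufferingFilter:
    """Collects text chunks and emits one anonymized write at close."""

    def __init__(self, anonymize, forward):
        self._anonymize = anonymize  # callable: str -> str
        self._forward = forward      # callable receiving the final text
        self._parts = []

    def write_text(self, chunk):
        # Hold the chunk instead of passing it downstream.
        self._parts.append(chunk)

    def close(self):
        # Anonymize the whole object once, then forward a single write.
        text = "".join(self._parts)
        self._parts = []
        self._forward(self._anonymize(text))


out = []
node = BufferingFilter(lambda t: t.replace("Bob", "\u2588" * 3), out.append)
node.write_text("Hi B")   # the name is split across two chunks
node.write_text("ob!")
assert out == []          # nothing is delivered before close
node.close()
print(out)
```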
Fields
The node is configured by choosing a model profile. Each profile exposes the masking character field; the custom profile additionally exposes a free-form model name field.
| Field | Type | Description |
|---|---|---|
| anonymizeChar | string | Masking character that replaces each character of a detected entity |
| model | string | GLiNER model to use for anonymization (custom profile only) |
| profile | string | Model profile to use. Default "glinerMergedLarge" |
Model profiles
| Profile key | Model | Best for |
|---|---|---|
| glinerSmall | urchade/gliner_small-v2.1 | General English PII, fastest |
| glinerMedium | urchade/gliner_medium-v2.1 | General English PII, balanced |
| glinerLarge | urchade/gliner_large-v2.1 | General English PII, highest accuracy |
| glinerPIILarge | knowledgator/gliner-pii-large-v1.0 | High-accuracy English PII |
| glinerMergedLarge (default) | xomad/gliner-model-merge-large-v1.0 | Combined from multiple datasets, broad coverage |
| glinerMulti | urchade/gliner_multi | Multilingual text |
| glinerMultiPII | urchade/gliner_multi_pii-v1 | Multilingual PII |
| gretelSmall | gretelai/gretel-gliner-bi-small-v1.0 | Business-oriented NER, compact |
| gretelLarge | gretelai/gretel-gliner-bi-large-v1.0 | Business-oriented NER, large scale |
| glinerKo | taeminlee/gliner_ko | Korean |
| glinerIt | DeepMount00/GLiNER_PII_ITA | Italian |
| glinerAr | NAMAA-Space/gliner_arabic-v2.1 | Arabic |
| glinerCommunitySmall | gliner-community/gliner_small-v2.5 | Community general, compact |
| glinerCommunityMedium | gliner-community/gliner_medium-v2.5 | Community general, balanced |
| glinerCommunityLarge | gliner-community/gliner_large-v2.5 | Community general, largest |
| glinerBiomedSmall | Ihor/gliner-biomed-small-v1.0 | Biomedical and clinical text, compact |
| glinerBiomedLarge | Ihor/gliner-biomed-large-v1.0 | Biomedical and clinical text, high accuracy |
| custom | user-supplied | Any Hugging Face GLiNER model name |
The default profile when adding the node is glinerSmall (as set in preconfig.default); the anonymize.profile field UI default is glinerMergedLarge.
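How a profile key might resolve to a model name can be sketched as a simple lookup. This is an assumption about the shape of the logic, not the node's implementation: `PROFILES` lists only three rows from the table above, and `resolve_model` is a hypothetical helper.

```python
# A subset of the profile table, keyed by profile name.
PROFILES = {
    "glinerSmall": "urchade/gliner_small-v2.1",
    "glinerMergedLarge": "xomad/gliner-model-merge-large-v1.0",
    "glinerMultiPII": "urchade/gliner_multi_pii-v1",
}


def resolve_model(profile, model=None):
    """Return the Hugging Face model name for a profile key."""
    if profile == "custom":
        # The custom profile takes its model name from the free-form field.
        if not model:
            raise ValueError("the custom profile needs a model name")
        return model
    try:
        return PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile}") from None


print(resolve_model("glinerSmall"))
print(resolve_model("custom", "gliner-community/gliner_small-v2.5"))
```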
Entity labels
What gets detected and masked depends on whether an upstream classifier node feeds this node.
Standalone (no upstream classifier)
The node runs GLiNER with a built-in set of 15 PII labels:
person, name, email, phone number, address, social security number, credit card number, date of birth, organization, company, location, ip address, bank account, passport number, driver license
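For readers who want to try these labels outside the pipeline, the sketch below calls the open-source gliner package directly. It does not go through ai.common.models.GLiNER, whose interface is not shown here; `detect_pii` is a hypothetical helper, and the 0.5 threshold and the choice of model are assumptions.

```python
PII_LABELS = [
    "person", "name", "email", "phone number", "address",
    "social security number", "credit card number", "date of birth",
    "organization", "company", "location", "ip address",
    "bank account", "passport number", "driver license",
]


def detect_pii(text, model_name="urchade/gliner_small-v2.1", threshold=0.5):
    """Return (start, end, label) tuples for entities found in text."""
    # Imported lazily so the label list is usable without the package installed.
    from gliner import GLiNER

    model = GLiNER.from_pretrained(model_name)  # downloaded on first use
    entities = model.predict_entities(text, PII_LABELS, threshold=threshold)
    return [(e["start"], e["end"], e["label"]) for e in entities]
```

Calling `detect_pii("John Smith is a patient at St. Mary's Hospital.")` would return character spans that can then be merged and masked; the exact spans depend on the model and threshold.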
With an upstream classifier
When classification data arrives before the object closes, the node:
1. Masks the exact character spans (offset, length) reported in the classification textMatches.
2. Resolves classification rule idRef values to English names via the Nucleuz rule pack (nucleuz/rulePack.dat under the engine path).
3. Extracts keyword <Term> entries from the classification rules as additional GLiNER labels.
4. Runs GLiNER with the combined label set and merges the results with the spans from step 1.
If nucleuz/rulePack.dat is not present, rule-name resolution silently produces no results and the node falls back to GLiNER-only mode using the labels extracted from the classification rules.
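The classifier-assisted flow can be sketched as below. The data shapes are assumptions made for illustration: each text match is taken to be a dict with `offset` and `length` keys, rule names and keyword terms are plain strings, and `match_spans`, `combine_labels`, and `merge_spans` are hypothetical helpers. GLiNER output is represented by a ready-made list of spans.

```python
def match_spans(text_matches):
    """Convert (offset, length) matches to (start, end) spans, end exclusive."""
    return [(m["offset"], m["offset"] + m["length"]) for m in text_matches]


def combine_labels(rule_names, keyword_terms):
    """Join both label sources, keeping order and dropping duplicates."""
    return list(dict.fromkeys([*rule_names, *keyword_terms]))


def merge_spans(spans):
    """Merge overlapping or adjacent spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical classifier output and model detections.
text_matches = [{"offset": 0, "length": 4}, {"offset": 20, "length": 5}]
gliner_spans = [(2, 10), (30, 35)]

# An empty rule_names list models a missing rule pack.
labels = combine_labels(["Credit Card Number"], ["visa", "Credit Card Number"])
fallback_labels = combine_labels([], ["visa"])
final_spans = merge_spans(match_spans(text_matches) + gliner_spans)
print(labels, final_spans)
```

When the rule pack is missing, only the keyword terms remain as labels, which corresponds to the fallback described above.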
Running the tests
The node ships automated test cases in services.json. The standard test runs against the glinerSmall profile; the full test exercises every model profile.
# Standard test (glinerSmall profile)
builder nodes:test
# Full test across all model profiles
builder nodes:test-full
Schema
| Field | Type | Description | Default |
|---|---|---|---|
| anonymize.model | string | Model name: GLiNER model to use for anonymization | |
| anonymize.profile | string | Model: anonymization model profile | "glinerMergedLarge" |
| anonymizeChar | string | Character to use for anonymization | |
Dependencies
gliner