Anonymize

A RocketRide filter node that detects and masks sensitive entities in text flowing through a pipeline.

What it does

Scans text for personally identifiable information (names, emails, phone numbers, addresses, and more) using a locally run GLiNER zero-shot NER model and replaces every character of each detected entity span with a configurable masking character (default █, U+2588). Text length and structure are preserved: only the entity character positions are replaced, and overlapping or adjacent spans are merged before masking.

Example:

Input:  John Smith is a patient at St. Mary's Hospital.
Output: ████ █████ is a patient at ██ █████████████████.
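The merge-then-mask step described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the names merge_spans and mask_text are hypothetical, spans are assumed to be (start, end) pairs with an exclusive end, and this sketch masks whitespace inside a span, which the node itself may handle differently.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive (assumed representation)


def merge_spans(spans: List[Span]) -> List[Span]:
    """Merge overlapping or adjacent (start, end) spans into disjoint ones."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def mask_text(text: str, spans: List[Span], mask_char: str = "\u2588") -> str:
    """Replace every character inside each merged span with mask_char."""
    chars = list(text)
    for start, end in merge_spans(spans):
        # Clamp to the text bounds so a bad span cannot change the length.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


text = "Contact Jane Doe at jane@example.com today."
spans = [(8, 12), (12, 16), (20, 36)]  # "Jane", " Doe", the email address
masked = mask_text(text, spans)
```

Because characters are replaced one for one, len(masked) always equals len(text), so offsets computed on the original text stay valid on the masked text.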

Models are loaded via ai.common.models.GLiNER, which runs inference locally and automatically routes to a model server when the engine is started with the --modelserver flag. Models are downloaded from Hugging Face on first use; no API key is required. The node declares the gpu capability, so GPU acceleration is used when available.
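For orientation, the sketch below calls the public gliner package directly rather than the node's ai.common.models.GLiNER wrapper, whose interface is not shown here. The helper names detect_entities and to_spans are hypothetical, and the threshold value of 0.5 is an assumption, not the node's setting.

```python
from typing import Dict, List, Tuple


def detect_entities(
    text: str,
    labels: List[str],
    model_name: str = "urchade/gliner_small-v2.1",
    threshold: float = 0.5,  # assumed value, not taken from the node
) -> List[Dict]:
    """Run zero-shot NER with the public gliner package."""
    # Imported lazily: requires `pip install gliner`, and the model is
    # downloaded from Hugging Face on first use.
    from gliner import GLiNER

    model = GLiNER.from_pretrained(model_name)
    # Each result is a dict with "start", "end", "text", "label" and "score".
    return model.predict_entities(text, labels, threshold=threshold)


def to_spans(entities: List[Dict]) -> List[Tuple[int, int]]:
    """Reduce entity dicts to (start, end) character spans for masking."""
    return [(e["start"], e["end"]) for e in entities]
```

A call such as detect_entities("John Smith lives in Paris.", ["person", "location"]) would return one dict per detected entity; to_spans then yields the character ranges to mask.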

Large documents are split into 1024-character chunks with a 128-character overlap so entities at chunk boundaries are not missed. Chunks are processed in parallel using up to 4 threads, entity labels are batched in groups of 32, and entities found in the overlap regions are de-duplicated before the final masking pass.
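A rough sketch of the chunking scheme, assuming a sliding window that advances by chunk size minus overlap (896 characters). The function names and the detect callback are hypothetical stand-ins for the node's internals, and label batching is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set, Tuple

CHUNK_SIZE = 1024   # characters per chunk
OVERLAP = 128       # characters shared by consecutive chunks
MAX_WORKERS = 4     # upper bound on worker threads

Entity = Tuple[int, int, str]  # (start, end, label), end exclusive


def split_chunks(
    text: str, size: int = CHUNK_SIZE, overlap: int = OVERLAP
) -> List[Tuple[int, str]]:
    """Return (offset, chunk) pairs covering text with overlapping windows."""
    chunks: List[Tuple[int, str]] = []
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + size]))
        if start + size >= len(text):
            break  # this chunk already reaches the end of the text
        start += size - overlap
    return chunks


def detect_all(text: str, detect: Callable[[str], List[Entity]]) -> List[Entity]:
    """Run detect on every chunk in parallel and de-duplicate the results."""
    chunks = split_chunks(text)
    # detect must be safe to call from several threads at once.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = list(pool.map(lambda chunk: detect(chunk[1]), chunks))
    unique: Set[Entity] = set()
    for (offset, _), entities in zip(chunks, results):
        for start, end, label in entities:
            # Shift chunk-local offsets to document offsets; an entity seen
            # in two overlapping chunks collapses to one set member.
            unique.add((start + offset, end + offset, label))
    return sorted(unique)
```

An entity that straddles the end of one chunk is still seen whole in the next chunk as long as it is no longer than the overlap, which is why the overlap region is scanned twice and then de-duplicated.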

Note: AI-based detection cannot guarantee 100% accuracy. Review results before using in production.


Configuration

Lanes

| Lane | Direction | Behaviour |
| --- | --- | --- |
| text | in -> out | Incoming text chunks are buffered for the whole object (downstream delivery is suspended via preventDefault). At object close, the buffered text is anonymized once and forwarded downstream as a single write. |

When an upstream classifier node is present, the node also receives classifications and adjusts its behaviour (see "Entity labels" below).

Fields

The node is configured by choosing a model profile. Each profile exposes the masking character field; the custom profile additionally exposes a free-form model name field.

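| Field | Type | Description |
| --- | --- | --- |
| anonymizeChar | string | Character to use for anonymization |
| model | string | GLiNER model to use for anonymization |
| profile | string | Model profile to use for anonymization. Default: "glinerMergedLarge" |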

Model profiles

| Profile key | Model | Best for |
| --- | --- | --- |
| glinerSmall | urchade/gliner_small-v2.1 | General English PII, fastest |
| glinerMedium | urchade/gliner_medium-v2.1 | General English PII, balanced |
| glinerLarge | urchade/gliner_large-v2.1 | General English PII, highest accuracy |
| glinerPIILarge | knowledgator/gliner-pii-large-v1.0 | High-accuracy English PII |
| glinerMergedLarge (default) | xomad/gliner-model-merge-large-v1.0 | Combined from multiple datasets, broad coverage |
| glinerMulti | urchade/gliner_multi | Multilingual text |
| glinerMultiPII | urchade/gliner_multi_pii-v1 | Multilingual PII |
| gretelSmall | gretelai/gretel-gliner-bi-small-v1.0 | Business-oriented NER, compact |
| gretelLarge | gretelai/gretel-gliner-bi-large-v1.0 | Business-oriented NER, large scale |
| glinerKo | taeminlee/gliner_ko | Korean |
| glinerIt | DeepMount00/GLiNER_PII_ITA | Italian |
| glinerAr | NAMAA-Space/gliner_arabic-v2.1 | Arabic |
| glinerCommunitySmall | gliner-community/gliner_small-v2.5 | Community general, compact |
| glinerCommunityMedium | gliner-community/gliner_medium-v2.5 | Community general, balanced |
| glinerCommunityLarge | gliner-community/gliner_large-v2.5 | Community general, largest |
| glinerBiomedSmall | Ihor/gliner-biomed-small-v1.0 | Biomedical and clinical text, compact |
| glinerBiomedLarge | Ihor/gliner-biomed-large-v1.0 | Biomedical and clinical text, high accuracy |
| custom | user-supplied | Any Hugging Face GLiNER model name |

Two different defaults apply. A newly added node starts with the glinerSmall profile (as set in preconfig.default), whereas the UI default of the anonymize.profile field is glinerMergedLarge.
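The relationship between the profile and model fields can be pictured as a simple lookup. The sketch below is illustrative only: resolve_model is a hypothetical helper, the mapping lists just a few of the documented profiles, and the node's real resolution logic is not shown in this document.

```python
from typing import Optional

# A subset of the documented profile keys and their Hugging Face model names.
PROFILE_MODELS = {
    "glinerSmall": "urchade/gliner_small-v2.1",
    "glinerMedium": "urchade/gliner_medium-v2.1",
    "glinerLarge": "urchade/gliner_large-v2.1",
    "glinerMergedLarge": "xomad/gliner-model-merge-large-v1.0",
}


def resolve_model(profile: str, model: Optional[str] = None) -> str:
    """Return the model name for a profile (hypothetical helper)."""
    if profile == "custom":
        # Only the custom profile reads the free-form model name field.
        if not model:
            raise ValueError("the custom profile requires a model name")
        return model
    try:
        return PROFILE_MODELS[profile]
    except KeyError:
        raise ValueError(f"unknown profile: {profile!r}") from None
```

For example, resolve_model("custom", "my-org/my-gliner") returns the user-supplied name unchanged, where "my-org/my-gliner" is a placeholder rather than a real model.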


Entity labels

What gets detected and masked depends on whether an upstream classifier node feeds this node.

Standalone (no upstream classifier)

The node runs GLiNER with a built-in set of 15 PII labels:

person, name, email, phone number, address, social security number, credit card number, date of birth, organization, company, location, ip address, bank account, passport number, driver license

With an upstream classifier

When classification data arrives before the object closes, the node:

  1. Masks the exact character spans (offset, length) reported in the classification textMatches.
  2. Resolves classification rule idRef values to English names via the Nucleuz rule pack (nucleuz/rulePack.dat under the engine path).
  3. Extracts keyword <Term> entries from the classification rules as additional GLiNER labels.
  4. Runs GLiNER with the combined label set and merges the results with the spans from step 1.

If nucleuz/rulePack.dat is not present, rule-name resolution silently produces no results and the node falls back to GLiNER-only mode using the labels extracted from the classification rules.
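Steps 1 to 3 can be sketched as two small helpers. The data shapes are assumptions: each text match is taken to be a dict with offset and length keys, labels are normalized to lowercase for de-duplication, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive


def spans_from_matches(text_matches: Iterable[Dict[str, int]]) -> List[Span]:
    """Convert classifier (offset, length) matches to (start, end) spans."""
    return [(m["offset"], m["offset"] + m["length"]) for m in text_matches]


def combine_labels(rule_names: Iterable[str], terms: Iterable[str]) -> List[str]:
    """Build one de-duplicated label list from rule names and keyword terms."""
    seen = set()
    labels: List[str] = []
    for raw in list(rule_names) + list(terms):
        label = raw.strip().lower()  # assumed normalization
        if label and label not in seen:
            seen.add(label)
            labels.append(label)
    return labels
```

When rule-name resolution yields nothing, rule_names is empty and combine_labels returns only the keyword terms, which mirrors the fallback described above. The spans from spans_from_matches would then be merged with the spans GLiNER reports before masking.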


Running the tests

The node ships automated test cases in services.json. The standard test runs against the glinerSmall profile; the full test exercises every model profile.

# Standard test (glinerSmall profile)
builder nodes:test

# Full test across all model profiles
builder nodes:test-full

Schema

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| anonymize.model | string | Model name. GLiNER model to use for anonymization | |
| anonymize.profile | string | Model. Anonymize model | "glinerMergedLarge" |
| anonymizeChar | string | Character. Character to use for anonymization | |

Dependencies

  • gliner