Named Entity Recognition

A RocketRide text-processing node that identifies and extracts named entities from text and documents using HuggingFace transformer models.

What it does

Runs a HuggingFace token-classification (NER) pipeline over everything flowing through the node and attaches the recognized entities to document metadata for downstream filtering, search, and analysis.

The node runs the transformers pipeline through the RocketRide model server (ai.common.models.transformers). The pipeline uses the model server when it is available and falls back to local execution otherwise, so the node has no local Python dependencies of its own. The node is GPU-capable and registers as a filter with class type text.

The model is loaded once per pipeline run (in the global context) and shared across all instances. Entities below the configured confidence threshold (default 0.9) are discarded. If entity extraction fails on a piece of text, the error is logged and an empty entity list is returned; the pipeline keeps running and the original content still passes through.

Each extracted entity carries: entity_group (type such as PER, ORG, LOC), word (the entity text), score (confidence), and start / end (character offsets).
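The filtering and failure behavior described above can be sketched in a few lines. This is only an illustration, not the node's actual code: `extract_entities` and the `recognize` callable are hypothetical names, and the entity dictionaries simply follow the field list given here (entity_group, word, score, start, end).

```python
import logging
from typing import Callable

logger = logging.getLogger("ner_sketch")  # hypothetical logger name


def extract_entities(
    recognize: Callable[[str], list[dict]],
    text: str,
    min_confidence: float = 0.9,
) -> list[dict]:
    """Run `recognize` on `text` and drop entities below `min_confidence`.

    `recognize` stands in for whatever callable produces entity dicts with
    the keys entity_group, word, score, start, and end.
    """
    try:
        raw = recognize(text)
    except Exception:
        # Documented behavior: log the error and return an empty list so the
        # pipeline keeps running and the original content still passes through.
        logger.exception("entity extraction failed")
        return []
    # Entities scoring below the threshold are discarded.
    return [entity for entity in raw if entity["score"] >= min_confidence]


def fake_recognizer(text: str) -> list[dict]:
    """Stand-in recognizer returning hand-written entities for illustration."""
    return [
        {"entity_group": "PER", "word": "Ada", "score": 0.97, "start": 0, "end": 3},
        {"entity_group": "ORG", "word": "Acme", "score": 0.42, "start": 13, "end": 17},
    ]


kept = extract_entities(fake_recognizer, "Ada works at Acme")
```

With the default threshold, only the first hand-written entity survives; lowering `min_confidence` to 0.4 would keep both.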

Configuration

Lanes

Lane in	Lane out	Description
`text`	`text`	Extract entities, pass the original text through unchanged
`documents`	`documents`	Extract entities from each document's content and enrich document metadata

On the documents lane, when Store in metadata is on (the default), each document copy gains:

entities_<type>: one key per entity type, lowercased (e.g. entities_per, entities_org, entities_loc), holding a deduplicated, sorted list of entity texts
entities_count: total number of entities found in the document

The original documents are never mutated; enriched copies are written downstream.
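As a rough sketch of the enrichment step, the function below builds the metadata keys described above on a copy of the document. The document shape (a dict with a `metadata` dict) and the name `enrich_document` are assumptions made for illustration, as is counting every occurrence toward entities_count rather than only unique texts.

```python
import copy
from collections import defaultdict


def enrich_document(doc: dict, entities: list[dict]) -> dict:
    """Return an enriched copy of `doc`; the input document is not mutated."""
    enriched = copy.deepcopy(doc)
    metadata = enriched.setdefault("metadata", {})

    # Group entity texts by lowercased type; sets handle deduplication.
    by_type: dict[str, set] = defaultdict(set)
    for entity in entities:
        by_type[entity["entity_group"].lower()].add(entity["word"])

    for entity_type, words in by_type.items():
        metadata[f"entities_{entity_type}"] = sorted(words)

    # Assumption: the count covers every extracted entity, duplicates included.
    metadata["entities_count"] = len(entities)
    return enriched


original = {"content": "Bob and Alice met Bob at Acme.", "metadata": {"source": "demo"}}
found = [
    {"entity_group": "PER", "word": "Bob", "score": 0.99, "start": 0, "end": 3},
    {"entity_group": "PER", "word": "Alice", "score": 0.98, "start": 8, "end": 13},
    {"entity_group": "PER", "word": "Bob", "score": 0.99, "start": 18, "end": 21},
    {"entity_group": "ORG", "word": "Acme", "score": 0.95, "start": 25, "end": 29},
]
enriched_doc = enrich_document(original, found)
```

Working on a deep copy keeps the sketch consistent with the rule that originals are never mutated.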

Fields

The node is configured by picking a model profile (see below). The custom profile additionally exposes the model name field.

Field	Type	Description
`model`	string	HuggingFace model to use for NER
`aggregation_strategy`	string	Default "simple". How to combine word pieces into entities
`min_confidence`	number	Default 0.9. Minimum confidence score (0.0-1.0) for entity detection
`store_in_metadata`	boolean	Default true. Add extracted entities to document metadata fields
`profile`	string	Default "bertLarge". NER model configuration

If no model is configured, the recognizer falls back to dbmdz/bert-large-cased-finetuned-conll03-english.

Profiles

The default profile is bertLarge.

Profile key	Title	Model	Notes
`bertLarge`	BERT Large (English) - high accuracy for English text	`dbmdz/bert-large-cased-finetuned-conll03-english`	Default
`bertBase`	BERT Base (English) - balanced performance	`dslim/bert-base-NER`
`distilbert`	DistilBERT (English) - fast and lightweight	`Davlan/distilbert-base-multilingual-cased-ner-hrl`	Multilingual model despite the title
`xlmRoberta`	XLM-RoBERTa (Multilingual) - supports 100+ languages	`Davlan/xlm-roberta-base-ner-hrl`
`deberta`	DeBERTa v3 (English) - state-of-the-art accuracy	`dslim/distilbert-NER`	Currently maps to DistilBERT NER
`biomedical`	BioBERT (Biomedical) - medical/scientific entities	`dmis-lab/biobert-base-cased-v1.1`	`min_confidence` defaults to `0.85`
`custom`	Custom model	(user-specified)	Any compatible HuggingFace NER model

All profiles use aggregation_strategy: simple and min_confidence: 0.9 unless noted above; both can be overridden in the node config.
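One way to picture how profile defaults and overrides combine is the sketch below. It is a simplified model of the tables on this page, not the node's real resolution logic: `resolve_config`, `PROFILES`, and `DEFAULTS` are hypothetical names, and the sketch lets any explicit field override the profile, including `model` on non-custom profiles.

```python
# Profile table transcribed from this page; only non-default values are listed.
PROFILES = {
    "bertLarge": {"model": "dbmdz/bert-large-cased-finetuned-conll03-english"},
    "bertBase": {"model": "dslim/bert-base-NER"},
    "distilbert": {"model": "Davlan/distilbert-base-multilingual-cased-ner-hrl"},
    "xlmRoberta": {"model": "Davlan/xlm-roberta-base-ner-hrl"},
    "deberta": {"model": "dslim/distilbert-NER"},
    "biomedical": {"model": "dmis-lab/biobert-base-cased-v1.1", "min_confidence": 0.85},
    "custom": {},  # the model comes from the user-supplied `model` field
}

DEFAULTS = {
    "aggregation_strategy": "simple",
    "min_confidence": 0.9,
    "store_in_metadata": True,
}

FALLBACK_MODEL = "dbmdz/bert-large-cased-finetuned-conll03-english"


def resolve_config(node_config: dict) -> dict:
    """Merge defaults, profile values, and explicit overrides, in that order."""
    profile = node_config.get("profile", "bertLarge")
    resolved = {**DEFAULTS, **PROFILES[profile]}
    # Explicit fields in the node config win over profile values.
    resolved.update({k: v for k, v in node_config.items() if k != "profile"})
    # If no model ended up configured, use the documented fallback.
    if not resolved.get("model"):
        resolved["model"] = FALLBACK_MODEL
    return resolved


biomedical = resolve_config({"profile": "biomedical"})
stricter = resolve_config({"profile": "biomedical", "min_confidence": 0.95})
```

Here `biomedical` picks up the profile's lower threshold, while `stricter` shows an explicit `min_confidence` taking precedence.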

Schema

Field	Type	Description	Default
`ner.aggregation_strategy`	`string`	Entity aggregation strategy. How to combine word pieces into entities	`"simple"`
`ner.min_confidence`	`number`	Minimum confidence threshold. Minimum confidence score (0.0-1.0) for entity detection	`0.9`
`ner.model`	`string`	Model name. HuggingFace model to use for NER
`ner.profile`	`string`	Model. NER model configuration	`"bertLarge"`
`ner.store_in_metadata`	`boolean`	Store entities in document metadata. Add extracted entities to document metadata fields	`true`
