Named Entity Recognition
A RocketRide text-processing node that identifies and extracts named entities from text and documents using HuggingFace transformer models.
What it does
Runs a HuggingFace token-classification (NER) pipeline over everything flowing through the node and attaches the recognized entities to document metadata for downstream filtering, search, and analysis.
The node uses the transformers pipeline through the RocketRide model server (ai.common.models.transformers). The pipeline uses the model server automatically when it is available and falls back to local execution otherwise, so the node has no local Python dependencies of its own. The node is GPU-capable and registers as a filter with class type text.
The model is loaded once per pipeline run (in the global context) and shared across all instances. Entities below the configured confidence threshold (default 0.9) are discarded. If entity extraction fails on a piece of text, the error is logged and an empty entity list is returned; the pipeline keeps running and the original content still passes through.
Each extracted entity carries: entity_group (type such as PER, ORG, LOC), word (the entity text), score (confidence), and start / end (character offsets).
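The entity shape and the threshold/fallback behavior described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the names Entity, filter_entities, and safe_extract are hypothetical, and treating a score equal to the threshold as passing is an assumption.

```python
import logging
from typing import Callable, TypedDict

logger = logging.getLogger("ner_sketch")  # hypothetical logger name


class Entity(TypedDict):
    entity_group: str  # entity type, e.g. "PER", "ORG", "LOC"
    word: str          # the entity text
    score: float       # model confidence
    start: int         # character offset where the entity begins
    end: int           # character offset where the entity ends


def filter_entities(entities: list[Entity], min_confidence: float = 0.9) -> list[Entity]:
    """Discard entities scoring below min_confidence (assumed inclusive bound)."""
    return [e for e in entities if e["score"] >= min_confidence]


def safe_extract(
    extract: Callable[[str], list[Entity]],
    text: str,
    min_confidence: float = 0.9,
) -> list[Entity]:
    """Run an extractor; on failure, log the error and return an empty list."""
    try:
        return filter_entities(extract(text), min_confidence)
    except Exception:
        logger.exception("Entity extraction failed")
        return []


# Hand-written sample for "Ada Lovelace lived in London" (scores are made up).
sample: list[Entity] = [
    {"entity_group": "PER", "word": "Ada Lovelace", "score": 0.998, "start": 0, "end": 12},
    {"entity_group": "LOC", "word": "London", "score": 0.71, "start": 22, "end": 28},
]
kept = filter_entities(sample)  # only the PER entity clears the 0.9 default
```

Because safe_extract returns an empty list instead of raising, a caller can always forward the original text regardless of whether extraction succeeded.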
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
| text | text | Extract entities, pass the original text through unchanged |
| documents | documents | Extract entities from each document's content and enrich document metadata |
On the documents lane, when Store in metadata is on (the default), each document copy gains:
- entities_<type>: one key per entity type, lowercased (e.g. entities_per, entities_org, entities_loc), holding a deduplicated, sorted list of entity texts
- entities_count: total number of entities found in the document
The original documents are never mutated; enriched copies are written downstream.
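A rough sketch of this enrichment step, under stated assumptions: documents are modeled here as plain dicts with a metadata key, the function name enrich_metadata is hypothetical, and entities_count is taken to count every extracted entity before deduplication. The node's real document type and counting rule may differ.

```python
import copy
from collections import defaultdict


def enrich_metadata(document: dict, entities: list[dict]) -> dict:
    """Return an enriched copy of document; the original is left untouched."""
    enriched = copy.deepcopy(document)
    metadata = enriched.setdefault("metadata", {})

    # Group entity texts by lowercased type; sets remove duplicates.
    grouped: dict[str, set[str]] = defaultdict(set)
    for entity in entities:
        grouped[entity["entity_group"].lower()].add(entity["word"])

    for entity_type, words in grouped.items():
        metadata[f"entities_{entity_type}"] = sorted(words)

    # Assumption: the count covers all entities, including repeats.
    metadata["entities_count"] = len(entities)
    return enriched


doc = {"content": "Ada met Bob at Acme. Ada left.", "metadata": {"source": "a.txt"}}
found = [
    {"entity_group": "PER", "word": "Bob"},
    {"entity_group": "PER", "word": "Ada"},
    {"entity_group": "ORG", "word": "Acme"},
    {"entity_group": "PER", "word": "Ada"},
]
enriched = enrich_metadata(doc, found)
```

Deep-copying before writing is one simple way to honor the no-mutation guarantee; the original doc still carries only its source key afterwards.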
Fields
The node is configured by picking a model profile (see below). The custom profile additionally exposes the model name field.
| Field | Type | Description |
|---|---|---|
| model | string | HuggingFace model to use for NER |
| aggregation_strategy | string | Default "simple". How to combine word pieces into entities |
| min_confidence | number | Default 0.9. Minimum confidence score (0.0-1.0) for entity detection |
| store_in_metadata | boolean | Default true. Add extracted entities to document metadata fields |
| profile | string | Default "bertLarge". NER model configuration |
If no model is configured, the recognizer falls back to dbmdz/bert-large-cased-finetuned-conll03-english.
Profiles
The default profile is bertLarge.
| Profile key | Title | Model | Notes |
|---|---|---|---|
| bertLarge | BERT Large (English) - high accuracy for English text | dbmdz/bert-large-cased-finetuned-conll03-english | Default |
| bertBase | BERT Base (English) - balanced performance | dslim/bert-base-NER | |
| distilbert | DistilBERT (English) - fast and lightweight | Davlan/distilbert-base-multilingual-cased-ner-hrl | Multilingual model despite the title |
| xlmRoberta | XLM-RoBERTa (Multilingual) - supports 100+ languages | Davlan/xlm-roberta-base-ner-hrl | |
| deberta | DeBERTa v3 (English) - state-of-the-art accuracy | dslim/distilbert-NER | Currently maps to DistilBERT NER |
| biomedical | BioBERT (Biomedical) - medical/scientific entities | dmis-lab/biobert-base-cased-v1.1 | min_confidence defaults to 0.85 |
| custom | Custom model | (user-specified) | Any compatible HuggingFace NER model |
All profiles use aggregation_strategy: simple and min_confidence: 0.9 unless noted above; both can be overridden in the node config.
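One way to picture how profile defaults, node-config overrides, and the fallback model combine is the sketch below. It is illustrative only: the function resolve_config, the dict layout, and the merge order (base defaults, then profile, then node config) are assumptions, and only a subset of profiles is listed.

```python
# Fallback model named in this document.
DEFAULT_MODEL = "dbmdz/bert-large-cased-finetuned-conll03-english"

# Defaults shared by all profiles unless a profile overrides them.
BASE_DEFAULTS = {
    "aggregation_strategy": "simple",
    "min_confidence": 0.9,
    "store_in_metadata": True,
}

# Subset of the profile table, for illustration.
PROFILES = {
    "bertLarge": {"model": DEFAULT_MODEL},
    "bertBase": {"model": "dslim/bert-base-NER"},
    "biomedical": {"model": "dmis-lab/biobert-base-cased-v1.1", "min_confidence": 0.85},
    "custom": {},  # model comes from the node config
}


def resolve_config(node_config: dict) -> dict:
    """Merge base defaults, profile settings, and explicit node overrides."""
    profile = node_config.get("profile", "bertLarge")
    overrides = {k: v for k, v in node_config.items() if k != "profile"}
    resolved = {**BASE_DEFAULTS, **PROFILES.get(profile, {}), **overrides}
    if not resolved.get("model"):
        # No model configured anywhere: use the documented fallback.
        resolved["model"] = DEFAULT_MODEL
    return resolved


default_cfg = resolve_config({})
bio_cfg = resolve_config({"profile": "biomedical"})
tuned_cfg = resolve_config({"profile": "bertBase", "min_confidence": 0.5})
custom_cfg = resolve_config({"profile": "custom"})
```

Applying node-config values last means an explicit min_confidence or aggregation_strategy always wins over the profile's default, which matches the statement that both can be overridden.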
Schema
| Field | Type | Description | Default |
|---|---|---|---|
| ner.aggregation_strategy | string | Entity aggregation strategy. How to combine word pieces into entities | "simple" |
| ner.min_confidence | number | Minimum confidence threshold. Minimum confidence score (0.0-1.0) for entity detection | 0.9 |
| ner.model | string | Model name. HuggingFace model to use for NER | |
| ner.profile | string | Model. NER model configuration | "bertLarge" |
| ner.store_in_metadata | boolean | Store entities in document metadata. Add extracted entities to document metadata fields | true |