
OCR


A RocketRide filter node that extracts machine-readable text and tables from images using optical character recognition.

What it does

Turns visual content (scanned documents, screenshots, photos) into structured text for downstream analysis. The node is GPU-capable and registered as a filter in the pipeline.

Four OCR engines are supported via the ai.common.models model-server wrappers: EasyOCR (multi-language, the default), DocTR (document-focused, language-agnostic), Surya (multi-language, 90+ languages), and TrOCR (transformer-based, Microsoft model). The wrappers auto-detect whether to call a remote model server or fall back to local inference. An unknown engine name falls back to EasyOCR silently.
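The silent fallback amounts to a lookup with a default. Here is a minimal sketch, assuming engine names are matched case-insensitively against four lowercase keys. resolve_engine and KNOWN_ENGINES are hypothetical names, not the node's actual code, and the remote-versus-local detection done by the model-server wrappers is not shown.

```python
# Hypothetical engine registry; the node's real lookup may be structured differently.
KNOWN_ENGINES = {"easyocr", "doctr", "surya", "trocr"}
DEFAULT_ENGINE = "easyocr"


def resolve_engine(name):
    """Return a supported engine key, silently falling back to EasyOCR for unknown names."""
    # Assumption: names are normalised by trimming whitespace and lowercasing.
    key = (name or "").strip().lower()
    return key if key in KNOWN_ENGINES else DEFAULT_ENGINE
```

With this sketch, resolve_engine("tesseract") returns "easyocr" without raising, which mirrors the documented behaviour.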

Table extraction uses img2table for OpenCV-based table structure detection, with OCR inference routed through the same model-server adapter (ModelServerOCR). Detected tables are emitted on the table lane as Markdown. Both img2table v1 and v2 plug-in APIs are supported: v2 reorganised the API and replaced the two-step content/to_ocr_dataframe contract with a single of() method returning OCRData; the node detects the installed version at import time via _IMG2TABLE_V2.
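Emitting a detected table as Markdown means rendering its cell grid as a pipe table. Here is a minimal sketch, assuming the table is already available as a list of rows of cell values (None for empty cells). rows_to_markdown is a hypothetical helper, and the node's actual output may format cells differently.

```python
def rows_to_markdown(rows):
    """Render a grid of cells as a Markdown pipe table; the first row is the header."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def cell(value):
        # Empty cells become blank; escape pipes and flatten newlines so the table stays intact.
        text = "" if value is None else str(value)
        return text.replace("|", "\\|").replace("\n", " ").strip()

    def line(row):
        # Pad short rows so every line has the same number of columns.
        padded = list(row) + [None] * (width - len(row))
        return "| " + " | ".join(cell(v) for v in padded) + " |"

    header, *body = rows
    separator = "| " + " | ".join("---" for _ in range(width)) + " |"
    return "\n".join([line(header), separator] + [line(r) for r in body])
```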

Animated GIFs are handled frame by frame: each frame is OCR'd individually and the per-frame texts are joined with newlines before being written to the text lane. OCR reads are serialised with an internal threading lock so concurrent instances share one engine safely.
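The frame-by-frame handling and the shared lock can be sketched with Pillow's ImageSequence iterator. This is a minimal sketch, not the node's implementation: ocr_fn is a hypothetical stand-in for the selected engine, and the module-level lock models the documented "one engine, serialised reads" behaviour.

```python
import threading

from PIL import ImageSequence

# One lock shared by every caller, so only one OCR read runs at a time.
_OCR_LOCK = threading.Lock()


def ocr_image(image, ocr_fn):
    """OCR every frame of a (possibly animated) PIL image and join the texts with newlines.

    ocr_fn is a hypothetical callable standing in for the OCR engine:
    it takes an RGB PIL image and returns the recognised text.
    """
    texts = []
    # A static image yields exactly one frame, so the same path covers both cases.
    for frame in ImageSequence.Iterator(image):
        rgb = frame.convert("RGB")  # convert() returns an independent copy of the frame
        with _OCR_LOCK:
            texts.append(ocr_fn(rgb))
    return "\n".join(texts)
```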


Configuration

Lanes

| Lane in | Lane out | Description |
| --- | --- | --- |
| documents | text | Extract text from image documents |
| image | text | Extract text from a raw image |
| image | table | Extract tables from a raw image |

On the documents lane, every incoming document must be of type Image (the node raises a ValueError otherwise). Each image document is OCR'd and re-emitted as a Document-type copy whose page_content is the extracted text. The original image documents are not forwarded: if a downstream node needs the images themselves, connect it to the source node directly.
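The documents-lane contract (reject non-Image input, emit a Document-type copy carrying the text, do not forward the originals) can be sketched as follows. The Document dataclass and ocr_documents function are hypothetical stand-ins for the pipeline's own document type and node method, and ocr_fn stands in for the engine.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Document:
    # Hypothetical stand-in for the pipeline's document object.
    type: str             # e.g. "Image" or "Document"
    page_content: object  # image payload on input, extracted text on output


def ocr_documents(docs, ocr_fn):
    """Return Document-type copies of Image documents with page_content set to the OCR text."""
    results = []
    for doc in docs:
        if doc.type != "Image":
            raise ValueError(f"OCR node expects Image documents, got {doc.type!r}")
        text = ocr_fn(doc.page_content)
        # replace() builds a new object, so the incoming image document is left untouched;
        # only the copies are returned, mirroring the node not forwarding the originals.
        results.append(replace(doc, type="Document", page_content=text))
    return results
```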

Fields

| Field | Type | Description |
| --- | --- | --- |
| engine | string | Default "easyocr". Select the OCR engine for text extraction. EasyOCR supports many languages with script families. DocTR is language-agnostic and good for documents. Surya supports multi-language. TrOCR uses transformer models. |
| script_family | string | Default "latin". Select the script family for OCR. This determines which languages are loaded for text recognition. Only applies to EasyOCR engine. |
| det_arch | string | Default "db_resnet50". Choose the architecture used for table text detection. Documentation: https://mindee.github.io/doctr/latest/using_doctr/using_models.html |
| reco_arch | string | Default "crnn_vgg16_bn". Choose the architecture used for table text recognition. Documentation: https://mindee.github.io/doctr/latest/using_doctr/using_models.html |
| table_engine | string | Default "doctr". Select the OCR engine used for table text extraction. DocTR is optimized for document tables. EasyOCR and Surya are general-purpose alternatives. |
| profile | string | Default "latin". Select a preconfigured OCR profile optimized for different languages and use cases. |

The main settings panel exposes ocr.profile, ocr.engine, ocr.script_family, and ocr.table_engine. The DocTR architecture fields (ocr.det_arch, ocr.reco_arch) accept the architectures listed in the DocTR model docs.

Detection architectures: linknet_resnet18, linknet_resnet34, linknet_resnet50, db_resnet50, db_mobilenet_v3_large, fast_tiny, fast_small, fast_base.

Recognition architectures: crnn_vgg16_bn, crnn_mobilenet_v3_small, crnn_mobilenet_v3_large, sar_resnet31, master, vitstr_small, vitstr_base, parseq.
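A configured pair can be checked against these lists before a DocTR predictor is built. Here is a minimal sketch: check_doctr_archs is a hypothetical helper (the node may simply let DocTR reject unknown names), and the sets are copied from the lists above, so newer DocTR releases may accept additional names. The validated values are what DocTR's doctr.models.ocr_predictor(det_arch=..., reco_arch=...) factory takes.

```python
# Architecture names copied from the lists above; later DocTR releases may add more.
DET_ARCHS = frozenset({
    "linknet_resnet18", "linknet_resnet34", "linknet_resnet50", "db_resnet50",
    "db_mobilenet_v3_large", "fast_tiny", "fast_small", "fast_base",
})
RECO_ARCHS = frozenset({
    "crnn_vgg16_bn", "crnn_mobilenet_v3_small", "crnn_mobilenet_v3_large",
    "sar_resnet31", "master", "vitstr_small", "vitstr_base", "parseq",
})


def check_doctr_archs(det_arch="db_resnet50", reco_arch="crnn_vgg16_bn"):
    """Validate an ocr.det_arch / ocr.reco_arch pair; defaults match the schema defaults."""
    if det_arch not in DET_ARCHS:
        raise ValueError(f"unsupported detection architecture: {det_arch!r}")
    if reco_arch not in RECO_ARCHS:
        raise ValueError(f"unsupported recognition architecture: {reco_arch!r}")
    return det_arch, reco_arch
```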

The TrOCR engine additionally reads an optional trocr_model config value selecting the Hugging Face model variant (default: microsoft/trocr-base-printed).


Profiles

Profiles are preconfigured combinations of engine, script family, and table engine. Selecting a profile sets all three at once. The default profile is latin.

| Profile key | Title | Engine | Script family | Table engine |
| --- | --- | --- | --- | --- |
| latin | Latin (English) | EasyOCR | latin | DocTR |
| latin-extended | Latin Extended (European) | EasyOCR | latin-extended | DocTR |
| cyrillic | Cyrillic (Russian, etc.) | EasyOCR | cyrillic | DocTR |
| arabic | Arabic/Persian/Urdu | EasyOCR | arabic | DocTR |
| devanagari | Devanagari (Hindi, etc.) | EasyOCR | devanagari | DocTR |
| chinese-simplified | Chinese (Simplified) | EasyOCR | chinese-simplified | DocTR |
| chinese-traditional | Chinese (Traditional) | EasyOCR | chinese-traditional | DocTR |
| japanese | Japanese | EasyOCR | japanese | DocTR |
| korean | Korean | EasyOCR | korean | DocTR |
| doctr | DocTR (Language-agnostic) | DocTR | latin (unused) | DocTR |
| surya | Surya (Multi-language) | Surya | latin (unused) | Surya |
| trocr | TrOCR (Transformer) | TrOCR | latin (unused) | DocTR |
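Because a profile is just a named triple, expanding it is a dictionary lookup. Here is a minimal sketch that transcribes the table above. PROFILES and apply_profile are hypothetical names, the lowercase engine keys ("surya", "trocr") are assumptions, and since the behaviour for unknown profile keys is not documented, the sketch simply raises KeyError.

```python
# Profile table from above: key -> (engine, script_family, table_engine).
PROFILES = {
    "latin": ("easyocr", "latin", "doctr"),
    "latin-extended": ("easyocr", "latin-extended", "doctr"),
    "cyrillic": ("easyocr", "cyrillic", "doctr"),
    "arabic": ("easyocr", "arabic", "doctr"),
    "devanagari": ("easyocr", "devanagari", "doctr"),
    "chinese-simplified": ("easyocr", "chinese-simplified", "doctr"),
    "chinese-traditional": ("easyocr", "chinese-traditional", "doctr"),
    "japanese": ("easyocr", "japanese", "doctr"),
    "korean": ("easyocr", "korean", "doctr"),
    "doctr": ("doctr", "latin", "doctr"),   # script family unused by DocTR
    "surya": ("surya", "latin", "surya"),   # script family unused by Surya
    "trocr": ("trocr", "latin", "doctr"),   # script family unused by TrOCR
}


def apply_profile(profile="latin"):
    """Expand a profile key into the three settings it controls (KeyError if unknown)."""
    engine, script_family, table_engine = PROFILES[profile]
    return {"engine": engine, "script_family": script_family, "table_engine": table_engine}
```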

Script families

Script families map to EasyOCR language code lists. Every family except plain latin also loads English as a fallback. The script_family setting has no effect when the selected engine is DocTR, Surya, or TrOCR.

| Family | Languages loaded |
| --- | --- |
| latin | en (English only, for reliability) |
| latin-extended | en plus 29 Latin-script languages: fr, de, es, it, pt, nl, pl, ro, cs, sk, hu, hr, sl, sq, lt, lv, da, no, sv, id, ms, tl, vi, tr, az, uz, sw, la, oc |
| cyrillic | ru, uk, be, bg, rs_cyrillic, mn, en (Macedonian is not supported by EasyOCR; Serbian maps to rs_cyrillic) |
| arabic | ar, fa, ur, ug, en |
| devanagari | hi, mr, ne, en |
| bengali | bn, as, en |
| chinese-simplified | ch_sim, en |
| chinese-traditional | ch_tra, en |
| japanese | ja, en |
| korean | ko, en |
| thai | th, en |
| tamil | ta, en |
| telugu | te, en |

The bengali, thai, tamil, and telugu families are selectable via ocr.script_family but have no preconfigured profile. The Japanese EasyOCR models may misread English text; the full test suite uses a relaxed assertion (contains: "quick" instead of "quick brown fox") for that profile.
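The family-to-language mapping and the "EasyOCR only" rule can be expressed as a small lookup whose result would be passed to easyocr.Reader(lang_list). Here is a minimal sketch transcribing the table above. SCRIPT_FAMILIES and easyocr_languages are hypothetical names, and because behaviour for an unknown family is not documented, the sketch raises KeyError.

```python
# Script family -> EasyOCR language codes, copied from the table above.
SCRIPT_FAMILIES = {
    "latin": ["en"],
    "latin-extended": [
        "en", "fr", "de", "es", "it", "pt", "nl", "pl", "ro", "cs", "sk", "hu", "hr", "sl", "sq",
        "lt", "lv", "da", "no", "sv", "id", "ms", "tl", "vi", "tr", "az", "uz", "sw", "la", "oc",
    ],
    "cyrillic": ["ru", "uk", "be", "bg", "rs_cyrillic", "mn", "en"],
    "arabic": ["ar", "fa", "ur", "ug", "en"],
    "devanagari": ["hi", "mr", "ne", "en"],
    "bengali": ["bn", "as", "en"],
    "chinese-simplified": ["ch_sim", "en"],
    "chinese-traditional": ["ch_tra", "en"],
    "japanese": ["ja", "en"],
    "korean": ["ko", "en"],
    "thai": ["th", "en"],
    "tamil": ["ta", "en"],
    "telugu": ["te", "en"],
}


def easyocr_languages(engine, script_family):
    """Return the EasyOCR language list to load, or None when the engine ignores script_family."""
    if engine != "easyocr":
        return None  # DocTR, Surya and TrOCR do not use script families
    return list(SCRIPT_FAMILIES[script_family])  # copy, so callers cannot mutate the table
```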


OpenCV compatibility

All four engines import the same cv2 module, but their packages declare different OpenCV distributions and version pins. The node installs a single unified build, opencv-contrib-python==4.13.0.92, via ai.common.opencv, which also uninstalls the competing variants (opencv-python, opencv-python-headless, opencv-contrib-python-headless) so that only one cv2 is active at runtime.
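A quick way to see whether competing cv2 providers are present is to ask the installed package metadata. Here is a diagnostic sketch using the standard library's importlib.metadata. It is not how ai.common.opencv works internally, and installed_opencv and has_cv2_conflict are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version

# The four PyPI distributions that each ship a top-level cv2 package.
OPENCV_DISTS = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_opencv():
    """Map each installed OpenCV distribution to its version string."""
    found = {}
    for dist in OPENCV_DISTS:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            pass  # not installed
    return found


def has_cv2_conflict(found, expected=("opencv-contrib-python", "4.13.0.92")):
    """True unless exactly the expected distribution and version are installed."""
    return found != {expected[0]: expected[1]}
```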

Upstream pins (as of the versions currently used):

| Engine | PyPI package | Upstream OpenCV requirement | Matches project's 4.13.0.92? |
| --- | --- | --- | --- |
| EasyOCR | easyocr 1.7.2 | opencv-python-headless (unpinned) | Yes |
| DocTR | python-doctr 1.0.1 | opencv-python <5.0.0, >=4.5.0 | Yes |
| Surya | surya-ocr 0.17.1 | opencv-python-headless==4.11.0.86 | No (hard pin to 4.11.0.86) |
| TrOCR | craft-text-detector 0.4.3 (detector dependency) | opencv-python <4.5.4.62, >=3.4.8.29 | No (caps below 4.5.4.62) |
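The "Matches?" column follows from comparing 4.13.0.92 against each upstream specifier. Here is a simplified sketch that supports only numeric release segments and the six basic comparison operators. It is a stand-in for packaging.specifiers.SpecifierSet, which implements full PEP 440, and satisfies is a hypothetical helper.

```python
import operator

_OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
        "<=": operator.le, ">": operator.gt, "<": operator.lt}


def _key(v, width):
    # Pad with zeros so that "4.5" and "4.5.0" compare equal.
    parts = [int(p) for p in v.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(installed, spec):
    """Check a version against a comma-separated specifier such as '<5.0.0, >=4.5.0'."""
    for clause in spec.split(","):
        clause = clause.strip()
        if not clause:
            continue  # an empty spec means "unpinned"
        # Two-character operators are tried before their one-character prefixes.
        for op in ("==", "!=", ">=", "<=", ">", "<"):
            if clause.startswith(op):
                target = clause[len(op):].strip()
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        width = max(len(installed.split(".")), len(target.split(".")))
        if not _OPS[op](_key(installed, width), _key(target, width)):
            return False
    return True
```

Applied to the table, satisfies("4.13.0.92", "<5.0.0, >=4.5.0") holds, while the Surya pin "==4.11.0.86" and the craft-text-detector range "<4.5.4.62, >=3.4.8.29" both fail.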

Surya and craft-text-detector (the detector dependency used with TrOCR) pin OpenCV to versions that the project deliberately overrides. They still work because ai.common.opencv runs depends() at import time and, after the engines are installed, force-aligns all four OpenCV variants to 4.13.0.92. Always run "from ai.common.opencv import cv2" before importing an OCR engine; otherwise the wrong cv2 may be resolved.

For the same reason, IGlobal.py imports ai.common.opencv before img2table: img2table internally imports cv2 at load time, so the correct OpenCV package must already be active.


img2table version compatibility

img2table 2.0 (released 2026-05-10) reorganised the OCR plug-in API. The node supports both v1 and v2:

| Symbol / location | img2table v1 | img2table v2 |
| --- | --- | --- |
| OCRInstance base class | img2table.ocr.base | img2table.ocr._types |
| Result type returned by of() | OCRDataframe (img2table.ocr.data) | OCRData (img2table.ocr._types) |
| Plug-in contract | content() + to_ocr_dataframe() | single of() override |

The _IMG2TABLE_V2 flag is set at import time and gates each code path. external_contracts.py declares version-tagged import requirements so the check-externals CI framework can validate the correct symbols on whichever version is installed.


Upstream docs


Schema

| Field | Type | Title and description | Default |
| --- | --- | --- | --- |
| ocr.det_arch | string | Detection Architecture (DocTR). Choose the architecture used for table text detection. Documentation: https://mindee.github.io/doctr/latest/using_doctr/using_models.html | "db_resnet50" |
| ocr.engine | string | OCR Engine. Select the OCR engine for text extraction. EasyOCR supports many languages with script families. DocTR is language-agnostic and good for documents. Surya supports multi-language. TrOCR uses transformer models. | "easyocr" |
| ocr.profile | string | OCR Profile. Select a preconfigured OCR profile optimized for different languages and use cases. | "latin" |
| ocr.reco_arch | string | Recognition Architecture (DocTR). Choose the architecture used for table text recognition. Documentation: https://mindee.github.io/doctr/latest/using_doctr/using_models.html | "crnn_vgg16_bn" |
| ocr.script_family | string | Script Family. Select the script family for OCR. This determines which languages are loaded for text recognition. Only applies to EasyOCR engine. | "latin" |
| ocr.table_engine | string | Table OCR Engine. Select the OCR engine used for table text extraction. DocTR is optimized for document tables. EasyOCR and Surya are general-purpose alternatives. | "doctr" |

Dependencies

  • img2table
  • pillow
  • numpy