Reducto

A RocketRide data node that parses documents with the Reducto cloud API, extracting clean Markdown text and structured tables.

What it does

Sends each incoming document to the Reducto cloud API for parsing and emits results on two output lanes: full extracted text as Markdown and each detected table as a separate Markdown table block. It handles PDFs, images, scanned documents, and mixed-content files.

The node uses the reductoai Python SDK: each document is uploaded with reducto.upload() and parsed with reducto.parse.run(), where all parsing behavior is expressed through the enhance parameter. A fresh Reducto client is created for each document parse, so concurrent parses never share a client.

The node buffers the document byte stream from the incoming tag lane and parses it once the stream ends. Output is produced only for lanes that have a downstream listener. If parsing fails, the error is logged and the node produces an empty result for that document (empty text, no tables) rather than raising.
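The buffer-then-parse behavior and the fail-soft error handling can be pictured with a minimal sketch. This is not the node's actual code: the BufferedDocument class, its write() and close() methods, and the injected parse_fn callable are all hypothetical names chosen for illustration, and the real Reducto call is left out entirely.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("reducto_node_sketch")

# Hypothetical type: takes the full document bytes, returns (text, tables).
ParseFn = Callable[[bytes], Tuple[str, List[str]]]


class BufferedDocument:
    """Collects streamed chunks and parses them once the stream ends."""

    def __init__(self, parse_fn: ParseFn) -> None:
        self._parse_fn = parse_fn
        self._chunks: List[bytes] = []

    def write(self, chunk: bytes) -> None:
        # Called for every chunk of the incoming byte stream.
        self._chunks.append(chunk)

    def close(self) -> Tuple[str, List[str]]:
        # Called at end of stream: join the buffer and parse exactly once.
        data = b"".join(self._chunks)
        self._chunks.clear()
        try:
            return self._parse_fn(data)
        except Exception:
            # Log the failure and fall back to empty text and no tables.
            logger.exception("Reducto parse failed")
            return "", []
```

In this sketch a failing parse_fn never propagates its exception; the caller simply receives an empty string and an empty table list.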

The extracted text is assembled as Markdown from Reducto's block types: title blocks become # headings, section_header blocks become ## headings, list_item blocks become - bullets, and table blocks appear both inline in the text stream and as separate items on the table lane. figure blocks are included as plain content or, when AI summarization is enabled, prefixed with [DIAGRAM/IMAGE SUMMARY]:.
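The block-to-Markdown mapping described above can be sketched as follows. The block shape (a dict with "type" and "content" keys), the function names, the blank-line separator between blocks, and the single space after the figure prefix are assumptions made for illustration; the node's real data structures may differ.

```python
from typing import Dict, List, Tuple

FIGURE_PREFIX = "[DIAGRAM/IMAGE SUMMARY]:"


def render_block(block: Dict[str, str], summarize_figures: bool = False) -> str:
    """Render one block (hypothetical dict shape) as a Markdown fragment."""
    kind = block.get("type", "")
    content = block.get("content", "").strip()
    if kind == "title":
        return f"# {content}"
    if kind == "section_header":
        return f"## {content}"
    if kind == "list_item":
        return f"- {content}"
    if kind == "figure" and summarize_figures:
        return f"{FIGURE_PREFIX} {content}"
    # Tables, unsummarized figures, and other blocks pass through unchanged.
    return content


def assemble(
    blocks: List[Dict[str, str]], summarize_figures: bool = False
) -> Tuple[str, List[str]]:
    """Return (full Markdown text, list of tables for the table lane)."""
    parts: List[str] = []
    tables: List[str] = []
    for block in blocks:
        rendered = render_block(block, summarize_figures)
        if not rendered:
            continue
        parts.append(rendered)
        if block.get("type") == "table":
            # Tables appear inline in the text and separately as table items.
            tables.append(rendered)
    return "\n\n".join(parts), tables
```

Each table therefore shows up twice in this sketch: once inside the returned text and once in the separate list destined for the table lane.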


Configuration

Lanes

| Lane in | Lane out | Description |
| --- | --- | --- |
| data | text | Extracted text as Markdown |
| data | table | Extracted tables in Markdown table format |

Fields

| Field | Type | Description |
| --- | --- | --- |
| api_key | string | Your Reducto API key |
| parse_mode | boolean | Default false. Toggle to use the advanced parse mode and get access to the full set of options from the Reducto API. |
| Contains_Handwritten_Text | boolean | Default false. Enables Agentic OCR mode for better handwriting recognition and small text/table cell corrections. |
| Contains_Non_English_Text | boolean | Default false. Enables the Multilingual OCR system, which can parse non-Germanic languages and Unicode symbols. |
| Summarize_Text | boolean | Default false. Generates AI summaries for figures, diagrams, and images using vision-language models. |
| advanced_documentation | null | In advanced mode, you can use the full set of options from the Reducto API. Each set of options must contain only a Python dictionary, e.g., {'key': 'value', 'flag': True}. If no information is provided for a set of options, the default values are used. For more information on the available options, see the Reducto API documentation at https://docs.reducto.ai/parsing/default-configurations. That page also contains examples of how to format the options fields. (In Advanced mode, your configuration from Simple mode is ignored.) |
| options | string | Options for the Reducto API |
| advanced_options | string | Advanced options for the Reducto API |
| experimental_options | string | Experimental options for the Reducto API |

The default profile uses Simple mode (parse_mode: false).

Simple mode

When parse_mode is false, three optional toggles (all defaulting to false) control Reducto enhance options:

| Field | Default | Effect when enabled |
| --- | --- | --- |
| Contains_Handwritten_Text | false | Sets ocr_mode: "agentic". Agentic OCR for better handwriting recognition and small text/table cell corrections. |
| Contains_Non_English_Text | false | Sets ocr_system: "multilingual". Multilingual OCR for non-Germanic languages and Unicode symbols. |
| Summarize_Text | false | Sets summarize_figures: true. AI summaries for figures, diagrams, and images using vision-language models. |

Table summarization (summarize_tables) is always set to false in Simple mode because SDK 0.13.0 does not support it effectively.
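As a rough illustration of how the Simple mode toggles could translate into enhance options, consider the sketch below. The function name build_simple_enhance is hypothetical, and two details are assumptions rather than documented behavior: that the options form a flat dictionary, and that a disabled toggle simply leaves its key out so Reducto's default applies.

```python
from typing import Any, Dict


def build_simple_enhance(config: Dict[str, Any]) -> Dict[str, Any]:
    """Map Simple mode toggles to enhance options (illustrative sketch)."""
    # Table summarization is always off in Simple mode.
    enhance: Dict[str, Any] = {"summarize_tables": False}
    if config.get("Contains_Handwritten_Text", False):
        enhance["ocr_mode"] = "agentic"
    if config.get("Contains_Non_English_Text", False):
        enhance["ocr_system"] = "multilingual"
    if config.get("Summarize_Text", False):
        enhance["summarize_figures"] = True
    return enhance
```

With every toggle at its default of false, this sketch yields only the summarize_tables entry.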

Advanced mode

When parse_mode is true, the Simple mode toggles are ignored and you get direct access to the Reducto API through three free-text fields:

| Field | Description |
| --- | --- |
| options | Options for the Reducto API. |
| advanced_options | Advanced options for the Reducto API. |
| experimental_options | Experimental options for the Reducto API. |

Each field must contain a Python dictionary literal, for example {'key': 'value', 'flag': True}. The values are parsed with ast.literal_eval, so JSON-only syntax such as true or null will fail validation. Empty fields are skipped and Reducto's defaults apply. The three dictionaries are merged in order (options, then advanced_options, then experimental_options; later keys override earlier ones) into the single enhance parameter passed to parse.run().
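The parse-and-merge step can be sketched with the standard library alone. The helper names parse_options_field and merge_enhance are hypothetical, and the shallow dict.update() merge is an assumption about how "later keys override earlier ones" is realized; under it, a nested dictionary in a later field replaces the earlier one wholesale.

```python
import ast
from typing import Any, Dict


def parse_options_field(raw: str) -> Dict[str, Any]:
    """Parse one free-text field as a Python dictionary literal."""
    if raw is None or not raw.strip():
        return {}  # empty fields are skipped
    value = ast.literal_eval(raw)  # rejects JSON-only tokens such as true or null
    if not isinstance(value, dict):
        raise ValueError("expected a Python dictionary literal")
    return value


def merge_enhance(
    options: str = "",
    advanced_options: str = "",
    experimental_options: str = "",
) -> Dict[str, Any]:
    """Merge the three fields in order; later keys override earlier ones."""
    merged: Dict[str, Any] = {}
    for raw in (options, advanced_options, experimental_options):
        merged.update(parse_options_field(raw))
    return merged
```

For example, a value such as {'flag': true} raises ValueError from ast.literal_eval, while {'flag': True} parses cleanly.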

See the Reducto parsing configurations documentation (https://docs.reducto.ai/parsing/default-configurations) for available parameters and formatting examples.


Authentication

Set api_key to a valid Reducto API key. Config validation verifies the key by performing a minimal in-memory upload (ping.txt, no parsing) against the Reducto API. An invalid key causes validation to fail with the message Reducto API key validation failed. Note that validating the config requires network access to Reducto.
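The validation flow can be pictured as a small wrapper around an upload call. In this sketch the upload argument is a hypothetical callable standing in for the SDK's upload step, and the b"ping" payload is an assumption; only the ping.txt file name and the error message come from the description above.

```python
import io
from typing import BinaryIO, Callable


def validate_api_key(upload: Callable[[str, BinaryIO], object]) -> None:
    """Try a tiny in-memory upload; raise ValueError if it is rejected."""
    try:
        # No parsing is requested: a successful upload is enough to prove the key.
        upload("ping.txt", io.BytesIO(b"ping"))
    except Exception as exc:
        raise ValueError("Reducto API key validation failed") from exc
```

Because the check depends on a real upload, any network failure in this sketch is reported the same way as an invalid key.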


Upstream docs


Schema

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| reducto.Contains_Handwritten_Text | boolean | Contains Handwritten Text. Enables Agentic OCR mode for better handwriting recognition and small text/table cell corrections. | false |
| reducto.Contains_Non_English_Text | boolean | Contains Non-English Text. Enables the Multilingual OCR system, which can parse non-Germanic languages and Unicode symbols. | false |
| reducto.Summarize_Text | boolean | AI Summarize Figures/Images. Generates AI summaries for figures, diagrams, and images using vision-language models. | false |
| reducto.advanced_documentation | null | Advanced Parse Mode - How to. In advanced mode, you can use the full set of options from the Reducto API. Each set of options must contain only a Python dictionary, e.g., {'key': 'value', 'flag': True}. If no information is provided for a set of options, the default values are used. For more information on the available options, see the Reducto API documentation at https://docs.reducto.ai/parsing/default-configurations. That page also contains examples of how to format the options fields. (In Advanced mode, your configuration from Simple mode is ignored.) | null |
| reducto.advanced_options | string | Advanced Options. Advanced options for the Reducto API. | |
| reducto.api_key | string | API Key. Your Reducto API key. | |
| reducto.experimental_options | string | Experimental Options. Experimental options for the Reducto API. | |
| reducto.options | string | Options. Options for the Reducto API. | |
| reducto.parse_mode | boolean | Advanced Mode. Toggle to use the advanced parse mode and get access to the full set of options from the Reducto API. | false |

Dependencies

  • reductoai