Summarization: LLM

A RocketRide filter node that uses an LLM to distill incoming text into a summary, key points, and named entities.

What it does

The node accumulates all text and table content of each object flowing through the pipeline. Then, on close, it splits the full document into chunks and asks the connected LLM to extract three things from each chunk: a concise summary, a list of key points, and the most significant named entities (people, organizations, products, events, dates, locations). The LLM is instructed to respond as JSON.
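As a rough illustration of consuming such a JSON reply, the sketch below parses one response into the three parts. The key names `summary`, `key_points`, and `entities`, and the `ChunkExtraction` container, are assumptions made for this example; the page does not specify the exact response schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ChunkExtraction:
    """Hypothetical container for what is extracted from one chunk."""
    summary: str = ""
    key_points: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def parse_extraction(raw: str) -> ChunkExtraction:
    """Parse one LLM reply. The JSON key names are assumed, not documented."""
    # Models sometimes wrap JSON in prose, so keep only the outermost object.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in LLM response")
    data = json.loads(raw[start:end + 1])
    return ChunkExtraction(
        summary=str(data.get("summary") or "").strip(),
        key_points=[str(p).strip() for p in (data.get("key_points") or [])],
        entities=[str(e).strip() for e in (data.get("entities") or [])],
    )


reply = (
    'Here you go: {"summary": "Acme bought Initech.", '
    '"key_points": ["Deal closed in 2021"], '
    '"entities": ["Acme", "Initech"]}'
)
result = parse_extraction(reply)
print(result.summary)
```

Missing or null sections fall back to empty values here, which keeps a partially filled reply usable.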

Chunking uses LangChain's RecursiveCharacterTextSplitter, sized to the connected LLM's context length and measured with the LLM's own token counter. The token budget of the instruction prompt itself is accounted for, so each chunk plus the prompt always fits the model's context window.
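The budget arithmetic described above can be sketched as follows. This is not the node's implementation: the greedy word-based splitter is a simplified stand-in for RecursiveCharacterTextSplitter, and `chunk_token_budget`, `split_by_tokens`, and the whitespace token counter are hypothetical names chosen for the example.

```python
from typing import Callable, List


def chunk_token_budget(context_length: int, prompt_tokens: int) -> int:
    """Tokens left for a chunk once the instruction prompt is accounted for."""
    budget = context_length - prompt_tokens
    if budget <= 0:
        raise ValueError("instruction prompt does not fit in the context window")
    return budget


def split_by_tokens(
    text: str, budget: int, count_tokens: Callable[[str], int]
) -> List[str]:
    """Greedy word-based splitter measured with the supplied token counter.

    A single word larger than the budget still becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > budget:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def count_words(s: str) -> int:
    """Toy token counter (assumption): one token per whitespace-separated word."""
    return len(s.split())


budget = chunk_token_budget(context_length=12, prompt_tokens=8)
chunks = split_by_tokens(
    "one two three four five six seven eight nine", budget, count_words
)
print(budget, chunks)
```

With LangChain itself, the same idea would presumably be expressed by passing the computed budget as `chunk_size` and the LLM's token counter as `length_function` to the splitter.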

Only the first numberOfSummaries chunks (default 2) are summarized; the rest of the document is ignored. Each of the three extraction sections can be disabled individually by setting its config field to 0.
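A minimal sketch of these two rules, assuming the configuration arrives as a plain dictionary keyed by the field names; `select_chunks`, `enabled_sections`, and the numeric values other than the default of 2 are illustrative only.

```python
from typing import Dict, List

# Maps each extraction section to the config field that disables it when 0.
SECTION_FIELDS = {
    "summary": "numberOfSummaryWords",
    "key_points": "numberOfKeyPointWords",
    "entities": "numberOfEntities",
}


def select_chunks(chunks: List[str], number_of_summaries: int = 2) -> List[str]:
    """Keep only the first N chunks; everything after them is ignored."""
    return chunks[: max(number_of_summaries, 0)]


def enabled_sections(config: Dict[str, int]) -> List[str]:
    """Return the sections whose config field is not 0."""
    return [name for name, key in SECTION_FIELDS.items() if config[key] != 0]


# Illustrative values; only numberOfSummaries=2 is a documented default.
config = {
    "numberOfSummaries": 2,
    "numberOfSummaryWords": 50,
    "numberOfKeyPointWords": 15,
    "numberOfEntities": 0,  # entity extraction disabled
}

all_chunks = ["chunk 0", "chunk 1", "chunk 2", "chunk 3"]
selected = select_chunks(all_chunks, config["numberOfSummaries"])
sections = enabled_sections(config)
print(selected, sections)
```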

Output is emitted only on lanes that actually have a downstream listener: plain formatted text on the text lane, and/or one structured document per section on the documents lane.


Configuration

Lanes

| Lane in | Lane out | Description |
| --- | --- | --- |
| text | text | Summarized output as plain text: a summary block, a Key Points: bullet list, and an Entities: bullet list. Sections that are disabled or empty are omitted. |
| text | documents | Summarized output as structured documents: each summary, key-point list, and entity list becomes its own Doc with an incrementing chunkId in its metadata. |

Both table and plain-text input are accepted; table content is appended to the same accumulator as text and summarized together with it.
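The two output shapes can be approximated as below. The `- ` bullet marker, the blank line between sections, the dictionary layout of a document, and a chunkId that starts at 0 are all assumptions for illustration; the actual Doc type and formatting are defined by the node.

```python
from typing import Dict, List


def format_text(summary: str, key_points: List[str], entities: List[str]) -> str:
    """Plain-text output; empty or disabled sections are omitted."""
    parts = []
    if summary:
        parts.append(summary)
    if key_points:
        parts.append("Key Points:\n" + "\n".join(f"- {p}" for p in key_points))
    if entities:
        parts.append("Entities:\n" + "\n".join(f"- {e}" for e in entities))
    return "\n\n".join(parts)


def build_documents(extractions: List[Dict]) -> List[Dict]:
    """One document per non-empty section, with an incrementing chunkId."""
    docs = []
    chunk_id = 0  # starting value is an assumption
    for item in extractions:
        sections = [
            item.get("summary", ""),
            "\n".join(item.get("key_points", [])),
            "\n".join(item.get("entities", [])),
        ]
        for content in sections:
            if content:
                docs.append({"content": content, "metadata": {"chunkId": chunk_id}})
                chunk_id += 1
    return docs


text_out = format_text("Acme bought Initech.", ["Deal closed in 2021"], [])
docs_out = build_documents(
    [{"summary": "Acme bought Initech.", "key_points": ["Deal closed in 2021"]}]
)
print(text_out)
print(docs_out)
```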

Fields

| Field | Type | Description |
| --- | --- | --- |
| numberOfSummaries | number | |
| numberOfSummaryWords | number | |
| numberOfKeyPointWords | number | |
| numberOfEntities | number | |
| profile | string | Default "default". |

The node ships a single default profile (selected via the hidden summarization.profile field) that exposes the four fields above.


Connections

| Channel | Required | Description |
| --- | --- | --- |
| llm | yes (min 1) | LLM used to generate summaries, key points, and entities. Also provides the context length and token counter used for chunking. |

Schema

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| summarization.numberOfEntities | number | Number of entities to extract from the document. Set to 0 to disable entity extraction. | |
| summarization.numberOfKeyPointWords | number | Number of words in each key point. Set to 0 to disable key points. | |
| summarization.numberOfSummaries | number | Number of chunks to summarize after the document is split. | |
| summarization.numberOfSummaryWords | number | Number of words in each summary. Set to 0 to disable summaries. | |
| summarization.profile | string | | "default" |
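For orientation, a configuration using these schema fields might look like the dictionary below, with a small sanity check. The flat dotted-key layout, the numeric values other than the default of 2 for numberOfSummaries, the integer requirement, and the `validate_config` helper are assumptions for this sketch rather than the node's actual loader.

```python
from typing import Dict, List

NUMERIC_FIELDS = (
    "summarization.numberOfEntities",
    "summarization.numberOfKeyPointWords",
    "summarization.numberOfSummaries",
    "summarization.numberOfSummaryWords",
)


def validate_config(config: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the config looks sane."""
    errors = []
    for name in NUMERIC_FIELDS:
        value = config.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            errors.append(f"{name} must be a non-negative integer")
    if not isinstance(config.get("summarization.profile", "default"), str):
        errors.append("summarization.profile must be a string")
    return errors


# Illustrative values only.
example_config = {
    "summarization.profile": "default",
    "summarization.numberOfSummaries": 2,
    "summarization.numberOfSummaryWords": 50,
    "summarization.numberOfKeyPointWords": 15,
    "summarization.numberOfEntities": 0,  # 0 disables entity extraction
}

problems = validate_config(example_config)
print(problems)
```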

Dependencies

  • langchain