Text Output

A RocketRide target node that writes the pipeline's extracted text to an SMB network share, with optional anonymization of classified (sensitive) data.

What it does

Saves your pipeline's text to networked storage over SMB. Each upstream object becomes a .txt file, mirroring the source directory layout under storePath. As a terminal node, it consumes the text lane and emits nothing.

Writes through smbprotocol (via its smbclient module), a pure-Python SMB client, so the machine running the engine needs neither a host SMB mount nor Samba's smbclient command-line binary.

Output is UTF-8, the source file extension is replaced with .txt, and target subdirectories are created automatically. Empty objects are skipped ("no text extracted"), as are objects unchanged since the last run, so an unchanged file is not rewritten on later runs.
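The path mapping and write step described above can be sketched as follows. This is a minimal illustration, not the node's code. The helpers target_path, should_write and write_text are hypothetical names. The sketch assumes the source object's path is known relative to its source root. The write uses smbclient.makedirs and smbclient.open_file from the smbprotocol package, and that package is imported lazily so the pure helpers run without it.

```python
from pathlib import PurePosixPath


def target_path(store_path: str, relative_source: str) -> str:
    """Hypothetical helper: map a source object's relative path to its .txt
    path under storePath, keeping the source directory layout."""
    rel = PurePosixPath(relative_source.replace("\\", "/"))
    return str(PurePosixPath(store_path) / rel.with_suffix(".txt"))


def should_write(text: str) -> bool:
    """Empty objects are skipped ("no text extracted")."""
    return bool(text)


def write_text(server: str, target: str, text: str) -> None:
    """Write UTF-8 text to //server/<target>, creating missing folders.

    Requires smbprotocol; imported here so the helpers above work without it.
    """
    import smbclient  # module shipped with the smbprotocol package

    parts = target.replace("\\", "/").split("/")
    unc = "\\\\" + server + "\\" + "\\".join(parts)
    if len(parts) > 2:  # folders below the share root may need creating
        smbclient.makedirs(unc.rsplit("\\", 1)[0], exist_ok=True)
    with smbclient.open_file(unc, mode="w", encoding="utf-8") as fh:
        fh.write(text)
```

For example, target_path("share/folder", "docs/report.pdf") yields share/folder/docs/report.txt. The share root itself is never passed to makedirs, because it is expected to exist already.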

Optionally, the node can anonymize PII: classification hits in the text are replaced with a masking character before the file is written.

Requires the network capability and is not available in remote (noremote) or SaaS (nosaas) deployments.


Configuration

Lanes

| Lane in | Description |
|---|---|
| text | Text content to write to the SMB share |

The node is a pure target (classType: ["target"]); it produces no output lanes.

Fields

| Field | Type / Default | Description |
|---|---|---|
| server | string, required | SMB server hostname or IP address (validated against RFC 1123). |
| username | string | SMB user in domain format: `DOMAIN\user`. Required when password is set. |
| password | string | SMB password, max 127 characters. Required when username is set. |
| storePath | string, required | Share name plus optional subfolders, e.g. share/folder/subfolder. 3–256 characters. |
| anonymize | boolean, false | Mask sensitive data in the text before writing. Enabling it reveals the two fields below. |
| anonymizeChar | string (1 char) | The character used to mask each character of a classification hit. Required when anonymize is on. |
| anonymizeAll | boolean, false | Collapse every hit to a fixed length (3 masking characters) instead of masking character-for-character (`SSN: ***` vs `SSN: ***********`). |
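The field constraints in the table can be expressed as a small validator. This is a sketch under assumptions: validate_fields, is_valid_server and their error messages are hypothetical, and the node's real validator is not shown here. The hostname check follows the usual RFC 1123 label rules: 1 to 63 letters, digits or hyphens per label, with no leading or trailing hyphen. IP addresses are checked with the standard ipaddress module.

```python
import ipaddress
import re

# One RFC 1123 label: letters, digits, hyphens; no leading/trailing hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
# DOMAIN\user: exactly one backslash separating two non-empty parts.
_DOMAIN_USER = re.compile(r"[^\\/]+\\[^\\/]+")


def is_valid_server(server: str) -> bool:
    """Accept an IPv4/IPv6 address or an RFC 1123 hostname."""
    try:
        ipaddress.ip_address(server)
        return True
    except ValueError:
        pass
    if not server or len(server) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in server.split("."))


def validate_fields(server, username="", password="",
                    anonymize=False, anonymize_char=""):
    """Return a list of rule violations (empty when the config is valid)."""
    errors = []
    if not is_valid_server(server):
        errors.append("server is not a valid hostname or IP address")
    if bool(username) != bool(password):
        errors.append("username and password must be set together")
    if username and not _DOMAIN_USER.fullmatch(username):
        errors.append("username must use DOMAIN\\user format")
    if len(password) > 127:
        errors.append("password must be at most 127 characters")
    if anonymize and len(anonymize_char) != 1:
        errors.append("anonymizeChar must be exactly one character")
    return errors
```

Returning a list, rather than raising on the first problem, lets a configuration form report every invalid field at once.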

storePath rules

The path must not be rooted, must not contain empty or dot (. / ..) folders, and must not contain the characters <>:"|?*. The first segment is the share name (1–80 characters). When the pipeline starts, the node verifies that //server/share is reachable; missing subfolders under the share are created on first write.
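The storePath rules above translate into a handful of string checks. This is a minimal sketch, not the node's implementation. It assumes "/" as the separator, as in the share/folder/subfolder example, and treats "\" the same way. It also treats a trailing separator as an empty folder. validate_store_path and split_store_path are hypothetical names.

```python
FORBIDDEN_CHARS = set('<>:"|?*')


def validate_store_path(store_path: str) -> list:
    """Return rule violations for a storePath value (empty list if valid)."""
    errors = []
    if not 3 <= len(store_path) <= 256:
        errors.append("storePath must be 3-256 characters")
    if store_path.startswith(("/", "\\")):
        errors.append("storePath must not be rooted")
    bad = FORBIDDEN_CHARS.intersection(store_path)
    if bad:
        errors.append("forbidden characters: " + "".join(sorted(bad)))
    segments = store_path.replace("\\", "/").split("/")
    if any(seg in ("", ".", "..") for seg in segments):
        errors.append("empty, '.' or '..' folders are not allowed")
    if not 1 <= len(segments[0]) <= 80:
        errors.append("share name must be 1-80 characters")
    return errors


def split_store_path(store_path: str):
    """Split 'share/folder/sub' into ('share', 'folder/sub')."""
    share, _, subfolders = store_path.replace("\\", "/").partition("/")
    return share, subfolders
```

The first element returned by split_store_path is the share that is checked for reachability at startup. The remainder is the folder chain that may be created on first write.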


Anonymization

When anonymize is enabled, the node injects the classification filter plus an anonymize_text pipe filter, so classification hits in the incoming text are replaced with anonymizeChar before the file is written. With anonymizeAll enabled, each hit is collapsed to a fixed length (3 masking characters) instead of being masked character-for-character.
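The two masking modes can be illustrated with character offsets. This is a sketch only: the interface of the anonymize_text filter is not documented here. The sketch assumes the classification filter reports each hit as a hypothetical (start, end) span. Merging overlapping hits before masking is also a design choice of this sketch, so that anonymizeAll never emits more than one 3-character mask per region.

```python
def merge_spans(hits):
    """Merge overlapping or touching (start, end) spans."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def anonymize(text, hits, char="*", collapse=False):
    """Mask each hit span; collapse=True mimics anonymizeAll (3 characters)."""
    out, pos = [], 0
    for start, end in merge_spans(hits):
        out.append(text[pos:start])
        out.append(char * (3 if collapse else end - start))
        pos = end
    out.append(text[pos:])
    return "".join(out)
```

With the text "SSN: 123-45-6789" and a hit covering the 11-character number, character-for-character masking keeps the original length. Collapsing reduces the number to three masking characters, matching the anonymizeAll example in the Fields table.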

Changing any anonymization setting (the classify policies, anonymizeChar, or anonymizeAll) changes the settings key the node keeps in its key-value store, which forces all objects to be re-transformed on the next run, not just new and changed ones.
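One way to derive such a settings key is to hash a canonical serialization of every setting that affects the written text. How the node actually builds its key is not documented. The SHA-256-over-JSON scheme below, and the names settings_key and needs_full_retransform, are assumptions for illustration. The sketch also assumes policies is a list of policy names.

```python
import hashlib
import json


def settings_key(policies, anonymize_char, anonymize_all):
    """Hypothetical digest of the anonymization settings (order-insensitive)."""
    payload = json.dumps(
        {"policies": sorted(policies), "char": anonymize_char, "all": anonymize_all},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_full_retransform(stored_key, current_key):
    """A changed (or missing) settings key invalidates every stored transform key."""
    return stored_key != current_key
```

Sorting the policy list means that reordering the same policies does not trigger a full re-transform. Changing the mask character, toggling anonymizeAll, or adding a policy does.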


Change detection

The node performs incremental writes. For each object it builds a transform key of the form flags;sourceChangeKey;targetChangeKey, where the source change key comes from the object's change key (or its modify time and size) and the target change key from the existing target file's mtime and size (0;0 when the file does not exist yet). The key is stored in the object's instance tags under text-output://<server>/<storePath>/status.

On the next run an object is skipped ("object transformed and not changed") when its transform key matches the stored one and the anonymization settings have not changed. Failed objects record the exception as their completion code instead of writing a file.
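The change-detection rules above can be sketched as a few string builders plus a skip test. The key format and tag name follow the description in this section. The following are assumptions: the contents of flags, integer mtimes, and the helper names source_change_key, target_change_key, transform_key, status_tag and should_skip.

```python
from typing import Optional, Tuple


def source_change_key(change_key: Optional[str], mtime: int, size: int) -> str:
    """Prefer the object's change key; fall back to modify time and size."""
    return change_key if change_key else f"{mtime};{size}"


def target_change_key(stat: Optional[Tuple[int, int]]) -> str:
    """mtime;size of the existing target file, or 0;0 if it does not exist."""
    if stat is None:
        return "0;0"
    mtime, size = stat
    return f"{mtime};{size}"


def transform_key(flags: int, source_key: str, target_key: str) -> str:
    """flags;sourceChangeKey;targetChangeKey"""
    return f"{flags};{source_key};{target_key}"


def status_tag(server: str, store_path: str) -> str:
    """Instance-tag name under which the transform key is stored."""
    return f"text-output://{server}/{store_path}/status"


def should_skip(stored_key: Optional[str], current_key: str,
                settings_changed: bool) -> bool:
    """Skip only if the key matches and anonymization settings are unchanged."""
    return stored_key == current_key and not settings_changed
```

Including the target file's mtime and size in the key means the object is rewritten if the .txt file is deleted or modified on the share, even when the source object is unchanged.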


Authentication

Authentication is optional: leave username and password blank for shares that allow anonymous/guest access. When credentials are provided, both fields are required and the username must use the domain format DOMAIN\user. Credentials are registered with the SMB client globally at connection time. The connection, including reachability of the share, is tested when the action starts, not during configuration validation.
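A startup reachability check along these lines can be written with smbprotocol's smbclient module. This is a sketch, not the node's code. It assumes credentials are registered through the global smbclient.ClientConfig singleton, and that listing the share root with smbclient.listdir is a sufficient reachability probe. When no credentials are given, the call falls back to whatever the client's default authentication is. share_unc and check_share are hypothetical names.

```python
from typing import Optional


def share_unc(server: str, store_path: str) -> str:
    """Build the UNC path of the share (first storePath segment) on server."""
    share = store_path.replace("\\", "/").split("/", 1)[0]
    return "\\\\" + server + "\\" + share


def check_share(server: str, store_path: str,
                username: Optional[str] = None,
                password: Optional[str] = None) -> None:
    """Raise if //server/share cannot be reached with the given credentials."""
    import smbclient  # module shipped with the smbprotocol package

    if username and password:
        # Global client configuration: later smbclient calls reuse these.
        smbclient.ClientConfig(username=username, password=password)
    # Listing the share root fails with an exception if it is unreachable.
    smbclient.listdir(share_unc(server, store_path))
```

Only the share is probed here; missing subfolders are left for the first write to create.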


Dependencies

  • cffi
  • cryptography ==46.0.7
  • pycparser
  • pyspnego
  • smbprotocol
  • sspilib