Skip to main content
View source

MySQL

View as Markdown

A RocketRide database node that answers natural-language questions against a MySQL database and inserts structured pipeline data into a MySQL table.

What it does

The node operates in two roles. As a pipeline node, it receives natural-language questions on the questions lane, uses a connected LLM to translate them into SQL, executes the query, and emits results on the table, text, and answers output lanes; it also receives structured data on the answers input lane and inserts rows into the configured table. As a tool node, it exposes get_data, get_schema, and get_sql to an agent so the agent can query the database by describing the data it wants.

Connects via SQLAlchemy with the pymysql driver (mysql+pymysql:// DSN; user, password, and database name are URL-encoded so reserved characters don't break the connection string). The connection pool allows up to 30 concurrent connections (pool_size=10, max_overflow=20). The full database schema (tables, columns, types, primary keys, and foreign keys) is reflected once at startup and supplied to the LLM as context for every query.

Only SELECT statements are permitted for queries; all generated SQL passes a whitelist safety check before execution. Inserts go through the answers lane, not through SQL. Saving the node config probes the server with SELECT 1 (5-second connect timeout) and surfaces the driver error verbatim if the connection fails.


Configuration

Lanes

Lane inLanes outDescription
questionstable, text, answersTranslate question to SQL, execute it. Results go to text as a string, to table as a markdown table (valid queries only), and to answers as a markdown table (or the LLM's text response when the question is not a database query).
answers(none)Parse structured JSON data and insert it into the configured table

Two special question types are handled on the questions lane:

  • QuestionType.DIALECT: emits {"dialect": "mysql"} on the answers lane so SDK callers can branch on the underlying engine.
  • QuestionType.EXECUTE: runs the question text as raw SQL, bypassing LLM translation and the safety check. Disabled unless allow_execute is on; see the "Direct SQL execution" section below.

Fields

FieldTypeDescription
hoststringDefault "localhost". Host name or IP address of the MySQL server
userstringDefault "root". User to connect to the MySQL server
passwordstringPassword to connect to the MySQL server
databasestringDefault "database". Name of database
tablestringDefault "table". Name of table
db_descriptionstringDefault empty. What is this database used for? Describe its content and purpose, this helps the LLM generate more accurate queries.
max_attemptsintegerDefault 5. Maximum number of times to re-ask the LLM if EXPLAIN rejects the generated SQL
allow_executebooleanDefault false. Permit QuestionType.EXECUTE callers to run raw SQL without LLM translation or safety checks. Leave OFF unless a trusted application explicitly needs to issue SQL directly.
profilestringDefault "default".

The node has a single default profile containing all fields above.


Connections

ConnectionRequiredDescription
llmyes (min 1)LLM used to craft SQL queries from natural-language questions

Available tools

When connected to an agent, the node exposes three functions:

ToolDescription
get_dataNatural language to SQL SELECT, executed; returns result rows plus the generated SQL. limit defaults to 250 rows, max 25,000.
get_schemaReturns the database schema: tables, columns, types, primary keys, and foreign keys. Pass a table name for a single table, or omit it for the full schema.
get_sqlNatural language to SQL only: returns the generated SELECT statement without executing it

SQL generation and validation

Generated SQL goes through two gates before execution:

  1. Safety whitelist: only SELECT statements (optionally prefixed by EXPLAIN) are allowed. Comments are stripped first so embedded keywords can't fool the check, each statement in a multi-statement string is checked separately, SELECT ... INTO OUTFILE / INTO DUMPFILE is blocked, and WITH (CTE) is deliberately rejected because CTEs can wrap mutations.
  2. EXPLAIN validation: the query is validated by running EXPLAIN against the live database without executing it. If EXPLAIN rejects the query, the rejected SQL and the database error are fed back to the LLM for a corrected attempt. This repeats up to max_attempts times (default 5) before the last result is returned as-is.

If the LLM decides the question is not a database query at all, it answers in plain text instead, and that text is returned on the text and answers lanes (or in the tool result's answer field).


Inserting data

Rows arriving on the answers lane are inserted into the configured table:

  • The table is auto-created on first insert if it doesn't exist, with column types inferred from the data (int, float, datetime, and text detection; short strings become VARCHAR(255), long ones TEXT) and an auto-increment id primary key prepended. A startup warning is logged when the table is missing.
  • Incoming keys are matched to schema columns case-insensitively (UserName maps to username); schema columns absent from the data are inserted as NULL, and keys not in the schema are dropped.
  • Lists and dicts are serialised to JSON strings; booleans are stored as 0/1.

Direct SQL execution

When allow_execute is true, callers sending QuestionType.EXECUTE can run raw SQL (reads or writes) with no LLM translation and no safety whitelist. Writes auto-commit; results are emitted on the text, table, and answers lanes, with the answers payload containing rows and affected_rows. SELECT results are capped at 25,000 rows (configurable via the max_execute_rows config key); exceeding the cap fails the query rather than truncating it.

This is off by default. Leave it off unless a trusted application explicitly needs to issue SQL directly: with it enabled, any pipeline caller that can reach the node can run arbitrary statements against the database.


Schema

FieldTypeDescriptionDefault
mysql.allow_executebooleanAllow direct query execution
Permit QuestionType.EXECUTE callers to run raw SQL without LLM translation or safety checks. Leave OFF unless a trusted application explicitly needs to issue SQL directly.
false
mysql.databasestringDatabase name
Name of database
"database"
mysql.db_descriptionstringDatabase description
What is this database used for? Describe its content and purpose, this helps the LLM generate more accurate queries.
""
mysql.hoststringMySQL host
Host name or IP address of the MySQL server
"localhost"
mysql.max_attemptsintegerMax validation attempts
Maximum number of times to re-ask the LLM if EXPLAIN rejects the generated SQL
5
mysql.passwordstringPassword
Password to connect to the MySQL server
mysql.profilestring"default"
mysql.tablestringTable name
Name of table
"table"
mysql.userstringUser
User to connect to the MySQL server
"root"

Dependencies

  • pymysql ==1.2.0
  • cryptography ==46.0.7