Gemini Vision

A RocketRide filter node that sends images to Google Gemini vision-capable models and emits the model's text analysis.

What it does

Sends images to Google Gemini vision models and returns text analysis: image description, OCR, visual understanding, and scene description. Accepts either a single streamed image (via the image lane) or a stream of image documents such as frames from a frame grabber (via the documents lane). Metadata such as frame number and timestamp is preserved on the documents output.

Uses the google-genai SDK (>=1.14.0). Each inference call runs in its own client and worker thread with a 30-second hard timeout, plus one automatic retry with exponential backoff for transient errors (timeouts, connection failures, 5xx responses). If no analysis prompt is configured, the node defaults to "Describe this image in detail."
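As a rough illustration of the hard timeout described above, the sketch below runs a call in a worker thread and gives up after a deadline. It is not the node's actual code: `run_with_timeout` and `INFERENCE_TIMEOUT_S` are hypothetical names, and only the Python standard library is used.

```python
import concurrent.futures
import time

INFERENCE_TIMEOUT_S = 30  # the hard timeout stated in this document


def run_with_timeout(fn, timeout_s=INFERENCE_TIMEOUT_S):
    """Run fn() in a worker thread; raise TimeoutError if it overruns."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"inference exceeded {timeout_s}s") from None
    finally:
        # Do not wait for a hung call: the worker is abandoned, not killed.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(run_with_timeout(lambda: "fast answer", timeout_s=1))
    try:
        run_with_timeout(lambda: time.sleep(0.3), timeout_s=0.05)
    except TimeoutError as exc:
        print("timed out:", exc)
```

Python cannot forcibly stop a thread, so in this sketch a timed-out call keeps running in the background until it returns on its own; the caller simply stops waiting for it.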

Most profiles support a 1 million token context window, making this node well-suited for high-volume frame analysis pipelines where many images are processed in sequence. The exception is Gemini 3.1 Flash Image Preview, which has a 131,072-token limit.

When both lanes are connected for the same frame, the node calls Gemini once and reuses the cached answer for the second lane, so you are not billed twice per image.
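The reuse described above can be pictured as a small answer cache. The sketch below is an assumption about how such a cache might look, not the node's implementation: it keys answers by a SHA-256 hash of the image bytes, and `AnswerCache` and `analyze` are hypothetical names.

```python
import hashlib


class AnswerCache:
    """Remember the answer for each image so a second lane can reuse it."""

    def __init__(self, analyze):
        self._analyze = analyze  # callable: bytes -> str (the billed call)
        self._answers = {}       # content hash -> cached answer

    def get(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._answers:
            # First lane to ask pays for the call; later lanes hit the cache.
            self._answers[key] = self._analyze(image_bytes)
        return self._answers[key]


if __name__ == "__main__":
    calls = []
    cache = AnswerCache(lambda b: calls.append(b) or "a cat on a sofa")
    cache.get(b"frame-1")  # image lane
    cache.get(b"frame-1")  # documents lane, same frame
    print(len(calls))
```

A real cache would also need an eviction policy so long-running frame pipelines do not grow it without bound; that is left out here.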

Configuration

Lanes

Lane in	Lane out	Description
`image`	`text`	Analyze a single streamed image, receive the model's text response
`documents`	`documents`	Analyze image documents, return text analysis with original metadata preserved

On the documents lane, each incoming Image document is replaced by a Text document containing the model's answer, carrying the original metadata. The document content is treated as base64-encoded PNG data. The original Image documents do not flow downstream. Documents with a type other than Image, or with empty content, are skipped with a warning. On the image lane, empty frames are also skipped with a warning.

If inference fails for a document, the node logs a warning and continues with the next one: a single bad frame does not stop the pipeline.
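The documents-lane rules above (replace Image with Text, keep metadata, skip and warn, continue on failure) can be sketched as a generator. `Doc`, `process_documents`, and `analyze` are hypothetical stand-ins for illustration; the real pipeline document type is not shown in this page.

```python
import base64
import logging
from dataclasses import dataclass, field

log = logging.getLogger("gemini_vision_sketch")


@dataclass
class Doc:
    # Hypothetical stand-in for a pipeline document.
    type: str
    content: str
    metadata: dict = field(default_factory=dict)


def process_documents(docs, analyze):
    """Yield one Text doc per usable Image doc; skip the rest with a warning."""
    for doc in docs:
        if doc.type != "Image" or not doc.content:
            log.warning("skipping document (type=%s, empty=%s)",
                        doc.type, not doc.content)
            continue
        try:
            png_bytes = base64.b64decode(doc.content)  # content is base64 PNG
            answer = analyze(png_bytes)
        except Exception as exc:
            # A single bad frame must not stop the pipeline.
            log.warning("inference failed, continuing: %s", exc)
            continue
        # The Image document is replaced, not forwarded; metadata is copied.
        yield Doc(type="Text", content=answer, metadata=dict(doc.metadata))
```

For example, feeding it a frame with metadata `{"frame": 7, "timestamp": 0.28}` would yield a Text document carrying the same two metadata entries alongside the model's answer.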

Fields

The node shape exposes a single Vision Model profile selector (image_vision_gemini.profile); the remaining fields appear once a profile is chosen.

Field	Type	Description
`apikey`	string	Google AI API key. Get one at https://aistudio.google.com/apikey
`model`	string	Gemini Vision model
`modelTotalTokens`	number	Maximum context length in tokens
`systemPrompt`	string	Define the model's role and behavior for image analysis
`prompt`	string	Describe what you want to analyze or extract from the image
`profile`	string	Default "gemini-2_5-flash". Select the Gemini vision model to use

The system prompt, when set, is sent to Gemini as the system_instruction of every request.
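To make the request shape concrete, here is a minimal sketch of one call with the google-genai SDK, assuming the image arrives as base64-encoded PNG. `build_request` and `send` are hypothetical helpers, not the node's code, and the exact punctuation of the default prompt is an assumption. `send` needs the google-genai package, network access, and a valid key.

```python
import base64

# Default prompt named in this document (exact punctuation is an assumption).
DEFAULT_PROMPT = "Describe this image in detail."


def build_request(model, image_b64, prompt="", system_prompt=""):
    """Collect the pieces of one request as plain Python values."""
    return {
        "model": model,
        "image_bytes": base64.b64decode(image_b64),
        "prompt": prompt or DEFAULT_PROMPT,
        "system_instruction": system_prompt or None,
    }


def send(request, api_key):
    """Send the request via google-genai (imported lazily)."""
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=api_key)
    response = client.models.generate_content(
        model=request["model"],
        contents=[
            types.Part.from_bytes(data=request["image_bytes"],
                                  mime_type="image/png"),
            request["prompt"],
        ],
        config=types.GenerateContentConfig(
            system_instruction=request["system_instruction"],
        ),
    )
    return response.text
```

Keeping the request assembly separate from the network call makes the defaulting logic easy to check without an API key.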

Profiles

Gemini 2.5

Profile	Model	Context
Gemini 2.5 Flash (default)	`models/gemini-2.5-flash`	1,048,576
Gemini 2.5 Pro	`models/gemini-2.5-pro`	1,048,576
Gemini 2.5 Flash Lite	`models/gemini-2.5-flash-lite`	1,048,576

Gemini 3.1 (Preview)

Profile	Model	Context
Gemini 3.1 Pro Preview	`models/gemini-3.1-pro-preview`	1,048,576
Gemini 3.1 Flash Image Preview	`models/gemini-3.1-flash-image-preview`	131,072
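The two tables above amount to a lookup from profile to model ID and context window. The sketch below keys the lookup by the display names shown in the tables, because the internal profile keys (other than the default "gemini-2_5-flash") are not listed here; `PROFILES` and `resolve_profile` are hypothetical names.

```python
# Display name -> (model id, context window in tokens), copied from the tables.
PROFILES = {
    "Gemini 2.5 Flash": ("models/gemini-2.5-flash", 1_048_576),
    "Gemini 2.5 Pro": ("models/gemini-2.5-pro", 1_048_576),
    "Gemini 2.5 Flash Lite": ("models/gemini-2.5-flash-lite", 1_048_576),
    "Gemini 3.1 Pro Preview": ("models/gemini-3.1-pro-preview", 1_048_576),
    "Gemini 3.1 Flash Image Preview":
        ("models/gemini-3.1-flash-image-preview", 131_072),
}
DEFAULT_PROFILE = "Gemini 2.5 Flash"


def resolve_profile(name=None):
    """Return the model and modelTotalTokens values for a profile."""
    model, context = PROFILES[name or DEFAULT_PROFILE]
    return {"model": model, "modelTotalTokens": context}
```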

Choosing a profile

Flash Lite: fastest and cheapest; good for high-throughput frame pipelines where speed matters more than detail
Flash: balanced speed and quality; the recommended default for most vision tasks
Pro: highest quality analysis; use when accuracy is critical and latency is acceptable
3.1 Pro Preview / Flash Image Preview: latest generation previews; expect higher capability but potential instability as models are still in preview

Authentication

Get a key at aistudio.google.com/apikey. Keys are free for development use and grant access to all Gemini models listed above.

The node validates the key format at startup: keys beginning with sk- are rejected immediately with a clear message indicating that the key appears to be an OpenAI key. When the configuration is saved, the node also runs a minimal API probe against the selected model and surfaces any provider error (invalid key, missing model, quota exceeded, etc.) as a warning. The probe is skipped while the key is still blank.
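The format check described above can be sketched in a few lines. This is an illustration only: `check_key_format` is a hypothetical name, the message wording is invented, and treating a blank key as "nothing to validate yet" is an assumption based on the probe being skipped for blank keys.

```python
def check_key_format(api_key):
    """Return an error message for an obviously wrong key, else None."""
    key = (api_key or "").strip()
    if not key:
        return None  # assumption: a blank key is not a format error
    if key.startswith("sk-"):
        return ("This looks like an OpenAI key (it starts with 'sk-'). "
                "A Google AI API key is required.")
    return None


if __name__ == "__main__":
    print(check_key_format("sk-abc123"))
    print(check_key_format("example-google-key"))
```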

Error handling

API errors are mapped to user-friendly messages: authentication failures, rate limits, billing and quota issues, safety-filter blocks, model unavailability, timeouts, and 5xx service errors each produce a clear explanation instead of a raw stack trace.
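One simple way to implement such a mapping is to match markers in the error text. The sketch below is an assumption about the approach, not the node's actual rules: the marker lists, the message wording, and the names `FRIENDLY_RULES` and `friendly_message` are all hypothetical.

```python
# (markers to look for in the lowercased error text, friendly message)
FRIENDLY_RULES = [
    (("api key", "unauthenticated", "401", "403"),
     "Authentication failed: check the Google AI API key."),
    (("rate limit", "429"),
     "Rate limit reached: slow down or try again later."),
    (("quota", "billing"),
     "Quota or billing problem: check the account's plan and usage."),
    (("safety", "blocked"),
     "The request was blocked by a safety filter."),
    (("not found", "404"),
     "Model unavailable: the selected model could not be found."),
    (("timeout", "timed out"),
     "The request timed out."),
    (("500", "502", "503", "504", "unavailable"),
     "The Gemini service reported a server error."),
]


def friendly_message(exc):
    """Map an exception to a short explanation; first matching rule wins."""
    text = str(exc).lower()
    for markers, message in FRIENDLY_RULES:
        if any(marker in text for marker in markers):
            return message
    return f"Gemini request failed: {exc}"
```

Substring matching is crude; if the SDK exposes structured status codes on its exceptions, those would be a more reliable key than the message text.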

Retry behavior: one retry for transient errors (timeout, connection, 500/502/503/504, service unavailable) with exponential backoff starting at 1 second. Repeated timeouts are not retried beyond the second attempt, so a hung request costs at most two 30-second waits, plus the backoff delay between them, before the error is surfaced.
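The retry policy above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the node's code: `is_transient` classifies errors by exception type and message markers, and `call_with_retry`, `TRANSIENT_MARKERS`, and the injectable `sleep` parameter are hypothetical.

```python
import time

TRANSIENT_MARKERS = ("timeout", "timed out", "connection",
                     "500", "502", "503", "504", "service unavailable")


def is_transient(exc):
    """Guess whether an error is worth retrying."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    text = str(exc).lower()
    return any(marker in text for marker in TRANSIENT_MARKERS)


def call_with_retry(fn, retries=1, base_delay_s=1.0, sleep=time.sleep):
    """Call fn(); retry transient failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= retries or not is_transient(exc):
                raise  # out of retries, or not a transient error
            sleep(base_delay_s * (2 ** attempt))  # 1s, then 2s, 4s, ...
            attempt += 1
```

With the default `retries=1` the function makes at most two attempts separated by a single 1-second pause, so the exponential growth only matters if the retry count is raised.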

Upstream docs

Schema

Field	Type	Description	Default
`image_vision_gemini.apikey`	`string`	API Key: Google AI API key. Get one at https://aistudio.google.com/apikey	
`image_vision_gemini.profile`	`string`	Vision Model: Select the Gemini vision model to use	`"gemini-2_5-flash"`
`model`	`string`	Model: Gemini Vision model	
`modelTotalTokens`	`number`	Tokens: Maximum context length in tokens	
`vision.prompt`	`string`	Analysis Prompt: Describe what you want to analyze or extract from the image	
`vision.systemPrompt`	`string`	System Instructions: Define the model's role and behavior for image analysis	

Dependencies

google-genai >=1.14.0
