
Gemini Vision


A RocketRide filter node that sends images to Google Gemini vision-capable models and emits the model's text analysis.

What it does

Sends images to Google Gemini vision models and returns text analysis: image description, OCR, visual understanding, and scene description. Accepts either a single streamed image (via the image lane) or a stream of image documents such as frames from a frame grabber (via the documents lane). Metadata such as frame number and timestamp is preserved on the documents output.

Uses the google-genai SDK (>=1.14.0). Each inference call creates its own client and runs on its own worker thread with a 30-second hard timeout, plus one automatic retry with exponential backoff for transient errors (timeouts, connection failures, 5xx responses). If no analysis prompt is configured, the node uses the default prompt "Describe this image in detail."
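
The hard timeout can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the node's code: `call_with_hard_timeout` is a hypothetical helper, and `fn` stands for whatever function builds a fresh Gemini client and makes one request. It shows why a worker thread gives a *hard* limit: the caller stops waiting after 30 seconds even if the HTTP call itself never returns.

```python
import concurrent.futures
from typing import Callable, TypeVar

T = TypeVar("T")

HARD_TIMEOUT_S = 30.0  # the documented per-call limit


def call_with_hard_timeout(fn: Callable[[], T], timeout_s: float = HARD_TIMEOUT_S) -> T:
    """Run fn on its own worker thread; stop waiting after timeout_s seconds.

    Hypothetical helper. fn is expected to create its own client and make a
    single inference request.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(fn)
        done, _ = concurrent.futures.wait([future], timeout=timeout_s)
        if not done:
            # The worker is still running; the caller simply stops waiting for it.
            raise TimeoutError(f"inference exceeded {timeout_s:g}s")
        return future.result()  # re-raises any exception fn raised
    finally:
        # Do not block on a hung worker thread.
        executor.shutdown(wait=False)
```

Python cannot forcibly stop a thread, so an abandoned call keeps running in the background until its connection gives up; presumably that is one reason each call gets its own client rather than sharing one.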

Most profiles support a 1,048,576-token (1M) context window, making this node well-suited for high-volume frame analysis pipelines where many images are processed in sequence. The exception is Gemini 3.1 Flash Image Preview, which is limited to 131,072 tokens.

When both lanes are connected for the same frame, the node calls Gemini once and reuses the cached answer for the second lane, so you are not billed twice per image.
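
The single-call-per-frame behavior amounts to memoizing the model's answer. A minimal sketch, assuming answers are keyed by a SHA-256 hash of the image bytes; `FrameAnswerCache` is a hypothetical name, and the real node may key on something else, such as the frame number.

```python
import hashlib
from typing import Callable, Dict


class FrameAnswerCache:
    """One Gemini call per distinct image, shared by the image and documents lanes.

    Hypothetical sketch: entries are keyed by a SHA-256 of the image bytes.
    Not thread-safe.
    """

    def __init__(self, analyze: Callable[[bytes], str], max_entries: int = 8) -> None:
        self._analyze = analyze          # the function that actually calls Gemini
        self._max_entries = max_entries  # bound memory to a few recent frames
        self._answers: Dict[str, str] = {}

    def answer(self, image: bytes) -> str:
        key = hashlib.sha256(image).hexdigest()
        if key not in self._answers:
            if len(self._answers) >= self._max_entries:
                # Drop the oldest entry; dicts preserve insertion order.
                del self._answers[next(iter(self._answers))]
            self._answers[key] = self._analyze(image)
        return self._answers[key]
```

Whichever lane sees the frame first pays for the call; the second lane gets the stored text.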


Configuration

Lanes

| Lane in | Lane out | Description |
|---|---|---|
| image | text | Analyze a single streamed image, receive the model's text response |
| documents | documents | Analyze image documents, return text analysis with original metadata preserved |

On the documents lane, each incoming Image document is replaced by a Text document containing the model's answer, carrying the original metadata. The document content is treated as base64-encoded PNG data. The original Image documents do not flow downstream. Documents with a type other than Image, or with empty content, are skipped with a warning. On the image lane, empty frames are also skipped with a warning.

If inference fails for a document, the node logs a warning and continues with the next one: a single bad frame does not stop the pipeline.
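
The documents-lane rules above (replace Image with Text, keep metadata, skip non-image or empty documents, survive a failed frame) can be sketched as a generator. This is a sketch only: `Document` is a hypothetical stand-in for RocketRide's document type, and `analyze` stands for the function that calls Gemini.

```python
import base64
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator

log = logging.getLogger("image_vision_gemini.sketch")


@dataclass
class Document:
    """Hypothetical stand-in for a RocketRide document."""
    type: str                     # "Image", "Text", ...
    content: str                  # for Image documents: base64-encoded PNG
    metadata: Dict[str, object] = field(default_factory=dict)


def process_documents(
    docs: Iterable[Document], analyze: Callable[[bytes], str]
) -> Iterator[Document]:
    """Replace each Image document with a Text document holding the model's answer."""
    for doc in docs:
        if doc.type != "Image" or not doc.content:
            log.warning("skipping document (type=%s, empty=%s)", doc.type, not doc.content)
            continue
        try:
            answer = analyze(base64.b64decode(doc.content))
        except Exception as exc:  # a bad frame is logged, not fatal
            log.warning("inference failed for %s: %s", doc.metadata, exc)
            continue
        # Only the Text answer flows downstream; the original metadata travels with it.
        yield Document(type="Text", content=answer, metadata=dict(doc.metadata))
```

Decoding happens inside the `try`, so corrupt base64 is treated like any other failed frame.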

Fields

The node shape exposes a single Vision Model profile selector (image_vision_gemini.profile); the remaining fields appear once a profile is chosen.

| Field | Type | Description |
|---|---|---|
| apikey | string | Google AI API key. Get one at https://aistudio.google.com/apikey |
| model | string | Gemini Vision model |
| modelTotalTokens | number | Maximum context length in tokens |
| systemPrompt | string | Define the model's role and behavior for image analysis |
| prompt | string | Describe what you want to analyze or extract from the image |
| profile | string | Select the Gemini vision model to use. Default: "gemini-2_5-flash" |

The system prompt, when set, is sent to Gemini as the system_instruction of every request.
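
A single request with the google-genai SDK might look like the sketch below. `genai.Client`, `client.models.generate_content`, `types.Part.from_bytes` and `types.GenerateContentConfig(system_instruction=...)` are real SDK calls; the helper names `resolve_prompts` and `analyze_png`, and the choice to send the PNG part before the text prompt, are assumptions for illustration.

```python
from typing import Optional, Tuple

DEFAULT_PROMPT = "Describe this image in detail."  # documented fallback prompt


def resolve_prompts(
    prompt: Optional[str], system_prompt: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (user prompt, system instruction), applying the documented defaults."""
    return (prompt or DEFAULT_PROMPT), (system_prompt or None)


def analyze_png(
    api_key: str,
    model: str,
    png: bytes,
    prompt: Optional[str] = None,
    system_prompt: Optional[str] = None,
) -> str:
    """Send one PNG plus the prompt to Gemini and return the text answer."""
    # Imported lazily so the module loads even where google-genai is absent.
    from google import genai
    from google.genai import types

    user_prompt, system_instruction = resolve_prompts(prompt, system_prompt)
    client = genai.Client(api_key=api_key)  # a fresh client per call
    config = (
        types.GenerateContentConfig(system_instruction=system_instruction)
        if system_instruction
        else None
    )
    response = client.models.generate_content(
        model=model,
        contents=[types.Part.from_bytes(data=png, mime_type="image/png"), user_prompt],
        config=config,
    )
    return response.text or ""  # text can be None, e.g. when a response is blocked
```

Passing `config=None` when no system prompt is set keeps the request identical to one that never mentions a system instruction.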


Profiles

Gemini 2.5

| Profile | Model | Context (tokens) |
|---|---|---|
| Gemini 2.5 Flash (default) | models/gemini-2.5-flash | 1,048,576 |
| Gemini 2.5 Pro | models/gemini-2.5-pro | 1,048,576 |
| Gemini 2.5 Flash Lite | models/gemini-2.5-flash-lite | 1,048,576 |

Gemini 3.1 (Preview)

| Profile | Model | Context (tokens) |
|---|---|---|
| Gemini 3.1 Pro Preview | models/gemini-3.1-pro-preview | 1,048,576 |
| Gemini 3.1 Flash Image Preview | models/gemini-3.1-flash-image-preview | 131,072 |
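
If you generate pipeline configurations from a script, the profile tables can be mirrored as data. This sketch assumes a profile simply fills in the `model` and `modelTotalTokens` fields; the dictionary is keyed by display name because only the default profile key ("gemini-2_5-flash") is documented, and `fits_context` is a hypothetical helper.

```python
from typing import Dict, NamedTuple


class Profile(NamedTuple):
    model: str           # value for the model field
    context_tokens: int  # value for modelTotalTokens


# Copied from the profile tables; keys are display names, not internal profile keys.
PROFILES: Dict[str, Profile] = {
    "Gemini 2.5 Flash": Profile("models/gemini-2.5-flash", 1_048_576),
    "Gemini 2.5 Pro": Profile("models/gemini-2.5-pro", 1_048_576),
    "Gemini 2.5 Flash Lite": Profile("models/gemini-2.5-flash-lite", 1_048_576),
    "Gemini 3.1 Pro Preview": Profile("models/gemini-3.1-pro-preview", 1_048_576),
    "Gemini 3.1 Flash Image Preview": Profile("models/gemini-3.1-flash-image-preview", 131_072),
}


def fits_context(profile_name: str, estimated_tokens: int) -> bool:
    """True if a request of estimated_tokens fits the profile's context window."""
    return estimated_tokens <= PROFILES[profile_name].context_tokens
```

For example, `fits_context("Gemini 3.1 Flash Image Preview", 200_000)` is `False`, a cue to pick one of the 1,048,576-token profiles instead.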

Choosing a profile

  • Flash Lite: fastest and cheapest; good for high-throughput frame pipelines where speed matters more than detail
  • Flash: balanced speed and quality; the recommended default for most vision tasks
  • Pro: highest quality analysis; use when accuracy is critical and latency is acceptable
  • 3.1 Pro Preview / Flash Image Preview: latest-generation previews; expect higher capability, but also possible instability while the models remain in preview

Authentication

Get a key at aistudio.google.com/apikey. Keys are free for development use and grant access to all Gemini models listed above.

The node validates the key format at startup: keys beginning with sk- are rejected immediately with a clear message indicating that the key appears to be an OpenAI key. When the configuration is saved, the node also runs a minimal API probe against the selected model and surfaces any provider error (invalid key, missing model, quota exceeded, etc.) as a warning. The probe is skipped while the key is still blank.
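
The two checks can be sketched as below. `check_key_format` and `probe_key` are hypothetical names and the message wording is illustrative. The probe uses `client.models.get(model=...)`, a real google-genai call, as one possible "minimal probe"; that choice is an assumption, and a metadata lookup may not surface quota errors that a real generation request would.

```python
from typing import Optional


def check_key_format(api_key: Optional[str]) -> Optional[str]:
    """Return an error message if the key is obviously not a Google AI key, else None."""
    if (api_key or "").startswith("sk-"):
        return ("This looks like an OpenAI API key (it starts with 'sk-'). "
                "Use a Google AI key from https://aistudio.google.com/apikey.")
    return None


def probe_key(api_key: Optional[str], model: str) -> Optional[str]:
    """Minimal API probe run on save; returns a warning string, or None if OK or skipped."""
    if not api_key:
        return None  # blank key: the probe is skipped
    # Imported lazily so the format check works without google-genai installed.
    from google import genai

    try:
        # Assumption: fetching the model's metadata is enough of a probe.
        genai.Client(api_key=api_key).models.get(model=model)
    except Exception as exc:  # surface whatever error the provider reports
        return f"Gemini API probe failed for {model}: {exc}"
    return None
```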


Error handling

API errors are mapped to user-friendly messages: authentication failures, rate limits, billing and quota issues, safety-filter blocks, model unavailability, timeouts, and 5xx service errors each produce a clear explanation instead of a raw stack trace.
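
Mapping raw errors to readable messages can be as simple as an ordered list of substring rules. The rules and wording below are a hypothetical heuristic, not the node's actual matching logic, which is not documented; order matters, since the first matching rule wins.

```python
from typing import List, Tuple

# Hypothetical rules: (substrings to look for, message to show). First match wins.
_RULES: List[Tuple[Tuple[str, ...], str]] = [
    (("api key not valid", "401", "403", "permission_denied", "unauthenticated"),
     "Authentication failed: check the Google AI API key."),
    (("429", "resource_exhausted", "rate limit"),
     "Rate limit reached: slow down or retry later."),
    (("billing", "quota"),
     "Billing or quota problem: check your Google AI plan and usage."),
    (("safety", "blocked"),
     "The request or response was blocked by Gemini's safety filters."),
    (("404", "not_found", "model not found"),
     "The selected model is not available for this key."),
    (("timeout", "timed out", "deadline"),
     "Gemini did not respond in time."),
    (("500", "502", "503", "504", "unavailable", "internal"),
     "Gemini service error: try again shortly."),
]


def friendly_error(exc: BaseException) -> str:
    """Translate an exception into a short user-facing message."""
    text = f"{type(exc).__name__}: {exc}".lower()
    for needles, message in _RULES:
        if any(needle in text for needle in needles):
            return message
    return f"Gemini request failed: {exc}"  # fallback: still no stack trace
```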

Retry behavior: transient errors (timeouts, connection failures, HTTP 500/502/503/504, service unavailable) are retried once, with exponential backoff starting at 1 second. Because there is only one retry, the backoff in practice is a single 1-second pause, and a hung request costs at most two 30-second waits before the error is surfaced.
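
The retry policy fits in a small wrapper. In this sketch `is_transient` and `call_with_retry` are hypothetical names; errors are classified by type and by the status codes listed above, and the `sleep` parameter exists only so the delay can be observed in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

TRANSIENT_MARKERS = ("timeout", "timed out", "connection", "500", "502", "503", "504", "unavailable")


def is_transient(exc: BaseException) -> bool:
    """Heuristic: timeouts, connection failures and 5xx-style errors are worth retrying."""
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return True
    text = str(exc).lower()
    return any(marker in text for marker in TRANSIENT_MARKERS)


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 1,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= max_retries or not is_transient(exc):
                raise  # permanent error, or retries used up
            sleep(base_delay_s * (2 ** attempt))  # 1 s, then 2 s, 4 s, ... if more retries were allowed
            attempt += 1
```

With `max_retries=1` the worst case for a hung request is 30 s + 1 s + 30 s, about 61 seconds, before the error reaches the user.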




Schema

| Field | Type | Description | Default |
|---|---|---|---|
| image_vision_gemini.apikey | string | API Key: Google AI API key. Get one at https://aistudio.google.com/apikey | |
| image_vision_gemini.profile | string | Vision Model: select the Gemini vision model to use | "gemini-2_5-flash" |
| model | string | Model: Gemini Vision model | |
| modelTotalTokens | number | Tokens: maximum context length in tokens | |
| vision.prompt | string | Analysis Prompt: describe what you want to analyze or extract from the image | |
| vision.systemPrompt | string | System Instructions: define the model's role and behavior for image analysis | |

Dependencies

  • google-genai >=1.14.0