OpenAI Vision

A RocketRide filter node that sends images to OpenAI's vision-capable models and returns the text analysis.

What it does

Accepts single image frames or streams of image documents and calls OpenAI's Chat Completions API with each image encoded as a base64 data URL (with detail: "auto"), returning the model's text response. Supports the GPT-4.1 and GPT-4o model families for use cases including image analysis, OCR, visual understanding, and scene description.

Uses the official openai Python SDK (>=2.38.0). If no analysis prompt is configured, the node defaults to "Describe this image in detail.".

Each API call runs in a daemon thread with a 30-second hard timeout and is retried once on retryable errors (rate limits, connection errors, timeouts, 5xx responses). Rate-limit retries honor the retry-after response header (default 60 s); other retries use exponential backoff starting at 1 s. A fresh HTTP client is created per attempt to avoid exhausting the connection pool from a prior timed-out attempt. API errors are translated to user-friendly messages covering authentication failure, rate limits, quota or billing issues, invalid input, model not found, timeout, and server unavailability.

When both lanes carry the same frame, the node makes only one API call per frame: the first lane to process the frame caches the answer, and the second lane reuses it. The cache is cleared at the start of each new frame.

Configuration

Lanes

Lane in	Lane out	Description
`image`	`text`	Analyze a single image frame and emit the model's text response
`documents`	`documents`	Analyze image documents and emit text analysis with original metadata preserved

On the documents lane, each incoming Image document is replaced by a Text document containing the model's answer; the original metadata (frame number, timestamp, chunk id) is carried over. The original Image documents do not flow downstream. Documents with a type other than Image or with empty content are skipped with a warning. Image document content is expected to be base64-encoded PNG: all Image document producers (frame_grabber, thumbnail, embedding_image) normalize to PNG.

If inference fails for a document after retries, the node logs a warning and continues with the next document. On the image lane, a failure logs a warning and emits nothing for that frame. Empty image frames on the image lane are also skipped with a warning.

Fields

Field	Type	Description
`apikey`	string	OpenAI API key. Get one at https://platform.openai.com/api-keys
`model`	string	OpenAI Vision model
`modelTotalTokens`	number	Maximum context length in tokens
`systemPrompt`	string	Define the model's role and behavior for image analysis
`prompt`	string	Describe what you want to analyze or extract from the image
`profile`	string	Default "openai-4-1". Select the OpenAI vision model to use

The selected profile supplies the model identifier and modelTotalTokens context limit. The API key, system prompt, and analysis prompt are configured per profile.

Profiles

Profile	Model	Context (tokens)
`openai-4-1` (default)	`gpt-4.1`	1,047,576
`openai-4-1-mini`	`gpt-4.1-mini`	1,047,576
`openai-4-1-nano`	`gpt-4.1-nano`	1,047,576
`openai-4o`	`gpt-4o`	128,000
`openai-4o-mini`	`gpt-4o-mini`	128,000

Authentication

Provide an OpenAI API key in image_vision_openai.apikey. The key is validated at pipeline start: it must be present and must begin with sk-, otherwise the node raises a configuration error before any image is processed.

Upstream references:

Schema

Field	Type	Description	Default
`image_vision_openai.apikey`	`string`	API Key OpenAI API key. Get one at https://platform.openai.com/api-keys
`image_vision_openai.profile`	`string`	Vision Model Select the OpenAI vision model to use	`"openai-4-1"`
`model`	`string`	Model OpenAI Vision model
`modelTotalTokens`	`number`	Tokens Maximum context length in tokens
`vision.prompt`	`string`	Analysis Prompt Describe what you want to analyze or extract from the image
`vision.systemPrompt`	`string`	System Instructions Define the model's role and behavior for image analysis

Dependencies

openai >=2.38.0

What it does​

Configuration​

Lanes​

Fields​

Profiles​

Authentication​

Schema​

Dependencies​