OpenAI Vision
A RocketRide filter node that sends images to OpenAI's vision-capable models and returns the text analysis.
What it does
Accepts single image frames or streams of image documents and calls OpenAI's Chat Completions API with each image encoded as a base64 data URL (with detail: "auto"), returning the model's text response. Supports the GPT-4.1 and GPT-4o model families for use cases including image analysis, OCR, visual understanding, and scene description.
Uses the official openai Python SDK (>=2.38.0). If no analysis prompt is configured, the node defaults to "Describe this image in detail.".
Each API call runs in a daemon thread with a 30-second hard timeout and is retried once on retryable errors (rate limits, connection errors, timeouts, 5xx responses). Rate-limit retries honor the retry-after response header (default 60 s); other retries use exponential backoff starting at 1 s. A fresh HTTP client is created per attempt to avoid exhausting the connection pool from a prior timed-out attempt. API errors are translated to user-friendly messages covering authentication failure, rate limits, quota or billing issues, invalid input, model not found, timeout, and server unavailability.
When both lanes carry the same frame, the node makes only one API call per frame: the first lane to process the frame caches the answer, and the second lane reuses it. The cache is cleared at the start of each new frame.
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
image | text | Analyze a single image frame and emit the model's text response |
documents | documents | Analyze image documents and emit text analysis with original metadata preserved |
On the documents lane, each incoming Image document is replaced by a Text document containing the model's answer; the original metadata (frame number, timestamp, chunk id) is carried over. The original Image documents do not flow downstream. Documents with a type other than Image or with empty content are skipped with a warning. Image document content is expected to be base64-encoded PNG: all Image document producers (frame_grabber, thumbnail, embedding_image) normalize to PNG.
If inference fails for a document after retries, the node logs a warning and continues with the next document. On the image lane, a failure logs a warning and emits nothing for that frame. Empty image frames on the image lane are also skipped with a warning.
Fields
| Field | Type | Description |
|---|---|---|
apikey | string | OpenAI API key. Get one at https://platform.openai.com/api-keys |
model | string | OpenAI Vision model |
modelTotalTokens | number | Maximum context length in tokens |
systemPrompt | string | Define the model's role and behavior for image analysis |
prompt | string | Describe what you want to analyze or extract from the image |
profile | string | Default "openai-4-1". Select the OpenAI vision model to use |
The selected profile supplies the model identifier and modelTotalTokens context limit. The API key, system prompt, and analysis prompt are configured per profile.
Profiles
| Profile | Model | Context (tokens) |
|---|---|---|
openai-4-1 (default) | gpt-4.1 | 1,047,576 |
openai-4-1-mini | gpt-4.1-mini | 1,047,576 |
openai-4-1-nano | gpt-4.1-nano | 1,047,576 |
openai-4o | gpt-4o | 128,000 |
openai-4o-mini | gpt-4o-mini | 128,000 |
Authentication
Provide an OpenAI API key in image_vision_openai.apikey. The key is validated at pipeline start: it must be present and must begin with sk-, otherwise the node raises a configuration error before any image is processed.
Upstream references:
Schema
| Field | Type | Description | Default |
|---|---|---|---|
image_vision_openai.apikey | string | API Key OpenAI API key. Get one at https://platform.openai.com/api-keys | |
image_vision_openai.profile | string | Vision Model Select the OpenAI vision model to use | "openai-4-1" |
model | string | Model OpenAI Vision model | |
modelTotalTokens | number | Tokens Maximum context length in tokens | |
vision.prompt | string | Analysis Prompt Describe what you want to analyze or extract from the image | |
vision.systemPrompt | string | System Instructions Define the model's role and behavior for image analysis |
Dependencies
openai>=2.38.0