Mistral Vision

A RocketRide node that sends images to Mistral AI's vision-capable models and returns text analysis.

What it does

Accepts either a single image (via the image lane) or a stream of image documents (via the documents lane, e.g. from a frame grabber), calls the configured Mistral vision model, and returns the model's response as text or as text documents.

Uses the official mistralai Python SDK with a custom httpx client (120 s timeout, redirects followed) to handle large image payloads. Token counting uses the mistral-common tokenizer, with per-model tokenizers loaded strictly; the v3 tokenizer is used as a fallback when no model-specific tokenizer is available.

All requests are sent with temperature 0.0 for deterministic output. Transient failures (timeouts, connection errors, 5xx responses) are retried up to 3 times with exponential backoff. The base delay scales with model size: 2.0 s for large models, 1.5 s for medium, 1.0 s for all others. API errors are translated into user-friendly messages covering authentication failures, rate limits, quota exhaustion, content-policy violations, and image-processing errors.

Configuration

Lanes

Lane in	Lane out	Description
`image`	`text`	Analyze a single image, receive text
`documents`	`documents`	Analyze image documents, return text analysis with original metadata preserved

On the image lane, incoming image bytes are buffered across chunks, base64-encoded with the source MIME type, and sent to the model together with the configured analysis prompt. The answer is written downstream as text.

On the documents lane, only documents of type Image are processed. Documents with a different type, or an Image document with empty content, are skipped with a warning. The document's page_content is expected to be base64-encoded PNG (the frame grabber always outputs PNG). Each answer is emitted as a Text document that preserves the original metadata (chunkId, time_stamp, etc.). If inference fails for a chunk, a warning is logged and processing continues with the next document.

Fields

Field	Type	Description
`model`	string	Mistral Vision model
`modelTotalTokens`	number	Maximum context length in tokens
`systemPrompt`	string	Define the model's role and behavior for image analysis
`prompt`	string	Describe what you want to analyze or extract from the image
`profile`	string	Default "mistral-large-3". Select the Mistral vision model to use

Profiles

Profile key	Title	Model	Context tokens
`mistral-large-3` (default)	Mistral Large 3 - Premier Vision	`mistral-large-2512`	256,000
`mistral-medium-3.1`	Mistral Medium 3.1 - Balanced Vision	`mistral-medium-2508`	128,000
`mistral-small-3.2`	Mistral Small 3.2 - Fast & Cheap Vision	`mistral-small-2506`	128,000
`ministral-14b-3`	Ministral 3 14B - High Performance Vision	`ministral-14b-2512`	256,000
`ministral-8b-3`	Ministral 3 8B - Balanced Vision	`ministral-8b-2512`	256,000
`ministral-3b-3`	Ministral 3 3B - Efficient Vision	`ministral-3b-2512`	256,000

Image input

The node accepts images in the following formats:

HTTP(S) URL: passed to the Mistral API as-is.
Data URI (data:image/... or data:application/...): passed as-is.
Local file path: read from disk, base64-encoded, and sent as a data URI. Files over 10 MB are rejected. MIME type is inferred from the file extension (.jpg/.jpeg -> image/jpeg, .png -> image/png, .gif -> image/gif, .webp -> image/webp; any unrecognized extension defaults to image/jpeg).

Authentication

Provide a Mistral AI API key in the apikey field for the selected profile. The node validates the key format at startup: if you supply an OpenAI key (starting with sk-) or a Google AI/Gemini key (starting with AI), initialization fails immediately with a specific error message pointing to the wrong provider.

See the Mistral vision documentation for model capabilities and upstream limits.

Schema

Field	Type	Description	Default
`image_vision_mistral.profile`	`string`	Vision Model Select the Mistral vision model to use	`"mistral-large-3"`
`model`	`string`	Model Mistral Vision model
`modelTotalTokens`	`number`	Tokens Maximum context length in tokens
`vision.prompt`	`string`	Analysis Prompt Describe what you want to analyze or extract from the image
`vision.systemPrompt`	`string`	System Instructions Define the model's role and behavior for image analysis

Dependencies

mistralai
mistral-common[sentencepiece]

What it does​

Configuration​

Lanes​

Fields​

Profiles​

Image input​

Authentication​

Schema​

Dependencies​