Mistral Vision
A RocketRide node that sends images to Mistral AI's vision-capable models and returns text analysis.
What it does
Accepts either a single image (via the image lane) or a stream of image documents (via the documents lane, e.g. from a frame grabber), calls the configured Mistral vision model, and returns the model's response as text or as text documents.
Uses the official mistralai Python SDK with a custom httpx client (120 s timeout, redirects followed) to handle large image payloads. Token counting uses the mistral-common tokenizer, with per-model tokenizers loaded strictly; the v3 tokenizer is used as a fallback when no model-specific tokenizer is available.
All requests are sent with temperature 0.0 for deterministic output. Transient failures (timeouts, connection errors, 5xx responses) are retried up to 3 times with exponential backoff. The base delay scales with model size: 2.0 s for large models, 1.5 s for medium, 1.0 s for all others. API errors are translated into user-friendly messages covering authentication failures, rate limits, quota exhaustion, content-policy violations, and image-processing errors.
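The retry behavior described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the `TransientError` class, the `base_delay` and `call_with_retry` helpers, the substring match on the model name, and the doubling factor are all assumptions. Only the three base delays and the limit of 3 retries come from the text.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for timeouts, connection errors, and 5xx responses."""


def base_delay(model: str) -> float:
    # Base delays from the description above; matching on the model name is an assumption.
    name = model.lower()
    if "large" in name:
        return 2.0
    if "medium" in name:
        return 1.5
    return 1.0


def call_with_retry(fn, model: str, max_retries: int = 3, sleep=time.sleep):
    """Call fn(), retrying transient failures with exponential backoff.

    Assumes "retried up to 3 times" means one initial attempt plus up to
    three retries, and that the delay doubles after each failure.
    """
    delay = base_delay(model)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted; let the caller translate the error
            sleep(delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; under these assumptions a large model would wait 2 s, 4 s, and 8 s between attempts.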
Configuration
Lanes
| Lane in | Lane out | Description |
|---|---|---|
| image | text | Analyze a single image, receive text |
| documents | documents | Analyze image documents, return text analysis with original metadata preserved |
On the image lane, incoming image bytes are buffered across chunks, base64-encoded with the source MIME type, and sent to the model together with the configured analysis prompt. The answer is written downstream as text.
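As a rough sketch of the image lane flow, the snippet below buffers chunks, encodes them as a data URI, and pairs the result with the prompt. The `ImageBuffer` class and `build_user_content` helper are hypothetical names. The content-chunk shape follows the format shown in Mistral's vision documentation, but whether the node builds its request exactly this way is an assumption.

```python
import base64


class ImageBuffer:
    """Hypothetical helper: accumulates image bytes that arrive in chunks."""

    def __init__(self, mime_type: str):
        self.mime_type = mime_type
        self._data = bytearray()

    def write(self, chunk: bytes) -> None:
        # Append each incoming chunk until the image is complete.
        self._data.extend(chunk)

    def to_data_uri(self) -> str:
        # Base64-encode the full image and prefix it with the source MIME type.
        encoded = base64.b64encode(bytes(self._data)).decode("ascii")
        return f"data:{self.mime_type};base64,{encoded}"


def build_user_content(prompt: str, data_uri: str) -> list:
    # One text chunk for the analysis prompt, one image chunk for the picture.
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": data_uri},
    ]
```

A caller would invoke `write` once per chunk, then pass `to_data_uri()` and the configured prompt to `build_user_content` when the last chunk has arrived.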
On the documents lane, only documents of type Image are processed. Documents with a different type, or an Image document with empty content, are skipped with a warning. The document's page_content is expected to be base64-encoded PNG (the frame grabber always outputs PNG). Each answer is emitted as a Text document that preserves the original metadata (chunkId, time_stamp, etc.). If inference fails for a chunk, a warning is logged and processing continues with the next document.
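The filtering and error handling on the documents lane might look roughly like the sketch below. The `Doc` dataclass, the `process_documents` function, and the `analyze` callable are hypothetical stand-ins for the node's real document type and model call; only the skip rules, the PNG assumption, and the metadata preservation come from the description above.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("mistral_vision_sketch")


@dataclass
class Doc:
    """Hypothetical document shape; the real RocketRide type may differ."""
    type: str
    page_content: str
    metadata: dict = field(default_factory=dict)


def process_documents(docs, analyze):
    """Return a Text document for every Image document that analyzes cleanly."""
    results = []
    for doc in docs:
        if doc.type != "Image":
            log.warning("Skipping document of type %s", doc.type)
            continue
        if not doc.page_content:
            log.warning("Skipping Image document with empty content")
            continue
        # page_content is expected to be base64-encoded PNG already.
        data_uri = "data:image/png;base64," + doc.page_content
        try:
            answer = analyze(data_uri)
        except Exception as exc:
            # A failed chunk is logged and the loop moves on to the next document.
            log.warning("Inference failed: %s", exc)
            continue
        # Copy the metadata so chunkId, time_stamp, etc. survive unchanged.
        results.append(Doc("Text", answer, dict(doc.metadata)))
    return results
```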
Fields
| Field | Type | Description |
|---|---|---|
| model | string | Mistral Vision model |
| modelTotalTokens | number | Maximum context length in tokens |
| systemPrompt | string | Define the model's role and behavior for image analysis |
| prompt | string | Describe what you want to analyze or extract from the image |
| profile | string | Default "mistral-large-3". Select the Mistral vision model to use |
Profiles
| Profile key | Title | Model | Context tokens |
|---|---|---|---|
| mistral-large-3 (default) | Mistral Large 3 - Premier Vision | mistral-large-2512 | 256,000 |
| mistral-medium-3.1 | Mistral Medium 3.1 - Balanced Vision | mistral-medium-2508 | 128,000 |
| mistral-small-3.2 | Mistral Small 3.2 - Fast & Cheap Vision | mistral-small-2506 | 128,000 |
| ministral-14b-3 | Ministral 3 14B - High Performance Vision | ministral-14b-2512 | 256,000 |
| ministral-8b-3 | Ministral 3 8B - Balanced Vision | ministral-8b-2512 | 256,000 |
| ministral-3b-3 | Ministral 3 3B - Efficient Vision | ministral-3b-2512 | 256,000 |
Image input
The node accepts images in the following formats:
- HTTP(S) URL: passed to the Mistral API as-is.
- Data URI (data:image/... or data:application/...): passed as-is.
- Local file path: read from disk, base64-encoded, and sent as a data URI. Files over 10 MB are rejected. MIME type is inferred from the file extension (.jpg/.jpeg -> image/jpeg, .png -> image/png, .gif -> image/gif, .webp -> image/webp; any unrecognized extension defaults to image/jpeg).
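The three input forms listed above could be resolved along these lines. This is a sketch under stated assumptions: `resolve_image` and `guess_mime` are hypothetical names, 10 MB is taken as 10 * 1024 * 1024 bytes, and the extension is lowercased before lookup, none of which the description confirms.

```python
import base64
import os

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 MiB

MIME_BY_EXT = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def guess_mime(path: str) -> str:
    # Unrecognized extensions fall back to image/jpeg, as described above.
    ext = os.path.splitext(path)[1].lower()  # lowercasing is an assumption
    return MIME_BY_EXT.get(ext, "image/jpeg")


def resolve_image(source: str) -> str:
    """Return a URL or data URI suitable for the image part of a request."""
    if source.startswith(("http://", "https://")):
        return source  # HTTP(S) URLs are passed through unchanged
    if source.startswith(("data:image/", "data:application/")):
        return source  # data URIs are passed through unchanged
    # Anything else is treated as a local file path.
    if os.path.getsize(source) > MAX_FILE_BYTES:
        raise ValueError(f"Image file exceeds the 10 MB limit: {source}")
    with open(source, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    return f"data:{guess_mime(source)};base64,{encoded}"
```

Checking the file size before reading avoids loading an oversized file into memory only to reject it.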
Authentication
Provide a Mistral AI API key in the apikey field for the selected profile. The node validates the key format at startup: if you supply an OpenAI key (starting with sk-) or a Google AI/Gemini key (starting with AI), initialization fails immediately with a specific error message pointing to the wrong provider.
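A startup check of this kind can be sketched as below. The `validate_api_key` function and its error messages are illustrative assumptions, as are the whitespace stripping and the empty-key check; only the two prefixes come from the description above.

```python
def validate_api_key(key: str) -> str:
    """Reject keys that clearly belong to another provider (illustrative only)."""
    key = (key or "").strip()  # stripping whitespace is an assumption
    if not key:
        raise ValueError("No Mistral AI API key was provided.")
    if key.startswith("sk-"):
        raise ValueError(
            "This looks like an OpenAI key (starts with 'sk-'); "
            "a Mistral AI key is required."
        )
    if key.startswith("AI"):
        raise ValueError(
            "This looks like a Google AI/Gemini key (starts with 'AI'); "
            "a Mistral AI key is required."
        )
    return key
```

Failing at initialization means a misconfigured pipeline stops before any image is uploaded, instead of surfacing an opaque authentication error on the first request.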
See the Mistral vision documentation for model capabilities and upstream limits.
Schema
| Field | Type | Description | Default |
|---|---|---|---|
| image_vision_mistral.profile | string | Vision Model: Select the Mistral vision model to use | "mistral-large-3" |
| model | string | Model: Mistral Vision model | |
| modelTotalTokens | number | Tokens: Maximum context length in tokens | |
| vision.prompt | string | Analysis Prompt: Describe what you want to analyze or extract from the image | |
| vision.systemPrompt | string | System Instructions: Define the model's role and behavior for image analysis | |
Dependencies
- mistralai
- mistral-common[sentencepiece]