Skip to main content
View source

Mistral Vision

View as Markdown

A RocketRide node that sends images to Mistral AI's vision-capable models and returns text analysis.

What it does

Accepts either a single image (via the image lane) or a stream of image documents (via the documents lane, e.g. from a frame grabber), calls the configured Mistral vision model, and returns the model's response as text or as text documents.

Uses the official mistralai Python SDK with a custom httpx client (120 s timeout, redirects followed) to handle large image payloads. Token counting uses the mistral-common tokenizer, with per-model tokenizers loaded strictly; the v3 tokenizer is used as a fallback when no model-specific tokenizer is available.

All requests are sent with temperature 0.0 for deterministic output. Transient failures (timeouts, connection errors, 5xx responses) are retried up to 3 times with exponential backoff. The base delay scales with model size: 2.0 s for large models, 1.5 s for medium, 1.0 s for all others. API errors are translated into user-friendly messages covering authentication failures, rate limits, quota exhaustion, content-policy violations, and image-processing errors.


Configuration

Lanes

Lane inLane outDescription
imagetextAnalyze a single image, receive text
documentsdocumentsAnalyze image documents, return text analysis with original metadata preserved

On the image lane, incoming image bytes are buffered across chunks, base64-encoded with the source MIME type, and sent to the model together with the configured analysis prompt. The answer is written downstream as text.

On the documents lane, only documents of type Image are processed. Documents with a different type, or an Image document with empty content, are skipped with a warning. The document's page_content is expected to be base64-encoded PNG (the frame grabber always outputs PNG). Each answer is emitted as a Text document that preserves the original metadata (chunkId, time_stamp, etc.). If inference fails for a chunk, a warning is logged and processing continues with the next document.

Fields

FieldTypeDescription
modelstringMistral Vision model
modelTotalTokensnumberMaximum context length in tokens
systemPromptstringDefine the model's role and behavior for image analysis
promptstringDescribe what you want to analyze or extract from the image
profilestringDefault "mistral-large-3". Select the Mistral vision model to use

Profiles

Profile keyTitleModelContext tokens
mistral-large-3 (default)Mistral Large 3 - Premier Visionmistral-large-2512256,000
mistral-medium-3.1Mistral Medium 3.1 - Balanced Visionmistral-medium-2508128,000
mistral-small-3.2Mistral Small 3.2 - Fast & Cheap Visionmistral-small-2506128,000
ministral-14b-3Ministral 3 14B - High Performance Visionministral-14b-2512256,000
ministral-8b-3Ministral 3 8B - Balanced Visionministral-8b-2512256,000
ministral-3b-3Ministral 3 3B - Efficient Visionministral-3b-2512256,000

Image input

The node accepts images in the following formats:

  • HTTP(S) URL: passed to the Mistral API as-is.
  • Data URI (data:image/... or data:application/...): passed as-is.
  • Local file path: read from disk, base64-encoded, and sent as a data URI. Files over 10 MB are rejected. MIME type is inferred from the file extension (.jpg/.jpeg -> image/jpeg, .png -> image/png, .gif -> image/gif, .webp -> image/webp; any unrecognized extension defaults to image/jpeg).

Authentication

Provide a Mistral AI API key in the apikey field for the selected profile. The node validates the key format at startup: if you supply an OpenAI key (starting with sk-) or a Google AI/Gemini key (starting with AI), initialization fails immediately with a specific error message pointing to the wrong provider.

See the Mistral vision documentation for model capabilities and upstream limits.


Schema

FieldTypeDescriptionDefault
image_vision_mistral.profilestringVision Model
Select the Mistral vision model to use
"mistral-large-3"
modelstringModel
Mistral Vision model
modelTotalTokensnumberTokens
Maximum context length in tokens
vision.promptstringAnalysis Prompt
Describe what you want to analyze or extract from the image
vision.systemPromptstringSystem Instructions
Define the model's role and behavior for image analysis

Dependencies

  • mistralai
  • mistral-common[sentencepiece]