Ollama Vision

A RocketRide image node that sends images to locally-hosted Ollama vision models and returns text analysis.

What it does

Connects to open-source multimodal models (Llama 3.2 Vision, LLaVA, Moondream, MiniCPM-V, Qwen 2.5 VL, or any custom model) served by a local Ollama instance. No API key is required: models run entirely on your own hardware, making the node suitable for privacy-sensitive workloads. It accepts either a single image or a stream of image documents (e.g. from a frame grabber); metadata such as frame number and timestamp is preserved on the documents output.

The node uses langchain-openai (ChatOpenAI) against Ollama's OpenAI-compatible /v1 endpoint. The configured server base URL is normalized to end with /v1 automatically. A placeholder API key ("ollama") is sent because the OpenAI client requires one; Ollama ignores its value. Requests run with temperature 0.
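As a rough illustration of the URL normalization and client settings described above, the following sketch shows one way they could fit together. The helper names normalize_base_url and make_client, and the example address http://localhost:11434 (Ollama's default port), are assumptions for illustration and are not taken from the node's source.

```python
def normalize_base_url(server_base: str) -> str:
    """Return server_base ending in /v1, without a trailing slash."""
    base = server_base.strip().rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base


def make_client(server_base: str, model: str):
    """Build a ChatOpenAI client pointed at Ollama (hypothetical helper)."""
    # Imported lazily so the URL helper works without langchain-openai installed.
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(
        model=model,
        base_url=normalize_base_url(server_base),
        api_key="ollama",  # placeholder value; Ollama ignores it
        temperature=0,
    )
```

With these helpers, make_client("http://localhost:11434", "llama3.2-vision:11b") would target http://localhost:11434/v1.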

Each inference attempt is capped by a 30-second hard timeout. An attempt that times out or fails with a retryable error is retried once with exponential backoff. A fresh HTTP client is created per attempt so that a hung request cannot exhaust the connection pool. API errors are translated into actionable user-facing messages (see Troubleshooting).
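The timeout-and-retry behavior can be sketched roughly as follows. This is a minimal approximation under stated assumptions, not the node's implementation: call_with_retry, its parameters, and the choice of ConnectionError as the "retryable" error are all hypothetical. The make_call callback is expected to build its own client on every attempt.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_retry(make_call, timeout_s=30.0, retries=1,
                    base_delay_s=1.0, sleep=time.sleep):
    """Run make_call(attempt) with a per-attempt timeout and backoff."""
    last_error = None
    for attempt in range(retries + 1):
        # One executor per attempt, so a hung call never blocks the next one.
        executor = ThreadPoolExecutor(max_workers=1)
        try:
            future = executor.submit(make_call, attempt)
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            last_error = TimeoutError(
                f"attempt {attempt + 1} exceeded {timeout_s}s")
        except ConnectionError as exc:  # assumed to be retryable
            last_error = exc
        finally:
            # Do not wait for a possibly hung worker thread.
            executor.shutdown(wait=False)
        if attempt < retries:
            sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    raise last_error
```

Note that Python cannot forcibly stop a worker thread, so a timed-out call in this sketch is abandoned rather than cancelled.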


Configuration

Lanes

| Lane in | Lane out | Description |
|---|---|---|
| image | text | Analyze a single image, receive text |
| documents | documents | Analyze image documents, return text analysis with original metadata preserved |

Image to text

Raw image bytes arrive in chunks over the AVI protocol, are accumulated, encoded as a base64 data URL with the incoming MIME type, and sent to the model together with the configured analysis prompt. The model's answer is written to the text lane.
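The accumulate-and-encode step could look something like the sketch below. The function names are hypothetical; the message layout follows the OpenAI-style content-part format (a text part plus an image_url part), which is assumed here to be what the node sends through ChatOpenAI.

```python
import base64
from typing import Iterable, List


def build_data_url(chunks: Iterable[bytes], mime_type: str) -> str:
    """Join raw image chunks and encode them as a base64 data URL."""
    raw = b"".join(chunks)
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_user_content(prompt: str, data_url: str) -> List[dict]:
    """Pair the analysis prompt with the image in OpenAI-style content parts."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": data_url}},
    ]
```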

Documents to documents

Each incoming Doc of type Image (its page_content is base64-encoded PNG, since the frame grabber always outputs PNG) is analyzed individually. The answer is emitted as a Text Doc that preserves the original document metadata (chunkId, time_stamp, etc.). Non-Image documents and Image documents with empty content are skipped with a warning; a per-document inference failure is logged and skipped rather than failing the batch. The original image documents do not flow downstream.
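The per-document handling described above can be sketched as a simple loop. The Doc dataclass below is a hypothetical stand-in for the pipeline's document type, and analyze is an assumed callback that takes the base64 PNG content and returns the model's answer.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

log = logging.getLogger("vision_sketch")


@dataclass
class Doc:
    """Hypothetical stand-in for the pipeline's document type."""
    type: str
    page_content: str
    metadata: dict = field(default_factory=dict)


def analyze_documents(docs: Iterable[Doc],
                      analyze: Callable[[str], str]) -> List[Doc]:
    """Turn Image docs into Text docs, skipping anything unusable."""
    results = []
    for doc in docs:
        if doc.type != "Image" or not doc.page_content:
            log.warning("skipping non-image or empty document")
            continue
        try:
            answer = analyze(doc.page_content)
        except Exception as exc:  # one bad document must not fail the batch
            log.error("inference failed, skipping document: %s", exc)
            continue
        # Copy the metadata so chunkId, time_stamp, etc. survive.
        results.append(Doc("Text", answer, dict(doc.metadata)))
    return results
```

Only the new Text documents are returned, which mirrors the statement that the original image documents do not flow downstream.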

If no analysis prompt is configured, the question text from the request is used; if that is also empty, the prompt defaults to "Describe this image."
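The fallback order for the prompt amounts to a short chain of checks. In this sketch, resolve_prompt is a hypothetical name, and treating whitespace-only strings as empty is an assumption rather than documented behavior.

```python
from typing import Optional

DEFAULT_PROMPT = "Describe this image."


def resolve_prompt(configured: Optional[str], question: Optional[str]) -> str:
    """Pick the configured prompt, then the request question, then the default."""
    for candidate in (configured, question):
        # Assumption: whitespace-only text counts as "not set".
        if candidate and candidate.strip():
            return candidate
    return DEFAULT_PROMPT
```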

Fields

The node is configured by selecting a profile in the Vision Model field (image_vision_ollama.profile, default llama3_2-vision-11b). All profiles expose the same connection and prompt fields; the Custom profile additionally exposes the model name and token limit.

| Field | Type | Description |
|---|---|---|
| model | string | Ollama vision model name |
| modelTotalTokens | number | Total Tokens |
| systemPrompt | string | Define the model's role and behavior for image analysis |
| prompt | string | Describe what you want to analyze or extract from the image |
| profile | string | Default "llama3_2-vision-11b". Select the Ollama vision model to use |

Profiles

| Profile | Model | Context tokens |
|---|---|---|
| Llama 3.2 Vision 11B (default) | llama3.2-vision:11b | 128,000 |
| Llama 3.2 Vision 90B | llama3.2-vision:90b | 128,000 |
| Qwen 2.5 VL 3B | qwen2.5vl:3b | 128,000 |
| Qwen 2.5 VL 7B | qwen2.5vl:7b | 128,000 |
| LLaVA 7B | llava:7b | 32,768 |
| LLaVA 13B | llava:13b | 4,096 |
| LLaVA 34B | llava:34b | 4,096 |
| MiniCPM-V | minicpm-v | 8,192 |
| Moondream 2 | moondream | 2,048 |
| Custom | (user-specified) | configurable |

See the Ollama model library for available models. The selected model must be pulled into Ollama before use (ollama pull <model>).
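To check ahead of time whether a model has been pulled, one option is to query Ollama's native /api/tags endpoint, which lists local models. The sketch below is an assumption-laden illustration and not part of the node: the function names are hypothetical, and server_base here is the server root (for example http://localhost:11434), not the /v1 URL.

```python
import json
from urllib.request import urlopen


def model_is_pulled(tags_payload: dict, model: str) -> bool:
    """Check a parsed /api/tags response for the given model name."""
    # Ollama lists models with an explicit tag; a bare name means ":latest".
    wanted = model if ":" in model else f"{model}:latest"
    names = {m.get("name", "") for m in tags_payload.get("models", [])}
    return wanted in names


def fetch_tags(server_base: str = "http://localhost:11434",
               timeout_s: float = 5.0) -> dict:
    """Fetch the list of locally available models from a running Ollama."""
    url = f"{server_base.rstrip('/')}/api/tags"
    with urlopen(url, timeout=timeout_s) as resp:
        return json.load(resp)
```

For example, model_is_pulled(fetch_tags(), "llava:7b") would report whether llava:7b is available on a default local install.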


Troubleshooting

The node maps common API failures to clear messages:

| Symptom | Meaning / fix |
|---|---|
| "Cannot connect to Ollama server" | Ollama is not running, or serverbase points to the wrong host/port |
| "Model '...' is not loaded in Ollama" | Pull the model first: ollama pull <model> |
| "Too many requests to Ollama" | Rate limited: wait a moment and retry |
| "Ollama returned a server error" | Check the Ollama server logs |
| "Vision request timed out" | Inference exceeded the 30 s hard timeout; large models may need a warm-up run or a smaller model |
| "Image processing error" | Use a supported image format: JPEG, PNG, GIF, WEBP |
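A translation layer of this kind might be approximated as below. This is only a sketch: the source does not say which conditions trigger each message, so the HTTP status codes, the use of None for a failed connection, the friendly_message name, and the fallback text are all assumptions.

```python
from typing import Optional


def friendly_message(status_code: Optional[int], model: str = "...") -> str:
    """Map an assumed failure condition to a user-facing message."""
    if status_code is None:  # assumption: no HTTP response at all
        return "Cannot connect to Ollama server"
    if status_code == 404:  # assumption: unknown model
        return f"Model '{model}' is not loaded in Ollama"
    if status_code == 429:  # assumption: rate limiting
        return "Too many requests to Ollama"
    if 500 <= status_code < 600:
        return "Ollama returned a server error"
    # Hypothetical fallback; not one of the documented messages.
    return f"Unexpected Ollama error (HTTP {status_code})"
```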

Schema

| Field | Type | Description | Default |
|---|---|---|---|
| image_vision_ollama.profile | string | Vision Model: Select the Ollama vision model to use | "llama3_2-vision-11b" |
| model | string | Model: Ollama vision model name | |
| modelTotalTokens | number | Tokens: Total Tokens | |
| vision.prompt | string | Analysis Prompt: Describe what you want to analyze or extract from the image | |
| vision.systemPrompt | string | System Instructions: Define the model's role and behavior for image analysis | |

Dependencies

  • langchain-openai
  • langchain-core
  • langchain