Accessibility Describe

A RocketRide image-filter node that turns an image into a structured scene description optimized for blind and visually impaired users.

What it does

Receives an image on its input lane, sends it to Google Gemini Vision (via the google-genai SDK), and emits a text description designed for assistive use, for example, real-time narration on smart glasses. The description covers the environment type, hazards with their positions, key objects, visible text read verbatim (OCR), people, and navigation guidance. The default prompt asks the model to keep it under 150 words.

The node buffers the incoming image stream, base64-encodes it as a data URL, and sends it together with the analysis prompt in a single generate_content call. Requests run with temperature: 0.3 and max_output_tokens: 1024. Transient failures (timeouts, connection errors, 5xx responses) are retried up to 3 times with exponential backoff. API errors are translated into user-friendly messages covering authentication, rate limiting, safety blocks, model unavailability, and timeouts.
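The buffering, encoding, retry, and error-translation steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the node's source. The names `to_data_url`, `TransientError`, `call_with_retry`, and `friendly_error` are hypothetical. "Up to 3 times" is read as one initial attempt plus three retries. The 1-second base delay is assumed. The keyword rules and message wording are invented for illustration.

```python
import base64
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def to_data_url(chunks: Iterable[bytes], mime_type: str) -> str:
    """Join buffered image chunks and encode them as a base64 data URL."""
    data = b"".join(chunks)
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, connection errors, 5xx)."""


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,  # assumed value, not taken from the node
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry transient failures up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Exponential backoff: 1s, 2s, 4s with the default base_delay.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")  # the loop always returns or raises


# Hypothetical keyword rules; the node's real matching and wording may differ.
_ERROR_RULES = [
    (("401", "403", "api key"), "Authentication failed: check your Google AI API key."),
    (("429", "quota", "rate limit", "resource_exhausted"), "Rate limit reached: wait a moment and try again."),
    (("safety", "blocked"), "The image was blocked by the model's safety filters."),
    (("404", "not found", "unavailable"), "The selected Gemini model is currently unavailable."),
    (("timeout", "timed out", "deadline"), "The request timed out: please try again."),
]


def friendly_error(message: str) -> str:
    """Translate a raw API error message into a user-facing explanation."""
    text = message.lower()
    for keywords, friendly in _ERROR_RULES:
        if any(keyword in text for keyword in keywords):
            return friendly
    return f"Image description failed: {message}"
```

Passing `sleep` in as a parameter keeps the backoff schedule testable without real delays.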

The default model is gemini-2.5-flash. A Google AI API key is required: the node fails at startup without one, and rejects keys starting with sk- (OpenAI keys) with a clear error.


Configuration

Lanes

| Lane | Direction | Description |
|---|---|---|
| image | input | The image to describe (streamed; any image MIME type) |
| text | output | The accessibility-optimized scene description |

Fields

| Field | Type | Description |
|---|---|---|
| model | string | Google Gemini vision model |
| modelTotalTokens | number | Maximum context length in tokens |
| systemPrompt | string | Define the accessibility description behavior and priorities. Default: "You are an accessibility-focused scene analyzer designed to help blind and visually impaired users understand their surroundings through image descriptions." |
| prompt | string | Prompt template for generating accessibility descriptions from images. Default: "Describe this image for a blind person. Include: environment type, hazards with positions, key objects with clock positions, visible text, people, and navigation guidance. Keep under 150 words." |
| prioritizeHazards | string | How aggressively to prioritize hazard detection. Default: "high" |
| spatialFormat | string | How to describe spatial positions. Default: "clock" |
| profile | string | Select the Gemini vision model for accessibility descriptions. Default: "gemini-2.5-flash" |

If accessibility.systemPrompt or accessibility.prompt is left empty, the node falls back to the generic systemPrompt or prompt config value, respectively, and then to its built-in default.
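A minimal sketch of this fallback chain follows. It assumes the configuration arrives as a flat mapping with dotted keys and that whitespace-only values count as empty. `resolve_prompt` is a hypothetical helper, not part of the node's documented interface; the two default strings are the documented built-in defaults.

```python
from typing import Any, Mapping

# Built-in defaults, copied from the documented field defaults.
DEFAULT_SYSTEM_PROMPT = (
    "You are an accessibility-focused scene analyzer designed to help blind and "
    "visually impaired users understand their surroundings through image descriptions."
)
DEFAULT_PROMPT = (
    "Describe this image for a blind person. Include: environment type, hazards with "
    "positions, key objects with clock positions, visible text, people, and navigation "
    "guidance. Keep under 150 words."
)


def resolve_prompt(config: Mapping[str, Any], key: str, builtin: str) -> str:
    """Return accessibility.<key>, else the generic <key>, else the built-in default."""
    for candidate in (config.get(f"accessibility.{key}"), config.get(key)):
        # Treat missing, non-string, and blank values as "left empty" (an assumption).
        if isinstance(candidate, str) and candidate.strip():
            return candidate
    return builtin


# Example: the accessibility-specific prompt is empty, so the generic one wins.
config = {"accessibility.prompt": "", "prompt": "Describe the room briefly."}
system_prompt = resolve_prompt(config, "systemPrompt", DEFAULT_SYSTEM_PROMPT)
prompt = resolve_prompt(config, "prompt", DEFAULT_PROMPT)
```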

Hazard priority

| Value | Effect |
|---|---|
| high (default) | The model must lead with hazards; if none exist, it explicitly states that the area appears safe |
| medium | Hazards are included in their spatial context when present |
| low | Standard description order, no extra hazard emphasis |

Spatial format

| Value | Effect |
|---|---|
| clock (default) | Clock positions (12 o'clock = straight ahead) |
| relative | Relative directions (left, right, ahead, behind) |
| both | Both clock positions and relative directions |

Both settings are applied as modifiers appended to the system prompt at runtime.
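One way to picture this is a lookup from each setting to a sentence appended to the base system prompt. This is a sketch only. The modifier wording below is invented, since the node's actual text is not documented here. `build_system_prompt` and the fallback to defaults for unknown values are also assumptions.

```python
# Hypothetical modifier wording; the node's actual text is not documented here.
HAZARD_MODIFIERS = {
    "high": "Always describe hazards first. If there are none, state explicitly that the area appears safe.",
    "medium": "When hazards are present, include them in their spatial context.",
    "low": "",  # standard order, no extra hazard emphasis
}
SPATIAL_MODIFIERS = {
    "clock": "Give positions as clock directions, where 12 o'clock is straight ahead.",
    "relative": "Give positions as relative directions: left, right, ahead, behind.",
    "both": "Give positions as clock directions (12 o'clock is straight ahead) and as relative directions.",
}


def build_system_prompt(base: str, prioritize_hazards: str = "high", spatial_format: str = "clock") -> str:
    """Append the hazard and spatial modifiers to the base system prompt."""
    # Unknown values fall back to the documented defaults (an assumption).
    hazard = HAZARD_MODIFIERS.get(prioritize_hazards, HAZARD_MODIFIERS["high"])
    spatial = SPATIAL_MODIFIERS.get(spatial_format, SPATIAL_MODIFIERS["clock"])
    # Skip empty modifiers so "low" adds nothing to the prompt.
    return "\n\n".join(part for part in (base, hazard, spatial) if part)
```

Keeping the modifiers in the system prompt, rather than in the per-image prompt, means a customized Analysis Prompt still inherits the hazard and spatial behavior.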


Profiles

| Profile | Model | Notes |
|---|---|---|
| gemini-2.5-flash (default) | gemini-2.5-flash | Fast and efficient, suitable for real-time use |
| gemini-2.5-pro | gemini-2.5-pro | Highest quality |
| gemini-2.0-flash | gemini-2.0-flash | Balanced |

All profiles use a 1M (1,048,576) token context window.
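The profile table can be modeled as a lookup from profile name to the values that populate the model and modelTotalTokens fields. This sketch is not the node's code. The `Profile` dataclass, `resolve_profile`, and the choice to reject unknown names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

CONTEXT_WINDOW = 1_048_576  # 1M tokens, shared by every profile


@dataclass(frozen=True)
class Profile:
    model: str          # value for the model field
    total_tokens: int   # value for the modelTotalTokens field
    notes: str


PROFILES = {
    "gemini-2.5-flash": Profile("gemini-2.5-flash", CONTEXT_WINDOW, "Fast and efficient, suitable for real-time use"),
    "gemini-2.5-pro": Profile("gemini-2.5-pro", CONTEXT_WINDOW, "Highest quality"),
    "gemini-2.0-flash": Profile("gemini-2.0-flash", CONTEXT_WINDOW, "Balanced"),
}
DEFAULT_PROFILE = "gemini-2.5-flash"


def resolve_profile(name: Optional[str] = None) -> Profile:
    """Look up a profile by name, using the default when none is given."""
    key = name or DEFAULT_PROFILE
    if key not in PROFILES:
        raise ValueError(f"Unknown profile {key!r}; expected one of {sorted(PROFILES)}")
    return PROFILES[key]
```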


Default output structure

1. ENVIRONMENT - type of place
2. HAZARDS - obstacles, stairs, vehicles (with positions)
3. KEY OBJECTS - notable items with clock positions and distances
4. TEXT - any visible text read verbatim
5. PEOPLE - count, positions, and actions
6. NAVIGATION - clear path forward, turns, or barriers

Customize the Analysis Prompt (accessibility.prompt) field to change this structure.


Authentication

The node requires a Google AI API key. Get one at aistudio.google.com/apikey and enter it in the node's API key field. The key is validated at startup: a missing key raises an error immediately, and a key with the sk- prefix is rejected as an OpenAI key.
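The two startup checks amount to a fail-fast guard. The sketch below is an illustration, not the node's implementation. `validate_api_key` is a hypothetical name, the error wording is invented, and stripping surrounding whitespace is an assumption.

```python
from typing import Optional


def validate_api_key(api_key: Optional[str]) -> str:
    """Sketch of the startup checks: reject a missing key or an OpenAI-style key."""
    key = (api_key or "").strip()  # stripping whitespace is an assumption
    if not key:
        raise ValueError(
            "A Google AI API key is required. Get one at https://aistudio.google.com/apikey"
        )
    if key.startswith("sk-"):
        raise ValueError(
            "This looks like an OpenAI key (sk- prefix). Enter a Google AI API key instead."
        )
    return key
```

Checking at startup, rather than on the first image, surfaces a misconfigured key before any narration is expected.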


Schema

| Field | Type | Label | Description | Default |
|---|---|---|---|---|
| accessibility.prioritizeHazards | string | Hazard Priority | How aggressively to prioritize hazard detection | "high" |
| accessibility.prompt | string | Analysis Prompt | Prompt template for generating accessibility descriptions from images | "Describe this image for a blind person. Include: environment type, hazards with positions, key objects with clock positions, visible text, people, and navigation guidance. Keep under 150 words." |
| accessibility.spatialFormat | string | Spatial Format | How to describe spatial positions | "clock" |
| accessibility.systemPrompt | string | System Instructions | Define the accessibility description behavior and priorities | "You are an accessibility-focused scene analyzer designed to help blind and visually impaired users understand their surroundings through image descriptions." |
| accessibility_describe.profile | string | Vision Model | Select the Gemini vision model for accessibility descriptions | "gemini-2.5-flash" |
| model | string | Model | Google Gemini vision model | |
| modelTotalTokens | number | Tokens | Maximum context length in tokens | |

Dependencies

  • google-genai >=1.14.0