Envision, Arm, and Google are bringing powerful visual AI on-device

April 2, 2026

At Envision, accessibility has always meant more than just building tools for people who are blind or have low vision. It also means making those tools easier to access, easier to rely on, and available in more situations for more people.

That is why we are excited to work with Arm and Google to help bring advanced visual AI experiences directly onto smartphones.

By combining Envision’s expertise in accessible AI, the Arm compute platform with Arm Scalable Matrix Extension 2 (SME2) technology, and Google’s latest Gemma 4 model, we are making it possible to run features like scene description and visual question answering on-device. In practice, that means a user can point their phone at the world around them, get a description of what is in front of them, and ask follow-up questions, all without needing to send that data to the cloud.

As part of this collaboration, Envision had early access to Gemma 4, which enabled us to build and deploy an on-device visual understanding experience within the Envision app. This is now integrated into our scene description feature, allowing users to get rich descriptions and ask follow-up questions directly on their device, even without an internet connection.
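For readers curious what this looks like in practice, here is a minimal sketch of on-device multimodal inference on Android. It assumes a runtime like Google's MediaPipe LLM Inference API with a locally stored Gemma model bundle; the model path, prompt, and option values are illustrative assumptions, not a description of how the Envision app is actually built.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import com.google.mediapipe.framework.image.BitmapImageBuilder
import com.google.mediapipe.tasks.core.GraphOptions
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

// Sketch: describe a camera frame entirely on-device.
// Model path and option values are hypothetical.
fun describeScene(context: Context, frame: Bitmap): LlmInferenceSession {
    // Load a locally stored Gemma model bundle; no image or prompt
    // leaves the device.
    val engine = LlmInference.createFromOptions(
        context,
        LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/gemma_vision.task") // hypothetical path
            .setMaxTokens(512)
            .setMaxNumImages(1)
            .build()
    )

    // A session holds the conversation state, with vision enabled.
    val session = LlmInferenceSession.createFromOptions(
        engine,
        LlmInferenceSession.LlmInferenceSessionOptions.builder()
            .setTemperature(0.4f)
            .setGraphOptions(
                GraphOptions.builder().setEnableVisionModality(true).build()
            )
            .build()
    )

    session.addQueryChunk("Describe this scene for a blind user.")
    session.addImage(BitmapImageBuilder(frame).build())
    println(session.generateResponse()) // generated locally

    return session // keep the session so follow-up questions share context
}
```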

For blind and low-vision users, that shift matters.

Making accessibility more accessible

For years, many of the most powerful AI experiences have depended on a strong internet connection and cloud-based processing. That has unlocked a lot of progress, but it also comes with limits. Responses can be slower. Reliability can vary depending on connectivity. And when visual data has to be sent elsewhere for processing, privacy becomes part of the equation too.

Running these experiences on-device changes that.

It makes interactions faster. It makes them more dependable in low-connectivity or no-connectivity environments. And it gives users greater confidence that the images they capture and the questions they ask can stay on their own device.

For someone who is blind or has low vision, these are not small technical improvements. They shape whether a tool feels dependable enough to use in daily life.

A scene description feature is only truly useful if it responds quickly enough to keep up with the moment. A visual Q&A feature is only truly helpful if it works when someone is out and about, traveling, or in places where the network is weak or unavailable. And when someone is using AI to interpret personal surroundings, documents, objects, or everyday moments, privacy is not a nice extra; it is a requirement.

This is why on-device AI is such an important step forward for accessibility. Here's a quick video of this in action:

A screen recording showing the Envision app accurately describing and reading text from a display, completely offline

Karthik Kannan using the new Gemma model to read text on a display offline

What this makes possible

Through this collaboration, Envision is exploring how visual understanding capabilities built on Google’s Gemma models can run directly on smartphones powered by the latest SME2-enabled Arm C1 CPUs.

That opens the door to experiences such as:

  • rich scene description directly on a phone
  • follow-up questions about what is visible in an image or live scene
  • faster conversational interactions around visual information
  • more reliable use in places with poor or no internet connectivity
  • stronger privacy by keeping processing on-device

For the user, the benefit is simple. The experience feels faster, more immediate, and more trustworthy.

Instead of waiting for a round trip to the cloud, users can get responses with much lower latency. Instead of depending on whether they have signal, they can keep using the feature wherever they are. Instead of wondering where their data is going, they can know that the processing is happening locally on the device.
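To make that concrete, a follow-up question in the sketch above is simply another turn on the same local session: the image and the first description are already in the session's context, so nothing is resent anywhere. This again assumes the hypothetical MediaPipe-based setup from the earlier sketch, not Envision's actual implementation.

```kotlin
import com.google.mediapipe.tasks.genai.llminference.LlmInferenceSession

// Continues the hypothetical session from the earlier sketch: the frame and
// the first description are already in context, so a follow-up question is
// answered locally, with no re-upload and no network round trip.
fun askFollowUp(session: LlmInferenceSession, question: String): String {
    session.addQueryChunk(question)
    return session.generateResponse()
}

// e.g. askFollowUp(session, "What does the sign above the door say?")
```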

A shared step toward the future of accessible AI

We see this as part of a bigger shift.

The future of accessible AI is not just about making models more capable. It is about making them available in the moments that matter most. It is about putting intelligence closer to the user. And it is about making advanced assistive experiences practical enough to become part of everyday life.

At Envision, we believe accessible technology should work where people actually live: on the move, in real time, and without friction. That is what makes this collaboration with Arm and Google so exciting.

Together, we are helping move accessibility experiences from the cloud to the device itself, without losing the richness and usefulness that people expect from modern AI.

“This collaboration with Envision and Google shows how advanced AI is now running directly on Arm-powered devices, delivering real-time, reliable experiences,” said Chris Bergey, EVP of Edge AI at Arm. “With Armv9, the world’s most advanced, secure, and widely deployed AI architecture, models like Gemma 4 are utilizing SME2 benefits via KleidiAI to unlock powerful on-device AI capabilities across billions of smartphones and enable more accessible, assistive experiences. As on-device AI becomes the new default, it is essential to meeting rising AI expectations and maximizing impact, unlocking applications that are more responsive, efficient, and built for how people use technology every day.”

“Envision is excited to work with Arm and Google to bring powerful accessibility experiences directly onto smartphones. Running visual understanding models on-device opens the door to reliable, low-latency scene description and visual Q&A for blind and low-vision users. For our community, the ability to access these capabilities offline is incredibly meaningful because it ensures the technology works wherever they are, while also improving privacy by keeping more processing on the device itself.”

Looking ahead

This is an important milestone for us, but it is also just the beginning.

Our goal is to keep pushing toward a future where accessible AI is faster, more private, more dependable, and available to more people on the devices they already use every day.

We are proud to be working with Arm and Google on that future.