After the last few discussions I’ve been part of, I’ve realised something important.
We’re not really disagreeing about tools.
We’re disagreeing about how we decide what AI can and can’t do.
I’ve now spent roughly three years working daily with large language models, and about two and a half years with vision-language models. During that time, I’ve built and used:
- mixtures of models
- extended visual narratives
- “Vibe Text” alongside literal description
- layered-use approaches (different tools for different stakes)
What that experience has taught me is this:
first impressions age amazingly fast.
A lot of our current beliefs about AI in accessibility are based on:
- early failures
- single bad experiences
- tools used out of context
- versions that no longer exist
Those experiences are real — but they’re also time-bound.
What we don’t have, as a community, is a shared way to re-evaluate tools as they evolve. So discussions collapse into one blind person’s experience versus another’s, and there’s no good way to update beliefs without someone feeling dismissed.
I think 2026 needs something different.
Not endorsements.
Not certifications.
Not “this is perfect now”.
But a shared, blind-led Eval layer.
Something that asks questions like these (sketched as a data record after the list):
- What tool, which version, and when?
- In what context was it used?
- What worked?
- What failed?
- How recoverable were the failures?
- Were the risks low, social, annoying, or dangerous?
- Would you try it again in the same context?
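To make that concrete, here’s a minimal sketch of what one such record could look like, written in TypeScript. Every name in it is a hypothetical placeholder: a starting point for discussion, not a proposed standard.

```ts
// A minimal sketch of a shared eval record.
// All field names are hypothetical placeholders, not an existing schema.

type RiskLevel = "low" | "social" | "annoying" | "dangerous";

interface EvalRecord {
  tool: string;            // which tool
  version: string;         // which exact version or model
  reviewedAt: Date;        // when the evaluation happened
  context: string;         // the task and the stakes involved
  worked: string[];        // what succeeded
  failed: string[];        // what failed
  recoverability: string;  // how easily failures could be worked around
  risk: RiskLevel;
  wouldRetry: boolean;     // same context: would you try it again?
}

// An invented example, purely to show the shape:
const record: EvalRecord = {
  tool: "ExampleDescriber", // hypothetical tool name
  version: "4.1",
  reviewedAt: new Date("2026-01-10"),
  context: "Reading a printed utility bill aloud, moderate stakes",
  worked: ["Account number", "Total due"],
  failed: ["Misread the due date once"],
  recoverability: "Easy: asking it to re-read the date caught it",
  risk: "annoying",
  wouldRetry: true,
};
```

The exact fields matter less than the principle: every verdict travels with its tool, version, date, and context, so it can be revisited instead of frozen.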
Crucially:
“Didn’t work for me in 2024” should not automatically mean
“can’t work in 2026”.
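Continuing the sketch above, one way to build that into the Eval layer is to treat a record’s age as a trigger for re-testing rather than as evidence either way. The twelve-month threshold here is an arbitrary placeholder, not a recommendation:

```ts
// Flags an eval record as due for re-testing once it is older than a
// chosen threshold. The default threshold is an arbitrary placeholder.

function needsReEval(reviewedAt: Date, maxAgeMonths: number = 12): boolean {
  const msPerMonth = 1000 * 60 * 60 * 24 * 30; // approximate 30-day months
  const ageMonths = (Date.now() - reviewedAt.getTime()) / msPerMonth;
  return ageMonths > maxAgeMonths;
}

// A mid-2024 verdict gets flagged for a fresh look, not thrown away:
console.log(needsReEval(new Date("2024-06-01"))); // true as of 2026
```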
Eval isn’t about proving anyone wrong.
It’s about keeping our collective map aligned with a fast-moving territory.
Right now, we don’t have a shared mechanism for belief-updating, so we default to caution, habit, and legacy problem definitions. That made sense once. It makes less sense as general-purpose AI starts absorbing whole categories of assistive technology.
If we want to avoid arguing the same questions every year, we need a better way to evaluate change itself.
I’m interested in starting that conversation.