---
layout: post
title: Why an Auditory Browser?
author: Adrian Cochrane
date: 2020-10-31 20:38:51 +1300
---

I thought I might start a blog to discuss how and why Rhapsode works the way it does.
And what better place to start than "why is Rhapsode an *auditory* web browser?"

## It's accessible!
The blind, amongst numerous others, [deserve](http://gameaccessibilityguidelines.com/why-and-how/) as *excellent* a computing experience as
the rest of us! Yet web designers *far too* often don't consider them, and web developers
*far too* often [exclude them](https://webaim.org/projects/million/) in favour of visual slickness.

Anyone who can't operate a mouse, keyboard, or touchscreen, anyone who can't see well or
at all, anyone who can't afford the latest hardware is being
*excluded* from [our conversations online](https://ferd.ca/you-reap-what-you-code.html).
*[A crossfade](https://adactio.com/journal/17573) is not worth this loss*!

Currently the blind are [reliant](https://bighack.org/5-most-annoying-website-features-i-face-as-a-blind-screen-reader-user-accessibility/)
on "screenreaders" to describe the webpages, and applications, they're interacting with.
Screenreaders in turn rely on webpages to inform them of the semantics being communicated
visually, which webpages rarely do.

But *even if* those semantics were communicated, screenreaders would *still* offer a poor
experience, as they retrofit auditory output onto an inherently visual experience.

## It's cool!
It's unfortunately [not](https://webaim.org/projects/million/) considered cool to show
disabled people the *dignity* they deserve.

But you know what is considered cool?
[Voice assistants](https://marketingland.com/more-than-200-million-smart-speakers-have-been-sold-why-arent-they-a-marketing-channel-276012)!
Or at least that's what Silicon Valley wants us to believe as they sell us
[Siri](https://www.apple.com/siri/), [Cortana](https://www.microsoft.com/en-us/cortana/),
[Alexa](https://en.wikipedia.org/wiki/Amazon_Alexa), and other
[privacy-invasive](https://www.theguardian.com/technology/2019/oct/09/alexa-are-you-invading-my-privacy-the-dark-side-of-our-voice-assistants)
cloud-centric services.

Guess what? These feminine voices [are accessible](https://vimeo.com/event/540113#t=2975s) to many people otherwise excluded from
modern computing! Maybe voice assistants can make web accessibility cool? Maybe I can
deliver an alternative web experience people will *want* to use even if they don't need to?

## It's different!
On a visual display you can show multiple items onscreen at the same time for your eyes
to choose where to focus their attention moment-to-moment. You can even update those items
live without confusing anyone!

In contrast, in auditory communication information is positioned in time rather than space,
whilst what you say (or type) is limited by your memory rather than by screen real estate.

Visual and auditory user experiences are two
[totally different](https://developer.amazon.com/en-US/docs/alexa/alexa-design/get-started.html)
beasts, and that makes developing a voice assistant platform interesting!

## It works!
Webpages in general are still mostly text. Text can be rendered to audio output
at least as readily as it can be rendered to visual output. HTML markup
can be naturally communicated via tone-of-voice. And links can become voice
commands! A natural match!
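To make the "links become voice commands" idea concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not Rhapsode's code: the class name `SpokenPage` and the helper `speak_page` are hypothetical, and the sketch assumes a link's visible text is what the listener would say to follow it.

```python
from html.parser import HTMLParser


class SpokenPage(HTMLParser):
    """Linearise a page into spoken text plus a table of voice commands."""

    def __init__(self):
        super().__init__()
        self.words = []      # page text, in reading order
        self.commands = {}   # spoken link label -> target URL
        self._href = None    # href of the link currently open, if any
        self._label = []     # text collected inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._label = []

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text:
            return
        self.words.append(text)
        if self._href is not None:
            self._label.append(text)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join(self._label).lower()
            if label:
                self.commands[label] = self._href
            self._href = None


def speak_page(html):
    """Return (text to read aloud, {voice command: URL})."""
    page = SpokenPage()
    page.feed(html)
    page.close()
    return " ".join(page.words), page.commands
```

Feeding it a paragraph containing two links yields one string to read aloud and a dictionary mapping each lower-cased link label to its URL.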

Yes, this totally breaks down in the presence of JavaScript with its device-centric
input events and ability to output anything whenever, wherever it wants. But I'll
never be able to catch up in terms of JavaScript support, even if I didn't have
grave concerns about it!

In practice I find that [most websites](https://hankchizljaw.com/wrote/the-(extremely)-loud-minority/)
work perfectly fine without JavaScript; it's mainly just the *popular* ones which don't.

## It's simple!
You may be surprised to learn it's actually *simpler* for me to start my browser
developments with an auditory offering like Rhapsode! This is because laying out
text on a one-dimensional timeline is trivial, whilst laying it out in two-dimensional
space absolutely isn't. Especially when considering the needs of languages other
than English!

Once downloaded (along with its CSS and sound effects), rendering a webpage
essentially just takes applying a specially-designed [CSS](https://hankchizljaw.com/wrote/css-doesnt-suck/)
stylesheet! This yields data that can be almost directly passed to basically any
text-to-speech engine like [eSpeak NG](http://espeak.sourceforge.net/).
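As a rough illustration of how a stylesheet might turn markup into something a speech engine can read, the sketch below maps a few HTML tags to SSML `<prosody>` and `<break>` elements. Everything here is an assumption for illustration: the post doesn't say what format Rhapsode actually emits, `SPEECH_STYLES` and `PAUSE_AFTER` are made-up stand-ins for a real CSS cascade with selectors, and the input is assumed to be well-nested HTML.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical speech "stylesheet": tag name -> SSML prosody attributes.
SPEECH_STYLES = {
    "h1": {"pitch": "low", "rate": "slow"},
    "em": {"pitch": "high"},
    "strong": {"volume": "loud"},
}
# Hypothetical pauses inserted once a block-level element ends.
PAUSE_AFTER = {"h1": "600ms", "p": "300ms"}


class SsmlRenderer(HTMLParser):
    """Walk the page in document order, emitting SSML fragments."""

    def __init__(self):
        super().__init__()
        self.out = ["<speak>"]

    def handle_starttag(self, tag, attrs):
        style = SPEECH_STYLES.get(tag)
        if style:
            attr_text = "".join(f' {k}="{v}"' for k, v in style.items())
            self.out.append(f"<prosody{attr_text}>")

    def handle_endtag(self, tag):
        if tag in SPEECH_STYLES:
            self.out.append("</prosody>")
        if tag in PAUSE_AFTER:
            self.out.append(f'<break time="{PAUSE_AFTER[tag]}"/>')

    def handle_data(self, data):
        self.out.append(escape(data))  # keep &, <, > valid inside SSML


def render_ssml(html):
    renderer = SsmlRenderer()
    renderer.feed(html)
    renderer.close()
    renderer.out.append("</speak>")
    return "".join(renderer.out)
```

Because output is produced strictly in document order, there is no positioning step at all: the "layout" is just the sequence of fragments.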

Meanwhile input, whether from the keyboard or a speech-to-text engine like [CMU Sphinx](https://cmusphinx.github.io/),
is handled through string comparisons against links extracted from the webpage.
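The string comparison described above could look something like the following sketch. The function names `normalise` and `match_link` are hypothetical, and the matching rule (an exact match first, then a unique prefix) is an assumption made for illustration rather than Rhapsode's actual behaviour.

```python
def normalise(text):
    """Lower-case and drop punctuation so 'About Us!' matches 'about us'."""
    kept = (ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join("".join(kept).split())


def match_link(utterance, links):
    """Return the URL whose label matches the typed or spoken input, or None.

    `links` maps link text (as extracted from the page) to URLs.
    """
    wanted = normalise(utterance)
    if not wanted:
        return None
    labels = {normalise(label): url for label, url in links.items()}
    if wanted in labels:
        return labels[wanted]
    # Assumed fallback: accept a prefix only when it is unambiguous.
    candidates = [url for label, url in labels.items()
                  if label.startswith(wanted)]
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` for ambiguous input leaves room for the browser to ask the listener to repeat or disambiguate, rather than guessing which link was meant.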

## It's efficient!
I could discuss how the efficiency gained from the aforementioned simplicity is
important because CPUs are no longer getting any faster, only gaining more cores.
But that would imply that it was a valid strategy to wait for the latest hardware
rather than invest time in optimization.

It never was, because performant software is [good for the environment](https://tomgamon.com/posts/is-it-morally-wrong-to-write-inefficient-code/)!

Not only because speed *loosely*
[correlates](https://thenewstack.io/which-programming-languages-use-the-least-electricity/)
with energy efficiency, but also because if our slow software pushes others to
buy new hardware (which, again, they might not be able to afford), manufacturing that
new computer incurs a
[significant](https://solar.lowtechmagazine.com/2009/06/embodied-energy-of-digital-technology.html)
environmental cost.