~alcinnz/rhapsode

2481b8c0a5030f0a1f5e7093072929f043bcb081 — Adrian Cochrane 4 years ago edd8488
Merge wiki into main repo.
A docs/CSS-Speech-Tutorial.md => docs/CSS-Speech-Tutorial.md +41 -0
@@ 0,0 1,41 @@
**Wanted:** Guidance on creating a great audio theme. Maybe based on voice acting or public speaking theory.

Rhapsode still lets you apply CSS styles to your webpages, but since it outputs audio rather than video it supports a different set of CSS properties. This page provides an overview of these properties.

## Should it be spoken?
You can use the `speak` property to determine whether an HTML element should be read aloud or not, and the `speak-as` property to determine how it reads digits and/or punctuation.

Setting `speak: never` is not the same as setting `voice-volume: silent`: the latter still takes up the same amount of time as reading the text aloud would.
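
As a rough sketch of the difference, the rules below use value names from the CSS Speech draft. The selectors are made up for illustration, and whether Rhapsode accepts every value shown is an assumption.

```css
/* Hypothetical selector: skip the element entirely, taking no time at all. */
.visual-only { speak: never; }

/* Hypothetical selector: stay quiet for as long as reading the text would take. */
.spoiler { voice-volume: silent; }

/* Hypothetical selector: read digits one at a time and speak punctuation aloud. */
.version-number { speak-as: digits literal-punctuation; }
```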

## The Voice
You can use the `voice-family` property to select a voice either by age/gender/variant or by its name. Just like `font-family` in a visual browser, this makes a big difference to the character of your page.
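
A minimal sketch of both forms, following the syntax in the CSS Speech draft. The voice name is hypothetical, and which voices are actually installed depends on the speech synthesizer.

```css
/* Select by age, gender, and variant: the second "young female" voice. */
blockquote { voice-family: young female 2; }

/* Select by name ("Example Voice" is a placeholder), falling back to any male voice. */
h1 { voice-family: "Example Voice", male; }
```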

## Speaking Style
You can alter the voice you choose by varying its volume, rate, pitch, range, and stress. Doing so helps people pay attention, especially if it reinforces the meaning of your text.

### Keywords & Offsets
All the speaking style properties provide keywords you can use instead of a number. You can also write a number after a keyword to represent an offset from that keyword.
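
For illustration, here is a sketch using keywords and offsets as written in the CSS Speech draft. The units (`dB`, `st`, `%`) come from that draft; treating all of them as supported by Rhapsode is an assumption.

```css
/* Keywords on their own. */
em { voice-pitch: high; voice-stress: strong; }

/* A keyword followed by an offset from it. */
strong { voice-volume: loud +3dB; }   /* 3 decibels above "loud" */
h1     { voice-pitch: low -2st; }     /* 2 semitones below "low" */
small  { voice-rate: fast 90%; }      /* 90% of the "fast" rate */
```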

## The CSS Speech "Box Model"
On either end of your text you can place an audio cue to identify it, and on either end of those you can insert additional silence. If two silences are directly adjacent, the smaller one will be removed.

The inner pauses are called the element's `rest` and the outer ones are called its `pause`.

The user agent stylesheet, for example, uses audio cues to indicate list bullets and links. And silence functions exactly like whitespace in a visual browser.
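
The layering described above could be written out roughly as follows. The sound file name is hypothetical and the durations are arbitrary; this is not a copy of Rhapsode's user agent stylesheet.

```css
/* Audio order for each list item:
   pause-before, cue-before, rest-before, the spoken text,
   rest-after, cue-after, pause-after. */
li {
  pause-before: weak;
  cue-before: url("bullet.wav"); /* hypothetical sound file */
  rest-before: 100ms;
  rest-after: 100ms;
  pause-after: weak;
}

/* Shorthands set the "before" and "after" sides together. */
h2 {
  pause: strong;
  rest: 200ms;
}
```

Where a `strong` pause directly meets a `weak` one, only the larger of the two is kept.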

## Text Generation
Rhapsode supports (some of) the same text generation attributes as visual browsers, namely:

* `counter-reset`
* `counter-increment`
* `counter-set`
* `content`

Though more may be added in the future.

However, unlike visual browsers, you can apply the `content` property to the element itself to replace its own children.
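
A small sketch of these properties working together. The counter name and the `.logo` class are invented for the example, and support for the `::before` pseudo-element is assumed here rather than confirmed by this page.

```css
/* Number the items of an ordered list aloud. */
ol { counter-reset: item; }
ol > li { counter-increment: item; }
ol > li::before { content: counter(item) ". "; }

/* Replace an element's own children with generated text. */
.logo { content: "Example Company logo"; }
```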

---

* [CSS3 Speech Module](https://drafts.csswg.org/css-speech-1/) (Retired W3C Note)
* [MDN on CSS Counters](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Lists_and_Counters/Using_CSS_counters)
\ No newline at end of file

A docs/Designing-for-Rhapsode.md => docs/Designing-for-Rhapsode.md +41 -0
@@ 0,0 1,41 @@
## 1. Write Semantic (X)HTML5
Use *changes* in Rhapsode's voice to *enhance* the communication of your text; that'll help people pay attention to it. To do so, start by making meaningful use of the (X)HTML5 tags, and then (if you want) you can get more specific with Rhapsode-specific CSS. At the very least, provide good links.

If you've got quotes, marking them with `<q>` or `<blockquote>` tags rather than quote marks will render much more clearly in Rhapsode. Rhapsode will then render these in a new voice, while visual browsers will still render them as the appropriate quote marks for your language.

## 2. Declare Your Page's Language
If Rhapsode knows what language your page is written in, it can alter some of the phrases it inserts to match (not that it does yet). To do so use the `lang` or `xml:lang` attributes on the root `<html>` element.

This mostly just applies if you've got forms on the page.

## 3. Don't Rely on JavaScript
Rhapsode doesn't support JavaScript, as its APIs don't map cleanly to Rhapsode's experience, and because Rhapsode doesn't like its complexity or security model.

As such if your pages break when JavaScript's disabled, they'll almost certainly break in Rhapsode. The exception is for simple scripts that alter the style of an existing element, because Rhapsode follows an alternative set of CSS properties.

## 4. Avoid Excess Text
Listeners may tune out if you don't get straight to the point and stay on topic. As for styles, it doesn't matter much what you do as long as you're consistent.

## Navigation
There are a couple of additional points when it comes to navigation.

**NOTE:** Navigation has not yet been implemented.

### 5. Never Override `:link {cue-before}`
Visitors will be relying on this audio cue to know this is a link they can follow, and overriding it defeats the purpose of the link.

### 6. Don't Rely on Navbar (Or Adjust Styles)
Because excess text can bore the visitor, Rhapsode defaults to not reading out your `<nav>` tag. However, it'll still allow visitors to follow these links if they can intuit that they exist, so it's still very useful to provide a navbar on your pages.

As such, you should make sure that visitors can navigate your entire site without using the navbar, with the navbar itself acting as an enhancement but not a necessity.

Or alternatively you can override this default via the CSS `nav {speak: always}`. I would suggest applying this style *only* to your homepage, so it doesn't get in the way of your site's text.
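
One way to limit that override to the homepage is sketched below. It assumes the homepage marks its `<body>` with a hypothetical `home` class; a stylesheet linked only from the homepage would work just as well.

```css
/* Read the navbar aloud on the homepage only. */
body.home nav { speak: always; }
```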

## 7. Reset All Properties In Your Voice Stylesheets
Rhapsode reserves the right to adjust its user agent stylesheet to better suit the majority of sites not targeting Rhapsode specifically. As such, you should not rely on these defaults staying as they are when styling your own pages for it.

The good news is that there aren't that many properties to reset to `initial`.

---

[elementary OS's blog](https://blog.elementary.io/) sounds great in Rhapsode, for example.
\ No newline at end of file

A docs/Home.md => docs/Home.md +5 -0
@@ 0,0 1,5 @@
This wiki provides (rudimentary) documentation for the Rhapsode web browser/voice assistant.

* [CSS](CSS-Speech-Tutorial)
* [Web Design](Designing-for-Rhapsode)
* Just for fun: [How I'd design a processor specifically for Rhapsode](Hypothetical/Custom-CPU-Design)
\ No newline at end of file

A docs/Hypothetical/Custom-CPU-Design.md => docs/Hypothetical/Custom-CPU-Design.md +82 -0
@@ 0,0 1,82 @@
This page describes some hypothetical hardware designed specifically to run a Rhapsode-like web browser. There are no plans to build *this* hardware.

However this hypothetical may help to clarify how Rhapsode works.

## The Task
The steps of a navigation task are:

1. Perform voice recognition
2. Match the textual translations to links on the page
3. Parse the (relative) URL
4. Look up the URL in the cache
5. Resolve the domain name via DNS if needed
6. Send an HTTP(/TLS)/TCP/IP network request, ideally on a previously opened connection
7. Parse the HTTP response, having dispatched network input to the right thread
8. Parse the HTML to extract CSS & links
9. Update the language model for voice recognition
10. Parse the CSS
11. Convert the HTML to SSML via CSS
12. Convert the SSML text to phonemes via natural language-specific rules
13. Convert the SSML/phonemes into audio manipulations
14. Output raw audio

Almost all of those steps are straightforward format conversions (possibly via instructions extracted from HTML) or map lookups. So that's what I'll design here.

The main exceptions are TLS, TCP, & especially voice recognition. TLS requires circuitry that can perform en/de-cryption, whilst TCP requires cancellable timeouts, randomness, and coroutines. Voice recognition will be addressed later.

## Parsing
Let's say network, button, and (voice-recognized) audio input are written into a ringbuffer by those input devices. Each 4 bits(?) of that input would then navigate a graph describing the syntax being matched.

The nodes in that graph could "call" other syntaxes or tries (for more complex syntaxes), with the parser "returning" to where it left off by pushing and popping a stack.

If the parsing CPU encounters a node that's not in its cache memory (a "cache miss"), I'd have it immediately load it in from memory. And since this CPU focuses on format conversions anyway, it could be repurposed to decompress/decode the new instructions.

## Reformatting
Each of the parsing rules you can call could optionally have corresponding instructions for what to do upon pop, so that those instructions could be prefetched upon push and enqueued upon pop. There may also be an "echo" shorthand in this process.

Those instructions in turn would output bytes to external hardware, cached disk pages, and/or other programs as tracked in a "capabilities stack". Bytes written to other programs would be queued up in an "idle" ringbuffer to be dequeued when there's no external input.

To compile machine code, update caches, add a timeout, or sort/dedup output there'd also need to be an instruction to rewrite specified page(s) of memory. This could be handled using the same circuits as cache misses during parsing, or it could trigger the interrupt only once the fetch has completed.

---

Occasionally an ALU would be required for encryption, comparison, checksums, sound effects, etc.

Coroutines would be required for TLS and (navigable) audio output. Saves could be done by writing a pointer to the coroutine's stack(s) to another page, and restores could occur via a parsing cache miss once it's been looked up.

## Memory Blocks
This hypothetical CPU would require very little circuitry, relying almost entirely on multiple independent chunks of memory that can be accessed concurrently. It should be trivial to build on an FPGA.

Specifically, it'd include memory blocks for:

1. Input queue
2. Parsing graph (split in 2?)
3. Parsing stack
4. Prefetch stack
5. Output instruction queue
6. Capabilities stack
7. Staging areas/stacks
8. Idle queue

There may be a second core that turns on when the idle queue overflows, which would have (some of) its own dedicated memory blocks. Also, a bitmask could be used to allocate overflow and other pages.

Furthermore the queues and stacks could have near-perfect cache hit rates, and would rarely overflow to memory.

## Voice Recognition
There are two approaches to voice recognition I'm familiar with: Mozilla DeepSpeech & CMU Sphinx. What's described here caters to both approaches, whilst the circuit described above caters to neither.

No one understands how any specific neural network (like Mozilla DeepSpeech) works, but I can expand upon how CMU Sphinx works:

1. Compute "feature vectors" to describe each sliver of audio.
2. Use "Hidden Markov Models" (HMMs) to convert those feature vectors to "phones".
3. Use a "language model" (ngrams or finite-state automatons) to convert phones into possible texts.

### Circuitry

Hidden Markov Models, finite-state automatons, & ngram models can all be viewed as variants of a probability graph. To traverse these we need extensive multiplication, addition, and random-access memory lookups.

Additions and multiplications can be combined into matrix multiplication operations, which are also heavily used in neural networks and optimized for by GPU & KPU hardware. Maybe this could be reused to perform the audio analysis?

Random-access memory lookups meanwhile require some sort of RAM, which is hard to optimize. But because having too large (or too small) a language model gives a bad UX, it would be appropriate to heavily limit the available memory, and to make heavier use of matrix multiplies than CMU Sphinx might.

This happens to closely describe the [MAIX SoC](https://www.seeedstudio.com/sipeed) which may power some real Rhapsode hardware.
\ No newline at end of file

A docs/Why?.md => docs/Why?.md +13 -0
@@ 0,0 1,13 @@
I wish to show that The Web can be more private, secure, accessible, and easier to author if it limited its scope and were drastically simplified. I do not aim to support highly-interactive "webapps", but rather keep the I/O model abstract enough that it can work pretty much anywhere.

As such I'm implementing my own browser engines, and making them modular enough that you can reuse their components in other browser engines or other projects.

## Bibliography
* https://invidio.us/watch?v=fPFdV-Z69Lo
* https://anewdigitalmanifesto.com/
* http://john.ankarstrom.se/replacing-javascript/
* https://drewdevault.com/2020/03/18/Reckless-limitless-scope.html
* https://brutalist-web.design/
* https://mastodon.social/@tbernard/103889150137765427
* https://mstdn.io/@wolf480pl/103772675972092365
* https://media.libreplanet.org/u/libreplanet/m/who-s-afraid-of-spectre-and-meltdown/
\ No newline at end of file