From da1ec90f791233fcb688ae118bf79bcb861e4cb4 Mon Sep 17 00:00:00 2001 From: Adrian Cochrane Date: Thu, 24 Nov 2022 20:20:10 +1300 Subject: [PATCH] Init! With moved blog! --- HarfBuzz.png | Bin 0 -> 3361 bytes JuicyPixels.png | Bin 0 -> 3113 bytes _layouts/page.html | 14 ++ _layouts/post.html | 20 +++ _posts/2020-10-31-why-auditory.md | 100 ++++++++++++++ _posts/2020-11-12-css.md | 209 ++++++++++++++++++++++++++++++ _posts/2021-01-23-why-html.md | 51 ++++++++ _posts/2021-06-13-voice2json.md | 161 +++++++++++++++++++++++ _posts/2021-12-24-amphiarao.md | 31 +++++ _posts/2022-07-04-op-parsing.md | 53 ++++++++ _posts/Pango-name.svg.png | Bin 0 -> 8950 bytes blog.atom | 19 +++ blog.html | 24 ++++ espeak.png | Bin 0 -> 3979 bytes freetype.png | Bin 0 -> 7566 bytes haskell.png | Bin 0 -> 3858 bytes index.html | 152 ++++++++++++++++++++++ rhapsode.png | Bin 0 -> 2588 bytes upload.sh | 5 + voice2json.png | Bin 0 -> 4030 bytes 20 files changed, 839 insertions(+) create mode 100644 HarfBuzz.png create mode 100644 JuicyPixels.png create mode 100644 _layouts/page.html create mode 100644 _layouts/post.html create mode 100644 _posts/2020-10-31-why-auditory.md create mode 100644 _posts/2020-11-12-css.md create mode 100644 _posts/2021-01-23-why-html.md create mode 100644 _posts/2021-06-13-voice2json.md create mode 100644 _posts/2021-12-24-amphiarao.md create mode 100644 _posts/2022-07-04-op-parsing.md create mode 100755 _posts/Pango-name.svg.png create mode 100644 blog.atom create mode 100644 blog.html create mode 100644 espeak.png create mode 100644 freetype.png create mode 100644 haskell.png create mode 100644 index.html create mode 100644 rhapsode.png create mode 100755 upload.sh create mode 100644 voice2json.png diff --git a/HarfBuzz.png b/HarfBuzz.png new file mode 100644 index 0000000000000000000000000000000000000000..5423b76c322ec89eca57197da455a14611ed65f4 GIT binary patch literal 3361 zcmeAS@N?(olHy`uVBq!ia0vp^3LwnE1|*BCs=ffJ=Bkj0lAy$Lg@U5|w9K4Tg_6pG zRE5-v%rpjuid&_9cR7zZaJ1ZC`^alX=;L)yStdDH?f8^0R@`Fqgt_#O>&^Yo73X_0 zADM2ar0ig0Te?(2gD0cQOu4quH9N#>W2oLTRVDZK`^|mo+(}0+&OIg?e0S~TRVSsJ zemdP_|FS<*|10-T#u@LmE&kXZ`N+V)#gqwjPK2+IRdRl=USdjqQmS4>ZUNAp3=B5* z6$OdO*{LN8NvY|XdA3ULckfqH$V{%1*XSQL?w= zvZ=5F8jzb>lBiITo0C^;Rbi_HHrEQs1_|pcDS(xfWZNo5_y#CA=NF|anCO}48R)uJ zWR@8z*>Ne@6s4qD1-ZCEjR5j&l`=|73as??%gf94%8m8%i_-NCEiEne4UF`SjC6r2 zbc-wVN)jt{^NN*0MnKGPNi9w;$}A|!%+FH*nVFcBUs__Tqy&^#fEo@8dbsI%#n3%NW_x2MiRPq9xy$tkmErRq+a~zBL{;SA1mo6+j{BK9R``?InOc4_zR!Nogr!huWC1jUW1d3cXW}25J z)cH6y^+ivA<)jVr2RheymL1>q{_q=~kHs5V zmw&KY_Ak#VI_;5bNu8SFs;(vCN|ENfK3LXyco|v>o$LQ>-`OXy_I^a;YQ=>3w#DC> zm3QnaFyLDww59FQR~P$Kx5|*a^V>Jrt9((~wp3)T?Ds;ukKE=eMQ=|^t>M|bj^k2( zr|Qd|=l_L|O8Q@DT4*Y=zR2dp>yOG+k1kY*To8M**YaG4p|f|+Ud|q=e+*}Sw5cgK RwuXYb5}vMpF6*2UngCL$5?%lR literal 0 HcmV?d00001 diff --git a/JuicyPixels.png b/JuicyPixels.png new file mode 100644 index 0000000000000000000000000000000000000000..43cc19ee2b7d4dbc21fabdec730fc1106f70120f GIT binary patch literal 3113 zcmeAS@N?(olHy`uVBq!ia0vp^3LwnE1|*BCs=ffJ=Bkj0lAy$Lg@U5|w9K4Tg_6pG zRE5-v%rpjuid&_9cR7zZaJ1ZC`^alX=;L)yStdDH?f8^0R@`Fqgt_#O>&^Yo73X_0 zADM2ar0ig0Te?(2gD0cQOu4quH9N#>W2oLTRVDZK`^|mo+(}0+&OIg?e0S~TRVSsJ zemdP_|FS<*|10-T#u@LmE&kXZ`N+V)#gqwjPK2+IRdRl=USdjqQmS4>ZUNAp3=B5* z6$OdO*{LN8NvY|XdA3ULckfqH$V{%1*XSQL?w= zvZ=5F8jzb>lBiITo0C^;Rbi_HHrEQs1_|pcDS(xfWZNo5_y#CA=NF|anCO}48R)uJ zWR@8z*>Ne@6s4qD1-ZCEjR5j&l`=|73as??%gf94%8m8%i_-NCEiEne4UF`SjC6r2 zbc-wVN)jt{^NN*0MnKGPNi9w;$}A|!%+FH*nVFcBUs__Tqy&^#fEo@8dbsI%#n3HSF!kOgLB>fH z^*03@Hhtba<1qK!rZHb661EOQ6j 
z36Y9b>=7^aSk~;jae#I5mZN2d7TYGAn=|#bn~V4kvpL%@$1%qADO_f*F3#G(5%6;E z-Prv0%>owF7ngA_YyXuy`}Vf1H8#IoJaq1eJ!Go>q~g`UFPA$@PVmytHHIIS_kXK1 zwfo#r_3ue}T8Mc3`BvrwkL`BEJe>Ymo=YKLC&Bbz(=1ko{O<>!-t50)#bhy~K)`@w z$4&-g;blHY#p{o=#nm6zO4acf6J+9_+f`Y!*IV-L-VZZ2{`nE^Yu94#_aS%XgT^Ih QS)eAhr>mdKI;Vst0Kx>1-~a#s literal 0 HcmV?d00001 diff --git a/_layouts/page.html b/_layouts/page.html new file mode 100644 index 0000000..7618002 --- /dev/null +++ b/_layouts/page.html @@ -0,0 +1,14 @@ +--- +--- + + + + + + {{ page.title|xml_escape }} + + + + {{ content }} + + diff --git a/_layouts/post.html b/_layouts/post.html new file mode 100644 index 0000000..bee4cb0 --- /dev/null +++ b/_layouts/post.html @@ -0,0 +1,20 @@ +--- +--- + + + + + + {{ page.title|xml_escape }} — Argonaut Constellation's blog + + + + +

{{ page.title|xml_escape }}

+
+
+ {{ content }}
+
+
diff --git a/_posts/2020-10-31-why-auditory.md b/_posts/2020-10-31-why-auditory.md
new file mode 100644
index 0000000..2bf3c7d
--- /dev/null
+++ b/_posts/2020-10-31-why-auditory.md
@@ -0,0 +1,100 @@
+---
+layout: post
+title: Why an Auditory Browser?
+author: Adrian Cochrane
+date: 2020-10-31 20:38:51 +1300
+---
+
+I thought I might start a blog to discuss how and why Rhapsode works the way it does.
+And what better place to start than "why is Rhapsode an *auditory* web browser?"
+
+## It's accessible!
+The blind, amongst numerous others, [deserve](http://gameaccessibilityguidelines.com/why-and-how/) as *excellent* a computing experience as
+the rest of us! Yet webdesigners *far too* often don't consider them, and webdevelopers
+*far too* often [exclude them](https://webaim.org/projects/million/) in favour of visual slickness.
+
+Anyone who can't operate a mouse, keyboard, or touchscreen, anyone who can't see well or
+at all, anyone who can't afford the latest hardware is being
+*excluded* from [our conversations online](https://ferd.ca/you-reap-what-you-code.html).
+*[A crossfade](https://adactio.com/journal/17573) is not worth this loss*!
+
+Currently the blind are [reliant](https://bighack.org/5-most-annoying-website-features-i-face-as-a-blind-screen-reader-user-accessibility/)
+on "screenreaders" to describe the webpages, and applications, they're interacting with.
+Screenreaders in turn rely on webpages to inform them of the semantics being communicated
+visually, which they rarely do.
+
+But *even if* those semantics were communicated, screenreaders would *still* offer a poor
+experience, as they retrofit auditory output onto an inherently visual experience.
+
+## It's cool!
+It's unfortunately [not](https://webaim.org/projects/million/) considered cool to show
+disabled people the *dignity* they deserve.
+
+But you know what is considered cool?
+[Voice assistants](https://marketingland.com/more-than-200-million-smart-speakers-have-been-sold-why-arent-they-a-marketing-channel-276012)!
+Or at least that's what Silicon Valley wants us to believe as they sell us
+[Siri](https://www.apple.com/siri/), [Cortana](https://www.microsoft.com/en-us/cortana/),
+[Alexa](https://en.wikipedia.org/wiki/Amazon_Alexa), and other
+[privacy-invasive](https://www.theguardian.com/technology/2019/oct/09/alexa-are-you-invading-my-privacy-the-dark-side-of-our-voice-assistants)
+cloud-centric services.
+
+Guess what? These feminine voices [are accessible](https://vimeo.com/event/540113#t=2975s) to many people otherwise excluded from
+modern computing! Maybe voice assistants can make web accessibility cool? Maybe I can
+deliver an alternative web experience people will *want* to use even if they don't need to?
+
+## It's different!
+On a visual display you can show multiple items onscreen at the same time, leaving your eyes
+to choose where to focus their attention moment-to-moment. You can even update those items
+live without confusing anyone!
+
+In contrast, in auditory communication information is positioned in time rather than space,
+whilst what you say (or type) is limited by your memory rather than screen real estate.
+
+Visual and auditory user experiences are two
+[totally different](https://developer.amazon.com/en-US/docs/alexa/alexa-design/get-started.html)
+beasts, and that makes developing a voice assistant platform interesting!
+
+## It works!
+Webpages in general are still mostly text. Text can be rendered to audio output
+just as (if not more) readily than it can be rendered to visual output. HTML markup
+can be naturally communicated via tone-of-voice. And links can become voice
+commands! A natural match!
+
+Yes, this totally breaks down in the presence of JavaScript with its device-centric
+input events and ability to output anything whenever, wherever it wants. But I'll
+never be able to catch up in terms of JavaScript support, even if I didn't have
+grave concerns about it!
+
+In practice I find that [most websites](https://hankchizljaw.com/wrote/the-(extremely)-loud-minority/)
+work perfectly fine without JavaScript; it's mainly just the *popular* ones which don't.
+
+## It's simple!
+You may be surprised to learn it's actually *simpler* for me to start my browser
+developments with an auditory offering like Rhapsode! This is because laying out
+text on a one-dimensional timeline is trivial, whilst laying it out in 2-dimensional
+space absolutely isn't. Especially when considering the needs of languages other
+than English!
+
+Once a webpage is downloaded (along with its CSS and sound effects), rendering it
+essentially just takes applying a specially-designed [CSS](https://hankchizljaw.com/wrote/css-doesnt-suck/)
+stylesheet! This yields data that can be almost directly passed to basically any
+text-to-speech engine like [eSpeak NG](http://espeak.sourceforge.net/).
+
+Input meanwhile, whether from the keyboard or a speech-to-text engine like [CMU Sphinx](https://cmusphinx.github.io/),
+is handled through string comparisons against links extracted from the webpage.
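+
+Roughly speaking, that matching needs little more than normalising both strings
+before comparing them. A minimal sketch, with a hypothetical `Link` type rather
+than Rhapsode's actual code:
+
+```haskell
+import Data.Char (isAlphaNum, toLower)
+
+-- A hypothetical link type: its human-readable label & destination.
+data Link = Link { label :: String, href :: String }
+
+-- Normalise a phrase for comparison: lowercase it & strip punctuation.
+normalise :: String -> String
+normalise = map toLower . filter (\c -> isAlphaNum c || c == ' ')
+
+-- Find the first link whose label matches the spoken or typed command.
+matchCommand :: String -> [Link] -> Maybe Link
+matchCommand cmd links =
+    case [l | l <- links, normalise (label l) == normalise cmd] of
+        (l:_) -> Just l
+        []    -> Nothing
+```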
+
+## It's efficient!
+I could discuss how the efficiency gained from the aforementioned simplicity is
+important because CPUs are no longer getting any faster, only gaining more cores.
+But that would imply it was a valid strategy to wait for the latest hardware
+rather than invest time in optimization.
+
+Because performant software is [good for the environment](https://tomgamon.com/posts/is-it-morally-wrong-to-write-inefficient-code/)!
+
+Not only because speed *loosely*
+[correlates](https://thenewstack.io/which-programming-languages-use-the-least-electricity/)
+with energy efficiency, but also because if our slow software pushes others to
+buy new hardware (which again, they might not be able to afford) manufacturing that
+new computer incurs
+[significant](https://solar.lowtechmagazine.com/2009/06/embodied-energy-of-digital-technology.html)
+environmental cost.
diff --git a/_posts/2020-11-12-css.md b/_posts/2020-11-12-css.md
new file mode 100644
index 0000000..ef0875e
--- /dev/null
+++ b/_posts/2020-11-12-css.md
@@ -0,0 +1,209 @@
+---
+layout: post
+title: How Does CSS Work?
+author: Adrian Cochrane
+date: 2020-11-12 20:35:06 +1300
+---
+
+Rendering a webpage in Rhapsode takes little more than applying a
+[useragent stylesheet](https://meiert.com/en/blog/user-agent-style-sheets/)
+to decide how the page's semantics should be communicated,
+[in addition to](https://www.w3.org/TR/CSS2/cascade.html#cascade) any installed
+userstyles and *optionally* author styles.
+
+Once the [CSS](https://www.w3.org/Style/CSS/Overview.en.html) has been applied,
+Rhapsode sends the styled text to [eSpeak NG](https://github.com/espeak-ng/espeak-ng)
+to be converted into the sounds you hear. So *how* does Rhapsode apply that CSS?
+
+## Parsing
+[Parser](http://parsingintro.sourceforge.net/) implementations differ mainly in
+*what* they implement rather than *how*. They repeatedly look at the next character(s)
+in the input stream to decide how to represent it in-RAM. Often there'll be a
+"lexing" step (for which I use [Haskell CSS Syntax](https://hackage.haskell.org/package/css-syntax))
+to categorize consecutive characters into "tokens", thereby simplifying the main parser.
+
+My choice to use [Haskell](https://www.haskell.org/), however, does change things
+a little. In Haskell there can be [*no side effects*](https://mmhaskell.com/blog/2017/1/9/immutability-is-awesome);
+all [outputs **must** be returned](https://mmhaskell.com/blog/2018/1/8/immutability-the-less-things-change-the-more-you-know).
+So in addition to the parsed tree, each part of the parser must return the rest
+of the text that still needs to be parsed by another sub-parser. This yields a type
+signature of [`:: [Token] -> (a, [Token])`](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Syntax/StylishUtil.hs#n11),
+and leads Haskell to let you combine these subparsers together into what are
+called "[parser combinators](https://remusao.github.io/posts/whats-in-a-parser-combinator.html)".
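+
+To make that shape concrete, here's a minimal sketch of one such sub-parser,
+using a simplified stand-in for css-syntax's `Token` type:
+
+```haskell
+-- Simplified stand-in for css-syntax's Token type.
+data Token = Ident String | Delim Char | LeftCurly | RightCurly
+  deriving (Eq, Show)
+
+-- Every sub-parser returns its result alongside the unconsumed tokens.
+type Parser a = [Token] -> (a, [Token])
+
+-- Consume tokens up to & including the matching close brace,
+-- tracking nesting as we go.
+scanBlock :: Parser [Token]
+scanBlock = go (0 :: Int)
+  where
+    go _ [] = ([], [])
+    go 0 (RightCurly:rest) = ([], rest)
+    go n (tok:rest) = let n' = case tok of
+                                 LeftCurly  -> n + 1
+                                 RightCurly -> n - 1
+                                 _          -> n
+                          (body, rest') = go n' rest
+                      in (tok:body, rest')
+```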
+
+Once each style rule is parsed, a method is called on a
+[`StyleSheet`](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Syntax/StyleSheet.hs#n27)
+"[typeclass](http://book.realworldhaskell.org/read/using-typeclasses.html)"
+to return a modified datastructure containing the new rule. And a different method
+is called to parse any [at-rules](https://www.w3.org/TR/CSS2/syndata.html#at-rules).
+
+## Pseudoclasses
+Many of my `StyleSheet` implementations handle only certain aspects of CSS,
+handing off to another implementation to perform the rest.
+
+For example most pseudoclasses (ignoring interactive aspects I have no plans to
+implement) can be re-written into simpler selectors. So I added a configurable
+`StyleSheet` [decorator](https://refactoring.guru/design-patterns/decorator) just
+to do that!
+
+This pass also resolves any [namespaces](https://www.w3.org/TR/css3-namespace/),
+and corrects [`:before` & `:after`](https://www.w3.org/TR/CSS2/selector.html#before-and-after)
+to be parsed as pseudoelements.
+
+## Media Queries & `@import`
+CSS defines a handful of at-rules which can control whether contained style rules
+will be applied:
+
+* [`@document`](https://developer.mozilla.org/en-US/docs/Web/CSS/@document) allows user & useragent stylesheets to apply style rules only for certain (X)HTML documents & URLs. An interesting Rhapsode-specific feature is [`@document unstyled`](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Preprocessor/Conditions.hs#n84) which applies only if no author styles have already been parsed.
+* [`@media`](https://drafts.csswg.org/css-conditional-3/#at-media) applies its style rules only if the given media query evaluates to true. Whilst Rhapsode only supports the [`speech`](https://www.w3.org/TR/CSS2/media.html#media-types) or `-rhapsode` mediatypes, I've implemented a full caller-extensible [Shunting Yard](https://en.wikipedia.org/wiki/Shunting-yard_algorithm) interpreter (see the sketch below).
+* [`@import`](https://www.w3.org/TR/css3-cascade/#at-import) fetches & parses the given URL if the given mediatype evaluates to true when you call [`loadImports`](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Preprocessor/Conditions.hs#n138). As a privacy protection for future browsers, callers may avoid hardware details leaking to the webserver by being more vague in this pass.
+* [`@supports`](https://drafts.csswg.org/css-conditional-3/#at-supports) applies style rules only if the given CSS property or selector syntax parses successfully.
+
+Since media queries might need to be rechecked when, say, the window has been resized,
+`@media` (and downloaded `@import`) rules are resolved to populate a new `StyleSheet`
+implementation only when the [`resolve`](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Preprocessor/Conditions.hs#n151)
+function is called. Though again this is overengineered for Rhapsode's uses: since it
+renders pages to an infinite auditory timeline instead of a window, media queries
+are *barely* useful here.
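+
+For a rough idea of that Shunting Yard pass, here's a stripped-down sketch which
+hardcodes a couple of boolean operators over already-evaluated conditions (the
+real interpreter is caller-extensible & works over CSS tokens):
+
+```haskell
+-- Simplified media-query tokens: evaluated conditions plus operators.
+data MTok = MVal Bool | MAnd | MOr | MOpen | MClose deriving (Eq, Show)
+
+prec :: MTok -> Int
+prec MAnd = 2
+prec MOr  = 1
+prec _    = 0
+
+isOp :: MTok -> Bool
+isOp t = t == MAnd || t == MOr
+
+-- Shunting Yard: reorder infix tokens into Reverse Polish Notation.
+toRPN :: [MTok] -> [MTok]
+toRPN = go []
+  where
+    go ops [] = ops
+    go ops (MVal b : ts) = MVal b : go ops ts
+    go ops (MOpen : ts) = go (MOpen : ops) ts
+    go ops (MClose : ts) =
+      let (out, rest) = span isOp ops in out ++ go (drop 1 rest) ts
+    go ops (op : ts) =
+      let (out, rest) = span (\o -> isOp o && prec o >= prec op) ops
+      in out ++ go (op : rest) ts
+
+-- Evaluate the RPN with a simple stack machine.
+evalRPN :: [MTok] -> Bool
+evalRPN = head . foldl step []
+  where
+    step (a:b:stack) MAnd = (a && b) : stack
+    step (a:b:stack) MOr  = (a || b) : stack
+    step stack (MVal b)   = b : stack
+    step stack _          = stack
+```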
+
+## Indexing
+Ultimately Rhapsode parses CSS style rules to be stored in a [hashmap](https://en.wikipedia.org/wiki/Hash_table)
+(or rather a [Hash Array Mapped Trie](https://en.wikipedia.org/wiki/Hash_array_mapped_trie))
+[indexed](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Style/Selector/Index.hs#n50)
+under the right-most selector, if any. This dramatically cuts down on how
+many style rules have to be considered for each element being styled.
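+
+In miniature, & with hypothetical types standing in for the real generic store,
+that indexing might look like:
+
+```haskell
+import qualified Data.HashMap.Strict as HM
+
+-- Hypothetical rule representation, for illustration only.
+data StyleRule = StyleRule { selectorTests :: [String], properties :: [(String, String)] }
+
+type RuleStore = HM.HashMap String [StyleRule]
+
+-- File each rule under its right-most selector test (e.g. "p", "#main",
+-- ".warning"), falling back to a catch-all bucket.
+insertRule :: StyleRule -> RuleStore -> RuleStore
+insertRule rule = HM.insertWith (++) key [rule]
+  where key = case selectorTests rule of
+                []    -> "*"
+                tests -> last tests
+
+-- Styling an element then consults only the buckets plausibly matching
+-- its name, ID, & classes, plus the catch-all bucket.
+rulesFor :: [String] -> RuleStore -> [StyleRule]
+rulesFor keys store = concat [HM.lookupDefault [] k store | k <- "*" : keys]
+```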
+
+So for each element needing styling, it [looks up](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Style/Selector/Index.hs#n68)
+just those style rules which match its name, attributes, IDs, and/or classes.
+However this only considers a single test from each rule's selector, so we need an…
+
+## Interpreter
+To truly determine whether an element matches a [CSS selector](https://www.w3.org/TR/selectors-3/),
+we need to actually evaluate that selector! I've implemented this in 3 parts
+(see the sketch after this list):
+
+* [Lowering](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Style/Selector/Interpret.hs#n53) - Reduces how many types of selector tests need to be compiled by e.g. converting `.class` to `[class~=class]`.
+* [Compilation](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Style/Selector/Interpret.hs#n34) - Converts the parsed selector into a [lambda](https://teraum.writeas.com/anatomy-of-things) function you can call as the style rule is being added to the store.
+* [Runtime](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Style/Selector/Interpret.hs#n88) - Provides functions that may be called as part of evaluating a CSS selector.
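+
+A sketch of that lowering pass, on a simplified selector representation rather
+than the actual datatypes:
+
+```haskell
+-- Simplified selector tests, for illustration only.
+data Test = Tag String | ID String | Class String
+          | Attribute String String String  -- attribute, operator, value
+
+-- Lowering rewrites shorthand tests into attribute tests, so the
+-- compilation pass only needs to handle a single kind of test.
+lower :: Test -> Test
+lower (Class c) = Attribute "class" "~=" c
+lower test      = test
+```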
+
+Whether there's actually any compilation happening is another question for the
+[Glasgow Haskell Compiler](https://www.haskell.org/ghc/), but regardless I find
+it a convenient way to write and think about it.
+
+Selectors are interpreted from right-to-left, as that tends to short-circuit sooner,
+upon an alternate inversely-linked representation of the element tree parsed by
+[XML Conduit](https://hackage.haskell.org/package/xml-conduit).
+
+**NOTE** In webapp-capable browser engines [`querySelectorAll`](https://developer.mozilla.org/en-US/docs/Web/API/Element/querySelectorAll)
+tends to use a *slightly* different selector interpreter because there we know
+the ancestor element. This makes it more efficient to interpret *those* selectors
+left-to-right.
+
+## Specificity
+Style rules are sorted by ["selector specificity"](https://www.w3.org/TR/selectors-3/#specificity),
+which is computed by counting tests on IDs, classes, & tagnames. Ties are broken
+by which rule comes first in the source code and whether the stylesheet came from the
+[browser, user, or webpage](https://www.w3.org/TR/CSS2/cascade.html#cascade).
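+
+Reusing the simplified `Test` type from the sketch above, that counting amounts to
+(tuples compare lexicographically, so e.g. `sortOn` over these triples orders rules
+lowest-priority first):
+
+```haskell
+-- Count (IDs, classes, tags) across a selector's tests.
+specificity :: [Test] -> (Int, Int, Int)
+specificity = foldr count (0, 0, 0)
+  where count (ID _)    (i, c, t) = (i + 1, c, t)
+        count (Class _) (i, c, t) = (i, c + 1, t)
+        count (Tag _)   (i, c, t) = (i, c, t + 1)
+        count _         (i, c, t) = (i, c, t)
+```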
+
+This is implemented as a decorator around the interpreter & (in turn) indexer.
+Another decorator strips [`!important`](https://www.w3.org/TR/CSS2/cascade.html#important-rules)
+off the end of any relevant CSS property values, generating new style rules with
+higher priority.
+
+## Validation
+Once `!important` is stripped off, the embedding application is given a chance
+to check whether the syntax is valid &, as such, whether the property should participate
+in the CSS cascade. Invalid properties are discarded.
+
+At the same time the embedding application can expand CSS
+[shorthands](https://developer.mozilla.org/en-US/docs/Web/CSS/Shorthand_properties)
+into one or more longhand properties, e.g. converting `border-left: thin solid black;`
+into `border-left-width: thin; border-left-style: solid; border-left-color: black;`.
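+
+A sketch of how such an expansion looks with pattern matching; the real
+`PropertyParser` methods work over css-syntax tokens rather than strings:
+
+```haskell
+-- Expand a recognized shorthand into its longhands, pass longhands
+-- through, & discard anything that doesn't parse.
+expand :: String -> [String] -> [(String, String)]
+expand "border-left" [width, style, color] =
+    [ ("border-left-width", width)
+    , ("border-left-style", style)
+    , ("border-left-color", color) ]
+expand name [value] = [(name, value)]  -- already a longhand
+expand _ _ = []  -- unrecognized syntax: discard before the cascade
+```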
+
+## CSS [Cascade](https://www.w3.org/TR/css3-cascade/)
+This was trivial to implement! Once you have a list of style rules sorted by
+specificity, just load all their properties into a
+[hashmap](http://hackage.haskell.org/package/unordered-containers) & back!
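+
+Using `Data.HashMap.Strict`, where `fromList` lets later mappings take
+precedence, the core of the cascade can be sketched as:
+
+```haskell
+import qualified Data.HashMap.Strict as HM
+
+-- The cascade in miniature: rules arrive sorted lowest-priority first,
+-- so later (higher-priority) properties overwrite earlier ones.
+cascade :: [[(String, String)]] -> HM.HashMap String String
+cascade rulesBySpecificity = HM.fromList (concat rulesBySpecificity)
+```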
+
+Maybe I'll write a little blogpost about how many webdevs seem to be
+[scared of the cascade](https://mxb.dev/blog/the-css-mindset/#h-the-cascade-is-your-friend)…
+
+After the cascade, methods are called on a given [`PropertyParser`](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Style/Cascade.hs#n18)
+to parse each longhand property into an in-memory representation that's easier
+to process. This typeclass *also* has useful decorators, though few are needed
+for the small handful of speech-related properties.
+
+Haskell's [pattern matching](http://learnyouahaskell.com/syntax-in-functions#pattern-matching)
+syntax makes the tedious work of parsing the
+[sheer variety](https://www.w3.org/TR/CSS2/propidx.html#q24.0) of CSS properties
+absolutely trivial. I didn't have to implement a DSL like other
+[browser engines do](http://trac.webkit.org/browser/webkit/trunk/Source/WebCore/css/CSSProperties.json)!
+This is the reason why I chose Haskell!
+
+## CSS Variables [`var()`](https://www.w3.org/TR/css-variables-1/)
+In CSS3, any property prefixed with [`--`](https://www.w3.org/TR/css-variables-1/#defining-variables)
+will participate in the CSS cascade to specify which tokens the `var()` function should
+substitute in. If the property no longer parses successfully after this substitution,
+it is ignored. A bit of a [gotcha for webdevs](https://matthiasott.com/notes/css-custom-properties-fail-without-fallback),
+but it makes this quite trivial for me to implement!
+
+In fact, beyond prioritizing extraction of `--`-prefixed properties, I needed little
+more than a [trivial](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Style.hs#n91)
+`PropertyParser` decorator.
+
+## [Counters](https://www.w3.org/TR/css-counter-styles-3/)
+There's a [handful of CSS properties](https://www.w3.org/TR/CSS2/text.html#q16.0)
+which alter the text parsed from the HTML document, predominantly by including
+counters, which I use to render [`<ol>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/ol)
+elements or to generate marker labels for the arrow keys to jump to.
+
+To implement these I added a [`StyleTree`](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/StyleTree.hs)
+abstraction to hold the relationship between all parsed `PropertyParser` style
+objects & aid tree traversals. From there I implemented a [second](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Preprocessor/Text.hs#n31)
+`PropertyParser` decorator with two tree traversals:
+[one](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Preprocessor/Text.hs#n179)
+to collapse whitespace & the [other](https://git.adrian.geek.nz/haskell-stylist.git/tree/src/Data/CSS/Preprocessor/Text.hs#n112)
+to track counter values before substituting them (as strings) in-place of any
+[`counter()`](https://www.w3.org/TR/CSS2/generate.html#counter-styles) or
+[`counters()`](https://developer.mozilla.org/en-US/docs/Web/CSS/counters()) functions.
+
+## [`url()`](https://www.w3.org/TR/CSS2/syndata.html#uri)
+In most browser engines any resource references (via the `url()` function, which
+incidentally requires special effort to lex correctly & resolve any relative links)
+are resolved after the page has been fully styled. I opted to do this prior to
+styling instead, as a privacy measure I found just as easy to implement as it
+would have been not to.
+
+Granted, this does lead to impaired functionality of the
+[`style`](https://www.w3.org/TR/html401/present/styles.html#h-14.2.2)
+attribute, but please don't use that anyways!
+
+This was implemented as a pair of `StyleSheet` implementations: one to extract
+relevant URLs from the stylesheet, and the other to substitute in the filepaths
+where they were downloaded. eSpeak NG will parse these
+[`.wav`](http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html)
+files when it's ready to play these sound effects.
+
+## [CSS Inheritance](https://www.w3.org/TR/CSS2/cascade.html#inheritance)
+Future browser engines of mine will handle this differently, but for Rhapsode I
+simply reformat the style tree into an [SSML document](https://www.w3.org/TR/speech-synthesis/)
+to hand straight to [eSpeak NG](https://adrian.geek.nz/docs/espeak.html).
+
+[eSpeak NG](http://espeak.sourceforge.net/ssml.html) (running in-process) will
+then parse this XML with the aid of a stack, converting it into control codes
+within the text its later stages will gradually convert to sound.
+
+---
+
+While all this *is* useful to webdevs wanting to give a special feel to their
+webpages (which, within reason, I don't object to), my main incentive to implement
+CSS was for my own sake in designing Rhapsode's
+[useragent stylesheet](https://git.adrian.geek.nz/rhapsode.git/tree/useragent.css).
+And that stylesheet takes advantage of most of the above.
+
+Sure there are features (like support for CSS variables or most pseudoclasses) I
+decided to implement just because they were easy, but the only thing I'd consider
+extra complexity beyond the needs of an auditory browser engine is media queries.
+But I'm sure I'll find a use for those in future browser engines.
+
+Otherwise all this code would have to be in Rhapsode in some form or other to
+give a better auditory experience than eSpeak NG can deliver itself!
diff --git a/_posts/2021-01-23-why-html.md b/_posts/2021-01-23-why-html.md
new file mode 100644
index 0000000..4c73421
--- /dev/null
+++ b/_posts/2021-01-23-why-html.md
@@ -0,0 +1,51 @@
+---
+layout: post
+title: Why (Mostly-)Standard HTTP/HTML/optional CSS?
+author: Adrian Cochrane
+date: 2021-01-23 15:20:24 +1300
+---
+
+[Modern](https://webkit.org/) [web](https://www.chromium.org/blink) [browsers](https://hg.mozilla.org/mozilla-central/) are massively complex [beasts](https://roytang.net/2020/03/browsers-http/), implementing an [ever-growing mountain](https://drewdevault.com/2020/03/18/Reckless-limitless-scope.html) of [supposedly-open standards](https://html.spec.whatwg.org/multipage/) few can [keep up with](https://css-tricks.com/the-ecological-impact-of-browser-diversity/). So why do I think I *can* do better whilst adhering to the [same standards](https://w3.org/)? Why do I think that's valuable?
+
+## Where's the complexity?
+[XML](https://www.w3.org/TR/2008/REC-xml-20081126/), [XHTML](https://www.w3.org/TR/xhtml1/), & [HTTP](https://tools.ietf.org/html/rfc2616) are all trivial to parse, with numerous [parsers](https://en.wikipedia.org/wiki/Category:XML_parsers) implemented in just about every programming language. HTML *wouldn't* be much worse if it weren't for WHATWG's [error recovery specification](https://html.spec.whatwg.org/multipage/parsing.html#tree-construction), which most webdevs *don't* seem to take advantage of anyways. [CSS](https://www.w3.org/Style/CSS/Overview.en.html) should be *optional* for web browsers to support, and most of the complexity there is better considered an inherent part of rendering international, resizable, formatted [text](https://gankra.github.io/blah/text-hates-you/), even if your OS is hiding this complexity from applications.
+
+If it's not there, *where* is the complexity? [Richtext layout](https://raphlinus.github.io/text/2020/10/26/text-layout.html) in arbitrarily-sized windows is one answer, which I think is disrespectful to want to [do away with](https://danluu.com/sounds-easy/). But unlike what some browser devs suggest, it isn't the full answer.
+
+Expecting [JavaScript](https://262.ecma-international.org/10.0/) to be [fast](https://bellard.org/quickjs/) yet [secure](https://webkit.org/blog/8048/what-spectre-and-meltdown-mean-for-webkit/) so you can [reshape](https://andregarzia.com/2020/03/private-client-side-only-pwas-are-hard-but-now-apple-made-them-impossible.html) a beautiful document publishing system into the application distribution platform you're upset your proprietary OS doesn't deliver *is* a massive source of ([intellectually](https://webkit.org/blog/10308/speculation-in-javascriptcore/)-[stimulating](https://v8.dev/blog/pointer-compression)) complexity. The 1990's-era over-engineered [object-oriented](https://web.archive.org/web/20010429235709/http://www.bluetail.com/~joe/vol1/v1_oo.html) representation of parsed HTML, which leaves nobody (including JavaScript optimizers) happy, is almost 200,000 lines of code in WebKit, and CSS barely cares about it. The [videoconferencing backend](https://www.w3.org/TR/webrtc/) they embed for [Google Hangouts](https://hangouts.google.com/) takes almost as much code as the rest of the browser!
+
+I can back up these claims both qualitatively & quantitatively.
+
+So yes, dropping JavaScript support makes a huge difference! Not worrying about parsing long-invalid HTML correctly makes a difference, not that we shouldn't [recover from errors](https://www.w3.org/2004/04/webapps-cdf-ws/papers/opera.html). Even moving webforms out-of-line from their embedding webpages, to simplify the user interaction to the point they can be accessed via the unusual human input devices that interest me, makes a difference, whilst stopping webdevs from [complaining](https://css-tricks.com/custom-styling-form-inputs-with-modern-css-features/) about OS-native controls clashing with their designs.
+
+There's lots and lots of feature bloat we can remove from web browsers before we jump ship to something [new](http://gopher.floodgap.com/overbite/relevance.html).
+
+## There's Valuable Writing Online
+To many [the web is now](https://thebaffler.com/latest/surfin-usa-bevins) just a handful of Silicon Valley giants, surrounded by newssites, etc. [begging you](https://invidiou.site/watch?v=OFRjZtYs3wY) to accept numerous popups before you can start reading. It's no wonder they want to burn it to the ground!
+
+But beneath all the skyscrapers and commercialization there's already a vast beautiful underbelly of [knowledge](http://www.perseus.tufts.edu/hopper/) and [entertainment](https://decoderringtheatre.com/). Writing that [deserves to be preserved](http://robinrendle.com/essays/newsletters.html). Pages that, for the most part, work perfectly fine in Rhapsode, as validated by manual testing.
+
+It is for this "[longtail](https://longtail.typepad.com/the_long_tail/2008/11/does-the-long-t.html)" I develop Rhapsode. I couldn't care less that I broke the "[fat head](https://facebook.com/)".
+
+## Links To Webapps
+A common argument in favour of jumping ship to, say, [Gemini](https://gemini.circumlunar.space/) (not that I dislike Gemini) is that on the existing web readers are bound to [frequently encounter links](https://gemini.circumlunar.space/docs/faq.html) to JavaScript-reliant sites. I think such arguments underestimate [how few sites](https://hankchizljaw.com/wrote/the-(extremely)-loud-minority/) are actually broken in browsers like [Rhapsode](https://rhapsode.adrian.geek.nz/), [Lynx](https://lynx.browser.org/), & [Dillo](https://www.dillo.org/). This damage is easily repairable with a little automation, which has already been done via content mirrors like [Nitter](https://nitter.net/) & [Invidious](https://invidio.us/).
+
+Rhapsode supports URL [redirection/blocking extensions](https://hackage.haskell.org/package/regex-1.1.0.0/docs/Text-RE-Tools-Edit.html) for this very reason, and I hope that its novelty leads people to forgive any other brokenness they encounter, rightfully blaming the website instead.
+
+## Why Not JavaScript?
+To be clear, I do not wish to demean anyone for using JavaScript. There are valid use cases you can't yet achieve any other way, *some* of which have enhanced the document web and which we should [find alternative, more declarative, ways](http://john.ankarstrom.se/replacing-javascript/) to preserve. I always like a [good visualization](https://www.joshworth.com/dev/pixelspace/pixelspace_solarsystem.html)! And there is a need for interactive apps to let people do more with computers than read what others have written.
+
+What I want long term is for JavaScript to leave the document web, and for the browsers' feature set (like [payments](https://tools.ietf.org/html/rfc8905) & [videocalls](https://tools.ietf.org/html/rfc3261#section-19.1)) to be [split between more apps](https://www.freedesktop.org/wiki/Distributions/AppStream/). If this means websites & webapps split into their own separate platforms, I'll be happy. Though personally I'd prefer to one-click-install beautiful [consistently-designed](https://elementary.io/docs/human-interface-guidelines) apps from the [elementary AppCenter](https://appcenter.elementary.io/)! And I want it to be reasonable to audit any software running on my computer.
+
+In part my complaint with JavaScript is that it's where most of the web's recent feature bloat has been landing. But I do think it was a mistake to allow websites to run [arbitrary computation](https://garbados.github.io/my-blog/browsers-are-a-mess.html) on the client. Sure, that computation is "[sandboxed](https://web.archive.org/web/20090424010915/http://www.google.com/googlebooks/chrome/small_26.html)", but that sandbox isn't as secure as we thought (eBay's been able to determine which ports you have open [citation needed]), especially given [hardware vulnerabilities](https://spectreattack.com/); its restrictions [are loosening](https://web.dev/usb/); & there's plenty of antifeatures you can add well within its bounds. JavaScript [degrades my experience](https://www.wired.com/2015/11/i-turned-off-javascript-for-a-whole-week-and-it-was-glorious/) on the web far more often than it enhances it.
+
+I want [standards](https://www.freedesktop.org/wiki/Specifications/) that give implementers [UI leeway](https://yewtu.be/watch?v=fPFdV-Z69Lo). JavaScript is not that!
+
+Even if I did implement JavaScript in Rhapsode, all that would accomplish is raise expectations impossibly high; practically none of those JavaScript-*reliant* websites (where they don't [block me outright](https://www.bleepingcomputer.com/news/google/google-now-bans-some-linux-web-browsers-from-their-services/)) would deliver a decent auditory UX. JavaScript isn't necessary for delivering a [great auditory UX](https://www.smashingmagazine.com/2020/12/making-websites-accessible/), only for repairing the damage from focusing exclusively on sighted readers.
+
+## Why CSS?
+Webdevs harm the [readability of their websites](https://css-tricks.com/reader-mode-the-button-to-beat/) via CSS frequently enough that most browsers offer a button to replace those stylesheets. So why do I want to let them continue?
+
+I don't. I want a working CSS engine for my own sake in designing Rhapsode's auditory experience, and to allow readers to repair broken websites in a familiar language. I think I can expose it to webdevs whilst minimizing the damage they can do, by e.g. not supporting overlays & enforcing minimum text contrast in visual browsers. For Rhapsode I prevent websites from overriding the "dading" used to indicate links you can repeat back for it to follow.
+
+Regardless, I believe CSS should be *optional*. Web browsers shouldn't *have to* implement CSS. Websites shouldn't *have to* provide CSS for their pages to be legible on modern monitors. And users must be able to switch stylesheets if the current one doesn't work for them.
diff --git a/_posts/2021-06-13-voice2json.md b/_posts/2021-06-13-voice2json.md
new file mode 100644
index 0000000..f1a50e1
--- /dev/null
+++ b/_posts/2021-06-13-voice2json.md
@@ -0,0 +1,161 @@
+---
+layout: post
+title: Voice Input Supported in Rhapsode 5!
+author: Adrian Cochrane
+date: 2021-06-13T16:10:28+12:00
+---
+Not only can Rhapsode read pages aloud to you via [eSpeak NG](https://github.com/espeak-ng/espeak-ng)
+and its [own CSS engine](/2020/11/12/css.html), but now you can speak aloud to *it* via
+[Voice2JSON](https://voice2json.org/)! All without trusting or relying upon any
+[internet services](https://www.gnu.org/philosophy/who-does-that-server-really-serve.html),
+except of course for [bog-standard](https://datatracker.ietf.org/doc/html/rfc7230)
+webservers to download your requested information from. Thereby completing my
+[vision](/2020/10/31/why-auditory.html) for Rhapsode's reading experience!
+
+This speech recognition can be triggered either by pressing the space key or by calling Rhapsode's name
+(okay, by saying "Hey Mycroft", because I haven't bothered to train a custom wakeword).
+
+## Thank you Voice2JSON!
+Voice2JSON is **exactly** what I want from a speech-to-text engine!
+
+Across its 4 backends (CMU [PocketSphinx](https://github.com/cmusphinx/pocketsphinx),
+Dan Povey's [Kaldi](https://kaldi-asr.org/), Mozilla [DeepSpeech](https://github.com/mozilla/DeepSpeech),
+& Kyoto University's [Julius](https://github.com/julius-speech/julius)) it supports
+*18* human languages! I always like to see more language support, but *this is impressive*.
+
+I can feed it whatever (lightly-preprocessed) random phrases I find in link elements, etc.
+to use as voice commands, even feeding it different commands for every webpage, including
+unusual words.
+
+It operates entirely on your device, only using the internet initially to download
+an appropriate profile for your language.
+
+And when I implement webforms, its slots feature will be **invaluable**.
+
+The only gotcha is that I needed to also add a [JSON parser](https://hackage.haskell.org/package/aeson)
+to Rhapsode's dependencies.
+
+## Mechanics
+To operate Voice2JSON you rerun [`voice2json train-profile`](http://voice2json.org/commands.html#train-profile)
+every time you edit [`sentences.ini`](http://voice2json.org/sentences.html) or
+any of its referenced files, to update the list of supported voice commands.
+This prepares a language model to guide the output of
+[`voice2json transcribe-stream`](http://voice2json.org/commands.html#transcribe-stream)
+or [`transcribe-wav`](http://voice2json.org/commands.html#transcribe-wav),
+whose output you'll probably pipe into
+[`voice2json recognize-intent`](http://voice2json.org/commands.html#recognize-intent)
+to determine which intent from `sentences.ini` it matches.
+
+If you want this voice recognition to be triggered by some wake word,
+run [`voice2json wait-wake`](http://voice2json.org/commands.html#wait-wake)
+to determine when that keyphrase has been said.
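+
+For a taste of the format, a hypothetical `sentences.ini` generated for a simple
+page might look something like this (illustrative, not Rhapsode's actual output):
+
+```ini
+[FollowLink]
+go home
+why an auditory browser
+how does css work
+
+[Navigate]
+go (back | forward){direction}
+```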
+
+### `voice2json train-profile`
+For every page Rhapsode outputs a `sentences.ini` file & runs `voice2json train-profile`
+to compile this mix of [INI](https://www.techopedia.com/definition/24302/ini-file) &
+[Java Speech Grammar Format](https://www.w3.org/TR/jsgf/) syntax into an appropriate
+[NGram](https://blog.xrds.acm.org/2017/10/introduction-n-grams-need/)-based
+language model for the backend chosen by the
+[downloaded profile](https://github.com/synesthesiam/voice2json-profiles).
+
+Once it's parsed `sentences.ini`, Voice2JSON optionally normalizes the sentence casing and
+lowers any numeric ranges, slot references from external files or programs, & numeric digits
+via [num2words](https://pypi.org/project/num2words/), before reformatting it into a
+[NetworkX](https://pypi.org/project/networkx/) [graph](https://www.redblobgames.com/pathfinding/grids/graphs.html)
+with weighted edges. The resulting
+[Nondeterministic Finite Automaton](https://www.geeksforgeeks.org/%E2%88%88-nfa-of-regular-language-l-0100-11-and-l-b-ba/) (NFA)
+is [saved](https://docs.python.org/3/library/pickle.html) & [gzip](http://www.gzip.org/)'d
+to the profile before being lowered further to an [OpenFST](http://www.openfst.org/twiki/bin/view/FST/WebHome)
+graph which, with a handful of [opengrm](http://www.opengrm.org/twiki/bin/view/GRM/WebHome) commands,
+is converted into an appropriate language model.
+
+Whilst lowering the NFA to a language model, Voice2JSON looks up how to pronounce every unique
+word in that NFA, consulting [Phonetisaurus](https://github.com/AdolfVonKleist/Phonetisaurus)
+for any words the profile doesn't know about. Phonetisaurus in turn evaluates the word over a
+[Hidden Markov](https://www.jigsawacademy.com/blogs/data-science/hidden-markov-model) n-gram model.
+
+### `voice2json transcribe-stream`
+
+`voice2json transcribe-stream` pipes 16bit 16khz mono [WAV](https://datatracker.ietf.org/doc/html/rfc2361)s
+from a specified file or profile-configured record command
+(defaults to [ALSA](https://alsa-project.org/wiki/Main_Page))
+to the backend & formats its output sentences with metadata inside
+[JSON Lines](https://jsonlines.org/) objects. To determine when a voice command
+ends it uses some sophisticated code [extracted](https://pypi.org/project/webrtcvad/)
+from *the* WebRTC implementation (from Google).
+
+That 16khz audio sampling rate is interesting: it's far below the 44.1khz sampling
+rate typical for digital audio. Presumably this reduces the computational load
+whilst preserving the frequencies
+(max 8khz per [Nyquist-Shannon](https://invidio.us/watch?v=cIQ9IXSUzuM))
+typical of human speech.
+
+### `voice2json recognize-intent`
+
+To match this output against the grammar defined in `sentences.ini`, Voice2JSON provides
+the `voice2json recognize-intent` command. This reads back in the compressed
+NetworkX NFA to find the best path, fuzzily or not, matching each input sentence via
+[depth-first search](https://www.techiedelight.com/depth-first-search).
+Once it has that path it iterates over it to resolve & capture:
+
+1. Substitutions
+2. Conversions
+3. Tagged slots
+
+The resulting information from each of these passes is gathered & output as JSON Lines.
+
+In Rhapsode I apply a further fuzzy match, the same one I've always used for keyboard input,
+via [Levenshtein Distance](https://devopedia.org/levenshtein-distance).
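+
+That distance is simple enough to sketch in full; a textbook dynamic-programming
+implementation (not necessarily the exact code Rhapsode ships) looks like:
+
+```haskell
+-- Classic edit distance, computed one row of the DP table at a time.
+levenshtein :: String -> String -> Int
+levenshtein a b = last (foldl step [0 .. length a] b)
+  where
+    -- Build the next row from the previous one & the next character of b.
+    step row c = scanl compute (head row + 1) (zip3 a row (tail row))
+      where compute left (ch, diag, up) =
+              minimum [left + 1, up + 1, diag + if ch == c then 0 else 1]
+```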
+
+### `voice2json wait-wake`
+
+To trigger Rhapsode to recognize a voice command you can either press a key
+or, to stick to pure voice control, say a wakeword