Rendering a webpage in Rhapsode takes little more than applying a useragent stylesheet, in addition to any installed userstyles and optionally author styles, to decide how the page's semantics should be communicated.
Once the CSS has been applied, Rhapsode sends the styled text to eSpeak NG to be converted into the sounds you hear. So how does Rhapsode apply that CSS?
Parser implementations differ mainly in what they implement rather than how. They repeatedly look at the next character(s) in the input stream to decide how to represent it in-RAM. Often there'll be a "lexing" step (for which I use Haskell CSS Syntax) to categorize consecutive characters into "tokens", thereby simplifying the main parser.
My choice to use Haskell, however, does change things a little. In Haskell there can be no side effects; all outputs must be returned. So in addition to the parsed tree, each part of the parser must return the rest of the text that still needs to be parsed by another sub-parser. This yields the type signature :: [Token] -> (a, [Token]), which is what lets Haskell combine these subparsers together in what's called "parser combinators".
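To make that signature concrete, here is a minimal sketch of two sub-parsers and one combinator. The Token type and every function name below are hypothetical stand-ins, not the types exported by Haskell CSS Syntax or used in Rhapsode.

```haskell
-- Hypothetical token type; the real lexer's is much richer.
data Token = Ident String | Colon | Semicolon | Whitespace
    deriving (Eq, Show)

-- A sub-parser returns its result alongside the tokens it did not consume.
type Parser a = [Token] -> (a, [Token])

-- Skip any leading whitespace tokens.
skipSpace :: Parser ()
skipSpace (Whitespace : toks) = skipSpace toks
skipSpace toks = ((), toks)

-- Read an identifier if one is next, consuming nothing otherwise.
ident :: Parser (Maybe String)
ident (Ident name : toks) = (Just name, toks)
ident toks = (Nothing, toks)

-- Run two sub-parsers in sequence, threading the leftover tokens through.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q toks =
    let (a, rest) = p toks
        (b, rest') = q rest
    in ((a, b), rest')

-- Parse the name of a declaration, e.g. the "pitch" in " pitch: high;".
propertyName :: Parser (Maybe String)
propertyName toks =
    let ((_, name), rest) = (skipSpace `andThen` ident) toks
    in (name, rest)
```

Because every sub-parser has the same shape, andThen never needs to know what it is gluing together.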
Once each style rule is parsed, a method is called on a StyleSheet "typeclass" to return a modified datastructure containing the new rule. And a different method is called to parse any at-rules.
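A rough sketch of what such a typeclass and a wrapping implementation could look like follows. The method names, the StyleRule record, and the use of plain strings instead of tokens are all assumptions made for illustration; the real typeclass differs in its details.

```haskell
-- Simplified stand-in for a parsed style rule; field names are hypothetical.
data StyleRule = StyleRule
    { selector :: String
    , properties :: [(String, String)]
    } deriving (Eq, Show)

-- Anything that can accumulate parsed rules. Each method returns a new value
-- rather than mutating the old one.
class StyleSheet s where
    addRule :: s -> StyleRule -> s
    -- Receives the at-rule's name and its (here unparsed) remainder.
    addAtRule :: s -> String -> String -> s
    addAtRule self _ _ = self -- by default, unknown at-rules are ignored

-- The simplest possible implementation: collect rules in source order.
newtype RuleList = RuleList [StyleRule] deriving (Eq, Show)

instance StyleSheet RuleList where
    addRule (RuleList rules) rule = RuleList (rules ++ [rule])

-- A wrapper that handles one concern (counting rules) and hands everything
-- on to the stylesheet it wraps.
data Counted s = Counted Int s deriving (Eq, Show)

instance StyleSheet s => StyleSheet (Counted s) where
    addRule (Counted n inner) rule = Counted (n + 1) (addRule inner rule)
    addAtRule (Counted n inner) name body = Counted n (addAtRule inner name body)
```

Stacking several such wrappers gives a pipeline where each layer stays small.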
Many of my StyleSheet implementations handle only certain aspects of CSS, handing off to another implementation to perform the rest.
For example most pseudoclasses (ignoring interactive aspects I have no plans to implement) can be re-written into simpler selectors. So I added a configurable StyleSheet decorator just to do that!
This pass also resolves any namespaces, and corrects :before & :after to be parsed as pseudoelements.
@import
CSS defines a handful of at-rules which can control whether contained style rules will be applied:
@document allows user & useragent stylesheets to apply style rules only for certain (X)HTML documents & URLs. An interesting Rhapsode-specific feature is @document unstyled, which applies only if no author styles have already been parsed.
@media applies its style rules only if the given media query evaluates to true. Whilst in Rhapsode only the speech or -rhapsode mediatypes are supported, I've implemented a full caller-extensible Shunting Yard interpreter.
@import fetches & parses the given URL if the given mediatype evaluates to true when you call loadImports. As a privacy protection for future browsers, callers may avoid hardware details leaking to the webserver by being more vague in this pass.
@supports applies style rules only if the given CSS property or selector syntax parses successfully.
Since media queries might need to be rechecked when, say, the window has been resized, @media (and downloaded @import) rules are resolved to populate a new StyleSheet implementation only when the resolve function is called. Though again this is overengineered for Rhapsode's uses: as it renders pages to an infinite auditory timeline instead of a window, media queries are barely useful here.
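For readers unfamiliar with the Shunting Yard algorithm, here is a minimal sketch of how it can evaluate a boolean media query. The QTok type and function names are hypothetical, the grammar is far simpler than real media queries, and the caller-supplied lookup function is just one way to make it "caller-extensible".

```haskell
data QTok = Feature String | And | Or | Not | Open | Close
    deriving (Eq, Show)

-- Operator precedence: higher binds tighter.
prec :: QTok -> Int
prec Not = 3
prec And = 2
prec Or = 1
prec _ = 0

-- Shunting yard: reorder infix tokens into reverse Polish notation.
-- "out" is built in reverse; "ops" is the operator stack, top first.
toRPN :: [QTok] -> [QTok]
toRPN = go [] []
  where
    go out ops [] = reverse out ++ filter (/= Open) ops
    go out ops (Feature f : ts) = go (Feature f : out) ops ts
    go out ops (Open : ts) = go out (Open : ops) ts
    go out ops (Close : ts) =
        let (popped, rest) = break (== Open) ops
        in go (reverse popped ++ out) (drop 1 rest) ts
    go out ops (Not : ts) = go out (Not : ops) ts -- prefix operator: just push
    go out ops (op : ts) =
        let (popped, rest) = span (\o -> o /= Open && prec o >= prec op) ops
        in go (reverse popped ++ out) (op : rest) ts

-- Evaluate the RPN with a value stack; the caller decides what each
-- feature or mediatype means.
evalRPN :: (String -> Bool) -> [QTok] -> Maybe Bool
evalRPN supports = go []
  where
    go [result] [] = Just result
    go _ [] = Nothing
    go stack (Feature f : ts) = go (supports f : stack) ts
    go (a : stack) (Not : ts) = go (not a : stack) ts
    go (b : a : stack) (And : ts) = go ((a && b) : stack) ts
    go (b : a : stack) (Or : ts) = go ((a || b) : stack) ts
    go _ _ = Nothing -- malformed query

-- A malformed query is treated as not matching.
evalQuery :: (String -> Bool) -> [QTok] -> Bool
evalQuery supports = (== Just True) . evalRPN supports . toRPN
```

An auditory browser could pass a lookup that only answers True for "speech" and "-rhapsode", while a visual one could consult the window size.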
Ultimately Rhapsode parses CSS style rules to be stored in a hashmap (or rather a Hash Array Mapped Trie) indexed under the right-most selector if any. This dramatically cuts down on how many style rules have to be considered for each element being styled.
So that for each element needing styling, it looks up just those style rules which match its name, attributes, IDs, and/or classes. However this only considers a single test from each rule's selector, so we need a…
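The indexing described above might look roughly like the following sketch. It uses Data.Map from the containers package rather than a Hash Array Mapped Trie, and the Test, Rule, and Index types are hypothetical simplifications.

```haskell
import qualified Data.Map.Strict as M

-- One simple test against an element (hypothetical, simplified model).
data Test = Tag String | Class String | Id String
    deriving (Eq, Ord, Show)

-- A rule reduced to a label plus the tests on its right-most element.
data Rule = Rule { name :: String, rightmost :: [Test] }
    deriving (Eq, Show)

data Index = Index { keyed :: M.Map Test [Rule], unkeyed :: [Rule] }

emptyIndex :: Index
emptyIndex = Index M.empty []

-- Choose a single test to file the rule under, preferring IDs, then classes.
pickKey :: [Test] -> Maybe Test
pickKey tests =
    case [t | t@(Id _) <- tests] ++ [t | t@(Class _) <- tests] ++ tests of
        (key : _) -> Just key
        [] -> Nothing

insertRule :: Rule -> Index -> Index
insertRule rule (Index byKey rest) = case pickKey (rightmost rule) of
    Just key -> Index (M.insertWith (++) key [rule] byKey) rest
    Nothing -> Index byKey (rule : rest) -- e.g. the universal selector

-- Candidate rules for an element described by its own tests. Each candidate
-- still needs its full selector evaluated afterwards.
candidates :: [Test] -> Index -> [Rule]
candidates element (Index byKey rest) =
    concatMap (\t -> M.findWithDefault [] t byKey) element ++ rest
```

Rules filed under unrelated tags, classes, or IDs are never even looked at.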
To truly determine whether an element matches a CSS selector, we need to actually evaluate that selector! I've implemented this in 3 parts:
.class to [class~=class].
Whether there's actually any compilation happening is another question for the Glasgow Haskell Compiler, but regardless I find it a convenient way to write and think about it.
Selectors are interpreted from right-to-left as that tends to short-circuit sooner, upon an alternate inversely-linked representation of the element tree parsed by XML Conduit.
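As an illustration of right-to-left matching over a tree whose nodes link to their parents, consider this sketch. The Element and Selector types are hypothetical and cover only the child and descendant combinators.

```haskell
-- An element that links to its parent rather than to its children.
data Element = Element
    { tagName :: String
    , classes :: [String]
    , parent :: Maybe Element
    }

type Test = Element -> Bool

-- A selector stored right-most part first: the tests for the element being
-- styled sit on the outside, the rest of the selector is nested inside.
data Selector
    = Simple [Test]              -- the left-most compound selector
    | Child [Test] Selector      -- then the parent must match the rest
    | Descendant [Test] Selector -- then some ancestor must match the rest

matches :: Selector -> Element -> Bool
matches (Simple tests) el = all ($ el) tests
matches (Child tests upper) el =
    all ($ el) tests && maybe False (matches upper) (parent el)
matches (Descendant tests upper) el =
    all ($ el) tests && any (matches upper) (ancestors el)

-- Walk the parent links up to the root.
ancestors :: Element -> [Element]
ancestors el = case parent el of
    Nothing -> []
    Just p -> p : ancestors p

hasTag :: String -> Test
hasTag t el = tagName el == t

hasClass :: String -> Test
hasClass c el = c `elem` classes el
```

If the element's own tests fail, && short-circuits and no ancestor is ever visited, which is the benefit of starting from the right.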
NOTE In webapp-capable browser engines querySelectorAll tends to use a slightly different selector interpreter because there we know the ancestor element. This makes it more efficient to interpret those selectors left-to-right.
Style rules should be sorted by a "selector specificity", which is computed by counting tests on IDs, classes, & tagnames, with ties broken by which comes first in the source code and whether the stylesheet came from the browser, user, or webpage.
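Counting tests maps neatly onto a tuple, since Haskell compares tuples left to right. The sketch below is a simplification with hypothetical types; it covers specificity and source order only, not stylesheet origin.

```haskell
import Data.List (sortOn)

data Test = Id String | Class String | Tag String
    deriving (Eq, Show)

-- (IDs, classes, tag names), compared left to right.
type Specificity = (Int, Int, Int)

specificity :: [Test] -> Specificity
specificity = foldr count (0, 0, 0)
  where
    count (Id _) (i, c, t) = (i + 1, c, t)
    count (Class _) (i, c, t) = (i, c + 1, t)
    count (Tag _) (i, c, t) = (i, c, t + 1)

-- Sort labelled rules from lowest to highest priority. sortOn is stable, so
-- rules of equal specificity keep their source order.
byPriority :: [(String, [Test])] -> [(String, [Test])]
byPriority = sortOn (specificity . snd)
```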
This is implemented as a decorator around the interpreter & (in turn) indexer.
Another decorator strips !important off the end of any relevant CSS property values, generating new style rules with higher priority.
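The splitting step could be sketched as below. This version works on raw strings for brevity, whereas a real implementation would work on tokens (and so also handle whitespace between the "!" and "important"); all names are hypothetical.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd, isSuffixOf)

-- Split a raw property value into (isImportant, valueWithoutTheFlag).
stripImportant :: String -> (Bool, String)
stripImportant raw
    | "!important" `isSuffixOf` map toLower trimmed =
        (True, dropWhileEnd isSpace (take (length trimmed - length "!important") trimmed))
    | otherwise = (False, trimmed)
  where
    trimmed = dropWhileEnd isSpace raw

-- Split one rule's declarations into (normal, important); the second list
-- would become a new rule with higher priority.
splitRule :: [(String, String)] -> ([(String, String)], [(String, String)])
splitRule props = (normal, important)
  where
    tagged =
        [ (flag, (key, value))
        | (key, raw) <- props
        , let (flag, value) = stripImportant raw
        ]
    normal = [p | (False, p) <- tagged]
    important = [p | (True, p) <- tagged]
```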
Once !important is stripped off, the embedding application is given a chance to validate whether the syntax is valid &, as such, whether it should participate in the CSS cascade. Invalid properties are discarded.
At the same time the embedding application can expand CSS shorthands into one or more longhand properties. E.g. convert border-left: thin solid black; into border-left-width: thin; border-left-style: solid; border-left-color: black;.
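A deliberately tiny sketch of such an expansion follows. The function name and signature are hypothetical, values are plain strings rather than tokens, and real border-left values may omit or reorder components, which this ignores.

```haskell
-- Expand a shorthand into longhands. Nothing means the value is invalid and
-- the declaration should be dropped from the cascade.
expand :: String -> [String] -> Maybe [(String, [String])]
expand "border-left" [width, style, colour]
    | width `elem` widths && style `elem` styles = Just
        [ ("border-left-width", [width])
        , ("border-left-style", [style])
        , ("border-left-color", [colour])
        ]
  where
    widths = ["thin", "medium", "thick"]
    styles = ["none", "solid", "dashed", "dotted"]
expand "border-left" _ = Nothing
expand name value = Just [(name, value)] -- longhands pass through unchanged
```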
This was trivial to implement! Once you have a list of style rules listed by specificity, just load all their properties into a hashmap & back!
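In sketch form, assuming the rules arrive sorted from lowest to highest priority and using Data.Map from the containers package as the hashmap stand-in:

```haskell
import qualified Data.Map.Strict as M

-- M.fromList keeps the last value given for a duplicated key, so
-- declarations from higher-priority rules overwrite earlier ones.
cascade :: [[(String, String)]] -> [(String, String)]
cascade rules = M.toList (M.fromList (concat rules))
```

For example, cascading a low-priority rule setting voice-pitch and voice-volume with a higher-priority rule that only sets voice-pitch keeps the first rule's volume and the second rule's pitch.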
Maybe I'll write a little blogpost about how many webdevs seem to be scared of the cascade…
After cascade, methods are called on a given PropertyParser to parse each longhand property into an in-memory representation that's easier to process. This typeclass also has useful decorators, though few are needed for the small handful of speech-related properties.
Haskell's pattern matching syntax makes the tedious work of parsing the sheer variety of CSS properties absolutely trivial. I didn't have to implement a DSL like other browser engines do! This is the reason why I chose Haskell!
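To give a flavour of what that looks like, here is a sketch in which each accepted syntax is one equation. The properties are loosely modelled on CSS Speech and heavily simplified; the Token type, the SpeechStyle record, and the longhand function are hypothetical, not Rhapsode's actual definitions.

```haskell
-- Hypothetical, simplified tokens: identifiers and numbers with a unit.
data Token = Ident String | Dimension Double String
    deriving (Eq, Show)

data Pitch = Hertz Double | PitchLevel String
    deriving (Eq, Show)

data SpeechStyle = SpeechStyle { volume :: String, pitch :: Pitch }
    deriving (Eq, Show)

-- Parse one longhand into the style record. Nothing rejects the declaration.
longhand :: SpeechStyle -> String -> [Token] -> Maybe SpeechStyle
longhand self "voice-volume" [Ident kw]
    | kw `elem` ["silent", "x-soft", "soft", "medium", "loud", "x-loud"] =
        Just self { volume = kw }
longhand self "voice-pitch" [Ident kw]
    | kw `elem` ["x-low", "low", "medium", "high", "x-high"] =
        Just self { pitch = PitchLevel kw }
longhand self "voice-pitch" [Dimension hz "Hz", Ident "absolute"]
    | hz > 0 = Just self { pitch = Hertz hz }
longhand _ _ _ = Nothing
```

Adding another accepted syntax means adding another equation, and anything unmatched falls through to the final catch-all.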
var()
In CSS3, any property prefixed with -- will participate in CSS cascade to specify what tokens the var() function should substitute in. If the property no longer parses successfully after this substitution it is ignored. A bit of a gotcha for webdevs, but makes it quite trivial for me to implement!
In fact, beyond prioritizing extraction of ---prefixed properties, I needed little more than a trivial PropertyParser decorator.
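The substitute-then-reparse idea can be sketched as follows. The Token type (with var() pre-lexed into a single Var token) and both function names are hypothetical, fallback values are not handled, and Data.Map comes from the containers package.

```haskell
import qualified Data.Map.Strict as M

-- Var "--x" stands for an already-lexed var(--x) call.
data Token = Ident String | Number Double | Var String
    deriving (Eq, Show)

-- Replace each var() with the tokens of the matching custom property.
-- Nothing if a referenced variable is undefined.
substitute :: M.Map String [Token] -> [Token] -> Maybe [Token]
substitute vars = fmap concat . mapM expandToken
  where
    expandToken (Var name) = M.lookup name vars
    expandToken tok = Just [tok]

-- Substitute, then run the wrapped parser on the result. A value that no
-- longer parses is simply dropped.
resolve :: ([Token] -> Maybe a) -> M.Map String [Token] -> [Token] -> Maybe a
resolve parse vars toks = substitute vars toks >>= parse
```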
There's a handful of CSS properties which alter the text parsed from the HTML document, predominantly by including counters. Which I use to render <ol> elements. Or to generate marker labels for the arrow keys to jump to.
To implement these I added a StyleTree abstraction to hold the relationship between all parsed PropertyParser style objects & aid tree traversals. From there I implemented a second PropertyParser decorator with two tree traversals: one to collapse whitespace & the other to track counter values before substituting them (as strings) in-place of any counter() or counters() functions.
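The counter-tracking traversal could be sketched like this. The Node and Content types are hypothetical, only counter-increment and counter() are modelled (no counter-reset scoping, no counters()), and Data.Map comes from the containers package.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M

data Content = Literal String | Counter String
    deriving (Eq, Show)

data Node = Node
    { increments :: [String] -- counters this node increments
    , content :: [Content]   -- generated text, possibly using counter()
    , children :: [Node]
    }

-- Walk the tree in document order, threading the counter values through and
-- replacing each counter() with its current value as a string.
render :: M.Map String Int -> Node -> (M.Map String Int, [String])
render counters (Node incs parts kids) = (counters'', text ++ concat kidTexts)
  where
    counters' = foldr (\name -> M.insertWith (+) name 1) counters incs
    text = [concatMap showPart parts | not (null parts)]
    showPart (Literal s) = s
    showPart (Counter name) = show (M.findWithDefault 0 name counters')
    (counters'', kidTexts) = mapAccumL render counters' kids
```

Because the updated counters are returned alongside the text, each sibling sees the values left behind by the one before it.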
url()
In most browser engines any resource references (via the url() function, which incidentally requires special effort to lex correctly & to resolve any relative links) are resolved after the page has been fully styled. I opted to do this prior to styling instead, as a privacy measure I found just as easy to implement as it would be not to do so.
Granted this does lead to impaired functionality of the style attribute, but please don't use that anyways!
This was implemented as a pair of StyleSheet implementations: one to extract relevant URLs from the stylesheet, and the other to substitute in the filepaths where they were downloaded. eSpeak NG will parse these .wav files when it's ready to play these sound effects.
Future browser engines of mine will handle this differently, but for Rhapsode I simply reformat the style tree into an SSML document to hand straight to eSpeak NG.
eSpeak NG (running in-process) will then parse this XML with the aid of a stack to convert it into control codes within the text its later stages will gradually convert to sound.
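As a very rough sketch of that reformatting step, a styled tree can be serialised into SSML by wrapping styled nodes in prosody elements. The Styled type and function names are hypothetical, and only prosody attributes are emitted here, which is far less than a real stylesheet would need.

```haskell
-- Hypothetical styled tree: text leaves, plus nodes carrying prosody
-- settings such as ("pitch", "high").
data Styled = Text String | Styled [(String, String)] [Styled]

-- Escape characters that are special in XML text and attribute values.
escape :: String -> String
escape = concatMap esc
  where
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc c = [c]

toSSML :: Styled -> String
toSSML (Text t) = escape t
toSSML (Styled [] kids) = concatMap toSSML kids -- nothing to say about style
toSSML (Styled attrs kids) =
    "<prosody" ++ concatMap attr attrs ++ ">" ++ concatMap toSSML kids ++ "</prosody>"
  where
    attr (k, v) = " " ++ k ++ "=\"" ++ escape v ++ "\""

-- Wrap the whole tree in SSML's root element.
document :: Styled -> String
document tree = "<speak>" ++ toSSML tree ++ "</speak>"
```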
While all this is useful to webdevs wanting to give a special feel to their webpages (which, within reason, I don't object to), my main incentive to implement CSS was for my own sake in designing Rhapsode's useragent stylesheet. And that stylesheet takes advantage of most of the above.
Sure there are features (like support for CSS variables or most pseudoclasses) I decided to implement just because they were easy, but the only thing I'd consider extra complexity beyond the needs of an auditory browser engine are media queries. But I'm sure I'll find a use for those in future browser engines.
Otherwise all this code would have to be in Rhapsode in some form or other to give a better auditory experience than eSpeak NG can deliver itself!