~alcinnz/argonaut-constellation.org

ref: c404a268fe871dfed272a553daf9e27ac8221286 argonaut-constellation.org/_posts/2022-07-04-op-parsing.md -rw-r--r-- 6.3 KiB
c404a268 — Adrian Cochrane Add link to blog. 1 year, 11 months ago

#layout: post title: Operator Parsing in Haskell author: Adrian Cochrane date: 2022-07-04 19:48:15 +1200

Plenty has happened since I last blogged here, including banging my head against Haskell's text rendering libraries (hoped to be blogging about that...) & rejigging my plans for Rhapsode & similar browser engines. Now I'm actively taking steps towards turning Rhapsode into a decent browser you can install as an app from your package repositories, as opposed to a browser engine waiting for others to build upon. As part of that I'm commisioning art to have something to display onscreen!

This display will be delivered as a Parametric SVG file so I wrote some code to implement that SVG extension such that I can render it in a GTK UI. This code, written in Haskell to reuse the same XML parser I've been using & bridged over into Vala via their C foreign-function interfaces, lowers the parametric aspects before handing it off to GDK Pixbuf & rSVG.

It does so by traversing the XML tree (exposed as strings to Vala) interpreting every attribute with a supported namespace, generating a value to store in the corresponding unnamespaced attribute. Overriding the attribute if it already exists. The interpretor, where the bulk of the logic is, references a mapping of values populated from the Vala code.

#Lexing

Interpreting a formula is split into 3 phases: lex, parse, & evaluate. Lexing involves using reads, span, & character-checks to split the input text up into numbers, backtick-quoted strings, alphabetical identifiers, symbols, & equals-suffixed (comparison) symbols whilst ignoring whitespace. These all become strings again when parsed into an operator.

Lexing strings is more involved since:

  • They are where lexing errors can occur.
  • Parametric SVG specifies a templating syntax by which the textualized result of some subformula can be spliced into the quoted text.

To lex these string templates my implementation utilizes a co-recursive sublexer which treats this as syntactic sugar for a parenthesized # concatenation operator. The normal lexer handles } specially to hand the trailing text back to the template-string lexer.

#Parsing

Parsing is done using a Pratt/TDOP parser, which was a pleasure to write in Haskell! Since pattern-matching allowed me to more cleanly separate the layers-of-abstraction (parser vs lexer) than in a typical OO language, & Haskell's normal parsing weakness wasn't as much of a problem here.

A Top-Down Operator Precedance (TDOP) parser is a parsing technique very well suited to operator-heavy languages like mathematics (hence why I desugared template-strings!). It is centred around a core reentrant loop which parses an initial token & all subsequant tokens with a greater "binding power". Helper functions/methods are used to parse the initial token(s) (nud), parse subsequant tokens (led), & determine the binding power for those tokens (lbp):

	parse' rbp (tok:toks) = go $ nud tok toks
		where
			go (left, tok':toks')
				| rbp < lbp tok' = go $ led left tok' toks'
				| otherwise = (left, tok':toks')

This is paired with a helper function to simplify recursive calls:

	parseWith rbp toks cb = let (expr, toks') = parse' rbp toks in (cb expr, toks')

Usually the led function, & to a lesser degree the nud function, wraps a recursive call back into that core loop whilst constructing an AST (Abstract Syntax Tree). Though there are exceptions for literal values, variable names, & postfix operators like ! for computing factorials. Also parentheses messes with the binding powers & expects to be closed, whether used as a prefix or (for function calls) infix operator.

To close all in-progress loops infinite EOF (which has the lowest binding power, alongside close-paren) tokens are appended to the lexing results. Unknown operators also have zero precedance such that they'll stop the parser & allow for their syntax errors to be detected.

#Evaluation

That parser yields a binary tree to be postorder-traversed (if the parser didn't emit any Errors) to yield boolean, floating point number, or string value. Or (upon failed typechecks or unknown variables) nulls. A Haskell-datatype is defined that can hold any of those. Numbers can be converted to strings, whilst booleans & nulls yields further nulls instead.

Usually it calls one of two helper function to map each operator/function to the corresponding Haskell-equivalent. Though the ternary (condition ? iftrue : iffalse) operator is handled specially even if ? & : aren't by the parser. Same for the , (which I ensured was right-associative) & ( operators function calls consist of, these get restructured into a function name & list of pre-evaluated argument values to simplify the helper function.

The parser has minimal understanding of the different operators, mostly just enough to assign them a binding power.

#Conclusion

I don't know whether Parametric SVG is a feature I'll ever expose to webdevs, but I enjoyed implementing it. And it makes it easier to design programmatic art which'll both make Rhapsode's visuals more interesting & more accessible. I look forward to sharing these visuals once they are ready!

Atypically Rhapsode's experience has been designed audio-first, with visuals only added due to the requirements of most app platforms!