--- layout: post title: Operator Parsing in Haskell author: Adrian Cochrane date: 2022-07-04 19:48:15 +1200 --- Plenty has happened since I last blogged here, including banging my head against Haskell's text rendering libraries (hoped to be blogging about that...) & rejigging my plans for Rhapsode & similar browser engines. Now I'm actively taking steps towards turning Rhapsode into a decent browser you can install as an app from your package repositories, as opposed to a browser engine waiting for others to build upon. As part of that I'm [commisioning art](https://www.artstation.com/betalars) to have something to display onscreen! This display will be delivered as a [Parametric SVG](http://parametric-svg.js.org/) file so I wrote [some code](https://git.adrian.geek.nz/parametric-svg.git/tree/) to implement that SVG extension such that I can render it in a [GTK UI](https://gtk.org/). This code, written in [Haskell](https://www.haskell.org/) to reuse the [same XML parser](https://hackage.haskell.org/package/xml-conduit) I've been using & bridged over into [Vala](https://wiki.gnome.org/Projects/Vala) via their C foreign-function interfaces, lowers the parametric aspects before handing it off to [GDK Pixbuf](http://docs.gtk.org/gdk-pixbuf/) & [rSVG](https://wiki.gnome.org/Projects/LibRsvg). It does so by traversing the XML tree (exposed as strings to Vala) interpreting every attribute with a supported namespace, generating a value to store in the corresponding unnamespaced attribute. Overriding the attribute if it already exists. The interpretor, where the bulk of the logic is, references a mapping of values populated from the Vala code. ## Lexing Interpreting a formula is split into 3 phases: lex, parse, & evaluate. Lexing involves using [`reads`](https://hackage.haskell.org/package/base-4.16.1.0/docs/Prelude.html#v:reads), [`span`](https://hackage.haskell.org/package/base-4.16.1.0/docs/Prelude.html#v:span), & [character-checks](https://hackage.haskell.org/package/base-4.16.1.0/docs/Data-Char.html) to split the input text up into numbers, backtick-quoted strings, alphabetical identifiers, symbols, & equals-suffixed (comparison) symbols whilst ignoring whitespace. These all become strings again when parsed into an operator. Lexing strings is more involved since: * They are where lexing errors can occur. * Parametric SVG specifies a templating syntax by which the textualized result of some subformula can be spliced into the quoted text. To lex these string templates my implementation utilizes a co-recursive sublexer which treats this as syntactic sugar for a parenthesized `#` concatenation operator. The normal lexer handles `}` specially to hand the trailing text back to the template-string lexer. ## Parsing Parsing is done using a [Pratt](http://crockford.com/javascript/tdop/tdop.html)/[TDOP](https://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/) parser, which was a pleasure to write in Haskell! Since [pattern-matching](https://hackingwithhaskell.com/basics/pattern-matching/) allowed me to more cleanly separate the layers-of-abstraction (parser vs lexer) than in a typical OO language, & Haskell's normal [parsing](https://two-wrongs.com/parser-combinators-parsing-for-haskell-beginners.html) weakness wasn't as much of a problem here. A Top-Down Operator Precedance (TDOP) parser is a parsing technique very well suited to operator-heavy languages like mathematics (hence why I desugared template-strings!). It is centred around a core [reentrant](https://en.wikipedia.org/wiki/Reentrancy_(computing)) loop which parses an initial token & all subsequant tokens with a greater "binding power". Helper functions/methods are used to parse the initial token(s) (`nud`), parse subsequant tokens (`led`), & determine the binding power for those tokens (`lbp`): parse' rbp (tok:toks) = go $ nud tok toks where go (left, tok':toks') | rbp < lbp tok' = go $ led left tok' toks' | otherwise = (left, tok':toks') This is paired with a helper function to simplify recursive calls: parseWith rbp toks cb = let (expr, toks') = parse' rbp toks in (cb expr, toks') Usually the `led` function, & to a lesser degree the `nud` function, wraps a recursive call back into that core loop whilst constructing an AST (Abstract Syntax Tree). Though there are exceptions for literal values, variable names, & postfix operators like `!` for computing [factorials](https://www.cuemath.com/numbers/factorial/). Also parentheses messes with the binding powers & expects to be closed, whether used as a prefix or (for function calls) infix operator. To close all in-progress loops infinite EOF (which has the lowest binding power, alongside close-paren) tokens are appended to the lexing results. Unknown operators also have zero precedance such that they'll stop the parser & allow for their syntax errors to be detected. ## Evaluation That parser yields a [binary tree](https://www.programiz.com/dsa/binary-tree) to be [postorder-traversed](https://www.programiz.com/dsa/tree-traversal) (if the parser didn't emit any `Error`s) to yield boolean, floating point number, or string value. Or (upon failed typechecks or unknown variables) nulls. A Haskell-datatype is defined that can hold any of those. Numbers can be converted to strings, whilst booleans & nulls yields further nulls instead. Usually it calls one of two helper function to map each operator/function to the corresponding Haskell-equivalent. Though the ternary (`condition ? iftrue : iffalse`) operator is handled specially even if `?` & `:` aren't by the parser. Same for the `,` (which I ensured was [right-associative](https://en.wikipedia.org/wiki/Operator_associativity)) & `(` operators function calls consist of, these get restructured into a function name & list of pre-evaluated argument values to simplify the helper function. The parser has minimal understanding of the different operators, mostly just enough to assign them a binding power. ## Conclusion I don't know whether Parametric SVG is a feature I'll ever expose to webdevs, but I enjoyed implementing it. And it makes it easier to design programmatic art which'll both make Rhapsode's visuals more interesting & more accessible. I look forward to sharing these visuals once they are ready! Atypically Rhapsode's experience has been designed audio-first, with visuals only added due to the requirements of most app platforms!