~alcinnz/argonaut-constellation.org

ref: af284e5567b05ddbf3a0045afd0f400ffc3a9453 argonaut-constellation.org/_posts/2022-07-04-op-parsing.md -rw-r--r-- 6.3 KiB
af284e55 — Adrian Cochrane Blog about my opposition to DRM. 1 year, 10 months ago
                                                                                
da1ec90f Adrian Cochrane
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
---
layout: post
title: Operator Parsing in Haskell
author: Adrian Cochrane
date: 2022-07-04 19:48:15 +1200
---

Plenty has happened since I last blogged here, including banging my head against Haskell's text rendering libraries (hoped to be blogging about that...) & rejigging my plans for Rhapsode & similar browser engines. Now I'm actively taking steps towards turning Rhapsode into a decent browser you can install as an app from your package repositories, as opposed to a browser engine waiting for others to build upon. As part of that I'm [commisioning art](https://www.artstation.com/betalars) to have something to display onscreen!

This display will be delivered as a [Parametric SVG](http://parametric-svg.js.org/) file so I wrote [some code](https://git.adrian.geek.nz/parametric-svg.git/tree/) to implement that SVG extension such that I can render it in a [GTK UI](https://gtk.org/). This code, written in [Haskell](https://www.haskell.org/) to reuse the [same XML parser](https://hackage.haskell.org/package/xml-conduit) I've been using & bridged over into [Vala](https://wiki.gnome.org/Projects/Vala) via their C foreign-function interfaces, lowers the parametric aspects before handing it off to [GDK Pixbuf](http://docs.gtk.org/gdk-pixbuf/) & [rSVG](https://wiki.gnome.org/Projects/LibRsvg).

It does so by traversing the XML tree (exposed as strings to Vala) interpreting every attribute with a supported namespace, generating a value to store in the corresponding unnamespaced attribute. Overriding the attribute if it already exists. The interpretor, where the bulk of the logic is, references a mapping of values populated from the Vala code.

## Lexing
Interpreting a formula is split into 3 phases: lex, parse, & evaluate. Lexing involves using [`reads`](https://hackage.haskell.org/package/base-4.16.1.0/docs/Prelude.html#v:reads), [`span`](https://hackage.haskell.org/package/base-4.16.1.0/docs/Prelude.html#v:span), & [character-checks](https://hackage.haskell.org/package/base-4.16.1.0/docs/Data-Char.html) to split the input text up into numbers, backtick-quoted strings, alphabetical identifiers, symbols, & equals-suffixed (comparison) symbols whilst ignoring whitespace. These all become strings again when parsed into an operator.

Lexing strings is more involved since:

* They are where lexing errors can occur.
* Parametric SVG specifies a templating syntax by which the textualized result of some subformula can be spliced into the quoted text.

To lex these string templates my implementation utilizes a co-recursive sublexer which treats this as syntactic sugar for a parenthesized `#` concatenation operator. The normal lexer handles `}` specially to hand the trailing text back to the template-string lexer.

## Parsing
Parsing is done using a [Pratt](http://crockford.com/javascript/tdop/tdop.html)/[TDOP](https://eli.thegreenplace.net/2010/01/02/top-down-operator-precedence-parsing/) parser, which was a pleasure to write in Haskell! Since [pattern-matching](https://hackingwithhaskell.com/basics/pattern-matching/) allowed me to more cleanly separate the layers-of-abstraction (parser vs lexer) than in a typical OO language, & Haskell's normal [parsing](https://two-wrongs.com/parser-combinators-parsing-for-haskell-beginners.html) weakness wasn't as much of a problem here.

A Top-Down Operator Precedance (TDOP) parser is a parsing technique very well suited to operator-heavy languages like mathematics (hence why I desugared template-strings!). It is centred around a core [reentrant](https://en.wikipedia.org/wiki/Reentrancy_(computing)) loop which parses an initial token & all subsequant tokens with a greater "binding power". Helper functions/methods are used to parse the initial token(s) (`nud`), parse subsequant tokens (`led`), & determine the binding power for those tokens (`lbp`):

		parse' rbp (tok:toks) = go $ nud tok toks
			where
				go (left, tok':toks')
					| rbp < lbp tok' = go $ led left tok' toks'
					| otherwise = (left, tok':toks')

This is paired with a helper function to simplify recursive calls:

		parseWith rbp toks cb = let (expr, toks') = parse' rbp toks in (cb expr, toks')

Usually the `led` function, & to a lesser degree the `nud` function, wraps a recursive call back into that core loop whilst constructing an AST (Abstract Syntax Tree). Though there are exceptions for literal values, variable names, & postfix operators like `!` for computing [factorials](https://www.cuemath.com/numbers/factorial/). Also parentheses messes with the binding powers & expects to be closed, whether used as a prefix or (for function calls) infix operator.

To close all in-progress loops infinite EOF (which has the lowest binding power, alongside close-paren) tokens are appended to the lexing results. Unknown operators also have zero precedance such that they'll stop the parser & allow for their syntax errors to be detected.

## Evaluation
That parser yields a [binary tree](https://www.programiz.com/dsa/binary-tree) to be [postorder-traversed](https://www.programiz.com/dsa/tree-traversal) (if the parser didn't emit any `Error`s) to yield boolean, floating point number, or string value. Or (upon failed typechecks or unknown variables) nulls. A Haskell-datatype is defined that can hold any of those. Numbers can be converted to strings, whilst booleans & nulls yields further nulls instead.

Usually it calls one of two helper function to map each operator/function to the corresponding Haskell-equivalent. Though the ternary (`condition ? iftrue : iffalse`) operator is handled specially even if `?` & `:` aren't by the parser. Same for the `,` (which I ensured was [right-associative](https://en.wikipedia.org/wiki/Operator_associativity)) & `(` operators function calls consist of, these get restructured into a function name & list of pre-evaluated argument values to simplify the helper function.

The parser has minimal understanding of the different operators, mostly just enough to assign them a binding power.

## Conclusion
I don't know whether Parametric SVG is a feature I'll ever expose to webdevs, but I enjoyed implementing it. And it makes it easier to design programmatic art which'll both make Rhapsode's visuals more interesting & more accessible. I look forward to sharing these visuals once they are ready!

Atypically Rhapsode's experience has been designed audio-first, with visuals only added due to the requirements of most app platforms!