27 December 2020 ; tagged in: advent of code , haskell

Advent of Code 2020 day 18

Going back to the source for parser combinators

Advent of Code 2020 day 18

Day 18 was all about the parsing, rather than the coding. It also meant going back to the original papers that prompted the development of attoparsec and other parsing libraries.

Data structures

This is about representing arithmetic expressions then evaluating them. As I didn't know what part 2 would bring, I thought I'd develop a data structure to hold a parsed expression, then evaluate it in a separate stage. It's just an algebraic type to hold the different operators. There's no need for a "bracket" option, as the final expression precedence is given by the shape of the Expression tree.

data Expression 
  = Number Integer
  | Mul Expression Expression
  | Add Expression Expression
  deriving (Eq)

instance Show Expression where
  show (Number x) = show x
  show (Mul a b) = "(" ++ show a ++ " * " ++ show b ++ ")"
  show (Add a b) = "(" ++ show a ++ " + " ++ show b ++ ")"

evaluate (Number x)  = x
evaluate (Mul a b) = (evaluate a) * (evaluate b)
evaluate (Add a b) = (evaluate a) + (evaluate b)

(In my code, I decided not to use a custom Show instance, as I wanted to be clear what the parsers were generating.

First attempts

The parsing and association rules need to turn an expression like 1 + 2 * 3 + 4 into an Expression tree like this:

     / \
    *   4
   / \
  +   3
 / \
1   2

A first attempt at parsing looked like this.

expressionP = mulExpression <|> addExpression
mulExpressionP = Mul <$> expressionP <* " * " *> elementP
addExpressionP = Add <$> expressionP <* " + " *> elementP

elementP = numberP <|> bracketedP
bracketedP = "(" *> exprP <* ")"

Which looks good, but fails instantly with an infinite loop. A little thought reveals why: the first + needs to be a few levels down in the tree. The parser needs to parse the whole expression in order to know how many levels down it needs to be. But the structure of the parser means it needs to make a decision now about this first +, before it's seen the rest of the expression. So it keeps creating more layers in the tree, hoping they'll be enough but never terminating.

Getting parsing working

This problem was solved in the original paper that introduced parser combinators. That paper illustrated the use of combinators with a calculator example, correctly parsing expressions like 1 + 2 + 3 into (1 + 2) + 3, which is what we want.

(Hutton, G., Meijer,  E. (1998) “Monadic parsing in Haskell,” Journal of Functional Programming 8(4): 437–444).

The trick is to use the chainl1 combinator to chain, from the left, a bunch of operations. It's easier to use a more Applicative style for this combinator (Applicative hadn't been identified in 1998!), as described by Vibhav Sagar and Stephen Diehl. I'm slightly prefer Diehl's verison, so I'm using it here.

mulExprP = string " * " *> pure Mul
addExprP = string " + " *> pure Add

expessionrP = elementP `chainl1` (mulExprP <|> addExprP)

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do a <- p
                    rest a
  where rest a = (do f <- op
                     b <- p
                     rest (f a b))
                 <|> return a

In this instance of the chainl1 parser, it first reads an element (a number or a bracketed term) and calls it a. If it can read the op (e.g. *), it combines a and the op with the result of parsing the rest of the input. If parsing rest fails, it just returns a.

That gives us the parse we want.

(chainl1 exists in the Text.ParserCombinators.ReadP library, but its somehow missing from attoparsec. But I'm using attoparsec now, so I'll just reimplement it.)

Given the ability to parse inputs, evaluating them is easy.

main = 
  do  text <- TIO.readFile "data/advent18.txt"
      print $ part1 text

part1 = parseAndEval expressionsP

parseAndEval parser text = sumEval $ successfulParse parser text

sumEval = sum . map evaluate

Introducing precedence

Using operator precedence takes my code closer to the original Functional Pearl, even if the precedence rules are reversed from convention. All that's needed are a couple of changes to the parser, and new names to avoid confusion.

pExpressionsP = pExprP `sepBy` endOfLine

pExprP = pTermP `chainl1` mulExprP
pTermP = pElementP `chainl1` addExprP

pElementP = numberP <|> pBracketedP
pBracketedP = "(" *> pExprP <* ")"

That's all that's needed!


You can find the code here or on GitLab.