# [Year 2015 Day 8](https://adventofcode.com/2015/day/8) This day provides a more in depth introduction to writing effectful parsers, making use of state and non-determinism in our parsers. ```idris import Data.String import Data.Vect import Data.List.Lazy import Data.Either ``` ## Parsing A "Parser" is an effectful computation that has access to a list of chars as state, can throw exceptions of type `String`, and has non-determinism through the `Choose` effect, which consumes some of the state to potentially return a value. ### Basic operations Get the next character out of the parser state, updating the state to consume that character. ```idris nextChar : Has (State (List Char)) fs => Has (Except String) fs => Eff fs Char nextChar = do c :: rest <- get | [] => throw "End of input" put rest pure c ``` Attempt to parse a single character based on the supplied predicate, consuming the character from the state and throwing the provided error if the predicate does not hold over the consumed character. ```idris char : Has (State (List Char)) fs => Has (Except String) fs => (pred : Char -> Bool) -> (err : Char -> String) -> Eff fs Char char pred err = do c <- nextChar unless (pred c) (throw $ err c) pure c ``` Type alias for a parser ```idris Parser : (res : Type) -> Type Parser res = Eff [State (List Char), Except String, Choose, Logger] res ``` Parse 0+ of a thing, speculatively attempting to apply the given parser. In the event the supplied parser fails, catch the error, unwind changes to the state, and return an empty list, otherwise append the parsed element to the head of the returned list and recurse. The rewinding of the state on failure _could_ be handled implicitly by the effect stack if `Except` was structured a bit differently, but that's a topic for another day. ```idris many : (f : Parser t) -> Parser (List t) many f = do state <- get Just x <- lift $ catch (\ _ => pure Nothing) (map Just f) | Nothing => do put state pure [] map (x ::) $ many f ``` "Parse" the end of the input, returning a unit if we are at the end of input, throwing an error otherwise. ```idris endOfInput : Parser () endOfInput = do [] <- get | xs => throw "Expected end of input, state non empty: `\{pack xs}`" pure () ``` ### Character Classes Parse a single `"`, throwing an error if the current character is anything else. Returns a unit since there is only one possible character this parses, and this avoids us having to discard the character later in our `string` parser. ```idris quote : Parser () quote = do _ <- char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`") pure () ``` Parse any character except `\` or `"` ```idris normal : Parser Char normal = char (\x => not $ any (== x) (the (List _) ['\\', '"'])) (\x => "Expected normal, got `\{String.singleton x}`") ``` #### Escaped Characters Parse the character sequence `\"`, returning just the `"`. Despite the fact that can only return one possible character, like `quote` above, we return the parsed character, as we intend to provide all the escaped character parsers to the `oneOfM` combinator later. ```idris eQuote : Parser Char eQuote = do _ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`") char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`") ``` Parse the character sequence `\\`, returning just the second `\`. ```idris eSlash : Parser Char eSlash = do _ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`") char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`") ``` Convert a lowercase hexadecimal digit to its numerical value. ```idris fromHex : Char -> Int fromHex c = if ord c >= 97 then ord c - 87 else ord c - 48 ``` Parse a character sequence `\xAB`, where `AB` are hexadecimal digits, and decode the numerical value of `AB`, as a hexadecimal number, into its corresponding character. ```idris eHex : Parser Char eHex = do _ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`") _ <- char (== 'x') (\x => "Expected `x`, got `\{String.singleton x}`") [x, y] <- map (map $ fromHex . toLower) . sequence . Vect.replicate 2 $ (char isHexDigit (\x => "Expected hex digit, got `\{String.singleton x}`")) pure . chr $ x * 0x10 + y ``` Use the `oneOfM` combinator to combine our escaped character parsers into a single character class. `oneOfM` uses the `Choice` effect to try all of the provided parsers, conceptually in parallel, returning all of the results. ```idris escaped : Parser Char escaped = oneOfM $ the (List _) [eQuote, eSlash, eHex] ``` ### Top Level Class Combine our `normal` and `escaped` parsers into a single parser for non-quote characters. ```idris character : Parser Char character = oneOfM $ the (List _) [normal, escaped] ``` Parse a string literal by describing its layout as a quote (`"`), followed by any number of characters, then another quote, followed by the end of input. Return the characters between the outer quotes. ```idris string : Parser (List Char) string = do quote xs <- many character quote endOfInput pure xs ``` ### Running a parser Run a parser, with some debug logging, by peeling the parsing effects of of the type. The order is important here, remember that function composition "runs" right to left. We peel the state off the type first, so that we can get implicit "rewinding" of the state inside of our combinators, like `oneOfM`. For a full understanding of what's going on here, we need to see how the type signature changes as we peel effects off the type, you can uncomment the commented lines, and comment out the rest of the function, modifying the `let output =` line to follow along yourself, though the types you get may look a little different due to idris evaluating type alaises like `Eff` for you, I am keeping them in aliased forms, and excluding `Logger`, which doesn't impact the semantic of parsing, to keep the examples concise: - `runstate (unpack str) $ x` This produces a value with type `Eff [Except String, Choose] (List Char, List Char)`. In this tuple of values, the first element is the actual output of the parser, and the second element is the state after the parser has run. - `runExcept . runState (unpack str) $ x` This produces a value with type `Eff [Choose] (Either String (List Char, List Char))`. This corresponds to a computation that either returns our tuple of output and state from before, or an error. Important to note here is that, in the error case, we only return a `String` (our error type), our state is discarded. - `runChoose {f = LazyList} . runExcept . runState (unpack str) $ x` The `Choose` effect, works with any type that implements the `Alternative` interface, and the choice of type impacts the semantics. A full discussion of this is beyond the scope for today, but we chose to "run" `Choose` with `LazyList`, which effectively gives us an iterator over all the possible parsings of our input text. This produces a value with type `Eff [] (LazyList (Either String (List Char, List Char)))`. When we hit an application of `Choose`, like `oneOfM`, all possibilities will be tried and each one will be added to our output `LazyList`. Because this is a `LazyList`, and not a `List`, only values we actually inspect are generated, allowing parsing to terminate after the first successful parse without having to generate the rest of the list. Note that we have an independent possible state value for each slot in the list, this speaks to this effect stack, when run in this order, providing a sort of branching behavior for states, allowing different branches in the `Choose` effect to modify their state without impacting the state of other branches. From there, we run our `lazyRights` helper function over the outputs `LazyList` to discard parsing paths that result in an error, extract the first element of each tuple, get the head of the list, if one still exists, and use `pack` to convert the contents of the resulting `Maybe (List Char)` to a string. Then a little bit of debug logging, and return the output. ```idris runParser : Has Logger fs => Parser (List Char) -> String -> Eff fs $ Maybe String runParser x str = do info "Parsing: \{str}" -- let outputs = -- runChoose {f = LazyList} . runExcept . runState (unpack str) $ x -- ?parser_types outputs <- lift . runChoose {f = LazyList} . runExcept . runState (unpack str) $ x let output = map pack . head' . map fst . lazyRights $ outputs case output of Nothing => debug "Failed: \{show outputs}" Just y => do debug "Got: \{y}" trace "\{show outputs}" pure output where lazyRights : LazyList (Either a b) -> LazyList b lazyRights [] = [] lazyRights (Left _ :: xs) = lazyRights xs lazyRights (Right x :: xs) = x :: lazyRights xs ``` ## Escaping characters This is much more boring the the parsing, we just do simple recursive pattern matching on the characters of the provided string, escaping `"` and `\`, and surround the resulting string with quotes. ```idris escape : String -> String escape str = "\"\{pack . escape' . unpack $ str}\"" where escape' : List Char -> List Char escape' [] = [] escape' ('"' :: xs) = '\\' :: '"' :: escape' xs escape' ('\\' :: xs) = '\\' :: '\\' :: escape' xs escape' (x :: xs) = x :: escape' xs ``` ## Part Functions ### Part 1 Convert the inputs into a list of lines, traverse our parser over it, deal with possible failures by sequencing the `List (Maybe String)` into a `Maybe (List String)`, and the compute the difference in character counts. ```idris part1 : Eff (PartEff String) (Int, List String) part1 = do inputs <- map lines $ askAt "input" outputs <- traverse (runParser string) inputs Just outputs <- pure $ sequence outputs | _ => throw "Some strings failed to parse" let difference = sum $ zipWith (\x, y => strLength x - strLength y) inputs outputs pure (difference, inputs) ``` ### Part 2 Map our character escaping function over our input string and compute the difference in character counts. Make a little stop along the way to ensure that escape -> parse round trips without changing the content of the string. ```idris part2 : List String -> Eff (PartEff String) Int part2 inputs = do let outputs = map escape inputs Just reversed_outputs <- map sequence . traverse (runParser string) $ outputs | _ => throw "Reversing escaping of the inputs failed" unless (all id $ zipWith (==) inputs reversed_outputs) $ do debug . delay . joinBy "\n" . map show . filter (\(x, y, z) => x /= z) $ zip3 inputs outputs reversed_outputs throw "Parsed outputs were not identical to inputs" let difference = sum $ zipWith (\x, y => strLength y - strLength x) inputs outputs pure difference ```