351 lines
11 KiB
Markdown
351 lines
11 KiB
Markdown
# Year 2015 Day 8
|
|
|
|
This day provides a more in depth introduction to writing effectful parsers,
|
|
making use of state and non-determinism in our parsers.
|
|
|
|
<!-- idris
|
|
module Years.Y2015.Day8
|
|
|
|
import Control.Eff
|
|
|
|
import Data.Primitives.Interpolation
|
|
|
|
import Runner
|
|
-->
|
|
|
|
```idris
|
|
import Data.String
|
|
import Data.Vect
|
|
import Data.List.Lazy
|
|
import Data.Either
|
|
```
|
|
|
|
## Parsing
|
|
|
|
A "Parser" is an effectful computation that has access to a list of chars as
|
|
state, can throw exceptions of type `String`, and has non-determinism through
|
|
the `Choose` effect, which consumes some of the state to potentially return a
|
|
value.
|
|
|
|
### Basic operations
|
|
|
|
Get the next character out of the parser state, updating the state to consume
|
|
that character.
|
|
|
|
```idris
|
|
nextChar : Has (State (List Char)) fs => Has (Except String) fs => Eff fs Char
|
|
nextChar = do
|
|
c :: rest <- get
|
|
| [] => throw "End of input"
|
|
put rest
|
|
pure c
|
|
```
|
|
|
|
Attempt to parse a single character based on the supplied predicate, consuming
|
|
the character from the state and throwing the provided error if the predicate
|
|
does not hold over the consumed character.
|
|
|
|
```idris
|
|
char : Has (State (List Char)) fs => Has (Except String) fs =>
|
|
(pred : Char -> Bool) -> (err : Char -> String) -> Eff fs Char
|
|
char pred err = do
|
|
c <- nextChar
|
|
unless (pred c) (throw $ err c)
|
|
pure c
|
|
```
|
|
|
|
Type alias for a parser
|
|
|
|
```idris
|
|
Parser : (res : Type) -> Type
|
|
Parser res = Eff [State (List Char), Except String, Choose, Logger] res
|
|
```
|
|
|
|
Parse 0+ of a thing, speculatively attempting to apply the given parser. In the
|
|
event the supplied parser fails, catch the error, unwind changes to the state,
|
|
and return an empty list, otherwise append the parsed element to the head of the
|
|
returned list and recurse.
|
|
|
|
The rewinding of the state on failure _could_ be handled implicitly by the
|
|
effect stack if `Except` was structured a bit differently, but that's a topic
|
|
for another day.
|
|
|
|
```idris
|
|
many : (f : Parser t) -> Parser (List t)
|
|
many f = do
|
|
state <- get
|
|
Just x <- lift $ catch (\ _ => pure Nothing) (map Just f)
|
|
| Nothing => do
|
|
put state
|
|
pure []
|
|
map (x ::) $ many f
|
|
```
|
|
|
|
"Parse" the end of the input, returning a unit if we are at the end of input,
|
|
throwing an error otherwise.
|
|
|
|
```idris
|
|
endOfInput : Parser ()
|
|
endOfInput = do
|
|
[] <- get
|
|
| xs => throw "Expected end of input, state non empty: `\{pack xs}`"
|
|
pure ()
|
|
```
|
|
|
|
### Character Classes
|
|
|
|
Parse a single `"`, throwing an error if the current character is anything else.
|
|
Returns a unit since there is only one possible character this parses, and this
|
|
avoids us having to discard the character later in our `string` parser.
|
|
|
|
```idris
|
|
quote : Parser ()
|
|
quote = do
|
|
_ <- char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`")
|
|
pure ()
|
|
```
|
|
|
|
Parse any character except `\` or `"`
|
|
|
|
```idris
|
|
normal : Parser Char
|
|
normal =
|
|
char
|
|
(\x => not $ any (== x) (the (List _) ['\\', '"']))
|
|
(\x => "Expected normal, got `\{String.singleton x}`")
|
|
```
|
|
|
|
#### Escaped Characters
|
|
|
|
Parse the character sequence `\"`, returning just the `"`. Despite the fact that
|
|
can only return one possible character, like `quote` above, we return the parsed
|
|
character, as we intend to provide all the escaped character parsers to the
|
|
`oneOfM` combinator later.
|
|
|
|
```idris
|
|
eQuote : Parser Char
|
|
eQuote = do
|
|
_ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
|
|
char (== '"') (\x => "Expected `\"`, got `\{String.singleton x}`")
|
|
```
|
|
|
|
Parse the character sequence `\\`, returning just the second `\`.
|
|
|
|
```idris
|
|
eSlash : Parser Char
|
|
eSlash = do
|
|
_ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
|
|
char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
|
|
```
|
|
|
|
Convert a lowercase hexadecimal digit to its numerical value.
|
|
|
|
```idris
|
|
fromHex : Char -> Int
|
|
fromHex c =
|
|
if ord c >= 97
|
|
then ord c - 87
|
|
else ord c - 48
|
|
```
|
|
|
|
Parse a character sequence `\xAB`, where `AB` are hexadecimal digits, and decode
|
|
the numerical value of `AB`, as a hexadecimal number, into its corresponding
|
|
character.
|
|
|
|
```idris
|
|
eHex : Parser Char
|
|
eHex = do
|
|
_ <- char (== '\\') (\x => "Expected `\\`, got `\{String.singleton x}`")
|
|
_ <- char (== 'x') (\x => "Expected `x`, got `\{String.singleton x}`")
|
|
[x, y] <- map (map $ fromHex . toLower) .
|
|
sequence .
|
|
Vect.replicate 2 $
|
|
(char
|
|
isHexDigit
|
|
(\x => "Expected hex digit, got `\{String.singleton x}`"))
|
|
pure . chr $ x * 0x10 + y
|
|
```
|
|
|
|
Use the `oneOfM` combinator to combine our escaped character parsers into a
|
|
single character class. `oneOfM` uses the `Choice` effect to try all of the
|
|
provided parsers, conceptually in parallel, returning all of the results.
|
|
|
|
```idris
|
|
escaped : Parser Char
|
|
escaped = oneOfM $ the (List _) [eQuote, eSlash, eHex]
|
|
```
|
|
|
|
### Top Level Class
|
|
|
|
Combine our `normal` and `escaped` parsers into a single parser for non-quote
|
|
characters.
|
|
|
|
```idris
|
|
character : Parser Char
|
|
character = oneOfM $ the (List _) [normal, escaped]
|
|
```
|
|
|
|
Parse a string literal by describing its layout as a quote (`"`), followed by
|
|
any number of characters, then another quote, followed by the end of input.
|
|
Return the characters between the outer quotes.
|
|
|
|
```idris
|
|
string : Parser (List Char)
|
|
string = do
|
|
quote
|
|
xs <- many character
|
|
quote
|
|
endOfInput
|
|
pure xs
|
|
```
|
|
|
|
### Running a parser
|
|
|
|
Run a parser, with some debug logging, by peeling the parsing effects of of the
|
|
type. The order is important here, remember that function composition "runs"
|
|
right to left.
|
|
|
|
We peel the state off the type first, so that we can get implicit "rewinding" of
|
|
the state inside of our combinators, like `oneOfM`.
|
|
|
|
For a full understanding of what's going on here, we need to see how the type
|
|
signature changes as we peel effects off the type, you can uncomment the
|
|
commented lines, and comment out the rest of the function, modifying the
|
|
`let output =` line to follow along yourself, though the types you get may look
|
|
a little different due to idris evaluating type alaises like `Eff` for you, I am
|
|
keeping them in aliased forms, and excluding `Logger`, which doesn't impact the
|
|
semantic of parsing, to keep the examples concise:
|
|
|
|
- `runstate (unpack str) $ x`
|
|
|
|
This produces a value with type
|
|
`Eff [Except String, Choose] (List Char, List Char)`. In this tuple of values,
|
|
the first element is the actual output of the parser, and the second element
|
|
is the state after the parser has run.
|
|
|
|
- `runExcept . runState (unpack str) $ x`
|
|
|
|
This produces a value with type
|
|
`Eff [Choose] (Either String (List Char, List Char))`. This corresponds to a
|
|
computation that either returns our tuple of output and state from before, or
|
|
an error. Important to note here is that, in the error case, we only return a
|
|
`String` (our error type), our state is discarded.
|
|
|
|
- `runChoose {f = LazyList} . runExcept . runState (unpack str) $ x`
|
|
|
|
The `Choose` effect, works with any type that implements the `Alternative`
|
|
interface, and the choice of type impacts the semantics. A full discussion of
|
|
this is beyond the scope for today, but we chose to "run" `Choose` with
|
|
`LazyList`, which effectively gives us an iterator over all the possible
|
|
parsings of our input text.
|
|
|
|
This produces a value with type
|
|
`Eff [] (LazyList (Either String (List Char, List Char)))`. When we hit an
|
|
application of `Choose`, like `oneOfM`, all possibilities will be tried and
|
|
each one will be added to our output `LazyList`. Because this is a `LazyList`,
|
|
and not a `List`, only values we actually inspect are generated, allowing
|
|
parsing to terminate after the first successful parse without having to
|
|
generate the rest of the list.
|
|
|
|
Note that we have an independent possible state value for each slot in the
|
|
list, this speaks to this effect stack, when run in this order, providing a
|
|
sort of branching behavior for states, allowing different branches in the
|
|
`Choose` effect to modify their state without impacting the state of other
|
|
branches.
|
|
|
|
From there, we run our `lazyRights` helper function over the outputs `LazyList`
|
|
to discard parsing paths that result in an error, extract the first element of
|
|
each tuple, get the head of the list, if one still exists, and use `pack` to
|
|
convert the contents of the resulting `Maybe (List Char)` to a string. Then a
|
|
little bit of debug logging, and return the output.
|
|
|
|
```idris
|
|
runParser : Has Logger fs => Parser (List Char) -> String
|
|
-> Eff fs $ Maybe String
|
|
runParser x str = do
|
|
info "Parsing: \{str}"
|
|
-- let outputs =
|
|
-- runChoose {f = LazyList} . runExcept . runState (unpack str) $ x
|
|
-- ?parser_types
|
|
outputs <-
|
|
lift . runChoose {f = LazyList} . runExcept . runState (unpack str) $ x
|
|
let output = map pack . head' . map fst . lazyRights $ outputs
|
|
case output of
|
|
Nothing =>
|
|
debug "Failed: \{show outputs}"
|
|
Just y => do
|
|
debug "Got: \{y}"
|
|
trace "\{show outputs}"
|
|
pure output
|
|
where
|
|
lazyRights : LazyList (Either a b) -> LazyList b
|
|
lazyRights [] = []
|
|
lazyRights (Left _ :: xs) = lazyRights xs
|
|
lazyRights (Right x :: xs) = x :: lazyRights xs
|
|
```
|
|
|
|
## Escaping characters
|
|
|
|
This is much more boring the the parsing, we just do simple recursive pattern
|
|
matching on the characters of the provided string, escaping `"` and `\`, and
|
|
surround the resulting string with quotes.
|
|
|
|
```idris
|
|
escape : String -> String
|
|
escape str = "\"\{pack . escape' . unpack $ str}\""
|
|
where
|
|
escape' : List Char -> List Char
|
|
escape' [] = []
|
|
escape' ('"' :: xs) = '\\' :: '"' :: escape' xs
|
|
escape' ('\\' :: xs) = '\\' :: '\\' :: escape' xs
|
|
escape' (x :: xs) = x :: escape' xs
|
|
```
|
|
|
|
## Part Functions
|
|
|
|
### Part 1
|
|
|
|
Convert the inputs into a list of lines, traverse our parser over it, deal with
|
|
possible failures by sequencing the `List (Maybe String)` into a
|
|
`Maybe (List String)`, and the compute the difference in character counts.
|
|
|
|
```idris
|
|
part1 : Eff (PartEff String) (Int, List String)
|
|
part1 = do
|
|
inputs <- map lines $ askAt "input"
|
|
outputs <- traverse (runParser string) inputs
|
|
Just outputs <- pure $ sequence outputs
|
|
| _ => throw "Some strings failed to parse"
|
|
let difference =
|
|
sum $ zipWith (\x, y => strLength x - strLength y) inputs outputs
|
|
pure (difference, inputs)
|
|
```
|
|
|
|
### Part 2
|
|
|
|
Map our character escaping function over our input string and compute the
|
|
difference in character counts.
|
|
|
|
Make a little stop along the way to ensure that escape -> parse round trips
|
|
without changing the content of the string.
|
|
|
|
```idris
|
|
part2 : List String -> Eff (PartEff String) Int
|
|
part2 inputs = do
|
|
let outputs = map escape inputs
|
|
Just reversed_outputs <- map sequence . traverse (runParser string) $ outputs
|
|
| _ => throw "Reversing escaping of the inputs failed"
|
|
unless (all id $ zipWith (==) inputs reversed_outputs) $ do
|
|
debug . delay . joinBy "\n" . map show . filter (\(x, y, z) => x /= z) $
|
|
zip3 inputs outputs reversed_outputs
|
|
throw "Parsed outputs were not identical to inputs"
|
|
let difference =
|
|
sum $ zipWith (\x, y => strLength y - strLength x) inputs outputs
|
|
pure difference
|
|
```
|
|
|
|
<!-- idris
|
|
public export
|
|
day8 : Day
|
|
day8 = Both 8 part1 part2
|
|
-->
|