Printf post time
This commit is contained in:
parent
45dcb3d177
commit
585b2be5ab
2 changed files with 222 additions and 60 deletions
|
@ -4,7 +4,7 @@
|
|||
"hidden": false,
|
||||
"idris": true,
|
||||
"ipkg": "/home/nathan/Projects/Blog/projects/Idris/Idris.ipkg",
|
||||
"posted-at": "2025-02-09T06:23:37.499533-05:00",
|
||||
"posted-at": "2025-02-11T18:24:30.059245-05:00",
|
||||
"slugs": [
|
||||
],
|
||||
"source": "/home/nathan/Projects/Blog/projects/Idris/src/LessMacrosMoreTypes/Printf.md",
|
||||
|
|
|
@ -7,51 +7,37 @@ import Data.List
|
|||
import System
|
||||
```
|
||||
|
||||
While C can provide convenient string formatting by having hideously memory
|
||||
unsafe variadics, and dynamic languages, like python, can do the same while
|
||||
being memory safe by not being type safe, many type safe languages, such as
|
||||
Rust, are forced to provide such functionality through the use of a macro.
|
||||
Dependently typed languages, like Idris, can provide a printf like formatting
|
||||
interface, while maintaining both memory and type safety, without the need for
|
||||
the macro. We will explore this by implementing a simplified version of `printf`
|
||||
in Idris from scratch.
|
||||
While C can provide "convenient" string formatting by having hideously unsafe variadics, and dynamic languages, like python, can do the same, many type safe languages, such as Rust, are forced to provide such functionality through the use of a macro. Dependently typed languages, like Idris, can provide a `printf` like formatting interface, while maintaining both memory and type safety, without the need for macros. We will explore this by implementing a simplified version of `printf` in Idris from scratch.
|
||||
|
||||
This article is inspired by an exercise from chapter 6 of
|
||||
[Type Driven Development with Idris](https://www.manning.com/books/type-driven-development-with-idris),
|
||||
and is written as a literate Idris file, with the source available
|
||||
[here](https://git.stranger.systems/thatonelutenist/website/src/branch/trunk/projects/Idris/src/LessMacrosMoreTypes/Printf.md).
|
||||
This article is inspired by an exercise from chapter 6 of [Type Driven Development with Idris](https://www.manning.com/books/type-driven-development-with-idris), and is written as a literate Idris file, with the source available [here](https://git.stranger.systems/thatonelutenist/website/src/branch/trunk/projects/Idris/src/LessMacrosMoreTypes/Printf.md).
|
||||
|
||||
## Gameplan
|
||||
|
||||
Our goal is to provide a printf function that can be called, much like it's C
|
||||
equivalent:
|
||||
Our goal is to provide a `printf` function that can be called much like it's C equivalent:
|
||||
|
||||
> [!NOTE]
|
||||
> As this is a literate Idris document, and we haven't defined our `printf`
|
||||
> function yet, we have to use a `failing` block to ask the compiler to check
|
||||
> that this code parses, and syntax highlight it for us, but not attempt to
|
||||
> actually compile it.
|
||||
> [!Note]
|
||||
> As this is a literate Idris document, and we haven't defined our `printf` function yet, we have to use a `failing` block to ask the compiler to check that this code parses, and syntax highlight it for us, but not attempt to actually compile it.
|
||||
|
||||
```idris
|
||||
failing
|
||||
printf "%s %d %2d" "hello" 1 2
|
||||
example : String
|
||||
example = printf "%s %d %2d" "hello" 1 2
|
||||
```
|
||||
|
||||
Idris lacks a dedicated facility for variadics, but we can call functions in
|
||||
type signatures, Idris allows us to manipulate types as first class values, and
|
||||
we can use the runtime values of previous arguments to the function we are
|
||||
defining as arguments to our type-level function.
|
||||
Idris lacks a dedicated facility for variadics, but we can call functions in type signatures, which gives us an in for implementing them.
|
||||
|
||||
To get our variadic `printf` function, we can parse our format string into a
|
||||
data structure, then pass that data structure into a type-level function that
|
||||
calculates the rest of the type signature of our `printf` function based on its
|
||||
contents.
|
||||
Idris allows us to manipulate types as first class values, and we can use the runtime values of previous arguments to the function we are declaring as arguments to later type-level functions. These type-level functions can, furthermore, recurse into themselves to produce type signatures of varying length and content.
|
||||
|
||||
> [!TIP]
|
||||
> Idris requires that these type level functions be [_total_](https://idris2.readthedocs.io/en/latest/tutorial/theorems.html#totality-checking) for them to be expanded at compile time, meaning they are known to the compiler to return a value in constant time for all possible inputs.
|
||||
|
||||
This allows us to produce variadic functions, so long as the number of arguments and their types can be determined from one of the arguments of the function before the variadic portion.
|
||||
|
||||
To get our variadic `printf` function, we can parse our format string into a data structure, then pass that data structure into a type-level function that calculates the rest of the type signature of our `printf` function based on contents of the format string.
|
||||
|
||||
## Parsing a Format String
|
||||
|
||||
First, we need a data structure to describe our format string. We define the
|
||||
`Format` data type, with constructors for each of the format specifiers we will
|
||||
be supporting, as well as a constructor to hold literal components.
|
||||
First, we need a data structure to describe our format string. We define the `Format` data type, with constructors for each of the format specifiers we will be supporting, as well as a constructor to hold literal components.
|
||||
|
||||
```idris
|
||||
data Format : Type where
|
||||
|
@ -68,15 +54,10 @@ data Format : Type where
|
|||
End : Format
|
||||
```
|
||||
|
||||
We'll need to be able to parse numbers for our `PaddedNumber` constructor, so
|
||||
we'll write a little helper function to handle that component of the parsing.
|
||||
Simply keep pulling off characters, converting them to integers by shifting
|
||||
their ordinal values.
|
||||
We'll need to be able to parse numbers for our `PaddedNumber` constructor, so we'll write a little helper function to handle that component of parsing. We'll simply keep pulling off characters, converting them to integers by manipulating their ordinal values, until we hit a non-digit.
|
||||
|
||||
> [!WARNING]
|
||||
> On Idris 2's chez scheme backend, these are unicode ordinals, where the digit
|
||||
> characters are in numerical order, but the values are backend dependent and
|
||||
> this code is not guaranteed to work properly on other backends.
|
||||
> On Idris 2's chez scheme backend, these are unicode ordinals, where the digit characters are in numerical order, but the values are backend dependent and this code is not guaranteed to work properly on other backends.
|
||||
|
||||
```idris
|
||||
parseNumber : (xs : List Char) -> (acc : Nat) -> (Nat, List Char)
|
||||
|
@ -88,10 +69,7 @@ parseNumber (x :: xs) acc =
|
|||
else (acc, x :: xs)
|
||||
```
|
||||
|
||||
We'll also want another one to scoop up a literal into a `List Char`, continuing
|
||||
until we hit the end of the string or the next `%`. This is optional, we could
|
||||
have each literal consist of an individual char, but we will go ahead and group
|
||||
them together.
|
||||
We'll also want another one to scoop up a literal into a `List Char`, continuing until we hit the end of the string or the next `%`. This is optional, we could have each literal consist of an individual char, but we will go ahead and group them together.
|
||||
|
||||
```idris
|
||||
parseLiteral : (xs : List Char) -> (List Char, List Char)
|
||||
|
@ -102,10 +80,9 @@ parseLiteral (x :: xs) =
|
|||
in (x :: literal, rest)
|
||||
```
|
||||
|
||||
Parse our format string into our `Format` data structure. The specifics of the
|
||||
parsing here aren't really material to the main point of this article, but we
|
||||
use a basic pattern matching approach, calling into our helper functions as
|
||||
appropriate.
|
||||
Next, parse our format string into an instance of the `Format` data structure. The specifics of the parsing aren't really material to the main point of this article, but we use a basic pattern matching approach, calling into our helper functions as appropriate.
|
||||
|
||||
We return a `Maybe Format` to signal failure in the case of an invalid format string, and trigger an error when the programmer attempts to use an invalid format string.
|
||||
|
||||
```idris
|
||||
parseFormat : (xs : List Char) -> Maybe Format
|
||||
|
@ -143,7 +120,14 @@ parseFormat (x :: xs) =
|
|||
|
||||
## Calculating a Type From a Format String
|
||||
|
||||
```idris
|
||||
Since our parser returns a `Maybe Format`, we will define two variants of our `PrintfType` type-level function. One that handles the inner `Format`, and another that handles a `Maybe Format`, returning a type signature that is impossible to satisfy in the `Nothing` case, and calling our first variant in the `Just` case.
|
||||
|
||||
The structure of this type-level function is rather simple, we just pattern match on our `Format` argument, and prepend the corresponding argument type to the rest of the type signature, computed through a recursive call on the `next` component of the `Format`. When we hit the `End`, we return the `String` type, which is the final return value of our function.
|
||||
|
||||
> [!TIP]
|
||||
> The `->` that appears in the body of the function is actually the exact same `->` that appears in the type signature, in both cases it is the constructor for function types. Idris uses the same language for type signatures and for function bodies, so anything you can use in one place, you can use in the other.
|
||||
|
||||
```idris
|
||||
PrintfType' : Format -> Type
|
||||
PrintfType' (Number next) =
|
||||
(num : Nat) -> PrintfType' next
|
||||
|
@ -154,7 +138,9 @@ PrintfType' (Str next) =
|
|||
PrintfType' (Literal literal next) =
|
||||
PrintfType' next
|
||||
PrintfType' End = String
|
||||
```
|
||||
|
||||
```idris
|
||||
PrintfType : Maybe Format -> Type
|
||||
PrintfType Nothing = Void -> String
|
||||
PrintfType (Just x) = PrintfType' x
|
||||
|
@ -173,6 +159,15 @@ left_pad len pad str =
|
|||
else str
|
||||
```
|
||||
|
||||
Now, we need to be able to produce our formatting function from a `Maybe Format`. We'll use the same two variants trick as above, but put the variant that operates directly on `Format`s in a `where` block, since we won't need to call it from anywhere else.
|
||||
|
||||
We build the function argument by argument, pattern matching on the `Format`, producing a lambda that accepts the argument type demanded by the current specifier. Then recurse into our inner `printfFmt'` function to handle the rest of the specifiers and arguments, applying the appropriate string conversion to the argument and tacking it to the end of the accumulator string.
|
||||
|
||||
The accumulator string is returned verbatim when we hit `End`.
|
||||
|
||||
> [!NOTE]
|
||||
> Much like Haskell and other functional programming languages, Idris, semantically, only actually has functions of one argument. Multi-argument functions actually accept one argument, returning another function that handles the rest of the arguments.
|
||||
|
||||
```idris
|
||||
printfFmt : (fmt : Maybe Format) -> (acc : String) -> PrintfType fmt
|
||||
printfFmt Nothing acc =
|
||||
|
@ -193,13 +188,17 @@ printfFmt (Just x) acc = printfFmt' x acc
|
|||
|
||||
### With a Format String
|
||||
|
||||
`printf` is easily defined a simple wrapper, converting the provided string to a `Format` and passing it to `printfFmt`. After accepting the format string argument, we parse a `Maybe Format` from it, providing the parsed value to our `PrintfType` function to calculate the rest of our type signature.
|
||||
|
||||
> [!TIP]
|
||||
> The `_` syntax that appears in this function introduces an anonymous hole. When the value of a hole is forced by type checking (resolved), the compiler fills in its value for you. Holes defined with `_` will result in an "unsolved holes" compile error if unresolved.
|
||||
|
||||
```idris
|
||||
printf : (fmt : String) -> PrintfType (parseFormat (unpack fmt))
|
||||
printf fmt = printfFmt _ ""
|
||||
```
|
||||
|
||||
We can call `printf` as expected, with the number of and types of the arguments
|
||||
being determined by the provided format string:
|
||||
We can call `printf` as expected, with the number of and types of the arguments being determined by the provided format string:
|
||||
|
||||
```idris hide
|
||||
-- @@test printf hello world
|
||||
|
@ -208,28 +207,191 @@ helloWorld = do
|
|||
pure $
|
||||
```
|
||||
|
||||
<div class="unit-test">
|
||||
```idris
|
||||
printf "%s %s%s %3d %d" "Hello" "world" "!" 1 23 == "Hello world! 001 23"
|
||||
```
|
||||
</div>
|
||||
|
||||
It will even fail to compile if you attempt to provide arguments to an invalid
|
||||
format string, which we can demonstrate by trying to apply a padding modifier to
|
||||
a string specifier:
|
||||
In fact, our `printf` is an improvement over a simple C-style printf[^1], it will fail to compile if you attempt to provide arguments to an invalid format string, which we can demonstrate by trying to apply a padding modifier to a string specifier:
|
||||
|
||||
> [!NOTE]
|
||||
> `failing` blocks have an additional feature, they will trigger a compiler
|
||||
> error if their contents _do_ compile successfully.
|
||||
> [!TIP]
|
||||
> `failing` blocks have an additional feature, they will trigger a compiler error if their contents _do_ compile successfully, so we can use them to verify that incorrect use of our function results in a compiler error.
|
||||
|
||||
```idris
|
||||
failing
|
||||
printf "Hello %s %3s" "world" "!"
|
||||
invalidSpecifier : String
|
||||
invalidSpecifier = printf "Hello %s %3s" "world" "!"
|
||||
```
|
||||
|
||||
## Working With Run-Time Format Strings
|
||||
It will also fail to compile if too many arguments are passed (in Idris, it isn't desirable to fail when passed too few arguments, due to partial application being a core idiom of the language), or the arguments are of an incorrect type:
|
||||
|
||||
```idris
|
||||
failing
|
||||
tooManyArguments : String
|
||||
tooManyArguments = printf "%s %s" "Hello" "World" "!"
|
||||
```
|
||||
|
||||
```idris
|
||||
failing
|
||||
wrongArgumentType : String
|
||||
wrongArgumentType = printf "%s %s" "Hello" 1
|
||||
```
|
||||
|
||||
## Working With Runtime Format Strings
|
||||
|
||||
Since we'll be talking about situations where the compiler doesn't have insight into the value of the format string, we'll define a little helper function to hide a value from the type checker.
|
||||
|
||||
> [!NOTE]
|
||||
> The evaluator Idris uses for expanding terms at the type level does not have an IO context available to it, so round tripping through `IO` with `unsafePerformIO` results in the wrapped term being opaque during compile time.
|
||||
|
||||
```idris
|
||||
blackbox : a -> a
|
||||
blackbox x = unsafePerformIO $ pure x
|
||||
```
|
||||
|
||||
Our `printf` function works just fine when provided a format string that's known at compile time, but as written, it isn't really usable with format strings that are provided via IO at runtime.
|
||||
|
||||
When the format string isn't known to the compiler at compile time, it can't expand the `PrintfFmt` function, leaving it unable to compute the rest of the type signature.
|
||||
|
||||
```idris
|
||||
failing
|
||||
runtimeFormatString : String
|
||||
runtimeFormatString = printf (blackbox "%s %s") "Hello" "World"
|
||||
```
|
||||
|
||||
We can construct a variant of `printf` that will work with runtime format strings by defining a custom, heterogenous list-like structure, and making a fallible version of our `printf` function based on this data structure:
|
||||
|
||||
```idris
|
||||
data PrintfInput : Type where
|
||||
INum : (d : Nat) -> (next : PrintfInput) -> PrintfInput
|
||||
IStr : (s : String) -> (next : PrintfInput) -> PrintfInput
|
||||
IEnd : PrintfInput
|
||||
```
|
||||
|
||||
Now we need a `printfFmt` analog that works with our `PrintfInput` type. It accepts a `fmt : Maybe Format`, the corresponding function of type `PrintfType fmt`, and the argument inputs as a `PrintfInput` structure. We use the same two-variants trick as before to handle the `Maybe`, and the rest is basic pattern matching to make sure the type of the input matches up with the type of the format specifier, and passing that input in as the next argument to the formatting function.
|
||||
|
||||
```idris
|
||||
printfFmtInput :
|
||||
(fmt : Maybe Format) -> (fn : PrintfType fmt) -> (input : PrintfInput)
|
||||
-> Maybe String
|
||||
printfFmtInput Nothing fn input = Nothing
|
||||
printfFmtInput (Just x) fn input = printfFmtInput' x fn input
|
||||
where
|
||||
printfFmtInput' :
|
||||
(fmt' : Format) -> (fn : PrintfType' fmt') -> (input : PrintfInput)
|
||||
-> Maybe String
|
||||
-- Handle cases where the input type matches up with the formatter type
|
||||
printfFmtInput' (Number next) fn (INum d y) =
|
||||
printfFmtInput' next (fn d) y
|
||||
printfFmtInput' (PaddedNumber digits next) fn (INum d y) =
|
||||
printfFmtInput' next (fn d) y
|
||||
printfFmtInput' (Str next) fn (IStr s y) =
|
||||
printfFmtInput' next (fn s) y
|
||||
printfFmtInput' (Literal literal next) fn input =
|
||||
printfFmtInput' next fn input
|
||||
-- Handle cases where the input type does not match up with the formatter type
|
||||
printfFmtInput' End fn IEnd = Just fn
|
||||
printfFmtInput' (Number next) fn _ = Nothing
|
||||
printfFmtInput' (PaddedNumber digits next) fn _ = Nothing
|
||||
printfFmtInput' (Str next) fn _ = Nothing
|
||||
printfFmtInput' End fn _ = Nothing
|
||||
```
|
||||
|
||||
Our `printfInput` function calls `printf` to produce the formatting function, then passes that formatting function to `printfFmtInput` along with the provided input structure.
|
||||
|
||||
```idris
|
||||
printfInput : (fmt : String) -> (input : PrintfInput) -> Maybe String
|
||||
printfInput fmt input = printfFmtInput _ (printf fmt) input
|
||||
```
|
||||
|
||||
Now that it's all put together, we can now use `printfInput` with format strings that are generated via IO.
|
||||
|
||||
```idris hide
|
||||
-- @@test dynamic printf
|
||||
worksNow : IO Bool
|
||||
worksNow = do
|
||||
pure $
|
||||
```
|
||||
<div class="unit-test">
|
||||
```idris
|
||||
printfInput (blackbox "%s %s") (IStr "Hello" (IStr "World" IEnd)) == Just "Hello World"
|
||||
```
|
||||
</div>
|
||||
|
||||
Being fully honest, I'm not entirely sure what use cases this particular function would make sense for, but it provides a first taste of working with type signatures that depend on values that aren't known until runtime, which is a concept we will expand upon in future entries in this series.
|
||||
|
||||
|
||||
## Conclusions
|
||||
|
||||
### Why this API isn't really a great idea
|
||||
We've made a fun little function that works much like C's `printf` function or Rust's `println!`/`format!` macros, but without relying on magic variadic mechanisms that introduce unsafe behavior and require special, individualized support from the compiler to check, and without requiring macros.
|
||||
|
||||
<!-- Doesn't fail to compile if you don't pass any arguments -->
|
||||
Avoiding using a macro also gives us a lot of neat bonuses, like `printf` automatically playing nice with partial application and currying, with no extra effort on our part.
|
||||
|
||||
```idris hide
|
||||
-- @@test partial application
|
||||
partialApplication : IO Bool
|
||||
partialApplication = do
|
||||
pure $
|
||||
```
|
||||
<div class="unit-test">
|
||||
```idris
|
||||
map (printf "Hello %s") ["Alice", "Bob"] == ["Hello Alice", "Hello Bob"]
|
||||
```
|
||||
</div>
|
||||
|
||||
|
||||
However, the interface this function presents isn't _quite_ ideal, we could do a lot better.
|
||||
|
||||
For instance, since we make the type signature one that isn't possible to satisfy in the case where the format string is invalid, there isn't actually a compiler error if you provide an invalid format string, but don't provide any arguments.
|
||||
|
||||
This isn't a _huge_ deal, since the produced function isn't going to doing anything if you don't have any arguments, but it constitutes a minor annoyance. You can also quite easily end up surprising yourself later with this if you are making use of partial application.
|
||||
|
||||
```idris hide
|
||||
BadExample : ?
|
||||
BadExample =
|
||||
```
|
||||
|
||||
<div class="warn-unit-test">
|
||||
```idris
|
||||
printf "Hello %s %3s"
|
||||
```
|
||||
</div>
|
||||
|
||||
Another annoyance here is the error messages, which are quite confusing and opaque, at least at the time of writing, taking another look at our `invalidSpecifer` failing example:
|
||||
|
||||
```idris
|
||||
failing
|
||||
invalidSpecifier : String
|
||||
invalidSpecifier = printf "Hello %s %3s" "world" "!"
|
||||
```
|
||||
|
||||
The error message we get if we try to have this outside of a failing block, at time of writing, reads like:
|
||||
|
||||
```
|
||||
[ build ] 1/3: Building LessMacrosMoreTypes.Printf (src/LessMacrosMoreTypes/Printf.md)
|
||||
[ build ] Error: While processing right hand side of invalidSpecifier. When unifying:
|
||||
[ build ] String
|
||||
[ build ] and:
|
||||
[ build ] ?argTy -> String
|
||||
[ build ] Mismatch between: String and ?argTy -> String.
|
||||
[ build ]
|
||||
[ build ] LessMacrosMoreTypes.Printf:359:20--359:53
|
||||
[ build ] 355 | Another annoyance here is the error messages, which are quite confusing and opaque, at least at the time of writing, taking another look at our `invalidSpecifer` failing example:
|
||||
[ build ] 356 |
|
||||
[ build ] 357 | ```idris
|
||||
[ build ] 358 | invalidSpecifier : String
|
||||
[ build ] 359 | invalidSpecifier = printf "Hello %s %3s" "world" "!"
|
||||
[ build ] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
```
|
||||
|
||||
This correctly identifies the `printf` call as the source of the error, but doesn't zero in on the format string or the specifier, and the error message here is just a completely generic one about mismatch during unification that doesn't really tell us about _what_ went wrong.
|
||||
|
||||
Some of the fault here lies on the compiler, Idris 2 is currently immature in a lot of ways, to be expected from a pre-1.0 compiler, and error messages are very much one of them, but the way we have written this function also isn't giving the compiler much help.
|
||||
|
||||
The need for a bespoke data structure for `PrintfInput` also isn't ideal.
|
||||
|
||||
We will delve in ways to rectify all of these complaints in future entries in this series.
|
||||
|
||||
|
||||
[^1]: Ignoring for a second the special handling for format strings in modern compilers
|
||||
|
|
Loading…
Add table
Reference in a new issue