Printf post time

This commit is contained in:
Nathan McCarty 2025-02-11 18:24:34 -05:00
parent 45dcb3d177
commit 585b2be5ab
2 changed files with 222 additions and 60 deletions

View file

@ -4,7 +4,7 @@
"hidden": false,
"idris": true,
"ipkg": "/home/nathan/Projects/Blog/projects/Idris/Idris.ipkg",
"posted-at": "2025-02-09T06:23:37.499533-05:00",
"posted-at": "2025-02-11T18:24:30.059245-05:00",
"slugs": [
],
"source": "/home/nathan/Projects/Blog/projects/Idris/src/LessMacrosMoreTypes/Printf.md",

View file

@ -7,51 +7,37 @@ import Data.List
import System
```
While C can provide convenient string formatting by having hideously memory
unsafe variadics, and dynamic languages, like python, can do the same while
being memory safe by not being type safe, many type safe languages, such as
Rust, are forced to provide such functionality through the use of a macro.
Dependently typed languages, like Idris, can provide a printf like formatting
interface, while maintaining both memory and type safety, without the need for
the macro. We will explore this by implementing a simplified version of `printf`
in Idris from scratch.
While C can provide "convenient" string formatting by having hideously unsafe variadics, and dynamic languages, like python, can do the same, many type safe languages, such as Rust, are forced to provide such functionality through the use of a macro. Dependently typed languages, like Idris, can provide a `printf` like formatting interface, while maintaining both memory and type safety, without the need for macros. We will explore this by implementing a simplified version of `printf` in Idris from scratch.
This article is inspired by an exercise from chapter 6 of
[Type Driven Development with Idris](https://www.manning.com/books/type-driven-development-with-idris),
and is written as a literate Idris file, with the source available
[here](https://git.stranger.systems/thatonelutenist/website/src/branch/trunk/projects/Idris/src/LessMacrosMoreTypes/Printf.md).
This article is inspired by an exercise from chapter 6 of [Type Driven Development with Idris](https://www.manning.com/books/type-driven-development-with-idris), and is written as a literate Idris file, with the source available [here](https://git.stranger.systems/thatonelutenist/website/src/branch/trunk/projects/Idris/src/LessMacrosMoreTypes/Printf.md).
## Gameplan
Our goal is to provide a printf function that can be called, much like it's C
equivalent:
Our goal is to provide a `printf` function that can be called much like it's C equivalent:
> [!NOTE]
> As this is a literate Idris document, and we haven't defined our `printf`
> function yet, we have to use a `failing` block to ask the compiler to check
> that this code parses, and syntax highlight it for us, but not attempt to
> actually compile it.
> [!Note]
> As this is a literate Idris document, and we haven't defined our `printf` function yet, we have to use a `failing` block to ask the compiler to check that this code parses, and syntax highlight it for us, but not attempt to actually compile it.
```idris
failing
printf "%s %d %2d" "hello" 1 2
example : String
example = printf "%s %d %2d" "hello" 1 2
```
Idris lacks a dedicated facility for variadics, but we can call functions in
type signatures, Idris allows us to manipulate types as first class values, and
we can use the runtime values of previous arguments to the function we are
defining as arguments to our type-level function.
Idris lacks a dedicated facility for variadics, but we can call functions in type signatures, which gives us an in for implementing them.
To get our variadic `printf` function, we can parse our format string into a
data structure, then pass that data structure into a type-level function that
calculates the rest of the type signature of our `printf` function based on its
contents.
Idris allows us to manipulate types as first class values, and we can use the runtime values of previous arguments to the function we are declaring as arguments to later type-level functions. These type-level functions can, furthermore, recurse into themselves to produce type signatures of varying length and content.
> [!TIP]
> Idris requires that these type level functions be [_total_](https://idris2.readthedocs.io/en/latest/tutorial/theorems.html#totality-checking) for them to be expanded at compile time, meaning they are known to the compiler to return a value in constant time for all possible inputs.
This allows us to produce variadic functions, so long as the number of arguments and their types can be determined from one of the arguments of the function before the variadic portion.
To get our variadic `printf` function, we can parse our format string into a data structure, then pass that data structure into a type-level function that calculates the rest of the type signature of our `printf` function based on contents of the format string.
## Parsing a Format String
First, we need a data structure to describe our format string. We define the
`Format` data type, with constructors for each of the format specifiers we will
be supporting, as well as a constructor to hold literal components.
First, we need a data structure to describe our format string. We define the `Format` data type, with constructors for each of the format specifiers we will be supporting, as well as a constructor to hold literal components.
```idris
data Format : Type where
@ -68,15 +54,10 @@ data Format : Type where
End : Format
```
We'll need to be able to parse numbers for our `PaddedNumber` constructor, so
we'll write a little helper function to handle that component of the parsing.
Simply keep pulling off characters, converting them to integers by shifting
their ordinal values.
We'll need to be able to parse numbers for our `PaddedNumber` constructor, so we'll write a little helper function to handle that component of parsing. We'll simply keep pulling off characters, converting them to integers by manipulating their ordinal values, until we hit a non-digit.
> [!WARNING]
> On Idris 2's chez scheme backend, these are unicode ordinals, where the digit
> characters are in numerical order, but the values are backend dependent and
> this code is not guaranteed to work properly on other backends.
> On Idris 2's chez scheme backend, these are unicode ordinals, where the digit characters are in numerical order, but the values are backend dependent and this code is not guaranteed to work properly on other backends.
```idris
parseNumber : (xs : List Char) -> (acc : Nat) -> (Nat, List Char)
@ -88,10 +69,7 @@ parseNumber (x :: xs) acc =
else (acc, x :: xs)
```
We'll also want another one to scoop up a literal into a `List Char`, continuing
until we hit the end of the string or the next `%`. This is optional, we could
have each literal consist of an individual char, but we will go ahead and group
them together.
We'll also want another one to scoop up a literal into a `List Char`, continuing until we hit the end of the string or the next `%`. This is optional, we could have each literal consist of an individual char, but we will go ahead and group them together.
```idris
parseLiteral : (xs : List Char) -> (List Char, List Char)
@ -102,10 +80,9 @@ parseLiteral (x :: xs) =
in (x :: literal, rest)
```
Parse our format string into our `Format` data structure. The specifics of the
parsing here aren't really material to the main point of this article, but we
use a basic pattern matching approach, calling into our helper functions as
appropriate.
Next, parse our format string into an instance of the `Format` data structure. The specifics of the parsing aren't really material to the main point of this article, but we use a basic pattern matching approach, calling into our helper functions as appropriate.
We return a `Maybe Format` to signal failure in the case of an invalid format string, and trigger an error when the programmer attempts to use an invalid format string.
```idris
parseFormat : (xs : List Char) -> Maybe Format
@ -143,7 +120,14 @@ parseFormat (x :: xs) =
## Calculating a Type From a Format String
```idris
Since our parser returns a `Maybe Format`, we will define two variants of our `PrintfType` type-level function. One that handles the inner `Format`, and another that handles a `Maybe Format`, returning a type signature that is impossible to satisfy in the `Nothing` case, and calling our first variant in the `Just` case.
The structure of this type-level function is rather simple, we just pattern match on our `Format` argument, and prepend the corresponding argument type to the rest of the type signature, computed through a recursive call on the `next` component of the `Format`. When we hit the `End`, we return the `String` type, which is the final return value of our function.
> [!TIP]
> The `->` that appears in the body of the function is actually the exact same `->` that appears in the type signature, in both cases it is the constructor for function types. Idris uses the same language for type signatures and for function bodies, so anything you can use in one place, you can use in the other.
```idris
PrintfType' : Format -> Type
PrintfType' (Number next) =
(num : Nat) -> PrintfType' next
@ -154,7 +138,9 @@ PrintfType' (Str next) =
PrintfType' (Literal literal next) =
PrintfType' next
PrintfType' End = String
```
```idris
PrintfType : Maybe Format -> Type
PrintfType Nothing = Void -> String
PrintfType (Just x) = PrintfType' x
@ -173,6 +159,15 @@ left_pad len pad str =
else str
```
Now, we need to be able to produce our formatting function from a `Maybe Format`. We'll use the same two variants trick as above, but put the variant that operates directly on `Format`s in a `where` block, since we won't need to call it from anywhere else.
We build the function argument by argument, pattern matching on the `Format`, producing a lambda that accepts the argument type demanded by the current specifier. Then recurse into our inner `printfFmt'` function to handle the rest of the specifiers and arguments, applying the appropriate string conversion to the argument and tacking it to the end of the accumulator string.
The accumulator string is returned verbatim when we hit `End`.
> [!NOTE]
> Much like Haskell and other functional programming languages, Idris, semantically, only actually has functions of one argument. Multi-argument functions actually accept one argument, returning another function that handles the rest of the arguments.
```idris
printfFmt : (fmt : Maybe Format) -> (acc : String) -> PrintfType fmt
printfFmt Nothing acc =
@ -193,13 +188,17 @@ printfFmt (Just x) acc = printfFmt' x acc
### With a Format String
`printf` is easily defined a simple wrapper, converting the provided string to a `Format` and passing it to `printfFmt`. After accepting the format string argument, we parse a `Maybe Format` from it, providing the parsed value to our `PrintfType` function to calculate the rest of our type signature.
> [!TIP]
> The `_` syntax that appears in this function introduces an anonymous hole. When the value of a hole is forced by type checking (resolved), the compiler fills in its value for you. Holes defined with `_` will result in an "unsolved holes" compile error if unresolved.
```idris
printf : (fmt : String) -> PrintfType (parseFormat (unpack fmt))
printf fmt = printfFmt _ ""
```
We can call `printf` as expected, with the number of and types of the arguments
being determined by the provided format string:
We can call `printf` as expected, with the number of and types of the arguments being determined by the provided format string:
```idris hide
-- @@test printf hello world
@ -208,28 +207,191 @@ helloWorld = do
pure $
```
<div class="unit-test">
```idris
printf "%s %s%s %3d %d" "Hello" "world" "!" 1 23 == "Hello world! 001 23"
```
</div>
It will even fail to compile if you attempt to provide arguments to an invalid
format string, which we can demonstrate by trying to apply a padding modifier to
a string specifier:
In fact, our `printf` is an improvement over a simple C-style printf[^1], it will fail to compile if you attempt to provide arguments to an invalid format string, which we can demonstrate by trying to apply a padding modifier to a string specifier:
> [!NOTE]
> `failing` blocks have an additional feature, they will trigger a compiler
> error if their contents _do_ compile successfully.
> [!TIP]
> `failing` blocks have an additional feature, they will trigger a compiler error if their contents _do_ compile successfully, so we can use them to verify that incorrect use of our function results in a compiler error.
```idris
failing
printf "Hello %s %3s" "world" "!"
invalidSpecifier : String
invalidSpecifier = printf "Hello %s %3s" "world" "!"
```
## Working With Run-Time Format Strings
It will also fail to compile if too many arguments are passed (in Idris, it isn't desirable to fail when passed too few arguments, due to partial application being a core idiom of the language), or the arguments are of an incorrect type:
```idris
failing
tooManyArguments : String
tooManyArguments = printf "%s %s" "Hello" "World" "!"
```
```idris
failing
wrongArgumentType : String
wrongArgumentType = printf "%s %s" "Hello" 1
```
## Working With Runtime Format Strings
Since we'll be talking about situations where the compiler doesn't have insight into the value of the format string, we'll define a little helper function to hide a value from the type checker.
> [!NOTE]
> The evaluator Idris uses for expanding terms at the type level does not have an IO context available to it, so round tripping through `IO` with `unsafePerformIO` results in the wrapped term being opaque during compile time.
```idris
blackbox : a -> a
blackbox x = unsafePerformIO $ pure x
```
Our `printf` function works just fine when provided a format string that's known at compile time, but as written, it isn't really usable with format strings that are provided via IO at runtime.
When the format string isn't known to the compiler at compile time, it can't expand the `PrintfFmt` function, leaving it unable to compute the rest of the type signature.
```idris
failing
runtimeFormatString : String
runtimeFormatString = printf (blackbox "%s %s") "Hello" "World"
```
We can construct a variant of `printf` that will work with runtime format strings by defining a custom, heterogenous list-like structure, and making a fallible version of our `printf` function based on this data structure:
```idris
data PrintfInput : Type where
INum : (d : Nat) -> (next : PrintfInput) -> PrintfInput
IStr : (s : String) -> (next : PrintfInput) -> PrintfInput
IEnd : PrintfInput
```
Now we need a `printfFmt` analog that works with our `PrintfInput` type. It accepts a `fmt : Maybe Format`, the corresponding function of type `PrintfType fmt`, and the argument inputs as a `PrintfInput` structure. We use the same two-variants trick as before to handle the `Maybe`, and the rest is basic pattern matching to make sure the type of the input matches up with the type of the format specifier, and passing that input in as the next argument to the formatting function.
```idris
printfFmtInput :
(fmt : Maybe Format) -> (fn : PrintfType fmt) -> (input : PrintfInput)
-> Maybe String
printfFmtInput Nothing fn input = Nothing
printfFmtInput (Just x) fn input = printfFmtInput' x fn input
where
printfFmtInput' :
(fmt' : Format) -> (fn : PrintfType' fmt') -> (input : PrintfInput)
-> Maybe String
-- Handle cases where the input type matches up with the formatter type
printfFmtInput' (Number next) fn (INum d y) =
printfFmtInput' next (fn d) y
printfFmtInput' (PaddedNumber digits next) fn (INum d y) =
printfFmtInput' next (fn d) y
printfFmtInput' (Str next) fn (IStr s y) =
printfFmtInput' next (fn s) y
printfFmtInput' (Literal literal next) fn input =
printfFmtInput' next fn input
-- Handle cases where the input type does not match up with the formatter type
printfFmtInput' End fn IEnd = Just fn
printfFmtInput' (Number next) fn _ = Nothing
printfFmtInput' (PaddedNumber digits next) fn _ = Nothing
printfFmtInput' (Str next) fn _ = Nothing
printfFmtInput' End fn _ = Nothing
```
Our `printfInput` function calls `printf` to produce the formatting function, then passes that formatting function to `printfFmtInput` along with the provided input structure.
```idris
printfInput : (fmt : String) -> (input : PrintfInput) -> Maybe String
printfInput fmt input = printfFmtInput _ (printf fmt) input
```
Now that it's all put together, we can now use `printfInput` with format strings that are generated via IO.
```idris hide
-- @@test dynamic printf
worksNow : IO Bool
worksNow = do
pure $
```
<div class="unit-test">
```idris
printfInput (blackbox "%s %s") (IStr "Hello" (IStr "World" IEnd)) == Just "Hello World"
```
</div>
Being fully honest, I'm not entirely sure what use cases this particular function would make sense for, but it provides a first taste of working with type signatures that depend on values that aren't known until runtime, which is a concept we will expand upon in future entries in this series.
## Conclusions
### Why this API isn't really a great idea
We've made a fun little function that works much like C's `printf` function or Rust's `println!`/`format!` macros, but without relying on magic variadic mechanisms that introduce unsafe behavior and require special, individualized support from the compiler to check, and without requiring macros.
<!-- Doesn't fail to compile if you don't pass any arguments -->
Avoiding using a macro also gives us a lot of neat bonuses, like `printf` automatically playing nice with partial application and currying, with no extra effort on our part.
```idris hide
-- @@test partial application
partialApplication : IO Bool
partialApplication = do
pure $
```
<div class="unit-test">
```idris
map (printf "Hello %s") ["Alice", "Bob"] == ["Hello Alice", "Hello Bob"]
```
</div>
However, the interface this function presents isn't _quite_ ideal, we could do a lot better.
For instance, since we make the type signature one that isn't possible to satisfy in the case where the format string is invalid, there isn't actually a compiler error if you provide an invalid format string, but don't provide any arguments.
This isn't a _huge_ deal, since the produced function isn't going to doing anything if you don't have any arguments, but it constitutes a minor annoyance. You can also quite easily end up surprising yourself later with this if you are making use of partial application.
```idris hide
BadExample : ?
BadExample =
```
<div class="warn-unit-test">
```idris
printf "Hello %s %3s"
```
</div>
Another annoyance here is the error messages, which are quite confusing and opaque, at least at the time of writing, taking another look at our `invalidSpecifer` failing example:
```idris
failing
invalidSpecifier : String
invalidSpecifier = printf "Hello %s %3s" "world" "!"
```
The error message we get if we try to have this outside of a failing block, at time of writing, reads like:
```
[ build ] 1/3: Building LessMacrosMoreTypes.Printf (src/LessMacrosMoreTypes/Printf.md)
[ build ] Error: While processing right hand side of invalidSpecifier. When unifying:
[ build ] String
[ build ] and:
[ build ] ?argTy -> String
[ build ] Mismatch between: String and ?argTy -> String.
[ build ]
[ build ] LessMacrosMoreTypes.Printf:359:20--359:53
[ build ] 355 | Another annoyance here is the error messages, which are quite confusing and opaque, at least at the time of writing, taking another look at our `invalidSpecifer` failing example:
[ build ] 356 |
[ build ] 357 | ```idris
[ build ] 358 | invalidSpecifier : String
[ build ] 359 | invalidSpecifier = printf "Hello %s %3s" "world" "!"
[ build ] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
This correctly identifies the `printf` call as the source of the error, but doesn't zero in on the format string or the specifier, and the error message here is just a completely generic one about mismatch during unification that doesn't really tell us about _what_ went wrong.
Some of the fault here lies on the compiler, Idris 2 is currently immature in a lot of ways, to be expected from a pre-1.0 compiler, and error messages are very much one of them, but the way we have written this function also isn't giving the compiler much help.
The need for a bespoke data structure for `PrintfInput` also isn't ideal.
We will delve in ways to rectify all of these complaints in future entries in this series.
[^1]: Ignoring for a second the special handling for format strings in modern compilers