# Type Safe Variadic printf ```idris hide module LessMacrosMoreTypes.Printf import Data.List import System -- Only needed for addenda import Data.Vect ``` While C can provide "convenient" string formatting by having hideously unsafe variadics, and dynamic languages, like python, can do the same, many type safe languages, such as Rust, are forced to provide such functionality through the use of a macro. Dependently typed languages, like Idris, can provide a `printf` like formatting interface, while maintaining both memory and type safety, without the need for macros. We will explore this by implementing a simplified version of `printf` in Idris from scratch. This article is inspired by an exercise from chapter 6 of [Type Driven Development with Idris](https://www.manning.com/books/type-driven-development-with-idris), and is written as a literate Idris file, with the source available [here](https://git.stranger.systems/thatonelutenist/website/src/branch/trunk/projects/Idris/src/LessMacrosMoreTypes/Printf.md). ## Game plan Our goal is to provide a `printf` function that can be called much like it's C equivalent: > [!Note] > As this is a literate Idris document, and we haven't defined our `printf` function yet, we have to use a `failing` block to ask the compiler to check that this code parses, and syntax highlight it for us, but not attempt to actually compile it. ```idris failing example : String example = printf "%s %d %2d" "hello" 1 2 ``` Idris lacks a dedicated facility for variadics, but we can call functions in type signatures, which gives us an in for implementing them. Idris allows us to manipulate types as first class values, and we can use the runtime values of previous arguments to the function we are declaring as arguments to later type-level functions. These type-level functions can, furthermore, recurse into themselves to produce type signatures of varying length and content. > [!TIP] > Idris requires that these type level functions be [_total_](https://idris2.readthedocs.io/en/latest/tutorial/theorems.html#totality-checking) for them to be expanded at compile time, meaning they are known to the compiler to return a value in finite time for all possible inputs. This allows us to produce variadic functions, so long as the number of arguments and their types can be determined from one of the arguments of the function before the variadic portion. To get our variadic `printf` function, we can parse our format string into a data structure, then pass that data structure into a type-level function that calculates the rest of the type signature of our `printf` function based on contents of the format string. ## Parsing a Format String First, we need a data structure to describe our format string. We define the `Format` data type, with constructors for each of the format specifiers we will be supporting, as well as a constructor to hold literal components. ```idris data Format : Type where ||| A slot that should be filled in with a number Number : (next : Format) -> Format ||| A slot that should be filled in with a number, padded to a certain number ||| of digits PaddedNumber : (digits : Nat) -> (next : Format) -> Format ||| A slot that should be filled in with a string Str : (next : Format) -> Format ||| A literal component of the format string that should not be interpolated Literal : (literal : String) -> (next : Format) -> Format ||| The end of the format string End : Format ``` We'll need to be able to parse numbers for our `PaddedNumber` constructor, so we'll write a little helper function to handle that component of parsing. We'll simply keep pulling off characters, converting them to integers by manipulating their ordinal values, until we hit a non-digit. > [!WARNING] > On Idris 2's chez scheme backend, these are unicode ordinals, where the digit characters are in numerical order, but the values are backend dependent and this code is not guaranteed to work properly on other backends. ```idris parseNumber : (xs : List Char) -> (acc : Nat) -> (Nat, List Char) parseNumber [] acc = (acc, []) parseNumber (x :: xs) acc = if isDigit x then let value = cast $ (ord x) - (ord '0') in parseNumber xs (acc * 10 + value) else (acc, x :: xs) ``` We'll also want another one to scoop up a literal into a `List Char`, continuing until we hit the end of the string or the next `%`. This is optional, we could have each literal consist of an individual char, but we will go ahead and group them together. ```idris parseLiteral : (xs : List Char) -> (List Char, List Char) parseLiteral [] = ([], []) parseLiteral ('%' :: xs) = ([], '%' :: xs) parseLiteral (x :: xs) = let (literal, rest) = parseLiteral xs in (x :: literal, rest) ``` Next, parse our format string into an instance of the `Format` data structure. The specifics of the parsing aren't really material to the main point of this article, but we use a basic pattern matching approach, calling into our helper functions as appropriate. We return a `Maybe Format` to signal failure in the case of an invalid format string, and trigger an error when the programmer attempts to use an invalid format string. ```idris parseFormat : (xs : List Char) -> Maybe Format parseFormat [] = Just End -- A `%` has to come before a specifier parseFormat ('%' :: []) = Nothing parseFormat ('%' :: (x :: xs)) = if isDigit x -- Invoke parseNumber to get our padding specifier then let (digits, rest) = parseNumber (x :: xs) 0 in case rest of -- A padding specifier has to come before something [] => Nothing ('d' :: ys) => do rest <- parseFormat ys Just $ PaddedNumber digits rest -- A padding specifier is only valid before a number specifier (y :: ys) => Nothing -- Parse as an unpadded specifier else case x of 'd' => do rest <- parseFormat xs Just $ Number rest 's' => do rest <- parseFormat xs Just $ Str rest -- Any other character here is an invalid specifier _ => Nothing parseFormat (x :: xs) = let (literal, rest) = parseLiteral (x :: xs) in do rest <- parseFormat rest Just $ Literal (pack literal) rest ``` ## Calculating a Type From a Format String Since our parser returns a `Maybe Format`, we will define two variants of our `PrintfType` type-level function. One that handles the inner `Format`, and another that handles a `Maybe Format`, returning a type signature that is impossible to satisfy in the `Nothing` case, and calling our first variant in the `Just` case. The structure of this type-level function is rather simple, we just pattern match on our `Format` argument, and prepend the corresponding argument type to the rest of the type signature, computed through a recursive call on the `next` component of the `Format`. When we hit the `End`, we return the `String` type, which is the final return value of our function. > [!TIP] > The `->` that appears in the body of the function is actually the exact same `->` that appears in the type signature, in both cases it is the constructor for function types. Idris uses the same language for type signatures and for function bodies, so anything you can use in one place, you can use in the other. ```idris PrintfType' : Format -> Type PrintfType' (Number next) = (num : Nat) -> PrintfType' next PrintfType' (PaddedNumber digits next) = (num : Nat) -> PrintfType' next PrintfType' (Str next) = (str : String) -> PrintfType' next PrintfType' (Literal literal next) = PrintfType' next PrintfType' End = String ``` ```idris PrintfType : Maybe Format -> Type PrintfType Nothing = Void -> String PrintfType (Just x) = PrintfType' x ``` ## printf ### With the Format Structure ```idris hide left_pad : (len : Nat) -> (pad : Char) -> (str : String) -> String left_pad len pad str = let cs = unpack str in if length cs < len then pack $ replicate (len `minus` length cs) pad ++ cs else str ``` Now, we need to be able to produce our formatting function from a `Maybe Format`. We'll use the same two variants trick as above, but put the variant that operates directly on `Format`s in a `where` block, since we won't need to call it from anywhere else. We build the function argument by argument, pattern matching on the `Format`, producing a lambda that accepts the argument type demanded by the current specifier. Then recurse into our inner `printfFmt'` function to handle the rest of the specifiers and arguments, applying the appropriate string conversion to the argument and tacking it to the end of the accumulator string. The accumulator string is returned verbatim when we hit `End`. > [!NOTE] > Much like Haskell and other functional programming languages, Idris, semantically, only actually has functions of one argument. Multi-argument functions actually accept one argument, returning another function that handles the rest of the arguments. ```idris printfFmt : (fmt : Maybe Format) -> (acc : String) -> PrintfType fmt printfFmt Nothing acc = \void => absurd void printfFmt (Just x) acc = printfFmt' x acc where printfFmt' : (fmt : Format) -> (acc : String) -> PrintfType' fmt printfFmt' (Number next) acc = \i => printfFmt' next (acc ++ show i) printfFmt' (PaddedNumber digits next) acc = \i => printfFmt' next (acc ++ left_pad digits '0' (show i)) printfFmt' (Str next) acc = \str => printfFmt' next (acc ++ str) printfFmt' (Literal literal next) acc = printfFmt' next (acc ++ literal) printfFmt' End acc = acc ``` ### With a Format String `printf` is easily defined as a simple wrapper, converting the provided string to a `Format` and passing it to `printfFmt`. After accepting the format string argument, we parse a `Maybe Format` from it, providing the parsed value to our `PrintfType` function to calculate the rest of our type signature. > [!TIP] > The `_` syntax that appears in this function introduces an anonymous hole. When the value of a hole is forced by type checking (resolved), the compiler fills in its value for you. Holes defined with `_` will result in an "unsolved holes" compile error if unresolved. ```idris printf : (fmt : String) -> PrintfType (parseFormat (unpack fmt)) printf fmt = printfFmt _ "" ``` We can call `printf` as expected, with the number of and types of the arguments being determined by the provided format string: ```idris hide -- @@test printf hello world helloWorld : IO Bool helloWorld = do pure $ ```