website/projects/Idris/src/LessMacrosMoreTypes/Printf.md

7.5 KiB

Type Safe Variadic printf

module LessMacrosMoreTypes.Printf

import Data.List
import System

While C can provide convenient string formatting by having hideously memory unsafe variadics, and dynamic languages, like python, can do the same while being memory safe by not being type safe, many type safe languages, such as Rust, are forced to provide such functionality through the use of a macro. Dependently typed languages, like Idris, can provide a printf like formatting interface, while maintaining both memory and type safety, without the need for the macro. We will explore this by implementing a simplified version of printf in Idris from scratch.

This article is inspired by an exercise from chapter 6 of Type Driven Development with Idris, and is written as a literate Idris file, with the source available here.

Gameplan

Our goal is to provide a printf function that can be called, much like it's C equivalent:

Note

As this is a literate Idris document, and we haven't defined our printf function yet, we have to use a failing block to ask the compiler to check that this code parses, and syntax highlight it for us, but not attempt to actually compile it.

failing
  printf "%s %d %2d" "hello" 1 2

Idris lacks a dedicated facility for variadics, but we can call functions in type signatures, Idris allows us to manipulate types as first class values, and we can use the runtime values of previous arguments to the function we are defining as arguments to our type-level function.

To get our variadic printf function, we can parse our format string into a data structure, then pass that data structure into a type-level function that calculates the rest of the type signature of our printf function based on its contents.

Parsing a Format String

First, we need a data structure to describe our format string. We define the Format data type, with constructors for each of the format specifiers we will be supporting, as well as a constructor to hold literal components.

data Format : Type where
  ||| A slot that should be filled in with a number
  Number : (next : Format) -> Format
  ||| A slot that should be filled  in with a number, padded to a certian number
  ||| of digits
  PaddedNumber : (digits : Nat) -> (next : Format) -> Format
  ||| A slot that should be filled in with a string
  Str : (next : Format) -> Format
  ||| A literal component of the format string that should not be interpolated
  Literal : (literal : String) -> (next : Format) -> Format
  ||| The end of the format string
  End : Format

We'll need to be able to parse numbers for our PaddedNumber constructor, so we'll write a little helper function to handle that component of the parsing. Simply keep pulling off characters, converting them to integers by shifting their ordinal values.

Warning

On Idris 2's chez scheme backend, these are unicode ordinals, where the digit characters are in numerical order, but the values are backend dependent and this code is not guaranteed to work properly on other backends.

parseNumber : (xs : List Char) -> (acc : Nat) -> (Nat, List Char)
parseNumber [] acc = (acc, [])
parseNumber (x :: xs) acc = 
  if isDigit x
    then let value = cast $ (ord x) - (ord '0')
    in parseNumber xs (acc * 10 + value)
    else (acc, x :: xs)

We'll also want another one to scoop up a literal into a List Char, continuing until we hit the end of the string or the next %. This is optional, we could have each literal consist of an individual char, but we will go ahead and group them together.

parseLiteral : (xs : List Char) -> (List Char, List Char)
parseLiteral [] = ([], [])
parseLiteral ('%' :: xs) = ([], '%' :: xs)
parseLiteral (x :: xs) = 
  let (literal, rest) = parseLiteral xs
  in (x :: literal, rest)

Parse our format string into our Format data structure. The specifics of the parsing here aren't really material to the main point of this article, but we use a basic pattern matching approach, calling into our helper functions as appropriate.

parseFormat : (xs : List Char) -> Maybe Format
parseFormat [] = Just End
-- A `%` has to come before a specifier
parseFormat ('%' :: []) = Nothing
parseFormat ('%' :: (x :: xs)) = 
  if isDigit x
    -- Invoke parseNumber to get our padding specifier
    then let (digits, rest) = parseNumber (x :: xs) 0
    in case rest of
      -- A padding specifier has to come before something
      [] => Nothing
      ('d' :: ys) => do
        rest <- parseFormat ys
        Just $ PaddedNumber digits rest
      -- A padding specifier is only valid before a number specifier
      (y :: ys) => Nothing
    -- Parse as an unpadded specifier
    else case x of
      'd' => do
        rest <- parseFormat xs
        Just $ Number rest
      's' => do
        rest <- parseFormat xs
        Just $ Str rest
      -- Any other character here is an invalid specifier
      _ => Nothing
parseFormat (x :: xs) = 
  let (literal, rest) = parseLiteral (x :: xs)
  in do
    rest <- parseFormat rest
    Just $ Literal (pack literal) rest

Calculating a Type From a Format String

PrintfType' : Format -> Type
PrintfType' (Number next) =
  (num : Nat) -> PrintfType' next
PrintfType' (PaddedNumber digits next) =
  (num : Nat) -> PrintfType' next
PrintfType' (Str next) = 
  (str : String) -> PrintfType' next
PrintfType' (Literal literal next) = 
  PrintfType' next
PrintfType' End = String

PrintfType : Maybe Format -> Type
PrintfType Nothing = Void -> String
PrintfType (Just x) = PrintfType' x

printf

With the Format Structure

left_pad : (len : Nat) -> (pad : Char) -> (str : String) -> String
left_pad len pad str = 
  let cs = unpack str
  in if length cs < len
    then pack $ replicate (len `minus` length cs) pad ++ cs
    else str
printfFmt : (fmt : Maybe Format) -> (acc : String) -> PrintfType fmt
printfFmt Nothing acc = 
  \void => absurd void
printfFmt (Just x) acc = printfFmt' x acc
  where 
    printfFmt' : (fmt : Format) -> (acc : String) -> PrintfType' fmt
    printfFmt' (Number next) acc = 
      \i => printfFmt' next (acc ++ show i)
    printfFmt' (PaddedNumber digits next) acc = 
      \i => printfFmt' next (acc ++ left_pad digits '0' (show i))
    printfFmt' (Str next) acc =
      \str => printfFmt' next (acc ++ str)
    printfFmt' (Literal literal next) acc = 
      printfFmt' next (acc ++ literal)
    printfFmt' End acc = acc

With a Format String

printf : (fmt : String) -> PrintfType (parseFormat (unpack fmt))
printf fmt = printfFmt _ ""

We can call printf as expected, with the number of and types of the arguments being determined by the provided format string:

-- @@test printf hello world
helloWorld : IO Bool
helloWorld = do
  pure $
  printf "%s %s%s %3d %d" "Hello" "world" "!" 1 23 == "Hello world! 001 23"

It will even fail to compile if you attempt to provide arguments to an invalid format string, which we can demonstrate by trying to apply a padding modifier to a string specifier:

Note

failing blocks have an additional feature, they will trigger a compiler error if their contents do compile successfully.

failing
  printf "Hello %s %3s" "world" "!"

Working With Run-Time Format Strings

Conclusions

Why this API isn't really a great idea