Type Safe Variadic printf
module LessMacrosMoreTypes.Printf
import Data.List
import System
C provides convenient string formatting through hideously memory-unsafe
variadics, and dynamic languages, like Python, do the same while staying memory
safe by giving up type safety. Many type-safe languages, such as Rust, are
forced to provide such functionality through a macro. Dependently typed
languages, like Idris, can provide a printf-like formatting interface while
maintaining both memory and type safety, without needing a macro. We will
explore this by implementing a simplified version of printf in Idris from
scratch.
This article is inspired by an exercise from chapter 6 of Type-Driven Development with Idris, and is written as a literate Idris file, with the source available here.
Gameplan
Our goal is to provide a printf function that can be called much like its C equivalent:
Note
As this is a literate Idris document, and we haven't defined our printf function yet, we have to use a failing block to ask the compiler to check that this code parses, and syntax highlight it for us, but not attempt to actually compile it.
failing
  printf "%s %d %2d" "hello" 1 2
Idris lacks a dedicated facility for variadics, but it offers three things that combine to the same effect: we can call functions in type signatures, we can manipulate types as first-class values, and we can pass the runtime values of earlier arguments to the function we are defining as arguments to a type-level function.
To get our variadic printf function, we can parse our format string into a
data structure, then pass that data structure into a type-level function that
calculates the rest of the type signature of our printf function based on its
contents.
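Before applying this to format strings, the core idea can be sketched in isolation. The names `ResultType` and `describe` below are hypothetical and exist only for this illustration; they show a return type that is computed from the value of an earlier argument.

```idris
||| Hypothetical type-level function: pick a result type from a Bool.
ResultType : Bool -> Type
ResultType True = Nat
ResultType False = String

||| The return type depends on the value of the first argument,
||| so each clause returns a value of a different type.
describe : (asNumber : Bool) -> ResultType asNumber
describe True = 42
describe False = "forty-two"
```

Our printf follows the same pattern, with a parsed format string playing the role of the `Bool` and a whole chain of argument types playing the role of the result type.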
Parsing a Format String
First, we need a data structure to describe our format string. We define the
Format data type, with constructors for each of the format specifiers we will
be supporting, as well as a constructor to hold literal components.
data Format : Type where
  ||| A slot that should be filled in with a number
  Number : (next : Format) -> Format
  ||| A slot that should be filled in with a number, padded to a certain number
  ||| of digits
  PaddedNumber : (digits : Nat) -> (next : Format) -> Format
  ||| A slot that should be filled in with a string
  Str : (next : Format) -> Format
  ||| A literal component of the format string that should not be interpolated
  Literal : (literal : String) -> (next : Format) -> Format
  ||| The end of the format string
  End : Format
We'll need to be able to parse numbers for our PaddedNumber constructor, so
we'll write a little helper function to handle that component of the parsing.
It simply keeps pulling off characters, converting them to integers by shifting
their ordinal values.
Warning
On Idris 2's Chez Scheme backend, these are Unicode ordinals, where the digit characters are in numerical order. The values are backend dependent, however, and this code is not guaranteed to work properly on other backends.
parseNumber : (xs : List Char) -> (acc : Nat) -> (Nat, List Char)
parseNumber [] acc = (acc, [])
parseNumber (x :: xs) acc =
  if isDigit x
    then let value = cast $ (ord x) - (ord '0')
         in parseNumber xs (acc * 10 + value)
    else (acc, x :: xs)
We'll also want another one to scoop up a literal into a List Char, continuing
until we hit the end of the string or the next %. This is optional: we could
have each literal consist of an individual char, but we will go ahead and group
them together.
parseLiteral : (xs : List Char) -> (List Char, List Char)
parseLiteral [] = ([], [])
parseLiteral ('%' :: xs) = ([], '%' :: xs)
parseLiteral (x :: xs) =
  let (literal, rest) = parseLiteral xs
  in (x :: literal, rest)
Next, we parse our format string into our Format data structure. The specifics
of the parsing here aren't really material to the main point of this article,
but we use a basic pattern matching approach, calling into our helper functions
as appropriate.
parseFormat : (xs : List Char) -> Maybe Format
parseFormat [] = Just End
-- A `%` has to come before a specifier
parseFormat ('%' :: []) = Nothing
parseFormat ('%' :: (x :: xs)) =
  if isDigit x
    -- Invoke parseNumber to get our padding specifier
    then let (digits, rest) = parseNumber (x :: xs) 0
         in case rest of
           -- A padding specifier has to come before something
           [] => Nothing
           ('d' :: ys) => do
             rest <- parseFormat ys
             Just $ PaddedNumber digits rest
           -- A padding specifier is only valid before a number specifier
           (y :: ys) => Nothing
    -- Parse as an unpadded specifier
    else case x of
      'd' => do
        rest <- parseFormat xs
        Just $ Number rest
      's' => do
        rest <- parseFormat xs
        Just $ Str rest
      -- Any other character here is an invalid specifier
      _ => Nothing
parseFormat (x :: xs) =
  let (literal, rest) = parseLiteral (x :: xs)
  in do
    rest <- parseFormat rest
    Just $ Literal (pack literal) rest
Calculating a Type From a Format String
PrintfType' : Format -> Type
PrintfType' (Number next) =
  (num : Nat) -> PrintfType' next
PrintfType' (PaddedNumber digits next) =
  (num : Nat) -> PrintfType' next
PrintfType' (Str next) =
  (str : String) -> PrintfType' next
PrintfType' (Literal literal next) =
  PrintfType' next
PrintfType' End = String

PrintfType : Maybe Format -> Type
PrintfType Nothing = Void -> String
PrintfType (Just x) = PrintfType' x
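To see what this type-level function produces, it can help to write a definition by hand against a computed type. The sketch below assumes the definitions above are in scope, and `nameAndCount` is a hypothetical name used only for illustration. The format string "%s %d" should parse to Just (Str (Literal " " (Number End))), so the computed type should reduce to a function taking a String and then a Nat.

```idris
-- Hypothetical example, not part of the original module.
-- PrintfType (parseFormat (unpack "%s %d")) is expected to reduce to
-- (str : String) -> (num : Nat) -> String, so a two-argument lambda
-- should be accepted as an implementation of it.
nameAndCount : PrintfType (parseFormat (unpack "%s %d"))
nameAndCount = \str, num => str ++ " " ++ show num
```

Literal components add no arguments, and each specifier adds exactly one. When parsing fails, the computed type is Void -> String instead; since Void has no values, a function of that type can never actually be applied.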
printf
With the Format Structure
left_pad : (len : Nat) -> (pad : Char) -> (str : String) -> String
left_pad len pad str =
  let cs = unpack str
  in if length cs < len
    then pack $ replicate (len `minus` length cs) pad ++ cs
    else str

printfFmt : (fmt : Maybe Format) -> (acc : String) -> PrintfType fmt
printfFmt Nothing acc =
  \void => absurd void
printfFmt (Just x) acc = printfFmt' x acc
  where
    printfFmt' : (fmt : Format) -> (acc : String) -> PrintfType' fmt
    printfFmt' (Number next) acc =
      \i => printfFmt' next (acc ++ show i)
    printfFmt' (PaddedNumber digits next) acc =
      \i => printfFmt' next (acc ++ left_pad digits '0' (show i))
    printfFmt' (Str next) acc =
      \str => printfFmt' next (acc ++ str)
    printfFmt' (Literal literal next) acc =
      printfFmt' next (acc ++ literal)
    printfFmt' End acc = acc
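As a rough illustration of how these two pieces behave, the following sketch calls them directly with a hand-built Format value. It assumes left_pad and printfFmt from above are in scope; `paddedSeven` and `greeting` are hypothetical names introduced only for this example.

```idris
-- Hypothetical checks, not part of the original module.

-- "7" is shorter than 3 characters, so two '0' characters are prepended.
paddedSeven : String
paddedSeven = left_pad 3 '0' "7"

-- The format Str (Literal "!" End) expects exactly one String argument;
-- the accumulator starts empty and collects each piece in order.
greeting : String
greeting = printfFmt (Just (Str (Literal "!" End))) "" "hi"
```

Under these assumptions, paddedSeven should evaluate to "007" and greeting to "hi!". Each specifier case returns a lambda that consumes one argument and recurses with the extended accumulator, which is why the number of arguments tracks the structure of the Format value.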
With a Format String
printf : (fmt : String) -> PrintfType (parseFormat (unpack fmt))
printf fmt = printfFmt _ ""
We can call printf as expected, with the number and types of the arguments
being determined by the provided format string:
-- @@test printf hello world
helloWorld : IO Bool
helloWorld = do
  pure $
    printf "%s %s%s %3d %d" "Hello" "world" "!" 1 23 == "Hello world! 001 23"
It will even fail to compile if you attempt to provide arguments to an invalid format string, which we can demonstrate by trying to apply a padding modifier to a string specifier:
Note
failing blocks have an additional feature: they will trigger a compiler error if their contents do compile successfully.
failing
  printf "Hello %s %3s" "world" "!"