From 99b7ca322eebecf9892cb6fb2a41f8714e3f0626 Mon Sep 17 00:00:00 2001
From: Nathan McCarty
The posts in this series are 'living posts', they will be referenced as library modules by future posts, and will be updated as needs change.", "post-ids": [ 7 ], diff --git a/projects/Idris/src/DependentNuggets/FreshLists.md b/projects/Idris/src/DependentNuggets/FreshLists.md index d3eb99b..5921401 100644 --- a/projects/Idris/src/DependentNuggets/FreshLists.md +++ b/projects/Idris/src/DependentNuggets/FreshLists.md @@ -9,13 +9,13 @@ import Data.So import Data.Nat ``` -When programming, we quite frequently encounter the need for a data structure that can contain at most one of each given element. Typically, we would use a `Set`, which satisfies this constraint, but does so as a runtime invariant that must be taken on trust, and results in ergonomic concerns when used as a component of API design. We can use "Fresh Lists" as an alternative that provide a type-level guarantee of the uniqueness of each element, while also generalizing the notion of uniqueness and providing compile-time checking of the constraint on literals. +When programming, we quite frequently encounter the need for a data structure that can contain at most one of each given element. Typically, we might use a `Set`, which satisfies this constraint, but does so as a runtime invariant, which must be taken on trust, and results in ergonomic concerns when used as a component of API design. We introduce "Fresh Lists" as an alternative that provide a type-level guarantee of the uniqueness of each element, while also generalizing the notion of uniqueness and providing compile-time checking of literals. -As best as I can tell, the concept of fresh lists originated with Catarina Coquand's _A Formalised Proof of the Soundness and Completeness of a Simply Typed Lambda-Calculus with Explicit Substitutions_ paper, and acquired the name 'FreshList' when they were modified for inclusion in [Agda's standard library](https://agda.github.io/agda-stdlib/master/Data.List.Fresh.html). The implementation in this blog post is inspired both by the Agda implementation, and by the implementation in [collie](https://github.com/ohad/collie), which is a command line arguments handling library for Idris. +As best as I can tell, the concept of the `FreshList` originated with Catarina Coquand's _A Formalised Proof of the Soundness and Completeness of a Simply Typed Lambda-Calculus with Explicit Substitutions_ paper, and acquired the name 'FreshList' when they were modified for inclusion in [Agda's standard library](https://agda.github.io/agda-stdlib/master/Data.List.Fresh.html). The implementation in this post is inspired both by the Agda implementation, and by the implementation in [collie](https://github.com/ohad/collie). ## The Problems With Sets -Let's define a little example function that consumes a `SortedSet` to demonstrate with: +Let's define a little example function that consumes a `SortedSet` to demonstrate my complaints with: ```idris takesSortedSet : SortedSet a -> Bool @@ -24,15 +24,17 @@ takesSortedSet _ = True ### `Set`'s Constraints on the Element Type -In general, `Set`s require that the element type is either hash-able or order-able, Idris at the moment only provides a `SortedSet` which requires the element implement `Ord`. +In general, `Set`s require that the element type is either hash-able or order-able, Idris at the moment only provides a `SortedSet`, requiring the element implement `Ord`. -While many types that we would want to apply uniqueness constraint to have natural and reasonable orderings, like integers or strings, this isn't always desirable. As an example, I've found myself implementing contrived orderings based on an internal integer identifier that was supposed to be completely opaque for opaque token types in other languages. +While many types that we might want to apply the uniqueness constraint to have natural and reasonable orderings, like integers or strings, applying such a constraint isn't always desirable. -The constraint for a `HashSet` is less annoying, but its still a bit onerous to require that a type be hashable just to guarantee that a collection contains at most one of each given item. +As an example, I've found myself implementing contrived orderings based on an internal integer identifier that was supposed to be completely opaque for opaque token types in other languages. + +The constraint for a `HashSet` is less annoying, but it's still a bit onerous to require that a type be hashable just to guarantee that a collection contains at most one of each given item. ### Unchecked Literals -In most languages, there isn't a direct literal for `Set`s, including Idris, and you have to call some conversion function on a list or array literal to specify one at compile time. +In most languages, including Idris, there isn't a literal for `Set` types, instead you have to call some conversion function on a list or array literal to specify one at compile time. This works _ok_, wrapped literals do, in fact, work when passed to functions consuming `Set`s: @@ -64,11 +66,11 @@ takesSortedSetBad = ``` -This can be especially problematic when the literal contains variables and the construction of the values incurs side effects, or when the values are complex and only some of the contained data is used in the equality check. +This can be especially problematic when the literal contains variables, when the construction of the values incurs side effects, or when the values are complex and only some of the contained data is used in the equality check. ### Sets only provide uniqueness over equality -While needing a collection of elements to be unique specifically under equality is really common, its often quite useful to work with a more general notion of uniqueness. +While the need for a collection of elements to be unique specifically under equality is really common, its often quite useful to work with a more general notion of uniqueness. As an example, consider a collection grouping updates to a database table. It might be useful for the collection to enforce that no two of the updates touch the same rows, maybe to guarantee they can all easily be run in parallel. This comparison can be thought of as of equality, it is an equivalence relation[^1], which is a _sort of_ equality, but it would be _spectacularly_ weird to express this through an implementation of a languages `Eq` interface, and any ordering derived from this would be quite contrived. @@ -88,7 +90,9 @@ data So' : Bool -> Type where Oh' : So' True ``` -This may seem a little bit confusing if you aren't used to dependent types, don't worry if it doesn't sit right at first, it'll start to make sense as we use it. The `So` type constructor takes an argument of type `Bool`, and the sole data constructor, `Oh`, can only produce values of `So` where the argument to the data constructor is `True`. This isn't _super_ useful on its own, but grows wings when we leverage the type checker's symbolic evaluation capability. +This may seem a little bit confusing if you aren't used to dependent types, don't worry if it doesn't sit right at first, it'll start to make sense as we use it. + +The `So` type constructor takes an argument of type `Bool`, and its sole data constructor, `Oh`, can only produce values of `So` where the value of the argument to the type constructor is `True`. This isn't _super_ useful on its own, but grows wings when we leverage the type checker's symbolic evaluation capability. Lets define a silly little runtime check to write some examples with: @@ -142,9 +146,9 @@ sixIsEven : So (isEven 6) sixIsEven = Oh ``` -In this example, `isEven 6` gets symbolically expanded and evaluated to `True` during type checking, allowing us to return a raw `Oh`. +In this example, `isEven 6` gets symbolically expanded and evaluated to `True` during type checking, allowing us to return `Oh`. -For more complicated predicates, or values constructed at run time, a `choose` function is provided which will evaluate the given predicate and provide a witness of either its truthiness or its falseiness, defined as so: +For more complicated predicates, or values constructed at run time, the `choose` function is provided. `choose` will evaluate the given predicate and provide a witness of either its truthiness or its falseiness, defined as so: ```idris choose' : (b : Bool) -> Either (So b) (So (not b)) @@ -168,11 +172,13 @@ complicatedIsEven x = ### Freshness -As we want to generalize the notion of equality, we ought to give our generalized concept a name. We can refer to a general predicate of type `pred : a -> a -> Bool` as a "freshness" criteria, and say that an element `x` is "fresh" relative to an element `y` when `pred x y` evaluates to `True`. An element `x` is "fresh" relative to a list `xs` when it is independently "fresh" relative to every element in `xs`. +As we want to generalize the notion of equality, we ought to give our generalized concept a name. + +We refer to a general predicate of type `pred : a -> a -> Bool` as a "freshness" criteria, and say that an element `x` is "fresh" relative to an element `y` when `pred x y` evaluates to `True`. An element `x` is "fresh" relative to a list `xs` when it is independently "fresh" relative to every element in `xs`. To recover `Set` like behavior, we can use non-equality (`/=`) as the "freshness" criteria, stating that an element is "fresh" when it is not-equal to every element already in the list. -When an equivalence relation[^1] is used as the "freshness" criteria, the `FreshList` will behave roughly like a `Set`, but with a non-standard definition of equality, instead allowing at most one element from each equivalence class. "Freshness", however, generalizes beyond equivalence relations, and we can get some really interesting and useful behavior by using a non-equivalence relation as the "freshness" criteria, we will cover some examples later in the article. +When an equivalence relation[^1] is used as the "freshness" criteria, the `FreshList` will behave roughly like a `Set`, but with a potentially non-standard definition of equality, instead allowing at most one element from each equivalence class. "Freshness", however, generalizes beyond equivalence relations, and we can get some really interesting and useful behavior by using a non-equivalence relation as the "freshness" criteria, we will cover some examples later in the article. ### Forward declarations @@ -212,17 +218,17 @@ data FreshList : (fresh : e -> e -> Bool) -> (e : Type) -> Type where %name FreshList xs, ys, zs ``` -The freshness criteria that applies to this particular `FreshList` is taken as the first argument to the `FreshList` type constructor, the element type as the second argument. This order is chosen to allow implementation of the standard collection interfaces like `Foldable`. +The freshness criteria that applies to this particular `FreshList` is taken as the first argument to the `FreshList` type constructor, and the element type as the second argument. This order is chosen to allow implementation of the standard collection interfaces like `Foldable` for `FreshList`. A value of type `FreshList (/=) Nat` would be a list of `Nat`s with the `Set`-like behavior of requiring that each `Nat` be non-equal to every other `Nat` in the list. The `Nil` constructor for `FreshList` is pretty straight forward, nearly identical to `List`'s `Nil` constructor. -The `::` (pronounced 'cons') operator is also very similar to `List`'s, but it's "enhanced" with an erased auto implicit argument, applying our `#` operator to the new element `x` and the rest of the list (`xs`). We haven't implemented `#` yet, but it's going to be a short hand for checking that the new element is fresh relative to all of the elements already in the list. +The `::` (pronounced 'cons') operator is also very similar to `List`'s, but "enhanced" with an erased auto implicit argument, applying our `#` operator to the new element `x` and the rest of the list (`xs`). We haven't implemented `#` yet, but it's going to be a short hand for checking that the new element is fresh relative to all of the elements already in the list. -The erasure (the `0` after the `auto`) provides a bit of interesting semantics, it restricts the `freshness` value (which serves as proof that `x` is fresh relative to the rest of the list) such that it only exists at compile time. While we still have to go through the motions of producing `freshness`, even if it's handled for us by the compiler through proof search, the `freshness` value doesn't exist at runtime. Appending to a `FreshList` is thus a linear time (`O(n)`) operation, but the runtime representation is exactly that of `List` (in fact, Idris has an optimization that should result in conversions from `FreshList` to `List` being optimized out), with no space overhead from the freshness proof. +The erasure (the `0` after the `auto`) provides an interesting bit of semantics, it restricts the `freshness` value (which serves as proof that `x` is fresh relative to the rest of the list) such that it is only allowed to exist at compile time. While we still have to go through the motions of producing the `freshness` value, even if it's handled for us by the compiler through proof search, but the `freshness` value doesn't exist at runtime. Appending to a `FreshList` is thus a linear time (`O(n)`) operation, but the runtime representation is exactly that of `List` (in fact, Idris has an optimization that should result in conversions from `FreshList` to `List` being optimized out), with no space overhead from the `freshness` proof. -The result of this, being unable to produce a value of a type without going through the motions of making sure its invariants are upheld, but not paying the runtime cost of actually storing the witness, is very common in Idris 2. +The result, being unable to produce a value of a type without going through the motions of making sure its invariants are upheld, but not paying the runtime cost of actually storing the witness, is a very common pattern in Idris 2. ### Our Operators @@ -233,9 +239,9 @@ x ## [] = True x ## (y :: xs) = (x `fresh` y) && (x ## xs) ``` -This operator takes our freshness predicate as an implicit argument, named `fresh`, and recursively checks that an element is fresh relative to every element already in a given `FreshList`, in effect "lifting" our runtime freshness check over two elements into a runtime freshness check over an element and an entire list. +This operator takes our freshness predicate as an implicit argument, named `fresh`, and recursively checks that an element is fresh relative to every element in a given `FreshList`, in effect "lifting" our runtime freshness check over two elements into a runtime freshness check over an element and an entire list. -While this provides us our whole-list runtime freshness check, we still need to lift this into a type level proof, which the `#` operator, with type `(#) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Type`, does by dressing our `##` in `So` clothing: +While this provides us our whole-list runtime freshness check, we still need to lift this into a type level witness, which the `#` operator, with type `(#) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Type`, does by dressing our `##` operator in `So` clothing: ```idris x # xs = So (x ## xs) @@ -243,7 +249,7 @@ x # xs = So (x ## xs) ### Some operations over FreshLists -We can define all of our usual list operations over `FreshList`s, but most of them are going to be just about identical to the `List` equivalent, so we'll just cover some interesting ones. +We can define all of our usual list operations over `FreshList`s, but most of them are going to be just about identical to the `List` equivalent, so we'll just cover some interesting ones. This post will be referenced in future posts, and this section will grow as new operations are required. We can't unconditionally append elements to a `FreshList`, they must be fresh after all, and it can be a bit onerous to have to do the freshness check ourself every time, so we'll define a `maybeAppend` function that runs the freshness check, returning a `Just` if the new element is, in fact, fresh, and `Nothing` otherwise: @@ -294,7 +300,7 @@ takesUniqueNumbersGood = ``` -Notice how, thanks to proof search and literal overloading, we don't have to apply any conversion function to the list-style literal here, the literal directly produces a `FreshList` carrying the associated type-level guarantees. +Notice how, thanks to proof search and literal overloading, we don't have to apply a conversion function to the list-style literal here, the literal directly produces a `FreshList` carrying the associated type-level guarantees. Additionally, as an upgrade to `SortedSet`, at least for this use-case, attempts to define a literal with duplicate elements are rejected by the compiler: @@ -459,6 +465,16 @@ failing ] ``` +## Real World Examples of FreshLists in Action + +### collie + +[collie](https://github.com/ohad/collie), a command line arguments handling library for Idris, makes use of `FreshList`s to ensure that the programmer doesn't define the same argument or sub-program twice. + +### Idris 2 by Highly Contrived Example + +My work in progress book, [_Idris 2 by Highly Contrived Example_](https://static.stranger.systems/idris-by-contrived-example/), which is structured as a day-by-day work-through of the Advent of Code problem backlog, uses `FreshList`s for the [runner api](https://git.stranger.systems/Idris/advent/src/commit/76fcb6c34bbfdae36a76194bbc41d55aa0c3ba35/src/Runner.md), prevented a given day from being declared twice, and also enforcing that they are declared in sorted order as a cheeky little bit of type-level style enforcement. + [^1]: https://en.wikipedia.org/wiki/Equivalence_relation [^2]: The exact conditions under which this will work are beyond the scope of this article