Freshlist checkpoint

This commit is contained in:
Nathan McCarty 2025-03-16 17:44:58 -04:00
parent 8fc5a0fbda
commit 8e4adf7f3b
2 changed files with 240 additions and 3 deletions

View file

@ -1,7 +1,7 @@
{
"edited-at": [
],
"hidden": false,
"hidden": true,
"idris": true,
"ipkg": "/home/nathan/Projects/Blog/projects/Idris/Idris.ipkg",
"posted-at": "2025-03-15T18:07:00.649227-04:00",
@ -13,4 +13,4 @@
"tags": [
"idris"
]
}
}

View file

@ -2,7 +2,244 @@
```idris hide
module DependentNuggets.FreshLists
import System
import Data.SortedSet
import Data.So
```
When programming, we quite frequently encounter the for a data structure that can contain at most one of each given element. Typically, we would use a `Set`, which satisfies this constraint, but does so as a runtime invariant that must be taken on trust, and results in ergonomic concerns when used as a component of API design.
When programming, we quite frequently encounter the for a data structure that can contain at most one of each given element. Typically, we would use a `Set`, which satisfies this constraint, but does so as a runtime invariant that must be taken on trust, and results in ergonomic concerns when used as a component of API design. We can use "Fresh Lists" as an alternative that provide a type-level guarantee of the uniqueness of each element, while also generalizing the notion of uniqueness and providing compile-time checking of the constraint on literals.
As best as I can tell, the concept of fresh lists originated with Catarina Coquand's _A Formalised Proof of the Soundness and Completeness of a Simply Typed Lambda-Calculus with Explicit Substitutions_ paper, and acquired the name 'FreshList' when they were modified for inclusion in [Agda's standard library](https://agda.github.io/agda-stdlib/master/Data.List.Fresh.html). The implementation in this blog post is inspired both by the Agda implementation, and by the implementation in [collie](https://github.com/ohad/collie), which is a command line arguments handling library for Idris.
## The Problems With Sets
Let's define a little example function that consumes a `SortedSet` to demonstrate with:
```idris
takesSortedSet : SortedSet a -> Bool
takesSortedSet _ = True
```
### `Set`'s Constraints on the Element Type
In general, `Set`s require that the element type is either hash-able or order-able, Idris at the moment only provides a `SortedSet` which requires the element implement `Ord`.
While many types that we would want to apply uniqueness constraint to have natural and reasonable orderings, like integers or strings, this isn't always desirable. As an example, I've found myself implementing contrived orderings based on an internal integer identifier that was supposed to be completely opaque for opaque token types in other languages.
The constraint for a `HashSet` is less annoying, but its still a bit onerous to require that a type be hashable just to guarantee that a collection contains at most one of each given item.
### Unchecked Literals
In most languages, there isn't a direct literal for `Set`s, including Idris, and you have to call some conversion function on a list or array literal to specify one at compile time.
This works _ok_, wrapped literals do, in fact, work when passed to functions consuming `Set`s:
```idris hide
-- @@test takesSortedSet good
takesSortedSetGood : IO Bool
takesSortedSetGood =
pure $
```
<div class="unit-test">
```idris
takesSortedSet $ fromList [1, 2, 3, 4, 5]
```
</div>
However, this provides no compile time checking that the elements in the input list are actually unique, the programmer could very well accidentally duplicate a value without any warning from the compiler:
```idris hide
-- @@test takesSortedSet bad
takesSortedSetBad : IO Bool
takesSortedSetBad =
pure $
```
<div class="warn-unit-test">
```idris
takesSortedSet $ fromList [1, 1, 2, 3, 4, 5]
```
</div>
This can be especially problematic when the literal contains variables and the construction of the values incurs side effects, or when the values are complex and only some of the contained data is used in the equality check.
### Sets only provide uniqueness over equality
While needing a collection of elements to be unique specifically under equality is really common, its often quite useful to work with a more general notion of uniqueness.
As an example, consider a collection grouping updates to a database table. It might be useful for the collection to enforce that no two of the updates touch the same rows, maybe to guarantee they can all easily be run in parallel. This comparison can be thought of as of equality, it is an equivalence relation[^1], which is a _sort of_ equality, but it would be _spectacularly_ weird to express this through an implementation of a languages `Eq` interface, and any ordering derived from this would be quite contrived.
## Lifting Arbitrary Runtime Checks to the Type Level
While we could build our structure on top of type-level witnesses, such as `Not (x = y)`, proving, structurally, that each pair of values is not equal, this would be needlessly restrictive, not every criteria of interest is easily expressible as a type-level predicate.
Instead, we will use `Data.So`'s `So` type to lift an arbitrary user-defined runtime predicate to the type level.
`So` is defined like so:
>[!NOTE]
> The apostrophes have been added to avoid conflict with the standard library data type
```idris
data So' : Bool -> Type where
Oh' : So' True
```
This may seem a little bit confusing if you aren't used to dependent types, don't worry if it doesn't sit right at first, it'll start to make sense as we use it. The `So` type constructor takes an argument of type `Bool`, and the sole data constructor, `Oh`, can only produce values of `So` where the argument to the data constructor is `True`. This isn't _super_ useful on its own, but grows wings when we leverage the type checker's symbolic evaluation capability.
Lets define a silly little runtime check to write some examples with:
```idris
isEven : (Integral ty, Eq ty) => ty -> Bool
isEven x = (x `mod` 2) == 0
```
We can then call this predicate in `So`'s type constructor, for example, to construct a function that only accepts even numbers:
```idris
onlyEven : (Integral ty, Eq ty) => (x : ty) -> {auto prf: So (isEven x)} -> Bool
onlyEven x = True
```
Since `isEven` has type `ty -> Bool`, the statement `isEven x` has type `Bool`, and is thus allowed as an argument to `So`'s type constructor. Thanks to the type checker's symbolic evaluation capability, `So (isEven x)` is a type that we can only construct a value of if `isEven x` returns `True`.
Notice that the `prf` argument is declared as an auto-implicit, the caller doesn't need to explicitly provide that argument, the compiler will try to locate a suitable value from the environment, or construct one. `So` normally plays quite well with proof search for reasonable predicates, letting us pretend the argument doesn't exist when working with literals, so long as the predicate is satisfied.
This function, as expected, accepts even numbers:
```idris hide
-- @@test onlyEven accpets even
onlyEvenAcceptsEven : IO Bool
onlyEvenAcceptsEven =
pure $
```
> [!TIP]
> Since we are passing a literal here, the compiler automatically generates an appropriate value of the `prf` argument to `onlyEven` through proof search
<div class="unit-test">
```idris
onlyEven 4
```
</div>
It also provides a compile time error if you try to pass it an odd literal:
```idris
failing
oopsNotEven : Bool
oopsNotEven = onlyEven 3
```
The type checker's symbolic evaluation gives us a short cut for constructing values of `So` when the predicate is "obviously" true[^2]:
```idris
sixIsEven : So (isEven 6)
sixIsEven = Oh
```
In this example, `isEven 6` gets symbolically expanded and evaluated to `True` during type checking, allowing us to return a raw `Oh`.
For more complicated predicates, or values constructed at run time, a `choose` function is provided which will evaluate the given predicate and provide a witness of either its truthiness or its falseiness, defined as so:
```idris
choose' : (b : Bool) -> Either (So b) (So (not b))
choose' True = Left Oh
choose' False = Right Oh
```
To provide an example of its use, let's write an overly complicated function to check if a number is even:
```idris
complicatedIsEven : (Integral ty, Eq ty) => ty -> Bool
complicatedIsEven x =
case choose (isEven x) of
Left y => onlyEven x
Right y => False
```
`choose` returns a `Left` in the event the predicate evaluates to `True`, by pattern matching on the result of `choose`, we can extract the `So (isEven x)` in the `Left` branch, which then allows us to pass `x` to `onlyEven`, with the compiler able to detect that `y` has the same type as the `prf` auto-implicit argument and filling in `y` for `prf` for us.
## Introducing The FreshList
### Freshness
As we want to generalize the notion of equality, we ought to give our generalized concept a name. We can refer to a general predicate of type `pred : a -> a -> Bool` as a "freshness" criteria, and say that an element `x` is "fresh" relative to an element `y` when `pred x y` evaluates to `True`. An element `x` is "fresh" relative to a list `xs` when it is independently "fresh" relative to every element in `xs`.
To recover `Set` like behavior, we can use non-equality (`/=`) as the "freshness" criteria, stating that an element is "fresh" when it is not-equal to every element already in the list.
When an equivalence relation[^1] is used as the "freshness" criteria, the `FreshList` will behave roughly like a `Set`, but with a non-standard definition of equality, instead allowing at most one element from each equivalence class. "Freshness", however, generalizes beyond equivalence relations, and we can get some really interesting and useful behavior by using a non-equivalence relation as the "freshness" criteria, we will cover some examples later in the article.
### Forward declarations
Our definitions are about to get mutually recursive, and since Idris is a strictly define-before-use language, we'll forward declare the types to enable that mutual recursion. Don't worry, we'll cover what each definition means in detail as we fill them in.
We'll also go ahead and hide a built-in operator to avoid conflicts, and declare the operators we will be defining.
```idris
%hide Builtin.infixr.(#)
export infix 4 #, ##, #?
public export
data FreshList : (fresh : e -> e -> Bool) -> (e : Type) -> Type
||| Check if an element is "fresh" relative to all of the elements in a given freshlist
public export
(##) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Bool
||| Lift `##` to the type level with `So`
public export
(#) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Type
```
### FreshList proper
Our operators need to be able to destructure `FreshList`s for us to be able to define them, so we'll define `FreshList` first:
> [!TIP]
> In Idris, list literals, like `[1, 2, 3]`, are pure syntax sugar. List literals can be used for any type with a `::` constructor taking exactly 2 explicit arguments and a `Nil` constructor taking exactly 0 explicit arguments.
```idris
public export
data FreshList : (fresh : e -> e -> Bool) -> (e : Type) -> Type where
Nil : FreshList fresh e
(::) : (x : e) -> (xs : FreshList fresh e) -> {auto 0 freshness : x # xs}
-> FreshList fresh e
%name FreshList xs, ys, zs
```
The freshness criteria that applies to this particular `FreshList` is taken as the first argument to the `FreshList` type constructor, the element type as the second argument. This order is chosen to allow implementation of the standard collection interfaces like `Foldable`.
A value of type `FreshList (/=) Nat` would be a list of `Nat`s with the `Set`-like behavior of requiring that each `Nat` be non-equal to every other `Nat` in the list.
The `Nil` constructor for `FreshList` is pretty straight forward, nearly identical to `List`'s `Nil` constructor.
The `::` (pronounced 'cons') operator is also very similar to `List`'s, but it's "enhanced" with an erased auto implicit argument, applying our `#` operator to the new element `x` and the rest of the list (`xs`). We haven't implemented `#` yet, but it's going to be a short hand for checking that the new element is fresh relative to all of the elements already in the list.
The erasure (the `0` after the `auto`) provides a bit of interesting semantics, it restricts the `freshness` value (which serves as proof that `x` is fresh relative to the rest of the list) such that it only exists at compile time. While we still have to go through the motions of producing `freshness`, even if it's handled for us by the compiler through proof search, the `freshness` value doesn't exist at runtime. Appending to a `FreshList` is thus a linear time (`O(n)`) operation, but the runtime representation is exactly that of `List` (in fact, Idris has an optimization that should result in conversions from `FreshList` to `List` being optimized out), with no space overhead from the freshness proof.
The result of this, being unable to produce a value of a type without going through the motions of making sure its invariants are upheld, but not paying the runtime cost of actually storing the witness, is very common in Idris 2.
### Our Operators
First, we'll tackle the `##` operator, which as a refresher, has type `(##) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Bool`:
```idris
x ## [] = True
x ## (y :: xs) = (x `fresh` y) && (x ## xs)
```
This operator takes our freshness predicate as an implicit argument, named `fresh`, and recursively checks that an element is fresh relative to every element already in a given `FreshList`, in effect "lifting" our runtime freshness check over two elements into a runtime freshness check over an element and an entire list.
While this provides us our whole-list runtime freshness check, we still need to lift this into a type level proof, which the `#` operator, with type `(#) : {fresh : e -> e -> Bool} -> (x : e) -> (xs : FreshList fresh e) -> Type` does by dressing our `##` in `So` clothing:
```idris
x # xs = So (x ## xs)
```
[^1]: https://en.wikipedia.org/wiki/Equivalence_relation
[^2]: The exact conditions under which this will work are beyond the scope of this article