Do some editing

2025-07-14 10:37:13 -04:00 · 2025-07-14 10:37:13 -04:00 · 536f2ecbf9
commit 536f2ecbf9
parent 18a4eac1f1
1 changed files with 143 additions and 33 deletions
--- a/projects/Idris/src/LessMacrosMoreTypes/RowPolymorphism.md
+++ b/projects/Idris/src/LessMacrosMoreTypes/RowPolymorphism.md
@ -19,9 +19,11 @@ This article is inspired by my work on implementing the _Brown Benchmark for Tab
 ## What is Row Polymorphism
-At time of writing, the [Wikipedia page](https://en.wikipedia.org/wiki/Row_polymorphism) for row polymorphism starts with 
+At time of writing, the [Wikipedia page](https://en.wikipedia.org/wiki/Row_polymorphism) for row polymorphism starts with:
 > In programming language type theory, row polymorphism is a kind of polymorphism that allows one to write programs that are polymorphic on row types such as record types and polymorphic variants.
-and then leads into some mathematical notation heavy text that doesn't really elucidate much, and what the hell does "polymorphic on row types" even mean?
+
 leading into some text dense with mathematical notation that doesn't do much in explaining what any of this _looks_ like, and what the hell does "polymorphic on row types" even mean?
 Consider a basic `record` in Idris as an example:
 ```idris
@ -32,13 +34,13 @@ record ImACat where
  meows_a_lot : Bool
 ```
-This certainly seems like it might fit the bill, it says `record` right there. But what are these "row types"?
+This certainly seems like it might fit the bill, it says `record` right there, and the wikipedia snippet _did_ mention records. But what are these "row types"?
 ### Row Types
-Row types often are called records in other contexts, and while "record" may be a somewhat more intuitive name for someone approaching this from the perspective of many common programming languages, I will continue to largely use the name "row type" as it will make more sense in the tabular data context I will be framing the rest of this post in terms of.
+"Row Type" and "Record Type" are sometimes used interchangeably in this corner of type theory. While "record" may be a somewhat more intuitive name for someone approaching this from the perspective of many popular programming languages, we will mostly be using the name "Row Type", as it will make more sense in the tabular data context that will serve as framing for this article.
-Consider some tabular data, say a table in a database, or a spread sheet. Tabular data is composed of a 2d grid of rows and columns, typically with each column having a name and potentially a type constraint on the values in that column. Consider this example of some tabular data: 
+Imagine some tabular data, say a table in a database or a spread sheet. Tabular data is composed of a 2d grid of rows and columns, typically with each column having a name and potentially a type constraint on the values in that column. Consider this example: 
 <style type="text/css">
 #tbl-cats tbody tr :nth-child(2) {
@ -57,13 +59,13 @@ Consider some tabular data, say a table in a database, or a spread sheet. Tabula
 </div>
-A row type is parameterized by a list of `(label, Type)` pairs, represented by the header of the table. We can see that the labels and types in our *Cats* table are identical to the fields of our `ImACat` record from earlier. Under this view, each row in the table can be thought of as an instance of the `ImACat` record. 
+"Row Types" are constructions for describing individual rows in such a structure [^7]. They are parameterized by a list of `(label, Type)` pairs, represented by the header of the table and referred to as a schema. We can see that the labels and types in our *Cats* table are identical to the fields of our `ImACat` record from earlier. As such, each row in the table can be thought of as an instance of the `ImACat` record, and the table itself as a list of such records.
-That parameterization on a schema, however, makes row types a bit more flexible, as well as a bit more complicated. We can perform the usual operations, like accessing a column in a row by name, but we can also perform new operations that enable a sort of limited, type-safe reflection over row types, such as selecting a specific set of columns from a row to produce a new row type, or concatenating the schemas of two rows to create a new row type expressing the result of joining the two rows.
+That dependence on a schema, a piece of data describing the structure of the type, makes row types a bit more flexible, as well as a bit more complicated. We can perform the usual operations, like accessing a given column in a row by name, analogous to field access like calling `name` on value of `ImACat`. More interestingly, we can also perform new operations that enable a sort of limited, type-safe reflection over row types, such as selecting a specific set of columns from a row to produce a new row type, or concatenating the schemas of two rows to create a new type expressing the result of joining the rows.
 ### The Mysterious Row Polymorphism
-Imagine you are helping your friend move and they need to figure out what furniture will fit in what rooms. The current property owner of their new place has graciously provided a table containing the length and width of each room, and their furniture suppler has provided a table detailing the lengths, widths, and heights of the bounding cuboids for each piece of furniture. Let's say the tables look a bit like this:
+Imagine you are helping your friend move and they need to figure out what furniture will fit in what rooms. The previous occupant of their new place has graciously provided a table containing the length and width of each room, and their furniture supplier has provided a table detailing the lengths, widths, and heights of the bounding cuboids for each piece of furniture:
 <style type="text/css">
 #tbl-rooms tbody tr :nth-child(2),
@ -107,27 +109,27 @@ Imagine you are helping your friend move and they need to figure out what furnit
 One thing that's going to be critical to determine is how much floor space is left in a room after the furniture's been placed in it. After all, a room isn't of much use if there's not even enough free floor space to stand in it. You notice that both of these tables contain the information needed to calculate the available area and the foot prints of the furniture, the `length` and `width` columns.
-Its tempting to define a `Room` data type and a `Furniture` data type containing each table's respective columns as fields, and describe the surface area/footprint via an `Area` interface. However, this approach has several drawbacks. It requires implementing the interface manually for each data type[^3], can hide certain categories of error like inadvertently using the wrong field to calculate the area, and is potentially quite fragile if the structure of the tables changes on us or we need to add more tables to our system.
+Its tempting to define a `Room` data type and a `Furniture` data type containing each table's respective columns as fields, and describe the area/footprint via an `Area` interface. However, this approach has several drawbacks. It requires implementing the interface manually for each data type[^3], can hide certain categories of error, such as inadvertently using the wrong field to calculate the area, and is potentially quite fragile if the structure of the tables changes on us or we need to add more tables to our system.
 We can do better. With our parameterized row types we can write an `area` function that is defined to work on any row type that has a `length` and a `width` column of type `Double`. This is the essence of row polymorphism, writing code that's generic not over what interfaces a type implements, but instead is generic over what fields the type contains.
 ### Pros and Cons
-While row polymorphism does come with the downside of loosing proof-of-intentionality[^4] , so to speak, it does come with some major upsides. Two columns in two different tables having the same names and types, doesn't always mean that they are semantically equivalent, but its often a pretty good bet, especially when they are coming from the same source. More importantly, thinking about data types in terms of what values they contain rather than what behavior they implement is often a much better way to think about messy real-world data that was laid out with a dynamically typed language in mind on a good day, and not laid out with computers in mind at all on a bad day.
+While row polymorphism does come with the relative downside of loosing proof-of-intentionality[^4], so to speak, it does come with some major upsides. Two columns in two different tables having the same names and types doesn't always mean that they are semantically equivalent, but its often a pretty good bet, especially when they are coming from related sources. More importantly, thinking about data types in terms of what values they contain rather than what behaviors they implement is often a much better way to think about messy real-world data that was laid out with a dynamically typed language in mind on a good day, and not laid out with computers in mind at all on a bad day.
-Like so many things in programming, its a trade off. Low level systems programming, where almost all the data you are working with was generated by the computer, often by systems you also control isn't going to get much benefit from row polymorphic programming. Other tasks like frontend, business logic, and video games quite frequently deal with extremely messy data that has only loosely defined shape and doesn't want to fit neatly into a type hierarchy have a lot to gain from row polymorphism. In the former category, you are more likely to hurt from the loss of semantic guarantees and end up needlessly weakening your type system and tripping over yourself more than you would without it. In the messier situation, you are instead likely to both end up saving type fighting with the type system trying to fit your data to it, and also run into the downsides of row polymorphism much less frequently.
+Like so many things in programming, its a trade off. Low level systems programming, where almost all the data you are working with was generated by the computer, often by systems you also control, isn't going to get much benefit from row polymorphic programming. Other tasks like frontend work, business logic, and video games quite frequently deal with extremely messy data that has only loosely defined shape and doesn't want to fit neatly into a type hierarchy, leaving you with a lot to gain from row polymorphism. In the former category, you are more likely to hurt from the loss of semantic guarantees and end up needlessly weakening your type system and tripping over yourself more than you would without it. In the messier situation, you are instead likely to both end up saving type fighting with the type system trying to fit your data to it, and also run into the downsides of row polymorphism much less frequently.
-Row polymorphism is also a fantastic way to think about assigning types to code that was originally written for a dynamically typed programming language[^5]. In my personal experience I have found that when an experienced programmer who really seems like they should know better is making some weird sounding arguments about how type systems decrease their productivity and aren't working it, they seem to usually be complaining about the lack of row polymorphism (or the closely related structural subtyping) in the popular statically typed languages, just without having the technical vocabulary for it.
+Row polymorphism is also a fantastic way to think about assigning types to code that was originally written for a dynamically typed programming language[^5]. In my personal experience I have found that when an experienced programmer who really seems like they should know better is making some weird sounding arguments about how type systems decrease their productivity and don't prevent enough bugs to be worth it, they seem to usually be complaining about the lack of row polymorphism (or the closely related structural subtyping) in popular statically typed languages, just without having the technical vocabulary for it.
 ## To Type a Table
-One of the core features of row polymorphism is that the labels and types of the columns are determined by a piece of data, a list like structure containing pairs of labels and types. In languages built specifically to be row polymorphic, this data describing the column structure is often implicit. Idris, however, isn't built to be row polymorphic. It's type system just happens to be powerful enough that we can build it on our own, so we will have to build a schema composed of straight up pairs of labels and types.
+One of the core features of row polymorphism is that the labels and types of the columns are determined by a piece of data, a list like structure containing pairs of labels and types. In languages built specifically to be row polymorphic, this data describing the column structure is often implicit. Idris, however, isn't built to be row polymorphic. It's type system just happens to be powerful enough that we can build it on our own, so we will have to build an explicit schema composed of our own pairs of labels and types.
-A more developed library would probably use bespoke types for most of these components, but for the sake of keeping things concise, we'll use regular ol' lists as much as possible.
+A more developed library would use bespoke types for most of these components, but for the sake of keeping things concise, we'll use regular ol' lists as much as possible.
 ### Building the Schema
-To make things just a little bit cleaner, we'll define our own data type for the pairs, which we will call `Header`:
+To make things just a little bit cleaner, we'll define our own data type for the pairs, which we will call `Header`, and an infix constructor for it, `:>`:
 > [!NOTE]
 > Remember that the Idris concept called a `record` is a distinct concept, and Idris `record`s are _not_ the row types we are going to be building.
@ -145,7 +147,7 @@ public export
 (:>) str x = MkHeader str x
 ```
-The schema itself is really simple, just an alias for a list of headers:
+For this article, we will keep the schema simple and just use a basic list of headers:
 ```idris
 public export
@ -154,7 +156,7 @@ Schema = List Header
 ```
-We will also need a way to compute the column types from our schema, turning a list of headers into a list of types:
+We will also need a way to compute the column types from our schema, turning a list of headers into a list of types. This amounts to mapping the `type` accessor over our list of `Headers`, but we will implement this with a bespoke function to avoid negative interactions with proof search resulting from the standard library's implementation of `map`:
 > [!TIP]
 > This function has an UpperCamelCase name because it is intended to be called in type signatures. This is both conventional, and reduces ambiguity as it can't be confused for a lowercase, implicitly defined, argument name.
@ -171,7 +173,7 @@ ColumnTypes (h :: hs) = h.type :: ColumnTypes hs
 A row is a list of the same length as the schema where every list element has a type determined by the header in the same position of the schema:
 > [!TIP]
-> `HList` is a standard library type with type constructor `List Type -> type`. Each type in the list of types determines the type of value in that position of the list. Notice this also constrains the length of the list, it must be exactly the same as the length of the list of types. This can be thought of as a more primitive version of the `Row` type we are building.
+> `HList` is a standard library type with type constructor `List Type -> type`. Each type in the list of types determines the type of value in that position of the list. Notice this also constrains the length of the list, it must be exactly the same as the length of the list of types. `HList` can be thought of as a more primitive version of the `Row` type we are building.
 ```idris
 public export
@ -179,10 +181,10 @@ Row : Schema -> Type
 Row hs = HList (ColumnTypes hs)
 ```
-We will also want a way of indexing the values in a row by their columns, instead of by their positions. We only want this function to be callable when the row actually _has_ that column, so we also require an inclusion proof, as an auto-implicit so the compiler can generate it for us when possible: 
+We will also want a way of indexing the values in a row by their columns, instead of by their positions. We only want this function to be callable when the row actually _has_ that column, so we also require an inclusion proof as an auto-implicit, so the compiler can generate it for us when possible: 
 > [!TIP]
-> A value of type `Elem x xs` is a proof that the value `x` is included in the list `xs`, which also encodes its position in the list. The type signature of the `prf` argument thus reads "`h` is an element with known position in `hs`"
+> A value of type `Elem x xs` is a proof that the value `x` is included in the list `xs`, which also encodes its position in the list. The type signature of the `prf` argument thus reads "`h` is an element of `hs` with known position in `hs`"
 ```idris
 namespace RowIndex
@ -192,9 +194,9 @@ namespace RowIndex
  index {hs = (_ :: hs)} h (y :: ys) {prf = (There x)} = index h ys
 ```
-### Making a table
+### Making a Table
-The table itself is pretty simple, its just a list of rows:
+A table is a list of rows all sharing the same schema, and, with our new tools, can easily be defined as such:
 ```idris
 public export
@ -202,9 +204,9 @@ Table : Schema -> Type
 Table hs = List (Row hs)
 ```
-## Using our tables
+## Using our Tables
-Now that we have a way to describe the type signature of a table, let's go ahead and translate our previous table examples into code:
+Now that we have a way to describe the type signature and structure of a table, let's go ahead and translate our previous table examples into code:
 ```idris
 Rooms : Schema
@ -237,7 +239,7 @@ furniture =
 ### A simple row polymorphic function
-With our tables defined, it's time to revisit our `area` function. The schema will be an implicit variable in the type signature, and we'll constrain it such that both the length and width fields are elements of the schema. These constraints allow us to simply call the `index` function with the given fields on the provided row:
+With our tables defined, it's time to revisit our `area` function. The schema will be an implicit variable in the type signature, and we'll constrain it such that both the length and width fields are elements of the schema. These constraints allow us to simply call the `index` function with the given fields on the provided row and have the guarantee that if our call type checks, we'll get a value back, no need for `Maybe` here[^8]:
 > [!TIP]
 > In a type signature, `x =>` is equivalent to `{auto _ : x} ->`
@ -251,10 +253,10 @@ area x =
  in length * width
 ```
-Without any further work, we can now call this function on both our room and furniture tables and it works as expected:
+Without any further implementation, we can now call this function on both our room and furniture tables and it works as expected:
 > [!NOTE]
-> In this specific context, we do have to explicitly provide the schema to the `area` function to disambiguate, but this can be worked around and should be considered more of an artifact of the limited implementation in this blog post then an intrinsic ergonomics concern.
+> In this specific context, we do have to explicitly provide the schema to the `area` function to disambiguate, but this can be avoided with more careful design and should be considered more of an artifact of the limited implementation in this article then an intrinsic ergonomics concern with this approach to programming in Idris.
 ```idris hide
 -- @@test map area rooms
@ -284,7 +286,7 @@ mapAreaFurniture = do
 ### Building a new column
-Just being able to access the fields in a row polymorphic way is pretty neat, but its also not really that impressive. We have much more interesting capabilities at our disposal, like adding new fields. We can take a function that consumes a row and produces a new value, along with a name for the new value, and use that to add a new column to each row in the table:
+Just being able to access the fields in a row polymorphic way is pretty useful, but its also not really that impressive. We have much more interesting capabilities at our disposal, like adding new fields. As a first example, we can take a function that consumes a row and produces a new value, along with a name for the new value, and use that to add a new column to each row in the table:
 ```idris
 buildColumn : {hs : Schema} 
@ -299,7 +301,7 @@ buildColumn f name table = map buildColumnRow table
    buildColumnRow row = addToEnd (f row) row
 ```
-With this new under our belt, we can make new tables derived from our previous ones, such as adding an area column to our previous two tables:
+With our new tool, we can make new tables derived from our previous ones, such as adding an area column to our previous two tables:
 ```idris
 roomsWithArea : Table (Rooms ++ ["footprint" :> Double])
@ -352,15 +354,117 @@ furnitureWithAreaTest = do
 ### Select
 We can also implement a function that takes only certain specified fields from a table, analogous to an SQL `SELECT` statement. This function only makes sense to apply when all the fields in the new schema are also in the old schema, so we take an `auto` implicit argument containing a proof of inclusion for each element of the new schema, which we will use to index into the input table. `select` makes sense to define both for individual rows and whole tables, so we create two different versions of it in two different namespaces to allow for overloading:
 > [!TIP]
 > `All` is a more general version of `HList`. `All p xs` can be thought of as being equivalent to `HList (map p xs)`, but in the Idris standard library, `HList xs` is actually an alias for `All id xs`.
 ```idris
 namespace SelectRow
  public export
  select : {hs : Schema} -> Row hs -> (is : Schema) -> {auto allIn : All (`Elem` hs) is}
    -> Row is
  select r [] {allIn = []} = []
  select r (x :: xs) {allIn = (this :: rest)} = 
    let this = x `index` r
    in this :: select r xs
 namespace SelectTable
  public export
  select : {hs : Schema} -> Table hs -> (is : Schema) -> {auto allIn : All (`Elem` hs) is}
    -> Table is
  select xs is = map (\r => select r is) xs
 ```
 Lets give this a test drive, we only really wanted to know how big the rooms are[^6], and now that we have the areas calculated, we can discard the length and width fields:
 ```idris
 roomAreas : Table ["room" :> String, "footprint" :> Double]
 roomAreas = 
  select {hs = Rooms ++ ["footprint" :> Double]}
    roomsWithArea ["room" :> String, "footprint" :> Double]
 ```
 ```idris hide
 -- @@test roomAreas
 roomAreasTest : IO Bool
 roomAreasTest = do
  pure $
 ```
 <div class="unit-test">
 ```idris
  roomAreas ==
  [ ["Bedroom",     12.2275]
  , ["Living Room", 16.314500000000002]
  , ["Dining Room", 5.9536]
  , ["Kitchen",     8.174]
  , ["Bathroom",    8.905999999999999]
  ]
 ```
 </div>
 One interesting thing to note about `select` is that you don't actually need to know the complete schema of the table it was called on for your call to type check. This caveat allows `select` calls to still work correctly after rows have been added to tables, reused on tables that have had some sort of join operation applied to them, or work across entire families of related tables.
 ### Implementing Regular Interfaces for Row Types
-### Generating table types for sql schemas, with runtime example
+Implementing types as `Row`s doesn't lock us out of more traditional programming methods. Say, for instance, in interacting with someone else's code, it turns out that we actually do have to provide area information through a `HasArea` interface:
-## Simplifications made for the sake of this blog post
+```idris
 interface HasArea t where
  getArea : t -> Double
 ```
- Hardcoding string as the label type
+Not only can we implement this for a specific `Row` type, we can implement it generically for any `Row` that has the appropriate fields in one shot:
 ```idris
 [rowHasArea]
 {hs : Schema} -> All (`Elem` hs) ["length" :> Double, "width" :> Double] 
  => HasArea (Row hs) where
  getArea row = 
    let [length, width] = select row ["length" :> Double, "width" :> Double]
    in length * width
 ```
 > [!NOTE]
 > This code is a bit needlessly ugly, the implicit arguments can be made amenable to proof search by using bespoke types instead of generalist standard library types, but that would have made this article a lot longer than it needed to be.
 ```idris hide
 -- @@test getArea
 getAreaTest : IO Bool
 getAreaTest = do
  pure $
 ```
 <div class="unit-test">
 ```idris
  map (getArea @{rowHasArea {hs = Rooms}}) rooms == 
    map head 
      (select {hs = Rooms ++ ["footprint" :> Double]} roomsWithArea ["footprint" :> Double])
 ```
 </div>
 ## Simplifications made for the sake of this article
 I've made a number of simplifications compared to my implementation in _Structures_[^2] and implementations in other languages to keep the implementation side concise. If _Structures_ hasn't been updated yet by the time you are reading it, you can check out the status at the time I was writing this article [here](https://git.sr.ht/~thatonelutenist/Structures/tree/663dd01931b4a31f14ce61bb54f9cdff579c3b25/item/src/Structures/Tables.idr). I also recommend checking out the [_Brown Benchmark For Table Types_](https://github.com/brownplt/B2T2) for a more complete treatment of the topic.
 To summarize a few of the changes relative to _Structures_:
 - Label types are hard coded to `String`
  While the label type is going to be `String` in a lot of cases in Idris code, really any type that has a decidable equality implementation, or even just plain equality, depending on the level of guarantees you wish to provide.
 - Feilds aren't nullable
- Not validating the consistency of the schema
+
  Tabular data coming in from the real world quite often has fields that are missing their data, someone may have not answered a question on their survey, that particular row might be older than that field's addition, among a litany of other reasons. Many other treatments make all fields implicitly nullable, and _Structures_ provides explicit control over nullability in the schema, but I've ignored nullability here to make the examples much more concise.
 - The consistency of the schema isn't ensured
  A lot of operations on row types stop making sense if the schema doesn't satisfy certain constraints. For instance, indexing (among other operations) gets weird if there are two columns with the same name in the schema. _Structures_ has its functions accept auto-implicit proofs of the relevant properties for each functions, but those proof types consist of a large amount of the verbosity of the library, and is not needed for conveying the basic idea of row polymorphism.
 - Lack of bespoke types
  The auto implicits demonstrated in this article all interact with proof search better if they are bespoke types instead of compositions of standard library types. Not only are those bespoke types not needed for conveying the basic idea, but the use of standard library types helps show the connection to more general ideas, making it a better idea for a purely pedagogical implementation like this.
 [^1]: <https://github.com/brownplt/B2T2>
@ -371,3 +475,9 @@ furnitureWithAreaTest = do
 [^4]: Generally speaking, with "traditional" data types, an interface typically requires some manual intervention from the programmer to be implemented for a given type. This provides a level of assurance that the implementation over that data type is intentional, and thus likely semantically valid.
 [^5]: For instance, typescript's structural subtyping is a very good approximation of row polymorphism, and this is something I believe has contributued just as much to its success as a javascript replacement as its backwards compatibility. A statically typed langauge that could run existing javascript code, but made it a battle to assign types to that existing javascript code, wouldn't be very useful as a transition language.
 [^6]: Because that's totally all the information you could ever need to determine what all will fit in a room, right? 
 [^7]: Though row types need not inhabit tables, they can also be used effectively on their own 
 [^8]: In other languages, feilds in row types may be considered to be implicitly nullable. In _Structures_, I provide explicit control over nullability of fields in the schema.