The Algebra of Algebraic Data Types (recurse.com)
136 points by jasonrdsouza on June 25, 2015 | 40 comments



A few remarks on this stuff.

1. Very closely related is the pure-mathematical technique of generating functions, where you take a sequence (let's say the Fibonacci sequence 0,1,1,2,3,5,8,13,...) and use its elements as coefficients in a power series (0 + 1x + 1x^2 + 2x^3 + 3x^4 + 5x^5 + 8x^6 + 13x^7 + ...). You then notice that (e.g.) shifting the sequence by 1 is the same as multiplying by x, so if F is the series for (f(n)) then x.F is the series for (f(n-1)) and x^2.F is the series for (f(n-2)), apart from some corrections for very small n. That gets you F = (x + x^2)F + x, i.e. F = x/(1 - x - x^2), and now you can factorize that quadratic and write F in the form p/(1-ax) + q/(1-bx), which immediately gives an explicit formula for the Fibonacci numbers (the one with all the sqrt(5)s in it).

So let's translate this into algebraic data types. We get

  F(x) = x + x F(x) + x^2 F(x)
  data FibTree a = Leaf a | Pair a (FibTree a) | Triple a a (FibTree a)
and the number of FibTrees with exactly n "a"s in them is the nth Fibonacci number.
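
If it helps to see that concretely, here's a quick sketch (my own enumeration, names mine) that lists the FibTrees by number of "a"s and recovers the Fibonacci numbers:

    -- The type from above, with () as the element so trees differ only in shape.
    data FibTree a = Leaf a | Pair a (FibTree a) | Triple a a (FibTree a)

    -- All FibTrees containing exactly n "a"s.
    trees :: Int -> [FibTree ()]
    trees n | n <= 0 = []
    trees 1          = [Leaf ()]
    trees n          = [Pair () t      | t <- trees (n - 1)]
                    ++ [Triple () () t | t <- trees (n - 2)]

    -- map (length . trees) [1..8]  ==>  [1,1,2,3,5,8,13,21]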

(There is a theoretically important data structure called a Fibonacci heap, but I don't believe there is any connection between these and these "Fibonacci trees" other than that in both cases there is something you can count with Fibonacci numbers.)

2. Suppose you consider a rather boring binary-tree type like this:

  data Tree = Leaf | Pair Tree Tree
whose corresponding identity is T = 1+T^2, or T^2-T+1=0. If you're a mathematician you will quickly see that the solutions (in the complex numbers!) of this are sixth roots of unity; that is, they have T^6=1. This clearly can't hold in terms of types (it would mean an equivalence between 6-tuples of trees and objects of the "unit type", there being only one such object), but it turns out that T^7=T is "true"; you can find an explicit construction for it on page 3 of http://arxiv.org/abs/math/9405205 if you like.

3. It doesn't appear that the author of the OP ever got round to writing about negative and fractional types, but other people have. For instance, this paper http://www.cs.indiana.edu/~sabry/papers/rational.pdf which I have only glanced at (it appears to do this in a more restrictive context where there aren't general ADTs with recursive definitions; negative types seem to involve running computations backwards somehow, and reciprocals are "constraints").

Apologies if I've got my Haskell syntax wrong anywhere.


Just as a breadcrumb for readers who are curious about the generating function stuff: https://en.wikipedia.org/wiki/Combinatorial_species (see also https://www.cis.upenn.edu/~byorgey/pub/species-pearl.pdf for a nice FP-flavoured tutorial)


If you're like me, and you didn't immediately understand why the number of inhabitants for "a -> b" is "a^b", and needed help for the answer[1], then here's my solution to the author's question ("Why are there eight inhabitants of Tri -> Bool, but nine of Bool -> Tri? It helps to write out each possible function.").

The nine inhabitants of Bool -> Tri: https://gist.github.com/acbart/5d3fdfd8d363af26a59c

The eight inhabitants of Tri -> Bool: https://gist.github.com/acbart/c695355880a706a4ff59

Pedagogical note: If you're going to pose a question in an online presentation of material, try to have a solution (automatically checked, or at least statically available). A huge part of learning is having the learner participate (which you did!) and then be assessed.

[1] I got help from here, @tikhonj suggested this in another comment: http://chris-taylor.github.io/blog/2013/02/10/the-algebra-of...


It may be helpful to think about this as a table.

9 functions from 2 to 3 - How many 2-digit numbers in base 3?

Each row represents a function from {True,False} to {A,B,C}.

    False  True
    -----  -----
      A      A      Function 1   f₁(False)=A   f₁(True)=A
      A      B      Function 2   f₂(False)=A   f₂(True)=B
      A      C      Function 3   etc.
      B      A      Function 4
      B      B      Function 5
      B      C      Function 6
      C      A      Function 7
      C      B      Function 8
      C      C      Function 9
8 functions from 3 to 2 - How many 3-digit numbers in base 2?

      A      B      C
    -----  -----  -----
    False  False  False   Function 1   f₁(A)=False f₁(B)=False f₁(C)=False
    False  False  True    Function 2   f₂(A)=False f₂(B)=False f₂(C)=True
    False  True   False   Function 3   etc.
    False  True   True    Function 4
    True   False  False   Function 5
    True   False  True    Function 6
    True   True   False   Function 7
    True   True   True    Function 8
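
And a small Haskell sketch (type and names mine) that generates exactly these rows:

    data Tri = A | B | C deriving (Show, Enum)

    -- Rows of the first table: the outputs for (False, True). 3^2 = 9 rows.
    boolToTri :: [(Tri, Tri)]
    boolToTri = [ (f, t) | f <- [A ..], t <- [A ..] ]

    -- Rows of the second table: the outputs for (A, B, C). 2^3 = 8 rows.
    triToBool :: [(Bool, Bool, Bool)]
    triToBool = [ (x, y, z) | x <- [False ..], y <- [False ..], z <- [False ..] ]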


Imagine a collection of a objects (let's call it A) and another collection of b objects (called B). A function is a set of arrows matching each object in A to an object in B; each object in A has exactly one arrow originating from it.

For each object in A, you have b ways to draw an arrow to an object in B. As there are a objects in A, the total number of ways to draw the arrows is b * b * b * ... * b, a times. Hence the number of functions from A -> B is b^a.


> the number of inhabitants for "a -> b" is "a^b"

You mean "b^a". (wz1000 has it right, but he didn't point out that the original statement had them flipped.)


Maybe that part would've been easier with Bool and Unit. It's obvious that there is only one function from Bool to Unit, and two functions from Unit to Bool.


Hi acbart, thanks for the suggestion. This is now fixed. https://codewords.recurse.com/issues/three/algebra-and-calcu...


Very cool! Thank you for writing this article, I found it very helpful and extremely interesting. I also found it a very nice read in general.


Thanks - wasn't obvious to me either. Am I missing something, though? I only see six inhabitants in your gist for Tri.


You're right, I missed two cases! Thanks a bunch. Second pedagogical note: we just demonstrated the value of cooperative learning, which can really compensate for a weak assessment. Similar to pair programming!


To add another way to look at it (thinking of a function as a set of tuples):

With tuples (A,B), you're listing As uniquely paired with Bs

With functions (A -> B), you're listing unique sets of (As non-uniquely paired with Bs), so each A can go to any value of B, with replacement.


A nice algebraic law that holds in data types is

    a -> (b, c) = (a -> b, a -> c)
    (b * c)^a = b^a * c^a
This is useful as common argument factoring and can optimize repeated function calls.
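
The witnesses for this isomorphism are one-liners (a sketch, names mine):

    split :: (a -> (b, c)) -> (a -> b, a -> c)
    split f = (fst . f, snd . f)

    unsplit :: (a -> b, a -> c) -> a -> (b, c)
    unsplit (g, h) x = (g x, h x)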

Another is the famous

     (c^b)^a = c^(b * a)
     a -> b -> c = (a, b) -> c
Which is curry/uncurry or the "product/exponential adjunction" in category theory.
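
For reference, the witnesses here ship with the Prelude; reproduced with primed names to avoid the clash:

    curry' :: ((a, b) -> c) -> a -> b -> c
    curry' f x y = f (x, y)

    uncurry' :: (a -> b -> c) -> (a, b) -> c
    uncurry' f (x, y) = f x y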

Finally, consider

    a^b * a^c = a^(b + c)
    (b -> a, c -> a) = Either b c -> a
Which essentially says that given a tuple of functions we can choose which one we want to evaluate. Choose b and c to be unit and notice that `() -> a` is essentially `a` (after all, a^1 = a) and you'll have

    a * a = a^2
    (a, a) = Bool -> a
Which is the seed of a useful way of memoizing pure functions (look up memo-combinators for many more details).
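
As a sketch of that seed (names mine, not a library API): tabulate the function into a pair once, then index back into it, so each result is computed at most once and shared:

    tabulate :: (Bool -> a) -> (a, a)
    tabulate f = (f False, f True)

    index :: (a, a) -> Bool -> a
    index (x, _) False = x
    index (_, y) True  = y

    -- Reuse the partial application (e.g. let g = memoBool f) and the pair,
    -- and hence f's results, are shared across all calls to g.
    memoBool :: (Bool -> a) -> Bool -> a
    memoBool f = index (tabulate f)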


It's worth noting that if you take the expansion for List:

    L a = 𝟏 + a × (L a)
        = 𝟏 + a + a² + a³ + ⋯
and substitute 𝟏 for a:

    L 𝟏 = 𝟏 + 𝟏 + 𝟏² + 𝟏³ + ⋯
        = 𝟏 + 𝟏 + 𝟏  + 𝟏  + ⋯
        = 𝟏 + (L 𝟏)
you get ℕ:

    ℕ = 𝟏 + ℕ
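Concretely (a sketch): a list of units carries no information beyond its length, so L 𝟏 is the Peano naturals:

    data Nat = Zero | Succ Nat

    toNat :: [()] -> Nat
    toNat []       = Zero
    toNat (_ : xs) = Succ (toNat xs)

    fromNat :: Nat -> [()]
    fromNat Zero     = []
    fromNat (Succ n) = () : fromNat n
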
The zero-power rule (x⁰ = 1) from classic algebra also holds; there is exactly one function from 𝟎 to any type (or in set-theoretic terms, from the empty set to any set).
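
In Haskell, that unique function out of 𝟎 is Data.Void's absurd:

    import Data.Void (Void, absurd)

    -- The empty function; there is exactly one, whatever `a` is.
    fromEmpty :: Void -> a
    fromEmpty = absurd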

There was an interesting paper on "closed semirings"[0] posted here a few days ago. Clearly, the product and coproduct on types form a semiring, and interestingly, List is the closure!

[0] http://www.cl.cam.ac.uk/~sd601/papers/semirings.pdf


The "Analytic Combinatorics"[1] book by Flajolet and Sedgewick is also relevant here.

[1] http://algo.inria.fr/flajolet/Publications/book.pdf


Not to detract from your point, but you don't need the expansion for that. Directly substituting 𝟏 in the definitional equation works just fine:

    L a = 𝟏 + a × (L a)
    L 𝟏 = 𝟏 + 𝟏 × (L 𝟏)
    L 𝟏 = 𝟏 +     (L 𝟏)
    L 𝟏 = 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + ...
    ℕ   = 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + 𝟏 + ...
The final expansion is worth writing, however, since it shows that there are unboundedly many naturals, each a different component in this infinite disjoint sum.


>Not to detract from your point, but you don't need the expansion for that

I just used the expansion given in the parent. It also makes it apparent that List is the closure.


That's true!


Actually I just realized that the law does not hold; the type `Void -> Maybe Void` has two inhabitants, not one:

    f1, f2 :: Void -> Maybe Void
    f1 x = Nothing
    f2 x = Just x


Both of them are regarded as the same function (ignoring bottom) because there is no way to differentiate between them: you can never supply them with a value of type Void in order to see their result.


They are similar in that they cannot be applied to any non-bottom value, but they are definitely not the same function.


>they are definitely not the same function

This depends on what equivalence you're using, and the only one in which it's true (definitional equality) isn't very interesting. Extensionally, the functions are identical.


Ignoring bottom, the functions arguably have no extensionality, and therefore are only vacuously equivalent in that sense.


`Just x` cannot be constructed, since there is no `x` which has type `Void`.


Well, there's the one in context that does! Except this function can never be called in an empty context, so you're ultimately stuck.


`Just x` can be constructed given an `x`. The fact that no `x` can be given is somewhat irrelevant to the definition of the function.


I love that 0^0 = 1, since you can define `voidToVoid :: Void -> Void` as `voidToVoid = id`.


You have to be careful when using Haskell for this sort of thing, because you can, for example, have a term of type Void:

    import Data.Function (fix)
    import Data.Void (Void)

    inhabitantOfZero :: Void
    inhabitantOfZero = fix id
(Side note: the empty type can also be interpreted as the type of non-terminating computations, and in fact typing 'fix id' into ghci will cause it to hang)

It turns out that 𝟎⁰ = 𝟏 is true, though, and if I'm not mistaken, this is true of semirings in general (where the exponent has a natural interpretation; of course you can define exponentiation this way for any semiring you like). This sole inhabitant is the empty function, which is easiest to explain in set-theoretic terms: if you define a function from A to B to be a subset of A × B such that each a ∈ A appears exactly once (for example, the negation function on Bool would be the set {(true, false), (false, true)}), it's clear that there's exactly one function from the empty set to the empty set, because there is only one subset of ∅ × ∅ (i.e., ∅) and for this set the above property is vacuously true.

For completeness, here's what the empty function looks like in Agda, which treats types much more carefully than Haskell (to understand why, look up intuitionistic logic/type theory and the Curry-Howard isomorphism):

    module EmptyFunction where
    
      data ⊥ : Set where
        -- No constructors, since ⊥ is empty
    
      inhabitant-of-𝟎⁰ : ⊥ → ⊥
      inhabitant-of-𝟎⁰ ()


Yeah, things start to break down when you include bottom (and fix id = _|_ since it diverges).

I'm not sure exactly how to work with polymorphism in the function-as-subset model you described, but I'm guessing it would be something like this:

Let id_A ⊆ A × A s.t. for all a ∈ A, (a, a) ∈ id_A.

If A = ∅, then id_A = ∅, which is exactly the result we'd expect, so I don't see any issue.

> the empty type can also be interpreted as the type of non-terminating computations

This is true, in a sense, but also a little bit misleading I think, since technically every type in Haskell is the set of terminating values of that type plus _|_.


>If A = ∅, then id_A = ∅, which is exactly the result we'd expect, so I don't see any issue.

The issue isn't with the example you gave, but with the idea in general that being able to produce a term of a given type in Haskell means that that type is necessarily inhabited.

>This is true, in a sense, but also a little bit misleading I think, since technically every type in Haskell is the set of terminating value with that type plus _|_

Well, ⊥ is empty, so naturally every type is itself plus ⊥. In fact, that leads to an important observation, which is that any computation in Haskell could potentially be non-terminating, not just those with type Void—and so it's really not the inclusion of the bottom type that causes things to break down, but the ability to produce non-terminating computations (more specifically, non-coterminating, i.e. non-productive "lazy" computations).

Unfortunately, the alternative—making it impossible to produce non-terminating computations—is more than anything a matter of rejecting everything the compiler can't prove terminates, which is kind of limiting, termination being undecidable in the general case. On the other hand, the sorts of computations humans design are generally much more well-structured than an arbitrary Turing machine, and so probably much easier to prove (co)termination for.


Worth noting that this isn't why they are called algebraic data types, which was a mistake I made for a while.

The "algebraic" in algebraic data types refers to this notion of "algebra": https://en.wikipedia.org/wiki/Initial_algebra#Use_in_Compute...
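
A minimal sketch of that sense of "algebra" (standard definitions, not from the linked article): an algebra for a functor f is a map `f a -> a`, and a recursive ADT is the initial one, with its fold as the unique map out of it:

    -- The fixed point of a functor: the carrier of the initial f-algebra.
    newtype Fix f = In (f (Fix f))

    -- Folding with any algebra `f a -> a` is the unique map out of Fix f.
    cata :: Functor f => (f a -> a) -> Fix f -> a
    cata alg (In x) = alg (fmap (cata alg) x)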


As someone new to this I have to say the first paragraph doesn't do a good job at introducing the concepts. I was completely lost in how to read this. Isn't + == "or" and x == "and" like in boolean algebra? What does 'either' even mean here, if not 'or'? So then why is the Bool type 1 + 1 == 2? Shouldn't it be 1 + 0 == 1 if you want to map the possible states? Same question for "maybe" that seems to work like in Rust or Swift - shouldn't it be a + 0? If not, what does "a or unit" mean? So many questions...


Take a look at this other blog post[1] with the same title, which I think does a better job of explaining the core ideas.

[1]: http://chris-taylor.github.io/blog/2013/02/10/the-algebra-of...


Thank you, that clears lots of things up.

Edit: Wow, even currying is explained nicely. This blog entry is pure gold.


>Isn't + == "or" and x == "and" like in boolean algebra?

In a sense, yes. "×" denotes the product of two types— i.e., the type of pairs of one value of the first type, and one of the second. "+" denotes the disjoint union of two types, the type of values which are taken from the first type or the second type. Importantly, when you have "a + a," taking a value from the left "a" is distinct from taking a value from the right "a."

>So then why is the Bool type 1 + 1 == 2? Shouldn't it be 1 + 0 == 1 if you want to map the possible states?

"0," "1," and "2" here do not denote states; rather, they denote types. In particular, "0" is the type with zero values, "1" is the type with one value, "2" is the type with two values, etc.

The reason that "1 + 1 = 2" is true is that there are two ways to form a value of type "1 + 1:" take the one possible value from the left "1," or take the one possible value from the right "1." The Bool type, by definition, has two values, so it needs to be given by the type "2," or equivalently, "1 + 1."

>Same question for "maybe" that seems to work like in Rust or Swift - shouldn't it be a + 0? If not, what does "a or unit" mean?

"a + 0" is, as the algebra suggests, just "a." This is because the only way to form a value of type "a + 0" is to use a value from "a;" there are no values in "0" to choose from. The Maybe type is effectively the underlying type plus a single "None" value. Thus, we call it "a + 1;" there is a "Some" for each value in "a," and a "None" for each (i.e., the only) value in "1."

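A sketch of the "a + 1" reading in Haskell, with the unit type () playing the role of 1:

    -- Maybe a ≅ a + 1: "Some" values on the left, the single "None" on the right.
    toSum :: Maybe a -> Either a ()
    toSum (Just x) = Left x
    toSum Nothing  = Right ()

    fromSum :: Either a () -> Maybe a
    fromSum (Left x)  = Just x
    fromSum (Right _) = Nothing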

Got it, thank you. The other link posted by tikhonj is also very good at explaining. Your phrase

> "0," "1," and "2" here do not denote states; rather, they denote types. In particular, "0" is the type with zero values, "1" is the type with one value, "2" is the type with two values, etc.

should IMO be included in OP's post in some form or another. Also, an explanation of Void would be helpful, as in: do not confuse 'Void' in Haskell with e.g. 'void' in C or 'None' in Python; those correspond instead to 'unit' in Haskell.


Reading the article reminded me of reading about Lisp macros (which I have been doing a bit of recently). The shift from reasoning about values to reasoning about types feels a bit like the shift from reasoning about runtime to reasoning about compile time.

The interesting takeaway for me is that smaller data types are going to make it easier to reason about the state space. In turn, this reminded me of Perlis:

Epigram 34. The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information.

In that the string's state space is so massive.


(Formal) differentiation also makes sense for grammars, and in particular gives rise to an interesting way to match regular expressions.


The paper that introduced this concept is [1], but it was long forgotten. Recently the use of derivatives to implement regular expressions has become popular again; see e.g. [2].

[1] J. A. Brzozowski, "Derivatives of Regular Expressions", https://dl.acm.org/citation.cfm?id=321249

[2] https://www.cl.cam.ac.uk/~so294/documents/jfp09.pdf
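
For a feel of the idea, here's a minimal matcher sketch in Haskell (my transcription of the standard construction, not code from either paper):

    data Re = Empty | Eps | Chr Char | Alt Re Re | Seq Re Re | Star Re

    -- Does the language of r contain the empty string?
    nullable :: Re -> Bool
    nullable Empty     = False
    nullable Eps       = True
    nullable (Chr _)   = False
    nullable (Alt a b) = nullable a || nullable b
    nullable (Seq a b) = nullable a && nullable b
    nullable (Star _)  = True

    -- Brzozowski derivative: the language of `deriv c r` is
    -- { w | c:w is in the language of r }.
    deriv :: Char -> Re -> Re
    deriv _ Empty      = Empty
    deriv _ Eps        = Empty
    deriv c (Chr d)    = if c == d then Eps else Empty
    deriv c (Alt a b)  = Alt (deriv c a) (deriv c b)
    deriv c (Seq a b)
      | nullable a     = Alt (Seq (deriv c a) b) (deriv c b)
      | otherwise      = Seq (deriv c a) b
    deriv c r@(Star a) = Seq (deriv c a) r

    -- Match by deriving with each character, then checking nullability.
    -- matches (Star (Chr 'a')) "aaa" == True
    matches :: Re -> String -> Bool
    matches r = nullable . foldl (flip deriv) r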


>Recently the use of derivations to implement regular expressions has become popular again

And they're not limited to regular languages; with the right computational tricks (lazy evaluation, coinduction, whatever you want to call it), they can be used with context-free languages in general: http://matt.might.net/papers/might2011derivatives.pdf



