There is a very useful perspective in which categories are just typed monoids, and the monoid operation can only be applied when the types "line up". For example, here are some useful operations which do not form a monoid:
- PUSH(n) where n is a floating-point number
- POP
- CLEAR
- ADD, SUB, MUL, DIV
- ID
These can be interpreted as operations on a stack of floating-point numbers in the obvious way, PUSH(1.2) * [3.14] == [1.2, 3.14], POP * [1, 2, 3] == [2, 3], ADD * [1, 2, 5] == [3, 5], CLEAR * [1, 2, 3] == [], ID * [1, 2, 3] == [1, 2, 3], etc. However, not all of the compositions of stack operations are legal. For example, ADD * PUSH(1) * PUSH(2) is fine and equivalent to PUSH(3), but ADD * PUSH(1) * CLEAR is illegal.
Ok, so our stack operations don't form a monoid. But they obviously can still be composed, sometimes, so what do we have if not a monoid? They form a category! There is one object for each natural number, representing the height of the stack. So there are arrows like PUSH(3.14) : Height_{n} -> Height_{n+1} for all n, and POP : Height_{n} -> Height_{n-1} whenever n >= 1, and ADD : Height_{n} -> Height_{n-2} whenever n >= 2.
Another common example is matrices. Square matrices form a monoid, but what about arbitrary rectangular matrices? They don't form a monoid, but they do form a category where the objects are natural numbers, and the arrows N -> M are just the MxN matrices. You can't multiply any two matrices, but if you have a P -> Q matrix (QxP) and a Q -> R (RxQ) matrix then you can multiply them to get a P -> R matrix (RxP).
Yes, there are a number of them. Here are some examples off the top of my head:
- Moggi was studying the problem of equivalence of programs, and noted that the traditional approach to modeling a program as a total function Input -> Output is problematic. He pioneered the use of monads and Kleisli categories as a foundation for reasoning about equivalence of real programs, including all the real-world nastiness like non-termination, partiality (e.g. throwing an exception that kills the program), non-determinism, and so on. https://person.dibris.unige.it/moggi-eugenio/ftp/lics89.pdf
- Linear logic (and it's close relative affine logic) was the inspiration behind Rust's ownership model, from what I understand. Linear logic was originally described in terms of the sequent calculus by Girard (http://girard.perso.math.cnrs.fr/linear.pdf), but later work used certain categories as a model of linear logic (https://ncatlab.org/nlab/files/SeelyLinearLogic.pdf). This answered and clarified a number of questions stemming from Girard's original work.
- Cartesian-closed categories (CCCs) form models of the simply-typed lambda calculus, in the sense that any lambda term can be interpreted as a value in a CCC. Conal Elliott pointed out that this means that a lambda term doesn't have just one natural meaning; it can be given a multitude of meanings by interpreting the same term in different CCCs. He shows how to use this idea to "interpret" a program into a circuit that implements the program. http://conal.net/papers/compiling-to-categories/
- There is a classical construction about minimizing a DFA due to Brzozowski which is a bit magical. Given a DFA, do the following process twice: (a) get an NFA for the reverse language by reversing all edges in the DFA and swapping start / accept nodes, then (b) drop any nodes which are not reachable from a start node in the NFA. The result will be the minimal DFA that accepts the same language as your original DFA! Bonchi, Bonsangue, Rutten, and Silva analyzed Brzozowski's algorithm from a categorical perspective, which allowed them to give a very clear explanation of why it works along with a novel generalization of Brzozowski's algorithm to other kinds of automata. https://alexandrasilva.org/files/RechabilityObservability.pd...
- I would also put the development of lenses in this list, but they haven't leaked very far outside of the Haskell universe yet so I don't think they are a compelling example. Check back in 5 years perhaps. Here's a blog post describing how lenses relate to jq and xpath: https://chrispenner.ca/posts/traversal-systems
- I've personally had success in finding useful generalizations of existing algorithms by finding a monoid in the algorithm and replacing it with a category, using the fact that categories are like "many-kinded monoids" in some sense. I haven't written any of these cases up yet, so check back in 2 years or so. In any case, they've been useful enough to drive some unique user-facing features.
This comment pairs very well with the recent thread on Heaviside’s operator calculus. He got ahead of theory, did groundbreaking work, and was ultimately superseded by more principled approaches. I think this illustrates a kind of intellectual leapfrog. Sometimes we do things that are right, but it’s not clear why they work, and other times we have theoretical structures that open up new realms of practice.
I just realized I botched the description of Brzozowski's algorithm, step (b) should be "determinize the NFA using the powerset construction". Mea culpa.
That quote about equations with rational numbers was from here. Galois didn't have a computer of course. Rational Numbers and trisecting an angle sound related.
The quote is followed immediately by this: "It extends naturally to equations with coefficients *in any field*, but this will not be considered in the simple examples below." Emphasis on 'in any field' is mine. Among the other fields that can be considered include the Galois fields, which are another name for finite fields. (There are also infinite fields other than the rationals, so 'in any field' does not just mean Galois fields/finite fields.) https://en.wikipedia.org/wiki/Finite_field
Galois fields have nothing to do with being able to represent rational numbers in a computer: elements of a finite field aren't even rational numbers.
I can see how the idea extends naturally and also how it doesn't extend naturally. Thanks for explaining, I wonder what Galois would do if alive today with computers.
No, it's quite different. A super rough but somewhat accurate starting point is to think of a lens as being like the `.foo.bar.baz` in `myobject.foo.bar.baz`: a way to describe some "location" inside of a data structure, in a way that allows for getting and setting. But lenses are also data, so they can be stored in variables, manipulated, and so on. It's a very useful concept, and very much not required or helpful when learning Haskell. It's also not a core Haskell concept in any sense, just something that has been built on top of Haskell as a library.
Also helpful to note that the "location" can be virtual. For example you could have a lens into the hours field a string that contains an ISO-8601 formatted date, so you could do something like
let s = "20240722T193456.789-0800"
putStrLn $ 19840206T193456.789-0800 -- prints "20240722T203456.789-0800"
and they can be composed, so if you have some other lens that gets from your outer structure to an ISO-8601 formatted date string field, let's call that lens `fieldLens` then `fieldLens . hours` will get from your outer structure to the hours field of that date field.
A common failure mode is for people to think Haskell is some special snowflake that requires reading 50 books and papers to understand. It doesn’t. Learning by doing is definitely the way to go. LYAH is fine but not great at practical problems. Real World Haskell is somewhat out of date but better at the actual “how do I make a program that does real things?” question. Best bet is to hack until you get stuck or your solution seems too ugly, then ask for leads on Reddit or the FP discord.
I started by reading those two, then made it through a chunk of 99 problems in Haskell — at some point I lost interest and felt ready to write applications.
There are some interesting solutions out there, such as bit-banding used in some ARM Cortex CPUs. This maps entire bytes in the high part of the address space to single bits in the low part of the address space, so that you can make an atomic set or clear of a single physical bit by writing to a byte of memory. https://spin.atomicobject.com/bit-banding/
It's almost correct, but misses the point in an annoying way that kind of ruins the example. What does work is something like the subset of the plane given by { (x, y) | x real, y rational } U { (0, y) | y real }. This is connected, because you can walk from any point (x,y) to any other point (x',y') by traveling horizontally to the Y axis at (0,y), vertically to (0,y'), then horizontally to (x',y'). But it isn't locally connected away from the Y axis because for a tiny enough open set S around a point (x,y), there are other points in S that you can't get to from (x,y) without leaving S.
It's definitely surprising, for a couple of reasons:
1. It isn't just the uninteresting result that the set of trees has the same cardinality as the set of 7-tuples of trees; the bijection here is given by a finite, non-looping program built out of `isEmpty : Tree -> Bool`, `getLeft : Tree -> Maybe Tree`, `getRight : Tree -> Maybe Tree`, and the constructors `empty : Tree` and `join : Tree x Tree -> Tree`
2. It isn't true for any number 1 < x < 7
3. In any case, why should it work out exactly? Why not "one tree can be encoded into seven trees, or one of these 13 remaining cases"?
Nice puzzle. I found a 18 step solution but of course somebody posted one in the comments years ago.
Why not "one tree can be encoded into seven trees, or one of these 13 remaining cases"?
This direction is no problem, at least if we accept a graph with zero vertices as a tree which is probably non-standard as it should then have -1 edges. But if we allowed it, then a trivial mapping would be T -> (T,Ø,Ø,Ø,Ø,Ø,Ø). The other direction is much more problematic, one can easily map n trees into one, but then one gets stuck with either some trees that do not map back to any tuple or some tuples that have multiple representations. So now I am really curious what makes 7 special, so I will probably have to read the paper after all.
It gives a set-theoretic bijection, but not one that plays by all the rules I mentioned above. In particular, you don't get a bijection that corresponds to a finite, non-looping program.
Generally speaking, Cantor–Schröder–Bernstein does not give you a (finite) construction of the bijection, because you have to follow the inverses of f and g back until you either end up with something from the first set with no preimage under g, or something from the second set with no preimage under f, or you find that you are iterating forever. You have to make a decision based on whether or not a certain computation terminates. That's essentially why it is a non-constructive theorem (in fact, it even implies the law of excluded middle)
But in your specific case, we're kind of in luck. The injections are simple enough that you can work out the bijection that you'd get from König's proof manually, and it goes like this:
Concretely, let's write (AB) for the tree with left subtree A and right subtree B, and write . for the empty tree. Your injections are f(A,B) = (AB) and g(T) = (T.). Alternately applying f/g partitions the set of trees into a bunch of infinite sequences. Your bijection is given by: if T is part of the sequence that begins with the empty tree, pair T with (T,.). If T is part of some other sequence, then T is not the empty tree; it is of the form T = (LR), and you should pair T with (L,R). Concretely, the sequence that starts with the empty tree looks like:
In other words, the single trees that appear in the empty list's sequence are the fully-left-leaning trees like ((((..).).).); all other trees are in the other sequences. So to decide where your tree goes in the bijection, you have to do this:
if (T is fully left-leaning) then (T,.) else (left-child(T), right-child(T))
And computing whether or not T is fully left-leaning involves an unbounded amount of computation. You have to actually walk the whole tree. So this bijection won't correspond to a finite, non-looping program. In a sense, the algorithm you get from Cantor–Schröder–Bernstein is not "continuous", but the one you get from the Seven Trees In One construction is.
Another vote here for the matrixanalysis.com book. It is a really excellent book and takes a reasonably pragmatic approach to linear algebra. Coming from a programming background, you're more likely to find some things that resonate with you here. For example, this is one of the few introductory linear algebra books that deals with sensitivity analysis, which is useful to think about when dealing with floating point arithmetic instead of real arithmetic.
- PUSH(n) where n is a floating-point number
- POP
- CLEAR
- ADD, SUB, MUL, DIV
- ID
These can be interpreted as operations on a stack of floating-point numbers in the obvious way, PUSH(1.2) * [3.14] == [1.2, 3.14], POP * [1, 2, 3] == [2, 3], ADD * [1, 2, 5] == [3, 5], CLEAR * [1, 2, 3] == [], ID * [1, 2, 3] == [1, 2, 3], etc. However, not all of the compositions of stack operations are legal. For example, ADD * PUSH(1) * PUSH(2) is fine and equivalent to PUSH(3), but ADD * PUSH(1) * CLEAR is illegal.
Ok, so our stack operations don't form a monoid. But they obviously can still be composed, sometimes, so what do we have if not a monoid? They form a category! There is one object for each natural number, representing the height of the stack. So there are arrows like PUSH(3.14) : Height_{n} -> Height_{n+1} for all n, and POP : Height_{n} -> Height_{n-1} whenever n >= 1, and ADD : Height_{n} -> Height_{n-2} whenever n >= 2.
Another common example is matrices. Square matrices form a monoid, but what about arbitrary rectangular matrices? They don't form a monoid, but they do form a category where the objects are natural numbers, and the arrows N -> M are just the MxN matrices. You can't multiply any two matrices, but if you have a P -> Q matrix (QxP) and a Q -> R (RxQ) matrix then you can multiply them to get a P -> R matrix (RxP).