Cantor Pairing

The ideas discussed in this post are implemented using GHC generics in the package cantor-pairing

Consider the following data type:

data TrafficLight = R | Y | G

It's straightforward enough to see that if I have a value of type TrafficLight, there's only 3 (moral) possibilities. But what if I had something like (TrafficLight , TrafficLight)? Or Either TrafficLight Bool? Or TrafficLight -> Bool? How many possibilities are there for each of these? Can we enumerate them? What if instead of having types like TrafficLight or Bool, we have something like Natural? In this article, I want to use Cantor's arguments to demonstrate when and how we can construct isomorphisms between certain types.

First, the cardinality of a set (or, in our world, a type) is just a formal term for its size. In the simple case, we deal with finite cardinalities, and combining them using the typical primitives of algebraic data types (sums, products, the unit type, and the empty type) is straightforward. For example, TrafficLight above has finite cardinality 3, while Bool has finite cardinality 2, and unsurprisingly, their product (TrafficLight , Bool) has finite cardinality 3 * 2 = 6. We can think about it like this:

       R       Y       G
 T  (R , T) (Y , T) (G , T)
 F  (R , F) (Y , F) (G , F)

We can put an ordering on the pairs easily enough: you can just read the values off from left to right and top to bottom just like you normally would with text on a page, so the first value is (R , T), then (Y , T), then (G , T), then (R , F), then (Y , F), and finally (G , F). Simple!

With sums like Either TrafficLight Bool, again our intuition holds: we could be in the Left with a value of type TrafficLight, yielding 3 possibilities, or we could be in the Right with a value of type Bool, yielding 2 possibilities for a total of 3 + 2 = 5. Enumeration is simple: enumerate all the values on the left, and when you finish, go through all the values on the right.

The remaining types, unit (() in Haskell) and empty (Void in Haskell, but usually called "bottom" in type theory) behave as expected, with () having finite cardinality 1 and Void having finite cardinality 0.

But what about types like Natural? Well, clearly it doesn't have a finite cardinality, so why don't we just say it has infinite cardinality? But that brings up some trickier questions: first, can we still enumerate all the values of sums and products? And second, is there only one infinity?

Let's start with considering the naturals in this context. We can say the cardinality is infinite, and we can also enumerate them reasonably -- 0 is the first Natural, 1 is the next, and so on. But what if we take the product of Natural and Natural, i.e., (Natural , Natural)?

    0     1     2   ...
0 (0,0) (1,0) (2,0) ...
1 (0,1) (1,1) (2,1) ...
2 (0,2) (1,2) (2,2) ...
.   .     .     .
.   .     .     .
.   .     .     .

Here we have a discrete grid with pairs of Naturals at each point. Obviously there's still an infinite number of them, but can we still enumerate them? Our previous strategy for enumerating finite values no longer seems to work properly. If you start reading left-to-right from (0,0), you never even finish the first line because, well, it's infinite, and so you never actually touch the (0,1) or any of the lower points. That seems problematic. With the naturals, I could choose any Natural like 9283749823, and eventually, our enumeration would in fact find that number, given enough time. But here, it doesn't matter how long we search for a point like (1,1), we'll never find it because our search never makes it to the second row.

So wait, is this "grid" infinity somehow a "larger" infinity than our natural number" infinity? Because at least with our natural number infinity we could hit all the possibilities, but with this grid structure, we seem to be be having trouble with that.

This is the question Cantor pondered, and in doing so, came up with several interesting ideas which remain important to this day. The first is that no, the "grid" infinity here is actually the same "level" of infinity as the naturals, and in fact, we can enumerate all the possibilities on the grid, we just have to be a bit clever: what if instead of going left-to-right or top-to-bottom, what if we went diagonally?

If you start at the top left corner, you get (0,0) as the first value, then along our next diagonal, we have two values, (0,1) and (1,0), then along the next diagonal, we have three values, (0,2), (1,1), and (2,0), etc.. If we traverse the grid in this fashion, it seems like we should properly hit all the points without missing any! So now it seems there is no "second level of infinity" here, we just had to be smarter in how we made our traversal, but ultimately, Natural and (Natural , Natural) are completely isomorphic.

In the spirit of not being overly revisionist, this is not quite the question Cantor was addressing: Cantor was specifically interested in demonstrating that the rationals had the same cardinality as the naturals. You can see how this line of reasoning does lead us to an enumeration on the rationals, but not in a very nice way - if you just take pairs of Naturals, you will end up enumerating all the rationals, but there's substantial redundancy in this encoding in that (1,1) -> 1/1 = 1 is the same rational as (2,2) -> 2/2 = 1. For that reason, it's a touch ironic that despite this library being inspired by and named after Cantor, I've opted not to include instances for the rationals (bonus question: how do you enumerate, without redundancy, the rationals? bonus question 2: how does this apply to types?)

Anyway, for sums like Either Natural Natural, it's hopefully fairly obvious how we can enumerate them - just alternate back and forth between left and right as we go deeper. So while this is useful, we yet again have failed to discover a new level of infinity.

But we're functional programmers, and if there's one thing we're good at besides annoying dysfunctional programmers, it's functions! So how does this apply to function types? Well, function types are really just a flavor of exponentiation, so it shouldn't surprise us if there's a connection. Let's start with finite cardinality, how many values are there of type TrafficLight -> Bool? If you said 3 ^ 2, you are sloppy and should feel bad. To find the correct answer, let's think about it this way:

f :: TrafficLight -> Bool
f R = _
f Y = _
f G = _

So we have 3 holes, each with 2 possibilities, so that's 2 * 2 * 2 = 2 ^ 3 = 8 possibilities. Put another way, it's the number of ways we could generate (Bool , Bool , Bool), and our function is defined on three cases, with each case mapping to its respective part of the tuple. But what if the types aren't finite?

Let's start with TrafficLight -> Natural:

f :: TrafficLight -> Natural
f R = _
f Y = _
f G = _

By the same reasoning above, this is really just an enumeration of (Natural , Natural , Natural), and we already know how to handle arbitrary products above, so this is straightforward. Even functions like TrafficLight -> Natural are enumerable.

But what about the other way? What about functions like Natural -> Bool?

f :: Natural -> Bool
f 0 = _
f 1 = _
f 2 = _
f 3 = _
...

This... seems problematic. We now have an infinite number of holes, and while we can easily generalize a binary product to an n-ary product, there doesn't intuitively feel like a way to generalize this to an infinity-ary product. But maybe we're just not being clever enough. Maybe we just need to "read the page diagonally" like before and we'll come to a clever resolution! Let's give it a shot.

Suppose there is a way to enumerate all Natural -> Bool. That means we have a function e :: Natural -> (Natural -> Bool). But consider this function:

f :: Natural -> Bool
f i = not $ e i i

Since your enumeration by definition contains all Natural -> Bool, then it must contain this f, so let's find it. Let's start at the beginning - f is clearly not the first function since f 0 = not $ e 0 0, and if they differ at the point 0, obviously they're not the same function. Let's try the next one: but f 1 = not $ e 1 1, so clearly it can't be this function either, since they differ at the value 1, and if they differ there, they can't be the same function. In fact, this reasoning holds for any Natural, so... my function f is clearly not in your enumeration after all. Which means e is not an enumeration.

I hope you didn't blink, because that was a proof by contradiction. There is no possible way to enumerate Natural -> Bool. It hasn't merely not been discovered yet; it is mathematically impossible. Which means yes, this is, in some sense, a "higher" infinity than what we had with Naturals or grids of Naturals. The most common name for this larger kind of infinity is "uncountably infinite", while the kind isomorphic to the Naturals is called "countably infinite." Yet with similar logic, there are higher levels of infinity still.

This is the culmination of Cantor's work in this area, famously known as his diagonal argument. Countably infinite spaces are the most tame kind of infinite. When searching a countably infinite space, we can at least have the knowledge that if a solution exists, we will eventually (heh) find it. But with these higher infinities, that guarantee is fundamentally broken: no matter how hard you search and no matter how clever you are with your navigation, even if a solution exists, you are not guaranteed to ever discover it.