Notes, Not a Review

There are a couple different ways that people pursue technical skills. To vastly simplify things into a single spectrum, on one end, there’s the hacker approach of finding the problem first, and then looking for whatever solution lies closest at hand to solve it. On the other end is the academic approach of attempting a broad, comprehensive review of a field, then selecting the appropriate tool to solve problems as they present themselves.

For most of my life, I’ve preferred the hacker approach, but as I’ve begun to get more engaged directly with research problems, I’ve found it coming up short. Against hard problems where there are dozens of possible approaches and limited time and resources with which to test them, throwing everything that I’ve got at them and hoping something works seems less and less likely to actually solve the problem.

As such, I’m now tending toward taking not so much a broad and comprehensive approach, but a deep approach - learning as much as I can about the tools and techniques at my disposal so that I can better diagnose why some approaches do and don’t work on my the problems I’m facing, and figure out what direction to head in to make better progress.

To do that, the first thing I think I need to do is get conversant in the formal, precise language and approach to mathematics that my education so far has skirted over as being unnecessary so long as one can grok the more practical problem-solving techniques. The people who know more about such things than me tell me that set theory is the cornerstone of modern mathematics, and so I began here in my exploration, but after going through a good chunk of this book in my spare time, I’ve realized that it demands much more serious and dedicated study in order to appreciate and absorb its lessons.

In short - I need to write this stuff down as I go through it, and figured I might as well publish my take as I go through. If you’re interested in seeing what these ideas look like from someone else’s perspective, or else in kindly correcting where I’ve gotten something fundamentally wrong, I’ll very much appreciate hearing from you.


Two Rules to Bind Them

The book doesn’t actually separate itself out into sections, but there’s definitely a tick-tock tempo going on where Halmos bounces between introducing some new idea or rule and then showing you all the things that naturally fall out of those rules, or rather, how much of intuitive mathematics can be “reclaimed” with it. I don’t know much about the history of mathematics, but my understanding is that much of what’s now accepted set theory was codified in the late 19th and early 20th century in an attempt to establish some fundamental basis for mathematics that was as good as it could be. Especially once Godel kinda smashed the hope of a complete and consistent scheme of axioms.

As such, much of the book is written with the assumption that the reader is familiar with much of basic mathematics and arithmetic, and that rather than introducing entirely new concepts, Halmos is instead merely showing the reader how these concepts can be formulated using only the concept of a set. It also has the slightly annoying tendency to address other advanced mathematicians who would object to Halmos’ approach, but that seems to be unavoidable whenever an academic attempts to publish a definitive review of a subject.

The first thing we get introduced to is, naturally, the idea of a Set, except that no formal definition is given - Halmos relies on the reader’s intuitive understanding of a collection of things to define a set, though I think this a bit risky as I think there are a number of intuitive properties of collections of things that we tend to rely on which don’t exactly hold up for sets - particularly when we start talking about sets of sets of sets, etc.

Regardless, we are informed that the principal idea of a set is that it is a thing to which other things may belong, or be contained in, though already I’ve run into terminological trouble - as there is a separate concept by which as set may be included in another set, while not belonging to it. That is, while a set is made up of elements and members, there also exists the concept of a subset wherein all the members of one set are also members of another set, though not necessarily vice-versa. Consider, for example, the set of Kings of England, and separately, the set, Kings of European Countries. Current political tensions aside, Henry VIII is both a King of England and a King of a European Country - in fact, we would find that the every King of England is a King of a European Country, and therefore the Kings of England are included in the Kings of European Countries - the Kings of England are a Subset of the Kings of European Countries.

This, however, is an entirely different concept from the “Kings of England” being contained within Kings of European Countries. After all - the members of the set Kings of European Countries are things like Henry VII and Charles V. If you saw a long of names, and among them was a list of names, as if the whole list itself were just another name, you’d be rather perplexed. This is what is meant by the distinction between belonging/membership/containment, and being a subset/included.

With those ideas laid out, we’re introduced to the Axiom of Extension, which for practical purposes, defines equality between sets as being the relationship where two different sets have the same members. This naturally implies that the first set is a subset of the second, and that the second is a subset of the first, and Halmos makes the point that most formal proofs of equality between two sets are proved by showing this symmetric subset property. I’m a little fuzzy on why exactly this is called the axiom of extension, rather than equality, since Halmos doesn’t give a definition of the word extension, but my understanding is that the extension of a set is its members. Therefore, the axiom of extension is tantamount to saying that a set is defined solely, solely, solely by its members, and that two sets which have the same members are the exact same set, regardless of if one is referred to as “Colors of the Sky” and the other is “Colors Which Follow Green and Proceed Indigo in a Rainbow”.

Given these basic ideas of the relationships between elements and sets, and between sets and sets, we can then begin to introduce ideas of how to make new sets out of old ones by stacking assertions of belonging and equality and using logical operators as the glue. Halmos lists 7 logical operators for stacking and slicing sets and their elements:

  • and

  • or

  • not

  • if — then —

  • if and only if

  • for some/there exists

  • for all

This set is already redundant, but makes for a nice, compact group that covers most of the bases. The Axiom of Specification simply states that there exists some set B for every set A which is arrived at when you apply a logical condition to A. Thus, by specifying, for example “The Kings of Europe who are not Kings of England” you arrive at the set “The Kings of Continental Europe”, so on and so forth. One interesting consequence is that there is no Set that Contains Everything - else you could simply define the “Set of Things Not in the Set That Contains Everything”. Which, by the axiom of specification must exist, and yet cannot if the Set that Contains Everything truly contains everything.


The Deceptively Simple Stuff

Once we’ve set out what we can do with sets, the tick turns to tock, and we begin to talk about different kinds of sets it’s useful to consider. The first of these we’re introduced to is the Empty Set, which is simply the set with no members. If it seems somewhat unlikely that such a set could exist given that our current understanding of a set is a thing to which things can belong, consider that the axiom of specification allows us to take any set A, and specify “The Members of A Which Are Not Equal to Themselves” or any other condition that is false for all members of A. By the axiom of specification, this set must exist, and yet it contains nothing. Another curious point is that every set includes the empty set (the empty set is a subset of every set) since all of the empty set’s members are in every set. (Are the any members of the empty set which are not in every set?).

Then, we get one more rule of set construction - the Axiom of Pairing which states that for any two sets, there exists a set that they both belong to. Note that this third set is a set-of-sets. This is sort of a special case of the axiom of specification, where in we specify for sets A and B “A Member Belongs In This Set If and Only If That Member is A or B”, except that is would rely on the concept of a Set that Contains Everything to apply that rule to, so it needs to be a separate axiom. Sets constructed in this way are called Unordered Pairs, but note that we could also have specified “A Member Belongs In This Set If and Only If That Member is A or A” - the axiom of pairing doesn’t say for any two different sets, we can construct a third. This, however, naturally results in a set that contains only one member, which is termed a Singleton. In this way, we begin to blur the line between sets and elements of those sets, since we can now define the basic concept of an element belonging in a set as the singleton of that element being included in the set.

Where the axiom of pairing was about constructing a set of sets out of a pair of sets, our next rule, the Axiom of Unions is about constructing a set out of the members of a collection of sets. It states specifically that for any collection (a set of sets) there exists a set that contains all the elements that belong to at least one of the the sets in the collection. Note that there are many sets which might have that property. If we’re talking about school children , and we define one set, {Jimmy, Lucy} and another set {Bobby, Sally}, then the set “Students Of Parkwood Elementary” might be a set that contains all the elements of those two sets, but it also contains many other examples that are not in those sets. “Children Living in Parkwood” or “Children Living in the United States” also fulfill the condition. However, knowing that such a set containing all the elements must exist, we can combine the axiom of unions and the axiom of specification to define “The Members of the Set Assured To Exists By The Axiom of Unions Which Are Members In the Sets Under Consideration” to specify the minimal set containing every member of the sets we started with, and nothing else. This minimal set is known as the Union of the sets we began with.

Though it doesn’t require its own axiom separate from the axiom of specification, this is a good time to introduce the Intersection of sets, since it’s got some conceptual similarities to the union. The intersection of a pair of sets A and B can be specified as “The Members of A which are Members of B” or vice-versa. Where the union is the set of all members in any set in a collection, the intersection is the set of all members in every set of the collection.

Unions and intersections share several similar properties which are familiar from elementary school arithmetic, including commutativity, associativity, idempotemce, and distributivity. Additionally, the union of any set with the empty set is the original set, and the intersection of any set with the empty set is the empty set. Unions and intersections can be useful for showing inclusion, since A includes B if an only if the union of A and B is A, and their intersection is B.

The next kind of constructed set we encounter is the Relative Complement, which is intuitively analogous to the difference between them. It’s defined as the members of one set which are not in the other, without consideration of whether one is a subset of the other. Thinking of a typical Venn diagram, you can “subtract” one of the circles from the other, even though this will only remove the slice where they overlap, and you don’t try and subtract the elements of the second circle that don’t overlap with the first. Talking about complements is easier if we assume the existence of a larger set, which Halmos calls E, in order to refer to how complements are constructed relative to that set E. This sets acts somewhat like the Set That Contains Everything, though obviously it is not that set in order to avoid violating the axiom of specification.

That said, we can then proceed to define the Absolute Complement of a set as everything in E not contained in the set under consideration, which has a few natural and intuitive properties. The absolute complement of a set’s absolute complement is the set itself. The absolute complement of the empty set is E and vice-versa. The intersection of a set and its absolute complement is the empty set, and the union of a set and its absolute complement is E. Some of the less intuitive rules, for which I find it helpful to think in terms of Venn diagrams are that a set A is a subset of a set B if and only if B'‘s absolute complement is a subset of A’s absolute complement. The most complicated but still basic rules about complements, which Halmos calls the most important are the De Morgan laws:

  • The absolute complement of a union of sets is the intersection of those sets’ absolute complements

  • The absolute complement of an intersection of sets is the union of those sets’ absolute complements

If those two seem to have some sort of weird, poetic rhyming structure, that’s not a coincidence. Indeed, the Principle of Duality states that for any rule about sets, if you swap every set for it’s complement, swap the unions and intersections for each other, and reverse the inclusions, we arrive at a new rule.

The last form of difference we’ll define is actually called the Boolean Sum between two sets as the union of the relative complements formed by subtracting each one from the other. i.e., A + B is the union between (A - B) and (B - A). Despite being represented with a + sign, the Boolean sum is also known as the Symmetric Difference, since it can be intuitively though of as the members of the union between the sets which are not in both of them. In a typical Venn diagram, you’re left with the two circles with a hole where their overlap used to be. It’s analogous to the exclusive-or operation in computer logic gates. Symmetric difference is really a better name for it, but I led with the name Boolean sum since it is represented with a + sign in Halmos’ book. Wikipedia says it’s also represented with a Delta symbol, a - in a circle, or a + in a circle, so… I think the - in a circle is my favorite. Oof.

The last kind of basic constructed set we’ll consider for now is the Power Set which is defined with it’s own little axiom, the Axiom of Powers. For every set, there exists a set of sets that contains as its elements, all the subsets of the set under consideration. For example, if our set under consideration is A = {1, 2, 3}, then the power set is P(A) = {{}, {1}, {2}, {3}, {1, 2}, {1, 3} {2, 3}, {1, 2, 3}}. Noting that A has 3 elements and P(A) has 8 = 2^3 elements reveals why these are called power sets - a set with n elements produces a power set with 2^n elements.


Building Simple Stuff Out of Way More Complicated Stuff

With those basic rules of set construction under our belt, Halmos then moves on to introducing the set theoretic ways of constructing some familiar concepts from arithmetic and algebra. This (delineated only by my own evaluation) section is kind of mind blowing, or at least mind bending, and it’s where the concept that “everything is a set” really begins to sink in.

Although somewhat unintuitive, considering a power set points us in the direction of imposing the concept of order onto the elements of a set. Sure, I wrote out A = {1, 2, 3} using the familiar number line ordering of those elements, but I could have just as easily written out {2, 1, 3} or {3, 2, 1}, or {3, 1, 2}, etc. and by the axiom of extension, these are all precisely the same set, which produce precisely the same power set, and behave in precisely the same way under all the operations we’ve defined. For simplicity’s sake, let’s consider a set with only two elements, a and b. It’s power set would be {{}, {a}, {b}, {a, b}}. We can define an Ordered Pair as the set (a, b) = {{a}, {a,b}} or (b, a) = {{b}, {a, b}}. In this sense, the “first” element is the one which is included in every element of the ordered pair, and the second in only one.

As far as I can remember, I was first introduced to the concept of an ordered pair some time in middle school when learning about the coordinate plane, and plotting functions. We’ll get to the functions shortly as well, but first let’s think about the coordinate plane - more formally known as the Cartesian plane. We can think of the Cartesian plane as the set containing all the ordered pairs (x, y) where x and y are real numbers. The related but more general concept is the Cartesian Product of two sets - which is a set containing all the possible ordered pairs which can be formed with the elements of one set occupying the first slot, and the elements of another set occupying the second slot. For example, if our set are {1, 2} and {3, 4}, then their Cartesian product is {(1, 3), (1,4), (2, 3), (2, 4)}. If we apply the concept to the real numbers - the Cartesian Product of R with itself, R X R becomes the familiar R^2, usually represented as a plane.

It’s important that the Cartesian product can be applied conversely - any set of ordered pairs can be expressed as the subset of a Cartesian product of a separate pair of sets. The elements of each of those sets which are part of the ordered pairs in the original set are known as the Projections of the ordered pair set onto the sets which form the Cartesian product. A familiar example is a curve lying in the Cartesian plane - the curve is comprised of a set of ordered pairs, and the elements of X or Y which the curve “casts a shadow” onto are its projection onto the elements of R which are contained within the ordered pair set which defines the curve.

The use of functions and curves to explain some of the concepts of ordered pairs is premature, but not hugely as we can immediately use ordered pairs to explore the fundamental concept of Relations between sets. Seeing as how we’re only working with ordered pairs, we’re only going to be talking about binary relations. The somewhat brute force nature of set theory is on display here, as we define relations simply as being a set of ordered pairs where we claim that a certain relationship exists between the elements of the pairs in the relation set. The example Halmos uses is that we could define marriage simply by assembling a set of ordered pairs of everyone who is married to each other. If a pair exists in the set, then they’re married, and if the pair is not in the set, they’re not.

This concept seems somewhat counter intuitive, to me at least. I’m used to thinking of relationships as being defined by some essential quality or statement which can be made about two objects. The set containing all examples of such a relationship would seem to be a natural consequence, but in set theory the set of all examples is the definition. It’s thereby entirely natural to say that any set of ordered pairs defines some sort of relation, regardless of how nonsensical the relationship implied by the ordered set might be. Indeed in some cases the only human-intuitable relationship between two points might be that they have the relation implied by the set of ordered pairs.

A few points of technical importance - relations are only one way when defined in this way, though we naturally tend to think of some as symmetric. If we defined the concept of marriage to be the set of ordered pairs {(Roxanne, Alfred)}, they we would conclude that Roxanne is married to Alfred, but Alfred is not married to Roxanne. For that to be true, the set would have to contain {(Roxanne, Alfred), (Alfred, Roxanne)}. Relations that have this property are called Symmetric. Other significant properties a relation might have include being Reflexive, wherein the set contains (x, x) for all x in X - that is, every member of a certain set stands in relation to itself by this relationship. Another important property is being Transitive where (x, y) and (y, z) being in the set implies that (x, z) is in the set. The most obvious relationship that has all three of these properties is equality - the set containing only (x, x) for all x in X. Equality is in fact the smallest set of a type of relation called Equivalence Relations. A Cartesian product with itself is the largest possible equivalence relation on a set..

A particularly interesting kind of relationship is the kind that subdivides a set into subsets so that each member of a subset is in relation to the same element. Think of rounding - take the entire number line and establish a relationship between every point on it and its nearest integer. Any disjoint, non-empty collection of subsets of a given set whose union comprises that set is known as a Partition of the set. Any partition for which each of the subsets has all of its elements stand in relation to the same element of the greater set is known as an Equivalence Class. Rounding turns out to be useful example, as the term Halmos uses to refer to the concept of the equivalence class on a set X established by the relation R is “X modulo R”. If, instead, you come at it from the other end, and already have a partition on X, the relationship implied by that partition is said to be Induced by the partition.


What Is My Function?

The whole concept of relations and partitions reminds me mostly of dictionaries/mappings in programming. Mostly Halmos talks about relations existing within a set, whereas in dictionaries I tend to think of the keys and their associated values as being two different sets which are mapping from one to the other. In, for example, a crude model of an atmosphere, we can establish a relation between altitude and pressure, density or temperature - everything from 0-10,000 feet would map to one set of values, everything from 10,000-20,000 feet would map to another. In this sense 4000 and 8000 feet are equivalent under this relation. This is a nice example since I’m just mapping real numbers to other real numbers and so it doesn’t violate Halmos’ convention of relations only existing between members of the same set, but we immediately then move onto the concept of relations between two different sets and the extra bits of scaffolding we need to support that concept.

The easy, cheaty way of establishing a relationship between two sets is to use the axioms of pairing and specification to lump the two sets together into an augmented set and establish the relation between members of that new set. This isn’t super interesting, though. What is interesting is the concept of a Function - a relationship that relates every element of one set onto a unique element of another set. The set being related from/on (non-uniquely) is called the Domain of the function, and the set of unique values in the second set being related to/into is called the Range of the function. Note that while a function might relate from set X onto set Y, and the domain of the function is all of X unless otherwise specified (in which case the specified subset might as well be called X), the range need not necessarily be all of Y. If it does establish a relationship between all of X and all of Y, then Halmos says that the function Maps X onto Y.

I’ve always wondered why my math teachers were such sticklers about keeping in mind the domain and range of functions and I can now see that they are an inherent part of the function as much as (if not more so) than the algebraic expression that usually defined the function. There appears to be some dispute about this however as some people prefer to think of the function as a nebulous sort of thing that does something to turn an element of X into and element of Y, and the set of ordered pairs associated with the function is then called the Graph of the function, rather than being the only and essential definition of the function. If we restrict ourselves to consider a subset of X, then the resultant part of the graph which corresponds to that subset is known as the Image of the subset under the function.

A small bit of accounting - if a function is defined such that its domain is a subset of a larger set that it maps into, then the simple function f(x) = x is called the Inclusion Map of the smaller set into the larger set. If that subset is identical to the larger set, i.e. the function maps X onto X, then it is called an Identity Map. These things seem somewhat trivial to me, but they’re necessary to acknowledge in the process of creating nested functions out of say, a function that maps from set Y onto set Z, and then wanting to apply that function solely to a subset of Y called X. From the start we don’t necessarily have a function that maps from X to Z, but by taking the inclusion map of X into Y, then applying the function from Y into Z, much the same effect can be achieved. Functions used in this manner are known as Restrictions or Extensions depending on whether they establish a relationship to a smaller or larger set than they were originally defined on.

Sometimes, functions have the special property that they map distinct elements to distinct elements. That is, any two different elements in the domain take on distinct values in the range. Such a function is known as a One-to-One Correspondence. A natural example is between values in the range and the equivalence classes that produce them. Note, the correspondence does not exist between the value and the elements within the class, but between the value and the class itself. It verges on the tautological, but going back to our rounding example, there is a one-to-one correspondence between say, “1” and “The Set Of All Numbers Between 0.5 and 1.5” but not a one-to-one correspondence between 1 and any single one of those numbers, since there are many numbers that round to 1.

An interesting kind of function Halmos introduces is the Characteristic Function - which is a sort of indicator function that returns either 0 or 1, depending on whether an element in its domain is part of a chosen subset. From our European royals example, the “Characteristic Function of The Kings of England” is a function on the domain “European Kings” that returns 1 if the European King under consideration is a King of England, and a 0 if he is not. Since there is only one possible characteristic function for each subset of a given set, there exists a natural one-to-one correspondence between power sets (the set of all possible subsets of a set) and the set of all possible functions on a set that map to the binary set of 0 and 1.

A type of function that Halmos dedicates an entire chapter to just because of its importance is the Family, which is the technical term referring to the common and familiar practice of mapping to elements of the range from a set of Indices. The familiar x_i, where i is the index (an element of the index set) and x_i is the related member of X, the indexed set. Honestly, it’s never even occurred to me to think of this as a function, just a common convention to identify elements in a set, but sure. I guess technically it’s common practice to use the natural numbers or letters of the alphabet as an index set, though there’s no reason you couldn’t use anything as an index set.

In fact, imagine you had a set consisting only of two elements, a and b. Then index some set z with them so that z_a identifies a unique element of z and z_b identifies another. If you consider all such families, and then restrict yourself only to consider those families where the resultant z_a and z_b come from the sets X and Y respectively, you’ve found another way to define the Cartesian product of X and Y by establishing a function f(z) = (z_a, z_b) This, in fact, is a more general approach, that allows us to define Ordered Triples, Quadruples, etc. as families indexed by unordered triples, quadruples, etc. Put another way, imagine we have the index set I = {x, y, z}. This could just as easily be {dog, 52, Pikachu} but {x, y, z} is nice and simple. You then index everything under the sun by x, y, and z, but restrict yourself to the happy coincidence that the elements indexed by them all happen to be real numbers, and establish a function that returns the the family {R_I} = (Rx, Ry, Rz), which is thereby an ordered triple indexed by {x, y, z}, and the set of all such triples is a natural extension of the Cartesian product into a larger number of sets.

Frankly, this is the worst-explained part of the book so far and I’m not sure if I’ve misunderstood something here, but this essentially forms the basis for functions on larger numbers of variables, as we’ve established a basis for correlating any number of specific, ordered inputs to specific, ordered outputs. I thnk technically we’re somewhere around the math I thought I understood in the 8th grade, so I’m gonna just call it good and move on.

Speaking of 8th grade math, we come up next against the concept of an Inverse Function. In set-theoretic terms, the inverse of a function f: X->Y is defined as the set of elements in set X for which f(x) is in some given subset of set Y. It is interesting to note that while f is defined in regard to the elements of X, f^-1 is defined in terms of subsets of Y. It is a necessary and sufficient condition for f to fully map X onto Y that the inverse image of every non-empty subset of Y be a non-empty subset of X. For f to be one-to-one, the inverse image of each singleton in Y must be a singleton in X. It is only in this case of a one-to-one function that f^-1 is interpreted as being a function in and of itself which acts on the elements of Y rather than the subsets of Y.

The last kind of function to come under consideration is the Composite Function. If, for instance f: X ->Y and g: Y -> Z, and additionally, the range of f is contained in the domain of g, then it makes sense to think of a h = g(f(x)): X -> Z. Care must be taken at times to note that the second condition is satisfied - the range of the inner function being contained within the domain of the the outer function. Particularly in cases where the two functions operate on the same two sets but in opposite directions, it’s easy to violate this condition and end up with some non-sensical results. Thankfully, composition plays nice with inversion in the case that functions fully map one set onto another. If you have a formable composite h = g(f(x)) you can rest assured that h^-1 = f^-1(g^-1(z)).

Care must be taken when generalizing inversion and composition to all relations, not just the input-to-unique-output form of functions. Composite relations only exist in the case, for instance, of xRy and ySz allowing there to exist xTz. A general relation between all members of X and all members of Z might not exist, even if one does exists for specific members. When they do exist, however, the inversion algebraic rules are the same as for the restricted case of functions.

Completing our regression, we now move from the heady space of 8th grade math into the perhaps the more challenging world of 1st grade math and Numbers. Specifically, the natural numbers, like 0, 1, 2, 3,… etc. As with everything else, we must identify the concept of a number itself with a set, and up until this point we’ve accomplished this in a pretty brute force manner - by defining a concept as the set of all sets bearing a particular property. We might be tempted to define, say, 2 as the set of all sets having two members, but that gets pretty circular, pretty fast. Instead, the approach is similar to how unit standards are established - some mass is defined as the “standard kilogram” and anything that weighs the same as it is determined to also weigh one kilogram. We do this in a mildly recursive manner, utilizing the concept of a Successor, which is to say, for some given set X, X’s successor , X+ is defined as having all the same elements as X, plus X itself. So, if we define the set, “The Beatles” as {John, Paul, George, Ringo}, then “The Beatles+” is {John, Paul, George, Ringo, The Beatles}, or more explicitly, {John, Paul, George, Ringo, {John, Paul, George, Ringo}}.

This makes a bit more sense when we consider the subject at hand - the natural numbers. If we want the set which we define to be a sort of standard, we can start by setting zero to be a set with zero elements. It must and can only be the empty set: {}. 1 then, is the successor set of zero: {0} or {{}}. 2 is {0, 1}, or {{}, {{}}}. I’m gonna stop here since the bracketology has already crossed the point that’s strictly comfortable, but say that we can carry this process on literally ad infinitum thanks to the handy, dandy and excellently named Axiom of Infinity which simply says that there exists a set containing 0 and the successor of every element in it.

You can imagine any number of sets containing {0, 1, 2, 3, etc…, The Beatles, The Beatles+, The Beatles++, etc.} or zero, all it its successors, some extraneous elements and all of their successors, and this qualifies as a Successor Set - which is one of those with the property described in the axiom of infinity. But since a successor set has to have 0 and all of its successors, there is only one minimal set which is contained in all successor sets - that is {0, 1, 2, 3, etc….} which Halmos calls omega (We’ll get to why this is the first lower case vocab word waaaaaay later). A Natural Number then, is any of the elements of that minimal successor set, omega. It’s a weird thing to think that, say, 7 isn’t just the predecessor of 8, but that 7 is a subset of 8, but these properties are usually extraneous to what we do with numbers.

A precise definition worth noting is that a Sequence is a family using a natural number as its index set - i.e., if we’re using 8 as an index set, then we have the numbers 0-7 available to use as indices. This is why arrays start at 0. Having them start at 1 is unnatural.


And Now We Learn To Count

Alright, now that we know what toys we can play with, let’s figure out how we can play with them. The first couple of rules Halmos introduces in this next chapter aren’t really formulated in terms of sets and relations in the same way we’ve defined them but take the form of a few more axioms, which all have to do with specific properties of that minimal successor set we used to define the natural numbers, omega:

  1. 0 is in omega

  2. If n is in omega, then n+ is omega

  3. If S is a subset of omega, and 0 is in S, and n+ is in S for every n in S, then S must be omega

  4. None of the successors in omega are 0

  5. If two elements of omega have the same successor, they are the same element

These are known as the Peano Axioms. I’m not super sure why they’re considered axiomatic since they seem implicit from the axiom of infinity and the definition of a successor, but it might be down to more of that mathematical history I don’t really know. It seems the Peano axioms predate this formulation of set theory, and may have needed to be taken as a given to do arithmetic in the period after their supposition. Halmos claims it’s possible to derive rational, real, and imaginary numbers from these axioms and the arithmetic associated from them, but that it’s beyond the scope of the book. I’m kinda interested to see how that’s done, but I suppose it’ll have to wait for now.

He does, however, draw particular attention to Peano axiom 3, which is apparently also known as the Principle of Mathematical Induction, and points out that it can be used to establish the uniqueness of a function f:X->X which can be used to establish a sequence in the manner that u(0) = a in X, and u(1) = f(u(0)), u(2)=f(u(1)), etc. Essentially, if you can establish how a function operates on zero and how it operates on the successor of anything you already know how it operates on, that’s sufficient to determine how it operates on all the natural numbers. While the principle of mathematical induction establishes that such a function would be unique, it does not establish that it must exist. For that, we need the Recursion Theorem which says precisely that if f exists, the corresponding u exists as well. Using these two in conjunction, we can thereby establish new functions through Definition by Induction.

Indeed, we can use definition by induction to begin to define some of the more familiar arithmetic operations, like Addition. From the recursion theorem, there must exist a function s such that u(0) = a and u(n+) = (u(n+))+ for every natural number n. The value of this function for every natural number can be seen to be u(n) = a + n for every n. Thus, although kinda weird and unintuitive, we can see that addition is not one single function or operation, but a sequence of functions defined for every natural number. We can similarly define Multiplication as u(0) = 0 and u(n+) = u(n) + a, so that u(n) = n * a for every n. Exponents work similarly, with u(0) = 1 and u(n+) = u(n) * a.

One small oversight that it’s time to correct is the concept of order in the natural numbers. After all, we’ve been talking about omega a lot and the concept of a successor, but we’ve never formally established that the set has a particular order. We might just as easily list elements of omega as {354, 4254, 653, 3, 1112, etc…} as {0, 1, 2, 3, 4, etc….}. For the natural numbers order is straightforward, and we say that one member of omega is Less Than another if it is a member of that number as a set, or equivalently, that it is a subset of that number.

If there exists a one-to-one correspondence between two sets, they are called Equivalent. Some hijinks ensue when considering infinite sets. If you consider, for instance, f(n) = n+, you can see that it establishes a one-to-one correspondence between the natural numbers and the positive integers: 0<->1, 1<->2, etc., so that even though the positive integers are a proper subset of a natural numbers, the two are still equivalent. Thankfully, as long as we’re only playing around with and comparing the natural numbers, where no single number is an infinite set, we’ll find that no two natural numbers are equivalent. Moreover, any finite set is equivalent to one and only one natural number, which is known as the Number of Elements in that set. Hey, we’re about halfway through the book and we’ve finally learned how to count!



Now that we know how to count, let’s set about putting things in Order so that we know which way to count them. We’ll be generalizing the concept of “less than or equal to” rather than the more strict “less than” because it’s easier and more common. We define orders (naturally) as a kind of set - specifically a relation between members of a set that is called a Partial Order if it has three properties:

  1. If xRy and yRx, then x = y. (Anti-symmetric)

  2. xRx for every x in the set (Reflexive)

  3. If xRy and yRz, then xRz. (Transitive)

At first blush this seems to me insufficient to establish any kind of order. Yes, an order must have those properties, but these alone seem insufficient. I suppose it’s sort of the transitivity that seems the most order-ish to me, and the other two are sort of constraints to stop the order from becoming degenerate or ambiguous. Typically, R is represented simply by the familiar <= sign. If it so happens that there is the stronger condition that for every pair {x. y} in the set, that either x <= y, or y<=x, then the ordering is a Total Order or a Linear Order, and the set so ordered is called a Chain.

It’s easy to think of partial orders in terms of the natural numbers, but it generalizes pretty easily - within a power set, for instance, the relation of inclusion/subset-ness establishes a partial order, though not a total order unless the set only has 0 or 1 elements. If there are two, say {a, b}, then the power set is {{},{a},{b}, {a, b}}, and our order tells us that {} <= {a}, {b}, {a, b}, and {a}, {b} <={a, b}, but {a} is not <= {b} or vice-versa. The concept of function extensions establishes a somewhat natural partial order as well - f <= g if g is an extension of f. In the natural numbers, though, <= is a total order.

An Ordered Set is in fact an ordered pair of a set and an order in it, such as (X, <=). Of course, the relation which defines the order (which is itself a subset of the Cartesian product of X with itself) implies a definition of X, as so the ordered pair bit is somewhat redundant. In order to move to a Strict relationship or order, we define a second relationship, <, which holds for the case that x <= y, and x != y.

Some more vocab. The Initial Segment Defined by an Element are all the elements of a set that are less than the defining element - and it may be a weak or strict segment. It’s members are referred to as Predecessors of the defining element. The natural definitions for Successors (though this overloads the earlier definition of a set that contains all the elements of its predecessor plus its predecessor), and being Between two elements follow. When we get to the edges or partial ordering, there are subtle distinctions between a Least Element (one which is less than or equal to all the other elements in the set) and the Minimal Element (one for which there is no element strictly smaller than it.). Imagine the set {{a}, {b}, {a, b}} with inclusion defining the partial ordering. a and b are both minimal elements, but neither is a least element. There can only be one least element in a set if it exists, but there may be many minimal elements. The mirror is true for Greatest Elements and Maximal Elements.

If set is a subset of a larger set, then elements of the greater set which are less than than all the elements of that subset are Lower Bounds of the subset, and those that are greater than all the elements of the subset are Upper Bounds. The largest possible lower bound (whether its in the subset or not) is the Infimum of the subset, and the smallest upper bound is the Supremum. If it takes a while to wrap your head around these differences, don’t worry - I first encountered them in a graduate course on convex optimization and I didn’t understand them until my second read through of this book on my own time. This distinction is actually what prompted me to read this book in the first place. Consider the positive numbers as a subset of the reals. -10,000, -3, and 0 are all lower bounds - they are all less than any of the positive numbers. 0 is the inifimum of the positive numbers, even though it is not itself a positive number, because there is no positive number which can be definitively called the least element. 0.0001 is still greater than 0.00000001, which is still greater than 0.00000000000000001, etc. 0 is the greatest possible number which is still less than all the positive numbers. Consider the set of all numbers less than 100 to see that 100 is their supremum.

It’s worth stating at this point that the concept of order is probably the most important thing in set theory to me personally, since my research involves optimization, and establishing an objective function is essentially establishing a partial order on the Cartesian product of your design variables, and then executing some kind of search algorithm to find the minimal elements. If anyone’s got some super cool tricks about applying the concept of order to optimization, I’d loooove to hear about them.


This Is Where The Fun Begins

Anyway, back on track - item duplication glitches, and the axioms that enable them. Specifically, the Axiom of Choice, which states that the Cartesian product of a non-empty family of sets is non-empty. This seems like it should be provable from the other elements of set theory we’ve learned so far, but apparently it isn’t, and further leads to some weird, counter-intuitive results, like the Banach-Tarski Theorem which involves slicing up a ball in a sufficiently special way that you end up with two balls of the same size as the original. Halmos then goes on to define the related Choice Function in an extremely poor (difficult to understand way). A choice function is a function whose domain is the same index set as a family, and which assigns to each index an element of the Cartesian product of the family. An alternative statement of the axiom of choice is simply to say that a choice function exists for any non-empty family of sets.

It seems that it’s pretty easy to see that the axiom of choice holds for finite sets, but it needs to be axiomatically stated to hold for infinite sets. I’m not going to delve into the deeper aspects of it, the controversy and alternative, contradictory axioms which have been postulated. I’m mostly reading this book to get a firm foundation for my study of other mathematics, and since the axiom of choice appears to be accepted enough to build off of, I’ll just stick this particular knife in my belt and move on.

Back to thinking about optimization and the extreme edges of sets. Zorn’s Lemma stats that if a set is partially ordered such that every chain within it has an upper bound, then the set has at least one maximal element. This immediately sets my antennae wiggling because the way it’s phrased, it sounds like if a partially ordered set contains no chains then its still guaranteed to have a maximal element (since there are no chains without upper bounds), which is weird. Is is possible for a partially ordered set to contain no chains? Can one order the empty set?

… Some brief Googling and I’m reminded that an upper bound of a subset has to exist within the greater set, so even though it is possible to partially order the empty set (the only possible order being the empty set itself as a relation), none of the (non-existent) chains within it have upper bounds, so Zorn’s Lemma doesn’t apply. Along the way to proving Zorn’s Lemma, Halmos introduces the concept of a Tower - a set within a set that contains three distinct types of elements:

  1. The empty set

  2. All the elements of the greater set such that they can be unioned with a previously known element of the set and still remain within the non-empty set of subsets of the greater set

  3. The union of all elements of any chains within the tower

Proof of Zorn’s Lemma entails proving that the smallest possible tower within a set, the intersection of all other towers, is itself a chain. Zorn’s Lemma is equivalent to the axiom of choice - if you take all the other axiom’s we’ve studied up until now and throw in Zorn’s Lemma as an axiom instead of the axiom of choice, you can instead prove the axiom of choice.

Another equivalent statement to the axiom of choice which concerns the other end of sets is the Well-Ordering Theorem which says that every set can be ordered in such a way that every non-empty subset of it has a smallest element. (A set so ordered is called Well Ordered.) Any well ordered set is also necessarily totally ordered.

The naming of well ordered sets has a weirdly paternalistic feeling to me - like we’re patting the set on its head and telling it it’s well behaved. I feel like fixed ordering or something more neutral would be better - well ordering makes me feel like any property it would be nice for a well ordered set to have I’m more biased to assume it has since it’s such a good, little set. Halmos explicitly says that it’s a pleasant fact about well ordered sets that there’s an analogy to the principle of mathematical induction for them - the Principle of Transfinite Induction. Which, rather than applying only to the natural numbers and omega, applies to all well ordered sets. It states that if a subset of a well ordered set is such that if the entire initial segment of an element of the well ordered set is in the subset then so is the element itself (If the strict initial segment is in the subset then the weak initial segment is as well), then that subset must be equal to the well ordered set. Transfinite induction differs from mathematical induction in that it makes a jump not from immediate predecessor to successor, but from the set of all predecessors to a successor. Elements of a well ordered set might not have an immediate predecessor, whereas with the natural numbers, the immediate predecessor of a natural number is the set of all predecessors to itself, so the distinction is needed. Also, transfinite induction makes no assumption about a starting, base element in the role that 0 plays for mathematical induction.

Well ordered sets have some natural associated concepts/vocab to learn. A set is a Continuation of a well ordered set if the well ordered set is a subset and initial segment of the continuation, and the continuation has the same ordering as the well ordered set. Of any two distinct initial segments in a well ordered set, one of them must be a continuation of the other. The union of a collection of well ordered sets can always be well ordered so that it is a continuation of each set used to assemble it. That ordering, however, is not necessarily just well ordered continuation. As a last word - a related insight can be applied more generally, which is to say that while the well ordering theorem guarantees that any set can be well ordered, that ordering might be quite arbitrary and have nothing to do with any other partial or total orderings on the set.

Since the principle of mathematical induction was only one half of its usage in definition by induction, we need to think about the transfinite analog of the recursion theorem to establish a transfinite analog of definition by induction. Transfinite Recursion, then allows us to get the value of a function on a well ordered set at a particular element based on its value at all of that element’s predecessors. To get to it, we have to generalize the concept of sequences to Sequences of Given Type. For an element within a well ordered set, say a in the well ordered set W, a “Sequence of Type a in X” is a function from the initial segment of a into another set, X. The previous definition of a sequence is just a sequence of given type where the well ordered set is the set of natural numbers, omega. A Sequence Function is a function whose domain is all sequences of given type from a well ordered set into another specified set.

The Transfinite Recursion Theorem states that for every sequence function that maps the sequences of given type in a well ordered set into another set, there is a unique function from the well ordered set into the other set defined such that its value at any member of the well ordered set is the value given by the sequence function’s value on the restriction of the unique function to the initial segment of the element.

I know I’m getting quite twisted up in my words here - but the book is called Naive Set Theory, so trying to state these things in plain language so I can intuitively understand them has been a goal of this note-taking process. As an example, if I have a set, {a, b, c} which is well ordered by the usual alphabetical order, and a separate set {Monkey, Zebra, Snake, Lion}, then every permutation that maps one of the initial segments possible in {a, b, c} to a unique element of {Monkey, Zebra, Snake, Lion} are together the Sequences of Given Type in {a, b, c} to {Monkey, Zebra, Snake, Lion}. A function that takes one of those permutations, and associates the permutation to a unique element of {Monkey, Zebra, Snake, Lion} is a sequence function.

That is, a function that says f({a}) = Monkey, f({a, b}) = Zebra is a sequence. A function that says g({a}) = Snake, g({a, b}) = Lion is also a sequence. A sequence function is a function whose argument is a function and returns an element of the objective set, so if my sequence function is z, then z(f) = Snake, z(g) = Lion, z(h) = Lion, z(i) = Monkey, etc. for every possible function that indexes {Monkey, Lion, Snake, Zebra} with {a, b, c}.

What the transfinite recursion theorem says is that there is a unique function, call it U, which takes in an element of the index set and returns a member of the objective set according to the rule U(x in {a, b, c}) = z(U(x in {a, b})), and we know that U(x in {a, b}) = z(U({a})). So, if we define U({a}) = Monkey, and we know that z(“A Function That Returns Monkey for a”) = Zebra, then we know that U({b}) = Zebra. And if we know that z(“A Function that Returns Monkey for a and Zebra for b”) = Snake, then we know that U({c}) = Snake. Thus, we’ve fully defined U({a}) = Monkey, U({b}) = Zebra, and U({c}) = Snake, by use of z as a sort of generator of possible functions. An exercise like this one is known as Definition by Transfinite Induction.

Bit more vocab before we’re done this section - an order-preserving, one-to-one correspondence between two sets is called a Similarity and such a pair of sets are called Similar. Since similarities are one-to-one, their inverses must exist and are also similarities. Composites of similarities are similarities. Well ordered sets can have only one similarity between them, but a well ordered set cannot be similar to one of its initial segments, but there exists the Comparability Theorem which states that for two well ordered sets, either they are similar to each other, or one of them is similar to an initial segment of the other.


To Infinity and Beyond!

This is the point I’m kinda thinking about jumping off the bandwagon here. I’m not sure how prevalent the things in this section are to my research, but then again this is the point in my first read through of the book I realized I was misunderstanding the basics and needed to go back through, taking notes. Maybe I’ll get something out of it this time.

Goku would be proud. What happens when we take omega, the set containing 0 and every successor to a member within it, and then take omega’s successor, omega+? What of omega++? What about going even further beyond?

To get there, we need the Axiom of Substitution, which says that if S(a, b) is a sentence such that for each a in a set A the set B = {b: S(a, b)} can be formed, then there exists a function F with domain A such that F(a) = {b: S(a, b)}. In plain language - you can form a set of all the points for which a particular sentence is true. It seems to me a kind of super form of the axiom of specification - where that one states you can form a set by paring down a set already known to exist, this one says you can construct new sets out of anything intelligent you can say about the elements of a set, not just that a subset of them exists.

Ordinal Numbers make use of the axiom of substitution to define themselves as an extension of the concept of natural numbers. An ordinal number is a set that is well ordered and such that every member in it is constituted of every one of its predecessors inside the well ordered set. In the same way that 8 (as a set) is composed of {7, 6, 5, 4, 3, 2, 1, 0}, so an ordinal number alpha would be composed of {alpha-, alpha- -, alpha- - -, alpha- - - -, etc.}. All the natural numbers are ordinal numbers, as is the set containing all the natural numbers, omega (though it is not itself a natural number). The process extends - omega+, omega++, etc. are all also ordinal numbers. The successor of any ordinal number is also an ordinal number. A useful fact about the ordinals is that if it is possible to make a set into an ordinal by pairing it with a well ordering, there’s only one, unique ordering that does so.

What if we take omega and use the axiom of substitution to define a function F on omega such that F(0) = omega, and F(n+) = (F(n))+, so that F(1) = omega+, F(2) = omega++? What is the range of this function - in effect the highest ordinal number this allows us to define? Well, the union of that set with omega itself (omega, omega+, omega++, etc.} is called omega2. In other language, omega2 is the set {x:x=omega+n for n in omega}. If you do it again, unioning omega2 with all it’s successors according to a function on omega, you get omega3, and then again to get omega4, etc. Do that whole thing, and in the same way that omega follows all the natural numbers, omega^2 is the thing that follows all the ordinals constructed in this way. Then you get omega^2 + 1, … omega^2 + omega, … omega^2 + omega2,… omega^3, etc., etc. Omega and its successors are called Transfinite Ordinals.

One of the subtle things about the way ordinals are defined is that whereas we originally defined omega as the minimal successor set - making it sort of a special case of a wider type of set, the way they ordinals are defined directly restricts the concept to the natural numbers and their successors. Where a generic successor set could have contained {0, 1, 2, … a, b, c, …, alpha, beta, gamma…} and any other combination of elements so long as the necessary order is maintained - the requirement of well ordering combined with the requirement that every element be a set containing all of its predecessors means that only 0, 1, 2,… omega,… omega2, etc. satisfy both requirements. Generic successor sets with non-number elements cannot be well ordered, and the requirement that every element be constituted of its predecessors ensures that every predecessor from 0 up to n-1 is in ordinal number n. There’s no need to specify the ordinals as any sort of restriction as a more general type of set.

We can at this point make a few more statements about the ordinals and sets comprised of them. First, explicitly - any element of an ordinal number is itself an ordinal number, and a subset of the larger ordinal number. Two ordinal numbers cannot be similar unless they are equal. If they are not equal (and therefore not similar) they one is still necessarily similar to an initial segment of the other and is an element of the other. Essentially, any two ordinal numbers are comparable with the natural ordering - any set of ordinal numbers is therefore totally ordered. A peculiarity is that despite this order within the ordinal numbers, not every ordinal number has an immediate predecessor. I might have referenced it as a concept earlier, but what ordinal is omega-? or omega2-? These transfinite ordinals without predecessors are called Limit Numbers.

Here’s a question - can there be a set of all the ordinal numbers? It seems straightforward to say obviously yes - but then we get into kind of “What’s infinity+1?” territory. If there were a set claimed to contain all the ordinal numbers, it would be straightforward to define the supremum of the set (the smallest ordinal not contained within the set) and show that it must exists (trivially the successor of the upper bound of the set), and therefore the set claimed to be the set of all ordinals is in fact, not. The idea of the existence of the set of all ordinals is called the Burali-Forti Paradox.

I know I claimed to have learned to count a while back, but now we’re ready to introduce the proper Counting Theorem which states that each well ordered set is similar to exactly one, unique ordinal number. So now I know how to count past infinity. I think? Kind of? Not really. Count to transfinty, I guess.

Now that we’re in (Infinity Plus) 1st grade again, let’s learn some more arithmetic. Halmos points out that there are two different approaches to ordinal arithmetic - one more set theory-y and one more recusive-y, and we’ll be going over the set theory-y one. Let’s start with addition. Now that we know how to count properly, instead of defining addition as a series of individual operators for adding a specified number to an argument, we can simply define adding to ordinals together more generally as the size of the resulting set when two disjoint sets of similar size to each of the ordinals are unioned together. If, for some reason the two sets you want to add together aren’t disjoint, make sets of ordered pairs out of them so that if the element x appears in both of the original sets, you end up with two new sets that now have (x, 0) and (x, 1) in them instead. The resulting union is called the Ordinal Sum of the two sets, and is typically ordered so that all the elements of one set precede all the elements of the second set. It’s ordinal size is the sum of the ordinal sizes of the two sets that went into it. This works just as well for the infinite (Transfinite? Both?) ordinals as well.

Ordinal sums have most of the familiar properties we want. for any ordinals alpha, beta, and gamma:

  • alpha + 0 = alpha

  • 0 + alpha = 0

  • alpha + 1 = alpha+

  • alpha + (beta + gamma) = (alpha + beta) + gamma

Buuuuuut, for the transfinite ordinals, like omega:

  • 1 + omega = omega

  • omega + 1 = omega+

Yikes. The same sort of behavior crops up when we move onto Ordinal Products. If we define such, e.g. A x B as taking A and adding it to itself B times by producing B disjoint copies of A using the ordered pair trick detailed earlier with each second item in the ordered pair being an element of B, we naturally arrive at the ordinal product of two sets being their Cartesian product with reverse lexographical order. That is, (a, b) < (c, d) implies that either b < d or b = d and a < c.

The good, familiar properties of ordinal multiplication are:

  • alpha0 = 0

  • 0alpha = 0

  • alpha1 = alpha

  • 1alpha = alpha

  • alpha(beta(gamma)) = (alpha(beta))gamma

  • alpha(beta + gamma) = alpha(beta) + alpha(gamma)

But also:

  • 2omega = omega

  • omega2 = omega2

  • (alpha + beta)gamma != alpha(gamma) + beta(gamma)

C’est le ordinaux. Scaling up again to Ordinal Exponents via either repeated production or transfinite recursion (the precise mechanism is left out of Halmos’ book) we arrive at:

  • 0^alpha = 0 (alpha >= 1)

  • 1^alpha = 1

  • alpha^(beta+gamma) = alpha^beta(alpha^gamma)

  • alpha^(beta(gamma)) = (alpha^beta)^gamma

But also:

  • (alpha(beta))^gamma != (alpha^gamma)*(beta^gamma)

Two things to say at this point. First, I know that writing out variable names in this format kinda sucks - I’ll look into getting MathJax to work with SquareSpace if this is truly incomprehensible. Also, we’ve previously used the x^y notation to denote the set of all functions from y to x, so this is an overload that we’ll have to rely on context to avoid confusing. I would have thought that, say (x -> y) would have been easier for the set of all functions, but, y’know. Math history.


Seriously, This Time I Promise I’ll Learn How to Count

So, we’ve learned how to count, but why do we count? Well, mostly to compare sizes - I guess we’ve reached (Infinity Plus) middle school again. But counting and comparing things does allow us to place them in a natural order, which is relevant to what I’m doing with optimization, so we soldier on.

Since the well ordering theorem tells us that every set can be well ordered, and the counting theorem tells us that every well ordered set is similar to exactly one ordinal number, the ordinals appear to provide a fine basis for determining and comparing the size of any two sets, even if they have nothing to do with each other. Except that the well ordering theorem doesn’t go so far as to say that any set can be well ordered in only one, unique way, and similarity to an ordinal number is determined by the ordering, not the elements of the set.

Might different well orderings lead us to different conclusions about the size of a set? Let’s take omega, and introduce a new ordering where you place 0 after everything else. This is still a well ordering of the elements of omega, but its ordinal number is omega+1.

If we skip the step of attempting to determine a definitive size of a well-ordered set, we can compare any two of them, X and Y, on the basis that a necessary and sufficient condition for ord X < ord Y is that X is similar to an initial segment of Y. We might establish a size comparison of two well ordered sets on the basis of whether one is equivalent to a subset of the other. In that case, we say that Y Dominates X. Domination forms a relationship among the power set of some set E, and is a reflexive, and transitive, but not anti-symmetric. That is, any set dominates itself, and if set A dominates set B which in turn dominates set C, then we can conclude that set A dominates set C. We cannot, however, conclude that if set A dominates set B and set B dominates set A, then A and B must be equal, since we’re only comparing sizes and not elements.

What we can conclude is that if set A dominates set B and set B dominates set A, then A and B must be equivalent, which is the Schroder-Bernstein Theorem. Together, this gives domination a set of properties sufficient to let it act similarly to a partial order. The Comparability Theorem for Sets says that for any two sets, one must dominate the other (if they mutually dominate, they are equivalent).

The ordinals aren’t quite dead and gone in our study of counting, though. By comparing sets with the ordinals, we can’t determine some facts about their countability. In the case that one set dominates another without being equivalent, we’ll call that Strict Domination. A set is only finite if it is strictly dominated by omega. If it is (not necessarily strictly) dominated by omega, then it is Countable, and if it is equivalent with omega, then it is Countably Infinite. If it dominates omega, then it’s just infinite.

As a consequence, if we know that there’s a function from omega that maps fully onto a set, then we can also conclude that the set being mapped onto is countable. In fact, if there is a function from any countably infinite set that fully maps onto a set, then the set being mapped onto is countable (and vice-versa - if a set is countable, then there exists a function that maps any countably infinite set onto it). The union of any finite or countably infinite set of countable set is countable. Since it can be expressed as such, the Cartesian product of countable sets is also countable.

While it might be nice to jump from this so-far all green list of the ways we can know that a set is countable to say that all sets are countable, this is not to be. Cantor’s Theorem stats that every set X is strictly dominated by its power set. Since we know power sets are always equivalent to 2^X (the set of all functions from X into 2, having 2^#(X) elements for finite sets), we know that any set X is also strictly dominated by the set 2^X. If we take X to be omega, then we know that at least the set of all functions from omega into 2 strictly dominates omega, and is Uncountable.

Now that we’ve set some boundaries on what we can and can’t count, and established that the ordinals aren’t really a good, firm foundation on which to base a universal system of counting, let’s try a new kind of number. Rather than jump right into it, it’s lay out our wishlist for how these new numbers will work. Cardinal Numbers, then, are to be constructed with a few useful properties. For every set X, there’s an associated cardinal number given by “card X”, and conversely, for every cardinal number a, there exists some set with card A = a. There will be an ordering for the cardinal numbers, and it will be the case that card X = card Y if and only if X and Y are equivalent, and card X < card Y if and only if Y strictly dominates X. Apparently, exactly how to define a set of numbers that has this wonderful list of properties is a matter of some debate, so we’ll just continue to talk about all the wonderful things we want them to be able to do for a while before we stake our flag on how we’ll achieve these properties.

The sum of a pair of cardinal numbers will be the cardinal number of a union of disjoint sets with the original cardinal numbers. (card A + card B = card (A U B)). If we have another two disjoint sets with the cardinal numbers as the first pair, then their union will be equivalent to the union of the first pair (and thus the unions will have the same cardinal numbers). (If card C = card A and card D = card B, then card C + card D = card A + card B, and C U D ~ A U B)Basically, cardinal addition behaves like you want cardinal addition to behave - assuming rough equivalence between the concept of cardinal addition and the concept of a union of disjoint sets.

The product of a pair of cardinal numbers will be the cardinal number of the Cartesian product of two sets with the original cardinal numbers. (card A * card B = card (A X B)) Alternatively, you can define the product as there result of adding a set to itself a specified number of times, where the second number is the specified number of times. Exponents may similarly be defined as carrying out a self-product a specified number of times. Else, we finally tie the notation of the set of all functions from one set to another together with the exponent notation: card A ^ card B = card A^B. And these exponents do work like the familiar exponents in the reals, unlike exponents in the ordinals.

Well, at least until (as always) things get screwy with infinite sets. The sum of a finite cardinal number with an infinite cardinal number is the infinite cardinal number. The sum of an infinite cardinal number with itself is itself. The product of an infinite cardinal number with itself is itself. The sum of two different infinite cardinal numbers is the larger of them. The rules for exponents follow naturally if you think of products as serialized sums and exponents as serialized products.

Right. That’s our wish list. Now how do we get it?

Similar to how we constructed the natural numbers, we’ll determine a special set of sets, each of which is equivalent to every set bearing a specific cardinal number. Each of those special sets will in turn be a cardinal number. The ordinals are a good place to start, since we know every set is equivalent to at least one (though possibly many) ordinals. The cardinal number of a set is the smallest ordinal equivalent to that set. We know that the set of all ordinals equivalent to a given set can be formed (indeed the supremum of the set must contain all its members), and since every set of ordinals is well ordered, it must have a least element. Thus, every set has a unique least ordinal with which it is equivalent.

The distinction between a set’s cardinal number and its ordinal equivalent is only necessary for infinite sets - the set of ordinals equivalent to a finite set is a singleton. For infinite sets, their infinite cardinal numbers are all limit numbers. Their order follows naturally from the order of the ordinals and satisfies all the items on our wish list. Since cardinal and ordinal arithmetic often give different results from operations notated and verbally referred to with the same word, it’s often necessary to specify whether one is intending to carry out, say, ordinal or cardinal arithmetic, even though they are the same set/number.

Cardinals are often associated with the hebrew letter aleph in the same way ordinals are associated with omega, and in this context omega is aleph_0 as the smallest transfinite ordinal. All the ordinals specified by omega and sums, products, and exponents of it are countably infinite. The smallest uncountable ordinal number is Omega and is the cardinal number aleph_1, the immediate successor of omega in the cardinals. There’s a lot of open questions about cardinal arithmetic involving these two. We know that 2^omega dominates Omega, but is this a strict dominance? There’s an unproven hypothesis that no, it isn’t, and 2^omega = Omega, but all that’s known is that it’s consistent with the axioms, not whether or not it’s true.

That’s waaaay above my pay grade, though.


Final Thoughts

The books ends extremely abruptly - no parting words about what Halmos hope we’ve learned from reading it. To a large extent, this is just a vocab primer - exercises are spread few and far between, and examples even thinner. Most of the ones that I’ve sprinkled throughout this post I came up with in order to better understand what I was trying to say.

So much of what’s in here is dedicated to laying down properties of infinite sets, and it’s frustrating how often things must be stated axiomatically to deal with these cases. It’s nice to see these axioms as kind of signposts clearly illuminating the things all our mathematics depend on that haven’t themselves been proven to be implicit in some more elemental rule, but I wish this book did a better job of delineating with examples where and how our intuitions break down around infinite sets. The one-to-one correspondence between the natural numbers and the positive integers is a great example that Halmos uses, but many of the edge cases are glossed over. Maybe its because certain principles only break down in really, really niche cases that it would take too long to delve into, though.

Overall - I came into this hoping to get a primer on set theory and think that I’ve got it, but from the way the book is constructed it’s hard to feel assured that I’ve learned enough in the subject from this one book alone. Halmos says this book is his answer to the question of how much set theory every mathematician needs to know, so I’m sure there’s more out there, but hopefully it won’t be too taxing to pick it up along the way as I encounter it.