Practical Foundations of Mathematics

Paul Taylor

2.2  Sets (Zermelo Type Theory)

These methods of construction were first set out as a basis of set theory by Ernst Zermelo in 1908. The subsequent work sought to formalise them in terms of a notion of membership in which any entity in the universe may serve either as an element or as a set, and where it is legitimate to ask of any two entities whether one bears this relation to the other.

We shall make a distinction between elements and sets, though in such a formalism it is usual to refer to terms and types as we did in Section 1.3. We shall also modify what Zermelo did very slightly, taking the cartesian product X × Y as a primitive instead of the unordered pair {X,Y}, and the singleton instead of the empty set (cf. Examples 2.1.8).

Our system conforms very closely to the way mathematical constructions have actually been formulated in the twentieth century. The claim that set theory provides the foundations of mathematics is only justified via an encoding of this system, and not directly. It is, or at least it should be, surprising that it took 60 years to arrive at an axiomatisation which is, after all, pretty much as Zermelo did it in the first place.

The study of sheaf theory by the Grothendieck school unintentionally wrested foundations from the set-theorists, though it was Bill Lawvere who saw that logic could be done in these new worlds (toposes). The formulation of languages for such reasoning was undertaken by Bénabou, Coste, Fourman, Joyal and Mitchell; although they called it ``set theory,'' they were in fact developing the type theory below and in Chapter V. For a detailed account of the modern system and its history, see [LS86].

DEFINITION 2.2.1 Zermelo type theory consists of:

(a)
the singleton, 1, which has just one element, called *;

(b)
the cartesian product, X × Y, whose elements are ordered pairs ⟨a,b⟩, where a ∈ X and b ∈ Y, whenever X and Y are sets; Remark 2.2.2 gives the full definition of the product and associated pairing and projection operations;

(c)
the set U, whenever X is a set and U ⊂ X a subset of it;

(d)
the powerset, P(X), whose elements are the subsets of a set X;

(e)
and the set N of natural numbers (Section  2.7).

Singleton and product   We shall give introduction and elimination rules for the types announced in this Definition, as we did in Sections 1.4-1.5 for the predicate calculus. There we were really only interested in the fact that various propositions could be proved, but now we want to say that certain terms do or do not denote the same value, so we must give reduction rules relating them (Section 1.2).

REMARK 2.2.2 In the colon notation (1.3.3), the product satisfies

[array environment omitted]
The rules have been named in the same way as in Remark 1.4.3: the introduction rule (×I) creates a term of product type (the pair), which is used by the elimination rules, and the β- and η-rules cut out detours. The equality rules say that substitution interacts with π₀, π₁ and ⟨-,-⟩ in the same way that it does with operation-symbols.

The other eight rules make (×I) into a two-way correspondence:

[proof tree omitted]
There are two elimination rules and two β-rules: in the ternary case there would be three, so in the nullary case (the singleton) there is no elimination rule at all. (×I) has two premises, so (1I) has none.

[proof tree omitted]               [proof tree omitted]
Compare these with the rules for descriptions (Definition  1.2.10ff) and conjunction (Definition  1.4.2ff).
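For readers who find it helpful to see these rules in programming terms, the following Haskell sketch (an illustration only, not part of the formal development; the names pair, proj0 and proj1 are chosen here to echo ⟨-,-⟩, π₀ and π₁) shows the introduction and elimination operations, with the β- and η-rules appearing as program equivalences.

    -- Introduction (×I): pairing two terms into a term of product type.
    pair :: a -> b -> (a, b)
    pair x y = (x, y)

    -- Elimination: the two projections (echoing π₀ and π₁).
    proj0 :: (a, b) -> a
    proj0 (x, _) = x

    proj1 :: (a, b) -> b
    proj1 (_, y) = y

    -- β-rules: eliminating an introduction returns the component,
    --   proj0 (pair x y) == x  and  proj1 (pair x y) == y.
    -- η-rule: re-pairing the projections gives back the original pair,
    --   pair (proj0 q) (proj1 q) == q.
    betaEtaCheck :: Bool
    betaEtaCheck = proj0 (pair (1 :: Int) 'a') == 1
                && proj1 (pair (1 :: Int) 'a') == 'a'
                && pair (proj0 q) (proj1 q) == q
      where q = (True, "eta") :: (Bool, String)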

Pairing is the generic binary operation, coding many arguments as one,

r(a,b) = r̄⟨a,b⟩,
so the product type is used in the semantics of algebra (Section  4.6), but we choose not to use it for the syntax. Pairing is sometimes claimed to make variables redundant, but it does so by replacing named variables with numbered projection functions. In a product of many factors (as for record types in programming languages) we want to name, not number, the fields. [Pit95] treats the product carefully in its own right.
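In functional programming terms, the equation r(a,b) = r̄⟨a,b⟩ is the familiar passage between a curried operation and its uncurried form. A minimal Haskell illustration (not part of the text; r and rBar are made-up names):

    -- A binary operation r and the corresponding operation rBar on pairs.
    r :: Int -> Int -> Int
    r a b = 10 * a + b

    rBar :: (Int, Int) -> Int
    rBar = uncurry r            -- rBar (a, b) = r a b

    -- The two agree on all arguments: here r 3 4 == rBar (3, 4) == 34.
    agree :: Bool
    agree = r 3 4 == rBar (3, 4)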

Comprehension and powerset   As we have rejected the ``inductive conception'' of a set as a voluntary conspiracy of its elements, we are left with the problem of defining what a ``subset'' is. We shall do this in terms of predicates (Definition 1.4.1): a subset of X is by definition the same thing as a predicate with a variable of type X. By the same convention, we may treat a k-ary relation as either a predicate in k variables or a subset of the k-fold cartesian product, cf. Definition 1.3.1(a).

DEFINITION 2.2.3 The axiom of comprehension says that

if X is a set, and φ[x] is a predicate on X, then {x:X | φ[x]} is also a set.

The syntax for comprehension, like quantification, binds the variable x, so it is subject to α-equivalence (Definition 1.1.6). Since it is therefore a context-changing operation, box or sequent methods similar to those of Section 1.5 are needed to formalise it properly. We shall not in fact do this until Section 9.5, because it is preferable to introduce these methods for the function-type instead, as we do in the next section.

The elements of this new set (or, as we prefer to say following Notation 1.3.3, the terms of the new type) are given by the two-way rule,

[proof tree omitted]
which we may read downwards as an introduction rule ({}I) and upwards as two elimination rules, rather similar to those for the product type. The β- and η-rules say that the term a stays the same; this is because φ[a], unlike the type Y in the product, has no associated term.

REMARK 2.2.4 Notice that the term a has both the ambient set X and the subset {x:X | φ[x]} as its type, so two occurrences of the same term may have different types. In particular, by one of the elimination rules, any term of the subtype acquires the wider type. This defines an injective function, which is called the inclusion:

{x:X | φ[x]} ↪ X.
In category theory we define subsets as injective functions (Section  5.2).
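As a rough programming analogue of comprehension and inclusion (an illustration only; the names Subset, intro and include are invented for this sketch, and a runtime check stands in for the proof obligation φ[a]): a term of the subset type carries the same underlying value, and the inclusion simply forgets that the predicate has been checked.

    -- An illustrative model of {x:X | phi[x]} as an abstract wrapper type.
    newtype Subset x = Subset x   -- keep the constructor private in practice

    -- ({}I): a term a:X passes into the subset type only if phi[a] holds
    -- (here phi is a Boolean-valued predicate and the check is dynamic).
    intro :: (x -> Bool) -> x -> Maybe (Subset x)
    intro phi a
      | phi a     = Just (Subset a)
      | otherwise = Nothing

    -- The inclusion {x:X | phi[x]} ↪ X: the term itself stays the same,
    -- as the β- and η-rules above require.
    include :: Subset x -> x
    include (Subset a) = a

    -- Example: the even integers as a subset of Int.
    evens :: Int -> Maybe (Subset Int)
    evens = intro even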

DEFINITION 2.2.5 For a predicate φ[x] on X, we write {x:X | φ[x]} not only for the new set defined above by comprehension, but also for an element of the powerset P(X); this is the introduction rule (PI). The elimination rule (PE) provides the binary membership relation,

(∈_X) : X ⇌ P(X)        for each type X.
Our ∈_X is typed as shown, whereas in set theory there is a single ∈ relation for the whole class of sets. As we mentioned in Notation 1.3.3 and Exercise 1.12, there is a distinction between

(a)
a:X, the statement in the meta-language that a term a has type X,

(b)
and a ∈_X U, which says that the term a of type X satisfies the predicate φ[x] (in the object-language) defining U = {x:X | φ[x]}.

The use of ∈ in (a) is a rather ingrained habit, to which we shall often revert since the colon is not altogether a satisfactory alternative: if it could be stripped of its set-theoretic confusions, a symbol derived from the Italian è (is) would be entirely reasonable, whereas the colon is punctuation. Nor do we often bother to write the subscript. We shall, however, be careful to write ∀x:X.φ for quantifiers, reserving ∈ for the guarded quantifiers, so

∀x ∈ U. φ[x]   means   ∀x:X. x ∈ U ⇒ φ[x],
∃x ∈ U. φ[x]   means   ∃x:X. x ∈ U ∧ φ[x].

The ambiguous notation makes the β-rule for powerset look the same as the introduction and elimination rules for comprehension together:

φ[a] ⇔ a ∈ {x:X | φ[x]}.        (Pβ)
Regarding them as elements rather than types, we have to say when two subsets U,V:P(X) are equal:

U = V     if     ∀x:X. x ∈ U ⇔ x ∈ V,
ie the predicates defining these subsets are inter-provable. Like ⇔ in logic, but unlike equality in arithmetic, we have to give two arguments to show that subsets are equal, one in each direction: U ⊂ V and U ⊃ V (cf. Exercise 2.18). Finally, the (Pη)-rule is U = {x:X | x ∈ U}.
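To illustrate these rules computationally (a sketch, not the formal system): if a subset U : P(X) is represented by its characteristic predicate, then membership is application, inclusion is pointwise implication, and equality is the conjunction of the two inclusions. Over an infinite X the universal quantifier cannot be decided by a program, so the Haskell sketch below quantifies only over an explicitly given finite list of test elements; all the names are invented for the illustration.

    -- Subsets of x represented by characteristic predicates.
    type Powerset x = x -> Bool

    -- (Pβ): a ∈ {x | phi[x]} holds exactly when phi[a] does.
    member :: x -> Powerset x -> Bool
    member a u = u a

    -- U ⊂ V and U = V, checked only over a finite list xs of elements of x.
    subsetOn :: [x] -> Powerset x -> Powerset x -> Bool
    subsetOn xs u v = all (\a -> not (u a) || v a) xs

    equalOn :: [x] -> Powerset x -> Powerset x -> Bool
    equalOn xs u v = subsetOn xs u v && subsetOn xs v u   -- both directions

    -- Example: two definitions of "even" give equal subsets on 0..100.
    sameSubset :: Bool
    sameSubset = equalOn [0 .. 100 :: Integer] even (\n -> n `mod` 2 == 0)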

We shall study the powerset in Sections 2.8, 5.2 and 9.5.

Notation  

REMARK 2.2.6 The symbols ≤ and < for the reflexive and irreflexive orders on N were used for inclusion of subsets in the nineteenth century (and are still used for subgroups), but Ernst Schröder introduced ⊆ and ⊂. Many authors use ⊂ for strict containment, and ⊆ for the non-strict version, but strict inclusion is neither primitive (constructively) nor particularly useful: if U ⊂ V but V ⊄ U then the latter fact has to be proved - and should be stated - separately. Indeed Louis Couturat rejected the symbol ⊆ in 1905 ``parce qu'il est complexe, tandis que la relation d'inclusion est simple'' (because it is complex, whereas the inclusion relation is simple). The analogy with arithmetic is bogus (Section 3.1): ⊂ is syntactic sugar for ⊢, ⇒ or ↪. In these cases, rightly, no notation has been invented for the strict versions, or any resolution made into strict and equal. So we use ⊂ in the non-strict sense.

A conflicting notation survives in philosophy as ⊃ for implication (and is also used by some modern authors in type theory to avoid overloading the arrow notations). It is actually older: Joseph Gergonne introduced C for contient and ⊃ for its converse in 1817, and these symbols were used by Peano and by Russell and Whitehead. (In fact Russell and Whitehead also used ⊂ for containment in our sense.)

REMARK 2.2.7 A common abuse of the subset-forming notation (which we have already committed) is to put a term in place of the variable:

{f(x,y) | φ[x,y]}   means   {z | ∃x,y. φ[x,y] ∧ z = f(x,y)}.

Which variables x, y, ..., are deemed to be bound in this notation? This is not made clear - informally, we write ``x ∈ X, y ∈ Y, φ[x,y]'' to indicate what we mean. (In fact this is the same abuse of notation which we ridiculed in Examples 1.1.7.) As an important special case, we often want to apply a function ``in parallel'' to the elements of a subset, obtaining a (sub)set of results. In this case we write, as in Notation 1.3.4,

f_!(U) = {f(x) | x ∈ U} = {y | ∃x. x ∈ U ∧ y = f(x)}
for the image (see also Remark 3.8.13(b)); in particular f_!({x}) = {f(x)}. Notice that the extended use of the notation for comprehension, and in particular the image, disguise an existential quantifier, which we shall discuss in Section 2.4.
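For finite sets the image operation is available directly in programming libraries; for instance, in Haskell's containers package, Set.map plays the role of f_!. The sketch below is an illustration only, not part of the text.

    import qualified Data.Set as Set

    -- f_!(U) = { f x | x ∈ U }: the image of a finite set under a function.
    image :: Ord b => (a -> b) -> Set.Set a -> Set.Set b
    image = Set.map

    -- image (^2) {1,2,3,4} = {1,4,9,16}, and the image of a singleton
    -- is a singleton, matching f_!({x}) = {f(x)}.
    examples :: (Set.Set Int, Set.Set Int)
    examples = ( image (^ 2) (Set.fromList [1, 2, 3, 4])
               , image (^ 2) (Set.singleton 5) )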

Another, perhaps unfamiliar, special case is when there is a constant on the left of the divider, or maybe no variables in the expression at all:

{* | φ}.
(Classically, we would say that this is {*} if φ is true and ∅ otherwise.) In this way, propositions correspond to subsets of the singleton, and to elements of P(1).
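A tiny classical illustration of this last correspondence (not in the text; the names are invented): a subset of the one-element type is determined by whether its single element belongs to it, so elements of P(1) behave as truth values.

    -- Subsets of the unit type (), ie elements of P(1), as predicates.
    type SubsetOfOne = () -> Bool

    fromProp :: Bool -> SubsetOfOne     -- { * | phi }
    fromProp phi _ = phi

    toProp :: SubsetOfOne -> Bool       -- ask whether * is a member
    toProp u = u ()

    -- Classically, toProp (fromProp p) == p, so P(1) corresponds to Bool.
    roundTrip :: Bool
    roundTrip = toProp (fromProp True) && not (toProp (fromProp False))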

Parametric sets   In algebra and the predicate calculus we used terms and formulae containing variables, but the types of the variables were fixed in advance. In Zermelo type theory, by contrast, the comprehension operation {x | φ} is not required to bind all of the free variables of the formula φ. Those which remain free become the free variables of a type-expression, for example,

Factors[x] = {y | ∃z. x = y·z}.
The ``arguments'' of a dependent type Y[x] will be enclosed in square brackets, as we have already done for predicates. This is an informal notation like f(x) for functions (Remark  1.1.2). Of course each argument x:X has its own type, but x is not itself a type (although in Section  2.8 we shall briefly discuss an extension in which there are type variables and quantification over them). These phenomena are called polymorphism, because the same type-expression may be instantiated in many ways.
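As a concrete reading of the example (again only an illustration, with invented names): for each particular natural number x the set Factors[x] can be computed, so a function that returns a set for each argument gives a rough programming analogue of a type-expression with a free variable.

    import qualified Data.Set as Set

    -- Factors[x] = { y | ∃z. x = y·z }, computed for a concrete x ≥ 1.
    -- A different set is obtained for each value of the parameter x.
    factors :: Integer -> Set.Set Integer
    factors x = Set.fromList [ y | y <- [1 .. x], x `mod` y == 0 ]

    -- factors 12 = {1,2,3,4,6,12};  factors 7 = {1,7}.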

As Factors[x] may be used as the type of another variable y in terms, formulae and other type-expressions, we must modify Definition  1.5.4.

DEFINITION 2.2.8 For each typed variable y:Y in a context, the free variables x⃗ of the type-expression Y[x⃗] must occur earlier in the list than the variable y itself.

This and Remark 2.2.4 make Zermelo type theory very complicated.

Fortunately, Exercise 2.17 shows that it is possible to rewrite any type-expression (using any of the constructors, including comprehension) as a subset of a type defined using 1, ×, P and N alone. So comprehension may be postponed and used just once. Variables may be taken to range over types from this simpler class, where each formula or term is guarded (Remark 1.5.2) by the predicate defining the subset. Types and terms may once more be treated separately, and the exchange rule allows us to disregard the order of the variables. This system is studied in [LS86].

Comprehension-free types are, however, somewhat artificial and do not allow us to speak directly of functions, real numbers or equivalence classes. But in practice the difficulty which we mentioned does not arise unless we make actual use of types containing non-trivial dependency. The notation for dependent types is not straightforward, but as they are important to the practical foundations of mathematics and interpreted in many semantic models we devote the final two chapters to them.

Historical comments   The foregoing motivation is a fiction, in terms of history. Ernst Zermelo had been enticed from applied mathematics into foundations by Hilbert. He was interested in cardinal arithmetic and in particular the well-ordering property (Proposition 6.7.13), which Cantor had assumed but was unable to prove. The effort of formalising his proof of this led Zermelo to a usable system of foundations, which brought set theory to what was arguably its perigee.

REMARK 2.2.9 Zermelo's axioms [Zer08b] were, in modern notation,

(a)
Bestimmtheit (literally definiteness, but usually known in English as extensionality): if ∀z. z ∈ x ⇔ z ∈ y then x = y;

(b)
Elementarmengen (basic sets): ∅, {x}, {x,y} are sets if x and y are;

(c)
Aussonderung (comprehension) for definit properties (see below);

(d)
Potenzmenge (power set);

(e)
Vereinigung (union): {z | ∃y. z ∈ y ∈ x} is a set if x is;

(f)
Auswahl (Choice): see Exercise 2.15; and

(g)
Unendlichkeit (infinity, N is a set): see Exercise  2.47.

Real numbers and ``ideal'' algebraic numbers were both constructed as sets, so it was reasonable at the time to treat individuals and collections in the same way. Nowadays we are used to mutual recursive definitions of several distinct syntactic classes (such as commands and expressions in programs), and it is preferable to do this for terms, types and predicates.

The (now archaic) German word Aussonderung means ``sorting out'' in the sense of discarding what is not wanted. No single English word seems to fit as well, but it is often translated as separation, comprehension being reserved for the unbounded way of forming sets which brought Gottlob Frege's system down. But as the word ``separation'' has wider and more natural uses in, for example, topology, it seems better to re-employ the term whose meaning would be immediately recognisable to non-logicians.

The definit properties sparked a new controversy. Again people were forced to think, now about the formulation of the sentential calculus, and the outcome was first order model theory. (Zermelo's vagueness is now perhaps an advantage, since we may (remove Choice and) substitute intuitionistic, classical or some other calculus for his missing definition.)

Extensionality defines equality between individuals, but it also imposes an absolute notion of interchangeability for types, where subsequent experience has taught us that specified isomorphisms should play this role (Definition 1.2.12ff). This first, seemingly innocuous, axiom has some rather bizarre results, particularly for unions and intersections. If the grandchildren z belong to some known set w, then Zermelo's union {z | ∃y. z ∈ y ∈ x} is simply the union in the lattice P(w), which is given by the existential quantification of a family of predicates in our notation. However, if x = {y₁,y₂}, where y₁ and y₂ are sets given independently, they may suddenly be found to overlap. For the more natural disjoint union (Example 2.1.7), explicit coding must be used to distinguish the elements of the two sets. (The overlapping union is not definable using Definition 2.2.1, but see [Tay96a] and [Tay96b].)

Even as an equality test, extensionality is highly recursive, although a further axiom (Foundation) is needed to justify the recursion. Dana Scott (1966) showed that it is essential to giving the axiom of replacement its power: without extensionality, ZF (see below) is provably consistent in Zermelo type theory (cf. Example 7.1.6(g) and Exercise 9.62).

More recently, this idea has arisen in process algebra as bisimulation, and can in fact be seen as a notion of co-induction [Acz88, Hen88], cf. Exercise 3.53, Example 6.3.3 and Remark 6.7.14.

It is easy to find a bijection between Q and N (we say that Q and Z are countable), but Cantor showed that there is none between R and N. He found his now well known diagonalisation argument in 1891, but had a much prettier proof using intervals rather than decimal expansions in 1873. This began the theory of cardinality. But his next discovery, that there is a bijection between R² and R, showed at its conception in 1877 that the attempt to classify infinite sets up to (not necessarily continuous) bijection is powerless to define dimension and hence make the distinctions which are important in mathematics. (See the remarks after Proposition 9.6.4 for how cardinals ought to be interpreted.)

As Emile Borel stressed in 1908, the important observation about Q is that there is an effective coding, not anything to do with its ``size.''

In the same year as Zermelo, Bertrand Russell gave a theory of ``ramified'' types, later developed in [RW13]. Leon Chwistek (1923-5) and Frank Ramsey (1927) showed how to eliminate ramification, giving a theory (historically known as simple type theory) which is essentially equivalent to Zermelo's without the comprehension scheme. Versions of the systems of Frege, Russell and Whitehead, Zermelo, Ramsey and Quine are compared in [Hat82]; for a more philosophical survey, see [Bla33].

Thoralf Skolem, like Russell, set out to deal with the impredicativity questions which had been raised by Poincaré and Weyl (see the end of this chapter). He recognised the difference between the mathematical statement that P(N) is uncountable, and the metamathematical one that its terms are recursively enumerable. Skolem is best remembered for formalising first order logic and establishing its model theory. As this was, for a long time, the only available tool for the logical analysis of mathematical theories, set theory became the study of axiomatisations of a first order ∈ relation. Since the predicates belonged to the meta-language, comprehension was turned into an infinite scheme of axioms, and, by the Löwenheim-Skolem theorem (Remark 2.8.1), set theory has countable models which contain uncountable cardinalities [Sko22].

This paradox was repugnant to Zermelo, who correctly said that it revealed the limitations of first order logic, not of set theory. Indeed, turning comprehension into salami doesn't explain the type-constructors that actually occur in mathematics (but see Definition  5.2.10). After his treatment by the subsequently dominant tradition in set theory, Zermelo would, I believe, readily forgive my putting ordered pairs into his system.

A trick was found for coding the ordered pairs which mathematics needs in terms of the unordered ones Zermelo provided (Exercise 2.19). Like the Sheffer stroke (1.8.4), this only obscures matters. (Those who like to argue from authority should note that [Bou57], whilst indulging in obfuscatory reductionism of its own (Remark 1.6.7), treats pairing as primitive.) Set theory is sometimes called the ``machine code'' of mathematics, and this comment is supposed to justify the pair formula.

Since mathematicians have traditionally regarded their Platonist world as unchanging, it is understandable for them to think like this, but I find it shocking to hear it from informaticians. What use are the Z80 machine code programs I wrote around 1980? Modularity and portability are the programming values which we should be teaching.

Why Zermelo made the choice he did between ordered and unordered pairs is perhaps worthy of historical study, though it seems unlikely that it actually occurred to him that he was making any significant decision at all. Irredundancy was considered important at the time, probably because Hilbert had only recently found the final settlement of the axiomatics behind Euclid's parallel postulate. In fact the cartesian product had yet to gain the importance which it has now; for example [Die88] remarks that the first mention of the product of two abstract topological spaces was also in 1908.

In 1922 Abraham Fraenkel and Thoralf Skolem added another type-forming operation: the axiom-scheme of replacement. Owing to its obscure formulation, use of Replacement is widely overlooked, but it is incredibly powerful: Richard Montague (1966) showed that it can prove the consistency, not only of Zermelo set theory itself, but of the extension of this by any single theorem of ZF. We shall try to see what Replacement means in the final section.

The last of the Zermelo-Fraenkel axioms is foundation, which says that we may use induction on the membership relation (Definition 6.7.5). On the other hand, it also says that everything in mathematics belongs to the set-theoretic hierarchy, contradicting the intuition that, for example, the group A₅ has elements but not elements of elements. (This group has familiar representations as the even permutations of five objects and the symmetries of a dodecahedron, as well as of two different projective spaces.) The axiom of foundation makes the hierarchy rigid, whereas objects of mathematical interest typically have lots of automorphisms. Stone's representation of Boolean algebras as lattices of clopen subsets also gives an example where there is no preferred view in which either ``points'' belong to ``neighbourhoods'' or vice versa.

After Zermelo, more axioms were added to set theory in an endeavour to formulate the strongest system that would remain consistent, in which everything anyone could possibly want would be provable, and the model would be unique. We mention some of these in Sections 4.1 and  9.6.

We shall take the opposite strategy: by restricting the hypotheses of an argument to just what is needed, we are better able to understand how it works, and we are led to generalisations and novel applications. When the box or sequent rules for the correct management of the free variables of terms and comprehension types are taken into account, Zermelo type theory becomes far too complicated to study in one go, and is arguably too powerful for actual mathematics and programming.