We shall make a distinction between elements and sets, though in such a formalism it is usual to refer to terms and types as we did in Section 1.3. We shall also modify what Zermelo did very slightly, taking the cartesian product XxY as a primitive instead of the unordered pair {X,Y}, and the singleton instead of the empty set ( cf Examples 2.1.8).
Our system conforms very closely to the way mathematical constructions have actually been formulated in the twentieth century. The claim that set theory provides the foundations of mathematics is only justified via an encoding of this system, and not directly. It is, or at least it should be, surprising that it took 60 years to arrive at an axiomatisation which is, after all, pretty much as Zermelo did it in the first place.
The study of sheaf theory by the Grothendieck school unintentionally wrested foundations from the settheorists, though it was Bill Lawvere who saw that logic could be done in these new worlds (toposes). The formulation of languages for such reasoning was undertaken by Bénabou, Coste, Fourman, Joyal and Mitchell; although they called it ``set theory,'' they were in fact developing the type theory below and in Chapter V 1pt. For a detailed account of the modern system and its history, see [ LS86].
DEFINITION 2.2.1 Zermelo type theory consists of:
Singleton and product We shall give introduction and elimination rules for the types announced in this Definition, as we did in Sections 1.4 1.5 for the predicate calculus. There we were really only interested in the fact that various propositions could be proved, but now we want to say that certain terms do or do not denote the same value, so we must give reduction rules relating them (Section 1.2).
REMARK 2.2.2 In the colon notation (1.3.3), the product satisfies

The other eight rules make (xÁ) into a twoway correspondence:


Pairing is the generic binary operation, coding many arguments as one,

Comprehension and powerset As we have rejected the ``inductive conception'' of a set as a voluntary conspiracy of its elements, we are left with the problem of defining what a ``subset'' is. We shall do this in terms of predicates (Definition 1.4.1): a subset of X is by definition the same thing as a predicate with a variable of type X. By the same convention, we may treat a kary relation as either a predicate in k variables or a subset of the kfold cartesian product, cf Definition 1.3.1(a).
DEFINITION 2.2.3 The axiom of comprehension says that
if X is a set, and f[x] is a predicate on X, then {x:Xf[x]} is also a set.
The syntax for comprehension, like quantification, binds the variable x, so it is subject to aequivalence (Definition 1.1.6). Since it is therefore a contextchanging operation, box or sequent methods similar to those of Section 1.5 are needed to formalise it properly. We shall not in fact do this until Section 9.5, because it is preferable to introduce these methods for the functiontype instead, as we do in the next section.
The elements of this new set (or, as we prefer to say following Notation 1.3.3, the terms of the new type) are given by the twoway rule,

REMARK 2.2.4 Notice that the term a has both the ambient set X and the subset {x:Xf[x]} as its type, so two occurrences of the same term may have different types. In particular, by one of the elimination rules, any term of the subtype acquires the wider type. This defines an injective function, which is called the inclusion:

DEFINITION 2.2.5 For a predicate f[x] on X, we write {x:Xf[x]} not only for the new set defined above by comprehension, but also for an element of the powerset P(X); this is the introduction rule (PÁ). The elimination rule (PE) provides the binary membership relation,

The use of Î in (a) is a rather ingrained habit, to which we shall often revert since the colon is not altogether a satisfactory alternative: if it could be stripped of its settheoretic confusions, a symbol derived from the Italian è (is) would be entirely reasonable, whereas the colon is punctuation. Nor do we often bother to write the subscript. We shall, however, be careful to write "x:X.f for quantifiers, reserving Î for the guarded quantifiers, so
omitted eqnarray* environment
The ambiguous notation makes the brule for powerset look the same as the introduction and elimination rules for comprehension together:


We shall study the powerset in Sections 2.8, 5.2 and 9.5.
REMARK 2.2.6 The symbols £ and < for the reflexive and irreflexive orders on N were used for inclusion of subsets in the nineteenth century (and are still used for subgroups), but Ernst Schröder introduced = Ì and Ì . Many authors use Ì for strict containment, and Í for the nonstrict version, but strict inclusion is neither primitive (constructively) nor particularly useful: if U Ì V but V\not Ì U then the latter fact has to be proved  and should be stated  separately. Indeed Louis Couturat rejected the symbol = Ì in 1905 ``parce qu'il est complexe, tandis que la relation d'inclusion est simple.'' The analogy with arithmetic is bogus (Section 3.1): Ì is syntactic sugar for \vdash , Þ or \hookrightarrow . In these cases, rightly, no notation has been invented for the strict versions, or any resolution made into strict and equal. So we use Ì in the nonstrict sense.
A conflicting notation survives in philosophy as É for implication (and is also used by some modern authors in type theory to avoid overloading the arrow notations). It is actually older: Joseph Gergonne introduced C for contient and É for its converse in 1817, and these symbols were used by Peano and by Russell and Whitehead. (In fact Russell and Whitehead also used Ì for containment in our sense.)
REMARK 2.2.7 A common abuse of the subsetforming notation (which we have already committed) is to put a term in place of the variable:
{f(x,y)f[x,y]} means {z$x,y.f[ x,y]Ùz = f(x,y)}.
Which variables x, y, ..., are deemed to be bound in this notation? This is not made clear  informally, we write ``x Î X,y Î Y,f[x,y]'' to indicate what we mean. (In fact this is the same abuse of notation which we ridiculed in Examples 1.1.7.) As an important special case, we often want to apply a function ``in parallel'' to the elements of a subset, obtaining a (sub)set of results. In this case we write, as in Notation 1.3.4,

Another, perhaps unfamiliar, special case is when there is a constant on the left of the divider, or maybe no variables in the expression at all:

Parametric sets In algebra and the predicate calculus we used terms and formulae containing variables, but the types of the variables were fixed in advance. In Zermelo type theory, by contrast, the comprehension operation {xf} is not required to bind all of the free variables of the formula f. Those which remain free become the free variables of a type expression, for example,

As Factors[x] may be used as the type of another variable y in terms, formulae and other typeexpressions, we must modify Definition 1.5.4.
DEFINITION 2.2.8 For each typed variable y:Y in a context, the free variables [(x)\vec] of the typeexpression Y[[(x)\vec]] must occur earlier in the list than the variable y itself.
This and Remark 2.2.4 make Zermelo type theory very complicated.
Fortunately, Exercise 2.17 shows that it is possible to rewrite any typeexpression (using any of the constructors, including comprehension) as a subset of a type defined using 1, x, P and N alone. So comprehension may be postponed and used just once. Variables may be taken to range over types from this simpler class, where each formula or term is guarded (Remark 1.5.2) by the predicate defining the subset. Types and terms may once more be treated separately, and the exchange rule allows us to disregard the order of the variables. This system is studied in [LS86].
Comprehensionfree types are, however, somewhat artificial and do not allow us to speak directly of functions, real numbers or equivalence classes. But in practice the difficulty which we mentioned does not arise unless we make actual use of types containing nontrivial dependency. The notation for dependent types is not straightforward, but as they are important to the practical foundations of mathematics and interpreted in many semantic models we devote the final two chapters to them.
Historical comments The foregoing motivation is a fiction, in terms of history. Ernst Zermelo had been enticed from applied mathematics into foundations by Hilbert. He was interested in cardinal arithmetic and in particular the wellordering property (Proposition 6.7.13), which Cantor had assumed but was unable to prove. The effort of formalising his proof of this led Zermelo to a usable system of foundations, which brought set theory to what was arguably its perigee.
REMARK 2.2.9 Zermelo's axioms [Zer08b] were, in modern notation,
Real numbers and ``ideal'' algebraic numbers were both constructed as sets, so it was reasonable at the time to treat individuals and collections in the same way. Nowadays we are used to mutual recursive definitions of several distinct syntactic classes (such as commands and expressions in programs), and it is preferable to do this for terms, types and predicates.
The (now archaic ) German word Aussonderung means ``sorting out'' in the sense of discarding what is not wanted. No single English word seems to fit as well, but it is often translated as separation, comprehension being reserved for the unbounded way of forming sets which brought Gottlob Frege's system down. But as the word ``separation'' has wider and more natural uses in, for example, topology, it seems better to reemploy the term whose meaning would be immediately recognisable to nonlogicians.
The definit properties sparked a new controversy. Again people were forced to think, now about the formulation of the sentential calculus, and the outcome was first order model theory. (Zermelo's vagueness is now perhaps an advantage, since we may (remove Choice and) substitute intuitionistic , classical or some other calculus for his missing definition.)
Extensionality defines equality between individuals, but it also imposes an absolute notion of interchangeability for types, where subsequent experience has taught us that specified isomorphisms should play this role (Definition 1.2.12ff). This first, seemingly innocuous, axiom has some rather bizarre results, particularly for unions and intersections. If the grandchildren z belong to some known set w, then Zermelo's union {z$y.z Î y Î x} is simply the union in the lattice P(w), which is given by the existential quantification of a family of predicates in our notation. However, if x = {y_{1},y_{2}}, where y_{1} and y_{2} are sets given independently, they may suddenly be found to overlap. For the more natural disjoint union (Example 2.1.7), explicit coding must be used to distinguish the elements of the two sets. (The overlapping union is not definable using Definition 2.2.1, but see [Tay96a] and [Tay96b] .)
Even as an equality test, extensionality is highly recursive, although a further axiom (Foundation) is needed to justify the recursion. Dana Scott (1966) showed that it is essential to giving the axiom of replacement its power: without extensionality, ZF (see below) is provably consistent in Zermelo type theory ( cf Example 7.1.6(g) and Exercise 9.62).
More recently, this idea has arisen in process algebra as bisimulation , and can in fact be seen as a notion of coinduction [ Acz88,Hen88], cf Exercise 3.53, Example 6.3.3 and Remark 6.7.14.
It is easy to find a bijection between Q and N (we say that Q and Z are countable), but Cantor showed that there is none between R and N. He found his now well known diagonalisation argument in 1891, but had a much prettier proof using intervals rather than decimal expansions in 1873. This began the theory of cardinality. But his next discovery, that there is a bijection between R^{2} and R, showed at its conception in 1877 that the attempt to classify infinite sets up to (not necessarily continuous) bijection is powerless to define dimension and hence make the distinctions which are important in mathematics. (See the remarks after Proposition 9.6.4 for how cardinals ought to be interpreted.)
As Emile Borel stressed in 1908, the important observation about Q is that there is an effective coding, not anything to do with its ``size.''
In the same year as Zermelo, Bertrand Russell gave a theory of ``ramified'' types, later developed in [RW13]. Leon Chwistek (19235) and Frank Ramsey (1927) showed how to eliminate ramification, giving a theory (historically known as simple type theory) which is essentially equivalent to Zermelo's without the comprehension scheme. Versions of the systems of Frege, Russell and Whitehead, Zermelo, Ramsey and Quine are compared in [Hat82]; for a more philosophical survey, see [Bla33].
Thoralf Skolem, like Russell, set out to deal with the impredicativity questions which had been raised by Poincaré and Weyl (see the end of this chapter). He recognised the difference between the mathematical statement that P(N) is uncountable, and the metamathematical one that its terms are recursively enumerable. Skolem is best remembered for formalising first order logic and establishing its model theory. As this was, for a long time, the only available tool for the logical analysis of mathematical theories, set theory became the study of axiomatisations of a first order Î relation. Since the predicates belonged to the metalanguage, comprehension was turned into an infinite scheme of axioms, and, by the LöwenheimSkolem theorem (Remark 2.8.1), set theory has countable models which contain uncountable cardinalities [Sko22].
This paradox was repugnant to Zermelo, who correctly said that it revealed the limitations of first order logic, not of set theory. Indeed, turning comprehension into salami doesn't explain the typeconstructors that actually occur in mathematics (but see Definition 5.2.10). After his treatment by the subsequently dominant tradition in set theory, Zermelo would, I believe, readily forgive my putting ordered pairs into his system.
A trick was found for coding the ordered pairs which mathematics needs in terms of the unordered ones Zermelo provided (Exercise 2.19). Like the Sheffer stroke (page 1.8.4), this only obscures matters. (Those who like to argue from authority should note that [ Bou57], whilst indulging in obfuscatory reductionism of its own (Remark 1.6.7), treats pairing as primitive.) Set theory is sometimes called the ``machine code'' of mathematics, and this comment is supposed to justify the pair formula.
Since mathematicians have traditionally regarded their Platonist world as unchanging, it is understandable for them to think like this, but I find it shocking to hear it from informaticians. What use are the Z80 machine code programs I wrote around 1980? Modularity and portability are the programming values which we should be teaching.
Why Zermelo made the choice he did between ordered and unordered pairs is perhaps worthy of historical study, though it seems unlikely that it actually occurred to him that he was making any significant decision at all. Irredundancy was considered important at the time, probably because Hilbert had only recently found the final settlement of the axiomatics behind Euclid's parallel postulate. In fact the cartesian product had yet to gain the importance which it has now; for example [Die88] remarks that the first mention of the product of two abstract topological spaces was also in 1908.
In 1922 Abraham Fraenkel and Thoralf Skolem added another typeforming operation: the axiomscheme of replacement. Owing to its obscure formulation, use of Replacement is widely overlooked, but it is incredibly powerful: Richard Montague (1966) showed that it can prove the consistency, not only of Zermelo set theory itself, but of the extension of this by any single theorem of ZF. We shall try to see what Replacement means in the final section.
The last of the ZermeloFraenkel axioms is foundation, which says that we may use induction on the membership relation (Definition 6.7.5). On the other hand, it also says that everything in mathematics belongs to the settheoretic hierarchy, contradicting the intuition that, for example, the group A_{5} has elements but not elements of elements. (This group has familiar representations as the even permutations of five objects and the symmetries of a dodecahedron, as well as of two different projective spaces.) The axiom of foundation makes the hierarchy rigid, whereas objects of mathematical interest typically have lots of automorphisms. Stone's representation of Boolean algebras as lattices of clopen subsets also gives an example where there is no preferred view in which either ``points'' belong to ``neighbourhoods'' or vice versa .
After Zermelo, more axioms were added to set theory in an endeavour to formulate the strongest system that would remain consistent, in which everything anyone could possibly want would be provable, and the model would be unique. We mention some of these in Sections 4.1 and 9.6.
We shall take the opposite strategy: by restricting the hypotheses of an argument to just what is needed, we are better able to understand how it works, and we are led to generalisations and novel applications. When the box or sequent rules for the correct management of the free variables of terms and comprehension types are taken into account, Zermelo type theory becomes far too complicated to study in one go, and is arguably too powerful for actual mathematics and programming.