Practical Foundations of Mathematics

Paul Taylor

1.1  Substitution

Logic as the foundations of mathematics turns mathematics on itself, but there is content even in the study of form.

We begin by considering how expressions must transform if they are to have and retain meaning, without being too specific about that meaning. Think of this as learning how to use a balance, ruler and stop-watch in preparation for studying physics. To those whose background is in pure mathematics, the explicit manipulation of syntax and the distinction between object-language and meta-language may be unfamiliar, but they are the stuff of logic and informatics. We shall not be too formal: formal rigour can only be tested dynamically, and those whose business it is to implement formal languages on computers are already well aware of many of the actual difficulties from a practical point of view.

The mathematical expressions which we write are graphically of many forms, not necessarily one-dimensional, and are often subject to complex rules of well-formedness. It suffices at first to think of strings composed of constants (0, 1, ..., π, etc.) and operation-symbols (+, ×, √, sin).

Given an expression of this kind, we can evaluate it, but that's the end of the story. The main preoccupation of algebra before the nineteenth century was the solution of equations.

Variables   Algebraic techniques typically involve the manipulation of expressions in which the unknown is represented by a place-holder. This may be a letter, or (as often in category theory) an anonymous symbol such as (−) or (=). The transformations must be such that they would remain valid if the place were filled by any particular quantity. They must not depend on its being a variable-name rather than a quantity, so we cannot add the variables b and e to obtain g (``2+5 = 7'') because this is unlikely to continue to make sense when other values are put for the variables. Variables were used in this way by Aristotle.

The symbols are called parameters, constants, etc depending on ``how'' variable they are taken to be in the context. Usage might describe x as a variable, p and q as parameters and 3 as a constant, but they may change their status during the course of an argument. This is reflected in the history of the cubic equation (Example 4.3.4): Gerolamo Cardano, like the author of the Rhind papyrus, demonstrated his techniques by solving specific but typical numerical examples such as x³+6x = 20, but François Viète was the first to use letters for the coefficients. (The use of x, y, z for variables and a, b, c for coefficients is due to René Descartes.)

After we have solved x³ = 3px+2q for x in terms of p and q, we may consider how the form of the solution depends on them, in particular on whether p³ ≶ q². In a more general discussion, what were constants before are subject to investigation and become variables. This changing demarcation between what is considered constant or variable will be reflected in the binding of variables by quantifiers, and in the boxes used in Section 1.5.

The places p and q also differ in that they stand for arbitrary (∀) quantities, whilst x is intended to be filled by individual (∃), though as yet unknown, ones, namely the solutions of the equation. This difference is represented formally by the two quantifiers: ∀p,q. ∃x. x³ = 3px+2q.

Structural recursion  

REMARK 1.1.1 Although we write algebraic expressions and programs as linear strings of symbols, it is better to think of them as trees, in which the connectives are nodes and the constants and variables are leaves. Each operation-symbol corresponds to a certain species of node, with a sub-tree for each argument; we speak of nullary, unary, binary, ternary and in general k-ary or multiary operations if they have 0, 1, 2, 3 or k arguments. For example Cardano's first cubic equation might be written

omitted diagram environment
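To make the tree picture concrete, here is a minimal Python sketch, under an encoding of our own choosing (not the book's notation): a variable is a string, a constant is a number, and a k-ary operation-node is a tuple whose head is the operation-symbol. Cardano's equation x³+6x = 20 then becomes a nested tuple, and the root and arity of any term can be read off directly:

```python
# A term is a leaf (a string for a variable, a number for a constant)
# or a tuple (op, arg1, ..., argk) for a k-ary operation-symbol.
# This encoding is our own illustration, not the book's notation.
cubic = ('=', ('+', ('^', 'x', 3), ('*', 6, 'x')), 20)

def root(t):
    """The outermost operation-symbol of an expression (its 'root')."""
    return t[0] if isinstance(t, tuple) else t

def arity(t):
    """How many arguments the root takes: 0 for leaves."""
    return len(t) - 1 if isinstance(t, tuple) else 0

print(root(cubic), arity(cubic))   # the root of the whole equation is '=', a binary symbol
```

Note that the root of the linear string ``x³+6x = 20'' is the equality symbol, even though it is written in the middle: the tree makes the outermost symbol immediately visible.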

Transformation of expressions becomes surgery on trees. The analysis of a tree begins with the top node, the ``outermost'' operation-symbol, which is (confusingly) called the root. It is important to be able to recognise the root of an expression written in the familiar linear fashion; if it is a unary symbol such as a quantifier (∀, ∃) it is usually at the front (a prefix), but binary ones (+, ×, ∧, ∨, ⇒) are traditionally written between their arguments (infix). Like unwrapping a parcel, you must remove the outermost operation-symbol first.

Expressions are typically stored in a computer by representing each node as a record, stating which operation-symbol, constant or variable it is, and with pointers to the records which represent the arguments or branches. Concretely, the pointer is the machine address of the record. The whole expression is denoted by the address of its root record, so the outermost operation-symbol is accessible immediately, ie without intermediate processing.

The leaves of an expression-tree are its variables. The simplest operation on trees is substitution of a term for a variable. A copy of the expression-tree of the term is made for every occurrence of the variable, and attached to the tree in its place. If there were many occurrences, the term and its own variables would be proliferated, but if there were none the latter would disappear. We call this a direct transformation.
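This direct transformation can be sketched in a few lines of Python (using an illustrative encoding of our own: strings for variables, numbers for constants, tuples headed by the operation-symbol for nodes); substitution simply attaches a copy of the term's tree at every leaf occurrence of the variable:

```python
def subst(t, x, a):
    """Attach a copy of term a at every leaf occurrence of variable x in t.
    Terms: strings are variables, numbers constants, tuples (op, args...).
    No binders yet -- this is the 'direct transformation' on plain trees."""
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(s, x, a) for s in t[1:])
    return a if t == x else t

# Substituting 3 for x and then 5 for y in x+y yields 3+5:
print(subst(subst(('+', 'x', 'y'), 'x', 3), 'y', 5))   # ('+', 3, 5)
```

If x occurs several times, the tree for a is duplicated; if it occurs not at all, a (and its variables) simply disappears, exactly as the text says.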

The application of a single operation-symbol such as + to its family of arguments is a special case of substitution: 3+5 is obtained by substituting 3 for x and 5 for y in x+y. Conversely, it can be useful to think of any expression-tree (with variables scattered amongst its leaves) as a generalised operation-symbol; since it replicates the algebraic language by adding in definable operations, the collection of all trees, substitutable for each other's leaves, is called the clone.

REMARK 1.1.2 An expression may also be regarded as a function of certain variables, written f(x,y,...). Beware that ``(x,y,...)'' signifies neither that x, y, etc actually occur in the expression (if they don't then f is a constant function), nor that these exhaust the variables which do occur (others may be parameters). We shall therefore describe this and f(a,b,...), the result of substituting certain terms for the variables, as the informal notation for functions.

There are also indirect transformations which re-structure the whole tree from top to bottom. For each kind of node, we have to specify the resulting node and how its sub-trees are formed from the original ones and their transformed versions. If you have a mathematical background, but haven't previously thought about syntactic processes, then symbolic differentiation will perhaps be the most familiar example. The most complex case is multiplication.

omitted diagram environment

The tree has grown both horizontally and vertically, but notice that all occurrences of d/dx are now closer to the leaves, and this observation is just what we need to see that the recursion terminates: the motto is divide and conquer. We shall study recursion in Chapters II and VI.

These symbols u and v are meant to stand for sub-trees such as x³+6x or y, rather than values: we have to use unknowns representing trees in order to apply mathematics to the discussion. It is important to understand that u is a different kind of variable from x. In the differential calculus an approximately analogous distinction is made between dependent and independent variables; this is needed to explain what we mean by (d/dx)t, where the variable x may ``occur'' in t. In terms of formal languages, t is (a variable in the) meta-notation, ie a meta-variable. We must be prepared for the possibility that t does actually stand for a variable, in which case the manipulation usually behaves in one way for the special variable x and in another for other variables.

Equipped with some meta-notation, it is more convenient to change back from trees to the linear notation; then the rules of differentiation are

omitted eqnarray* environment
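As an illustration (in Python, with our own tuple encoding of terms; only the sum and product rules are shown), symbolic differentiation is exactly this kind of indirect transformation by structural recursion:

```python
def d(t, x):
    """d t / d x by structural recursion on the term t.
    Terms: strings are variables, numbers constants, tuples (op, u, v)."""
    if not isinstance(t, tuple):                 # a leaf: variable or constant
        return 1 if t == x else 0
    op, u, v = t
    if op == '+':                                # (u+v)' = u' + v'
        return ('+', d(u, x), d(v, x))
    if op == '*':                                # Leibniz: (uv)' = u'v + uv'
        return ('+', ('*', d(u, x), v), ('*', u, d(v, x)))
    raise ValueError('no rule for ' + op)

print(d(('*', 'x', 'y'), 'x'))   # ('+', ('*', 1, 'y'), ('*', 'x', 0))
```

In each recursive call the derivative operator is applied only to proper sub-trees, which is why the recursion terminates.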

Terms and substitution Following these examples we can give

DEFINITION 1.1.3 A term is either

a variable, such as x,

a constant symbol, such as c, or

r(u,v,...), ie an operation-symbol r that is applied to a sequence of sub-terms u, v, ...

Then the set FV(t) of variables in a term t is defined by

omitted eqnarray* environment

Beware that FV is not an operation-symbol, since it does depend on the identity of the variables. It's a meta-operation like d/dx.
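In the same illustrative Python encoding (variables as strings, constants as numbers, operation-nodes as tuples), FV is computed by structural recursion; the code makes the point above visible, since it inspects the names of variables, which no operation-symbol may do:

```python
def fv(t):
    """FV(t): the set of variables occurring in the term t, by structural
    recursion. Terms: strings are variables, numbers constants,
    tuples (op, args...). This is a meta-operation, not an operation-symbol."""
    if isinstance(t, tuple):
        return set().union(*(fv(s) for s in t[1:])) if len(t) > 1 else set()
    return {t} if isinstance(t, str) else set()

print(fv(('+', ('*', 3, 'x'), 'y')))    # the set {'x', 'y'}
```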

Substitution is, for us, the most important use of structural recursion. We shall study the term a by exploiting its effect by substitution for variables in other terms, which must follow through all of the transformations of expressions. This obligation on each principle of logic gives rise to many curious (and easily overlooked) phenomena.

DEFINITION 1.1.4 By structural induction on t we define substitution, t[x := a], of a term a for a variable x possibly occurring in t, as follows:

omitted eqnarray* environment

Our notation for substitution, [a/x]*t, has a star, the real meaning of which will become clear in Chapter VIII. For the moment, it is a reminder that [a/x]*t is not an expression starting with a square bracket but a modified form of t, having the same first symbol as it has, unless this is x, in which case [a/x]*t starts with the first symbol of a. Other forms of this notation are to be found in the literature, including t[a/x] from typography and t[x := a] from programming; the latter also appears in this text.

The next result, the Substitution Lemma, is the key to our semantic understanding of syntax in Chapters IV and  VIII.

LEMMA 1.1.5 Let a, b and t be expressions, and x, y distinct variables such that x, y ∉ FV(a) and y ∉ FV(b). Then

omitted eqnarray* environment

PROOF: We use structural induction on the term (a or t), considering each of the cases c, x, y, z (ie another distinct variable) and r(u,v,...), using the induction hypothesis for the last. The first result, with b[x := a] instead of b, is needed to prove the second in the case where t = x. []

When both x ∉ FV(b) and y ∉ FV(a), the operations commute,

t[y := b][x := a]    =    t[x := a][y := b],
justifying simultaneous substitution, for which we write

t[x := a, y := b]    or    t[x⃗ := a⃗].

The general form in the lemma, where we allowed x to occur in b, applies when variables do interfere: as in quantum mechanics, the failure of commutation shows that something is happening.
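A quick check in Python (illustrative tuple encoding of terms, with the binder-free substitution of Definition 1.1.4): when x occurs free in b the two orders of substitution genuinely differ, but applying [x := a] to b as well restores equality, which is the general form of the Lemma:

```python
def subst(t, x, a):
    """t[x := a] for binder-free terms: strings are variables,
    numbers constants, tuples (op, args...)."""
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(s, x, a) for s in t[1:])
    return a if t == x else t

t = ('+', 'x', 'y')
a = 5                     # neither x nor y occurs in a
b = ('*', 'x', 'x')       # but x occurs in b, so the substitutions interfere

lhs = subst(subst(t, 'y', b), 'x', a)                          # t[y:=b][x:=a]
assert lhs != subst(subst(t, 'x', a), 'y', b)                  # not t[x:=a][y:=b]
assert lhs == subst(subst(t, 'x', a), 'y', subst(b, 'x', a))   # = t[x:=a][y:=b[x:=a]]
```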

Quantification   There are expressions which have logical values as well as those with numerical meaning. We shall say formula for a logical expression and term for other kinds. Equations (a = b) and inequalities (a ≤ b) are (atomic) formulae; we might think of = as a binary operation-symbol whose result is a truth-value. There is an algebra of formulae à la Boole, whose operation-symbols include ∧, ∨ and ¬.

Using the informal notation, a formula such as φ[x] containing variables is known as a predicate, x being the subject. If it has no variables it is called a proposition, sentence or closed formula. Like terms with variables, predicates may be understood as equations to be ``solved,'' defining the class of things which satisfy them. The formation of such a class from a predicate is known as comprehension (Definition 2.2.3).

A predicate φ[x] may also be intended to make a general assertion, ie of every instance φ[a] obtained by substituting a term a for x. In this case it is called a scheme. It may be turned into a single assertion by universal quantification: ∀x.φ[x]. Similarly, we may assert that the predicate qua equation has some solution or witness by writing ∃x.φ[x], with the existential quantifier.

Bound variables   Quantified variables, like those in a definite integral,

∫ sin x dx        ∑_{x = 1}^{10} x²        ∃x.φ[x]        ∀x.φ[x]        λx.p(x)
have a special status. They are no longer available for substitution, and can be replaced by any other variable not already in use: ∃x.φ[x] is the same as ∃y.φ[y]. In this new role, x is called a bound variable, where before it was free. This distinction was first clarified by Giuseppe Peano (1897). He invented the symbols ∈, ∃ and many others in logic, and a language called Latino sine flexione or Interlingua in which to write his papers. The twin ∀ was added by Gerhard Gentzen in 1935.

Variables link together occurrences which are intended to be the same thing, and substitution specifies what that thing is. Variable-binding operations isolate this link from the outside world. The linking may be represented on paper by drawing lines instead (such as in Exercise  1.23) or in a machine by assigning to the variable an address that can only be accessed via the binding node.

DEFINITION 1.1.6 The expressions ∃x.φ[x] and ∃y.φ[y] are said to be α-equivalent. In the latter, y has been substituted for each occurrence of x inside the quantifier, which now ranges over y. Although we often emphasise the dynamic nature of calculation, we shall treat these expressions as the same: there is no preferential choice of going from one to the other. Technically this presents no problems since it is decidable whether two given strings of symbols are α-equivalent. Nicolaas de Bruijn devised a method of eliminating (the choice of names of) bound variables in favour of something more canonical (Exercise 2.24), but we shall not use it, as it is less readable by humans.
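De Bruijn's idea can be sketched in Python (with a quantifier encoding of our own: a node ('exists', x, body) or ('forall', x, body), other nodes tuples headed by their symbol, variables strings): each bound-variable occurrence is replaced by the number of binders between it and its own binder, so α-equivalent terms receive literally identical representations:

```python
def debruijn(t, env=()):
    """Replace bound-variable names by binding depth (de Bruijn indices).
    Terms: strings are variables; tuples are ('exists'/'forall', x, body)
    or (op, args...). Free variables keep their names."""
    if isinstance(t, tuple) and t[0] in ('exists', 'forall'):
        return (t[0], debruijn(t[2], (t[1],) + env))   # binder names are dropped
    if isinstance(t, tuple):
        return (t[0],) + tuple(debruijn(s, env) for s in t[1:])
    if t in env:
        return env.index(t)        # 0 points at the nearest enclosing binder
    return t

# 'exists x. y = x*x' and 'exists w. y = w*w' become the same tree:
assert debruijn(('exists', 'x', ('=', 'y', ('*', 'x', 'x')))) == \
       debruijn(('exists', 'w', ('=', 'y', ('*', 'w', 'w'))))
```

Deciding α-equivalence of two terms then reduces to comparing their nameless forms for literal equality.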

EXAMPLES 1.1.7 The predicate ∃x:N. y = x² says whether y is a square number. We may substitute a value (3) or another expression (w+z) for y, to obtain the (false) proposition ∃x:N. 3 = x² or binary predicate ∃x:N. (w+z) = x². But we may not substitute a term involving x for y: ∃x:N. x = x² is a true proposition, which is not a substitution instance of the original formula. Likewise, α-equivalence lets us substitute a new variable for the bound x: ∃w:N. y = w² is the same predicate, but ∃3:N. y = 3² and ∃(w+z):N. y = (w+z)² are nonsense, whilst ∃y:N. y = y² is again a true proposition.

These difficulties can be overcome by the strong variable convention:

CONVENTION 1.1.8 We never use the same variable name twice in an expression. Where circumstances bring together two expressions with common variable names, we use α-equivalent forms instead.

Surely only an undisciplined programmer who couldn't be bothered to make a good choice of names would do otherwise? But no, although ∃ and ∀ respect it, β-reduction in the λ-calculus may cause duplication of sub-terms, and hence bound variables, so it may later substitute one abstraction within the scope of the other (Exercise 2.25). The convention is therefore a very naive one, unsuitable for computation. (A weaker convention will be needed in Section 4.3, and variables must be repeated for different reasons in Sections 9.3 and 9.4.)

DEFINITION 1.1.9 When we use variable-binding operators we must therefore have available an inexhaustible supply of variables. This means that, whatever (finite) set of variables we're already using, we can always find a new one, ie one which is different from the rest. Exercise 6.50 examines what is needed to mechanise this. Some accounts of formal methods specify that there are countably many variables - even number them as x1, x2, x3, ... or x, x′, x″, ... - but one thing the formalism must not allow is to treat xn as an expression containing a variable n.
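The inexhaustible supply is easy to mechanise in a sketch (Python; the naming scheme x0, x1, x2, ... is our own choice): given any finite set of names already in use, search the candidates until an unused one appears,

```python
import itertools

def fresh(avoid):
    """A variable name not in the finite set `avoid`.
    Terminates because `avoid` is finite and the candidates are not."""
    for n in itertools.count():
        if 'x%d' % n not in avoid:
            return 'x%d' % n

print(fresh({'x0', 'x1', 'y'}))   # x2
```

which is exactly the content of the claim that, whatever finite set of variables we're already using, a new one can always be found.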

In fact the x1 written here is a meta-variable which stands for whatever actual variable may be desired in the application; it may stand for the same variable as x2 unless we say otherwise. The difference between object-language and meta-language is important, but making it explicit can be taken too far. In this book we make no systematic distinction between object-variables and the meta-variables used as place-holders for them, or, rather, we shall use x, y, z, x1, x2, etc as examples of variable names (cf Cardano's examples of equations).

In the Substitution Lemma and elsewhere we do need to know whether or not two symbols are meant to be the same variable (intensional or by mention equality); we write x ≢ y if they have to be different symbols. Even in this case they may be assigned the same value (extensional or by use equality). On the other hand, we sometimes also need to constrain values, for example to say that x ≠ 0 as a precondition for division.

Substitution and variable-binding   Bound variables complicate the formal definition of substitution, and it must be given simultaneously with the definitions of free variables and of a-equivalence. As before, we do this by structural recursion on the main term. If this is new to you then you should work through these definitions and prove the Substitution Lemma for yourself, maybe using the tree notation.

DEFINITION 1.1.10 By simultaneous structural recursion on t we define (i) the set, FV(t), of free variables, (ii) substitution, t[x := a], of a term a for a variable x possibly occurring in it and (iii) α-equivalence with another term. To Definitions 1.1.3 and 1.1.4 we add that

the constant c and variable x are each α-equivalent to themselves, but not to anything else;

operation-symbols respect α-equivalence: if u is α-equivalent to u′ and v to v′, etc, then r(u,v,...) is α-equivalent to r(u′,v′,...);

if t is ∃x.p then it is α-equivalent (to itself and) to any ∃y.(p[x := y]) as long as y ∉ FV(p), and similarly for ∀.


if t is a quantified formula then (by the previous part, and using the inexhaustible supply of variables) we may assume that it is ∃y.p where y is not the same variable as x, and y does not occur in a; then FV(t) = FV(p)\{y} and (∃y.p)[x := a] = ∃y.(p[x := a]). (Similarly with ∀.) In this sense, quantification respects α-equivalence.

Notice that α-substitution, for a bound variable y ∈ FV(a), may be needed within t in order to define t[x := a]. (Some authors say that ``a is free for x in t'' if there are no clashes between bound variables of t and free variables of a.) The Substitution Lemma remains valid, and, as a corollary, substitution respects α-equivalence.
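The whole of Definition 1.1.10 can be sketched in Python (our own encoding: quantifier nodes ('exists'/'forall', y, body), other nodes (op, args...), variables strings). The quantifier case of substitution renames its bound variable, via a fresh name, exactly when a is not free for x in t:

```python
import itertools

def fv(t):
    """Free variables: the bound variable is removed from its body's set."""
    if isinstance(t, tuple) and t[0] in ('exists', 'forall'):
        return fv(t[2]) - {t[1]}
    if isinstance(t, tuple):
        return set().union(*(fv(s) for s in t[1:])) if len(t) > 1 else set()
    return {t} if isinstance(t, str) else set()

def fresh(avoid):
    """A variable name not in the finite set `avoid`."""
    for n in itertools.count():
        if 'x%d' % n not in avoid:
            return 'x%d' % n

def subst(t, x, a):
    """Capture-avoiding t[x := a], by structural recursion on t."""
    if not isinstance(t, tuple):
        return a if t == x else t
    if t[0] in ('exists', 'forall'):
        q, y, body = t
        if y == x:                 # x is bound here: no free occurrence to replace
            return t
        if y in fv(a):             # a is not 'free for x in t': rename y first
            z = fresh(fv(body) | fv(a) | {x})
            y, body = z, subst(body, y, z)
        return (q, y, subst(body, x, a))
    return (t[0],) + tuple(subst(s, x, a) for s in t[1:])

# (exists x. y = x*x)[y := x] renames the bound x rather than capturing:
print(subst(('exists', 'x', ('=', 'y', ('*', 'x', 'x'))), 'y', 'x'))
```

Substituting x for y here yields ∃x0. x = x0·x0, an α-equivalent renaming, rather than the wrong ∃x. x = x·x of Examples 1.1.7.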

Recall that the Lemma mentioned expressions in which the variable x must not occur freely, and we shall put the same condition on certain formulae in the rules for proving ∀x.φ[x] and using ∃x.φ[x] in Section 1.5. When this condition arises, it often means that at some point the same expression is also used as if the variable x really did occur in it.

NOTATION 1.1.11 For any term t with x ∉ FV(t), we shall write

x̂*t

for the term t in which the variable x is considered to occur (``invisibly''). Like [x := a], x̂* is meta-notation, though in this case it does not actually alter the term. The use of x̂* presupposes x ∉ FV(t), ie that the term t came from a world in which x did not occur. In particular, x̂*x and x̂*x̂*t are not well formed terms.

The Notation conveys more than a negative listing of the free variables: it says that t is being imported into a world in which x is defined. Although this distinction seems trivial, making it and the passage back and forth between these two worlds explicit will help us considerably to understand the quantifiers in Chapter IX. What x̂* means will become clearer when we use it in Sections 1.5, 2.3 and 4.3. The hat has been adopted from the abbreviation 1, ..., î, ..., n for the sequence with i omitted in the theory of matrices and their determinants.

We define FV(x̂*t) = FV(t) ∪ {x}.

PROPOSITION 1.1.12 [Extended Substitution Lemma] Let a, b, t, u and v be expressions, and x and y be distinct variables such that x ∉ FV(a,u) and y ∉ FV(a,b,u,v). Then

omitted eqnarray* environment

Moreover if t is α-equivalent to t′, u to u′ and a to a′, then t[x := a] and x̂*u are α-equivalent to t′[x := a′] and x̂*u′ respectively.

PROOF: To the proof of Lemma 1.1.5 we add the case ∃z.s, in which, without loss of generality (by α-equivalence), the bound variable is not x or y and does not occur in a or b. Then [x := a], [y := b], x̂* and ŷ* commute with ∃z, and the induction hypothesis applies. Since they also commute with ŵ*[z := w], the bound variable z can be renamed. []

The Extended Substitution Lemma will be our point of departure for the semantic treatment of syntax in Sections 4.3 and  8.2. Algebras are the subject of Section  4.6, and term algebras of Chapter  VI, which considers structural recursion in a novel and sophisticated way.