We begin by considering how expressions must transform if they are to have and retain meaning, without being too specific about that meaning. Think of this as learning how to use a balance, ruler and stopwatch in preparation for studying physics. To those whose background is in pure mathematics, the explicit manipulation of syntax and the distinction between objectlanguage and metalanguage may be unfamiliar, but they are the stuff of logic and informatics. We shall not be too formal: formal rigour can only be tested dynamically, and those whose business it is to implement formal languages on computers are already well aware of many of the actual difficulties from a practical point of view.
The mathematical expressions which we write are graphically of many forms, not necessarily onedimensional, and are often subject to complex rules of wellformedness. It suffices at first to think of strings composed of constants (0, 1, ..., p, etc ) and operationsymbols (+, x, [Ö], sin).
Given an expression of this kind, we can evaluate it, but that's the end of the story. The main preoccupation of algebra before the nineteenth century was the solution of equations.
Variables Algebraic techniques typically involve the manipulation of expressions in which the unknown is represented by a placeholder. This may be a letter, or (as often in category theory) an anonymous symbol such as () or ( = ). The transformations must be such that they would remain valid if the place were filled by any particular quantity. They must not depend on its being a variablename rather than a quantity, so we cannot add the variables b and e to obtain g (``2+5 = 7'') because this is unlikely to continue to make sense when other values are put for the variables. Variables were used in this way by Aristotle.
The symbols are called parameters, constants, etc depending on ``how'' variable they are taken to be in the context. Usage might describe x as a variable, p and q as parameters and 3 as a constant, but they may change their status during the course of an argument. This is reflected in the history of the cubic equation (Example 4.3.4): Gerolamo Cardano, like the author of the Rhind papyrus, demonstrated his techniques by solving specific but typical numerical examples such as x^{3}+6x = 20, but Franccois Viète was the first to use letters for the coefficients. (The use of x y z for variables and abc for coefficients is due to René Descartes.)
After we have solved x^{3} = 3px+2q for x in terms of p and q, we may consider how the form of the solution depends on them, in particular on whether p^{3} ³ q^{2}. In a more general discussion, what were constants before are subject to investigation and become variables. This changing demarcation between what is considered constant or variable will be reflected in the binding of variables by quantifiers, and in the boxes used in Section 1.5.
The places p and q also differ in that they stand for arbitrary (") quantities, whilst x is intended to be filled by individual ($), though as yet unknown, ones, namely the solutions of the equation. This difference is represented formally by the two quantifiers: "p,q.$x.x^{3} = 3px+2q.
REMARK 1.1.1 Although we write algebraic expressions and programs as linear strings of symbols, it is better to think of them as trees, in which the connectives are nodes and the constants and variables are leaves. Each operation symbol corresponds to a certain species of node, with a subtree for each argument; we speak of nullary, unary, binary, ternary and in general k ary or multiary operations if they have 0, 1, 2, 3 or k arguments. For example Cardano's first cubic equation might be written
omitted diagram environment
Transformation of expressions becomes surgery on trees. The analysis of a tree begins with the top node, the ``outermost'' operationsymbol, which is (confusingly) called the root. It is important to be able to recognise the root of an expression written in the familiar linear fashion; if it is a unary symbol such as a quantifier (", $) it is usually at the front (a prefix), but binary ones (+, x, Ú, Ù, Þ ) are traditionally written between their arguments ( infix). Like unwrapping a parcel, you must remove the outermost operationsymbol first.
Expressions are typically stored in a computer by representing each node as a record, stating which operationsymbol, constant or variable it is, and with pointers to the records which represent the arguments or branches. Concretely, the pointer is the machine address of the record. The whole expression is denoted by the address of its root record, so the outermost operationsymbol is accessible im mediately, ie without inter 1em mediate processing.
The leaves of an expressiontree are its variables. The simplest operation on trees is substitution of a term for a variable. A copy of the expressiontree of the term is made for every occurrence of the variable, and attached to the tree in its place. If there were many occurrences, the term and its own variables would be proliferated, but if there were none the latter would disappear. We call this a direct transformation.
The application of a single operationsymbol such as + to its family of arguments is a special case of substitution: 3+5 is obtained by substituting 3 for x and 5 for y in x+y. Conversely, it can be useful to think of any expressiontree (with variables scattered amongst its leaves) as a generalised operationsymbol; since it replicates the algebraic language by adding in definable operations, the collection of all trees, substitutable for each other's leaves, is called the clone.
REMARK 1.1.2 An expression may also be regarded as a function of certain variables, written f(x,y,¼). Beware that ``(x,y,¼)'' signifies neither that x, y, etc actually occur in the expression (if they don't then f is a constant function), nor that these exhaust the variables which do occur (others may be parameters). We shall therefore describe this and f(a,b,¼), the result of substituting certain terms for the variables, as the informal notation for functions.
There are also indirect transformations which restructure the whole tree from top to bottom. For each kind of node, we have to specify the resulting node and how its subtrees are formed from the original ones and their transformed versions. If you have a mathematical background, but haven't previously thought about syntactic processes, then symbolic differentiation will perhaps be the most familiar example. The most complex case is multiplication.
omitted diagram environment
The tree has grown both horizontally and vertically, but notice that all occurrences of [(d)/( dx)] are now closer to the leaves, and this observation is just what we need to see that the recursion terminates: the motto is divide and conquer. We shall study recursion in Chapters II and VI.
These symbols u and v are meant to stand for subtrees such as x^{3}+6x or y, rather than values: we have to use unknowns representing trees in order to apply mathematics to the discussion. It is important to understand that u is a different kind of variable from x. In the differential calculus an approximately analogous distinction is made between dependent and independent variables; this is needed to explain what we mean by [(d)/( dx)]t, where the variable x may ``occur'' in t. In terms of formal languages, t is (a variable in the) metanotation, ie a meta variable. We must be prepared for the possibility that t does actually stand for a variable, in which case the manipulation usually behaves in one way for the special variable x and in another for other variables.
Equipped with some metanotation, it is more convenient to change back from trees to the linear notation; then the rules of differentiation are
omitted eqnarray* environment
Terms and substitution Following these examples we can give
DEFINITION 1.1.3 A term is either
Then the set FV(t) of variables in a term t is defined by
omitted eqnarray* environment
Beware that FV is not an operationsymbol, since it does depend on the identity of the variables. It's a metaoperation like [(d)/( dx)].
Substitution is, for us, the most important use of structural recursion. We shall study the term a by exploiting its effect by substitution for variables in other terms, which must follow through all of the transformations of expressions. This obligation on each principle of logic gives rise to many curious (and easily overlooked) phenomena.
DEFINITION 1.1.4 By structural induction on t we define substitution, t[x: = a], of a term a for a variable x possibly occurring in t, as follows:
omitted eqnarray* environment
Our notation for substitution, t[x: = a], has a star, the real meaning of which will become clear in Chapter VIII. For the moment, it is a reminder that t[x: = a] is not an expression starting with a square bracket but a modified form of t, having the same first symbol as it has, unless this is x, in which case t[x: = a] starts with the first symbol of a. Other forms of this notation are to be found in the literature, including t[a/x] from typography and t[x: = a] from programming.
The next result, the Substitution Lemma, is the key to our semantic understanding of syntax in Chapters IV and VIII.
LEMMA 1.1.5 Let a, b and t be expressions, and x, y distinct variables such that x,y Ï FV(a) and y Ï FV(b). Then
omitted eqnarray* environment
PROOF: We use structural induction on the term (a or t), considering each of the cases c, x, y, z ( ie another distinct variable) and r(u,v,¼), using the induction hypothesis for the last. The first result, with b[x: = a] instead of b, is needed to prove the second in the case where t = x. []
When both x Ï FV(b) and y Ï FV(a), the operations commute,

t[x: = a,y: = b] or t[[(x)\vec]: = [(a)\vec]].
The general form in the lemma, where we allowed x to occur in b, applies when variables do interfere: as in quantum mechanics, the failure of commutation shows that something is happening.
Quantification There are expressions which have logical values as well as those with numerical meaning. We shall say formula for a logical expression and term for other kinds. Equations (a = b) and inequalities (a £ b) are (atomic) formulae; we might think of = as a binary operationsymbol whose result is a truthvalue. There is an algebra of formulae à la Boole, whose operationsymbols include Ù, Ú and Þ.
Using the informal notation, a formula such as f[x] containing variables is known as a predicate, x being the subject. If it has no variables it is called a proposition, sentence or closed formula. Like terms with variables, predicates may be understood as equations to be ``solved,'' defining the class of things which satisfy them. The formation of such a class from a predicate is known as comprehension (Definition 2.2.3).
A predicate f[x] may also be intended to make a general assertion, ie of every instance f[a] obtained by substituting a term a for x. In this case it is called a scheme. It may be turned into a single assertion by universal quantification: "x.f[x]. Similarly, we may assert that the predicate qua equation has some solution or witness by writing $x.f[x], with the existential quantifier.
Bound variables Quantified variables, like those in a definite integral,

Variables link together occurrences which are intended to be the same thing, and substitution specifies what that thing is. Variablebinding operations isolate this link from the outside world. The linking may be represented on paper by drawing lines instead (such as in Exercise 1.23) or in a machine by assigning to the variable an address that can only be accessed via the binding node.
DEFINITION 1.1.6 The expressions $x.f[x] and $y.f[y] are said to be aequivalent. In the latter, y has been substituted for each occurrence of x inside the quantifier, which now ranges over y. Although we often emphasise the dynamic nature of calculation, we shall treat these expressions as the same: there is no preferential choice of going from one to the other. Technically this presents no problems since it is decidable whether two given strings of symbols are aequivalent. Nikolas de Bruijn devised a method of eliminating (the choice of names of) bound variables in favour of something more canonical (Exercise 2.24), but we shall not use it, as it is less readable by humans.
EXAMPLES 1.1.7 The predicate $x:N.y = x^{2} says whether y is a square number. We may substitute a value (3) or another expression (w+z) for y, to obtain the (false) proposition $x:N.3 = x^{2} or binary predicate $x:N.(w+z) = x^{2}. But we may not substitute a term involving x for y: $x:N.x = x^{2} is a true proposition, which is not a substitution instance of the original formula. Likewise, aequivalence lets us substitute a new variable for the bound x: $w:N.y = w^{2} is the same predicate, but $3:N.y = 3^{2} and $(w+z):N.y = (w+z)^{2} are nonsense, whilst $y:N.y = y^{2} is again a true proposition.
These difficulties can be overcome by the strong variable convention:
CONVENTION 1.1.8 We never use the same variable name twice in an expression. Where circumstances bring together two expressions with common variable names, we use a equivalent forms instead.
Surely only an undisciplined programmer who couldn't be bothered to make a good choice of names would do otherwise? But no, although $ and " respect it, breduction in the lcalculus may cause duplication of subterms, and hence bound variables, so it may later substitute one abstraction within the scope of the other (Exercise 2.25). The convention is therefore a very naive one, unsuitable for computation. (A weaker convention will be needed in Section 4.3, and variables must be repeated for different reasons in Sections 9.3 and 9.4.)
DEFINITION 1.1.9 When we use variablebinding operators we must therefore have available an inexhaustible supply of variables. This means that, whatever (finite) set of variables we're already using, we can always find a new one, ie one which is different from the rest. Exercise 6.50 examines what is needed to mechanise this. Some accounts of formal methods specify that there are countably many variables  even number them as x_{1},x_{2},x_{3},¼ or x,x¢,x¢¢,¼  but one thing the formalism must not allow is to treat x_{n} as an expression containing a variable n.
In fact the x_{1} written here is a metavariable which stands for whatever actual variable may be desired in the application; it may stand for the same variable as x_{2} unless we say otherwise. The difference between objectlanguage and metalanguage is important, but making it explicit can be taken too far. In this book we make make no systematic distinction between objectvariables and the metavariables used as placeholders for them, or, rather, we shall use x, y, z, x_{1}, x_{2}, etc as examples of variable names ( cf Cardano's examples of equations).
In the Substitution Lemma and elsewhere we do need to know whether or not two symbols are meant to be the same variable ( intensional or by mention equality); we write x\not º y if they have to be different symbols. Even in this case they may be assigned the same value ( extensional or by use equality). On the other hand, we sometimes also need to constrain values, for example to say that x ¹ 0 as a precondition for division.
Substitution and variablebinding Bound variables complicate the formal definition of substitution, and it must be given simultaneously with the definitions of free variables and of aequivalence. As before, we do this by structural recursion on the main term. If this is new to you then you should work through these definitions and prove the Substitution Lemma for yourself, maybe using the tree notation.
DEFINITION 1.1.10 By simultaneous structural recursion on t we define (i) the set, FV(t), of free variables, (ii) substitution, t[x: = a], of a term a for a variable x possibly occurring in it and (iii) aequivalence with another term. To Definitions 1.1.3 and 1.1.4 we add that
Hence
Notice that asubstitution for y Î FV(a) may be needed within t in order to define t[x: = a]. (Some authors say that ``a is free for x in t'' if there are no clashes between bound variables of t and free variables of a.) The Substitution Lemma remains valid, and, as a corollary, substitution respects aequivalence.
Recall that the Lemma mentioned expressions in which the variable x must not occur freely, and we shall put the same condition on certain formulae in the rules for proving "x.f[x] and using $x.f[x] in Section 1.5. When this condition arises, it often means that at some point the same expression is also used as if the variable x really did occur in it.
NOTATION 1.1.11 For any term t with x Ï FV(t), we shall write

The Notation conveys more than a negative listing of the free variables: it says that t is being imported into a world in which x is defined. Although this distinction seems trivial, making it and the passage back and forth between these two worlds explicit will help us considerably to understand the quantifiers in Chapter IX. What [^(x)]^{*} means will become clearer when we use it in Sections 1.5, 2.3 and 4.3. The hat has been adopted from the abbreviation 1,¼,[^(i)] ,¼,n for the sequence with i omitted in the theory of matrices and their determinants.
We define FV([^(x)]^{*}t) = FV(t)È{x}.
PROPOSITION 1.1.12 [ Extended Substitution Lemma] Let a, b, t, u and v be expressions, and x and y be distinct variables such that x Ï FV(a,u) and y Ï FV(a,b,u,v). Then
omitted eqnarray* environment
Moreover if t¢ is aequivalent to t, u¢ to u and a¢ to a, then t¢[x: = a¢] and [^(x)]^{*}u¢ are aequivalent to t[x: = a] and [^(x)]^{*}u respectively.
PROOF: To the proof of Lemma 1.1.5 we add the case $z.s, in which, without loss of generality (by a equivalence), the bound variable is not x or y and does not occur in a or b. Then [x: = a], [y: = b], [^(x)]^{*} and [^(y)]^{*} commute with $z, and the induction hypothesis applies. Since they also commute with [^(w)]^{*}[z: = w], the bound variable z can be renamed. []
The Extended Substitution Lemma will be our point of departure for the semantic treatment of syntax in Sections 4.3 and 8.2. Algebras are the subject of Section 4.6, and term algebras of Chapter VI, which considers structural recursion in a novel and sophisticated way.