Practical Foundations of Mathematics

Paul Taylor

2.5 Induction and Recursion

After the connectives x, Ž and +, the next thing we want to salvage from set theory is the system of natural numbers, equipped not only with arithmetic but also with induction and recursion. The latter have a long history in number theory, but the idioms are readily transferred to lists, trees and recursive programs. Set theory itself, with the axiom of foundation, admits a more general notion of induction on the Î relation, of which transfinite iteration is a useful special case. Another similar, also rather disorderly, form of induction is that needed to prove properties of self-calling programs. These will be our starting point.

Induction and recursion are often confused in symbolic logic, but the difference between them is another one between propositions and types: we prove theorems by induction, but construct programs by recursion. The two are linked by the General Recursion Theorem 6.3.13 (it is not enough ``to prove by induction that p(x) is defined,'' as this wrongly treats existence as a predicate).

DEFINITION 2.5.1 A self-calling program is said to obey the recursive paradigm if it defines a function p(x) of exactly one argument and it proceeds as follows:

(a): from the given argument t, by means of code known to terminate,
it derives zero or more other sub-arguments [(u)\vec] of the same type,
(b): then it applies p to each sub-argument \termu_j in parallel, ie with no interaction among the sub-computations,
(c): finally, from the sub-results returned by the recursive calls together with the original argument, by means of another piece of code which is known to terminate, it computes its own result to be output.

We shall call phases (a) and (c) parsing and evaluation respectively, and write u\prec t whenever u is a sub-argument of t. This may mean that u is symbolically a sub-expression of t, in which case we're doing structural recursion, but we shall use the notation and terminology for any recursive problem which fits the paradigm, whatever the means by which the us are obtained from t.

Recursive definitions of functions with many arguments, and systems of functions which call each other ( mutual recursion), are no more general than the paradigm, because the arguments or results may be packaged into a ``tuple.'' The complication which we have not allowed is that a sub- result might be fed back to help generate another sub- argument. The effect of the paradigm is to allow the parsing of hereditary sub-arguments to be exhausted before any evaluation is done; this will be expressed more formally in Definition 6.3.7. A special case, tail recursion, is discussed in Section 6.4, where it is shown to be equivalent to the imperative while construct.

EXAMPLES 2.5.2

(a)

The familiar example of the factorial function

fact n = if n = 0 then 1 else n*fact(n-1)fi

fits the paradigm because, for n ł 1, it first prepares the argument n-1, calls itself with this argument, and finally multiplies the result of the recursive call by the original argument to give its own result.

(b)

A compiler for a programming language takes the source text of an entire program and extracts the immediate sub-expressions, which it feeds back to itself. It puts the digested code-fragments together as the translation of the whole program.

(c)

Carl Friedrich Gauss (1815) showed how to transform a polynomial of degree 2n into one of degree n(2n-1), in such a way that each root of the latter may be used to obtain two roots of the former by means of a quadratic equation. Any polynomial of odd degree has a real root by the intermediate value theorem. Here the sub-argument is the auxiliary polynomial and the sub-result one of its roots.

How do we prove that such programs are correct? There is nothing to say in the case of the factorial function, because it is evaluated directly from its definition. The correctness of the compiler, ie of its output (object code) relative to the input (source), is given case by case for the connectives of the language: for example the Floyd rules in Sections 4.3, 5.3 and 6.4 for imperative languages say that if the sub-expressions satisfy certain conditions then so does the whole program(-fragment). Finally, Gauss's paper gave a method of deriving the auxiliary polynomial and showed that if it has a root then this provides a root of the original one.

Induction In all of these arguments, and whenever we want to prove consequences of recursive definitions, it is necessary to demonstrate

a typical instance, q[t], of the required property, from "u.u\prec tŢ q[u], the induction hypothesis, that all of the sub-arguments have the property,

by methods which are peculiar to the application. Common usage often forgets that the q[t] so proved is not that of the ultimate conclusion, "x.q[x], but only a stepping stone. In order to justify a single q[a], the induction step must be proved for every t (or at least for all descendants of a). Such sloppiness in the presentation can give the impression that induction is circular.

To obtain the desired result, we have to use an additional rule, of which the proof ``by induction'' is the premise.

DEFINITION 2.5.3 The induction scheme is the rule

omitted prooftree environment

where \prec is a binary relation on a set X. We say that \prec is well founded if the induction scheme is valid. It is a scheme because it is asserted for each predicate q (which may involve parameters) on X; quantification over predicates introduces second order logic (Section 2.8).

The important point formally is that the variable t in the premise and the induction hypothesis which it satisfies are bound in an ("Ţ Á) proof box, so nothing is to be assumed about t apart from this.

-2 15pt omitted proofbox environment

Both the property q and the variable t must be given formal names, and the box clearly delimited. Idiomatically, this is done by saying ``we shall prove by \prec -induction on t that q'' to introduce what (line wf pf inter being omitted) is apparently a proof of "x.q[x].

We have followed tradition in using a symbol \prec suggesting order for well founded relations, but this is extremely misleading. The motivating examples of immediate sub-processes and sub-expressions above are not transitive, and Corollary 2.5.11 shows that \prec is irreflexive.

Genealogical analogies can be useful, but there is a conflict of intuition about the direction. In set theory it is traditionally synthetic: ``on the first day, God made Ć'' [Knu74]. Our recursive paradigm is analytic, and sub-processes are usually called children, so Ć is the ultimate descendant.

DEFINITION 2.5.4 If u\prec t or u\prec \prec t then we refer to u as a child or a descendant of t, respectively, where \prec \prec is the transitive closure of \prec . For any predicate f on X, we say that t is hereditarily f if all of its descendants satisfy f, ie

"u.u\prec \prec tŢ f[u].

Theorem 3.8.11 uses this to show that the strict induction scheme

omitted prooftree environment

(with Ű ) is equivalent to the one in Definition 2.5.3.

EXAMPLES 2.5.5

(a)

If no instances of the induction hypothesis are actually used in the body of the proof, it is simply an ("Á ) (Definition 1.5.1).

(b)

Let n\prec n+1 on N. The rule reduces to the familiar Peano scheme,

omitted prooftree environment

which is also known as simple, primitive, or just mathematical induction. ĺ_{k = 1}ⁿ k² = ¹/₆n(n+1)(2n +1) and similar results may be proved like this.

(c)

Let \prec be the strict arithmetical order < on N. Then we have course-of-values induction (sometimes called complete induction):

omitted prooftree environment

This relation \prec is the transitive closure of the previous one. Notice that in the base case (n = 0) the hypothesis "i < n.q[i ] is vacuous.

Whereas simple induction deals with orderly, step-by-step reductions (like nuclear decay by a- or b-radiation), numerous algorithms split up graphs into parts whose size may be anything between zero and the original size (like fission). For such problems we need course-of- values induction. But the problem for a graph with n nodes does not usually reduce to sub-problems involving graphs of every size < n (the products of fission of ²³⁵U by neutrons typically have masses about 90 and 145), so \prec is usually sparser than < .

(d)

For a recursive program in a functional language, the invariant or induction hypothesis behaves just as in the mathematical setting: the induction premise is that the result on an argument t is correct as long as the recursive calls (\termu_j) have been computed correctly.

(e)

Since the order of any subgroup or quotient divides that of a finite group, many properties of groups (notably the Sylow theorems, Fact 6.6.8, and their applications) are shown by induction on the divisibility orders on N. This may be seen as just another example of course-of-values induction (c). Alternatively, prime factorisation expresses the positive integers as (part of) an infinite product of copies of N, which may be given either the product order or, as with Gauss's proof, the lexicographic one.

Section 2.7 considers induction and recursion for numbers and lists.

Minimal counterexamples If some property is not universally true of the natural numbers, there is a least number for which it fails. (Georg Cantor generalised this idea to define the ordinals, Section 6.7.) This least number principle depends on excluded middle, but with a little care such proofs can be made intuitionistic.

PROPOSITION 2.5.7 [Richard Montague, 1955] A relation \prec is well founded iff every non-empty subset has a \prec -minimal element.

PROOF: Let Ć š V Ě X, and put q[x] = (x Ď V). This means that the conclusion of the induction scheme ( "x.q[x]) fails. The scheme is therefore valid iff the premise also fails, ie

$t. ["u.u\prec tŢ u Ď V] Ů(t Î V).

This t is what we mean by a minimal element of V. []

REMARK 2.5.8 The classical idiom of induction is to use the minimal counterexample to show that "x.q[x] by reductio ad absurdum. Like Example 1.8.2, this is very often gratuitous. The hypothesis ``let t be a minimal counterexample'' breaks into two parts:

(a): t is a counterexample, \lnot q[t], and
(b): anything less, u\prec t, is not a counterexample, ie "u\prec t.q[u].

The second part is the induction hypothesis as given in Definition 2.5.3. Commonly, q[t] can be deduced without using the first part, so this may be eliminated to give a sound intuitionistic argument.

Imagine a proof by induction as the tide climbing up a beach: in the end the whole beach gets submerged. The induction scheme requires us to show that any part which is above only water soon gets wet. The classical induction hypothesis is that there is somewhere which is on the water-line but still completely dry - a rather implausible supposition!

COROLLARY 2.5.9 Any (properly) recursive definition of a function which terminates on all values includes an analysis ( parsing) into

(a): base cases (leaves or \prec -minimal elements) at which the value of the function is immediately given in the code (in Examples 2.5.2 these are respectively 0! = 1, constants of the language and odd-degree polynomials), and
(b): induction steps (branches) at which the value of the function is given by a k-ary operation applied to the values at recursive arguments; k = 1 in three of the examples, but for a compiler the arities are exactly those of the connectives of the language. []

This feature of well founded structures and free algebras will recur many times, in the rest of this chapter, in Sections 3.7- 3.8, and in Chapter VI.

Descending chains Dependent Choice (Definition 1.8.9) gives another condition which is non-constructively equivalent to well-foundedness.

PROPOSITION 2.5.10 (X,\prec ) is well founded iff there is no sequence of the form ···\prec \termu₃\prec \termu₂\prec \termu₁\prec \termu₀.

PROOF:

[[Ţ ]] If there is a sequence \termu_n with " n.\termu_n+1\prec \termu_n then U = {\termu_n|n Î N } is a non-empty set with no \prec -minimal element; alternatively the predicate q[x] ş ("n.x š \termu_n) does not satisfy the induction scheme.
[[Ü ]] If (X,\prec ) is not well founded, it has a non-empty subset U Ě X with no minimal element. In other words, "t.t Î UŢ $u. u Î UŮu\prec t and \termu₀ Î U, so the axiom of dependent choice applies. []

The geometrical form of Euclid's algorithm (Example 6.4.3) was perhaps the first statement of induction: an infinite sequence of numbers is found, each less than the one before, which, as Elements VII 31 says quite clearly, is impossible amongst whole numbers. When this algorithm is applied to the side and diagonal of a square or pentagon, however, the successive quotients form a periodic sequence. David Fowler [Fow87] has made an appealing reconstruction of classical Greek mathematics, in which he claims that Euclid's book was motivated by problems like this. (The story that the discovery of the irrationality of Ö2 led to the downfall of the Pythagorean sect seems to have been an invention of Iamblichus nearly 1000 years later.)

In the investigation of the triangle which bears his name, Blaise Pascal (1654) stated lemmas for one case and for any ``following'' one, and concluded that the result holds for all numbers. John Wallis (1655) may also have been aware of the logical principle, but Pierre de Fermat (1658) was the first to make non-trivial use of the method of infinite descent to obtain positive results in number theory (Exercise 2.33).

A variant of this result is (Dénes) König's lemma (1928). A tree is an oriented graph with a distinguished node (its root) from which to each node there is a unique oriented path. Since in most applications the branches (outgoing arrows) from any node are labelled in some fashion, the Choice of branch may usually be made canonically. To any programmer, the procedure in König's lemma is depth-first search.

COROLLARY 2.5.11 If, in a tree, every node has finitely many branches and there is no infinite oriented path, then the tree is finite.

PROOF: As there is no w^op -sequence, the branch relation is well founded, so induction on it is valid. Using this, every sub-tree is finite. []

Using this formulation of induction is not so easy as it may appear: somehow we have to find that forbidden infinite sequence, which usually requires König's lemma.

Notice that we never said that the terms in the sequence had to be distinct; of course if only finitely many values may occur then any infinite sequence of them must contain repetitions ( loops).

COROLLARY 2.5.12 If the relation has a cycle, u\prec u, or u\prec v\prec u, or u\prec v\precw\prec u, etc , then \prec is not well founded. []

Proof trees The graphical presentation, as a tree without cycles, shows the difference between the induction step which is to be proved and the induction hypothesis which can be assumed.

omitted prooftree environment

REMARK 2.5.13 The ingredients for an inductive proof of q[n] by the Peano primitive induction scheme are the axiom/zero q[0] and the rule/successor "n.q[n]Ţ q[n+1]. If we need to prove q[3], then these may be composed, showing the different roles of the occurrences of q. Compare this with the proofs we called 0, 1 and 2 in Example 2.4.1, and notice how the structure of the term directly gives that of the proof.

Termination Infinite descending sequences are very familiar to anyone who has tried to execute an ill founded recursive program: the sequence is stored on the machine stack, which overflows. (Non-terminating loop programs are usually silent.)

REMARK 2.5.14 (Robert Floyd, 1967) In what circumstances can a recursive program p which obeys the paradigm (Definition 2.5.1) fail to terminate on a particular argument \termu₀? If, and only if, the execution of p(\termu₀) involves the computation of some p(\termu₁), which in turn requires some p(\termu₂) and so on ad infinitum.

For any two values t and u of the domain of definition, write u\prec_pt if u is one of the sub-arguments generated in the first step of the computation of p(t). Then the program p terminates on all arguments iff \prec_p is well founded. Let q[x] denote that p terminates on the argument x. (This is a higher order predicate; indeed its mathematical content is really that \prec_p restricted to the descendants of x is well founded.) The premise of the induction scheme holds by assumption on the form of p, that the first and third stages of the program terminate. The conclusion is universal termination, so this is the case iff the induction scheme is valid. []

A good definition in mathematics is one which is the meeting point of examples and theorems, cf introduction and elimination rules in logic. Definition 2.5.3 is such a meeting point. Minimal counterexamples and descending sequences may perhaps help to give you a picture of how induction works, but they do not prove it. Without excluded middle, nor can they even prove the results for which induction is needed.