Practical Foundations of Mathematics

Paul Taylor

1.7  Automated Deduction

There are mechanised approaches to algebra, combinatorics, numerical analysis, etc , but this book is not about those subjects, it is about logic: all we can discuss in this section is the choreography of ", $, Ù, Ú and Þ , and reinforce the importance of the scope of variables and hypotheses. We cannot do the creative part of mathematics, because the solution to a problem in number theory, for example, may be ``simply'' an ingenious observation about z(s) = ån = 1¥ n-s or counting points in a cube in Rn.

The steps which can be automated are the obvious ones, in a technical sense: this literally means ``in the way'' in Latin. It is obvious how to go through a foreign airport, not because you know it intimately, but because there are signs telling you where to turn whenever you need them (you hope). This is also known as exam technique: write down and exploit what you already know. Whereas the box or sequent rules of predicate calculus from the previous section are the laws of the game of proof, the heuristics are hints on the tactics. This section is based on teaching first year informatics students to construct proofs on paper. Of course this will also give some idea of how to write a program to do it, but the strategy for making choices when backtracking is needed raises issues far outside the scope of this book [Pau92].

George Polya [Pol45] and Imre Lakatos [ Lak63] gave two classic accounts of heuristics in mathematics, using Euclidean geometry for examples. Polya's advice - make a plan and carry it out, compare your problem with known theorems, etc - is extremely valuable to help students of mathematics (and professionals) get past the blank sheet of paper, but treats more strategic aspects of proof than we can. An early theorem prover was based on his methods of drawing diagrams and formulating conjectures and counterexamples; that this seems odd now shows both the sophistication of modern proof theory and perhaps also the danger of isolation from the traditional instincts of mathematicians.

Nikolas de Bruijn's AUTOMATH project (late 1960s) set out to codify existing mathematical arguments, rather than to find new theorems, and this remains the research objective of automated reasoning. Johan van Benthem Jutting (1977) translated Edmund Landau's book Foundations of Analysis into AUTOMATH and analysed the ratio by which the text is magnified, which was approximately constant from beginning to end. Similar work has been done for other areas of mathematics.

There are certain dangers inherent in the formalisation of mathematics. Systems of axioms acquire a certain sanctity with age, and in the how of churning out theorems we forget why we were studying these conditions in the first place. Computer languages suffer far more from this problem: nobody would claim any intrinsic merit for FORTRAN or HTML, but sheer weight of existing code keeps them in use. Through the need for a standard - any standard - a similar disaster could befall mathematics if set theory were chosen. As with any programming, and also with the verification of programs, far more detail is required than is customary in mathematics. G. H. Hardy (1940) claimed that there is no permanent place in the world for ugly mathematics, but I have never seen a program which is not ugly. Even when the mathematical context and formal language are clear, we should not perpetuate old proofs but instead look for new and more perspicuous ones.

Theoretical basis   By Gentzen's cut-elimination theorem, supposing that G\vdash q is provable, we need only look for a cut-free proof. We can say something about the final lines of such a proof (in sequent calculus) from the structure of the rules other than cut, and in particular the sub-formula property (Remark  1.4.9).

Although we must read a finished proof from top to bottom, the search for and creation of the proof are not so direct. (The commonest misconception about mathematicians amongst the general population is that we act like robots when trying to solve problems.) By the nature of cut- elimination, the heuristics are in fact goal-driven: they proceed mainly in the opposite direction from the reading of the completed proof.

For certain fragments of logic, if there is any proof of G\vdash q then there is one obtainable by means of the following heuristics. Conversely, if we fail, by completeness (Remark 1.6.13) there is a counterexample, which can be obtained from the trace of our proof- attempts.

FACT 1.7.1 Hereditary Harrop formulae are the definite formulae g and goals q respectively defined by the grammar

omitted eqnarray* environment

where a is atomic. If G is a list of definite formulae and q is a goal formula for which G\vdash q is provable, then it has a uniform proof, ie one in which each sequent D\vdash f with f non-atomic is deduced only by means of the introduction rule for the outermost connective of f [MNPS91]. []

So we construct the proof in reverse by dismantling the goal formula.

Resolution   When the goal is an atomic formula, logical manipulation has nothing to say, and we have to make use of the database, ie the axioms G given in the problem. These are written at the top of the page, numbering the lines from 1 and giving the justification for each line as `` data.'' The desired conclusion(s) or goals q are written at the bottom, numbering the lines backwards from 99 and giving no reason (yet). We shall progressively add more lines 2, 3, ... and 98, 97, ... and also fill in reasons; the lines which have no reason so far are called pending goals.

REMARK 1.7.2 If the atomic formula q is a goal and the axiom

f1Ùf2Ù¼ÙfkÞ q
is in the database, then the problem is reduced to proving each of f1, f2, ..., fk. The idea of logic programming is that this is a procedure which defines q (the head) in terms of the fi (the body), and the notation is reversed (sometimes using ``: = '' for Ü ):
qÜ f1,  f2,  ¼,  fk.
In order to answer the query q, we regard the database as a program and call a procedure whose head is q, which calls sub-procedures; if the original call returns successfully then q has been proved, ie the answer to the query is yes. Notice that while the search for a proof is in progress there may be several pending goals, to be taken conjunctively, just as the database may consist of several hypotheses.

REMARK 1.7.3 The program is non-deterministic, because there may be several procedures for q: if using one of them fails to find a proof, we backtrack and try another. To do this by hand, place a new sheet of tracing paper over the proof so far each time you have to make a choice; then if the choice is wrong you can discard the working which depended on it and return to the immediately preceding state. Only the last choice is discarded: earlier ones may still be viable until all possibilities at this stage have been exhausted. This means that the choices in the search form a nested system in the heuristics, but this is independent of the nested contexts (boxes) in the completed proof.

REMARK 1.7.4 Similarly, a predicate in the database of the form

" ®
x
 
, ®
y
 
. æ
è
$ ®
z
 
. f1[ ®
x
 
, ®
y
 
, ®
z
 
]Ù···Ùfk[ ®
x
 
, ®
y
 
, ®
z
 
] ö
ø
Þ q[ ®
c
 
, ®
x
 
]
is written as the procedure
q[ ®
c
 
, ®
x
 
]Ü f1[ ®
x
 
, ®
y
 
, ®
z
 
],¼ ,fk[ ®
x
 
, ®
y
 
, ®
z
 
].
By convention, the whole formula is universally quantified over all the free variables, which is the same as saying that it is a scheme for the closed formulae obtained by substituting terms for variables. A formula such as this, in which the sub-goals fi are also atomic predicates, is called a ( positive or definite) Horn clause of arity k. (Recall from Definition 1.4.1(a) that the atomic predicates fi[[(x)\vec]] also have arity - the length of the sequence [(x)\vec] - but this is independent of the arity of the clause, ie the number of atomic formulae it contains.)

Suppose that we want to use this Horn clause to prove (solve the query) q[[(a)\vec],[(b)\vec]]. By ("E ) we put [(a)\vec] for [(x)\vec], and by (Þ E ) we have to prove f[[(b)\vec],[(d)\vec],[(e)\vec]] and match [(a)\vec] = [(c)\vec], substituting suitable terms [(d)\vec] and [(e)\vec] for [(y)\vec] and [(z)\vec].

EXAMPLE 1.7.5 A database application might have axioms such as

train[London, Bristol, £ 40] train[London, York, £ 40] train[London, Paris, £ 100] train[Paris, Nice, £ 80] train[x,y,u] train[y,x,u]
journey [x,y,u] train[x,y,u]
journey[x,z,u+vÜ   journey[x,y,u],   journey[y,z,v] .

Then for the query journey[Nice, Bristol, u] we expect not only a proof that one can go from Nice to Bristol by rail, but also the route and cost. So when we assert $x.q[x] we give a definite answer as to what x is - these substitutions are the result of the computation.

REMARK 1.7.6 John Robinson showed how to do this by resolution (1965). Gentzen's Hauptsatz cannot eliminate cuts when axioms are used, and resolution deals with those that remain. It involves substitution of terms for variables, but each resolution step only gives partial information about what has to be substituted: the constraints which fully determine the value may come from quite different parts of the proof (execution of the program).

We can use declarations (Definition 1.6.8) to record this information:

= .2 - omitted proofbox environment

omitted proofbox environment

Here f[[(b)\vec],[(y)\vec]0] is a new goal, to be satisfied by further resolution, as we have done with q. The partial proof on the right illustrates the similar way in which existential goals are handled.

The equations [(a)\vec] = [(c)\vec] are also new goals. If these terms are simply names for individuals (London, York, etc ) and there are no axioms to say that individuals with different names can be equal then we can see immediately whether or not the equations hold. If not, this attempt at resolution fails and we backtrack to find another one. In practice this is done by database-searching techniques.

The programming language PROLOG does resolution and unification. Despite its name, it does not in fact deal with the logical connectives and quantifiers, but what we shall come to call the algebraic fragment (although this will not look like algebra until Section 4.6). The denotational semantics, based on the work of Jacques Herbrand (1930), will be discussed in Sections 3.7 and 3.9.

Unification   Goals involving function-symbols need another technique, called unification. How to do unification is easy: the difficult part is to see what it means. The functions in question are those whose values might be enumerated in a database, such as mother_of, not arithmetic.

REMARK 1.7.7 A goal of the form r([(u)\vec]) = r([(v)\vec]), where r is an operation-symbol for which no laws are known, can only follow by substitution:

omitted prooftree environment
with the new goals \termu1 = \termv1, ..., \termuk = \termvk.

This does not mean that every function is injective. We want to carry on building the logical structure of a proof, possibly without knowing what terms serve as the subjects of predicates. We postpone filling in these terms, and then try to do so as non-specifically as possible, using only the building blocks we already have in the term calculus of the object-language. The possibility that two terms might denote the same thing is only considered if the terms themselves were formed in the same way.

The point is that

(a)
if we have no information about r, then the only hope we have of proving that r([(u)\vec]) = r([(v)\vec]) is by first showing \termui = \termvi (1 £ i £ k). This step can be built into the proof-layout we have given, by treating \termui = \termvi as new goals and giving `` substitution'' as the reason for the line r([(u)\vec]) = r([(v)\vec]).

So

(b)
a match between terms r(\termu1,\termu2) = r(\termv1,\termv2 ) having the same outermost operation creates a new equation for each argument, \termu1 = \termv1 and \termu2 = \termv2; despite this proliferation the algorithm does terminate because \termui and \termvi are all shorter than r([(u)\vec]) and r([(v)\vec]);

(c)
there is no way to prove r([(u)\vec]) = s([(v)\vec]) if r and s are different operation-symbols satisfying no laws; this is called a clash;

(d)
if r does satisfy other axioms or laws, unification may be of no help; at any rate it may involve backtracking, as for example with concatenation of lists, where a division must somehow be chosen;

(e)
a goal of the form x0 = u, where x0 is an indeterminate and u a term in which x0 does not occur, forms part of the solution of the unification problem, and completes the unfinished declaration ( putx0 = ?) in Remark 1.7.6;

(f)
an equation such as x0 = r(x0), in which x0 does occur on the right, cannot be satisfied by substitution of a term for x0 (try it!); this necessitates the occurs-check;

(g)
the other axioms for a congruence (Definition  1.2.3) are also applicable: if u = v and v = w are goals then so is u = w, and these may match, clash, form part of the solution or fail the occurs-check;

(h)
the heuristic applies only to goals, not to hypotheses - it exploits u = vÞ r(u) = r(v) without asserting the converse.

Eventually, if neither type of failure (clash or occurrence) happens, the system of equations will be saturated, ie none of these rules will expand it further. Then some of the indeterminates will be expressed as terms, possibly involving the others.

REMARK 1.7.8 Some of the indeterminates may be independent, for example y0 is arbitrary in the equation x0 = r(y0). The full solution to the unification problem is not unique, since we may put anything we please for y0. However, the solution in which y0 is left as we have it is the most general unifier in that

(a)
it is itself a solution,

(b)
every other solution is obtained by substituting terms into it, and

(c)
any such substitution is a solution.

We can in fact eliminate the confusion of working backwards from goals, and reduce unification to a kind of algebra. A theory with operation-symbols of various arities (numbers of arguments) but no laws is called a free theory, and unification is the study of its free or term model. We shall take this up formally in Chapter  VI and return to unification in Section 6.5 , where we shall see that Remark 1.7.7(g) can be simplified.

Unification in theories with laws is more difficult. It is possible to handle commutativity and associativity, at the cost of uniqueness: there is now a family of maximally general unifiers. Unification under the distributive law would give a uniform way to solve Diophantine equations, but Yuri Matajasivi vc showed that this is undecidable (1970). Gérard Huet showed that unification in higher order l-calculi is also undecidable [ Hue73].

Box-proof heuristics   Now we turn to the logical symbols themselves. The following methods belong in a course on the predicate calculus: it is probably better to teach resolution quite separately. Unless we say otherwise, any boxes are drawn as large as possible, extending from the end of the database to the first pending goal.

REMARK 1.7.9 The uniform proof (Fact 1.7.1) is found as follows:

(a)
Any formula fÙy as a goal or hypothesis may be replaced by f and y as two formulae. Similarly T may be ignored altogether.

=.3 omitted proofbox environment               omitted proofbox environment

(b)
To prove the goal "x.f[x]Þ q[x], we open an ("Á )-box with new variable x, hypothesis f[[(x)\vec]] and conclusion q[x]. Having now filled in the immediate proof-step which justifies the first goal, albeit without any reason for q[x] so far, we are excused from considering this goal again by the annotation (Þ Á ) or ("Á ) on line 99. The goals \lnot f and fÛ q are handled in a similar way.

-.5 -2em 2 omitted proofbox environment -28.4pt        omitted proofbox environment

(c)
The behaviour of $x.f[x] as a hypothesis ($E ) mirrors that of " as a goal, since

($x.f [x])Þ q º "x.(f[x] Þ q). Recall, however, that the ($E ) -box is open-ended below (Remark 1.6.5), so as long as the variable x does not occur elsewhere, we can simply add f[x] to the data without a box. It is to our advantage to do this as soon as possible, because there may be many things satisfying f, and it could be relevant later that the same one plays two or more different roles in the argument, although to say that we ``choose'' a witness does not mean that an actual individual is selected (Remark  1.6.7). The original axiom $x.f[x] will not be needed again.

Subject to scoping of variables, these boxes may be nested in any order and so may be taken together in a single step.

(d)
A goal $x.q[x] can only be deduced from q[x0 ], using ($Á ), where x0 is a term to be found by resolution (Remark 1.7.6).

(e)
If fÚy is in the database then an (ÚE )-box is opened below it. Each half of the box now has its own copy of the database (with f or y respectively replacing fÚy) and goals. As this step may lead to duplication of the proof, we prefer to do it as late as possible.

The remaining heuristics involve backtracking (Remark  1.7.3).

(f)
We use resolution (Remark 1.7.6) to prove an atomic goal q[[(a)\vec]] using "[(x)\vec],[(y)\vec].f [[(x)\vec],[(y)\vec]]Þ q[[(x)\vec]] from the database ("Þ E).

Notice that ("Á ) and ($E ) mirror each other, but ($Á ) and ("E ) do not. This is because a goal requires just one proof, whereas a hypothesis may be employed any number of times, or not at all. (Linear logic analyses the reuse of hypotheses, but we shall not consider it in this book.)

(g)
If ^ is in the database, all goals are immediately satisfied ( ^E ). More generally, if "[(x)\vec].\lnot f[[(x)\vec]] is in the database then we may replace it by put[(x)\vec]0 = ? and all outstanding goals by f[[(x)\vec]0], using ^ as a ``joker' ' (Remark 1.4.4).

(h)
If q0Úq1 is a goal then we seek first a proof of q0, and then (if that fails) a proof of q1.

REMARK 1.7.10 During resolution, we used declarations ( putx0 = ?) to introduce indeterminates. This was done to allow us to continue building the logical structure of the proof without specifying certain of its details. When we have obtained a valid proof, complete apart from the occurrences of an indeterminate x0, we have to find a term which can be substituted everywhere for it. This term must satisfy any equations in which x0 occurs, irrespective of how they are nested within the proof box, so the unification problem cuts across the scoping structure of the proof. Nevertheless, the term must still be well formed at the point of the declaration: the variables belonging to nested ("Á )- and ($E )-boxes must not be free in it.

REMARK 1.7.11 We have gone beyond Fact 1.7.1 by discussing axioms of the form $x.f[x], fÚy and \lnot f. These are not definite in the sense of Example  1.7.5, because when $x.f[x] is used the program cannot provide an answer for x, and x may remain free in any other answers it gives. Similarly which of f or y holds in fÚy is indeterminate. If the joker (^E) is used to prove $y.q[y], again we have no idea what y is.