Practical Foundations of Mathematics

Paul Taylor

4.3  Categories for Formal Languages

The algebraic notation which is standard today only became so in the seventeenth century. By this time general methods for solving quadratic (in ancient Babylonia), cubic and quartic equations (Renaissance Italy) had already been found and expressed in an algorithmic or declarative style. Indeed if we put the solution of the cubic into closed or normal form it would become incomprehensible. This presentation is convenient for us too, because it is already the category we require.

DEFINITION 4.3.1 The direct declarative language has the following syntax for \bnfname programs (sometimes called commands):

skip put\bnfname variable = \bnfname term discard\bnfname variables \bnfname program;\bnfname program

subject to the weak variable convention that no declaration may be made of a variable (with the same name as one) which is already in scope. A variable is said to be in scope from its put to its discard command; of course only the variables in scope may occur freely in \bnfname terms. In the construction of this section, the notion of \bnfname term is meant to be an indeterminate one, which will be taken to be algebra in Section 4.6, and the l-calculus in Section 4.7. Conditionals, loops and other constructs will be added later. The put command is commonly called let, but we wish to maintain the distinction made in Remark 1.6.2 between definite and indeterminate values.

The direct declarative language alone is rather mundane, so we shall borrow an extension from Section 5.3 to present a famous example.

Operational interpretation   Treating the language as imperative, the state at any point of the execution of a program like this is determined by the tuple of current values of the variables in scope. The type of states is the cartesian product of the sets over which the variables range. Notice that the put and discard commands change not only the value of the state s but also its type, so we really need a category and not just a monoid (or semigroup, ie monoid without identity) to interpret the language.

REMARK 4.3.2 As in Example 4.2.10, programs are represented

(a)
contravariantly by their effect on continuations, and

(b)
covariantly by changing the initialisations of program-variables,

where composition (;) performs substitution of values for variables.

REMARK 4.3.3 Our language does not seem to have an imperative flavour, but in fact the weak variable convention allows assignment to be defined in it:

x: = a       is the same as    { omitted array environment
where x¢ is a new variable, ie not in scope. This is a sleight of hand: from outside this fragment of code it appears that the value of x has been changed, cf ``without loss of generality'' (Remark  1.6.9).

EXAMPLE 4.3.4 This program finds the one real (x0) and two possibly complex roots (x± ) of the cubic equation x3+ax2+bx+c = 0. The index n takes each of the values 0, +1 and -1, and w = -1/2+1/2[Ö(-3)] denotes a complex cube root of unity.

q2 00 { = [1/27]a3-2q-ap}0 then 0

omitted program environment

Logical interpretation   What the covariant imperative action means for more complex languages becomes less clear. It also has the weakness that it acts on terms which may be substituted for the free variables, which must therefore belong to the same calculus, here a programming language. The contravariant action, by substitution into expressions, is not restricted in this way: these expressions may belong to a much richer language, such as higher order logic. Recall that we also used predicates (or subsets) to represent posets in Definition  3.1.7ff.

This method of proving partial correctness of programs is due to Robert Floyd [Flo67], though he presented it for flow charts. The notation g{u}f is due to Tony Hoare. Floyd also gave the criterion for termination (Remark 2.5.13) .

REMARK 4.3.5 For any program u, we write

{g} u {q}
to mean ``if g was true before executing u then q will be true afterwards,'' g and q being known as pre- and postconditions respectively.

These satisfy the sequent-style rules

{q[x: = a]}  putx = a  {q}
{q}   skip  {q}

{ ^
x
 
*
 
q}   discardx  {q} , where x Ï FV(q)
omitted prooftree environment
omitted prooftree environment
Or, in the informal notation (without [^(x)]* from Notation  1.1.11),
{q[a]}  putx = a   {q[x]}        {q}  discardx  {q}.

In the case of the whole program, g and q constitute the specification, for instance in the Example it is that the program produces the solution to the cubic equation. Of course we may always take

g = ^ and q = T, but this vacuous specification says that the program is good for nothing.

It is natural to insert midconditions between the lines of the program, instead of this repetitive sequent-style notation. A fully proved program consists of phrases of proof interrupted by single commands. The latter, together with the proof lines either side, must obey the Floyd rules.

The midconditions need not be computable. Even when they are, it would often be more difficult to compute them than the program itself. Sometimes they involve universal quantification over infinite sets, so they are of strictly greater logical (quantifier) complexity. The proofs may be arbitrarily complicated: it is true that given integers x,y,z ³ 1 and n ³ 3 the program xn+yn-zn will always produce a non-zero answer, but the proof- phrase took 357 years!

REMARK 4.3.6 We may also read put as an abbreviation, local definition or (as we called it in Definition 1.6.8) a declaration within the proof. We showed there that a declaration may itself be treated as an ( $E )-proof box - the scope of the variable. The box is open-ended: it extends until the end of the argument, or of the enclosing conditional branch.

As we saw, the weak variable convention allows us to define assignment, but the proof rules then become much more complicated. The original (strong) convention gives referential transparency, the free ability to import formulae ( cf Lemma 1.6.3). But if we have shown that x is odd and then do x: = 2 we may no longer use the previous knowledge. For this reason assignment is to be avoided in programming.

The logical interpretation gives a contravariant action:

REMARK 4.3.7 For any u and q there is a weakest precondition, obtained by letting u act on q by substitution:

{g} u {q} Û g\vdash u*q.

Normal forms The operational interpretation - the execution of the program on numerical values - is not the only notion of computation. As in the l -calculus, there are rules for rewriting programs with the aim of putting them in normal (or, as it is called in algebra, closed) form.

REMARK 4.3.8 The operational and logical interpretations satisfy

omitted eqnarray* environment

where x and y are distinct variables with x,y Ï FV(a) and y Ï FV(b). Of course skip is the identity and ; is composition. The last law, which is not redundant, says that x: = x does nothing. Assignment is the simplest case of conflict between the names of input and output variables, which must be resolved by renaming. We shall discuss the orientation of these laws as reduction rules in Remark 8.2.7.

THEOREM 4.3.9 Every program of the direct declarative language is equivalent (with respect to the above laws) to one in normal form:

put ®
z
 
= ®
a
 
; discard ®
x
 
; put ®
y
 
= ®
z
 
; discard ®
z
 
where FV([(a)\vec]) Ì [(x)\vec] are the inputs and [(y)\vec] the outputs. This means

(a)
first the (renamed) outputs are declared by \bnfname terms in which only the input variables occur, not the output ones;

(b)
then the input variables are discarded;

(c)
finally the output variables are renamed. []

The proof (by induction on the length of the program) is left as a valuable exercise. The point is to show that the laws suffice to capture the familiar process of eliminating intermediate variables (p, q, etc in the Example). Beware that we are only normalising the connectives defined in this section: the theorem says nothing about any normal forms of the \bnfname terms themselves ( eg from the l-calculus). The normal form is unique (up to order, which may be canonised, and the choice of new names), so it may be used to compare programs and make deductions about commutative diagrams (Remark 7.6.12). But it is not a good way to define the category, as (the proof of) the theorem is needed every time we compose two morphisms, cf Example  1.2.1.

REMARKS 4.3.10

(a)
Normalisation uses the second law from left to right, but (in the opposite sense) the derived law
putx = a[z: = u] \leadsto putz = u; put x = a; discardz
decomposes terms in algebra into operation-symbols and sub-terms, and hence to operation-symbols alone. On a machine the latter are calls to library routines, so this process ( compilation) reduces any program to a sequence of such calls. The second law then becomes redundant, being replaced by those of the language itself.

(b)
The weak variable convention is necessary to define the category composed of programs.

The object at each semicolon is the set of typed variables in scope there.

In particular the source and target of the program are the lists of types of the input and output variables respectively. It is convenient to assume that these have no local variable names in common.

(c)
From the example and theorem, we see that the put and discard commands do not form a nested system like proof boxes; indeed if they did the program would just throw away its results.

Remark 1.6.5 already gives some freedom to choose when to close ( $E )-boxes and definitions. The laws for discard extend this conservatively, ie they give no new equivalences between properly nested boxes. In fact the proof of Lemma 1.6.6 illustrates that the natural scope of ( $E )-boxes is not necessarily nested.

(d)
The discard commands (which Floyd called undefine) have been added for an exact match with the categorical concepts which are used to interpret the language, but experience shows that a compiler can do a more accurate and efficient job if it is given better type and resource information about the intentions of the programmer. If the variables obey the strong Convention  1.1.8 ( ie they cannot be reused, so assignment is not allowed) the laws allow us to move all of these commands to the end of the program. As compilers do this automatically, discard is redundant in programming.

The language defines a sketch whose objects are sets of typed variables; the maps are programs whose source and target objects are the variables which are in scope before and after. We have given familiar covariant and contravariant actions, and five laws which are sound for them. The normal form theorem shows that the substitution action is faithful , and indeed that we need only consider its effect on variables; in other words the five laws provide a complete axiomatisation.

The category of contexts and substitutions   These programs are not exactly the notation which we introduced in Section  1.1, but the difference is ``syntactic sugar.''

DEFINITION 4.3.11 The category of contexts and substitutions, which is called Cn×L, is presented by the following elementary sketch:

(a)
The objects of \CloneLx are the contexts of L (Notation 2.3.6), ie finite lists of distinct variables, [x1:\typeX1,x2:\typeX2,¼,xk:\typeXk], together with their types. In shorthand, [[(x)\vec]:[(X)\vec]], [[(X)\vec]], [(x)\vec] or G, with len[(x)\vec] = k. As far as Section  4.6, the \typeXi will just be base types (sorts), then in Sections 4.7 and 5.3 we shall begin to allow expressions such as X® Y and X+Y for the types of the terms. Lists of types on both sides of the turnstile mean products.

(b)
The generating morphisms are put (declaration) and discard, ie

The extra variable x may be inserted anywhere in the list.

(c)
The Extended Substitution Lemma ( 1.1.12) gives the laws: omitted eqnarray* environment where x\not º y, x,y Ï FV(a) and y Ï FV(b). They are shown as commutative diagrams in Section 8.2.

For a Horn theory, Theorem 3.9.1 generated the analogous preorder Cn\landL of contexts under provability, by the two cases of omitting a proposition from a context and using a single instance of the closure condition.

There is nothing in this construction which is peculiar to either algebra, programming or the l-calculus: it may be applied to any typed calculus of substitutions. The maps have been written with a substitution notation, because this is the notion of composition. The l-calculus defines another composition operation via abstraction and application, but it is only associative after the b- and h-rules (Definition 4.7.6) have been imposed, and then the two forms of composition agree. Substitution is a primitive of symbolic manipulation, the l-calculus is not.

In Section 4.7 we shall begin to add type constructors such as ® to the logic. Then [f:(X® Y)] will be a valid context, and so will be added to the category as a new object. We shall write Cn® L for the larger category, in which the morphisms are formed by l-abstraction and application. But it turns out (and this is an important theorem) that there will be no additional maps between the old objects, nor do maps which were previously distinguished become equal: we say that Cn® L is a conservative extension of Cn×L. It may provide more powerful methods of reasoning, without doing anything which we couldn't have done before. In other words, it gives short proofs of facts which were already true in the simpler system, but which would have taken much (maybe hyper-exponentially) longer to prove. Other type constructors extend the category further, and we write Cn[]L for the generic situation; in fact we have already made such an extension from the unary case in Section 4.2. We shall discuss conservativity in Sections 7.6 and 7.7.

Terms as sections   Theorem 4.3.9 substantiates the remarks about simultaneous substitution which we made after we first introduced the Substitution Lemma 1.1.5.

NOTATION 4.3.12 Any map may be written uniquely as a multiple or simultaneous substitution,

omitted diagram environment

ie a sequence of bindings of terms to variables, where yj and \argaj have type \typeYj, and FV(\argaj) Ì {[(x)\vec]}. Composition is by simultaneous substitution and the identity is [[(x)\vec]: = [(x)\vec]].

The sources and targets of the maps are ambiguous in this notation. There may be more variables in the source context than are mentioned in the substituted terms, and it is not clear whether they should survive or be forgotten in the target. Indeed we took advantage of the ambiguity by saying that the substitutions for different variables commute. The (strict) notion of commutativity in monoids does not extend to categories because the sources must agree with the targets, but it becomes meaningful in situations like this, where maps with different endpoints have ``essentially'' the same effect, differing only in their passive contexts. This is what led to the ``commutative'' diagram terminology.

COROLLARY 4.3.13

(a)
A map a:G® [G,x:X] is a single substitution iff it is a section of the display [^(x)]:[G,x:X]® G , ie a;[^(x)] = \idG . Then a = [x: = a], where the term G\vdash a:X is determined by a = a*x.

(b)
Sections a:G® [G,x:X] º Gx X correspond bijectively to total functional relations G\leftharpoondown \rightharpoonup X, cf Exercise 1.14.

(c)
The clone \CloneL(G,X) is isomorphic to the hom-set Cn×L(G,[x:X]) , whence the deliberate use of similar notation.

(d)
More generally, Cn×L(G,[[(x)\vec]:[(X)\vec]]) º Õi = 1len[( X)\vec] \CloneL(G,\typeXi). []

We shall use display maps and their sections to recover a generalised algebraic from a category in Chapter VIII.

Use of variables   In the last section terms of a unary language were called (oriented) strings of operations.

REMARK 4.3.14 Let's say wires instead; then a morphism in the many-argument version is to a term in the unary one as a multi-core cable is to a single wire. We now need a way to distinguish the wires, where we did not before: this is what variables do. We use German and italic letters for the cables and single wires respectively.

In the foregoing account we have chosen to colour the wires with variable-names. Many authors prefer to number the pins in the plugs and sockets instead. That is, they specify that x1,¼,xn are the actual names rather than meta-variables ( cf Definition  1.1.9), and have to renumber them at every stage. According to this convention, a morphism is a list of equivalence classes of assignments of the form [x1,¼,xn® a], where

[x1,¼,xn® a] = [y1,¼,yn® a[ ®
x
 
: = ®
y
 
]] .
This ubiquitous renaming needlessly complicates the formal discussion of the notation. It is useful as an informal way of naming a single map, to avoid choosing names for the target variables. However, in a diagram, where such maps are to be composed, it is often more convenient to attach distinct variables to the objects ( ie to adopt our convention) in order to use them in a global symbolic argument about commutativity.

In this book we shall keep an explicit distinction between free variables, so that [x:X] and [y:X] are isomorphic but unequal contexts. The difference is that our category has many isomorphic duplicates of each object X, but the leaner one may be obtained from it by a straightforward construction from abstract category theory (Exercise  4.7). We have chosen this convention in order to take best advantage of variables as they are normally used in mathematics, namely to relate quantities defined in one part of any argument to their use in another.

A change of free variables we call open a-equivalence . When we come to l-abstraction, terms differing only by the names of corresponding bound variables will be considered to be the same. The context says which free variables are allowed, but the bound ones are unlimited.

A map is in fact not just a cable but a device with inputs and outputs. Corollary  4.3.13(d) says that if the output is a tuple then it may be split into several (multiple-input,) single-output devices. Yet another common metaphor is to think of each term as a tree (Remark  1.1.1) and the maps are forests (collections of trees); composition is by substitution of the roots of one forest for the leaves of another! Theorem 3.9.1 also described Cn\landL using proof trees.