How I teach the Algorithms course

Paul Taylor

30 January 2001

This course is about understanding the behaviour, implementation, correctness and complexity of some well known array algorithms, especially for sorting and searching.

It is taken by first year BSc (G500) and MSci (G501) Computer Science and second year BSc Computer Science and Mathematics (GG15) and Mathematics and Computing (GG51) students.

It was introduced by Richard Bornat in January 1997, and I took it over in January 1999, so I have taught it three times.

The message that course is meant to send to students is

technically about components and their specifications, and
philosophically that it is not obvious what programs do - you have to think about them and do experiments.

The purpose of this note is to explain (mainly to my colleagues at QMW) my own approach to this course, and in particular to address certain misunderstandings that are in circulation: that

the course is a collection of sorting algorithms,
it's about big-O notation,
it's too mathematical,
it ought to be about data structures and
it has a high failure rate.

It also describes the things that I have contributed to the course, in particular my TESTER, and my treatment of components, specifications, correctness proofs and the subtleties of some of the algorithms.

1 Data structures

Similar courses to mine are often called Algorithms and Data Structures, and the material that I inherited from Richard Bornat concluded with treatments of topics such as balanced binary trees.

I am not clear whether those of my colleagues who advocate teaching data structures in this course mean

brief descriptions of how to program linked lists, trees, etc. in JAVA, or
consideration of algorithms that make use of such structures, such as balancing of binary trees.

If they mean real consideration of algorithms, then this complaint is inconsistent with the other common complaint that the course is too difficult. Partly on Richard Bornat's advice, I took out the balanced binary trees, because I felt that there were plenty of things to learn about algorithms that could be done just with arrays.

If they just mean brief descriptions, it seems to me that such a course would just be a zoo, which would warrant exactly the criticism of the collection of sorting algorithms that is made.

I was persuaded to teach Dijsktra's algorithm for finding the shortest path between nodes of a network, as one of the end-of-term options. I tried very hard to present this using arrays, but it turned out to be essential to use linked lists of objects. I feel that this went beyond the proper scope of the course. (On another occasion, I might teach Walshall's algorithm for the transitive closure, which contains similar concepts but can be implemented with arrays alone.)

In point of fact, JAVA is a singularly inappropriate language for abstract data types. For example, to implement an arithmetical expression tree, we need an abstract class for expressions, together with subclasses for each of the several cases of the arithmetical operators. The JAVA compiler requires each of these subclasses to be in a separate file. On the tree, we intend to implement several algorithms (in this case it might be evaluation, the encoding in some machine or programming language, or maybe symbolic differentiation). There is therefore a matrix of pieces of code to write.

Natural, modular programming would put all of the cases for each algorithm in a file, but
Object-oriented programming requires each case for all of the algorithms in a file, so when we add an algorithm, we have to add methods to every one of the existing files.

(It was Appel's book, Modern Compiler Implementation in Java, pp99-100, that first drew my attention to this "transposed matrix" problem, but I have since experienced it myself in several ways.)

Maybe that makes sense for the GUI applications for which JAVA was designed, but this style of programming would be an additional obstacle for first year students.

In fact I consider that there ought to be a course on data structures - in the second year.

Despite my complaints about JAVA, I am happy to use it to teach such a course. JAVA has its virtues, and I would teach its use for such problems seriously. But I would at the same time invite students to form their own judgements, and take a level-headed attitude to the dogmas of Computer Science such as Object-Oriented programming.

2 Sorting algorithms

The goal of the group projects in the second year Software Engineering course in 2000-1 was to write a web server to help people choose holiday destinations. In the previous three years, the projects were about playing Othello, Draughts and Backgammon.

Does this mean that Norman Fenton teaches a course about travel agencies, or that Graem Ringwood and Victoria Stavridou taught about games?

Of course not. These applications are vehicles to carry ideas of Software Engineering, or programming in the large.

The ideas of my course are about specification of components, or programming in the small.

Sorting serves well as my top-level specification, because there is no need to spend time motivating or explaining it. As it is a common theme throughout the course, the students and I can put this top-level specification to the back of our minds and concentrate on the things that we can learn from the implementation of different algorithms and the specifications of their components.

Having inherited teaching materials about sorting algorithms, it was quite difficult for me (as well as the students) to see what we were supposed to do in the labs, when JAVA code for these algorithms is readily available in textbooks and on the Web.

I believe that I have now identified those things, and that is what I concentrate on teaching.

I am, of course, open to suggestions for other algorithms that I might consider in place of the well know sorting algorithms. In fact, such suggestions would be extremely valuable, as I am running out of ideas for exam questions.

In many ways, the textbooks (such as Weiss) seriously got in my way, which is why I stopped recommending any of them.

3 Components and specifications

For teaching the algorithms, I adopt the slogan, one problem, one specification, one program, one loop, one invariant, one proof.

The quadratic sorting algorithms (in particular insertion sort and selection sort) have two loops, where the inner one solves one of the following problems:

search a sorted segment of an array for where to insert a new value,
shift a segment of an array
search a segment of an array for the position of the least value.

I get the students to study and implement these algorithms before putting them together with the insertion or selection sort proper. (In fact I start with simple linear search, in both directions).

These preliminary exercises are extremely valuable.

Left-to-right linear search is the Hello World problem for my TESTER and this course. It shows up those students that have failed to learn basic JAVA syntax for loops and arrays, or who do not understand the difference between pointers and values in an array.
Right-to-left linear search is a symmetrical problem, but they write much more complicated code, so this is an opportunity to teach style and simplicity of programming.
I forget from one year to the next how traumatic they find the shift problem (at first they have emotional difficulties with trampling over data!), so this exercise is a rite of passage from writing toy programs and starting to be a programmer. Again style and simplicity of coding are the key, especially when it comes to getting both the left-to-right and right-to-level versions to work.
I make them check the validity of the arguments of the shift method and throw an exception. This problem is therefore a hook on which to hang discussion of overflowing arrays and its consequences for security of programs, as well as when to catch exceptions. This can be illustrated by the Ariane V disaster.
When they come to fit the components together, they learn how to match actual and formal arguments of subroutines.

4 Complexity theory

Complexity theory, like Number theory, is a peculiarly discontinuous mathematical discipline. There are some easy things, after which there is a huge gap: the next things to consider are much more difficult mathematically.

The (mainly American) textbooks on Algorithms use a lot of Calculus. I consider this largely irrelevant to the analysis of algorithms at the level that is feasible for undergraduates in Computer Science or even Mathematics. I suspect that it appears in the textbooks simply because of the heavy emphasis on Calculus that is traditional in the American university curriculum as a whole.

The formal definition of the O(n) notation itself was inherited from Real Analysis and is a handy way of describing convergence, and in particular the asymptotic behaviour of functions.

But asymptotic behaviour is wholly inappropriate for the understanding of how real programs behave on real computers:

Real hardware works differently for kilobytes (in CPU cache), megabytes (in RAM), gigabytes (in disk) and terabytes (on some other media), so the mathematical models that apply to one do not apply to the others. In fact some of my exercises actually demonstrate the discontinuities when caches overflow.
Real programs written by real students consist of nested loops, so certain standard functions are more or less certain to fit statistically throughout the range (rather than asymptotically), if anything will.

For these reasons I decided to teach the big-O notation simply as jargon, with terms such as logarithmic and quadratic as equivalents. (Clearly this is not an approach that a mathematician would choose lightly!)

What is the reason for teaching complexity at all at this level? I believe that it is to make students aware of one of the reasons why programs still take such a long time to execute, namely indiscriminate use of nested loops that traverse datasets.

Unfortunately, there is a difficulty at a much lower level: students themselves only understand mathematical formulae at a jargon or symbolic level. I am still struggling to find a way to get them to understand the practical implications of fast-growing functions (see the exam question).

5 Automatic testing

There is no point in writing these programs if students only test them by poking. My impression during the first year in which I taught the course was that many students thought that they were doing and had completed the exercises when in fact they had not.

The TESTER was written to be an automatic teaching assistant that would never let them get away with incorrect programs. As they are supposed to be writing components, it also provides the main() method: an interface whereby they can supply specific or random data to their code. The TESTER does comprehensive correctness testing, and complexity analysis. The TESTER checks the students' implementations of the algorithms for them, but by an extension of the same theme, they could be (black box) testing alleged implementations of specifications. The original version of the TESTER included stubs of code of code with this in mind, but I have not as yet had time to develop either this feature of my program or teaching material that would exploit it.

Having totally abandoned a mathematical approach to complexity theory, I invested a lot of effort in the empirical approach, i.e. timing programs in labs. In his lecture notes, Richard Bornat talks about running races to compare the various sorting algorithms, but the fact is that he did not have the technology to do this, and nor did I until I wrote my TESTER.

The TESTER obviously embodies a rather large investment of time, both in the correctness testing and the timings. Such an investment ought to be developed and utilised by the Department, not discarded.

Its primary job in the latter case is to decide which function fits the data, and secondarily to report to coefficients. This decision begins from the assumption that one of the standard functions (constant, logarithmic, linear, n log(n), quadratic or some power) is appropriate, based on the crude observation of the general nature of the code that students are likely to have written. The best fit for each function is calculated, using a standard least squares method with weighting. Then

binary search only fits the logarithm (so long as the array sizes range over several orders of magnitude, say from 4 to 250,000) whilst the other algorithms do not fit it, i.e. the correlation coefficients are 99% and 40%;
quadratic and nlog(n) are similarly orthogonal;
the quadratic coefficient for linear algorithms is less than 10 picoseconds, which is far less than one machine operation takes;
the nlog(n) coefficient for linear algorithms does not have such a wide margin of error, so can often be mis-identified.

6 Mathematical requirements

As I have explained, I have removed mathematical material that is traditionally contained in a course such as this.

What I added mathematically was the Floyd-Hoare-Dijkstra style logical proofs of correctness. I had learned this from small group teaching on the corresponding course at Imperial College, out of which came the book by Broda et al..

I am very skeptical of the value of formal approaches of any kind (whether to correctness proofs or to Software Engineering), so my treatment is considerably less formal than the one I had learned, and is largely pictorial.

In fact I had learned loop invariants for myself at school, simply by documenting the meanings of program variables at the top of each loop. (I wrote a suite of programs to calculate mathematical functions from power series, for a programmable calculator with a small number of numbered memories, so I had to document the usage of these memories (variables), and did so "at the point of the loop test".)

In the second exam that I set (May 2000), students at pass level could reproduce reasonably accurately the loop invariant diagrams that I had drawn repeatedly for insertion and selection sort. So far, only the A grade students could reproduce the arguments that used the diagrams to demonstrate correctness.

Although this logical thread dominated my treatment in the first year that I taught the course (1998-9), I played it down in the second year. Instead, I spent more time on empirical observations of the behaviour of the algorithms, expressed without formal logical or programming notation.

7 Exam results

I put a lot of care into marking - and am criticised for doing so.

I get a spread of marks all the way from 0 to 100%, whereas a bookwork course such as Discrete Structures gets a bell curve between 30% and 75%.

I give a lot of well deserved A grades. These students come away having thrown themselves wholeheartedly into the course and having really learned some Computer Science.

I also give a lot of C grades. These students have done most of the work in the labs and have learned from the course.

The results trail down to and below the pass mark. I consider very carefully whether students near the boundary have learned something from the course, and my resit paper is worded and marked to do this too.

The percentage that pass my course is similar to that for Introduction to Programming and other first year courses in the Department.

At the very bottom are maybe 20 students that have clearly opted out of the course at an early stage (the first three weeks). The modular course system forces all students to register for exactly eight courses before Week 2 of the Spring term. Many students, wisely or otherwise, study actively for fewer than eight courses in each year and there is no incentive for them to deregister.

Despite this, the actions and opinions of these students impose a dumbing-down pressure on courses such as mine, both through their influence on the exam statistics and by giving them the opportunity to make anonymous personal insults about lecturers through multiple-choice course questionnaires. I am happy to receive considered criticism from students who participate in the course (the mediocre ones as well as the best), but it is offensive that comments by non-participants should be given equal weight in reviewing courses.

Therefore, percentages are misleading because the performance of these students is beyond my control. Without imputing negligence to our departmental student administrator, who of course only has the information about attendance that lecturers give her, these additional numbers are essentially a bureaucratic oversight.

The way that I advocate for dealing with the failure rate is to put greater pressure on students to do the work, and to deregister those that are not participating, under the College's General Regulation 5.8 regarding Attendance at lectures, etc..

The TESTER provides one way of monitoring students' work, as it sends a logging message to me (by UDP) every time it is run.

The list of students currently logged in to the machines in the ITL during timetabled labs is also automatically saved.

Students have been told to do their work in a particular directory, which is copied every night to a place where I can read and analyse it.

8 Books and Web links

College course directory (one paragraph, years out of date)
Departmental information sheet (one page)
My teaching materials Appel, Andrew, Modern Copiler Implemenmtation in Java, Cambridge University Press, 1998.
Broda, Krysia, Eisenbach, Susan, Khoshnevisan, Hessam., Vickers, Steven, Reasoned Programming, Prentice Hall, International Series in Computer Science, 1994, 0-13-098831-6.
Weiss, Mark Allen, Data Structures and Problem Solving Using Java, Addison-Wesley, 1998, http://cseng.aw.com/bookdetail.qry?ISBN=0-201-54991-3, 0-201-54991-3, QA76.73.

This is www.PaulTaylor.EU/algorithms/howIteach.html and it was derived from algorithms/howIteach.tex which was last modified on 31 December 2007.