How I teach the Algorithms course
Paul Taylor
30 January 2001
This course is about understanding the behaviour, implementation,
correctness and complexity of some well known array algorithms,
especially for sorting and searching.
It is taken by first year
BSc (G500)
and
MSci (G501) Computer Science
and second year
BSc Computer Science and Mathematics (GG15) and
Mathematics and Computing (GG51) students.
It was introduced by Richard Bornat in January 1997, and I took it
over in January 1999, so I have taught it three times.
The message that the course is meant to send to students is
- technically about components and their
specifications, and
- philosophically that
it is not obvious what programs do -
you have to think about them and do experiments.
The purpose of this note is to explain (mainly to my colleagues at
QMW) my own approach to this course, and in particular to address
certain misunderstandings that are in circulation: that
- the course is a collection of sorting algorithms,
- it's about big-O notation,
- it's too mathematical,
- it ought to be about data structures and
- it has a high failure rate.
It also describes the things that I have contributed to the course, in
particular my TESTER, and my treatment of components, specifications,
correctness proofs and the subtleties of some of the algorithms.
1 Data structures
Similar courses to mine are often called
Algorithms and Data Structures,
and the material that I inherited from Richard Bornat
concluded with treatments of topics such as balanced binary trees.
I am not clear whether those of my colleagues who advocate teaching
data structures in this course mean
- brief descriptions of how to program
linked lists, trees, etc. in JAVA, or
- consideration of algorithms that make use of such
structures, such as balancing of binary trees.
If they mean real consideration of algorithms, then this complaint
is inconsistent with the other common complaint that the course is
too difficult.
Partly on Richard Bornat's advice, I took out the balanced binary trees,
because I felt that there were plenty of things to learn about algorithms
that could be done just with arrays.
If they just mean brief descriptions, it seems to me that such a course
would just be a zoo, and would deserve exactly the "collection of
sorting algorithms" criticism that is made of the present course.
I was persuaded to teach Dijkstra's algorithm for finding the shortest
path between nodes of a network, as one of the end-of-term options.
I tried very hard to present this using arrays, but it turned out to
be essential to use linked lists of objects.
I feel that this went beyond the proper scope of the course.
(On another occasion, I might teach Warshall's algorithm for the
transitive closure, which contains similar concepts but can be
implemented with arrays alone.)
In point of fact, JAVA is a singularly inappropriate language
for abstract data types. For example, to implement an arithmetical
expression tree, we need an abstract class for expressions, together
with subclasses for each of the several cases of the
arithmetical operators. The JAVA compiler requires each of these
subclasses to be in a separate file. On the tree, we intend
to implement several algorithms (in this case it might be
evaluation, the encoding in some machine or programming
language, or maybe symbolic differentiation). There is
therefore a matrix of pieces of code to write.
- Natural, modular programming would put all of the cases for
each algorithm in a file, but
- Object-oriented programming requires the file for each case to contain
a method for every algorithm, so when we add an algorithm, we have to add
methods to every one of the existing files (as the sketch below illustrates).
(It was Appel's book, Modern Compiler
Implementation in Java, pp99-100, that first drew my attention to
this "transposed matrix" problem, but I have since experienced it
myself in several ways.)
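To make this matrix concrete, here is a minimal sketch (the class and
method names are my own illustration, not taken from the course
materials): two algorithms, evaluation and a toy compilation, spread
across an abstract class and its subclasses. Adding a third algorithm,
say symbolic differentiation, would mean adding a method to every one
of these classes.

    // Illustrative sketch of the "transposed matrix" problem.
    abstract class Exp {                          // file Exp.java
        abstract int eval();                      // algorithm 1: evaluation
        abstract String compile();                // algorithm 2: code generation
        // a third algorithm would be declared here and then
        // implemented in every subclass file below
    }

    class Num extends Exp {                       // file Num.java
        private final int value;
        Num(int value) { this.value = value; }
        int eval()       { return value; }
        String compile() { return "PUSH " + value; }
    }

    class Plus extends Exp {                      // file Plus.java
        private final Exp left, right;
        Plus(Exp left, Exp right) { this.left = left; this.right = right; }
        int eval()       { return left.eval() + right.eval(); }
        String compile() { return left.compile() + " " + right.compile() + " ADD"; }
    }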
Maybe that makes sense for the GUI applications for which JAVA was designed,
but this style of programming would be an additional obstacle
for first year students.
In fact I consider that
there ought to be a course on data structures -
in the second year.
Despite my complaints about JAVA, I am happy to use it to teach such a
course. JAVA has its virtues, and I would teach its use for such problems
seriously.
But I would at the same time invite students to form their own judgements,
and take a level-headed attitude to the dogmas of Computer Science
such as Object-Oriented programming.
2 Sorting algorithms
The goal of the group projects in the second year Software Engineering course
in 2000-1 was to write a web server to help people choose holiday
destinations. In the previous three years, the projects
were about playing Othello, Draughts and Backgammon.
Does this mean that Norman Fenton teaches a course about travel agencies,
or that Graem Ringwood and Victoria Stavridou taught about games?
Of course not. These applications are vehicles to carry ideas
of Software Engineering, or programming in the large.
The ideas of my course are about specification of components,
or programming in the small.
Sorting serves well as my top-level specification,
because there is no need to spend time motivating or explaining it.
As it is a common theme throughout the course,
the students and I can put this top-level specification to the back of
our minds and concentrate on the things that we can learn from
the implementation of different algorithms and the specifications
of their components.
Having inherited teaching materials about sorting algorithms,
I found it quite difficult (as did the students) to
see what we were supposed to do in the labs,
when JAVA code for these algorithms is readily available in
textbooks and on the Web.
I believe that I have now identified those things,
and that is what I concentrate on teaching.
I am, of course, open to suggestions for other algorithms
that I might consider in place of the well known sorting algorithms.
In fact, such suggestions would be extremely valuable, as I am
running out of ideas for exam questions.
In many ways, the textbooks (such as Weiss)
seriously got in my way,
which is why I stopped recommending any of them.
3 Components and specifications
For teaching the algorithms, I adopt the slogan,
one problem, one specification, one program, one loop, one invariant,
one proof.
The quadratic sorting algorithms (in particular insertion sort and
selection sort) have two loops, where the inner one solves one of the
following problems:
- search a sorted segment of an array for where to insert a new value,
- shift a segment of an array, or
- search a segment of an array for the position of the least value.
I get the students to study and implement these algorithms before
putting them together with the insertion or selection sort proper.
(In fact I start with simple linear search, in both directions).
These preliminary exercises are extremely valuable.
- Left-to-right linear search is the Hello World problem
for my TESTER and this course. It shows up those students that have
failed to learn basic JAVA syntax for loops and arrays,
or who do not understand the difference between pointers and
values in an array.
- Right-to-left linear search is a symmetrical problem, but they write
much more complicated code, so this is an opportunity to teach
style and simplicity of programming.
- I forget from one year to the next how traumatic they find the shift
problem (at first they have emotional difficulties with trampling over
data!), so this exercise is a rite of passage from writing toy
programs to starting to be a programmer.
Again, style and simplicity of coding are the key, especially
when it comes to getting both the left-to-right and right-to-left versions
to work.
- I make them check the validity of the arguments of the shift method
and throw an exception (as in the sketch after this list).
This problem is therefore a hook on which to hang discussion of
overflowing arrays and their consequences for the security
of programs,
as well as of when to catch exceptions.
This can be illustrated by the
Ariane V disaster.
- When they come to fit the components together, they learn how to
match actual and formal arguments of subroutines.
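To make the component idea concrete, here is a minimal sketch of two of
the inner-loop components and of how insertion sort fits them together.
The method names, the wording of the specifications in the comments and
the choice of IllegalArgumentException are mine, for illustration only.

    // Two components and the sort that uses them (illustrative sketch).
    public class Components {

        /** Index of the first position in the sorted segment a[0..limit)
            whose element exceeds x; returns limit if there is none. */
        static int findInsertionPoint(int[] a, int limit, int x) {
            int i = 0;
            while (i < limit && a[i] <= x) {
                i++;
            }
            return i;
        }

        /** Shift a[from..to) one place to the right, checking the
            arguments rather than letting the array overflow silently. */
        static void shiftRight(int[] a, int from, int to) {
            if (from < 0 || to >= a.length || from > to) {
                throw new IllegalArgumentException(
                    "shiftRight(" + from + ", " + to + ") on array of length " + a.length);
            }
            for (int i = to; i > from; i--) {   // right to left, so no data is trampled
                a[i] = a[i - 1];
            }
        }

        /** Insertion sort: one loop that calls the two components. */
        static void insertionSort(int[] a) {
            for (int k = 1; k < a.length; k++) {
                int x = a[k];
                int i = findInsertionPoint(a, k, x);  // where x belongs in a[0..k)
                shiftRight(a, i, k);                  // make room; a[k] is overwritten
                a[i] = x;
            }
        }
    }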
4 Complexity theory
Complexity theory, like Number theory, is a peculiarly discontinuous
mathematical discipline.
There are some easy things, after which there is a huge gap:
the next things to consider are much more difficult mathematically.
The (mainly American) textbooks on Algorithms use a lot of Calculus.
I consider this largely irrelevant to the analysis of algorithms
at the level that is feasible for undergraduates in Computer Science
or even Mathematics.
I suspect that it appears in the textbooks simply because of the heavy
emphasis on Calculus that is traditional in the American university
curriculum as a whole.
The formal definition of the O(n) notation itself was inherited
from Real Analysis and is a handy way of describing convergence,
and in particular the asymptotic behaviour of functions.
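In symbols, that definition says that f is eventually bounded by a
constant multiple of g:

    f(n) = O(g(n)) \iff \exists C>0.\ \exists N.\ \forall n \ge N.\ |f(n)| \le C\,|g(n)|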
But asymptotic behaviour is wholly inappropriate for
the understanding of how real programs behave on real computers:
- Real hardware works differently for kilobytes (in CPU cache),
megabytes (in RAM), gigabytes (on disk) and terabytes (on some other media),
so the mathematical models that apply to one do not apply to the others.
In fact some of my exercises actually demonstrate the discontinuities
when caches overflow.
- Real programs written by real students consist of nested loops,
so certain standard functions are more or less certain to
fit statistically throughout the range (rather than asymptotically),
if anything will.
For these reasons I decided to teach the big-O notation simply as
jargon, with terms such as logarithmic and
quadratic as equivalents. (Clearly this is not an approach that a
mathematician would choose lightly!)
What is the reason for teaching complexity at all at this level?
I believe that it is to make students aware of one of the reasons
why programs still take such a long time to execute,
namely indiscriminate use of nested loops that traverse datasets.
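A typical culprit (my own illustration, not an exercise from the
course) looks like this: one innocent-looking inner loop turns a single
pass over the data into roughly n*n/2 comparisons.

    // Counting duplicate pairs with a doubly nested loop: for n elements
    // this does about n*n/2 comparisons, so doubling the dataset roughly
    // quadruples the running time.
    static int countDuplicatePairs(int[] data) {
        int pairs = 0;
        for (int i = 0; i < data.length; i++) {
            for (int j = i + 1; j < data.length; j++) {
                if (data[i] == data[j]) {
                    pairs++;
                }
            }
        }
        return pairs;
    }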
Unfortunately, there is a difficulty at a much lower level: students
themselves only understand mathematical formulae at a jargon or
symbolic level. I am still struggling to find a way to get them to
understand the practical implications of fast-growing functions (see
the exam question).
5 Automatic testing
There is no point in writing these programs if students only
test them by poking. My impression during the first year in which I
taught the course was that many students thought that they were doing
the exercises, and had completed them, when in fact they had not.
The TESTER was written to be an
automatic teaching assistant
that would never let them get away with incorrect
programs. As they are supposed to be writing components, it
also provides the main() method: an interface whereby they
can supply specific or random data to their code. The TESTER does
comprehensive correctness testing and complexity analysis. The
TESTER checks the students' implementations of the algorithms for
them but, by an extension of the same theme, the students could
themselves be (black box) testing alleged implementations of
specifications. The original version of the TESTER included stubs of
code with this in mind, but I have not as yet had time to develop
either this feature of my program or teaching material that would
exploit it.
Having totally abandoned a mathematical approach to complexity theory,
I invested a lot of effort in the empirical approach, i.e. timing programs in labs. In his lecture notes, Richard
Bornat talks about running races to compare the various sorting
algorithms, but the fact is that he did not have the technology to do
this, and nor did I until I wrote my TESTER.
The TESTER obviously embodies a rather large investment of time,
both in the correctness testing and the timings.
Such an investment ought to be developed and utilised by the Department,
not discarded.
Its primary job in the latter case (the timings) is to decide which
function fits the data, and secondarily to report the coefficients
(the fitting idea is sketched at the end of this section).
This decision begins from the assumption that one of the
standard functions (constant, logarithmic, linear, n log(n),
quadratic or some power) is appropriate, based on the crude
observation of the general nature of the code that students are likely
to have written. The best fit for each function is calculated, using
a standard least squares method with weighting. Then
- binary search only fits the logarithm (so long as the
array sizes range over several orders of magnitude, say from 4 to
250,000) whilst the other algorithms do not fit it, i.e. the
correlation coefficients are around 99% and 40% respectively;
- quadratic and n log(n) are similarly orthogonal;
- the quadratic coefficient for linear algorithms is less than
10 picoseconds, which is far less than one machine operation takes;
- the n log(n) coefficient for linear algorithms does not
have such a wide margin of error, so linear algorithms can often be
mis-identified as n log(n).
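For the curious, the fitting idea can be sketched in a few lines of
JAVA. This is not the TESTER's actual code: the weighting is omitted,
the candidate functions and names are mine, and the numbers in main()
are made up purely to show the calls.

    import java.util.function.DoubleUnaryOperator;

    // Fit the model t = c * g(n) by least squares and judge the fit by
    // the correlation between the timings and the model values.
    public class FitSketch {

        /** Least-squares coefficient c minimising the sum of (t[i] - c*g(n[i]))^2. */
        static double coefficient(double[] n, double[] t, DoubleUnaryOperator g) {
            double num = 0, den = 0;
            for (int i = 0; i < n.length; i++) {
                double gi = g.applyAsDouble(n[i]);
                num += t[i] * gi;
                den += gi * gi;
            }
            return num / den;
        }

        /** Pearson correlation between the timings and the model values. */
        static double correlation(double[] n, double[] t, DoubleUnaryOperator g) {
            int m = n.length;
            double st = 0, sg = 0, stt = 0, sgg = 0, stg = 0;
            for (int i = 0; i < m; i++) {
                double gi = g.applyAsDouble(n[i]);
                st += t[i]; sg += gi;
                stt += t[i] * t[i]; sgg += gi * gi; stg += t[i] * gi;
            }
            double cov = stg - st * sg / m;
            return cov / Math.sqrt((stt - st * st / m) * (sgg - sg * sg / m));
        }

        public static void main(String[] args) {
            double[] n = { 4, 64, 1024, 16384, 262144 };   // array sizes
            double[] t = { 1, 6, 10, 14, 18 };             // made-up timings
            DoubleUnaryOperator log = x -> Math.log(x);
            DoubleUnaryOperator lin = x -> x;
            System.out.printf("log fit:    c=%.3f r=%.3f%n",
                    coefficient(n, t, log), correlation(n, t, log));
            System.out.printf("linear fit: c=%.6f r=%.3f%n",
                    coefficient(n, t, lin), correlation(n, t, lin));
        }
    }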
6 Mathematical requirements
As I have explained, I have removed mathematical material that
is traditionally contained in a course such as this.
What I added mathematically was the Floyd-Hoare-Dijkstra style
logical proofs of correctness. I had learned this from small group
teaching on the corresponding course at Imperial College,
out of which came the book by Broda et al.
I am very skeptical of the value of formal approaches of any kind
(whether to correctness proofs or to Software Engineering), so my
treatment is considerably less formal than the one I had
learned, and is largely pictorial.
In fact I had learned loop invariants for myself at school,
simply by documenting the meanings of program variables at
the top of each loop. (I wrote a suite of programs to calculate
mathematical functions from power series, for a programmable
calculator with a small number of numbered memories, so I had
to document the usage of these memories (variables), and did so
"at the point of the loop test".)
In the second exam that I set (May 2000),
students at pass level could reproduce reasonably accurately the loop
invariant diagrams that I had drawn repeatedly for insertion
and selection sort.
So far, only the A grade students could reproduce the arguments
that used the diagrams to demonstrate correctness.
Although this logical thread dominated my treatment in the first year
that I taught the course (1998-9), I played it down in the second
year. Instead, I spent more time on empirical observations of the
behaviour of the algorithms, expressed without formal logical or
programming notation.
7 Exam results
I put a lot of care into marking - and am criticised for doing so.
I get a spread of marks all the way from 0 to 100%, whereas a bookwork
course such as Discrete Structures gets a bell curve between 30%
and 75%.
I give a lot of well deserved A grades. These students come away having
thrown themselves wholeheartedly into the course and having really learned
some Computer Science.
I also give a lot of C grades. These students have done most of the work
in the labs and have learned from the course.
The results trail down to and below the pass mark. I consider very
carefully whether students near the boundary have learned something
from the course, and my resit paper is worded and marked to do this too.
The percentage that pass my course is similar to that for
Introduction to Programming and other first year courses in
the Department.
At the very bottom are maybe 20 students that have clearly opted out
of the course at an early stage (the first three weeks).
The modular course system forces all students to register for
exactly eight courses before Week 2 of the Spring term.
Many students, wisely or otherwise, study actively for fewer than eight
courses in each year and there is no incentive for them to deregister.
Despite this, the actions and opinions of these students impose a
dumbing-down pressure on courses such as mine, both through
their influence on the exam statistics and through the
opportunity that multiple-choice course questionnaires give them
to make anonymous personal insults about lecturers. I am happy
to receive considered criticism from students who participate
in the course (the mediocre ones as well as the best), but it is
offensive that comments by non-participants should be given
equal weight in reviewing courses.
Therefore, percentages are misleading because the
performance of these students is beyond my control. Without
imputing negligence to our departmental student administrator, who of
course only has the information about attendance that lecturers give
her, these additional numbers are essentially
a bureaucratic oversight.
The way that I advocate for dealing with the failure rate is
to put greater pressure on students to do the work, and
to deregister those that are not participating,
under the College's
General Regulation 5.8
regarding Attendance at lectures, etc.
The TESTER provides one way of monitoring students' work,
as it sends a logging message to me (by UDP) every time it is run.
The list of students currently logged in to the machines in the ITL
during timetabled labs is also automatically saved.
Students have been told to do their work in a particular directory,
which is copied every night to a place where I can read and analyse it.
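The logging message itself is a single fire-and-forget UDP datagram,
which takes only a few lines of java.net code. A minimal sketch (the
host name, port and message format here are hypothetical, not those of
the real TESTER):

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;
    import java.nio.charset.StandardCharsets;

    public class TesterLog {
        // Hypothetical destination; the real values are not shown here.
        private static final String LOG_HOST = "logger.example.ac.uk";
        private static final int    LOG_PORT = 9999;

        /** Send one log line; failures are deliberately ignored so that
            logging can never interfere with the student's run. */
        public static void log(String message) {
            try (DatagramSocket socket = new DatagramSocket()) {
                byte[] data = message.getBytes(StandardCharsets.UTF_8);
                DatagramPacket packet = new DatagramPacket(
                        data, data.length, InetAddress.getByName(LOG_HOST), LOG_PORT);
                socket.send(packet);
            } catch (Exception e) {
                // ignore: the exercise must still run if the network is down
            }
        }
    }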
8 Books and Web links
- College course directory (one paragraph, years out of date)
- Departmental information sheet (one page)
- My teaching materials
- Appel, Andrew,
Modern Compiler Implementation in Java,
Cambridge University Press,
1998.
- Broda, Krysia,
Eisenbach, Susan,
Khoshnevisan, Hessam, and
Vickers, Steven,
Reasoned Programming,
Prentice Hall,
International Series in Computer Science,
1994,
ISBN 0-13-098831-6.
- Weiss, Mark Allen,
Data Structures and Problem Solving Using Java,
Addison-Wesley,
1998,
ISBN 0-201-54991-3,
http://cseng.aw.com/bookdetail.qry?ISBN=0-201-54991-3.
This is
www.PaulTaylor.EU/algorithms/howIteach.html
and it was derived from algorithms/howIteach.tex
which was last modified on 31 December 2007.