Introduction to Algorithms:
Mergesort and Quicksort

Paul Taylor

19 February 2001

1 Mergesort and quicksort

These are your first two recursive programs in JAVA (you have already written lots of recursive programs in MIRANDA). There are two approaches to teaching recursion. One is to pull your hair out with worry about how circular definitions could possibly be meaningful. The other is just to get on with it, and that's what we shall do.

Mergesort takes an array, {91 46 16 46 61 46 89 78 27 81 83 37 95 34 75 56} and splits it in two without any attempt to sort it: {91 46 16 46 61 46 89 78} {27 81 83 37 95 34 75 56} then sorts two parts recursively (i.e. by calling itself): {16 46 46 46 61 78 89 91} {27 34 37 56 75 81 83 95} and finally merges them in order: {16 27 34 37 46 46 46 56 61 75 78 81 83 89 91 95} So mergesort does the work on the way out.

Quicksort (maybe we should call it splitsort) takes the array, {91 46 16 46 61 46 89 78 27 81 83 37 95 34 75 56} chooses a pivot value, say 46, and splits the array into two non-empty parts, one consisting of elements that are less than or equal to the pivot value, the other consisting of values greater than or equal to the pivot value: {34 37 16 27 46} {61 89 78 46 81 83 46 95 91 75 56} sorts the two parts recursively: {16 27 34 37 46} {46 46 56 61 75 78 81 83 89 91 95} and finally concatenates them as they are already in order: {16 27 34 37 46 46 46 56 61 75 78 81 83 89 91 95} So quicksort does the work on the way in.

Notice that the " ≤ " and " ≥ " parts are neither sorted nor in the same order that these elements appeared in the given array, and that "=" values may be put in either part. It is not important what order they're in (of course, because we're about to sort them), but this is the order that you get from the splitting process according to the simplest algorithm for doing it.

(It could alternatively split the array into three parts, including one for values equal to the pivot, but this involves a more complicated splitting algorithm, known as the "Dutch National Flag".)

The recursive parts of the programming are easy. What is tricky is to get the merge and split operations to work. We concentrate on those first.

Before you start work on the programming, do trace merge-sort and quick-sort a few times to get a feel for how the splitting and merging work.

2 Write the code for merge

Do not confuse merge with mergesort!

There are no more templates any more: you create your own file called Merge.java containing

   class Merge implements Merging {
      public int[] merge (int[] B, int[] C) {

your code goes here

      }
      public void choice (int c) {}
   }

Assume that the arrays B and C are already sorted (when you have the whole of mergesort in place they will have just been sorted).

Allocate a new array A that's just big enough to contain the merge of B and C.

Merge B and C into A, so that A ends up sorted and containing the same stuff as B and C.

Draw the diagram showing how merge is going to work.

Don't forget to deal with the situation where one of the arrays has been exhausted before the other, for which a simple copy (with no comparisons) is needed.

Run the TESTER on run Merge.java in the usual way. The size command line argument controls the size of B; use size2 for C.

You can use Trace.printarray (B, j) etc.. to see your code in action.

Give it the example above explicitly:

   trace Merge.java array=16,46,46,46,61,78,89,91 array2=27,34,37,56,75,81,83,95

These arrays are already sorted, so the precondition for merge is satisfied, and you should get a sorted result.

Test the situations where all of B comes before all of C, or vice versa.

There is no need to time this code: it is only a subroutine, and is pretty obviously linear in the (total) size of the arrays.

3 Mergesort

Create a file Mergesort.java containing

   class Mergesort implements Sorting {
      public int[ ] sort (int[ ] A) {
         int size = A.length;

your code goes here

      }
      // access to merge(B,C)
      private Merging merger;
      // this is for the hybrid algorithm (Section 6)
      private int cutoff=0;
      public void cutoff (int n) { cutoff=n; }
      private Sorting insertion_sorter;
      public void choice (int c) {}
      public Mergesort() {
         merger = new TestMerge();  // or Merge() to use your own
         insertion_sorter = new Insertion(); 
      }
   }

Begin by finding the midpoint:

    int mid = size/2;

Assign this explicitly to a variable - don't just use this expression verbatim in the code. Remember that size may be odd; it is more important to make the calculation of the left and right "halves" consistent than to make mid exactly A.length/2.

Now declare two new arrays (B[ ] and C[ ]) of appropriate sizes and copy half of the given array into each of them.

Next is where the the recursive calls to sort(B) and sort(C) belong. Write the code, but comment it out (i.e. put // before it) for the time being.

Now use merger.merge (B,C) to merge them, and return the result of the merge as the result of sort.

The code as given above (with TestMerge) uses the merge method that's built into the TESTER, so you can be confident that this part is correct. Substitute your own code (with Merge) later, when you have mergesort working correctly.

Try this out using an array consisting of two sorted halves:

   run Mergesort.java array=16,46,46,46,61,78,89,91,27,34,37,56,75,81,83,95

There is a pitfall here. If you fall into it, you will get back the same (unsorted) array that you put in. This is to do with the distinction between JAVA variables and the objects (or arrays) to which they refer. You have recently done some exercises in Introduction to Programming concerning this point.

If you didn't actually make this mistake, you have missed out on a learning opportunity - alter your code to make this happen.

If we had been using C or C++, you would have fallen down this hole much earlier.

4 Recursive Merge Sort

Now enable the recursive calls to sort(B) and sort(C) and run the TESTER again.

What happens?

What if A is empty or only contains one element?

When your mergesort is correct, the TESTER will, as usual, time it for you. This time you will see nlogn time complexity (instead of the n² complexity for insertion sort etc..).

You can time it for larger sizes of array using commands like these:

   run Mergesort.java size=1000000 thread=no
   check Mergesort.java sizes=200000,500000,1000000 maxtime=60

Because of the allocation of scratch space within the recursive calls, you will not be able to run this for very large arrays: 1,000,000 may be the largest that the JAVA run time system allows.

Does it make any difference to the performance of merge sort if you give it already-sorted data?

5 Seeing recursive mergesort in action

It's rather more difficult to get to see what recursive programs are doing than for while programs. Here's something that you can try.

First, take out the calls to Trace.printarray() that you put in the loop(s) for merge.

There is a special option to demonstrate mergesort in action:

   Trace.printmerge (int[] A, int indent, int level, boolean direction);

The direction argument is there to show when you enter and leave sort. Call it with true on entry and false on exit; then an appropriate bracket will be printed.

The other two arguments keep track of how deeply nested the recursion is, and what part of the array you're working on. You will need to alter your code like this to use them:

   public int[] sort (int[] A)
      { return mergesort (A, 0, 0); }
   private int[] mergesort (int[] A, int indent, int level)
      { /* your original code for sort(A) */ }

and replace sort(B) with

   mergesort (B, new_indent, level+1);

and similarly C.

The level argument causes that many dots to be printed, to show the depth of the recursion.

The indent argument shifts the printout of the array across the screen by some number of cells.

Work out what value of indent has to be passed to each of the two recursive calls to align them with the two halves of the larger array from which they came. How many elements of the original array are to the left of the part we're looking at?

6 Hybrid of mergesort and insertion sort

Is there some size below which insertion sort is faster than merge sort? Work out, by experiment, what it is.

Why does merge sort perform so badly on small arrays?

Why is this important for its use on large arrays too?

Make a new sort method that uses insertion sort for small arrays and merge sort for big ones.

Use the cutoff variable to decide whether to use mergesort or insertion sort on the present array, depending on its size. This variable can be set by adding cutoff=25 or similar to the TESTER command line.

Time this one.

Is it better than the the original merge_sort for big arrays? By how much? What function of the size is this?

Try to find the optimum value of cutoff.

Carefully compare the coefficients of both nlogn and n in the formulae for the dependence of execution time on size.

Mergesort is very profligate with space. Sections 10ff consider ways of reducing the space used and replacing the recursive calls with while loops.

7 Quicksort

Mergesort does the work (merging) on the way out of the recursion.

Quicksort does it on the way in, and also does it in place.

Just as mergesort needs a merge subroutine, quicksort needs split, but this time we shall treat the main algorithm as the priority and leave split for the better programmers to do later.

Create a file Quicksort.java containing

   class Quicksort implements Sorting {
      public int[ ] sort (int[ ] A) {
         return quicksort (A, 0, A.length);
      }
      private int[ ] quicksort (int[ ] A, int left, int right) {

your code goes here

      }
      // access to split (A, left, right)
      private Splitting splitter;
      // this is for the hybrid algorithm
      private int cutoff=0;
      public void cutoff (int n) { cutoff=n; }
      public void choice (int c) {}
      public Quicksort() {
         splitter = new TestSplit();  // or Split() to use your own
      }

Do not allocate any new arrays.

As with shift(), where_is_min() and where_to_insert(), we shall work with segments of the array, delimited by left and right. As usual, left is inside and right is outside the segment. At the top level, left=0 and right=A.length.

The interesting thing about quicksort is how to choose the pivot value. Start by using pivot=A[left].

Call split with appropriate arguments to tell it what segment to consider and what the pivot value is. It re-arranges the segment and returns the position at (before) which to make another split.

Call quicksort recursively on the two halves.

Where is the sorted result now?

Try it out using the TESTER.

There is a special tracing routine to let you see how quicksort works:

   Trace.printquick (A, left, right, level, direction)

As you did with mergesort, you will need to add an argument to quicksort to count the level.

8 Choosing the pivot for quicksort

You can do this experiment with the version of quicksort that's built in to the TESTER.

   trace quick-sort pivot=first

Other ways of choosing the pivot are last, middle, mean, random and median3.

What is meant by the average, mean, median and mode of a collection of numbers?

Which of them can be calculated easily? Which of them would you like to use, ideally, to choose the pivot value for quicksort?

The first value in the segment is the obvious one to try, but what happens when you run quicksort on already sorted data? Use trace to watch it.

Is the last value any better than this? What about the middle one, or one in a random position?

The textbooks tell you to use the median of the first, middle and last values.

Write the code to do this. It is more complicated than you might like for such a simple job.

As is unfortunately rather common, people write textbooks and copy ideas like median-of-three from previous generations of textbooks (but adding the word " JAVA" in fluorescent letters to the cover of the book, to make you but it) without considering the ideas carefully themselves.

   run quick-sort sawtoothpc=50

to see what I mean by a sawtooth array. (I got the word from sawtooth waves, which are used to move the electron beam across a television screen, as my father was an electronics engineer designing televisions.)

You can control the position of the vertex of the sawtooth by using sawtooth=2 to put it 2 places from the left, sawtooth=-2 to put it 2 places from the right or sawtoothpc=25 to put it 25% of the way through the array.

   trace quick-sort sawtooth=1 pivot=median3

and guess what the result of check might be before you try it.

The rest of this exercise sheet is for the serious programmers.

9 Splitting

(To be written. See the exam question.)

10 Merge Sort with `while` loops: easier way

Merge sort does not have to be recursive. We can do it with loops instead. It's just that the programming is more difficult.

The print-out from part 5 shows that recursive merge sort breaks the array down into single elements and re-combines these into 2s, 4s, 8s, etc.., doing as much work as it can at the left before moving towards the right.

A different way to proceed is to combine all the 2s first, throughout the array, then all the 4s, then the 8s, up to the whole array.

Modify your merge code to copy segments of the array between certain pointers, instead of the whole array.

You will need nested loops like this:

    for (int frag_len = 1; frag_len < size; frag_len *= 2) {
         for (int from = 0;  from < size;  from += 2 * frag_len) {
               // merge B[from...from+frag_len-1] and B[to...to+frag_len-1]
               // to corresponding segment in A
         }
    }

When you've done each merge, you can either copy it back, or alternately merge from A[] to B[] and vice versa.

Don't forget to make allowance for the size not being exactly a power of 2.

11 Merge Sort with `while` loops: disk-efficient way

Part 10 wasn't the way in which the recursion in part 3 actually worked. When sorting amounts of data that are too large to fit in memory and must be stored on disk, the order of work that the recursive method exhibits is better because we can put whole blocks of sorted data on disk, and only have to read them back when we get to merging very long segments. The method in part 10 involves reading and writing to disk for every power of 2, not just the bigger ones.

How can you implement this method using while loops? How does this compare with the way that recursion is implemented on the machine?

This is not easy. It is only for the most confident programmers.

12 Merging runs instead

Here is yet another way to do merge sort, which will take linear time on already-sorted data.

Instead of dividing the array into 2s, 4s, 8s, 16s, and so on without regard to any order that may already exist, look for runs and merge those.

A run is a sequence of consecutive elements that are already in order.

Plainly it is inefficient to copy a run of length 1000 in order to merge a few elements into it, so we need a scheme of runs of diminishing lengths, so that you only merge runs that are of approximately equal lengths.

Try to mimic what the ordinary binary mergesort does.

This is www.PaulTaylor.EU/algorithms/merge+quick.html and it was derived from algorithms/merge+quick.tex which was last modified on 21 July 2007.

Introduction to Algorithms: Mergesort and Quicksort

Paul Taylor

19 February 2001

1 Mergesort and quicksort

2 Write the code for merge

3 Mergesort

4 Recursive Merge Sort

5 Seeing recursive mergesort in action

6 Hybrid of mergesort and insertion sort

7 Quicksort

8 Choosing the pivot for quicksort

9 Splitting

10 Merge Sort with while loops: easier way

11 Merge Sort with while loops: disk-efficient way

12 Merging runs instead

Introduction to Algorithms:
Mergesort and Quicksort

10 Merge Sort with `while` loops: easier way

11 Merge Sort with `while` loops: disk-efficient way