# 36.1 Radix vs. Comparison Sorting

## Intuitive Analysis

### Merge Sort Runtime

Merge Sort requires Θ(N log N) compares.
A key concept here is that Merge Sort’s runtime on strings of length W is Θ(N log N) if each comparison takes constant time or Θ(WN log N) if each comparison takes Θ(W) time.
Example of Θ(N log N): Strings are all different in top character.
Example of Θ(WN log N): Strings are all equal.

### LSD vs. Merge Sort

The facts:
Treating alphabet size as constant, LSD Sort has runtime Θ(WN). Merge Sort has runtime between Θ(N log N) and Θ(WN log N).
Which is better? It depends.

#### When might LSD sort be faster?

• When W is sufficiently smaller than N (or equivalently, N is really really big, or as you get to larger and larger numbers of strings, we expect LSD to pull ahead).
• Worst case strings for mergesort, the more similar they are, the more we expect LSD sort to do better.
• Sufficiently large N.
• If strings are very similar to each other.
• Each Merge Sort comparison costs Θ(W) time.

#### When might Merge Sort be faster?

• Strings are different, especially at the beginning.
• If strings are highly dissimilar from each other
• Each merge sort comparison is very fast

## Cost Model Analysis

### Alternate Approach: Picking a Cost Model

An alternate approach is to pick a cost model.
• We’ll use number of characters examined.
• By “examined”, we mean:
• Radix Sort: Calling charAt in order to count occurrences of each character.
• Merge Sort: Calling charAt in order to compare two Strings.

### MSD vs. Mergesort

Suppose we have 100 strings of 1000 characters each.
For MSD Radix Sort, in the worst case (all strings equal), every character is examined exactly once. Thus, we have exactly 100,000 total character examinations.
For Merge Sort, estimate the total number of characters examined if all strings are equal.
Merging 100 items, assuming equal items results in always picking left:
• Comparing A[0] to A[50]: 2000 character examinations.
• Comparing A[1] to A[50]: 2000 character examinations.
• … Comparing A[49] to A[50]: 2000 character examinations.
• Total characters examined: 50 * 2000 = 100000.
• Merging N strings of 1000 characters requires N/2 * 2000 = 1000N examinations.
In total, we must examine approximately 1000N log2 N total characters.
• 100000 + 50000*2 + 25000 * 4 + … = ~660,000 characters.

### MSD vs. Mergesort Character Examinations

For N equal strings of length 1000, we found that:
• MSD radix sort will examine ~1000N characters (For N= 100: 100,000).
• Merge sort will examine ~1000Nlog2(N) characters (For N=100: 660,000).
If character examination are an appropriate cost model, we’d expect Merge Sort to be slower by a factor of log2N.
To see if we’re right, we’ll need to do a computational experiment.