> For the complete documentation index, see [llms.txt](https://cs61b-2.gitbook.io/cs61b-textbook/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://cs61b-2.gitbook.io/cs61b-textbook/36.-sorting-and-data-structures-conclusion/36.1-radix-vs.-comparison-sorting.md).

# 36.1 Radix vs. Comparison Sorting

## Intuitive Analysis

### Merge Sort Runtime

Merge Sort requires Θ(N log N) compares.

A key concept here is that Merge Sort’s runtime on strings of length W is Θ(N log N) if each comparison takes constant time or Θ(WN log N) if each comparison takes Θ(W) time.&#x20;

Example of Θ(N log N): Strings are all different in top character.&#x20;

Example of Θ(WN log N): Strings are all equal.

### LSD vs. Merge Sort

The facts:

Treating alphabet size as constant, LSD Sort has runtime Θ(WN). Merge Sort has runtime between Θ(N log N) and Θ(WN log N).

Which is better? It depends.

#### When might LSD sort be faster?

* When W is sufficiently smaller than N (or equivalently, N is really really big, or as you get to larger and larger numbers of strings, we expect LSD to pull ahead).
* Worst case strings for mergesort, the more similar they are, the more we expect LSD sort to do better.&#x20;
* Sufficiently large N.
* If strings are very similar to each other.
  * &#x20;Each Merge Sort comparison costs Θ(W) time.

<figure><img src="/files/UGrMlr1kEgIxtXMSPJWz" alt=""><figcaption></figcaption></figure>

#### When might Merge Sort be faster?

* Strings are different, especially at the beginning.
* If strings are highly dissimilar from each other
  * Each merge sort comparison is very fast

<figure><img src="/files/vxib7k8PcGxOO1jwdFDQ" alt=""><figcaption></figcaption></figure>

## Cost Model Analysis

### Alternate Approach: Picking a Cost Model

An alternate approach is to pick a cost model.

* We’ll use number of characters examined.
* By “examined”, we mean:
  * Radix Sort: Calling charAt in order to count occurrences of each character.
  * Merge Sort: Calling charAt in order to compare two Strings.

### MSD vs. Mergesort

Suppose we have 100 strings of 1000 characters each.

For MSD Radix Sort, in the worst case (all strings equal), every character is examined exactly once. Thus, we have exactly 100,000 total character examinations.

<figure><img src="/files/ajalRZczwiXHvmTQOjwu" alt=""><figcaption></figcaption></figure>

For Merge Sort, estimate the total number of characters examined if all strings are equal.

Merging 100 items, assuming equal items results in always picking left:

* Comparing A\[0] to A\[50]: 2000 character examinations.
* Comparing A\[1] to A\[50]: 2000 character examinations.
* … Comparing A\[49] to A\[50]: 2000 character examinations.
* Total characters examined: 50 \* 2000 = 100000.
* Merging N strings of 1000 characters requires N/2 \* 2000 = 1000N examinations.

<figure><img src="/files/wnVYwjuV5v3ASIYWnHqL" alt=""><figcaption></figcaption></figure>

In total, we must examine approximately 1000N log2 N total characters.

* 100000 + 50000\*2 + 25000 \* 4 + … = \~660,000 characters.

<figure><img src="/files/0yW1AFc1CRIrTmVjEycy" alt=""><figcaption></figcaption></figure>

### MSD vs. Mergesort Character Examinations

For N equal strings of length 1000, we found that:

* MSD radix sort will examine \~1000N characters (For N= 100: 100,000).
* Merge sort will examine \~1000Nlog2(N) characters (For N=100: 660,000).<br>

  If character examination are an appropriate cost model, we’d expect Merge Sort to be slower by a factor of log2N.<br>

  To see if we’re right, we’ll need to do a computational experiment.

## Empirical Study: Radix Sort vs. Comparison Sorting

### Computational Experiment Results (Your Answers)

Computational experiment for W = 100.

* [MSD](https://algs4.cs.princeton.edu/51radix/MSD.java.html) and [merge sort](https://algs4.cs.princeton.edu/22mergesort/MergeX.java.html) implementations are highly optimized versions taken from our optional algorithms textbook.
* Does our data match our runtime hypothesis? No! Why not?
  * Maybe when MSD makes partitions, the divide and conquer cost is not accounted for.
  * Is mergesort optimized to do the timesort optimization? Unclear.
  * Maybe this is caching performance, Josh mentioned MSD has bad cache performance.
  * Maybe string comparison is not linear time somehow (keep in mind every string compare does the SAME thing. Repeated work on real machines these days tends to get optimized away).
  * Our cost model isn’t representative of everything that is happening.
  * One particularly thorny issue: **The “Just In Time” Compiler.**

<figure><img src="/files/IfUYlpQa8qV47hTSVWL4" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/5pr1cwHPZfnMS7tgqM7z" alt=""><figcaption></figcaption></figure>


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://cs61b-2.gitbook.io/cs61b-textbook/36.-sorting-and-data-structures-conclusion/36.1-radix-vs.-comparison-sorting.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
