
32.1 Quicksort Flavors vs. MergeSort


Quicksort Flavors: Motivation

Before we dive into the wonderful ideas of how to improve the Quicksort algorithm, here's a review of how Quicksort (or Partition Sort) works.

Recall, our version of Quicksort has the following properties:

  • Leftmost item is always the pivot

  • Our partitioning algorithm always preserves the relative order of <= and >= items.
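To make this review concrete, here is a minimal sketch of such an order-preserving partition (the class and method names are illustrative, and for simplicity items equal to the pivot are grouped with the larger items):

```java
public class StablePartition {
    /** Partitions a[lo..hi] around the pivot a[lo], preserving the relative
     *  order of the smaller and the larger items. Returns the pivot's final
     *  index. Uses Theta(N) extra space for the temporary copy. */
    public static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int[] copy = new int[hi - lo + 1];
        int idx = 0;
        // First pass: copy items smaller than the pivot, in their original order.
        for (int i = lo + 1; i <= hi; i++) {
            if (a[i] < pivot) { copy[idx++] = a[i]; }
        }
        int pivotIndex = lo + idx;  // the pivot lands just after the smaller items
        copy[idx++] = pivot;
        // Second pass: copy items >= pivot, in their original order.
        for (int i = lo + 1; i <= hi; i++) {
            if (a[i] >= pivot) { copy[idx++] = a[i]; }
        }
        System.arraycopy(copy, 0, a, lo, copy.length);
        return pivotIndex;
    }
}
```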

And remember that even if we have a pivot that is usually "good", leading to a best case runtime of $\Theta(N \log N)$, the rare worst case runtime of $\Theta(N^2)$ can still happen in practice. When it does, Quicksort has a worse runtime than MergeSort, losing its title as the "fastest sorting algorithm".

Therefore, a natural question to ask is: what can we do to avoid running into Quicksort's worst case, so that it can reliably beat MergeSort? This question is the main motivation for this section.

Quicksort Flavors: Philosophies

Here are some general philosophies to follow when brainstorming ways to improve Quicksort.

Philosophy 1: Randomness

Some possible reasons that the rare worst case runtime still happens in practice include:

  • Bad ordering: Arrays already in sorted order

  • Bad elements: Arrays with all duplicates

In both of the examples above, assuming the leftmost item is always chosen as the pivot, each partition fails to significantly reduce the size of the resulting subarrays, so we end up passing over the items many, many times.

One solution to the bad ordering problem is randomness, which is encoded by these two strategies below:

  1. Pick pivots randomly.

  2. Shuffle items before sort.

Note: Strategy 2 requires care in the partitioning code to avoid $\Theta(N^2)$ behavior on arrays of duplicates.
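As a concrete illustration of strategy 2, here is a minimal sketch that shuffles with Fisher-Yates before sorting. It reuses the hypothetical StablePartition.partition method from the review sketch above; all names are illustrative:

```java
import java.util.Random;

public class RandomizedQuicksort {
    private static final Random RNG = new Random();

    /** Strategy 2: shuffle the items once, then run plain Quicksort.
     *  After a uniform shuffle, pathological orderings are astronomically
     *  unlikely for large N. */
    public static void sort(int[] a) {
        shuffle(a);
        quicksort(a, 0, a.length - 1);
    }

    /** Fisher-Yates shuffle: a uniformly random permutation in Theta(N) time. */
    private static void shuffle(int[] a) {
        for (int i = a.length - 1; i > 0; i--) {
            int j = RNG.nextInt(i + 1);  // random index in [0, i]
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }

    private static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) { return; }
        int p = StablePartition.partition(a, lo, hi);  // partition from the review sketch
        quicksort(a, lo, p - 1);
        quicksort(a, p + 1, hi);
    }
}
```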

It turns out that this randomized version of Quicksort is about the best sort you can get, with extremely good performance.

Just like the care-free motorcycle rider from the 1992 TV show Renegade, this version of the Quicksort algorithm thrives with a devil-may-care attitude, even without the luxurious hair.

Philosophy 2: Smarter Pivot Selection

Another approach to the problem is to somehow choose a "good" pivot efficiently.

2a. Constant time pivot pick

One such approach is to pick a few items and choose the "best" one among them as the pivot. This exemplifies a pivot selection procedure that is both deterministic and constant time.

However, a big drawback of such a procedure is that the resulting Quicksort has a family of "dangerous inputs" (inputs that lead to the worst case runtime or break the algorithm) that an adversary could easily generate.
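For instance, one well-known constant-time rule is "median-of-three": use the median of the first, middle, and last items as the pivot. Here is a sketch (the rule and the method name are illustrative, not the only option):

```java
/** Median-of-three pivot pick: a deterministic, constant-time rule that
 *  returns the index of the median of a[lo], a[mid], and a[hi]. */
static int medianOfThreeIndex(int[] a, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    if (a[lo] <= a[mid]) {
        if (a[mid] <= a[hi]) { return mid; }  // a[lo] <= a[mid] <= a[hi]
        return (a[lo] <= a[hi]) ? hi : lo;    // a[mid] is the largest
    } else {
        if (a[lo] <= a[hi]) { return lo; }    // a[mid] < a[lo] <= a[hi]
        return (a[mid] <= a[hi]) ? hi : mid;  // a[lo] is the largest
    }
}
```

An adversary who knows this exact rule can still construct inputs that defeat it, which is precisely the drawback described above.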

2b. Linear time pivot pick

Since we are partitioning multiple times, always selecting the median as the pivot seems like a good idea, because each partition will cut the size of the resulting subarrays in half.

Therefore, to improve upon the idea from part 2a, we can calculate the median of the given array, which can be done in linear time, and select it as the pivot.

The "exact median QuickSort" algorithm will not have the technical vulnerabilities as the 2a approach, and has a runtime of θ(NlogN)\theta(NlogN)θ(NlogN). However, it is still slower than MergeSort.

Philosophy 3: Introspection

This philosophy relies on introspecting the recursive depth of our algorithm, and switching to MergeSort once we hit a depth threshold.

Although this is a reasonable approach, it is not commonly used in practice.
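A minimal sketch of this idea might look like the following, with an assumed depth budget of roughly 2 log2(N); the mergeSort and partition helpers, and all names, are illustrative assumptions:

```java
/** Introspective sort sketch: run Quicksort, but if the recursion exceeds
 *  the depth budget, fall back to MergeSort, which guarantees
 *  Theta(N log N). Helper methods are assumed. */
static void introSort(int[] a) {
    int n = Math.max(a.length, 1);
    int depthLimit = 2 * (32 - Integer.numberOfLeadingZeros(n));  // ~2 log2 N
    introSort(a, 0, a.length - 1, depthLimit);
}

static void introSort(int[] a, int lo, int hi, int depthLimit) {
    if (lo >= hi) { return; }
    if (depthLimit == 0) {     // suspiciously deep: probably a bad input
        mergeSort(a, lo, hi);  // guaranteed Theta(N log N) finish
        return;
    }
    int p = partition(a, lo, hi);  // any partitioning scheme
    introSort(a, lo, p - 1, depthLimit - 1);
    introSort(a, p + 1, hi, depthLimit - 1);
}
```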

Quicksort Flavors vs. MergeSort: Who's the best?

Candidate One: Quicksort L3S

Quicksort L3S is the Quicksort introduced last time, with the following properties:

  • Always leftmost pivot selection

  • Partitioning (as reviewed above)

  • Shuffle before starting

Unfortunately, in empirical timing tests, Quicksort L3S loses: the score is 0 : 1 with MergeSort in the lead.

Candidate Two: Quicksort LTHS

Quicksort LTHS is very similar to Quicksort L3S, except with a different partitioning scheme: Tony Hoare's in-place partitioning scheme.

Tony Hoare's Partitioning

Imagine two pointers, one at each end of the items array, walking towards each other:

  • Left pointer loves small items.

  • Right pointer loves large items.

  • Big idea: Walk towards each other, swapping anything they don’t like; stop when the two pointers cross each other.

    • Left pointer hates larger or equal items

    • Right pointer hates smaller or equal items

  • Finally, swap the pivot with the item at the right pointer; this is the pivot's final position.

  • The end result is that things to the left of the pivot (originally the leftmost item) are "small" and things to the right of the pivot are "large".

TH Partitioning: An Intuitive Analogy

Intuitively, you can think of each pointer as a little "police officer" for the "less than pivot" territory or the "greater than pivot" territory, and of the pivot as the "border line" between the two territories.

Their job is to find and arrest any "suspects" (items that don't belong in their respective territories) and trade the arrested suspects with the "police officer" on the other side.

Specifically, the left pointer is the "police officer" for the "less than pivot" territory, looking for any "suspects" (items greater than or equal to the pivot) that are not supposed to be on its side of the border line. Analogously, the right pointer looks for "suspects" (items less than or equal to the pivot) in the "greater than pivot" territory.

When the two "police officers" meet, the mission is done, as they have both sufficiently searched their respective territories.
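Here is a sketch of Hoare partitioning following the description above (the method name is illustrative). Note that each pointer also stops on items equal to the pivot; the resulting extra swaps are what keep the partition balanced on arrays full of duplicates:

```java
/** A sketch of Tony Hoare's in-place partitioning. The leftmost item a[lo]
 *  is the pivot; the method returns the pivot's final index. */
static int hoarePartition(int[] a, int lo, int hi) {
    int pivot = a[lo];
    int left = lo + 1;  // left pointer: loves small items
    int right = hi;     // right pointer: loves large items
    while (left <= right) {
        // Left pointer walks right past items it likes (strictly smaller).
        while (left <= right && a[left] < pivot) { left += 1; }
        // Right pointer walks left past items it likes (strictly larger).
        while (left <= right && a[right] > pivot) { right -= 1; }
        // Both pointers are now stuck on items they hate: swap and continue.
        if (left <= right) {
            int tmp = a[left]; a[left] = a[right]; a[right] = tmp;
            left += 1;
            right -= 1;
        }
    }
    // The pointers have crossed: swap the pivot into place at the right pointer.
    int tmp = a[lo]; a[lo] = a[right]; a[right] = tmp;
    return right;
}
```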

All this work for nothing?

Not at all! All this work on a new partitioning scheme did pay off: using the TH partitioning scheme yields a very fast Quicksort, one that is faster than MergeSort!

It's important to also note that:

  • Faster schemes have been found since.

  • Overall runtime still depends crucially on pivot selection strategy!

1 : 1 with a win for QuickSort Flavors!

Candidate Three: QuickSort PickTH

Median Identification: Linear Time

Recall from philosophy 2b the idea of identifying the exact median and using it as the pivot. Naively, finding the median is a costly process (runtime: $\Theta(N \log N)$, for example by sorting). It turns out, however, that it is possible to find the median in $\Theta(N)$ time, with an algorithm called "PICK".

Will this improved version of exact median Quicksort perform better than MergeSort?

Sadly, no. In fact, it performs terribly! This is likely for the following reasons:

  • Cost to compute medians is too high.

  • Have to live with the $\Theta(N^2)$ worst case if we want good practical performance.

1 : 2 with MergeSort as the winner.
