36. Sorting and Data Structures Conclusion

36.1 Radix vs. Comparison Sorting

Intuitive Analysis

Merge Sort Runtime

Merge Sort requires Θ(N log N) compares.

A key concept here is that Merge Sort’s runtime on strings of length W is Θ(N log N) if each comparison takes constant time or Θ(WN log N) if each comparison takes Θ(W) time.

Example of Θ(N log N): the strings all differ in their first character, so each comparison finishes after examining a single character.

Example of Θ(WN log N): the strings are all equal, so each comparison must examine all W characters.
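To see where these two extremes come from, consider what a character-by-character string comparison actually does. Below is a minimal sketch of such a comparison (Java’s built-in String.compareTo behaves similarly): if the strings differ in their first character, the loop exits after one iteration, while if the strings are equal, the loop examines all W characters.

```java
/** Compares two strings character by character, much like String.compareTo.
 *  Returns a negative, zero, or positive number. A sketch for illustration. */
public static int compare(String a, String b) {
    int w = Math.min(a.length(), b.length());
    for (int i = 0; i < w; i += 1) {
        int diff = a.charAt(i) - b.charAt(i);
        if (diff != 0) {
            return diff;  // strings differ at position i: only i + 1 characters examined from each
        }
    }
    return a.length() - b.length();  // one string is a prefix of the other: all w characters examined
}
```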

LSD vs. Merge Sort

The facts:

Treating alphabet size as constant, LSD Sort has runtime Θ(WN). Merge Sort has runtime between Θ(N log N) and Θ(WN log N).
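For reference, here is a minimal sketch of LSD radix sort for N strings of equal length W, assuming an extended-ASCII alphabet of size R = 256. It makes W counting sort passes, one per character position, each taking Θ(N + R) time, which is Θ(WN) total when R is treated as constant.

```java
/** LSD radix sort for strings that are all exactly w characters long.
 *  Assumes an extended-ASCII alphabet (R = 256). Runtime is Theta(WN)
 *  when the alphabet size is treated as constant. */
public static void lsdSort(String[] a, int w) {
    int R = 256;
    int n = a.length;
    String[] aux = new String[n];
    for (int d = w - 1; d >= 0; d -= 1) {
        int[] count = new int[R + 1];
        for (String s : a) {                 // count the occurrences of each character
            count[s.charAt(d) + 1] += 1;
        }
        for (int r = 0; r < R; r += 1) {     // turn counts into starting indices
            count[r + 1] += count[r];
        }
        for (String s : a) {                 // distribute, keeping equal keys in order (stable)
            aux[count[s.charAt(d)]] = s;
            count[s.charAt(d)] += 1;
        }
        System.arraycopy(aux, 0, a, 0, n);   // copy back before the next pass
    }
}
```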

Which is better? It depends.

When might LSD sort be faster?

  • When N is sufficiently large relative to W: as we get to larger and larger numbers of strings, we expect LSD sort to pull ahead.

  • When strings are very similar to each other (the worst case for merge sort): each Merge Sort comparison then costs Θ(W) time, while LSD sort’s Θ(WN) cost is unaffected.

When might Merge Sort be faster?

  • When strings are highly dissimilar, especially near the beginning: each merge sort comparison terminates after just a few characters, far short of Θ(W) time.

Cost Model Analysis

Alternate Approach: Picking a Cost Model

An alternate approach is to pick a cost model.

  • We’ll use the number of characters examined (a counting sketch follows this list).

  • By “examined”, we mean:

    • Radix Sort: Calling charAt in order to count occurrences of each character.

    • Merge Sort: Calling charAt in order to compare two Strings.
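As a sketch of how one might actually count these examinations, we could route every character access through a helper that increments a global counter, then call that helper from both the radix sort’s counting loop and the merge sort’s comparison. The CharCounter class below is hypothetical instrumentation, not part of any library.

```java
/** Hypothetical instrumentation for the "characters examined" cost model:
 *  every charAt call is routed through examine(), which tallies accesses. */
public class CharCounter {
    public static long examinations = 0;

    /** Returns s.charAt(i), recording the access as one examination. */
    public static char examine(String s, int i) {
        examinations += 1;
        return s.charAt(i);
    }

    /** The comparison loop from before, now counting: comparing two equal
     *  strings of length 1000 examines 1000 characters from each, 2000 total. */
    public static int compare(String a, String b) {
        int w = Math.min(a.length(), b.length());
        for (int i = 0; i < w; i += 1) {
            int diff = examine(a, i) - examine(b, i);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length() - b.length();
    }
}
```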

MSD vs. Mergesort

Suppose we have 100 strings of 1000 characters each.

For MSD Radix Sort, in the worst case (all strings equal), every character is examined exactly once. Thus, we have exactly 100,000 total character examinations.

For Merge Sort, estimate the total number of characters examined if all strings are equal.

Consider merging the top-level run of 100 items. Since all items are equal, assume the merge always picks from the left half on ties. Each comparison of two equal 1000-character strings examines all 1000 characters of both strings, i.e., 2000 examinations per comparison:

  • Comparing A[0] to A[50]: 2000 character examinations.

  • Comparing A[1] to A[50]: 2000 character examinations.

  • … Comparing A[49] to A[50]: 2000 character examinations.

  • Total characters examined: 50 * 2000 = 100,000.

  • Merging N strings of 1000 characters requires N/2 * 2000 = 1000N examinations.

In total, since there are about log2(N) levels of merging and each level examines about 1000N characters, we must examine approximately 1000N log2(N) total characters. (A short program below checks this figure.)

  • 100000 + 50000 * 2 + 25000 * 4 + … ≈ 660,000 characters for N = 100, summing per-level costs from the top of the merge recursion down.
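We can check this arithmetic with a short program. The sketch below counts merge sort comparisons exactly under our assumptions (all items equal, ties always pick left, so each merge costs one comparison per element of its left half) and compares the total against the 1000N log2(N) estimate.

```java
/** Sanity check of the character examination estimates for N equal
 *  strings of length 1000 (each comparison examines 2000 characters). */
public class ExaminationEstimate {
    /** Comparisons merge sort makes on n all-equal items: each merge
     *  costs one comparison per element of its left half. */
    static long comparisons(int n) {
        if (n <= 1) {
            return 0;
        }
        return comparisons(n / 2) + comparisons(n - n / 2) + n / 2;
    }

    public static void main(String[] args) {
        int n = 100;
        long exact = comparisons(n) * 2000;                                  // 632000
        long estimate = Math.round(1000.0 * n * Math.log(n) / Math.log(2));  // 664386
        System.out.println("exact merge sort examinations: " + exact);
        System.out.println("1000 * N * log2(N) estimate:   " + estimate);
    }
}
```

The exact count (632,000 for N = 100) comes out a bit below the estimate since 100 is not a power of two, but the order of growth is the same: roughly log2(N) times MSD’s 100,000 examinations.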

MSD vs. Mergesort Character Examinations

For N equal strings of length 1000, we found that:

  • MSD radix sort will examine ~1000N characters (for N = 100: 100,000).

  • Merge sort will examine ~1000N log2(N) characters (for N = 100: ~660,000).

    If character examinations are an appropriate cost model, we’d expect Merge Sort to be slower by a factor of log2(N).

    To see if we’re right, we’ll need to do a computational experiment.

Empirical Study: Radix Sort vs. Comparison Sorting

Computational Experiment Results (Your Answers)

Computational experiment for W = 100.

  • Does our data match our runtime hypothesis? No! Why not?

    • Maybe the divide-and-conquer overhead of MSD’s partitioning is not accounted for by our cost model.

    • Is our merge sort implementation doing the Timsort optimization? Unclear.

    • Maybe this is cache performance; Josh mentioned that MSD has bad cache performance.

    • Maybe string comparison is somehow not linear time (keep in mind that every string compare here does the SAME thing, and repeated work on real machines these days tends to get optimized away).

    • Our cost model isn’t representative of everything that is happening.

    • One particularly thorny issue: the “Just In Time” compiler (a timing sketch that accounts for it appears below).

Note: the MSD and merge sort implementations used in this experiment are highly optimized versions taken from our optional algorithms textbook.
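Here is the promised rough, hypothetical timing harness that tries to account for the JIT issue: it warms up the sort before measuring, so the JIT compiler has a chance to compile the hot code paths first. Arrays.sort stands in for whichever implementation (MSD or merge sort) is being measured.

```java
import java.util.Arrays;
import java.util.Random;

/** A rough timing harness sketch. The warm-up runs give the JIT compiler
 *  a chance to optimize the sort before the measured run. */
public class SortTimer {
    public static void main(String[] args) {
        String[] data = randomStrings(100000, 100, new Random(61));
        for (int i = 0; i < 10; i += 1) {
            Arrays.sort(data.clone());   // warm-up runs: results and timings discarded
        }
        long start = System.nanoTime();
        Arrays.sort(data.clone());       // the measured run, on a fresh unsorted copy
        long elapsed = System.nanoTime() - start;
        System.out.println("sort took " + elapsed / 1e6 + " ms");
    }

    /** Generates n random strings of length w over the lowercase alphabet. */
    static String[] randomStrings(int n, int w, Random rng) {
        String[] a = new String[n];
        for (int i = 0; i < n; i += 1) {
            char[] chars = new char[w];
            for (int j = 0; j < w; j += 1) {
                chars[j] = (char) ('a' + rng.nextInt(26));
            }
            a[i] = new String(chars);
        }
        return a;
    }
}
```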