
15.3 Binary Search

hi-lo!



Binary Search Introduction

Binary search is a nice way of searching a list for a particular item. It requires the list to be in sorted order and uses that fact to find an element quickly.

To do a binary search, we start in the middle of the list, and check if that's our desired element. If not, we ask: is this element bigger or smaller than our element?

If it's bigger, then we know we only have to look at the half of the list with smaller elements. If it's too small, then we only look at the half with bigger elements. In this way, we can cut in half the number of options we have left at each step, until we find our target element.

What's the worst possible case? When the element we want isn't in the list at all. Then we will make comparisons until we've eliminated all regions of the list, and there are no more bigger or smaller halves left.
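To make this procedure concrete, here is a minimal recursive sketch in Java. It is only an illustration of the idea described above; the method and parameter names are our own, not from the course code:

```java
/** Returns the index of target in the sorted array, or -1 if it is absent.
 *  Only the portion of the array between lo and hi (inclusive) is searched. */
public static int binarySearch(int[] sorted, int target, int lo, int hi) {
    if (lo > hi) {
        return -1;                    // region is empty: target is not in the array
    }
    int mid = lo + (hi - lo) / 2;     // middle of the current region
    if (sorted[mid] == target) {
        return mid;                   // found it
    } else if (sorted[mid] > target) {
        return binarySearch(sorted, target, lo, mid - 1);   // middle too big: search the smaller half
    } else {
        return binarySearch(sorted, target, mid + 1, hi);   // middle too small: search the bigger half
    }
}
```

For example, `binarySearch(new int[]{1, 3, 5, 7, 9}, 7, 0, 4)` returns 3, while searching for 4 in the same array returns -1 once the region shrinks to nothing.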

What's the intuitive runtime of binary search? Take a minute and use the tools you know to consider this.

We start with $N$ options, then $N/2$, then $N/4$, ... until we have just 1. Each time, we cut the array in half, so in the end we must perform a total of $\log_2(N)$ operations. Each of those $\log_2(N)$ operations, e.g. finding the middle element and comparing with it, takes constant time. So the overall runtime is on the order of $\log_2(N)$.
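To see where the logarithm comes from, we can solve for the number of halving steps directly (assuming for simplicity that $N$ is a power of 2, so every step cuts the remaining options exactly in half):

$$\frac{N}{2^k} = 1 \implies 2^k = N \implies k = \log_2(N)$$

where $k$ is the number of halving steps needed before only one option remains.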

It's important to note, however, that each step doesn't cut the list exactly in half. If the list is of even length and there is no single 'middle', we have to take either the smaller or the larger portion. But this is a good intuitive approach.

We'll do a more precise analysis next.

To precisely calculate the runtime of binary search, we'll count the number of operations, just as we've done previously.

First, we define our cost model: let's use the number of recursive binary search calls. Since the number of operations inside each call is constant, the number of calls will be the only thing varying based on the size of the input, so it's a good cost model.
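If you want to check your counts, here is a rough sketch (again our own illustration, not course code) that simulates the worst case and returns the number of recursive calls that actually examine an element. It always recurses into the larger of the two halves, which is what happens when the target is absent and lies on the "long" side of every split:

```java
/** Counts worst-case binary search calls on the region lo..hi (inclusive).
 *  Calls on an empty region are not counted, since they examine no element. */
public static int countCalls(int lo, int hi) {
    if (lo > hi) {
        return 0;                        // empty region: nothing examined
    }
    int mid = lo + (hi - lo) / 2;
    return 1 + countCalls(mid + 1, hi);  // the right half is never smaller than the left
}
```

Calling `countCalls(0, N - 1)` for each $N$ should reproduce the counts in the table below.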

As we've done before, let's do some example counts for specific values of $N$. As an exercise, try to fill this table in before continuing:

N:     1  2  3  4  5  6  7  8  9  10  11  12  13
Count: ?  ?  ?  ?  ?  ?  ?  ?  ?  ?   ?   ?   ?

Alright, here's the result:

N:     1  2  3  4  5  6  7  8  9  10  11  12  13
Count: 1  2  2  3  3  3  3  4  4  4   4   4   4

These counts seem to support our intuition of $\log_2(N)$. We can see that the count increases by one only when $N$ hits a power of 2.

...but we can be even more precise: $C(N) = \lfloor \log_2(N) \rfloor + 1$. (The L-shaped bars are the "floor" function, which rounds its argument down to the nearest integer.)
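For example, plugging $N = 13$ into this formula reproduces the last entry of the table:

$$C(13) = \lfloor \log_2(13) \rfloor + 1 = \lfloor 3.700... \rfloor + 1 = 3 + 1 = 4$$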

A couple properties worth knowing (see below for proofs):

  • $\lfloor f(N) \rfloor = \Theta(f(N))$

  • $\lceil f(N) \rceil = \Theta(f(N))$

  • $\log_p(N) = \Theta(\log_q(N))$

The last one essentially states that for logarithmic runtimes, the base of the logarithm doesn't matter at all, because all logarithms differ only by a constant factor (this can be seen by applying the logarithm change of base formula). Applying these simplifications, and dropping the additive constant, we see that $C(N) = \lfloor \log_2(N) \rfloor + 1 = \Theta(\log(N))$, just as we expected from our intuition.
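As a reminder, the change of base formula says that

$$\log_p(N) = \frac{\log_q(N)}{\log_q(p)}$$

so switching from base $q$ to base $p$ only multiplies the logarithm by the constant $\frac{1}{\log_q(p)}$, which Big-Theta ignores.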

Example Proof: Prove $\lfloor f(N) \rfloor = \Theta(f(N))$

Solution:

We start with the following inequality, which follows from the definition of the floor function:

$$f(N) - 1 < \lfloor f(N) \rfloor \le f(N)$$

By our Big-Theta rules, we can drop the additive constant in $f(N) - 1$, so both $f(N) - 1$ and $f(N)$ are of order $f(N)$. Therefore $\lfloor f(N) \rfloor$ is sandwiched between two expressions of order $f(N)$, and is therefore also $\Theta(f(N))$.

Exercise:

  • Prove $\lceil f(N) \rceil = \Theta(f(N))$

  • Prove $\log_p(N) = \Theta(\log_q(N))$

One cool fact to wrap up with: Log time is super good! It's almost as fast as constant time, and way better than linear time. This is why we like binary search, rather than stepping one by one through our list and looking for the right thing.

To show this concretely:

N                       log_2(N)    Typical runtime (nanoseconds)
100                     6.6         1
100,000                 16.6        2.5
100,000,000             26.5        4
100,000,000,000         36.5        5.5
100,000,000,000,000     46.5        7

The input size increased by a factor of 1 trillion, but the runtime only increased by a factor of about 7! This shows how remarkable $\log(N)$ runtime is, especially for large inputs.

For an animation of binary search, see these slides.