Pagefy

Pagefy

Back to Data Structures and Algorithms

Quicksort

Introduction to Algorithms

Chapter 7: Quicksort

Quicksort has a worst-case running time of Θ(n²) but is often the best practical choice for sorting because:

  • Expected running time is Θ(n lg n) with very small constant factors
  • Sorts in place (no auxiliary array needed)
  • Works well in virtual-memory environments

7.1 Description of Quicksort

Quicksort applies the divide-and-conquer paradigm to sort a subarray A[p..r]:

  1. Divide: Partition A[p..r] into two (possibly empty) subarrays A[p..q-1] and A[q+1..r] such that:
    • Every element in A[p..q-1]A[q]
    • A[q] ≤ every element in A[q+1..r]
    • Compute index q as part of partitioning
  2. Conquer: Recursively sort A[p..q-1] and A[q+1..r]
  3. Combine: No work needed — the array is already sorted after the recursive calls
QUICKSORT(A, p, r)
1  if p < r
2      q = PARTITION(A, p, r)
3      QUICKSORT(A, p, q - 1)
4      QUICKSORT(A, q + 1, r)

To sort an entire array: QUICKSORT(A, 1, A.length)

Partitioning the Array

The PARTITION procedure is the key to the algorithm. It rearranges A[p..r] in place, selecting x = A[r] as the pivot element.

PARTITION(A, p, r)
1  x = A[r]                  // pivot
2  i = p - 1
3  for j = p to r - 1
4      if A[j] ≤ x
5          i = i + 1
6          exchange A[i] with A[j]
7  exchange A[i + 1] with A[r]
8  return i + 1

How it works: PARTITION maintains four regions in the array:

RegionRangeProperty
1A[p..i]Elements ≤ pivot
2A[i+1..j-1]Elements > pivot
3A[j..r-1]Unrestricted (not yet examined)
4A[r]Pivot element x

Loop Invariant

At the beginning of each iteration of the for loop (lines 3–6), for any index k:

  1. If p ≤ k ≤ i, then A[k] ≤ x
  2. If i + 1 ≤ k ≤ j - 1, then A[k] > x
  3. If k = r, then A[k] = x

Proof of correctness:

  • Initialization: Before the first iteration, i = p - 1 and j = p. The first two conditions hold trivially (empty ranges). Line 1 satisfies the third condition.
  • Maintenance: Two cases per iteration:
    • If A[j] > x: just increment j; the element joins the "greater than" region
    • If A[j] ≤ x: increment i, swap A[i] with A[j], then increment j; the element joins the "less than or equal" region
  • Termination: When j = r, every element is in one of the three sets. The final swap (line 7) places the pivot between the two partitions.

Running time of PARTITION on a subarray of size n = r - p + 1 is Θ(n).

After line 2 of QUICKSORT, A[q] is strictly less than every element of A[q+1..r].


7.2 Performance of Quicksort

The running time depends on whether partitioning is balanced or unbalanced.

Worst-Case Partitioning

Occurs when PARTITION produces one subproblem with n - 1 elements and one with 0 elements at every level.

Recurrence:

T(n) = T(n - 1) + T(0) + Θ(n)
     = T(n - 1) + Θ(n)
  • Summing costs at each level gives an arithmetic seriesT(n) = Θ(n²)
  • This happens when the input is already sorted — a case where insertion sort runs in O(n)

Best-Case Partitioning

Occurs when PARTITION produces two subproblems each of size ≤ n/2.

Recurrence:

T(n) = 2T(n/2) + Θ(n)
  • By case 2 of the Master TheoremT(n) = Θ(n lg n)

Balanced Partitioning

The average case is much closer to the best case than the worst case.

  • Even a 9-to-1 split at every level yields O(n lg n):

    T(n) = T(9n/10) + T(n/10) + cn
    
    • Every level of the recursion tree has cost cn
    • Recursion terminates at depth log₁₀ n = Θ(lg n) (shortest path) and log₁₀/₉ n = Θ(lg n) (longest path)
    • Total cost: O(n lg n)
  • Even a 99-to-1 split yields O(n lg n)

  • Any split of constant proportionality yields a recursion tree of depth Θ(lg n) with O(n) work per level → O(n lg n)

Intuition for the Average Case

  • Assume all permutations of input are equally likely
  • In practice, some splits are well-balanced, some are unbalanced
  • ~80% of the time, PARTITION produces a split more balanced than 9-to-1
  • ~20% of the time, the split is less balanced than 9-to-1

Key insight: A "bad" split followed by a "good" split is no worse than a single balanced split:

  • Bad split at root: subarrays of size 0 and n - 1, cost = n
  • Good split at next level: subarrays of size (n-1)/2 - 1 and (n-1)/2, cost = n - 1
  • Combined cost: Θ(n) + Θ(n-1) = Θ(n) — same as a single balanced split producing two subarrays of size (n-1)/2
  • The cost of the bad split is absorbed into the cost of the good split

7.3 A Randomized Version of Quicksort

We cannot always assume all permutations are equally likely. Random sampling provides good expected performance over all inputs.

Instead of always using A[r] as the pivot, select a randomly chosen element from A[p..r] by swapping A[r] with a random element before partitioning.

RANDOMIZED-PARTITION(A, p, r)
1  i = RANDOM(p, r)
2  exchange A[r] with A[i]
3  return PARTITION(A, p, r)
RANDOMIZED-QUICKSORT(A, p, r)
1  if p < r
2      q = RANDOMIZED-PARTITION(A, p, r)
3      RANDOMIZED-QUICKSORT(A, p, q - 1)
4      RANDOMIZED-QUICKSORT(A, q + 1, r)

Why this works:

  • The pivot x = A[r] is equally likely to be any of the r - p + 1 elements in the subarray
  • We expect the split to be reasonably well balanced on average
  • No particular input elicits worst-case behavior

7.4 Analysis of Quicksort

7.4.1 Worst-Case Analysis

Goal: Prove the worst-case running time is Θ(n²).

Recurrence:

T(n) = max₀≤q≤n₋₁ (T(q) + T(n - q - 1)) + Θ(n)       (7.1)

The parameter q ranges from 0 to n - 1 (PARTITION produces two subproblems with total size n - 1).

Upper bound proof (substitution method):

  • Guess: T(n) ≤ cn²

  • Substitute into recurrence:

    T(n) ≤ max₀≤q≤n₋₁ (cq² + c(n - q - 1)²) + Θ(n)
         = c · max₀≤q≤n₋₁ (q² + (n - q - 1)²) + Θ(n)
    
  • The expression q² + (n - q - 1)² achieves its maximum at the endpoints q = 0 or q = n - 1 (second derivative w.r.t. q is positive)

  • This gives: max ≤ (n - 1)² = n² - 2n + 1

  • Therefore:

    T(n) ≤ cn² - c(2n - 1) + Θ(n) ≤ cn²
    

    (choosing c large enough so c(2n - 1) dominates the Θ(n) term)

  • Combined with the Ω(n²) lower bound from the unbalanced partitioning case → T(n) = Θ(n²)

7.4.2 Expected Running Time

Assumption: All element values are distinct.

Running Time and Comparisons

Key observations:

  • Each pivot element is never included in future recursive calls → at most n calls to PARTITION
  • Each call to PARTITION takes O(1) time plus time proportional to the number of comparisons (line 4 of PARTITION)

Lemma 7.1: Let X = total number of comparisons in line 4 of PARTITION over the entire execution. Then the running time of QUICKSORT is O(n + X).

Computing Expected Comparisons

Rename elements as z₁, z₂, ..., zₙ where zᵢ is the i-th smallest element. Define Zᵢⱼ = {zᵢ, zᵢ₊₁, ..., zⱼ}.

Key fact: Each pair of elements is compared at most once (elements are only compared to the pivot, and a pivot is never compared again).

Define indicator random variable:

Xᵢⱼ = I{zᵢ is compared to zⱼ}

Total comparisons:

X = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ Xᵢⱼ

By linearity of expectation:

E[X] = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ Pr{zᵢ is compared to zⱼ}

When Are Two Elements Compared?

  • zᵢ and zⱼ are compared if and only if the first element chosen as a pivot from Zᵢⱼ is either zᵢ or zⱼ
  • If some other element x with zᵢ < x < zⱼ is chosen as pivot first, then zᵢ and zⱼ are separated into different partitions and never compared

Example: For input 1–10, if pivot 7 is chosen first:

  • {1,2,3,4,5,6} and {8,9,10} are separated
  • 2 and 9 will never be compared (7 was the first pivot from Z₂,₉)
  • 7 and 9 are compared (7 is the first pivot from Z₇,₉)

Computing the Probability

Since Zᵢⱼ has j - i + 1 elements and pivots are chosen randomly:

Pr{zᵢ is compared to zⱼ} = Pr{zᵢ or zⱼ is first pivot from Zᵢⱼ}
                          = 1/(j - i + 1) + 1/(j - i + 1)
                          = 2/(j - i + 1)

Final Bound

E[X] = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ 2/(j - i + 1)

Substituting k = j - i:

E[X] = Σᵢ₌₁ⁿ⁻¹ Σₖ₌₁ⁿ⁻ⁱ 2/(k + 1)
     < Σᵢ₌₁ⁿ⁻¹ Σₖ₌₁ⁿ 2/k
     = Σᵢ₌₁ⁿ⁻¹ O(lg n)
     = O(n lg n)

(Using the bound on the harmonic series: Σₖ₌₁ⁿ 1/k = O(lg n))

Conclusion: Using RANDOMIZED-PARTITION, the expected running time of quicksort is O(n lg n) when element values are distinct. Combined with the Θ(n lg n) best-case lower bound, the expected running time is Θ(n lg n).


Summary

PropertyValue
Worst-case running timeΘ(n²)
Best-case running timeΘ(n lg n)
Expected running time (randomized)Θ(n lg n)
In-placeYes
StableNo
Partition timeΘ(n)

Key takeaways:

  • Worst case occurs with maximally unbalanced partitions (e.g., already sorted input)
  • Any constant-proportion split yields O(n lg n) — the algorithm is robust
  • Randomization eliminates dependence on input ordering
  • Expected number of comparisons: O(n lg n), derived via indicator random variables and linearity of expectation
  • The PARTITION procedure by Lomuto selects the last element as pivot; Hoare's original version selects the first

Chapter Notes

  • Quicksort was invented by C.A.R. Hoare; his original partition appears in Problem 7-1
  • The PARTITION procedure in Section 7.1 is due to N. Lomuto
  • The analysis in Section 7.4 is due to Avrim Blum
  • McIlroy showed how to engineer a "killer adversary" that forces Θ(n²) on virtually any quicksort implementation

Previous chapter

Heapsort