Quicksort
Introduction to Algorithms, Chapter 7: Quicksort
Quicksort has a worst-case running time of Θ(n²) but is often the best practical choice for sorting because:
- Expected running time is Θ(n lg n) with very small constant factors
- Sorts in place (no auxiliary array needed)
- Works well in virtual-memory environments
7.1 Description of Quicksort
Quicksort applies the divide-and-conquer paradigm to sort a subarray A[p..r]:
- Divide: Partition A[p..r] into two (possibly empty) subarrays A[p..q-1] and A[q+1..r] such that every element in A[p..q-1] is ≤ A[q], and A[q] is ≤ every element in A[q+1..r]. The index q is computed as part of the partitioning procedure.
- Conquer: Recursively sort A[p..q-1] and A[q+1..r].
- Combine: No work needed; the array is already sorted after the recursive calls.
QUICKSORT(A, p, r)
1 if p < r
2 q = PARTITION(A, p, r)
3 QUICKSORT(A, p, q - 1)
4 QUICKSORT(A, q + 1, r)
To sort an entire array: QUICKSORT(A, 1, A.length)
Partitioning the Array
The PARTITION procedure is the key to the algorithm. It rearranges A[p..r] in place, selecting x = A[r] as the pivot element.
PARTITION(A, p, r)
1 x = A[r] // pivot
2 i = p - 1
3 for j = p to r - 1
4 if A[j] ≤ x
5 i = i + 1
6 exchange A[i] with A[j]
7 exchange A[i + 1] with A[r]
8 return i + 1
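The pseudocode above translates almost line for line into Python. A minimal 0-indexed sketch (mine, not from the text; the sample input is illustrative):

```python
def partition(A, p, r):
    """Lomuto partition of A[p..r]: pivot x = A[r]; returns the pivot's final index q."""
    x = A[r]                            # pivot
    i = p - 1                           # right edge of the "<= pivot" region
    for j in range(p, r):               # j scans the unexamined region
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]     # drop the pivot between the two regions
    return i + 1

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, q + 1, r)

data = [2, 8, 7, 1, 3, 5, 6, 4]
quicksort(data, 0, len(data) - 1)       # sort the entire array
print(data)                             # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note the 0-indexed initial call `quicksort(A, 0, len(A) - 1)`, versus QUICKSORT(A, 1, A.length) in the 1-indexed pseudocode.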
How it works: PARTITION maintains four regions in the array:
| Region | Range | Property |
|---|---|---|
| 1 | A[p..i] | Elements ≤ pivot |
| 2 | A[i+1..j-1] | Elements > pivot |
| 3 | A[j..r-1] | Unrestricted (not yet examined) |
| 4 | A[r] | Pivot element x |
Loop Invariant
At the beginning of each iteration of the for loop (lines 3–6), for any index k:
- If p ≤ k ≤ i, then A[k] ≤ x
- If i + 1 ≤ k ≤ j - 1, then A[k] > x
- If k = r, then A[k] = x
Proof of correctness:
- Initialization: Before the first iteration, i = p - 1 and j = p. The first two conditions hold trivially (empty ranges), and line 1 satisfies the third condition.
- Maintenance: Two cases per iteration:
  - If A[j] > x: just increment j; the element joins the "greater than" region
  - If A[j] ≤ x: increment i, swap A[i] with A[j], then increment j; the element joins the "less than or equal" region
- Termination: When j = r, every element is in one of the three sets. The final swap (line 7) places the pivot between the two partitions.
Running time of PARTITION on a subarray of size n = r - p + 1 is Θ(n).
After line 2 of QUICKSORT, A[q] is strictly less than every element of A[q+1..r].
7.2 Performance of Quicksort
The running time depends on whether partitioning is balanced or unbalanced.
Worst-Case Partitioning
Occurs when PARTITION produces one subproblem with n - 1 elements and one with 0 elements at every level.
Recurrence:
T(n) = T(n - 1) + T(0) + Θ(n)
= T(n - 1) + Θ(n)
- Summing costs at each level gives an arithmetic series → T(n) = Θ(n²)
- This happens when the input is already sorted — a case where insertion sort runs in O(n)
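The quadratic blowup on sorted input is easy to observe by counting the line-4 comparisons. A small instrumented sketch (mine, not from the text):

```python
def quicksort_count(A, p, r):
    """Sort A[p..r] with Lomuto quicksort; return the number of line-4 comparisons."""
    if p >= r:
        return 0
    x, i, comparisons = A[r], p - 1, 0
    for j in range(p, r):
        comparisons += 1                  # one "A[j] <= x" test per scanned element
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    q = i + 1
    return (comparisons
            + quicksort_count(A, p, q - 1)
            + quicksort_count(A, q + 1, r))

n = 100
print(quicksort_count(list(range(n)), 0, n - 1))   # 4950 == n*(n-1)//2
```

On already-sorted input every partition is maximally unbalanced, so the total is exactly (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons.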
Best-Case Partitioning
Occurs when PARTITION produces two subproblems each of size ≤ n/2.
Recurrence:
T(n) = 2T(n/2) + Θ(n)
- By case 2 of the Master Theorem → T(n) = Θ(n lg n)
Balanced Partitioning
The average case is much closer to the best case than the worst case.
- Even a 9-to-1 split at every level yields O(n lg n):
  - Recurrence: T(n) = T(9n/10) + T(n/10) + cn
  - Every level of the recursion tree has cost at most cn
  - The recursion terminates at depth log₁₀ n = Θ(lg n) along the shortest path and log₁₀/₉ n = Θ(lg n) along the longest path
  - Total cost: O(n lg n)
- Even a 99-to-1 split yields O(n lg n)
- Any split of constant proportionality yields a recursion tree of depth Θ(lg n) with O(n) work per level → O(n lg n)
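The hidden constant can be probed numerically. A rough sketch (mine, not from the chapter) that unrolls the 9-to-1 recurrence with integer subproblem sizes and unit cost c = 1:

```python
from functools import lru_cache
from math import log2

@lru_cache(maxsize=None)
def T(n):
    """Unroll T(n) = T(9n/10) + T(n/10) + n with integer sizes and T(1) = 0."""
    if n <= 1:
        return 0
    big = 9 * n // 10
    return T(big) + T(n - big) + n

for n in (10**3, 10**4, 10**5):
    # ratio of the recurrence's total cost to n lg n stays a small constant
    print(n, T(n), round(T(n) / (n * log2(n)), 2))
```

The ratio settles near a constant (roughly 2, consistent with depth log₁₀/₉ n), confirming that a lopsided but constant-proportion split only inflates the constant in Θ(n lg n).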
Intuition for the Average Case
- Assume all permutations of input are equally likely
- In practice, some splits are well-balanced, some are unbalanced
- ~80% of the time, PARTITION produces a split more balanced than 9-to-1
- ~20% of the time, the split is less balanced than 9-to-1
Key insight: A "bad" split followed by a "good" split is no worse than a single balanced split:
- Bad split at root: subarrays of size 0 and n - 1, cost = n
- Good split at next level: subarrays of size (n-1)/2 - 1 and (n-1)/2, cost = n - 1
- Combined cost: Θ(n) + Θ(n-1) = Θ(n) — same as a single balanced split producing two subarrays of size (n-1)/2
- The cost of the bad split is absorbed into the cost of the good split
7.3 A Randomized Version of Quicksort
We cannot always assume all permutations are equally likely. Random sampling provides good expected performance over all inputs.
Instead of always using A[r] as the pivot, select a randomly chosen element from A[p..r] by swapping A[r] with a random element before partitioning.
RANDOMIZED-PARTITION(A, p, r)
1 i = RANDOM(p, r)
2 exchange A[r] with A[i]
3 return PARTITION(A, p, r)
RANDOMIZED-QUICKSORT(A, p, r)
1 if p < r
2 q = RANDOMIZED-PARTITION(A, p, r)
3 RANDOMIZED-QUICKSORT(A, p, q - 1)
4 RANDOMIZED-QUICKSORT(A, q + 1, r)
Why this works:
- The pivot x = A[r] is equally likely to be any of the r - p + 1 elements in the subarray
- We expect the split to be reasonably well balanced on average
- No particular input always elicits the worst-case behavior
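The two randomized procedures transcribe directly to Python (a 0-indexed sketch of mine, with the Lomuto partition inlined):

```python
import random

def randomized_partition(A, p, r):
    """Swap a uniformly random element into position r, then Lomuto-partition."""
    i = random.randint(p, r)            # RANDOM(p, r), both endpoints inclusive
    A[r], A[i] = A[i], A[r]
    x, k = A[r], p - 1                  # pivot is now the random element
    for j in range(p, r):
        if A[j] <= x:
            k += 1
            A[k], A[j] = A[j], A[k]
    A[k + 1], A[r] = A[r], A[k + 1]
    return k + 1

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)

data = list(range(1000))                # sorted: the worst case for the fixed pivot
randomized_quicksort(data, 0, len(data) - 1)
print(data == sorted(data))             # True
```

On this already-sorted input the deterministic version would recurse to depth n; the randomized version's expected depth is O(lg n).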
7.4 Analysis of Quicksort
7.4.1 Worst-Case Analysis
Goal: Prove the worst-case running time is Θ(n²).
Recurrence:
T(n) = max₀≤q≤n-1 (T(q) + T(n - q - 1)) + Θ(n)    (7.1)
The parameter q ranges from 0 to n - 1 (PARTITION produces two subproblems with total size n - 1).
Upper bound proof (substitution method):
- Guess: T(n) ≤ cn²
- Substitute into the recurrence:
  T(n) ≤ max₀≤q≤n-1 (cq² + c(n - q - 1)²) + Θ(n) = c · max₀≤q≤n-1 (q² + (n - q - 1)²) + Θ(n)
- The expression q² + (n - q - 1)² achieves its maximum over 0 ≤ q ≤ n - 1 at an endpoint, q = 0 or q = n - 1, because its second derivative with respect to q is positive
- This gives max₀≤q≤n-1 (q² + (n - q - 1)²) ≤ (n - 1)² = n² - 2n + 1
- Therefore T(n) ≤ cn² - c(2n - 1) + Θ(n) ≤ cn², choosing c large enough that the c(2n - 1) term dominates the Θ(n) term
- Combined with the Ω(n²) lower bound from the unbalanced partitioning case, T(n) = Θ(n²)
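The endpoint claim in the substitution step can be sanity-checked numerically; a throwaway sketch (n = 20 chosen arbitrarily):

```python
def f(q, n):
    """The expression maximized in the substitution step."""
    return q * q + (n - q - 1) ** 2

n = 20
values = [f(q, n) for q in range(n)]
# the maximum over 0 <= q <= n-1 sits at both endpoints and equals (n-1)^2
print(max(values) == f(0, n) == f(n - 1, n) == (n - 1) ** 2)   # True
```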
7.4.2 Expected Running Time
Assumption: All element values are distinct.
Running Time and Comparisons
Key observations:
- Each pivot element is never included in future recursive calls → at most n calls to PARTITION
- Each call to PARTITION takes O(1) time plus time proportional to the number of comparisons (line 4 of PARTITION)
Lemma 7.1: Let X = total number of comparisons in line 4 of PARTITION over the entire execution. Then the running time of QUICKSORT is O(n + X).
Computing Expected Comparisons
Rename elements as z₁, z₂, ..., zₙ where zᵢ is the i-th smallest element. Define Zᵢⱼ = {zᵢ, zᵢ₊₁, ..., zⱼ}.
Key fact: Each pair of elements is compared at most once (elements are only compared to the pivot, and a pivot is never compared again).
Define indicator random variable:
Xᵢⱼ = I{zᵢ is compared to zⱼ}
Total comparisons:
X = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ Xᵢⱼ
By linearity of expectation:
E[X] = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ Pr{zᵢ is compared to zⱼ}
When Are Two Elements Compared?
- zᵢ and zⱼ are compared if and only if the first element chosen as a pivot from Zᵢⱼ is either zᵢ or zⱼ
- If some other element x with zᵢ < x < zⱼ is chosen as a pivot first, then zᵢ and zⱼ are separated into different partitions and never compared
Example: For input 1–10, if pivot 7 is chosen first:
- {1,2,3,4,5,6} and {8,9,10} are separated
- 2 and 9 will never be compared (7 was the first pivot from Z₂,₉)
- 7 and 9 are compared (7 is the first pivot from Z₇,₉)
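The key fact that no pair is compared twice can be checked by instrumenting the partition loop. An illustrative sketch (mine), assuming distinct elements as in Section 7.4.2:

```python
import random

def quicksort_pairs(A, p, r, pairs):
    """Sort A[p..r], recording every (smaller, larger) pair compared in line 4."""
    if p < r:
        x, i = A[r], p - 1                          # Lomuto partition, pivot A[r]
        for j in range(p, r):
            pairs.append(tuple(sorted((A[j], x))))  # record this comparison
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        q = i + 1
        quicksort_pairs(A, p, q - 1, pairs)
        quicksort_pairs(A, q + 1, r, pairs)

data = list(range(1, 11))                           # the elements 1..10 as z_1..z_10
random.shuffle(data)
pairs = []
quicksort_pairs(data, 0, len(data) - 1, pairs)
print(len(pairs) == len(set(pairs)))                # True: no pair compared twice
```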
Computing the Probability
Since Zᵢⱼ has j - i + 1 elements and pivots are chosen randomly:
Pr{zᵢ is compared to zⱼ} = Pr{zᵢ or zⱼ is first pivot from Zᵢⱼ}
= 1/(j - i + 1) + 1/(j - i + 1)
= 2/(j - i + 1)
Final Bound
E[X] = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ 2/(j - i + 1)
Substituting k = j - i:
E[X] = Σᵢ₌₁ⁿ⁻¹ Σₖ₌₁ⁿ⁻ⁱ 2/(k + 1)
< Σᵢ₌₁ⁿ⁻¹ Σₖ₌₁ⁿ 2/k
= Σᵢ₌₁ⁿ⁻¹ O(lg n)
= O(n lg n)
(Using the bound on the harmonic series: Σₖ₌₁ⁿ 1/k = O(lg n))
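The double sum is small enough to evaluate exactly for modest n; a quick numeric check (mine, not from the chapter) comparing it against 2n ln n:

```python
from math import log

def expected_comparisons(n):
    """Evaluate E[X] = sum over 1 <= i < j <= n of 2 / (j - i + 1) exactly."""
    return sum(2.0 / (j - i + 1)
               for i in range(1, n)
               for j in range(i + 1, n + 1))

for n in (100, 1000):
    # E[X] grows like n lg n, staying below the 2 n ln n envelope
    print(n, round(expected_comparisons(n), 1), round(2 * n * log(n), 1))
```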
Conclusion: Using RANDOMIZED-PARTITION, the expected running time of quicksort is O(n lg n) when element values are distinct. Combined with the Θ(n lg n) best-case lower bound, the expected running time is Θ(n lg n).
Summary
| Property | Value |
|---|---|
| Worst-case running time | Θ(n²) |
| Best-case running time | Θ(n lg n) |
| Expected running time (randomized) | Θ(n lg n) |
| In-place | Yes |
| Stable | No |
| Partition time | Θ(n) |
Key takeaways:
- Worst case occurs with maximally unbalanced partitions (e.g., already sorted input)
- Any constant-proportion split yields O(n lg n) — the algorithm is robust
- Randomization eliminates dependence on input ordering
- Expected number of comparisons: O(n lg n), derived via indicator random variables and linearity of expectation
- The PARTITION procedure by Lomuto selects the last element as pivot; Hoare's original version selects the first
Chapter Notes
- Quicksort was invented by C.A.R. Hoare; his original partition appears in Problem 7-1
- The PARTITION procedure in Section 7.1 is due to N. Lomuto
- The analysis in Section 7.4 is due to Avrim Blum
- McIlroy showed how to engineer a "killer adversary" that forces Θ(n²) on virtually any quicksort implementation