Pagefy

Book

Introduction to Algorithms

1.The Role of Algorithms in Computing 2.Getting Started 3.Growth of Functions 4.Divide and Conquer 5.Probabilistic Analysis and Randomized Algorithms 6.Heapsort 7.Quicksort 8.Sorting in Linear Time 9.Medians and Order Statistics 10.Elementary Data Structures 11.Hash Tables 12.Binary Search Trees 13.Red Black Trees 14.Augmenting Data Structures 15.Dynamic Programming 16.Greedy Algorithms 17.Amortized Analysis 18.B Trees 19.Fibonacci Heaps 20.van Emde Boas Trees 21.Data Structures for Disjoint Sets 22.Elementary Graph Algorithms 23.Minimum Spanning Trees 24.Single Source Shortest Paths 25.All Pairs Shortest Paths 26.Maximum Flow 27.Multithreaded Algorithms 28.Matrix Operations 29.Linear Programming 30.Polynomials and the FFT 31.Number Theoretic Algorithms 32.String Matching 33.Computational Geometry 34.NP Completeness 35.Approximation Algorithms 36.Appendix A Summations 37.Appendix B Sets Etc 38.Appendix C Counting and Probability 39.Appendix D Matrices

Quicksort

Introduction to AlgorithmsBuy the book

Chapter 7: Quicksort

Quicksort has a worst-case running time of Θ(n²) but is often the best practical choice for sorting because:

Expected running time is Θ(n lg n) with very small constant factors
Sorts in place (no auxiliary array needed)
Works well in virtual-memory environments

7.1 Description of Quicksort

Quicksort applies the divide-and-conquer paradigm to sort a subarray A[p..r]:

Divide: Partition A[p..r] into two (possibly empty) subarrays A[p..q-1] and A[q+1..r] such that:
- Every element in A[p..q-1] ≤ A[q]
- A[q] ≤ every element in A[q+1..r]
- Compute index q as part of partitioning
Conquer: Recursively sort A[p..q-1] and A[q+1..r]
Combine: No work needed — the array is already sorted after the recursive calls

QUICKSORT(A, p, r)
1  if p < r
2      q = PARTITION(A, p, r)
3      QUICKSORT(A, p, q - 1)
4      QUICKSORT(A, q + 1, r)

To sort an entire array: QUICKSORT(A, 1, A.length)

Partitioning the Array

The PARTITION procedure is the key to the algorithm. It rearranges A[p..r] in place, selecting x = A[r] as the pivot element.

PARTITION(A, p, r)
1  x = A[r]                  // pivot
2  i = p - 1
3  for j = p to r - 1
4      if A[j] ≤ x
5          i = i + 1
6          exchange A[i] with A[j]
7  exchange A[i + 1] with A[r]
8  return i + 1

How it works: PARTITION maintains four regions in the array:

Region	Range	Property
1	`A[p..i]`	Elements ≤ pivot
2	`A[i+1..j-1]`	Elements > pivot
3	`A[j..r-1]`	Unrestricted (not yet examined)
4	`A[r]`	Pivot element `x`

Loop Invariant

At the beginning of each iteration of the for loop (lines 3–6), for any index k:

If p ≤ k ≤ i, then A[k] ≤ x
If i + 1 ≤ k ≤ j - 1, then A[k] > x
If k = r, then A[k] = x

Proof of correctness:

Initialization: Before the first iteration, i = p - 1 and j = p. The first two conditions hold trivially (empty ranges). Line 1 satisfies the third condition.
Maintenance: Two cases per iteration:
- If A[j] > x: just increment j; the element joins the "greater than" region
- If A[j] ≤ x: increment i, swap A[i] with A[j], then increment j; the element joins the "less than or equal" region
Termination: When j = r, every element is in one of the three sets. The final swap (line 7) places the pivot between the two partitions.

Running time of PARTITION on a subarray of size n = r - p + 1 is Θ(n).

After line 2 of QUICKSORT, A[q] is strictly less than every element of A[q+1..r].

7.2 Performance of Quicksort

The running time depends on whether partitioning is balanced or unbalanced.

Worst-Case Partitioning

Occurs when PARTITION produces one subproblem with n - 1 elements and one with 0 elements at every level.

Recurrence:

T(n) = T(n - 1) + T(0) + Θ(n)
     = T(n - 1) + Θ(n)

Summing costs at each level gives an arithmetic series → T(n) = Θ(n²)
This happens when the input is already sorted — a case where insertion sort runs in O(n)

Best-Case Partitioning

Occurs when PARTITION produces two subproblems each of size ≤ n/2.

Recurrence:

T(n) = 2T(n/2) + Θ(n)

By case 2 of the Master Theorem → T(n) = Θ(n lg n)

Balanced Partitioning

The average case is much closer to the best case than the worst case.

Even a 9-to-1 split at every level yields O(n lg n):
```
T(n) = T(9n/10) + T(n/10) + cn
```
- Every level of the recursion tree has cost cn
- Recursion terminates at depth log₁₀ n = Θ(lg n) (shortest path) and log₁₀/₉ n = Θ(lg n) (longest path)
- Total cost: O(n lg n)
Even a 99-to-1 split yields O(n lg n)
Any split of constant proportionality yields a recursion tree of depth Θ(lg n) with O(n) work per level → O(n lg n)

Intuition for the Average Case

Assume all permutations of input are equally likely
In practice, some splits are well-balanced, some are unbalanced
~80% of the time, PARTITION produces a split more balanced than 9-to-1
~20% of the time, the split is less balanced than 9-to-1

Key insight: A "bad" split followed by a "good" split is no worse than a single balanced split:

Bad split at root: subarrays of size 0 and n - 1, cost = n
Good split at next level: subarrays of size (n-1)/2 - 1 and (n-1)/2, cost = n - 1
Combined cost: Θ(n) + Θ(n-1) = Θ(n) — same as a single balanced split producing two subarrays of size (n-1)/2
The cost of the bad split is absorbed into the cost of the good split

7.3 A Randomized Version of Quicksort

We cannot always assume all permutations are equally likely. Random sampling provides good expected performance over all inputs.

Instead of always using A[r] as the pivot, select a randomly chosen element from A[p..r] by swapping A[r] with a random element before partitioning.

RANDOMIZED-PARTITION(A, p, r)
1  i = RANDOM(p, r)
2  exchange A[r] with A[i]
3  return PARTITION(A, p, r)

RANDOMIZED-QUICKSORT(A, p, r)
1  if p < r
2      q = RANDOMIZED-PARTITION(A, p, r)
3      RANDOMIZED-QUICKSORT(A, p, q - 1)
4      RANDOMIZED-QUICKSORT(A, q + 1, r)

Why this works:

The pivot x = A[r] is equally likely to be any of the r - p + 1 elements in the subarray
We expect the split to be reasonably well balanced on average
No particular input elicits worst-case behavior

7.4 Analysis of Quicksort

7.4.1 Worst-Case Analysis

Goal: Prove the worst-case running time is Θ(n²).

Recurrence:

T(n) = max₀≤q≤n₋₁ (T(q) + T(n - q - 1)) + Θ(n)       (7.1)

The parameter q ranges from 0 to n - 1 (PARTITION produces two subproblems with total size n - 1).

Upper bound proof (substitution method):

Guess: T(n) ≤ cn²

Substitute into recurrence:

T(n) ≤ max₀≤q≤n₋₁ (cq² + c(n - q - 1)²) + Θ(n)
     = c · max₀≤q≤n₋₁ (q² + (n - q - 1)²) + Θ(n)

The expression q² + (n - q - 1)² achieves its maximum at the endpoints q = 0 or q = n - 1 (second derivative w.r.t. q is positive)
This gives: max ≤ (n - 1)² = n² - 2n + 1
Therefore:
```
T(n) ≤ cn² - c(2n - 1) + Θ(n) ≤ cn²
```
(choosing c large enough so c(2n - 1) dominates the Θ(n) term)
Combined with the Ω(n²) lower bound from the unbalanced partitioning case → T(n) = Θ(n²)

7.4.2 Expected Running Time

Assumption: All element values are distinct.

Running Time and Comparisons

Key observations:

Each pivot element is never included in future recursive calls → at most n calls to PARTITION
Each call to PARTITION takes O(1) time plus time proportional to the number of comparisons (line 4 of PARTITION)

Lemma 7.1: Let X = total number of comparisons in line 4 of PARTITION over the entire execution. Then the running time of QUICKSORT is O(n + X).

Computing Expected Comparisons

Rename elements as z₁, z₂, ..., zₙ where zᵢ is the i-th smallest element. Define Zᵢⱼ = {zᵢ, zᵢ₊₁, ..., zⱼ}.

Key fact: Each pair of elements is compared at most once (elements are only compared to the pivot, and a pivot is never compared again).

Define indicator random variable:

Xᵢⱼ = I{zᵢ is compared to zⱼ}

Total comparisons:

X = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ Xᵢⱼ

By linearity of expectation:

E[X] = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ Pr{zᵢ is compared to zⱼ}

When Are Two Elements Compared?

zᵢ and zⱼ are compared if and only if the first element chosen as a pivot from Zᵢⱼ is either zᵢ or zⱼ
If some other element x with zᵢ < x < zⱼ is chosen as pivot first, then zᵢ and zⱼ are separated into different partitions and never compared

Example: For input 1–10, if pivot 7 is chosen first:

{1,2,3,4,5,6} and {8,9,10} are separated
2 and 9 will never be compared (7 was the first pivot from Z₂,₉)
7 and 9 are compared (7 is the first pivot from Z₇,₉)

Computing the Probability

Since Zᵢⱼ has j - i + 1 elements and pivots are chosen randomly:

Pr{zᵢ is compared to zⱼ} = Pr{zᵢ or zⱼ is first pivot from Zᵢⱼ}
                          = 1/(j - i + 1) + 1/(j - i + 1)
                          = 2/(j - i + 1)

Final Bound

E[X] = Σᵢ₌₁ⁿ⁻¹ Σⱼ₌ᵢ₊₁ⁿ 2/(j - i + 1)

Substituting k = j - i:

E[X] = Σᵢ₌₁ⁿ⁻¹ Σₖ₌₁ⁿ⁻ⁱ 2/(k + 1)
     < Σᵢ₌₁ⁿ⁻¹ Σₖ₌₁ⁿ 2/k
     = Σᵢ₌₁ⁿ⁻¹ O(lg n)
     = O(n lg n)

(Using the bound on the harmonic series: Σₖ₌₁ⁿ 1/k = O(lg n))

Conclusion: Using RANDOMIZED-PARTITION, the expected running time of quicksort is O(n lg n) when element values are distinct. Combined with the Θ(n lg n) best-case lower bound, the expected running time is Θ(n lg n).

Summary

Property	Value
Worst-case running time	Θ(n²)
Best-case running time	Θ(n lg n)
Expected running time (randomized)	Θ(n lg n)
In-place	Yes
Stable	No
Partition time	Θ(n)

Key takeaways:

Worst case occurs with maximally unbalanced partitions (e.g., already sorted input)
Any constant-proportion split yields O(n lg n) — the algorithm is robust
Randomization eliminates dependence on input ordering
Expected number of comparisons: O(n lg n), derived via indicator random variables and linearity of expectation
The PARTITION procedure by Lomuto selects the last element as pivot; Hoare's original version selects the first

Chapter Notes

Quicksort was invented by C.A.R. Hoare; his original partition appears in Problem 7-1
The PARTITION procedure in Section 7.1 is due to N. Lomuto
The analysis in Section 7.4 is due to Avrim Blum
McIlroy showed how to engineer a "killer adversary" that forces Θ(n²) on virtually any quicksort implementation

Previous chapter

Heapsort

Next chapter

Sorting in Linear Time