Probabilistic Analysis and Randomized Algorithms
Introduction to Algorithms, Chapter 5
This chapter introduces probabilistic analysis and randomized algorithms — two powerful techniques for analyzing and designing algorithms using probability and randomness.
5.1 The Hiring Problem
Problem Setup
- You need to hire an office assistant through an employment agency
- Candidates arrive one per day, numbered 1 through n
- After each interview, you decide to hire or not
- Interviewing has a low cost cᵢ; hiring has a high cost cₕ
- Strategy: always keep the best candidate seen so far — if a new candidate is better, fire the current one and hire the new one
Pseudocode: HIRE-ASSISTANT(n)
HIRE-ASSISTANT(n)
1 best = 0 // dummy candidate 0 (least qualified)
2 for i = 1 to n
3 interview candidate i
4 if candidate i is better than candidate best
5 best = i
6 hire candidate i
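The pseudocode above can be sketched in Python, assuming candidates are given as a list of distinct quality scores (higher is better); the function returns the number of hires, the quantity m analyzed below:

```python
import math

def hire_assistant(scores):
    """Interview candidates left to right; hire whenever someone beats
    everyone seen so far. Returns the number of hires (m)."""
    best = -math.inf  # dummy candidate 0, worse than everyone
    hires = 0
    for score in scores:
        # interview this candidate
        if score > best:
            best = score
            hires += 1  # fire the current assistant and hire this one
    return hires
```

For example, `hire_assistant([3, 1, 4, 1, 5])` hires the candidates scoring 3, 4, and 5.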
Cost Analysis
- Total cost: O(cᵢ·n + cₕ·m) where m = number of people hired
- Interviewing cost cᵢ·n is fixed (always interview all n candidates)
- Focus is on analyzing the hiring cost cₕ·m, which varies per run
Worst-Case Analysis
- Worst case: candidates arrive in strictly increasing order of quality
- Every candidate is hired → m = n
- Total hiring cost: O(cₕ·n)
Probabilistic Analysis
- Uses probability to analyze expected behavior over a distribution of inputs
- Assumes candidates arrive in random order — i.e., the rank list ⟨rank(1), rank(2), …, rank(n)⟩ is a uniform random permutation of ⟨1, 2, …, n⟩
- Each of the n! permutations is equally likely
- Computes an average-case running time averaged over all possible inputs
Randomized Algorithms
- When we don't know the input distribution, we impose randomness ourselves
- A randomized algorithm has behavior determined by both its input and values from a random-number generator
- RANDOM(a, b): returns an integer in [a, b] uniformly at random; each call is independent
- Key distinction:
- Average-case running time: probability distribution is over the inputs
- Expected running time: the algorithm itself makes random choices
- Advantage: no particular input elicits worst-case behavior — even an adversary cannot produce a bad input
5.2 Indicator Random Variables
Definition
Given a sample space S and an event A, the indicator random variable I{A} is:
I{A} = 1 if A occurs, and 0 if A does not occur
Lemma 5.1
Given a sample space S and an event A, let X_A = I{A}. Then E[X_A] = Pr{A}.
Proof: E[X_A] = 1 · Pr{A} + 0 · Pr{Ā} = Pr{A}
Example: Coin Flips
- For n flips of a fair coin, let Xᵢ = I{ith flip is heads}
- Total heads: X = X₁ + X₂ + … + Xₙ
- By linearity of expectation: E[X] = Σ E[Xᵢ] = Σ (1/2) = n/2
- Linearity of expectation applies even when variables are dependent
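A quick simulation sketch confirms E[X] = n/2 (the trial count and seed here are arbitrary choices):

```python
import random

def average_heads(n, trials, seed=0):
    """Estimate E[X], the expected number of heads in n fair coin flips."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(rng.randint(0, 1) for _ in range(n))  # 1 = heads
    return total / trials
```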
Analysis of the Hiring Problem
- Let Xᵢ = I{candidate i is hired}
- Total hires: X = X₁ + X₂ + … + Xₙ
- Candidate i is hired iff they are the best among the first i candidates
- Probability: Pr{candidate i is hired} = 1/i
- By Lemma 5.1: E[Xᵢ] = 1/i
- By linearity of expectation:
E[X] = Σᵢ₌₁ⁿ (1/i) = ln n + O(1)
Lemma 5.2
Assuming candidates are presented in random order, HIRE-ASSISTANT has an average-case total hiring cost of O(cₕ ln n).
This is a significant improvement over the worst-case cost of O(cₕ·n).
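The bound is easy to check empirically. The sketch below (helper names are mine) averages the number of hires over random permutations and compares the result to the harmonic number H_n = Σ 1/i:

```python
import random

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n = ln n + O(1)."""
    return sum(1.0 / i for i in range(1, n + 1))

def expected_hires(n, trials, seed=1):
    """Average number of hires when candidates arrive in random order."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        order = list(range(n))   # candidate ranks 0..n-1
        rng.shuffle(order)       # uniform random arrival order
        best, hires = -1, 0
        for rank in order:
            if rank > best:
                best, hires = rank, hires + 1
        total += hires
    return total / trials
```

For n = 50 the average comes out close to H_50 ≈ 4.5, far below the worst case of 50 hires.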
5.3 Randomized Algorithms
Key Idea
- Instead of assuming a distribution on inputs, we impose one
- Randomly permute the candidates before running the algorithm
- This enforces the property that every permutation is equally likely
Pseudocode: RANDOMIZED-HIRE-ASSISTANT(n)
RANDOMIZED-HIRE-ASSISTANT(n)
1 randomly permute the list of candidates
2 best = 0
3 for i = 1 to n
4 interview candidate i
5 if candidate i is better than candidate best
6 best = i
7 hire candidate i
Lemma 5.3
The expected hiring cost of RANDOMIZED-HIRE-ASSISTANT is O(cₕ ln n).
Probabilistic Analysis vs. Randomized Algorithms
| Aspect | Probabilistic Analysis (Lemma 5.2) | Randomized Algorithm (Lemma 5.3) |
|---|---|---|
| Assumption | Input comes in random order | No assumption on input |
| Randomness source | The input | The algorithm itself |
| Cost term | Average-case | Expected |
| Guarantee | Over input distribution | For any input |
Randomly Permuting Arrays
Method 1: PERMUTE-BY-SORTING
PERMUTE-BY-SORTING(A)
1 n = A.length
2 let P[1..n] be a new array
3 for i = 1 to n
4 P[i] = RANDOM(1, n³)
5 sort A, using P as sort keys
- Assign each element a random priority, then sort by priority
- Running time: Θ(n lg n) (dominated by the sort)
- Range 1 to n³ makes it likely all priorities are unique (probability ≥ 1 − 1/n)
Lemma 5.4: PERMUTE-BY-SORTING produces a uniform random permutation (assuming distinct priorities).
Proof sketch: The probability of any specific permutation is:
(1/n) · (1/(n-1)) · (1/(n-2)) · … · (1/1) = 1/n!
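A Python sketch of the same idea (returning a permuted copy rather than sorting in place):

```python
import random

def permute_by_sorting(a, rng=random):
    """Return a random permutation of `a` by sorting on random priorities.

    Priorities are drawn from [1, n^3], so with probability >= 1 - 1/n
    they are all distinct, and then every permutation is equally likely.
    """
    n = len(a)
    priorities = [rng.randint(1, n ** 3) for _ in range(n)]
    paired = sorted(zip(priorities, range(n)), key=lambda p: p[0])
    return [a[i] for _, i in paired]
```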
Method 2: RANDOMIZE-IN-PLACE (Fisher-Yates Shuffle)
RANDOMIZE-IN-PLACE(A)
1 n = A.length
2 for i = 1 to n
3 swap A[i] with A[RANDOM(i, n)]
- Runs in O(n) time — more efficient than sorting
- In iteration i, choose A[i] randomly from A[i..n]
- After iteration i, A[i] is never altered
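In Python (0-indexed, so RANDOM(i, n) becomes `randint(i, n - 1)`):

```python
import random

def randomize_in_place(a, rng=random):
    """Fisher-Yates shuffle: uniform random permutation in O(n) time."""
    n = len(a)
    for i in range(n):
        j = rng.randint(i, n - 1)   # choose uniformly from A[i..n-1]
        a[i], a[j] = a[j], a[i]     # A[i] is final after this swap
    return a
```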
Lemma 5.5: RANDOMIZE-IN-PLACE computes a uniform random permutation.
Proof (by loop invariant): Just prior to iteration i, for each possible (i−1)-permutation, the subarray A[1..i−1] contains it with probability (n−i+1)!/n!.
- Initialization (i=1): Empty subarray trivially contains the 0-permutation with probability n!/n! = 1 ✓
- Maintenance: Pr{i-permutation in A[1..i]} = (1/(n−i+1)) · ((n−i+1)!/n!) = (n−i)!/n! ✓
- Termination (i=n+1): A[1..n] is any n-permutation with probability 0!/n! = 1/n! ✓
Common Pitfall
Simply swapping A[i] with A[RANDOM(1, n)] (choosing from the entire array) does not produce a uniform random permutation.
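The failure can be seen by exhaustive enumeration: the biased shuffle has nⁿ equally likely choice sequences, and nⁿ is not divisible by n! for n ≥ 3, so the n! permutations cannot all be equally likely. A sketch for n = 3 (27 sequences spread over 6 permutations):

```python
from collections import Counter
from itertools import product

def naive_shuffle_counts(n):
    """Count how often each permutation arises when A[i] is swapped with
    A[RANDOM(0, n-1)], over all n^n equally likely choice sequences."""
    counts = Counter()
    for choices in product(range(n), repeat=n):
        a = list(range(n))
        for i, j in enumerate(choices):
            a[i], a[j] = a[j], a[i]
        counts[tuple(a)] += 1
    return counts
```

For n = 3, each permutation arises 4 or 5 times out of 27, never the uniform 27/6.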
5.4 Probabilistic Analysis and Further Uses of Indicator Random Variables
5.4.1 The Birthday Paradox
Question: How many people must be in a room before there is a ≥ 50% chance that two share a birthday?
Answer: Surprisingly, only 23 people.
Setup
- k people, n = 365 days in a year
- Birthdays are independent and uniformly distributed
- Pr{person i and person j share a birthday} = 1/n
Exact Analysis
Let Bₖ = event that all k birthdays are distinct:
Pr{Bₖ} = 1 · (n-1)/n · (n-2)/n · … · (n-k+1)/n
= ∏ᵢ₌₁ᵏ⁻¹ (1 - i/n)
Using the inequality 1 + x ≤ eˣ:
Pr{Bₖ} ≤ e^(-1/n) · e^(-2/n) · … · e^(-(k-1)/n)
= e^(-k(k-1)/(2n))
This is ≤ 1/2 when k(k−1) ≥ 2n ln 2, giving:
k ≥ (1 + √(1 + 8n·ln2)) / 2
- For n = 365: k ≥ 23
- For Mars (n = 669 Martian days): k ≥ 31
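The exact product is easy to evaluate directly (a sketch; function names are mine):

```python
def distinct_prob(k, n=365):
    """Exact Pr{B_k}: probability that k people have k distinct birthdays."""
    p = 1.0
    for i in range(1, k):
        p *= (n - i) / n
    return p

def smallest_group(n=365):
    """Smallest k with Pr{some shared birthday} >= 1/2."""
    k = 1
    while distinct_prob(k, n) > 0.5:
        k += 1
    return k
```

This reproduces both figures above: `smallest_group()` gives 23 for n = 365 and 31 for n = 669.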
Analysis Using Indicator Random Variables
- Let Xᵢⱼ = I{person i and person j share a birthday}, for 1 ≤ i < j ≤ k
- E[Xᵢⱼ] = 1/n
- Total matching pairs: X = Σᵢ Σⱼ>ᵢ Xᵢⱼ
- By linearity of expectation:
E[X] = C(k,2) · (1/n) = k(k-1) / (2n)
- When k(k−1) ≥ 2n, expected matching pairs ≥ 1
- For n = 365: need k ≥ 28 for E[X] ≥ 1
- Both analyses give Θ(√n) asymptotically
5.4.2 Balls and Bins
Consider tossing identical balls into b bins independently and uniformly at random.
Key Results
- Expected balls in a given bin (after n tosses): n/b
  - The number of balls in a bin follows the binomial distribution b(k; n, 1/b)
- Expected tosses until a given bin gets a ball: b
  - The number of tosses follows a geometric distribution with success probability 1/b
- Expected tosses until every bin has at least one ball: b ln b + O(b)
  - Partition tosses into stages: stage i consists of the tosses after the (i−1)th hit, up through the ith hit
  - A "hit" is a toss that lands in an empty bin
  - During stage i, (i−1) bins are occupied and (b−i+1) are empty
  - Probability of a hit in stage i: (b−i+1)/b
  - nᵢ, the number of tosses in stage i, follows a geometric distribution with E[nᵢ] = b/(b−i+1)
  - Total: E[n] = Σᵢ₌₁ᵇ b/(b−i+1) = b · Σᵢ₌₁ᵇ (1/i) = b(ln b + O(1))
This is also known as the coupon collector's problem: collecting b different coupons requires approximately b ln b random draws.
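A simulation sketch of the stage argument (names are mine); the observed average should land near b·H_b = b ln b + O(b):

```python
import random

def tosses_to_fill(b, rng):
    """Toss balls uniformly into b bins until no bin is empty;
    return the number of tosses used."""
    seen = [False] * b
    empty, tosses = b, 0
    while empty:
        j = rng.randrange(b)
        tosses += 1
        if not seen[j]:          # a "hit": ball lands in an empty bin
            seen[j] = True
            empty -= 1
    return tosses

def average_fill(b, trials, seed=0):
    rng = random.Random(seed)
    return sum(tosses_to_fill(b, rng) for _ in range(trials)) / trials
```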
5.4.3 Streaks
Question: In n fair coin flips, what is the expected length of the longest streak of consecutive heads?
Answer: Θ(lg n)
Upper Bound: O(lg n)
- Let Aᵢₖ = event that a streak of ≥ k heads starts at position i
- Pr{Aᵢₖ} = 1/2ᵏ (by independence)
- For k = 2⌈lg n⌉: Pr{Aᵢₖ} ≤ 1/n²
- By Boole's inequality (union bound): probability of any streak of length ≥ 2⌈lg n⌉ is < 1/n
- Splitting E[L] into two ranges and bounding:
E[L] < 2⌈lg n⌉ · 1 + n · (1/n) = O(lg n)
Lower Bound: Ω(lg n)
- Partition n flips into groups of s = ⌊(lg n)/2⌋ consecutive flips
- Probability one group is all heads: ≥ 1/√n
- Probability no group is all heads: ≤ (1 − 1/√n)^(2n/lg n − 1) = O(1/n)
- So with probability ≥ 1 − O(1/n), the longest streak is ≥ ⌊(lg n)/2⌋
- Therefore: E[L] = Ω(lg n)
Indicator Variable Approach
- Let Xᵢₖ = I{streak of ≥ k heads starts at position i}
- Expected number of positions at which a streak of ≥ k heads begins: E[X] = (n−k+1)/2ᵏ
- For k = c·lg n: E[X] = Θ(1/n^(c−1))
- c large → few such streaks (unlikely)
- c = 1/2 → E[X] = Θ(√n) → many such streaks (likely)
- Confirms expected longest streak is Θ(lg n)
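A simulation sketch; for n = 1024 flips the average longest streak should sit near lg n = 10:

```python
import random

def longest_heads_streak(n, rng):
    """Length of the longest run of consecutive heads in n fair flips."""
    best = cur = 0
    for _ in range(n):
        if rng.randint(0, 1):    # 1 = heads
            cur += 1
            best = max(best, cur)
        else:
            cur = 0
    return best

def average_longest(n, trials, seed=0):
    rng = random.Random(seed)
    return sum(longest_heads_streak(n, rng) for _ in range(trials)) / trials
```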
Probability of Long Streaks
| Streak length | Probability of occurring |
|---|---|
| ≥ r⌈lg n⌉ | ≤ 1/n^(r−1) |
| ≥ 2⌈lg n⌉ = 20 (for n = 1000) | ≤ 1/1000 |
| ≥ 3⌈lg n⌉ = 30 (for n = 1000) | ≤ 1/1,000,000 |
5.4.4 The On-Line Hiring Problem
A variant where you must hire exactly once and decide immediately after each interview.
Strategy
- Interview and reject the first k candidates (observation phase)
- Then hire the first candidate who is better than all of the first k
- If no such candidate appears, hire the last one (candidate n)
Pseudocode: ON-LINE-MAXIMUM(k, n)
ON-LINE-MAXIMUM(k, n)
1 bestscore = -∞
2 for i = 1 to k
3 if score(i) > bestscore
4 bestscore = score(i)
5 for i = k+1 to n
6 if score(i) > bestscore
7 return i
8 return n
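A Python sketch (0-indexed), together with a quick empirical check (helper names are mine) that the strategy with k ≈ n/e hires the best candidate roughly 1/e of the time:

```python
import math
import random

def online_maximum(scores, k):
    """Reject the first k candidates, then hire the first candidate who
    beats all of them; fall back to the last one. Returns the hired index."""
    best = max(scores[:k], default=-math.inf)
    for i in range(k, len(scores)):
        if scores[i] > best:
            return i
    return len(scores) - 1

def success_rate(n, k, trials, seed=0):
    """Fraction of random orders in which the best candidate is hired."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        scores = list(range(n))
        rng.shuffle(scores)
        wins += scores[online_maximum(scores, k)] == n - 1
    return wins / trials
```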
Analysis
- Let S = event of successfully hiring the best candidate
- Sᵢ = event of success when the best candidate is in position i
- Pr{Sᵢ} = 0 for i = 1, …, k (always rejected)
- For i > k: Pr{Sᵢ} = Pr{Bᵢ} · Pr{Oᵢ}
- Bᵢ = best candidate is in position i → Pr{Bᵢ} = 1/n
- Oᵢ = none of positions k+1 to i−1 are selected → Pr{Oᵢ} = k/(i−1)
- These events are independent
- Pr{Sᵢ} = k / (n(i−1))
Pr{S} = Σᵢ₌ₖ₊₁ⁿ k/(n(i-1)) = (k/n) · Σᵢ₌ₖⁿ⁻¹ (1/i)
Optimal k
- Approximate the sum with integrals: Pr{S} ≈ (k/n)(ln n − ln k)
- Differentiate and set to 0: ln k = ln n − 1 = ln(n/e)
- Optimal k = n/e
- Success probability: at least 1/e ≈ 0.368
By interviewing and rejecting the first n/e candidates, then hiring the first one better than all of them, we find the best candidate with probability at least 1/e — regardless of n.