Divide and Conquer
Introduction to Algorithms, Chapter 4: Divide-and-Conquer
Overview
The divide-and-conquer paradigm solves a problem recursively by applying three steps at each level of recursion:
- Divide the problem into a number of subproblems that are smaller instances of the same problem.
- Conquer the subproblems by solving them recursively. If subproblem sizes are small enough, solve them directly (base case).
- Combine the solutions to the subproblems into the solution for the original problem.
- When subproblems are large enough to solve recursively → recursive case
- When subproblems are small enough to solve directly → base case
Recurrences
Recurrences describe the running time of divide-and-conquer algorithms — equations or inequalities that describe a function in terms of its value on smaller inputs.
- Example (merge sort): T(n) = 2T(n/2) + Θ(n), solution: T(n) = Θ(n lg n)
- Unequal splits are possible: T(n) = T(2n/3) + T(n/3) + Θ(n)
- Linear recursion: T(n) = T(n − 1) + Θ(1)
Three methods for solving recurrences:
- Substitution method — guess a bound, prove by induction
- Recursion-tree method — convert to a tree, sum costs at each level
- Master method — cookbook method for recurrences of the form T(n) = aT(n/b) + f(n)
Technicalities
- Floors and ceilings are typically omitted for convenience (they usually don't affect asymptotic bounds)
- Boundary conditions (e.g., T(1) = Θ(1)) are generally omitted; T(n) is assumed constant for small n
- Recurrence inequalities: T(n) ≤ 2T(n/2) + Θ(n) → use O-notation; T(n) ≥ ... → use Ω-notation
4.1 The Maximum-Subarray Problem
Problem Statement
Given an array of numbers (containing some negatives), find the contiguous subarray whose values have the greatest sum.
- Motivated by the stock trading problem: maximize profit by buying low and selling high
- Transform daily prices into daily changes → finding the maximum subarray of changes gives the optimal buy/sell window
- Brute force: check all Θ(n²) pairs → Θ(n²) time
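The brute-force approach can be sketched directly in Python; keeping a running sum for each starting index avoids re-summing every subarray, which keeps it at Θ(n²) rather than Θ(n³) (function name is mine):

```python
def max_subarray_brute_force(A):
    """Try every pair of endpoints; Theta(n^2) subarrays in total.

    Returns (low, high, sum) for a maximum subarray of A (0-indexed).
    """
    best_low, best_high, best_sum = 0, 0, A[0]
    for i in range(len(A)):
        running = 0
        for j in range(i, len(A)):
            running += A[j]          # extend A[i..j] by one element
            if running > best_sum:
                best_low, best_high, best_sum = i, j, running
    return best_low, best_high, best_sum
```

On the 16-element example array from CLRS, this finds the subarray with sum 43.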
Divide-and-Conquer Solution
For subarray A[low..high], find midpoint mid = ⌊(low + high)/2⌋. A maximum subarray must lie in exactly one of:
- Entirely in A[low..mid] (left half)
- Entirely in A[mid+1..high] (right half)
- Crossing the midpoint (spans both halves)
Solve left and right halves recursively. Find the max crossing subarray in linear time, then return the best of the three.
Finding the Max Crossing Subarray
Any subarray crossing the midpoint is composed of A[i..mid] and A[mid+1..j]. Find each half's maximum independently.
FIND-MAX-CROSSING-SUBARRAY(A, low, mid, high)
    left-sum = -∞
    sum = 0
    for i = mid downto low
        sum = sum + A[i]
        if sum > left-sum
            left-sum = sum
            max-left = i
    right-sum = -∞
    sum = 0
    for j = mid + 1 to high
        sum = sum + A[j]
        if sum > right-sum
            right-sum = sum
            max-right = j
    return (max-left, max-right, left-sum + right-sum)
- Runs in Θ(n) time where n = high − low + 1
Main Algorithm
FIND-MAXIMUM-SUBARRAY(A, low, high)
    if high == low
        return (low, high, A[low])   // base case: one element
    else mid = ⌊(low + high)/2⌋
        (left-low, left-high, left-sum) =
            FIND-MAXIMUM-SUBARRAY(A, low, mid)
        (right-low, right-high, right-sum) =
            FIND-MAXIMUM-SUBARRAY(A, mid + 1, high)
        (cross-low, cross-high, cross-sum) =
            FIND-MAX-CROSSING-SUBARRAY(A, low, mid, high)
        if left-sum ≥ right-sum and left-sum ≥ cross-sum
            return (left-low, left-high, left-sum)
        elseif right-sum ≥ left-sum and right-sum ≥ cross-sum
            return (right-low, right-high, right-sum)
        else return (cross-low, cross-high, cross-sum)
Analysis
The recurrence is:
- T(1) = Θ(1)
- T(n) = 2T(n/2) + Θ(n) for n > 1
Solution: T(n) = Θ(n lg n) — same as merge sort.
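The two procedures translate almost line-for-line into Python; a 0-indexed sketch:

```python
import math

def find_max_crossing_subarray(A, low, mid, high):
    # Best subarray of the form A[i..mid], scanning leftward from mid.
    left_sum, s, max_left = -math.inf, 0, mid
    for i in range(mid, low - 1, -1):
        s += A[i]
        if s > left_sum:
            left_sum, max_left = s, i
    # Best subarray of the form A[mid+1..j], scanning rightward.
    right_sum, s, max_right = -math.inf, 0, mid + 1
    for j in range(mid + 1, high + 1):
        s += A[j]
        if s > right_sum:
            right_sum, max_right = s, j
    return max_left, max_right, left_sum + right_sum

def find_maximum_subarray(A, low, high):
    if low == high:
        return low, high, A[low]             # base case: one element
    mid = (low + high) // 2
    left = find_maximum_subarray(A, low, mid)
    right = find_maximum_subarray(A, mid + 1, high)
    cross = find_max_crossing_subarray(A, low, mid, high)
    return max(left, right, cross, key=lambda t: t[2])
```

Using `max` with a key replaces the pseudocode's three-way comparison; since `max` returns the first maximal element, ties break in the order left, right, cross, matching the pseudocode's ≥ tests.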
Note: A linear-time algorithm, Kadane's algorithm (dynamic programming), solves the problem in Θ(n), but the divide-and-conquer approach illustrates the paradigm.
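For contrast, a sketch of that linear-time approach (returning only the sum; tracking the endpoints needs a little extra bookkeeping):

```python
def max_subarray_kadane(A):
    """Kadane's algorithm: the best sum ending at position i is either
    A[i] alone or A[i] appended to the best sum ending at i - 1."""
    best = current = A[0]
    for x in A[1:]:
        current = max(x, current + x)   # extend the run or restart at x
        best = max(best, current)
    return best
```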
4.2 Strassen's Algorithm for Matrix Multiplication
Standard Matrix Multiplication
For n × n matrices A and B, the product C = A · B is defined by:
c_ij = Σ(k=1 to n) a_ik · b_kj
SQUARE-MATRIX-MULTIPLY(A, B)
    n = A.rows
    let C be a new n × n matrix
    for i = 1 to n
        for j = 1 to n
            c_ij = 0
            for k = 1 to n
                c_ij = c_ij + a_ik · b_kj
    return C
- Running time: Θ(n³) due to triply-nested loops
Simple Divide-and-Conquer Approach
Partition each n × n matrix into four (n/2 × n/2) submatrices:
A = | A11  A12 |    B = | B11  B12 |    C = | C11  C12 |
    | A21  A22 |        | B21  B22 |        | C21  C22 |
This gives:
- C11 = A11·B11 + A12·B21
- C12 = A11·B12 + A12·B22
- C21 = A21·B11 + A22·B21
- C22 = A21·B12 + A22·B22
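These four block equations can be spot-checked on a small concrete case; a self-contained sketch in plain Python (the helper names are mine):

```python
def matmul(X, Y):
    """Textbook Theta(n^3) product of square matrices (lists of rows)."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def madd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def blocks(M):
    """Split M into its four n/2 x n/2 submatrices (M11, M12, M21, M22)."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
A11, A12, A21, A22 = blocks(A)
B11, B12, B21, B22 = blocks(B)

C11 = madd(matmul(A11, B11), matmul(A12, B21))
C12 = madd(matmul(A11, B12), matmul(A12, B22))
C21 = madd(matmul(A21, B11), matmul(A22, B21))
C22 = madd(matmul(A21, B12), matmul(A22, B22))

# Reassemble the blocks and compare against the definition of the product.
C = ([l + r for l, r in zip(C11, C12)] +
     [l + r for l, r in zip(C21, C22)])
assert C == matmul(A, B)
```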
SQUARE-MATRIX-MULTIPLY-RECURSIVE(A, B)
    n = A.rows
    let C be a new n × n matrix
    if n == 1
        c11 = a11 · b11
    else partition A, B, and C into n/2 × n/2 submatrices
        C11 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A11, B11)
              + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A12, B21)
        C12 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A11, B12)
              + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A12, B22)
        C21 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A21, B11)
              + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A22, B21)
        C22 = SQUARE-MATRIX-MULTIPLY-RECURSIVE(A21, B12)
              + SQUARE-MATRIX-MULTIPLY-RECURSIVE(A22, B22)
    return C
- 8 recursive multiplications of (n/2 × n/2) matrices + 4 matrix additions of Θ(n²)
- Recurrence: T(n) = 8T(n/2) + Θ(n²)
- Solution: T(n) = Θ(n³) — no improvement over the naive method
Key insight: Partitioning by index calculation takes Θ(1), not Θ(n²). Asymptotic notation subsumes constant factors, but recursive notation like T(n/2) does not — the factor of 8 matters.
Strassen's Method
The key idea: reduce from 8 to 7 recursive multiplications at the cost of more additions. This makes the recursion tree less "bushy."
Four steps:
- Divide A, B, C into n/2 × n/2 submatrices (Θ(1) by index calculation)
- Create 10 matrices S1..S10, each n/2 × n/2 (sums/differences of submatrices, Θ(n²) total)
- Recursively compute 7 products P1..P7 (each n/2 × n/2)
- Compute C submatrices from P1..P7 by addition/subtraction (Θ(n²))
Step 2 — The S matrices:
- S1 = B12 − B22
- S2 = A11 + A12
- S3 = A21 + A22
- S4 = B21 − B11
- S5 = A11 + A22
- S6 = B11 + B22
- S7 = A12 − A22
- S8 = B21 + B22
- S9 = A11 − A21
- S10 = B11 + B12
Step 3 — The 7 products:
- P1 = A11 · S1
- P2 = S2 · B22
- P3 = S3 · B11
- P4 = A22 · S4
- P5 = S5 · S6
- P6 = S7 · S8
- P7 = S9 · S10
Step 4 — Combine:
- C11 = P5 + P4 − P2 + P6
- C12 = P1 + P2
- C21 = P3 + P4
- C22 = P5 + P1 − P3 − P7
Analysis
- Recurrence: T(n) = 7T(n/2) + Θ(n²)
- Solution: T(n) = Θ(n^(lg 7)) ≈ Θ(n^2.81)
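The four steps can be sketched in pure Python. This is a teaching sketch, not an optimized implementation: it assumes n is a power of 2, copies submatrices instead of using index arithmetic, and recurses all the way to 1×1 base cases, whereas practical versions switch to the naive method at a crossover size:

```python
def strassen(A, B):
    """Multiply square matrices given as lists of rows; n a power of 2."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    add = lambda X, Y: [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    sub = lambda X, Y: [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]
    blk = lambda M, r, c: [row[c:c + h] for row in M[r:r + h]]
    # Step 1: partition into n/2 x n/2 blocks.
    A11, A12, A21, A22 = blk(A, 0, 0), blk(A, 0, h), blk(A, h, 0), blk(A, h, h)
    B11, B12, B21, B22 = blk(B, 0, 0), blk(B, 0, h), blk(B, h, 0), blk(B, h, h)
    # Steps 2-3: the S sums/differences folded into the 7 recursive products.
    P1 = strassen(A11, sub(B12, B22))            # A11 · S1
    P2 = strassen(add(A11, A12), B22)            # S2 · B22
    P3 = strassen(add(A21, A22), B11)            # S3 · B11
    P4 = strassen(A22, sub(B21, B11))            # A22 · S4
    P5 = strassen(add(A11, A22), add(B11, B22))  # S5 · S6
    P6 = strassen(sub(A12, A22), add(B21, B22))  # S7 · S8
    P7 = strassen(sub(A11, A21), add(B11, B12))  # S9 · S10
    # Step 4: combine into C's four blocks.
    C11 = add(sub(add(P5, P4), P2), P6)
    C12 = add(P1, P2)
    C21 = add(P3, P4)
    C22 = sub(sub(add(P5, P1), P3), P7)
    return ([l + r for l, r in zip(C11, C12)] +
            [l + r for l, r in zip(C21, C22)])
```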
Practical Considerations
Strassen's algorithm is not always preferred in practice because:
- The constant factor hidden in Θ(n^(lg 7)) is larger than in Θ(n³)
- Not as efficient for sparse matrices
- Less numerically stable (larger floating-point errors accumulate)
- Recursive submatrices consume more memory
In practice, implementations use Strassen's for large matrices and switch to the naive method below a crossover point (system-dependent, typically n = 400–2150).
4.3 The Substitution Method for Solving Recurrences
Overview
The substitution method has two steps:
- Guess the form of the solution
- Use mathematical induction to find the constants and prove the guess correct
Example: T(n) = 2T(⌊n/2⌋) + n
Guess: T(n) = O(n lg n), i.e., T(n) ≤ cn lg n for some c > 0.
Inductive step: Assume T(⌊n/2⌋) ≤ c⌊n/2⌋ lg(⌊n/2⌋). Then:
T(n) ≤ 2(c⌊n/2⌋ lg(⌊n/2⌋)) + n
≤ cn lg(n/2) + n
= cn lg n − cn lg 2 + n
= cn lg n − cn + n
≤ cn lg n (when c ≥ 1)
Base case: T(1) = 1 but cn lg n = 0 at n = 1 — doesn't work! Fix: use T(2) and T(3) as base cases (n₀ = 2). Choose c ≥ 2 so that T(2) ≤ c·2·lg 2 and T(3) ≤ c·3·lg 3.
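The full claim (base cases n = 2, 3 with c = 2) can also be checked numerically; a small sketch, assuming T(1) = 1:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # The recurrence T(n) = 2*T(floor(n/2)) + n with T(1) = 1.
    return 1 if n == 1 else 2 * T(n // 2) + n

# With base cases n = 2 and n = 3 and c = 2, T(n) <= c*n*lg(n) for n >= 2.
c = 2
assert all(T(n) <= c * n * math.log2(n) for n in range(2, 5000))
```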
Making a Good Guess
- Use similarity to known recurrences (e.g., T(n) = 2T(⌊n/2⌋ + 17) + n is still O(n lg n))
- Start with loose bounds and tighten: prove Ω(n) lower bound and O(n²) upper bound, then converge
- Use recursion trees to generate guesses
Subtleties: Strengthening the Inductive Hypothesis
For T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + 1:
- Guessing T(n) ≤ cn gives T(n) ≤ cn + 1 — doesn't work!
- Fix: Subtract a lower-order term. Guess T(n) ≤ cn − d (where d ≥ 0):
  T(n) ≤ (c⌊n/2⌋ − d) + (c⌈n/2⌉ − d) + 1 = cn − 2d + 1 ≤ cn − d (when d ≥ 1) ✓
Counterintuitive: when induction fails, subtracting a lower-order term (strengthening the hypothesis) can be easier to prove than a weaker bound.
Avoiding Pitfalls
Do NOT do this:
T(n) ≤ 2(c⌊n/2⌋) + n ≤ cn + n = O(n) ← WRONG!
You must prove the exact form T(n) ≤ cn, not just that T(n) = O(n).
Changing Variables
For T(n) = 2T(⌊√n⌋) + lg n:
- Let m = lg n, so T(2^m) = 2T(2^(m/2)) + m
- Let S(m) = T(2^m), giving S(m) = 2S(m/2) + m
- Solve: S(m) = O(m lg m)
- Back-substitute: T(n) = O(lg n · lg lg n)
4.4 The Recursion-Tree Method for Solving Recurrences
Overview
A recursion tree visualizes the cost of a recurrence:
- Each node represents the cost of a single subproblem
- Sum costs within each level to get per-level costs
- Sum all levels to get the total cost
Best used to generate a good guess, then verify with the substitution method.
Example 1: T(n) = 3T(n/4) + cn²
Building the tree:
- Root: cost cn²
- Each node has 3 children, subproblem size shrinks by factor of 4
- Depth i: 3^i nodes, each with cost c(n/4^i)²
- Total cost at depth i: 3^i · c(n/4^i)² = (3/16)^i · cn²
- Height: log₄ n (subproblem size n/4^i = 1 when i = log₄ n)
- Leaves: 3^(log₄ n) = n^(log₄ 3) leaves, each Θ(1)
Summing all levels:
T(n) = Σ(i=0 to log₄n − 1) (3/16)^i · cn² + Θ(n^(log₄ 3))
Since 3/16 < 1, this is a decreasing geometric series:
T(n) < (1/(1 − 3/16)) · cn² + Θ(n^(log₄ 3))
= (16/13) · cn² + Θ(n^(log₄ 3))
= **O(n²)**
Verification by substitution: Show T(n) ≤ dn² for d ≥ (16/13)c. ✓
The root dominates the total cost — a constant fraction of the total is at the top level.
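The per-level costs can be tabulated directly; a small numeric sketch of the geometric series, choosing c = 1 and n = 4¹⁰ so the tree has depth log₄ n = 10 (the leaf level's Θ(n^(log₄ 3)) cost is omitted here):

```python
# Per-level costs of the recursion tree for T(n) = 3T(n/4) + c*n^2:
# level i has 3^i nodes of cost c*(n/4^i)^2, totalling (3/16)^i * c*n^2.
c, n = 1.0, 4 ** 10
levels = [(3 / 16) ** i * c * n ** 2 for i in range(10)]
internal = sum(levels)

# The finite sum is below the infinite geometric bound (16/13)*c*n^2 ...
assert internal < (16 / 13) * c * n ** 2
# ... and the root alone carries a constant fraction of the total.
assert levels[0] / internal > 13 / 16
```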
Example 2: T(n) = T(n/3) + T(2n/3) + cn
- Unbalanced tree — left branch shrinks faster than right
- Every level sums to at most cn
- Longest root-to-leaf path: n → (2/3)n → (2/3)²n → ... → 1, height = log₃/₂ n
- Guess: T(n) = O(n lg n)
Verification: Show T(n) ≤ dn lg n:
T(n) ≤ d(n/3)lg(n/3) + d(2n/3)lg(2n/3) + cn
= dn lg n − d(n/3·lg 3 + 2n/3·lg(3/2)) + cn
= dn lg n − dn(lg 3 − 2/3) + cn
≤ dn lg n (when d ≥ c/(lg 3 − 2/3)) ✓
4.5 The Master Method for Solving Recurrences
The Master Theorem
For recurrences of the form:
T(n) = aT(n/b) + f(n)
where a ≥ 1, b > 1, and f(n) is asymptotically positive:
Compare f(n) with n^(log_b a):
| Case | Condition | Solution |
|---|---|---|
| 1 | f(n) = O(n^(log_b a − ε)) for some ε > 0 | T(n) = Θ(n^(log_b a)) |
| 2 | f(n) = Θ(n^(log_b a)) | T(n) = Θ(n^(log_b a) · lg n) |
| 3 | f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and af(n/b) ≤ cf(n) for some c < 1 | T(n) = Θ(f(n)) |
Intuition
- Case 1: n^(log_b a) dominates → leaf costs dominate the tree
- Case 2: f(n) and n^(log_b a) are equal → costs spread evenly, multiply by lg n
- Case 3: f(n) dominates → root cost dominates the tree
Important Caveats
- In Case 1, f(n) must be polynomially smaller than n^(log_b a) (by a factor of n^ε)
- In Case 3, f(n) must be polynomially larger AND satisfy the regularity condition af(n/b) ≤ cf(n)
- There are gaps between the cases where the master method does not apply (e.g., f(n) = n lg n with a = 2, b = 2)
Examples
Example 1: T(n) = 9T(n/3) + n
- a = 9, b = 3, f(n) = n, n^(log₃ 9) = n²
- f(n) = n = O(n^(2−ε)) for ε = 1 → Case 1: T(n) = Θ(n²)
Example 2: T(n) = T(2n/3) + 1
- a = 1, b = 3/2, f(n) = 1, n^(log₃/₂ 1) = n⁰ = 1
- f(n) = Θ(1) → Case 2: T(n) = Θ(lg n)
Example 3: T(n) = 3T(n/4) + n lg n
- a = 3, b = 4, n^(log₄ 3) ≈ n^0.793
- f(n) = n lg n = Ω(n^(0.793 + ε)) for ε ≈ 0.2
- Regularity: 3(n/4)lg(n/4) ≤ (3/4)n lg n ✓
- Case 3: T(n) = Θ(n lg n)
Example 4 (does NOT apply): T(n) = 2T(n/2) + n lg n
- a = 2, b = 2, n^(log₂ 2) = n
- f(n) = n lg n is asymptotically larger than n but not polynomially larger: the ratio f(n)/n = lg n grows more slowly than n^ε for every constant ε > 0
- Falls in the gap between Case 2 and Case 3
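The case analysis for polynomial driving functions can be captured in a few lines. A sketch (the function `master_case` is mine; it covers only f(n) = n^k, since extra log factors, as in Example 4, can fall into the gaps between cases):

```python
import math

def master_case(a, b, k):
    """Classify T(n) = a*T(n/b) + Theta(n^k) by the master theorem.

    For f(n) = n^k with k > log_b(a), the regularity condition
    a*(n/b)^k <= c*n^k holds automatically with c = a/b^k < 1.
    """
    crit = math.log(a, b)           # exponent of the watershed n^(log_b a)
    if math.isclose(k, crit):
        return 2                    # T(n) = Theta(n^crit * lg n)
    return 1 if k < crit else 3     # Theta(n^crit) or Theta(n^k)

# The worked examples above:
assert master_case(9, 3, 1) == 1        # Example 1
assert master_case(1, 3 / 2, 0) == 2    # Example 2
```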
Applying to Earlier Algorithms
| Algorithm | Recurrence | a | b | n^(log_b a) | Case | Solution |
|---|---|---|---|---|---|---|
| Max subarray / Merge sort | T(n) = 2T(n/2) + Θ(n) | 2 | 2 | n | 2 | Θ(n lg n) |
| Naive recursive matrix multiply | T(n) = 8T(n/2) + Θ(n²) | 8 | 2 | n³ | 1 | Θ(n³) |
| Strassen's algorithm | T(n) = 7T(n/2) + Θ(n²) | 7 | 2 | n^(lg 7) | 1 | Θ(n^(lg 7)) ≈ Θ(n^2.81) |
Summary Table
| Topic | Key Result |
|---|---|
| Maximum subarray (divide-and-conquer) | Θ(n lg n) |
| Maximum subarray (Kadane's / DP) | Θ(n) |
| Naive matrix multiplication | Θ(n³) |
| Recursive matrix multiplication (8 subproblems) | Θ(n³) |
| Strassen's algorithm (7 subproblems) | Θ(n^(lg 7)) ≈ Θ(n^2.81) |
| Best known matrix multiplication (Coppersmith-Winograd) | O(n^2.376) |
| Substitution method | Guess + induction |
| Recursion-tree method | Visualize + sum levels |
| Master method | T(n) = aT(n/b) + f(n), three cases |