Dynamic Programming
Introduction to Algorithms, Chapter 15: Dynamic Programming
Dynamic programming solves problems by combining solutions to subproblems. Unlike divide-and-conquer (which partitions into disjoint subproblems), DP applies when subproblems overlap — i.e., they share sub-subproblems. Each sub-subproblem is solved only once and its answer is saved in a table, avoiding redundant recomputation.
DP is typically applied to optimization problems: problems with many possible solutions where we seek one with the optimal (min or max) value.
Four Steps of Dynamic Programming
- Characterize the structure of an optimal solution.
- Recursively define the value of an optimal solution.
- Compute the value of an optimal solution (typically bottom-up).
- Construct an optimal solution from the computed information.
Steps 1–3 form the core. Step 4 is only needed if we want the actual solution (not just its value), and often requires maintaining additional information during step 3.
15.1 Rod Cutting
Problem Statement
Given a rod of length n inches and a price table p_i for rod pieces of length i = 1, 2, …, n, determine the maximum revenue r_n obtainable by cutting the rod and selling the pieces.
| length i | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| price p_i | 1 | 5 | 8 | 9 | 10 | 17 | 17 | 20 | 24 | 30 |
- A rod of length n can be cut in 2^(n−1) different ways.
- If an optimal solution cuts the rod into k pieces of lengths i_1, i_2, …, i_k, then:
r_n = p_{i_1} + p_{i_2} + … + p_{i_k}
Optimal Revenue Values (Example)
| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| r_n | 1 | 5 | 8 | 10 | 13 | 17 | 18 | 22 | 25 | 30 |
Recursive Formulation
The general recurrence (viewing the first piece of length i as fixed, with only the remainder further divided):
r_n = max₁≤i≤n (p_i + r_{n−i})
This exhibits optimal substructure: optimal solutions incorporate optimal solutions to subproblems.
Naive Recursive Implementation
CUT-ROD(p, n)
if n == 0
return 0
q = -∞
for i = 1 to n
q = max(q, p[i] + CUT-ROD(p, n - i))
return q
- Running time: T(n) = 2^n — exponential, because it solves the same subproblems repeatedly.
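The exponential blowup comes from re-solving the same subproblem sizes over and over. A minimal Python sketch of CUT-ROD makes this visible; the call counter is an illustrative addition, not part of the pseudocode:

```python
# Naive recursive rod cutting, a direct translation of CUT-ROD.
# prices[i] is the price of a piece of length i (prices[0] is unused).
def cut_rod(prices, n, calls=None):
    if calls is not None:
        calls[0] += 1                      # count every invocation
    if n == 0:
        return 0
    q = float("-inf")
    for i in range(1, n + 1):
        q = max(q, prices[i] + cut_rod(prices, n - i, calls))
    return q

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # CLRS price table
calls = [0]
print(cut_rod(prices, 10, calls))  # 30
print(calls[0])                    # 1024 = 2^10 calls
```

For n = 10 the procedure makes 2¹⁰ = 1024 recursive calls even though there are only 11 distinct subproblem sizes.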
Top-Down with Memoization
Save the result of each subproblem in an array. Before computing, check if the answer is already known.
MEMOIZED-CUT-ROD(p, n)
let r[0..n] be a new array
for i = 0 to n
r[i] = -∞
return MEMOIZED-CUT-ROD-AUX(p, n, r)
MEMOIZED-CUT-ROD-AUX(p, n, r)
if r[n] ≥ 0
return r[n]
if n == 0
q = 0
else q = -∞
for i = 1 to n
q = max(q, p[i] + MEMOIZED-CUT-ROD-AUX(p, n - i, r))
r[n] = q
return q
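A Python sketch of the memoized version, using a dictionary in place of the r array (the function and variable names are illustrative):

```python
# Top-down rod cutting with memoization: each subproblem size is
# solved at most once and cached in the memo dictionary.
def memoized_cut_rod(prices, n, memo=None):
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]                     # answer already known
    if n == 0:
        q = 0
    else:
        q = max(prices[i] + memoized_cut_rod(prices, n - i, memo)
                for i in range(1, n + 1))
    memo[n] = q
    return q

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # CLRS price table
print(memoized_cut_rod(prices, 10))  # 30
```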
Bottom-Up Method
Solve subproblems in order of increasing size. When solving size j, all smaller subproblems are already solved.
BOTTOM-UP-CUT-ROD(p, n)
let r[0..n] be a new array
r[0] = 0
for j = 1 to n
q = -∞
for i = 1 to j
q = max(q, p[i] + r[j - i])
r[j] = q
return r[n]
- Running time: Θ(n²) for both top-down and bottom-up approaches.
- Bottom-up typically has better constant factors (no recursion overhead).
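The bottom-up method translates to a short Python sketch (the prices list and names are illustrative assumptions):

```python
# Bottom-up rod cutting: solve lengths in increasing order so that
# every r[j - i] needed is already final when r[j] is computed.
def bottom_up_cut_rod(prices, n):
    r = [0] * (n + 1)
    for j in range(1, n + 1):
        r[j] = max(prices[i] + r[j - i] for i in range(1, j + 1))
    return r[n]

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # CLRS price table
print(bottom_up_cut_rod(prices, 10))  # 30
```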
Subproblem Graphs
- A subproblem graph is a directed graph with one vertex per distinct subproblem and an edge from x to y if solving x directly requires the solution to y.
- It is a "collapsed" version of the recursion tree.
- Bottom-up DP processes vertices in reverse topological order.
- The running time of DP is linear in the number of vertices and edges of the subproblem graph.
Reconstructing a Solution
Track not just the optimal value, but also the choice that led to it.
EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
let r[0..n] and s[0..n] be new arrays
r[0] = 0
for j = 1 to n
q = -∞
for i = 1 to j
if q < p[i] + r[j - i]
q = p[i] + r[j - i]
s[j] = i
r[j] = q
return r and s
PRINT-CUT-ROD-SOLUTION(p, n)
(r, s) = EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
while n > 0
print s[n]
n = n - s[n]
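Combining the two procedures above, a Python sketch that returns both the optimal revenue and the list of piece lengths (returning a list rather than printing is an illustrative deviation):

```python
# Extended bottom-up rod cutting: s[j] records the size of the first
# piece in an optimal cut of a length-j rod, so the full cut can be
# recovered by walking s from n down to 0.
def extended_bottom_up_cut_rod(prices, n):
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):
        q = float("-inf")
        for i in range(1, j + 1):
            if q < prices[i] + r[j - i]:
                q = prices[i] + r[j - i]
                s[j] = i                   # best first-piece length for size j
        r[j] = q
    return r, s

def cut_rod_pieces(prices, n):
    r, s = extended_bottom_up_cut_rod(prices, n)
    revenue, pieces = r[n], []
    while n > 0:
        pieces.append(s[n])
        n -= s[n]
    return revenue, pieces

prices = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # CLRS price table
print(cut_rod_pieces(prices, 7))  # (18, [1, 6])
```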
15.2 Matrix-Chain Multiplication
Problem Statement
Given a chain of n matrices ⟨A₁, A₂, …, Aₙ⟩ to multiply, where matrix A_i has dimensions p_{i−1} × p_i, find the parenthesization that minimizes the total number of scalar multiplications.
- Multiplying a p × q matrix by a q × r matrix costs p·q·r scalar multiplications.
- The order of parenthesization can have a dramatic impact on cost.
Example: For ⟨A₁, A₂, A₃⟩ with dimensions 10×100, 100×5, 5×50:
- ((A₁A₂)A₃) costs 5,000 + 2,500 = 7,500
- (A₁(A₂A₃)) costs 25,000 + 50,000 = 75,000 — 10× slower!
Number of Parenthesizations
The number of ways to parenthesize n matrices is given by the Catalan numbers, which grow as Ω(4ⁿ/n^(3/2)) — exhaustive search is infeasible.
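Concretely, the number of full parenthesizations of n matrices is the (n−1)-st Catalan number. A quick Python check of the first few values (the helper name is illustrative):

```python
from math import comb

# Number of ways to fully parenthesize a product of n matrices:
# the (n-1)-st Catalan number, C(k) = binom(2k, k) / (k + 1).
def parenthesizations(n):
    k = n - 1
    return comb(2 * k, k) // (k + 1)

print([parenthesizations(n) for n in range(1, 7)])  # [1, 1, 2, 5, 14, 42]
```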
Step 1: Optimal Substructure
- To parenthesize A_i A_{i+1} … A_j, we split between A_k and A_{k+1} for some i ≤ k < j.
- The parenthesization of the prefix A_i…A_k and suffix A_{k+1}…A_j within an optimal solution must themselves be optimal (by cut-and-paste argument).
Step 2: Recursive Solution
Let m[i, j] = minimum scalar multiplications to compute A_{i..j}.
m[i, j] = 0 if i = j
m[i, j] = min_{i≤k<j} { m[i, k] + m[k+1, j] + p_{i−1}·p_k·p_j } if i < j
Also define s[i, j] = the value of k that achieves the optimal split.
Step 3: Bottom-Up Computation
MATRIX-CHAIN-ORDER(p)
n = p.length - 1
let m[1..n, 1..n] and s[1..n-1, 2..n] be new tables
for i = 1 to n
m[i, i] = 0
for l = 2 to n // l = chain length
for i = 1 to n - l + 1
j = i + l - 1
m[i, j] = ∞
for k = i to j - 1
q = m[i, k] + m[k+1, j] + p_{i-1} · p_k · p_j
if q < m[i, j]
m[i, j] = q
s[i, j] = k
return m and s
- Running time: O(n³)
- Space: Θ(n²) for the m and s tables.
- Fills the table by increasing chain length l.
Step 4: Constructing the Solution
PRINT-OPTIMAL-PARENS(s, i, j)
if i == j
print "A_i"
else print "("
PRINT-OPTIMAL-PARENS(s, i, s[i, j])
PRINT-OPTIMAL-PARENS(s, s[i, j] + 1, j)
print ")"
Example (n = 6): Dimensions ⟨30, 35, 15, 5, 10, 20, 25⟩ → optimal cost m[1,6] = 15,125, parenthesization: ((A₁(A₂A₃))((A₄A₅)A₆))
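Putting steps 3 and 4 together, a Python sketch of MATRIX-CHAIN-ORDER plus a string-building variant of PRINT-OPTIMAL-PARENS (returning a string instead of printing is a deviation for testability):

```python
# Bottom-up matrix-chain order: m[i][j] is the minimum scalar
# multiplication count for A_i..A_j, s[i][j] the optimal split point.
# p is the dimension list: A_i is p[i-1] x p[i].
def matrix_chain_order(p):
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):            # chain length l
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float("inf")
            for k in range(i, j):
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

def optimal_parens(s, i, j):
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return "(" + optimal_parens(s, i, k) + optimal_parens(s, k + 1, j) + ")"

p = [30, 35, 15, 5, 10, 20, 25]               # CLRS example, n = 6
m, s = matrix_chain_order(p)
print(m[1][6])                                # 15125
print(optimal_parens(s, 1, 6))                # ((A1(A2A3))((A4A5)A6))
```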
15.3 Elements of Dynamic Programming
Two key ingredients an optimization problem must have for DP to apply:
1. Optimal Substructure
A problem exhibits optimal substructure if an optimal solution contains within it optimal solutions to subproblems.
Pattern for discovering optimal substructure:
- Show that a solution consists of making a choice (e.g., where to cut, where to split), leaving subproblems to solve.
- Suppose you are given the choice leading to an optimal solution.
- Determine which subproblems ensue and characterize the subproblem space.
- Show subproblem solutions must be optimal via cut-and-paste: if a subproblem solution weren't optimal, you could substitute a better one, contradicting the optimality of the overall solution.
Optimal substructure varies in two ways:
- How many subproblems an optimal solution uses.
- How many choices we have in determining which subproblem(s) to use.
Running time informally depends on: (number of subproblems) × (number of choices per subproblem).
| Problem | Subproblems | Choices | Running Time |
|---|---|---|---|
| Rod cutting | Θ(n) | ≤ n | O(n²) |
| Matrix-chain multiplication | Θ(n²) | ≤ n−1 | O(n³) |
Subtlety: When Optimal Substructure Fails
- Unweighted shortest path — has optimal substructure (subpaths of shortest paths are shortest paths).
- Unweighted longest simple path — does NOT have optimal substructure. Subproblems are not independent (they share vertices/resources). Finding a longest simple path is NP-hard (its decision version is NP-complete).
Key insight: Subproblems must be independent — the solution to one subproblem must not affect the solution to another subproblem of the same problem.
2. Overlapping Subproblems
The subproblem space must be "small" — a recursive algorithm solves the same subproblems over and over rather than always generating new ones. Typically the total number of distinct subproblems is polynomial in the input size.
- DP stores solutions in a table for O(1) lookup.
- Divide-and-conquer generates brand-new subproblems at each step → no overlap → DP doesn't help.
Memoization (Top-Down DP)
Memoization maintains a table of subproblem solutions with a top-down recursive control structure:
- Initialize each table entry to a sentinel value meaning "not yet computed" (e.g., ∞ for a minimization problem, −∞ for maximization).
- On each recursive call, check if the answer is already in the table.
- If yes, return it immediately. If no, compute it, store it, then return.
MEMOIZED-MATRIX-CHAIN(p)
n = p.length - 1
let m[1..n, 1..n] be a new table
for i = 1 to n
for j = i to n
m[i, j] = ∞
return LOOKUP-CHAIN(m, p, 1, n)
LOOKUP-CHAIN(m, p, i, j)
if m[i, j] < ∞
return m[i, j]
if i == j
m[i, j] = 0
else for k = i to j - 1
q = LOOKUP-CHAIN(m, p, i, k)
+ LOOKUP-CHAIN(m, p, k+1, j) + p_{i-1} · p_k · p_j
if q < m[i, j]
m[i, j] = q
return m[i, j]
Running time: O(n³) — same as bottom-up.
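A compact top-down sketch in Python, letting functools.lru_cache play the role of the m table (a deviation from the explicit-table pseudocode):

```python
from functools import lru_cache

# Top-down matrix-chain cost with memoization; the cache stores one
# entry per distinct (i, j) subproblem, just like the m table.
def memoized_matrix_chain(p):
    @lru_cache(maxsize=None)
    def lookup(i, j):
        if i == j:
            return 0
        return min(lookup(i, k) + lookup(k + 1, j) + p[i - 1] * p[k] * p[j]
                   for k in range(i, j))
    return lookup(1, len(p) - 1)

print(memoized_matrix_chain([30, 35, 15, 5, 10, 20, 25]))  # 15125
```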
Bottom-Up vs. Top-Down (Memoized)
| | Bottom-Up | Top-Down (Memoized) |
|---|---|---|
| Approach | Solve smallest subproblems first | Recurse naturally, cache results |
| Constant factors | Better (no recursion overhead) | Slightly worse |
| Subproblems solved | All of them | Only those actually needed |
| When to prefer | All subproblems must be solved | Some subproblems may be skipped |
15.4 Longest Common Subsequence (LCS)
Definitions
- A subsequence of X = ⟨x₁, x₂, …, x_m⟩ is obtained by deleting zero or more elements (preserving order).
- A common subsequence of X and Y is a subsequence of both.
- The longest common subsequence (LCS) is a common subsequence of maximum length.
Example: X = ⟨A, B, C, B, D, A, B⟩, Y = ⟨B, D, C, A, B, A⟩ → LCS = ⟨B, C, B, A⟩ (length 4).
- Brute force: enumerate all 2^m subsequences of X → exponential.
Step 1: Optimal Substructure (Theorem 15.1)
Let X = ⟨x₁, …, x_m⟩, Y = ⟨y₁, …, y_n⟩, and Z = ⟨z₁, …, z_k⟩ be any LCS of X and Y. Define prefix X_i = ⟨x₁, …, x_i⟩.
- If x_m = y_n, then z_k = x_m = y_n and Z_{k−1} is an LCS of X_{m−1} and Y_{n−1}.
- If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m−1} and Y.
- If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n−1}.
Step 2: Recursive Solution
Let c[i, j] = length of an LCS of X_i and Y_j.
c[i, j] = 0 if i = 0 or j = 0
c[i, j] = c[i−1, j−1] + 1 if i, j > 0 and x_i = y_j
c[i, j] = max(c[i, j−1], c[i−1, j]) if i, j > 0 and x_i ≠ y_j
Step 3: Computing the Length
LCS-LENGTH(X, Y)
m = X.length
n = Y.length
let b[1..m, 1..n] and c[0..m, 0..n] be new tables
for i = 1 to m
c[i, 0] = 0
for j = 0 to n
c[0, j] = 0
for i = 1 to m
for j = 1 to n
if x_i == y_j
c[i, j] = c[i-1, j-1] + 1
b[i, j] = "↖"
elseif c[i-1, j] ≥ c[i, j-1]
c[i, j] = c[i-1, j]
b[i, j] = "↑"
else
c[i, j] = c[i, j-1]
b[i, j] = "←"
return c and b
- Running time: Θ(mn)
- Space: Θ(mn)
Step 4: Constructing an LCS
Follow the arrows in table b from b[m, n] back to a boundary:
PRINT-LCS(b, X, i, j)
if i == 0 or j == 0
return
if b[i, j] == "↖"
PRINT-LCS(b, X, i-1, j-1)
print x_i
elseif b[i, j] == "↑"
PRINT-LCS(b, X, i-1, j)
else
PRINT-LCS(b, X, i, j-1)
- Running time: O(m + n) — decrements at least one index per call.
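A Python sketch combining LCS-LENGTH and PRINT-LCS; string inputs and returning the subsequence (rather than printing it) are illustrative choices:

```python
# LCS via the c (length) and b (direction) tables, mirroring
# LCS-LENGTH and PRINT-LCS; x and y are strings, 0-indexed in Python.
def lcs(x, y):
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"
    # Walk the arrows back from (m, n) to rebuild the subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if b[i][j] == "↖":
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == "↑":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # BCBA
```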
Space Optimization
- The b table can be eliminated: given c[i, j], we can determine in O(1) which of the three entries was used.
- For computing only the LCS length, only two rows of c are needed at a time → O(min(m, n)) space.
- However, with only two rows retained, the actual LCS can no longer be reconstructed by the O(m + n) trace-back; recovering it in linear space requires a more elaborate divide-and-conquer approach (Hirschberg's algorithm).
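A sketch of the two-row space optimization for the length-only computation; swapping the inputs so the shorter string indexes the columns (to reach O(min(m, n)) space) is an illustrative touch:

```python
# LCS length in O(min(m, n)) space: keep only the previous and
# current rows of the c table. The subsequence itself is not
# recoverable from these two rows alone.
def lcs_length(x, y):
    if len(y) > len(x):
        x, y = y, x                        # make y the shorter string
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```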
15.5 Optimal Binary Search Trees
Problem Statement
Given n distinct keys K = ⟨k₁ < k₂ < … < kₙ⟩ with search probabilities p_i, and n + 1 dummy keys d₀, d₁, …, dₙ (representing unsuccessful searches) with probabilities q_i, construct a BST that minimizes the expected search cost.
The probabilities satisfy: Σp_i + Σq_i = 1
Expected search cost:
E[search cost in T] = 1 + Σᵢ depth_T(k_i)·p_i + Σᵢ depth_T(d_i)·q_i
Key insight: An optimal BST is not necessarily the one with minimum height, nor the one with the highest-probability key at the root.
Step 1: Optimal Substructure
- Any subtree of a BST contains keys in a contiguous range k_i, …, k_j and dummy keys d_{i−1}, …, d_j.
- If T is an optimal BST and T' is a subtree containing k_i, …, k_j, then T' must be an optimal BST for that subproblem (cut-and-paste argument).
- For keys k_i, …, k_j, some key k_r (i ≤ r ≤ j) is the root. The left subtree contains k_i, …, k_{r−1} and the right subtree contains k_{r+1}, …, k_j.
Step 2: Recursive Solution
Let e[i, j] = expected search cost of an optimal BST containing keys k_i, …, k_j.
Define the weight (sum of probabilities in a subtree):
w(i, j) = Σ_{l=i}^{j} p_l + Σ_{l=i−1}^{j} q_l
When a subtree becomes a child of a node, every node's depth increases by 1, adding w(i, j) to the cost.
Recurrence:
e[i, j] = q_{i−1} if j = i − 1 (only dummy key d_{i−1})
e[i, j] = min_{i≤r≤j} { e[i, r−1] + e[r+1, j] + w(i, j) } if i ≤ j
Also define root[i, j] = the index r achieving the minimum.
The weight can be computed incrementally:
w[i, j] = w[i, j−1] + p_j + q_j
Step 3: Bottom-Up Computation
OPTIMAL-BST(p, q, n)
let e[1..n+1, 0..n], w[1..n+1, 0..n],
and root[1..n, 1..n] be new tables
for i = 1 to n + 1
e[i, i-1] = q_{i-1}
w[i, i-1] = q_{i-1}
for l = 1 to n
for i = 1 to n - l + 1
j = i + l - 1
e[i, j] = ∞
w[i, j] = w[i, j-1] + p_j + q_j
for r = i to j
t = e[i, r-1] + e[r+1, j] + w[i, j]
if t < e[i, j]
e[i, j] = t
root[i, j] = r
return e and root
- Running time: Θ(n³) — three nested loops, each up to n.
- Space: Θ(n²) for the e, w, and root tables.
- Structure is very similar to MATRIX-CHAIN-ORDER.
Example (n = 5)
| i | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| p_i | — | 0.15 | 0.10 | 0.05 | 0.10 | 0.20 |
| q_i | 0.05 | 0.10 | 0.05 | 0.05 | 0.05 | 0.10 |
Optimal expected cost: e[1, 5] = 2.75, with k₂ as root.
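The example above can be checked with a direct Python sketch of OPTIMAL-BST; the 1-based lists padded with placeholder entries are an illustrative convention to match the pseudocode's indexing:

```python
# Optimal BST expected cost, mirroring OPTIMAL-BST. p[1..n] are key
# probabilities (p[0] is a placeholder), q[0..n] dummy-key probabilities.
def optimal_bst(p, q, n):
    e = [[0.0] * (n + 1) for _ in range(n + 2)]     # e[1..n+1][0..n]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):
        e[i][i - 1] = q[i - 1]                      # empty subtrees
        w[i][i - 1] = q[i - 1]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            e[i][j] = float("inf")
            w[i][j] = w[i][j - 1] + p[j] + q[j]     # incremental weight
            for r in range(i, j + 1):               # try each root
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j] = t
                    root[i][j] = r
    return e, root

p = [0.0, 0.15, 0.10, 0.05, 0.10, 0.20]             # CLRS example, n = 5
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]
e, root = optimal_bst(p, q, 5)
print(round(e[1][5], 2), root[1][5])                # 2.75 2
```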
Note: Knuth showed that root[i, j−1] ≤ root[i, j] ≤ root[i+1, j], which can be exploited to reduce the running time to Θ(n²).
Summary of Key DP Problems
| Problem | Subproblems | Recurrence | Time | Space |
|---|---|---|---|---|
| Rod Cutting | Θ(n) | r_n = max(p_i + r_{n−i}) | Θ(n²) | Θ(n) |
| Matrix-Chain Mult. | Θ(n²) | m[i,j] = min(m[i,k] + m[k+1,j] + p_{i−1}p_kp_j) | O(n³) | Θ(n²) |
| LCS | Θ(mn) | c[i,j] based on character match | Θ(mn) | Θ(mn) |
| Optimal BST | Θ(n²) | e[i,j] = min(e[i,r−1] + e[r+1,j] + w(i,j)) | Θ(n³) | Θ(n²) |
Two Hallmarks of DP
- Optimal substructure — optimal solutions contain optimal solutions to subproblems.
- Overlapping subproblems — a recursive solution revisits the same subproblems repeatedly.
Two Implementation Approaches
- Top-down with memoization — natural recursion + caching.
- Bottom-up tabulation — solve subproblems in size order.