Approximation Algorithms
Notes on Chapter 35 of Introduction to Algorithms.
Many problems of practical significance are NP-complete, yet too important to abandon. Even if a problem is NP-complete, we have at least three ways to cope:
- If actual inputs are small, an exponential-time algorithm may suffice
- Isolate important special cases solvable in polynomial time
- Find near-optimal solutions in polynomial time
An algorithm that returns near-optimal solutions is called an approximation algorithm.
Performance Ratios for Approximation Algorithms
For an optimization problem where each potential solution has a positive cost:
- Let C = cost of the approximate solution
- Let C* = cost of an optimal solution
An algorithm has an approximation ratio of ρ(n) if, for any input of size n:
$$\max\left(\frac{C}{C^*},\ \frac{C^*}{C}\right) \leq \rho(n)$$
- A ρ(n)-approximation algorithm achieves approximation ratio ρ(n)
- The ratio is always ≥ 1 (a 1-approximation algorithm produces an optimal solution)
- For minimization: C/C* gives the factor by which the approximate cost exceeds optimal
- For maximization: C*/C gives the factor by which optimal exceeds the approximate cost
Approximation Schemes
- An approximation scheme takes an additional input ε > 0 and is a (1 + ε)-approximation algorithm for any fixed ε
- A polynomial-time approximation scheme (PTAS): for any fixed ε > 0, runs in time polynomial in input size n
- Example running time: O(n^{2/ε})
- A fully polynomial-time approximation scheme (FPTAS): runs in time polynomial in both 1/ε and n
- Example running time: O((1/ε)² · n³)
- Any constant-factor decrease in ε yields only a constant-factor increase in running time
35.1 The Vertex-Cover Problem
A vertex cover of an undirected graph G = (V, E) is a subset V' ⊆ V such that for every edge (u, v) ∈ E, at least one of u or v is in V'. The vertex-cover problem is to find a vertex cover of minimum size (an optimal vertex cover).
This is the optimization version of an NP-complete decision problem.
Algorithm: APPROX-VERTEX-COVER
```
APPROX-VERTEX-COVER(G)
1  C = ∅
2  E' = G.E
3  while E' ≠ ∅
4      let (u, v) be an arbitrary edge of E'
5      C = C ∪ {u, v}
6      remove from E' every edge incident on either u or v
7  return C
```
How it works:
- Repeatedly pick an arbitrary edge, add both endpoints to the cover
- Remove all edges incident on those endpoints
- Continue until all edges are covered
Running time: O(V + E) using adjacency lists.
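The pseudocode can be sketched in Python. This is a minimal sketch: the edge-list representation and "first remaining edge" choice are implementation decisions, and rebuilding the list each round costs O(VE) rather than the O(V + E) achievable with adjacency lists.

```python
def approx_vertex_cover(edges):
    """2-approximation for vertex cover: repeatedly take both
    endpoints of an arbitrary uncovered edge, then discard every
    edge those endpoints cover."""
    cover = set()
    remaining = list(edges)
    while remaining:
        u, v = remaining[0]              # an arbitrary edge of E'
        cover.update((u, v))
        # remove every edge incident on either u or v
        remaining = [(a, b) for a, b in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover

# e.g. on a triangle the first edge (0, 1) is picked and covers everything
```

On the triangle {(0, 1), (1, 2), (0, 2)} this returns {0, 1}, which here happens to be optimal; in general the output is only guaranteed to be within a factor of 2.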
Theorem 35.1
APPROX-VERTEX-COVER is a polynomial-time 2-approximation algorithm.
Proof sketch:
- Let A = set of edges picked in line 4
- A is a maximal matching — no two edges in A share an endpoint
- Any vertex cover (including optimal C*) must include at least one endpoint of each edge in A:
- |C*| ≥ |A| (lower bound)
- Each picked edge adds exactly 2 vertices to C:
- |C| = 2|A| (upper bound)
- Combining: |C| = 2|A| ≤ 2|C*|
Key insight: We prove the bound by relating the solution to a lower bound on the optimal, without knowing the optimal itself.
35.2 The Traveling-Salesman Problem
Given a complete undirected graph G = (V, E) with nonnegative integer cost c(u, v) on each edge, find a hamiltonian cycle (tour) of minimum cost.
The triangle inequality states: for all vertices u, v, w ∈ V,
$$c(u, w) \leq c(u, v) + c(v, w)$$
This holds naturally in many applications (e.g., Euclidean distances).
35.2.1 TSP with the Triangle Inequality
Strategy: Compute a minimum spanning tree (MST) as a lower bound, then convert it to a tour.
```
APPROX-TSP-TOUR(G, c)
1  select a vertex r ∈ G.V to be a "root" vertex
2  compute a minimum spanning tree T for G from root r,
       using MST-PRIM(G, c, r)
3  let H be a list of vertices, ordered according to when
       they are first visited in a preorder tree walk of T
4  return the hamiltonian cycle H
```
Running time: Θ(V²) with a simple MST-PRIM implementation.
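A runnable sketch of the same strategy, assuming Euclidean points so the triangle inequality holds. Prim's algorithm and the stack-based preorder walk below are standard choices standing in for the book's MST-PRIM and tree walk; the 5-point sanity check at the bottom is illustrative.

```python
import math
from itertools import permutations

def approx_tsp_tour(points):
    """2-approximation for metric TSP: preorder walk of an MST."""
    n = len(points)
    def c(u, v):
        return math.dist(points[u], points[v])
    # Prim's algorithm rooted at vertex 0, O(V^2)
    in_tree = [False] * n
    parent = [None] * n
    key = [math.inf] * n
    key[0] = 0.0
    children = [[] for _ in range(n)]
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: key[i])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and c(u, v) < key[v]:
                key[v] = c(u, v)
                parent[v] = u
    # preorder walk of T, recording each vertex at its first visit
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

def tour_cost(points, tour):
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

# sanity check on 5 points: cost is within twice the brute-force optimum
pts = [(0, 0), (2, 0), (3, 2), (1, 3), (-1, 1)]
best = min(tour_cost(pts, [0] + list(p)) for p in permutations(range(1, 5)))
assert tour_cost(pts, approx_tsp_tour(pts)) <= 2 * best
```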
Theorem 35.2
APPROX-TSP-TOUR is a polynomial-time 2-approximation algorithm for TSP with the triangle inequality.
Proof sketch:
- Let H* be an optimal tour. Deleting any edge from H* gives a spanning tree, so:
- c(T) ≤ c(H*) — MST weight is a lower bound on optimal tour cost
- A full walk W of T traverses every edge exactly twice:
- c(W) = 2·c(T)
- Combining: c(W) ≤ 2·c(H*)
- W is not a tour (visits vertices multiple times). By the triangle inequality, removing repeated visits does not increase cost
- The preorder walk H removes duplicates from W:
- c(H) ≤ c(W) ≤ 2·c(H*)
Example walkthrough:
- Full walk: a, b, c, b, h, b, a, d, e, f, e, g, e, d, a
- Preorder walk (first visits only): a, b, c, h, d, e, f, g
- The preorder tour cost ≈ 19.074 vs. optimal ≈ 14.715
35.2.2 The General TSP (Without Triangle Inequality)
Theorem 35.3
If P ≠ NP, then for any constant ρ ≥ 1, there is no polynomial-time ρ-approximation algorithm for the general TSP.
Proof sketch (by contradiction):
- Assume a ρ-approximation algorithm A exists
- Given a hamiltonian-cycle instance G = (V, E), construct a TSP instance G' = (V, E') where:
- c(u, v) = 1 if (u, v) ∈ E
- c(u, v) = ρ|V| + 1 otherwise
- If G has a hamiltonian cycle → optimal tour cost = |V|
- If G has no hamiltonian cycle → any tour costs > ρ|V|
- The gap (factor > ρ) means algorithm A would distinguish the two cases, solving the NP-complete hamiltonian-cycle problem in polynomial time
General technique: Show that "yes" instances of an NP-hard problem X map to instances of Y with value ≤ k, while "no" instances map to value > ρk. This proves no ρ-approximation exists (unless P = NP).
35.3 The Set-Covering Problem
An instance (X, F) consists of:
- A finite set X (the universe)
- A family F of subsets of X, where X = ∪_{S ∈ F} S
Goal: Find a minimum-size subset C ⊆ F that covers all of X (i.e., X = ∪_{S ∈ C} S).
The size of C is the number of sets it contains. The decision version is NP-complete.
Practical example: X = set of required skills, F = sets of skills each person has. Find the smallest committee covering all skills.
Greedy Approximation Algorithm
```
GREEDY-SET-COVER(X, F)
1  U = X
2  C = ∅
3  while U ≠ ∅
4      select an S ∈ F that maximizes |S ∩ U|
5      U = U - S
6      C = C ∪ {S}
7  return C
```
Strategy: At each step, pick the set covering the most uncovered elements (breaking ties arbitrarily).
Running time: O(|X| · |F| · min(|X|, |F|)) with a simple implementation; can be improved to O(Σ_{S ∈ F} |S|).
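A direct Python sketch of the greedy loop (the simple O(|X|·|F|·min(|X|, |F|)) version, with ties broken by position in F; the example instance below is illustrative):

```python
def greedy_set_cover(X, F):
    """Greedy set cover: repeatedly pick the set covering the most
    still-uncovered elements."""
    uncovered = set(X)
    chosen = []
    while uncovered:
        best = max(F, key=lambda S: len(S & uncovered))
        if not (best & uncovered):
            raise ValueError("F does not cover X")
        uncovered -= best
        chosen.append(best)
    return chosen

# A 12-element universe where greedy uses 4 sets but 3 suffice
# (the third, fourth, and fifth sets alone cover X):
X = set(range(1, 13))
F = [{1, 2, 3, 4, 5, 6}, {5, 6, 8, 9}, {1, 4, 7, 10},
     {2, 5, 7, 8, 11}, {3, 6, 9, 12}, {10, 11}]
```

Running `greedy_set_cover(X, F)` on this instance picks the 6-element set first and ends up with 4 sets, while the optimum is 3, illustrating that greedy is approximate rather than exact.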
Theorem 35.4
GREEDY-SET-COVER is a polynomial-time ρ(n)-approximation algorithm, where:
$$\rho(n) = H\left(\max\{|S| : S \in F\}\right)$$
Here H(d) = Σ_{i=1}^{d} 1/i is the d-th harmonic number.
Proof sketch:
- Assign a cost of 1 to each set selected; distribute this cost evenly among newly covered elements
- For element x first covered by set Sᵢ: cₓ = 1 / |Sᵢ - (S₁ ∪ ... ∪ Sᵢ₋₁)|
- Total cost: |C| = Σ_{x ∈ X} cₓ
- Since each x ∈ X belongs to at least one set in optimal C*:
- |C| ≤ Σ_{S ∈ C*} Σ_{x ∈ S} cₓ
- Key inequality: For any S ∈ F, Σ_{x ∈ S} cₓ ≤ H(|S|)
- Proved by tracking uᵢ = |S - (S₁ ∪ ... ∪ Sᵢ)| (uncovered elements of S after step i)
- The greedy choice ensures Sᵢ covers at least as many new elements as S would
- The sum telescopes to H(|S|)
- Therefore: |C| ≤ |C*| · H(max{|S| : S ∈ F})
Corollary 35.5
GREEDY-SET-COVER is a polynomial-time (ln |X| + 1)-approximation algorithm.
Special cases: When max{|S| : S ∈ F} is a small constant, the approximation ratio is a small constant. For example, finding a vertex cover in a graph with max degree 3 gives ratio H(3) = 11/6 ≈ 1.83.
35.4 Randomization and Linear Programming
Two powerful techniques for designing approximation algorithms.
Randomized Approximation for MAX-3-CNF Satisfiability
MAX-3-CNF satisfiability: Given a 3-CNF formula, find an assignment maximizing the number of satisfied clauses.
Algorithm: Independently set each variable to 1 with probability 1/2 and to 0 with probability 1/2.
A randomized ρ(n)-approximation algorithm satisfies:
$$\max\left(\frac{C}{C^*},\ \frac{C^*}{C}\right) \leq \rho(n)$$
where C is the expected cost of the solution.
Theorem 35.6
The random assignment algorithm is a randomized 8/7-approximation algorithm for MAX-3-CNF satisfiability.
Proof:
- Each clause has exactly 3 distinct literals; no clause contains both a variable and its negation
- A clause is unsatisfied only if all 3 literals are set to 0
- Pr[clause i is not satisfied] = (1/2)³ = 1/8
- Pr[clause i is satisfied] = 7/8
- Let Y = total number of satisfied clauses, where the formula has m clauses:
- E[Y] = 7m/8 (by linearity of expectation over per-clause indicator variables)
- Since m is an upper bound on the optimum:
- Approximation ratio ≤ m / (7m/8) = 8/7
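The 7/8 expectation can be checked by exhaustive enumeration on a toy formula (made up for illustration: three literals per clause, each over distinct variables), since the average over all assignments equals the expectation under a uniform random assignment.

```python
from itertools import product

# Toy 3-CNF formula: each clause is three (variable, is_positive)
# literals over distinct variables.
clauses = [((0, True),  (1, True),  (2, True)),
           ((0, False), (1, True),  (3, True)),
           ((1, True),  (2, False), (3, False))]
n_vars, m = 4, len(clauses)

# Sum the number of satisfied clauses over all 2^n assignments,
# then average: each clause fails in exactly 1/8 of assignments.
total = sum(
    sum(any(assign[v] == pos for v, pos in clause) for clause in clauses)
    for assign in product((False, True), repeat=n_vars)
)
expected = total / 2 ** n_vars
assert expected == 7 * m / 8
```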
Approximating Weighted Vertex Cover Using Linear Programming
Minimum-weight vertex-cover problem: Given G = (V, E) with positive vertex weights w(v), find a vertex cover of minimum total weight.
Step 1: Formulate as a 0-1 integer program:
minimize Σ_{v ∈ V} w(v) · x(v)
subject to x(u) + x(v) ≥ 1 for each (u, v) ∈ E
x(v) ∈ {0, 1} for each v ∈ V
Step 2: Relax to a linear program (replace x(v) ∈ {0, 1} with 0 ≤ x(v) ≤ 1):
minimize Σ_{v ∈ V} w(v) · x(v)
subject to x(u) + x(v) ≥ 1 for each (u, v) ∈ E
0 ≤ x(v) ≤ 1 for each v ∈ V
The LP optimal value is a lower bound on the integer program (and hence on the optimal vertex cover weight).
Step 3: Round the LP solution:
```
APPROX-MIN-WEIGHT-VC(G, w)
1  C = ∅
2  compute x̄, an optimal solution to the LP relaxation
3  for each v ∈ G.V
4      if x̄(v) ≥ 1/2
5          C = C ∪ {v}
6  return C
```
Theorem 35.7
APPROX-MIN-WEIGHT-VC is a polynomial-time 2-approximation algorithm for minimum-weight vertex cover.
Proof:
- C is a valid vertex cover: For any edge (u, v), the LP constraint gives x̄(u) + x̄(v) ≥ 1, so at least one of them is ≥ 1/2 and gets included
- Weight bound: Let z* = optimal LP value. Then:
- z* ≥ Σ_{v: x̄(v) ≥ 1/2} w(v) · x̄(v) ≥ Σ_{v: x̄(v) ≥ 1/2} w(v) · (1/2) = w(C)/2
- So w(C) ≤ 2z*
- Since z* ≤ w(C*) (LP relaxation lower bound):
- w(C) ≤ 2z* ≤ 2w(C*)
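To see the rounding step concretely, take the unit-weight triangle: setting x̄(v) = 1/2 at every vertex is a feasible fractional solution of value 3/2 (and in fact optimal, since any integral cover needs 2 vertices). The LP solution is hardcoded below as an assumption rather than produced by a solver; only the rounding step from the pseudocode is exercised.

```python
# Triangle graph with unit weights; x_bar is an assumed optimal LP
# solution of value 3/2, not computed by an LP solver.
V = [0, 1, 2]
E = [(0, 1), (1, 2), (0, 2)]
w = {v: 1 for v in V}
x_bar = {v: 0.5 for v in V}

# LP feasibility: every edge constraint x(u) + x(v) >= 1 holds.
assert all(x_bar[u] + x_bar[v] >= 1 for u, v in E)

# Rounding: keep every vertex with x_bar(v) >= 1/2.
C = {v for v in V if x_bar[v] >= 0.5}
weight = sum(w[v] for v in C)
z_star = sum(w[v] * x_bar[v] for v in V)
assert weight <= 2 * z_star  # here 3 <= 3, the bound is tight
```

The triangle shows the factor-2 analysis is tight: rounding takes all three vertices even though two suffice.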
35.5 The Subset-Sum Problem
Instance: A set S = {x₁, x₂, ..., xₙ} of positive integers and a target value t.
Optimization version: Find a subset of S whose sum is as large as possible but not larger than t.
Practical example: A truck with weight limit t and n boxes of various weights — maximize the load.
Exponential-Time Exact Algorithm
Key idea: Iteratively compute Lᵢ, the sorted list of all subset sums of {x₁, ..., xᵢ} that don't exceed t.
Let L + x denote the list obtained by adding x to every element of L.
```
EXACT-SUBSET-SUM(S, t)
1  n = |S|
2  L₀ = ⟨0⟩
3  for i = 1 to n
4      Lᵢ = MERGE-LISTS(Lᵢ₋₁, Lᵢ₋₁ + xᵢ)
5      remove from Lᵢ every element that is greater than t
6  return the largest element in Lₙ
```
- MERGE-LISTS(L, L'): Merges two sorted lists, removing duplicates, in O(|L| + |L'|) time
- Uses the identity: Pᵢ = Pᵢ₋₁ ∪ (Pᵢ₋₁ + xᵢ), where Pᵢ is the set of all achievable sums from {x₁, ..., xᵢ}
- |Lᵢ| can be as large as 2ⁱ → exponential time in general
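A Python sketch of the exact algorithm; a Python set stands in for MERGE-LISTS (it deduplicates the same way, changing the data structure but not the result):

```python
def exact_subset_sum(S, t):
    """Exponential-time exact algorithm: L holds every distinct
    subset sum of the prefix processed so far, capped at t."""
    L = [0]
    for x in S:
        # L_i = merge of L_{i-1} and L_{i-1} + x, duplicates removed
        merged = sorted(set(L) | {y + x for y in L})
        # discard every sum exceeding the target
        L = [y for y in merged if y <= t]
    return max(L)
```

Since |L| can double at every step, the running time is exponential in the worst case, which is what motivates trimming.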
Fully Polynomial-Time Approximation Scheme
Key idea: Trim each list Lᵢ after creation. If two values are close, keep only one representative.
Trimming: Given a sorted list L = ⟨y₁, y₂, ..., yₘ⟩ and a parameter δ (0 < δ < 1), remove elements so that for every removed y, some remaining z satisfies:
$$\frac{y}{1 + \delta} \leq z \leq y$$
```
TRIM(L, δ)
1  let m be the length of L
2  L' = ⟨y₁⟩
3  last = y₁
4  for i = 2 to m
5      if yᵢ > last · (1 + δ)
6          append yᵢ onto the end of L'
7          last = yᵢ
8  return L'
```
Running time: Θ(m) for a list of length m.
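TRIM translates almost line for line (the sample list below is illustrative):

```python
def trim(L, delta):
    """Trim a sorted list: keep an element only if it exceeds the
    last kept element by more than a factor of (1 + delta); every
    dropped y is then represented by a kept z with y/(1+delta) <= z <= y."""
    trimmed = [L[0]]
    last = L[0]
    for y in L[1:]:
        if y > last * (1 + delta):
            trimmed.append(y)
            last = y
    return trimmed

# e.g. trim([10, 11, 12, 15, 20, 21, 22, 23, 24, 29], 0.1)
# drops 11 (represented by 10), 21, 22 (by 20), and 24 (by 23)
```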
The Approximation Scheme
```
APPROX-SUBSET-SUM(S, t, ε)
1  n = |S|
2  L₀ = ⟨0⟩
3  for i = 1 to n
4      Lᵢ = MERGE-LISTS(Lᵢ₋₁, Lᵢ₋₁ + xᵢ)
5      Lᵢ = TRIM(Lᵢ, ε/2n)
6      remove from Lᵢ every element that is greater than t
7  let z* be the largest value in Lₙ
8  return z*
```
Example: S = ⟨104, 102, 201, 101⟩, t = 308, ε = 0.40, so δ = ε/(2n) = 0.40/8 = 0.05
| Step | List |
|---|---|
| L₀ | ⟨0⟩ |
| L₁ (after trim) | ⟨0, 104⟩ |
| L₂ (after trim) | ⟨0, 102, 206⟩ |
| L₃ (after trim & remove) | ⟨0, 102, 201, 303⟩ |
| L₄ (after trim & remove) | ⟨0, 101, 201, 302⟩ |
Returns z* = 302, which is within 2% of optimal 307 = 104 + 102 + 101 (well within ε = 40%).
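The worked example can be reproduced with a short self-contained sketch (a Python set stands in for MERGE-LISTS, and the TRIM loop is inlined):

```python
def approx_subset_sum(S, t, eps):
    """FPTAS for subset sum: the exact merge-and-filter loop, but
    each list is trimmed with delta = eps/(2n) so its length stays
    polynomial in n, lg t, and 1/eps."""
    n = len(S)
    delta = eps / (2 * n)
    L = [0]
    for x in S:
        merged = sorted(set(L) | {y + x for y in L})
        # TRIM: keep y only if it exceeds the last kept value
        # by more than a factor of (1 + delta)
        trimmed, last = [merged[0]], merged[0]
        for y in merged[1:]:
            if y > last * (1 + delta):
                trimmed.append(y)
                last = y
        # discard sums exceeding the target
        L = [y for y in trimmed if y <= t]
    return max(L)

# reproduces the worked example: returns 302 for the instance above
```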
Theorem 35.8
APPROX-SUBSET-SUM is a fully polynomial-time approximation scheme for the subset-sum problem.
Proof of approximation ratio:
- For every element y ∈ Pᵢ with y ≤ t, there exists z ∈ Lᵢ such that:
- y / (1 + ε/(2n))ⁱ ≤ z ≤ y
- For the optimal sum y* ∈ Pₙ and the returned z*:
- y*/z* ≤ (1 + ε/(2n))ⁿ ≤ e^{ε/2} ≤ 1 + ε (using 0 < ε < 1)
Proof of polynomial running time:
- After trimming, successive elements differ by a factor of at least (1 + ε/2n)
- Each list Lᵢ therefore has O((n ln t)/ε) elements
- This is polynomial in both the input size and 1/ε
- Total running time is polynomial in n, lg t, and 1/ε
Summary of Approximation Results
| Problem | Algorithm | Approximation Ratio | Time |
|---|---|---|---|
| Vertex Cover | APPROX-VERTEX-COVER | 2 | O(V + E) |
| TSP (triangle inequality) | APPROX-TSP-TOUR | 2 | Θ(V²) |
| General TSP | — | No constant ratio (unless P = NP) | — |
| Set Cover | GREEDY-SET-COVER | H(max|S|) ≤ ln|X| + 1 | O(|X|·|F|·min(|X|,|F|)) |
| MAX-3-CNF SAT | Random assignment | 8/7 (expected) | O(n) |
| Weighted Vertex Cover | APPROX-MIN-WEIGHT-VC (LP rounding) | 2 | Poly (LP solve) |
| Subset Sum | APPROX-SUBSET-SUM | (1 + ε) — FPTAS | Poly in n, lg t, 1/ε |