NP Completeness
Introduction to AlgorithmsChapter 34: NP-Completeness
Almost all algorithms studied so far have been polynomial-time algorithms — on inputs of size n, their worst-case running time is O(nᵏ) for some constant k. However, not all problems can be solved in polynomial time:
- Turing's Halting Problem cannot be solved by any computer, regardless of time allowed
- Some problems can be solved, but not in time O(nᵏ) for any constant k
NP-complete problems occupy an interesting middle ground: no polynomial-time algorithm has been found for any of them, yet no one has proven that such algorithms cannot exist. This is the famous P ≠ NP question, open since 1971.
Contrasting pairs (one polynomial, one NP-complete):
| Polynomial-Time | NP-Complete |
|---|---|
| Shortest simple path (O(VE)) | Longest simple path |
| Euler tour (O(E)) | Hamiltonian cycle |
| 2-CNF satisfiability | 3-CNF satisfiability |
Three key complexity classes:
- P — problems solvable in polynomial time
- NP — problems verifiable in polynomial time (given a certificate/solution, we can check it quickly)
- NPC (NP-complete) — problems in NP that are as "hard" as any problem in NP
Key insight: P ⊆ NP. If any NP-complete problem can be solved in polynomial time, then P = NP (every problem in NP becomes polynomial-time solvable).
Practical significance: Proving a problem NP-complete provides strong evidence of intractability → pursue approximation algorithms or tractable special cases instead.
Three key concepts for NP-completeness proofs:
- Decision problems vs. optimization problems — NP-completeness applies to decision problems (yes/no answers), but optimization problems can be recast as decision problems by imposing a bound
- Reductions — transform instances of problem A into instances of problem B in polynomial time, preserving yes/no answers
- A first NP-complete problem — CIRCUIT-SAT serves as the foundation; all other NP-completeness proofs reduce from it (or from problems already proven NP-complete)
34.1 Polynomial Time
We regard polynomial-time solvable problems as tractable for three reasons:
- Practical running times — few real problems require Θ(n¹⁰⁰); once a first polynomial algorithm is found, more efficient ones typically follow
- Model independence — the class of polynomial-time solvable problems is the same across serial RAM, Turing machines, and polynomially-bounded parallel computers
- Closure properties — polynomials are closed under addition, multiplication, and composition, so composing polynomial-time algorithms yields polynomial-time algorithms
Abstract Problems
An abstract problem Q is a binary relation on a set I of problem instances and a set S of problem solutions.
- Example: SHORTEST-PATH maps (graph, u, v) → shortest path sequence
- Abstract decision problem: a function mapping instance set I to {0, 1}
- Example: PATH(⟨G, u, v, k⟩) = 1 if a shortest path from u to v has ≤ k edges
Optimization problems can be recast as decision problems: if the decision version is hard, the optimization version is at least as hard.
Encodings
An encoding maps abstract objects to binary strings. A concrete problem has binary strings as its instance set.
- A concrete problem is polynomial-time solvable if an algorithm solves it in O(nᵏ) time
- P = the set of concrete decision problems that are polynomial-time solvable
Two encodings e₁ and e₂ are polynomially related if polynomial-time computable functions can convert between them.
Lemma 34.1: If e₁ and e₂ are polynomially related encodings of abstract problem Q, then e₁(Q) ∈ P if and only if e₂(Q) ∈ P.
- Binary vs. base-3 representation: polynomially related → no effect on tractability
- Binary vs. unary representation: not polynomially related → can change complexity (e.g., Θ(k) is polynomial in unary input length but exponential in binary input length)
Convention: Use "standard" encoding (polynomially related to binary) and denote it with angle brackets ⟨G⟩.
A Formal-Language Framework
- Alphabet Σ: a finite set of symbols
- Language L over Σ: any set of strings from Σ*
- Operations: union, intersection, complement (L̄ = Σ* − L), concatenation, Kleene star (L* = {ε} ∪ L ∪ L² ∪ ...)
A decision problem Q corresponds to the language L = {x ∈ Σ* : Q(x) = 1}.
Key definitions:
- Algorithm A accepts string x if A(x) = 1
- Algorithm A rejects string x if A(x) = 0
- Language L is accepted by A: L = {x ∈ {0,1}* : A(x) = 1}
- Language L is decided by A: A correctly accepts every x ∈ L and rejects every x ∉ L
- Accepting only requires output on yes-instances; deciding requires correct output on all inputs
Formal definition of P:
P = {L ⊆ {0,1}* : there exists an algorithm A that decides L in polynomial time}
Theorem 34.2: P equals the class of languages accepted by polynomial-time algorithms.
Proof sketch: If A accepts L in O(nᵏ) time, simulate A for cnᵏ steps. If A accepted, output 1; otherwise output 0. This decides L in polynomial time.
34.2 Polynomial-Time Verification
Hamiltonian Cycles
A hamiltonian cycle of an undirected graph G = (V, E) is a simple cycle containing each vertex in V exactly once.
- HAM-CYCLE = {⟨G⟩ : G is a hamiltonian graph}
- Brute-force: check all m! permutations → Ω(2^√n) time → not polynomial
- HAM-CYCLE is NP-complete (proven in Section 34.5)
Verification Algorithms
A verification algorithm A takes two arguments:
- Input string x (the problem instance)
- Certificate y (a proposed proof/witness)
A verifies input x if there exists a certificate y such that A(x, y) = 1.
The language verified by A:
L = {x ∈ {0,1}* : there exists y ∈ {0,1}* such that A(x, y) = 1}
For HAM-CYCLE: the certificate is the sequence of vertices forming the cycle. Verification checks that it's a valid permutation of V with all consecutive edges present → O(n²) time.
The Complexity Class NP
NP is the class of languages verifiable by a polynomial-time algorithm:
L ∈ NP if and only if there exist a two-input polynomial-time algorithm A and constant c such that: L = {x ∈ {0,1}* : ∃ certificate y with |y| = O(|x|ᶜ) such that A(x, y) = 1}
Key relationships:
- P ⊆ NP — if we can solve in polynomial time, we can certainly verify (just ignore the certificate)
- P = NP? — the central open question; most researchers believe P ≠ NP
- co-NP = {L : L̄ ∈ NP}; unknown whether NP = co-NP
- P ⊆ NP ∩ co-NP (since P is closed under complement)
Four possible scenarios for P, NP, co-NP:
- P = NP = co-NP (most unlikely)
- NP = co-NP but P ≠ NP
- P = NP ∩ co-NP but NP ≠ co-NP
- NP ≠ co-NP and P ≠ NP ∩ co-NP (most likely)
34.3 NP-Completeness and Reducibility
Reducibility
A language L₁ is polynomial-time reducible to L₂, written L₁ ≤ₚ L₂, if there exists a polynomial-time computable function f : {0,1}* → {0,1}* such that for all x:
x ∈ L₁ if and only if f(x) ∈ L₂
- f is the reduction function
- The polynomial-time algorithm computing f is the reduction algorithm
How to use a reduction to solve problem A via problem B:
1. Given instance α of A, compute β = f(α) in polynomial time
2. Run polynomial-time algorithm for B on β
3. Use the answer for β as the answer for α
Lemma 34.3: If L₁ ≤ₚ L₂ and L₂ ∈ P, then L₁ ∈ P.
Contrapositive use: If L₁ has no polynomial-time algorithm and L₁ ≤ₚ L₂, then L₂ has no polynomial-time algorithm either.
NP-Completeness Definition
A language L ⊆ {0,1}* is NP-complete if:
- L ∈ NP
- L' ≤ₚ L for every L' ∈ NP
If L satisfies only property 2 (but not necessarily 1), L is NP-hard.
NPC = the class of NP-complete languages.
Theorem 34.4: If any NP-complete problem is polynomial-time solvable, then P = NP. Equivalently, if any problem in NP is not polynomial-time solvable, then no NP-complete problem is polynomial-time solvable.
Circuit Satisfiability (CIRCUIT-SAT)
Boolean combinational circuits are built from:
- NOT gate (inverter): output = ¬x
- AND gate: output = x ∧ y (1 only if all inputs are 1)
- OR gate: output = x ∨ y (1 if any input is 1)
Gates are interconnected by wires (no cycles — the circuit is a DAG). A circuit has inputs, internal wires, and a single output.
A truth assignment is a set of boolean input values. A circuit is satisfiable if some truth assignment causes output = 1.
CIRCUIT-SAT = {⟨C⟩ : C is a satisfiable boolean combinational circuit}
Lemma 34.5: CIRCUIT-SAT ∈ NP.
- Certificate: assignment of values to all wires
- Verification: check each gate computes correctly, check output = 1 → linear time
Lemma 34.6: CIRCUIT-SAT is NP-hard.
- Proof sketch: For any L ∈ NP with verifier A running in O(nᵏ) time:
- Model computation of A as a sequence of configurations c₀, c₁, ..., c_{T(n)}
- Hardware circuit M maps each configuration to the next
- Paste T(n) copies of M together
- Wire known values (program A, input x, initial state) directly
- Leave certificate y as the only free inputs
- Output bit = A's output
- Circuit is satisfiable ⟺ ∃ certificate y such that A(x, y) = 1
Theorem 34.7: The circuit-satisfiability problem is NP-complete.
34.4 NP-Completeness Proofs
Methodology
Lemma 34.8: If L' ≤ₚ L for some L' ∈ NPC, then L is NP-hard. If additionally L ∈ NP, then L ∈ NPC.
Recipe for proving L is NP-complete:
- Prove L ∈ NP
- Select a known NP-complete language L'
- Describe a polynomial-time algorithm computing reduction function f
- Prove: x ∈ L' ⟺ f(x) ∈ L
- Prove the reduction runs in polynomial time
Formula Satisfiability (SAT)
An instance of SAT is a boolean formula φ with:
- n boolean variables: x₁, x₂, ..., xₙ
- m boolean connectives: ∧, ∨, ¬, →, ↔
- Parentheses
SAT = {⟨φ⟩ : φ is a satisfiable boolean formula}
Theorem 34.9: SAT is NP-complete.
Proof outline:
- SAT ∈ NP: Certificate = satisfying assignment. Replace variables with values, evaluate → polynomial time.
- CIRCUIT-SAT ≤ₚ SAT: For each wire xᵢ in circuit C, introduce a variable xᵢ. Express each gate as a clause (e.g., x₁₀ ↔ (x₇ ∧ x₈ ∧ x₉)). The formula φ is the AND of the output variable with all gate clauses.
- C satisfiable ⟺ φ satisfiable
- Reduction runs in polynomial time (formula size is linear in circuit size)
Note: A naive inductive conversion of circuit to formula can cause exponential blowup due to shared subformulas (fan-out ≥ 2). The clause-per-gate approach avoids this.
3-CNF Satisfiability (3-CNF-SAT)
Definitions:
- A literal is a variable or its negation
- Conjunctive Normal Form (CNF): AND of clauses, each clause is OR of literals
- 3-CNF: each clause has exactly 3 distinct literals
Example: (x₁ ∨ ¬x₁ ∨ ¬x₂) ∧ (x₃ ∨ x₂ ∨ x₄) ∧ (¬x₁ ∨ ¬x₃ ∨ ¬x₄)
Theorem 34.10: 3-CNF-SAT is NP-complete.
Proof (SAT ≤ₚ 3-CNF-SAT) — three steps:
Step 1: Parse tree → clauses
- Build a binary parse tree for φ
- Introduce variable yᵢ for each internal node's output
- Rewrite as: φ' = y₁ ∧ (y₁ ↔ (y₂ ∧ ¬x₂)) ∧ (y₂ ↔ (y₃ ∨ y₄)) ∧ ...
- Each clause φ'ᵢ has at most 3 variables
Step 2: Convert each clause to CNF
- Build truth table for each clause φ'ᵢ
- From rows evaluating to 0, build DNF for ¬φ'ᵢ
- Negate and apply DeMorgan's laws to get CNF:
- ¬(a ∧ b) = ¬a ∨ ¬b
- ¬(a ∨ b) = ¬a ∧ ¬b
- Each clause has at most 3 literals (since each φ'ᵢ has ≤ 3 variables → ≤ 8 truth table rows)
Step 3: Pad clauses to exactly 3 literals
- Using auxiliary variables p and q:
- 3 literals → keep as is
- 2 literals (l₁ ∨ l₂) → (l₁ ∨ l₂ ∨ p) ∧ (l₁ ∨ l₂ ∨ ¬p)
- 1 literal (l) → (l ∨ p ∨ q) ∧ (l ∨ p ∨ ¬q) ∧ (l ∨ ¬p ∨ q) ∧ (l ∨ ¬p ∨ ¬q)
Polynomial time: Each step introduces at most a constant factor blowup per connective/clause.
34.5 NP-Complete Problems
Reduction chain:
CIRCUIT-SAT → SAT → 3-CNF-SAT → CLIQUE → VERTEX-COVER → HAM-CYCLE → TSP
↘
3-CNF-SAT ──────────────→ SUBSET-SUM
34.5.1 The Clique Problem
A clique in undirected graph G = (V, E) is a subset V' ⊆ V where every pair of vertices in V' is connected by an edge (a complete subgraph). The size of a clique is |V'|.
CLIQUE = {⟨G, k⟩ : G contains a clique of size k}
Theorem 34.11: CLIQUE is NP-complete.
CLIQUE ∈ NP: Certificate = vertex set V'. Verify all pairs have edges → polynomial time.
3-CNF-SAT ≤ₚ CLIQUE:
Given φ = C₁ ∧ C₂ ∧ ... ∧ Cₖ (each Cᵣ has 3 literals l₁ʳ, l₂ʳ, l₃ʳ):
-
Construct graph G = (V, E):
- For each clause Cᵣ, create a triple of vertices v₁ʳ, v₂ʳ, v₃ʳ
- Add edge (vᵢʳ, vⱼˢ) if:
- r ≠ s (different triples), AND
- lᵢʳ and lⱼˢ are consistent (not negations of each other)
-
Claim: φ is satisfiable ⟺ G has a clique of size k
- (⟹) Satisfying assignment → pick one true literal per clause → k vertices form a clique (all from different triples, all consistent → all connected)
- (⟸) Clique of size k → exactly one vertex per triple → set corresponding literals to 1 (no contradictions since edges only connect consistent literals) → all clauses satisfied
34.5.2 The Vertex-Cover Problem
A vertex cover of G = (V, E) is a subset V' ⊆ V such that every edge (u, v) ∈ E has at least one endpoint in V'.
VERTEX-COVER = {⟨G, k⟩ : G has a vertex cover of size k}
Theorem 34.12: VERTEX-COVER is NP-complete.
VERTEX-COVER ∈ NP: Certificate = V'. Check |V'| = k and every edge is covered → polynomial time.
CLIQUE ≤ₚ VERTEX-COVER:
Uses the complement graph Ḡ = (V, Ē) where Ē = {(u,v) : u ≠ v and (u,v) ∉ E}.
-
Reduction: ⟨G, k⟩ → ⟨Ḡ, |V| − k⟩
-
Claim: G has a clique of size k ⟺ Ḡ has a vertex cover of size |V| − k
- (⟹) Clique V' of size k in G → V − V' is a vertex cover in Ḡ. For any edge (u,v) ∈ Ē: (u,v) ∉ E, so at least one of u, v is not in V' → at least one is in V − V'.
- (⟸) Vertex cover V' of size |V| − k in Ḡ → V − V' is a clique of size k in G. If u, v ∉ V', then (u,v) ∉ Ē (since V' covers all edges of Ḡ), so (u,v) ∈ E.
34.5.3 The Hamiltonian-Cycle Problem
Theorem 34.13: HAM-CYCLE is NP-complete.
HAM-CYCLE ∈ NP: Certificate = sequence of |V| vertices. Verify it's a permutation of V with all consecutive edges (and last-to-first) present → polynomial time.
VERTEX-COVER ≤ₚ HAM-CYCLE:
Given undirected G = (V, E) and integer k, construct G' = (V', E') with a hamiltonian cycle iff G has a vertex cover of size k.
Construction uses a widget Wᵤᵥ for each edge (u, v) ∈ E:
- Each widget has 12 vertices: [u,v,1]...[u,v,6] and [v,u,1]...[v,u,6]
- 14 internal edges
- Only vertices [u,v,1], [u,v,6], [v,u,1], [v,u,6] connect to the rest of G'
Widget property: A hamiltonian cycle can traverse the widget in exactly 3 ways:
- Enter [u,v,1], visit all 12 vertices, exit [u,v,6]
- Two separate passes: [u,v,1]→[u,v,6] and [v,u,1]→[v,u,6] (6 vertices each)
- Enter [v,u,1], visit all 12 vertices, exit [v,u,6]
Additional structure:
- k selector vertices s₁, ..., sₖ
- For each vertex u ∈ V, chain its widgets together into a path
- Connect first/last widget vertices of each chain to all selector vertices
Correctness:
- Vertex cover of size k → hamiltonian cycle: each selector vertex sⱼ starts a path through all widgets of cover vertex uⱼ
- Hamiltonian cycle → vertex cover of size k: partition cycle into "cover paths" starting at selector vertices; each cover path corresponds to a cover vertex
Size of G': |V'| = 12|E| + k, |E'| = 16|E| + (2k−1)|V| → polynomial.
34.5.4 The Traveling-Salesman Problem (TSP)
A tour is a hamiltonian cycle in a complete graph with edge costs. The goal is to minimize total cost.
TSP = {⟨G, c, k⟩ : complete graph G has a tour with cost ≤ k}
Theorem 34.14: TSP is NP-complete.
TSP ∈ NP: Certificate = vertex sequence. Verify it visits each vertex once, sum costs, check ≤ k → polynomial time.
HAM-CYCLE ≤ₚ TSP:
Given G = (V, E):
- Create complete graph G' = (V, E') with cost function:
- c(i, j) = 0 if (i, j) ∈ E
- c(i, j) = 1 if (i, j) ∉ E
- Output ⟨G', c, 0⟩
Correctness:
- G has hamiltonian cycle h → h uses only edges in E → cost 0 in G' → tour of cost 0
- G' has tour h' of cost ≤ 0 → all edges have cost 0 → all edges are in E → h' is a hamiltonian cycle in G
34.5.5 The Subset-Sum Problem
Given a finite set S of positive integers and target t > 0, does a subset S' ⊆ S exist with Σ_{s∈S'} s = t?
SUBSET-SUM = {⟨S, t⟩ : ∃ S' ⊆ S such that Σ_{s∈S'} s = t}
Theorem 34.15: SUBSET-SUM is NP-complete.
SUBSET-SUM ∈ NP: Certificate = S'. Sum elements, check = t → polynomial time.
3-CNF-SAT ≤ₚ SUBSET-SUM:
Given φ with variables x₁,...,xₙ and clauses C₁,...,Cₖ, construct numbers in base 10 with n + k digits (n variable-labeled digits + k clause-labeled digits):
-
Target t: digit 1 for each variable position, digit 4 for each clause position
-
For each variable xᵢ: create two numbers vᵢ and v'ᵢ:
- Both have 1 in the xᵢ digit position, 0 in other variable positions
- vᵢ has 1 in Cⱼ digit if literal xᵢ appears in clause Cⱼ
- v'ᵢ has 1 in Cⱼ digit if literal ¬xᵢ appears in clause Cⱼ
-
For each clause Cⱼ: create two slack variables sⱼ (value 1 in Cⱼ digit) and s'ⱼ (value 2 in Cⱼ digit)
Key property: Maximum digit sum = 6 (three 1s from variables + 1 + 2 from slack) → no carries in base 10.
Correctness:
- (⟹) Satisfying assignment → include vᵢ if xᵢ = 1, else v'ᵢ → variable digits sum to 1 ✓. Each clause has 1–3 true literals → clause digits sum to 1–3 → use slack variables to reach 4 ✓
- (⟸) Subset summing to t → must include exactly one of vᵢ, v'ᵢ per variable (to get 1 in variable digits). Set xᵢ = 1 if vᵢ ∈ S'. Slack variables contribute ≤ 3 per clause digit, so at least one vᵢ/v'ᵢ must contribute 1 → corresponding literal is true → clause satisfied ✓
Size: S has 2n + 2k values, each with n + k digits → polynomial.
Summary of Reductions
CIRCUIT-SAT (first NP-complete problem, proved from scratch)
│
▼
SAT (formula satisfiability)
│
▼
3-CNF-SAT (satisfiability in 3-conjunctive normal form)
│ │
▼ ▼
CLIQUE SUBSET-SUM
│
▼
VERTEX-COVER
│
▼
HAM-CYCLE
│
▼
TSP
To prove a new problem L is NP-complete:
- Show L ∈ NP (exhibit a polynomial-time verifier with polynomial-size certificate)
- Pick a known NP-complete problem L'
- Show L' ≤ₚ L (polynomial-time reduction preserving yes/no answers)
Common pitfalls:
- Reducing in the wrong direction (must reduce from known NP-complete to new problem)
- Reduction must not depend on knowing the solution to the source problem
- Reducing only special cases of the source problem doesn't suffice — must handle all instances
Previous chapter
Computational GeometryNext chapter
Approximation Algorithms