Pagefy

Book

Introduction to Algorithms

1.The Role of Algorithms in Computing 2.Getting Started 3.Growth of Functions 4.Divide and Conquer 5.Probabilistic Analysis and Randomized Algorithms 6.Heapsort 7.Quicksort 8.Sorting in Linear Time 9.Medians and Order Statistics 10.Elementary Data Structures 11.Hash Tables 12.Binary Search Trees 13.Red Black Trees 14.Augmenting Data Structures 15.Dynamic Programming 16.Greedy Algorithms 17.Amortized Analysis 18.B Trees 19.Fibonacci Heaps 20.van Emde Boas Trees 21.Data Structures for Disjoint Sets 22.Elementary Graph Algorithms 23.Minimum Spanning Trees 24.Single Source Shortest Paths 25.All Pairs Shortest Paths 26.Maximum Flow 27.Multithreaded Algorithms 28.Matrix Operations 29.Linear Programming 30.Polynomials and the FFT 31.Number Theoretic Algorithms 32.String Matching 33.Computational Geometry 34.NP Completeness 35.Approximation Algorithms 36.Appendix A Summations 37.Appendix B Sets Etc 38.Appendix C Counting and Probability 39.Appendix D Matrices

NP Completeness

Introduction to AlgorithmsBuy the book

Chapter 34: NP-Completeness

Almost all algorithms studied so far have been polynomial-time algorithms — on inputs of size n, their worst-case running time is O(nᵏ) for some constant k. However, not all problems can be solved in polynomial time:

Turing's Halting Problem cannot be solved by any computer, regardless of time allowed
Some problems can be solved, but not in time O(nᵏ) for any constant k

NP-complete problems occupy an interesting middle ground: no polynomial-time algorithm has been found for any of them, yet no one has proven that such algorithms cannot exist. This is the famous P ≠ NP question, open since 1971.

Contrasting pairs (one polynomial, one NP-complete):

Polynomial-Time	NP-Complete
Shortest simple path (O(VE))	Longest simple path
Euler tour (O(E))	Hamiltonian cycle
2-CNF satisfiability	3-CNF satisfiability

Three key complexity classes:

P — problems solvable in polynomial time
NP — problems verifiable in polynomial time (given a certificate/solution, we can check it quickly)
NPC (NP-complete) — problems in NP that are as "hard" as any problem in NP

Key insight: P ⊆ NP. If any NP-complete problem can be solved in polynomial time, then P = NP (every problem in NP becomes polynomial-time solvable).

Practical significance: Proving a problem NP-complete provides strong evidence of intractability → pursue approximation algorithms or tractable special cases instead.

Three key concepts for NP-completeness proofs:

Decision problems vs. optimization problems — NP-completeness applies to decision problems (yes/no answers), but optimization problems can be recast as decision problems by imposing a bound
Reductions — transform instances of problem A into instances of problem B in polynomial time, preserving yes/no answers
A first NP-complete problem — CIRCUIT-SAT serves as the foundation; all other NP-completeness proofs reduce from it (or from problems already proven NP-complete)

34.1 Polynomial Time

We regard polynomial-time solvable problems as tractable for three reasons:

Practical running times — few real problems require Θ(n¹⁰⁰); once a first polynomial algorithm is found, more efficient ones typically follow
Model independence — the class of polynomial-time solvable problems is the same across serial RAM, Turing machines, and polynomially-bounded parallel computers
Closure properties — polynomials are closed under addition, multiplication, and composition, so composing polynomial-time algorithms yields polynomial-time algorithms

Abstract Problems

An abstract problem Q is a binary relation on a set I of problem instances and a set S of problem solutions.

Example: SHORTEST-PATH maps (graph, u, v) → shortest path sequence
Abstract decision problem: a function mapping instance set I to {0, 1}
Example: PATH(⟨G, u, v, k⟩) = 1 if a shortest path from u to v has ≤ k edges

Optimization problems can be recast as decision problems: if the decision version is hard, the optimization version is at least as hard.

Encodings

An encoding maps abstract objects to binary strings. A concrete problem has binary strings as its instance set.

A concrete problem is polynomial-time solvable if an algorithm solves it in O(nᵏ) time
P = the set of concrete decision problems that are polynomial-time solvable

Two encodings e₁ and e₂ are polynomially related if polynomial-time computable functions can convert between them.

Lemma 34.1: If e₁ and e₂ are polynomially related encodings of abstract problem Q, then e₁(Q) ∈ P if and only if e₂(Q) ∈ P.

Binary vs. base-3 representation: polynomially related → no effect on tractability
Binary vs. unary representation: not polynomially related → can change complexity (e.g., Θ(k) is polynomial in unary input length but exponential in binary input length)

Convention: Use "standard" encoding (polynomially related to binary) and denote it with angle brackets ⟨G⟩.

A Formal-Language Framework

Alphabet Σ: a finite set of symbols
Language L over Σ: any set of strings from Σ*
Operations: union, intersection, complement (L̄ = Σ* − L), concatenation, Kleene star (L* = {ε} ∪ L ∪ L² ∪ ...)

A decision problem Q corresponds to the language L = {x ∈ Σ* : Q(x) = 1}.

Key definitions:

Algorithm A accepts string x if A(x) = 1
Algorithm A rejects string x if A(x) = 0
Language L is accepted by A: L = {x ∈ {0,1}* : A(x) = 1}
Language L is decided by A: A correctly accepts every x ∈ L and rejects every x ∉ L
Accepting only requires output on yes-instances; deciding requires correct output on all inputs

Formal definition of P:

P = {L ⊆ {0,1}* : there exists an algorithm A that decides L in polynomial time}

Theorem 34.2: P equals the class of languages accepted by polynomial-time algorithms.

Proof sketch: If A accepts L in O(nᵏ) time, simulate A for cnᵏ steps. If A accepted, output 1; otherwise output 0. This decides L in polynomial time.

34.2 Polynomial-Time Verification

Hamiltonian Cycles

A hamiltonian cycle of an undirected graph G = (V, E) is a simple cycle containing each vertex in V exactly once.

HAM-CYCLE = {⟨G⟩ : G is a hamiltonian graph}
Brute-force: check all m! permutations → Ω(2^√n) time → not polynomial
HAM-CYCLE is NP-complete (proven in Section 34.5)

Verification Algorithms

A verification algorithm A takes two arguments:

Input string x (the problem instance)
Certificate y (a proposed proof/witness)

A verifies input x if there exists a certificate y such that A(x, y) = 1.

The language verified by A:

L = {x ∈ {0,1}* : there exists y ∈ {0,1}* such that A(x, y) = 1}

For HAM-CYCLE: the certificate is the sequence of vertices forming the cycle. Verification checks that it's a valid permutation of V with all consecutive edges present → O(n²) time.

The Complexity Class NP

NP is the class of languages verifiable by a polynomial-time algorithm:

L ∈ NP if and only if there exist a two-input polynomial-time algorithm A and constant c such that: L = {x ∈ {0,1}* : ∃ certificate y with |y| = O(|x|ᶜ) such that A(x, y) = 1}

Key relationships:

P ⊆ NP — if we can solve in polynomial time, we can certainly verify (just ignore the certificate)
P = NP? — the central open question; most researchers believe P ≠ NP
co-NP = {L : L̄ ∈ NP}; unknown whether NP = co-NP
P ⊆ NP ∩ co-NP (since P is closed under complement)

Four possible scenarios for P, NP, co-NP:

P = NP = co-NP (most unlikely)
NP = co-NP but P ≠ NP
P = NP ∩ co-NP but NP ≠ co-NP
NP ≠ co-NP and P ≠ NP ∩ co-NP (most likely)

34.3 NP-Completeness and Reducibility

Reducibility

A language L₁ is polynomial-time reducible to L₂, written L₁ ≤ₚ L₂, if there exists a polynomial-time computable function f : {0,1}* → {0,1}* such that for all x:

x ∈ L₁ if and only if f(x) ∈ L₂

f is the reduction function
The polynomial-time algorithm computing f is the reduction algorithm

How to use a reduction to solve problem A via problem B:

1. Given instance α of A, compute β = f(α) in polynomial time
2. Run polynomial-time algorithm for B on β
3. Use the answer for β as the answer for α

Lemma 34.3: If L₁ ≤ₚ L₂ and L₂ ∈ P, then L₁ ∈ P.

Contrapositive use: If L₁ has no polynomial-time algorithm and L₁ ≤ₚ L₂, then L₂ has no polynomial-time algorithm either.

NP-Completeness Definition

A language L ⊆ {0,1}* is NP-complete if:

L ∈ NP
L' ≤ₚ L for every L' ∈ NP

If L satisfies only property 2 (but not necessarily 1), L is NP-hard.

NPC = the class of NP-complete languages.

Theorem 34.4: If any NP-complete problem is polynomial-time solvable, then P = NP. Equivalently, if any problem in NP is not polynomial-time solvable, then no NP-complete problem is polynomial-time solvable.

Circuit Satisfiability (CIRCUIT-SAT)

Boolean combinational circuits are built from:

NOT gate (inverter): output = ¬x
AND gate: output = x ∧ y (1 only if all inputs are 1)
OR gate: output = x ∨ y (1 if any input is 1)

Gates are interconnected by wires (no cycles — the circuit is a DAG). A circuit has inputs, internal wires, and a single output.

A truth assignment is a set of boolean input values. A circuit is satisfiable if some truth assignment causes output = 1.

CIRCUIT-SAT = {⟨C⟩ : C is a satisfiable boolean combinational circuit}

Lemma 34.5: CIRCUIT-SAT ∈ NP.

Certificate: assignment of values to all wires
Verification: check each gate computes correctly, check output = 1 → linear time

Lemma 34.6: CIRCUIT-SAT is NP-hard.

Proof sketch: For any L ∈ NP with verifier A running in O(nᵏ) time:
- Model computation of A as a sequence of configurations c₀, c₁, ..., c_{T(n)}
- Hardware circuit M maps each configuration to the next
- Paste T(n) copies of M together
- Wire known values (program A, input x, initial state) directly
- Leave certificate y as the only free inputs
- Output bit = A's output
- Circuit is satisfiable ⟺ ∃ certificate y such that A(x, y) = 1

Theorem 34.7: The circuit-satisfiability problem is NP-complete.

34.4 NP-Completeness Proofs

Methodology

Lemma 34.8: If L' ≤ₚ L for some L' ∈ NPC, then L is NP-hard. If additionally L ∈ NP, then L ∈ NPC.

Recipe for proving L is NP-complete:

Prove L ∈ NP
Select a known NP-complete language L'
Describe a polynomial-time algorithm computing reduction function f
Prove: x ∈ L' ⟺ f(x) ∈ L
Prove the reduction runs in polynomial time

Formula Satisfiability (SAT)

An instance of SAT is a boolean formula φ with:

n boolean variables: x₁, x₂, ..., xₙ
m boolean connectives: ∧, ∨, ¬, →, ↔
Parentheses

SAT = {⟨φ⟩ : φ is a satisfiable boolean formula}

Theorem 34.9: SAT is NP-complete.

Proof outline:

SAT ∈ NP: Certificate = satisfying assignment. Replace variables with values, evaluate → polynomial time.
CIRCUIT-SAT ≤ₚ SAT: For each wire xᵢ in circuit C, introduce a variable xᵢ. Express each gate as a clause (e.g., x₁₀ ↔ (x₇ ∧ x₈ ∧ x₉)). The formula φ is the AND of the output variable with all gate clauses.
- C satisfiable ⟺ φ satisfiable
- Reduction runs in polynomial time (formula size is linear in circuit size)

Note: A naive inductive conversion of circuit to formula can cause exponential blowup due to shared subformulas (fan-out ≥ 2). The clause-per-gate approach avoids this.

3-CNF Satisfiability (3-CNF-SAT)

Definitions:

A literal is a variable or its negation
Conjunctive Normal Form (CNF): AND of clauses, each clause is OR of literals
3-CNF: each clause has exactly 3 distinct literals

Example: (x₁ ∨ ¬x₁ ∨ ¬x₂) ∧ (x₃ ∨ x₂ ∨ x₄) ∧ (¬x₁ ∨ ¬x₃ ∨ ¬x₄)

Theorem 34.10: 3-CNF-SAT is NP-complete.

Proof (SAT ≤ₚ 3-CNF-SAT) — three steps:

Step 1: Parse tree → clauses

Build a binary parse tree for φ
Introduce variable yᵢ for each internal node's output
Rewrite as: φ' = y₁ ∧ (y₁ ↔ (y₂ ∧ ¬x₂)) ∧ (y₂ ↔ (y₃ ∨ y₄)) ∧ ...
Each clause φ'ᵢ has at most 3 variables

Step 2: Convert each clause to CNF

Build truth table for each clause φ'ᵢ
From rows evaluating to 0, build DNF for ¬φ'ᵢ
Negate and apply DeMorgan's laws to get CNF:
- ¬(a ∧ b) = ¬a ∨ ¬b
- ¬(a ∨ b) = ¬a ∧ ¬b
Each clause has at most 3 literals (since each φ'ᵢ has ≤ 3 variables → ≤ 8 truth table rows)

Step 3: Pad clauses to exactly 3 literals

Using auxiliary variables p and q:
- 3 literals → keep as is
- 2 literals (l₁ ∨ l₂) → (l₁ ∨ l₂ ∨ p) ∧ (l₁ ∨ l₂ ∨ ¬p)
- 1 literal (l) → (l ∨ p ∨ q) ∧ (l ∨ p ∨ ¬q) ∧ (l ∨ ¬p ∨ q) ∧ (l ∨ ¬p ∨ ¬q)

Polynomial time: Each step introduces at most a constant factor blowup per connective/clause.

34.5 NP-Complete Problems

Reduction chain:

CIRCUIT-SAT → SAT → 3-CNF-SAT → CLIQUE → VERTEX-COVER → HAM-CYCLE → TSP
                                                              ↘
                                  3-CNF-SAT ──────────────→ SUBSET-SUM

34.5.1 The Clique Problem

A clique in undirected graph G = (V, E) is a subset V' ⊆ V where every pair of vertices in V' is connected by an edge (a complete subgraph). The size of a clique is |V'|.

CLIQUE = {⟨G, k⟩ : G contains a clique of size k}

Theorem 34.11: CLIQUE is NP-complete.

CLIQUE ∈ NP: Certificate = vertex set V'. Verify all pairs have edges → polynomial time.

3-CNF-SAT ≤ₚ CLIQUE:

Given φ = C₁ ∧ C₂ ∧ ... ∧ Cₖ (each Cᵣ has 3 literals l₁ʳ, l₂ʳ, l₃ʳ):

Construct graph G = (V, E):
- For each clause Cᵣ, create a triple of vertices v₁ʳ, v₂ʳ, v₃ʳ
- Add edge (vᵢʳ, vⱼˢ) if:
  - r ≠ s (different triples), AND
  - lᵢʳ and lⱼˢ are consistent (not negations of each other)
Claim: φ is satisfiable ⟺ G has a clique of size k
- (⟹) Satisfying assignment → pick one true literal per clause → k vertices form a clique (all from different triples, all consistent → all connected)
- (⟸) Clique of size k → exactly one vertex per triple → set corresponding literals to 1 (no contradictions since edges only connect consistent literals) → all clauses satisfied

34.5.2 The Vertex-Cover Problem

A vertex cover of G = (V, E) is a subset V' ⊆ V such that every edge (u, v) ∈ E has at least one endpoint in V'.

VERTEX-COVER = {⟨G, k⟩ : G has a vertex cover of size k}

Theorem 34.12: VERTEX-COVER is NP-complete.

VERTEX-COVER ∈ NP: Certificate = V'. Check |V'| = k and every edge is covered → polynomial time.

CLIQUE ≤ₚ VERTEX-COVER:

Uses the complement graph Ḡ = (V, Ē) where Ē = {(u,v) : u ≠ v and (u,v) ∉ E}.

Reduction: ⟨G, k⟩ → ⟨Ḡ, |V| − k⟩
Claim: G has a clique of size k ⟺ Ḡ has a vertex cover of size |V| − k
- (⟹) Clique V' of size k in G → V − V' is a vertex cover in Ḡ. For any edge (u,v) ∈ Ē: (u,v) ∉ E, so at least one of u, v is not in V' → at least one is in V − V'.
- (⟸) Vertex cover V' of size |V| − k in Ḡ → V − V' is a clique of size k in G. If u, v ∉ V', then (u,v) ∉ Ē (since V' covers all edges of Ḡ), so (u,v) ∈ E.

34.5.3 The Hamiltonian-Cycle Problem

Theorem 34.13: HAM-CYCLE is NP-complete.

HAM-CYCLE ∈ NP: Certificate = sequence of |V| vertices. Verify it's a permutation of V with all consecutive edges (and last-to-first) present → polynomial time.

VERTEX-COVER ≤ₚ HAM-CYCLE:

Given undirected G = (V, E) and integer k, construct G' = (V', E') with a hamiltonian cycle iff G has a vertex cover of size k.

Construction uses a widget Wᵤᵥ for each edge (u, v) ∈ E:

Each widget has 12 vertices: [u,v,1]...[u,v,6] and [v,u,1]...[v,u,6]
14 internal edges
Only vertices [u,v,1], [u,v,6], [v,u,1], [v,u,6] connect to the rest of G'

Widget property: A hamiltonian cycle can traverse the widget in exactly 3 ways:

Enter [u,v,1], visit all 12 vertices, exit [u,v,6]
Two separate passes: [u,v,1]→[u,v,6] and [v,u,1]→[v,u,6] (6 vertices each)
Enter [v,u,1], visit all 12 vertices, exit [v,u,6]

Additional structure:

k selector vertices s₁, ..., sₖ
For each vertex u ∈ V, chain its widgets together into a path
Connect first/last widget vertices of each chain to all selector vertices

Correctness:

Vertex cover of size k → hamiltonian cycle: each selector vertex sⱼ starts a path through all widgets of cover vertex uⱼ
Hamiltonian cycle → vertex cover of size k: partition cycle into "cover paths" starting at selector vertices; each cover path corresponds to a cover vertex

Size of G': |V'| = 12|E| + k, |E'| = 16|E| + (2k−1)|V| → polynomial.

34.5.4 The Traveling-Salesman Problem (TSP)

A tour is a hamiltonian cycle in a complete graph with edge costs. The goal is to minimize total cost.

TSP = {⟨G, c, k⟩ : complete graph G has a tour with cost ≤ k}

Theorem 34.14: TSP is NP-complete.

TSP ∈ NP: Certificate = vertex sequence. Verify it visits each vertex once, sum costs, check ≤ k → polynomial time.

HAM-CYCLE ≤ₚ TSP:

Given G = (V, E):

Create complete graph G' = (V, E') with cost function:
- c(i, j) = 0 if (i, j) ∈ E
- c(i, j) = 1 if (i, j) ∉ E
Output ⟨G', c, 0⟩

Correctness:

G has hamiltonian cycle h → h uses only edges in E → cost 0 in G' → tour of cost 0
G' has tour h' of cost ≤ 0 → all edges have cost 0 → all edges are in E → h' is a hamiltonian cycle in G

34.5.5 The Subset-Sum Problem

Given a finite set S of positive integers and target t > 0, does a subset S' ⊆ S exist with Σ_{s∈S'} s = t?

SUBSET-SUM = {⟨S, t⟩ : ∃ S' ⊆ S such that Σ_{s∈S'} s = t}

Theorem 34.15: SUBSET-SUM is NP-complete.

SUBSET-SUM ∈ NP: Certificate = S'. Sum elements, check = t → polynomial time.

3-CNF-SAT ≤ₚ SUBSET-SUM:

Given φ with variables x₁,...,xₙ and clauses C₁,...,Cₖ, construct numbers in base 10 with n + k digits (n variable-labeled digits + k clause-labeled digits):

Target t: digit 1 for each variable position, digit 4 for each clause position
For each variable xᵢ: create two numbers vᵢ and v'ᵢ:
- Both have 1 in the xᵢ digit position, 0 in other variable positions
- vᵢ has 1 in Cⱼ digit if literal xᵢ appears in clause Cⱼ
- v'ᵢ has 1 in Cⱼ digit if literal ¬xᵢ appears in clause Cⱼ
For each clause Cⱼ: create two slack variables sⱼ (value 1 in Cⱼ digit) and s'ⱼ (value 2 in Cⱼ digit)

Key property: Maximum digit sum = 6 (three 1s from variables + 1 + 2 from slack) → no carries in base 10.

Correctness:

(⟹) Satisfying assignment → include vᵢ if xᵢ = 1, else v'ᵢ → variable digits sum to 1 ✓. Each clause has 1–3 true literals → clause digits sum to 1–3 → use slack variables to reach 4 ✓
(⟸) Subset summing to t → must include exactly one of vᵢ, v'ᵢ per variable (to get 1 in variable digits). Set xᵢ = 1 if vᵢ ∈ S'. Slack variables contribute ≤ 3 per clause digit, so at least one vᵢ/v'ᵢ must contribute 1 → corresponding literal is true → clause satisfied ✓

Size: S has 2n + 2k values, each with n + k digits → polynomial.

Summary of Reductions

CIRCUIT-SAT  (first NP-complete problem, proved from scratch)
     │
     ▼
    SAT  (formula satisfiability)
     │
     ▼
 3-CNF-SAT  (satisfiability in 3-conjunctive normal form)
   │     │
   ▼     ▼
CLIQUE  SUBSET-SUM
   │
   ▼
VERTEX-COVER
   │
   ▼
HAM-CYCLE
   │
   ▼
  TSP

To prove a new problem L is NP-complete:

Show L ∈ NP (exhibit a polynomial-time verifier with polynomial-size certificate)
Pick a known NP-complete problem L'
Show L' ≤ₚ L (polynomial-time reduction preserving yes/no answers)

Common pitfalls:

Reducing in the wrong direction (must reduce from known NP-complete to new problem)
Reduction must not depend on knowing the solution to the source problem
Reducing only special cases of the source problem doesn't suffice — must handle all instances

Previous chapter

Computational Geometry

Next chapter

Approximation Algorithms