Pagefy

Pagefy

Back to Data Structures and Algorithms

NP Completeness

Introduction to Algorithms

Chapter 34: NP-Completeness

Almost all algorithms studied so far have been polynomial-time algorithms — on inputs of size n, their worst-case running time is O(nᵏ) for some constant k. However, not all problems can be solved in polynomial time:

  • Turing's Halting Problem cannot be solved by any computer, regardless of time allowed
  • Some problems can be solved, but not in time O(nᵏ) for any constant k

NP-complete problems occupy an interesting middle ground: no polynomial-time algorithm has been found for any of them, yet no one has proven that such algorithms cannot exist. This is the famous P ≠ NP question, open since 1971.

Contrasting pairs (one polynomial, one NP-complete):

Polynomial-TimeNP-Complete
Shortest simple path (O(VE))Longest simple path
Euler tour (O(E))Hamiltonian cycle
2-CNF satisfiability3-CNF satisfiability

Three key complexity classes:

  • P — problems solvable in polynomial time
  • NP — problems verifiable in polynomial time (given a certificate/solution, we can check it quickly)
  • NPC (NP-complete) — problems in NP that are as "hard" as any problem in NP

Key insight: P ⊆ NP. If any NP-complete problem can be solved in polynomial time, then P = NP (every problem in NP becomes polynomial-time solvable).

Practical significance: Proving a problem NP-complete provides strong evidence of intractability → pursue approximation algorithms or tractable special cases instead.

Three key concepts for NP-completeness proofs:

  1. Decision problems vs. optimization problems — NP-completeness applies to decision problems (yes/no answers), but optimization problems can be recast as decision problems by imposing a bound
  2. Reductions — transform instances of problem A into instances of problem B in polynomial time, preserving yes/no answers
  3. A first NP-complete problem — CIRCUIT-SAT serves as the foundation; all other NP-completeness proofs reduce from it (or from problems already proven NP-complete)

34.1 Polynomial Time

We regard polynomial-time solvable problems as tractable for three reasons:

  1. Practical running times — few real problems require Θ(n¹⁰⁰); once a first polynomial algorithm is found, more efficient ones typically follow
  2. Model independence — the class of polynomial-time solvable problems is the same across serial RAM, Turing machines, and polynomially-bounded parallel computers
  3. Closure properties — polynomials are closed under addition, multiplication, and composition, so composing polynomial-time algorithms yields polynomial-time algorithms

Abstract Problems

An abstract problem Q is a binary relation on a set I of problem instances and a set S of problem solutions.

  • Example: SHORTEST-PATH maps (graph, u, v) → shortest path sequence
  • Abstract decision problem: a function mapping instance set I to {0, 1}
  • Example: PATH(⟨G, u, v, k⟩) = 1 if a shortest path from u to v has ≤ k edges

Optimization problems can be recast as decision problems: if the decision version is hard, the optimization version is at least as hard.

Encodings

An encoding maps abstract objects to binary strings. A concrete problem has binary strings as its instance set.

  • A concrete problem is polynomial-time solvable if an algorithm solves it in O(nᵏ) time
  • P = the set of concrete decision problems that are polynomial-time solvable

Two encodings e₁ and e₂ are polynomially related if polynomial-time computable functions can convert between them.

Lemma 34.1: If e₁ and e₂ are polynomially related encodings of abstract problem Q, then e₁(Q) ∈ P if and only if e₂(Q) ∈ P.

  • Binary vs. base-3 representation: polynomially related → no effect on tractability
  • Binary vs. unary representation: not polynomially related → can change complexity (e.g., Θ(k) is polynomial in unary input length but exponential in binary input length)

Convention: Use "standard" encoding (polynomially related to binary) and denote it with angle brackets ⟨G⟩.

A Formal-Language Framework

  • Alphabet Σ: a finite set of symbols
  • Language L over Σ: any set of strings from Σ*
  • Operations: union, intersection, complement (L̄ = Σ* − L), concatenation, Kleene star (L* = {ε} ∪ L ∪ L² ∪ ...)

A decision problem Q corresponds to the language L = {x ∈ Σ* : Q(x) = 1}.

Key definitions:

  • Algorithm A accepts string x if A(x) = 1
  • Algorithm A rejects string x if A(x) = 0
  • Language L is accepted by A: L = {x ∈ {0,1}* : A(x) = 1}
  • Language L is decided by A: A correctly accepts every x ∈ L and rejects every x ∉ L
  • Accepting only requires output on yes-instances; deciding requires correct output on all inputs

Formal definition of P:

P = {L ⊆ {0,1}* : there exists an algorithm A that decides L in polynomial time}

Theorem 34.2: P equals the class of languages accepted by polynomial-time algorithms.

Proof sketch: If A accepts L in O(nᵏ) time, simulate A for cnᵏ steps. If A accepted, output 1; otherwise output 0. This decides L in polynomial time.


34.2 Polynomial-Time Verification

Hamiltonian Cycles

A hamiltonian cycle of an undirected graph G = (V, E) is a simple cycle containing each vertex in V exactly once.

  • HAM-CYCLE = {⟨G⟩ : G is a hamiltonian graph}
  • Brute-force: check all m! permutations → Ω(2^√n) time → not polynomial
  • HAM-CYCLE is NP-complete (proven in Section 34.5)

Verification Algorithms

A verification algorithm A takes two arguments:

  • Input string x (the problem instance)
  • Certificate y (a proposed proof/witness)

A verifies input x if there exists a certificate y such that A(x, y) = 1.

The language verified by A:

L = {x ∈ {0,1}* : there exists y ∈ {0,1}* such that A(x, y) = 1}

For HAM-CYCLE: the certificate is the sequence of vertices forming the cycle. Verification checks that it's a valid permutation of V with all consecutive edges present → O(n²) time.

The Complexity Class NP

NP is the class of languages verifiable by a polynomial-time algorithm:

L ∈ NP if and only if there exist a two-input polynomial-time algorithm A and constant c such that: L = {x ∈ {0,1}* : ∃ certificate y with |y| = O(|x|ᶜ) such that A(x, y) = 1}

Key relationships:

  • P ⊆ NP — if we can solve in polynomial time, we can certainly verify (just ignore the certificate)
  • P = NP? — the central open question; most researchers believe P ≠ NP
  • co-NP = {L : L̄ ∈ NP}; unknown whether NP = co-NP
  • P ⊆ NP ∩ co-NP (since P is closed under complement)

Four possible scenarios for P, NP, co-NP:

  1. P = NP = co-NP (most unlikely)
  2. NP = co-NP but P ≠ NP
  3. P = NP ∩ co-NP but NP ≠ co-NP
  4. NP ≠ co-NP and P ≠ NP ∩ co-NP (most likely)

34.3 NP-Completeness and Reducibility

Reducibility

A language L₁ is polynomial-time reducible to L₂, written L₁ ≤ₚ L₂, if there exists a polynomial-time computable function f : {0,1}* → {0,1}* such that for all x:

x ∈ L₁ if and only if f(x) ∈ L₂

  • f is the reduction function
  • The polynomial-time algorithm computing f is the reduction algorithm

How to use a reduction to solve problem A via problem B:

1. Given instance α of A, compute β = f(α) in polynomial time
2. Run polynomial-time algorithm for B on β
3. Use the answer for β as the answer for α

Lemma 34.3: If L₁ ≤ₚ L₂ and L₂ ∈ P, then L₁ ∈ P.

Contrapositive use: If L₁ has no polynomial-time algorithm and L₁ ≤ₚ L₂, then L₂ has no polynomial-time algorithm either.

NP-Completeness Definition

A language L ⊆ {0,1}* is NP-complete if:

  1. L ∈ NP
  2. L' ≤ₚ L for every L' ∈ NP

If L satisfies only property 2 (but not necessarily 1), L is NP-hard.

NPC = the class of NP-complete languages.

Theorem 34.4: If any NP-complete problem is polynomial-time solvable, then P = NP. Equivalently, if any problem in NP is not polynomial-time solvable, then no NP-complete problem is polynomial-time solvable.

Circuit Satisfiability (CIRCUIT-SAT)

Boolean combinational circuits are built from:

  • NOT gate (inverter): output = ¬x
  • AND gate: output = x ∧ y (1 only if all inputs are 1)
  • OR gate: output = x ∨ y (1 if any input is 1)

Gates are interconnected by wires (no cycles — the circuit is a DAG). A circuit has inputs, internal wires, and a single output.

A truth assignment is a set of boolean input values. A circuit is satisfiable if some truth assignment causes output = 1.

CIRCUIT-SAT = {⟨C⟩ : C is a satisfiable boolean combinational circuit}

Lemma 34.5: CIRCUIT-SAT ∈ NP.

  • Certificate: assignment of values to all wires
  • Verification: check each gate computes correctly, check output = 1 → linear time

Lemma 34.6: CIRCUIT-SAT is NP-hard.

  • Proof sketch: For any L ∈ NP with verifier A running in O(nᵏ) time:
    • Model computation of A as a sequence of configurations c₀, c₁, ..., c_{T(n)}
    • Hardware circuit M maps each configuration to the next
    • Paste T(n) copies of M together
    • Wire known values (program A, input x, initial state) directly
    • Leave certificate y as the only free inputs
    • Output bit = A's output
    • Circuit is satisfiable ⟺ ∃ certificate y such that A(x, y) = 1

Theorem 34.7: The circuit-satisfiability problem is NP-complete.


34.4 NP-Completeness Proofs

Methodology

Lemma 34.8: If L' ≤ₚ L for some L' ∈ NPC, then L is NP-hard. If additionally L ∈ NP, then L ∈ NPC.

Recipe for proving L is NP-complete:

  1. Prove L ∈ NP
  2. Select a known NP-complete language L'
  3. Describe a polynomial-time algorithm computing reduction function f
  4. Prove: x ∈ L' ⟺ f(x) ∈ L
  5. Prove the reduction runs in polynomial time

Formula Satisfiability (SAT)

An instance of SAT is a boolean formula φ with:

  • n boolean variables: x₁, x₂, ..., xₙ
  • m boolean connectives: ∧, ∨, ¬, →, ↔
  • Parentheses

SAT = {⟨φ⟩ : φ is a satisfiable boolean formula}

Theorem 34.9: SAT is NP-complete.

Proof outline:

  • SAT ∈ NP: Certificate = satisfying assignment. Replace variables with values, evaluate → polynomial time.
  • CIRCUIT-SAT ≤ₚ SAT: For each wire xᵢ in circuit C, introduce a variable xᵢ. Express each gate as a clause (e.g., x₁₀ ↔ (x₇ ∧ x₈ ∧ x₉)). The formula φ is the AND of the output variable with all gate clauses.
    • C satisfiable ⟺ φ satisfiable
    • Reduction runs in polynomial time (formula size is linear in circuit size)

Note: A naive inductive conversion of circuit to formula can cause exponential blowup due to shared subformulas (fan-out ≥ 2). The clause-per-gate approach avoids this.

3-CNF Satisfiability (3-CNF-SAT)

Definitions:

  • A literal is a variable or its negation
  • Conjunctive Normal Form (CNF): AND of clauses, each clause is OR of literals
  • 3-CNF: each clause has exactly 3 distinct literals

Example: (x₁ ∨ ¬x₁ ∨ ¬x₂) ∧ (x₃ ∨ x₂ ∨ x₄) ∧ (¬x₁ ∨ ¬x₃ ∨ ¬x₄)

Theorem 34.10: 3-CNF-SAT is NP-complete.

Proof (SAT ≤ₚ 3-CNF-SAT) — three steps:

Step 1: Parse tree → clauses

  • Build a binary parse tree for φ
  • Introduce variable yᵢ for each internal node's output
  • Rewrite as: φ' = y₁ ∧ (y₁ ↔ (y₂ ∧ ¬x₂)) ∧ (y₂ ↔ (y₃ ∨ y₄)) ∧ ...
  • Each clause φ'ᵢ has at most 3 variables

Step 2: Convert each clause to CNF

  • Build truth table for each clause φ'ᵢ
  • From rows evaluating to 0, build DNF for ¬φ'ᵢ
  • Negate and apply DeMorgan's laws to get CNF:
    • ¬(a ∧ b) = ¬a ∨ ¬b
    • ¬(a ∨ b) = ¬a ∧ ¬b
  • Each clause has at most 3 literals (since each φ'ᵢ has ≤ 3 variables → ≤ 8 truth table rows)

Step 3: Pad clauses to exactly 3 literals

  • Using auxiliary variables p and q:
    • 3 literals → keep as is
    • 2 literals (l₁ ∨ l₂) → (l₁ ∨ l₂ ∨ p) ∧ (l₁ ∨ l₂ ∨ ¬p)
    • 1 literal (l) → (l ∨ p ∨ q) ∧ (l ∨ p ∨ ¬q) ∧ (l ∨ ¬p ∨ q) ∧ (l ∨ ¬p ∨ ¬q)

Polynomial time: Each step introduces at most a constant factor blowup per connective/clause.


34.5 NP-Complete Problems

Reduction chain:

CIRCUIT-SAT → SAT → 3-CNF-SAT → CLIQUE → VERTEX-COVER → HAM-CYCLE → TSP
                                                              ↘
                                  3-CNF-SAT ──────────────→ SUBSET-SUM

34.5.1 The Clique Problem

A clique in undirected graph G = (V, E) is a subset V' ⊆ V where every pair of vertices in V' is connected by an edge (a complete subgraph). The size of a clique is |V'|.

CLIQUE = {⟨G, k⟩ : G contains a clique of size k}

Theorem 34.11: CLIQUE is NP-complete.

CLIQUE ∈ NP: Certificate = vertex set V'. Verify all pairs have edges → polynomial time.

3-CNF-SAT ≤ₚ CLIQUE:

Given φ = C₁ ∧ C₂ ∧ ... ∧ Cₖ (each Cᵣ has 3 literals l₁ʳ, l₂ʳ, l₃ʳ):

  1. Construct graph G = (V, E):

    • For each clause Cᵣ, create a triple of vertices v₁ʳ, v₂ʳ, v₃ʳ
    • Add edge (vᵢʳ, vⱼˢ) if:
      • r ≠ s (different triples), AND
      • lᵢʳ and lⱼˢ are consistent (not negations of each other)
  2. Claim: φ is satisfiable ⟺ G has a clique of size k

    • (⟹) Satisfying assignment → pick one true literal per clause → k vertices form a clique (all from different triples, all consistent → all connected)
    • (⟸) Clique of size k → exactly one vertex per triple → set corresponding literals to 1 (no contradictions since edges only connect consistent literals) → all clauses satisfied

34.5.2 The Vertex-Cover Problem

A vertex cover of G = (V, E) is a subset V' ⊆ V such that every edge (u, v) ∈ E has at least one endpoint in V'.

VERTEX-COVER = {⟨G, k⟩ : G has a vertex cover of size k}

Theorem 34.12: VERTEX-COVER is NP-complete.

VERTEX-COVER ∈ NP: Certificate = V'. Check |V'| = k and every edge is covered → polynomial time.

CLIQUE ≤ₚ VERTEX-COVER:

Uses the complement graph Ḡ = (V, Ē) where Ē = {(u,v) : u ≠ v and (u,v) ∉ E}.

  • Reduction: ⟨G, k⟩ → ⟨Ḡ, |V| − k⟩

  • Claim: G has a clique of size k ⟺ Ḡ has a vertex cover of size |V| − k

    • (⟹) Clique V' of size k in G → V − V' is a vertex cover in Ḡ. For any edge (u,v) ∈ Ē: (u,v) ∉ E, so at least one of u, v is not in V' → at least one is in V − V'.
    • (⟸) Vertex cover V' of size |V| − k in Ḡ → V − V' is a clique of size k in G. If u, v ∉ V', then (u,v) ∉ Ē (since V' covers all edges of Ḡ), so (u,v) ∈ E.

34.5.3 The Hamiltonian-Cycle Problem

Theorem 34.13: HAM-CYCLE is NP-complete.

HAM-CYCLE ∈ NP: Certificate = sequence of |V| vertices. Verify it's a permutation of V with all consecutive edges (and last-to-first) present → polynomial time.

VERTEX-COVER ≤ₚ HAM-CYCLE:

Given undirected G = (V, E) and integer k, construct G' = (V', E') with a hamiltonian cycle iff G has a vertex cover of size k.

Construction uses a widget Wᵤᵥ for each edge (u, v) ∈ E:

  • Each widget has 12 vertices: [u,v,1]...[u,v,6] and [v,u,1]...[v,u,6]
  • 14 internal edges
  • Only vertices [u,v,1], [u,v,6], [v,u,1], [v,u,6] connect to the rest of G'

Widget property: A hamiltonian cycle can traverse the widget in exactly 3 ways:

  1. Enter [u,v,1], visit all 12 vertices, exit [u,v,6]
  2. Two separate passes: [u,v,1]→[u,v,6] and [v,u,1]→[v,u,6] (6 vertices each)
  3. Enter [v,u,1], visit all 12 vertices, exit [v,u,6]

Additional structure:

  • k selector vertices s₁, ..., sₖ
  • For each vertex u ∈ V, chain its widgets together into a path
  • Connect first/last widget vertices of each chain to all selector vertices

Correctness:

  • Vertex cover of size k → hamiltonian cycle: each selector vertex sⱼ starts a path through all widgets of cover vertex uⱼ
  • Hamiltonian cycle → vertex cover of size k: partition cycle into "cover paths" starting at selector vertices; each cover path corresponds to a cover vertex

Size of G': |V'| = 12|E| + k, |E'| = 16|E| + (2k−1)|V| → polynomial.

34.5.4 The Traveling-Salesman Problem (TSP)

A tour is a hamiltonian cycle in a complete graph with edge costs. The goal is to minimize total cost.

TSP = {⟨G, c, k⟩ : complete graph G has a tour with cost ≤ k}

Theorem 34.14: TSP is NP-complete.

TSP ∈ NP: Certificate = vertex sequence. Verify it visits each vertex once, sum costs, check ≤ k → polynomial time.

HAM-CYCLE ≤ₚ TSP:

Given G = (V, E):

  1. Create complete graph G' = (V, E') with cost function:
    • c(i, j) = 0 if (i, j) ∈ E
    • c(i, j) = 1 if (i, j) ∉ E
  2. Output ⟨G', c, 0⟩

Correctness:

  • G has hamiltonian cycle h → h uses only edges in E → cost 0 in G' → tour of cost 0
  • G' has tour h' of cost ≤ 0 → all edges have cost 0 → all edges are in E → h' is a hamiltonian cycle in G

34.5.5 The Subset-Sum Problem

Given a finite set S of positive integers and target t > 0, does a subset S' ⊆ S exist with Σ_{s∈S'} s = t?

SUBSET-SUM = {⟨S, t⟩ : ∃ S' ⊆ S such that Σ_{s∈S'} s = t}

Theorem 34.15: SUBSET-SUM is NP-complete.

SUBSET-SUM ∈ NP: Certificate = S'. Sum elements, check = t → polynomial time.

3-CNF-SAT ≤ₚ SUBSET-SUM:

Given φ with variables x₁,...,xₙ and clauses C₁,...,Cₖ, construct numbers in base 10 with n + k digits (n variable-labeled digits + k clause-labeled digits):

  1. Target t: digit 1 for each variable position, digit 4 for each clause position

  2. For each variable xᵢ: create two numbers vᵢ and v'ᵢ:

    • Both have 1 in the xᵢ digit position, 0 in other variable positions
    • vᵢ has 1 in Cⱼ digit if literal xᵢ appears in clause Cⱼ
    • v'ᵢ has 1 in Cⱼ digit if literal ¬xᵢ appears in clause Cⱼ
  3. For each clause Cⱼ: create two slack variables sⱼ (value 1 in Cⱼ digit) and s'ⱼ (value 2 in Cⱼ digit)

Key property: Maximum digit sum = 6 (three 1s from variables + 1 + 2 from slack) → no carries in base 10.

Correctness:

  • (⟹) Satisfying assignment → include vᵢ if xᵢ = 1, else v'ᵢ → variable digits sum to 1 ✓. Each clause has 1–3 true literals → clause digits sum to 1–3 → use slack variables to reach 4 ✓
  • (⟸) Subset summing to t → must include exactly one of vᵢ, v'ᵢ per variable (to get 1 in variable digits). Set xᵢ = 1 if vᵢ ∈ S'. Slack variables contribute ≤ 3 per clause digit, so at least one vᵢ/v'ᵢ must contribute 1 → corresponding literal is true → clause satisfied ✓

Size: S has 2n + 2k values, each with n + k digits → polynomial.


Summary of Reductions

CIRCUIT-SAT  (first NP-complete problem, proved from scratch)
     │
     ▼
    SAT  (formula satisfiability)
     │
     ▼
 3-CNF-SAT  (satisfiability in 3-conjunctive normal form)
   │     │
   ▼     ▼
CLIQUE  SUBSET-SUM
   │
   ▼
VERTEX-COVER
   │
   ▼
HAM-CYCLE
   │
   ▼
  TSP

To prove a new problem L is NP-complete:

  1. Show L ∈ NP (exhibit a polynomial-time verifier with polynomial-size certificate)
  2. Pick a known NP-complete problem L'
  3. Show L' ≤ₚ L (polynomial-time reduction preserving yes/no answers)

Common pitfalls:

  • Reducing in the wrong direction (must reduce from known NP-complete to new problem)
  • Reduction must not depend on knowing the solution to the source problem
  • Reducing only special cases of the source problem doesn't suffice — must handle all instances