Pagefy

Book

Introduction to Algorithms

1.The Role of Algorithms in Computing 2.Getting Started 3.Growth of Functions 4.Divide and Conquer 5.Probabilistic Analysis and Randomized Algorithms 6.Heapsort 7.Quicksort 8.Sorting in Linear Time 9.Medians and Order Statistics 10.Elementary Data Structures 11.Hash Tables 12.Binary Search Trees 13.Red Black Trees 14.Augmenting Data Structures 15.Dynamic Programming 16.Greedy Algorithms 17.Amortized Analysis 18.B Trees 19.Fibonacci Heaps 20.van Emde Boas Trees 21.Data Structures for Disjoint Sets 22.Elementary Graph Algorithms 23.Minimum Spanning Trees 24.Single Source Shortest Paths 25.All Pairs Shortest Paths 26.Maximum Flow 27.Multithreaded Algorithms 28.Matrix Operations 29.Linear Programming 30.Polynomials and the FFT 31.Number Theoretic Algorithms 32.String Matching 33.Computational Geometry 34.NP Completeness 35.Approximation Algorithms 36.Appendix A Summations 37.Appendix B Sets Etc 38.Appendix C Counting and Probability 39.Appendix D Matrices

Elementary Graph Algorithms

Introduction to Algorithms

Chapter 22: Elementary Graph Algorithms

This chapter presents methods for representing a graph and for searching a graph. Searching a graph means systematically following the edges to visit the vertices. A graph-searching algorithm can discover much about the structure of a graph. Techniques for searching a graph lie at the heart of the field of graph algorithms.

22.1 Representations of Graphs

Two standard ways to represent a graph G = (V, E):

Adjacency lists — preferred for sparse graphs (|E| ≪ |V|²)
Adjacency matrix — preferred for dense graphs (|E| ≈ |V|²) or when quick edge-existence queries are needed

Adjacency-List Representation

An array Adj of |V| linked lists, one per vertex
For each vertex u, Adj[u] contains all vertices v such that (u, v) ∈ E
In pseudocode: G.Adj[u]

Space complexity:

Directed graph: sum of all list lengths = |E| → memory is Θ(V + E)
Undirected graph: sum of all list lengths = 2|E| → memory is Θ(V + E)

Advantages:

Compact for sparse graphs
Easily adapted for weighted graphs (store weight w(u, v) alongside v in u's list)
Robust — can be modified to support many graph variants

Disadvantage:

No faster way to check if edge (u, v) exists than scanning Adj[u]

Adjacency-Matrix Representation

A |V| × |V| matrix A = (a_ij) where:
- a_ij = 1 if (i, j) ∈ E
- a_ij = 0 otherwise

Space complexity: Θ(V²) regardless of number of edges

Properties:

For undirected graphs, the matrix is symmetric: A = Aᵀ (can store only upper triangle to save ~half the space)
For weighted graphs, store w(u, v) as the entry; use NIL, 0, or ∞ for non-edges
Simpler than adjacency lists; requires only one bit per entry for unweighted graphs

Representing Attributes

Vertex attributes: denoted v.d, v.π, etc.
Edge attributes: denoted (u, v).f, etc.
In practice, attributes can be stored in parallel arrays (e.g., d[1..|V|]) or as instance variables in an object-oriented design

22.2 Breadth-First Search (BFS)

Breadth-first search is one of the simplest graph-searching algorithms and the archetype for many important graph algorithms (e.g., Prim's MST, Dijkstra's shortest paths).

Given graph G = (V, E) and a source vertex s, BFS:

Discovers every vertex reachable from s
Computes the shortest-path distance (fewest edges) from s to each reachable vertex
Produces a breadth-first tree rooted at s

BFS discovers all vertices at distance k before any vertex at distance k + 1.

Vertex Coloring

Color	Meaning
WHITE	Undiscovered
GRAY	Discovered but not fully explored (the frontier)
BLACK	Fully explored (all adjacent vertices discovered)

Vertex Attributes

u.color — current color
u.d — distance from source s
u.π — predecessor (parent) in the BFS tree

Algorithm

BFS(G, s)
 1  for each vertex u ∈ G.V - {s}
 2      u.color = WHITE
 3      u.d = ∞
 4      u.π = NIL
 5  s.color = GRAY
 6  s.d = 0
 7  s.π = NIL
 8  Q = ∅
 9  ENQUEUE(Q, s)
10  while Q ≠ ∅
11      u = DEQUEUE(Q)
12      for each v ∈ G.Adj[u]
13          if v.color == WHITE
14              v.color = GRAY
15              v.d = u.d + 1
16              v.π = u
17              ENQUEUE(Q, v)
18      u.color = BLACK

Analysis

Each vertex is enqueued and dequeued at most once → queue operations: O(V)
Each adjacency list is scanned at most once → total scan time: O(E)
Initialization: O(V)
Total running time: O(V + E)

Shortest Paths

Shortest-path distance δ(s, v) = minimum number of edges in any path from s to v (∞ if unreachable).

Key results:

Lemma 22.1: For any edge (u, v) ∈ E: δ(s, v) ≤ δ(s, u) + 1
Lemma 22.2: Upon termination, v.d ≥ δ(s, v) for all v ∈ V (upper bound property)
Lemma 22.3: At any point during BFS, if Q = ⟨v₁, v₂, ..., vᵣ⟩, then vᵣ.d ≤ v₁.d + 1 and vᵢ.d ≤ vᵢ₊₁.d (queue holds at most two distinct d values)
Corollary 22.4: Vertices are enqueued in monotonically non-decreasing order of d values
Theorem 22.5 (Correctness of BFS): BFS discovers every vertex reachable from s, and upon termination v.d = δ(s, v) for all v ∈ V. For any reachable v ≠ s, a shortest path from s to v is a shortest path from s to v.π followed by edge (v.π, v).

Breadth-First Trees

The predecessor subgraph G_π = (V_π, E_π) where:

V_π = {v ∈ V : v.π ≠ NIL} ∪ {s}
E_π = {(v.π, v) : v ∈ V_π - {s}}

Lemma 22.6: BFS constructs π so that G_π is a breadth-first tree — it contains a unique simple path from s to each reachable vertex, and that path is a shortest path.

Printing Shortest Paths

PRINT-PATH(G, s, v)
1  if v == s
2      print s
3  elseif v.π == NIL
4      print "no path from" s "to" v "exists"
5  else PRINT-PATH(G, s, v.π)
6      print v

Runs in time linear in the number of vertices on the path.

22.3 Depth-First Search (DFS)

Depth-first search explores edges out of the most recently discovered vertex that still has unexplored edges. When all edges of a vertex are explored, the search backtracks. If undiscovered vertices remain, DFS selects a new source and repeats. This continues until every vertex is discovered.

Key Differences from BFS

Feature	BFS	DFS
Data structure	Queue (FIFO)	Stack (implicit via recursion)
Source	Single source	Multiple sources possible
Predecessor subgraph	Single tree	Depth-first forest (multiple trees)

Timestamps

Each vertex v gets two timestamps (integers in [1, 2|V|]):

v.d — discovery time (when v is first discovered / grayed)
v.f — finishing time (when v's adjacency list is fully examined / blackened)

For every vertex u: u.d < u.f

Vertex u is:

WHITE before time u.d
GRAY between u.d and u.f
BLACK after u.f

Algorithm

DFS(G)
1  for each vertex u ∈ G.V
2      u.color = WHITE
3      u.π = NIL
4  time = 0
5  for each vertex u ∈ G.V
6      if u.color == WHITE
7          DFS-VISIT(G, u)

DFS-VISIT(G, u)
 1  time = time + 1           // white vertex u has just been discovered
 2  u.d = time
 3  u.color = GRAY
 4  for each v ∈ G.Adj[u]     // explore edge (u, v)
 5      if v.color == WHITE
 6          v.π = u
 7          DFS-VISIT(G, v)
 8  u.color = BLACK            // blacken u; it is finished
 9  time = time + 1
10  u.f = time

Analysis

Initialization (lines 1–3, 5–7 of DFS): Θ(V)
DFS-VISIT called exactly once per vertex; loop on lines 4–7 executes |Adj[v]| times per call
Total: Σ|Adj[v]| = Θ(E)
Total running time: Θ(V + E)

Properties of Depth-First Search

Parenthesis Theorem (Theorem 22.7)

For any two vertices u and v, exactly one of the following holds:

[u.d, u.f] and [v.d, v.f] are entirely disjoint — neither is a descendant of the other
[u.d, u.f] ⊂ [v.d, v.f] — u is a descendant of v
[v.d, v.f] ⊂ [u.d, u.f] — v is a descendant of u

Discovery/finishing times have proper parenthesis structure: parentheses are always properly nested.

Corollary 22.8 (Nesting of Descendants' Intervals)

Vertex v is a proper descendant of u in the DFS forest if and only if:

u.d < v.d < v.f < u.f

White-Path Theorem (Theorem 22.9)

Vertex v is a descendant of u if and only if at time u.d, there exists a path from u to v consisting entirely of white vertices.

Classification of Edges

DFS classifies every edge (u, v) into one of four types:

Edge Type	Definition	Color of v when (u,v) explored
Tree edge	Edge in the DFS forest; v was first discovered via this edge	WHITE
Back edge	Connects u to an ancestor v in a DFS tree (includes self-loops)	GRAY
Forward edge	Non-tree edge connecting u to a descendant v	BLACK (u.d < v.d)
Cross edge	All other edges (between trees or non-ancestor/descendant in same tree)	BLACK (u.d > v.d)

Key results:

Theorem 22.10: In a DFS of an undirected graph, every edge is either a tree edge or a back edge (no forward or cross edges)
Lemma 22.11: A directed graph G is acyclic if and only if a DFS of G yields no back edges

22.4 Topological Sort

A topological sort of a directed acyclic graph (DAG) G = (V, E) is a linear ordering of all its vertices such that if G contains edge (u, v), then u appears before v in the ordering.

Only possible if the graph has no cycles
Visualized as vertices along a horizontal line with all directed edges going left to right
Used to indicate precedences among events

Algorithm

TOPOLOGICAL-SORT(G)
1  call DFS(G) to compute finishing times v.f for each vertex v
2  as each vertex is finished, insert it onto the front of a linked list
3  return the linked list of vertices

Vertices appear in reverse order of finishing times (decreasing v.f).

Analysis

DFS: Θ(V + E)
Inserting each vertex onto front of linked list: O(1) per vertex → O(V) total
Total running time: Θ(V + E)

Correctness

Lemma 22.11: A directed graph G is acyclic if and only if a DFS of G yields no back edges.

Proof sketch:

(⇒) If DFS produces a back edge (u, v), then v is an ancestor of u, so there is a path v → u, and the back edge u → v completes a cycle — contradiction
(⇐) If G contains a cycle c, let v be the first vertex discovered in c. By the white-path theorem, all vertices in c become descendants of v, so the edge into v from c is a back edge

Theorem 22.12: TOPOLOGICAL-SORT produces a correct topological sort.

Proof sketch: For any edge (u, v) explored by DFS:

v cannot be gray (that would make it a back edge, contradicting acyclicity)
If v is white → v becomes a descendant of u → v.f < u.f
If v is black → v.f is already set, and u.f will be set later → v.f < u.f

In all cases, v.f < u.f, so u appears before v in the ordering (decreasing finish time).

22.5 Strongly Connected Components

A strongly connected component (SCC) of a directed graph G = (V, E) is a maximal set of vertices C ⊆ V such that for every pair u, v ∈ C: u ↝ v and v ↝ u (mutually reachable).

Key Concepts

Transpose graph Gᵀ = (V, Eᵀ) where Eᵀ = {(u, v) : (v, u) ∈ E} — all edges reversed
- G and Gᵀ have exactly the same strongly connected components
- Can be computed in O(V + E) time from adjacency lists
Component graph G^SCC = (V^SCC, E^SCC):
- One vertex per SCC
- Edge (i, j) ∈ E^SCC if there exists (x, y) ∈ E with x ∈ Cᵢ, y ∈ Cⱼ
- The component graph is always a DAG (Lemma 22.13)

Algorithm (Kosaraju's Algorithm)

STRONGLY-CONNECTED-COMPONENTS(G)
1  call DFS(G) to compute finishing times u.f for each vertex u
2  compute Gᵀ
3  call DFS(Gᵀ), but in the main loop of DFS, consider vertices
       in order of decreasing u.f (as computed in line 1)
4  output the vertices of each tree in the depth-first forest
       formed in line 3 as a separate strongly connected component

Analysis

First DFS: Θ(V + E)
Computing Gᵀ: O(V + E)
Second DFS: Θ(V + E)
Total running time: Θ(V + E)

Why It Works

Notation: For a set of vertices U ⊆ V:

d(U) = min{u.d : u ∈ U} (earliest discovery time)
f(U) = max{u.f : u ∈ U} (latest finishing time)

Lemma 22.13: If C and C' are distinct SCCs, and there is a path u ↝ u' with u ∈ C, u' ∈ C', then there cannot be a path from C' back to C. (This is why the component graph is a DAG.)

Lemma 22.14: If there is an edge (u, v) ∈ E with u ∈ C and v ∈ C' (distinct SCCs), then f(C) > f(C').

Corollary 22.15: If there is an edge (u, v) ∈ Eᵀ with u ∈ C and v ∈ C' (distinct SCCs), then f(C) < f(C').

Theorem 22.16 (Correctness): The algorithm correctly computes SCCs.

Intuition:

The first DFS computes finishing times
In the second DFS on Gᵀ, we start from the SCC with the largest finishing time
By Corollary 22.15, edges in Gᵀ leaving this SCC only go to SCCs with larger finishing times — but there are none, so the search stays within the SCC
We then move to the next unvisited SCC with the largest finishing time, and repeat
Each DFS tree in the second search corresponds to exactly one SCC

In essence, the second DFS visits the vertices of G^SCC in topologically sorted order.

Previous chapter

Data Structures for Disjoint Sets

Next chapter

Minimum Spanning Trees