Elementary Graph Algorithms
Introduction to AlgorithmsChapter 22: Elementary Graph Algorithms
This chapter presents methods for representing a graph and for searching a graph. Searching a graph means systematically following the edges to visit the vertices. A graph-searching algorithm can discover much about the structure of a graph. Techniques for searching a graph lie at the heart of the field of graph algorithms.
22.1 Representations of Graphs
Two standard ways to represent a graph G = (V, E):
- Adjacency lists — preferred for sparse graphs (|E| ≪ |V|²)
- Adjacency matrix — preferred for dense graphs (|E| ≈ |V|²) or when quick edge-existence queries are needed
Adjacency-List Representation
- An array
Adjof |V| linked lists, one per vertex - For each vertex u,
Adj[u]contains all vertices v such that (u, v) ∈ E - In pseudocode:
G.Adj[u]
Space complexity:
- Directed graph: sum of all list lengths = |E| → memory is Θ(V + E)
- Undirected graph: sum of all list lengths = 2|E| → memory is Θ(V + E)
Advantages:
- Compact for sparse graphs
- Easily adapted for weighted graphs (store weight w(u, v) alongside v in u's list)
- Robust — can be modified to support many graph variants
Disadvantage:
- No faster way to check if edge (u, v) exists than scanning
Adj[u]
Adjacency-Matrix Representation
- A |V| × |V| matrix A = (a_ij) where:
- a_ij = 1 if (i, j) ∈ E
- a_ij = 0 otherwise
Space complexity: Θ(V²) regardless of number of edges
Properties:
- For undirected graphs, the matrix is symmetric: A = Aᵀ (can store only upper triangle to save ~half the space)
- For weighted graphs, store w(u, v) as the entry; use NIL, 0, or ∞ for non-edges
- Simpler than adjacency lists; requires only one bit per entry for unweighted graphs
Representing Attributes
- Vertex attributes: denoted v.d, v.π, etc.
- Edge attributes: denoted (u, v).f, etc.
- In practice, attributes can be stored in parallel arrays (e.g., d[1..|V|]) or as instance variables in an object-oriented design
22.2 Breadth-First Search (BFS)
Breadth-first search is one of the simplest graph-searching algorithms and the archetype for many important graph algorithms (e.g., Prim's MST, Dijkstra's shortest paths).
Given graph G = (V, E) and a source vertex s, BFS:
- Discovers every vertex reachable from s
- Computes the shortest-path distance (fewest edges) from s to each reachable vertex
- Produces a breadth-first tree rooted at s
BFS discovers all vertices at distance k before any vertex at distance k + 1.
Vertex Coloring
| Color | Meaning |
|---|---|
| WHITE | Undiscovered |
| GRAY | Discovered but not fully explored (the frontier) |
| BLACK | Fully explored (all adjacent vertices discovered) |
Vertex Attributes
- u.color — current color
- u.d — distance from source s
- u.π — predecessor (parent) in the BFS tree
Algorithm
BFS(G, s)
1 for each vertex u ∈ G.V - {s}
2 u.color = WHITE
3 u.d = ∞
4 u.π = NIL
5 s.color = GRAY
6 s.d = 0
7 s.π = NIL
8 Q = ∅
9 ENQUEUE(Q, s)
10 while Q ≠ ∅
11 u = DEQUEUE(Q)
12 for each v ∈ G.Adj[u]
13 if v.color == WHITE
14 v.color = GRAY
15 v.d = u.d + 1
16 v.π = u
17 ENQUEUE(Q, v)
18 u.color = BLACK
Analysis
- Each vertex is enqueued and dequeued at most once → queue operations: O(V)
- Each adjacency list is scanned at most once → total scan time: O(E)
- Initialization: O(V)
- Total running time: O(V + E)
Shortest Paths
Shortest-path distance δ(s, v) = minimum number of edges in any path from s to v (∞ if unreachable).
Key results:
- Lemma 22.1: For any edge (u, v) ∈ E: δ(s, v) ≤ δ(s, u) + 1
- Lemma 22.2: Upon termination, v.d ≥ δ(s, v) for all v ∈ V (upper bound property)
- Lemma 22.3: At any point during BFS, if Q = ⟨v₁, v₂, ..., vᵣ⟩, then vᵣ.d ≤ v₁.d + 1 and vᵢ.d ≤ vᵢ₊₁.d (queue holds at most two distinct d values)
- Corollary 22.4: Vertices are enqueued in monotonically non-decreasing order of d values
- Theorem 22.5 (Correctness of BFS): BFS discovers every vertex reachable from s, and upon termination v.d = δ(s, v) for all v ∈ V. For any reachable v ≠ s, a shortest path from s to v is a shortest path from s to v.π followed by edge (v.π, v).
Breadth-First Trees
The predecessor subgraph G_π = (V_π, E_π) where:
- V_π = {v ∈ V : v.π ≠ NIL} ∪ {s}
- E_π = {(v.π, v) : v ∈ V_π - {s}}
Lemma 22.6: BFS constructs π so that G_π is a breadth-first tree — it contains a unique simple path from s to each reachable vertex, and that path is a shortest path.
Printing Shortest Paths
PRINT-PATH(G, s, v)
1 if v == s
2 print s
3 elseif v.π == NIL
4 print "no path from" s "to" v "exists"
5 else PRINT-PATH(G, s, v.π)
6 print v
Runs in time linear in the number of vertices on the path.
22.3 Depth-First Search (DFS)
Depth-first search explores edges out of the most recently discovered vertex that still has unexplored edges. When all edges of a vertex are explored, the search backtracks. If undiscovered vertices remain, DFS selects a new source and repeats. This continues until every vertex is discovered.
Key Differences from BFS
| Feature | BFS | DFS |
|---|---|---|
| Data structure | Queue (FIFO) | Stack (implicit via recursion) |
| Source | Single source | Multiple sources possible |
| Predecessor subgraph | Single tree | Depth-first forest (multiple trees) |
Timestamps
Each vertex v gets two timestamps (integers in [1, 2|V|]):
- v.d — discovery time (when v is first discovered / grayed)
- v.f — finishing time (when v's adjacency list is fully examined / blackened)
For every vertex u: u.d < u.f
Vertex u is:
- WHITE before time u.d
- GRAY between u.d and u.f
- BLACK after u.f
Algorithm
DFS(G)
1 for each vertex u ∈ G.V
2 u.color = WHITE
3 u.π = NIL
4 time = 0
5 for each vertex u ∈ G.V
6 if u.color == WHITE
7 DFS-VISIT(G, u)
DFS-VISIT(G, u)
1 time = time + 1 // white vertex u has just been discovered
2 u.d = time
3 u.color = GRAY
4 for each v ∈ G.Adj[u] // explore edge (u, v)
5 if v.color == WHITE
6 v.π = u
7 DFS-VISIT(G, v)
8 u.color = BLACK // blacken u; it is finished
9 time = time + 1
10 u.f = time
Analysis
- Initialization (lines 1–3, 5–7 of DFS): Θ(V)
- DFS-VISIT called exactly once per vertex; loop on lines 4–7 executes |Adj[v]| times per call
- Total: Σ|Adj[v]| = Θ(E)
- Total running time: Θ(V + E)
Properties of Depth-First Search
Parenthesis Theorem (Theorem 22.7)
For any two vertices u and v, exactly one of the following holds:
- [u.d, u.f] and [v.d, v.f] are entirely disjoint — neither is a descendant of the other
- [u.d, u.f] ⊂ [v.d, v.f] — u is a descendant of v
- [v.d, v.f] ⊂ [u.d, u.f] — v is a descendant of u
Discovery/finishing times have proper parenthesis structure: parentheses are always properly nested.
Corollary 22.8 (Nesting of Descendants' Intervals)
Vertex v is a proper descendant of u in the DFS forest if and only if:
u.d < v.d < v.f < u.f
White-Path Theorem (Theorem 22.9)
Vertex v is a descendant of u if and only if at time u.d, there exists a path from u to v consisting entirely of white vertices.
Classification of Edges
DFS classifies every edge (u, v) into one of four types:
| Edge Type | Definition | Color of v when (u,v) explored |
|---|---|---|
| Tree edge | Edge in the DFS forest; v was first discovered via this edge | WHITE |
| Back edge | Connects u to an ancestor v in a DFS tree (includes self-loops) | GRAY |
| Forward edge | Non-tree edge connecting u to a descendant v | BLACK (u.d < v.d) |
| Cross edge | All other edges (between trees or non-ancestor/descendant in same tree) | BLACK (u.d > v.d) |
Key results:
- Theorem 22.10: In a DFS of an undirected graph, every edge is either a tree edge or a back edge (no forward or cross edges)
- Lemma 22.11: A directed graph G is acyclic if and only if a DFS of G yields no back edges
22.4 Topological Sort
A topological sort of a directed acyclic graph (DAG) G = (V, E) is a linear ordering of all its vertices such that if G contains edge (u, v), then u appears before v in the ordering.
- Only possible if the graph has no cycles
- Visualized as vertices along a horizontal line with all directed edges going left to right
- Used to indicate precedences among events
Algorithm
TOPOLOGICAL-SORT(G)
1 call DFS(G) to compute finishing times v.f for each vertex v
2 as each vertex is finished, insert it onto the front of a linked list
3 return the linked list of vertices
Vertices appear in reverse order of finishing times (decreasing v.f).
Analysis
- DFS: Θ(V + E)
- Inserting each vertex onto front of linked list: O(1) per vertex → O(V) total
- Total running time: Θ(V + E)
Correctness
Lemma 22.11: A directed graph G is acyclic if and only if a DFS of G yields no back edges.
Proof sketch:
- (⇒) If DFS produces a back edge (u, v), then v is an ancestor of u, so there is a path v → u, and the back edge u → v completes a cycle — contradiction
- (⇐) If G contains a cycle c, let v be the first vertex discovered in c. By the white-path theorem, all vertices in c become descendants of v, so the edge into v from c is a back edge
Theorem 22.12: TOPOLOGICAL-SORT produces a correct topological sort.
Proof sketch: For any edge (u, v) explored by DFS:
- v cannot be gray (that would make it a back edge, contradicting acyclicity)
- If v is white → v becomes a descendant of u → v.f < u.f
- If v is black → v.f is already set, and u.f will be set later → v.f < u.f
In all cases, v.f < u.f, so u appears before v in the ordering (decreasing finish time).
22.5 Strongly Connected Components
A strongly connected component (SCC) of a directed graph G = (V, E) is a maximal set of vertices C ⊆ V such that for every pair u, v ∈ C: u ↝ v and v ↝ u (mutually reachable).
Key Concepts
-
Transpose graph Gᵀ = (V, Eᵀ) where Eᵀ = {(u, v) : (v, u) ∈ E} — all edges reversed
- G and Gᵀ have exactly the same strongly connected components
- Can be computed in O(V + E) time from adjacency lists
-
Component graph G^SCC = (V^SCC, E^SCC):
- One vertex per SCC
- Edge (i, j) ∈ E^SCC if there exists (x, y) ∈ E with x ∈ Cᵢ, y ∈ Cⱼ
- The component graph is always a DAG (Lemma 22.13)
Algorithm (Kosaraju's Algorithm)
STRONGLY-CONNECTED-COMPONENTS(G)
1 call DFS(G) to compute finishing times u.f for each vertex u
2 compute Gᵀ
3 call DFS(Gᵀ), but in the main loop of DFS, consider vertices
in order of decreasing u.f (as computed in line 1)
4 output the vertices of each tree in the depth-first forest
formed in line 3 as a separate strongly connected component
Analysis
- First DFS: Θ(V + E)
- Computing Gᵀ: O(V + E)
- Second DFS: Θ(V + E)
- Total running time: Θ(V + E)
Why It Works
Notation: For a set of vertices U ⊆ V:
- d(U) = min{u.d : u ∈ U} (earliest discovery time)
- f(U) = max{u.f : u ∈ U} (latest finishing time)
Lemma 22.13: If C and C' are distinct SCCs, and there is a path u ↝ u' with u ∈ C, u' ∈ C', then there cannot be a path from C' back to C. (This is why the component graph is a DAG.)
Lemma 22.14: If there is an edge (u, v) ∈ E with u ∈ C and v ∈ C' (distinct SCCs), then f(C) > f(C').
Corollary 22.15: If there is an edge (u, v) ∈ Eᵀ with u ∈ C and v ∈ C' (distinct SCCs), then f(C) < f(C').
Theorem 22.16 (Correctness): The algorithm correctly computes SCCs.
Intuition:
- The first DFS computes finishing times
- In the second DFS on Gᵀ, we start from the SCC with the largest finishing time
- By Corollary 22.15, edges in Gᵀ leaving this SCC only go to SCCs with larger finishing times — but there are none, so the search stays within the SCC
- We then move to the next unvisited SCC with the largest finishing time, and repeat
- Each DFS tree in the second search corresponds to exactly one SCC
In essence, the second DFS visits the vertices of G^SCC in topologically sorted order.
Previous chapter
Data Structures for Disjoint SetsNext chapter
Minimum Spanning Trees