Pagefy

Pagefy

Back to Data Structures and Algorithms

Elementary Graph Algorithms

Introduction to Algorithms

Chapter 22: Elementary Graph Algorithms

This chapter presents methods for representing a graph and for searching a graph. Searching a graph means systematically following the edges to visit the vertices. A graph-searching algorithm can discover much about the structure of a graph. Techniques for searching a graph lie at the heart of the field of graph algorithms.


22.1 Representations of Graphs

Two standard ways to represent a graph G = (V, E):

  1. Adjacency lists — preferred for sparse graphs (|E| ≪ |V|²)
  2. Adjacency matrix — preferred for dense graphs (|E| ≈ |V|²) or when quick edge-existence queries are needed

Adjacency-List Representation

  • An array Adj of |V| linked lists, one per vertex
  • For each vertex u, Adj[u] contains all vertices v such that (u, v) ∈ E
  • In pseudocode: G.Adj[u]

Space complexity:

  • Directed graph: sum of all list lengths = |E| → memory is Θ(V + E)
  • Undirected graph: sum of all list lengths = 2|E| → memory is Θ(V + E)

Advantages:

  • Compact for sparse graphs
  • Easily adapted for weighted graphs (store weight w(u, v) alongside v in u's list)
  • Robust — can be modified to support many graph variants

Disadvantage:

  • No faster way to check if edge (u, v) exists than scanning Adj[u]

Adjacency-Matrix Representation

  • A |V| × |V| matrix A = (a_ij) where:
    • a_ij = 1 if (i, j) ∈ E
    • a_ij = 0 otherwise

Space complexity: Θ(V²) regardless of number of edges

Properties:

  • For undirected graphs, the matrix is symmetric: A = Aᵀ (can store only upper triangle to save ~half the space)
  • For weighted graphs, store w(u, v) as the entry; use NIL, 0, or ∞ for non-edges
  • Simpler than adjacency lists; requires only one bit per entry for unweighted graphs

Representing Attributes

  • Vertex attributes: denoted v.d, v.π, etc.
  • Edge attributes: denoted (u, v).f, etc.
  • In practice, attributes can be stored in parallel arrays (e.g., d[1..|V|]) or as instance variables in an object-oriented design

22.2 Breadth-First Search (BFS)

Breadth-first search is one of the simplest graph-searching algorithms and the archetype for many important graph algorithms (e.g., Prim's MST, Dijkstra's shortest paths).

Given graph G = (V, E) and a source vertex s, BFS:

  • Discovers every vertex reachable from s
  • Computes the shortest-path distance (fewest edges) from s to each reachable vertex
  • Produces a breadth-first tree rooted at s

BFS discovers all vertices at distance k before any vertex at distance k + 1.

Vertex Coloring

ColorMeaning
WHITEUndiscovered
GRAYDiscovered but not fully explored (the frontier)
BLACKFully explored (all adjacent vertices discovered)

Vertex Attributes

  • u.color — current color
  • u.d — distance from source s
  • u.π — predecessor (parent) in the BFS tree

Algorithm

BFS(G, s)
 1  for each vertex u ∈ G.V - {s}
 2      u.color = WHITE
 3      u.d = ∞
 4      u.π = NIL
 5  s.color = GRAY
 6  s.d = 0
 7  s.π = NIL
 8  Q = ∅
 9  ENQUEUE(Q, s)
10  while Q ≠ ∅
11      u = DEQUEUE(Q)
12      for each v ∈ G.Adj[u]
13          if v.color == WHITE
14              v.color = GRAY
15              v.d = u.d + 1
16              v.π = u
17              ENQUEUE(Q, v)
18      u.color = BLACK

Analysis

  • Each vertex is enqueued and dequeued at most once → queue operations: O(V)
  • Each adjacency list is scanned at most once → total scan time: O(E)
  • Initialization: O(V)
  • Total running time: O(V + E)

Shortest Paths

Shortest-path distance δ(s, v) = minimum number of edges in any path from s to v (∞ if unreachable).

Key results:

  • Lemma 22.1: For any edge (u, v) ∈ E: δ(s, v) ≤ δ(s, u) + 1
  • Lemma 22.2: Upon termination, v.d ≥ δ(s, v) for all v ∈ V (upper bound property)
  • Lemma 22.3: At any point during BFS, if Q = ⟨v₁, v₂, ..., vᵣ⟩, then vᵣ.d ≤ v₁.d + 1 and vᵢ.d ≤ vᵢ₊₁.d (queue holds at most two distinct d values)
  • Corollary 22.4: Vertices are enqueued in monotonically non-decreasing order of d values
  • Theorem 22.5 (Correctness of BFS): BFS discovers every vertex reachable from s, and upon termination v.d = δ(s, v) for all v ∈ V. For any reachable v ≠ s, a shortest path from s to v is a shortest path from s to v.π followed by edge (v.π, v).

Breadth-First Trees

The predecessor subgraph G_π = (V_π, E_π) where:

  • V_π = {v ∈ V : v.π ≠ NIL} ∪ {s}
  • E_π = {(v.π, v) : v ∈ V_π - {s}}

Lemma 22.6: BFS constructs π so that G_π is a breadth-first tree — it contains a unique simple path from s to each reachable vertex, and that path is a shortest path.

Printing Shortest Paths

PRINT-PATH(G, s, v)
1  if v == s
2      print s
3  elseif v.π == NIL
4      print "no path from" s "to" v "exists"
5  else PRINT-PATH(G, s, v.π)
6      print v

Runs in time linear in the number of vertices on the path.


22.3 Depth-First Search (DFS)

Depth-first search explores edges out of the most recently discovered vertex that still has unexplored edges. When all edges of a vertex are explored, the search backtracks. If undiscovered vertices remain, DFS selects a new source and repeats. This continues until every vertex is discovered.

Key Differences from BFS

FeatureBFSDFS
Data structureQueue (FIFO)Stack (implicit via recursion)
SourceSingle sourceMultiple sources possible
Predecessor subgraphSingle treeDepth-first forest (multiple trees)

Timestamps

Each vertex v gets two timestamps (integers in [1, 2|V|]):

  • v.ddiscovery time (when v is first discovered / grayed)
  • v.ffinishing time (when v's adjacency list is fully examined / blackened)

For every vertex u: u.d < u.f

Vertex u is:

  • WHITE before time u.d
  • GRAY between u.d and u.f
  • BLACK after u.f

Algorithm

DFS(G)
1  for each vertex u ∈ G.V
2      u.color = WHITE
3      u.π = NIL
4  time = 0
5  for each vertex u ∈ G.V
6      if u.color == WHITE
7          DFS-VISIT(G, u)

DFS-VISIT(G, u)
 1  time = time + 1           // white vertex u has just been discovered
 2  u.d = time
 3  u.color = GRAY
 4  for each v ∈ G.Adj[u]     // explore edge (u, v)
 5      if v.color == WHITE
 6          v.π = u
 7          DFS-VISIT(G, v)
 8  u.color = BLACK            // blacken u; it is finished
 9  time = time + 1
10  u.f = time

Analysis

  • Initialization (lines 1–3, 5–7 of DFS): Θ(V)
  • DFS-VISIT called exactly once per vertex; loop on lines 4–7 executes |Adj[v]| times per call
  • Total: Σ|Adj[v]| = Θ(E)
  • Total running time: Θ(V + E)

Parenthesis Theorem (Theorem 22.7)

For any two vertices u and v, exactly one of the following holds:

  1. [u.d, u.f] and [v.d, v.f] are entirely disjoint — neither is a descendant of the other
  2. [u.d, u.f] ⊂ [v.d, v.f] — u is a descendant of v
  3. [v.d, v.f] ⊂ [u.d, u.f] — v is a descendant of u

Discovery/finishing times have proper parenthesis structure: parentheses are always properly nested.

Corollary 22.8 (Nesting of Descendants' Intervals)

Vertex v is a proper descendant of u in the DFS forest if and only if:

u.d < v.d < v.f < u.f

White-Path Theorem (Theorem 22.9)

Vertex v is a descendant of u if and only if at time u.d, there exists a path from u to v consisting entirely of white vertices.

Classification of Edges

DFS classifies every edge (u, v) into one of four types:

Edge TypeDefinitionColor of v when (u,v) explored
Tree edgeEdge in the DFS forest; v was first discovered via this edgeWHITE
Back edgeConnects u to an ancestor v in a DFS tree (includes self-loops)GRAY
Forward edgeNon-tree edge connecting u to a descendant vBLACK (u.d < v.d)
Cross edgeAll other edges (between trees or non-ancestor/descendant in same tree)BLACK (u.d > v.d)

Key results:

  • Theorem 22.10: In a DFS of an undirected graph, every edge is either a tree edge or a back edge (no forward or cross edges)
  • Lemma 22.11: A directed graph G is acyclic if and only if a DFS of G yields no back edges

22.4 Topological Sort

A topological sort of a directed acyclic graph (DAG) G = (V, E) is a linear ordering of all its vertices such that if G contains edge (u, v), then u appears before v in the ordering.

  • Only possible if the graph has no cycles
  • Visualized as vertices along a horizontal line with all directed edges going left to right
  • Used to indicate precedences among events

Algorithm

TOPOLOGICAL-SORT(G)
1  call DFS(G) to compute finishing times v.f for each vertex v
2  as each vertex is finished, insert it onto the front of a linked list
3  return the linked list of vertices

Vertices appear in reverse order of finishing times (decreasing v.f).

Analysis

  • DFS: Θ(V + E)
  • Inserting each vertex onto front of linked list: O(1) per vertex → O(V) total
  • Total running time: Θ(V + E)

Correctness

Lemma 22.11: A directed graph G is acyclic if and only if a DFS of G yields no back edges.

Proof sketch:

  • (⇒) If DFS produces a back edge (u, v), then v is an ancestor of u, so there is a path v → u, and the back edge u → v completes a cycle — contradiction
  • (⇐) If G contains a cycle c, let v be the first vertex discovered in c. By the white-path theorem, all vertices in c become descendants of v, so the edge into v from c is a back edge

Theorem 22.12: TOPOLOGICAL-SORT produces a correct topological sort.

Proof sketch: For any edge (u, v) explored by DFS:

  • v cannot be gray (that would make it a back edge, contradicting acyclicity)
  • If v is white → v becomes a descendant of u → v.f < u.f
  • If v is black → v.f is already set, and u.f will be set later → v.f < u.f

In all cases, v.f < u.f, so u appears before v in the ordering (decreasing finish time).


22.5 Strongly Connected Components

A strongly connected component (SCC) of a directed graph G = (V, E) is a maximal set of vertices C ⊆ V such that for every pair u, v ∈ C: u ↝ v and v ↝ u (mutually reachable).

Key Concepts

  • Transpose graph Gᵀ = (V, Eᵀ) where Eᵀ = {(u, v) : (v, u) ∈ E} — all edges reversed

    • G and Gᵀ have exactly the same strongly connected components
    • Can be computed in O(V + E) time from adjacency lists
  • Component graph G^SCC = (V^SCC, E^SCC):

    • One vertex per SCC
    • Edge (i, j) ∈ E^SCC if there exists (x, y) ∈ E with x ∈ Cᵢ, y ∈ Cⱼ
    • The component graph is always a DAG (Lemma 22.13)

Algorithm (Kosaraju's Algorithm)

STRONGLY-CONNECTED-COMPONENTS(G)
1  call DFS(G) to compute finishing times u.f for each vertex u
2  compute Gᵀ
3  call DFS(Gᵀ), but in the main loop of DFS, consider vertices
       in order of decreasing u.f (as computed in line 1)
4  output the vertices of each tree in the depth-first forest
       formed in line 3 as a separate strongly connected component

Analysis

  • First DFS: Θ(V + E)
  • Computing Gᵀ: O(V + E)
  • Second DFS: Θ(V + E)
  • Total running time: Θ(V + E)

Why It Works

Notation: For a set of vertices U ⊆ V:

  • d(U) = min{u.d : u ∈ U} (earliest discovery time)
  • f(U) = max{u.f : u ∈ U} (latest finishing time)

Lemma 22.13: If C and C' are distinct SCCs, and there is a path u ↝ u' with u ∈ C, u' ∈ C', then there cannot be a path from C' back to C. (This is why the component graph is a DAG.)

Lemma 22.14: If there is an edge (u, v) ∈ E with u ∈ C and v ∈ C' (distinct SCCs), then f(C) > f(C').

Corollary 22.15: If there is an edge (u, v) ∈ Eᵀ with u ∈ C and v ∈ C' (distinct SCCs), then f(C) < f(C').

Theorem 22.16 (Correctness): The algorithm correctly computes SCCs.

Intuition:

  1. The first DFS computes finishing times
  2. In the second DFS on Gᵀ, we start from the SCC with the largest finishing time
  3. By Corollary 22.15, edges in Gᵀ leaving this SCC only go to SCCs with larger finishing times — but there are none, so the search stays within the SCC
  4. We then move to the next unvisited SCC with the largest finishing time, and repeat
  5. Each DFS tree in the second search corresponds to exactly one SCC

In essence, the second DFS visits the vertices of G^SCC in topologically sorted order.