
Augmenting Data Structures

Introduction to Algorithms

Chapter 14: Augmenting Data Structures

Many engineering situations require more than a textbook data structure. Rather than creating entirely new structures, we can often augment an existing one by storing additional information and programming new operations. The challenge is ensuring the added information can be efficiently updated and maintained by the ordinary operations.

This chapter augments red-black trees to build:

  1. Order-statistic trees — fast rank and selection queries on dynamic sets
  2. Interval trees — maintaining a dynamic set of intervals with overlap queries

14.1 Dynamic Order Statistics

An order-statistic tree is a red-black tree with additional information stored in each node to support fast order-statistic operations in O(lg n) time.

Structure

Each node x stores an extra attribute x.size — the number of internal nodes in the subtree rooted at x (including x itself).

  • The sentinel's size is defined as 0: T.nil.size = 0
  • Size identity:
x.size = x.left.size + x.right.size + 1
  • Keys need not be distinct. Rank is defined as the position at which an element would appear in an inorder tree walk.
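To make the size identity concrete, here is a minimal sketch in Python (a plain binary-tree node rather than a full red-black node; colors, parent pointers, and the sentinel are omitted, with an absent child contributing size 0):

```python
class Node:
    """Minimal node for an order-statistic tree (illustrative sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Size identity: x.size = x.left.size + x.right.size + 1,
        # where a missing child (the sentinel) contributes size 0.
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

root = Node(2, Node(1), Node(3))
print(root.size)       # → 3
print(root.left.size)  # → 1
```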

Retrieving an Element with a Given Rank

OS-SELECT(x, i) returns a pointer to the node containing the i-th smallest key in the subtree rooted at x.

OS-SELECT(x, i)
1  r = x.left.size + 1
2  if i == r
3      return x
4  elseif i < r
5      return OS-SELECT(x.left, i)
6  else
7      return OS-SELECT(x.right, i - r)

How it works:

  • Compute r, the rank of x within its own subtree
  • If i == r, node x is the answer
  • If i < r, recurse into the left subtree
  • If i > r, recurse into the right subtree looking for the (i - r)-th smallest element

Running time: O(lg n) — each recursive call descends one level in the red-black tree.
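The pseudocode translates almost line for line into Python. The sketch below uses a plain unbalanced BST node with a size attribute, which is enough to show the selection logic (a real order-statistic tree would use red-black nodes so the depth stays O(lg n)):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

def os_select(x, i):
    """Return the node holding the i-th smallest key (1-indexed) in x's subtree."""
    r = (x.left.size if x.left else 0) + 1  # rank of x within its own subtree
    if i == r:
        return x
    elif i < r:
        return os_select(x.left, i)
    else:
        return os_select(x.right, i - r)    # r elements precede the right subtree

# Balanced tree over keys 1..7
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(os_select(root, 5).key)  # → 5
```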

Determining the Rank of an Element

OS-RANK(T, x) returns the position of node x in the linear order determined by an inorder walk of T.

OS-RANK(T, x)
1  r = x.left.size + 1
2  y = x
3  while y ≠ T.root
4      if y == y.p.right
5          r = r + y.p.left.size + 1
6      y = y.p
7  return r

How it works:

  • Start with r as the rank of x within its own subtree
  • Walk up the tree toward the root
  • Each time y is a right child, add the size of the left sibling's subtree plus 1 (for the parent) to r
  • When y is a left child, r stays unchanged (no additional predecessors)

Loop invariant: At the start of each iteration, r is the rank of x.key in the subtree rooted at y.

Running time: O(lg n) — each iteration moves up one level.
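The same idea in Python, assuming nodes carry a parent pointer `p` (again a plain BST sketch, not a full red-black tree; the root plays the role of `T.root`):

```python
class Node:
    """BST node with parent pointer and subtree size (illustrative sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child:
                child.p = self
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

def os_rank(root, x):
    """Position of node x in an inorder walk of the tree rooted at `root`."""
    r = (x.left.size if x.left else 0) + 1  # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.p.right:
            # x follows everything in the parent's left subtree, plus the parent.
            r += (y.p.left.size if y.p.left else 0) + 1
        y = y.p
    return r

root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(os_rank(root, root.right.left))  # node with key 5 → rank 5
```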

Maintaining Subtree Sizes

Both insertion and deletion maintain the size attribute without affecting the O(lg n) running time.

Insertion (two phases):

  1. Phase 1 (going down): Increment x.size for each node on the path from root to the insertion point. The new node gets size = 1. Cost: O(lg n).
  2. Phase 2 (fixup — rotations): At most 2 rotations. After a rotation, update sizes in O(1):
// Added to LEFT-ROTATE(T, x) after the rotation:
y.size = x.size
x.size = x.left.size + x.right.size + 1

Deletion (two phases):

  1. Phase 1: Traverse from the original position of the spliced-out (or moved) node y up to the root, decrementing size at each node on the path. Cost: O(lg n).
  2. Phase 2 (fixup): At most 3 rotations, each updated in O(1).

Total time for both insertion and deletion: O(lg n).
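The constant-time size fixup after a rotation can be sketched in Python. This is an ordinary BST left rotation with parent pointers; the red-black colors and fixup logic are omitted, and the two-line size update mirrors the pseudocode fragment above:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child:
                child.p = self
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

def left_rotate(root, x):
    """Rotate left about x (x.right must exist); returns the tree root."""
    y = x.right
    x.right = y.left                        # y's left subtree becomes x's right
    if y.left:
        y.left.p = x
    y.p = x.p
    if x.p is None:
        root = y                            # y is the new root
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x
    x.p = y
    # O(1) size fixup: only x and y gained or lost descendants.
    y.size = x.size
    x.size = (x.left.size if x.left else 0) + (x.right.size if x.right else 0) + 1
    return root

root = Node(1, None, Node(2, None, Node(3)))
root = left_rotate(root, root)
print(root.key, root.size)  # → 2 3
```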


14.2 How to Augment a Data Structure

The Four-Step Method

Augmenting a data structure follows four general steps:

  1. Choose an underlying data structure.
  2. Determine additional information to maintain in the underlying data structure.
  3. Verify that the additional information can be maintained for the basic modifying operations on the underlying data structure.
  4. Develop new operations that use the additional information.

These steps need not be followed strictly in order — design is iterative. There is no point determining additional information (step 2) if it cannot be maintained efficiently (step 3).

Example — Order-statistic trees:

Step  Decision
1     Red-black tree (supports MINIMUM, MAXIMUM, SUCCESSOR, PREDECESSOR efficiently)
2     size attribute in each node
3     Insertion/deletion maintain size in O(lg n) via path updates + O(1) per rotation
4     OS-SELECT and OS-RANK

Theorem 14.1: Augmenting a Red-Black Tree

Let f be an attribute that augments a red-black tree T of n nodes, and suppose that the value of f for each node x depends only on the information in nodes x, x.left, and x.right (possibly including x.left.f and x.right.f). Then we can maintain the values of f in all nodes of T during insertion and deletion without asymptotically affecting the O(lg n) performance of these operations.

Proof sketch:

  • A change to x.f propagates only to ancestors of x — at most O(lg n) nodes
  • Insertion:
    • Phase 1: Compute x.f in O(1) for the new node, then propagate up — O(lg n)
    • Phase 2: At most 2 rotations; each rotation changes the structure locally at 2 nodes, after which the updated f values propagate to at most O(lg n) ancestors, so f updates cost O(lg n) in total
  • Deletion:
    • Phase 1: Local tree modification, propagate up — O(lg n)
    • Phase 2: At most 3 rotations — O(lg n) total for f updates

In many practical cases (e.g., size in order-statistic trees), the cost of updating after a rotation is O(1) rather than O(lg n).


14.3 Interval Trees

An interval tree is a red-black tree augmented to maintain a dynamic set of intervals, supporting efficient overlap queries.

Interval Basics

  • A closed interval [t₁, t₂] where t₁ ≤ t₂ represents the set {t ∈ ℝ : t₁ ≤ t ≤ t₂}
  • An interval i is represented as an object with attributes:
    • i.low — the low endpoint
    • i.high — the high endpoint
  • Two intervals i and i' overlap if and only if: i.low ≤ i'.high AND i'.low ≤ i.high

Interval Trichotomy

For any two intervals i and i', exactly one of the following holds:

  1. i and i' overlap (i ∩ i' ≠ ∅)
  2. i is to the left of i' (i.high < i'.low)
  3. i is to the right of i' (i'.high < i.low)
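The overlap test and the trichotomy can be checked in a few lines of Python (intervals represented as `(low, high)` tuples; an illustrative sketch):

```python
def overlaps(i, j):
    """Closed intervals overlap iff i.low <= j.high and j.low <= i.high."""
    return i[0] <= j[1] and j[0] <= i[1]

def trichotomy(i, j):
    """Exactly one of the three cases holds for any pair of intervals."""
    if overlaps(i, j):
        return "overlap"
    if i[1] < j[0]:
        return "i left of j"
    return "i right of j"  # j.high < i.low

print(trichotomy((1, 3), (2, 5)))  # → overlap
print(trichotomy((1, 3), (4, 5)))  # → i left of j
print(trichotomy((6, 8), (4, 5)))  # → i right of j
```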

Supported Operations

  • INTERVAL-INSERT(T, x) — adds element x (containing an interval x.int) to tree T
  • INTERVAL-DELETE(T, x) — removes element x from tree T
  • INTERVAL-SEARCH(T, i) — returns a pointer to a node x such that x.int overlaps i, or T.nil if no such interval exists

Design (Following the Four-Step Method)

Step 1: Underlying data structure

  • Red-black tree where each node x contains an interval x.int
  • The key of x is the low endpoint: x.int.low
  • An inorder walk lists intervals sorted by low endpoint

Step 2: Additional information

  • Each node x stores x.max — the maximum value of any interval endpoint in the subtree rooted at x

Step 3: Maintaining the information

  • The max attribute satisfies:
x.max = max(x.int.high, x.left.max, x.right.max)
  • Since x.max depends only on information in x, x.left, and x.right, Theorem 14.1 guarantees that insertion and deletion run in O(lg n) time
  • Updates after a rotation take O(1) time

Step 4: New operation — INTERVAL-SEARCH

INTERVAL-SEARCH(T, i)
1  x = T.root
2  while x ≠ T.nil and i does not overlap x.int
3      if x.left ≠ T.nil and x.left.max ≥ i.low
4          x = x.left
5      else
6          x = x.right
7  return x

How it works:

  • Start at the root and walk down the tree
  • At each node, if the current interval doesn't overlap i:
    • Go left if the left subtree exists and its max value is ≥ i.low (there might be an overlapping interval in the left subtree)
    • Otherwise, go right
  • Terminates when an overlapping interval is found or x = T.nil

Running time: O(lg n) — each iteration descends one level.
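Putting the pieces together, here is a minimal Python sketch of the search. It uses an unbalanced tree keyed on low endpoints, computes the max attribute at construction time, and lets `None` play the role of `T.nil`; the node shape and example intervals are illustrative:

```python
class IntervalNode:
    """Interval-tree node keyed on the low endpoint (balancing omitted)."""
    def __init__(self, low, high, left=None, right=None):
        self.int = (low, high)
        self.left, self.right = left, right
        # max = max(x.int.high, x.left.max, x.right.max)
        self.max = max(high,
                       left.max if left else float("-inf"),
                       right.max if right else float("-inf"))

def interval_search(x, i):
    """Return a node whose interval overlaps i=(low, high), else None."""
    while x is not None and not (x.int[0] <= i[1] and i[0] <= x.int[1]):
        if x.left is not None and x.left.max >= i[0]:
            x = x.left   # an overlapping interval, if any exists, is on this side
        else:
            x = x.right
    return x

root = IntervalNode(16, 21,
         IntervalNode(8, 9, IntervalNode(5, 8), IntervalNode(15, 23)),
         IntervalNode(25, 30, None, IntervalNode(26, 26)))
hit = interval_search(root, (22, 25))
print(hit.int)  # → (15, 23)
```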

Correctness of INTERVAL-SEARCH (Theorem 14.2)

Any execution of INTERVAL-SEARCH(T, i) either returns a node whose interval overlaps i, or returns T.nil and the tree contains no node whose interval overlaps i.

Loop invariant: If tree T contains an interval that overlaps i, then the subtree rooted at x contains such an interval.

Key arguments for maintenance:

  • Going right (line 6): Either x.left = T.nil (no left subtree to search) or x.left.max < i.low. In the latter case, every interval i' in the left subtree has i'.high ≤ x.left.max < i.low, so no interval in the left subtree overlaps i. Safe to go right.

  • Going left (line 4): We have x.left.max ≥ i.low, meaning some interval i' in the left subtree has i'.high = x.left.max ≥ i.low. If no interval in the left subtree overlaps i, then i.high < i'.low. Since the tree is keyed by low endpoints, every interval i'' in the right subtree has i''.low ≥ i'.low > i.high, so no right-subtree interval overlaps i either. Safe to go left.

Summary Table

Operation          Time
INTERVAL-INSERT    O(lg n)
INTERVAL-DELETE    O(lg n)
INTERVAL-SEARCH    O(lg n)

Key Takeaways

  • Augmenting existing data structures is often preferable to designing new ones from scratch
  • The four-step method (choose structure → add info → verify maintenance → develop operations) provides a systematic approach
  • Theorem 14.1 makes step 3 easy for red-black trees: if the new attribute depends only on a node and its children, it can be maintained in O(lg n) time
  • Order-statistic trees add a size field to support O(lg n) rank and selection queries
  • Interval trees add a max field to support O(lg n) interval overlap queries
  • Both structures are built on red-black trees and preserve the O(lg n) time for all standard operations
