Augmenting Data Structures
Introduction to AlgorithmsChapter 14: Augmenting Data Structures
Many engineering situations require more than a textbook data structure. Rather than creating entirely new structures, we can often augment an existing one by storing additional information and programming new operations. The challenge is ensuring the added information can be efficiently updated and maintained by the ordinary operations.
This chapter augments red-black trees to build:
- Order-statistic trees — fast rank and selection queries on dynamic sets
- Interval trees — maintaining a dynamic set of intervals with overlap queries
14.1 Dynamic Order Statistics
An order-statistic tree is a red-black tree with additional information stored in each node to support fast order-statistic operations in O(lg n) time.
Structure
Each node x stores an extra attribute x.size — the number of internal nodes in the subtree rooted at x (including x itself).
- The sentinel's size is defined as 0:
T.nil.size = 0 - Size identity:
x.size = x.left.size + x.right.size + 1
- Keys need not be distinct. Rank is defined as the position at which an element would appear in an inorder tree walk.
Retrieving an Element with a Given Rank
OS-SELECT(x, i) returns a pointer to the node containing the i-th smallest key in the subtree rooted at x.
OS-SELECT(x, i)
1 r = x.left.size + 1
2 if i == r
3 return x
4 elseif i < r
5 return OS-SELECT(x.left, i)
6 else
7 return OS-SELECT(x.right, i - r)
How it works:
- Compute
r, the rank ofxwithin its own subtree - If
i == r, nodexis the answer - If
i < r, recurse into the left subtree - If
i > r, recurse into the right subtree looking for the(i - r)-th smallest element
Running time: O(lg n) — each recursive call descends one level in the red-black tree.
Determining the Rank of an Element
OS-RANK(T, x) returns the position of node x in the linear order determined by an inorder walk of T.
OS-RANK(T, x)
1 r = x.left.size + 1
2 y = x
3 while y ≠ T.root
4 if y == y.p.right
5 r = r + y.p.left.size + 1
6 y = y.p
7 return r
How it works:
- Start with
ras the rank ofxwithin its own subtree - Walk up the tree toward the root
- Each time
yis a right child, add the size of the left sibling's subtree plus 1 (for the parent) tor - When
yis a left child,rstays unchanged (no additional predecessors)
Loop invariant: At the start of each iteration, r is the rank of x.key in the subtree rooted at y.
Running time: O(lg n) — each iteration moves up one level.
Maintaining Subtree Sizes
Both insertion and deletion maintain the size attribute without affecting the O(lg n) running time.
Insertion (two phases):
- Phase 1 (going down): Increment
x.sizefor each node on the path from root to the insertion point. The new node getssize = 1. Cost: O(lg n). - Phase 2 (fixup — rotations): At most 2 rotations. After a rotation, update sizes in O(1):
// Added to LEFT-ROTATE(T, x) after the rotation:
y.size = x.size
x.size = x.left.size + x.right.size + 1
Deletion (two phases):
- Phase 1: Traverse from the removed node
yup to the root, decrementingsizeat each node. Cost: O(lg n). - Phase 2 (fixup): At most 3 rotations, each updated in O(1).
Total time for both insertion and deletion: O(lg n).
14.2 How to Augment a Data Structure
The Four-Step Method
Augmenting a data structure follows four general steps:
- Choose an underlying data structure.
- Determine additional information to maintain in the underlying data structure.
- Verify that the additional information can be maintained for the basic modifying operations on the underlying data structure.
- Develop new operations that use the additional information.
These steps need not be followed strictly in order — design is iterative. There is no point determining additional information (step 2) if it cannot be maintained efficiently (step 3).
Example — Order-statistic trees:
| Step | Decision |
|---|---|
| 1 | Red-black tree (supports MINIMUM, MAXIMUM, SUCCESSOR, PREDECESSOR efficiently) |
| 2 | size attribute in each node |
| 3 | Insertion/deletion maintain size in O(lg n) via path updates + O(1) per rotation |
| 4 | OS-SELECT and OS-RANK |
Theorem 14.1: Augmenting a Red-Black Tree
Let
fbe an attribute that augments a red-black treeTofnnodes, and suppose that the value offfor each nodexdepends only on the information in nodesx,x.left, andx.right(possibly includingx.left.fandx.right.f). Then we can maintain the values offin all nodes ofTduring insertion and deletion without asymptotically affecting the O(lg n) performance of these operations.
Proof sketch:
- A change to
x.fpropagates only to ancestors ofx— at most O(lg n) nodes - Insertion:
- Phase 1: Compute
x.fin O(1) for the new node, then propagate up — O(lg n) - Phase 2: At most 2 rotations, each affecting 2 nodes — O(lg n) total for
fupdates
- Phase 1: Compute
- Deletion:
- Phase 1: Local tree modification, propagate up — O(lg n)
- Phase 2: At most 3 rotations — O(lg n) total for
fupdates
In many practical cases (e.g., size in order-statistic trees), the cost of updating after a rotation is O(1) rather than O(lg n).
14.3 Interval Trees
An interval tree is a red-black tree augmented to maintain a dynamic set of intervals, supporting efficient overlap queries.
Interval Basics
- A closed interval
[t₁, t₂]wheret₁ ≤ t₂represents the set{t ∈ ℝ : t₁ ≤ t ≤ t₂} - An interval
iis represented as an object with attributes:i.low— the low endpointi.high— the high endpoint
- Two intervals
iandi'overlap if and only if:i.low ≤ i'.highANDi'.low ≤ i.high
Interval Trichotomy
For any two intervals i and i', exactly one of the following holds:
iandi'overlap (i ∩ i' ≠ ∅)iis to the left ofi'(i.high < i'.low)iis to the right ofi'(i'.high < i.low)
Supported Operations
- INTERVAL-INSERT(T, x) — adds element
x(containing an intervalx.int) to treeT - INTERVAL-DELETE(T, x) — removes element
xfrom treeT - INTERVAL-SEARCH(T, i) — returns a pointer to a node
xsuch thatx.intoverlapsi, orT.nilif no such interval exists
Design (Following the Four-Step Method)
Step 1: Underlying data structure
- Red-black tree where each node
xcontains an intervalx.int - The key of
xis the low endpoint:x.int.low - An inorder walk lists intervals sorted by low endpoint
Step 2: Additional information
- Each node
xstoresx.max— the maximum value of any interval endpoint in the subtree rooted atx
Step 3: Maintaining the information
- The
maxattribute satisfies:
x.max = max(x.int.high, x.left.max, x.right.max)
- Since
x.maxdepends only on information inx,x.left, andx.right, Theorem 14.1 guarantees that insertion and deletion run in O(lg n) time - Updates after a rotation take O(1) time
Step 4: New operation — INTERVAL-SEARCH
INTERVAL-SEARCH(T, i)
1 x = T.root
2 while x ≠ T.nil and i does not overlap x.int
3 if x.left ≠ T.nil and x.left.max ≥ i.low
4 x = x.left
5 else
6 x = x.right
7 return x
How it works:
- Start at the root and walk down the tree
- At each node, if the current interval doesn't overlap
i:- Go left if the left subtree exists and its
maxvalue is ≥i.low(there might be an overlapping interval in the left subtree) - Otherwise, go right
- Go left if the left subtree exists and its
- Terminates when an overlapping interval is found or
x = T.nil
Running time: O(lg n) — each iteration descends one level.
Correctness of INTERVAL-SEARCH (Theorem 14.2)
Any execution of INTERVAL-SEARCH(T, i) either returns a node whose interval overlaps
i, or returnsT.niland the tree contains no node whose interval overlapsi.
Loop invariant: If tree T contains an interval that overlaps i, then the subtree rooted at x contains such an interval.
Key arguments for maintenance:
-
Going right (line 5): Either
x.left = T.nil(no left subtree to search) orx.left.max < i.low. In the latter case, every intervali'in the left subtree hasi'.high ≤ x.left.max < i.low, so no interval in the left subtree overlapsi. Safe to go right. -
Going left (line 4): We have
x.left.max ≥ i.low, meaning some intervali'in the left subtree hasi'.high = x.left.max ≥ i.low. If no interval in the left subtree overlapsi, theni.high < i'.low. Since the tree is keyed by low endpoints, every intervali''in the right subtree hasi''.low ≥ i'.low > i.high, so no right-subtree interval overlapsieither. Safe to go left.
Summary Table
| Operation | Time |
|---|---|
| INTERVAL-INSERT | O(lg n) |
| INTERVAL-DELETE | O(lg n) |
| INTERVAL-SEARCH | O(lg n) |
Key Takeaways
- Augmenting existing data structures is often preferable to designing new ones from scratch
- The four-step method (choose structure → add info → verify maintenance → develop operations) provides a systematic approach
- Theorem 14.1 makes step 3 easy for red-black trees: if the new attribute depends only on a node and its children, it can be maintained in O(lg n) time
- Order-statistic trees add a
sizefield to support O(lg n) rank and selection queries - Interval trees add a
maxfield to support O(lg n) interval overlap queries - Both structures are built on red-black trees and preserve the O(lg n) time for all standard operations
Previous chapter
Red Black TreesNext chapter
Dynamic Programming