
Augmenting Data Structures

Introduction to Algorithms

Chapter 14: Augmenting Data Structures

Many engineering situations require more than a textbook data structure. Rather than creating entirely new structures, we can often augment an existing one by storing additional information and programming new operations. The challenge is ensuring the added information can be efficiently updated and maintained by the ordinary operations.

This chapter augments red-black trees to build:

  1. Order-statistic trees — fast rank and selection queries on dynamic sets
  2. Interval trees — maintaining a dynamic set of intervals with overlap queries

14.1 Dynamic Order Statistics

An order-statistic tree is a red-black tree with additional information stored in each node to support fast order-statistic operations in O(lg n) time.

Structure

Each node x stores an extra attribute x.size — the number of internal nodes in the subtree rooted at x (including x itself).

  • The sentinel's size is defined as 0: T.nil.size = 0
  • Size identity:
x.size = x.left.size + x.right.size + 1
  • Keys need not be distinct. Rank is defined as the position at which an element would appear in an inorder tree walk.
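To make the size identity concrete, here is a minimal sketch in Python (a plain binary-tree node rather than a full red-black node; colors, parent pointers, and the sentinel are omitted, with an absent child contributing size 0):

```python
class Node:
    """Minimal node for an order-statistic tree (illustrative sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        # Size identity: x.size = x.left.size + x.right.size + 1,
        # where a missing child (the sentinel) contributes size 0.
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

root = Node(2, Node(1), Node(3))
print(root.size)       # → 3
print(root.left.size)  # → 1
```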

Retrieving an Element with a Given Rank

OS-SELECT(x, i) returns a pointer to the node containing the i-th smallest key in the subtree rooted at x.

OS-SELECT(x, i)
1  r = x.left.size + 1
2  if i == r
3      return x
4  elseif i < r
5      return OS-SELECT(x.left, i)
6  else
7      return OS-SELECT(x.right, i - r)

How it works:

  • Compute r, the rank of x within its own subtree
  • If i == r, node x is the answer
  • If i < r, recurse into the left subtree
  • If i > r, recurse into the right subtree looking for the (i - r)-th smallest element

Running time: O(lg n) — each recursive call descends one level in the red-black tree.
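The pseudocode translates almost line for line into Python. The sketch below uses a plain unbalanced BST node with a size attribute, which is enough to show the selection logic (a real order-statistic tree would use red-black nodes so the depth stays O(lg n)):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

def os_select(x, i):
    """Return the node holding the i-th smallest key (1-indexed) in x's subtree."""
    r = (x.left.size if x.left else 0) + 1  # rank of x within its own subtree
    if i == r:
        return x
    elif i < r:
        return os_select(x.left, i)
    else:
        return os_select(x.right, i - r)    # r elements precede the right subtree

# Balanced tree over keys 1..7
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(os_select(root, 5).key)  # → 5
```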

Determining the Rank of an Element

OS-RANK(T, x) returns the position of node x in the linear order determined by an inorder walk of T.

OS-RANK(T, x)
1  r = x.left.size + 1
2  y = x
3  while y ≠ T.root
4      if y == y.p.right
5          r = r + y.p.left.size + 1
6      y = y.p
7  return r

How it works:

  • Start with r as the rank of x within its own subtree
  • Walk up the tree toward the root
  • Each time y is a right child, add the size of the left sibling's subtree plus 1 (for the parent) to r
  • When y is a left child, r stays unchanged (no additional predecessors)

Loop invariant: At the start of each iteration, r is the rank of x.key in the subtree rooted at y.

Running time: O(lg n) — each iteration moves up one level.
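The same idea in Python, assuming nodes carry a parent pointer `p` (again a plain BST sketch, not a full red-black tree; the root plays the role of `T.root`):

```python
class Node:
    """BST node with parent pointer and subtree size (illustrative sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child:
                child.p = self
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

def os_rank(root, x):
    """Position of node x in an inorder walk of the tree rooted at `root`."""
    r = (x.left.size if x.left else 0) + 1  # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.p.right:
            # x follows everything in the parent's left subtree, plus the parent.
            r += (y.p.left.size if y.p.left else 0) + 1
        y = y.p
    return r

root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(os_rank(root, root.right.left))  # node with key 5 → rank 5
```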

Maintaining Subtree Sizes

Both insertion and deletion maintain the size attribute without affecting the O(lg n) running time.

Insertion (two phases):

  1. Phase 1 (going down): Increment x.size for each node on the path from root to the insertion point. The new node gets size = 1. Cost: O(lg n).
  2. Phase 2 (fixup — rotations): At most 2 rotations. After a rotation, update sizes in O(1):
// Added to LEFT-ROTATE(T, x) after the rotation:
y.size = x.size
x.size = x.left.size + x.right.size + 1

Deletion (two phases):

  1. Phase 1: Traverse from the original position of the spliced-out (or moved) node y up to the root, decrementing size at each node on the path. Cost: O(lg n).
  2. Phase 2 (fixup): At most 3 rotations, each updated in O(1).

Total time for both insertion and deletion: O(lg n).
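The constant-time size fixup after a rotation can be sketched in Python. This is an ordinary BST left rotation with parent pointers; the red-black colors and fixup logic are omitted, and the two-line size update mirrors the pseudocode fragment above:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child:
                child.p = self
        self.size = (left.size if left else 0) + (right.size if right else 0) + 1

def left_rotate(root, x):
    """Rotate left about x (x.right must exist); returns the tree root."""
    y = x.right
    x.right = y.left                        # y's left subtree becomes x's right
    if y.left:
        y.left.p = x
    y.p = x.p
    if x.p is None:
        root = y                            # y is the new root
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    y.left = x
    x.p = y
    # O(1) size fixup: only x and y gained or lost descendants.
    y.size = x.size
    x.size = (x.left.size if x.left else 0) + (x.right.size if x.right else 0) + 1
    return root

root = Node(1, None, Node(2, None, Node(3)))
root = left_rotate(root, root)
print(root.key, root.size)  # → 2 3
```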


14.2 How to Augment a Data Structure

The Four-Step Method

Augmenting a data structure follows four general steps:

  1. Choose an underlying data structure.
  2. Determine additional information to maintain in the underlying data structure.
  3. Verify that the additional information can be maintained for the basic modifying operations on the underlying data structure.
  4. Develop new operations that use the additional information.

These steps need not be followed strictly in order — design is iterative. There is no point determining additional information (step 2) if it cannot be maintained efficiently (step 3).

Example — Order-statistic trees:

Step  Decision
1     Red-black tree (supports MINIMUM, MAXIMUM, SUCCESSOR, PREDECESSOR efficiently)
2     size attribute in each node
3     Insertion/deletion maintain size in O(lg n) via path updates + O(1) per rotation
4     OS-SELECT and OS-RANK

Theorem 14.1: Augmenting a Red-Black Tree

Let f be an attribute that augments a red-black tree T of n nodes, and suppose that the value of f for each node x depends only on the information in nodes x, x.left, and x.right (possibly including x.left.f and x.right.f). Then we can maintain the values of f in all nodes of T during insertion and deletion without asymptotically affecting the O(lg n) performance of these operations.

Proof sketch:

  • A change to x.f propagates only to ancestors of x — at most O(lg n) nodes
  • Insertion:
    • Phase 1: Compute x.f in O(1) for the new node, then propagate up — O(lg n)
    • Phase 2: At most 2 rotations; each rotation changes the structure locally at 2 nodes, after which the updated f values propagate to at most O(lg n) ancestors, so f updates cost O(lg n) in total
  • Deletion:
    • Phase 1: Local tree modification, propagate up — O(lg n)
    • Phase 2: At most 3 rotations — O(lg n) total for f updates

In many practical cases (e.g., size in order-statistic trees), the cost of updating after a rotation is O(1) rather than O(lg n).


14.3 Interval Trees

An interval tree is a red-black tree augmented to maintain a dynamic set of intervals, supporting efficient overlap queries.

Interval Basics

  • A closed interval [t₁, t₂] where t₁ ≤ t₂ represents the set {t ∈ ℝ : t₁ ≤ t ≤ t₂}
  • An interval i is represented as an object with attributes:
    • i.low — the low endpoint
    • i.high — the high endpoint
  • Two intervals i and i' overlap if and only if: i.low ≤ i'.high AND i'.low ≤ i.high

Interval Trichotomy

For any two intervals i and i', exactly one of the following holds:

  1. i and i' overlap (i ∩ i' ≠ ∅)
  2. i is to the left of i' (i.high < i'.low)
  3. i is to the right of i' (i'.high < i.low)
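The overlap test and the trichotomy can be checked in a few lines of Python (intervals represented as `(low, high)` tuples; an illustrative sketch):

```python
def overlaps(i, j):
    """Closed intervals overlap iff i.low <= j.high and j.low <= i.high."""
    return i[0] <= j[1] and j[0] <= i[1]

def trichotomy(i, j):
    """Exactly one of the three cases holds for any pair of intervals."""
    if overlaps(i, j):
        return "overlap"
    if i[1] < j[0]:
        return "i left of j"
    return "i right of j"  # j.high < i.low

print(trichotomy((1, 3), (2, 5)))  # → overlap
print(trichotomy((1, 3), (4, 5)))  # → i left of j
print(trichotomy((6, 8), (4, 5)))  # → i right of j
```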

Supported Operations

  • INTERVAL-INSERT(T, x) — adds element x (containing an interval x.int) to tree T
  • INTERVAL-DELETE(T, x) — removes element x from tree T
  • INTERVAL-SEARCH(T, i) — returns a pointer to a node x such that x.int overlaps i, or T.nil if no such interval exists

Design (Following the Four-Step Method)

Step 1: Underlying data structure

  • Red-black tree where each node x contains an interval x.int
  • The key of x is the low endpoint: x.int.low
  • An inorder walk lists intervals sorted by low endpoint

Step 2: Additional information

  • Each node x stores x.max — the maximum value of any interval endpoint in the subtree rooted at x

Step 3: Maintaining the information

  • The max attribute satisfies:
x.max = max(x.int.high, x.left.max, x.right.max)
  • Since x.max depends only on information in x, x.left, and x.right, Theorem 14.1 guarantees that insertion and deletion run in O(lg n) time
  • Updates after a rotation take O(1) time

Step 4: New operation — INTERVAL-SEARCH

INTERVAL-SEARCH(T, i)
1  x = T.root
2  while x ≠ T.nil and i does not overlap x.int
3      if x.left ≠ T.nil and x.left.max ≥ i.low
4          x = x.left
5      else
6          x = x.right
7  return x

How it works:

  • Start at the root and walk down the tree
  • At each node, if the current interval doesn't overlap i:
    • Go left if the left subtree exists and its max value is ≥ i.low (there might be an overlapping interval in the left subtree)
    • Otherwise, go right
  • Terminates when an overlapping interval is found or x = T.nil

Running time: O(lg n) — each iteration descends one level.
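Putting the pieces together, here is a minimal Python sketch of the search. It uses an unbalanced tree keyed on low endpoints, computes the max attribute at construction time, and lets `None` play the role of `T.nil`; the node shape and example intervals are illustrative:

```python
class IntervalNode:
    """Interval-tree node keyed on the low endpoint (balancing omitted)."""
    def __init__(self, low, high, left=None, right=None):
        self.int = (low, high)
        self.left, self.right = left, right
        # max = max(x.int.high, x.left.max, x.right.max)
        self.max = max(high,
                       left.max if left else float("-inf"),
                       right.max if right else float("-inf"))

def interval_search(x, i):
    """Return a node whose interval overlaps i=(low, high), else None."""
    while x is not None and not (x.int[0] <= i[1] and i[0] <= x.int[1]):
        if x.left is not None and x.left.max >= i[0]:
            x = x.left   # an overlapping interval, if any exists, is on this side
        else:
            x = x.right
    return x

root = IntervalNode(16, 21,
         IntervalNode(8, 9, IntervalNode(5, 8), IntervalNode(15, 23)),
         IntervalNode(25, 30, None, IntervalNode(26, 26)))
hit = interval_search(root, (22, 25))
print(hit.int)  # → (15, 23)
```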

Correctness of INTERVAL-SEARCH (Theorem 14.2)

Any execution of INTERVAL-SEARCH(T, i) either returns a node whose interval overlaps i, or returns T.nil and the tree contains no node whose interval overlaps i.

Loop invariant: If tree T contains an interval that overlaps i, then the subtree rooted at x contains such an interval.

Key arguments for maintenance:

  • Going right (line 6): Either x.left = T.nil (no left subtree to search) or x.left.max < i.low. In the latter case, every interval i' in the left subtree has i'.high ≤ x.left.max < i.low, so no interval in the left subtree overlaps i. Safe to go right.

  • Going left (line 4): We have x.left.max ≥ i.low, meaning some interval i' in the left subtree has i'.high = x.left.max ≥ i.low. If no interval in the left subtree overlaps i, then i.high < i'.low. Since the tree is keyed by low endpoints, every interval i'' in the right subtree has i''.low ≥ i'.low > i.high, so no right-subtree interval overlaps i either. Safe to go left.

Summary Table

Operation          Time
INTERVAL-INSERT    O(lg n)
INTERVAL-DELETE    O(lg n)
INTERVAL-SEARCH    O(lg n)

Key Takeaways

  • Augmenting existing data structures is often preferable to designing new ones from scratch
  • The four-step method (choose structure → add info → verify maintenance → develop operations) provides a systematic approach
  • Theorem 14.1 makes step 3 easy for red-black trees: if the new attribute depends only on a node and its children, it can be maintained in O(lg n) time
  • Order-statistic trees add a size field to support O(lg n) rank and selection queries
  • Interval trees add a max field to support O(lg n) interval overlap queries
  • Both structures are built on red-black trees and preserve the O(lg n) time for all standard operations
