Pagefy

Pagefy

Back to Data Structures and Algorithms

van Emde Boas Trees

Introduction to Algorithms

Chapter 20: van Emde Boas Trees

Motivation

  • Previous data structures (binary heaps, red-black trees, Fibonacci heaps) require at least Ω(lg n) time for at least one priority-queue operation
  • The Ω(n lg n) sorting lower bound implies this when decisions are comparison-based
  • When keys are integers in a bounded range, we can circumvent this lower bound (as with counting sort)
  • van Emde Boas (vEB) trees support all dynamic set operations in O(lg lg u) worst-case time
    • Keys must be integers in the range 0 to u − 1, with no duplicates

Supported Operations

All in O(lg lg u) time:

  • MEMBER, INSERT, DELETE
  • MINIMUM, MAXIMUM
  • SUCCESSOR, PREDECESSOR

Key Definitions

  • n = number of elements currently in the set
  • u = universe size (range of possible values: {0, 1, 2, ..., u − 1})
  • Assume u = 2^k for some integer k ≥ 1

Preliminary Approaches

Direct Addressing (Bit Vector)

  • Store a bit vector A[0 .. u − 1]: A[x] = 1 if x is in the set
  • O(1) time for INSERT, DELETE, MEMBER
  • Θ(u) time for MINIMUM, MAXIMUM, SUCCESSOR, PREDECESSOR (must scan)

Superimposing a Binary Tree

  • Overlay a binary tree of bits on top of the bit vector
  • Each internal node stores the logical-OR of its two children
  • A node contains 1 iff some leaf in its subtree contains 1

Operations using the binary tree:

  • MINIMUM: From root, always go to the leftmost child containing a 1
  • MAXIMUM: From root, always go to the rightmost child containing a 1
  • SUCCESSOR(x): Go up from leaf x until entering a node from the left that has a 1 in its right child; then descend that right subtree taking leftmost 1s
  • PREDECESSOR(x): Symmetric — go up until entering from the right with a 1 in the left child; descend taking rightmost 1s
  • INSERT(x): Set 1 in every node on the path from leaf x to root
  • DELETE(x): Go from leaf x to root, recomputing each node as logical-OR of its children

Running time: All operations O(lg u) (tree height is lg u)

Superimposing a Tree of Constant Height

  • Assume u = 2^(2k), so √u is an integer
  • Use a tree of degree √u instead of degree 2 → tree height is always 2
  • The √u internal nodes at depth 1 form a summary[0 .. √u − 1] array
  • summary[i] = 1 iff the i-th cluster (subarray A[i√u .. (i+1)√u − 1]) contains a 1

Key indexing: For a value x:

  • Cluster number = ⌊x / √u⌋
  • Position within cluster = x mod √u

Operations:

  • INSERT(x): Set A[x] = 1 and summary[⌊x/√u⌋] = 1 → O(1)
  • MINIMUM/MAXIMUM: Find leftmost/rightmost 1 in summary, then search within that cluster → O(√u)
  • SUCCESSOR(x): Search right within x's cluster; if not found, search right in summary, then find minimum in that cluster → O(√u)
  • DELETE(x): Set A[x] = 0, recompute summary[⌊x/√u⌋] as OR of the cluster → O(√u)

The degree-√u tree idea is the key insight behind vEB trees, even though O(√u) is worse than O(lg u).


A Recursive Structure

The Core Recurrence

The target recurrence for O(lg lg u) operations:

T(u) = T(√u) + O(1)

Proof that T(u) = O(lg lg u):

  • Let m = lg u, so u = 2^m
  • T(2^m) = T(2^(m/2)) + O(1)
  • Let S(m) = T(2^m): S(m) = S(m/2) + O(1) → S(m) = O(lg m) by master method
  • Therefore T(u) = O(lg m) = O(lg lg u)

Intuition: The universe size sequence is u, u^(1/2), u^(1/4), u^(1/8), ... Each level halves the number of bits needed. Starting with lg u bits, after lg(lg u) levels we reach universe size 2.

Helper Functions

For a universe of size u, given a value x (viewed as a lg u-bit integer):

FunctionDefinitionMeaning
high(x)⌊x / √u⌋Most significant (lg u)/2 bits — cluster number
low(x)x mod √uLeast significant (lg u)/2 bits — position within cluster
index(x, y)x · √u + yReconstruct element from cluster number and offset

Identity: x = index(high(x), low(x))

Proto van Emde Boas Structures

A proto-vEB structure proto-EB(u) is defined recursively:

  • Base case (u = 2): Contains an array A[0..1] of two bits
  • Recursive case (u ≥ 4, u = 2^(2k)): Contains:
    • u: universe size
    • summary: pointer to a proto-EB(√u) structure
    • cluster[0 .. √u − 1]: array of √u pointers to proto-EB(√u) structures

Element x is stored in cluster[high(x)] at position low(x).

Operations on Proto-vEB

MEMBER — O(lg lg u) ✓

PROTO-VEB-MEMBER(V, x)
  if V.u == 2
      return V.A[x]
  else return PROTO-VEB-MEMBER(V.cluster[high(x)], low(x))

Recurrence: T(u) = T(√u) + O(1) → O(lg lg u)

MINIMUM — Θ(lg u) ✗

PROTO-VEB-MINIMUM(V)
  if V.u == 2
      if V.A[0] == 1
          return 0
      elseif V.A[1] == 1
          return 1
      else return NIL
  else min-cluster = PROTO-VEB-MINIMUM(V.summary)
      if min-cluster == NIL
          return NIL
      else offset = PROTO-VEB-MINIMUM(V.cluster[min-cluster])
          return index(min-cluster, offset)

Makes two recursive calls → T(u) = 2T(√u) + O(1) → Θ(lg u)

SUCCESSOR — Θ(lg u · lg lg u) ✗

PROTO-VEB-SUCCESSOR(V, x)
  if V.u == 2
      if x == 0 and V.A[1] == 1
          return 1
      else return NIL
  else offset = PROTO-VEB-SUCCESSOR(V.cluster[high(x)], low(x))
      if offset ≠ NIL
          return index(high(x), offset)
      else succ-cluster = PROTO-VEB-SUCCESSOR(V.summary, high(x))
          if succ-cluster == NIL
              return NIL
          else offset = PROTO-VEB-MINIMUM(V.cluster[succ-cluster])
              return index(succ-cluster, offset)

Two recursive calls + one MINIMUM call → T(u) = 2T(√u) + Θ(lg u) → Θ(lg u · lg lg u)

INSERT — Θ(lg u) ✗

PROTO-VEB-INSERT(V, x)
  if V.u == 2
      V.A[x] = 1
  else PROTO-VEB-INSERT(V.cluster[high(x)], low(x))
      PROTO-VEB-INSERT(V.summary, high(x))

Two recursive calls → T(u) = 2T(√u) + O(1) → Θ(lg u)

Problem: Proto-vEB makes too many recursive calls. We need to reduce each operation to at most one recursive call.


The van Emde Boas Tree

Handling Non-Square Universe Sizes

When u is an odd power of 2 (u = 2^(2k+1)), √u is not an integer. We define:

SymbolDefinitionMeaning
⌈√u⌉ (upper square root)2^⌈(lg u)/2⌉Number of clusters
⌊√u⌋ (lower square root)2^⌊(lg u)/2⌋Universe size of each cluster

So u = ⌈√u⌉ · ⌊√u⌋. When u is an even power of 2, both equal √u.

Updated helper functions:

  • high(x) = ⌊x / ⌊√u⌋⌋
  • low(x) = x mod ⌊√u⌋
  • index(x, y) = x · ⌊√u⌋ + y

vEB Tree Structure

A vEB tree vEB(u) contains:

  • u: universe size
  • min: minimum element in the tree (or NIL if empty)
  • max: maximum element in the tree (or NIL if empty)
  • summary: pointer to a vEB(⌈√u⌉) tree
  • cluster[0 .. ⌈√u⌉ − 1]: array of ⌈√u⌉ pointers to vEB(⌊√u⌋) trees

Critical property: The element stored in min does NOT appear in any cluster. The element in max does appear in a cluster (unless min == max, i.e., only one element).

Base case (u = 2): Only uses min and max (no A array, no clusters).

Why min/max Enable O(lg lg u)

The min and max attributes reduce recursive calls in four ways:

  1. MINIMUM/MAXIMUM need no recursion — just return min or max
  2. SUCCESSOR: Can check if successor is in x's cluster by comparing x with the cluster's max (no recursive call needed). Symmetric for PREDECESSOR and min
  3. Empty/singleton detection in O(1): both NIL → empty; equal non-NIL → one element; unequal non-NIL → two or more
  4. Insert into empty tree and delete from singleton take O(1) — just update min/max

Recurrence for vEB Operations

T(u) ≤ T(⌈√u⌉) + O(1)

Solution: T(u) = O(lg lg u) (same analysis as before; the ceiling doesn't affect asymptotics since ⌈m/2⌉ ≤ 2m/3 for m ≥ 2).

Space and Creation

  • Total space: O(u)
  • Creating an empty vEB tree: O(u) time
  • Only worthwhile when performing enough operations to amortize creation cost

vEB Tree Operations

MINIMUM and MAXIMUM — O(1)

VEB-TREE-MINIMUM(V)
  return V.min

VEB-TREE-MAXIMUM(V)
  return V.max

MEMBER — O(lg lg u)

VEB-TREE-MEMBER(V, x)
  if x == V.min or x == V.max
      return TRUE
  elseif V.u == 2
      return FALSE
  else return VEB-TREE-MEMBER(V.cluster[high(x)], low(x))
  • First checks min/max directly
  • Base case: if u = 2 and x isn't min or max, return FALSE
  • Otherwise recurse into the appropriate cluster

SUCCESSOR — O(lg lg u)

VEB-TREE-SUCCESSOR(V, x)
  if V.u == 2
      if x == 0 and V.max == 1
          return 1
      else return NIL
  elseif V.min ≠ NIL and x < V.min
      return V.min
  else max-low = VEB-TREE-MAXIMUM(V.cluster[high(x)])
      if max-low ≠ NIL and low(x) < max-low
          offset = VEB-TREE-SUCCESSOR(V.cluster[high(x)], low(x))
          return index(high(x), offset)
      else succ-cluster = VEB-TREE-SUCCESSOR(V.summary, high(x))
          if succ-cluster == NIL
              return NIL
          else offset = VEB-TREE-MINIMUM(V.cluster[succ-cluster])
              return index(succ-cluster, offset)

Key insight: By checking the cluster's max in O(1), we determine whether to recurse into the cluster or the summary — never both.

Cases:

  1. Base case (u = 2): Successor of 0 is 1 if present
  2. x < V.min: Successor is V.min
  3. Successor in same cluster: low(x) < max of x's cluster → recurse within cluster
  4. Successor in later cluster: Search summary for next nonempty cluster, return its minimum

PREDECESSOR — O(lg lg u)

VEB-TREE-PREDECESSOR(V, x)
  if V.u == 2
      if x == 1 and V.min == 0
          return 0
      else return NIL
  elseif V.max ≠ NIL and x > V.max
      return V.max
  else min-low = VEB-TREE-MINIMUM(V.cluster[high(x)])
      if min-low ≠ NIL and low(x) > min-low
          offset = VEB-TREE-PREDECESSOR(V.cluster[high(x)], low(x))
          return index(high(x), offset)
      else pred-cluster = VEB-TREE-PREDECESSOR(V.summary, high(x))
          if pred-cluster == NIL
              if V.min ≠ NIL and x > V.min
                  return V.min
              else return NIL
          else offset = VEB-TREE-MAXIMUM(V.cluster[pred-cluster])
              return index(pred-cluster, offset)

Extra case vs. SUCCESSOR: If no predecessor in any earlier cluster, the predecessor might be V.min (since min is not stored in any cluster).

INSERT — O(lg lg u)

VEB-EMPTY-TREE-INSERT(V, x)
  V.min = x
  V.max = x

VEB-TREE-INSERT(V, x)
  if V.min == NIL
      VEB-EMPTY-TREE-INSERT(V, x)
  else if x < V.min
          exchange x with V.min
      if V.u > 2
          if VEB-TREE-MINIMUM(V.cluster[high(x)]) == NIL
              VEB-TREE-INSERT(V.summary, high(x))
              VEB-EMPTY-TREE-INSERT(V.cluster[high(x)], low(x))
          else VEB-TREE-INSERT(V.cluster[high(x)], low(x))
      if x > V.max
          V.max = x

Key insight — only one recursive call:

  • If the target cluster is empty: recursively insert cluster number into summary (the cluster insert is O(1) via VEB-EMPTY-TREE-INSERT)
  • If the target cluster is non-empty: recursively insert into the cluster (no summary update needed)

Cases:

  1. Empty tree: Just set min = max = x
  2. x < min: Swap x with min (old min gets inserted into a cluster)
  3. Cluster empty: Update summary + trivially insert into empty cluster
  4. Cluster non-empty: Insert into cluster only
  5. Update max if x > max

DELETE — O(lg lg u)

VEB-TREE-DELETE(V, x)
  if V.min == V.max
      V.min = NIL
      V.max = NIL
  elseif V.u == 2
      if x == 0
          V.min = 1
      else V.min = 0
      V.max = V.min
  else if x == V.min
          first-cluster = VEB-TREE-MINIMUM(V.summary)
          x = index(first-cluster,
              VEB-TREE-MINIMUM(V.cluster[first-cluster]))
          V.min = x
      VEB-TREE-DELETE(V.cluster[high(x)], low(x))
      if VEB-TREE-MINIMUM(V.cluster[high(x)]) == NIL
          VEB-TREE-DELETE(V.summary, high(x))
          if x == V.max
              summary-max = VEB-TREE-MAXIMUM(V.summary)
              if summary-max == NIL
                  V.max = V.min
              else V.max = index(summary-max,
                  VEB-TREE-MAXIMUM(V.cluster[summary-max]))
      elseif x == V.max
          V.max = index(high(x),
              VEB-TREE-MAXIMUM(V.cluster[high(x)]))

Key insight — still only one effective recursive call:

  • Line 13 deletes x from its cluster
  • Line 15 deletes x's cluster number from summary only if the cluster became empty
  • But if the cluster became empty, x was its only element, so line 13 took O(1) time
  • Mutually exclusive: either line 13 is O(1) or line 15 doesn't execute

Cases:

  1. Single element (min == max): Set both to NIL
  2. Base case (u = 2): Set min = max = remaining element
  3. x == min: Replace min with the smallest element from clusters, then delete that element
  4. Cluster becomes empty after deletion: Remove cluster from summary; update max if needed
  5. Cluster still non-empty: Update max if x was the max

Summary of Running Times

OperationBit VectorBinary Tree√u-TreeProto-vEBvEB Tree
MEMBERO(1)O(1)O(1)O(lg lg u)O(lg lg u)
INSERTO(1)O(lg u)O(1)Θ(lg u)O(lg lg u)
DELETEO(1)O(lg u)O(√u)Θ(lg u)O(lg lg u)
MINIMUMΘ(u)O(lg u)O(√u)Θ(lg u)O(1)
MAXIMUMΘ(u)O(lg u)O(√u)Θ(lg u)O(1)
SUCCESSORΘ(u)O(lg u)O(√u)Θ(lg u lg lg u)O(lg lg u)
PREDECESSORΘ(u)O(lg u)O(√u)Θ(lg u lg lg u)O(lg lg u)

Space: O(u) | Creation time: O(u)


Key Takeaways

  1. vEB trees exploit integer keys in a bounded universe to beat the Ω(lg n) comparison-based barrier
  2. The recursive √u decomposition is the central structural idea — halving the number of bits at each level yields lg lg u depth
  3. Storing min separately (outside clusters) is the critical trick that reduces two recursive calls to one
  4. Each operation makes at most one recursive call on a sub-problem of size at most ⌈√u⌉, giving T(u) = T(⌈√u⌉) + O(1) = O(lg lg u)
  5. The trade-off: O(u) space and creation time, so vEB trees are best when u is manageable and many operations are performed