van Emde Boas Trees
Introduction to AlgorithmsChapter 20: van Emde Boas Trees
Motivation
- Previous data structures (binary heaps, red-black trees, Fibonacci heaps) require at least Ω(lg n) time for at least one priority-queue operation
- The Ω(n lg n) sorting lower bound implies this when decisions are comparison-based
- When keys are integers in a bounded range, we can circumvent this lower bound (as with counting sort)
- van Emde Boas (vEB) trees support all dynamic set operations in O(lg lg u) worst-case time
- Keys must be integers in the range 0 to u − 1, with no duplicates
Supported Operations
All in O(lg lg u) time:
- MEMBER, INSERT, DELETE
- MINIMUM, MAXIMUM
- SUCCESSOR, PREDECESSOR
Key Definitions
- n = number of elements currently in the set
- u = universe size (range of possible values: {0, 1, 2, ..., u − 1})
- Assume u = 2^k for some integer k ≥ 1
Preliminary Approaches
Direct Addressing (Bit Vector)
- Store a bit vector A[0 .. u − 1]: A[x] = 1 if x is in the set
- O(1) time for INSERT, DELETE, MEMBER
- Θ(u) time for MINIMUM, MAXIMUM, SUCCESSOR, PREDECESSOR (must scan)
Superimposing a Binary Tree
- Overlay a binary tree of bits on top of the bit vector
- Each internal node stores the logical-OR of its two children
- A node contains 1 iff some leaf in its subtree contains 1
Operations using the binary tree:
- MINIMUM: From root, always go to the leftmost child containing a 1
- MAXIMUM: From root, always go to the rightmost child containing a 1
- SUCCESSOR(x): Go up from leaf x until entering a node from the left that has a 1 in its right child; then descend that right subtree taking leftmost 1s
- PREDECESSOR(x): Symmetric — go up until entering from the right with a 1 in the left child; descend taking rightmost 1s
- INSERT(x): Set 1 in every node on the path from leaf x to root
- DELETE(x): Go from leaf x to root, recomputing each node as logical-OR of its children
Running time: All operations O(lg u) (tree height is lg u)
Superimposing a Tree of Constant Height
- Assume u = 2^(2k), so √u is an integer
- Use a tree of degree √u instead of degree 2 → tree height is always 2
- The √u internal nodes at depth 1 form a summary[0 .. √u − 1] array
- summary[i] = 1 iff the i-th cluster (subarray A[i√u .. (i+1)√u − 1]) contains a 1
Key indexing: For a value x:
- Cluster number = ⌊x / √u⌋
- Position within cluster = x mod √u
Operations:
- INSERT(x): Set A[x] = 1 and summary[⌊x/√u⌋] = 1 → O(1)
- MINIMUM/MAXIMUM: Find leftmost/rightmost 1 in summary, then search within that cluster → O(√u)
- SUCCESSOR(x): Search right within x's cluster; if not found, search right in summary, then find minimum in that cluster → O(√u)
- DELETE(x): Set A[x] = 0, recompute summary[⌊x/√u⌋] as OR of the cluster → O(√u)
The degree-√u tree idea is the key insight behind vEB trees, even though O(√u) is worse than O(lg u).
A Recursive Structure
The Core Recurrence
The target recurrence for O(lg lg u) operations:
T(u) = T(√u) + O(1)
Proof that T(u) = O(lg lg u):
- Let m = lg u, so u = 2^m
- T(2^m) = T(2^(m/2)) + O(1)
- Let S(m) = T(2^m): S(m) = S(m/2) + O(1) → S(m) = O(lg m) by master method
- Therefore T(u) = O(lg m) = O(lg lg u)
Intuition: The universe size sequence is u, u^(1/2), u^(1/4), u^(1/8), ... Each level halves the number of bits needed. Starting with lg u bits, after lg(lg u) levels we reach universe size 2.
Helper Functions
For a universe of size u, given a value x (viewed as a lg u-bit integer):
| Function | Definition | Meaning |
|---|---|---|
| high(x) | ⌊x / √u⌋ | Most significant (lg u)/2 bits — cluster number |
| low(x) | x mod √u | Least significant (lg u)/2 bits — position within cluster |
| index(x, y) | x · √u + y | Reconstruct element from cluster number and offset |
Identity: x = index(high(x), low(x))
Proto van Emde Boas Structures
A proto-vEB structure proto-EB(u) is defined recursively:
- Base case (u = 2): Contains an array A[0..1] of two bits
- Recursive case (u ≥ 4, u = 2^(2k)): Contains:
- u: universe size
- summary: pointer to a proto-EB(√u) structure
- cluster[0 .. √u − 1]: array of √u pointers to proto-EB(√u) structures
Element x is stored in cluster[high(x)] at position low(x).
Operations on Proto-vEB
MEMBER — O(lg lg u) ✓
PROTO-VEB-MEMBER(V, x)
if V.u == 2
return V.A[x]
else return PROTO-VEB-MEMBER(V.cluster[high(x)], low(x))
Recurrence: T(u) = T(√u) + O(1) → O(lg lg u) ✓
MINIMUM — Θ(lg u) ✗
PROTO-VEB-MINIMUM(V)
if V.u == 2
if V.A[0] == 1
return 0
elseif V.A[1] == 1
return 1
else return NIL
else min-cluster = PROTO-VEB-MINIMUM(V.summary)
if min-cluster == NIL
return NIL
else offset = PROTO-VEB-MINIMUM(V.cluster[min-cluster])
return index(min-cluster, offset)
Makes two recursive calls → T(u) = 2T(√u) + O(1) → Θ(lg u) ✗
SUCCESSOR — Θ(lg u · lg lg u) ✗
PROTO-VEB-SUCCESSOR(V, x)
if V.u == 2
if x == 0 and V.A[1] == 1
return 1
else return NIL
else offset = PROTO-VEB-SUCCESSOR(V.cluster[high(x)], low(x))
if offset ≠ NIL
return index(high(x), offset)
else succ-cluster = PROTO-VEB-SUCCESSOR(V.summary, high(x))
if succ-cluster == NIL
return NIL
else offset = PROTO-VEB-MINIMUM(V.cluster[succ-cluster])
return index(succ-cluster, offset)
Two recursive calls + one MINIMUM call → T(u) = 2T(√u) + Θ(lg u) → Θ(lg u · lg lg u) ✗
INSERT — Θ(lg u) ✗
PROTO-VEB-INSERT(V, x)
if V.u == 2
V.A[x] = 1
else PROTO-VEB-INSERT(V.cluster[high(x)], low(x))
PROTO-VEB-INSERT(V.summary, high(x))
Two recursive calls → T(u) = 2T(√u) + O(1) → Θ(lg u) ✗
Problem: Proto-vEB makes too many recursive calls. We need to reduce each operation to at most one recursive call.
The van Emde Boas Tree
Handling Non-Square Universe Sizes
When u is an odd power of 2 (u = 2^(2k+1)), √u is not an integer. We define:
| Symbol | Definition | Meaning |
|---|---|---|
| ⌈√u⌉ (upper square root) | 2^⌈(lg u)/2⌉ | Number of clusters |
| ⌊√u⌋ (lower square root) | 2^⌊(lg u)/2⌋ | Universe size of each cluster |
So u = ⌈√u⌉ · ⌊√u⌋. When u is an even power of 2, both equal √u.
Updated helper functions:
- high(x) = ⌊x / ⌊√u⌋⌋
- low(x) = x mod ⌊√u⌋
- index(x, y) = x · ⌊√u⌋ + y
vEB Tree Structure
A vEB tree vEB(u) contains:
- u: universe size
- min: minimum element in the tree (or NIL if empty)
- max: maximum element in the tree (or NIL if empty)
- summary: pointer to a vEB(⌈√u⌉) tree
- cluster[0 .. ⌈√u⌉ − 1]: array of ⌈√u⌉ pointers to vEB(⌊√u⌋) trees
Critical property: The element stored in min does NOT appear in any cluster. The element in max does appear in a cluster (unless min == max, i.e., only one element).
Base case (u = 2): Only uses min and max (no A array, no clusters).
Why min/max Enable O(lg lg u)
The min and max attributes reduce recursive calls in four ways:
- MINIMUM/MAXIMUM need no recursion — just return min or max
- SUCCESSOR: Can check if successor is in x's cluster by comparing x with the cluster's max (no recursive call needed). Symmetric for PREDECESSOR and min
- Empty/singleton detection in O(1): both NIL → empty; equal non-NIL → one element; unequal non-NIL → two or more
- Insert into empty tree and delete from singleton take O(1) — just update min/max
Recurrence for vEB Operations
T(u) ≤ T(⌈√u⌉) + O(1)
Solution: T(u) = O(lg lg u) (same analysis as before; the ceiling doesn't affect asymptotics since ⌈m/2⌉ ≤ 2m/3 for m ≥ 2).
Space and Creation
- Total space: O(u)
- Creating an empty vEB tree: O(u) time
- Only worthwhile when performing enough operations to amortize creation cost
vEB Tree Operations
MINIMUM and MAXIMUM — O(1)
VEB-TREE-MINIMUM(V)
return V.min
VEB-TREE-MAXIMUM(V)
return V.max
MEMBER — O(lg lg u)
VEB-TREE-MEMBER(V, x)
if x == V.min or x == V.max
return TRUE
elseif V.u == 2
return FALSE
else return VEB-TREE-MEMBER(V.cluster[high(x)], low(x))
- First checks min/max directly
- Base case: if u = 2 and x isn't min or max, return FALSE
- Otherwise recurse into the appropriate cluster
SUCCESSOR — O(lg lg u)
VEB-TREE-SUCCESSOR(V, x)
if V.u == 2
if x == 0 and V.max == 1
return 1
else return NIL
elseif V.min ≠ NIL and x < V.min
return V.min
else max-low = VEB-TREE-MAXIMUM(V.cluster[high(x)])
if max-low ≠ NIL and low(x) < max-low
offset = VEB-TREE-SUCCESSOR(V.cluster[high(x)], low(x))
return index(high(x), offset)
else succ-cluster = VEB-TREE-SUCCESSOR(V.summary, high(x))
if succ-cluster == NIL
return NIL
else offset = VEB-TREE-MINIMUM(V.cluster[succ-cluster])
return index(succ-cluster, offset)
Key insight: By checking the cluster's max in O(1), we determine whether to recurse into the cluster or the summary — never both.
Cases:
- Base case (u = 2): Successor of 0 is 1 if present
- x < V.min: Successor is V.min
- Successor in same cluster: low(x) < max of x's cluster → recurse within cluster
- Successor in later cluster: Search summary for next nonempty cluster, return its minimum
PREDECESSOR — O(lg lg u)
VEB-TREE-PREDECESSOR(V, x)
if V.u == 2
if x == 1 and V.min == 0
return 0
else return NIL
elseif V.max ≠ NIL and x > V.max
return V.max
else min-low = VEB-TREE-MINIMUM(V.cluster[high(x)])
if min-low ≠ NIL and low(x) > min-low
offset = VEB-TREE-PREDECESSOR(V.cluster[high(x)], low(x))
return index(high(x), offset)
else pred-cluster = VEB-TREE-PREDECESSOR(V.summary, high(x))
if pred-cluster == NIL
if V.min ≠ NIL and x > V.min
return V.min
else return NIL
else offset = VEB-TREE-MAXIMUM(V.cluster[pred-cluster])
return index(pred-cluster, offset)
Extra case vs. SUCCESSOR: If no predecessor in any earlier cluster, the predecessor might be V.min (since min is not stored in any cluster).
INSERT — O(lg lg u)
VEB-EMPTY-TREE-INSERT(V, x)
V.min = x
V.max = x
VEB-TREE-INSERT(V, x)
if V.min == NIL
VEB-EMPTY-TREE-INSERT(V, x)
else if x < V.min
exchange x with V.min
if V.u > 2
if VEB-TREE-MINIMUM(V.cluster[high(x)]) == NIL
VEB-TREE-INSERT(V.summary, high(x))
VEB-EMPTY-TREE-INSERT(V.cluster[high(x)], low(x))
else VEB-TREE-INSERT(V.cluster[high(x)], low(x))
if x > V.max
V.max = x
Key insight — only one recursive call:
- If the target cluster is empty: recursively insert cluster number into summary (the cluster insert is O(1) via VEB-EMPTY-TREE-INSERT)
- If the target cluster is non-empty: recursively insert into the cluster (no summary update needed)
Cases:
- Empty tree: Just set min = max = x
- x < min: Swap x with min (old min gets inserted into a cluster)
- Cluster empty: Update summary + trivially insert into empty cluster
- Cluster non-empty: Insert into cluster only
- Update max if x > max
DELETE — O(lg lg u)
VEB-TREE-DELETE(V, x)
if V.min == V.max
V.min = NIL
V.max = NIL
elseif V.u == 2
if x == 0
V.min = 1
else V.min = 0
V.max = V.min
else if x == V.min
first-cluster = VEB-TREE-MINIMUM(V.summary)
x = index(first-cluster,
VEB-TREE-MINIMUM(V.cluster[first-cluster]))
V.min = x
VEB-TREE-DELETE(V.cluster[high(x)], low(x))
if VEB-TREE-MINIMUM(V.cluster[high(x)]) == NIL
VEB-TREE-DELETE(V.summary, high(x))
if x == V.max
summary-max = VEB-TREE-MAXIMUM(V.summary)
if summary-max == NIL
V.max = V.min
else V.max = index(summary-max,
VEB-TREE-MAXIMUM(V.cluster[summary-max]))
elseif x == V.max
V.max = index(high(x),
VEB-TREE-MAXIMUM(V.cluster[high(x)]))
Key insight — still only one effective recursive call:
- Line 13 deletes x from its cluster
- Line 15 deletes x's cluster number from summary only if the cluster became empty
- But if the cluster became empty, x was its only element, so line 13 took O(1) time
- Mutually exclusive: either line 13 is O(1) or line 15 doesn't execute
Cases:
- Single element (min == max): Set both to NIL
- Base case (u = 2): Set min = max = remaining element
- x == min: Replace min with the smallest element from clusters, then delete that element
- Cluster becomes empty after deletion: Remove cluster from summary; update max if needed
- Cluster still non-empty: Update max if x was the max
Summary of Running Times
| Operation | Bit Vector | Binary Tree | √u-Tree | Proto-vEB | vEB Tree |
|---|---|---|---|---|---|
| MEMBER | O(1) | O(1) | O(1) | O(lg lg u) | O(lg lg u) |
| INSERT | O(1) | O(lg u) | O(1) | Θ(lg u) | O(lg lg u) |
| DELETE | O(1) | O(lg u) | O(√u) | Θ(lg u) | O(lg lg u) |
| MINIMUM | Θ(u) | O(lg u) | O(√u) | Θ(lg u) | O(1) |
| MAXIMUM | Θ(u) | O(lg u) | O(√u) | Θ(lg u) | O(1) |
| SUCCESSOR | Θ(u) | O(lg u) | O(√u) | Θ(lg u lg lg u) | O(lg lg u) |
| PREDECESSOR | Θ(u) | O(lg u) | O(√u) | Θ(lg u lg lg u) | O(lg lg u) |
Space: O(u) | Creation time: O(u)
Key Takeaways
- vEB trees exploit integer keys in a bounded universe to beat the Ω(lg n) comparison-based barrier
- The recursive √u decomposition is the central structural idea — halving the number of bits at each level yields lg lg u depth
- Storing min separately (outside clusters) is the critical trick that reduces two recursive calls to one
- Each operation makes at most one recursive call on a sub-problem of size at most ⌈√u⌉, giving T(u) = T(⌈√u⌉) + O(1) = O(lg lg u)
- The trade-off: O(u) space and creation time, so vEB trees are best when u is manageable and many operations are performed
Previous chapter
Fibonacci HeapsNext chapter
Data Structures for Disjoint Sets