B Trees
Introduction to AlgorithmsChapter 18: B-Trees
B-trees are balanced search trees designed to work well on disks or other direct-access secondary storage devices. Many database systems use B-trees, or variants of B-trees, to store information.
- B-tree nodes may have many children (from a few to thousands) — the branching factor can be very large
- Every n-node B-tree has height O(lg n), but the exact height can be considerably less than a red-black tree because the base of the logarithm is much larger
- B-trees generalize binary search trees: an internal node x with x.n keys has x.n + 1 children
- Keys in a node serve as dividing points separating the range of keys into subranges, each handled by one child
Data Structures on Secondary Storage
- Primary memory (main memory): silicon memory chips — fast but expensive
- Secondary storage (magnetic disks): cheaper, higher capacity, but much slower due to moving mechanical parts
- Two components of disk access time: platter rotation and arm movement
- Disks read/write in units of pages (typically 2¹¹ to 2¹⁴ bytes)
- Accessing a page and reading it from disk often takes longer than examining all the information read
Running Time Components
For B-tree algorithms, we separately account for:
- Number of disk accesses (measured in pages read/written)
- CPU (computing) time
Disk Operation Model
- B-tree algorithms copy selected pages from disk into main memory as needed
- Only a constant number of pages are kept in main memory at any time
- DISK-READ(x): read object x into main memory (no-op if already in memory)
- DISK-WRITE(x): save changes made to object x back to disk
Typical usage pattern:
x = a pointer to some object
DISK-READ(x)
operations that access and/or modify the attributes of x
DISK-WRITE(x) // omitted if no attributes of x were changed
other operations that access but do not modify attributes of x
Why B-Trees Are Powerful
- A B-tree node is usually as large as a whole disk page
- Branching factors between 50 and 2000 are common in practice
- A B-tree with branching factor 1001 and height 2 can store over one billion keys, requiring at most two disk accesses to find any key (keeping the root in memory)
Definition of B-Trees
A B-tree T is a rooted tree (whose root is T.root) having the following properties:
-
Every node x has the following attributes:
- x.n — the number of keys currently stored in node x
- The x.n keys themselves: x.key₁ ≤ x.key₂ ≤ … ≤ x.key_(x.n), stored in nondecreasing order
- x.leaf — a boolean value: TRUE if x is a leaf, FALSE if x is an internal node
-
Each internal node x also contains x.n + 1 pointers x.c₁, x.c₂, …, x.c_(x.n+1) to its children. Leaf nodes have no children.
-
Key separation property: The keys x.keyᵢ separate the ranges of keys stored in each subtree. If kᵢ is any key stored in the subtree rooted at x.cᵢ, then:
- k₁ ≤ x.key₁ ≤ k₂ ≤ x.key₂ ≤ … ≤ x.key_(x.n) ≤ k_(x.n+1)
-
All leaves have the same depth, which is the tree's height h.
-
Bounds on the number of keys (expressed in terms of a fixed integer t ≥ 2 called the minimum degree):
- Every node other than the root must have at least t − 1 keys (thus at least t children for internal nodes). The root must have at least 1 key (if the tree is nonempty).
- Every node may contain at most 2t − 1 keys (thus at most 2t children for internal nodes). A node is full if it contains exactly 2t − 1 keys.
Special Case: t = 2
When t = 2, every internal node has 2, 3, or 4 children — this is a 2-3-4 tree.
Variant: B⁺-Tree
A B⁺-tree stores all satellite information in the leaves and stores only keys and child pointers in internal nodes, maximizing the branching factor of internal nodes.
Height of a B-Tree (Theorem 18.1)
If n ≥ 1, then for any n-key B-tree T of height h and minimum degree t ≥ 2:
h ≤ log_t((n + 1) / 2)
Proof sketch:
- The root contains at least 1 key; all other nodes contain at least t − 1 keys
- At depth 1: at least 2 nodes
- At depth 2: at least 2t nodes
- At depth d: at least 2t^(d−1) nodes
- Total keys: n ≥ 1 + 2(t − 1)·((t^h − 1)/(t − 1)) = 2t^h − 1
- Therefore: t^h ≤ (n + 1)/2, giving h ≤ log_t((n + 1)/2)
Key insight: B-trees save a factor of about lg t over red-black trees in the number of nodes examined, avoiding a substantial number of disk accesses.
Basic Operations on B-Trees
Conventions
- The root of the B-tree is always in main memory (no DISK-READ needed for root; DISK-WRITE required when root changes)
- Any node passed as a parameter has already had DISK-READ performed on it
- All procedures are one-pass algorithms that proceed downward from the root without backing up
Searching a B-Tree
Searching is a generalization of binary search tree search — at each internal node x, we make an (x.n + 1)-way branching decision.
B-TREE-SEARCH returns the ordered pair (y, i) such that y.keyᵢ = k, or NIL if k is not found. Top-level call: B-TREE-SEARCH(T.root, k).
B-TREE-SEARCH(x, k)
1 i = 1
2 while i ≤ x.n and k > x.keyᵢ
3 i = i + 1
4 if i ≤ x.n and k == x.keyᵢ
5 return (x, i)
6 elseif x.leaf
7 return NIL
8 else DISK-READ(x.cᵢ)
9 return B-TREE-SEARCH(x.cᵢ, k)
Analysis:
- Disk accesses: O(h) = O(log_t n)
- CPU time per node: O(t) (linear search through keys)
- Total CPU time: O(t·h) = O(t · log_t n)
Creating an Empty B-Tree
B-TREE-CREATE(T)
1 x = ALLOCATE-NODE()
2 x.leaf = TRUE
3 x.n = 0
4 DISK-WRITE(x)
5 T.root = x
Analysis: O(1) disk operations and O(1) CPU time.
Inserting a Key into a B-Tree
Insertion is more complicated than in a binary search tree:
- We cannot simply create a new leaf node — we must insert into an existing leaf node
- If a leaf is full, we must split it before inserting
- We use a single-pass approach: split every full node encountered on the way down, so we never need to back up
Splitting a Node
B-TREE-SPLIT-CHILD takes a nonfull internal node x and an index i such that x.cᵢ is a full child. It splits the full child around its median key into two nodes with t − 1 keys each, and moves the median key up into the parent.
- Splitting is the only means by which the tree grows in height
- To split a full root, first make it a child of a new empty root node
B-TREE-SPLIT-CHILD(x, i)
1 z = ALLOCATE-NODE()
2 y = x.cᵢ
3 z.leaf = y.leaf
4 z.n = t - 1
5 for j = 1 to t - 1
6 z.keyⱼ = y.key_(j+t)
7 if not y.leaf
8 for j = 1 to t
9 z.cⱼ = y.c_(j+t)
10 y.n = t - 1
11 for j = x.n + 1 downto i + 1
12 x.c_(j+1) = x.cⱼ
13 x.c_(i+1) = z
14 for j = x.n downto i
15 x.key_(j+1) = x.keyⱼ
16 x.keyᵢ = y.key_t
17 x.n = x.n + 1
18 DISK-WRITE(y)
19 DISK-WRITE(z)
20 DISK-WRITE(x)
What happens:
- Node y (the full child with 2t − 1 keys) is split
- z gets the largest t − 1 keys (and t children if not a leaf) from y
- y is reduced to t − 1 keys
- The median key (y.key_t) moves up into parent x
- z becomes a new child of x, positioned just after y
Analysis: Θ(t) CPU time, O(1) disk operations.
Inserting into the B-Tree
B-TREE-INSERT(T, k)
1 r = T.root
2 if r.n == 2t - 1
3 s = ALLOCATE-NODE()
4 T.root = s
5 s.leaf = FALSE
6 s.n = 0
7 s.c₁ = r
8 B-TREE-SPLIT-CHILD(s, 1)
9 B-TREE-INSERT-NONFULL(s, k)
10 else B-TREE-INSERT-NONFULL(r, k)
Key point: Lines 3–9 handle the case where the root is full — the root splits and a new node becomes the root. A B-tree increases in height at the top, not at the bottom.
Inserting into a Nonfull Node
B-TREE-INSERT-NONFULL(x, k)
1 i = x.n
2 if x.leaf
3 while i ≥ 1 and k < x.keyᵢ
4 x.key_(i+1) = x.keyᵢ
5 i = i - 1
6 x.key_(i+1) = k
7 x.n = x.n + 1
8 DISK-WRITE(x)
9 else while i ≥ 1 and k < x.keyᵢ
10 i = i - 1
11 i = i + 1
12 DISK-READ(x.cᵢ)
13 if x.cᵢ.n == 2t - 1
14 B-TREE-SPLIT-CHILD(x, i)
15 if k > x.keyᵢ
16 i = i + 1
17 B-TREE-INSERT-NONFULL(x.cᵢ, k)
How it works:
- If x is a leaf (lines 2–8): insert key k directly by shifting keys to make room
- If x is internal (lines 9–17):
- Find the correct child to descend to
- If that child is full, split it first (line 14)
- After splitting, determine which of the two resulting children to descend to (lines 15–16)
- Recurse on the appropriate (nonfull) child
Analysis:
- Disk accesses: O(h)
- CPU time: O(t·h) = O(t · log_t n)
- Since B-TREE-INSERT-NONFULL is tail-recursive, it can be implemented as a while loop, requiring only O(1) pages in main memory at any time
Deleting a Key from a B-Tree
Deletion is analogous to insertion but more complicated because:
- A key can be deleted from any node (not just a leaf)
- Deleting from an internal node requires rearranging children
- We must ensure no node becomes too small (fewer than t − 1 keys, except the root)
Design principle: When B-TREE-DELETE recurses on a node x, it guarantees x has at least t keys (one more than the minimum). This allows deletion in a single downward pass without backing up.
Root shrinking: If the root ever becomes an internal node with no keys (possible in cases 2c and 3b), delete the root and make its only child the new root, decreasing the tree height by one.
Case 1: Key k is in a Leaf Node x
Simply delete k from x.
Case 2: Key k is in an Internal Node x
-
Case 2a: If the child y that precedes k in node x has at least t keys:
- Find the predecessor k' of k in the subtree rooted at y
- Recursively delete k' and replace k by k' in x
-
Case 2b: If y has fewer than t keys, examine the child z that follows k in node x. If z has at least t keys:
- Find the successor k' of k in the subtree rooted at z
- Recursively delete k' and replace k by k' in x
-
Case 2c: If both y and z have only t − 1 keys:
- Merge k and all of z into y (y now contains 2t − 1 keys)
- Free z
- Recursively delete k from y
Case 3: Key k is NOT in Internal Node x
Determine the child x.cᵢ whose subtree must contain k. If x.cᵢ has only t − 1 keys, execute 3a or 3b to ensure we descend to a node with at least t keys:
-
Case 3a: If x.cᵢ has only t − 1 keys but has an immediate sibling with at least t keys:
- Move a key from x down into x.cᵢ
- Move a key from the sibling up into x
- Move the appropriate child pointer from the sibling into x.cᵢ
- (This is a rotation)
-
Case 3b: If x.cᵢ and both of its immediate siblings have t − 1 keys:
- Merge x.cᵢ with one sibling
- Move a key from x down into the new merged node as its median key
Then recurse on the appropriate child of x.
Deletion Analysis
- Disk accesses: O(h)
- CPU time: O(t·h) = O(t · log_t n)
- Most deletions in practice occur at leaves (single downward pass)
- Deletion from an internal node (cases 2a, 2b) may require returning to replace the key with its predecessor/successor
Previous chapter
Amortized AnalysisNext chapter
Fibonacci Heaps