Pagefy

Book

Introduction to Algorithms

1.The Role of Algorithms in Computing 2.Getting Started 3.Growth of Functions 4.Divide and Conquer 5.Probabilistic Analysis and Randomized Algorithms 6.Heapsort 7.Quicksort 8.Sorting in Linear Time 9.Medians and Order Statistics 10.Elementary Data Structures 11.Hash Tables 12.Binary Search Trees 13.Red Black Trees 14.Augmenting Data Structures 15.Dynamic Programming 16.Greedy Algorithms 17.Amortized Analysis 18.B Trees 19.Fibonacci Heaps 20.van Emde Boas Trees 21.Data Structures for Disjoint Sets 22.Elementary Graph Algorithms 23.Minimum Spanning Trees 24.Single Source Shortest Paths 25.All Pairs Shortest Paths 26.Maximum Flow 27.Multithreaded Algorithms 28.Matrix Operations 29.Linear Programming 30.Polynomials and the FFT 31.Number Theoretic Algorithms 32.String Matching 33.Computational Geometry 34.NP Completeness 35.Approximation Algorithms 36.Appendix A Summations 37.Appendix B Sets Etc 38.Appendix C Counting and Probability 39.Appendix D Matrices

B Trees

Introduction to Algorithms

Chapter 18: B-Trees

B-trees are balanced search trees designed to work well on disks or other direct-access secondary storage devices. Many database systems use B-trees, or variants of B-trees, to store information.

B-tree nodes may have many children (from a few to thousands) — the branching factor can be very large
Every n-node B-tree has height O(lg n), but the exact height can be considerably less than a red-black tree because the base of the logarithm is much larger
B-trees generalize binary search trees: an internal node x with x.n keys has x.n + 1 children
Keys in a node serve as dividing points separating the range of keys into subranges, each handled by one child

Data Structures on Secondary Storage

Primary memory (main memory): silicon memory chips — fast but expensive
Secondary storage (magnetic disks): cheaper, higher capacity, but much slower due to moving mechanical parts
Two components of disk access time: platter rotation and arm movement
Disks read/write in units of pages (typically 2¹¹ to 2¹⁴ bytes)
Accessing a page and reading it from disk often takes longer than examining all the information read

Running Time Components

For B-tree algorithms, we separately account for:

Number of disk accesses (measured in pages read/written)
CPU (computing) time

Disk Operation Model

B-tree algorithms copy selected pages from disk into main memory as needed
Only a constant number of pages are kept in main memory at any time
DISK-READ(x): read object x into main memory (no-op if already in memory)
DISK-WRITE(x): save changes made to object x back to disk

Typical usage pattern:

x = a pointer to some object
DISK-READ(x)
operations that access and/or modify the attributes of x
DISK-WRITE(x)       // omitted if no attributes of x were changed
other operations that access but do not modify attributes of x

Why B-Trees Are Powerful

A B-tree node is usually as large as a whole disk page
Branching factors between 50 and 2000 are common in practice
A B-tree with branching factor 1001 and height 2 can store over one billion keys, requiring at most two disk accesses to find any key (keeping the root in memory)

Definition of B-Trees

A B-tree T is a rooted tree (whose root is T.root) having the following properties:

Every node x has the following attributes:
- x.n — the number of keys currently stored in node x
- The x.n keys themselves: x.key₁ ≤ x.key₂ ≤ … ≤ x.key_(x.n), stored in nondecreasing order
- x.leaf — a boolean value: TRUE if x is a leaf, FALSE if x is an internal node
Each internal node x also contains x.n + 1 pointers x.c₁, x.c₂, …, x.c_(x.n+1) to its children. Leaf nodes have no children.
Key separation property: The keys x.keyᵢ separate the ranges of keys stored in each subtree. If kᵢ is any key stored in the subtree rooted at x.cᵢ, then:
- k₁ ≤ x.key₁ ≤ k₂ ≤ x.key₂ ≤ … ≤ x.key_(x.n) ≤ k_(x.n+1)
All leaves have the same depth, which is the tree's height h.
Bounds on the number of keys (expressed in terms of a fixed integer t ≥ 2 called the minimum degree):
- Every node other than the root must have at least t − 1 keys (thus at least t children for internal nodes). The root must have at least 1 key (if the tree is nonempty).
- Every node may contain at most 2t − 1 keys (thus at most 2t children for internal nodes). A node is full if it contains exactly 2t − 1 keys.

Special Case: t = 2

When t = 2, every internal node has 2, 3, or 4 children — this is a 2-3-4 tree.

Variant: B⁺-Tree

A B⁺-tree stores all satellite information in the leaves and stores only keys and child pointers in internal nodes, maximizing the branching factor of internal nodes.

Height of a B-Tree (Theorem 18.1)

If n ≥ 1, then for any n-key B-tree T of height h and minimum degree t ≥ 2:

h ≤ log_t((n + 1) / 2)

Proof sketch:

The root contains at least 1 key; all other nodes contain at least t − 1 keys
At depth 1: at least 2 nodes
At depth 2: at least 2t nodes
At depth d: at least 2t^(d−1) nodes
Total keys: n ≥ 1 + 2(t − 1)·((t^h − 1)/(t − 1)) = 2t^h − 1
Therefore: t^h ≤ (n + 1)/2, giving h ≤ log_t((n + 1)/2)

Key insight: B-trees save a factor of about lg t over red-black trees in the number of nodes examined, avoiding a substantial number of disk accesses.

Basic Operations on B-Trees

Conventions

The root of the B-tree is always in main memory (no DISK-READ needed for root; DISK-WRITE required when root changes)
Any node passed as a parameter has already had DISK-READ performed on it
All procedures are one-pass algorithms that proceed downward from the root without backing up

Searching a B-Tree

Searching is a generalization of binary search tree search — at each internal node x, we make an (x.n + 1)-way branching decision.

B-TREE-SEARCH returns the ordered pair (y, i) such that y.keyᵢ = k, or NIL if k is not found. Top-level call: B-TREE-SEARCH(T.root, k).

B-TREE-SEARCH(x, k)
1   i = 1
2   while i ≤ x.n and k > x.keyᵢ
3       i = i + 1
4   if i ≤ x.n and k == x.keyᵢ
5       return (x, i)
6   elseif x.leaf
7       return NIL
8   else DISK-READ(x.cᵢ)
9       return B-TREE-SEARCH(x.cᵢ, k)

Analysis:

Disk accesses: O(h) = O(log_t n)
CPU time per node: O(t) (linear search through keys)
Total CPU time: O(t·h) = O(t · log_t n)

Creating an Empty B-Tree

B-TREE-CREATE(T)
1   x = ALLOCATE-NODE()
2   x.leaf = TRUE
3   x.n = 0
4   DISK-WRITE(x)
5   T.root = x

Analysis: O(1) disk operations and O(1) CPU time.

Inserting a Key into a B-Tree

Insertion is more complicated than in a binary search tree:

We cannot simply create a new leaf node — we must insert into an existing leaf node
If a leaf is full, we must split it before inserting
We use a single-pass approach: split every full node encountered on the way down, so we never need to back up

Splitting a Node

B-TREE-SPLIT-CHILD takes a nonfull internal node x and an index i such that x.cᵢ is a full child. It splits the full child around its median key into two nodes with t − 1 keys each, and moves the median key up into the parent.

Splitting is the only means by which the tree grows in height
To split a full root, first make it a child of a new empty root node

B-TREE-SPLIT-CHILD(x, i)
1   z = ALLOCATE-NODE()
2   y = x.cᵢ
3   z.leaf = y.leaf
4   z.n = t - 1
5   for j = 1 to t - 1
6       z.keyⱼ = y.key_(j+t)
7   if not y.leaf
8       for j = 1 to t
9           z.cⱼ = y.c_(j+t)
10  y.n = t - 1
11  for j = x.n + 1 downto i + 1
12      x.c_(j+1) = x.cⱼ
13  x.c_(i+1) = z
14  for j = x.n downto i
15      x.key_(j+1) = x.keyⱼ
16  x.keyᵢ = y.key_t
17  x.n = x.n + 1
18  DISK-WRITE(y)
19  DISK-WRITE(z)
20  DISK-WRITE(x)

What happens:

Node y (the full child with 2t − 1 keys) is split
z gets the largest t − 1 keys (and t children if not a leaf) from y
y is reduced to t − 1 keys
The median key (y.key_t) moves up into parent x
z becomes a new child of x, positioned just after y

Analysis: Θ(t) CPU time, O(1) disk operations.

Inserting into the B-Tree

B-TREE-INSERT(T, k)
1   r = T.root
2   if r.n == 2t - 1
3       s = ALLOCATE-NODE()
4       T.root = s
5       s.leaf = FALSE
6       s.n = 0
7       s.c₁ = r
8       B-TREE-SPLIT-CHILD(s, 1)
9       B-TREE-INSERT-NONFULL(s, k)
10  else B-TREE-INSERT-NONFULL(r, k)

Key point: Lines 3–9 handle the case where the root is full — the root splits and a new node becomes the root. A B-tree increases in height at the top, not at the bottom.

Inserting into a Nonfull Node

B-TREE-INSERT-NONFULL(x, k)
1   i = x.n
2   if x.leaf
3       while i ≥ 1 and k < x.keyᵢ
4           x.key_(i+1) = x.keyᵢ
5           i = i - 1
6       x.key_(i+1) = k
7       x.n = x.n + 1
8       DISK-WRITE(x)
9   else while i ≥ 1 and k < x.keyᵢ
10          i = i - 1
11      i = i + 1
12      DISK-READ(x.cᵢ)
13      if x.cᵢ.n == 2t - 1
14          B-TREE-SPLIT-CHILD(x, i)
15          if k > x.keyᵢ
16              i = i + 1
17      B-TREE-INSERT-NONFULL(x.cᵢ, k)

How it works:

If x is a leaf (lines 2–8): insert key k directly by shifting keys to make room
If x is internal (lines 9–17):
- Find the correct child to descend to
- If that child is full, split it first (line 14)
- After splitting, determine which of the two resulting children to descend to (lines 15–16)
- Recurse on the appropriate (nonfull) child

Analysis:

Disk accesses: O(h)
CPU time: O(t·h) = O(t · log_t n)
Since B-TREE-INSERT-NONFULL is tail-recursive, it can be implemented as a while loop, requiring only O(1) pages in main memory at any time

Deleting a Key from a B-Tree

Deletion is analogous to insertion but more complicated because:

A key can be deleted from any node (not just a leaf)
Deleting from an internal node requires rearranging children
We must ensure no node becomes too small (fewer than t − 1 keys, except the root)

Design principle: When B-TREE-DELETE recurses on a node x, it guarantees x has at least t keys (one more than the minimum). This allows deletion in a single downward pass without backing up.

Root shrinking: If the root ever becomes an internal node with no keys (possible in cases 2c and 3b), delete the root and make its only child the new root, decreasing the tree height by one.

Case 1: Key k is in a Leaf Node x

Simply delete k from x.

Case 2: Key k is in an Internal Node x

Case 2a: If the child y that precedes k in node x has at least t keys:
- Find the predecessor k' of k in the subtree rooted at y
- Recursively delete k' and replace k by k' in x
Case 2b: If y has fewer than t keys, examine the child z that follows k in node x. If z has at least t keys:
- Find the successor k' of k in the subtree rooted at z
- Recursively delete k' and replace k by k' in x
Case 2c: If both y and z have only t − 1 keys:
- Merge k and all of z into y (y now contains 2t − 1 keys)
- Free z
- Recursively delete k from y

Case 3: Key k is NOT in Internal Node x

Determine the child x.cᵢ whose subtree must contain k. If x.cᵢ has only t − 1 keys, execute 3a or 3b to ensure we descend to a node with at least t keys:

Case 3a: If x.cᵢ has only t − 1 keys but has an immediate sibling with at least t keys:
- Move a key from x down into x.cᵢ
- Move a key from the sibling up into x
- Move the appropriate child pointer from the sibling into x.cᵢ
- (This is a rotation)
Case 3b: If x.cᵢ and both of its immediate siblings have t − 1 keys:
- Merge x.cᵢ with one sibling
- Move a key from x down into the new merged node as its median key

Then recurse on the appropriate child of x.

Deletion Analysis

Disk accesses: O(h)
CPU time: O(t·h) = O(t · log_t n)
Most deletions in practice occur at leaves (single downward pass)
Deletion from an internal node (cases 2a, 2b) may require returning to replace the key with its predecessor/successor

Previous chapter

Amortized Analysis

Next chapter

Fibonacci Heaps