Binary Search Trees
Introduction to AlgorithmsChapter 12: Binary Search Trees
The search tree data structure supports many dynamic-set operations: SEARCH, MINIMUM, MAXIMUM, PREDECESSOR, SUCCESSOR, INSERT, and DELETE. It can serve as both a dictionary and a priority queue.
- Basic operations take time proportional to the height of the tree.
- For a complete binary tree with n nodes: operations run in Θ(lg n) worst-case time.
- For a linear chain of n nodes: operations take Θ(n) worst-case time.
- The expected height of a randomly built BST is O(lg n), so basic operations take Θ(lg n) time on average.
12.1 What is a Binary Search Tree?
A binary search tree (BST) is organized as a binary tree represented by a linked data structure. Each node is an object containing:
- key (and satellite data)
- left — pointer to left child
- right — pointer to right child
- p — pointer to parent (NIL for the root)
The Binary-Search-Tree Property
Let x be a node in a binary search tree. If y is a node in the left subtree of x, then y.key ≤ x.key. If y is a node in the right subtree of x, then y.key ≥ x.key.
This property allows us to print all keys in sorted order using a simple recursive algorithm.
Tree Walks
- Inorder tree walk — prints the root between its left and right subtrees (produces sorted output)
- Preorder tree walk — prints the root before either subtree
- Postorder tree walk — prints the root after both subtrees
INORDER-TREE-WALK(x)
1 if x ≠ NIL
2 INORDER-TREE-WALK(x.left)
3 print x.key
4 INORDER-TREE-WALK(x.right)
Theorem 12.1: If x is the root of an n-node subtree, then INORDER-TREE-WALK(x) takes Θ(n) time.
- The procedure calls itself recursively exactly twice for each node (once for left child, once for right child).
- Proof uses the substitution method: T(n) ≤ T(k) + T(n − k − 1) + d, showing T(n) = O(n).
12.2 Querying a Binary Search Tree
All query operations run in O(h) time on a tree of height h.
Searching
TREE-SEARCH(x, k)
1 if x == NIL or k == x.key
2 return x
3 if k < x.key
4 return TREE-SEARCH(x.left, k)
5 else return TREE-SEARCH(x.right, k)
- Begins at the root and traces a simple path downward.
- Compares k with each node's key, going left if k is smaller, right if larger.
- Running time: O(h).
Iterative version (generally more efficient in practice):
ITERATIVE-TREE-SEARCH(x, k)
1 while x ≠ NIL and k ≠ x.key
2 if k < x.key
3 x = x.left
4 else x = x.right
5 return x
Minimum and Maximum
TREE-MINIMUM(x)
1 while x.left ≠ NIL
2 x = x.left
3 return x
TREE-MAXIMUM(x)
1 while x.right ≠ NIL
2 x = x.right
3 return x
- Minimum: follow left child pointers from the root until NIL.
- Maximum: follow right child pointers from the root until NIL.
- Both run in O(h) time.
Successor and Predecessor
The successor of node x is the node with the smallest key greater than x.key.
TREE-SUCCESSOR(x)
1 if x.right ≠ NIL
2 return TREE-MINIMUM(x.right)
3 y = x.p
4 while y ≠ NIL and x == y.right
5 x = y
6 y = y.p
7 return y
Two cases:
- Right subtree is nonempty: successor is the leftmost node in the right subtree (via TREE-MINIMUM).
- Right subtree is empty: successor is the lowest ancestor of x whose left child is also an ancestor of x. Walk up the tree until we find a node that is a left child of its parent.
- TREE-PREDECESSOR is symmetric.
- Both run in O(h) time.
Theorem 12.2: SEARCH, MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR each run in O(h) time on a BST of height h.
12.3 Insertion and Deletion
Insertion
TREE-INSERT(T, z)
1 y = NIL
2 x = T.root
3 while x ≠ NIL
4 y = x
5 if z.key < x.key
6 x = x.left
7 else x = x.right
8 z.p = y
9 if y == NIL
10 T.root = z // tree T was empty
11 elseif z.key < y.key
12 y.left = z
13 else y.right = z
- Traces a path downward from the root, maintaining a trailing pointer y as the parent of x.
- When x becomes NIL, that is the position where z should be inserted.
- Lines 8–13 set the appropriate pointers to link z into the tree.
- Running time: O(h).
Deletion
Deleting node z from BST T has three conceptual cases:
- z has no children — simply remove z (replace with NIL).
- z has one child — elevate that child to take z's position.
- z has two children — find z's successor y (in z's right subtree), and have y replace z.
The implementation uses four cases (based on subtree structure):
- (a) z has no left child → replace z by its right child.
- (b) z has a left child but no right child → replace z by its left child.
- (c) z has two children and y (successor) is z's right child → replace z by y, keeping y's right child.
- (d) z has two children and y is deeper in z's right subtree → replace y by its own right child, then replace z by y.
TRANSPLANT Subroutine
Replaces the subtree rooted at u with the subtree rooted at v:
TRANSPLANT(T, u, v)
1 if u.p == NIL
2 T.root = v
3 elseif u == u.p.left
4 u.p.left = v
5 else u.p.right = v
6 if v ≠ NIL
7 v.p = u.p
- Updates u's parent to point to v instead.
- Does not update v.left or v.right — that is the caller's responsibility.
TREE-DELETE
TREE-DELETE(T, z)
1 if z.left == NIL
2 TRANSPLANT(T, z, z.right)
3 elseif z.right == NIL
4 TRANSPLANT(T, z, z.left)
5 else y = TREE-MINIMUM(z.right)
6 if y.p ≠ z
7 TRANSPLANT(T, y, y.right)
8 y.right = z.right
9 y.right.p = y
10 TRANSPLANT(T, z, y)
11 y.left = z.left
12 y.left.p = y
- Lines 1–2: no left child → replace z with right child.
- Lines 3–4: left child only → replace z with left child.
- Lines 5–12: two children → find successor y via TREE-MINIMUM(z.right).
- Lines 6–9: if y is not z's direct right child, first splice y out and fix up the right subtree.
- Lines 10–12: replace z by y and attach z's left subtree to y.
- Running time: O(h) (all lines are constant time except the TREE-MINIMUM call).
Theorem 12.3: INSERT and DELETE each run in O(h) time on a BST of height h.
12.4 Randomly Built Binary Search Trees
The height of a BST varies as items are inserted and deleted:
- Worst case: inserting n items in strictly increasing order produces a chain of height n − 1.
- Best case: height is ⌊lg n⌋.
Definition
A randomly built binary search tree on n keys arises from inserting the keys in random order into an initially empty tree, where each of the n! permutations of input keys is equally likely.
Note: This is different from assuming every BST on n keys is equally likely.
Theorem 12.4
The expected height of a randomly built binary search tree on n distinct keys is O(lg n).
Proof Outline
Define three random variables:
- X_n = height of a randomly built BST on n keys
- Y_n = 2^{X_n} = exponential height
- R_n = rank of the key chosen as root (equally likely to be any value in {1, 2, ..., n})
Key relationships:
- If R_n = i, the left subtree has i − 1 keys and the right subtree has n − i keys.
- Y_n = 2 · max(Y_{i−1}, Y_{n−i})
- Base cases: Y_1 = 1, Y_0 = 0.
Using indicator random variables Z_{n,i} = I{R_n = i}:
- E[Z_{n,i}] = 1/n
- Z_{n,i} is independent of Y_{i−1} and Y_{n−i}
Deriving the recurrence:
$$E[Y_n] = \frac{2}{n} \sum_{i=1}^{n} E[\max(Y_{i-1}, Y_{n-i})]$$
Since max(a, b) ≤ a + b:
$$E[Y_n] \leq \frac{4}{n} \sum_{i=0}^{n-1} E[Y_i]$$
Solving by substitution: Show E[Y_n] ≤ (1/4) · C(n+3, 3) using the identity:
$$\sum_{i=0}^{n-1} \binom{i+3}{3} = \binom{n+3}{4}$$
Final step: Since f(x) = 2^x is convex, apply Jensen's inequality:
$$2^{E[X_n]} \leq E[2^{X_n}] = E[Y_n] \leq \frac{1}{4}\binom{n+3}{3} = O(n^3)$$
Taking logarithms: E[X_n] = O(lg n). ∎
Summary of Running Times
| Operation | Running Time |
|---|---|
| INORDER-TREE-WALK | Θ(n) |
| SEARCH | O(h) |
| MINIMUM | O(h) |
| MAXIMUM | O(h) |
| SUCCESSOR | O(h) |
| PREDECESSOR | O(h) |
| INSERT | O(h) |
| DELETE | O(h) |
Where h = height of the tree:
- Worst case: h = n − 1 (linear chain)
- Best case: h = ⌊lg n⌋ (balanced tree)
- Average case (randomly built): h = O(lg n)
Previous chapter
Hash TablesNext chapter
Red Black Trees