Heapsort
Introduction to AlgorithmsChapter 6: Heapsort
Heapsort combines the best attributes of merge sort and insertion sort:
- Like merge sort: runs in O(n lg n) time
- Like insertion sort: sorts in place (only a constant number of elements stored outside the input array)
Heapsort introduces the heap data structure, which is useful both for sorting and for implementing an efficient priority queue.
Note: "Heap" here refers to the data structure, not garbage-collected storage used in languages like Java or Lisp.
6.1 Heaps
The (binary) heap is an array object that can be viewed as a nearly complete binary tree.
- The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point
- Each node of the tree corresponds to an element of the array
Array Representation
An array A representing a heap has two attributes:
- A.length — the number of elements in the array
- A.heap-size — how many elements in the heap are stored within
A0 ≤ A.heap-size ≤ A.length- Only elements in
A[1 .. A.heap-size]are valid heap elements
The root of the tree is A[1]. Given the index i of a node:
PARENT(i)
return ⌊i/2⌋
LEFT(i)
return 2i
RIGHT(i)
return 2i + 1
- LEFT: computed by shifting
ileft one bit - RIGHT: computed by shifting
ileft one bit and adding 1 - PARENT: computed by shifting
iright one bit
These are often implemented as macros or inline procedures.
Max-Heaps and Min-Heaps
There are two kinds of binary heaps:
-
Max-heap: for every node
iother than the root:A[PARENT(i)] ≥ A[i]- The largest element is at the root
- Subtrees rooted at any node contain no values larger than that node
-
Min-heap: for every node
iother than the root:A[PARENT(i)] ≤ A[i]- The smallest element is at the root
Heapsort uses max-heaps. Min-heaps commonly implement priority queues.
Height
- Height of a node: number of edges on the longest simple downward path from the node to a leaf
- Height of the heap: height of its root
- A heap of
nelements has height ⌊lg n⌋ - Basic operations run in time proportional to the height: O(lg n)
Key Procedures Summary
| Procedure | Running Time | Purpose |
|---|---|---|
| MAX-HEAPIFY | O(lg n) | Maintain the max-heap property |
| BUILD-MAX-HEAP | O(n) | Produce a max-heap from an unordered array |
| HEAPSORT | O(n lg n) | Sort an array in place |
| HEAP-MAXIMUM | Θ(1) | Return the maximum element |
| HEAP-EXTRACT-MAX | O(lg n) | Remove and return the maximum |
| HEAP-INCREASE-KEY | O(lg n) | Increase an element's key |
| MAX-HEAP-INSERT | O(lg n) | Insert a new element |
6.2 Maintaining the Heap Property
MAX-HEAPIFY is the key procedure for maintaining the max-heap property.
Assumptions
- The binary trees rooted at
LEFT(i)andRIGHT(i)are already max-heaps A[i]might be smaller than its children, violating the max-heap property- MAX-HEAPIFY lets the value at
A[i]"float down" in the heap
Algorithm
MAX-HEAPIFY(A, i)
l = LEFT(i)
r = RIGHT(i)
if l ≤ A.heap-size and A[l] > A[i]
largest = l
else largest = i
if r ≤ A.heap-size and A[r] > A[largest]
largest = r
if largest ≠ i
exchange A[i] with A[largest]
MAX-HEAPIFY(A, largest)
How It Works
- Determine the largest of
A[i],A[LEFT(i)], andA[RIGHT(i)] - If
A[i]is largest → subtree already a max-heap → terminate - Otherwise → swap
A[i]withA[largest] - The subtree rooted at
largestmight now violate the property → recurse
Running Time Analysis
- Fixing up relationships among
A[i],A[LEFT(i)],A[RIGHT(i)]takes Θ(1) - Plus the time to recurse on a child's subtree
- Each child's subtree has size at most 2n/3 (worst case: bottom level exactly half full)
- Recurrence: T(n) ≤ T(2n/3) + Θ(1)
- By case 2 of the Master Theorem: T(n) = O(lg n)
- Equivalently: O(h) where
his the height of the node
6.3 Building a Heap
BUILD-MAX-HEAP converts an unordered array A[1 .. n] into a max-heap using a bottom-up approach.
Key Insight
- Elements in
A[⌊n/2⌋ + 1 .. n]are all leaves of the tree - Each leaf is a trivial 1-element max-heap
- Process the remaining (internal) nodes from bottom to top, calling MAX-HEAPIFY on each
Algorithm
BUILD-MAX-HEAP(A)
A.heap-size = A.length
for i = ⌊A.length/2⌋ downto 1
MAX-HEAPIFY(A, i)
Correctness (Loop Invariant)
At the start of each iteration of the for loop, each node
i+1, i+2, ..., nis the root of a max-heap.
- Initialization:
i = ⌊n/2⌋. Nodes⌊n/2⌋ + 1, ..., nare leaves → trivial max-heaps ✓ - Maintenance: Children of node
iare numbered higher thani→ by the invariant, they are roots of max-heaps → callingMAX-HEAPIFY(A, i)makes nodeia max-heap root ✓ - Termination:
i = 0. Every node1, 2, ..., nis the root of a max-heap. In particular, node 1 (the root) ✓
Running Time
Naive bound: O(n) calls × O(lg n) each = O(n lg n)
Tighter bound: The time for MAX-HEAPIFY varies with node height, and most nodes have small height.
- An n-element heap has height ⌊lg n⌋
- At most ⌈n / 2^(h+1)⌉ nodes of height
h - Total cost:
⌊lg n⌋
Σ ⌈n / 2^(h+1)⌉ · O(h) = O(n · Σ h/2^h)
h=0
Using the identity:
∞
Σ h/2^h = (1/2) / (1 - 1/2)² = 2
h=0
Therefore: BUILD-MAX-HEAP runs in O(n) time — linear!
BUILD-MIN-HEAP works the same way but calls MIN-HEAPIFY instead. Also runs in O(n) time.
6.4 The Heapsort Algorithm
Strategy
- Build a max-heap from the input array
- The maximum element is at
A[1]— swap it withA[n](its correct final position) - Shrink the heap by decrementing
A.heap-size - The children of the root remain max-heaps, but the new root may violate the property → call
MAX-HEAPIFY(A, 1) - Repeat for heap sizes
n-1down to2
Algorithm
HEAPSORT(A)
BUILD-MAX-HEAP(A)
for i = A.length downto 2
exchange A[1] with A[i]
A.heap-size = A.heap-size - 1
MAX-HEAPIFY(A, 1)
Running Time
BUILD-MAX-HEAP: O(n)n - 1calls toMAX-HEAPIFY: each O(lg n)- Total: O(n lg n)
Properties
- In-place sorting algorithm
- Not stable (equal elements may change relative order)
- Worst-case running time: Θ(n lg n)
- Best-case running time (distinct elements): Θ(n lg n)
6.5 Priority Queues
One of the most popular applications of a heap: an efficient priority queue.
A priority queue is a data structure for maintaining a set S of elements, each with an associated value called a key.
Max-Priority Queue Operations
| Operation | Description |
|---|---|
| INSERT(S, x) | Insert element x into set S |
| MAXIMUM(S) | Return the element with the largest key |
| EXTRACT-MAX(S) | Remove and return the element with the largest key |
| INCREASE-KEY(S, x, k) | Increase element x's key to new value k |
Min-Priority Queue Operations
| Operation | Description |
|---|---|
| INSERT(S, x) | Insert element x into set S |
| MINIMUM(S) | Return the element with the smallest key |
| EXTRACT-MIN(S) | Remove and return the element with the smallest key |
| DECREASE-KEY(S, x, k) | Decrease element x's key to new value k |
Applications
- Max-priority queue: scheduling jobs on a shared computer (select highest-priority job via EXTRACT-MAX, add jobs via INSERT)
- Min-priority queue: event-driven simulation (events simulated in order of occurrence via EXTRACT-MIN; new events added via INSERT)
- Min-priority queues with DECREASE-KEY appear in Chapters 23 and 24 (e.g., Dijkstra's algorithm)
Implementation Note
When using a heap to implement a priority queue, each heap element stores a handle to the corresponding application object (and vice versa). When heap elements move during operations, the handles must be updated accordingly.
HEAP-MAXIMUM
Returns the maximum element in Θ(1) time.
HEAP-MAXIMUM(A)
return A[1]
HEAP-EXTRACT-MAX
Removes and returns the maximum element. Running time: O(lg n).
HEAP-EXTRACT-MAX(A)
if A.heap-size < 1
error "heap underflow"
max = A[1]
A[1] = A[A.heap-size]
A.heap-size = A.heap-size - 1
MAX-HEAPIFY(A, 1)
return max
- Saves the max, moves the last element to the root, shrinks the heap, then restores the heap property via MAX-HEAPIFY
- Constant work on top of O(lg n) for MAX-HEAPIFY → O(lg n) total
HEAP-INCREASE-KEY
Increases the key of element at index i to a new value. Running time: O(lg n).
HEAP-INCREASE-KEY(A, i, key)
if key < A[i]
error "new key is smaller than current key"
A[i] = key
while i > 1 and A[PARENT(i)] < A[i]
exchange A[i] with A[PARENT(i)]
i = PARENT(i)
- Updates the key, then bubbles up toward the root
- Repeatedly compares with parent, swapping if the element's key is larger
- Terminates when the max-heap property holds
- Path from node to root has length O(lg n) → O(lg n) time
MAX-HEAP-INSERT
Inserts a new element into the max-heap. Running time: O(lg n).
MAX-HEAP-INSERT(A, key)
A.heap-size = A.heap-size + 1
A[A.heap-size] = -∞
HEAP-INCREASE-KEY(A, A.heap-size, key)
- Expands the heap by adding a new leaf with key -∞
- Calls HEAP-INCREASE-KEY to set the correct value and restore the heap property
Priority Queue Complexity Summary
All priority-queue operations on a set of size n run in O(lg n) time (MAXIMUM in Θ(1)).
| Operation | Time |
|---|---|
| HEAP-MAXIMUM | Θ(1) |
| HEAP-EXTRACT-MAX | O(lg n) |
| HEAP-INCREASE-KEY | O(lg n) |
| MAX-HEAP-INSERT | O(lg n) |
Previous chapter
Probabilistic Analysis and Randomized AlgorithmsNext chapter
Quicksort