Pagefy

Pagefy

Back to Data Structures and Algorithms

Heapsort

Introduction to Algorithms

Chapter 6: Heapsort

Heapsort combines the best attributes of merge sort and insertion sort:

  • Like merge sort: runs in O(n lg n) time
  • Like insertion sort: sorts in place (only a constant number of elements stored outside the input array)

Heapsort introduces the heap data structure, which is useful both for sorting and for implementing an efficient priority queue.

Note: "Heap" here refers to the data structure, not garbage-collected storage used in languages like Java or Lisp.


6.1 Heaps

The (binary) heap is an array object that can be viewed as a nearly complete binary tree.

  • The tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point
  • Each node of the tree corresponds to an element of the array

Array Representation

An array A representing a heap has two attributes:

  • A.length — the number of elements in the array
  • A.heap-size — how many elements in the heap are stored within A
    • 0 ≤ A.heap-size ≤ A.length
    • Only elements in A[1 .. A.heap-size] are valid heap elements

The root of the tree is A[1]. Given the index i of a node:

PARENT(i)
    return ⌊i/2⌋

LEFT(i)
    return 2i

RIGHT(i)
    return 2i + 1
  • LEFT: computed by shifting i left one bit
  • RIGHT: computed by shifting i left one bit and adding 1
  • PARENT: computed by shifting i right one bit

These are often implemented as macros or inline procedures.

Max-Heaps and Min-Heaps

There are two kinds of binary heaps:

  • Max-heap: for every node i other than the root:

    • A[PARENT(i)] ≥ A[i]
    • The largest element is at the root
    • Subtrees rooted at any node contain no values larger than that node
  • Min-heap: for every node i other than the root:

    • A[PARENT(i)] ≤ A[i]
    • The smallest element is at the root

Heapsort uses max-heaps. Min-heaps commonly implement priority queues.

Height

  • Height of a node: number of edges on the longest simple downward path from the node to a leaf
  • Height of the heap: height of its root
  • A heap of n elements has height ⌊lg n⌋
  • Basic operations run in time proportional to the height: O(lg n)

Key Procedures Summary

ProcedureRunning TimePurpose
MAX-HEAPIFYO(lg n)Maintain the max-heap property
BUILD-MAX-HEAPO(n)Produce a max-heap from an unordered array
HEAPSORTO(n lg n)Sort an array in place
HEAP-MAXIMUMΘ(1)Return the maximum element
HEAP-EXTRACT-MAXO(lg n)Remove and return the maximum
HEAP-INCREASE-KEYO(lg n)Increase an element's key
MAX-HEAP-INSERTO(lg n)Insert a new element

6.2 Maintaining the Heap Property

MAX-HEAPIFY is the key procedure for maintaining the max-heap property.

Assumptions

  • The binary trees rooted at LEFT(i) and RIGHT(i) are already max-heaps
  • A[i] might be smaller than its children, violating the max-heap property
  • MAX-HEAPIFY lets the value at A[i] "float down" in the heap

Algorithm

MAX-HEAPIFY(A, i)
    l = LEFT(i)
    r = RIGHT(i)
    if l ≤ A.heap-size and A[l] > A[i]
        largest = l
    else largest = i
    if r ≤ A.heap-size and A[r] > A[largest]
        largest = r
    if largest ≠ i
        exchange A[i] with A[largest]
        MAX-HEAPIFY(A, largest)

How It Works

  1. Determine the largest of A[i], A[LEFT(i)], and A[RIGHT(i)]
  2. If A[i] is largest → subtree already a max-heap → terminate
  3. Otherwise → swap A[i] with A[largest]
  4. The subtree rooted at largest might now violate the property → recurse

Running Time Analysis

  • Fixing up relationships among A[i], A[LEFT(i)], A[RIGHT(i)] takes Θ(1)
  • Plus the time to recurse on a child's subtree
  • Each child's subtree has size at most 2n/3 (worst case: bottom level exactly half full)
  • Recurrence: T(n) ≤ T(2n/3) + Θ(1)
  • By case 2 of the Master Theorem: T(n) = O(lg n)
  • Equivalently: O(h) where h is the height of the node

6.3 Building a Heap

BUILD-MAX-HEAP converts an unordered array A[1 .. n] into a max-heap using a bottom-up approach.

Key Insight

  • Elements in A[⌊n/2⌋ + 1 .. n] are all leaves of the tree
  • Each leaf is a trivial 1-element max-heap
  • Process the remaining (internal) nodes from bottom to top, calling MAX-HEAPIFY on each

Algorithm

BUILD-MAX-HEAP(A)
    A.heap-size = A.length
    for i = ⌊A.length/2⌋ downto 1
        MAX-HEAPIFY(A, i)

Correctness (Loop Invariant)

At the start of each iteration of the for loop, each node i+1, i+2, ..., n is the root of a max-heap.

  • Initialization: i = ⌊n/2⌋. Nodes ⌊n/2⌋ + 1, ..., n are leaves → trivial max-heaps ✓
  • Maintenance: Children of node i are numbered higher than i → by the invariant, they are roots of max-heaps → calling MAX-HEAPIFY(A, i) makes node i a max-heap root ✓
  • Termination: i = 0. Every node 1, 2, ..., n is the root of a max-heap. In particular, node 1 (the root) ✓

Running Time

Naive bound: O(n) calls × O(lg n) each = O(n lg n)

Tighter bound: The time for MAX-HEAPIFY varies with node height, and most nodes have small height.

  • An n-element heap has height ⌊lg n⌋
  • At most ⌈n / 2^(h+1)⌉ nodes of height h
  • Total cost:
  ⌊lg n⌋
   Σ    ⌈n / 2^(h+1)⌉ · O(h)  =  O(n · Σ h/2^h)
  h=0

Using the identity:

  ∞
  Σ  h/2^h  =  (1/2) / (1 - 1/2)²  =  2
  h=0

Therefore: BUILD-MAX-HEAP runs in O(n) time — linear!

BUILD-MIN-HEAP works the same way but calls MIN-HEAPIFY instead. Also runs in O(n) time.


6.4 The Heapsort Algorithm

Strategy

  1. Build a max-heap from the input array
  2. The maximum element is at A[1] — swap it with A[n] (its correct final position)
  3. Shrink the heap by decrementing A.heap-size
  4. The children of the root remain max-heaps, but the new root may violate the property → call MAX-HEAPIFY(A, 1)
  5. Repeat for heap sizes n-1 down to 2

Algorithm

HEAPSORT(A)
    BUILD-MAX-HEAP(A)
    for i = A.length downto 2
        exchange A[1] with A[i]
        A.heap-size = A.heap-size - 1
        MAX-HEAPIFY(A, 1)

Running Time

  • BUILD-MAX-HEAP: O(n)
  • n - 1 calls to MAX-HEAPIFY: each O(lg n)
  • Total: O(n lg n)

Properties

  • In-place sorting algorithm
  • Not stable (equal elements may change relative order)
  • Worst-case running time: Θ(n lg n)
  • Best-case running time (distinct elements): Θ(n lg n)

6.5 Priority Queues

One of the most popular applications of a heap: an efficient priority queue.

A priority queue is a data structure for maintaining a set S of elements, each with an associated value called a key.

Max-Priority Queue Operations

OperationDescription
INSERT(S, x)Insert element x into set S
MAXIMUM(S)Return the element with the largest key
EXTRACT-MAX(S)Remove and return the element with the largest key
INCREASE-KEY(S, x, k)Increase element x's key to new value k

Min-Priority Queue Operations

OperationDescription
INSERT(S, x)Insert element x into set S
MINIMUM(S)Return the element with the smallest key
EXTRACT-MIN(S)Remove and return the element with the smallest key
DECREASE-KEY(S, x, k)Decrease element x's key to new value k

Applications

  • Max-priority queue: scheduling jobs on a shared computer (select highest-priority job via EXTRACT-MAX, add jobs via INSERT)
  • Min-priority queue: event-driven simulation (events simulated in order of occurrence via EXTRACT-MIN; new events added via INSERT)
  • Min-priority queues with DECREASE-KEY appear in Chapters 23 and 24 (e.g., Dijkstra's algorithm)

Implementation Note

When using a heap to implement a priority queue, each heap element stores a handle to the corresponding application object (and vice versa). When heap elements move during operations, the handles must be updated accordingly.


HEAP-MAXIMUM

Returns the maximum element in Θ(1) time.

HEAP-MAXIMUM(A)
    return A[1]

HEAP-EXTRACT-MAX

Removes and returns the maximum element. Running time: O(lg n).

HEAP-EXTRACT-MAX(A)
    if A.heap-size < 1
        error "heap underflow"
    max = A[1]
    A[1] = A[A.heap-size]
    A.heap-size = A.heap-size - 1
    MAX-HEAPIFY(A, 1)
    return max
  • Saves the max, moves the last element to the root, shrinks the heap, then restores the heap property via MAX-HEAPIFY
  • Constant work on top of O(lg n) for MAX-HEAPIFY → O(lg n) total

HEAP-INCREASE-KEY

Increases the key of element at index i to a new value. Running time: O(lg n).

HEAP-INCREASE-KEY(A, i, key)
    if key < A[i]
        error "new key is smaller than current key"
    A[i] = key
    while i > 1 and A[PARENT(i)] < A[i]
        exchange A[i] with A[PARENT(i)]
        i = PARENT(i)
  • Updates the key, then bubbles up toward the root
  • Repeatedly compares with parent, swapping if the element's key is larger
  • Terminates when the max-heap property holds
  • Path from node to root has length O(lg n) → O(lg n) time

MAX-HEAP-INSERT

Inserts a new element into the max-heap. Running time: O(lg n).

MAX-HEAP-INSERT(A, key)
    A.heap-size = A.heap-size + 1
    A[A.heap-size] = -∞
    HEAP-INCREASE-KEY(A, A.heap-size, key)
  • Expands the heap by adding a new leaf with key -∞
  • Calls HEAP-INCREASE-KEY to set the correct value and restore the heap property

Priority Queue Complexity Summary

All priority-queue operations on a set of size n run in O(lg n) time (MAXIMUM in Θ(1)).

OperationTime
HEAP-MAXIMUMΘ(1)
HEAP-EXTRACT-MAXO(lg n)
HEAP-INCREASE-KEYO(lg n)
MAX-HEAP-INSERTO(lg n)