Introduction to Algorithms

Chapter 8: Sorting in Linear Time

Merge sort, heapsort, and quicksort all achieve O(n lg n) time. These are all comparison sorts — they determine order based only on comparisons between input elements. In this chapter we prove that no comparison sort can do better than Ω(n lg n) in the worst case, and then present three algorithms that beat this bound by not being comparison sorts.


8.1 Lower Bounds for Sorting

In a comparison sort, the only operation used to determine order is comparing pairs of elements (e.g., aᵢ ≤ aⱼ). We cannot inspect element values directly or gain order information any other way.

The Decision-Tree Model

A decision tree is a full binary tree that abstractly represents a comparison sort:

  • Each internal node is labeled i:j, representing a comparison aᵢ ≤ aⱼ
  • The left subtree corresponds to aᵢ ≤ aⱼ being true
  • The right subtree corresponds to aᵢ > aⱼ
  • Each leaf is labeled with a permutation ⟨π(1), π(2), …, π(n)⟩ representing the determined sorted order
  • Execution of the sorting algorithm traces a root-to-leaf path
  • For correctness, every one of the n! permutations must appear as a reachable leaf

The worst-case number of comparisons equals the height of the decision tree.

A Lower Bound for the Worst Case

Theorem 8.1: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.

Proof sketch:

  1. A decision tree of height h has at most 2ʰ leaves
  2. It must have at least n! reachable leaves (one per permutation)
  3. Therefore: n! ≤ 2ʰ
  4. Taking logarithms: h ≥ lg(n!) = Ω(n lg n) (by Stirling's approximation)

Corollary 8.2: Heapsort and merge sort are asymptotically optimal comparison sorts, since their O(n lg n) upper bounds match the Ω(n lg n) lower bound.
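The information-theoretic bound in the proof sketch can be checked numerically: a decision tree with n! leaves needs height at least ⌈lg(n!)⌉, and this grows like n lg n. A minimal sketch (the function name `min_comparisons` is our own label, not from the text):

```python
import math

def min_comparisons(n):
    """Minimum worst-case comparisons for any comparison sort on n keys:
    the decision tree needs n! reachable leaves, so its height h must
    satisfy n! <= 2^h, i.e. h >= ceil(lg(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

# Lower bound vs. n lg n for a few sizes
for n in [4, 8, 16, 32]:
    print(n, min_comparisons(n), round(n * math.log2(n)))
```

For n = 8, the bound is ⌈lg 40320⌉ = 16 comparisons, already close to 8 lg 8 = 24, and the ratio approaches 1 as n grows (Stirling).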


8.2 Counting Sort

Counting sort assumes each of the n input elements is an integer in the range 0 to k. When k = O(n), it runs in Θ(n) time.

Key Idea

For each input element x, determine the number of elements less than x, then place x directly into its correct output position.

Algorithm

  • Input: Array A[1..n], output array B[1..n], integer k (max value)
  • Auxiliary: Array C[0..k] for counting
COUNTING-SORT(A, B, k)
1   let C[0..k] be a new array
2   for i = 0 to k
3       C[i] = 0
4   for j = 1 to A.length
5       C[A[j]] = C[A[j]] + 1
6   // C[i] now contains the number of elements equal to i
7   for i = 1 to k
8       C[i] = C[i] + C[i - 1]
9   // C[i] now contains the number of elements ≤ i
10  for j = A.length downto 1
11      B[C[A[j]]] = A[j]
12      C[A[j]] = C[A[j]] - 1

How It Works

  1. Lines 2–3: Initialize all counters to 0
  2. Lines 4–5: Count occurrences — C[i] = number of elements equal to i
  3. Lines 7–8: Compute prefix sums — C[i] = number of elements ≤ i
  4. Lines 10–12: Place each element into its correct position in B, decrementing the counter so duplicates go to the preceding position
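The four steps above translate directly into code. A minimal sketch in Python (0-indexed, unlike the 1-indexed pseudocode, so the placement index is decremented before use):

```python
def counting_sort(A, k):
    """Stable counting sort of integers in the range 0..k.
    0-indexed port of COUNTING-SORT; runs in Theta(n + k)."""
    C = [0] * (k + 1)
    for x in A:                  # count occurrences: C[i] = #{elements == i}
        C[x] += 1
    for i in range(1, k + 1):    # prefix sums: C[i] = #{elements <= i}
        C[i] += C[i - 1]
    B = [0] * len(A)
    for x in reversed(A):        # right-to-left pass preserves stability
        C[x] -= 1                # decrement first (0-indexed output)
        B[C[x]] = x
    return B

counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5)  # [0, 0, 2, 2, 3, 3, 3, 5]
```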

Analysis

| Loop | Time |
| --- | --- |
| Lines 2–3 (init C) | Θ(k) |
| Lines 4–5 (count) | Θ(n) |
| Lines 7–8 (prefix sums) | Θ(k) |
| Lines 10–12 (output) | Θ(n) |
| Total | Θ(n + k) |

When k = O(n), the running time is Θ(n).

Why It Beats Ω(n lg n)

Counting sort is not a comparison sort — it uses element values as array indices. The Ω(n lg n) lower bound does not apply.

Stability

Counting sort is stable: elements with equal keys appear in the output in the same order as in the input. This is critical because counting sort is used as a subroutine in radix sort, which requires a stable intermediate sort.

  • Stability is achieved by iterating downto 1 in line 10

8.3 Radix Sort

Radix sort sorts numbers by processing individual digits, from the least significant digit to the most significant.

Key Idea

  • Sort on the least significant digit first using a stable sort
  • Repeat for each digit position, moving toward the most significant digit
  • After d passes (for d-digit numbers), the array is fully sorted

This is counterintuitive — sorting most-significant-digit first would require recursive bin management. Least-significant-digit first avoids this entirely.

Algorithm

RADIX-SORT(A, d)
1   for i = 1 to d
2       use a stable sort to sort array A on digit i

Where digit 1 is the lowest-order digit and digit d is the highest-order digit.
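A minimal sketch of the two-line pseudocode above, using a bucket-based stable counting pass on each decimal digit (appending to per-digit lists in input order is what makes each pass stable):

```python
def radix_sort(A, d, base=10):
    """RADIX-SORT: d stable-sort passes on digit i, least significant
    first. Each pass is a counting-style distribution into `base` lists."""
    for i in range(d):
        buckets = [[] for _ in range(base)]
        for x in A:
            digit = (x // base**i) % base   # digit i (0 = lowest-order)
            buckets[digit].append(x)        # append order => stability
        A = [x for b in buckets for x in b]
    return A

radix_sort([329, 457, 657, 839, 436, 720, 355], 3)
```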

Correctness

By induction on the digit position being sorted. The stability of the intermediate sort ensures that the relative order established by previous passes is preserved.

Analysis

Lemma 8.3: Given n numbers with d digits, each digit taking up to k values, RADIX-SORT runs in Θ(d(n + k)) time when the stable sort takes Θ(n + k) time.

  • Using counting sort (digits in range 0 to k−1) as the stable sort: each pass is Θ(n + k)
  • There are d passes → total Θ(d(n + k))
  • When d is constant and k = O(n): Θ(n) — linear time

Optimizing with Bit Grouping

Lemma 8.4: Given n numbers with b bits and any positive integer r ≤ b, RADIX-SORT runs in Θ((b/r)(n + 2ʳ)) time.

  • View each key as having d = ⌈b/r⌉ digits of r bits each
  • Each digit is in range 0 to 2ʳ − 1
  • Example: 32-bit word with r = 8 → 4 passes, k = 255

Choosing optimal r:

  • If b < ⌊lg n⌋: choose r = b → running time Θ(n)
  • If b ≥ ⌊lg n⌋: choose r = ⌊lg n⌋ → running time Θ(bn / lg n)
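The bit-grouping view of Lemma 8.4 can be sketched with shift-and-mask digit extraction; the function name `radix_sort_bits` is our own, and insertion into per-digit lists again stands in for the stable counting sort:

```python
def radix_sort_bits(A, b, r):
    """Sort nonnegative b-bit integers as ceil(b/r) digits of r bits
    each (Lemma 8.4): Theta((b/r)(n + 2^r)) passes overall."""
    mask = (1 << r) - 1                     # each digit is in 0 .. 2^r - 1
    for shift in range(0, b, r):            # ceil(b/r) passes, low bits first
        buckets = [[] for _ in range(1 << r)]
        for x in A:
            buckets[(x >> shift) & mask].append(x)
        A = [x for bkt in buckets for x in bkt]
    return A

radix_sort_bits([7, 3, 255, 0, 128, 64], b=8, r=4)  # two 4-bit passes
```

With b = 32 and r = 8 this is the "4 passes, k = 255" example from the bullets above.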

Radix Sort vs. Comparison Sorts

  • If b = O(lg n) and r ≈ lg n, radix sort runs in Θ(n) vs. quicksort's expected Θ(n lg n)
  • However, constant factors differ — each radix sort pass may be slower
  • Quicksort often uses hardware caches more effectively
  • Radix sort (with counting sort) is not in-place, unlike quicksort
  • The best choice depends on implementation, machine, and input characteristics

8.4 Bucket Sort

Bucket sort assumes input is drawn from a uniform distribution over the interval [0, 1) and achieves average-case O(n) time.

Key Idea

  1. Divide [0, 1) into n equal-sized buckets (subintervals)
  2. Distribute input elements into buckets
  3. Sort each bucket individually (using insertion sort)
  4. Concatenate all buckets in order

Since input is uniformly distributed, we expect few elements per bucket.

Algorithm

BUCKET-SORT(A)
1   n = A.length
2   let B[0..n-1] be a new array
3   for i = 0 to n - 1
4       make B[i] an empty list
5   for i = 1 to n
6       insert A[i] into list B[⌊n·A[i]⌋]
7   for i = 0 to n - 1
8       sort list B[i] with insertion sort
9   concatenate the lists B[0], B[1], ..., B[n-1] together in order
  • Input: Array A[1..n] where each element satisfies 0 ≤ A[i] < 1
  • Auxiliary: Array B[0..n−1] of linked lists (buckets)
  • Bucket i holds values in the half-open interval [i/n, (i+1)/n)
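A minimal sketch of BUCKET-SORT, with Python's built-in list sort standing in for the per-bucket insertion sort (any correct sort works here; insertion sort is chosen in the text because buckets are expected to be tiny):

```python
def bucket_sort(A):
    """BUCKET-SORT for values in [0, 1): distribute into n buckets,
    sort each bucket, concatenate. Expected Theta(n) on uniform input."""
    n = len(A)
    B = [[] for _ in range(n)]
    for x in A:
        B[int(n * x)].append(x)   # bucket i covers [i/n, (i+1)/n)
    for bucket in B:
        bucket.sort()             # stands in for insertion sort
    return [x for bucket in B for x in bucket]

bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68])
```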

Correctness

For any two elements A[i] ≤ A[j]:

  • Either they fall in the same bucket (insertion sort orders them correctly)
  • Or A[i] falls in a lower-indexed bucket (concatenation orders them correctly)

Analysis

Running time:

$$T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2)$$

where nᵢ is the number of elements in bucket i.

Expected running time (uniform distribution):

  • Define indicator variable Xᵢⱼ = 1 if A[j] falls in bucket i
  • Each element falls in any bucket with probability 1/n
  • E[nᵢ²] = 2 − 1/n for each bucket i

Proof of E[nᵢ²] = 2 − 1/n:

  • E[Xᵢⱼ²] = 1/n (since Xᵢⱼ is an indicator with probability 1/n)
  • E[XᵢⱼXᵢₖ] = 1/n² for j ≠ k (independence)
  • E[nᵢ²] = n · (1/n) + n(n−1) · (1/n²) = 1 + (n−1)/n = 2 − 1/n
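The three-step computation above can be reproduced in exact rational arithmetic (the helper name `expected_sq_bucket_size` is our own label):

```python
from fractions import Fraction

def expected_sq_bucket_size(n):
    """E[n_i^2] = sum_j E[X_ij^2] + sum_{j != k} E[X_ij X_ik],
    with P(X_ij = 1) = 1/n and independence across j != k."""
    p = Fraction(1, n)
    diagonal = n * p               # n terms of E[X_ij^2] = 1/n
    cross = n * (n - 1) * p * p    # n(n-1) terms of 1/n^2
    return diagonal + cross        # = 2 - 1/n exactly

for n in [2, 10, 100]:
    assert expected_sq_bucket_size(n) == 2 - Fraction(1, n)
```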

Therefore:

$$E[T(n)] = \Theta(n) + n \cdot O(2 - 1/n) = \Theta(n)$$

Summary of Properties

| Property | Detail |
| --- | --- |
| Average case | Θ(n) (uniform distribution) |
| Worst case | Θ(n²) (all elements in one bucket) |
| Assumption | Input uniformly distributed over [0, 1) |
| Not a comparison sort | Uses element values to assign buckets |

Note: Even without a uniform distribution, bucket sort runs in linear time as long as the sum of the squares of the bucket sizes is linear in n.


Summary: Comparison of Linear-Time Sorts

| Algorithm | Time | Assumption | Stable? | In-Place? |
| --- | --- | --- | --- | --- |
| Counting Sort | Θ(n + k) | Integers in [0, k] | Yes | No |
| Radix Sort | Θ(d(n + k)) | d-digit keys, digits in [0, k−1] | Yes | No |
| Bucket Sort | Θ(n) avg | Uniform distribution on [0, 1) | Depends on sub-sort | No |

Key takeaway: The Ω(n lg n) lower bound applies only to comparison sorts. By exploiting structure in the input (integer keys, bounded range, known distribution), we can sort in linear time.