Introduction to Algorithms

Chapter 8: Sorting in Linear Time

Merge sort, heapsort, and quicksort all achieve O(n lg n) time. These are all comparison sorts — they determine order based only on comparisons between input elements. In this chapter we prove that no comparison sort can do better than Ω(n lg n) in the worst case, and then present three algorithms that beat this bound by not being comparison sorts.


8.1 Lower Bounds for Sorting

In a comparison sort, the only operation used to determine order is comparing pairs of elements (e.g., aᵢ ≤ aⱼ). We cannot inspect element values directly or gain order information any other way.

The Decision-Tree Model

A decision tree is a full binary tree that abstractly represents a comparison sort:

  • Each internal node is labeled i:j, representing a comparison aᵢ ≤ aⱼ
  • The left subtree corresponds to aᵢ ≤ aⱼ being true
  • The right subtree corresponds to aᵢ > aⱼ
  • Each leaf is labeled with a permutation ⟨π(1), π(2), …, π(n)⟩ representing the determined sorted order
  • Execution of the sorting algorithm traces a root-to-leaf path
  • For correctness, every one of the n! permutations must appear as a reachable leaf

The worst-case number of comparisons equals the height of the decision tree.

A Lower Bound for the Worst Case

Theorem 8.1: Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.

Proof sketch:

  1. A decision tree of height h has at most 2ʰ leaves
  2. It must have at least n! reachable leaves (one per permutation)
  3. Therefore: n! ≤ 2ʰ
  4. Taking logarithms: h ≥ lg(n!) = Ω(n lg n) (by Stirling's approximation)

Corollary 8.2: Heapsort and merge sort are asymptotically optimal comparison sorts, since their O(n lg n) upper bounds match the Ω(n lg n) lower bound.
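The information-theoretic bound in the proof sketch can be checked numerically: a decision tree with n! leaves needs height at least ⌈lg(n!)⌉, and this grows like n lg n. A minimal sketch (the function name `min_comparisons` is our own label, not from the text):

```python
import math

def min_comparisons(n):
    """Minimum worst-case comparisons for any comparison sort on n keys:
    the decision tree needs n! reachable leaves, so its height h must
    satisfy n! <= 2^h, i.e. h >= ceil(lg(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

# Lower bound vs. n lg n for a few sizes
for n in [4, 8, 16, 32]:
    print(n, min_comparisons(n), round(n * math.log2(n)))
```

For n = 8, the bound is ⌈lg 40320⌉ = 16 comparisons, already close to 8 lg 8 = 24, and the ratio approaches 1 as n grows (Stirling).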


8.2 Counting Sort

Counting sort assumes each of the n input elements is an integer in the range 0 to k. When k = O(n), it runs in Θ(n) time.

Key Idea

For each input element x, determine the number of elements less than x, then place x directly into its correct output position.

Algorithm

  • Input: Array A[1..n], output array B[1..n], integer k (max value)
  • Auxiliary: Array C[0..k] for counting
COUNTING-SORT(A, B, k)
1   let C[0..k] be a new array
2   for i = 0 to k
3       C[i] = 0
4   for j = 1 to A.length
5       C[A[j]] = C[A[j]] + 1
6   // C[i] now contains the number of elements equal to i
7   for i = 1 to k
8       C[i] = C[i] + C[i - 1]
9   // C[i] now contains the number of elements ≤ i
10  for j = A.length downto 1
11      B[C[A[j]]] = A[j]
12      C[A[j]] = C[A[j]] - 1

How It Works

  1. Lines 2–3: Initialize all counters to 0
  2. Lines 4–5: Count occurrences — C[i] = number of elements equal to i
  3. Lines 7–8: Compute prefix sums — C[i] = number of elements ≤ i
  4. Lines 10–12: Place each element into its correct position in B, decrementing the counter so duplicates go to the preceding position
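The four steps above translate directly into code. A minimal sketch in Python (0-indexed, unlike the 1-indexed pseudocode, so the placement index is decremented before use):

```python
def counting_sort(A, k):
    """Stable counting sort of integers in the range 0..k.
    0-indexed port of COUNTING-SORT; runs in Theta(n + k)."""
    C = [0] * (k + 1)
    for x in A:                  # count occurrences: C[i] = #{elements == i}
        C[x] += 1
    for i in range(1, k + 1):    # prefix sums: C[i] = #{elements <= i}
        C[i] += C[i - 1]
    B = [0] * len(A)
    for x in reversed(A):        # right-to-left pass preserves stability
        C[x] -= 1                # decrement first (0-indexed output)
        B[C[x]] = x
    return B

counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5)  # [0, 0, 2, 2, 3, 3, 3, 5]
```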

Analysis

| Loop | Time |
| --- | --- |
| Lines 2–3 (init C) | Θ(k) |
| Lines 4–5 (count) | Θ(n) |
| Lines 7–8 (prefix sums) | Θ(k) |
| Lines 10–12 (output) | Θ(n) |
| Total | Θ(n + k) |

When k = O(n), the running time is Θ(n).

Why It Beats Ω(n lg n)

Counting sort is not a comparison sort — it uses element values as array indices. The Ω(n lg n) lower bound does not apply.

Stability

Counting sort is stable: elements with equal keys appear in the output in the same order as in the input. This is critical because counting sort is used as a subroutine in radix sort, which requires a stable intermediate sort.

  • Stability is achieved by iterating downto 1 in line 10

8.3 Radix Sort

Radix sort sorts numbers by processing individual digits, from the least significant digit to the most significant.

Key Idea

  • Sort on the least significant digit first using a stable sort
  • Repeat for each digit position, moving toward the most significant digit
  • After d passes (for d-digit numbers), the array is fully sorted

This is counterintuitive — sorting most-significant-digit first would require recursive bin management. Least-significant-digit first avoids this entirely.

Algorithm

RADIX-SORT(A, d)
1   for i = 1 to d
2       use a stable sort to sort array A on digit i

Where digit 1 is the lowest-order digit and digit d is the highest-order digit.
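A minimal sketch of the two-line pseudocode above, using a bucket-based stable counting pass on each decimal digit (appending to per-digit lists in input order is what makes each pass stable):

```python
def radix_sort(A, d, base=10):
    """RADIX-SORT: d stable-sort passes on digit i, least significant
    first. Each pass is a counting-style distribution into `base` lists."""
    for i in range(d):
        buckets = [[] for _ in range(base)]
        for x in A:
            digit = (x // base**i) % base   # digit i (0 = lowest-order)
            buckets[digit].append(x)        # append order => stability
        A = [x for b in buckets for x in b]
    return A

radix_sort([329, 457, 657, 839, 436, 720, 355], 3)
```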

Correctness

By induction on the digit position being sorted. The stability of the intermediate sort ensures that the relative order established by previous passes is preserved.

Analysis

Lemma 8.3: Given n numbers with d digits, each digit taking up to k values, RADIX-SORT runs in Θ(d(n + k)) time when the stable sort takes Θ(n + k) time.

  • Using counting sort (digits in range 0 to k−1) as the stable sort: each pass is Θ(n + k)
  • There are d passes → total Θ(d(n + k))
  • When d is constant and k = O(n): Θ(n) — linear time

Optimizing with Bit Grouping

Lemma 8.4: Given n numbers with b bits and any positive integer r ≤ b, RADIX-SORT runs in Θ((b/r)(n + 2ʳ)) time.

  • View each key as having d = ⌈b/r⌉ digits of r bits each
  • Each digit is in range 0 to 2ʳ − 1
  • Example: 32-bit word with r = 8 → 4 passes, k = 255

Choosing optimal r:

  • If b < ⌊lg n⌋: choose r = b → running time Θ(n)
  • If b ≥ ⌊lg n⌋: choose r = ⌊lg n⌋ → running time Θ(bn / lg n)
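The bit-grouping view of Lemma 8.4 can be sketched with shift-and-mask digit extraction; the function name `radix_sort_bits` is our own, and insertion into per-digit lists again stands in for the stable counting sort:

```python
def radix_sort_bits(A, b, r):
    """Sort nonnegative b-bit integers as ceil(b/r) digits of r bits
    each (Lemma 8.4): Theta((b/r)(n + 2^r)) passes overall."""
    mask = (1 << r) - 1                     # each digit is in 0 .. 2^r - 1
    for shift in range(0, b, r):            # ceil(b/r) passes, low bits first
        buckets = [[] for _ in range(1 << r)]
        for x in A:
            buckets[(x >> shift) & mask].append(x)
        A = [x for bkt in buckets for x in bkt]
    return A

radix_sort_bits([7, 3, 255, 0, 128, 64], b=8, r=4)  # two 4-bit passes
```

With b = 32 and r = 8 this is the "4 passes, k = 255" example from the bullets above.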

Radix Sort vs. Comparison Sorts

  • If b = O(lg n) and r ≈ lg n, radix sort runs in Θ(n) vs. quicksort's expected Θ(n lg n)
  • However, constant factors differ — each radix sort pass may be slower
  • Quicksort often uses hardware caches more effectively
  • Radix sort (with counting sort) is not in-place, unlike quicksort
  • The best choice depends on implementation, machine, and input characteristics

8.4 Bucket Sort

Bucket sort assumes input is drawn from a uniform distribution over the interval [0, 1) and achieves average-case O(n) time.

Key Idea

  1. Divide [0, 1) into n equal-sized buckets (subintervals)
  2. Distribute input elements into buckets
  3. Sort each bucket individually (using insertion sort)
  4. Concatenate all buckets in order

Since input is uniformly distributed, we expect few elements per bucket.

Algorithm

BUCKET-SORT(A)
1   n = A.length
2   let B[0..n-1] be a new array
3   for i = 0 to n - 1
4       make B[i] an empty list
5   for i = 1 to n
6       insert A[i] into list B[⌊n·A[i]⌋]
7   for i = 0 to n - 1
8       sort list B[i] with insertion sort
9   concatenate the lists B[0], B[1], ..., B[n-1] together in order
  • Input: Array A[1..n] where each element satisfies 0 ≤ A[i] < 1
  • Auxiliary: Array B[0..n−1] of linked lists (buckets)
  • Bucket i holds values in the half-open interval [i/n, (i+1)/n)
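A minimal sketch of BUCKET-SORT, with Python's built-in list sort standing in for the per-bucket insertion sort (any correct sort works here; insertion sort is chosen in the text because buckets are expected to be tiny):

```python
def bucket_sort(A):
    """BUCKET-SORT for values in [0, 1): distribute into n buckets,
    sort each bucket, concatenate. Expected Theta(n) on uniform input."""
    n = len(A)
    B = [[] for _ in range(n)]
    for x in A:
        B[int(n * x)].append(x)   # bucket i covers [i/n, (i+1)/n)
    for bucket in B:
        bucket.sort()             # stands in for insertion sort
    return [x for bucket in B for x in bucket]

bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68])
```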

Correctness

For any two elements A[i] ≤ A[j]:

  • Either they fall in the same bucket (insertion sort orders them correctly)
  • Or A[i] falls in a lower-indexed bucket (concatenation orders them correctly)

Analysis

Running time:

$$T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2)$$

where nᵢ is the number of elements in bucket i.

Expected running time (uniform distribution):

  • Define indicator variable Xᵢⱼ = 1 if A[j] falls in bucket i
  • Each element falls in any bucket with probability 1/n
  • E[nᵢ²] = 2 − 1/n for each bucket i

Proof of E[nᵢ²] = 2 − 1/n:

  • E[Xᵢⱼ²] = 1/n (since Xᵢⱼ is an indicator with probability 1/n)
  • E[XᵢⱼXᵢₖ] = 1/n² for j ≠ k (independence)
  • E[nᵢ²] = n · (1/n) + n(n−1) · (1/n²) = 1 + (n−1)/n = 2 − 1/n
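The three-step computation above can be reproduced in exact rational arithmetic (the helper name `expected_sq_bucket_size` is our own label):

```python
from fractions import Fraction

def expected_sq_bucket_size(n):
    """E[n_i^2] = sum_j E[X_ij^2] + sum_{j != k} E[X_ij X_ik],
    with P(X_ij = 1) = 1/n and independence across j != k."""
    p = Fraction(1, n)
    diagonal = n * p               # n terms of E[X_ij^2] = 1/n
    cross = n * (n - 1) * p * p    # n(n-1) terms of 1/n^2
    return diagonal + cross        # = 2 - 1/n exactly

for n in [2, 10, 100]:
    assert expected_sq_bucket_size(n) == 2 - Fraction(1, n)
```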

Therefore:

$$E[T(n)] = \Theta(n) + n \cdot O(2 - 1/n) = \Theta(n)$$

Summary of Properties

| Property | Detail |
| --- | --- |
| Average case | Θ(n) (uniform distribution) |
| Worst case | Θ(n²) (all elements in one bucket) |
| Assumption | Input uniformly distributed over [0, 1) |
| Not a comparison sort | Uses element values to assign buckets |

Note: Even without a uniform distribution, bucket sort runs in linear time as long as the sum of the squares of the bucket sizes is linear in n.


Summary: Comparison of Linear-Time Sorts

| Algorithm | Time | Assumption | Stable? | In-Place? |
| --- | --- | --- | --- | --- |
| Counting Sort | Θ(n + k) | Integers in [0, k] | Yes | No |
| Radix Sort | Θ(d(n + k)) | d-digit keys, digits in [0, k−1] | Yes | No |
| Bucket Sort | Θ(n) avg | Uniform distribution on [0, 1) | Depends on sub-sort | No |

Key takeaway: The Ω(n lg n) lower bound applies only to comparison sorts. By exploiting structure in the input (integer keys, bounded range, known distribution), we can sort in linear time.