
Introduction to Algorithms

Appendix C: Counting and Probability

This appendix reviews elementary combinatorics and probability theory. It covers counting techniques, probability axioms, discrete random variables, and important distributions arising from Bernoulli trials.


C.1 Counting

Counting theory answers the question "How many?" without enumerating all choices.

Rules of Sum and Product

  • Rule of sum: The number of ways to choose one element from one of two disjoint sets is the sum of their cardinalities.

    • If A and B have no members in common: |A ∪ B| = |A| + |B|
    • Example: A license plate position that is a letter or digit has 26 + 10 = 36 possibilities.
  • Rule of product: The number of ways to choose an ordered pair is the product of the number of ways to choose each element.

    • |A × B| = |A| · |B|
    • Example: 28 flavors × 4 toppings = 112 possible sundaes.
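Both rules can be checked by direct enumeration. A minimal sketch in Python, using the license-plate and sundae examples above:

```python
from itertools import product

# Rule of sum: disjoint alternatives add. A plate position that is
# a letter or a digit has 26 + 10 possibilities.
letters = [chr(c) for c in range(ord("A"), ord("Z") + 1)]
digits = [str(d) for d in range(10)]
assert len(letters) + len(digits) == 36

# Rule of product: ordered pairs multiply. 28 flavors x 4 toppings.
flavors, toppings = range(28), range(4)
sundaes = list(product(flavors, toppings))
assert len(sundaes) == 28 * 4 == 112
```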

Strings

  • A string over a finite set S is a sequence of elements of S.
  • A k-string is a string of length k.
  • A substring s' of a string s is an ordered sequence of consecutive elements of s.
  • A k-string over a set S can be viewed as an element of the Cartesian product S^k.
  • The number of k-strings over an n-set is n^k.
    • Example: The number of binary k-strings is 2^k.

Permutations

  • A permutation of a finite set S is an ordered sequence of all elements of S, each appearing exactly once.
  • There are n! permutations of a set of n elements.
  • A k-permutation of S is an ordered sequence of k elements of S, with no element appearing more than once.
  • The number of k-permutations of an n-set:

$$\frac{n!}{(n-k)!} = n(n-1)(n-2)\cdots(n-k+1)$$

Combinations

  • A k-combination of an n-set S is a k-subset of S (order does not matter).
  • Every k-combination has exactly k! permutations of its elements, each being a distinct k-permutation.
  • The number of k-combinations of an n-set:

$$\frac{n!}{k!(n-k)!}$$
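Python's standard library exposes both counts directly (`math.perm` and `math.comb`, available since Python 3.8); a quick check of the relation between k-permutations and k-combinations, with n = 10 and k = 3 chosen arbitrarily:

```python
import math

n, k = 10, 3

perms = math.perm(n, k)   # n!/(n-k)! = 10 * 9 * 8
combs = math.comb(n, k)   # n!/(k!(n-k)!)

# Each k-combination orders into exactly k! distinct k-permutations.
assert perms == combs * math.factorial(k)
assert perms == 720 and combs == 120
```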

Binomial Coefficients

  • The notation C(n, k) ("n choose k") denotes the number of k-combinations of an n-set:

$$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$

  • Symmetry property:

$$\binom{n}{k} = \binom{n}{n-k}$$

  • Binomial expansion (the source of the name "binomial coefficients"):

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$$

  • Special case (x = y = 1):

$$2^n = \sum_{k=0}^{n} \binom{n}{k}$$

This counts the 2^n binary n-strings by the number of 1s they contain.
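Both identities can be verified numerically; a sketch with arbitrarily chosen n, x, and y:

```python
import math

n = 8
x, y = 3, 5

# Binomial theorem, term by term.
assert (x + y) ** n == sum(math.comb(n, k) * x**k * y ** (n - k)
                           for k in range(n + 1))

# Special case x = y = 1: the row sum of binomial coefficients is 2^n.
assert sum(math.comb(n, k) for k in range(n + 1)) == 2**n
```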

Binomial Bounds

  • Lower bound (for 1 ≤ k ≤ n):

$$\binom{n}{k} \geq \left(\frac{n}{k}\right)^k$$

  • Upper bound (using the inequality k! ≥ (k/e)^k, which follows from Stirling's approximation):

$$\binom{n}{k} \leq \frac{n^k}{k!} \leq \left(\frac{en}{k}\right)^k$$

  • Entropy-based bound: For k = λn where 0 ≤ λ ≤ 1:

$$\binom{n}{\lambda n} \leq 2^{nH(\lambda)}$$

where H(λ) is the (binary) entropy function:

$$H(\lambda) = -\lambda \lg \lambda - (1-\lambda) \lg(1-\lambda)$$

with the convention that 0 lg 0 = 0, so H(0) = H(1) = 0.
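A numeric spot-check of the entropy bound, sketched in Python for n = 100 (an arbitrary choice) with a small tolerance for floating-point rounding:

```python
import math

def H(lam):
    """Binary entropy in bits; by convention 0 lg 0 = 0, so H(0) = H(1) = 0."""
    if lam in (0, 1):
        return 0.0
    return -lam * math.log2(lam) - (1 - lam) * math.log2(1 - lam)

n = 100
for k in range(n + 1):
    # Entropy bound: C(n, lambda*n) <= 2^(n H(lambda)) with lambda = k/n.
    assert math.comb(n, k) <= 2 ** (n * H(k / n)) * (1 + 1e-9)
```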


C.2 Probability

Sample Spaces and Events

  • A sample space S is a set whose elements are called elementary events (possible outcomes of an experiment).
  • An event is a subset of the sample space S.
  • The event S is the certain event; the event ∅ is the null event.
  • Two events A and B are mutually exclusive if A ∩ B = ∅.

Axioms of Probability

A probability distribution Pr{} on a sample space S maps events to real numbers satisfying:

  1. Pr{A} ≥ 0 for any event A.
  2. Pr{S} = 1 (normalization).
  3. Pr{A ∪ B} = Pr{A} + Pr{B} for any two mutually exclusive events A and B. More generally, for any pairwise mutually exclusive sequence of events A₁, A₂, ...:

$$\Pr\left\{\bigcup_i A_i\right\} = \sum_i \Pr{A_i}$$

Immediate consequences:

  • Pr{∅} = 0
  • If A ⊆ B, then Pr{A} ≤ Pr{B}
  • Pr{Ā} = 1 − Pr{A} (complement rule)
  • For any two events A and B:
    • Pr{A ∪ B} = Pr{A} + Pr{B} − Pr{A ∩ B}
    • Pr{A ∪ B} ≤ Pr{A} + Pr{B} (union bound / Boole's inequality)
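Inclusion–exclusion and Boole's inequality can be checked by exhaustive enumeration over a small sample space. A sketch using two fair dice, with the events A and B chosen arbitrarily for illustration:

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))   # two fair dice
w = Fraction(1, len(space))                    # uniform weight 1/36
pr = lambda ev: sum(w for s in space if ev(s))

A = lambda s: s[0] == 6        # first die shows a 6
B = lambda s: sum(s) >= 10     # total is at least 10

union = pr(lambda s: A(s) or B(s))
assert union == pr(A) + pr(B) - pr(lambda s: A(s) and B(s))  # inclusion-exclusion
assert union <= pr(A) + pr(B)                                # Boole's inequality
```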

Discrete Probability Distributions

  • A probability distribution is discrete if defined over a finite or countably infinite sample space.
  • For any event A:

$$\Pr{A} = \sum_{s \in A} \Pr{s}$$

  • Uniform probability distribution: Every elementary event s ∈ S has probability Pr{s} = 1/|S|. The experiment is described as "picking an element of S at random."

Continuous Uniform Probability Distribution

  • Defined over a closed interval [a, b] of the reals (a < b).
  • For any closed interval [c, d] where a ≤ c ≤ d ≤ b:

$$\Pr{[c, d]} = \frac{d - c}{b - a}$$

  • The probability of any single point is 0.

Conditional Probability and Independence

  • The conditional probability of event A given event B:

$$\Pr{A \mid B} = \frac{\Pr{A \cap B}}{\Pr{B}} \quad \text{whenever } \Pr{B} \neq 0$$

  • Two events are independent if:

$$\Pr{A \cap B} = \Pr{A} \cdot \Pr{B}$$

Equivalently (when Pr{B} ≠ 0): Pr{A | B} = Pr{A}.

  • A collection A₁, A₂, ..., Aₙ is pairwise independent if:

$$\Pr{A_i \cap A_j} = \Pr{A_i} \cdot \Pr{A_j} \quad \text{for all } 1 \leq i < j \leq n$$

  • The collection is (mutually) independent if every k-subset satisfies:

$$\Pr{A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}} = \Pr{A_{i_1}} \cdot \Pr{A_{i_2}} \cdots \Pr{A_{i_k}}$$

  • Key distinction: Pairwise independence does NOT imply mutual independence.
    • Example: Flipping two fair coins. Let A₁ = first coin heads, A₂ = second coin heads, A₃ = coins are different. These are pairwise independent but not mutually independent since Pr{A₁ ∩ A₂ ∩ A₃} = 0 ≠ 1/8 = Pr{A₁}·Pr{A₂}·Pr{A₃}.
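The two-coin example can be checked exhaustively. A sketch that enumerates the four outcomes with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips, each outcome with probability 1/4.
space = list(product("HT", repeat=2))
w = Fraction(1, len(space))

def pr(event):
    """Probability of an event given as a predicate on outcomes."""
    return sum(w for s in space if event(s))

A1 = lambda s: s[0] == "H"     # first coin heads
A2 = lambda s: s[1] == "H"     # second coin heads
A3 = lambda s: s[0] != s[1]    # coins show different faces

# Pairwise independent:
for X, Y in [(A1, A2), (A1, A3), (A2, A3)]:
    assert pr(lambda s: X(s) and Y(s)) == pr(X) * pr(Y)

# ...but not mutually independent:
triple = pr(lambda s: A1(s) and A2(s) and A3(s))
assert triple == 0
assert pr(A1) * pr(A2) * pr(A3) == Fraction(1, 8)
```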

Bayes's Theorem

From the definition of conditional probability:

$$\Pr{A \cap B} = \Pr{B} \cdot \Pr{A \mid B} = \Pr{A} \cdot \Pr{B \mid A}$$

Solving for Pr{A | B}:

$$\Pr{A \mid B} = \frac{\Pr{A} \cdot \Pr{B \mid A}}{\Pr{B}}$$

Expanded form (using the law of total probability for the denominator):

$$\Pr{A \mid B} = \frac{\Pr{A} \cdot \Pr{B \mid A}}{\Pr{A} \cdot \Pr{B \mid A} + \Pr{\bar{A}} \cdot \Pr{B \mid \bar{A}}}$$

  • Example: Choose one of two coins at random (one fair, one biased to always show heads). Flip it twice and get heads both times. What is the probability it's the biased coin?
    • Let A = biased coin chosen, B = two heads.
    • Pr{A} = 1/2, Pr{B | A} = 1, Pr{Ā} = 1/2, Pr{B | Ā} = 1/4
    • Pr{A | B} = (1/2 · 1) / (1/2 · 1 + 1/2 · 1/4) = 4/5
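The coin example above computes exactly with rational arithmetic; a minimal sketch:

```python
from fractions import Fraction

# Bayes's theorem for the biased-coin example.
pr_A      = Fraction(1, 2)   # biased coin chosen
pr_B_A    = Fraction(1, 1)   # biased coin always shows heads
pr_notA   = Fraction(1, 2)
pr_B_notA = Fraction(1, 4)   # fair coin: two heads = (1/2)^2

pr_A_given_B = (pr_A * pr_B_A) / (pr_A * pr_B_A + pr_notA * pr_B_notA)
assert pr_A_given_B == Fraction(4, 5)
```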

C.3 Discrete Random Variables

  • A (discrete) random variable X is a function from a finite or countably infinite sample space S to the real numbers.
  • For a random variable X and a real number x:

$$\Pr{X = x} = \sum_{s \in S : X(s) = x} \Pr{s}$$

  • The probability density function of X: f(x) = Pr{X = x}

    • Pr{X = x} ≥ 0 for all x
    • Σ_x Pr{X = x} = 1
  • The joint probability density function of X and Y: f(x, y) = Pr{X = x and Y = y}

  • Two random variables X and Y are independent if for all x and y:

$$\Pr{X = x \text{ and } Y = y} = \Pr{X = x} \cdot \Pr{Y = y}$$

Expected Value of a Random Variable

The expected value (or expectation or mean) of a discrete random variable X:

$$E[X] = \sum_x x \cdot \Pr{X = x}$$

Key properties:

  • Linearity of expectation (holds even if X and Y are NOT independent):

$$E[X + Y] = E[X] + E[Y]$$

  • Scaling:

$$E[aX] = a \cdot E[X]$$

  • General linearity:

$$E[aX + Y] = a \cdot E[X] + E[Y]$$

  • Product of independent variables: When X and Y are independent:

$$E[XY] = E[X] \cdot E[Y]$$

More generally, for mutually independent X₁, X₂, ..., Xₙ:

$$E[X_1 X_2 \cdots X_n] = E[X_1] \cdot E[X_2] \cdots E[X_n]$$

  • For a function g(x) applied to X:

$$E[g(X)] = \sum_x g(x) \cdot \Pr{X = x}$$

  • Natural number formula: When X takes values from ℕ = {0, 1, 2, ...}:

$$E[X] = \sum_{i=1}^{\infty} \Pr{X \geq i}$$

  • Jensen's inequality: For a convex function f:

$$E[f(X)] \geq f(E[X])$$
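The tail-sum formula and Jensen's inequality can both be checked against a fair six-sided die (an arbitrary choice), using exact rational arithmetic:

```python
from fractions import Fraction

# X = value of a fair six-sided die: Pr{X = x} = 1/6 for x in 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())             # definition of E[X]
tail = sum(sum(p for x, p in pmf.items() if x >= i)   # tail-sum formula
           for i in range(1, 7))
assert mean == tail == Fraction(7, 2)

# Jensen's inequality with the convex function f(x) = x^2.
assert sum(x * x * p for x, p in pmf.items()) >= mean ** 2
```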

Variance and Standard Deviation

  • The variance of a random variable X with mean E[X]:

$$\text{Var}[X] = E[(X - E[X])^2] = E[X^2] - E^2[X]$$

  • Rearranging: E[X²] = Var[X] + E²[X]

Key properties:

  • Var[aX] = a² · Var[X]
  • When X and Y are independent: Var[X + Y] = Var[X] + Var[Y]
  • For pairwise independent X₁, X₂, ..., Xₙ:

$$\text{Var}\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} \text{Var}[X_i]$$

  • The standard deviation of X is σ_X = √Var[X].
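Both expressions for the variance agree, as a check with the same fair-die distribution shows (Var[X] = 35/12 for a fair die):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die
E = lambda g: sum(g(x) * p for x, p in pmf.items())

mean = E(lambda x: x)
var_def = E(lambda x: (x - mean) ** 2)     # E[(X - E[X])^2]
var_alt = E(lambda x: x * x) - mean ** 2   # E[X^2] - E^2[X]
assert var_def == var_alt == Fraction(35, 12)
```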

C.4 The Geometric and Binomial Distributions

Both distributions arise from Bernoulli trials — experiments with only two outcomes:

  • Success with probability p
  • Failure with probability q = 1 − p

The trials are mutually independent and, unless stated otherwise, each has the same success probability p.

The Geometric Distribution

  • Models the number of trials needed to obtain the first success.
  • X has values in {1, 2, ...}, and for k ≥ 1:

$$\Pr{X = k} = q^{k-1} p$$

(k − 1 failures followed by one success)

  • Expected value:

$$E[X] = \frac{1}{p}$$

  • Variance:

$$\text{Var}[X] = \frac{q}{p^2}$$

  • Example: Rolling two dice until getting a 7 or 11. Probability of success p = 8/36 = 2/9. Expected rolls = 1/p = 9/2 = 4.5.
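The dice example can be checked numerically: partial sums of k·q^(k−1)·p converge to 1/p, and E[X²] − E[X]² converges to q/p². A sketch, truncating the series at 500 terms (beyond which q^(k−1) is negligibly small):

```python
p = 8 / 36   # roll a 7 or 11 with two fair dice
q = 1 - p

# Partial sum of k * q^(k-1) * p approaches E[X] = 1/p = 4.5.
expected = sum(k * q ** (k - 1) * p for k in range(1, 500))
assert abs(expected - 1 / p) < 1e-9
assert abs(1 / p - 4.5) < 1e-12

# Variance check: E[X^2] - E[X]^2 approaches q / p^2 = 15.75.
second_moment = sum(k * k * q ** (k - 1) * p for k in range(1, 500))
assert abs(second_moment - expected ** 2 - q / p**2) < 1e-6
```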

The Binomial Distribution

  • Models the number of successes in n Bernoulli trials.
  • X has values in {0, 1, ..., n}, and for k = 0, 1, ..., n:

$$\Pr{X = k} = \binom{n}{k} p^k q^{n-k}$$

  • Notation: b(k; n, p) = C(n,k) · p^k · (1−p)^(n−k)

  • The name "binomial" comes from the kth term of the expansion of (p + q)^n.

  • Normalization: Since p + q = 1:

$$\sum_{k=0}^{n} b(k; n, p) = 1$$

  • Expected value (derivable via linearity of expectation):

$$E[X] = np$$

Let Xᵢ be the indicator for success on trial i. Then E[Xᵢ] = p, and E[X] = Σ E[Xᵢ] = np.

  • Variance:

$$\text{Var}[X] = npq$$

Since Var[Xᵢ] = p(1−p) = pq, and trials are independent: Var[X] = Σ Var[Xᵢ] = npq.
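Normalization, mean, and variance can all be verified directly from the pmf; a sketch with arbitrarily chosen n and p:

```python
import math

n, p = 20, 0.3
q = 1 - p

def b(k):
    """Binomial pmf b(k; n, p)."""
    return math.comb(n, k) * p**k * q ** (n - k)

pmf = [b(k) for k in range(n + 1)]
mean = sum(k * w for k, w in enumerate(pmf))
var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))

assert abs(sum(pmf) - 1) < 1e-9       # normalization
assert abs(mean - n * p) < 1e-9       # E[X] = np
assert abs(var - n * p * q) < 1e-9    # Var[X] = npq
```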

Shape of the Binomial Distribution

The ratio of successive terms:

$$\frac{b(k; n, p)}{b(k-1; n, p)} = \frac{(n-k+1)p}{kq}$$

This ratio is > 1 when k < (n+1)p and < 1 when k > (n+1)p. Therefore:

  • The distribution increases for k < (n+1)p
  • The distribution decreases for k > (n+1)p
  • The maximum is attained at the unique integer k in the range np − q < k < (n+1)p (or at two adjacent values if (n+1)p is an integer).
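The location of the maximum is easy to confirm computationally. A sketch with arbitrary n and p (chosen so that (n+1)p is not an integer, making the maximizing k unique):

```python
import math

n, p = 25, 0.4
q = 1 - p
pmf = [math.comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]

mode = max(range(n + 1), key=lambda k: pmf[k])
# The maximizing k lies in the open interval (np - q, (n+1)p).
assert n * p - q < mode < (n + 1) * p
```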

Lemma C.1 (Upper Bound on Binomial Distribution)

For n ≥ 0, 0 < p < 1, q = 1 − p, and 0 ≤ k ≤ n:

$$b(k; n, p) \leq \left(\frac{np}{k}\right)^k \left(\frac{nq}{n-k}\right)^{n-k}$$


C.5 The Tails of the Binomial Distribution

The tails are the regions of b(k; n, p) far from the mean np. We bound the probability of having at least (or at most) k successes.

Right Tail Bound (Theorem C.2)

For 0 ≤ k ≤ n, the probability of at least k successes:

$$\Pr{X \geq k} = \sum_{i=k}^{n} b(i; n, p) \leq \binom{n}{k} p^k$$

Proof sketch: Uses Boole's inequality (union bound) over all k-subsets of trials.
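Theorem C.2 can be checked exhaustively for small n; a sketch with arbitrary n and p:

```python
import math

n, p = 30, 0.2
q = 1 - p

def b(k):
    """Binomial pmf b(k; n, p)."""
    return math.comb(n, k) * p**k * q ** (n - k)

for k in range(n + 1):
    tail = sum(b(i) for i in range(k, n + 1))   # Pr{X >= k}
    # Theorem C.2: Pr{X >= k} <= C(n, k) p^k.
    assert tail <= math.comb(n, k) * p**k * (1 + 1e-9)
```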

Left Tail Bound (Corollary C.3)

For 0 ≤ k ≤ n, the probability of at most k successes:

$$\Pr{X \leq k} \leq \binom{n}{n-k}(1-p)^{n-k} = \binom{n}{k}(1-p)^{n-k}$$

Exponential Decay of the Left Tail (Theorem C.4)

For 0 < k < np, the probability of fewer than k successes:

$$\Pr{X < k} < \frac{kq}{np - k} \cdot b(k; n, p)$$

Proof idea: The terms b(i; n, p) for i < k form a series bounded by a geometric series with ratio x = kq/((n−k)p) < 1.

Corollary C.5

For 0 < k ≤ np/2: the probability of fewer than k successes is less than half the probability of fewer than k+1 successes. This shows the left tail diminishes exponentially far from the mean.

Right Tail Analogues

  • Corollary C.6: For np < k < n:

$$\Pr{X > k} < \frac{(n-k)p}{k - np} \cdot b(k; n, p)$$

  • Corollary C.7: For (np + n)/2 < k < n: the probability of more than k successes is less than half the probability of more than k−1 successes.

Chernoff-type Bound (Theorem C.8)

For n Bernoulli trials where the ith trial has success probability pᵢ, let X be the total successes and μ = E[X] = Σpᵢ. Then for r > μ:

$$\Pr{X - \mu \geq r} \leq \left(\frac{e\mu}{r}\right)^r$$

Proof outline:

  1. Convert to an exponential: Pr{X − μ ≥ r} = Pr{e^(α(X−μ)) ≥ e^(αr)}
  2. Apply Markov's inequality: Pr{e^(α(X−μ)) ≥ e^(αr)} ≤ E[e^(α(X−μ))] / e^(αr)
  3. Bound E[e^(α(X−μ))] using independence and the inequality 1 + x ≤ e^x
  4. Choose α = ln(r/μ) to minimize the bound

Corollary C.9 (Binomial Right Tail — Chernoff Bound)

For n Bernoulli trials each with success probability p, and r > np:

$$\Pr{X - np \geq r} \leq \left(\frac{npe}{r}\right)^r$$

This follows directly from Theorem C.8 with μ = np.
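Corollary C.9 can be spot-checked numerically against the exact binomial tail; a sketch with arbitrary n and p (here μ = np is an integer, so Pr{X − np ≥ r} = Pr{X ≥ np + r} for integer r):

```python
import math

n, p = 50, 0.1
mu = n * p   # = 5.0

def b(k):
    """Binomial pmf b(k; n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# For each integer r > mu, compare Pr{X - np >= r} with (np e / r)^r.
for r in range(int(mu) + 1, n - int(mu) + 1):
    tail = sum(b(k) for k in range(int(mu) + r, n + 1))
    bound = (mu * math.e / r) ** r
    assert tail <= bound * (1 + 1e-9)
```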


Summary of Key Formulas

| Concept | Formula |
| --- | --- |
| k-strings over an n-set | n^k |
| Permutations of n elements | n! |
| k-permutations of an n-set | n!/(n−k)! |
| k-combinations (binomial coefficient) | n! / (k!(n−k)!) |
| Binomial expansion | (x+y)^n = Σ C(n,k) x^k y^(n−k) |
| Geometric distribution Pr{X=k} | q^(k−1) · p |
| Geometric expectation | 1/p |
| Geometric variance | q/p² |
| Binomial distribution b(k;n,p) | C(n,k) · p^k · q^(n−k) |
| Binomial expectation | np |
| Binomial variance | npq |
| Chernoff bound (right tail) | Pr{X−μ ≥ r} ≤ (eμ/r)^r |