The Role of Algorithms in Computing
Introduction to AlgorithmsChapter 1: The Role of Algorithms in Computing
1.1 Algorithms
What is an Algorithm?
- An algorithm is any well-defined computational procedure that takes some value, or set of values, as input and produces some value, or set of values, as output.
- An algorithm is a sequence of computational steps that transform the input into the output.
- An algorithm can also be viewed as a tool for solving a well-specified computational problem.
- The problem statement specifies the desired input/output relationship in general terms.
- The algorithm describes a specific computational procedure for achieving that relationship.
The Sorting Problem (Example)
- Input: A sequence of n numbers ⟨a₁, a₂, …, aₙ⟩.
- Output: A permutation (reordering) ⟨a'₁, a'₂, …, a'ₙ⟩ of the input sequence such that a'₁ ≤ a'₂ ≤ … ≤ a'ₙ.
- Example: Given ⟨31, 41, 59, 26, 41, 58⟩, a sorting algorithm returns ⟨26, 31, 41, 41, 58, 59⟩.
- An instance of a problem consists of the input (satisfying whatever constraints are imposed in the problem statement) needed to compute a solution.
Correctness
- An algorithm is correct if, for every input instance, it halts with the correct output.
- A correct algorithm solves the given computational problem.
- An incorrect algorithm might not halt at all on some inputs, or might halt with an incorrect answer.
- Incorrect algorithms can sometimes be useful if we can control their error rate (e.g., algorithms for finding large prime numbers in Chapter 31).
Specifying Algorithms
- An algorithm can be specified in English, as a computer program, or even as a hardware design.
- The only requirement is that the specification must provide a precise description of the computational procedure.
What Kinds of Problems Are Solved by Algorithms?
Practical applications of algorithms are ubiquitous:
- The Human Genome Project — Identifying genes in human DNA, determining sequences of billions of chemical base pairs, storing information in databases, and developing tools for data analysis. Each step requires sophisticated algorithms.
- The Internet — Managing and manipulating large volumes of data. Key algorithmic problems include finding good data routes (Chapter 24) and search engine indexing (Chapters 11 and 32).
- Electronic Commerce — Depends on privacy of personal information. Core technologies include public-key cryptography and digital signatures (Chapter 31), based on numerical algorithms and number theory.
- Resource Allocation — Manufacturing and commercial enterprises allocate scarce resources optimally. Examples: oil well placement, campaign advertising, airline crew scheduling, ISP resource placement. These are solved using linear programming (Chapter 29).
Specific Algorithmic Problems
- Shortest Path: Given a road map with distances, determine the shortest route between two intersections. Modeled as a graph problem — finding the shortest path from one vertex to another (Chapter 24).
- Longest Common Subsequence: Given two sequences X and Y, find a longest common subsequence. Uses dynamic programming (Chapter 15). Brute force would examine 2ᵐ and 2ⁿ possible subsequences — prohibitively slow.
- Topological Sorting: Given a mechanical design with parts that depend on other parts, list them in order so each part appears before any part that uses it. There are n! possible orders. Solved efficiently in Chapter 22.
- Convex Hull: Given n points in the plane, find the smallest convex polygon containing all points. Of the 2ⁿ subsets of points, we must determine which are vertices and in what order. Chapter 33 gives two good methods.
Common Characteristics of Algorithmic Problems
- They have many candidate solutions, the overwhelming majority of which do not solve the problem. Finding one that does, or one that is "best," can be a challenge.
- They have practical applications (e.g., shortest paths for transportation firms, Internet routing, GPS driving directions).
Discrete Fourier Transform
- Not every problem has an easily identified set of candidate solutions.
- The discrete Fourier transform converts the time domain to the frequency domain, producing numerical coefficients that determine the strength of various frequencies in a sampled signal.
- Applications: signal processing, data compression, multiplying large polynomials and integers.
- The fast Fourier transform (FFT) is an efficient algorithm for this problem (Chapter 30).
Data Structures
- A data structure is a way to store and organize data in order to facilitate access and modifications.
- No single data structure works well for all purposes — it is important to know the strengths and limitations of several of them.
Technique
- This book teaches techniques of algorithm design and analysis so you can:
- Develop algorithms on your own
- Show that they give the correct answer
- Understand their efficiency
- Some chapters address specific problems (medians in Ch. 9, minimum spanning trees in Ch. 23, maximum flow in Ch. 26).
- Other chapters address techniques: divide-and-conquer (Ch. 4), dynamic programming (Ch. 15), amortized analysis (Ch. 17).
Hard Problems (NP-Completeness)
- Most of the book is about efficient algorithms. The usual measure of efficiency is speed.
- Some problems have no known efficient solution — an interesting subset of these are called NP-complete problems (Chapter 34).
- Key properties of NP-complete problems:
- No efficient algorithm has ever been found for any NP-complete problem, but nobody has proven that an efficient algorithm cannot exist.
- If an efficient algorithm exists for any one NP-complete problem, then efficient algorithms exist for all of them.
- Several NP-complete problems are similar (but not identical) to problems that do have efficient algorithms. A small change in the problem statement can cause a big change in the efficiency of the best known algorithm.
- Practical importance: NP-complete problems arise surprisingly often in real applications. If you can show a problem is NP-complete, you can focus on developing an approximation algorithm that gives a good (but not necessarily optimal) solution.
- Traveling-Salesman Problem: A delivery company wants to find the order of delivery stops that minimizes total distance, returning to the depot at the end. This is NP-complete — no known efficient algorithm. However, approximation algorithms (Chapter 35) can give distances not too far above the optimal.
Parallelism
- Physical limitations (power density increases superlinearly with clock speed) prevent ever-increasing clock speeds — chips risk melting.
- Modern chips contain multiple processing cores — multicore computers are a type of parallel computer.
- To get the best performance from multicore computers, algorithms must be designed with parallelism in mind.
- Chapter 27 presents a model for multithreaded algorithms that take advantage of multiple cores.
1.2 Algorithms as a Technology
Why Algorithms Matter Even with Infinite Resources
- Even if computers were infinitely fast and memory were free, you would still need algorithms to demonstrate that your solution terminates and produces the correct answer.
- In reality, computing time is a bounded resource, and so is memory space. Efficient algorithms help use these resources wisely.
Efficiency
- Different algorithms for the same problem often differ dramatically in efficiency.
- These differences can be much more significant than differences due to hardware and software.
Insertion Sort vs. Merge Sort (Concrete Example)
- Insertion sort running time: roughly c₁n² (proportional to n²)
- Merge sort running time: roughly c₂n lg n (where lg n = log₂ n)
- Insertion sort typically has a smaller constant factor (c₁ < c₂), so it runs faster for small input sizes.
- Once input size n becomes large enough, merge sort's lg n vs. n factor more than compensates for the difference in constants.
- There will always be a crossover point beyond which merge sort is faster.
The Dramatic Comparison
| Computer A (Insertion Sort) | Computer B (Merge Sort) | |
|---|---|---|
| Speed | 10 billion instructions/sec | 10 million instructions/sec |
| Code | 2n² instructions (machine language, expert programmer) | 50n lg n instructions (high-level language, average programmer) |
| Sorting 10⁷ numbers | 20,000 seconds (~5.5 hours) | ~1,163 seconds (~20 minutes) |
- Computer B (1000× slower hardware) running merge sort is 17× faster than Computer A running insertion sort.
- For 100 million numbers: insertion sort takes more than 23 days, merge sort takes under 4 hours.
- As problem size increases, the relative advantage of the better algorithm grows.
Algorithms and Other Technologies
- Algorithms should be considered a technology, just like hardware.
- Total system performance depends on choosing efficient algorithms as much as on choosing fast hardware.
- Other advanced technologies (architectures, GUIs, object-oriented systems, web technologies, networking) are important, but many applications still require algorithmic content.
- Example: A web-based mapping/directions service relies on fast hardware, GUIs, and networking — but also requires algorithms for finding routes (shortest-path), rendering maps, and interpolating addresses.
- Even applications without explicit algorithmic content at the application level rely heavily on algorithms underneath:
- Hardware design uses algorithms
- GUI design relies on algorithms
- Network routing relies on algorithms
- Compilers, interpreters, and assemblers all make extensive use of algorithms
- Algorithms are at the core of most technologies used in contemporary computers.
- A solid base of algorithmic knowledge and technique is what separates truly skilled programmers from novices.
Next chapter
Getting Started