Overview
What is COS 226? Intermediate-level survey course. Programming and problem solving with applications. Algorithm: method for solving a problem. Data structure: method to store information.
! ! ! !
Topic
Data Structures and Algorithms stack, queue, list, union-find, priority queue quicksort, mergesort, heapsort, radix sorts hash table, BST, red-black tree, B-tree DFS, Prim, Kruskal, Dijkstra, Ford-Fulkerson KMP, Rabin-Karp, TST, Huffman, LZW Graham scan, k-d tree, Voronoi diagram
Algorithms and Data Structures Princeton University Fall 2005 Kevin Wayne
1
data types sorting searching graphs strings geometry
A misperception: algiros [painful] + arithmos [number].
2
Impact of Great Algorithms
Internet. Packet routing, Google, Akamai. Biology. Human genome project, protein folding. Computers. Circuit layout, file system, compilers. Secure communications. Cell phones, e-commerce. Computer graphics. Hollywood movies, video games. Multimedia. CD player, DVD, MP3, JPG, DivX, HDTV. Transportation. Airline crew scheduling, map routing. Physics. N-body simulation, particle collision simulation. Information processing. Database search, data compression. ...
"For me, great algorithms are the poetry of computation. Just like verse, they can be terse, allusive, dense, and even mysterious. But once unlocked, they cast a brilliant new light on some aspect of computing." - Francis Sullivan
Why Study Algorithms?
Using a computer? Want it to go faster? Process more data? Want it to do something that would otherwise be impossible?
! !
Algorithms as a field of study. Old enough that basics are known. New enough that new discoveries arise. Burgeoning application areas. Philosophical implications.
! ! ! !
The Usual Suspects
Lectures. Kevin Wayne (Kevin) MW 11-12:20, Bowen 222.
!
Coursework and Grading
Regular programming assignments: 45% Due 11:59pm, starting 9/26. More details next lecture.
! !
Precepts. Harlan Yu (Harlan), Keith Morley (Keith) T 12:30, Friend 110. T 3:30, Friend 111. Clarify programming assignments, exercises, lecture material. First precept meets 9/20.
! ! ! !
Weekly written exercises: 15% Due at beginning of Thursday lecture, starting 9/22.
!
Exams: Closed book with cheatsheet. Midterm. 15% Final. 25%
! ! !
Staff discretion. Adjust borderline cases.
Course Materials
Course web page. http://www.princeton.edu/~cos226 Syllabus. Exercises. Lecture slides. Programming assignments.
! ! ! !
Union Find
Algorithms in Java, 3rd edition. Parts 1-4. (sorting, searching) Part 5. (graph algorithms)
! !
Algorithms in C, 2nd edition. Strings and geometry handouts.
!
Reference: Chapter 1, Algorithms in Java, 3rd Edition, Robert Sedgewick.
Robert Sedgew i ck and Kevi n Wayne
Copyri ght 2005
http://w w w .Pri nceton.EDU/~cos226
Network Connectivity
1
An Example Problem: Network Connectivity
Network connectivity. Nodes at grid points. Add connections between pairs of nodes. Is there a path from node A to node B?
! ! !
in 3 4 4 9 8 0 2 3 5 6 2 9 5 9 7 3 4 8 5 6 0 2 6 1
out 3 4 4 9 8 0 2 3 5 6
evidence
6 (234-9)
5 9 7 3 4 8 (5-6) (23-48-0) 6 1 0 7 8 2 3 4
A
B
22 23
Union-Find Abstraction
What are critical operations we need to support? Objects.
!
Union-Find Abstraction
What are critical operations we need to support? Objects. Disjoint sets of objects. Find: are two objects in the same set? Union: replace sets containing two items by their union.
! ! ! !
grid points
Disjoint sets of objects.
0 1 2-3-9 5-6 7 4-8
subsets of connected grid points
Find: are objects 2 and 9 in the same set?
0 1 2-3-9 5-6 7 4-8
are two grid points connected?
Goal. Design efficient data structure for union and find. Number of operations M can be huge. Number of objects N can be huge.
! !
Union: merge sets containing 3 and 8.
0 1 2-3-4-8-9 7 8-4
add a connection between two grid points
24 25
Objects
Applications involve manipulating objects of all types. Variable name aliases. Pixels in a digital photo. Computers in a network. Web pages on the Internet. Transistors in a computer chip. Metallic sites in a composite system.
! ! ! ! ! !
Quick-Find [eager approach]
Data structure. Integer array id[] of size N. Interpretation: p and q are connected if they have the same id.
! !
i 0 id[i] 0
1 1
2 9
3 9
4 9
5 6
6 6
7 7
8 8
9 9
5 and 6 are connected 2, 3, 4, and 9 are connected
When programming, convenient to name them 0 to N-1. Details not relevant to union-find. Integers allow quick-access to object-related info (array indices).
! !
Find. Check if p and q have the same id.
id[3] = 9; id[6] = 6 3 and 6 not connected
Union. To merge components containing p and q, change all entries with id[p] to id[q].
i 0 id[i] 0 1 1 2 6 3 6 4 6 5 6 6 6 7 7 8 8 9 6
union of 3 and 6 2, 3, 4, 5, 6, and 9 are connected
many values can change
26 28
Quick-Find
Quick-Find: Java Implementation
3-4 4-9 8-0 2-3 5-6 5-9 7-3 4-8 6-1
0 1 2 4 4 5 6 7 8 9 0 1 2 9 9 5 6 7 8 9 0 1 2 9 9 5 6 7 0 9 0 1 9 9 9 5 6 7 0 9 0 1 9 9 9 6 6 7 0 9 0 1 9 9 9 9 9 7 0 9 0 1 9 9 9 9 9 9 0 9 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
29
public class QuickFind { private int[] id; public QuickFind(int N) { id = new int[N]; for (int i = 0; i < N; i++) id[i] = i; } public boolean find(int p, int q) { return id[p] == id[q]; } public void unite(int p, int q) { int pid = id[p]; for (int i = 0; i < id.length; i++) if (id[i] == pid) id[i] = id[q]; } }
set id of each object to itself
1 operation
N operations
30
Problem Size and Computation Time
Rough standard for 2000. 109 operations per second. 109 words of main memory. Touch all words in approximately 1 second. [unchanged since 1950!]
! ! !
Quick-Union
Data structure. Integer array id[] of size N. Interpretation: id[x] is parent of x. Root of x is id[id[id[...id[x]...]]].
! ! !
keep going until it doesn't change
Ex. Huge problem for quick find. 1010 edges connecting 109 nodes. Quick-find might take 1020 operations. [10 ops per query] 3,000 years of computer time!
! ! !
i 0 id[i] 0
1 1
2 9
3 4
4 9
5 6
6 6
7 7
8 8
9 9
1 2
9 4 3
6 5
Find. Check if p and q have the same root.
Paradoxically, quadratic algorithms get worse with newer equipment. New computer may be 10x as fast. But, has 10x as much memory so problem may be 10x bigger. With quadratic algorithm, takes 10x as long!
! ! !
3's root is 9; 5's root is 6 3 and 5 are not connected
Union. Set the id of q's root to the id of p's root.
i 0 id[i] 0 1 1 2 9 3 4 4 9 5 6 6 9 7 7 8 8 9 9
0 1 2
only one value changes
31
9 4
p
7 6 5
q
33
Quick-Union
Quick-Union: Java Implementation
public class QuickUnion { private int[] id; public QuickUnion(int N) { id = new int[N]; for (int i = 0; i < N; i++) id[i] = i; } private int root(int x) { while (x != id[x]) x = id[x]; return x; } public boolean find(int p, int q) { return root(p) == root(q); } public void unite(int p, int q) { int i = root(p), j = root(q); if (i == j) return; id[i] = j; } }
34 35
3-4 4-9 8-0 2-3 5-6 5-9 7-3 4-8 6-1
0 1 2 4 4 5 6 7 8 9 0 1 2 4 9 5 6 7 8 9 0 1 2 4 9 5 6 7 0 9 0 1 9 4 9 5 6 7 0 9 0 1 9 4 9 6 6 7 0 9 0 1 9 4 9 6 9 7 0 9 0 1 9 4 9 6 9 9 0 9 0 1 9 4 9 6 9 9 0 0 1 1 9 4 9 6 9 9 0 0
time proportional to depth of x
time proportional to depth of p and q
time proportional to depth of p and q
Summary
Quick-find defect. Union too expensive. Trees are flat, but too hard to keep them flat.
! !
Weighted Quick-Union
Weighted quick-union. Modify quick-union to avoid tall trees. Keep track of size of each component. Balance by linking small tree below large one.
! ! !
Quick-union defect. Finding the root can be expensive. Trees can get tall.
! !
Ex: union of 5 and 3. Quick union: link 9 to 6. Weighted quick union: link 6 to 9.
! !
Data Structure Quick-find Quick-union
Union N 1
Find 1 N
size
1 2
9 4 3
q
6 5
p
union of two root nodes
36
37
Weighted Quick-Union
Weighted Quick-Union: Java Implementation
Java implementation. Almost identical to quick-union. Maintain extra array sz[] to count number of elements in the tree rooted at i.
! !
3-4 4-9 8-0 2-3 5-6 5-9 7-3 4-8 6-1
0 1 2 3 3 5 6 7 8 9 0 1 2 3 3 5 6 7 8 3 8 1 2 3 3 5 6 7 8 3 8 1 3 3 3 5 6 7 8 3 8 1 3 3 3 5 5 7 8 3 8 1 3 3 3 3 5 7 8 3 8 1 3 3 3 3 5 3 8 3 8 1 3 3 3 3 5 3 3 3 8 3 3 3 3 3 5 3 3 3
38
Find. Identical to quick-union. Union. Same as quick-union, but merge smaller tree into the larger tree, and update the sz[] array.
if (sz[i] < sz[j]) { id[i] = j; sz[j] += sz[i]; } else { id[j] = i; sz[i] += sz[j]; }
39
Weighted Quick-Union: Analysis
Analysis. Find: takes time proportional to depth of p and q. Union: takes constant time, given roots. Fact: depth is at most 1 + lg N. [needs proof]
! ! !
Path Compression
Path compression. Just after computing the root of x, set id of each examined node to root(x).
0 Data Structure Quick-find Quick-union Weighted QU Union N 1
Find 1 N lg N 6 3
lg N
10
11
Stop at guaranteed acceptable performance? No, can improve further.
8 9
root(9)
10
40
11
41
Weighted Quick-Union with Path Compression
Path compression. Standard implementation: add second loop to root to set the id of each examined node to the root. Simpler one-pass variant: make each examined node point to its grandparent.
! !
Weighted Quick-Union with Path Compression
3-4 4-9 8-0 2-3
0 1 2 3 3 5 6 7 8 9 0 1 2 3 3 5 6 7 8 3 8 1 2 3 3 5 6 7 8 3 8 1 3 3 3 5 6 7 8 3 8 1 3 3 3 5 5 7 8 3 8 1 3 3 3 3 5 7 8 3 8 1 3 3 3 3 5 3 8 3 8 1 3 3 3 3 5 3 3 3 8 3 3 3 3 3 3 3 3 3
public int root(int x) { while (x != id[x]) { id[x] = id[id[x]]; x = id[x]; } return x; }
only one extra line of code !
5-6 5-9 7-3 4-8
In practice. No reason not to! Keeps tree almost completely flat.
6-1
42
43
Weighted Quick-Union with Path Compression
Theorem. A sequence of M union and find operations on N elements takes O(N + M lg* N) time. Proof is very difficult. But the algorithm is still simple!
! !
Context
Ex. Huge practical problem. 1010 edges connecting 109 nodes. WQUPC reduces time from 3,000 years to 1 minute. Supercomputer wouldn't help much. Good algorithm makes solution possible.
! ! ! !
N 2 4 16 65536 265536
lg* N 1 2 3 4 5
Remark. lg* N is a constant in this universe.
Bottom line. WQUPC on cell phone beats QF on supercomputer!
Linear algorithm? Cost within constant factor of reading in the data. Theory: WQUPC is not quite linear. Practice: WQUPC is linear.
! ! !
Algorithm Quick-find Quick-union Weighted QU M union-find ops on a set of N elements Path compression Weighted + path
Time MN MN N + M log N N + M log N 5 (M + N)
44
45
Other Applications
Applications
Union-find applications. Hex. Percolation. Connectivity. Image processing. Least common ancestor. Equivalence of finite state automata. Hinley-Milner polymorphic type inference. Kruskal's minimum spanning tree algorithm. Compiling equivalence statements in Fortran. Scheduling unit-time tasks to P processors so that each job finishes between its release time and deadline.
! ! ! ! ! ! ! ! ! !
Robert Sedgew i ck and Kevi n Wayne
Copyri ght 2005
http://w w w .Pri nceton.EDU/~cos226
47
Hex
Hex. [Piet Hein 1942, John Nash 1948, Parker Brothers 1962] Two players alternate in picking a cell in a hex grid. Black: make a black path from upper left to lower right. White: make a white path from lower left to upper right.
! ! !
Percolation
Percolation phase-transition. Two parallel conducting bars (top and bottom). Electricity flows from a site to one of its 4 neighbors if both are occupied by conductors. Suppose each site is randomly chosen to be a conductor or insulator with probability p.
! ! !
top
0 2
0 3
0 4 0
0 0 0
0 6 0
0 0 0
0 8
0 9
0 0 0 0 49 1 1 insulator
10 11 12
14 15
20 21 22 23 24
14 14 28 29 30 31 32 33 34 35 36 14 39 40
Reference: http://mathw orld.w olfram.com/GameofHex.html
1 1 1
42 43 32 45 46 54 55 56 57 58 1 1 1 1 1
1 1 1
1 1 1
50 1
1 1
52 1
Goal. Algorithm to detect when a player has won.
48
bottom
49
Percolation
Q. What is percolation threshold p* at which charge carriers can percolate from top to bottom? A. ~ 0.592746 for square lattices. [constant only known via simulation]
Summary
Lessons. Simple algorithms can be very useful. Start with brute force approach. don't use for large problems might be nontrivial to analyze can't use for huge problems Strive for worst-case performance guarantees. Identify fundamental abstractions: union-find. Apply to many domains.
! ! ! ! !
top
0 2
0 3
0 4 0
0 0 0
0 6 0
0 0 0
0 8
0 9
0 0 0 0 49 1 1 insulator
10 11 12
14 15
20 21 22 23 24
14 14 28 29 30 31 32 33 34 35 36 14 39 40 50 1 1 1 52 1 1 1 1 42 43 32 45 46 54 55 56 57 58 1 1 1 1 1 1 1 1 1 1 1
bottom
50
51