
CSE408
Floyd and Warshall
Lecture # 22
All-Pairs Shortest Path Problem

Suppose we are given a directed graph G = (V,E) and a weight function w: E → R.
We assume that G does not contain cycles of weight 0 or less.

The All-Pairs Shortest Path Problem asks to find the length of the shortest path between any pair of vertices in G.
Quick Solutions

If the weight function is nonnegative for all edges, then we can use Dijkstra's single-source shortest path algorithm for all vertices to solve the problem.

This yields an O(n³) algorithm on graphs with n vertices (on dense graphs and with a simple implementation).
Quick Solution

For arbitrary weight functions, we can use the Bellman-Ford algorithm applied to all vertices. This yields an O(n⁴) algorithm for graphs with n vertices.
Floyd-Warshall

We will now investigate a dynamic programming solution that solves the problem in O(n³) time for a graph with n vertices. This algorithm is known as the Floyd-Warshall algorithm, but it was apparently described earlier by Roy.
Representation of the Input

We assume that the input is represented by a weight matrix W = (w_ij), i, j ∈ V, that is defined by

  w_ij = 0        if i = j
  w_ij = w(i,j)   if i ≠ j and (i,j) ∈ E
  w_ij = ∞        if i ≠ j and (i,j) ∉ E
Format of the Output

If the graph has n vertices, we return a distance matrix (d_ij), where d_ij is the length of the shortest path from i to j.
Intermediate Vertices

Without loss of generality, we will assume that V = {1,2,…,n}, i.e., that the vertices of the graph are numbered from 1 to n.

Given a path p = (v1, v2, …, vm) in the graph, we will call the vertices vk with index k in {2,…,m−1} the intermediate vertices of p.
Key Definition

The key to the Floyd-Warshall algorithm is the following definition:

Let d_ij(k) denote the length of the shortest path from i to j such that all intermediate vertices are contained in the set {1,…,k}.
Remark 1

A shortest path does not contain any vertex twice, as this would imply that the path contains a cycle. By assumption, cycles in the graph have positive weight, so removing the cycle would result in a shorter path, which is impossible.
Remark 2

Consider a shortest path p from i to j such that the intermediate vertices are from the set {1,…,k}.
• If the vertex k is not an intermediate vertex on p, then d_ij(k) = d_ij(k−1).
• If the vertex k is an intermediate vertex on p, then d_ij(k) = d_ik(k−1) + d_kj(k−1).
In either case, the subpaths contain only vertices from {1,…,k−1}.
Remark 2

Therefore, we can conclude that

  d_ij(k) = min{ d_ij(k−1), d_ik(k−1) + d_kj(k−1) }
Recursive Formulation

If we do not use intermediate nodes, i.e., when k = 0, then
  d_ij(0) = w_ij
If k > 0, then
  d_ij(k) = min{ d_ij(k−1), d_ik(k−1) + d_kj(k−1) }
The Floyd-Warshall Algorithm

Floyd-Warshall(W)
  n = # of rows of W;
  D(0) = W;
  for k = 1 to n do
    for i = 1 to n do
      for j = 1 to n do
        d_ij(k) = min{ d_ij(k−1), d_ik(k−1) + d_kj(k−1) };
      od;
    od;
  od;
  return D(n);
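The pseudocode maps almost line for line onto real code. Below is a minimal C++ sketch (the function name floydWarshall and the INF sentinel are illustrative assumptions, not from the slides); it already uses the single-matrix, in-place variant whose space savings are discussed on the next slide.

#include <vector>
#include <algorithm>
#include <limits>

// Minimal sketch of the Floyd-Warshall recurrence above, using a single
// n x n distance matrix updated in place (the O(n^2)-space variant).
// INF stands in for "no edge"; dividing by 4 guards the addition below
// against overflow when two "infinite" entries are summed.
const long long INF = std::numeric_limits<long long>::max() / 4;

std::vector<std::vector<long long>> floydWarshall(std::vector<std::vector<long long>> d) {
    int n = d.size();
    for (int k = 0; k < n; ++k)             // allow vertex k as an intermediate
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)     // d_ij(k) = min{ d_ij(k-1), d_ik(k-1) + d_kj(k-1) }
                d[i][j] = std::min(d[i][j], d[i][k] + d[k][j]);
    return d;
}

The in-place update is safe because row k and column k do not change during round k.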
Time and Space Requirements

The running time is obviously O(n³).

However, in this version, the space requirements are high. One can reduce the space from O(n³) to O(n²) by using a single array d.
CSE408
Optimal Binary Search Tree and Knapsack Problem
Lecture # 23
Optimal binary search trees

• e.g. binary search trees for the keys 3, 7, 9, 12:

[Figure: four different binary search trees (a)-(d) on the keys 3, 7, 9, 12]
Optimal binary search trees

• n identifiers: a1 < a2 < a3 < … < an
• Pi, 1 ≤ i ≤ n: the probability that ai is searched.
• Qi, 0 ≤ i ≤ n: the probability that x is searched, where ai < x < ai+1 (a0 = −∞, an+1 = ∞).

  Σ_{i=1..n} Pi + Σ_{i=0..n} Qi = 1
• Identifiers: 4, 5, 8, 10, 11, 12, 14
• Internal node: successful search, Pi
• External node: unsuccessful search, Qi

[Figure: BST with root 10; internal nodes hold the identifiers, external nodes E0…E7 mark the gaps for unsuccessful searches]
◼ The expected cost of a binary search tree:

  Σ_{i=1..n} Pi · level(ai) + Σ_{i=0..n} Qi · (level(Ei) − 1)

◼ The level of the root: 1
The dynamic programming approach

• Let C(i, j) denote the cost of an optimal binary search tree containing ai,…,aj.
• The cost of the optimal binary search tree with ak as its root:

  C(1, n) = min_{1≤k≤n} { Pk + Q0 + Σ_{i=1..k−1}(Pi + Qi) + C(1, k−1)
                              + Qk + Σ_{i=k+1..n}(Pi + Qi) + C(k+1, n) }

[Figure: tree with root ak; the left subtree holds a1…ak−1 (weights P1…Pk−1, Q0…Qk−1, cost C(1, k−1)), the right subtree holds ak+1…an (weights Pk+1…Pn, Qk…Qn, cost C(k+1, n))]
General formula

  C(i, j) = min_{i≤k≤j} { Pk + Qi−1 + Σ_{m=i..k−1}(Pm + Qm) + C(i, k−1)
                              + Qk + Σ_{m=k+1..j}(Pm + Qm) + C(k+1, j) }

          = min_{i≤k≤j} { C(i, k−1) + C(k+1, j) + Qi−1 + Σ_{m=i..j}(Pm + Qm) }
Computation relationships of subtrees

• e.g. n=4
C(1,4)

C(1,3) C(2,4)

C(1,2) C(2,3) C(3,4)


• Time complexity: O(n³)
  When j − i = m, there are (n − m) C(i, j)'s to compute.
  Each C(i, j) with j − i = m can be computed in O(m) time.

  Σ_{1≤m≤n} O(m(n − m)) = O(n³)
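As an illustration of how the recurrence and the O(n³) bound translate into code, here is a minimal C++ sketch (the names optimalBSTCost, C, W and the indexing convention are assumptions, not from the slides; W(i,j) caches the weight term Qi−1 + Σ_{m=i..j}(Pm + Qm)).

#include <vector>
#include <algorithm>
#include <limits>

// Sketch of the O(n^3) optimal-BST recurrence above (illustrative names):
// P[1..n] are the success probabilities, Q[0..n] the failure probabilities.
double optimalBSTCost(const std::vector<double>& P, const std::vector<double>& Q) {
    int n = (int)P.size() - 1;                  // keys are 1..n (P[0] unused)
    // C[i][j] = cost of an optimal BST on a_i..a_j; C[i][i-1] = 0 for empty ranges
    std::vector<std::vector<double>> C(n + 2, std::vector<double>(n + 1, 0.0));
    std::vector<std::vector<double>> W(n + 2, std::vector<double>(n + 1, 0.0));
    for (int i = 1; i <= n + 1; ++i) W[i][i - 1] = Q[i - 1];   // weight of empty range
    for (int m = 1; m <= n; ++m)                // range length, as in the O(n^3) count
        for (int i = 1; i + m - 1 <= n; ++i) {
            int j = i + m - 1;
            W[i][j] = W[i][j - 1] + P[j] + Q[j];
            C[i][j] = std::numeric_limits<double>::max();
            for (int k = i; k <= j; ++k)        // try each a_k as the root
                C[i][j] = std::min(C[i][j], C[i][k - 1] + C[k + 1][j] + W[i][j]);
        }
    return C[1][n];
}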
Knapsack problem

There are two versions of the problem:

1. "0-1 knapsack problem"
   • Items are indivisible; you either take an item or not. Some special instances can be solved with dynamic programming.

2. "Fractional knapsack problem"
   • Items are divisible: you can take any fraction of an item.
0-1 Knapsack problem

• Given a knapsack with maximum capacity W, and a set S consisting of n items
• Each item i has some weight wi and benefit value bi (all wi and W are integer values)
• Problem: How to pack the knapsack to achieve maximum total value of packed items?
0-1 Knapsack problem

• Problem, in other words, is to find

  max Σ_{i∈T} bi  subject to  Σ_{i∈T} wi ≤ W

The problem is called a "0-1" problem, because each item must be entirely accepted or rejected.
0-1 Knapsack problem: brute-force approach

Let's first solve this problem with a straightforward algorithm:
• Since there are n items, there are 2ⁿ possible combinations of items.
• We go through all combinations and find the one with maximum value and with total weight less than or equal to W.
• Running time will be O(2ⁿ).
0-1 Knapsack problem: dynamic programming approach

• We can do better with an algorithm based on dynamic programming

• We need to carefully identify the subproblems


Defining a Subproblem

• Given a knapsack with maximum capacity W, and a set S consisting of n items
• Each item i has some weight wi and benefit value bi (all wi and W are integer values)
• Problem: How to pack the knapsack to achieve maximum total value of packed items?
Defining a Subproblem

• We can do better with an algorithm based on dynamic programming
• We need to carefully identify the subproblems

Let's try this:
If items are labeled 1..n, then a subproblem would be to find an optimal solution for
Sk = {items labeled 1, 2, …, k} (a sketch of the table that completes this idea follows below)
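The slides stop at the subproblem definition; a standard way to complete it, shown here only as a hedged sketch (the table name B and the function knapsack01 are assumptions, not from the slides), is B[k][w] = the maximum benefit achievable using items 1..k with capacity w.

#include <vector>
#include <algorithm>

// Illustrative sketch completing the subproblem S_k above (an assumption,
// not the slides' own program): B[k][w] = max benefit using items 1..k
// with total weight at most w.
int knapsack01(const std::vector<int>& wt, const std::vector<int>& b, int W) {
    int n = wt.size();
    std::vector<std::vector<int>> B(n + 1, std::vector<int>(W + 1, 0));
    for (int k = 1; k <= n; ++k)
        for (int w = 0; w <= W; ++w) {
            B[k][w] = B[k - 1][w];                          // item k not taken
            if (wt[k - 1] <= w)                             // or taken, if it fits
                B[k][w] = std::max(B[k][w], B[k - 1][w - wt[k - 1]] + b[k - 1]);
        }
    return B[n][W];                                         // O(nW) time and space
}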
CSE408
Matrix Chain Multiplication
Lecture # 24
Matrix-chain Multiplication

• Suppose we have a sequence or chain A1, A2, …, An of n matrices to be multiplied
  – That is, we want to compute the product A1A2…An

• There are many possible ways (parenthesizations) to compute the product
Matrix-chain Multiplication …contd

• Example: consider the chain A1, A2, A3, A4 of 4 matrices
  – Let us compute the product A1A2A3A4
• There are 5 possible ways:
  1. (A1(A2(A3A4)))
  2. (A1((A2A3)A4))
  3. ((A1A2)(A3A4))
  4. ((A1(A2A3))A4)
  5. (((A1A2)A3)A4)
Matrix-chain Multiplication …contd

• To compute the number of scalar multiplications necessary, we must know:
  – Algorithm to multiply two matrices
  – Matrix dimensions

• Can you write the algorithm to multiply two matrices?
Algorithm to Multiply 2 Matrices
Input: Matrices Ap×q and Bq×r (with dimensions p×q and q×r)
Result: Matrix Cp×r resulting from the product A·B

MATRIX-MULTIPLY(Ap×q , Bq×r)
1. for i ← 1 to p
2. for j ← 1 to r
3. C[i, j] ← 0
4. for k ← 1 to q
5. C[i, j] ← C[i, j] + A[i, k] · B[k, j]
6. return C
Scalar multiplication in line 5 dominates the time to compute C.
Number of scalar multiplications = pqr
Matrix-chain Multiplication …contd

• Example: Consider three matrices A (10×100), B (100×5), and C (5×50)
• There are 2 ways to parenthesize:
  – ((AB)C) = D (10×5) · C (5×50)
    • AB ⟹ 10·100·5 = 5,000 scalar multiplications
    • DC ⟹ 10·5·50 = 2,500 scalar multiplications      Total: 7,500
  – (A(BC)) = A (10×100) · E (100×50)
    • BC ⟹ 100·5·50 = 25,000 scalar multiplications
    • AE ⟹ 10·100·50 = 50,000 scalar multiplications   Total: 75,000
Matrix-chain Multiplication …contd

• Matrix-chain multiplication problem
  – Given a chain A1, A2, …, An of n matrices, where for i = 1, 2, …, n, matrix Ai has dimension pi-1 × pi
  – Parenthesize the product A1A2…An such that the total number of scalar multiplications is minimized
• Brute force method of exhaustive search takes time exponential in n
Dynamic Programming Approach

• The structure of an optimal solution
  – Let us use the notation Ai..j for the matrix that results from the product Ai Ai+1 … Aj
  – An optimal parenthesization of the product A1A2…An splits the product between Ak and Ak+1 for some integer k where 1 ≤ k < n
  – First compute matrices A1..k and Ak+1..n; then multiply them to get the final matrix A1..n
Dynamic Programming Approach …contd

– Key observation: parenthesizations of the subchains A1A2…Ak and Ak+1Ak+2…An must also be optimal if the parenthesization of the chain A1A2…An is optimal (why?)

– That is, the optimal solution to the problem contains within it the optimal solutions to subproblems
Dynamic Programming Approach …contd

• Recursive definition of the value of an optimal solution
  – Let m[i, j] be the minimum number of scalar multiplications necessary to compute Ai..j
  – Minimum cost to compute A1..n is m[1, n]
  – Suppose the optimal parenthesization of Ai..j splits the product between Ak and Ak+1 for some integer k where i ≤ k < j
Dynamic Programming Approach …contd

– Ai..j = (Ai Ai+1…Ak)·(Ak+1Ak+2…Aj) = Ai..k · Ak+1..j
– Cost of computing Ai..j = cost of computing Ai..k + cost of computing Ak+1..j + cost of multiplying Ai..k and Ak+1..j
– Cost of multiplying Ai..k and Ak+1..j is pi-1 pk pj

– m[i, j] = m[i, k] + m[k+1, j] + pi-1 pk pj   for i ≤ k < j
– m[i, i] = 0   for i = 1, 2, …, n
Dynamic Programming Approach …contd

– But… the optimal parenthesization occurs at one value of k among all possible i ≤ k < j
– Check all these and select the best one

  m[i, j] = 0                                                    if i = j
  m[i, j] = min_{i≤k<j} { m[i, k] + m[k+1, j] + pi-1 pk pj }     if i < j
Dynamic Programming Approach …contd

• To keep track of how to construct an optimal solution, we use a table s
• s[i, j] = value of k at which Ai Ai+1 … Aj is split for an optimal parenthesization
• Algorithm: next slide
  – First computes costs for chains of length l = 1
  – Then for chains of length l = 2, 3, … and so on
  – Computes the optimal cost bottom-up
Algorithm to Compute Optimal Cost
Input: Array p[0…n] containing matrix dimensions, and n
Result: Minimum-cost table m and split table s

MATRIX-CHAIN-ORDER(p[ ], n)
  for i ← 1 to n
      m[i, i] ← 0
  for l ← 2 to n
      for i ← 1 to n-l+1
          j ← i+l-1
          m[i, j] ← ∞
          for k ← i to j-1
              q ← m[i, k] + m[k+1, j] + p[i-1] p[k] p[j]
              if q < m[i, j]
                  m[i, j] ← q
                  s[i, j] ← k
  return m and s

Takes O(n³) time, requires O(n²) space
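For reference, the same MATRIX-CHAIN-ORDER logic as compilable C++ (a sketch; the name matrixChainOrder and the use of long long are assumptions, not from the slides).

#include <vector>
#include <limits>

// C++ sketch of MATRIX-CHAIN-ORDER above: p holds the n+1 dimensions,
// matrix A_i is p[i-1] x p[i]; returns the minimum scalar-multiplication count.
long long matrixChainOrder(const std::vector<long long>& p) {
    int n = (int)p.size() - 1;
    std::vector<std::vector<long long>> m(n + 1, std::vector<long long>(n + 1, 0));
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(n + 1, 0)); // split table
    for (int l = 2; l <= n; ++l)                 // chain length
        for (int i = 1; i <= n - l + 1; ++i) {
            int j = i + l - 1;
            m[i][j] = std::numeric_limits<long long>::max();
            for (int k = i; k < j; ++k) {        // split between A_k and A_k+1
                long long q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                if (q < m[i][j]) { m[i][j] = q; s[i][j] = k; }
            }
        }
    return m[1][n];    // e.g. 15,125 for p = {30, 35, 15, 5, 10, 20, 25}
}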
Constructing Optimal Solution

• Our algorithm computes the minimum-cost table m and the split table s
• The optimal solution can be constructed from the split table s
  – Each entry s[i, j] = k shows where to split the product Ai Ai+1 … Aj for the minimum cost
Example

• Show how to multiply this matrix chain optimally:

  Matrix   Dimension
  A1       30×35
  A2       35×15
  A3       15×5
  A4       5×10
  A5       10×20
  A6       20×25

• Solution on the board
  – Minimum cost: 15,125
  – Optimal parenthesization: ((A1(A2A3))((A4A5)A6))
CSE408
Longest Common Subsequence
Lecture # 25
Dynamic programming

• It is used when the solution can be recursively described in terms of solutions to subproblems (optimal substructure)
• The algorithm finds solutions to subproblems and stores them in memory for later use
• More efficient than "brute-force methods", which solve the same subproblems over and over again
Longest Common Subsequence (LCS)

Application: comparison of two DNA strings
Ex: X = {A B C B D A B}, Y = {B D C A B A}

[Figure: the two strings aligned so the common subsequence letters line up]

A brute force algorithm would compare each subsequence of X with the symbols in Y
LCS Algorithm
• If |X| = m, |Y| = n, then there are 2ᵐ subsequences of X; we must compare each with Y (n comparisons)
• So the running time of the brute-force algorithm is O(n · 2ᵐ)
• Notice that the LCS problem has optimal substructure: solutions of subproblems are parts of the final solution.
• Subproblems: "find the LCS of pairs of prefixes of X and Y"
LCS Algorithm
• First we'll find the length of the LCS. Later we'll modify the algorithm to find the LCS itself.
• Define Xi, Yj to be the prefixes of X and Y of length i and j respectively
• Define c[i, j] to be the length of the LCS of Xi and Yj
• Then the length of the LCS of X and Y will be c[m, n]

  c[i, j] = c[i−1, j−1] + 1                  if x[i] = y[j]
  c[i, j] = max(c[i, j−1], c[i−1, j])        otherwise
LCS recursive solution
  c[i, j] = c[i−1, j−1] + 1                  if x[i] = y[j]
  c[i, j] = max(c[i, j−1], c[i−1, j])        otherwise

• We start with i = j = 0 (empty substrings of x and y)
• Since X0 and Y0 are empty strings, their LCS is always empty (i.e. c[0, 0] = 0)
• The LCS of an empty string and any other string is empty, so for every i and j: c[0, j] = c[i, 0] = 0
LCS recursive solution
  c[i, j] = c[i−1, j−1] + 1                  if x[i] = y[j]
  c[i, j] = max(c[i, j−1], c[i−1, j])        otherwise

• When we calculate c[i, j], we consider two cases:
• First case: x[i] = y[j]: one more symbol in strings X and Y matches, so the length of the LCS of Xi and Yj equals the length of the LCS of the smaller strings Xi−1 and Yj−1, plus 1
LCS recursive solution
  c[i, j] = c[i−1, j−1] + 1                  if x[i] = y[j]
  c[i, j] = max(c[i, j−1], c[i−1, j])        otherwise

• Second case: x[i] ≠ y[j]
• As the symbols don't match, our solution is not improved, and the length of LCS(Xi, Yj) is the same as before (i.e. the maximum of LCS(Xi, Yj−1) and LCS(Xi−1, Yj))

Why not just take the length of LCS(Xi−1, Yj−1)?
LCS Length Algorithm
LCS-Length(X, Y)
1. m = length(X) // get the # of symbols in X
2. n = length(Y) // get the # of symbols in Y
3. for i = 1 to m c[i,0] = 0 // special case: Y0
4. for j = 1 to n c[0,j] = 0 // special case: X0
5. for i = 1 to m // for all Xi
6. for j = 1 to n // for all Yj
7. if ( Xi == Yj )
8. c[i,j] = c[i-1,j-1] + 1
9. else c[i,j] = max( c[i-1,j], c[i,j-1] )
10. return c
LCS Example
We'll see how the LCS algorithm works on the following example:
• X = ABCB
• Y = BDCAB

What is the Longest Common Subsequence of X and Y?

LCS(X, Y) = BCB
LCS Example (0)   X = ABCB, Y = BDCAB

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0
  i=1 A
  i=2 B
  i=3 C
  i=4 B

X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5]
LCS Example (1)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0
  i=2 B   0
  i=3 C   0
  i=4 B   0

for i = 1 to m: c[i,0] = 0
for j = 1 to n: c[0,j] = 0
LCS Example (2)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0
  i=2 B   0
  i=3 C   0
  i=4 B   0

if (Xi == Yj)  c[i,j] = c[i-1,j-1] + 1
else           c[i,j] = max(c[i-1,j], c[i,j-1])
(this rule is applied at every step below)
LCS Example (3)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0
  i=2 B   0
  i=3 C   0
  i=4 B   0
LCS Example (4)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1
  i=2 B   0
  i=3 C   0
  i=4 B   0
LCS Example (5)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0
  i=3 C   0
  i=4 B   0
LCS Example (6)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1
  i=3 C   0
  i=4 B   0
LCS Example (7)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1
  i=3 C   0
  i=4 B   0
LCS Example (8)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0
  i=4 B   0
LCS Example (10)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1
  i=4 B   0
LCS Example (11)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1   2
  i=4 B   0
LCS Example (12)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1   2   2   2
  i=4 B   0
LCS Example (13)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1   2   2   2
  i=4 B   0   1
LCS Example (14)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1   2   2   2
  i=4 B   0   1   1   2   2
LCS Example (15)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1   2   2   2
  i=4 B   0   1   1   2   2   3
LCS Algorithm Running Time

• The LCS algorithm calculates the values of each entry of the array c[m,n]
• So what is the running time?

O(m·n), since each c[i,j] is calculated in constant time, and there are m·n elements in the array
How to find actual LCS
• So far, we have just found the length of the LCS, but not the LCS itself.
• We want to modify this algorithm to make it output the Longest Common Subsequence of X and Y

Each c[i,j] depends on c[i-1,j] and c[i,j-1] or c[i-1,j-1]
For each c[i,j] we can say how it was acquired:

   2  2     For example, here
   2  3     c[i,j] = c[i-1,j-1] + 1 = 2 + 1 = 3
How to find actual LCS - continued
• Remember that

  c[i, j] = c[i−1, j−1] + 1                  if x[i] = y[j]
  c[i, j] = max(c[i, j−1], c[i−1, j])        otherwise

So we can start from c[m,n] and go backwards
Whenever c[i,j] = c[i-1,j-1] + 1, remember x[i] (because x[i] is a part of the LCS)
When i = 0 or j = 0 (i.e. we reached the beginning), output the remembered letters in reverse order
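Putting the table-filling and the backward walk together, here is a compact C++ sketch (an illustration, not the slides' own code; strings are 0-indexed, so x[i-1] plays the role of the slides' x[i]).

#include <string>
#include <vector>
#include <algorithm>

// Sketch: fill the c table as in LCS-Length, then walk back from c[m][n],
// collecting x[i-1] whenever the diagonal (match) case produced the entry.
std::string lcs(const std::string& x, const std::string& y) {
    int m = x.size(), n = y.size();
    std::vector<std::vector<int>> c(m + 1, std::vector<int>(n + 1, 0));
    for (int i = 1; i <= m; ++i)
        for (int j = 1; j <= n; ++j)
            c[i][j] = (x[i - 1] == y[j - 1]) ? c[i - 1][j - 1] + 1
                                             : std::max(c[i - 1][j], c[i][j - 1]);
    std::string out;
    for (int i = m, j = n; i > 0 && j > 0; ) {       // backward walk
        if (x[i - 1] == y[j - 1]) { out += x[i - 1]; --i; --j; }
        else if (c[i - 1][j] >= c[i][j - 1]) --i;
        else --j;
    }
    std::reverse(out.begin(), out.end());            // letters were collected in reverse
    return out;
}
// lcs("ABCB", "BDCAB") returns "BCB", matching the worked example.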
Finding LCS

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1   2   2   2
  i=4 B   0   1   1   2   2   3

(Trace back from c[4,5], following where each value came from.)
Finding LCS (2)

  j:      0   1   2   3   4   5
  Yj:         B   D   C   A   B
  i=0     0   0   0   0   0   0
  i=1 A   0   0   0   0   1   1
  i=2 B   0   1   1   1   1   2
  i=3 C   0   1   1   2   2   2
  i=4 B   0   1   1   2   2   3

LCS (reversed order): B C B
LCS (straight order): B C B
(this string turned out to be a palindrome)
CSE408
Bubble sort
Maximum & Minimum
Lecture #16
Why Study Sorting Algorithms?

• There are a variety of situations that we can encounter
  – Do we have randomly ordered keys?
  – Are all keys distinct?
  – How large is the set of keys to be ordered?
  – Need guaranteed performance?

• Various algorithms are better suited to some of these situations
Some Definitions

• Internal Sort
– The data to be sorted is all stored in the
computer’s main memory.
• External Sort
– Some of the data to be sorted might be stored in
some external, slower, device.
• In Place Sort
– The amount of extra space required to sort the
data is constant with the input size.
Bubble Sort

• Idea:
  – Repeatedly pass through the array
  – Swap adjacent elements that are out of order

[Figure: array 8 4 6 9 2 3 1 with index i marking the pass and j scanning from the right]

• Easier to implement, but slower than Insertion sort
Example

Pass i=1 (j runs right to left):     After each later pass:
8 4 6 9 2 3 1                        i=2: 1 8 4 6 9 2 3
8 4 6 9 2 1 3                        i=3: 1 2 8 4 6 9 3
8 4 6 9 1 2 3                        i=4: 1 2 3 8 4 6 9
8 4 6 1 9 2 3                        i=5: 1 2 3 4 8 6 9
8 4 1 6 9 2 3                        i=6: 1 2 3 4 6 8 9
8 1 4 6 9 2 3                        i=7: 1 2 3 4 6 8 9
1 8 4 6 9 2 3
Bubble Sort

Alg.: BUBBLESORT(A)
  for i ← 1 to length[A]
      do for j ← length[A] downto i + 1
          do if A[j] < A[j-1]
              then exchange A[j] ↔ A[j-1]
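A direct C++ rendering of the pseudocode above, 0-indexed (the name bubbleSort is an assumption).

#include <vector>
#include <utility>

// C++ sketch of BUBBLESORT above: each pass i bubbles the smallest
// remaining element down into position i.
void bubbleSort(std::vector<int>& a) {
    int n = a.size();
    for (int i = 0; i < n; ++i)
        for (int j = n - 1; j > i; --j)       // scan from the back toward i
            if (a[j] < a[j - 1])
                std::swap(a[j], a[j - 1]);    // exchange out-of-order neighbors
}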
Bubble-Sort Running Time

Alg.: BUBBLESORT(A)
  for i ← 1 to length[A]                              c1
      do for j ← length[A] downto i + 1               c2
          do if A[j] < A[j-1]                         c3
              then exchange A[j] ↔ A[j-1]             c4

Comparisons: ≈ n²/2      Exchanges: ≈ n²/2

T(n) = c1(n+1) + c2 Σ_{i=1..n}(n−i+1) + c3 Σ_{i=1..n}(n−i) + c4 Σ_{i=1..n}(n−i)
     = Θ(n) + (c2 + c3 + c4) Σ_{i=1..n}(n−i)

where Σ_{i=1..n}(n−i) = n² − Σ_{i=1..n} i = n² − n(n+1)/2 = n²/2 − n/2

Thus, T(n) = Θ(n²)
Selection Sort
• Idea:
– Find the smallest element in the array
– Exchange it with the element in the first position
– Find the second smallest element and exchange it
with the element in the second position
– Continue until the array is sorted
• Disadvantage:
– Running time depends only slightly on the amount
of order in the file
Example

8 4 6 9 2 3 1
1 4 6 9 2 3 8
1 2 6 9 4 3 8
1 2 3 9 4 6 8
1 2 3 4 9 6 8
1 2 3 4 6 9 8
1 2 3 4 6 8 9
1 2 3 4 6 8 9
Selection Sort

Alg.: SELECTION-SORT(A)
  n ← length[A]
  for j ← 1 to n - 1
      do smallest ← j
          for i ← j + 1 to n
              do if A[i] < A[smallest]
                  then smallest ← i
          exchange A[j] ↔ A[smallest]
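And the corresponding C++ sketch of SELECTION-SORT, 0-indexed (selectionSort is an illustrative name).

#include <vector>
#include <utility>

// C++ sketch of SELECTION-SORT above: find the smallest remaining element
// and exchange it into position j.
void selectionSort(std::vector<int>& a) {
    int n = a.size();
    for (int j = 0; j + 1 < n; ++j) {
        int smallest = j;
        for (int i = j + 1; i < n; ++i)
            if (a[i] < a[smallest])
                smallest = i;                 // index of the minimum so far
        std::swap(a[j], a[smallest]);
    }
}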
Analysis of Selection Sort

Alg.: SELECTION-SORT(A)                              cost   times
  n ← length[A]                                      c1     1
  for j ← 1 to n - 1                                 c2     n
      do smallest ← j                                c3     n-1
          for i ← j + 1 to n                         c4     Σ_{j=1..n−1}(n−j+1)
              do if A[i] < A[smallest]               c5     Σ_{j=1..n−1}(n−j)
                  then smallest ← i                  c6     Σ_{j=1..n−1}(n−j)
          exchange A[j] ↔ A[smallest]                c7     n-1

≈ n²/2 comparisons, ≈ n exchanges

T(n) = c1 + c2·n + c3(n−1) + c4 Σ_{j=1..n−1}(n−j+1) + c5 Σ_{j=1..n−1}(n−j)
       + c6 Σ_{j=1..n−1}(n−j) + c7(n−1) = Θ(n²)
MINIMUM AND MAXIMUM
CSE408
Count, Radix & Bucket Sort
Lecture #17
How Fast Can We Sort?

• Selection Sort, Bubble Sort, Insertion Sort: O(n²)
• Heap Sort, Merge Sort: O(n lg n)
• Quicksort: O(n lg n) - average
• What is common to all these algorithms?
  – Make comparisons between input elements

  ai < aj, ai ≤ aj, ai = aj, ai ≥ aj, or ai > aj
Lower-Bound for Sorting

• Theorem: To sort n elements, comparison sorts must make Ω(n lg n) comparisons in the worst case.

(see CS477 for a proof)
Can we do better?

• Linear sorting algorithms
  – Counting Sort
  – Radix Sort
  – Bucket Sort
• Make certain assumptions about the data
• Linear sorts are NOT "comparison sorts"
Counting Sort
• Assumptions:
  – n integers which are in the range [0 ... r]
  – r is in the order of n, that is, r = O(n)
• Idea:
  – For each element x, find the number of elements ≤ x
  – Place x into its correct position in the output array
Step 1

[Figure: count array C holding the frequency of each key value (r = 6)]

Step 2

[Figure: C (frequencies) transformed into Cnew (cumulative sums)]
Algorithm

• Start from the last element of A
• Place A[i] at its correct place in the output array
• Decrease C[A[i]] by one

        1  2  3  4  5  6  7  8
A    =  2  5  3  0  2  3  0  3

           0  1  2  3  4  5
Cnew    =  2  2  4  7  7  8
Example

        1  2  3  4  5  6  7  8            0  1  2  3  4  5
A    =  2  5  3  0  2  3  0  3    Cnew =  2  2  4  7  7  8

Place A[8]=3 at B[7]:  B = _ _ _ _ _ _ 3 _   Cnew = 2 2 4 6 7 8
Place A[7]=0 at B[2]:  B = _ 0 _ _ _ _ 3 _   Cnew = 1 2 4 6 7 8
Place A[6]=3 at B[6]:  B = _ 0 _ _ _ 3 3 _   Cnew = 1 2 4 5 7 8
Place A[5]=2 at B[4]:  B = _ 0 _ 2 _ 3 3 _   Cnew = 1 2 3 5 7 8
Example (cont.)

        1  2  3  4  5  6  7  8
A    =  2  5  3  0  2  3  0  3

Place A[4]=0 at B[1]:  B = 0 0 _ 2 _ 3 3 _   C = 0 2 3 5 7 8
Place A[3]=3 at B[5]:  B = 0 0 _ 2 3 3 3 _   C = 0 2 3 4 7 8
Place A[2]=5 at B[8]:  B = 0 0 _ 2 3 3 3 5   C = 0 2 3 4 7 7
Place A[1]=2 at B[3]:  B = 0 0 2 2 3 3 3 5   (sorted)
COUNTING-SORT

Alg.: COUNTING-SORT(A, B, n, r)
1.  for i ← 0 to r
2.      do C[i] ← 0
3.  for j ← 1 to n
4.      do C[A[j]] ← C[A[j]] + 1
5.  C[i] contains the number of elements equal to i
6.  for i ← 1 to r
7.      do C[i] ← C[i] + C[i-1]
8.  C[i] contains the number of elements ≤ i
9.  for j ← n downto 1
10.     do B[C[A[j]]] ← A[j]
11.        C[A[j]] ← C[A[j]] - 1
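The same steps in compilable C++ (a sketch; countingSort is an assumed name, and keys are assumed to lie in 0..r).

#include <vector>

// C++ sketch of COUNTING-SORT above: stable, O(n + r).
std::vector<int> countingSort(const std::vector<int>& a, int r) {
    std::vector<int> c(r + 1, 0), b(a.size());
    for (int x : a) ++c[x];                          // steps 3-4: frequencies
    for (int i = 1; i <= r; ++i) c[i] += c[i - 1];   // steps 6-7: # of elements <= i
    for (int j = (int)a.size() - 1; j >= 0; --j)     // steps 9-11: back to front (stable)
        b[--c[a[j]]] = a[j];
    return b;
}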
Analysis of Counting Sort
Alg.: COUNTING-SORT(A, B, n, r)
1.  for i ← 0 to r                                        O(r)
2.      do C[i] ← 0
3.  for j ← 1 to n                                        O(n)
4.      do C[A[j]] ← C[A[j]] + 1
5.  C[i] contains the number of elements equal to i
6.  for i ← 1 to r                                        O(r)
7.      do C[i] ← C[i] + C[i-1]
8.  C[i] contains the number of elements ≤ i
9.  for j ← n downto 1                                    O(n)
10.     do B[C[A[j]]] ← A[j]
11.        C[A[j]] ← C[A[j]] - 1

Overall time: O(n + r)
Analysis of Counting Sort

• Overall time: O(n + r)
• In practice we use COUNTING sort when r = O(n)
  ⟹ the running time is O(n)
Radix Sort
• Represents keys as d-digit numbers in some base k:
    key = x1 x2 ... xd, where 0 ≤ xi ≤ k−1

• Example: key = 15
    key10 = 15: d = 2, k = 10, where 0 ≤ xi ≤ 9
    key2 = 1111: d = 4, k = 2, where 0 ≤ xi ≤ 1
Radix Sort
• Assumptions: d = O(1) and k = O(n)
• Sorting looks at one column at a time
  – For a d-digit number, sort the least significant digit first
  – Continue sorting on the next least significant digit, until all digits have been sorted
  – Requires only d passes through the list
RADIX-SORT
Alg.: RADIX-SORT(A, d)
for i ← 1 to d
do use a stable sort to sort array A on digit i
(stable sort: preserves order of identical elements)
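As an illustrative sketch, here is RADIX-SORT for non-negative integers in base k = 10, using a stable counting-sort pass per digit (the name radixSort and the decimal base are assumptions).

#include <vector>

// Sketch of RADIX-SORT: one stable counting-sort pass per decimal digit,
// least significant digit first.
void radixSort(std::vector<int>& a, int d) {
    int exp = 1;                                     // 10^(current digit index)
    for (int pass = 0; pass < d; ++pass, exp *= 10) {
        std::vector<int> c(10, 0), b(a.size());
        for (int x : a) ++c[(x / exp) % 10];         // frequency of each digit value
        for (int i = 1; i < 10; ++i) c[i] += c[i - 1];
        for (int j = (int)a.size() - 1; j >= 0; --j) // back to front keeps it stable
            b[--c[(a[j] / exp) % 10]] = a[j];
        a = b;
    }
}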

Analysis of Radix Sort

• Given n numbers of d digits each, where each digit may take up to k possible values, RADIX-SORT correctly sorts the numbers in O(d(n+k))
  – One pass of sorting per digit takes O(n+k), assuming that we use counting sort
  – There are d passes (one for each digit)
Analysis of Radix Sort

• Given n numbers of d digits each, where each digit may take up to k possible values, RADIX-SORT correctly sorts the numbers in O(d(n+k))
  – Assuming d = O(1) and k = O(n), the running time is O(n)
Bucket Sort
• Assumption:
  – the input is generated by a random process that distributes elements uniformly over [0, 1)
• Idea:
  – Divide [0, 1) into k equal-sized buckets (k = Θ(n))
  – Distribute the n input values into the buckets
  – Sort each bucket (e.g., using quicksort)
  – Go through the buckets in order, listing the elements in each one
• Input: A[1 .. n], where 0 ≤ A[i] < 1 for all i
• Output: the elements of A, sorted
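A C++ sketch of this idea (bucketSort is an assumed name, and std::sort stands in for the per-bucket quicksort).

#include <vector>
#include <algorithm>

// Sketch of BUCKET-SORT for values in [0, 1): distribute into n buckets,
// sort each bucket, then concatenate the buckets in order.
void bucketSort(std::vector<double>& a) {
    int n = a.size();
    std::vector<std::vector<double>> buckets(n);
    for (double x : a)
        buckets[(int)(n * x)].push_back(x);          // bucket index = floor(n * x)
    a.clear();
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());               // per-bucket sort
        a.insert(a.end(), b.begin(), b.end());       // concatenate in order
    }
}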

Example - Bucket Sort

A = .78 .17 .39 .26 .72 .94 .21 .12 .23 .68

Distribute into buckets:
  0: /
  1: .17 → .12 /
  2: .26 → .21 → .23 /
  3: .39 /
  4: /
  5: /
  6: .68 /
  7: .78 → .72 /
  8: /
  9: .94 /
Example - Bucket Sort

Sort within each bucket:
  0: /
  1: .12 → .17 /
  2: .21 → .23 → .26 /
  3: .39 /
  4: /
  5: /
  6: .68 /
  7: .72 → .78 /
  8: /
  9: .94 /
Example - Bucket Sort

Concatenate the lists from bucket 0 to bucket 9 together, in order:

.12 .17 .21 .23 .26 .39 .68 .72 .78 .94
Analysis of Bucket Sort

Alg.: BUCKET-SORT(A, n)
  for i ← 1 to n                                          O(n)
      do insert A[i] into list B[⌊n·A[i]⌋]
  for i ← 0 to k - 1                                      k · O(n/k · lg(n/k)) = O(n lg(n/k))
      do sort list B[i] with quicksort
  concatenate lists B[0], B[1], . . . , B[k-1]            O(k)
      together in order
  return the concatenated lists

Overall: O(n) if k = Θ(n)
Radix Sort as a Bucket Sort

CSE408
Heap & Heapsort, Hashing

Lecture #18
Special Types of Trees
• Def: Full binary tree = a binary tree in which each node is either a leaf or has degree exactly 2.

[Figure: full binary tree with root 4]

• Def: Complete binary tree = a binary tree in which all leaves are on the same level and all internal nodes have degree 2.

[Figure: complete binary tree with root 4]
Definitions
• Height of a node = the number of edges on the longest simple path from the node down to a leaf
• Level of a node = the length of the path from the root to the node
• Height of a tree = the height of the root node

[Figure: example tree; height of root = 3, height of node (2) = 1, level of node (10) = 2]
Useful Properties

• A binary tree of height d has at most

    n ≤ Σ_{l=0..d} 2^l = (2^(d+1) − 1)/(2 − 1) = 2^(d+1) − 1   nodes

(see Ex 6.1-2, page 129)
The Heap Data Structure
• Def: A heap is a nearly complete binary tree with the following two properties:
  – Structural property: all levels are full, except possibly the last one, which is filled from left to right
  – Order (heap) property: for any node x, Parent(x) ≥ x

[Figure: max-heap with root 8, children 7 and 4, and leaves 5, 2]

From the heap property, it follows that: "The root is the maximum element of the heap!"

A heap is a binary tree that is filled in order


Array Representation of Heaps
• A heap can be stored as an array A.
  – Root of tree is A[1]
  – Left child of A[i] = A[2i]
  – Right child of A[i] = A[2i + 1]
  – Parent of A[i] = A[⌊i/2⌋]
  – heap-size[A] ≤ length[A]
• The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
Heap Types
• Max-heaps (largest element at root), have the
max-heap property:
– for all nodes i, excluding the root:
A[PARENT(i)] ≥ A[i]

• Min-heaps (smallest element at root), have the


min-heap property:
– for all nodes i, excluding the root:
A[PARENT(i)] ≤ A[i]
Adding/Deleting Nodes

• New nodes are always inserted at the bottom


level (left to right)
• Nodes are removed from the bottom level (right
to left)
Operations on Heaps

• Maintain/Restore the max-heap property


– MAX-HEAPIFY
• Create a max-heap from an unordered array
– BUILD-MAX-HEAP
• Sort an array in place
– HEAPSORT
• Priority queues
Maintaining the Heap Property
• Suppose a node is smaller than a
child
– Left and Right subtrees of i are max-heaps
• To eliminate the violation:
– Exchange with larger child
– Move down the tree
– Continue until node is not smaller than
children
Example
MAX-HEAPIFY(A, 2, 10)

[Figure: A[2] violates the heap property, so exchange A[2] ↔ A[4]; then A[4] violates it, so exchange A[4] ↔ A[9]; the heap property is restored]
Maintaining the Heap Property

• Assumptions:
  – Left and Right subtrees of i are max-heaps
  – A[i] may be smaller than its children

Alg: MAX-HEAPIFY(A, i, n)
1.  l ← LEFT(i)
2.  r ← RIGHT(i)
3.  if l ≤ n and A[l] > A[i]
4.      then largest ← l
5.      else largest ← i
6.  if r ≤ n and A[r] > A[largest]
7.      then largest ← r
8.  if largest ≠ i
9.      then exchange A[i] ↔ A[largest]
10.        MAX-HEAPIFY(A, largest, n)
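The same procedure as a C++ sketch on a 0-indexed array, where the children of i sit at 2i+1 and 2i+2 (maxHeapify is an assumed name).

#include <vector>
#include <utility>

// C++ sketch of MAX-HEAPIFY: sinks A[i] until the subtree rooted at i
// is a max-heap, assuming both child subtrees already are.
void maxHeapify(std::vector<int>& a, int i, int n) {
    int l = 2 * i + 1, r = 2 * i + 2, largest = i;
    if (l < n && a[l] > a[i]) largest = l;
    if (r < n && a[r] > a[largest]) largest = r;
    if (largest != i) {
        std::swap(a[i], a[largest]);     // move the larger child up
        maxHeapify(a, largest, n);       // continue where the old value landed
    }
}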
MAX-HEAPIFY Running Time
• Intuitively: the element moves down one level at a time, with constant work per level, so a call on a node of height h costs O(h)

• Running time of MAX-HEAPIFY is O(lg n)
• Can be written in terms of the height of the heap, as being O(h)
  – Since the height of the heap is ⌊lg n⌋
Building a Heap

• Convert an array A[1 … n] into a max-heap (n = length[A])
• The elements in the subarray A[(⌊n/2⌋+1) .. n] are leaves
• Apply MAX-HEAPIFY on elements between 1 and ⌊n/2⌋

Alg: BUILD-MAX-HEAP(A)
1.  n = length[A]
2.  for i ← ⌊n/2⌋ downto 1
3.      do MAX-HEAPIFY(A, i, n)

A: 4 1 3 2 16 9 10 14 8 7

[Figure: the array drawn as a binary tree with root 4]
Example: A = 4 1 3 2 16 9 10 14 8 7

[Figure: snapshots of the tree as MAX-HEAPIFY is applied for i = 5, 4, 3, 2, 1; the resulting max-heap, in array form, is 16 14 10 8 7 9 3 2 4 1]
Running Time of BUILD MAX HEAP

Alg: BUILD-MAX-HEAP(A)
1.  n = length[A]
2.  for i ← ⌊n/2⌋ downto 1            O(n)
3.      do MAX-HEAPIFY(A, i, n)        O(lg n)

⟹ Running time: O(n lg n)

• This is not an asymptotically tight upper bound
Running Time of BUILD MAX HEAP
• HEAPIFY takes O(h) ⟹ the cost of HEAPIFY on a node i is proportional to the height of node i in the tree

  T(n) = Σ_{i=0..h} n_i h_i = Σ_{i=0..h} 2^i (h − i) = O(n)

  Height                 Level              No. of nodes
  h_0 = 3 (= ⌊lg n⌋)     i = 0              2^0
  h_1 = 2                i = 1              2^1
  h_2 = 1                i = 2              2^2
  h_3 = 0                i = 3 (= ⌊lg n⌋)   2^3

  h_i = h − i    height of the heap rooted at level i
  n_i = 2^i      number of nodes at level i
Running Time of BUILD MAX HEAP
  T(n) = Σ_{i=0..h} n_i h_i            cost of HEAPIFY at level i × number of nodes at that level
       = Σ_{i=0..h} 2^i (h − i)        replace the values of n_i and h_i computed before
       = 2^h Σ_{k=0..h} k / 2^k        change variables: k = h − i
       ≤ n Σ_{k=0..∞} k / 2^k          the sum is at most the infinite series, and 2^h ≤ n
       = O(n)                          since Σ_{k=0..∞} k / 2^k = 2

Running time of BUILD-MAX-HEAP: T(n) = O(n)


Heapsort
• Goal:
– Sort an array using heap representations

• Idea:
– Build a max-heap from the array
– Swap the root (the maximum element) with the last
element in the array
– “Discard” this last node by decreasing the heap size
– Call MAX-HEAPIFY on the new root
– Repeat this process until only one node remains
Example: A = [7, 4, 3, 1, 2]

[Figure: heapsort snapshots after MAX-HEAPIFY(A, 1, 4), MAX-HEAPIFY(A, 1, 3), MAX-HEAPIFY(A, 1, 2), MAX-HEAPIFY(A, 1, 1)]
Alg: HEAPSORT(A)
1.  BUILD-MAX-HEAP(A)                   O(n)
2.  for i ← length[A] downto 2          n-1 times
3.      do exchange A[1] ↔ A[i]
4.         MAX-HEAPIFY(A, 1, i - 1)     O(lg n)

• Running time: O(n lg n) --- can be shown to be Θ(n lg n)
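In C++, the whole of HEAPSORT is a build phase followed by repeated extraction; this sketch reuses the maxHeapify function from the earlier sketch (heapSort is an assumed name).

#include <vector>
#include <utility>

void maxHeapify(std::vector<int>& a, int i, int n);   // from the sketch above

// HEAPSORT sketch (0-indexed): build a max-heap, then repeatedly swap the
// root (the maximum) to the end and shrink the heap by one.
void heapSort(std::vector<int>& a) {
    int n = a.size();
    for (int i = n / 2 - 1; i >= 0; --i)   // BUILD-MAX-HEAP, O(n)
        maxHeapify(a, i, n);
    for (int i = n - 1; i >= 1; --i) {
        std::swap(a[0], a[i]);             // max goes to its final position
        maxHeapify(a, 0, i);               // restore the heap on a[0..i-1]
    }
}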
Priority Queues

Operations on Priority Queues

• Max-priority queues support the following operations:
  – INSERT(S, x): inserts element x into set S
  – EXTRACT-MAX(S): removes and returns the element of S with the largest key
  – MAXIMUM(S): returns the element of S with the largest key
  – INCREASE-KEY(S, x, k): increases the value of element x's key to k (assume k ≥ x's current key value)
HEAP-MAXIMUM
Goal:
  – Return the largest element of the heap

Alg: HEAP-MAXIMUM(A)
1.  return A[1]

Running time: O(1)

[Figure: for the heap A shown on the slide, Heap-Maximum(A) returns 7]
HEAP-EXTRACT-MAX
Goal:
  – Extract the largest element of the heap (i.e., return the max value and also remove that element from the heap)
Idea:
  – Exchange the root element with the last
  – Decrease the size of the heap by 1 element
  – Call MAX-HEAPIFY on the new root, on a heap of size n-1

[Figure: heap A, whose root is the largest element]
Example: HEAP-EXTRACT-MAX

[Figure: max = 16 is returned; the last element 1 becomes the root and the heap size decreases by 1; MAX-HEAPIFY(A, 1, n-1) then restores the heap, whose new root is 14]
HEAP-EXTRACT-MAX

Alg: HEAP-EXTRACT-MAX(A, n)
1.  if n < 1
2.      then error "heap underflow"
3.  max ← A[1]
4.  A[1] ← A[n]
5.  MAX-HEAPIFY(A, 1, n-1)    // remakes heap
6.  return max

Running time: O(lg n)
HEAP-INCREASE-KEY
• Goal:
  – Increase the key of an element i in the heap
• Idea:
  – Increment the key of A[i] to its new value
  – If the max-heap property does not hold anymore: traverse a path toward the root to find the proper place for the newly increased key

[Figure: heap in which node i (currently 4) receives Key[i] ← 15]
Example: HEAP-INCREASE-KEY

[Figure: node i (value 4) is set to 15; 15 is then exchanged upward past 8 and 14 until the max-heap property is restored]
HEAP-INCREASE-KEY

Alg: HEAP-INCREASE-KEY(A, i, key)
1.  if key < A[i]
2.      then error "new key is smaller than current key"
3.  A[i] ← key
4.  while i > 1 and A[PARENT(i)] < A[i]
5.      do exchange A[i] ↔ A[PARENT(i)]
6.         i ← PARENT(i)

• Running time: O(lg n)
MAX-HEAP-INSERT
• Goal:
  – Insert a new element into a max-heap
• Idea:
  – Expand the max-heap with a new element whose key is −∞
  – Call HEAP-INCREASE-KEY to set the key of the new node to its correct value and maintain the max-heap property

[Figure: heap before and after appending the new −∞ node]
Example: MAX-HEAP-INSERT

Insert value 15: start by inserting a node with key −∞, then call HEAP-INCREASE-KEY on A[11] = 15.

[Figure: 15 floats up past 4 and 14; the restored heap contains the newly added element]
MAX-HEAP-INSERT

Alg: MAX-HEAP-INSERT(A, key, n)
1.  heap-size[A] ← n + 1
2.  A[n + 1] ← −∞
3.  HEAP-INCREASE-KEY(A, n + 1, key)

Running time: O(lg n)


Summary

• We can perform the following operations on heaps:
  – MAX-HEAPIFY          O(lg n)
  – BUILD-MAX-HEAP       O(n)
  – HEAP-SORT            O(n lg n)
  – MAX-HEAP-INSERT      O(lg n)
  – HEAP-EXTRACT-MAX     O(lg n)
  – HEAP-INCREASE-KEY    O(lg n)
  – HEAP-MAXIMUM         O(1)
Hash Functions
• If the input keys are integers then simply Key
mod TableSize is a general strategy.
– Unless key happens to have some undesirable
properties. (e.g. all keys end in 0 and we use mod 10)
• If the keys are strings, hash function needs more
care.
– First convert it into a numeric value.
Some methods
• Truncation:
– e.g. 123456789 map to a table of 1000 addresses
by picking 3 digits of the key.
• Folding:
– e.g. 123|456|789: add them and take mod.
• Key mod N:
– N is the size of the table, better if it is prime.
• Squaring:
– Square the key and then truncate
• Radix conversion:
  – e.g. treat the key 1234 as a number in base 11; truncate if necessary.
Hash Function 1
• Add up the ASCII values of all characters of the key.
int hash(const string &key, int tableSize)
{
    int hashVal = 0;

    for (int i = 0; i < key.length(); i++)
        hashVal += key[i];

    return hashVal % tableSize;
}

• Simple to implement and fast.


• However, if the table size is large, the function does not distribute the keys well.
  – e.g. TableSize = 10000 and key length ≤ 8: the hash function can assume values only between 0 and 1016
Hash Function 2
• Examine only the first 3 characters of the key.
int hash(const string &key, int tableSize)
{
    return (key[0] + 27 * key[1] + 729 * key[2]) % tableSize;
}

• In theory, 26 * 26 * 26 = 17576 different words can be generated.


However, English is not random, only 2851 different combinations are
possible.
• Thus, this function although easily computable, is also not appropriate if
the hash table is reasonably large.
Hash Function 3
  hash(key) = Σ_{i=0..KeySize−1} Key[KeySize − i − 1] · 37^i

int hash(const string &key, int tableSize)
{
    int hashVal = 0;

    for (int i = 0; i < key.length(); i++)
        hashVal = 37 * hashVal + key[i];

    hashVal %= tableSize;
    if (hashVal < 0)  /* in case overflow occurs */
        hashVal += tableSize;

    return hashVal;
}
Hash function for strings:

  key = "ali" with character codes key[0] = 98, key[1] = 108, key[2] = 105;  KeySize = 3

  hash("ali") = (105 · 1 + 108 · 37 + 98 · 37²) % 10,007 = 8172

[Figure: "ali" is mapped by the hash function to slot 8172 of a table with indices 0 .. 10,006 (TableSize = 10,007)]
Double Hashing
• A second hash function is used to drive the
collision resolution.
– f(i) = i * hash2(x)
• We apply a second hash function to x and probe
at a distance hash2(x), 2*hash2(x), … and so on.
• The function hash2(x) must never evaluate to
zero.
– e.g. Let hash2(x) = x mod 9 and try to insert 99 in the
previous example.
• A function such as hash2(x) = R – ( x mod R)
with R a prime smaller than TableSize will work
well.
– e.g. try R = 7 for the previous example (hash2(x) = 7 − x mod 7).
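A minimal sketch of an insert that probes with double hashing, following the slide's recipe hash2(x) = R − (x mod R) with R = 7 (the function insertDoubleHash and the int-only open-addressing table are illustrative assumptions).

#include <vector>

// Sketch: insert x into an open-addressing table using double hashing.
// 'used' marks occupied slots; hash2 = 7 - (x mod 7) is never zero.
bool insertDoubleHash(std::vector<int>& table, std::vector<bool>& used, int x) {
    int m = table.size();
    int h1 = x % m;
    int h2 = 7 - (x % 7);                        // probe step size
    for (int i = 0; i < m; ++i) {                // try h1, h1+h2, h1+2*h2, ...
        int pos = (h1 + i * h2) % m;
        if (!used[pos]) { table[pos] = x; used[pos] = true; return true; }
    }
    return false;                                // table full
}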
Hashing Applications
• Compilers use hash tables to implement the
symbol table (a data structure to keep track of
declared variables).
• Game programs use hash tables to keep track of
positions it has encountered (transposition table)
• Online spelling checkers.
Summary
• Hash tables can be used to implement the insert
and find operations in constant average time.
– it depends on the load factor not on the number of
items in the table.
• It is important to have a prime TableSize and a
correct choice of load factor and hash function.
• For separate chaining the load factor should be
close to 1.
• For open addressing load factor should not
exceed 0.5 unless this is completely
unavoidable.
– Rehashing can be implemented to grow (or shrink)
the table.
CSE408
Lower bound theory

Lecture # 31
Lower bound

• We always consider the algorithm with the smaller order of complexity to be the better one.
• How can we ensure that there is no faster method available?
• We need to find a function g(n) which is a lower bound on the time that the algorithm must take. So if f(n) is the time for some algorithm, then we can write f(n) = Ω(g(n)), where g(n) is a lower bound for f(n).
Lower bound theory

• Or we can say f(n) ≥ c · g(n) for all n > n0, where c and n0 are constants.
• For many problems it is easy to calculate the lower bound on n inputs.
• e.g. consider the set of problems to find the maximum of an ordered set of n integers. Clearly every integer must be examined at least once, so Ω(n) is a lower bound for that. For matrix multiplication we have 2n² inputs, so the lower bound can be Ω(n²).
Sorting and Searching

• For all sorting & searching problems we use comparison trees for finding the lower bound.
• For an unordered set, a searching algorithm will take Ω(n) as the lower bound. For an ordered set it will take Ω(log n) as the lower bound. Similarly, no sorting algorithm can sort in less than Ω(n log n) time, so Ω(n log n) is the lower bound for sorting algorithms.
Oracle and adversary arguments

• Given some computational model, the oracle tells the outcome of each comparison. In order to derive a good lower bound, the oracle tries its best to cause the algorithm to work as hard as it might.
• Take the example of a tournament where the values are called players, the largest value is interpreted as the winner, and the second largest is taken as the runner-up. So the problem is similar to finding the largest and the second largest number out of a set of n numbers.
Second largest

• Since we can't determine the second largest element without having determined the largest element, at least n−1 comparisons are necessary. We need to show that there is always some sequence of comparisons which forces the second largest to be found in ⌈log n⌉ − 1 additional comparisons.
Tournament theory

• If the winner of the tournament has played x matches, then there are x people who are candidates for the runner-up position. The runner-up has lost only once, to the winner, and the other x−1 candidates must each have lost to one other person. We can produce an oracle which decides the results of matches in such a way that the winner plays ⌈log n⌉ other people.
• In a match between a and b, the oracle declares a as the winner if a is previously undefeated and b has lost at least once, or if both a and b are undefeated but a has won more matches than b. In any other case the oracle can decide arbitrarily, as long as it remains consistent.
Winning match

• Corresponding to this tournament, imagine drawing a directed graph with n vertices. Each vertex corresponds to one of the n players. Draw a directed edge from vertex b to a iff either player a has defeated b or a has defeated another player who has defeated b. Since for the overall winner there must be an edge from each of the remaining n−1 vertices, it follows that the winner must have played at least ⌈log n⌉ matches.
Increasing matrix

• M is an n×n integer matrix in which the keys in each row are in increasing order. Consider the problem of finding the position of an integer x in M. Determine a lower bound for the number of comparisons to be done in the problem.
