DAA (Unit 2)
Introduction to Dynamic Programming


Dynamic Programming is one of the most powerful design techniques for solving
optimization problems.

A Divide & Conquer algorithm partitions the problem into disjoint subproblems, solves
the subproblems recursively, and then combines their solutions to solve the original
problem.

Dynamic Programming is used when the subproblems are not independent, e.g.
when they share the same sub-subproblems. In this case, divide and conquer may do
more work than necessary, because it solves the same subproblem multiple times.

Dynamic Programming solves each subproblem just once and stores the result in a
table so that it can be retrieved whenever it is needed again.

Dynamic Programming is a bottom-up approach: we solve all possible small
problems and then combine their solutions to obtain solutions for bigger problems.

Dynamic Programming is a paradigm of algorithm design in which an optimization
problem is solved by combining sub-problem solutions and appealing to the
"principle of optimality".

Characteristics of Dynamic Programming:

Dynamic Programming works when a problem has the following features:-


o Optimal Substructure: If an optimal solution contains optimal sub solutions
then a problem exhibits optimal substructure.
o Overlapping subproblems: When a recursive algorithm would visit the same
subproblems repeatedly, then a problem has overlapping subproblems.

If a problem has optimal substructure, then we can recursively define an optimal


solution. If a problem has overlapping subproblems, then we can improve on a
recursive implementation by computing each subproblem only once.

If a problem doesn't have optimal substructure, there is no basis for defining a


recursive algorithm to find the optimal solutions. If a problem doesn't have
overlapping sub problems, we don't have anything to gain by using dynamic
programming.
If the space of subproblems is small enough (i.e. polynomial in the size of the input),
dynamic programming can be much more efficient than naive recursion.

Elements of Dynamic Programming

There are basically three elements that characterize a dynamic programming


algorithm:-

1. Substructure: Decompose the given problem into smaller subproblems.


Express the solution of the original problem in terms of the solution for
smaller problems.
2. Table Structure: After solving the sub-problems, store the results of the
subproblems in a table. This is done because subproblem solutions are reused
many times, and we do not want to repeatedly solve the same problem over
and over again.
3. Bottom-up Computation: Using the table, combine the solutions of smaller
subproblems to solve larger subproblems, and eventually arrive at a solution
to the complete problem.
Note: Bottom-up means:-

i. Start with the smallest subproblems.
ii. Combining their solutions, obtain the solutions to sub-problems of increasing size.
iii. Continue until arriving at the solution of the original problem.

A minimal sketch of this idea is given below.
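As a concrete illustration (an addition to these notes, not part of the original material), the following Python sketch applies the bottom-up idea to the Fibonacci sequence mentioned later in this unit; the function name fib_bottom_up is our own.

def fib_bottom_up(n):
    # Compute the nth Fibonacci number bottom-up: each subproblem
    # fib(i) is solved exactly once and stored in a table.
    if n < 2:
        return n
    table = [0] * (n + 1)          # table[i] will hold fib(i)
    table[1] = 1
    for i in range(2, n + 1):
        # combine the solutions of the two smaller subproblems
        table[i] = table[i - 1] + table[i - 2]
    return table[n]

print(fib_bottom_up(10))           # prints 55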

Applications of dynamic programming

1. 0/1 knapsack problem


2. Mathematical optimization problems
3. All-pairs shortest path problem
4. Reliability design problem
5. Longest common subsequence (LCS)
6. Flight control and robotics control
7. Time-sharing: it schedules jobs to maximize CPU usage.

Matrix chain multiplication problem using Dynamic
Programming :-

Dynamic programming covers many real-world problems: making coin change in day-to-day
life, robotics and aircraft control, mathematical problems like the Fibonacci
sequence, and deciding among the many possible ways of multiplying a chain of more than
two matrices so as to get the best, optimal solution. Now we look at one such
problem: the MATRIX CHAIN MULTIPLICATION PROBLEM.
Matrix chain multiplication (or Matrix Chain Ordering Problem, MCOP) is an optimization
problem that can be solved using dynamic programming. Given a sequence of matrices, the
goal is to find the most efficient way to multiply these matrices. The problem is not actually to
perform the multiplications, but merely to decide the sequence of the matrix multiplications
involved.

Matrix Chain Multiplication

It is a method under Dynamic Programming in which the previous output is
taken as input for the next step.

Here, chain means that one matrix's column count is always equal to the next
matrix's row count.

In general:

If A = [aij] is a p x q matrix
B = [bij] is a q x r matrix
C = [cij] is a p x r matrix

then C = A x B.

Given the matrices {A1, A2, A3, ..., An}, we have to perform the
matrix multiplication, which can be accomplished by a series of matrix
multiplications:

A1 x A2 x A3 x ... x An


Matrix multiplication is associative but not commutative. By this, we
mean that we have to follow the above matrix order for multiplication,
but we are free to parenthesize the above multiplication depending upon
our need.

In general, for 1 ≤ i ≤ p and 1 ≤ j ≤ r,

cij = Σ (k = 1 to q) aik bkj

It can be observed that the total number of entries in matrix 'C' is pr, as the
matrix is of dimension p x r. Also, each entry takes O(q) time to
compute; thus the total time to compute all entries of the
matrix 'C', which is the product of 'A' and 'B', is proportional to the
product of the dimensions, p q r.

It is also noticed that we can save operations by reordering the parentheses.

Example 1: Let us have 3 matrices, A1, A2, A3 of order (10 x 100), (100 x
5) and (5 x 50) respectively.

Three matrices can be multiplied in two ways:

1. A1 (A2 A3): first multiply A2 and A3, then multiply A1 with the
resultant matrix.
2. (A1 A2) A3: first multiply A1 and A2, then multiply the resultant
matrix with A3.

Number of scalar multiplications in Case 1:

1. (100 x 5 x 50) + (10 x 100 x 50) = 25000 + 50000 = 75000

Number of scalar multiplications in Case 2:

2. (10 x 100 x 5) + (10 x 5 x 50) = 5000 + 2500 = 7500

To find the best possible way to calculate the product, we could simply
parenthesize the expression in every possible fashion and count each
time how many scalar multiplications are required.

The Matrix Chain Multiplication Problem can thus be stated as: "find the optimal
parenthesization of a chain of matrices to be multiplied such that the
number of scalar multiplications is minimized".

Number of ways for parenthesizing the matrices:

There is a very large number of ways of parenthesizing these matrices.
If there are n items, there are (n-1) ways in which the outermost pair
of parentheses can be placed:

(A1) (A2,A3,A4,................An)
Or (A1,A2) (A3,A4 .................An)
Or (A1,A2,A3) (A4 ...............An)
........................

Or(A1,A2,A3.............An-1) (An)

It can be observed that after splitting at the kth matrix, we are left with
two parenthesized sequences of matrices: one consisting of k matrices and
another consisting of n-k matrices.
If there are 'L' ways of parenthesizing the left sublist and 'R' ways of
parenthesizing the right sublist, then the total will be L.R.

Also p(n) = C(n-1), where C(n) is the nth Catalan number:

C(n) = (1/(n+1)) * C(2n, n)

On applying Stirling's formula we have

C(n) = Ω(4^n / n^(3/2))

which shows that the number of parenthesizations grows exponentially:
4^n, an exponential function, grows faster than n^(3/2).

Development of Dynamic Programming Algorithm

1. Characterize the structure of an optimal solution.


2. Define the value of an optimal solution recursively.
3. Compute the value of an optimal solution in a bottom-up fashion.
4. Construct the optimal solution from the computed information.
Dynamic Programming Approach

Let Ai...j be the result of multiplying matrices i through j. It can be seen
that Ai...j is a pi-1 x pj matrix.

The Dynamic Programming solution involves breaking up the problem into
subproblems whose solutions can be combined to solve the global
problem.

At the outermost level of parenthesization, we multiply two matrices:

A1...n = A1...k x Ak+1...n

Thus we are left with two questions:

o How to split the sequence of matrices?
o How to parenthesize the subsequences A1...k and Ak+1...n?

One possible answer to the first question, for finding the best value of 'k',
is to check all possible choices of 'k' and take the best among them.
But it can be observed that checking all possibilities leads to an
exponential number of total possibilities. It can also be noticed that
there exist only O(n^2) different subproblems Ai...j, so a table of their
solutions does not reach exponential size.

Step1: Structure of an optimal parenthesization: Our first step in


the dynamic paradigm is to find the optimal substructure and then use it
to construct an optimal solution to the problem from an optimal solution
to subproblems.
Let Ai...j, where i ≤ j, denote the matrix that results from evaluating the
product

Ai Ai+1 ... Aj.

If i < j, then any parenthesization of the product Ai Ai+1 ... Aj must split
the product between Ak and Ak+1 for some integer k in the range i ≤
k < j. That is, for some value of k, we first compute the matrices Ai...k and
Ak+1...j and then multiply them together to produce the final product
Ai...j. The cost of computing Ai...k, plus the cost of computing Ak+1...j, plus
the cost of multiplying them together, is the cost of the parenthesization.

Step 2: A Recursive Solution: Let m [i, j] be the minimum number of
scalar multiplications needed to compute the matrix Ai...j.

If i = j, the chain consists of just one matrix, Ai...i = Ai, so no scalar
multiplications are necessary to compute the product. Thus m [i, i] = 0
for i = 1, 2, 3, ..., n.

If i < j, we assume that to optimally parenthesize the product we split it
between Ak and Ak+1, where i ≤ k < j. Then m [i,j] equals the minimum
cost of computing the subproducts Ai...k and Ak+1...j, plus the cost of
multiplying them together. Since Ai...k has dimension pi-1 x pk and Ak+1...j
has dimension pk x pj, multiplying them takes pi-1 pk pj scalar
multiplications, and we obtain

m [i,j] = m [i, k] + m [k + 1, j] + pi-1 pk pj

There are only (j - i) possible values for 'k', namely k = i, i+1, ..., j-1.
Since the optimal parenthesization must use one of these values for 'k',
we need only check them all to find the best.
So the minimum cost of parenthesizing the product
Ai Ai+1 ... Aj becomes

m [i, j] = 0                                                        if i = j
m [i, j] = min over i ≤ k < j of { m [i, k] + m [k + 1, j] + pi-1 pk pj }   if i < j

To construct an optimal solution, let us define s [i, j] to be the value of
'k' at which we split the product Ai Ai+1 ... Aj to obtain an optimal
parenthesization, i.e. s [i, j] = k such that

m [i,j] = m [i, k] + m [k + 1, j] + pi-1 pk pj

Example of Matrix Chain Multiplication

Example: We are given the sequence {4, 10, 3, 12, 20, 7}. The
matrices have sizes 4 x 10, 10 x 3, 3 x 12, 12 x 20, 20 x 7. We need to
compute M [i, j] for 1 ≤ i ≤ j ≤ 5. We know M [i, i] = 0 for all i.
Let us proceed by working away from the diagonal. We compute the
optimal solution for the product of 2 matrices.

Here p0 to p5 are the dimensions and M1 to M5 are the matrices, where
Mi has size pi-1 x pi.

On the basis of the sequence, we apply the recurrence above.

In Dynamic Programming, the diagonal entries M [i, i] are initialized to
'0', and the table is then filled diagonal by diagonal.

We have to try all the split combinations, but only the minimum-cost
combination is taken into consideration.

Calculation of the product of 2 matrices:

1. M (1,2) = M1 x M2: dimensions (4 x 10)(10 x 3), cost = 4 x 10 x 3 = 120
2. M (2,3) = M2 x M3: dimensions (10 x 3)(3 x 12), cost = 10 x 3 x 12 = 360
3. M (3,4) = M3 x M4: dimensions (3 x 12)(12 x 20), cost = 3 x 12 x 20 = 720
4. M (4,5) = M4 x M5: dimensions (12 x 20)(20 x 7), cost = 12 x 20 x 7 = 1680

o We initialize the diagonal elements (those with equal i, j values) with '0'.
o After that, the second diagonal is filled in and we get all the values
corresponding to it.

Now the third diagonal will be solved in the same way.

Now the product of 3 matrices:

M [1, 3] = M1 M2 M3
1. There are two cases by which we can solve this multiplication:
(M1 x M2) x M3 and M1 x (M2 x M3).
2. After solving both cases, we choose the case with the minimum cost:
(M1 x M2) x M3: M[1,2] + M[3,3] + 4 x 3 x 12 = 120 + 0 + 144 = 264
M1 x (M2 x M3): M[1,1] + M[2,3] + 4 x 10 x 12 = 0 + 360 + 480 = 840

M [1, 3] = 264

Comparing both outputs, 264 is the minimum, so we insert 264 in the
table, and (M1 x M2) x M3 is the combination chosen for the output.

M [2, 4] = M2 M3 M4
1. There are two cases by which we can solve this multiplication:
(M2 x M3) x M4 and M2 x (M3 x M4).
2. After solving both cases, we choose the case with the minimum cost:
(M2 x M3) x M4: M[2,3] + M[4,4] + 10 x 12 x 20 = 360 + 0 + 2400 = 2760
M2 x (M3 x M4): M[2,2] + M[3,4] + 10 x 3 x 20 = 0 + 720 + 600 = 1320

M [2, 4] = 1320

Comparing both outputs, 1320 is the minimum, so we insert 1320 in the
table, and M2 x (M3 x M4) is the combination chosen for the output.

M [3, 5] = M3 M4 M5
1. There are two cases by which we can solve this multiplication:
(M3 x M4) x M5 and M3 x (M4 x M5).
2. After solving both cases, we choose the case with the minimum cost:
(M3 x M4) x M5: M[3,4] + M[5,5] + 3 x 20 x 7 = 720 + 0 + 420 = 1140
M3 x (M4 x M5): M[3,3] + M[4,5] + 3 x 12 x 7 = 0 + 1680 + 252 = 1932

M [3, 5] = 1140

Comparing both outputs, 1140 is the minimum, so we insert 1140 in the
table, and (M3 x M4) x M5 is the combination chosen for the output.

Now the product of 4 matrices:

M [1, 4] = M1 M2 M3 M4

There are three cases by which we can solve this multiplication:

1. (M1 x M2 x M3) x M4: M[1,3] + M[4,4] + 4 x 12 x 20 = 264 + 0 + 960 = 1224
2. M1 x (M2 x M3 x M4): M[1,1] + M[2,4] + 4 x 10 x 20 = 0 + 1320 + 800 = 2120
3. (M1 x M2) x (M3 x M4): M[1,2] + M[3,4] + 4 x 3 x 20 = 120 + 720 + 240 = 1080

After solving these cases, we choose the case with the minimum cost:

M [1, 4] = 1080

Comparing the outputs of the different cases, '1080' is the minimum, so
we insert 1080 in the table, and (M1 x M2) x (M3 x M4) is the
combination chosen for the output.
M [2, 5] = M2 M3 M4 M5

There are three cases by which we can solve this multiplication:

1. (M2 x M3 x M4) x M5: M[2,4] + M[5,5] + 10 x 20 x 7 = 1320 + 0 + 1400 = 2720
2. M2 x (M3 x M4 x M5): M[2,2] + M[3,5] + 10 x 3 x 7 = 0 + 1140 + 210 = 1350
3. (M2 x M3) x (M4 x M5): M[2,3] + M[4,5] + 10 x 12 x 7 = 360 + 1680 + 840 = 2880

After solving these cases, we choose the case with the minimum cost:

M [2, 5] = 1350

Comparing the outputs of the different cases, '1350' is the minimum, so
we insert 1350 in the table, and M2 x (M3 x M4 x M5) is the combination
chosen for the output.

Now the product of 5 matrices:

M [1, 5] = M1 M2 M3 M4 M5

There are four cases (one per split point) by which we can solve this multiplication:

1. (M1 x M2 x M3 x M4) x M5: M[1,4] + M[5,5] + 4 x 20 x 7 = 1080 + 0 + 560 = 1640
2. M1 x (M2 x M3 x M4 x M5): M[1,1] + M[2,5] + 4 x 10 x 7 = 0 + 1350 + 280 = 1630
3. (M1 x M2 x M3) x (M4 x M5): M[1,3] + M[4,5] + 4 x 12 x 7 = 264 + 1680 + 336 = 2280
4. (M1 x M2) x (M3 x M4 x M5): M[1,2] + M[3,5] + 4 x 3 x 7 = 120 + 1140 + 84 = 1344

After solving these cases, we choose the case with the minimum cost:

M [1, 5] = 1344

Comparing the outputs of the different cases, '1344' is the minimum, so
we insert 1344 in the table, and (M1 x M2) x (M3 x M4 x M5) is the
combination chosen for the output.

The final output is M [1, 5] = 1344, with optimal parenthesization
(M1 x M2) x ((M3 x M4) x M5).

Step 3: Computing Optimal Costs: Let us assume that matrix Ai has
dimension pi-1 x pi for i = 1, 2, 3, ..., n. The input is a sequence (p0, p1, ..., pn),
where length[p] = n + 1. The procedure uses an auxiliary table m [1..n,
1..n] for storing the m [i, j] costs and an auxiliary table s [1..n, 1..n] that
records which index k achieved the optimal cost in computing m [i, j].

The algorithm first computes m [i, i] ← 0 for i = 1, 2, 3, ..., n, the minimum
costs for chains of length 1.

Algorithm of Matrix Chain Multiplication

MATRIX-CHAIN-ORDER (p)

1. n ← length[p] - 1
2. for i ← 1 to n
3. do m [i, i] ← 0
4. for l ← 2 to n // l is the chain length
5. do for i ← 1 to n-l + 1
6. do j ← i+ l -1
7. m[i,j] ← ∞
8. for k ← i to j-1
9. do q ← m [i, k] + m [k + 1, j] + pi-1 pk pj
10. If q < m [i,j]
11. then m [i,j] ← q
12. s [i,j] ← k
13. return m and s.

We will use table s to construct an optimal solution.

Step 4: Constructing an Optimal Solution:

PRINT-OPTIMAL-PARENS (s, i, j)
1. if i = j
2. then print "A"i
3. else print "("
4. PRINT-OPTIMAL-PARENS (s, i, s [i, j])
5. PRINT-OPTIMAL-PARENS (s, s [i, j] + 1, j)
6. print ")"

Analysis: There are three nested loops. Each loop executes at most
n times.

1. l, length, O (n) iterations.


2. i, start, O (n) iterations.
3. k, split point, O (n) iterations

The loop body has constant cost.

Total complexity is: O (n^3)
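For reference, the following is a direct Python translation of MATRIX-CHAIN-ORDER and PRINT-OPTIMAL-PARENS (a sketch added to these notes; it keeps the 1-based indexing of the pseudocode and returns the parenthesization as a string instead of printing it):

import sys

def matrix_chain_order(p):
    # p[i-1] x p[i] is the dimension of matrix Ai; length of p is n + 1
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]   # m[i][j]: minimum cost
    s = [[0] * (n + 1) for _ in range(n + 1)]   # s[i][j]: best split point
    for l in range(2, n + 1):                   # l is the chain length
        for i in range(1, n - l + 2):
            j = i + l - 1
            m[i][j] = sys.maxsize
            for k in range(i, j):               # try every split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q
                    s[i][j] = k
    return m, s

def print_optimal_parens(s, i, j):
    if i == j:
        return "A%d" % i
    return "(%s x %s)" % (print_optimal_parens(s, i, s[i][j]),
                          print_optimal_parens(s, s[i][j] + 1, j))

m, s = matrix_chain_order([4, 10, 3, 12, 20, 7])
print(m[1][5])                        # 1344, as in the worked example
print(print_optimal_parens(s, 1, 5))  # ((A1 x A2) x ((A3 x A4) x A5))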


Longest Common Subsequence:

The longest common subsequence (LCS) is defined as the longest subsequence that is common to

all the given sequences, provided that the elements of the subsequence are not required to occupy

consecutive positions within the original sequences.

If S1 and S2 are the two given sequences, then Z is a common subsequence of S1 and S2 if Z is

a subsequence of both S1 and S2. Furthermore, Z must be obtainable from a strictly increasing

sequence of indices of both S1 and S2.

In a strictly increasing sequence, the indices of the elements chosen from the original sequences

must be in ascending order in Z.

If

S1 = {B, C, D, A, A, C, D}

then {A, D, B} cannot be a subsequence of S1, as the order of the elements is not the same (i.e.

the index sequence is not strictly increasing).

Let us understand LCS with an example.

If
S1 = {B, C, D, A, A, C, D}

S2 = {A, C, D, B, A, C}

Then, common subsequences are {B, C}, {C, D, A, C}, {D, A, C}, {A, A, C}, {A, C},

{C, D}, ...

Among these subsequences, {C, D, A, C} is the longest common subsequence. We are going to

find this longest common subsequence using dynamic programming.

Before proceeding further, refer back to the introduction to dynamic programming above if

needed.

Using Dynamic Programming to find the LCS

Let us take two sequences, X of length n and Y of length m.

The following steps are followed for finding the longest common subsequence.
1. Create a table of dimension (n+1) x (m+1), where n and m are the lengths of X and Y respectively. The

first row and the first column are filled with zeros.

2. Fill each cell of the table using the following logic.

3. If the characters corresponding to the current row and current column match, then fill the

current cell by adding one to the diagonal element, and point an arrow to the diagonal cell.

4. Else, take the maximum value from the previous column and previous row element for filling the

current cell, and point an arrow to the cell with the maximum value.
If they are equal, point to either of them.

5. Step 2 is repeated until the table is filled.

6. The value in the last row and the last column is the length of the longest common subsequence.
7. In order to find the longest common subsequence, start from the last element and follow the

direction of the arrows. The elements corresponding to the diagonal arrows form the longest common

subsequence.

Thus, for the example traced in the (omitted) figure, the longest common subsequence is CD.


How is the dynamic programming algorithm more efficient than the recursive algorithm when

solving an LCS problem?

The method of dynamic programming reduces the number of function calls. It stores the result of

each function call so that it can be used in future calls without the need for redundant calls.

In the above dynamic algorithm, the results obtained from each comparison between elements

of X and the elements of Y are stored in a table so that they can be used in future computations.

So, the time taken by the dynamic approach is the time taken to fill the table, i.e. O(mn), whereas

the recursive algorithm has a complexity of O(2^max(m, n)).

Longest Common Subsequence Algorithm

Let X and Y be two given sequences.

Initialize a table LCS of dimension (X.length + 1) * (Y.length + 1)
LCS[0][] = 0
LCS[][0] = 0
Start from LCS[1][1]
Compare X[i] and Y[j]
If X[i] = Y[j]
LCS[i][j] = 1 + LCS[i-1][j-1]
Point an arrow to LCS[i][j]

Else
LCS[i][j] = max(LCS[i-1][j], LCS[i][j-1])
Point an arrow to max(LCS[i-1][j], LCS[i][j-1])
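A compact Python version of this table-filling algorithm (a sketch added to these notes; instead of storing arrows, it recovers one LCS by backtracking over the finished table):

def lcs(X, Y):
    m, n = len(X), len(Y)
    # (m+1) x (n+1) table; first row and first column stay zero
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:               # characters match
                L[i][j] = L[i - 1][j - 1] + 1
            else:                                   # take max of top / left
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # backtrack from L[m][n] to recover one longest common subsequence
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return L[m][n], "".join(reversed(out))

print(lcs("BCDAACD", "ACDBAC"))  # (4, 'CDAC'), matching {C, D, A, C} above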

Greedy Algorithms
Greedy is an algorithmic paradigm that builds up a solution piece by piece, always choosing the
next piece that offers the most obvious and immediate benefit. Problems where a locally
optimal choice also leads to a globally optimal solution are the best fit for Greedy.

What is Greedy Algorithm?

In a GREEDY ALGORITHM, a set of resources is divided step by step, based on the maximum
immediate availability of that resource at any given stage of execution.

To solve a problem based on the greedy approach, there are two stages:

1. scanning the list of items

2. optimization.

These stages proceed in parallel as the input is processed.

To understand the greedy approach, it helps to have a working knowledge of recursion, which
makes it easier to trace the code. The greedy paradigm can then be defined in terms of
necessary and sufficient conditions.

Two conditions define the greedy paradigm.

 Each stepwise solution must structure a problem towards its best-accepted solution.

 It is sufficient if the structuring of the problem can halt in a finite number of greedy steps.

Components of Greedy Algorithm

Greedy algorithms have the following five components −


 A candidate set − A solution is created from this set.
 A selection function − Used to choose the best candidate to be added to the solution.
 A feasibility function − Used to determine whether a candidate can be used to contribute to
the solution.
 An objective function − Used to assign a value to a solution or a partial solution.
 A solution function − Used to indicate whether a complete solution has been reached

ACTIVITY SELECTION PROBLEM:


The Activity Selection Problem is an optimization problem which deals with the selection of non-
conflicting activities that need to be executed by a single person or machine in a given time frame.
Each activity is marked by a start and finish time. The greedy technique is used for finding the solution
since this is an optimization problem.

What is Activity Selection Problem?


Let's consider that you have n activities with their start and finish times, the objective is to find
solution set having maximum number of non-conflicting activities that can be executed in a
single time frame, assuming that only one person or machine is available for execution.
Some points to note here:

 It might not be possible to complete all the activities, since their timings can overlap.
 Two activities, say i and j, are said to be non-conflicting if si >= fj or sj >=
fi where si and sj denote the starting time of activities i and j respectively,
and fi and fj refer to the finishing time of the activities i and j respectively.
 Greedy approach can be used to find the solution since we want to maximize the count of
activities that can be executed. This approach will greedily choose an activity with earliest
finish time at every step, thus yielding an optimal solution.

Input Data for the Algorithm:

 act[] array containing all the activities.


 s[] array containing the starting time of all the activities.

 f[] array containing the finishing time of all the activities.


Output Data from the Algorithm:

 sol[] array referring to the solution set containing the maximum number of non-conflicting
activities.

Steps for Activity Selection Problem


Following are the steps we will be following to solve the activity selection problem,
Step 1: Sort the given activities in ascending order according to their finishing time.
Step 2: Select the first activity from sorted array act[] and add it to sol[] array.
Step 3: Repeat steps 4 and 5 for the remaining activities in act[].
Step 4: If the start time of the currently selected activity is greater than or equal to the finish time of
previously selected activity, then add it to the sol[] array.
Step 5: Select the next activity in act[] array.
Step 6: Print the sol[] array.

Activity Selection Problem Example


Let's try to trace the steps of above algorithm using an example:
In the table below, we have 6 activities with corresponding start and end time, the objective is to
compute an execution schedule having maximum number of non-conflicting activities:
Start Time (s) Finish Time (f) Activity Name

5 9 a1

1 2 a2

3 4 a3

0 6 a4

5 7 a5

8 9 a6
A possible solution would be:
Step 1: Sort the given activities in ascending order according to their finishing time.
The table after we have sorted it:

Start Time (s) Finish Time (f) Activity Name

1 2 a2

3 4 a3

0 6 a4

5 7 a5

5 9 a1

8 9 a6

Step 2: Select the first activity from sorted array act[] and add it to the sol[] array, thus sol =
{a2}.
Step 3: Repeat the steps 4 and 5 for the remaining activities in act[].

Step 4: If the start time of the currently selected activity is greater than or equal to the finish time of
the previously selected activity, then add it to sol[].
Step 5: Select the next activity in act[]
For the data given in the above table,

A. Select activity a3. Since the start time of a3 is greater than the finish time of a2 (i.e. s(a3) >
f(a2)), we add a3 to the solution set. Thus sol = {a2, a3}.
B. Select a4. Since s(a4) < f(a3), it is not added to the solution set.
C. Select a5. Since s(a5) > f(a3), a5 gets added to solution set. Thus sol = {a2, a3, a5}
D. Select a1. Since s(a1) < f(a5), a1 is not added to the solution set.
E. Select a6. a6 is added to the solution set since s(a6) > f(a5). Thus sol = {a2, a3, a5, a6}.

Step 6: At last, print the array sol[]


Hence, the execution schedule of maximum number of non-conflicting activities will be:

(1,2)
(3,4)
(5,7)
(8,9)
Algorithm of Activity selection problem:
GREEDY- ACTIVITY SELECTOR (s, f)
1. n ← length [s]
2. A ← {1}
3. j ← 1.
4. for i ← 2 to n
5. do if si ≥ fj
6. then A ← A ∪ {i}
7. j ← i
8. return A
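A short Python rendering of this procedure (a sketch added to these notes; activities are passed as (start, finish) pairs and the sorting of Step 1 is done inside the function):

def activity_selection(activities):
    # activities: list of (start, finish) pairs; returns a maximum-size
    # set of non-conflicting activities
    acts = sorted(activities, key=lambda a: a[1])   # sort by finish time
    sol = [acts[0]]                                 # pick the first finisher
    last_finish = acts[0][1]
    for s, f in acts[1:]:
        if s >= last_finish:           # non-conflicting: start >= last finish
            sol.append((s, f))
            last_finish = f
    return sol

acts = [(5, 9), (1, 2), (3, 4), (0, 6), (5, 7), (8, 9)]
print(activity_selection(acts))        # [(1, 2), (3, 4), (5, 7), (8, 9)]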

Huffman codes:
 The Huffman Algorithm was developed by David Huffman in 1951.
 This is a technique used in data compression; it can also be described as a coding
technique used for encoding data.

 This technique is the basis of many data compression schemes.

 The idea basically depends upon frequency, i.e. the frequency of the corresponding
character in the data to be compressed; from these frequencies alone the Huffman code is
generated.
 In Huffman coding, the most frequent character gets the smallest code and the least
frequent character gets the largest code.
 A Huffman tree is a specific method of representing each symbol.

 This technique produces a code in such a manner that no codeword is a prefix of some other
codeword. These codes are called prefix codes.
Huffman Coding is generally useful to compress data in which some characters occur
frequently.

How Huffman Coding works?

Suppose a string of 15 characters, drawn from the alphabet {A, B, C, D}, is to be sent over a network.

Each character occupies 8 bits, so a total of 8 * 15 = 120 bits is required to send this string.
Using the Huffman Coding technique, we can compress the string to a smaller size.

Huffman coding first creates a tree using the frequencies of the characters and then generates a code
for each character.

Once the data is encoded, it has to be decoded. Decoding is done using the same tree.
Huffman Coding prevents any ambiguity in the decoding process using the concept of prefix
codes, i.e. a code associated with a character must not be a prefix of any other code.
The tree created above helps in maintaining this property.
Huffman coding is done with the help of the following steps.

1. Calculate the frequency of each character in the string.

2. Sort the characters in increasing order of the frequency. These are stored in a priority queue Q.

3. Make each unique character as a leaf node.

4. Create an empty node z. Assign the minimum frequency to the left child of z and assign the second
minimum frequency to the right child of z. Set the value of the z as the sum of the above two
minimum frequencies.
5. Remove these two minimum frequencies from Q and add the sum into the list of frequencies (*
denote the internal nodes in the figure above).
6. Insert node z into the tree.
7. Repeat steps 4 to 6 until only one node (the root) remains.

8. For each non-leaf node, assign 0 to the left edge and 1 to the right edge.

For sending the above string over a network, we have to send the tree as well as the above
compressed-code. The total size is given by the table below.

Character Frequency Code Size

A 5 11 5*2 = 10

B 1 100 1*3 = 3

C 6 0 6*1 = 6

D 3 101 3*3 = 9

4 * 8 = 32 bits 15 bits 28 bits

Without encoding, the total size of the string was 120 bits. After encoding the size is reduced to 32
+ 15 + 28 = 75.
Decoding the code

For decoding the code, we can take the code and traverse the tree to find the character.
Suppose ### is to be decoded; we can traverse from the root as in the figure below.

Huffman Coding Complexity

The time complexity for encoding each unique character based on its frequency is O(nlog n).
Extracting minimum frequency from the priority queue takes place 2*(n-1) times and its complexity
is O(log n). Thus the overall complexity is O(nlog n).
Huffman Coding Applications

1. Huffman coding is used in conventional compression formats like GZIP, BZIP2, PKZIP, etc.

2. For text and fax transmissions.

Algorithm for Huffman code

1. Input: Number of messages with frequency counts.
2. Output: Huffman merge tree.
3. Begin
4. Let Q be the priority queue,
5. Q = {initialize priority queue with the frequencies of all symbols
or messages}
6. Repeat n-1 times
7. Create a new node Z
8. X = extract_min(Q)
9. Y = extract_min(Q)
10. Frequency(Z) = Frequency(X) + Frequency(Y)
11. Left(Z) = X, Right(Z) = Y
12. Insert (Z, Q)
13. End repeat
14. Return (extract_min(Q))
15. End.
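A runnable Python sketch of this procedure, using the standard heapq module as the priority queue (the tuple-based tree representation and the code-generating tree walk are additions of ours):

import heapq

def huffman_codes(freqs):
    # freqs: dict mapping symbol -> frequency; returns symbol -> code.
    # A heap entry is (frequency, tiebreak, tree); a tree is either a
    # symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:                      # repeat n-1 times
        fx, _, x = heapq.heappop(heap)        # X = extract_min(Q)
        fy, _, y = heapq.heappop(heap)        # Y = extract_min(Q)
        count += 1
        heapq.heappush(heap, (fx + fy, count, (x, y)))   # insert Z
    root = heap[0][2]
    codes = {}
    def walk(node, code):                     # 0 = left edge, 1 = right edge
        if isinstance(node, tuple):
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:
            codes[node] = code or "0"
    walk(root, "")
    return codes

print(huffman_codes({"A": 5, "B": 1, "C": 6, "D": 3}))
# {'C': '0', 'B': '100', 'D': '101', 'A': '11'}, matching the table above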

Example:

Let us obtain a set of Huffman codes for the messages (m1...m7) with relative frequencies (q1...q7) =
(4, 5, 7, 8, 10, 12, 20). Let us draw the Huffman tree for the given set of frequencies.

Step 1) Arrange the data in ascending order in a table.

4,5,7,8,10,12,20

Step 2) Combine first two entries of a table and by this create a parent node.

Step 3)

A) Remove the entries 4 and 5 from the table and insert 9 at its appropriate position: 7, 8, 9, 10, 12, 20.

Combine the two minimum values of the table and create a parent node.

B) Now remove the entries 7 and 8 from the table and insert 15 at its appropriate
position: 9, 10, 12, 15, 20.

Combine the two minimum values of the table and create a parent node.

C) Remove the entries 9 and 10 from the table and insert 19 at its proper position: 12, 15, 19, 20.

Combine the two minimum values of the table and create a parent node.

D) Remove the entries 12 and 15 from the table and insert 27 at its appropriate position: 19, 20, 27.

Combine the two minimum values of the table and create a parent node.

E) Remove the entries 19 and 20 from the table and insert 39 in the table: 27, 39.

Combine the two minimum values of the table and create a parent node.


Step 4) Now assign left child as 0 and right child as 1 to encode the frequencies.
Now, codes for the given frequencies are given below:

Time complexity:

O(n log n) is the overall time complexity, where n is the number of characters.

TASK SCHEDULING PROBLEM:


Given an array of jobs where every job has a deadline and an associated profit that is earned if the job
is finished before the deadline. It is also given that every job takes a single unit of time, so the minimum
possible deadline for any job is 1. The goal is to maximize the total profit, given that only one job can be
scheduled at a time.

Job sequencing concerns a set of jobs, where each job i has a deadline di >= 0 and profit pi >
0. For any job i the profit is earned if and only if the job is completed by its deadline. To complete a
job, one has to process the job on a machine for one unit of time. Only one machine is available for
processing the jobs.
Examples:
Input: Four Jobs with following
deadlines and profits
JobID Deadline Profit
a 4 20
b 1 10
c 1 40
d 1 30
Output: Following is maximum
profit sequence of jobs
c, a

Input: Five Jobs with following


deadlines and profits
JobID Deadline Profit
a 2 100
b 1 19
c 2 27
d 1 25
e 3 15
Output: Following is maximum
profit sequence of jobs
c, a, e
This is a standard Greedy Algorithm problem. Following is the algorithm.
1) Sort all jobs in decreasing order of profit.
2) Iterate over the jobs in decreasing order of profit. For each job, do the following:
a) Find a time slot i such that the slot is empty, i ≤ deadline, and i is greatest. Put the job in
this slot and mark this slot filled.
b) If no such i exists, then ignore the job.
Time complexity

Job sequencing problems has the time complexity of O(n2).

Example:

Given a set of 9 jobs where each job has a deadline and an associated profit. Each job takes 1 unit
of time to complete and only one job can be scheduled at a time. We earn the profit if and only if
the job is completed by its deadline. The task is to find the maximum profit and the number of jobs
done.

Jobs Profit Deadline


J1 85 5
J2 25 4
J3 16 3
J4 40 3
J5 55 4
J6 19 5
J7 92 2
J8 80 3
J9 15 7

Processing the jobs in decreasing order of profit (J7, J1, J8, J5, J4, J2, J6, J3, J9) and
placing each job in the latest free slot on or before its deadline gives:
slot 2 = J7, slot 5 = J1, slot 3 = J8, slot 4 = J5, slot 1 = J4, slot 7 = J9
(J2, J6 and J3 find no free slot and are skipped).

So, the maximum profit = 40 + 92 + 80 + 55 + 85 + 15 = 367

Algorithm for job sequencing

Input: A is the array of jobs with deadlines and profits; the S array will be the output schedule.

1. Begin
2. Sort all the jobs in A by profit Pi so that
3. P1 >= P2 >= P3 ... >= Pn
4. d = maximum deadline of any job in A
5. Create array S[1,...,d], initialized to 0
6. For i = 1 to n do
7. For j = min(d, deadline of job i) down to 1
8. If S[j] = 0
9. Then
10. S[j] = i
11. Break
12. End if
13. End for
14. End for
15. End
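An equivalent Python sketch (an addition to these notes; the job names and the (name, profit, deadline) input format are our own):

def job_sequencing(jobs):
    # jobs: list of (name, profit, deadline); returns (schedule, profit)
    jobs = sorted(jobs, key=lambda j: j[1], reverse=True)  # by profit, desc
    d = max(j[2] for j in jobs)            # maximum deadline
    slots = [None] * (d + 1)               # slots[1..d]; index 0 unused
    total = 0
    for name, profit, deadline in jobs:
        # find the latest free slot on or before the deadline
        for t in range(min(d, deadline), 0, -1):
            if slots[t] is None:
                slots[t] = name
                total += profit
                break                      # otherwise the job is skipped
    return slots[1:], total

jobs = [("J1", 85, 5), ("J2", 25, 4), ("J3", 16, 3), ("J4", 40, 3),
        ("J5", 55, 4), ("J6", 19, 5), ("J7", 92, 2), ("J8", 80, 3),
        ("J9", 15, 7)]
print(job_sequencing(jobs))
# (['J4', 'J7', 'J8', 'J5', 'J1', None, 'J9'], 367)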

TRAVELLING SALESMAN PROBLEM:


Travelling Salesman Problem (TSP): Given a set of cities and distance between every pair of
cities, the problem is to find the shortest possible route that visits every city exactly once and returns
to the starting point.
Note the difference between the Hamiltonian Cycle problem and TSP. The Hamiltonian cycle problem is to
find whether there exists a tour that visits every city exactly once. Here we know that a Hamiltonian tour
exists (because the graph is complete), and in fact many such tours exist; the problem is to find a
minimum-weight Hamiltonian cycle.

For example, consider a complete graph on 4 cities (shown in the omitted figure). A TSP tour in the graph
is 1-2-4-3-1. The cost of the tour is 10+25+30+15, which is 80.
The problem is a famous NP-hard problem. There is no known polynomial-time solution for this
problem.
Following are different solutions for the traveling salesman problem.
Naive Solution:
1) Consider city 1 as the starting and ending point.
2) Generate all (n-1)! permutations of cities.
3) Calculate the cost of every permutation and keep track of the minimum-cost permutation.
4) Return the permutation with minimum cost.
Time Complexity: Θ(n!)
The travelling salesman problem involves a salesman and a set of cities. The salesman
has to visit every one of the cities starting from a certain one (e.g., the hometown)
and return to the same city. The challenge of the problem is that the travelling
salesman needs to minimize the total length of the trip.

Suppose the cities are x1 x2..... xn where cost cij denotes the cost of travelling from city
xi to xj. The travelling salesperson problem is to find a route starting and ending at
x1 that will take in all cities with the minimum cost.

Example: A newspaper agent daily drops the newspaper to the area assigned in such
a manner that he has to cover all the houses in the respective area with minimum
travel cost. Compute the minimum travel cost.

The area assigned to the agent, where he has to drop the newspapers, is shown in the
figure (omitted here).

Solution: The cost-adjacency matrix of graph G, costij, is given in the omitted figure.
The tour starts from area H1 and then selects the minimum cost area reachable from H1.

Mark area H6 because it is the minimum cost area reachable from H1, and then select the
minimum cost area reachable from H6.
Mark area H7 because it is the minimum cost area reachable from H6, and then select the
minimum cost area reachable from H7.

Mark area H8 because it is the minimum cost area reachable from H7.
Mark area H5 because it is the minimum cost area reachable from H8.

Mark area H2 because it is the minimum cost area reachable from H5.

Mark area H3 because it is the minimum cost area reachable from H2.

Mark area H4 and then select the minimum cost area reachable from H4; it is H1. So,
using the greedy strategy, we get the following.

H1 → H6 → H7 → H8 → H5 → H2 → H3 → H4 → H1
(with edge costs 4, 3, 2, 4, 3, 2, 1, 6 respectively)

Thus the minimum travel cost = 4 + 3 + 2 + 4 + 3 + 2 + 1 + 6 = 25
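The strategy traced above is the nearest-neighbour greedy heuristic. A minimal Python sketch follows (an addition to these notes; the cost matrix shown is consistent with the 4-city example from the TSP paragraph above, 0-indexed, with the remaining entries being illustrative assumptions — it is not the newspaper graph from the omitted figure):

def nearest_neighbour_tour(cost, start=0):
    # cost: n x n matrix of travel costs. Greedily visits the cheapest
    # unvisited city next, then returns to the start. This is a
    # heuristic: it does not always find the optimal TSP tour.
    n = len(cost)
    tour, visited = [start], {start}
    total, cur = 0, start
    while len(visited) < n:
        # pick the cheapest unvisited neighbour of the current city
        nxt = min((c for c in range(n) if c not in visited),
                  key=lambda c: cost[cur][c])
        total += cost[cur][nxt]
        tour.append(nxt)
        visited.add(nxt)
        cur = nxt
    total += cost[cur][start]          # return to the starting city
    tour.append(start)
    return tour, total

cost = [[0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0]]
print(nearest_neighbour_tour(cost))    # ([0, 1, 3, 2, 0], 80), i.e. 1-2-4-3-1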


BINOMIAL HEAPS:
The main application of a Binary Heap is to implement a priority queue. A Binomial Heap is an extension
of Binary Heap that provides a faster union (merge) operation together with the other operations
provided by Binary Heap.
A Binomial Heap is a collection of Binomial Trees.

What is a Binomial Tree?

A Binomial Tree of order 0 has 1 node. A Binomial Tree of order k can be constructed by taking two
binomial trees of order k-1 and making one the leftmost child of the other.
A Binomial Tree of order k has the following properties.
a) It has exactly 2^k nodes.
b) It has depth k.
c) There are exactly C(k, i) nodes at depth i for i = 0, 1, ..., k.
d) The root has degree k, and the children of the root are themselves Binomial Trees of orders k-1, k-2, ..., 0
from left to right.

k = 0 (Single Node)

k = 1 (2 nodes)
[We take two k = 0 order Binomial Trees, and
make one as child of other]
o
\
o

k = 2 (4 nodes)
[We take two k = 1 order Binomial Trees, and
make one as child of other]
o
/ \
o o
\
o

k = 3 (8 nodes)
[We take two k = 2 order Binomial Trees, and
make one as child of other]
o
/ | \
o o o
| / \
o o o
\
o
Binomial Heap:
A Binomial Heap is a set of Binomial Trees where each Binomial Tree follows Min Heap property.
And there can be at most one Binomial Tree of any degree.
Examples Binomial Heap:
12------------10--------------------20
/ \ / | \
15 50 70 50 40
| / | |
30 80 85 65
|
100
A Binomial Heap with 13 nodes. It is a collection of 3
Binomial Trees of orders 0, 2 and 3 from left to right.

10--------------------20
/ \ / | \
15 50 70 50 40
| / | |
30 80 85 65
|
100
A Binomial Heap with 12 nodes. It is a collection of 2
Binomial Trees of orders 2 and 3 from left to right.
Binary Representation of a number and Binomial Heaps
A Binomial Heap with n nodes has a number of Binomial Trees equal to the number of set bits in
the binary representation of n. For example, let n be 13; there are 3 set bits in the binary representation
of n (00001101), hence 3 Binomial Trees. We can also relate the degrees of these Binomial Trees
to the positions of the set bits. From this relation, we can conclude that there are O(log n) Binomial Trees
in a Binomial Heap with 'n' nodes.
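A quick check of this relation in Python (illustrative only):

n = 13
set_bits = bin(n).count("1")           # binary 1101 -> 3 set bits
orders = [i for i in range(n.bit_length()) if n >> i & 1]
print(set_bits, orders)                # 3 [0, 2, 3]: trees of orders 0, 2, 3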
Operations of Binomial Heap:
The main operation in Binomial Heap is union(), all other operations mainly use this operation. The
union() operation is to combine two Binomial Heaps into one. Let us first discuss other operations,
we will discuss union later.
1) insert(H, k): Inserts a key ‘k’ to Binomial Heap ‘H’. This operation first creates a
Binomial Heap with single key ‘k’, then calls union on H and the new Binomial heap.

2) getMin(H): A simple way to implement getMin() is to traverse the list of roots of the Binomial Trees and
return the minimum key. This implementation requires O(log n) time. It can be
optimized to O(1) by maintaining a pointer to the minimum key root.

3) extractMin(H): This operation also uses union(). We first call getMin() to find the minimum key
Binomial Tree, then we remove the node and create a new Binomial Heap by connecting all
subtrees of the removed minimum node. Finally, we call union() on H and the newly created
Binomial Heap. This operation requires O(Logn) time.

4) delete(H): Like Binary Heap, the delete operation first reduces the key to minus infinity, then calls
extractMin().

5) decreaseKey(H): decreaseKey() is also similar to Binary Heap. We compare the decreased key
with its parent's key and, if the parent's key is greater, we swap the keys and recurse on the parent. We
stop when we either reach a node whose parent has a smaller key or we hit the root node. The time
complexity of decreaseKey() is O(log n).

How to represent a Binomial Heap?

A Binomial Heap is a set of Binomial Trees. A Binomial Tree must be represented in a way
that allows sequential access to all siblings, starting from the leftmost sibling (we need this
in extractMin() and delete()). The idea is to represent Binomial Trees in the leftmost-child,
right-sibling representation, i.e. every node stores pointers to its leftmost child and to its
next (right) sibling.

FIBONACCI HEAPS:
In computer science, a Fibonacci heap is a data structure for priority queue operations, consisting
of a collection of heap-ordered trees. It has a better amortized running time than many other priority
queue data structures including the binary heap and binomial heap. Michael L. Fredman and Robert
E. Tarjan developed Fibonacci heaps in 1984 and published them in a scientific journal in 1987.
Fibonacci heaps are named after the Fibonacci numbers, which are used in their running time
analysis.
For the Fibonacci heap, the find-minimum operation takes constant (O(1)) amortized time.[1] The
insert and decrease-key operations also work in constant amortized time.[2] Deleting an element
(most often used in the special case of deleting the minimum element) works in O(log n) amortized
time, where n is the size of the heap.

Using Fibonacci heaps for priority queues improves the asymptotic running time of important
algorithms, such as Dijkstra's algorithm for computing the shortest path between two nodes in a
graph, compared to the same algorithm using other slower priority queue data structures.

Heaps are mainly used for implementing priority queues. We have discussed the following heaps
above:
Binary Heap
Binomial Heap
In terms of Time Complexity, Fibonacci Heap beats both Binary and Binomial Heaps.
Below are amortized time complexities of Fibonacci Heap.
1) Find Min: Θ(1) [Same as both Binary and Binomial]
2) Delete Min: O(Log n) [Θ(Log n) in both Binary and Binomial]
3) Insert: Θ(1) [Θ(Log n) in Binary and Θ(1) in Binomial]
4) Decrease-Key: Θ(1) [Θ(Log n) in both Binary and Binomial]
5) Merge: Θ(1) [Θ(m Log n) or Θ(m+n) in Binary and
Θ(Log n) in Binomial]
Like a Binomial Heap, a Fibonacci Heap is a collection of trees with the min-heap or max-heap property.
In a Fibonacci Heap, trees can have any shape; even all trees can be single nodes (this is unlike a
Binomial Heap, where every tree has to be a Binomial Tree).

A Fibonacci Heap maintains a pointer to the minimum value (which is the root of a tree). All tree roots are
connected using a circular doubly linked list, so all of them can be accessed using a single 'min' pointer.
The main idea is to execute operations in a "lazy" way. For example, the merge operation simply links two
heaps, and the insert operation simply adds a new tree with a single node. The extract-minimum operation
is the most complicated one: it does the delayed work of consolidating trees. This makes delete also
complicated, as delete first decreases the key to minus infinity, then calls extract-minimum.
Below are some interesting facts about Fibonacci Heap
1. The reduced time complexity of Decrease-Key has importance in Dijkstra and Prim algorithms. With
Binary Heap, time complexity of these algorithms is O(VLogV + ELogV). If Fibonacci Heap is used,
then time complexity is improved to O(VLogV + E)
2. Although Fibonacci Heap looks promising time-complexity-wise, it has been found slow in practice, as
the hidden constants are high (source: Wikipedia).
3. Fibonacci heaps are mainly called so because Fibonacci numbers are used in the running time analysis.
Also, every node in a Fibonacci Heap has degree at most O(log n), and the size of a subtree rooted at a
node of degree k is at least F(k+2), where F(k) is the kth Fibonacci number.
Fibonacci Heap – Insertion and Union
Prerequisites:Fibonacci Heap (Introduction)
A Fibonacci Heap is a collection of trees with the min-heap or max-heap property. In a Fibonacci Heap,
trees can have any shape; even all trees can be single nodes (this is unlike a Binomial Heap,
where every tree has to be a Binomial Tree).
In this article, we will discuss Insertion and Union operation on Fibonacci Heap.

Insertion: To insert a node in a Fibonacci heap H, the following algorithm is followed:

1. Create a new node ‘x’.


2. Check whether heap H is empty or not.
3. If H is empty then:
 Make x as the only node in the root list.
 Set H(min) pointer to x.
4. Else:
 Insert x into root list and update H(min).

Union: Union of two Fibonacci heaps H1 and H2 can be accomplished as follows:


1. Join root lists of Fibonacci heaps H1 and H2 and make a single Fibonacci heap H.
2. If H1(min) < H2(min) then:
 H(min) = H1(min).
3. Else:
 H(min) = H2(min).
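A minimal Python sketch of these two "lazy" operations (an addition to these notes; the root list is kept as a plain Python list of keys rather than a circular doubly linked list of tree nodes, and consolidation, which belongs to extract-min, is omitted):

class FibHeap:
    # Skeletal Fibonacci heap showing only lazy insert and union.
    def __init__(self):
        self.roots = []       # root list (a real implementation links
        self.min = None       # the roots in a circular doubly linked list)

    def insert(self, key):
        # lazy insert: just add a one-node tree to the root list
        self.roots.append(key)
        if self.min is None or key < self.min:
            self.min = key    # update the min pointer

    def union(self, other):
        # lazy union: concatenate root lists and keep the smaller min
        h = FibHeap()
        h.roots = self.roots + other.roots
        if self.min is None or (other.min is not None and other.min < self.min):
            h.min = other.min
        else:
            h.min = self.min
        return h

h1, h2 = FibHeap(), FibHeap()
for k in (7, 3, 18):
    h1.insert(k)
h2.insert(1)
print(h1.union(h2).min)       # 1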

Fibonacci Heap – Deletion, Extract min and Decrease key


In the previous section, we discussed Insertion and Union of Fibonacci heaps. Here we discuss the
Extract_min(), Decrease_key() and Deletion() operations on a Fibonacci heap.
Prerequisites:
Fibonacci Heap (Introduction)
Fibonacci Heap – Insertion and Union
Extract_min(): We create a function for deleting the minimum node and setting the min pointer to
the minimum value in the remaining heap. The following algorithm is followed:

1. Delete the min node.


2. Set the min pointer to the next minimum node and add all the subtrees of the deleted node to the root list.
3. Create an array of degree pointers of the size of the deleted node.
4. Set degree pointer to current node.
5. Move to the next node.
 If degrees are different then set degree pointer to next node.
 If degrees are same then join the Fibonacci trees by union operation.
6. Repeat steps 4 and 5 until the heap is completed.


Decrease_key(): To decrease the value of any element in the heap, we follow the following
algorithm:
 Decrease the value of the node ‘x’ to the new chosen value.

 CASE 1) If min heap property is not violated,

 Update min pointer if necessary.


 CASE 2) If min heap property is violated and parent of ‘x’ is unmarked,

 Cut off the link between ‘x’ and its parent.


 Mark the parent of ‘x’.
 Add tree rooted at ‘x’ to the root list and update min pointer if necessary.
 CASE 3)If min heap property is violated and parent of ‘x’ is marked,

 Cut off the link between ‘x’ and its parent p[x].
 Add ‘x’ to the root list, updating min pointer if necessary.
 Cut off link between p[x] and p[p[x]].
 Add p[x] to the root list, updating min pointer if necessary.
 If p[p[x]] is unmarked, mark it.
 Else, cut off p[p[x]] and repeat steps 4.2 to 4.5, taking p[p[x]] as ‘x’.


Deletion(): To delete any element in a Fibonacci heap, the following algorithm is followed:
1. Decrease the value of the node to be deleted ‘x’ to minimum by Decrease_key() function.
2. By using min heap property, heapify the heap containing ‘x’, bringing ‘x’ to the root list.
3. Apply Extract_min() algorithm to the Fibonacci heap.

SPLAY TREE:

A splay tree is a self-balancing binary search tree with the additional property that
recently accessed elements are quick to access again. It performs basic operations
such as insertion, look-up and removal in O(log n) amortized time. For many
sequences of non-random operations, splay trees perform better than other search
trees, even when the specific pattern of the sequence is unknown. The splay tree was
invented by Daniel Sleator and Robert Tarjan in 1985.[1]
All normal operations on a binary search tree are combined with one basic operation,
called splaying. Splaying the tree for a certain element rearranges the tree so that the
element is placed at the root of the tree. One way to do this with the basic search
operation is to first perform a standard binary tree search for the element in question,
and then use tree rotations in a specific fashion to bring the element to the top.
Alternatively, a top-down algorithm can combine the search and the tree
reorganization into a single phase.
Advantages
Good performance for a splay tree depends on the fact that it is self-optimizing, in
that frequently accessed nodes will move nearer to the root where they can be
accessed more quickly. The worst-case height—though unlikely—is O(n), with the
average being O(log n). Having frequently-used nodes near the root is an advantage
for many practical applications (also see Locality of reference), and is particularly
useful for implementing caches and garbage collection algorithms.
Advantages include:

 Comparable performance: Average-case performance is as efficient as other


trees.[2]
 Small memory footprint: Splay trees do not need to store any bookkeeping data.

Disadvantages
The most significant disadvantage of splay trees is that the height of a splay tree can
be linear. For example, this will be the case after accessing all n elements in non-
decreasing order. Since the height of a tree corresponds to the worst-case access
time, this means that the actual cost of an operation can be high. However
the amortized access cost of this worst case is logarithmic, O(log n). Also, the
expected access cost can be reduced to O(log n) by using a randomized variant.[3]
The representation of splay trees can change even when they are accessed in a
'read-only' manner (i.e. by find operations). This complicates the use of such splay
trees in a multi-threaded environment. Specifically, extra management is needed if
multiple threads are allowed to perform find operations concurrently. This also makes
them unsuitable for general use in purely functional programming, although even
there they can be used in limited ways to implement priority queues.
Operations

Splaying
When a node x is accessed, a splay operation is performed on x to move it to the
root. To perform a splay operation we carry out a sequence of splay steps, each of
which moves x closer to the root. By performing a splay operation on the node of
interest after every access, the recently accessed nodes are kept near the root and
the tree remains roughly balanced, so that we achieve the desired amortized time
bounds.
Each particular step depends on three factors:

 Whether x is the left or right child of its parent node, p,


 whether p is the root or not, and if not
 whether p is the left or right child of its parent, g (the grandparent of x).
It is important to remember to set gg (the great-grandparent of x) to now point to x
after any splay operation. If gg is null, then x obviously is now the root and must be
updated as such.

There are three types of splay steps, each of which has two symmetric variants: left-
and right-handed. For the sake of brevity, only one of these two is shown for each
type. (In the following diagrams, circles indicate nodes of interest and triangles
indicate single nodes or sub-trees.) The three types of splay steps are:
Zig step: this step is done when p is the root. The tree is rotated on the edge
between x and p. Zig steps exist to deal with the parity issue and will be done only as
the last step in a splay operation and only when x has odd depth at the beginning of
the operation.
Zig-zig step: this step is done when p is not the root and x and p are either both right
children or are both left children. The picture below shows the case where x and p are
both left children. The tree is rotated on the edge joining p with its parent g, then
rotated on the edge joining x with p. Note that zig-zig steps are the only thing that
differentiate splay trees from the rotate to root method introduced by Allen and
Munro[4] prior to the introduction of splay trees.

Zig-zag step: this step is done when p is not the root and x is a right child and p is a
left child or vice versa. The tree is rotated on the edge between p and x, and then
rotated on the resulting edge between x and g.
Join
Given two trees S and T such that all elements of S are smaller than the elements of
T, the following steps can be used to join them to a single tree:

 Splay the largest item in S. Now this item is in the root of S and has a null right
child.
 Set the right child of the new root to T.

Split
Given a tree and an element x, return two new trees: one containing all elements less
than or equal to x and the other containing all elements greater than x. This can be
done in the following way:

 Splay x. Now it is in the root, so the tree to its left contains all elements smaller
than x and the tree to its right contains all elements larger than x.
 Split the right subtree from the rest of the tree.

Insertion
To insert a value x into a splay tree:
 Insert x as with a normal binary search tree.
 When an item is inserted, a splay is performed.
 As a result, the newly inserted node x becomes the root of the tree.
Alternatively:

 Use the split operation to split the tree at the value of x to two sub-trees: S and T.
 Create a new tree in which x is the root, S is its left sub-tree and T its right sub-
tree.

Deletion
To delete a node x, use the same method as with a binary search tree:

 If x has two children:


o Swap its value with that of either the rightmost node of its left sub tree (its in-
order predecessor) or the leftmost node of its right subtree (its in-order
successor).
o Remove that node instead.

In this way, deletion is reduced to the problem of removing a node with 0 or 1


children. Unlike a binary search tree, in a splay tree after deletion, we splay the
parent of the removed node to the top of the tree.
Alternatively:

 The node to be deleted is first splayed, i.e. brought to the root of the tree, and then
deleted. This leaves the tree with two sub-trees.
 The two sub-trees are then joined using a "join" operation

Implementation and variants


Splaying, as mentioned above, is performed during a second, bottom-up pass over
the access path of a node. It is possible to record the access path during the first
pass for use during the second, but that requires extra space during the access
operation. Another alternative is to keep a parent pointer in every node, which avoids
the need for extra space during access operations but may reduce overall time
efficiency because of the need to update those pointers.[1]
Another method which can be used is based on the argument that we can restructure
the tree on our way down the access path instead of making a second pass. This top-
down splaying routine uses three sets of nodes - left tree, right tree and middle tree.
The first two contain all items of original tree known to be less than or greater than
current item respectively. The middle tree consists of the sub-tree rooted at the
current node. These three sets are updated down the access path while keeping the
splay operations in check. Another method, semisplaying, modifies the zig-zig case to
reduce the amount of restructuring done in all operations.

Red Black Tree:


A Red-Black Tree is a type of self-balancing binary search tree in which every
node is colored either red or black. A red-black tree satisfies all the properties
of a binary search tree, but there are some additional properties added in a
Red-Black Tree. The height of a Red-Black tree is O(log n), where n is the
number of nodes in the tree.

Properties of Red Black Tree


1. The root node should always be black in color.
2. Every null child of a node is black in red black tree.
3. The children of a red node are black; equivalently, the parent of a red
node must be black.

4. All the leaves have the same black depth.


5. Every simple path from the root node to the (downward) leaf node contains
the same number of black nodes.

Representation of Red Black Tree

While representing a red-black tree, the color of each node should be shown. In this
tree, leaf nodes are simply termed null nodes, which means they are not physical
nodes. In the example tree (shown in the omitted figure) there are two types of nodes,
one red and the other black in color. That tree follows all the properties of a
red-black tree:
1. It is a binary search tree.
2. The root node is black.
3. The children’s of red node are black.
4. All the root to external node paths contain same number of black nodes.
Example: Consider path 75-90-80-88-null and 75-40-30-null in both these
paths 3 black nodes are there.

Advantages of Red Black Tree


1. Red-black trees are useful when insertions and deletions are relatively
frequent.
2. Red-black trees are self-balancing, so these operations are guaranteed to be
O(log n).
3. They have relatively low constants in a wide variety of scenarios.

Operations on RB Trees:
The search-tree operations TREE-INSERT and TREE-DELETE, when run on
a red-black tree with n keys, take O(log n) time. Because they modify
the tree, the result may violate the red-black properties. To restore
these properties, we must change the colors of some of the nodes in the
tree and also change the pointer structure.

1. Rotation:

Restructuring operations on red-black trees can generally be expressed
more clearly in terms of the rotation operation.
Clearly, the in-order sequence (A, x, B, y, C) is preserved by the rotation
operation. Therefore, if we start with a BST and restructure only using
rotations, we will still have a BST, i.e. rotations do not break the
BST property.

LEFT ROTATE (T, x)

1. y ← right [x]
2. right [x] ← left [y]
3. p [left[y]] ← x
4. p[y] ← p[x]
5. If p[x] = nil [T]
then root [T] ← y
else if x = left [p[x]]
then left [p[x]] ← y
else right [p[x]] ← y
6. left [y] ← x.
7. p [x] ← y.
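The same rotation as a Python sketch on a simple node class (an addition to these notes; the Node and Tree classes and the None-for-nil convention are our own, whereas the pseudocode above uses the sentinel nil[T]):

class Node:
    def __init__(self, key, color="RED"):
        self.key, self.color = key, color
        self.left = self.right = self.parent = None

class Tree:
    def __init__(self, root=None):
        self.root = root

def left_rotate(tree, x):
    # Rotate left around x; x.right must not be None.
    y = x.right
    x.right = y.left              # turn y's left subtree into x's right
    if y.left is not None:
        y.left.parent = x
    y.parent = x.parent           # link x's former parent to y
    if x.parent is None:
        tree.root = y             # x was the root
    elif x is x.parent.left:
        x.parent.left = y
    else:
        x.parent.right = y
    y.left = x                    # put x on y's left
    x.parent = y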

Example: Draw the complete binary tree of height 3 on the keys {1, 2,
3... 15}. Add the NIL leaves and color the nodes in three different ways
such that the black heights of the resulting trees are: 2, 3 and 4.

Solution:
Tree with black-height-2

Tree with black-height-3


Tree with black-height-4

2. Insertion:
o Insert the new node the way it is done in Binary Search Trees.
o Color the node red

o If an inconsistency arises for the red-black tree, fix the tree according
to the type of discrepancy.

A discrepancy can arise from a parent and a child both having red
color. The type of discrepancy is determined by the location of the node
with respect to its grandparent, and by the color of the sibling of the parent.
RB-INSERT (T, z)
1. y ← nil [T]
2. x ← root [T]
3. while x ≠ NIL [T]
4. do y ← x
5. if key [z] < key [x]
6. then x ← left [x]
7. else x ← right [x]
8. p [z] ← y
9. if y = nil [T]
10. then root [T] ← z
11. else if key [z] < key [y]
12. then left [y] ← z
13. else right [y] ← z
14. left [z] ← nil [T]
15. right [z] ← nil [T]
16. color [z] ← RED
17. RB-INSERT-FIXUP (T, z)

After inserting the new node, coloring it black may violate the
black-height condition, while coloring it red may violate the coloring
conditions (the root is black and a red node has no red children). We
know black-height violations are hard to fix, so we color the node red.
After this, if there is any color violation, then we correct it by the
RB-INSERT-FIXUP procedure.
RB-INSERT-FIXUP (T, z)
1. while color [p[z]] = RED
2. do if p [z] = left [p[p[z]]]
3. then y ← right [p[p[z]]]
4. If color [y] = RED
5. then color [p[z]] ← BLACK //Case 1
6. color [y] ← BLACK //Case 1
7. color [p[p[z]]] ← RED //Case 1
8. z ← p[p[z]] //Case 1
9. else if z= right [p[z]]
10. then z ← p [z] //Case 2
11. LEFT-ROTATE (T, z) //Case 2
12. color [p[z]] ← BLACK //Case 3
13. color [p [p[z]]] ← RED //Case 3
14. RIGHT-ROTATE (T,p [p[z]]) //Case 3
15. else (same as then clause)
With "right" and "left" exchanged
16. color [root[T]] ← BLACK

Example: Show the red-black trees that result after successively inserting
the keys 41,38,31,12,19,8 into an initially empty red-black tree.
Solution:

Insert 41, 38, 31, 12, 19, 8 in turn (the trees after the intermediate
insertions are shown in the omitted figures). Thus the final tree is
obtained.

3. Deletion:

First, search for an element to be deleted

o If the element to be deleted is in a node with only left child, swap this
node with one containing the largest element in the left subtree.
(This node has no right child).
o If the element to be deleted is in a node with only right child, swap
this node with the one containing the smallest element in the right
subtree (This node has no left child).
o If the element to be deleted is in a node with both a left child and a
right child, then swap in any of the above two ways. While swapping,
swap only the keys but not the colors.
o The item to be deleted is now having only a left child or only a right
child. Replace this node with its sole child. This may violate red
constraints or black constraint. Violation of red constraints can be
easily fixed.
o If the deleted node is black, the black constraint is violated. The
elimination of a black node y causes any path that contained y to
have one fewer black node.
o Two cases arise:
o The replacing node is red, in which case we merely color it black
to make up for the loss of one black node.
o The replacing node is black.

The procedure RB-DELETE is a minor modification of the TREE-DELETE
procedure. After splicing out a node, it calls an auxiliary procedure
RB-DELETE-FIXUP that changes colors and performs rotations to restore
the red-black properties.

RB-DELETE (T, z)
1. if left [z] = nil [T] or right [z] = nil [T]
2. then y ← z
3. else y ← TREE-SUCCESSOR (z)
4. if left [y] ≠ nil [T]
5. then x ← left [y]
6. else x ← right [y]
7. p [x] ← p [y]
8. if p[y] = nil [T]
9. then root [T] ← x
10. else if y = left [p[y]]
11. then left [p[y]] ← x
12. else right [p[y]] ← x
13. if y≠ z
14. then key [z] ← key [y]
15. copy y's satellite data into z
16. if color [y] = BLACK
17. then RB-delete-FIXUP (T, x)
18. return y

RB-DELETE-FIXUP (T, x)
1. while x ≠ root [T] and color [x] = BLACK
2. do if x = left [p[x]]
3. then w ← right [p[x]]
4. if color [w] = RED
5. then color [w] ← BLACK //Case 1
6. color [p[x]] ← RED //Case 1
7. LEFT-ROTATE (T, p [x]) //Case 1
8. w ← right [p[x]] //Case 1
9. If color [left [w]] = BLACK and color [right[w]] = BLACK
10. then color [w] ← RED //Case 2
11. x ← p[x] //Case 2
12. else if color [right [w]] = BLACK
13. then color [left[w]] ← BLACK //Case 3
14. color [w] ← RED //Case 3
15. RIGHT-ROTATE (T, w) //Case 3
16. w ← right [p[x]] //Case 3
17. color [w] ← color [p[x]] //Case 4
18. color p[x] ← BLACK //Case 4
19. color [right [w]] ← BLACK //Case 4
20. LEFT-ROTATE (T, p [x]) //Case 4
21. x ← root [T] //Case 4
22. else (same as then clause with "right" and "left"
exchanged)
23. color [x] ← BLACK

Example: In a previous example, we found that the red-black tree that


results from successively inserting the keys 41,38,31,12,19,8 into an
initially empty tree. Now show the red-black trees that result from the
successful deletion of the keys in the order 8, 12, 19,31,38,41.

Solution: The trees after each deletion (for example, after deleting 38
and after deleting 41) are shown in the omitted figures. After the last
deletion the tree is empty (no tree remains).
