Daa Book
LECTURE NOTES
ON
DESIGN AND ANALYSIS OF ALGORITHMS
III Semester
Prepared by
V. SARANYA, M.TECH
K. RAGHAVI, M.E
CONTENTS
2.6 Network Flow
2.7 Maximum Bipartite Matching
5.3 Non-Deterministic Algorithms
5.4 Approximation Algorithms
5.4.1 Travelling Salesman Problem
5.5 Bin Packing Problem
5.6 Randomized Algorithms
5.7 Primality Testing
5.8 Randomized Quick Sort
23CS312 DESIGN AND ANALYSIS OF ALGORITHMS L T P C
3 0 2 4
COURSE OBJECTIVES:
To understand and apply the algorithm analysis techniques on searching and sorting algorithms
To critically analyze the efficiency of graph algorithms
To understand different algorithm design techniques
To solve programming problems using state space tree
To understand the concepts behind NP-completeness, approximation algorithms and randomized algorithms
UNIT I INTRODUCTION 9
Time and space complexity - Asymptotic Notations – Solving Recurrences:
substitution method - Lower bounds - hash function - searching: linear search, binary
search and Interpolation Search, String Matching: The naïve string - matching
algorithm - Rabin-Karp algorithm - Sorting: Insertion sort, heap sort
Definition:
An algorithm is a finite sequence of unambiguous instructions for solving a given problem in a finite amount of time.
Characteristics of an algorithm:
Non-ambiguity / Preciseness: Each step in an algorithm should be non-ambiguous, i.e., each instruction should be clear and precise.
Finiteness: The algorithm should be finite. The algorithm must terminate in a specified time.
Uniqueness: The algorithm must produce the specified required output.
Input: The algorithm receives input.
Output: The algorithm produces output.
Generality: The algorithm must work for all valid sets of inputs.
Notion of an Algorithm
An algorithm is a sequence of unambiguous instructions for solving a problem, for obtaining a required output for any legitimate input in a finite amount of time.
• The range of inputs for which an algorithm works has to be specified carefully.
• The same algorithm can be represented in several different ways.
• There may exist several algorithms for solving the same problem.
1.1 Understanding of Algorithm
1. The range of inputs for which an algorithm works has to be specified carefully.
2. The same algorithm can be represented in different ways.
3. Several algorithms for solving the same problem may exist.
4. Algorithms for the same problem can be based on very different ideas and can solve the problem with dramatically different speeds.
The example here is to find the gcd of two integers in three different ways. The gcd of two nonnegative, not-both-zero integers m and n, denoted gcd(m, n), is defined as the largest integer that divides both m and n evenly, i.e., with a remainder of zero.
Euclid of Alexandria outlined an algorithm for solving this problem in one of the volumes of his Elements:
gcd(m, n) = gcd(n, m mod n)
applied repeatedly until the second number becomes 0; since gcd(m, 0) = m, the last value of m is also the gcd of the initial m and n.
This algorithm comes to a stop when the second number becomes 0. The second number of the pair gets smaller with each iteration, and it cannot become negative: indeed, the new value of n on the next iteration is m mod n, which is always smaller than n. Hence, the value of the second number in the pair eventually becomes 0, and the algorithm stops.
The second method for the same problem is obtained from the definition itself: the gcd of m and n is the largest integer that divides both numbers evenly. Obviously, that number cannot be greater than the smaller of the two numbers, which we denote by t = min{m, n}. So start by checking whether t divides both m and n: if it does, t is the answer; if it doesn't, t is decreased by 1 and we try again. (For the example below, do this repeatedly until you reach 12 and then stop.)
This procedure is more complex, and ambiguity arises since the prime factorization procedure is not defined. So, to make it an efficient algorithm, incorporate an algorithm to find the prime factors.
Performance Analysis:
Performance analysis, or analysis of algorithms, refers to the task of determining the efficiency of an algorithm, i.e., how much computing time and storage an algorithm requires to run (or execute). This analysis helps in judging the value of one algorithm over another.
1. Instruction Space: Instruction space is the space needed to store the compiled version of the program instructions. The amount of instruction space that is needed depends on factors such as:
i) the compiler used to compile the program into machine code;
ii) the compiler options in effect at the time of compilation;
iii) the target computer, i.e., the computer on which the algorithm runs.
Note that one compiler may produce less code than another when the same program is compiled by both.
2. Data Space: Data space is the space needed to store all constant and variable values.
Data space has two components.
1. Space needed by constants, for example 0, 1, 2.134.
2. Space needed by dynamically allocated objects such as arrays, structures, classes.
Environmental Stack Space:
1. Environmental stack space is used during the execution of functions.
2. Each time a function is invoked, the following data are saved on the environmental stack:
The return address.
The values of local variables.
The values of formal parameters in the function being invoked.
3. Environmental stack space is mainly used in recursive functions.
The following equation determines the number of additions, subtractions, multiplications, divisions, compares, loads, stores and so on that would be made by the code for a program P:
tP(n) = Ca*ADD(n) + Cs*SUB(n) + Cm*MUL(n) + Cd*DIV(n) + ...
where n denotes the instance characteristics; Ca, Cs, Cm, Cd and so on denote the time needed for an addition, subtraction, multiplication, division and so on; and ADD, SUB, MUL, DIV and so on are functions whose values are the numbers of additions, subtractions, multiplications, divisions and so on performed. In practice, this method is an impossible way to find out the time complexity.
In the above algorithm, there are no instance characteristics, and the space needed by X, Y, Z is independent of the instance characteristics. Therefore we can write S(XYZ) = 3 + 0 = 3 (one word of space each for X, Y and Z), so the space complexity is O(1).
1.2.2 Time Complexity :
Complexity of Algorithms:
• Best Case: Inputs are provided in such a way that the minimum time is required
to process them.
• Average Case: The amount of time the algorithm takes on an average set of inputs.
• Worst Case: The amount of time the algorithm takes on the worst possible set of
inputs.
Index: 1  2  3  4  5  6  7  8  9
Value: 3  4  5  6  7  9 10 12 15
• Worst Case: Suppose we want to search for the element 15, checking whether it is present in the array or not.
First, A(1) is compared with 15 (i.e., 3 ≠ 15); no match occurs. This process continues until either the element is found or the list is exhausted. The element is found at the 9th comparison, so the number of comparisons is 9 = n.
The time complexity is O(n).
There are different types of time and space complexities:
• O(1) or constant time: the algorithm takes the same amount of time to run regardless of the size of the input.
• O(n) or linear time: the algorithm's running time increases linearly with the size of the input.
• O(n) or linear space: the algorithm's memory usage increases linearly with the size of the input.
1.3 Asymptotic Notations
Input: Here our input is an integer array of size "n", and we have one integer "k" that we need to search for in that array.
Output: If the element "k" is found in the array, then we return 1; otherwise we return 0.
Average case: We calculate the running time for all possible inputs, sum all the calculated values, and divide the sum by the total number of inputs. We must know (or predict) the distribution of cases.
Worst case: This is the upper bound on the running time of an algorithm. We must know the case that causes the maximum number of operations to be executed. In our example, the worst case is when the given array is [1, 2, 3, 4, 5] and we try to find whether the element "6" is present in the array or not. Here, the if-condition of our loop will be executed 5 times, and then the algorithm will give "0" as output.
So, we learned about the best, average, and worst cases of an algorithm. Now, let's get back to asymptotic notation, where we saw that we use three asymptotic notations to represent the complexity of an algorithm, i.e., Θ notation (theta), Ω notation, and Big O notation.
1. O Notation (Big Oh): Definition: A function f(n) is said to be in O(g(n)), denoted f(n) ∈ O(g(n)), if f(n) is bounded above by some positive constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some nonnegative integer n0 such that f(n) ≤ cg(n) for all n ≥ n0 and c > 0. O notation analyses the worst case for the given function. The definition is illustrated in the following figure. Here n is the size of the input and f(n) is a function of n. When the input size increases, the time also increases.
Example: Take f(n) = 3n+2 and g(n) = n. We have to prove f(n) ∈ O(g(n)). By the definition,
f(n) ≤ c * g(n)
3n+2 ≤ c * n, where c > 0, n0 ≥ 1.
We can substitute any value for c; the best option is c = 4:
3n+2 ≤ 4 * n, which holds for n ≥ 2. Reason for taking n ≥ 2:
If n=1, 3(1)+2 ≤ 4(1) => 5 ≤ 4 => becomes false.
If n=2, 3(2)+2 ≤ 4(2) => 8 ≤ 8 => becomes true.
If n=3, 3(3)+2 ≤ 4(3) => 11 ≤ 12 => becomes true.
If n=4, 3(4)+2 ≤ 4(4) => 14 ≤ 16 => becomes true. And so on.
Therefore 3n+2 ∈ O(n), with c = 4 and n0 = 2.
2. Ω Notation (Big Omega): Definition: A function f(n) is said to be in Ω(g(n)), denoted f(n) ∈ Ω(g(n)), if f(n) is bounded below by some positive constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some nonnegative integer n0 such that f(n) ≥ cg(n) for all n ≥ n0 and c > 0. Ω notation analyses the best case for the given function. The definition is illustrated in the following figure.
Example: take f(n) = 3n+2 and g(n) = n with c = 1, n0 = 1.
If n=1, 3(1)+2 ≥ 1(1) => 5 ≥ 1 => becomes true.
If n=2, 3(2)+2 ≥ 1(2) => 8 ≥ 2 => becomes true.
If n=3, 3(3)+2 ≥ 1(3) => 11 ≥ 3 => becomes true. And so on.
Therefore 3n+2 ∈ Ω(n).
THEOREM: If t1(n) ∈ O(g1(n)) and t2(n) ∈ O(g2(n)), then t1(n) + t2(n) ∈ O(max{g1(n), g2(n)}).
PROOF:
The proof extends to orders of growth the following simple fact about four arbitrary real numbers a1, b1, a2, b2: if a1 ≤ b1 and a2 ≤ b2, then a1 + a2 ≤ 2 max{b1, b2}.
Since t1(n) ∈ O(g1(n)), there exist some positive constant c1 and some nonnegative integer n1 such that t1(n) ≤ c1g1(n) for all n ≥ n1.
Similarly, since t2(n) ∈ O(g2(n)), t2(n) ≤ c2g2(n) for all n ≥ n2.
Let us denote c3 = max{c1, c2} and consider n ≥ max{n1, n2} so that we can use both inequalities. Then
t1(n) + t2(n) ≤ c1g1(n) + c2g2(n) ≤ c3g1(n) + c3g2(n) ≤ 2c3 max{g1(n), g2(n)}.
Hence t1(n) + t2(n) ∈ O(max{g1(n), g2(n)}), with the constants 2c3 and max{n1, n2}.
1.4 Recurrence Relation
A recurrence relation is a mathematical equation that describes the relation between the input size and the running time of a recursive algorithm. It expresses the running time of a problem in terms of the running time of smaller instances of the same problem.
A recurrence relation typically has the form T(n) = aT(n/b) + f(n), where a is the number of subproblems, n/b is the size of each subproblem, and f(n) is the cost of dividing the problem and combining the results.
For example, let's consider the problem of computing the nth Fibonacci number. A simple recursive algorithm for solving this problem is as follows:
Fibonacci(n)
    if n <= 1
        return n
    else
        return Fibonacci(n-1) + Fibonacci(n-2)
Recurrence relations can be solved using the Substitution Method, the Recursion Tree Method, or the Master Theorem. In this section we are going to focus on the Substitution Method.
1. Forward Substitution:
It is called forward substitution because here we substitute the recurrence of any term into the next terms. It uses the following steps to find the time using recurrences:
Pick the recurrence relation and the given initial condition.
Put the value from the previous recurrence into the next recurrence.
Observe and guess the pattern and the time.
Prove that the guessed result is correct using mathematical induction.
Now we will use these steps to solve a problem. The problem is:
T(n) = T(n-1) + n, n > 1
T(n) = 1, n = 1
Now we will go step by step-
1. Pick the recurrence and the given initial condition:
T(n) = T(n-1) + n, n > 1
T(n) = 1, n = 1
2. Put the value from the previous recurrence into the next recurrence:
T(1) = 1
T(2) = T(1) + 2 = 1 + 2 = 3
T(3) = T(2) + 3 = 1 + 2 + 3 = 6
T(4) = T(3) + 4 = 1 + 2 + 3 + 4 = 10
3. Observe and guess the pattern and the time:
The guessed pattern is T(n) = 1 + 2 + 3 + ... + n = (n(n+1))/2, so the time complexity will be O(n²).
4. Prove that the guessed result is correct using mathematical Induction:
Prove T(1) is true:
T(1) = 1(1+1)/2 = 2/2 = 1, and from the definition of the recurrence we know T(1) = 1. Hence T(1) is proved true.
Assume T(N-1) = ((N-1)(N-1+1))/2 = (N(N-1))/2 to be true.
Then prove T(N) is true: T(N) = T(N-1) + N, from the recurrence definition.
Now, T(N-1) = N(N-1)/2, so T(N) = T(N-1) + N = N(N-1)/2 + N = (N(N-1) + 2N)/2 = N(N+1)/2.
From our guess also T(N) = N(N+1)/2. Hence T(N) is true. Therefore our guess was correct, and the time will be O(N²).
2. Backward Substitution:
It is called backward substitution because here we substitute the recurrence of any term into the previous terms. It uses the following steps to find the time using recurrences:
Take the main recurrence and try to write recurrences of the previous terms.
Take the just-previous recurrence and substitute it into the main recurrence.
Again take one more previous recurrence and substitute into the main recurrence.
Do this process until you reach the initial condition.
After this, substitute the value from the initial condition and get the solution.
Now we will use these steps to solve a problem. The problem is:
T(n) = T(n-1) + n, n > 1
T(n) = 1, n = 1
Now we will go step by step-
1. Take the main recurrence and try to write recurrences of the previous terms:
T(n) = T(n-1) + n
T(n-1) = T(n-2) + n - 1
T(n-2) = T(n-3) + n - 2
2. Take the just-previous recurrence and substitute it into the main recurrence:
Put T(n-1) into T(n). So, T(n) = T(n-2) + (n-1) + n
3. Again take one more previous recurrence and substitute into the main recurrence:
Put T(n-2) into T(n). So, T(n) = T(n-3) + (n-2) + (n-1) + n
4. Do this process until you reach the initial condition:
Similarly we can find T(n-3), T(n-4), and so on and insert them into T(n). Eventually we get:
T(n) = T(1) + 2 + 3 + 4 + ... + (n-1) + n
5. After this, substitute the value from the initial condition and get the solution:
Put T(1) = 1. Then T(n) = 1 + 2 + 3 + 4 + ... + (n-1) + n = n(n+1)/2. So the time will be O(n²).
Limitations of the Substitution Method:
The substitution method is a useful technique to solve recurrence relations, but it also has some limitations:
It is not guaranteed that we will find the solution, as the substitution method is based on guesses.
It doesn't provide guidance on how to make an accurate guess, often relying on intuition or trial and error.
It may only yield a specific or approximate solution rather than the most general or precise one.
The substitution method isn't universally applicable to all recurrence relations, especially those with complex or variable forms that do not get simplified using substitution.
Lower and Upper Bound
The Lower and Upper Bound Theory provides a way to find the lowest complexity
algorithm to solve a problem. Before understanding the theory, first, let’s have a
brief look at what Lower and Upper bounds are.
Lower Bound –
Let L(n) be the running time of an algorithm A (say); then g(n) is the lower bound of A if there exist two constants C and N such that L(n) >= C*g(n) for n > N. The lower bound of an algorithm is shown by the asymptotic notation called Big Omega (or just Omega).
Upper Bound –
Let U(n) be the running time of an algorithm A (say); then g(n) is the upper bound of A if there exist two constants C and N such that U(n) <= C*g(n) for n > N. The upper bound of an algorithm is shown by the asymptotic notation called Big Oh (O) (or just Oh).
1. Lower Bound Theory:
According to the lower bound theory, for a lower bound L(n) of an algorithm, it is not possible to have any other algorithm (for a common problem) whose time complexity is less than L(n) for random input. Also, every algorithm must take at least L(n) time in the worst case. Note that L(n) here is the minimum over all possible algorithms of their maximum complexity.
The lower bound is very important for any algorithm. Once we have calculated it, we can compare it with the actual complexity of the algorithm, and if their orders are the same then we can declare our algorithm optimal. So in this section we will be discussing techniques for finding the lower bound of an algorithm. Note that our main motive is to get an optimal algorithm, which is one having its upper bound the same as its lower bound (U(n) = L(n)). Merge sort is a common example of an optimal algorithm.
Trivial Lower Bound –
It is the easiest method to find the lower bound. The lower bounds which can be easily observed based on the number of inputs taken and the number of outputs produced are called trivial lower bounds.
Hash Function
A hash function is a mathematical function that takes an input (or "message") and produces a fixed-size string of characters, often in the form of a hash value or hash code. This output is typically a sequence of numbers and letters, and the same input always produces the same output. Hash functions are widely used in computer science, cryptography, and data processing. Common applications include:
1. Data Integrity: Ensures data has not been altered during transmission or storage by comparing hash values.
2. Cryptography: Used in digital signatures, password hashing, and secure
communication protocols.
3. Hash Tables: Enables fast data retrieval by mapping keys to unique
indices.
4. Blockchain: Verifies transactions and maintains chain integrity by linking
blocks with hash values.
5. Checksums: Verifies file integrity during downloads or transfers.
1.6 Searching
Searching is the process of fetching a specific element in a collection of elements. The
collection can be an array or a linked list. If you find the element in the list, the
process is considered successful, and it returns the location of that element.
The following prominent search strategies are extensively used to find a specific item in a list. However, the algorithm chosen is determined by the list's organization.
1. Linear Search
2. Binary Search
3. Interpolation search
Step 1: Read the search (target) element from the user.
Step 2: Compare the search element with the first element in the array.
Step 3: If both are matched, display "Target element is found" and terminate the linear search function.
Step 4: If both are not matched, compare the search element with the next element in the array.
Step 5: Repeat steps 3 and 4 until the search (target) element is compared with the last element of the array.
Step 6: If the last element in the list does not match, the linear search function will be terminated, and the message "Element is not found" will be displayed.
Linear Search (Array Arr, Value a) // Arr is the name of the array, and a is the searched element
Step 1: Set i to 0 // i is the index of the array, which starts from 0
Step 2: if i ≥ n then go to step 7 // n is the number of elements in the array
Step 3: if Arr[i] = a then go to step 6
Step 4: Set i to i + 1
Step 5: Go to step 2
Step 6: Print "element a found at index i" and go to step 8
Step 7: Print "element not found"
Step 8: Exit
Step 1: The searched element 39 is compared to the first element of the array, which is 13. The match is not found, so you move on to the next element and try the comparison again.
Step 2: Now, the search element 39 is compared to the second element of the array, 9.
Step 3: Now, the search element 39 is compared with the third element, which is 21. Again, the elements do not match, so you move on to the next element.
Step 4: Next, the search element 39 is compared with the fourth element, which is 15.
Step 5: Next, the search element 39 is compared with the fifth element, 39. The match is found, and the search stops.
The Complexity of Linear Search Algorithm
Three different complexities are faced while performing the linear search algorithm; they are mentioned as follows.
Best Case
Worst Case
Average Case
Best Case Complexity
The element being searched could be found in the first position.
In this case, the search ends with a single successful comparison.
Thus, in the best-case scenario, the linear search algorithm performs O(1)
operations.
Worst Case Complexity
The element being searched may be at the last position in the array or not at all.
In the first case, the search succeeds in ‘n’ comparisons.
In the next case, the search fails after ‘n’ comparisons.
Thus, in the worst-case scenario, the linear search algorithm performs O(n)
operations.
Average Case Complexity
On average, the element to be searched is found around the middle of the array, requiring about n/2 comparisons, so the average case of the linear search algorithm is O(n).
Space Complexity of Linear Search Algorithm
The linear search algorithm takes up no extra space beyond the input array; its auxiliary space complexity is O(1) for an array of n elements.
Application of Linear Search Algorithm
The linear search algorithm has the following applications:
• Linear search can be applied to both single-dimensional and multi-
dimensional arrays.
• Linear search is easy to implement and effective when the array contains only a few elements.
• Linear search is also efficient when the search is performed to fetch a single item in an unordered list.
Code Implementation of Linear Search Algorithm
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int array[50], i, target, num;
    printf("Enter number of elements: ");
    scanf("%d", &num);
    printf("Enter the elements: ");
    for (i = 0; i < num; i++)
        scanf("%d", &array[i]);
    printf("Enter element to search: ");
    scanf("%d", &target);
    for (i = 0; i < num && array[i] != target; i++)
        ;   /* linear scan */
    if (i < num)
        printf("Element found at index %d\n", i);
    else
        printf("Element not found\n");
    return 0;
}
1.6.2 Binary Search
Binary search is a search technique that works efficiently on sorted lists. Hence, to search for an element in some list using the binary search technique, we must ensure that the list is sorted.
Binary search follows the divide and conquer approach, in which the list is divided into two halves and the item is compared with the middle element of the list. If a match is found, then the location of the middle element is returned. Otherwise, we search in one of the two halves, depending upon the result produced by the comparison.
NOTE: Binary search can be implemented only on sorted array elements. If the list elements are not arranged in sorted order, we first have to sort them.
Algorithm
Binary_Search(a, lower_bound, upper_bound, val) // 'a' is the given array, 'lower_bound' is the index of the first array element, 'upper_bound' is the index of the last array element, 'val' is the value to search
Step 1: set beg = lower_bound, end = upper_bound, pos = -1
Step 2: repeat steps 3 and 4 while beg <= end
Step 3: set mid = (beg + end)/2
Step 4: if a[mid] = val
            set pos = mid
            print pos
            go to step 6
        else if a[mid] > val
            set end = mid - 1
        else
            set beg = mid + 1
        [end of if]
[end of loop]
Step 5: if pos = -1
            print "value is not present in the array"
        [end of if]
Step 6: exit

Procedure binary_search
A ← sorted array
n ← size of array
x ← value to be searched

Set lowerBound = 1
Set upperBound = n

while x not found
    if upperBound < lowerBound
        EXIT: x does not exist.
    set midPoint = lowerBound + (upperBound - lowerBound) / 2
    if A[midPoint] < x
        set lowerBound = midPoint + 1
    if A[midPoint] > x
        set upperBound = midPoint - 1
    if A[midPoint] = x
        EXIT: x found at location midPoint
end while
end procedure
beg = 0
end = 8
mid = (0 + 8)/2 = 4. So, 4 is the mid index of the array.
Now the element to search for is found, so the algorithm will return the index of the matched element.
Binary Search complexity
Now, let's see the time complexity of binary search in the best case, average case, and worst case. We will also see the space complexity of binary search.
Time Complexity
Case            Time Complexity
Best Case       O(1)
Average Case    O(log n)
Worst Case      O(log n)
Best Case Complexity - In binary search, the best case occurs when the element to search for is found in the first comparison, i.e., when the middle element itself is the element to be searched. The best-case time complexity of binary search is O(1).
Average Case Complexity - The average-case time complexity of binary search is O(log n).
Worst Case Complexity - In binary search, the worst case occurs when we have to keep reducing the search space until it has only one element. The worst-case time complexity of binary search is O(log n).
Space Complexity
The space complexity of binary search is O(1).
Implementation of Binary Search
Program: Write a program to implement Binary search in C language.
#include <stdio.h>
int binarySearch(int a[], int beg, int end, int val)
{
int mid;
if(end >= beg)
{ mid = (beg + end)/2;
/* if the item to be searched is present at middle */
if(a[mid] == val)
{
return mid+1;
}
/* if the item to be searched is smaller than middle, then it can only be in left
subarray*/
else if(a[mid] < val)
{
return binarySearch(a, mid+1, end, val);
}
/* if the item to be searched is greater than middle, then it can only be in right
subarray*/
else
{
return binarySearch(a, beg, mid-1, val);
}
}
return -1; /* search space is empty: value not present */
}
int main() {
int a[] = {11, 14, 25, 30, 40, 41, 52, 57, 70}; // given array
int val = 40; // value to be searched
int n = sizeof(a) / sizeof(a[0]); // size of array
int res = binarySearch(a, 0, n-1, val); // store result
printf("The elements of the array are - ");
for (int i = 0; i < n; i++) printf("%d ", a[i]);
printf("\nElement to be searched is - %d", val);
if (res == -1)
printf("\nElement is not present in the array");
else
printf("\nElement is present at %d position of array", res);
return 0;
}
Output
The elements of the array are - 11 14 25 30 40 41 52 57 70
Element to be searched is - 40
Element is present at 5 position of array
1.6.3 Interpolation Search
Interpolation search finds a particular item by computing the probe position. Initially, the probe position is the position of the middle-most item of the collection.
If a match occurs, then the index of the item is returned. To split the list into two parts, we use the following method −
mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])
where:
A = list
Lo = lowest index of the list
Hi = highest index of the list
A[n] = value stored at index n in the list
X = value being searched
If the middle item is smaller than the item, then the probe position is again calculated in the sub-array to the right of the middle item. Otherwise, the item is searched for in the sub-array to the left of the middle item. This process continues on the sub-array as well until the size of the sub-array reduces to zero.
The runtime complexity of the interpolation search algorithm is Ο(log(log n)), as compared to the Ο(log n) of binary search, in favorable situations.
Algorithm
As it is an improvement on the existing binary search algorithm, we list the steps to search for the 'target' data value index, using position probing −
Step 1 − Start searching data from the middle of the list.
Step 2 − If it is a match, return the index of the item, and exit.
Step 3 − If it is not a match, probe the position.
Step 4 − Divide the list using the probing formula and find the new middle.
Step 5 − If data is greater than middle, search in the higher sub-list.
Step 6 − If data is smaller than middle, search in the lower sub-list.
Step 7 − Repeat until a match is found.
Pseudocode
A → Array list
N → Size of A
X → Target Value

Procedure Interpolation_Search()
   Set Lo → 0
   Set Mid → -1
   Set Hi → N-1
   While X does not match
      if Lo equals Hi OR A[Lo] equals A[Hi]
         EXIT: Failure, Target not found
      end if
      Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])
      if A[Mid] = X
         EXIT: Success, Target found at Mid
      else if A[Mid] < X
         Set Lo to Mid+1
      else if A[Mid] > X
         Set Hi to Mid-1
      end if
   End While
End Procedure
Implementation of interpolation search in C
#include <stdio.h>
#define MAX 10

// array of items on which interpolation search will be conducted
int list[MAX] = { 10, 14, 19, 26, 27, 31, 33, 35, 42, 44 };

int find(int data) {
    int lo = 0;
    int hi = MAX - 1;
    int mid = -1;
    int comparisons = 1;
    int index = -1;
    while (lo <= hi) {
        printf("\nComparison %d\n", comparisons);
        printf("lo : %d, list[%d] = %d\n", lo, lo, list[lo]);
        printf("hi : %d, list[%d] = %d\n", hi, hi, list[hi]);
        comparisons++;
        // probe the mid point
        mid = lo + (((double)(hi - lo) / (list[hi] - list[lo])) * (data - list[lo]));
        printf("mid = %d\n", mid);
        if (list[mid] == data) {        // data found
            index = mid;
            break;
        } else if (list[mid] < data) {
            lo = mid + 1;               // if data is larger, data is in upper half
        } else {
            hi = mid - 1;               // if data is smaller, data is in lower half
        }
    }
    return index;
}

int main() {
    int index = find(33);
    if (index != -1)
        printf("\nElement found at location %d\n", index);
    else
        printf("\nElement not found\n");
    return 0;
}
Worst case - O(n)
The worst case occurs when the given data set is exponentially distributed.
Average case - O(log(log n))
If the data set is sorted and uniformly distributed, then it takes O(log(log n)) time, as on average log(log n) comparisons are made.
Space Complexity
O(1), as no extra space is required.
1.7 String Matching
1.7.1 Naïve Pattern Searching
Naïve pattern searching is the simplest method among the pattern searching algorithms. It compares each character of the main string with the pattern. This algorithm is helpful for smaller texts. It does not need any pre-processing phase. We can find the substring in one scan of the string. It also does not occupy extra space to perform the operation.
The time complexity of the naïve pattern search method is O(m*n), where m is the size of the pattern and n is the size of the main string.
Algorithm
naive_algorithm(pattern, text)
Input − the text and the pattern
Output − locations where the pattern is present in the text

Start
   pat_len := pattern size
   str_len := string size
   for i := 0 to (str_len - pat_len), do
      for j := 0 to pat_len, do
         if text[i+j] ≠ pattern[j], then break
      if j == pat_len, then display the position i, as the pattern is found there
End
Implementation in C
#include <stdio.h>
#include <string.h>

int main() {
    char txt[] = "tutorialsPointisthebestplatformforprogrammers";
    char pat[] = "a";
    int M = strlen(pat);
    int N = strlen(txt);
    for (int i = 0; i <= N - M; i++) {
        int j;
        for (j = 0; j < M; j++)
            if (txt[i + j] != pat[j])
                break;
        if (j == M)
            printf("Pattern matches at %d\n", i);
    }
    return 0;
}
Output
Pattern matches at 6
Pattern matches at 25
Pattern matches at 39
1.7.2 Rabin-Karp Pattern Matching
Example: For string matching, working modulo q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 31415926535…?
T = 31415926535…, P = 26. Here q = 11, and P mod q = 26 mod 11 = 4. Now find the windows of T whose value mod q also equals 4; those not equal to 26 are spurious hits.
Implementation in C
#include <stdio.h>
#include <string.h>

int main() {
    char txt[80], pat[80];
    int q;
    printf("Enter the container string ");
    scanf("%s", txt);
    printf("Enter the pattern to be searched ");
    scanf("%s", pat);
    int d = 256;
    printf("Enter a prime number ");
    scanf("%d", &q);
    int M = strlen(pat);
    int N = strlen(txt);
    int i, j;
    int p = 0;   /* hash value for the pattern */
    int t = 0;   /* hash value for the current text window */
    int h = 1;
    for (i = 0; i < M - 1; i++)
        h = (h * d) % q;   /* h = d^(M-1) mod q */
    for (i = 0; i < M; i++) {
        p = (d * p + pat[i]) % q;
        t = (d * t + txt[i]) % q;
    }
    for (i = 0; i <= N - M; i++) {
        if (p == t) {
            for (j = 0; j < M; j++)
                if (txt[i + j] != pat[j])
                    break;
            if (j == M)
                printf("Pattern found at index %d ", i);
        }
        if (i < N - M) {
            t = (d * (t - txt[i] * h) + txt[i + M]) % q;
            if (t < 0)
                t = t + q;
        }
    }
    return 0;
}
Output
Enter the container string tutorials point is the best programming website
Enter the pattern to be searched p
Enter a prime number 3
Pattern found at index 8
Pattern found at index 21
In this problem, we are given two strings, a text and a pattern. Our task is to create a program for the KMP algorithm for pattern search; it will find all the occurrences of the pattern in the text string.
Here, we have to find all the occurrences of the pattern in the text.
Let's take an example to understand the problem:
Input: text = "xyztrwqxyzfg", pattern = "xyz"
Output: Found at index 0, Found at index 7
Here, we will discuss the solution to the problem using the KMP (Knuth-Morris-Pratt) pattern searching algorithm. It uses a preprocessing array built from the pattern, which is used during matching in the text, and it helps in the case where matching characters are followed by a character of the string that does not match the pattern.
We will preprocess the pattern and create an array that, for each position, stores the length of the longest proper prefix of the pattern that is also a suffix; this array helps in handling mismatches.
Program for KMP Algorithm for Pattern Searching
// C++ program for the KMP algorithm for pattern searching
#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;
void prefixSuffixArray(char* pat, int M, int* pps) {
    int length = 0;
    pps[0] = 0;
    int i = 1;
    while (i < M) {
        if (pat[i] == pat[length]) {
            length++;
            pps[i] = length;
            i++;
        } else {
            if (length != 0)
                length = pps[length - 1];
            else {
                pps[i] = 0;
                i++;
            }
        }
    }
}
void KMPAlgorithm(char* text, char* pattern) {
    int M = strlen(pattern);
    int N = strlen(text);
    int pps[M];
    prefixSuffixArray(pattern, M, pps);
    int i = 0;   // index into text
    int j = 0;   // index into pattern
    while (i < N) {
        if (pattern[j] == text[i]) {
            i++;
            j++;
        }
        if (j == M) {   // full match found
            printf("Found pattern at index %d ", i - j);
            j = pps[j - 1];
        } else if (i < N && pattern[j] != text[i]) {
            if (j != 0)
                j = pps[j - 1];   // fall back in the pattern
            else
                i++;
        }
    }
}

int main() {
    char text[] = "xyztrwqxyzfg";
    char pattern[] = "xyz";
    printf("The pattern is found in the text at the following index : ");
    KMPAlgorithm(text, pattern);
    return 0;
}
Output
The pattern is found in the text at the following index –
Found pattern at index 0
Found pattern at index 7
1.8 Sorting :
Sorting is a fundamental operation in data structures and algorithms. It involves
arranging elements of a list or array in a specific order, usually ascending or
descending. Sorting is essential for optimizing search operations, organizing data,
and improving the efficiency of algorithms.
The same approach is applied in insertion sort: take the next element and move it through the sorted part of the array until its correct place is found. Although it is simple to use, it is not appropriate for large data sets, as the time complexity of insertion sort in the average case and worst case is O(n2), where n is the number of items. Insertion sort is less efficient than other sorting algorithms like heap sort, quick sort, merge sort, etc.
Algorithm
The simple steps of achieving the insertion sort are listed as follows -
Step 1 - If the element is the first element, assume that it is already sorted.
Step 2 - Pick the next element, and store it separately in a key.
Step 3 - Now, compare the key with all elements in the sorted array.
Step 4 - If the element in the sorted array is smaller than the current element, then move to the next element. Else, shift greater elements in the array towards the right.
Step 5 - Insert the value.
Step 6 - Repeat until the array is sorted.
Working of Insertion sort Algorithm
Now, let's see the working of the insertion sort Algorithm.
To understand the working of the insertion sort algorithm, let's take an unsorted array. It will be easier to understand the insertion sort via an example.
Let the elements of array are -
Here, 31 is greater than 12. That means both elements are already in ascending order. So, for now, 12 is stored in a sorted sub-array.
Here, 25 is smaller than 31. So, 31 is not at the correct position. Now, swap 31 with 25. Along with swapping, insertion sort will also check it with all elements in the sorted array. For now, the sorted array has only one element, i.e. 12. So, 25 is greater than 12. Hence, the sorted array remains sorted after swapping.
Now, two elements in the sorted array are 12 and 25. Move forward to the next elements that are 31 and 8.
Hence, they are already sorted. Now, the sorted array includes 8, 12, 25 and 31. Move to the next elements that are 32 and 17.
#include <stdio.h>

void printArr(int a[], int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
}

/* Function to implement the insertion sort */
void insert(int a[], int n)
{
    for (int i = 1; i < n; i++) {
        int temp = a[i];
        int j = i - 1;
        while (j >= 0 && temp <= a[j]) {
            /* Move the elements greater than temp to one position
               ahead of their current position */
            a[j + 1] = a[j];
            j = j - 1;
        }
        a[j + 1] = temp;
    }
}

int main()
{
    int a[] = { 12, 31, 25, 8, 32, 17 };
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArr(a, n);
    insert(a, n);
    printf("\nAfter sorting array elements are - \n");
    printArr(a, n);
    return 0;
}
Heap sort processes the elements by creating the min-heap or max-heap using the
elements of the given array. Min-heap or max-heap represents the ordering of array
in which the root element represents the minimum or maximum element of the
array.
Algorithm
1. HeapSort(arr)
2. BuildMaxHeap(arr)
3. for i = length(arr) downto 2
4.     swap arr[1] with arr[i]
5.     heap_size[arr] = heap_size[arr] - 1
6.     MaxHeapify(arr,1)
7. End
BuildMaxHeap(arr)
1. BuildMaxHeap(arr)
2. heap_size(arr) = length(arr)
3. for i = length(arr)/2 downto 1
4.     MaxHeapify(arr,i)
5. End
MaxHeapify(arr,i)
1. MaxHeapify(arr,i)
2. L = left(i)
3. R = right(i)
4. if L <= heap_size[arr] and arr[L] > arr[i]
5.     largest = L
6. else
7.     largest = i
8. if R <= heap_size[arr] and arr[R] > arr[largest]
9.     largest = R
10. if largest != i
11.     swap arr[i] with arr[largest]
12.     MaxHeapify(arr,largest)
13. End
Working of Heap sort Algorithm
In heap sort, basically, there are two phases involved in the sorting of elements using the heap sort algorithm. They are as follows -
o The first step includes the creation of a heap by adjusting the elements of the array.
o After the creation of the heap, now remove the root element of the heap repeatedly by shifting it to the end of the array, and then restore the heap structure with the remaining elements.
First, we have to construct a heap from the given array and convert it into max heap.
After converting the given heap into max heap, the array elements are -
Next, we have to delete the root element (89) from the max heap. To delete this node, we have to swap it with the last node, i.e. (11). After deleting the root element, we again have to heapify it to convert it into max heap.
After swapping the array element 89 with 11, and converting the heap into max-heap, the elements of array are -
In the next step, again, we have to delete the root element (81) from the max heap. To delete this node, we have to swap it with the last node, i.e. (54). After deleting the root element, we again have to heapify it to convert it into max heap.
After swapping the array element 81 with 54 and converting the heap into max-heap, the elements of array are -
In the next step, we have to delete the root element (76) from the max heap again. To delete this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it into max heap.
After swapping the array element 76 with 9 and converting the heap into max-heap, the elements of array are -
In the next step, again we have to delete the root element (54) from the max heap. To delete this node, we have to swap it with the last node, i.e. (14). After deleting the root element, we again have to heapify it to convert it into max heap.
After swapping the array element 54 with 14 and converting the heap into max-heap, the elements of array are -
In the next step, again we have to delete the root element (22) from the max heap. To delete this node, we have to swap it with the last node, i.e. (11). After deleting the root element, we again have to heapify it to convert it into max heap.
After swapping the array element 22 with 11 and converting the heap into max-heap, the elements of array are -
In the next step, again we have to delete the root element (14) from the max heap. To delete this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it into max heap.
After swapping the array element 14 with 9 and converting the heap into max-heap, the elements of array are -
In the next step, again we have to delete the root element (11) from the max heap. To delete this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it into max heap.
After swapping the array element 11 with 9, the elements of array are -
Now, heap has only one element left. After deleting it, heap will be empty.
Time complexity of Heap sort in the best case, average case, and worst case
1. Time Complexity
Case            Time Complexity
Best Case       O(n log n)
Average Case    O(n log n)
Worst Case      O(n log n)
o Best Case Complexity - It occurs when there is no sorting required, i.e. the array is already sorted. The best-case time complexity of heap sort is O(n log n).
o Average Case Complexity - It occurs when the array elements are in jumbled order that is not properly ascending and not properly descending. The average case time complexity of heap sort is O(n log n).
o Worst Case Complexity - It occurs when the array elements are required to be sorted in reverse order. That means suppose you have to sort the array elements in ascending order, but its elements are in descending order. The worst-case time complexity of heap sort is O(n log n).
The time complexity of heap sort is O(n log n) in all three cases (best case, average case, and worst case). The height of a complete binary tree having n elements is log n.
2. Space Complexity
}

/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
    /* Build max heap */
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    /* One by one extract an element from heap */
    for (int i = n - 1; i >= 0; i--) {
        /* Move current root element to end: swap a[0] with a[i] */
        int temp = a[0];
        a[0] = a[i];
        a[i] = temp;
        /* Heapify the reduced heap */
        heapify(a, i, 0);
    }
}

/* Function to print the array elements */
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; ++i)
        printf("%d ", arr[i]);
}
Output
UNIT II - GRAPH ALGORITHMS
Representations of graphs - Graph traversal: DFS - BFS - Minimum spanning tree:
Kruskal’s and Prim’s algorithm - Shortest path: Bellman - Ford algorithm -
Dijkstra’s algorithm - Maximum flow: Flow networks - Ford-Fulkerson method -
Maximum bipartite matching.
2.1 Graph Representation
Graph algorithms are methods for solving problems related to graph data structures.
Definition
A graph G(V, E) is a non-linear data structure that consists of a set of nodes (vertices) and a set of edges connecting pairs of nodes.
There are 2 types of graphs:
1. Directed
2. Undirected
Directed Graph
Undirected Graph
Representation of Graphs
Graph data structure is represented using the following representations.
1. Adjacency Matrix
2. Adjacency List
Adjacency Matrix
Adjacency List
2.2 Graph traversals
Applications of graphs
Social network graphs : To tweet or not to tweet. Graphs that represent who knows
whom, who communicates with whom, who influences whom, or other
relationships in social structures. An example is the twitter graph of who follows
whom.
Graphs in epidemiology: Vertices represent individuals and directed edges to
view the transfer of an infectious disease from one individual to another.
Analyzing such graphs has become an important component in understanding
and controlling the spread of diseases.
Protein-protein interactions graphs: Vertices represent proteins and edges
represent interactions between them that carry out some biological function in the
cell. These graphs can be used to, for example, study molecular pathway—chains
of molecular interactions in a cellular process.
Network packet traffic graphs: Vertices are IP (Internet protocol) addresses and
edges are the packets that flow between them. Such graphs are used for analyzing
network security, studying the spread of worms, and tracking criminal or non-
criminal activity.
Neural networks: Vertices represent neurons and edges are the synapses between them. Neural networks are used to understand how our brain works and how connections change when we learn. The human brain has about 10^11 neurons and close to 10^15 synapses.
Graph traversal
Graph traversal is a technique used for searching a vertex in a graph. The graph traversal is also used to decide the order in which vertices are visited in the search process. A graph traversal finds the edges to be used in the search process without creating loops. That means using graph traversal we visit all the vertices of the graph without getting into a looping path.
There are two graph traversal techniques and they are as follows...
2.2.1 BFS (Breadth First Search)
BFS traversal of a graph produces a spanning tree as the final result. A spanning tree is a graph without loops. We use a queue data structure with a maximum size equal to the total number of vertices in the graph to implement BFS traversal.
BFS Algorithm
For the BFS implementation, we will categorize each vertex of the graph into two
categories:
Visited
Not Visited
The reason for this division is to prevent the traversal of the same node again, thus avoiding cycles in the graph. Breadth First Traversal in Data Structure uses a queue to keep track of the vertices to visit. By processing vertices in the order they were added to the queue, BFS ensures that all the vertices at the current level are visited before moving on to the next level. A boolean visited array is used to mark the visited vertices. This leads to a breadth-first exploration of the graph or tree, hence the name "breadth-first traversal."
Here's a step-by-step explanation of how the BFS algorithm works:
Start by selecting a starting vertex or node.
Add the starting vertex to the end of the queue.
Mark the starting vertex as visited and add it to the visited array/list.
While the queue is not empty, do the following steps:
Remove the front vertex from the queue.
Visit the removed vertex and process it.
Enqueue all the adjacent vertices of the removed vertex that have not been visited
yet.
Mark each visited adjacent vertex as visited and add it to the visited array/list.
Repeat step 4 until the queue is empty.
The graph might have two different disconnected parts, so to make sure that we cover every vertex, we can also run the BFS algorithm on every node.
Example
Step | Traversal | Description
We then see an unvisited adjacent node from S. In this example, we have three such nodes, but alphabetically we choose A, mark it as visited and enqueue it.
From A we have D as an unvisited adjacent node. We mark it as visited and enqueue it.
BFS pseudocode
create a queue Q
mark v as visited and put v into Q
while Q is non-empty
    remove the head u of Q
    mark and enqueue all (unvisited) neighbours of u
DFS Algorithm
For the DFS implementation, we will categorize each vertex of the graph into
twocategories:
1. Visited
2. Not Visited
The reason for this division is to prevent the traversal of the same node again thus,
avoiding cycles in the graph. Depth First Traversal in Data Structure uses a stack
to keep track of the vertices to visit. A boolean visited array is used to mark the
visited vertices.
Here's a step-by-step explanation of how the DFS algorithm works:
Start by selecting a starting vertex, mark it as visited and push it onto the stack.
Pop a vertex from the stack and visit any of its unvisited adjacent vertices, marking each as visited and pushing it onto the stack.
Keep repeating steps 2 and 3 until the stack is empty. The graph might have two different disconnected parts, so to make sure that we cover every vertex, we can also run the DFS algorithm on every node.
Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a stack to remember to get the next vertex to start a search when a dead end occurs in any iteration.
Step | Traversal | Description
Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both S and D are adjacent to A but we are concerned for unvisited nodes only.
Visit D and mark it as visited and put onto the stack. Here, we have B and C nodes, which are adjacent to D and both are unvisited. However, we shall again choose in an alphabetical order.
We choose B, mark it as visited and put onto the stack. Here B does not have any unvisited adjacent node. So, we pop B from the stack.
Pseudocode of DFS
DFS(G, u)
    u.visited = true
    for each v ∈ G.Adj[u]
        if v.visited == false
            DFS(G, v)

init() {
    for each u ∈ G
        u.visited = false
    for each u ∈ G
        DFS(G, u)
}
Applications of Depth First Search Algorithm
Complexity of BFS and DFS:
Algorithm | Time Complexity | Space Complexity
BFS       | O(V + E)        | O(V)
DFS       | O(V + E)        | O(V)
PROPERTIES OF THE MINIMUM SPANNING TREE
A minimum spanning tree of a graph is unique if the weights of all the edges are distinct. Otherwise, there may be multiple minimum spanning trees. (Specific algorithms typically output one of the possible minimum spanning trees.)
The minimum spanning tree is also the tree with the minimum product of weights of edges. (It can be easily proved by replacing the weights of all edges with their logarithms.)
In a minimum spanning tree of a graph, the maximum weight of an edge is the minimum possible from all possible spanning trees of that graph. (This follows from the validity of Kruskal's algorithm.)
The maximum spanning tree (spanning tree with the sum of weights of edges being maximum) of a graph can be obtained similarly to the minimum spanning tree, by changing the signs of the weights of all the edges to their opposite and then applying any of the minimum spanning tree algorithms.
Example of Prim's algorithm
Choose a vertex
Choose the nearest edge not yet in the solution, if there are multiple choices, choose
one at random
and slow down the algorithm on very large graphs.
3. The choice of starting node can affect the MST output, which may not be desirable in some applications.
Other Implementations of Prim’s Algorithm:
Given below are some other implementations of Prim’s Algorithm
Prim’s Algorithm for Adjacency Matrix Representation – In this article we have discussed the method of implementing Prim’s Algorithm if the graph is represented by an adjacency matrix.
Prim’s Algorithm for Adjacency List Representation – In this article Prim’s Algorithm implementation is described for graphs represented by an adjacency list.
Prim’s Algorithm using Priority Queue: In this article, we have discussed a time-efficient approach to implement Prim’s algorithm.
Start with a weighted graph
Choose the edge with the least weight; if there are more than 1, choose any one.
Choose the next shortest edge that doesn't create a cycle and add it
Choose the next shortest edge that doesn't create a cycle and add it
Time Complexity:
O(E * logE) or O(E * logV). Sorting of edges takes O(E * logE) time.
After sorting, we iterate through all edges and apply the find-union algorithm. The find and union operations can take at most O(logV) time.
So overall complexity is O(E * logE + E * logV) time.
The value of E can be at most O(V^2), so O(logV) and O(logE) are the same. Therefore, the overall time complexity is O(E * logE) or O(E * logV).
Auxiliary Space: O(V + E), where V is the number of vertices and E is the number of edges in the graph.
Applications of Kruskal's algorithm
Kruskal's algorithm has several applications, including:
o Designing efficient network connections or cable layout planning.
o Constructing efficient road networks or transportation systems.
o Cluster analysis or data grouping.
o Approximate solutions to the travelling salesman problem.
2.4 Shortest Path Algorithm
The shortest path problem is about finding a path between vertices in a graph such that the total sum of the edge weights is minimum.
Bellman Ford Pseudocode
We need to maintain the path distance of every vertex. We can store that in an array of size v, where v is the number of vertices.
We also want to be able to get the shortest path, not only know the length of the shortest path. For this, we map each vertex to the vertex that last updated its path length.
Once the algorithm is over, we can backtrack from the destination vertex to the source vertex to find the path.
function bellmanFord(G, S)
    for each vertex V in G
        distance[V] <- infinite
        previous[V] <- NULL
    distance[S] <- 0
    for each vertex V in G
        for each edge (U, V) in G
            tempDistance <- distance[U] + edge_weight(U, V)
            if tempDistance < distance[V]
                distance[V] <- tempDistance
                previous[V] <- U
Bellman Ford's Complexity
Time Complexity:
Best Case      O(E)
Average Case   O(VE)
Worst Case     O(VE)
Space Complexity: O(V)
Dijkstra's algorithm allows us to find the shortest path between any two vertices of a graph. It differs from the minimum spanning tree because the shortest distance between two vertices might not include all the vertices of the graph.
Dijkstra used this property in the opposite direction, i.e. we overestimate the distance of each vertex from the starting vertex. Then we visit each node and its neighbors to find the shortest subpath to those neighbors.
The algorithm uses a greedy approach in the sense that we find the next best solution hoping that the end result is the best solution for the whole problem.
Start with a weighted graph
Choose a starting vertex and assign infinity path values to all other devices
If the path length of the adjacent vertex is less than the new path length, don't update it.
After each iteration, we pick the unvisited vertex with the least path length. So we choose 5 before 7.
Notice how the rightmost vertex has its path length updated twice
return distance[], previous[]
Dijkstra's Algorithm Complexity
Time Complexity: O(E Log V), where E is the number of edges and V is the number of vertices.
Space Complexity: O(V)
Time Complexity: The time complexity of the above algorithm is O(max_flow * E). We run a loop while there is an augmenting path. In the worst case, we may add 1 unit of flow in every iteration. Therefore the time complexity becomes O(max_flow * E).
Ford-Fulkerson Algorithm
Initially, the flow of value is 0. Find some augmenting path p and increase flow f on each edge of p by the residual capacity cf (p). When no augmenting path exists, flow f is a maximum flow.
FORD-FULKERSON METHOD (G, s, t)
1. Initialize flow f to 0
2. while there exists an augmenting path p
3. do augment flow f along p
4. Return f
FORD-FULKERSON (G, s, t)
1. for each edge (u, v) ∈ E [G]
2. do f [u, v] ← 0
3. f [u, v] ← 0
4. while there exists a path p from s to t in the residual network Gf.
5. do cf (p) ← min {cf (u, v) : (u, v) is on p}
6. for each edge (u, v) in p
7. do f [u, v] ← f [u, v] + cf (p)
8. f [v, u] ← - f [u, v]
Example: Each directed edge is labeled with capacity. Use the Ford-Fulkerson algorithm to find the maximum flow.
Solution: The left side of each part shows the residual network Gf with a shaded augmenting path p, and the right side of each part shows the net flow f.
2.5 MAXIMUM FLOW
It is defined as the maximum amount of flow that the network would allow to flow from source to sink. Multiple algorithms exist for solving the maximum flow problem. Two major algorithms to solve these kinds of problems are the Ford-Fulkerson algorithm and Dinic's algorithm.
In optimization theory, maximum flow problems involve finding a feasible flow through a flow network that obtains the maximum possible flow rate.
The maximum flow problem can be seen as a special case of more complex network flow problems, such as the circulation problem. The maximum value of an s-t flow (i.e., flow from source s to sink t) is equal to the minimum capacity of an s-t cut (i.e., cut severing s from t) in the network, as stated in the max-flow min-cut theorem.
The transport and delivery of goods can be modelled using flow networks.
Definition: A Flow Network is a directed graph G = (V, E) such that
1. For each edge (u, v) ∈ E, we associate a nonnegative weight capacity c (u, v) ≥ 0. If (u, v) ∉ E, we assume that c (u, v) = 0.
2. There are two distinguishing points, the source s and the sink t;
3. For every vertex v ∈ V, there is a path from s to t containing v.
Let G = (V, E) be a flow network. Let s be the source of the network, and let t be the sink. A flow in G is a real-valued function f : V x V → R such that the following properties hold:
o Capacity Constraint: For all u, v ∈ V, we need f (u, v) ≤ c (u, v).
o Skew Symmetry: For all u, v ∈ V, we need f (u, v) = - f (v, u).
o Flow Conservation: For all u ∈ V - {s, t}, we need Σv∈V f (u, v) = 0.
The quantity f (u, v), which can be positive or negative, is known as the net flow from vertex u to vertex v. In the maximum-flow problem, we are given a flow network G with source s and sink t, and we wish to find a flow of maximum value from s to t.
A bipartite graph is an undirected graph G = (V, E) in which V can be partitioned into two sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2, or u ∈ V2 and v ∈ V1. That is, all edges go between the two sets V1 and V2 and not within V1 or V2.
A bipartite matching is a set of edges in a graph chosen in such a way that no two edges in that set share an endpoint. A maximum matching is a matching with the maximum number of edges.
When the maximum matching is found, we cannot add another edge. If one edge is added to the maximum matched graph, it is no longer a matching. For a bipartite graph, there can be more than one maximum matching.
Algorithm
UNIT – III ADVANCED DESIGN AND ANALYSIS TECHNIQUES
Divide and Conquer methodology: Merge sort - Quick sort- Dynamic programming:
Elements of dynamic programming -Matrix-chain multiplication - Multi stage graphs. Greedy
Technique: Elements of the greedy strategy - Activity-selection problem - Huffman Trees
3.1 Divide and Conquer methodology
Definition:
It involves breaking a larger problem into smaller subproblems, solving them independently,
and then combining their solutions to solve the original problem. The basic idea is to
recursively divide the problem into smaller subproblems until they become simple enough
to be solved directly. Once the solutions to the subproblems are obtained, they are then
combined to produce the overall solution.
Divide and Conquer Algorithm can be divided into three steps: Divide, Conquer and Merge.
1. Divide:
• Break the problem into smaller, more manageable subproblems, recursively, until they become simple enough to solve directly.
2. Conquer:
• If a subproblem is small enough (often referred to as the “base case”), we solve it directly without further recursion.
3. Merge:
• Combine the sub-problems to get the final solution of the whole problem.
• Once the smaller subproblems are solved, we recursively combine their solutions to get the solution of the larger problem.
• The goal is to formulate a solution for the original problem by merging the results from the subproblems.
• Dividing the Problem: The first step is to break the problem into smaller, more manageable
subproblems. This division can be done recursively until the subproblems become simple
enough to solve directly.
• Conquering Each Subproblem: Once divided, the subproblems are solved individually. This
may involve applying the same divide and conquer approach recursively until the
subproblems become simple enough to solve directly, or it may involve applying a different
algorithm or technique.
• Combining Solutions: After solving the subproblems, their solutions are combined to obtain
the solution to the original problem. This combination step should be relatively efficient and
straightforward, as the solutions to the subproblems should be designed to fit together
seamlessly.
3.1.1 Finding the maximum and minimum element in the array:
We can use Divide and Conquer Algorithm to find the maximum element in the array by
dividing the array into two equal sized subarrays, finding the maximum of those two
individual halves by again dividing them into two smaller halves. This is done till we reach
subarrays of size 1. After reaching the elements, we return the maximum element and
combine the subarrays by returning the maximum in each subarray.
Similarly, we can use Divide and Conquer Algorithm to find the minimum element in the array by dividing the array into two equal sized subarrays, finding the minimum of those two individual halves by again dividing them into two smaller halves. This is done till we reach subarrays of size 1. After reaching the elements, we return the minimum element and combine the subarrays by returning the minimum in each subarray.
#include <stdio.h>

struct Pair {
    int min;
    int max;
};

struct Pair findMinMax(int arr[], int low, int high) {
    struct Pair minmax, left, right;
    int mid;
    if (low == high) {                /* only one element */
        minmax.min = arr[low];
        minmax.max = arr[low];
        return minmax;
    }
    if (high == low + 1) {            /* exactly two elements */
        if (arr[low] < arr[high]) { minmax.min = arr[low]; minmax.max = arr[high]; }
        else { minmax.min = arr[high]; minmax.max = arr[low]; }
        return minmax;
    }
    mid = (low + high) / 2;           /* divide into two halves */
    left = findMinMax(arr, low, mid);
    right = findMinMax(arr, mid + 1, high);
    /* combine: take the min and max of the two halves */
    minmax.min = (left.min < right.min) ? left.min : right.min;
    minmax.max = (left.max > right.max) ? left.max : right.max;
    return minmax;
}

int main() {
    int arr[] = {1000, 11, 445, 1, 330, 3000};
    int arr_size = 6;
    struct Pair minmax = findMinMax(arr, 0, arr_size - 1);
    printf("Minimum element is %d\n", minmax.min);
    printf("Maximum element is %d\n", minmax.max);
    return 0;
}
for i = 0 to n1 - 1
    L[i] = array[left + i]
for j = 0 to n2 - 1
    R[j] = array[mid + 1 + j]
3.1.3 Quick Sort
QuickSort is one of the best sorting algorithms; it follows the divide-and-conquer approach like Merge Sort but, unlike Merge Sort, it sorts in place. In this section, we will learn how to implement quicksort in the C language.
What is QuickSort Algorithm?
The basic idea behind QuickSort is to select a pivot element from the array and partition the
other elements into two sub-arrays according to whether they are less than or greater than the
pivot. The sub-arrays are then sorted recursively. This process of partitioning and sorting
continues until the entire array is sorted.
#include <stdio.h>

// Partition function: place the pivot (last element) at its
// correct position and return that index
int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
        }
    }
    int t = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = t;
    return i + 1;
}

// The QuickSort function implementation
void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);   // sort elements before the pivot
        quickSort(arr, pi + 1, high);  // sort elements after the pivot
    }
}

int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);
    quickSort(arr, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);
    return 0;
}
A divide-and-conquer algorithm does more work than necessary, repeatedly solving the common subsubproblems. A dynamic-programming algorithm solves each subsubproblem just once and then saves its answer in a table, thereby avoiding the work of recomputing the answer every time it solves each subproblem. Dynamic programming typically applies to optimization problems. Such problems can have many possible solutions. Each solution has a value, and you want to find a solution with the optimal (minimum or maximum) value. We call such a solution an optimal solution to the problem, as opposed to the optimal solution, since there may be several solutions that achieve the optimal value.
To develop a dynamic-programming algorithm, follow a sequence of four steps:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution, typically in a bottom-up fashion.
4. Construct an optimal solution from computed information.
Steps 1, 2 and 3 form the basis of a dynamic-programming solution to a problem. If you need only the value of an optimal solution, and not the solution itself, then you can omit step 4. When you do perform step 4, it often pays to maintain additional information during step 3 so that you can easily construct an optimal solution.
Matrix Chain Multiplication is a fundamental problem in the field of computer science and
algorithms that deals with the efficient multiplication of a sequence of matrices. The core
objective of this problem is to determine the optimal way to parenthesize the matrices in the
given sequence to minimize the total number of scalar multiplications required for the matrix
chain multiplication. This problem has broad applications in various domains, including
computer graphics, numerical simulations, optimization, and scientific computing, where
matrix operations are pervasive.
In many computational tasks, matrices are utilized to represent complex data and
transformations. For instance, in computer graphics, matrices are employed to represent
transformations like scaling, rotation, and translation. In scientific simulations, matrices are
used to model physical systems and equations. Performing matrix multiplications in these
applications can be computationally expensive, especially when dealing with long sequences
of matrices. Matrix Chain Multiplication seeks to optimize these computations by finding the
most efficient order of multiplication.
Let us proceed with working away from the diagonal. We compute the optimal solution for the product of 2 matrices.
Here P0 to P5 are positions and M1 to M5 are matrices of size (pi-1 x pi). On this basis, we have to work out all the combinations, but only the combination with the minimum cost is taken into consideration.
1. m (1, 2) = M1 x M2
   = (4 x 10) x (10 x 3)
   = 4 x 10 x 3 = 120
2. m (2, 3) = M2 x M3
   = (10 x 3) x (3 x 12)
   = 10 x 3 x 12 = 360
3. m (3, 4) = M3 x M4
   = (3 x 12) x (12 x 20)
   = 3 x 12 x 20 = 720
4. m (4, 5) = M4 x M5
   = (12 x 20) x (20 x 7)
   = 12 x 20 x 7 = 1680
o We initialize the diagonal elements with equal i, j values to '0'.
o After that, the second diagonal is worked out and we get all the values corresponding to it. Now the third diagonal will be solved in the same way.
M [1, 3] = M1 M2 M3
1. There are two cases by which we can solve this multiplication: (M1 x M2) x M3 and M1 x (M2 x M3).
2. After solving both cases we choose the case in which the minimum output is there.
M [1, 3] = 264
As comparing both outputs, 264 is the minimum, so we insert 264 in the table and the combination (M1 x M2) x M3 is chosen for the output.
M [2, 4] = M2 M3 M4
1. There are two cases by which we can solve this multiplication: (M2 x M3) x M4 and M2 x (M3 x M4).
2. After solving both cases we choose the case in which the minimum output is there.
M [2, 4] = 1320
As comparing both outputs, 1320 is the minimum, so we insert 1320 in the table and the combination M2 x (M3 x M4) is chosen for the output.
93
M [3, 5] = M3 M4 M5
1. There are two cases by which we can solve this multiplication: (M3 x M4) x M5 and M3 x (M4 x M5).
2. After solving both cases we choose the case in which the minimum output is there.
M [3, 5] = 1140
As comparing both outputs, 1140 is the minimum, so we insert 1140 in the table and the combination (M3 x M4) x M5 is chosen for the output.
M [1, 4] = M1 M2 M3 M4
There are three cases by which we can solve this multiplication:
1. (M1 x M2 x M3) x M4
2. M1 x (M2 x M3 x M4)
3. (M1 x M2) x (M3 x M4)
After solving these cases we choose the case in which the minimum output is there.
M [1, 4] = 1080
As comparing the outputs of the different cases, '1080' is the minimum output, so we insert 1080 in the table and the combination (M1 x M2) x (M3 x M4) is taken for the output.
M [2, 5] = M2 M3 M4 M5
There are three cases by which we can solve this multiplication:
1. (M2 x M3 x M4) x M5
2. M2 x (M3 x M4 x M5)
3. (M2 x M3) x (M4 x M5)
After solving these cases we choose the case with the minimum cost.
M [2, 5] = 1350
Comparing the costs of the different cases, 1350 is the minimum, so we insert 1350 in the table and choose the combination M2 x (M3 x M4 x M5) for the output.
M [1, 5] = M1 M2 M3 M4 M5
There are four cases by which we can solve this multiplication:
1. (M1 x M2 x M3 x M4) x M5
2. M1 x (M2 x M3 x M4 x M5)
3. (M1 x M2 x M3) x (M4 x M5)
4. (M1 x M2) x (M3 x M4 x M5)
After solving these cases we choose the case with the minimum cost.
M [1, 5] = 1344
Comparing the costs of the different cases, 1344 is the minimum, so we insert 1344 in the table and choose the combination (M1 x M2) x (M3 x M4 x M5) for the output.
Final Output is:
Step 3: Computing Optimal Costs: let us assume that matrix Ai has dimension p(i-1) x p(i) for i = 1, 2, 3, ..., n. The input is a sequence (p0, p1, ..., pn) where length[p] = n + 1. The procedure uses an auxiliary table m[1...n, 1...n] for storing the m[i, j] costs and an auxiliary table s[1...n, 1...n] that records which index k achieved the optimal cost in computing m[i, j].
The algorithm first sets m[i, i] ← 0 for i = 1, 2, 3, ..., n, the minimum costs for chains of length 1.
3.2 Multi-Stage Graph
A multistage graph is a directed, weighted graph in which the nodes can be divided into a set of stages such that all edges go from one stage to the next stage only (in other words, there is no edge between vertices of the same stage, and no edge from a vertex of the current stage to a previous stage). The vertices of a multistage graph are divided into n disjoint subsets S = {S1, S2, S3, ..., Sn}, where S1 is the source and Sn is the sink (destination). The cardinality of S1 and Sn is 1, i.e., |S1| = |Sn| = 1. Given a multistage graph, a source and a destination, we need to find the shortest path from source to destination. By convention, we consider the source to be at stage 1 and the destination at the last stage.
Multistage graph G = (V, E, W) is a weighted directed graph in which vertices are partitioned into k ≥ 2 disjoint subsets V = {V1, V2, …, Vk} such that if edge (u, v) is present in E then u ∈ Vi and v ∈ Vi+1, 1 ≤ i < k. The goal of the multistage graph problem is to find a minimum cost path from the source to the destination vertex.
The input to the algorithm is a k-stage graph with n vertices indexed in increasing order of stages.
The algorithm operates in the backward direction, i.e. it starts from the last vertex of the
graph and proceeds in a backward direction to find minimum cost path.
▪ The minimum cost of vertex j ∈ Vi via vertex r ∈ Vi+1 is defined as, Cost[j] = min{ c[j, r] + cost[r] }
where c[j, r] is the weight of edge <j, r> and cost[r] is the minimum cost of moving from vertex r to the end (destination) vertex.
Algorithm MULTI_STAGE(G, k, n, p)
// Input:
//   k: number of stages in graph G = (V, E)
//   c[i, j]: cost of edge (i, j)
Example: Find minimum path cost between vertex s and t for following multistage graph
using dynamic programming.
According to the formula, we have to calculate the cost (i, j) using the following steps
Step 1: Cost (K-2, j)
In this step, three nodes (nodes 4, 5, 6) are selected as j. Hence, we have three options to choose the minimum cost at this step.
Step 2: Cost (K-3, j)
Two nodes are selected as j because at stage k - 3 = 2 there are two nodes, 2 and 3. So the values are i = 2 and j = 2, 3.
Cost(2, 3) = {c(3, 4) + Cost(4, 8) + Cost(8, 9), c(3, 5) + Cost(5, 8)+ Cost(8, 9), c(3, 6) +
Cost(6, 8) + Cost(8, 9)} = 10
Cost (1, 1) = {c(1, 2) + Cost(2, 6) + Cost(6, 8) + Cost(8, 9), c(1, 3) + Cost(3, 5) + Cost(5,
8) + Cost(8, 9))} = 12
The greedy method is quite powerful and works well for a wide range of problems. Applications of the greedy method include minimum-spanning-tree algorithms, Dijkstra's algorithm for shortest paths from a single source, and a greedy set-covering heuristic. Minimum-spanning-tree algorithms furnish a classic example of the greedy method.
3.3.2 Activity Selection Problem
You are given n activities with their start and finish times. Select the maximum
number of activities that can be performed by a single person, assuming that a person
can only work on a single activity at a time.
Examples:
Input: start[] = {10, 12, 20}, finish[] = {20, 25, 30}
Output: 0 2
Explanation: A person can perform at most two activities, {0, 2}.
Input: start[] = {1, 3, 0, 5, 8, 5}, finish[] = {2, 4, 6, 7, 9, 9};
Output: 0 1 3 4
Explanation: A person can perform at most four activities. The maximum set of
activities that can be executed
is {0, 1, 3, 4}. [These are indexes in start[] and finish[].] The following steps are used to solve the problem:
In the implementation, it is assumed that the activities are already sorted according to their finish time; otherwise the time complexity will rise to O(N*log(N)) and the auxiliary space will rise to O(N), as we have to create a 2-D array for storing the start and finish times together.
Algorithm:
The method which is used to construct an optimal prefix code is called Huffman coding.
This algorithm builds a tree in a bottom-up manner. We can denote this tree by T.
Let |C| be the number of leaves. Then |C| - 1 merge operations are required to build the tree. Q is a min-priority queue, which can be implemented with a binary heap.
Frequency of string
1. Calculate the frequency of each character in the string
2. Sort the characters in increasing order of the frequency. These are stored
in a priority queue
Each character occupies 8 bits. There are a total of 15 characters in the above string. Thus, a total
of 8 * 15 = 120 bits are required to send this
string.
Using the Huffman Coding technique, we can compress the string to a smaller size.
Huffman coding first creates a tree using the frequencies of the character and then generates code
for each character.
Once the data is encoded, it has to be decoded. Decoding is done using the same tree.
Huffman Coding prevents any ambiguity in the decoding process using the concept of prefix codes, i.e., a code associated with a character is never a prefix of the code of any other character. The tree created above helps in maintaining this property.
Huffman coding is done with the help of the following steps.
Assign 0 to the left edge and 1 to the right edge
For sending the above string over a network, we have to send the tree as well as the
above compressed-code. The total size is given by the table below.
Without encoding, the total size of the string was 120 bits. After encoding, the size is reduced to
32 + 15 + 28 = 75 bits.
To decode the code, we take the encoded bits and traverse through the tree to find the character. Suppose 101 is to be decoded; we traverse from the root as in the figure below.
Decoding
UNIT IV STATE SPACE SEARCH ALGORITHMS
4.1 Backtracking
Backtracking is a problem-solving algorithmic technique that involves finding a
solution incrementally by trying different options and undoing them if they lead
to a dead end. It is commonly used in situations where you need to explore multiple
possibilities to solve a problem, like searching for a path in a maze or solving
puzzles like Sudoku. When a dead end is reached, the algorithm backtracks to the
previous decision point and explores a different path until a solution is found or all
possibilities have been exhausted.
Basic Terminologies
• Candidate: A candidate is a potential choice or element that can be added to the current solution.
• Solution: The solution is a valid and complete configuration that satisfies all problem constraints.
• Partial Solution: A partial solution is an intermediate or incomplete configuration being constructed during the backtracking process.
• Decision Space: The decision space is the set of all possible candidates or choices at each decision point.
• Decision Point: A decision point is a specific step in the algorithm where a candidate is chosen and added to the partial solution.
• Feasible Solution: A feasible solution is a partial or complete solution that adheres to all constraints.
• Dead End: A dead end occurs when a partial solution cannot be extended without violating constraints.
• Backtrack: Backtracking involves undoing previous decisions and returning to a prior decision point.
• Search Space: The search space includes all possible combinations of candidates and choices.
• Optimal Solution: In optimization problems, the optimal solution is the best possible solution.
Types of Backtracking Problems
Problems associated with backtracking can be categorized into 3 categories:
• Decision Problems: Here, we search for a feasible solution.
• Optimization Problems: For this type, we search for the best solution.
• Enumeration Problems: We find the set of all possible feasible solutions to problems of this type.
4.1.1 N-Queen Problem
The expected output is in the form of a matrix that has 'Q's for the blocks where queens are placed and the empty spaces represented by '.'. For example, the following is the output matrix for the above 4-queen solution.
. Q . .
. . . Q
Q . . .
. . Q .
The idea is to place queens one by one in different columns, starting from the leftmost column. When we place a queen in a column, we check for clashes with already placed queens. In the current column, if we find a row for which there is no clash, we mark this row and column as part of the solution. If we do not find such a row due to clashes, then we backtrack and return false.
Follow the steps mentioned below to implement the idea:
Try all rows in the current column. Do the following for every row.
If all rows have been tried and no valid placement is found, return false to trigger backtracking.
Place (k, i) returns a Boolean value that is true if the kth queen can be placed in column i. It tests both whether i is distinct from all previous values x1, x2, ..., xk-1 and whether there is no other queen on the same diagonal.
1. Place (k, i)
2. {
3.   for j ← 1 to k - 1 do
4.     if (x [j] = i)
5.       or (Abs (x [j] - i) = Abs (j - k))
6.     then return false;
7.   return true;
8. }
Place (k, i) returns true if a queen can be placed in the kth row and ith column; otherwise it returns false. x [] is a global array whose first k - 1 values have been set. Abs (r) returns the absolute value of r.
1.  N-Queens (k, n)
2.  {
3.    for i ← 1 to n do
4.      if Place (k, i) then
5.      {
6.        x [k] ← i;
7.        if (k = n) then
8.          write (x [1...n]);
9.        else
10.         N-Queens (k + 1, n);
11.     }
12. }
Output of the Hamiltonian-cycle program given below: {0, 1, 2, 4, 3, 0}.
Hamiltonian Cycle using Backtracking Algorithm:
Create an empty path array and add vertex 0 to it. Add other vertices, starting from
the vertex 1. Before adding a vertex, check for whether it is adjacent to the
previously added vertex and not already added. If we find such a vertex, we add
the vertex as part of the solution. If we do not find a vertex then we return false.
Let’s find out the Hamiltonian cycle for the following graph:
• As a cycle is not found in the path {0, 3, 1, 4, 2}, we return from node 2, then node 4.
• Return from node 4, node 2, node 1.
• In the Hamiltonian path {0, 3, 4, 2, 1, 0} we get a cycle, as node 1 is the neighbour of node 0.
#include <stdio.h>

#define NODE 5

/* Adjacency matrix of the example graph (the original listing was cut off;
   this matrix is reconstructed so that the program finds the stated cycle
   0-1-2-4-3-0) */
int graph[NODE][NODE] = {
    {0, 1, 0, 1, 0},
    {1, 0, 1, 1, 1},
    {0, 1, 0, 0, 1},
    {1, 1, 0, 0, 1},
    {0, 1, 1, 1, 0}
};

int path[NODE];

// Function to display the Hamiltonian cycle
void displayCycle() {
    printf("Cycle Found: ");
    for (int i = 0; i < NODE; i++)
        printf("%d ", path[i]);
    // Print the first vertex again
    printf("%d\n", path[0]);
}

// Function to check if adding vertex v to the path is valid
int isValid(int v, int k) {
    // If there is no edge between path[k-1] and v
    if (graph[path[k - 1]][v] == 0)
        return 0;
    // Check if vertex v is already taken in the path
    for (int i = 0; i < k; i++)
        if (path[i] == v)
            return 0;
    return 1;
}

// Function to find the Hamiltonian cycle
int cycleFound(int k) {
    // When all vertices are in the path
    if (k == NODE) {
        // Check if there is an edge between the last and first vertex
        if (graph[path[k - 1]][path[0]] == 1)
            return 1;
        else
            return 0;
    }
    // Try adding each vertex (except the starting point) to the path
    for (int v = 1; v < NODE; v++) {
        if (isValid(v, k)) {
            path[k] = v;
            if (cycleFound(k + 1) == 1)
                return 1;
            // Backtrack: remove v from the path
            path[k] = -1;
        }
    }
    return 0;
}

// Function to find and display the Hamiltonian cycle
int hamiltonianCycle() {
    for (int i = 0; i < NODE; i++)
        path[i] = -1;
    // Set the first vertex as 0
    path[0] = 0;
    if (cycleFound(1) == 0) {
        printf("Solution does not exist\n");
        return 0;
    }
    displayCycle();
    return 1;
}

int main() {
    hamiltonianCycle();
    return 0;
}
Subset-Sum Problem using Backtracking
For example, for S = {1, 2, 5, 6, 8} and d = 9, there are two solutions: {1, 2, 6} and {1, 8}. Of course, some instances of this problem may have no solutions.
• It is convenient to sort the set's elements in increasing order. So we will assume that s1 ≤ s2 ≤ ... ≤ sn.
• The state-space tree can be constructed as a binary tree like that in the following figure for the instance S = {3, 5, 6, 7} and d = 15.
• The root of the tree represents the starting point, with no decisions about the given elements made as yet.
• Its left and right children represent, respectively, inclusion and exclusion of s1 in the set being sought.
• Similarly, going to the left from a node of the first level corresponds to inclusion of s2, while going to the right corresponds to its exclusion, and so on.
• Thus, a path from the root to a node on the ith level of the tree indicates which of the first i numbers have been included in the subsets represented by that node.
• We record the value of s', the sum of these numbers, in the node. If s' is equal to d, we have a solution to the problem.
• We can either report this result and stop or, if all the solutions need to be found, continue by backtracking to the node's parent.
Time Complexity: O(2^n). The above solution may try all subsets of the given set in the worst case. Therefore the time complexity of the above solution is exponential.
Auxiliary Space: O(n), where n is the depth of the recursion stack.
The minimum number of colors needed to color a graph is called its chromatic number. For example, the following graph can be colored with a minimum of 2 colors.
• The decision problem is stated as, "With given M colors and graph G, is such a coloring scheme possible or not?".
• The optimization problem is stated as, "Given M colors and graph G, find the minimum number of colors required for graph coloring."
Algorithm of Graph Coloring using Backtracking:
Assign colors one by one to different vertices, starting from vertex 0. Before assigning a color, check whether any adjacent vertex has the same color. If there is a color assignment that does not violate the conditions, mark the color assignment as part of the solution. If no assignment of color is possible, then backtrack and return false.
Follow the given steps to solve the problem:
• Create a recursive function that takes the graph, current index, number of vertices, and color array.
• If the current index is equal to the number of vertices, print the color configuration in the color array.
4.2 Branch and Bound
The Branch and Bound Algorithm is a method used in combinatorial optimization
problems to systematically search for the best solution. It works by dividing the
problem into smaller subproblems, or branches, and then eliminating certain
branches based on bounds on the optimal solution. This process continues until the
best solution is found or all branches have been explored. Branch and Bound is
commonly used in problems like the traveling salesman and job scheduling.
4.2.1 Solving 15 puzzle problem
Given a 4×4 board with 15 tiles (every tile has one number from 1 to 15) and one empty space, the objective is to place the numbers on the tiles in order using the empty space. We can slide the four adjacent (left, right, above and below) tiles into the empty space. For example,
1. Here X marks the spot to which the tiles can be shifted. The final (goal) configuration is always the same; the question is whether it can be reached, i.e., whether the puzzle is solvable.
In general, for a given grid of width N, we can check whether an N*N - 1 puzzle is solvable or not by the following simple rules:
If N is odd, then the puzzle instance is solvable if the number of inversions is even in the input state.
If N is even, the puzzle instance is solvable if
• the blank is on an even row counting from the bottom (second-last, fourth-last, etc.) and the number of inversions is odd, or
• the blank is on an odd row counting from the bottom (last, third-last, fifth-last, etc.) and the number of inversions is even.
The algorithm above can be illustrated with the following program code:

int a[16];
for (int i = 0; i < 16; ++i)
    cin >> a[i];
int inv = 0;
for (int i = 0; i < 16; ++i)
    if (a[i])
        for (int j = 0; j < i; ++j)
            if (a[j] > a[i])
                ++inv;
for (int i = 0; i < 16; ++i)
    if (a[i] == 0)
        inv += 1 + i / 4;
// the puzzle is solvable if and only if inv is even
Let there be N workers and N jobs. Any worker can be assigned to perform any job, incurring some cost that may vary depending on the worker-job assignment. It is required to perform all jobs by assigning exactly one worker to each job and exactly one job to each worker in such a way that the total cost of the assignment is minimized.
1. Brute Force
The brute force approach generates all N! possible job assignments (permutations) and computes the total cost for each assignment. The assignment with the minimum cost is chosen. This approach has a time complexity of O(n!), making it impractical for large N.
2. Hungarian Algorithm
The Hungarian algorithm is an efficient method to solve the assignment problem, with a time complexity of O(n^3). It transforms the cost matrix and finds the optimal assignment by reducing the matrix and covering zeros with a minimum number of lines.
3. DFS/BFS on State Space Tree
This approach involves constructing a state space tree where each path from the root
to a leaf node represents a possible assignment. Depth-First Search (DFS) or
Breadth-First Search (BFS) can be used to explore the tree, but these methods may
not always find the optimal solution efficiently.
4. Branch and Bound
The Branch and Bound method is a more intelligent approach that uses a cost function to guide the search for the optimal solution. It constructs a state space tree and explores nodes based on their estimated cost, using a priority queue (min-heap) to always expand the node with the least cost first. This method significantly reduces the search space and improves efficiency.
Let's take the example below and try to calculate the promising cost when Job 2 is assigned to worker A.
Since Job 2 is assigned to worker A (marked in green), cost becomes 2 and Job 2 and
worker A becomes unavailable (marked in red).
Now we assign job 3 to worker B as it has the minimum cost among the unassigned jobs. The cost becomes 2 + 3 = 5, and job 3 and worker B also become unavailable.
Finally, job 1 gets assigned to worker C as it has the minimum cost among the unassigned jobs, and job 4 gets assigned to worker D as it is the only job left. The total cost becomes 2 + 3 + 5 + 4 = 14.
/* findMinCost uses Least() and Add() to maintain the list of live nodes
Least() finds a live node with least cost, deletes it from the list and returns it
The Branch and Bound method is an efficient algorithm design paradigm used to solve combinatorial problems. It systematically explores branches of a decision tree, pruning branches that cannot yield better solutions than the current best.
Steps to Implement Branch and Bound for 0/1 Knapsack
• Sort Items: Begin by sorting items in decreasing order of their value-to-weight ratio.
• Initialize: Create a dummy node representing the root of the decision tree and initialize the maximum profit to zero.
• Use a Queue: Use a queue to explore nodes. Start by enqueuing the dummy node.
• Explore Nodes: While the queue is not empty, dequeue a node and explore its children:
o Include the item: calculate the profit and weight if the current item is included.
o Exclude the item: calculate the profit and weight if the current item is excluded.
o Update maximum profit: if the profit of the current node is greater than the maximum profit, update the maximum profit.
o Calculate bound: compute the upper bound for the current node using the greedy (fractional) approach. If the bound is greater than the maximum profit, enqueue the node for further exploration.
The backtracking based solution works better than brute force by ignoring infeasible solutions. We can do better (than backtracking) if we know a bound on the best possible solution of the subtree rooted at every node. If the best in a subtree is worse than the current best, we can simply ignore this node and its subtrees. So we compute a bound (best possible solution) for every node and compare the bound with the current best solution before exploring the node.
4.2.4 Traveling Salesman Problem using Branch and Bound
Given a set of cities and distance between every pair of cities, the problem is to
find the shortest possible tour that visits every city exactly once and returns to the
starting point.
For example, consider the graph shown in the figure on the right side. A TSP tour in the graph is 0-1-3-2-0. The cost of the tour is 10 + 25 + 30 + 15, which is 80.
In the Branch and Bound method, for the current node in the tree, we compute a bound on the best possible solution that we can get if we go down this node. If the bound on the best possible solution is itself worse than the current best (best computed so far), then we ignore the subtree rooted at the node. Note that the cost through a node includes two costs:
1. Cost of reaching the node from the root (when we reach a node, we have this cost computed).
2. Cost of reaching an answer from the current node to a leaf (we compute a bound on this cost to decide whether to ignore the subtree rooted at this node or not).
In branch and bound, the challenging part is figuring out a way to compute a bound on the best possible solution. Below is an idea used to compute bounds for the travelling salesman problem. The cost of any tour can be bounded as below:
Cost of any tour ≥ (1/2) × Σ over all vertices v (sum of the costs of the two least cost edges adjacent to v)
since every tour uses exactly two edges at each vertex. For example, consider the above shown graph. Below are the two minimum cost edges adjacent to every node.
129
Node | Two least cost edges | Total cost
  0  | (0, 1), (0, 2)       | 25
  1  | (0, 1), (1, 3)       | 35
  2  | (0, 2), (2, 3)       | 45
  3  | (0, 3), (1, 3)       | 45
UNIT V NP-COMPLETE AND APPROXIMATION ALGORITHM
On the other hand, intractable problems are computational problems for which no
known algorithm can solve them efficiently in the worst-case scenario. This means
that the time required to solve an intractable problem grows exponentially or even
faster with the input size.
2. Matrix multiplication: Given two matrices A and B, the task is to find their product C = AB. The best-known algorithms for matrix multiplication run in about O(n^2.37) time, which is considered tractable for practical applications.
3. Shortest path in a graph: Given a graph G and two nodes s and t, the task is to find the shortest path between s and t. Algorithms like Dijkstra's algorithm and the A* algorithm can solve this problem in O(m + n log n) time complexity, where m is the number of edges and n is the number of nodes in the graph.
4. Linear programming: Given a system of linear constraints and a linear
objective function, the task is to find the values of the variables that optimize the
objective function subject to the constraints. Algorithms like the simplex method can
solve this problem in polynomial time.
1. Traveling salesman problem (TSP): Given a set of cities and the distances between them, the task is to find the shortest possible route that visits each city exactly once and returns to the starting city. The best-known algorithms for solving the TSP have an exponential worst-case time complexity, which makes it intractable for large instances of the problem.
2. Knapsack problem: Given a set of items with weights and values, and a
knapsack that can carry a maximum weight, the task is to find the most valuable
subset of items that can be carried by the knapsack. The knapsack problem is also
NP-hard and is intractable for large instances of the problem.
3. Boolean satisfiability problem (SAT): Given a boolean formula in conjunctive normal form (CNF), the task is to determine if there exists an assignment of truth values to the variables that makes the formula true. The SAT problem is one of the most well-known NP-complete problems, which means that any NP problem can be reduced to SAT in polynomial time.
4. Subset sum problem: Given a set of integers and a target sum, the task is
to find a subset of the integers that sums up to the target sum. Like the knapsack
problem, the subset sum problem is also intractable for large instances of the
problem.
5. Graph isomorphism problem: Given two graphs G1 and G2, the task is to determine if there exists a bijection between their vertices that preserves adjacency, i.e., whether the two graphs are structurally identical.
1. Linear search: Given a list of n items, the task is to find a specific item in the list. The time complexity of linear search is O(n), which is a polynomial function of the input size.
2. Bubble sort: Given a list of n items, the task is to sort them in ascending or
descending order. The time complexity of bubble sort is O(n^2), which is also a
polynomial function of the input size.
3. Shortest path in a graph: Given a graph G and two nodes s and t, the task is to find the shortest path between s and t. Algorithms like Dijkstra's algorithm and the A* algorithm can solve this problem in O(m + n log n) time complexity, which is a polynomial function of the input size.
4. Maximum flow in a network: Given a network with a source node and a sink node, and capacities on the edges, the task is to find the maximum flow from the source to the sink. The Ford-Fulkerson algorithm can solve this problem in O(mf) time, where m is the number of edges in the network and f is the maximum flow; variants such as Edmonds-Karp run in time polynomial in the input size.
NP problems are harder in this respect: no polynomial-time algorithm is known for solving them exactly, and the run-times of known exact algorithms are super-polynomial, something like O(2^n) or O(n!) or much larger.
NP-Complete Problems
NP-Complete problems are problems that live in both the NP and NP-Hard classes.
This means that NP-Complete problems can be verified in polynomial time and that
any NP problem can be reduced to this problem in polynomial time.
5.1.1 Polynomial-Time Algorithm
(Table: complexity classes and their descriptions.)
6. Sorting Algorithms:
o Merge Sort
o Bubble Sort
7. Graph Algorithms:
o Dijkstra’s Algorithm for shortest paths
o Kruskal’s Algorithm for Minimum Spanning Tree
8. Searching Algorithms:
o Binary Search
o Linear Search
9. Mathematical Computations:
o Matrix Multiplication
o Euclid's Algorithm for GCD
10. Dynamic Programming:
o Longest Common Subsequence
o Knapsack Problem
BubbleSort(array):
    for i = 0 to length(array) - 1:
        for j = 0 to length(array) - i - 2:
            if array[j] > array[j + 1]:
                swap(array[j], array[j + 1])

Characteristics
Venn diagrams are used for visual representation of relationships, as they provide a clear, visual method for showing how different sets intersect, overlap, or remain distinct. This helps in understanding complex relationships between sets in mathematics.
In a Venn diagram, sets are represented as circles and the circles are shown inside a rectangle which represents the universal set. The universal set contains all the circles, since it has all the elements present involving all the sets.
NP-Problem:
NP is the set of problems whose solutions are hard to find but easy to verify; they are solved by a non-deterministic machine in polynomial time.
NP-Hard Problem:
A problem X is NP-Hard if there is an NP-Complete problem Y such that Y is reducible to X in polynomial time. NP-Hard problems are at least as hard as NP-Complete problems. An NP-Hard problem need not be in the NP class.
Example:
1. Hamiltonian cycle.
2. Optimization problems.
3. Longest path.
NP-Complete Problem:
A problem X is NP-Complete if X is in NP and every NP problem Y is reducible to X in polynomial time. NP-Complete problems are as hard as any problem in NP. A problem is NP-Complete if it is part of both the NP and NP-Hard classes. A non-deterministic Turing machine can solve an NP-Complete problem in polynomial time.
In other words, a problem is NP-Complete when it is both in NP and NP-Hard; this means NP-Complete problems can be verified in polynomial time.
Example:
1. Boolean satisfiability (SAT).
2. Hamiltonian cycle (decision version).
(Table: comparison of NP-Hard and NP-Complete problems.)
While deterministic algorithms always produce the same output for a given input, non-deterministic algorithms can produce different outputs due to the inherent randomness or multiple execution paths.
Key Principles
• j = choice(a, n);
  if (A[j] == x) then
  {
      write(j);
      success();
  }
  write(0);
  failure();
NP-Hardness
probably not NP-hard (unless P=NP).
However, several techniques can be applied to tackle these problems:
Approximation: Finding a solution close to the optimal one.
Randomization: Using randomness to achieve faster average running times.
Restriction: Limiting the input structure to make the problem easier.
Parameterization: Fixing certain parameters to simplify the problem.
Heuristic: Employing algorithms that work well in practice but lack guaranteed performance.
Importance of NP-Completeness
Suppose you have a problem A which you do not know how to solve. However,
you know an algorithm to solve problem B. If you can "transform" an instance α of
problem A into an instance β of problem B, you can use the known algorithm for B
to solve the "transformed" instance β, and obtain the solution for α from the
solution β, by "reversing the transformation". We then say that A reduces to B.
Why Are Reductions Important?
Polynomial-time Reductions
TRAVELING-SALESPERSON-PROBLEM
Given an undirected complete graph with non-negative weights on edges and b ∈ R+, the TRAVELING-SALESPERSON PROBLEM (TSP, previously TRAVELING-SALESMAN) asks:
Does there exist a tour of cost at most b?
YES-instance for b = 108 but NO-instance for b = 107 as OPT = 108
C-SAT
The CIRCUIT-SATISFIABILITY problem (C-SAT) is the decision problem of determining whether a given Boolean circuit - essentially a Directed Acyclic Graph (using AND ∧, OR ∨, NOT ¬ gates) - has an assignment of its n binary inputs that makes the output True (1). In other words, it asks whether the n inputs to a given Boolean circuit can be consistently set to True (1) or False (0) such that the circuit outputs True (1). If that is the case, the circuit is called satisfiable. Otherwise, the circuit is called unsatisfiable.
CNF-SAT
Example: Φ = (x̄1 ∨ x2 ∨ x3) ∧ (x1 ∨ x̄2 ∨ x3) ∧ (x̄1 ∨ x2 ∨ x4) is a YES-instance.
(Sample certificate: x1 = x2 = x4 = True, x3 = False.)
Overview :
An approximation algorithm is a way of dealing with NP-completeness for an
optimization problem. This technique does not guarantee the best solution. The goal
of the approximation algorithm is to come as close as possible to the optimal
solution in polynomial time. Such algorithms are called approximation algorithms
or heuristic algorithms.
Scenario-1 :
2. We say that an algorithm for a problem has an approximation ratio of P(n) if, for any input size n, the cost C of the solution produced by the algorithm is within a factor of P(n) of the cost C* of an optimal solution, as follows:
max(C/C*, C*/C) ≤ P(n)
Scenario-2 :
If an algorithm reaches an approximation ratio of P(n), then we call it a P(n)-
approximation algorithm.
• For a maximization problem, 0 < C ≤ C*, and the ratio C*/C gives the factor by which the cost of an optimal solution is larger than the cost of the approximate solution.
• For a minimization problem, 0 < C* ≤ C, and the ratio C/C* gives the factor by which the cost of an approximate solution is larger than the cost of an optimal solution.
It is established that solving the travelling salesperson problem for the perfect optimal solution is not possible in polynomial time (unless P = NP).
The triangle inequality is satisfied only if the cost function c, for all the vertices u, v and w of a triangle, satisfies the following equation:
c(u, w) ≤ c(u, v) + c(v, w)
Minimum Spanning Tree − The minimum spanning tree is a tree that contains all the vertices of the main graph, connected with the minimum total edge weight. We apply Prim's algorithm for the minimum spanning tree in this case.
Pre-order Traversal − The pre-order traversal is done on tree data structures where a pointer walks through all the nodes of the tree in a [root – left child – right child] order.
Algorithm
Step 1 − Choose any vertex of the given graph randomly as the starting and ending
point.
Step 2 − Construct a minimum spanning tree of the graph with the chosen vertex as
the root using Prim's algorithm.
Step 3 − Perform a pre-order traversal of the minimum spanning tree and append
the starting vertex at the end.
Step 4 − The pre-order walk obtained is the approximate tour of the travelling
salesperson.
Pseudocode APPROX_TSP(G, c)
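Since the body of APPROX_TSP is not reproduced here, the following Python sketch (function and variable names are my own) illustrates the same idea for points in the plane, where Euclidean distance satisfies the triangle inequality: build the MST with Prim's algorithm, walk it in pre-order, and return to the start.

```python
import math

def approx_tsp(points):
    """2-approximation for metric TSP (illustrative sketch):
    build an MST with Prim's algorithm, then visit vertices
    in pre-order and return to the starting vertex."""
    n = len(points)

    def c(u, v):  # Euclidean cost satisfies the triangle inequality
        return math.dist(points[u], points[v])

    # Prim's algorithm: grow the MST from vertex 0.
    in_tree = [False] * n
    parent = [0] * n
    best = [math.inf] * n
    best[0] = 0.0
    children = {u: [] for u in range(n)}
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if u != 0:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and c(u, v) < best[v]:
                best[v], parent[v] = c(u, v), u

    # Pre-order traversal of the MST gives the visiting order.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    tour.append(0)  # return to the starting vertex
    cost = sum(c(tour[i], tour[i + 1]) for i in range(len(tour) - 1))
    return tour, cost

tour, cost = approx_tsp([(0, 0), (0, 1), (1, 1), (1, 0)])
print(tour, cost)  # visits all four corners and returns to the start
```

On this unit square the walk happens to find the optimal tour of cost 4; in general the triangle inequality only guarantees a tour at most twice the optimum.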
5.5 Bin Packing problem
The Bin Packing problem involves assigning n items of different weights to bins,
each of capacity c, such that the total number of bins used is minimized. It may be
assumed that all items have weights smaller than the bin capacity.
The following four algorithms depend on the order of their inputs: they pack the
item given first and then move on to the next item.
Visual Representation
Let us consider the same example as used above and bins of size 1.
Assuming the sizes of the items to be {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
the minimum number of bins required would be Ceil((Total Weight) / (Bin
Capacity)) = Ceil(3.7/1) = 4 bins.
The Next Fit solution (NF(I)) for this instance I would be:
Considering the 0.5 sized item first, we place it in the first bin.
Moving on to the 0.7 sized item, we cannot place it in the first bin. Hence we place it
in a new bin.
Moving on to the 0.5 sized item, we cannot place it in the current bin. Hence we place it
in a new bin.
Moving on to the 0.2 sized item, we can place it in the current (third) bin.
Similarly, placing all the other items following the Next-Fit algorithm, we get:
Thus we need 6 bins as opposed to the 4 bins of the optimal solution. Thus we
can see that this algorithm is not very efficient.
The time complexity of the algorithm is clearly O(n). It is easy to prove that, for
any instance I of BPP, the solution value NF(I) provided by the algorithm
satisfies the bound
NF(I) < 2 z(I)
where z(I) denotes the optimal solution value. Furthermore, there exist
instances for which the ratio NF(I)/z(I) is arbitrarily close to 2, i.e. the worst-
case approximation ratio of NF is r(NF) = 2.
Pseudocode NEXTFIT(size[], n, c)
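A minimal runnable Python version of Next Fit (names are my own; the text's sizes are scaled by 10 so the arithmetic stays exact):

```python
def next_fit(sizes, capacity):
    """Next Fit: keep only the current bin open; if the next item
    does not fit, close it and open a new bin."""
    bins = 0
    remaining = 0  # space left in the current bin (no bin open yet)
    for s in sizes:
        if s > remaining:        # open a new bin for this item
            bins += 1
            remaining = capacity
        remaining -= s
    return bins

# The sizes from the text, multiplied by 10.
items = [5, 7, 5, 2, 4, 2, 5, 1, 6]
print(next_fit(items, 10))  # 6 bins, versus the optimal 4
```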
First Fit considers the bins in increasing order of their
indices and assigns each item to the lowest indexed initialized bin into which it
fits; only when the current item cannot fit into any initialized bin is a new bin
introduced.
Visual Representation
Let us consider the same example as used above and bins of size 1.
Assuming the sizes of the items to be {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
the minimum number of bins required would be Ceil((Total Weight) / (Bin
Capacity)) = Ceil(3.7/1) = 4 bins.
The First Fit solution (FF(I)) for this instance I would be:
Considering the 0.5 sized item first, we place it in the first bin.
Moving on to the 0.7 sized item, we cannot place it in the first bin. Hence we
place it in a new bin.
Moving on to the 0.5 sized item, we can place it in the first bin.
Moving on to the 0.2 sized item, we cannot place it in the first bin; we check with
the second bin and we can place it there.
Moving on to the 0.4 sized item, we cannot place it in any existing bin. Hence
we place it in a new bin.
If FF(I) is the First Fit solution for instance I and z(I) is the optimal solution
value, then:
FF(I) <= 1.7 * z(I)
It can be seen that First Fit never uses more than 1.7 * z(I) bins. So First Fit is
better than Next Fit in terms of the upper bound on the number of bins.
Pseudocode FIRSTFIT(size[], n, c)
{
size[] is the array containing the sizes of the items, n is the number of items and c
is the capacity of the bin
Initialize result (count of bins): res = 0;
Create an array to store the remaining space in bins; there can be at most n bins:
bin_rem[n];
for (i = 0; i < n; i++)
find the first bin j < res with bin_rem[j] >= size[i];
if such a bin exists: bin_rem[j] = bin_rem[j] - size[i];
otherwise open a new bin: bin_rem[res] = c - size[i]; res = res + 1;
return res;
}
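A runnable Python version of the pseudocode above (names are my own; sizes scaled by 10 so the arithmetic stays exact):

```python
def first_fit(sizes, capacity):
    """First Fit: place each item into the lowest-indexed bin
    that still has room; open a new bin only when none fits."""
    bin_rem = []                 # remaining space of each open bin
    for s in sizes:
        for i, rem in enumerate(bin_rem):
            if rem >= s:         # first bin with enough room
                bin_rem[i] -= s
                break
        else:                    # no existing bin fits: open a new one
            bin_rem.append(capacity - s)
    return len(bin_rem)

items = [5, 7, 5, 2, 4, 2, 5, 1, 6]  # the text's sizes, scaled by 10
print(first_fit(items, 10))  # 5 bins
```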
Best Fit
Simply put, the idea is to place the next item in the tightest spot. That is, put
it in the bin so that the smallest empty space is left.
Visual Representation
Let us consider the same example as used above and bins of size 1
Assuming the sizes of the items to be {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
the minimum number of bins required would be Ceil((Total Weight) / (Bin
Capacity)) = Ceil(3.7/1) = 4 bins.
The Best Fit solution (BF(I)) for this instance I would be:
Considering the 0.5 sized item first, we can place it in the first bin.
Moving on to the 0.7 sized item, we cannot place it in the first bin. Hence we
place it in a new bin.
Moving on to the 0.5 sized item, we can place it in the first bin tightly.
Moving on to the 0.2 sized item, we cannot place it in the first bin but we can
place it in the second bin tightly.
Moving on to the 0.4 sized item, we cannot place it in any existing bin. Hence
we place it in a new bin.
Similarly, placing all the other items following the Best-Fit algorithm, we get:
Thus we need 5 bins as opposed to the 4 bins of the optimal solution, but this is
more efficient than the Next-Fit algorithm.
It can be noted that Best Fit (BF) is obtained from FF by assigning the current
item to the feasible bin (if any) having the smallest residual capacity (breaking
ties in favour of the lowest indexed bin). BF satisfies the same worst-case
bounds as FF.
If z(I) is the optimal number of bins, then Best Fit never uses more than 2 *
z(I) - 2 bins. So Best Fit is the same as Next Fit in terms of the upper bound on
the number of bins.
Pseudocode
BESTFIT(size[], n, c)
{
size[] is the array containing the sizes of the items, n is the number of items and
c is the capacity of the bin
Initialize result (count of bins): res = 0;
Create an array to store the remaining space in bins; there can be at most n bins:
bin_rem[n];
for (i = 0; i < n; i++)
Initialize minimum space left and index of best bin: min = c + 1, bi = -1;
for (j = 0; j < res; j++)
if (bin_rem[j] >= size[i] and bin_rem[j] - size[i] < min)
bi = j; min = bin_rem[j] - size[i];
if no bin fits (bi == -1): bin_rem[res] = c - size[i]; res = res + 1;
else bin_rem[bi] = bin_rem[bi] - size[i];
return res;
}
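A runnable Python version of Best Fit (names are my own; sizes scaled by 10 for exact arithmetic):

```python
def best_fit(sizes, capacity):
    """Best Fit: place each item into the feasible bin with the
    smallest residual capacity, breaking ties toward lower index."""
    bin_rem = []                 # remaining space of each open bin
    for s in sizes:
        # index of the tightest bin that can still take s, if any
        best_i, best_rem = -1, capacity + 1
        for i, rem in enumerate(bin_rem):
            if s <= rem < best_rem:
                best_i, best_rem = i, rem
        if best_i == -1:
            bin_rem.append(capacity - s)   # no bin fits: open a new one
        else:
            bin_rem[best_i] -= s
    return len(bin_rem)

items = [5, 7, 5, 2, 4, 2, 5, 1, 6]
print(best_fit(items, 10))  # 5 bins
```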
First Fit Decreasing
First Fit Decreasing (FFD) sorts the items in decreasing order of size and then
applies the First Fit algorithm to the sorted list.
Let us consider the same example as used above and bins of size 1
Assuming the sizes of the items be {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6}.
Sorting them we get {0.7, 0.6, 0.5, 0.5, 0.5, 0.4, 0.2, 0.2, 0.1}
We first select the 0.7 sized item and place it in bin 1.
We then select the 0.6 sized item. We cannot place it in bin 1. So, we place it in
bin 2.
We then select the 0.5 sized item. We cannot place it in any existing bin. So, we
place it in bin 3.
We then select the next 0.5 sized item. We can place it in bin 3.
Doing the same for all items, we get:
Thus only 4 bins are required, which is the same as the optimal solution.
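The decreasing variant can be sketched in Python simply by sorting before running First Fit (names are my own; sizes scaled by 10 for exact arithmetic):

```python
def first_fit_decreasing(sizes, capacity):
    """First Fit Decreasing: sort items by size (largest first),
    then run plain First Fit on the sorted list."""
    bin_rem = []
    for s in sorted(sizes, reverse=True):
        for i, rem in enumerate(bin_rem):
            if rem >= s:         # first bin with enough room
                bin_rem[i] -= s
                break
        else:
            bin_rem.append(capacity - s)
    return len(bin_rem)

items = [5, 7, 5, 2, 4, 2, 5, 1, 6]
print(first_fit_decreasing(items, 10))  # 4 bins, matching the optimum
```

Placing the large items first leaves the small items to fill the gaps, which is why FFD matches the optimal 4 bins on this instance.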
Best Fit Decreasing
Let us consider the same example as used above and bins of size 1.
Assuming the sizes of the items be {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6}.
Sorting them we get {0.7, 0.6, 0.5, 0.5, 0.5, 0.4, 0.2, 0.2, 0.1}
The Best Fit Decreasing solution would be:
We first select the 0.7 sized item and place it in bin 1.
We then select the 0.6 sized item. We cannot place it in bin 1. So, we place it in
bin 2.
We then select the 0.5 sized item. We cannot place it in any existing bin. So, we
place it in bin 3.
Thus only 4 bins are required which is the same as the optimal solution.
5.6 Randomized Algorithms
A randomized algorithm is an algorithm that makes random choices at some of its
steps, so its behaviour may differ between runs on the same input. Primality
testing and randomized quick sort below illustrate this idea.
5.7 Primality Testing
Definition
Primality testing is the process used to determine whether a given number n is a prime
number or not. A prime number is defined as a natural number greater than 1 that has
no divisors other than 1 and itself. For example, numbers like 2, 3, 5, and 7 are prime,
while 4 and 6 are not because they have divisors other than 1 and themselves.
Primality testing can be carried out using deterministic and probabilistic algorithms:
1. Deterministic Methods
• Trial Division:
The simplest approach, where a number n is tested for divisibility by all integers
from 2 up to √n.
• Sieve of Eratosthenes:
Efficient for finding all primes up to a specified limit, but not suitable for checking
a single large number.
2. Probabilistic Methods
• These methods give a high probability of correctness rather than a definite answer.
• Fermat Primality Test:
Uses Fermat's Little Theorem, which states that if n is prime, then for any integer a
not divisible by n, a^(n-1) ≡ 1 (mod n). The test can produce false
positives (Carmichael numbers), so it is repeated for several
random values of a.
Example
To test whether n = 29 is prime:
• Check divisibility by all integers from 2 up to √29 ≈ 5.4, i.e., 2, 3, and
5.
• None of these divides 29, so 29 is prime.
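Trial division as described above can be written directly in Python (the function name is my own):

```python
import math

def is_prime_trial(n):
    """Trial division: test divisibility by every integer
    from 2 up to floor(sqrt(n))."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False  # found a nontrivial divisor
    return True

print(is_prime_trial(29))  # True: 29 is not divisible by 2, 3 or 5
```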
Explanation
Primality testing plays a crucial role in cryptography, where large primes are used for
encryption algorithms such as RSA. The efficiency of probabilistic tests like Miller-
Rabin makes them suitable for cryptographic applications: a single round requires only
O(log n) modular multiplications, via exponentiation by repeated squaring.
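As a sketch of the probabilistic approach described above (function name and round count are my own choices; a production system would use the stronger Miller-Rabin test), a Fermat test in Python:

```python
import random

def fermat_test(n, rounds=20):
    """Fermat primality test: if n is prime, then a**(n-1) % n == 1
    for every a not divisible by n. Repeating the check with random
    bases makes a wrong 'probably prime' answer unlikely
    (Carmichael numbers excepted)."""
    if n < 4:
        return n in (2, 3)
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:   # a witnesses that n is composite
            return False
    return True                      # probably prime

print(fermat_test(97))   # True
print(fermat_test(100))  # False
```

Python's three-argument pow performs modular exponentiation by repeated squaring, which is what keeps each round cheap even for very large n.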
5.8 Randomized Quick Sort
Definition
Randomized Quick Sort is a variation of the traditional Quick Sort algorithm, where
the pivot element is chosen randomly. This randomization helps to avoid the worst-case
time complexity scenario by improving the likelihood of evenly splitting the array,
making the sorting process more efficient.
Algorithm
1. Choose a Pivot Randomly:
• Pick a random position of the array and use the element there as the pivot.
2. Partition:
• Rearrange the array so that elements smaller than the pivot are on the left, and
elements greater than the pivot are on the right.
3. Recurse:
• Apply the same steps to the left and right subarrays until each contains at most
one element.
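The steps above can be sketched in Python as follows (this out-of-place version is my own simplification of the usual in-place partition):

```python
import random

def randomized_quicksort(a):
    """Quick Sort with a randomly chosen pivot. Random pivots make
    consistently unbalanced splits unlikely, so the expected running
    time is O(n log n)."""
    if len(a) <= 1:
        return a                      # already sorted
    pivot = random.choice(a)          # step 1: random pivot
    left = [x for x in a if x < pivot]     # step 2: partition
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    # step 3: recurse on the two sides
    return randomized_quicksort(left) + mid + randomized_quicksort(right)

print(randomized_quicksort([3, 7, 2, 8, 1, 4]))  # [1, 2, 3, 4, 7, 8]
```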
Example
1. Randomly select a pivot, say 2, and swap it with the last element.
2. Partition the array around 2:
Explanation
Randomized Quick Sort helps avoid the worst-case scenario of O(n^2), which occurs
when the pivot consistently divides the array into unbalanced parts. By randomizing
the pivot selection, the algorithm ensures a high probability of good splits, leading to:
• Expected Time Complexity: O(n log n), since random pivots make balanced
partitions likely.
• Worst-Case Time Complexity: O(n^2), occurs when the pivot choice results in
unbalanced partitions.
Randomized Quick Sort combines the efficiency of Quick Sort with the benefits of
randomization, making its expected performance robust against unfavourable inputs.