
DESIGN AND ANALYSIS OF ALGORITHMS

LECTURE NOTES

ON

23CS312 - DESIGN AND ANALYSIS OF ALGORITHMS

III Semester

Computer Science & Engineering

Prepared by

V. SARANYA, M.Tech.
K. RAGHAVI, M.E.
CONTENTS

I INTRODUCTION
  1.1 Understanding of Algorithm
  1.2 Algorithm Analysis
    1.2.1 Space Complexity
    1.2.2 Time Complexity
  1.3 Asymptotic Notation
  1.4 Solving Recurrences (Recurrence Relations)
    1.4.1 Substitution Method
  1.5 Hash Function
  1.6 Searching
    1.6.1 Linear Search
    1.6.2 Binary Search
    1.6.3 Interpolation Search
  1.7 Pattern Search
    1.7.1 Naïve Pattern Searching
    1.7.2 Rabin-Karp Pattern Matching
  1.8 Sorting
    1.8.1 Insertion Sort
    1.8.2 Heap Sort
II GRAPH ALGORITHMS
  2.1 Graph Representation
  2.2 Graph Traversal
    2.2.1 BFS (Breadth First Search)
    2.2.2 DFS (Depth First Search)
  2.3 Minimum Spanning Tree
    2.3.1 Prim's Algorithm
    2.3.2 Kruskal's Algorithm
  2.4 Shortest Path Algorithms
    2.4.1 Bellman-Ford Algorithm
    2.4.2 Dijkstra's Algorithm
    2.4.3 Ford-Fulkerson Algorithm
  2.5 Maximum Flow
  2.6 Network Flow
  2.7 Maximum Bipartite Matching
III ADVANCED DESIGN AND ANALYSIS TECHNIQUES
  3.1 Divide and Conquer Methodology
    3.1.1 Finding the Maximum and Minimum Element in an Array
    3.1.2 Merge Sort
    3.1.3 Quick Sort
  3.2 Dynamic Programming: Elements of Dynamic Programming
    3.2.1 Matrix-Chain Multiplication
    3.2.2 Multistage Graphs
  3.3 Greedy Technique: Elements of the Greedy Strategy
    3.3.1 Greedy Algorithm
    3.3.2 Activity-Selection Problem
  3.4 Huffman Coding
IV STATE SPACE SEARCH ALGORITHMS
  4.1 Backtracking
    4.1.1 n-Queens Problem
    4.1.2 Hamiltonian Circuit Problem
    4.1.3 Subset Sum Problem
    4.1.4 Graph Colouring Problem
  4.2 Branch and Bound
    4.2.1 Solving the 15-Puzzle Problem
    4.2.2 Assignment Problem
    4.2.3 Knapsack Problem
    4.2.4 Travelling Salesman Problem
V NP-COMPLETENESS AND APPROXIMATION ALGORITHMS
  5.1 Tractable and Intractable Problems
    5.1.1 Polynomial Time Algorithms
  5.2 Venn Diagram Representation
  5.3 Non-Deterministic Algorithms
  5.4 Approximation Algorithms
    5.4.1 Travelling Salesman Problem
  5.5 Bin Packing Problem
  5.6 Randomized Algorithms
  5.7 Primality Testing
  5.8 Randomized Quick Sort
23CS312 DESIGN AND ANALYSIS OF ALGORITHMS L T P C
3 0 2 4
COURSE OBJECTIVES:
To understand and apply algorithm analysis techniques on searching and sorting algorithms
To critically analyze the efficiency of graph algorithms
To understand different algorithm design techniques
To solve programming problems using state space trees
To understand the concepts behind NP-completeness, approximation algorithms and randomized algorithms

UNIT I INTRODUCTION 9
Time and space complexity - Asymptotic Notations – Solving Recurrences:
substitution method - Lower bounds - hash function - searching: linear search, binary
search and Interpolation Search, String Matching: The naïve string-matching
algorithm - Rabin-Karp algorithm - Sorting: Insertion sort, heap sort

UNIT II GRAPH ALGORITHMS 9


Representations of graphs - Graph traversal: DFS - BFS - Minimum spanning tree:
Kruskal’s and Prim’s algorithm - Shortest path: Bellman - Ford algorithm - Dijkstra’s
algorithm - Maximum flow: Flow networks - Ford-Fulkerson method - Maximum
bipartite matching.
UNIT III ADVANCED DESIGN AND ANALYSIS TECHNIQUES 9
Divide and Conquer methodology: Merge sort - Quick sort- Dynamic programming:
Elements of dynamic programming - Matrix-chain multiplication - Multi stage graphs.
Greedy Technique: Elements of the greedy strategy - Activity-selection problem -
Huffman Trees

UNIT IV STATE SPACE SEARCH ALGORITHMS 9


Backtracking : n-Queens problem - Hamiltonian Circuit Problem - Subset Sum Problem
- Graph colouring problem Branch and Bound : Solving 15-Puzzle problem -
Assignment problem - Knapsack Problem - Travelling Salesman Problem.

UNIT V NP-COMPLETE AND APPROXIMATION ALGORITHM 9


Tractable and intractable problems: Polynomial time algorithms - Venn
diagram representation – Non Deterministic algorithms - NP-hardness and NP-
completeness - Problem reduction: TSP - 3 CNF problem. Approximation
Algorithms: Bin Packing problem - Randomized Algorithms: concept and
application - primality testing - randomized quick sort.
TOTAL: 45 PERIODS
UNIT 1- Introduction

Time and space complexity - Asymptotic Notations – Solving Recurrences: substitution method
- Lower bounds - hash function - searching: linear search, binary search and Interpolation
Search, String Matching: The naïve string-matching algorithm - Rabin-Karp algorithm -
Sorting: Insertion sort, heap sort

Definition:
Algorithm

An algorithm is a sequence of unambiguous instructions for solving a problem, for obtaining a required output for any legitimate input in a finite amount of time.

"Algorithmics is more than a branch of computer science. It is the core of computer science, and, in all fairness, can be said to be relevant to most of science, business and technology."

Characteristics of an algorithm:
Non-ambiguity / Precision: Each step in an algorithm should be non-ambiguous, i.e., each instruction should be clear and precise.
Finiteness: The algorithm should be finite. The algorithm must terminate in a specified time.
Uniqueness: The algorithm must produce a specified, required output.
Input: The algorithm receives input.
Output: The algorithm produces output.
Generality: The algorithm must work for all valid sets of inputs.

Notion of an Algorithm
An algorithm is a sequence of unambiguous instructions for solving a problem, for obtaining a required output for any legitimate input in a finite amount of time.
• The range of inputs for which an algorithm works has to be specified carefully.
• The same algorithm can be represented in several different ways.
• There may exist several algorithms for solving the same problem.

An algorithm is a finite set of instructions that accomplishes a particular task. Another definition is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate (genuine) input in a finite amount of time.
An algorithm is the backbone of a program; it is just like the recipe of the program.

1.1 Understanding of Algorithm

An algorithm is a sequence of unambiguous instructions for solving a problem, i.e., for obtaining a required output for any legitimate input in a finite amount of time. There are various methods to solve the same problem. The important points to be remembered are:

1. The non-ambiguity requirement for each step of an algorithm cannot be compromised.
2. The range of input for which an algorithm works has to be specified carefully.
3. The same algorithm can be represented in different ways.
4. Several algorithms for solving the same problem may exist.
5. Algorithms for the same problem can be based on very different ideas and can solve the problem with dramatically different speeds.

The example here is to find the gcd of two integers in three different ways. The gcd of two nonnegative, not-both-zero integers m and n, denoted gcd(m, n), is defined as the largest integer that divides both m and n evenly, i.e., with a remainder of zero. Euclid of Alexandria outlined an algorithm for solving this problem in one of the volumes of his Elements.

gcd(m, n) = gcd(n, m mod n)

is applied repeatedly until m mod n equals 0; since gcd(m, 0) = m, the last value of m is also the gcd of the initial m and n.

The structured description of this algorithm is:


Step 1: If n=0, return the value of m as the answer and stop; otherwise, proceed to step 2.
Step 2: Divide m by n and assign the value of the remainder to r.
Step 3: Assign the value of n to m and the value of r to n. Go to step 1.

Euclid's algorithm: ALGORITHM Euclid(m, n)

//Computes gcd(m, n) by Euclid's algorithm
//Input: Two nonnegative, not-both-zero integers m and n
//Output: Greatest common divisor of m and n
while n ≠ 0 do
    r ← m mod n
    m ← n
    n ← r
return m

This algorithm comes to a stop when the 2nd number becomes 0. The second number of the pair gets smaller with each iteration, and it cannot become negative. Indeed, the new value of n on the next iteration is m mod n, which is always smaller than n. Hence, the value of the second number in the pair eventually becomes 0, and the algorithm stops.

Example: gcd(60, 24) = gcd(24, 12) = gcd(12, 0) = 12.
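
The pseudocode above translates directly into C. The following is a minimal sketch (the function name gcd and the driver in main are ours, not part of the original notes):

#include <stdio.h>

/* Euclid's algorithm: gcd(m, n) = gcd(n, m mod n) until n becomes 0 */
int gcd(int m, int n)
{
    while (n != 0) {
        int r = m % n;   /* remainder of m divided by n */
        m = n;           /* the old divisor becomes the new dividend */
        n = r;           /* the remainder becomes the new divisor */
    }
    return m;            /* gcd(m, 0) = m */
}

int main(void)
{
    printf("gcd(60, 24) = %d\n", gcd(60, 24));   /* prints 12 */
    return 0;
}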

The second method for the same problem is obtained from the definition itself: the gcd of m and n is the largest integer that divides both numbers evenly. Obviously, that number cannot be greater than the smaller of the two numbers, which we denote by t = min{m, n}. So start checking whether t divides both m and n: if it does, t is the answer; if it doesn't, t is decreased by 1 and we try again. (For the example given below, this is repeated until t reaches 12, and then it stops.)

Consecutive integer checking algorithm:

Step 1: Assign the value of min{m, n} to t.
Step 2: Divide m by t. If the remainder of this division is 0, go to step 3; otherwise, go to step 4.
Step 3: Divide n by t. If the remainder of this division is 0, return the value of t as the answer and stop; otherwise, proceed to step 4.
Step 4: Decrease the value of t by 1. Go to step 2.
Note: this algorithm will not work when one of its inputs is zero. So we have to specify the range of input explicitly and carefully.
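
A direct C translation of these steps might look as follows (a sketch assuming both inputs are positive, since the algorithm fails when an input is zero; the function name gcd_consecutive is ours):

/* Consecutive integer checking: try t = min{m, n}, min{m, n} - 1, ... */
int gcd_consecutive(int m, int n)       /* assumes m > 0 and n > 0 */
{
    int t = (m < n) ? m : n;            /* Step 1: t = min{m, n}   */
    while (m % t != 0 || n % t != 0)    /* Steps 2-3: does t divide both? */
        t = t - 1;                      /* Step 4: decrease t, try again  */
    return t;                           /* loop must stop: t = 1 divides both */
}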
The third method is the middle-school procedure: find the prime factorizations of m and n, identify all the common prime factors, and compute their product. (If p is a common factor occurring pm times in m and pn times in n, respectively, it should be repeated min{pm, pn} times.)
Example: 60 = 2·2·3·5
24 = 2·2·2·3
gcd(60, 24) = 2·2·3 = 12.

This procedure is more complex, and ambiguity arises since the prime factorization step is not itself defined as an algorithm. So, to make it an efficient algorithm, incorporate an algorithm to find the prime factors.

1.2 Algorithm Analysis

Algorithm analysis is the process of determining the time and space complexity of an algorithm, which are measures of the algorithm's efficiency. Time complexity refers to the amount of time it takes for an algorithm to run as a function of the size of the input, and is typically expressed using big O notation. Space complexity refers to the amount of memory required by an algorithm as a function of the size of the input, and is also typically expressed using big O notation.

To analyze the time complexity of an algorithm, we need to consider the number of operations performed by the algorithm, and how that number changes as the size of the input increases. This can be done by counting the basic operations performed by the algorithm, such as comparisons, assignments, and function calls. The count of basic operations is then used to express the algorithm's time complexity in big O notation.

To analyze the space complexity of an algorithm, we need to consider the amount of memory used by the algorithm, and how that amount changes as the size of the input increases. This can be done by counting the variables (and any dynamically allocated storage) the algorithm uses as the input grows. The amount of memory used is then expressed as the algorithm's space complexity in big O notation.

It is important to note that analyzing the time and space complexity of an algorithm is a way to evaluate its efficiency and the trade-off between time and space, but it is not a definitive measure of the actual performance of the algorithm, as that depends on the specific implementation, the computer, and the input.

Performance Analysis:
Performance analysis, or analysis of algorithms, refers to the task of determining the efficiency of an algorithm, i.e., how much computing time and storage an algorithm requires to run (or execute). This analysis helps in judging the value of one algorithm over another.

To judge an algorithm, two things in particular are taken into consideration:

1. Space complexity
2. Time complexity

1.2.1 Space Complexity: The space complexity of an algorithm (program) is the amount of memory it needs to run to completion.

The space needed by an algorithm has the following components:
• Instruction Space
• Data Space
• Environment Stack Space

1. Instruction Space: Instruction space is the space needed to store the compiled version of the program instructions. The amount of instruction space needed depends on factors such as:
i) The compiler used to compile the program into machine code.
ii) The compiler options in effect at the time of compilation.
iii) The target computer, i.e., the computer on which the algorithm runs. Note that one compiler may produce less code than another when the same program is compiled by the two.

2. Data Space: Data space is the space needed to store all constant and variable values. Data space has two components:
1. Space needed by constants, for example 0, 1, 2.134.
2. Space needed by dynamically allocated objects such as arrays, structures, and classes.
3. Environmental Stack Space: Environmental stack space is used during the execution of functions. Each time a function is invoked, the following data are saved on the environmental stack:
1. The return address.
2. The values of local variables.
3. The values of the formal parameters of the function being invoked.
Environmental stack space is mainly used in recursive functions.

Thus, the space requirement of any program P may therefore be written as

Space complexity S(P) = C + Sp(instance characteristics).

An example of instance characteristics:

Algorithm NEC(float x, float y, float z)
{
    return (x + y + y * z + (x + y + z)) / (x + y) + 4.0;
}

The following equation determines the number of additions, subtractions, multiplications, divisions, compares, loads, stores and so on that would be made by the code for P:

tp(n) = Ca·ADD(n) + Cs·SUB(n) + Cm·MUL(n) + Cd·DIV(n) + ...

where n denotes the instance characteristics; Ca, Cs, Cm, Cd and so on denote the times needed for an addition, subtraction, multiplication, division and so on; and ADD, SUB, MUL, DIV and so on are functions whose values are the numbers of additions, subtractions, multiplications, divisions and so on performed. In practice, determining all of these counts makes this an impossible method for finding the time complexity.

Method 1: introduce a global variable "count", initialized to zero. Each time a statement in the given program is executed, count is incremented by the step count of that statement.

In the above algorithm NEC, there are no instance characteristics, and the space needed by x, y, z is independent of the instance characteristics. Therefore we can write S(NEC) = 3 + 0 = 3, one space each for x, y and z; the space complexity is O(1).
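
To see the difference between O(1) and O(n) space in practice, compare an iterative sum, which uses a fixed number of variables, with a recursive sum, whose environmental stack grows with n (an illustrative sketch; the function names are ours):

/* O(1) auxiliary space: only s and i, regardless of n */
int sum_iter(int a[], int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* O(n) auxiliary space: one stack frame per recursive call */
int sum_rec(int a[], int n)
{
    if (n == 0)
        return 0;
    return a[n - 1] + sum_rec(a, n - 1);
}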

1.2.2 Time Complexity:

The time complexity of an algorithm is the amount of computer time it needs to run to completion. We can measure the time complexity of an algorithm in two ways: 1. a priori analysis (at design/compile time) and 2. a posteriori analysis (at run/execution time). In a priori analysis, before the algorithm is executed, we analyze the behavior of the algorithm; a priori analysis concentrates on determining the order of execution of statements. In a posteriori analysis, we measure the execution time while the algorithm runs; a posteriori analysis gives accurate values but is very costly. As we know, the compile time does not depend on the size of the input. Hence, we confine ourselves to the run time, which depends on the size of the input; this run time is denoted by tP(n).

Hence time complexity T(P) = C + tP(n).

The time T(P) taken by a program P is the sum of the compile time and the execution time. The compile time does not depend on the instance characteristics, so we concentrate on the run time of the program. This run time is denoted by tP(instance characteristics).

Example: Algorithm Sum(a, n)

{
    s := 0;
    for i := 1 to n do
    {
        s := s + a[i];
    }
    return s;
}

Algorithm Sum with count statements added:

count := 0;
Algorithm Sum(a, n)
{
    s := 0;
    count := count + 1;
    for i := 1 to n do
    {
        count := count + 1;   // for the for loop
        s := s + a[i];
        count := count + 1;   // for the assignment
    }
    count := count + 1;       // for the last test of the for loop
    count := count + 1;       // for the return statement
    return s;
}

Thus the total number of steps is 2n + 3.
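
The same step-counting idea can be run directly in C; the global variable count below plays the role of the count statements in the pseudocode (a sketch, with names of our choosing):

#include <stdio.h>

int count = 0;                     /* global step counter */

int sum(int a[], int n)
{
    int s = 0;
    count++;                       /* for s := 0 */
    for (int i = 0; i < n; i++) {
        count++;                   /* for the loop-control step */
        s += a[i];
        count++;                   /* for the assignment step */
    }
    count++;                       /* for the final (failing) loop test */
    count++;                       /* for the return statement */
    return s;
}

int main(void)
{
    int a[] = {1, 2, 3, 4, 5};
    sum(a, 5);
    printf("steps = %d\n", count); /* prints 13 = 2*5 + 3 */
    return 0;
}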

Complexity of Algorithms:
• Best Case: Inputs are provided in such a way that the minimum time is required to process them.
• Average Case: The amount of time the algorithm takes on an average set of inputs.
• Worst Case: The amount of time the algorithm takes on the worst possible set of inputs.

Example: Linear Search

Index: 1  2  3  4  5  6  7  8  9
Value: 3  4  5  6  7  9  10 12 15

• Best Case: Suppose we search for the element 3. First, A(1) is compared with 3 and a match occurs, so the number of comparisons is only one. The search takes the minimum number of comparisons, so it comes under the best case. Time complexity is O(1).

• Average Case: Suppose we search for the element 7. First, A(1) is compared with 7 (3 = 7?); no match occurs. Next, A(2) is compared with 7; no match occurs. A(3) and A(4) are compared with 7; no match occurs. Up to now 4 comparisons have taken place. Now A(5) is compared with 7 (7 = 7?), and a match occurs. The number of comparisons is 5. The search takes an average number of comparisons, so it comes under the average case. Time complexity is O(n/2) = O(n) (we neglect the constant).

• Worst Case: Suppose we search for the element 15. First, A(1) is compared with 15 (3 = 15?); no match occurs. This process continues until either the element is found or the list is exhausted. The element is found at the 9th comparison, so the number of comparisons is 9 = n. Time complexity is O(n).

Time and Space Complexity Notation

Time complexity is a measure of how long an algorithm takes to run as a function of the size of the input. It is typically expressed using big O notation, which describes an upper bound on the growth of the time required by the algorithm. For example, an algorithm with a time complexity of O(n) takes longer to run as the input size n increases.
There are different types of time complexities:

• O(1), or constant time: the algorithm takes the same amount of time to run regardless of the size of the input.
• O(log n), or logarithmic time: the algorithm's running time increases logarithmically with the size of the input.
• O(n), or linear time: the algorithm's running time increases linearly with the size of the input.
• O(n log n), or linearithmic time: the algorithm's running time grows in proportion to n multiplied by the logarithm of n.
• O(n^2), or quadratic time: the algorithm's running time increases quadratically with the size of the input.
• O(2^n), or exponential time: the algorithm's running time increases exponentially with the size of the input.

Space complexity, on the other hand, is a measure of how much memory an algorithm uses as a function of the size of the input. Like time complexity, it is typically expressed using big O notation. For example, an algorithm with a space complexity of O(n) uses more memory as the input size n increases. Space complexities are generally categorized as:

• O(1), or constant space: the algorithm uses the same amount of memory regardless of the size of the input.
• O(n), or linear space: the algorithm's memory usage increases linearly with the size of the input.
• O(n^2), or quadratic space: the algorithm's memory usage increases quadratically with the size of the input.
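
The time-complexity categories above can be made concrete with three small C functions whose dominant operation counts grow as a constant, as n, and as n^2 respectively (an illustrative sketch; the function names are ours):

/* O(1): one operation, independent of n */
int first(int a[], int n) { return a[0]; }

/* O(n): the loop body runs n times */
long sum_all(int a[], int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* O(n^2): the inner body runs n*n times */
long count_equal_pairs(int a[], int n) {
    long c = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (a[i] == a[j])
                c++;
    return c;
}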

1.3 Asymptotic Notation

Asymptotic notation is used to describe the running time of an algorithm - how much time an algorithm takes with a given input, n. There are three different notations: big O, big Theta (Θ), and big Omega (Ω).

Big O notation (O(f(n))) provides an upper bound on the growth of a function. It describes the worst-case scenario for the time or space complexity of an algorithm. For example, an algorithm with a time complexity of O(n^2) means that the running time of the algorithm grows at most proportionally to n^2, where n is the size of the input.

Big Ω notation (Ω(f(n))) provides a lower bound on the growth of a function. It describes the best-case scenario for the time or space complexity of an algorithm. For example, an algorithm with a space complexity of Ω(n) means that the memory usage of the algorithm is at least proportional to n, where n is the size of the input.

Big Θ notation (Θ(f(n))) provides a tight bound on the growth of a function. It is often associated with the average-case scenario for the time or space complexity of an algorithm. For example, an algorithm with a time complexity of Θ(n log n) means that the running time of the algorithm is both O(n log n) and Ω(n log n), where n is the size of the input.

It is important to note that asymptotic notation only describes the behavior of the function for large values of n, and does not provide information about the exact behavior of the function for small values of n. Also, in some cases the best, worst and average cases can be the same; in that case the notation simplifies to O(f(n)) = Ω(f(n)) = Θ(f(n)).

Additionally, these notations can be used to compare the efficiency of different algorithms, where a lower order of growth is considered more efficient. For example, an algorithm with a time complexity of O(n) is more efficient than an algorithm with a time complexity of O(n^2).

It is also worth mentioning that asymptotic notation is not limited to time and space complexity; it can be used to express the growth behavior of any function, not just algorithms.

There are three asymptotic notations used to represent the time complexity of an algorithm. Before defining them, consider the following linear search example.

Input: an integer array of size "n" and one integer "k" that we need to search for in that array.
Output: if the element "k" is found in the array, return 1; otherwise return 0.

// for-loop to iterate over each element in the array
for (int i = 0; i < n; ++i)
{
    // check if the i-th element is equal to "k"
    if (arr[i] == k)
        return 1;   // return 1 if you find "k"
}
return 0;           // return 0 if you didn't find "k"
If the input array is [1, 2, 3, 4, 5] and you want to find whether "1" is present in the array, the if-condition of the code will be executed 1 time and will find that the element 1 is in the array. So the if-condition takes 1 unit of time here.
If the input array is [1, 2, 3, 4, 5] and you want to find whether "3" is present, the if-condition will be executed 3 times before finding that the element 3 is in the array. So the if-condition takes 3 units of time here.
If the input array is [1, 2, 3, 4, 5] and you want to find whether "6" is present, the if-condition will be executed 5 times, it will find that the element 6 is not in the array, and the algorithm will return 0 in this case. So the if-condition takes 5 units of time here.
As we can see, for the same input array we get different times for different values of "k". So this can be divided into three cases:

Best case: This is the lower bound on the running time of an algorithm. We must know the case that causes the minimum number of operations to be executed. In the above example, our array was [1, 2, 3, 4, 5] and we were finding whether "1" is present in the array. Here, after only one comparison, we find that the element is present in the array. So this is the best case of our algorithm.

Average case: We calculate the running time for all possible inputs, sum all the calculated values, and divide the sum by the total number of inputs. We must know (or predict) the distribution of cases.

Worst case: This is the upper bound on the running time of an algorithm. We must know the case that causes the maximum number of operations to be executed. In our example, the worst case is when the given array is [1, 2, 3, 4, 5] and we try to find whether the element "6" is present. Here, the if-condition of our loop is executed 5 times, and then the algorithm gives "0" as output.

So, we learned about the best, average, and worst cases of an algorithm. Now, let's get back to asymptotic notation, where we saw that we use three notations to represent the complexity of an algorithm: Θ notation (theta), Ω notation (omega), and big O notation.

Types of Asymptotic Notations:

O Notation (Big Oh), Ω Notation (Big Omega), θ Notation (Big Theta)

1. O Notation (Big Oh): Definition: A function f(n) is said to be in O(g(n)), denoted f(n) ∈ O(g(n)), if f(n) is bounded above by some constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some nonnegative integer n0 such that f(n) ≤ c·g(n) for all n ≥ n0 and c > 0. O notation analyses the worst case for the given function. Here n is the size of an input and f(n) is a function of n; when the input size increases, the time also increases.

Example: Let f(n) = 3n + 2 and g(n) = n. We have to prove f(n) ∈ O(g(n)). By the definition,
f(n) ≤ c·g(n)
3n + 2 ≤ c·n, where c > 0, n0 ≥ 1.

We can substitute any value for c. The best option is c = 4: 3n + 2 ≤ 4n holds where n ≥ 2.
Reason for taking n ≥ 2:
If n = 1: 3(1) + 2 ≤ 4(1) => 5 ≤ 4 => false.
If n = 2: 3(2) + 2 ≤ 4(2) => 8 ≤ 8 => true.
If n = 3: 3(3) + 2 ≤ 4(3) => 11 ≤ 12 => true.
If n = 4: 3(4) + 2 ≤ 4(4) => 14 ≤ 16 => true. And so on.
Therefore 3n + 2 ∈ O(n).
2. Ω Notation (Big Omega): Definition: A function f(n) is said to be in Ω(g(n)), denoted f(n) ∈ Ω(g(n)), if f(n) is bounded below by some positive constant multiple of g(n) for all large n, i.e., if there exist some positive constant c and some nonnegative integer n0 such that f(n) ≥ c·g(n) for all n ≥ n0 and c > 0. Ω notation analyses the best case for the given function.

Example: Let f(n) = 3n + 2 and g(n) = n, with c = 1:
If n = 1: 3(1) + 2 ≥ 1(1) => 5 ≥ 1 => true.
If n = 2: 3(2) + 2 ≥ 1(2) => 8 ≥ 2 => true.
If n = 3: 3(3) + 2 ≥ 1(3) => 11 ≥ 3 => true. And so on.
Therefore 3n + 2 ∈ Ω(n).

3. θ Notation (Big Theta):

Definition: A function f(n) is said to be in θ(g(n)), denoted f(n) ∈ θ(g(n)), if f(n) is bounded both above and below by some positive constant multiples of g(n) for all large n, i.e., if there exist some positive constants c1 and c2 and some nonnegative integer n0 such that c1·g(n) ≤ f(n) ≤ c2·g(n) for all n ≥ n0. θ notation analyses the average case for the given function.

Here n is the size of an input and f(n) is a function of n. When the input size increases, the time also increases. c1 and c2 are different constants; f(n) is bounded both above and below, i.e., c1·g(n) ≤ f(n) ≤ c2·g(n).

Example:
Let f(n) = 3n + 2 and g(n) = n.
We have to prove f(n) ∈ θ(g(n)). By the definition, c1·g(n) ≤ f(n) ≤ c2·g(n):
3n + 2 ≤ c2·n when c2 = 4, and 3n + 2 ≥ c1·n when c1 = 1. Such that
1·n ≤ 3n + 2 ≤ 4·n, where n ≥ 2.
Therefore 3n + 2 ∈ θ(n).
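
The three definitions can be summarized compactly in one place (a restatement of the conditions above, not new material):

\[
\begin{aligned}
f(n) \in O(g(n)) &\iff \exists\, c > 0,\ n_0 \ge 0 : f(n) \le c\,g(n) \text{ for all } n \ge n_0,\\
f(n) \in \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 \ge 0 : f(n) \ge c\,g(n) \text{ for all } n \ge n_0,\\
f(n) \in \Theta(g(n)) &\iff \exists\, c_1, c_2 > 0,\ n_0 \ge 0 : c_1\,g(n) \le f(n) \le c_2\,g(n) \text{ for all } n \ge n_0.
\end{aligned}
\]

For the running example f(n) = 3n + 2 and g(n) = n, the witnesses are c = 4 (for O), c = 1 (for Ω), and (c1, c2) = (1, 4) with n0 = 2 (for Θ).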

Useful Property Involving the Asymptotic Notations

Property 1: If t1(n) ∈ O(g1(n)) and t2(n) ∈ O(g2(n)), then t1(n) + t2(n) ∈ O(max{g1(n), g2(n)}).

PROOF:
The proof extends to orders of growth the following simple fact about four arbitrary real numbers a1, b1, a2, b2: if a1 ≤ b1 and a2 ≤ b2, then a1 + a2 ≤ 2 max{b1, b2}.
Since t1(n) ∈ O(g1(n)), there exist some positive constant c1 and some nonnegative integer n1 such that t1(n) ≤ c1·g1(n) for all n ≥ n1.
Similarly, since t2(n) ∈ O(g2(n)), t2(n) ≤ c2·g2(n) for all n ≥ n2.
Let us denote c3 = max{c1, c2} and consider n ≥ max{n1, n2} so that we can use both inequalities. Adding them yields the following:

t1(n) + t2(n) ≤ c1·g1(n) + c2·g2(n)
             ≤ c3·g1(n) + c3·g2(n) = c3[g1(n) + g2(n)]
             ≤ 2·c3 max{g1(n), g2(n)}.

Hence, t1(n) + t2(n) ∈ O(max{g1(n), g2(n)}), with the constants c and n0 required by the O definition being 2c3 = 2 max{c1, c2} and max{n1, n2}, respectively.

1.4 Recurrence Relation
A recurrence relation is a mathematical equation that describes the relation between the input size and the running time of a recursive algorithm. It expresses the running time of a problem in terms of the running time of smaller instances of the same problem.

A divide-and-conquer recurrence relation typically has the form T(n) = a·T(n/b) + f(n), where:

T(n) is the running time of the algorithm on an input of size n
a is the number of recursive calls made by the algorithm
b is the factor by which the input size shrinks in each recursive call
f(n) is the time required to perform any non-recursive operations

The recurrence relation can be used to determine the time complexity of the algorithm using techniques such as the Master Theorem or the Substitution Method.

For example, let's consider the problem of computing the nth Fibonacci number. A simple recursive algorithm for solving this problem is as follows:

Fibonacci(n)
    if n <= 1
        return n
    else
        return Fibonacci(n-1) + Fibonacci(n-2)

The recurrence relation for this algorithm is T(n) = T(n-1) + T(n-2) + O(1), which describes the running time of the algorithm in terms of the running times of the two smaller instances of the problem with input sizes n-1 and n-2. Solving this recurrence shows that the time complexity of this algorithm is O(2^n), which is very inefficient for large input sizes.
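
A direct C rendering of this recursive algorithm looks as follows (a sketch; fib is our name for it):

#include <stdio.h>

/* Naive recursion: T(n) = T(n-1) + T(n-2) + O(1), roughly O(2^n) calls */
long long fib(int n)
{
    if (n <= 1)
        return n;
    return fib(n - 1) + fib(n - 2);
}

int main(void)
{
    printf("fib(10) = %lld\n", fib(10));   /* prints 55 */
    return 0;
}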
What is a Recurrence Relation?
Whenever a function makes a recursive call to itself, its running time can be computed by a recurrence relation. A recurrence relation is simply a mathematical relation/equation that gives the value of any term in terms of some previous, smaller terms. For example,
T(n) = T(n-1) + n
is a recurrence relation because the value of the nth term is given in terms of its previous, i.e., (n-1)th, term.

Types of Recurrence Relation:
There are different types of recurrence relations possible in the mathematical world. Some of them are:
1. Linear Recurrence Relation: In a linear recurrence relation, every term depends linearly on its previous terms. An example of a linear recurrence relation is T(n) = T(n-1) + T(n-2) + T(n-3).
2. Divide and Conquer Recurrence Relation: This is the type of recurrence relation obtained from divide-and-conquer algorithms. An example of such a recurrence relation is T(n) = 3T(n/2) + 9n.
3. First Order Recurrence Relation: This is the type of recurrence relation in which every term depends on just the previous term. An example of this type of recurrence relation is T(n) = T(n-1)^2.
4. Higher Order Recurrence Relation: This is the type of recurrence relation where a term depends not just on one previous term but on multiple previous terms. If it depends on two previous terms it is called second order; similarly, for three previous terms it is called third order, and so on. An example of a third-order recurrence relation is T(n) = 2T(n-1)^2 + K·T(n-2) + T(n-3).

So far we have seen different recurrence relations, but how do we find the time taken by a recursive algorithm? To calculate the time, we need to solve the recurrence relation. For solving recurrences we have three famous methods:
Substitution Method
Recursive Tree Method
Master Theorem
In this section we are going to focus on the Substitution Method.

1.4.1 Substitution Method:

The Substitution Method is a very famous method for solving recurrences. There are two types of substitution:
1. Forward Substitution
2. Backward Substitution

1. Forward Substitution:
It is called forward substitution because here we substitute the recurrence of any term into the next terms. It uses the following steps to find the running time from a recurrence:
Pick the recurrence relation and the given initial condition.
Put the value from the previous recurrence into the next recurrence.
Observe and guess the pattern and the time.
Prove that the guessed result is correct using mathematical induction.
Now we will use these steps to solve a problem. The problem is:
T(n) = T(n-1) + n, n > 1
T(n) = 1, n = 1
Now we will go step by step:
1. Pick the recurrence and the given initial condition:
T(n) = T(n-1) + n, n > 1; T(n) = 1, n = 1
2. Put the value from the previous recurrence into the next recurrence:
T(1) = 1
T(2) = T(1) + 2 = 1 + 2 = 3
T(3) = T(2) + 3 = 1 + 2 + 3 = 6
T(4) = T(3) + 4 = 1 + 2 + 3 + 4 = 10
3. Observe and guess the pattern and the time:
The guessed pattern will be T(n) = 1 + 2 + 3 + ... + n = (n(n+1))/2, so the time complexity will be O(n^2).
4. Prove that the guessed result is correct using mathematical induction:
Prove T(1) is true: T(1) = 1(1+1)/2 = 2/2 = 1, and from the definition of the recurrence we know T(1) = 1. Hence T(1) is true.
Assume T(N-1) = ((N-1)(N-1+1))/2 = (N(N-1))/2 to be true.
Then prove T(N) is true: T(N) = T(N-1) + N from the recurrence definition.
Now, T(N-1) = N(N-1)/2, so T(N) = T(N-1) + N = (N(N-1))/2 + N = (N(N-1) + 2N)/2 = N(N+1)/2.
From our guess also T(N) = N(N+1)/2. Hence T(N) is true. Therefore our guess was correct and the time will be O(N^2).

2. Backward Substitution:
It is called backward substitution because here we substitute the recurrences of previous terms into the main recurrence. It uses the following steps to find the running time:
Take the main recurrence and try to write the recurrences of previous terms.
Take just the previous recurrence and substitute it into the main recurrence.
Again take one more previous recurrence and substitute it into the main recurrence.
Do this process until you reach the initial condition.
After this, substitute the value from the initial condition and get the solution.
Now we will use these steps to solve a problem. The problem is:
T(n) = T(n-1) + n, n > 1; T(n) = 1, n = 1
Now we will go step by step:
1. Take the main recurrence and try to write the recurrences of previous terms:
T(n) = T(n-1) + n
T(n-1) = T(n-2) + n - 1
T(n-2) = T(n-3) + n - 2
2. Take just the previous recurrence and substitute it into the main recurrence:
Put T(n-1) into T(n). So T(n) = T(n-2) + n - 1 + n.
3. Again take one more previous recurrence and substitute it into the main recurrence:
Put T(n-2) into T(n). So T(n) = T(n-3) + n - 2 + n - 1 + n.
4. Do this process until you reach the initial condition:
Similarly we can find T(n-3), T(n-4) and so on and insert them into T(n). Eventually we will get:
T(n) = T(1) + 2 + 3 + 4 + ... + n-1 + n
5. After this, substitute the value from the initial condition and get the solution:
Put T(1) = 1: T(n) = 1 + 2 + 3 + 4 + ... + n-1 + n = n(n+1)/2. So the time will be O(N^2).
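
The guessed closed form can also be checked mechanically: the C sketch below evaluates the recurrence T(n) = T(n-1) + n, T(1) = 1 literally and compares it with n(n+1)/2 (the function name T is ours):

#include <stdio.h>

/* the recurrence, evaluated exactly as defined */
long T(int n)
{
    return (n == 1) ? 1 : T(n - 1) + n;
}

int main(void)
{
    /* the two columns agree for every n */
    for (int n = 1; n <= 10; n++)
        printf("T(%d) = %ld, n(n+1)/2 = %d\n", n, T(n), n * (n + 1) / 2);
    return 0;
}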
Limitations of the Substitution Method:
The substitution method is a useful technique for solving recurrence relations, but it also has some limitations. Some of them are:
It is not guaranteed that we will find the solution, as the substitution method is based on guesses.
It doesn't provide guidance on how to make an accurate guess, often relying on intuition or trial and error.
It may only yield a specific or approximate solution rather than the most general or precise one.
The substitution method isn't universally applicable to all recurrence relations, especially those with complex or variable forms that do not get simplified using substitution.
Lower and Upper Bound

The lower and upper bound theory provides a way to find the lowest-complexity algorithm that solves a problem. Before understanding the theory, let's first have a brief look at what lower and upper bounds are.

Lower Bound:
Let L(n) be the running time of an algorithm A. Then g(n) is a lower bound of A if there exist two constants C and N such that L(n) >= C*g(n) for n > N. The lower bound of an algorithm is shown by the asymptotic notation called Big Omega (or just Omega).

Upper Bound:
Let U(n) be the running time of an algorithm A. Then g(n) is an upper bound of A if there exist two constants C and N such that U(n) <= C*g(n) for n > N. The upper bound of an algorithm is shown by the asymptotic notation called Big Oh (O) (or just Oh).

1. Lower Bound Theory:
According to the lower bound theory, for a lower bound L(n) of a problem, it is not possible to have any algorithm (for that common problem) whose time complexity is less than L(n) for random input. Also, every algorithm must take at least L(n) time in the worst case. Note that L(n) here is the minimum, over all possible algorithms, of their maximum complexity.

The lower bound is very important for any algorithm. Once we have calculated it, we can compare it with the actual complexity of the algorithm, and if their orders are the same, we can declare our algorithm optimal. So in this section we will be discussing techniques for finding the lower bound of an algorithm. Note that our main motive is to get an optimal algorithm, which is one having its upper bound the same as its lower bound (U(n) = L(n)). Merge sort is a common example of an optimal algorithm.

Trivial Lower Bound:
It is the easiest method to find the lower bound. Lower bounds that can be easily observed from the number of inputs taken and the number of outputs produced are called trivial lower bounds.

1.5 HASH FUNCTION

A hash function is a mathematical function that takes an input (or "message") and
produces a fixed-size string of characters, often in the form of a hash value or hash
code. This output is typically a sequence of numbers and letters, and the same input
always produces the same output. Hash functions are widely used in computer
science, cryptography, and data processing.

Characteristics of a Good Hash Function

Deterministic: The same input always yields the same output.
Fast Computation: Hashing the input should be computationally efficient.
Uniform Distribution: Outputs (hash values) should be evenly distributed to minimize collisions.
Pre-image Resistance: Given a hash output, it should be computationally infeasible to determine the original input.
Collision Resistance: It should be hard to find two different inputs that produce the same hash value.
Avalanche Effect: A small change in input should drastically change the hash value.

Applications of Hash Functions

1. Data Integrity: Ensures data has not been altered during transmission or storage by comparing hash values.
2. Cryptography: Used in digital signatures, password hashing, and secure communication protocols.
3. Hash Tables: Enables fast data retrieval by mapping keys to unique indices.
4. Blockchain: Verifies transactions and maintains chain integrity by linking blocks with hash values.
5. Checksums: Verifies file integrity during downloads or transfers.
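
As a concrete (non-cryptographic) example, the widely used djb2 string hash is deterministic and fast, and spreads keys reasonably well over a table. The sketch below reduces the hash modulo a table size for hash-table use; TABLE_SIZE is an assumption of ours for the example:

#include <stdio.h>

#define TABLE_SIZE 101   /* assumed table size for this example */

/* djb2: h = h*33 + c, starting from 5381 */
unsigned long djb2(const char *s)
{
    unsigned long h = 5381;
    for (; *s; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

int main(void)
{
    const char *key = "algorithm";
    /* map the key to an index in [0, TABLE_SIZE) */
    printf("djb2(\"%s\") mod %d = %lu\n", key, TABLE_SIZE,
           djb2(key) % TABLE_SIZE);
    return 0;
}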

1.6 Searching
Searching is the process of fetching a specific element in a collection of elements. The collection can be an array or a linked list. If you find the element in the list, the process is considered successful, and it returns the location of that element.
The following search strategies are extensively used to find a specific item in a list; the algorithm chosen is determined by the list's organization.

1. Linear Search
2. Binary Search
3. Interpolation Search

1.6.1 Linear Search

Linear search, often known as sequential search, is the most basic search technique. In this type of search, we go through the entire list and try to fetch a match for a single element. If we find a match, then the address of the matching target element is returned. On the other hand, if the element is not found, then a NULL value is returned. Following is a step-by-step approach employed to perform the linear search algorithm.

The procedure for implementing linear search is as follows:

Step 1: First, read the search element (target element) of the array.
Step 2: Compare the search element with the first element in the array.
Step 3: If both match, display "Target element is found" and terminate the linear search function.
Step 4: If both do not match, compare the search element with the next element in the array.
Step 5: Repeat steps 3 and 4 until the search (target) element is compared with the last element of the array.
Step 6: If the last element in the list does not match either, the linear search function is terminated and the message "Element is not found" is displayed.

Linear Search (Array Arr, Value a)    // Arr is the name of the array, and a is the searched element
Step 1: Set i to 0                    // i is the index of the array, which starts from 0
Step 2: If i >= n, then go to step 7  // n is the number of elements in the array
Step 3: If Arr[i] = a, then go to step 6
Step 4: Set i to i + 1
Step 5: Go to step 2
Step 6: Print "element a found at index i" and go to step 8
Step 7: Print "element not found"
Step 8: Exit

Algorithm and Pseudocode of the Linear Search Algorithm

Pseudocode of the Linear Search Algorithm:

linear_search (array, value)
    for each element in the array
        if (element == value)
            return the element's location
        end if
    end for
end
Example of the Linear Search Algorithm
Consider an array of size 7 with elements 13, 9, 21, 15, 39, 19, and 27; indices start at 0 and end at size minus one, 6.
Search element = 39

Step 1: The searched element 39 is compared to the first element of the array, which is 13. The match is not found, so you move on to the next element and try to implement a comparison.
Step 2: Now, search element 39 is compared to the second element of the array, 9. As they do not match, you move on to the next element.
Step 3: Now, search element 39 is compared with the third element, which is 21. Again, the elements do not match, so you move on to the following element.
Step 4: Next, search element 39 is compared with the fourth element, which is 15. As they do not match, you move on to the next element.
Step 5: Next, search element 39 is compared with the fifth element, 39. A perfect match is found, so display "element found at location 4".
The Complexity of the Linear Search Algorithm
Three different complexities are encountered while performing the linear search algorithm:
Best Case
Worst Case
Average Case

Best Case Complexity
The element being searched could be found in the first position. In this case, the search ends with a single successful comparison. Thus, in the best-case scenario, the linear search algorithm performs O(1) operations.

Worst Case Complexity
The element being searched may be at the last position in the array or not present at all. In the first case, the search succeeds in n comparisons. In the second case, the search fails after n comparisons. Thus, in the worst-case scenario, the linear search algorithm performs O(n) operations.

Average Case Complexity
When the element to be searched is in the middle of the array, the average case of the linear search algorithm is O(n).

Space Complexity of the Linear Search Algorithm
The linear search algorithm takes up no extra space; its auxiliary space complexity is O(1), while the input array of n elements itself occupies O(n).

Applications of the Linear Search Algorithm
The linear search algorithm has the following applications:
• Linear search can be applied to both single-dimensional and multi-dimensional arrays.
• Linear search is easy to implement and effective when the array contains only a few elements.
• Linear search is also efficient when the search is performed to fetch a single item in an unordered list.

Code Implementation of the Linear Search Algorithm

#include <stdio.h>

int main()
{
    int array[50], i, target, num;

    printf("How many elements do you want in the array? ");
    scanf("%d", &num);
    printf("Enter array elements: ");
    for (i = 0; i < num; ++i)
        scanf("%d", &array[i]);
    printf("Enter element to search: ");
    scanf("%d", &target);
    for (i = 0; i < num; ++i)
        if (array[i] == target)
            break;
    if (i < num)
        printf("Target element found at location %d\n", i);
    else
        printf("Target element not found\n");
    return 0;
}
1.6.2 Binary Search

Binary search is a search technique that works efficiently on sorted lists. Hence, to search an element in some list using the binary search technique, we must ensure that the list is sorted.
Binary search follows the divide and conquer approach, in which the list is divided into two halves, and the item is compared with the middle element of the list. If a match is found, then the location of the middle element is returned. Otherwise, we search in one of the two halves depending upon the result produced through the match.
NOTE: Binary search can be implemented only on sorted array elements. If the list elements are not arranged in sorted order, we first have to sort them.

Algorithm
Binary_Search(a, lower_bound, upper_bound, val)
// 'a' is the given array, 'lower_bound' is the index of the first array element,
// 'upper_bound' is the index of the last array element, 'val' is the value to search
Step 1: Set beg = lower_bound, end = upper_bound, pos = -1
Step 2: Repeat steps 3 and 4 while beg <= end
Step 3: Set mid = (beg + end)/2
Step 4: If a[mid] = val
            set pos = mid
            print pos
            go to step 6
        else if a[mid] > val
            set end = mid - 1
        else
            set beg = mid + 1
        [end of if]
        [end of loop]
Step 5: If pos = -1
            print "value is not present in the array"
        [end of if]
Step 6: Exit

Procedure binary_search
    A ← sorted array
    n ← size of array
    x ← value to be searched
    Set lowerBound = 1
    Set upperBound = n
    while x not found
        if upperBound < lowerBound
            EXIT: x does not exist.
        set midPoint = lowerBound + (upperBound - lowerBound) / 2
        if A[midPoint] < x
            set lowerBound = midPoint + 1
        if A[midPoint] > x
            set upperBound = midPoint - 1
        if A[midPoint] = x
            EXIT: x found at location midPoint
    end while
end procedure

Working of Binary Search

To understand the working of the binary search algorithm, let's take a sorted array. It will be easy to understand the working of binary search with an example.
There are two methods to implement the binary search algorithm:
o Iterative method
o Recursive method
The recursive method of binary search follows the divide and conquer approach.
Let the elements of the sorted array be a[0] through a[8] (9 elements), and let the element to search be K = 56.

We have to use the below formula to calculate the mid of the array:
mid = (beg + end)/2
So, in the given array:
beg = 0
end = 8
mid = (0 + 8)/2 = 4. So, 4 is the mid index of the array.

If a[mid] is smaller than K, the search continues in the right half (beg = mid + 1); if it is greater, the search continues in the left half (end = mid - 1). The mid index is recomputed for the reduced range on each pass until a[mid] = K. The element to search is then found, and the algorithm returns the index of the matched element.
Binary Search Complexity
Now, let's see the time complexity of binary search in the best case, average case, and worst case. We will also see the space complexity of binary search.

Time Complexity

Case           Time Complexity
Best Case      O(1)
Average Case   O(log n)
Worst Case     O(log n)

Best Case Complexity - In binary search, the best case occurs when the element to search is found in the first comparison, i.e., when the first middle element itself is the element to be searched. The best-case time complexity of binary search is O(1).
Average Case Complexity - The average-case time complexity of binary search is O(log n).
Worst Case Complexity - In binary search, the worst case occurs when we have to keep reducing the search space until it has only one element. The worst-case time complexity of binary search is O(log n).

Space Complexity
The space complexity of binary search is O(1).
Implementation of Binary Search
Program: Write a program to implement binary search in the C language.

#include <stdio.h>
int binarySearch(int a[], int beg, int end, int val)
{
    int mid;
    if (end >= beg)
    {
        mid = (beg + end) / 2;
        /* if the item to be searched is present at the middle */
        if (a[mid] == val)
        {
            return mid + 1;
        }
        /* if the item to be searched is greater than the middle, it can only be in the right subarray */
        else if (a[mid] < val)
        {
            return binarySearch(a, mid + 1, end, val);
        }
        /* if the item to be searched is smaller than the middle, it can only be in the left subarray */
        else
        {
            return binarySearch(a, beg, mid - 1, val);
        }
    }
    return -1;   /* value not found */
}

int main() {
    int a[] = {11, 14, 25, 30, 40, 41, 52, 57, 70}; // given array
    int val = 40; // value to be searched
    int n = sizeof(a) / sizeof(a[0]); // size of array
    int res = binarySearch(a, 0, n - 1, val); // store result
    printf("The elements of the array are - ");
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\nElement to be searched is - %d", val);
    if (res == -1)
        printf("\nElement is not present in the array");
    else
        printf("\nElement is present at %d position of array", res);
    return 0;
}

Output

The elements of the array are - 11 14 25 30 40 41 52 57 70
Element to be searched is - 40
Element is present at 5 position of array

1.6.3 Interpolation Search

Interpolation search is an improved variant of binary search. This search algorithm works on the probing position of the required value. For this algorithm to work properly, the data collection should be in sorted form and uniformly distributed.
Binary search has a huge advantage in time complexity over linear search: linear search has a worst-case complexity of Ο(n), whereas binary search has Ο(log n).
There are cases where the location of the target data may be known in advance. For example, in the case of a telephone directory, if we want to search the telephone number of Morphius, linear search and even binary search will seem slow, as we can directly jump to the memory space where the names starting with 'M' are stored.

Position Probing in Interpolation Search
Interpolation search finds a particular item by computing the probe position. Initially, the probe position is the position of the middle-most item of the collection. If a match occurs, then the index of the item is returned. To split the list into two parts, we use the following method:

mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])

where:
A = list
Lo = lowest index of the list
Hi = highest index of the list
A[n] = value stored at index n in the list

If the target item is greater than the probed middle item, then the probe position is again calculated in the sub-array to the right of the middle item. Otherwise, the item is searched in the sub-array to the left of the middle item. This process continues on the sub-array until the size of the sub-array reduces to zero.
The runtime complexity of the interpolation search algorithm is Ο(log(log n)), as compared to the Ο(log n) of binary search, in favorable situations.
Algorithm
As it is an improvisation of the existing binary search algorithm, we list the steps to search the 'target' data value index, using position probing:

Step 1 − Start searching data from the middle of the list.
Step 2 − If it is a match, return the index of the item and exit.
Step 3 − If it is not a match, compute the probe position.
Step 4 − Divide the list using the probing formula and find the new middle.
Step 5 − If the data is greater than the middle, search in the higher sub-list.
Step 6 − If the data is smaller than the middle, search in the lower sub-list.
Step 7 − Repeat until a match is found.

Pseudocode
A → array list
N → size of A
X → target value

Procedure Interpolation_Search()
    Set Lo → 0
    Set Mid → -1
    Set Hi → N-1
    While X does not match
        if Lo equals Hi OR A[Lo] equals A[Hi]
            EXIT: Failure, target not found
        end if
        Set Mid = Lo + ((Hi - Lo) / (A[Hi] - A[Lo])) * (X - A[Lo])
        if A[Mid] = X
            EXIT: Success, target found at Mid
        else if A[Mid] < X
            Set Lo to Mid+1
        else if A[Mid] > X
            Set Hi to Mid-1
        end if
    End While
End Procedure

Implementation of interpolation search in C

#include <stdio.h>
#define MAX 10

// array of items on which interpolation search will be conducted
int list[MAX] = { 10, 14, 19, 26, 27, 31, 33, 35, 42, 44 };

int find(int data) {
    int lo = 0;
    int hi = MAX - 1;
    int mid = -1;
    int comparisons = 1;
    int index = -1;
    while (lo <= hi) {
        printf("\nComparison %d\n", comparisons);
        printf("lo : %d, list[%d] = %d\n", lo, lo, list[lo]);
        printf("hi : %d, list[%d] = %d\n", hi, hi, list[hi]);
        comparisons++;
        // probe the mid point
        mid = lo + (((double)(hi - lo) / (list[hi] - list[lo])) * (data - list[lo]));
        printf("mid = %d\n", mid);
        // data found
        if (list[mid] == data) {
            index = mid;
            break;
        } else {
            if (list[mid] < data) {
                // if data is larger, data is in the upper half
                lo = mid + 1;
            } else {
                // if data is smaller, data is in the lower half
                hi = mid - 1;
            }
        }
    }
    printf("\nTotal comparisons made: %d", --comparisons);
    return index;
}

int main() {
    // find the location of 33
    int location = find(33);
    // if the element was found
    if (location != -1)
        printf("\nElement found at location: %d", (location + 1));
    else
        printf("Element not found.");
    return 0;
}
If we compile and run the above program, it will produce the following result:

Output
Comparison 1
lo : 0, list[0] = 10
hi : 9, list[9] = 44
mid = 6

Total comparisons made: 1
Element found at location: 7


Time Complexity
Best case - O(1):
The best case occurs when the target is found exactly at the first expected position computed using the formula. As we perform only one comparison, the time complexity is O(1).
Worst case - O(n):
The worst case occurs when the given data set is exponentially distributed.
Average case - O(log(log n)):
If the data set is sorted and uniformly distributed, then it takes O(log(log n)) time, as on average log(log n) comparisons are made.

Space Complexity
O(1), as no extra space is required.

1.7 Pattern Search

Pattern searching algorithms are used to find a pattern or substring in another, bigger string. There are different algorithms; the main goal in designing these types of algorithms is to reduce the time complexity. The traditional approach may take a lot of time to complete the pattern searching task for a longer text.
Here we will see different algorithms to get better performance of pattern matching. In this section we are going to cover:

Aho-Corasick Algorithm
Anagram Pattern Search
Bad Character Heuristic
Boyer-Moore Algorithm
Efficient Construction of Finite Automata
Kasai's Algorithm
Knuth-Morris-Pratt Algorithm
Manacher's Algorithm
Naive Pattern Searching
Rabin-Karp Algorithm
Suffix Array
Trie of All Suffixes
Z Algorithm

1.7.1 Naïve Pattern Searching

Naïve pattern searching is the simplest method among the pattern searching algorithms. It checks every character of the main string against the pattern. This algorithm is helpful for smaller texts. It does not need any pre-processing phase. We can find the substring by checking the string once. It also does not occupy extra space to perform the operation.
The time complexity of the naïve pattern search method is O(m*n), where m is the size of the pattern and n is the size of the main string.

Input and Output
Input:
Main String: "ABAAABCDBBABCDDEBCABC", pattern: "ABC"
Output:
Pattern found at position: 4
Pattern found at position: 10
Pattern found at position: 18

Algorithm
naive_algorithm(pattern, text)
Input − the text and the pattern
Output − locations where the pattern is present in the text
Start
    pat_len := pattern size
    str_len := string size
    for i := 0 to (str_len - pat_len), do
        for j := 0 to pat_len, do
            if text[i+j] ≠ pattern[j], then break
        if j == pat_len, then
            display the position i, as the pattern is found there
End

Implementation in C

#include <stdio.h>
#include <string.h>

int main() {
    char txt[] = "tutorialsPointisthebestplatformforprogrammers";
    char pat[] = "a";
    int M = strlen(pat);
    int N = strlen(txt);
    for (int i = 0; i <= N - M; i++) {
        int j;
        for (j = 0; j < M; j++)
            if (txt[i + j] != pat[j])
                break;
        if (j == M)
            printf("Pattern matches at index %d\n", i);
    }
    return 0;
}

Output
Pattern matches at index 6
Pattern matches at index 25
Pattern matches at index 39
1.7.2 Rabin-Karp Pattern Matching

Rabin-Karp is another pattern searching algorithm. It is the string matching algorithm proposed by Rabin and Karp to find a pattern in a more efficient way. Like the naïve algorithm, it also checks the pattern by moving the window one position at a time, but without checking all characters in all cases: it first compares hash values. Only when the hash value matches does it proceed to check each character. In this way, there is only one full comparison per matching text subsequence, making it a more efficient algorithm for pattern searching.
Preprocessing time: O(m).
The time complexity of the Rabin-Karp algorithm is O(m+n) on average, but O(mn) in the worst case.
Algorithm
rabinkarp_algo(text, pattern, prime)
Input − the main text and the pattern, plus a prime number used for hash locations
Output − locations where the pattern is found
Start
    pat_len := pattern length
    str_len := string length
    patHash := 0 and strHash := 0, h := 1
    maxChar := total number of characters in the character set
    for index i of all characters in the pattern, do
        h := (h*maxChar) mod prime
    for all character indexes i of the pattern, do
        patHash := (maxChar*patHash + pattern[i]) mod prime
        strHash := (maxChar*strHash + text[i]) mod prime
    for i := 0 to (str_len - pat_len), do
        if patHash = strHash, then
            for charIndex := 0 to pat_len - 1, do
                if text[i+charIndex] ≠ pattern[charIndex], then break
            if charIndex = pat_len, then
                print the location i as pattern found at position i
        if i < (str_len - pat_len), then
            strHash := (maxChar*(strHash - text[i]*h) + text[i+pat_len]) mod prime
            if strHash < 0, then
                strHash := strHash + prime
End

Example: For string matching, working modulo q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 31415926535…?
T = 31415926535…, P = 26. Here T.length = 11, so q = 11, and P mod q = 26 mod 11 = 4. Now find the windows whose hash equals 4: the length-2 windows of T are 31, 14, 41, 15, 59, 92, 26, 65, 53, 35, and of these 15, 59, 92 and 26 hash to 4. Only 26 is an exact match, so the matcher encounters 3 spurious hits.

Implementation in C
#include <stdio.h>
#include <string.h>

int main() {
    char txt[80], pat[80];
    int q;
    printf("Enter the container string: ");
    scanf("%s", txt);
    printf("Enter the pattern to be searched: ");
    scanf("%s", pat);
    int d = 256;   /* size of the character set */
    printf("Enter a prime number: ");
    scanf("%d", &q);
    int M = strlen(pat);
    int N = strlen(txt);
    int i, j;
    int p = 0;   /* hash value of the pattern */
    int t = 0;   /* hash value of the current text window */
    int h = 1;
    /* h = d^(M-1) mod q */
    for (i = 0; i < M - 1; i++)
        h = (h * d) % q;
    /* initial hash values of the pattern and the first window */
    for (i = 0; i < M; i++) {
        p = (d * p + pat[i]) % q;
        t = (d * t + txt[i]) % q;
    }
    for (i = 0; i <= N - M; i++) {
        if (p == t) {   /* hash hit: verify character by character */
            for (j = 0; j < M; j++) {
                if (txt[i + j] != pat[j])
                    break;
            }
            if (j == M)
                printf("Pattern found at index %d\n", i);
        }
        if (i < N - M) {   /* rolling hash for the next window */
            t = (d * (t - txt[i] * h) + txt[i + M]) % q;
            if (t < 0)
                t = t + q;
        }
    }
    return 0;
}
Output
Enter the container string: tutorials point is the best programming website
Enter the pattern to be searched: p
Enter a prime number: 3
Pattern found at index 8
Pattern found at index 21

In this problem, we are given two strings, a text and a pattern. Our task is to create a program for the KMP algorithm for pattern search; it will find all the occurrences of the pattern in the text string.
Let's take an example to understand the problem.
Input
text = “xyztrwqxyzfg”, pattern = “xyz”
Output
Found at index 0
Found at index 7
Here, we will discuss the solution to the problem using the KMP (Knuth-Morris-Pratt) pattern searching algorithm. It uses a preprocessing array built from the pattern, which is used during matching in the text and helps in the case where matching characters are followed by a character of the text that does not match the pattern.
We will preprocess the pattern to create an array that contains the proper prefixes and suffixes of the pattern, which helps in skipping over mismatches.
Program for KMP Algorithm for Pattern Searching
// C Program for KMP Algorithm for Pattern Searching
#include <stdio.h>
#include <string.h>

// build the prefix-suffix (lps) array for the pattern
void prefixSuffixArray(char* pat, int M, int* pps) {
    int length = 0;
    pps[0] = 0;
    int i = 1;
    while (i < M) {
        if (pat[i] == pat[length]) {
            length++;
            pps[i] = length;
            i++;
        } else {
            if (length != 0)
                length = pps[length - 1];
            else {
                pps[i] = 0;
                i++;
            }
        }
    }
}

void KMPAlgorithm(char* text, char* pattern) {
    int M = strlen(pattern);
    int N = strlen(text);
    int pps[M];
    prefixSuffixArray(pattern, M, pps);
    int i = 0;   // index for text
    int j = 0;   // index for pattern
    while (i < N) {
        if (pattern[j] == text[i]) {
            j++;
            i++;
        }
        if (j == M) {
            printf("Found pattern at index %d\n", i - j);
            j = pps[j - 1];
        } else if (i < N && pattern[j] != text[i]) {
            if (j != 0)
                j = pps[j - 1];   // fall back in the pattern, not the text
            else
                i = i + 1;
        }
    }
}

int main() {
    char text[] = "xyztrwqxyzfg";
    char pattern[] = "xyz";
    printf("The pattern is found in the text at the following index:\n");
    KMPAlgorithm(text, pattern);
    return 0;
}
Output
The pattern is found in the text at the following index:
Found pattern at index 0
Found pattern at index 7
1.8 Sorting
Sorting is a fundamental operation in data structures and algorithms. It involves
arranging elements of a list or array in a specific order, usually ascending or
descending. Sorting is essential for optimizing search operations, organizing data,
and improving the efficiency of algorithms.

1.8.1 Insertion sort


Insertion sort works similar to the sorting of playing cards in hand. It is assumed that the first card is already sorted in the card game, and then we select an unsorted card. If the selected unsorted card is greater than the first card, it will be placed at the right side; otherwise, it will be placed at the left side. Similarly, all unsorted cards are taken and put in their exact place.

The same approach is applied in insertion sort. The idea behind insertion sort is to take one element at a time and insert it into its place within the already sorted part of the array. Although it is simple to use, it is not appropriate for large data sets, as the time complexity of insertion sort in the average case and worst case is O(n²), where n is the number of items. Insertion sort is less efficient than other sorting algorithms like heap sort, quick sort, merge sort, etc.

Algorithm
The simple steps of achieving the insertion sort are listed as follows -
Step 1 - If the element is the first element, assume that it is already sorted.
Step 2 - Pick the next element, and store it separately in a key.
Step 3 - Now, compare the key with all elements in the sorted array.
Step 4 - If the element in the sorted array is smaller than the current element, then move to the next element. Else, shift greater elements in the array towards the right.
Step 5 - Insert the value.
Step 6 - Repeat until the array is sorted.

Working of Insertion sort Algorithm
To understand the working of the insertion sort algorithm, let's take an unsorted array. It will be easier to understand insertion sort via an example.
Let the elements of the array be: 12, 31, 25, 8, 32, 17.
Initially, the first two elements are compared. Here, 31 is greater than 12. That means both elements are already in ascending order. So, for now, 12 is stored in a sorted sub-array.
Now, move to the next two elements and compare them. Here, 25 is smaller than 31. So, 31 is not at the correct position. Now, swap 31 with 25. Along with swapping, insertion sort will also check it against all elements in the sorted array. For now, the sorted array has only one element, i.e. 12. So, 25 is greater than 12. Hence, the sorted array remains sorted after swapping.
Now, the two elements in the sorted array are 12 and 25. Move forward to the next elements, 31 and 8. Both 31 and 8 are not sorted, so swap them. After swapping, elements 25 and 8 are unsorted, so swap them. Now, elements 12 and 8 are unsorted, so swap them too.
Now, the sorted array has three items: 8, 12 and 25. Move to the next items, 31 and 32. They are already sorted, so the sorted array now includes 8, 12, 25 and 31.
Move to the next elements, 32 and 17. 17 is smaller than 32, so swap them. Swapping makes 31 and 17 unsorted, so swap them too. Now, swapping makes 25 and 17 unsorted, so perform swapping again.
Now, the array is completely sorted.

Insertion sort complexity

1. Time Complexity

Case             Time Complexity
Best Case        O(n)
Average Case     O(n²)
Worst Case       O(n²)

o Best Case Complexity - It occurs when there is no sorting required, i.e. the array is already sorted. The best-case time complexity of insertion sort is O(n).
o Average Case Complexity - It occurs when the array elements are in jumbled order, neither properly ascending nor properly descending. The average case time complexity of insertion sort is O(n²).
o Worst Case Complexity - It occurs when the array elements are required to be sorted in reverse order. That means suppose you have to sort the array elements in ascending order, but its elements are in descending order. The worst-case time complexity of insertion sort is O(n²).

2. Space Complexity

Space Complexity    O(1)
Stable              YES

o The space complexity of insertion sort is O(1). It is because, in insertion sort, an extra variable is required for swapping.
Implementation of insertion sort

Program: Write a program to implement insertion sort in C language.

#include <stdio.h>

/* function to sort an array with insertion sort */
void insert(int a[], int n)
{
    int i, j, temp;
    for (i = 1; i < n; i++)
    {
        temp = a[i];
        j = i - 1;
        /* Move the elements greater than temp one position
           ahead of their current position */
        while (j >= 0 && temp <= a[j])
        {
            a[j + 1] = a[j];
            j = j - 1;
        }
        a[j + 1] = temp;
    }
}

/* function to print the array */
void printArr(int a[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);
}

int main()
{
    int a[] = { 12, 31, 25, 8, 32, 17 };
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArr(a, n);
    insert(a, n);
    printf("\nAfter sorting array elements are - \n");
    printArr(a, n);
    return 0;
}
1.8.2 Heap Sort

Heap Sort Algorithm

Heap sort processes the elements by creating the min-heap or max-heap using the
elements of the given array. Min-heap or max-heap represents the ordering of array
in which the root element represents the minimum or maximum element of the
array.

Heap sort basically recursively performs two main operations -


o Build a heap H, using the elements of array.
o Repeatedly delete the root element of the heap formed in 1st phase.
A heap is a complete binary tree, and a binary tree is a tree in which a node can have at most two children. A complete binary tree is a binary tree in which all the levels except the last are completely filled, and the nodes of the last level are left-justified.
Heapsort is a popular and efficient sorting algorithm. The concept of heap sort is to eliminate the elements one by one from the heap part of the list, and then insert them into the sorted part of the list.

Algorithm
1. HeapSort(arr)
2.    BuildMaxHeap(arr)
3.    for i = length(arr) to 2
4.        swap arr[1] with arr[i]
5.        heap_size[arr] = heap_size[arr] - 1
6.        MaxHeapify(arr,1)
7. End

BuildMaxHeap(arr)
1. BuildMaxHeap(arr)
2.    heap_size(arr) = length(arr)
3.    for i = length(arr)/2 to 1
4.        MaxHeapify(arr,i)
5. End

MaxHeapify(arr,i)
1. MaxHeapify(arr,i)
2.    L = left(i)
3.    R = right(i)
4.    if L ≤ heap_size[arr] and arr[L] > arr[i]
5.        largest = L
6.    else
7.        largest = i
8.    if R ≤ heap_size[arr] and arr[R] > arr[largest]
9.        largest = R
10.   if largest != i
11.       swap arr[i] with arr[largest]
12.       MaxHeapify(arr,largest)
13. End
Working of Heap sort Algorithm
In heap sort, basically, there are two phases involved in the sorting of elements. By using the heap sort algorithm, they are as follows -

o The first step includes the creation of a heap by adjusting the elements of the array.
o After the creation of the heap, remove the root element of the heap repeatedly by shifting it to the end of the array, and then restore the heap structure with the remaining elements.

First, we have to construct a heap from the given array and convert it into a max heap.
Next, we have to delete the root element (89) from the max heap. To delete this node, we have to swap it with the last node, i.e. (11). After deleting the root element, we again have to heapify it to convert it into a max heap.
In the next step, we have to delete the root element (81) from the max heap. To delete this node, we have to swap it with the last node, i.e. (54). After deleting the root element, we again have to heapify it to convert it into a max heap.
In the next step, we have to delete the root element (76) from the max heap. To delete this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it into a max heap.
In the next step, we have to delete the root element (54) from the max heap. To delete this node, we have to swap it with the last node, i.e. (14). After deleting the root element, we again have to heapify it to convert it into a max heap.
In the next step, we have to delete the root element (22) from the max heap. To delete this node, we have to swap it with the last node, i.e. (11). After deleting the root element, we again have to heapify it to convert it into a max heap.
In the next step, we have to delete the root element (14) from the max heap. To delete this node, we have to swap it with the last node, i.e. (9). After deleting the root element, we again have to heapify it to convert it into a max heap.
In the next step, we have to delete the root element (11) from the max heap. To delete this node, we have to swap it with the last node, i.e. (9).
Now, the heap has only one element left (9). After deleting it, the heap will be empty.
After completion of sorting, the array elements are in ascending order: 9, 11, 14, 22, 54, 76, 81, 89.
Time complexity of Heap sort in the best case, average case, and worst case
1. Time Complexity

Case             Time Complexity
Best Case        O(n log n)
Average Case     O(n log n)
Worst Case       O(n log n)

o Best Case Complexity - It occurs when there is no sorting required, i.e. the array is already sorted. The best-case time complexity of heap sort is O(n log n).
o Average Case Complexity - It occurs when the array elements are in jumbled order, neither properly ascending nor properly descending. The average case time complexity of heap sort is O(n log n).
o Worst Case Complexity - It occurs when the array elements are required to be sorted in reverse order. That means suppose you have to sort the array elements in ascending order, but its elements are in descending order. The worst-case time complexity of heap sort is O(n log n).
The time complexity of heap sort is O(n log n) in all three cases (best case, average case, and worst case), since the height of a complete binary tree having n elements is log n.

2. Space Complexity

The space complexity of Heap sort is O(1).


Implementation of Heapsort
Program: Write a program to implement heap sort in C language.
#include <stdio.h>

/* function to heapify a subtree. Here 'i' is the
   index of the root node in array a[], and 'n' is the size of the heap. */
void heapify(int a[], int n, int i)
{
    int largest = i;        // Initialize largest as root
    int left = 2 * i + 1;   // left child
    int right = 2 * i + 2;  // right child
    // If left child is larger than root
    if (left < n && a[left] > a[largest])
        largest = left;
    // If right child is larger than largest so far
    if (right < n && a[right] > a[largest])
        largest = right;
    // If root is not largest
    if (largest != i) {
        // swap a[i] with a[largest]
        int temp = a[i];
        a[i] = a[largest];
        a[largest] = temp;
        heapify(a, n, largest);
    }
}

/* Function to implement the heap sort */
void heapSort(int a[], int n)
{
    // Build the max heap
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(a, n, i);
    // One by one extract an element from the heap
    for (int i = n - 1; i >= 0; i--) {
        /* Move current root element to end */
        // swap a[0] with a[i]
        int temp = a[0];
        a[0] = a[i];
        a[i] = temp;
        heapify(a, i, 0);
    }
}

/* function to print the array elements */
void printArr(int arr[], int n)
{
    for (int i = 0; i < n; ++i)
        printf("%d ", arr[i]);
}

int main()
{
    int a[] = {48, 10, 23, 43, 28, 26, 1};
    int n = sizeof(a) / sizeof(a[0]);
    printf("Before sorting array elements are - \n");
    printArr(a, n);
    heapSort(a, n);
    printf("\nAfter sorting array elements are - \n");
    printArr(a, n);
    return 0;
}
Output
Before sorting array elements are -
48 10 23 43 28 26 1
After sorting array elements are -
1 10 23 26 28 43 48
UNIT II - GRAPH ALGORITHMS
Representations of graphs - Graph traversal: DFS - BFS - Minimum spanning tree:
Kruskal’s and Prim’s algorithm - Shortest path: Bellman - Ford algorithm -
Dijkstra’s algorithm - Maximum flow: Flow networks - Ford-Fulkerson method -
Maximum bipartite matching.
2.1 Graph Representation

Graph algorithms are methods for solving problems related to graph data

structures, which consist of nodes (vertices) connected by edges. Graphs are

widely used in various fields, including computer networks, social networks,

transportation, and computational biology.

Definition

A graph G(V, E) is a non-linear data structure that consists of a set of vertices (nodes) V and a set of edges E connecting pairs of vertices.
There are 2 types of graphs:

Directed Undirected

Directed graph

A graph with only directed edges is said to be a directed graph. Example


The following directed graph has 5 vertices and 8 edges. This graph G can be defined as G = (V, E), where V = {A, B, C, D, E} and E = {(A,B), (A,C), (B,E), (B,D), (D,A), (D,E), (C,D), (D,D)}.

Directed Graph

Undirected graph

A graph with only undirected edges is said to be an undirected graph. Example


The following is an undirected graph.

Undirected Graph

Representation of Graphs
Graph data structure is represented using the following representations.

1. Adjacency Matrix
2. Adjacency List

Adjacency Matrix

In this representation, the graph can be represented using a matrix of size n x n, where n is the number of vertices.
This matrix is filled with either 1’s or 0’s.
Here, 1 represents that there is an edge from row vertex to column vertex, and
0 represents that there is no edge from row vertex to column vertex.

Directed graph representation

Adjacency list


In this representation, every vertex of the graph contains a list of its adjacent
vertices.
If the graph is not dense, i.e., the number of edges is small, then it is efficient to represent the graph through the adjacency list.

Adjacency List

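As a concrete illustration, the following minimal C sketch stores the directed graph G = (V, E) given above in both forms. The struct Node type, the addEdge helper and the mapping of vertex names A-E to indices 0-4 are assumptions made for this sketch, not part of the original text.

#include <stdio.h>
#include <stdlib.h>

#define V 5   /* vertices A..E mapped to indices 0..4 (assumed) */

/* adjacency-list node (hypothetical helper for this sketch) */
struct Node {
    int vertex;
    struct Node* next;
};

struct Node* adjList[V];   /* adjacency list: one chain per vertex */
int adjMatrix[V][V];       /* adjacency matrix: 1 = edge, 0 = no edge */

void addEdge(int u, int v) {   /* directed edge u -> v in both structures */
    adjMatrix[u][v] = 1;
    struct Node* n = malloc(sizeof(struct Node));
    n->vertex = v;
    n->next = adjList[u];
    adjList[u] = n;
}

int main(void) {
    /* the 8 edges of the directed graph G defined above */
    int edges[][2] = {{0,1},{0,2},{1,4},{1,3},{3,0},{3,4},{2,3},{3,3}};
    for (int i = 0; i < 8; i++)
        addEdge(edges[i][0], edges[i][1]);
    /* print the matrix row for vertex D (index 3): 1 0 0 1 1 */
    for (int j = 0; j < V; j++)
        printf("%d ", adjMatrix[3][j]);
    return 0;
}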
2.2 Graph traversals

Graph traversal is a technique used to search for a vertex in a graph. It is also


used to decide the order of vertices to be visited in the search process.
A graph traversal finds the edges to be used in the search process
without creating loops. This means that, with graph traversal, we can visit all the
vertices of the graph without getting into a looping path. There are two graph
traversal techniques:

DFS (Depth First Search)


BFS (Breadth-First Search)

Applications of graphs
Social network graphs : To tweet or not to tweet. Graphs that represent who knows
whom, who communicates with whom, who influences whom, or other
relationships in social structures. An example is the twitter graph of who follows
whom.
Graphs in epidemiology: Vertices represent individuals and directed edges to
view the transfer of an infectious disease from one individual to another.
Analyzing such graphs has become an important component in understanding
and controlling the spread of diseases.
Protein-protein interactions graphs: Vertices represent proteins and edges
represent interactions between them that carry out some biological function in the
cell. These graphs can be used to, for example, study molecular pathway—chains
of molecular interactions in a cellular process.
Network packet traffic graphs: Vertices are IP (Internet protocol) addresses and
edges are the packets that flow between them. Such graphs are used for analyzing
network security, studying the spread of worms, and tracking criminal or non-
criminal activity.
Neural networks: Vertices represent neurons and edges are the synapses between them. Neural networks are used to understand how our brain works and how connections change when we learn. The human brain has about 10¹¹ neurons and close to 10¹⁵ synapses.
2.2.1 BFS (Breadth First Search)
BFS traversal of a graph produces a spanning tree as its final result. A spanning tree is a graph without loops. We use a queue data structure with a maximum size equal to the total number of vertices in the graph to implement BFS traversal.
BFS Algorithm
For the BFS implementation, we will categorize each vertex of the graph into two
categories:
Visited
Not Visited
The reason for this division is to prevent the traversal of the same node again thus
avoiding cycles in the graph. Breadth First Traversal in Data Structure uses a queue to
keep track of the vertices to visit. By processing vertices in the order they were added to
the queue, BFS ensures that all the vertices at the current level are visited before moving
on to the next level. A boolean visited array is used to mark the visited vertices. This
leads to a breadth-first exploration of the graph or tree, hence the name "breadth-first
traversal."
Here's a step-by-step explanation of how the BFS algorithm works:
Start by selecting a starting vertex or node.
Add the starting vertex to the end of the queue.
Mark the starting vertex as visited and add it to the visited array/list.
While the queue is not empty, do the following steps:
Remove the front vertex from the queue.
Visit the removed vertex and process it.
Enqueue all the adjacent vertices of the removed vertex that have not been visited
yet.
Mark each visited adjacent vertex as visited and add it to the visited array/list.
Repeat step 4 until the queue is empty.
The graph might have two different disconnected parts so to make sure that we
cover every vertex, we can also run the BFS algorithm on every node.
Example

Step 1 − Initialize the queue.
Step 2 − We start by visiting S (the starting node), and mark it as visited.
Step 3 − We then see an unvisited adjacent node from S. In this example, we have three nodes, but alphabetically we choose A, mark it as visited and enqueue it.
Step 4 − Next, the unvisited adjacent node from S is B. We mark it as visited and enqueue it.
Step 5 − Next, the unvisited adjacent node from S is C. We mark it as visited and enqueue it.
Step 6 − Now, S is left with no unvisited adjacent nodes. So, we dequeue and find A.
Step 7 − From A we have D as an unvisited adjacent node. We mark it as visited and enqueue it.

BFS pseudocode

create a queue Q
mark v as visited and put v into Q
while Q is non-empty
    remove the head u of Q
    mark and enqueue all (unvisited) neighbours of u


BFS Algorithm Complexity
The time complexity of the BFS algorithm is represented in the form of O(V + E), where V is the number of nodes and E is the number of edges.
The space complexity of the algorithm is O(V).
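The pseudocode above translates directly into C. Below is a minimal sketch over an adjacency matrix with an array-based queue; the 6-vertex graph is an assumed example, not one from the text.

#include <stdio.h>

#define V 6   /* number of vertices (assumed for this sketch) */

void bfs(int adj[V][V], int start) {
    int visited[V] = {0};
    int queue[V], front = 0, rear = 0;
    visited[start] = 1;
    queue[rear++] = start;          /* enqueue the starting vertex */
    while (front < rear) {          /* until the queue is empty */
        int u = queue[front++];     /* dequeue the head */
        printf("%d ", u);
        for (int v = 0; v < V; v++)
            if (adj[u][v] && !visited[v]) {   /* unvisited neighbour */
                visited[v] = 1;
                queue[rear++] = v;
            }
    }
}

int main(void) {
    /* small undirected example graph (illustrative) */
    int adj[V][V] = {
        {0,1,1,0,0,0},
        {1,0,0,1,0,0},
        {1,0,0,1,1,0},
        {0,1,1,0,0,1},
        {0,0,1,0,0,1},
        {0,0,0,1,1,0}
    };
    bfs(adj, 0);   /* prints vertices in breadth-first order: 0 1 2 3 4 5 */
    return 0;
}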
Applications of Breadth First Search Algorithm

1. Minimum spanning tree for unweighted graphs: In Breadth-First Search we can reach from any given source vertex to another vertex with the minimum number of edges, and this principle can be used to find the minimum spanning tree, which is the tree covering all vertices via shortest paths.
2. Peer-to-peer networking: In peer-to-peer networking, to find the neighboring peer from any other peer, Breadth-First Search is used.
3. Crawlers in search engines: Search engines need to crawl the internet. To do so,
they can start from any source page, follow the links contained in that page in the
Breadth- First Search manner, and therefore explore other pages.
4. GPS navigation systems: To find locations within a given radius from any source
person, we can find all neighboring locations using the Breadth-First Search, and
keep on exploring until those are within the K radius.
5. Broadcasting in networks: While broadcasting from any source, we find all its
neighboring peers and continue broadcasting to them, and so on.
6. Path Finding: To find if there is a path between 2 vertices, we can take any vertex
as a source, and keep on traversing until we reach the destination vertex. If we
explore all vertices reachable from the source and cannot find the destination
vertex, then that means there is no path between these 2 vertices.
7. Finding all reachable Nodes from a given Vertex: All vertices that are reachable
from a given vertex can be found using the BFS approach in any disconnected
graph. The vertices that are marked as visited in the visited array after the BFS is
complete contain all those reachable vertices.

2.2.2 DFS (Depth First Traversal)


The depth-first search algorithm works by starting from an arbitrary node of the graph and exploring as far as possible before backtracking, i.e. moving to an adjacent node until there is no unvisited adjacent node. After backtracking, it repeats the same process for all the remaining vertices which have not been visited till now. It is a recursive algorithm for searching all the vertices of a graph or tree data structure.

DFS Algorithm

For the DFS implementation, we will categorize each vertex of the graph into
twocategories:

1. Visited
2. Not Visited

The reason for this division is to prevent the traversal of the same node again thus,
avoiding cycles in the graph. Depth First Traversal in Data Structure uses a stack
to keep track of the vertices to visit. A boolean visited array is used to mark the
visited vertices.

Here's a step-by-step explanation of how the DFS algorithm works:

1. Start by selecting a starting vertex or node.


2. Add the starting vertex on top of a stack.
3. Take the top item of the stack and add it to the visited list/array.
4. Create a list of that vertex's adjacent nodes. Add the ones that aren't in the visited list to the top of the stack.

Keep repeating steps 3 and 4 until the stack is empty. The graph might have two different disconnected parts, so to make sure that we cover every vertex, we can also run the DFS algorithm from every node.

The Depth First Search (DFS) algorithm traverses a graph in a depthward motion and uses a stack to remember the next vertex from which to start a search when a dead end occurs in any iteration.

As in the example given above, the DFS algorithm traverses from S to A to D to G to E to B first, then to F and lastly to C. It employs the following rules.
Rule 1 − Visit the adjacent unvisited vertex. Mark it as visited. Display it. Push
it in a stack.
Rule 2 − If no adjacent vertex is found, pop up a vertex from the stack. (It will pop up all the vertices from the stack which do not have adjacent vertices.)
Rule 3 − Repeat Rule 1 and Rule 2 until the stack is empty.

Step 1 − Initialize the stack.
Step 2 − Mark S as visited and put it onto the stack. Explore any unvisited adjacent node from S. We have three nodes and we can pick any of them. For this example, we shall take the nodes in alphabetical order.
Step 3 − Mark A as visited and put it onto the stack. Explore any unvisited adjacent node from A. Both S and D are adjacent to A, but we are concerned with unvisited nodes only.
Step 4 − Visit D, mark it as visited and put it onto the stack. Here, we have B and C nodes, which are adjacent to D and both are unvisited. However, we shall again choose in alphabetical order.
Step 5 − We choose B, mark it as visited and put it onto the stack. Here B does not have any unvisited adjacent node. So, we pop B from the stack.
Step 6 − We check the stack top to return to the previous node and check if it has any unvisited nodes. Here, we find D to be on the top of the stack.
Step 7 − The only unvisited adjacent node from D is C now. So we visit C, mark it as visited and put it onto the stack.
Pseudocode of DFS
DFS(G, u)
    u.visited = true
    for each v ∈ G.Adj[u]
        if v.visited == false
            DFS(G, v)

init() {
    For each u ∈ G
        u.visited = false
    For each u ∈ G
        DFS(G, u)
}
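Here is a minimal recursive DFS sketch in C corresponding to the pseudocode above; the adjacency matrix and vertex count are assumed for illustration. Note the loop in main that restarts DFS from every unvisited vertex, which covers disconnected graphs.

#include <stdio.h>

#define V 6   /* number of vertices (assumed for this sketch) */

int visited[V];

void dfs(int adj[V][V], int u) {
    visited[u] = 1;
    printf("%d ", u);
    for (int v = 0; v < V; v++)
        if (adj[u][v] && !visited[v])   /* recurse into unvisited neighbours */
            dfs(adj, v);
}

int main(void) {
    int adj[V][V] = {   /* same illustrative graph as the BFS sketch */
        {0,1,1,0,0,0},
        {1,0,0,1,0,0},
        {1,0,0,1,1,0},
        {0,1,1,0,0,1},
        {0,0,1,0,0,1},
        {0,0,0,1,1,0}
    };
    /* cover disconnected parts by starting DFS from every unvisited vertex */
    for (int u = 0; u < V; u++)
        if (!visited[u])
            dfs(adj, u);
    return 0;
}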
Applications of Depth First Search Algorithm

1. Finding any path between two vertices u and v in the graph.


2. Finding the number of connected components present in a given undirected graph.
3. For doing topological sorting of a given directed graph.
4. Finding strongly connected components in the directed graph.
5. Finding cycles in the given graph.
6. To check if the given graph is bipartite.

Complexity Analysis of BFS and DFS Algorithms

            Time Complexity    Space Complexity
BFS         O(V + E)           O(V)
DFS         O(V + E)           O(V)

Here, V is the number of nodes and E is the number of edges.


2.3 Minimum Spanning Tree
A spanning tree is a tree which has V vertices and V-1 edges. All nodes in a spanning tree are reachable from each other.
A Minimum Spanning Tree (MST), or minimum weight spanning tree, for a weighted, connected, undirected graph is a spanning tree having a weight less than or equal to the weight of every other possible spanning tree. The weight of a spanning tree is the sum of the weights given to each edge of the spanning tree. In short, out of all spanning trees of a given graph, the spanning tree having minimum weight is the MST.

PROPERTIES OF THE MINIMUM SPANNING TREE

A minimum spanning tree of a graph is unique if the weights of all the edges are distinct. Otherwise, there may be multiple minimum spanning trees. (Specific algorithms typically output one of the possible minimum spanning trees.)
The minimum spanning tree is also the tree with the minimum product of edge weights. (This can be easily proved by replacing the weights of all edges with their logarithms.)
In a minimum spanning tree of a graph, the maximum weight of an edge is the minimum possible over all spanning trees of that graph. (This follows from the validity of Kruskal's algorithm.)
The maximum spanning tree (the spanning tree with the maximum sum of edge weights) of a graph can be obtained similarly to the minimum spanning tree, by changing the signs of the weights of all the edges to their opposites and then applying any minimum spanning tree algorithm.

Algorithms for finding Minimum Spanning Tree(MST):-


1. Prim’s Algorithm
2. Kruskal’s Algorithm
2.3.1 Prim’s Algorithm
Prim's algorithm is a minimum spanning tree algorithm that takes a graph as input and finds the subset of the edges of that graph which
forms a tree that includes every vertex
has the minimum sum of weights among all the trees that can be formed from the graph

How Prim's algorithm works


It falls under a class of algorithms called greedy algorithms that find the local optimum in the hope of finding a global optimum.
We start from one vertex and keep adding edges with the lowest weight until we reach our goal. The steps for implementing Prim's algorithm are as follows:
1. Initialize the minimum spanning tree with a vertex chosen at random.
2. Find all the edges that connect the tree to new vertices, find the minimum and add
it to the tree
3. Keep repeating step 2 until we get a minimum spanning tree

Example of Prim's algorithm
1. Start with a weighted graph.
2. Choose a vertex.
3. Choose the shortest edge from this vertex and add it.
4. Choose the nearest vertex not yet in the solution.
5. Choose the nearest edge not yet in the solution; if there are multiple choices, choose one at random.

Prim's Algorithm pseudocode

The pseudocode for Prim's algorithm shows how we create two sets of vertices, U and V-U. U contains the list of vertices that have been visited and V-U the list of vertices that haven't. One by one, we move vertices from set V-U to set U by connecting the least weight edge.

T = ∅;
U = { 1 };
while (U ≠ V)
    let (u, v) be the lowest cost edge such that u ∈ U and v ∈ V - U;
    T = T ∪ {(u, v)}
    U = U ∪ {v}

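A minimal C sketch of the O(V²) adjacency-matrix version of Prim's algorithm follows; the 5-vertex weighted graph is an assumed example, not one from the text.

#include <stdio.h>
#include <limits.h>

#define V 5   /* number of vertices (assumed for this sketch) */

/* O(V^2) Prim's algorithm over an adjacency matrix; 0 means "no edge" */
void primMST(int g[V][V]) {
    int key[V], parent[V], inMST[V] = {0};
    for (int i = 0; i < V; i++) { key[i] = INT_MAX; parent[i] = -1; }
    key[0] = 0;                         /* start from vertex 0 */
    for (int count = 0; count < V; count++) {
        int u = -1;
        for (int v = 0; v < V; v++)     /* pick the cheapest vertex not in the MST */
            if (!inMST[v] && (u == -1 || key[v] < key[u]))
                u = v;
        inMST[u] = 1;
        for (int v = 0; v < V; v++)     /* relax edges out of u */
            if (g[u][v] && !inMST[v] && g[u][v] < key[v]) {
                key[v] = g[u][v];
                parent[v] = u;
            }
    }
    for (int v = 1; v < V; v++)
        printf("%d - %d (weight %d)\n", parent[v], v, g[v][parent[v]]);
}

int main(void) {
    int g[V][V] = {   /* illustrative weighted undirected graph */
        {0, 2, 0, 6, 0},
        {2, 0, 3, 8, 5},
        {0, 3, 0, 0, 7},
        {6, 8, 0, 0, 9},
        {0, 5, 7, 9, 0}
    };
    primMST(g);   /* prints the 4 MST edges with total weight 16 */
    return 0;
}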
Complexity Analysis of Prim’s Algorithm:


Time Complexity: O(V²). If the input graph is represented using an adjacency list, then the time complexity of Prim's algorithm can be reduced to O(E * log V) with the help of a binary heap. In this implementation, we always consider the spanning tree to start from the root of the graph.
Auxiliary Space: O(V)
Advantages:
1. Prim's algorithm is guaranteed to find the MST in a connected, weighted graph.
2. It has a time complexity of O(E log V) using a binary heap or Fibonacci heap, where E is the number of edges and V is the number of vertices.
3. It is a relatively simple algorithm to understand and implement compared to some other MST algorithms.
Disadvantages:
1. Like Kruskal's algorithm, Prim's algorithm can be slow on dense graphs with many edges, as it requires iterating over all edges at least once.
2. Prim's algorithm relies on a priority queue, which can take up extra memory and slow down the algorithm on very large graphs.
3. The choice of starting node can affect the MST output, which may not be desirable in some applications.
Other Implementations of Prim’s Algorithm:
Given below are some other implementations of Prim’s Algorithm
Prim's Algorithm for Adjacency Matrix Representation – the method of implementing Prim's algorithm when the graph is represented by an adjacency matrix.
Prim's Algorithm for Adjacency List Representation – the implementation of Prim's algorithm for graphs represented by an adjacency list.
Prim's Algorithm using Priority Queue – a time-efficient approach to implement Prim's algorithm.

2.3.2 Kruskal Algorithm

Kruskal's algorithm is a minimum spanning tree algorithm that takes a graph as input and finds the subset of the edges of that graph which
form a tree that includes every vertex
has the minimum sum of weights among all the trees that can be formed from the graph
How Kruskal's algorithm works
It falls under a class of algorithms called greedy algorithms that find the local optimum in the hope of finding a global optimum.
We start from the edges with the lowest weight and keep adding edges until we reach our goal. The steps for implementing Kruskal's algorithm are as follows:
1. Sort all the edges from low weight to high.
2. Take the edge with the lowest weight and add it to the spanning tree. If adding the edge creates a cycle, then reject this edge.
3. Keep adding edges until we reach all vertices.

Example of Kruskal's algorithm
1. Start with a weighted graph.
2. Choose the edge with the least weight; if there are more than one, choose any one.
3. Choose the next shortest edge and add it.
4. Choose the next shortest edge that doesn't create a cycle and add it.
5. Choose the next shortest edge that doesn't create a cycle and add it.
6. Repeat until you have a spanning tree.

Kruskal Algorithm Pseudocode

KRUSKAL(G):
A = ∅
For each vertex v ∈ G.V:
    MAKE-SET(v)
For each edge (u, v) ∈ G.E, in increasing order of weight(u, v):
    if FIND-SET(u) ≠ FIND-SET(v):
        A = A ∪ {(u, v)}
        UNION(u, v)
return A

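The pseudocode can be realized in C with a simple disjoint-set (find-union) structure, as in the following sketch; the edge list, the fixed parent array size and the helper names are assumed for illustration.

#include <stdio.h>
#include <stdlib.h>

/* edge of a weighted undirected graph */
struct Edge { int u, v, w; };

int parent[100];   /* disjoint-set forest (size assumed for this sketch) */

int find(int x) {                      /* FIND-SET with path compression */
    if (parent[x] != x)
        parent[x] = find(parent[x]);
    return parent[x];
}

int cmp(const void* a, const void* b) {  /* sort edges by increasing weight */
    return ((struct Edge*)a)->w - ((struct Edge*)b)->w;
}

void kruskal(struct Edge edges[], int E, int V) {
    for (int i = 0; i < V; i++) parent[i] = i;   /* MAKE-SET */
    qsort(edges, E, sizeof(struct Edge), cmp);
    for (int i = 0; i < E; i++) {
        int ru = find(edges[i].u), rv = find(edges[i].v);
        if (ru != rv) {   /* edge joins two different trees: no cycle */
            parent[ru] = rv;   /* UNION */
            printf("%d - %d (weight %d)\n", edges[i].u, edges[i].v, edges[i].w);
        }
    }
}

int main(void) {
    struct Edge edges[] = { {0,1,4}, {0,2,4}, {1,2,2}, {2,3,3}, {3,4,3}, {2,4,4} };
    kruskal(edges, 6, 5);   /* illustrative graph: 5 vertices, 6 edges */
    return 0;
}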
Time Complexity: O(E log E) or O(E log V). Sorting the edges takes O(E log E) time. After sorting, we iterate through all edges and apply the find-union algorithm; the find and union operations can take at most O(log V) time. So the overall complexity is O(E log E + E log V) time. The value of E can be at most O(V²), so O(log V) and O(log E) are the same. Therefore, the overall time complexity is O(E log E) or O(E log V).
Auxiliary Space: O(V + E), where V is the number of vertices and E is the number of edges in the graph.
Applications of Kruskal's algorithm
Kruskal's algorithm has several applications, including: designing efficient network connections or cable layout planning; constructing efficient road networks or transportation systems; cluster analysis or data grouping; and approximate solutions to the traveling salesman problem.

2.4 Shortest Path Algorithm

The shortest path problem is about finding a path between vertices in a graph such that the total sum of the edge weights is minimum.

Algorithm for Shortest Path


1. Bellman Algorithm
2. Dijkstra Algorithm
3. Floyd Warshall Algorithm

2.4.1 Bellman Algorithm


The Bellman-Ford algorithm helps us find the shortest path from a vertex to all other vertices of a weighted graph.
It is similar to Dijkstra's algorithm, but it can work with graphs in which edges can have negative weights.
How Bellman Ford's algorithm works
Bellman Ford algorithm works by overestimating the length of the path from the
starting vertex to all other vertices. Then it iteratively relaxes those estimates by
finding new paths that are shorter than the previously overestimated paths.
By doing this repeatedly for all vertices, we can guarantee that the result is
optimized.

Bellman Ford Pseudocode
We need to maintain the path distance of every vertex. We can store that in an array of size v, where v is the number of vertices.
We also want to be able to get the shortest path, not only know the length of the shortest path. For this, we map each vertex to the vertex that last updated its path length.
Once the algorithm is over, we can backtrack from the destination vertex to the source vertex to find the path.

function bellmanFord(G, S)
    for each vertex V in G
        distance[V] <- infinite
        previous[V] <- NULL
    distance[S] <- 0

    for each vertex V in G
        for each edge (U, V) in G
            tempDistance <- distance[U] + edge_weight(U, V)
            if tempDistance < distance[V]
                distance[V] <- tempDistance
                previous[V] <- U

    for each edge (U, V) in G
        if distance[U] + edge_weight(U, V) < distance[V]
            Error: Negative Cycle Exists

    return distance[], previous[]

Bellman Ford's Complexity

Best Case Complexity        O(E)
Average Case Complexity     O(VE)
Worst Case Complexity       O(VE)

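A minimal C sketch of Bellman-Ford over an edge list follows; the 4-vertex graph with one negative edge is an assumed example. The final pass over all edges detects a negative cycle, as in the pseudocode above.

#include <stdio.h>
#include <limits.h>

struct Edge { int u, v, w; };

/* Bellman-Ford from source s over an edge list; returns 0 on a negative cycle */
int bellmanFord(struct Edge e[], int E, int V, int s, int dist[]) {
    for (int i = 0; i < V; i++) dist[i] = INT_MAX;
    dist[s] = 0;
    for (int i = 1; i < V; i++)        /* relax all edges V-1 times */
        for (int j = 0; j < E; j++)
            if (dist[e[j].u] != INT_MAX && dist[e[j].u] + e[j].w < dist[e[j].v])
                dist[e[j].v] = dist[e[j].u] + e[j].w;
    for (int j = 0; j < E; j++)        /* one more pass detects a negative cycle */
        if (dist[e[j].u] != INT_MAX && dist[e[j].u] + e[j].w < dist[e[j].v])
            return 0;
    return 1;
}

int main(void) {
    /* illustrative directed graph with one negative edge */
    struct Edge e[] = { {0,1,4}, {0,2,5}, {1,3,7}, {2,1,-3}, {3,2,2} };
    int dist[4];
    if (bellmanFord(e, 5, 4, 0, dist))
        for (int v = 0; v < 4; v++)
            printf("distance(0 -> %d) = %d\n", v, dist[v]);
    else
        printf("Negative cycle exists\n");
    return 0;
}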
2.4.2 Dijkstra Algorithm

Dijkstra's algorithm allows us to find the shortest path between any two vertices of a graph. It differs from the minimum spanning tree because the shortest path between two vertices might not include all the vertices of the graph.

How Dijkstra's Algorithm works


Dijkstra's algorithm works on the basis that any subpath B -> D of the shortest path A -> D between vertices A and D is also the shortest path between vertices B and D.

Each subpath is a shortest path.

Dijkstra used this property in the opposite direction: we overestimate the distance of each vertex from the starting vertex. Then we visit each node and its neighbors to find the shortest subpath to those neighbors.
The algorithm uses a greedy approach in the sense that we find the next best solution hoping that the end result is the best solution for the whole problem.

Example of Dijkstra's algorithm

It is easier to start with an example and then think about the algorithm.
1. Start with a weighted graph.
2. Choose a starting vertex and assign infinity path values to all other vertices.
3. Go to each vertex and update its path length.
4. If the path length of the adjacent vertex is less than the new path length, don't update it.
5. Avoid updating path lengths of already visited vertices.
6. After each iteration, we pick the unvisited vertex with the least path length. So we choose 5 before 7.
7. Notice how the rightmost vertex has its path length updated twice.
8. Repeat until all the vertices have been visited.
Dijkstra's algorithm pseudocode

We need to maintain the path distance of every vertex. We can store that in an array of size v, where v is the number of vertices.
We also want to be able to get the shortest path, not only know the length of the shortest path. For this, we map each vertex to the vertex that last updated its path length.
Once the algorithm is over, we can backtrack from the destination vertex to the source vertex to find the path.
A minimum priority queue can be used to efficiently receive the vertex with the least path distance.

function dijkstra(G, S)
    for each vertex V in G
        distance[V] <- infinite
        previous[V] <- NULL
        If V != S, add V to Priority Queue Q
    distance[S] <- 0

    while Q IS NOT EMPTY
        U <- Extract MIN from Q
        for each unvisited neighbour V of U
            tempDistance <- distance[U] + edge_weight(U, V)
            if tempDistance < distance[V]
                distance[V] <- tempDistance
                previous[V] <- U
    return distance[], previous[]
Dijkstra's Algorithm Complexity
Time Complexity: O(E log V), where E is the number of edges and V is the number of vertices.
Space Complexity: O(V)

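For illustration, here is a minimal O(V²) C sketch of Dijkstra's algorithm that picks the unvisited vertex with the least path length on each iteration (a simple linear scan stands in for the priority queue); the weighted graph is an assumed example.

#include <stdio.h>
#include <limits.h>

#define V 5   /* number of vertices (assumed for this sketch) */

/* O(V^2) Dijkstra over an adjacency matrix; 0 means "no edge" */
void dijkstra(int g[V][V], int s) {
    int dist[V], visited[V] = {0};
    for (int i = 0; i < V; i++) dist[i] = INT_MAX;
    dist[s] = 0;
    for (int count = 0; count < V; count++) {
        int u = -1;
        for (int v = 0; v < V; v++)   /* pick the unvisited vertex with least distance */
            if (!visited[v] && (u == -1 || dist[v] < dist[u]))
                u = v;
        if (dist[u] == INT_MAX) break;   /* remaining vertices are unreachable */
        visited[u] = 1;
        for (int v = 0; v < V; v++)   /* relax edges out of u */
            if (g[u][v] && !visited[v] && dist[u] + g[u][v] < dist[v])
                dist[v] = dist[u] + g[u][v];
    }
    for (int v = 0; v < V; v++)
        printf("distance(%d -> %d) = %d\n", s, v, dist[v]);
}

int main(void) {
    int g[V][V] = {   /* illustrative weighted graph */
        {0, 10, 0, 5, 0},
        {10, 0, 1, 2, 0},
        {0, 1, 0, 0, 4},
        {5, 2, 0, 0, 9},
        {0, 0, 4, 9, 0}
    };
    dijkstra(g, 0);
    return 0;
}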
2.4.3 Ford-Fulkerson algorithm


The Ford-Fulkerson algorithm is a widely used algorithm to solve the maximum flow problem in a flow network. The maximum flow problem involves determining the maximum amount of flow that can be sent from a source vertex to a sink vertex in a directed weighted graph, subject to capacity constraints on the edges.
The algorithm works by iteratively finding an augmenting path, which is a path from the source to the sink in the residual graph, i.e., the graph obtained by subtracting the current flow from the capacity of each edge. The algorithm then increases the flow along this path by the maximum possible amount, which is the minimum residual capacity of the edges along the path.
Problem:
Given a graph which represents a flow network where every edge has a capacity, and given two vertices, source 's' and sink 't', find the maximum possible flow from s to t with the following constraints:
Flow on an edge doesn't exceed the given capacity of the edge.
Incoming flow is equal to outgoing flow for every vertex except s and t.
For example, consider the following graph from the CLRS book.

The maximum possible flow in the above graph is 23.

Ford-Fulkerson Algorithm


The following is the simple idea of the Ford-Fulkerson algorithm:
Start with initial flow as 0.
While there exists an augmenting path from the source to the sink:
   Find an augmenting path using any path-finding algorithm, such as breadth-first search or depth-first search.
   Determine the amount of flow that can be sent along the augmenting path, which is the minimum residual capacity along the edges of the path.
   Increase the flow along the augmenting path by the determined amount.
Return the maximum flow.

Time Complexity: The time complexity of the above algorithm is O(max_flow * E). We run a loop while there is an augmenting path. In the worst case, we may add 1 unit of flow in every iteration. Therefore the time complexity becomes O(max_flow * E).

Ford-Fulkerson Algorithm

Initially, the flow value is 0. Find some augmenting path p and increase the flow f on each edge of p by the residual capacity cf(p). When no augmenting path exists, flow f is a maximum flow.

FORD-FULKERSON-METHOD (G, s, t)
1. Initialize flow f to 0
2. while there exists an augmenting path p
3.     do augment flow f along p
4. Return f

FORD-FULKERSON (G, s, t)
1. for each edge (u, v) ∈ E[G]
2.     do f[u, v] ← 0
3.        f[v, u] ← 0
4. while there exists a path p from s to t in the residual network Gf
5.     do cf(p) ← min { cf(u, v) : (u, v) is on p }
6.     for each edge (u, v) in p
7.         do f[u, v] ← f[u, v] + cf(p)
8.            f[v, u] ← -f[u, v]

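A compact C sketch of the Ford-Fulkerson method follows, using breadth-first search to find augmenting paths in the residual graph. The capacity matrix below is an assumed encoding of the CLRS flow network mentioned above (s = 0, t = 5), for which the maximum flow is 23.

#include <stdio.h>
#include <string.h>

#define V 6   /* number of vertices (assumed for this sketch) */

/* BFS over the residual graph; fills parent[] and returns 1 if t is reachable */
int bfs(int rGraph[V][V], int s, int t, int parent[]) {
    int visited[V] = {0}, queue[V], front = 0, rear = 0;
    queue[rear++] = s;
    visited[s] = 1;
    parent[s] = -1;
    while (front < rear) {
        int u = queue[front++];
        for (int v = 0; v < V; v++)
            if (!visited[v] && rGraph[u][v] > 0) {
                parent[v] = u;
                visited[v] = 1;
                queue[rear++] = v;
            }
    }
    return visited[t];
}

int fordFulkerson(int graph[V][V], int s, int t) {
    int rGraph[V][V], parent[V], maxFlow = 0;
    memcpy(rGraph, graph, sizeof(rGraph));   /* residual capacities */
    while (bfs(rGraph, s, t, parent)) {      /* augmenting path exists */
        int pathFlow = 1 << 30;
        for (int v = t; v != s; v = parent[v]) {   /* bottleneck capacity */
            int u = parent[v];
            if (rGraph[u][v] < pathFlow) pathFlow = rGraph[u][v];
        }
        for (int v = t; v != s; v = parent[v]) {   /* update residual graph */
            int u = parent[v];
            rGraph[u][v] -= pathFlow;
            rGraph[v][u] += pathFlow;
        }
        maxFlow += pathFlow;
    }
    return maxFlow;
}

int main(void) {
    int graph[V][V] = {
        {0, 16, 13, 0, 0, 0},
        {0, 0, 10, 12, 0, 0},
        {0, 4, 0, 0, 14, 0},
        {0, 0, 9, 0, 0, 20},
        {0, 0, 0, 7, 0, 4},
        {0, 0, 0, 0, 0, 0}
    };
    printf("The maximum possible flow is %d\n", fordFulkerson(graph, 0, 5));
    return 0;
}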
Example: Each directed edge is labeled with its capacity. Use the Ford-Fulkerson algorithm to find the maximum flow.

Solution: The left side of each part shows the residual network Gf with a shaded augmenting path p, and the right side of each part shows the net flow f.
2.5 MAXIMUM FLOW

It is defined as the maximum amount of flow that the network allows from source to sink. Multiple algorithms exist for solving the maximum flow problem. Two major algorithms for this kind of problem are the Ford-Fulkerson algorithm and Dinic's algorithm.

In optimization theory, maximum flow problems involve finding a feasible flow through a flow network that obtains the maximum possible flow rate.

The maximum flow problem can be seen as a special case of more complex network flow problems, such as the circulation problem. The maximum value of an s-t flow (i.e., flow from source s to sink t) is equal to the minimum capacity of an s-t cut (i.e., a cut severing s from t) in the network, as stated in the max-flow min-cut theorem.

2.6 Network Flow


A flow network is a directed graph that is used for modeling material flow. There are two distinguished vertices: a source, which produces material at some steady rate, and a sink, which consumes the material at the same constant rate. The flow of the material at any point in the system is the rate at which the material moves.
Some real-life problems, like the flow of liquids through pipes, the current through wires and the delivery of goods, can be modelled using flow networks.
Definition: A flow network is a directed graph G = (V, E) such that
1. For each edge (u, v) ∈ E, we associate a nonnegative capacity c(u, v) ≥ 0. If (u, v) ∉ E, we assume that c(u, v) = 0.
2. There are two distinguished vertices, the source s and the sink t;
3. For every vertex v ∈ V, there is a path from s to t containing v.
Let G = (V, E) be a flow network. Let s be the source of the network, and let t be the sink. A flow in G is a real-valued function f : V x V → R such that the following properties hold:
o Capacity Constraint: For all u, v ∈ V, we need f(u, v) ≤ c(u, v).
o Skew Symmetry: For all u, v ∈ V, we need f(u, v) = -f(v, u).
o Flow Conservation: For all u ∈ V - {s, t}, we need ∑v∈V f(u, v) = 0.

The quantity f(u, v), which can be positive or negative, is known as the net flow from vertex u to vertex v. In the maximum-flow problem, we are given a flow network G with source s and sink t, and we wish to find a flow of maximum value from s to t.

2.7 Maximum Bipartite Matching

A bipartite graph is an undirected graph G = (V, E) in which V can be partitioned into two sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2, or u ∈ V2 and v ∈ V1. That is, all edges go between the two sets V1 and V2 and not within V1 or V2.

A bipartite matching is a set of edges in a graph chosen in such a way that no two edges in that set share an endpoint. A maximum matching is a matching with the maximum number of edges.
When the maximum matching is found, we cannot add another edge: if one edge is added to the maximum matched graph, it is no longer a matching. For a bipartite graph, more than one maximum matching is possible.

Algorithm

bipartiteMatch(u, visited, assign)
Input − Starting node, a visited list to keep track, and the assign list that assigns a node to another node.
Output − Returns true when a matching for vertex u is possible.
Begin
   for all vertices v which are adjacent to u, do
      if v is not visited, then
         mark v as visited
         if v is not assigned, or bipartiteMatch(assign[v], visited, assign) is true, then
            assign[v] := u
            return true
   done
   return false
End

maxMatch(graph)
Input − The given graph.
Output − The maximum number of matches.
Begin
   initially no vertex is assigned
   count := 0
   for all applicants u in M, do
      mark all nodes as unvisited
      if bipartiteMatch(u, visited, assign), then
         increase count by 1
   done
End

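The pseudocode above can be realized in C as follows; the 3x3 applicant-job compatibility matrix and the array sizes are assumed examples for this sketch.

#include <stdio.h>
#include <string.h>

#define M 3   /* applicants (assumed for this sketch) */
#define N 3   /* jobs */

int g[M][N] = {   /* g[u][v] = 1 if applicant u can take job v (illustrative) */
    {1, 1, 0},
    {0, 1, 0},
    {0, 1, 1}
};

int assign[N];   /* assign[v] = applicant matched to job v, or -1 */

int bipartiteMatch(int u, int visited[]) {
    for (int v = 0; v < N; v++)
        if (g[u][v] && !visited[v]) {
            visited[v] = 1;
            /* job v is free, or its current applicant can move elsewhere */
            if (assign[v] < 0 || bipartiteMatch(assign[v], visited)) {
                assign[v] = u;
                return 1;
            }
        }
    return 0;
}

int main(void) {
    memset(assign, -1, sizeof(assign));
    int count = 0;
    for (int u = 0; u < M; u++) {
        int visited[N] = {0};   /* fresh visited list per applicant */
        if (bipartiteMatch(u, visited))
            count++;
    }
    printf("Maximum matching is %d\n", count);   /* prints 3 for this graph */
    return 0;
}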
Applications of Bipartite Graph


Various applications of bipartite graph are:
Matching Problems
In bipartite graphs, vertices are divided into two disjoint sets, and edges only connect vertices
from different sets. This property makes bipartite graphs useful for modeling matching
problems, such as assigning tasks to workers or matching students to schools.
Recommendation Systems
Bipartite graphs can represent relationships between users and items in recommendation
systems. Users are in one set, items (like products or movies) are in the other, and edges
indicate user-item interactions. Analyzing these graphs can help recommend items to users
based on their preferences or similarities with other users.
Social Networks
In social networks, bipartite graphs can represent connections between two different kinds of things, like users and events, or users and interests. For instance, in a user-event graph, edges link users to events they attend, helping with things like suggesting events or finding groups of users who like similar things.
Resource Allocation
Bipartite graphs can represent allocation problems, such as assigning resources to tasks or
employees to projects. By modeling resources and tasks as two disjoint sets of vertices and
edges indicating compatibility or assignment, bipartite graphs can help optimize resource
allocation and scheduling.
Information Retrieval
In information retrieval systems, bipartite graphs can model relationships between documents and terms. Documents and terms are represented as two disjoint sets of vertices, and edges indicate which terms appear in which documents. Analyzing these graphs helps improve search algorithms and document clustering techniques.
Transportation Networks
Bipartite graphs can represent transportation networks, where one set of vertices represents locations (e.g., cities or nodes), and the other set represents transportation routes (e.g., roads or edges). These graphs are used for optimizing transportation systems, route planning, and logistics management.

UNIT – III ADVANCED DESIGN AND ANALYSIS TECHNIQUES

Divide and Conquer methodology: Merge sort - Quick sort- Dynamic programming:
Elements of dynamic programming -Matrix-chain multiplication - Multi stage graphs. Greedy
Technique: Elements of the greedy strategy - Activity-selection problem - Huffman Trees
3.1 Divide and Conquer methodology

Divide and Conquer is a problem-solving technique that solves problems by dividing the main problem into subproblems, solving them individually and then merging them to find the solution to the original problem. In this section, we are going to discuss how the Divide and Conquer approach is helpful and how we can use it to solve problems.

Definition:

It involves breaking a larger problem into smaller subproblems, solving them independently,
and then combining their solutions to solve the original problem. The basic idea is to
recursively divide the problem into smaller subproblems until they become simple enough
to be solved directly. Once the solutions to the subproblems are obtained, they are then
combined to produce the overall solution.

Working of Divide and Conquer Algorithm:

Divide and Conquer Algorithm can be divided into three steps: Divide, Conquer and Merge.

1. Divide:

• Break down the original problem into smaller subproblems.

• Each subproblem should represent a part of the overall problem.

• The goal is to divide the problem until no further division is possible.

2. Conquer:

• Solve each of the smaller subproblems individually.

• If a subproblem is small enough (often referred to as the “base case”), we solve it directly without further recursion.

• The goal is to find solutions for these subproblems independently.


3. Merge:

• Combine the sub-problems to get the final solution of the whole problem.

• Once the smaller subproblems are solved, we recursively combine their solutions to get the solution of the larger problem.
• The goal is to formulate a solution for the original problem by merging the results from the
subproblems.

Characteristics of Divide and Conquer Algorithm:


Divide and Conquer Algorithm involves breaking down a problem into smaller, more
manageable parts, solving each part individually, and then combining the solutions to solve
the original problem. The characteristics of Divide and Conquer Algorithm are:

• Dividing the Problem: The first step is to break the problem into smaller, more manageable
subproblems. This division can be done recursively until the subproblems become simple
enough to solve directly.

• Independence of Subproblems: Each subproblem should be independent of the others,


meaning that solving one subproblem does not depend on the solution of another. This allows
for parallel processing or concurrent execution of subproblems, which can lead to efficiency
gains.

• Conquering Each Subproblem: Once divided, the subproblems are solved individually. This
may involve applying the same divide and conquer approach recursively until the
subproblems become simple enough to solve directly, or it may involve applying a different
algorithm or technique.
• Combining Solutions: After solving the subproblems, their solutions are combined to obtain
the solution to the original problem. This combination step should be relatively efficient and
straightforward, as the solutions to the subproblems should be designed to fit together
seamlessly.
3.1.1 Finding the maximum and minimum element in the array:

We can use the Divide and Conquer approach to find the maximum element in the array by dividing the array into two equal-sized subarrays and finding the maximum of those two individual halves by again dividing them into two smaller halves. This is done till we reach subarrays of size 1. After reaching the elements, we return the maximum element and combine the subarrays by returning the maximum in each subarray.

Similarly, we can use the Divide and Conquer approach to find the minimum element in the array by dividing the array into two equal-sized subarrays and finding the minimum of those two individual halves by again dividing them into two smaller halves. This is done till we reach subarrays of size 1. After reaching the elements, we return the minimum element and combine the subarrays by returning the minimum in each subarray.
#include <stdio.h>

struct Pair {
    int min;
    int max;
};

struct Pair findMinMax(int arr[], int low, int high) {
    struct Pair minmax, left, right;
    int mid;

    // If there is only one element
    if (low == high) {
        minmax.max = arr[low];
        minmax.min = arr[low];
        return minmax;
    }

    // If there are two elements
    if (high == low + 1) {
        if (arr[low] > arr[high]) {
            minmax.max = arr[low];
            minmax.min = arr[high];
        } else {
            minmax.max = arr[high];
            minmax.min = arr[low];
        }
        return minmax;
    }

    // If there are more than two elements
    mid = (low + high) / 2;
    left = findMinMax(arr, low, mid);
    right = findMinMax(arr, mid + 1, high);

    // Compare minimums of the two parts
    if (left.min < right.min)
        minmax.min = left.min;
    else
        minmax.min = right.min;

    // Compare maximums of the two parts
    if (left.max > right.max)
        minmax.max = left.max;
    else
        minmax.max = right.max;

    return minmax;
}

int main() {
    int arr[] = {1000, 11, 445, 1, 330, 3000};
    int arr_size = 6;
    struct Pair minmax = findMinMax(arr, 0, arr_size - 1);
    printf("Minimum element is %d\n", minmax.min);
    printf("Maximum element is %d\n", minmax.max);
    return 0;
}

3.1.2 Merge sort


Merge sort is a sorting algorithm that follows the divide-and-conquer approach. It works by
recursively dividing the input array into smaller subarrays and sorting those subarrays then
merging them back together to obtain the sorted array.
In simple terms, we can say that the process of merge sort is to divide the array into two
halves, sort each half, and then merge the sorted halves back together. This process is repeated
until the entire array is sorted.
Merge sort is a popular sorting algorithm known for its efficiency and stability. It follows
the divide-and-conquer approach to sort a given array of elements.
Here’s a step-by-step explanation of how merge sort works:
1. Divide: Divide the list or array recursively into two halves until it can no more be divided.
2. Conquer: Each subarray is sorted individually using the merge sort algorithm.
3. Merge: The sorted subarrays are merged back together in sorted order. The process continues
until all elements from both subarrays have been merged.

Illustration of Merge Sort:


Let’s sort the array [38, 27, 43, 10] using Merge Sort.
Divide:
• [38, 27, 43, 10] is divided into [38, 27] and [43, 10].
• [38, 27] is divided into [38] and [27].
• [43, 10] is divided into [43] and [10].
Conquer:
• [38] is already sorted.
• [27] is already sorted.
• [43] is already sorted.
• [10] is already sorted.
Merge:
• Merge [38] and [27] to get [27, 38].
• Merge [43] and [10] to get [10, 43].
• Merge [27, 38] and [10, 43] to get the final sorted list [10, 27, 38, 43].
Therefore, the sorted list is [10, 27, 38, 43].

Merge Sort pseudocode

function MergeSort(array, left, right)
    if left >= right
        return
    mid = (left + right) / 2
    MergeSort(array, left, mid)
    MergeSort(array, mid + 1, right)
    Merge(array, left, mid, right)
end function

function Merge(array, left, mid, right)
    n1 = mid - left + 1
    n2 = right - mid
    create temp arrays L[n1] and R[n2]

    for i = 0 to n1 - 1
        L[i] = array[left + i]
    for j = 0 to n2 - 1
        R[j] = array[mid + 1 + j]

    i = 0, j = 0, k = left
    while i < n1 and j < n2
        if L[i] <= R[j]
            array[k] = L[i]
            i = i + 1
        else
            array[k] = R[j]
            j = j + 1
        k = k + 1

    while i < n1
        array[k] = L[i]
        i = i + 1
        k = k + 1

    while j < n2
        array[k] = R[j]
        j = j + 1
        k = k + 1
end function

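The pseudocode above can be turned into a complete C program, shown below as a sketch; it sorts the array [38, 27, 43, 10] from the illustration.

#include <stdio.h>
#include <stdlib.h>

// Merge the two sorted halves arr[l..m] and arr[m+1..r]
void merge(int arr[], int l, int m, int r) {
    int n1 = m - l + 1, n2 = r - m;
    int *L = malloc(n1 * sizeof(int));
    int *R = malloc(n2 * sizeof(int));
    for (int i = 0; i < n1; i++) L[i] = arr[l + i];
    for (int j = 0; j < n2; j++) R[j] = arr[m + 1 + j];
    int i = 0, j = 0, k = l;
    while (i < n1 && j < n2)
        arr[k++] = (L[i] <= R[j]) ? L[i++] : R[j++];
    while (i < n1) arr[k++] = L[i++];
    while (j < n2) arr[k++] = R[j++];
    free(L); free(R);
}

void mergeSort(int arr[], int l, int r) {
    if (l >= r) return;
    int m = (l + r) / 2;
    mergeSort(arr, l, m);       // sort the left half
    mergeSort(arr, m + 1, r);   // sort the right half
    merge(arr, l, m, r);        // merge the sorted halves
}

int main(void) {
    int arr[] = {38, 27, 43, 10};   // array from the illustration above
    int n = sizeof(arr) / sizeof(arr[0]);
    mergeSort(arr, 0, n - 1);
    for (int i = 0; i < n; i++)
        printf("%d ", arr[i]);      // prints 10 27 38 43
    return 0;
}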
3.1.3 Quick Sort

QuickSort is one of the best sorting algorithms; it follows the divide-and-conquer approach like Merge Sort, but unlike Merge Sort, this algorithm sorts in place. In this section, we will learn how to implement quicksort in the C language.
What is QuickSort Algorithm?
The basic idea behind QuickSort is to select a pivot element from the array and partition the
other elements into two sub-arrays according to whether they are less than or greater than the
pivot. The sub-arrays are then sorted recursively. This process of partitioning and sorting
continues until the entire array is sorted.

It is a divide and conquer algorithm.

• Step 1 − Pick an element from the array; call it the pivot element.
• Step 2 − Divide the unsorted array elements into two sub-arrays.
• Step 3 − Values less than the pivot element go into the first sub-array; the remaining elements, with values greater than the pivot, go into the second sub-array.

Consider an example given below, wherein

• P is the pivot element.


• L is the left pointer.
• R is the right pointer.

The elements are 6, 3, 7, 2, 4, 5.

Now,

• The pivot is in its fixed position.
• All the elements to its left are smaller.
• The elements to its right are greater than the pivot.
• Now, divide the array into two subarrays: the left part and the right part.
• Take the left partition and apply quick sort to it.
Now,

• The pivot is in its fixed position.
• All the left elements are smaller and sorted.
• The right elements are greater and in sorted order.
• The final sorted list, obtained by combining the two subarrays, is 2, 3, 4, 5, 6, 7.

#include <stdio.h>

void swap(int* a, int* b);

// Partition function
int partition(int arr[], int low, int high) {

    // Choose the last element as the pivot
    int pivot = arr[high];

    // Index of the smaller element; indicates
    // the right position of the pivot found so far
    int i = low - 1;

    // Traverse arr[low..high-1] and move all smaller
    // elements to the left side. Elements from low to
    // i are smaller after every iteration
    for (int j = low; j <= high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(&arr[i], &arr[j]);
        }
    }

    // Move the pivot after the smaller elements and
    // return its position
    swap(&arr[i + 1], &arr[high]);
    return i + 1;
}

// The QuickSort function implementation
void quickSort(int arr[], int low, int high) {
    if (low < high) {

        // pi is the partition index of the pivot
        int pi = partition(arr, low, high);

        // Recursive calls for the smaller elements
        // and the greater or equal elements
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

void swap(int* a, int* b) {
    int t = *a;
    *a = *b;
    *b = t;
}

int main() {
    int arr[] = {10, 7, 8, 9, 1, 5};
    int n = sizeof(arr) / sizeof(arr[0]);

    quickSort(arr, 0, n - 1);
    for (int i = 0; i < n; i++) {
        printf("%d ", arr[i]);
    }
    return 0;
}

3.2 Dynamic Programming

Dynamic programming, like the divide-and-conquer method, solves problems by combining
the solutions to subproblems. (“Programming” in this context refers to a tabular method, not
to writing computer code.) Divide-and-conquer algorithms partition the problem into disjoint
subproblems, solve the subproblems recursively, and then combine their solutions to solve
the original problem. In contrast, dynamic programming applies when the subproblems
overlap, that is, when subproblems share subsubproblems. In this context, a divide-and-
conquer algorithm does more work than necessary, repeatedly solving the common
subsubproblems. A dynamic-programming algorithm solves each subsubproblem just
once and then saves its answer in a table, thereby avoiding the work of recomputing the
answer every time it solves each subproblem.
Dynamic programming typically applies to optimization problems. Such problems can have
many possible solutions. Each solution has a value, and you want to find a solution with the
optimal (minimum or maximum) value. We call such a solution an optimal solution to the
problem, as opposed to the optimal solution, since there may be several solutions that achieve
the optimal value.
To develop a dynamic-programming algorithm, follow a sequence of foursteps:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution, typically in a bottom-up fashion.
4. Construct an optimal solution from computed information.
Steps 1-3 form the basis of a dynamic-programming solution to a problem. If you need only
the value of an optimal solution, and not the solution itself, then you can omit step 4. When
you do perform step 4, it often pays to maintain additional information during step 3 so that
you can easily construct an optimal solution.

3.2.1 Matrix Chain Multiplication

Matrix Chain Multiplication is a fundamental problem in the field of computer science and
algorithms that deals with the efficient multiplication of a sequence of matrices. The core
objective of this problem is to determine the optimal way to parenthesize the matrices in the
given sequence to minimize the total number of scalar multiplications required for the matrix
chain multiplication. This problem has broad applications in various domains, including
computer graphics, numerical simulations, optimization, and scientific computing, where
matrix operations are pervasive.

In many computational tasks, matrices are utilized to represent complex data and
transformations. For instance, in computer graphics, matrices are employed to represent
transformations like scaling, rotation, and translation. In scientific simulations, matrices are
used to model physical systems and equations. Performing matrix multiplications in these
applications can be computationally expensive, especially when dealing with long sequences
of matrices. Matrix Chain Multiplication seeks to optimize these computations by finding the
most efficient order of multiplication.

Example of Matrix Chain Multiplication


Example: We are given the dimension sequence {4, 10, 3, 12, 20, 7}. The matrices have sizes
4 × 10, 10 × 3, 3 × 12, 12 × 20 and 20 × 7. We need to compute m[i, j] for 1 ≤ i ≤ j ≤ 5. We
know m[i, i] = 0 for all i.

Let us proceed by working away from the diagonal. We compute the optimal solution for
the product of 2 matrices.

Here p0 to p5 are the dimensions and M1 to M5 are the matrices, where Mi has size
p(i-1) × p(i). On the basis of this sequence we use the standard recurrence

m[i, i] = 0
m[i, j] = min over i ≤ k < j of { m[i, k] + m[k+1, j] + p(i-1) · p(k) · p(j) }

In dynamic programming, the diagonal entries are initialized with 0, and the table is then
filled out diagonal by diagonal.

We have to work out all the combinations, but only the combination with the minimum
output is taken into consideration.

Calculation of the product of 2 matrices:

1. m[1, 2] = M1 × M2 (a 4 × 10 matrix times a 10 × 3 matrix)
   = 4 × 10 × 3 = 120

2. m[2, 3] = M2 × M3 (a 10 × 3 matrix times a 3 × 12 matrix)
   = 10 × 3 × 12 = 360

3. m[3, 4] = M3 × M4 (a 3 × 12 matrix times a 12 × 20 matrix)
   = 3 × 12 × 20 = 720

4. m[4, 5] = M4 × M5 (a 12 × 20 matrix times a 20 × 7 matrix)
   = 12 × 20 × 7 = 1680

o We initialize each diagonal element with equal i, j values to 0.
o After that, the second diagonal is worked out and we get all the values corresponding
to it. Then the third diagonal is solved in the same way.

Now the product of 3 matrices:

M[1, 3] = M1 M2 M3

1. There are two cases by which we can solve this multiplication: (M1 × M2) × M3 and
M1 × (M2 × M3).
2. After solving both cases we choose the case with the minimum output.

M[1, 3] = 264

Comparing both outputs, 264 is the minimum, so we insert 264 in the table, and the
combination (M1 × M2) × M3 is chosen for the output.

M[2, 4] = M2 M3 M4

1. There are two cases by which we can solve this multiplication: (M2 × M3) × M4 and
M2 × (M3 × M4).
2. After solving both cases we choose the case with the minimum output.

M[2, 4] = 1320

Comparing both outputs, 1320 is the minimum, so we insert 1320 in the table, and the
combination M2 × (M3 × M4) is chosen for the output.
M[3, 5] = M3 M4 M5

1. There are two cases by which we can solve this multiplication: (M3 × M4) × M5 and
M3 × (M4 × M5).
2. After solving both cases we choose the case with the minimum output.

M[3, 5] = 1140

Comparing both outputs, 1140 is the minimum, so we insert 1140 in the table, and the
combination (M3 × M4) × M5 is chosen for the output.

Now the product of 4 matrices:

M[1, 4] = M1 M2 M3 M4
There are three cases by which we can solve this multiplication:

1. (M1 × M2 × M3) × M4
2. M1 × (M2 × M3 × M4)
3. (M1 × M2) × (M3 × M4)

After solving these cases we choose the case with the minimum output.

M[1, 4] = 1080

Comparing the outputs of the different cases, 1080 is the minimum, so we insert 1080
in the table, and the combination (M1 × M2) × (M3 × M4) is chosen for the output.

M[2, 5] = M2 M3 M4 M5
There are three cases by which we can solve this multiplication:

1. (M2 × M3 × M4) × M5
2. M2 × (M3 × M4 × M5)
3. (M2 × M3) × (M4 × M5)

After solving these cases we choose the case with the minimum output.

M[2, 5] = 1350
Comparing the outputs of the different cases, 1350 is the minimum, so we insert 1350
in the table, and the combination M2 × (M3 × M4 × M5) is chosen for the output.

Now the product of 5 matrices:

M[1, 5] = M1 M2 M3 M4 M5
There are four cases by which we can solve this multiplication:

1. (M1 × M2 × M3 × M4) × M5
2. M1 × (M2 × M3 × M4 × M5)
3. (M1 × M2 × M3) × (M4 × M5)
4. (M1 × M2) × (M3 × M4 × M5)

After solving these cases we choose the case with the minimum output.

M[1, 5] = 1344
Comparing the outputs of the different cases, 1344 is the minimum, so we insert 1344
in the table, and the combination (M1 × M2) × (M3 × M4 × M5) is chosen for the output.

The final output is m[1, 5] = 1344.

Step 3: Computing optimal costs. Let us assume that matrix Ai has dimension p(i-1) × p(i) for
i = 1, 2, 3, ..., n. The input is a sequence (p0, p1, ..., pn) where length[p] = n + 1. The procedure
uses an auxiliary table m[1..n, 1..n] for storing the m[i, j] costs and an auxiliary table
s[1..n, 1..n] that records which index k achieved the optimal cost in computing m[i, j].

The algorithm first computes m[i, i] ← 0 for i = 1, 2, 3, ..., n, the minimum costs for chains
of length 1.
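
This table computation can be written compactly in C. The following is a minimal bottom-up sketch of the m-table described above (the function and variable names are illustrative); run on the dimension sequence {4, 10, 3, 12, 20, 7}, it returns the 1344 computed in the example:

#include <stdio.h>
#include <limits.h>

// p[] holds the dimension sequence, so matrix Mi has size p[i-1] x p[i].
// n is the number of matrices in the chain.
int matrixChainOrder(int p[], int n) {
    int m[n + 1][n + 1];            // m[i][j]: min scalar mults for Mi..Mj

    for (int i = 1; i <= n; i++)
        m[i][i] = 0;                // a single matrix needs no multiplication

    for (int len = 2; len <= n; len++) {         // chain length = diagonal
        for (int i = 1; i <= n - len + 1; i++) {
            int j = i + len - 1;
            m[i][j] = INT_MAX;
            for (int k = i; k < j; k++) {        // try every split point
                int cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                if (cost < m[i][j])
                    m[i][j] = cost;
            }
        }
    }
    return m[1][n];
}

int main(void) {
    int p[] = {4, 10, 3, 12, 20, 7};
    int n = sizeof(p) / sizeof(p[0]) - 1;        // 5 matrices
    printf("Minimum number of multiplications: %d\n",
           matrixChainOrder(p, n));              // prints 1344
    return 0;
}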

3.2.2 Multi-Stage Graph

A multistage graph is a directed, weighted graph in which the nodes can be divided into a
set of stages such that all edges go from one stage to the next stage only (in other words,
there is no edge between vertices of the same stage, nor from a vertex of the current stage
to a previous stage).

The vertices of a multistage graph are divided into n disjoint subsets S = {S1, S2, S3, ..., Sn},
where S1 is the source and Sn is the sink (destination). The cardinalities of S1 and Sn are
equal to 1, i.e., |S1| = |Sn| = 1. Given a multistage graph, a source and a destination, we need
to find the shortest path from the source to the destination. By convention, we consider the
source to be at stage 1 and the destination at the last stage.

The multistage graph problem is defined as follows:

A multistage graph G = (V, E, W) is a weighted directed graph in which vertices are
partitioned into k ≥ 2 disjoint subsets V = {V1, V2, ..., Vk} such that if edge (u, v) is present
in E then u ∈ Vi and v ∈ Vi+1, 1 ≤ i ≤ k. The goal of the multistage graph problem is to find
the minimum cost path from the source to the destination vertex.
The input to the algorithm is a k-stage graph; the n vertices are indexed in increasing order
of stages.
The algorithm operates in the backward direction, i.e., it starts from the last vertex of the
graph and proceeds backward to find the minimum cost path.
▪ The minimum cost of vertex j ∈ Vi via vertex r ∈ Vi+1 is defined as

Cost[j] = min{ c[j, r] + Cost[r] }

where c[j, r] is the weight of edge <j, r> and Cost[r] is the cost of moving from vertex r to the
end vertex.
Algorithm MULTI_STAGE(G, k, n, p)

// Description: Solve the multistage graph problem using dynamic programming
// Input:
//   k: number of stages in graph G = (V, E)
//   c[i, j]: cost of edge (i, j)
// Output: p[1..k]: minimum cost path

cost[n] ← 0
for j ← n - 1 to 1 do
    // Let r be a vertex such that (j, r) ∈ E and c[j, r] + cost[r] is minimum
    cost[j] ← c[j, r] + cost[r]
    π[j] ← r
end

// Find the minimum cost path
p[1] ← 1
p[k] ← n
for j ← 2 to k - 1 do
    p[j] ← π[p[j - 1]]
end

Example: Find the minimum path cost between vertex s and vertex t for the following
multistage graph using dynamic programming.

According to the formula, we calculate Cost(i, j) using the following steps.

Step 1: Cost(K-2, j)

In this step, three nodes (nodes 4, 5, 6) are selected as j. Hence, we have three options to
choose the minimum cost at this step.

Cost(3, 4) = min {c(4, 7) + Cost(7, 9), c(4, 8) + Cost(8, 9)} = 7


Cost(3, 5) = min {c(5, 7) + Cost(7, 9), c(5, 8) + Cost(8, 9)} = 5

Cost(3, 6) = min {c(6, 7) + Cost(7, 9), c(6, 8) + Cost(8, 9)} = 5

Step 2: Cost(K-3, j)

Two nodes are selected as j because at stage k - 3 = 2 there are two nodes, 2 and 3. So the
values are i = 2 and j = 2, 3.

Cost(2, 2) = min {c(2, 4) + Cost(4, 8) + Cost(8, 9), c(2, 6) + Cost(6, 8) + Cost(8, 9)} = 8

Cost(2, 3) = min {c(3, 4) + Cost(4, 8) + Cost(8, 9), c(3, 5) + Cost(5, 8) + Cost(8, 9),
c(3, 6) + Cost(6, 8) + Cost(8, 9)} = 10

Step 3: Cost(K-4, j)

Cost(1, 1) = min {c(1, 2) + Cost(2, 6) + Cost(6, 8) + Cost(8, 9), c(1, 3) + Cost(3, 5) +
Cost(5, 8) + Cost(8, 9)} = 12, while the alternative c(1, 3) + Cost(3, 6) + Cost(6, 8) +
Cost(8, 9) gives 13.

Hence, the path having the minimum cost is 1 → 3 → 5 → 8 → 9.
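
The backward pass performed by MULTI_STAGE can be sketched in C as below. The small 8-vertex graph built in main() is an assumed example, not the graph from the figure above; cost[] and pi[] play the roles of cost[] and π[] in the algorithm:

#include <stdio.h>
#include <limits.h>

#define N 8             // number of vertices (illustrative)
#define INF INT_MAX     // marks a missing edge

// Backward dynamic programming: cost[j] is the cheapest way to reach the
// sink (vertex N-1) from vertex j; pi[j] records the successor chosen.
void multistage(int c[N][N]) {
    int cost[N], pi[N];
    cost[N - 1] = 0;                        // the sink costs nothing

    for (int j = N - 2; j >= 0; j--) {      // vertices indexed by stage
        cost[j] = INF;
        for (int r = j + 1; r < N; r++) {   // edges only go to later stages
            if (c[j][r] != INF && cost[r] != INF &&
                c[j][r] + cost[r] < cost[j]) {
                cost[j] = c[j][r] + cost[r];
                pi[j] = r;
            }
        }
    }

    printf("Minimum cost: %d\nPath: 0", cost[0]);   // assumes 0 reaches N-1
    for (int v = 0; v != N - 1; v = pi[v])
        printf(" -> %d", pi[v]);
    printf("\n");
}

int main(void) {
    // An assumed 4-stage example: source 0; stage 2 = {1, 2};
    // stage 3 = {3, 4, 5}; stage 4 = {6}; sink 7.
    int c[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            c[i][j] = INF;
    c[0][1] = 2; c[0][2] = 3;
    c[1][3] = 4; c[1][4] = 1;
    c[2][4] = 2; c[2][5] = 5;
    c[3][6] = 3; c[4][6] = 2; c[5][6] = 4;
    c[6][7] = 1;
    multistage(c);   // prints cost 6 and path 0 -> 1 -> 4 -> 6 -> 7
    return 0;
}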

3.3.1 Greedy Algorithm

Algorithms for optimization problems typically go through a


sequence of steps, with a set of choices at each step. For many optimization problems,
using dynamic programming to determine the best choices is overkill, and simpler,
more efficient algorithms will do. A greedy algorithm always makes the choice that
looks best at the moment. That is, it makes a locally optimal choice in the hope that
this choice leads to a globally optimal solution.

The greedy method is quite powerful and works well for a wide range of problems.
Applications of the greedy method include minimum-spanning-tree algorithms,
Dijkstra’s algorithm for shortest paths from a single source, and a greedy set-covering
heuristic. Minimum-spanning-tree algorithms furnish a classic example of the greedy
method.
3.3.2 Activity Selection Problem

You are given n activities with their start and finish times. Select the maximum
number of activities that can be performed by a single person, assuming that a person
can only work on a single activity at a time.
Examples:
Input: start[] = {10, 12, 20}, finish[] = {20, 25, 30}
Output: 0
Explanation: A person can perform at most one activity.
Input: start[] = {1, 3, 0, 5, 8, 5}, finish[] = {2, 4, 6, 7, 9, 9}
Output: 0 1 3 4
Explanation: A person can perform at most four activities. The maximum set of
activities that can be executed is {0, 1, 3, 4} (these are indexes in start[] and finish[]).

The following steps are used to solve the problem:

• Sort the activities according to their finishing time.

• Select the first activity from the sorted array and print it.
• Do the following for the remaining activities in the sorted array:
• If the start time of this activity is greater than or equal to the finish time of the
previously selected activity, then select this activity and print it.

In the implementation, it is assumed that the activities are already sorted according
to their finish time; otherwise, the time complexity will rise to O(N log N) and the
auxiliary space will rise to O(N), as we would have to create a 2-D array for storing
the start and finish times together.

// C program for the activity selection problem.
// The following implementation assumes that the activities
// are already sorted according to their finish time.
#include <stdio.h>

// Prints a maximum set of activities that can be done by a
// single person, one at a time.
void printMaxActivities(int s[], int f[], int n)
{
    int i, j;

    printf("Following activities are selected\n");

    // The first activity always gets selected
    i = 0;
    printf("%d ", i);

    // Consider the rest of the activities
    for (j = 1; j < n; j++) {
        // If this activity has a start time greater than or
        // equal to the finish time of the previously selected
        // activity, then select it
        if (s[j] >= f[i]) {
            printf("%d ", j);
            i = j;
        }
    }
}

// Driver code
int main()
{
    int s[] = { 1, 3, 0, 5, 8, 5 };
    int f[] = { 2, 4, 6, 7, 9, 9 };
    int n = sizeof(s) / sizeof(s[0]);

    // Function call
    printMaxActivities(s, f, n);
    return 0;
}

3.4 Huffman coding

Huffman coding is a lossless data compression algorithm. The idea is to assign
variable-length codes to input characters; the lengths of the assigned codes are based
on the frequencies of the corresponding characters.
The variable-length codes assigned to input characters are prefix codes, meaning the
codes (bit sequences) are assigned in such a way that the code assigned to one
character is not the prefix of the code assigned to any other character. This is how
Huffman coding makes sure that there is no ambiguity when decoding the
generated bitstream.
Let us understand prefix codes with a counterexample. Let there be four characters
a, b, c and d, and let their corresponding variable-length codes be 00, 01, 0 and 1. This
coding leads to ambiguity because the code assigned to c is the prefix of the codes
assigned to a and b. If the compressed bit stream is 0001, the decompressed output
may be “cccd” or “ccb” or “acd” or “ab”.

There are mainly two major parts in Huffman Coding


1. Build a Huffman Tree from input characters.

2. Traverse the Huffman Tree and assign codes to characters.

Algorithm:
The method used to construct an optimal prefix code is called Huffman coding.
This algorithm builds a tree in a bottom-up manner. We denote this tree by T.

Let |C| be the number of leaves; then |C| - 1 merge operations are required to build
the tree. Q is a priority queue, which can be implemented as a binary min-heap.

Algorithm Huffman(C)
{
    n = |C|
    Q = C
    for i ← 1 to n - 1 do
    {
        temp ← new node()
        left[temp] ← Get_min(Q)
        right[temp] ← Get_min(Q)
        a ← left[temp]
        b ← right[temp]
        f[temp] ← f[a] + f[b]

        insert(Q, temp)
    }

    return Get_min(Q)
}

Frequency of string

1. Calculate the frequency of each character in the string.

2. Sort the characters in increasing order of frequency. These are stored in a
priority queue Q.
Each character occupies 8 bits. There are a total of 15 characters in the above string,
so a total of 8 × 15 = 120 bits is required to send this string.
Using the Huffman coding technique, we can compress the string to a smaller size.

Huffman coding first creates a tree using the frequencies of the characters and then
generates a code for each character.

Once the data is encoded, it has to be decoded. Decoding is done using the same tree.

Huffman coding prevents any ambiguity in the decoding process using the concept
of prefix codes, i.e., a code associated with a character should not be present in the
prefix of any other code. The tree created above helps in maintaining this property.
Huffman coding is done with the help of the following steps (the characters are first
sorted according to frequency):

3. Make each unique character a leaf node.
4. Create an empty internal node, take the two nodes of minimum frequency as its
left and right children, and set its frequency to the sum of the two children's
frequencies.
5. Insert this new node back into the priority queue.

Repeat steps 3 to 5 until only one node, the root, remains.
For each non-leaf node, assign 0 to the left edge and 1 to the right edge

For sending the above string over a network, we have to send the tree as well as the
compressed code. The total size is given by the table below.

Without encoding, the total size of the string was 120 bits. After encoding, the size is
reduced to
32 + 15 + 28 = 75 bits.

Decoding the code

For decoding the code, we can take the code and traverse through the tree to find the character.

Suppose 101 is to be decoded; we can traverse from the root as in the figure below.

Decoding

Huffman Coding Algorithm


Create a priority queue Q consisting of each unique character, sorted in ascending
order of frequency.
For all the unique characters:
    create a newNode
    extract the minimum value from Q and assign it to leftChild of newNode
    extract the minimum value from Q and assign it to rightChild of newNode
    calculate the sum of these two minimum values and assign it to the value of newNode
    insert this newNode into the tree
Return rootNode.
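
The tree-building and code-assignment steps above can be sketched in C. This is a minimal illustration, not a production encoder: the alphabet and frequencies in main() are assumed sample data, and the priority queue is replaced by a simple linear scan for the two minimum frequencies:

#include <stdio.h>

#define MAX_NODES 100

// Node pool for the Huffman tree: leaves first, internal nodes appended.
struct Node {
    char ch;            // character (meaningful for leaves only)
    int  freq;          // frequency of the subtree
    int  left, right;   // indices of children, -1 for leaves
    int  used;          // 1 once merged into a parent
};

struct Node tree[MAX_NODES];
int nodeCount = 0;

// Linear scan for the unused node with the smallest frequency
// (a heap-based priority queue would make this O(log n)).
int extractMin(void) {
    int best = -1;
    for (int i = 0; i < nodeCount; i++)
        if (!tree[i].used && (best == -1 || tree[i].freq < tree[best].freq))
            best = i;
    tree[best].used = 1;
    return best;
}

// Walk the tree, appending 0 for a left edge and 1 for a right edge.
void printCodes(int v, char *code, int depth) {
    if (tree[v].left == -1) {           // leaf: print its code
        code[depth] = '\0';
        printf("%c: %s\n", tree[v].ch, code);
        return;
    }
    code[depth] = '0';
    printCodes(tree[v].left, code, depth + 1);
    code[depth] = '1';
    printCodes(tree[v].right, code, depth + 1);
}

int main(void) {
    // Assumed sample alphabet and frequencies for illustration.
    char chars[] = {'a', 'b', 'c', 'd'};
    int  freqs[] = { 5,   1,   6,   3 };
    int  n = 4;

    for (int i = 0; i < n; i++)
        tree[nodeCount++] = (struct Node){chars[i], freqs[i], -1, -1, 0};

    // |C| - 1 merges: repeatedly combine the two minimum-frequency nodes.
    for (int i = 0; i < n - 1; i++) {
        int a = extractMin();
        int b = extractMin();
        tree[nodeCount++] =
            (struct Node){'$', tree[a].freq + tree[b].freq, a, b, 0};
    }

    char code[MAX_NODES];
    printCodes(nodeCount - 1, code, 0);  // root is the last node created
    return 0;
}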

UNIT IV STATE SPACE SEARCH ALGORITHMS

Backtracking: n-Queens problem - Hamiltonian Circuit Problem - Subset Sum Problem
- Graph colouring problem. Branch and Bound: Solving 15-Puzzle problem -
Assignment problem - Knapsack Problem - Travelling Salesman Problem.

4.1 Backtracking
Backtracking is a problem-solving algorithmic technique that involves finding a
solution incrementally by trying different options and undoing them if they lead
to a dead end. It is commonly used in situations where you need to explore multiple
possibilities to solve a problem, like searching for a path in a maze or solving
puzzles like Sudoku. When a dead end is reached, the algorithm backtracks to the
previous decision point and explores a different path until a solution is found or all
possibilities have been exhausted.

Basic Terminologies
• Candidate: A candidate is a potential choice or element that can be added to
the current solution.
• Solution: The solution is a valid and complete configuration that satisfies all
problem constraints.
• Partial Solution: A partial solution is an intermediate or incomplete
configuration being constructed during the backtracking process.
• Decision Space: The decision space is the set of all possible candidates or
choices at each decision point.
• Decision Point: A decision point is a specific step in the algorithm where a
candidate is chosen and added to the partial solution.
• Feasible Solution: A feasible solution is a partial or complete solution that
adheres to all constraints.
• Dead End: A dead end occurs when a partial solution cannot be extended
without violating constraints.
• Backtrack: Backtracking involves undoing previous decisions and returning
to a prior decision point.
• Search Space: The search space includes all possible combinations of
candidates and choices.
• Optimal Solution: In optimization problems, the optimal solution is the best
possible solution.
Types of Backtracking Problems
Problems associated with backtracking can be categorized into 3 categories:
• Decision Problems: Here, we search for a feasible solution.
• Optimization Problems: For this type, we search for the best solution.
• Enumeration Problems: We find the set of all possible feasible solutions to
problems of this type.
4.1.1 N-Queen Problem

The N-Queens problem is the problem of placing N chess queens on an N×N
chessboard so that no two queens attack each other.

For example, the following is a solution for the 4 Queen problem.

The expected output is in the form of a matrix that has ‘Q’s for the blocks where
queens are placed and ‘0’s for the empty spaces. For example, the following is the
output matrix for the above 4-queen solution.

0 Q 0 0
0 0 0 Q
Q 0 0 0
0 0 Q 0

N Queen Problem using Backtracking:

The idea is to place queens one by one in different columns, starting from the
leftmost column. When we place a queen in a column, we check for clashes with
already placed queens. In the current column, if we find a row for which there is no
clash, we mark this row and column as part of the solution. If we do not find such
a row due to clashes, then we backtrack and return false.
Follow the steps mentioned below to implement the idea:

1. Start in the leftmost column.
2. If all queens are placed, return true.
3. Try all rows in the current column. Do the following for every row:
   • If the queen can be placed safely in this row, mark this [row, column] as part
   of the solution and recursively check if placing the queen here leads to a solution.
   • If placing the queen in [row, column] leads to a solution, return true.
   • If placing the queen does not lead to a solution, unmark this [row, column],
   backtrack, and try other rows.
4. If all rows have been tried and a valid solution is not found, return false to
trigger backtracking.

One possible solution for the 8-queens problem:

1. The solution for the 8-queens problem is (4, 6, 8, 2, 7, 1, 3, 5).
2. Suppose two queens are placed at positions (i, j) and (k, l).
3. Then they are on the same diagonal only if i - j = k - l or i + j = k + l.
4. The first equation implies that j - l = i - k.
5. The second equation implies that j - l = k - i.
6. Therefore, two queens lie on the same diagonal if and only if |j - l| = |i - k|.

Place(k, i) returns a Boolean value that is true if the kth queen can be placed in
column i. It tests both whether i is distinct from all previous values x1, x2, ..., xk-1
and whether there is no other queen on the same diagonal.

Using Place, we give a precise solution to the n-queens problem.

Algorithm Place(k, i)
{
    for j ← 1 to k - 1 do
        if (x[j] = i) or (Abs(x[j] - i) = Abs(j - k)) then
            return false;
    return true;
}

Place(k, i) returns true if a queen can be placed in the kth row and ith column;
otherwise it returns false.

x[] is a global array whose first k - 1 values have been set. Abs(r) returns the
absolute value of r.

Algorithm NQueens(k, n)
{
    for i ← 1 to n do
        if Place(k, i) then
        {
            x[k] ← i;
            if (k = n) then
                write (x[1...n]);
            else
                NQueens(k + 1, n);
        }
}
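
A compact C version of the Place/NQueens scheme above, printing one solution for N = 8 (a sketch; the 1-indexed x[] array mirrors the pseudocode):

#include <stdio.h>
#include <stdlib.h>

#define N 8

int x[N + 1]; // x[k] = column of the queen in row k (1-indexed, as above)

// Can the kth queen be placed in column i?
int place(int k, int i) {
    for (int j = 1; j < k; j++)
        if (x[j] == i || abs(x[j] - i) == abs(j - k))
            return 0;   // same column or same diagonal
    return 1;
}

// Try to place queens in rows k..N; return 1 once a full solution is found.
int nQueens(int k) {
    for (int i = 1; i <= N; i++) {
        if (place(k, i)) {
            x[k] = i;
            if (k == N)
                return 1;       // all queens placed
            if (nQueens(k + 1))
                return 1;
            // otherwise backtrack and try the next column
        }
    }
    return 0;
}

int main(void) {
    if (nQueens(1)) {
        for (int k = 1; k <= N; k++)
            printf("%d ", x[k]);    // e.g. 1 5 8 6 3 7 2 4
        printf("\n");
    }
    return 0;
}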

4.1.2 Hamiltonian Circuit Problem

A Hamiltonian Cycle or Circuit in a graph G is a cycle that visits every vertex of G
exactly once and returns to the starting vertex.
• Similar to the Hamiltonian Cycle problem, finding a Hamiltonian Path in a
general graph is also NP-complete and can be challenging. However, it is often an
easier problem than finding a Hamiltonian Cycle.
• Hamiltonian Paths have applications in various fields, such as finding
optimal routes in transportation networks, circuit design, and graph theory
research.

Problem Statement: Given an undirected graph, the task is to determine whether
the graph contains a Hamiltonian cycle or not. If it does, print the path.

Example: Input: graph[][] = {{0, 1, 0, 1, 0}, {1, 0, 1, 1, 1}, {0, 1, 0, 0, 1},
{1, 1, 0, 0, 1}, {0, 1, 1, 1, 0}}

Output: {0, 1, 2, 4, 3, 0}.
Hamiltonian Cycle using Backtracking Algorithm:
Create an empty path array and add vertex 0 to it. Add the other vertices, starting
from vertex 1. Before adding a vertex, check whether it is adjacent to the previously
added vertex and not already added. If we find such a vertex, we add it as part of
the solution. If we do not find a vertex, we return false.

Let’s find out the Hamiltonian cycle for the following graph:

• Start with node 0.

• Apply DFS for finding the Hamiltonian path.

• When the base case is reached (i.e., the total number of nodes traversed == V,
the total number of vertices):

o Check whether the current node is a neighbour of the starting node.

o As node 2 and node 0 are not neighbours of each other, return from it.

• As a cycle is not found in the path {0, 3, 1, 4, 2}, return from node 2, then node 4.

• Now, explore another option for node 1 (i.e., node 2).

• When it hits the base condition, again check for a Hamiltonian cycle.
• As node 4 is not a neighbour of node 0, again a cycle is not found, so return.

• Return from node 4, node 2, node 1.

• In the Hamiltonian path {0, 3, 4, 2, 1, 0} we get a cycle, as node 1 is a neighbour
of node 0.

• So print this cyclic path.

• This is our Hamiltonian cycle.


Time Complexity: O(N!), where N is the number of vertices.
Auxiliary Space: O(1), since no extra space is used.

#include <stdio.h>
#define NODE 5

int graph[NODE][NODE] = {
    {0, 1, 0, 1, 0},
    {1, 0, 1, 1, 1},
    {0, 1, 0, 0, 1},
    {1, 1, 0, 0, 1},
    {0, 1, 1, 1, 0},
};

int path[NODE];

// Function to display the Hamiltonian cycle
void displayCycle() {
    printf("Cycle Found: ");
    for (int i = 0; i < NODE; i++)
        printf("%d ", path[i]);
    // Print the first vertex again
    printf("%d\n", path[0]);
}

// Function to check if adding vertex v to the path is valid
int isValid(int v, int k) {
    // If there is no edge between path[k-1] and v
    if (graph[path[k - 1]][v] == 0)
        return 0;
    // Check if vertex v is already taken in the path
    for (int i = 0; i < k; i++)
        if (path[i] == v)
            return 0;
    return 1;
}

// Function to find the Hamiltonian cycle
int cycleFound(int k) {
    // When all vertices are in the path
    if (k == NODE) {
        // Check if there is an edge between the last and first vertex
        if (graph[path[k - 1]][path[0]] == 1)
            return 1;
        else
            return 0;
    }
    // Try adding each vertex (except the starting point) to the path
    for (int v = 1; v < NODE; v++) {
        if (isValid(v, k)) {
            path[k] = v;
            if (cycleFound(k + 1) == 1)
                return 1;
            // Backtrack: remove v from the path
            path[k] = -1;
        }
    }
    return 0;
}

// Function to find and display the Hamiltonian cycle
int hamiltonianCycle() {
    for (int i = 0; i < NODE; i++)
        path[i] = -1;
    // Set the first vertex as 0
    path[0] = 0;
    if (cycleFound(1) == 0) {
        printf("Solution does not exist\n");
        return 0;
    }
    displayCycle();
    return 1;
}

int main() {
    hamiltonianCycle();
    return 0;
}

4.1.3 Subset Sum Problem


Given a set[] of non-negative integers and a value sum, the task is to print the
subsets of the given set whose sum is equal to the given sum.
Input: set[] = {1, 2, 1}, sum = 3
Output: [1, 2], [2, 1]
Explanation: There are two subsets, [1, 2] and [2, 1], with sum 3.
Input: set[] = {3, 34, 4, 12, 5, 2}, sum = 30
Output: []
Explanation: There is no subset that adds up to 30.

Subset Sum Problem using Backtracking

The Subset-Sum Problem is to find a subset of a given set S = {s1, s2, ..., sn} of n
positive integers whose sum is equal to a given positive integer d.

For example, for S = {1, 2, 5, 6, 8} and d = 9, there are two solutions: {1, 2, 6} and
{1, 8}. Of course, some instances of this problem may have no solutions.
• It is convenient to sort the set's elements in increasing order, so we will assume
that s1 ≤ s2 ≤ ... ≤ sn.
• The state-space tree can be constructed as a binary tree, as in the following
figure for the instance S = {3, 5, 6, 7} and d = 15.
• The root of the tree represents the starting point, with no decisions about the
given elements made as yet.
• Its left and right children represent, respectively, the inclusion and exclusion
of s1 in the set being sought.
• Similarly, going to the left from a node on the first level corresponds to the
inclusion of s2, while going to the right corresponds to its exclusion, and so on.
• Thus, a path from the root to a node on the ith level of the tree indicates which
of the first i numbers have been included in the subsets represented by that node.
• We record the value of s', the sum of these numbers, in the node. If s' is equal
to d, we have a solution to the problem.
• We can either report this result and stop or, if all the solutions need to be
found, continue by backtracking to the node's parent.
Time Complexity: O(2^n). The above solution may try all subsets of the given set in
the worst case; therefore, the time complexity of the above solution is exponential.
Auxiliary Space: O(n), where n is the recursion stack space.
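
A minimal backtracking sketch in C of this state-space search (the set, the target d, and the function names are illustrative); each level of recursion corresponds to the include/exclude branching described above:

#include <stdio.h>

int set[] = {1, 2, 5, 6, 8};
int n = 5, d = 9;
int chosen[5];   // chosen[i] = 1 if set[i] is included on the current path

// Explore level i of the state-space tree with current partial sum s.
void subsetSum(int i, int s) {
    if (s == d) {                        // found a subset summing to d
        printf("{ ");
        for (int j = 0; j < i; j++)
            if (chosen[j]) printf("%d ", set[j]);
        printf("}\n");
        return;
    }
    if (i == n || s > d)                 // dead end: prune and backtrack
        return;

    chosen[i] = 1;                       // left branch: include set[i]
    subsetSum(i + 1, s + set[i]);
    chosen[i] = 0;                       // right branch: exclude set[i]
    subsetSum(i + 1, s);
}

int main(void) {
    subsetSum(0, 0);                     // prints {1 2 6} and {1 8}
    return 0;
}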

4.1.4 Graph Coloring problem


Graph coloring refers to the problem of coloring vertices of a graph in such a way
that no two adjacent vertices have the same color. This is also called the vertex
coloring problem. If coloring is done using at most m colors, it is called m-coloring.

The minimum number of colors needed to color a graph is called its chromatic
number. For example, the following graph can be colored with a minimum of 2 colors.

The problem of finding the chromatic number of a given graph is NP-complete.

The graph coloring problem is both a decision problem and an optimization
problem.

• The decision problem is stated as, "With given M colors and graph G, is such a
color scheme possible or not?"

• The optimization problem is stated as, "Given M colors and graph G, find the
minimum number of colors required for graph coloring."

Algorithm for Graph Coloring using Backtracking:
Assign colors one by one to different vertices, starting from vertex 0. Before
assigning a color, check whether the adjacent vertices have the same color or not. If
there is a color assignment that does not violate the conditions, mark the color
assignment as part of the solution. If no assignment of color is possible, then
backtrack and return false.
Follow the given steps to solve the problem (a C sketch is given after the steps):
• Create a recursive function that takes the graph, current index, number of
vertices, and color array.

• If the current index is equal to the number of vertices, print the color
configuration in the color array.

• Assign a color to a vertex from the range (1 to m).

o For every assigned color, check if the configuration is safe (i.e., check that
the adjacent vertices do not have the same color) and recursively call the function
with the next index and number of vertices; else return false.
o If any recursive function returns true, then break the loop and return true.
o If no recursive function returns true, then return false.
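
A short C sketch of these steps (the sample graph, the value of m, and the function names are assumptions for illustration):

#include <stdio.h>

#define V 4   // number of vertices (illustrative)

// Is it safe to give vertex v the color c?
int isSafe(int graph[V][V], int color[], int v, int c) {
    for (int u = 0; u < V; u++)
        if (graph[v][u] && color[u] == c)
            return 0;   // an adjacent vertex already has color c
    return 1;
}

// Try to color vertices v..V-1 with colors 1..m; backtrack on failure.
int graphColoring(int graph[V][V], int m, int color[], int v) {
    if (v == V)
        return 1;                         // every vertex is colored
    for (int c = 1; c <= m; c++) {
        if (isSafe(graph, color, v, c)) {
            color[v] = c;
            if (graphColoring(graph, m, color, v + 1))
                return 1;
            color[v] = 0;                 // backtrack
        }
    }
    return 0;
}

int main(void) {
    // Assumed sample graph: a 4-cycle with one chord (0-1, 1-2, 2-3, 3-0, 0-2).
    int graph[V][V] = {
        {0, 1, 1, 1},
        {1, 0, 1, 0},
        {1, 1, 0, 1},
        {1, 0, 1, 0},
    };
    int m = 3, color[V] = {0};
    if (graphColoring(graph, m, color, 0)) {
        printf("Solution exists:");
        for (int i = 0; i < V; i++) printf(" %d", color[i]);
        printf("\n");                     // e.g. 1 2 3 2
    } else {
        printf("Solution does not exist\n");
    }
    return 0;
}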

4.2 Branch and Bound
The Branch and Bound Algorithm is a method used in combinatorial optimization
problems to systematically search for the best solution. It works by dividing the
problem into smaller subproblems, or branches, and then eliminating certain
branches based on bounds on the optimal solution. This process continues until the
best solution is found or all branches have been explored. Branch and Bound is
commonly used in problems like the traveling salesman and job scheduling.
4.2.1 Solving 15 puzzle problem
Given a 4×4 board with 15 tiles (every tile has one number from 1 to 15) and one
empty space, the objective is to place the numbers on the tiles in order using the
empty space. We can slide the four adjacent (left, right, above and below) tiles into
the empty space. For example,

Here X marks the spot to which the elements can be shifted; the final configuration
always remains the same if the puzzle is solvable.
In general, for a given grid of width N, we can check whether an N*N - 1 puzzle is
solvable or not by following the simple rules below:

1. If N is odd, then the puzzle instance is solvable if the number of inversions is
even in the input state.

2. If N is even, the puzzle instance is solvable if

• the blank is on an even row counting from the bottom (second-last,
fourth-last, etc.) and the number of inversions is odd, or

• the blank is on an odd row counting from the bottom (last, third-last,
fifth-last, etc.) and the number of inversions is even.

3. For all other cases, the puzzle instance is not solvable.

What is an inversion here?


If we assume the tiles written out in a single row (1D array) instead of being spread
across N rows (2D array), a pair of tiles (a, b) forms an inversion if a appears before
b but a > b.
For the above example, consider the tiles written out in a row, like this:
2 1 3 4 5 6 7 8 9 10 11 12 13 14 15 X
The above grid forms only 1 inversion, i.e., (2, 1).
Illustration:

The algorithm above can be illustrated with the following program code:

#include <stdio.h>

int main(void) {
    int a[16];
    for (int i = 0; i < 16; ++i)
        scanf("%d", &a[i]);          // read the 16 tiles; 0 is the blank

    int inv = 0;
    for (int i = 0; i < 16; ++i)
        if (a[i])
            for (int j = 0; j < i; ++j)
                if (a[j] > a[i])
                    ++inv;           // count inversions among the tiles

    for (int i = 0; i < 16; ++i)
        if (a[i] == 0)
            inv += 1 + i / 4;        // add the row of the blank (from the top)

    puts((inv & 1) ? "No Solution" : "Solution Exists");
    return 0;
}

4.2.2 Assignment problem

Let there be N workers and N jobs. Any worker can be assigned to perform any job,
incurring some cost that may vary depending on the worker-job assignment. It is
required to perform all jobs by assigning exactly one worker to each job and exactly
one job to each worker in such a way that the total cost of the assignment is
minimized.

Approaches to Solve the Assignment Problem

1. Brute Force
The brute force approach generates all n! possible job assignments (permutations)
and computes the total cost for each assignment. The assignment with the minimum
cost is chosen. This approach has a time complexity of O(n!), making it impractical
for large N.

2. Hungarian Algorithm
The Hungarian algorithm is an efficient method to solve the assignment problem
with a time complexity of O(n^3). It transforms the cost matrix and finds the optimal
assignment by reducing the matrix and covering zeros with a minimum number of
lines.

3. DFS/BFS on State Space Tree
This approach involves constructing a state space tree where each path from the root
to a leaf node represents a possible assignment. Depth-First Search (DFS) or
Breadth-First Search (BFS) can be used to explore the tree, but these methods may
not always find the optimal solution efficiently.

4. Branch and Bound
The Branch and Bound method is a more intelligent approach that uses a cost
function to guide the search for the optimal solution. It constructs a state space tree
and explores nodes based on their estimated cost, using a priority queue (min-heap)
to always expand the node with the least cost first. This method significantly
reduces the search space and improves efficiency.

Let’s take the example below and try to calculate the promising cost when Job 2 is
assigned to worker A.

Since Job 2 is assigned to worker A (marked in green), the cost becomes 2, and Job 2
and worker A become unavailable (marked in red).

Now we assign Job 3 to worker B, as it has the minimum cost among the unassigned
jobs. The cost becomes 2 + 3 = 5, and Job 3 and worker B also become unavailable.

Finally, Job 1 gets assigned to worker C, as it has the minimum cost among the
unassigned jobs, and Job 4 gets assigned to worker D, as it is the only job left. The
total cost becomes 2 + 3 + 5 + 4 = 14.

/* findMinCost uses Least() and Add() to maintain the list of live nodes.

   Least() finds a live node with least cost, deletes it from the list and
   returns it.

   Add(x) calculates the cost of x and adds it to the list of live nodes.

   Implements the list of live nodes as a min-heap. */

// Search space tree node
node
{
    int job_number;
    int worker_number;
    node parent;
    int cost;
}

// Input: Cost matrix of the job assignment problem
// Output: Optimal cost and assignment of jobs
algorithm findMinCost(costMatrix mat[][])
{
    // Initialize the list of live nodes (min-heap)
    // with the root of the search tree, i.e., a dummy node
    while (true)
    {
        // Find a live node with the least estimated cost
        E = Least();

        // The found node is deleted from the list of live nodes
        if (E is a leaf node)
        {
            printSolution();
            return;
        }
        for each child x of E
        {
            Add(x);          // Add x to the list of live nodes
            x->parent = E;   // Pointer for the path to the root
        }
    }
}

4.2.3 Knapsack Problem


The 0/1 Knapsack problem is a classic combinatorial optimization problem where
we aim to maximize the total value of items placed in a knapsack without exceeding
its capacity. Each item can either be included or excluded from the knapsack, hence
the name 0/1 Knapsack.
Branch and Bound Approach

126
The Branch and Bound method is an efficient algorithm design paradigm used to
solve combinatorial problems. It systematically explores branches of a decision tree,
pruning branches that cannot yield better solutions than the current best.
Steps to Implement Branch and Bound for 0/1 Knapsack:
1. Sort Items: Begin by sorting items in decreasing order of their value-to-weight
ratio.
2. Initialize: Create a dummy node representing the root of the decision tree and
initialize the maximum profit to zero.
3. Use a Queue: Use a queue to explore nodes. Start by enqueuing the dummy
node.
4. Explore Nodes: While the queue is not empty, dequeue a node and explore its
children:
   • Include the Item: Calculate the profit and weight if the current item is
   included.
   • Exclude the Item: Calculate the profit and weight if the current item is
   excluded.
5. Update Maximum Profit: If the profit of the current node is greater than the
maximum profit, update the maximum profit.
6. Calculate Bound: Compute the upper bound for the current node using the
greedy approach. If the bound is greater than the maximum profit, enqueue the
node for further exploration.

Input: N = 3, W = 4, v[] = {1, 2, 3}, w[] = {4, 5, 1}

Output: 3
Explanation: There are two items which have weight less than or equal to 4. If we
select the item with weight 4, the possible profit is 1. And if we select the item with
weight 1, the possible profit is 3. So the maximum possible profit is 3. Note that we
cannot put both the items with weight 4 and 1 together, as the capacity of the bag
is 4.
Input: N = 5, W = 10, v[] = {40, 50, 100, 95, 30}, w[] = {2, 3.14, 1.98, 5, 3}
Output: 235

The backtracking-based solution works better than brute force by ignoring infeasible
solutions. We can do better (than backtracking) if we know a bound on the best
possible solution in the subtree rooted at every node. If the best in the subtree is
worse than the current best, we can simply ignore this node and its subtrees. So we
compute a bound (best possible solution) for every node and compare the bound
with the current best solution before exploring the node.
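
The greedy upper bound mentioned above can be sketched in C as follows; this is an illustrative fragment (the item data in main() are assumed), computing the fractional-knapsack relaxation for a node that has already fixed the first `level` items:

#include <stdio.h>

// Greedy (fractional) upper bound for a branch-and-bound node that has
// already committed `profit` and `weight`, with items level..n-1 still free.
// Items are assumed sorted by decreasing value/weight ratio.
double bound(double profit, double weight, int level,
             int n, double W, const double v[], const double w[]) {
    if (weight >= W) return 0;           // infeasible node: prune it

    double best = profit, room = W - weight;
    int i = level;
    while (i < n && w[i] <= room) {      // greedily take whole items
        room -= w[i];
        best += v[i];
        i++;
    }
    if (i < n)
        best += v[i] * (room / w[i]);    // take a fraction of the next item

    return best;   // the fractional relaxation is >= any 0/1 solution
}

int main(void) {
    // Assumed items, already sorted by v/w ratio (50.5, 20, 19, 15.9, 10).
    double v[] = {100, 40, 95, 50, 30};
    double w[] = {1.98, 2, 5, 3.14, 3};
    // Bound at the root (nothing committed yet), capacity W = 10.
    printf("Root upper bound: %.2f\n", bound(0, 0, 0, 5, 10, v, w));
    return 0;
}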

4.2.4 Traveling Salesman Problem using Branch and Bound

Given a set of cities and distance between every pair of cities, the problem is to
find the shortest possible tour that visits every city exactly once and returns to the
starting point.

128
For example, consider the graph shown in the figure on the right side. A TSP tour in
the graph is 0-1-3-2-0. The cost of the tour is 10 + 25 + 30 + 15, which is 80.

In the Branch and Bound method, for the current node in the tree, we compute a
bound on the best possible solution that we can get if we go down this node. If the
bound on the best possible solution itself is worse than the current best (best
computed so far), then we ignore the subtree rooted at the node. Note that the cost
through a node includes two costs:
1. Cost of reaching the node from the root (when we reach a node, we have this
cost computed).
2. Cost of reaching an answer from the current node to a leaf (we compute a bound
on this cost to decide whether to ignore the subtree rooted at this node or not).

• In the case of a maximization problem, an upper bound tells us the maximum
possible solution if we follow the given node. For example, in 0/1 knapsack we
used the greedy approach to find an upper bound.
• In the case of a minimization problem, a lower bound tells us the minimum
possible solution if we follow the given node. For example, in the job assignment
problem, we get a lower bound by assigning the least-cost job to each worker.

In branch and bound, the challenging part is figuring out a way to compute a bound
on the best possible solution. Below is an idea used to compute bounds for the
travelling salesman problem. The cost of any tour can be written as below.

Cost of a tour T = (1/2) * Σ (sum of the costs of the two edges adjacent to u and in
the tour T), where u ∈ V

For every vertex u, if we consider the two edges through it in T and sum their costs,
the overall sum for all vertices would be twice the cost of tour T (we have considered
every edge twice).

(Sum of the two tour edges adjacent to u) ≥ (sum of the two minimum-weight edges
adjacent to u)

Cost of any tour ≥ (1/2) * Σ (sum of the costs of the two minimum-weight edges
adjacent to u), where u ∈ V

For example, consider the above shown graph. Below are the minimum-cost two
edges adjacent to every node.

Node    Least cost edges    Total cost
0       (0, 1), (0, 2)      25
1       (0, 1), (1, 3)      35
2       (0, 2), (2, 3)      45
3       (0, 3), (1, 3)      45

Thus a lower bound on the cost of any tour = (1/2) × (25 + 35 + 45 + 45) = 75.
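
This root lower bound can be computed mechanically. A small C sketch is shown below; the cost matrix is inferred from the edge costs used in the example (c(0,1) = 10, c(0,2) = 15, c(0,3) = 20, c(1,3) = 25, c(2,3) = 30), with c(1,2) = 35 assumed:

#include <stdio.h>

#define N 4

// Cost matrix of the example graph (tour 0-1-3-2-0 costs 80).
int cost[N][N] = {
    {0, 10, 15, 20},
    {10, 0, 35, 25},
    {15, 35, 0, 30},
    {20, 25, 30, 0},
};

// Root lower bound: half the sum, over all vertices, of the two
// minimum-weight edges adjacent to each vertex.
int rootLowerBound(void) {
    int total = 0;
    for (int u = 0; u < N; u++) {
        int first = 1 << 30, second = 1 << 30;
        for (int v = 0; v < N; v++) {
            if (v == u) continue;
            if (cost[u][v] < first) {
                second = first;
                first = cost[u][v];
            } else if (cost[u][v] < second) {
                second = cost[u][v];
            }
        }
        total += first + second;   // 25, 35, 45, 45 for the example
    }
    return total / 2;              // (1/2) * 150 = 75
}

int main(void) {
    printf("Lower bound on any tour: %d\n", rootLowerBound());
    return 0;
}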

UNIT V NP-COMPLETE AND APPROXIMATION ALGORITHM

Tractable and intractable problems: Polynomial time algorithms - Venn


diagram representation – Non Deterministic algorithms - NP-hardness and NP-
completeness - Problem reduction: TSP - 3 CNF problem. Approximation
Algorithms: Bin Packing problem - Randomized Algorithms: concept and
application - primality testing - randomized quick sort.

5.1 Tractable and Intractable Problems

Tractable problems refer to computational problems that can be solved efficiently,
using algorithms that scale with the input size of the problem. In other words, the
time required to solve a tractable problem increases at most polynomially with the
input size.

On the other hand, intractable problems are computational problems for which no
known algorithm can solve them efficiently in the worst-case scenario. This means
that the time required to solve an intractable problem grows exponentially, or even
faster, with the input size.

One example of a tractable problem is computing the sum of a list of n numbers.
The time required to solve this problem scales linearly with the input size, as each
number can be added to a running total in constant time. Another example is
computing the shortest path between two nodes in a graph, which can be solved
efficiently using algorithms like Dijkstra's algorithm or the A* algorithm.

In contrast, some well-known intractable problems include the traveling salesman
problem, the knapsack problem, and the Boolean satisfiability problem. These
problems are NP-hard, meaning that any problem in NP (the set of problems that
can be solved in polynomial time using a non-deterministic Turing machine) can be
reduced to them in polynomial time. While it is possible to find approximate
solutions to these problems, there is no known algorithm that can solve them exactly
in polynomial time.
In summary, tractable problems are those that can be solved efficiently with
algorithms that scale well with the input size, while intractable problems are those
that cannot be solved efficiently in the worst-case scenario.

Examples of Tractable problems

1. Sorting: Given a list of n items, the task is to sort them in ascending or
descending order. Algorithms like QuickSort and MergeSort can solve this problem
in O(n log n) time complexity.

2. Matrix multiplication: Given two matrices A and B, the task is to find their
product C = AB. The best-known algorithm for matrix multiplication runs in
O(n^2.37) time complexity, which is considered tractable for practical applications.

3. Shortest path in a graph: Given a graph G and two nodes s and t, the task is
to find the shortest path between s and t. Algorithms like Dijkstra's algorithm and
the A* algorithm can solve this problem in O(m + n log n) time complexity, where
m is the number of edges and n is the number of nodes in the graph.
4. Linear programming: Given a system of linear constraints and a linear
objective function, the task is to find the values of the variables that optimize the
objective function subject to the constraints. Algorithms like the simplex method can
solve this problem in polynomial time.

5. Graph coloring: Given an undirected graph G, the task is to assign a color
to each node such that no two adjacent nodes have the same color, using relatively
few colors (not necessarily the minimum). The greedy algorithm can produce such a
coloring in O(n^2) time complexity, where n is the number of nodes in the graph.

These problems are considered tractable because algorithms exist that can solve
them in polynomial time complexity, which means that the time required to solve
them grows no faster than a polynomial function of the input size.

Examples of intractable problems

1. Traveling salesman problem (TSP): Given a set of cities and the distances
between them, the task is to find the shortest possible route that visits each city
exactly once and returns to the starting city. The best-known algorithms for solving
the TSP have an exponential worst-case time complexity, which makes the problem
intractable for large instances.

2. Knapsack problem: Given a set of items with weights and values, and a
knapsack that can carry a maximum weight, the task is to find the most valuable
subset of items that can be carried by the knapsack. The knapsack problem is also
NP-hard and is intractable for large instances of the problem.

3. Boolean satisfiability problem (SAT): Given a boolean formula in
conjunctive normal form (CNF), the task is to determine if there exists an assignment
of truth values to the variables that makes the formula true. The SAT problem is one
of the most well-known NP-complete problems, which means that any NP problem
can be reduced to SAT in polynomial time.

4. Subset sum problem: Given a set of integers and a target sum, the task is
to find a subset of the integers that sums up to the target sum. Like the knapsack
problem, the subset sum problem is also intractable for large instances of the
problem.

5. Graph isomorphism problem: Given two graphs G1 and G2, the task is to
determine if there is a one-to-one correspondence between their vertices that
preserves adjacency. No polynomial-time algorithm is known for this problem in
general.

The following problems, by contrast, can be solved by polynomial-time algorithms:

1. Linear search: Given a list of n items, the task is to find a specific item in
the list. The time complexity of linear search is O(n), which is a polynomial function
of the input size.
2. Bubble sort: Given a list of n items, the task is to sort them in ascending or
descending order. The time complexity of bubble sort is O(n^2), which is also a
polynomial function of the input size.

3. Shortest path in a graph: Given a graph G and two nodes s and t, the task
is to find the shortest path between s and t. Algorithms like Dijkstra's algorithm and
the A* algorithm can solve this problem in O(m + n log n) time complexity, which
is a polynomial function of the input size.
4. Maximum flow in a network: Given a network with a source node and a
sink node, and capacities on the edges, the task is to find the maximum flow from
the source to the sink. The Ford-Fulkerson algorithm can solve this problem in
O(mf), where m is the number of edges in the network and f is the maximum flow,
which is also a polynomial function of the input size.

5. Linear programming: Given a system of linear constraints and a linear


objective function, the task is to find the values of the variables that optimize the
objective function subject to the constraints. Algorithms like the simplex method can
solve this problem in polynomial time.
P (Polynomial) problems

P problems refer to problems where an algorithm would take a polynomial amount
of time to solve, i.e., where the Big-O is a polynomial (O(1), O(n), O(n²), etc.). These
are problems that would be considered 'easy' to solve, and thus do not generally
have immense run times.
NP (Non-deterministic Polynomial) Problems

For NP problems, no polynomial-time algorithm for finding a solution is known; the
run time of known exact algorithms can be something like O(n!) or much larger.
What characterizes NP, however, is that a proposed solution can be verified in
polynomial time.
NP-Hard Problems

A problem is classified as NP-Hard when an algorithm for solving it can be
translated to solve any NP problem. We can then say this problem is at least as hard
as any NP problem, but it could be much harder or more complex.
NP-Complete Problems

NP-Complete problems are problems that live in both the NP and NP-Hard classes.
This means that NP-Complete problems can be verified in polynomial time and that
any NP problem can be reduced to this problem in polynomial time.

5.1.1 Polynomial-Time Algorithm

A polynomial-time algorithm is an algorithm whose running time grows at most
as a polynomial function of the size of its input. The term is primarily used in the
field of computational complexity to describe problems that are solvable efficiently
in terms of computational resources.

Polynomial time is a cornerstone in theoretical computer science, helping to classify
and understand computational problems.

Complexity class    Description

NP-complete         A problem that is NP-hard but can be checked for validity in
                    polynomial time once a solution is known

NP-hard             A problem that is at least as difficult to solve as another
                    problem that is NP-hard

Examples of Polynomial-Time Algorithms:

1. Sorting Algorithms:
   o Merge Sort
   o Bubble Sort
2. Graph Algorithms:
   o Dijkstra's Algorithm for shortest paths
   o Kruskal's Algorithm for Minimum Spanning Tree
3. Searching Algorithms:
   o Binary Search
   o Linear Search
4. Mathematical Computations:
   o Matrix Multiplication
   o Euclid's Algorithm for GCD
5. Dynamic Programming:
   o Longest Common Subsequence
   o Knapsack Problem

Example of Polynomial-Time Algorithm in Pseudocode:


Example: Bubble Sort

BubbleSort(array):
    for i = 0 to length(array) - 1:
        for j = 0 to length(array) - i - 1:
            if array[j] > array[j + 1]:
                swap(array[j], array[j + 1])

Characteristics

1. Time Complexity Form: T(n) = O(n^k), where k is a constant.

2. Scalability: Polynomial growth is much more manageable than exponential
growth, making these algorithms practical for large inputs.
3. Classes of Problems: Problems solvable by polynomial-time algorithms belong
to the class P.

5.2 Venn diagram representation


Venn diagrams are used for the visual representation of relationships, as they
provide a clear, visual method for showing how different sets intersect, overlap, or
remain distinct. They are essential tools in mathematics and logic for illustrating
the relationships between sets. By employing intersecting circles or other shapes,
Venn diagrams visually represent how different sets overlap, share common
elements, or remain distinct.
Venn diagrams are commonly used to categorize items graphically, underscoring
their similarities and distinctions. In this section, we discuss the definition of the
Venn diagram, examples, and various terms related to it.

What is a Venn Diagram?


Venn diagrams are used to represent groups of data in circles. If the circles overlap,
some elements in the groups are common; if they do not overlap, there is nothing
common between the groups or sets of data.

A Venn diagram is a graphical representation of mathematical or logical
relationships between different sets. It consists of overlapping circles, each
representing a set, with the overlapping areas illustrating the common elements
shared by the sets.

Venn diagrams are used for the visual representation of relationships, as they
provide a clear, visual method for showing how different sets intersect, overlap, or
remain distinct. This helps in understanding complex relationships between sets in
mathematics.
In a Venn diagram, sets are represented as circles, and the circles are shown inside
a rectangle that represents the universal set. The universal set contains all the
circles, since it has all the elements present across all the sets.

NP Problem:

NP is the set of problems whose solutions are hard to find but easy to verify, and
which are solved by a non-deterministic machine in polynomial time.

NP-Hard Problem:
A problem X is NP-hard if there is an NP-complete problem Y such that Y is
reducible to X in polynomial time. NP-hard problems are as hard as NP-complete
problems. An NP-hard problem need not be in the class NP.

If every problem in NP can be polynomial-time reduced to a problem, that problem
is called NP-hard. In practice, NP-hardness is often shown by reducing a known
hard problem to the problem at hand.

Examples:
1. Hamiltonian cycle
2. Optimization problems
3. Shortest path

NP-Complete Problem:
A problem X is NP-complete if there is an NP problem Y such that Y is reducible
to X in polynomial time. NP-complete problems are as hard as NP problems. A
problem is NP-complete if it is part of both the NP and NP-hard classes. A
non-deterministic Turing machine can solve an NP-complete problem in polynomial
time.
A problem is NP-complete when it is both in NP and NP-hard; this means
NP-complete problems can be verified in polynomial time.
Examples:
1. Decision problems
2. Regular graphs

Difference between NP-Hard and NP-Complete:

• NP-Hard problems (say X) can be solved if and only if there is an NP-Complete
problem (say Y) that can be reduced to X in polynomial time; NP-Complete
problems can be solved by a non-deterministic algorithm/Turing machine in
polynomial time.

• To solve an NP-Hard problem, it does not have to be in NP; to be NP-Complete,
a problem must be both an NP and an NP-Hard problem.

• For NP-Hard problems the verification time is unknown; for NP-Complete
problems it is known to be polynomial, since they are in NP.

• NP-Hard problems need not be decision problems; NP-Complete problems are
exclusively decision problems.

• Not all NP-Hard problems are NP-Complete; all NP-Complete problems are
NP-Hard.

• NP-Hard is typically used for optimization problems; NP-Complete is used for
decision problems.

• Examples of NP-Hard problems: the halting problem, the vertex cover problem,
etc. Examples of NP-Complete problems: determining whether a graph has a
Hamiltonian cycle, determining whether a Boolean formula is satisfiable, the
circuit-satisfiability problem, etc.

5.3 Non-Deterministic Algorithm


A non-deterministic algorithm is an algorithm that can exhibit different behaviors
on different runs, even for the same input. Unlike deterministic algorithms, which
produce the same output for a given input, non-deterministic algorithms can
produce different outputs due to inherent randomness or multiple execution paths.

Key Principles

Non-deterministic algorithms are characterized by their ability to make arbitrary
choices during execution. These choices can lead to different outcomes, making it
challenging to predict the exact behavior of the algorithm. The main components
of non-deterministic algorithms include:
• Choice(X): selects any value randomly from the set X.
• Failure(): indicates an unsuccessful solution.
• Success(): indicates a successful solution and terminates the current thread.

As an example, a non-deterministic search for a value x in an array A[1..n] can be
written as:

j = Choice(1, n)
if (A[j] == x) then
{
    write(j);
    Success();
}
write(0);
Failure();

NP-Hardness

In computational complexity theory, a computational problem H is called NP-hard


if, for every problem L which can be solved in non-deterministic polynomial-time,
there is a polynomial- time reduction from L to H. That is, assuming a solution for
H takes 1 unit time, H's solution can be used to solve L in polynomial time. As a
consequence, finding a polynomial time algorithm to solve a single NP-hard
problem would give polynomial time algorithms for all the problems in the
complexity class NP. As it is suspected, but unproven, that P≠NP, it is unlikely that
any polynomial-time algorithms for NP-hard problems exist.

A simple example of an NP-hard problem is the subset sum problem.


Informally, if H is NP-hard, then it is at least as difficult to solve as the problems
in NP. However, the opposite direction is not true: some problems are undecidable,
and therefore even more difficult to solve than all problems in NP, but they are
probably not NP-hard (unless P=NP).

NP-complete problems are a subset of the larger class of NP (nondeterministic
polynomial time) problems. These problems are significant in computational
complexity theory because they represent some of the most challenging problems
to solve efficiently.

Definition and Key Principles

A problem is classified as NP-complete if it satisfies two conditions:


1. It is in NP: This means that any solution to the problem can be
verified quickly (in polynomial time) by a deterministic machine.
2. Every problem in NP can be reduced to it in polynomial time: This
implies that if we can solve
an NP-complete problem in polynomial time, we can solve all NP problems in
polynomial time.

Decision vs. Optimization Problems

NP-completeness applies to decision problems, which have a yes or no answer.


However, solving a decision problem in polynomial time often allows us to solve
the corresponding optimization problem in polynomial time as well. For example,
the vertex cover problem can be framed as a decision problem: given a graph G and a
number k, is there a vertex cover of size at most k?
Reduction

Reduction is a technique used to prove that a problem is NP-complete. If we can
transform a known NP-complete problem into another problem in polynomial time,
the new problem is also NP-complete. This method leverages the transitivity of
reduction. The Boolean satisfiability problem (SAT) was the first problem proven
to be NP-complete, so it is a common starting point for such reductions.

Examples of NP-Complete Problems


Some well-known NP-complete problems include:
Knapsack problem
Hamiltonian path problem
Travelling salesman problem (decision version)
Subset sum problem
Graph coloring problem

Solving NP-Complete Problems


Currently, no polynomial-time algorithms are known for NP-complete problems.

However, several techniques can be applied to tackle these problems:
Approximation: Finding a solution close to the optimal one.
Randomization: Using randomness to achieve faster average running times.
Restriction: Limiting the input structure to make the problem easier.
Parameterization: Fixing certain parameters to simplify the problem.
Heuristics: Employing algorithms that work well in practice but lack guaranteed performance.

Importance of NP-Completeness

Understanding NP-completeness is crucial for computer scientists and engineers. It


helps in recognizing problems that are unlikely to have efficient solutions and
guides the search for approximate or heuristic methods.
Problem Reduction

A reduction is a way of transforming/converting one problem into another problem.

Suppose you have a problem A which you do not know how to solve. However,
you know an algorithm to solve problem B. If you can "transform" an instance α of
problem A into an instance β of problem B, you can use the known algorithm for B
to solve the "transformed" instance β, and obtain the solution for α from the
solution β, by "reversing the transformation". We then say that A reduces to B.
Why Are Reductions Important?

Through reductions, we have another tool to solve problems.

Furthermore, with an efficient, polynomial-time "transformation", reductions


provide the ability to compare the "hardness" of two problems A and B.

Polynomial-time Reductions

Given two decision problems A and B, a polynomial-time reduction


from A to B, denoted A ≤p B, is a transformation from instance α of A to
instance β of B such that:

1. α is a YES-instance for A if and only if β is a YES-


instance for B. (A YES-instance is an instance of the decision problem whose answer
is YES/True).
2. The transformation takes polynomial time in the size of α.

TRAVELING-SALESPERSON-PROBLEM
Given an undirected complete graph with non-negative weights on its edges and a
bound b ∈ R+, the TRAVELING-SALESPERSON-PROBLEM (TSP; previously the
travelling-salesman problem) asks:
Does there exist a tour of cost at most b?

(For example, an instance may be a YES-instance for b = 108 but a NO-instance for b = 107 when the optimal tour cost is OPT = 108.)

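As a concrete illustration of a polynomial-time reduction, consider the classic textbook reduction from HAMILTONIAN-CYCLE to TSP. The Python sketch below (an illustrative construction; the adjacency-matrix input format is an assumption) gives every edge of G weight 1 and every non-edge weight 2, so that G has a Hamiltonian cycle if and only if the resulting complete graph has a tour of cost at most n:

def ham_cycle_to_tsp(adj):
    # adj: n x n 0/1 adjacency matrix of an undirected graph G.
    # Returns (w, b): a complete weighted graph and a bound b such that
    # G has a Hamiltonian cycle iff the TSP instance has a tour of cost <= b.
    n = len(adj)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1 if adj[i][j] else 2   # non-edges are made expensive
    return w, n   # a Hamiltonian cycle uses exactly n edges of weight 1

The transformation clearly runs in polynomial time, O(n²), and a tour of cost at most n can use only weight-1 edges, i.e., edges of G.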

C-SAT
The CIRCUIT-SATISFIABILITY problem (C-SAT) is the decision problem of
determining whether a given Boolean circuit - essentially a Directed Acyclic Graph
- (using AND ∧, OR ∨, NOT ¬ gates) has an assignment of its n binary inputs that
makes the output True (1). In other words, it asks whether the n inputs to a given
Boolean circuit can be consistently set to True (1) or False (0) such that the
circuit outputs True (1). If that is the case, the circuit is called satisfiable.
Otherwise, the circuit is called unsatisfiable.

CNF-SAT

The CONJUNCTIVE-NORMAL-FORM-SATISFIABILITY (CNF-SAT) problem, sometimes just
called SATISFIABILITY (SAT), is the decision problem of determining whether a
given formula Φ that is a conjunction (AND ∧) of clauses, e.g.,
Φ = C1 ∧ C2 ∧ C3 ∧ C4, has a satisfying truth assignment.

Definitions:
Literal: a Boolean variable or its negation, e.g., xi, x̄i.
Clause: a disjunction (OR ∨) of literals, e.g., Cj = x1 ∨ x̄2 ∨ x3.

3-SAT

3-CNF-SAT(isfiability), usually just called the 3-SAT problem, is CNF-SAT where
each clause contains exactly 3 literals corresponding to different variables.

Example: Φ = (x̄1 ∨ x2 ∨ x3) ∧ (x1 ∨ x̄2 ∨ x3) ∧ (x̄1 ∨ x2 ∨ x4) is a YES-instance.
(Sample certificate: x1 = x2 = x4 = True, x3 = False).
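This also illustrates why 3-SAT is in NP: a certificate can be checked in time linear in the formula size. A minimal Python sketch (the +i/-i literal encoding is an assumption made for illustration) that verifies the sample certificate against Φ:

# clauses as lists of literals: +i means xi, -i means the negation of xi
phi = [[-1, 2, 3], [1, -2, 3], [-1, 2, 4]]
assignment = {1: True, 2: True, 3: False, 4: True}   # sample certificate

def satisfies(clauses, assign):
    # every clause must contain at least one true literal
    return all(any(assign[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

print(satisfies(phi, assignment))   # True, so Φ is a YES-instance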

5.4 Approximation Algorithms

Overview :
An approximation algorithm is a way of dealing with NP-completeness for an
optimization problem. This technique does not guarantee the best solution. The goal
of the approximation algorithm is to come as close as possible to the optimal
solution in polynomial time. Such algorithms are called approximation algorithms
or heuristic algorithms.

Features of Approximation Algorithm :


Here, we will discuss the features of the Approximation Algorithm as follows.

• An approximation algorithm is guaranteed to run in polynomial time, though it
does not guarantee the optimal solution.
• An approximation algorithm guarantees a solution of provably high quality
(say, within 1% of the optimum).
• Approximation algorithms are used to get an answer near the (optimal)
solution of an optimization problem in polynomial time.

Performance Ratios for approximation algorithms:
Here, we will discuss the performance ratios of the Approximation Algorithm as
follows.

Scenario-1:

1. Suppose that we are working on an optimization problem in which each
potential solution has a cost, and we wish to find a near-optimal solution. Depending
on the problem, an optimal solution may be defined as one with maximum possible
cost or one with minimum possible cost, i.e., the problem can either be a maximization
or a minimization problem.

2. We say that an algorithm for a problem has an approximation ratio of P(n) if, for
any input size n, the cost C of the solution produced by the algorithm is within a
factor of P(n) of the cost C* of an optimal solution:

max(C/C*, C*/C) ≤ P(n)

Scenario-2:
If an algorithm reaches an approximation ratio of P(n), then we call it a
P(n)-approximation algorithm.

• For a maximization problem, 0 < C ≤ C*, and the ratio C*/C gives the factor
by which the cost of an optimal solution is larger than the cost of the approximate
solution.
• For a minimization problem, 0 < C* ≤ C, and the ratio C/C* gives the factor
by which the cost of an approximate solution is larger than the cost of an optimal
solution.

For instance, a 2-approximation algorithm for a minimization problem always returns
a solution whose cost is at most twice the optimum.

Some examples of the Approximation algorithm :


Here, we will discuss some examples of the Approximation Algorithm as follows.
The Vertex Cover Problem –
In the vertex cover problem, the optimization problem is to find the vertex cover
with the fewest vertices, and the approximation problem is to find a vertex cover
with few vertices (a simple 2-approximation is sketched below).
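A minimal Python sketch of the standard 2-approximation (repeatedly pick an uncovered edge and take both of its endpoints; the edge-list input format is an assumption for illustration):

def approx_vertex_cover(edges):
    # greedy matching-based 2-approximation for vertex cover
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))   # take both endpoints of an uncovered edge
    return cover

print(approx_vertex_cover([(0, 1), (1, 2), (2, 3), (3, 0)]))
# e.g. {0, 1, 2, 3}; an optimal cover such as {0, 2} has size 2, within the 2x bound

The chosen edges form a matching, and any cover must contain at least one endpoint of each matched edge, so the returned cover is at most twice the optimal size.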

Travelling Salesman Problem –


In the traveling salesperson problem, the optimization problem is to find
the shortest cycle, and the approximation problem is to find a short cycle.

The Set Covering Problem –


This is an optimization problem that models many problems that require resources
to be allocated. Here, a logarithmic approximation ratio is used.

The Subset Sum Problem –


In the subset sum problem, the optimization problem is to find a subset of
{x1, x2, x3, …, xn} whose sum is as large as possible but not larger than the
target value t.

5.4.1 Travelling salesman problem

It is established that finding perfect optimal solutions to the travelling
salesperson problem is not possible in polynomial time (unless P = NP).

Therefore, an approximation algorithm is expected to find a near-optimal solution
for this NP-Hard problem. However, such an algorithm can be devised only if
the cost function (which is defined as the distance between two plotted points) in
the problem satisfies the triangle inequality.

The triangle inequality is satisfied only if the cost function c, for all the
vertices u, v and w of a triangle, satisfies the following equation:

c(u, w) ≤ c(u, v) + c(v, w)

It is usually automatically satisfied in many applications.


Travelling Salesperson Approximation Algorithm

The travelling salesperson approximation algorithm requires some prerequisite


algorithms to be performed so we can achieve a near optimal solution. Let us look
at those prerequisite algorithms briefly −

Minimum Spanning Tree − The minimum spanning tree is a tree that contains all the
vertices of the main graph, connected by edges of minimum total weight. We apply
Prim's algorithm for the minimum spanning tree in this case.

Pre-order Traversal − The pre-order traversal is done on tree data structures where
a pointer is walked through all the nodes of the tree in a [root – left child –
right child] order.

Algorithm

Step 1 − Choose any vertex of the given graph randomly as the starting and ending
point.

Step 2 − Construct a minimum spanning tree of the graph, with the chosen vertex as
the root, using Prim's algorithm.

Step 3 − Once the spanning tree is constructed, pre-order traversal is performed


on the minimum spanning tree obtained in the previous step.

Step 4 − The pre-order walk obtained, closed by returning to the starting vertex,
is the Hamiltonian cycle (tour) of the travelling salesperson.

Pseudocode

APPROX_TSP(G, c)
    select a vertex r of G to be the root
    T <- MST_Prim(G, c, r)          // minimum spanning tree rooted at r
    H <- Preorder_Traversal(T)      // list the vertices of T in preorder
    return the cycle H, r           // the preorder list, closed at r, is the tour
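A compact Python sketch of the whole scheme (an illustrative implementation assuming the input is a symmetric distance matrix satisfying the triangle inequality; Prim's algorithm and the preorder walk are inlined):

def approx_tsp(dist, r=0):
    # 2-approximation for metric TSP: build an MST with Prim's algorithm,
    # then output the vertices in preorder, returning to the root.
    n = len(dist)
    in_tree = [False] * n
    parent = [None] * n
    best = [float('inf')] * n
    best[r] = 0
    for _ in range(n):                       # Prim's algorithm
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    children = [[] for _ in range(n)]        # build the rooted MST
    for v in range(n):
        if parent[v] is not None:
            children[parent[v]].append(v)
    tour, stack = [], [r]                    # iterative preorder traversal
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [r]                        # close the cycle at the root

d = [[0, 2, 3, 2],
     [2, 0, 2, 3],
     [3, 2, 0, 2],
     [2, 3, 2, 0]]
print(approx_tsp(d))   # e.g. [0, 1, 2, 3, 0]

By the triangle inequality, short-cutting the full walk of the MST into this preorder tour at most doubles the MST cost, which is itself a lower bound on the optimal tour; hence the tour costs at most twice the optimum.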

5.5 Bin Packing problem

The bin packing problem involves packing n items of different weights into bins,
each of capacity c, such that the total number of bins used is minimized. It may be
assumed that all items have weights smaller than the bin capacity.

The following 4 algorithms depend on the order of their inputs: they pack the item
given first and then move on to the next item.

Next Fit algorithm


The simplest approximate approach to the bin packing problem is the Next-Fit
(NF) algorithm, explained below. The first item is assigned to bin 1. Items 2, ..., n
are then considered in order of increasing index: each item is assigned to the
current bin, if it fits; otherwise, it is assigned to a new bin, which becomes the
current one.

Visual Representation

Let us consider the following example with bins of size 1.

Assuming the sizes of the items are {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
the minimum number of bins required would be
Ceil((Total Weight)/(Bin Capacity)) = Ceil(3.7/1) = 4 bins.

The Next Fit solution (NF(I)) for this instance I would be:

Considering the 0.5 sized item first, we can place it in the first bin.

Moving on to the 0.7 sized item, we cannot place it in the first bin. Hence we place it
in a new bin.

Moving on to the 0.5 sized item, we cannot place it in the current bin. Hence we
place it in a new bin.

Moving on to the 0.2 sized item, we can place it in the current (third) bin.

Similarly, placing all the other items following the Next-Fit algorithm we get-

Thus we need 6 bins, as opposed to the 4 bins of the optimal solution. Thus we
can see that this algorithm is not very efficient.

Analyzing the approximation ratio of Next-Fit algorithm

The time complexity of the algorithm is clearly O(n). It is easy to prove that, for
any instance I of BPP, the solution value NF(I) provided by the algorithm
satisfies the bound

NF(I) < 2·z(I)

where z(I) denotes the optimal solution value: the contents of any two consecutive
bins sum to more than the capacity, so less than half of the used capacity is wasted.
Furthermore, there exist instances for which the ratio NF(I)/z(I) is arbitrarily
close to 2, i.e. the worst-case approximation ratio of NF is r(NF) = 2.
Pseudocode

NEXTFIT(size[], n, c)
// size[] is the array containing the sizes of the items,
// n is the number of items and c is the capacity of each bin
{
    // res counts the bins used; bin_rem is the space left in the current bin.
    // bin_rem starts at 0 so that the very first item opens the first bin.
    res = 0; bin_rem = 0;
    // place items one by one
    for (int i = 0; i < n; i++) {
        // if this item can't fit in the current bin, open a new bin
        if (size[i] > bin_rem) {
            res++;
            bin_rem = c;
        }
        bin_rem -= size[i];
    }
    return res;
}

First Fit algorithm


A better algorithm, First-Fit (FF), considers the items according to increasing
indices and assigns each item to the lowest-indexed initialized bin into which it
fits; only when the current item cannot fit into any initialized bin is a new bin
introduced.

Visual Representation

Let us consider the same example as above, with bins of size 1.

Assuming the sizes of the items are {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
the minimum number of bins required would be
Ceil((Total Weight)/(Bin Capacity)) = Ceil(3.7/1) = 4 bins.

The First Fit solution (FF(I)) for this instance I would be:

Considering the 0.5 sized item first, we can place it in the first bin.

Moving on to the 0.7 sized item, we cannot place it in the first bin. Hence we
place it in a new bin.

Moving on to the 0.5 sized item, we can place it in the first bin.

Moving on to the 0.2 sized item, we cannot place it in the first bin; we check
the second bin and can place it there.

Moving on to the 0.4 sized item, we cannot place it in any existing bin. Hence
we place it in a new bin.

Thus we need 5 bins, as opposed to the 4 bins of the optimal solution; still,
First Fit is much more efficient than the Next-Fit algorithm.

Analyzing the approximation ratio of the First-Fit algorithm

If FF(I) is the First-Fit solution for instance I and z(I) is the optimal
solution, then it can be seen that First Fit never uses more than 1.7·z(I) bins
(up to an additive constant). So First Fit is better than Next Fit in terms of the
upper bound on the number of bins.
Pseudocode

FIRSTFIT(size[], n, c)
// size[] is the array containing the sizes of the items,
// n is the number of items and c is the capacity of each bin
{
    // initialize result (count of bins)
    res = 0;
    // array to store the remaining space in bins; there can be at most n bins
    bin_rem[n];
    // place items one by one
    for (int i = 0; i < n; i++) {
        // find the first bin that can accommodate size[i]
        int j;
        for (j = 0; j < res; j++) {
            if (bin_rem[j] >= size[i]) {
                bin_rem[j] = bin_rem[j] - size[i];
                break;
            }
        }
        // if no existing bin could accommodate size[i], open a new bin
        if (j == res) {
            bin_rem[res] = c - size[i];
            res++;
        }
    }
    return res;
}

Best Fit algorithm

The next algorithm, Best-Fit (BF), is obtained from FF by assigning the current
item to the feasible bin (if any) having the smallest residual capacity (breaking
ties in favor of the lowest indexed bin).

Simply put, the idea is to place the next item in the tightest spot; that is, put
it in the bin so that the smallest empty space is left.
Visual Representation

Let us consider the same example as above, with bins of size 1.

Assuming the sizes of the items are {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
the minimum number of bins required would be
Ceil((Total Weight)/(Bin Capacity)) = Ceil(3.7/1) = 4 bins.

The Best Fit solution (BF(I)) for this instance I would be:

Considering the 0.5 sized item first, we can place it in the first bin.

Moving on to the 0.7 sized item, we cannot place it in the first bin. Hence we
place it in a new bin.

Moving on to the 0.5 sized item, we can place it in the first bin tightly.

Moving on to the 0.2 sized item, we cannot place it in the first bin, but we can
place it in the second bin tightly.

Moving on to the 0.4 sized item, we cannot place it in any existing bin. Hence
we place it in a new bin.

Similarly, placing all the other items following the Best-Fit algorithm, we get:

Thus we need 5 bins, as opposed to the 4 bins of the optimal solution; Best Fit
is also much more efficient than the Next-Fit algorithm.

Analyzing the approximation ratio of Best-Fit algorithm

It can be noted that Best-Fit (BF) is obtained from FF by assigning the current
item to the feasible bin (if any) having the smallest residual capacity (breaking
ties in favour of the lowest indexed bin). BF satisfies the same worst-case
bounds as FF.

Analysis of the upper bound of the Best-Fit algorithm

If z(I) is the optimal number of bins, then Best Fit never uses more than
2·z(I) − 2 bins. So Best Fit is the same as Next Fit in terms of the upper bound
on the number of bins.
Pseudocode

BESTFIT(size[], n, c)
// size[] is the array containing the sizes of the items,
// n is the number of items and c is the capacity of each bin
{
    // initialize result (count of bins)
    res = 0;
    // array to store the remaining space in bins; there can be at most n bins
    bin_rem[n];
    // place items one by one
    for (int i = 0; i < n; i++) {
        // find the best bin that can accommodate size[i]:
        // track the minimum space that would be left and the index of that bin
        int j, bi = 0;
        int min = c + 1;
        for (j = 0; j < res; j++) {
            if (bin_rem[j] >= size[i] && bin_rem[j] - size[i] < min) {
                bi = j;
                min = bin_rem[j] - size[i];
            }
        }
        // if no bin could accommodate size[i], create a new bin
        if (min == c + 1) {
            bin_rem[res] = c - size[i];
            res++;
        }
        else {
            // assign the item to the best bin
            bin_rem[bi] -= size[i];
        }
    }
    return res;
}
In the offline version, we have all the items at our disposal from the start of
the execution. The natural solution is to sort the array from largest to smallest,
and then apply the algorithms discussed above.

NOTE: In the online setting the items must be processed in the order they arrive;
in the examples above the inputs were given upfront for simplicity.

Let us look at the various offline algorithms.
1) First Fit Decreasing

We first sort the array of items in decreasing order of size and then apply the
First-Fit algorithm as discussed above.

Algorithm

• Read the inputs of items
• Sort the array of items in decreasing order by their sizes
• Apply the First-Fit algorithm

Visual Representation

Let us consider the same example as above, with bins of size 1.

Assuming the sizes of the items are {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
sorting them we get {0.7, 0.6, 0.5, 0.5, 0.5, 0.4, 0.2, 0.2, 0.1}.

The First fit Decreasing solution would be-


We will start with 0.7 and place it in the first bin

We then select the 0.6 sized item. We cannot place it in bin 1, so we place it in
bin 2.

We then select a 0.5 sized item. We cannot place it in any existing bin, so we
place it in bin 3.

We then select the next 0.5 sized item. We can place it in bin 3.

Doing the same for all items, we get:

Thus only 4 bins are required, which is the same as the optimal solution.

2) Best Fit Decreasing

We first sort the array of items in decreasing order of size and then apply the
Best-Fit algorithm as discussed above.

Algorithm

• Read the inputs of items
• Sort the array of items in decreasing order by their sizes
• Apply the Best-Fit algorithm

Visual Representation

Let us consider the same example as above, with bins of size 1.

Assuming the sizes of the items are {0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6},
sorting them we get {0.7, 0.6, 0.5, 0.5, 0.5, 0.4, 0.2, 0.2, 0.1}.

The Best Fit Decreasing solution would be:

We will start with 0.7 and place it in the first bin.

We then select the 0.6 sized item. We cannot place it in bin 1, so we place it in
bin 2.

We then select a 0.5 sized item. We cannot place it in any existing bin, so we
place it in bin 3.

We then select the next 0.5 sized item. We can place it in bin 3.

Doing the same for all items, we get:

Thus only 4 bins are required, which is the same as the optimal solution.
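The following Python sketch (illustrative code, not from the original notes; a small tolerance EPS is used to sidestep floating-point round-off) runs Next Fit, First Fit and First Fit Decreasing on the running example and reproduces the bin counts derived above:

EPS = 1e-9   # tolerance for floating-point comparisons

def next_fit(sizes, c):
    bins, rem = 0, 0.0
    for s in sizes:
        if s > rem + EPS:        # item does not fit: open a new bin
            bins, rem = bins + 1, c
        rem -= s
    return bins

def first_fit(sizes, c):
    rem = []                     # remaining capacity of each open bin
    for s in sizes:
        for i in range(len(rem)):
            if rem[i] >= s - EPS:    # lowest-indexed bin that fits
                rem[i] -= s
                break
        else:
            rem.append(c - s)    # no open bin fits: open a new one
    return len(rem)

items = [0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1, 0.6]
print(next_fit(items, 1))                         # 6
print(first_fit(items, 1))                        # 5
print(first_fit(sorted(items, reverse=True), 1))  # 4 (First Fit Decreasing)

The Best-Fit variants can be written analogously by scanning for the feasible bin with the smallest residual capacity instead of the first feasible one.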
5.6 Randomized Algorithms

Randomized algorithms are a fascinating subset of computational strategies that utilize


randomness as a fundamental component in their logic. These algorithms can achieve
significant improvements in performance and efficiency across a variety of complex
problems.

Randomized algorithms are computational processes that utilize random numbers at


one or more points in their logic, allowing for a degree of unpredictability in their
operation. They provide solutions to problems where deterministic algorithms may be
inefficient or infeasible, often delivering results more quickly and with less
computational overhead.
The core idea behind these algorithms involves making decisions based on random
inputs or choices. This randomness can lead to different outcomes with each execution,
which, depending on the algorithm, can be acceptable if the average performance over
many runs remains favorable. Often, randomized algorithms are designed to output
correct results with high probability, offering a practical approach to complex
problems.

A common example of a randomized algorithm is the QuickSort sorting algorithm,
which chooses a pivot element randomly. This pivot selection helps distribute the
data more evenly across recursive calls, leading to improved average performance,
particularly in scenarios involving large datasets. Understanding the fundamentals
of randomized algorithms is essential for recognizing their wide-ranging
applications in fields such as computer science, cryptography, and optimization.

5.7 Primality Testing

Definition

Primality testing is the process used to determine whether a given number n is a
prime number or not. A prime number is defined as a natural number greater than 1
that has no divisors other than 1 and itself. For example, numbers like 2, 3, 5,
and 7 are prime, while 4 and 6 are not because they have divisors other than 1 and
themselves.

Methods for Primality Testing

Primality testing can be carried out using deterministic and probabilistic algorithms:

1. Deterministic Methods

• These algorithms give a definite answer about the primality of a number.

• Trial Division:

• The simplest approach, where a number n is tested for divisibility by all
integers from 2 up to √n.

• If any divisor is found, n is composite; otherwise, it is prime.

• The time complexity is O(√n).

• Sieve of Eratosthenes:

• Generates all prime numbers up to a given limit.

• Starting from 2, it marks all multiples of each prime number as non-prime.

• Efficient for finding all primes up to a specified limit, but not suitable for
checking large individual numbers.

2. Probabilistic Methods

• These methods give a high probability of correctness rather than a definite answer.

• Fermat's Primality Test:

• Uses Fermat's Little Theorem, which states that if n is prime, then for any
integer a with 1 < a < n, a^(n-1) ≡ 1 (mod n).

• If this condition holds for several randomly chosen values of a, then n
is probably prime.

• Miller-Rabin Primality Test:

• A refinement of Fermat's test that reduces the number of false positives
(e.g., on Carmichael numbers).

• Tests whether n satisfies specific properties related to modular arithmetic for
several random values of a.

• Increasing the number of iterations reduces the probability of error.

Example

For n = 29, trial division can be used:

• Check divisibility by all integers from 2 up to √29 ≈ 5.4, i.e., 2, 3, and 5.

• Since 29 is not divisible by any of these, it is prime.
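The two approaches can be sketched in a few lines of Python (illustrative code; the number of Miller-Rabin rounds k is a tunable parameter):

import random

def is_prime_trial(n):
    # trial division: test divisibility by 2 .. floor(sqrt(n))
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_probably_prime(n, k=20):
    # Miller-Rabin probabilistic primality test with k random rounds
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    r, d = 0, n - 1
    while d % 2 == 0:            # write n - 1 as 2^r * d with d odd
        r += 1
        d //= 2
    for _ in range(k):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False         # a witnesses that n is composite
    return True                  # probably prime

print(is_prime_trial(29), is_probably_prime(29))   # True True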

Explanation

Primality testing plays a crucial role in cryptography, where large primes are used
for encryption algorithms such as RSA. The efficiency of probabilistic tests like
Miller-Rabin makes them suitable for cryptographic applications.

Time and Space Complexity

• Trial Division: O(√n) time, O(1) space.

• Sieve of Eratosthenes: O(n log log n) time, O(n) space.

• Miller-Rabin Test: O(k log³ n) time, where k is the number of iterations;
O(log n) space.

5.8 Randomized Quick Sort

Definition

Randomized Quick Sort is a variation of the traditional Quick Sort algorithm, where
the pivot element is chosen randomly. This randomization helps to avoid the
worst-case time complexity scenario by improving the likelihood of evenly splitting
the array, making the sorting process more efficient.

Algorithm
1. Choose a Pivot Randomly:

• Select a pivot element at random from the array.

• Swap the pivot element with the last element.

2. Partition the Array:

• Rearrange the array so that elements smaller than the pivot are on theleft, and

elements larger than the pivot are on the right.

• Move the pivot to its correct sorted position.

3. Recursively Apply the Procedure:

• Apply Randomized Quick Sort to the left and right subarrays.

• Continue until the array is fully sorted.

Example

Let’s sort the array [3, 6, 8, 10, 1, 2, 1]:

1. Randomly select a pivot, say 2, and swap it with the last element.

• Array becomes [3, 6, 8, 10, 1, 1, 2].

2. Partition the array around 2:

• Elements less than 2: [1, 1].


• Elements greater than 2: [3, 6, 8, 10].

• Rearrange the array as [1, 1, 2, 3, 6, 8, 10].

3. Recursively sort the subarrays:

• Left subarray [1, 1] is already sorted.

• Sort the right subarray [3, 6, 8, 10].
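A short Python sketch of the procedure (an illustrative implementation using the Lomuto partition scheme):

import random

def randomized_quicksort(a, lo=0, hi=None):
    # sort list a in place using a randomly chosen pivot
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = random.randint(lo, hi)           # step 1: random pivot, moved to the end
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo                               # step 2: Lomuto partition
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]            # pivot lands in its sorted position
    randomized_quicksort(a, lo, i - 1)   # step 3: recurse on both sides
    randomized_quicksort(a, i + 1, hi)

data = [3, 6, 8, 10, 1, 2, 1]
randomized_quicksort(data)
print(data)   # [1, 1, 2, 3, 6, 8, 10]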

Explanation

Randomized Quick Sort helps avoid the worst-case scenario of O(n²), which occurs
when the pivot consistently divides the array into unbalanced parts. By randomizing
the pivot selection, the algorithm ensures a high probability of good splits,
leading to an average-case time complexity of O(n log n).

Time and Space Complexity

• Average-Case Time Complexity: O(n log n), where n is the number of elements.

• Worst-Case Time Complexity: O(n²), which occurs when the pivot choice results in
unbalanced partitions.

• Space Complexity: O(log n), due to the recursion stack.

Randomized Quick Sort combines the efficiency of Quick Sort with the benefits of
randomization, making it an excellent choice for many practical sorting tasks.
