Searching and Sorting
Searching
• Two types of searches (topics covered in Unit 1):
• Linear Search
• Binary Search
Sorting
• The arrangement of data in a preferred order is called sorting in the
data structure. By sorting data, it is easier to search through it quickly
and easily. The simplest example of sorting is a dictionary. Before the
era of the Internet, when you wanted to look up a word in a
dictionary, you would do so in alphabetical order. This made it easy.
• Imagine the panic if you had to go through a big book containing all the
English words in a jumbled order! It is the same panic an engineer will
go through if their data is not sorted and structured.
External Sorting
• External sorting is a term for a class of sorting algorithms that can
handle massive amounts of data.
• External sorting is required when the data being sorted do not fit into
the main memory of a computing device (usually RAM) and instead,
they must reside in the slower external memory (usually a hard
drive).
• External sorting typically uses a hybrid sort-merge strategy. In the
sorting phase, chunks of data small enough to fit in main memory are
read, sorted, and written out to a temporary file. In the merge phase,
the sorted sub-files are combined into a single larger file.
External Sorting
• External sorting algorithms generally fall into two types,
distribution sorting, which resembles quicksort, and external
merge sort, which resembles merge sort.
Internal Sorting
• An internal sort is any data sorting process that takes place entirely
within the main memory of a computer. This is possible whenever the
data to be sorted is small enough to all be held in the main memory.
• Some common internal sorting algorithms include:
1.Bubble Sort
2.Insertion Sort
3.Quick Sort
4.Heap Sort
5.Radix Sort
6.Selection sort
Bubble Sort
• Bubble Sort is one of the simplest algorithms used to sort a given set of n
elements provided in the form of an array.
• Bubble Sort compares the elements one by one and sorts them based on
their values.
• If the given array has to be sorted in ascending order, then bubble sort will
start by comparing the first element of the array with the second element,
if the first element is greater than the second element, it will swap both
the elements, and then move on to compare the second and the third
element, and so on.
• If we have a total of n elements, then we need to repeat this process n-1
times.
• It is known as bubble sort because, with every complete iteration, the
largest element in the given array bubbles up towards the last place (the
highest index), just like a water bubble rises to the water surface.
• Sorting takes place by stepping through all the elements one by one,
comparing each with its adjacent element and swapping them if required.
Bubble Sort
• Idea of Bubble Sort:
• Given an array of n items
1. Compare each pair of adjacent items
2. Swap if the items are out of order
3. Repeat until the end of the array
◼ The largest item will now be at the last position
4. Reduce n by 1 and go to Step 1
• Analogy
◼ A large item is like a “bubble” that floats to the end of the array
Bubble Sort Algorithm
Algorithm bubbleSort (int arr[], int n)
{ // This algorithm accepts an array arr[n] consisting of n elements
for i := 0 to n-2 step 1 do
{
flag := 0;
for j := 0 to n-i-2 step 1 do
{
if( arr[j] > arr[j+1])
{// swap the elements
temp := arr[j];
arr[j] := arr[j+1];
arr[j+1] := temp;
flag := 1;
}
}
if flag = 0 then // no swaps in this pass: the array is already sorted
break;
}
}
Bubble Sort Program
void bubbleSort(int arr[], int n)
{
int i, j, temp, flag = 1;
for(i = 0; i < n-1 && flag == 1; i++)
{ flag = 0;
for(j = 0; j < n-i-1; j++)
{
if( arr[j] > arr[j+1])
{
// swap the elements
temp = arr[j];
arr[j] = arr[j+1];
arr[j+1] = temp;
flag = 1;
}
}
}
}
Bubble Sort Program
#include <stdio.h>

int main()
{
int i, n, x[25];
printf("\n Enter the number of elements: ");
scanf("%d", &n);
printf("\n Enter Data:");
for(i = 0; i < n; i++)
scanf("%d", &x[i]);
bubbleSort(x, n);
printf("\n Array Elements after sorting: ");
for(i = 0; i < n; i++)
printf("%5d", x[i]);
return 0;
}
Selection Sort
• Suppose x is an array of size n stored in memory.
• We assume there are two subarrays: a sorted subarray and an unsorted
subarray. Initially the sorted subarray has 0 elements and the unsorted
subarray has n elements.
• The selection sort algorithm finds the smallest element in the unsorted
subarray x and places it at the first index of the unsorted subarray. It
continues this procedure until the second largest element is placed in
the second-to-last position of the array, leaving the largest at the end.
Selection Sort
void selectionSort( int a[], int n )
{
int i, j, temp=0, min;
for( i=0; i < n-1; i++ )
{
min = i;
for( j=i+1; j < n; j++ )
{
if( a[j] < a[min] )
min=j;
}
if(min !=i)
{
temp = a[i];
a[i]= a[min];
a[min] = temp;
}
}
}
Insertion Sort
• Insertion sort is a simple sorting algorithm that works similar to the
way you sort playing cards in your hands. The array is virtually split
into a sorted and an unsorted part. Values from the unsorted part are
picked and placed at the correct position in the sorted part.
Insertion Sort
To sort an array of size n in ascending order:
1. Iterate from arr[1] to arr[n-1] over the array.
2. Compare the current element (key) to its predecessor.
3. If the key element is smaller than its predecessor, compare it to the
elements before it. Move the greater elements one position up to
make space for the key element.
Second Example
• 12, 11, 13, 5, 6
• Let us loop for i = 1 (second element of the array) to 4 (last element of the
array)
• i = 1. Since 11 is smaller than 12, move 12 and insert 11 before 12
11, 12, 13, 5, 6
• i = 2. 13 will remain at its position as all elements in A[0..i-1] are smaller
than 13
11, 12, 13, 5, 6
• i = 3. 5 will move to the beginning and all other elements from 11 to 13 will
move one position ahead of their current position.
5, 11, 12, 13, 6
• i = 4. 6 will move to position after 5, and elements from 11 to 13 will move
one position ahead of their current position.
5, 6, 11, 12, 13
Insertion Sort
void InsertionSort(int arr[], int n)
{ int i, j, temp;
for (i = 1; i < n; i++)
{
temp = arr[i];
j = i - 1;
// shift elements greater than temp one position to the right
while(j >= 0 && arr[j] > temp)
{
arr[j+1] = arr[j];
j--;
}
arr[j+1] = temp;
}
}
Quick Sort
Quick Sort
Partition Algorithm of Quick Sort
Quick Sort and interchange algo.
Merge Sort
Merge Sort
Algorithm for merging two sub-arrays
Example of Merge Sort
A[1: 10] = (310, 285, 179, 652, 351, 423, 861, 254, 450, 520)
Example of Merge Sort
Shell Sort
• Shell sort is an algorithm that first sorts the elements far apart from each
other and successively reduces the interval between the elements to be
sorted. It is a generalized version of insertion sort.
• In shell sort, elements at a specific interval are sorted. The interval
between the elements is gradually decreased based on the sequence used.
The performance of the shell sort depends on the type of sequence used
for a given input array.
• Some of the optimal sequences used are:
• Shell's original sequence: N/2 , N/4 , …, 1
• Knuth's increments: 1, 4, 13, …, (3^k – 1) / 2
Shell Sort
The shell sort, sometimes called the “diminishing increment sort,” improves
on the insertion sort by breaking the original list into a number of smaller
sublists, each of which is sorted using an insertion sort. The unique way that
these sublists are chosen is the key to the shell sort. Instead of breaking the
list into sublists of contiguous items, the shell sort uses an increment i,
sometimes called the gap, to create a sublist by choosing all items that are i
items apart.
This can be seen in Figure 6. This list has nine items. If we use an increment
of three, there are three sublists, each of which can be sorted by an insertion
sort. After completing these sorts, we get the list shown in Figure 7. Although
this list is not completely sorted, something very interesting has happened. By
sorting the sublists, we have moved the items closer to where they actually
belong.
Example 1 of Shell Sort
Pass 1, gap1 = n/3
Example 1 Shell sort
• Figure 8 shows a final insertion sort using an increment of one; in
other words, a standard insertion sort. Note that by performing the
earlier sublist sorts, we have now reduced the total number of shifting
operations necessary to put the list in its final order. For this case, we
need only four more shifts to complete the process.
Pass 2, gap2 = gap1/3
Previous Example: Different gap value
The way in which the increments are chosen is the unique feature of the
shell sort. In this case, we begin with gap = n/2 sublists. On the next
pass, n/4 sublists are sorted. Eventually, a single list is sorted with the
basic insertion sort. Figure 9 shows the first sublists for our example
using this increment.
Pass 1, gap=n/2
23 29 15 19 31 7 9 5 2
Shell Sort function
void ShellSort(int a[], int n)
{
int gap, i, j;
// start with a large gap, then halve it on every pass
for(gap = n/2; gap >= 1; gap = gap/2)
for(j = gap; j < n; j++)
// insertion sort on the sublist of items that are gap apart
for(i = j-gap; i >= 0; i = i-gap)
{
if (a[i+gap] > a[i])
break;
else
{
int temp = a[i+gap];
a[i+gap] = a[i];
a[i] = temp;
}
}
}
Hashing
Drawback of the linear search and binary search techniques:
• As the number of elements increases, the time taken to perform the search also
increases. This becomes problematic when the total number of elements becomes
too large.
• What is Hashing?
• Hashing is a searching technique where the time taken to perform the search
does not depend upon the total number of elements. It completes the search
with constant average time complexity.
• It minimizes the number of comparisons while performing the search. This
method generally uses hash functions to map the keys into a table, which
is called a hash table.
Hash table
• A hash table is a type of data structure used for storing and accessing
data very quickly. Insertion of data into the table is based on a key
value; hence every entry in the hash table is defined with some key. Using
this key, data can be searched in the hash table with only a few key
comparisons, and the searching time is then dependent upon the size of
the hash table.
• It also facilitates easy insertion and deletion of data. Most symbol
tables in compilers are implemented using hash tables.
• In systems where the primary objective is retrieval of information,
hash tables are used extensively.
Hash Function
• Hash function: a hash function is a function applied to a key to
produce an integer, which can be used as an address in the hash table.
Hence the same hash function can be used for accessing the data from
the hash table. The integer returned by the hash function is called the
hash key.
H: K -> L
where H is the hash function, K is the key and L is the memory address.
• Popular hash functions
• Division method
• Mid square method
• Digit folding method
Division method
• In this method the hash function depends upon the remainder of a
division. For example, if records with keys 52, 68, 99, 84 are to be
placed in a hash table, and the table size is 10,
• then:
h(key) = key mod table_size
52 % 10 = 2
68 % 10 = 8
99 % 10 = 9
84 % 10 = 4
Mid square method
• In this method the key is first squared and then the middle part of the
result is taken as the index.
• For example, to place a record with key 3101 in a table of size 1000:
3101 * 3101 = 9616201, so h(3101) = 162 (the middle 3 digits).
Digit folding method
• In this method the key is divided into separate parts, and these parts
are combined with some simple operation to produce a hash key.
• For example, consider a record with key 12465512. It is divided into
the parts 124, 655, 12, which are then combined by addition:
H(key) = 124 + 655 + 12
       = 791
Characteristics of good hashing function
1. The hash function should generate different hash values for similar
strings.
2. The hash function should be easy to understand and simple to compute.
3. The hash function should produce keys which get distributed
uniformly over the array.
4. The number of collisions should be small while placing the data in
the hash table.
5. A hash function is a perfect hash function when it uses all the
input data.
Collision
• Collision: suppose we want to add a new record R with key k,
but the memory location with address H(k) is already
occupied. This situation is called a collision.
• Load factor (λ): the ratio of the number of keys to the number of
addresses where these keys can be stored. If a class has 24 students
and we have table space for 365 records, then the load
factor = 24/365.
• Collision resolution policy: the method used to determine
where to store a value that has undergone a collision.
Collision resolution technique
• There are generally two broad ways of collision resolution:
1. Open addressing: array based implementation
a. Linear Probing (linear search)
b. Quadratic probing (non linear search)
c. Double Hashing (uses two hash functions)
2. Separate Chaining: linked-list based implementation
Chaining
It is a method in which an additional field, a chain, is introduced with the data. A chain is
maintained at the home bucket: when a collision occurs, a linked list of the colliding data
is maintained there.
Example: consider a hash table of size 10 with the hash function
H(key) = key % table size, and let the keys to be inserted be 31, 33, 77, 61. Both 31 and 61
hash to bucket 1, so that bucket holds two records maintained by a
linked list, i.e. by the chaining method.
Linear probing
• It is a very easy and simple method to resolve or handle a collision. The collision is
resolved by placing the second record linearly down, in the next empty place found. This
method suffers from a clustering problem: blocks of data form in some places in the
hash table.
• Example: consider a hash table of size 10 with the hash function H(key) = key %
table size, and let the keys to be inserted be 56, 64, 36, 71.
• Here 56 and 36 both hash to bucket 6, but with the linear probing technique the second
record is placed linearly downward in the next empty place, so 36 is placed
at index 7.
Quadratic probing
• This is a method which addresses the clustering problem. The probe sequence is
defined by H(key) = (key + x*x) % table size for x = 0, 1, 2, … Let us insert the
elements 67, 90, 55, 17, 49.
• Here 67, 90 and 55 can be inserted directly, but for 17 a collision occurs:
(17 + 0*0) % 10 = 7, and index 7 is already occupied, so we increment the value of x.
With x = 1, (17 + 1*1) % 10 = 8; in this case bucket 8 is empty, hence we place 17 at
index 8.
Double hashing
• It is a technique in which two hash functions are used when a collision occurs.
The first hash function is the simple division method, but the second hash
function must follow two important rules:
• It must never evaluate to zero.
• It must ensure that all buckets can be probed.
• The hash functions for this technique are:
• H1(key) = key % table size
• H2(key) = P - (key mod P)
• where P is a prime number smaller than the size of the hash table.
• Example: let us insert 67, 90, 55, 17, 49. Here 67, 90 and 55 can be inserted into
the hash table using the first hash function, but for 17 the bucket is again full, and
in this case we use the second hash function H2(key) = P - (key mod P), where P is a
prime number smaller than the size of the hash table, so P = 7.
• i.e. H2(17) = 7 - (17 % 7) = 7 - 3 = 4, which means we take jumps of 4 when placing
17. Therefore 17 is placed at index (7 + 4) % 10 = 1.