HASHING
Searching Techniques
◦ Linear Search takes O(n) time to perform the search in unsorted arrays of n
 elements.
◦ Binary Search takes O(log n) time to perform the search in sorted arrays of n
 elements.
◦ It takes O(log n) time to perform the search in a Binary Search Tree consisting
 of n elements.
◦ The main drawback of these techniques is-
  ◦ As the number of elements increases, the search time also increases. This
   becomes problematic when the total number of elements becomes too
   large.
   Hashing in Data Structure
◦ Hashing is a well-known technique to search for any particular element
 among several elements.
◦ It minimizes the number of comparisons while performing the search.
◦ Advantage-
 Unlike other searching techniques,
 ◦ Hashing is extremely efficient.
 ◦ The time taken by it to perform the search does not depend upon the total number of
   elements.
 ◦ It completes the search with constant time complexity O(1).
    Hashing Mechanism
◦ In hashing,
  ◦ An array data structure called a Hash table
   is used to store the data items.
  ◦ Based on the hash key value, data items
   are inserted into the hash table.
◦ Hash Key Value-
  ◦ Hash key value is a special value that
   serves as an index for a data item.
  ◦ It indicates where the data item should be
   stored in the hash table.
  ◦ Hash key value is generated using a hash
   function.
   Hash Function
◦ Hash function is a function that maps any big number or string to a small integer
 value.
◦ The main idea behind the hashing is to create the (key/value) pairs. If the key is
 given, then the hash function algorithm computes the index at which the value
 would be stored.
◦ Hash function takes the data item as an input (key) and returns a small integer
 value as an output (hash value).
       hash value = hash function (key)
◦ Hash value represents the original string of characters, but it is normally smaller
 than the original.
◦ Hash value of the data item (key) is then used as an index for storing it in the hash
 table.
   Key Properties of Hash Functions
◦ Deterministic: A hash function must consistently produce the same output
  for the same input.
◦ Fixed Output Size: The output of a hash function should have a fixed size,
  regardless of the size of the input.
◦ Efficiency: The hash function should be able to process input quickly.
◦ Uniformity: The hash function should distribute the hash values uniformly
  across the output space to avoid clustering.
◦ Pre-image Resistance: It should be computationally infeasible to reverse
  the hash function, i.e., to find the original input given a hash value.
◦ Collision Resistance: It should be difficult to find two different inputs that
  produce the same hash value.
◦ Avalanche Effect: A small change in the input should produce a
  significantly different hash value.
Types of Hash Functions
◦ There are different ways through which we can calculate hash
 functions:
   1. Division method
   2. Folding method
   3. Mid-square method
   4. Multiplication method
1. Division Method
◦ This is the simplest and easiest method to generate a hash value. The hash
 function divides the value k by M and then uses the remainder obtained.
◦ Formula:
      h(K) = k mod M
 Here,
  k is the key value, and
  M is the size of the hash table.
◦ In this method, we map a key K into one of the M slots by taking the
 remainder of K divided by M
Example
◦ For example, let the hash function be h(k) = k mod 10
  where k represents the keys. Let the keys be: 12, 11,
  14, 17, 16, 15, 18 and 13. The empty table will look as
  follows having a table size of 10 and index starting
  from 0.
◦ To insert values into the hash table, we make use of
  the hash function.
        h(12) = 12mod10 = 2
        h(11) = 11mod10 = 1
        h(14) = 14mod10 = 4
        h(17) = 17mod10 = 7
        h(16) = 16mod10 = 6
        h(15) = 15mod10 = 5
        h(18) = 18mod10 = 8
        h(13) = 13mod10 = 3
◦ The output of h(k) is the index at which we need to
  place the particular value in the hash table. Thus, after
  inserting values, the hash table will be:
    2. Folding Method
◦ This method involves two steps:
  ◦ Divide the key-value k into several parts, i.e., k1, k2, k3,….,kn, where each part has the same number of digits
    except for the last part, which can have lesser digits than the other parts.
  ◦ Add the individual parts. The hash value is obtained by ignoring the last carry, if any.
◦ Formula:
        k = k1, k2, k3, k4, ….., kn
        s = k1+ k2 + k3 + k4 +….+ kn
        h(K)= s
Here, s is obtained by adding the parts of the key k
◦ For example, if our item was the phone number 12345, we would take the digits and divide them into
  groups of 2 digits
  K1= 12, k2= 34, k3= 5
  S= 12+34+5, we get 51. so, h(12345)=51
If we assume our hash table has 11 slots, we need to perform the extra step of dividing by 11 and
keeping the remainder. In this case, 51 % 11 is 7, so the phone number 12345 hashes to slot 7.
  3. Mid Square Method
◦ The mid-square method is a very good hashing method. It involves two steps to compute the hash
  value-
  ◦ Square the value of the key k, i.e., k2
  ◦ Extract the middle r digits as the hash value.
◦ Formula:
        h(K) = h(k x k)
Here, k is the key value.
◦ The value of r can be decided based on the table size.
◦ Example:
◦ Suppose the hash table has 100 memory locations. So, r = 2 because two digits are required to
  map the key to the memory location.
        k = 60
        k x k = 60 x 60
               = 3600
        h(60) = 60
◦ The hash value obtained is 60
    4. Multiplication method
◦ This method involves two steps:
  ◦ Determine a constant value. A, where (0, A, 1)
  ◦ Add A to the key value and multiply.
  ◦ Consider kA's fractional portion.
  ◦ Multiply the outcome of the preceding step by M, the hash table's size.
◦ Formula:
        h(K) = floor (M (kA mod 1))
        Here, M = size of the hash table, k = key value and A = constant value
◦ For example, k = 5678, A = 0.6829, M = 200
        Now, calculating the new value of h(5678):
        h(5678) = floor[200(5678 x 0.6829 mod 1)]
        h(5678) = floor[200(3877.5062 mod 1)]
        h(5678) = floor[200(0.5062)]
        h(5678) = floor[101.24]
        h(5678) = 101
Collision in Hashing
◦ In hashing,
  ◦ Hash function is used to compute the hash value for a key.
  ◦ Hash value is then used as an index to store the key in the hash table.
  ◦ Hash function may return the same hash value for two or more keys.
◦ When the hash value of a key maps to an already occupied bucket of the hash table, it is called
  as a Collision.
◦ For example, let us consider a hash table with size 10, where the following items are to be
  stored:12, 24, 56, 30, 62.
  Let us assume that the hash function is given by h(K) = K % 10
  We take the numbers one by one, apply the hash function to it, and the result tells us the
  index that the numbers will occupy in the hash table. Applying the hash function, to the
  given numbers,
        h(12) = 12 % 10 = 2
        h(24) = 24 % 10 = 4
        h(56) = 56 % 10 = 6
        h(30) = 30 % 10 = 0
        h(62) = 62 % 10 = 2 ← Collision Occurs
Collision Resolution Techniques
◦ Collision Resolution Techniques are used to resolve or handle the
  collision.
◦ Collision resolution techniques are classified as-
 Separate Chaining
◦ To handle the collision,
  ◦   This technique creates a linked list to the slot for which collision occurs.
  ◦   The new key is then inserted into the linked list.
  ◦   These linked lists to the slots appear like chains.
  ◦   That is why this technique is called separate chaining.
◦ Time Complexity-
  ◦ For Searching-
      ◦ In the worst case, all the keys might map to the same bucket of the hash table. In such a case, all the keys
        will be present in a single linked list.
      ◦ Sequential search will have to be performed on the linked list to perform the search. So, the time taken for
        searching in the worst case is O(n).
  ◦ For Deletion-
      ◦ In the worst case, the key might have to be searched first and then deleted. In the worst case, the time
        taken for searching is O(n). So, the time taken for deletion in the worst case is O(n).
                   𝑵𝒖𝒎𝒃𝒆𝒓 𝒐𝒇 𝒆𝒍𝒆𝒎𝒆𝒎𝒕𝒔 𝒑𝒓𝒆𝒔𝒆𝒏𝒕 𝒊𝒏 𝒕𝒉𝒆 𝒉𝒂𝒔𝒉 𝒕𝒂𝒃𝒍𝒆
◦ Load Factor (α) =
                           𝑻𝒐𝒕𝒂𝒍 𝒔𝒊𝒛𝒆 𝒐𝒇 𝒕𝒉𝒆 𝒉𝒂𝒔𝒉 𝒕𝒂𝒃𝒍𝒆
◦ If Load factor (α) = constant, then time complexity of Insert, Search, Delete = Θ(1)
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash
table- 50, 700, 76, 85, 92, 73 and 101
Applying the hash function to the
   given numbers,
       h(50) = 50 % 7 = 1
       h(700) = 700 % 7 = 0
       h(76) = 76 % 7 = 6
       h(85) = 85 % 7 = 1
       h(92) = 92 % 7 = 1
       h(73) = 73 % 7 = 3
       h(101) = 101 % 7 = 3
Open Addressing
◦ In open addressing,
 ◦ Unlike separate chaining, all the keys are stored inside the hash table.
  Thus, a bigger hash table is needed.
 ◦ No key is stored outside the hash table.
 ◦ If a collision occurs, alternative cells are tried until an empty cell/bucket is
  found.
◦ Techniques used for open addressing are-
 ◦ Linear Probing
 ◦ Quadratic Probing
 ◦ Double Hashing
Linear Probing
◦ In case of a collision, it probes through the hash table
  linearly, looking for an empty slot, and assigns the free slot to
  the value.
◦ Let hash(x) be the slot index computed using the hash
  function and S be the table size
 ◦ If slot hash(x) % S is full, then we try (hash(x) + 1) % S
 ◦ If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
 ◦ If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
◦ Clustering:
◦ The main problem with linear probing is clustering, many
  consecutive elements form groups, and it starts taking time to
  find a free slot or to search for an element
Using the hash function ‘key mod 7’, insert the following sequence of keys in
the hash table- 50, 700, 76, 85, 92, 73 and 101. Use a linear probing technique
for collision resolution.
Applying the hash function to the given numbers,
                                             Array Index
  S. NO.   Key      Hash       Array Index   after linear     0    700
                                               Probing
                                                              1     50
     1     50    50 % 7 = 1        1               1
     2     700   700 % 7 = 0       0               0
                                                              2     85
     3     76    76 % 7 = 6        6               6          3     92
     4     85    85 % 7 = 1        1               2          4     73
     5     92    92 % 7 = 1        1               3
                                                              5    101
     6     73    73 % 7 = 3        3               4
     7     101   101 % 7 = 3       3               5          6     76
Quadratic Probing
◦ Quadratic Probing eliminates the primary clustering problem of linear
  probing.
◦ Collision function is quadratic.
  ◦ When a collision occurs, we probe for the i2 th bucket in ith iteration.
  ◦ We keep probing until an empty bucket is found.
◦ If the hash function evaluates to h and a search in cell h is
  inconclusive, we try cells h + 12, h+22, … h + i2.
  ◦ i.e., It examines cells 1, 4, 9 and so on away from the original probe.
◦ let hash(x)be the slot index computed using hash function.
  ◦ If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
  ◦ If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
  ◦ If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Using the hash function ‘key mod 7’, insert the following sequence of keys in the hash table-
50, 700, 76, 85, 92, 73 and 101. Use a Quadratic Probing technique for collision resolution.
Applying the hash function to the given numbers,
                                                                   Array
                               Array   Quadratic Probing in
                                                                   Index     0    700
  S. NO.   Key      Hash                                            after
                               Index    case of collision
                                                                 Quadratic   1    50
                                                                  Probing
    1      50    50 % 7 = 1     1                                   1
                                                                             2    85
    2      700   700 % 7 = 0    0                                   0        3    73
    3      76    76 % 7 = 6     6                                   6        4    101
                                       (85 % 7 + 1*1) % 7 = 2
    4      85    85 % 7 = 1     1                                   2        5    92
    5      92    92 % 7 = 1     1      (92 % 7 + 2*2) % 7 = 5       5        6    76
    6      73    73 % 7 = 3     3                                   3
    7      101   101 % 7 = 3    3      (101 % 7 + 1*1) % 7 = 4      4
Double Hashing
◦ The double hashing technique uses two hash functions. The
  second hash function comes into use when the first function
  causes a collision. It provides an offset index to store the value.
◦ We use another hash function hash2(x) and look for i * hash2(x)
  slot in i’th rotation.
◦ The function hash2(x) must never evaluate to zero.
◦ let hash(x) be the slot index computed using hash function.
 ◦ If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
 ◦ If (hash(x)+1*hash2(x))%S is full, then try (hash(x) + 2*hash2(x)) % S
 ◦ If (hash(x) + 2*hash2(x)) % S is full, then try (hash(x) + 3*hash2(x)) % S
◦ A function such as hash2(x) = R – ( x mod R) with R is a number
  that must be smaller than the size of the hash table and should be
  coprime with the size of the table.
Let the size of the                                                                   Array
                                                   Arra   Double Hashing in           Index    0
hash    table     be    11.    S. Ke    Hash         y     case of collision           after
                              NO. y H1(K) = K % 11 Inde                                        1    34
                                                                                     Double
Using       the        hash                          x    (H1(K) + i * H2(K)) % 11
                                                                                     Hashing   2
function ‘H1(K) = K %          1   20   20 % 11 = 9   9                                9       3    56
11’ and ‘H2(K) = 8 - (K        2   34   34 % 11 = 1   1                                1       4    45
%    8)’,   insert      the                                H1(45)= 45%11=1                     5
                               3   45   45 % 11 = 1   1   H2(45)= 8-(45%8)=3           4
following     sequence                                      (1+1*3)%11=4                       6    70
of keys in the hash                                        H1(70)= 70%11=4                     7
                               4   70   70 % 11= 4    4   H2(70)= 8-(70%8)=2           6       8
table- 20, 34, 45, 70,                                      (4+1*2)%11=6
                                                                                               9    20
and 56. Use a Double                                       H1(56)= 56%11=1
                                                          H2(56)= 8-(56%8)=8                   10
Hashing technique for          5   56   56 % 11 = 1   1     (1+1*8)%11=9               3
                                                            (1+2*8)%11=6
collision    resolution.
                                                            (1+3*8)%11=3