An Introduction To
Hashing
ES-103 (Data Structure)
Dept of Computer Science & Engineering
Indian Institute of Information Technology, Design and Manufacturing
Jabalpur (MP)
Motivation
Search Time O(1)
Other Data Structures:
Linked-List
Array …………….
Search Time is O(N)
Tree
Search Time O(log N)
N: no. of elements
The Search Problem
Find items with keys matching a given search key
Given an array A, containing n keys, and a search key
x, find the index i such as x=A[i]
Direct Addressing
Assumption
Key values are distinct
Each key is drawn from a universe U = {0, 1, . . . ,m-1}
Idea
Store the items in an array, indexed by keys
Direct-address table representation:
An array T[0 . . . m - 1]
Each slot, or position, in T corresponds to a key in U
For an element x with key k, a pointer to x (or x itself)
will be placed in location T[k]
If there are no elements with key k in the set, T[k] is
empty, represented by NIL
Direct Addressing
Direct Addressing
Alg.: DIRECT-ADDRESS-SEARCH(T, k)
return T[k]
Alg.: DIRECT-ADDRESS-INSERT(T, x)
T[x. key ]= x
Alg.: DIRECT-ADDRESS-DELETE(T, x)
T[x. key] = NIL
Running time for these operations: O(1)
Direct Addressing :
Example 1:
Example 2:
Hash Tables
When K is much smaller than U, a hash table requires
much less space than a direct-address table
Can reduce storage requirements to |K|
Can still get O(1) search time, but on the average case,
not the worst case
Idea
Use a function h to compute the slot for each key
Store the element in slot h(k)
Hash Tables
A hash function h transforms a key into an index in a hash
table T[0…m-1]:
h : U → {0, 1, . . . , m - 1}
• We say that k hashes to slot h(k)
• Advantages:
Reduce the range of array indices handled: m
instead of |U|
Storage is also reduced
Example: Hash Tables
U
(universe of keys) h(k1)
h(k4)
k1
k4 k2 h(k2) = h(k5)
K
(actual k k3 h(k3)
5
keys)
m-1
Revisit Example 2 0
1
• Suppose that the keys are nine-digit 2
Social Security numbers (ssn) 3
4
• Hash Function : h(x) = x mod 100 5
(Table Size) T[0-99] 6
7
h(ssn)=ssn mod 100 (Last 2 digits of ssn)
8
for example if ssn = 101234511 then 9
h(101234511) =101234511mod100= 11 10
11 101234511
.
.
.
.
99
Problem with this approach?
Revisit Example 2 0
1
• Suppose that the keys are nine-digit 2
Social Security numbers (ssn) 3
4
• Hash Function : h(x) = x mod 100 5
(Table Size) T[0-99] 6
7
h(ssn)=ssn mod 100 (Last 2 digits of ssn)
8
for example if ssn = 101234511 then 9
h(101234511) = 101234511mod100=11 10
11 101234511
For ssn = 111456811 then .
.
h(111456811)= 111456811mod 100 = 11
.
.
Collision ?........ 99
Collision
• Two or more keys hash to the same slot!!
• For a given set K of keys
If |K| ≤ m, collision may or may not happen, depending on
the hash function
If |K| > m, collisions will definitely happen (i.e., there
must be at least two keys that have the same hash value)
Avoiding collisions completely is hard, even with a good hash
function
Handling Collisions
• Chaining
• Open addressing
Linear probing
Quadratic probing
Double hashing
Handling Collisions Using Chaining
• Idea
Put all elements that hash to the same slot into a linked
list:
Slot j contains a pointer to the head of the list of all elements that hash to j
Example
• M=10
• Keys = • Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
• Hash Function (HF) : Key mod M
• Collision Resolution Technique : Chaining
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0
1
72 Null
2
4
25 NULL
5
6 16 NULL
7
27 NULL
8 28 NULL
9
99 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0
1
72 Null
2
4
55 25 NULL
5
6 16 NULL
7
27 NULL
8 28 NULL
9
99 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0
1
72 Null
2
4
55 25 NULL
5
6 16 NULL
7
27 NULL
8 28 NULL
9 99
69 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0
1
72 Null
2
4
55 25
NULL
5
6 16 NULL
7
27 NULL
8 28
48 NULL
9 99
69 NULL
Examples :
• Keys = 25, 99, 16, 27, 28, 72, 55, 69, 48, 75
0
1
72 Null
2
4
25
75 55 NULL
5
6 16 NULL
7
27 NULL
8 28
48 NULL
9 99
69 NULL
Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x)
insert x at the head of list T[h(x. key)]
• Assumes that the element being inserted isn’t already in
the list
• It would take an additional search to check if it was
already inserted
• Worst-case running time is O(1)
Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x)
delete x from the list T[h(x. key)]
• Need to find the element to be deleted.
• Worst-case running time:
Deletion depends on searching the corresponding
list
Searching in Hash Tables
Alg.: CHAINED-HASH-SEARCH(T, k)
• search for an element with key k in list T[h(k)]
• Running time is proportional to the length of the list of
elements in slot h(k)
Hashing with Chaining: Worst Case
T
• How long does it take to search
0
for an element with a given key?
• Worst case:
All n keys hash to the same
slot
Worst-case time to search is
(n), plus time to compute the
chain
hash function
m-1
Hashing with Chaining: Worst Case
• Hashing with chaining uses extra space, despite the space
is present in the hash table itself.
Hash Function
• A hash function transforms a key into a table address
• What make a good hash function
Easy to compute
Approximates a random function: for every input,
every output is equally likely (simple uniform hashing)
Minimize the chance that closely related keys
hash to the same slot.
The Division Method
• Idea:
Map a key k into one of the m slots by taking the
remainder of k divided by m
h(k) = k mod m
• Advantage:
fast, requires only one operation
• Disadvantage:
Certain values of m are bad, e.g.,
power of 2
non-prime numbers
Example :
0
• m= 10 (0 - 9) Hash Table
1
H(key) = key mod m
key = 1234569 mod 10 = 9 2
9 1234569
Example :
m (8)23
00101mod 23 101
01101mod 23 101 Collision
10101mod 2 101
3
11101mod 23 101
i.e.
m 2 k
1010111001110101010101mod 2 101010101
k
K-bit LSB K-bit
The Division Method m m
K
• Choose m to be a prime, not close 97 100
to a power of 2
Column 2: k mod 97
Column 3: k mod 100
Open Addressing
• Idea:
If we have enough contiguous memory to store all
the keys (m > N) store the keys in the table itself
No need to use linked lists anymore
Basic idea:
Insertion: if a slot is full, try another one, until
you find an empty one
Search: follow the same sequence of probes
Deletion: Difficult .
• Search time depends on the length of the
probe sequence!
Common Open Addressing Methods
• Linear probing
• Quadratic probing
• Double hashing
Linear probing: Inserting a key
• Idea
when there is a collision, check the next
available position in the table (i.e.,
probing)
h(k,i) = (h1(k) + i) mod m
i=0,1,2,...
First slot probed: h1(k)
Second slot probed: h1(k) + 1
Third slot probed: h1(k)+2, and so on
probe sequence:
< h1(k), h1(k)+1 , h1(k)+2 , ....>
• Can generate m probe sequences
maximum
Example : Linear Probing
m=10 {0-9}
0 10
H(key) = key mod m 1 60
Keys : 10,15, 99, 60, 75, 19, 35, 44
2
LP(10,0) ( H (10) 0) mod10 (0 0) mod10 0
3
LP(15,0) ( H (15) 0) mod10 (5 0) mod10 5
LP(99,0) ( H (99) 0) mod10 (9 0) mod10 9 4
5 15
LP(60,0) ( H (60) 0) mod10 (0 0) mod10 0
Collision ?.... 6
7
LP(60,1) ( H (60) 1) mod10 (0 1) mod10 1 8
9 99
Example : Linear Probing
m=10 {0-9}
0 10
H(key) = key mod m 1 60
Keys : 10,15, 99, 60, 75, 19, 35, 44
2 19
LP(75,0) ( H (75) 0) mod10 (5 0) mod10 5
3
Collision ?....
LP(75,1) ( H (75) 1) mod10 (5 1) mod10 6 4
LP(19,0) ( H (19) 0) mod10 (9 0) mod10 9 15
Collision ?...
5
LP(19,1) ( H (19) 1) mod10 (9 1) mod10 0 6 75
Collision ?...
LP(19,2) ( H (19) 2) mod10 (9 2) mod10 1 7
Collision ?... 8
LP(19,3) ( H (19) 3) mod10 (9 3) mod10 2 99
9
Example : Linear Probing
m=10 {0-9}
0 10
H(key) = key mod m 1 60
Keys : 10,15, 99, 60, 75, 19, 35, 44
2 19
LP(35,0) ( H (35) 0) (5 0) mod10 5 3
Collision ?.....
LP(35,1) ( H (35) 1) (5 1) mod10 6 4 44
Collision ?..... 5 15
LP(35,2) ( H (35) 2) (5 2) mod10 7
6 75
LP(44,0) ( H (44) 0) (4 0) mod10 4
7 35
8
9 99
Searching: Linear Probing 0 10
• Find element 19
1 60
H (19) 19 mod10 9 2 19
3
• Note:
In linear probing we are probing 4 44
(attempting) one by one slot 5 15
To search an element in the table
Best Case = O(1) 6 75
If empty slot reach and not find element 7 35
Then element is not there
worst case O(m) 8
We are not wasting space in the form of 9 99
linked list.
Deletion: Linear Probing 0 10
• In linear probing deletion is difficult because
1 60
deletion of one element, create trouble to other
element 2 19
e.g. delete 60
3
Find element 19 ?..
4 44
• Solution 5 15
Mark the slot with a sentinel value 6 75
DELETED
The deleted slot can later be used for 7 35
insertion 8
Searching will be able to find all the keys
9 99
Hashing
Search Time O(1)
Direct Address Table
Idea:
Store the items in an array, indexed by keys
Advantage:
Search Time: O(1)
Insertion Time: O(1)
Deletion Time: O(1)
Disadvantage:
High memory Requirement
Hashing
Hash Table
Idea:
Transform the key into one of the memory location in
Hash Table using hash function.
Advantage:
Reduced memory location.
Disadvantage:
Collision.
Hashing
collision Resolution Technique:
Chaining
Idea:
Store the keys in linked list, pointing to the same slot
Advantage:
Any number of collision can be handled
Disadvantage:
Use of extra space, despite the space is present in
hash table itself
• Open Addressing
Linear Probing, Quadratic Probing, Double hashing
Linear Probing : Discussion 0
h ' : U {0,1..., m 1}
h(k , i ) (h ' (k ) i ) mod m for i= 0,1,…,m-1 1
'
2
• Given key k, we first probe T [h (k )]
• We next probe T [h (k )] 1 and so on up to
' 3
T [m 1] .
• Then, we wrap around the
.
T [0], T [1],..., T [h (k ) 1]
'
.
• m distinct probe sequences possible, because the
initial probe determines the entire probe sequence. .
m-1
Linear Probing 10
m = 10 {0-9}
0
6 full
H(k) = k mod m 1 11
Slot
Keys = 10, 11, 22, 33, 20, 30, 40
2 22
50 Primary
3 33
Clustering
4 20
5 30 Average
Search
6 40 Time
• Linear Probing suffers from 7 increases
Primary Clustering
8
9
Quadratic probing
• Quadratic probing introduced to solve the problem of
Primary Clustering.
H(k)=k mod m
QP(key, i) ( H (key) C1i C2i 2 ) mod m
i {0,1,2,...,m 1}
C1 and C2 are positive constant
Example 1:Quadratic probing 0 10
m = 10 {0-9}
1
H(k)=k mod m
Key=10,15, 74, 20, 35, 45, 92 2 20
CRT= QP, C1 =1, C2 = 1
3
QP (10 ,0) ( H (10 ) 1.0 1.0 2 ) mod 10 0
4 74
QP (15 ,0) ( H (15 ) 1.0 1.0 ) mod 10 5
2
QP (74 ,0) ( H (74 ) 1.0 1.0 2 ) mod 10 4 5 15
QP (20 ,0) ( H (20 ) 1.0 1.0 2 ) mod 10 0 6
Collision ?....... 7
QP (20 ,1) ( H (20 ) 1.1 1.12 ) mod 10 2
8
9
Example1 :Quadratic probing 0 10
m = 10 {0-9}
1 45
H(k)=k mod m
Key=10,15, 74, 20, 35, 45, 92 2 20
CRT= QP 3
QP (35,0) ( H (35 ) 1.0 1.0 2 ) mod 10 5 4 74
Collision ?.......
5 15
QP (35,1) ( H (35 ) 1.1 1.12 ) mod 10 7
6
QP (45,0) ( H (45 ) 1.0 1.0 2 ) mod 10 5
Collision ?....... 7 35
QP (45,1) ( H (45 ) 1.1 1.12 ) mod 10 7 8
Collision ?.......
9
QP (45,2) ( H (45 ) 1.2 1.2 2 ) mod 10 1
Example1 :Quadratic probing 0 10
m = 10 {0-9}
1 45
H(k)=k mod m
Key=10,15, 74, 20, 35, 45, 92 2 20
CRT= QP 3
QP (92 ,0) ( H (92 ) 1.0 1.0 2 ) mod 10 2 4 74
Collision ?.......
5 15
QP (92 ,1) ( H (92 ) 1.1 1.12 ) mod 10 4
Collision ?....... 6
QP (92 ,2) ( H (92 ) 1.2 1.2 2 ) mod 10 8 7 35
8 92
9
Example 2.:Quadratic probing 0 22
m = 11 {0-10}
1
H(k)=k mod m
Key=10,22, 31, 4, 15, 18, 17, 88, 59 2
CRT= QP, C1 =1, C2 = 1
3
QP (10 ,0) ( H (10 ) 1.0 1.0 2 ) mod 11 10 4 4
QP ( 22 ,0) ( H ( 22 ) 1.0 1.0 2 ) mod 11 0
5
QP (31,0) ( H (31) 1.0 1.0 2 ) mod 11 9
QP (4,0) ( H (4) 1.0 1.0 ) mod 11 4
2 6 15
QP (15,0) ( H (15 ) 1.0 1.02 ) mod 11 4
7
Collision ?....... 8
QP (15,1) ( H (15 ) 1.1 1.12 ) mod 11 6 9 31
Example 2:Quadratic probing 0 22
m = 11 {0-10}
H(k)=k mod m
1
Key=10,22, 31, 4, 15, 18, 17, 88, 59 2 88
CRT= QP
3
QP (18,0) ( H (18 ) 1.0 1.0 2 ) mod 11 7
4 4
QP (17 ,0) ( H (17 ) 1.0 1.0 2 ) mod 11 6
Collision ?.......
5
QP (17 ,1) ( H (17 ) 1.1 1.12 ) mod 11 8 6 15
QP (88,0) ( H (88 ) 1.0 1.02 ) mod 11 0 7 18
Collision ?.......
8 17
QP (88,1) ( H (88 ) 1.1 1.12 ) mod 11 2 9 31
Example 2:Quadratic probing 0 22
m = 11 {0-10}
1
H(k)=k mod m
Key=10,22, 31, 4, 15, 18, 17, 88, 59 2 88
CRT= QP 3
QP (59 ,0) ( H (59 ) 1.0 1.0 2 ) mod 11 4 4
Collision ?.......
4
QP (59 ,1) ( H (59 ) 1.1 1.12 ) mod 11 6 5 59
Collision ?....... 6 15
QP (59 ,2) ( H (59 ) 1.2 1.2 2 ) mod 11 10 7 18
Collision ?....... 8 17
QP (59 ,3) ( H (59 ) 1.3 1.32 ) mod 11 5 9 31
Quadratic Probing
• Quadratic probing uses hash function of the form
h(k , i) (h ' (k ) C1i C2i 2 ) mod m
where C1 and C2 are positive constants and i={0,1,2,…,m-1}
• The initial position probed is T [h ' (k )] later positions
probed are offset by amounts that depend in a quadratic
manner on the probe number i.
Quadratic Probing 0 10
•If two keys have the same initial probe position,
then their probe sequences are the same, 1 45
since h(k1 ,0) h(k 2 ,0) 2 20
implies h(k1 , i ) h(k2 , i )
• This property leads to Secondary Clustering.
3
m = 10 {0-9} 4 74
H(k)=k mod m 5 15
Key=10,15, 74, 20, 35, 45, 92
CRT= QP 6
7 35
• Similar to linear probing, the initial probe 8 92
determines the entire sequence and so only m
distinct probe sequences are used. 9
Linear and Quadratic
m = 10 {0-9}
H(k)=k mod m
Key=10,15, 74, 20,
35, 45, 92, 30
CRT= QP
Primary Cluster
Secondary Cluster
Linear Probing Quadratic Probing
Double hashing
• Double hashing is a hash function of the form
h(k , i) (h1 (k ) ih2 (k )) mod m
i {0,1,...,m 1}
here,
h1(k) = k mod m and 2 h ( k ) 1 ( k mod m '
)
Where m ' is chosen to be slightly less than m
• m2 probe sequences are used, rather than m, since each
possible (h1(k), h2(k)) pair yields a distinct probe sequence.
• Double hashing introduced to solve the problem of
Secondary Clustering.
Example 1 : Double hashing 0
• m = 13 1 79
2
Key = 79, 69, 72, 98, 50, 14
3
h1(key) = key mod m
4 69
h2(key) = 1+(key mod (m-3) )
5
CRT= Double Hashing
6
H (key, i) (h1 (key) ih2 (key)) mod 13 7 72
8
H (79,0) (h1 (79 ) 0.h2 (79 )) mod 13 1
9
H (69,0) (h1 (69 ) 0.h2 (69 )) mod 13 4
10
H (72,0) (h1 (72 ) 0.h2 (72 )) mod 13 7
11
12
Example 1: Double hashing 0
• m = 13 1 79
2
Key = 79, 69, 72, 98, 50, 14
3 98
h1(k) = k mod m
4 69
h2(k) = 1+(k mod (m-3) )
5
CRT= Double Hashing
6
H (98,0) (h1 (98) 0.h2 (98)) mod 13 7 7 62
Collision ?.... 8
H (98,1) (h1 (98) 1.h2 (98)) mod 13 3 9
10
H (50,0) (h1 (50 ) 0.h2 (50 )) mod 13 11
11 50
12
Example 1: Double hashing 0
• m = 13 1 79
2
Key = 79, 69, 72, 98, 50, 14
3 98
h1(k) = k mod m
4 69
h2(k) = 1+(k mod (m-3) )
5
CRT= Double Hashing
6 14
H (14,0) (h1 (14 ) 0.h2 (14 )) mod 13 1 7 62
Collision ?.... 8
H (14,1) (h1 (14 ) 1.h2 (14 )) mod 13 6 9
10
11 50
12
Example 2: Double hashing 0 22
• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5
CRT= Double Hashing
6
H (key, i) (h1 (key) ih2 (key)) mod 11 7
8
H (10,0) (h1 (10 ) 0.h2 (10 )) mod 11 10
9 31
H (22,0) (h1 (22 ) 0.h2 (22 )) mod 11 0 10 10
H (31,0) (h1 (31) 0.h2 (31)) mod 11 9
H (4,0) (h1 (4) 0.h2 (4)) mod 11 4
Example 2: Double hashing 0 22
• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6
H (key, i) (h1 (key) ih2 (key)) mod 11 7
H (15,0) (h1 (15) 0.h2 (15)) mod 11 4 8
9 31
Collision ?.....
H (15,1) (h1 (15) 1.h2 (15)) mod 11 10 10 10
Collision ?.....
H (15,2) (h1 (15) 2.h2 (15)) mod 11 5
Example 2: Double hashing 0 22
• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i) (h1 (key) ih2 (key)) mod 11 7 18
H (18,0) (h1 (18) 0.h2 (18)) mod 11 7 8
9 31
H (17 ,0) (h1 (17 ) 0.h2 (17 )) mod 11 6 10 10
Example 2: Double hashing 0 22
• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i) (h1 (key) ih2 (key)) mod 11 7 18
H (88,0) (h1 (88) 0.h2 (88)) mod 11 0 8
Collision ?..... 9 31
H (88,1) (h1 (88) 1.h2 (88)) mod 11 9 10 10
Collision ?.....
H (88,2) (h1 (88) 2.h2 (88)) mod 11 7
Collision ?.....
Example 2: Double hashing 0 22
• m = 11 1
2
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3 88
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i) (h1 (key) ih2 (key)) mod 11 7 18
H (88,3) (h1 (88) 3.h2 (88 )) mod 11 5 8
9 31
Collision ?.....
10
H (88,4) (h1 (88) 4.h2 (88 )) mod 11 3 10
Example 2: Double hashing 0 22
• m = 11 1
2 59
Key = 10, 22, 31, 4, 15, 18, 17, 88, 59
3 88
h1(k) = k mod m
4 4
h2(k) = 1+(k mod (m-1) )
5 15
CRT= Double Hashing
6 17
H (key, i) (h1 (key) ih2 (key)) mod 11 7 18
H (59,0) (h1 (59 ) 0.h2 (59 )) mod 11 4 8
9 31
Collision ?.....
10
H (59,1) (h1 (59 ) 1.h2 (59 )) mod 11 3 10
Collision ?.....
H (59,2) (h1 (59 ) 2.h2 (59 )) mod 11 2