Data Structures & Algorithms
Week 3 - Hashing (HashTables, HashMaps, Dictionaries)
Subodh Sharma, Rahul Garg
{svs,rahulgarg}@iitd.ac.in
1
Recap: Hash Functions
2
Recap: Hash Functions
• A hash function H takes a key k and generates a hash value H(k) = i
2
Recap: Hash Functions
• A hash function H takes a key k and generates a hash value H(k) = i
• Hash Table HT stores key-value pair (k,v) at HT[I]
2
Recap: Hash Functions
• A hash function H takes a key k and generates a hash value H(k) = i
• Hash Table HT stores key-value pair (k,v) at HT[I] k v
HT
i (k, v)
H
2
Recap: Hash Functions
• A hash function H takes a key k and generates a hash value H(k) = i
• Hash Table HT stores key-value pair (k,v) at HT[I] k v
HT
• Motivation: Fast storage and retrieval
i (k, v)
H
2
Recap: Hash Functions
• A hash function H takes a key k and generates a hash value H(k) = i
• Hash Table HT stores key-value pair (k,v) at HT[I] k v
HT
• Motivation: Fast storage and retrieval
i (k, v)
H
Sorted Array wit
LL, no dup Hash Table
h Binary Search
Insert O(n) O(log n) O(1)
Delete O(n) O(log n) O(1)
Contains O(n) O(log n) O(1)
2
Collision (Recap)
Collision (Recap)
• There are ~500 students in the COL106 class. Assume we have an HT of 500.
Collision (Recap)
• There are ~500 students in the COL106 class. Assume we have an HT of 500.
• Hash function: last two digits of the entry # of students
Collision (Recap)
• There are ~500 students in the COL106 class. Assume we have an HT of 500.
• Hash function: last two digits of the entry # of students
• Where does 2022CS10110 and 2022CS50310 go?
Collision (Recap)
• There are ~500 students in the COL106 class. Assume we have an HT of 500.
• Hash function: last two digits of the entry # of students
• Where does 2022CS10110 and 2022CS50310 go?
• To location 10 in HT. This is COLLISION!
Collision (Recap)
• There are ~500 students in the COL106 class. Assume we have an HT of 500.
• Hash function: last two digits of the entry # of students
• Where does 2022CS10110 and 2022CS50310 go?
• To location 10 in HT. This is COLLISION!
• Collision Resolution:
Collision (Recap)
• There are ~500 students in the COL106 class. Assume we have an HT of 500.
• Hash function: last two digits of the entry # of students
• Where does 2022CS10110 and 2022CS50310 go?
• To location 10 in HT. This is COLLISION!
• Collision Resolution:
• Open Hashing: Chaining
Collision (Recap)
• There are ~500 students in the COL106 class. Assume we have an HT of 500.
• Hash function: last two digits of the entry # of students
• Where does 2022CS10110 and 2022CS50310 go?
• To location 10 in HT. This is COLLISION!
• Collision Resolution:
• Open Hashing: Chaining
• Closed Hashing: Linear probing, Quadratic probing, Double Hashing
A Good Hash Function: Reqs
A Good Hash Function: Reqs
• Uniform Distribution: Distribution of keys uniformly across the HT
• This minimises collision, improves HT utilisation!
A Good Hash Function: Reqs
• Uniform Distribution: Distribution of keys uniformly across the HT
• This minimises collision, improves HT utilisation!
• Collision Resistant: Computationally infeasible to nd
x, y : H(x) = H(y) ∧ x ≠ y
fi
A Good Hash Function: Reqs
• Uniform Distribution: Distribution of keys uniformly across the HT
• This minimises collision, improves HT utilisation!
• Collision Resistant: Computationally infeasible to nd
x, y : H(x) = H(y) ∧ x ≠ y
• Deterministic and Fast computation
fi
A Good Hash Function: Reqs
• Uniform Distribution: Distribution of keys uniformly across the HT
• This minimises collision, improves HT utilisation!
• Collision Resistant: Computationally infeasible to nd
x, y : H(x) = H(y) ∧ x ≠ y
• Deterministic and Fast computation
• Using all of the input data: Every part of input a ects the output hash
• ∃i ∈ ℕ : xi ≠ yi ⇒ P(H(x) ≠ H(y)) > 0
ff
fi
A Good Hash Function: Reqs
• Uniform Distribution: Distribution of keys uniformly across the HT
• This minimises collision, improves HT utilisation!
• Collision Resistant: Computationally infeasible to nd
x, y : H(x) = H(y) ∧ x ≠ y
• Deterministic and Fast computation
• Using all of the input data: Every part of input a ects the output hash
• ∃i ∈ ℕ : xi ≠ yi ⇒ P(H(x) ≠ H(y)) > 0
• Dynamic: Dynamic resizing of HT should be possibles
ff
fi
Cryptographic Hashing vs Hashing
Cryptographic Hashing vs Hashing
• Cryptographic hashes require additional properties
Cryptographic Hashing vs Hashing
• Cryptographic hashes require additional properties
• Preimage Resistance (One Way property): Let H(x) = h. Given h, it is
computationally intractable to nd x
fi
Cryptographic Hashing vs Hashing
• Cryptographic hashes require additional properties
• Preimage Resistance (One Way property): Let H(x) = h. Given h, it is
computationally intractable to nd x
• Second Preimage Resistance: Given x1, it should be computationally infeasible
to nd x2 s.t. x1 ≠ x2 ∧ H(x1) = H(x2)
fi
fi
Cryptographic Hashing vs Hashing
• Cryptographic hashes require additional properties
• Preimage Resistance (One Way property): Let H(x) = h. Given h, it is
computationally intractable to nd x
• Second Preimage Resistance: Given x1, it should be computationally infeasible
to nd x2 s.t. x1 ≠ x2 ∧ H(x1) = H(x2)
• Avalanche e ect: Small change in the input produces signi cant change in the
output
fi
ff
fi
fi
Cryptographic Hashing vs Hashing
• Cryptographic hashes require additional properties
• Preimage Resistance (One Way property): Let H(x) = h. Given h, it is
computationally intractable to nd x
• Second Preimage Resistance: Given x1, it should be computationally infeasible
to nd x2 s.t. x1 ≠ x2 ∧ H(x1) = H(x2)
• Avalanche e ect: Small change in the input produces signi cant change in the
output
• ∀x, y : dH(x, y) = 1 ⇒ P(H(x)j ≠ H(y)j) ≥ 0.5
fi
ff
fi
fi
Collision Resolution (1): Chaining
Collision Resolution (1): Chaining
• For nd/insert/delete an element e
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e
• Use H(.) to look up the position of e in HT
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e
• Use H(.) to look up the position of e in HT
• Find/insert/delete in the linked list of the
hashed slot
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e 0 0 (k,v) 0 (k’,v’)
1
• Use H(.) to look up the position of e in HT 2
• Find/insert/delete in the linked list of the 3 3 (k,v) 3 (k’,v’)
hashed slot 4
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e 0 0 (k,v) 0 (k’,v’)
1
• Use H(.) to look up the position of e in HT 2
• Find/insert/delete in the linked list of the 3 3 (k,v) 3 (k’,v’)
hashed slot 4
• Analysis:
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e 0 0 (k,v) 0 (k’,v’)
1
• Use H(.) to look up the position of e in HT 2
• Find/insert/delete in the linked list of the 3 3 (k,v) 3 (k’,v’)
hashed slot 4
• Analysis:
• Assume that time to compute H(k) is Θ(1)
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e 0 0 (k,v) 0 (k’,v’)
1
• Use H(.) to look up the position of e in HT 2
• Find/insert/delete in the linked list of the 3 3 (k,v) 3 (k’,v’)
hashed slot 4
• Analysis:
• Assume that time to compute H(k) is Θ(1)
• Load factor of an HT: α = n/m with m
slots holding n elements
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e 0 0 (k,v) 0 (k’,v’)
1
• Use H(.) to look up the position of e in HT 2
• Find/insert/delete in the linked list of the 3 3 (k,v) 3 (k’,v’)
hashed slot 4
• Analysis: • Use of Load factor?
• Assume that time to compute H(k) is Θ(1)
• Load factor of an HT: α = n/m with m
slots holding n elements
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e 0 0 (k,v) 0 (k’,v’)
1
• Use H(.) to look up the position of e in HT 2
• Find/insert/delete in the linked list of the 3 3 (k,v) 3 (k’,v’)
hashed slot 4
• Analysis: • Use of Load factor?
• Assume that time to compute H(k) is Θ(1) • Performance Indication: α
approaches 1, the probability of
collision increases
• Load factor of an HT: α = n/m with m
slots holding n elements
fi
Collision Resolution (1): Chaining
• For nd/insert/delete an element e 0 0 (k,v) 0 (k’,v’)
1
• Use H(.) to look up the position of e in HT 2
• Find/insert/delete in the linked list of the 3 3 (k,v) 3 (k’,v’)
hashed slot 4
• Analysis: • Use of Load factor?
• Assume that time to compute H(k) is Θ(1) • Performance Indication: α
approaches 1, the probability of
collision increases
• Load factor of an HT: α = n/m with m
slots holding n elements • Threshold for resizing: When
α > 0.7, HT is doubled in size
fi
Collision Resolution (1): Chaining
Collision Resolution (1): Chaining
• Analysis:
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1)
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1)
• Load factor of an HT: α = n/m with m slots holding
n elements
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1)
• Load factor of an HT: α = n/m with m slots holding
n elements
• Expected number of elements to be examined: α
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1)
• Load factor of an HT: α = n/m with m slots holding
n elements
• Expected number of elements to be examined: α
• Search time for unsuccessful search: O(1 + α)
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1)
• Load factor of an HT: α = n/m with m slots holding
n elements
• Expected number of elements to be examined: α
• Search time for unsuccessful search: O(1 + α)
• Search time for successful search:
n (i − 1)
Σi=1 m
1+ ≃ Θ(1 + α)
n
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1)
• Load factor of an HT: α = n/m with m slots holding
n elements
• Expected number of elements to be examined: α
• Search time for unsuccessful search: O(1 + α)
• Search time for successful search:
n (i − 1)
Σi=1 m
1+ ≃ Θ(1 + α)
n
• Upon insertion of i-th element: expected length of
the list: (i-1)/m
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1) 0 0 (k,v) 0 (k’,v’)
1
• Load factor of an HT: α = n/m with m slots holding
n elements 2
3 3 (k,v) 3 (k’,v’)
• Expected number of elements to be examined: α 4
• Search time for unsuccessful search: O(1 + α)
• Search time for successful search:
n (i − 1)
Σi=1 m
1+ ≃ Θ(1 + α)
n
• Upon insertion of i-th element: expected length of
the list: (i-1)/m
Collision Resolution (1): Chaining
• Analysis:
• Assume that time to compute H(k) is Θ(1) 0 0 (k,v) 0 (k’,v’)
1
• Load factor of an HT: α = n/m with m slots holding
n elements 2
3 3 (k,v) 3 (k’,v’)
• Expected number of elements to be examined: α 4
• Search time for unsuccessful search: O(1 + α) • Assumption: We use a simple
• Search time for successful search: uniform hash function
n (i − 1)
Σi=1 m
1+ ≃ Θ(1 + α)
n
• Upon insertion of i-th element: expected length of
the list: (i-1)/m
Collision Resolution (2): Linear Probing
Collision Resolution (2): Linear Probing
• Each slot can accommodate one element (i.e. n ≤ m)
• It uses less memory than Chaining, but slower
• Algorithm: Insert
Collision Resolution (2): Linear Probing
• Each slot can accommodate one element (i.e. n ≤ m)
• It uses less memory than Chaining, but slower
• Algorithm: Insert
Collision Resolution (2): Linear Probing
• Each slot can accommodate one element (i.e. n ≤ m) 0 H(x)
1
• It uses less memory than Chaining, but slower 2
• Algorithm: Insert 3
4
Collision Resolution (2): Linear Probing
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
18
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
41 18
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
31
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Insert keys: 18, 41, 22, 44, 59, 32, 31
• Successful search: Go to the hash value and continue until the location
• Unsuccessful search: Go the hash value and continue until the end of the array OR an
empty location
31
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Keys: 18, 41, 22, 44, 59, 32, 31
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Keys: 18, 41, 22, 44, 59, 32, 31
• Deletion Algorithm: Go to the hash value and mark it (X)
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Keys: 18, 41, 22, 44, 59, 32, 31
• Deletion Algorithm: Go to the hash value and mark it (X)
31
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Keys: 18, 41, 22, 44, 59, 32, 31
• Deletion Algorithm: Go to the hash value and mark it (X)
31
44
41 18 X 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Keys: 18, 41, 22, 44, 59, 32, 31
• Deletion Algorithm: Go to the hash value and mark it (X)
• Lookup ignores the marked cells; Insert uses the marked cells to place the value
31
44
41 18 X 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Keys: 18, 41, 22, 44, 59, 32, 31
• Deletion Algorithm: Go to the hash value and mark it (X)
• Lookup ignores the marked cells; Insert uses the marked cells to place the value
31
44
41 18 X 22
0 1 2 3 4 5 6 7 8 9 10 11 12
• Q: What happens to the e ciency of lookup if there are too many marks?
ffi
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 13
• Keys: 18, 41, 22, 44, 59, 32, 31
• Deletion Algorithm: Go to the hash value and mark it (X)
• Lookup ignores the marked cells; Insert uses the marked cells to place the value
31
44
41 18 X 22
0 1 2 3 4 5 6 7 8 9 10 11 12
• Q: What happens to the e ciency of lookup if there are too many marks?
• Q: What is the solution to the above problem?
ffi
Collision Resolution (2): Linear Probing
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 16
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 16
• HT has 16 slots
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 16
• HT has 16 slots
• Assume the values 0 to 15 are represented as Bit Vectors (ie. 0000 to 1111)
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 16
• HT has 16 slots
• Assume the values 0 to 15 are represented as Bit Vectors (ie. 0000 to 1111)
• What problem do you foresee?
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 16
• HT has 16 slots
• Assume the values 0 to 15 are represented as Bit Vectors (ie. 0000 to 1111)
• What problem do you foresee?
• The value returned by H is solely dependent on the least signi cant 4 bits of the
key
fi
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 16
• HT has 16 slots
• Assume the values 0 to 15 are represented as Bit Vectors (ie. 0000 to 1111)
• What problem do you foresee?
• The value returned by H is solely dependent on the least signi cant 4 bits of the
key
• Therefore the results will be poorly distributed!
fi
Collision Resolution (2): Linear Probing
• Example: H(k) = k mod 16
• HT has 16 slots
• Assume the values 0 to 15 are represented as Bit Vectors (ie. 0000 to 1111)
• What problem do you foresee?
• The value returned by H is solely dependent on the least signi cant 4 bits of the
key
• Therefore the results will be poorly distributed!
• EXERCISE: Try with di erent table sizes, di erent levels of loading (the number of
keys inserted), and di erent input distributions.
ff
ff
ff
fi
Collision Resolution (3): Double Hashing
Collision Resolution (3): Double Hashing
• Use two hash functions : H1, H2
Collision Resolution (3): Double Hashing
• Use two hash functions : H1, H2
• H1(k) : Is the position in the table where we rst check for key k
fi
Collision Resolution (3): Double Hashing
• Use two hash functions : H1, H2
• H1(k) : Is the position in the table where we rst check for key k
• H2(k) : Determines the o set when we search for k
ff
fi
Collision Resolution (3): Double Hashing
• Use two hash functions : H1, H2
• H1(k) : Is the position in the table where we rst check for key k
• H2(k) : Determines the o set when we search for k
• In Linear Probing H2(k) is 1
ff
fi
Collision Resolution (3): Double Hashing
• Use two hash functions : H1, H2
• H1(k) : Is the position in the table where we rst check for key k
• H2(k) : Determines the o set when we search for k
• In Linear Probing H2(k) is 1
• Insert Algorithm:
ff
fi
Collision Resolution (3): Double Hashing
• Use two hash functions : H1, H2
• H1(k) : Is the position in the table where we rst check for key k
• H2(k) : Determines the o set when we search for k
• In Linear Probing H2(k) is 1
• Insert Algorithm:
• Distributes keys more uniformly
ff
fi
Collision Resolution (3): Double Hashing
• Use two hash functions : H1, H2
• H1(k) : Is the position in the table where we rst check for key k
• H2(k) : Determines the o set when we search for k
• In Linear Probing H2(k) is 1
• Insert Algorithm:
• Distributes keys more uniformly
ff
fi
Collision Resolution (3): Double Hashing
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
18
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
41 18
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Collision Resolution (3): Double Hashing
• H1(k) : k mod 13
• H2(k) : 8 - (k mod 8)
• Insert keys: 18, 41, 22, 44, 59, 32, 31
31
44
41 18 22
0 1 2 3 4 5 6 7 8 9 10 11 12
Expected Number of Probes
RECOMMENDATIONS
RECOMMENDATIONS
RECOMMENDATIONS
RECOMMENDATIONS
RECOMMENDATIONS
RECOMMENDATIONS
RECOMMENDATIONS
RECOMMENDATIONS