0% found this document useful (0 votes)
14 views29 pages

Data Structures-3

The document outlines a syllabus on graph theory, covering basic terminology, types of graphs, and their representations, including adjacency matrices and lists. It explains graph traversal algorithms, specifically breadth-first search (BFS) and depth-first search (DFS), with examples and complexity analysis. Additionally, it discusses various graph terminologies such as directed and undirected graphs, cycles, and connected graphs.

Uploaded by

archana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views29 pages

Data Structures-3

The document outlines a syllabus on graph theory, covering basic terminology, types of graphs, and their representations, including adjacency matrices and lists. It explains graph traversal algorithms, specifically breadth-first search (BFS) and depth-first search (DFS), with examples and complexity analysis. Additionally, it discusses various graph terminologies such as directed and undirected graphs, cycles, and connected graphs.

Uploaded by

archana
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

14-05-2025

Syllabus
• Graphs: Basic Terminology, Types of Graphs, Graphs
representations – adjacency matrix, adjacency list; Traversals
Module-2 - breath first search and depth first search; Applications of
graphs.
Unit-2: Graphs
• Hashing: Introduction, Different hash functions, collision:
avoidance and handling methods.

Graphs-Introduction Why are Graphs Useful?


• A graph is an abstract data structure that is used to implement the • Graphs are widely used to model any situation where entities or
mathematical concept of graphs. things are related to each other in pairs.
• It is basically a collection of vertices (also called nodes) and edges that • For example, the following information can be represented by graphs:
connect these vertices. • Family trees in which the member nodes have an edge from parent to
• A graph is often viewed as a generalization of the tree structure, each of their children.
where instead of having a purely parent-to-child relationship between • Transportation networks in which nodes are airports, intersections,
tree nodes, any kind of complex relationship can exist ports, etc.
• The edges can be airline flights, one-way roads, shipping routes, etc.

1
14-05-2025

Definition Types of graphs


• A graph can be directed or undirected.
• In an undirected graph, edges do not have any direction associated with
• A graph G is defined as an ordered set (V, E), where V(G) represents them.
the set of vertices and E(G) represents the edges that connect these • That is, if an edge is drawn between nodes A and B, then the nodes can be
vertices. traversed from A to B as well as from B to A.
• Fig shows a graph with • In a directed graph, edges form an ordered
𝑉(𝐺) = {𝐴, 𝐵, 𝐶, 𝐷 𝑎𝑛𝑑 𝐸} and pair.
𝐸(𝐺) = {(𝐴, 𝐵), (𝐵, 𝐶), (𝐴, 𝐷), (𝐵, 𝐷), (𝐷, 𝐸), (𝐶, 𝐸)}. • If there is an edge from A to B, then there is a
Note that there are five vertices or nodes and six edges path from A to B but not from B to A.
in the graph. • The edge (A, B) is said to initiate from node A
(also known as initial node) and terminate at
node B (terminal node).

Graph Terminology Graph Terminology


• Adjacent nodes or neighbours For every edge, e = (u, v) that connects
nodes u and v, the nodes u and v are the end-points and are said to
be the adjacent nodes or neighbours. • Regular graph It is a graph where each vertex has the same number
of neighbours. That is, every node has the same degree.
• Degree of a node Degree of a node u, deg(u), is the total number of
edges containing the node u. • A regular graph with vertices of degree k is called a k–regular graph or
a regular graph of degree k.
• If deg(u) = 0, it means that u does not belong to any edge and such a
node is known as an isolated node.
• Path A path P written as P = {𝑣 , 𝑣 , 𝑣 , ..., 𝑣 ), of length n from a
node u to v is defined as a sequence of (n+1) nodes.
• Here, u = 𝑣 , v = 𝑣 and 𝑣 is adjacent to 𝑣 for i = 1, 2, 3, ..., n.

2
14-05-2025

Graph Terminology Graph Terminology


• Closed path A path P is known as a closed path if the edge has the • Connected graph A graph is said to be connected if for any two
same end-points. That is, if 𝑉 = 𝑉 . vertices (u, v) in V there is a path from u to v.
• Simple path A path P is known as a simple path if all the nodes in the • That is, there are no isolated nodes in a connected graph.
path are distinct with an exception that 𝑽𝟎 may be equal to 𝑽𝒏 . • A connected graph that does not have any cycle is called a tree.
• If 𝑉 = 𝑉 , then the path is called a closed simple path. Therefore, a tree is treated as a special graph
• Cycle A path in which the first and the last vertices are same.
• A simple cycle has no repeated edges or vertices (except the first and
last vertices).

Graph Terminology Graph Terminology


• Complete graph A graph G is said to be complete • Labelled graph or weighted graph A graph is said to be labelled if
if all its nodes are fully connected. every edge in the graph is assigned some data.
• That is, there is a path from one node to every • In a weighted graph, the edges of the graph are assigned some weight
other node in the graph. or length.
• A complete graph has n(n–1)/2 edges, where n is • The weight of an edge denoted by w(e) is a positive value which
the number of nodes in G. indicates the cost of traversing the edge.
• Clique In an undirected graph G = (V, E), clique is
a subset of the vertex set 𝐶 ⊆ 𝑉, such that for
every two vertices in C, there is an edge that
connects two vertices.

3
14-05-2025

Graph Terminology Directed graphs


• Multiple edges Distinct edges which connect • A directed graph G, also known as a digraph, is a graph in which every
the same end-points are called multiple edges. edge has a direction assigned to it.
• That is, e = (u, v) and e' = (u, v) are known as • An edge of a directed graph is given as an ordered pair (u, v) of nodes
multiple edges of G. in G.
• Loop An edge that has identical end-points is • For an edge (u, v):
called a loop. • The edge begins at u and terminates at v.
• That is, e = (u, u). • u is known as the origin or initial point of e. Correspondingly, v is
• Multi-graph A graph with multiple edges known as the destination or terminal point of e.
and/or loops is called a multi-graph. • u is the predecessor of v. Correspondingly, v is the successor of u.
• Size of a graph The size of a graph is the total • Nodes u and v are adjacent to each other.
number of edges in it.

Digraph Terminology Digraph Terminology


• Out-degree of a node The out-degree of a node u, written as • Reachability A node v is said to be reachable from node u, if and only
outdeg(u), is the number of edges that originate at u. if there exists a (directed) path from node u to node v.
• In-degree of a node The in-degree of a node u, written as indeg(u), is • Strongly connected directed graph A digraph is said to be strongly
the number of edges that terminate at u. connected if and only if there exists a path between every pair of
• Degree of a node The degree of a node, written as deg(u), is equal to nodes in G. That is, if there is a path from node u to v, then there
the sum of the in-degree and out-degree of that node. Therefore, must be a path from node v to u.
deg(u) = indeg(u) + outdeg(u).
• Unilaterally connected graph A digraph is said to be unilaterally
• Source A node u is known as a source if it has a positive out-degree
but a zero in-degree. connected if there exists a path between any pair of nodes u, v in G
such that there is a path from u to v or a path from v to u, but not
• Sink A node u is known as a sink if it has a positive in-degree but a both.
zero out-degree.

4
14-05-2025

Digraph Terminology
• Weakly connected digraph A directed graph is said to be weakly
REPRESENTATION OF GRAPHS
connected if it is connected by ignoring the direction of edges.
That is, in such a graph, it is possible to reach any node from any • There are three common ways of storing graphs in the computer’s
other node by traversing edges in any direction (may not be in
the direction they point). The nodes in a weakly connected
memory.
directed graph must have either out-degree or in-degree of at • They are:
least 1.
• Sequential representation by using an adjacency matrix.
• Simple directed graph A directed graph G is said to be a simple
directed graph if and only if it has no parallel edges. However, a • Linked representation by using an adjacency list that stores the
simple directed graph may contain cycles with the exception that neighbours of a node using a linked list.
it cannot have more than one loop at a given node.
• Adjacency multi-list which is an extension of linked representation.
• Parallel/Multiple edges Distinct edges which connect the same
end-points are called multiple edges. That is, e = (u, v) and e' =
(u, v) are known as multiple edges of G. In Fig. 13.5(a), e3 and e5
are multiple edges connecting nodes C and D.

Adjacency Matrix representation


• An adjacency matrix represents which nodes are adjacent to one another. Adjacency Matrix representation
• By definition, two nodes are said to be adjacent if there is an edge connecting them. • Since an adjacency matrix contains only 0s and 1s, it is called a bit matrix or a Boolean
• In a directed graph G, if node v is adjacent to node u, then there is an edge from u to v. matrix.
• For any graph G having n nodes, the adjacency matrix will have the dimension of n X n. • The entries in the matrix depend on the ordering of the nodes in G. Therefore, a change in
the order of the nodes will result in a different adjacency matrix.

• In an adjacency matrix, the rows and columns are labelled by graph vertices. An entry
aij in the adjacency matrix will contain 1, if vertices vi and vj are adjacent to each other.
• However, if the nodes are not adjacent, aij will be set to zero.

5
14-05-2025

Adjacency List Representation Adjacency Multi-list Representation


• modified version of adjacency lists
• An adjacency list is another way • Adjacency multi-list is an edge-based rather than a vertex-based
in which graphs can be representation of graphs.
represented in the computer’s
memory. • A multi-list representation consists of two parts—a directory of
• This structure consists of a list of nodes’ information and a set of linked lists storing information about
all nodes in G. edges.
• Furthermore, every node is in
turn linked to its own list that
contains the names of all other
nodes that are adjacent to it.

GRAPH TRAVERSAL ALGORITHMS Breadth-First Search Algorithm


• Also called Level Order Traversal
• The method of examining the nodes and edges of the graph is known • BFS is a graph search algorithm that begins at the root node and explores all the
as graph traversal. neighbouring nodes.
• Then for each of those nearest nodes, their unvisited neighbour nodes are explored, and
• There are two standard methods of graph traversal: so on, until it finds the goal.
1. Breadth-first search(BFS)
• The purpose of the algorithm is to mark each vertex as visited while avoiding cycles.
2. Depth-first search(DFS)
The algorithm works as follows:
• While BFS uses a queue as an auxiliary data structure to store nodes 1. Start by putting any one of the graph's vertices at the back of a queue.
for further processing, the DFS scheme uses a stack. 2. Take the front item of the queue and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in the visited
list to the back of the queue.
4. Keep repeating steps 2 and 3 until the queue is empty.

6
14-05-2025

Example Example
• We use an undirected graph with 5 vertices. • start from vertex 0, the BFS algorithm starts by putting it in the Visited
list and putting all its adjacent vertices in the queue.

Example Example
• Next, visit the element at the front of queue i.e. 1 and go to its • Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the
adjacent nodes. Since 0 has already been visited, visit 2 instead. back of the queue and visit 3, which is at the front of the queue.

7
14-05-2025

Example Example
• Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the • Only 4 remains in the queue since the only adjacent node of 3 i.e. 0 is
back of the queue and visit 3, which is at the front of the queue. already visited. We visit it.

Example-2 Example-2
• Find the BFS traversal for the given node. • Find the BFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visit A and write its adjacent nodes B, C, and D in the queue Next, visit the first node from the Queue B and write its adjacent nodes E in queue

Visited A Visited A B

Queue B C D Queue C D E

8
14-05-2025

Example-2 Example-2
• Find the BFS traversal for the given node. • Find the BFS traversal for the given node.
Consider A as Source node Consider A as Source node

Next, visit the first node from the Queue C and insert its unvisited adjacent nodes G in the queue Next, visit the first node from the Queue D and insert its unvisited adjacent nodes in the queue

Visited A B C Visited A B C D

Queue D E G Queue E G

Example-2 Example-2
• Find the BFS traversal for the given node. • Find the BFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B C D E Visited A B C D E G

Queue G F Queue F H I

9
14-05-2025

Example-2 Example-2
• Find the BFS traversal for the given node. • Find the BFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B C D E G F Visited A B C D E G F H

Queue H I Queue I

Example-2 BFS Algorithm Complexity


• Find the BFS traversal for the given node. • The time complexity of the BFS algorithm is represented in the form
Consider A as Source node
of O(V + E), where V is the number of nodes and E is the number of
edges.

Visited A B C D E G F H I • The space complexity of the algorithm is O(V).

Queue

BFS Traversal of the given graph: A,B,C,D,E,G,F,H,I

10
14-05-2025

BFS Algorithm Applications Depth-first Search Algorithm


• To build index by search index • The depth-first search algorithm progresses by expanding the starting node
of G and then going deeper and deeper until the goal node is found, or
• For GPS navigation until a node that has no children is encountered.
• Cycle detection in an undirected graph • When a dead-end is reached, the algorithm backtracks, returning to the
most recent node that has not been completely explored.
• In the minimum spanning tree
The DFS algorithm works as follows:
• Breadth-first search can be used to solve many problems such as:
1. Start by putting any one of the graph's vertices on top of a stack.
• Finding all connected components in a graph G. 2. Take the top item of the stack and add it to the visited list.
• Finding all nodes within an individual connected component. 3. Create a list of that vertex's adjacent nodes. Add the ones which aren't in
• Finding the shortest path between two nodes, u and v, of a graph. the visited list to the top of the stack.
4. Keep repeating steps 2 and 3 until the stack is empty.

Example Example
• Given an undirected graph with 5 vertices. • Start from vertex 0, the DFS algorithm starts by putting it in the
Visited list and putting all its adjacent vertices in the stack.

11
14-05-2025

Example Example
• Next, visit the element at the top of the stack, i.e. 1 and go to its • Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the
adjacent nodes. Since 0 has already been visited, we visit 2 instead. top of the stack and visit it.

Example Example
• Vertex 2 has an unvisited adjacent vertex in 4, so we add that to the • After we visit the last element 3, it doesn't have any unvisited adjacent
top of the stack and visit it. nodes, so we have completed the Depth First Traversal of the graph.

12
14-05-2025

Example-2 Example-2
• Find the DFS traversal for the given node. • Find the DFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A Visited A B

Stack Stack

TOP B TOP E
C C
D D

Example-2 Example-2
• Find the DFS traversal for the given node. • Find the DFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B E Visited A B E C

Stack Stack

C G
F F
C C
D D

13
14-05-2025

Example-2 Example-2
• Find the DFS traversal for the given node. • Find the DFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B E C G Visited A B E C G F
C

F Stack H Stack
H
H
I
I
F
F
C
C
D
D

Example-2 Example-2
• Find the DFS traversal for the given node. • Find the DFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B E C G F H Visited A B E C G F H
E
I Stack I Stack
H H
I I
F F
C C
D D

14
14-05-2025

Example-2 Example-2
• Find the DFS traversal for the given node. • Find the DFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B E C G F H I Visited A B E C G F H I

Stack Stack
H
I I
F F
C C
D D

Example-2 Example-2
• Find the DFS traversal for the given node. • Find the DFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B E C G F H I Visited A B E C G F H I

Stack Stack

F
C C
D D

15
14-05-2025

Example-2 Example-2
• Find the DFS traversal for the given node. • Find the DFS traversal for the given node.
Consider A as Source node Consider A as Source node

Visited A B E C G F H I Visited A B E C G F H I D

Stack Stack

D D

Applications of DFS
• Space complexity The space complexity of a depth-first search is • Depth-first search is useful for:
lower than that of a breadth-first search. • Finding a path between two specified nodes, u and v, of an
• Time complexity The time complexity of a depth-first search is unweighted graph.
proportional to the number of vertices plus the number of edges in • Finding a path between two specified nodes, u and v, of a weighted
the graphs that are traversed. graph.
• The time complexity can be given as (O(|V| + |E|)). • Finding whether a graph is connected or not.
• Computing the spanning tree of a connected graph.

16
14-05-2025

INTRODUCTION
• We discussed two search algorithms: linear search and binary search.
• Linear search has a running time proportional to O(n), while binary
Module-2 search takes time proportional to O(log n),
Where, n is the number of elements in the array.
Unit-2 Hashing
• Binary search and binary search trees are efficient algorithms to
search for an element.
• But what if we want to perform the search operation in time
proportional to O(1)?

Search an array in constant time Solution -1(Example)


• There are two solutions to this problem. • In a small company of 100 employees, each employee is assigned an
Emp_ID in the range 0–99.
1. To take individual values of the data to represent a unique index for
• To store the records in an array, each employee’s Emp_ID acts as an index
each record. into the array where the employee’s record.
2. Hashing. • In this case, we can directly access the record of any employee.
• But practically, this implementation is hardly feasible.
• Let us assume that the same company uses a five-digit Emp_ID as the
primary key.
• In this case, key values will range from 00000 to 99999.
• If we want to use the same technique as above, we need an array of size
100,000, of which only 100 elements will be used.

1
14-05-2025

Contd.. Solution-2 (Hashing)


• It is impractical to waste so much storage space just to ensure that
each employee’s record is in a unique and predictable location. • In the second solution, the elements are not stored according to the value
• Whether we use a two-digit primary key (Emp_ID) or a five-digit key, of the key.
there are just 100 employees in the company. • So, in this case, we need a way to convert a five-digit key number to a two-
• Thus, we will be using only 100 locations in the array. digit array index.
• Therefore, in order to keep the array size down to the size that we • We need a function which will do the transformation.
will actually be using (100 elements), another good option is to use • In this case, we will use the term hash table for an array and the function
just the last two digits of the key to identify each employee. that will carry out the transformation will be called a hash function.
• For example, the employee with Emp_ID 79439 will be stored in the • Hashing is the process of converting a given key into an index in a hash
element of the array with index 39. table using a hash function, so that the data can be inserted, searched, or
deleted in constant average time O(1).
• Similarly, the employeevwith Emp_ID 12345 will have his record
stored in the array at the 45th location.

Hash Table
• Hash table is a data structure in which keys are mapped to array Cont..
positions by a hash function.
• In the example discussed here we will use a hash function that • However, when the set K of keys that are actually used is smaller than
extracts the last two digits of the key. the universe of keys (U), a hash table consumes less storage space.
• Therefore, we map the keys to array locations or array indices. • The storage requirement for a hash table is O(k), where k is the
• A value stored in a hash table can be searched in O(1) time by using a number of keys actually used.
hash function which generates an address from the key (by producing • In a hash table, an element with key k is stored at index h(k) and not
the index of the array where the value is stored). k.
• It means a hash function h is used to calculate the index at which the
element with key k will be stored.

2
14-05-2025

HASH FUNCTIONs Types of Hash functions


• A hash function is a mathematical formula which, when applied to a • Division Method
key, produces an integer which can be used as an index for the key in • Multiplication Method
the hash table.
• Mid-Square Method
• It produces a unique set of integers within some suitable range in
order to reduce the number of collisions. • Folding Method
• In practice, there is no hash function that eliminates collisions
completely.
• A good hash function can only minimize the number of collisions by
spreading the elements uniformly throughout the array.

Division Method Implementation of division method


• It is the simplest method of hashing an integer x. int const M = 97; // a prime number
• This method divides x by M and then uses the remainder obtained.
• In this case, the hash function can be given as h(x) = x mod M.
int h (int x)
• However, extra care should be taken to select a suitable value for M. {
• For example, suppose M is an even number then h(x) is even if x is even and h(x) return (x % M);
is odd if x is odd.
• Generally, it is best to choose M to be a prime number because making M a }
prime number increases the likelihood that the keys are mapped with a
uniformity in the output range of values.
• M should also be not too close to the exact powers of 2.
• If we have ℎ(𝑥) = 𝑥 𝑚𝑜𝑑 2 then the function will simply extract the lowest k
bits of the binary representation of x.

3
14-05-2025

Drawbacks Example
• A potential drawback of the division method is that while using this • Calculate the hash values of keys 1234 and 5462.
method, consecutive keys map to consecutive hash values. • Solution Setting M = 97, hash values can be calculated as:
• On one hand, this is good as it ensures that consecutive keys do not • h(1234) = 1234 % 97 = 70
collide, but on the other, it also means that consecutive array
locations will be occupied. • h(5642) = 5642 % 97 = 16
• This may lead to degradation in performance.

Multiplication Method Example


• The steps involved in the multiplication method are as follows:
• Step 1: Choose a constant A such that 0 < A < 1.
• Step 2: Multiply the key k by A.
• Step 3: Extract the fractional part of kA.
• Step 4: Multiply the result of Step 3 by the size of hash table (m).
• Hence, the hash function can be given as:

4
14-05-2025

Mid-Square Method Cont..


• The mid-square method is a good hash function which works in two • In the mid-square method, the same r digits must be chosen from all
steps: the keys.
• Step 1: Square the value of the key. That is, find 𝑘 . • Therefore, the hash function can be given as:
• Step 2: Extract the middle r digits of the result obtained in Step 1. • h(k) = s
• The algorithm works well because most or all digits of the key value
contribute to the result. • where s is obtained by selecting r digits from 𝑘 .
• This is because all the digits in the original key value contribute to
produce the middle digits of the squared value.
• Therefore, the result is not dominated by the distribution of the
bottom digit or the top digit of the original key value.

Example Folding Method


• The folding method works in the following two steps:
• Step 1: Divide the key value into a number of parts. That is, divide k
into parts k1, k2, ..., kn, where each part has the same number of
digits except the last part which may have lesser digits than the other
parts.
• Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ...
+ kn. The hash value is produced by ignoring the last carry, if any.

5
14-05-2025

Example COLLISIONS
• Collisions occur when the hash function maps two different keys to
the same location.
• Obviously, two records cannot be stored in the same location.

Collision avoidance Collision avoidance


1. Good Hash Function Design:
Ensures uniform distribution of keys. 4. Convert complex keys into simpler numeric keys (e.g., converting
• Minimizes clustering. strings to integers).
• Examples: Division method, Multiplication method,Mid-square method. 5. Uniform Key Distribution:
2. Table Size Selection: • Ensuring that keys are uniformly random or spread over a large range
• Choose table size (m) to be a prime number to reduce patterns in key to reduce clustering.
distribution.
3. Load Factor Control:
• Load factor (λ) = number of elements / table size
• Keep load factor low to reduce collisions.
• Typically, rehash when λ > 0.7.Preprocessing Keys

6
14-05-2025

Collision Resolution by Open Addressing


Collision handling • Also called closed hashing.
• Computes new positions using a probe sequence and the next record is stored in
that position.
• A method used to solve the problem of collision, is called collision
resolution technique. • In this technique, all the values are stored in the hash table.
• The hash table contains two types of values: sentinel values (e.g., –1) and data
• The two most popular methods of resolving collisions are: values.
• 1. Open addressing • The presence of a sentinel value indicates that the location contains no data value
at present but can be used to hold a value.
• 2. Chaining
• When a key is mapped to a particular memory location, then the value it holds is
checked.
• If it contains a sentinel value, then the location is free and the data value can be
stored in it.
• However, if the location already has some data value stored in it, then other slots
are examined systematically in the forward direction to find a free slot.
• If even a single free location is not found, then we have an OVERFLOW condition.

1. Linear Probing Example


• If a value is already stored at a location generated by h(k), then the
following hash function is used to resolve the collision:

• Where m is the size of the hash table, h¢(k) = (k mod m), and i is the
probe number that varies from 0 to m–1.
• Therefore, for a given key k, first the location generated by [h¢(k) mod
m] is probed because for the first time i=0.
• If the location is free, the value is stored in it, else the second probe
generates the address of the location given by [h¢(k) + 1]mod m.

7
14-05-2025

Example Example

Example Example

8
14-05-2025

2. Quadratic Probing Example


• In this technique, if a value is already stored at a location generated
by h(k), then the following hash function is used to resolve the
collision:

• Where, m is the size of the hash table, h’(k) = (k mod m),


• i is the probe number that varies from 0 to m–1, and
• c1 and c2 are constants such that c1 and c2 π 0.

Example Example

9
14-05-2025

3. Double Hashing Example


• Double hashing uses one hash value and then repeatedly steps forward an
interval until an empty location is reached.
• The interval is decided using a second, independent hash function hence
the name double hashing.
• In double hashing, we use two hash functions rather than a single function.
• The hash function in the case of double hashing can be given as:

• Where 𝑚 is the size of the hash table, ℎ (𝑘) and ℎ (𝑘) are two hash functions given as
ℎ (𝑘) = 𝑘 𝑚𝑜𝑑 𝑚
ℎ (𝑘) = 𝑘 𝑚𝑜𝑑 𝑚′,
• 𝑖 is the probe number that varies from 0 to m–1, and 𝑚′ is chosen to be less than m.
• We can choose 𝑚′ = 𝑚– 1 𝑜𝑟 𝑚– 2.

Example Example

10
14-05-2025

Example
Rehashing
• When the hash table becomes nearly full, the number of collisions
increases, thereby degrading the performance of insertion and search
operations.
• Now T[6] is occupied, so we cannot store the key 101 in T[6]. • In such cases, a better option is to create a new hash table with size double
• Therefore, try again for the next location with probe i = 2.
of the original hash table.
• Repeat the entire process until a vacant location is found. • All the entries in the original hash table will then have to be moved to the
new hash table.
• we have to probe many times to insert the key 101 in the hash table.
• Although double hashing is a very efficient algorithm, it always requires m to be a • This is done by taking each entry, computing its new hash value, and then
prime number. inserting it in the new hash table.
• In our case m=10, which is not a prime number, hence, the degradation in • Though rehashing seems to be a simple process, it is quite expensive and
performance. must therefore not be done frequently.
• Thus, we can say that the performance of the technique is sensitive to the value
of m.

Example
Collision Resolution by Chaining
• In chaining, each location in a hash table stores a pointer to a linked
list that contains all the key values that were hashed to that location.
• That is, location l in the hash table points to the head of the linked list
of all the key values that hashed to l.
• However, if no key value hashes to l, then location l in the hash table
contains NULL.

11
14-05-2025

Example Example

Advantages and disadvantages of Hashing APPLICATIONS OF HASHING


• One advantage of hashing is that no extra space is required to store the index as • Hash tables are widely used in situations where enormous amounts
in the case of other data structures. of data have to be accessed to quickly search and retrieve
• In addition, a hash table provides fast data access and an added advantage of information.
rapid updates.
• Database indexing.
• On the other hand, the primary drawback of using the hashing technique for
inserting and retrieving data values is that it usually lacks locality and sequential • Some database management systems store a separate file known as
retrieval by key. the index file. When data has to be retrieved from a file, the key
• This makes insertion and retrieval of data values even more random. information is first searched in the appropriate index file which
• All the more, choosing an effective hash function is more of an art than a science.
references the exact record location of the data in the database file.
This key information in the index file is often stored as a hashed value.
• It is not uncommon (in open-addressed hash tables) to create a poor hash
function.

12
14-05-2025

APPLICATIONS OF HASHING
• To implement compiler symbol tables in C++. The compiler uses a
• symbol table to keep a record of all the user-defined symbols in a C++
program. Hashing facilitates
• the compiler to quickly look up variable names and other attributes
associated with symbols.
• Hashing is also widely used for Internet search engines.

13

You might also like