0% found this document useful (0 votes)
39 views20 pages

Adsaa Unit 1

The document provides an overview of algorithm analysis, including definitions, specifications, and performance metrics such as time and space complexity. It discusses various algorithm design techniques, validation, and testing methods, as well as asymptotic notations for analyzing algorithm efficiency. Additionally, it emphasizes the importance of selecting the best algorithm based on performance criteria.

Uploaded by

somaraju parasa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
39 views20 pages

Adsaa Unit 1

The document provides an overview of algorithm analysis, including definitions, specifications, and performance metrics such as time and space complexity. It discusses various algorithm design techniques, validation, and testing methods, as well as asymptotic notations for analyzing algorithm efficiency. Additionally, it emphasizes the importance of selecting the best algorithm based on performance criteria.

Uploaded by

somaraju parasa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

lOMoARcPSD|17958983

Adsaa UNIT 1

Data Structure (Pragati Engineering College)

Scan to open on Studocu

Studocu is not sponsored or endorsed by any college or university


Downloaded by somaraju parasa (somaraju.acg@gmail.com)
lOMoARcPSD|17958983

UNIT-I
Introduction to Algorithm Analysis, Space and Time Complexity analysis, Asymptotic Notations AVL
Trees – Creation, Insertion, Deletion operations and Applications B-Trees – Creation, Insertion, Deletion
operations and Applications
1. Algorithm Definition
An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. In addition, all
algorithms must satisfy the following criteria:
Input: Zero or more quantities are externally supplied. Output: At least one quantity is produced.
Definiteness: Each instruction is clear and unambiguous.
Finiteness: If we trace out the instructions of an algorithm, then for all cases, the algorithm terminates after a
finite number of steps.
Effectiveness: Every instruction must be very basic so that it can be carried out, In principle, by a person
using only pencil and paper.It is not enough that each operation be definite, it also must be feasible.

Four Distinct areas of study of algorithms:


1. How to devise algorithms: Techniques–Divide & Conquer, Branch and Bound, Dynamic Programming
2. How to validate algorithms:
The purpose of the validation is to assure us that this algorithm will work correctly independently of the
issues concerning the programming language it will eventually be written in.
3. How to analyse algorithms:
Analysis algorithms or performance analys is refers to the task of determining how much computing time
and storage time an algorithm required.
4. How to test a program 2 phases
• Debugging- Debugging is the process of executing programs on sample datasets to determine whether
faulty results occur and, if so, to correct them.
• Profiling or performance measurement is the process of executing a correct program on data sets and
measuring the time and space it takes to compute the results.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

2.Algorithm Specification
In formal computer science, one distinguishes between an algorithm, and a program. A program does not
necessarily satisfy the fourth condition. One important example of such a program for a computer is its
operating system, which never terminates (except for system crashes) but continues in a wait loop until
more jobs are entered.
We represent algorithm using a pseudo language that is a combination of the constructs of a programming
language together with informal English statements.

Algorithm Specification:
Algorithm can be described in three ways.
1. Natural language like English: When this way is choused care should be taken, we should ensure that
each & every statement is definite.
2. Graphic representation called flowchart: This method will work well when the algorithm is small&
simple.
3. Pseudo-code Method: In this method, we should typically describe algorithms as program, which
resembles language like Pascal & algol.
Pseudo-Code Conventions:
1. Comments begin with // and continue until the end of line.
2. Blocks are indicated with matching braces { and }.
3. An identifier begins with a letter. The data types of variables are not explicitly declared.
4. Compound data types can be formed with records.
Here is an example,
Node. Record
{
data type – 1 data-1;
.
.
.
data type – n data – n;
node * link;
}
Here link is a pointer to the record type node. Individual data items of a record can be accessed with → and
period.
5.Assignment of values to variables is done using the assignment statement. :=
<Variable>:=<expression> ;
6. There are two Boolean values TRUE and FALSE.
Logical Operators AND, OR, NOT
Relational Operators <=,>,>=, =, !=
7. The following looping statements are employed.
For, while and repeat-until
While Loop:
While<condition>do
{
<statement-1>
.
.
.
<statement-n>
}

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

For Loop:
For variable:=value-1to value-2 step do
{
<statement-1>
.
.
.
<statement-n>
}
repeat-until:
repeat
<statement-1>
.
.
<statement-n>
until<condition>
8. A conditional statement has the following forms.
 If<condition> then <statement>
 If<condition> then <statement-1> Else <statement-1>
Case statement:
Case
{
: <condition-1> : <statement-1>
.
.
.
: <condition-n> : <statement-n>
: else : <statement-n+1>
}
9. Input and output are done using the instructions read & write.
10. There is only one type of procedure: Algorithm , the heading takes the form,
Algorithm <Name>(<Parameter lists>)

Example:
1. Algorithm selection_sort (a,n)
2. //Sort the array a[1:n] in to non-decreasing order.
3. {
4. For I:=1 to n do
5. {
6. j:=I;
7. for k:=i+1to n do
8. if (a[k]<a[j])
9. t:=a[I];

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

10. a[I]:=a[j];
11. a[j]:=t;
12. }
13. }

3.Performance Analysis
The performance of a program is the amount of computer memory and time needed to run a program. We
use two approaches to determine the performance of a program. One is analytical, and the other
experimental. In performance analysis we use analytical methods, while in performance measurement we
conduct experiments.
Time Complexity: The time needed by an algorithm expressed as a function of the size of a problem is
called the time complexity of the algorithm. The time complexity of a program is the amount of computer
time it needs to run to completion. The limiting behavior of the complexity as size increases is called the
asymptotic time complexity. It is the asymptotic complexity of an algorithm, which ultimately determines
the size of problems that can be solved by the algorithm.
The Running time of a program
When solving a problem, we are faced with a choice among algorithms. The basis for this can be any one of
the following:
i. We would like an algorithm that is easy to understand code and debug.
ii. We would like an algorithm that makes efficient use of the computer’s resources, especially, one that
runs as fast as possible.
Measuring the running time of a program
The running time of a program depends on factors such as:
1. The input to the program.
2. The quality of code generated by the compiler used to create the object program.
3. The nature and speed of the instructions on the machine used to execute the program,
4. The time complexity of the algorithm underlying the program

Space Complexity:
The space complexity of a program is the amount of memory it needs to run to completion. The space need
by a program has the following components:
Data space: Data space is the space needed to store all constant and variable values. Data space has two
components:
➢ Space needed by constants and simple variables in program.
➢ Space needed by dynamically allocated objects such as arrays and class instances.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

Environment stack space: The environment stack is used to save information needed to resume execution
of partially completed functions.
Instruction Space: Instruction space is the space needed to store the compiled version of the program
instructions. The amount of instructions space that is needed depends on factors such as:
 The compiler used to complete the program into machine code.
 The compiler options in effect at the time of compilation
 The target computer.
The space requirements (p) of any algorithm p may therefore be written as, S(P)=c + Sp (Instance
characteristics)
Where ‘c’ is a constant.
Example 2:
Algorithm sum(a,n)
{
s=0.0;
for I=1to n do
s= s+a[I];
return s;
}
• The problem instances for this algorithm are characterized by n, the number of elements to be summed.
The space needed by ‘n’ is one word, since it is of type integer.
• The space needed by ‘a’ is the space needed by variables of type array of floating-point numbers. This is
at least ‘n’ words, since ‘a’ must be large enough to hold the ‘n’ elements to be summed.
• So, we obtain Sp(sum(n)>=(n+s))[ n for a[ ], one each for n, I a& s]

Complexity of Algorithms
The complexity of an algorithm M is the function f(n) which gives the running time and/or storage space
requirement of the algorithm in terms of the size ‘n’ of the input data. Mostly, the storage space required
by an algorithm is simply a multiple of the data size ‘n’. Complexity shall refer to the running time of the
algorithm.

The function f(n), gives the running time of an algorithm, depends not only on the size ‘n’ of the input data
but also on the particular data.
The complexity function f(n) for certain cases is:
1. Best Case: The minimum possible value of f(n) is called the best case.
2. Average Case: The expected value of f(n).
3. Worst Case: The maximum value of f(n) for any key possible input.

4. Performance measurement
If we want to go from city "A" to city "B", there can be many ways of doing this. We can go by flight, by
bus, by train and also by bicycle. Depending on the availability and convenience, we choose the one which
suits us. Similarly, in computer science, there are multiple algorithms to solve a problem. When we have
more than one algorithm to solve a problem, we need to select the best one. Performance analysis helps us
to select the best algorithm from multiple algorithms to solve a problem.
When there are multiple alternative algorithms to solve a problem, we analyze them and pick the one which
is best suitable for our requirements. The formal definition is as follows...
Performance of an algorithm is a process of making evaluative judgement about algorithms.
It can also be defined as follows...

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

Performance of an algorithm means predicting the resources which are required to an algorithm to perform
its task.
That means when we have multiple algorithms to solve a problem, we need to select a suitable algorithm to
solve that problem.
We compare algorithms with each other which are solving the same problem, to select the best algorithm.
To compare algorithms, we use a set of parameters or set of elements like memory required by that
algorithm, the execution speed of that algorithm, easy to understand, easy to implement, etc.,
Generally, the performance of an algorithm depends on the following elements...
1. Whether that algorithm is providing the exact solution for the problem?
2. Whether it is easy to understand?
3. Whether it is easy to implement?
4. How much space (memory) it requires to solve the problem?
5. How much time it takes to solve the problem? Etc.,
When we want to analyse an algorithm, we consider only the space and time required by that particular
algorithm and we ignore all the remaining elements.
Based on this information, performance analysis of an algorithm can also be defined as follows...
Performance analysis of an algorithm is the process of calculating space and time required by that
algorithm.
Performance analysis of an algorithm is performed by using the following measures...
1. Space required to complete the task of that algorithm (Space Complexity). It includes program space
and data space
2. Time required to complete the task of that algorithm (Time Complexity)
5.Asymptotic notations
What is Asymptotic Notation?
Whenever we want to perform analysis of an algorithm, we need to calculate the complexity of that
algorithm. But when we calculate the complexity of an algorithm it does not provide the exact amount of
resource required. So instead of taking the exact amount of resource, we represent that complexity in a
general form (Notation) which produces the basic nature of that algorithm. We use that general form
(Notation) for analysis process.
Asymptotic notation of an algorithm is a mathematical representation of its complexity.
Note - In asymptotic notation, when we want to represent the complexity of an algorithm, we use only the
most significant terms in the complexity of that algorithm and ignore least significant terms in the
complexity of that algorithm (Here complexity can be Space Complexity or Time Complexity).
For example, consider the following time complexities of two algorithms...
• Algorithm 1: 5n2 + 2n + 1
• Algorithm 2: 10n2 + 8n + 3
Generally, when we analyze an algorithm, we consider the time complexity for larger values of input data
(i.e. 'n' value). In above two-time complexities, for larger value of 'n' the term '2n + 1' in algorithm 1 has
least significance than the term '5n2', and the term '8n + 3' in algorithm 2 has least significance than the
term '10n2'.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

Here, for larger value of 'n' the value of most significant terms (5n2 and 10n2 ) is very larger than the value
of least significant terms ( 2n + 1 and 8n + 3 ). So, for larger value of 'n' we ignore the least significant
terms to represent overall time required by an algorithm. In asymptotic notation, we use only the most
significant terms to represent the time complexity of an algorithm.
Majorly, we use THREE types of Asymptotic Notations and those are as follows...
1. Big - Oh (O)
2. Big - Omega (Ω)
3. Big - Theta (Θ)

Big - Oh Notation (O)


Big - Oh notation is used to define the upper bound of an algorithm in terms of Time Complexity. That
means Big - Oh notation always indicates the maximum time required by an algorithm for all input values.
That means Big - Oh notation describes the worst case of an algorithm time complexity. Big - Oh Notation
can be defined as follows...
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term. If f(n) <= C
g(n) for all n >= n0, C > 0 and n0 >= 1. Then we can represent f(n) as O(g(n)).

f(n) = O(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and time
required is on Y-Axis

In above graph after a particular input value n0, always C g(n) is greater than f(n) which indicates the
algorithm's upper bound.
Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as O(g(n)) then it must satisfy f(n) <= C g(n) for all values of C > 0 and n0>=
1
f(n) <= C g(n)
⇒3n + 2 <= C n
Above condition is always TRUE for all values of C = 4 and n >= 2.
By using Big - Oh notation we can represent the time complexity as follows...
3n + 2 = O(n)
Big - Omege Notation (Ω)

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

Big - Omega notation is used to define the lower bound of an algorithm in terms of Time Complexity. That
means Big-Omega notation always indicates the minimum time required by an algorithm for all input
values. That means Big-Omega notation describes the best case of an algorithm time complexity. Big -
Omega Notation can be defined as follows...
Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term. If f(n) >= C
g(n) for all n >= n0, C > 0 and n0 >= 1. Then we can represent f(n) as Ω(g(n)).
f(n) = Ω(g(n))

Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and
time required is on Y-Axis

In above graph after a particular input value n0, always C g(n) is less than f(n) which indicates the
algorithm's lower bound.
Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Ω(g(n)) then it must satisfy f(n) >= C g(n) for all values of C > 0 and n0>=
1
f(n) >= C g(n)
⇒3n + 2 >= C n
Above condition is always TRUE for all values of C = 1 and n >= 1.
By using Big - Omega notation we can represent the time complexity as follows...
3n + 2 = Ω(n)
Big - Theta Notation (Θ)
Big - Theta notation is used to define the average bound of an algorithm in terms of Time Complexity.
That means Big - Theta notation always indicates the average time required by an algorithm for all input
values. That means Big - Theta notation describes the average case of an algorithm time complexity.
Big - Theta Notation can be defined as follows...

Consider function f(n) as time complexity of an algorithm and g(n) is the most significant term. If C1 g(n)
<= f(n) <= C2 g(n) for all n >= n0, C1 > 0, C2 > 0 and n0 >= 1. Then we can represent f(n) as Θ(g(n)).

f(n) = Θ(g(n))

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-Axis and
time required is on Y-Axis

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

In above graph after a particular input value n0, always C1 g(n) is less than f(n) and C2 g(n) is greater than
f(n) which indicates the algorithm's average bound.
Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Θ(g(n)) then it must satisfy C1 g(n) <= f(n) <= C2 g(n) for all values of C1
> 0, C2 > 0 and n0>= 1
C1 g(n) <= f(n) <= C2 g(n)
⇒C1 n <= 3n + 2 <= C2 n
Above condition is always TRUE for all values of C1 = 1, C2 = 4 and n >= 2.
By using Big - Theta notation we can represent the time compexity as follows...
3n + 2 = Θ(n)
AVL Trees
AVL tree is a binary search tree in which the difference of heights of left and right subtrees of any
node is less than or equal to one. The technique of balancing the height of binary trees was developed
by Adelson, Velskii, and Landi and hence given the short form as AVL tree or Balanced Binary Tree.

AVL Tree can be defined as height balanced binary search tree in which each node is associated with a
balance factor which is calculated by subtracting the height of its right sub-tree from that of its left sub-
tree.

Tree is said to be balanced if balance factor of each node is in between -1 to 1, otherwise, the tree will
be unbalanced and need to be balanced.

Balance Factor (k) = height (left(k)) - height (right(k))

If balance factor of any node is 1, it means that the left sub-tree is one level higher than the right sub-
tree.

If balance factor of any node is 0, it means that the left sub-tree and right sub-tree contain equal height.

If balance factor of any node is -1, it means that the left sub-tree is one level lower than the right sub-
tree.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

An AVL tree is given in the following figure. We can see that, balance factor associated with each node
is in between -1 and +1. therefore, it is an example of AVL tree.

Complexity

Algorithm Average Case Worst Case


Space O(n) O(n)
Search O ( log n ) O ( log n )
Insert O ( log n ) O ( log n )
Delete O ( log n ) O ( log n )

Operations on AVL tree

Due to the fact that, AVL tree is also a binary search tree therefore, all the operations are performed in
the same way as they are performed in a binary search tree. Searching and traversing do not lead to the
violation in property of AVL tree. However, insertion and deletion are the operations which can violate
this property and therefore, they need to be revisited.

Insertion: Insertion in AVL tree is performed in the same way as it is performed in a binary search tree.
However, it may lead to violation in the AVL tree property and therefore the tree may need balancing.
The tree can be balanced by applying rotations.
Deletion: Deletion can also be performed in the same way as it is performed in a binary search tree.
Deletion may also disturb the balance of the tree therefore, various types of rotations are used to
rebalance the tree.

Why AVL Tree?

AVL tree controls the height of the binary search tree by not letting it to be skewed. The time taken for
all operations in a binary search tree of height h is O(h). However, it can be extended to O(n) if the
BST becomes skewed (i.e. worst case). By limiting this height to log n, AVL tree imposes an upper
bound on each operation to be O(log n) where n is the number of nodes.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

AVL Rotations

We perform rotation in AVL tree only in case if Balance Factor is other than -1, 0, and 1. There are
basically four types of rotations which are as follows:

1. L L rotation: Inserted node is in the left subtree of left subtree of A


2. R R rotation : Inserted node is in the right subtree of right subtree of A
3. L R rotation : Inserted node is in the right subtree of left subtree of A
4. R L rotation : Inserted node is in the left subtree of right subtree of A

Where node A is the node whose balance Factor is other than -1, 0, 1.

The first two rotations LL and RR are single rotations and the next two rotations LR and RL are double
rotations. For a tree to be unbalanced, minimum height must be at least 2, Let us understand each
rotation

1. RR Rotation

When BST becomes unbalanced, due to a node is inserted into the right subtree of the right subtree of A,
then we perform RR rotation, RR rotation is an anticlockwise rotation, which is applied on the edge
below a node having balance factor -2

In above example, node A has balance factor -2 because a node C is inserted in the right subtree of A
right subtree. We perform the RR rotation on the edge below A.

2. LL Rotation

When BST becomes unbalanced, due to a node is inserted into the left subtree of the left subtree of C,
then we perform LL rotation, LL rotation is clockwise rotation, which is applied on the edge below a
node having balance factor 2.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

In above example, node C has balance factor 2 because a node A is inserted in the left subtree of C left
subtree. We perform the LL rotation on the edge below A.

3. LR Rotation

Double rotations are bit tougher than single rotation which has already explained above. LR rotation =
RR rotation + LL rotation, i.e., first RR rotation is performed on subtree and then LL rotation is
performed on full tree, by full tree we mean the first node from the path of inserted node whose balance
factor is other than -1, 0, or 1.

A node B has been inserted into the right subtree of A the left subtree of C, because of
which C has become an unbalanced node having balance factor 2. This case is L R
rotation where: Inserted node is in the right subtree of left subtree of C

As LR rotation = RR + LL rotation, hence RR (anticlockwise) on subtree rooted at A is


performed first. By doing RR rotation, node A, has become the left subtree of B.

After performing RR rotation, node C is still unbalanced, i.e., having balance factor 2, as
inserted node A is in the left of left of C

Now we perform LL clockwise rotation on full tree, i.e. on node C. node C has now become
the right subtree of node B, A is left subtree of B

Balance factor of each node is now either -1, 0, or 1, i.e. BST is balanced now.

4. RL Rotation

As already discussed, that double rotations are bit tougher than single rotation which has already
explained above. R L rotation = LL rotation + RR rotation, i.e., first LL rotation is performed on subtree
and then RR rotation is performed on full tree, by full tree we mean the first node from the path of
inserted node whose balance factor is other than -1, 0, or 1.

A node B has been inserted into the left subtree of C the right subtree of A, because of which A has
become an unbalanced node having balance factor - 2. This case is RL rotation where: Inserted
node is in the left subtree of right subtree of A

As RL rotation = LL rotation + RR rotation, hence, LL (clockwise) on subtree rooted


at C is performed first. By doing RR rotation, node C has become the right subtree
of B.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

After performing LL rotation, node A is still unbalanced, i.e. having balance factor -2, which is because
of the right-subtree of the right-subtree node A.

Now we perform RR rotation (anticlockwise rotation) on full tree, i.e. on node A.


node C has now become the right subtree of node B, and node A has become the left
subtree of B.

Balance factor of each node is now either -1, 0, or 1, i.e., BST is balanced now.

Applications of AVL Trees

1. It is used to index huge records in a database and also search in that efficiently.
2. For all types of in-memory collections, including sets and dictionaries, AVL Trees are used.
3. Database applications, where insertions and deletions are less common but frequent data lookups are
necessary.
4. Software that needs optimized search.
5. It is applied in corporate areas and storyline games.

B-Trees

A B-Tree in data structure is a self-balancing tree used to efficiently store and manage large datasets.
In a B-Tree, each node can have multiple children, ensuring the tree remains balanced and operations like
search, insertion, and deletion are performed in O(log n) time.

Sr. No. Algorithm Time Complexity

1. Search O(log n)

2. Insert O(log n)

3. Delete O(log n)

What is B-Tree?

A B-Tree is a special kind of data structure that helps store and manage large amounts of data
efficiently. It is called a "tree" because it looks like an upside-down tree with branches.
Each part of the tree is called a node, and each node can have many children. This helps keep the tree
balanced, meaning it doesn't get too tall and narrow or too wide and short

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

In a B-Tree:
 Each node can hold multiple values.
 Each node can have multiple children.
 The tree stays balanced by making sure all the leaves (the nodes at the bottom) are at the same leve
The main idea of a B-Tree is to keep data organized in a way that makes it quick to find, add, or remove
information. This is especially important when working with large amounts of data, like in databases or file
systems.
Properties of B-Tree
The following are the key properties of B-Tree in data structure:
 Balanced Tree

B-Trees are balanced, meaning that all the leaf nodes (the nodes at the bottom of the tree) are at the
same level. This ensures that the tree remains short and wide, which helps in efficient data retrieval.
 Node Structure

Each node in a B-Tree can hold multiple keys (values) and children. The number of keys a node can
hold is determined by the order of the B-Tree, denoted as 'm'.
 Order 'm'

The order 'm' of a B-Tree defines the range of keys and children each node can have:
 A node can have at most 'm' children.
 A node can have at most 'm-1' keys.
Each node, except the root, must have at least ⌈m/2⌉ children and ⌈m/2⌉-1 keys. The root node must
have at least two children if it is not a leaf node.
 Keys in Nodes

The keys within a node are stored in a sorted manner. This helps in efficient searching within the node.
 Child Pointers

Each node has pointers to its child nodes. The number of child pointers is always one more than the
number of keys in the node.
 Data Range in Subtrees

For any given node:


 All keys in the left subtree are smaller than the keys in the node.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

 All keys in the right subtree are greater than the keys in the node.
 Height of the Tree

The height of a B-Tree is kept low by allowing nodes to have multiple children. This ensures that
operations like search, insertion, and deletion can be performed efficiently.

Operations on B-Tree
B-Tree in data structure support several fundamental operations, including insertion, deletion, and
searching:
1. Insertion
Insertion in B-Tree involves adding a new key into the appropriate position while maintaining the tree's
properties. If a node becomes full, it is split into two nodes, and the middle key is promoted to the parent
node.
Algorithm:
 Search for the correct leaf node to insert the new key.
 Insert the key into the leaf node in sorted order.
 If the leaf node has more keys than allowed (m-1 keys), split the node:
 Find the median key in the node.
 Create a new node and move half of the keys to the new node.
 Promote the median key to the parent node.
 If the parent node also becomes full, repeat the splitting process recursively up to the root.
 If the root becomes full, create a new root and split the old root into two children.
Example:
Consider a B-Tree of order 7 (m=7), which means each node can have at most 2 keys.
The elements to be inserted are 8, 9, 10, 11, 15, 20

2. Deletion

Deleting an element on a B-tree consists of three main events: searching the node where the key to
be deleted exists, deleting the key and balancing the tree if required.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

While deleting a tree, a condition called underflow may occur. Underflow occurs when a node contains
less than the minimum number of keys it should hold.
The terms to be understood before studying deletion operation are:

1. Inorder Predecessor
The largest key on the left child of a node is called its inorder predecessor.
2. Inorder Successor
The smallest key on the right child of a node is called its inorder successor.

Algorithm:
 Search for the key to be deleted.
 If the key is found in a leaf node, remove it directly.
 If the key is found in an internal node, there are two cases:
 Replace the key with the predecessor (maximum key in the left subtree) or successor (minimum key
in the right subtree) and recursively delete the predecessor/successor.
 If the key is not found in the node but exists in the subtree, recursively delete it from the appropriate
subtree.
 If a child node has fewer keys than the minimum required, borrow a key from a sibling or merge
with a sibling to maintain the minimum degree property.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

3. Searching
Searching in a B-Tree involves finding a key within the nodes. The search process is similar to
binary search and takes advantage of the sorted order of keys in each node.
Algorithm:
 Start at the root node and check if the key is in the current node.
 If the key is found, return the node and the index of the key.
 If the key is not found and the node is a leaf, return None (the key does not exist in the tree).
 If the key is not found and the node is not a leaf, recurse into the appropriate child node:
 Compare the key with the keys in the current node to determine the correct child to search.
Example:
Consider the B-Tree after inserting 10, 20, 5, 6, 12.
Search for the key 12:
 Start at the root [10].
 Key 12 is greater than 10, move to the right child [12, 20].
 Key 12 is found in the node [12, 20].

Applications of B-Tree
B-Trees are versatile data structures widely used in various applications due to their efficiency in
managing large datasets and minimizing disk I/O operations:
1. Database Indexing
B-Trees are extensively used in database management systems (DBMS) to implement indexes.
Example:
SQL databases like MySQL and PostgreSQL use B-Trees for indexing. When you create an index on a
column, the database internally uses a B-Tree to organize and maintain the index.
2. File Systems
B-Trees are used in file systems to manage files and directories efficiently.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)


lOMoARcPSD|17958983

Example:
The Btrfs (B-tree file system) and the ReiserFS file system use B-Trees to manage their on-disk
structures. Apple's HFS+ (Hierarchical File System Plus) also uses B-Trees to keep track of files and
directories.
3. Databases and Indexing for Embedded Systems
B-Trees are used in embedded systems for efficient data storage and retrieval.
Example:
Embedded databases like SQLite use B-Trees to manage their internal data structures, making them
suitable for applications on mobile devices and other embedded systems.
4. Multi-Level Indexing
B-Trees are used in multi-level indexing schemes to improve search performance.
Example:
In a multi-level index, the primary index might point to secondary indexes, each of which is a B-
Tree. This structure helps in quickly narrowing down the search to a specific range of values.
5. Caching Systems
B-Trees are used in caching systems to manage cache entries efficiently.
Example:
In-memory caching systems like Redis can use B-Trees to manage their internal data structures,
ensuring efficient handling of cache entries.
6. Geographic Information Systems (GIS)
B-Trees are used in Geographic Information Systems (GIS) for spatial data indexing.
Example:
Spatial databases like PostGIS, an extension of PostgreSQL, use B-Trees for indexing spatial data,
enabling efficient spatial queries and operations.
7. Operating System Schedulers
B-Trees are used in operating system schedulers to manage process scheduling.

Downloaded by somaraju parasa (somaraju.acg@gmail.com)

You might also like