0% found this document useful (0 votes)
11 views31 pages

Unit-3 DBMS

This document covers the concepts of normalization in database design, detailing functional dependencies and various normal forms including First, Second, Third, Boyce-Codd, Fourth, and Fifth Normal Forms. It explains how normalization minimizes data redundancy and addresses anomalies such as insertion, update, and deletion. Additionally, it discusses file structures, hashing, and indexing related to data organization.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views31 pages

Unit-3 DBMS

This document covers the concepts of normalization in database design, detailing functional dependencies and various normal forms including First, Second, Third, Boyce-Codd, Fourth, and Fifth Normal Forms. It explains how normalization minimizes data redundancy and addresses anomalies such as insertion, update, and deletion. Additionally, it discusses file structures, hashing, and indexing related to data organization.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 31

UNIT-III

NORMALIZATION
1. Functional dependencies
2. Normalization
2.1 First Normal form
2.2 Second Normal form
2.3 Third Normal form
2.4 Boyce-codd Normal form
2.5 Fourth Normal form
2.6 Fifth Normal form
3.Decomposition Relation

FILE STRUCTURES, HASHING AND INDEXING

4. File Organization
5. Placing File Records on Disk
6. Operations on Files
7. Sequential File Organization
8. HEAP file organization
9. Hash file organization
10. ISAM
11. B+ Trees
12. Cluster file organization

1
NORMALIZATION
1. FUNCTIONAL DEPENDENCY
FUNCTIONAL DEPENDENCY:-
1. The functional dependency is a relationship between attributes.
2. It typically exists between the primary key and non-key attribute with in the
table.
3. The functional dependency is nothing but a non-key attributes can be
depend on the primary key value.

X------------->y
(Determinant) (Dependent)

1.) In the below table functional dependency is

Sid Sname

1 Ranjith

4 Ranjith
f.d : sidsname
cause is it(sname)dependent on sid

2.) Sid Sname

1 Ranjith

2 Ranjith
f.d: sidsname
It is valid functional dependency because is it(sname)dependent on sid
3.)

Sid Sname

1 Ranjith

2 Varun
f.d: sidsname
It is valid functional dependency because is it sname dependent on sid
4.)

Sid Sname

1 Ranjith

2
1 Varun

The above student sname ranjith and varun contain the same sid.so it is invalid f.d
Reasoning About Functional Dependencies:

Axiom Name Axiom Example

Reflexivity if a is set of attributes, b ⊆ a, then a →b SSN,Name → SSN

SSN → Name then


Augmentati if a→ b holds and c is a set of attributes,
on then ca→cb
Transitivity if a →b holds and b→c holds, then a→ c SSN →Zip and Zip → City then
holds SSN →City
:
2. NORMALIZATION
NORMALIZATION:-
Definition1:-
The process of
and well structured.
Definition 2:-
It is a technique of
minimize the Redundancy.
Definition3:-
The process of decomposing he data and store them into table in order the
data redundancy.
PROBLEMS CAUSED IN DATABASE :-
1.) DATA REDUNDANCY:-
i.) Data Redundancy is nothing but repeat ion of similar data at multiple places.
ii.) Repeat ion of data increases the size of the Database.

3
Ex:- Student
Roll No Name Branch Hod Ph.NO

401 Raju cse MR.x 1234

402 Gopal cse MR.x 1234

403 Praveen cse MR.x 1234

404 Kumar cse MR.x 1234

 In the above student table we have four cse branch students.


 The data for fields are branch, Hod, phno is repeated for students who are in
the same branch in the college this is called Data redundancy.
HOW NORMALIZATION WILL SOLVE DATA REDUNDANCY PROBLEM?

1. The normalization will break the existing student table into two different
tables.
1. Student Table 2.Branch Table
Roll No Name Branch Branch Hod Ph No

401 Raju cse cse MR.x 1234

402 Gopal cse

403 Praveen cse

404 Kumar cse

 Student table contain the student information and branch table contain branch
information. The above student table the branch cse is still repeating the data
redundancy problem.
 Normalization is not eliminating the redundancy it is minimizing the
redundancy

2.) INSERTION ANOMALIES:-


 In the above student table, if we have to insert the data of 100 students of
same branch then the branch information will be repeated for all those 100
students.
 These scenarios are nothing but insertion anomalies.

4
HOW NORMALIZATION WILL SOLVE INSERT ANOMALIES PROBLEM :
The normalization will break the existing student table into two different tables.
1.) Student table 2.) Branch Table
Roll No Name Branch
Branch Hod Ph No
401 Raju cse
cse MR.x 1234
402 Gopal cse

403 Praveen cse


Insertion
404 Kumar cse

405 Sunil cse

 If you have to insert the new student data we have to enter the roll no, name
and branch. And branch information stored in the branch table.

3.) UPDATE ANOMALIES:


 .(or) is no longer the Hod of
cse department in that case all the student records will have to be updated.
 If by any mistake we miss any record it will be lead to data inconsistency.

HOW NORMALIZATION WILL SOLVE UPDATE ANOMALIES:


The normalization will break the existing student table into two Different tables.
1.) Student table 2.Branch Table
If you want o HOD name and office Ph.no we just have to once place
Student table Branch table
Roll No Name Branch Branch Hod Ph No

401 Raju cse cse MR.x 1234

402 Gopal cse

403 Praveen cse

404 Kumar cse

4.) DELETION ANOMALIES:-


In the student table, two different information are kept together student
information and branch information hence at the end of academic year if student
records are deleted we can also lose the branch information.

5
HOW NORMALIZATION WILL SOLVE DELETION ANOMALIES:
The normalization will break existing student table into 2 different tables.
1.) Student table 2.Branch Table
Delete all the student records in the student table still we have the branch details
in the branch table.
Student table Branch table
Roll No Name Branch Branch Hod Ph No

401 Raju cse cse MR.x 1234

402 Gopal cse

403 Praveen cse

404 Kumar cse

Normalization:-
Definition 1:-
The process of decomposing the relation with anomalies to produce small
and well structure.

Definition 2:-
It is a technique of organizing the data into multiple related tables o
minimize data redundancy.
Normal forms:-
 The process of normalization is based on the concept of normal form. Each
and every normal form has its own set of properties and constraints.
 A table is said to be in normal form if it is satisfy all the properties of that
normal form.
LEVELS OF NORMALIZATION.
1.) First Normal form
2.) Second Normal form
3.) Third Normal form
4.) Boyce code Normal form
5.) Fourth Normal form
6) Fifth Normal Form

6
Levels of Normalization:-
Table with multivalued attributes Remove multivalued
attributes
First normal form
Remove partial
Second Normal form dependancy

Third Normal form Remove transitive


dependency

Boyce-codd Normal form Removes remaining


anomalies

Fourth Normal form Remove multilevel

LEVELS OF NORMALIZATION

1.) First Normal form:-


A relation in which the intersection of each row and columns contains one and only
one value.
1.) A relation is said to be in first normal form if the values in the domain of each
attribute of the relation are atomic.
2.) only one value is associated with each attribute and the value is not a set of
values or list of values

Example: Emp table

eid Ename Contact

1 A 98

-- -- 99

2 B 198

-- -- 100

In the above emp table is not in first normal form the contact attribute which are
not atomic.
The contact attribute contains set of values

7
Convert into 1NF
Emp

Eid Ename Contact

1 A 98

1 A 99

2 B 198

2 B 100

Example:-
Student
Roll no Name Course

1 Sai c/c++

2 Harish Java

3 Omkar c/dbms

In the above student table course column contain multi valued attributes so
it is not in 1NF.
’sai’ is learning the courses c and c++,’Omkar’ is learning the courses c and
Dbms that means course column contain multi valued attribute.

1st way:-
Student
Roll no Name Course

1 Sai C

1 Sai C++

2 Harish Java

3 Omkar C

3 Omkar dbms

In the above student table Roll no an course can be treated as composite primary
key.

8
2nd way:-

Student
Roll no Name Course1 Course2

1 Sai C C++

2 Harish Java Null

3 Omkar C Dbms

In the above student table roll no is the primary key.


3rd way:-
Student table Course table
Roll no Name
Roll no Course
1 Sai
1 C
2 Harish
1 C++
3 Omkar
2 Java

3 C

3 Dbms

 In the above student table Roll no is primary key


 In the above course table Roll no and course are primary keys and Roll no is
foreign key.
FUNCTIONAL DEPENDENCY:-
1. The functional dependency is a relationship between attributes.
2. It typically exists between the primary key and non-key attribute with in the
table.
3. The functional dependency is nothing but a non-key attributes can be depend
on the primary key value.

X------------->y
(Determinant) (Dependent)

1.) In the below table functional dependency is

Sid Sname

1 Ranjith

9
4 Ranjith

f.d : sidsname
It is valid functional dependency because is it(sname)dependent on sid

2.) Sid Sname

1 Ranjith

2 Ranjith
f.d: sidsname
It is valid functional dependency because is it(sname)dependent on sid
3.)

Sid Sname

1 Ranjith

2 Varun
f.d: sidsname
It is valid functional dependency because is it sname dependent on sid
4.)

Sid Sname

1 Ranjith

1 Varun

The above student sname ranjith and varun contain the same sid.so it is invalid f.d

2.) SECOND NORMAL FORM:-


1. In the 2nd normal form relation must be the 1st normal form.
2. In the 2nd normal form all non-key attributes are functionally depend on the
primary key.
3. All the non-prime attributes should be fully functional depend on the candidate
key.
Prime attribute:-
An attribute is a part of primary key is called prime attribute.

10
Non-Prime attribute:-
An attribute is not a part of primary key is called non-prime attribute.
Definition:-
Every non-prime attribute should be fully f.d on the prime attribute if x-A
holds then their should not be any proper subset y of ‘x’,for which yA also holds
true.
A table should not contain partial dependency.
Ex: - customer
Customer Id Store Id Location

1 1 Delhi

1 3 Mumbai

2 1 Delhi

3 2 Banglore

4 3 Mumbai

[customer_id , store id=Candidate keys]


In the above table
Candidate keys : customer__id and store_id
Prime attributes: customer_id, store id
Non-prime attribute: location.
The above customer table is not in the 2nd normal form location attribute is
dependent on store_id not on cutomer_ id .so we can remove partial dependency by
splitting the customer table into two tables they are customer and store table .

Customer Store
Cus_id Storied
Storied Location
1 1
1 Delhi
1 3
2 Banglore
2 1
3 Mumbai
3 2

4 3

11
3.) THIRD NORMAL FORM:-
1.In the 3rd normal form relation must be in the 2nd normal form. There should be
no transitive dependency.
2.Transitive dependency :
A condition where a, b and c are attributes a relation such that ab, and
bc then c is transitively dependent on a via b a-c
3.A relation that is in first and second normal form and which no non primary key
attribute (or) all attributes non transitively depend on primary key.
Ex-Student Location

Roll no State City

1 Punjab Mohali

2 Haryana Ambala

3 Punjab Mohali

4 Haryana Ambala

5 Bihar Patna

In the above table functional dependices are Rollno-state and state-city.
City is non key attribute which is functionally dependent on Non-key
attribute state.
We have to remove the transitive dependency in the above table by splitting
the table into student table and location table.

Student Location
Roll no State
State City
1 Punjab
Punjab Mohali
2 Haryana
Haryana Ambala
3 Punjab
Punjab Mohali
4 Haryana
Haryana Ambala
5 Bihar
Bihar Patna
In the student table f.d : Roll no-state
In the location table f.d ; city-state

12
4.) BCNF (BOYCE CODD NORMAL FORM):-
1. A relation is said to be in BCNF if it is already in 3nf and the left hand side of
the every dependency is a candidate key.
2. A relation must be in 1NF,2NF and 3NF.
3. BCNF is the advanced version of 3NF.
4. A table is in BCNF is every f.d : x-y, x is the super key of the table.
5. L.H.S of each functional dependency should be candidate key or super
key.

Example:-
Student
Roll no Name Voter id Age

1 Ravi Ko123 20

2 Varun Mo34 21

3 Ravi K786 23

4 Rahul D286 21

In the above table candidate keys are Roll no,Voter id

Roll no-name
Functional dependencies Roll no-Voterid
Candidate keys L.H.S Voterid-age
Voterid-->Roll no

5.) FOURTH NORMAL FORM:-


1.A relation must first be in BCNF.
2.4NF eliminates independent many to one relationships between columns.
3.A given relation may not contain more than one multi-valued attributes.

13
Ex:-Student
Std_id Subject Activity

100 Music Swimming

100 Accounting Swimming

100 Music Tennis

100 Accounting Tennis

150 Maths jogging

1.In the above student table primary key are std_id,subject,Activity.


2.Many student_id have same subject.
3.Many student have same activity .Thus it violates 4NF.
4.We can decomposes the table into 4NF.
5.We can split into Two tables student and activity table

Student activity
Std_id Subject Std_id Activity

100 Music` 100 Swimming

100 Accounting 100 Tennis

150 Maths 150 jogging

6.) FIFTH NORMAL FORM:

1.A relation must first be In the 4NF


2. It should not have join dependency
3.it contain the Lossless Join dependency
A given relation may not contain more than one multi-valued attributes
DECOMPOSITION RELATION:-
The process of breaking up or divide a single relation into 2 or more sub-relations
is called Decomposition of Relation.
R
R1 R2
R

14
LOSSLESS JOIN DECOMPOSITION:-

The Decomposition is called lossless join Decomposition when the join of the sub
relations results in the same relation or that was decomposed is called lossless join
Decomposition.

Ex:

R(ABC)

A B C

1 2 1

2 5 3

3 3 3

R1 R2
A B B C

1 2 2 1

5 3
2 5
3 3
3 3

R1 ⋈ R2

A B C

1 2 1

2 5 3

3 3 3

 R relation is decomposed into two sub relations R1& R2


 We perform the natural join of the sub-relations R1&R2.we get R Relation.

3. DECOMPOSITION RELATION:-

The process of breaking up or divide a single relation into 2 or more sub-relations


is called Decomposition of Relation.

15
R
R1 R2
R
TYPES OF DECOMPOSITION:
1.) lossless join Decomposition
2.) lossy join Decomposition
1,) LOSSLESS JOIN DECOMPOSITION:-

The Decomposition is called lossless join Decomposition when the join of the sub
relations results in the same relation or that was decomposed is called lossless join
Decomposition.

Ex:

R(ABC)

A B C

1 2 1

2 5 3

3 3 3

R1 R2
A B B C

1 2 2 1

5 3
2 5
3 3
3 3

R1 ⋈ R2

A B C

1 2 1

2 5 3

3 3 3

16
 R relation is decomposed into two sub relations R1& R2
 We perform the natural join of the sub-relations R1&R2.we get R Relation.

2.) Lossy join Decomposition:-


 The relation ‘R’ which is decomposed into sub relations R1&R2&……..Rn
 The decomposition is called lossy join Decomposition when the join of the
sub relations does not result in the same relation (or) that was decomposed.

R
A B C D

1 a p x

2 b q y

R1 R2
A B C D
1 a p x
2 b x y

R1XR2
A B C D

1 a p x

1 a p x

2 b q y

2 b q y

R relation is decomposed into two sub-relations R1&R2


We perform the Cartesian operator betweenR1 & R2 the sub relations R1&R2
we get R relation with extra tuples.

17

Notes Prepared by K.CHANDRAMOULI


FILE STRUCTURES, HASHING AND INDEXING
4. FILE ORGANIZATION
INTRDUCTION

File – A file is named collection of related information that is recorded on


secondary storage such as magnetic disks, magnetic tables and optical disks.
File Organization
File Organization refers to the logical relationships among various records that
constitute the file, particularly with respect to the means of identification and
access to any specific record.
 In simple terms, Storing the files in certain order is called file Organization.
 File Structure refers to the format of the label and data blocks and of any
logical control record.

RECORD
Record consist of collect of Data value or Item
Record are of two types’

1. Fixed Length Record

2. Variable Length Record

1. Fixed length records:-


1.All the records in the file are of same size.
2. Leads to memory wastage.
3. Access of the records is easier and faster.
4. Exact location of the records can be determined: location of ith record would
be.n*(i-1), where n is the size of every record.

2. Variable length records:-


1.Different records in the file have different sizes.
2. Memory efficient.
3. Access of the records is slow.

18

Notes Prepared by K.CHANDRAMOULI


TYPES OF FILE ORGANIZATION

5. PLACING FILE RECORDS ON DISK

RECORD: It is the collection of related data values and items.It describes the
entities and attributes. eg : Employee record .
RECORD TYPES:
It is the collection of field names and their corresponding data types. A data type
specifies the types of values of a field.
FILE: It is a sequence of records.

RECORD
Record consist of collect of Data value or Item
Record are of two types’

3. Fixed Length Record


4. Variable Length Record

1. Fixed length records:-


1.All the records in the file are of same size.
2. Leads to memory wastage.
3. Access of the records is easier and faster.
4. Exact location of the records can be determined: location of ith record would
be.n*(i-1), where n is the size of every record.

2. Variable length records:-


1.Different records in the file have different sizes.
2. Memory efficient.
3. Access of the records is slow.

19
4. File records are same record type but one or more field have multiple values for
individual record.
5. Files are same record type but some fields are optional.
6. Files contain records of different record types and varying length

RECORD BLOCKS
Records of file are stored in disk blocks. Each block contain numerous records.

SPANNED Vs UNSPANNED Spanned:


It stores one part of a record in one block and the rest in the other.
Unspanned : The records are not allow to cross block boundaries.
Each block may contain different number of records.

ALLOCATING FILE BLOCKS ON DISK


There are three types allocation
1. Contiguous allocation
2. Linked allocation
3. Indexed allocation
1. Contiguous allocation The file blocks are allocated to consecutive disk
blocks.
2. Linked allocation Each file block contain a pointer to the next block.
3. Indexed allocation One or more index blocks contain pointes to the
actual blocks.

FILE HEADER •
It contain information about a file that is needed by the system programs that
access the file records. • It also include the information to determine the disk
address, record format of a file.

6. OPERATIONS ON FILES

Open.

Prepares the file for reading or writing. Allocates appropriate buffers (typically at
least two) to hold file blocks from disk, and retrieves the file header. Sets the file
pointer to the beginning of the file.

Reset.

Sets the file pointer of an open file to the beginning of the file.

Find (or Locate).

Searches for the first record that satisfies a search condition.

Read (or Get).

20
Copies the current record from the buffer to a program variable in the user
program. This command may also advance the current record pointer to the next
record in the file, which may necessitate reading the next file block from disk.

FindNext.

Searches for the next record in the file that satisfies the search condition.

Delete.

Deletes the current record and (eventually) updates the file on disk to reflect the
deletion.

Modify.

Modifies some field values for the current record and (eventually) updates the file
on disk to reflect the modification.

Insert.

Inserts a new record in the file by locating the block where the record is to be
inserted, transferring that block into a main memory buffer (if it is not already
there), writing the record into the buffer, and (eventually) writ-ing the buffer to disk
to reflect the insertion.

Close.

Completes the file access by releasing the buffers and performing any other needed
cleanup operations.

Scan.

If the file has just been opened or reset, Scan returns the first record; otherwise it
returns the next record. If a condition is specified with the oper-ation, the returned
record is the first or next record satisfying the condition.

Find All.

Locates all the records in the file that satisfy a search condition.

Find (or Locate) n.

Searches for the first record that satisfies a search condition and then continues to
locate the next n – 1 records satisfying the same condition.

Find Ordered.

Retrieves all the records in the file in some specified order.

21
7. SEQUENTIAL FILE ORGANIZATION
This method is the easiest method for file organization. In this method, files are
stored sequentially. This method can be implemented in two ways:
1. Pile File Method:
o It is a quite simple method. In this method, we store the record in a
sequence, i.e., one after another. Here, the record will be inserted in the
order in which they are inserted into tables.
o In case of updating or deleting of any record, the record will be searched in
the memory blocks. When it is found, then it will be marked for deleting, and
the new record is inserted.

Insertion of the new record:


Suppose we have four records R1, R3 and so on upto R9 and R8 in a sequence.
Hence, records are nothing but a row in the table. Suppose we want to insert a new
record R2 in the sequence, then it will be placed at the end of the file. Here, records
are nothing but a row in any table.

2. Sorted File Method:


o In this method, the new record is always inserted at the file's end, and then
it will sort the sequence in ascending or descending order. Sorting of records
is based on any primary key or any other key.
o In the case of modification of any record, it will update the record and then
sort the file, and lastly, the updated record is placed in the right place.

22
Insertion of the new record:
Suppose there is a preexisting sorted sequence of four records R1, R3 and so on
upto R6 and R7. Suppose a new record R2 has to be inserted in the sequence, then
it will be inserted at the end of the file, and then it will sort the sequence.

Pros of sequential file organization


o It contains a fast and efficient method for the huge amount of data.
o In this method, files can be easily stored in cheaper storage mechanism like
magnetic tapes.
o It is simple in design. It requires no much effort to store the data.
o This method is used when most of the records have to be accessed like grade
calculation of a student, generating the salary slip, etc.
o This method is used for report generation or statistical calculations.
Cons of sequential file organization
o It will waste time as we cannot jump on a particular record that is required
but we have to move sequentially which takes our time.
o Sorted file method takes more time and space for sorting the records.

23
8. HEAP FILE ORGANIZATION
o It is the simplest and most basic type of organization. It works with data
blocks. In heap file organization, the records are inserted at the file's end.
When the records are inserted, it doesn't require the sorting and ordering of
records.
o When the data block is full, the new record is stored in some other block.
This new data block need not to be the very next data block, but it can select
any data block in the memory to store new records. The heap file is also
known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the
same size. It is the DBMS responsibility to store and manage the new
records.

Insertion of a new record


Suppose we have five records R1, R3, R6, R4 and R5 in a heap and suppose we
want to insert a new record R2 in a heap. If the data block 3 is full then it will be
inserted in any of the database selected by the DBMS, let's say data block 1.

24

Notes Prepared by K.CHANDRAMOULI


If we want to search, update or delete the data in heap file organization, then we
need to traverse the data from staring of the file till we get the requested record.

If the database is very large then searching, updating or deleting of record will be
time-consuming because there is no sorting or ordering of records. In the heap file
organization, we need to check all the data until we get the requested record.
Pros of Heap file organization
o It is a very good method of file organization for bulk insertion. If there is a
large number of data which needs to load into the database at a time, then
this method is best suited.
o In case of a small database, fetching and retrieving of records is faster than
the sequential record.
Cons of Heap file organization
o This method is inefficient for the large database because it takes time to
search or modify the record.
o This method is inefficient for large databases.

25
9. HASH FILE ORGANIZATION
Hash File Organization uses the computation of hash function on some fields of the
records. The hash function's output determines the location of disk block where the
records are to be placed.

When a record has to be received using the hash key columns, then the address is
generated, and the whole record is retrieved using that address. In the same way,
when a new record has to be inserted, then the address is generated using the
hash key and record is directly inserted. The same process is applied in the case of
delete and update.
In this method, there is no effort for searching and sorting the entire file. In this
method, each record will be stored randomly in the memory.

26
10. INDEXED SEQUENTIAL ACCESS METHOD (ISAM)

ISAM method is an advanced sequential file organization. In this method, records


are stored in the file using the primary key. An index value is generated for each
primary key and mapped with the record. This index contains the address of the
record in the file.

If any record has to be retrieved based on its index value, then the address of the
data block is fetched and the record is retrieved from the memory.

27
Pros of ISAM:
o In this method, each record has the address of its data block, searching a
record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since
the index is based on the primary key values, we can retrieve the data for
the given range of value. In the same way, the partial value can also be
easily searched, i.e., the student name starting with 'JA' can be easily
searched.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed
to maintain the sequence.
o When the record is deleted, then the space used by it needs to be released.
Otherwise, the performance of the database will slow down.

11. B+ TREE
o B+ tree file organization is the advanced method of an indexed sequential
access method. It uses a tree-like structure to store records in File.
o It uses the same concept of key-index where the primary key is used to sort
the records. For each primary key, the value of the index is generated and
mapped with the record.
o The B+ tree is similar to a binary search tree (BST), but it can have more
than two children. In this method, all the records are stored only at the leaf
node. Intermediate nodes act as a pointer to the leaf nodes. They do not
contain any records.

The above B+ tree shows that:


28
o There is one root node of the tree, i.e., 25.
o There is an intermediary layer with nodes. They do not store the actual
record. They have only pointers to the leaf node.
o The nodes to the left of the root node contain the prior value of the root and
nodes to the right contain next value of the root, i.e., 15 and 30 respectively.
o There is only one leaf node which has only values, i.e., 10, 12, 17, 20, 24, 27
and 29.
o Searching for any record is easier as all the leaf nodes are balanced.
o In this method, searching any record can be traversed through the single
path and accessed easily.
Pros of B+ tree file organization
o In this method, searching becomes very easy as all the records are stored
only in the leaf nodes and sorted the sequential linked list.
o Traversing through the tree structure is easier and faster.
o The size of the B+ tree has no restrictions, so the number of records can
increase or decrease and the B+ tree structure can also grow or shrink.
o It is a balanced tree structure, and any insert/update/delete does not affect
the performance of tree.
Cons of B+ tree file organization
o This method is inefficient for the static method.

12. CLUSTER FILE ORGANIZATION


o When the two or more records are stored in the same file, it is known as
clusters.
o These files will have two or more tables in the same data block, and key
attributes which are used to map these tables together are stored only once.
o This method reduces the cost of searching for various records in different
files.
o The cluster file organization is used when there is a frequent need for joining
the tables with the same condition. These joins will give only a few records
from both tables.
o In the given example, we are retrieving the record for only particular
departments. This method can't be used to retrieve the record for the entire
department.
o In this method, we can directly insert, update or delete any record. Data is
sorted based on the key with which searching is done. Cluster key is a type
of key with which joining of the table is performed.

29
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:
In indexed cluster, records are grouped based on the cluster key and stored
together. The above EMPLOYEE and DEPARTMENT relationship is an example of
an indexed cluster. Here, all the records are grouped based on the cluster key-
DEP_ID and all the records are grouped.

30
2. Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records
based on the cluster key, we generate the value of the hash key for the cluster key
and store the records with the same hash key value.
Pros of Cluster file organization
o The cluster file organization is used when there is a frequent request for
joining the tables with same joining condition.
o It provides the efficient result when there is a 1:M mapping between the
tables.
Cons of Cluster file organization
o This method has the low performance for the very large database.
o If there is any change in joining condition, then this method cannot use. If
we change the condition of joining then traversing the file takes a lot of time.

31

You might also like