Unit-3 DBMS
Unit-3 DBMS
NORMALIZATION
1. Functional dependencies
2. Normalization
2.1 First Normal form
2.2 Second Normal form
2.3 Third Normal form
2.4 Boyce-codd Normal form
2.5 Fourth Normal form
2.6 Fifth Normal form
3.Decomposition Relation
4. File Organization
5. Placing File Records on Disk
6. Operations on Files
7. Sequential File Organization
8. HEAP file organization
9. Hash file organization
10. ISAM
11. B+ Trees
12. Cluster file organization
1
NORMALIZATION
1. FUNCTIONAL DEPENDENCY
FUNCTIONAL DEPENDENCY:-
1. The functional dependency is a relationship between attributes.
2. It typically exists between the primary key and non-key attribute with in the
table.
3. The functional dependency is nothing but a non-key attributes can be
depend on the primary key value.
X------------->y
(Determinant) (Dependent)
Sid Sname
1 Ranjith
4 Ranjith
f.d : sidsname
cause is it(sname)dependent on sid
1 Ranjith
2 Ranjith
f.d: sidsname
It is valid functional dependency because is it(sname)dependent on sid
3.)
Sid Sname
1 Ranjith
2 Varun
f.d: sidsname
It is valid functional dependency because is it sname dependent on sid
4.)
Sid Sname
1 Ranjith
2
1 Varun
The above student sname ranjith and varun contain the same sid.so it is invalid f.d
Reasoning About Functional Dependencies:
3
Ex:- Student
Roll No Name Branch Hod Ph.NO
1. The normalization will break the existing student table into two different
tables.
1. Student Table 2.Branch Table
Roll No Name Branch Branch Hod Ph No
Student table contain the student information and branch table contain branch
information. The above student table the branch cse is still repeating the data
redundancy problem.
Normalization is not eliminating the redundancy it is minimizing the
redundancy
4
HOW NORMALIZATION WILL SOLVE INSERT ANOMALIES PROBLEM :
The normalization will break the existing student table into two different tables.
1.) Student table 2.) Branch Table
Roll No Name Branch
Branch Hod Ph No
401 Raju cse
cse MR.x 1234
402 Gopal cse
If you have to insert the new student data we have to enter the roll no, name
and branch. And branch information stored in the branch table.
5
HOW NORMALIZATION WILL SOLVE DELETION ANOMALIES:
The normalization will break existing student table into 2 different tables.
1.) Student table 2.Branch Table
Delete all the student records in the student table still we have the branch details
in the branch table.
Student table Branch table
Roll No Name Branch Branch Hod Ph No
Normalization:-
Definition 1:-
The process of decomposing the relation with anomalies to produce small
and well structure.
Definition 2:-
It is a technique of organizing the data into multiple related tables o
minimize data redundancy.
Normal forms:-
The process of normalization is based on the concept of normal form. Each
and every normal form has its own set of properties and constraints.
A table is said to be in normal form if it is satisfy all the properties of that
normal form.
LEVELS OF NORMALIZATION.
1.) First Normal form
2.) Second Normal form
3.) Third Normal form
4.) Boyce code Normal form
5.) Fourth Normal form
6) Fifth Normal Form
6
Levels of Normalization:-
Table with multivalued attributes Remove multivalued
attributes
First normal form
Remove partial
Second Normal form dependancy
LEVELS OF NORMALIZATION
1 A 98
-- -- 99
2 B 198
-- -- 100
In the above emp table is not in first normal form the contact attribute which are
not atomic.
The contact attribute contains set of values
7
Convert into 1NF
Emp
1 A 98
1 A 99
2 B 198
2 B 100
Example:-
Student
Roll no Name Course
1 Sai c/c++
2 Harish Java
3 Omkar c/dbms
In the above student table course column contain multi valued attributes so
it is not in 1NF.
’sai’ is learning the courses c and c++,’Omkar’ is learning the courses c and
Dbms that means course column contain multi valued attribute.
1st way:-
Student
Roll no Name Course
1 Sai C
1 Sai C++
2 Harish Java
3 Omkar C
3 Omkar dbms
In the above student table Roll no an course can be treated as composite primary
key.
8
2nd way:-
Student
Roll no Name Course1 Course2
1 Sai C C++
3 Omkar C Dbms
3 C
3 Dbms
X------------->y
(Determinant) (Dependent)
Sid Sname
1 Ranjith
9
4 Ranjith
f.d : sidsname
It is valid functional dependency because is it(sname)dependent on sid
1 Ranjith
2 Ranjith
f.d: sidsname
It is valid functional dependency because is it(sname)dependent on sid
3.)
Sid Sname
1 Ranjith
2 Varun
f.d: sidsname
It is valid functional dependency because is it sname dependent on sid
4.)
Sid Sname
1 Ranjith
1 Varun
The above student sname ranjith and varun contain the same sid.so it is invalid f.d
10
Non-Prime attribute:-
An attribute is not a part of primary key is called non-prime attribute.
Definition:-
Every non-prime attribute should be fully f.d on the prime attribute if x-A
holds then their should not be any proper subset y of ‘x’,for which yA also holds
true.
A table should not contain partial dependency.
Ex: - customer
Customer Id Store Id Location
1 1 Delhi
1 3 Mumbai
2 1 Delhi
3 2 Banglore
4 3 Mumbai
Customer Store
Cus_id Storied
Storied Location
1 1
1 Delhi
1 3
2 Banglore
2 1
3 Mumbai
3 2
4 3
11
3.) THIRD NORMAL FORM:-
1.In the 3rd normal form relation must be in the 2nd normal form. There should be
no transitive dependency.
2.Transitive dependency :
A condition where a, b and c are attributes a relation such that ab, and
bc then c is transitively dependent on a via b a-c
3.A relation that is in first and second normal form and which no non primary key
attribute (or) all attributes non transitively depend on primary key.
Ex-Student Location
1 Punjab Mohali
2 Haryana Ambala
3 Punjab Mohali
4 Haryana Ambala
5 Bihar Patna
In the above table functional dependices are Rollno-state and state-city.
City is non key attribute which is functionally dependent on Non-key
attribute state.
We have to remove the transitive dependency in the above table by splitting
the table into student table and location table.
Student Location
Roll no State
State City
1 Punjab
Punjab Mohali
2 Haryana
Haryana Ambala
3 Punjab
Punjab Mohali
4 Haryana
Haryana Ambala
5 Bihar
Bihar Patna
In the student table f.d : Roll no-state
In the location table f.d ; city-state
12
4.) BCNF (BOYCE CODD NORMAL FORM):-
1. A relation is said to be in BCNF if it is already in 3nf and the left hand side of
the every dependency is a candidate key.
2. A relation must be in 1NF,2NF and 3NF.
3. BCNF is the advanced version of 3NF.
4. A table is in BCNF is every f.d : x-y, x is the super key of the table.
5. L.H.S of each functional dependency should be candidate key or super
key.
Example:-
Student
Roll no Name Voter id Age
1 Ravi Ko123 20
2 Varun Mo34 21
3 Ravi K786 23
4 Rahul D286 21
Roll no-name
Functional dependencies Roll no-Voterid
Candidate keys L.H.S Voterid-age
Voterid-->Roll no
13
Ex:-Student
Std_id Subject Activity
Student activity
Std_id Subject Std_id Activity
14
LOSSLESS JOIN DECOMPOSITION:-
The Decomposition is called lossless join Decomposition when the join of the sub
relations results in the same relation or that was decomposed is called lossless join
Decomposition.
Ex:
R(ABC)
A B C
1 2 1
2 5 3
3 3 3
R1 R2
A B B C
1 2 2 1
5 3
2 5
3 3
3 3
R1 ⋈ R2
A B C
1 2 1
2 5 3
3 3 3
3. DECOMPOSITION RELATION:-
15
R
R1 R2
R
TYPES OF DECOMPOSITION:
1.) lossless join Decomposition
2.) lossy join Decomposition
1,) LOSSLESS JOIN DECOMPOSITION:-
The Decomposition is called lossless join Decomposition when the join of the sub
relations results in the same relation or that was decomposed is called lossless join
Decomposition.
Ex:
R(ABC)
A B C
1 2 1
2 5 3
3 3 3
R1 R2
A B B C
1 2 2 1
5 3
2 5
3 3
3 3
R1 ⋈ R2
A B C
1 2 1
2 5 3
3 3 3
16
R relation is decomposed into two sub relations R1& R2
We perform the natural join of the sub-relations R1&R2.we get R Relation.
R
A B C D
1 a p x
2 b q y
R1 R2
A B C D
1 a p x
2 b x y
R1XR2
A B C D
1 a p x
1 a p x
2 b q y
2 b q y
17
RECORD
Record consist of collect of Data value or Item
Record are of two types’
18
RECORD: It is the collection of related data values and items.It describes the
entities and attributes. eg : Employee record .
RECORD TYPES:
It is the collection of field names and their corresponding data types. A data type
specifies the types of values of a field.
FILE: It is a sequence of records.
RECORD
Record consist of collect of Data value or Item
Record are of two types’
19
4. File records are same record type but one or more field have multiple values for
individual record.
5. Files are same record type but some fields are optional.
6. Files contain records of different record types and varying length
RECORD BLOCKS
Records of file are stored in disk blocks. Each block contain numerous records.
FILE HEADER •
It contain information about a file that is needed by the system programs that
access the file records. • It also include the information to determine the disk
address, record format of a file.
6. OPERATIONS ON FILES
Open.
Prepares the file for reading or writing. Allocates appropriate buffers (typically at
least two) to hold file blocks from disk, and retrieves the file header. Sets the file
pointer to the beginning of the file.
Reset.
Sets the file pointer of an open file to the beginning of the file.
20
Copies the current record from the buffer to a program variable in the user
program. This command may also advance the current record pointer to the next
record in the file, which may necessitate reading the next file block from disk.
FindNext.
Searches for the next record in the file that satisfies the search condition.
Delete.
Deletes the current record and (eventually) updates the file on disk to reflect the
deletion.
Modify.
Modifies some field values for the current record and (eventually) updates the file
on disk to reflect the modification.
Insert.
Inserts a new record in the file by locating the block where the record is to be
inserted, transferring that block into a main memory buffer (if it is not already
there), writing the record into the buffer, and (eventually) writ-ing the buffer to disk
to reflect the insertion.
Close.
Completes the file access by releasing the buffers and performing any other needed
cleanup operations.
Scan.
If the file has just been opened or reset, Scan returns the first record; otherwise it
returns the next record. If a condition is specified with the oper-ation, the returned
record is the first or next record satisfying the condition.
Find All.
Locates all the records in the file that satisfy a search condition.
Searches for the first record that satisfies a search condition and then continues to
locate the next n – 1 records satisfying the same condition.
Find Ordered.
21
7. SEQUENTIAL FILE ORGANIZATION
This method is the easiest method for file organization. In this method, files are
stored sequentially. This method can be implemented in two ways:
1. Pile File Method:
o It is a quite simple method. In this method, we store the record in a
sequence, i.e., one after another. Here, the record will be inserted in the
order in which they are inserted into tables.
o In case of updating or deleting of any record, the record will be searched in
the memory blocks. When it is found, then it will be marked for deleting, and
the new record is inserted.
22
Insertion of the new record:
Suppose there is a preexisting sorted sequence of four records R1, R3 and so on
upto R6 and R7. Suppose a new record R2 has to be inserted in the sequence, then
it will be inserted at the end of the file, and then it will sort the sequence.
23
8. HEAP FILE ORGANIZATION
o It is the simplest and most basic type of organization. It works with data
blocks. In heap file organization, the records are inserted at the file's end.
When the records are inserted, it doesn't require the sorting and ordering of
records.
o When the data block is full, the new record is stored in some other block.
This new data block need not to be the very next data block, but it can select
any data block in the memory to store new records. The heap file is also
known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the
same size. It is the DBMS responsibility to store and manage the new
records.
24
If the database is very large then searching, updating or deleting of record will be
time-consuming because there is no sorting or ordering of records. In the heap file
organization, we need to check all the data until we get the requested record.
Pros of Heap file organization
o It is a very good method of file organization for bulk insertion. If there is a
large number of data which needs to load into the database at a time, then
this method is best suited.
o In case of a small database, fetching and retrieving of records is faster than
the sequential record.
Cons of Heap file organization
o This method is inefficient for the large database because it takes time to
search or modify the record.
o This method is inefficient for large databases.
25
9. HASH FILE ORGANIZATION
Hash File Organization uses the computation of hash function on some fields of the
records. The hash function's output determines the location of disk block where the
records are to be placed.
When a record has to be received using the hash key columns, then the address is
generated, and the whole record is retrieved using that address. In the same way,
when a new record has to be inserted, then the address is generated using the
hash key and record is directly inserted. The same process is applied in the case of
delete and update.
In this method, there is no effort for searching and sorting the entire file. In this
method, each record will be stored randomly in the memory.
26
10. INDEXED SEQUENTIAL ACCESS METHOD (ISAM)
If any record has to be retrieved based on its index value, then the address of the
data block is fetched and the record is retrieved from the memory.
27
Pros of ISAM:
o In this method, each record has the address of its data block, searching a
record in a huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since
the index is based on the primary key values, we can retrieve the data for
the given range of value. In the same way, the partial value can also be
easily searched, i.e., the student name starting with 'JA' can be easily
searched.
Cons of ISAM
o This method requires extra space in the disk to store the index value.
o When the new records are inserted, then these files have to be reconstructed
to maintain the sequence.
o When the record is deleted, then the space used by it needs to be released.
Otherwise, the performance of the database will slow down.
11. B+ TREE
o B+ tree file organization is the advanced method of an indexed sequential
access method. It uses a tree-like structure to store records in File.
o It uses the same concept of key-index where the primary key is used to sort
the records. For each primary key, the value of the index is generated and
mapped with the record.
o The B+ tree is similar to a binary search tree (BST), but it can have more
than two children. In this method, all the records are stored only at the leaf
node. Intermediate nodes act as a pointer to the leaf nodes. They do not
contain any records.
29
Types of Cluster file organization:
Cluster file organization is of two types:
1. Indexed Clusters:
In indexed cluster, records are grouped based on the cluster key and stored
together. The above EMPLOYEE and DEPARTMENT relationship is an example of
an indexed cluster. Here, all the records are grouped based on the cluster key-
DEP_ID and all the records are grouped.
30
2. Hash Clusters:
It is similar to the indexed cluster. In hash cluster, instead of storing the records
based on the cluster key, we generate the value of the hash key for the cluster key
and store the records with the same hash key value.
Pros of Cluster file organization
o The cluster file organization is used when there is a frequent request for
joining the tables with same joining condition.
o It provides the efficient result when there is a 1:M mapping between the
tables.
Cons of Cluster file organization
o This method has the low performance for the very large database.
o If there is any change in joining condition, then this method cannot use. If
we change the condition of joining then traversing the file takes a lot of time.
31