Normalization
Introduction
• Normalization is a systematic approach to organize data
in a database to eliminate redundancy, avoid anomalies
and ensure data consistency.
• The process involves breaking down large tables into
smaller, well-structured ones and defining relationships
between them.
• This not only reduces the chances of storing duplicate
data but also improves the overall efficiency of the
database.
A relation in BCNF is also in 3NF , a relation in
3NF is also in 2NF and a relation in 2NF is also in
1NF.
A relation in BCNF is considered fully normalized.
First Normal Form (1NF)
• This is the most basic level of normalization.
• In 1NF, each table cell should contain only a
single value (indivisible) , and each column
should have a unique name.
• The first normal form helps to eliminate
duplicate data and simplify queries.
Rules for First Normal Form (1NF)
1. Every Column Should Have Single Values
2. All Values in a Column Should Be of the Same
Type
3. Every Column Must Have a Unique Name
4. The Order of Data Doesn’t Matter
Example for 1NF
In the above table, Courses has a multi-valued attribute, so
it is not in 1NF. The Below Table is in 1NF as there is no
multi-valued attribute.
After 1NF
Second Normal Form (2NF)
• A relation is in 2NF if it is in 1NF
• Every non-prime attribute must be fully
dependent on each candidate key.
• Table should not have partial dependencies.
• A functional dependency X->Y (where X and Y
are set of attributes) is said to be in
partial dependency, if Y can be determined by
any proper subset of X.
Example for 2NF
•In Student_Project relation the prime key attributes are Stu_ID
and Proj_ID.
•Rule: Non-key attributes, i.e. Stu_Name and Proj_Name must be
dependent upon both and not on any of the prime key attribute
individually.
•Here,
• Stu_Name can be identified by Stu_ID and
• Proj_Name can be identified by Proj_ID independently.
•This is called partial dependency, which is not allowed in Second
Normal Form.
After 2NF
•Relation is broken as shown in picture.
•So there is no partial dependency.
Third Normal Form (3NF)
• For a relation to be in Third Normal Form, it must
be in Second Normal form and
• No non-prime attribute is transitively dependent
on prime key attribute.
• For any non-trivial functional dependency, X → A,
then either −
– X is a superkey or,
– A is prime attribute.
Example for 3NF
•In the above Student_detail relation, Stu_ID is the key and
only prime key attribute.
•We find that City can be identified by Stu_ID as well as Zip
itself.
•Neither Zip is a superkey nor is City a prime attribute.
•Additionally, Stu_ID → Zip → City, so there exists transitive
dependency.
After 3NF
Boyce-Codd Normal Form (3.5 NF)
• An extension of Third Normal Form
• Rule 1:
• It is in 3NF
• Rule 2: X should be a super-key for every
functional dependency (FD) X−>Y in a given
relation.
Example for BCNF (3.5NF)
• Consider a relation R with attributes (student, teacher, subject).
FD: { (student, Teacher) -> subject, (student, subject) -> Teacher,
(Teacher) -> subject}
Candidate keys are (student, teacher) and (student, subject).
The above relation is in 3NF (since there is no transitive
dependency). A relation R is in BCNF if for every non-trivial FD X-
>Y, X must be a key.
• FD: { (student, Teacher) -> subject, (student, subject) ->
Teacher, (Teacher) -> subject}
Candidate keys are (student, teacher) and (student, subject).
• The above relation is in 3NF (since there is no transitive
dependency).
• A relation R is in BCNF if for every non-trivial FD X->Y, X must
be a key.
• The above relation is not in BCNF, because in the FD (teacher-
>subject), teacher is not a key. This relation suffers with
anomalies −
• For example, if we delete the student Tahira , we will also lose
the information that N.Gupta teaches C. This issue occurs
because the teacher is a determinant but not a candidate key.
After 3.5 NF
R is divided into two relations R1(Teacher,
Subject) and R2(Student, Teacher).
4TH AND 5TH NORMAL FORM IN DBMS
4NF & 5NF
• Two of the highest levels of database
normalization are
– fourth normal form (4NF) and
– fifth normal form (5NF).
– Multivalued dependencies are handled by 4NF,
– join dependencies are handled by 5NF.
Multivalued Dependency
• A multivalued dependency always requires at
least three attributes because it consists of at
least two attributes that are dependent on a
third.
Multivalued Dependency Example
Person Mobile Food_Likes
Mahesh 9893/9424 Burger/Pizza
Ramesh 9191 Pizza
FD:
Person->-> mobile,
Person ->-> food_likes
This is read as “person multi determines mobile” and
“person multi determines food_likes.”
Fourth Normal Form (4NF)
• Rule
• 1. It should be in the Boyce-Codd Normal
Form (BCNF).
• 2. The table should not have any Multi-valued
Dependency.
Example
Join Dependency
• Join decomposition is a further generalization
of Multivalued dependencies. If the join of R1
and R2 over C is equal to relation R then we
can say that a join dependency (JD) exists,
where R1 and R2 are the decomposition R1(A,
B, C) and R2(C, D) of a given relations R (A, B,
C, D). Alternatively, R1 and R2 are a lossless
decomposition of R.
Fifth Normal Form (5NF) / Projected
Normal Form (PJNF)
• Rule:
• 1. R should be already in 4NF.
2. It cannot be further non loss decomposed
(join dependency).
Example for 5NF
• The relation ACP is
decomposed into 3
relations.
Example for 5NF
• The result of the Natural Join of R1 and R3 over ‘Company’
and then the Natural Join of R13 and R2 over ‘Agent’ and
‘Product’ will be Table ACP.
• Hence, in this example, all the redundancies are eliminated,
and the decomposition of ACP is a lossless join
decomposition. Therefore, the relation is in 5NF as it does
not violate the property of lossless join.
References
• BCNF 3.5
– https://www.geeksforgeeks.org/boyce-codd-norm
al-form-bcnf/