PART-I: Relational Data Base Design: Functional
Dependencies & Normalization for Relational
Databases
Functional Dependencies
A functional dependency is a constraint that specifies the relationship between
two sets of attributes where one set can accurately determine the value of other
sets. It is denoted as X → Y, where X is a set of attributes that is capable of
determining the value of Y. The attribute set on the left side of the arrow, X is
called Determinant, while on the right side, Y is called the Dependent.
Types of Functional Dependencies in DBMS
There are mainly four types of Functional Dependency in DBMS. Following
are the types of Functional Dependencies in DBMS:
Multivalued Dependency
Trivial Functional Dependency
Non-Trivial Functional Dependency
Transitive Dependency
Multivalued Dependency in DBMS
Multivalued dependency occurs in the situation where there are multiple
independent multivalued attributes in a single table. A multivalued
dependency is a complete constraint between two sets of attributes in a
relation. It requires that certain tuples be present in a relation. Consider the
following Multivalued Dependency Example to understand.
Example:
Car_model Maf_year Color
H001 2017 Metallic
H001 2017 Green
H005 2018 Metallic
H005 2018 Blue
H010 2015 Metallic
H033 2012 Gray
In this example, maf_year and color are independent of each other but
dependent on car_model. In this example, these two columns are said to be
multivalue dependent on car_model.
This dependence can be represented like this:
car_model -> maf_year
car_model-> colour
Trivial Functional Dependency in DBMS
The Trivial dependency is a set of attributes which are called a trivial if the set
of attributes are included in that attribute.
So, X -> Y is a trivial functional dependency if Y is a subset of X. Let’s
understand with a Trivial Functional Dependency Example.
For example:
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Consider this table of with two columns Emp_id and Emp_name.
{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id
is a subset of {Emp_id,Emp_name}.
Non Trivial Functional Dependency in DBMS
Functional dependency which also known as a nontrivial dependency occurs
when A->B holds true where B is not a subset of A. In a relationship, if
attribute B is not a subset of attribute A, then it is considered as a non-trivial
dependency.
Company CEO Age
Microsoft Satya Nadell 51
Google Sundar Pichai 46
Apple Tim Cook 57
Example:
(Company} -> {CEO} (if we know the Company, we knows the CEO name)
But CEO is not a subset of Company, and hence it’s non-trivial functional
dependency.
Transitive Dependency in DBMS
A Transitive Dependency is a type of functional dependency which happens
when “t” is indirectly formed by two functional dependencies. Let’s understand
with the following Transitive Dependency Example.
Example:
Company CEO Age
Microsoft Satya Nadella 51
Google Sundar Pichai 46
Alibaba Jack Ma 54
{Company} -> {CEO} (if we know the compay, we know its CEO’s name)
{CEO } -> {Age} If we know the CEO, we know the Age
Therefore according to the rule of rule of transitive dependency:
{ Company} -> {Age} should hold, that makes sense because if we know the
company name, we can know his age.
Note: You need to remember that transitive dependency can only occur in a
relation of three or more attributes.
Advantages of Functional Dependency
Functional Dependency avoids data redundancy. Therefore same data
do not repeat at multiple locations in that database
It helps you to maintain the quality of data in the database
It helps you to defined meanings and constraints of databases
It helps you to identify bad designs
It helps you to find the facts regarding the database design
Normalization
A large database defined as a single relation may result in data duplication.
This repetition of data may result in:
o Making relations very large.
o It isn't easy to maintain and update data as it would involve searching many
records in relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the
relations with redundant data into smaller, simpler, and well-structured
relations that are satisfy desirable properties. Normalization is a process of
decomposing the relations into relations with fewer attributes.
What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or
set of relations. It is also used to eliminate undesirable characteristics
like Insertion, Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using
relationships.
o The normal form is used to reduce redundancy from the database
table.
Why do we need Normalization?
The main reason for normalizing the relations is removing these
anomalies. Failure to eliminate anomalies leads to data redundancy
and can cause data integrity and other problems as the database
grows. Normalization consists of a series of guidelines that helps to
guide you in creating a good database structure.
Data modification anomalies can be categorized into three types:
o Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a
new tuple into a relationship due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the
deletion of data results in the unintended loss of some other important data.
o Updatation Anomaly: The update anomaly is when an update of a single
data value requires multiple rows of data to be updated.
Types of Normal Forms:
Normalization works through a series of stages called Normal forms. The
normal forms apply to individual relations. The relation is said to be in
particular normal form if it satisfies constraints.
Following are the various types of Normal forms:
Normal Forms in DBMS
Normalization is the process of minimizing redundancy from a relation or set
of relations. Redundancy in relation may cause insertion, deletion, and update
anomalies. So, it helps to minimize the redundancy in relations. Normal
forms are used to eliminate or reduce redundancy in database tables.
1. First Normal Form –
If a relation contain composite or multi-valued attribute, it violates first normal
form or a relation is in first normal form if it does not contain any composite or
multi-valued attribute. A relation is in first normal form if every attribute in that
relation is singled valued attribute.
Example 1 – Relation STUDENT in table 1 is not in 1NF because of multi-
valued attribute STUD_PHONE. Its decomposition into 1NF has been shown in
table 2.
Example 2 –
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M C2, c3
In the above table Course is a multi-valued attribute so it is not in 1NF.
Below Table is in 1NF as there is no multi-valued attribute
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
2. Second Normal Form –
To be in second normal form, a relation must be in first normal form and relation
must not contain any partial dependency. A relation is in 2NF if it has No Partial
Dependency, i.e., no non-prime attribute (attributes which are not part of any
candidate key) is dependent on any proper subset of any candidate key of the
table.
Partial Dependency – If the proper subset of candidate key determines non-
prime attribute, it is called partial dependency.
Example 1 – Consider table-3 as following below.
STUD_NO COURSE_NO COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
{Note that, there are many courses having the same course fee.}
Here,
COURSE_FEE cannot alone decide the value of COURSE_NO or
STUD_NO;
COURSE_FEE together with STUD_NO cannot decide the value of
COURSE_NO;
COURSE_FEE together with COURSE_NO cannot decide the value of
STUD_NO;
Hence,
COURSE_FEE would be a non-prime attribute, as it does not belong to the
one only candidate key {STUD_NO, COURSE_NO} ;
But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on
COURSE_NO, which is a proper subset of the candidate key. Non-prime
attribute COURSE_FEE is dependent on a proper subset of the candidate
key, which is a partial dependency and so this relation is not in 2NF.
To convert the above relation to 2NF,
we need to split the table into two tables such as :
Table 1: STUD_NO, COURSE_NO
Table 2: COURSE_NO, COURSE_FEE
Table 1 Table 2
STUD_NO COURSE_NO COURSE_NO
COURSE_FEE
1 C1 C1
1000
2 C2 C2
1500
1 C4 C3
1000
4 C3 C4
2000
4 C1 C5
2000
NOTE: 2NF tries to reduce the redundant data getting stored in memory. For
instance, if there are 100 students taking C1 course, we don’t need to store its
Fee as 1000 for all the 100 records, instead, once we can store it in the second
table as the course fee for C1 is 1000.
3. Third Normal Form –
A relation is in third normal form, if there is no transitive dependency for non-
prime attributes as well as it is in second normal form.
A relation is in 3NF if at least one of the following condition holds in every
non-trivial function dependency X –> Y
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
Transitive dependency – If A->B and B->C are two FDs then A->C is called
transitive dependency.
Example 1 – In relation STUDENT given in Table 4,
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE,
STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
Candidate Key: {STUD_NO}
For this relation in table 4, STUD_NO -> STUD_STATE and
STUD_STATE -> STUD_COUNTRY are true. So STUD_COUNTRY is
transitively dependent on STUD_NO. It violates the third normal form.
To convert it in third normal form, we will decompose the relation
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_COUNTRY_STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE,
STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
Example 2 – Consider relation R(A, B, C, D, E)
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in above relation are {A, E, CD, BC} All
attributes are on right sides of all functional dependencies are prime.
4. Boyce-Codd Normal Form (BCNF) –
A relation R is in BCNF if R is in Third Normal Form and for every FD,
LHS is super key. A relation is in BCNF iff in every non-trivial functional
dependency X –> Y, X is a super key.
Example 1 – Find the highest normal form of a relation
R(A,B,C,D,E) with FD set as {BC->D, AC->BE, B->E}
Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its
subset can determine all attribute of relation, So AC will be
candidate key. A or C can’t be derived from any other attribute
of the relation, so there will be only 1 candidate key {AC}.
Step 2. Prime attributes are those attributes that are part of
candidate key {A, C} in this example and others will be non-
prime {B, D, E} in this example.
Step 3. The relation R is in 1st normal form as a relational
DBMS does not allow multi-valued or composite attribute.
The relation is in 2nd normal form because BC->D is in 2nd
normal form (BC is not a proper subset of candidate key AC)
and AC->BE is in 2nd normal form (AC is candidate key) and B-
>E is in 2nd normal form (B is not a proper subset of candidate
key AC).
The relation is not in 3rd normal form because in BC->D (neither
BC is a super key nor D is a prime attribute) and in B->E
(neither B is a super key nor E is a prime attribute) but to satisfy
3rd normal for, either LHS of an FD should be super key or RHS
should be prime attribute.
So the highest normal form of relation will be 2nd Normal form.
Example 2 –For example consider relation R(A, B, C)
A -> BC,
B ->
A and B both are super keys so above relation is in BCNF.
Key Points –
BCNF is free from redundancy.
If a relation is in BCNF, then 3NF is also satisfied.
If all attributes of relation are prime attribute, then the relation is
always in 3NF.
A relation in a Relational Database is always and at least in 1NF form.
Every Binary Relation ( a Relation with only 2 attributes ) is always in
BCNF.
If a Relation has only singleton candidate keys( i.e. every candidate
key consists of only 1 attribute), then the Relation is always in
2NF( because no Partial functional dependency possible).
Sometimes going for BCNF form may not preserve functional
dependency. In that case go for BCNF only if the lost FD(s) is not
required, else normalize till 3NF only.
There are many more Normal forms that exist after BCNF, like 4NF
and more. But in real world database systems it’s generally not
required to go beyond BCNF.
Exercise 1: Find the highest normal form in R (A, B, C, D, E) under
following functional dependencies.
ABC --> D
CD --> AE
Important Points for solving above type of question.
1) It is always a good idea to start checking from BCNF, then 3 NF, and
so on.
2) If any functional dependency satisfied a normal form then there is no
need to check for lower normal form. For example, ABC –> D is in BCNF
(Note that ABC is a superkey), so no need to check this dependency for
lower normal forms.
Candidate keys in the given relation are {ABC, BCD}
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a super
key so this dependency is not in BCNF. So, R is not in BCNF.
3NF: ABC -> D we don’t need to check for this dependency as it already
satisfied BCNF. Let us consider CD -> AE. Since E is not a prime
attribute, so the relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD is a proper
subset of a candidate key and it determines E, which is non-prime
attribute. So, given relation is also not in 2 NF. So, the highest normal
form is 1 NF.