Unit 4
Unit 4
Database System Concepts - 7th Edition 1.2 ©Silberschatz, Korth and Sudarshan
Anomalies or problems facing without
normalization(problems due to redundancy) : Anomalies
refers to the problems occurred after poorly planned and
unnormalised databases where all the data is stored in one
table which is sometimes called a flat file database.
Let us consider such type of schema –
Database System Concepts - 7th Edition 1.3 ©Silberschatz, Korth and Sudarshan
Here all the data is stored in a single table which causes
redundancy of data or say anomalies as SID and Sname are
repeated once for same CID.
Let us discuss anomalies one by one. Due to redundancy of
data we may get the following problems, those are-
1.insertion anomalies : It may not be possible to store some
information unless some other information is stored as well.
2.redundant storage: some information is stored repeatedly
3.update anomalies: If one copy of redundant data is
updated, then inconsistency is created unless all redundant
copies of data are updated.
4.deletion anomalies: It may not be possible to delete some
information without losing some other information as well.
Database System Concepts - 7th Edition 1.4 ©Silberschatz, Korth and Sudarshan
Removal of Anomalies
These anomalies can be avoided or minimized by designing databases that adhere
to the principles of normalization. Normalization involves organizing data into
tables and applying rules to ensure data is stored in a consistent and efficient
manner. By reducing data redundancy and ensuring data integrity, normalization
helps to eliminate anomalies and improve the overall quality of the database
According to E.F.Codd, who is the inventor of the Relational Database, the goals
of Normalization include:
• It helps in vacating all the repeated data from the database.
• It helps in removing undesirable deletion, insertion, and update anomalies.
• It helps in making a proper and useful relationship between tables.
Database System Concepts - 7th Edition 1.5 ©Silberschatz, Korth and Sudarshan
Disadvantages of Anomalies in Relational Model
• Redundancy: When the same data is stored in various locations, a relational
architecture may cause data redundancy. This can result in inefficiencies and even
inconsistent data.
• Complexity: Establishing and keeping up a relational database calls for specific
knowledge and abilities and can be difficult and time-consuming.
• Performance: Because more tables must be joined in order to access information,
performance may degrade as a database gets larger.
• Incapacity to manage unstructured data: Text documents, videos, and other
forms of semi-structured or unstructured data are not well-suited for the relational
paradigm.
Database System Concepts - 7th Edition 1.6 ©Silberschatz, Korth and Sudarshan
Relational Decomposition
• When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
• In a database, it breaks the table into multiple tables.
• If the relation has no proper decomposition, then it may lead to problems like loss
of information.
• Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
Database System Concepts - 7th Edition 1.7 ©Silberschatz, Korth and Sudarshan
Types of Decomposition
LOSSLESS DECOMPOSITON
LOSSY DECOMPOSITION
Database System Concepts - 7th Edition 1.8 ©Silberschatz, Korth and Sudarshan
Lossless Decomposition
• If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
• The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
• The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.
Database System Concepts - 7th Edition 1.9 ©Silberschatz, Korth and Sudarshan
Example:
EMPLOYEE_DEPARTMENT table:
Database System Concepts - 7th Edition 1.10 ©Silberschatz, Korth and Sudarshan
EMPLOYEE table:
Database System Concepts - 7th Edition 1.11 ©Silberschatz, Korth and Sudarshan
DEPARTMENT table
Database System Concepts - 7th Edition 1.12 ©Silberschatz, Korth and Sudarshan
Employee ⋈ Department
Database System Concepts - 7th Edition 1.13 ©Silberschatz, Korth and Sudarshan
Lossy Decomposition
Lossy Decomposition
As the name suggests, when a relation is decomposed into two or
more relational schemas, the loss of information is unavoidable
when the original relation is retrieved.
Let us see an example −
<EmpInfo>
Database System Concepts - 7th Edition 1.14 ©Silberschatz, Korth and Sudarshan
<EmpDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
Now, you won’t be able to join the above tables,
since Emp_ID isn’t part of the DeptDetails relation.
Database System Concepts - 7th Edition 1.15 ©Silberschatz, Korth and Sudarshan
Dependency Preserving
• It is an important constraint of the database.
• In the dependency preservation, at least one decomposed table must satisfy every
dependency.
• If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
• For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC). The relational R is decomposed into R1(ABC) and
R2(AD) which is dependency preserving because FD A->BC is a part of relation
R1(ABC).
Database System Concepts - 7th Edition 1.16 ©Silberschatz, Korth and Sudarshan
Problem related to decomposition
Here are some common problems related to decomposition in DBMS:
1. Loss of Information:
During the decomposition process, it is possible to lose some information
present in the original table. This occurs when critical relationships or
dependencies between attributes are not preserved properly in the decomposed
relations. As a result, queries might not return the expected results or might
become more complex.
2. Anomalies:
Decomposition can introduce anomalies like update, insertion, and
deletion anomalies. These occur when data inconsistencies arise due to poor
decomposition choices. For example, if a table is decomposed into two smaller
tables, updating information that involves both of these tables may lead to
inconsistencies.
Database System Concepts - 7th Edition 1.17 ©Silberschatz, Korth and Sudarshan
3. Join Complexity:
When a table is decomposed into multiple smaller relations, it often requires
performing joins to retrieve the original data. If the decomposition is not done
efficiently, these joins can become complex and slow down query performance
significantly.
4. Redundancy:
While the goal of decomposition is to eliminate redundancy, it is possible to
introduce new redundancy if the decomposition is not carefully planned.
Redundancy can lead to wasted storage space and can cause update anomalies.
5. Increased Storage Space:
In some cases, decomposition can lead to an increase in storage space. For
example, if the same data needs to be stored in multiple decomposed tables to
maintain referential integrity, it can result in duplication of data.
6. Dependency Preservation:
Decomposition should ideally preserve all the functional dependencies from the
original relation. However, it is not always possible to preserve all dependencies,
and some may be lost during the decomposition process. This can lead to
integrity issues
Database System Concepts - 7th Edition 1.18 ©Silberschatz, Korth and Sudarshan
Functional Dependency
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
Database System Concepts - 7th Edition 1.19 ©Silberschatz, Korth and Sudarshan
From the above table we can conclude some valid functional dependencies:
• roll_no → { name, dept_name, dept_building },→ Here, roll_no can determine
values of fields name, dept_name and dept_building, hence a valid Functional
dependency
• roll_no → dept_name , Since, roll_no can determine whole set of {name,
dept_name, dept_building}, it can determine its subset dept_name also.
• dept_name → dept_building , Dept_name can identify the dept_building
accurately, since departments with different dept_name will also have a different
dept_building
• More valid functional dependencies: roll_no → name, {roll_no, name} ⇢
{dept_name, dept_building}, etc.
• Here are some invalid functional dependencies:
dept_building → dept_name There can be multiple departments in the same
building. Example, in the above table departments ME and EC are in the same
building B2, hence dept_building → dept_name is an invalid functional
dependency.
Database System Concepts - 7th Edition 1.20 ©Silberschatz, Korth and Sudarshan
Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by
the augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name,
dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also
valid by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then roll_no →
dept_building is also valid.
Database System Concepts - 7th Edition 1.21 ©Silberschatz, Korth and Sudarshan
Types of Functional Dependencies
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency
In Trivial Functional Dependency, a dependent is always a subset of the
determinant. i.e. If X → Y and Y is the subset of X, then it is called trivial
functional dependency
Database System Concepts - 7th Edition 1.22 ©Silberschatz, Korth and Sudarshan
Example:
Database System Concepts - 7th Edition 1.23 ©Silberschatz, Korth and Sudarshan
2. Non-trivial Functional Dependency
In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial
functional dependency.
Example:
Database System Concepts - 7th Edition 1.24 ©Silberschatz, Korth and Sudarshan
3. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not
dependent on each other. i.e. If a → {b, c} and there exists no functional
dependency between b and c, then it is called a multivalued functional
dependency.
For example,
Database System Concepts - 7th Edition 1.25 ©Silberschatz, Korth and Sudarshan
4. Transitive Functional Dependency
In transitive functional dependency, dependent is indirectly dependent on
determinant. i.e. If a → b & b → c, then according to axiom of transitivity, a → c.
This is a transitive functional dependency.
For example,
Database System Concepts - 7th Edition 1.26 ©Silberschatz, Korth and Sudarshan
5. Fully Functional Dependency
In full functional dependency an attribute or a set of attributes uniquely determines
another attribute or set of attributes. If a relation R has attributes X, Y, Z with the
dependencies X->Y and X->Z which states that those dependencies are fully
functional.
6. Partial Functional Dependency
In partial functional dependency a non key attribute depends on a part of the
composite key, rather than the whole key. If a relation R has attributes X, Y, Z
where X and Y are the composite key and Z is non key attribute. Then X->Z is a
partial functional dependency in RBDMS.
Database System Concepts - 7th Edition 1.27 ©Silberschatz, Korth and Sudarshan
Advantages of Functional Dependencies
Functional dependencies having numerous applications in the field of database
management system. Here are some applications listed below:
1. Data Normalization
Data normalization is the process of organizing data in a database in order to
minimize redundancy and increase data integrity. Functional dependencies play an
important part in data normalization. With the help of functional dependencies we
are able to identify the primary key, candidate key in a table which in turns helps
in normalization.
2. Query Optimization
With the help of functional dependencies we are able to decide the connectivity
between the tables and the necessary attributes need to be projected to retrieve the
required data from the tables. This helps in query optimization and improves
performance.
Database System Concepts - 7th Edition 1.28 ©Silberschatz, Korth and Sudarshan
3. Consistency of Data
Functional dependencies ensures the consistency of the data by removing any
redundancies or inconsistencies that may exist in the data. Functional dependency
ensures that the changes made in one attribute does not affect inconsistency in
another set of attributes thus it maintains the consistency of the data in database.
4. Data Quality Improvement
Functional dependencies ensure that the data in the database to be accurate,
complete and updated. This helps to improve the overall quality of the data, as
well as it eliminates errors and inaccuracies that might occur during data analysis
and decision making, thus functional dependency helps in improving the quality of
data in database.
Database System Concepts - 7th Edition 1.29 ©Silberschatz, Korth and Sudarshan
Advantages of Functional Dependencies
• Through the identification and removal of redundant or unneeded data, they aid in
the reduction of data redundancy in databases.
• By guaranteeing that data is correct and consistent throughout the database, they
enhance data integrity.
• They make it simpler to add, edit, and remove data, which helps with database
management.
Disadvantages of Functional Dependencies
• The process of identifying functional dependencies can be time-consuming and
complex, especially in large databases with many tables and relationships.
• Overly restrictive functional dependencies can result in slow query performance or
data inconsistencies, as data that should be related may not be properly linked.
• Functional dependencies do not take into account the semantic meaning of data,
and may not always reflect the true relationships between data elements.
Database System Concepts - 7th Edition 1.30 ©Silberschatz, Korth and Sudarshan
Finding Attribute Closure and Candidate Keys using
Functional Dependencies
Steps to Find the Attribute Closure
Q.1. Given the FD set of a Relation R, The attribute closure set S is the set of
Attribute Closure A.
• Add A to S.
• Recursively add attributes that can be functionally determined from attributes of
the set S until done.
Database System Concepts - 7th Edition 1.31 ©Silberschatz, Korth and Sudarshan
Given R (E-ID, E-NAME, E-CITY, E-STATE)
FDs = { E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY-
>E-STATE }
The attribute closure of E-ID can be calculated as:
• Add E-ID to the set {E-ID}
• Add Attributes that can be derived from any attribute of the set. In this case, E-
NAME and E-CITY, E-STATE can be derived from E-ID. So these are also a part
of the closure.
As there is one other attribute remaining in relation to be derived from E-ID. So
the result is:
(E-ID)+ = {E-ID, E-NAME, E-CITY, E-STATE }
Similarly,
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY, E_STATE}
Database System Concepts - 7th Edition 1.32 ©Silberschatz, Korth and Sudarshan
Q.2 Find the attribute closures of given FDs R(ABCDE) = {AB->C, B->D, C-
>E, D->A}. To find (B)+, we will add an attribute in the set using various FDs
which have been shown in the table below.
{B} Triviality
{B, D} B->D
{B, D, A} D->A
{B, D, A, C} AB->C
{B, D, A, C, E} C->E
We can find (C, D)+ by adding C and D into the set (triviality) and then E
using(C->E) and then A using (D->A) and the set becomes.
(C,D)+ = {C,D,E,A}
Concepts
Database System Similarly,
- 7 we
th
can find (B, C)+ by adding
Edition B and C into the set (triviality)
1.33 andKorth
©Silberschatz, thenand Sudarshan
Candidate Key
Candidate Key is a minimal set of attributes of a relationship that can be used to
identify a tuple uniquely. For Example, each tuple of EMPLOYEE relation given
in Table 1 can be uniquely identified by E-ID and it is minimal as well. So it will
be the Candidate key of the relationship.
A candidate key may or may not be a primary key.
Super Key is a set of attributes of a relationship that can be used to identify a
tuple uniquely.
For Example, each tuple of EMPLOYEE relation given in Table 1 can be
uniquely identified by E-ID or (E-ID, E-NAME) or (E-ID, E-CITY) or (E-ID,
E-STATE) or (E_ID, E-NAME, E-STATE), etc. So all of these are super keys of
EMPLOYEE relation.
Finding Candidate Keys and Super Keys of a Relation using FD set.
The set of attributes whose attribute closure is a set of all attributes of the relation
is called the super key of the relation.
For Example, the EMPLOYEE relation shown in Table 1 has the following FD
set. {E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-
STATE}.
Database System Concepts - 7th Edition 1.34 ©Silberschatz, Korth and Sudarshan
Let us calculate the attribute closure of different sets of attributes:
(E-ID)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-NAME)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-CITY)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-ID,E-CITY,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY,E-STATE}
As (E-ID)+, (E-ID, E-NAME)+, (E-ID, E-CITY)+, (E-ID, E-STATE)+, (E-ID, E-
CITY, E-STATE)+ give set of all attributes of relation EMPLOYEE. So all of
these are super keys of relation.
Database System Concepts - 7th Edition 1.35 ©Silberschatz, Korth and Sudarshan
The minimal set of attributes whose attribute closure is a set of all attributes of
relation is called the candidate key of the relation.
As shown above, (E-ID)+ is a set of all attributes of relation and it is minimal.
So E-ID will be the candidate key. On the other hand (E-ID, E-NAME) + also is a
set of all attributes but it is not minimal because its subset (E-ID)+ is equal to the
set of all attributes.
So (E-ID, E-NAME) is not a candidate key.
Database System Concepts - 7th Edition 1.36 ©Silberschatz, Korth and Sudarshan
Advantages of Attribute Closure
• Attribute closures help to identify all possible attributes that can be derived from a
set of given attributes.
• They facilitate database design by identifying relationships between attributes and
tables, which can help to optimize query performance.
• They ensure data consistency by identifying all possible combinations of attributes
that can exist in the database.
Disadvantages of Attribute Closure
• The process of calculating attribute closures can be computationally expensive,
especially for large datasets.
• Attribute closures can become too complex to manage, especially as the number of
attributes and tables in a database grows.
• Attribute closures do not take into account the semantic meaning of data, and may
not always accurately reflect the relationships between data elements.
Database System Concepts - 7th Edition 1.37 ©Silberschatz, Korth and Sudarshan
Prime and Non-Prime Attributes
Attributes which are parts of any candidate key of relation are called as prime
attribute, others are non-prime attributes. For Example, STUD_NO in STUDENT
relation is prime attribute, others are non-prime attribute.
Tools like functional dependency and attribute closure are helpful when designing
and optimizing databases. They are useful for:
• Determine the connections between the tables and the attributes.
• Boost query efficiency
• Ascertain data coherence.
Database System Concepts - 7th Edition 1.38 ©Silberschatz, Korth and Sudarshan
GATE Questions
Q.1: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the
set of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, K
-> {M}, L -> {N} on R. What is the key for R? (GATE-CS-2014)
A. {E, F}
B. {E, F, H}
C. {E, F, H, K, L}
D. {E}
Database System Concepts - 7th Edition 1.39 ©Silberschatz, Korth and Sudarshan
Solution:
Database System Concepts - 7th Edition 1.40 ©Silberschatz, Korth and Sudarshan
In a schema with attributes A, B, C, D and E following set of functional
dependencies are given
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied by the above
set?
A. CD -> AC
B. BD -> CD
C. BC -> CD
D. AC -> BC
Using FD set given in question,
(CD)+ = {CDEAB} which means CD -> AC also holds true.
(BD)+ = {BD} which means BD -> CD can’t hold true. So this
FD is no implied in FD set. So (B) is the required option.
Others can be checked in the same way.
Database System Concepts - 7th Edition 1.41 ©Silberschatz, Korth and Sudarshan
Consider a relation scheme R = (A, B, C, D, E, H) on which the following
functional dependencies hold: {A–>B, BC–> D, E–>C, D–>A}. What are the
candidate keys of R?
(a) AE, BE
(b) AE, BE, DE
(c) AEH, BEH, BCH
(d) AEH, BEH, DEH
(AE)+ = {ABECD} which is not a set of all attributes. So AE is not a candidate
key.
Hence options A and B are wrong.
(AEH)+ = {ABCDEH}
(BEH)+ = {BEHCDA}
(BCH)+ = {BCHDA} which is not set of all attributes.
So BCH is not a candidate key. Hence option C is wrong.
So correct answer is D.
Database System Concepts - 7th Edition 1.42 ©Silberschatz, Korth and Sudarshan
Normalization
A large database defined as a single relation may result in data duplication. This
repetition of data may result in:
• Making relations very large.
• It isn't easy to maintain and update data as it would involve searching many
records in relation.
• Wastage and poor utilization of disk space and resources.
• The likelihood of errors and inconsistencies increases.
So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that are satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.
Database System Concepts - 7th Edition 1.43 ©Silberschatz, Korth and Sudarshan
What is Normalization?
Database System Concepts - 7th Edition 1.44 ©Silberschatz, Korth and Sudarshan
Why do we need Normalization?
The main reason for normalizing the relations is removing these anomalies. Failure
to eliminate anomalies leads to data redundancy and can cause data integrity and
other problems as the database grows. Normalization consists of a series of
guidelines that helps to guide you in creating a good database structure.
Data modification anomalies can be categorized into three types:
• Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new
tuple into a relationship due to lack of data.
• Deletion Anomaly: The delete anomaly refers to the situation where the deletion
of data results in the unintended loss of some other important data.
• Updatation Anomaly: The update anomaly is when an update of a single data
value requires multiple rows of data to be updated.
Database System Concepts - 7th Edition 1.45 ©Silberschatz, Korth and Sudarshan
Types of Normal Forms:
Normalization works through a series of stages called Normal forms. The normal
forms apply to individual relations. The relation is said to be in particular normal
form if it satisfies constraints.
Following are the various types of Normal forms:
Database System Concepts - 7th Edition 1.46 ©Silberschatz, Korth and Sudarshan
Normal Form Description
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent
on the primary key.
4NF A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued
dependency.
5NF A relation is in 5NF. If it is in 4NF and does not contain any join dependency, joining should be
lossless.
Database System Concepts - 7th Edition 1.47 ©Silberschatz, Korth and Sudarshan
Advantages of Normalization
• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
Disadvantages of Normalization
• You cannot start building the database before knowing what the user needs.
• The performance degrades when normalizing the relations to higher normal forms,
i.e., 4NF, 5NF.
• It is very time-consuming and difficult to normalize relations of a higher degree.
• Careless decomposition may lead to a bad database design, leading to serious
problems.
Database System Concepts - 7th Edition 1.48 ©Silberschatz, Korth and Sudarshan
First Normal Form (1NF)
Database System Concepts - 7th Edition 1.49 ©Silberschatz, Korth and Sudarshan
The decomposition of the EMPLOYEE table into 1NF has been shown below:
Database System Concepts - 7th Edition 1.50 ©Silberschatz, Korth and Sudarshan
Second Normal Form (2NF)
• In the 2NF, relational must be in 1NF.
• In the second normal form, all non-key attributes are fully functional dependent on
the primary key
Example: Let's assume, a school can store the data of teachers and the subjects
they teach. In a school, a teacher can teach more than one subject.
TEACHER table
Database System Concepts - 7th Edition 1.51 ©Silberschatz, Korth and Sudarshan
In the given table, non-prime attribute TEACHER_AGE is dependent on
TEACHER_ID which is a proper subset of a candidate key. That's why it violates
the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
Database System Concepts - 7th Edition 1.52 ©Silberschatz, Korth and Sudarshan
Third Normal Form (3NF)
• A relation will be in 3NF if it is in 2NF and not contain any transitive partial
dependency.
• 3NF is used to reduce the data duplication. It is also used to achieve the data
integrity.
• If there is no transitive dependency for non-prime attributes, then the relation must
be in third normal form.
A relation is in third normal form if it holds atleast one of the following conditions
for every non-trivial function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Database System Concepts - 7th Edition 1.53 ©Silberschatz, Korth and Sudarshan
Example:
EMPLOYEE_DETAIL table:
Database System Concepts - 7th Edition 1.54 ©Silberschatz, Korth and Sudarshan
Non-prime attributes: In the given table, all attributes except EMP_ID are non-
prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP
dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY)
transitively dependent on super key(EMP_ID). It violates the rule of third normal
form.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
Database System Concepts - 7th Edition 1.55 ©Silberschatz, Korth and Sudarshan
EMPLOYEE_ZIP table:
Database System Concepts - 7th Edition 1.56 ©Silberschatz, Korth and Sudarshan
Boyce Codd normal form (BCNF)
Database System Concepts - 7th Edition 1.57 ©Silberschatz, Korth and Sudarshan
In the above table Functional dependencies are as follows:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are
keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
Database System Concepts - 7th Edition 1.58 ©Silberschatz, Korth and Sudarshan
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Database System Concepts - 7th Edition 1.59 ©Silberschatz, Korth and Sudarshan
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is
a key.
Database System Concepts - 7th Edition 1.60 ©Silberschatz, Korth and Sudarshan
Fourth normal form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-
valued dependency.
• For a dependency A → B, if for a single value of A, multiple values of B exists,
then the relation will be a multi-valued dependency.
Example
STUDENT
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two
independent entity. Hence, there is no relationship between COURSE and HOBBY.
Database System Concepts - 7th Edition 1.61 ©Silberschatz, Korth and Sudarshan
In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is
a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of
data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
Database System Concepts - 7th Edition 1.62 ©Silberschatz, Korth and Sudarshan
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Database System Concepts - 7th Edition 1.63 ©Silberschatz, Korth and Sudarshan
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF and not contains any join dependency and
joining should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
Example
Database System Concepts - 7th Edition 1.64 ©Silberschatz, Korth and Sudarshan
In the above table, John takes both Computer and Math class for Semester 1 but he
doesn't take Math class for Semester 2. In this case, combination of all these fields
required to identify a valid data.
Suppose we add a new Semester as Semester 3 but do not know about the subject
and who will be taking that subject so we leave Lecturer and Subject as NULL.
But all three columns together acts as a primary key, so we can't leave other two
columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1,
P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
Database System Concepts - 7th Edition 1.65 ©Silberschatz, Korth and Sudarshan
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMSTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Database System Concepts - 7th Edition 1.66 ©Silberschatz, Korth and Sudarshan
Multi valued Dependencies
Database System Concepts - 7th Edition 1.67 ©Silberschatz, Korth and Sudarshan
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL
and independent of each other.
In this case, these two columns can be called as multivalued dependent on
BIKE_MODEL. The representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and
"BIKE_MODEL multidetermined COLOR".
Database System Concepts - 7th Edition 1.68 ©Silberschatz, Korth and Sudarshan
Join Dependency
Database System Concepts - 7th Edition 1.69 ©Silberschatz, Korth and Sudarshan
Dependency Preservation: A Decomposition D = { R1, R2,
R3…Rn } of R is dependency preserving wrt a set F of
Functional dependency if
Database System Concepts - 7th Edition 1.70 ©Silberschatz, Korth and Sudarshan
Problem:
Let a relation R (A, B, C, D ) and functional dependency {AB –> C, C –> D, D –>
A}. Relation R is decomposed into R1( A, B, C) and R2(C, D). Check whether
decomposition is dependency preserving or not.
R1(A, B, C) and R2(C, D)
closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, A, D} but D can't be in closure as D is not
present R1.
= {C, A}
C--> A // Removing C from right side as it is trivial attribute
Database System Concepts - 7th Edition 1.71 ©Silberschatz, Korth and Sudarshan
closure(AB) = {A, B, C, D}
= {A, B, C}
AB --> C // Removing AB from right side as these are trivial
attributes
closure(BC) = {B, C, D, A}
= {A, B, C}
BC --> A // Removing BC from right side as these are trivial
attributes
closure(AC) = {A, C, D}
NULL SET
F1 {C--> A, AB --> C, BC --> A}.
Similarly F2 { C--> D }