0% found this document useful (0 votes)
17 views42 pages

DBMS Unit - 4 Notes

The document provides an overview of database normalization and schema refinement, detailing its purpose in eliminating data redundancy and minimizing anomalies such as insertion, update, and deletion issues. It explains concepts like functional dependency, various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF), and the importance of keys in database management systems. Additionally, it outlines types of functional dependencies and keys, emphasizing their roles in maintaining data integrity and relationships within databases.

Uploaded by

kdsiddu7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views42 pages

DBMS Unit - 4 Notes

The document provides an overview of database normalization and schema refinement, detailing its purpose in eliminating data redundancy and minimizing anomalies such as insertion, update, and deletion issues. It explains concepts like functional dependency, various normal forms (1NF, 2NF, 3NF, BCNF, 4NF, and 5NF), and the importance of keys in database management systems. Additionally, it outlines types of functional dependencies and keys, emphasizing their roles in maintaining data integrity and relationships within databases.

Uploaded by

kdsiddu7
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 42

2025

DATABASE MANAGEMENT SYSTEMS


DBMS - UNIT 4 NOTES

FACULTY
SURESH
suresh.mentor@gmail.com
lOMoARcPSD|45012165

DATABASE MANAGEMENT SYSTEMS


UNIT-IV
Schema Refinement (Normalization): Purpose of Normalization or schema refinement,
concept of functional dependency, normal forms based on functional dependency(1NF, 2NF
and 3 NF), concept of surrogate key, Boyce-codd normal form(BCNF), Lossless join and
dependency preserving decomposition, Fourth normal form(4NF), Fifth Normal Form (5NF).

INTRODUCTION
 The Schema Refinement refers to refine the schema by using some technique.
 The best technique of schema refinement is decomposition.
 Normalization or Schema Refinement is a technique of organizing the data
in the database.
 It is a systematic approach of decomposing tables to eliminate data
redundancy and undesirable characteristics like Insertion, Update and
Deletion Anomalies.
 Redundancy refers to repetition of same data or duplicate copies of same data
stored in different locations.
 Anomalies: Anomalies refers to the problems occurred after poorly planned
and normalized databases where all the data is stored in one table which is
sometimes called a flat file.
 By using Decomposition you can eliminate the data redundancy.

PURPOSE OF NORMALIZATION OR SCHEMA REFINEMENT


 Database Normalization is a technique of organizing the data in the database.
 Normalization is a systematic approach of decomposing tables to eliminate
data redundancy and undesirable characteristics like Insertion, Update and
Deletion Anomalies.
Page 1

suresh.mentor@gmail.com
lOMoARcPSD|45012165

 It is a multi-step process that puts data into tabular form by removing


duplicated data from the relation tables.
 If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator.
 Managing a database with anomalies is next to impossible.

Normalization is used for mainly two purpose,


1. Eliminating redundant (useless) data.
2. Minimize data modification errors, and simplify the query process.

Anomalies or problems facing without normalization (problems due to


redundancy):
 Without Normalization, it becomes difficult to handle and update the
database, without facing data loss.
 Insertion, Updation and Deletion Anamolies are very frequent if Database is
not Normalized.
Type of Anomalies in DBMS

1. Insert Anomaly
2. Update Anomaly
3. Delete Anomaly

1. Insert Anomaly
 The term "insertion anomaly" is used to describe when a new row is added to a
table and it causes an inconsistency.
 If a tuple is inserted in referencing relation and referencing attribute value is
not present in referenced attribute, it will not allow inserting in referencing
relation.

Page 2

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example:

E_Id E_Name E_Address E_dept

101 Rick Delhi D001


101 Rick Delhi D002
123 Maggie Agra D890
166 Glenn Chennai D900
166 Glenn Chennai D004

Assume that a new employee is joining the company under training and not
assigned to any department. Then, we would not insert the data into the table if the
emp_dept field doesn't allow nulls.

2. Update anomaly:
 If there are some changes in the database, we have to apply that change in all
the rows. And if we miss any row, we will have one more field, creating an
update anomaly in the database.

Example
 In the given table, we have two rows for an employee named Rick, and he
belongs to two different departments of the company. If we need to update
Rick's address, we must update the same address in two rows. Otherwise, the
data will become inconsistent.

 If, in some way, we can update the correct address in one department but not
the other, then according to the database, Rick will have two different
addresses, which is not correct and would lead to inconsistent data.

Page 3

suresh.mentor@gmail.com
lOMoARcPSD|45012165

3. Delete anomaly:

 The term "deletion anomaly in the database" is used when we delete some
rows from a table and any necessary additional information or data is also lost
from the database.
Example
 Assume that if the company closes the department D890, then deleting the
rows that have emp_dept as D890 would also delete the information of
employee Maggie since she is assigned only to this department.

CONCEPT OF FUNCTIONAL DEPENDENCY


 A functional dependency is a “constraint that specifies the relationship
between two sets of attributes where one set can accurately determine the
value of other sets”.
 Functional Dependency helps to maintain the quality of data in the database.
 Functional Dependency is represented by -> (arrow sign)
 It is denoted as X → Y, where X is a set of attributes that is capable of
determining the value of Y.
 The attribute set on the left side of the arrow, X is called Determinant, while
on the right side, Y is called the Dependent.

FUNCTIONAL DEPENDENCY
X→Y
Y is functionally dependent on X

X is Determinant set

Y is Dependent Attribute.

Page 4

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example:
We have a <Department> table with two attributes − DeptId and DeptName.
DeptId = Department ID
DeptName = Department Name

DeptId DeptName
001 Finance
002 Marketing
003 HR

 The DeptId is our primary key. Here, DeptId uniquely identifies


the DeptName attribute.
 This is because if you want to know the department name, then at first you
need to have the DeptId.
 Therefore, the above functional dependency
between DeptId and DeptName can be determined as DeptId is functionally
dependent on DeptName –

DeptId → DeptName

Key terms
S.no Key Terms Description
1 Decomposition The term decomposition refers to the process in
which we break down a table in a database into
various elements or parts.
2 Dependent It is displayed on the right side of the functional
dependency diagram.
3 Determinant It is displayed on the left side of the functional
dependency Diagram.
4 Union It suggests that if two tables are separate, and the
PK is the same, you should consider putting them
together

Page 5

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Rules of Functional Dependencies


1. Reflexive rule − If alpha is a set of attributes and beta is_subset_of alpha,
then alpha holds beta.
2. Augmentation rule − If a → b holds and y is attribute set, then ay → by also
holds. That is adding attributes in dependencies, does not change the basic
dependencies.
3. Transitivity rule − Same as transitive rule in algebra, if a → b holds and b →
c holds, then a → c also holds. a → b is called as a functionally that
determines b.
4. Union: If X→Y and X→Z are two functional dependencies then, X→YZ is also
a functional dependency.
5. Decomposition: If X→YZ is a functional dependency then X→Y and X→Z are
also functional dependencies.

CLOSURE SET OF A FUNCTIONAL DEPENDENCY (F+ )


 Closure of a functional dependency in database design is the process of
finding all the functional dependencies that can be derived from the original
set of functional dependencies, using the properties of functional
dependencies.
 The closure of functional dependencies is important in normalization, as it
helps identify redundant data and improve the efficiency of database
operations..
Steps to Calculate the Closure of Functional Dependency

The steps to calculate the closure of functional dependencies are as follows:

Step 1: Start with the original set of functional dependencies.


Step 2: Repeat the following steps until there are no new functional dependencies to
be added

Page 6

suresh.mentor@gmail.com
lOMoARcPSD|45012165

 For each functional dependency in the set, check if its right-hand side can be
determined using the left-hand side and the functional dependencies in the
set.
 If so, add the functional dependency to the set.

Step 3: The final set of functional dependencies is the closure of the original set.

Example:
Let A, B, C and D be attributes and the functional dependencies are as follows:
A -> B
B -> CD
The closure of this set of functional dependencies is {A -> B, B -> CD, A -> CD}.

Closure of an Attribute Set(x+)-


 The set of all those attributes which can be functionally determined from an
attribute set is called as a closure of that attribute set.

 Closure of attribute set {X} is denoted as {X}+

Steps to Find Closure of an Attribute Set


Following steps are followed to find the closure of an attribute set-

Step-1: Add the attributes contained in the attribute set for which closure is being
calculated to the result set.

Step 2: Recursively add the attributes to the result set which can be functionally
determined from the attributes already contained in the result set.

Page 7

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example-

Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-


A → BC
BC → DE
D→F
CF → G
Now, let us find the closure of some attributes and attribute sets-
Closure of attribute A-

A+ = { A }
={A,B,C} ( Using A → BC )
={A,B,C,D,E} ( Using BC → DE )
={A,B,C,D,E,F} ( Using D → F )
={A,B,C,D,E,F,G} ( Using CF → G )

Types of Functional Dependency


Functional Dependency has three forms −

 Trivial Functional Dependency


 Non-Trivial Functional Dependency
 Multivalued dependency
 Transitive dependency
1. Trivial Functional Dependency
 In Trivial Functional Dependency, a dependent is always a subset of the
determinant. i.e. If X → Y and Y is the subset of X, then it is called trivial
functional dependency.

Page 8

suresh.mentor@gmail.com
lOMoARcPSD|45012165

For example,

Emp_id Emp_name
E_001 Raju
E_002 Srinu
E_003 Vikas

Consider above table table having two attributes Emp_id and Emp_name.

{Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is


a subset of {Emp_id,Emp_name}.

2. Non-trivial Functional Dependency


 In Non-trivial functional dependency, the dependent is strictly not a subset
of the determinant. i.e. If X → Y and Y is not a subset of X, then it is called
Non-trivial functional dependency.

Example

EMP_ID EMP_NAME EMP_ADDRESS


EMP_001 Ravi Guntur
EMP_002 Rajesh Vijayawada
EMP_003 Srinu Hyderabad

Consider above table table having three attributes Emp_id and


Emp_name,Emp_address.

Then the following functional dependencies are non-trival

emp_id->emp_name(emp_name is not a subset of emp_id)

emp_id->emp_address(emp_address is not a subset of emp_id

Page 9

suresh.mentor@gmail.com
lOMoARcPSD|45012165

3. Multi-valued dependency

 In Multi-valued functional dependency, entities of the dependent set are not


dependent on each other. i.e. If a → {b, c} and there exists no functional
dependency between b and c, then it is called a multi-valued functional
dependency.

Example:

Car_model Maf_year Color


H001 2017 Metallic
H001 2017 Green
H005 2018 Metallic
H005 2018 Blue
H010 2015 Metallic
H033 2012 Gray

In this example, maf_year and color are independent of each other but dependent on
car_model.

In this example, these two columns are said to be multivalue dependent on


car_model.

This dependence can be represented like this:

car_model -> maf_year

car_model-> colour

Page 10

suresh.mentor@gmail.com
lOMoARcPSD|45012165

4. Transitive dependency
 In transitive functional dependency, dependent is indirectly dependent on
determinant. i.e. If a → b & b → c, then according to axiom of transitivity,
a → c. This is a transitive functional dependency.

Example:

STATE CITY ZIPCODE


AP Guntur 522001
TS Nalgonda 508001
KN Bangalore 560007

Consider three columns State(S), City(C), and Zip code (Z)

City(C) is dependent on State(S)

Zip code (Z) is dependent on City(C)

So indirectly Zip code (Z) depends on State(S).

Advantages of Functional Dependency

 Functional Dependency avoids data redundancy. Therefore same data do not


repeat at multiple locations in that database
 It helps you to maintain the quality of data in the database
 It helps you to defined meanings and constraints of databases
 It helps you to identify bad designs

Page 11

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Key’s in DBMS

 A key is a Constraint
 A key in DBMS is an attribute or a set of attributes that help to uniquely
identify a tuple (or row) in a relation (or table).
 Key’s play an important role in the relational database.
 Keys are also used to establish relationships between the different tables and
columns of a relational database.
 Each key having its own functionality.

Types of Key’s in DBMS

There are mainly eight different types of Keys in DBMS and each key has it’s
different functionality:

1. Primary Key
2. Foreign Key
3. Super Key
4. Candidate Key
5. Composite Key
6. Compound Key
7. Alternate Key
8. Surrogate Key

1. Primary Key

 Primary key is a column or group of columns in a table that uniquely identify


every row in that table
 A Primary key is an attribute that can uniquely identify a table.
 A Table can have only one primary key.

Page 12

suresh.mentor@gmail.com
lOMoARcPSD|45012165

 The PK (PRIMARY KEY) constraint that we put on a column/set of columns


won’t allow these to have a null value or a duplicate.

Example:

Create table emp(emp_id int(20) primary key,not null,emp_name varchar(150));

Emp_id (PK) Emp_Name

2. Foreign Key

 Foreign Key is used to establish relationships between two tables.


 The relationship between two tables is known as Referential Integrity.
 A Single table can have “Multiple Foreign Key’s”
 A Foreign key can have null values.
 The table consisting of the foreign key is known as the CHILD table and the
table that is referred to by the foreign key is called the PARENT table.

Example:

Page 13

suresh.mentor@gmail.com
lOMoARcPSD|45012165

3. Super Key

 Super Key is a single attribute or combination of attributes that can be used


to uniquely identify a row in a table.
 A single table can have multiple super keys.
 The Maximum no. of super key’s available particular table is 2n – 1.
Where n is the Maximum no.of Attributes
 A Super key may or may not be a candidate key,but all the candidate key’s
are super key’s.
Example:

Table: Employee

Emp_SSN Emp_Number Emp_Name

123456789 226 Steve


999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert

Super keys: The above table has following super keys. All of the following sets of
super key are able to uniquely identify a row of the employee table.

 {Emp_SSN}
 {Emp_Number}
 {Emp_SSN, Emp_Number}
 {Emp_SSN, Emp_Name}
 {Emp_SSN, Emp_Number, Emp_Name}
 {Emp_Number, Emp_Name}

Page 14

suresh.mentor@gmail.com
lOMoARcPSD|45012165

4. Candidate Key

 A super key with no redundant data attribute is called Candidate Key..


 A Candidate Key is basically the minimal version of the super key.
 There may be a single candidate key in a table or multiple candidate keys in
the table.
 The count of the candidate key depends on the relational table we have.

Example:

Candidate Keys: As I mentioned in the beginning, a candidate key is a minimal


super key with no redundant attributes. The following two set of super keys are
chosen from the above sets as there are no redundant attributes in these sets.

 {Emp_SSN}
 {Emp_Number}

Difference between Super Key and Candidate Key:

The following are the important differences between a Super Key and a Candidate
Key –

Key Super Key Candidate Key


Super Key is used to identify all Candidate key is a
Definition
the records in a relation. subset of Super Key.
All super keys can't be candidate All candidate keys are
Use
keys. super keys.
Super keys are combined together Candidate keys are
Selection to create a candidate key. combined together to
create a primary key.
Super keys are more than Candidate keys are less
Count Wise
Candidate keys. than Super Keys.

Page 15

suresh.mentor@gmail.com
lOMoARcPSD|45012165

5. Composite Key

 A primary key having two or more attributes is called composite key.


 It is a combination of two or more columns.
 If a single column alone fails to be served as a primary key then
combination columns would help to uniquely access a record from table
such type of keys or nothing but composite keys.

Example:

Here our composite key is OrderID and ProductID −

{OrderID, ProductID}

Let us see another example −

<Student>

StudentID StudentEnrollNo StudentMarks StudentPercentage


S001 0721722 570 90
S002 0721790 490 80
S003 0721766 440 86

Above, our composite keys are StudentID and StudentEnrollNo. The table has two
attributes as primary key.

Therefore, the Primary Key consisting of two or more attribute is called Composite
Key.

Page 16

suresh.mentor@gmail.com
lOMoARcPSD|45012165

6. Compound Key

 A Compound key is similar to a composite key.


 Compound Key has two or more attributes that allow you to uniquely
recognize a specific record.
 It is possible that each column may not be unique by itself within the
database.
 The purpose of the compound key in database is to uniquely identify each
record in the table.

Example:

OrderNo PorductID Product Name Quantity


B005 JAP102459 Mouse 5
B005 DKT321573 USB 10
B005 OMG446789 LCD Monitor 20
B004 DKT321573 USB 15
B002 OMG446789 Laser Printer 3

In this example, OrderNo and ProductID can’t be a primary key as it does not
uniquely identify a record. However, a compound key of Order ID and Product ID
could be used as it uniquely identified each record.

7. Alternative Key

 Alternative Key is also known as Secondary key.


 Alternate Key or Secondary Key is the key that has not been selected to be the
primary key, but are candidate keys.
 ALTERNATE KEYS is a column or group of columns in a table that uniquely
identify every row in that table.

Page 17

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example:

Emp_ID Emp_Number Emp_Name

E01 2264 Steve


E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert

There are two candidate keys in the above table:


{Emp_Id}
{Emp_Number}

DBA (Database administrator) can choose any of the above key as primary key. Lets
say Emp_Id is chosen as primary key.

Since we have selected Emp_Id as primary key, the remaining


key Emp_Number would be called alternative or secondary key.

8. Surrogate Key

 A Surrogate key is also known as artificial key.


 When a primary is generated at runtime it is called surrogate key or artificial
key.
 A Surrogate key in DBMS is a system generated identifier.
 A surrogate key is a unique identifier for each row in a database table.
 It is often used in database management as the primary key of a table, and it is
typically a sequentially generated or auto-incremented integer.

Page 18

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example:

Let's consider a table called "Students" with the following attributes:

StudentID (surrogate key)


StudentName
Age
Grade

In this case, the "StudentID" is the surrogate key. It is assigned automatically by the
database system when a new record is added to the table. Here's an example of how
the table might look:

StudentID StudentName Age Grade

1 John 18 A

2 Mary 19 B

3 David 18 A

4 Sarah 19 B

The "StudentID" values are automatically generated, ensuring that each row has a
unique identifier.

Surrogate keys are particularly useful when dealing with large databases and
complex relationships, as they provide a simple and reliable way to identify
individual records.

Page 19

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Normal forms based on functional dependency (1NF, 2NF and 3 NF)

What is Normalization?

 Normalization is the process of organizing the data in the database.

 Normalization is used to minimize the redundancy from a relation or set of


relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.

 Normalization divides the larger table into smaller and links them using
relationships.

 The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

 The main reason for normalizing the relations is removing these anomalies.

 Failure to eliminate anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows.

 Normalization consists of a series of guidelines that helps to guide you in


creating a good database structure.

Normalization Rule(or) Normal Forms


Here are the most commonly used normal forms:

 First normal form(1NF)


 Second normal form(2NF)
 Third normal form(3NF)
 Boyce & Codd normal form (BCNF)
 Forth normal form(4NF)
 Fifth normal form(5NF)

Page 20

suresh.mentor@gmail.com
lOMoARcPSD|45012165

1. First Normal Form (1NF)


A relation is said to be in first normal form (1NF) to follow the following Rules

 A relation is in first normal form if every attribute in that relation is single


(Atomic) valued attribute.

 Attribute Domain (Data type) does not change.

 There is a unique name for every Attribute/Column.

 The order in which data is stored does not matter.

Example:

Professor
Un Normalized
ID Name Salary Relation Since
salary is a
1 Rama {40000,10000,15000,10000}
Multi valued
attribute

The above table Violates the First Normal Form (1NF) Because Salary attribute is
multi valued So We can eliminate this multi valued attribute by splitting the salary
column to more specific columns like Basic, TA, DA, HRA.

The Above relation in 1NF is as follows.

Professor

ID Name Basic TA DA HRA 1NF


1 Rama 40000 10000 15000 10000

Every Relation in the Relation Database must be in 1NF.

Page 21

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example: 2

Relation (Table) EMPLOYEE is not in 1NF because of multi-valued attribute


EMP_PHONE.

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP

14 John 9064738238 UP

20 Harry 8574783832 Bihar

12 Sam 7390372389 Punjab

12 Sam 8589830302 Punjab

2. Second Normal Form (2NF)


 Second Normal Form(2NF) depends on the concept of Full Functional
Dependences(FFD) and disallows Partial Functional Dependencies(PFD)
 Second Normal Form applies to relations with composite keys, that is,
relations with a primary key composed of two or more attributes.
 The Second Normal Form eliminates partial dependencies on primary keys.

PFD- Some Part of primary key-> Non-Key Attribute

FFD- Primary Key->Non-Key Attribute

Page 22

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Note: Partial Functional Dependencies (PFD) concept occur’s when ever table
consist of more than one primary key. i.e composite key

A relation is said to be in Second normal form (2NF) to follow the following


Rules

 The table must be in first normal form.


 Has no partial dependency, i.e every non-primary-key attribute is fully
functionally dependent on the primary key

Example:

Example (Table violates 2NF)


<StudentProject>

StudentID ProjectID StudentName ProjectName


S89 P09 Olivia Geo Location
S76 P07 Jacob Cluster Exploration
S56 P03 Ava IoT Devices
S92 P05 Alexandra Cloud Deployment

In the above table, we have partial dependency; let us see how −

The prime key attributes are StudentID and ProjectID.

As stated, the non-prime attributes i.e. StudentName and ProjectName should be


functionally dependent on part of a candidate key, to be Partial Dependent.

The StudentName can be determined by StudentID, which makes the relation


Partial Dependent.

Page 23

suresh.mentor@gmail.com
lOMoARcPSD|45012165

The ProjectName can be determined by ProjectID, which makes the relation Partial
Dependent.

Therefore, the <StudentProject> relation violates the 2NF in Normalization and is


considered a bad database design.

To remove Partial Dependency and violation on 2NF, decompose the above


table

<StudentInfo>

StudentID ProjectID StudentName


S89 P09 Olivia
S76 P07 Jacob
S56 P03 Ava
S92 P05 Alexandra

<ProjectInfo>

ProjectID ProjectName
P09 Geo Location
P07 Cluster Exploration
P03 IoT Devices
P05 Cloud Deployment

Now the relation is in 2nd Normal form of Database Normalization

3. Third Normal Form (3NF)


A relation is said to be in Second normal form (2NF) to follow the following
Rules

 The table must be in the second normal form.


 Has no transitive functional dependency.
Transitive functional dependency

 A transitive functional dependency is when changing a non-key column, might


cause any of the other non-key columns to change

Page 24

suresh.mentor@gmail.com
lOMoARcPSD|45012165

 Transitive functional dependency creates deletion, updating, and insertion


anomalies in the database and is considered as a bad database design.

In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and
A relation can be said to be in the third normal form if at least one of the following is
true for X → Y.

1. Y is a prime attribute
2. X is a superkey

Example: Let’s say a company wants to store the complete address of each
employee, they create a table named Employee_Details that looks like this:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY

11 Jai 33456 HR Gurgaon

12 Navdeep 12312 HR Ambala

13 Shivanshu 67868 US Chicago

14 Mahesh 34535 UK Norwich

15 Vishal 67868 UP Noida

The super key in the above table is

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}….so on

Candidate key: {EMP_ID}

Non-prime attributes: Here all attributes except {EMP_ID} are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on


EMP_ID.

Page 25

suresh.mentor@gmail.com
lOMoARcPSD|45012165

The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super


key(EMP_ID).

This violates the rules of the third normal form.

We need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP>


table, with EMP_ZIP as a Primary key.

Employee

EMP_ID EMP_NAME EMP_ZIP

12 Jai 33456

13 Navdeep 12312

14 Shivanshu 67868

15 Mahesh 34535

16 Vishal 67868

EMP_ZIP

EMP_ZIP EMP_STATE EMP_CITY

33456 HR Gurgaon

12312 HR Ambala

67868 US Chicago

34535 UK Norwich

67868 UP Noida

Page 26

suresh.mentor@gmail.com
lOMoARcPSD|45012165

4. Boyce Codd normal form (BCNF)


 BCNF (Boyce Codd Normal Form) is an advanced version of the third normal
form (3NF).
 BCNF also known as the 3.5 normal form.
 BCNF was builds upon the concepts of 1NF, 2NF, and 3NF (First, Second, and
Third Normal Forms).
 BCNF is a technique applied to normalize the table by removing any pre-
existing redundancy and anomalies.

Definition

 A table or relation is said to be in BCNF in DBMS if the table or the relation is


already in 3NF, and also, for every functional dependency (let's say, X->Y), X is
either the super key or the candidate key.

Rules for BCNF in DBMS

A table is in BCNF if it meets the following criteria:

 It is in 1NF (First Normal Form).


 It is in 2NF (Second Normal Form).
 It is in 3NF (Third Normal Form).
 For every non-trivial functional dependency (X → Y), X must be a superkey.

Example: <<StudentCourses>>

StudentID CourseID Professor


101 C Dr. Smith
101 C++ Dr. Johnson
102 C Dr. Smith
103 Java Dr. Brown

Page 27

suresh.mentor@gmail.com
lOMoARcPSD|45012165

The Above Table "StudentCourses" having following attributes:

(StudentID, CourseID, Professor).

Each student can take multiple courses, and each course has a professor.

1. 1NF (First Normal Form):

The Above table is already in 1NF because each cell contains a single value.

2. 2NF (Second Normal Form):


2nd Normal form is about removing partial dependencies.

In this case functional dependencies

Primary Keys are- {StudentID, CourseID}

Non-Key Attribute- Proferssor

Professor depends on CourseID.

{StudentID, CourseID}->Professor

{CourseID}->Professor

1. 3NF (Third Normal Form):

3NF deals with transitive dependencies.

In this table, Professor depends only on CourseID, which is part of the candidate
key. So, it's already in 3NF.

2. BCNF (Boyce-Codd Normal Form):

BCNF requires that for every non-trivial functional dependency X → Y, X must be


a super key.

Page 28

suresh.mentor@gmail.com
lOMoARcPSD|45012165

In our example, we have one non-trivial functional dependency:

{CourseID} → {Professor}.

CourseID is not a superkey by itself because table contains duplicate rows.

To meet BCNF, you can split the table into two separate tables: "Courses" and
"Professors."

"Courses" Table:

CourseID Professor
Math101 Dr. Smith
Chem101 Dr. Johnson
Bio101 Dr. Brown

"StudentCourses" Table:

StudentID CourseID
101 Math101
101 Chem101
102 Math101
103 Bio101

Now, each table is in BCNF, and the dependency {CourseID} → {Professor} is


satisfied without violating BCNF constraints.

Page 29

suresh.mentor@gmail.com
lOMoARcPSD|45012165

5. Fourth normal form (4NF)


 Fourth Normal Form comes into picture when Multi-valued
Dependency occurs in any relation.

Multi-valued Dependency

 Multi-valued dependency occurs when two attributes in a table are


independent of each other but, both depend on a third attribute i.e For a
dependency A → B, if for a single value of A, multiple value of B exists, then
the table may have multi-valued dependency.

Example:

Suppose there is a bike manufacturer company which produces two colors (white
and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White


M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

In this case, these two columns can be called as multi-valued dependent on


BIKE_MODEL.

Page 30

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Rules for 4th Normal Form

For a table to satisfy the Fourth Normal Form, it should satisfy the following two
conditions:

1. It should be in the Boyce-Codd Normal Form.

2. And, the table should not have any Multi-valued Dependency.

Example on 4NF:

STUDENT

STU_ID COURSE HOBBY


21 C Dancing
21 C++ Singing
34 Java Dancing
74 Python Cricket
59 PHP Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two
independent entity. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two


courses, Computer and Math and two hobbies, Dancing and Singing. So there is a
Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

Page 31

suresh.mentor@gmail.com
lOMoARcPSD|45012165

STUDENT_COURSE

STU_ID COURSE
21 C
21 C++
34 Java
74 Python
59 PHP

STUDENT_HOBBY

STU_ID Hobby
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey

6. Fifth normal form (5NF)


 The 5NF (Fifth Normal Form) is also known as project-join normal form.
 5NF is a higher normal form used in relational database design to eliminate
redundancy and improve data integrity.
 5NF is not applied in practical scenarios and remains limited to theoretical
concepts.

To achieve 5NF, a relation (table) must meet the following Rules:


 It must be in Fourth Normal Form (4NF).
 There is no join dependency i.e. it cannot be further broken down to smaller
tables.

Page 32

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Join Dependency

 Join dependency is a constraint which is similar to functional dependency or


multi-valued dependency.
 A relation is said to have join dependency if it can be recreated by joining multiple
sub relations and each of these sub relations has a subset of the attributes of the
original relation.
 This means that if you split your table into smaller pieces and join them together,
you should get back the original table.

Join Dependency

Condition for join dependency:

If the join of R1 and R2 over Q is equal to relation R then we can say that a join
dependency exists, where R1 and R2 are the decomposition R1 (P, Q) and R2 (Q, S)
of a given relation R (P, Q, S). R1 and R2 are a lossless decomposition of R.

Page 33

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example

The below relation violates the Fifth Normal Form (5NF) of Normalization −

<Employee>

EmpName EmpSkills EmpJob (Assigned Work)


David Java E145
John JavaScript E146
Jamie jQuery E146
Emma Java E147

The above relation can be decomposed into the following three tables; therefore, it is
not in 5NF –

<EmployeeSkills>

EmpName EmpSkills
David Java
John JavaScript
Jamie jQuery
Emma Java

The following is the <EmployeeJob> relation that displays the jobs assigned to each
employee −

<EmployeeJob>

EmpName EmpJob
David E145
John E146
Jamie E146
Emma E147

Page 34

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Here is the skills that are related to the assigned jobs −

<JobSkills>

EmpSkills EmpJob
Java E145
JavaScript E146
jQuery E146
Java E147

Our Join Dependency −

{(EmpName, EmpSkills ), (EmpName, EmpJob), (EmpSkills, EmpJob)}

The above relations have join dependency, so they are not in 5NF. That would mean

that a join relation of the above three relations is equal to our original

relation <Employee>.

Concept of surrogate key

 A Surrogate key is also known as artificial key.


 When a primary is generated at runtime it is called surrogate key or artificial
key.
 A Surrogate key in DBMS is a system generated identifier.
 A surrogate key is a unique identifier for each row in a database table.
 It is often used in database management as the primary key of a table, and it is
typically a sequentially generated or auto-incremented integer.

Page 35

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example:

Let's consider a table called "Students" with the following attributes:

StudentID (surrogate key), StudentName, Age, Grade

In this case, the "StudentID" is the surrogate key. It is assigned automatically by the
database system when a new record is added to the table. Here's an example of how
the table might look:

StudentID StudentName Age Grade

1 John 18 A

2 Mary 19 B

3 David 18 A

4 Sarah 19 B

The "StudentID" values are automatically generated, ensuring that each row has a
unique identifier.

Surrogate keys are particularly useful when dealing with large databases and
complex relationships, as they provide a simple and reliable way to identify
individual records.

Features of the surrogate key:


 It is automatically generated by the system.
 It holds anonymous integer.
 It contains unique value for all records of the table.
 The value can never be modified by the user or application.

Page 36

suresh.mentor@gmail.com
lOMoARcPSD|45012165

How to Generate Surrogate Key:

Surrogate keys can be generate in various ways for example if you are using

1) Mysql- use AUTO_INCREMENT Keyword


2) Oracle- use SEQUENCE Keyword
3) SQL SERVER- use IDENTITY Keyword

Advantages of the surrogate key :

1) Surrogate Keys are Unique: because surrogate keys are system-generated, it is


impossible for the system to create and store duplicate values.
2) Enhanced Performance: Using surrogate keys can improve the database
performance.
3) Simplicity: surrogate keys are typical simple

Disadvantages of the surrogate key:

 The surrogate key value can never be used as a search key.


 As the key value has no relation to the data of the table, so third normal form
is violated.
 The extra column for surrogate key will require extra disk space.
 We will need extra IO when we have to insert or update data of the table.

Some examples of Surrogate key are :

 System date & time stamp


 Random alphanumeric string

Page 37

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Lossless join and dependency preserving decomposition

Decomposition in DBMS
 Decomposition in Database Management System is to break a relation into
multiple relations to bring it into an appropriate normal form.
 It helps to remove redundancy, inconsistencies, and anomalies from a
database.
 If a relation is not properly decomposed, then it may lead to other problems
like information loss, etc.

Types of Decomposition

Lossless Join Decomposition in DBMS?

 Lossless join decomposition is a technique used in database normalization to


decompose a relation schema into multiple smaller schemas without losing any
information.
 The process involves breaking down a single relation into multiple smaller
relations that can be joined back together to reconstruct the original relation.

Page 38

suresh.mentor@gmail.com
lOMoARcPSD|45012165

Example:

Consider the following relations- R = (D, E, F)


R1 = (D, E)

R2 = (E, F)

 The relation R has 3 attributes D, E, and F. The relation R is decomposed into


two relations Relation-1 and Relation-2.
 Relation-1 and Relation-2 both have two attributes.
 Both have a common attribute 'E'.

Now, let us draw a table of Relation R with raw data –

R = (D, E, F)

D E F

78 19 16

39 76 91

78 29 44

It is decomposed as follows-

R1(D, E)

D E

78 19

39 76

78 29

Page 39

suresh.mentor@gmail.com
lOMoARcPSD|45012165

R2(E, F)

E F

19 16

76 91

29 44

Let us check the first condition. It was The union of the sub-relations R1 and R2
must contain all the attributes that are available in the original relation R before
decomposition.

So, R1 U R2= R

D E F

78 19 16

39 76 91

78 29 44

The relation obtained above is same as the original relation R. We can say that it is
an example of Lossless-join decomposition.

Dependency-Preserving decomposition in DBMS?

 It is an important constraint of the database.

 Dependency-Preserving Decomposition is a technique used in database


management to decompose a large database into smaller, more manageable
parts while preserving (maintain) the relationships (dependencies) between the
data in the original database.

Page 40

suresh.mentor@gmail.com
lOMoARcPSD|45012165

 The objective of this technique is to minimize the loss of information or


functional dependencies among the attributes in the decomposed database.

 In the dependency preservation, at least one decomposed table must satisfy


every dependency.

 If a relation R is decomposed into relation R1 and R2, then the dependencies


of R either must be a part of R1 or R2 or must be derivable from the
combination of functional dependencies of R1 and R2.
 For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into
R1(ABC) and R2(AD) which is dependency preserving because FD A->BC is a
part of R1(ABC).

Example 1
Let R (A, B, C, D) and F = {A -> B, A -> C, C -> D}
R is decomposed into
R1 = (A, B, C) with the FD’s
F1 = {A -> B, A -> C}
R2 = (C, D) with the FD’s
F2 = {C -> D}
F’ = F1 U F2 = {A -> B, A -> C, C -> D}
Hence, F’+ = F+
Hence, the decomposition is dependency preserving and also loss less.

Page 41

suresh.mentor@gmail.com

You might also like