Database Systems
Lecture 06
Schema Refinement and
Normalization
Chapter 7, Database System Concepts, Silberschatz, Korth,
Sudarshan
Chapter 10, Database Systems; concepts, designs and
applications, S.K.Singh
Instructor: Sana Yousuf
Fatima Jinnah Women University
1
Goals of Normalization
Decide whether a particular relation R is in
good form.
In the case that a relation R is not in good
form, decompose it into a set of relations {R1,
R2, ..., Rn} such that
Each relation is in good form
The decomposition is a lossless-join decomposition
2
Decomposition
Suppose that relation R contains attributes
A1 ... An. A decomposition of R consists of
replacing R by two or more relations such
that:
Each new relation scheme contains a subset of the
attributes of R (and no attributes that do not
appear in R), and
Every attribute of R appears as an attribute of one
of the new relations.
3
Example of Redundancy
4
Decomposition
Eliminates Redundancy
To get back the original relation: Natural Join
5
Unnecessary Decomposition
Join returns original relation: Fine
No redundancy is removed and SID is stored twice: Unnecessary
6
Bad Decomposition
Association between CID and grade is lost
Join returns more rows than original
7
Database Normalization
Database normalization is the process of
removing redundant data from your tables in to
improve storage efficiency, data integrity, and
scalability.
In the relational model, methods exist for
quantifying how efficient a database is. These
classifications are called normal forms (or NF),
and there are algorithms for converting a given
database between them.
Normalization generally involves splitting existing
tables into multiple ones, which must be re-joined
or linked each time a query is issued.
History
Edgar F. Codd first proposed the process of
normalization and what came to be known
as the 1st normal form in his paper A
Relational Model of Data for Large Shared
Data Banks Codd stated:
There is, in fact, a very simple elimination
procedure which we shall call normalization.
Through decomposition nonsimple domains
are replaced by domains whose elements
are atomic (nondecomposable) values.
Normal Form
Edgar F. Codd originally established three
normal forms: 1NF, 2NF and 3NF. There are
now others that are generally accepted, but
3NF is widely considered to be sufficient for
most applications. Most tables when reaching
3NF are also in BCNF (Boyce-Codd Normal
Form).
Normalization
We discuss four normal forms: first, second, third, and Boyce-
Codd normal forms
1NF, 2NF, 3NF, and BCNF
Normalization is a process that improves a database design
by generating relations that are of higher normal forms.
The objective of normalization:
to
tocreate
createrelations
relationswhere
whereevery
everydependency
dependencyisison
onthe
thekey,
key,the
thewhole
wholekey,
key,
and
andnothing
nothingbut
butthe
thekey
key
11
Normalization
There is a sequence to normal forms:
1NF is considered the weakest,
2NF is stronger than 1NF,
3NF is stronger than 2NF, and
BCNF is considered the strongest
12
Normalization
1NF a relation in BCNF, is also in
3NF
2NF a relation in 3NF is also in
2NF
3NF a relation in 2NF is also in
1NF
BCNF
13
First Normal Form 1NF
We say a relation is in 1NF if all values stored in the
relation are single-valued and atomic
Values are simple and cannot be broken down further
EmpNum EmpPhone EmpDegrees
123 233-9876 BE, MS, PhD
333 233-1231 BA, BSc, PhD
679 233-1231 BSc, MSc
Is
Isthe
theabove
aboverelation
relationin
in INF?
INF?
14
First Normal Form 1NF
NewStu
Stuid lastName major credits status socSecNo
S1001 Smith History 90 Senior 100429500
S1003 Jones Math 95 Senior 010124567
S1006 Lee Math 15 Freshman 088520876
CSC
S1010 Burns English 63 Junior 099320985
Art
NewStu(StuId,
S1060 Jones
NewStu(StuId, lastName,
CSC 25major, Freshman
lastName, major, credits, status,
status, socSe
credits,064624738 socS
Assume students can have more than one major
The major attribute is not single-valued for each
tuple
15
First Normal Form 1NF
For
Foreach
eachmulti-valued
multi-valuedattribute,
attribute,create
createaanew
newtable,
table,ininwhich
whichyou
youplace
placethe
thekey
key
ofofthe
theoriginal
originaltable
tableand
andthe
themulti-valued
multi-valuedattribute.
attribute. Keep
Keepthetheoriginal
originaltable,
table,with
with
its
itskey
key
NewStu(StuId, lastName, major, credits, status, socSecNo)
NewStu2(stuId, lastName, credits,status,
socSecNo)
Majors(stuId, major)
Majors
NewStu2 stuId major
Stuid lastName credits status socSecNo
S1001 History
S1001 Smith 90 Senior 100429500 S1003 Math
S1003 Jones 95 Senior 010124567 S1006 CSC
S1006 Lee 15 Freshman 088520876 S1006 Math
S1010 Art
S1010 English
S1010 Burns 63 Junior 099320985 S1060 CSC
16
First Normal Form 1NF
Another Method
Flatten
Flattenthe
theoriginal
originaltable
tableby
bymaking
makingthe
themulti-valued
multi-valuedattribute
attributepart
partofofthe
thekey
key
Student(stuId, lastName, major, credits, status, socSecNo)
Stuid lastName major credits status socSecNo
S1001 Smith History 90 Senior 100429500
S1003 Jones Math 95 Senior 010124567
S1006 Lee CSC 15 Freshman 088520876
S1006 Lee Math 15 Freshman 088520876
NewStu Table Rewritten in 1NF, with {StuId, major} as primary key
S1010 Burns Art 63 Junior 099320985
17
First Normal Form 1NF
Employee
Unnormalized
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C, Perl, Java
2 Barbara Jones 224 IT Linux, Mac
3 Jake Rivera 201 R&D DB2, Oracle, Java
Employee
Normalized to INF
emp_no name dept_no dept_name skills
1 Kevin Jacobs 201 R&D C
1 Kevin Jacobs 201 R&D Perl
1 Kevin Jacobs 201 R&D Java
2 Barbara Jones 224 IT Linux
2 Barbara Jones 224 IT Mac
3 Jake Rivera 201 R&D DB2
3 Jake Rivera 201 R&D Oracle
3 Jake Rivera 201 R&D Java
18
Second Normal Form 2NF
Full Functional Dependency
CLASS
CLASS (Course#,
(Course#, StudID,
StudID, StuLastName,
StuLastName, FacID,
FacID, Sched,
Sched, Ro
R
Only one faculty member teaches each class
Course#, StudID StuLastName, FacID, Sched, Room,
Composite Key
We also have the following functional dependencies
Attributes
Attributesare
arefunctionally
Course# FacID, Sched, Room dependent
functionally
dependent on thecombination
on the combination
Course#
Course# and StudIDbut
and StudID butalso
StudID StuLastName functionally
functionally dependent onaa
dependent on
also
subset
subsetofofthat
thatcombination.
combination.
19
Second Normal Form 2NF
Full Functional Dependence
In
Inaarelation
relationR,
R,attribute
attributeBBofofRRisisfully
fullyfunctionally
functionallydependent
dependenton
onananattribute
attributeor
or
set
setofofattributes
attributesAAofofRRififBBisisfunctionally
functionallydependent
dependenton onAAbut
butnot
notfunctionally
functionally
dependent
dependenton onany
anyproper
propersubset
subsetofofAA
Course#, StudID StuLastName, FacID, Sched, Room,
Course# FacID, Sched, Room
(Partial FD)
StudID StuLastName
(Partial FD)
Grade
Gradeisisfully
fullyfunctionally
functionallydependent
dependenton
onCourse#,
Course#,StudID
StudID
20
Second Normal Form 2NF
AArelation
relationisisinin2NF
2NFififand
andonly
onlyififititisisinin1NF
1NFand
andall
allthe
thenonkey
nonkeyattributes
attributesare
arefully
fully
functionally
functionallydependent
dependenton onthe
thekey
key
If a relation is in 1NF and the key consists of a
single attribute:
The relation is automatically in 2NF
2NF Only when the key is composite
21
Second Normal Form 2NF
i.i. Identify
Identifyeach
eachpartial
partialFD
FD
ii.ii. Remove
Removethe
theattributes
attributesthat
thatdepend
dependon
oneach
eachofofthe
thedeterminants
determinantsso
so
identified
identified
iii.
iii. Place
Placethese
thesedeterminants
determinantsininseparate
separaterelations
relationsalong
alongwith
withtheir
theirdependent
dependent
attributes
attributes
iv.
iv. InInoriginal
originalrelation
relationkeep
keepthe
thecomposite
compositekey
keyand
andany
anyattributes
attributesthat
thatare
arefully
fully
functionally
functionallydependent
dependenton
onall
allofofitit
v.v. Even
Evenififthe
thecomposite
compositekey
keyhas
hasno
nodependent
dependentattributes,
attributes,keep
keepthat
thatrelation
relation
totoconnect
connectlogically
logicallythe
theothers
others
22
Second Normal Form 2NF
CLASS
CLASS (Course#,
(Course#, StudID,
StudID, StuName,
StuName, FacID,
FacID, Sched,
Sched, Room
Room
FDs grouped by determinant:
Course#, StudID StuLastName, FacID, Sched, Room, G
Course# FacID, Sched, Room
StudID StuLastName
Create tables grouped by determinants:
Student (StudID, StuLastName)
Course (Course#, FacID, Sched, Room)
Keep relation with original composite key, with attributes FD on it, if any
Class2 (Course#, StudID, Grade)
23
Second Normal Form 2NF
courseNo stuId stuLastName facId schedule room grade
ART103A S1001 Smith F101 MWF9 H221 A
ART103A S1010 Burns F101 MWF9 H221
ART103A S1006 Lee F101 MWF9 H221 B First Normal Form Relation
CSC201A S1003 Jones F105 TUTHF10 M110 A
CSC201A S1006 Lee F105 TUTHF10 M110 C
HST205A S1001 Smith F202 MWF11 H221
Course Student Class2
courseNo stuId grade stuId stuLastName courseNo facId schedule room
ART103A S1001 A S1001 Smith ART103A F101 MWF9 H221
ART103A S1010 S1010 Burns CSC201A F105 TUTHF10 M110
ART103A S1006 B S1006 Lee HST205A F202 MWF11 H221
CSC201A S1003 A S1003 Jones
CSC201A S1006 C
HST205A S1001
Second Normal Form Relations
24
Third Normal Form 3NF
Transitive Dependency
If A, B, and C are attributes of relation R, such that A B, and B
C, then C is transitively dependent on A
Example:
NewStudent
NewStudent (stuId,
(stuId, lastName,
lastName, major,
major, credits,
credits,
status)
status)
FD:creditsstatus
FD:creditsstatus
By
By transitivity:
transitivity:
stuIdcredits creditsstatus
stuIdcredits creditsstatus implies
implies
25
stuIdstatus
Third Normal Form 3NF
AArelation
relationisisin
in3NF
3NFifif itit isisin
in 2NF
2NF and
andno
no nonkey
nonkey attribute
attributeisis
transitively
transitivelydependent
dependenton onthe thekey
key
Example
NewStudent
NewStudent (stuId,
(stuId, lastName,
lastName, major,
major,
credits,
credits, status)
status)
With FD:creditsstatus
Is NOT in 3NF
26
Third Normal Form 3NF
NewStudent (stuId, lastName, major,
credits, status)
creditsstatus
Remove
Removethethedependent
dependentattribute,
attribute, status,
status,from
fromthe
therelation
relation
Create a new table with the dependent attribute and its
Create a new table with the dependent attribute and its
determinant,
determinant,credits
credits
Keep
Keepthe
thedeterminant
determinantin
inthe
theoriginal
originaltable
table
NewStu2 (stuId, lastName, major,
credits)
Stats (credits, status)
27
Third Normal Form 3NF
NewStudent
Stuid lastName Major Credits Status
S1001 Smith History 90 Senior
S1003 Jones Math 95 Senior
S1006 Lee CSC 15 Freshman
S1010 Burns Art 63 Junior
TransitiveJones
S1060
Dependency
CSC 25 Freshman
NewStu2 Stats
Stuid lastName Major Credits Credits Status
S1001 Smith History 90 15 Freshman
S1003 Jones Math 95
S1006 Lee CSC 15 25 Freshman
63 Junior
S1010 Burns Art 63 90 Senior
95 Senior
Removed
S1060 Transitive
Jones Dependency
CSC 25
28
Stages of Normalisation
Unnormalised
(UDF)
Remove repeating groups
First normal form
(1NF)
Remove partial dependencies
Second normal form
(2NF)
Remove transitive dependencies
Third normal form
(3NF)
Remove remaining functional
dependency anomalies
Boyce-Codd normal
form (BCNF)
Remove multivalued dependencies
Fourth normal form
(4NF)
Remove remaining anomalies
Fifth normal form
(5NF)
Unnormalised Normal Form (UNF)
Definition: A relation is unnormalised when it has not had any
normalisation rules applied to it, and it suffers from various anomalies.
This only tends to occur where the relation has been designed
using a bottom-up approach. i.e., the capturing of attributes to a
Universal Relation from a screen layout, manual report, manual
document, etc...
Recap up to 3NF
Example
inv_no line_no prod_no prod_desc qty
Is the relation in 2NF ? Since
Sinceprod_no
prod_noisisnot
notaacandidate
candidate
key
keyand
andwewehave:
have:
Is the relation in 3NF ?
prod_no
prod_no prod_desc
prod_desc
31
Recap up to 3NF
Example
ename ssn bdate address dnumber dname
Is the relation in 2NF ? Since
Sincednumber
dnumberisisnot
notaacandidate
candidatekey
keyand
and
we have:
we have:
Is the relation in 3NF ? dnumber
dnumber dname
dname
32
Boyce-Codd Normal Form
AArelation
relationisis in
inBCNF
BCNF ifif and
andonly
onlyififevery
everydeterminant
determinant isisaa
candidate
candidatekey key
For a relation with only one candidate key, 3NF
and BCNF are equivalent
We need to check for BNF if:
The candidate keys in the relation are composite keys
(that is, they are not single attributes)
There is more than one candidate key in the relation
The keys are not disjoint, that is, some attributes in the
keys are common
33
BCNF and 3NF
The difference between 3NF and BCNF is
that for a functional dependency A B,
3NF allows this dependency in a relation if
B is a primary-key attribute and A is not a
candidate key
BCNF insists that for this dependency to
remain in a relation, A must be a
candidate key
34
BCNF: Example
NewFac (facName, dept, office, rank, dateHired)
FDs:
office dept
facName,dept office, rank, dateHired Candidate Keys
facName,office dept, rank, dateHired
Primary Key
NewFac is not BCNF because office is not a candidate key
To make it BCNF, remove the dependent attributes to a new relation,
with the determinant as the key
Project into
Fac1 (office, dept)
Fac2 (facName, office, rank, dateHired)
Note we have lost a functional dependency in Fac2 no longer able
to see that {facName, dept} is a determinant, since they are in
different relations
35
Faculty
facName dept office rank dateHired
Adams Art A101 Professor 1975
Byrne Math M201 Assistant 2000
Davis Art A101 Associate 1992
Gordon Math M201 Professor 1982
Hughes Mth M203 Associate 1990
Smith CSC C101 Professor 1980
Smith History H102 Associate 1990
Tanaka CSC C101 Instructor 2001
Vaughn CSC C101 Associate 1995
Fac1 Fac2
office dept facName office rank dateHired
A101 Art Adams A101 Professor 1975
C101 CSC Byrne M201 Assistant 2000
C105 CSC Davis A101 Associate 1992
H102 History Gordon M201 Professor 1982
M201 Math Hughes M203 Associate 1990
M203 Math Smith C101 Professor 1980
Smith H102 Associate 1990
Tanaka C101 Instructor 2001
Vaughn C101 Associate 1995
36
Normalization: Example 1
Consider this Employee relation
EmpNum EmpName DeptNum DeptName
EmpName, DeptNum, and DeptName are non-key
attributes.
Is the relation in 2NF ?
Is the relation in 3NF ?
37
Normalization: Example 1
EmpNum EmpName DeptNum DeptName
We correct the situation by decomposing the original relation
into two 3NF relations.
EmpNum EmpName DeptNum DeptNum DeptName
Verify these two relations are in 3NF
Are they in BCNF?
38
Normalization: Example 2
Relation that stores information
about projects in large business
Work (projName, projMgr, empId, hours, empName, budget, startDate,
salary, empMgr, empDept, rating)
prijName projMgr empId hours empName budget startDate salary empMgr empDept rating
Jupiter Smith E101 25 Jones 100000 01/15/04 60000 Levine 10 9
Jupiter Smith E105 40 Adams 100000 01/15/04 55000 Jones 12
Jupiter Smith E110 10 Rivera 100000 01/15/04 43000 Levine 10 8
Maxima Lee E101 15 Jones 200000 03/01/04 60000 Levine 10
Maxima Lee E110 30 Rivera 200000 03/01/04 43000 Levine 10
Maxima Lee E120 15 Tanaka 200000 03/01/04 45000 Jones 15
39
Normalization: Example 2
1. Each project has a unique name.
2. Although project names are unique, names of employees and
managers are not.
3. Each project has one manager, whose name is stored in projMgr.
4. Many employees can be assigned to work on each project, and an
employee can be assigned to more than one project. The attribute
hours tells the number of hours per week a particular employee is
assigned to work on a particular project.
5. budget stores the amount budgeted for a project, and startDate
gives the starting date for a project.
6. salary gives the annual salary of an employee.
7. empMgr gives the name of the employees manager, who might not
be the same as the project manager.
8. empDept gives the employees department. Department names are
unique. The employees manager is the manager of the employees
department.
9. rating gives the employees rating for a particular project. The
project manager assigns the rating at the end of the employees
work on the project. 40
Normalization: Example 2
Functional dependencies
projName projMgr, budget, startDate
empId empName, salary, empMgr, empDept
projName, empId hours, rating
empDept empMgr
empMgr does not functionally determine empDept since people's
names were not unique (different managers may have same
name and manage different departments or a manager may
manage more than one department
projMgr does not determine projName
Primary Key
projName, empId since every member depends on that
combination
41
projName
projNameprojMgr,
projMgr,budget,
budget,startDate
startDate
empId
empId empName, salary, empMgr,em
empName, salary, empMgr, e
projName,
projName,empId
empIdhours,
hours,rating
rating
Normalization: Example 2
empDept
empDeptempMgr
empMgr
First Normal Form
Each cell is single valued, the relation Work is in 1NF
Second Normal Form
Pratial dependencies
projName projMgr, budget, startDate
empId empName, salary, empMgr, empDept
Transform to:
Proj
Proj(projName,
(projName,projMgr,
projMgr,budget,
budget,startDate)
startDate)
Emp
Emp(empId,
(empId,empName,
empName,salary,
salary,empMgr,
empMgr,
empDept)
empDept)
Work1
Work1(projName,
(projName,empId,
empId,hours,
hours,rating)
rating)
42
Proj
Proj(projName,
(projName,projMgr,
projMgr,budget,
budget,startDate)
startDate)
Emp (empId, empName, salary, empMgr,
Emp (empId, empName, salary, empMgr,
empDept)
empDept)
Work1
Work1(projName,
(projName,empId,
empId,hours,
hours,rating)
rating)
Normalization: Example 2
Second Normal Form
Proj Work1
prijName empId hours rating
prijName projMgr budget startDate
Jupiter E101 25 9
Jupiter Smith 100000 01/15/04
Jupiter E105 40
Maxima Lee 200000 03/01/04
Jupiter E110 10 8
Maxima E101 15
Maxima E110 30
Maxima E120 15
Emp
empId empName salary empMgr empDept
E101 Jones 60000 Levine 10
E105 Adams 55000 Jones 12
E110 Rivera 43000 Levine 10
E101 Jones 60000 Levine 10
E110 Rivera 43000 Levine 10
E120 Tanaka 45000 Jones 15
43
Proj
Proj(projName,
(projName,projMgr,
projMgr,budget,
budget,startDate)
startDate)
Emp
Emp (empId, empName, salary, empMgr,empDept)
(empId, empName, salary, empMgr, empDept)
Work1 (projName, empId, hours, rating)
Work1 (projName, empId, hours, rating)
Normalization: Example 2 empDept empM
empDept empM
Third Normal Form
Proj in 3NF no non-key atrribute functionally
determines another non-key attribute
Work1 in 3NF no transitive dependency involving
hours or rating
Emp not in 3NF transitive dependency
empDept empMgr
Need two relations:
Emp1 (empId, empName, salary, empDept)
Dep (empDept, empMgr)
44
Normalization: Example 2
Third Normal Form
Dept
empDept empMgr
empId empNam salary empDept 10 Levine
e 12 Jones
E101 Jones 60000 10
15 Jones
E105 Adams 55000 12
E110 Rivera 43000 10
E120 Tanaka 45000 15
Work1
prijName empId hours rating
Proj Jupiter E101 25 9
prijName projMgr budget startDate
Jupiter E105 40
Jupiter Smith 100000 01/15/04
Jupiter E110 10 8
Maxima Lee 200000 03/01/04 Maxima E101 15
Maxima E110 30
Maxima E120 15
This is also BCNF since the only determinant in each relation is
the primary key
45