06 - Normalization to 3NF
Design Theory for
Relational Databases
Normalization to 3NF
 There are a lot of choices among
attribute sets of a relational schema
 Some choices are better than others
 We will study
 Desirable properties
 How to obtain these desirable properties
 Department of Computer Science
Northern Illinois University
2000
Basic Concepts
 Study following relation
10 Main
10 Main
20 State
20 State
30 Elm
Basic Concepts
 Study following relation
SP ( Supp-Name, Supp-Addr, Item,
John
John
Jane
Jane
Frank
Do you see any
problems with
this relation?
Apple
Orange
Grape
Apple
Mango
Price )
SP ( Supp-Name, Supp-Addr, Item,
$2.00
2.50
1.25
2.25
6.00
John
John
Jane
Jane
Frank
10 Main
10 Main
20 State
20 State
30 Elm
Apple
Orange
Grape
Apple
Mango
Price )
$2.00
2.50
1.25
2.25
6.00
Basic Concepts
 What happens if John moves?
SP ( Supp-Name, Supp-Addr, Item,
John
John
Jane
Jane
Frank
10 Main
10 Main
20 State
20 State
30 Elm
Apple
Orange
Grape
Apple
Mango
Every tuple of Johns
would have to be
changed.
Basic Concepts
 What happens if John moves?
Price )
SP ( Supp-Name, Supp-Addr, Item,
$2.00
2.50
1.25
2.25
6.00
John
John
Jane
Jane
Frank
5
10 Main
10 Main
20 State
20 State
30 Elm
Apple
Orange
Grape
Apple
Mango
Price )
$2.00
2.50
1.25
2.25
6.00
6
06 - Normalization to 3NF
Basic Concepts
We would lose all the
information about
Frank and his address.
Basic Concepts
 What happens to Franks Address if we
delete his tuple because he is
temporarily not supplying us?
 What happens to Franks Address if we
delete his tuple because he is
temporarily not supplying us?
SP ( Supp-Name, Supp-Addr, Item,
SP ( Supp-Name, Supp-Addr, Item,
John
John
Jane
Jane
Frank
10 Main
10 Main
20 State
20 State
30 Elm
Apple
Orange
Grape
Apple
Mango
Price )
$2.00
2.50
1.25
2.25
6.00
Basic Concepts
 What if we wish to keep track of a new
supplier and the address but do not
know what they will be supplying?
SP ( Supp-Name, Supp-Addr, Item,
John
John
Jane
Jane
Frank
10 Main
10 Main
20 State
20 State
30 Elm
Apple
Orange
Grape
Apple
Mango
10 Main
10 Main
20 State
20 State
30 Elm
We cannot insert a new tuple
with information about
a supplier until we know what
item they will be supplying!!
Remember the Entity Integrity
Constraint
Apple
Orange
Grape
Apple
Mango
John
John
Jane
Jane
Frank
$2.00
2.50
1.25
2.25
6.00
Basic Concepts
SP ( Supp-Name, Supp-Addr, Item,
Price )
$2.00
2.50
1.25
2.25
6.00
John
John
Jane
Jane
Frank
Price )
10 Main
10 Main
20 State
20 State
30 Elm
Apple
Orange
Grape
Apple
Mango
Price )
$2.00
2.50
1.25
2.25
6.00
10
Redundancy
Anomalies
 When attribute values are repeated
unnecessarily
 Notice that address is repeated for each
item that is supplied
11
 Update Anomaly
 caused by redundant information
 must find all copies of information in order
to prevent inconsistencies when updating
12
06 - Normalization to 3NF
Anomalies
 Deletion Anomaly
Anomalies
 Insertion Anomaly
 occurs when data is lost during a deletion
that we do not wish to be lost
 occurs when there are attributes within a
tuple that are logically related to only part
of the primary key
 occurs when we cannot insert some
information into a tuple because of a
violation of a relational constraint
 occurs when a multiple attribute key cannot
be fully completed as necessary for
insertion
13
Fixing Anomalies
 Decompose the relation
SP ( Supp-Name, Supp-Addr, Item, Price )
Supp( Supp-Name, Supp-Addr)
John
10 Main
Jane
20 State
Frank
30 Elm
SP-Item ( Supp-Name, Item, Price )
John
John
Jane
Jane
Frank
Apple $2.00
Orange 2.50
Grape 1.25
Apple 2.25
15
Mango 6.00
Decomposition of Relations
14
Fixing Anomalies:
Decomposing Relations
Notice there is not any
redundancy on supplier
and supplier address.
Update anomaly eliminated.
Supp( Supp-Name, Supp-Addr)
John
10 Main
Jane
20 State
Frank
30 Elm
We can insert new supplier
and its address and keep this
information without knowing
the item that will be
supplied.
Insertion anomaly eliminated.
We can delete the fact
SP-Item ( Supp-Name, Item, Price )
that Frank supplies
John
Apple $2.00
mangos without losing
John
Orange 2.50
Franks address.
Jane
Grape 1.25
Deletion anomaly eliminated
Jane
Apple
2.25
Frank
Mango
6.00
16
Relational Design
Methodology
 Disadvantages
 How do we tell whether one relation is
better than another?
 it is more expensive to solve queries
 example: Get the address of suppliers
supplying grapes.
 Check for anomalies
 Normalization (also called functional
dependency theory)
SP (Supp-Name, Supp-Addr, Item, Price )
Only need ONE relation with the original
Supp( Supp-Name, Supp-Addr)
SP-Item ( Supp-Name, Item, Price )
with decomposed schema - need to join two
relations
17
18
06 - Normalization to 3NF
Functional Dependency
 Functional dependency
Functional Dependency
 Functional dependency
 constraints in the data that depend upon
NOT on the values within a given tuple
BUT on whether or not two tuples agree on
certain components.
 Let R be a relation and let X and Y be subsets
of the attributes (one or more) of R we say
X
Y
 X functionally determines Y
 y is functionally dependent on X
 if for all the tuples of R it is NOT possible that
two tuples agree on X but disagree on Y
19
20
Functional Dependency
Functional Dependency
 Functional dependency
 Person (SSN, Age, Gender)
FD = { SSN
Age
SSN
Gender
Age
Gender }
 Given a unique value of X, we can ALWAYS
determine a value of Y
 Since two people of the same age can be
of different genders
21
22
Functional Dependency
Functional Dependency
 FDs are assertions about the real world
which cannot be proven
 FDs are established by the database
designer by considering the meaning of the
attributes
 FDs MUST hold for all possible data values
 FDs can be enforced during insertion if
programmed and told to do so by the DBA
23
Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
e1
e2
e1
e3
p1
p2
p3
p3
s1
s2
s1
s1
d1
d2
d1
d1
c1
c2
c3
c3
FD = { Emp-ID,Project Project, Supv, Dept, Case
Emp-ID
Supv
Supv, Dept
Dept }
24
06 - Normalization to 3NF
Insert Anomaly:
We cannot enter a new employee
say e4 who works for supervisor
s4 until e4 works on some project.
What do you see that is
wrong with this relation?
Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
e1
e2
e1
e3
Update Anomaly:
If e1 has a new
supervisor, then
more than one tuple
has to be changed.
p1
p2
p3
p3
s1
s2
s1
s1
d1
d2
d1
d1
c1
c2
c3
c3
Before Normal Forms
 Unnormalized
 repeating groups
 R (A, B, C, D (d1, d2, d3)*, E, F)
FD = { A  B, C, E, F
A, d1,  d2, d3 }
Deletion Anomaly:
If e2 does not work on project
p2 and we delete this tuple, we
will loose ALL information about
e2s supervisor, etc.
R1 (A, B, C, E, F)
D (A, d1, d2, d3)
FD = { A  B, C, E, F }
FD2 = { A, d1  d2, d3 }
25
26
Normal Forms
First Normal Form
Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
 1NF (First Normal Form)
This
relation is in
1NF.
All values
are atomic!
 all values are atomic
e1
e2
e1
e3
p1
p2
p3
p3
s1
s2
s1
s1
d1
d2
d1
d1
c1
c2
c3
c3
FD = { Emp-ID,Project  Emp-ID, Project, Supv, Dept,Case
Emp-ID
Supv, Dept
Supv
Dept }
27
28
Definitions
Definitions
 Remember definitions of key
 Remember definitions of key
 Super Key:
 A candidate key of a relation functionally
determines ALL attributes of the relation.
 an attribute or set of attributes that uniquely
identify a tuple (can be > 1 in a relation)
 Candidate Key:
 a minimum set of attributes that uniquely
identify a tuple (can be > 1 in a relation)
 a minimal super key
 Primary Key:
 one and only one per relation.
 a chosen candidate key
29
30
06 - Normalization to 3NF
Definitions
Definitions
 Prime Attribute
 Fully Dependent
 if an attribute appears in a key of a relation,
then it is a prime attribute.
 an attribute set Y is fully dependent on
attribute set X if
X  Y
and Y cannot be determined by any
subset of X
 In Emp-Proj,
 In Emp-Proj, Emp-ID is prime
 Non-Prime Attribute
 an attribute not appearing in a key of a
relation.
 Case is fully dependent on Emp-ID, Project
 Supv and Dept are NOT fully dependent on
Emp-ID, Project
 In Emp-Proj, Supv is non-prime
31
Normal Forms
32
NOT in 2NF because
Supv and Dept are NOT
fully dependent on
Emp-ID, Project
Second Normal Form
Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
 1NF (First Normal Form)
 all values are atomic
 2NF (Second Normal Form)
This FD
violates 2NF!
 a relation is in 2NF if it is in 1NF and each
of its non-prime attributes are fully
dependent upon its entire primary key.
e1
e2
e1
e3
p1
p2
p3
p3
FD = { Emp-ID, Project
Emp-ID
33
Second Normal Form
Supv
s1
s2
s1
s1
d1
d2
d1
d1
c1
c2
c3
c3
Project, Supv, Dept, Case
 Supv, Dept
34
 Dept }
Second Normal Form
Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
FD = { Emp-ID, Project
Emp-ID
Supv
Project, Supv, Dept, Case
 Supv, Dept
 Dept }
EP1 ( Emp-ID, Supv, Dept)
FD1 = { Emp-ID 
Supv
Supv, Dept
Dept }
EP1 ( Emp-ID, Supv, Dept)
FD1 = { Emp-ID 
Supv
Supv, Dept
Dept }
EP2 ( Emp-ID, Project, Case)
FD2 = { Emp-ID, Project
 Case }
EP2 ( Emp-ID, Project, Case)
FD2 = { Emp-ID, Project
 Case }
35
36
06 - Normalization to 3NF
Problems with
Second Normal Form
EP1 ( Emp-ID, Supv, Dept)
e1
s1
d1
e2
s2
d2
e3
s1
d1
EP2 ( Emp-ID, Project, Case)
e1
p1
c1
e2
p2
c2
e1
p3
c3
e3
p3
c3
Insertion Anomaly:
We cannot insert the information that
supervisor s6 works for dept d6 unless
at least one employee works for s6.
EP1 ( Emp-ID, Supv, Dept)
e1
s1
d1
e2
s2
d2
e3
s1
d1
Update Anomaly:
If supervisor s1 is
moved to d4, we have
to make changes in
more than one tuple.
EP2 ( Emp-ID, Project, Case)
e1
p1
c1
e2
p2
c2
e1
p3
c3
e3
p3
c3
Deletion Anomaly:
If e2 is fired,
information that s2
works for d2 is lost.
37
Definitions
38
Normal Forms
 1NF (First Normal Form)
 Transitively Dependent
 A non-prime attribute is transitively
dependent upon the primary key of a
relation if there is also a non-prime (non
key) attribute that functionally determines
the attribute.
 In EP1, Dept is transitively dependent upon
Emp-ID since
 Emp-ID  Supv
 Supv  Dept
 all values are atomic
 2NF (Second Normal Form)
 a relation is in 2NF if it is in 1NF and each
of its non-prime attributes are fully
dependent upon its key.
 3NF (Third Normal Form)
 a relation is in 3NF if it is in 2NF and none
of its non-prime attributes are transitively
dependent on its key.
39
40
Third Normal Form Relations
Third Normal Form
Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
FD = { Emp-ID,Project Project, Supv, Dept, Case
Emp-ID
EP1 ( Emp-ID, Supv, Dept)
Supv
FD1 = { Emp-ID 
Supv, Dept
Dept }
Supv, Dept
EP2 ( Emp-ID, Project, Case)
Supv
Dept }
FD2 = { Emp-ID, Project
EP1-1( Supv, Dept)
FD1-1 = { Supv  Dept }
EP1-2 (Emp-ID, Supv)
EP1-2 (Emp-ID, Supv)
FD1-2 = { Emp-ID 
 Case }
Final Result
EP1-1( Supv, Dept)
of
Normalization FD1-1 = { Supv  Dept }
FD1-2 = { Emp-ID 
Supv }
41
Supv }
42
06 - Normalization to 3NF
Third Normal Form Relations
Emp-Proj ( Emp-ID, Project, Supv, Dept, Case )
FD = { Emp-ID,Project Project, Supv, Dept, Case
Emp-ID
Supv
Supv, Dept
Dept }
EP1 ( Emp-ID, Supv, Dept)
EP2 ( Emp-ID, Project, Case)
FD1 = { Emp-ID 
Supv 
FD2 = { Emp-ID, Project
Supv, Dept
Dept }
EP1-1( Supv, Dept)
FD1-1 = { Supv  Dept }
 Case }
EP1-2 (Emp-ID, Supv)
FD1-2 = { Emp-ID 
Supv }
43