INTRODUCTION TO SCHEMA REFINEMENT:
Data redundancy means duplication of data. It causes duplicate data at different locations which
destroys the integrity of the database and wastage of storage space.
The problems of redundancy are:
Wasted Storage Space. 2. More difficult Database Updates. 3. A Possibility of Inconsistent data.
FUNCTIONAL DEPENDENCY:
The normalization theory based on the fundamental notion of FUNCTIONAL DEPENDENCY.
Given a relation R, attribute A is functionally dependent on attribute B if each value of value of A in
R is associated with precisely one value of B.
(OR)
In other words, attribute A is functionally dependent on B if and only if, for each value of B, there is
exactly one value of A.
(OR)
Functional dependency is a relationship that exists when one attribute uniquely determines another
attribute.
If R is a relation with attributes X and Y, a functional dependency between the attributes is
represented as
X->Y, which specifies Y is functionally dependent on X.
Here X is a determinant set and Y is a dependent attribute. Each value of X is associated precisely
with one Y value.
Consider the following table EMPLOYEE
EMPLOYEE
CODE NAME CITY
E1 Mac Delhi
E2 Sandra CA
E3 Henry Paris
Given a particular value of CODE, there is exactly one corresponding value for NAME.
For example, for CODE E1 there is exactly one value of NAME, Mac. Hence NAME is functionally
dependent on code.
Similarly, there is exactly one value of CITY for each value of CODE. Hence, the attribute CITY is
functionally dependent on the attribute CODE.
The attribute CODE is the determinant. You can also say that CODE determines CITY and NAME.
i.e
Code- - -Name
Code- - -City
And also Code- - -Name, City
From above statements, we can conclude that,
Where code is determinant and name, City are dependent and we can say that Name, City are
functionally dependent on Code.
Rules about Functional Dependencies
The set of all FD’s implied by a given set F of FD’s is called the closure of F, denoted as F +.
The following three rules, called armstrong’s axioms, can be applied repeatedly to compute all FDs
implied by a set of FDs. Here, we use X, Y and Z to denote sets of attribute over a relation schema R,
Rule 1:Reflexivity: if X Y, then X Y.
Rule 2:Augmentation:if X Y, then XZ YZ for any Z.
Rule 3:Transitivity: if X Y and Y Z, then X Z.
It is convenient to use some additional rules while reasoning about F+. Rule 4:Union:If x Y and X
z, then X YZ.
Rule 5:Decomposition: If X YZ, then X Y and X Z. Rule 6:Pseudotranstivity:If X Y and Y
P, then XZ P.
LOSSLESS JOIN DECOMPOSITION:
A decomposition {R1, R2,…, Rn} of a relation R is called a lossless decomposition for R if the
natural join of R1, R2,…, Rn produces exactly the relation R.
R : relation
F : set of functional dependencies on R X,Y : decomposition of R Decomposition is lossles if :
X ∩ Y X, that is: all attributes common to both X and Y functionally determine ALL
the attributes in X OR
X ∩ Y Y, that is: all attributes common to both X and Y functionally determine ALL
the attributes in Y
In other words, if X ∩ Y forms a superkey of either X or Y, the decomposition of R is a lossless
decomposition.
Example: Given:
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
Required FD’s:
branch-name branch-city assets
loan-number amount branch-name
Decompose Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Since Branch-schema ∩ Loan-info-schema = {branch-name}
Thus, this decomposition is Lossless decomposition.
DEPENDENCY PRESERVATION PROPERTY:
A decomposition of a relation R into R1, R2, R3, …, Rn is dependency preserving decomposition
with respect to the set of Functional Dependencies F that hold on R only if the following is hold;
(F1 U F2 U F3 U … U Fn)+ = F+
where,
F1, F2, F3, …, Fn – Sets of Functional dependencies of relations R1, R2, R3, …, Rn.
(F1 U F2 U F3 U … U Fn)+ - Closure of Union of all sets of functional dependencies.
F+ - Closure of set of functional dependency F of R.
If the closure of set of functional dependencies of individual relations R1, R2, R3, …, Rn are equal
to the set of functional dependencies of the main relation R (before decomposition), then we would
say the decomposition D is lossless dependency preserving decomposition.
Discussion with Example of Non-dependency preserving decomposition:
Dependency preservation is a concept that is very much related to Normalization process.
Remember that the solution for converting a relation into a higher normal form is to decompose the
relation into two or more relations. This is done using the set of functional dependencies identified
in the lower normal form state.
For example, let us assume a relation R (A, B, C, D) with set of functional dependencies
F ={A→B, B→C, C→D}. There is no partial dependency in the given set F. Hence, this relation is
in 2NF.
Is R (A, B, C, D) in 3NF? No. The reason is Transitive Functional Dependency.
How do we convert R into 3NF? The solution is decomposition.
Assume that we decompose R(A, B, C, D) into R1(A, C, D) and R2(B, C).
In R1(A, C, D), the FD C→D holds. In R2(B, C), the FD B→C holds. But, there is no trace of the
FD A→B. Hence, this decomposition does not preserve dependency.
What wrong would the above decomposition cause?
In R, the following were held;
value of B depends on value of A,
value of C depends on value of B,
value of D depends on value of C.
after decomposition, the relations R1 and R2 are holding the following;
value of C depends on value of B, value of D depends on value of C.
The dependency A→B is missing. This causes acceptance of any values for B in R2. It causes
duplicate values to be entered into R2 for B which may not depend on A. If we would like to avoid
this type of duplication, then we need to perform a join operation between R1 and R2 to accept a
new value for B which is costlier operation. Hence, we demand the decomposition to be a
dependency preserving decomposition.
INTRODUCTION TO NORMALIZATION:
Redundancy:
Redundancy means repetition of data.
Redundancy increases the time involved in updating, adding, and deleting data.
It also increases the utilization of disk space and hence, disk I/O increases.
DEFINITION OF NORMALIZATION:
Normalization is scientific method of breaking down complex stable structures into simple table
structures by using certain rules.
Using this method, you can, reduce redundancy in a table and elimates the problems of inconsistency
and disk space usage.
We can also ensure that there is no loss of information.
Benefits of Normalization:
Normalization has several benefits.
It enables faster sorting and index creation, more clustered indexes, few indexes per table, few
NULLs, and makes the database compact.
Normalization helps to simplify the structure of table. The performance of an application is directly
linked to database design.
A poor design hinders the performance of the system.
The logical design of the database plays the foundation of an optical database.
The following are the some rules that should be followed to achieve a good data base design
.They are:
Each table should have an identifier.
Each table should store data for single type entity.
Columns that accept NULLs should be avoided.
The repetition of values or columns should be avoided.
NORMAL FORMS:
The normalization results in the formation of that satisfy certain specified rules and represent certain
normal forms.
The normal forms are used to ensure that several of anomalies and inconsistencies are not introduced in the
database.
A table structure is always in a certain normal form. Several normal forms have been identified.
The following are the most important and widely used normal forms are:
First normal form(1NF)
Second normal form(2NF)
Third normal form(3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
The first, second and third normal forms are originally defined by Dr. E. F. Codd.
Later, Boyce and Codd introduced another form called the Boyce-Codd Normal form.
FIRST NORMAL FORM (1NF):
.A table is said to be 1NF when each cell of the table contains precisely one value.
Consider the following table PROJECT.
PROJECT:
ECODE DEPT DEPTHEAD PROJCODE HOURS
E101 Systems E901 P27 90
P51 P20 101
60
E305 Sales E906 P27 109
P22 98
E508 Admin E908 P51 NULL
P27 72
The data in the table is not normalized because the cells in the PROJCODE and HOURS have more
than one value.
By applying the INF to the PROJECT table, the resultant table is as follows:
PROJECT:
ECODE DEPT DEPTHEAD PROJCODE HOURS
E101 Systems E901 P27 90
E101 Systems E901 P51 101
E101 Systems E901 P20 60
E305 Sales E906 P27 109
E305 Sales E906 P22 98
E508 Admin E908 P51 NULL
SECOND NORMAL FORM (2NF)
A table is said to be in 2NF when it is in 1NF and every attribute in the row is functionally dependent
upon the whole key, and not just part of the key.
Consider the PROJECT table.
PROJECT
ECODE
PROJCODE
DEPT
DEPTHEAD
HOURS
The table has following rows.
ECODE PROJCODE DEPT DEPTHEAD HOURS
E101 P27 Systems E901 90
E305 P27 Finance E909 10
E508 P51 Admin E908 NULL
E101 P51 Systems E901 101
E101 P20 Systems E901 60
E508 P27 Admin E908 72
The above table satisfies the definition of 1NF and now we have to now check if it satisfies 2NF.
In the above table, for each value of ECODE, there is more than one value of HOURS.
For example, for ECODE, E101, there are 3 values of HOURS-90, 101 and 60. Hence, HOURS is
not functionally dependent on ECODE.
Similarly, for each value of PROJCODE, there is more than 1 value of HOURS.
For example, for PROJCODE P27, there are three values of HOURS-90, 10 and 72.
However, for a combination of the ECODE and PROJCODE values, there is exactly one value of
HOURS. Hence, HOURS is functionally dependent on the whole key, ECODE+PROJCODE.
Therefore DEPT is functionally dependent on the part of the key (Which is ECODE) and not
functionally dependent on the whole key (ECODE+PROJCODE).
For the table to be in 2NF, the non-key attributes must be functionally dependent on the whole key
and not the part of the key.
GUIDELINES FOR CONVERTING THE TABLE TO 2NF:
Find and remove attributes that are functionally dependent on only a part of the key and not on
the whole key. Place them in a different table.
Group the remaining attributes.
After applying the 2NF:
After applying the 2Nf to above table, the resultant tables are as follows: EMPLOYEE -DEPT
ECODE DEPT DEPTHEAD
E101 Systems E901
E305 Finance E909
E508 Admin E908
E508 Admin E908
PROJECT
ECODE PROJCODE HOURS
E101 P27 90
E101 P51 101
E101 P20 60
E305 P27 10
E508 P51 NULL
E508 P27 72
THIRD NORMAL FORM (3NF):
A table is said to be in 3NF when it is in 2NFand every non-key attribute is functionally dependent
only on the primary key.
(OR)
A relation is said to be in Third Normal Form, if a table must be follow the following. They are
The table must be in 2 NF prior.
No non-prime attribute is transitively dependent on prime key attribute. Consider the table
EMPLOYEE.
ECODE DEPT DEPTHEAD
E101 Systems E901
E305 Finance E909
E402 Sales E906
E508 Admin E908
E607 Finance E909
E608 Finance E909
The above table is in 1NF because each row of a table contains atomic value.
The primary key is in the EMPLOYEE table is ECODE, for each value of ECODE, there is exactly
one value of DEPT. Hence, the attribute DEPT is functionally dependent on the primary key,
ECODE.
Similarly, for each value of ECODE, there is exactly one value of DEPTHEAD. Therefore,
DEPTHEAD is functionally dependent on the primary key ECODE.
Hence, all the attributes are functionally dependent on the whole key, ECODE. Hence the table is in
2NF.
However, the attribute DPETHEAD is dependent on the attribute DEPT also. As per 3NF, all non-key
attributes have to be functionally dependent only on the primary key.
This table is not in 3NF since DEPTHEAD is functionally dependent on DEPT, Which is not a
primary key.
GUIDELINES FOR CONVERTING A TABLE TO 3NF
Find and remove non-key attributes that are functionally dependent on attributes that are not
primary key. Place them in a different table.
Group the remaining attributes.
To convert the table employee into 3NF, we must remove the column DEPTHEAD since it is not
functionally dependent on only the primary key ECODE, and place it in another table called
DEPARTMENT along with the attribute DEPT that is functionally dependent on ECODE
After applying the 3NF:
EMPLOYEE
ECODE DEPT
E101 Systems ECODE DEPTHEAD
E305 Finance Systems E901
E402 Sales Sales E906
E508 Admin Admin E908
E607 Finance Finance E909
E608 Finance
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form
A relation is in the BCNF if and only if every determinant is a candidate key.
Consider the following PROJECT table. PROJECT:
ECODE NAME PROJCODE HOURS
E1 Veronica P2 48
E2 Anthony P5 100
E3 Mac P6 15
E4 Susan P2 250
E4 Susan P5 75
E1 Veronica P5 40
This table has redundancy. If the name of the employee is modified, the change will have to be made
consistent across the table, Otherwise there will be inconsistencies.
The following are candidate keys for above table: ECODE+PROJCODE NAME+PROJCODE
HOURS is functionally dependent on ECODE+PROJCODE.
HOURS is also functionally dependent on NAME+PROJCODE.
NAME is functionally dependent on ECODE.
ECODE is functionally dependent on NAME.
Multiple candidate keys that is ECODE + PROJCODE and NAME+ PROJCODE.
The candidate keys are composite.
The candidate keys overlap since the attribute PROJCODE is common.
This is a situation that requires conversion to BCNF. The table is essentially in the third NF. The only
non-key item is HOURS, which is dependent, on the whole key, that is, ECODE+PROJCODE or
PROJCODE.
GUIDELINES FOR CONVERTING A TABLE TO BCNF:
Find and remove that overlapping candidate keys. Place the part of the candidate key and the
attribute it is functionally dependent on, in a different table.
Group the remaining items into a table.
After applying the BCNF
EMPLOYEE PROJECT
ECODE NAME
ECODE PROJCODE HOURS
E1 Veronica
E1 P2 48
E2 Anthony
E2 P5 100
E3 Mac
E3 P6 15
Fourth Normal Form (4NF):
If the database table contains multiple multi valued attributes then that table is said to be in 4 NF.
For example, consider the the possibility that an employee can have multiple assignments and also
have multiple service organizations.
VOLUNTEER-1
ENO ORG_CODE ASSIGN_NUM
10123 RC 1
10123 UW 3
10123 4
VOLUNTEER-2
ENO ORG_CODE ASSIGN_NUM
10123 RC
10123 UW
10123 1
10123 2
In the above table, employee 10123 does volunteer work for Red Cross (RC) and United Way (UW).
I addition that, the same employee might be assigned to work on three projects such as 1, 2 and 4.
The attributes ORG_CODE and ASSIGN_NUM each may have many different attributes.i.e the table
contain two sets of Multivalued attributes.
Your tables conform to the following two rules
1. All attributes must be dependent on the primary key.
2. No row may contain two or more multi valued facts about entity
MULTIVALUED DEPENDENCIES (MVD)
Given a relation schema R, let X and Y be subsets of attributes of R (X and Y need not be
distinct). Then the multivalued dependency denoted as X Y satisfies in a relation R, if given two
tuples t1 and t2 in R with t1(X) = t2(X), where R(t1, t2, t3, t4) have the same X value i.e.,
t1(X) = t2(X) = t3(X) = t4(X)
Whereas the Y values of t1 and t3 are same and the Y values of t2 and t4 are same, i.e., t1(Y) = t3(Y) =
t2(Y) = t4(Y).
The R – X – Y value of t1 and t4 are same and the R – X – Y values of t2 and t3 are same
i.e.,
t1(R – X – Y) = t4(R – X – Y)
t2(R – X – Y) = t3(R – X – Y
X Y Z tuples
a b1 c1 - tuple t1
a b2 c2 - tuple t2
a b1 c2 - tuple t3
a b2 c1 - tuple t4
An illustration of MVD definition
Therefore, for each X value in such a relation, there will be a set of Y values associated with
it. This association between the X and Y values does not depend on the values of other attributes in
the relation.
In the above example, two tuples t1, t2 in relation R defined on relation schema R with the
same X value (i.e., ‘a’). Exchange the Y values (i.e., b 1 & b2) of these tuples t1, t2 and obtained the Y
values of tuples t3, t4. (i.e., b1 & b2) (that is, same values as t1, t2). then tuples t3 and t4 must also be in
R.
Thus, the above example is illustrated as MVD X Y holds over R. The first three rules are
Armstrong’s Axioms,
Rule 1:Reflexivity: if X Y, then X Y
Rule 2:Augmentation: if X Y, then XZ YZ for any Z. Rule 3:Transitivity: if X Y and
Y Z, then X Z. Three of the additional rules involve only in MVDs,
Rule 4:MVD Complementation: If X Y, then X R – XY.
Rule 5:MVD Augmentation: If X Y, then W Z, then WX YZ. Rule 6:MVD
Transitivity: if X Y and Y Z, then X (Z – Y). The following rules
are related to both FDs and MVDs.
Rule 7:Replication: If X Y, then X Y.
Rule 8:Coalescence: If X Y and there is a W such that W Y is empty. W
Z and Y Z then X Z.
Observe the above replication rule states that every FD is also a MVD.
5TH NORMAL FORM :
Definition 1 :
A relation R is in 5NF if and only if every join dependency in R is implied by the candidate keys of R.
Definition 2 :
A relation decomposed into two relations must have loss-less join Property, which ensures that no
spurious or extra tuples are generated, when relations are reunited through a natural join.
What is a Join Dependency(JD) ??
Let R be a relation. Let A, B, …, Z be arbitrary subsets of R’s attributes. R satisfies the JD
* ( A, B, …, Z )
if and only if R is equal to the join of its projections on A, B, …, Z.
A join dependency JD(R1, R2, …, Rn) specified on relation schema R, is a trivial JD, if one of the
relation schemas Ri in JD(R1, R2, ….,Rn) is equal to R.
Join dependency is used in the following case :
When there is no lossless join decomposition of R into two relation schemas, but there is a lossless
join decompositions of R into more than two relation schemas.
Point : A join dependency is very difficult in a database, hence normally not used.
Example :
Consider a relation ACP(Agent, Company, Product)
ACP : Meaning of the tuples
Agent(A) Company(C) Product(P) ⇒ Agent sells Company’s Products.
A1 PQR Nut
⇒ A1 sells PQR’s Nuts and Screw.
A1 PQR Screw
A1 XYZ Bolt A1 sells XYZ’s Bolts.
A2 PQR Bolt A2 sells PQR’s Bolts.
R12 :
The table is in 4 NF as it does not contain multivalued dependency. But the relation contains
Agent Company Product
redundancy as A1 is an agent for PQR twice. But there is no way of eliminating this
redundancy without losing information.
Suppose
A1 that
PQRthe table is Nut
decomposed into its two relations, R1 and R2.
The redundancy has been eliminated by decomposing ACP relation, but the information about
which companies make which products and which agents supply which product has been lost.
A1 naturalPQR
The join of theseScrew
relations over the ‘agent’ columns is:
A1 PQR Bolt
A1 XYZ Nut
A1 XYZ Screw
A1 XYZ Bolt
A2 PQR Bolt
Hence, the decomposition of ACP is a lossy join decomposition as the natural join table is
spurious, since it contains extra tuples(shaded) that gives incorrect information.
But now, suppose the original relation ACP is decomposed into 3 relations :
R1(Agent, Company)
R123 :
Agent Company Product
A1 PQR Nut
A1 PQR Screw
Again, we get an extra tuple shown as by shaded portion.
Hence, it has to be accepted that it is not possible to eliminate all redundancies using
A1 PQRtechniquesBolt
normalization because it cannot be assumed that all decompositions will be non-loss.
Hence again, the decomposition of ACP is a lossy join decomposition.
A1 the above
So, XYZ tables are Bolt
not in 5th normal form.
A2 PQR Bolt