0% found this document useful (0 votes)
59 views72 pages

Unit 4

Uploaded by

Tejas Shukla
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
59 views72 pages

Unit 4

Uploaded by

Tejas Shukla
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 72

Unit 4: Database Design Concepts

Database System Concepts, 7th Ed.


©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Schema Refinement

 What is the Relational Model?


 The Schema Refinement refers to refine the schema by
using some technique. The best technique of schema
refinement is decomposition. Normalization or Schema
Refinement is a technique of organizing the data in the
database.
 It is a systematic approach of decomposing tables to
eliminate data redundancy and undesirable characteristics
like Insertion, Update and Deletion Anomalies.
 Redundancy refers to the repetition of same data or
duplicate copies of same data stored in different locations.
Anomalies: Anomalies refers to the problems occurred after
poorly planned and normalized databases where all the data
is stored in one table which is sometimes called a flat file
database

Database System Concepts - 7th Edition 1.2 ©Silberschatz, Korth and Sudarshan
 Anomalies or problems facing without
normalization(problems due to redundancy) : Anomalies
refers to the problems occurred after poorly planned and
unnormalised databases where all the data is stored in one
table which is sometimes called a flat file database.
 Let us consider such type of schema –

Database System Concepts - 7th Edition 1.3 ©Silberschatz, Korth and Sudarshan
 Here all the data is stored in a single table which causes
redundancy of data or say anomalies as SID and Sname are
repeated once for same CID.
 Let us discuss anomalies one by one. Due to redundancy of
data we may get the following problems, those are-
 1.insertion anomalies : It may not be possible to store some
information unless some other information is stored as well.
 2.redundant storage: some information is stored repeatedly
 3.update anomalies: If one copy of redundant data is
updated, then inconsistency is created unless all redundant
copies of data are updated.
 4.deletion anomalies: It may not be possible to delete some
information without losing some other information as well.

Database System Concepts - 7th Edition 1.4 ©Silberschatz, Korth and Sudarshan
 Removal of Anomalies
 These anomalies can be avoided or minimized by designing databases that adhere
to the principles of normalization. Normalization involves organizing data into
tables and applying rules to ensure data is stored in a consistent and efficient
manner. By reducing data redundancy and ensuring data integrity, normalization
helps to eliminate anomalies and improve the overall quality of the database
 According to E.F.Codd, who is the inventor of the Relational Database, the goals
of Normalization include:
• It helps in vacating all the repeated data from the database.
• It helps in removing undesirable deletion, insertion, and update anomalies.
• It helps in making a proper and useful relationship between tables.

Database System Concepts - 7th Edition 1.5 ©Silberschatz, Korth and Sudarshan
 Disadvantages of Anomalies in Relational Model
• Redundancy: When the same data is stored in various locations, a relational
architecture may cause data redundancy. This can result in inefficiencies and even
inconsistent data.
• Complexity: Establishing and keeping up a relational database calls for specific
knowledge and abilities and can be difficult and time-consuming.
• Performance: Because more tables must be joined in order to access information,
performance may degrade as a database gets larger.
• Incapacity to manage unstructured data: Text documents, videos, and other
forms of semi-structured or unstructured data are not well-suited for the relational
paradigm.

Database System Concepts - 7th Edition 1.6 ©Silberschatz, Korth and Sudarshan
Relational Decomposition

• When a relation in the relational model is not in appropriate normal form then the
decomposition of a relation is required.
• In a database, it breaks the table into multiple tables.
• If the relation has no proper decomposition, then it may lead to problems like loss
of information.
• Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.

Database System Concepts - 7th Edition 1.7 ©Silberschatz, Korth and Sudarshan
Types of Decomposition

 LOSSLESS DECOMPOSITON
 LOSSY DECOMPOSITION

Database System Concepts - 7th Edition 1.8 ©Silberschatz, Korth and Sudarshan
 Lossless Decomposition
• If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
• The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
• The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.

Database System Concepts - 7th Edition 1.9 ©Silberschatz, Korth and Sudarshan
 Example:

 EMPLOYEE_DEPARTMENT table:

 The above relation is decomposed into two relations EMPLOYEE and


DEPARTMENT

Database System Concepts - 7th Edition 1.10 ©Silberschatz, Korth and Sudarshan
 EMPLOYEE table:

Database System Concepts - 7th Edition 1.11 ©Silberschatz, Korth and Sudarshan
 DEPARTMENT table

Database System Concepts - 7th Edition 1.12 ©Silberschatz, Korth and Sudarshan
 Employee ⋈ Department

 Hence, the decomposition is Lossless join decomposition.

Database System Concepts - 7th Edition 1.13 ©Silberschatz, Korth and Sudarshan
Lossy Decomposition

 Lossy Decomposition
 As the name suggests, when a relation is decomposed into two or
more relational schemas, the loss of information is unavoidable
when the original relation is retrieved.
 Let us see an example −
 <EmpInfo>

Emp_ID Emp_Name Emp_Ag Emp_Locati Dept_ID Dept_N


e on ame
E001 Jacob 29 Alabama Dpt1 Opera
tions
E002 Henry 32 Alabama Dpt2 HR
E003 Tom 22 Texas Dpt3 Fin
 Decompose the above table into two tables −

Database System Concepts - 7th Edition 1.14 ©Silberschatz, Korth and Sudarshan
 <EmpDetails>

Emp_ID Emp_Name Emp_Age Emp_Locatio


n
E001 Jacob 29 Alabama
E002 Henry 32 Alabama
E003 Tom 22 Texas
 <DeptDetails>

Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
 Now, you won’t be able to join the above tables,
since Emp_ID isn’t part of the DeptDetails relation.

Database System Concepts - 7th Edition 1.15 ©Silberschatz, Korth and Sudarshan
 Dependency Preserving
• It is an important constraint of the database.
• In the dependency preservation, at least one decomposed table must satisfy every
dependency.
• If a relation R is decomposed into relation R1 and R2, then the dependencies of R
either must be a part of R1 or R2 or must be derivable from the combination of
functional dependencies of R1 and R2.
• For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC). The relational R is decomposed into R1(ABC) and
R2(AD) which is dependency preserving because FD A->BC is a part of relation
R1(ABC).

Database System Concepts - 7th Edition 1.16 ©Silberschatz, Korth and Sudarshan
 Problem related to decomposition
 Here are some common problems related to decomposition in DBMS:
 1. Loss of Information:
 During the decomposition process, it is possible to lose some information
present in the original table. This occurs when critical relationships or
dependencies between attributes are not preserved properly in the decomposed
relations. As a result, queries might not return the expected results or might
become more complex.
 2. Anomalies:
 Decomposition can introduce anomalies like update, insertion, and
deletion anomalies. These occur when data inconsistencies arise due to poor
decomposition choices. For example, if a table is decomposed into two smaller
tables, updating information that involves both of these tables may lead to
inconsistencies.

Database System Concepts - 7th Edition 1.17 ©Silberschatz, Korth and Sudarshan
 3. Join Complexity:
 When a table is decomposed into multiple smaller relations, it often requires
performing joins to retrieve the original data. If the decomposition is not done
efficiently, these joins can become complex and slow down query performance
significantly.
 4. Redundancy:
 While the goal of decomposition is to eliminate redundancy, it is possible to
introduce new redundancy if the decomposition is not carefully planned.
Redundancy can lead to wasted storage space and can cause update anomalies.
 5. Increased Storage Space:
 In some cases, decomposition can lead to an increase in storage space. For
example, if the same data needs to be stored in multiple decomposed tables to
maintain referential integrity, it can result in duplication of data.
 6. Dependency Preservation:
 Decomposition should ideally preserve all the functional dependencies from the
original relation. However, it is not always possible to preserve all dependencies,
and some may be lost during the decomposition process. This can lead to
integrity issues

Database System Concepts - 7th Edition 1.18 ©Silberschatz, Korth and Sudarshan
Functional Dependency

 In a relational database management, functional dependency is a concept that


specifies the relationship between two sets of attributes where one attribute
determines the value of another attribute. It is denoted as X → Y, where the
attribute set on the left side of the arrow, X is called Determinant, and Y is called
the Dependent.

Roll_no name dept_name dept_building

42 abc CO A4

43 pqr IT A3

44 xyz CO A4

45 xyz IT A3

46 mno EC B2

47 jkl ME B2

Database System Concepts - 7th Edition 1.19 ©Silberschatz, Korth and Sudarshan
 From the above table we can conclude some valid functional dependencies:
• roll_no → { name, dept_name, dept_building },→ Here, roll_no can determine
values of fields name, dept_name and dept_building, hence a valid Functional
dependency
• roll_no → dept_name , Since, roll_no can determine whole set of {name,
dept_name, dept_building}, it can determine its subset dept_name also.
• dept_name → dept_building , Dept_name can identify the dept_building
accurately, since departments with different dept_name will also have a different
dept_building
• More valid functional dependencies: roll_no → name, {roll_no, name} ⇢
{dept_name, dept_building}, etc.
• Here are some invalid functional dependencies:
 dept_building → dept_name There can be multiple departments in the same
building. Example, in the above table departments ME and EC are in the same
building B2, hence dept_building → dept_name is an invalid functional
dependency.

Database System Concepts - 7th Edition 1.20 ©Silberschatz, Korth and Sudarshan
 Armstrong’s axioms/properties of functional dependencies:
1. Reflexivity: If Y is a subset of X, then X→Y holds by reflexivity rule
Example, {roll_no, name} → name is valid.
2. Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by
the augmentation rule.
Example, {roll_no, name} → dept_building is valid, hence {roll_no, name,
dept_name} → {dept_building, dept_name} is also valid.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also
valid by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then roll_no →
dept_building is also valid.

Database System Concepts - 7th Edition 1.21 ©Silberschatz, Korth and Sudarshan
 Types of Functional Dependencies
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
 1. Trivial Functional Dependency
 In Trivial Functional Dependency, a dependent is always a subset of the
determinant. i.e. If X → Y and Y is the subset of X, then it is called trivial
functional dependency

Database System Concepts - 7th Edition 1.22 ©Silberschatz, Korth and Sudarshan
 Example:

 roll_no name age


 42 abc 17
 43 pqr 18
 44 xyz 18

 Here, {roll_no, name} → name is a trivial functional dependency, since the


dependent name is a subset of determinant set {roll_no, name}. Similarly, roll_no
→ roll_no is also an example of trivial functional dependency.

Database System Concepts - 7th Edition 1.23 ©Silberschatz, Korth and Sudarshan
 2. Non-trivial Functional Dependency
 In Non-trivial functional dependency, the dependent is strictly not a subset of the
determinant. i.e. If X → Y and Y is not a subset of X, then it is called Non-trivial
functional dependency.
 Example:

 roll_no name age


 42 abc 17
 43 pqr 18
 44 xyz 18
 Here, roll_no → name is a non-trivial functional dependency, since the
dependent name is not a subset of determinant roll_no. Similarly, {roll_no,
name} → age is also a non-trivial functional dependency, since age is not a
subset of {roll_no, name}

Database System Concepts - 7th Edition 1.24 ©Silberschatz, Korth and Sudarshan
 3. Multivalued Functional Dependency
 In Multivalued functional dependency, entities of the dependent set are not
dependent on each other. i.e. If a → {b, c} and there exists no functional
dependency between b and c, then it is called a multivalued functional
dependency.
 For example,

 roll_no name age


 42 abc 17
 43 pqr 18
 44 xyz 18
 45 abc 19
 Here, roll_no → {name, age} is a multivalued functional dependency, since the
dependents name & age are not dependent on each other(i.e. name → age or age
→ name doesn’t exist !)

Database System Concepts - 7th Edition 1.25 ©Silberschatz, Korth and Sudarshan
 4. Transitive Functional Dependency
 In transitive functional dependency, dependent is indirectly dependent on
determinant. i.e. If a → b & b → c, then according to axiom of transitivity, a → c.
This is a transitive functional dependency.
 For example,

 enrol_no name dept building_no


 42 abc CO 4
 43 pqr EC 2
 44 xyz IT 1
 45 abc EC 2
 Here, enrol_no → dept and dept → building_no. Hence, according to the axiom
of transitivity, enrol_no → building_no is a valid functional dependency. This is
an indirect functional dependency, hence called Transitive functional dependency.

Database System Concepts - 7th Edition 1.26 ©Silberschatz, Korth and Sudarshan
 5. Fully Functional Dependency
 In full functional dependency an attribute or a set of attributes uniquely determines
another attribute or set of attributes. If a relation R has attributes X, Y, Z with the
dependencies X->Y and X->Z which states that those dependencies are fully
functional.
 6. Partial Functional Dependency
 In partial functional dependency a non key attribute depends on a part of the
composite key, rather than the whole key. If a relation R has attributes X, Y, Z
where X and Y are the composite key and Z is non key attribute. Then X->Z is a
partial functional dependency in RBDMS.

Database System Concepts - 7th Edition 1.27 ©Silberschatz, Korth and Sudarshan
 Advantages of Functional Dependencies
 Functional dependencies having numerous applications in the field of database
management system. Here are some applications listed below:
 1. Data Normalization
 Data normalization is the process of organizing data in a database in order to
minimize redundancy and increase data integrity. Functional dependencies play an
important part in data normalization. With the help of functional dependencies we
are able to identify the primary key, candidate key in a table which in turns helps
in normalization.
 2. Query Optimization
 With the help of functional dependencies we are able to decide the connectivity
between the tables and the necessary attributes need to be projected to retrieve the
required data from the tables. This helps in query optimization and improves
performance.

Database System Concepts - 7th Edition 1.28 ©Silberschatz, Korth and Sudarshan
 3. Consistency of Data
 Functional dependencies ensures the consistency of the data by removing any
redundancies or inconsistencies that may exist in the data. Functional dependency
ensures that the changes made in one attribute does not affect inconsistency in
another set of attributes thus it maintains the consistency of the data in database.
 4. Data Quality Improvement
 Functional dependencies ensure that the data in the database to be accurate,
complete and updated. This helps to improve the overall quality of the data, as
well as it eliminates errors and inaccuracies that might occur during data analysis
and decision making, thus functional dependency helps in improving the quality of
data in database.

Database System Concepts - 7th Edition 1.29 ©Silberschatz, Korth and Sudarshan
 Advantages of Functional Dependencies
• Through the identification and removal of redundant or unneeded data, they aid in
the reduction of data redundancy in databases.
• By guaranteeing that data is correct and consistent throughout the database, they
enhance data integrity.
• They make it simpler to add, edit, and remove data, which helps with database
management.
 Disadvantages of Functional Dependencies
• The process of identifying functional dependencies can be time-consuming and
complex, especially in large databases with many tables and relationships.
• Overly restrictive functional dependencies can result in slow query performance or
data inconsistencies, as data that should be related may not be properly linked.
• Functional dependencies do not take into account the semantic meaning of data,
and may not always reflect the true relationships between data elements.

Database System Concepts - 7th Edition 1.30 ©Silberschatz, Korth and Sudarshan
Finding Attribute Closure and Candidate Keys using
Functional Dependencies
 Steps to Find the Attribute Closure
 Q.1. Given the FD set of a Relation R, The attribute closure set S is the set of
Attribute Closure A.
• Add A to S.
• Recursively add attributes that can be functionally determined from attributes of
the set S until done.

E-ID E-NAME E-CITY E-STATE

E001 John Delhi Delhi

E002 Mary Delhi Delhi

E003 John Noida U.P.


From Table 1, FDs are

Database System Concepts - 7th Edition 1.31 ©Silberschatz, Korth and Sudarshan
 Given R (E-ID, E-NAME, E-CITY, E-STATE)
 FDs = { E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY-
>E-STATE }
 The attribute closure of E-ID can be calculated as:
• Add E-ID to the set {E-ID}
• Add Attributes that can be derived from any attribute of the set. In this case, E-
NAME and E-CITY, E-STATE can be derived from E-ID. So these are also a part
of the closure.
 As there is one other attribute remaining in relation to be derived from E-ID. So
the result is:
 (E-ID)+ = {E-ID, E-NAME, E-CITY, E-STATE }
 Similarly,
 (E-NAME)+ = {E-NAME}
 (E-CITY)+ = {E-CITY, E_STATE}

Database System Concepts - 7th Edition 1.32 ©Silberschatz, Korth and Sudarshan
 Q.2 Find the attribute closures of given FDs R(ABCDE) = {AB->C, B->D, C-
>E, D->A}. To find (B)+, we will add an attribute in the set using various FDs
which have been shown in the table below.

Attributes Added in Closure FD used

{B} Triviality

{B, D} B->D

{B, D, A} D->A

{B, D, A, C} AB->C

{B, D, A, C, E} C->E

 We can find (C, D)+ by adding C and D into the set (triviality) and then E
using(C->E) and then A using (D->A) and the set becomes.
 (C,D)+ = {C,D,E,A}
 Concepts
Database System Similarly,
- 7 we
th
can find (B, C)+ by adding
Edition B and C into the set (triviality)
1.33 andKorth
©Silberschatz, thenand Sudarshan
 Candidate Key
 Candidate Key is a minimal set of attributes of a relationship that can be used to
identify a tuple uniquely. For Example, each tuple of EMPLOYEE relation given
in Table 1 can be uniquely identified by E-ID and it is minimal as well. So it will
be the Candidate key of the relationship.
 A candidate key may or may not be a primary key.
 Super Key is a set of attributes of a relationship that can be used to identify a
tuple uniquely.
 For Example, each tuple of EMPLOYEE relation given in Table 1 can be
uniquely identified by E-ID or (E-ID, E-NAME) or (E-ID, E-CITY) or (E-ID,
E-STATE) or (E_ID, E-NAME, E-STATE), etc. So all of these are super keys of
EMPLOYEE relation.
 Finding Candidate Keys and Super Keys of a Relation using FD set.
 The set of attributes whose attribute closure is a set of all attributes of the relation
is called the super key of the relation.
 For Example, the EMPLOYEE relation shown in Table 1 has the following FD
set. {E-ID->E-NAME, E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-
STATE}.

Database System Concepts - 7th Edition 1.34 ©Silberschatz, Korth and Sudarshan
 Let us calculate the attribute closure of different sets of attributes:
 (E-ID)+ = {E-ID, E-NAME,E-CITY,E-STATE}
 (E-ID,E-NAME)+ = {E-ID, E-NAME,E-CITY,E-STATE}
 (E-ID,E-CITY)+ = {E-ID, E-NAME,E-CITY,E-STATE}
 (E-ID,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
 (E-ID,E-CITY,E-STATE)+ = {E-ID, E-NAME,E-CITY,E-STATE}
 (E-NAME)+ = {E-NAME}
 (E-CITY)+ = {E-CITY,E-STATE}
 As (E-ID)+, (E-ID, E-NAME)+, (E-ID, E-CITY)+, (E-ID, E-STATE)+, (E-ID, E-
CITY, E-STATE)+ give set of all attributes of relation EMPLOYEE. So all of
these are super keys of relation.

Database System Concepts - 7th Edition 1.35 ©Silberschatz, Korth and Sudarshan
 The minimal set of attributes whose attribute closure is a set of all attributes of
relation is called the candidate key of the relation.
 As shown above, (E-ID)+ is a set of all attributes of relation and it is minimal.
 So E-ID will be the candidate key. On the other hand (E-ID, E-NAME) + also is a
set of all attributes but it is not minimal because its subset (E-ID)+ is equal to the
set of all attributes.
 So (E-ID, E-NAME) is not a candidate key.

Database System Concepts - 7th Edition 1.36 ©Silberschatz, Korth and Sudarshan
Advantages of Attribute Closure
• Attribute closures help to identify all possible attributes that can be derived from a
set of given attributes.
• They facilitate database design by identifying relationships between attributes and
tables, which can help to optimize query performance.
• They ensure data consistency by identifying all possible combinations of attributes
that can exist in the database.
 Disadvantages of Attribute Closure
• The process of calculating attribute closures can be computationally expensive,
especially for large datasets.
• Attribute closures can become too complex to manage, especially as the number of
attributes and tables in a database grows.
• Attribute closures do not take into account the semantic meaning of data, and may
not always accurately reflect the relationships between data elements.

Database System Concepts - 7th Edition 1.37 ©Silberschatz, Korth and Sudarshan
 Prime and Non-Prime Attributes
 Attributes which are parts of any candidate key of relation are called as prime
attribute, others are non-prime attributes. For Example, STUD_NO in STUDENT
relation is prime attribute, others are non-prime attribute.

 Tools like functional dependency and attribute closure are helpful when designing
and optimizing databases. They are useful for:
• Determine the connections between the tables and the attributes.
• Boost query efficiency
• Ascertain data coherence.

Database System Concepts - 7th Edition 1.38 ©Silberschatz, Korth and Sudarshan
 GATE Questions
 Q.1: Consider the relation scheme R = {E, F, G, H, I, J, K, L, M, N} and the
set of functional dependencies {{E, F} -> {G}, {F} -> {I, J}, {E, H} -> {K, L}, K
-> {M}, L -> {N} on R. What is the key for R? (GATE-CS-2014)
 A. {E, F}
 B. {E, F, H}
 C. {E, F, H, K, L}
 D. {E}

Database System Concepts - 7th Edition 1.39 ©Silberschatz, Korth and Sudarshan
 Solution:

 Finding attribute closure of all given options, we get:


 {E,F}+ = {EFGIJ}
 {E,F,H}+ = {EFHGIJKLMN}
 {E,F,H,K,L}+ = {{EFHGIJKLMN}
 {E}+ = {E}
 {EFH}+ and {EFHKL}+ results in set of all attributes, but EFH
is minimal. So it will be candidate key. So correct option is
(B).

Database System Concepts - 7th Edition 1.40 ©Silberschatz, Korth and Sudarshan
 In a schema with attributes A, B, C, D and E following set of functional
dependencies are given
{A -> B, A -> C, CD -> E, B -> D, E -> A}
Which of the following functional dependencies is NOT implied by the above
set?
 A. CD -> AC
 B. BD -> CD
 C. BC -> CD
 D. AC -> BC
 Using FD set given in question,
 (CD)+ = {CDEAB} which means CD -> AC also holds true.
 (BD)+ = {BD} which means BD -> CD can’t hold true. So this
FD is no implied in FD set. So (B) is the required option.
 Others can be checked in the same way.

Database System Concepts - 7th Edition 1.41 ©Silberschatz, Korth and Sudarshan
 Consider a relation scheme R = (A, B, C, D, E, H) on which the following
functional dependencies hold: {A–>B, BC–> D, E–>C, D–>A}. What are the
candidate keys of R?
 (a) AE, BE
 (b) AE, BE, DE
 (c) AEH, BEH, BCH
 (d) AEH, BEH, DEH
(AE)+ = {ABECD} which is not a set of all attributes. So AE is not a candidate
key.
 Hence options A and B are wrong.
(AEH)+ = {ABCDEH}
(BEH)+ = {BEHCDA}
(BCH)+ = {BCHDA} which is not set of all attributes.
 So BCH is not a candidate key. Hence option C is wrong.
So correct answer is D.

Database System Concepts - 7th Edition 1.42 ©Silberschatz, Korth and Sudarshan
Normalization

 A large database defined as a single relation may result in data duplication. This
repetition of data may result in:
• Making relations very large.
• It isn't easy to maintain and update data as it would involve searching many
records in relation.
• Wastage and poor utilization of disk space and resources.
• The likelihood of errors and inconsistencies increases.
 So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that are satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.

Database System Concepts - 7th Edition 1.43 ©Silberschatz, Korth and Sudarshan
 What is Normalization?

• Normalization is the process of organizing the data in the database.


• Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
• Normalization divides the larger table into smaller and links them using
relationships.
• The normal form is used to reduce redundancy from the database table.

Database System Concepts - 7th Edition 1.44 ©Silberschatz, Korth and Sudarshan
 Why do we need Normalization?
 The main reason for normalizing the relations is removing these anomalies. Failure
to eliminate anomalies leads to data redundancy and can cause data integrity and
other problems as the database grows. Normalization consists of a series of
guidelines that helps to guide you in creating a good database structure.
 Data modification anomalies can be categorized into three types:
• Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new
tuple into a relationship due to lack of data.
• Deletion Anomaly: The delete anomaly refers to the situation where the deletion
of data results in the unintended loss of some other important data.
• Updatation Anomaly: The update anomaly is when an update of a single data
value requires multiple rows of data to be updated.

Database System Concepts - 7th Edition 1.45 ©Silberschatz, Korth and Sudarshan
 Types of Normal Forms:
 Normalization works through a series of stages called Normal forms. The normal
forms apply to individual relations. The relation is said to be in particular normal
form if it satisfies constraints.
 Following are the various types of Normal forms:

Database System Concepts - 7th Edition 1.46 ©Silberschatz, Korth and Sudarshan
Normal Form Description

1NF A relation is in 1NF if it contains an atomic value.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent
on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transition dependency exists.

BCNF A stronger definition of 3NF is known as Boyce Codd's normal form.

4NF A relation will be in 4NF if it is in Boyce Codd's normal form and has no multi-valued
dependency.

5NF A relation is in 5NF. If it is in 4NF and does not contain any join dependency, joining should be
lossless.

Database System Concepts - 7th Edition 1.47 ©Silberschatz, Korth and Sudarshan
 Advantages of Normalization
• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
 Disadvantages of Normalization
• You cannot start building the database before knowing what the user needs.
• The performance degrades when normalizing the relations to higher normal forms,
i.e., 4NF, 5NF.
• It is very time-consuming and difficult to normalize relations of a higher degree.
• Careless decomposition may lead to a bad database design, leading to serious
problems.

Database System Concepts - 7th Edition 1.48 ©Silberschatz, Korth and Sudarshan
First Normal Form (1NF)

• A relation will be 1NF if it contains an atomic value.


• It states that an attribute of a table cannot hold multiple values. It must hold only
single-valued attribute.
• First normal form disallows the multi-valued attribute, composite attribute, and
their combinations.
 Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute
EMP_PHONE.
 EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302

Database System Concepts - 7th Edition 1.49 ©Silberschatz, Korth and Sudarshan
 The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab

Database System Concepts - 7th Edition 1.50 ©Silberschatz, Korth and Sudarshan
 Second Normal Form (2NF)
• In the 2NF, relational must be in 1NF.
• In the second normal form, all non-key attributes are fully functional dependent on
the primary key
 Example: Let's assume, a school can store the data of teachers and the subjects
they teach. In a school, a teacher can teach more than one subject.
 TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE


25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

Database System Concepts - 7th Edition 1.51 ©Silberschatz, Korth and Sudarshan
 In the given table, non-prime attribute TEACHER_AGE is dependent on
TEACHER_ID which is a proper subset of a candidate key. That's why it violates
the rule for 2NF.
 To convert the given table into 2NF, we decompose it into two tables:
 TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

 TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
Database System Concepts - 7th Edition 1.52 ©Silberschatz, Korth and Sudarshan
 Third Normal Form (3NF)
• A relation will be in 3NF if it is in 2NF and not contain any transitive partial
dependency.
• 3NF is used to reduce the data duplication. It is also used to achieve the data
integrity.
• If there is no transitive dependency for non-prime attributes, then the relation must
be in third normal form.
 A relation is in third normal form if it holds atleast one of the following conditions
for every non-trivial function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Database System Concepts - 7th Edition 1.53 ©Silberschatz, Korth and Sudarshan
 Example:
 EMPLOYEE_DETAIL table:

EMP_ID EMP_NA EMP_ZIP EMP_STA EMP_CIT


ME TE Y
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super key in the table above:


{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Database System Concepts - 7th Edition 1.54 ©Silberschatz, Korth and Sudarshan
 Non-prime attributes: In the given table, all attributes except EMP_ID are non-
prime.
 Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP
dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY)
transitively dependent on super key(EMP_ID). It violates the rule of third normal
form.
 That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
 EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP


222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

Database System Concepts - 7th Edition 1.55 ©Silberschatz, Korth and Sudarshan
 EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY


201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal

Database System Concepts - 7th Edition 1.56 ©Silberschatz, Korth and Sudarshan
Boyce Codd normal form (BCNF)

• BCNF is the advance version of 3NF. It is stricter than 3NF.


• A table is in BCNF if every functional dependency X → Y, X is the super key of
the table.
• For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
 Example: Let's assume there is a company where employees work in more than
one department.
 EMPLOYEE table:

EMP_ID EMP_COU EMP_DEPT DEPT_TYP EMP_DEPT


NTRY E _NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549

Database System Concepts - 7th Edition 1.57 ©Silberschatz, Korth and Sudarshan
 In the above table Functional dependencies are as follows:
 EMP_ID → EMP_COUNTRY
 EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
 Candidate key: {EMP-ID, EMP-DEPT}
 The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are
keys.
 To convert the given table into BCNF, we decompose it into three tables:
 EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India

 EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
Database System Concepts - 7th Edition 1.58 ©Silberschatz, Korth and Sudarshan
 EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549

Database System Concepts - 7th Edition 1.59 ©Silberschatz, Korth and Sudarshan
 Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
 Candidate keys:
 For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
 Now, this is in BCNF because left side part of both the functional dependencies is
a key.

Database System Concepts - 7th Edition 1.60 ©Silberschatz, Korth and Sudarshan
 Fourth normal form (4NF)
• A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-
valued dependency.
• For a dependency A → B, if for a single value of A, multiple values of B exists,
then the relation will be a multi-valued dependency.
 Example
STUDENT

STU_ID COURSE HOBBY


21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two
independent entity. Hence, there is no relationship between COURSE and HOBBY.

Database System Concepts - 7th Edition 1.61 ©Silberschatz, Korth and Sudarshan
 In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is
a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of
data.
 So to make the above table into 4NF, we can decompose it into two tables:
 STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

Database System Concepts - 7th Edition 1.62 ©Silberschatz, Korth and Sudarshan
 STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey

Database System Concepts - 7th Edition 1.63 ©Silberschatz, Korth and Sudarshan
Fifth normal form (5NF)

• A relation is in 5NF if it is in 4NF and not contains any join dependency and
joining should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in
order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
 Example

SUBJECT LECTURER SEMESTER


Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1

Database System Concepts - 7th Edition 1.64 ©Silberschatz, Korth and Sudarshan
 In the above table, John takes both Computer and Math class for Semester 1 but he
doesn't take Math class for Semester 2. In this case, combination of all these fields
required to identify a valid data.
 Suppose we add a new Semester as Semester 3 but do not know about the subject
and who will be taking that subject so we leave Lecturer and Subject as NULL.
But all three columns together acts as a primary key, so we can't leave other two
columns blank.
 So to make the above table into 5NF, we can decompose it into three relations P1,
P2 & P3:
 P1

SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math

Database System Concepts - 7th Edition 1.65 ©Silberschatz, Korth and Sudarshan
 P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen

 P3

SEMSTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen

Database System Concepts - 7th Edition 1.66 ©Silberschatz, Korth and Sudarshan
Multi valued Dependencies

 Multivalued dependency occurs when two attributes in a


table are independent of each other but, both depend on a
third attribute.
 A multivalued dependency consists of at least two
attributes that are dependent on a third attribute that's why
it always requires at least three attributes.
 Example: Suppose there is a bike manufacturer company which produces two
colors(white and black) of each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2011 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black

Database System Concepts - 7th Edition 1.67 ©Silberschatz, Korth and Sudarshan
 Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL
and independent of each other.
 In this case, these two columns can be called as multivalued dependent on
BIKE_MODEL. The representation of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
 This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and
"BIKE_MODEL multidetermined COLOR".

Database System Concepts - 7th Edition 1.68 ©Silberschatz, Korth and Sudarshan
Join Dependency

• Join decomposition is a further generalization of Multivalued dependencies.


• If the join of R1 and R2 over C is equal to relation R, then we can say that a join
dependency (JD) exists.
• Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given
relations R (A, B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R.
• A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a
lossless-join decomposition.
• The *(A, B, C, D), (C, D) will be a JD of R if the join of join's attribute is equal to
the relation R.
• Here, *(R1, R2, R3) is used to indicate that relation R1, R2, R3 and so on are a JD
of R.

Database System Concepts - 7th Edition 1.69 ©Silberschatz, Korth and Sudarshan
 Dependency Preservation: A Decomposition D = { R1, R2,
R3…Rn } of R is dependency preserving wrt a set F of
Functional dependency if

 (F1 ? F2 ? … ? Fm)+ = F+.


 Consider a relation R
 R ---> F{...with some functional dependency(FD)....}

 R is decomposed or divided into R1 with FD { f1 } and R2


with { f2 }, then
 there can be three cases:

 f1 U f2 = F -----> Decomposition is dependency preserving.


 f1 U f2 is a subset of F -----> Not Dependency preserving.
 f1 U f2 is a super set of F -----> This case is not possible.

Database System Concepts - 7th Edition 1.70 ©Silberschatz, Korth and Sudarshan
 Problem:
 Let a relation R (A, B, C, D ) and functional dependency {AB –> C, C –> D, D –>
A}. Relation R is decomposed into R1( A, B, C) and R2(C, D). Check whether
decomposition is dependency preserving or not.
 R1(A, B, C) and R2(C, D)

 Let us find closure of F1 and F2


 To find closure of F1, consider all combination of
 ABC. i.e., find closure of A, B, C, AB, BC and AC
 Note ABC is not considered as it is always ABC

 closure(A) = { A } // Trivial
 closure(B) = { B } // Trivial
 closure(C) = {C, A, D} but D can't be in closure as D is not
present R1.
 = {C, A}
 C--> A // Removing C from right side as it is trivial attribute

Database System Concepts - 7th Edition 1.71 ©Silberschatz, Korth and Sudarshan
 closure(AB) = {A, B, C, D}
 = {A, B, C}
 AB --> C // Removing AB from right side as these are trivial
attributes

 closure(BC) = {B, C, D, A}
 = {A, B, C}
 BC --> A // Removing BC from right side as these are trivial
attributes

 closure(AC) = {A, C, D}
 NULL SET
 F1 {C--> A, AB --> C, BC --> A}.
 Similarly F2 { C--> D }

 In the original Relation Dependency { AB --> C , C --> D , D -->


A}.
 AB --> C is present in F1.
 C --> D is present in F2.
 D --> A is not preserved.
 Concepts
Database System F1 U F2
th
- 7 is a subset of F. So given
Edition 1.72 decomposition is not Korth and Sudarshan
©Silberschatz,

You might also like