0% found this document useful (0 votes)
18 views17 pages

Unit 3 - Part 2

The document discusses schema refinement in database management, focusing on reducing redundancy and improving data integrity through normalization techniques such as FIRST, SECOND, and THIRD normal forms, as well as BCNF. It outlines the problems caused by redundancy, including update, insertion, and deletion anomalies, and explains the process of decomposition to eliminate these issues while preserving functional dependencies. Additionally, it covers functional dependencies, attribute closure, and types of dependencies essential for maintaining relational database integrity.

Uploaded by

waliwos724
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views17 pages

Unit 3 - Part 2

The document discusses schema refinement in database management, focusing on reducing redundancy and improving data integrity through normalization techniques such as FIRST, SECOND, and THIRD normal forms, as well as BCNF. It outlines the problems caused by redundancy, including update, insertion, and deletion anomalies, and explains the process of decomposition to eliminate these issues while preserving functional dependencies. Additionally, it covers functional dependencies, attribute closure, and types of dependencies essential for maintaining relational database integrity.

Uploaded by

waliwos724
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

DATABASE MANAGEMENT SYSTEM CSE(AI & ML)

UNIT-3- PART-2
Schema Refinement: Problems caused by redundancy, decompositions, problems related to
decomposition, reasoning about functional dependencies, FIRST, SECOND, THIRD
normal forms, BCNF, lossless join decomposition, multi-valued dependencies, FOURTH
normal form, FIFTH normal form.

Schema Refinement:
Schema refinement is the process of systematically organizing the attributes and tables of a relational
database to
● Reduce redundancy
● Improve data integrity.
The goal is to eliminate undesirable characteristics like
● insertion anomalies
● update anomalies and
● deletion anomalies
which can lead to data inconsistency.
This process involves decomposing a large table into smaller, well-structured tables without losing any data
or introducing redundancy.

Problems caused by redundancy


Redundancy in a database occurs when the same piece of data is stored in multiple places.

This can lead to several issues, including data anomalies, increased storage requirements, and maintenance
difficulties.

Here are some of the primary problems caused by redundancy:

1. Update Anomalies:
○ When the same data is stored in multiple locations, updating one instance of the data without
updating the others can lead to inconsistencies.
○ Example: If a customer's address is stored in multiple tables, changing the address in one
table but not the others results in inconsistent data.
○ Example 2:

2. Insertion Anomalies:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
○ Redundancy can prevent the insertion of data due to the need to provide duplicated
information.
○ Example 1: If a new course is to be added to a university database, it may require a student
to be enrolled in it immediately if the course and student information are stored together.
○ Example 2:

3. Deletion Anomalies:
○ Deleting data in one place may inadvertently cause the loss of valuable data stored in another
place.
○ Example 1: If a student's enrollment record is deleted and that record contains the only copy
of the student's contact information, the contact information is also lost.
○ Example 2:

4. Increased Storage Costs:


○ Storing the same data multiple times leads to unnecessary consumption of storage space.
○ Example: If customer details are stored in every order record instead of linking to a single
customer record, storage requirements are significantly higher.
5. Data Inconsistency:
○ Redundant data can lead to inconsistencies if the duplicated values do not match across
different places.
○ Example: If an employee's salary is updated in one table but not in another, it leads to
conflicting information within the database.
6. Maintenance Overhead:
○ Keeping redundant data consistent requires additional effort and time, increasing the
complexity of maintaining the database.
○ Example: Regular audits and synchronization efforts are needed to ensure that all copies of
the data remain consistent.

Example to Illustrate Problems Caused by Redundancy

Consider an unnormalized database schema for an online store:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
Initial Schema (Unnormalized)
CREATE TABLE Orders (

order_id INT,

customer_id INT,

customer_name VARCHAR(255),

customer_address VARCHAR(255),

product_id INT,

product_name VARCHAR(255),

order_date DATE,

quantity INT

);

Problems Illustrated:

● Update Anomaly: If a customer's address changes, every order record for that customer must be updated. If
one record is missed, the database will have inconsistent data.
● Insertion Anomaly: A new product cannot be added to the product catalog without creating an order for it,
which forces the insertion of irrelevant order data.
● Deletion Anomaly: Deleting an order record may result in losing the only copy of the customer's information
if that customer has made only one order.
● Increased Storage Costs: Customer and product information are repeated in every order record, consuming
more storage space than necessary.
● Data Inconsistency: If a product name is changed in one order record but not in others, it leads to
inconsistent product information.

Decompositions
Decomposition refers to the process of breaking down a large, complex table (relation) into smaller, more
manageable tables without losing information.

This is done to achieve a better design by eliminating redundancy, avoiding anomalies, and improving data
integrity.

The goal is to ensure that the resulting tables are in a normal form that meets the requirements of a relational
database.

Types of Decomposition

1. Lossless-Join Decomposition:
○ Ensures that no information is lost when the table is decomposed and then joined back
together.
○ The original table can be reconstructed by performing a natural join on the decomposed
tables.
2. Dependency-Preserving Decomposition:
○ Ensures that all functional dependencies are preserved in the decomposed tables.
○ This is important for maintaining the integrity constraints of the database.

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
Steps in Decomposition

1. Identify Functional Dependencies:


○ Determine how attributes depend on one another.
○ Identify keys and candidate keys for the table.
2. Apply Normalization Rules:
○ Decompose the table to achieve the desired normal form (e.g., 1NF, 2NF, 3NF, BCNF).
3. Ensure Lossless-Join and Dependency Preservation:
○ Verify that the decomposition is lossless and that all functional dependencies are preserved.

Example of Decomposition

Consider an initial table that contains student enrollment data, including information about students, courses,
and instructors.

Initial Schema (Unnormalized)

CREATE TABLE Enrollment (

student_id INT,

student_name VARCHAR(255),

course_id INT,

course_name VARCHAR(255),

instructor_name VARCHAR(255)

);

Problems in the Initial Schema:

● Redundancy: The same student and course information may be repeated.


● Anomalies: Update, insertion, and deletion anomalies can occur.

Steps of Decomposition

1. Identify Functional Dependencies:


○ student_id -> student_name
○ course_id -> course_name, instructor_name
2. Apply Normalization:
First Normal Form (1NF):
○ Ensure atomicity of values. In this case, the table is already in 1NF.
3. Second Normal Form (2NF):
○ Ensure that all non-key attributes are fully functionally dependent on the primary key.
○ Decompose the table to eliminate partial dependencies.

CREATE TABLE Students (

student_id INT PRIMARY KEY,

student_name VARCHAR(255)

);
KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)

CREATE TABLE Courses (

course_id INT PRIMARY KEY,

course_name VARCHAR(255),

instructor_name VARCHAR(255)

);

CREATE TABLE Enrollments (

enrollment_id INT PRIMARY KEY,

student_id INT,

course_id INT,

FOREIGN KEY (student_id) REFERENCES Students(student_id),

FOREIGN KEY (course_id) REFERENCES Courses(course_id)

);

4. Ensure Lossless-Join and Dependency Preservation:


○ The original table can be reconstructed by joining the decomposed tables.
○ All functional dependencies are preserved in the decomposed tables.

Benefits of Decomposition

● Reduces Redundancy: Each piece of data is stored only once, reducing the chance of inconsistency.
● Eliminates Anomalies: Properly decomposed tables avoid update, insertion, and deletion anomalies.
● Improves Data Integrity: Functional dependencies and constraints are easier to enforce.
● Enhances Performance: Smaller tables can be more efficiently managed and queried.

Lossless-Join and Dependency Preservation Example


Initial Schema (Unnormalized)

CREATE TABLE EmployeeProject (

emp_id INT,

emp_name VARCHAR(255),

proj_id INT,

proj_name VARCHAR(255)

);

Functional Dependencies:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
● emp_id -> emp_name
● proj_id -> proj_name

Decomposed Schema:

Employees Table:
CREATE TABLE Employees (

emp_id INT PRIMARY KEY,

emp_name VARCHAR(255)

);

1.

Projects Table:
sql
Copy code
CREATE TABLE Projects (

proj_id INT PRIMARY KEY,

proj_name VARCHAR(255)

);

2.

EmployeeProject Table:
sql
Copy code
CREATE TABLE EmployeeProject (

emp_id INT,

proj_id INT,

PRIMARY KEY (emp_id, proj_id),

FOREIGN KEY (emp_id) REFERENCES Employees(emp_id),

FOREIGN KEY (proj_id) REFERENCES Projects(proj_id)

);

Benefits:

● Lossless-Join: The original EmployeeProject table can be reconstructed by joining the Employees and
Projects tables.
● Dependency Preservation: All functional dependencies are maintained in the decomposed tables.

Functional Dependencies
Functional Dependencies (FDs) are a fundamental concept in the theory of relational databases. They
express a relationship between attributes in a relation.
KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
Definition: A functional dependency, denoted by X → Y, between two sets of attributes X and Y in a relation
R means that if two tuples (rows) of R agree on the values of attributes in X, they must also agree on the
values of attributes in Y.

Example

Consider the following Student table:

Functional Dependencies in the Student Table

1. StudentID → Name, Age, Major


○ StudentID uniquely determines the Name, Age, and Major of the student.
○ If two tuples have the same StudentID, they must have the same Name, Age, and Major.
2. Name → Major
○ Name determines the Major.
○ If two tuples have the same Name, they must have the same Major.

However, the second dependency (Name → Major) might not always be true in all contexts, as different
students might have the same name but different majors. It's important to define dependencies based on the
specific dataset and business rules.

Armstrong's Axioms

Armstrong's Axioms are a set of rules used to infer all the functional dependencies on a relational database.
The axioms are:

1. Reflexivity: If Y is a subset of X, then X → Y.


2. Augmentation: If X → Y, then XZ → YZ for any Z.
3. Transitivity: If X → Y and Y → Z, then X → Z.

Example Using Armstrong's Axioms

1. Given the functional dependencies:


● A→B
● B→C

We can infer:

● Using transitivity: A → C
2. Given a relation R with attributes W, U, V, X, Y, Z and functional dependencies: W ->UV, U
->Y, VX->YZ. Prove that it holds: WX->Z.

Solution:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
1. (with decomposition) From W->U V we take W ->V.
2. (with augmentation) WX ->VX.
3. (with transitivity) WX->YZ.
4. (with decomposition) WX->Z.

Attribute Closure in DBMS

Attribute Closure of an attribute set is a fundamental concept in the theory of functional dependencies in
relational databases. It helps determine all the attributes that can be functionally determined from a given set
of attributes.

Definition: The closure of an attribute set X, denoted as X+, is the set of attributes that can be functionally
determined from X alphaα using a given set of functional dependencies F.

Steps to Compute Attribute Closure

1. Initialize the closure set with the given attribute set.


2. Iterate over the functional dependencies:
○ If a functional dependency's left-hand side (LHS) is a subset of the closure set, add the
right-hand side (RHS) of that dependency to the closure set.
3. Repeat step 2 until no more attributes can be added to the closure set.

Example 1

Consider a relation R with attributes {A,B,C,D,E}and the following set of functional dependencies F:

1. A→B
2. B→C
3. A→D
4. D→E

We want to find the closure of {A}, denoted as {A}+.

Step-by-Step Calculation

1. Initialize:
○ Start with the given attribute set: {A}.
○ Closure = {A}
2. First Iteration:
○ A→B(Since A⊆{A}, add B to the closure).
○ Closure = {A,B}.
3. Second Iteration:
○ B→C (Since B⊆{A,B}, add C to the closure).
○ Closure = {A,B,C}.
○ A→D(Since A⊆{A,B,C}A \subseteq \{A, B, C\}A⊆{A,B,C}, add D to the closure).
○ Closure = {A,B,C,D}.
4. Third Iteration:
KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
○ D→E (Since D⊆{A,B,C,D}, add E to the closure).
○ Closure = {A,B,C,D,E}.
5. Fourth Iteration:
○ No new attributes can be added since all functional dependencies' LHS attributes are already
in the closure set.

Thus, the closure of {A} is {A,B,C,D,E}

Example:2

Consider a relation R(A,B,C,D,E,F)

F: E->A, E->D, A->C, A->D, AE->F, AG->K.

Find the closure of E or E+

The closure of E or E+ is as follows −

E+ = E

=EA {for E->A add A}

=EAD {for E->D add D}

=EADC {for A->C add C}

=EADC {for A->D D already added}

=EADCF {for AE->F add F}

=EADCF {for AG->K don’t add k AG ⊄ D+)

Example 3

Let the relation R(A,B,C,D,E,F)

F: B->C, BC->AD, D->E, CF->B. Find the closure of B.

Solution

The closure for B is as follows −

B+ = {B,C,A,D,E}

Closure is used to find the candidate keys of R and compute F+

Candidate key of R: X is a candidate keys of R if X->{R}

For example,

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
R(A,B,C,D,E,F) WHERE F:A->BC, B->D, C->DE, BC->F. Then, find the candidate keys of R.

Solution

A+= {A,B,C,D,E,F}={R}=>A is a candidate key

B+= {B,D} => B is not a candidate key

C+= {C,D,E} => C is not a candidate key

BC+= {B,C,D,E,F} => BC is not a candidate key

Closure of F (F+): F+ is the set of all FDs that can be inferred/ derived from F. Using Armstrong Axioms
repeatedly on F, we can compute all the FDs.

Types of Functional Dependencies:

Various types of functional dependencies help in normalizing databases and ensuring data
integrity.

There are the different types of functional dependencies:

1. Trivial Functional Dependency

A functional dependency X→Y is trivial if Y is a subset of X. In other words, it doesn't provide any new
information because the right-hand side is already included in the left-hand side.

Example: If X={A,B}, then A→A and A,B→B are trivial functional dependencies.

2. Non-Trivial Functional Dependency

A functional dependency X→Y is non-trivial if Y is not a subset of X. This type of dependency provides
new information.

Example: If X={A}and Y={B}, then A→BA \rightarrow BA→B is a non-trivial functional dependency.

3. Multivalued Dependency (MVD)

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
A multivalued dependency X→→Y exists if, for each value of X, there is a set of values for Y and Z (where
Z is the set of attributes not in X or Y) such that the values of X determine the set of values for Y
independently of the values of Z.

Example: In a relation with attributes {StudentID,Course,Hobby}, the dependency


StudentID→→HobbyStudentID indicates that a student's hobbies are independent of the courses they take.

The functional dependencies


car model -> manufr_year
car model -> colour
There are multivalued dependency since manufr_year and color both are multivalued attribute

4. Transitive Dependency

A transitive dependency exists when there is an indirect relationship between attributes. If X→Y and Y→Z
then X→Z is a transitive dependency.

Example: Consider a relation with attributes {A,B,C} and dependencies A→B and B→C. The dependency
A→C is transitive.

5. Partial Dependency

A partial dependency exists when a non-prime attribute is functionally dependent on a part (not the whole)
of a candidate key.

Example: In a relation R(A,B,C) where (A,B) is a candidate key, if A→C holds, then C is partially
dependent on A because A is part of the candidate key.

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)

6. Full Dependency

A full dependency exists when a non-prime attribute is functionally dependent on the whole candidate key,
not just a part of it.

Example: In a relation R(A,B,C) where (A,B) is a candidate key, if (A,B)→C holds, then C is fully
dependent on A and B.

Normalization
Normalization is a database design technique that organizes tables in a manner that reduces redundancy and
dependency. It involves dividing large tables into smaller tables and defining relationships among them to
increase the clarity and efficiency of the database structure.

The goal of normalization is to eliminate redundant data and ensure data dependencies make sense to
minimize update anomalies, insert anomalies, and delete anomalies.

Normal Forms (NF)

There are several normal forms used in database normalization, each with a specific set of rules to follow.
Here, we will explain the different normal forms with clear examples.

1. First Normal Form (1NF)

Definition: A table is in 1NF if:

● All columns contain atomic (indivisible) values.


● Each column contains values of a single type.
● Each column has a unique name.
● The order in which data is stored does not matter.

Example:

Consider a table Students that records student information:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)

To convert this to 1NF, we need to ensure that each column contains atomic values:

2. Second Normal Form (2NF)

Definition: A table is in 2NF if:

● It is in 1NF.
● All non-key attributes are fully functionally dependent on the primary key.

Example:

Consider a table StudentCourses that is already in 1NF:

To convert this to 2NF, we need to remove partial dependencies by creating a new table for Courses:

Students Table:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)

3. Third Normal Form (3NF)

Definition: A table is in 3NF if:

● It is in 2NF.
● All the attributes are functionally dependent only on the primary key.

Example:

Consider the Courses table from the previous example that may have an additional attribute Department:

To convert this to 3NF, we need to remove transitive dependencies by creating a new table for Departments:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)

4. Boyce-Codd Normal Form (BCNF)

Definition: A table is in BCNF if:

● It is in 3NF.
● For every one of its non-trivial functional dependencies X→Y, X is a super key.

Example:

Consider the StudentCourses table:

If an instructor can only teach one course, we have an issue as Instructor is not a super key. To convert to
BCNF, split the tables further:

StudentCourses Table:

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)

5. Fourth Normal Form (4NF)

Definition: A table is in 4NF if:

● It is in BCNF.
● It has no multi-valued dependencies.

Example:

Consider a table StudentsHobbies:

6. Fifth Normal Form(5NF)

Definition: A table is in 5NF if:


● It is in 4NF.
● It has no join dependency.

KMEC A.Y:2023-24
DATABASE MANAGEMENT SYSTEM CSE(AI & ML)
5NF deals with cases where information can be reconstructed from smaller pieces of information that cannot
be combined in any smaller form.

KMEC A.Y:2023-24

You might also like