Shameer Khan Notes
What is Normalization?
Normalization is a process in database design used to organize attributes
into tables to minimize data redundancy. The goal is to create a "good"
database design that is free of inconsistencies and data loss. It is based
on the concept of functional dependency.
Data Redundancy and Anomalies
The main goal of database design is to reduce data redundancy, which is
when the same piece of information is repeated in multiple rows.
Redundancy leads to three types of problems called anomalies.
Insertion Anomaly: You can't insert a valid piece of data because
another piece of data is missing.
o Example: In a combined Student-Department table, you can't
add a new department until a student is assigned to it,
because the StudentID (the primary key) would be null.
Update Anomaly: To update a single fact, you have to change
multiple rows. If you miss one, the data becomes inconsistent.
o Example: If a department's phone number changes, you must
update the record for every single student in that department.
Deletion Anomaly: Deleting one piece of data unintentionally
causes other, unrelated data to be lost.
o Example: If you delete the record of the last student in a
department, the information about the department itself (like
its name and phone number) is also deleted from the database.
Functional Dependency (FD)
Functional dependency is the core concept behind normalization. It
describes a relationship where the value of one attribute (or set of
attributes) uniquely determines the value of another attribute.
Notation: X -> Y
Meaning: "X functionally determines Y". If you know the value of
X, you can find the single, unique value of Y.
Terminology: X is the determinant and Y is the dependent.
Example: In an employee table, Ssn -> Ename (a Social
Security Number determines a unique Employee Name). However,
Ename -> Ssn is not a valid dependency, as multiple
employees could have the same name.
Types of Functional Dependencies
Full Functional Dependency: An attribute depends on a full
composite key, not just a part of it. The dependency would not
hold if any attribute were removed from the determinant.
o Example: (Ssn, Pnumber) -> Hours. The number of
hours worked depends on both the employee (Ssn) and the
project (Pnumber) combined.
Partial Functional Dependency: An attribute depends on only a
part of a composite key.
o Example: Given the key (StudentID, CourseID), the
dependency CourseID -> CourseName is a partial
dependency because CourseName only depends on
CourseID, not the full key.
Transitive Functional Dependency: An indirect dependency
where a non-key attribute depends on another non-key attribute.
o Example: If StudentID -> DeptID and DeptID ->
DeptName, then DeptName is transitively dependent on
StudentID.
Trivial Functional Dependency: A dependency where the
dependent attribute is a subset of the determinant attribute.
o Example: (StudentID, StudentName) ->
StudentName. This is trivial because StudentName is
already part of the determinant.