NORMALIZATION
INTRODUCTION
Normalization is a process of organizing data in a database to reduce
redundancy and improve data consistency.
Primary keys are really important in organizing information in a
database.
They help to make sure that every row in a table has a unique
identification so that nothing gets mixed up or lost.
Normalization
Normalization is a process of organizing the data in database to avoid
data redundancy, insertion anomaly, update anomaly & deletion anomaly.
Normalization is the process of organizing data in a database to minimize
redundancy and dependency.
In database design, there are different normal forms based on the primary
keys of a table. These include −
First Normal Form (1NF)
1NF requires that each column in a table contains atomic values and that
each row is uniquely identified.
This means that a table cannot have repeating groups or arrays as
columns, and each row must have a unique primary key.
Example
A table is in 1NF if each column contains atomic values and each row is
uniquely identified. For example, a table that lists customers and their phone
numbers −
Customer
Name Phone Numbers
ID
1 John 555-1234, 555-5678
2 Jane 555-9876
3 Michael 555-5555
John is having two phone nos as reapeating groups.
This violates 1NF because the Phone Numbers column contains repeating
groups.
To normalize this table to 1NF, we can split the Phone Numbers column into
separate rows and add a separate primary key column −
Custome
Name Phone Number
r ID
1 John 555-1234
1 John 555-5678
2 Jane 555-9876
3 Michael 555-5555
Second Normal Form (2NF)
2NF builds on 1NF by requiring that each non-primary key column in a
table is fully functionally dependent on the primary key.
This means that a table should not have partial dependencies, where a
non-primary key column depends on only part of the primary key.
Example
A table is in 2NF if each non-primary key column is fully functionally
dependent on the primary key.
For example, a table that lists orders and their line items:
Order ID Customer ID Customer Name Item ID Item Name Quantity
1 1 John 1 Shirt 2
1 1 John 2 Pants 1
2 2 Jane 1 Shirt 1
2 2 Jane 3 Hat 3
This violates 2NF because the Customer Name column depends on only part of
the primary key (Customer ID).
To normalize this table to 2NF, we can split it into two tables −
Order ID Customer ID Item ID Quantity
1 1 1 2
1 1 2 1
2 2 1 1
2 2 3 3
Customer ID Customer Name
1 John
2 Jane
Third Normal Form (3NF)
3NF builds on 2NF by requiring that each non-primary key column in a table is
not transitively dependent on the primary key.
This means that a table should not have transitive dependencies, where a non-
primary key column depends on another non-primary key column.
Example
To explain 3NF further, let's consider an example of a table that lists customer
orders –
Order Customer Customer Customer Order Order
ID ID Name City Date Total
2022-01-
1 100 John Smith New York 100
01
2022-01-
2 101 Jane Doe Los Angeles 200
02
2022-01-
3 102 Bob Johnson San Francisco 300
03
In this example, the non-primary key column "Customer City" is transitively
dependent on the primary key.
That is, it depends on "Customer ID", which is not part of the primary key,
instead of depending directly on the primary key "Order ID".
To bring this table to 3NF, we can split it into two tables −
Table 1: Customers
Customer ID Customer Name Customer City
100 John Smith New York
101 Jane Doe Los Angeles
102 Bob Johnson San Francisco
Table 2: Orders
Order ID Customer ID Order Date Order Total
1 100 2022-01-01 100
2 101 2022-01-02 200
3 102 2022-01-03 300
Now, the "Customer City" column is no longer transitively dependent on the
primary key and is instead in a separate table that has a direct relationship with
the primary key.
This makes the table 3NF-compliant.
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter form of 3NF that applies to tables with more than one
candidate key.
BCNF requires that each non-trivial dependency in a table is a
dependency on a candidate key.
This means that a table should not have non-trivial dependencies, where a
non-primary key column depends on another non-primary key column.
BCNF ensures that each table in a database is a separate entity and
eliminates redundancies.
Example
A table is in BCNF if each determinant is a candidate key.
In other words, every non-trivial functional dependency in the table must
be on a candidate key.
For example, consider a table that lists information about books and their
authors −
Table: Books
Book Title Author Author Name Author
ID ID Nationality
Crime and Fyodor
1 100 Russian
Punishment Dostoevsky
F. Scott
2 The Great Gatsby 101 American
Fitzgerald
3 Pride and Prejudice 102 Jane Austen British
In this example, the functional dependency between "Author ID" and "Author
Name" violates BCNF because it is not on a candidate key.
To bring this table to BCNF, we can split it into two tables −
Table 1: Authors
Author ID Author Name Author Nationality
101 Fyodor Dostoevsky Russian
101 F. Scott Fitzgerald American
102 Jane Austen British
Table 2: Books
Book ID Title Author ID
1 Crime and Punishment 100
2 The Great Gatsby 101
3 Pride and Prejudice 102
Now, the "Author Name" and "Author Nationality" columns are not transitively
dependent on the primary key, and the table is in BCNF.
Fourth Normal Form (4NF)
4NF builds on BCNF by requiring that a table should not have multi-
valued dependencies.
A multi-valued dependency occurs when a non-primary key column
depends on a combination of other non-primary key columns.
For example, a table that lists customer orders with a primary key of
order ID and non-primary key columns for customer ID and order items
violates 4NF because order items depend on both order ID and customer
ID.
For example, a table that lists orders and their products, with columns for
order ID, product ID, and product details, violates 4NF because the
product details depend on the combination of order ID and product ID.
Example
Consider the following table of orders and products
Order ID Product ID Product Name Product Description
1 100 Widget Red Widget
1 200 Widget Blue Widget
2 100 Widget Red Widget
2 300 Thing Green Thing
3 200 Widget Blue Widget
3 300 Thing Green Thing
In this table, the product name and description depend on both the order ID and
product ID, creating a multi-valued dependency. To bring the table into 4NF, we
can split it into three tables –
Order ID Product ID
1 100
1 200
2 100
2 300
3 200
3 300
Product ID Product Name
100 Widget
200 Widget
300 Thing
Product ID Product Description
100 Red Widget
200 Blue Widget
300 Green Thing
Advantages and Disadvantages of Normalization
Advantages of Normalization
Reduced Data Redundancy
Improved Data Consistency
Simplified Database Maintenance
Improved Query Performance
Disadvantages of Normalization
Increased Complexity
Decreased Read Performance
Increased Write Performance
Increased Storage Space
Over-Normalization