Database Processing: Fundamentals,
Design, and Implementation
Sixteenth Edition
Chapter 4
Database Design Using
Normalization
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Learning Objectives
4.1 To design updatable databases to store data received from another
source
4.2 To use SQL to access table structure
4.3 To understand the advantages and disadvantages of normalization
4.4 To understand denormalization
4.5 To design read-only databases to store data from updatable
databases
4.6 To recognize and be able to correct common design problems:
– The multivalue, multicolumn problem
– The inconsistent values problem
– The missing values problem
– The general-purpose remarks column problem
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Chapter Premise
• We have received one or more tables of existing data.
• The data is to be stored in a new database.
• Question: Should the data be stored as received, or
should it be transformed for storage?
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Figure 4-1: Assessing Table Structure
Guidelines for Assessing Table Structure
• Count rows and examine columns
• Examine data values and interview users to determine:
– Multivalued dependencies
– Functional dependencies
– Candidate keys
– Primary keys
– Foreign keys
• Assess validity of assumed referential integrity constraints
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Counting Rows in a Table
• Use the COUNT(*) function to count the number of rows
in a table.
• Use the SELECT statement to determine the number and
type of the table’s columns.
– If there are a large number of rows you can limit the
number or rows returned by using the TOP function
as shown on the next slide.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Examining the Columns
/* *** S QL-Query-CH04-01 *** */
SELECT TOP 5 *
FROM S KU_DATA;
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Checking Validity of Assumed Referential
Integrity Constraints
• Given two tables with an assumed foreign key constraint:
SKU_DATA (SKU, SKU_Description, Department, Buyer)
BUYER (BuyerName, Department, Position, Supervisor)
• Where SKU_DATA.Buyer must exist in BUYER.Buyer
• To find any foreign key values that violate the foreign key constraint
/* *** SQL-Query-CH04-02 *** */
SELECT Buyer
FROM SKU_DATA
WHERE Buyer NOT IN
(SELECT BuyerName
FROM BUYER);
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Type of Databases
• Updateable database, or read-only database?
• If updateable database, we normally want tables in B
CNF.
• If read-only database, we may not use BCNF tables.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Designing Updateable Databases
• Updatable databases are typically the operational
databases of a company, such as the online transaction
processing (OLTP) system discussed for Cape Codd
Outdoor Sports at the beginning of Chapter 2.
• If you are constructing an updatable database, then you
need to be concerned about modification anomalies and
inconsistent data.
• Consequently, you must carefully consider normalization
principles.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Figure 4-2: Advantages and
Disadvantages of Normalization
Advantages and Disadvantages of Normalization
• Advantages
– Eliminate modification anomalies
– Reduce duplicated data
▪ Eliminate data integrity problems
▪ Save file space
– Single table queries will run faster
• Disadvantages
– More complicated SQL required for multiple
subqueries and joins
– Extra work for DBMS can mean slower applications
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Figure 4-3: The EQUIPMENT_REPAIR
Table
• The EQUIPMENT_REPAIR table below is not normalized.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Final Steps
• In Chapter 7, you will learn how to:
– Remove unneeded tables after the data is copied,
using the SQL DROP TABLE statement
– Create the referential integrity constraint, using the
SQL ALTER TABLE statement
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Choosing Not to Use BCNF
• BCNF is used to control anomalies from functional
dependencies.
• There are times when BCNF is not desirable.
• The classic example is ZIP codes:
– ZIP codes almost never change.
– Any anomalies are likely to be caught by normal
business practices.
– Not having to use SQL to join data in two tables will
speed up application processing.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Multivalued Dependencies
• Anomalies from multivalued dependencies are very
problematic.
• Always place the columns of a multivalued dependency
into a separate table (4NF).
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Designing Read-Only Databases
• The extracted sales data that we used for Cape Codd
Outdoor Sports in Chapter 2 is a small, but typical
example of a read-only database.
• Read-only databases are used in business intelligence
(BI) systems for producing information for decision
making as discussed for Cape Codd Outdoor Sports in
Chapter 2.
• Read-only databases are commonly used in a data
warehouse, which was introduced in Chapter 2.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Read-Only Databases
• Read-only databases are nonoperational databases
using data extracted from operational databases
commonly used in data warehouses.
• They are used for querying, reporting, and data mining
applications.
• They are never updated (in the operational database
sense—they may have new data imported from time to
time).
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Denormalization
• For read-only databases, normalization is seldom an
advantage.
– Application processing speed is more important.
• Denormalization is the joining of the data in normalized
tables prior to storing the data.
• The data is then stored in nonnormalized tables.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Figure 4-5: Normalized Relations
• The Normalized STUDENT, ACTIVITY, and PAYMENT
Relations
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Figure 4-6: Denormalizing the Data
/* *** S QL-INSERT-CH04-03 *** */
INSERT INTO STUDENT_ACTIVITY_PAYMENT_DATA
SELECT STUDENT.StudentID,StudentName,
ACTIVITY.Activity, ActivityFee, AmountPaid
FROM STUDENT, PAYMENT, ACTIVITY
WHERE STUDENT.StudentID = PAYMENT.StudentID
AND PAYMENT.Activity = ACTIVITY.Activity;
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Figure 4-7: Columns in the PRODUCT
Table
Product Product
• SKU (Primary Key) • Quantity Sold Past Year
• Part Number (Candidate key) • Quantity Sold Past Quarter
• SKU_Description (Candidate key) • Quantity Sold Past Month
• Vendor Number • Detail Picture
• Vendor Name • Thumbnail Picture
• Vendor Contact_1 • Marketing Short Description
• Vendor Contact_2 • Marketing Long Description
• Vendor Street • Part Color
• Vendor City • Units Code
• Vendor State • Bin Number
• Vendor Zip • Production Key Code
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Customized Tables
• Read-only databases are often designed with many copies
of the same data, but with each copy customized for a
specific application.
• Consider the PRODUCT table previously shown:
PRODUCT_PURCHASING (SKU, SKU_Description, VendorNumber, VendorName,
VendorContact_1, VendorContact_2, VendorStreet, VendorCity, VendorState,
VendorZIP)
PRODUCT_USAGE (SKU, SKU_Description, QuantitySoldPastYear,
QuantitySoldPastQuarter, QuantitySoldPastMonth)
PRODUCT_WEB (SKU, DetailPicture, ThumbnailPicture, MarketingShortDescription,
MarketingLongDescription, PartColor)
PRODUCT_INVENTORY (SKU, PartNumber, SKU_Description, UnitsCode,
BinNumber, ProductionKeyCode)
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Figure 4-8: Common Design Problems
• Practical Problems in Designing Databases from
Existing Data
– The multivalue, multicolumn problem
– Inconsistent values
– Missing values
– General-purpose remarks column
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
The Multivalue, Multicolumn Problem
• The multivalue, multicolumn problem occurs when
multiple values of an attribute are stored in more than one
column:
EMPLOYEE (EmployeeNumber, EmployeeLastName,
EmployeeFirstName, EmailAddress, Auto1_LienseNumber,
Auto2_LicenseNumber, Auto3_LicenseNumber)
• The solution is to use a separate table to store the
multiple values.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Inconsistent Values (1 of 3)
• Inconsistent values occur when different users or
different data sources use slightly different forms of the
same data value:
– Different codings:
▪ SKU_Description = 'Corn, Large Can'
▪ SKU_Description = 'Can, Corn, Large'
▪ SKU_Description = 'Large Can Corn‘
– Different spellings:
▪ Coffee, Cofee, Coffeee
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Inconsistent Values (2 of 3)
• Particularly problematic are primary or foreign key values.
• To detect:
– Use referential integrity checks for checking keys
– Use the SQL GROUP BY clause on suspected
columns
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Inconsistent Values (3 of 3)
/* *** S QL-Query-CH04-05 *** */
SELECT SK U_Description, COUNT(*) as S K U_ Description_
Count
FROM S K U_DATA
GROUP BY S K U_Description;
• Results of this query
show that there are not
any inconsistent values!
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Missing Values (NULL Values)
• A missing value or null value is a value that has never been provided.
– In a database table, a null value appears in upper case letters as
NULL.
• Null values are ambiguous:
– May indicate that a value is inappropriate
▪ DateOfLastChildbirth is inappropriate for a male
– May indicate that a value is appropriate but unknown
▪ DateOfLastChildbirth is appropriate for a female, but may be
unknown
– May indicate that a value is appropriate and known, but has never
been entered
▪ DateOfLastChildbirth is appropriate for a female, and may be
known but no one has recorded it in the database
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Checking for Null Values
• Use the SQL IS NULL operator to check for null values:
/* *** S QL-Query-CH04-06 *** */
SELECT COUNT (*) as QuantityNullCount
FROM ORDER_ITEM
WHERE Quantity IS NULL;
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
The General-Purpose Remarks Column
• A general-purpose remarks column is a column with a name such
as:
– Remarks
– Comments
– Notes
• It often contains important data stored in an inconsistent, verbal, and
verbose way.
– A typical use is to store data on a customer’s interests.
• Such a column may:
– Be used inconsistently
– Hold multiple data items
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Hidden Foreign Key Data in the General-
Purpose Remarks Column
CONTACT (ContactID, ContactLastName, ContactFirstName,
Address, . . . {other data}, Remarks, AirplaneModelID)
AIRPLANE_MODEL (AirplaneModelID, AirplaneModelName,
AirplaneModelDescription, . . . {other airplane model data})
• In a typical situation, the data for the foreign key may have been
recorded in the Remarks column.
– 'Wants to buy a Piper Seneca
– 'Owner of a Piper
Seneca
– 'Possible buyer for a turbo Seneca'.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved
Copyright
This work is protected by United States copyright laws and is
provided solely for the use of instructors in teaching their courses
and assessing student learning. Dissemination or sale of any part of
this work (including on the World Wide Web) will destroy the integrity
of the work and is not permitted. The work and materials from it
should never be made available to students except by instructors
using the accompanying text in their classes. All recipients of this
work are expected to abide by these restrictions and to honor the
intended pedagogical purposes and the needs of other instructors
who rely on these materials.
Copyright © 2021, 2018, 2015 Pearson Education, Inc. All Rights Reserved