Database Tables and
Normalization
        Chapter 5.2
        Normalization of Database Tables
                                                                              Normalization is a process for assigning attributes to
                                                                               entities. It reduces data redundancies and helps eliminate
                                                                               the data anomalies.
                                                                              Normalization works through a series of stages called
                                                                               normal forms:
                                                                                – First normal form (1NF)
                                                                                – Second normal form (2NF)
                                                                                – Third normal form (3NF)
                                                                                – Boye Codd normal form
                                                                              The highest level of normalization is not always desirable.
Database Tables and Normalization                                         Scenario
   The Need for Normalization
                                                                              A few employees works for                 Project Num : 15
    –   Case of a Construction Company                                        one project.
                                                                                                                        Project Name :
            Building project -- Project number, Name, Employees
             assigned to the project.                                                                                   Evergreen
                                                                              Employee Num :
            Employee -- Employee number, Name, Job classification            101, 102, 103,
            The company charges its clients by billing the hours spent       105
             on each project. The hourly billing rate is dependent on
             the employee’s position.
            Periodically, a report is generated.
            The table whose contents correspond to the reporting
             requirements is shown in Table 5.1.
Sample Form
Project Num : 15
Project Name : Evergreen
Emp Num   Emp Name   Job Class   Chr Hours   Hrs Billed   Total
  101
  102
  103
  105
Table Structure Matches                                           Database Tables and Normalization
the Report Format
                                                                     Problems with the Figure 5.1
                                                                      –   The project number is intended to be a primary key, but it
                                                                          contains nulls.
                                                                      –   The table displays data redundancies.
                                                                      –   The table entries invite data inconsistencies.
                                                                      –   The data redundancies yield the following anomalies:
                                                                              Update anomalies.
                                                                              Addition anomalies.
                                                                              Deletion anomalies.
Database Tables and Normalization                                  Data Organization: First Normal Form
   Conversion to First Normal Form
    –   A relational table must not contain repeating groups.
    –   Repeating groups can be eliminated by adding the
        appropriate entry in at least the primary key
        column(s).
                                                                                                                  After
                                                                               Before
First Normal Form (1 NF)                                           Dependency Diagram
                                                                      Dependency Diagram
   1NF Definition                                                      – The primary key components are bold, underlined, and shaded in
                                                                          a different color.
     – The term first normal form (1NF) describes the
                                                                        – The arrows above entities indicate all desirable dependencies,
       tabular format in which:                                           i.e., dependencies that are based on PK.
         All the key attributes are defined.                           – The arrows below the dependency diagram indicate less
                                                                          desirable dependencies -- partial dependencies and transitive
         There are no repeating groups in the table.                     dependencies.
         All attributes are dependent on the primary key.
PROJECT (PROJ_NUM, PROJ_NAME)
ASSIGN (PROJ_NUM, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR ,HOURS)
Second Normal Form (2 NF)                                            Second Normal Form (2 NF)
   Conversion to Second Normal Form
     – Starting with the 1NF format, the database can be converted
       into the 2NF format by eliminating partial dependence
        PROJECT (PROJ_NUM, PROJ_NAME)
        EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
        ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
Second Normal Form (2 NF)                                            Third Normal Form (3 NF)
    A table is in 2NF if:
                                                                        Conversion to Third Normal Form
           It is in 1NF and
                                                                         –   Create a separate table with attributes in a transitive
           It includes no partial dependencies; that is, no                 functional dependence relationship.
            attribute is dependent on only a portion of the
            primary key.                                                      PROJECT (PROJ_NUM, PROJ_NAME)
            (It is still possible for a table in 2NF to exhibit               ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
            transitive dependency; that is, one or more                       EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
            attributes may be functionally dependent on                       JOB (JOB_CLASS, CHG_HOUR)
            nonkey attributes.)
Third Normal Form (3 NF)                             Boyce-Codd Normal Form (BCNF)
                                                     –   A table is in Boyce-Codd normal form (BCNF) is a
                                                         special case of 3NF, Figure 1 illustrates a table that is in
   3NF Definition                                       3NF but not in BCNF.
    –   A table is in 3NF if:
          It is in 2NF and
          It contains no transitive dependencies.
                                                         Figure 1
The Decomposition of a Table Structure
to Meet BCNF Requirements                            Sample Data for a BCNF Conversion
     Decomposition into BCNF                      Normalization
                                                   –   Normalization will help us identify correct and
                                                       appropriate TABLES.
                                                   –   Until Now we have 4 tables
                                                            PROJECT (PROJ_NUM, PROJ_NAME)
                                                            ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
                                                            EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)
                                                            JOB (JOB_CLASS, CHG_HOUR)
NEXT ........                                     Business Rules
                                                           The company manages many projects.
 –   We are going to identify the relationships            Each project requires the services of many employees.
     between entities (tables) including their
     cardinality, connectivity.                            An employee may be assigned to several different
                                                            projects.
 –   We have to list out the Business Rules.
                                                           Some employees are not assigned to a project and
                                                            perform duties not specifically related to a project.
        PROJECT (PROJ_NUM, PROJ_NAME)                       Some employees are part of a labor pool, to be shared
                                                            by all project teams.
        ASSIGN (PROJ_NUM, EMP_NUM, HOURS)
                                                           Each employee has a (single) primary job
        EMPLOYEE (EMP_NUM, EMP_NAME, JOB_CLASS)             classification. This job classification determines the
                                                            hourly billing rate.
        JOB (JOB_CLASS, CHG_HOUR)
                                                           Many employees can have the same job classification.
Normalization and Database Design                       Normalization and Database Design
   Two Initial Entities:
       PROJECT (PROJ_NUM, PROJ_NAME)                   Three   Entities After Transitive Dependency Removed
       EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME,
         EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR)    PROJECT (PROJ_NUM, PROJ_NAME)
                                                        EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
                                                                  JOB_CODE)
                                                        JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
     The Modified ERD                                  Creation of Composite Entity ASSIGN
    Normalization and Database Design                                     Relational Schema
    Attribute ASSIGN_HOUR is assigned to the composite entity
     ASSIGN.
    “Manages” relationship is created between EMPLOYEE and
     PROJECT.
         PROJECT (PROJ_NUM, PROJ_NAME, EMP_NUM)
         EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
                     EMP_HIREDATE, JOB_CODE)
         JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
         ASSIGN (ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM,
                  ASSIGN_HOURS)
Denormalization                                                  Thảo luận
                                                                 Mã đơn   Ngày đơn     Mã khách    Tên khách     Mã hàng   Tên hàng hóa   số lượng   đơn giá
   Normalization is only one of many database design             hàng                   hàng                     hóa
    goals.                                                        007     05/08/2007    AAA       Nguyễn Văn A     01       Bàn phím        100        25
                                                                  007     05/08/2007    AAA       Nguyễn Văn A     02       Màn hình         40        27
   Normalized (decomposed) tables require additional             007     05/08/2007    AAA       Nguyễn Văn A     03         USB            70        15
    processing, reducing system speed.                            009     15/08/2007     BBB      Nguyễn Văn B     01       Bàn phím        50         25
                                                                  009     15/08/2007     BBB      Nguyễn Văn B     02       Màn hình        80         29
   Normalization purity is often difficult to sustain in         009     15/08/2007     BBB      Nguyễn Văn B     04       Webcam          20         27
    the modern database environment. The conflict
    between design efficiency, information                        015
                                                                  015
                                                                          15/08/2007
                                                                          15/08/2007
                                                                                        AAA
                                                                                        AAA
                                                                                                  Nguyễn Văn A
                                                                                                  Nguyễn Văn A
                                                                                                                   02
                                                                                                                   04
                                                                                                                             Màn hình
                                                                                                                             Webcam
                                                                                                                                            500
                                                                                                                                             60
                                                                                                                                                       30
                                                                                                                                                       27
    requirements, and processing speed are often
    resolved through compromises that include
    denormalization.