Normalization
Normalization is a process of producing a set of
 related relations (tables) with desirable
 attributes, given the data requirements of a
 domain
 The goal is to remove redundancy and data
 modification problems: insertion anomaly,
 update anomaly, and deletion anomaly
 Usually dividing a table into 2 or more tables
 Using Normal Forms as a formal guide
                                                    6
Anomaly Example
                  7
Normalized Tables
                    8
Normal Forms
 Normal forms are formal guidelines (steps)
 for the normalization process
                    DK/NF
                5th Normal Form
              4th Normal Form (4NF)
             Boyce Codd NF (BC/NF)
            3rd Normal Form (3NF)
           2nd Normal Form (2NF)
           1st Normal Form (1NF)
            Unnormalized Form (UNF)
                                              9
Normalization – 1NF
 A table is in 1NF if
  1.   it satisfies the definition of a relation
        Review: what are the features of a relation?
  2.   no “repeating groups” (columns)
                                                        10
Repeating Groups
       Customer First                         Telephone
                                  Surname
       ID       Name                          Number
       123          Robert        Ingram      555-861-2025
                                              555-403-1659
       456          Jane          Wright
                                              555-776-4100
       789          Maria         Fernandez   555-808-9633
Customer   First                                                     Tel. No.
                      Surname       Tel. No. 1     Tel. No. 2
ID         Name                                                      3
123        Robert     Ingram        555-861-2025
456        Jane       Wright        555-403-1659   555-776-4100
789        Maria      Fernandez     555-808-9633
                                                   Lots of Null values    11
Avoid Repeating Groups
 Transforming to additional rows, rather
 than additional columns
 Customer ID   First Name   Surname     Telephone Number
 123           Robert       Ingram      555-861-2025
 456           Jane         Wright      555-403-1659
 456           Jane         Wright      555-776-4100
 789           Maria        Fernandez   555-808-9633
                                                           12
Transforming to 1NF: Example
 Another example
     UNF              1NF
                               13
Problems in 1NF
  Basically it may have the same problem as
  spreadsheet tables
      Redundancy, and anomalies
  What’s the problem in this table?
 Customer ID   First Name   Surname     Telephone Number
 123           Robert       Ingram      555-861-2025
 456           Jane         Wright      555-403-1659
 456           Jane         Wright      555-776-4100
 789           Maria        Fernandez   555-808-9633
                                                           14
Higher Normal Forms
 Normal forms higher than 1NF deal with
 functional dependency
 Identifying the normal form level by
 analyzing the functional dependency
 between attributes (fields)
                                          15
Functional Dependency
 If each value of attribute A is associated with only one
 value of attribute B, we say
     A determines B
     Or, B is dependent on A
     Denoted as: A  B
 Functional dependence describes relationships between
 attributes (not relations)
 Composite determinant: A (and B) can be a set of fields
     If A consists of column a and b, and a and b together
      determines c, then:
     (a, b)  c
                                                              16
      Functional Dependency Examples
              Dependency example
                   For each Customer ID, there is only one corresponding first name
                    (or last name), so: Customer ID determines First Name, or
                    Customer ID  First Name
                   Composite determinant: (First Name, Last Name)  Customer ID
              Non-dependency example
                   An Customer ID can have multiple phone numbers, so: Customer ID
                    does not determine Phone Number
                                                                Dependency diagram:
Dependency diagram:                                             an example of composite determinant
A solid line indicates
determinacy
                         Customer ID   First Name   Last Name         Phone Number
                         456           Jane         Wright            555-403-1659
                         456           Jane         Wright            555-776-4100
                         789           Jane         Fernandez         555-808-9633
  Dependency diagram:
  a broken line indicates
  non-determinacy                                                                                     17
Functional Dependency and Keys
  By definition, a unique key functionally
  determines all other attributes
     Primary key
     Candidate key
     Surrogate key
     Composite primary key
  Example
      ISBN       BookTitle    PubDate   ListPrice
                                                    18
    Normalization – 2NF
          A relation is in 2NF, if
               It is in 1NF, and
               All non-key attributes (attributes that are not part of any primary key or candidate
                key) must be functionally dependent on the whole primary (candidate) key
               Or, NO partial dependency
          Partial dependency
               A non-key attribute is dependent on part of a composite primary key
          Implication
               A relation with only single-attribute primary key and candidate key does not have
                partial dependency problem; therefore, such a relation is in 2NF.
(A, B) is a composite PK and
a composite determinant
                   A                         B              C                     D
  Partial dependency: B is also a
  determinant and is part of the PK (A, B)
                                                                                                       19
A Relation in 1NF but Not in 2NF
     Composite primary key determines other columns
Course ID   Section Title           Classroom   Time
CIS 2010    1        Intro to CIS   ALC 201     TTH 3:00-4:15PM
CIS 2010    2        Intro to CIS   ALC 310     TTH 9:30-10:45AM
CIS 2010    3        Intro to CIS   Online      W 7:00PM-9:40PM
CIS 3730    1        Database       CS 200      W 7:00PM-9:40PM
     Partial dependency
                                                                   20
Transforming to 2NF
 Steps
     Identify the primary key (PK).
     If PK consists of only one field, then it is in 2NF.
     If PK is a composite key, then look for partial
      dependency.
     If there is partial dependency, move the partial
      dependency involved attributes to another relation.
       A             B              C             D
  A            B          C                B           D
                                                             21
    Transforming to 2NF: Example
      Course ID       Section    Title           Classroom          Time
      CIS 2010        1          Intro to CIS    ALC 201            TTH 3:00-4:15PM
      CIS 2010        2          Intro to CIS    ALC 310            TTH 9:30-10:45AM
      CIS 2010        3          Intro to CIS    Online             W 7:00PM-9:40PM
      CIS 3730        1          Database        CS 200             W 7:00PM-9:40PM
      Course ID       Title                 Course ID     Section   Classroom   Time
                                            CIS 2010      1         ALC 201     TTH 3:00-4:15PM
      CIS 2010        Intro to CIS
                                            CIS 2010      2         ALC 310     TTH 9:30-10:45AM
      CIS 3730        Database
                                            CIS 2010      3         Online      W 7:00PM-9:40PM
Redundancy                                  CIS 3730      1         CS 200      W 7:00PM-9:40PM
is avoided
                 Course (CourseID, Title)
                 Schedule (CouseID, Section, Classroom, Time)
                           FK: CourseIDCourse.CourseID
                                                                                              22
Problems in 2NF
 Again, there could be redundancy and
 potential inconsistency
 Order_ID Order_Date Cust_ID   Cust_Name   Cust_Address
                               Value
 1006      10/24/2004   2      Furniture   Plano, TX
                               Furniture
 1007      10/25/2004   6      Gallery     Boulder, CO
                               Value
 1008       11/1/2004   2      Furniture   Plano, TX
                                                          23
Normalization – 3NF
   A relation is in 3NF, if
       It is in 2NF, and
       All (non-key) attributes must, and only, be
        functionally dependent on the primary key
       Or, NO transitive dependency
   Transitive dependency
       A  B and B  C, then A  C
          A                 B                 C        D
Transitive dependency: C is a determinant but not PK
                                                           24
A Relation in 2NF but Not in 3NF
  Identify primary key (PK) and look for
  transitive dependency
 Order_ID Order_Date CustID   Name                Address
 1006     10/24/2004    2     Value Furniture     Plano, TX
 1007     10/25/2004    6     Furniture Gallery   Boulder, CO
 1008      11/1/2004    2     Value Furniture     Plano, TX
                       Transitive dependency
                                                                25
Transforming to 3NF
 Move the attributes involved in a transitive
 dependency to another relation
 Order Order_ID Order_Date CustID Name                Address
         1006      10/24/2004   2   Value Furniture Plano, TX
         1007      10/25/2004   6   Furniture Gallery Boulder, CO
         1008       11/1/2004   2   Value Furniture Plano, TX
 Order                                 Customer
 Order_ID Order_Date Customer           CustID   Name              Address
 1006      10/24/2004    2                 2     Value Furniture Plano, TX
 1007      10/25/2004    6                 6     Furniture Gallery Boulder, CO
 1008       11/1/2004    2
         Customer (CustID, Name, Address)
         Order (Corder_ID, Order_Date, Customer)
                  FK: CustomerCustomer.CustID
                                                                                 26
BC/NF
 BC/NF is a stricter form of 2NF and 3NF
            A                B               C
    (A, B) is a candidate key
    (A, C) is also a candidate key
    A, B, C are all key attributes; there are no non-key attributes
    Can be viewed as special case of transitive dependency; or
     can be transformed into a similar pattern as partial
     dependency.
                                              A       C
     A          B        C
                                                 C    B
                                                                       27
BC/NF Example
      Physician       Patient         Bill Number
      A               1               101
      B               1               101
      A               2               102
      B               2               103
Physician   Bill Number
                                Bill Number    Patient
A           101
                                101            1
B           101
                                102            2
A           102
                                103            2
B           103
                                                         28
4NF Brief
 Multi-value dependency
     Employee  Skill (determines multiple skills)
     Employee  Degree (determines multiple degrees)
  Employee       Skill                     Degree
  Jack           SQL, Teaching             BA, MS, PhD
  Michael        SQL, C#, Java, Network    BA, MBA
  Employee          Skill                 Degree
  Jack              SQL                   BA
  Jack              Teaching              MS
  Jack                                    PhD
  Michael           SQL                   BA
  …
                                                         29
Practical Tips
  To identify the normalization level, determine the primary key and
  candidate keys first; then look for partial dependency (check if there is
  a composite PK or candidate key) and transitive dependency
  Design a relation that is easy to explain its meaning
      If there are attributes of different things in one table, there are usually problems;
       for example, students and courses are in one table, or customer and products are
       in one table; etc.
  Attributes that potentially generate many Null values might be moved
  into another table
  Generally relations in the 3NF are considered to be well formed; going
  higher may introduce unnecessary structural complexity which is
  inefficient for data queries
  Very often tables can go for lower normal forms (de-normalization)
  depending on design requirements
                                                                                               30
Normal Forms Summary
BCNF: every attribute is dependent on the key, the whole key, and nothing but the key
3NF: every non-key attribute is dependent on the key, the whole key, and nothing but
the key
                                                   Eliminate transitive
                                                      dependencies
2NF: every non-key attribute is dependent on the key, and the whole key.
                                                    Eliminate partial
                                                     dependencies
           1NF: If the tables are relations and no repeating groups
                                                   Split repeating groups
                                                     in separate rows
                                        UNF
                                                                                        31
Summary
 Key concepts
    Anomalies
    Normalization and de-normalization
    Normal forms: 1NF to 3NF
    Functional dependency
       Partial dependency
       Transitive dependency
 Key skills: identify and normalize tables from 1NF to 3NF
   Be able to identify the normal form of a given relation
   Be able to identify functional dependency among attributes
   Be able to apply normalization principles to normalize a
    relation up to the 3rd normalization form
                                                                 32