Database Normalization
Definition
•   This is the process which allows you to winnow out
    redundant data within your database.
•   This involves restructuring the tables to successively
    meeting higher forms of Normalization.
•   A properly normalized database should have the
    following characteristics
    –   Scalar values in each fields
    –   Absence of redundancy.
    –   Minimal use of null values.
    –   Minimal loss of information.
        Levels of Normalization
•   Levels of normalization based on the amount of
    redundancy in the database.
•   Various levels of normalization are:
    –   First Normal Form (1NF)
    –   Second Normal Form (2NF)
                                                     Number of Tables
                                        Redundancy
    –   Third Normal Form (3NF)
                                                                        Complexity
    –   Boyce-Codd Normal Form (BCNF)
    –   Fourth Normal Form (4NF)
    –   Fifth Normal Form (5NF)
    –   Domain Key Normal Form (DKNF)
    Most databases should be 3NF or BCNF in order to avoid
                    the database anomalies.
Levels of Normalization
                    1NF
                    2NF
                    3NF
                    4NF
                    5NF
                    DKNF
Each higher level is a subset of the lower level
           First Normal Form (1NF)
A table is considered to be in 1NF if all the fields contain
only scalar values (as opposed to list of values).
Example (Not 1NF)
  ISBN            Title         AuName    AuPhone         PubName       PubPhone       Price
  0-321-32132-1   Balloon       Sleepy,   321-321-1111,   Small House   714-000-0000   $34.00
                                Snoopy,   232-234-1234,
                                Grumpy    665-235-6532
  0-55-123456-9   Main Street   Jones,    123-333-3333,   Small House   714-000-0000   $22.95
                                Smith     654-223-3455
  0-123-45678-0   Ulysses       Joyce     666-666-6666    Alpha Press   999-999-9999   $34.00
  1-22-233700-0   Visual        Roman     444-444-4444    Big House     123-456-7890   $25.00
                  Basic
                  Author and AuPhone columns are not scalar
                     1NF - Decomposition
1.        Place all items that appear in the repeating group in a new
          table
2.        Designate a primary key for each new table produced.
3.        Duplicate in the new table the primary key of the table from
          which the repeating group was extracted or vice versa.
Example (1NF)
                                                                         ISBN            AuName   AuPhone
                                                                         0-321-32132-1   Sleepy   321-321-1111
     ISBN            Title         PubName       PubPhone       Price    0-321-32132-1   Snoopy   232-234-1234
     0-321-32132-1   Balloon       Small House   714-000-0000   $34.00   0-321-32132-1   Grumpy   665-235-6532
     0-55-123456-9   Main Street   Small House   714-000-0000   $22.95   0-55-123456-9   Jones    123-333-3333
     0-123-45678-0   Ulysses       Alpha Press   999-999-9999   $34.00   0-55-123456-9   Smith    654-223-3455
     1-22-233700-0   Visual        Big House     123-456-7890   $25.00   0-123-45678-0   Joyce    666-666-6666
                     Basic
                                                                         1-22-233700-0   Roman    444-444-4444
             Functional Dependencies
1.        If one set of attributes in a table determines another
          set of attributes in the table, then the second set of
          attributes is said to be functionally dependent on the
          first set of attributes.
Example 1
     ISBN            Title         Price    Table Scheme: {ISBN, Title, Price}
     0-321-32132-1   Balloon       $34.00   Functional Dependencies: {ISBN}  {Title}
     0-55-123456-9   Main Street   $22.95                               {ISBN}  {Price}
     0-123-45678-0   Ulysses       $34.00
     1-22-233700-0   Visual        $25.00
                     Basic
             Functional Dependencies
Example 2
PubID      PubName       PubPhone       Table Scheme: {PubID, PubName, PubPhone}
1          Big House     999-999-9999   Functional Dependencies: {PubId}  {PubPhone}
2          Small House   123-456-7890
                                                                   {PubId}  {PubName}
3          Alpha Press   111-111-1111
                                                    {PubName, PubPhone}  {PubID}
Example 3
    AuID     AuName      AuPhone
    1        Sleepy      321-321-1111
                                        Table Scheme: {AuID, AuName, AuPhone}
    2        Snoopy      232-234-1234   Functional Dependencies: {AuId}  {AuPhone}
    3        Grumpy      665-235-6532                              {AuId}  {AuName}
    4        Jones       123-333-3333
                                                     {AuName, AuPhone}  {AuID}
    5        Smith       654-223-3455
    6        Joyce       666-666-6666
    7        Roman       444-444-4444
                  FD – Example
Database to track reviews of papers submitted to an academic
conference. Prospective authors submit papers for review and
possible acceptance in the published conference proceedings.
Details of the entities
–   Author information includes a unique author number, a name, a mailing
    address, and a unique (optional) email address.
–   Paper information includes the primary author, the paper number, the
    title, the abstract, and review status (pending, accepted,rejected)
–   Reviewer information includes the reviewer number, the name, the
    mailing address, and a unique (optional) email address
–   A completed review includes the reviewer number, the date, the paper
    number, comments to the authors, comments to the program
    chairperson, and ratings (overall, originality, correctness, style, clarity)
               FD – Example
Functional Dependencies
   –   AuthNo  AuthName, AuthEmail, AuthAddress
   –   AuthEmail  AuthNo
   –   PaperNo  Primary-AuthNo, Title, Abstract, Status
   –   RevNo  RevName, RevEmail, RevAddress
   –   RevEmail  RevNo
   –   RevNo, PaperNo  AuthComm, Prog-Comm, Date,
       Rating1, Rating2, Rating3, Rating4, Rating5
    Second Normal Form (2NF)
For a table to be in 2NF, there are two requirements
     –    The database is in first normal form
     –    All nonkey attributes in the table must be functionally dependent on the
          entire primary key
Note: Remember that we are dealing with non-key attributes
Example 1 (Not 2NF)
Scheme  {Title, PubId, AuId, Price, AuAddress}
     1.   Key  {Title, PubId, AuId}
     2.   {Title, PubId, AuID}  {Price}
     3.   {AuID}  {AuAddress}
     4.   AuAddress does not belong to a key
     5.   AuAddress functionally depends on AuId which is a subset of a key
    Second Normal Form (2NF)
Example 2 (Not 2NF)
Scheme  {City, Street, HouseNumber, HouseColor, CityPopulation}
     1.    key  {City, Street, HouseNumber}
     2.    {City, Street, HouseNumber}  {HouseColor}
     3.    {City}  {CityPopulation}
     4.    CityPopulation does not belong to any key.
     5.    CityPopulation is functionally dependent on the City which is a proper subset of the
           key
Example 3 (Not 2NF)
Scheme  {studio, movie, budget, studio_city}
     1.    Key  {studio, movie}
     2.    {studio, movie}  {budget}
     3.    {studio}  {studio_city}
     4.    studio_city is not a part of a key
     5.    studio_city functionally depends on studio which is a proper subset of the key
             2NF - Decomposition
1.   If a data item is fully functionally dependent on only a part of the
     primary key, move that data item and that part of the primary key
     to a new table.
2.   If other data items are functionally dependent on the same part of
     the key, place them in the new table also
3.   Make the partial primary key copied from the original table the
     primary key for the new table. Place all items that appear in the
     repeating group in a new table
Example 1 (Convert to 2NF)
     Old Scheme  {Title, PubId, AuId, Price, AuAddress}
     New Scheme  {Title, PubId, AuId, Price}
     New Scheme  {AuId, AuAddress}
            2NF - Decomposition
Example 2 (Convert to 2NF)
    Old Scheme  {Studio, Movie, Budget, StudioCity}
    New Scheme  {Movie, Studio, Budget}
    New Scheme  {Studio, City}
Example 3 (Convert to 2NF)
    Old Scheme  {City, Street, HouseNumber, HouseColor, CityPopulation}
    New Scheme  {City, Street, HouseNumber, HouseColor}
    New Scheme  {City, CityPopulation}
          Third Normal Form (3NF)
This form dictates that all non-key attributes of a table must be functionally
      dependent on a candidate key i.e. there can be no interdependencies
      among non-key attributes.
For a table to be in 3NF, there are two requirements
     –     The table should be second normal form
     –     No attribute is transitively dependent on the primary key
Example (Not in 3NF)
Scheme  {Title, PubID, PageCount, Price }
     1.    Key  {Title, PubId}
     2.    {Title, PubId}  {PageCount}
     3.    {PageCount}  {Price}
     4.    Both Price and PageCount depend on a key hence 2NF
     5.    Transitively {Title, PubID}  {Price} hence not in 3NF
          Third Normal Form (3NF)
Example 2 (Not in 3NF)
Scheme  {Studio, StudioCity, CityTemp}
     1.    Primary Key  {Studio}
     2.    {Studio}  {StudioCity}
     3.    {StudioCity}  {CityTemp}
     4.    {Studio}  {CityTemp}
     5.    Both StudioCity and CityTemp depend on the entire key hence 2NF
     6.    CityTemp transitively depends on Studio hence violates 3NF
                                                                 BuildingID   Contractor   Fee
Example 3 (Not in 3NF)                                              100        Randolph    1200
Scheme  {BuildingID, Contractor, Fee}                              150        Ingersoll   1100
     1.    Primary Key  {BuildingID}                               200        Randolph    1200
     2.    {BuildingID}  {Contractor}                              250          Pitkin    1100
     3.    {Contractor}  {Fee}                                     300        Randolph    1200
     4.    {BuildingID}  {Fee}
     5.    Fee transitively depends on the BuildingID
     6.    Both Contractor and Fee depend on the entire key hence 2NF
            3NF - Decomposition
1.   Move all items involved in transitive dependencies to a new entity.
2.   Identify a primary key for the new entity.
3.   Place the primary key for the new entity as a foreign key on the
     original entity.
Example 1 (Convert to 3NF)
     Old Scheme  {Title, PubID, PageCount, Price }
     New Scheme  {PubID, PageCount, Price}
     New Scheme  {Title, PubID, PageCount}
             3NF - Decomposition
Example 2 (Convert to 3NF)
    Old Scheme  {Studio, StudioCity, CityTemp}
    New Scheme  {Studio, StudioCity}
    New Scheme  {StudioCity, CityTemp}
Example 3 (Convert to 3NF)                        BuildingID   Contractor   Contractor   Fee
    Old Scheme  {BuildingID, Contractor, Fee}       100        Randolph     Randolph    1200
                                                     150        Ingersoll    Ingersoll   1100
    New Scheme  {BuildingID, Contractor}
                                                     200        Randolph       Pitkin    1100
    New Scheme  {Contractor, Fee}                   250          Pitkin
                                                     300        Randolph
    Boyce-Codd Normal Form (BCNF)
•     BCNF does not allow dependencies between attributes that belong to candidate keys.
•     BCNF is a refinement of the third normal form in which it drops the restriction of a non-key
      attribute from the 3rd normal form.
•     Third normal form and BCNF are not same if the following conditions are true:
      –     The table has two or more candidate keys
      –     At least two of the candidate keys are composed of more than one attribute
      –     The keys are not disjoint i.e. The composite candidate keys share some attributes
Example 1 - Address (Not in BCNF)
Scheme  {City, Street, ZipCode }
     1.     Key1  {City, Street }
     2.     Key2  {ZipCode, Street}
     3.     No non-key attribute hence 3NF
     4.     {City, Street}  {ZipCode}
     5.     {ZipCode}  {City}
     6.     Dependency between attributes belonging to a key
    Boyce Codd Normal Form (BCNF)
Example 2 - Movie (Not in BCNF)
Scheme  {MovieTitle, MovieID, PersonName, Role, Payment }
     1.   Key1  {MovieTitle, PersonName}
     2.   Key2  {MovieID, PersonName}
     3.   Both role and payment functionally depend on both candidate keys thus 3NF
     4.   {MovieID}  {MovieTitle}
     5.   Dependency between MovieID & MovieTitle Violates BCNF
Example 3 - Consulting (Not in BCNF)
Scheme  {Client, Problem, Consultant}
     1.   Key1  {Client, Problem}
     2.   Key2  {Client, Consultant}
     3.   No non-key attribute hence 3NF
     4.   {Client, Problem}  {Consultant}
     5.   {Client, Consultant}  {Problem}
     6.   Dependency between attributess belonging to keys violates BCNF
          BCNF - Decomposition
1.   Place the two candidate primary keys in separate
     entities
2.   Place each of the remaining data items in one of the
     resulting entities according to its dependency on the
     primary key.
Example 1 (Convert to BCNF)
     Old Scheme  {City, Street, ZipCode }
     New Scheme1  {ZipCode, Street}
     New Scheme2  {City, Street}
•    Loss of relation {ZipCode}  {City}
     Alternate New Scheme1  {ZipCode, Street }
     Alternate New Scheme2  {ZipCode, City}
             Decomposition – Loss of
                  Information
1.   If decomposition does not cause any loss of information it is called
     a lossless decomposition.
2.   If a decomposition does not cause any dependencies to be lost it is
     called a dependency-preserving decomposition.
3.   Any table scheme can be decomposed in a lossless way into a
     collection of smaller schemas that are in BCNF form. However the
     dependency preservation is not guaranteed.
4.   Any table can be decomposed in a lossless way into 3rd normal form
     that also preserves the dependencies.
     •    3NF may be better than BCNF in some cases
         Use your own judgment when decomposing schemas
          BCNF - Decomposition
Example 2 (Convert to BCNF)
    Old Scheme  {MovieTitle, MovieID, PersonName, Role, Payment }
    New Scheme  {MovieID, PersonName, Role, Payment}
    New Scheme  {MovieTitle, PersonName}
•   Loss of relation {MovieID}  {MovieTitle}
    New Scheme  {MovieID, PersonName, Role, Payment}
    New Scheme  {MovieID, MovieTitle}
•   We got the {MovieID}  {MovieTitle} relationship back
Example 3 (Convert to BCNF)
    Old Scheme  {Client, Problem, Consultant}
    New Scheme  {Client, Consultant}
    New Scheme  {Client, Problem}
     Fourth Normal Form (4NF)
•    Fourth normal form eliminates independent many-to-one relationships
     between columns.
•    To be in Fourth Normal Form,
    –    a relation must first be in Boyce-Codd Normal Form.
    –    a given relation may not contain more than one multi-valued attribute.
Example (Not in 4NF)
    Scheme  {MovieName, ScreeningCity, Genre)
    Primary Key: {MovieName, ScreeningCity, Genre)
    1.   All columns are a part of the only candidate key, hence BCNF
    2.   Many Movies can have the same Genre
    3.   Many Cities can have the same movie         Movie        ScreeningCity       Genre
    4.   Violates 4NF                                Hard Code    Los Angles          Comedy
                                                      Hard Code          New York     Comedy
                                                      Bill Durham        Santa Cruz   Drama
                                                      Bill Durham        Durham       Drama
                                                      The Code Warrier   New York     Horror
     Fourth Normal Form (4NF)
Example 2 (Not in 4NF)                                          Manager           Child   Employee
Scheme  {Manager, Child, Employee}                             Jim               Beth    Alice
    1.   Primary Key  {Manager, Child, Employee}               Mary              Bob     Jane
    2.   Each manager can have more than one child              Mary              NULL    Adam
    3.   Each manager can supervise more than one employee
    4.   4NF Violated
Example 3 (Not in 4NF)
Scheme  {Employee, Skill, ForeignLanguage}
    1.   Primary Key  {Employee, Skill, Language }
    2.   Each employee can speak multiple languages
    3.   Each employee can have multiple skills          Employee         Skill           Language
    4.   Thus violates 4NF                               1234             Cooking         French
                                                         1234             Cooking         German
                                                         1453             Carpentry       Spanish
                                                         1453             Cooking         Spanish
                                                         2345             Cooking         Spanish
                        4NF - Decomposition
1.        Move the two multi-valued relations to separate tables
2.        Identify a primary key for each of the new entity.
Example 1 (Convert to 3NF)
          Old Scheme  {MovieName, ScreeningCity, Genre}
          New Scheme  {MovieName, ScreeningCity}
          New Scheme  {MovieName, Genre}
     Movie              Genre    Movie              ScreeningCity
     Hard Code          Comedy   Hard Code          Los Angles
     Bill Durham        Drama    Hard Code          New York
     The Code Warrier   Horror   Bill Durham        Santa Cruz
                                 Bill Durham        Durham
                                 The Code Warrier   New York
            4NF - Decomposition
Example 2 (Convert to 4NF)                         Manager   Child   Manager   Employee
    Old Scheme  {Manager, Child, Employee}        Jim       Beth    Jim       Alice
    New Scheme  {Manager, Child}                  Mary      Bob     Mary      Jane
                                                                     Mary      Adam
    New Scheme  {Manager, Employee}
Example 3 (Convert to 4NF)
    Old Scheme  {Employee, Skill, ForeignLanguage}
    New Scheme  {Employee, Skill}
    New Scheme  {Employee, ForeignLanguage}
           Employee   Skill         Employee   Language
           1234       Cooking       1234       French
           1453       Carpentry     1234       German
           1453       Cooking       1453       Spanish
           2345       Cooking       2345       Spanish
      Fifth Normal Form (5NF)
•   Fifth normal form is satisfied when all tables are broken into
    as many tables as possible in order to avoid redundancy.
    Once it is in fifth normal form it cannot be broken into
    smaller relations without changing the facts or the meaning.
    Domain Key Normal Form (DKNF)
•   The relation is in DKNF when there can be no insertion or
    deletion anomalies in the database.
    Boyce–Codd Normal Form
• A relation is in BCNF, if and only if, every
  determinant is a candidate key.
• Identify all the determinants and make sure that
  they are candidate keys.
• Functional dependency A→B,
• 3NF allows this dependency in a relation if B is
  a primary-key attribute and A is not a candidate
  key
• BCNF insists that for this dependency to remain
  in a relation, A must be a candidate key.
                   BCNF
• The potential to violate BCNF may occur
  when:
• The relation contains two (or more)
  composite candidate keys; or
• The candidate keys overlap, that is have at
  least one attribute in common
  Example
  Primary Key : (clientNo, interviewDate)
  3 candidate keys:
  (clientNo, interviewDate)
  (staffNo,interviewDate, interviewTime)
  (roomNo, interviewDate, interviewTime)
ClientInterview (clientNo, interviewDate,
interviewTime, staffNo, roomNo)
Functional Dependencies
• (staffNo, interviewDate) is not a candidate key
  this FD is allowed in 3NF because roomNo is a
  primary-key attribute being part of the candidate
  key (roomNo, interviewDate, interviewTime).
• As there are no partial or transitive dependencies
  on the primary key (clientNo, interviewDate) and
  fd4 is allowed, the ClientInterview relation is in
  3NF.
• BCNF requires that all determinants in a relation
  must be a candidate key for the relation. So this
  relation is not in BCNF.
      Transform to BCNF
• We must remove the violating FD by creating
  two new relations called Interview & StaffRoom
  First normal form (1NF) to Boyce–
  Codd Normal Form (BCNF)
Candidate keys (propertyNo, iDate), (staffNo, iDate, iTime),
and (carReg, iDate, iTime).
Primary Key - (propertyNo, iDate)
        First Normal Form (1NF)
  • Repeating Group = (iDate, iTime,
    comments, staffNo, sName, carReg)
StaffPropertyInspection (propertyNo, iDate, iTime,
pAddress, comments, staffNo, sName, carReg)
    Second Normal Form (2NF)
Following Functional Dependencies exists
                      2NF
  • To transform the relation into 2NF requires
    the creation of new relations
•These relations are in 2NF, as every non-primary-
key attribute is functionally dependent on the
primary key of the relation
   3NF - Identify Transitive Dependency
•Property relation does not have transitive dependencies
•PropertyInspection have following transitive dependencies
3 NF - Decompose
                  BCNF
• A relation is in BCNF if every determinant
  of a relation is a candidate key.
• Identify all the determinants and make sure
  they are candidate keys.
             BCNF – Identify FD
PropertyInspect is not in BCNF because determinant (staffNo,
iDate), is not a candidate key (represented as fd4).
         BCNF - Decompose
• The PropertyInspect relation may suffer
  from update anomalies.
• To change the car allocated to staff number
  SG14 on the 22-Apr-03, we must update
  two tuples.
        BCNF - Decompose
• Must remove the dependency that violates
  BCNF by creating two new relations called
  StaffCar and Inspection
After Normalization