BAN 610 Database Management
Instructor:
                Peng Xie
                Department of
                Management
                                    Relation
A relation is a named, two-dimensional table of data which consists of:
   - rows, also called records.
   - named columns, also called attributes or fields.
Example:
   EMPLOYEE_REG
employee_id      last_name        first_name       department   registration
123456789        Gruber           Stacey           IT           Part time
111222333        Clayton          John             marketing    Full time
                                     Relation
 A relation is a named, two-dimensional table of data which consists of:
 Example:
      EMPLOYEE_REG
 employee_id     last_name        first_name      department      registration
 123456789       Gruber           Stacey          IT              Part time
 111222333       Clayton          John            marketing       Full time
1. The table must have a unique name
2. Every record (row) must be unique (can’t allow two rows with exactly the same values
   for all their attributes)
3. Attributes (columns) in tables must have unique names
4. The order of the columns must be irrelevant
5. The order of the rows must be irrelevant
                         Relational Databases
                             Primary Key
- Primary Key
    - An attribute
    - Uniquely identify a record
    - Single Key
    - Composite Key
- Example
    - Student ID
    - SSN
            Example of Primary Keys
CUSTOMERS
                                      Single Key
SALES
                                      Composite Key
                                  More on keys
-   Super keys
     - The set of all combination of attributes that helps uniquely identify an entity
     - Suppose we have the table
          Emp_SSN               EID                    Emp_name
          1234556               332                    A
          3323234               232                    B
          3323232               223                    A
          1234145               321                    C
     -   The super keys can be
          - {Emp_SSN}
          - {EID}
          - {Emp_SSN, EID}
          - {Emp_SSN,Emp_name}
          - {EID, Emp_name}
                                                                                         6
                                 More on keys
-   Candidate keys
     - The minimal super key with no redundant value
          Emp_SSN              EID                     Emp_name
          1234556              332                     A
          3323234              232                     B
          3323232              223                     A
          1234145              321                     C
     -   The candidate keys can be
          - {Emp_SSN}
          - {EID}
-   Primary key
     - A primary key is selected from candidate keys
                                                                  7
                                 More on keys
-   Another example of super keys and candidate keys
     - Consider the following table
         Classroom       Reserved       Duration       Type
         1               10:00          1              Dissertation
         1               11:00          2              Dissertation
         1               14:00          3              Conference
         2               12:00          1.5            Presentation
         2               14:00          2.5            Presentation
         2               17:00          3.5            Lecture
     -   The super keys are
          - {Classroom, Reserved},
          - {Classroom, Reserved, Type}
          - {Classroom, Reserved, Duration}
          - {Classroom, Reserved, Duration, Type}
     -   Why {Classroom, Duration} is not listed?
                                                                      8
                                More on keys
-   Another example of super keys and candidate keys
     - Consider the following table
         Classroom      Reserved        Duration       Type
         1              10:00           1              Dissertation
         1              11:00           2              Dissertation
         1              14:00           3              Conference
         2              12:00           1.5            Presentation
         2              14:00           2.5            Presentation
         2              17:00           3.5            Lecture
     -   The candidate keys are
          - {Classroom, Reserved}
                                                                      9
                                  Normalization
-   The purpose of Normalization
     - To fully get rid of the above-mentioned anomalies
-   5 forms of normalization
     - 1 NF
     - 2 NF
     - 3 NF
     - 3.5NF (BCNF)
     - 4 NF
     - 5 NF
-   Each form is stricter than the last form
                 Relational Databases Normalization
-   Normalization goal:
    - Reduce redundancies to a minimum so that data updating
       anomalies are removed without affecting information
       retrievability.
                           Functional Dependency
-   Notation
    - A relation is denoted r(R), r is the name of the relation, R is the
        set of all attributes.
    - In the following example, the R is (A,B,C,D)
    -   Super key is a subset K of R, so that for any two tuples in r(R), 𝑡1
        and 𝑡2 , if 𝑡1 ≠ 𝑡2 , we must have 𝑡1 𝐾 ≠ 𝑡2 [𝐾].
    -   e.g. in the above example, one super key is (A,B), so for any two
        tuples such as 𝑡1 = (𝑎1 , 𝑏1 , 𝑐1 , 𝑑1 ), and 𝑡2 = (𝑎1 , 𝑏2 , 𝑐1 , 𝑑2 ). If 𝑡1 ≠
        𝑡2 , we must have
    𝑡1 𝐾 = 𝑡1 𝐴, 𝐵        = 𝑎1 , 𝑏1 ≠ 𝑡2 𝐾 = 𝑡1 𝐴, 𝐵                = (𝑎1 , 𝑏2 )
                       Functional Dependency
-   Functional dependency
    - For r(R), if it satisfies the functional dependency 𝛼 → 𝛽, then
        for any pairs of tuples, 𝑡1 and 𝑡2 , we must have:
        - if 𝑡1 𝛼 = 𝑡2 𝛼 , then 𝑡1 𝛽 = 𝑡2 𝛽
    -    In the previous example
    -   Functional dependency A → 𝐶 satisfied, since whenever column
        A is the same, the column C must be the same as well.
    -   However, C → 𝐴 is not satisfied.
    -   Does the following hold?
        - BC →AD?
        - BD →AC?
                        Functional Dependency
-   Functional dependency is logical
    - Consider the following relation
        Building            Room number         Capacity
        A                   121                 120
        B                   203                 20
        C                   302                 10
        D                   121                 120
        D                   231                 130
    -    Just from the current instances in the relation, we have
    -    Room number-> capacity
    -    But in the future, when we add more data to this table, it is very
         likely that the same room number in different building have
         different capacity.
    -    So, logically, room number -> capacity doesn’t hold.
    -    Remember, functional dependency is always logical.
                                         Normalization
-    NF1
      -     Each column must be single value (atomic value)
      -     Each column should store the same type value
      -     Each column must have a unique name
      -     Order of rows are irrelevant
    Emp_id    Last_ name   First_ name   Dept    Salary   Hire_ date     Training_ course        Course_name
        10020 Marvin       John          Mktg       65000          38047                 101,205 A,B
        10300 Carter       Michael       Sales      60000          38384                 103,210 C,D
        20139 Gates        Susan         Acctg      62000          38777                     202 E
        20040 Sanchez      Jose          Finance    58000          39295                     210 D
        21113 Li           Ping          Ops        55000          39692                     207 F
       -    The Training_course and the Course_name contain double values
       -    Violating the rules
                                                Normalization
-    NF1
      - We break down these columns into multiple columns to achieve NF1
    Emp_id Last_ name   First_ name   Dept      Salary Hire_   date Training_ course     Course_date Course_name
      10020 Marvin      John          Mktg        65000         38047                101         38078 A
      10020 Marvin      John          Mktg        65000         38047                205         38231 B
      10300 Carter      Michael       Sales       60000         38384                103         38565 C
      10300 Carter      Michael       Sales       60000         38412                210         38930 D
      20139 Gates       Susan         Acctg       62000         38777                202         39326 E
      20040 Sanchez     Jose          Finance     58000         39295                210         39661 D
      21113 Li          Ping          Ops         55000         39692                207         40057 F
                                          Normalization
-   NF2
     - NF1 is satisfied
     - Partial dependency does not exist
     - But what is partial dependency?
-   Example of partial dependency
     - The primary key for this table is Emp_id and Training_course
     - The non-key attribute first_name depends only on Emp_id
     - Or, given the Emp_id, the first_name is known for sure
     - But the first_name has nothing to do with Training_course
     - Or, For the training_course 210, there are two different first_name.
     - A non-key attribute only depend on one of the keys -> partial dependency
     - Solution -> split table
       Emp_id Last_ name   First_ name   Dept      Salary Hire_   date Training_ course     Course_name
         10020 Marvin      John          Mktg        65000         38047                101 A
         10020 Marvin      John          Mktg        65000         38047                205 B
         10300 Carter      Michael       Sales       60000         38384                103 C
         10300 Carter      Michael       Sales       60000         38412                210 D
         20139 Gates       Susan         Acctg       62000         38777                202 E
         20040 Sanchez     Jose          Finance     58000         39295                210 D
         21113 Li          Ping          Ops         55000         39692                207 F
                                      Normalization
-   NF3
     - NF2 satisfied
     - No transitive dependency exists
     - But what is transitive dependency?
-   Example of transitive dependency
     - In the following table, the primary key is Emp_id
     - Since there is only one key in primary key, this table is in NF2
     - However, we have Emp_ID -> Emp_state -> Emp -> Country
     - So Emp_Country transitively dependant on the Emp_ID
     - Solution -> split table
      Emp_id                name        Emp_state   Emp_country   age
                       10020Marvin      CA          US                  65000
                       10300Carter      GA          US                  60000
                       20040Sanchez     TX          US                  58000
                       21113Li          NY          US                  55000
                                 Normalization
-   NF3.5 (Boyce-Codd normalization form)
     - NF3 satisfied
     - Non-prime attributes don’t determine one of the keys
     - This is uncommon
-   Example
     - In the following table the primary key is Person and Shop Type
     - The nearest shop depends on both person and shop type -> 2NF
     - There is only one non-key attribute -> no transitive dependency -> 3NF
     - But nearest shop determines the shop type -> not NF3.5
     - Solution -> separating to {person, shop} and {shop, shop type}
     - But we sacrifice some query functionality
             Person                 Shop Type             Nearest Shop
                             David               Optician              Eagle Eye
                             David            Hairdresser               Snippets
                            Wright             Bookshop             Merlin Books
                             Fuller               Bakery               Doughy’s
                             Fuller           Hairdresser        Sweeney Todd’s
                             Fuller              Optician              Eagle Eye
                                         Normalization
-   NF4
     - NF3.5 satisfied
     - No multi-value dependency
-   Example
     - In the following table, John is associated with a set of hobbies (Ball, Singing) and a
        set of phone numbers (1121, 1122)
     - The two sets have more than one element
     - And the Hobby and the Phone number are independent with each other
     - This created the multi-value dependency problem
         Emp_id              Last_ name       First_ name      Hobby               Phone number
                        10020Marvin           John                            Ball           1121
                        10020Marvin           John                         Singing           1122
     -    Imagine you could easily imply the following redundant records in the same table
         Emp_id             Last_ name      First_ name     Hobby             Phone Number
                       10020Marvin          John                          Ball         1122
                       10020Marvin          John                       Singing         1121
     -    Solution -> split table
                                   Normalization
-   NF5
     - NF4 satisfied
     - No join dependency exist
-   Join dependency
     - Nothing depend on others logically -> no multivalued dependency -> NF4 satisfied
     - If we decompose the relation into multiple smaller relationships, and then join
         them, the result relationship should be exactly the same as the original relation,
         then join dependency exists->not NF5, need table splitting. Otherwise, join
         dependency does not exist->NF5.
     - Suppose we have the following relation
          Buyer                  Seller                Lender
          Smith                  Jones                 BOA
          Smith                  Wilson                Chase
          Nelson                 Jones                 Chase
     -   We can decompose the relation into three
                                   Normalization
-   The three smaller relations
     - Buyer and Seller
         Buyer                Seller
         Smith                Jones
         Smith                Wilson
         Nelson               Jones
     -   Buyer and lender
          Buyer              Lender
          Smith              BOA
          Smith              Chase
          Nelson             Chase
     -   Seller and lender
          Seller             Lender
          Jones              BOA
          Wilson             Chase
          Jones              Chase
                                 Normalization
-   Join them together again.
     - not the original relation -> no join dependency -> if we split into three relations,
          we loss information -> NF5 is already satisfied.
         Buyer                  Seller                 Lender
         Smith                  Jones                  BOA
         Smith                  Jones                  Chase
         Smith                  Wilson                 Chase
         Nelson                 Jones                  Chase
                                  Normalization
-   Example of Join Dependency.
         Subject                   Professor      Semester
         Maths 2                   S              2
         Data structure            M              3
         Deep learning             A              5
         Data structure            S              4
         Deep learning             T              5
         Computer network          K              2
         Digital logical design    A              5
     -   Please verify
                                     Normalization
-   If we follow these rules specifically to normalize a table, it is going to take a long time
-   Follow these steps instead
           - identify entities in the stored information
                - An entity is a person, place, object, event, concept about whom/which
                    the organization/user wishes to maintain data
                - An attribute is NOT an entity but a feature of an entity.
                - Example: employees and training courses are entities. An employee name
                    or a training course number are attributes.
           - Find out the relationship between these entities
                - One-to-one capital city and country
                - One-to-many customer and orders
                - Many-to-many students and courses
           - Split the tables according to entities
                - Each table should only include the attributes regarding the entity
                - Attributes of an instance of the entity should not change over time
                - For many to many relationship, add an association entity
                - The association entity will contain both primary keys of the two entities
                    and any attributes regarding to the interaction of the two entity
                          Example of Normalization
- The First normal form (1NF)
   - A table in which the intersection of every column and record contains only
       one value
                          Example of Normalization
- The First normal form (1NF)
   - A table in which the intersection of every column and record contains only
       one value
                          Example of Normalization
- The Second form (2NF)
                          Example of Normalization
- The Second form (2NF)
                           Example of Normalization
- The Second form (3NF)
   - table that is in 1NF and 2NF and in which all non-primary-key column can be
       worked out from only the primary key column(s) and no other columns.
                          Example of Normalization
- The Second form (3NF)