CXB 3104 Advanced Database Systems
Relational Data Manipulation
 Relational tables are sets. Rows of the tables can be considered as elements of the set Operations that can be performed on sets can be done on relational tables.
Lecture 3
Muthu
muthma@sunway.edu.my
Union Operator
The union operation of two relational tables is formed by appending rows from one table to those of a second table to produce a third.  Duplicate rows are eliminated.  Tables that are union compatible must have the same number of columns and corresponding columns must come from the same domain.
Union Operator
Difference Operator
Difference Operator
The difference of two relational tables is a third that contains those rows that occur in the first table but not in the second Requires that the tables be union compatible.
Intersection Operator
 The intersection of two relational tables is a
Intersection Operator
third table that contains common rows.
 Requires that the tables be union compatible.
Product Operator
Product Operator
The product of two relational tables is called the Cartesian product. It is the concatenation of every row in one table with every row in the second. The product of table A (having m rows) and table B (having n rows) is the table C (having m x n rows).
9 10
Projection & Selection Operators
Join Operator
 
Projection
 The project operator retrieves a subset of columns from a table, removing duplicate rows from the result.  Yields vertical subset of a table
combines the product, selection, and projection combines data from one row of a table with rows from another or the same table when certain criteria are met. criteria involves a relationship among the columns in the join relational table. If the join criterion is based on equality of column value, the result is called an equijoin. A natural join removes redundant columns.
12
Selection
 The select operator retrieves subsets of rows from a relational table based on a value(s) in a column or columns.  Yields a horizontal subset of a table
11
Division Operator Join Operator
Results in columns values in one table for which there are other matching column values corresponding to every row in another table.
13
14
Data Dictionary
 It provides details of all tables found within the
Data Dictionary
database.
 It contains all the attribute name and characteristics
for each table in the system. data
 The data dictionary contains metadata - data about
15
16
Database Tables and Normalization
Normalization
 Process for evaluating and correcting table structures to minimize data redundancies
Normalization
 helps eliminate data anomalies
 Works through a series of stages called normal forms:
 Normal form (1NF)  Second normal form (2NF)  Third normal form (3NF)
18
Database Tables and Normalization
 2NF is better than 1NF; 3NF is better than 2NF  For most business database design purposes, 3NF is highest we need to go in the normalization process  Highest level of normalization is not always most desirable
What is Normalisation?
In a RDB normalisation is crucial for:  retaining data consistency on updates  Minimizing data redundancy and therefore reducing file space required in the database  Minimize data storage
Key Concepts in normalization are Functional Dependency and keys
19 20
Update Anomalies
 
Deletion Anomalies
Tables that have redundant data may have problems called update anomalies. Consider the following table of data on products and required manufacturing equipment:
Product name Ethylene Ethylene Styrene Styrene Equipment name cooler heater pump heater Product manager Smith Smith Jones Jones Equipment supplier Westwood Westwood Davison Westwood Supplier address Derby Derby Rugby Derby
If a row is deleted that represents the last product with a particular piece of equipment, the equipment details are also lost - this is a deletion anomaly.
21
22
Deletion Anomalies example
Insert Anomalies
Insert anomalies  New rows that are entered must always have consistent sets of product and/or equipment - human error may lead to inconsistencies  Product and equipment data cannot be entered separately without using null values - this might violate primary keys
Product name Ethylene Ethylene Styrene Styrene
Equipment name cooler heater pump heater
Product manager Smith Smith Jones Jones
Equipment supplier Westwood Westwood Davison Westwood
Supplier address Derby Derby Rugby Derby
Loss of equipment Supplier Details
23
24
Insert Anomalies example
Modification Anomalies
An update on the values of product or equipment in one row must also be performed on all the other rows that have the same product or equipment, or inconsistencies will occur in the data modification anomaly
Potential for inconsistent data sets
25
26
Modification Anomalies example
Normal Forms
Normal Forms are rules developed to avoid logical inconsistencies from table update operations. Each normal form prohibits a form of redundancy in table organisation that could yield meaningless results if one table were updated independently of other tables or other rows in the table.
27
28
Normal Forms
  
Normal Forms
There are multiple levels of normal forms. Each higher level adds in an additional constraint to the level preceding it. As the database design satisfies higher level normal forms the tables become more fragmented. This means that:
 As data consistency is improved database navigation and hence queries become slower  The tables become less like the real-world system they represent.
The six normal form levels are:
      1st Normal Form (1NF) 2nd Normal Form (2NF) 3rd Normal Form (3NF) Boyce-Codd Normal Form (BCNF) 4th Normal Form (4NF) 5th Normal Form (5NF)
Usually ensuring that the database satisfies the third normal form is sufficient for data consistency.
29
30
First Normal Form- 1
First Normal Form  2
A table is in first normal form when a primary key can be defined for each combination of data. In the example below, a row has two entries in column Equipment name.
Table violates INF
This can be remedied by using two rows for the dual entry:
Table satisfies INF Table violates 2NF
Product name Ethylene Ethylene Styrene Styrene
Equipment name cooler heater pump heater
Product manager Smith Smith Jones Jones
Equipment supplier Westwood Westwood Davison Westwood
Supplier address Derby Derby Rugby Derby
Product name Ethylene Styrene Styrene
Equipment name cooler, heater pump heater
Product manager Smith Jones Jones
Equipment supplier Westworld Davison Westworld
Supplier address Derby Rugby Derby
Primary key: (Product name, Equipment name)
31
32
First Normal Form  Example Rules
    First Normal Form  Determinancy Diagram
Product Name Product Manager
Manufacture of each product uses a set of equipment. A piece of equipment can be used in the manufacture different products. Each equipment used in the manufacture of each product has a supplier. Each product has a product manager Each supplier has an address.
Equipment Name
Equipment Supplier
Supplier Address
33
34
Second Normal Form - 1
Second Normal Form  2
A table is in second normal form when it satisfies first normal form and each row has a primary key. Each non-primary key must fully depend on the primary key. In the example, Equipment supplier and Supplier address depend on the chosen primary key (Product name, Equipment name) but Product Manager only depends on part of the primary key.
This can be remedied by splitting off this partial dependency into another table:
Tables satisfy 2NF Tables violate 3NF
Product name Ethylene Styrene
Product manager Smith Jones
Product name Ethylene Ethylene Styrene Styrene
Equipment name cooler heater pump heater
Equipment supplier Westwood Westwood Davison Westwood
Supplier address Derby Derby Rugby Derby
Primary key: (Product name)
Primary key: (Product name, Equipment name)
35
36
Third Normal Form - 1
Third Normal Form  2
A table is in third normal form when it satisfies second normal form and each nonprimary key column directly depends on the primary key. In the example, 3NF is violated because there is transitive dependency. Supplier address depends on Equipment supplier which in turn depends on the primary key.
This can be remedied by splitting off this indirect dependence into a further table:
Tables satisfy 3NF
Product name Ethylene Styrene
Product manager Smith Jones
Equipment supplier Westwood Davison
Supplier address Derby Rugby
Product name Ethylene Ethylene Styrene Styrene
Equipment Equipment name supplier cooler heater pump heater Westwood Westwood Davison Westwood
Primary key: (Product name)
Primary key: (Equipment supplier)
Primary key: (Product name, Equipment name)
37
38
Other Normal Forms
 The data in a RDB is free of redundancy when it is in the fifth normal form. In this state an update on a column in any table should not lead to data inconsistencies occurring. In practice it is adequate to normalise data into the 3NF.
39