DBMS and Data Management Lecture Notes
DBMS and Data Management Lecture Notes
1,
2,
3,
1,
2,
3,
1,
2,
3,
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
 Now Suppose in  a  relational  model a  table  T  has  columns  titled  as  A,  B  and  C.  If  T  is
representing  an entity set then A, B, C will be its attributes. Every attribute corresponds
to a limited number of values that can be assigned to it.
 The set of values that can be assigned to a particular column is called domain for that. So
A, B and C will have their specified domains. Suppose those domain sets are denoted as
D
A
, D
B
, and D
C
.
 Any row of the table will have a value from D
A
in first column, D
B
in second column, and
D
C
in  third  column. So  a  row of  relational  table is  similar  to  a tuple  of  a  mathematical
relation between the sets D
A
, D
B
, and D
C
.
 For  Example:  Consider  an BOOK table  having  attributes acc-no, title,  and author. To
make  it  simple  we  restrict  the  domain  for  acc-no.  as A={100, 101, 102},  for  title  as
B={DBMS,  COMPILER,  OS}  and  for  author  as C={Ramanuj,  Aristotle  and
Silbershatz}. That means the first column of BOOK can have  any value from only  A,
second from only B and third from only C. the Cartesian product of these domain sets can
be represented in a tabular form as :
A  B  C =
 Now we can observe any valid table representing the entity set BOOK for a library given
above  restriction  on  domains  will  have  only  a  subset  of  the  rows  from  the  above  table
which represents  A   B  C. For Example a valid entity set for  all books in the library
can be
BOOK
 So  we  can  say  any  table  of  relational  model  is  actually  similar  to  the  mathematical
relation.
100   DBMS   Ramanuj
100   DBMS   Aristotle
100   DBMS   Silbershatz
100   COMPILER   Ramanuj
100   COMPILER   Aristotle
100   COMPILER   Silbershatz
100   OS   Ramanuj
100   OS   Aristotle
200   DBMS   Ramanuj
200   DBMS   Aristotle
200   DBMS   Silbershatz
200   COMPILER   Ramanuj
200   COMPILER   Aristotle
200   COMPILER   Silbershatz
200   OS   Ramanuj
200   OS   Aristotle
200   OS   Silbershatz
300   DBMS   Ramanuj
300   DBMS   Aristotle
300   DBMS   Silbershatz
300   COMPILER   Ramanuj
300   COMPILER   Aristotle
300   COMPILER   Silbershatz
300   OS   Ramanuj
300   OS   Aristotle
300   OS   Silbershatz
100   DBMS   Silbershatz
200   DBMS   Ramanuj
300   COMPILER   Silbershatz
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
 Every row of such relational table is similar to a tuple of a mathematical relation. Let the
tuple variable t refers to  the first tuple (first row) in above mentioned BOOK table then
we can various elements of the tuple as t[acc-no]= 100, t[title]= DBMS and t[author] =
Silbershatz.
Query Languages
 A  language  in  which  a  user  requests  information  from  the  database  is called  a  query
language.
o   Procedural- user instructs the system to perform a sequence of operations on the
database to compute the desired result. E.g. Relational algebra
o   Nonprocedural- user describes the information desired without giving a specific
procedure  for  obtaining  the  desired  information.  E.g.  tuple  calculus,  domain
calculus.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture 10
Relational Algebra:
 Basic operations:
o   Selection ()    Selects a subset of rows from relation.
o   Projection () Selects a subset of columns from relation.
o   Cross-product ()  Allows us to combine two relations.
o   Set-difference () Tuples in reln. 1, but not in reln. 2.
o   Union (U) Tuples in reln. 1 and in reln. 2.
o   Rename( ) Use new name for the Tables or fields.
 Additional operations:
o Intersection (), join( ), division():  Not essential, but (very!) useful.
 Since each operation returns a relation, operations can be composed! (Algebra is
closed.)
Projection
 Deletes attributes that are not in projection list.
 Schema of result contains exactly the fields in the projection list, with the same names
that they had in the (only) input relation. ( Unary Operation)
 Projection operator has to eliminate duplicates!  (as it returns a relation which is a set)
o Note: real systems typically dont do duplicate elimination unless the user
explicitly asks for it.  (Duplicate values may be representing different real world
entity or relationship)
Consider the BOOK table:
Title
(BOOK) =
Selection
 Selects rows that satisfy selection condition.
 No duplicates in result!  (Why?)
   Schema of result identical to schema of (only) input relation.
  Result relation can be the input for another relational algebra operation!  (Operator
composition.)
Acc-no>300
(BOOK) =
Acc-No   Title   Author
100 DBMS Silbershatz
200 DBMS Ramanuj
300 COMPILER Silbershatz
400 COMPILER Ullman
500 OS Sudarshan
600 DBMS Silbershatz
Title
DBMS
COMPILER
OS
Acc-No   Title   Author
400 COMPILER Ullman
500 OS Sudarshan
600 DBMS Silbershatz
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Title=DBMS
(BOOK)=
Acc-no
(
Title=DBMS
(BOOK))=
Union, Intersection, Set-Difference
 All of these operations take two input relations, which must be union-compatible:
o Same number of fields.
o `Corresponding fields have the same type.
 What is the schema of result?
Consider:
Borrower                                          Depositor
List of customers who are either borrower or depositor at bank= 
Cust-name
(Borrower) U
Cust-name
(Depositor)=
Customers who are both borrowers and depositors = 
Cust-name
(Borrower) 
Cust-name
(Depositor)=
Customers who are borrowers but not depositors = 
Cust-name
(Borrower)  
Cust-name
(Depositor)=
Acc-No   Title   Author
100 DBMS Silbershatz
200 DBMS Ramanuj
600 DBMS Silbershatz
Acc-No
100
200
600
Cust-name   Loan-no
Ram L-13
Shyam L-30
Suleman L-42
Cust-name   Acc-no
Suleman A-100
Radheshyam A-300
Ram A-401
Cust-name
Ram
Shyam
Suleman
Radeshyam
Cust-name
Ram
Suleman
Cust-name
Shyam
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-11
Cartesian-Product or Cross-Product (S1  R1)
 Each row of S1 is paired with each row of R1.
   Result  schema has  one  field  per  field  of  S1  and  R1,  with  field  names  `inherited  if
possible.
 Consider the borrower and loan tables as follows:
Borrower:                                  Loan:
Cross product of Borrower and Loan, Borrower  Loan =
The rename operation can be used to rename the fields to avoid confusion when two field
names are same in two participating tables:
For example the statement, 
Loan-borrower(Cust-name,Loan-No-1,  Loan-No-2,Amount)
( Borrower  Loan)
results into- A new Table named Loan-borrower is created where it has four fields which
are  renamed  as  Cust-name,  Loan-No-1,  Loan-No-2  and  Amount  and  the  rows  contains
the same data as the cross product of Borrower and Loan.
Loan-borrower:
Rename Operation:
 It can be used in two ways :
o   ( ) return the result of expression E in the table named x.
Cust-name   Loan-no
Ram L-13
Shyam L-30
Suleman L-42
Loan-no   Amount
L-13 1000
L-30 20000
L-42 40000
Borrower.Cust-
name
Borrower.Loan-
no
Loan.Loan-
no
Loan.Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
Cust-name   Loan-No-1   Loan-No-2   Amount
Ram L-13 L-13 1000
Ram L-13 L-30 20000
Ram L-13 L-42 40000
Shyam L-30 L-13 1000
Shyam L-30 L-30 20000
Shyam L-30 L-42 40000
Suleman L-42 L-13 1000
Suleman L-42 L-30 20000
Suleman L-42 L-42 40000
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
o
  (   ,   ,,   )
( ) return  the  result  of  expression  E  in  the  table  named x with
the attributes renamed to A
1,
A
2,
, A
n
.
o Its  benefit  can  be  understood  by  the  solution  of  the  query    Find  the  largest
account balance in the bank
It can be solved by following steps:
1. Find out the relation of those balances which are not largest.
a. Consider Cartesion product of Account with itself i.e. Account
 Account
b. Compare  the  balances  of  first  Account  table  with  balances  of
second Account table in the product.
c. For  that  we  should  rename  one  of  the  account  table  by  some
other name to avoid the confusion
d. It can be done by following operation
Account.balance
 (
Account.balance < d.balance
(Account
d
(Account))
e. So  the  above  relation  contains  the  balances  which  are  not
largest.
2. Subtract this relation from the relation containing all the balances i.e .
balance
 (Account).
3. So the final statement for solving above query is
balance
 (Account)- 
Account.balance
 (
Account.balance < d.balance
(Account 
d
(Account))
Additional Operations
Natural Join (    )
 Forms Cartesian product of its two arguments, performs selection forcing equality
on those attributes that appear in both relations
 For example consider Borrower and Loan relations, the natural join between them
Borrower  Loan will automatically perform the selection on the table returned
by Borrower    Loan  which  force  equality  on  the  attribute  that  appear  in  both
Borrower and Loan i.e. Loan-no and also will have only one of the column named
Loan-No.
 That means Borrower  Loan = 
Borrower.Loan-no = Loan.Loan-no
(Borrower  Loan).
 The table returned from this will be as follows:
Eliminate rows that does not satisfy the selection criteria 
Borrower.Loan-no = Loan.Loan-
no
from Borrower  Loan =
Customer-name. branch-name
 (Depositor  )  
branch-name
 (
Branch-city=Agra
(Branch)
 The  division operations can  be  specified  by  using  only  basic  operations  as
follows: Let r(R) and s(S) be given relations for schema R and S with S  R
r  s = 
R-S
(r) - 
R-S
((
R-S
(r)  s) - 
R-S,S
(r))
Cust-name   Loan-no   Amount
Ram L-13 1000
Shyam L-30 20000
Suleman L-42 40000
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-12
Tuple Relational Calculus
 Relational algebra is an example of procedural language while tuple relational
calculus is a nonprocedural query language.
 A query is specified as:
{t | P(t)}, i.e it is the set of all tuples t such that predicate P is true for t.
 The formula P(t) is formed using atoms which uses  the relations, tuples of
relations and fields of tuples and following symbols
o ( belongs to),<,>,,,,=, (comparison operators)
 These atoms can then be used to form formulas with following symbols
o  ( universal qualifier  generally called "for all")
o  ( existential qualifier generally called "there exists")
o  ( and), (or), ( not)
 For  example  :  here  are  some  queries  and  a  way to  express  them  using  tuple
calculus:
o Find the branch-name, loan-number and  amount  for loans over Rs 1200.
{t| t  Loan  t[amount] > 1200}.
o Find  the  loan  number  for  each  loan  of  an  amount  greater  that  Rs1200.
{t|  s  Loan(t[loan-number] = s[loan-number]  s[amount] >1200}
o Find  the  names  of  all  the  customers  who  have  a  loan  from  the  Sadar
branch.
{t |  s  Borrower ( t customer-name =s customer-name 
 u  Loan ( u[loan-number] = s[loan-number
 u[branch-name] = "Sadar"))}
o Find all customers who have a loan , an account, or both at the bank
{t|  s  Borrower ( t[customer-name] = s[customer-name])
  u  Depositor (t[customer-name] = u[customer-name])}
o Find only those customers who have both an account and a loan.
{t|  s  Borrower ( t[customer-name] = s[customer-name])
  u  Depositor (t[customer-name] = u[customer-name])}
o Find all customers who have an account but do not have loan.
{t| u  Depositor (t[customer-name] = u[customer-name]) 
  s  Borrower ( t[customer-name] = s[customer-name])}
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
o Find all customers who have an account at all branches located in Agra
{t |  w  Branch( w[branch-city] = "Agra" 
 s  Depositor ( t customer-name = s customer-name
  u  Account ( u[account-number] = s[account-number]
 u[branch-name] = w[branch-name])))}
Domain Relational Calculus
 Domain relational calculus is another non procedural language for expressing database
queries.
 A query is specified as:
{<x
1
,x
2
,,x
n
> | P(x
1
,x
2
,,x
n
)} where x
1
,x
2
,,x
n
represents domain variables. P
represent a predicate formula as in tuple calculus
 Since the domain variables are referred in place of tuples the formula doesnt refer the
fields of tuples rather they refer the domain variables.
 For example the queries in domain calculus are mentioned as follows:
o Find  the  branch-name,  loan-number  and  amount for  loans  over  Rs  1200.
{<b, l, a>| <b, l, a> Loan  a >1200}.
o Find  the  loan  number  for  each  loan  of  an  amount  greater  that  Rs1200.
{< l >|  b,a( <b, l, a> Loan  a >1200}
o Find  the  names  of  all  the  customers  who  have  a  loan  from  the  Sadar  branch  and
find the loan amount
{<c, a >|  l(<c, l > Borrower
 b( <b, l, a > Loan  b="Sadar"))}
o Find  names  of  all  customers  who  have  a  loan  ,  an  account,  or  both  at  the  Sadar
Branch
{<c>|  l(<c, l > Borrower    b, a(<b, l, a> Loan  b ="Sadar"))
  a(<c, a> Depositor   b, n(<b, a, n> Account  b ="Sadar"))}
o Find only those customers who have both an account and a loan.
{<c>|  l(<c, l> Borrower )   a(<c, a> Depositor )}
o Find all customers who have an account but do not have loan.
{t|  a(<c, a> Depositor )    l(<c, l> Borrower )}
o Find all customers who have an account at all branches located in Agra
{<c>|  x, y, z(<x, y, z> Branch)  y = "Agra" 
 a, b(<x, a, b> Account  <c, a> Depositor)}
Outer Join
 Outer join operation is an extension of join operation to deal with missing information
 Suppose that we have following relational schemas:
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Employee( employee-name, street, city)
Fulltime-works(employee-name, branch-name, salary)
A snapshot of these relations is as follows:
Employee:
Fulltime-works
Suppose we want complete information of the full time employees.
 The natural join (Employee Fulltime-works)will result into the loss of information for
Suleman  and  Rehman  because  they  dont  have  record  in  both  the  tables (  left  and  right
relation). The outer join will solve the problem.
 Three forms of outer join:
o   Left  outer  join():the  tuples  which  doesnt  match  while  doing  natural  join
from  left  relation  are  also  added  in  the  result  putting  null  values  in  missing  field
of right relation.
o   Right  outer  join():the  tuples  which  doesnt  match  while  natural  join  from
right relation are also added in the result putting null values in missing field of left
relation.
o   Full outer join(): include both of the left and right outer joins i.e. adds the
tuples  which  did  not  match  either  in  left  relation  or  right  relation  and  put  null  in
place of missing values.
 The result for three forms of outer join are as follows:
Left join: Employee Fulltime-works=
Right join: Employee  Fulltime-works=
Full join: Employee  Fulltime-works=
employee-name   street   city
Ram M G Road Agra
Shyam New Mandi Road Mathura
Suleman Bhagat Singh Road Aligarh
employee-name   branch-name   salary
Ram Sadar 30000
Shyam Sanjay Place 20000
Rehman Dayalbagh 40000
employee-name   street   City   branch-name   salary
Ram M G Road Agra Sadar 30000
Shyam New Mandi Road Mathura Sanjay Place 20000
Suleman Bhagat Singh Road Aligarh Null Null
employee-name   street   city   branch-name   salary
Ram M G Road Agra Sadar 30000
Shyam New Mandi Road Mathura Sanjay Place 20000
Rehman null null Dayalbagh 40000
employee-name   street   city   branch-name   salary
Ram M G Road Agra Sadar 30000
Shyam New Mandi Road Mathura Sanjay Place 20000
Suleman Bhagat Singh Road Aligarh null null
Rehman null null Dayalbagh 40000
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Aggregate Functions
 Aggregate  functions  are  functions  that  take  a  collection  of  values  and  return  a  single
value as a result.
 Examples are sum, avg, count, max, min.
 Find the total balance of all the accounts
sum
balance
(Account).
 Find the no of borrowers
count
customer-name
(Borrower)
 Find the distinct customers who are either borrowers or depositors.
count-distinct
customer-name
(Borrower  Depositor)
 The aggregate functions can be applied on sub groups of the rows in the table rather than
on all of the rows of table using the denoted by symbol( ).
 For example we want to find the total salary of all the full time employees branch wise. It
can be specified as follows:
branch-name
(Fulltime-works)
Fulltime-works
The result of aggregate function with grouping specified above will be:
employee-name   branch-name   salary
Ram Sadar 30000
Shyam Sanjay Place 20000
Rehman Dayalbagh 40000
Suleman Sadar 25000
branch-name   sum of salary
Sadar 55000
Sanjay Place 20000
Dayalbagh 40000
Group1: branch name = sadar
Group2: branch name = sanjay place
Group3: branch name = Dayalbagh
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-13
Structured Query Language (SQL)
Introduction
 Commercial database systems use more user friendly language to specify the queries.
 SQL is the most influential commercially marketed product language.
 Other commercially used languages are QBE, Quel, and Datalog.
Basic Structure
 The basic structure of an SQL consists of three clauses: select, from and where.
   select: it corresponds to the projection operation of relational algebra. Used to list the
attributes desired in the result.
   from: corresponds to the Cartesian product operation of relational algebra. Used to list
the relations to be scanned in the evaluation of the expression
   where: corresponds to the selection predicate of the relational algebra. It consists of a
predicate involving attributes of the relations that appear in the from clause.
 A typical SQL query has the form:
select A
1
, A
2
,, A
n
fromr
1
, r
2
,, r
m
where P
o A
i
represents an attribute
o r
j
represents a relation
o P is a predicate
o It is equivalent to following relational algebra expression:
o   
A
1
,A
2
,,A
n
 (
P
(r
1
 r
2
 r
m
))
[Note: The words marked in dark in this text work as keywords in SQL language. For example
select, from and where in the above paragraph are shown in bold font to indicate that
they are keywords]
Select Clause
Let us see some simple queries and use of select clause to express them in SQL.
 Find the names of all branches in the Loan relation
select branch-name
from Loan
 By default the select clause includes duplicate values. If we want to force the elimination
of duplicates the distinct keyword is used as follows:
select distinct branch-name
from Loan
 The all key word can be used to specify explicitly that duplicates are not removed. Even
if we not use all it means the same so we dont require all to use in select clause.
select all branch-name
from Loan
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
 The asterisk * can be used to denote all attributes. The following SQL statement will
select and all the attributes of Loan.
select *
from Loan
 The arithmetic expressions involving operators, +, -, *, and / are also allowed in select
clause. The following statement will return the amount multiplied by 100 for the rows in
Loan table.
select branch-name, loan-number, amount * 100
from Loan
Where Clause
 Find all loan numbers for loans made at Sadar branch with loan amounts greater than
Rs 1200.
select loan-number
from Loan
where branch-name= Sadar and amount  > 1200
   where clause uses uses logival connectives and, or, and not
 operands of the logical connectives can be expressions involving the comparison
operators <, <=, >, >=, =, and < >.
   between can be used to simplify the comparisons
select loan-number
from Loan
where amount between 90000 and 100000
From Clause
 The from clause by itself defines a Cartesian product of the relations in the clause.
 When an attribute is present in more than one relation they can be referred as relation-
name.attribute-name to avoid the ambiguity.
 For all customers who have loan from the bank, find their names and loan numbers
select distinct customer-name,  Borrower.loan-number
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number
The Rename Operation
 Used for renaming both relations both relations and attributes in SQL
 Use as clause: old-name as new-name
 Find the names and loan numbers of the customers who have a loan at the Sadar
branch.
select distinct customer-name, borrower.loan-number as loan-id
from Borrower, Loan
where Borrower.loan-number = Loan.loan-number and
branch-name = Sadar
we can now refer the loan-number instead by the name loan-id.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
 For all customers who have a loan from the bank, find their names and loan-numbers
select distinct customer-name, T.loan-number
from Borrower as T, Loan as S
where T.loan-number = S.loan-number
 Find the names of all branches that have assets greater than at least one branch located in
Mathura.
select distinct T.branch-name
from branch as T, branch as S
where T.assets > S.assets and S.branch-city = Mathura
String Operation
 Two special characters are used for pattern matching in strings:
o Percent ( % ) : The % character matches any substring
o Underscore( _ ): The _ character matches any character
 %Mandi: will match with the strings ending with Mandi viz. Raja Ki mandi,
Peepal Mandi
 _ _ _ matches any string of three characters.
 Find the names of all customers whose street address includes the substring Main
select customer-name
from Customer
where customer-street like %Main%
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-14
Set Operations
   union, intersect and except operations are set operations available in SQL.
 Relations participating in any of the set operation must be compatible; i.e. they must have
the same set of attributes.
 Union Operation:
o Find all customers having a loan, an account, or both at the bank
(select customer-name
fromDepositor )
union
(select customer-name
from Borrower )
It will automatically eliminate duplicates.
o If we want to retain duplicates union all can be used
(select customer-name
fromDepositor )
union all
(select customer-name
from Borrower )
 Intersect Operation
o Find all customers who have both an account and a loan at the bank
(select customer-name
fromDepositor )
intersect
(select customer-name
from Borrower )
o If we want to retail all the duplicates
(select customer-name
fromDepositor )
intersect all
(select customer-name
from Borrower )
 Except Opeartion
o Find all customers who have an account but no loan at the bank
(select customer-name
fromDepositor )
except
(select customer-name
from Borrower )
o If we want to retain the duplicates:
(select customer-name
fromDepositor )
except all
(select customer-name
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
from Borrower )
Aggregate Functions
 Aggregate  functions  are  those  functions  which  take  a  collection  of  values  as  input  and
return a single value.
 SQL offers 5 built in aggregate functions-
o Average: avg
o Minimum:min
o Maximum:max
o Total: sum
o Count:count
 The  input  to  sum and  avg must  be  a  collection  of  numbers  but  others  may  have
collections of non-numeric data types as input as well
 Find the average account balance at the Sadar branch
select avg(balance)
from Account
where branch-name= Sadar
The  result  will  be  a  table  which  contains  single  cell  (one  row  and  one  column)  having
numerical value corresponding to average balance of all account at sadar branch.
   group  by clause  is  used  to  form  groups,  tuples  with  the  same  value  on  all  attributes  in
the group by clause are placed in one group.
 Find the average account balance at each branch
select branch-name, avg(balance)
from Account
group by branch-name
 By default the aggregate functions include the duplicates.
   distinct keyword is used to eliminate duplicates in an aggregate functions:
 Find the number of depositors for each branch
select branch-name, count(distinct customer-name)
from Depositor, Account
where Depositor.account-number = Account.account-number
group by branch-name
   having clause is used to state condition that applies to groups rather than tuples.
 Find the average account balance at each branch where average account balance is more
than Rs. 1200
select branch-name, avg(balance)
from Account
group by branch-name
having avg(balance) > 1200
 Count the number of tuples in Customer table
select count(*)
from Customer
 SQL doesnt allow distinct with count(*)
 When where and having are both present in a statement where is applied before  having.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Nested Subqueries
 A subquery is a select-from-where expression that is nested within another query.
   Set Membership
o The in and not in connectives are used for this type of subquery.
o Find all customers who have both a loan and an account at the bank, this query
can be written using nested subquery form as follows
select distinct customer-name
from Borrower
where customer-name in(select customer-name
fromDepositor )
o Select the names of customers who have a loan at the bank, and whose names are
neither Smith nor Jones
select distinct customer-name
from Borrower
where customer-name not in(Smith, Jones)
   Set Comparison
o Find the names of all branches that have assets  greater than those of  at least one
branch located in Mathura
select branch-name
from Branch
where asstets > some (select assets
from Branch
where branch-city = Mathura )
o Apart from > some others comparison could be < some , <= some , >= some ,
= some , < > some.
o Find  the  names  of  all  branches  that  have assets  greater  than  that  of  each  branch
located in Mathura
select branch-name
from Branch
where asstets > all (select assets
from Branch
where branch-city = Mathura )
o Apart from  > all others  comparison  could  be  < all , <= all , >= all , = all ,
< >all.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing  
 
Department of Electrical and Electronics                                                                                   By: Sulabh Bansal 
 Lecture-15 
Views 
?  In SQL create view command is used to define a view as follows: 
    create view v as <query expression> 
where <query expression> is any legal query expression and v is the view name. 
?  The  view  consisting  of  branch  names  and  the  names  of  customers  who  have  either  an 
account or a loan at the branch. This can be defined as follows: 
create view  All-customer  as 
  (select branch-name, customer-name 
   from Depositor, Account 
   where Depositor.account-number=account.account-number) 
union 
 (select branch-name, customer-name 
  from Borrower, Loan 
  where Borrower.loan-number = Loan.loan-number) 
?  The  attributes  names  may  be  specified  explicitly  within  a  set  of  round  bracket  after  the 
name of view. 
?  The  view  names  may  be  used  as  relations  in  subsequent  queries.  Using  the  view  All-
customer find all customers of Sadar branch 
select customer-name 
from All-customer 
where branch-name= Sadar 
?  A  create-view  clause  creates  a  view  definition  in  the  database  which  stays  until  a 
command - drop view view-name - is executed. 
 
Modification of Database 
?  Deletion 
o  In  SQL  we  can  delete  only  whole  tuple  and  not  the  values  on  any  particular 
attributes. The command is as follows: 
   delete from r 
   where P. 
where P is a predicate and r is a relation. 
o  delete command operates on only one relation at a time. Examples are as follows: 
o  Delete all tuples from the Loan relation 
delete from Loan 
o  Delete all of the Smiths account records 
     delete from Depositor 
     where customer-name = Smith 
o  Delete all loans with loan amounts between Rs 1300 and Rs 1500. 
    delete from  Loan 
    where  amount between 1300 and 1500 
 
 
 
 
www.jntuworld.com
www.jntuworld.com
www.jwjobs.net
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing  
 
Department of Electrical and Electronics                                                                                   By: Sulabh Bansal 
o  Delete the records of all accounts with balances below the average at the bank 
    delete from Account 
    where balance < ( select avg(balance) 
                                   from Account) 
 
?  Insertion 
o  In SQL we either specify a tuple to be inserted or write a query whose result is a 
set of tuples to be inserted. Examples are as follows: 
o  Insert an account of account  number  A-9732 at the Sadar branch  having balance 
of Rs 1200 
    insert into  Account 
               values(Sadar, A-9732, 1200) 
the  values  are  specified  in  the  order  in  which  the  corresponding  attributes  are 
listed in the relation schema. 
o  SQL allows the attributes to be specified as part of the  insert  statement  
     insert into  Account(account-number, branch-name, balance) 
               values(A-9732, Sadar, 1200) 
     insert into  Account(branch-name, account-number, balance) 
               values(Sadar, A-9732, 1200) 
o  Provide  for all loan customers of the Sadar branch a new Rs 200 saving account 
for each loan account they have. Where loan-number serve as the account number 
for these accounts. 
     insert into  Account 
                select branch-name, loan-number, 200 
                from Loan 
                where branch-name = Sadar 
?  Updates 
o  Used to change a value in a tuple without changing all values in the tuple. 
o  Suppose that annual interest payments are being made, and all balances are to be 
increased by 5 percent. 
     update Account 
     set balance = balance * 1.05 
o  Suppose  that  accounts  with  balances  over  Rs10000  receive  6  percent  interest, 
whereas all others receive 5 percent. 
     update Account 
     set balance = balance * 1.06 
     where balance > 10000 
 
     update Account 
     set balance = balance * 1.05 
     where balance <= 10000 
 
 
 
 
www.jntuworld.com
www.jntuworld.com
www.jwjobs.net
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing  
 
Department of Electrical and Electronics                                                                                   By: Sulabh Bansal 
Data Definition Language 
?  Data Types in SQL 
o  char(n): fixed length character string, length n. 
o  varchar(n): variable length character string, maximum length n. 
o  int: an integer. 
o  smallint: a small integer. 
o  numeric(p,d): fixed point number, p digits( plus a sign), and d of the p digits are 
to right of the decimal point. 
o  real, double precision: floating point and double precision numbers. 
o  float(n): a floating point number, precision at least n digits. 
o  date: calendar date; four digits for year, two for month and two for day of month. 
o  time: time of day n hours minutes and seconds. 
?  Domains can be defined as 
   create domain person-name char(20). 
 the  domain  name  person-name  can  be  used  to  define  the  type  of  an  attribute  just  like 
built-in domain. 
?  Schema Definition in SQL 
o  create table  command is used to define relations. 
     create table r (A
1
D
1,
 A
2
D
2
,, A
n
D
n
, 
                              <integrity constraint
1
>, 
                               , 
                              <integrity constraint
k
>) 
where r is relation name, each A
i
 is the name of attribute, D
i
 is the domain type of 
values of A
i
. Several types of integrity constraints are available to define in SQL. 
o  Integrity Constraints which are allowed in SQL are 
          primary key(A
j1,
 A
j2,,
 A
jm
) 
and 
          check(P) where P is the predicate. 
o  drop table command is used to remove relations from database. 
o  alter table  command is used to add attributes to an existing relation 
        alter table r add A D 
   it will add attribute A of domain type D in relation r. 
       alter table r drop A 
   it will remove the attribute A of relation r. 
   
www.jntuworld.com
www.jntuworld.com
www.jwjobs.net
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-16
Integrity Constraints
 Integrity Constraints guard against accidental damage to the database.
 Integrity constraints are predicates pertaining to the database.
 Domain Constraints:
o Predicates defined on the domains are Domain constraints.
o Simplest Domain constraints are defined by defining standard data types of the
attributes like Integer, Double, Float, etc.
o We can define domains by create domain clause also we can define the
constraints on such domains as follows:
create domain hourly-wage numeric(5,2)
constraint wage-value-test check(value >= 4.00)
So we can use hourly-wage as data type for any attribute where DBMS will
automatically allow only values greater than or equal to 4.00.
o Other examples for defining Domain constraints are as follows:
create domain account-number char(10)
constraint account-number-null-test check(value not null)
create domain account-type char(10)
constraint account-tyope-test
check(value in ( Checking, Saving))
By using the later domain of two above the DBMS will allow only values for any
attribute having type as account-type i.e. Checking and Saving.
 Referential Integrity:
o Foreign  Key:  If  two  table  R  and  S  are  related  to  each  other, K1  and  K2  are
primary keys of the two relations also K1 is one of the attribute in S. Suppose we
want that every row in S must have a corresponding row in R, then we define the
K1 in S as foreign key. Example in our original database of library we had a table
for  relation  BORROWEDBY,  containing two  fields  Card  No. and  Acc.  No.  .
Every  row  of  BORROWEDBY  relation  must  have  corresponding  row  in  USER
Table having  same  Card  No. and  a  row  in  BOOK  table  having  same  Acc.  No..
Then  we  will  define  the  Card  No. and  Acc.  No.  in  BORROWEDBY  relation  as
foreign keys.
o In other way we can say that every row of BORROWEDBY relation must refer to
some row in BOOK and also in USER tables.
o Such  referential  requirement  in  one  table  to  another  table  is  called  Referential
Integrity.
o Referential Integrity constraints are defined by defining some of the attributes in a
table, which forms primary key of some other table, as foreign key.
 Functional Dependencies
o Suppose  in  a  relation having  schema R,   R and   R.  A  functional
dependency  holds on R if, in any table having schema R, for every two rows
r1 and r2 the values of attributes  are same in r1 and r2 then values of attributes 
are also same.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
o Consider for example the table as follows
Seq
A B C D
1
a
1
b
1
c
1
d
1
2
a
1
b
2
c
1
d
2
3
a
2
b
2
c
2
d
2
4
a
2
b
3
c
2
d
3
5
a
3
b
3
c
2
d
4
Check if AC Holds, find pair of rows where value of A is same
 row 1 and 2, value of A is same and C is also same
 row 3 and 4, Value of A is same and C is also same
 No other two rows having same value on A, So AC holds.
Check if CA Holds, find pair of rows where value of C is same
 row 1 and 2, value of C is same and A is also same
 row 3 and 4, value of C is same and A is also same
 row 4 and 5, value of C is same but A is not same, So CA doesnt hold.
We can prove ABD also holds, find pair of rows where value of A and B
are both same
 No row where A and B both are same, So ABD holds
o If  K  is  a  super  key  of  a  relation  R  then  it  means  functional  dependency KR
holds and vice versa.
o Armstrongs  Rules: Suppose  there  is  a  given  relation  R  and  a  set  of  functional
dependencies F that holds on R. Then these rules can be used to derive all of the
other functional dependencies which are logically implied from the given relation
R and functional dependencies F.
 Reflexivity rule: if  is a set of attributes and   , then  holds.
 Augmentation  rule: if  holds  and is  a  set  of  attributes,  then
 holds.
 Transitivity rule: if  holds and  holds, then  holds.
o Additional rules are also formed to simplify deriving new functional dependencies
since applying Armstrongs rules is a lengthy and tiresome task. Although we can
generate all the functional dependencies using only Armstrongs rule.
 Union rule: if  holds and  holds, then  holds.
 Decomposition rule. if  holds, then  holds and  holds.
 Pseudotransitivity  rule. If  holds  and  holds,  then 
holds.
o Closure  of  Functional  Dependencies: Suppose  the  given  set  of  functional
dependencies  is  F  for  a  given  relation  schema  R.  When  we  apply  various  rules
stated above and generate all of the possible newer functional dependencies. Then
the  set  containing  all  these  newer  functional  dependencies  and  the  given  set  of
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
functional dependencies F is called the closure of functional dependencies and is
denoted as F
+
.
o Consider schema R=( A, B, C, G, H, I ) and the set of functional dependencies F
containing following functional dependencies.
 AB
   AC
 CGH
 CGI
 BH
 Find other functional dependencies that can be derived using various rules
given above
 Examples are as follows-
 AH can  be  derived  using  functional  dependencies  1  and  5  and
transitivity rule.
 CGHI can be derived using functional dependencies 3 and 4 and union
rule.
 AGI can be derived using 2 and 4 and Pseudotransitivity.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-17
Normal Forms
 Some of the undesirable properties that a bad database design may have
o Repetition of information
o Inability to represent certain information
o Incapability to maintain integrity of data
 The normal forms of relational database theory provide criteria for determining a table's
degree of vulnerability to logical inconsistencies and anomalies.
 The  higher  the  normal  form  applicable  to  a  table,  the  less  vulnerable  it  is  to
inconsistencies and anomalies.
 Each  table  has  a  "highest  normal  form"  (HNF):  by  definition,  a  table  always  meets  the
requirements of its HNF and of all normal forms lower than its HNF; also by definition, a
table fails to meet the requirements of any normal form higher than its HNF.
 Generally known hierarchy of normal forms is as follows First Normal Form(1NF),
Second Normal Form(2NF), Third Normal Form(3NF), Fourth Normal Form(4NF), Fifth
Normal Form(5NF).
 We will discuss only up to 3NF of above hierarchy and another normal form Boyce-Codd
Normal Form(BCNF) in this course.
First Normal Form
 According to Date's definition of 1NF, a table is in 1NF if and only if it is "isomorphic to
some relation", which means, specifically, that it satisfies the following five conditions:
1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4.  Every  row-and-column  intersection  contains  exactly  one  value  from  the
applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs,
object IDs, or hidden timestamps].
 Examples of tables (or views) that would not meet this definition of 1NF are:
o A  table  that  lacks  a unique  key.  Such  a  table  would  be  able  to  accommodate
duplicate rows, in violation of condition 3.
o A view whose definition mandates that results be returned in a particular order, so
that  the  row-ordering  is  an  intrinsic  and  meaningful  aspect  of  the  view.  This
violates  condition  1.  The tuples in  true  relations  are  not  ordered  with  respect  to
each other.
o A table which is having at least one nullable attribute. A nullable attribute would
be  in  violation  of  condition  4,  which  requires  every  field  to  contain  exactly  one
value  from  its  column's  domain.  It  should  be  noted,  however,  that  this  aspect  of
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
condition  4  is  controversial.  It  marks  an  important  departure  from Codd's  later
vision of the relational model, which made explicit provision for nulls.
 Codd states that the "values in the domains on which each relation is defined are required
to  be  atomic  with  respect  to  the  DBMS."  Codd  defines  an  atomic  value  as  one  that
"cannot  be  decomposed  into  smaller  pieces  by  the  DBMS  (excluding  certain  special
functions)." Meaning a field should not be divided into parts with more than one kind of
data  in  it  such  that  what  one  part  means  to  the  DBMS  depends  on  another  part  of  the
same field.
 Suppose  a  novice designer  wish  to  record  the  names  and  telephone  numbers  of
customers. He defines a customer table which looks like this:
Customer
Customer ID First Name Surname
Telephone
Number
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659
789 Maria Fernandez 555-808-9633
 The  designer  then  becomes  aware  of  a  requirement  to  record multiple telephone
numbers  for  some  customers.  He  reasons  that  the  simplest  way  of  doing  this  is  to
allow  the  "Telephone  Number"  field  in  any  given  record  to  contain  more  than  one
value:
Assuming,  however,  that  the  Telephone  Number  column  is  defined  on  some  Telephone
Number-like  domain  (e.g.  the  domain  of  strings  12  characters  in  length),  the
Customer
ID
First Name Surname
Telephone
Number
123 Robert Ingram 555-861-2025
456 Jane Wright
555-403-1659
555-776-4100
789 Maria Fernandez 555-808-9633
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
representation  above  is  not  in  1NF.  1NF  (and,  for  that  matter,  the RDBMS)  prevents  a
single field from containing more than one value from its column's domain.
 Repeating  groups  across  columns: The  designer  might  attempt  to  get  around  this
restriction by defining multiple Telephone Number columns:
Customer
ID
First
Name
Surname Tel. No. 1 Tel. No. 2 Tel. No. 3
123 Robert Ingram 555-861-2025
456 Jane Wright 555-403-1659 555-776-4100 555-403-1659
789 Maria Fernandez 555-808-9633
This  representation,  however,  makes  use  of  nullable  columns,  and  therefore  does  not
conform to Date's definition of 1NF. Even if the view is taken that nullable columns are
allowed,  the  design  is  not  in  keeping  with  the  spirit  of  1NF.Tel.  No.  1,  Tel.  No.  2.,  and
Tel. No. 3. share exactly the same domain and exactly the same meaning; the splitting of
Telephone  Number  into  three  headings  is  artificial  and  causes  logical  problems.  These
problems include:
o Difficulty  in  querying  the  table.  Answering  such  questions  as  "Which
customers have telephone number X?" and "Which pairs of customers share a
telephone number?" is awkward.
o Inability  to  enforce  uniqueness  of  Customer-to-Telephone  Number  links
through  the  RDBMS.  Customer  789  might  mistakenly  be  given  a  Tel.  No.  2
value that is exactly the same as her Tel. No. 1 value.
o Restriction  of  the  number  of  telephone  numbers  per  customer  to  three.  If  a
customer  with  four  telephone  numbers  comes  along,  we  are  constrained  to
record  only  three  and  leave  the  fourth  unrecorded.  This  means  that  the
database  design  is  imposing  constraints  on  the  business  process,  rather  than
(as should ideally be the case) vice-versa.
 Repeating  groups  within  columns: The  designer  might,  alternatively,  retain  the
single Telephone Number column but alter its domain, making it a string of sufficient
length to accommodate multiple telephone numbers:
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
This  design  is consistent  with  1NF according  to  Dates  definition  but  not  according  to
Codds  definition.  It presents  several  design  issues.  The  Telephone  Number  heading
becomes semantically woolly, as it can now represent either a telephone number, a list of
telephone numbers, or indeed anything at all. A query such as "Which pairs of customers
share a telephone number?" is more difficult to formulate, given the necessity to cater for
lists  of  telephone  numbers  as  well  as  individual  telephone  numbers.  Meaningful
constraints on telephone numbers are also very difficult to define in the RDBMS with this
design.
 A design that complies with 1NF:A design that is unambiguously in 1NF makes
use of two tables: a Customer Name table and a Customer Telephone Number table.
Customer Name Customer Telephone
Customer
ID
First
Name
Surname
123 Robert Ingram
456 Jane Wright
789 Maria Fernandez
Repeating groups of telephone numbers do not occur in this design. Instead, each
Customer-to-Telephone Number link appears on its own record.
It  is  worth  noting  that  this  design  meets  the  additional  requirements  for second
and third normal form (3NF).
Customer
ID
First
Name
Surname
Telephone
Numbers
123 Robert Ingram 555-861-2025
456 Jane Wright
555-403-1659,
555-776-4100
789 Maria Fernandez 555-808-9633
Customer
ID
Telephone
Number
123 555-861-2025
456 555-403-1659
456 555-776-4100
789 555-808-9633
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-18
Second Normal Form
 2NF was originally defined by E.F. Codd in 1971.
 A 1NF table is in 2NF if and only if, given any candidate key K and any attribute A
that  is  not  a  constituent  of  a  candidate  key, A depends  upon  the  whole  of K rather
than just a part of it
 A  1NF  table  is  in  2NF  if  and  only  if  all  its  non-prime  attributes  are  functionally
dependent  on  the  whole  of  every  candidate  key.  (A  non-prime  attribute  is  one  that
does not belong to any candidate key.)
 Note  that  when  a  1NF  table  has  no  composite  candidate  keys  (candidate  keys
consisting of more than one attribute), the table is automatically in 2NF.
 Consider a table describing employees' skills:
Employees' Skills
Employee   Skill
Current
Work
Location
Jones Typing
114
Main
Street
Jones Shorthand
114
Main
Street
Jones Whittling
114
Main
Street
Bravo
Light
Cleaning
73
Industrial
Way
Ellis Alchemy
73
Industrial
Way
Ellis Flying
73
Industrial
Way
Harrison
Light
Cleaning
73
Industrial
Way
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Neither {Employee} nor {Skill} is a candidate key for the table. This is because a
given  Employee  might  need  to  appear  more  than  once  (he  might  have  multiple
Skills),  and  a  given  Skill  might  need  to  appear  more  than  once  (it  might  be
possessed  by  multiple  Employees).  Only  the  composite  key  {Employee,  Skill}
qualifies as a candidate key for the table.
The remaining attribute, Current Work Location, is dependent on only part of the
candidate  key,  namely  Employee.  Therefore  the  table  is  not  in  2NF.  Note  the
redundancy in the way Current Work Locations are represented: we are told three
times  that  Jones  works  at  114  Main  Street,  and  twice  that  Ellis  works  at  73
Industrial Way. This redundancy makes the table vulnerable to update anomalies:
it  is,  for  example,  possible  to  update  Jones'  work  location  on  his  "Typing"  and
"Shorthand"  records  and  not  update  his  "Whittling"  record.  The  resulting  data
would  imply  contradictory  answers  to  the  question  "What  is  Jones'  current  work
location?"
 A 2NF alternative to this design would represent the same information in two tables:
an  "Employees"  table  with  candidate  key  {Employee},  and  an  "Employees'  Skills"
table with candidate key {Employee, Skill}:
Employees                                                          Employees Skills
Employee   Current Work Location
Jones 114 Main Street
Bravo 73 Industrial Way
Ellis 73 Industrial Way
Harrison 73 Industrial Way
Neither of these tables can suffer from update anomalies.
 Not  all  2NF  tables  are  free  from  update anomalies,  however.  An  example  of  a  2NF
table which suffers from update anomalies is:
Tournament Winners
Tournament   Year   Winner
  Winner  Date  of
Birth
Des Moines Masters 1998 Chip Masterson 14 March 1977
Employee   Skill
Jones Typing
Jones Shorthand
Jones Whittling
Bravo Light Cleaning
Ellis Alchemy
Ellis Flying
Harrison Light Cleaning
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
Even though Winner and Winner Date of Birth are determined by the whole key
{Tournament / Year} and not part of it, particular Winner / Winner Date of Birth
combinations are shown redundantly on multiple records. This leads to an update
anomaly: if updates  are  not carried out consistently, a particular winner could be
shown as having two different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of
Birth  attribute  is  subject.  Winner  Date  of  Birth  actually  depends  on  Winner,
which in turn depends on the key Tournament / Year.
 This problem is addressed by third normal form (3NF)
   Note: In addition to the primary key, the table may contain other candidate keys; it is
necessary to establish that no non-prime attributes have part-key dependencies on any
of these candidate keys.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-19
Third Normal Form:
 3NF  as  defined  by  E.F.  Codd  in  1971  is - a  table  is  in  3NF  if  and  only  if  both  of  the
following conditions hold:
o The relation R (table) is in second normal form (2NF)
o Every  non-prime  attribute  of  R  is  non-transitively  dependent  (i.e.  directly
dependent) on every candidate key of R.
o Note:
 A non-prime  attribute of  R  is  an  attribute  that  does  not  belong  to  any
candidate key of R.
 A  transitive  dependency  is  a functional  dependency  in  which  X   Z (X
determines Z)  indirectly,  because X  Y and Y  Z (where  it  is  not  the
case that Y  X).
 A 3NF definition, equivalent to Codd's given by Carlo Zaniolo in 1982, states that a table
is in 3NF if and only if, for each of its functional dependencies X  A, at least one of the
following conditions holds:
o   X contains A (that is, X  A is trivial functional dependency), or
o   X is a superkey, or
o Each attribute in X-A is a prime attribute (i.e., it is contained within a candidate
key)
 Zaniolo's  definition  gives  a  clear  sense  of  the  difference  between  3NF  and  the  more
stringent  Boyce-Codd  normal  form  (BCNF).  BCNF  simply  eliminates  the  third
alternative ("X-A has only prime attribute").
 Difference  between  2NF  and  3NF  can  be  stated  as:  non-key  attributes  be  dependent  on
"the whole key" ensures that a table is in 2NF; while that non-key attributes be dependent
on "nothing but the key" ensures that the table is in 3NF.
 Example of table given above :
Tournament Winners
Tournament Year Winner Winner Date of Birth
Des Moines Masters 1998 Chip Masterson 14 March 1977
Indiana Invitational 1998 Al Fredrickson 21 July 1975
Cleveland Open 1999 Bob Albertson 28 September 1968
Des Moines Masters 1999 Al Fredrickson 21 July 1975
Indiana Invitational 1999 Chip Masterson 14 March 1977
This  table  is  in  2NF  but  not  in  3NF. The  breach  of  3NF  occurs  because  the  non-prime
attribute  Winner  Date  of  Birth  is  transitively  dependent  on  the  candidate  key
{Tournament,  Year}  via  the  non-prime  attribute  Winner.  The  fact  that  Winner  Date  of
Birth  is  functionally  dependent  on  Winner  makes  the  table  vulnerable  to  logical
inconsistencies,  as  there  is  nothing  to  stop  the  same  person  from  being  shown  with
different dates of birth on different records.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
In order to express the same facts without violating 3NF, it is necessary to split the table
into two:
Tournament Winners                                                 Player Dates of Birth
Tournament Year Winner
Des Moines Masters 1998 Chip Masterson
Indiana Invitational 1998 Al Fredrickson
Cleveland Open 1999 Bob Albertson
Des Moines Masters 1999 Al Fredrickson
Indiana Invitational 1999 Chip Masterson
Boyce-Codd Normal Form:
 It is a slightly stronger version of the third normal form (3NF). A table is in Boyce-Codd
normal form if and only  if for every one of its non-trivial [dependencies] X  Y, X is a
superkeythat is, X is either a candidate key or a superset thereof.
 Note the above set of tables Tournament Winners and Player Dates of Birth shown as
in 3NF are also in BCNF
 Only  in  rare  cases  does  a  3NF  table  not  meet  the  requirements  of  BCNF.  A  3NF  table
which does not have multiple overlapping candidate keys is guaranteed to be in BCNF
 An example of a 3NF table that does not meet BCNF is
Today's Court Bookings
Court Start Time End Time Rate Type
1 09:30 10:30 SAVER
1 11:00 12:00 SAVER
1 14:00 15:30 STANDARD
2 10:00 11:30 PREMIUM-B
2 11:30 13:30 PREMIUM-B
2 15:00 16:30 PREMIUM-A
There are two courts available and there are four distinct rate types:
 SAVER, for Court 1 bookings made by members
 STANDARD, for Court 1 bookings made by non-members
 PREMIUM-A, for Court 2 bookings made by members
 PREMIUM-B, for Court 2 bookings made by non-members
So, Rate Type  Court is only non-trivial functional dependency that holds.
o We can observe that the table's candidate keys are:
 {Court, Start Time}
Player Date of Birth
Chip Masterson 14 March 1977
Al Fredrickson 21 July 1975
Bob Albertson 28 September 1968
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
 {Court, End Time}
 {Rate Type, Start Time}
 {Rate Type, End Time}
o In the Today's Court Bookings table, there are no non-prime attributes: that is, all
attributes  belong  to  candidate  keys.  Therefore  the  table  adheres  to  both  2NF  and
3NF
o The  table  does  not  adhere  to  BCNF  because  in  the  dependency  Rate  Type 
Court, the determining attribute (Rate Type) is not a super key.
 The design can be amended so that it meets BCNF as follows:
Rate Types                                                             Todays Bookings
Rate Type Court Member Flag
SAVER 1 Yes
STANDARD 1 No
PREMIUM-A 2 Yes
PREMIUM-B 2 No
The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag};
the candidate keys for the Today's Bookings table are {Rate Type, Start Time} and {Rate
Type, End Time}. Both tables are in BCNF.
Rate Type Start Time End Time
SAVER 09:30 10:30
SAVER 11:00 12:00
STANDARD 14:00 15:30
PREMIUM-B 10:00 11:30
PREMIUM-B 11:30 13:30
PREMIUM-A 15:00 16:30
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-20
Consider the following table:
Lending
branch-name branch-city assets customer-name loan-number amount
Sadar Agra 200000 Ram L-12 12000
Sanjay-place Agra 100000 Ram L-13 13000
This table stores the information regarding loans. This table has following problems:
 Since  every  branch  is  going  to  have  several  loans,  the  table  will  have  one  row  for  each
loan taken from a branch all of which will have same value for the columns branch-name,
branch-city and assets, repetition of data.
 Updating  the  branch-city  or  assets  of  a  particular  branch  will  require  updating  each  row
of this table and hence the operation will be costly.
 If we miss any row without updating then there will be more than one value for a branch
city or assets of a branch, which means breaching the data integrity.
 If there is a branch having no loans then we will not have any entry in this table and we
will not be able represent the complete information.
Decomposition
 The above problem can  be solved by decomposing the above table. The  set of relations
R
1
,  R
2
,R
n
is  a  decomposition  of  relation  R  if  R  =    R
1
  R
2
  R
n
. It  should  be
noted that every pair R
i
and R
i+1
of this set should have at least one common attribute so
that they can be combined back again using join operation.
 But all decompositions of this table will not be free from problem.
 Consider for example if we form two new tables out of our Lending table as follows
Branch-customer-schema = (branch-name, branch-city, assets, customer name)
Customer-loan-schema = (customer-name, loan-number, amount)
Then the resulting tables with data will be as follows:
Branch-customer
branch-name branch-city assets customer-name
Sadar Agra 200000 Ram
Sanjay-place Agra 100000 Ram
Customer-loan
customer-name loan-number amount
Ram L-12 12000
Ram L-13 13000
Now suppose to know the branch for loan L-12 we try to form join of these two we will
a table as follows:
Branch-customer  Customer-loan =
branch-name branch-city assets customer-name loan-number amount
Sadar Agra 200000 Ram L-12 12000
Sadar Agra 200000 Ram L-13 13000
Sanjay-place Agra 100000 Ram L-12 12000
Sanjay-place Agra 100000 Ram L-13 13000
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
According to this join both of the loans are taken from both of the branches. This is an
example  of  information  loss.  This  occurred  because  the  choice  of  Column  to  be  kept
common in two tables after decomposition is wrong.
   Lossless-Join Decomposition: A decomposition { R
1
, R
2
,R
n
} of relation schema R is
lossless join decomposition if for all legal relations r on schema R,
r = 
R
1
(r)
R
1
(r) 
R
n
(r)
In other words after decomposition, when we join all of the decomposed tables with data
it should result in the original table with data as was before decomposition.
 Otherwise it is called Lossy-join decomposition.
   Dependency  preservation: This  is  another  desirable  property  of  a  decomposition.
Suppose it is given that a set F of functional dependencies holds on any relation based on
schema R. Then set of functional dependencies that holds on any relation subschema R
1
is F
1
that contains all the functional dependencies of F which contains attributes of only
R
1
.  So  if  decomposition  of  R  is  { R
1
,  R
2
,R
n
}  such  that  corresponding  functional
dependencies which holds on them are { F
1
, F
2
,F
n
} then following should be true.
F
+
= {F
1
  F
2
    F
n
}
+
.
Such a decomposition is called dependency preserving decomposition.
For example:
Consider the schema R = {A, B, C, D} such that following functional dependency holds
on it F = {AB, A BC, C D}.
Now  suppose  the  decomposition  of  this  R  is  R
1
=  {A,B}  and  R
2
=  {B,C,D},  so  the
functional dependencies which holds on R
1
are F
1
= {AB} (Note: F
1
should contain all
the functional dependencies in F which have only attributes of R
1
) and those on R
2
are F
2
={CD}. If we union F
1
  F
2
is {AB, C D} which doesnt contain the A BC , so
it is not a dependency preserving decomposition.
If  we  decompose  R  into  these  relation  schemas  R
1
={A,B,C}  and  R
2
={C,D} then
F
1
={AB, A BC} and F
2
={CD} so F
1
  F
2
is {AB, A BC, C D}.
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture-21
Normalization Using Functional Dependency
 Lossless-Join Decomposition using FD:
o Let R is relation schema and F is a set of functional dependency on R. Let R
1
and
R
2
form a decomposition of R. This decomposition is lossless join decomposition
if at least one of the following functional dependency is in F
+
:
 R
1
 R
2
 R
1
 R
1
 R
2
 R
2
o Example:  Lending-schema=(branch-name,  branch-city,  assets,  customer-name,
loan-number, amount) the FD that holds on this schema are given as
branch-name  assets branch-city
loan-number  amount branch-name
so the decomposition of it into two schema as follows:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
is a lossless join decomposition because-
Branch-schema  Loan-info-schema = branch-name
and  we  have  an  FD branch-name   assets  branch-city,  applying  augmentation
rule  to  it,  this  FD  is  equivalent  to branch-name   branch-name  assets  branch-
city i.e. branch-name Branch-schema.
 Third Normal Form Using FD:
o Let  R  is  a  relation  having  F  as  the  minimal  set  of  functional  dependencies  that
holds on R.
Then do the following:
1. Initially have an empty set of relations.
2. for each FD in F, , i=1
 Add a relation R
i
=( ,) if no other relation contains , , Increase
i by one
3. After  adding  all  such  relations  add  another  relation  R
i
=  (  any  candidate
key of R) if no other relation is containing a candidate key.
 Boyce-Codd Normal Form using FD:
1. Let R
i
be relation i.e. not in BCNF
2. And,  let  is  the  FD  that  holds  on  but R
i
doesnt  hold  on  (i.e.  is  not  a
super key of R
i
)
3. Replace relation R
i
by two relations (, ) and (R
i
- ).
4. Now check again all the relations present with all the FDs that holds on them and
Go back to step 1.
o Example:
 Consider: Lending-schema=(branch-name,  branch-city,  assets,  customer-
name, loan-number, amount) the FD that holds on this schema are given as
1. branch-name  assets branch-city
2. loan-number  amount branch-name
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
 We can see that Lending-schema is not in BCNF. Also we see that in FD
branch-name   assets  branch-city, branch-name  is  not  superkey  of
Lending-schema. So new relations is a set as follows:
Branch-schema=(branch-name, branch-city, assets)
branch-name  assets branch-city
Loan-info-schema  =  (branch-name,  customer-name,  loan-number,
amount)
loan-number  amount branch-name
 Again in the new set of relations we see Loan-info-schema is not in BCNF
as  loan-number  is  not  a  super  key  of  Loan-info-schema.  Again  we
decompose it and the set of relations are
Branch-schema=(branch-name, branch-city, assets)
branch-name  assets branch-city
Loan-schema = (branch-name, loan-number, amount)
loan-number  amount branch-name
Borrower-schema = (customer-name, loan-number)
 Now  all  of  the  three  relations  are  in  BCNF  so  we  do  not  have  to
decompose any more.
 BCNF may not satisfy the dependency preservation criteria.
o In  some  cases,  a  non-BCNF  table cannot  be  decomposed  into  tables  that  satisfy
BCNF and preserve the dependencies that held in the original table
o For  example,  a  set  of  functional  dependencies  {AB   C,  C    B}  cannot  be
represented by a BCNF schema.
o Unlike the first three normal forms, BCNF is not always achievable.
o Consider  the  following  non-BCNF  table  whose  functional  dependencies  follow
the {AB C, C  B} pattern:
Nearest Shop
Person Shop Type Nearest Shop
Davidson Optician Eagle Eye
Davidson Hairdresser Snippets
Wright Bookshop Merlin Books
Fuller Bakery Doughy's
Fuller Hairdresser Sweeney Todd's
Fuller Optician Eagle Eye
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
For  each  Person  /  Shop  Type  combination,  the  table  tells  us  which  shop  of  this
type is geographically nearest to the person's home. We assume for simplicity that
a single shop cannot be of more than one type.
The candidate keys of the table are:
 {Person, Shop Type}
 {Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the
table is in 3NF. The table is not in BCNF, however, as the Shop Type attribute is
functionally dependent on a non-superkey: Nearest Shop.
Shop Near Person                                                      Shop
Person   Shop
Davidson Eagle Eye
Davidson Snippets
Wright Merlin Books
Fuller Doughy's
Fuller Sweeney Todd's
Fuller Eagle Eye
The  "Shop  Near  Person"  table  has  a  candidate  key  of  {Person, Shop},  and  the
"Shop"  table  has  a  candidate  key  of  {Shop}.  Unfortunately,  although  this  design
adheres  to  BCNF,  it  is  unacceptable  on  different  grounds:  it  allows  us  to  record
multiple  shops  of  the  same  type  against  the  same  person.  In  other  words,  its
candidate  keys  do  not  guarantee  that  the  functional  dependency  {Person,  Shop
Type}  {Shop} will be respected.
Shop   Shop Type
Eagle Eye Optician
Snippets Hairdresser
Merlin Books Bookshop
Doughy's Bakery
Sweeney Todd's Hairdresser
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture 22
Multivalued Dependencies
 Let  R  be  a  relation  schema,  and  X  and  Y  be  disjoint  subsets  of  R  (i.e.,  X R,  Y R,
XY= ), and Z = R- XY.A relation r(R) satisfies XY if for any two tuples t
1
and t
2
,
o t
1
(X)=t
2
(X), then there exist t
3
in r such that
o t
3
(X)=t
1
(X), t
3
(Y)=t
1
(Y), t
3
(Z)=t
2
(Z).
o By symmetry, there exist t4 in r such that
o t
4
(X)=t
1
(X), t
4
(Y)=t
2
(Y), t
4
(Z)=t
1
(Z).
X   Y   Z
t1 x1 y1 z1
t2 x1 y2 z2
t3 x1 y1 z2
t4 x1 y2 z1
 The  MVD  X Y  says  that  the  relationship  between  X  and  Y  is  independent  of  the
relationship between X and R-Y
 For example consider the table Employee:
Employee-name   Project-name   Dependant-name
Smith X John
Smith Y Ann
Smith X Ann
Smith Y John
o MVDs  Employee-name Project-name  and  Employee-name Dependant-name
hold in the relation
o The employee named Smith works on projects X and Y, and has two dependents
John and Ann.
o If we store only the first two tuples in the relation, it would incorrectly show the
associations among attributes
o If we have MVDs in a relation, we may have to repeat values redundantly in the
tuples.  In  the  Employee  relation,  values  X  and  Y  of  Project-name  are  repeated
with each value of Dependant-name--- clearly undesirable
o Problem: Employee schema is in BCNF because no FDs hold for it
o Trivial MVD: If MVD X Y is satisfied by all relations whose schemas include X
and Y, it is called trivial MVD.
 XY is trivial whenever Y X or XY=R
o If a relation r fails to satisfy a given MVD, a relation r that satisfies the MVD can
be constructed by adding tuples to r.
 MVD is called "tuple generating dependency"
 compare it with FD: need to delete tuples to make the relation to satisfy a
given FD
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
o MVD can be used in two ways
 test relations to determine whether they are legal under a given set of FDs
and MVDs
 specify constraints on a set of relations
 Let D: a set of FDs and MVDs then D
+
: the closure of D is the set of all FDs and MVDs
logically implied by D.
 D
+
can be computed using the following set of sound and complete rules
1. reflexivity: if Y X then XY
2. augmentation: if X Y then WX Y
3. transitivity: if XY and YZ then X Z
4. complementation: if XY then XR-XY
5. MV augmentation: if XY and W R, V W,then WXVY
6. MV transitivity: if X Y and YZ then XZ-Y
7. replication: if X Y then XY
8. coalescence: if XY and ZY, WR, WY= , WZ, then XZ
 Note: The first three rules are Armstrongs axioms.
Fourth Normal Form(4NF):
 A relation scheme R is in 4NF w.r.t. D, if for every non-trivial MVD XY in D+, X is a
superkey for R
 4NF vs BCNF
o 4NF is different from BCNF only in the use of D (FD + MVD) instead of F (FDs)
o every 4NF schemas are also in BCNF.
 By replication rule, XY implies XY.
o If R is not in BCNF, there exists a non-trivial FD XY where X is not a superkey
--- R cannot be in 4NF
 For example: Employee (Employee-name, Project-name, Dependant-name) is not in 4NF,
since
o Employee-namePproject-name but Employee-name is not a key.
o Decompose into Emp-proj (E-n, P-n) and Emp-dep (E-n, D-n) do bring the tables
in 4NF
 For  example:  Borrow  (Loan#,  C-name,  Street,  C-city)  is  in  BCNF,  but  not  in  4NF,
because  C-nameLoan#  is  a  non-trivial  MVD, where  C-name  is  not  a  key  in  this
schema.
 The  decomposition -- R1=(C-name,  Loan#), R2=(C-name,  Street,  C-city)brings  them
in 4NF
 Benefits of Fourth Normal Form
o Reduced number of tuples
o No anomalies for insert/delete/update
 Comparing FD and MVD
o if we have (a1,b1,c1,d1)  r and (a1,b2,c2,d2)  r
 AB implies b1=b2
 AB implies (a1,b1,c2,d2)  r and (a1,b2,c1,d1)  r
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
Lecture 23
Join Dependency and Fifth Normal form(Project Join Normal Form):
 The  normal  forms  discussed  so  far  required  that  the  given  relation  R  if  not  in  the  given
normal  form  be  decomposed  in  two  relations  to  meet  the  requirements  of  the  normal
form.  In  some  rare  cases,  a  relation  can  have  problems  like  redundant  information  and
update anomalies because of it but cannot be decomposed in two relations to remove the
problems.  In  such  cases  it  may  be  possible  to  decompose  the  relation  in  three  or  more
relations using the 5NF.
 The  fifth  normal  form  deals  with  join-dependencies  which  is  a  generalisation  of  the
MVD.  The  aim  of  fifth  normal  form  is  to  have  relations  that  cannot  be  decomposed
further. A relation in 5NF cannot be constructed from several smaller relations.
 A relation R satisfies join dependency *(R
1
, R
2
, ..., R
n
) if and only if R is equal to the join
of R
1
, R
2
, ..., R
n
where each R
i
is a subset of the set of attributes of R
 A relation R is in 5NF (or project-join normal form, PJNF) if for all join dependencies of
the form *(R
1
, R
2
, ..., R
n
), where each R
i
is a subset of the set of attributes of R and
R = R
1
 R
2
...R
n
, at least one of the following holds.
o *(R
1
, R
2
, ..., R
n
) is a trivial join-dependency (i.e., one of R
i
is R)
o Every R
i
is a super key for R.
 An example of 5NF can be provided by the example below that deals with departments,
subjects and students.
Department Subject Student
Comp. Sc. CP1000 John Smith
Mathematics MA1000 John Smith
Comp. Sc. CP2000 Arun Kumar
Comp. Sc. CP3000 Reena Rani
Physics PH1000 Raymond Chew
Chemistry CH2000 Albert Garcia
o The  above  relation  says  that  Comp.  Sc.  offers  subjects  CP1000,  CP2000  and
CP3000 which are taken by a variety of students. No student takes all the subjects
and  no  subject  has  all  students enrolled  in  it  and  therefore  all  three  fields  are
needed to represent the information.
o The above  relation does  not show MVDs since the attributes subject  and student
are  not  independent;  they  are  related  to  each  other  and  the  pairings  have
significant information in them. The relation can therefore not be decomposed in
two relations
 (dept, subject), and (dept, student)
s.sanyasirao1@gmail.com
Lecture Notes For DBMS and Data Mining and Data Warehousing
Department of Electrical and Electronics By: Sulabh Bansal
without losing some important information.
o The relation can however be decomposed in the following three relations
 (dept, subject), and
 (dept, student)
 (subject, student)
and now it can be shown that this decomposition is lossless
 Consider the Loan-Info-Schema discussed earlier. Suppose it is given that following join
dependency holds on the Loan-Info-Schema-
*((loan-number,branch-name), (loan-number, customer-name), (loan-number,amount))
Then  it  is  not  in  5
th
normal  form  as  all  of  these  relation  schema  doesnt  represent  the
super  keys  so  we  should  decompose  it  into  three  relations  as  given  by  the  join
dependency i.e. we should have following three relation schemas in place of given Loan-
Info-Schema:
o (loan-number, branch-name),
o (loan-number, customer-name), and
o (loan-number, amount)
s.sanyasirao1@gmail.com