DATABASE MANAGEMENT SYSTEM
G PRAKASH
Semi-structured Data Model
Domain Constraint
Referential Integrity
Relational Databases
based on the relational model and uses a collection of tables to represent both
data and the relationships among those data
Tables
It
also includes DDL and DML
Relational model is an example of record-based model
Database structured in fixed-format records of several types
Each table consists of a particular record type
Each record type defines a fixed number of fields, or attributes
Relationalmodel hides low-level implementation details from database
developers and users
Data-Manipulation Language
SQL is non-procedural
A query takes as input several tables (possibly only one) and always returns a single
table
select instructor.name from instructor where instructor.dept_name = ‘History’;
Queries may involve information from more than one table.
select instructor.ID, department.dept_name from instructor, department where
instructor.dept_name= department.dept_name and department.budget > 95000;
Data-Definition Language
SQL provides a rich DDL that allows one to define tables, integrity constraints,
assertions, etc
create table department (dept name char (20), building char (15), budget number
(12,2));
Database Access from Application Programs
SQL does not support actions such as
Input from users,
output to displays,
or communication over the network
Actions
must be written in a host language, such as C, C++, or Java, with
embedded SQL queries that access the data in the database
To access the database, DML statements need to be executed from the host
language. 2 ways:
By providing an application program interface – Open Database Connectivity
(ODBC) – commonly used API standard & Java Database Connectivity (JDBC)
standard
By extending the host language syntax to embed DML calls within the host language
program – a preprocessor, called the DML precompiler, converts the DML statements
to normal procedure calls in the host language.
Database Design
Designed to manage large bodies of information
These large bodies of information do not exist in isolation
mainly involves the design of the database schema
Design Process
Initial phase is to characterize fully the data needs of the prospective database users;
Outcome => user requirements specification.
Next, the designer chooses a data model, by applying the concepts of the chosen data
model, translates these requirements into a conceptual schema of the database
Designer reviews the schema for confirming the satisfaction of data requirements
Examine the design to remove any redundant features
W.r.t. to relational model, Also involves decision on “what” attributes to capture in
the database & “how to group” these attributes to form the various tables
Use the Entity-Relationship model Normalization
A fully-developed conceptual schema indicates the functional requirements of
the enterprise
In a specification of functional requirements, users describe the kinds of
operations (or transactions) that will be performed on the data
Process of moving
Abstract Data Implementation of
Model Database
Proceeds in 2 final design phases:
Logical-Design phase
designer maps the high-level conceptual schema onto the implementation data
model of the database system
Physical-Design phase
Resulting system-specific database schema is used in which the physical features
of the database are specified
Database Design for a University Organization
Refer the book and consider your own conceptual design
The Entity-Relationship Model
Entities are described in a database by a set of attributes
Form attributes of the
department entity set
A relationship
is an association among
several entities.
E.g. a member relationship
associates an instructor with
her department.
The set of all entities of the same type is termed an entity set
The set of all relationships of the same type is termed an relationship set.
Overall logical structure (schema) of a database can be expressed
graphically by an entity-relationship (E-R) diagram
In addition to entities and relationships, the E-R model represents certain
constraints to which the contents of a database must conform =>
mapping cardinalities
Expresses the number of entities to which another entity can be associated via a
relationship set
E.g. if each instructor must be associated with only a single department, the E-R
model can express that constraint.
Normalization
Goal: to store information without unnecessary redundancy
Approach: to design schemas that are in an appropriate normal form
To determine whether a relation schema is in one of the desirable normal forms,
additional information needed => to use functional dependencies
A bad database design have undesirable properties:
Repetition of information
Inability to represent certain information (Suppose we are creating a new department in the
university
Solution: introduce null values
Transaction Management
Several operations on the database form a single logical unit of work.
E.g., a fund transfer must happen in its entirety or not at all; this all-or-none
requirement is called atomicity.
In addition, it is essential that the execution of the funds transfer preserve the
consistency of the database; This correctness requirement is called
consistency
Finally, after the successful execution of a funds transfer, the new values of
the balances of accounts A and B must persist, despite the possibility of
system failure; This persistence requirement is called durability.
A transaction is a collection of operations that performs a single logical
function in a database application
Each transaction is a unit of both atomicity and consistency
Transactions should not violate any database consistency constraints =>
programmer’s responsibility to define various transactions correctly and
preserve consistency of the database
Database system’s responsibility for ensuring atomicity and durability
properties – specifically of Recovery manager
To ensure atomicity property, database system perform failure recovery =>
restore the database to the state that existed prior to the occurrence of the
failure.
What happens when several transactions update the database concurrently =>
responsibility goes to concurrent -control manager
The transaction manager consists of the concurrency-control manager and
the recovery manager
ACID PROPERTIES
Let’s take an example of a simple transaction. Suppose a bank
employee transfers Rs 500 from A's account to B's account. This very
simple and small transaction involves several low-level tasks
A’s Account
Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
Database Architecture
Database Users and Administrators
Database Users and User Interfaces
Four different types of database-system users
Na¨ıve users – typical user interface for na¨ıve users is a forms interface,
Application programmers - choose from many tools to develop user interfaces.
Rapid application development (RAD) enable an application programmer to
construct forms and reports with minimal programming effort
Sophisticated users interact with the system without writing programs; Analysts who
submit queries to explore data in the database fall in this category
Specialized users write specialized database applications that do not fit into the
traditional data-processing framework. computer-aided design systems, knowledgebase
and expert systems, systems that store data with complex data types (for example,
graphics data and audio data),
Database Administrator
Central control over the system
Schema definition
Storage structure and access-method definition
Schema and physical-organization modification
Granting of authorization for data access
Routine maintenance.
RELATIONAL DATABASES
In a relational model, the terms
Relation is used to refer to a table
Tuple is used to refer to a row
Attribute refers to a column of a table
The term relation instance is used to refer to a specific instance of a
relation, i.e., containing a specific set of rows
The order in which tuples appear in a relation is irrelevant, since a relation
is a set of tuples
For each attribute of a relation, there is a set of permitted values, called the
domain of that attribute
For all relations r, the domains of all attributes of r be atomic
A domain is atomic if elements of the domain are considered to be
indivisible units
The null value is a special value that signifies that the value is unknown or
does not exist
Database Schema
Logicaldesign of the database
The concept of a relation schema corresponds to the programming-
language notion of type definition
Relation corresponds to variable names
Relation instance corresponds to value of variables
The schema for this relation is
department (dept name, building, budget)
Common attributes in relation schemas is one way of relating tuples of
distinct relations
To find the information about all the instructors who work in the Watson
building? (Exercise)
Let us continue with our university
database example
Each course in a university may be
offered multiple times, across
different semesters, or even within a
semester
Relation needed to describe each individual section, of the class are defined as
schema
section (course id, sec id, semester, year, building, room number, time slot id)
Relation needed to describe the association between instructors and the class
sections that they teach; the schema is
teaches (ID, course id, sec id, semester, year)
Create your own relation schemas for university database? (Exercise)
Keys
How tuples within a given relation are distinguished?
No two tuples in a relation are allowed to have exactly the same value for all
attributes
A superkey is a set of one or more attributes that, taken collectively, allow us to
identify uniquely a tuple in the relation
A superkey may contain extraneous attributes
E.g. the combination of ID and name is a superkey for the relation instructor.
If K is a superkey, then so is any superset
of K.
Superkeys for which no proper subset is
also a superkey
Such minimal superkeys are called
candidate keys
Suppose that a combination of name and
dept name is sufficient to distinguish
among members of the instructor relation.
We shall use the term primary key to denote a
candidate key that is chosen
by the database designer
A key (whether primary, candidate, or super) is a property of the entire
relation, rather than of the individual tuples
Any two individual tuples in the relation are prohibited from having the
same value on the key attributes at the same time.
A relation, say r1, may include among its attributes the primary key of
another relation, say r2
This attribute is called a foreign key from r1, referencing r2.
The relation r1 is also called the referencing relation of the foreign key
dependency, and r2 is called the referenced relation of the foreign key
In any database instance, given any tuple, say ta, from the instructor relation,
there must be some tuple, say tb, in the department relation such that the
value of the dept_name attribute of ta is the same as the value of the primary
key, dept name, of tb .
The constraint from section to teaches is an example of a referential
integrity constraint;
A referential integrity constraint requires that the values appearing in
specified attributes of any tuple in the referencing relation also appear in
specified attributes of at least one tuple in the referenced relation
Schema Diagrams
A database schema, along with primary key and foreign key
dependencies, can be depicted by schema diagrams
Foreign key dependencies appear as arrows from the foreign key
attributes of the referencing relation to the primary key of the referenced
relation.
Relational Query Languages
A query language is a language in which a user requests information from the
database
Usually has a level higher than that of a standard programming language
Categorized as procedural or non-procedural language
In a procedural language, the user instructs the system to perform a sequence of
operations on the database to compute the desired result
In a nonprocedural language, the user describes the desired information without
giving a specific procedure for obtaining that information
Relational Operations
All procedural relational query languages provide a set of operations that can be
applied to either a single relation or a pair of relations
Join operation
allows the combining of two relations by merging pairs of tuples, one from each
relation, into a single tuple
a number of different ways to join relations are available
Natural join
In general, the natural join operation on two relations matches tuples whose values
are the same on all attribute names that are common to both relations.
Cartesian product operation
combines tuples from two relations, but unlike the join operation, its result
contains all pairs of tuples from the two relations, regardless of whether their
attribute values match
As relations are sets, we can perform normal set operations on relations
Union operation
performs a set union of two “similarly structured” tables
Intersection operation
Set difference operation
create table department (dept _name varchar (20), building varchar (15), budget
number (12,2), primary key (dept name));
create table course (course id varchar (7), title varchar (50),
dept _name varchar (20), credits number (2,0), primary key (course id),
foreign key (dept name) references department);
create table instructor (ID varchar (5), name varchar (20) not null, dept name
varchar (20), salary number (8,2), primary key (ID), foreign key (dept name)
references department);
create table section (course id varchar (8), sec id varchar (8),
semester varchar (6), year number (4,0), building varchar (15),
room number varchar (7), time slot id varchar (4), primary key (course id, sec id,
semester, year), foreign key (course id) references course);
create table teaches (ID varchar (5), course id varchar (8), sec id varchar (8),
semester varchar (6), year number (4,0), primary key (ID, course id, sec id, semester,
year), foreign key (course id, sec id, semester, year) references section, foreign key
(ID) references instructor);
To force the elimination of duplicates
select distinct dept name from instructor;
SQL allows us to use the keyword all to specify explicitly that
duplicates are not removed
select all dept name from instructor;
Select clause may also contain arithmetic expressions involving the
operators +, −, ∗, and / operating on constants or attributes of tuples
select ID, name, dept name, salary * 1.1 from instructor;
The where clause allows us to select only those rows in the result relation of the
from clause that satisfy a specified predicate.
SQL allows the use of the logical connectives and, or, and not in the where
clause.
The operands of the logical connectives can be expressions involving the
comparison operators <, <=, >, >=, =, and <>.
Queries on Multiple Relations
Retrieve the names of all instructors, along with their department names and
department building name (exercise)
select name, instructor.dept name, building from instructor, department where
instructor.dept name = department.dept name;
The from clause by itself defines a Cartesian product of the relations listed in the
clause. It is defined formally in terms of set theory.
For example, the relation schema for the Cartesian product of relations
instructor and teaches is:
(instructor.ID, instructor.name, instructor.dept name, instructor.salary
teaches.ID, teaches.course id, teaches.sec id, teaches.semester,
teaches.year)
The Cartesian product by itself combines tuples from instructor and teaches
that are unrelated to each other.
Each tuple in instructor is combined with every tuple in teaches, even those
that refer to a different instructor.
The result can be an extremely large relation, and it rarely makes sense to
create such a Cartesian product.
Instead, the predicate in the where clause is used to restrict the combinations
created by the Cartesian product to those that are meaningful for the desired
answer.
We would expect a query involving instructor and teaches to combine a particular
tuple t in instructor with only those tuples in teaches that refer to the same instructor
to which t refers
select name, course id from instructor, teaches where instructor.ID= teaches.ID;
To find instructor names and course identifiers for instructors in the Computer
Science department
select name, course id from instructor, teaches where instructor.ID= teaches.ID
and instructor.dept name = ’Comp. Sci.’;
In general, the meaning of an SQL query can be understood as follows
1. Generate a Cartesian product of the relations listed in the from clause
2. Apply the predicates specified in the where clause on the result of Step 1.
3. For each tuple in the result of Step 2, output the attributes (or results of
expressions) specified in the select clause.
Natural join (visited again)
Cartesian Product Natural Join
Concatenates each tuple of the first relation Considers only those pairs of tuples with the
with every tuple of the second same value on those attributes that appear in
the schemas of both relations
SQL STRING FUNCTIONS
CONCAT - returns the result (a string) of concatenating two string values
Concat (char1/string1/column1, char2/string2/column2)
CHR – used to return the character having the binary equivalent to ‘n’ as a
varchar2 value
Chr (n)
LOWER – returns a specified character expression in lowercase letters
Lower (cExpression/string/column)
UPPER
LPAD – used to pad the left side of a string with a specific set of characters
Lpad(expr1, n[,expr2])
RPAD
LTRIM – used to remove all the specified characters from the left
end side of a string
LTRIM (string1 [,set])
RTRIM(string [,trimming text])
SUBSTR – returns the specified number of characters from a
particular position of a given string
Substr(char, position [,substring_length])
SQL NUMERIC FUNCTIONS
Abs(n)
Log(B,n) – to return the logarithm, base B of n
Mod(N,M)
Power(m,n)
Sqrt(n)
Round(m)
SQL AGGREGATE FUNCTIONS
MIN(column_name)
MAX(column_name)
COUNT(column_name)
AVG(column_name)
SUM(column_name)
SOME DATE/TIME FUNCTIONS AND
QUERIES
Select sysdate from dual;
Select current_date from dual;
Display the date 60 days before the current date:
Select sysdate-60 from dual;
Select months_between(date1, date2) from dual;
SQL Like
SELECT column1, column2, ... FROM table_name
WHERE columnN LIKE pattern;
different LIKE operators with '%' and '_' wildcards:
LIKE ‘a%’ => Finds any values that start with "a“
LIKE ‘%a’ => Finds any values that end with "a“
LIKE ‘_r%’ => Finds any values that have "r" in the second position
LIKE ‘%or%’ => Finds any values that have "or" in any position
SQL IN Operator
The IN operator allows you to specify multiple values in a WHERE clause.
SELECT column_name(s) FROM table_name
WHERE column_name IN (value1, value2, ...);
SELECT column_name(s) FROM table_name
WHERE column_name IN (SELECT STATEMENT);
The SQL BETWEEN Operator
The BETWEEN operator selects values within a given range. The values can
be numbers, text, or dates.
SELECT column_name(s) FROM table_name
WHERE column_name BETWEEN value1 AND value2;
The SQL ORDER BY clause
used to sort the data in ascending or descending order, based on one or more
columns
Syntax:
SELECT column-list FROM table_name [WHERE condition] [ORDER BY
column1, column2, .. columnN] [ASC | DESC];
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME;
SQL> SELECT * FROM CUSTOMERS ORDER BY NAME DESC;
The SQL GROUP BY clause
used in collaboration with the SELECT statement to arrange identical data
into groups
GROUP BY clause follows the WHERE clause in a SELECT statement
and precedes the ORDER BY clause
SELECT column1, column2 FROM table_name WHERE [ conditions ]
GROUP BY column1, column2 ORDER BY column1, column2
If you want to know the total amount of salary on each customer, then the
GROUP BY query would be as follows:
SQL> SELECT NAME, SUM(SALARY) FROM CUSTOMERS GROUP BY
NAME;
NAME SUM(SALARY
)
Hardik 8500.00
kaushik 8500.00
Komal 4500.00
Muffy 10000.00
Ramesh 3500.00
SQL - Using Joins
usedto combine records from two or more tables in a database
A JOIN is a means for combining fields from two tables by using values
common to each
Consider the following tables: CUSTOMERS & ORDERS
Let us join these two tables in our
SELECT statement as shown below.
SQL> SELECT ID, NAME, AGE,
AMOUNT FROM CUSTOMERS,
ORDERS WHERE CUSTOMERS.ID
= ORDERS.CUSTOMER_ID;
Several operators can be used to join tables, such as =, <, >, <>, <=, >=, !=,
BETWEEN, LIKE, and NOT
SQL> SELECT ID, NAME, AMOUNT,
DATE FROM CUSTOMERS INNER JOIN
ORDERS ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
Different types of joins available in SQL:
SQL - INNER JOINS / EQUI JOINS
returns rows when there is a match in both tables.
SELECT table1.column1, table2.column2... FROM table1 INNER JOIN table2 ON
table1.common_field = table2.common_field;
SQL LEFT (OUTER) JOIN
returns all rows from the left table, even if there are no matches in the right table
This means that a left join returns all the values from the left table, plus matched values
from the right table or NULL in case of no matching join predicate
SELECT table1.column1, table2.column2... FROM table1 LEFT JOIN table2 ON
table1.common_field = table2.common_field;
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS LEFT JOIN
ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
SQL RIGHT (OUTER) JOIN
returns all rows from the right table, even if there are no matches in the left table
This means that a right join returns all the values from the right table, plus
matched values from the left table or NULL in case of no matching join
predicate.
SELECT table1.column1, table2.column2... FROM table1 RIGHT JOIN
table2 ON table1.common_field = table2.common_field;
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS RIGHT
JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
SQL FULL JOIN
combines the results of both left and right outer joins
Syntax
SELECT table1.column1, table2.column2... FROM table1 FULL JOIN table2
ON table1.common_field = table2.common_field;
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS FULL
JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
Natural Join in SQL
The associated tables have one or more pairs of identically named columns
The columns must be the same data type
Don’t use ON clause in a natural join.
Syntax: SELECT * FROM table1 NATURAL JOIN table2;
Sample table: foods Sample table: company
To get all the unique columns from foods and company tables, the following
SQL statement can be used:
SELECT * FROM foods NATURAL JOIN company;
COMPANY_ ITEM_ID ITEM_NAME ITEM_UNIT COMPANY_ COMPANY_
ID NAME CITY
Difference between natural join and inner join
number of columns returned
SQL | USING Clause
If several columns have the same names but the datatypes do not match,
the NATURAL JOIN clause can be modified with the USING clause to specify
the columns that should be used for an EQUIJOIN
USING Clause is used to match only one column when more than one column
matches.
NATURAL JOIN and USING Clause are mutually exclusive.
NATURAL JOIN uses all the columns with matching names and data types to join the
tables.
The USING Clause can be used to specify only those columns that should be used for
an EQUIJOIN.
Syntax: SELECT table1.column, table2.column FROM table1 JOIN
table2 USING (join_column1, join_column2…);
table1, table2 are the name of the tables participating in joining.
The natural join syntax contains the NATURAL keyword, the JOIN…USING
syntax does not.
An error occurs if the NATURAL and USING keywords occur in the same
join clause.
The JOIN…USING clause allows one or more equijoin columns to specify in
brackets after the USING keyword.
SQL - UNIONS CLAUSE
used to combine the results of two or more SELECT statements without returning any
duplicate rows
To use this UNION clause, each SELECT statement must have
The same number of columns selected
The same number of column expressions
The same data type and
Have them in the same order
Syntax
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition] UNION
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID UNION
SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
SQL - INTERSECT Clause
used to combine two SELECT statements, but returns rows only from the first SELECT
statement that are identical to a row in the second SELECT statement
Syntax
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
INTERSECT
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
INTERSECT
SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
SQL - EXCEPT Clause
used to combine two SELECT statements and returns rows from the first SELECT
statement that are not returned by the second SELECT statement.
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
EXCEPT
SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
SQL> SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
LEFT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
EXCEPT SELECT ID, NAME, AMOUNT, DATE FROM CUSTOMERS
RIGHT JOIN ORDERS ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
SQL - Having Clause
enables you to specify conditions that filter which group results appear in the
results
The WHERE clause places conditions on the selected columns, whereas the
HAVING clause places conditions on groups created by the GROUP BY clause
Syntax
SELECT FROM WHERE GROUP BY HAVING ORDER BY
SELECT column1, column2 FROM table1, table2 WHERE [ conditions ]
GROUP BY column1, column2 HAVING [ conditions ] ORDER BY
column1, column2
SQL > SELECT ID, NAME, AGE, ADDRESS, SALARY FROM
CUSTOMERS GROUP BY age HAVING COUNT(age) >= 2;
NESTED SUBQUERIES
Set Membership
SQL allows testing tuples for membership in a relation
The ’ in’ connective tests for set membership, where the set is a
collection of values produced by a select clause.
The ‘not in’ connective tests for the absence of set membership.
Find all the courses taught in the both the Fall 2009 and Spring
2010 semesters?
Using intersect
(select course_id from section where semester = ’Fall’ and year= 2009)
intersect
Using
(select in connective
course _id from section where semester = ’Spring’ and year= 2010);
Write the subquery
(select course_id from section where semester = ’Spring’ and year=
2010)
We then need to find those courses that were taught in the Fall 2009 and that
appear in the set of courses obtained in the subquery.
We do so by nesting the subquery in the where clause of an outer query.
select distinct course_id from section where semester = ’Fall’ and year=
2009 and course_id in (select course_id from section
where semester = ’Spring’ and year= 2010);
Find all the courses taught in the Fall 2009 semester but not in the
Spring 2010 semester? Exercise, Hint: use ‘not in’
Set Comparison
Find the names of all instructors whose salary is greater than at
least one instructor in the Biology department
Using
‘SOME’ & ‘ANY’ subquery predicates
SOME returns true if some row of the group matched the condition
The phrase “greater than at least one” is represented in SQL by ‘>
some’
select name from instructor where salary > some (select salary from
instructor where dept_name = ’Biology’);
SQL also allows < some, <= some, >= some, = some, and <> some
comparisons.
Find the names of all instructors that have a salary value greater
than that of each instructor in the Biology department?
The construct > all corresponds to the phrase “greater than all.”
As it does for some, SQL also allows < all, <= all, >= all, = all, and <>
all comparisons
select name from instructor where salary > all (select salary from
instructor where dept_name = ’Biology’);
Find the departments that have the highest average salary
select dept_name from instructor group by dept_name having avg
(salary) >= all (select avg (salary) from instructor group by dept_name);
Test for Empty Relations
SQL includes a feature for testing whether a subquery has any tuples in
its result.
The exists construct returns the value true if the argument subquery is
nonempty
Find all courses taught in both the Fall 2009 semester and in the
Spring 2010 semester?
select course_id from section as S where semester = ’Fall’ and year= 2009
and exists (select * from section as T where semester = ’Spring’ and
year= 2010 and S.course id= T.course id);
Test for Empty Relations
SQL includes a feature for testing whether a subquery has any tuples in
its result
The exists construct returns the value true if the argument subquery is
nonempty.
Find all courses taught in both the Fall 2009 semester and in the
Spring 2010 semester?
select course_id from section as S where semester = ’Fall’ and year=
2009 and exists (select * from section as T where semester = ’Spring’
and year= 2010 and S.course id= T.course id);
A correlation name from an outer query (S in the above query), can
be used in a subquery in the where clause.
A subquery that uses a correlation name from an outer query is
called a correlated subquery
We can test for the nonexistence of tuples in a subquery by using
the not exists construct
Find all students who have taken all courses offered in the Biology
department.
Using the except construct, we can write the query as follows:
select distinct S.ID, S.name from student as S where not exists
((select course id from course where dept _name = ’Biology’)
Except (select T.course id from takes as T where S.ID = T.ID));
DISPLAYING DATA FROM
MULTIPLE TABLES USING JOINS
Join Conditions
A join queries must have contained at least one join
condition, either in the FROM clause or in the WHERE
clause
The join condition compares two columns from two different
tables
Equijoins
a join with a join condition containing an equality operator
This join retrieves information by using equality condition.
RELATIONAL ALGEBRA
A procedural query language used to query the database tables to
access data in different ways
In relational algebra, Input is a relation(table from which data has to be accessed)
and
Output is also a relation(a temporary table holding the data asked for by the user)
The primary operations that we can perform using relational algebra
are:
Select
Project
Union
SetDifferent
Cartesian product
Rename
Select Predicate
Select Operation (σ)
usedto fetch rows(tuples) from table(relation) which satisfies a given
condition
Syntax: σp(r) Prepositional Logic, where we specify the conditions that
must be satisfied by the data
In prepositional logic, one can use unary and binary operators like =, <, > etc,
to specify the conditions
σage > 17 (Student)
σage > 17 and gender = 'Male' (Student)
Project Operation (∏)
Used to project only a certain set of attributes of a relation
It will only project or show the columns or attributes asked for, and will also
remove duplicate data from the columns.
Syntax: ∏A1, A2...(r) where A1, A2 etc are attribute names(column names).
∏Name, Age(Student)
Union Operation (∪)
Used to fetch data from two relations(tables) or temporary relation(result of
another operation).
For this operation to work, the relations(tables) specified should have same number of
attributes(columns) and same attribute domain
Also the duplicate tuples are automatically eliminated from the result.
Syntax: A ∪ B where A and B are relations
∏dept_name(instructor) ∪ ∏dept_name(department)
Above operation will give us name of departments present in both instructor and
department relations, eliminating repetition.
Set Difference (-)
used to find data present in one relation and not present in the second relation
Syntax: A – B
∏dept_name(instructor) - ∏ dept_name(department)
Cartesian Product (X)
used to combine data from two different relations(tables) into one and fetch data
from the combined relation
Syntax: A X B
Exercise ?
Rename Operation (ρ)
used to rename the output relation for any query operation which returns result
like Select, Project etc. Or to simply rename a relation(table)
Syntax: ρ(RelationNew, RelationOld)
DATABASE DESIGN
ENTITY-RELATIONSHIP MODEL
Developed to facilitate database design by allowing specification of an
enterprise schema that represents the overall logical structure of a database.
is a model used for design and representation of relationships between data.
E-R data mode employs three basic concepts:
entity sets,
relationship sets, and
attributes
Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable
from all other objects.
E.g. student is an entity, instructor, course, section, etc. are all entities
It has a set of properties, and the values for some set of properties may
uniquely identify an entity
An entity set is a set of entities of the same type that share the same
properties, or attributes
If a Student is an Entity, then the complete dataset of all the students will be
the Entity Set
An entity is represented by a set of attributes
If a Student is an Entity, then student's roll no., student's name, student's age,
student's gender etc will be its attributes
An attribute can be of many types, here are different types of attributes
defined in ER database model:
For each attribute, there is a set of permitted values, called the domain, or
value set, of that attribute
Composite attributes help
us to
group together related
Simple attribute: The attributes with attributes,
values that aremaking
atomictheand cannot
modeling cleaner
be broken down further are simple attributes. For example,
student's age.
Composite attribute: A composite attribute is made up of more than one
simple attribute. For example, student's address will contain, house
no., street name, pincode etc.
Derived attribute: These are the attributes which are not present in the
whole database management system, but are derived using other attributes.
For example, average age of students in a class.
As another example, suppose that the instructor entity set has an attribute
age that indicates the instructor’s age; consider another attribute DOB
(stored /base attribute)
Single-valued attribute: As the name suggests, they have a single value.
E.g. Student _ID attribute for a specific student entity refers to only one
student ID.
Multi-valued attribute: And, they can have multiple values.
E.g. Phone_number attribute
A relationship is an association among several entities;
When an Entity is related to another Entity, they are said to have a
relationship
A relationship set is a set of relationships of the same type.
It is a mathematical relation on n ≥ 2 (possibly non-distinct) entity sets. If E1, E2,
. . . , En are entity sets, then a relationship set R is a subset of
advisor
{(e1, e2, . . . , en) | e1 ∈ E1, e2 ∈ E2, . . . , en ∈ En}
where (e1, e2, . . . , en) is a relationship
The association between entity sets is referred to as participation; i.e., the
entity sets E1, E2, . . . , En participate in relationship set R
A relationship instance in an E-R schema represents an association between
the named entities in the real-world enterprise that is being modeled
A relationship may also have attributes called descriptive attributes
Consider a relationship set advisor with entity sets instructor and student.
We could associate the attribute date with that relationship to specify the date
when an instructor became the advisor of a student
A relationship instance in a given relationship set must be uniquely
identifiable from its participating entities, without using the descriptive
attributes.
We cannot represent multiple dates by multiple relationship instances between
the same instructor and a student, since the relationship instances would not be
uniquely identifiable using only the participating entities
It is possible to have more than one relationship set involving the same entity
sets (Exercise?)
Depending upon the number of entities involved, a degree is assigned to
relationships
For example, if 2 entities are involved, it is said to be Binary relationship, if
3 entities are involved, it is said to be Ternary relationship, and so on.
Constraints
Mapping Cardinalities
Or cardinality ratios, express the number of entities to which another entity
can be associated via a relationship set
Most useful in describing binary relationship sets
For a binary relationship set R between entity sets A and B, the mapping
cardinality must be one of the following:
Participation Constraints
The participation of an entity set E in a relationship set R is said to be total if every
entity in E participates in at least one relationship in R
Partial – if only some entities in E participates in relationships in R
Keys
The primary key of an entity set allows us to distinguish among the various entities of
the set
Keys also help to identify relationships uniquely, and thus distinguish relationships
from each other
Similar mechanism needed to distinguish among the various relationships of a
relationship set.
Let R be a relationship set involving entity sets E1, E2, . . . , En.
Let primary key(Ei ) denote the set of attributes that forms the primary key for entity
set Ei
The composition of the primary key for a relationship set depends on the
set of attributes associated with the relationship set R.
If the relationship set R has no attributes associated with it, then the set of
attributes
primary-key (E1) ∪ primary-key (E2) ∪ ·· · ∪ primary-key (En)
describes an individual relationship in set R.
If the relationship set R has attributes a1, a2, . . . , am associated with it,
then the set of attributes
primary-key (E1) ∪ primary-key (E2) ∪ · · · ∪ primary-key (En) ∪ {a1, a2, . . . ,
am}
describes an individual relationship in set R.
The structure of the primary key for the relationship set depends on the
mapping cardinality of the relationship set
Forms a super key for the relationship set
Removing Redundant Attributes in Entity Sets
To design a database using the E-R model,
1st identify the entity set
Once entity sets are decided, then choose the appropriate attributes
These attributes represent the various values to be captured in the database
After choosing the entity sets and their corresponding attributes, relationship sets
among the various entity sets are formed
These relationship sets may result in a situation where attributes in the various
entity sets are redundant and need to be removed from the original entity sets.
Consider the entity sets instructor and department:
Model the fact that each instructor has an associated department using a relationship
set inst_dept relating instructor and department
Primary key for the department relation where it is redundant in the entity
set instructor and needs to be removed.
Removing the attribute “dept_name” is rather unintuitive
When we create a relational schema from the E-R diagram, the attribute
dept_name gets added to the relation instructor, but only if each
instructor has at most one associated department.
If an instructor has more than one associated department, the relationship
between instructors and departments is recorded in a separate relation
inst_dept
Treating the connections between “instructor” and “department”
uniformly as a relationship makes the logical relationship explicit
Helps avoid an early assumption that each instructor is associated with
only one department
Similarly the student entity set is related to the department entity set
through the relationship set student_dept and thus there is no need for a
dept_name attribute in student.
A good entity-relationship design does not contain redundant attributes
For our university example, we list the entity
sets and their attributes below, with primary
keys underlined:
The relationship sets in our design are listed below:
ENTITY-RELATIONSHIP DIAGRAMS
Basic Structure
An E-R diagram consists of the following major components:
Rectangles divided into two parts
Diamonds represent relationship sets
Undivided rectangles represent the attributes of a relationship set
Lines link entity sets to relationship sets
Dashed lines link attributes of a relationship set to the relationship set
Double lines indicate total participation of an entity in a relationship set.
Double diamonds represent identifying relationship sets linked to weak entity sets
Mapping Cardinality
One-to-one:
Draw a directed line from the relationship set
advisor to both entity sets instructor and student
One-to-many:
Draw a directed line from the relationship set
advisor to the entity set instructor and an
undirected line to the entity set student
Many-to-one:
Draw an undirected line from the relationship set
advisor to the entity set instructor and a directed
line to the entity set student
The limit 0..∗ on the line between
advisor and instructor indicates that
an instructor can have zero or more
students.
Many-to-many:
If both edges have a maximum
Draw an undirected line from the relationship set value of 1, the relationship is
advisor to both entity sets instructor and student one-to-one
The line between advisor and student has a cardinality constraint of 1..1,
meaning the minimum and the maximum cardinality are both 1
What is the cardinality limit of 1…* specify? (Exercise)
Complex Attributes Roles
Roles in E-R diagrams are indicated by
labeling the lines that connect diamonds to
rectangles.
Nonbinary Relationship Sets
Weak Entity Sets
Suppose we create a relationship set sec_course between entity sets section and
course.
An entity set that does not have sufficient attributes to form a primary key is
termed a weak entity set
An entity set that has a primary key is termed a strong entity set.
For a weak entity set to be meaningful, it must be associated with another entity
set, called the identifying or owner entity set.
Discriminator
underlined with
a dashed line
The identifying entity set is said
to own the weak entity set that it
identifies
The relationship associating the weak
entity set with the identifying entity
set is called the identifying
relationship
Every weak entity
must be associated
with an identifying
entity;
The weak entity set is
said to be existence
dependent on the
identifying entity set.
The identifying relationship set should not have any descriptive attributes, since
any such attributes can instead be associated with the weak entity set.
The identifying relationship is many-to-one from the weak entity set to the
identifying entity set
The participation of the weak entity set in the relationship is total
How to distinguish among all those entities in the weak entity set that depend on
one particular strong entity?
Discriminator – set of attributes that allow the distinction
Also called as the partial key of the entity set
E.g., in ‘section’ weak entity set -> sec_id, year, & semester attributes uniquely
identifies one single section for that course
Primary key of the weak entity set is formed by the primary key of the
identifying entity set + discriminator
A weak entity set
Can participate in relationships other than the identifying relationship
May participate as owner in an identifying relationship with another weak entity
set
May have more than one identifying entity set
E-R diagram for the University Enterprise
From the diagram followed, find out the components needed for showing the
constraints
– each instructor must have exactly one associated department.
– each instructor can have at most one associated department
Identify the descriptive attributes present in the relationship set
Reduction to Relational Schemas
For each entity set and for each relationship set in the database design, there
is a unique relation schema to which we assign the name of the
corresponding entity set or relationship set
Both the E-R model & Relational Schema are abstract, logical
representations of real-world enterprises
Representation of Strong Entity Sets with Simple Attributes
Let ‘E’ be a strong entity set with only simple descriptive attributes a1, a2, …, an =>
represented by a schema called E with n distinct attributes
Each tuple corresponds to this E by one entity
The primary key of the entity set serves as the primary key of the resulting schema
E.g., entity set ‘student’ from the E-R diagram with 3 attributes: ID, name, tot_cred =>
represented by a schema: student(ID, name, tot_cred)
Schemas derived from strong entity sets:
classroom (building, room number, capacity)
department (dept name, building, budget)
course (course id, title, credits)
instructor (ID, name, salary)
student (ID, name, tot cred)
Representation of Strong Entity Sets with Complex Attributes
We handle composite attributes by creating a separate attribute for each of the
component attributes
We do not create a separate attribute for the composite attribute itself.
‘name’ attribute in ‘instructor’ relation => the schema generated for instructor
contains the attributes first_name, middle_name and last_name; there is no
separate attribute / scheme for name
instructor (ID, first name, middle name, last name, street number, street name,
apt number, city, state, zip code, date of birth)
Multivalued attributes are treated differently from other attributes
Attributes in an E-R diagram generally map directly into attributes for the appropriate
relation schemas
Multivalued attributes, however, are an exception => new relation schemas are created
for these attributes
instructor_phone (ID, phone number)
Each phone number of an ‘instructor’ is represented as a unique tuple in the relation on
this schema
For a multivalued attribute M, we create a relation schema R with an attribute A that
corresponds to M and attributes corresponding to the primary key of the entity set or
relationship set of which M is an attribute
In addition, we create a foreign-key constraint on the relation schema created from the
multivalued attribute, with the attribute generated from the primary key of the entity set
referencing the relation generated from the entity set.
Derived attributes are not explicitly represented in the relational data model
Representation of Weak Entity Sets
Let A be a weak entity set with attributes a1 , a2 , . . . , am. Let B be the strong entity set
1 2
on which A depends
Let the primary key of B consist of attributes b1, b 2, . . . , bn
2
We represent the entity set A by a relation schema called A with one attribute for each
member of the set:
For schemas derived from a weak entity set, the combination of the primary key of the
strong entity set and the discriminator of the weak entity set serves as the primary key
of the schema
In addition to creating a primary key, we also create a foreign-key constraint on the
relation A, specifying that the attributes b1, b2, . . . , bn reference the primary key of
the relation B.
Considering the weak entity set ‘section’
The primary key of the course entity set, on which section depends, is
course_id
Thus, we represent section by a schema with the following attributes:
section (course_id, sec_id, semester, year)
The primary key consists of the primary key of the entity set course,
along with the discriminator of section, which is sec_id, semester, and
year.
Foreign key constraint on the section schema, with the attribute course
id referencing the primary key of the course schema, and the integrity
constraint “on delete cascade”
When a referential-integrity constraint is violated, the normal procedure is to reject the action
that caused the violation
However, a foreign key clause can
specify that if a delete or update action
on the referenced relation violates the
constraint, then, instead of rejecting
the action, the system must take steps
to change the tuple in the referencing
relation to restore the constraint
on delete cascade - if a delete of a tuple in department results in this
referential-integrity constraint being violated, the system does not reject the
delete
Instead, the delete “cascades” to the course relation, deleting the tuple that
refers to the department that was deleted.
Representation of Relationship Sets
Let R be a relationship set
Let a1, a2, . . . , am be the set of attributes formed by the union of the primary
keys of each of the entity sets participating in R
Let the descriptive attributes (if any) of R be b1, b2, . . . , bn.
We represent this relationship set by a relation schema called R with one
attribute for each member of the set
{a1, a2, . . . , am} ∪ {b1, b2, . . . , bn}
How to choose a primary key for a binary relationship set?
For a binary many-to-many relationship, the union of the primary-key attributes from
the participating entity sets becomes the primary key
For a binary one-to-one relationship set, the primary key of either entity set can be
chosen as the primary key. The choice can be made arbitrarily.
For a binary many-to-one or one-to-many relationship set, the primary key of the entity
set on the “many” side of the relationship set serves as the primary key.
For an n-ary relationship set without any arrows on its edges, the union of the primary
key-attributes from the participating entity sets becomes the primary key
For an n-ary relationship set with an arrow on one of its edges, the primary keys of the
entity sets not on the “arrow” side of the relationship set serve as the primary key for
the schema
How to create foreign-key constraints on the relation schema R?
For each entity set Ei related to relationship set R, we create a foreign-key constraint
from relation schema R, with the attributes of R that were derived from primary-key
attributes of Ei referencing the primary key of the relation schema representing Ei .
Consider the relationship set advisor in the E-R diagram
Example:
Since the relationship set has no attributes, the advisor schema has two attributes, the
primary keys of instructor and student.
Since both attributes have the same name, we rename them i_ID and s_ID
Since the advisor relationship set is many-to-one from student to instructor the primary
key for the advisor relation schema is s_ID.
Two foreign-key constraints on the advisor relation with attribute i_ID referencing the
primary key of instructor and attribute s_ID referencing the primary key of student.
The schemas derived from a relationship set are depicted as follows:
teaches (ID, course id, sec id, semester, year)
takes (ID, course id, sec id, semester, year, grade)
prereq (course id, prereq id)
advisor (s ID, i ID)
sec course (course id, sec id, semester, year)
sec time slot (course id, sec id, semester, year, time slot id)
sec class (course id, sec id, semester, year, building, room number)
inst_dept (ID, dept name)
stud dept (ID, dept name)
course dept (course id, dept name)
Redundancy of Schemas
Relationshipset linking a weak entity set to the corresponding strong entity
set => many-to-one relationships; no descriptive attributes present
Primary key for weak entity set section is {course_id, sec_id, semester, year}
Since sec course has no descriptive attributes, the sec course schema has
Includes
attributes course_id, sec_id, semester, andprimary
year. key of strong entity set
The schema for the entity set section includes the attributes course_id, sec
_id, semester, and year (among others).
Every (course_id, sec_id, semester, year) combination in a
sec_ course relation would also be present in the relation on
schema section, and vice versa.
Combination of Schemas
Consider a many-to-one relationship set AB from entity set A to entity set B.
instructor (ID, name, salary) => A
department (dept_name, building, budget) => B
inst_dept(ID, dept_name)
inst_dept and instructor schemas can be combined and now the
instructor (ID, name, dept_name, salary) has these attributes.
Similarly, we can combine the schemas for the relationship sets
stud_dept
course_dept
sec_class
sec_time_slot
In the case of one-to-one relationships, the
relation schema for the relationship set can be
combined with the schemas for either of the
entity sets
We can combine schemas even if the
participation is partial by using null values.
Entity-Relationship Design Issues
CREATE TABLE sales (
customer_id NUMBER,
product_id NUMBER,
order_date DATE NOT NULL,
total NUMBER(9,2) DEFAULT 0 NOT NULL,
PRIMARY KEY(customer_id,
product_id,
order_date)
);