Unit 1 and 2 PDF
Unit 1 and 2 PDF
By
Bhupendra Singh Saud
For B.Sc. CSIT 7th Semester TU
Course of Contain
Unit 1: The Relational Model of Data and RDBMS Implementation Techniques [5 Hrs.]
Theoretical concepts, Relational model conformity and Integrity, Advanced SQL programming,
Query optimization, Concurrency control and Transaction management, Database performance
tuning, Distributed relational systems and Data Replication, Security considerations.
Unit 2: The Extended Entity Relationship Model and Object Model [6 Hrs.]
The ER model revisited, Motivation for complex data types, User defined abstract data types and
structured types, Subclasses, Super classes, Inheritance, Specialization and Generalization,
Constraints and Characteristics of specialization and Generalization, Relationship types of degree
higher than two, Relational database design by EER- to relational mapping, basic concepts on
UML.
Prerequisite:
• Be familiar with at least one OO Programming language such as .Net or C++ or Java,
• Fundamentals of DBMS, SQL
Reference Books:
1. Elmasri and Navathe, Fundamentals of Database Systems, Pearson Education
2. Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill
3. Korth, Silberchatz, Sudarshan, Database Systems, Design, Implementation and
Management, Thomson Learning
4. C.J. Date & Longman, Introduction to Database Systems, Pearson Education
Computer Usage:
Windows or Linux based PC or workstation, Commercial OODBMS software package and MVC
software development framework installed at the server.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 2
Unit 1
Theoretical Concept of overall database
Data
The collection of raw facts is called data.
File
A file is the collection of related groups of data for example, payroll file of a company consists of
the salary detail and all of these records have the same heads (e.g.; basic pay, HRA, FA etc.).
Records
A file may be further divided into more descriptive subdivisions, called, records. In other words
a record is a collection of related data items as a single unit. It is also called column of table.
Tuples
The row of a table is called tuples. It is also called value of a table.
Fields
The column of a table is called fields.
Database
A database is a collection of related data necessary to manage an organization. By data, we mean
known facts that can be recorded and have implicit meaning. For example, consider names,
telephone numbers and addresses of the people. We may have recorded this data in an indexed
address book, or we may have recorded on the hard drive, using a personal computer and software
such as MS-Access, or Excel. This is the collection of related data with an implicit meaning and
hence is a database. A database is logically coherent collection of data with some inherent
meaning. A database is designed, built and populated with data for specific purpose. It excludes
transient data such as: input documents, reports and intermediate results obtained during
processing.
DBMS
A database management system is a set of procedures that manages the database and provide
access to the database in the form required by any application program. It effectively ensures that
necessary data in the desired form is available for diverse applications for different departments in
an organization. A DBMS is hence a general purpose software system that facilitates the processes
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 3
of defining, constructing, manipulating, and sharing database among various users and
applications. Fig 1 illustrate the difference between database and database management system.
Application1
Database
management Database
Application2
system
Application3
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 4
History
Integrated data store, first general purpose DBMS, early 1960s, Charles Bachman, general
electrics.
Information management system (IMS), late 1960s, IBM.
Relational database model, proposed in 1970 Edgar Codd, IBM's san Jose laboratory.
Structured Query language (SQL) standardized in the late 1980s.
More powerful query language, complex analysis of data, support for new data types late
1980s to 1990s.
Packages which came with power customizable application layers.
Internet.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 5
As the dependence of an organization on the database increase, positive management
controls should be exercised over addition, deletion, change and disposition of data. Data
must be protected to satisfy legal, accounting and auditing requirements.
4. Atomicity problem
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 6
In computer, any electrical or mechanical system may fail. When failure has occurred, it is
ensured that has been detected and restored to the consistent state that existed prior to the
failure. This problem cannot solved by file processing system for example a program to
transfer Rs1000 from account A to B. if the system failure occurs during the execution of
program, it is possible that Rs1000 was removed from account A but was not credited to
account B, resulting in an inconsistent database state. It is essential to database consistency
that either both the credit and debit occur or that neither occurs i.e. funds transfer must be
atomic. It must happen in its entirety or not at all .it is difficult to ensure this property in a
file processing system.
7. Security problem
In file processing system there is no security method. But in database system unauthorized
person cannot see the data. For example in bank account, there may be number of accounts
and only access the information about particular customers.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 7
Providing multiple user interfaces.
Representing complex relationships among data.
Enforcing integrity constraints.
DBMS Languages
Data Definition Language (DDL): Used by the DBA and database designers to specify
the conceptual schema of a database. In many DBMSs, the DDL is also used to define
internal and external schemas (views). In some DBMSs, separate storage definition
language (SDL) and view definition language (VDL) are used to define internal and
external schemas.
Data Manipulation Language (DML): Used to specify database retrievals and updates.
o DML commands (data sublanguage) can be embedded in a general-purpose
programming language (host language), such as COBOL, C or an Assembly
Language.
o Alternatively, stand-alone DML commands can be applied directly (query
language).
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 8
High Level or Non-procedural Languages: e.g., SQL, are set-oriented and specify what
data to retrieve than how to retrieve. Also called declarative languages.
Low Level or Procedural Languages: record-at-a-time; they specify how to retrieve data
and include constructs such as looping.
View of data
The main purpose of a database is to provide users with an abstract view of the data, i.e. the system
hides certain details of how the data are stored and maintained. This is called data abstraction.
Three level architecture (ANSI / SPARC Architecture)
View level.
Logical / conceptual level.
Physical level.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 9
3. Physical level (Internal level)
It is the lower level of abstraction, describes how data are stored. It describes the physical
storage characteristics of the data. Such as storage space allocation, indexes, It is
independent of any particular application software. It also describes how the tables are
stored, how many bytes / attributes it allocated etc.
Database schema
It is the structure of the DB that captures data types, relationships, constraints on the data. It is
independent of any application program. It may change frequently. It is similar to types and
variables in programming language.
Mapping
Conceptual / Internal Mapping
It defines the correspondence between the conceptual view and the stored database. It specifies
how conceptual records and file are represented at the internal level. If a change is made to the
storage structure definition then the conceptual / internal mapping must be changed accordingly
so that the conceptual schema can remain invariant. In other words, the effect of such changes
must be isolated below the conceptual level, in order to preserve physical data independence.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 10
conceptual fields can be combined into a single external fields and so on. Any number of external
views can exist at the same time any number of users can share a given external view, different
external views can overlap.
Data independence
When a schema at a lower level is changed, only the mapping between this schema and higher
level schemas need to be changed in a DBMS that fully supports data independence. The higher
level schemas themselves are unchanged. Hence, the application programs need be changed since
they refer to the external schemas logical data independence. The three schema architecture can
be used to further explain the concept of data independence, which can be defined as the capacity
to change the schema at one level of database system without having to change the schema at the
next higher level. There are two types of data dependence.
Logical data independence is more difficult to achieve than in physical data independence,
since the application programs are heavily dependent on the logical structure of data that they
access.
The concept of data independence is similar in many respects to the concept of abstract data
types in modern programming language. Both hide implementation details from the users to
concentrate on the general structure, rather than on low level implementation details.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 11
Data dictionary
The result of compilation of DDL statement is a set of tables that is stored in a special file called
data dictionary. A data dictionary is a file that contains metadata that is data about data. This file
is consulted before actual data are read or modified in a database system.
Data model
A collection of conceptual tools for describing data, data relationships, data semantics and
consistency constraints is called data model. The various data model that have been proposed fall
into three different groups.
Object based logical models.
Record based logical models.
Physical model
Entity:
An entity is a “thing” or “object” in the real word that is distinguishable from other objects. For
example each person is an entity. An entity has a set of properties and the values for some set of
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 12
properties may uniquely identify an entity. For example, if student is an entity, is identified by
registration number. It is represented by rectangle.
Entity
Attribute:
Attributes are properties possessed by on entity or relationship. For example stu_no,
stu_namestu_sub are the attributes of the entity student. Attribute is represented by ellipse.
Attribute
Relationship:
A relationship is an association among several entities and represents meaningful
dependencies between them. For example the association between teachers and students is
teaching. It is represented by diamond.
Relationship
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 13
1. Relational model
In June 1970, Dr. E.F. codd published a paper entitled. A relational model of data for large shared
data banks. This relational model of data for large shared data banks. This relational model,
sponsored by IBM, then came to be accepted as the definitive model for relational database
management system (RDBMS).
The relational model uses a collection of tables to represent both data and the relationship among
those data. Each table has multiple columns, and each column has a unique name .The relational
database is relatively new. The first database system was based of either a network model or
hierarchical model. This model greatly improves the flexibility of the database management
system. It has established itself as a primary data model for commercial data processing
application.
2. Network Model
Data in the network model are represented by collection of records and the relationships among
data are represented by links which can be viewed as pointers. The records in the database are
organized as collection of arbitrary graphs.
It is an improvement of hierarchical model. Here multiple parent-child relationship is used. The
network approach allows us to build up many to many correspondences that mean each child can
have more than one parent. This model is more versatile and flexible than the hierarchical model.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 14
A record is in many respects similar to an entity in the entity relationship model. Each record is a
collection of fields (attributes). Each of which contains only one data value. A link is an association
between precisely two records.
A data structure diagram is a schema representing the design of network database. Such a diagram
consists of two basic components, boxes, which correspond to record types, and lines which
correspond to links.
3. Hierarchical Model
The hierarchical model is a similar to the network model in the sense that data and relationship
among data are represented by records and links respectively. It differs from the network models
are that the records are organized as collection of trees rather than arbitrary graphs.
This model is introduced in the information management system (IMS) depend by IBM in 1968.
The top level of data is parent or root and other are sub-root or branches which may have
subdivision.
Science
Zoology Botany
Tree structure diagram is the schema for a hierarchical database such a diagram consists of two
basic components, boxes which correspond to record types and lies, which correspond to links.
Transaction
A transaction is a collection of operations that performs a single logical function in a database
application. Each transaction is a unit of bath atomicity and consistency. Thus, we require that
transactions do not violate any data base consistency constraints .i.e. if the database was consistent
when the transaction stated, the database must be consistent when the transaction successfully
terminated outing the execution of transaction it may be necessary temporarily to allow
inconsistency. This temporary inconsistency although necessary
Storage management
Database requires large amount of storage space. It is measured in gigabytes or terabytes.
The large amount of data cannot store in main memory so data are moved from main
memory to secondary memory as needed and vice versa. It is important that the data base
system structure the data so as to minimize the need to move data between disk and main
memory.
The goal of database system is to simplify and facilitate access to data high level views
help to active this goal.
Storage manager is a program module that provides the interface between the low level
data stored in the database and the application programs and queries submitted to the
system.
The storage manager is responsible for the interaction with the file manager
The raw data are stored on the disk using the file system, which is usually provided by a
conversion operating system.
The storage manager translates the various DML statements into low-level file system
commands. Thus the storage manager is responsible for storing, retrieving and updating of
data in the database.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 17
Database Users
The main goal of database system is to provide an environment for retrieving information from
and storing new information into the database. These are four types of database users.
1. Application Programmers
Application programmers are persons who write the program in HLL Such as C, COBOL, PL/1,
Pascal etc. They prepare program for bank, hospital, school etc.
2. Sophisticated users
They interact with system without writing programs. They form their requests in the database
query language. Each such query is submitted to query processor. Whose function is to break
down DML statement into instructions that the storage manager understands. The analyst who
submits queries to explore data in a database is called sophisticated users.
3. Specialized Users
They are sophisticated users who write specialized database applications that do not fit into
traditional data processing framework. Among these applications are computer aided design
system, knowledge base and export system. System that store data with complex data types eg
graphic data, audio data etc.
4. Naïve Users
They are unsophisticated users who interact with system by invoking one of the permanent
application programs that have been written previously. eg. a bank teller who needs to transfer
$50 from account A to account B invokes a program called transfer. This program asks the teller
for the amount of money to be transferred, the account from which the money to be transferred,
and the account to which the money is to be transferred
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 18
Which tests for the satisfaction of integrity constraints and checks the authority of users to
access data
2. Transaction Manager
Which ensures that the database remains in a consistent (correct) state despite system failures,
and that current transition execution proceed without conflicting
3. File manager
Which manages the allocation of space on disk storage and that concurrent transaction
execution proceed without conflicting.
4. Buffer Manager
Which is responsible for fetching data from disk storage into main memory, and deciding what
data to cache in main memory. The manager is critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical system
implementation.
Data files
Which store the database itself.
Data Dictionary
Which stores metadata about the structure of the database in particular the schema of the
database.
Indices
Which provides fast access to data items that hold particular values.
Statistical data
Which stores statistical information about data in the database. This information is used by
query processor to select efficient ways to execute a query.
Relational Model
The relational model is today the primary model for commercial data processing applications. It
has attained its primary position because of its simplicity as compared to earlier data models such
as the network model or the hierarchical model. It is a lower level model that uses a collection of
tables (also called relations) to represent both data and the relationship among those data. A table
of values is called relation. A relation may be thought of as a set of rows. A relation may
alternately be thought of as a set of columns. Each row represents a fact that corresponds to a real-
world entity or relationship. Each row has a value of an item or set of items that uniquely
identifies that row in the table. Sometimes row-ids or sequential numbers are assigned to identify
the rows in the table. Each column typically is called by its column name or column header or
attribute name. Each table has multiple columns and each column has a unique name.
Example: RDBMS
Formal Definitions
A Relation may be defined in multiple ways.
The Schema of the form: R (A1, A2 ...An) is called relation, Relation schema R is defined over
attributes A1, A2 ...An.
For Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
Here, CUSTOMER is a relation defined over the four attributes Cust-id, Cust-name, Address,
Phone#, each of which has a domain or a set of valid values. For example, the domain of Cust-id
is 6 digit numbers.
Tuple
A tuple is an ordered set of values. Each value is derived from an appropriate domain. Each row
in the CUSTOMER table may be referred to as a tuple in the table and would consist of four values.
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"> is a tuple
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 20
belonging to the CUSTOMER relation. A relation may be regarded as a set of tuples (rows or
records).
Domain
A domain has a logical definition: e.g., “USA_phone_numbers” are the set of 10 digit phone
numbers valid in the U.S. A domain may have a data-type or a format defined for it. The
USA_phone_numbers may have a format: (ddd)-ddd-dddd where each d is a decimal digit. E.g.,
Dates have various formats such as month name, date, year or yyyy-mm-dd, or dd mm, yyyy etc.
An attribute designates the role played by the domain. E.g., the domain Date may be used to define
attributes “Invoice-date” and “Payment-date”. The relation is formed over the Cartesian product
of the sets; each set has values from a domain; that domain is used in a specific role which is
conveyed by the attribute name. For example, attribute Cust-name is defined over the domain of
strings of 25 characters. The role these strings play in the CUSTOMER relation is that of the name
of customers.
Formally,
Given R(A1, A2, .........., An)
r(R) dom (A1) X dom (A2) X ....X dom(An)
R: schema of the relation
r of R: a specific "value" or population of R.
R is also called the intension of a relation
r is also called the extension of a relation
Example
Let S1 = {0, 1}
Let S2 = {a, b, c}
Let R S1 X S2
Then for example: r(R) = {<0, a>, <0, b> , <1, c> } is one possible “state” or “population” or
“extension” r of the relation R, defined over domains S1 and S2. It has three tuples.
Definition Summary
Informal Terms Formal Terms
Table Relation
Column Attribute/Domain
Row Tuple
Values in a column Domain
Table Definition Schema of a Relation (Intension)
Populated Table Extension
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 21
Data Integrity
The fundamental function of the DBMS is to maintain the integrity of the data. Data integrity
ensures that the data in the database is consistent, accurate, correct, and valid. It ascertains that the
data adhere to the set of rules defined by the database administrator and hence, prevents the entry
of the invalid information into database. Data integrity is of four types, namely, domain
integrity, entity integrity, referential integrity, and semantic integrity. Generally, domain integrity
is applied on the attribute; entity integrity is applied on the tuple; referential integrity is applied on
the relation; and semantic integrity ensures logical data in the database.
1. Entity Integrity
The entity integrity constraint states that no primary key (nor any part of the primary key) value
can be null. This is because the primary key value is used to identify individual tuples in a
relation. Having null value for the primary key implies that we cannot identify some tuples.
This also specifies that there may not be any duplicate entries in primary key column.
2. Referential Integrity
The referential integrity constraint is specified between two relations and is used to maintain
the consistency among tuples in the two relations (Parent and Child). Informally, the referential
integrity constraint states that a tuple in one relation that refers to another relation must refer
to an existing tuple in that relation. It is a rule that maintains consistency among the rows of
the two relations. Examples of Referential integrity constraint in the Customer/Order database:
Customer (custid, custname)
Order (orderID, custid, OrderDate)
To ensure that there are no orphan records, we need to enforce referential integrity.
An orphan record is one whose foreign key value is not found in the corresponding entity – the
entity where the PK is located.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 22
3. Domain Integrity
A domain represents a set of values that can be assigned to an attribute. Domain integrity
constraint is specified on the column (attribute) of a relation, so that correct values can be
entered in the column for each record. The domain integrity states that every element from a
relation should respect the type and restrictions of its corresponding attribute. For example, we
can specify the value of 'Marks' attribute of "Student" table which must be greater than 0 and
less than or equal to 100.
4. Semantic Integrity
To represent real world accurately and consistently, business rules and logical rules must be
enforced in database. Such rules are derived from our knowledge of the application semantics
and are called semantic integrity constraints. Semantic integrity ensures that data in the database
is logically consistent and complete with respect to the real world. This type of integrity cannot
be expressed by the model and contains integrity constraints like:
Number of pages of a book cannot be zero
A book is published by only one publisher
An author cannot review his own book etc.
A constraint specification language may have to be used to express these rules. SQL-99 allows
triggers and assertions to allow for some of these.
Advanced SQL
SQL stands for structured query language developed at IBM research for system R. It includes
features of relational algebra and tuple relational calculus. It is standard for relational data access.
It is DBMS independent. It is one commercially available query language. SQL can define the
structure of data create table, index, view alter table etc. modify data in the data base such as select,
update, delete, insert etc.
SQL components
1. Data definition language (DDL):
The SQL DDL provides commands for defining relation schema, deletion relations, creating
indices, and modifying relation schemas.
2. Interactive data definition language (DML):
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 24
The SQL DML includes languages a query language based on both the relational algebra and
the tuple relational calculus. It includes also commands to insert tuples into delete tuples from
and modify tuples in the database.
3. Embedded DML:
The embedded form of SQL is designed for use with in general purpose programming
languages, such as PL/1, COBOL, PASCOL, FORTRAN and C.
4. View definition:
The SQL DDL includes commands for defining views.
5. Authorization:
The SQL DDL includes commands for specifying access rights to relations and views.
6. Integrity:
The SQL DDL includes commands for specifying integrity constraints that the data stored in
the database must satisfy. Updates that violate integrity constraints are disallowed.
7. Transaction control:
SQL includes commands for specifying the beginning and ending of transactions. Several
implementations also allow explicit locking of data for concurrency control.
Table creation
A new relation can be created using the CREATE TABLE command. The general syntax is as
follows.
CREATE TABLE table_name [{column descriptors}];
e.g. CREATE TABLE DEPARTMENT;
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 25
Customer = (customer_name, customer-street, customer- city)
Loan = (branch- name, loan number amount)
Borrower = (customer- name, lone- number)
Account = (branch_name, account_number, balance)
Depositor = (customer- name, account_number)
Basic Structure
The basic structure of an SQL expression consists of three clauses. SELECT, FROM and
WHERE.
The SELECT clause corresponds to the project operation of the relational algebra. It is used to list
the attributes desired in the result of a query. The FROM clause corresponds to the Cartesian
product operation of relational algebra. It lists the relations to be scanned in the evaluation of the
expression. The WHERE clause corresponds to the selection pedicure of the relational algebra. It
consists of a predicate involving attributes of the relations that appear in the form clause.
Tuple Variables
The as clause is used in defining the notion of tuple variable, as is done in the tuple relational
calculus. The tuple variable in SQL must be associated with a particular relation. Tuple variables
are defined in the FROM clause.
#For all customers who have a loan from the bank, find their names and loan numbers.
SELECT DISTINCT customer_name, T.loan_number
FROM borrower AS T, loan AS S
WHERE T.loan_number=S.loan_number;
String operations
The commonly used operation on string is pattern matching using the operator “like”. We use two
special characters:
Percent (%): it matches any substring.
Underscore (-): it matches any character.
Example: Find the names of all customers whose street address includes the substring "main" is:
SELECT customer_name
FROM Customer
WHERE street like "%main%"
Some examples are:
“Perry%” matches any string beginning with “Perry”.
“%edge%” matches any string containing “edge” as a sub string.
“- - -“matches any string of exactly three characters.
“- - -%” matches any string of the least three characters.
Set operations
Set operations are union, intersect and except.
The union operation
Find all customers having a loan, an account or both is:
(SELECT customer_name
FROM depositor)
UNION
(SELECT customer_name
FROM borrower)
The union operation automatically eliminates duplicates. For duplicate value we use union all in
place of union.
(SELECT customer_name
FROM depositer)
UNION ALL
(SELECT customer_name
FROM borrower)
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 28
INTERSECT ALL
(SELECT customer_name
FROM borrower)
Aggregate functions
Aggregate functions that take a collection of value as input and returns a single value. Aggregate
functions are given below:
Average : AVG
Minimum : MIN
Maximum : MAX
Total : SUM
Count : COUNT
To find the average account balance of Bagbazar branch.
SELECT AVG (balance)
FROM account
WHERE branch- name = ‘Bagbazar’
For the group of tuples, we can use group by to find the average account balance at each branch
is:
SELECT branch_name, AVG (balance)
FROM account
GROUP BY branch- name
If we want duplication is removed then distinct is used. To find the number of depositors for each
branch is:
Select branch_name, count (distinct customer- name)
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 29
From depositor, account
Where depositor account- number = account_account_number
Group by branch_name.
For the condition in group by clause we use having clause
e.g. To list branch_name and average balance group by branch_name having average balance
greater than 1200 is:
Select branch_name, avg (balance)
From account
Group by branch_name
Having avg (balance) > 1200
To find the average balance for all customers:
Select avg (balance)
From account
To find number of tuples in the customer relation:
Select count (*)
From customer
Find the averages balance for each customer who lives in lalitpur and has at least three accounts.
Select depositor.customer- name, avg (balance)
From depositor, account, customer
Where depositor.account_number = account.account_number and
depositor.customer_name = customer.customer_name and customer_city = "lalitpur"
Group by depositor.customer_name
Having count (distinct depositor.Account_number ) > = 3
Null values
Null values are values that indicate absence of information about the value of an attribute.
To find the loan number with null values.
Select loan-number
From loan
Where amount is null;
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 30
Nested Sub queries
A sub query is nested inside another query called nested sub queries. A common use of sub queries
is to perform tests for set membership, set companions and set cardinality.
Set membership
The in connective tests for set membership, where the set is a collection of values produced by a
SELECT clause.
The NOT IN connective tests for the absence of set membership consider the query:
Find all customers who have both a loan and account at the bank. This is solved by union operation
and another approach is finding all account holders at the bank who are members of the set of
borrowers from the bank. This formation generates the same results as did the previous one but it
leads us to write our query using in connective of SQL. We first by finding all account holders.
(Select customer_name
From depositor)
We then need to find those customers who are borrower from banks and who appear in the list of
account holders obtained on the sub query. The result is
select distinct customer_name
form borrower.
Where customer_name in (select customer_name from depositor);
Find all customers who have both on account and loan at the Bagbazar branch.
(Select distinct customer_name
From borrower, loan
Where borrower . loan number = loan . loan-number.
and branch_name = "bagbazzar" and (branch_name, customer_name) in (select
branch_name, customer_name from depositor, account Where
depositor.account_number = account.account_number )
We also use not in clause
To find all customer who have a loan at a bank but don't have an account at the bank.
Select distinct customer_name
From borrower
Where customer_name not in (select customer_name
From depositor);
Set Comparison
We use comparison operators for set comparison operations. "Greater than at least one" is
represented by > some.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 31
Find the names of all branches that have assets greater than those of at least one branch located in
Bagbazar branch.
Select branch_name
From branch
Where assents > some (select asserts From branch Where branch_city = ‘Bagbazar’);
SQL allows < some, < = some, =some, and < > some is not the same as not in.
Find the names of all branches that have asserts greater than that of each branch in Bagbazar.
SELECT branch_name
FROM branch
WHERE assets > ALL (SELECT assets FROM branch WHERE branch_city = ‘Bagbazar’)
SQL also allows < all, <= all, > = all, and < > all comparisons
Find the branch that has the highest average balance
SELECT branch_name
FROM account
GROUP BY branch_name
HAVING AVG (balance) > = ALL (SELECT AVG (balance)
FROM account
GROUP BY branch_name);
Find all customers who have an account at all the branches located at Bagbazar.
SELECT DISTINCT s.customer_name
FROM depositor AS S
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 32
WHERE NOT EXISTS (SELECT branch_name FROM branch
WHERE branch_city = ‘Bagbazar’)
EXCEPT
(SELECT R.branch_name FROM depositor AS T, account AS R
WHERE T. account_number = R. account_number AND S. customer name = T. customer
name)
Derived Relations
The result of the relation can be renamed and the attributes also renamed by AS clause.
(SELECT branch_name, AVG (balance) FROM depositer
GROUP BY branch_name AS result (branch_name, avg – balance)
View
We can create view in SQL by CREATE VIEW command. The syntax is given below:
CREATE VIEW V AS < query expression >
We can define view for the names of customers who have either an account or a loan is:
CREATE VIEW all_customer AS
(SELECT branch_name, customer_name FROM depositor, account
WHERE depositor.account_number = account.Account_number )
UNION
(SELECT branch_name, customer_name
FROM borrower, loan
WHERE borrower.loan_number = loan.loan_number);
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 33
To delete all loan amounts between 4000 and 5000
DELETE FROM loan
WHERE amount BETWEEN 4000 AND 5000;
To delete all account at every branch located in Bagbazar
Delete from account
Where branch_name in (select branch_name from branch
Where branch_city = ‘bagbazar’);
To delete the records of all accounts with balance below the average at the bank.
Delete from account
Where balance < (select avg (balance) FROM account)
Insertion
We can insert tuples in the relation. The attributes values for inserted tuples must be members of
the attributes domain. Tuples inserted must be of the correct aritry.
To insert account_no 501 at the Bagbazar branch and the balance is 5000
INSERT INTO account VALUES (‘Bagbazar’, 501, 5000)
It is equivalent to:
INSERT INTO account (branch_name, account_number , balance)
VALUES (‘Bagbazar’, 501, 5000);
It is also equivalent to:
Insert into account (account_number, branch_name, balance)
Values (501, ‘Bagbazar’, 5000)
The insert statement considered only examples in which a value is given for every attribute in
inserted tuples. It is possible for inserted tuples to the given values on only some attributes of the
schema. The remaining attributes are assigned a null value denoted by NULL e.g.
Insert into account
Values (NULL, ‘B-101’, 1500)
We know that account B-101 has Rs 1500 but branch name is not known.
Updates
We can change a value in a tuple without changing all values in the tuple. For this, update statement
can be used.
To increase the balance by 5 percent
UPDATE account
SET balance = balance * 1.05;
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 34
Account with balance over 10,000 receives 6 percent interest and other receives 5 percent.
UPDATE account
SET balance = balance * 1.06
WHERE balance > 10000
UPDATE account
SET balance = balance * 1.05
WHERE balance < = 10000;
Update of a view
We can create view by:
Create view branch_loan as
Select branch_name, loan_number
From loan;
We can update by:
Insert into branch_loan
Values (‘pokhara’, 305)
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 35
Examples:
CREATE TABLE customer CREATE TABLE account
(customer_name CHAR (20) NOT NULL, (account_number CHAR(10) NOT NULL,
customer_strees CHAR (30), branch_name CHAR(15),
customer_city CHAR (30), Balance INTEGER,
PRIMARY KEY (customer_name)); PRIMARY KEY (account_number),
CHECK (balance > = 0));
CREATE TABLE Branch
(branch_name CHAR(15) NOT NULL, CREATE TABLE depositor
Branch- city CHAR (30), (customer_name CHAR(20) NOT NULL,
Assets INTEGER, account_number CHAR(10) NOT NULL,
PRIMARY KEY (branch_name), PRIMARY KEY (customer_name,
CHECK (assets > = 0); account_number)),
Delete Table
To remove a relation from SQL database. We use DROP TABLE command. It removes all
information and table also. The command is
DROP TABLE r
Where r is the relation
Alter Table
We use alter table command in SQL to add attributes to the existing relation. All tuples in the
relation are assigned null as the value for the new attribute. The form of alter table command is:
ALTER TABLE r ADD A D
Where r is the name of existing relation, A is the name of attribute to be added and D is the domain
of the added attribute. We can drop attributes from a relation using a command:
ALTER TABLE r DROP A
Where r is the name of an existing relation and A is the name of attribute in a relation.
Integrity constraints
The term integrity refers to the accuracy or correctness of data in the database. Integrity constraint
is a condition specified on a database schema which must hold on all of valid relation instances.
Integrity constraints ensure that changes made to the database by authorized users do not result in
a loss of data consistency. Thus, integrity constraints guard against accidental damage to the
database. Constraints are basically used to impose rules on the table, whenever a row is inserted,
updated, or deleted from the table. Constraints prevent the deletion of a table if there are
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 36
dependencies. The different types of constraints that can be imposed on the table are domain
constraints, referential constraints, trigger, assertions etc. The constraints related to domain
constraints are NOT NULL, UNIQUE, PRIMARY KEY and CHECK constraints.
1. Domain constraints
NOT NULL constraints
UNIQUE constraints
PRIMARY KEY constraints
CHECK constraints etc.
2. Referential constraints
3. Triggers
4. Assertion
Domain constraint
Domains are used in the relational model to define the characteristics of the columns of a table.
Domain refers to the set of all possible values that attribute can take. The domain specifies its own
name, data type, and logical size. The logical size represents the size as perceived by the user, not
how it is implemented internally. For example, for an integer, the logical size represents the
number of digits used to display the integer, not the number of bytes used to store it. The domain
integrity constraints are used to specify the valid values that a column defined over the domain can
take. We can define the valid values by listing them as a set of values (such as an enumerated data
type in a strongly typed programming language), a range of values, or an expression that accepts
the valid values. Strictly speaking, only values from the same domain should ever be compared or
be integrated through a union operator. The domain integrity constraint specifies that each attribute
must have values derived from a valid range
The create domain clause can be used to define new domains. For example, to ensure that age
must be an integer in the range 1 to 100, we could use:
CREATE DOMAIN Ageval INTEGER
CHECK (VALUE >= 1 AND VALUE <= 100)
The domain can be restricted to contain only a specified set of values by using IN clause:
CREATE DOMAIN AccountType CHAR (10)
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 37
CONSTRAINT account-type-test
CHECK (VALUE IN (‘Checking’, ‘Saving’)
SQL also provides drop domain and alter domain clauses to drop or modify domains that have
been created earlier.
CHECK Constraints
CHECK constraint is added to the declaration of the attribute. The CHECK constraint may use the
name of the attribute or any other relation or attribute name may in a sub-query. Attribute value
check is checked only when the value of the attribute is inserted or updated. CHECK constraints
allow users to prohibit an operation on a table that would violate the constraint. It is a local
constraint.
Example: let’s create a student table with attributes student id, student name, age and address. If
we need to allow only those students in the table whose age must be an integer range 20 to 45, we
could use the CHECK constraint during the creation of table as below:
CREATE TABLE Student
(
sid INTEGER,
sname VARCHAR(20),
age INTEGER,
PRIMARY KEY (sid),
CHECK (age>=20 AND age<=45)
)
In the above student table if we are trying to insert a new record as
INSERT INTO Student
VALUES (5, “Rajesh”, 15);
We get insertion is rejected message since value of age attribute violated the check condition.
Referential Integrity
In the relational data model, associations between tables are defined through the use of foreign
keys. The referential integrity rule states that a database must not contain any unmatched foreign
key values. It is to be noted that referential integrity rule does not imply a foreign key cannot be
null. There can be situations where a relationship does not exist for a particular instance, in which
case the foreign key is null. A referential integrity is a rule that states that either each foreign key
value must match a primary key value in another relation or the foreign key value must be null.
Referential integrity ensures that a value that appears in one relation for a given set of attributes
also appears for a certain set of attributes in another relation to establish the relationship between
tables.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 38
In relational model we use primary key and foreign key to establish relationships between two
relations. Table containing primary key attribute is called master table and the table containing
foreign key attribute is called child table. Referential integrity ensures that value appeared in
foreign key attribute of child table must also appear in primary key attribute of corresponding
master table. Defining referential integrity restricts database modification operations such as insert,
delete, and update. To illustrate this consider following two relations:
Employee Department
Eid Ename Salary Dno Dno Dname Location
E01 Ramesh 26000 D1
D1 IT Gwarko
E02 Sohan 19000 D1
E03 Renuka 25000 D2 D2 Finance Satdobato
Insert: We cannot insert new tuples containing value of foreign key attribute that do not
appear in primary key attribute of master table. For example, we cannot insert new
employee that works in D4 department because the department D4 does not exists in
department table.
Delete: We cannot delete tuples containing values of primary key attribute that also appear
foreign key attribute of related table. For example, we cannot delete tuple containing value
D2 of DNO from department table because there are employees working in department D2.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 39
CREATE TABLE Books
( CREATE TABLE Publisher
(
ISBN VARCHAR (20),
Pid VARCHAR (10),
Title VARCHAR (20),
PnameVARCHAR (15),
Category VARCHAR (15),
City VARCHAR (20),
Price INTEGER, Email VARCHAR (20),
Pages INTEGER, PRIMARY KEY (Pid)
Year DATE (10), )
Pid VARCHAR (5),
PRIMARY KEY (ISBN),
FOREIGN KEY (Pid) REFERENCES Publisher (Pid) ON DELETE CASCADE ON
UPDATE CASCADE
)
Alternatively, we can use on delete set null and on update set null.
Also, we can use on delete set default and on update set default.
Assertions
Assertions are general purpose checks that allow the enforcement of any condition over the entire
database. Similar to CHECK but they are global Constraints. When an assertion is made, the
system tests it for validity, and tests it again on every update that may violate the assertion. This
testing may introduce a significant amount of overhead; hence assertions should be used with great
care. An assertion in SQL takes the form:
CREATE ASSERTION <assertion-name> CHECK <predicate>
Example: The department id of manager relation is always not null since each manager works at
least one department.
CREATE ASSERTION Noallow CHECK
(NOT EXISTS (SELECT * FROM MANAGER WHERE DeptId IS NULL));
Above assertion ensures that there is no manager who is not assigned any department at any time.
Let’s take a manager relation in which some records are inserted as
Manager
Mid Mname Address DeptId
M01 Aayan Pokhara D11
M02 Bhupi Lalitpur D22
M03 Arjun Kathmandu D11
M05 Ramesh Palpa Null
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 40
In the above table the department id of manager ‘Ramesh’ is NULL due to which assertion is
violated and we cannot further modify the database.
Assertions can be dropped using the DROP ASSERTION command.
DROP ASSERTION Noallow;
Example 2: The sum of all loan amounts for each branch must be less than the sum of all account
balances at the branch.
CREATE ASSERTION sum-constraint
CHECK (NOT EXISTS (SELECT * FROM branch
WHERE (SELECT SUM (amount) FROM loan
WHERE loan.branch-name =branch.branch-name) >= (SELECT
SUM (amount) FROM account
WHERE loan.branch-name = branch.branch-name)));
Example 3: To specify the constraint that the salary of an employee must not be greater than the
salary of the manager of the department that the employee works for in SQL, we can write the
following assertion:
CREATE ASSERTION Salary_Constraint
CHECK (NOT EXISTS (SELECT * FROM Employee E, Employee M, DEPARTMENT D
WHERE E.Salary >M.Salary AND E.Dno=D.Dnumber AND D.Mgr_ssn =M.Ssn));
Triggers
A trigger is a procedure (statement) that is automatically invoked by the DBMS in response to
specified changes to the database. A database that has a set of associated triggers is called an active
database. Triggers are useful mechanisms for alerting humans or for starting certain tasks
automatically when certain conditions are met. It is the most practical way to implement routines
and granting integrity of data. Unlike the stored procedures or functions, which have to be
explicitly invoked, these triggers implicitly get fired whenever the table is affected by the SQL
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 41
operation. For any event that causes a change in the contents of a table, a user can specify an
associated action that the DBMS should carry out. Trigger follows the Event-Condition-Action
scheme (ECA scheme). To design a trigger mechanism, we must meet following three
requirements:
Event: A change to the database that activates the trigger.
Condition: Trigger performs some action only if a specified condition matches at the
occurrence of the event
Action: A procedure that is executed when the trigger is activated and its condition is true.
General form of trigger:
CREATE TRIGGER <trigger-name>
<Time events>
ON <list-of-tables>
WHEN <Predicate>
<Action-name>
Need of Triggers
Triggers are useful mechanisms for alerting humans or for starting certain tasks automatically
when certain conditions are met.
For example, for every pre-paid account whose balance is less than or equal to 0, the account is
automatically marked as “blocked”.
CREATE TRIGGER overdraft AFTER UPDATE ON pre-paid
REFERENCING NEW ROW AS nrow
FOR EACH ROW
WHEN nrow.balance <= 0
UPDATE pre-paid
SET blocked = ‘T”;
Let’s take a pre-paid relation as
Pre-paid PNO Balance Blocked
PNO Balance Blocked 1 4000 T
1 4000 False 2 5000 False
2 5000 False 3 7000 False
3 7000 False 4 2000 T
4 2000 False
UPDATE TABLE pre-paid
SET Balance = Balance-4000;
If we execute this query then we get following modified Pre-paid table
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 42
Query Processing and Optimization
Query Processing is a procedure of transforming a high-level query (such as SQL) into a correct
and efficient execution plan expressed in low-level language. A query processing select a most
appropriate plan that is used in responding to a database request. When a database system receives
a query for update or retrieval of information, it goes through a series of compilation steps, called
execution plan. In the first phase called syntax checking phase, the system parses the query and
checks that it follows the syntax rules or not. It then matches the objects in the query syntax with
the view tables and columns listed in the system table. Finally it performs the appropriate query
modification. During this phase the system validates the user privileges and that the query does
not disobey any integrity rules. The execution plan is finally execute to generate a response. So
query processing is a stepwise process.
Q. What do you mean by query processing? What are the various steps involved in query
processing? Explain with the help of a block diagram.
Ans: Query processing includes translation of high-level queries into low-level expressions that
can be used at the physical level of the file system, query optimization and actual execution of the
query to get the result. It is a three-step process that consists of parsing and translation,
optimization and execution of the query submitted by the user .These steps are discussed below:
Parsing and translation
Optimization
Evaluation
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 43
1. Parsing and translation
Check syntax and verify relations. Translate the query into its internal form. This is then translated
into relational algebra.
2. Optimization
The primary goal of query optimization is of choosing an efficient execution strategy for
processing a query. The query optimizer attempts to minimize the use of certain resources (mainly
the number of I/O and CPU time) by selecting a best execution plan (access plan). A query
optimization start during the validation phase by the system to validate the user has appropriate
privileges. Simply, generate an optimal evaluation plan (with lowest cost) for the query plan is
called optimization.
3. Evaluation
The query-execution engine takes an (optimal) evaluation plan, executes that plan, and returns the
answers to the query.
Query Optimization
The primary goal of query optimization is of choosing an efficient execution strategy for
processing a query. DBMS provides two different approaches to query optimization: rule based
and cost-based. With the rule-based approach, the optimizer chooses execution plans based on
heuristically ranked operations. However, the rule-based approach is being phased out in favor of
the cost-based approach, where the optimizer examines alternative access paths and operator
algorithms and chooses the execution plan with the lowest estimated cost. The estimated query
cost is proportional to the expected elapsed time needed to execute the query with the given
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 44
execution plan. The optimizer calculates this cost based on the estimated usage of resources, such
as I/O, CPU time, and memory needed. The goal of cost-based optimization is to minimize the
elapsed time to process the entire query.
Transaction Management
A transaction is a collection of several operations on the database appears to be a single unit from
the point of view of the database user. For example, a transfer of funds from a checking account
to a savings account is a single operation from the customer’s standpoint; within the database
system, however, it consists of several operations.
Database transaction is collection of SQL queries which forms a logical one task. For transaction
to be completed successfully all SQL queries has to run successfully. Database transaction
executes either all or none. For example, if your database transaction contains 4 SQL queries and
one of them fails then change made by other 3 queries will be rolled back. This way your database
always remain consistent whether transaction succeeded or failed.
Transaction is implemented in database using SQL keyword TRANSACTION, COMMIT and
ROLLBACK.
COMMIT writes the changes made by transaction into database
ROLLBACK removes temporary changes logged in transaction log by database
transaction.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 45
Example:
Operation (Money(Money
Operation Transfer)
Transfer) T1 T1
Read balance
-Read of
balance
account
of A1
accountRead
A1 (A1)
Read (A1)
Subtract-Subtract
20,000 from
20,000
A1 from A1A1=A1-20000
A1=A1-
20000
Update balance of A1 Write (A1)
-Update balance of A1 Write (A1)
Read balance of account A2 Read (A2)
-Read balance of account A2 Read (A2)
Add Rs 20,000 to A2 A2=A2+20000
-Add rs 20,000 to A2
-Update balance of A2 Write (A2)
Table: Money Transfer Transaction
Properties of Transaction
There are four important properties of database transactions these are represented by acronym
ACID and also called ACID properties or database transaction where:
Atomicity: Atom is considered to be smallest particle which cannot be broken into further
pieces. Database transaction has to be atomic means either all steps of transaction
completes or none of them.
Consistency: Transaction must leave database in consistent state even if it succeed or
rollback.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 46
Isolation: Two database transactions happening at same time should not affect each other
and has consistent view of database. This is achieved by using isolation levels in database.
Durability: Data has to be persisted successfully in database once transaction completed
successfully and it has to be saved from power outage or other threats. This is achieved by
saving data related to transaction in more than one places along with database.
Transaction States
Whenever a transaction is submitted to a DBMS for execution, either it executes successfully or
fails due to some reasons. During its execution, a transaction passes through various states that
are active, partially committed, committed, failed, and aborted.
Active state - It is initial state. Transaction stays in this state while it is executing.
Partially committed state - After the final statement has been executed, a transaction is
in partially committed state.
Committed state - After successful completion, a transaction is in committed state.
Failed state - After the discovery that normal execution can no longer proceed, a
transaction is in failed state.
Terminated State – This state corresponds to the transaction leaving the system. The
transaction information that is maintained in system tables while the transaction has been
running is removed when the transaction terminates. Failed or aborted transactions may be
restarted later – either automatically or after being resubmitted by the user – as brand new
transactions.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 47
Concurrency Control
When multiple transactions are trying to access the same sharable resource, there could arise many
problems if the access control is not done properly. There are some important mechanisms to which
access control can be maintained. Earlier we talked about theoretical concepts like serializability,
but the practical concept of this can be implemented by using Locks and Timestamps. Here we
shall discuss some protocols where Locks and Timestamps can be used to provide an environment
in which concurrent transactions can preserve their Consistency and Isolation properties.
Objectives of concurrency control mechanism can be list as below:
To enforce Isolation (through mutual exclusion) among conflicting transactions.
To preserve database consistency through consistency preserving execution of
transactions.
To resolve read-write and write-write conflicts.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 48
changed back to its original value which causes transaction T2 to have dirty value of the
data item. Again consider the example where two concurrent transactions try to update
inventory of product P1 simultaneously. This example is same as above example except
that its operations are interleaved in slightly different way and first transaction is failed.
Example: Assume initial value of data item P1 is 300
Binary lock
Binary lock is a variable that can be only in two states. It is either locked or unlocked. Normally,
locked state is represented by value 1 and unlocked state is represented by value 0. A distinct lock
is associated with each database item x. If the value of the lock on data item x is 1, item x cannot
be accessed by a database operation that requests the item. If the value of the lock on x is 0, the
item can be accessed when requested. Two operations, lock and unlock, are used with binary
locking and these two operations must be implemented atomically.
If the simple binary locking scheme described above is used, every transaction must obey the
following rules:
1. A transaction T must issue the operation lock(x) before performing any read(x) or
write(x) operations.
2. A transaction T must issue the operation unlock(x) after finishing all read(x) and write(x)
operations.
3. A transaction T will not issue a lock(x) operation if the data item x is already locked by
it.
4. A transaction T will not issue an unlock(x) operation if the data item x is not locked by it.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 49
Shared/Exclusive Lock
This type of lock is also called multiple mode lock. It is a variable that can be in any of three states:
unlocked, read-locked (shared lock) and write-locked (exclusive lock). If a transaction acquires
shared lock (read-lock) on data item x then other transactions can also acquire shared lock on data
item x. But no transaction can acquire exclusive lock (write-lock) on data item x. On the other
hand, if a transaction acquires exclusive lock (write-lock) on data item x then no other transactions
can acquire shared/exclusive lock (read/write lock) on the data item.
When shared/exclusive locking scheme discussed above is used, the system must enforce the
following rules:
1. A transaction T must issue the operation read_lock(x) or write_lock(x) before performing
read(x) operation.
2. A transaction T must issue the operation write_lock(x) before performing write(x)
operation.
3. A transaction T must issue the operation unlock(x) after finishing all its read(x) and
write(x) operations.
4. A transaction T will not issue a read_lock(x) operation if it already holds a shared lock
(read-lock) or exclusive lock (write-lock) on item x.
5. A transaction T will not issue a write_lock(x) operation if it already holds a shared lock
(read-lock) or exclusive lock (write-lock) on item x.
6. A transaction T will not issue an unlock(x) operation if it does not hold a shared lock (read-
lock) or exclusive lock (write-lock) on item x.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 50
Time-Stamp Based Protocol
The most commonly used concurrency protocol is time-stamp based protocol. This protocol uses
either system time or logical counter to be used as a time-stamp. Lock based protocols manage the
order between conflicting pairs among transaction at the time of execution whereas time-stamp
based protocols start working as soon as transaction is created. Every transaction has a time-stamp
associated with it and the ordering is determined by the age of the transaction. A transaction created
at 0002 clock time would be older than all other transaction, which come after it. For example, any
transaction 'y' entering the system at 0004 is two seconds younger and priority may be given to the
older one.
A timestamp can be implemented in two ways. The simplest one is to directly assign the current
value of the clock to the transaction or the data item. The other policy is to attach the value of a
logical counter that keeps incrementing as new timestamps are required.
The concurrency control algorithm must check whether convicting operations violate the time
stamp ordering in the following two cases.
Case I: Transaction T Issues Write(x) Operation
If ReadTS(x) > TS(T) or if WriteTS (x) > TS(T), then abort and roll back T and reject the
Operation. This should be done because some younger transaction with a time stamp
greater than TS(T) – and hence after T in the timestamp ordering –has already read or
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 51
written the value of item x before T had a chance to write x, thus violating the timestamp
ordering.
If the above condition is not satisfied then execute the Write(x) operation and set
WriteTS(x) to TS(T).
Case II: Transaction T Issues Read(x) Operation
If WriteTS (x) >TS(T), then abort and rollback and reject the operation. This should be
done because some younger transaction have timestamp greater than TS (T) – and hence
after T in the Timestamp ordering-has already written the value of item x before T had a
chance to read x.
If write TS(x) < = TS (T), then execute the Read(x) operation of T and set ReadTS(x) to
the larger to TS(T) and the current ReadTS(x).
Example: Assume timestamps of T1 and T2 is 100 and 110 respectively and initial value of x is
500
Distributed Database
A distributed database (DDB) is a collection of multiple, logically interrelated databases
distributed over a computer network. Distributed databases bring the advantages of distributed
computing to the database management domain. It consists of a number of processing elements,
not necessarily homogenous, that are interconnected by a computer network, and that cooperate in
performing certain assigned tasks. As a general goal, distributed computing systems partition a
big, unmanageable problem into smaller pieces and solve it effectively in a coordinated manner.
It provides two major benefits: more computer power can be used to solve a complex task and
each autonomous processing element can be managed independently and develop its own
applications.
Site 5
Site 1
Site 3 Site 2
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 53
require access to local data and is also capable of processing data stored on other computers in the
network.
Additional Functions
Keeping track of data: The ability to keep track of the data distribution, fragmentation,
and replication by expanding the DDMS catalog.
Distributed query processing: The ability to access remote sites and transmit queries and
data among the various sites via a communication network.
Distributed transaction management: The ability to device execution strategies for
queries and transactions that access data from more than one site and to synchronize the
access to distributed data and maintain integrity of the overall database.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 54
Replicated data management: The ability to decide which copy of a replicated data item
to access and to maintain the consistency of copies of a replicated data item.
Distributed database recovery: The ability to recover from individual site crashes and
from new type failures such as the failure of communication link.
Security: Distributed transactions must be executed with the proper management of the
security of the data and the authorization/access privileges of users.
Distributed directory (catalog) management: A directory contains information
(metadata) about data in the database. The directory may be global for the entire DDB, or
local for each site. The placement and distribution of the directory are design and policy
issues.
These above functions (in addition to those of a centralized DBMS) themselves increase the
complexity of DDBMS over a centralized DBMS.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 55
Customer
Fragment 2
σsex=female(customer)
Vertical Fragmentation:
Vertical fragmentation divides a relation “vertically” by columns. A vertical fragment of a relation
keeps only certain attributes of the relation. It is a subset of a relation which is created by a subset
of columns. Thus a vertical fragment of a relation will contain values of selected columns. There
is no selection condition used in vertical fragmentation. All vertical fragments of a relation are
connected by using PROJECT operation of the relational algebra.
Example:
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 56
Vertical fragmentation is subset of attributes
Fragment 1
Fragment 2
To combine all the vertically fragmented tables we need to perform join operation on the
fragments.
SELECT customer_id, Name, Area, Sex, Payment_type
FROM Fragment 1 NATURAL JOIN Fragment 2;
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 57
Data Replication and Allocation
Replication is useful in improving the availability of data. The most extreme case is replication of
the whole database at every site in the distributed system, thus creating a fully replicated distributed
database. This can improve availability remarkably because the system can continue to operate as
long as at least one site is up. It also improves performance of retrieval for global queries because
the results of such queries can be obtained locally from any one site; hence, a retrieval query can
be processed at the local site where it is submitted, if that site includes a server module. The
disadvantage of full replication is that it can slow down update operations drastically, since a single
logical update must be performed on every copy of the database to keep the copies consistent. This
is especially true if many copies of the database exist. Full replication makes the concurrency
control and recovery techniques more expensive than they would be if there was no replication.
In partial replication some selected part is replicated to some of the sites. Data replication
is achieved through a replication schema.
The process of assigning each fragment to a particular site in a distributed system is called
data distribution (or data allocation). This is relevant only in the case of partial replication or
partition. The selected portion of the database is distributed to the database sites. The choice of
sites and the degree of replication depend on the performance and availability goals of the system
and on the types and frequencies of transactions submitted at each site. For example, if many
updates are performed, it may be useful to limit replication.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 58
Unit 2
The Entity Relationship, Extended Entity Relationship Model
And Object Model
E-R Model
It is developed to facilitate database design by allowing the specification of an enterprise schema,
which represents overall logical structure of database. E-R model is useful in mapping the
meanings and interactions of real word objects. Basic objects called entities. Relationship among
objects called relationship. E-R model keep the record of entities, their attributes and relationship
among those entities.
Weak entity
E A Attribute
A Discriminating attribute of
R Identifying Relationship
weak entity set
R One to one
E Total relationship
R
participation
ISA (specialization or
Total generalization)
generalization ISA
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 59
E-R data model consists of three basic notions.
a. Entity sets
b. Relationship sets
c. Attributes
Entity Sets
An entry set is a set of entities of the same type that share the same properties, or attributes for
example the set of all persons who are customer at a given bank can be defined as the entity set
customer, loan etc.
Attributes
An entity is represented by set of characteristics called attributes. Each attributes has set of values
called domain. E.g. possible attributes of loan entity set has loan number and loan amount similarly
the domain of attribute might be a set of a positive integers.
Types of attributes
1.Simple Vs. Composite:
Simple attributes are these they are not divided into sub parts. Composite attributes can be divided
into subpart. A composite attribute is made of one or more simple or composite attributes. E.g.
name is made of first name, middle name and last name and where name is composite attribute
and first name, middle name and last name are simple attribute. It may come in hierarchy.
Customer address
Descriptive Attributes
A relationship set may also have attributes called descriptive attributes. For example, the depositor
relationship set between entity sets customer and account may have the attribute access-date. See
in fig below. A relationship instance in a given relationship set must be uniquely identifiable from
other relationship instances, without using descriptive attributes.
Relationship sets
A relationship is an association between several entities. A relationship set is a set of relationships
of the same type. Mathematically For non-distinct entity set n 2. If E1, E2… En are entity sets
then relationship set R is the subset of {(e1, e2, e3… en) / e1 E1, e2E2………enEn}.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 61
Strong and Weak Entity Set
An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed
as a weak entity set. An entity set that has a primary key is termed as a strong entity set.
For a weak entity set to be meaningful, it must be associated with another entity set, called the
identifying or owner entity set, using one of the key attribute of owner entity set. The weak entity
set is said to be existence dependent on the identifying entity set. The relationship associating the
weak entity set with the identifying entity set is called the identifying relationship. The identifying
relationship is many-to-one form the weak entity set to the identifying entity set, and the
participation of the weak entity set in the relationship set is total.
Although a weak entity set does not have a primary key, we use discriminator (or partial
key) as a set of attributes that allows the distinction to be made among all the entities in the weak
entity set.
In the figure below, payment-number is partial key and (loan-number, payment-number) is
primary key for payment entity set.
Constraints on ER Model
Relationship sets in ER model usually have certain constraints that limit the possible combinations
of entities that may involve in the corresponding relationship set. Database content must confirm
these constraints. The most important constraints are: mapping cardinalities and participation
constraints.
Mapping Cardinality Constraints
ER model constraint that describes maximum number of possible relationship occurrences for an
entity set participating in a given relationship type is called mapping cardinality. It is also termed
as cardinality ratio. On the basis of cardinality ratio, relationships can be categorized into: One-to-
One, One-to- Many, Many-to-One, and Many-to-Many. We express cardinality constraints by
drawing either a directed line (), signifying “one,” or an undirected line (—), signifying “many,”
between the relationship set and the entity set.
1. One-to-One Relationship
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 62
If every entity in A is associated with at most one entity in B and vice-versa then the relationship
is called one-to-one relationship. The following figure shows one to one mapping cardinality
between entity sets A and B. For example every bank has only one CEO and a person can be CEO
of only one bank therefore it shows one-to-one relationship between Bank and CEO.
2. One-to-Many Relationship
If an entity in A can be associated with any number (zero or more) of entities in B but every entity
in B can be associated with at most one entity in A, and then it is called one-to-many relationship.
For example, a mother can have any number of children but children can have only one mother
therefore it shows one-to-many relationship between mother and child.
3. Many-to-One Relationship
If very entity in A can be associated only one of entities in B but an entity in B can be associated
with any number of entities in A, then it is called many-to-one relationship. For example, a Book
is always published by only one publisher but a publisher can publish any number of books
therefore it shows many-to-one relationship between books and publication.
4. Many-to-Many Relationship
If an entity in A can be associated with any number of entities in B and vice versa then it is called
many-to-many relationship. For example, a student can enroll into more than one subject and a
subject can be enrolled by many students therefore it shows many-to-many relationship between
students and courses.
Participation Constraints
Constraint on ER model that determines whether all or only some entity occurrences participate in
a relationship is called participation constraint. It specifies whether the existence of an entity
depends on its being related to another entity via the relationship type. There are two types of
participation constraints:
Total Participation Constraints and
Partial Participation Constraints.
The participation of an entity set A in a relationship set R is said to be total if every entity in A
participates in relationship at least once.
On the other hand, the participation of an entity set A in a relationship set R is said to be partial
if only some of the members of an entity set A participate in relationship.
Total participation and partial participation is denoted by single line and double line in ER
diagrams respectively.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 63
For example, consider Customer and Loan entity sets in a banking system, and a relationship set
borrower between them indicates that only some of the customers have Loan but every Loan
should be associated with some customer. Therefore, there is total participation of entity set Loan
in the relationship set borrower but participation of entity set customer is partial in relationship set
borrower. Here, Loan entity set cannot exist without Customer entity set but existence of Customer
entity set is independent of Loan entity set.
Keys
Set of one or more attributes whose values are distinct for each individual entity in the entity set
is called key, and its values can be used to identify each entity uniquely. There are different types
of keys which are:
Super key
Candidate key
Primary key
Composite key
Foreign key
Super Key
A supper key is a set of one or more attributes allow us to identify uniquely in entity set. E.g. social
security number attribute of a entity set customer is distinguish from one customer entity to
another. Similarly, customer name and social-security is a supper key for an entity set customer.
The customer name of entity customer is not super key because several people might have the
same name.
Candidate key
A candidate key of an entity set is a minimal super key. That is a super key which does not have
any proper subset is called candidate key. For example, student-id is candidate key of the entity
set student but set of attributes {roll-number, name, program, semester, section} is not candidate
key of the entity set student because it has proper subset {roll-number, program, semester section}
which is also key. All candidate keys are super keys but vice versa is not true. Any candidate key
other than the one chosen as a primary key is known as alternate key.
Primary key
A primary key is a candidate key that is chosen by the database designer as the principle means of
uniquely identifying entities within an entity set. There may exist several candidate keys, one of
the candidate keys is selected to be the primary key. For example, entity set student have two
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 64
candidate keys: student-id, and {roll-number, program, semester section}, if database designer
chooses student-id for the purpose of uniquely identifying entities within entity set then it becomes
primary key. Primary key must satisfy following two characteristics:
It cannot be null
It cannot be duplicate
Composite Key
If a primary key contains more than one attribute, then it is called composite key. For example, if
database designer chooses student-id as primary key then it not composite key but if database
designer chooses {roll-number, program, semester section} as primary key then it is also called
composite key.
Foreign key
A foreign key (FK) is an attribute or combination of attributes that is used to establish and enforce
relationship between two relations (table). A set of attributes that references primary key of another
table is called foreign key. For example, if a student enrolls in program then program-id (primary
key of relation program) can be used as foreign key in student relation,
Student
S-ID Name Address Program-ID
S-12 Pawan Joshi C002 Foreign Keys
S-14 Yamman Karki C021
S-51 Abin Saud C321
S-11 Binak Singh C112 Program
Program-ID Program-
Relationships Name
C002 BBA
Primary Keys
C021 B. Sc CSIT
C112 BIM
C321 B. ed.
Fig: Primary key and foreign key
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 65
required to process the data. In an object DBMS or an object-relational DBMS, complex data types
are stored as objects that are integrated into and activated by the DBMS.
Complex data types are those that include record type and collections which are used to handle
data in record format or in an array format. A complex data type is usually a composite of other
existing data types. For example, you might create a complex data type whose components include
built-in types, opaque types, distinct types, or other complex types. An important advantage that
complex data types have over user-defined types is that users can access and manipulate the
individual components of a complex data type.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 66
from its superclass. That is, there is type inheritance in subclass. The entity also inherits all the
relationships in which the superclass participates.
Consider the example of entity set college. We can divide the entity set college into two
subgroups: constituent colleges and affiliated colleges. Further affiliated colleges can be divided
into two subgroups: permanently affiliated colleges and temporarily affiliated colleges. Here the
entity set college is superclass of subgroups entity sets constituent colleges and affiliated colleges
and the subgroups are called subclasses of the superclass entity set college. In ER diagram we can
represent superclass/subclass relationship by using ISA triangle as below:
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 68
Aggregation
Sometimes we have to model a relationship between a collection of entities and relationships.
Basic ER model cannot express relationships among relationship sets and relationships between
relationship sets and entity sets. Aggregation is an abstraction through which relationship sets are
treated as high-level entity sets and can participate in relationship sets. It allows relationships
between relationships.
Fig: Aggregation
Constraints on Generalization/Specialization
To model real world more accurately by using ER diagram we need to keep certain constraints on
it. Constraints on which entities can be members of a given lower-level entity set are discussed
below.
Condition defined constraint
Disjoint vs. Overlap Constraints
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 69
A total specialization vs. a partial specialization
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 70
Fig: Disjoint constraint
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 72
ER-to-Relational Mapping Algorithm
Step 1: Mapping of Regular Entity Types
Step 2: Mapping of Weak Entity Types
Step 3: Mapping of Multivalued attributes.
Step 4: Mapping of Composite attributes.
Step 5: Mapping of Binary 1:1 Relation Types
Step 6: Mapping of Binary 1: N Relationship Types.
Step 7: Mapping of Binary M: N Relationship Types.
Step 8: Mapping of N-ary Relationship Types.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 75
Mapping of Binary M: N Relation Types
For a binary Many-to-Many relationship type, separate relation is created for the relationship type.
Primary key for each participating entity set is included as foreign key in the relation and their
combination will form the primary key of the relation. Besides this, simple attributes of the many-
to-many relationship type (or simple components of composite attributes) is included as attributes
of the relation.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 76
Representing Specialization/Generalization
There are two approaches for translating ER diagrams with specialization or generalization into
relations.
A relation is created for higher level entity set using above discussed methods and then
separate relation is created for each of the lower level entity sets. Relation for a subclass
entity set includes all of its attributes and key attributes of superclass entity set.
Another approach is to create relations only for lower level entity sets. Here, relation for a
subclass entity set includes all attributes of superclass entity set and all of its own attributes.
This approach is possible only when subclasses are disjoint and complete
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 77
Representing Aggregation
To represent aggregation, create a table containing
Primary key of aggregated relationship
Primary key of the associated entity set
Any descriptive attributes of the relationship set
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 79
Object Modeling Using UML Class Diagrams
Object modeling methodologies, such as UML (Universal Modeling Language) and OMT (Object
Modeling Technique) are becoming increasingly popular. Although these methodologies were
developed mainly for software design, a major part of software design involves designing the
databases that will be accessed by the software modules. Hence, an important part of these
methodologies—namely, the class diagrams are similar to EER diagrams in many ways.
Unfortunately, the terminology often differs. We briefly review some of the notation, terminology,
and concepts used in UML class diagrams, and compare them with EER terminology and notation.
Figure below shows how the COMPANY ER database schema can be displayed using UML
notation.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 80
In UML class diagrams, a class is displayed as a box that includes three sections: the top section
gives the class name; the middle section includes the attributes for individual objects of the class;
and the last section includes operations that can be applied to these objects. Operations are not
specified in EER diagrams. Consider the EMPLOYEE class. Its attributes are Name, Ssn, Bdate,
Sex, Address, and Salary. The designer can optionally specify the domain of an attribute if desired.
A UML class diagram corresponding to the EER diagram shown in Figure above is represented
by following diagram.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 81
Assignment
1. Construct ER diagram and then translate into a set of Nepal premier database (NPL)
Requirements:
The NPL has many teams,
Each team has a name, a city, a coach, a captain, and a set of players,
Each player belongs to only one team,
Each player has a name, role, a skill level, and a set of injury records
A team captain is also a player
A game is played between two teams (referred to as host_team and guest_team) and has a
date (such as May 11th, 1999).
2. Construct EER diagram and then map into a set of University Database.
Requirements:
Professors have an Citizenship number, a name, an age, a rank, and a research specialty.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 82
Projects have a project number, a sponsor name, a starting date, an ending date, and a
budget.
Graduate students have Citizenship number, a name, an age, and a degree program (e.g.,
M.S. Or Ph.D.).
Each project is managed by one professor (known as the project’s principal investigator).
Each project is worked on by one or more professors (known as the project’s co
investigators).
Professors can manage and/or work on multiple projects.
Each project is worked on by one or more graduate students (known as the project’s
research assistants).
When graduate students work on a project, a professor must supervise their work on the
project. Graduate students can work on multiple projects, in which case they will have a
(potentially different) supervisor for each one.
Departments have a department number, a department name, and a main office.
Departments have a professor (known as the chairman) who runs the department.
Professors’ work in one or more departments, and for each department that they work in, a
time percentage is associated with their job.
Graduate students have one major department in which they are working on their degree.
1. Construct an ER diagram for a car-insurance company whose customers own one or more cars
each. Each car has associated with it zero to any number of recorded accidents.
2. Construct an ER diagram for a hospital with a set of patients and a set of doctors. Associate
with each patient a log of the various tests and examinations conducted.
3. Construct an ER diagram of the library system in your college.
4. Construct an ER diagram to maintain data about students, instructors, semester, and courses in
a college.
5. Construct an ERD to record the marks that students get in different exams of different course
offerings.
------------------------------------------------------------------------------------------------------------------------------------------
By Bhupendra Singh Saud ADBMS 83