Q-A Combined
PART – A
12.List the reasons why null value might be introduced into the database.
NULL is a special value provided by the database in two cases -
i) When field values of some tuples are unknown (e.g. a city name is not yet
assigned), and
ii) When an attribute is inapplicable (e.g. a middle name is not present).
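For example, the following hypothetical person table (the table and column names are assumed for illustration) shows both cases - the middle name is inapplicable and the city is not yet known:
CREATE TABLE person (
person_id INT PRIMARY KEY,
first_name VARCHAR(20) NOT NULL,
middle_name VARCHAR(20), -- NULL when the person has no middle name (inapplicable)
city VARCHAR(20) -- NULL when the city is not yet assigned (unknown)
);
INSERT INTO person (person_id, first_name, middle_name, city)
VALUES (1, 'Ram', NULL, NULL);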
20.What is the difference between logical data independence and physical data
independence?
1. Physical Independence: This is a kind of data independence which allows the
modification of physical schema without requiring any change to the conceptual
schema. For example - if there is any change in memory size of database server
then it will not affect the logical structure of any data object.
2. Logical Independence: This is a kind of data independence which allows the
modification of conceptual schema without requiring any change to the external
schema. For example - Any change in the table structure such as addition or
deletion of some column does not affect user views.
These forms of data independence reduce the time and cost incurred by changes at any
one level, and provide an abstract view of data to the user.
PART B
1. Sketch the typical component modules of DBMS. Indicate and explain
interactions between those modules of the system.
• Consider the top part of Fig. 1.5.1. It shows application interfaces used by naïve
users, application programs created by application programmers, query tools used
by sophisticated users and administration tools used by database administrator.
• The two important components of database architecture are - Query processor and
storage manager.
Query processor:
• The interactive query processor helps the database system to simplify and facilitate
access to data. It consists of DDL interpreter, DML compiler and query evaluation
engine.
i) DDL interpreter: This is basically a translator which interprets the DDL statements
and records the resulting definitions in the data dictionary.
ii) DML compiler: It translates DML statements of the query language into an evaluation
plan. This plan consists of the instructions which the query evaluation engine
understands.
iii) Query evaluation engine: It executes the low-level instructions generated by the
DML compiler.
• When a user issues a query, the parsed query is presented to a query optimizer,
which uses information about how the data is stored to produce an efficient execution
plan for evaluating the query. An execution plan is a blueprint for evaluating a query.
It is evaluated by query evaluation engine.
Storage manager:
• The storage manager is responsible for storing, retrieving, and updating data in the
database. The storage manager components include -
i) Authorization and integrity manager: Validates the users who want to access the
data and tests for integrity constraints.
ii) Transaction manager: Ensures that the database remains in a consistent state despite
system failures, and that concurrent transaction executions proceed without conflict.
iii) File manager: Manages allocation of space on disk storage and representation of
the information on disk.
iv) Buffer manager: Manages the fetching of data from disk storage into main
memory. The buffer manager also decides what data to cache in main memory.
Buffer manager is a crucial part of database system.
The storage manager also implements several data structures:
i) Data files: Used for storing the database itself.
ii) Data dictionary: Used for storing metadata, particularly the schema of the database.
iii) Indices: Indices are used to provide fast access to data items present in the
database.
2. Discuss the main categories of the data model. What is the basic difference
between the relational model, the object model and the XML model.
• Data model provides a way to describe the design of database at physical, logical
and view level.
• There are various data models used in database systems and these are as follows
-
• Relational model consists of a collection of tables which store data and also
represent the relationships among the data.
• The table contains one or more columns and each column has a unique name.
• Each table contains record of particular type, and each record type defines a fixed
number of fields or attributes.
• For example - Following figure shows the relational model by showing the
relationship between the Student and Result tables. For example - Student Ram
lives in city Chennai and his marks are 78. Thus the relationship between these two
tables is maintained by the SeatNo column.
Advantages:
(ii)Conceptual Simplicity: The relational model allows the designer to simply focus
on logical design and not on physical design. Hence relational models are
conceptually simple to understand.
(iii) Query Capability: Using a simple query language (such as SQL) the user can get
information from the database or the designer can manipulate the database structure.
(iv) Easy design, maintenance and usage: The relational models can be designed
logically, hence they are easy to maintain and use.
Disadvantages:
(i) Relational model requires powerful hardware and large data storage devices.
• As the name suggests the entity relationship model uses collection of basic objects
called entities and relationships.
Advantages:
ii) Easy to understand: The design of ER diagram is very logical and hence they are
easy to design and understand.
iv) Integrated: The ER model can be easily integrated with Relational model.
v) Easy conversion: ER model can be converted easily into other type of models.
Disadvantages:
i) Loss of information: While drawing ER model some information can be hidden or
lost.
• The object oriented languages like C++, Java, C# are becoming the dominant
languages in software development.
• The object based data model combines object oriented features with the
relational data model.
Advantages:
i) Enriched modelling: The object based data model has capability of modelling the
real world objects.
ii) Reusability: There are certain features of object oriented design such as
inheritance, polymorphism which help in reusability.
iii) Support for schema evolution: There is a tight coupling between data and
applications, hence there is strong support for schema evolution.
iv) Improved performance: Using the object based data model there can be significant
improvement in performance.
Disadvantages:
i) Lack of universal data model: There is no universally agreed data model for an
object based data model, and most models lack a theoretical foundation.
ii) Lack of experience: In comparison with relational database management the use
of the object based data model is limited. This model is more dependent on the skilled
programmer.
iii) Complex: More functionalities present in object based data model make the
design complex.
• The semi-structured data model permits the specification of data where individual
data items of same type may have different sets of attributes.
Advantages:
ii) It is flexible.
iii) It is portable.
Disadvantages:
4. Explain equi-join, left outer join, right outer join and full outer join operations in
relational algebra with an example.
The SQL Joins clause is used to combine records from two or more tables in a
database. A JOIN is a means for combining fields from two tables by using values
common to each.
Example: Consider two tables for using the joins in SQL. Note that cid is common
column in following tables.
1) Inner Join:
• The most important and frequently used of the joins is the INNER JOIN. It is
also known as an EQUIJOIN.
• The INNER JOIN creates a new result table by combining column values of two
tables (Table1 and Table2) based upon the join-predicate.
• The query compares each row of Table1 with each row of Table2 to find all pairs of
rows which satisfy the join-predicate.
• When the join-predicate is satisfied, column values for each matched pair of rows of
A and B are combined into a result row. It can be represented as:
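The original figure is not reproduced here; as a rough SQL sketch, assuming Table1 and Table2 share the common column cid mentioned above:
SELECT Table1.*, Table2.*
FROM Table1
INNER JOIN Table2
ON Table1.cid = Table2.cid;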
2) Left Join:
• The SQL LEFT JOIN returns all rows from the left table, even if there are no
matches in the right table. This means that if the ON clause matches 0 (zero) records
in the right table; the join will still return a row in the result, but with NULL in each
column from the right table.
• This means that a left join returns all the values from the left table, plus matched
values from the right table or NULL in case of no matching join predicate.
• It can be represented as –
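A rough SQL sketch of the same, again assuming Table1 and Table2 with the common column cid:
SELECT Table1.*, Table2.*
FROM Table1
LEFT JOIN Table2
ON Table1.cid = Table2.cid;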
3) Right Join:
• The SQL RIGHT JOIN returns all rows from the right table, even if there are no
matches in the left table.
• This means that if the ON clause matches 0 (zero) records in the left table; the join
will still return a row in the result, but with NULL in each column from the left table.
• This means that a right join returns all the values from the right table, plus matched
values from the left table or NULL in case of no matching join predicate.
• It can be represented as follows:
• Syntax: The basic syntax of a RIGHT JOIN is as follows -
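A rough sketch, assuming the same Table1 and Table2 with the common column cid:
SELECT Table1.*, Table2.*
FROM Table1
RIGHT JOIN Table2
ON Table1.cid = Table2.cid;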
4) Full Join:
• The SQL FULL JOIN combines the results of both left and right outer joins.
• The joined table will contain all records from both the tables and fill in NULLS for
missing matches on either side.
• It can be represented as
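A rough SQL sketch under the same assumptions (note that some systems, e.g. MySQL, do not support FULL OUTER JOIN directly):
SELECT Table1.*, Table2.*
FROM Table1
FULL OUTER JOIN Table2
ON Table1.cid = Table2.cid;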
1. Creating Tables
The CREATE command is used to define new tables and their structure. For
example, creating a Student table:
CREATE TABLE STUDENT (
STUDENT_ID INT PRIMARY KEY, -- Unique identifier for each student
NAME VARCHAR(50) NOT NULL, -- Student's name
DATE_OF_BIRTH DATE, -- Date of birth
GENDER CHAR(1), -- Gender (M/F)
COURSE_ID INT, -- Course identifier (foreign key)
ADMISSION_DATE DATE, -- Admission date
EMAIL VARCHAR(100) UNIQUE -- Unique email address
);
2. Altering Tables
The ALTER command is used to modify the structure of an existing table.
Adding a Column:
ALTER TABLE STUDENT
ADD PHONE_NUMBER VARCHAR(15);
Modifying a Column:
ALTER TABLE COURSE
MODIFY FEES DECIMAL(12, 2);
Dropping a Column:
ALTER TABLE STUDENT
DROP COLUMN PHONE_NUMBER;
3. Dropping Tables
The DROP command removes a table and its data from the database permanently.
For example:
DROP TABLE STUDENT;
4. Truncating Tables
The TRUNCATE command removes all rows from a table while retaining its
structure. For example:
TRUNCATE TABLE STUDENT;
5. Constraints in DDL
Constraints ensure data integrity in the database. Common constraints include
PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL and CHECK (discussed in detail
under key constraints below).
7. Discuss the entity Integrity and referential integrity constraints. Why are they
important? Explain them with suitable examples.
The integrity constraints are classified based on the concept of primary key and
foreign key. Let us discuss the classification of constraints based on primary key and
foreign key as follows-
1) Entity integrity constraint:
This rule states that "In a relation, the value of a primary key attribute can not be
null".
The NULL represents a value for an attribute that is currently unknown or is not
applicable for this tuple. Nulls are used to deal with incomplete or exceptional
data.
The primary key value helps in uniquely identifying every row in the table. Thus if the
users of the database want to retrieve any row from the table or perform any action
on that table, they must know the value of the key for that row. Hence it is necessary
that the primary key should not have the NULL value.
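For example, using the STUDENT table created earlier (whose primary key is STUDENT_ID), the following insert violates entity integrity and is rejected by the DBMS:
INSERT INTO STUDENT (STUDENT_ID, NAME)
VALUES (NULL, 'Ram'); -- fails: the primary key can not be NULL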
2) Referential integrity constraint:
• The referential integrity rule states that "whenever a foreign key value is used it
must reference a valid, existing primary key in the parent table".
• Example: Consider the situation where you have two tables Employees and
Managers. The Employees table has a foreign key attribute entitled Managed By,
which points to the record for each employee's manager in the Managers table.
i) You cannot add a record to the Employees table unless the Managed By attribute
points to a valid record in the Managers table. Referential integrity prevents the
insertion of incorrect details into a table. Any operation that doesn't satisfy referential
integrity rule fails.
ii) If the primary key for a record in the Managers table changes, all corresponding
records in the Employees table are modified.
iii) If a record in the Managers table is deleted, all corresponding records in the
Employees table are deleted.
ii) Prevents one table from pointing to a nonexistent field in another table.
iii) Prevents the deletion of a record that contains a value referred to by a foreign key
in another table.
iv) Prevents the addition of a record to a table that contains a foreign key unless
there is a primary key in the linked table.
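A minimal SQL sketch of the Employees/Managers example described above (the column names are assumptions; the cascade options implement behaviours ii and iii):
CREATE TABLE Managers (
ManagerID INT PRIMARY KEY,
Name VARCHAR(50)
);
CREATE TABLE Employees (
EmpID INT PRIMARY KEY,
Name VARCHAR(50),
ManagedBy INT,
FOREIGN KEY (ManagedBy) REFERENCES Managers(ManagerID)
ON UPDATE CASCADE -- behaviour ii: a key change propagates to Employees
ON DELETE CASCADE -- behaviour iii: deleting a manager deletes the corresponding rows
);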
8. Describe in detail about various types of key constraints with SQL query
example.
We can specify rules for data in a table.
• When the table is created at that time we can define the constraints.
• The constraint can be at column level i.e. we can impose a constraint on a column,
or at table level i.e. we can impose a constraint on the entire table.
The various types of constraints that can be defined are as follows -
1) Primary key: The primary key constraint is defined to uniquely identify the records
from the table.
The primary key must contain unique values. Hence database designer should
choose primary key very carefully.
For example
Consider that we have to create a person_details table with AdharNo, FirstName,
MiddleName, LastName, Address and City.
Now making AdharNo as a primary key is helpful here as using this field it becomes
easy to identify the records correctly.
The result will be
CREATE TABLE person_details (
AdharNo int,
FirstName VARCHAR(20),
MiddleName VARCHAR(20),
LastName VARCHAR(20),
Address VARCHAR(30),
City VARCHAR(10),
PRIMARY KEY(AdharNo)
);
We can create a composite key as a primary key using CONSTRAINT keyword. For
example
CREATE TABLE person_details (
AdharNo int NOT NULL,
FirstName VARCHAR(20),
MiddleName VARCHAR(20),
LastName VARCHAR(20) NOT NULL,
Address VARCHAR(30),
City VARCHAR(10),
CONSTRAINT PK_person_details PRIMARY KEY(AdharNo, LastName)
);
(2) Foreign Key
• Foreign key is used to link two tables.
• Foreign key for one table is actually a primary key of another table.
• The table containing the foreign key is called the child table and the table containing
the candidate (primary) key is called the parent table.
• Consider
Employee Table
Dept Table:
• Notice that the "EmpID" column in the "Dept" table points to the "EmpID" column in
the "Employee" table.
• The "EmpID" column in the "Employee" table is the PRIMARY KEY in the
"Employee" table.
• The "EmpID" column in the "Dept" table is a FOREIGN KEY in the "Dept" table.
• The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
• The FOREIGN KEY constraint also prevents invalid data from being inserted into
the foreign key column, because it has to be one of the values contained in the table
it points to.
• The purpose of the foreign key constraint is to enforce referential integrity but there
are also performance benefits to be had by including them in your database design.
The table Dept can be created as follows with foreign key constraint.
CREATE TABLE DEPT (
DeptID int,
DeptName VARCHAR(20),
EmpID int,
PRIMARY KEY(DeptID),
FOREIGN KEY (EmpID)
REFERENCES EMPLOYEE(EmpID)
);
(3) Unique
The unique constraint is used to prevent duplicate values in a column. In the EMPLOYEE
table, for example, you might want to prevent two or more employees from having an
identical designation. In that case we must use the unique constraint.
We can set the constraint as unique at the time of creation of table, or if the table is
already created and we want to add the unique constraint then we can use ALTER
command.
For example -
CREATE TABLE EMPLOYEE(
EmpID INT NOT NULL,
Name VARCHAR (20) NOT NULL,
Designation VARCHAR(20) NOT NULL UNIQUE,
Salary DECIMAL (12, 2),
PRIMARY KEY (EmpID)
);
If table is already created then also we can add the unique constraint as follows -
ALTER TABLE EMPLOYEE
MODIFY Designation VARCHAR(20) NOT NULL UNIQUE;
(4) NOT NULL
• By default the column can have NULL values.
• NULL means unknown values.
• We can set the column values as non NULL by using the constraint NOT NULL.
• For example
CREATE TABLE EMPLOYEE(
EmpID INT NOT NULL,
Name VARCHAR (20) NOT NULL,
Designation VARCHAR(20) NOT NULL,
Salary DECIMAL (12, 2) NOT NULL,
PRIMARY KEY (EmpID)
);
(5) CHECK
The CHECK constraint is used to limit the value range that can be placed in a
column.
For example
CREATE TABLE parts (
Part_no int PRIMARY KEY,
Description VARCHAR(40),
Price DECIMAL(10, 2) NOT NULL CHECK(Price > 0)
);
9. Elaborate about embedded SQL and Dynamic SQL with suitable examples.
Embedded SQL
Embedded SQL is a method that combines SQL with a high-level programming
language's features. It enables programmers to put SQL statements right into the
source code files used to set up an application.
Database operations may be carried out effortlessly by developers by adding
SQL statements to the application code. Source code files containing embedded
SQL statements must be pre-processed before compilation, because the high-level
programming language's compiler cannot interpret the SQL statements directly.
The terms EXEC SQL and END_EXEC must be used before and after each SQL
statement in the source code file. In embedded SQL, host variables play a key role.
These variables serve as an intermediary for data transfer between the application
and the database. There are two different kinds of host variables: input host
variables that provide data to the database and output host variables that receive
that data.
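A minimal sketch of embedded SQL in a C host program, assuming the STUDENT table used elsewhere in this unit (the host variable names are illustrative):
EXEC SQL BEGIN DECLARE SECTION;
int std_id; /* input host variable */
char s_name[51]; /* output host variable */
EXEC SQL END DECLARE SECTION;

std_id = 100;
EXEC SQL SELECT NAME INTO :s_name
FROM STUDENT
WHERE STD_ID = :std_id;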
DYNAMIC SQL
The main disadvantage of embedded SQL is that it supports only static SQL. If we
need to build up queries at run time, then we can use dynamic SQL. That means if the
query changes according to user input, then it is always better to use dynamic SQL.
PREPARE
Since dynamic SQL builds a query at run time, as a first step we need to capture all
the inputs from the user. It will be stored in a string variable.
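A minimal sketch, assuming a C host string variable sql_text holding the statement built at run time (the names sql_text and sql_query are illustrative; sql_query matches the EXECUTE example below):
char sql_text[100] = "DELETE FROM STUDENT WHERE STD_ID = 100";
EXEC SQL PREPARE sql_query FROM :sql_text;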
EXECUTE
This statement is used to compile and execute the SQL statements prepared in DB.
EXEC SQL EXECUTE sql_query;
EXECUTE IMMEDIATE
This statement is used to prepare SQL statement as well as execute the SQL
statements in DB. It performs the task of PREPARE and EXECUTE in a single line.
EXEC SQL EXECUTE IMMEDIATE :sql_stmt;
EXEC SQL EXECUTE IMMEDIATE 'GRANT SELECT ON STUDENT TO Faculty';
EXEC SQL EXECUTE IMMEDIATE 'DELETE FROM STUDENT WHERE STD_ID = 100';
EXEC SQL EXECUTE IMMEDIATE 'UPDATE STUDENT SET ADDRESS = ''Troy''
WHERE STD_ID = 100';
10.i) Outline select, project, cartesian product and join operations in relational
algebra with an example.
(ii) Solve the queries for the following database using relational algebra branch
(branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city)
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
(a) Find all loans over $1200
(b) Find the loan number for each loan of an amount greater than $1200
(c) Find the names of all customers who have a loan, an account, or both, from
the bank
(d) Find the names of all customers who have a loan and an account at the bank.
(e) Find the names of all customers who have a loan at the Perryridge branch.
(f) Find the names of all customers who have a loan at the Perryridge branch but
do not have an account at any branch of the bank.
i) Relational Operations
Various types of relational operations are as follows-
(1) Selection:
• This operation is used to fetch the rows or tuples from the table(relation).
• Syntax: The syntax is
σpredicate (relation)
where σ represents the select operation. The predicate denotes some logic using
which the data from the relation (table) is selected.
• For example - Consider the relation student as follows-
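(The original figure is not reproduced here.) As an illustration, assuming the Student relation has a city attribute, the expression
σcity = "Chennai" (Student)
returns only those tuples of Student whose city value is Chennai.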
(2) Projection:
• Project operation is used to project only a certain set of attributes of a relation. That
means if you want to see only the names all of the students in the Student table, then
you can use Project operation.
• Thus to display particular column from the relation, the projection operator is used.
• It will only project or show the columns or attributes asked for, and will also
remove duplicate data from the columns.
• Syntax:
ΠC1, C2, ... (r)
where C1, C2 etc. are attribute names (column names).
• For example - Consider the Student table given in Fig. 1.13.2.
Query: Display the name and age of all the students.
This can be written in relational algebra as
Πsname, age (Student)
Above statement will show us only the Name and Age columns for all the rows of
data in Student table.
• Query: If we want to find out the names of the students who are working in a
company, then
Πname (Student) ∩ Πname (Worker)
(iii) Set-Difference: The result of set difference operation is tuples, which are present
in one relation but are not in the second relation.
Syntax: A - B
For Example: Consider two relations Full_Time_Employee and
Part_Time_Employee, if we want to find out all the employee working for Fulltime,
then the set difference operator is used -
ΠEmpName (Full_Time_Employee) – ΠEmpName (Part_Time_Employee)
(5) Join: The join operation is used to combine information from two or more relations.
Formally, join can be defined as a cross-product followed by selections and
projections; joins arise much more frequently in practice than plain cross-products.
A) Inner Join
There are three types of inner joins used in relational algebra:
i) Conditional join: This is an operation in which information from two tables is
combined using some condition and this condition is specified along with the join
operator.
A ⋈c B = σc (A × B)
Thus ⋈c is defined to be a cross-product followed by a selection. Note that the
condition c can refer to attributes of both A and B. The condition c can be specified
using <, ≤, >, ≥ or = operators.
For example, consider two tables Student and Reserve as follows -
If we want the names of students with sid(Student) = sid(Reserve) and isbn =
005, then we can write it using the Cartesian product as -
σ((Student.sid = Reserve.sid) ∧ (Reserve.isbn = 005)) (Student × Reserve)
Here there are two conditions:
i) (Student.sid = Reserve.sid) and ii) (Reserve.isbn = 005), which are joined
by the ∧ operator.
Now we can use ⋈c instead of the above statement and write it as -
ii) Equijoin: This is a kind of join in which there is an equality condition between two
attributes (columns) of the relations (tables). For example - If there are two tables Book
and Reserve and we want to find the book which is reserved by the student having
isbn 005 and the name of the book is 'DBMS', then:
iii) Natural Join: When there are common columns and we have to equate these
common columns then we use natural join. The symbol for natural join is simply ⋈
without any condition. For example, consider two tables -
Now if we want to list the books that are reserved, then that means we want to match
Books.isbn with Reserve.isbn. Hence it will be simply Books ⋈ Reserve.
ii) Solution:
(a) σamount > 1200 (loan)
(b) Πloan-number (σamount > 1200 (loan))
(c) Πcustomer-name (borrower) ∪ Πcustomer-name (depositor)
(d) Πcustomer-name (borrower) ∩ Πcustomer-name (depositor)
(e) Πcustomer-name (σbranch-name = "Perryridge" (σborrower.loan-number = loan.loan-number (borrower × loan)))
(f) Πcustomer-name (σbranch-name = "Perryridge" (σborrower.loan-number = loan.loan-number (borrower × loan))) – Πcustomer-name (depositor)
UNIT – II – DATABASE DESIGN
PART – A
4) Popular for high level design: The ER model is very popular for high level database design.
ii) Update Anomalies: If one copy of such repeated data is updated then inconsistency is created unless
all other copies are similarly updated.
iii) Insertion Anomalies: Due to insertion of new record repeated information get added to the relation
schema.
iv) Deletion Anomalies: Due to deletion of particular record some other important information associated
with the deleted record get deleted and thus we may lose some other important information from the
schema.
• For example - {A,B} → B is a trivial functional dependency, because B is a subset of
{A,B}. Since {A,B} already includes B, the dependency is satisfied automatically;
determining B from {A,B} adds no new constraint.
A table is said to have multi-valued dependency, if the following conditions are true,
1) For a dependency A → B, if for a single value of A, multiple values of B exist, then the table may have
multi-valued dependency.
2) Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3) And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C
should be independent of each other.
• Boyce-Codd Normal Form is a higher version of the Third Normal Form.
• A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.
When the table is in BCNF then it has neither partial functional dependency nor transitive
dependency.
10. Give an example of a relation schema R and set of dependencies such that R is in BCNF but not
in 4NF.
AB→C
ABC→D
AC→B
Here the only key is AB. Thus each functional dependency has a superkey on the left. But the MVD has a
non-superkey on its left, so it is not in 4NF.
There are two properties associated with decomposition and those are -
1) Loss-less Join or non Loss Decomposition: When all information found in the original database is
preserved after decomposition, we call it as loss less or nonloss decomposition.
2) Dependency Preservation: This is a property in which the constraints on the original table can be
maintained by simply enforcing some constraints on each of the smaller relations.
ii) Att(R1) ∩ Att(R2) ≠ Φ
iii) Common attribute must be a key for at least one relation (R1 or R2)
Entity set: The entity set is a set of entities of the same type. For example - All students studying in class X
of the school. Entity sets need not be disjoint. Each entity in an entity set has the same set of attributes,
and the set of attributes distinguishes it from other entity sets. No other entity set will have exactly the
same set of attributes.
Relationship Sets: Relationship is an association among two or more entities. The relationship set is a
collection of similar relationships. For example - Following Fig. shows the relationship works for the two
entities Employee and Departments.
The association between entity sets is called as participation. That is, the entity sets E1, E2,...,
En participate in relationship set R.
The function that an entity plays in a relationship is called that entity's role.
14. Boyce-Codd normal form is found to be stricter than third normal form Justify the statement.
(i) Every relation which is in BCNF is also in 3NF, but every relation which is in 3NF is not
necessarily in BCNF.
(ii) In BCNF every determinant must be a candidate key, whereas 3NF permits a dependency
X → Y in which X is not a superkey, provided Y is a prime attribute.
15. Mention the significance of "participation role name" in the description of relationship types.
Each entity type that participates in a relationship type plays a particular role in the
relationship. The role name signifies the role that a participating entity of an entity type plays in each relationship
instance. In a PREPARED_BY relationship type, EMPLOYEE plays the role of document creator and VOUCHER
plays the role of document created. Entity TEACHER and entity STUDENT are related by a relationship
TEACHER-teach-STUDENT; "teaches" is the participating role between the entity sets TEACHER and STUDENT.
Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals
with certain type of anomaly that is not handled by 3NF.
A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.
ii) For each functional dependency (X → Y), X should be a super key. In simple words, even if Y is
a prime attribute, X must still be a superkey (which 3NF does not require).
A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively
dependent on the primary key.
Mapping cardinality expresses the number of entities to which another entity can be associated via a
relationship set.
4NF is more desirable than BCNF because it reduces the repetition of information. If we
consider a BCNF schema not in 4NF, we observe that decomposition into 4NF does not lose information
provided that a lossless join decomposition is used, yet redundancy is reduced.
PART – B
1. Describe the notation used in E-R diagram. Explain the E-R model structure with Example.
2. Develop an Entity Relationship model for a library management system. Clearly state the
problem Definition, Description, Business Rule and any assumption you make.
1. Problem Definition
To design a database system for a library that manages books, borrowers, and their
transactions. The system should facilitate the management of book information, borrower records, issue
and return of books, and overdue penalties. It must ensure data consistency and provide efficient
access to information.
2. Description
The library management system consists of the following key components:
● Books: Information about books available in the library.
● Borrowers: Information about library members who borrow books.
● Transactions: Records of book issues and returns.
● Staff: Library employees managing operations.
3. Business Rules
3. A borrower can borrow multiple books but not more than a specified limit (e.g., 3 books at a
time).
4. Each transaction records the issue date, due date, and return date.
5. If a book is returned after the due date, a fine is imposed based on the number of overdue
days.
8. Library staff are responsible for issuing and returning books.
4. Assumptions
5. ER Model Components
1. Book
o Title
o Author
o Publisher
o ISBN
o CopiesAvailable
2. Borrower
o Name
o ContactInfo
o MembershipDate
3. Staff
o StaffID (Primary Key)
o Name
o Role
o ContactInfo
4. Transaction
o IssueDate
o DueDate
o ReturnDate
o Fine
Relationships:
6. ER Diagram
Example:
This model captures the functional requirements of the library management system and ensures
data integrity, efficient querying, and scalability.
3. Develop an Entity Relationship model for a restaurant system in which waiters send orders to the food
preparation staff (chefs) and finalize the customer's bill.
The Food preparation staffs (Chefs), with their touch-display interfaces to the system, are able
to view orders sent to the kitchen by waiters. During preparation, they are able to let the waiter
know the status of each item, and can send notifications when items are completed. The system
should have full accountability and logging facilities, and should support Supervisor actions to
account for exceptional circumstances, such as a meal being refunded or walked out on.
1. Problem Definition
To design a database system that manages food orders, preparation status, customer bills,
and accountability for exceptional cases in a restaurant. The system must:
3. Maintain full accountability for all actions, including refunds or walk-outs.
2. Description
● Waiters: Staff members responsible for taking orders and serving food.
3. Business Rules
2. Each order can consist of multiple items, and each item is prepared by chefs.
3. Chefs update the status of each item as "Preparing," "Ready," or "Cancelled."
5. Bills are generated based on the items in the order and their prices.
6. Supervisors can authorize refunds or cancel bills in exceptional situations (e.g., customer
dissatisfaction or walk-outs).
7. The system must log every action for accountability, including order updates, notifications,
and supervisor interventions.
4. Assumptions
1. Each chef, waiter, and supervisor has a unique Staff ID.
4. Exceptional circumstances are rare but must be logged and authorized.
5. ER Model Components:
1. Chef
o Name
o Specialization
2. Waiter
o Name
3. Supervisor
o Name
4. Order
o TableNumber
o Timestamp
5. OrderItem
o Timestamp
6. Item
o ItemID (Primary Key)
o Name
o Price
7. Bill
o TotalAmount
8. ActionLog
o Timestamp
o Description
Relationships
6. ER Diagram
1. Entities: Chef, Waiter, Supervisor, Order, OrderItem, Item, Bill, ActionLog.
2. Waiters are notified when items are ready (status updates logged in ActionLog).
3. Bills are accurately generated and linked to orders (via Bill entity).
4. Supervisors can handle refunds or cancellations and ensure all actions are logged for
accountability.
This ER model provides a scalable and efficient design for managing the restaurant's operations
and ensuring data consistency.
5. Explain the Database Design Process in ER model. Draw the ER diagram for banking systems.
(Home loan Application).
Entity Relational model is a model for identifying entities to be represented in the database
and representation of how those entities are related.
Design Phases
Following are the six steps of database design process. The ER model is most relevant to
first three steps.
Step 1: Requirement analysis:
• In this step, it is necessary to understand what data need to be stored in the database,
what applications must be built, what are all those operations that are frequently used by the system.
• The requirement analysis is an informal process and it requires proper communication with
user groups.
• There are several methods for organizing and presenting information gathered in this step.
Step 2: Conceptual database design:
• This is the step in which the E-R Model i.e. Entity Relationship model is built.
• The goal of this design is to create a simple description of data that matches with the
requirements of users.
Step 3: Logical database design:
• In this step, the conceptual schema (E-R design) is converted into a relational database schema.
Step 4: Schema refinement:
• In this step, the relational database schema is analyzed to identify potential problems
and to refine it.
• The schema refinement can be done with the help of normalizing and restructuring the
relations.
Step 5: Physical database design:
• The tasks that are performed in this step are building indexes on tables, clustering tables,
and redesigning some parts of the schema obtained from earlier design steps.
Step 6: Application and security design:
• Using design methodologies like UML (Unified Modeling Language) the design of the
database can be accomplished.
• The role of each entity in every process must be reflected in the application task.
• For each role, there must be a provision for accessing some part of the database and
prohibition of access to some other part of the database.
• Thus some access rules must be enforced on the application(which is accessing the database)
to protect the security features.
6. Consider the following tables:
Employee (Emp_no, Name, Emp_city)
Company (Emp_no, Company_name, Salary)
i) Write a SQL query to display Employee name and company name.
ii) Write a SQL query to display employee name, employee city ,company name and salary of all
the employees whose salary >10000
iii) Write a query to display all the employees working in "XYZ company"
iv) Write a query to display employees working in same city.
i) Query to display Employee name and Company name
We need to join the Employee and Company tables using the common column Emp_no.
SELECT Employee.Name, Company.Company_name
FROM Employee
INNER JOIN Company
ON Employee.Emp_no = Company.Emp_no;
ii) Query to display employee name, employee city, company name, and salary of employees
whose salary > 10000
Add a WHERE clause to filter employees with Salary > 10000.
SELECT Employee.Name, Employee.Emp_city, Company.Company_name, Company.Salary
FROM Employee
INNER JOIN Company
ON Employee.Emp_no = Company.Emp_no
WHERE Company.Salary > 10000;
iii) Query to display all employees working in "XYZ company"
Add a WHERE condition to filter rows based on Company_name = 'XYZ company'.
SELECT Employee.Name, Employee.Emp_city, Company.Company_name
FROM Employee
INNER JOIN Company
ON Employee.Emp_no = Company.Emp_no
WHERE Company.Company_name = 'XYZ company';
iv) Query to display employees working in the same city
For this, we need to join the Employee table with itself (self-join) to compare employees from
the same city.
SELECT E1.Name AS Employee1, E2.Name AS Employee2, E1.Emp_city
FROM Employee E1
INNER JOIN Employee E2
ON E1.Emp_city = E2.Emp_city AND E1.Emp_no <> E2.Emp_no;
7. Consider the following relations:
Employee (empno, name, office, age)
Books (isbn, title, authors, publisher)
Loan (empno, isbn, date)
a) Find the names of employees who have borrowed a book Published by McGraw-Hill.
b) Find the names of employees who have borrowed all books Published by McGraw-Hill.
c) Find the names of employees who have borrowed more than five different books
published by McGraw-Hill.
d) For each publisher, find the names of employees who have borrowed more than five
books of that publisher.
a) SQL:
select distinct employee.name
from employee, books, loan
where employee.empno = loan.empno
and books.isbn = loan.isbn and books.publisher = 'McGraw-Hill';
b) SQL:
select name from employee E
where not exists
((select isbn from books where publisher = 'McGraw-Hill')
except
(select books.isbn from loan, books
where loan.isbn = books.isbn and loan.empno = E.empno));
c) SQL:
select name from employee, loan where employee.empno = loan.empno and isbn in (select
distinct isbn from books where publisher = 'McGraw-Hill') group by employee.empno, name
having count(distinct isbn) > 5;
d) SQL:
select name from employee, loan, books where employee.empno = loan.empno and
books.isbn = loan.isbn group by employee.empno, name, books.publisher
having count(distinct loan.isbn) > 5;
8. Explain the following join algorithms with examples: i) Nested loop join ii) Block nested
loop join iii) Merge join iv) Hash join
i) Nested Loop Join
Definition:
The Nested Loop Join is a basic join algorithm where each tuple of one relation (outer relation) is
compared with every tuple of the other relation (inner relation) to find matching pairs based on the
join condition.
Steps:
1. For each tuple in the outer relation, scan all tuples in the inner relation.
Example:
Consider two relations:
● R1: {A: 1, 2}
● R2:{B:1,3}
Join condition: R1.A = R2.B.
Time Complexity:
● Without indexing: O(m × n), where m and n are the sizes of the two relations.
Advantages:
Disadvantages:
ii) Block Nested Loop Join
Definition:
A variation of the nested loop join that reduces the number of disk I/O operations by reading the
tuples of the outer relation in blocks (pages) rather than one tuple at a time.
Steps:
2. For each block of the outer relation, compare its tuples with all tuples of the inner relation.
Optimization:
● Load a block of tuples from R1 and compare with all tuples in R2.
● Repeat for the next block until all blocks of R1 are processed.
Advantages:
● More efficient than the basic nested loop join, especially when the outer relation is much
larger.
Disadvantages:
● Still slower than other optimized joins like hash join or merge join for large datasets.
iii) Merge Join
Definition:
The Merge Join (or Sort-Merge Join) works by first sorting both relations on the join key and then
merging the two sorted lists to find matching tuples.
Steps:
2. Use two pointers, one for each relation, and scan through both relations simultaneously.
4. If the keys do not match, move the pointer for the relation with the smaller value.
Example:
R1: {A: 1, 2, 4} (already sorted)
R2: {B: 1, 3, 4} (already sorted)
Time Complexity:
● Merging: O(m + n)
Advantages:
● Efficient for sorted relations.
Disadvantages:
iv) Hash Join
Definition:
The Hash Join is an efficient join algorithm that uses hashing to partition and match tuples.
It works best for equality joins.
Steps:
1. Build Phase: Create a hash table for the smaller relation (usually the inner relation) based
on the join key.
2. Probe Phase: For each tuple in the larger relation (outer relation), use the hash function to
check for matches in the hash table.
Example:
R1: {A: 1, 2}
R2: {B: 1, 3}
Join condition: R1.A = R2.B.
● Build phase: Create a hash table for R2 → Hash(B=1) → Slot 1, Hash(B=3) → Slot 3.
● Probe phase: For R1, hash A=1 → Match in Slot 1. Hash A=2 → No match.
Result: {(1, 1)}
Advantages:
● Very efficient for large relations when the smaller relation fits in memory.
● No sorting required.
Disadvantages:
● Not suitable for non-equality joins or large inner relations that do not fit in memory.
9. A software contract and consultancy firm maintains details of all the various projects in which its
employees are currently involved. These details comprise:
• Employee Number
• Employee Name
• Date of Birth
• Department Code
• Project Description
• Project Supervisor
• Project Code, Project Description and Project Supervisor are repeating fields.
Final Structure:
1. Employee Table contains employee-specific details (removing redundancy).
2. Department Table ensures department details are stored once and referenced via
Department Code.
3. Project Table stores project-specific details (removing redundancy of repeating fields).
4. Employee-Project Table establishes a many-to-many relationship between employees and
projects.
This normalized structure eliminates redundancy, ensures data integrity, and adheres to the rules of 3NF.
10. A car rental company maintains a database for all vehicles in its current fleet. For all vehicles, it
includes the vehicle identification number, license number, manufacturer, model, date of purchase,
and colour. Special data are included for certain types of vehicles.
UNIT – III – TRANSACTIONS
PART – A
• When we perform read or write operations on the database, those changes can be
undone by the rollback operation. To make these changes permanent, we should make use of commit.
● In a database, each transaction should maintain ACID property to meet the consistency and
integrity of the database. These are
(1) Atomicity
(2) Consistency
(3) Isolation
(4) Durability
Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before
starting any operation of T2. Schedule S1 can be transformed into a serial schedule by swapping
non-conflicting operations of S1. Hence both of the above schedules are conflict equivalent.
10.What is the difference between shared lock and exclusive lock?
A shared lock (Lock-S) is used only for reading a data item; any number of transactions
can hold a shared lock on the same item simultaneously. An exclusive lock (Lock-X) is
used for both reading and writing; it can be held by only one transaction at a time, and
no other lock can be granted on that item until it is released.
11.What benefit does strict two-phase locking provide? Give its disadvantages.
Benefits:
1. This ensures that any data written by an uncommitted transaction is locked in exclusive
mode until the transaction commits, preventing other transactions from reading that data.
2. This protocol solves dirty read problem.
Disadvantage:
1. Concurrency is reduced.
12.What is rigorous two- phase locking protocol?
This is a stricter two-phase locking protocol. Here all locks are to be held until the transaction
commits.
13.Why is it necessary to have control of concurrent execution of transactions? How is it made
possible?
Concurrent execution of transactions is necessary to improve system performance and
resource utilization, but it can lead to issues like data inconsistency, deadlock, and race conditions.
Without proper control, problems such as the lost update, dirty read, and unrepeatable read can
occur, compromising data integrity.
Control of concurrent execution is achieved through concurrency control techniques like:
1. Lock-based protocols: Ensuring exclusive or shared access to data items.
2. Timestamp ordering: Ordering transactions based on their start time.
3. Multiversion concurrency control (MVCC): Maintaining multiple versions of data.
4. Optimistic concurrency control: Detecting conflicts at the transaction commit phase.
These techniques ensure that the database maintains ACID properties (Atomicity,
Consistency, Isolation, Durability).
14.Define deadlock.
Deadlock is a situation in which two or more transactions each hold a lock and wait
for locks currently held by one of the other transactions.
15.List four conditions for deadlock.
1. Mutual exclusion condition
2. Hold and wait condition
3. No pre-emption condition
4. Circular wait condition
16.List the responsibilities a DBMS has whenever a transaction is submitted to the system for
execution.
The system is responsible for making sure that -
(1) Either all the operations in the transaction are completed successfully and their effect is
recorded permanently in the database, or
(2) The transaction has no effect whatsoever on the database or on any
other transaction.
17.Give any two violations that may occur if a transaction executes at a lower isolation level than
serializable.
If a transaction executes at a lower isolation level than serializable, the following two
violations may occur:
1. Dirty Read:
A transaction may read uncommitted changes made by another transaction. If the other
transaction rolls back, the data read becomes invalid, leading to inconsistencies.
Example: Transaction T1 updates a record but hasn't committed, and Transaction T2 reads the
uncommitted value.
2. Non-Repeatable Read:
A transaction reads the same data twice but gets different results because another
transaction modifies the data in between.
Example: Transaction T1 reads a record, then Transaction T2 updates or deletes the record,
causing T1 to see inconsistent data when it reads the same record again.
18.What is meant by log-based recovery?
Log-based recovery in DBMS refers to a technique where all the changes made to the
database by transactions are recorded in a log file. This log contains details such as the transaction
ID, the operations performed, and the data affected, ensuring that the database can be recovered to
a consistent state in case of failure.
● Undo: If a transaction fails, the log is used to roll back the changes made by the transaction.
● Redo: If a committed transaction's changes are lost due to a crash, the log is used to reapply
those changes.
This ensures the database maintains atomicity and durability (ACID properties).
19.What is a transaction?
A transaction is a sequence of one or more operations performed on a database that
functions as a single logical unit of work. A transaction ensures the ACID properties:
● Atomicity: All operations in the transaction are executed fully or not at all.
● Consistency: The database remains in a valid state before and after the transaction.
● Isolation: Transactions do not interfere with each other.
● Durability: Once a transaction is committed, changes are permanent.
Example: Transferring money between two bank accounts involves debiting one account and
crediting another as a single transaction.
20. What do you mean by phantom problem?
The phantom problem in DBMS occurs when a transaction retrieves a set of rows based on
a condition, but another transaction inserts or deletes rows that match the same condition. This
leads to inconsistent results if the first transaction re-executes the query.
Example:
1. Transaction T1 reads rows from a table where salary > 5000.
2. Transaction T2 inserts a new row with salary = 6000 and commits.
3. When T1 re-executes the query, it sees the new row, causing inconsistent results.
The phantom problem occurs under lower isolation levels and can be prevented using the
serializable isolation level by applying range locks.
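A minimal SQL sketch of this prevention, assuming an employee table with a salary column:
-- Transaction T1
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM employee WHERE salary > 5000;
-- Under SERIALIZABLE, a concurrent insert of a row with salary = 6000 by T2
-- is either blocked by a range lock or forces one transaction to abort, so
-- re-executing the query can not see a phantom row.
COMMIT;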
1. Explain in detail about Lock based protocols and Timestamp based protocols.
a) Lock Based Protocol
• Concept of Protocol: The lock-based protocol is a mechanism in which a transaction
must acquire an appropriate lock on a data item before operating on it, giving it exclusive
or shared use of that item.
• Types of Locks: There are two types of locks used -
i) Shared Lock: The shared lock is used for reading data items only. It is denoted by
Lock-S. This is also called as read lock.
ii) Exclusive Lock: The exclusive lock is used for both read and write operations. It is
denoted as Lock-X. This is also called as write lock.
• The compatibility matrix is used while working on set of locks. The concurrency control
manager checks the compatibility matrix before granting the lock. If the two modes of
transactions are compatible to each other then only the lock will be granted.
• A set of locks may consist of shared or exclusive locks. Following matrix represents
the compatibility between modes of locks.
Here T stands for True and F stands for False. If the concurrency control manager gets the
compatibility mode as True then it grants the lock, otherwise the lock is denied.
• For example: If the transaction T1 is holding a shared lock on data item A, then the
concurrency control manager can grant the shared lock to transaction T2, as the compatibility is True.
But it cannot grant the exclusive lock, as the compatibility is False. In simple words, if
transaction T1 is reading a data item A then the same data item A can be read by another
transaction T2 but cannot be written by another transaction.
Similarly, if an exclusive lock (i.e. lock for read and write operations) is held on the data
item by some transaction then no other transaction can acquire a shared or exclusive lock, as
the compatibility function denotes F. That means if some transaction is writing a data item
A then another transaction cannot read or write that data item A.
Hence the rule of thumb is
i) Any number of transactions can hold a shared lock on an item.
ii) But an exclusive lock can be held by only one transaction.
• Example of a schedule denoting shared and exclusive locks: Consider the following schedule
in which initially A = 100. We deduct 50 from A in transaction T1 and read the data item A in
transaction T2. The scenario can be represented with the help of locks and the concurrency
control manager as follows:
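The original schedule figure is not shown; a rough SQL analogue of the same scenario, assuming a hypothetical account table and locking-read syntax as in e.g. PostgreSQL:
-- Transaction T1: deduct 50 from A under an exclusive lock
BEGIN;
SELECT balance FROM account WHERE name = 'A' FOR UPDATE; -- Lock-X(A)
UPDATE account SET balance = balance - 50 WHERE name = 'A';
COMMIT; -- Lock-X(A) released
-- Transaction T2: read A under a shared lock
BEGIN;
SELECT balance FROM account WHERE name = 'A' FOR SHARE; -- Lock-S(A); waits while T1 holds Lock-X(A)
COMMIT;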
b) Timestamp Based Protocol
• In a timestamp-based protocol, each transaction is assigned a unique timestamp when it
enters the system, and conflicting operations are executed in timestamp order.
• For example, suppose we have two transactions T1 and T2 with timestamps 10 sec and 20
sec respectively; T1 is the older transaction, so its conflicting operations are given priority.
• There are two types of serializability: conflict serializability and view serializability.
Conflict Serializability
• Definition: Suppose T1 and T2 are two transactions and I1 and I2 are instructions
in T1 and T2 respectively. Then the two instructions conflict if both access the same
data item d and at least one of them is a write operation; a schedule is conflict
serializable if it is conflict equivalent to some serial schedule.
• What is conflict? In the definition three conditions are specified for a conflict in
conflict serializability -
1) There should be different transactions
2) The operations must be performed on same data items
3) One of the operations must be the Write(W) operation
• We can test a given schedule for conflict serializability by constructing a precedence
graph for the schedule, and by searching for absence of cycles in the graph.
• Precedence graph is a directed graph G = (V,E) where V is the set of vertices
and E is the set of edges. The set of vertices consists of all the transactions participating
in the schedule. The set of edges consists of all edges Ti→Tj for which one of three
conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
• A serializability order of the transactions can be obtained by finding a linear order
consistent with the partial order of the precedence graph. This process is called
topological sorting.
Testing for serializability
Following method is used for testing the serializability: To test the conflict serializability
we can draw a graph G = (V,E) where V = vertices which represent the
transactions and E = edges for conflicting pairs.
Step 1: Create a node for each transaction.
Step 2: Find the conflicting pairs (RW, WR, WW) on the same variable (or data item)
by different transactions.
Step 3: Draw edges for the given schedule. Consider the following cases:
1. Ti executes write(Q) before Tj executes read(Q), then draw an edge from Ti to Tj.
2. Ti executes read(Q) before Tj executes write(Q), then draw an edge from Ti to Tj.
3. Ti executes write(Q) before Tj executes write(Q), then draw an edge from Ti to Tj.
Step 4: Now, if the precedence graph is cyclic then it is a non-conflict-serializable schedule,
and if the precedence graph is acyclic then it is a conflict serializable schedule.
Example 3.5.1 Consider the following two transactions and schedule (time goes from
top to bottom). Is this schedule conflict-serializable? Explain why or why not.
Solution :
Step 1: To check whether the schedule is conflict serializable or not we will check from
top to bottom. Thus we will start reading from top to bottom as
T1:R(A) -> T1:W(A) -> T2:R(A) -> T2:R(B) -> T1:R(B) -> T1:W(B)
Step 2: We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them-
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation
From above given example in the top to bottom scanning we find the conflict as
T1:W(A)->T2:R(A).
i) Here note that there are two different transactions T1 and T2,
ii) Both work on same data item i.e. A and
iii) One of the operation is write operation.
Step 3: We will build a precedence graph by drawing one node from each transaction.
In above given scenario as there are two transactions, there will be two nodes namely
T1 and T2
Step 4: Draw the edge between conflicting transactions. For example in above given
scenario, the conflict occurs while moving from T1:W(A) to T2:R(A). Hence edge must
be from T1 to T2.
Step 5: Repeat the step 4 while reading from top to bottom. Finally the precedence
graph will be as follows
Step 6: Check if any cycle exists in the graph. A cycle is a path using which we can start
from one node and reach the same node. If a cycle is found then the schedule is not
conflict serializable. In step 5 we got a graph with a cycle; that means the given
schedule is not conflict serializable.
Example 3.5.2 Check whether following schedule is conflict serializable or not. If it is
not conflict serializable then find the serializability order.
Solution:
Step 1: We will read from top to bottom, and build a precedence graph for conflicting
entries. We will build a precedence graph by drawing one node for each transaction.
In the above given scenario, as there are three transactions, there will be three nodes,
namely T1, T2 and T3.
Step 4: As there is no cycle in the precedence graph, the given sequence is conflict
serializable. Hence, we can convert this non serial schedule to serial schedule. For
that purpose, we will follow these steps to find the serializable order.
Step 5: A serializability order of the transactions can be obtained by finding a linear
order consistent with the partial order of the precedence graph. This process is called
topological sorting.
Step 6: Find the vertex which has no incoming edge which is T1. If we delete T1 node
then T3 is a node that has no incoming edge. If we delete T3, then T2 is a node that
has no incoming edge.
Thus, the nodes can be deleted in the order T1, T3 and T2. Hence the order will be
T1-T3-T2.
Example 3.5.3 Check whether the below schedule is conflict serializable or not.
{b2, r2(X), b1, r1(X), w1(X), r1(Y), w1(Y), w2(X), e1, c1, e2, c2}
Solution: b2 and b1 represent begin of transaction 2 and begin of transaction 1. Similarly,
e1 and e2 represent end of transaction 1 and end of transaction 2.
We will rewrite the schedule as follows-
Step 1: We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them -
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation.
The conflicting entries are as follows –
As there are two transactions only two nodes are present in the graph.
Step 3: We get a graph with cycle, that means given schedule is not conflict
serializable.
Example 3.5.4 Consider the three transactions T1, T2, and T3 and schedules S1 and
S2 given below. Determine whether each schedule is serializable or not? If a schedule
is serializable write down the equivalent serial schedule(S).
T1: R1(x); R1(z); W1(x);
T2: R2(x); R2(y); W2(z); W2(y);
T3: R3(x); R3(y); W3(y);
S1: R1(x); R2(z); R1(z); R3(x); R3(y); W1(x); W3(y); R2(y); W2(z); W2(y);
S2: R1(x); R2(z); R3(x); R1(z); R2(y); R3(y); W1(x); W2(z); W3(y); W2(y);
Solution:
Step 1: We will represent the schedule S1 as follows
Step (a): We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them -
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation
The conflicting entries are as follows -
Step 2: We will represent the schedule S2 similarly.
Step (a): We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them -
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation
The conflicting entries are as follows -
Solution:
Step 1: We will read from top to bottom, and build a precedence graph for conflicting
entries.
The conflicting entries are as follows-
Step 3: There is no cycle in the precedence graph. That means this schedule is
conflict serializable. Hence, we can convert this non serial schedule to serial schedule.
For that purpose, we will follow the following steps to find the serializable order.
1) Find the vertex which has no incoming edge which is T1.
2) Then find the vertex having no outgoing edge which is T2. In between them there is
no other transaction.
3) Hence the order will be T1-T2.
View Serializability
• If a given schedule is found to be view equivalent to some serial schedule, then it is
called a view serializable schedule.
• View Equivalent Schedule: Consider two schedules S1 and S2 consisting of
transactions T1 and T2 respectively, then schedules S1 and S2 are said to be view
equivalent schedule if it satisfies following three conditions:
• If transaction T1 reads a data item A from the database initially in schedule S1, then in
schedule S2 also, T1 must perform the initial read of the data item A from the database.
This is the same for all the data items. In other words, the initial reads must be the same for
all data items.
• If data item A has been updated at last by transaction T1 in schedule S1, then in
schedule S2 also, the data item A must be updated at last by transaction T1.
• If transaction T1 reads a data item that has been updated by the transaction T1 in
schedule S1 then in schedule S2 also, transaction T1 must read the same data item
that has been updated by transaction T1. In other words the Write-Read sequence
must be same.
When T1 writes the value of C, only then can T2 read it; and when T2 writes the value of C, only then can transaction T3 read it. But if transaction T1 fails, then transactions T2 and T3 must automatically be rolled back as well. This is the cascading rollback problem.
Simple two-phase locking does not solve the cascading rollback problem. To solve it, two stricter types of two-phase locking mechanisms can be used.
Types of Two-Phase Locking
(1) Strict two-phase locking: The strict 2PL protocol is the basic two-phase protocol with the additional requirement that all exclusive-mode locks be held until the transaction commits. In other words, all exclusive locks are released only after the transaction has committed. That means if T1 holds an exclusive lock, T1 will release it only after its commit operation, and only then is another transaction allowed to read or write the item. For example, consider two transactions
If we apply the locks then
Thus only after the commit operation in T1 can we release the exclusive lock. This ensures strict serializability.
Thus, compared to the basic two-phase locking protocol, the advantage of the strict 2PL protocol is that it ensures strict serializability.
(2) Rigorous two-phase locking: This is a stricter two-phase locking protocol. Here all locks (shared as well as exclusive) are held until the transaction commits. The transactions can be serialized in the order in which they commit.
Example - Consider transactions
This lock-unlock instruction sequence satisfies the requirements of strict two-phase locking for the given transactions.
The execution of these transactions can result in deadlock. Consider the following partial execution scenario which leads to deadlock.
Lock Conversion
Lock conversion is a mechanism within two-phase locking which allows the conversion of a shared lock into an exclusive lock (upgrade), or of an exclusive lock into a shared lock (downgrade).
Method of Conversion :
First Phase:
• can acquire a lock-S on item
• can acquire a lock-X on item
• can convert a lock-S to a lock-X (upgrade)
Second Phase:
• can release a lock-S
• can release a lock-X
• can convert a lock-X to a lock-S (downgrade)
This protocol assures serializability, but it still relies on the programmer to insert the various locking instructions.
For example - Consider following two transactions -
Here, if we start applying locks, then we must apply the exclusive lock on data item A, because we have to both read and write data item A. Another transaction T2 cannot get a shared lock on A until transaction T1 has performed its write operation on A. Since transaction T1 needs the exclusive lock only at the end, when it performs the write operation on A, it is better if T1 could initially lock A in shared mode and later change the lock to exclusive mode when it performs the write operation. In such a situation, the lock conversion mechanism becomes useful.
When we convert the shared mode lock to exclusive mode then it is called upgrading
and when we convert exclusive mode lock to shared mode then it is called
downgrading.
Also note that upgrading can take place only in the growing phase and downgrading only in the shrinking phase. Thus we can refine the above transactions using the lock conversion mechanism, as in the sketch below.
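Below is a toy Python sketch (not a real DBMS lock manager; the class and method names are made up for illustration) of how an upgrade request might be handled: T1 first takes a shared lock on A, upgrades it to exclusive only when it is ready to write, and a second transaction is refused while the exclusive lock is held.

class LockManager:
    def __init__(self):
        self.locks = {}                          # item -> (mode, set of holders)

    def lock_s(self, txn, item):
        mode, holders = self.locks.get(item, ("S", set()))
        if mode == "X":
            return txn in holders                # only the exclusive holder may proceed
        holders.add(txn)
        self.locks[item] = ("S", holders)
        return True

    def upgrade(self, txn, item):                # S -> X, allowed in growing phase
        mode, holders = self.locks[item]
        if holders == {txn}:                     # no other sharers: safe to upgrade
            self.locks[item] = ("X", holders)
            return True
        return False                             # must wait for the other sharers

    def downgrade(self, txn, item):              # X -> S, allowed in shrinking phase
        mode, holders = self.locks[item]
        if mode == "X" and holders == {txn}:
            self.locks[item] = ("S", holders)
            return True
        return False

lm = LockManager()
print(lm.lock_s("T1", "A"))    # True: T1 reads A under a shared lock
print(lm.upgrade("T1", "A"))   # True: upgraded to exclusive just before the write
print(lm.lock_s("T2", "A"))    # False: T2 must wait while T1 holds lock-X(A)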
4. Write in detail about the immediate update and deferred update recovery
techniques.
Deferred Database Modification
• In this technique, the database is not updated immediately.
• Only the log file is updated on each transaction operation.
• Only when the transaction reaches its commit point is the database physically updated from the log file.
• In this technique, if a transaction fails before reaching its commit point, it will not have changed the database in any way, so there is no need for an UNDO operation. A REDO operation is required to apply the operations recorded in the log file to the physical database. Hence the deferred database modification technique is also called the NO-UNDO/REDO algorithm.
• For example: Consider two transactions T1 and T2 as follows:
If T1 and T2 are executed serially with initial values of A = 100, B = 200 and C = 300, we show the state of the log and the database if a crash occurs
a) Just after write(B, b)
b) Just after write(C, c)
c) Just after <T2, commit>
The result of the above 3 scenarios is as follows:
Initially the log and database will be
Immediate Database Modification
• In this technique, the database may be updated immediately after a write operation, without waiting for the commit point. Both the old and the new values are recorded in the log, so uncommitted transactions must be UNDOne after a crash and committed ones REDOne; hence this technique is also called the UNDO/REDO algorithm.
Here T1 and T2 are executed serially. Initially A = 100, B = 200 and C = 300.
If the crash occurs
i) Just after write(B, b) ii) Just after write(C, c) iii) Just after <T2, commit>
then, using the immediate database modification approach, the result of the above three scenarios can be elaborated as follows:
The contents of the log and database are as follows:
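A minimal Python sketch of the deferred-modification idea is given below (the new values 90 and 210 are assumed for illustration, since the figure with the actual log contents is not reproduced here): every write goes only to the log, and the database is touched only at commit, so a crash before commit needs no UNDO.

database = {"A": 100, "B": 200, "C": 300}
log = []                                   # list of (txn, item, new_value) records

def write(txn, item, value):
    log.append((txn, item, value))         # defer the physical update

def commit(txn):
    log.append((txn, "<commit>", None))
    for t, item, value in log:             # REDO: copy the deferred writes to the DB
        if t == txn and item != "<commit>":
            database[item] = value

write("T1", "A", 90)
write("T1", "B", 210)
# A crash at this point loses nothing: the database was never touched (NO UNDO).
commit("T1")                               # REDO installs A = 90, B = 210
print(database)                            # {'A': 90, 'B': 210, 'C': 300}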
• When a process is created, it must declare its maximum claim, i.e. the maximum number of resource units it may ever request. The resource manager can grant a request if the resources are available.
• For example, if process 1 requests a resource held by process 2, the system makes sure that process 2 is not waiting for a resource held by process 1.
Banker's Algorithm
• Banker's algorithm is a deadlock avoidance algorithm. It is so named because the algorithm is modelled after a banker who makes loans from a pool of capital and receives payments that are returned to that pool.
• The algorithm checks whether granting a request leads to an unsafe state. If it does, the request is denied; if granting the request leaves the system in a safe state, it is carried out.
• Dijkstra proposed this algorithm to regulate resource allocation so as to avoid deadlocks. The banker's algorithm is the best known of the avoidance methods.
• By using the avoidance method, the system is always kept in a safe state.
• It is easy to check whether a deadlock would be created by granting a request; the deadlock analysis method is used for this.
• Deadlock avoidance uses a worst-case analysis to check for future deadlocks.
• Safe state: There is at least one sequence of resource allocations to processes that
does not result in a deadlock.
• The system is in a safe state only if there exists a safe sequence. A safe state is not a deadlocked state.
• A deadlocked state is an unsafe state, but an unsafe state does not necessarily mean the system is in a deadlock.
• As long as the state is safe, the resource manager can be guaranteed to avoid a
deadlock.
• Initially the system is in a safe state. When a process requests a resource and that resource is available, the system must decide whether the resource can be allocated immediately or whether the process must wait.
• If system remains in safe state after allocating resource, then only OS allocates
resources to process.
• The Banker's algorithm uses the following data structures:
1. Allocation: A table in which each row represents a process (P) and each column represents a resource (R).
alloc[i, j] = number of units of resource Rj held by process Pi.
2. Max: The maximum number of resources that a process may require during its execution.
3. Need (claim): The current claim of a process, equal to its maximum need minus its current allocation:
Need = Max - Allocation
4. Available: The number of resources still available for allocation, equal to the total number of resources minus the sum of the allocations to all processes in the system:
Available = Number of resources - Sum of the allocations to all processes
          = Number of resources - Σ(i=1..n) Allocation(Pi)
• No process may request more than the total number of resources in the system. Each process must also guarantee that, once it is allocated a resource, it will return that resource to the system within a finite time.
Weakness of Banker's algorithm
1. It requires that there be a fixed number of resources to allocate.
2. The algorithm requires that users state their maximum needs (request) in advance.
3. Number of users must remain fixed.
4. The algorithm requires that the banker grant all requests within a finite time.
5. The algorithm requires that processes return all resources within a finite time.
Examples on Banker's algorithm
1. System consists of five processes (P1, P2, P3, P4, P5) and three resources (R1, R2,
R3). Resource type R1 has 10 instances, resource type R2 has 5 instances and
R3 has 7 instances. The following snapshot of the system has been taken :
Currently the system is in safe state.
Safe sequence: The safe sequence is calculated as follows:
1) The need of each process is compared with available. If need ≤ available, the resources can be allocated to that process; the process runs to completion and releases all its resources.
2) If need is greater than available, the need of the next process is examined.
3) In the above example, the need of process P1 is (7, 4, 3) and available is (3, 3, 2).
need ≤ available → (7, 4, 3) ≤ (3, 3, 2) → False
So the system moves on to the next process.
4) The need of process P2 is (1, 2, 2) and available is (3, 3, 2):
need ≤ available (work) → (1, 2, 2) ≤ (3, 3, 2) → True, so Finish[P2] = True.
The request of P2 is granted; P2 runs to completion and releases its resources to the system.
Work = Work + Allocation = (3, 3, 2) + (2, 0, 0) = (5, 3, 2)
This procedure is continued for all processes.
5) The need of P3, (6, 0, 0), is compared with the new available (5, 3, 2):
(6, 0, 0) ≤ (5, 3, 2) → False, so P3 is skipped.
6) The need of P4, (0, 1, 1), is compared with available (5, 3, 2):
(0, 1, 1) ≤ (5, 3, 2) → True
Work = Work + Allocation = (5, 3, 2) + (2, 1, 1) = (7, 4, 3) (new available)
7) The need of P5, (4, 3, 1), is compared with available (7, 4, 3):
(4, 3, 1) ≤ (7, 4, 3) → True
Work = Work + Allocation = (7, 4, 3) + (0, 0, 2) = (7, 4, 5) (new available)
8) One pass is completed. The system again takes all remaining processes in sequence. The need of P1, (7, 4, 3), is compared with the new available (7, 4, 5):
(7, 4, 3) ≤ (7, 4, 5) → True
Work = Work + Allocation = (7, 4, 5) + (0, 1, 0) = (7, 5, 5) (new available)
9) The need of P3, (6, 0, 0), is compared with the new available (7, 5, 5):
(6, 0, 0) ≤ (7, 5, 5) → True
Work = Work + Allocation = (7, 5, 5) + (3, 0, 2) = (10, 5, 7)
Safe sequence: <P2, P4, P5, P1, P3>
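The safety check above can be written compactly in Python. The following sketch uses exactly the Allocation, Need and Available values of this example and prints the same safe sequence; it is an illustration of the algorithm, not production resource-manager code.

def is_safe(available, allocation, need):
    work = list(available)
    finish = [False] * len(allocation)
    sequence = []
    progress = True
    while progress:                        # keep passing over the processes
        progress = False
        for i, (alloc, nd) in enumerate(zip(allocation, need)):
            # A process can finish if its remaining need fits in work.
            if not finish[i] and all(n <= w for n, w in zip(nd, work)):
                # On finishing, it releases everything it holds.
                work = [w + a for w, a in zip(work, alloc)]
                finish[i] = True
                sequence.append("P%d" % (i + 1))
                progress = True
    return all(finish), sequence

allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
need       = [[7, 4, 3], [1, 2, 2], [6, 0, 0], [0, 1, 1], [4, 3, 1]]
available  = [3, 3, 2]

print(is_safe(available, allocation, need))
# (True, ['P2', 'P4', 'P5', 'P1', 'P3'])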
8. Discuss the violations caused by each of the following: dirty read, non-repeatable
read and phantoms with suitable example.
Dirty read, Non-repeatable read, and Phantom read
Dirty read
Uncommitted data is read. For example, transaction B updates a row's age to 18 but has not yet committed; transaction A reads age = 18. If transaction B is then rolled back, transaction A has read dirty data: the value 18 never existed in the committed database.
Phantom read
When a user reads records, another transaction inserts or deletes rows within the set of records being read. When the user reads the same range again, new "phantom" rows are found (or rows have disappeared).
For example, transaction B inserts a new row into the range transaction A is reading; when A repeats the read, it finds the total count of the result has changed to 2.
Non-repeatable read
Before transaction A is over, another transaction B accesses and modifies the same data. Due to the modification made by transaction B, two reads of the same data within transaction A may return different values.
The difference between phantom read and non-repeatable read:
The key to a non-repeatable read is modification: under the same conditions, data you have already read returns a different value when read again.
The key to a phantom read is insertion or deletion: under the same conditions, the number of records returned by the first and the second read differs.
The isolation levels are used to solve various problems in the database :
1. DEFAULT
Use the default isolation level of the database itself, e.g. Oracle (read committed) or MySQL/InnoDB (repeatable read).
2. Read uncommitted
Reading uncommitted, as the name implies, is that one transaction can read the data
of another uncommitted transaction.
3. Read committed
Read committed, as the name implies, means that a transaction can read another transaction's data only after that transaction has committed.
This solves dirty reads, but cannot solve non-repeatable reads or phantom reads.
4. Repeatable read
Repeatable read means that once a transaction has started reading data (the transaction is open), modifications made by other transactions are no longer visible to its repeated reads.
This solves non-repeatable reads, though phantom reads may still occur.
5. Serializable
Serializable is the highest transaction isolation level. Under this level, transactions are
serialized and executed sequentially, which can avoid dirty read, non-repeatable read,
and phantom read. However, this transaction isolation level is inefficient and
consumes database performance, so it is rarely used.
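The following toy Python model (purely illustrative; real DBMSs implement this with locks or multi-versioning, and the original committed age of 20 is assumed) shows the dirty-read anomaly from the age example above, and how the read-committed level hides uncommitted data:

class ToyDB:
    def __init__(self):
        self.committed = {"age": 20}
        self.uncommitted = {}                      # writes of in-flight transactions

    def write(self, key, value):
        self.uncommitted[key] = value

    def read(self, key, isolation="READ UNCOMMITTED"):
        if isolation == "READ UNCOMMITTED" and key in self.uncommitted:
            return self.uncommitted[key]           # dirty read
        return self.committed[key]                 # only committed data is visible

    def rollback(self):
        self.uncommitted.clear()

db = ToyDB()
db.write("age", 18)                        # transaction B updates age, no commit yet
print(db.read("age"))                      # 18: transaction A performs a dirty read
print(db.read("age", "READ COMMITTED"))    # 20: the dirty read is avoided
db.rollback()                              # B aborts; the value 18 never existed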
9. Consider the following schedules. The actions are listed in the order they are
scheduled, and prefixed with the transaction name.
S1:T1:R(X),T2:R(X),T1:W(Y),T2:W(Y),T1:R(Y),T2:R(Y)
S2:T3:W(X),T1:R(X),T1:W(Y),T2:R(Z),T2:W(Z),T3:R(Z)
For each of the schedule, answer the following questions:
(i) what is the precedence graph for the schedule?
(ii) Is the schedule conflict-serializable? If so, what are all the conflict
equivalent serial schedules?
(iii) Is the schedule view-serializable? If so, what are all the view equivalent serial
schedules?
Serializability
• When multiple transactions run concurrently, then it may lead to inconsistency of data (i.e.
change in the resultant value of data from different transaction).
• Serializability is a concept that helps to identify which non-serial schedules are correct, by finding whether they are equivalent to some serial schedule.
• For example:
• In the above schedule, transaction T1 initially reads the values A = 100, B = 100 from the database and modifies A and B; transaction T2 then reads the modified value, i.e. 90, modifies it to 80 and performs a write operation. Thus at the end of transaction T1 the value of A is 90, but at the end of transaction T2 the value of A is 80; a conflict, and hence an inconsistency, occurs here. This sequence can be converted to a sequence which gives a consistent result; checking whether this is possible is called serializability.
Difference between Serial schedule and Serializable schedule
• There are two types of serializabilities: conflict serializability and view serializability
Conflict Serializability
• Definition: Suppose T1 and T2 are two transactions and I1 and I2 are instructions in T1 and T2 respectively. Instructions I1 and I2 are said to conflict if both access the same data item d and at least one of them is a write operation. A schedule is conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting instructions.
• What is conflict?: In the definition three conditions are specified for a conflict in conflict
serializability -
1) There should be different transactions
2) The operations must be performed on same data items
3) One of the operation must be the Write(W) operation
• We can test a given schedule for conflict serializability by constructing a precedence graph for the schedule and searching for the absence of cycles in the graph.
• A precedence graph is a directed graph G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices consists of all the transactions participating in the schedule. The set of edges consists of all edges Ti → Tj for which one of three conditions holds:
1. Ti executes write(Q) before Tj executes read(Q).
2. Ti executes read(Q) before Tj executes write(Q).
3. Ti executes write(Q) before Tj executes write(Q).
• A serializability order of the transactions can be obtained by finding a linear order consistent
with the partial order of the precedence graph. This process is called topological sorting.
Testing for serializability
The following method is used for testing serializability. To test conflict serializability we draw a graph G = (V, E), where V is the set of vertices, one per transaction, and E is the set of edges for conflicting pairs.
Step 1: Create a node for each transaction.
Step 2: Find the conflicting pairs(RW, WR, WW) on the same variable(or data item) by
different transactions.
Step 3: Draw an edge for the given schedule. Consider the following cases:
1. Ti executes write(Q) before Tj executes read(Q): draw an edge from Ti to Tj.
2. Ti executes read(Q) before Tj executes write(Q): draw an edge from Ti to Tj.
3. Ti executes write(Q) before Tj executes write(Q): draw an edge from Ti to Tj.
Step 4: Now, if the precedence graph is cyclic, the schedule is not conflict serializable; if the precedence graph is acyclic, the schedule is conflict serializable.
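The four steps above are mechanical enough to code directly. The sketch below (an illustration, using the schedule of Example 3.5.1 that follows) builds the precedence graph from the conflicting pairs and reports whether a cycle exists:

from collections import defaultdict

# Schedule of Example 3.5.1, as (transaction, action, data item) triples.
schedule = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"),
            ("T2", "R", "B"), ("T1", "R", "B"), ("T1", "W", "B")]

edges = defaultdict(set)
for i, (ti, ai, di) in enumerate(schedule):
    for tj, aj, dj in schedule[i + 1:]:
        # Conflicting pair: different transactions, same item, at least one write.
        if ti != tj and di == dj and "W" in (ai, aj):
            edges[ti].add(tj)

def has_cycle(edges):
    WHITE, GREY, BLACK = 0, 1, 2               # DFS colouring
    colour = defaultdict(int)
    def visit(u):
        colour[u] = GREY
        for v in edges[u]:
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False
    return any(colour[u] == WHITE and visit(u) for u in list(edges))

print(has_cycle(edges))   # True, so the schedule is not conflict serializable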
Example 3.5.1 Consider the following two transactions and schedule (time goes from top to
bottom). Is this schedule conflict-serializable? Explain why or why not.
Solution :
Step 1: To check whether the schedule is conflict serializable or not we will check from top to
bottom. Thus we will start reading from top to bottom as
T1: R(A) -> T1:W(A) ->T2:R(A) -> T2:R(B) ->T1:R(B)->T1:W(B)
Step 2: We will find conflicting operations. Two operations are called as conflicting operations
if all the following conditions hold true for them-
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation
From above given example in the top to bottom scanning we find the conflict as
T1:W(A)->T2:R(A).
i) Here note that there are two different transactions T1 and T2,
ii) Both work on same data item i.e. A and
iii) One of the operation is write operation.
Step 3: We will build a precedence graph by drawing one node from each transaction. In
above given scenario as there are two transactions, there will be two nodes namely T1 and T2
Step 4: Draw the edge between conflicting transactions. For example in above given
scenario, the conflict occurs while moving from T1:W(A) to T2:R(A). Hence edge must be from
T1 to T2.
Step 5: Repeat the step 4 while reading from top to bottom. Finally the precedence graph will
be as follows
Step 6: Check if any cycle exists in the graph. A cycle is a path by which we can start from one node and reach the same node again. If a cycle is found, the schedule is not conflict serializable. In step 5 we get a graph with a cycle, which means the given schedule is not conflict serializable.
Example 3.5.2 Check whether following schedule is conflict serializable or not. If it is not
conflict serializable then find the serializability order.
Solution:
Step 1: We will read from top to bottom, and build a precedence graph for conflicting entries.
We will build a precedence graph by drawing one node for each transaction. In the above scenario there are three transactions, so there will be three nodes, namely T1, T2 and T3.
Step 4: As there is no cycle in the precedence graph, the given sequence is conflict
serializable. Hence we can convert this non serial schedule to serial schedule. For that
purpose we will follow these steps to find the serializable order.
Step 5: A serializability order of the transactions can be obtained by finding a linear order
consistent with the partial order of the precedence graph. This process is called topological
sorting.
Step 6: Find the vertex which has no incoming edge which is T1. If we delete T1 node then
T3 is a node that has no incoming edge. If we delete T3, then T2 is a node that has no
incoming edge.
Thus the nodes can be deleted in the order T1, T3 and T2. Hence the order will be T1-T3-T2.
Example 3.5.3 Check whether the below schedule is conflict serializable or not.
{b2, r2(X), b1, r1(X), w1(X), r1(Y), w1(Y), w2(X), e1, c1, e2, c2}
Solution: b2 and b1 represent the begin of transaction 2 and transaction 1 respectively. Similarly, e1 and e2 represent the end of transaction 1 and transaction 2.
We will rewrite the schedule as follows-
Step 1: We will find conflicting operations. Two operations are called as conflicting operations
if all the following conditions hold true for them -
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation.
The conflicting entries are as follows –
As there are only two transactions, only two nodes are present in the graph.
Step 3: We get a graph with a cycle, which means the given schedule is not conflict serializable.
Example 3.5.4 Consider the three transactions T1, T2, and T3 and the schedules S1 and S2 given below. Determine whether each schedule is serializable or not. If a schedule is serializable, write down the equivalent serial schedule(s).
T1: R1(x) R1(z);W1(x);
T2: R2(x);R2(y);W2(z);W2(y)
T3:R3(x);R3(y);W3(y);
S1: R1(x);R2(z);R1(z); R3(x);R3(y);W1(x);W3(y);R2(y); W2(z);W2(y);
S2: R1(x);R2(z);R3(x);R1(z);R2(y);R3(y);W1(x);W2(z);W3(y);W2(y);
Solution:
Step 1: We will represent the schedule S1 as follows
Step (a): We will find conflicting operations. Two operations are called as conflicting
operations if all the following conditions hold true for them -
i) Both the operations belong to different transactions.
ii) Both the operations are on same data item.
iii) At least one of the two operations is a write operation
The conflicting entries are as follows -
Step (b): Now we will draw precedence graph as follows-
As there is no cycle in the precedence graph, the given sequence is conflict serializable.
Hence we can convert this non serial schedule to serial schedule. For that purpose we will
follow these steps to find the serializable order.
Step (c): A serializability order of the transactions can be obtained by finding a linear order
consistent with the partial order of the precedence graph. This process is called topological
sorting.
Step (d): Find the vertex which has no incoming edge: it is T3. Then find the vertex having no outgoing edge: it is T2. In between them is T1. Hence the order will be T3-T1-T2.
Example 3.5.5 Explain the concept of conflict serializability. Decide whether following
schedule is conflict serializable or not. Justify your answer.
Solution:
Step 1: We will read from top to bottom, and build a precedence graph for conflicting entries.
The conflicting entries are as follows-
Step 3: There is no cycle in the precedence graph. That means this schedule is conflict
serializable. Hence we can convert this non serial schedule to serial schedule. For that
purpose we will follow the following steps to find the serializable order.
1) Find the vertex which has no incoming edge which is T1.
2) Then find the vertex having no outgoing edge which is T2. In between them there is no
other transaction.
3) Hence the order will be T1-T2.
View Serializability
• If a given schedule is found to be view equivalent to some serial schedule, then it is called
as a view serializable schedule.
• View Equivalent Schedule: Consider two schedules S1 and S2 consisting of transactions
T1 and T2 respectively, then schedules S1 and S2 are said to be view equivalent schedule if it
satisfies following three conditions:
• If transaction T1 reads a data item A from the database initially in schedule S1, then in schedule S2 also, T1 must perform the initial read of the same data item A from the database. This must hold for all data items. In other words, the initial reads must be the same for all data items.
• If data item A has been updated at last by transaction T1 in schedule S1, then in schedule
S2 also, the data item A must be updated at last by transaction T1.
• If transaction T2 reads a data item that has been updated by transaction T1 in schedule S1, then in schedule S2 also, transaction T2 must read the value of that data item written by transaction T1. In other words, the write-read sequence must be the same.
Solution:
i) The initial read operation is performed by T2 on data item A or by T1 on data item C. Hence we can begin with T2 or T1. We choose T2 at the beginning.
ii) The final write is performed by T1 on data item B. Hence T1 will be at the last position.
iii) The data item C is written by T3 and then read by T1. Hence T3 should come before T1.
Thus we get the view-serializability order of the schedule as T2 - T3 - T1.
Example 3.5.7 Consider following two transactions:
T1: read(A)
read(B)
if A=0 then B:=B+1;
write(B)
T2: read(B);
read(A);
if B=0 then A:=A+1;
write(A)
Let the consistency requirement be A = 0 ∨ B = 0, with A = B = 0 as the initial values.
1) Show that every serial execution involving these two transactions preserves the
consistency of the Database?
2) Show a concurrent execution of T1 and T2 that produces a non serializable
schedule?
3) Is there a concurrent execution of T1 and T2 that produces a serializable
schedule?
Solution: 1) There are two possible serial executions: T1 → T2 or T2 → T1.
Consider the case T1 → T2: T1 reads A = 0 and B = 0; since A = 0, it sets B = B + 1 = 1 and writes B. T2 then reads B = 1 and A = 0; since B ≠ 0, A is left unchanged. Finally A = 0 and B = 1, so A = 0 ∨ B = 0 evaluates to T ∨ F = T. This means consistency is met.
Consider the case T2 → T1: symmetrically, we end with A = 1 and B = 0, so A = 0 ∨ B = 0 evaluates to F ∨ T = T, and consistency is again met.
Step 1: We will use the precedence graph method to check serializability. As there are three transactions, three nodes are created, one for each transaction.
Step 2: We will read from top to bottom. Initially we read r1(x) and keep moving downwards in search of a conflicting write operation. Here all the transactions work on the same data item, i.e. x. We get a write operation in T3 as w3(x). Hence the dependency is from T1 to T3, therefore we draw an edge from T1 to T3.
Similarly, for r3(x) we get w1(x) pair. Hence there will be edge from T3 to T1. Continuing in this
fashion we get the precedence graph as
Step 3: As cycle exists in the above precedence graph, we conclude that it is not
serializable.
ii) r3(x);r2(x);w3(x);r1(x);w1(x)
From the given sequence the schedule can be represented as follows:
Step 1: Read the schedule from top to bottom for pairs of conflicting operations. For r3(x) we get the pair w1(x). Hence an edge exists from T3 to T1 in the precedence graph.
There is a pair r2(x) : w3(x). Hence an edge exists from T2 to T3.
There is a pair r2(x) : w1(x). Hence an edge exists from T2 to T1.
There is a pair w3(x) : r1(x). Hence an edge exists from T3 to T1.
Step 2: The precedence graph will then be as follows-
Step 3: As there is no cycle in the above graph, the given schedule is serializable.
Step 4: The serializability order for consistent schedule will be obtained by applying
topological sorting on above drawn precedence graph. This can be achieved as follows,
Sub-Step 1: Find the node having no incoming edge. We find that T2 is such a node. Hence T2 is at the beginning of the serializability sequence. Now delete T2. The graph will be
i)
• S1 is not conflict-serializable, since the dependency graph has a cycle.
• S2 is conflict-serializable, as the dependency graph is acyclic. The order T2-T3-T1 is the only equivalent serial order.
ii)
• S1 is not view serializable.
• S2 is trivially view-serializable as it is conflict serializable. The only serial order allowed is
T2-T3-T1.
Example 3.5.10 Check whether following schedule is view serializable or not. Justify your
answer. (Note: T1 and T2 are transactions). Also explain the concept of view equivalent
schedules and conflict equivalent schedule considering the example schedule given below :
Solution:
Step 1: We will first find if the given schedule is conflict serializable or not. For that purpose,
we will find the conflicting operations. These are as shown below –
The precedence graph is as follows -
As there exists no cycle, the schedule is conflict serializable. The possible serializability order
can be T1 - T2
Now we check it for view serializability. As we obtained the serializability order T1 - T2, we will check the view equivalence of the given schedule with this serial schedule.
Let S be the schedule given in the problem statement, and let the serial schedule be S' = {T1, T2}. These two schedules are represented as follows:
Now we will check the equivalence between them using following conditions -
(1) Initial Read
In schedule S initial read on A is in transaction T1. Similarly initial read on B is in transaction
T1.
Similarly in schedule S', initial read on A is in transaction T1. Similarly initial read on B is in
transaction T1.
(2) Final Write
In schedule S final write on A is in transaction T2. Similarly final write on B is in transaction
T2.
In schedule S' final write on A is in transaction T2. Similarly final write on B is in transaction T2
(3) Intermediate Read
Consider schedule S for finding intermediate read operation.
In both the schedules S and S', the intermediate read operation is performed by T2 only after
T1 performs write operation.
Thus all the above three conditions are satisfied. Hence the given schedule is view serializable.
10. Define deadlock. How does it occur? How transactions can be written to (i) Avoid
deadlock. (ii) Guarantee correct execution.
Illustrate with suitable example
Deadlock Handling
Deadlock is a specific concurrency problem in which two transactions depend on each other
for something.
For example- Consider that transaction T1 holds a lock on some rows of table A and needs to
update some rows in the B table. Simultaneously, transaction T2 holds locks on some rows in
the B table and needs to update the rows in the A table held by Transaction T1.
Now, the main problem arises. Now Transaction T1 is waiting for T2 to release its lock and
similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a halt state
and remain at a standstill. This situation is called deadlock in DBMS.
Definition: Deadlock can be formally defined as - " A system is in deadlock state if there exists
a set of transactions such that every transaction in the set is waiting for another transaction in
the set. "
There are four conditions for a deadlock to occur.
A deadlock may occur if all the following conditions hold true.
1. Mutual exclusion condition: There must be at least one resource that cannot be used by
more than one process at a time.
2. Hold and wait condition: A process that is holding a resource can request additional resources that are being held by other processes in the system.
3. No pre-emption condition: A resource cannot be forcibly taken from a process. Only the
process can release a resource that is being held by it.
4. Circular wait condition: A condition where one process is waiting for a resource held by a second process, the second process is waiting for a third process, and so on, with the last process waiting for the first, thus forming a circular chain of waiting.
Deadlock can be handled using two techniques -
1. Deadlock Prevention
2. Deadlock Detection and deadlock recovery
1. Deadlock prevention:
For large databases, the deadlock prevention method is suitable. A deadlock can be prevented if resources are allocated in such a way that a deadlock never occurs. The DBMS analyzes whether the operations can create a deadlock situation; if they can, that transaction is never allowed to execute.
There are two techniques used for deadlock prevention -
(i) Wait-Die:
• In this scheme, if a transaction requests a resource which is already held with a conflicting lock by another transaction, the DBMS simply compares the timestamps of the two transactions and allows the older transaction to wait until the resource becomes available.
• Suppose there are two transactions T1 and T2, and let TS(T) denote the timestamp of a transaction T. If T2 holds a lock on some resource and T1 requests that resource, then the following actions are performed by the DBMS:
• If TS(T1) < TS(T2), i.e. T1 is the older transaction and T2 holds the resource, then T1 is allowed to wait until the data item is available. That means if the older transaction is waiting for a resource locked by the younger transaction, the older transaction is allowed to wait until the resource is available.
• If TS(T1) > TS(T2), i.e. T1 is the younger transaction while the older T2 holds the resource, then T1 is killed and restarted later with a random delay but with the same timestamp.
A timestamp is a way of assigning a priority to each transaction when it starts. A lower timestamp means a higher priority; that is, the oldest transaction has the highest priority.
For example-
Let T1 be a transaction that requests a data item held by transaction T2, and let T3 be another transaction that requests a data item held by T2.
Here TS(T1) < TS(T2) < TS(T3); in other words, T1 is older than T2 and T3 is younger than T2. Hence T1 is made to wait while T3 is rolled back.
(ii) Wound - wait:
• In the wound-wait scheme, if an older transaction requests a resource held by a younger transaction, the older transaction wounds the younger one, i.e. forces it to be rolled back and release the resource. The younger transaction is restarted later, after some delay, but with the same timestamp.
• If the older transaction holds a resource which is requested by the younger transaction, then the younger transaction is asked to wait until the older one releases it.
Suppose T1 needs a resource held by T2 and T3 also needs a resource held by T2, with TS(T1) = 5, TS(T2) = 8 and TS(T3) = 10. Then T1, being older than T2, wounds T2 (T2 is rolled back and restarted later with the same timestamp), while T3, being younger than T2, waits.
This ultimately prevents a deadlock from occurring.
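The two rules can be summarised in a few lines of Python. This sketch (illustrative only) decides, from the timestamps alone, what happens when a transaction requests a lock held by another; it reproduces the outcomes of the examples above.

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies and restarts with same timestamp.
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the holder; younger requester waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

# T1 (ts = 5) and T3 (ts = 10) both request a lock held by T2 (ts = 8):
print(wait_die(5, 8), "/", wait_die(10, 8))      # wait / die
print(wound_wait(5, 8), "/", wound_wait(10, 8))  # wound holder / wait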
To summarize
2. Deadlock detection:
• In the deadlock detection mechanism, an algorithm that examines the state of the system is invoked periodically to determine whether a deadlock has occurred. If a deadlock is detected, the system must try to recover from it.
• Deadlock detection is done using the wait-for graph method.
Wait for graph
• In this method, a graph is created based on the transaction and their lock. If the created
graph has a cycle or closed loop, then there is a deadlock.
• The wait-for graph is maintained by the system, recording every transaction that is waiting for some data item held by another. The system keeps checking whether there is any cycle in the graph.
• This graph consists of a pair G = (V, E), where V is a set of vertices and E is a set of edges.
• The set of vertices consists of all the transactions in the system.
• When transaction Ti requests a data item currently held by transaction Tj, the edge Ti → Tj is inserted in the wait-for graph. This edge is removed only when transaction Tj is no longer holding a data item needed by transaction Ti.
• For example - Consider following transactions, We will draw a wait for graph for this scenario
and check for deadlock.
Step 2: We find the read-write pairs from two different transactions, reading from top to bottom. Whenever such a pair is found, we add an edge in the corresponding direction. For instance -
Step 3: As a cycle is detected in the wait-for graph, there is no need to process further. A deadlock is present in this transaction scenario.
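Detection itself is just a cycle search on the wait-for graph. A minimal sketch follows (with hypothetical wait-for data, since the transaction figure is not reproduced here):

def deadlocked(waits_for):
    visited, stack = set(), set()
    def dfs(t):
        visited.add(t); stack.add(t)
        for u in waits_for.get(t, []):
            if u in stack or (u not in visited and dfs(u)):
                return True           # back edge: a cycle, hence a deadlock
        stack.discard(t)
        return False
    return any(t not in visited and dfs(t) for t in waits_for)

# Edge Ti -> Tj means Ti waits for a data item held by Tj.
print(deadlocked({"T1": ["T2"], "T2": ["T1"], "T3": []}))   # True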
Example 3.16.1 Give an example of a scenario where two phase locking leads to
deadlock. AU: May-09, Marks 4
Solution: The following scenario of execution of transactions can result in deadlock.
In the above scenario, transaction T1 takes an exclusive lock on data item B and then transaction T2 takes an exclusive lock on data item A. Unless and until T1 gives up the lock (i.e. unlocks) on B, T2 cannot read or write it. Similarly, unless and until T2 gives up the lock on A, T1 cannot read or write A.
This is purely a deadlock situation in two-phase locking.
UNIT – IV
IMPLEMENTATION TECHNIQUES
PART – A
Q.1 What is the need for RAID?
• RAID is a technology that is used to increase performance.
• It is used for increased reliability of data storage.
• An array of multiple disks accessed in parallel will give greater throughput than a single
disk.
• With multiple disks and a suitable redundancy scheme, your system can stay up and
running when a disk fails, and even while the replacement disk is being installed and its data
restored.
Q.2 Define Software and hardware RAID systems.
Ans.: Hardware RAID: The hardware-based array manages the RAID subsystem
independently from the host. It presents a single disk per RAID array to the host.
Software RAID: Software RAID implements the various RAID levels in the kernel disk code. It offers the cheapest possible solution, as expensive disk controller cards are not required.
Q.3 What are ordered indices?
Ans.: This is type of indexing which is based on sorted ordering values. Various ordered
indices are primary indexing, secondary indexing.
Q.4 What are the two types of ordered indices?
Ans.: Two types of ordered indices are - Primary indexing and secondary indexing. The
primary indexing can be further classified into dense indexing and sparse indexing and
single level indexing and multilevel indexing.
Q.5 Give the comparison between ordered indices and hashing.
Ans.:
(1) If range queries are common, ordered indices are to be used.
(2) The buckets containing records can be chained in sorted order in case of ordered
indices.
(3) Hashing is generally better at retrieving records having a specified value of the key.
(4) Hash function assigns values randomly to buckets. Thus, there is no simple notion of
"next bucket in sorted order."
Q.6 What are the causes of bucket overflow in a hash file organization?
Ans.:Bucket overflow can occur for following reasons -
(1) Insufficient buckets: The number of buckets allocated is insufficient for the total number of records to be stored.
(2) Skew: Some buckets are assigned more records than are others, so a bucket might
overflow even while other buckets still have space. This situation is known as bucket skew.
Q.7 What can be done to reduce the occurrences of bucket overflows in a hash file
organization?
Ans.:
(1) A bucket is a unit of storage containing one or more records (a bucket is typically a disk
block).
(2) The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ..., bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk block.
(3) In a hash file organization we obtain the bucket of a record directly from its search-key value, using a hash function h(K).
(4) To reduce overflow records, a hash file is typically kept 70-80% full.
(5) The hash function h should distribute the records uniformly among the buckets;
otherwise, search time will be increased because many overflow records will exist.
Q.8 Distinguish between dense and sparse indices.
1) Dense index:
• An index record appears for every search key value in file.
• This record contains search key value and a pointer to the actual record.
2) Sparse index:
• An index record appears for only some of the search-key values in the file. To locate a record, we find the index entry with the largest search-key value not exceeding the target and scan sequentially from there.
Level: RAID 2
• This level makes use of mirroring as well as stores Error Correcting Codes (ECC) for its
data striped on different disks.
• The data is stored on one set of disks and the ECC is stored on another set of disks.
• This level has a complex structure and high cost. Hence it is not used commercially.
Level: RAID 3
• This level consists of byte-level striping with dedicated parity. In this level, the parity information for each disk section is written to a dedicated parity drive.
• We can detect single errors with a parity bit. Parity is a technique that checks whether data
has been lost or written over when it is moved from one place in storage to another.
• In case of disk failure, the parity disk is accessed and data is reconstructed from the
remaining devices. Once the failed disk is replaced, the missing data can be restored on the
new disk.
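The reconstruction relies on the XOR property of parity, which a few lines of Python can demonstrate (the byte values below are arbitrary illustrations):

data = [0b1010, 0b0110, 0b1100]          # bytes striped across three data disks
parity = data[0] ^ data[1] ^ data[2]     # stored on the dedicated parity disk

lost = data[1]                           # suppose disk 2 fails
rebuilt = data[0] ^ data[2] ^ parity     # XOR of the surviving disks and parity
print(rebuilt == lost)                   # True: the missing byte is recovered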
Level: RAID 4
• RAID 4 consists of block-level striping with a parity disk.
• Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping.
Level: RAID 5
• RAID 5 is a modification of RAID 4.
• RAID 5 writes whole data blocks onto different disks, but the parity bits generated for data
block stripe are distributed among all the data disks rather than storing them on a different
dedicated disk.
Level: RAID 6
• RAID 6 is an extension of level 5.
• RAID 6 writes whole data blocks onto different disks, but the two independent parity bits
generated for data block stripe are distributed among all the data disks rather than storing
them on a different dedicated disk.
• Two parities provide additional fault tolerance.
• This level requires at least four disks to implement RAID.
The factors to be taken into account in choosing a RAID level are:
1. Monetary cost of extra disk-storage requirements.
2. Performance requirements in terms of number of I/O operations.
3. Performance when a disk has failed.
4. Performance during rebuild.
2. Illustrate indexing and hashing techniques with examples. What is the use of an index
structure? Explain the concept of ordered indices.
An index is a data structure that organizes data records on the disk to make the retrieval of
data efficient.
• The search key for an index is a collection of one or more fields of the records, using which we can efficiently retrieve the data that satisfies the search conditions.
• The indexes are required to speed up the search operations on file of records.
• There are two types of indices -
• Ordered Indices: This type of indexing is based on sorted ordering values.
• Hash Indices: This type of indexing is based on uniform distribution of values across
range of buckets. The address of bucket is obtained using the hash function.
• There are several techniques for indexing and hashing. These techniques are evaluated based on the following factors:
• Access types: the types of access that are supported efficiently.
• Access time: the time it takes to find a particular data item or set of items.
• Insertion time: the time required to insert a new data item.
• Deletion time: the time required to delete a desired data item.
• Space overhead: the additional space occupied by the index structure. Allocating such extra space is worthwhile for the improved performance it brings.
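As a rough illustration of the access-time benefit (a toy in-memory model, not an actual disk-based index), compare a full scan of a file of records with a lookup through an index that maps search-key values to record positions:

records = [{"roll": r, "name": "student%d" % r} for r in range(1, 10001)]

index = {rec["roll"]: pos for pos, rec in enumerate(records)}   # built once

# Without an index: scan every record until the key matches.
scan_hit = next(rec for rec in records if rec["roll"] == 9431)
# With an index: one lookup gives the record's position directly.
index_hit = records[index[9431]]
print(scan_hit == index_hit)    # True, but the indexed access avoids the scan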
Indexing Techniques:
Indexing is used to speed up data retrieval by creating a structure that maps keys to data
records.
1. Primary Index:
o Built on primary keys.
o Example: A student database sorted by roll number with the first record of
each block indexed.
2. Secondary Index:
o Created on non-primary keys.
● Types:
Step 2: Now if we insert 22, the sequence will be 22, 23, 30, 31, 32. The middle key, 30, will go up.
Step 4: Insert 29. The sequence becomes 22, 23, 24, 28, 29. The middle key, 24, will go up. Thus we get the B+ tree.
Example 4.8.2 Construct B+ tree to insert the following (order of the tree is 3)
26,27,28,3,4,7,9,46,48,51,2,6
Solution:
Order means maximum number of children allowed by each node. Hence order 3 means at
the most 2 key values are allowed in each node.
Step 1: Insert 26, 27 in ascending order
Step 2: Now insert 28. The sequence becomes 26,27,28. As the capacity of the node is
full, 27 will go up. The B+ tree will be,
Step 4: Insert 4. The sequence becomes 3,4, 26. The 4 will go up. The partial B+ tree
will be –
Step 5: Insert 7. The sequence becomes 4, 7, 26. The 7 will go up. Again, from 4, 7, 27, the 7 will go up. The partial B+ tree will be,
Step 6: Insert 9. By inserting 7,9, 26 will be the sequence. The 9 will go up. The partial
B+ tree will be,
Step 7: Insert 46. The sequence becomes 27,28,46. The 28 will go up. Now the
sequence becomes 9, 27, 28. The 27 will go up and join 7. The B+ Tree will be,
Step 8: Insert 48. The sequence becomes 28,46,48. The 46 will go up. The B+ Tree will
become,
Step 9: Insert 51. The sequence becomes 46,48,51. The 48 will go up. Then the
sequence becomes 28, 46, 48. Again the 46 will go up. Now the sequence becomes
7,27, 46. Now the 27 will go up. Thus the B+ tree will be
Step 10: Insert 2. The insertion is simple. The B+ tree will be,
Step 11: Insert 6. The insertion can be made in a vacant node of 7(the leaf node). The
final B+ tree will be,
Deletion Operation
Algorithm for deletion:
Step 1: Start at root, find leaf L with entry, if it exists.
Step 2: Remove the entry.
i) If L is at least half-full, done!
ii) If L has only d-1 entries,
• Try to re-distribute, borrowing keys from sibling.
(adjacent node with same parent as L).
• If redistribution fails, merge L and sibling.
Step 3: If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
Step 4: Merge could propagate to root, decreasing height.
Example 4.8.3 Construct B+ Tree for the following set of key values
(2,3,5,7,11,17,19,23,29,31) Assume that the tree is initially empty and values are added
in ascending order. Construct B+ tree for the cases where the number of pointers that
fit one node is four. After creation of B+ tree perform following series of operations:
(a) Insert 9. (b) Insert 10. (c) Insert 8. (d) Delete 23. (e) Delete 19.
Solution: The number of pointers fitting in one node is four. That means each node contains
at the most three key values.
Step 1: Insert 2, 3, 5.
Step 2: If we insert 7, the sequence becomes 2, 3, 5, 7. Since each node can accommodate at most three keys, the 5 will go up from the sequence 2, 3, 5, 7.
Step 4: Insert 17. The sequence becomes 5,7, 11,17. The element 11 will go up. Then the
partial B+ tree becomes,
Step 5: Insert 19.
Step 6: Insert 23. The sequence becomes 11,17,19,23. The 19 will go up.
Step 8: Insert 31. The sequence becomes 19, 23, 29, 31. The 29 will go up. Then at the upper level the sequence becomes 5, 11, 19, 29. Hence again 19 will go up to maintain the capacity of the node (at most four pointers, i.e. three key values). Hence the complete B+ tree will be,
(a) Insertion of 9: It is very simple operation as the node containing 5,7 has one space
vacant to accommodate. The B+ tree will be,
(b) Insert 10: If we try to insert 10 then the sequence becomes 5,7,9,10. The 9 will go up.
The B+ tree will then become –
(c) Insert 8: Again insertion of 8 is simple. We have a vacant space at node 5,7. So we just
insert the value over there. The B+ tree will be-
(d) Delete 23: Just remove the key entry of 23 from the node 19,23. Then merge the sibling
node to form a node 19,29,31. Get down the entry of 11 to the leaf node. Attach the node of
11,17 as a left child of 19.
(e) Delete 19: Just delete the entry of 19 from the node 19,29,31. Delete the internal node
key 19. Copy the 29 up as an internal node as it is an inorder successor node.
Search Operation
1. Perform a binary search on the records in the current node.
2. If a record with the search key is found, then return that record.
3. If the current node is a leaf node and the key is not found, then report an unsuccessful
search.
4. Otherwise, follow the proper branch and repeat the process.
For example -
• Any key of the entire tree can be accessed from the leaf level; there is no need to traverse the tree in inorder fashion.
• Thus the B+ tree gives faster access to any key.
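These search steps translate directly into code. The sketch below runs the search on a small hand-built tree (the structure is illustrative and is not the output of a real insertion routine):

import bisect

class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys
        self.children = children       # internal node: child pointers
        self.records = records         # leaf node: the actual records

def search(node, key):
    if node.children is None:                      # leaf: binary search
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.records[i]
        return None                                # unsuccessful search
    # Internal node: follow the proper branch and repeat.
    return search(node.children[bisect.bisect_right(node.keys, key)], key)

leaf1 = Node([22, 23], records=["rec22", "rec23"])
leaf2 = Node([30, 31, 32], records=["rec30", "rec31", "rec32"])
root = Node([30], children=[leaf1, leaf2])
print(search(root, 31))   # rec31
print(search(root, 25))   # None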
Ex. 5.2.1 Construct a B+tree for F, S, Q, K, C, L, H, T, V, W, M, R.
Sol.: The method for constructing a B+ tree is similar to building a B tree; the only difference is that the parent keys also appear in the leaf nodes. We will build the B+ tree of order 5, i.e. at most 4 keys are allowed in each node.
4.Write detailed notes on ordered indices and B-tree index files.
B-tree indices are similar to B+-tree indices.
• The primary distinction between the two approaches is that a B-tree eliminates the
redundant storage of search-key values.
• B-tree is a specialized multiway tree used to store the records in a disk.
• Each node has a number of subtrees, so the height of the tree is relatively small and only a small number of nodes must be read from disk to retrieve an item. The goal of B-trees is to get fast access to the data.
• A B-tree allows search-key values to appear only once (if they are unique), unlike a
B+-tree, where a value may appear in a nonleaf node, in addition to appearing in a leaf
node.
Step 2: If we insert the value 30, the sequence becomes 10, 20, 30. As only two key values are allowed in each node (the order being 3), the 20 will go up.
Step 5: Insert 40
Step 6: Insert 50. The sequence becomes 30, 40, 50. The 40 will go up. But this again forms the sequence 12, 20, 40, and then 20 will go up. Thus the final B tree will be,
3) Bucket: The hash function H(key) is used to map several dictionary entries into the hash table. Each position of the hash table is called a bucket.
4) Collision: A collision is a situation in which the hash function returns the same address for more than one record.
For example:
8. Query Processing
• Query processing is a collection of activities that are involved in extracting data from a database.
• During query processing, high-level database language queries are translated into expressions that can be used at the physical level of the file system.
• There are three basic steps involved in query processing:
1. Parsing and Translation
• In this step the query is translated into its internal form and then into relational algebra.
• Parser checks syntax and verifies relations.
• For instance - if we submit the query
SELECT RollNo, name
FROM Student
HAVING RollNo=10
then it will issue an error message, since HAVING is used without GROUP BY; the correct query should be
SELECT RollNo, name
FROM Student
WHERE RollNo=10
Thus during this step the syntax of the query is checked, so that only a correct and verified query is submitted for further processing.
2. Optimization
• During this step, query evaluation plans are prepared from the equivalent relational algebra expressions.
• The query cost of every evaluation plan is calculated.
• Amongst all equivalent evaluation plans, the one with the lowest cost is chosen.
• Cost is estimated using statistical information from the database catalog, such as the number of tuples in each relation, the size of tuples, etc.
3. Evaluation
• The query-execution engine takes a query-evaluation plan, executes that plan, and returns
the answers to the query.
For example - If the SQL query is,
SELECT balance
FROM account
WHERE balance<1000
Step 1: This query is first verified by the parser and translator unit for correct syntax. If the syntax is correct, the relational algebra expressions can be obtained. For the above query there are two possible relational algebra expressions:
(1) σbalance<1000(Πbalance (account))
(2) Πbalance ( σbalance<1000 (account))
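To see that the two expressions are equivalent, they can be run as function pipelines over a toy in-memory relation (the account rows below are invented for illustration):

account = [{"acc_no": 1, "balance": 500}, {"acc_no": 2, "balance": 1500},
           {"acc_no": 3, "balance": 900}, {"acc_no": 4, "balance": 2000}]

def select(pred, rel):     # sigma
    return [t for t in rel if pred(t)]

def project(attrs, rel):   # pi
    return [{a: t[a] for a in attrs} for t in rel]

# Plan 1: sigma_balance<1000 ( pi_balance (account) )
plan1 = select(lambda t: t["balance"] < 1000, project(["balance"], account))
# Plan 2: pi_balance ( sigma_balance<1000 (account) )
plan2 = project(["balance"], select(lambda t: t["balance"] < 1000, account))
print(plan1 == plan2)      # True: same result, but possibly different costs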
Step 2: Query Evaluation Plan: To specify fully how to evaluate a query, we need not only to
provide the relational-algebra expression, but also to annotate it with instructions specifying
how to evaluate each operation. For that purpose, using the order of evaluation of queries,
two query evaluation plans are prepared. These are as follows
Associated with each query evaluation plan there is a query cost. The query optimization
selects the query evaluation plan having minimum query cost.
Once the query plan is chosen, the query is evaluated with that plan and the result of the
query is output.
To locate a data entry, we apply a hash function; to choose the bucket we use the last two bits of the binary representation of the number. For instance, the binary representation of 32* is 100000, and its last two bits are 00. Hence we store 32* accordingly.
Insertion operation:
• Suppose we want to insert 20* (binary 10100). Its last two bits are 00, but bucket A for 00 is full. So we must split the bucket by allocating a new bucket and redistributing the contents across the old bucket and its split image.
• For splitting, we consider the last three bits of h(r).
• The redistribution on insertion of 20* is as shown in the following Fig. 4.12.2.
The split image of bucket A, i.e. A2, and the old bucket A were both addressed by the same last two bits, 00. Here we need two separate data pages to accommodate the additional data record, so it is necessary to double the directory, using three bits instead of two. Hence:
• The binary addresses of buckets A and A2 become 000 and 100.
• In extendible hashing, the number of last bits d used by the directory is called the global depth, and the number used by a data page (bucket) is called its local depth. After insertion of 20*, the global depth becomes 3, as we consider the last three bits, and the local depth of buckets A and A2 also becomes 3, as we place data records by their last three bits. Refer Fig. 4.12.3.
• Suppose we want to insert 11*: it belongs to bucket B, which is already full. Hence we split bucket B into the old bucket B and the split image of B, B2.
• The local depths of B and B2 now become 3.
• Now for bucket B, we get 1 = 001 and 11 = 1011.
• For bucket B2, we get 5 = 101, 29 = 11101 and 21 = 10101.
After insertion of 11* we get the scenario as follows,
Example 4.12.1 The following key values are organized in an extendible hashing technique:
1, 3, 5, 8, 9, 12, 17, 28. Show the extendible hash structure for this file if the hash function is h(x) = x mod 8 and buckets can hold three records. Show how the extendible hash structure changes as the result of each of the following steps:
Insert 2
Insert 24
Delete 5
Delete 12
Solution:
Step 1: Initially we address buckets using the last two bits of the result of the hash function.
1 mod 8=1=001
3 mod 8=3 = 011
5 mod 8=5= 101
8 mod 8=0=000
9 mod 8=1=001
12 mod 8 = 4=100
17 mod 8=1=001
28 mod 8 = 4 = 100
The extendible hash table will be,
Hence we will extend the table by assuming the bit size as 3. The above indicated bucket A
will split based on 001 and 101.
a) Insert 2: 2 mod 8 = 2 = 010. If we consider the last two bits, i.e. 10, there is no bucket for this pattern yet. So we get,
b) Insert 24: 24 mod 8 = 0 = 000. The bucket in which 24 can be inserted is the one holding 8, 12, 28. But as this bucket is full, we split it into two buckets based on the bits 000 and 100.
c) Delete 5: On deleting 5, the bucket pointed to by 101 becomes empty. This also results in reducing the local depth of the bucket pointed to by 001. Hence we get,
d) Delete 12: We simply delete 12 from the corresponding bucket; there need not be any merging of buckets on deletion. The result of the deletion is as given below.
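The directory lookup used throughout this example is easy to reproduce: h(x) = x mod 8, and the bucket is chosen by the last global-depth bits of h(x). A small Python check:

def bucket_bits(x, global_depth):
    return (x % 8) & ((1 << global_depth) - 1)   # last global_depth bits of h(x)

for x in [1, 3, 5, 8, 9, 12, 17, 28]:
    print(x, format(x % 8, "03b"),
          "depth 2 ->", format(bucket_bits(x, 2), "02b"),
          "depth 3 ->", format(bucket_bits(x, 3), "03b"))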
Difference between Static and Dynamic Hashing
10. i) Reliability and File Organization
Improving Reliability Through Redundancy in DBMS
Reliability refers to a database system's ability to function correctly and recover from
failures, ensuring consistent availability and integrity of data. One key approach to improving
reliability is redundancy.
Redundancy Overview
Redundancy involves maintaining duplicate or additional resources (data, hardware, or
systems) to minimize the impact of failures. This ensures that even if a failure occurs, the
system can continue to function or recover quickly.
Challenges of Redundancy
1. Increased Storage Costs: Storing redundant data or components requires more
resources.
2. Synchronization Overhead: Ensuring that redundant copies remain consistent adds
complexity.
3. Potential for Data Anomalies: If redundancy is poorly managed, it can lead to data
inconsistencies.
Conclusion
Redundancy is a critical strategy in DBMS to improve reliability by mitigating the impact of
failures. Whether through data replication, hardware duplication, or network backups,
redundancy ensures that systems maintain high availability, fault tolerance, and data
integrity. However, redundancy should be implemented with careful consideration of costs,
performance trade-offs, and synchronization mechanisms.
Conclusion
The way records are represented and organized in files directly impacts the efficiency of
data retrieval, insertion, and maintenance. Fixed-length records are simple but waste space,
whereas variable-length records optimize storage but are more complex. File organization
methods like heap, sequential, clustered, and hashed further determine the performance and
usability of database systems based on the specific application requirements.