Unit 1 (DBMS) Merged

UNIT 1

Introduction:
Data is the raw, unprocessed form of facts and figures, and is usually meaningless
on its own. For example, "3, Deepti, 90, 88, 70" is data. However, when you interpret this
data as "Deepti has obtained 90, 88, and 70 marks in three subjects", the data
becomes meaningful.

Thus, information is data that has been processed and has meaning; it can be
easily interpreted. It is therefore essential to organize the available data in
such a way that it can be altered, stored, and retrieved easily. This can be achieved
with the help of a database.

Database:
A database is a collection of data stored in a systematic manner,
i.e., in the form of tables, so that you can retrieve and manipulate
information quickly and efficiently from a large set of data.

A database consists of one or more tables. A table holds information
in the form of rows and columns.

Roll_no   Name     Marks_english   Marks_hindi   Marks_maths
1         Ravi     56              77            95
2         Riya     87              91            79
3         Deepti   90              88            70

(Each column of the table is a field, each row is a record, and the values are the data.)
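As a minimal sketch, the table above could be defined in SQL (covered later in these notes); the data types here are illustrative assumptions:

CREATE TABLE student (
    roll_no       INT,
    name          VARCHAR(50),
    marks_english INT,
    marks_hindi   INT,
    marks_maths   INT
);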
A database is basically used by developers, database administrators, and end-users.

(A DBMS connects developers, database administrators, and end-users to the database.)

DBMS: The full name of DBMS is Database Management System.

 DBMS is software which is used to create, delete and manage
databases.
 In other words, DBMS is a software application used to store,
manage, delete and access data in a database.
 DBMS is a set of programs that act as an interface between the
user and the database.
 There are many commands in DBMS, with the help of which the
user can easily create, manage and delete databases.

Through a DBMS the user can do the following: -
1. The user can create, delete and manage the database.

2. The user can store data in the database, and modify and
delete that data.

3. The user can access the data present in the database as per his
need.

4. Users of the database can also be managed and monitored through it.


UNIT 1
Simply put, "DBMS is software that acts as an interface between end users
and the database. With its help, the user can store data properly in
the database and can easily access the data whenever he wants."
The database management systems used today include dBase, FoxPro, IMS,
Oracle, MySQL, SQL Server, DB2, etc.

Advantages of DBMS:
1. Controlling data redundancy – It controls data redundancy
because it stores all the data in one single database file, and each
piece of data is recorded only once in the database.
2. Data sharing – The authorized users of an organization
can share the data among multiple users.
3. Backup and recovery – It provides backup and recovery, which creates
automatic backups of data against hardware and software failure and
restores the data if required.
4. Multiple user interfaces – It provides users with different types of
interfaces, such as GUIs and application program interfaces.
5. Data security – Data can be accessed only by authorized users of
the organization.
6. Easy maintenance – It is easily maintainable due to the centralized
nature of the database system.

Disadvantages of DBMS:

1. High cost of hardware & software – It requires high-speed
hardware, a capable data processor, and a large memory size to
run the DBMS software.
2. Size – It occupies a large amount of disk space and memory to run
efficiently.
3. Cost of staff training – DBMSs are often complex systems, so
training is required for users to use the database. The
organization has to pay a large amount to train the staff to
run the DBMS.
4. Higher impact of failure – A failure has a high impact on the database
because in most organizations all the data is stored in a single
database, and if the database is damaged due to electrical failure or
database corruption then the data may be lost forever.
5. Complexity – A database system adds complexity and
requirements. The database structure is complex, so it is difficult to
manage.

Components of DBMS:
A database management system is made up of the following components: -
 Hardware
 Software
 Data
 Users
 Procedures
Hardware
Hardware includes the computer system which is used to store and access the
database. This covers physical electronic devices such as computers, I/O
channels, storage devices, etc. Mostly, hard disks are used to store data in a
computer system.
Software
It is the most important component of DBMS. It is a group of programs that is
used to control and manage the entire database.
DBMS software is located between the database and the users. It provides a
very easy interface to store, update, and retrieve data in the database.
Data
Data is one of the most important components of a DBMS.
Users
There are many users who access the data according to their needs. The
capability and need of each user are different. The users are as follows: -
1. Database administrators
2. Database designers
3. End-users
4. Application programmers
Procedures
Procedures come with rules and instructions for using the database
management system. Procedures tell how to use the database in the
system. Such as: - Installing and setting up the DBMS, logging in to the
database, logging out, backing up the database, and handling the database,
etc.
File Oriented System: -
The manual, paper-based data storage system has many disadvantages; computer-based
systems were developed to overcome them.
File-oriented systems were the first attempt at a computerized system. A file
system is a method of organizing files on a hard disk or other
storage medium. The file system arranges the files and helps in retrieving
them when required. It is compatible with different file types, such
as mp3, doc, txt, mp4, etc., and these are also grouped into directories.
It also governs the method of writing and reading data to the hard
disk.
"This system stores groups of directories as separate files, and each
file operates independently from the other files in storage." For
example, a bank has a personnel department, an accounts department, and a loan
department; each department has a different file to store
its data.

It is nothing but the collection of programs which manage and store data in
files and folders on a computer hard disk. It helps in reading and writing data
to the hard disk.
Advantages of file oriented systems:

1. Backup: It is possible to take faster and automatic back-ups of
data stored in files of computer-based systems.

2. Compactness: It is possible to store data compactly.

3. Data Retrieval: Computer-based systems provide enhanced data
retrieval techniques to retrieve data stored in files in an easy and
efficient way.

4. Editing: It is easy to edit any information stored in computers in the
form of files. Specific application programs or editing software can be
used for this purpose.

5. Remote Access: In computer-based systems, it is possible to access
data remotely, so it is not necessary for a user to
remain present at the location where the data are kept.

6. Sharing: Data stored in files of computer-based systems can be
shared among multiple users at the same time.

Disadvantages of file-oriented systems:

1. Data Redundancy: It is possible that the same information may be
duplicated in different files. This data redundancy results in
memory wastage.

2. Data Inconsistency: Because of data redundancy, it is possible that
data may not be in a consistent state.

3. Limited Data Sharing: Data are scattered in various files. Also,
different files may have different formats, and these files may be
stored in different folders, possibly belonging to different departments. Due
to this data isolation, it is difficult to share data among different
applications.
4. Integrity Problems: Data integrity means that the
data contained in the database is both correct and
consistent. For this purpose, the data stored in the
database must satisfy correctness constraints.

5. Atomicity Problems: Any operation on the database
must be atomic. This means it must happen in its
entirety or not at all.

6. Concurrent Access Anomalies: Multiple users are
allowed to access data simultaneously for the
sake of better performance and faster response; however, uncontrolled
concurrent updates can leave the data in an inconsistent state.

7. Security Problems: The database should be
accessible to users in a limited way. Each user should
be allowed to access only the data concerning his
requirements.

Difference Between DBMS and File System

DBMS                                        File System
Controls data redundancy                    Same data may be duplicated in different files
Keeps data consistent                       Redundancy can leave data inconsistent
Data easily shared among multiple users     Limited data sharing due to data isolation
Handles concurrent access safely            Concurrent access can cause anomalies
Access restricted to authorized users       Limited, per-file security
Data Model:

A data model is a set of rules and standards that describes how the database
organizes and stores data. The data model also
defines how the logical structure of the database is built.

For example, just as engineers prepare a model of a house before
building it, database designers prepare a data model to
improve the database design.

Types of DBMS models: -
 Hierarchical Model
 Relational Model
 Network Model
 Object-Oriented Model
 E-R Model
1) Hierarchical Model
The hierarchical data model is one of the oldest models, developed in the 1960s by
IBM. There is a parent-child relationship: each entity has only one parent but may have many
children, and there is only one entity, called the root, which has no parent.
In this model, data is organized in a tree-like structure.
Advantages –
 Because of the parent-child relationship, its concepts are simple.
 It provides database security.
 It supports one-to-many relationships.
Disadvantages –
 It is not flexible.
 Changes in its structure have to be reflected in all the programs.

2) Network Data Model

It is similar to the hierarchical model, but here a child node can have more
than one parent. Relations are shown by arrows. It follows a navigation or
indexing technique.

Advantages:
1. More flexible than the hierarchical model.
2. Direct access is allowed.
3. 1-to-1 and 1-to-many relationships are allowed.

Disadvantages:
1. It is more complex.
2. Slow and difficult to maintain and operate.
3. Diagram structures are very complex and massive.
4. If we update even a single node, the whole dependency changes.
3) Relational DBMS

In this model, tables are used to store data, and a table shows the relation between the data
it contains; that is why tables are also called relations. Columns are used to
store fields or attributes, and rows are used to store the data of an entity. (Entities
are real-world objects: mobiles, humans and vehicles are all entities
from the computer's point of view.)
In this model we can split large data into smaller and simpler tables and connect
them all with the help of the unique key concept.
Each row of the table is called a tuple or record.
SQL (Structured Query Language) is used with this kind of DBMS.

Advantages:
1. Easy to create, maintain and operate.
2. Very flexible.
3. Multiple tables are allowed, which improves data simplification.
4. Single-tuple operations are allowed, meaning a change will affect only
the selected tuple; the rest of the table remains the same.
5. Unique key concepts like primary key and foreign key make operations
easier and safer.

Disadvantages:
1. The software used to create these databases is very expensive, and such
software requires expensive hardware as well.
2. Highly skilled and creative manpower is required.
4) Object Oriented Database Model

When object-oriented programming concepts are applied to the relational data
model, the result is called the object-oriented database model.

Here are the basic concepts of object-oriented programming:

1. Object: Anything from the real world is an object, like mobiles, humans, vehicles, movies, books, etc.
2. Class: A group of related objects is called a class; for example, mobile is a class, and different
companies' different models are objects of that class.
3. Inheritance: Child products copy characteristics from their parent product.
4. Encapsulation: Hide unnecessary data from the user.
5. Abstraction: Show only required or desired data to the user.
6. Polymorphism: A single object can perform different tasks as per a condition's
requirement.

The combined model is often called the Object-Relational Data Model.

Advantages:
1. Creates simplicity and easy access for users.
2. Improves security.
3. Provides durability, or a long life, to the product.
4. Easily holds very large groups of data.

Disadvantages:
1. No proper standards to ensure uniform development.
2. Difficult to create and maintain.
DBMS Architecture:

The DBMS acts as an interface between the user and the database.
The user requests the DBMS to perform various operations.
Database Languages and Interfaces

Database Languages
Database languages are used to read, store and update the data in the
database. Specific languages are used to perform various operations of the
database.

Types of Database Languages

DDL (Data Definition Language)

Data Definition Language (DDL) is used for describing structures or patterns
and their relationships in a database. It is also used to define the database
schema, tables, indexes, constraints, etc. It can also be used to store
information like the number of tables, names, columns, indexes, etc. These
commands only affect the database structure, not the data.

The commands used in DDL are:

Create: It is used to create a database or table.

Alter: It is used to make a change in the structure of a database.

Drop: It is used to completely delete a table from the database.

Truncate: It is used to delete the data inside the table while keeping the
structure of the table.

DML (Data Manipulation Language)

DML is used to manipulate the data present in the table or database. We can
easily perform operations such as store, modify, update, and delete on the
database.

The commands used in DML are:

Select: It shows the record of the specific table. Also, it can be used with a
WHERE clause to get the particular record.

Insert: It allows users to insert data into the database or tables.

Update: It is used to update or modify the existing data in database tables.

Delete: It is used to delete records from the database tables. Also, it can be
used with a WHERE clause to delete a particular row from the table.

DCL (Data Control Language)

DCL deals with the SQL commands that are used to permit a user to
access, modify and work on a database; it controls access to stored data. It
grants access, revokes access, and changes permissions on
the database as per the requirement.

The commands used in DCL are:

Grant: It is used to give access to security privileges to a specific database


user.

Revoke: It is used to revoke access from a user that was previously granted
by the Grant command.
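As a brief illustration (the user name ravi and table student are assumptions, not from these notes):

GRANT SELECT, UPDATE ON student TO ravi;   -- give ravi read and update rights
REVOKE UPDATE ON student FROM ravi;        -- take the update right back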
TCL (Transaction Control Language)

TCL commands group the changes made by DML commands into logical
transactions and control whether those changes become permanent in the database.

Commit: A transaction on the database is saved using Commit.

Rollback: The database gets restored to its state as of the last commit.
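A small sketch of a transaction, assuming the student table used earlier in these notes:

UPDATE student SET marks_maths = 80 WHERE roll_no = 1;
COMMIT;     -- make the change permanent
UPDATE student SET marks_maths = 0 WHERE roll_no = 1;
ROLLBACK;   -- undo the uncommitted change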

DBMS Interface

Database Interface is a user interface which provides the user with easy
access to the database.

With the help of database interface, the user can easily insert, delete and
update data in the database.

Database Interface is also called DBMS Interface.

Types of Database Interface

There are following types of Database Interface: -


1. Form Based Interface
2. Menu Based Interface
3. Natural Language Interface
4. Graphical User Interface
5. Interface for DBA
6. Command Line Interface

The most typical kinds of DBMS interfaces are as follows −

Command-Line Interface (CLI)

The earliest and most fundamental form of DBMS interface is the command-
line interface (CLI). Through the use of a command prompt, users may
communicate with the database by inputting commands. Users can only
access and modify the database using particular commands and syntax.
Today, programmers and database administrators are the main users of CLI.

Graphical User Interface (GUI)

The most prevalent kind of DBMS interface used nowadays is the graphical
user interface (GUI). Users may interact with the database using the user-
friendly interface's graphical menus and icons. GUIs are user-friendly and
need little to no programming experience. They are mostly utilized by non-
programmers who are end users.

Web-Based Interface

The web-based interface lets users access the database through a browser.
Users can use online forms, buttons, and other graphical
components to interact with the database from a web browser. Because
they can be accessed from any location with an internet connection, web-
based interfaces are common.

Natural Language Interface

This interface enables users to communicate with the database in their own
language. The system will convert user inquiries into SQL instructions and
carry them out if they are entered in plain English. Users who are unfamiliar
with SQL commands might benefit from using natural language interfaces.

Forms-Based Interface

Using the forms-based interface, users may enter data into the database
using a graphical user interface. Users can complete pre-made forms that
provide fields for particular categories of information, such as name,
address, and phone number. Applications for data entry frequently employ
forms-based interfaces.

Menu-Based Interface

Users have access to a number of predefined menus and choices when using
the menu-based interface to interact with the database. Users can choose
the menu choice that reflects the intended action, such as adding or
removing data, from the available options. The majority of specialized
applications and embedded systems employ menu-based interfaces.

Users can interact with the database thanks in large part to DBMS interfaces.
The user's technical proficiency, the nature of the program, and the target
market all influence the interface choice. The user experience and database
usability may both be improved by selecting the correct interface.
Concept of Generalization, Specialization & Aggregation.

Generalization

o Generalization is like a bottom-up approach in which two or more entities of


lower level combine to form a higher-level entity if they have some attributes
in common.
o In generalization, an entity of a higher level can also combine with the entities
of the lower level to form a further higher level entity.
o Generalization is more like a subclass and superclass system, but the only
difference is the approach: generalization uses the bottom-up approach.
o In generalization, entities are combined to form a more generalized entity,
i.e., subclasses are combined to make a superclass.
o Generalization is represented by a triangle component labelled ISA. The label
ISA stands for "is a" and represents the superclass-subclass relationship.

For example, Faculty and Student entities can be generalized to create a higher-
level entity Person.
Specialization

o Specialization is a top-down approach, and it is opposite to Generalization.


In specialization, one higher level entity can be broken down into two lower
level entities.
o Specialization is used to identify the subset of an entity set that shares some
distinguishing characteristics.
o Normally, the superclass is defined first, the subclass and its related attributes
are defined next, and relationship sets are then added.

For example: In an Employee management system, EMPLOYEE entity can be


specialized as TESTER or DEVELOPER based on what role they play in the company.

Aggregation
In aggregation, the relation between two entities is treated as a single entity. The
relationship together with its corresponding entities is aggregated into a higher-
level entity.

For example: the Center entity offering the Course entity acts as a single entity in
a relationship with another entity, Visitor. In the real world, if
a visitor visits a coaching center, he will never enquire about the course alone or
just about the center; he will enquire about both together.
UNIT 2
Relational Model in DBMS
The relational model represents data as tables with columns and rows. Each row is
known as a tuple, and each column of the table has a name, called an attribute.

Domain: It contains a set of atomic values that an attribute can take.

Attribute: It contains the name of a column in a particular table. Each attribute
takes its values from a domain.

Relational instance: In the relational database system, the relational instance is


represented by a finite set of tuples. Relation instances do not have duplicate
tuples.

Relational schema: A relational schema contains the name of the relation and
name of all columns or attributes.

Relational key: In the relational key, each row has one or more attributes. It can
identify the row in the relation uniquely.

Roll_no name Address


101 Sonali Bhopal
102 Twinkle Indore
103 Darshan Mumbai
o In the given table, ROLL_NO, Name and Address are the attributes.
o The instance of schema STUDENT has 3 tuples.

Properties of Relations

o The name of a relation is distinct from all other relations.
o Each relation cell contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o No tuple is duplicated.
o Tuples may appear in any order.
KEY CONCEPTS

Keys play an important role in a relational database.

A key is used to uniquely identify any record or row of data in a table. It is also
used to establish and identify relationships between tables.

For example, ID is used as a key in the STUDENT table because it is unique for each
student. In a PERSON table, passport_number, license_number and SSN are keys,
since they are unique for each person.

Types of Keys
Primary Key:

A primary key can be defined as the candidate key chosen
by the database designer as the principal means of identifying entities within an
entity set.
 It is a unique key.
 It can identify exactly one tuple (record) at a time.
 It has no duplicate values; all its values are unique.
 It cannot be NULL.
 A primary key is not necessarily a single column; more than one
column can together form the primary key of a table.
Example: In a STUDENT table
(ROLL_NO, NAME, Registration_no)
Primary key = P1 -> Roll_no
Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a
candidate key.
 It is the minimal set of attributes that can uniquely identify a record.
 It must contain unique values.
 In SQL, a candidate key other than the primary key may contain NULL values.
 Every table must have at least one candidate key.
For example: In a STUDENT table with attributes (ROLL_NO, NAME,
Registration_no)
Candidate Key = C1 = ROLL_NO
C2 = Registration_no

Foreign Key

A foreign key is a column whose values are the same as the primary key of
another table.
 It combines two or more relations (tables) at a time.
 It acts as a cross-reference between the tables.
 A foreign key is a column of one table used to point to the primary key
of another table.
(TABLE 1)                        (TABLE 2)
ROLL_NO  NAME  ID (FK)           ID (PK)  BRANCH  ADDRESS
1        RIYA  123               123      CSE     DELHI
2        SIYA  111               111      CE      BHOPAL
3        PIYA  112               112      ME      INDORE
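A minimal SQL sketch of the two tables above with their key constraints (the data types are illustrative assumptions):

CREATE TABLE branch_detail (
    id      INT PRIMARY KEY,      -- (PK) of TABLE 2
    branch  VARCHAR(20),
    address VARCHAR(50)
);

CREATE TABLE student_detail (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(50),
    id      INT,
    FOREIGN KEY (id) REFERENCES branch_detail(id)   -- (FK) of TABLE 1
);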
Super key
A super key is a set of one or more attributes that, taken collectively, allow us
to identify an entity in the entity set uniquely.
For example: In a STUDENT table with attributes
(Roll_no, Name, Registration_no)
Super key = S1 -> Roll_no
S2 -> Registration_no
S3 -> (Roll_no, Registration_no)

Composite Key
Whenever a primary key consists of more than one attribute, it is known as a
composite key.
For example: In a STUDENT table with attributes
(ROLLNO, SID, NAME, BRANCH)
Composite key -> ROLLNO, SID

ROLLNO SID NAME BRANCH


1 23 Riya CSE
2 22 Priya CYBER
3 43 Tiya DS
4 55 Liya AIML

Integrity Constraint
Integrity constraints are a set of rules used to maintain the quality of information.
Integrity constraints ensure that data insertion, updating, and other processes are
performed in such a way that data integrity is not affected.

Types of Integrity Constraint

1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an
attribute. The data type of a domain includes string, character, integer, time, date,
currency, etc. The value of the attribute must come from the corresponding domain.

Example: If the AGE attribute is defined over the integer domain, a value such as 'A'
or 'twenty' violates the domain constraint.
2. Entity integrity constraints
o The entity integrity constraint states that a primary key value can't be null. This is because
the primary key value is used to identify individual rows in a relation, and if the primary
key had a null value, we could not identify those rows. A table can contain null values
in fields other than the primary key field.

Example: In an EMPLOYEE table, EMP_ID (the primary key) can never be NULL,
although a non-key column such as PHONE may be.

3. Referential Integrity Constraints

o A referential integrity constraint is specified between two tables. If a foreign key in Table 1
refers to the primary key of Table 2, then every value of the foreign key in Table 1
must either be null or be present in Table 2.

Example: If DEPT_ID in an EMPLOYEE table is a foreign key referencing a DEPARTMENT
table, every EMPLOYEE.DEPT_ID value must be NULL or match some DEPARTMENT.DEPT_ID.
4. Key constraints
o Keys are the attributes that are used to identify an entity within its entity set uniquely. An
entity set can have multiple keys, but out of these one key will be the primary key. A
primary key must contain unique values and must not contain a null value.

Example: In a STUDENT table, ROLL_NO can serve as the key, since no two
students share the same roll number.

Relational Algebra
Relational algebra came in 1970 and was given by Edgar F. Codd (the father of the
relational model). It is also known as a Procedural Query Language (PQL), as in PQL a
programmer/user has to mention two things: "What to Do" and "How to Do".

Suppose our data is stored in a database; then relational algebra is used to access the data
from the database.

First, we have to access the data; this is specified in the query as "What
to Do". But we also have to specify the method/procedure in the query, that is, "How to
Do", or how to access the data from the database.

Types of Relational Operations in DBMS


In Relational Algebra, we have two types of Operations.

 Basic Operations
 Derived Operations
Basic Operations
Six fundamental operations are mentioned below. The majority of data retrieval operations
are carried out by these.

Example => Student

ROLL_NO NAME AGE

101 A 23
102 B 22
103 C 21
104 D 18
105 E 17

Example => Student1

ROLL_NO NAME AGE

101 A 23
102 F 22
103 C 21
104 I 18
105 E 17
Select (σ)

The select operation is done by the selection operator, which is represented by "sigma" (σ). It is used
to retrieve the tuples (rows) from the table for which the given condition is satisfied. It is a unary
operator, meaning it requires only one operand.

The general format of the select operation is:

Notation: σ <select condition>(R)
Where σ represents the SELECTION operation,
R represents the RELATION, and
<select condition> is a logical formula.

The select condition uses relational operators like =, ≠, <, >.

Query 1: Give all information of the student whose roll no is 104.

Solution => σ Roll_no = 104(student)

Query 2: Find all information of the student whose name is C and whose age is 21.

Solution => σ name = "C" ∧ age = 21 (student)

(Here ∧ denotes AND; ∨ would denote OR.)
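For comparison, the equivalent SQL for these selections (a sketch; SQL is covered later in these notes):

SELECT * FROM student WHERE roll_no = 104;
SELECT * FROM student WHERE name = 'C' AND age = 21;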

Project (∏)

Project operation is done by Projection Operator which is represented by "pi"(∏). It is used to


retrieve certain attributes(columns) from the table. It is also known as vertical partitioning as
it separates the table vertically. It is also a unary operator.

The general format of the project operation is:

Notation: ∏ a(r) OR ∏ <attribute list>(r)
Where ∏ represents the PROJECTION operation,
r represents the RELATION, and
a is the attribute list.

Query 1: Find the student names in the student table.

Solution => ∏ name(student)

Query 2: Find the student names and ages.

Solution => ∏ name, age(student)
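The SQL equivalents, as a sketch (note that projection in relational algebra removes duplicate rows, which SQL only does with DISTINCT):

SELECT DISTINCT name FROM student;
SELECT DISTINCT name, age FROM student;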


Union (∪)

Union operation is done by Union Operator which is represented by "union"(∪). It is the


same as the union operator from set theory, i.e., it selects all tuples from both relations but
with the exception that for the union of two relations/tables both relations must have the
same set of Attributes. It is a binary operator as it requires two operands.
Notation: R ∪ S
Where R is the first relation
S is the second relation

If the relations don't have the same set of attributes, then the union of such relations
is not defined.

For the union operation to be valid, the following conditions must hold –

 R and S must have the same number of attributes.


 Attribute domains must be compatible.
 Duplicate tuples are automatically eliminated.

Query 1: Find the names of students across both tables.

Solution => ∏ name(student) ∪ ∏ name(student1)

SELECT name FROM student UNION SELECT name FROM student1;

Set Difference (-)

Set Difference as its name indicates is the difference between two relations (R-S). It is
denoted by a "Hyphen"(-) and it returns all the tuples(rows) which are in relation R but not in
relation S. It is also a binary operator.
Notation: R – S = tuples that are in R but not in S
Where R is the first relation and
S is the second relation.

The difference operation is not commutative

R - S is not equivalent to S - R

Just like union, the set difference also comes with the exception of the same set of attributes
in both relations.
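In standard SQL, set difference is written with EXCEPT (Oracle uses MINUS; older MySQL versions need a NOT IN workaround), for example:

SELECT name FROM student
EXCEPT
SELECT name FROM student1;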

Cross product (X)

Cartesian product is denoted by the "X" symbol. Let's say we have two relations R and S.
Cartesian product will combine every tuple(row) from R with all the tuples from S. I know it
sounds complicated, but once we look at an example, you'll see what I mean.
Notation: R X S
Where R is the first relation
S is the second relation

As we can see from the notation it is also a binary operator.

T1                    T2
A  B  C               C  D  E
1  2  3       x       2  2  3
2  1  2               3  3  2

(T1 X T2)

A  B  C  C  D  E
1  2  3  2  2  3
1  2  3  3  3  2
2  1  2  2  2  3
2  1  2  3  3  2
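The same Cartesian product in SQL, as a sketch (assuming tables t1 and t2 with the columns shown above):

SELECT * FROM t1 CROSS JOIN t2;
-- equivalently: SELECT * FROM t1, t2;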

Rename (ρ)

The rename operation is denoted by "rho" (ρ). As its name suggests, it is used to rename the
output relation. Rename is a unary operation on a single relation.
Notation: ρ (R, S)
Where R is the new relation's name and
S is the old relation's name.

Query 1: To rename the table student to employee and its attributes name, address, phone_no

ρ employee (name, address, phone_no) (student)

Derived Operations
Also known as extended operations, these operations can be derived from basic operations
and hence named Derived Operations. These include three operations: Join Operations,
Intersection operations, and Division operations.

Join Operations

Join Operation in DBMS are binary operations that allow us to combine two or more
relations.
They are further classified into two types: Inner Join, and Outer Join.
Intersection (∩)

Intersection operation is done by Intersection Operator which is represented


by "intersection"(∩). It is the same as the intersection operator from set theory, i.e., it selects
all the tuples which are present in both relations. It is a binary operator as it requires two
operands. Also, it eliminates duplicates.
Notation: R ∩ S
Where R is the first relation
S is the second relation

R = {1,2,3} AND S = {3,4}

(R ∩ S) = {3}

The INTERSECTION operation is both commutative and associative

Commutative law of intersection: R ∩ S = S ∩ R

Associative law of intersection: (R ∩ S) ∩ P = R ∩ (S ∩ P)
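Standard SQL expresses intersection with INTERSECT (not supported in some older MySQL versions), for example:

SELECT name FROM student
INTERSECT
SELECT name FROM student1;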

Division (÷)

Division Operation is represented by "division"(÷ or /) operator and is used in queries that


involve keywords "every", "all", etc.
Notation: R (X, Y)/S(Y)
Here,
R is the first relation from which data is retrieved.
S is the second relation that will help to retrieve the data.
X and Y are the attributes/columns present in relation.

Example => Student

S_id  C_id
S1    DBMS
S2    DBMS
S1    OOPM
S4    OOPM

Example => Course

C_id
DBMS
OOPM
The query is to return the S_id of students who are enrolled in every/ALL courses:

student(s_id, c_id) / course(c_id) = S1

S_id
S1
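SQL has no division operator, but a common formulation uses a double NOT EXISTS ("there is no course this student is not enrolled in"); a sketch for the tables above:

SELECT DISTINCT s.s_id
FROM student s
WHERE NOT EXISTS (
    SELECT c.c_id FROM course c
    WHERE NOT EXISTS (
        SELECT 1 FROM student s2
        WHERE s2.s_id = s.s_id AND s2.c_id = c.c_id
    )
);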

JOIN: A join is used to combine rows from 2 or more tables based on a related column.
OR

The join command joins two or more tables and creates a temporary joined view of the tables.

TYPES OF JOINS:

1) INNER JOIN/JOIN:
In SQL, INNER JOIN selects records that have matching values in both tables
as long as the condition is satisfied. It returns the combination of all rows from
both the tables where the condition satisfies.
SELECTED COLUMN

Syntax: Select column_name from table1 inner join table2 on table1.column_name =


table2.column_name;

Example:Select std1.sid, name, bid, bname from std1 inner join library on std1.sid =
library.sid;

Join all columns

Syntax: Select * from table1 inner join table2 on table1.column_name = table2.column_name;

Example: Select * from std1 inner join library on std1.sid = library.sid;

2) Left Outer Join/LEFT JOIN:


The SQL left join returns all the values from left table and the matching values from the right
table. If there is no matching join value, it will return NULL.

SELECTED COLUMN
Syntax: Select column_name from table1 left join table2 on table1.column_name =
table2.column_name;

Example: Select std1.sid, name, bid, bname from std1 left join library on std1.sid =
library.sid;

Join all columns

Syntax: Select * from table1 left join table2 on table1.column_name = table2.column_name;

Example: Select * from std1 left join library on std1.sid = library.sid;

3) RIGHT OUTER JOIN/RIGHT JOIN:


In SQL, RIGHT JOIN returns all the rows of the right table and
the matched values from the left table. If there is no match, it returns
NULL.

SELECTED COLUMN

Syntax: Select column_name from table1 right join table2 on table1.column_name =


table2.column_name;

Example: Select std1.sid, name, bid, bname from std1 right join library on std1.sid =
library.sid;
Join all columns

Syntax: Select * from table1 right join table2 on table1.column_name = table2.column_name;

Example: Select * from std1 right join library on std1.sid = library.sid;

4) Full OUTER JOIN/FULL JOIN:


In SQL, FULL JOIN is the result of a combination of both left and right outer joins. The joined
table has all the records from both tables and puts NULL in place of matches not found.
SELECTED COLUMN

Syntax: Select column_name from table1 LEFT join table2 on table1.column_name =


table2.column_name

UNION Select column_name from table1 right join table2 on table1.column_name =


table2.column_name;
Example: Select std1.sid, name, bid, bname from std1 LEFT join library on std1.sid =
library.sid

UNION Select std1.sid, name, bid, bname from std1 right join library on std1.sid =
library.sid;
Join all columns

Syntax: Select * from table1 LEFT join table2 on table1.column_name = table2.column_name
UNION
Select * from table1 RIGHT join table2 on table1.column_name = table2.column_name;

Example: Select * from std1 LEFT join library on std1.sid = library.sid
UNION
Select * from std1 right join library on std1.sid = library.sid;

Relational Calculus
Relational calculus is a non-procedural query language, and instead of algebra, it uses
mathematical predicate calculus. Relational Calculus in database management
system (DBMS) is all about "What to do?". Relational calculus does not tell us how to get the
results from the Database, but it just cares about what to do.
The theory of Relational calculus was introduced by computer scientist and
mathematician Edgar Codd.
It is found in two forms. These are
 Tuple relational calculus which was originally proposed by Codd in the year
1972 and
 Domain relational calculus which was proposed by Lacroix and Pirotte in the
year 1977.

Tuple Relational Calculus (TRC)

Tuple relational calculus is used for selecting those tuples that satisfy the given
condition. Tuple Relational Calculus in DBMS uses a tuple variable (t) that goes to each row of the
table and checks if the predicate is true or false for the given row. Depending on the given predicate
condition, it returns the row or part of the row.

The Tuple Relational Calculus Expression Syntax

{t | P(t)}

Where t is the tuple variable that runs over every Row, and P(t) is the predicate logic
expression or condition. The curly braces {} are used to indicate that the expression is a set of
tuples.

Table: Student

First_Name Last_Name Age


---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Let's write a relational calculus query.

Query to display the last name of those students whose age is greater than 30:
{ t.Last_Name | Student(t) AND t.age > 30 }

Domain Relational Calculus (DRC)


In domain relational calculus the records are filtered based on the domains.
Again we take the same table to understand how DRC works.
Table: Student

First_Name Last_Name Age


---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28
Query to find the first name and age of students whose age is greater than 27:

{<First_Name, Age> | <First_Name, Last_Name, Age> ∈ Student ∧ Age > 27}


Note:
The symbols used for logical operators are: ∧ for AND, ∨ for OR and ¬ for NOT.

Output:

First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28

TRIGGERS
A trigger is a coding block, written in SQL, that is tied to some
specific event which may happen in the database. If that event takes place, the
trigger performs the specific action for which it was designed. A trigger
is designed by the user but executes automatically. We can design multiple
triggers for multiple events. Triggers help maintain security and integrity, and also
help users access the database.
Normally a trigger is designed to execute when DML statements like insert,
update, and delete are performed.
For example, in a school management database, the school's fee is 5000. If a person enters
an amount less than or greater than this amount, a trigger will execute and
rewrite the value to 5000 automatically.
Basic parts of a trigger:
A trigger has three basic parts
1. A triggering event or statement
2. A trigger restriction
3. A trigger action
Syntax of a trigger
create trigger Trigger_name
(before | after)
[insert | update | delete]   (triggering event or statement)
on [table_name]
[for each row]
[trigger_body]   (trigger action)

The result will be visible in the final, updated view of the table.

Here:
create trigger: the statement that creates the trigger.
Trigger_name: the name of the trigger.

[insert | update | delete]: the operation on which you want the trigger to run.

(before | after): run before the operation or after the operation.

on [table_name]: the name of the table on which you want to put the trigger.

[for each row]: the trigger executes once for every row affected by the change.

[trigger_body]: the operation or action to perform, written in the allowed code
language.
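A minimal sketch of the school-fee example above, in MySQL trigger syntax; the table fees(student_id, amount) is hypothetical:

CREATE TRIGGER fix_fee
BEFORE INSERT ON fees
FOR EACH ROW
SET NEW.amount = 5000;   -- force every inserted fee to the fixed value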

Advantages of Triggers
1. Triggers can maintain the integrity of the data.
2. To perform a similar operation multiple times on a table, we can write a
trigger for that action, and afterwards the operation will execute by itself.

Disadvantages of Triggers
1. It is very tough to make changes to or stop a trigger. Most users can't
even tell why a trigger executed.
2. If we don't understand its mechanism, we can't stop it from executing.
3. A trigger is executed every time an operation is performed on its
table. This can increase the load on the database.
Types of triggers:
1. Row Triggers: Apply on rows
2. Statement Triggers: Apply on statement
3. Before Triggers: Before the operation or event
4. AFTER Triggers: after the operation or event
5. Cascading triggers: a trigger in a trigger

SQL QUERIES:

Data Definition Language (DDL)


o DDL changes the structure of the table like creating a table, deleting
a table, altering a table, etc.
o All the commands of DDL are auto-committed, which means they
permanently save all the changes in the database.

Here are some commands that come under DDL:

o CREATE
o ALTER
o DROP
o TRUNCATE

a. CREATE: It is used to create a new table in the database.

Syntax:
CREATE TABLE TABLE_NAME (COL_NAME DATATYPES(size));

b. DROP: It is used to delete both the structure and record stored in the
table.

Syntax: DROP TABLE table_name;

c. ALTER: It is used to alter the structure of the database. This change could
be either to modify the characteristics of an existing attribute or probably to
add a new attribute.
Syntax: ALTER TABLE table_name ADD column_name;

d. TRUNCATE: It is used to delete all the rows from the table and free the
space containing the table.

Syntax: TRUNCATE TABLE table_name;
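A short illustrative DDL sequence (the student table and its columns are assumptions for the example):

CREATE TABLE student (roll_no INT, name VARCHAR(50));
ALTER TABLE student ADD marks INT;   -- add a new column
TRUNCATE TABLE student;              -- remove all rows, keep the structure
DROP TABLE student;                  -- remove the table entirely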

2. Data Manipulation Language


o DML commands are used to modify the database. They are responsible for
all forms of changes in the database.
o DML commands are not auto-committed, which means they don't
permanently save changes to the database on their own; they can be
rolled back.

Here are some commands that come under DML:

o INSERT
o UPDATE
o DELETE

a. INSERT: The INSERT statement is a SQL query. It is used to insert data


into the row of a table.
Syntax:
INSERT INTO TABLE_NAME (col1, col2, col3, ..., colN) VALUES (value1, value2, value3, ..., valueN);
OR
INSERT INTO TABLE_NAME VALUES (value1, value2, value3, ..., valueN);
b. UPDATE: This command is used to update or modify the value of a
column in the table.

Syntax:
UPDATE table_name SET [column_name1 = value1, ..., column_nameN = valueN] [WHERE condition];

c. DELETE: It is used to remove one or more rows from a table.

Syntax: DELETE FROM table_name WHERE condition;
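A brief worked sequence, assuming the student table sketched in the DDL example above:

INSERT INTO student (roll_no, name, marks) VALUES (1, 'Ravi', 80);
UPDATE student SET marks = 95 WHERE roll_no = 1;   -- modify the row
DELETE FROM student WHERE roll_no = 1;             -- remove the row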


Data Query Language
DQL is used to fetch the data from the database.

It uses only one command:

o SELECT

a. SELECT: This is the same as the projection operation of relational


algebra. It is used to select the attribute based on the condition described by
WHERE clause.

Syntax: SELECT expressions FROM TABLES WHERE conditions;

SQL FUNCTIONS:
For doing operations on data, SQL has many built-in functions, grouped into
broad categories. The aggregate functions, the category most commonly used, are
described below.

SQL Aggregate Functions


o SQL aggregation function is used to perform the calculations on multiple rows
of a single column of a table. It returns a single value.
1) COUNT FUNCTION
o The COUNT function is used to count the number of rows in a database table. It can work
on both numeric and non-numeric data types.
o The COUNT function with COUNT(*) returns the count of all the rows in a specified
table. COUNT(*) includes duplicates and NULLs.

SYNTAX : SELECT COUNT(COLUMN_NAME) FROM TABLE_NAME ;

2) SUM Function
The SUM function is used to calculate the sum of all values in the selected column. It works
on numeric fields only.

Syntax:
SELECT Sum (column_name) from table_name;

3) AVG function
The AVG function is used to calculate the average value of the numeric type. AVG
function returns the average of all non-Null values.

Syntax:
SELECT Avg (column_name) from table_name;

4) MAX Function
MAX function is used to find the maximum value of a certain column. This function
determines the largest value of all selected values of a column.

Syntax:
SELECT Max (column_name) from table_name;
5) MIN Function
MIN function is used to find the minimum value of a certain column. This function
determines the smallest value of all selected values of a column.

Syntax:
SELECT Min (column_name) from table_name;
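Illustrative calls, assuming the student marks table from Unit 1 (with a marks_maths column):

SELECT COUNT(*) FROM student;                            -- number of rows
SELECT SUM(marks_maths) FROM student;                    -- total marks
SELECT AVG(marks_maths) FROM student;                    -- average marks
SELECT MAX(marks_maths), MIN(marks_maths) FROM student;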
UNIT 3:

FUNCTIONAL DEPENDENCIES

Functional dependency is just like a constraint. It determines the relation between
two or more entities or attributes in a relation. In simple words, it shows the
properties of an entity or attribute.
If an entity depends on another entity for its identification, this is known as
functional dependency. Take an example:

Sid Name
01 Raj
02 Raj
Table Student
In this student table, both students are named "Raj", but we know they are
two different students because each has a separate and unique Sid (Student
ID). So in this table the "Name" attribute is functionally dependent on the "Sid"
attribute. If we remove the "Sid" attribute, we can't tell whether the two
entries are the same or different.
Also, with the help of "Sid" we can retrieve a student's other details like address,
results, mobile number, school name, etc.
So, we can say that a functional dependency is a constraint through which, given an
attribute A, we can find the values of another attribute B; that
attribute A must be unique or a key attribute.
It is denoted as A → B, where A is an attribute or a set of attributes that is capable
of determining the value of attribute B. (B may be a set of attributes as well).
The left side of FD (functional dependencies) is known as a determinant, the
right side of the relation is known as a dependent.
So, in A → B, A is determinant and B is dependent.

TYPES OF FUNCTIONAL DEPENDENCIES:

1. Trivial Functional Dependency


2. Non-Trivial Functional Dependency

1. TRIVIAL FUNCTIONAL DEPENDENCY:

X → Y is a trivial functional dependency if Y is a subset of X.
Dependencies like A → A and B → B are also trivial.
Sid Name
01 Raj
02 Raj
In this table Sid → Name means that with the help of Sid we can find out the name of
the student.
Here {Sid, Name} is a set S.
{Sid, Name} → Sid is a trivial functional dependency because Sid is a
subset of the set S, and Name is also a subset of the set S.
Sid → Sid and Name → Name are likewise trivial functional dependencies.
2. NON-TRIVIAL FUNCTIONAL DEPENDENCY
X → Y is a non-trivial functional dependency if Y is not a subset of X.
When X intersection Y is empty, i.e., the two sets have no common
attributes, X → Y is called completely non-trivial; the two sets are not
related to each other.
Sid Name
01 Raj
02 Raj
student1
Name Class
Raj 11
Raj 12
Student2
Here {Sid, Name} is a set S1 from table student1, and {Name, Class} is a set S2
from table student2.
So Sid and Name are subsets of set S1, but Class is a subset of set S2, meaning
Class is not a subset of set S1.
Here Sid → Name, and Name → Class.
In simple words, we can find the name of any student directly with the help of Sid,
because both come from the common set S1, but
we can't find the class of any student directly with the help of Sid, because they
come from different sets.
PRIME AND NON-PRIME ATTRIBUTES, AND
ARMSTRONG'S AXIOMS
PRIME AND NON-PRIME ATTRIBUTES
In any database table or relation, those attributes which can become part of a
candidate key are Prime attributes. These attributes have unique values. These
Prime attributes may also become Part of primary key.
Remaining all attributes of that table are known as non-Prime attributes.
In any database table or relation those attributes which could not become part of
a candidate key are non-Prime attributes. These attributes don’t have unique
values.

Sid Sname Mobile Number Class


Here in this table Sid and Mobile Numbers are Prime attributes and Sname and
Class are non-Prime Attributes

ARMSTRONG’S AXIOMS/INFERENCE RULES


Introduced by William W. Armstrong, Armstrong axioms refer to the complete
set of inference rules or axioms. These rules are used to find out Closure of a set
of Functional Dependencies.

Mainly three rules or axioms are used, as given below:

1. Axiom of reflexivity: if Y is a subset of X, then X → Y.
2. Axiom of augmentation: if X → Y, then XZ → YZ for any attribute set Z.
3. Axiom of transitivity: if X → Y and Y → Z, then X → Z.
DERIVED OR SECONDARY RULES
Further rules can be derived from the axioms, such as union (if X → Y and X → Z,
then X → YZ), decomposition (if X → YZ, then X → Y and X → Z), and
pseudotransitivity (if X → Y and WY → Z, then WX → Z).

CLOSURE OF A “SET OF FUNCTIONAL DEPENDENCIES”


CLOSURE OF A SET OF FUNCTIONAL DEPENDENCIES:
The closure of a set of FDs is the set of all possible FDs that can be derived from a
given set of FDs; it is also called the complete set of FDs for a given
relation R.
If the set is F, then its closure is written F+. Similarly, the closure of an
attribute set X is written X+.
How to find closure of set of FD for a given relation R?
For this we follow these steps given below:
1. Add the attributes which are present on Left Hand Side in the original
functional dependency.
2. Now, add the attributes present on the Right-Hand Side of the functional
dependency.
3. Now check the other attributes that can be derived from the other given
functional dependencies.
4. Repeat this process till all the possible attributes are added in the closure.
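A small worked example (illustrative, not from the original notes): let R(A, B, C) with F = {A → B, B → C}. To find A+, start with {A}; A → B adds B, giving {A, B}; then B → C adds C, giving {A, B, C}. Nothing more can be added, so A+ = {A, B, C}, which means A determines every attribute of R and is a candidate key.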
Normalization:
Normalization is the process of organizing the data and the attributes of a database.
It is performed to reduce the data redundancy in a database and to ensure that data
is stored logically. Data redundancy in DBMS means having the same data but at
multiple places. It is necessary to remove data redundancy because it causes
anomalies in a database which makes it very hard for a database administrator to
maintain it.

Why Do We Need Normalization?


Normalization is used to reduce data redundancy. It provides a method to
remove the following anomalies from the database and bring it to a more
consistent state:

A database anomaly is a flaw in the database that occurs because of poor


planning and redundancy.

1. Insertion anomalies: This occurs when we are not able to insert data into
a database because some attributes may be missing at the time of
insertion.
2. Updation anomalies: This occurs when the same data item is repeated
in several places; updating one copy but not the others leaves the data inconsistent.
3. Deletion anomalies: This occurs when deleting one part of the data
deletes the other necessary information from the database.

Each step has its own functioning and importance.


1NF
 A relation is in 1NF if
 All the contained values are atomic values.
2NF
 A relation will be in 2NF if it is in 1NF and
 All non-key attributes are fully functionally dependent on the key attribute
or primary key
3NF
 A relation will be in 3NF if it is in 2NF and
 No transitive dependency exists between attributes

BCNF
 It is a strict version of 3NF
4NF
 A relation will be in 4NF if
 It is in Boyce-Codd normal form and
 It holds no multi-valued dependency
5NF
 A relation is in 5NF if it is in 4NF and
 It holds no join dependency between relations.

Functional dependency is a relationship that exists between two sets of


attributes of a relational table where one set of attributes can determine the
value of the other set of attributes. It is denoted by X -> Y, where X is called
a determinant and Y is called dependent.

There are various levels of normalization.

First Normal Form (1NF)


A relation is in 1NF if every attribute is a single-valued attribute or it does not
contain any multi-valued or composite attribute, i.e., every attribute is an atomic
attribute. If there is a composite or multi-valued attribute, it violates the 1NF.
To solve this, we can create a new row for each of the values of the multi-
valued attribute to convert the table into the 1NF.

Let's take an example of a relational table <StudentCourseDetail> that contains
the details of students and the courses they take.

<StudentCourseDetail>
This is a simple database table and is not normalized yet.
Let's convert this table into a 1NF table.

<StudentCourseDetail (1NF)>

As we can see, in the normalized table all the attributes which had multiple values,
such as Course Id, Course Name and Contact Number for a single entity, are now
separated into individual rows. So all the attributes have atomic values.
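Since the original tables were images, here is a reconstructed illustration; the attribute names follow the surrounding text, and the sample values are hypothetical:

Before 1NF:
SID  Name   Course_Id  Course_Name  Contact_Number
1    Ravi   C1, C2     DBMS, OOPM   9876512345, 9876554321

After 1NF:
SID  Name   Course_Id  Course_Name  Contact_Number
1    Ravi   C1         DBMS         9876512345
1    Ravi   C2         OOPM         9876554321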

Second Normal Form (2NF)


The normalization of 1NF relations to 2NF involves the elimination of partial
dependencies. A partial dependency in DBMS exists when any non-prime
attributes, i.e., an attribute not a part of the candidate key, is not fully
functionally dependent on one of the candidate keys.
For a relational table to be in second normal form, it must satisfy the following
rules:

1. The table must be in first normal form.


2. It must not contain any partial dependency, i.e., all non-prime attributes
are fully functionally dependent on the primary key.

If a partial dependency exists, we can divide the table to remove the partially
dependent attributes and move them to some other table where they fit in well.

Let us take an example of the following <StudentCourseDetail> table to
understand what partial dependency is and how to normalize the table to the
second normal form:

< StudentCourseDetail >

Here in this table the Course attribute has 2 functional dependencies on parts of the
composite key (SID, CID):
SID → Course
CID → Course
These are partial dependencies. We should remove them, so the resultant tables become as given
below:

<StudentDetail> <CourseDetail>

So now we have two tables instead of one, but these tables are interconnected with the help of
key attributes; for example, CID is present in both tables.

Third Normal Form (3NF)

A relation will be in 3NF if
it is in 2NF and
no transitive dependency exists between attributes.
Equivalently, for every non-trivial functional dependency X → Y, either

X is a super key
or
Y is a prime attribute, meaning each element of Y is part of some candidate
key.

A transitive dependency arises when A → B and B → C hold, so that C depends
on the key A only indirectly through B.


Let's take a table which already has the 2NF properties.
< StudentAddressDetail>
SID NAME CITY STATE PINCODE
1 Riya Bhopal MP 456212
2 Priya Mohali Punjab 342133
3 Ram Mumbai MH 892132
4 Laksh Mohali Punjab 342133
5 Soni Mumbai MH 892132

Find out the functional dependencies:

Sid → Name

Sid → City

Sid → State

State → Pincode

Here we have found one transitive dependency: Sid → State and
State → Pincode, so Sid → Pincode holds only transitively. We have to remove this.
So now the resultant tables are as follows:

<StudentDetail>
SID  NAME   PINCODE
1    Riya   456212
2    Priya  342133
3    Ram    892132
4    Laksh  342133
5    Soni   892132

<AddressDetail>
PINCODE  CITY    STATE
456212   Bhopal  MP
342133   Mohali  Punjab
892132   Mumbai  MH
BOYCE CODD'S NORMAL FORM OR BCNF
BCNF is a strict version of 3NF,
because in 3NF, for every dependency
X → Y
either X is a super key
or
Y is a prime attribute, meaning each element of Y is part of some candidate key.
But in BCNF, for every dependency
X → Y
X must be a super key.
This means all the attributes must depend on a super key only.
Let's take a table which already has the 3NF properties.
Here in this table, Salutation_Id is dependent on Name. Name might be considered a
candidate key, but names may repeat, so Name is not unique. Salutation_Id should
therefore depend on SID only, so we have to remove the dependency
Name → Salutation_Id
and make it
SID → Salutation_Id
So now the resultant tables are as follows:

Multivalued Dependency
o Multivalued dependency occurs when two attributes in a table are independent of each
other but, both depend on a third attribute.
o A multivalued dependency consists of at least two attributes that are dependent on a
third attribute that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors (white
and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White


Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of
each other.

In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The
representation of these dependencies is shown below:

1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR

This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and "BIKE_MODEL


multidetermined COLOR".

4th Normal Form (4NF)

o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A, multiple values of B exist, then the
relation will be a multi-valued dependency.

Example

STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing

34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent entities; hence
there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 takes two courses, Computer and Math, and has two
hobbies, Dancing and Singing. So there is a multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables.

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics

STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
5th NORMAL FORM OR 5NF

o A relation is in 5NF if it is in 4NF, contains no join dependency, and all joins are lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
o 5NF is also known as Project-Join Normal Form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER
Maths John Semester 1
Hindi John Semester 1
Hindi Akash Semester 2
Maths Rashmi Semester 1

In the above table, John takes both the Maths and Hindi classes for Semester 1, but he doesn't take any class in Semester 2. In this case, a combination of all these fields is required to identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it, so we would have to leave LECTURER and SUBJECT as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank. So, to bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:

P1

SUBJECT LECTURER
Maths John
Hindi John
Hindi Akash
Maths Rashmi
P2

LECTURER SEMESTER
John Semester 1
Akash Semester 2
Rashmi Semester 1

P3

SEMESTER SUBJECT
Semester 1 Maths
Semester 1 Hindi
Semester 2 Hindi

Decomposition in DBMS
Decomposition in Database Management System is to break a relation into
multiple relations to bring it into an appropriate normal form. It helps to
remove redundancy, inconsistencies, and anomalies from a database. The
decomposition of a relation R in a relational schema is the process of replacing
the original relation R with two or more relations in a relational schema. Each
of these relations contains a subset of the attributes of R and together they
include all attributes of R.
Introduction
If a relation is not properly decomposed, it may lead to problems such as information loss. There are two types of decomposition, lossless and lossy, both discussed below.

Rules for Decomposition


Whenever we decompose a relation, there are certain properties that must be
satisfied to ensure no information is lost while decomposing the relations. These
properties are:

1. Lossless Join Decomposition.


2. Dependency Preserving.

Lossless Join Decomposition


A lossless join decomposition ensures two things:

 No information is lost while decomposing the original relation.

 If we join the decomposed sub-relations back, we obtain the same relation that was decomposed.

We can follow certain rules to ensure that the decomposition is a lossless join decomposition. Let's say we have a relation R and we decomposed it into R1 and R2; then the rules are:

1. The union of attributes of both the sub relations R1 and R2 must contain
all the attributes of original relation R.

R1 ∪ R2 = R
2. The intersection of attributes of both the sub relations R1 and R2 must
not be null, i.e., there should be some attributes that are present in both
R1 and R2.

R1 ∩ R2 ≠ ∅

3. The intersection of attributes of both the sub relations R1 and R2 must be


the super key of R1 or R2, or both R1 and R2.

R1 ∩ R2 = Super key of R1 or R2
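
These three rules can be checked mechanically. Below is a minimal Python sketch (illustrative only; the helper names are hypothetical) that tests a binary decomposition using the standard attribute-closure algorithm, with functional dependencies given as (left, right) pairs of attribute sets:

def closure(attrs, fds):
    # Standard attribute-closure algorithm: apply FDs until nothing changes.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

def is_lossless(R, R1, R2, fds):
    if R1 | R2 != R:                  # Rule 1: the union must cover R
        return False
    common = R1 & R2
    if not common:                    # Rule 2: the intersection must be non-empty
        return False
    c = closure(common, fds)          # Rule 3: the common attributes must be a
    return R1 <= c or R2 <= c         # super key of R1 or of R2

# Example: R(A, B, C, D) with A -> B and C -> D, decomposed into (A, B, C) and (C, D).
fds = [({'A'}, {'B'}), ({'C'}, {'D'})]
print(is_lossless(set('ABCD'), set('ABC'), set('CD'), fds))   # True: C is a key of (C, D)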

First, we decompose a large table into small appropriate tables, then apply
natural join to reconstruct the original table.

This is a student database relational table:

Student Details

We can decompose it into two simple tables as given below:

Student Subject Details:

Sid Name (Not Null) Subject (Not Null)
1 Raj English
2 Jyoti Home Science
3 Vikash Maths
1 Harsh Maths
3 Ajay Science
Student Personal Details:

Sid Mobile Address
1 65468154 51, Vaishalinagar
2 87668545 4a, Sukhsagar
3 26865948 H7, Civil Lines
1 Null R32, Gokul Villa
3 86516529 26, Karoli

If we want to see a common table then we can apply Natural JOIN between
both tables like this:

Student Subject Details ⋈ Student Personal Details

Sid Name (Not Null) Subject (Not Null) Mobile Address
1 Raj English 65468154 51, Vaishalinagar
2 Jyoti Home Science 87668545 4a, Sukhsagar
3 Vikash Maths 26865948 H7, Civil Lines
1 Harsh Maths Null R32, Gokul Villa
3 Ajay Science 86516529 26, Karoli

In this operation, no data loss occurs, so this is a good option to consider for
decomposition.

Lossy Decomposition
In a lossy decomposition, one or more of these conditions fail, and we are not able to recover the complete information present in the original relation.

Let’s take an example:


Student Details

Sid Name (Not Null) Subject (Not Null) Mobile Address
1 Raj English 65468154 51, Vaishalinagar
2 Jyoti Home Science 87668545 4a, Sukhsagar
3 Vikash Maths 26865948 H7, Civil Lines
1 Harsh Maths Null R32, Gokul Villa
3 Ajay Science 86516529 26, Karoli


If we divide this student details table into two sections as given below:

Student Subject Details:

Sid Name (Not Null) Subject (Not Null)
1 Raj English
2 Jyoti Home Science
3 Vikash Maths
1 Harsh Maths
3 Ajay Science
Student Personal Details:

Mobile Address
65468154 51, Vaishalinagar
87668545 4a, Sukhsagar
26865948 H7, Civil Lines
Null R32, Gokul Villa
86516529 26, Karoli


In this Student Personal Details table, the SID column is not included, so we no longer know whom these mobile numbers and addresses belong to.

So always decompose a table in such a manner that the data may be easily
reconstructed and retrieved.
Dependency Preserving
The second property of lossless decomposition is dependency preservation which says that
after decomposing a relation R into R1 and R2, all dependencies of the original relation R
must be present either in R1 or R2 or they must be derivable using the combination of
functional dependencies present in R1 and R2.

Let’s understand this from the same example above:

Theory
Consider a schema R(A,B,C,D) and functional dependencies A->B and C->D which
is decomposed into R1(AB) and R2(CD)

This decomposition is dependency preserving decomposition because

 A->B can be ensured in R1(AB)


 C->D can be ensured in R2(CD)

Example

Let a relation R (A, B, C, D) and set a FDs F = {A -> B , A -> C , C -> D} are given.
A relation R is decomposed into –

R1 = (A, B, C) with FDs F1 = {A -> B, A -> C}, and


R2 = (C, D) with FDs F2 = {C -> D}.
F' = F1 ∪ F2 = {A -> B, A -> C, C -> D}
so, F' = F.
And so, F'+ = F+.

Thus, the decomposition is dependency preserving decomposition.
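
This check can also be mechanized. The following minimal Python sketch (helper names are hypothetical) follows the document's method: form F' = F1 ∪ F2 and verify that every dependency X → Y of F is derivable from F', i.e., that Y is contained in the closure of X under F':

def closure(attrs, fds):
    # Standard attribute-closure algorithm, as in the lossless-join sketch above.
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

def preserves_dependencies(F, F1, F2):
    Fp = F1 + F2                                   # F' = F1 U F2
    return all(right <= closure(left, Fp) for left, right in F)

F  = [({'A'}, {'B'}), ({'A'}, {'C'}), ({'C'}, {'D'})]
F1 = [({'A'}, {'B'}), ({'A'}, {'C'})]              # FDs of R1(A, B, C)
F2 = [({'C'}, {'D'})]                              # FDs of R2(C, D)
print(preserves_dependencies(F, F1, F2))           # True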



UNIT 4

A transaction is a set of operations used to perform a logical unit of work. A transaction changes data in a database, which can be done by inserting new data, updating existing data, or deleting data that is no longer required. There are certain transaction states which tell the user about the current condition of a database transaction and what further steps are to be followed for its processing.

Operations in Transaction

A certain set of operations takes place when a transaction is done that is used to
perform some logical set of operations. For example: When we go to withdraw
money from ATM, we encounter the following set of operations:

1. Transaction Initiated
2. You have to insert an ATM card
3. Select your choice of language
4. Select whether savings or current account
5. Enter the amount to withdraw
6. Entering your ATM pin
7. Transaction processes
8. You collect the cash
9. You press finish to end transaction

The above mentioned is the set of operations done by you. But in the case of a transaction in a DBMS, there are three major operations that are used for a transaction to get executed in an efficient manner. These are:

1. Read/Access Data
2. Write/Change Data
3. Commit

Read(X): Read operation is used to read the value of X from the database and
stores it in a buffer in main memory.

Write(X): Write operation is used to write the value back to the database from
the buffer.

Let's take the example of a debit transaction on an account, which consists of the following operations:

1. R(X);
2. X = X - 500;
3. W(X);

Let's assume the value of X before starting of the transaction is 4000.

o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will
contain 3500.
o The third operation will write the buffer's value to the database. So X's final
value will be 3500.

But it may be possible that, because of a hardware, software or power failure, the transaction fails before finishing all the operations in the set.

For example: if in the above transaction the debit transaction fails after executing operation 2, then X's value will remain 4000 in the database, which is not acceptable by the bank.

To solve this problem, we have two important operations:

Commit: It is used to save the work done permanently.

Rollback: It is used to undo the work done.
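
To make these operations concrete, here is a minimal Python sketch (an illustrative simulation, not any particular DBMS's API) of the debit example above: writes go to a buffer first, and only commit copies the buffer into the "database", while rollback discards it.

db = {'X': 4000}          # the database
buffer = {}               # main-memory buffer used by the transaction

def read(item):
    buffer[item] = db[item]          # R(X): copy the value into the buffer

def write(item, value):
    buffer[item] = value             # W(X): change the buffered value only

def commit():
    db.update(buffer)                # make the buffered changes permanent
    buffer.clear()

def rollback():
    buffer.clear()                   # undo: discard uncommitted changes

read('X')
write('X', buffer['X'] - 500)
commit()
print(db['X'])   # 3500; had we called rollback() instead, db['X'] would still be 4000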

Transaction States in DBMS

During its lifetime, a transaction goes through several states. These states inform the system about the current state of the transaction and also tell the user how to plan further processing of the transaction. These states decide the fate of the transaction, i.e., whether it will commit or abort.

The ROLLBACK statement undoes the changes made by the current transaction. A transaction cannot undo changes after COMMIT has been executed.

Following are the different types of transaction States:

 Active State: When the operations of a transaction are running then the
transaction is said to be active state. If all the read and write operations
are performed without any error then it progresses to the partially
committed state, if somehow any operation fails, then it goes to a state
known as failed state.
 Partially Committed: After all the read and write operations are
completed, the changes which were previously made in the main memory
are now made permanent in the database, after which the state will
progress to committed state but in case of a failure it will go to the failed
state.
 Failed State: If any operation during the transaction fails due to some software or hardware issue, then the transaction goes to the failed state. A failure during a transaction cannot be allowed to make a permanent change to the data in the database, so the changes made to the local-memory data are rolled back to the previous consistent state.
 Aborted State: If the transaction fails during its execution, it goes
from failed state to aborted state and because in the previous states all the
changes were only made in the main memory, these uncommitted
changes are either deleted or rolled back. The transaction at this point can
restart and start afresh from the active state.
 Committed State: If the transaction completes all its operations successfully, all the changes made during the partially committed state are permanently stored, the transaction is said to be completed, and it can progress to finally terminate in the terminated state.
 Terminated State: If the transaction gets aborted after roll-back or the
transaction comes from the committed state, then the database comes to a
consistent state and is ready for further new transactions since the
previous transaction is now terminated.

Properties of Transaction in DBMS

There are four major properties that are vital for a transaction to be successful.
These are used to maintain state consistency in the database, both before and
after the transaction. These are called ACID properties.

1. Atomicity: This property means that either the transaction takes place
completely at once or doesn’t happen at all. There is no middle option,
i.e., transactions do not occur partially. Each transaction is considered as
one single step which either runs completely or is not executed at all.
2. Consistency: This property means that the integrity constraints of a
database are maintained so that the database is consistent before and after
the transaction. It refers to the correctness of a database.
3. Isolation: This property means that multiple transactions can occur
concurrently without causing any inconsistency to the database state.
These transactions occur independently without any external interference.
Changes that occur in a particular transaction are not visible/ accessible to
any other transaction until that particular change in that transaction has
been committed.
4. Durability: This property ensures that once the transaction has
completed execution, the updates and modifications to the database are
stored in and written to disk and they remain intact even if a system
failure occurs. These updates become permanent and are stored in the
non-volatile memory.

Schedule
A schedule is a series of operations from one or more transactions. It is used to preserve the order of the operations within each individual transaction.

Serial Schedule

As the name says, all the transactions are executed serially, one after the other. In a serial schedule, a transaction does not start execution until the currently running transaction finishes execution. This type of transaction execution is also known as non-interleaved execution. Serial schedules are always recoverable, cascadeless, strict and consistent. A serial schedule always gives the correct result.

Consider two transactions T1 and T2 shown above, which perform some operations. If there is no interleaving of operations, then there are only two possible outcomes: either execute all T1 operations followed by all T2 operations, or execute all T2 operations followed by all T1 operations. In the above figure, the schedule shows the serial schedule where T1 is followed by T2, i.e., T1 -> T2, where R(A) means reading some data item 'A' and W(B) means writing/updating some data item 'B'. Therefore, for the above 2 transactions, the total number of possible serial schedules = 2 (in general, for n transactions there are n! serial schedules).

Non-serial Schedule (Parallel Schedule)


In a non-serial Schedule, multiple transactions execute
concurrently/simultaneously, unlike the serial Schedule, where one transaction
must wait for another to complete all its operations. In a non-serial schedule, the other transaction proceeds without the completion of the previous transaction. All the transaction operations are interleaved or mixed with each other.

In this Schedule, there are two transactions, T1 and T2, executing concurrently.
The operations of T1 and T2 are interleaved. So, this Schedule is an example of
a Non-Serial Schedule.

Non-serial schedules are further categorized into serializable and non-serializable schedules.

SERIALIZABILITY
A schedule is serializable if it produces an output equivalent to that of some serial schedule of the same 'n' transactions.
Serializability helps preserve the consistency and concurrency of a database.
There are two methods widely used to check serializability: conflict equivalence and view equivalence.
A non-serial schedule is said to be serializable if it is conflict-equivalent or view-equivalent to a serial schedule.

Conflict Serializability

A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations. A pair of operations is conflicting if all of the following conditions hold:

1. The two operations belong to separate transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.

In this schedule, Write(A)/Read(A) and Write(B)/Read(B) are conflicting operations, because all the above conditions hold for them. Thus, by swapping the non-conflicting second pair of read/write operations on data item 'A' and the first pair of read/write operations on data item 'B', this non-serial schedule can be converted into a serial schedule. Therefore, it is conflict serializable.

Testing for Serializability

Serialization Graph is used to test the Serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as precedence graph.

G = (V, E),

where V is a set of vertices and E is a set of edges. The set of vertices contains all the transactions participating in the schedule. The set of edges contains all edges Ti -> Tj for which one of the following three conditions holds:

1. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
2. Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
3. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).

For example:

Precedence graph for schedule S1:

The precedence graph for schedule S1 contains a cycle that's why Schedule S1
is non-conflict serializable.

For example:

Precedence graph for schedule S2:

The precedence graph for schedule S2 contains no cycle that's why Schedule S2
is Conflict serializable.

View Serializability

A schedule is view serializable if it is view-equivalent to a serial schedule. If a schedule is conflict serializable, then it is also view serializable. A schedule that is view serializable but not conflict serializable contains blind writes.

S (before swapping):

T1: R(A) //100
T1: A = A - 40
T1: W(A) //60
T2: A = A - 40
T2: W(A) //20
T3: A = A - 20
T3: W(A) //0

S1 (after swapping) contains the same operations arranged as the serial schedule T1 → T2 → T3.

The final write operation in S is done by T3, and in S1 it is also done by T3. So, S and S1 are view equivalent.

The schedule S1 satisfies all three view-equivalence conditions, so we don't need to check another schedule. Hence, the view-equivalent serial schedule is:
T1 → T2 → T3

Numerical Question based on Conflict Serializability and View


Serializability.

Question-01:
Check whether the given schedule S is conflict serializable or not-
S : R1(A) , R2(A) , R1(B) , R2(B) , R3(B) , W1(A) , W2(B)

Solution-

Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
 R2(A), W1(A) (T2 → T1)
 R1(B), W2(B) (T1 → T2)
 R3(B), W2(B) (T3 → T2)
Step-02:
Draw the precedence graph-

 Clearly, there exists a cycle in the precedence graph.


 Therefore, the given schedule S is not conflict serializable.
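
This test is easy to mechanize. Below is a minimal Python sketch (illustrative, with hypothetical helper names) that builds the precedence-graph edges from a schedule given as (transaction, operation, data item) triples and then looks for a cycle with a depth-first search; running it on the schedule S above reproduces the result.

# Schedule S: R1(A), R2(A), R1(B), R2(B), R3(B), W1(A), W2(B)
schedule = [(1, 'R', 'A'), (2, 'R', 'A'), (1, 'R', 'B'), (2, 'R', 'B'),
            (3, 'R', 'B'), (1, 'W', 'A'), (2, 'W', 'B')]

def precedence_edges(sched):
    edges = set()
    for i, (ti, op1, x) in enumerate(sched):
        for tj, op2, y in sched[i + 1:]:
            # Conflicting pair: different transactions, same item, at least one write.
            if ti != tj and x == y and 'W' in (op1, op2):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    visiting, done = set(), set()
    def dfs(u):
        visiting.add(u)
        for v in graph.get(u, []):
            if v in visiting or (v not in done and dfs(v)):
                return True            # a back edge means a cycle
        visiting.discard(u)
        done.add(u)
        return False
    return any(dfs(u) for u in graph if u not in done)

edges = precedence_edges(schedule)
print(sorted(edges))      # [(1, 2), (2, 1), (3, 2)]
print(has_cycle(edges))   # True -> S is not conflict serializable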

Question-02:
Check whether the given schedule S is conflict serializable and recoverable or not-


Solution- Checking Whether S is Conflict Serializable Or Not-


Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
 R2(X), W3(X) (T2 → T3)
 R2(X), W1(X) (T2 → T1)
 W3(X), W1(X) (T3 → T1)
 W3(X), R4(X) (T3 → T4)
 W1(X), R4(X) (T1 → T4)
 W2(Y), R4(Y) (T2 → T4)

Step-02:
Draw the precedence graph-

 Clearly, there exists no cycle in the precedence graph.


 Therefore, the given schedule S is conflict serializable.
Checking Whether S is Recoverable Or Not-
 Conflict serializable schedules are always recoverable.
 Therefore, the given schedule S is recoverable.

Question-03:
Check whether the given schedule S is view serializable or not-


Solution-
 We know, if a schedule is conflict serializable, then it is surely view serializable.
 So, let us check whether the given schedule is conflict serializable or not.

Checking Whether S is Conflict Serializable Or Not-


Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
 W1(B), W2(B) (T1 → T2)
 W1(B), W3(B) (T1 → T3)
 W1(B), W4(B) (T1 → T4)
 W2(B), W3(B) (T2 → T3)
 W2(B), W4(B) (T2 → T4)
 W3(B), W4(B) (T3 → T4)

Step-02:
Draw the precedence graph-

 Clearly, there exists no cycle in the precedence graph.


 Therefore, the given schedule S is conflict serializable.
 Thus, we conclude that the given schedule is also view serializable.

Question-04:
Check whether the given schedule S is view serializable or not. If yes, then give the
serial schedule.
S : R1(A) , W2(A) , R3(A) , W1(A) , W3(A)

Solution-
For simplicity and better understanding, we can represent the given schedule pictorially
as-


 We know, if a schedule is conflict serializable, then it is surely view serializable.


 So, let us check whether the given schedule is conflict serializable or not.

Checking Whether S is Conflict Serializable Or Not-


Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
 R1(A), W2(A) (T1 → T2)
 R1(A), W3(A) (T1 → T3)
 W2(A), R3(A) (T2 → T3)
 W2(A), W1(A) (T2 → T1)
 W2(A), W3(A) (T2 → T3)
 R3(A), W1(A) (T3 → T1)
 W1(A), W3(A) (T1 → T3)

Step-02:
Draw the precedence graph-

 Clearly, there exists a cycle in the precedence graph.


 Therefore, the given schedule S is not conflict serializable.
Now,

 Since, the given schedule S is not conflict serializable, so, it may or may not be
view serializable.
 To check whether S is view serializable or not, let us use another method.
 Let us check for blind writes.

Checking for Blind Writes-


 There exists a blind write W2(A) in the given schedule S.
 Therefore, the given schedule S may or may not be view serializable.
Now,
 To check whether S is view serializable or not, let us use another method.
 Let us derive the dependencies and then draw a dependency graph.

Drawing a Dependency Graph-

 T1 first reads A and T2 first updates A.
 So, T1 must execute before T2.
 Thus, we get the dependency T1 → T2.
 The final update on A is made by transaction T3.
 So, T3 must execute after all other transactions.
 Thus, we get the dependency (T1, T2) → T3.
 From the write-read sequence, we get the dependency T2 → T3.

Now, let us draw a dependency graph using these dependencies-

 Clearly, there exists no cycle in the dependency graph.


 Therefore, the given schedule S is view serializable.
 The serialization order is T1 → T2 → T3.

Recoverability of Schedule
Sometimes a transaction may not execute completely due to a software issue, system crash or hardware failure. In that case, the failed transaction has to be rolled back. But some other transaction may also have used a value produced by the failed transaction, so we have to roll back those transactions too.

Such schedules are further categorized into the following types:

1. Cascading Schedule
2. Cascadeless Schedule
3. Strict Schedule

Cascading Schedule

If in a schedule, several other dependent transactions are forced to


rollback/abort because of the failure of one transaction, then such a
schedule is called a Cascading Schedule or Cascading Rollback or
Cascading Abort. It simply leads to the wastage of CPU time.

Here, Transaction T2 depends on transaction T1, and transaction T3


depends on transaction T2. Thus, in this Schedule, the failure of
transaction T1 will cause transaction T2 to roll back, and a similar case
for transaction T3. Therefore, it is a cascading schedule. If the
transactions T2 and T3 had been committed before the failure of
transaction T1, then the Schedule would have been irrecoverable.

Cascadeless Schedule
If, in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it is committed or aborted, then such a schedule is called a cascadeless schedule. It avoids cascading rollback and thus saves CPU time.

The above Table 2 shows a schedule with two transactions. Transaction T1 reads and writes A and commits, and that value is then read and written by T2. So this is a cascadeless recoverable schedule.

Strict Schedule
If, in a schedule, a transaction is neither allowed to read nor write a data item until the last transaction that has written it is committed or aborted, then such a schedule is called a strict schedule.

Here, transaction Tb reads/writes the value written by transaction Ta only after transaction Ta commits. Hence, the schedule is a strict schedule.

Concurrency Control

When several transactions execute concurrently without any rules and
protocols, various problems arise that may harm the data integrity of
several databases. These problems are known as concurrency control
problems. Therefore, several rules are designed, to maintain consistency
in the transactions while they are executing concurrently which are
known as concurrency control protocols.

Concurrency Control Problems


Several problems that arise when numerous transactions execute
simultaneously in a random manner are referred to as Concurrency
Control Problems.

1)Dirty Read Problem

The dirty read problem in DBMS occurs when a transaction reads the
data that has been updated by another transaction that is still
uncommitted. It arises due to multiple uncommitted transactions
executing simultaneously.

Example: Consider two transactions A and B performing read/write


operations on a data DT in the database DB. The current value of DT is
1000: The following table shows the read/write operations in A and B
transactions.

Transaction A reads the value of data DT as 1000 and modifies it to 1500, which gets stored in the temporary buffer. Transaction B reads the data DT as 1500 and commits it, and the value of DT permanently changes to 1500 in the database DB. Then some server error occurs in transaction A and it has to roll back to its initial value, i.e., 1000, and the dirty read problem occurs.

2)Unrepeatable Read Problem

The unrepeatable read problem occurs when two or more different values
of the same data are read during the read operations in the same
transaction.

Example: Consider two transactions A and B performing read/write


operations on a data DT in the database DB. The current value of DT is
1000: The following table shows the read/write operations in A and B
transactions.

Transaction A and B initially read the value of DT as 1000. Transaction
A modifies the value of DT from 1000 to 1500 and then again transaction
B reads the value and finds it to be 1500. Transaction B finds two
different values of DT in its two different read operations.

3)Lost Update Problem

The Lost Update problem arises when an update in the data is done over
another update but by two different transactions.

Transaction A: R(A) //1000
Transaction A: A = A + 500
Transaction A: W(A) //1500
Transaction B: A = A + 300
Transaction B: W(A) //1800
Transaction A: R(A) //1800

Example: Consider two transactions A and B performing read/write


operations on a data DT in the database DB. The current value of DT is
1000: The following table shows the read/write operations in A and B
transactions.

Transaction A initially reads the value of DT as 1000. Transaction A modifies the value of DT from 1000 to 1500, and then transaction B modifies the value to 1800. Transaction A again reads DT and finds 1800; the update done by transaction A has therefore been lost.

Concurrency Control Protocols


To avoid concurrency control problems and to maintain consistency and
serializability during the execution of concurrent transactions some rules
are made. These rules are known as Concurrency Control Protocols.

Lock-Based Protocols

To attain consistency, isolation between the transactions is the most important tool. Isolation is achieved by disabling certain read/write operations of a transaction. This is known as locking an operation in a transaction. Through lock-based protocols, desired operations are freely allowed to be performed while the undesired operations are locked.

There are two kinds of locks used in Lock-based protocols:

Shared Lock(S): The locks which disable the write operations but allow
read operations for any data in a transaction are known as shared locks.
They are also known as read-only locks and are represented by 'S'.

Exclusive Lock(X): The locks which allow both read and write operations on a data item in a transaction are known as exclusive locks. An exclusive lock can be held on a data item by only one transaction at a time. They are represented by 'X'.

Two-Phase Locking
The two phases of Locking are :

 Growing Phase: In the growing phase, the transaction only obtains locks; it cannot release any lock in this phase.
 Shrinking Phase: In the shrinking phase, the transaction only releases locks; no new locks can be obtained.

 In the above figure, we can see that in the growing phase locks are obtained until the point at which all the locks needed by the transaction have been acquired. This point is called the Lock Point. After the lock point, the transaction enters the shrinking phase.
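
As an illustration, here is a minimal Python sketch (hypothetical class and method names, not any real library) of how a transaction object could enforce the two-phase rule: once the first lock is released, any further lock request is rejected.

class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False        # False = still in the growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: no new locks after the first release")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True         # first release: the lock point has passed
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.lock('A')
t.lock('B')        # growing phase
t.unlock('A')      # shrinking phase begins
# t.lock('C')      # would raise RuntimeError: violates two-phase locking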

Two-Phase Locking Types

Two-phase locking is further classified into three types:

1. Strict two-phase locking protocol:

 The transaction can release a shared lock after the lock point.
 The transaction cannot release any exclusive lock until it commits or aborts.
 Because exclusive locks are held until commit, other transactions cannot read uncommitted data, so strict two-phase locking avoids cascading rollbacks.

2. Rigorous two-phase locking protocol:

 The transaction cannot release either kind of lock, neither shared nor exclusive, until it commits or aborts.
 Serializability is guaranteed by the rigorous two-phase locking protocol.
 Freedom from deadlock is not guaranteed.

3. Conservative two-phase locking protocol:

 The transaction must lock all the data items it requires before the transaction begins.
 If any of the data items are not available for locking, then no data items are locked.
 The read and write sets of data items need to be known before the transaction begins, which is normally not possible.
 The conservative two-phase locking protocol is deadlock-free.
 The conservative two-phase locking protocol does not ensure a strict schedule.

Timestamp Ordering Protocols


Timestamp-based protocols in dbms are used to order the transaction in
ascending order of their creation time. The creation time is the system time
or a logical counter.

The transaction which is created first or you can say older transactions are
given high priority over new transactions.

For example, if there are two transactions T1 and T2, and T1 enters the system at time 008 while T2 enters at time 009, then T1 is given priority over T2.

How a Timestamp Ordering Protocol Works?

Timestamp-based protocols in a DBMS order the transactions according to their timestamps. The only serial schedule equivalent to a timestamp-ordered schedule is the one in which the transactions appear in order of their timestamps.

To ensure that conflicting operations in the schedule do not violate the timestamp ordering, two timestamp values are associated with each database item (X):

 W_TS(X) (write timestamp) is the largest timestamp of any transaction


that executed write(X) successfully.
 R_TS(X) (read timestamp) is the largest timestamp of any transaction
that executed read(X) successfully.
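
Using these two values, the basic timestamp-ordering rules can be sketched in a few lines of Python (an illustrative simulation with hypothetical names): a transaction is aborted if it tries to read an item already written by a younger transaction, or to write an item already read or written by a younger transaction.

class Aborted(Exception):
    pass

R_TS, W_TS = {}, {}      # read/write timestamps per data item

def read(ts, x):
    if ts < W_TS.get(x, 0):
        raise Aborted(f"T{ts} is too old to read {x}")   # a younger txn wrote x
    R_TS[x] = max(R_TS.get(x, 0), ts)

def write(ts, x):
    if ts < R_TS.get(x, 0) or ts < W_TS.get(x, 0):
        raise Aborted(f"T{ts} is too old to write {x}")  # a younger txn read/wrote x
    W_TS[x] = ts

write(1, 'A')                # T1 (timestamp 1) writes A
read(2, 'A')                 # T2 (timestamp 2, younger) reads A: allowed, R_TS(A) = 2
try:
    write(1, 'A')            # T1 tries to write A again
except Aborted as e:
    print(e)                 # T1 is too old to write A -> T1 must restart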

Advantages

 Timestamp-based protocols in a DBMS ensure serializability, as the transactions are ordered by their creation timestamps. The precedence graph for the timestamp ordering protocol looks as follows:

 No deadlock occurs when the timestamp ordering protocol is used, as no transaction ever waits.
 Since no older transaction waits for a longer period of time, the protocol is free from deadlock.
 The timestamp-based protocol ensures that conflicting operations are executed in timestamp order.

Disadvantages

 Timestamp-based protocols in a DBMS may not be cascade-free or recoverable.
 In timestamp-based protocols there is a possibility of starvation of long transactions if a sequence of conflicting short transactions causes repeated restarting of the long transaction.

Introduction to Log-Based Recovery


A log is a series of records. The logs for each transaction are kept in a log file to
allow recovery in case of failure. A log is kept for each operation performed on
the database. It is important to store the log before the actual transactions are
applied to the database. Take the example of modifying a student's City. This
transaction produces the following logs.

 A start log is produced when the transaction begins: <Tn, Start>
 A new log is written to the file when the City is changed from Chennai to NCR: <Tn, City, 'Chennai', 'NCR'>
 Once the transaction has been completed, another log is written to indicate that the operation has been completed: <Tn, Commit>

Transactions that perform a write operation to the database only modify it when
a log record is created for that write operation. The logs may or may not record
the reading of data items. The reason is that reading a data item has no effect on
the consistency of the database and is not beneficial to the recovery mechanism.

During the recovery process, we perform two operations:

1. Undo(Ti): Restores the old values of all the items that have been updated by transaction Ti.
2. Redo(Ti): Sets all data items updated by transaction Ti to their new values. Redoing a transaction Ti requires the log to contain both the record <Ti start> and the record <Ti commit>.

Approaches to Modify the Database

In the recovery system, we use two different types of modification to the database. They are:

Immediate Database Modification

By using immediate updates, the database can be modified and output while still
in the active state of the transaction. Data modifications written during an active
transaction are referred to as uncommitted modifications.

For consistency reasons, old values of data items should be used if the system
crashes or a transaction fails. Undoing an operation is a way to accomplish this.
In order for Ti to start its execution, a log entry <Ti start> is written. A new update record is written to the log before each write(X) operation of Ti is executed. When Ti partially commits, the record <Ti commit> is written to the log.

The following is an example from a banking system, where transaction To is followed by transaction T1. In the case of a system crash right after the last log record, during recovery we redo(To) and undo(T1), since the log contains both <To start> and <To commit>, but contains <T1 start> without <T1 commit>. First we should undo(T1), and then we should redo(To).

Deferred Modification Technique

In the deferred database modification technique, all database modifications are recorded in the log to ensure transaction atomicity. With this technique, the written statements are applied to the database only once the transaction is partially committed: as soon as the final action of the transaction has been completed, the transaction is said to be partially committed.

Whenever a partial commit occurs, deferred writes are executed according to


the information in the transaction log. The information on the log is ignored if
the system crashes before a transaction has been completed or if the transaction
aborts. With the log, the system can deal with events that result in a loss of
information on volatile storage. The recovery procedure uses the log.

The redo operation sets the values of the data items updated by transaction Ti to their new values. The log includes a list of the data items that Ti updated and their new values. Redo must be idempotent: repeating it many times must have the same effect as performing it once.

In the event of a failure, the recovery subsystem consults the log for information
about which transactions need to be redone. If the log record includes both <Ti
start> and <Ti commit> statements, Transaction Ti is redone.


There is a failure just after the log record for step write(B) of transaction To. The recovery process will not perform any redo operation, as we only have <To start> in our log record, but not <To commit>.

When a system crash occurs just after the log record <To commit> is written, during recovery we do only redo(To), since the log disk contains both <To start> and <To commit>. However, the log does not yet contain <T1 commit>, so we cannot redo(T1).

If the system crashes just after the log record <T1 commit>, then during recovery we will do both redo(To) and redo(T1), since we have <To start>, <To commit>, <T1 start> and <T1 commit> on the log disk.

Recovery using Log records

When the system crashes, the system consults the log to determine which transactions need to be redone and which need to be undone.

Transaction Ti needs to be redone if the log contains both the record <Ti, Start> and the record <Ti, Commit>. Transaction Ti needs to be undone if the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>.
CHECK POINT
A DBMS checkpoint is a procedure for compressing the transaction log file by transferring old transactions to permanent storage. It helps in system recovery when a failure occurs. The recovery procedure includes reading the log file in reverse order and maintaining redo and undo lists.

Recovery using Checkpoint

The schematic representation of the system recovery when the failure


occurs during the execution of concurrent transactions.

Transactions and their operations in the above diagram:

Following are the steps to be performed for system recovery using a checkpoint:

 The transaction log file is read in reverse order, i.e., from T4 to T1.
 Redo and undo lists are created and maintained by the system.
 If a transaction's log contains <Tn, start> and <Tn, commit> together, or <Tn, commit> alone, then the transaction is placed on the redo list. In the above diagram, transaction T1 contains only <Tn, commit>, while transactions T2 and T3 contain both <Tn, start> and <Tn, commit>; therefore transactions T1, T2 and T3 are placed on the redo list.
 If a transaction's log contains <Tn, start> but not <Tn, commit>, then the transaction is placed on the undo list. In the above diagram, transaction T4 contains <Tn, start> but not <Tn, commit>, and therefore T4 is placed on the undo list.
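
A minimal Python sketch of this list-building pass (the log format and names are illustrative): scan the log backwards from the crash point, putting committed transactions on the redo list and transactions that started but never committed on the undo list.

# Log records after the checkpoint, in the order they were written.
log = [('T1', 'commit'),
       ('T2', 'start'), ('T2', 'commit'),
       ('T3', 'start'), ('T3', 'commit'),
       ('T4', 'start')]               # crash occurs here

redo, undo = [], []
committed = set()
for txn, record in reversed(log):     # read the log file in reverse order
    if record == 'commit':
        committed.add(txn)
        redo.append(txn)
    elif record == 'start' and txn not in committed:
        undo.append(txn)

print(redo)   # ['T3', 'T2', 'T1'] -> these transactions are redone
print(undo)   # ['T4']             -> this transaction is undone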

Table of transactions operations and the lists they are placed in

UNIT 5
File Organization
o A file is a collection of records. Using the primary key, we can access the records. The type and frequency of access can be determined by the type of file organization which was used for a given set of records.
o File organization is a logical relationship among various records. This method defines how file records are mapped onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of blocks, and the blocks are placed on the storage medium.
o The first approach to mapping the database to files is to use several files and store only fixed-length records in any given file. An alternative approach is to structure our files so that they can contain records of multiple lengths.

Types of File organization:


File organization contains various methods. These particular methods have pros and
cons on the basis of access or selection. In the file organization, the programmer decides
the best-suited file organization method according to his requirement.

Types of file organization are as follows:

Sequential File Organization

This method is the easiest method for file organization. In this method, files
are stored sequentially. This method can be implemented in two ways:

1. Pile File Method:

o It is a quite simple method. In this method, we store the records in a sequence, i.e., one after another. Here, the records will be inserted in the order in which they are inserted into the tables.
o In case of updating or deleting a record, the record will be searched in the memory blocks. When it is found, it will be marked for deletion, and the new record is inserted.

Insertion of the new record:
Suppose we have records R1, R3 and so on up to R9 and R8 in a sequence (a record is nothing but a row in a table). If we want to insert a new record R2 into the sequence, it will simply be placed at the end of the file.

2. Sorted File Method:

o In this method, the new record is always inserted at the file's end, and then it will sort the
sequence in ascending or descending order. Sorting of records is based on any primary key or
any other key.
o In the case of modification of any record, it will update the record and then sort the file, and
lastly, the updated record is placed in the right place.

Insertion of the new record:


Suppose there is a preexisting sorted sequence of records R1, R3 and so on up to R6 and R7. If a new record R2 has to be inserted into the sequence, it will be inserted at the end of the file, and then the sequence will be sorted.
Pros of sequential file organization

o It is a fast and efficient method for huge amounts of data.
o In this method, files can be easily stored on cheaper storage mechanisms like magnetic tape.
o It is simple in design. It requires not much effort to store the data.
o This method is used when most of the records have to be accessed, as in grade calculation for students, generating salary slips, etc.
o This method is used for report generation and statistical calculations.

Cons of sequential file organization

o It wastes time, as we cannot jump to a particular record that is required; we have to move sequentially, which takes time.
o The sorted file method takes more time and space for sorting the records.

Heap file organization

o It is the simplest and most basic type of organization. It works with data blocks. In heap file organization, the records are inserted at the file's end. When the records are inserted, it doesn't require the sorting and ordering of records.
o When a data block is full, the new record is stored in some other block. This new data block need not be the very next data block; the DBMS can select any data block in memory to store the new records. The heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in the file is of the same size. It is the DBMS's responsibility to store and manage the new records.
Insertion of a new record
Suppose we have five records R1, R3, R6, R4 and R5 in a heap, and we want to insert a new record R2 into the heap. If data block 3 is full, the record will be inserted into any other data block selected by the DBMS, let's say data block 1.

If we want to search, update or delete data in heap file organization, then we need to traverse the data from the start of the file until we get to the requested record.

If the database is very large, then searching, updating or deleting a record will be time-consuming, because there is no sorting or ordering of records. In heap file organization, we need to check all the data until we get the requested record.

Pros of Heap file organization

o It is a very good method of file organization for bulk insertion. If there is a large number
of data which needs to load into the database at a time, then this method is best suited.
o In case of a small database, fetching and retrieving of records is faster than the
sequential record.

Cons of Heap file organization

o This method is inefficient for large databases, because it takes time to search for or modify a record.
Need for Indexing
The main idea behind designing a database is to store data and get quick access to any of it. Besides this, one should be able to easily insert, update and delete data in the database. As a database is of large size, searching for even one record takes time; moreover, even a small transaction takes time to perform its action. In order to reduce the time spent in transactions, indexes are used.

What is Indexing in DBMS?

Indexing is a data structure technique to efficiently retrieve records from the


database files based on the attributes on which the indexing has been done. An
index file consists of records (called index entries) in the following form:

Search Key Pointer


Search Key – Attribute used to lookup records in a file.
Pointer – Locates the memory address of that attribute.
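
As a simplified illustration (an in-memory model, not any particular DBMS's on-disk layout), an index can be viewed as a sorted list of (search key, block pointer) entries that is binary-searched instead of scanning the whole file:

import bisect

# Hypothetical index: (search_key, block_address) entries kept sorted by key.
index = [(10, 'block-1'), (20, 'block-2'), (30, 'block-3'), (40, 'block-4')]
keys = [k for k, _ in index]

def lookup(search_key):
    # Binary-search the index and return the block pointer, if the key exists.
    i = bisect.bisect_left(keys, search_key)
    if i < len(keys) and keys[i] == search_key:
        return index[i][1]
    return None

print(lookup(30))   # 'block-3'
print(lookup(35))   # None: key not present in the index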

Types of Indexes

According to the attributes defined above, we divide indexing into three types:

Single Level Indexing

It is somewhat like the index (or the table of contents) found in a book. The index of a book contains topic names along with page numbers; similarly, the index table of a database contains keys and their corresponding block addresses.

Single Level Indexing is further divided into three categories:

1. Primary Indexing: The indexing or the index table created using


Primary keys is known as Primary Indexing. It is defined on ordered data.
As the index is comprised of primary keys, they are unique, not null, and
possess one to one relationship with the data blocks.
(a) Dense Indexing: A dense index contains an index record for every search-key value in the data file. It makes searching faster. In this scheme, the number of records in the index table is the same as the number of records in the main table. It needs more space to store the index records themselves. The index records contain the search-key value and a pointer to the actual record on the disk.

(b) Sparse Indexing: In sparse indexing, not every value of the primary key is entered in the index file. Only the first entry of each block is stored in the index file, and the base address of that block is pointed to by the block pointer.

2. Secondary Indexing: In secondary indexing, the main file is unsorted. It can be done on a key attribute as well as a non-key attribute. Sparse indexing is not possible with secondary indexing, because we have to enter each value in the index table; so dense indexing is performed.

Example:

3. Cluster Indexing: In cluster indexing, the main file is sorted on a non-key attribute. There will be one entry for each unique value of the non-key attribute, and the block pointer will point to the first block in which that unique value appears. A cluster index is an example of a sparse index as well as a dense index: it does not hold an entry for every record, but it does hold an entry for every unique value in the index file.

Example:
Let's assume that a company recruited many employees in various departments. In this case, a clustering index should be created for all employees who belong to the same department. They are considered a single cluster, and the index points to the cluster as a whole. Here, Department_no is a non-unique key.
Multilevel Indexing

B Tree

B tree is an m-way tree that self-balances. Because of their balanced nature, such trees are
widely used to organize and manage massive datasets and to facilitate searches.

Binary trees have a maximum of 2 children, whereas a Multiway Search Tree, also known as
an M-way tree, has a maximum of m children where m>2. M-way trees are primarily used in
external searching, i.e. retrieving data from secondary storage such as disc files, due to their
structure.

Structure of B Tree

A B-tree of order m is an m-way search tree that is either empty or satisfies the following
characteristics:

 All leaf nodes are at the same level.
 All non-leaf nodes (except the root node) have at least ⌈m/2⌉ children.
 All nodes (except the root node) should have at least ⌈m/2⌉ - 1 keys.
 If the root node is a leaf node (the only node in the tree), then it has no children and at least one key. If the root is a non-leaf node, then it has at least 2 children and at least one key.
 A non-leaf node with n - 1 key values should have n non-NULL children.

The figure above is an example of a B tree. It has [11] at the root. 9, being less than 11, falls in the left child. [16, 18], being greater than 11, fall in the right child. The same process follows as we go down the tree.

From this, we can see that any node (except the root) in a B tree is at least half full, which avoids wastage of storage space. The B tree is perfectly balanced, so the number of nodes accessed to find a key is small.

Operations on B Tree

The B tree may be split or joined to ensure that none of its properties are violated during the
operations.
Searching in B Tree

The process of searching in a B tree is similar to searching in a binary search tree. Unlike a BST, however, instead of making a 2-way decision (left or right), a B tree makes an n-way decision at each node, where n is the number of children of the node.

Following are the steps to search for an element k in a B tree:

1. The search begins at the root node. Compare k with the keys in the root.
   o The search is complete if the root node contains the key k.
   o If k is less than the first key in the root, we move to the leftmost child and search that child recursively.
   o If k is greater than the last key in the root, we move to the rightmost child and search that child recursively.
   o Otherwise, we recursively search the child that lies between the two keys bracketing k.
2. If a leaf is reached and k has not been found, the search element does not exist in the B tree.
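
A minimal Python sketch of this n-way search (a simplified node model with hypothetical field names, not tied to any library):

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                     # sorted list of keys in this node
        self.children = children or []       # empty for a leaf node

def search(node, k):
    i = 0
    while i < len(node.keys) and k > node.keys[i]:
        i += 1                               # find the first key >= k
    if i < len(node.keys) and node.keys[i] == k:
        return node                          # k found in this node
    if not node.children:
        return None                          # leaf reached: k is not in the tree
    return search(node.children[i], k)       # n-way decision: descend into child i

# The tree from the figure: root [11] with children [9] and [16, 18].
root = BTreeNode([11], [BTreeNode([9]), BTreeNode([16, 18])])
print(search(root, 18) is not None)   # True
print(search(root, 7))                # None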

Insertion in B Tree

When inserting an element into the B tree, we must first check to see if the item
is already existent in the tree. If the element is found, the insertion does not
occur. If not, we will continue with the insertion. 2 cases must be addressed
when inserting the key into the leaf node:

1. Node contains no more than MAX number of keys - Simply add the key
into the node in its proper location.
2. Node has the maximum number of keys - A key is inserted into the full
node, and then a split occurs, dividing the full node into three parts.
The second or median key becomes the parent node, while the first and
third parts serve as the left and right children. If the parent node also has
a MAX number of keys, the process is repeated.

The following are the steps for inserting an element into B Tree:

1. Determine the maximum number of keys in the node based on the B tree's order.
2. If the tree is empty, a root node is allocated and the key is inserted and acts as the root
node.
3. Search for the appropriate node for insertion.
4. If the node is full:
4.1. Insert the elements in increasing order.
4.2. If the elements are greater than the maximum number of keys, split at the median.
4.3. Push the median key upwards and split the left and right keys as left and right
child respectively.
5. If the node is not full, insert the node in increasing order.
Deletion in B Tree

During insertion, we had to make sure that the number of keys in the node did
not exceed a certain limit. Similarly, during deletion, we must ensure that the
number of keys remaining in the node does not fall below the minimum number
of keys that a node can hold. As a result, instead of a split, a combination
process occurs in the case of deletion.

We would consider two scenarios when deleting an element from the B Tree:

1. Deletion from a leaf node.


2. Deletion from a non-leaf node.

Case 1 - Deletion from Leaf Node

1.1. If the node has more than MIN keys - Deleting a key does not violate any
property, and thus the key can be easily deleted by shifting other node keys, if
necessary.

1.2. If the node has less than MIN keys - This type of deletion violates a B tree
property. If the keys in the left sibling exceed MIN, keys are borrowed from
there. If the keys in the right sibling exceed MIN, keys are borrowed from there.
If either of these conditions is not met, a node combination occurs with either of
the siblings.

Case 2 - Deletion from Non-Leaf Node


In this case, the successor key (the smallest key in the right subtree) is copied at
the location of the key to be deleted, and the successor is then deleted. This case
is further simplified to Case 1, which is deletion from a leaf node.

Applications of B Tree

1. Database or File System - Consider having to look up a person's


information in the Yellow Pages (the directory containing details of
professionals). B Tree can be used to create a system in which you need
to search for a record among millions.
2. Search Engines and Multilevel Indexing - B tree Searching is used in
search engines as well as MySQL Server querying.
3. To store blocks of data - Because of its balanced structure, B Tree aids
in the faster storage of blocks of data in secondary storage.

B+ tree
B+ tree is simply an improved version of the B tree that allows for efficient
and smooth insertion, deletion, and sequential access. It helps to mitigate the
disadvantages of the B-tree by saving data pointers at the leaf node level and
simply storing key values in internal nodes.

A B+ tree, also known as an n-ary tree, is a tree with a large number of children per node.

The B+ tree is also known as an advanced self-balanced tree since every path
from the tree's root to its leaf is the same length. The fact that all of the leaf
nodes are the same length indicates that they all occur at the same level. It is not
possible for some leaf nodes to appear at the third level while others appear at
the second level.

The above figure shows a representation of a B+ tree.

Structure of B+ Tree

A B+ tree is the same as a B tree; the only difference is that the B+ tree has an
additional level with linked leaves at the bottom. In addition, unlike the B tree,
each node in a B+ tree contains only keys rather than key-value pairs.
Operations of B+ Tree

Most operations in B+ Tree are faster and make up for the drawbacks of the B
tree.

Searching in B+ Tree

In the B tree, our search ended as soon as we discovered the key, but this is not the case in the B+ tree. A key may exist in an internal node but not in a leaf node of a B+ tree, because when a data item is deleted from a leaf node, the corresponding key is not necessarily deleted from the internal node. The actual data lookup therefore always takes place in the leaf nodes.

Insertion in B+ Tree

First, a search is performed, and if the key is not found in the leaf node, we can
have one of two outcomes, depending on whether the leaf node has
the maximum number of keys or not:

1. If the leaf node has fewer than the maximum number of keys, key and data are
simply inserted in an ordered manner into the leaf node, and the index set is not
changed.
2. If the leaf node has the maximum number of keys, we will have to split it. After the
old node, a new leaf node is allocated and inserted into the sequence set (linked list of
leaf nodes). All keys less than the median key remain in the old leaf node, while all
keys greater than or equal to the median key are moved to the new node, along with
the corresponding data items. The median key becomes the new node's first key, and
this key (without a data item) is copied (rather than moved) to the parent node,
which is an internal node. As a result, this median key is present both in the leaf
node and in the internal node which is the parent of the leaf node.

Splitting of a leaf node: keys < median remain in the old leaf node; keys >= median go to the new leaf node; the median key is copied to the parent node.

If after splitting of the leaf node, the parent becomes full then again a split has
to be done. The splitting of an internal node is similar to that of the splitting of a
node in a B Tree. When an internal node is split the median key is moved to the
parent node.

Splitting of an internal node: keys < median remain in the old node; keys > median go to the new node; the median key is moved to the parent node.
This splitting continues until we get a non-full parent node. If the root node is split, then a new root node has to be allocated.

Deletion in B+ Tree

First, a search is performed, and if the key is not found in the leaf node, we
can have one of two outcomes, depending on whether the leaf node has
the minimum number of keys or not:

1. If the leaf node has more than the minimum number of keys, then we can simply delete the key and its data item, and shift the other elements of the leaf node if required. In this case, the index set is not changed, i.e., if the key is also present in an internal node, it is not deleted from there, because the key still serves as a separator key between its left and right children.
2. If the leaf node has only the minimum number of keys (a decision sketch follows this list):
2.1. If any of the siblings has more than the minimum number of keys, then a key is borrowed from it and the separator key in the parent node is updated accordingly.
2.2. If both siblings have only the minimum number of keys, then we merge the underflowing node with a sibling. This is done by moving the keys and data from the underflowing node to the sibling node and deleting the underflowing node. The merging could result in an underflow in the parent node, which is an internal node. For internal nodes, the borrowing and merging process is similar to that of the B tree.
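Here is the decision logic from the list above as a small Python sketch. MIN_KEYS is an assumed occupancy constant implied by the tree's order; the function only reports the action to take, it is not a full implementation.

```python
MIN_KEYS = 2  # assumed minimum occupancy for the example

def deletion_action(keys_left_in_leaf, left_sibling_keys, right_sibling_keys):
    """Decide how to fix a leaf after removing a key and its data item."""
    if keys_left_in_leaf >= MIN_KEYS:
        return "done: no underflow, the index set is unchanged"
    if max(left_sibling_keys, right_sibling_keys) > MIN_KEYS:
        return "borrow a key from the richer sibling; update the parent's separator"
    return "merge with a sibling; the parent may underflow and is fixed as in a B tree"

print(deletion_action(2, 2, 2))  # -> done: no underflow ...
print(deletion_action(1, 3, 2))  # -> borrow a key ...
print(deletion_action(1, 2, 2))  # -> merge with a sibling ...
```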

Applications of B+ Tree

 A B+ tree is used to store records very efficiently by indexing the records using
the B+ tree indexed structure. Data access becomes faster and easier as a result
of multi-level indexing.
 A B+ tree's structural simplicity is preferred by many database system implementers.
As a result, it is commonly used in multilevel and database indexing.

Advantages of Indexing

 Indexing helps in faster query results and quick data retrieval.
 Indexing helps in faster sorting and grouping of records.
 Some indexes use sorted and unique keys, which helps to answer sorted queries even faster.
 Index tables are smaller in size, so they require less memory.
 As index tables are smaller in size, they can be stored in main memory.
 Since CPU speed and secondary memory speed differ greatly, the CPU uses this in-memory index table to bridge the speed gap.
 Indexing helps in better CPU utilization and better performance.

What is Hashing in DBMS?


In DBMS, hashing is a technique to directly compute the location of the desired data on the disk without using an index structure. The hashing method is used to index and retrieve items in a database because it is faster to search for a specific item using its shorter hashed key than using its original value.
Data is stored in the form of blocks whose addresses are generated by applying a hash function; the memory location where these records are stored is known as a data block or data bucket.

Important Terminologies

 Data Bucket
Data buckets are the memory locations where the actual data records are
stored.
 Hash function
A mathematical function that returns the value of the address of the
desired data block. Mostly, it uses the primary key to generate the address
of a data block.
 Hash Index
It denotes the address of a data block generated by the hash function.
 Bucket Overflow
The condition when the memory address generated by the hash function is not empty (i.e., it already contains some data record) is known as bucket overflow.

Types of Hashing in DBMS

Hashing techniques can be divided into two types: static hashing and dynamic hashing.

1. Static Hashing

In static hashing, the resultant data bucket address always remains the same. Therefore, if you generate an address for, say, Student_id = 10 using the hash function mod(3), the resultant bucket address will always be 1; you will not see any change in the bucket address. Consequently, in the static hashing method, the number of data buckets in memory always remains constant.
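As a tiny sketch of the example above (the mod(3) hash function; the function name is illustrative):

```python
def bucket_address(student_id, n_buckets=3):
    # static hashing: the same key always maps to the same bucket
    return student_id % n_buckets

print(bucket_address(10))  # -> 1, every time this key is hashed
```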

Static Hashing Functions

Inserting a record: When a new record needs to be inserted into the table, you can generate an address for the new record using its hash key. Once the address is generated, the record is stored at that location.

Searching: When you need to retrieve a record, the same hash function is used to compute the address of the bucket where the data is stored.

Deleting a record: Using the hash function, you can first fetch the record that you want to delete. Then you can remove the record from that address in memory.
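A minimal in-memory sketch of these three operations, assuming the mod(3) hash function from above and Python lists standing in for data buckets:

```python
N_BUCKETS = 3
buckets = [[] for _ in range(N_BUCKETS)]

def insert(key, record):
    # the hash key generates the address; the record is stored there
    buckets[key % N_BUCKETS].append((key, record))

def search(key):
    # the same hash function locates the bucket to scan
    for k, rec in buckets[key % N_BUCKETS]:
        if k == key:
            return rec
    return None

def delete(key):
    # fetch the bucket via the hash function, then remove the record
    b = buckets[key % N_BUCKETS]
    b[:] = [(k, rec) for k, rec in b if k != key]

insert(10, "Deepti")
print(search(10))  # -> 'Deepti' (stored in bucket 10 % 3 == 1)
delete(10)
print(search(10))  # -> None
```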

Types of Static Hashing

If we want to insert a new record into the file, but the data bucket address generated by the hash function is not empty (i.e., data already exists at that address), the situation is known as bucket overflow in static hashing. To overcome it, the following methods are used:
 Open hashing
 Close hashing
Open hashing
In the open hashing method, instead of overwriting the older record, the next available data bucket is used to store the new record. This method is also known as linear probing.
For example: A2 is a new record that you want to insert. The hash function generates address 222, but that address is already occupied by some other value. The system therefore looks for the next available data bucket, 501, and assigns A2 to it.
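A sketch of linear probing with an in-memory table (bucket numbers are illustrative; in the example above the next free bucket happened to be 501, while in this toy table it is simply the next slot):

```python
N = 1000
table = [None] * N

def insert_linear_probe(addr, record):
    # walk forward from the hashed address to the next available bucket
    while table[addr] is not None:
        addr = (addr + 1) % N
    table[addr] = record
    return addr

table[222] = "existing record"
print(insert_linear_probe(222, "A2"))  # -> 223, the next free bucket
```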

Close hashing
In the close hashing method, when a bucket is full, a new bucket is allocated for the same hash result and linked after the previous one. This mechanism is known as overflow chaining. It requires an extra link field in each table position.
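A sketch of overflow chaining, where each table position carries the extra link field mentioned above (the bucket capacity is an assumption for the example):

```python
BUCKET_CAPACITY = 2  # assumed records per bucket

class Bucket:
    def __init__(self):
        self.records = []
        self.overflow = None  # the extra link field per table position

def insert_chained(bucket, record):
    # follow the chain until a bucket with free space is found,
    # allocating and linking a new overflow bucket if needed
    while len(bucket.records) >= BUCKET_CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.records.append(record)

b = Bucket()
for r in ["r1", "r2", "r3"]:
    insert_chained(b, r)
print(b.records, b.overflow.records)  # -> ['r1', 'r2'] ['r3']
```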

2. Dynamic Hashing

The problem with static hashing is that the number of buckets does not expand or shrink dynamically as the database grows or shrinks. Dynamic hashing helps us to overcome the problem of bucket overflow by providing a mechanism in which data buckets are added or removed dynamically and on demand. Dynamic hashing is also known as extendible hashing.

Introduction to RAID in DBMS

Redundant Array of Independent Disks (RAID) is a way to combine multiple disk storage devices for increased performance, data redundancy, and disk reliability.

What is the problem with single disk storage?

 No backup disk: If a single disk fails, the whole system fails to perform. With data redundancy, if a copy of the data is stored on multiple disks, then even if one disk fails the system can still fetch the data from another, redundant disk.
 Performance: If a large amount of data is stored on a single disk, it can degrade the performance and effectiveness of the system. This can be solved by using multiple disks with redundant data.

In the RAID technique, the combined disks are treated as a single logical disk by the operating system. How data is laid out on the individual disks depends on the RAID level used.

RAID Levels in DBMS

The different RAID levels used in DBMS are:

 RAID 0
 RAID 1
 RAID 2
 RAID 3
 RAID 4
 RAID 5
 RAID 6

RAID 0

RAID 0 implements data striping: data blocks are placed across multiple disks without redundancy. Since none of the disks store redundant data, if one disk fails, all the data in the array is lost.

Let's understand RAID 0 implementation with an example:

DISK 0 DISK 1 DISK 2 DISK 3
10 11 12 13
14 15 16 17
18 19 20 21
22 23 24 25

Blocks 10, 11, 12, and 13 form a stripe. Also, no data block is repeated on any disk.
Instead of placing one block of data on a disk, we can also place more than one block on a disk before moving to the next one. For example, blocks 10 and 11 could first be placed on DISK 0, and then we would move to another disk to place further blocks.
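A sketch of the round-robin block placement of RAID 0 (one block per disk per stripe, matching the numbers used above):

```python
N_DISKS = 4
disks = [[] for _ in range(N_DISKS)]

for block in range(10, 26):
    disks[(block - 10) % N_DISKS].append(block)  # stripe across the disks

for i, d in enumerate(disks):
    print(f"DISK {i}: {d}")
# DISK 0: [10, 14, 18, 22]
# DISK 1: [11, 15, 19, 23], and so on
```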

Pros of RAID 0
 All the disk space is utilized, hence performance is increased.
 Data requests can be served by multiple disks rather than a single disk, improving throughput.

Cons of RAID 0
 Failure of one disk can lead to complete data loss in the respective array.
 No data redundancy is implemented, so one disk failure can lead to system failure.

RAID 1

RAID 1 implements mirroring, which means the data of one disk is replicated on another disk. This helps prevent system failure: if one disk fails, the redundant disk takes over.

Let's understand RAID 1 implementation with an example :

DISK 0 DISK 1 DISK 2 DISK 3
10 10 12 12
14 14 16 16
18 18 20 20
22 22 24 24

Here Disk 0 and Disk 1 hold the same data, as Disk 0 is mirrored to Disk 1. The same is the case with Disk 2 and Disk 3.

Pros of RAID 1
 Failure of one disk does not lead to system failure, as there is redundant data on another disk.

Cons of RAID 1
 Extra space is required, as each disk's data is also copied to some other disk.

RAID 2

RAID 2 is used when errors in data have to be checked at the bit level, using a Hamming code detection method. Two sets of disks are used in this technique: one stores the bits of each data word, and the other stores the error-correcting code (parity bits) for the data words. The structure of this RAID is complex, so it is not commonly used.

We will learn about Hamming code error detection in computer networks.

Let's understand RAID 2 with an Example:

Here Disk 3, Disk 4, and Disk 5 store the parity bits of the data stored in Disk 0, Disk 1, and Disk 2 respectively. The parity bits are used to detect errors in the data.
Pros of RAID 2
 It checks for errors at the bit level for every data word.
 One full disk is used to store parity bits, which helps in detecting errors.

Cons of RAID 2
 A large amount of extra space is used for parity-bit storage.

RAID 3

RAID 3 implements byte-level striping of data. Data is stored across the disks, with the parity bits kept on a separate disk. The parity bits help to reconstruct the data when there is data loss.

Let's see RAID 3 High-level implementation with an example:

DISK 0 DISK 1 DISK 2 DISK 3
10 11 12 P(10,11,12)
14 15 16 P(14,15,16)
18 19 20 P(18,19,20)
22 23 24 P(22,23,24)

Here Disk 3 contains the parity bits for Disk 0, Disk 1, and Disk 2. If any one disk's data is lost, it can be reconstructed using the parity bits in Disk 3.

Pros of RAID 3
 Data can be recovered with the help of parity bits.

Cons of RAID 3
 Extra space for storing parity bits is used.

RAID 4

RAID 4 implements block-level striping of data with a dedicated parity drive. If data is lost on at most one disk, it can be reconstructed with the help of the parity drive. The parity is calculated by applying the XOR operation over the corresponding blocks of each data disk.

Let's see RAID 4 with an example:


DISK 0 DISK 1 DISK 2 DISK 3
0 1 0 P0
1 1 0 P1

Here P0 is calculated as XOR(0,1,0) = 1 and P1 as XOR(1,1,0) = 0. If there is an even number of 1s, the XOR is 0; for an odd number of 1s, the XOR is 1. Suppose the data on Disk 0 is lost. By checking the parity P0 = 1, we know that Disk 0 must have held 0 to make the parity P0 equal to 1; if there had been a 1 on Disk 0, it would have made the parity P0 = 0, which contradicts the stored parity value.
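The XOR parity computation and the reconstruction argument above can be verified directly (a sketch with one bit per disk, as in the table):

```python
from functools import reduce

data = [0, 1, 0]                            # Disk 0, Disk 1, Disk 2
parity = reduce(lambda a, b: a ^ b, data)   # P0 = XOR(0,1,0) = 1

lost = 0                                    # pretend Disk 0 failed
survivors = [d for i, d in enumerate(data) if i != lost]
rebuilt = reduce(lambda a, b: a ^ b, survivors + [parity])
print(parity, rebuilt)  # -> 1 0: the rebuilt bit matches Disk 0's original
```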

Pros of RAID 4
 Parity bits help to reconstruct the data if at most one disk's data is lost.

Cons of RAID 4
 Extra space for parity is required.
 If data is lost from more than one disk, the parity cannot help us reconstruct it.

RAID 5

RAID 5 is similar to RAID 4, with only one difference: the parity rotates among the disks.

Let's see an example of RAID 5 implementation:

DISK 0 DISK 1 DISK 2 DISK 3
0 1 0 P0
1 1 P1 0
1 P2 0 1
P3 1 0 0

Here we can see the parity block rotating from Disk 3 down to Disk 0 across successive stripes.
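The rotation in the table can be expressed as a small placement rule (a sketch assuming the left-rotating layout shown above):

```python
N_DISKS = 4

def parity_disk(stripe):
    # parity starts on the last disk and moves one disk left per stripe
    return (N_DISKS - 1 - stripe) % N_DISKS

for s in range(4):
    print(f"stripe {s}: parity on DISK {parity_disk(s)}")
# stripe 0 -> DISK 3 (P0), stripe 1 -> DISK 2 (P1), ..., stripe 3 -> DISK 0 (P3)
```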

Pros of RAID 5
 Parity is distributed over the disks, which improves performance.
 Data can be reconstructed using the parity bits.

Cons of RAID 5
 Parity bits are useful only when data is lost on at most one disk; if blocks are lost on more than one disk, the parity is of no use.
 Extra space for parity is required.

RAID 6

RAID 6 helps when more than one disk failure occurs. In RAID 6 there are two parity blocks in each array/row; it is similar to RAID 5 with an extra parity.

Let's See RAID 6 with the help of an example:

DISK 0 DISK 1 DISK 2 DISK 3
0 1 Q0 P0
1 Q1 P1 0
Q2 P2 0 1
P3 1 0 Q3

Here P0, P1, P2, P3 and Q0, Q1, Q2, Q3 are the two sets of parity blocks used to reconstruct the data if at most two disks fail.

Pros of RAID 6
 The extra parity helps in reconstructing the data of at most two failed disks.

Cons of RAID 6
 Extra space is used for both parities (P and Q).
 More than two disk failures cannot be corrected.
