Data Base Management System
Unit 4
Transaction
A transaction can be defined as a group of tasks. A single task is the minimum
processing unit, which cannot be divided further.
o A transaction is a set of logically related operations; it contains a group of tasks.
o A transaction is an action, or a series of actions, performed by a single user to access
the contents of the database.
Example: Suppose a bank employee transfers Rs 800 from X's account to Y's account. This
small transaction consists of several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
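The low-level tasks above can be sketched as a single unit of work. This is an illustrative sketch, not real banking code: the in-memory `accounts` dictionary, the starting balances, and the `transfer` function are assumptions.

```python
# Illustrative sketch of the transfer as one transaction (assumed in-memory model).
accounts = {"X": 4000, "Y": 1000}  # assumed starting balances

def transfer(src, dst, amount):
    old_src = accounts[src]           # Old_Balance = X.balance
    old_dst = accounts[dst]           # Old_Balance = Y.balance
    accounts[src] = old_src - amount  # X.balance = New_Balance
    accounts[dst] = old_dst + amount  # Y.balance = New_Balance

transfer("X", "Y", 800)
print(accounts["X"], accounts["Y"])  # 3200 1800
```

If any step between the two writes fails, the accounts are left inconsistent, which is why the whole sequence must be treated as one transaction.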
Operations of Transaction
The main operations of a transaction are:
Read(X): reads the value of X from the database and stores it in a buffer in main memory.
Write(X): writes the value of X from the buffer back to the database.
Let's take as an example a debit transaction on an account, which consists of the following
operations:
1. R(X);
2. X = X - 500;
3. W(X);
For example, assume X's balance is 4000. If the debit transaction fails after executing
operation 2 but before operation 3, the new value 3500 is never written back, so X's value
remains 4000 in the database, which is not acceptable to the bank.
ACID Properties
A transaction is a very small unit of a program, and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as the ACID properties − in order to ensure accuracy,
completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit:
either all of its operations are executed or none of them are. There must be no state in
which the database is left with a partially completed transaction; the database state is
defined either before the execution of the transaction or after its
execution/abortion/failure.
• Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
• Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be
updated once the system springs back into action.
• Isolation − In a database system where more than one transaction is being executed
simultaneously, the isolation property states that each transaction will be carried out
and executed as if it were the only transaction in the system. No transaction will affect
the existence of any other transaction.
States of Transactions
A transaction in a database can be in one of the following states −
• Active − In this state, the transaction is being executed. This is the initial state of every
transaction.
• Partially Committed − When a transaction executes its final operation, it is said to be in
a partially committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has reached a failed state, then
the recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can select
one of the two operations after a transaction aborts −
o Re-start the transaction
o Kill the transaction
• Committed − If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently established on the database system.
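The states and their allowed transitions can be sketched as a small lookup table. The state names and the `TRANSITIONS` mapping below are an assumed encoding of the description above, not part of any DBMS API.

```python
# Allowed transitions between transaction states (a sketch).
TRANSITIONS = {
    "active": {"partially_committed", "failed"},
    "partially_committed": {"committed", "failed"},
    "failed": {"aborted"},
    "aborted": set(),     # terminal: re-start or kill the transaction
    "committed": set(),   # terminal: effects are permanent
}

def can_move(state, new_state):
    """Check whether a transaction may move from state to new_state."""
    return new_state in TRANSITIONS[state]

print(can_move("active", "partially_committed"))  # True
print(can_move("failed", "committed"))            # False
```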
Transaction Properties
A transaction has four properties, which are used to maintain consistency in the database
before and after the transaction:
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that all operations of the transaction take place at once; if not, the transaction
is aborted.
o There is no midway: the transaction cannot occur partially. Each transaction is treated
as one unit and either runs to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts, none of the changes it made are visible.
Commit: If a transaction commits, all the changes it made are visible.
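A minimal sketch of these two operations, under an assumed model in which each transaction writes to a private buffer: abort discards the buffer, and commit publishes all of it at once. The `database` dictionary and `Txn` class are illustrative, not a real DBMS interface.

```python
# Sketch: changes go to a private buffer; commit publishes them, abort discards them.
database = {"A": 100}

class Txn:
    def __init__(self):
        self.buffer = {}
    def write(self, key, value):
        self.buffer[key] = value      # change is not yet visible to anyone
    def commit(self):
        database.update(self.buffer)  # all changes become visible at once
    def abort(self):
        self.buffer.clear()           # no change ever becomes visible

t = Txn()
t.write("A", 50)
print(database["A"])  # 100  (uncommitted change is not visible)
t.commit()
print(database["A"])  # 50
```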
Consistency
o The integrity constraints are maintained so that the database is consistent before and
after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a
new stable state.
o The consistent property of database states that every transaction sees a consistent
database instance.
o The transaction is used to transform the database from one consistent state to another
consistent state.
Isolation
o It ensures that data being used during the execution of one transaction cannot be
used by a second transaction until the first one is completed.
o If transaction T1 is being executed and is using the data item X, then that data item
cannot be accessed by any other transaction T2 until T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o The durability property indicates the permanence of the database's consistent state:
the changes made by a committed transaction are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by a system
failure. When a transaction completes, the database reaches a consistent state, and that
consistent state cannot be lost, even in the event of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.
Example
Let's take two transactions, T1 and T2.
If both transactions are performed without interfering with each other, the schedule is called
a serial schedule. It can be represented as follows −

T1            T2
READ1(A)
WRITE1(A)
READ1(B)
C1
              READ2(B)
              WRITE2(B)
              READ2(B)
              C2
Example
Consider the following example of a non-serial (interleaved) schedule −

T1            T2
READ1(A)
WRITE1(A)
              READ2(B)
              WRITE2(B)
READ1(B)
WRITE1(B)
READ1(B)
Types of Serializability
There are two types of serializability −
1. Conflict serializability
2. View serializability
Conflict serializability is tested using a precedence graph: draw one node per transaction,
and an edge from Ti to Tj whenever an operation of Ti conflicts with, and comes before, an
operation of Tj. If the precedence graph contains a cycle, the schedule is not conflict
serializable.
[Figure: precedence graphs for Examples 1 and 2 omitted. In Example 1, the graph contains a
cycle, so that schedule is not conflict serializable.]
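The precedence-graph test described above can be sketched in a few lines. The encoding of a schedule as a list of (transaction, operation, item) tuples is an assumption made for this sketch.

```python
# Conflict-serializability test: build a precedence graph from a schedule and
# look for a cycle. A schedule is a list of (transaction, operation, item).
def precedence_edges(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            # Two operations conflict if they touch the same item, come from
            # different transactions, and at least one of them is a write.
            if x == y and ti != tj and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def reachable(start, goal, seen=None):
        seen = seen or set()
        for nxt in graph.get(start, ()):
            if nxt == goal or (nxt not in seen and reachable(nxt, goal, seen | {nxt})):
                return True
        return False
    return any(reachable(t, t) for t in graph)

# T1 and T2 write the same item A in an interleaved order: the graph is cyclic.
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A"), ("T2", "R", "A")]
print(has_cycle(precedence_edges(s)))  # True
```

A cyclic graph means the schedule is not conflict serializable; an acyclic graph means it is equivalent to some serial order.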
Transaction Failure
A transaction failure occurs when a transaction fails to execute or reaches a point from which
it cannot proceed any further. If only a few transactions or processes are affected, this is
called a transaction failure. Its main causes are:
1. Logical errors: If a transaction cannot complete due to a code error or an internal
error condition, a logical error occurs.
2. System errors: These occur when the DBMS itself terminates an active transaction
because the database system is unable to execute it. For example, the system
aborts an active transaction in case of deadlock or resource unavailability.
Recoverability in DBMS
Irrecoverable Schedules
If a transaction performs a dirty read (reads a value written by an uncommitted transaction)
and commits before the transaction from which it read that value, then the schedule is called
an irrecoverable schedule.
Example

T1            T2
Read(A)
Write(A)
              Read(A)   (dirty read)
              Write(A)
              Commit
Rollback

Here T2 commits before T1, from which it performed the dirty read, and T1 then rolls back;
T2's commit cannot be undone, so the schedule is irrecoverable.
Recoverable Schedules
If a transaction performs a dirty read from an uncommitted transaction, but its commit is
delayed until the uncommitted transaction either commits or rolls back, then the schedule is
called a recoverable schedule.
Example

T1            T2
Read(A)
Write(A)
              Read(A)   (dirty read)
              Write(A)
Commit
              Commit    (delayed until after T1's commit)
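The recoverability condition above can be sketched as a check over a schedule: any transaction that dirty-reads another's write must not commit before that writer commits. The (transaction, action, item) tuple encoding is an assumption made for this sketch.

```python
# Recoverability sketch: Tj reading Ti's uncommitted write must commit after Ti.
# A schedule is a list of (transaction, action, item); action "C" is commit.
def is_recoverable(schedule):
    commit_pos = {t: i for i, (t, a, _) in enumerate(schedule) if a == "C"}
    end = len(schedule)          # stands for "never committed"
    last_writer = {}
    for i, (t, a, x) in enumerate(schedule):
        if a == "W":
            last_writer[x] = t
        elif a == "R" and x in last_writer:
            w = last_writer[x]
            # Dirty read: the writer had not yet committed at position i.
            if w != t and commit_pos.get(w, end) > i:
                # The reader must not commit before the writer does.
                if commit_pos.get(t, end) < commit_pos.get(w, end):
                    return False
    return True

good = [("T1", "W", "A"), ("T2", "R", "A"), ("T1", "C", None), ("T2", "C", None)]
bad  = [("T1", "W", "A"), ("T2", "R", "A"), ("T2", "C", None), ("T1", "C", None)]
print(is_recoverable(good), is_recoverable(bad))  # True False
```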
Log-Based Recovery
Let's assume there is a transaction to modify the City of a student. The following log records
are written for this transaction:
o When the transaction is initiated, it writes a 'start' log record:
<Tn, Start>
o When the transaction modifies the City from 'Noida' to 'Bangalore', another log record is
written to the file:
<Tn, City, 'Noida', 'Bangalore'>
o When the transaction is finished, it writes another log record to indicate the end of the
transaction:
<Tn, Commit>
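The three log records above can be sketched as tuples appended to a log before the change is applied. Representing records as Python tuples and naming the function `update_city` are assumptions of this sketch.

```python
# Sketch: the log records for the City update, appended in order.
log = []

def update_city(tn, old, new):
    log.append((tn, "Start"))           # <Tn, Start>
    log.append((tn, "City", old, new))  # records both old and new values
    log.append((tn, "Commit"))          # <Tn, Commit>

update_city("T1", "Noida", "Bangalore")
print(log[0], log[-1])  # ('T1', 'Start') ('T1', 'Commit')
```

Because the record holds both the old and the new value, recovery can either undo (restore 'Noida') or redo (re-apply 'Bangalore') the change.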
Checkpoint
o A checkpoint is a mechanism by which all the previous log records are removed from
the system and stored permanently on the storage disk.
o A checkpoint is like a bookmark. During the execution of transactions, such checkpoints
are marked, and as the transactions execute their steps, log records are created.
o When a checkpoint is reached, the transactions' updates are written into the database,
and the log records up to that point are removed from the log file. The log file is then
updated with the steps of subsequent transactions until the next checkpoint, and so on.
o A checkpoint declares a point before which the DBMS was in a consistent state and all
transactions were committed.
Recovery using Checkpoint
Suppose transactions T1 to T4 execute around a checkpoint and the system then fails. The
recovery system recovers the database from this failure in the following manner:
o The recovery system reads the log file from the end to the start, i.e., from T4 back to T1.
o The recovery system maintains two lists: a redo-list and an undo-list.
o A transaction is put into the redo-list if the recovery system finds a log with both
<Tn, Start> and <Tn, Commit>, or with just <Tn, Commit>. All transactions in the
redo-list are redone.
o For example: in the log file, transactions T2 and T3 have both <Tn, Start> and
<Tn, Commit>. Transaction T1 has only <Tn, Commit>, because it started before the
checkpoint and committed after the checkpoint was crossed. Hence the recovery system
puts T1, T2, and T3 into the redo-list.
o A transaction is put into the undo-list if the recovery system finds a log with <Tn, Start>
but no commit or abort record. All transactions in the undo-list are undone, and their
log records are removed.
o For example: transaction T4 has only <Tn, Start>, so T4 is put into the undo-list, since it
did not complete before the failure.
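The redo/undo classification described above can be sketched as a single scan over the log. The log encoding as (transaction, record) tuples is an assumption of this sketch.

```python
# Sketch: classify transactions after a crash into redo and undo lists.
# The log is a list of (transaction, record) entries, oldest first.
def classify(log):
    started, committed = set(), set()
    for tn, rec in log:
        if rec == "Start":
            started.add(tn)
        elif rec == "Commit":
            committed.add(tn)
    redo = committed            # has <Tn, Commit> (with or without <Tn, Start>)
    undo = started - committed  # has <Tn, Start> but no commit/abort record
    return redo, undo

log = [("T1", "Commit"),                    # T1 started before the checkpoint
       ("T2", "Start"), ("T2", "Commit"),
       ("T3", "Start"), ("T3", "Commit"),
       ("T4", "Start")]                     # crash before T4 finished
redo, undo = classify(log)
print(sorted(redo), sorted(undo))  # ['T1', 'T2', 'T3'] ['T4']
```

This matches the example in the text: T1, T2, and T3 are redone, while T4 is undone.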
Types of Checkpoints
There are basically two main types of Checkpoints:
1. Automatic Checkpoint
2. Manual Checkpoint
1. Automatic Checkpoint: These checkpoints occur automatically at regular intervals, such
as every hour or every day, with the interval set by the database administrator. They are
generally used by heavily updated databases, so that the data can be recovered easily in
case of failure.
2. Manual Checkpoint: These checkpoints are triggered manually by the database
administrator. They are generally used for smaller databases and occur much less
frequently, only when the administrator initiates them.
Advantages of Checkpoints
• Checkpoints help us recover the transactions of the database in case of a random
shutdown of the database.
• They enhance the consistency of the database when multiple transactions are executing
in the database simultaneously.
• They speed up the data recovery process.
• Checkpoints work as a synchronization point between the database and the transaction
log file.
• Checkpoint records in the log file are used to prevent unnecessary redo operations.
• Since dirty pages are flushed out continuously in the background, checkpointing has a
very low overhead and can be done frequently.
• Checkpoints provide the baseline information needed for the restoration of the lost state in
the event of a system failure.
• A database checkpoint keeps track of change information and enables incremental database
backup.
• A database storage checkpoint can be mounted, allowing regular file system operations to
be performed.
• Database checkpoints can be used for application solutions which include backup, recovery
or database modifications.
Disadvantages of Checkpoints
1. Database storage checkpoints can only be used to restore from logical errors (e.g., a
human error).
2. Because all the data blocks are on the same physical device, database storage checkpoints
cannot be used to restore files due to a media failure.
Deadlock
A deadlock occurs when a set of transactions wait for each other in a cycle. For example, in
the following wait-for-graph, transaction T1 is waiting for data item X, which is locked by T3;
T3 is waiting for Y, which is locked by T2; and T2 is waiting for Z, which is locked by T1.
Hence, a waiting cycle is formed, and none of the transactions can proceed.
[Figure: wait-for-graph with the cycle T1 -> T3 -> T2 -> T1 omitted.]
Deadlock Handling
There are three classical approaches for deadlock handling, namely −
• Deadlock prevention.
• Deadlock avoidance.
• Deadlock detection and removal.
All of the three approaches can be incorporated in both a centralized and a distributed
database system.
Deadlock Prevention
The deadlock prevention approach does not allow any transaction to acquire locks that will
lead to deadlocks. The convention is that when more than one transaction requests a lock on
the same data item, only one of them is granted the lock.
One of the most popular deadlock prevention methods is pre-acquisition of all the locks. In
this method, a transaction acquires all the locks before starting to execute and retains the
locks for the entire duration of transaction. If another transaction needs any of the already
acquired locks, it has to wait until all the locks it needs are available. Using this approach, the
system is prevented from being deadlocked since none of the waiting transactions are holding
any lock.
Deadlock Avoidance
The deadlock avoidance approach handles deadlocks before they occur. It analyzes the
transactions and the locks to determine whether or not waiting leads to a deadlock.
The method can be briefly stated as follows. Transactions start executing and request data
items that they need to lock. The lock manager checks whether the lock is available. If it is
available, the lock manager allocates the data item and the transaction acquires the lock.
However, if the item is locked by some other transaction in incompatible mode, the lock
manager runs an algorithm to test whether keeping the transaction in waiting state will cause
a deadlock or not. Accordingly, the algorithm decides whether the transaction can wait or one
of the transactions should be aborted.
There are two algorithms for this purpose, namely wait-die and wound-wait. Let us assume
that there are two transactions, T1 and T2, where T1 tries to lock a data item which is already
locked by T2. The algorithms are as follows –
• Wait-Die − If T1 is older than T2, T1 is allowed to wait. Otherwise, if T1 is younger than T2,
T1 is aborted and later restarted.
• Wound-Wait − If T1 is older than T2, T2 is aborted and later restarted. Otherwise, if T1 is
younger than T2, T1 is allowed to wait.
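The two rules can be sketched as simple timestamp comparisons, where an older transaction has a smaller timestamp. The function names and string return values below are assumptions made for this sketch.

```python
# Sketch: T1 requests a lock held by T2; older = smaller timestamp.
def wait_die(ts1, ts2):
    # Wait-Die: an older requester waits; a younger requester dies (aborts).
    return "wait" if ts1 < ts2 else "abort"

def wound_wait(ts1, ts2):
    # Wound-Wait: an older requester wounds (aborts) the holder;
    # a younger requester waits.
    return "abort holder" if ts1 < ts2 else "wait"

print(wait_die(1, 2), wait_die(2, 1))      # wait abort
print(wound_wait(1, 2), wound_wait(2, 1))  # abort holder wait
```

In both schemes the aborted transaction is later restarted with its original timestamp, so it eventually becomes the oldest and cannot be starved.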
Concurrency Control
When multiple transactions execute at the same time, their operations interleave; this is
known as concurrent execution. Concurrency control is the working concept required for
controlling and managing the concurrent execution of database operations, thereby avoiding
inconsistencies in the database. To maintain the consistency of the database under
concurrency, we use concurrency control protocols.
Lock management
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. With a shared lock, the data item can only be read
by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock
cannot update the data item.
2. Exclusive lock:
o With an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: under it, multiple transactions cannot modify the same data item
simultaneously.
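The compatibility of the two lock modes can be sketched as a tiny lock table: many shared (S) locks may coexist on an item, but an exclusive (X) lock is incompatible with any other lock. The `locks` structure and `request` function are assumptions of this sketch, and a real lock manager would also queue waiters and release locks.

```python
# Sketch of a lock table: item -> list of (transaction, mode) entries.
locks = {}

def request(txn, item, mode):
    """Try to grant a lock; return False if the caller must wait."""
    held = locks.setdefault(item, [])
    if mode == "S" and all(m == "S" for _, m in held):
        held.append((txn, "S"))   # shared locks coexist with shared locks
        return True
    if mode == "X" and not held:
        held.append((txn, "X"))   # exclusive lock requires no other holder
        return True
    return False

print(request("T1", "A", "S"))  # True
print(request("T2", "A", "S"))  # True  (shared locks coexist)
print(request("T3", "A", "X"))  # False (must wait for the readers)
```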
Database Security
Security of databases refers to the array of controls, tools, and procedures designed to ensure
and safeguard confidentiality, integrity, and availability. This section concentrates on
confidentiality, because it is the component most at risk in data security breaches.
Security for databases must cover and safeguard the following aspects: the data in the
database, the database management system, any associated applications, the physical or
virtual database server and its underlying hardware, and the computing and network
infrastructure used to access the database.
Security of databases is a complicated and challenging task that involves all aspects of
security practices and technologies, and it is inherently at odds with the accessibility of
databases: the more usable and accessible the database is, the more susceptible it is to
security threats; and the more it is hardened against threats and attacks, the more difficult it
becomes to access and utilize.
Distributed databases
A distributed database is a collection of multiple interconnected databases, which are spread
physically across various locations that communicate via a computer network.
• Databases in the collection are logically interrelated with each other. Often they
represent a single logical database.
• Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
• The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it is not synonymous
with a transaction processing system.
A distributed database is basically a database that is not limited to one system; it is spread
over different sites, i.e., on multiple computers or over a network of computers. A distributed
database system is located on various sites that don't share physical components. This may
be required when a particular database needs to be accessed by various users globally. It
needs to be managed such that, to the users, it looks like one single database.
Types of Distributed Databases
1. Homogeneous Database:
In a homogeneous distributed database, all sites store data using identical DBMS software
and operating systems; the sites are aware of each other and cooperate to process user
requests.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schemas and
software, which can lead to problems in query processing and transactions. Also, a particular
site might be completely unaware of the other sites. Different computers may use different
operating systems and different database applications; they may even use different data
models for the database. Hence, translations are required for different sites to communicate.
Object-Oriented Databases
1. Object Structure:
The structure of an object refers to the properties that an object is made up of. These
properties of an object are referred to as attributes. Thus, an object is a real-world entity
with certain attributes that make up the object structure. An object also encapsulates data
and code into a single unit, which provides data abstraction by hiding the implementation
details from the user.
The object structure is further composed of three types of components: Messages, Methods,
and Variables. These are explained below.
1. Messages –
A message provides an interface or acts as a communication medium between an object
and the outside world. A message can be of two types:
• Read-only message: If the invoked method does not change the value of a variable,
then the invoking message is said to be a read-only message.
• Update message: If the invoked method changes the value of a variable, then the
invoking message is said to be an update message.
2. Methods –
When a message is passed then the body of code that is executed is known as a method.
Whenever a method is executed, it returns a value as output. A method can be of two
types:
• Read-only method: When the value of a variable is not affected by a method, it is
known as a read-only method.
• Update method: When the value of a variable is changed by a method, it is known
as an update method.
3. Variables –
Variables store the data of an object. The data stored in the variables makes one object
distinguishable from another.
2. Object Classes:
An object, which is a real-world entity, is an instance of a class. Hence, we first need to define
a class; then objects are created that differ in the values they store but share the same class
definition. The objects in turn respond to various messages and hold their own variables.
Example –
class CLERK
{
    // Variables
    string name;
    string address;
    int id;
    int salary;

    // Messages
    string get_name();
    string get_address();
    int annual_salary();
};
In the above example, we can see, CLERK is a class that holds the object variables and
messages.
Features of ODBMS:
Object-oriented data model: ODBMS uses an object-oriented data model to store and
manage data. This allows developers to work with data in a more natural way, as objects are
similar to the objects in the programming language they are using.
Complex data types: ODBMS supports complex data types such as arrays, lists, sets, and
graphs, allowing developers to store and manage complex data structures in the database.
Automatic schema management: ODBMS automatically manages the schema of the
database, as the schema is defined by the classes and objects in the application code. This
eliminates the need for a separate schema definition language and simplifies the
development process.
High performance: ODBMS can provide high performance, especially for applications that
require complex data access patterns, as objects can be retrieved with a single query.
Data integrity: ODBMS provides strong data integrity, as the relationships between objects
are maintained by the database. This ensures that data remains consistent and correct, even
in complex applications.
Concurrency control: ODBMS provides concurrency control mechanisms that ensure that
multiple users can access and modify the same data without conflicts.
Scalability: ODBMS can scale horizontally by adding more servers to the database cluster,
allowing it to handle large volumes of data.
Support for transactions: ODBMS supports transactions, which ensure that multiple
operations on the database are atomic and consistent.
Advantages:
Supports Complex Data Structures: ODBMS is designed to handle complex data structures,
such as inheritance, polymorphism, and encapsulation. This makes it easier to work with
complex data models in an object-oriented programming environment.
Improved Performance: ODBMS provides improved performance compared to traditional
relational databases for complex data models. ODBMS can reduce the amount of mapping
and translation required between the programming language and the database, which can
improve performance.
Reduced Development Time: ODBMS can reduce development time since it eliminates the
need to map objects to tables and allows developers to work directly with objects in the
database.
Supports Rich Data Types: ODBMS supports rich data types, such as audio, video, images, and
spatial data, which can be challenging to store and retrieve in traditional relational databases.
Scalability: ODBMS can scale horizontally and vertically, which means it can handle larger
volumes of data and can support more users.
Disadvantages:
Limited Adoption: ODBMS is not as widely adopted as traditional relational databases, which
means it may be more challenging to find developers with experience working with ODBMS.
Lack of Standardization: ODBMS lacks standardization, which means that different vendors
may implement different features and functionality.
Cost: ODBMS can be more expensive than traditional relational databases since it requires
specialized software and hardware.
Integration with Other Systems: ODBMS can be challenging to integrate with other systems,
such as business intelligence tools and reporting software.
Scalability Challenges: ODBMS may face scalability challenges due to the complexity of the
data models it supports, which can make it challenging to partition data across multiple
nodes.