
UNIT-4

Transaction processing
Transaction processing is a style of computing, typically performed by large server
computers, that supports interactive applications. In transaction processing, work is divided
into individual, indivisible operations, called transactions. By contrast, batch processing is a
style of computing in which one or more programs process a series of records (a batch)
with little or no action from the user or operator.

A transaction processing system allows application programmers to concentrate on writing code
that supports the business, by shielding application programs from the details of transaction
management:

 It manages the concurrent processing of transactions.


 It enables the sharing of data.
 It ensures the integrity of data.
 It manages the prioritization of transaction execution.

When a transaction starts processing, CICS runs a program that is associated with the
transaction. That program can transfer control to other programs in the course of the
transaction, making it possible to assemble modular applications consisting of many CICS
programs.

At any time, in a CICS system, many instances of a transaction can run at the same time. A
single instance of a running transaction is known as a task.

During the time that a task is running, it has exclusive use of (or holds a lock for) each data
resource that it changes, ensuring the isolation property of the transaction. Other tasks that
require access to the resources must wait until the lock is released. To ensure overall
responsiveness of the transaction processing system, you design your CICS application
programs so that they hold locks for the shortest possible time. For example, you can design
your transactions to be short-lived because CICS releases locks when a task ends.

 ACID properties of transactions


In the context of transaction processing, the acronym ACID refers to the four key properties
of a transaction: atomicity, consistency, isolation, and durability.
 Commit and rollback
To assure the ACID properties of a transaction, any changes made to data in the course of a
transaction must be committed or rolled back.
 Recovery and restart
CICS continually records information about the state of the region and about the state of each
unit of work in the region. This information is preserved and used when a region is restarted,
thus enabling CICS to restart with no loss of data integrity.

DBMS Concurrency Control


Concurrency Control is the management procedure that is required for controlling
concurrent execution of the operations that take place on a database.

But before knowing about concurrency control, we should know about concurrent
execution.

Concurrent Execution in DBMS


o In a multi-user system, multiple users can access and use the same database at one
time, which is known as concurrent execution of the database. It means that the
same database is accessed simultaneously by different users on a multi-user system.
o While working on database transactions, there is often a requirement for the
database to be used by multiple users performing different operations, and in that
case, concurrent execution of the database takes place.
o This simultaneous execution should be performed in an interleaved manner, and no
operation should affect the other executing operations, so that the consistency of
the database is maintained. However, when transaction operations are executed
concurrently, several challenging problems occur that need to be solved.

Problems with Concurrent Execution


In a database transaction, the two main operations are READ and WRITE. These two
operations need to be managed during the concurrent execution of transactions,
because if their interleaving is not controlled, the data may become inconsistent.
The following problems can occur with the concurrent execution of operations:

Problem 1: Lost Update Problems (W - W Conflict)


The problem occurs when two different database transactions perform read/write
operations on the same database items in an interleaved manner (i.e., concurrent
execution), making the values of the items incorrect and hence the database
inconsistent.

Consider the following sequence of operations, where two transactions TX and TY are
performed on the same account A, whose balance is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be $300
only because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only added but
not updated/write).
o At time t6, transaction TX writes the value of account A that will be updated as $250
only, as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the values of account A, so it will write as
done at time t4 that will be $400. It means the value written by TX is lost, i.e., $250 is
lost.

Hence the data becomes incorrect, and the database is left in an inconsistent state.
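
The interleaving described above can be reproduced in a few lines of code. The following Python sketch (the account dictionary and local variables are illustrative devices, not part of the original example) shows TX's update being overwritten by TY:

# Minimal sketch of the lost update problem (W-W conflict).
# Each "transaction" reads the balance into a local variable,
# computes a new value, and writes it back later.

balance = {"A": 300}          # shared data item: account A

tx_local = balance["A"]       # t1: TX reads A            -> local copy 300
tx_local -= 50                # t2: TX deducts 50 locally -> 250 (not yet written)
ty_local = balance["A"]       # t3: TY reads A            -> still 300, TX has not written
ty_local += 100               # t4: TY adds 100 locally   -> 400 (not yet written)
balance["A"] = tx_local       # t6: TX writes its value   -> A = 250
balance["A"] = ty_local       # t7: TY writes its value   -> A = 400, TX's update is lost

print(balance["A"])           # 400, but the correct result of both updates is 350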

Dirty Read Problems (W-R Conflict)


The dirty read problem occurs when one transaction updates an item of the database,
the transaction then fails, and before the data gets rolled back, the updated
database item is accessed by another transaction. This creates a write-read conflict
between the two transactions.

For example:

Consider two transactions TX and TY performing
read/write operations on account A, where the available balance in account A is
$300:

o At time t1, transaction TX reads the value of account A, i.e., $300.


o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value
changes back to $300 (as initially).
o But transaction TY has already read the value $350 for account A, a value that was
never committed; this is a dirty read and is therefore known as the Dirty Read Problem.

Unrepeatable Read Problem (R-W Conflict)


Also known as Inconsistent Retrievals Problem that occurs when in a transaction, two
different values are read for the same database item.

For example:
Consider two transactions, TX and TY, performing the read/write operations on
account A, having an available balance = $300. The diagram is shown below:

o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that
will be read as $400.
o It means that within the same transaction TX, two different values of account
A are read: $300 initially, and $400 after the update made by transaction TY. This is an
unrepeatable read and is therefore known as the Unrepeatable Read Problem.

Thus, in order to maintain consistency in the database and avoid such problems that
take place in concurrent execution, management is needed, and that is where the
concept of Concurrency Control comes into role.

Concurrency Control
Concurrency Control is the working concept that is required for controlling and
managing the concurrent execution of database operations and thus avoiding the
inconsistencies in the database. Thus, for maintaining the concurrency of the
database, we have the concurrency control protocols.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of database transactions.
These protocols are categorized as follows:

o Lock Based Concurrency Control Protocol


o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol

Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:

1. Shared lock:

o It is also known as a read-only lock. Under a shared lock, the data item can only be read by
the transaction.
o It can be shared between transactions because a transaction that holds only a shared lock
cannot update the data item.

2. Exclusive lock:

o Under an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: it prevents multiple transactions from modifying the same
data item simultaneously.
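
As a rough illustration (not part of the original notes), the compatibility between the two lock modes can be written as a small table in Python; the function name can_grant and the mode names "S" and "X" are assumptions made for this sketch:

# Sketch of shared (S) / exclusive (X) lock compatibility.
# S is compatible only with S; X is compatible with nothing.

COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested_mode, held_modes):
    # A lock request is granted only if it is compatible with every lock
    # currently held on the item by other transactions.
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: many readers may share the item
print(can_grant("X", ["S"]))       # False: a writer must wait for the readers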

There are four types of lock protocols available:


1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based
protocols require every transaction to obtain a lock on the data before it inserts,
deletes or updates it. The data item is unlocked after the transaction completes.

2. Pre-claiming Lock Protocol


o Pre-claiming lock protocols evaluate the transaction to list all the data items on
which it needs locks.
o Before initiating execution of the transaction, it requests the DBMS for locks on
all of those data items.
o If all the locks are granted, this protocol allows the transaction to begin. When
the transaction is completed, it releases all the locks.
o If all the locks are not granted, the transaction rolls back and waits
until all the locks are granted.

3. Two-phase locking (2PL)


o The two-phase locking protocol divides the execution phase of the transaction into
three parts.
o In the first part, when the execution of the transaction starts, it seeks permission for
the lock it requires.
o In the second part, the transaction acquires all the locks. The third phase is started as
soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only releases the
acquired locks.
There are two phases of 2PL:

Growing phase: In the growing phase, a new lock on the data item may be acquired
by the transaction, but none can be released.

Shrinking phase: In the shrinking phase, existing locks held by the transaction may
be released, but no new locks can be acquired.

If lock conversion is allowed, then the following conversions can
happen:

1. Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.


2. Downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.

Example:
The following way shows how unlocking and locking work with 2-PL.

Transaction T1:

o Growing phase: from step 1-3


o Shrinking phase: from step 5-7
o Lock point: at 3

Transaction T2:

o Growing phase: from step 2-6


o Shrinking phase: from step 8-9
o Lock point: at 6
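
A minimal Python sketch of the 2PL rule itself, assuming a simple Transaction2PL class invented for illustration: once the transaction releases its first lock (its lock point), any further lock request is treated as a protocol violation.

# Sketch of two-phase locking for a single transaction.
# After the first unlock, the transaction is in its shrinking phase
# and may not acquire any new locks.

class TwoPhaseLockingError(Exception):
    pass

class Transaction2PL:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name}: cannot acquire a lock on {item} in the shrinking phase")
        self.locks.add(item)     # growing phase: locks may only be acquired

    def unlock(self, item):
        self.locks.discard(item)
        self.shrinking = True    # lock point passed: locks may only be released

t1 = Transaction2PL("T1")
t1.lock("A")
t1.lock("B")      # growing phase
t1.unlock("A")    # lock point: shrinking phase begins
# t1.lock("C")    # would raise TwoPhaseLockingError: violates 2PL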
4. Strict Two-phase locking (Strict-2PL)
o The first phase of Strict-2PL is similar to 2PL. In the first phase, after acquiring all the
locks, the transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does not release a
lock immediately after using it.
o Strict-2PL waits until the whole transaction commits, and then it releases all the
locks at once.
o The Strict-2PL protocol does not have a gradual shrinking phase of lock release.

Unlike basic 2PL, it does not suffer from cascading aborts.

Timestamp Ordering Protocol


o The Timestamp Ordering Protocol is used to order the transactions based on
their timestamps. The order of the transactions is simply the ascending order
of their creation.
o The older transaction has the higher priority, which is why it executes first. To
determine the timestamp of a transaction, this protocol uses the system time or a
logical counter.
o A lock-based protocol manages the order between conflicting
pairs of transactions at execution time, whereas timestamp-based
protocols start working as soon as a transaction is created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1
entered the system at time 007 and transaction T2 entered the
system at time 009. T1 has the higher priority, so it executes first because it
entered the system first.
o The timestamp ordering protocol also maintains the timestamps of the last 'read'
and 'write' operation on each data item.

Basic Timestamp ordering protocol works as follows:

1. Check the following conditions whenever a transaction Ti issues a Read(X) operation:

o If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
o If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to the
maximum of R_TS(X) and TS(Ti).

2. Check the following conditions whenever a transaction Ti issues a Write(X) operation:

o If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back;
otherwise the operation is executed and W_TS(X) is set to TS(Ti).

Where,

TS(Ti) denotes the timestamp of the transaction Ti.

R_TS(X) denotes the Read time-stamp of data-item X.

W_TS(X) denotes the Write time-stamp of data-item X.
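
The rules above can be summarised in a short Python sketch. The Item class and the use of an exception to signal rollback are devices assumed for illustration only:

# Sketch of the basic timestamp ordering checks for Read(X) and Write(X).

class Rollback(Exception):
    pass

class Item:
    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0    # R_TS(X): largest timestamp that has read X
        self.w_ts = 0    # W_TS(X): largest timestamp that has written X

def read(item, ts):
    if item.w_ts > ts:                  # a younger transaction already wrote X
        raise Rollback("read rejected")
    item.r_ts = max(item.r_ts, ts)      # record the read timestamp
    return item.value

def write(item, ts, value):
    if ts < item.r_ts or ts < item.w_ts:   # X already read/written by a younger transaction
        raise Rollback("write rejected")
    item.value = value
    item.w_ts = ts

x = Item(300)
write(x, ts=1, value=350)    # allowed: W_TS(x) becomes 1
print(read(x, ts=2))         # allowed: prints 350, R_TS(x) becomes 2
# write(x, ts=1, value=0)    # rejected: a younger transaction (ts=2) has read x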

Advantages and Disadvantages of TO protocol:


o The TO protocol ensures serializability, since every edge in the precedence graph points
from an older transaction to a younger one, so the graph can never contain a cycle.
o The TO protocol ensures freedom from deadlock, since no transaction ever
waits.
o But the schedule may not be recoverable and may not even be cascade-free.

Validation Based Protocol


The validation-based protocol is also known as the optimistic concurrency control technique. In the
validation-based protocol, a transaction is executed in the following three phases:

1. Read phase: In this phase, the transaction T is read and executed. It reads the
values of the various data items and stores them in temporary local
variables. It performs all the write operations on temporary variables
without updating the actual database.
2. Validation phase: In this phase, the temporary variable values are validated
against the actual data to check whether serializability would be violated.
3. Write phase: If the validation succeeds, the temporary results are written to the
database; otherwise the transaction is rolled back.

Here each phase has the following different timestamps:

Start(Ti): It contains the time when Ti started its execution.

Validation (Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.

Finish(Ti): It contains the time when Ti finishes its write phase.


o This protocol is used to determine the time stamp for the transaction for
serialization using the time stamp of the validation phase, as it is the actual
phase which determines if the transaction will commit or rollback.
o Hence TS(T) = validation(T).
o The serializability is determined during the validation process. It can't be
decided in advance.
o While executing transactions, this protocol allows a greater degree of concurrency
and produces fewer conflicts.
o Thus it results in fewer transaction rollbacks.
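
Below is a minimal Python sketch of the three phases, assuming a simple in-memory dictionary as the database and per-item version counters for validation (both are illustrative devices, not part of the original text):

# Sketch of the validation (optimistic) approach: work on local copies,
# validate against items changed since the transaction started, then write.

db = {"A": 300, "B": 300}
versions = {"A": 0, "B": 0}     # bumped on every committed write

def run_optimistic(read_set, updates):
    # Read phase: copy values and remember the versions we saw.
    snapshot = {k: versions[k] for k in read_set}
    local = {k: db[k] for k in read_set}
    new_values = updates(local)          # all writes go to local variables only

    # Validation phase: fail if any item we read has changed in the meantime.
    if any(versions[k] != snapshot[k] for k in read_set):
        return False                     # the transaction is rolled back

    # Write phase: the temporary results are written to the database.
    for k, v in new_values.items():
        db[k] = v
        versions[k] += 1
    return True

ok = run_optimistic({"A", "B"}, lambda d: {"A": d["A"] - 100, "B": d["B"] + 100})
print(ok, db)   # True {'A': 200, 'B': 400}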

Transaction Recovery
Availability is the Holy Grail of database administrators. If your data is not
available, your applications cannot run, and therefore your company is losing
business. Lost business translates into lower profitability and, perhaps, a
lower stock valuation for your company. These are all detrimental to the
business, so the DBA must do everything in his power to ensure that
databases are kept online and operational. This has been the duty of the DBA
since the days of the first DBMS.

But the need for availability is increasing. The days of the long batch window
where databases can be offline for extended periods to perform nightly
processing are diminishing. Exacerbating this trend is the drive toward e-
business. And when your business is on the World Wide Web, it never closes.

These demands for higher availability make traditional forms of database


recovery inadequate. Today’s DBA must understand Transaction Recovery
techniques to be able to prepare an optimal approach for every recovery
situation. This article will cover Transaction Recovery from a DB2 for OS/390
point-of-view.

When DBAs hear the word "recovery," the first thing that usually comes to
mind is handling some sort of disaster. This disaster could be anything from a
simple media failure to a natural disaster destroying your data center.
Applications are completely unavailable until the recovery is complete.

Another traditional type of recovery is a point-in-time (PIT) recovery. PIT


recovery usually is performed to deal with an application level problem.
Conventional techniques to perform a PIT recovery will remove the effects of
all transactions since that specified point in time. This can cause problems if
there were any valid transactions during that timeframe that needed to be
kept.

Transaction Recovery is a third type of recovery that addresses the


shortcomings of the traditional types of recovery: downtime and loss of good
data. Transaction Recovery is an application recovery whereby the effects of
specific transactions during a specified timeframe are removed from the
database.

Historically, recovery was performed primarily to overcome disasters and


hardware failures. However, this is simply not the case anymore. Application
failures, not hardware failures, are the predominant drivers of recovery needs.
Industry analysts at the GartnerGroup estimate that as much as 80 percent of
application errors are due to software failures and human error. Although
hardware and operating system failures were common several years ago,
modern hardware and operating systems are more reliable, with a high mean
time between failure.

In reality, except for disaster recovery tests, very few DBAs need to perform
true disaster recovery. While media does fail, it’s actually quite rare. User
errors and application failures are the most common causes of problems
requiring recovery, and therefore, the primary cause for system unavailability.
Any number of problems can occur at the application level.

As databases grow in size and complexity, so do the chances that bad


transactions will corrupt the data on which your business depends.

Transaction Recovery Defined

Transaction Recovery is the process of removing the undesired effects of


specific transactions from the database. This statement, while simple on the
surface, hides a bevy of complicated details.

Traditional recovery is at the database object level: for example, at the


tablespace or index level. When performing a traditional recovery, a specific
database object is chosen, a backup copy of that object is applied, and then
log entries are applied for changes that occurred after the image copy was
taken. This approach is used to recover the database object to a specific,
desired point in time. If multiple objects must be recovered, this approach is
repeated for each object impacted.

Transaction Recovery allows a user to recover a specific portion of the


tablespace, based on user-defined criteria, so only a portion of the data is
affected. Any associated indexes are automatically recovered as the
transaction is recovered. The transaction may impact data in multiple
tablespaces, too.
A transaction is a set of related operations that, when grouped together,
define a logical unit of work within an application. Transactions are defined by
the user’s view of the process, for example, the set of panels comprising a
new hire operation or perhaps the set of jobs that post to the general ledger.

Using Transaction Recovery, application problems can be addressed more quickly,
thereby maintaining a higher level of data availability.
always need to be taken offline while Transaction Recovery occurs (it
depends on the type of Transaction Recovery being performed).

There are three types of Transaction Recovery: PIT, UNDO and REDO.

Point-in-Time Recovery

Point-in-time recovery is the simplest strategy. It also is the only one


supported by native DB2 utilities. With PIT recovery, you remove all
transactions since a given point in time and then manually reenter the valid
work. The desired result is to maintain "Good Transaction 1" and "Good
Transaction 2," while removing the "Bad Transactions" from the system.

You must be able to determine a common recovery point for a set of


tablespaces. A DB2 QUIESCE works fine, but if that is not available, you will
have to determine a point of consistency to be used for recovery.

After the point-in-time recovery, good transactions are missing from the
database. If the information is still available, the user could rerun or reenter
"Good Transaction 2." Regardless of the type of recovery to be performed, if
the error that caused the recovery to be performed is caught too late,
subsequent processing could have occurred using the "bad data." How to deal
with these types of problems depends on the nature of the data and the type
of updates applied, and needs to be handled on an application by application
basis.

UNDO Transaction Recovery

The second possibility is to deploy UNDO Transaction Recovery. This is the


simplest type of SQL-based Transaction Recovery. It involves generating
UNDO SQL statements to reverse the effect of the transactions in error. To
generate UNDO SQL, the DB2 log is read to find the data modifications that
were applied during a given timeframe and:

• INSERTs are turned into DELETEs

• DELETEs are turned into INSERTs

• UPDATEs are reversed to UPDATE to the old value


To accomplish this transformation, a solution is required that understands the
DB2 log format and can create the SQL needed to undo the data
modifications.

Note that in the case of UNDO Transaction Recovery, the portion of the
database that does not need to be recovered remains undisturbed. When
undoing erroneous transactions, recovery can be done online without suffering
an application or database outage. But, the potential for anomalies causing
failures in the UNDO is certainly a consideration.
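
As a rough sketch of the UNDO transformation (the simplified log-record layout and the table and column names below are assumptions for illustration, not the actual DB2 log format):

# Sketch of turning logged modifications into UNDO SQL statements.
# Log records are simplified to (operation, table, key, old_row, new_row).

def undo_sql(log_record):
    op, table, key, old_row, new_row = log_record
    if op == "INSERT":                       # INSERTs become DELETEs
        return f"DELETE FROM {table} WHERE id = {key};"
    if op == "DELETE":                       # DELETEs become INSERTs
        cols = ", ".join(old_row)
        vals = ", ".join(repr(v) for v in old_row.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals});"
    if op == "UPDATE":                       # UPDATEs restore the old values
        sets = ", ".join(f"{c} = {v!r}" for c, v in old_row.items())
        return f"UPDATE {table} SET {sets} WHERE id = {key};"

log = [
    ("INSERT", "ACCT", 7, None, {"id": 7, "bal": 100}),
    ("UPDATE", "ACCT", 3, {"bal": 250}, {"bal": 400}),
]
# The undo statements are applied in reverse order of the original modifications.
for record in reversed(log):
    print(undo_sql(record))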

REDO Transaction Recovery

The REDO Transaction Recovery strategy is a combination of the first two


recovery techniques we have discussed – but with a twist.

Instead of generating SQL for the bad transaction that we want to eliminate,
we generate the SQL for the transactions we want to save. Then we do a
standard PIT recovery, eliminating all the transactions since the recovery
point. Finally, we reapply the good transactions captured in the first step.

Unlike the UNDO process, which creates SQL statements that are designed to
back out all of the problem transactions, the REDO process creates SQL
statements designed to reapply only the valid transactions from a consistent
point of recovery to the current time. Since the REDO process does not
generate SQL for the problem transactions, performing a recovery and then
executing the REDO SQL can restore the tablespace to a state that does not
include the problem transactions.
Transaction properties
A transaction has four properties. These are used to maintain consistency in a
database, before and after the transaction.

Property of Transaction

1. Atomicity
2. Consistency
3. Isolation
4. Durability

Atomicity

o It states that either all operations of the transaction take place, or none do; if not, the
transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is
treated as one unit and either runs to completion or is not executed at all.

Atomicity involves the following two operations:

Abort: If a transaction aborts then all the changes made are not visible.

Commit: If a transaction commits then all the changes made are visible.

Example: Let's assume the following transaction T consists of T1 and T2. A holds Rs
600 and B holds Rs 300. Transfer Rs 100 from account A to account B.

After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.


T1 T2

Read(A) Read(B)
A := A - 100 B := B + 100
Write(A) Write(B)
If the transaction T fails after the completion of transaction T1 but before completion of
transaction T2, then the amount will be deducted from A but not added to B. This shows the
inconsistent database state. In order to ensure correctness of database state, the transaction
must be executed in entirety.
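
The same transfer can be sketched in Python to show the all-or-nothing behaviour; the accounts dictionary and the simulated failure flag are illustrative only:

# Sketch of atomicity for the transfer: either both writes happen or neither does.

accounts = {"A": 600, "B": 300}

def transfer(amount, fail_after_debit=False):
    before = dict(accounts)          # remember the state for a possible rollback
    try:
        accounts["A"] -= amount      # T1: Read(A), A := A - 100, Write(A)
        if fail_after_debit:
            raise RuntimeError("crash between T1 and T2")
        accounts["B"] += amount      # T2: Read(B), B := B + 100, Write(B)
    except RuntimeError:
        accounts.update(before)      # abort: none of the changes become visible
        return False
    return True                      # commit: all of the changes become visible

print(transfer(100))                         # True  -> {'A': 500, 'B': 400}
print(transfer(100, fail_after_debit=True))  # False -> the partial debit is rolled back
print(accounts)                              # {'A': 500, 'B': 400}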

Consistency

o The integrity constraints are maintained so that the database is consistent before and
after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a
new stable state.
o The consistent property of database states that every transaction sees a consistent
database instance.
o The transaction is used to transform the database from one consistent state to
another consistent state.

For example: The total amount must be maintained before or after the transaction.

Example: Let's assume the following transaction T consists of T1 and T2. A holds Rs 600
and B holds Rs 300. Transfer Rs 100 from account A to
account B.

1. Total before T occurs = 600+300=900


2. Total after T occurs= 500+400=900

Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then
inconsistency will occur.

Isolation

o It shows that the data which is used at the time of execution of a transaction cannot
be used by the second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then
that data item can't be accessed by any other transaction T2 until the transaction T1
ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.

Durability

o The durability property guarantees the permanence of the database's consistent
state. It states that once a transaction completes, its changes are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or by a
system failure. When a transaction is completed, the database reaches a state
known as the consistent state. That consistent state cannot be lost, even in the event
of a system failure.
o The recovery subsystem of the DBMS is responsible for the durability property.

States of Transaction
In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the
records are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is
executed in this state.

Committed

A transaction is said to be in a committed state if it executes all its operations successfully. In


this state, all the effects are now permanently saved on the database system.

Failed state
o If any of the checks made by the database recovery system fails, then the transaction
is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.

Aborted
o If any of the checks fail and the transaction has reached a failed state, then the
database recovery system will make sure that the database is in its previous
consistent state. If it is not, it will abort or roll back the transaction to bring the
database into a consistent state.
o If the transaction fails in the middle of execution, all the operations it has already
executed are rolled back so that the database returns to its previous consistent state.
o After aborting the transaction, the database recovery module will select one of the
two operations:
1. Re-start the transaction
2. Kill the transaction
Schedule
A series of operations from one transaction to another is known as a schedule. It is
used to preserve the order of the operations in each of the individual transactions.

1. Serial Schedule

The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction. In the serial schedule, when the first transaction
completes its cycle, then the next transaction is executed.

For example: Suppose there are two transactions T1 and T2 which have some operations. If
it has no interleaving of operations, then there are the following two possible outcomes:

1. Execute all the operations of T1 followed by all the operations of T2.
2. Execute all the operations of T2 followed by all the operations of T1.

o In the given (a) figure, Schedule A shows the serial schedule where T1 is followed by T2.
o In the given (b) figure, Schedule B shows the serial schedule where T2 is followed by T1.

2. Non-serial Schedule

o If interleaving of operations is allowed, then there will be non-serial schedule.


o It contains many possible orders in which the system can execute the individual
operations of the transactions.
o In the given figure (c) and (d), Schedule C and Schedule D are the non-serial
schedules. It has interleaving of operations.
3. Serializable schedule

o The serializability of schedules is used to find non-serial schedules that allow the
transaction to execute concurrently without interfering with one another.
o It identifies which schedules are correct when executions of the transaction have
interleaving of their operations.
o A non-serial schedule will be serializable if its result is equal to the result of its
transactions executed serially.
Here,

Schedule A and Schedule B are serial schedule.

Schedule C and Schedule D are Non-serial schedule.

Testing of Serializability
Serialization Graph is used to test the Serializability of a schedule.

Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph
is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set
of vertices contains all the transactions participating in the schedule. The set of
edges contains all edges Ti → Tj for which one of the following three conditions holds:

1. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
2. Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
3. Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).
o If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti are
executed before the first instruction of Tj is executed.
o If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the
precedence graph has no cycle, then S is known as serializable.

For example:

Explanation:
Read(A): In T1, no subsequent writes to A, so no new edges
Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges

Precedence graph for schedule S1:

The precedence graph for schedule S1 contains a cycle (T1 → T2 → T3 → T1), which is why
schedule S1 is non-serializable.
Explanation:

Read(A): In T4,no subsequent writes to A, so no new edges


Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5,no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads to C, so no new edges
Write(A): In T5, no subsequent reads to A, so no new edges
Write(B): In T6, no subsequent reads to B, so no new edges
Precedence graph for schedule S2:

The precedence graph for schedule S2 contains no cycle, which is why schedule S2 is
serializable.
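
A precedence graph can be built and checked for cycles mechanically. The following Python sketch (the encoding of a schedule as (transaction, operation, item) triples is an assumption made for illustration) reproduces the edges found for schedule S1 above:

# Sketch of building a precedence graph from a schedule and testing it for a cycle.
# A schedule is a list of (transaction, operation, item) triples, in time order.

def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            conflicting = x == y and ti != tj and "W" in (op_i, op_j)
            if conflicting:
                edges.add((ti, tj))     # Ti must precede Tj in any equivalent serial order
    return edges

def has_cycle(edges):
    nodes = {n for edge in edges for n in edge}
    def reachable(src, dst, seen=()):
        return any(a == src and (b == dst or (b not in seen and
                   reachable(b, dst, seen + (b,)))) for a, b in edges)
    return any(reachable(n, n) for n in nodes)

s1 = [("T2", "W", "B"), ("T3", "R", "B"), ("T3", "W", "C"),
      ("T1", "R", "C"), ("T1", "W", "A"), ("T2", "R", "A")]
print(precedence_graph(s1))             # the edges T2→T3, T3→T1, T1→T2 (in some order)
print(has_cycle(precedence_graph(s1)))  # True -> schedule S1 is not serializable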

Conflict Serializable Schedule

o A schedule is called conflict serializable if, after swapping of non-conflicting
operations, it can be transformed into a serial schedule.
o A schedule is conflict serializable if it is conflict equivalent to a serial
schedule.

Conflicting Operations

Two operations are conflicting if all of the following conditions are satisfied:

1. They belong to separate transactions.
2. They operate on the same data item.
3. At least one of them is a write operation.

Example:

Swapping is possible only if S1 and S2 are logically equal.

If S1 = S2, the operations are non-conflicting.

If S1 ≠ S2, the operations are conflicting.

Conflict Equivalent

In the conflict equivalent, one can be transformed to another by swapping non-conflicting


operations. In the given example, S2 is conflict equivalent to S1 (S1 can be converted to S2
by swapping non-conflicting operations).

Two schedules are said to be conflict equivalent if and only if:

1. They contain the same set of transactions.
2. Each pair of conflicting operations is ordered in the same way in both schedules.

Example:
Read(A)
Write(A)
Read(B)
Write(B)
Read(A)
Write(A)
Read(B)

Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before
starting any operation of T2. Schedule S1 can be transformed into a serial schedule by
swapping non-conflicting operations of S1.

After swapping of non-conflicting operations, schedule S1 becomes a serial schedule.

Hence, S1 is conflict serializable.

View Serializability

o A schedule is view serializable if it is view equivalent to a serial schedule.
o If a schedule is conflict serializable, then it is also view serializable.
o A schedule that is view serializable but not conflict serializable contains blind writes.

View Equivalent

Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:

1. Initial Read

The initial read of both schedules must be the same. Suppose two schedules S1 and S2. In
schedule S1, if a transaction T1 reads the data item A, then in S2, transaction T1 should
also read A.

The above two schedules are view equivalent because the initial read operation in S1 is done by T1
and in S2 it is also done by T1.

2. Updated Read

In schedule S1, if Ti is reading A which is updated by Tj then in S2 also, Ti should read A


which is updated by Tj.
The above two schedules are not view equivalent because, in S1, T3 reads A updated by T2, while in
S2, T3 reads A updated by T1.

3. Final Write

The final write must be the same in both schedules. In schedule S1, if a transaction
T1 performs the final write on A, then in S2 the final write on A should also be done by T1.

The above two schedules are view equivalent because the final write operation in S1 is done by T3 and in
S2, the final write operation is also done by T3.

Example:
Schedule S

With 3 transactions, the total number of possible serial schedules is 3! = 6:

1. S1 = <T1 T2 T3>
2. S2 = <T1 T3 T2>
3. S3 = <T2 T3 T1>
4. S4 = <T2 T1 T3>
5. S5 = <T3 T1 T2>
6. S6 = <T3 T2 T1>

Taking first schedule S1:

Schedule S1

Step 1: Updated read

In both schedules S and S1, there is no read except the initial read, which is why we don't need
to check this condition.

Step 2: Initial Read

The initial read operation in S is done by T1 and in S1, it is also done by T1.

Step 3: Final Write

The final write operation in S is done by T3 and in S1, it is also done by T3. So, S and S1 are
view Equivalent.

The first schedule S1 satisfies all three conditions, so we don't need to check another
schedule.

Hence, view equivalent serial schedule is:

1. T1 → T2 → T3

Multiversion concurrency
Multiversion concurrency control (MCC or MVCC) is a concurrency
control method commonly used by database management systems to provide
concurrent access to the database and, in programming languages, to
implement transactional memory.

Without concurrency control, if someone is reading from a database at the same time as
someone else is writing to it, it is possible that the reader will see a half-written
or inconsistent piece of data. For instance, when making a wire transfer between two bank
accounts if a reader reads the balance at the bank when the money has been withdrawn
from the original account and before it was deposited in the destination account, it would
seem that money has disappeared from the bank. Isolation is the property that provides
guarantees in the concurrent accesses to data. Isolation is implemented by means of
a concurrency control protocol. The simplest way is to make all readers wait until the writer is
done, which is known as a read-write lock. Locks are known to create contention especially
between long read transactions and update transactions. MVCC aims at solving the problem
by keeping multiple copies of each data item. In this way, each user connected to the
database sees a snapshot of the database at a particular instant in time. Any changes made
by a writer will not be seen by other users of the database until the changes have been
completed (or, in database terms: until the transaction has been committed.)

When an MVCC database needs to update a piece of data, it will not overwrite the
original data item with new data, but instead creates a newer version of the data
item. Thus there are multiple versions stored. The version that each transaction sees
depends on the isolation level implemented. The most common isolation level
implemented with MVCC is snapshot isolation. With snapshot isolation, a transaction
observes a state of the data as of when the transaction started.
MVCC provides point-in-time consistent views. Read transactions under MVCC
typically use a timestamp or transaction ID to determine what state of the DB to read,
and read these versions of the data. Read and write transactions are
thus isolated from each other without any need for locking. However, despite locks
being unnecessary, they are used by some MVCC databases such as Oracle. Writes
create a newer version, while concurrent reads access an older version.
MVCC introduces the challenge of how to remove versions that become obsolete
and will never be read. In some cases, a process to periodically sweep through and
delete the obsolete versions is implemented. This is often a stop-the-world process
that traverses a whole table and rewrites it with the last version of each data
item. PostgreSQL can use this approach with its VACUUM FREEZE process. Other
databases split the storage blocks into two parts: the data part and an undo log. The
data part always keeps the last committed version. The undo log enables the
recreation of older versions of data. The main inherent limitation of this latter
approach is that when there are update-intensive workloads, the undo log part runs
out of space and then transactions are aborted as they cannot be given their
snapshot. For a document-oriented database it also allows the system to optimize
documents by writing entire documents onto contiguous sections of disk—when
updated, the entire document can be re-written rather than bits and pieces cut out or
maintained in a linked, non-contiguous database structure.

Implementation
MVCC uses timestamps (TS), and incrementing transaction IDs, to
achieve transactional consistency. MVCC ensures a transaction (T) never has to
wait to Read a database object (P) by maintaining several versions of the object.
Each version of object P has both a Read Timestamp (RTS) and a Write
Timestamp (WTS) which lets a particular transaction Ti read the most recent version
of the object which precedes the transaction's Read Timestamp RTS(Ti).
If transaction Ti wants to Write to object P, and there is also another
transaction Tk happening to the same object, the Read Timestamp RTS(Ti) must
precede the Read Timestamp RTS(Tk), i.e., RTS(Ti) < RTS(Tk), for the object Write
Operation (WTS) to succeed. A Write cannot complete if there are other outstanding
transactions with an earlier Read Timestamp (RTS) to the same object. Like
standing in line at the store, you cannot complete your checkout transaction until
those in front of you have completed theirs.
To restate; every object (P) has a Timestamp (TS), however if transaction Ti wants
to Write to an object, and the transaction has a Timestamp (TS) that is earlier than
the object's current Read Timestamp, TS(Ti) < RTS(P), then the transaction is
aborted and restarted. (This is because a later transaction already depends on the
old value.) Otherwise, Ti creates a new version of object P and sets the read/write
timestamp TS of the new version to the timestamp of the transaction TS ← TS(Ti).[2]
The drawback to this system is the cost of storing multiple versions of objects in the
database. On the other hand, reads are never blocked, which can be important for
workloads mostly involving reading values from the database. MVCC is particularly
adept at implementing true snapshot isolation, something which other methods of
concurrency control frequently do either incompletely or with high performance costs.
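
A minimal sketch of the version-selection rule in Python, using a simple dictionary of (write timestamp, value) pairs as the version store (an illustrative simplification, not any particular product's storage format):

# Sketch of MVCC reads: each write appends a new version tagged with the
# writer's timestamp; a reader sees the newest version not newer than itself.

versions = {
    "Object 1": [(0, "Foo"), (1, "Hello")],   # (write timestamp, value)
    "Object 2": [(0, "Bar")],
}

def mvcc_read(obj, reader_ts):
    # Return the most recent committed version with WTS <= the reader's timestamp.
    visible = [(wts, val) for wts, val in versions.get(obj, []) if wts <= reader_ts]
    return max(visible)[1] if visible else None

def mvcc_write(obj, writer_ts, value):
    versions.setdefault(obj, []).append((writer_ts, value))  # never overwrite in place

# A reader with timestamp 1 keeps seeing its snapshot even after a later write.
mvcc_write("Object 3", 2, "Foo-Bar")
print(mvcc_read("Object 1", 1))   # 'Hello' (the version written at time 1)
print(mvcc_read("Object 3", 1))   # None    (written after the reader's snapshot)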
Example
Concurrent read–write
At Time = 1, the state of a database could be:

Time Object 1 Object 2

0 "Foo" by T0 "Bar" by T0

1 "Hello" by T1

T0 wrote Object 1="Foo" and Object 2="Bar". After that T1 wrote Object 1="Hello"
leaving Object 2 at its original value. The new value of Object 1 will supersede the
value at 0 for all transactions that start after T1 commits at which point version 0 of
Object 1 can be garbage collected.
If a long running transaction T2 starts a read operation of Object 2 and Object 1 after
T1 committed and there is a concurrent update transaction T3 which deletes Object
2 and adds Object 3="Foo-Bar", the database state will look like this at time 2:

Time Object 1 Object 2 Object 3

0 "Foo" by T0 "Bar" by T0

1 "Hello" by T1

2 (deleted) by T3 "Foo-Bar" by T3

There is a new version as of time 2 of Object 2 which is marked as deleted and a


new Object 3. Since T2 and T3 run concurrently T2 sees the version of the database
before 2 i.e. before T3 committed writes, as such T2 reads Object 2="Bar" and
Object 1="Hello". This is how multiversion concurrency control allows snapshot
isolation reads without any locks.

Multiversion concurrency control is described in some detail in the 1981 paper
"Concurrency Control in Distributed Database Systems" by Phil Bernstein and Nathan
Goodman, then employed by the Computer Corporation of America. Bernstein and
Goodman's paper cites a 1978 dissertation by David P. Reed which quite clearly
describes MVCC and claims it as an original work.
The first shipping, commercial database software product featuring MVCC was VAX
Rdb/ELN, released in 1984, and created at Digital Equipment Corporation by Jim
Starkey. Starkey went on[6] to create the second commercially successful MVCC
database - InterBase.

Recovery
When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items. Transactions are made of
various operations, which are atomic in nature. But according to ACID properties of
DBMS, atomicity of transactions as a whole must be maintained, that is, either all the
operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
 It should check the states of all the transactions, which were being executed.
 A transaction may be in the middle of some operation; the DBMS must ensure
the atomicity of the transaction in this case.
 It should check whether the transaction can be completed now or it needs to
be rolled back.
 No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −
 Maintaining the logs of each transaction, and writing them onto some stable
storage before actually modifying the database.
 Maintaining shadow paging, where the changes are done on a volatile
memory, and later, the actual database is updated.

Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification
and stored on a stable storage media, which is failsafe.
Log-based recovery works as follows −
 The log file is kept on a stable storage media.
 When a transaction enters the system and starts execution, it writes a log
about it.
<Tn, Start>
 When the transaction modifies an item X, it write logs as follows −
<Tn, X, V1, V2>
It reads as: Tn has changed the value of X from V1 to V2.

 When the transaction finishes, it logs −


<Tn, commit>
The database can be modified using two approaches −
 Deferred database modification − All logs are written on to the stable
storage and the database is updated when a transaction commits.
 Immediate database modification − Each log follows an actual database
modification. That is, the database is modified immediately after every
operation.
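
The log records above can be sketched in Python as simple tuples appended to a list; the record layout is an illustrative simplification, not an actual DBMS log format:

# Sketch of a write-ahead log for one transaction Tn.
# Each record is appended (and force-written to stable storage)
# before the corresponding change is applied to the database.

log = []

def start(tn):
    log.append(("start", tn))                  # <Tn, Start>

def write_item(tn, x, v1, v2):
    log.append(("write", tn, x, v1, v2))       # <Tn, X, V1, V2>

def commit(tn):
    log.append(("commit", tn))                 # <Tn, commit>

start("T1")
write_item("T1", "A", 600, 500)
write_item("T1", "B", 300, 400)
commit("T1")
print(log)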

Recovery techniques based on deferred update

Deferred update
The idea behind deferred update is to defer or postpone any actual updates to the database itself
until the transaction completes its execution successfully and reaches its commit point. During
transaction execution, the updates are recorded only in the log and in the transaction workspace.
After the transaction reaches its commit point and the log is force-written to disk, the updates are
recorded in the database itself. If a transaction fails before reaching its commit point, there is no
need to undo any operations, because the transaction has not affected the database in any way.

The steps involved in the deferred update protocol are as follows:

1. When a transaction starts, write an entry start_transaction(T) to the log.

2. When any operation is performed that will change values in the database, write a log entry
write_item(T, x, old_value, new_value).

3. When a transaction is about to commit, write a log record of the form commit(T); write all
log records to disk.

4. Commit the transaction, using the log to write the updates to the database; the writing of
data to disk need not occur immediately.

5. If the transaction aborts, ignore the log records and do not write the changes to disk.

Deferred update in a single-user environment


We first discuss recovery based on deferred update in single-user systems, where no concurrency
control is needed, so that we can understand the recovery process independently of any
concurrency control method. In such an environment, the recovery algorithm can be rather simple.
It works as follows.

Use two lists to maintain the transactions: the committed transactions list, which contains all the
committed transactions since the last checkpoint, and the active transactions list (at most one
transaction falls in this category, because the system is a single-user one). Apply the REDO operation
to all the write_item operations of the committed transactions from the log in the order in which
they were written to the log. Restart the active transactions.

The REDO procedure is defined as follows:


Redoing a write_item operation consists of examining its log entry write_item(T, x, old_value,
new_value) and setting the value of item x in the database to new_value. The REDO operation is
required to be idempotent, as discussed before.

Notice that the transaction in the active list will have no effect on the database because of the
deferred update protocol, and is ignored completely by the recovery process. It is implicitly rolled
back, because none of its operations were reflected in the database. However, the transaction must
now be restarted, either automatically by the recovery process or manually by the user.

The method’s main benefit is that any transaction operation need never be undone, as a

transaction does not record its changes in the database until it reaches its commit point.
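
A minimal Python sketch of this recovery procedure, reusing the simplified tuple log records from the earlier sketch (an illustrative simplification, not an actual DBMS log format):

# Sketch of deferred-update recovery: REDO every write of a committed
# transaction, in log order; active (uncommitted) transactions are simply
# ignored, because none of their changes ever reached the database.

def recover_deferred(log, database):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write":
            _, tn, x, old_value, new_value = rec
            if tn in committed:
                database[x] = new_value      # idempotent REDO of the write
    return database

log = [("start", "T1"), ("write", "T1", "A", 600, 500), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "B", 300, 400)]   # T2 never committed
print(recover_deferred(log, {"A": 600, "B": 300}))   # {'A': 500, 'B': 300}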

Deferred update in a multi-user environment


For a multi-user system with concurrency control, the recovery process may be more complex,
depending on the protocols used for concurrency control. In many cases, the concurrency control
and recovery processes are interrelated. In general, the greater the degree of concurrency we wish
to achieve, the more difficult the task of recovery becomes.

Consider a system in which concurrency control uses two-phase locking (basic 2PL) and prevents
deadlock by pre-assigning all locks to items needed by a transaction before the transaction starts
execution. To combine the deferred update methods for recovery with this concurrency control
technique, we can keep all the locks on items in effect until the transaction reaches its commit point.
After that, the locks can be released. This ensures strict and serialisable schedules. Assuming that
checkpoint entries are included in the log, a possible recovery algorithm for this case is given below.

Use two lists of transactions maintained by the system: the committed transactions list which
contains all committed transactions since the last checkpoint, and the active transactions list. REDO
all the write operations of the committed transactions from the log, in the order in which they were
written into the log. The transactions in the active list that are active and did not commit are
effectively cancelled and must be resubmitted.

The REDO procedure is the same as defined earlier in the deferred update in the single-user
environment.

In general, a transaction will have actions that do not affect the database, such as generating and
printing messages or reports from information retrieved from the database. If a transaction fails
before completion, we may not want the user to get these reports, since the transaction has failed
to complete. Hence, such reports should be generated only after the transaction reaches its commit
point. A common method of dealing with such actions is to issue the commands that generate the
reports but keep them as batch jobs. The batch jobs are executed only after the transaction reaches
its commit point. If the transaction does not reach its commit point because of a failure, the batch
jobs are cancelled.

Exercise 1: Deferred update protocol


Given the operations of the four concurrent transactions in (1) below and the system log at the point
of system crash in (2), discuss how each transaction recovers from the failure using deferred update
technique.

1. The read and write operations of four transactions:

2. System log at the point of crash:

Exercise 2: Recovery management using deferred update with incremental log

Below, a schedule is given for five transactions A, B, C, D and E.

Assume the initial values for the variables are a=1, b=2, c=3, d=4 and e=5.

Using an incremental log with deferred updates, for each operation in each of the transactions,
show:

1. The log entries.

2. Whether the log is written to disk.

3. Whether the output buffer is updated.

4. Whether the DBMS on disk is updated.

5. The values of the variables on the disk.

Use your own words to describe the deferred update method for recovery management in a multi-
user environment. Complete the following table to show how the deferred update protocol affects
the log on disk, database buffer and database on disk.

1. How can recovery handle transaction operations that do not affect the database, such as the
printing of reports by the transaction?

Recovery techniques based on immediate update

Immediate update
In the immediate update techniques, the database may be updated by the operations of a
transaction immediately, before the transaction reaches its commit point. However, these
operations are typically recorded in the log on disk by force-writing before they are applied to the
database, so that recovery is possible.
When immediate update is allowed, provisions must be made for undoing the effect of update
operations on the database, because a transaction can fail after it has applied some updates to the
database itself. Hence recovery schemes based on immediate update must include the capability to
roll back a transaction by undoing the effect of its write operations.

1. When a transaction starts, write an entry start_transaction(T) to the log;

2. When any operation is performed that will change values in the database, write a log entry
write_item(T, x, old_value, new_value);

3. Write the log to disk;

4. Once the log record is written, write the update to the database buffers;

5. When convenient, write the database buffers to the disk;

6. When a transaction is about to commit, write a log record of the form commit(T);

7. Write the log to disk.

In general, we can distinguish two main categories of immediate update algorithms. If the
recovery technique ensures that all updates of a transaction are recorded in the database on disk
before the transaction commits, there is never a need to redo any operations of committed
transactions. Such an algorithm is called UNDO/NO-REDO. On the other hand, if the transaction is
allowed to commit before all its changes are written to the database, we have the UNDO/REDO
method, the most general recovery algorithm. This is also the most complex technique. Recovery
activities are summarised below:

Immediate update in a single-user environment


We first consider a single-user system so that we can examine the recovery process separately from
concurrency control. If a failure occurs in a single-user system, the executing transaction at the time
of failure may have recorded some changes in the database. The effect of all such operations must
be undone as part of the recovery process. Hence, the recovery algorithm needs an UNDO
procedure, described subsequently, to undo the effect of certain write operations that have been
applied to the database following examination of their system log entry. The recovery algorithm also
uses the redo procedure defined earlier. Recovery takes place in the following way.

Use two lists of transaction maintained by the system: the committed transactions since the last
checkpoint, and the active transactions (at most one transaction will fall in this category, because
the system is single user). Undo all the write operations of the active transaction from the log, using
the UNDO procedure described hereafter. Redo all the write operations of the committed
transactions from the log, in the order in which they were written in the log, using the REDO
procedure.
The UNDO procedure is defined as follows:

Undoing a write operation consists of examining its log entry write_item(T, x, old_value, new_value)
and setting the value of x in the database to old_value. Undoing a number of such write operations
from one or more transactions from the log must proceed in the reverse order from the order in
which the operations were written in the log.
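
A minimal Python sketch of the combined UNDO/REDO procedure, again using simplified tuple log records (illustrative only):

# Sketch of immediate-update recovery: UNDO the writes of uncommitted
# transactions in reverse log order, then REDO committed writes in log order.

def recover_immediate(log, database):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # UNDO phase: restore old values written by transactions that did not commit.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, tn, x, old_value, new_value = rec
            database[x] = old_value
    # REDO phase: reapply the writes of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, tn, x, old_value, new_value = rec
            database[x] = new_value
    return database

log = [("start", "T1"), ("write", "T1", "A", 600, 500), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "B", 300, 400)]   # T2 never committed
# B had already been changed to 400 on disk when the crash happened.
print(recover_immediate(log, {"A": 600, "B": 400}))   # {'A': 500, 'B': 300}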

Immediate update in a multi-user environment


When concurrent execution is permitted, the recovery process again depends on the protocols
used for concurrency control. The procedure below outlines a recovery technique for concurrent
transactions with immediate update. Assume that the log includes checkpoints and that the
concurrency control protocol produces strict schedules, as the strict 2PL protocol does, for
example. Recall that a strict schedule does not allow a transaction to read or write an item unless the
transaction that last wrote the item has committed. However, deadlocks can occur in strict 2PL, thus
requiring UNDO of transactions.

Use two lists of transactions maintained by the system: the committed transactions since the last
checkpoint, and the active transactions. Undo all the write operations of the active (uncommitted)
transaction from the log, using the UNDO procedure. The operations should be undone in the
reverse of the order in which they were written into the log. Redo all the write operations of the
committed transactions from the log, in the order in which they were written in the log, using the
REDO procedure.
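Combining the two procedures, a recovery sketch for this case might look as follows. The record format and the two transaction lists are illustrative assumptions: each log record is a (transaction, item, old_value, new_value) tuple, and the committed and active sets are taken to have been built from the log and the last checkpoint.

```python
def recover_immediate_update(log_entries, committed, active, database):
    """UNDO/REDO recovery for immediate update with concurrent transactions.

    log_entries: ordered list of (tid, item, old_value, new_value) records.
    committed:   set of transaction ids committed since the last checkpoint.
    active:      set of transaction ids still active at the time of failure.
    """
    # UNDO: remove the effects of uncommitted transactions,
    # in the reverse of the order in which they were logged.
    for tid, item, old_value, _ in reversed(log_entries):
        if tid in active:
            database[item] = old_value

    # REDO: reapply the writes of committed transactions, in log order.
    for tid, item, _, new_value in log_entries:
        if tid in committed:
            database[item] = new_value
```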

Exercise 3: Immediate update protocol

Given the same transactions and system log as in exercise 1, discuss how each transaction recovers
from the failure using the immediate update technique.

Exercise 4: Recovery management using immediate update with incremental log

The same schedule and initial values of the variables in exercise 2 are given; use the immediate
update protocol for recovery to show how each transaction recovers from the failure.

Review question 4

1. Use your own words to describe the immediate update method for recovery management in
a multi-user environment. Complete the following table to show how the immediate update
protocol affects the log on disk, database buffer and database on disk.

2. In general, we can distinguish two main categories of immediate update algorithms:
UNDO/NO-REDO, in which all updates of a transaction are recorded in the database on disk
before the transaction commits, and UNDO/REDO, in which the transaction is allowed to
commit before all its changes are written to the database. Explain why the UNDO/NO-REDO
approach never needs to redo any operations of committed transactions, and why UNDO/REDO
is considered the most general (and most complex) recovery method.
Recovery in multidatabase transactions

So far, we have implicitly assumed that a transaction accesses a single database. In some cases, a
single transaction, called a multidatabase transaction, may require access to multiple databases.
These databases may even be stored on different types of DBMSs; for example, some DBMSs may be
relational, whereas others are hierarchical or network DBMSs. In such a case, each DBMS involved in
the multidatabase transaction will have its own recovery technique and transaction manager
separate from those of the other DBMSs. This situation is somewhat similar to the case of a
distributed database management system, where parts of the database reside at different sites that
are connected by a communication network.

To maintain the atomicity of a multidatabase transaction, it is necessary to have a two-level recovery


mechanism. A global recovery manager, or coordinator, is needed in addition to the local recovery
managers. The coordinator usually follows a protocol called the two-phase commit protocol, whose
two phases can be stated as follows.

PHASE 1: When all participating databases signal the coordinator that the part of the multidatabase
transaction involving them has concluded, the coordinator sends a message "prepare for
commit" to each participant to get ready for committing the transaction. Each participating
database receiving that message will force-write all log records to disk and then send a "ready to
commit" or "OK" signal to the coordinator. If the force-writing to disk fails or the local
transaction cannot commit for some reason, the participating database sends a "cannot
commit" or "not OK" signal to the coordinator. If the coordinator does not receive a reply
from a database within a certain time interval, it assumes a "not OK" response.

PHASE 2: If all participating databases reply "OK", the transaction is successful, and the
coordinator sends a "commit" signal for the transaction to the participating databases.
Because all the local effects of the transaction have been recorded in the logs of the participating
databases, recovery from failure is now possible. Each participating database completes transaction
commit by writing a commit(T) entry for the transaction in the log and permanently updating the
database if needed. On the other hand, if one or more of the participating databases sends a "not
OK" response to the coordinator, the transaction has failed, and the coordinator sends a message
to "roll back" or UNDO the local effect of the transaction to each participating database. This is
done by undoing the transaction operations, using the log.

The net effect of the two-phase commit protocol is that either all participating databases commit
the effect of the transaction or none of them do. In case any of the participants (or the
coordinator) fails, it is always possible to recover to a state where either the transaction is
committed or it is rolled back. A failure during or before phase 1 usually requires the transaction to
be rolled back, whereas a failure during phase 2 means that a successful transaction can recover and
commit.
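The coordinator's side of the protocol can be sketched as below. The participant interface (prepare, commit and rollback methods, with "OK"/"not OK" votes) is an assumption made for illustration; a missing reply is treated as a "not OK" vote, as described above.

```python
def two_phase_commit(participants, transaction_id):
    """Sketch of the two-phase commit protocol from the coordinator's view."""
    # PHASE 1: ask every participant to prepare, i.e. force-write its log
    # records to disk and vote "OK" or "not OK".
    votes = []
    for participant in participants:
        try:
            votes.append(participant.prepare(transaction_id))
        except TimeoutError:
            # No reply within the time interval is treated as "not OK".
            votes.append("not OK")

    # PHASE 2: commit only if every participant voted "OK"; otherwise roll back.
    if all(vote == "OK" for vote in votes):
        for participant in participants:
            participant.commit(transaction_id)
        return "committed"
    for participant in participants:
        participant.rollback(transaction_id)
    return "rolled back"
```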
Checkpoints
Why do we need Checkpoints?
Whenever transaction logs are created in a real-time environment, they occupy a lot of
storage space. Keeping track of every update and maintaining that history also increases
the physical space requirements of the system. Eventually, the transaction log file becomes
unmanageable as its size keeps growing. This can be addressed with checkpoints. The
mechanism used for removing all previous transaction logs and storing them in permanent
storage is called a Checkpoint.
What is a Checkpoint?
A checkpoint declares a point before which the DBMS was in a consistent state and all
transactions were committed. During transaction execution, such checkpoints are traced.
After execution, transaction log files are created.
Upon reaching the checkpoint, the log file is discarded once its updates have been saved
to the database. A new log file is then created for the upcoming operations of the
transactions; it is updated until the next checkpoint, and the process continues.
How to use Checkpoints in a database?
Steps:
1. Write a begin_checkpoint record into the log.
2. Collect the checkpoint data in stable storage.
3. Write an end_checkpoint record into the log.
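A minimal sketch of these three steps, reusing the append-only log file idea from earlier and assuming (for illustration) that the checkpoint data consists of the list of currently active transactions:

```python
import os

def take_checkpoint(log_path, active_transactions):
    """Write a begin_checkpoint / end_checkpoint pair around the checkpoint data."""
    with open(log_path, "a") as log:
        log.write("begin_checkpoint\n")                               # step 1
        # Step 2: record the checkpoint data (here, the active transactions).
        log.write("active:" + ",".join(active_transactions) + "\n")
        log.write("end_checkpoint\n")                                 # step 3
        log.flush()
        os.fsync(log.fileno())  # force the checkpoint records to stable storage
```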
The behaviour of the system when it crashes and recovers while concurrent
transactions are executing is shown below.

Understanding Checkpoints in multiple Transactions


 The recovery system reads the logs backward from the end to the last
checkpoint, i.e. from T4 to T1.
 It keeps track of two lists: Undo and Redo.
 Whenever there is a log with both <Tn, Start> and <Tn, Commit>, or
only <Tn, Commit>, that transaction is put in the Redo list. T2 and T3
contain <Tn, Start> and <Tn, Commit>, whereas T1 has only <Tn,
Commit>. Here, T1, T2, and T3 are in the Redo list.
 Whenever a log record with no commit or abort instruction is found, that
transaction is put in the Undo list. Here, T4 has <Tn, Start> but no <Tn,
Commit>, as it is an ongoing transaction, so T4 is put in the Undo list.
All the transactions in the Redo list and their previous logs are removed, and the
transactions are then redone before their logs are saved. All the transactions in the
Undo list are undone and their logs are deleted.
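The classification into the Redo and Undo lists can be sketched as a backward scan of the log down to the last checkpoint. The record format is an assumption for illustration: each entry is a (transaction, action) pair such as ("T2", "start") or ("T2", "commit"), and the checkpoint record uses None as its transaction id.

```python
def classify_transactions(log_entries):
    """Scan the log backward to the last checkpoint, building Redo and Undo lists.

    log_entries: list of (transaction_id, action) pairs in log order, where
                 action is "start", "commit" or "checkpoint" (with None as
                 the transaction id for the checkpoint record).
    """
    redo, undo = set(), set()
    for transaction_id, action in reversed(log_entries):
        if action == "checkpoint":
            break                       # scan only back to the last checkpoint
        if action == "commit":
            redo.add(transaction_id)    # committed after the checkpoint: redo
        elif action == "start" and transaction_id not in redo:
            undo.add(transaction_id)    # started but never committed: undo
    return redo, undo
```

With the scenario described above, T1, T2 and T3 end up in the Redo list and T4 in the Undo list.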
Relevance of Checkpoints:
A checkpoint is a feature that supports the C (consistency) of an ACID-compliant
RDBMS. A checkpoint is used for recovery if there is an unexpected
shutdown of the database. Checkpoints run at regular intervals and write all
dirty pages (modified pages) from the buffer cache to the physical data file on
disk. This is also known as the hardening of dirty pages. It is a
dedicated process that is run automatically by SQL Server at specific
intervals. A checkpoint serves as the synchronization point between the database
and the transaction log.

Advantages of using Checkpoints:

 It speeds up the data recovery process.
 Most DBMS products perform checkpoints automatically.
 Checkpoint records in the log file are used to prevent unnecessary redo
operations.
 Since dirty pages are flushed out continuously in the background, checkpointing has
very low overhead and can be done frequently.

Real-Time Applications of Checkpoints:

 Whenever an application is tested in a real-time environment where it may have
modified the database, it is verified and validated using checkpoints.
 Checkpoints are used to create backup and recovery points prior to applying
any updates to the database.
 The recovery system is used to return the database to the checkpoint
state.
Shadow Paging
Shadow Paging is a recovery technique that is used to recover a database. In this
recovery technique, the database is considered to be made up of fixed-size logical units of
storage called pages. Pages are mapped onto physical blocks of storage with the help of a
page table, which has one entry for each logical page of the database.
This method uses two page tables, named the current page table and the shadow page table.
The entries in the current page table point to the most recent database pages on disk.
The shadow page table is created when the transaction starts, by copying the current page
table. The shadow page table is then saved on disk, and the current page table is used by the
transaction. Entries in the current page table may be changed during execution, but entries in
the shadow page table are never changed. After the transaction commits, both tables become
identical.
This technique is also known as out-of-place updating.

To understand the concept, consider the figure above, in which two write operations are performed
on pages 3 and 5. Before the start of the write operation on page 3, the current page table points to
the old page 3. When the write operation starts, the following steps are performed:
1. First, a search is made for an available free block among the disk blocks.
2. After a free block is found, page 3 is copied to it; the copy is represented by
Page 3 (New).
3. The current page table now points to Page 3 (New) on disk, but the shadow page table
still points to the old page 3, because it is not modified.
4. The changes are now propagated to Page 3 (New), which is pointed to by the current page
table.
COMMIT Operation:
To commit a transaction, the following steps should be performed:
1. All the modifications made by the transaction that are present in the buffers
are transferred to the physical database.
2. The current page table is output to disk.
3. The disk address of the current page table is output to the fixed location in stable
storage that contains the address of the shadow page table. This operation overwrites the
address of the old shadow page table. With this, the current page table becomes the
shadow page table and the transaction is committed.
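The write and commit steps can be illustrated with a small sketch. The structures are assumptions for illustration only: the "disk" is a list of blocks, the two page tables are lists of block numbers, and committing is modelled as making the current page table the new shadow page table.

```python
class ShadowPagingSketch:
    """Toy model of shadow paging: copy-on-write pages plus a page-table swap."""

    def __init__(self, pages):
        self.disk = list(pages)                       # physical blocks on "disk"
        self.shadow_table = list(range(len(pages)))   # saved when the transaction starts
        self.current_table = list(self.shadow_table)  # used and changed by the transaction

    def write_page(self, page_no, new_contents):
        # Steps 1-4: copy the page into a free block and repoint the current table.
        self.disk.append(new_contents)                 # the new block, e.g. Page 3 (New)
        self.current_table[page_no] = len(self.disk) - 1
        # The shadow table still points at the old block, so the old state survives.

    def commit(self):
        # The current page table becomes the shadow page table (address swap on disk).
        self.shadow_table = list(self.current_table)

    def abort(self):
        # Recovery after a crash before commit: reinstall the shadow page table.
        self.current_table = list(self.shadow_table)
```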
Failure:
If the system crashes during execution of the transaction but before the commit operation, it is
sufficient to free the modified database pages and discard the current page table. The state of the
database before execution of the transaction is recovered by reinstalling the shadow page table.
If the system crash occurs after the last write operation of the commit, it does not affect the
propagation of the changes made by the transaction. These changes are preserved and
there is no need to perform a redo operation.
Advantages:
 This method requires fewer disk accesses to perform an operation.
 In this method, recovery from a crash is inexpensive and quite fast.
 There is no need for Undo and Redo operations.
Disadvantages:
 Because an update changes the location of a page on disk, it is quite difficult to keep
related database pages close together on disk.
 During the commit operation, the old blocks pointed to by the shadow page
table have to be returned to the collection of free blocks; otherwise they become
inaccessible.
 The commit of a single transaction requires multiple blocks to be output, which decreases
execution speed.
 It is difficult to extend this technique to allow multiple transactions to execute concurrently.
