UNIT - IV Transaction Concept
UNIT - IV Transaction Concept
Equivalence Schedules
An equivalence schedule can be of the following types −
Result Equivalence
If two schedules produce the same result after execution, they are said to
be result equivalent. They may yield the same result for some value and
different results for another set of values. That's why this equivalence is
not generally considered significant.
View Equivalence
Two schedules would be view equivalence if the transactions in both the
schedules perform similar actions in a similar manner.
For example −
If T reads the initial data in S1, then it also reads the initial
data in S2.
If T reads the value written by J in S1, then it also reads the
value written by J in S2.
If T performs the final write on the data value in S1, then it
also performs the final write on the data value in S2.
Conflict Equivalence
Two schedules would be conflicting if they have the following properties −
States of Transactions
A transaction in a database can be in one of the following states −
Two-phase locking has two phases, one is growing, where all the locks
are being acquired by the transaction; and the second phase is shrinking,
where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a
shared (read) lock and then upgrade it to an exclusive lock.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based
protocol. This protocol uses either system time or logical counter as a
timestamp.
Lock-based protocols manage the order between the conflicting pairs
among transactions at the time of execution, whereas timestamp-based
protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is
determined by the age of the transaction. A transaction created at 0002
clock time would be older than all other transactions that come after it.
For example, any transaction 'y' entering the system at 0004 is two
seconds younger and the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp.
This lets the system know when the last ‘read and write’ operation was
performed on the data item.
DBMS - Deadlock
In a multi-process system, deadlock is an unwanted situation that arises in
a shared resource environment, where a process indefinitely waits for a
resource that is held by another process.
For example, assume a set of transactions {T , T , T , ...,T }. T needs a
0 1 2 n 0
held by T . Thus, all the processes wait for each other to release
0
resources. In this situation, none of the processes can finish their task.
This situation is known as a deadlock.
Deadlocks are not healthy for a system. In case a system is stuck in a
deadlock, the transactions involved in the deadlock are either rolled back
or restarted.
Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively
inspects all the operations, where transactions are about to execute. The
DBMS inspects the operations and analyzes if they can create a deadlock
situation. If it finds that a deadlock situation might occur, then that
transaction is never allowed to be executed.
There are deadlock prevention schemes that use timestamp ordering
mechanism of transactions in order to predetermine a deadlock situation.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item),
which is already held with a conflicting lock by another transaction, then
one of the two possibilities may occur −
If TS(T ) < TS(T ) − that is T , which is requesting a conflicting
i j i
item is available.
If TS(T ) > TS(t ) − that is T is younger than T − then T dies. T is
i j i j i i
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item),
which is already held with conflicting lock by some another transaction,
one of the two possibilities may occur −
If TS(T ) < TS(T ), then T forces T to be rolled back − that is
i j i j
available.
This scheme, allows the younger transaction to wait; but when an older
transaction requests an item held by a younger one, the older transaction
forces the younger one to abort and release the item.
In both the cases, the transaction that enters the system at a later stage
is aborted.
Deadlock Avoidance
Aborting a transaction is not always a practical approach. Instead,
deadlock avoidance mechanisms can be used to detect any deadlock
situation in advance. Methods like "wait-for graph" are available but they
are suitable for only those systems where transactions are lightweight
having fewer instances of resource. In a bulky system, deadlock
prevention techniques may work well.
Wait-for Graph
This is a simple method available to track if any deadlock situation may
arise. For each transaction entering into the system, a node is created.
When a transaction T requests for a lock on an item, say X, which is held
i
T releases item X, the edge between them is dropped and T locks the
j i
data item.
The system maintains this wait-for graph for every transaction waiting for
some data items held by others. The system keeps checking if there's any
cycle in the graph.
Here, we can use any of the two following approaches −
First, do not allow any request for an item, which is already
locked by another transaction. This is not always feasible and
may cause starvation, where a transaction indefinitely waits
for a data item and can never acquire it.
The second option is to roll back one of the transactions. It is
not always feasible to roll back the younger transaction, as it
may be important than the older one. With the help of some
relative algorithm, a transaction is chosen, which is to be
aborted. This transaction is known as the victim and the
process is known as victim selection.
Remote Backup
Remote backup provides a sense of security in case the primary location
where the database is located gets destroyed. Remote backup can be
offline or real-time or online. In case it is offline, it is maintained manually.
Online backup systems are more real-time and lifesavers for database
administrators and investors. An online backup system is a mechanism
where every bit of the real-time data is backed up simultaneously at two
distant places. One of them is directly connected to the system and the
other one is kept at a remote place as backup.
As soon as the primary database storage fails, the backup system senses
the failure and switches the user system to the remote storage.
Sometimes this is so instant that the users can’t even realize a failure.
Failure Classification
To see where the problem has occurred, we generalize a failure into
various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a
point from where it can’t go any further. This is called transaction failure
where only a few transactions or processes are hurt.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete
because it has some code error or any internal error condition.
System errors − Where the database system itself terminates
an active transaction because the DBMS is not able to execute
it, or it has to stop because of some system condition. For
example, in case of deadlock or resource unavailability, the
system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system
to stop abruptly and cause the system to crash. For example,
interruptions in power supply may cause the failure of underlying
hardware or software failure.
Examples may include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where
hard-disk drives or storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk,
disk head crash or any other failure, which destroys all or a part of disk
storage.
Storage Structure
We have already described the storage system. In brief, the storage
structure can be divided into two categories −
Volatile storage − As the name suggests, a volatile storage
cannot survive system crashes. Volatile storage devices are
placed very close to the CPU; normally they are embedded
onto the chipset itself. For example, main memory and cache
memory are examples of volatile storage. They are fast but
can store only a small amount of information.
Non-volatile storage − These memories are made to survive
system crashes. They are huge in data storage capacity, but
slower in accessibility. Examples may include hard-disks,
magnetic tapes, flash memory, and non-volatile (battery
backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed
and various files opened for them to modify the data items. Transactions
are made of various operations, which are atomic in nature. But according
to ACID properties of DBMS, atomicity of transactions as a whole must be
maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions, which were
being executed.
A transaction may be in the middle of some operation; the
DBMS must ensure the atomicity of the transaction in this
case.
It should check whether the transaction can be completed now
or it needs to be rolled back.
No transactions would be allowed to leave the DBMS in an
inconsistent state.
There are two types of techniques, which can help a DBMS in recovering
as well as maintaining the atomicity of a transaction −
Maintaining the logs of each transaction, and writing them
onto some stable storage before actually modifying the
database.
Maintaining shadow paging, where the changes are done on a
volatile memory, and later, the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions
performed by a transaction. It is important that the logs are written prior
to the actual modification and stored on a stable storage media, which is
failsafe.
Log-based recovery works as follows −
The log file is kept on a stable storage media.
When a transaction enters the system and starts execution, it
writes a log about it.
<T , Start>
n
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill
out all the memory space available in the system. As time passes, the log
file may grow too big to be handled at all. Checkpoint is a mechanism
where all the previous logs are removed from the system and stored
permanently in a storage disk. Checkpoint declares a point before which
the DBMS was in consistent state, and all the transactions were
committed.
Recovery
When a system with concurrent transactions crashes and recovers, it
behaves in the following manner −
The recovery system reads the logs backwards from the end to
the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <T , Start> and <T ,
n n
redo-list.
If the recovery system sees a log with <T , Start> but no
n
Serial execution.
Parallel execution.
Concurrent execution.
Concurrent transaction or execution includes multiple transactions which
are executed concurrently or simultaneously in the system.
Advantages
The advantages of the concurrent transactions are as follows −
Increases throughput which is nothing but number of
transactions completed per unit time.
It reduces the waiting time.
Example
T1= 90sec
T2= 500sec
T3= 5sec.
If we execute serially by T1->T2->T3 then transaction T3 waits for 590
sec, so we go for non-serial or concurrent transactions to reduce waiting
time.
i.e. T3 -> T1 -> T2.
Disadvantage
The disadvantage is that the execution of concurrent transactions may
result in inconsistency.
Example 1
The concurrent execution given below is in a consistent state.
Example 2
The transaction given below is not consistent.
Concurrent schedules do not keep the database in the consistent state.
The concurrent execution of the transactions has to be carried out in a
controlled environment.
If a system consists of ‘n’ number of transactions, we will have more than
‘n!’ number of concurrent schedules, out of which only a few of them are
keeping the database in consistent state.
View Serializability-
We have discussed-
The concept of serializability helps to identify the correct non-serial schedules
that will maintain the consistency of the database.
There are two types of serializability-
Problem-01:
Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
W1(B) , W2(B) (T1 → T2)
W1(B) , W3(B) (T1 → T3)
W1(B) , W4(B) (T1 → T4)
W2(B) , W3(B) (T2 → T3)
W2(B) , W4(B) (T2 → T4)
W3(B) , W4(B) (T3 → T4)
Step-02:
Problem-02:
Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
R1(A) , W3(A) (T1 → T3)
R2(A) , W3(A) (T2 → T3)
R2(A) , W1(A) (T2 → T1)
W3(A) , W1(A) (T3 → T1)
Step-02:
Now,
Since, the given schedule S is not conflict serializable, so, it may or may not
be view serializable.
To check whether S is view serializable or not, let us use another method.
Let us check for blind writes.
Now,
To check whether S is view serializable or not, let us use another method.
Let us derive the dependencies and then draw a dependency graph.
Clearly,
there exists a cycle in the dependency graph.
Thus, we conclude that the given schedule S is not view serializable.
Problem-03:
Solution-
Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
R1(A) , W2(A) (T1 → T2)
R2(A) , W1(A) (T2 → T1)
W1(A) , W2(A) (T1 → T2)
R1(B) , W2(B) (T1 → T2)
R2(B) , W1(B) (T2 → T1)
Step-02:
Now,
Since, the given schedule S is not conflict serializable, so, it may or may not
be view serializable.
To check whether S is view serializable or not, let us use another method.
Let us check for blind writes.
Alternatively,
You could directly declare that the given schedule S is not view serializable.
This is because there exists no blind write in the schedule.
You need not check for conflict serializability.
Problem-04:
Check whether the given schedule S is view serializable or not. If yes, then give the
serial schedule.
S : R1(A) , W2(A) , R3(A) , W1(A) , W3(A)
Solution-
For simplicity and better understanding, we can represent the given schedule
pictorially as-
Step-01:
List all the conflicting operations and determine the dependency between the
transactions-
R1(A) , W2(A) (T1 → T2)
R1(A) , W3(A) (T1 → T3)
W2(A) , R3(A) (T2 → T3)
W2(A) , W1(A) (T2 → T1)
W2(A) , W3(A) (T2 → T3)
R3(A) , W1(A) (T3 → T1)
W1(A) , W3(A) (T1 → T3)
Step-02:
Now,
Since, the given schedule S is not conflict serializable, so, it may or may not
be view serializable.
To check whether S is view serializable or not, let us use another method.
Let us check for blind writes.
Now,
To check whether S is view serializable or not, let us use another method.
Let us derive the dependencies and then draw a dependency graph.
Serial Schedules are less efficient as Serializable Schedules are always better
compared to serializable schedules. than serial schedules.
(due to above reason) (due to above reason)
Types of Serializability-
Conflict Serializability-
If a given non-serial schedule can be converted into a serial schedule by swapping its
non-conflicting operations, then it is called as a conflict serializable schedule.
Conflicting Operations-
Two operations are called as conflicting operations if all the following conditions hold
true for them-
Both the operations belong to different transactions
Both the operations are on the same data item
At least one of the two operations is a write operation
Example-
Follow the following steps to check whether a given non-serial schedule is conflict
serializable or not-
Step-01:
Find and list all the conflicting operations.
Step-02:
Start creating a precedence graph by drawing one node for each transaction.
Step-03:
Draw an edge for each conflict pair such that if Xi (V) and Yj (V) forms a conflict
pair then draw an edge from Ti to Tj.
This ensures that Ti gets executed before Tj.
Step-04:
Check if there is any cycle formed in the graph.
If there is no cycle found, then the schedule is conflict serializable otherwise not.
Precedence graph
It is used to check conflict serializability.
The steps to check conflict serializability are as follows −
For each transaction T, put a node or vertex in the graph.
For each conflicting pair, put an edge from Ti to T j.
If there is a cycle in the graph then schedule is not conflict
serializable else schedule is conflict serializable.
Example 1
The cycle is present so it is not conflict serializable.
Example 2
The cycle is not present, so it is conflict serializable.
Example 3
The cycle is not present, so it is conflict serializable.
Example
Let’s take two transactions T1 and T2,
If both transactions are performed without interfering each other then it is
called as serial schedule, it can be represented as follows −
T1 T2
READ1(A)
WRITE1(A)
READ1(B)
C1
READ2(B)
WRITE2(B)
READ2(B)
C2
Example
Consider the following example −
T1 T2
READ1(A)
WRITE1(A)
READ2(B)
WRITE2(B)
READ1(B)
WRITE1(B)
READ1(B)
Types of serializability
There are two types of serializability −
View serializability
A schedule is view-serializability if it is viewed equivalent to a serial
schedule.
The rules it follows are as follows −
T1 is reading the initial value of A, then T2 also reads the
initial value of A.
T1 is the reading value written by T2, then T2 also reads the
value written by T1.
T1 is writing the final value, and then T2 also has the write
operation as the final value.
Conflict serializability
It orders any conflicting operations in the same way as some serial
execution. A pair of operations is said to conflict if they operate on the
same data item and one of them is a write operation.
That means
Readi(x) readj(x) - non conflict read-read operation
Readi(x) writej(x) - conflict read-write operation.
Writei(x) readj(x) - conflict write-read operation.
Writei(x) writej(x) - conflict write-write operation.