Unit 4: Transactions
Transaction processing
Transaction processing is a style of computing, typically performed by large server
computers, that supports interactive applications. In transaction processing, work is divided
into individual, indivisible operations, called transactions. By contrast, batch processing is a style of computing in which one or more programs process a series of records (a batch) with little or no action from the user or operator.
When a transaction starts processing, CICS runs a program that is associated with the
transaction. That program can transfer control to other programs in the course of the
transaction, making it possible to assemble modular applications consisting of many CICS
programs.
In a CICS system, many instances of a transaction can run at the same time. A single instance of a running transaction is known as a task.
During the time that a task is running, it has exclusive use of (or holds a lock for) each data
resource that it changes, ensuring the isolation property of the transaction. Other tasks that
require access to the resources must wait until the lock is released. To ensure overall
responsiveness of the transaction processing system, you design your CICS application
programs so that they hold locks for the shortest possible time. For example, you can design
your transactions to be short-lived because CICS releases locks when a task ends.
Before looking at concurrency control, we should first understand concurrent execution and the problems it can cause.
Consider the following scenario, in which two transactions, TX and TY, are performed on the same account A, whose balance is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX deducts $50 from account A, making it $250 in TX's local workspace (not yet written to the database).
o At time t3, transaction TY reads the value of account A, which is still $300, because TX has not yet written its update.
o At time t4, transaction TY adds $100 to account A, making it $400 in TY's local workspace (not yet written to the database).
o At time t6, transaction TX writes its value of account A, which is updated to $250, since TY has not yet written its update.
o Similarly, at time t7, transaction TY writes its value of account A, i.e., the $400 computed at time t4. Thus the value written by TX is lost, i.e., $250 is lost; this is known as the lost update problem.
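The race is easy to reproduce in code. The sketch below (Python, illustrative only: the module-level variable stands in for account A, and the sleep merely forces the interleaving described above) shows both transactions reading the old balance, after which the later write silently discards the earlier one:

    import threading
    import time

    balance = 300   # account A

    def transaction(delta):
        global balance
        local = balance          # Read(A) into a local workspace
        time.sleep(0.01)         # both transactions now hold the old value
        balance = local + delta  # Write(A), clobbering any concurrent update

    tx = threading.Thread(target=transaction, args=(-50,))   # TX deducts $50
    ty = threading.Thread(target=transaction, args=(100,))   # TY adds $100
    tx.start(); ty.start()
    tx.join(); ty.join()
    print(balance)   # correct result is 350, but this prints 250 or 400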
For example:
Consider two transactions, TX and TY, performing read/write operations on account A, which has an available balance of $300.
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that
will be read as $400.
o It means that within the same transaction TX, two different values of account A are read: $300 initially, and $400 after the update made by transaction TY. This is an unrepeatable read, and the situation is therefore known as the unrepeatable read problem.
Thus, in order to maintain consistency in the database and avoid such problems during concurrent execution, management is needed, and that is where the concept of concurrency control comes into play.
Concurrency Control
Concurrency control is the mechanism for controlling and managing the concurrent execution of database operations, thereby avoiding inconsistencies in the database. To maintain the concurrency of the database, we have the concurrency control protocols.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions.
These protocols are categorized as follows:
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write a data item until it acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock cannot update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: it prevents multiple transactions from modifying the same data item simultaneously.
These locks are used by the two-phase locking (2PL) protocol, which divides the execution of a transaction into two phases:
Growing phase: In the growing phase, the transaction may acquire new locks on data items, but none can be released.
Shrinking phase: In the shrinking phase, locks held by the transaction may be released, but no new locks can be acquired.
If lock conversion is allowed, then a lock may be upgraded (from shared to exclusive) only in the growing phase, and downgraded (from exclusive to shared) only in the shrinking phase.
Example:
Under 2PL, a transaction T1 first issues all of its lock requests (growing phase), performs its reads and writes, and then issues all of its unlock requests (shrinking phase); a concurrent transaction T2 does the same, so neither transaction acquires a lock after it has released one.
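As a minimal sketch of how 2PL could be enforced, consider the hypothetical LockManager and Transaction classes below (teaching constructs, not the API of any real DBMS); the only rule they implement is that every lock acquisition must precede the first release:

    import threading

    class LockManager:
        # Hands out one exclusive lock object per data item.
        def __init__(self):
            self._locks = {}
            self._guard = threading.Lock()

        def lock_for(self, item):
            with self._guard:
                return self._locks.setdefault(item, threading.Lock())

    class Transaction:
        # Enforces two-phase locking: acquisitions (growing phase) must
        # all precede the first release (shrinking phase).
        def __init__(self, manager):
            self.manager = manager
            self.held = []
            self.shrinking = False

        def lock(self, item):
            if self.shrinking:
                raise RuntimeError("2PL violation: lock requested after unlock")
            lk = self.manager.lock_for(item)
            lk.acquire()
            self.held.append(lk)

        def unlock_all(self):
            self.shrinking = True
            while self.held:
                self.held.pop().release()

    manager = LockManager()
    t1 = Transaction(manager)
    t1.lock("A")       # growing phase
    t1.lock("B")       # growing phase
    # ... T1 reads and writes A and B here ...
    t1.unlock_all()    # shrinking phase: any further t1.lock(...) raises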
Validation-Based Protocol
The validation-based (optimistic) protocol executes a transaction T in three phases:
1. Read phase: In this phase, transaction T is read and executed. It reads the values of the various data items and stores them in temporary local variables. It performs all of its write operations on these temporary variables, without updating the actual database.
2. Validation phase: In this phase, the temporary variable values are validated against the actual data to check whether serializability would be violated.
3. Write phase: If the transaction passes validation, the temporary results are written to the database or system; otherwise the transaction is rolled back.
Here, Validation(Ti) denotes the time when Ti finishes its read phase and starts its validation phase.
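The validation check itself can be sketched as follows. The Txn class and its read/write sets are assumptions made for illustration; a real validator would also use the transactions' timestamps, such as Validation(Ti):

    # Optimistic validation: a transaction may commit only if no transaction
    # that committed during its read phase wrote an item it read.
    class Txn:
        def __init__(self, read_set, write_set):
            self.read_set = set(read_set)
            self.write_set = set(write_set)

    def validate(txn, committed_during_read_phase):
        for other in committed_during_read_phase:
            if other.write_set & txn.read_set:   # overlap => stale reads
                return False                     # fail validation: roll back
        return True                              # safe: enter the write phase

    already_committed = [Txn(read_set={"A"}, write_set={"A"})]
    candidate = Txn(read_set={"A", "B"}, write_set={"B"})
    print(validate(candidate, already_committed))   # False: "A" was overwritten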
Transaction Recovery
Availability is the Holy Grail of database administrators. If your data is not
available, your applications cannot run, and therefore your company is losing
business. Lost business translates into lower profitability and, perhaps, a
lower stock valuation for your company. These are all detrimental to the
business, so the DBA must do everything in his power to ensure that
databases are kept online and operational. This has been the duty of the DBA
since the days of the first DBMS.
But the need for availability is increasing. The days of the long batch window
where databases can be offline for extended periods to perform nightly
processing are diminishing. Exacerbating this trend is the drive toward e-business. And when your business is on the World Wide Web, it never closes.
When DBAs hear the word "recovery," the first thing that usually comes to
mind is handling some sort of disaster. This disaster could be anything from a
simple media failure to a natural disaster destroying your data center.
Applications are completely unavailable until the recovery is complete.
In reality, except for disaster recovery tests, very few DBAs need to perform
true disaster recovery. While media does fail, it’s actually quite rare. User
errors and application failures are the most common causes of problems
requiring recovery, and therefore, the primary cause for system unavailability.
Any number of problems can occur at the application level.
There are three types of Transaction Recovery: PIT, UNDO and REDO.
Point-in-Time Recovery
Point-in-time (PIT) recovery restores the database to its state at an earlier point in time, eliminating every transaction, good or bad, that ran after that point. After the point-in-time recovery, good transactions are therefore missing from the database. If the information is still available, the user could rerun or reenter them (for example, a "Good Transaction 2" that ran after the bad transaction). Regardless of the type of recovery to be performed, if
the error that caused the recovery to be performed is caught too late,
subsequent processing could have occurred using the "bad data." How to deal
with these types of problems depends on the nature of the data and the type
of updates applied, and needs to be handled on an application by application
basis.
UNDO Recovery
UNDO recovery removes only the effects of the problem transactions: the log is read and SQL statements are generated that back out those transactions. Note that in this case the portion of the database that does not need to be recovered remains undisturbed. When undoing erroneous transactions, recovery can be done online without suffering an application or database outage. But the potential for anomalies causing failures in the UNDO is certainly a consideration.
REDO Recovery
Instead of generating SQL for the bad transactions that we want to eliminate, REDO recovery generates the SQL for the transactions we want to save. Then we do a standard PIT recovery, eliminating all the transactions since the recovery point. Finally, we reapply the good transactions captured in the first step.
Unlike the UNDO process, which creates SQL statements that are designed to
back out all of the problem transactions, the REDO process creates SQL
statements designed to reapply only the valid transactions from a consistent
point of recovery to the current time. Since the REDO process does not
generate SQL for the problem transactions, performing a recovery and then
executing the REDO SQL can restore the tablespace to a state that does not
include the problem transactions.
Transaction properties
A transaction has four properties. These are used to maintain consistency in a database before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that either all operations of a transaction take place, or none do; otherwise, the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is treated as one unit and either runs to completion or is not executed at all.
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume a transaction T, consisting of T1 and T2, that transfers Rs 100 from account A to account B, where A holds Rs 600 and B holds Rs 300.
T1: Read(A); A := A - 100; Write(A)
T2: Read(B); B := B + 100; Write(B)
If the transaction T fails after the completion of transaction T1 but before completion of
transaction T2, then the amount will be deducted from A but not added to B. This shows the
inconsistent database state. In order to ensure the correctness of the database state, the transaction must be executed in its entirety.
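A rough sketch of this all-or-nothing behaviour, using a plain Python dict as a stand-in for the database, might look like this:

    # Either both writes are applied, or neither is.
    db = {"A": 600, "B": 300}

    def transfer(db, amount):
        snapshot = dict(db)        # remember the state before T starts
        try:
            db["A"] -= amount      # T1: Read(A), A := A - 100, Write(A)
            if db["A"] < 0:
                raise ValueError("insufficient funds")   # force an abort
            db["B"] += amount      # T2: Read(B), B := B + 100, Write(B)
        except Exception:
            db.clear()
            db.update(snapshot)    # abort: roll back, no change is visible
            raise

    transfer(db, 100)
    print(db)   # {'A': 500, 'B': 400}: both updates applied, or none at all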
Consistency
o The integrity constraints are maintained so that the database is consistent before and
after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or a
new stable state.
o The consistency property of a database states that every transaction sees a consistent database instance.
o The transaction is used to transform the database from one consistent state to
another consistent state.
For example: the total amount (A + B = Rs 900) must be the same before and after the transaction; the database is then consistent. If T1 completes but T2 fails, inconsistency will occur.
Isolation
o It means that the data used during the execution of one transaction cannot be used by a second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then
that data item can't be accessed by any other transaction T2 until the transaction T1
ends.
o The concurrency control subsystem of the DBMS enforces the isolation property.
Durability
o It states that once a transaction has committed, its changes are permanent: they must survive any subsequent system failure.
o The recovery subsystem of the DBMS is responsible for enforcing durability.
States of Transaction
In a database, the transaction can be in one of the following states -
Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the
records are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
o For example, in a total marks calculation, the final display of the total marks is executed in this state.
Committed
o A transaction is said to be committed when it has executed all its operations successfully and its effects are permanently saved in the database.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction
is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached a failed state then the
database recovery system will make sure that the database is in its previous
consistent state. If not then it will abort or roll back the transaction to bring the
database into a consistent state.
o If the transaction fails in the middle of execution, all of its executed operations are rolled back to return the database to its previous consistent state.
o After aborting the transaction, the database recovery module will select one of the
two operations:
1. Re-start the transaction
2. Kill the transaction
Schedule
A schedule is a series of operations from one or more transactions. It is used to preserve the order of the operations within each individual transaction.
1. Serial Schedule
The serial schedule is a type of schedule where one transaction is executed completely
before starting another transaction. In the serial schedule, when the first transaction
completes its cycle, then the next transaction is executed.
For example: Suppose there are two transactions T1 and T2 that have some operations. If there is no interleaving of operations, then there are the following two possible outcomes:
1. Execute all the operations of T1, followed by all the operations of T2.
2. Execute all the operations of T2, followed by all the operations of T1.
o In figure (a), Schedule A shows the serial schedule where T1 is followed by T2.
o In figure (b), Schedule B shows the serial schedule where T2 is followed by T1.
2. Non-serial Schedule
o In a non-serial schedule, the operations of multiple transactions are interleaved.
o The serializability of schedules is used to find non-serial schedules that allow transactions to execute concurrently without interfering with one another.
o It identifies which schedules are correct when the executions of the transactions have interleaving of their operations.
o A non-serial schedule is serializable if its result is equal to the result of its transactions executed serially.
Testing of Serializability
Serialization Graph is used to test the Serializability of a schedule.
Assume a schedule S. For S, we construct a graph known as a precedence graph. This graph is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices contains all the transactions participating in the schedule. The set of edges contains all edges Ti → Tj for which one of the following three conditions holds:
1. Ti executes Write(Q) before Tj executes Read(Q).
2. Ti executes Read(Q) before Tj executes Write(Q).
3. Ti executes Write(Q) before Tj executes Write(Q).
For example, consider a schedule S1 over transactions T1, T2 and T3.
Explanation:
Read(A): In T1, no subsequent writes to A, so no new edges
Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads to A, so no new edges
Write(C): In T1, no subsequent reads to C, so no new edges
Write(B): In T3, no subsequent reads to B, so no new edges
The precedence graph for schedule S1 contains a cycle, which is why schedule S1 is non-serializable.
Explanation:
The precedence graph for another schedule, S2, contains no cycle, which is why schedule S2 is serializable.
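The test is mechanical enough to code directly. In the sketch below (illustrative assumptions: a schedule is a list of (transaction, operation, item) triples, and the example schedule is made up), an edge Ti → Tj is added for every conflicting pair of operations, and a depth-first search then looks for a cycle:

    def precedence_graph(schedule):
        # Edge Ti -> Tj whenever an operation of Ti conflicts with a later
        # operation of Tj (same item, different transactions, >= one write).
        edges = set()
        for i, (ti, op_i, x) in enumerate(schedule):
            for tj, op_j, y in schedule[i + 1:]:
                if ti != tj and x == y and "write" in (op_i, op_j):
                    edges.add((ti, tj))
        return edges

    def has_cycle(edges):
        graph = {}
        for u, v in edges:
            graph.setdefault(u, set()).add(v)
        visiting, done = set(), set()
        def dfs(u):
            visiting.add(u)
            for v in graph.get(u, ()):
                if v in visiting or (v not in done and dfs(v)):
                    return True
            visiting.discard(u)
            done.add(u)
            return False
        return any(dfs(u) for u in graph if u not in done)

    s = [("T1", "read", "A"), ("T2", "write", "A"),
         ("T2", "read", "B"), ("T1", "write", "B")]
    print(precedence_graph(s))            # edges T1->T2 and T2->T1
    print(has_cycle(precedence_graph(s))) # True: not conflict-serializable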
Conflicting Operations
Two operations are said to be conflicting if they belong to different transactions, operate on the same data item, and at least one of them is a write operation. For example, Read(A) of T1 and Write(A) of T2 conflict, whereas Read(A) of T1 and Read(A) of T2 do not.
Conflict Equivalent
Two schedules are conflict equivalent if one can be transformed into the other by swapping non-conflicting operations; a schedule is conflict serializable if it is conflict equivalent to a serial schedule.
Example:
Schedule S2:
T1: Read(A), Write(A), Read(B), Write(B)
T2: Read(A), Write(A), Read(B)
Schedule S2 is a serial schedule because, in this, all operations of T1 are performed before
starting any operation of T2. Schedule S1 can be transformed into a serial schedule by
swapping non-conflicting operations of S1.
View Serializability
A schedule is view serializable if it is view equivalent to a serial schedule.
View Equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following
conditions:
1. Initial Read
The initial read of each data item must be the same in both schedules. Suppose there are two schedules, S1 and S2: if in schedule S1 a transaction T1 performs the initial read of data item A, then in S2 transaction T1 should also perform the initial read of A.
In the example, the two schedules are view equivalent because the initial read operation in S1 is done by T1 and in S2 it is also done by T1.
2. Updated Read
If in schedule S1 transaction Ti reads a value of A that was updated by Tj, then in S2 transaction Ti should also read the value of A written by Tj.
3. Final Write
The final write must be the same in both schedules. If in schedule S1 a transaction T1 performs the final write on A, then in S2 the final write on A should also be done by T1.
In the example, the two schedules are view equivalent because the final write operation in S1 is done by T3 and in S2 the final write operation is also done by T3.
Example:
Consider a schedule S over three transactions T1, T2 and T3. With three transactions, the total number of possible serial schedules is 3! = 6:
1. S1 = <T1 T2 T3>
2. S2 = <T1 T3 T2>
3. S3 = <T2 T3 T1>
4. S4 = <T2 T1 T3>
5. S5 = <T3 T1 T2>
6. S6 = <T3 T2 T1>
Taking the first serial schedule, S1 = <T1 T2 T3>:
The initial read operation in S is done by T1, and in S1 it is also done by T1.
The final write operation in S is done by T3, and in S1 it is also done by T3. So S and S1 are view equivalent.
The first schedule S1 satisfies all three conditions, so we do not need to check the other schedules.
Hence, the transactions can be executed serially in the order T1 → T2 → T3, and schedule S is view serializable.
Multiversion concurrency control
Multiversion concurrency control (MCC or MVCC) is a concurrency control method commonly used by database management systems to provide concurrent access to the database, and by programming languages to implement transactional memory.
Without concurrency control, if someone is reading from a database at the same time as
someone else is writing to it, it is possible that the reader will see a half-written
or inconsistent piece of data. For instance, when making a wire transfer between two bank accounts, if a reader reads the balance at the bank when the money has been withdrawn from the original account but before it was deposited in the destination account, it would seem that money has disappeared from the bank. Isolation is the property that provides
guarantees in the concurrent accesses to data. Isolation is implemented by means of
a concurrency control protocol. The simplest way is to make all readers wait until the writer is
done, which is known as a read-write lock. Locks are known to create contention especially
between long read transactions and update transactions. MVCC aims at solving the problem
by keeping multiple copies of each data item. In this way, each user connected to the
database sees a snapshot of the database at a particular instant in time. Any changes made
by a writer will not be seen by other users of the database until the changes have been
completed (or, in database terms: until the transaction has been committed.)
When an MVCC database needs to update a piece of data, it will not overwrite the
original data item with new data, but instead creates a newer version of the data
item. Thus there are multiple versions stored. The version that each transaction sees
depends on the isolation level implemented. The most common isolation level
implemented with MVCC is snapshot isolation. With snapshot isolation, a transaction
observes a state of the data as of when the transaction started.
MVCC provides point-in-time consistent views. Read transactions under MVCC
typically use a timestamp or transaction ID to determine what state of the DB to read,
and read these versions of the data. Read and write transactions are
thus isolated from each other without any need for locking. However, despite locks
being unnecessary, they are used by some MVCC databases such as Oracle. Writes
create a newer version, while concurrent reads access an older version.
MVCC introduces the challenge of how to remove versions that become obsolete
and will never be read. In some cases, a process to periodically sweep through and
delete the obsolete versions is implemented. This is often a stop-the-world process
that traverses a whole table and rewrites it with the last version of each data
item. PostgreSQL can use this approach with its VACUUM FREEZE process. Other
databases split the storage blocks into two parts: the data part and an undo log. The
data part always keeps the last committed version. The undo log enables the
recreation of older versions of data. The main inherent limitation of this latter
approach is that when there are update-intensive workloads, the undo log part runs
out of space and then transactions are aborted as they cannot be given their
snapshot. For a document-oriented database it also allows the system to optimize
documents by writing entire documents onto contiguous sections of disk—when
updated, the entire document can be re-written rather than bits and pieces cut out or
maintained in a linked, non-contiguous database structure.
Implementation
MVCC uses timestamps (TS) and incrementing transaction IDs to achieve transactional consistency. MVCC ensures a transaction (T) never has to
wait to Read a database object (P) by maintaining several versions of the object.
Each version of object P has both a Read Timestamp (RTS) and a Write
Timestamp (WTS) which lets a particular transaction Ti read the most recent version
of the object which precedes the transaction's Read Timestamp RTS(Ti).
If transaction Ti wants to Write to object P, and there is also another
transaction Tk happening to the same object, the Read Timestamp RTS(Ti) must
precede the Read Timestamp RTS(Tk), i.e., RTS(Ti) < RTS(Tk), for the object Write
Operation (WTS) to succeed. A Write cannot complete if there are other outstanding
transactions with an earlier Read Timestamp (RTS) to the same object. Like
standing in line at the store, you cannot complete your checkout transaction until
those in front of you have completed theirs.
To restate: every object (P) has a Timestamp (TS); however, if transaction Ti wants
to Write to an object, and the transaction has a Timestamp (TS) that is earlier than
the object's current Read Timestamp, TS(Ti) < RTS(P), then the transaction is
aborted and restarted. (This is because a later transaction already depends on the
old value.) Otherwise, Ti creates a new version of object P and sets the read/write
timestamp TS of the new version to the timestamp of the transaction TS ← TS(Ti).[2]
The drawback to this system is the cost of storing multiple versions of objects in the
database. On the other hand, reads are never blocked, which can be important for
workloads mostly involving reading values from the database. MVCC is particularly
adept at implementing true snapshot isolation, something which other methods of
concurrency control frequently do either incompletely or with high performance costs.
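The timestamp rules above can be sketched in a few lines. The MVStore class below is a toy, not a real storage engine: each version is a [WTS, RTS, value] triple, reads never block, and a write aborts when a later transaction has already read the version it would supersede:

    class MVStore:
        def __init__(self):
            self.versions = {}   # key -> list of [WTS, RTS, value]

        def read(self, key, ts):
            # Visible version: the latest one whose WTS <= the reader's TS.
            visible = [v for v in self.versions[key] if v[0] <= ts]
            version = max(visible, key=lambda v: v[0])
            version[1] = max(version[1], ts)     # bump RTS(P)
            return version[2]

        def write(self, key, ts, value):
            versions = self.versions.setdefault(key, [])
            visible = [v for v in versions if v[0] <= ts]
            if visible:
                latest = max(visible, key=lambda v: v[0])
                if latest[1] > ts:               # RTS(P) > TS(Ti)
                    raise RuntimeError("abort: a later reader saw this version")
            versions.append([ts, ts, value])     # new version, WTS = TS(Ti)

    store = MVStore()
    store.write("Object 1", 0, "Foo")     # T0
    store.write("Object 1", 1, "Hello")   # T1 creates a second version
    print(store.read("Object 1", 0))      # "Foo": the old snapshot still reads
    print(store.read("Object 1", 2))      # "Hello": later readers see T1's write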
Example
Concurrent read–write
At Time = 1, the state of the database could be:
Time | Object 1 | Object 2
0 | "Foo" by T0 | "Bar" by T0
1 | "Hello" by T1 |
T0 wrote Object 1="Foo" and Object 2="Bar". After that T1 wrote Object 1="Hello"
leaving Object 2 at its original value. The new value of Object 1 will supersede the
value at 0 for all transactions that start after T1 commits at which point version 0 of
Object 1 can be garbage collected.
If a long running transaction T2 starts a read operation of Object 2 and Object 1 after
T1 committed and there is a concurrent update transaction T3 which deletes Object
2 and adds Object 3="Foo-Bar", the database state will look like this at time 2:
0 "Foo" by T0 "Bar" by T0
1 "Hello" by T1
2 (deleted) by T3 "Foo-Bar" by T3
Recovery
When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items. Transactions are made of
various operations, which are atomic in nature. But according to ACID properties of
DBMS, atomicity of transactions as a whole must be maintained, that is, either all the
operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
o It should check the states of all the transactions that were being executed.
o A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
o It should check whether the transaction can be completed now or needs to be rolled back.
o No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −
o Maintaining a log of each transaction, and writing the log onto some stable storage before actually modifying the database.
o Maintaining shadow paging, where the changes are made in volatile memory, and later the actual database is updated.
Log-based Recovery
A log is a sequence of records that maintains a record of the actions performed by a transaction. It is important that the log is written prior to the actual modification and stored on a stable storage medium, which is failsafe.
Log-based recovery works as follows −
o The log file is kept on a stable storage medium.
o When a transaction enters the system and starts execution, it writes a log record about it:
<Tn, Start>
o When the transaction modifies an item X, it writes a log record as follows:
<Tn, X, V1, V2>
This reads: Tn has changed the value of X from V1 to V2.
o When the transaction finishes, it logs:
<Tn, Commit>
Deferred update
The idea behind deferred update is to defer or postpone any actual updates to the database itself
until the transaction completes its execution successfully and reaches its commit point. During
transaction execution, the updates are recorded only in the log and in the transaction workspace.
After the transaction reaches its commit point and the log is force-written to disk, the updates are
recorded in the database itself. If a transaction fails before reaching its commit point, there is no
need to undo any operations, because the transaction has not affected the database in any way.
1. When a transaction starts, write a start_transaction(T) entry to the log.
2. When any operation is performed that will change values in the database, write a log entry write_item(T, x, old_value, new_value).
3. When a transaction is about to commit, write a log record of the form commit(T); write all
log records to disk.
4. Commit the transaction, using the log to write the updates to the database; the writing of
data to disk need not occur immediately.
5. If the transaction aborts, ignore the log records and do not write the changes to disk.
Use two lists to maintain the transactions: the committed transactions list, which contains all the
committed transactions since the last checkpoint, and the active transactions list (at most one
transaction falls in this category, because the system is a single-user one). Apply the REDO operation
to all the write_item operations of the committed transactions from the log in the order in which
they were written to the log. Restart the active transactions.
Notice that the transaction in the active list will have no effect on the database because of the
deferred update protocol, and is ignored completely by the recovery process. It is implicitly rolled
back, because none of its operations were reflected in the database. However, the transaction must
now be restarted, either automatically by the recovery process or manually by the user.
The method’s main benefit is that any transaction operation need never be undone, as a
transaction does not record its changes in the database until it reaches its commit point.
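The protocol can be sketched as follows, with a Python dict standing in for the database and a list standing in for the log on stable storage (both are assumptions for illustration):

    database = {"a": 1, "b": 2}
    log = []   # stand-in for the log on stable storage

    def run_transaction(name, updates, fail_before_commit=False):
        workspace = {}                             # transaction workspace
        log.append(("start_transaction", name))
        for item, value in updates.items():
            old = workspace.get(item, database[item])
            log.append(("write_item", name, item, old, value))
            workspace[item] = value                # database itself untouched
        if fail_before_commit:
            return                                 # nothing to undo: NO-UNDO
        log.append(("commit", name))               # force-write the log, then...
        for item, value in workspace.items():
            database[item] = value                 # ...apply the deferred updates

    run_transaction("T1", {"a": 10})
    run_transaction("T2", {"b": 20}, fail_before_commit=True)
    print(database)   # {'a': 10, 'b': 2}: the failed T2 never touched the database

At restart, REDO simply replays the write_item records of committed transactions from the log, in the order in which they were written.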
Consider a system in which concurrency control uses two-phase locking (basic 2PL) and prevents
deadlock by pre-assigning all locks to items needed by a transaction before the transaction starts
execution. To combine the deferred update methods for recovery with this concurrency control
technique, we can keep all the locks on items in effect until the transaction reaches its commit point.
After that, the locks can be released. This ensures strict and serialisable schedules. Assuming that
checkpoint entries are included in the log, a possible recovery algorithm for this case is given below.
Use two lists of transactions maintained by the system: the committed transactions list which
contains all committed transactions since the last checkpoint, and the active transactions list. REDO
all the write operations of the committed transactions from the log, in the order in which they were
written into the log. The transactions in the active list that are active and did not commit are
effectively cancelled and must be resubmitted.
The REDO procedure is the same as defined earlier in the deferred update in the single-user
environment.
In general, a transaction will have actions that do not affect the database, such as generating and
printing messages or reports from information retrieved from the database. If a transaction fails
before completion, we may not want the user to get these reports, since the transaction has failed
to complete. Hence, such reports should be generated only after the transaction reaches its commit
point. A common method of dealing with such actions is to issue the commands that generate the
reports but keep them as batch jobs. The batch jobs are executed only after the transaction reaches
its commit point. If the transaction does not reach its commit point because of a failure, the batch
jobs are cancelled.
Assume the initial values for the variables are a=1, b=2, c=3, d=4 and e=5.
Using an incremental log with deferred updates, for each operation in each of the transactions,
show:
Use your own words to describe the deferred update method for recovery management in a multi-user environment. Complete the following table to show how the deferred update protocol affects the log on disk, database buffer and database on disk.
1. How can recovery handle transaction operations that do not affect the database, such as the
printing of reports by the transaction?
Immediate update
In the immediate update techniques, the database may be updated by the operations of a
transaction immediately, before the transaction reaches its commit point. However, these
operations are typically recorded in the log on disk by force-writing before they are applied to the
database, so that recovery is possible.
When immediate update is allowed, provisions must be made for undoing the effect of update
operations on the database, because a transaction can fail after it has applied some updates to the
database itself. Hence recovery schemes based on immediate update must include the capability to
roll back a transaction by undoing the effect of its write operations.
1. When a transaction starts, write a start_transaction(T) entry to the log;
2. When any operation is performed that will change values in the database, write a log entry write_item(T, x, old_value, new_value);
3. Force-write the log entry to disk;
4. Once the log record is written, write the update to the database buffers;
5. When convenient, write the database buffers to disk;
6. When a transaction is about to commit, write a log record of the form commit(T).
In general, we can distinguish two main categories of immediate update algorithms. If the
recovery technique ensures that all updates of a transaction are recorded in the database on disk
before the transaction commits, there is never a need to redo any operations of committed
transactions. Such an algorithm is called UNDO/NO-REDO. On the other hand, if the transaction is
allowed to commit before all its changes are written to the database, we have the UNDO/REDO
method, the most general recovery algorithm. This is also the most complex technique. Recovery
activities are summarised below:
Use two lists of transactions maintained by the system: the committed transactions since the last checkpoint, and the active transactions (at most one transaction will fall in this category, because the system is single-user). Undo all the write operations of the active transaction from the log, using the UNDO procedure described hereafter. Redo all the write operations of the committed transactions from the log, in the order in which they were written in the log, using the REDO procedure.
The UNDO procedure is defined as follows:
Undoing a write operation consists of examining its log entry write_item(T, x, old_value, new_value)
and setting the value of x in the database to old_value. Undoing a number of such write operations
from one or more transactions from the log must proceed in the reverse order from the order in
which the operations were written in the log.
Use two lists of transactions maintained by the system: the committed transactions since the last checkpoint, and the active transactions. Undo all the write operations of the active (uncommitted) transactions from the log, using the UNDO procedure. The operations should be undone in the
reverse of the order in which they were written into the log. Redo all the write operations of the
committed transactions from the log, in the order in which they were written in the log, using the
REDO procedure.
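The restart logic can be sketched as follows (the tuple-shaped log records are an assumption for illustration): uncommitted writes are undone in reverse log order, then committed writes are redone in forward log order:

    def recover(log, database):
        committed = {rec[1] for rec in log if rec[0] == "commit"}
        # UNDO: back out writes of uncommitted transactions, newest first.
        for rec in reversed(log):
            if rec[0] == "write_item" and rec[1] not in committed:
                _, txn, item, old_value, new_value = rec
                database[item] = old_value
        # REDO: reapply writes of committed transactions, in log order.
        for rec in log:
            if rec[0] == "write_item" and rec[1] in committed:
                _, txn, item, old_value, new_value = rec
                database[item] = new_value

    log = [("write_item", "T1", "a", 1, 10),
           ("commit", "T1"),
           ("write_item", "T2", "b", 2, 20)]   # T2 never committed
    db = {"a": 1, "b": 20}                     # the crash left T2's write on disk
    recover(log, db)
    print(db)   # {'a': 10, 'b': 2}: T1 redone, T2 undone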
Given the same transactions and system log in exercise 1, discuss how each transaction recovers
from the failure using immediate update technique.
The same schedule and initial values of the variables in exercise 2 are given; use the immediate
update protocol for recovery to show how each transaction recovers from the failure.
Review question 4
1. Use your own words to describe the immediate update method for recovery management in
a multi-user environment. Complete the following table to show how the immediate update
protocol affects the log on disk, database buffer and database on disk.
2. In general, we can distinguish two main categories of immediate update algorithms. If the
recovery technique ensures that all updates of a transaction are recorded in the database on
disk before the transaction commits, there is never a need to redo any operations of
committed transactions. Such an algorithm is called UNDO/NO-REDO. On the other hand, if
the transaction is allowed to commit before all its changes are written to the database, we
have the UNDO/REDO method.
Recovery in multidatabase transactions
So far, we have implicitly assumed that a transaction accesses a single database. In some cases a single transaction, called a multidatabase transaction, may require access to multiple databases. These databases may even be stored on different types of DBMSs; for example, some DBMSs may be relational, whereas others are hierarchical or network DBMSs. In such a case, each DBMS involved in
the multidatabase transaction will have its own recovery technique and transaction manager
separate from those of the other DBMSs. This situation is somewhat similar to the case of a
distributed database management system, where parts of the database reside at different sites that
are connected by a communication network.
To keep such a transaction atomic, the participating DBMSs typically use the two-phase commit protocol, coordinated by a global recovery manager (the coordinator):
PHASE 1: When all participating databases signal the coordinator that the part of the multidatabase transaction involving them has concluded, the coordinator sends a message "prepare for commit" to each participant to get ready for committing the transaction. Each participating database receiving that message will force-write all log records to disk and then send a "ready to commit" or "OK" signal to the coordinator. If the force-writing to disk fails or the local transaction cannot commit for some reason, the participating database sends a "cannot commit" or "not OK" signal to the coordinator. If the coordinator does not receive a reply from a database within a certain time interval, it assumes a "not OK" response.
PHASE 2: If all participating databases reply "OK", the transaction is successful, and the coordinator sends a "commit" signal for the transaction to the participating databases.
Because all the local effects of the transaction have been recorded in the logs of the participating
databases, recovery from failure is now possible. Each participating database completes transaction
commit by writing a commit(T) entry for the transaction in the log and permanently updating the
database if needed. On the other hand, if one or more of the participating databases have a "not OK" response to the coordinator, the transaction has failed, and the coordinator sends a message to "roll back" or UNDO the local effect of the transaction to each participating database. This is done by undoing the transaction operations, using the log.
The net effect of the two-phase commit protocol is that either all participating databases commit
the effect of the transaction or none of them do. In case any of the participants – or the
coordinator – fails, it is always possible to recover to a state where either the transaction is
committed or it is rolled back. A failure during or before phase 1 usually requires the transaction to
be rolled back, whereas a failure during phase 2 means that a successful transaction can recover and
commit.
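The coordinator's decision logic can be sketched as below. The Participant class and its prepare/commit/rollback methods are assumptions made for illustration; a real implementation exchanges these as messages and force-writes a log record at every step:

    class Participant:
        def __init__(self, ok=True):
            self.ok = ok
        def prepare(self):                 # force-write local log, then vote
            return "OK" if self.ok else "not OK"
        def commit(self):                  # write commit(T) in the local log
            pass
        def rollback(self):                # UNDO local effects using the log
            pass

    def two_phase_commit(participants):
        # PHASE 1: collect votes; a failure to reply counts as "not OK".
        try:
            votes = [p.prepare() for p in participants]
        except Exception:
            votes = ["not OK"]
        # PHASE 2: commit only if every participant voted "OK".
        if all(v == "OK" for v in votes):
            for p in participants:
                p.commit()
            return "committed"
        for p in participants:
            p.rollback()
        return "rolled back"

    print(two_phase_commit([Participant(), Participant()]))          # committed
    print(two_phase_commit([Participant(), Participant(ok=False)]))  # rolled back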
Checkpoints
Why do we need checkpoints?
Whenever transaction logs are created in a real-time environment, they eat up lots of storage space. Keeping track of every update and its maintenance also increases the physical storage used by the system. Eventually, the transaction log file may become unmanageable as its size keeps growing. This can be addressed with checkpoints. The methodology utilized for removing all previous transaction logs and storing them in permanent storage is called a checkpoint.
What is a checkpoint?
The checkpoint declares a point before which the DBMS was in a consistent state and all transactions had been committed. Such checkpoints are marked during transaction execution, and transaction log files are created as execution proceeds.
Upon reaching a checkpoint, the updates recorded in the log file are saved to the database, and the log file is discarded. A new log file is then created for the upcoming operations of the transaction; it is updated until the next checkpoint, and the process continues.
How are checkpoints used in a database?
Steps :
1. Write begin_checkpoint record into log.
2. Collect checkpoint data in the stable storage.
3. Write end_checkpoint record into log.
The behaviour when the system crashes and recovers while concurrent transactions are executing is as follows: the recovery system scans the log backwards to the most recent checkpoint; transactions that committed after the checkpoint are redone, while transactions that were still active at the time of the crash are undone.
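The following toy sketch shows the main payoff at restart: only the tail of the log after the last end_checkpoint record (together with the transactions noted as active in the checkpoint data) needs to be examined:

    log = [
        ("start", "T1"), ("write", "T1", "x", 0, 1), ("commit", "T1"),
        ("begin_checkpoint",),    # checkpoint data (e.g. the active-transaction
        ("end_checkpoint",),      # list) is collected in stable storage here
        ("start", "T2"), ("write", "T2", "y", 0, 5),
    ]

    def restart_scan(log):
        # Recovery resumes from the last completed checkpoint.
        last = max(i for i, rec in enumerate(log) if rec[0] == "end_checkpoint")
        return log[last + 1:]

    print(restart_scan(log))   # only T2's records are examined at restart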
Shadow Paging
To understand the concept, consider a transaction that performs two write operations, on pages 3 and 5. Before the write operation on page 3 starts, the current page table points to the old page 3 on disk. When the write operation starts, the following steps are performed:
1. First, a search is made for a free block among the disk blocks.
2. After a free block is found, page 3 is copied into it; the copy is denoted Page 3 (New).
3. The current page table now points to Page 3 (New) on disk, but the shadow page table still points to the old page 3, because the shadow page table is never modified.
4. The changes are now propagated to Page 3 (New), which is pointed to by the current page table.
COMMIT Operation:
To commit the transaction, the following steps are performed:
1. All the modifications made by the transaction that are present in the buffers are transferred to the physical database.
2. The current page table is output to disk.
3. The disk address of the current page table is written to the fixed location in stable storage that contains the address of the shadow page table. This operation overwrites the address of the old shadow page table. With this, the current page table becomes the shadow page table, and the transaction is committed.
Failure:
If the system crashes during the execution of the transaction, but before the commit operation, it is sufficient to free the modified database pages and discard the current page table. The state of the database before the transaction is recovered by reinstalling the shadow page table.
If the crash occurs after the last write operation (the overwriting of the shadow page table address), it does not affect the propagation of the changes made by the transaction. These changes are preserved, and there is no need to perform a redo operation.
Advantages:
This method requires fewer disk accesses to perform an operation.
Recovery from a crash is inexpensive and quite fast.
There is no need for operations like undo and redo.
Disadvantages:
Because an update changes the location of the modified page on disk, it is difficult to keep related database pages close together on disk.
During the commit operation, the blocks pointed to by the old shadow page table must be returned to the collection of free blocks; otherwise they become garbage (unreachable yet still occupying space).
Committing a single transaction requires writing multiple blocks, which decreases execution speed.
Extending this technique to allow multiple transactions to run concurrently is difficult.
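To make the mechanism concrete, here is a toy sketch of shadow paging; the dicts standing in for disk blocks and page tables are assumptions made purely for illustration:

    pages = {1: "old page 3 data"}        # disk blocks: block id -> contents
    shadow_table = {"page3": 1}           # shadow page table (fixed on disk)
    current_table = dict(shadow_table)    # current page table (working copy)
    next_free_block = 2

    def write_page(page, data):
        global next_free_block
        new_block = next_free_block       # step 1: find a free disk block
        next_free_block += 1
        pages[new_block] = data           # step 2: copy the page into it
        current_table[page] = new_block   # step 3: only the current table moves

    write_page("page3", "new page 3 data")
    # A crash here loses nothing: discard current_table and the new block;
    # shadow_table still points to the old page 3.

    def commit():
        global shadow_table
        shadow_table = dict(current_table)   # the atomic root-pointer switch

    commit()
    print(pages[shadow_table["page3"]])      # "new page 3 data"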