0% found this document useful (0 votes)
21 views59 pages

UNIT IV Madhura

The document discusses database transaction management, focusing on concepts such as transaction states, ACID properties, and concurrency control mechanisms. It explains the importance of serializability, recoverability, and various protocols like lock-based and timestamp-based methods to ensure consistency and isolation during concurrent executions. Additionally, it highlights issues like deadlocks and starvation that can arise in transaction management systems.

Uploaded by

mahanpiyush88
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views59 pages

UNIT IV Madhura

The document discusses database transaction management, focusing on concepts such as transaction states, ACID properties, and concurrency control mechanisms. It explains the importance of serializability, recoverability, and various protocols like lock-based and timestamp-based methods to ensure consistency and isolation during concurrent executions. Additionally, it highlights issues like deadlocks and starvation that can arise in transaction management systems.

Uploaded by

mahanpiyush88
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 59

Unit IV Database Transaction

Management

By
Madhura Kalbhor
Contents
• Introduction to Database Transaction
• Transaction states, ACID properties

• Concept of Schedule, Serial Schedule

• Serializability: Conflict and View, Cascaded Aborts, Recoverable and Non-


recoverable Schedules.

• Concurrency Control: Lock-based, Time-stamp based Deadlock handling.

• Recovery methods: Shadow-Paging and Log-Based Recovery, Checkpoints.

• Log-Based Recovery: Deferred Database Modifications and Immediate


Database Modifications.
Introduction
• A transaction is a unit of program execution that accesses and

possibly updates various data items.

• A transaction must see a consistent database.

• During transaction execution the database may be inconsistent.

• When the transaction is committed, the database must be consistent.

• Two main issues to deal with:

• Failures of various kinds, such as hardware failures and system


crashes

• Concurrent execution of multiple transactions


ACID Rules
• Atomicity. Either all operations of the transaction are properly reflected in the
database or none are.

• Consistency. Execution of a transaction in isolation preserves the consistency of the


database.

• Isolation. Although multiple transactions may execute concurrently, each


transaction must be unaware of other concurrently executing transactions.
Intermediate transaction results must be hidden from other concurrently executed
transactions.

• That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj,
finished execution before Ti started, or Tj started execution after Ti finished.

• Durability. After a transaction completes successfully, the changes it has made to


the database persist, even if there are system failures.
Example of Fund Transfer
• Transaction to transfer $50 from account A to account B:

1. read(A)

2. A := A – 50

3. write(A)

4. read(B)

5. B := B + 50

6. write(B)

• Consistency requirement – the sum of A and B is unchanged by the execution of the


transaction.

• Atomicity requirement — if the transaction fails after step 3 and before step 6, the system
should ensure that its updates are not reflected in the database, else an inconsistency will
result
Example of Fund Transfer
• Durability requirement — once the user has been notified that the transaction has
completed (i.e., the transfer of the $50 has taken place), the updates to the database
by the transaction must persist despite failures.

• Isolation requirement — if between steps 3 and 6, another transaction is allowed to


access the partially updated database, it will see an inconsistent database (the sum A
+ B will be less than it should be).

• Can be ensured trivially by running transactions serially, that is one after the other.
However, executing multiple transactions concurrently has significant benefits, as
we will see.
Transaction State
• Active, the initial state; the transaction stays in this state while it is
executing

• Partially committed, after the final statement has been executed.

• Failed, after the discovery that normal execution can no longer proceed.

• Aborted, after the transaction has been rolled back and the database restored
to its state prior to the start of the transaction. Two options after it has been
aborted:

• restart the transaction – only if no internal logical error-ATM


• kill the transaction
• Committed, after successful completion.
Transaction State
Concurrent Executions
 Multiple transactions are allowed to run concurrently in the system.
Advantages are:

 increased processor and disk utilization, leading to better transaction


throughput: one transaction can be using the CPU while another is
reading from or writing to the disk

 reduced average response time for transactions: short transactions need


not wait behind long ones.

 Concurrency control schemes – mechanisms to achieve isolation, i.e., to


control the interaction among the concurrent transactions in order to prevent
them from destroying the consistency of the database
Schedules
 Schedules – sequences that indicate the chronological order in which
instructions of concurrent transactions are executed

 a schedule for a set of transactions must consist of all instructions


of those transactions

 must preserve the order in which the instructions appear in each


individual transaction.
Example Schedules
 Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance
from A to B. The following is a serial schedule (Schedule 1 in the
text), in which T1 is followed by T2.
Example Schedules
 Let T1 and T2 be the transactions defined previously. The following schedule
(Schedule 3 in the text) is not a serial schedule, but it is equivalent to Schedule 1.

In both Schedule 1 and 3, the sum A + B is preserved.


Example Schedules
 The following concurrent schedule (Schedule 4 in the text) does not preserve the
value of the the sum A + B.
Example Schedules
 The following concurrent schedule (Schedule 4 in the text) does not preserve the
value of the the sum A + B.
Serializability
• Basic Assumption – Each transaction preserves database consistency.

• Thus serial execution of a set of transactions preserves database consistency.

• A (possibly concurrent) schedule is serializable if it is equivalent to a serial


schedule. Different forms of schedule equivalence give rise to the notions of:

1. conflict serializability
2. view serializability

• We ignore operations other than read and write instructions, and we assume that
transactions may perform arbitrary computations on data in local buffers in between
reads and writes. Our simplified schedules consist of only read and write
instructions.
Conflict Serializability
• Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if
there exists some item Q accessed by both li and lj, and at least one of these
instructions wrote Q.

1. li = read(Q), lj = read(Q). li and lj don’t conflict.

2. li = read(Q), lj = write(Q). They conflict.

3. li = write(Q), lj = read(Q). They conflict

4. li = write(Q), lj = write(Q). They conflict

• Intuitively, a conflict between li and lj forces a (logical) temporal order between


them. If li and lj are consecutive in a schedule and they do not conflict, their results
would remain the same even if they had been interchanged in the schedule.
Conflict Serializability
 Schedule 3 below can be transformed into Schedule 1, a serial
schedule where T2 follows T1, by series of swaps of non-conflicting
instructions. Therefore Schedule 3 is conflict serializable.
Conflict Serializability
• If a schedule S can be transformed into a schedule S´ by a series of swaps of
non-conflicting instructions, we say that S and S´ are conflict equivalent.
• We say that a schedule S is conflict serializable if it is conflict equivalent to
a serial schedule
Not Conflict Serializable
•Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to obtain either


the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.
View Serializability
• Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met:

1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S,
then transaction Ti must, in schedule S´, also read the initial value of Q.

2. For each data item Q if transaction Ti executes read(Q) in schedule S, and that
value was produced by transaction Tj (if any), then transaction Ti must in schedule
S´ also read the value of Q that was produced by transaction Tj .

3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S´.

As can be seen, view equivalence is also based purely on reads and writes alone.
View Equivalent
Not View Equivalent
View Serializability (Cont.)
•A schedule S is view serializable it is view equivalent to a serial schedule.
•Every conflict serializable schedule is also view serializable.

•Schedule 9 (from text) — a schedule which is view-serializable but not conflict


serializable.

•Every view serializable schedule that is not conflict serializable has blind writes.
Every conflict serializable schedule is
also view serializable
Recoverability
 Recoverable schedule — if a transaction Tj reads a data items previously written
by a transaction Ti , the commit operation of Ti appears before the commit
operation of Tj.
 The following schedule (Schedule 11) is not recoverable if T9 commits immediately
after the read

 If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence database must ensure that schedules are
recoverable.
Recoverability
 Cascading rollback – a single transaction failure leads to a series of transaction
rollbacks. Consider the following schedule where none of the transactions has yet
committed (so the schedule is recoverable)

If T10 fails, T11 and T12 must also be rolled back.


 Can lead to the undoing of a significant amount of work
Recoverability
 Cascadeless schedules — cascading rollbacks cannot occur; for each
pair of transactions Ti and Tj such that Tj reads a data item previously
written by Ti , the commit operation of Ti appears before the read
operation of Tj.

 Every cascadeless schedule is also recoverable

 It is desirable to restrict the schedules to those that are cascadeless


Concurrency Control
• A database must provide a mechanism that will ensure that all
possible schedules are
– either conflict or view serializable, and
– are recoverable and preferably cascadeless
• A policy in which only one transaction can execute at a time
generates serial schedules, but provides a poor degree of
concurrency
– Are serial schedules recoverable/cascadeless?
• Testing a schedule for serializability after it has executed is a
little too late!
• Goal – to develop concurrency control protocols that will
assure serializability.
Concurrency Control

 Lock-Based Protocols
 Timestamp-Based Protocols

PCCOE, Pune Prof. Rahul Patil


4/28/2022
Lock-Based Protocols

 A lock is a mechanism to control concurrent access to a


data item
 Data items can be locked in two modes :
1. exclusive (X) mode. Data item can be both read as well as
written. X-lock is requested using lock-X instruction.
2. shared (S) mode. Data item can only be read. S-lock is
requested using lock-S instruction.
 Lock requests are made to concurrency-control manager.
Transaction can proceed only after request is granted.

PCCOE, Pune Prof. Rahul Patil 4/28/2022


Lock-Based Protocols (Cont.)4

 Lock-compatibility matrix

 A transaction may be granted a lock on an item if the requested


lock is compatible with locks already held on the item by other
transactions
 Any number of transactions can hold shared locks on an item,
but if any transaction holds an exclusive on the item no other
transaction may hold any lock on the item.
 If a lock cannot be granted, the requesting transaction is made
to wait till all incompatible locks held by other transactions have
been released. The lock is then granted.
PCCOE, Pune Prof. Rahul Patil 4/28/2022
Lock-Based Protocols (Cont.)
 Example of a transaction performin6g locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
 Locking as above is not sufficient to guarantee serializability — if A and B
get updated in-between the read of A and B, the displayed sum would be
wrong.
 A locking protocol is a set of rules followed by all transactions while
requesting and releasing locks. Locking protocols restrict the set of
possible schedules.
PCCOE, Pune Prof. Rahul Patil 4/28/2022
Pitfalls of Lock-Based
Protocols 7
 Consider the partial schedule

 Neither T3 nor T4 can make progress — executing lock-S(B) causes T4


to wait for T3 to release its lock on B, while executing lock-X(A) causes
T3 to wait for T4 to release its lock on A.
 Such a situation is called a deadlock.
 To handle a deadlock one of T3 or T4 must be rolled back
and its locks released.
PCCOE, Pune Prof. Rahul Patil 4/28/2022
Pitfalls of Lock-Based
Protocols (Cont.)
8

 The potential for deadlock exists in most locking protocols.


Deadlocks are a necessary evil.
 Starvation is also possible if concurrency control
manager is badly designed. For example:
 A transaction may be waiting for an X-lock on an item,
while a sequence of other transactions request and
are granted an S-lock on the same item.
 The same transaction is repeatedly rolled back due to
deadlocks.
 Concurrency control manager can be designed to prevent
starvation.

PCCOE, Pune Prof. Rahul Patil 4/28/2022


Lock-Based Protocols (Cont.)
 Starvation is also possible if concurrency control
manager is badly designed. For example:
 A transaction may be waiting for an X-lock on an item,
while a sequence of other transactions request and
are granted an S-lock on the same item.
 The same transaction is repeatedly rolled back due to
deadlocks.
 Concurrency control manager can be designed to prevent
starvation.

PCCOE, Pune Prof. Rahul Patil 4/28/2022


The Two-Phase Locking
Protocol
 This is a protocol which ensures conflict-
serializable schedules.
 Phase 1: Growing Phase
 transaction may obtain locks
 transaction may not release locks
 Phase 2: Shrinking Phase
 transaction may release locks
 transaction may not obtain locks

PCCOE, Pune Prof. Rahul Patil 4/28/2022


The Two-Phase Locking Protocol
(Cont.)
 Two-phase locking does not ensure freedom from
deadlocks

 Cascading roll-back is possible under two-phase locking.


To avoid this, follow a modified protocol called strict two-
phase locking. Here a transaction must hold all its
exclusive locks till it commits/aborts.
 Rigorous two-phase locking is even stricter: here all
locks are held till commit/abort. In this protocol
transactions can be serialized in the order in which they
commit.

PCCOE, Pune Prof. Rahul Patil 4/28/2022


Timestamp-Based Protocols
 Each transaction is issued a timestamp when it enters the
system. If an old transaction Ti has time-stamp TS(Ti), a new
transaction Tj is assigned time-stamp TS(Tj) such that TS(Ti)
<TS(Tj).
 The protocol manages concurrent execution such that the
time-stamps determine the serializability order.
 In order to assure such behavior, the protocol maintains for
each data Q two timestamp values:
 W-timestamp(Q) is the largest time-stamp of any
transaction that executed write(Q) successfully.
 R-timestamp(Q) is the largest time-stamp of any
transaction that executed read(Q) successfully.

PCCOE, Pune Prof. Rahul Patil 4/28/2022


Timestamp-Based Protocols (Cont.)

 The timestamp ordering protocol ensures that any conflicting


read and write operations are executed in timestamp order.
 Suppose a transaction Ti issues a read(Q)
1. If TS(Ti)  W-timestamp(Q), then Ti needs to read a value of Q
that was already overwritten. Hence, the read operation is
rejected, and Ti is rolled back.
2. If TS(Ti) W-timestamp(Q), then the read operation is
executed, and R-timestamp(Q) is set to the maximum of R-
timestamp(Q) and TS(Ti).

PCCOE, Pune Prof. Rahul Patil 4/28/2022


Timestamp-Based Protocols
(Cont.)
 Suppose that transaction Ti issues write(Q).
 If TS(Ti) < R-timestamp(Q), then the value of Q that Ti is
producing was needed previously, and the system
assumed that that value would never be produced. Hence,
the write operation is rejected, and Ti is rolled back.
 If TS(Ti) < W-timestamp(Q), then Ti is attempting to write
an obsolete value of Q. Hence, this write operation is
rejected, and Ti is rolled back.
 Otherwise, the write operation is executed, and W-
timestamp(Q) is set to TS(Ti).

PCCOE, Pune Prof. Rahul Patil 4/28/2022


Recovery System

 Failure Classification
 Storage Structure
 Recovery and Atomicity
 Log-Based Recovery
 Shadow Paging
Failure Classification

 Transaction failure :
 Logical errors: transaction cannot complete due to some
internal error condition
 System errors: the database system must terminate an
active transaction due to an error condition (e.g., deadlock)
 System crash: a power failure or other hardware or software
failure causes the system to crash.
 Fail-stop assumption: non-volatile storage contents are
assumed to not be corrupted by system crash
 Database systems have numerous integrity checks to
prevent corruption of disk data
 Disk failure: a head crash or similar disk failure destroys all or
part of disk storage
 Destruction is assumed to be detectable: disk drives use
checksums to detect failures
Recovery Algorithms

 Recovery algorithms are techniques to ensure


database consistency and transaction atomicity
and durability despite failures

 Recovery algorithms have two parts


1. Actions taken during normal transaction processing to
ensure enough information exists to recover from
failures
2. Actions taken after a failure to recover the database
contents to a state that ensures atomicity, consistency
and durability
Storage Structure

 Volatile storage:
 does not survive system crashes
 examples: main memory, cache memory
 Nonvolatile storage:
 survives system crashes
 examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
 Stable storage:
 a mythical form of storage that survives all failures
 approximated by maintaining multiple copies on distinct
nonvolatile media
Recovery and Atomicity

 Modifying the database without ensuring that the


transaction will commit may leave the database in
an inconsistent state.
 Consider transaction Ti that transfers $50 from
account A to account B; goal is either to perform
all database modifications made by Ti or none at
all.
 Several output operations may be required for Ti
(to output A and B). A failure may occur after one of
these modifications have been made but before all
of them are made.
Recovery and Atomicity (Cont.)

 To ensure atomicity despite failures, we first


output information describing the
modifications to stable storage without
modifying the database itself.
 We study two approaches:
 log-based recovery, and
 shadow-paging
 We assume (initially) that transactions run
serially, that is, one after the other.
Log-Based Recovery
 A log is kept on stable storage.
 The log is a sequence of log records, and maintains a record of update
activities on the database.
 When transaction Ti starts, it registers itself by writing a
<Ti start>log record
 Before Ti executes write(X), a log record <Ti, X, V1, V2> is written,
where V1 is the value of X before the write, and V2 is the value to be
written to X.
 Log record notes that Ti has performed a write on data item Xj Xj had
value V1 before the write, and will have value V2 after the write.
 When Ti finishes it last statement, the log record <Ti commit> is
written.
 We assume for now that log records are written directly to stable
storage (that is, they are not buffered)
 Two approaches using logs
 Deferred database modification
 Immediate database modification
Deferred Database Modification

 The deferred database modification scheme records all


modifications to the log, but defers all the writes to after partial
commit.
 Assume that transactions execute serially
 Transaction starts by writing <Ti start> record to log.
 A write(X) operation results in a log record <Ti, X, V> being
written, where V is the new value for X
 Note: old value is not needed for this scheme
 The write is not performed on X at this time, but is deferred.
 When Ti partially commits, <Ti commit> is written to the log
 Finally, the log records are read and used to actually execute
the previously deferred writes.
Deferred Database Modification
(Cont.)
 During recovery after a crash, a transaction needs to be redone if
and only if both <Ti start> and<Ti commit> are there in the log.
 Redoing a transaction Ti ( redoTi) sets the value of all data items
updated by the transaction to the new values.
 Crashes can occur while
 the transaction is executing the original updates, or
 while recovery action is being taken
 example transactions T0 and T1 (T0 executes before T1):
T0: read (A) T1 : read (C)
A: - A - 50 C:- C- 100
Write (A) write (C)
read (B)
B:- B + 50
write (B)
Deferred Database Modification (Cont.)

 Below we show the log as it appears at three instances of


time.

If log on stable storage at time of crash is as in case:


(a) No redo actions need to be taken
(b) redo(T0) must be performed since <T0 commit> is present
(c) redo(T0) must be performed followed by redo(T1) since
<T0 commit> and <Ti commit> are present
Immediate Database Modification

 The immediate database modification scheme allows database


updates of an uncommitted transaction to be made as the writes are
issued
 since undoing may be needed, update logs must have both old
value and new value
 Update log record must be written before database item is written
 We assume that the log record is output directly to stable storage
 Can be extended to postpone log record output, so long as prior to
execution of an output(B) operation for a data block B, all log
records corresponding to items B must be flushed to stable storage
 Output of updated blocks can take place at any time before or after
transaction commit
 Order in which blocks are output can be different from the order in
which they are written.
Immediate Database Modification
Example

Log Write Output

<T0 start>
<T0, A, 1000, 950>
To, B, 2000, 2050
A = 950
x1 B = 2050
<T0 commit>
<T1 start>
<T1, C, 700, 600>
C = 600
BB, BC
<T1 commit>
BA
 Note: BX denotes block containing X.
Immediate Database Modification
(Cont.)

 Recovery procedure has two operations instead of one:


 undo(Ti) restores the value of all data items updated by Ti to their
old values, going backwards from the last log record for Ti
 redo(Ti) sets the value of all data items updated by Ti to the new
values, going forward from the first log record for Ti
 Both operations must be idempotent
 That is, even if the operation is executed multiple times the effect is
the same as if it is executed once
 Needed since operations may get re-executed during recovery
 When recovering after failure:
 Transaction Ti needs to be undone if the log contains the record
<Ti start>, but does not contain the record <Ti commit>.
 Transaction Ti needs to be redone if the log contains both the
record <Ti start> and the record <Ti commit>.
 Undo operations are performed first, then redo operations.
Immediate DB Modification
Recovery Example

Below we show the log as it appears at three instances of time.

Recovery actions in each case above are:


(a) undo (T0): B is restored to 2000 and A to 1000.
(b) undo (T1) and redo (T0): C is restored to 700, and then A and B are
set to 950 and 2050 respectively.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050
respectively. Then C is set to 600
Shadow Paging

 Shadow paging is an alternative to log-based recovery; this


scheme is useful if transactions execute serially
 Idea: maintain two page tables during the lifetime of a transaction –
the current page table, and the shadow page table
 Store the shadow page table in nonvolatile storage, such that state
of the database prior to transaction execution may be recovered.
 Shadow page table is never modified during execution
 To start with, both the page tables are identical. Only current page
table is used for data item accesses during execution of the
transaction.
 Whenever any page is about to be written for the first time
 A copy of this page is made onto an unused page.
 The current page table is then made to point to the copy
 The update is performed on the copy
Sample Page Table
Example of Shadow Paging
Shadow and current page tables after write to page 4
Shadow Paging (Cont.)

 To commit a transaction :
1. Flush all modified pages in main memory to disk
2. Output current page table to disk
3. Make the current page table the new shadow page table, as follows:
 keep a pointer to the shadow page table at a fixed (known) location on disk.
 to make the current page table the new shadow page table, simply update
the pointer to point to current page table on disk
 Once pointer to shadow page table has been written, transaction is
committed.
 No recovery is needed after a crash — new transactions can start right
away, using the shadow page table.
 Pages not pointed to from current/shadow page table should be freed
(garbage collected).
Show Paging (Cont.)

 Advantages of shadow-paging over log-based schemes


 no overhead of writing log records
 recovery is trivial
 Disadvantages :
 Copying the entire page table is very expensive
 Can be reduced by using a page table structured like a B+-tree
 No need to copy entire tree, only need to copy paths in the
tree that lead to updated leaf nodes
 Commit overhead is high even with above extension
 Need to flush every updated page, and page table
 Data gets fragmented (related pages get separated on disk)
 After every transaction completion, the database pages containing
old versions of modified data need to be garbage collected
 Hard to extend algorithm to allow transactions to run concurrently
 Easier to extend log based schemes

You might also like