Topic:
Recovery System
Failure Classification
• Transaction failure :
• Local failure or Logical errors: transaction cannot complete due to some internal error
condition
• System errors: the database system must terminate an active transaction due to an error
condition (e.g., deadlock)
• Global failure or System crash: a power failure or other hardware or software failure causes
the system to crash.
• Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted by
system crash
• Database systems have numerous integrity checks to prevent corruption of disk data
• Media failure or Disk failure: a head crash or similar disk failure destroys all or part of disk
storage and also called hard crash.
▪
Log-Based Recovery
A log is a sequence of log records. The records keep information about update activities on
the database.
• The log is kept on stable storage
▪ When transaction Ti starts, it registers itself by writing a
<Ti start> log record
▪ Before Ti executes write(X), a log record
<Ti, X, V1, V2>
is written, where V1 is the value of X before the write (the old
value), and V2 is the value to be written to X (the new value).
▪ When Ti finishes it last statement, the log record <Ti commit> is written.
▪ Two approaches using logs
• Immediate database modification
• Deferred database modification.
Deferred database Modification
▪ The deferred modification technique ensures transaction atomicity by recording all database
modifications in the log, but deferring the execution of all write operations of a transaction until the
transaction partially commits.
▪ When a transaction partially commits, the information on the log associated with the transaction is
used in executing the deferred writes.
▪ If the system crashes before the transaction completes its transaction, or if the transaction aborts,
then the information on the log is simply ignored.
▪ The execution of transaction Ti proceeds as follows:
▪ Before Ti starts its execution a record <Ti start> is written to the log. A write(X) operation by Ti
results in the writing of a new record to the log. Finally when Ti partially commits, a record <Ti
commit> is written to the log.
▪ When transaction Ti partially commits, the records associated with it in the log are used in
executing the deferred writes.
Undo and Redo Operations
• Undo and Redo of Transactions
• undo(Ti) -- restores the value of all data items updated by Ti to their old values,
going backwards from the last log record for Ti
• Each time a data item X is restored to its old value V a special log record <Ti , X, V> is written
out
• When undo of a transaction is complete, a log record
<Ti abort> is written out.
• redo(Ti) -- sets the value of all data items updated by Ti to the new values, going
forward from the first log record for Ti
• No logging is done in this case
Recovering from Failure
• When recovering after failure:
• Transaction Ti needs to be undone if the log
• Contains the record <Ti start>,
• But does not contain either the record <Ti commit> or <Ti abort>.
• Transaction Ti needs to be redone if the log
• Contains the records <Ti start>
• And contains the record <Ti commit> or <Ti abort>
Recovering from Failure (Cont.)
• Suppose that transaction Ti was undone earlier and the <Ti abort>
record was written to the log, and then a failure occurs,
• On recovery from failure transaction Ti is redone
• Such a redo redoes all the original actions of transaction Ti including the steps
that restored old values
• Known as repeating history
• Seems wasteful, but simplifies recovery greatly
Checkpoints
• Redoing/undoing all transactions recorded in the log can be very slow
• Processing the entire log is time-consuming if the system has run for a long time
• We might unnecessarily redo transactions which have already output their
updates to the database.
• Streamline recovery procedure by periodically performing checkpointing
1. Output all log records currently residing in main memory onto stable
storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record < checkpoint L> onto stable storage where L is a
list of all transactions active at the time of checkpoint.
4. All updates are stopped while doing checkpointing
Checkpoints (cont.)
• During recovery we need to consider only the most recent transaction Ti that
started before the checkpoint, and transactions that started after Ti.
• Scan backwards from end of log to find the most recent <checkpoint L> record
• Only transactions that are in L or started after the checkpoint need to be redone
or undone
• Transactions that committed or aborted before the checkpoint already have all
their updates output to stable storage.
• Some earlier part of the log may be needed for undo operations
• Continue scanning backwards till a record <Ti start> is found for every transaction
Ti in L.
• Parts of log prior to earliest <Ti start> record above are not needed for recovery,
and can be erased whenever desired.
Example of Checkpoints