Transactions Management,
Concurrency and Recovery
                12-10-2022
                             1
                           Agenda
1.   Database System Concepts and Architecture
2.   Entity Relationship Model
3.   Relational Model & Relational Algebra
4.   Structured Query Language (SQL) & Indexing
5.   Relational Database Design
6.   Transactions Management, Concurrency and Recovery
     ■ Transaction      Concepts, State Diagram, ACID Properties,
        Transaction Control Commands, Concurrent Executions,
        Serializability Conflict and View.
     ■ Concurrency Control: Lock-based-protocols, Deadlock Handling,
        Timestamp-Based Protocols, Recovery System: Recovery
        Concepts, Log Based Recovery.
                                                                  2
                               RECAP
■   A transaction
     ■ an atomic unit of work
     ■ implements all-or-nothing semantic
■   The state of transactions
■   Active: the initial state; the transaction stays in this state while it is
    executing.
■   Partially committed: after the final statement has been executed
■   Failed: after the discovery that normal execution can no longer proceed.
■   Aborted: after the transaction has been rolled backed and the database has
    been restored to its state prior to the start of transaction.
■   Committed: after successful completion
■   Log (or journal)
■   keeps track of any operations that changes the state of database system
■   used for recovery/auditing and stored on disk/tape
                                                                            3
                       RECAP
■   LOG System: Operations
Two operations:
   ■ Undo operation, traces backward – atomicity
   ■ Redo operation trace forward – durability
■ Types of log entries (or RECORDS)
   ■ [start_trans, T]
   ■ [write_item, T, X, old, new]
   ■ [read_item,T, X]
   ■ [commit, T]
   ■ [abort, T]
■ Schedule (S)
   ■ Refers to the order of operations from multiple transactions
     & only interested in read(r), write(w), commit(c), abort(a)
                                                                    4
                       Conflicts
■   Two operations in Schedule S are said to
    conflict iff
      1.   They both belong to different transactions
      2.   They both access the same data item
      3.   One of the operation is write
                                                        5
     Example of schedule with conflict
                operations
■   S a:
    ■   r1(x);
    ■   r2(x);
    ■   w1(x);
    ■   r1(y);
    ■   w2(x);
    ■   w1(y);
                                         6
                 Example of rollback
■   S3,
    ■   r1(x);
    ■   w1(x);
                     T3 needs to rollback ; why?
    ■   r2(x);
    ■   r1(y);          because it reads item x written by
    ■   w2(x);          T2; T2 is aborted;
    ■   w1(y);       T2 needs to rollback; why?
    ■   r3(x)           because it reads item x written by
    ■   w3(x)           T1; T1 is aborted
    ■   a1
    ■   a2
    ■   a3
                                                             7
                        Serializability
■   Assumption: Every serial schedule is correct
■   Goal: Like to find non-serial schedules which are
    also correct
■   A schedule S of n transactions is serializable if it is
    equivalent to some serial schedule of the same n
    transactions
■   When are two schedules equivalent?
    ■   Result equivalent
    ■   View equivalent
    ■   Conflict equivalent
                                                         8
           Result Equivalent Schedules
■    Two schedules are result equivalent if they produce the same
     final state of the database
■    Problem: May produce same result by accident!
                S1                            S2
       read_item(X);                  read_item(X);
       X:=X+10;                       X:=X*1.1;
       write_item(X);                 write_item(X);
    Schedules S1 and S2 are result equivalent for X=100
    But, How about X = 200 ? , Not practical.
                                                                9
                 Conflict Serializability
■   Conflict equivalent
    ■   Two schedules S0 and S1 are conflict equivalent if S0
        can be transformed into S1 by a series of swaps of
        non-conflicting operations
    ■   ri(y); ri(x); wj(x); rj(z) ≡ ri(z); ri(x); wj(x); rj(y)
■   Conflict serializable
    ■    S is conflict serializable if it is conflict equivalent to
        some serial schedule
                                                                  10
           Conflict Equivalence
               Serial Schedule S1
               T1                            T2
  read_item(A);
  write_item(A);
order doesn’t matter
                         order matters
  read_item(B);
  write_item(B);                   read_item(A):
                                   write_item(A);
                order matters                           order
                                   read_item(B);    doesn’t matter
                                   write_item(B);
                                                            11
Conflict Equivalence
                           Schedule S2
               T1                               T2
   read_item(A);
   read_item(B);
   write_item(A);      same order as in S1
                                      read_item(A):
                                      write_item(A);
   write_item(B);       same order as in S1
                                      read_item(B);
                                      write_item(B);
  S1 and S2 are conflict equivalent
(S2 produces the same result as S1)
                                                       12
     Conflict Equivalence
                  Schedule S3
          T1                                T2
                                  read_item(A):
                                  write_item(A);
read_item(A);    different order as in S1
write_item(A);
                                  read_item(B);
                                  write_item(B);
                 different order as in S1
read_item(B);
write_item(B);   Schedule S3 is not conflict equivalent to S1
                    (produces a different result than S1)
                                                           13
Conflict Serializable
■   Schedule S is conflict serializable if it is conflict equivalent to
    some serial schedule S’
    ■   Can reorder the non-conflicting operations to improve efficiency
■   Non-conflicting operations:
    ■   Reads and writes from same transaction
    ■   Reads from different transactions
    ■   Reads and writes from different transactions on different data
        items
■   Conflicting operations:
    ■   Reads and writes from different transactions on same data item
                                                                         14
                           Example
        Schedule A                                       Schedule B
                         T2                                            T2
         T1                                         T1
read_item(X);                              read_item(X);
X:=X-N;                                    X:=X-N;
write_item(X);                             write_item(X);
                                                               read_item(X);
read_item(Y);
                                                               X:=X+M;
Y:=Y+N;
                                                               write_item(X);
write_item(Y);
                 read_item(X);
                                           read_item(Y);
                 X:=X+M;
                                           Y:=Y+N;
                 write_item(X);
                                           write_item(Y);
                         B is conflict equivalent to A
                         ⇒ B is serializable
                                                                            15
              View Serializability
■   Two schedules S0 and S1 having the same set of
    transactions are said to be view equivalent iff
     ■ If T reads the initial value of a data item in S , then
            i                                          0
        Ti reads the initial value of the item in S1.
     ■ If T writes the final value of a data item in S , T
             i                                            0   i
        also writes the final value of the item in S1.
     ■ If T reads an item written by T in S , T also reads
            i                              j    0  i
        that changed item by Tj in S1.
■   if S is a view equivalent to some serial schedule S’,
    then S is view serializable
                                                              16
             Test for Serializability
■    Construct a directed graph, precedence graph, G =
     (V, E)
    ■ V: set of all transactions participating in schedule
    ■ E: set of edges T → T for which one of the
                             i     j
      following holds:
       ■ T executes a write_item(X) before T executes
           i                                        j
         read_item(X)
       ■ T executes a read_item(X) before T executes
           i                                        j
         write_item(X)
       ■ T executes a write_item(X) before T executes
           i                                        j
         write_item(X)
                                                         17
              Test for Serializability
■    Construct a directed graph, precedence graph, G =
     (V, E)
    ■ V: set of all transactions participating in schedule
    ■ E: set of edges T → T
                         i     j
    ■   An edge Ti → Tj means that in any serial schedule
        equivalent to S, Ti must come before Tj
    ■   If G has a cycle, then S is not conflict serializable
    ■   If not, use topological sort to obtain serialiazable
        schedule (linear order consistent with precedence
        order of graph)
                                                            18
         Serialization Graph
■   Consider the schedule S:
    W1(X),R2(Y),R1(Y),R2(X)
■   The precedence graph is
            T1                 T2
    Thus, it is conflict equivalent to T1,T2
                                               19
            Serialization Graph
■   Consider again the
    schedule:                 T1                    T2
■   R1(X),
■   W2(Y),
■   W3(X),
                                         T3
■   R2(X),
■   R1(Y)
■   The precedence graph is
     There is a cycle. Hence it is NOT conflict serializable
                                                           20
Another example of serializability testing. (c) Schedule F.
                                                              21
Precedence graph for schedule F
      T1                  y, x            T2
                                   y, z
               y
                        T3
           Equivalent serial schedules:
           T3,T1, T2
                                               22
Another example of serializability testing. (b) Schedule E
                                                             23
Precedence graph for schedule E
                 y
     T1                   x             T2
                                 y, z
                     T3
          No equivalent serial schedules
          Why?
                                             24
Concurrency Control Technique
■   Two-phase Locking
■   Deadlock
■   Timestamp
                                25
    Concurrency Control Through Locks
■   Lock: variable associated with each data item
    ■   Describes status of item wrt operations that can be performed
        on it
■   Binary locks: Locked/unlocked
    ■   Enforces mutual exclusion
■   Multiple-mode locks: Read/write
    ■   Shared/Exclusive
■   Three operations
    ■   read_lock(X)
    ■   write_lock(X)
    ■   unlock(X)
■   Each data item can be in one of three lock states
                                                                        26
                Implementation
■   Maintain lock table
■   Keep track of locked items and their
    locks
    <data item, LOCK, no_of_reads, locking_transaction>
■   For read locks, keep track of the
    number of transactions that hold a read
    lock on an item
                                                          27
                 Locking Rules
1.   T must issue read_lock(X) or write_lock(X)
     before any read_item(X) op is performed in T
2.   T must issue write_lock(X) before any
     write_item(X) op is performed in T
3.   T must issue unlock(X) after all read_item(X) and
     write_item(X) ops are completed in T
4.   T will not issue a read_lock(X) if it already holds
     a read lock or write lock on X (may be relaxed)
5.   T will not issue a write_lock(X) if it already holds
     a read lock or write lock on X (may be relaxed)
                                                        28
Locking and unlocking operations for
two-mode (read-write or
shared-exclusive) locks.
                                       29
Locking and unlocking operations for
two-mode (read-write or
shared-exclusive) locks.
                                       30
Lock Conversions
■   Sometimes beneficial to relax locking rules 4 and 5
■   Upgrade read lock on X to a write lock (by issuing
    a write_lock(X) )
    ■   Only possible if T is the only transaction holding a read
        lock on X
■   Downgrade a write lock by issuing a read_lock(X)
■   Must be noted in lock table
                                                                    31
              Granting of Locks
■   Suppose T2 has read-lock on item X
■   T1 is requesting write-lock on item X; needs
    to wait for T2 to release
■   T3 requests read-lock on X; request is
    granted
    ■   Assume shortly thereafter T2 relinquishes read-lock
    ■   Continue scenario through a sequence of
        transactions all requesting read-lock on X
    ■   T1 will never make progress
■   T1 is said to be starved
                                                              32
              Granting of Locks
■   How do you avoid starvation in the
    presence of locks?
■   Assume Ti requesting lock on Q
■   Grant lock provided that
    ■   No locking conflict with lock requested by Ti,
        OR
    ■   No other transaction waiting for lock and
        made request before Ti
                                                     33
Transactions that do not obey two-phase locking. (a)
Two transactions T1 and T2. (b) Results of possible
serial schedules of T1 and T2.
                                Result of serial schedule T2
                                followed by T1
                                X= 70, Y=50
                                                               34
Transactions that do not obey two-phase locking. (c) A
nonserializable schedule S that uses locks.
            unlocked too early!
                              Locks Alone Don’t Do the Trick!
                                                           35
        Two-Phase Locking (2PL)
■   Def.: Transaction is said to follow the
    two-phase-locking protocol if all locking
    operations precede the first unlock operation
    ■   Expanding (growing) = first phase
    ■   Shrinking = second phase
■   During the shrinking phase no new locks can
    be acquired!
    ■   Downgrading ok
    ■   Upgrading is not
                                                36
  Transactions T1′ and T2 ′, which are the same as T1
    and T2 of above, but which allow the two-phase
locking protocol. Not that they can produce a deadlock.
                                                          37
Variations to the Basic Protocol
■   Previous technique knows as basic 2PL
■   Conservative 2PL (static) 2PL: Lock all
    items needed BEFORE execution begins
    by predeclaring its read and write set
    ■   If any of the items in read or write set is already
        locked (by other transactions), transaction waits
        (does not acquire any locks)
    ■   Deadlock free but not very realistic
                                                              38
Variations to the Basic Protocol
■   Strict 2PL: Transaction does not release its
    write locks until AFTER it aborts/commits
    ■   Not deadlock free but guarantees recoverable
        schedules (strict schedule: transaction can
        neither read/write X until last transaction that
        wrote X has committed/aborted)
    ■   Most popular variation of 2PL
                                                           39
Variations to the Basic Protocol
■   Rigorous 2PL: No lock is released until
    after abort/commit
    ■   Transaction is in its expanding phase until it ends
                                                         40
    Remarks 2PL
■   Concurrency control subsystem is responsible for
    inserting locks at right places into your transaction
     ■   Strict 2PL is widely used
     ■   Requires use of waiting queue
■   All 2PL locking protocols guarantee serializability
■   Does not permit all possible serial schedules
     ■   Conservative and rigorous 2PL charge a high price for
         serializability
■   However, deadlock-based algorithms may suffer
    from starvation and deadlock.
                                                                 41
               What is Deadlock?
■   Occurs when each transaction Ti in a set of two or more
    transactions is waiting on an item locked by some other
    transaction Tj in the set
    S        T1               T2
    read_lock(Y);
    read_item(Y);
                      read_lock(X);      T1         T2
    write_lock(X);    read_item(X);
                      write_lock(Y);
                                                          42
Deadlock Prevention
■   Locking as deadlock prevention leads to
    very inefficient schedules (e.g.,
    conservative 2PL)
■   Better, use transaction timestamp TS(T)
    ■   TS is unique identifier assigned to each
        transaction
    ■   if T1 starts before T2, then TS(T1) < TS(T2)
        (older has smaller timestamp value)
    ■   Wait-die and wound-wait schemes
                                                       43
Wait-Die Scheme
■   Assume Ti tries to lock X which is locked by Tj
■   If TS(Ti) < TS(Tj) (Ti older than Tj), then Ti is
    allowed to wait
■   Otherwise, Ti younger than Tj, abort Ti (Ti dies)
    and restart later with SAME timestamp
■   Older transaction is allowed to wait on younger
    transaction
■   Younger transaction requesting an item held by
    older transaction is aborted and restarted
                                                        44
           Wound-Wait Scheme
■   Assume Ti tries to lock X which is locked by Tj
■   If TS(Ti) < TS(Tj) (Ti older than Tj), abort Tj (Ti
    wounds Tj) and restart later with SAME timestamp
■   Otherwise, Ti younger than Tj, Ti is allowed to wait
■   Younger transaction is allowed to wait on older
    transaction
■   Older transaction requesting item held by younger
    transaction preempts younger one by aborting it
■   Both schemes abort younger transaction that may
    be involved in deadlock
■   Both deadlock free but may cause needless aborts
                                                           45
        More Deadlock Prevention
■   Waiting schemes (require no timestamps)
■   No waiting: if transaction cannot obtain lock,
    aborted immediately and restarted after time t
     ■ → needless restarts
■   Cautious waiting:
    ■   Suppose Ti tries to lock item X which is locked by Tj
    ■   If Tj is not blocked, Ti is blocked and allowed to wait
    ■   Otherwise abort Ti
    ■   Cautious waiting is deadlock-free
                                                                  46
             Deadlock Detection
■    DBMS checks if deadlock has occurred
     ■ Works well if few short transactions with little
       interference
     ■ Otherwise, use deadlock prevention
■    Two approaches to deadlock detection:
1.   Wait-for graph
     ■ If cycle, abort one of the transactions (victim
       selection)
2.   Timeouts
                                                          47
                   Starvation
■   Transaction cannot continue for indefinite
    amount of time while others proceed normally
■   When? Unfair waiting scheme with priorities for
    certain transactions
     ■ E.g., in deadlock detection, if we choose
       victim always based on cost factors, same
       transaction may always be picked as victim
     ■ Include rollbacks in cost factor
                                                      48