Unit 4 DBMS
A transaction is a single, logical unit of work that consists of one or more related
tasks. A transaction is treated as a single, indivisible operation, which means that
either all the tasks within the transaction are executed successfully, or none are.
Here are a few key properties of transactions, often referred to by the acronym ACID:
ACID Properties
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity: This property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none are. There is no such thing as
a partial transaction; if a transaction fails, all the changes made to the database so far
by that transaction are rolled back, and the database remains unchanged.
Consistency: The database must remain in a consistent state after any transaction.
No transaction should have any adverse effect on the data residing in the database. If
the database was in a consistent state before a transaction, then after execution of the
transaction, the database must be back to its consistent state.
Isolation: Each transaction is considered independent of other transactions. That is,
the execution of multiple transactions concurrently has the same effect as if they had
been executed one after the other. Isolation by itself, however, does not determine
which transaction executes first.
Durability: The changes made to the database by a successful transaction persist
even after a system failure.
Transaction State in DBMS
Every transaction in a DBMS passes through several states during its lifespan. These
states, known as "Transaction States," collectively constitute the "Transaction
Lifecycle." Here are the main states in the lifecycle of a transaction:
1. Active: This is the initial state of every transaction. In this state, the
transaction is being executed. The transaction remains in this state as long
as it is executing SQL statements.
2. Partially Committed: When a transaction executes its final statement, it is
said to be in a 'partially committed' state. At this point, the transaction has
passed the modification phase but has not yet been committed to the
database. If a failure occurs at this stage, the transaction will roll back.
3. Failed: If a transaction is in a 'partially committed' state and a problem
occurs that prevents the transaction from committing, it is said to be in a
'failed' state. When a transaction is in a 'failed' state, it will trigger a rollback
operation.
4. Aborted: If a transaction is rolled back and the database is restored to its
state before the transaction began, the transaction is said to be 'aborted.'
After a transaction is aborted, it may be restarted, depending on the policy of
the transaction management component of the DBMS.
5. Committed: When a transaction is in a 'partially committed' state and the
commit operation is successful, it is said to be 'committed.' At this point,
the transaction has completed its execution and all of its updates are
permanently stored in the database.
6. Terminated: After a transaction reaches the 'committed' or 'aborted' state,
it is said to be 'terminated.' This is the final state of a transaction.
Note: The actual terms and the number of states may vary depending on the specifics
of the DBMS and the transaction processing model it uses. However, the fundamental
principles remain the same.
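To make the lifecycle concrete, here is a minimal sketch in Python of the states and transitions described above. The class and method names are illustrative only and are not part of any real DBMS API.

```python
from enum import Enum, auto

class TxnState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()
    COMMITTED = auto()
    TERMINATED = auto()

# Allowed transitions, mirroring the lifecycle described above.
ALLOWED = {
    TxnState.ACTIVE: {TxnState.PARTIALLY_COMMITTED, TxnState.FAILED},
    TxnState.PARTIALLY_COMMITTED: {TxnState.COMMITTED, TxnState.FAILED},
    TxnState.FAILED: {TxnState.ABORTED},
    TxnState.ABORTED: {TxnState.TERMINATED},
    TxnState.COMMITTED: {TxnState.TERMINATED},
    TxnState.TERMINATED: set(),
}

class Transaction:
    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.state = TxnState.ACTIVE   # every transaction starts in the active state

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

# Example: a transaction that commits normally.
t = Transaction("T1")
t.transition(TxnState.PARTIALLY_COMMITTED)
t.transition(TxnState.COMMITTED)
t.transition(TxnState.TERMINATED)
print(t.state)   # TxnState.TERMINATED
```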
Implementation of Atomicity
• When a transaction starts, changes are not immediately written to the original
data pages. Instead, new copies of the data pages are created and modified. The
original data remains untouched.
• If the transaction fails or needs to be aborted for some reason, the DBMS simply
discards the new pages. Since the original data hasn't been altered, atomicity is
maintained. No changes from the failed transaction are visible.
Implementation of Durability
• Once a transaction is completed successfully, the directory that was pointing to
the original pages is updated to point to the modified pages. This switch can be
done atomically, ensuring durability.
• After the switch, the new pages effectively become the current data, while the old
pages can be discarded or archived.
• Because the actual switching of the pointers is a very fast operation, committing a
transaction can be done quickly, ensuring that once a transaction is declared
complete, its changes are durable.
1. Initial State: Initially, the database is in a consistent state. Let's call this the
"Shadow" database.
2. Transaction Execution: When a transaction executes (for example, transferring 200
from account 1 to account 2), its changes are not applied directly to the shadow
database. Instead, they are applied to a separate copy of the database.
Shadow database (Version 1):
| AccountID | Balance |
|-----------|---------|
| 1         | 1000    |
| 2         | 2000    |
3. Commit: The transaction completes successfully. The modified copy becomes the
new shadow database (Version 2).
| AccountID | Balance |
|-----------|---------|
| 1 | 800 |
| 2 | 2200 |
4. System Failure: If the system crashes after this point, the shadow database
(Version 2) is already in a consistent state, ensuring durability and atomicity.
Shadow paging involves the following trade-offs:
• Durability: Once changes are committed, they are permanent, even in the case
of system failures.
• Disk I/O: Requires more disk operations because of the separate copy.
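The page-copy-and-pointer-swap idea behind shadow paging can be sketched in a few lines of Python. This is purely illustrative (an in-memory dictionary stands in for the page table and data pages); a real implementation works with on-disk pages and an atomic directory switch.

```python
# Minimal in-memory sketch of the shadow-copy idea described above:
# the transaction works on copies of the pages it touches, and commit
# is a single pointer swap from the old directory to the new one.
class ShadowStore:
    def __init__(self, pages):
        self.directory = dict(pages)          # current (committed) pages

    def begin(self):
        return dict(self.directory)           # working copy for the transaction

    def commit(self, working_directory):
        # The "switch" is one atomic reference assignment.
        self.directory = working_directory

store = ShadowStore({1: 1000, 2: 2000})       # AccountID -> Balance
work = store.begin()
work[1] -= 200                                # transfer 200 from account 1
work[2] += 200                                # ... to account 2

# On failure we would simply discard `work`; the original pages are untouched.
store.commit(work)
print(store.directory)                        # {1: 800, 2: 2200}
```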
Concurrent execution of transactions has several advantages, for example decreased
wait time: transactions no longer have to wait for other long transactions to complete.
However, uncontrolled interleaving of transactions can cause problems, as the
following schedules illustrate:
T1 | T2
----------|-----------
Read(A) |
A = A+50 |
| Read(A)
| A = A+100
Write(A) |
          | Write(A)
Result: T2's Write(A) overwrites the update made by T1, so T1's update to A is lost
(the lost update problem).
T1 | T2
------------------|-----------
Read(A) |
A = A+50 |
Write(A) |
| Read(A)
| A = A+100
| Write(A)
Rollback          |
| commit
Result: T2 has a "dirty" value, that was never committed in T1 and doesn't actually
exist in the database.
T1 | T2
----------|----------
Read(A) |
| Read(A)
| A = A+100
| Write(A)
Read(A) |
Result: Within the same transaction, T1 has read two different values for the same
data item. This inconsistency is the unrepeatable read.
To manage concurrent execution and ensure the consistency and reliability of the
database, DBMSs use concurrency control techniques. These typically include locking
mechanisms, timestamps, optimistic concurrency control, and serializability checks.
Serializability in DBMS
Schedule is an order of multiple transactions executing in concurrent environment.
Serial Schedule: A schedule in which the transactions execute one after the other is
called a serial schedule. It is consistent in nature.
For example: Consider following two transactions T1 and T2.
T1 | T2
----------|----------
Read(A) |
Write(A) |
Read(B) |
Write(B) |
| Read(A)
| Write(A)
| Read(B)
| Write(B)
All the operations of transaction T1 on data items A and B execute first, and then all
the operations of transaction T2 on data items A and B execute.
Non-Serial Schedule: A schedule in which the operations of different transactions are
interleaved. This may lead to conflicts in the result or inconsistency in the resulting
data.
For example, consider the following two transactions:
T1        | T2
----------|----------
Read(A)   |
Write(A)  |
          | Read(A)
          | Write(A)
Read(B)   |
Write(B)  |
          | Read(B)
          | Write(B)
The above schedule is non-serial and may result in inconsistency or conflicts in the
data.
Types of Serializability
1. Conflict Serializability
2. View Serializability
Conflict Serializability
Conflict serializability is a form of serializability where the order of non-conflicting
operations is not significant. It determines if the concurrent execution of several
transactions is equivalent to some serial execution of those transactions.
Two operations are said to be in conflict if:
• They belong to different transactions,
• They operate on the same data item, and
• At least one of them is a write operation.
For example, the following pairs of operations do not conflict (they either access
different data items or are both reads):
T1 | T2
----------|----------
Read(A) | Read(A)
Read(A) | Read(B)
Write(B) | Read(A)
Read(B) | Write(A)
Write(A) | Write(B)
The following pairs of operations do conflict (same data item, at least one write):
T1        | T2
----------|----------
Read(A) | Write(A)
Write(A) | Read(A)
Write(A) | Write(A)
Now consider the following schedule S:
T1        | T2
----------|----------
Read(A) |
| Read(A)
Write(A) |
| Read(B)
| Write(B)
R1(A) conflicts with W1(A), so there's an edge from T1 to T1, but this is ignored
because they're from the same transaction.
R2(A) conflicts with W1(A), so there's an edge from T2 to T1.
No other conflicting pairs.
The graph has nodes T1 and T2 with an edge from T2 to T1. There are no cycles in
this graph.
Decision: Since the precedence graph does not contain any cycle (a cycle is a path
that starts and ends at the same node), the schedule S is conflict serializable. The
equivalent serial schedule, based on the graph, is T2 followed by T1.
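The construction of the precedence graph and the cycle test can be automated. The following Python sketch (illustrative names, simplified conflict test) builds the graph for the schedule S above and confirms that it is acyclic, hence conflict serializable.

```python
from collections import defaultdict

# Build the precedence graph of a schedule and test it for cycles.
# A schedule is a list of (transaction, operation, data_item) tuples.
def precedence_graph(schedule):
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            conflict = (item_i == item_j and ti != tj
                        and ('W' in (op_i, op_j)))
            if conflict:
                edges[ti].add(tj)             # edge Ti -> Tj (Ti's operation is earlier)
    return edges

def has_cycle(edges):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def dfs(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY or (colour[nxt] == WHITE and dfs(nxt)):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and dfs(n) for n in list(edges))

# Schedule S from the text: R1(A), R2(A), W1(A), R2(B), W2(B)
S = [('T1', 'R', 'A'), ('T2', 'R', 'A'), ('T1', 'W', 'A'),
     ('T2', 'R', 'B'), ('T2', 'W', 'B')]
g = precedence_graph(S)
print(dict(g))            # {'T2': {'T1'}}  -> single edge T2 -> T1
print(has_cycle(g))       # False -> conflict serializable (T2 before T1)
```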
View Serializability
View Serializability is one of the types of serializability in DBMS that ensures the
consistency of a database schedule. Unlike conflict serializability, which cares about
the order of conflicting operations, view serializability only cares about the final
outcome. That is, two schedules are view equivalent if they have:
• Initial Read: The same set of initial reads (i.e., a read by a transaction with no
preceding write by another transaction on the same data item).
• Updated Read: For any other writes on a data item in between, if a transaction
\(T_j\) reads the result of a write by transaction \(T_i\) in one schedule, then \(T_j\)
should read the result of a write by \(T_i\) in the other schedule as well.
• Final Write: The same set of final writes (i.e., a write by a transaction with no
subsequent writes by another transaction on the same data item).
Let's understand view serializability with an example:
Consider two transactions \(T_1\) and \(T_2\):
Schedule 1(S1):
| Transaction T1 | Transaction T2 |
|---------------------|---------------------|
| Write(A) | |
| | Read(A) |
| | Write(B) |
| Read(B) | |
| Write(B) | |
| Commit | Commit |
Schedule 2(S2):
| Transaction T1 | Transaction T2 |
|---------------------|---------------------|
| | Read(A) |
| Write(A) | |
| | Write(A) |
| Read(B) | |
| Write(B) | |
| Commit | Commit |
Here, for the intermediate reads and writes: in S1, \(T_2\) reads the value of A after
\(T_1\) has written to it. Similarly, in S2, \(T_2\) reads A, which can be viewed as if it
read the value after \(T_1\) (even though in the actual sequence \(T_2\) read it before
\(T_1\) wrote it). The important aspect is that the view, or effect, is equivalent.
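The three view-equivalence conditions can be checked mechanically. The sketch below is illustrative and simplified: it collects the reads-from relationships (with None standing for the initial value) and the final writes of a schedule. As an example it compares the schedule S from the conflict-serializability section with the serial order T2 followed by T1.

```python
# A small view-equivalence checker following the three conditions above.
# A schedule is a list of (transaction, operation, data_item) tuples.
def view_facts(schedule):
    reads_from = set()      # (reader, item, writer-or-None for the initial value)
    final_write = {}        # item -> last transaction that writes it
    last_writer = {}        # item -> current last writer while scanning
    for txn, op, item in schedule:
        if op == 'R':
            reads_from.add((txn, item, last_writer.get(item)))
        elif op == 'W':
            last_writer[item] = txn
            final_write[item] = txn
    return reads_from, final_write

def view_equivalent(s1, s2):
    return view_facts(s1) == view_facts(s2)

# Schedule S (interleaved) versus the serial schedule T2 then T1.
S = [('T1', 'R', 'A'), ('T2', 'R', 'A'), ('T1', 'W', 'A'),
     ('T2', 'R', 'B'), ('T2', 'W', 'B')]
serial_T2_T1 = [('T2', 'R', 'A'), ('T2', 'R', 'B'), ('T2', 'W', 'B'),
                ('T1', 'R', 'A'), ('T1', 'W', 'A')]
print(view_equivalent(S, serial_T2_T1))   # True
```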
Recoverability in DBMS
Recoverability refers to the ability of a system to restore its state to a point where the
integrity of its data is not compromised, especially after a failure or an error.
When multiple transactions are executing concurrently, issues may arise that affect
the system's recoverability. The interaction between transactions, if not managed
correctly, can result in scenarios where a transaction's effects cannot be undone,
which would violate the system's integrity.
Importance of Recoverability:
The need for recoverability arises because databases are designed to ensure data
reliability and consistency. If a system isn't recoverable, a failure could leave the
database in an inconsistent state, with committed transactions depending on work
that was never completed.
Levels of Recoverability
1. Recoverable Schedules
A schedule is said to be recoverable if, for any pair of transactions \(T_i\) and \(T_j\),
if \(T_j\) reads a data item previously written by \(T_i\), then \(T_i\) must commit before
\(T_j\) commits. If a transaction fails and needs to be rolled back, any transaction that
has read its data will not yet have committed, so it too can be rolled back safely and
the database can be restored to a consistent state.
Example of a Recoverable Schedule
Suppose we have two transactions \(T_1\) and \(T_2\):
| Transaction T1      | Transaction T2      |
|---------------------|---------------------|
| Write(A)            |                     |
|                     | Read(A)             |
|                     | Write(B)            |
| Commit              |                     |
|                     | Commit              |
In the above schedule, \(T_2\) reads a value written by \(T_1\), but \(T_1\) commits
before \(T_2\), making the schedule recoverable.
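The recoverability condition can be checked directly from the definition: for every reads-from pair, the writer must commit before the reader. The Python sketch below is illustrative; 'C' records mark commits, and the two schedules correspond to the recoverable example above and the non-recoverable example below.

```python
# Checks the recoverability condition: whenever Tj reads an item last written
# by Ti, Ti must commit before Tj commits.
# A schedule is a list of (transaction, operation, data_item-or-None) tuples;
# 'C' marks a commit, e.g. ('T1', 'C', None).
def is_recoverable(schedule):
    last_writer = {}                      # item -> last transaction that wrote it
    commit_pos = {t: i for i, (t, op, _) in enumerate(schedule) if op == 'C'}
    reads_from = []                       # (reader, writer) pairs
    for txn, op, item in schedule:
        if op == 'W':
            last_writer[item] = txn
        elif op == 'R' and item in last_writer and last_writer[item] != txn:
            reads_from.append((txn, last_writer[item]))
    return all(
        reader not in commit_pos
        or (writer in commit_pos and commit_pos[writer] < commit_pos[reader])
        for reader, writer in reads_from
    )

# Recoverable: T1 commits before T2, which read A written by T1.
ok  = [('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'B'),
       ('T1', 'C', None), ('T2', 'C', None)]
# Non-recoverable: T2 commits before T1 (the example further below).
bad = [('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'W', 'B'),
       ('T2', 'C', None), ('T1', 'C', None)]
print(is_recoverable(ok))    # True
print(is_recoverable(bad))   # False
```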
2. Non-Recoverable Schedules
A schedule is said to be non-recoverable (or irrecoverable) if there exists a pair of
transactions \(T_i\) and \(T_j\) such that \(T_j\) reads a data item previously written by
\(T_i\), but \(T_i\) has not committed yet and \(T_j\) commits before \(T_i\). If \(T_i\)
fails and needs to be rolled back after \(T_j\) has committed, there's no straightforward
way to roll back the effects of \(T_j\), leading to potential data inconsistency.
Example of a Non-Recoverable Schedule
Again, consider two transactions \(T_1\) and \(T_2\).
| Transaction T1 | Transaction T2 |
|---------------------|---------------------|
| Write(A) | |
| | Read(A) |
| | Write(B) |
| | Commit |
| Commit | |
In this schedule, \(T_2\) reads a value written by \(T_1\) and commits before \(T_1\)
does. If \(T_1\) encounters a failure and has to be rolled back after \(T_2\) has
committed, we're left in a problematic situation since we cannot easily roll back \(T_2\),
making the schedule non-recoverable.
3. Cascading Rollback
A cascading rollback occurs when the rollback of a single transaction causes one or
more dependent transactions to be rolled back. This situation can arise when one
transaction reads uncommitted changes of another transaction, and then the latter
transaction fails and needs to be rolled back. Consequently, any transaction that has
read the uncommitted changes of the failed transaction also needs to be rolled back,
leading to a cascade effect.
Example of Cascading Rollback
Consider two transactions \(T_1\) and \(T_2\):
| Transaction T1 | Transaction T2 |
|---------------------|---------------------|
| Write(A) | |
| | Read(A) |
| | Write(B) |
| Abort(some failure) | |
| Rollback | |
|                     | Rollback (because it read uncommitted changes from T1) |
Here, \(T_2\) reads an uncommitted value of A written by \(T_1\). When \(T_1\) fails
and is rolled back, \(T_2\) also has to be rolled back, leading to a cascading rollback.
This is undesirable because it wastes computational effort and can complicate
recovery procedures.
4. Cascadeless Schedules
A schedule is considered cascadeless if transactions only read committed values. This
means, in such a schedule, a transaction can read a value written by another
transaction only after the latter has committed. Cascadeless schedules prevent
cascading rollbacks.
Example of Cascadeless Schedule
Consider two transactions \(T_1\) and \(T_2\):
| Transaction T1 | Transaction T2 |
|---------------------|---------------------|
| Write(A) | |
| Commit | |
| | Read(A) |
| | Write(B) |
| | Commit |
In this schedule, \(T_2\) reads the value of A only after \(T_1\) has committed. Thus,
even if \(T_1\) were to fail before committing (not shown in this schedule), it would not
affect \(T_2\). This means there's no risk of cascading rollback in this schedule.
Isolation in DBMS
Isolation is one of the core ACID properties of a database transaction, ensuring that
the operations of one transaction remain hidden from other transactions until
completion. It means that no two transactions should interfere with each other and
affect the other's intermediate state.
Isolation Levels
Isolation levels define the degree to which a transaction must be isolated from the
data modifications made by any other transaction in the database system. There are
four levels of transaction isolation defined by SQL -
1. Serializable
• This is the highest isolation level. Concurrently executing transactions are
guaranteed to produce the same result as if they had executed one after the other.
2. Repeatable Read
• The transaction holds read locks on all rows it reads and write locks on all rows it
updates, inserts or deletes until it completes.
• Since other transactions cannot read, update or delete these rows, it avoids
non-repeatable reads.
3. Read Committed
• The transaction holds a read or write lock on the current row, and thus prevents
other transactions from reading, updating or deleting it.
• Thus it does not allow dirty reads (i.e. one transaction reading data immediately
after it was written by another, not yet committed, transaction).
4. Read Uncommitted
• In this level, one transaction may read not yet committed changes made by
another transaction.
| Isolation Level  | Dirty Read |
|------------------|------------|
| Serializable     | No         |
| Repeatable Read  | No         |
| Read Committed   | No         |
| Read Uncommitted | Yes        |
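In application code, the isolation level is usually requested with the standard SQL statement SET TRANSACTION ISOLATION LEVEL. The snippet below is a hedged illustration using a generic Python DB-API connection; the exact syntax, the supported levels, and the default level vary between database systems.

```python
# Illustrative only: the SET TRANSACTION statement below is standard SQL,
# but the exact wording and the levels actually supported vary by DBMS.
# `conn` is assumed to be any Python DB-API 2.0 connection object.
def run_read_committed(conn, queries):
    cur = conn.cursor()
    cur.execute("SET TRANSACTION ISOLATION LEVEL READ COMMITTED")
    try:
        for q in queries:
            cur.execute(q)
        conn.commit()          # make the changes durable
    except Exception:
        conn.rollback()        # undo everything on failure (atomicity)
        raise
```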
Implementation of Isolation
Implementing isolation typically involves concurrency control mechanisms. Here are
common mechanisms used:
1. Locking Mechanisms
Locking ensures exclusive access to a data item for a transaction. This means that
while one transaction holds a lock on a data item, no other transaction can access that
item.
• Shared Lock (S-lock): Allows a transaction to read an item but not write to it.
• Exclusive Lock (X-lock): Allows a transaction to read and write an item. No other
transaction can read or write until the lock is released.
• Two-phase Locking (2PL): This protocol ensures that a transaction acquires all
the locks before it releases any. This results in a growing phase (acquiring locks
and not releasing any) and a shrinking phase (releasing locks and not acquiring
any).
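A minimal sketch of a lock table enforcing the shared/exclusive compatibility rules above is shown below. The class and method names are illustrative; real lock managers also queue waiting requests, support lock upgrades, and detect deadlocks.

```python
import threading
from collections import defaultdict

# A toy lock table supporting shared (S) and exclusive (X) locks.
class LockTable:
    def __init__(self):
        self._mutex = threading.Lock()
        self._shared = defaultdict(set)   # item -> set of txns holding S-locks
        self._exclusive = {}              # item -> txn holding the X-lock

    def acquire_shared(self, txn, item):
        with self._mutex:
            if item in self._exclusive and self._exclusive[item] != txn:
                return False              # an X-lock by another txn blocks us
            self._shared[item].add(txn)
            return True

    def acquire_exclusive(self, txn, item):
        with self._mutex:
            others_share = self._shared[item] - {txn}
            if others_share or self._exclusive.get(item, txn) != txn:
                return False              # any other lock holder blocks an X-lock
            self._exclusive[item] = txn
            return True

    def release(self, txn, item):
        with self._mutex:
            self._shared[item].discard(txn)
            if self._exclusive.get(item) == txn:
                del self._exclusive[item]

locks = LockTable()
print(locks.acquire_shared("T1", "A"))      # True
print(locks.acquire_shared("T2", "A"))      # True  (S-locks are compatible)
print(locks.acquire_exclusive("T2", "A"))   # False (T1 still holds an S-lock)
locks.release("T1", "A")
print(locks.acquire_exclusive("T2", "A"))   # True
```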
2. Timestamp-based Protocols
Every transaction is assigned a unique timestamp when it starts. This timestamp
determines the order of transactions. Transactions can only access the database if
they respect the timestamp order, ensuring older transactions get priority.
Testing of Serializability
Serialization Graph is used to test the Serializability of a schedule.
Assume a schedule S. For S, we construct a graph known as a precedence graph. This
graph is a pair G = (V, E), where V is a set of vertices and E is a set of edges. The set
of vertices contains all the transactions participating in the schedule. The set of edges
contains an edge Ti -> Tj for which one of the following three conditions holds:
1. Ti executes Write(Q) before Tj executes Read(Q).
2. Ti executes Read(Q) before Tj executes Write(Q).
3. Ti executes Write(Q) before Tj executes Write(Q).
o If a precedence graph contains a single edge Ti → Tj, then all the instructions of Ti are
executed before the first instruction of Tj is executed.
o If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the
precedence graph has no cycle, then S is known as serializable.
For example:
The precedence graph for schedule S1 (figure not shown) contains a cycle, which is
why schedule S1 is non-serializable.
Locks are important in a DBMS for the following reasons:
1. Consistency: Without locks, multiple transactions could modify the same data
item simultaneously, resulting in an inconsistent state.
2. Isolation: Locks ensure that the operations of one transaction are isolated from
other transactions, i.e., they are invisible to other transactions until the transaction
is committed.
3. Concurrency: While ensuring consistency and isolation, locks also allow multiple
transactions to be processed simultaneously by the system, optimizing system
throughput and overall performance.
4. Avoiding Conflicts: Locks help in avoiding data conflicts that might arise due to
simultaneous read and write operations by different transactions on the same
data item.
5. Preventing Dirty Reads: With the help of locks, a transaction is prevented from
reading data that hasn't yet been committed by another transaction.
Lock-Based Protocols
1. Simple Lock Based Protocol
The simple lock-based protocol is a mechanism in which a transaction must acquire a
lock on a data item before reading or writing it; the lock gives it shared or exclusive
use of that item while the operation is in progress.
Types of Locks: There are two types of locks used -
• Shared Lock (Lock-S): used when a transaction wants to perform a read operation.
Any number of transactions can hold a shared lock on the same item at the same
time.
• Exclusive Lock (Lock-X): used when a transaction wants to perform a write
operation. An exclusive lock can be held by only one transaction at a time, and no
other transaction may hold any lock on that item while it is held.
For example:
T1        | T2
----------|----------
Lock-S(A) |
Read(A)   |
Unlock(A) | Lock-X(A)
          | Read(A)
          | Write(A)
          | Unlock(A)
2. Two-Phase Locking Protocol (2PL)
In two-phase locking, each transaction acquires and releases its locks in two phases:
• Growing Phase: The transaction may obtain any number of locks but cannot
release any.
• Shrinking Phase: The transaction may release but cannot obtain any new locks.
The point where the transaction releases its first lock is the end of the growing phase
and the beginning of the shrinking phase.
This protocol ensures conflict-serializability and avoids many of the concurrency
issues, but it doesn't guarantee the absence of deadlocks.
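The two-phase rule itself can be sketched as a thin wrapper that refuses new lock requests once the transaction has released any lock. This builds on the illustrative LockTable from the earlier sketch and is not a complete protocol implementation.

```python
# A sketch of the two-phase rule only: once a transaction has released any
# lock (shrinking phase), it may not acquire new ones. `LockTable` is the toy
# lock table sketched earlier.
class TwoPhaseTransaction:
    def __init__(self, txn_id, lock_table):
        self.txn_id = txn_id
        self.locks = lock_table
        self.shrinking = False
        self.held = set()

    def lock_shared(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        if self.locks.acquire_shared(self.txn_id, item):
            self.held.add(item)
            return True
        return False

    def lock_exclusive(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        if self.locks.acquire_exclusive(self.txn_id, item):
            self.held.add(item)
            return True
        return False

    def unlock(self, item):
        self.shrinking = True             # first release ends the growing phase
        self.locks.release(self.txn_id, item)
        self.held.discard(item)
```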
For example, the following schedule obeys the two-phase rule (each transaction
acquires all of its locks before releasing any):
T1        | T2
----------|----------
Lock-S(A) |
|
| Lock-S(A)
Lock-X(B) |
Unlock(A) |
| Lock-X(C)
Unlock(B) |
| Unlock(A)
| Unlock(C)
Disadvantages of Two-Phase Locking:
• Deadlocks: The main disadvantage of 2PL is that it can lead to deadlocks, where
two or more transactions wait indefinitely for a resource locked by the other.
• Reduced Concurrency (in certain cases): Locking can block transactions, which
can reduce concurrency. For example, if one transaction holds a lock for a long
time, other transactions needing that lock will be blocked.
• Starvation: It's possible for some transactions to get repeatedly delayed if other
transactions are continually requesting and acquiring locks.
Variants such as Strict and Rigorous 2PL hold locks until the transaction commits or
aborts. Their drawbacks:
• Reduced Concurrency: Since transactions hold onto locks until they commit or
abort, other transactions might be blocked for longer periods.
• Potential for Deadlocks: Holding onto locks can increase the chances of
deadlocks.
Conservative (Static) Two-Phase Locking, in contrast, requires a transaction to obtain
all the locks it needs before it begins executing.
• This approach can avoid deadlocks since transactions only start when all their
required locks are available.
• Avoids Deadlocks: By ensuring that all required locks are acquired before a
transaction starts, it inherently avoids the possibility of deadlocks.
• Reduced Concurrency: Transactions might be delayed until all required locks are
available, leading to potential inefficiencies.
Timestamp-Based Protocols in DBMS
In timestamp ordering, every transaction \( T \) is assigned a unique timestamp
\( TS(T) \) when it starts, and every read and write must respect the timestamp order:
• Read Rule: If a transaction \( T \) wants to read an item that was last written by
transaction \( T' \) with \( TS(T') > TS(T) \), the read is rejected because it's trying to
read a value from the future. Otherwise, it can read the item.
• Write Rule: If a transaction \( T \) wants to write an item that has been read or
written by a transaction \( T' \) with \( TS(T') > TS(T) \), the write is rejected. This
avoids overwriting a value that has been read or written by a younger transaction.
3. Handling Violations: When a transaction's read or write operation violates the
rules, the transaction can be rolled back and restarted or aborted, depending on the
specific protocol in use.
Example-2
Suppose two transactions \( T1 \) and \( T2 \) with timestamps 5 and 10 respectively:
According to the read rule, \( T1 \) can't read item \( A \) after \( T2 \) has written it, as
\( TS(T2) > TS(T1) \). So, \( T1 \)'s read operation will be rejected.
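A compact sketch of basic timestamp ordering is shown below; `read_ts` and `write_ts` play the role of the per-item read and write timestamps, and the names are illustrative. The optional `thomas_rule` flag shows where Thomas' write rule (discussed below) would simply skip an obsolete write instead of rejecting it.

```python
# A sketch of basic timestamp ordering. A real implementation would also
# restart a rejected transaction with a new timestamp.
class TimestampOrdering:
    def __init__(self):
        self.read_ts = {}     # item -> largest TS that has read it
        self.write_ts = {}    # item -> largest TS that has written it

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return False                      # would read a "future" value: reject
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item, thomas_rule=False):
        if ts < self.read_ts.get(item, 0):
            return False                      # a younger txn already read the item
        if ts < self.write_ts.get(item, 0):
            # Thomas' write rule: silently skip the obsolete write instead of
            # rejecting the transaction (only when the flag is enabled).
            return thomas_rule
        self.write_ts[item] = ts
        return True

to = TimestampOrdering()
to.write(10, "A")           # T2 (TS = 10) writes A
print(to.read(5, "A"))      # False: T1 (TS = 5) may not read the younger write
```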
Advantages of timestamp-based protocols include:
1. No Deadlocks: Transactions never wait for locks, so deadlocks cannot occur.
2. Fairness: Older transactions have priority over newer ones, ensuring that
transactions do not starve and get executed in a fair manner.
One of the most well-known refinements is the Thomas Write Rule, which modifies the
write rule to allow certain writes that would be rejected under the basic timestamp
protocol. The idea is to ignore an obsolete write that would have no effect on the final
outcome, rather than rolling back the transaction. This reduces the number of
rollbacks and permits some schedules that are view serializable but not conflict
serializable.
In practice, timestamp-based protocols offer an alternative approach to concurrency
control, especially useful in systems where locking leads to frequent deadlocks or
reduced concurrency. However, careful implementation and tuning are required to
handle potential issues like transaction starvation or cascading rollbacks.
Validation Based Protocols in DBMS
Validation-based protocols, also known as Optimistic Concurrency Control (OCC), are
a set of techniques that aim to increase system concurrency and performance by
assuming that conflicts between transactions will be rare. Unlike other concurrency
control methods, which try to prevent conflicts proactively using locks or timestamps,
OCC checks for conflicts only at transaction commit time.
Here’s how a typical validation-based protocol operates:
1. Read Phase
o The transaction reads values from the database and performs all its updates on
local (temporary) copies; nothing is written to the database itself during this phase.
2. Validation Phase
o Before committing, the system checks to ensure that this transaction's local
updates won't cause conflicts with other transactions.
o The validation can take many forms, depending on the specific protocol.
3. Write Phase
o If the transaction passes validation, its updates are applied to the database;
otherwise, the transaction is aborted and restarted.
For example, suppose \( T1 \) and \( T2 \) execute concurrently, \( T2 \) reads item
\( A \), and \( T1 \) writes \( A \) and commits first:
• \( T2 \) reaches the validation phase. The system realizes that since \( T2 \) read
item \( A \), another transaction (i.e., \( T1 \)) has written to \( A \). Therefore, \( T2
\) fails validation.
• High Concurrency: Since transactions don’t acquire locks, more than one
transaction can process the same data item concurrently.
Validation-based protocols work best in scenarios where conflicts are rare. If the
system anticipates many conflicts, then the high rate of transaction restarts might
offset the advantages of increased concurrency.
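The three phases can be sketched as follows. This is an illustrative simplification (a single global counter acts as the logical clock, and validation only compares the read set against the write sets of transactions that committed after the validating transaction started), not the full OCC algorithm.

```python
# A simplified validation test in the spirit of the three phases above.
class OptimisticTxn:
    counter = 0                                # global logical clock

    def __init__(self):
        self.start = OptimisticTxn.counter
        self.read_set = set()
        self.write_set = {}                    # item -> locally buffered value

    def read(self, db, item):
        self.read_set.add(item)
        return self.write_set.get(item, db.get(item))

    def write(self, item, value):
        self.write_set[item] = value           # buffered, not applied yet

    def try_commit(self, db, committed_log):
        # Validation phase: look at transactions that committed after we started.
        for commit_time, write_set in committed_log:
            if commit_time > self.start and write_set & self.read_set:
                return False                   # conflict -> caller restarts the txn
        # Write phase: apply buffered updates and record our commit.
        db.update(self.write_set)
        OptimisticTxn.counter += 1
        committed_log.append((OptimisticTxn.counter, set(self.write_set)))
        return True

db, log = {"A": 100}, []
t1, t2 = OptimisticTxn(), OptimisticTxn()
print(t2.read(db, "A"))        # 100
t1.write("A", 150)
print(t1.try_commit(db, log))  # True
print(t2.try_commit(db, log))  # False: T1 wrote A after T2 read it
```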
In the context of database systems, "granularity" refers to the size or extent of a data
item that can be locked by a transaction. The idea behind multiple granularity is to
provide a hierarchical structure that allows locks at various levels, ranging from
coarse-grained (like an entire table) to fine-grained (like a single row or even a single
field). This hierarchy offers flexibility in achieving the right balance between
concurrency and data integrity.
The concept of multiple granularity can be visualized as a tree. Consider a database
system where the entire database is the root, the database contains several files
(tables), each file contains several records (rows), and each record contains several
fields (columns). A transaction may lock any node in this tree; locking a node implicitly
locks all of its descendants.
In addition to shared (S) and exclusive (X) locks, intention modes are used on the
higher levels: intention-shared (IS), intention-exclusive (IX), and shared with
intention-exclusive (SIX); NL denotes no lock. The compatibility between these modes
is shown below:
| | NL | IS | IX | S | SIX | X |
|:-----:|:-----:|:------:|:-----:|:------:|:-----:|:------:|
| NL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| IS | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ |
| IX | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ |
| S | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ |
| SIX | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| X | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
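The matrix above translates directly into a lookup table: a lock request on a node is granted only if the requested mode is compatible with every mode already held on that node by other transactions. A small illustrative sketch:

```python
# The compatibility matrix above, encoded as a lookup table.
COMPAT = {
    'NL':  {'NL', 'IS', 'IX', 'S', 'SIX', 'X'},
    'IS':  {'NL', 'IS', 'IX', 'S', 'SIX'},
    'IX':  {'NL', 'IS', 'IX'},
    'S':   {'NL', 'IS', 'S'},
    'SIX': {'NL', 'IS'},
    'X':   {'NL'},
}

def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode in `held_modes`."""
    return all(h in COMPAT[requested] for h in held_modes)

print(can_grant('IS', {'IX'}))   # True  (intention locks coexist)
print(can_grant('S',  {'IX'}))   # False (S conflicts with IX)
print(can_grant('X',  set()))    # True  (nothing is held)
```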
Recovery in DBMS
• Database recovery means recovering the data when it gets deleted, hacked or
damaged accidentally.
• It should check the states of all the transactions, which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
• Maintaining the logs of each transaction, and writing them onto some stable
storage before actually modifying the database.
• Maintaining shadow paging, where the changes are made in volatile memory and
the actual database is updated later.
Log-Based Recovery
Log-based recovery is a widely used approach in database management systems to
recover from system failures and maintain atomicity and durability of transactions. The
fundamental idea behind log-based recovery is to keep a log of all changes made to
the database, so that after a failure, the system can use the log to restore the database
to a consistent state.
1. Transaction Logging:
For every transaction that modifies the database, an entry is made in the log. This
entry typically includes:
• Transaction identifier: The unique ID of the transaction performing the write.
• Data item identifier: The data item that was modified.
• OLD value: The value of the data item before the modification.
• NEW value: The value of the data item after the modification.
We represent an update log record as <\(T_i\) , \(X_j\) , \(V_1\), \(V_2\)>, indicating
that transaction \(T_i\) has performed a write on data item \(X_j\). \(X_j\) had value
\(V_1\) before the write, and has value \(V_2\) after the write. Other special log records
exist to record significant events during transaction processing, such as the start of a
transaction and the commit or abort of a transaction. Among the types of log records
are:
• <\(T_i\) start>: transaction \(T_i\) has started.
• <\(T_i\) commit>: transaction \(T_i\) has committed.
• <\(T_i\) abort>: transaction \(T_i\) has aborted.
3. Checkpointing
Periodically, the DBMS might decide to take a checkpoint. A checkpoint is a point of
synchronization between the database and its log. At the time of a checkpoint:
• All the changes in main memory (buffer) up to that point are written to disk.
• A special entry is made in the log indicating a checkpoint. This helps in reducing
the amount of log that needs to be scanned during recovery.
4. Recovery Process
After a failure, the log is scanned to decide, for each transaction, whether its effects
must be redone or undone:
• Redo: If a transaction is identified (from the log) as having committed but its
changes have not been reflected in the database (due to a crash before the
changes could be written to disk), then the changes are reapplied using the 'After
Image' (NEW value) from the log.
• Undo: If a transaction has a start record in the log but no commit or abort record,
its changes are rolled back by restoring the 'Before Image' (OLD value) from the
log.
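A stripped-down redo/undo pass over such a log might look as follows. This is an illustrative sketch (it ignores checkpoints and keeps the whole log in memory), not the recovery algorithm of any particular DBMS.

```python
# A minimal sketch of redo/undo using <T, X, old, new> update records and
# <T, "START">, <T, "COMMIT"> event records, following the format above.
def recover(log, db):
    committed = {t for t, *rest in log if rest and rest[0] == "COMMIT"}
    started   = {t for t, *rest in log if rest and rest[0] == "START"}

    # Redo phase: forward pass, reapply new values of committed transactions.
    for record in log:
        if len(record) == 4 and record[0] in committed:
            t, item, old, new = record
            db[item] = new

    # Undo phase: backward pass, restore old values of unfinished transactions.
    for record in reversed(log):
        if len(record) == 4 and record[0] in (started - committed):
            t, item, old, new = record
            db[item] = old
    return db

log = [("T1", "START"), ("T1", "A", 1000, 800), ("T1", "COMMIT"),
       ("T2", "START"), ("T2", "B", 2000, 2200)]   # T2 never committed
print(recover(log, {"A": 1000, "B": 2200}))         # {'A': 800, 'B': 2000}
```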
Shadow Paging
Initialization
When the transaction begins, the database system creates a copy of the current page
table. This copy is called the shadow page table.
The actual data pages on disk are not duplicated; only the page table entries are. This
means both the current and shadow page tables point to the same data pages initially.
2. Transaction rollback
3. Checkpoints
4. Restart recovery
Checkpoints
Checkpointing is a technique where the DBMS periodically takes a snapshot of the
current state of the database and writes all changes to the disk. This reduces the
amount of work during recovery.
By creating checkpoints, recovery operations don't need to start from the very
beginning. Instead, they can begin from the most recent checkpoint, thereby
considerably speeding up the recovery process.
During the checkpoint, ongoing transactions might be temporarily suspended, or their
logs might be force-written to the stable storage, depending on the implementation.
Restart Recovery
In the case of a system crash, the recovery manager uses the logs to restore the
database to the most recent consistent state. This process is called restart recovery.
Initially, an analysis phase determines the state of all transactions at the time of the
crash. Committed transactions and those that had not started are ignored.
The redo phase then ensures that all logged updates of committed transactions are
reflected in the database. Lastly, the undo phase rolls back the transactions that were
active during the crash to ensure atomicity.
The presence of checkpoints can make the restart recovery process faster since the
system can start the redo phase from the most recent checkpoint rather than from the
beginning of the log.