Unit – 5
Crash recovery and Concurrency Control
Failure classification: To identify where a problem has occurred, failures are generally
classified into the following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point
from which it cannot proceed any further. If only a few transactions or processes are
affected, it is called a transaction failure.
Reasons for a transaction failure could be -
1. Logical errors: A logical error occurs when a transaction cannot complete
because of a code error or an internal error condition.
2. System errors: These occur when the DBMS itself terminates an active
transaction because the database system is unable to execute it. For
example, the system aborts an active transaction in case of deadlock or
resource unavailability.
2. System Crash
o System failure can occur due to power failure or other hardware or
software failure. Example: Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage is
assumed not to be corrupted.
3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a common
problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, a disk head crash,
unreachability of the disk, or any other failure that destroys all or part of the
disk storage.
Different type of Recovery techniques and their comparative
analysis:
Database recovery techniques are used in database management systems (DBMS) to
restore a database to a consistent state after a failure or error has occurred. The main
goal of recovery techniques is to ensure data integrity and consistency and prevent
data loss. There are mainly two types of recovery techniques used in DBMS:
Rollback/Undo Recovery Technique: The rollback/undo recovery technique is based
on the principle of backing out or undoing the effects of a transaction that has not
completed successfully due to a system failure or error. This technique is
accomplished by undoing the changes made by the transaction using the log records
stored in the transaction log. The transaction log contains a record of all the
transactions that have been performed on the database. The system uses the log
records to undo the changes made by the failed transaction and restore the database
to its previous state.
Commit/Redo Recovery Technique: The commit/redo recovery technique is based on
the principle of reapplying the changes made by a transaction that has been
completed successfully to the database. This technique is accomplished by using the
log records stored in the transaction log to redo the changes made by the
transaction that was in progress at the time of the failure or error. The system uses
the log records to reapply the changes made by the transaction and restore the
database to its most recent consistent state.
In addition to these two techniques, there is also a third technique called checkpoint
recovery. Checkpoint recovery is a technique used to reduce the recovery time by
periodically saving the state of the database in a checkpoint file. In the event of a
failure, the system can use the checkpoint file to restore the database to the most
recent consistent state before the failure occurred, rather than going through the
entire log to recover the database.
Overall, recovery techniques are essential to ensure data consistency and availability
in DBMS, and each technique has its own advantages and limitations that must be
considered in the design of a recovery system
Database systems, like any other computer system, are subject to failures but the
data stored in them must be available as and when required. When a database fails it
must possess facilities for fast recovery. It must also preserve atomicity: either a
transaction completes successfully and is committed (its effect is recorded
permanently in the database) or it has no effect on the database at all. There are
both automatic and non-automatic methods for both backing up of
data and recovery from any failure situations. The techniques used to recover the lost
data due to system crashes, transaction errors, viruses, catastrophic failure, incorrect
commands execution, etc. are database recovery techniques. So to prevent data loss
recovery techniques based on deferred update and immediate update or backing up
data can be used. Recovery techniques are heavily dependent upon the existence of
a special file known as a system log. It contains information about the start and end
of each transaction and any updates which occur during the transaction. The log
keeps track of all transaction operations that affect the values of database items. This
information is needed to recover from transaction failure.
The log is kept on disk. The main types of log entries are:
start_transaction(T): This log entry records that transaction T starts its execution.
read_item(T, X): This log entry records that transaction T reads the value of database
item X.
write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of the database item X from old_value to new_value. The old value
is sometimes known as a before an image of X, and the new value is known as an
afterimage of X.
commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
abort(T): This records that transaction T has been aborted.
checkpoint: Checkpoint is a mechanism where all the previous logs are removed
from the system and stored permanently in a storage disk. Checkpoint declares a
point before which the DBMS was in a consistent state, and all the transactions were
committed.
A transaction T reaches its commit point when all its operations that access the
database have been executed successfully i.e. the transaction has reached the point
at which it will not abort (terminate without completing). Once committed, the
transaction is permanently recorded in the database. Commitment always involves
writing a commit entry to the log and writing the log to disk. At the time of a system
crash, the log is searched backwards for all transactions T that have written a
start_transaction(T) entry into the log but have not yet written a commit(T) entry;
these transactions may have to be rolled back to undo their effect on the database
during the recovery process.
Undoing – If a transaction crashes, then the recovery manager may undo
transactions i.e. reverse the operations of a transaction. This involves examining the
log for each write_item(T, X, old_value, new_value) entry of the transaction and setting
the value of item X in the database back to old_value. There are two major techniques for recovery
from non-catastrophic transaction failures: deferred updates and immediate updates.
Deferred update – This technique does not physically update the database on disk
until a transaction has reached its commit point. Before reaching commit, all
transaction updates are recorded in the local transaction workspace. If a transaction
fails before reaching its commit point, it will not have changed the database in any
way so UNDO is not needed. It may be necessary to REDO the effect of the
operations that are recorded in the local transaction workspace, because their effect
may not yet have been written in the database. Hence, a deferred update is also
known as the No-undo/redo algorithm
Immediate update – In the immediate update, the database may be updated by
some operations of a transaction before the transaction reaches its commit point.
However, these operations are recorded in a log on disk before they are applied to
the database, making recovery still possible. If a transaction fails to reach its commit
point, the effect of its operation must be undone i.e. the transaction must be rolled
back hence we require both undo and redo. This technique is known as undo/redo
algorithm.
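To make the difference concrete, here is a minimal Python sketch (illustrative, not a real recovery manager) of how recovery from a write-ahead log might proceed under the two approaches:

# Illustrative sketch: recovery from a write-ahead log.
# Each write record is (op, txn, item, old_value, new_value).
log = [
    ('start', 'T1'), ('write', 'T1', 'X', 100, 150), ('commit', 'T1'),
    ('start', 'T2'), ('write', 'T2', 'Y', 20, 80),            # T2 never committed
]
db = {'X': 100, 'Y': 80}          # Y already reflects T2's uncommitted write

committed = {r[1] for r in log if r[0] == 'commit'}

# Immediate update (UNDO/REDO): undo uncommitted writes backwards,
# then redo committed writes forwards.
for r in reversed(log):
    if r[0] == 'write' and r[1] not in committed:
        db[r[2]] = r[3]           # restore old_value
for r in log:
    if r[0] == 'write' and r[1] in committed:
        db[r[2]] = r[4]           # reapply new_value

# Deferred update (NO-UNDO/REDO) would skip the undo pass entirely,
# because uncommitted transactions never touched the database on disk.
print(db)                          # {'X': 150, 'Y': 20}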
Caching/Buffering – In this one or more disk pages that include data items to be
updated are cached into main memory buffers and then updated in memory before
being written back to disk. A collection of in-memory buffers called the DBMS cache
is kept under the control of DBMS for holding these buffers. A directory is used to
keep track of which database items are in the buffer. A dirty bit is associated with
each buffer, which is 0 if the buffer is not modified else 1 if modified.
Shadow paging – It provides atomicity and durability. A directory with n entries is
constructed, where the ith entry points to the ith database page on disk. When a
transaction begins executing, the current directory is copied into a shadow directory.
When a page is to be modified, a new page is allocated, the changes are made in it,
and when the transaction is ready to become durable, all directory entries that refer
to the original pages are updated to refer to the new replacement pages.
Backward Recovery – The terms "Rollback" and "UNDO" also refer to backward
recovery. When a backup of the data is not available and previous modifications
need to be undone, this technique is helpful. With the backward recovery method,
unwanted modifications are removed and the database is returned to its prior
state. All changes made during the failed transaction are reversed during backward
recovery. In other words, it undoes the erroneous database updates.
Forward Recovery – "Roll forward" and "REDO" refer to forward recovery. This
technique is helpful when a database needs to be brought up to date by reapplying
all verified changes.
The changes of committed transactions are reapplied to the database to roll those
modifications forward. In other words, the database is restored from preserved
(backup) data and the committed transactions recorded since the last save are reapplied.
Some of the backup techniques are as follows:
Full database backup – In this method the full database, including the data and the
metadata needed to restore the whole database (such as full-text catalogs), is
backed up at predefined intervals.
Differential backup – It stores only the data changes that have occurred since the last
full database backup. When some data has changed many times since last full
database backup, a differential backup stores the most recent version of the changed
data. For this first, we need to restore a full database backup.
Transaction log backup – In this, all events that have occurred in the database, like a
record of every single statement executed is backed up. It is the backup of
transaction log entries and contains all transactions that had happened to the
database. Through this, the database can be recovered to a specific point in time. It is
even possible to perform a backup from a transaction log if the data files are
destroyed and not even a single committed transaction is lost.
1. Deferred Update: It is a technique for the maintenance of the transaction
log files of the DBMS. It is also called NO-UNDO/REDO technique. It is used
for the recovery of transaction failures that occur due to power, memory, or
OS failures. Whenever any transaction is executed, the updates are not made
immediately to the database. They are first recorded on the log file and then
those changes are applied once the commit is done. This is called the “Re-
doing” process. Once the rollback is done none of the changes are applied to
the database and the changes in the log file are also discarded. If the commit
is done before crashing the system, then after restarting the system the
changes that have been recorded in the log file are thus applied to the
database.
2. Immediate Update: It is a technique for the maintenance of the
transaction log files of the DBMS. It is also called UNDO/REDO technique. It is
used for the recovery of transaction failures that occur due to power, memory,
or OS failures. Whenever any transaction is executed, the updates are made
directly to the database and the log file is also maintained which contains
both old and new values. Once the commit is done, all the changes get stored
permanently in the database, and records in the log file are thus discarded.
Once rollback is done the old values get restored in the database and all the
changes made to the database are also discarded. This is called the “Un-
doing” process. If the commit is done before crashing the system, then after
restarting the system the changes are stored permanently in the database.
Difference between Deferred update and Immediate update:
1. In a deferred update, the changes are not applied immediately to the database; in
an immediate update, the changes are applied directly to the database.
2. In a deferred update, the log file contains all the changes that are to be applied to
the database; in an immediate update, the log file contains both the old and the new
values.
3. In a deferred update, once a rollback is done all the records of the log file are
discarded and no changes are applied to the database; in an immediate update, once
a rollback is done the old values are restored to the database using the records of
the log file.
4. The concepts of buffering and caching are used in the deferred update method;
the concept of shadow paging is used in the immediate update method.
5. The major disadvantage of the deferred update method is that it requires a lot of
time for recovery in case of a system failure; the major disadvantage of the
immediate update method is that there are frequent I/O operations while the
transaction is active.
6. In the deferred update method of recovery, the changes made by a transaction
are first recorded in the log file and then applied to the database on commit, and
the recorded changes are discarded on rollback; in the immediate update method,
the database is updated directly after the changes made by the transaction while
the log file keeps the old and new values, and in case of rollback these records are
used to restore the old values.
Shadow paging:
Shadow paging is one of the techniques that is used to recover from failure. We all
know that recovery means to get back the information, which is lost. It helps to
maintain database consistency in case of failure.
Concept of shadow paging
Now let us see the concept of shadow paging step by step −
• Step 1 − Page is a segment of memory. Page table is an index of pages.
Each table entry points to a page on the disk.
• Step 2 − Two page tables are used during the life of a transaction: the
current page table and the shadow page table. Shadow page table is a
copy of the current page table.
• Step 3 − When a transaction starts, both the tables look identical, the
current table is updated for each write operation.
• Step 4 − The shadow page is never changed during the life of the
transaction.
• Step 5 − When the transaction commits, the current page table becomes the
new persistent (shadow) page table, and the disk blocks holding the old data
are released.
• Step 6 − The shadow page table is stored in non-volatile memory. If the
system crash occurs, then the shadow page table is copied to the
current page table.
The shadow paging scheme is usually illustrated with a diagram showing the shadow
and current page tables pointing to pages on disk (diagram not reproduced here).
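The following is a small Python sketch of the idea (illustrative only; real shadow paging operates on fixed-size disk pages, and the names used here are made up):

# 'disk' holds pages; a page table maps logical page numbers to physical slots.
disk = {0: 'page-A v1', 1: 'page-B v1'}
shadow_table = {0: 0, 1: 1}          # persistent copy, never touched by the transaction

def run_transaction(writes):
    current_table = dict(shadow_table)      # steps 2-3: copy shadow -> current
    next_slot = max(disk) + 1
    for page_no, new_content in writes:
        disk[next_slot] = new_content       # write the new version to a fresh slot
        current_table[page_no] = next_slot  # only the current table is updated
        next_slot += 1
    return current_table

current = run_transaction([(1, 'page-B v2')])

# Commit: atomically make the current table the new shadow table (step 5).
shadow_table = current
# On a crash before this point, the old shadow_table is still intact (step 6).
print(shadow_table, disk)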
Advantages
The advantages of shadow paging are as follows −
• No need for log records.
• No undo/ Redo algorithm.
• Recovery is faster.
Disadvantages
The disadvantages of shadow paging are as follows −
• Data is fragmented or scattered.
• Garbage collection problem. Database pages containing old versions of
modified data need to be garbage collected after every transaction.
• Concurrent transactions are difficult to execute.
Checkpoint
o The checkpoint is a type of mechanism where all the previous logs are removed from
the system and permanently stored in the storage disk.
o The checkpoint is like a bookmark. During the execution of transactions, such
checkpoints are marked, and as each transaction executes, log records are created
for its steps.
o When execution reaches a checkpoint, the updates of the transactions logged so
far are written to the database, and the log entries up to that point can be removed.
The log file is then filled with the steps of subsequent transactions until the next
checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in the consistent
state, and all transactions were committed.
Recovery using Checkpoint
A recovery system recovers the database from a failure in the following manner.
Consider four transactions T1, T2, T3 and T4, with a checkpoint taken while T1 is still
active and a crash occurring before T4 completes:
o The recovery system reads the log files from the end towards the start, i.e. from
T4 back to T1.
o Recovery system maintains two lists, a redo-list, and an undo-list.
o A transaction is put into the redo-list if the recovery system sees a log with both
<Tn, Start> and <Tn, Commit>, or with just <Tn, Commit> (meaning its start record
lies before the checkpoint). The transactions in the redo-list are redone using their
log records.
o For example: in the log file, transactions T2 and T3 have both <Tn, Start> and
<Tn, Commit>. Transaction T1 has only <Tn, Commit>, because it started before the
checkpoint and committed after the checkpoint was crossed. Hence T1, T2 and T3
are put into the redo-list.
o A transaction is put into the undo-list if the recovery system sees a log with
<Tn, Start> but no commit or abort record. The transactions in the undo-list are
undone and their logs are removed.
o For example: transaction T4 has only <Tn, Start>, so T4 is put into the undo-list,
since this transaction is not yet complete and failed in the middle.
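A minimal Python sketch of this redo-list/undo-list construction, mirroring the T1..T4 example above (illustrative only; a real recovery manager would also replay the actual log records):

log = [
    ('start', 'T1'),
    ('checkpoint',),                  # T1 is still active when the checkpoint is taken
    ('start', 'T2'), ('commit', 'T2'),
    ('start', 'T3'), ('commit', 'T3'),
    ('commit', 'T1'),
    ('start', 'T4'),                  # T4 never commits before the crash
]

redo_list, undo_list = [], []
for record in log:
    if record[0] == 'commit':
        redo_list.append(record[1])
    elif record[0] == 'start':
        undo_list.append(record[1])

# A transaction that started but never committed stays only in the undo list.
undo_list = [t for t in undo_list if t not in redo_list]
print('redo:', redo_list)            # ['T2', 'T3', 'T1']
print('undo:', undo_list)            # ['T4']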
On-line backup during database updates:
It is imperative to have a backup of the database in case the original is corrupted or
lost because of any reason. Using this backup, the database can be recovered as it
was before the failure.
Database backup basically means that a duplicate of the database information and
data is created and stored in backup server just to be on the safe side. Transaction
logs are also stored in the backup along with the database data because without
them, the data would be useless.
Reasons of Failure in a Database
There can be multiple reasons of failure in a database because of which a database
backup and recovery plan is required. Some of these reasons are:
• User Error - Normally, user error is the biggest reason of data destruction
or corruption in a database. To rectify the error, the database needs to be
restored to the point in time before the error occurred.
• Hardware Failure - This can also lead to loss of data in a database. The
database is stored on multiple hard drives across various locations. These
hard drives may sometimes malfunction leading to database corruption.
So, it is important to periodically change them.
• Catastrophic Event - A catastrophic event can be a natural calamity like
a flood or earthquake or deliberate sabotage such as hacking of the
database. Either way, the database data may be corrupted and backup
may be required.
Methods of Backup
The different methods of backup in a database are:
• Full Backup - This method takes a lot of time as the full copy of the
database is made including the data and the transaction records.
• Transaction Log - Only the transaction logs are saved as the backup in
this method. To keep the backup file as small as possible, the previous
transaction log details are deleted once a new backup record is made.
• Differential Backup - This is similar to full backup in that it stores both
the data and the transaction records. However only that information is
saved in the backup that has changed since the last full backup. Because
of this, differential backup leads to smaller files.
Database Recovery
There are two methods that are primarily used for database recovery. These are:
• Log based recovery - In log based recovery, logs of all database
transactions are stored in a secure area so that in case of a system failure,
the database can recover the data. All log information, such as the time of
the transaction, its data etc. should be stored before the transaction is
executed.
• Shadow paging - In shadow paging, after the transaction is completed
its data is automatically stored for safekeeping. So, if the system crashes
in the middle of a transaction, changes made by it will not be reflected in
the database.
Concurrency Control
DBMS Concurrency Control
Concurrency Control is the management procedure that is required for controlling
concurrent execution of the operations that take place on a database.
But before knowing about concurrency control, we should know about concurrent
execution.
Concurrent Execution in DBMS
o In a multi-user system, multiple users can access and use the same database at one
time, which is known as concurrent execution of the database. It means that the
same database is accessed and operated on simultaneously by different users in a
multi-user system.
o While working on the database transactions, there occurs the requirement of using the
database by multiple users for performing different operations, and in that case,
concurrent execution of the database is performed.
o The thing is that the simultaneous execution that is performed should be done in an
interleaved manner, and no operation should affect the other executing operations,
thus maintaining the consistency of the database. Thus, on making the concurrent
execution of the transaction operations, there occur several challenging problems that
need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations are READ and WRITE. These
operations need to be managed carefully in the concurrent execution of transactions,
because if they are interleaved without proper control, the data may become
inconsistent. The following problems occur with the concurrent execution of the
operations:
Problem 1: Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform the read/write
operations on the same database items in an interleaved manner (i.e., concurrent
execution) that makes the values of the items incorrect hence making the database
inconsistent.
For example:
Consider two transactions TX and TY performed on the same account A, where the
balance of account A is $300. The interleaving proceeds as follows:
o At time t1, transaction TX reads the value of account A, i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that becomes $250 (only
deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that will be
$300 only because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400 (only
added but not updated/write).
o At time t6, transaction TX writes the value of account A that will be updated as
$250 only, as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the values of account A, so it will write
as done at time t4 that will be $400. It means the value written by TX is lost, i.e.,
$250 is lost.
Hence the data becomes incorrect, and the database is left in an inconsistent state.
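The following short Python sketch reproduces this interleaving (purely illustrative; the local variables stand in for each transaction's private workspace):

account_A = 300

# t1-t2: TX reads A and computes A - 50 in its own workspace
tx_local = account_A - 50          # 250, not yet written back

# t3-t4: TY reads A (still 300, TX has not written) and computes A + 100
ty_local = account_A + 100         # 400

# t6: TX writes its value
account_A = tx_local               # 250

# t7: TY writes its value, overwriting TX's update
account_A = ty_local               # 400 -- the $50 deduction by TX is lost

print(account_A)                   # 400 instead of the correct 350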
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the database
and then fails, and before the update is rolled back, the updated database item is read
by another transaction. This creates a write-read conflict between the two transactions.
For example:
Consider two transactions TX and TY performing read/write operations on account A,
where the available balance in account A is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction TX rolls back due to a server problem, and the value
of account A changes back to $300 (its initial value).
o But transaction TY has already read the uncommitted value $350 of account A; this
is a dirty read, and the situation is therefore known as the Dirty Read Problem.
Unrepeatable Read Problem (R-W Conflict)
Also known as the Inconsistent Retrievals Problem, it occurs when a transaction reads
two different values for the same database item.
For example:
Consider two transactions, TX and TY, performing read/write operations on account
A, which has an available balance of $300. The interleaving is as follows:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100 to the
available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account A, and that will
be read as $400.
o It means that within the same transaction TX, it reads two different values of account A,
i.e., $ 300 initially, and after updation made by transaction TY, it reads $400. It is an
unrepeatable read and is therefore known as the Unrepeatable read problem.
Thus, in order to maintain consistency in the database and avoid such problems that
take place in concurrent execution, management is needed, and that is where the
concept of Concurrency Control comes into role.
Concurrency Control
Concurrency Control is the working concept that is required for controlling and
managing the concurrent execution of database operations and thus avoiding the
inconsistencies in the database. Thus, for maintaining the concurrency of the database,
we have the concurrency control protocols.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database transactions.
Therefore, these protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
We will understand and discuss each protocol one by one in our next sections.
Locking techniques:
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be
read by the transaction.
o It can be shared between transactions, because a transaction holding a shared lock
can read but cannot update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: while it is held, no other transaction can modify the same
data item simultaneously.
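As a rough Python sketch (illustrative only; the names COMPATIBLE and can_grant are made up), a lock manager might check whether a requested lock can be granted like this:

# True means the requested lock can coexist with a lock already held by another transaction.
COMPATIBLE = {
    ('S', 'S'): True,    # many readers may share an item
    ('S', 'X'): False,   # a reader blocks a writer
    ('X', 'S'): False,   # a writer blocks readers
    ('X', 'X'): False,   # only one writer at a time
}

def can_grant(requested, held_locks):
    """held_locks: lock modes already held on the item by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

print(can_grant('S', ['S', 'S']))   # True  -- another shared lock is fine
print(can_grant('X', ['S']))        # False -- must wait for the reader to release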
There are four types of lock protocols available:
1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-based
protocols require every transaction to obtain a lock on the data before it inserts,
deletes or updates it. The data item is unlocked after the transaction completes.
2. Pre-claiming Lock Protocol
o Pre-claiming lock protocols evaluate the transaction to list all the data items on which
it needs locks.
o Before initiating execution of the transaction, it requests the DBMS for locks on
all those data items.
o If all the locks are granted, this protocol allows the transaction to begin. When the
transaction is completed, it releases all the locks.
o If all the locks are not granted, the transaction rolls back and waits until all the
locks are granted.
3. Two-phase locking (2PL)
o The two-phase locking protocol divides the execution of a transaction
into three parts.
o In the first part, when the execution of the transaction starts, it seeks permission
for the locks it requires.
o In the second part, the transaction acquires all the locks. The third phase
starts as soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only
releases the acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be acquired
by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing locks held by the transaction may be
released, but no new locks can be acquired.
In the below example, if lock conversion is allowed then the following phase can
happen:
1. Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.
2. Downgrading of lock (from X(a) to S(a)) must be done in shrinking phase.
Example:
The following summarizes how unlocking and locking work with 2-PL for two
transactions (the step numbers refer to operations of the schedule, which is not
reproduced here).
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
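A minimal Python sketch of the 2PL rule (illustrative only; the class and method names are assumptions) that refuses any lock request once the shrinking phase has begun:

class TwoPhaseTransaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False       # becomes True once the first lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire a lock in the shrinking phase")
        self.held.add(item)          # growing phase

    def unlock(self, item):
        self.shrinking = True        # lock point has been passed
        self.held.discard(item)

t1 = TwoPhaseTransaction()
t1.lock('A'); t1.lock('B')           # growing phase (lock point reached at lock('B'))
t1.unlock('A')                       # shrinking phase begins
# t1.lock('C')                       # would raise: no new locks after the first release
t1.unlock('B')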
4. Strict Two-phase locking (Strict-2PL)
o The first phase of Strict-2PL is similar to 2PL: after acquiring a lock, the
transaction continues to execute normally.
o The difference between 2PL and Strict-2PL is that Strict-2PL does not
release a lock immediately after using it.
o Strict-2PL waits until the whole transaction commits, and only then releases all
the locks at once.
o Strict-2PL therefore does not have a gradual shrinking phase of lock release.
It does not suffer from cascading aborts as 2PL does.
Timestamp Ordering Protocol
o The Timestamp Ordering Protocol is used to order the transactions based on their
Timestamps. The order of transaction is nothing but the ascending order of the
transaction creation.
o The priority of the older transaction is higher that's why it executes first. To determine
the timestamp of the transaction, this protocol uses system time or logical counter.
o The lock-based protocol is used to manage the order between conflicting pairs among
transactions at the execution time. But Timestamp based protocols start working as
soon as a transaction is created.
o Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered
the system at time 007 and transaction T2 entered the system at time 009. T1 has the
higher priority, so it executes first, as it entered the system first.
o The timestamp ordering protocol also maintains the timestamp of last 'read' and 'write'
operation on a data.
Basic Timestamp ordering protocol works as follows:
1. Check the following condition whenever a transaction Ti issues a Read
(X) operation:
o If W_TS(X) > TS(Ti), then the operation is rejected.
o If W_TS(X) <= TS(Ti), then the operation is executed.
o The read timestamp of X is updated: R_TS(X) = max(R_TS(X), TS(Ti)).
2. Check the following condition whenever a transaction Ti issues
a Write(X) operation:
o If TS(Ti) < R_TS(X), then the operation is rejected.
o If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise the
operation is executed and W_TS(X) is set to TS(Ti).
Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
Advantages and Disadvantages of TO protocol:
o The TO protocol ensures serializability, since the precedence graph contains edges
only from older transactions to younger ones and can therefore never contain a cycle.
o The TO protocol ensures freedom from deadlock, which means no transaction ever
waits.
o But the schedule may not be recoverable and may not even be cascade-free.
Multiversion Concurrency Control Techniques
It is essential to maintain data consistency and prevent concurrency issues in
database systems, especially when multiple transactions access the same data
simultaneously. Multiversion Concurrency Control (MVCC) techniques provide an
efficient and effective way to achieve this.
In this section, we will discuss Multiversion Concurrency Control (MVCC) techniques,
their various types, and their properties.
Concurrency Control Protocols
Database systems provide concurrency control to ensure isolation among
transactions. It maintains consistency of the database through consistent execution
of transactions. It also resolves conflicts arising from read-write and write-read
operations. There are various techniques used for concurrency control −
• Two-phase locking protocol
• Time-stamp ordering protocol
• Multi-version concurrency control
• Validation concurrency control
Let's discuss the two-phase locking protocol briefly. Locking is an essential
operation in the protocol that provides permission to read or write a data item. The
two-phase locking protocol is a process that enables acquiring shared resources
without creating the possibility of deadlock. The protocol involves three main
activities:
• (i) Lock Acquisition
• (ii) Modification of Data
• (iii) Release of Lock
In distributed systems, two-phase locking prevents deadlock by releasing all the
acquired resources if it's impossible to acquire all the required resources without
waiting for another process to finish using a lock. This ensures that no process holds
a shared resource while waiting for another process to release it, thus preventing
deadlock due to resource contention.
A transaction in the two-phase locking protocol can assume either the growing
phase or the shrinking phase.
• In the growing phase, a transaction can acquire locks but cannot release
any lock until it reaches the lock point, which is the point where a
transaction acquires all the necessary locks.
• In the shrinking phase, a transaction can only release locks but cannot
acquire any new locks.
Multiversion concurrency control (MVCC)
Multi-version protocol aims to reduce the delay for read operations. It maintains
multiple versions of data items. Whenever a write operation is performed, the
protocol creates a new version of the transaction data to ensure conflict-free and
successful read operations.
The newly created version contains the following information −
• Content − This field contains the data value of that version.
• Write_timestamp − This field contains the timestamp of the transaction
that created the new version.
• Read_timestamp − This field contains the timestamp of the youngest (most
recent) transaction that has read this version.
By creating multiple versions of the data, the multi-version protocol ensures that
read operations can access the appropriate version of the data without encountering
conflicts. The protocol thus enables efficient concurrency control and reduces delays
in read operations.
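The following Python sketch (illustrative only; field and function names are assumptions) shows a toy multiversion store with the three fields described above:

from dataclasses import dataclass

@dataclass
class Version:
    content: int
    write_ts: int
    read_ts: int = 0

versions = {'X': [Version(content=100, write_ts=0)]}

def read(item, ts):
    # Read the newest version written no later than the reader's timestamp.
    v = max((v for v in versions[item] if v.write_ts <= ts), key=lambda v: v.write_ts)
    v.read_ts = max(v.read_ts, ts)
    return v.content

def write(item, ts, value):
    # A write creates a new version instead of overwriting the old one.
    versions[item].append(Version(content=value, write_ts=ts))

write('X', ts=5, value=150)
print(read('X', ts=3))   # 100 -- an older reader still sees the old version
print(read('X', ts=7))   # 150 -- a newer reader sees the new version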
Optimistic concurrency control
All data items are updated at the end of the transaction; at that point, if any data item
is found to be inconsistent with respect to its value in the database, the transaction is
rolled back. Conflicts are checked only at the end of the transaction; no checking is
done while the transaction is executing. Because the checks are all made at once, the
overhead during transaction execution is low. Updates are not applied to the database
until end-transaction; they are applied to local copies in a transaction workspace.
Phases
The optimistic concurrency control has three phases, which are explained below −
Read Phase
Various data items are read and stored in temporary variables (local copies). All
operations are performed in these variables without updating the database.
Validation Phase
The data items are checked to ensure that serializability will not be violated if the
transaction's updates are applied to the database. Any conflicting change causes the
transaction to be rolled back. Transaction timestamps are used, and write-sets and
read-sets are maintained.
To check that transaction A (TransA) does not interfere with transaction B (TransB),
one of the following must hold −
• TransB completes its write phase before TransA starts its read phase.
• TransA starts its write phase after TransB completes its write phase,
and the read set of TransA has no items in common with the write set
of TransB.
• Both the read set and the write set of TransA have no items in common with
the write set of TransB, and TransB completes its read phase before TransA
completes its read phase.
Write Phase
The transaction's updates are applied to the database if the validation is successful.
Otherwise, the updates are discarded and the transaction is aborted and restarted.
Optimistic concurrency control does not use any locks and is hence deadlock-free;
however, starvation of transactions may occur.
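A simplified Python sketch of the validation step (illustrative only; it checks just the read/write-set overlap part of the conditions above and ignores the exact phase orderings):

def validate(trans_a_read_set, trans_a_write_set, committed_writes):
    """committed_writes: write sets of transactions that overlapped TransA in time."""
    for write_set_b in committed_writes:
        # Conflict if an overlapping transaction wrote something TransA read or wrote.
        if write_set_b & (trans_a_read_set | trans_a_write_set):
            return False             # roll back and restart TransA
    return True                      # apply TransA's local copies to the database

ok = validate(trans_a_read_set={'X', 'Y'},
              trans_a_write_set={'Y'},
              committed_writes=[{'Z'}])     # no overlap -> validation succeeds
print(ok)                                   # True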
Problem
S: W1(X), r2(Y), r1(Y), r2(X), with TS(T1) = 3 and TS(T2) = 4.
Check whether the timestamp ordering protocol allows schedule S.
Solution
Initially, for data item X: RTS(X) = 0, WTS(X) = 0.
Initially, for data item Y: RTS(Y) = 0, WTS(Y) = 0.
For W1(X): check TS(T1) < RTS(X) and TS(T1) < WTS(X), i.e. 3 < 0 — FALSE in both
cases, so the write operation W1(X) is performed and WTS(X) = 3.
For r2(Y): check TS(T2) < WTS(Y), i.e. 4 < 0 — FALSE, so the read operation r2(Y) is
performed and RTS(Y) = 4.
For r1(Y): check TS(T1) < WTS(Y), i.e. 3 < 0 — FALSE, so the read operation r1(Y) is
performed.
For r2(X): check TS(T2) < WTS(X), i.e. 4 < 3 — FALSE, so the read operation r2(X) is
performed and RTS(X) = 4.
Hence the timestamp ordering protocol allows schedule S.
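The same rules can be traced mechanically. The Python sketch below (illustrative only) replays schedule S from the problem above and reaches the same result:

TS = {'T1': 3, 'T2': 4}
RTS = {'X': 0, 'Y': 0}
WTS = {'X': 0, 'Y': 0}

def read(t, x):
    if TS[t] < WTS[x]:
        return f"{t} reads {x}: rejected"
    RTS[x] = max(RTS[x], TS[t])
    return f"{t} reads {x}: allowed"

def write(t, x):
    if TS[t] < RTS[x] or TS[t] < WTS[x]:
        return f"{t} writes {x}: rejected"
    WTS[x] = TS[t]
    return f"{t} writes {x}: allowed"

print(write('T1', 'X'))   # allowed, WTS(X) = 3
print(read('T2', 'Y'))    # allowed, RTS(Y) = 4
print(read('T1', 'Y'))    # allowed, RTS(Y) stays 4
print(read('T2', 'X'))    # allowed (4 >= 3), RTS(X) = 4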
Multiple Granularity
Let's start by understanding the meaning of granularity.
Granularity: It is the size of the data item that is allowed to be locked.
Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks which can be
locked.
o The Multiple Granularity protocol enhances concurrency and reduces lock overhead.
o It keeps track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item. This type of
hierarchy can be represented graphically as a tree.
For example: Consider a tree which has four levels of nodes.
o The first level or higher level shows the entire database.
o The second level represents a node of type area. The higher level database
consists of exactly these areas.
o The area consists of children nodes which are known as files. No file can be
present in more than one area.
o Finally, each file contains child nodes known as records. The file has exactly
those records that are its child nodes. No records represent in more than one
file.
o Hence, the levels of the tree starting from the top level are as follows:
1. Database
2. Area
3. File
4. Record
In this example, the highest level shows the entire database, and the levels below it
are area, file, and record.
There are three additional lock modes with multiple granularity:
Intention Mode Lock
Intention-shared (IS): It indicates explicit locking at a lower level of the tree, but only
with shared locks.
Intention-exclusive (IX): It indicates explicit locking at a lower level with exclusive or
shared locks.
Shared & intention-exclusive (SIX): Under this lock, the node is locked in shared mode,
and some lower-level node is locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The following matrix shows which
lock modes are compatible (Yes) and which conflict (No):
        IS    IX    S     SIX   X
IS      Yes   Yes   Yes   Yes   No
IX      Yes   Yes   No    No    No
S       Yes   No    Yes   No    No
SIX     Yes   No    No    No    No
X       No    No    No    No    No
The protocol uses the intention lock modes to ensure serializability. It requires that
when a transaction T1 attempts to lock a node, it must follow these rules:
o Transaction T1 must observe the lock-compatibility matrix.
o Transaction T1 must lock the root of the tree first, and it can lock it in any mode.
o Transaction T1 can lock a node in S or IS mode only if it currently has the parent
of that node locked in IS or IX mode.
o Transaction T1 can lock a node in X, SIX, or IX mode only if it currently has the
parent of that node locked in IX or SIX mode.
o Transaction T1 can lock a node only if it has not previously unlocked any node
(i.e. T1 is two-phase).
o Transaction T1 can unlock a node only if it currently has none of that node's
children locked.
Observe that in multiple-granularity, the locks are acquired in top-down order, and
locks must be released in bottom-up order.
o If transaction T1 reads record Ra9 in file Fa, then transaction T1 needs to lock the
database, area A1 and file Fa in IS mode. Finally, it needs to lock Ra9 in S mode.
o If transaction T2 modifies record Ra9 in file Fa, then it can do so after locking the
database, area A1 and file Fa in IX mode. Finally, it needs to lock the Ra9 in X
mode.
o If transaction T3 reads all the records in file Fa, then transaction T3 needs to lock
the database and area A1 in IS mode. At last, it needs to lock Fa in S mode.
o If transaction T4 reads the entire database, then T4 needs to lock the database
in S mode.
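A small Python sketch of this top-down lock acquisition (illustrative only; the helper names are made up), using the nodes from the example above:

hierarchy = ['database', 'area A1', 'file Fa', 'record Ra9']

def locks_for_read(path):
    # Ancestors get intention-shared (IS); the target itself gets shared (S).
    return [(node, 'IS') for node in path[:-1]] + [(path[-1], 'S')]

def locks_for_write(path):
    # Ancestors get intention-exclusive (IX); the target itself gets exclusive (X).
    return [(node, 'IX') for node in path[:-1]] + [(path[-1], 'X')]

print(locks_for_read(hierarchy))
# [('database', 'IS'), ('area A1', 'IS'), ('file Fa', 'IS'), ('record Ra9', 'S')]
print(locks_for_write(hierarchy))
# [('database', 'IX'), ('area A1', 'IX'), ('file Fa', 'IX'), ('record Ra9', 'X')]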
Database Security
A database and its functions can be managed by two different modes of security controls:
1. Authentication
2. Authorization
Authentication
Authentication is the process of confirming that a user logs in only in accordance with
the rights to perform the activities he is authorized to perform. User authentication
can be performed at the operating system level or at the database level itself.
Biometric authentication tools, such as retina and fingerprint scanners, are also used
to keep the database safe from hackers or malicious users.
Database security can also be managed from outside the DB2 database system. Here
are some types of security authentication processes:
• Based on Operating System authentications.
• Lightweight Directory Access Protocol (LDAP)
For DB2, the security service is a part of the operating system as a separate product.
For authentication, it requires two different credentials: a user ID or username, and
a password.
Authorization
You can access the DB2 Database and its functionality within the DB2 database
system, which is managed by the DB2 Database manager. Authorization is a process
managed by the DB2 Database manager. The manager obtains information about the
current authenticated user, that indicates which database operation the user can
perform or access.
Here are different ways of permissions available for authorization:
Primary permission: Grants the authorization ID directly.
Secondary permission: Grants to groups and roles of which the user is a member.
Public permission: Grants to all users publicly.
Context-sensitive permission: Grants to the trusted context role.
Access Control
Database access control is a method of allowing access to company’s
sensitive data only to those people (database users) who are allowed to
access such data and to restrict access to unauthorized persons. It
includes two main components: authentication and authorization.
Authentication is a method of verifying the identity of a person who is
accessing your database. Note that authentication isn’t enough to protect
data. An additional layer of security is required, authorization, which
determines whether a user should be allowed to access the data or make
the transaction he’s attempting. Without authentication and authorization,
there is no data security.
Any company whose employees connect to the Internet (which is essentially every
company today) needs some level of access control implemented.
Types of Access Control
Obsolete access models include Discretionary Access Control (DAC) and
Mandatory Access Control (MAC). Role Based Access Control (RBAC) is
the most common method today, and the most recent model is Attribute
Based Access Control (ABAC).
Discretionary Access Control (DAC)
With DAC models, the data owner allows access. DAC is a means of
assigning access rights based on user-specified rules.
Mandatory Access Control (MAC)
MAC was developed using a nondiscretionary model, in which people are
granted access based on an information clearance. MAC is a policy in
which access rights are assigned based on central authority regulations.
Role Based Access Control (RBAC)
RBAC grants access based on a user’s role and implements key security
principles such as “least privilege” and “separation of privilege.” Thus,
someone attempting to access information can only access data necessary
for their role.
Attribute Based Access Control (ABAC)
In ABAC, each resource and user are assigned a series of attributes. In this
dynamic method, a comparative assessment of the user’s attributes,
including time of day, position and location, are used to make a decision on
access to a resource.
How it Works
Let’s take a look how access control works in DataSunrise.
Two-Factor Authentication
DataSunrise includes two-factor authentication mechanisms based on
emails and one-time passwords (OTP) which allow to access the target
database. Database users should input database’s password and complete
email-based or Google Authenticator based authentication to get access to
the target database.
Database Access Restriction
DataSunrise features Data Security component which enables you to
restrict access to a complete database or certain database objects
depending on the following factors:
• Database username;
• Client application;
• Application username;
• IP address or hostname;
• Operating system user;
• Number of unsuccessful login attempts;
• Query text.
Thus, DataSunrise utilizes the ABAC method of access control. Data
Security’s functionality is based on security rules created by DataSunrise
administrator.
Nobody in an organization should have free rein to access any resource.
Access control is the combination of policies and technologies that
decide which authenticated users may access which resources. Security
requirements, infrastructure, and other considerations lead companies to
choose among the four most common access control models:
Mandatory Access Control (MAC)
Discretionary Access Control (DAC)
Role-Based Access Control (RBAC)
Privileged Access Management (PAM)
We will review the advantages and disadvantages of each model. Then
we will explore how, given the shift to remote and blended workforces,
security professionals want more dynamic approaches to access control.
What is mandatory access control (MAC)?
Mandatory access control uses a centrally managed model to provide
the highest level of security. A non-discretionary system, MAC reserves
control over access policies to a centralized security administration.
MAC works by applying security labels to resources and individuals.
These security labels consist of two elements:
Classification and clearance — MAC relies on a classification system
(restricted, secret, top-secret, etc.) that describes a resource’s sensitivity.
Users’ security clearances determine what kinds of resources they may
access.
Compartment — A resource’s compartment describes the group of
people (department, project team, etc.) allowed access. A user’s
compartment defines the group or groups they participate in.
A user may only access a resource if their security label matches the
resource’s security label.
MAC originated in the military and intelligence community. Beyond the
national security world, MAC implementations protect some companies’
most sensitive resources. Banks and insurers, for example, may use MAC
to control access to customer account data.
Advantages of MAC
Enforceability — MAC administrators set organization-wide policies that
users cannot override, making enforcement easier.
Compartmentalization — Security labels limit the exposure of each
resource to a subset of the user base.
Disadvantages of MAC
Collaboration — MAC achieves security by constraining communication.
Highly collaborative organizations may need a less restrictive approach.
Management burden — A dedicated organizational structure must
manage the creation and maintenance of security labels.
What is discretionary access control (DAC)?
Discretionary access control decentralizes security decisions to resource
owners. The owner could be a document’s creator or a department’s
system administrator. DAC systems use access control lists (ACLs) to
determine who can access that resource. These tables pair individual and
group identifiers with their access privileges.
The sharing option in most operating systems is a form of DAC. For each
document you own, you can set read/write privileges and password
requirements within a table of individuals and user groups. System
administrators can use similar techniques to secure access to network
resources.
Advantages of DAC
Conceptual simplicity — ACLs pair a user with their access privileges. As
long as the user is in the table and has the appropriate privileges, they
may access the resource.
Responsiveness to business needs — Since policy change requests do
not need to go through a security administration, decision-making is
more nimble and aligned with business needs.
Disadvantages of DAC
Over/underprivileged users — A user can be a member of multiple,
nested workgroups. Conflicting permissions may over- or under privilege
the user.
Limited control — Security administrators cannot easily see how
resources are shared within the organization. And although viewing a
resource’s ACL is straightforward, seeing one user’s privileges requires
searching every ACL.
Compromised security — By giving users discretion over access policies,
the resulting inconsistencies and missing oversight could undermine the
organization’s security posture.
What is role-based access control (RBAC)?
Role-based access control grants access privileges based on the work
that individual users do. A popular way of implementing "least privilege"
policies, RBAC limits access to just the resources users need to do their
jobs.
Implementing RBAC requires defining the different roles within the
organization and determining whether and to what degree those roles
should have access to each resource.
Accounts payable administrators and their supervisor, for example, can
access the company’s payment system. The administrators’ role limits
them to creating payments without approval authority. Supervisors, on
the other hand, can approve payments but may not create them.
Advantages of RBAC
Flexibility — Administrators can optimize an RBAC system by assigning
users to multiple roles, creating hierarchies to account for levels of
responsibility, constraining privileges to reflect business rules, and
defining relationships between roles.
Ease of maintenance — With well-defined roles, the day-to-day
management is the routine on-boarding, off-boarding, and cross-
boarding of users’ roles.
Centralized, non-discretionary policies — Security professionals can set
consistent RBAC policies across the organization.
Lower risk exposure — Under RBAC, users only have access to the
resources their roles justify, greatly limiting potential threat vectors.
Disadvantages of RBAC
Complex deployment — The web of responsibilities and relationships in
larger enterprises makes defining roles so challenging that it spawned its
own subfield: role engineering.
Balancing security with simplicity — More roles and more granular roles
provide greater security, but administering a system where users have
dozens of overlapping roles becomes more difficult.
Layered roles and permissions — Assigning too many roles to users also
increases the risk of over-privileging users.
Intrusion Detection System (IDS):
A system called an intrusion detection system (IDS) observes network traffic for
malicious transactions and sends immediate alerts when such activity is observed. It is software
that checks a network or system for malicious activities or policy violations. Each
illegal activity or violation is often recorded either centrally using a SIEM system or
notified to an administration. IDS monitors a network or system for malicious activity
and protects a computer network from unauthorized access from users, including
perhaps insiders. The intrusion detector learning task is to build a predictive model
(i.e. a classifier) capable of distinguishing between ‘bad connections’
(intrusion/attacks) and ‘good (normal) connections’.
How does an IDS work?
An IDS (Intrusion Detection System) monitors the traffic on a computer network to
detect any suspicious activity.
It analyzes the data flowing through the network to look for patterns and signs of
abnormal behavior.
The IDS compares the network activity to a set of predefined rules and patterns to
identify any activity that might indicate an attack or intrusion.
If the IDS detects something that matches one of these rules or patterns, it sends an
alert to the system administrator.
The system administrator can then investigate the alert and take action to prevent
any damage or further intrusion.
Classification of Intrusion Detection System
IDS are classified into 5 types:
Network Intrusion Detection System (NIDS): Network intrusion detection systems
(NIDS) are set up at a planned point within the network to examine traffic from all
devices on the network. It performs an observation of passing traffic on the entire
subnet and matches the traffic that is passed on the subnets to the collection of
known attacks. Once an attack is identified or abnormal behavior is observed, the
alert can be sent to the administrator. An example of a NIDS is installing it on the
subnet where firewalls are located in order to see if someone is trying to crack the
firewall.
Host Intrusion Detection System (HIDS): Host intrusion detection systems (HIDS) run
on independent hosts or devices on the network. A HIDS monitors the incoming and
outgoing packets from the device only and will alert the administrator if suspicious
or malicious activity is detected. It takes a snapshot of existing system files and
compares it with the previous snapshot. If the analytical system files were edited or
deleted, an alert is sent to the administrator to investigate. An example of HIDS
usage can be seen on mission-critical machines, which are not expected to change
their layout.
Protocol-based Intrusion Detection System (PIDS): Protocol-based intrusion
detection system (PIDS) comprises a system or agent that would consistently reside
at the front end of a server, controlling and interpreting the protocol between a
user/device and the server. It tries to secure the web server by regularly
monitoring the HTTPS protocol stream and interpreting the related HTTP protocol.
Since HTTPS traffic is encrypted until it reaches the web presentation layer, this
system needs to reside at that interface so that it can examine the protocol in its
decrypted form.
Application Protocol-based Intrusion Detection System (APIDS): An application
Protocol-based Intrusion Detection System (APIDS) is a system or agent that
generally resides within a group of servers. It identifies the intrusions by monitoring
and interpreting the communication on application-specific protocols. For example,
this would monitor the SQL protocol explicitly to the middleware as it transacts with
the database in the web server.
Hybrid Intrusion Detection System: Hybrid intrusion detection system is made by the
combination of two or more approaches to the intrusion detection system. In the
hybrid intrusion detection system, the host agent or system data is combined with
network information to develop a complete view of the network system. The hybrid
intrusion detection system is more effective in comparison to the other intrusion
detection system. Prelude is an example of Hybrid IDS.
Benefits of IDS
Detects malicious activity: IDS can detect any suspicious activities and alert the
system administrator before any significant damage is done.
Improves network performance: IDS can identify any performance issues on the
network, which can be addressed to improve network performance.
Compliance requirements: IDS can help in meeting compliance requirements by
monitoring network activity and generating reports.
Provides insights: IDS generates valuable insights into network traffic, which can be
used to identify any weaknesses and improve network security.
Detection Method of IDS
Signature-based Method: Signature-based IDS detects the attacks on the basis of the
specific patterns such as the number of bytes or a number of 1s or the number of 0s
in the network traffic. It also detects on the basis of the already known malicious
instruction sequence that is used by the malware. The detected patterns in the IDS
are known as signatures. Signature-based IDS can easily detect the attacks whose
pattern (signature) already exists in the system but it is quite difficult to detect new
malware attacks as their pattern (signature) is not known.
Anomaly-based Method: Anomaly-based IDS was introduced to detect unknown
malware attacks as new malware is developed rapidly. In anomaly-based IDS there is
the use of machine learning to create a trustful activity model and anything coming
is compared with that model and it is declared suspicious if it is not found in the
model. The machine learning-based method has a better-generalized property in
comparison to signature-based IDS as these models can be trained according to the
applications and hardware configurations.
Comparison of IDS with Firewalls
IDS and firewall both are related to network security but an IDS differs from a firewall
as a firewall looks outwardly for intrusions in order to stop them from happening.
Firewalls restrict access between networks to prevent intrusion and if an attack is
from inside the network it doesn’t signal. An IDS describes a suspected intrusion
once it has happened and then signals an alarm.
Conclusion:
Intrusion Detection System (IDS) is a powerful tool that can help businesses detect
and prevent unauthorized access to their network. By analyzing network
traffic patterns, IDS can identify any suspicious activities and alert the system
administrator. IDS can be a valuable addition to any organization’s security
infrastructure, providing insights and improving network performance.
SQL Injection
The SQL Injection is a code penetration technique that might cause loss to our
database. It is one of the most practiced web hacking techniques to place malicious
code in SQL statements, via webpage input. SQL injection can be used to manipulate
the application's web server by malicious users.
SQL injection generally occurs when we ask a user to input their username/userID.
Instead of a name or ID, the user gives us an SQL statement that we will unknowingly
run on our database. For Example - we create a SELECT statement by adding a variable
"demoUserID" to select a string. The variable will be fetched from user input
(getRequestString).
demoUserId = getRequestString("UserId");
demoSQL = "SELECT * FROM users WHERE UserId = " + demoUserId;
Types of SQL injection attacks
SQL injections can do more harm than just bypassing the login algorithms. Some of the
SQL injection attacks include:
o Updating, deleting, and inserting data: An attacker can modify cookies to poison
a web application's database query.
o Executing commands on the server that can download and install malicious
programs such as Trojans.
o Exporting valuable data such as credit card details, emails, and passwords to the
attacker's remote server.
o Getting user login details: This is the simplest form of SQL injection. A web application
typically accepts user input through a form, and the front end passes the user input to
the back-end database for processing.
Example of SQL Injection
We have an application based on employee records. Any employee can view only their
own records by entering a unique and private employee ID. We have a field like an
Employee ID. And the employee enters the following in the input field:
236893238 or 1=1
It will translate to:
SELECT * from EMPLOYEE where EMPLOYEE_ID = 236893238 or 1=1
The SQL code above is valid and will return rows from the EMPLOYEE table. The
condition 1=1 is always true, so every record is returned. All the employee data is
compromised; now, the malicious user can also similarly delete the employee records.
Example:
SELECT * from Employee where (Username = "" or 1=1) AND (Password = "" or 1=1)
Here the malicious user has used the always-true condition 1=1 to retrieve private and
secure user information. So instead of the query mentioned above, the following query,
when executed, retrieves protected data that was not intended to be shown to users:
SELECT * from EMPLOYEE where (Employee_name = "" or 1=1) AND (Password = "" or 1=1)
SQL injection based on Batched SQL statements
Several databases support batched SQL statements. It is a group of two or more SQL
statements separated by semicolons.
The SQL statement given below will return all rows from the Employee table, then
delete the Employee_Add table.
SELECT * FROM Employee; DROP TABLE Employee_Add
How to detect SQL Injection attacks
Creating a SQL injection attack is not difficult, and even the best and most
well-intentioned developers make mistakes. The detection of SQL injection is, therefore,
an essential component of reducing the risk of an SQL injection attack. A Web
Application Firewall can detect and block basic SQL injection attacks, but we should
not depend on it as the sole preventive measure.
Intrusion Detection Systems (IDS) can be both network-based and host-based, and they
can be tuned to detect SQL injection attacks. A network-based IDS can monitor all
connections to our database server and flag suspicious activities. A host-based IDS
can monitor web server logs and alert when something strange happens.
Impact of SQL Injection
The intruder can retrieve all the user-data present in the database, such as user details,
credit card information, and social security numbers, and can also gain access to
protected areas like the administrator portal. It is also possible to delete the user data
from the tables. These days all the online shopping applications, bank transactions use
back-end database servers. If the intruder can exploit SQL injection, the entire server
is compromised.
How to prevent SQL Injection attack
o We should validate input from the user by pre-defining the length, type, and format
of each input field.
o Restricting the access privileges of users and defining the amount of data any outsider
can access from the database. Generally, the user cannot be granted permission to
access everything in the database.
o We should not use system administrator accounts.
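As an additional, widely used safeguard, queries should be parameterized so that user input can never change the structure of the SQL statement. A minimal sketch using Python's built-in sqlite3 module (the table and values are made up for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_id INTEGER, name TEXT)")
conn.execute("INSERT INTO employee VALUES (236893238, 'Alice')")

user_input = "236893238 or 1=1"     # malicious input from the form

# Unsafe: string concatenation lets the input change the query's structure.
# query = "SELECT * FROM employee WHERE employee_id = " + user_input

# Safe: the driver treats the input strictly as a value, never as SQL.
rows = conn.execute(
    "SELECT * FROM employee WHERE employee_id = ?", (user_input,)
).fetchall()
print(rows)    # [] -- the injected '1=1' is not interpreted as SQL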