ADBMS
Unit 2
Transaction processing and Concurrency control
Transaction
o A transaction is a set of logically related operations; it contains a group
of tasks.
o A transaction is an action, or series of actions, performed by a single
user to access and update the contents of the database.
Example: Suppose an employee of a bank transfers Rs 800 from X's account to
Y's account. This small transaction involves several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Read(X): The read operation reads the value of X from the database and
stores it in a buffer in main memory.
Write(X): The write operation writes the value back to the database from
the buffer.
Let's take the example of a debit transaction on an account, which consists of
the following operations:
1. R(X);
2. X = X - 500;
3. W(X);
Let's assume the value of X before starting of the transaction is 4000.
Following are the main operations of transaction:
o The first operation reads X's value from the database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will
contain 3500.
o The third operation will write the buffer's value to the database. So X's
final value will be 3500.
But it may be possible that, because of a hardware, software, or power failure,
the transaction fails before finishing all the operations in the set.
For example: If the above debit transaction fails after executing operation 2,
then X's value will remain 4000 in the database, which is not acceptable to the
bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
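As a concrete illustration, the sketch below shows commit and rollback in Python using the standard sqlite3 module. The bank.db file, the accounts table, and its columns are assumptions made for this example, not part of any particular system.

```python
import sqlite3

# Debit Rs 500 from account X: commit only if every step succeeds,
# roll back otherwise. Table and column names are illustrative.
conn = sqlite3.connect("bank.db")
try:
    cur = conn.cursor()
    cur.execute("UPDATE accounts SET balance = balance - 500 WHERE id = 'X'")
    conn.commit()        # Commit: save the work done permanently
except Exception:
    conn.rollback()      # Rollback: undo the work done so far
finally:
    conn.close()
```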
ACID Properties
A transaction has four properties. These are used to maintain the consistency
of the database before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity
o It states that all operations of the transaction take place at once; if not,
the transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each
transaction is treated as one unit and either run to completion or is not
executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume a transaction T, consisting of T1 and T2, transfers Rs
100 from account A to account B, where A holds Rs 600 and B holds Rs 300.

T1             T2
Read(A)        Read(B)
A := A - 100   B := B + 100
Write(A)       Write(B)
After completion of the transaction, A consists of Rs 500 and B consists of Rs
400.
If the transaction T fails after the completion of transaction T1 but before
completion of transaction T2, then the amount will be deducted from A but not
added to B. This leaves the database in an inconsistent state. In order to ensure
the correctness of the database state, the transaction must be executed in its
entirety.
Consistency
o The integrity constraints are maintained so that the database is consistent
before and after the transaction.
o The execution of a transaction will leave a database in either its prior
stable state or a new stable state.
o The consistency property of the database states that every transaction sees a
consistent database instance.
o The transaction is used to transform the database from one consistent
state to another consistent state.
For example: The total amount must be the same before and after the
transaction.
1. Total before T occurs = 600+300=900
2. Total after T occurs= 500+400=900
Therefore, the database is consistent. In the case when T1 is completed but T2
fails, then inconsistency will occur.
Isolation
o It shows that the data which is used at the time of execution of a
transaction cannot be used by the second transaction until the first one is
completed.
o In isolation, if the transaction T1 is being executed and using the data
item X, then that data item can't be accessed by any other transaction T2
until the transaction T1 ends.
o The concurrency control subsystem of the DBMS enforces the isolation
property.
Durability
o The durability property indicates the persistence of the database's
consistent state. It states that the changes made by a committed transaction
are permanent.
o They cannot be lost by the erroneous operation of a faulty transaction or
by the system failure. When a transaction is completed, then the database
reaches a state known as the consistent state. That consistent state cannot
be lost, even in the event of a system's failure.
o The recovery subsystem of the DBMS is responsible for the durability
property.
States of Transaction
The different stages a transaction goes through during its lifecycle are known as
the transaction states. These states are described below.
Active state
o The active state is the first state of every transaction. In this state, the
transaction is being executed.
o For example: Insertion or deletion or updating a record is done here. But
all the records are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation,
but the data is still not saved to the database.
o In a total marks calculation example, for instance, the final display of the
total marks is executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations
successfully. In this state, all the effects are now permanently saved on the
database system.
Failed state
o If any of the checks made by the database recovery system fails, then the
transaction is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire
a query to fetch the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached the failed state, then
the database recovery system makes sure that the database is returned to its
previous consistent state. If it is not, the system aborts or rolls back the
transaction to bring the database into a consistent state.
o If the transaction fails in the middle of execution, then all the operations it
has executed so far are rolled back, returning the database to the consistent
state that existed before the transaction began.
o After aborting the transaction, the database recovery module will select
one of the two operations:
1. Re-start the transaction
2. Kill the transaction
Example
Let us take a very simple example of Railway ticket booking. Can you think of
the things that need to be retrieved from the database when you initiate the
booking process?
You will need the train details, the already booked ticket details, the platform
details, and many more such things. Now, once these details are retrieved the
transaction of booking a ticket enters the active state.
After the user has completed the entire process of booking a ticket from their
end, the transaction enters the partially committed state. In case any error
occurred during the process, then the transaction will enter the failed state.
Now, say the process was successful and the transaction entered the partially
committed state, now if the saving in the database is completed successfully
then the transaction enters the committed state. In case there is any error while
saving in the database then it enters the failed state.
Anything from the failed state enters the aborted state so that rollbacks can
take place and the database consistency is maintained.
Now, let’s talk about the terminated state. If the booking is permanently saved
in the database, or it has been aborted due to some unforeseen reasons then the
transaction enters the terminated state.
Serializability
In the field of computer science, serializability is a term that is a property
of the system that describes how the different process operates the shared
data.
If the result given by the system is similar to the operation performed by
the system, then in this situation, we call that system serializable.
Here the cooperation of the system means there is no overlapping in the
execution of the data. In DBMS, when the data is being written or read
then, the DBMS can stop all the other processes from accessing the data.
A schedule is serialized if it is equivalent to a serial schedule. A
concurrent schedule must ensure it is the same as if executed serially
means one after another. It refers to the sequence of actions such as read,
write, abort, commit are performed in a serial manner.
Schedules in DBMS are of two types:
1. Serial Schedule - A schedule in which only one transaction is executed at
a time, i.e., one transaction is executed completely before starting another
transaction.
Example:
Transaction-1     Transaction-2
R(a)
W(a)
R(b)
W(b)
                  R(b)
                  W(b)
                  R(a)
                  W(a)
Here, we can see that Transaction-2 starts its execution after the
completion of Transaction-1.
2. Non-serial schedule − A schedule in which the operations of transactions T1
and T2 overlap (are interleaved).
Example:
Transaction-1     Transaction-2
R(a)
W(a)
                  R(b)
                  W(b)
R(b)
                  R(a)
W(b)
                  W(a)
We can see that Transaction-2 starts its execution before the completion of
Transaction-1, and the two work in an interleaved fashion on the same data
items, i.e., "a" and "b".
Types of serializability
There are two types of serializability −
1. Conflict serializability
Conflict serializability is concerned with conflicting operations, i.e.,
operations in different transactions that access the same data item, at least one
of which is a write. It requires that such operations be executed in a particular
order so that the consistency of the database is maintained. In a DBMS, each
transaction has a unique identifier, and the ordering of conflicting operations
is decided on the basis of that identifier.
This ordering ensures that no two conflicting operations are executed
concurrently. For example, consider an order table and a customer table. One
customer can have multiple orders, but each order belongs to only one
customer. Two operations conflict when all of the following conditions hold:
o The two operations belong to different transactions.
o Both operations access the same data item.
o At least one of the two operations is a write operation.
If two such transactions are executed concurrently, say one adds an order for
the first customer while another adds an order for the second, ordering their
conflicting operations consistently ensures that there is no inconsistency in
the database.
The conflicting pairs are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
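These conflicting pairs are what a precedence (serialization) graph is built from. Below is a small Python sketch, under the assumption that a schedule is given as (transaction, operation, item) triples; it builds the graph and reports whether the schedule is conflict serializable, i.e., whether the graph is acyclic. The sample schedule is the non-serial example shown earlier.

```python
# Build a precedence graph: add an edge Ti -> Tj whenever an operation of
# Ti conflicts with (and precedes) an operation of Tj. A schedule is
# conflict serializable iff this graph has no cycle.
def is_conflict_serializable(schedule):
    edges = set()
    txns = {t for t, _, _ in schedule}
    for i in range(len(schedule)):
        for j in range(i + 1, len(schedule)):
            ti, op_i, x = schedule[i]
            tj, op_j, y = schedule[j]
            # conflict: different transactions, same item, at least one write
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    visited, stack = set(), set()
    def has_cycle(node):
        visited.add(node)
        stack.add(node)
        for a, b in edges:
            if a == node and (b in stack or (b not in visited and has_cycle(b))):
                return True
        stack.discard(node)
        return False
    return not any(has_cycle(t) for t in txns if t not in visited)

# The non-serial schedule above, interleaving T1 and T2 on "a" and "b":
s = [("T1", "R", "a"), ("T1", "W", "a"), ("T2", "R", "b"), ("T2", "W", "b"),
     ("T1", "R", "b"), ("T2", "R", "a"), ("T1", "W", "b"), ("T2", "W", "a")]
print(is_conflict_serializable(s))   # False: the graph has a T1 <-> T2 cycle
```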
2. View serializability
If a non-serial schedule is view equivalent to some serial schedule, then the
schedule is called a View Serializable Schedule. View serializability is needed
to ensure the consistency of a schedule.
What is view equivalency?
The two conditions needed by schedules(S1 and S2) to be view equivalent are:
1. Initial read must be on the same piece of data.
Example: If transaction t1 is reading "A" from database in schedule S1, then in
schedule S2, t1 must read A.
2. Final write must be on the same piece of data.
Example: If a transaction t1 updated A at last in S1, then in S2, t1 should
perform final write as well.
3. The intermediate reads should also occur in the same order.
Example: If t1 reads A which was updated by t2 in S1, then in S2, t1 should
read the value of A updated by t2.
This process of checking view equivalency of a schedule is called View
Serializability.
Example: We have a schedule "S" having two transactions t1, and t2 working
simultaneously.
S:
t1        t2
R(x)
W(x)
          R(x)
          W(x)
R(y)
W(y)
          R(y)
          W(y)
Let's form its view equivalent schedule (S') by interchanging mid-read-write
operations of both the transactions. S':
t1        t2
R(x)
W(x)
R(y)
W(y)
          R(x)
          W(x)
          R(y)
          W(y)
Since a view equivalent schedule is possible, it is a view serializable schedule.
Prioritization
Prioritization is useful for browsing tasks, and tasks that use a lot of processor
time. Input/Output bound tasks can take the required amount of CPU, and move
on to the next read/write wait. CPU-intensive tasks take higher priority over the
less intensive tasks. Prioritization can be implemented in all CICS® systems. It
is more important in a high-activity system than in a low-activity system. With
careful priority selection, you can improve overall throughput and response
time. Prioritization can minimize resource usage of certain resource-bound
transactions. Prioritization increases the response time for lower-priority tasks,
and can distort the regulating effects of MXT and the MAXACTIVE attribute of
the transaction class definition.
Priorities do not affect the order of servicing terminal input messages and,
therefore, the time they wait to be attached to the transaction manager. Because
prioritization is determined in three sets of definitions (terminal, transaction,
and operator), it can be a time-consuming process for you to track many
transactions in a system. CICS prioritization is not interrupt-driven as is the case
with operating system prioritization, but determines the position on a ready
queue. This means that, after a task is given control of the processor, the task
does not relinquish that control until it issues a CICS command that calls the
CICS dispatcher. After the dispatch of a processor-bound task, CICS can be tied
up for long periods if CICS requests are infrequent. For that reason,
prioritization should be implemented only if MXT and the MAXACTIVE
attribute of the transaction class definition adjustments have proved to be
insufficient.
You should use prioritization sparingly, if at all, and only after you have already
adjusted task levels using MXT and the MAXACTIVE attribute of the
transaction class definition. It is probably best to set all tasks to the same
priority, and then prioritize some transactions either higher or lower on an
exception basis, and according to the specific constraints in a system. Do not
prioritize against slow tasks unless you can accept the longer task life and
greater dispatch overhead; these tasks are slow, in any case, and give up control
each time they have to wait for I/O. Use small priority values and differences
and concentrate on transaction priority. Give priority to control operator tasks
rather than the person, or at least to the control operator's signon ID rather than
to a specific physical terminal (the control operator may move around).
Consider for high priority a task that uses large resources. However, the effects
of this on the overall system need careful monitoring to ensure that loading a
large transaction of this type does not lock out other transactions. Also
consider for high priority those transactions that cause enqueues to system
resources, thus locking out other transactions. As a result, these can process
quickly and then release resources. Here are some examples:
Using intrapartition transient data with logical recovery
Updating frequently used records
Automatic logging
Tasks needing fast application response time, for example, data entry.
Lower priority should be considered for tasks that:
Have long browsing activity
Are process-intensive with minimal I/O activity
Do not require terminal interaction, for example:
o Auto-initiate tasks (except when you use transient data
intrapartition queues that have a destination of terminal defined
and a trigger level that is greater than zero).
o Batch update controlling tasks.
There is no direct measurement of transaction priority. Indirect measurement
can be made from:
Task priorities
Observed transaction responses
Overall processor, storage, and data set I/O usage.
The following table shows how you might prioritize three typical transactions.
Establish a HIGH priority for transactions that are vital to the operations
of the organization. For example, you might specify a HIGH priority for a
transaction that services the information needs of upper-level managers in
the organization.
Sample Transactions
Transaction                        Processing Mode    Time         Priority
Add or delete a claim              Online             3 seconds    High
List of employees for an office    Batch              15 minutes   Medium
Show salary grade for all jobs     Online             6 seconds    Low
Failure Classification
To find where the problem has occurred, we generalize failures into the
following categories:
1. Transaction failure
2. System crash
3. Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches
a point from which it cannot go any further. If only a few transactions or
processes are affected, this is called a transaction failure.
Reasons for a transaction failure could be -
1. Logical errors: If a transaction cannot complete due to some code
error or an internal error condition, then the logical error occurs.
2. System errors: These occur when the DBMS itself terminates an
active transaction because the database system is not able to
execute it. For example, the system aborts an active transaction
in case of deadlock or resource unavailability.
2. System Crash
o System failure can occur due to power failure or other hardware or
software failure. Example: Operating system error.
Fail-stop assumption: In the system crash, non-volatile storage is
assumed not to be corrupted.
3. Disk Failure
o It occurs when hard-disk drives or storage drives fail. This was a
common problem in the early days of technology evolution.
o Disk failure occurs due to the formation of bad sectors, disk head
crash, unreachability to the disk, or any other failure that
destroys all or part of disk storage.
Recoverability of Schedule
Sometimes a transaction may not execute completely due to a software issue,
system crash or hardware failure. In that case, the failed transaction has to be
rolled back. But some other transaction may also have used a value produced
by the failed transaction, so we also have to roll back those transactions.
Table 1 shows a schedule with two transactions. T1 reads and
writes the value of A, and that value is read and written by T2. T2 commits,
but later T1 fails. Due to the failure, we have to roll back T1. T2 should also
be rolled back because it read the value written by T1, but T2 can't be rolled
back because it has already committed. This type of schedule is known as an
irrecoverable schedule.
Irrecoverable schedule: A schedule is irrecoverable if Tj reads a value
updated by Ti and Tj commits before Ti commits.
Table 2 shows a schedule with two transactions. Transaction T1 reads
and writes A, and that value is read and written by transaction T2. But later
on, T1 fails. Due to this, we have to roll back T1. T2 should be rolled back
because it has read the value written by T1. As T2 has not yet committed, we
can roll back transaction T2 as well. So this schedule is recoverable with
cascading rollback.
Recoverable with cascading rollback: A schedule is recoverable with
cascading rollback if Tj reads a value updated by Ti and the commit of Tj is
delayed until the commit of Ti.
Table 3 shows a schedule with two transactions, in which transaction T1
reads and writes A and commits, and only then is that value read and written
by T2. This is a cascadeless recoverable schedule.
Levels of Transaction Consistency
In a Distributed System, consistency means that “all reads executed at a given
time must return the same result irrespective of the server that executed the
read.” For one server to read the data written by another server, we need to
maintain a global ordering of reads and writes, so that each copy of the
database executes the events in the same order and they all arrive at the same
final database state. Our data will then be consistent.
The consistency levels differ in the way they implement this ordering of events.
Some are strict, w.r.t maintaining a time-based order while others are more
relaxed.
Consistency Levels
Sequential Consistency: All writes must be globally ordered. Each thread of
execution must see the same ordering of writes, irrespective of which thread
executed and which data items were written to. However, this ordering needn’t
be the same as the real-time ordering of writes. We can have any ordering of
writes, as long as such an ordering is agreed upon by all the threads.
Figure 1: sequentially consistent
Threads 3 and 4 see that B has been updated before A (which is different from
what actually happened), but this is still considered sequentially consistent as all
the threads are in agreement.
Figure 2: Not sequentially consistent
In the above image, thread 3 sees the update to B before the update to A, while
thread 4 sees the update to A first. This violates sequential consistency.
Strict Consistency: This is the highest level of consistency. Strict Consistency
requires events to be ordered in the same real-time sequence in which they
occurred. In other words, “an earlier write always has to be seen before a later
write.” This is almost impossible to implement in distributed systems. Hence it
remains in the realm of theoretical discussions.
Strict Consistency also demands that any write by a given thread should be
immediately visible to all other threads. Hence the above two images do not
satisfy strict consistency, as they read zero values from data items that were
already written to.
Figure 3: Strictly Consistent
The above image satisfies strict consistency as immediate subsequent reads see
the new value of the data item.
Linearizability/ Atomic Consistency: Although strict consistency is very hard
to achieve, we can come pretty close to it. Linearizability also requires the writes
to be ordered in a real-time fashion, but it acknowledges that there’s some time
gap between when an operation is submitted to the system and when the system
acknowledges it.
Figure 4: Linearizable but not strictly consistent
The write of thread 1 overlaps with the read of thread 3. Therefore, there needn’t
be a real-time ordering between them. Hence the above image satisfies
Linearizability even though thread three’s read doesn’t fetch the latest value.
Causal Consistency: Causal Consistency requires only related operations to
have a global ordering between them. Two operations can be related because
they acted on the same data item, or because they originated from the same
thread. Operations that are not related can be seen in any order by any thread.
Causal consistency is a weaker form of sequential consistency.
Figure 5: Causally consistent
The write to B by thread 2 follows the read from A. Hence these two operations
are causally related. Hence, the write to A must appear before the write to B.
Figure 5 is causally consistent since threads 3 and 4 see the writes in the same
order.
Eventual Consistency: This can be called the weakest level of consistency. The
only guarantee offered here is that if there are no writes for a long period of
time, the threads will eventually agree on the value of the last write. That is,
eventually, all the copies of the database will reflect the same value.
Figure 6: Eventually consistent
The above figure violates all consistency levels except eventual consistency. If
there isn’t any other write for a long time, all the threads will see the same value
of A and B.
Deadlock
A deadlock is a condition where two or more transactions are waiting
indefinitely for one another to give up locks. Deadlock is said to be one of the
most feared complications in DBMS as no task ever gets finished and is in
waiting state forever.
For example: In the student table, transaction T1 holds a lock on some rows
and needs to update some rows in the grade table. Simultaneously, transaction
T2 holds locks on some rows in the grade table and needs to update the rows in
the Student table held by Transaction T1.
Now, the main problem arises: transaction T1 is waiting for T2 to release
its lock, and similarly, transaction T2 is waiting for T1 to release its lock. All
activities come to a halt and remain at a standstill until the DBMS detects the
deadlock and aborts one of the transactions.
Deadlock Avoidance
o When a database is stuck in a deadlock state, it is better to avoid the
deadlock rather than aborting or restarting transactions, which wastes both
time and resources.
o A deadlock avoidance mechanism is used to detect any deadlock situation
in advance. A method like the "wait-for graph" is used for detecting a
deadlock situation, but this method is suitable only for smaller
databases. For larger databases, a deadlock prevention method has to be
used.
Deadlock Detection
In a database, when a transaction waits indefinitely to obtain a lock, then the
DBMS should detect whether the transaction is involved in a deadlock or not.
The lock manager maintains a wait-for graph to detect deadlock cycles in
the database.
Wait for Graph
o This is a suitable method for deadlock detection. In this method, a
graph is created based on the transactions and their locks. If the created
graph has a cycle or closed loop, then there is a deadlock.
o The wait-for graph is maintained by the system for every transaction
that is waiting for some data held by another. The system keeps
checking whether there is any cycle in the graph.
o For the scenario above, the wait-for graph contains an edge from T1 to
T2 and an edge from T2 to T1, which form a cycle and hence a deadlock.
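A minimal Python sketch of this detection, assuming the wait-for graph is stored as a mapping from each transaction to the set of transactions it waits on; the transaction names are illustrative.

```python
# Deadlock detection: depth-first search for a cycle in the wait-for graph.
# An edge T -> U means "T is waiting for a lock held by U".
def find_deadlock(wait_for):
    visited, stack = set(), []
    def dfs(t):
        if t in stack:                       # back edge: a cycle was found
            return stack[stack.index(t):]
        if t in visited:
            return None
        visited.add(t)
        stack.append(t)
        for u in wait_for.get(t, ()):
            cycle = dfs(u)
            if cycle:
                return cycle
        stack.pop()
        return None
    for t in list(wait_for):
        cycle = dfs(t)
        if cycle:
            return cycle
    return None

# T1 waits for T2 and T2 waits for T1, as in the scenario above.
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # ['T1', 'T2']
```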
Deadlock Prevention
o The deadlock prevention method is suitable for a large database. If the
resources are allocated in such a way that a deadlock never occurs, then
deadlock is prevented.
o The database management system analyzes the operations of a
transaction to check whether they can create a deadlock situation. If they
can, the DBMS never allows that transaction to be executed.
Wait-Die scheme
In this scheme, if a transaction requests a resource that is already held
with a conflicting lock by another transaction, the DBMS simply checks the
timestamps of both transactions and allows the older transaction to wait until
the resource is available.
Let's assume there are two transactions Ti and Tj, and let TS(T) be the
timestamp of any transaction T. If Tj holds a lock on some resource and Ti is
requesting that resource, then the following actions are performed by the
DBMS:
1. If TS(Ti) < TS(Tj), that is, Ti is the older transaction and Tj has held
the resource, then Ti is allowed to wait until the data item is available.
In other words, if an older transaction is waiting for a resource locked
by a younger transaction, the older transaction is allowed to wait.
2. If TS(Ti) > TS(Tj), that is, the requesting transaction Ti is younger
than the holder Tj, then Ti is killed (it "dies") and restarted later after a
random delay, but with the same timestamp.
Wound wait scheme
o In the wound-wait scheme, if an older transaction requests a resource
held by a younger transaction, the older transaction forces the younger
one to abort ("wounds" it) and release the resource. After a small
delay, the younger transaction is restarted, but with the same
timestamp.
o If a younger transaction requests a resource held by an older
transaction, the younger transaction is asked to wait until the older one
releases it.
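The two schemes can be summarized by a small decision function. In the sketch below, a smaller timestamp means an older transaction; the function names and return values are illustrative.

```python
# Wait-die: an older requester waits; a younger requester dies (is rolled
# back and restarted later with the SAME timestamp).
def wait_die(requester_ts, holder_ts):
    return "WAIT" if requester_ts < holder_ts else "DIE"

# Wound-wait: an older requester wounds (aborts) the younger holder;
# a younger requester waits for the older holder to finish.
def wound_wait(requester_ts, holder_ts):
    return "WOUND HOLDER" if requester_ts < holder_ts else "WAIT"

# T1 entered the system at time 007 (older), T2 at time 009 (younger).
print(wait_die(7, 9), wait_die(9, 7))       # WAIT DIE
print(wound_wait(7, 9), wound_wait(9, 7))   # WOUND HOLDER WAIT
```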
Features of deadlock in a DBMS:
o Mutual Exclusion: Each resource can be held by only one transaction
at a time, and other transactions must wait for it to be released.
o Hold and Wait: Transactions can request resources while holding on to
resources already allocated to them.
o No Preemption: Resources cannot be taken away from a transaction
forcibly, and the transaction must release them voluntarily.
o Circular Wait: Transactions are waiting for resources in a circular
chain, where each transaction is waiting for a resource held by the next
transaction in the chain.
o Indefinite Blocking: Transactions are blocked indefinitely, waiting for
resources to become available, and no transaction can proceed.
o System Stagnation: Deadlock leads to system stagnation, where no
transaction can proceed, and the system is unable to make any progress.
o Inconsistent Data: Deadlock can lead to inconsistent data if
transactions are unable to complete and leave the database in an
intermediate state.
o Difficult to Detect and Resolve: Deadlock can be difficult to detect and
resolve, as it may involve multiple transactions, resources, and
dependencies.
Disadvantages:
o System downtime: Deadlock can cause system downtime, which can
result in loss of productivity and revenue for businesses that rely on the
DBMS.
o Resource waste: When transactions are waiting for resources, these
resources are not being used, leading to wasted resources and decreased
system efficiency.
o Reduced concurrency: Deadlock can lead to a decrease in system
concurrency, which can result in slower transaction processing and
reduced throughput.
o Complex resolution: Resolving deadlock can be a complex and time-
consuming process, requiring system administrators to intervene and
manually resolve the deadlock.
o Increased system overhead: The mechanisms used to detect and
resolve deadlock, such as timeouts and rollbacks, can increase system
overhead, leading to decreased performance.
LONG DURATION TRANSACTIONS
Depending on the lifetime or duration, transactions can be classified as:
–Short Duration Transaction: A short duration transaction, also known as an
online transaction, requires a very short execution/response time and accesses
a small portion of the database.
–Long Duration Transaction: A long duration transaction, also known as a
batch transaction, requires a longer execution/response time and generally
accesses a larger portion of the database.
Alternately, in the context of database systems, we can define long duration
transactions as those that involve human interaction, while short duration
transactions are more or less non-interactive.
•Long-duration transactions exhibit the following properties:
a) Long Duration: When interacting with humans, it is quite natural
that the response will be very slow relative to computer speed.
In some applications, the human activity may span hours, days,
or even longer periods of time, so the transactions are of longer
duration.
b) Exposure to Uncommitted Data: In many cases of long duration
transactions, other transactions may be forced to read
uncommitted data.
If several users are cooperating on a project, user transactions
may need to exchange data prior to transaction commit.
c) Subtasks: An interactive transaction consists of a set of subtasks
initiated by the user.
At some point in time, the user may wish to abort a subtask
without necessarily causing the entire transaction to abort.
d) Recoverability: It is unacceptable to abort a long-duration
interactive transaction because of a system crash.
Even after a system crash, the active transaction must be
recovered to a state that existed shortly before the crash, so that
relatively little human work is lost.
e) Performance: Good performance in a long duration transaction is
defined by how fast the response is generated for a particular
transaction.
In a non-interactive system, however, performance is
measured as high throughput.
Transaction processing as implemented in contemporary
databases
Transaction processing in contemporary databases refers to the management of
database transactions in a way that ensures data integrity, consistency, and
reliability. It is a critical aspect of database management systems (DBMS) and
is essential in various applications, including e-commerce, banking, healthcare,
and more. Here are some key aspects of transaction processing as implemented
in contemporary databases:
ACID Properties: Contemporary databases typically adhere to the ACID
(Atomicity, Consistency, Isolation, Durability) properties to guarantee the
reliability of transactions.
Atomicity: A transaction is treated as a single, indivisible unit. It is either
entirely completed or entirely aborted. There are no partial changes to the
database.
Consistency: Transactions bring the database from one consistent state to
another. They must follow predefined rules and constraints to maintain data
integrity.
Isolation: Transactions are executed in isolation from each other. This means
that the changes made by one transaction are not visible to others until the
transaction is completed. Isolation levels like Read Uncommitted, Read
Committed, Repeatable Read, and Serializable control the degree of isolation.
Durability: Once a transaction is committed, its changes are permanent and will
survive system failures. This ensures data is not lost in the event of a crash.
Concurrency Control: Contemporary databases employ various techniques to
manage concurrent access to the database by multiple transactions. This
includes locking, multi-version concurrency control, and timestamp-based
ordering, among others.
Transaction Logs: Databases maintain transaction logs to record all changes
made by transactions. These logs are crucial for recovery in the event of system
failures.
Commit and Rollback: Transactions provide mechanisms for committing
changes to the database or rolling back changes in case of errors or user
requests.
Savepoints: Some contemporary databases support savepoints within
transactions. This allows a transaction to be partially rolled back to a specific
point, which can be useful in error recovery.
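As an illustration, the sketch below uses SQLite's SAVEPOINT support through Python's sqlite3 module; the orders table and the savepoint name are assumptions of the example. isolation_level=None disables the module's implicit transaction handling so that BEGIN and COMMIT can be issued explicitly.

```python
import sqlite3

# Partial rollback with a savepoint: only the work done after the
# savepoint is undone; the rest of the transaction commits normally.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER, qty INTEGER)")
conn.execute("BEGIN")
conn.execute("INSERT INTO orders VALUES (1, 10)")
conn.execute("SAVEPOINT before_risky_step")
conn.execute("INSERT INTO orders VALUES (2, -5)")       # suppose this turns out to be invalid
conn.execute("ROLLBACK TO before_risky_step")           # undo only the second insert
conn.execute("COMMIT")                                  # the first insert survives
print(conn.execute("SELECT * FROM orders").fetchall())  # [(1, 10)]
```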
Two-Phase Commit (2PC): In distributed databases, the two-phase commit
protocol is often used to ensure that transactions are either committed on all
distributed nodes or rolled back in a coordinated manner.
Optimistic Concurrency Control: In some scenarios, databases use optimistic
concurrency control, where conflicts are detected and resolved only when
committing a transaction. This can reduce contention but may require more
complex conflict resolution mechanisms.
Snapshot Isolation: Some databases support snapshot isolation, which provides
a consistent view of the database as of the start of a transaction. This can
simplify concurrency control and improve performance.
Deadlock Detection and Resolution: Databases incorporate mechanisms to
detect and resolve deadlocks, which occur when transactions are waiting for
each other to release locks.
Distributed Transactions: In distributed databases, managing transactions
across multiple nodes and ensuring they maintain ACID properties can be more
complex. Distributed transaction managers are used to coordinate such
transactions.
In-Memory Databases: Some contemporary databases are designed to work
entirely or partially in memory, offering high-speed transaction processing.
NoSQL Databases: While traditional SQL databases follow ACID properties,
some NoSQL databases may relax these constraints to provide better scalability
and performance, opting for BASE (Basically Available, Soft state, Eventually
consistent) properties instead.
Contemporary databases, whether SQL or NoSQL, are designed to meet
specific application requirements while ensuring data consistency and
reliability. The choice of database system and transaction processing methods
depends on the specific needs of an application, such as performance,
scalability, and data integrity.
Concurrency Control
Concurrency Control is the management procedure that is required for
controlling concurrent execution of the operations that take place on a database.
Concurrent Execution in DBMS
o In a multi-user system, multiple users can access and use the same
database at one time, which is known as concurrent execution of the
database. It means that the same database is used simultaneously by
different users on a multi-user system.
o While working on the database transactions, there occurs the requirement
of using the database by multiple users for performing different
operations, and in that case, concurrent execution of the database is
performed.
o The point is that this simultaneous execution is performed in an
interleaved manner, and no operation should affect the other
executing operations, thus maintaining the consistency of the database.
However, concurrent execution of transaction operations gives rise to
several challenging problems that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations
are READ and WRITE. These two operations need to be managed carefully
during the concurrent execution of transactions, because if they are
interleaved without proper control, the data may become
inconsistent. The following problems occur with the concurrent execution
of operations:
Problem 1: Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions perform the
read/write operations on the same database items in an interleaved manner
(i.e., concurrent execution) that makes the values of the items incorrect hence
making the database inconsistent.
For example:
Consider the below diagram where two transactions TX and TY, are
performed on the same account A where the balance of account A is $300.
o At time t1, transaction TX reads the value of account A, i.e., $300 (only
read).
o At time t2, transaction TX deducts $50 from account A that becomes $250
(only deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of account A that
will be $300 only because TX didn't update the value yet.
o At time t4, transaction TY adds $100 to account A that becomes $400
(only added but not updated/write).
o At time t6, transaction TX writes the value of account A that will be
updated as $250 only, as TY didn't update the value yet.
o Similarly, at time t7, transaction TY writes the values of account A, so it
will write as done at time t4 that will be $400. It means the value written
by TX is lost, i.e., $250 is lost.
Hence the data becomes incorrect, and the database is left in an inconsistent
state.
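The interleaving above can be reproduced with two threads that read, compute locally, and write back without any locking. In this Python sketch the barrier only forces both reads to occur before either write, mimicking the t1-t7 timeline; without it the race would simply be less predictable.

```python
import threading

balance = 300   # the balance of account A

def transaction(delta, barrier):
    global balance
    local = balance      # read A into a local buffer
    barrier.wait()       # ensure both transactions read before either writes
    local += delta       # compute in the buffer only
    balance = local      # write A: the last writer wins

barrier = threading.Barrier(2)
tx = threading.Thread(target=transaction, args=(-50, barrier))   # TX deducts $50
ty = threading.Thread(target=transaction, args=(+100, barrier))  # TY adds $100
tx.start(); ty.start(); tx.join(); ty.join()
print(balance)   # 250 or 400 depending on write order -- never the correct 350
```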
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates an item of the
database, the transaction then fails, and before the data gets rolled back,
the updated database item is accessed by another transaction. This creates a
read-write conflict between the two transactions.
For example:
Consider two transactions TX and TY in the below diagram performing
read/write operations on account A where the available balance in account
A is $300:
o At time t1, transaction TX reads the value of account A, i.e., $300.
o At time t2, transaction TX adds $50 to account A that becomes $350.
o At time t3, transaction TX writes the updated value in account A, i.e.,
$350.
o Then at time t4, transaction TY reads account A that will be read as $350.
o Then at time t5, transaction TX rollbacks due to server problem, and the
value changes back to $300 (as initially).
o But the value of account A read by transaction TY remains $350 even
though the update was never committed. This is a dirty read, and the
anomaly is therefore known as the Dirty Read Problem.
Unrepeatable Read Problem (R-W Conflict)
Also known as the Inconsistent Retrievals Problem, it occurs when, within a
single transaction, two different values are read for the same database item.
For example:
Consider two transactions, TX and TY, performing the read/write
operations on account A, having an available balance = $300. The diagram
is shown below:
o At time t1, transaction TX reads the value from account A, i.e., $300.
o At time t2, transaction TY reads the value from account A, i.e., $300.
o At time t3, transaction TY updates the value of account A by adding $100
to the available balance, and then it becomes $400.
o At time t4, transaction TY writes the updated value, i.e., $400.
o After that, at time t5, transaction TX reads the available value of account
A, and that will be read as $400.
o It means that within the same transaction TX, two different values
of account A are read: $300 initially, and $400 after the update made by
transaction TY. This is an unrepeatable read and is therefore known as
the Unrepeatable Read Problem.
Thus, in order to maintain consistency in the database and avoid such
problems during concurrent execution, proper management is needed, and
that is where the concept of concurrency control comes into play.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity, consistency, isolation,
durability and serializability of the concurrent execution of the database
transactions. Therefore, these protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
Lock-Based Protocol
o In this type of protocol, a transaction cannot read or write data until it
acquires an appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. Under a shared lock, the data item
can only be read by the transaction.
o It can be shared between transactions because a transaction
holding a shared lock cannot update the data on the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by
the transaction.
o This lock is exclusive; while it is held, multiple transactions cannot
modify the same data simultaneously.
There are four types of lock protocols available:
1. Simplistic lock protocol
It is the simplest way of locking data during a transaction. Simplistic lock-
based protocols require every transaction to obtain a lock on the data before
inserting, deleting, or updating it. The data item is unlocked after the
transaction completes.
2. Pre-claiming Lock Protocol
o Pre-claiming lock protocols evaluate the transaction to list all the data
items on which it needs locks.
o Before initiating execution, the transaction requests the DBMS for
locks on all those data items.
o If all the locks are granted, this protocol allows the transaction to
begin. When the transaction is completed, it releases all its locks.
o If any of the locks are not granted, the transaction is rolled back
and waits until all the locks are granted.
3. Two-phase locking (2PL)
o The two-phase locking protocol divides the execution phase of the
transaction into three parts.
o In the first part, when the execution of the transaction starts, it seeks
permission for the lock it requires.
o In the second part, the transaction acquires all the locks. The third phase
is started as soon as the transaction releases its first lock.
o In the third phase, the transaction cannot demand any new locks. It only
releases the acquired locks.
There are two phases of 2PL:
Growing phase: In the growing phase, a new lock on the data item may be
acquired by the transaction, but none can be released.
Shrinking phase: In the shrinking phase, existing lock held by the transaction
may be released, but no new locks can be acquired.
In the below example, if lock conversion is allowed then the following phase
can happen:
1. Upgrading of lock (from S(a) to X (a)) is allowed in growing phase.
2. Downgrading of lock (from X(a) to S(a)) must be done in shrinking
phase.
Example:
The following way shows how unlocking and locking work with 2-PL.
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
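A minimal Python sketch of the rule itself: the transaction object below refuses any lock request once its first unlock has occurred. The class and method names are illustrative, not part of any real lock manager.

```python
# A transaction that enforces two-phase locking: locks may be acquired
# only before the first unlock (growing phase); after that, only
# releases are allowed (shrinking phase).
class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.held = set()
        self.shrinking = False           # flips at the lock point

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested after an unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True            # first unlock = lock point passed
        self.held.discard(item)

t1 = TwoPhaseTransaction("T1")
t1.lock("a"); t1.lock("b")               # growing phase
t1.unlock("a")                           # lock point passed; shrinking begins
t1.unlock("b")
try:
    t1.lock("c")                         # not allowed under 2PL
except RuntimeError as e:
    print(e)                             # 2PL violated: lock requested after an unlock
```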
4. Strict Two-phase locking (Strict-2PL)
o The first phase of Strict-2PL is the same as in 2PL: after
acquiring all its locks, the transaction continues to execute normally.
o The only difference between 2PL and Strict-2PL is that Strict-2PL does
not release a lock immediately after using it.
o Strict-2PL waits until the whole transaction commits, and only then
releases all the locks at once.
o Strict-2PL therefore has no gradual shrinking phase of lock release,
and it does not suffer from cascading aborts as 2PL does.
Timestamp Ordering Protocol
o The Timestamp Ordering Protocol is used to order the transactions based
on their Timestamps. The order of transaction is nothing but the
ascending order of the transaction creation.
o The priority of the older transaction is higher that's why it executes first.
To determine the timestamp of the transaction, this protocol uses system
time or logical counter.
o The lock-based protocol is used to manage the order between conflicting
pairs among transactions at the execution time. But Timestamp based
protocols start working as soon as a transaction is created.
o Let's assume there are two transactions T1 and T2. Suppose
transaction T1 entered the system at time 007 and transaction T2 at time
009. T1 has the higher priority, so it executes first, as it entered the
system first.
o The timestamp ordering protocol also maintains the timestamp of last
'read' and 'write' operation on a data.
Basic Timestamp ordering protocol works as follows:
1. Check the following condition whenever a transaction Ti issues a Read
(X) operation:
o If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
o If W_TS(X) <= TS(Ti), then the operation is executed, and R_TS(X) is
updated to the maximum of R_TS(X) and TS(Ti).
2. Check the following condition whenever a transaction Ti issues
a Write(X) operation:
o If TS(Ti) < R_TS(X) then the operation is rejected.
o If TS(Ti) < W_TS(X) then the operation is rejected and Ti is rolled back
otherwise the operation is executed.
Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the Read time-stamp of data-item X.
W_TS(X) denotes the Write time-stamp of data-item X.
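The two checks can be written directly as functions. In this sketch, R_TS and W_TS are kept in plain dictionaries per data item, which is an assumption of the example; a rejected operation means the issuing transaction is rolled back.

```python
# Basic timestamp ordering checks for Read(X) and Write(X) by Ti.
def to_read(ts_ti, item, r_ts, w_ts):
    if w_ts[item] > ts_ti:                   # a younger transaction already wrote X
        return "REJECT"                      # roll Ti back
    r_ts[item] = max(r_ts[item], ts_ti)      # record the read
    return "EXECUTE"

def to_write(ts_ti, item, r_ts, w_ts):
    if ts_ti < r_ts[item] or ts_ti < w_ts[item]:   # a younger txn read or wrote X
        return "REJECT"                            # roll Ti back
    w_ts[item] = ts_ti
    return "EXECUTE"

r_ts, w_ts = {"X": 0}, {"X": 0}
print(to_read(7, "X", r_ts, w_ts))    # EXECUTE: W_TS(X) = 0 <= 7
print(to_write(5, "X", r_ts, w_ts))   # REJECT:  TS(Ti) = 5 < R_TS(X) = 7
```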
Advantages and Disadvantages of TO protocol:
o The TO protocol ensures serializability, since the precedence graph
contains edges only from older transactions to younger ones and is
therefore acyclic.
o The TO protocol ensures freedom from deadlock, because no transaction
ever waits.
o But the schedule may not be recoverable and may not even be cascade-
free.
Validation Based Protocol
The validation-based protocol is also known as the optimistic concurrency
control technique. In the validation-based protocol, a transaction is executed
in the following three phases:
1. Read phase: In this phase, the transaction T is read and executed. It is
used to read the value of various data items and stores them in temporary
local variables. It can perform all the write operations on temporary
variables without an update to the actual database.
2. Validation phase: In this phase, the temporary variable value will be
validated against the actual data to see if it violates the serializability.
3. Write phase: If the transaction passes validation, the
temporary results are written to the database; otherwise the
transaction is rolled back.
Here each phase has the following different timestamps:
Start(Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts
its validation phase.
Finish(Ti): It contains the time when Ti finishes its write phase.
o This protocol is used to determine the time stamp for the transaction for
serialization using the time stamp of the validation phase, as it is the
actual phase which determines if the transaction will commit or rollback.
o Hence TS(T) = validation(T).
o The serializability is determined during the validation process. It can't be
decided in advance.
o While executing transactions, it allows a greater degree of
concurrency and produces fewer conflicts.
o Thus it results in fewer transaction rollbacks.
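The test performed in the validation phase can be sketched as follows. The dictionary fields mirror the Start/Validation/Finish timestamps above, plus per-transaction read and write sets; the concrete transactions at the end are illustrative.

```python
# Validate Tj against every earlier-validated transaction Ti: either Ti
# finished before Tj started, or they overlapped but Ti wrote nothing
# that Tj read. Otherwise serializability may be violated: roll Tj back.
def validate(tj, earlier):
    for ti in earlier:
        if ti["finish"] < tj["start"]:
            continue                      # no overlap at all
        if ti["finish"] < tj["validation"] and not (ti["writes"] & tj["reads"]):
            continue                      # overlap, but no write-read conflict
        return False                      # validation fails
    return True                           # Tj may enter its write phase

t1 = {"start": 1, "validation": 4, "finish": 5, "reads": {"a"}, "writes": {"a"}}
t2 = {"start": 2, "validation": 6, "finish": 8, "reads": {"a"}, "writes": {"b"}}
print(validate(t2, [t1]))   # False: T1 wrote "a" while T2, which read "a", was running
```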
Multiple Granularity
Granularity: The size of the data item that is allowed to be locked.
Multiple Granularity:
o It can be defined as hierarchically breaking up the database into blocks
which can be locked.
o The Multiple Granularity protocol enhances concurrency and reduces
lock overhead.
o It keeps track of what to lock and how to lock.
o It makes it easy to decide whether to lock or unlock a data item.
This type of hierarchy can be graphically represented as a tree.
For example: Consider a tree which has four levels of nodes.
o The first level or higher level shows the entire database.
o The second level represents a node of type area. The higher level
database consists of exactly these areas.
o The area consists of children nodes which are known as files. No file can
be present in more than one area.
o Finally, each file contains child nodes known as records. A file has
exactly those records that are its child nodes. No record appears in
more than one file.
o Hence, the levels of the tree starting from the top level are as follows:
1. Database
2. Area
3. File
4. Record
In this example, the highest level is the entire database, and the levels below
it are area, file, and record.
There are three additional lock modes with multiple granularity:
Intention Mode Lock
Intention-shared (IS): It contains explicit locking at a lower level of the tree
but only with shared locks.
Intention-Exclusive (IX): It contains explicit locking at a lower level with
exclusive or shared locks.
Shared & Intention-Exclusive (SIX): In this lock, the node is locked in shared
mode, and some node is locked in exclusive mode by the same transaction.
Compatibility Matrix with Intention Lock Modes: The following table
describes the compatibility matrix for these lock modes:

        IS    IX    S     SIX   X
IS      Yes   Yes   Yes   Yes   No
IX      Yes   Yes   No    No    No
S       Yes   No    Yes   No    No
SIX     Yes   No    No    No    No
X       No    No    No    No    No
It uses the intention lock modes to ensure serializability. It requires that if a
transaction attempts to lock a node, then that node must follow these protocols:
o Transaction T1 should follow the lock-compatibility matrix.
o Transaction T1 firstly locks the root of the tree. It can lock it in any
mode.
o If T1 currently has the parent of the node locked in either IX or IS mode,
then the transaction T1 will lock a node in S or IS mode only.
o If T1 currently has the parent of the node locked in either IX or SIX
modes, then the transaction T1 will lock a node in X, SIX, or IX mode
only.
o Transaction T1 can lock a node only if it has not previously unlocked
any node (the two-phase requirement).
o Transaction T1 can unlock a node only if none of that node's children
are currently locked by T1.
Observe that in multiple-granularity, the locks are acquired in top-down order,
and locks must be released in bottom-up order.
o If transaction T1 reads record Ra2 in file Fa, then transaction T1 needs to
lock the database, area A1 and file Fa in IS mode. Finally, it needs to lock
Ra2 in S mode.
o If transaction T2 modifies record Ra9 in file Fa, then it can do so after
locking the database, area A1 and file Fa in IX mode. Finally, it needs to
lock Ra9 in X mode.
o If transaction T3 reads all the records in file Fa, then transaction T3 needs
to lock the database and area A1 in IS mode. At last, it needs to lock Fa in
S mode.
o If transaction T4 reads the entire database, then T4 needs to lock the
database in S mode.
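The compatibility matrix and the grant check can be sketched as follows; the encoding as nested dictionaries is an assumption of the example, and the sample calls mirror the T2 and T4 cases above.

```python
# Compatibility matrix for the five lock modes. True means the two modes
# can be held on the same node by different transactions simultaneously.
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
}

def can_grant(requested, held_modes):
    # A lock is granted only if it is compatible with every mode already
    # held on the node by other transactions.
    return all(COMPAT[requested][h] for h in held_modes)

print(can_grant("IX", ["IS"]))   # True: T2's IX on file Fa coexists with an IS lock
print(can_grant("S", ["IX"]))    # False: T4's S on the database must wait for an IX
```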
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being
executed every second. The durability and robustness of a DBMS depends on
its complex architecture and its underlying hardware and system software. If it
fails or crashes amid transactions, it is expected that the system would follow
some sort of algorithm or techniques to recover lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various
categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point
from where it can’t go any further. This is called transaction failure where only
a few transactions or processes are hurt.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete because it has
some code error or any internal error condition.
System errors − Where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or it has to stop
because of some system condition. For example, in case of deadlock or
resource unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop
abruptly and cause the system to crash. For example, interruptions in power
supply may cause the failure of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where hard-
disk drives or storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk
head crash or any other failure, which destroys all or a part of disk storage.
Storage Structure
The storage structure can be divided into two categories −
Volatile storage − As the name suggests, a volatile storage cannot
survive system crashes. Volatile storage devices are placed very close to
the CPU; normally they are embedded onto the chipset itself. For
example, main memory and cache memory are examples of volatile
storage. They are fast but can store only a small amount of information.
Non-volatile storage − These memories are made to survive system
crashes. They are huge in data storage capacity, but slower in
accessibility. Examples may include hard-disks, magnetic tapes, flash
memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items. Transactions are made
of various operations, which are atomic in nature. But according to ACID
properties of DBMS, atomicity of transactions as a whole must be maintained,
that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions, which were being
executed.
A transaction may be in the middle of some operation; the DBMS must
ensure the atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs
to be rolled back.
No transactions would be allowed to leave the DBMS in an inconsistent
state.
There are two types of techniques, which can help a DBMS in recovering as
well as maintaining the atomicity of a transaction −
Maintaining the logs of each transaction, and writing them onto some
stable storage before actually modifying the database.
Maintaining shadow paging, where the changes are done on a volatile
memory, and later, the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed
by a transaction. It is important that the logs are written prior to the actual
modification and stored on a stable storage media, which is failsafe.
Log-based recovery works as follows −
The log file is kept on a stable storage media.
When a transaction enters the system and starts execution, it writes a log
about it.
<Tn, Start>
When the transaction modifies an item X, it write logs as follows −
<Tn, X, V1, V2>
It states that Tn has changed the value of X from V1 to V2.
When the transaction finishes, it logs −
<Tn, commit>
The database can be modified using two approaches −
Deferred database modification − All logs are written on to the stable
storage and the database is updated when a transaction commits.
Immediate database modification − Each log follows an actual database
modification. That is, the database is modified immediately after every
operation.
Recovery with Concurrent Transactions
When more than one transaction are being executed in parallel, the logs are
interleaved. At the time of recovery, it would become hard for the recovery
system to backtrack all logs, and then start recovering. To ease this situation,
most modern DBMS use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out
all the memory space available in the system. As time passes, the log file may
grow too big to be handled at all. Checkpoint is a mechanism where all the
previous logs are removed from the system and stored permanently in a storage
disk. Checkpoint declares a point before which the DBMS was in consistent
state, and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in
the following manner −
The recovery system reads the logs backwards from the end to the last
checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or
just <Tn, Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort
log found, it puts the transaction in undo-list.
All the transactions in the undo-list are then undone and their logs are removed.
All the transactions in the redo-list are then redone using their log records, and
their logs are saved.
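A hypothetical Python sketch of this recovery pass is given below; the log is modeled as a list of tuples whose shapes follow the <Tn, ...> notation used above.

# Scan the log backwards to the last checkpoint and classify transactions.
log = [
    ("T0", "Start"), ("T0", "commit"),     # finished before the checkpoint
    ("CHECKPOINT",),
    ("T1", "commit"),                      # just <commit>: started earlier
    ("T2", "Start"), ("T2", "X", 100, 90), ("T2", "commit"),
    ("T3", "Start"), ("T3", "Y", 50, 60),  # no commit or abort: interrupted
]

committed, started = set(), set()
for record in reversed(log):               # read backwards from the end...
    if record[0] == "CHECKPOINT":          # ...up to the last checkpoint
        break
    txn, kind = record[0], record[1]
    if kind == "commit":
        committed.add(txn)
    elif kind == "Start":
        started.add(txn)

redo_list = sorted(committed)              # <Start> and <commit>, or just <commit>
undo_list = sorted(started - committed)    # <Start> but no commit/abort

print("redo:", redo_list)                  # redo: ['T1', 'T2']
print("undo:", undo_list)                  # undo: ['T3']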
Database Backup – Physical and Logical
A database backup is a copy of the database that is stored separately, for
example on another server. Backups are used to prevent unexpected data loss. If
the original data gets lost, then with the help of a backup it is easy to regain
access to the data.
There are two types of database backup.
Physical backup
Logical backup
Physical Backup:
Physical database backups are backups of the physical files that are used to
store and recover the database. These include data files, control files,
archived redo logs, and more. Typically, physical backup data is kept in
the cloud, on offline storage, on magnetic tape, or on disk.
There are two methods to perform a physical backup :
1. Operating system utilities
2. Recovery manager
This type of backup is useful when the user needs to restore the complete
database in a short period. It also provides details of transactions and
changes made in the database, and it is considered the foundation of the
recovery mechanism. The drawback of this form of backup is that it slows down
database operations.
Advantages:
It is useful when the user needs to restore the complete database in a short
period.
They provide details of transactions and changes made in databases.
Disadvantage:
This slows down database operations.
Logical Backup:
It contains logical data which is retrieved from the database, such as views,
procedures, functions, and tables. This is useful when users want to restore
or transfer a copy of the database to a different location. Logical backups are
not as secure as physical backups in preventing data loss; they only provide
structural details. Complete logical backups should be performed every week.
Logical backups are used as a supplement to a physical backup.
Advantages:
It is useful when the user needs to restore individual database objects
rather than the complete database.
It provides granular recovery capabilities.
Disadvantages:
It is not sufficient on its own for the recovery of critical components.
Less secure compared to physical backup.
It only provides structural details.
1. Deferred Update: It is a technique for the maintenance of the transaction
log files of the DBMS. It is also called the NO-UNDO/REDO technique. It is used
for the recovery of transaction failures that occur due to power, memory, or
OS failures. Whenever any transaction is executed, the updates are not made
immediately to the database. They are first recorded in the log file, and the
changes are applied once the commit is done. This is called the "Re-doing"
process. If a rollback is done, none of the changes are applied to the
database and the changes in the log file are discarded. If the commit is done
before the system crashes, then after restarting the system the changes
recorded in the log file are applied to the database.
2. Immediate Update: It is a technique for the maintenance of the transaction
log files of the DBMS. It is also called the UNDO/REDO technique. It is used for
the recovery of transaction failures that occur due to power, memory, or OS
failures. Whenever any transaction is executed, the updates are made directly
to the database, and a log file is also maintained which contains both the old
and new values. Once the commit is done, all the changes are stored permanently
in the database, and the records in the log file are discarded. Once a rollback
is done, the old values are restored in the database and all the changes made to
the database are discarded. This is called the "Un-doing" process. If the
commit is done before the system crashes, then after restarting the system the
changes are already stored permanently in the database.
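As a rough illustration, the Python snippet below shows how an update record carrying both the old value (V1) and the new value (V2) supports both processes; the record layout follows the log format described earlier.

# Illustrative UNDO and REDO for the immediate-update technique.
db = {"X": 3500}                          # database state found after a crash
update_record = ("T1", "X", 4000, 3500)   # <Tn, X, V1, V2>

def undo(db, record):
    _, item, old_value, _ = record
    db[item] = old_value                  # restore V1: the "Un-doing" process

def redo(db, record):
    _, item, _, new_value = record
    db[item] = new_value                  # reapply V2: the "Re-doing" process

undo(db, update_record)                   # T1 never committed: roll it back
print(db["X"])                            # 4000, as if T1 had never run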
Shadow Paging
Shadow Paging is a recovery technique that is used to recover the database. In
this technique, the database is considered to be made up of fixed-size logical
units of storage referred to as pages. Pages are mapped into physical blocks
of storage with the help of a page table, which has one entry for each
logical page of the database. This method uses two page tables, named the
current page table and the shadow page table. The entries in the current
page table point to the most recent database pages on disk. The shadow page
table is created when the transaction starts, by copying the current page
table. After this, the shadow page table is saved on disk, and the current
page table is used for the transaction. Entries in the current page table may
be changed during execution, but the shadow page table is never changed.
After the transaction completes, both tables become identical. This technique
is also known as Out-of-Place updating.
To understand the concept, consider an example in which two write operations
are performed on pages 3 and 5. Before the start of the write operation on
page 3, the current page table points to the old page 3. When the write
operation starts, the following steps are performed:
1. Firstly, a search is made for an available free block among the disk blocks.
2. After finding a free block, page 3 is copied to the free block, which is
represented by Page 3 (New).
3. Now the current page table points to Page 3 (New) on disk, but the shadow
page table still points to the old page 3, because it is not modified.
4. The changes are now propagated to Page 3 (New), which is pointed to by the
current page table.
COMMIT Operation: To commit the transaction, the following steps should be done
:
1. All the modifications done by the transaction that are present in buffers
are transferred to the physical database.
2. The current page table is output to disk.
3. The disk address of the current page table is output to a fixed location in
stable storage that contains the address of the shadow page table. This
operation overwrites the address of the old shadow page table. With this, the
current page table becomes the same as the shadow page table and the
transaction is committed.
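The following toy Python sketch (hypothetical structures, not a real storage engine) shows the copy-on-write step and the page-table switch at commit.

# Pages live in disk blocks; two page tables map logical pages to blocks.
disk = {0: "page3-old", 1: "page5-old", 2: None, 3: None}   # block -> contents
shadow_table = {3: 0, 5: 1}            # saved at transaction start; never changed
current_table = dict(shadow_table)     # working copy used by the transaction

def write_page(page_no, new_contents):
    free = next(b for b, v in disk.items() if v is None)    # find a free block
    disk[free] = new_contents          # copy the page to the free block
    current_table[page_no] = free      # only the current table is redirected

write_page(3, "page3-new")             # shadow_table still points to block 0

# COMMIT: atomically adopt the current table as the new shadow table. If the
# system fails before this point, the old shadow table and pages are intact.
shadow_table = current_table
print(disk[shadow_table[3]])           # page3-new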
Checkpoint
o The checkpoint is a type of mechanism where all the previous logs are
removed from the system and permanently stored in the storage disk.
o The checkpoint is like a bookmark. During the execution of a transaction,
such checkpoints are marked, and as the transaction executes its steps, the
log files are created.
o When the execution reaches a checkpoint, the transaction's updates are
written to the database, and the entire log file up to that point is removed.
The log file is then updated with the new steps of the transaction until the
next checkpoint, and so on.
o The checkpoint is used to declare a point before which the DBMS was in
the consistent state, and all transactions were committed.
Recovery using Checkpoint
In the following manner, a recovery system recovers the database from this
failure:
o The recovery system reads the log files from the end to the start, i.e.,
from transaction T4 back to transaction T1.
o Recovery system maintains two lists, a redo-list, and an undo-list.
o The transaction is put into the redo-list if the recovery system sees a log
with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the
transactions in the redo-list are then redone using their log records.
o For example: In the log file, transactions T2 and T3 will have <Tn, Start>
and <Tn, Commit>. The T1 transaction will have only <Tn, Commit> in the
log file, because it committed after the checkpoint was crossed. Hence the
recovery system puts T1, T2, and T3 into the redo-list.
o The transaction is put into the undo-list if the recovery system sees a log
with <Tn, Start> but no commit or abort log. All the transactions in the
undo-list are then undone, and their logs are removed.
o For example: Transaction T4 will have only <Tn, Start>, so T4 will be put
into the undo-list, since this transaction is not yet complete and failed in
the middle of execution.
Client/server database
Evolution of client concepts
The evolution of client concepts in client/server database systems has closely
followed the broader evolution of computing and networking technologies.
Client/server architecture is characterized by the division of labor between
clients and servers, with clients making requests and servers providing data and
services. Here's an overview of the evolution of client concepts in client/server
database systems:
Early Client/Server Systems (1980s-1990s):
Dumb Terminals: Initially, client devices were often dumb terminals with
limited processing capabilities. They relied heavily on centralized servers for
data processing and storage.
PC-Based Clients (Late 1980s-1990s):
PC Clients: The advent of personal computers (PCs) led to the rise of PC-based
clients. PCs were more capable than dumb terminals and ran client applications
that communicated with database servers. These clients had local processing
power and storage.
Thin Clients (1990s-2000s):
Thin Clients: As network technology improved, there was a trend toward thin
clients. These clients offloaded much of the processing to the server, reducing
the requirements on the client side. Thin clients often used technologies like
Citrix and terminal services.
Web-Based Clients (Late 1990s-Present):
Web Clients: The late 1990s marked the advent of web-based clients. Web
browsers became the primary interface for interacting with client/server
databases. This allowed for cross-platform access and reduced the need for
client-side installations.
Mobile Clients (2000s-Present):
Mobile Clients: With the proliferation of smartphones and tablets, mobile
clients became essential. Mobile apps provided access to client/server databases
on mobile devices, and responsive web design adapted web clients to various
screen sizes.
Rich Clients (2000s-Present):
Rich Clients: Some applications use rich client technology, offering a more
interactive and feature-rich user experience. Rich clients often include a mix of
server-side and client-side processing.
Serverless Clients (2010s-Present):
Serverless Clients: With the emergence of serverless computing, some
client/server applications have adopted serverless clients. These clients can
perform tasks on-demand and scale dynamically without traditional server
infrastructure.
Edge Computing Clients (2010s-Present):
Edge Clients: Edge computing has given rise to clients that can perform
processing at the edge of the network, reducing latency and the need for
continuous communication with centralized servers.
AI-Enhanced Clients (2010s-Present):
AI-Driven Clients: Some contemporary clients incorporate artificial intelligence
to provide intelligent insights and automation. These clients can make data-
driven decisions and recommendations.
Hybrid and Multi-Platform Clients (2010s-Present):
Hybrid Clients: In many cases, applications use hybrid client concepts,
combining web, mobile, and desktop components. This allows users to access
databases from multiple platforms seamlessly.
Client Security and Authentication (Ongoing):
Enhanced Security: Client/server databases have evolved with enhanced
security features to protect data and ensure secure communication between
clients and servers. Multi-factor authentication and encryption are now
commonplace.
The evolution of client concepts in client/server databases reflects the
advancement of technology and the need for flexibility, scalability, and
enhanced user experiences. As technology continues to evolve, client/server
databases are likely to adapt to incorporate emerging technologies like
blockchain, quantum computing, and more for specialized applications.
Client/Server Environment
A client-server environment is a computing architecture in which tasks or
workloads are divided between two types of entities: clients and servers.
Clients:
Definition: Clients are the end-user devices or applications that request services,
resources, or data from servers. Clients initiate requests and wait for responses.
Types of Clients:
Hardware Clients: These include devices like personal computers,
smartphones, tablets, and IoT devices.
Software Clients: These are software applications running on devices.
Examples include web browsers, mobile apps, and desktop applications.
Responsibilities:
Initiating requests: Clients request services, data, or resources from servers by
sending requests.
User Interface: Clients provide the user interface through which users interact
with the system.
Processing: In some cases, clients may perform local processing and data
manipulation before sending data to servers.
Presentation: Clients are responsible for presenting data to users in a
human-readable format.
Servers:
Definition: Servers are dedicated devices or software applications that respond
to client requests. They provide services, data, or resources as requested by
clients.
Types of Servers:
File Servers: Serve files or storage resources to clients.
Database Servers: Manage and provide access to databases.
Web Servers: Serve web pages and web applications to clients.
Application Servers: Execute application logic and provide services to clients.
Mail Servers: Handle email communication and services.
Print Servers: Manage printing tasks for clients.
Responsibilities:
Receiving requests: Servers listen for incoming client requests and process
them.
Processing: Servers perform the necessary computations, data retrieval, or
service execution as requested by clients.
Data Management: In the case of database servers, they manage and store data.
Communication: Servers communicate with clients to provide responses.
Scalability: Servers must be scalable to handle multiple client requests
simultaneously.
Communication:
Client-server systems communicate through a network, typically using
standardized protocols like HTTP, FTP, or TCP/IP.
Clients initiate communication by sending requests to the appropriate server.
Servers respond to client requests by sending data or executing the requested
service.
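As a minimal illustration of this request/response pattern, the Python sketch below runs a TCP server and client in a single process using only the standard library; the port number and message format are arbitrary choices.

import socket, threading

# Server: passive, listening for requests.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 5050))
srv.listen()

def handle_one_request():
    conn, _ = srv.accept()              # wait for a client to connect
    with conn:
        request = conn.recv(1024)       # receive the client's request
        conn.sendall(b"response to: " + request)

threading.Thread(target=handle_one_request, daemon=True).start()

# Client: active, initiating the exchange.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", 5050))
    cli.sendall(b"get record 42")       # send a request...
    print(cli.recv(1024))               # ...and wait for the response
srv.close()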
Benefits:
Scalability: Client-server architectures are scalable, allowing for the addition of
more clients and servers to meet growing demands.
Centralized Control: Servers centralize data and services, making it easier to
manage, secure, and maintain the system.
Client Diversity: Different types of clients can connect to servers, including web
browsers, mobile apps, and more.
Improved Performance: Servers can offload processing tasks from clients,
improving system performance and response times.
The client-server environment is a fundamental architecture used in various
applications, including web services, databases, email systems, and more.
Characteristics of Client Server Architecture
Client/server refers to a mode of communication between multiple
computers on a network that distinguishes one or more clients and a server:
each client's software can send requests to the server. A server can be
specialized as an application server, file server, terminal server, or mail
server.
Characteristics of a server:
It is initially passive (or slave, waiting for a query);
It is listening, ready to respond to requests sent by clients;
When a request comes, it processes the request and sends a response.
Characteristics of a client:
It is initially active (or master);
Sends requests to the server;
It expects and receives responses from the server.
The client and the server must of course use the same communication protocol.
A server is generally capable of serving multiple clients simultaneously.
What is Client Server Architecture?
❉ The Client Server architecture, which is also known as the Network Computing
model, is a shared computer network architecture in which large applications are
split into smaller tasks that are distributed between the server machine (host)
and client machines (desktops).
❉ It is defined as a form of distributed computing system since all the
components execute their tasks independently without affecting each other.
❉ All the requests and services are delivered over a network or internet
connection.
There are 3 main components in Client Server computing.
1. Client Process
2. Server Process
3. Middleware
Client
o It is a process that requests particular services from the server process.
o Client systems perform many operations, such as:
➣ User Interface Management.
➣ Accepting and checking the syntax of user inputs.
➣ Processing application logic.
➣ Generation of database requests and transmitting them to the server.
➣ Passing responses back to the user.
Server
o A process that supplies services requested by clients.
o The server performs operations such as:
➣ Accepting and processing database requests from client.
➣ Authorization checking.
➣ Make sure that integrity constraints are not violated.
➣ Perform necessary queries and transmit responses to clients.
➣ Providing recovery control.
➣ Providing concurrent database access.
Middleware
o Middleware can be introduced as software which runs in between the client
and server processes, enabling communication between them.
o It delivers secure and transparent services to the users.
Types of Client Server Architectures
❉ When considering a client/server architecture, the server acts as the producer
while the client acts as a consumer.
❉ There are many types of Client Server Architectures.
2-Tier Architecture
2-Tier Architecture basically consists of two parts, namely a client tier and a
server tier, with a communication protocol bridging the gap between the two
tiers.
o This architecture is designed in a way where user feedback and validation are
allowed to occur on the client side.
o Usually, GUI code exists on the client host while the business logic exists on
the server host.
Ex: Web Server
Advantages of 2-Tier Architecture
o Implementation of Homogeneous Environment
o Easy to maintain and modify
o Easy to design applications
o Better user satisfaction
Disadvantages of 2-Tier Architecture
o Complicated
o Expensive
o Error prone
o Time consuming
3-Tier Architecture
There are 3 layers in 3-tier architecture namely;
1. Presentation layer (Client Tier)
➣ Input/output and interaction with the user.
➣ It is responsible for deciding how the data should be presented to the user
➣ Basic input validation can still be done here.
2. Application layer (Business Tier)
➣ Control access to data
➣ Responsible for processing
3. Database layer (Data Tier)
➣ Record transaction
➣ Ensure integrity of data.
o When a client machine sends a request to the server machine, the middle layer
receives the request first and then forwards it to the server. Likewise, the
server's response is received by the middle layer first and is then passed on
to the client machine.
o All data logic and business logic are stored on the middleware.
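The toy Python sketch below illustrates the three tiers as plain functions; in a real deployment each tier would run on a separate host, and all names and values here are hypothetical.

# Data tier: record storage and access.
DATABASE = {"alice": 120.0}

def data_tier_get_balance(user):
    return DATABASE[user]               # data access lives here

# Business tier: processing and rules.
def business_tier_balance_report(user):
    balance = data_tier_get_balance(user)
    return {"user": user, "balance": balance, "low": balance < 50}

# Presentation tier: deciding how data is shown to the user.
def presentation_tier(user):
    report = business_tier_balance_report(user)   # client talks to middle tier only
    return f"{report['user']}: {report['balance']:.2f}"

print(presentation_tier("alice"))       # alice: 120.00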
Advantages of 3-Tier Architecture
o 3 Tiers provide better service.
o Fast performance.
o Improved data security.
o Hides the database structure.
o Reliable.
o Improvement in concurrency.
Disadvantages of 3-Tier Architecture
o Complex due to advanced features
Application partitioning
Application partitioning in client-server computing involves breaking down a
software application into various components or layers to distribute processing
and functionality between clients and servers. This partitioning is a critical
design decision that affects system performance, scalability, and
maintainability. There are several common strategies for application
partitioning in client-server architectures:
Presentation Layer Partitioning:
Client-Side Rendering: In this approach, the client handles the user interface
rendering and presentation logic. The server sends data to the client, which then
processes and displays it. This is common in web applications using JavaScript
frameworks.
Server-Side Rendering: In contrast to client-side rendering, server-side
rendering involves rendering the user interface on the server and sending pre-
rendered pages to the client. This approach is often used in content management
systems (CMS) and traditional web applications.
Hybrid Rendering: A combination of client-side and server-side rendering,
where some parts of the user interface are rendered on the client, and others are
rendered on the server. This approach optimizes performance and user
experience.
Business Logic Partitioning:
Client-Side Business Logic: Some business logic can be handled by the client,
reducing the load on the server. For example, client-side validation in web
forms and client-side caching are common.
Server-Side Business Logic: Complex business rules and processing are
executed on the server. This ensures data consistency and security. For example,
in e-commerce applications, the calculation of prices and inventory
management is often server-side.
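A small Python sketch of this split is shown below; the validation rules, price, and stock values are invented for illustration.

# Client side: quick, cheap validation for immediate UI feedback.
def client_side_validate(quantity_text):
    return quantity_text.isdigit() and int(quantity_text) > 0

# Server side: authoritative checks; never trusts the client's validation.
PRICE, STOCK = 9.99, 5

def server_side_place_order(quantity):
    if not (isinstance(quantity, int) and 0 < quantity <= STOCK):
        raise ValueError("rejected by server-side validation")
    return round(quantity * PRICE, 2)   # price calculation stays server-side

if client_side_validate("3"):
    print(server_side_place_order(3))   # 29.97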
Data Layer Partitioning:
Client-Side Data Management: In this approach, clients manage local data
storage, such as browser-based local storage, to reduce the need for frequent
server requests. This is commonly used in progressive web apps (PWAs) and
mobile applications.
Server-Side Data Management: Databases and data-related operations are
typically handled on the server. Clients send requests to retrieve or update data.
Load Balancing and Scaling:
Load balancing can distribute client requests across multiple server instances.
Application partitioning may involve replicating and distributing application
components across different server nodes to ensure scalability and high
availability.
Microservices:
In a microservices architecture, an application is partitioned into a collection of
small, loosely-coupled services that can be developed, deployed, and scaled
independently. Each microservice can have its own client-server structure.
Service-Oriented Architecture (SOA):
SOA divides an application into a set of services that are accessible over a
network. These services have well-defined interfaces and can be used by
various clients.
Caching and Content Delivery:
Content delivery networks (CDNs) and caching mechanisms can be used to
reduce server load by caching static content or frequently accessed data near the
client.
Real-time Communication:
Applications that require real-time capabilities often employ technologies like
WebSockets to enable bidirectional communication between clients and servers
for activities like chat, notifications, and collaborative editing.
Remote Procedure Call (RPC):
In an RPC-based system, client and server communicate by invoking remote
procedures or methods. This can help partition the application's logic and
execution between clients and servers.
Asynchronous Processing:
Background tasks and processing can be offloaded to server-side workers,
allowing clients to receive quick responses while resource-intensive operations
are executed asynchronously on the server.
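A compact Python sketch of this pattern, using only the standard library, is given below: the client enqueues a job, receives an immediate acknowledgement, and a background worker processes the job asynchronously.

import queue, threading, time

tasks = queue.Queue()

def worker():
    while True:
        job = tasks.get()
        time.sleep(0.1)                     # stands in for a slow operation
        print("finished:", job)
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

tasks.put("generate monthly report")        # offload the heavy work...
print("accepted")                           # ...and respond to the client at once
tasks.join()                                # wait only so the demo can exit cleanly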
The choice of how to partition an application in a client-server environment
depends on factors like performance requirements, data security,
maintainability, and the nature of the application. It's essential to carefully
design and plan application partitioning to achieve the desired balance of
functionality and performance.
Communication between clients and servers
Communication between clients and servers in a client-server computing model
is a fundamental aspect of distributed systems. It involves the exchange of data,
requests, and responses over a network. There are several communication
methods and protocols that facilitate this interaction:
Network Protocols:
TCP/IP (Transmission Control Protocol/Internet Protocol): TCP/IP is the
foundation of most internet communications. It provides reliable, connection-
oriented communication. It is commonly used for applications like web
browsing, email, and file transfer.
HTTP/HTTPS (Hypertext Transfer Protocol/Secure Hypertext Transfer
Protocol): HTTP is used for communication between web clients (browsers)
and web servers. HTTPS adds security through encryption, making it suitable
for sensitive data transmission.
FTP (File Transfer Protocol): FTP is a protocol for transferring files between
clients and servers. It is often used for uploading and downloading files to and
from a server.
SMTP/POP/IMAP (Simple Mail Transfer Protocol/Post Office
Protocol/Internet Message Access Protocol): These are email communication
protocols used for sending and receiving email messages.
DNS (Domain Name System): DNS resolves domain names to IP addresses,
facilitating the location of servers on the internet.
SNMP (Simple Network Management Protocol): SNMP is used for
managing and monitoring network devices, such as routers and switches.
Web-Based Communication:
REST (Representational State Transfer): REST is an architectural style for
designing networked applications. It uses HTTP methods (GET, POST, PUT,
DELETE) to perform CRUD (Create, Read, Update, Delete) operations on
resources, making it common for web services.
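As a hedged illustration, the Python snippet below builds REST-style GET and POST requests with the standard library; the URL is a placeholder, so the actual network calls are left commented out.

import json
import urllib.request

# GET: read a resource (the "Read" in CRUD).
get_req = urllib.request.Request(
    "https://example.com/api/items/42", method="GET")

# POST: create a resource (the "Create" in CRUD).
body = json.dumps({"name": "widget"}).encode("utf-8")
post_req = urllib.request.Request(
    "https://example.com/api/items",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Against a real endpoint, a request would be sent like this:
# with urllib.request.urlopen(get_req) as resp:
#     item = json.loads(resp.read())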
SOAP (Simple Object Access Protocol): SOAP is a protocol for exchanging
structured information in the implementation of web services. It can use various
transport protocols, including HTTP and SMTP.
WebSockets: WebSockets enable full-duplex, bidirectional communication
between clients and servers. They are used for real-time applications like chat,
gaming, and collaborative tools.
Remote Procedure Call (RPC):
gRPC: gRPC is an open-source RPC framework that uses HTTP/2 for
communication. It allows clients to call methods on a server as if they were
local procedures. It's often used in microservices architectures.
XML-RPC and JSON-RPC: These are protocols that allow remote procedure
calls using XML or JSON as the message format.
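The following minimal Python sketch demonstrates the idea with the standard library's xmlrpc modules; the add procedure and port number are illustrative.

import threading, xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

server = SimpleXMLRPCServer(("127.0.0.1", 8000), logRequests=False)
server.register_function(lambda a, b: a + b, "add")   # the exposed remote procedure
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8000/")
print(proxy.add(2, 3))   # 5 -- invoked as if it were a local procedure
server.shutdown()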
Message Queues:
MQTT (Message Queuing Telemetry Transport): MQTT is a lightweight and
efficient protocol for publish-subscribe messaging. It is commonly used in IoT
applications.
AMQP (Advanced Message Queuing Protocol): AMQP is an open standard
for message-oriented middleware, providing queuing and routing features.
Custom Protocols:
In some cases, applications may use custom communication protocols tailored
to their specific needs.
Security Protocols:
SSL/TLS (Secure Sockets Layer/Transport Layer Security): These protocols
add encryption and secure authentication to communication, ensuring data
privacy and integrity.
Authentication and Authorization:
Various mechanisms, such as tokens, OAuth, and API keys, are used to
authenticate clients and authorize their access to server resources.
Asynchronous Communication:
In scenarios where immediate responses are not required, asynchronous
communication can be implemented using message queues, publish-subscribe
models, or callback mechanisms.
Effective communication between clients and servers is essential for building
reliable, secure, and efficient distributed systems. The choice of communication
method and protocol depends on the specific requirements and constraints of the
application, including performance, security, and scalability considerations.