Advanced Database Systems
Spring 2025
Lecture #18:
Transactions
R&G: Chapters 16 & 17
2
A RCHITECTURE OF A DBMS
Up until now we have assumed a SQ L C lie nt
single-user architecture and failure-
Q uer y Pl anning
free execution
O pera tor Exe cuti on
F ile s & Index Mana ge me nt
C onc urre ncy Control
Buf fe r Mana ge me nt
Re cove r y
Di sk S pace Ma nag em e nt
Da taba se
3
M OTIVATION
We both change the same record in
Concurrency Control
a table at the same time.
How to avoid race condition?
You transfer £100 between bank accounts
but there is a power failure. Recovery
What is the correct database state?
Both concurrency control and recovery are based
on a concept of transactions with ACID properties
4
T RANSACTIONS
A transaction is the execution of a
Transaction 1 Transaction 2 Transaction 3
sequence of operations (e.g., SQL BEGIN XACT BEGIN XACT BEGIN XACT
queries) on a shared database to SELECT INSERT DELETE
INSERT DELETE DELETE
perform some higher-level function DELETE ROLLBACK COMMIT
COMMIT
Basic unit of change in a DBMS
Partial transactions are not allowed!
5
U SER P ERSPECTIVE : T RANSACTIONS
Transaction (abbr. txn) = group of operations the user wants
the DBMS to treat “as one”
A new transaction starts with the BEGIN command
The transaction stops with either COMMIT or ABORT (ROLLBACK)
If commits, all changes are saved
If aborts, all changes are undone (as if the txn never executed at all)
Abort can be either self-inflicted or caused by DBMS
6
T RANSACTION E XAMPLE
Transfer £100 from Checking to Savings account of user 1904
BEGIN Consistent DB
// check if Checking balance > 100
UPDATE Accounts
SET balance = balance – 100
WHERE customer_id = 1904
AND account_type = ‘Checking’;
UPDATE Accounts
Temporary inconsistent DB
SET balance = balance + 100
WHERE customer_id = 1904
AND account_type = ‘Savings’;
COMMIT Consistent DB
7
T RANSACTION E XAMPLE
Transfer £100 from Checking to Savings account of user 1904
BEGIN How to check if balance > 100?
// check if Checking balance > 100
UPDATE Accounts Outside DBMS using another language
SET balance = balance – 100
WHERE customer_id = 1904 E.g., in Java or PHP code
AND account_type = ‘Checking’;
UPDATE Accounts Inside DBMS using stored procedures
SET balance = balance + 100 expressed in PL/SQL or T-SQL
WHERE customer_id = 1904
AND account_type = ‘Savings’; PL/SQL = SQL + procedural constructs such as
COMMIT if-then-else, loops, variables, functions…
8
D ATABASE P ERSPECTIVE
A transaction may carry out many operations on the data
retrieved from the database
However, the DBMS is only concerned about what data is
read/written from/to the database
Changes to the “outside world” are beyond scope of the DBMS
9
T RANSACTIONS : F ORMAL D EFINITION
Database = fixed set of named data objects (A, B, C, …)
Transactions access object A using read A and write A, for short R(A) and W(A)
In a relational DBMS, an object can be an attribute, record, page, or table
Transaction = sequence of read and write operations
T = ⟨ R(A), W(A), W(B), … ⟩
DBMS’s abstract view of a user program
10
S TRAWMAN E XECUTION
Execute each txn one-by-one (serial order) as they arrive in the DBMS
One and only one txn can be running at the same time in the DBMS
Before a txn starts, copy the entire database to a new file and make all
changes to that file
If the txn completes successfully, overwrite the original file with the new one
If the txn fails, just remove the dirty copy
SQLite executes transactions in serial order
11
C ONCURRENT E XECUTION
A better approach is to allow concurrent execution of independent transactions
Why do we want that? Txn 1 Txn 2 Txn 3
Better resource utilization and throughput (txns/sec) 3 3 3
2 2 2
Use the CPU while another txn is waiting for the disk
1 1 1
Multicore: Ideally, scale throughput in the # of CPUs
SCHEDULER
Decreased response times to users
One txn’s latency need not be dependent on another unrelated txn 1
Or that’s the hope 1
2
But we also would like correctness and fairness 1
EXECUTOR
12
T RANSACTION G UARANTEES : ACID
Atomicity: All actions in the txn happen, or none happen
“all or nothing”
Consistency: If each txn is consistent and the DB starts
consistent, then it ends up consistent
“it looks correct to me”
Isolation: Execution of one txn is isolated from that of other txns
“as if alone”
Durability: If a txn commits, its effects persist
“survive failures”
13
ACID P ROPERTIES : ATOMICITY
Two possible outcomes of executing a transaction:
Commit after completing all actions
Abort (or be aborted by the DBMS) after executing some actions
The DBMS guarantees that transactions are atomic
From user’s point of view:
A transaction always either executes all its actions or executes no actions at all
Example:
Take £100 from account A, but then a power failure happens before crediting account B
When the DBMS comes back online, what should be the correct state of the database?
14
M ECHANISMS FOR E NSURING ATOMICITY
Approach #1: Logging
DBMS logs all actions so that it can undo the actions of aborted transactions
Write-ahead logging is used by almost all modern database systems
Efficiency reasons: random writes turned into sequential writes through a log
Audit trail: everything done by the app is recorded
Approach #2: Shadow Paging (copy-on-write)
DBMS makes copies of pages and transactions make changes to those copies
Only when the transaction commits is the page made visible to others
Few database systems do this (CouchDB, LMDB)
15
ACID P ROPERTIES : C ONSISTENCY
Database consistency
The database accurately models the real world and follows integrity constraints
Transactions in the future see the effects of transactions committed in the past
Transaction consistency
If the database is consistent before the txn starts (running alone), it will be also consistent after
Transaction consistency is the application’s responsibility!
16
ACID P ROPERTIES : I SOLATION
Users submit transactions, and each transaction executes as
if it was running alone
The DBMS achieves concurrency by interleaving actions
(read/writes of database objects) of various transactions
How do we achieve this?
17
M ECHANISMS FOR E NSURING I SOLATION
A concurrency control protocol is how the DBMS decides the proper
interleaving of operations from multiple transactions
Two main categories:
Pessimistic: Don’t let problems arise in the first place
Optimistic: Assume conflicts are rare, deal with them after they happen
18
E XAMPLE
Assume at first accounts A and B each have £1000
T1 transfers £100 from A to B
T2 credits both accounts with 6% interest
T1 T2
BEGIN BEGIN
A = A - 100 A = A * 1.06
B = B + 100 B = B * 1.06
END END
20
E XAMPLE
Assume at first accounts A and B each have £1000
What are the possible outcomes of running T1 and T2?
Many! But A+B should be 2000 * 1.06 = 2120
There is no guarantee that T1 will execute before T2 or vice versa,
if both are submitted together
But the net effect must be equivalent to these two transactions running
serially in some order
22
E XAMPLE : S ERIAL E XECUTION
Schedule Schedule
T1 T2 T1 T2
BEGIN BEGIN
A = A - 100 A = A * 1.06
B = B + 100 B = B * 1.06
TIME
COMMIT
BEGIN
A = A * 1.06
≡ BEGIN
A = A - 100
COMMIT
B = B * 1.06 B = B + 100
COMMIT COMMIT
A = 954, B = 1166 A = 960, B = 1160
A+B = 2120
24
E XAMPLE : I NTERLEAVING (G OOD )
Schedule Schedule
T1 T2 T1 T2
BEGIN BEGIN
A = A - 100 A = A - 100
BEGIN B = B + 100
TIME
B = B + 100
COMMIT
A = A * 1.06
≡ COMMIT
BEGIN
A = A * 1.06
B = B * 1.06 B = B * 1.06
COMMIT COMMIT
A = 954, B = 1166 A = 954, B = 1166
25
E XAMPLE : I NTERLEAVING (B AD )
Schedule
T1 T2
BEGIN
A = A - 100
BEGIN A = 954, B = 1166
TIME
A = A * 1.06
B = B * 1.06
COMMIT
≢ or
B = B + 100 A = 960, B = 1160
COMMIT
A = 954, B = 1160
The bank is missing £6!
A+B = 2114
26
E XAMPLE : I NTERLEAVING (B AD )
Schedule DBMS View
T1 T2 T1 T2
BEGIN BEGIN
A = A - 100 R(A)
BEGIN W(A)
TIME
A = A * 1.06 BEGIN
B = B * 1.06 R(A)
COMMIT W(A)
B = B + 100 R(B)
COMMIT W(B)
COMMIT
A = 954, B = 1160 R(B)
W(B)
COMMIT
A+B = 2114
27
C ORRECTNESS
How do we judge whether a schedule is correct?
If the schedule is equivalent to some serial execution
Schedule S for a set of transactions { T1, … , Tn }
S contains all steps of all transactions and order among steps in each Ti is preserved
S = ⟨ (T1, read B), (T2, read A), (T2, write B), (T1, write A) ⟩
for short, S = ⟨ R1(B), R2(A), W2(B), W1(A) ⟩
28
F ORMAL P ROPERTIES OF S CHEDULES
Equivalent schedules
For any database state, the effect of executing the first schedule is identical
to the effect of executing the second schedule
Does not matter what the higher-level operations are!
Serial schedule (no concurrency)
A schedule that does not interleave the actions of different transactions
Txn 2 Txn 1 Txn 3
1 2 3 1 2 3 1 2 3
TIME
29
F ORMAL P ROPERTIES OF S CHEDULES
Serializable schedule
A schedule that is equivalent to some serial execution of the transactions
If each transaction preserves consistency, every serializable
schedule preserves consistency
Serializability
Less intuitive notion of correctness compared to transaction initiation time
or commit order
But it provides the DBMS with flexibility in scheduling operations
More flexibility means better parallelism
30
C ONFLICTING O PERATIONS
We need a formal notion of equivalence that can be implemented
efficiently based on the notion of “conflicting” operations
Two operations conflict if
They are by different transactions
They are on the same object and at least one of them is a write
Interleaved execution anomalies:
Read-Write conflicts (R-W)
Write-Read conflicts (W-R)
Write-Write conflicts (W-W)
31
R EAD -W RITE C ONFLICTS
Unrepeatable Reads
T1 T2
BEGIN
£10 R(A)
BEGIN
R(A) £10
W(A) £19
COMMIT
£19 R(A)
COMMIT
32
W RITE -R EAD C ONFLICTS
Reading Uncommitted Data (“Dirty Reads”)
T1 T2
BEGIN
£10 R(A)
£12 W(A)
BEGIN
R(A) £12
W(A) £14
COMMIT
ABORT
Not recoverable
33
W RITE -W RITE C ONFLICTS
Overwriting Uncommitted Data (”Lost Update”)
T1 T2
BEGIN
£10 W(A)
BEGIN
W(A) £19
W(B) Alice
COMMIT
Michael W(B)
COMMIT
34
F ORMAL P ROPERTIES OF S CHEDULES
Given these conflicts, we can now understand what it means
for a schedule to be serializable
This is to check whether schedules are correct
This is not how to generate a correct schedule
There are levels of serializability
Conflict Serializability
Most DBMS try to support this
View Serializability
No DBMS supports this
35
C ONFLICT S ERIALIZABLE S CHEDULES
Two schedules are conflict equivalent iff
They involve the same actions of the same transactions
Every pair of conflicting actions is ordered in the same way
Schedule S is conflict serializable if S is conflict equivalent to some serial
schedule
Intuition: Schedule S is conflict serializable if you can transform S into a serial schedule by
swapping consecutive non-conflicting operations of different txns
36
C ONFLICT S ERIALIZABILITY : I NTUITION
Schedule
T1 T2
BEGIN BEGIN
R(A)
W(A)
TIME
R(A)
W(A)
R(B)
W(B)
COMMIT
R(B)
W(B)
COMMIT
37
C ONFLICT S ERIALIZABILITY : I NTUITION
Schedule
T1 T2
BEGIN BEGIN
R(A)
W(A)
TIME
R(A)
R(B)
W(A)
W(B)
COMMIT
R(B)
W(B)
COMMIT
38
C ONFLICT S ERIALIZABILITY : I NTUITION
Schedule
T1 T2
BEGIN BEGIN
R(A)
W(A)
TIME
R(B)
R(A)
W(A)
W(B)
COMMIT
R(B)
W(B)
COMMIT
39
C ONFLICT S ERIALIZABILITY : I NTUITION
Schedule
T1 T2
BEGIN BEGIN
R(A)
W(A)
TIME
R(B)
R(A)
W(B)
W(A)
COMMIT
R(B)
W(B)
COMMIT
40
C ONFLICT S ERIALIZABILITY : I NTUITION
Schedule Serial schedule
T1 T2 T1 T2
BEGIN BEGIN BEGIN
R(A) R(A)
W(A) W(A)
≡
TIME
R(B) R(B)
W(B) W(B)
R(A) COMMIT BEGIN
W(A) R(A)
COMMIT W(A)
R(B) R(B)
W(B) W(B)
COMMIT COMMIT
Serializable
41
C ONFLICT S ERIALIZABILITY : I NTUITION
Schedule Serial schedule
T1 T2 T1 T2
BEGIN BEGIN BEGIN
TIME
R(A) R(A)
R(A) W(A)
W(A)
W(A)
≢ COMMIT BEGIN
R(A)
COMMIT COMMIT W(A)
COMMIT
Not conflict-serializable
42
S ERIALIZABILITY
Swapping operations is easy when there are only two txns in the schedule
But it’s cumbersome when there are many txns
Are there any faster algorithms to figure this out other than transposing
operations?
43
D EPENDENCY G RAPHS
Dependency Graph
Dependency graph for a schedule
Ti Tj
One node per transaction
Edge from Ti to Tj if:
Operation Oi of Ti conflicts with an operation Oj of Tj and
Oi appears earlier in the schedule than Oj
Also known as a conflict graph or precedence graph
A schedule is conflict-serializable if and only if
its dependency graph is acyclic
Equivalent serial schedule can be obtained by sorting the graph topologically
44
E XAMPLE #1
Schedule Dependency Graph
T1 T2
BEGIN BEGIN
R(A) T1 T2
W(A)
TIME
R(A)
W(A)
R(B)
W(B)
COMMIT
R(B) The cycle in the graph reveals
W(B) the problem. The output of T1
COMMIT depends on T2, and vice versa.
45
E XAMPLE #2 - T HREESOME
Schedule Dependency Graph
T1 T2 T3
T1 T2
BEGIN
R(A)
W(A) BEGIN
T3
TIME
R(A)
W(A)
BEGIN COMMIT
R(B)
W(B)
R(B) COMMIT Is this equivalent to a serial schedule?
W(B)
COMMIT
Yes, (T2, T1, T3)
Notice that T3 should go after T2 although
T3 starts before T2!
48
V IEW S ERIALIZABILITY
Alternative (weaker) notion of serializability
Schedule S1 and S2 are view equivalent iff
If T1 reads initial value of A in S1, then T1 also reads initial value of A in S2
If T1 reads value of A written by T2 in S1, then T1 also reads value of A
written by T2 in S2
If T1 writes final value of A in S1, then T1 also writes final value of A in S2
49
V IEW S ERIALIZABILITY
Schedule Dependency Graph
T1 T2 T3
BEGIN
R(A) BEGIN T1 T2
W(A)
TIME
BEGIN
W(A)
W(A)
COMMIT COMMIT COMMIT
T3
Not conflict serializable. But is this
equivalent to a serial schedule?
50
V IEW S ERIALIZABILITY
Schedule Serial Schedule
T1 T2 T3 T1 T2 T3
BEGIN BEGIN
R(A) BEGIN R(A)
W(A) W(A)
TIME
BEGIN VIEW COMMIT
W(A)
COMMIT COMMIT
W(A)
COMMIT
≡ BEGIN
W(A)
COMMIT
BEGIN
W(A)
COMMIT
Allows all conflict serializable
schedules + “blind writes”
51
S ERIALIZABILITY
Conflict serializability All Schedules
Can enforced efficiently
View Serializable
All DBMSs support it
Conflict Serializable
View serializability
Admits (slightly) more schedules than CS Serial
But it is difficult to enforce efficiently
No DBMS supports it
Neither definition allows all “serializable” schedules
They do not understand the meaning of the operations or the data
52
ACID P ROPERTIES : D URABILITY
All of the changes of committed transactions must be persistent
No torn updates
No changes from failed transactions
The DBMS uses either logging or shadow paging to ensure
that all changes are durable
More about logging in next lectures
53
S UMMARY
ACID Transactions Serializability
Atomicity: All or nothing Serializable schedules
Consistency: Only valid data Conflict & view serializability
Isolation: No interference Checking for conflict serializability
Durability: Committed data persists
Concurrency control and recovery are among the most important functions
provided by a DBMS