Unit Iv
Unit Iv
Unit Lecture
4. CONSENSUS AND RECOVERY 29
No No
Topic Consensus and Agreement Algorithms: Problem Definition
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 Compare Byzantine agreement and consensus problems K2
LO2 Explain the concept of "validity" in the context of agreement K2
problems.
LO3 Discuss the Byzantine agreement problem in detail, including its K3
definition, requirements (agreement, validity, termination), and
the challenges posed by Byzantine failures.
LO4 Analyze the consensus problem by explaining its key K3
components (agreement, validity, termination), the types of
failure models it can handle, and common algorithms used to
achieve consensus in distributed systems.
Problem definition
Agreement among the processes in a distributed system is a fundamental requirement for a wide range
of applications. Many forms of coordination require the processes to exchange information to negotiate
with one another and eventually reach a common understanding or agreement, before taking
application-specific actions. A classical example is that of the commit decision in database systems,
wherein the processes collectively decide whether to commit or abort a transaction that they participate
in.
We first state some assumptions underlying our study of agreement algorithms:
• Failure models Among the n processes in the system, at most f processes can be faulty. A faulty
process can behave in any manner allowed by the failure model assumed. The various failure models –
fail-stop, send omission and receive omission, and Byzantine failures.
• Agreement All non-faulty processes must agree on the same (single) value.
• Validity If all the non-faulty processes have the same initial value, then the agreed uponvalue by
all the non-faulty processes must be that same value.
• Termination Each non-faulty process must eventually decide on a value.
The interactive consistency problem
The interactive consistency problem differs from the Byzantine agreement problem in that each process
has an initial value, and all the correct processes must agree upon a set ofvalues, with one value for
each process
• Agreement All non-faulty processes must agree on the same array of values A[v1…vn]
• Validity If process i is non-faulty and its initial value is vi, then all non faulty processes agree on
vi as the ith element of the array A. If process j is faulty, then the non-faulty processes can agree on
any value for A[j].
• Termination Each non-faulty process must eventually decide on the array A
Bloom’s
Qn
Question Answer Knowledge
No
Level
1. What is the main goal of the Byzantine agreement problem? B K1
A) Agreement
B) Validity
C) Performance
D) Termination
A) A single value B
B) A set of values corresponding to each process
C) The maximum value from all processes
D) A random value chosen by a faulty process
A) n−1
B) f
C) (n−1)/3
D) n/2
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 What are the two main differences between Byzantine 2 CO4 K1
agreement and consensus problems?
2 Explain the concept of "validity" in the context of 2 CO4 K1
agreement problems.
3 What role does authenticated messaging play in solving 2 CO4 K1
agreement problems in distributed systems?
4 Discuss the Byzantine agreement problem in detail, 15 CO4 K3
including its definition, requirements (agreement, validity,
termination), and the challenges posed by Byzantine
failures. Explain one algorithm used to solve the
Byzantine agreement problem.
5 Analyze the consensus problem by explaining its key 15 CO4 K3
components (agreement, validity, and termination), the
types of failure models it can handle, and common
algorithms used to achieve consensus in distributed
systems. Provide examples of scenarios where consensus
is critical.
Reference Book
Unit Lecture
4. CONSENSUS AND RECOVERY 30
No No
Topic Overview of Results
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 Explain the relationship between the consensus problem and the K2
attainability of common knowledge in both synchronous and
asynchronous systems. Discuss the implications of various
failure modes, including crash and Byzantine failures, on the
ability to achieve consensus.
LO2 Discuss the different variants of the consensus problem, K2
including reliable broadcast, k-set consensus, C-agreement, and
renaming. Describe their definitions, the failure models they
address, and the specific conditions required for each variant to
be solvable.
Overview of results:
It is worth understanding the relation between the consensus problem and the problem of attaining
common knowledge of the agreement value. For the “no failure” case, consensus is attainable. Further in the
synchronous system, common knowledge of the consensus value is also attainable. Whereas in the
asynchronous system, concurrent common knowledge of the consensus is attainable.
f < n processes
Byzantine agreement attainable agreement not attainable
Failure f ≤ [(n - 1)/3] Byzantine processes
The overhead bounds are for the given algorithms, and not necessarily tight bounds for the problem.
k-set Crash Failure, f<k<n.(MP and Size of the set of values agreed
consensus SM) upon must be less than k
Bloom’s
Qn
Question Answer Knowledge
No
Level
1. What is the primary condition under which consensus is C K1
attainable in a synchronous system with no failures?
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 In a synchronous system with no failures, what can be 2 CO4 K2
said about the attainability of both agreement and
common knowledge of the consensus value?
2 Define the k-set consensus problem and its condition 2 CO4 K2
regarding the size of the set of values that must be agreed
upon.
3 What is the maximum number of Byzantine processes f 2 CO4 K1
that can be tolerated in order to achieve agreement in the
Byzantine agreement problem?
4 Explain the relationship between the consensus problem 15 CO4 K2
and the attainability of common knowledge in both
synchronous and asynchronous systems. Discuss the
implications of various failure modes, including crash and
Byzantine failures, on the ability to achieve consensus.
5 Discuss the different variants of the consensus problem, 15 CO4 K2
including reliable broadcast, k-set consensus, C-
agreement, and renaming. Describe their definitions, the
failure models they address, and the specific conditions
required for each variant to be solvable.
Reference Book
Bloom’s
Qn
Question Answer Knowledge
No
Level
1. B K1
In a failure-free system, how is consensus typically reached?
2 C K1
What is an example of a decision function used in consensus
algorithms in distributed systems?
a) Sum function
b) Product function
c) Max function
d) Logarithmic function
3 B K1
In a synchronous system, consensus can be reached in how
many rounds?
a) Infinite rounds
b) Constant number of rounds
c) One round
d) Varying number of rounds based on failure
4 A K1
Which of the following algorithms could be used for
consensus in a distributed system?
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 In a failure-free system, how can consensus be reached 2 CO4 K1
among processes?
2 What is one advantage of using a synchronous system for 2 CO4 K1
reaching consensus compared to an asynchronous system?
3 What additional capability can be obtained in a 2 CO4 K1
synchronous system after reaching a consensus?
4 Discuss the methods used to achieve consensus in a 15 CO4 K2
failure-free system. Include a description of various
algorithms and functions that can be applied, and explain
the differences in approach between synchronous and
asynchronous systems.
5 Evaluate the impact of different logical topologies on 15 CO4 K3
consensus algorithms in a failure-free system. Discuss
how specific topologies might influence the efficiency and
effectiveness of reaching consensus in both synchronous
and asynchronous environments.
Reference Book
Unit Lecture
4. CONSENSUS AND RECOVERY 32
No No
Topic Agreement in Synchronous Systems with Failures
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 Explain in detail the consensus algorithm for crash failures in a K3
synchronous system. Discuss its working mechanism, conditions
for correctness, and its complexity.
LO2 Describe the Phase King algorithm used for Byzantine failures in K3
synchronous systems. Discuss how it ensures consensus despite
faulty processes and provide an analysis of its correctness and
complexity.
• The agreement condition is satisfied because in the f+ 1 rounds, there must be at least one round in which
no process failed.
• In this round, say round r, all the processes that have not failed so far succeed in broadcasting their
values, and all these processes take the minimum of the values broadcast and received in that round.
• Thus, the local values at the end of the round are the same, say x r i for all non-failed processes.
• In further rounds, only this value may be sent by each process at most once, and no process i will update
its value x r i.
• The validity condition is satisfied because processes do not send fictitious values in this failure model.
• For all i, if the initial value is identical, then the only value sent by any process is the value that has been
agreed upon as per the agreement condition.
• The termination condition is seen to be satisfied.
Complexity: The complexity of this particular algorithm is it requires f + 1 rounds where f < n and the
number of messages is O(n2)in each round and each message has one integers hence the total number
of messages is O((f+1)· n 2 ) is the total number of rounds and in each round n 2 messages are required.
2 in 2nd round, the "Phase king" process arrives at an estimate based on thevalues
it received in 1st round, and broadcasts its new estimate to all others.
Fig. Message pattern for the phase-king algorithm.
(f + 1) phases, (f + 1)[(n - 1)(n + 1)] messages, and can tolerate up to f < dn=4e
malicious processes
Correctness Argument
Pi and Pj use their own majority values. Pi 's mult > n=2 + f )
Pi uses its majority value; Pj uses phase-king's tie-breaker value. (Pi’s mult >
n=2 +f , Pj 's mult > n=2 for same value)
Pi and Pj use the phase-king's tie-breaker value. (In the phase in which Pkis
non- malicious, it sends same value to Pi and Pj )
In all 3 cases, argue that Pi and Pj end up with same value as estimate
If all non-malicious processes have the value x at the start of a phase, they
will continue to have x as the consensus value at the end of the phase.
Qn Bloom’s
Question Answer
No Knowledge Level
1 C K1
In a crash failure model, how many rounds are required
to tolerate up to f failures in a synchronous system with n
processes?
a) f−1
b) 2f
c) f+1
d) n−f
2 C K1
In the crash failure consensus algorithm, what value does
each process use to update its local variable at the end of
each round?
3 C K1
Which of the following is true about the Byzantine fault
tolerance in synchronous systems?
4 B K1
In the Phase King algorithm, what happens during the
second round of each phase?
5 C K1
What is the primary method used by processes in the
Phase King algorithm to resolve conflicts and reach
consensus?
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 What is the key difference between crash failures and 2 CO4 K1
Byzantine failures in distributed systems?
2 How many rounds are required to reach consensus in a 2 CO4 K2
crash failure model with up to fff failures in a
synchronous system?
3 Explain in detail the consensus algorithm for crash 15 CO4 K3
failures in a synchronous system. Discuss its working
mechanism, conditions for correctness, and its
complexity.
4 Describe the Phase King algorithm used for Byzantine 15 CO4 K3
failures in synchronous systems. Discuss how it ensures
consensus despite faulty processes and provide an analysis
of its correctness and complexity.
Reference Book
Unit Lecture
4. CONSENSUS AND RECOVERY 33
No No
Topic Check pointing and Rollback Recovery: Introduction
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 Define a domino effect in rollback recovery? K2
LO2 Illustrate key difference between coordinated and uncoordinated K2
check pointing in rollback recovery?
LO3 What does log-based rollback recovery rely on, and why is it K1
useful?
Rollback recovery protocols restore the system back to a consistent state after a failure,
It achieves fault tolerance by periodically saving the state of a process during the failure-
free execution
It treats a distributed system application as a collection of processes that communicate
over a network
Check points
The saved state is called a checkpoint, and the procedure of restarting from a previously check
pointed state is called rollback recovery. A checkpoint can be saved on either the stable storage
or the volatile storage
Why is rollback recovery of distributed systems complicated?
Messages induce inter-process dependencies during failure-free operation
Roll back propagation
The dependencies among messages may force some of the processes that did not fail to roll back.
This phenomenon of cascaded rollback is called the domino effect.
Uncoordinated check pointing
If each process takes its checkpoints independently, then the system cannot avoid the domino
effect – this scheme is called independent or uncoordinated check pointing
Techniques that avoid domino effect
1. Coordinated check pointing rollback recovery - Processes coordinate their checkpoints to form a
system-wide consistent state
2. Communication-induced check pointing rollback recovery - Forces each process to take
checkpoints based on information piggybacked on the application.
Bloom’s
Qn
Question Answer Knowledge
No
Level
1. B K1
What is the purpose of rollback recovery protocols in
distributed systems?
2 C K1
What is the purpose of rollback recovery protocols in
distributed systems?
3 D K1
Which phenomenon occurs when dependencies among
messages force non-failed processes to roll back during
recovery?
a) Deadlock
b) Rollback propagation
c) Message ordering
d) Domino effect
4 C K1
What is the main disadvantage of uncoordinated check pointing
in rollback recovery?
5 C K1
Which rollback recovery technique combines check pointing
with logging of non-deterministic events?
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 Define a domino effect in rollback recovery? 2 CO4 K2
2 Illustrate key difference between coordinated and 2 CO4 K2
uncoordinated check pointing in rollback recovery?
3 What does log-based rollback recovery rely on, and why 2 CO4 K1
is it useful?
Reference Book
Processes cooperate to execute a distributed application and interact with the outside
world by receiving and sending input and output messages, respectively.
Rollback-recovery protocols generally make assumptions about the reliability of theinter-
process communication.
Some protocols assume that the communication uses first-in-first-out (FIFO) order, while
other protocols assume that the communication subsystem can lose, duplicate, or reorder
messages.
Rollback-recovery protocols therefore must maintain information about the internal
interactions among processes and also the external interactions with the outside world.
An example of a distributed system with three processes.
A local checkpoint
A local check point is a snapshot of the state of the process at a given instance
Assumption
Consistent states
The state in fig (a) is consistent and the state in Figure (b) is inconsistent.
Note that the consistent state in Figure (a) shows message m1 to have been sent but not
yet received, but that is alright.
The state in Figure (a) is consistent because it represents a situation in which every
message that has been received, there is a corresponding message send event.
The state in Figure (b) is inconsistent because process P2 is shown to have received m2
but the state of process P1 does not reflect having sent it.
Such a state is impossible in any failure-free, correct computation. Inconsistent states
occur because of failures.
Interactions with outside world
A distributed system often interacts with the outside world to receive input data or deliver the
outcome of a computation. If a failure occurs, the outside world cannot be expected to roll back.
For example, a printer cannot roll back the effects of printing a character
Outside World Process (OWP)
It is a special process that interacts with the rest of the system through message passing.
It is therefore necessary that the outside world see a consistent behavior of the system
despite failures.
Thus, before sending output to the OWP, the system must ensure that the state from
which the output is sent will be recovered despite any future failure.
A common approach is to save each input message on the stable storage before allowing the
application program to process it.
An interaction with the outside world to deliver the outcome of a computation is shown on the
process-line by the symbol “||”.
Different types of Messages
1. In-transit message
2. Lost messages
3. Delayed messages
messages whose “receive‟ is not recorded because the receiving process was
either down or the message arrived after rollback
4. Orphan messages
5. Duplicate messages
In-transit messages
In Figure , the global state {C1,8 , C2, 9 , C3,8, C4,8} shows that message m1 has been sent but
not yet received. We call such a message an in-transit message. Message m2 is also an in-transit
message.
Delayed messages
Messages whose receive are not recorded because the receiving process was either down or the
message arrived after the rollback of the receiving process, are called delayed messages. For
example, messages m2 and m5 in Figure are delayed messages.
Lost messages
Messages whose send is not undone but receive is undone due to rollback are called lost
messages. This type of messages occurs when the process rolls back to a checkpoint prior to
reception of the message while the sender does not rollback beyond the send operation of the
message. In Figure , message m1 is a lost message.
Duplicate messages
Duplicate messages arise due to message logging and replaying during process
recovery. For example, in Figure, message m4 was sent and received before the
rollback. However, due to the rollback of process P4 to C4,8 and process P3 to C3,8,
both send and receipt of message m4 are undone.
When process P3 restarts from C3,8, it will resend message m4.
Bloom’s
Qn
Question Answer Knowledge
No
Level
1 C K1
What is a local checkpoint in a distributed system?
2 B K1
What defines a consistent global checkpoint in a distributed
system?
a) A set of checkpoints taken at the same time by all processes
b) A set of checkpoints in which no messages are sent by a
process after taking its checkpoint that are received by another
process before taking its checkpoint
c) A set of checkpoints that reflect lost messages
d) A set of checkpoints that include delayed messages
3 B K1
What is an in-transit message in the context of rollback
recovery?
4 B K1
What are orphan messages in a rollback recovery protocol?
5 C K1
Which type of message arises due to message logging and
replaying during process recovery?
a) Delayed messages
b) Lost messages
c) Duplicate messages
d) Orphan messages
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 Give the difference between a consistent and an 2 CO4 K2
inconsistent global state in a distributed system?
2 Define an orphan message in rollback recovery protocols? 2 CO4 K2
3 Explain the concept of check pointing and rollback 15 CO4 K3
recovery in distributed systems. Discuss the different
types of check pointing strategies (coordinated,
uncoordinated, and communication-induced) and their
impact on system consistency and recovery.
4 Discuss the types of messages (in-transit, lost, delayed, 15 CO4 K3
orphan, and duplicate) that arise in rollback recovery
protocols. Explain how each type of message can affect
the recovery process and system consistency, and describe
strategies for handling these message types.
Reference Book
Unit Lecture
4. CONSENSUS AND RECOVERY 35
No No
Topic Issues in Failure Recovery
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 Define an orphan message in the context of failure recovery? K2
LO2 Discuss the challenges in failure recovery within a distributed K2
system.
In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery
• The computation comprises of three processes Pi, Pj , and Pk, connected through a
communication network. The processes communicate solely by exchanging messages over fault
free, FIFO communication channels.
• Processes Pi, Pj , and Pk, have taken checkpoints {Ci,0, Ci,1}, {Cj,0, Cj,1, Cj,2}, and {Ck,0,
Ck,1}, respectively, and these processes have exchanged messages A to J
Suppose process Pi fails at the instance indicated in the figure. All the contents of the volatile
memory of Pi are lost and, after Pi has recovered from the failure, the system needs to be
restored to a consistent global state from where the processes can resume their execution.
• Process Pi’s state is restored to a valid state by rolling it back to its most recent checkpoint
Ci,1. To restore the system to a consistent state, the process Pj rolls back to checkpoint Cj,1
because the rollback of process Pi to checkpoint Ci,1 created an orphan message H (the receive
event of H is recorded at process Pj while the send event of H has been undone at process Pi).
• Pj does not roll back to checkpoint Cj,2 but to checkpoint Cj,1. An orphan message I is created
due to the roll back of process Pj to checkpoint Cj,1. To eliminate this orphan message, process
Pk rolls back to checkpoint Ck,1.
• Messages C, D, E, and F are potentially problematic. Message C is in transit during the failure
and it is a delayed message. The delayed message C has several possibilities: C might arrive at
process Pi before it recovers, it might arrive while Pi is recovering, or it might arrive after Pi has
completed recovery. Each of these cases must be dealt with correctly.
• Message D is a lost message since the send event for D is recorded in the restored state for
process Pj , but the receive event has been undone at process Pi. Process Pj will not resend D
without an additional mechanism.
• Messages E and F are delayed orphan messages and pose perhaps the most serious problem of
all the messages. When messages E and F arrive at their respective destinations, they must be
discarded since their send events have been undone. Processes, after resuming execution from
their checkpoints, will generate both of these messages.
• Lost messages like D can be handled by having processes keep a message log of all the sent
messages. So when a process restores to a checkpoint, it replays the messages from its log to
handle the lost message problem.
• Overlapping failures further complicate the recovery process. If overlapping failures are to be
tolerated, a mechanism must be introduced to deal with amnesia and the resulting
inconsistencies.
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 What is an orphan message in the context of failure 2 CO4 K2
recovery?
2 Discuss the challenges in failure recovery within a 15 CO4 K2
distributed system.
Reference Book
Unit Lecture
4. CONSENSUS AND RECOVERY 36
No No
Topic Checkpoint-based Recovery
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 Write about the delayed message in the context of failure K2
recovery?
LO2 Illustrate the difference between autonomous and forced K2
checkpoints in communication-induced checkpointing?
LO3 Describe the process of handling failure recovery in a distributed K3
system, focusing on the challenges posed by messages such as
in-transit, lost, delayed, orphan, and duplicate messages. How
does checkpoint-based recovery mitigate these issues?
LO4 Compare and contrast uncoordinated, coordinated, and K3
communication-induced checkpointing strategies in rollback
recovery. Discuss their impact on runtime overhead, recovery
time, and system consistency.
Checkpoint-based recovery
1. Uncoordinated checkpointing
2. Coordinated checkpointing
3. Communication-induced checkpointing
1. Uncoordinated Checkpointing
Advantages
Disadvantages
The processes record the dependencies among their checkpoints caused by message
exchange during failure-free operation
The following direct dependency tracking technique is commonly used in uncoordinated
checkpointing.
Direct dependency tracking technique
Assume each process 𝑃𝑖 starts its execution with an initial checkpoint 𝐶𝑖,0
When 𝑃𝑗 receives a message m during 𝐼𝑗,𝑦 , it records the dependency from 𝐼𝑖,𝑥 to 𝐼𝑗,𝑦,
which is later saved onto stable storage when 𝑃𝑗 takes 𝐶𝑗,𝑦
When a process receives this message, it stops its execution and replies with the
dependency information saved on the stable storage as well as with the dependency
information, if any, which is associated with its current state.
The initiator then calculates the recovery line based on the global dependency
information and broadcasts a rollback request message containing the recovery line.
Upon receiving this message, a process whose current state belongs to the recovery line
simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by
the recovery line.
2. Coordinated Check pointing
In coordinated check pointing, processes orchestrate their check pointing activities so that all
local checkpoints form a consistent global state
Types
This situation results in an inconsistent checkpoint since checkpoint 𝐶1,𝑥 shows the
receipt of message m from 𝑃0, while checkpoint 𝐶0,𝑥 does not show m being sent from
𝑃0
If channels are FIFO, this problem can be avoided by preceding the first post-checkpoint
message on each channel by a checkpoint request, forcing each process to take a
checkpoint before receiving the first post-checkpoint message
Algorithm
The algorithm consists of two phases. During the first phase, the checkpoint initiator
identifies all processes with which it has communicated since the last checkpoint and
sends them a request.
Upon receiving the request, each process in turn identifies all processes it has
communicated with since the last checkpoint and sends them a request, and so on, until
no more processes can be identified.
During the second phase, all processes identified in the first phase take a checkpoint. The
result is a consistent checkpoint that involves only the participating processes.
In this protocol, after a process takes a checkpoint, it cannot send any message until the
second phase terminates successfully, although receiving a message after the checkpoint
has been taken is allowable.
3. Communication-induced Checkpointing
1. Autonomous checkpoints
2. Forced checkpoints
The checkpoints that a process takes independently are called local checkpoints, while those that
a process is forced to take are called forced checkpoints.
Communication-induced check pointing piggybacks protocol- related information on
1. Model-based checkpointing
2. Index-based checkpointing.
Model-based checkpointing
The MRS (mark, send, and receive) model of Russell avoids the domino effect by
ensuring that within every checkpoint interval all message receiving events precede
all message-sending events.
Index-based checkpointing.
Bloom’s
Qn
Question Answer Knowledge
No
Level
1 A rollback recovery protocol restores the system back to consistent K1
a __________ state after a failure.
2 In a distributed system, a local checkpoint is a snapshot state K1
of the __________ of the process at a given instance.
3 The phenomenon where the rollback of one process domino K1
forces other processes to roll back to maintain
consistency is called the __________ effect.
4 In coordinated checkpointing, processes synchronize consistent K1
their checkpoints to form a __________ global state.
5 A message that has been sent but not yet received is in-transit K1
termed an __________ message.
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 Write about the delayed message in the context of failure 2 CO4 K2
recovery?
2 Illustrate the difference between autonomous and forced CO4 K2
checkpoints in communication-induced checkpointing? 2
3 Describe the process of handling failure recovery in a 15 CO4 K2
distributed system, focusing on the challenges posed by
messages such as in-transit, lost, delayed, orphan, and
duplicate messages. How does checkpoint-based recovery
mitigate these issues?
4 Compare and contrast uncoordinated, coordinated, and 15 CO4 K3
communication-induced checkpointing strategies in
rollback recovery. Discuss their impact on runtime
overhead, recovery time, and system consistency.
Reference Book
Unit Lecture
4. CONSENSUS AND RECOVERY 37
No No
Topic Coordinated Checkpointing Algorithm
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 Give the two types of checkpoints used in the Koo and Toueg K2
checkpointing algorithm?
LO2 Describe, What happens if the initiating process Pi learns that not K2
all processes successfully took tentative checkpoints?
LO3 Explain the two phases of the Koo and Toueg checkpointing K3
algorithm. What are the key actions taken during each phase?
LO4 Discuss the rollback recovery algorithm of Koo and Toueg. K3
Describe its phases and how it ensures a consistent state after a
failure.
The checkpoint algorithm makes the following assumptions about the distributed system:
Assume that end-to-end protocols (the sliding window protocol) exist to handle with
message loss due to rollback recovery and communication failure.
Communication failures do not divide the network.
The checkpoint algorithm takes two kinds of checkpoints on the stable storage: Permanent and
Tentative.
1. An initiating process Pi takes a tentative checkpoint and requests all other processes to
take tentative checkpoints. Each process informs Pi whether it succeeded in taking a
tentative checkpoint.
2. A process says “no” to a request if it fails to take a tentative checkpoint
3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides
that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the
tentative checkpoints should be thrown-away.
Second Phase
1. Pi informs all the processes of the decision it reached at the end of the first phase.
3. Either all or none of the processes advance the checkpoint by taking permanent
checkpoints.
4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot
send messages related to the basic computation until it is informed of Pi’s decision.
Correctness: for two reasons
An Optimization
The above protocol may cause a process to take a checkpoint even when it is not necessary for
consistency. Since taking a checkpoint is an expensive operation, we avoid taking checkpoints.
B. The Rollback Recovery Algorithm
The rollback recovery algorithm restores the system state to a consistent state after a failure. The rollback
recovery algorithm assumes that a single process invokes the algorithm. It assumes that the checkpoint and
the rollback recovery algorithms are not invoked concurrently. The rollback recovery algorithm has two
phases.
First Phase
1. An initiating process Pi sends a message to all other processes to check if they all arewilling to
restart from their previous checkpoints.
2. A process may reply “no” to a restart request due to any reason (e.g., it is already participating in a
check pointing or a recovery process initiated by some other process).
3. If Pi learns that all processes are willing to restart from their previous checkpoints, Pidecides
that all processes should roll back to their previous checkpoints. Otherwise,
4. Pi aborts the roll back attempt and it may attempt a recovery at a later time.
Second Phase
3. During the execution of the recovery algorithm, a process cannot send messages related
to the underlying computation while it is waiting for Pi’s decision.
Correctness: Resume from a consistent state
Optimization: May not to recover all, since some of the processes did not change anything
The above protocol, in the event of failure of process X, the above protocol will requireprocesses X, Y, and Z to
restart from checkpoints x2, y2, and z2, respectively.
Process Z need not roll back because there has been no interaction between process Z and the
other two processes since the last checkpoint at Z.
Bloom’s
Qn
Question Answer Knowledge
No
Level
1 The Koo and Toueg coordinated checkpointing technique aims domino K1
to take a consistent set of checkpoints while avoiding the effect
__________ and livelock problems during recovery.
2 In the checkpointing algorithm, a __________ checkpoint is a tentative K1
temporary checkpoint that becomes permanent upon the
successful completion of the checkpointing process.
3 During the rollback recovery algorithm, if the initiating process Abort K1
Pi learns that not all processes are willing to restart from their
previous checkpoints, it will __________ the rollback attempt.
4 The correctness of the checkpointing algorithm ensures that None K1
either all processes take permanent checkpoints or __________.
5 The rollback recovery algorithm consists of two phases, with restart K1
the first phase involving Pi checking with all other processes if
they are willing to __________ from their previous
checkpoints.
Students have to prepare answers for the following questions at the end of the lecture
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 Give the two types of checkpoints used in the Koo and 2 CO4 K2
Toueg checkpointing algorithm?
2 What happens if the initiating process Pi learns that not all 2 CO4 K2
processes successfully took tentative checkpoints?
3 Explain the two phases of the Koo and Toueg 15 CO4 K3
checkpointing algorithm. What are the key actions taken
during each phase?
4 Discuss the rollback recovery algorithm of Koo and 15 CO4 K3
Toueg. Describe its phases and how it ensures a consistent
state after a failure.
Reference Book
Unit Lecture
4. CONSENSUS AND RECOVERY 38
No No
Topic Algorithm for Asynchronous Checkpointing and Recovery
Bloom’s
Learning Outcome (LO) At the end of this lecture, students will be able to
Knowledge Level
LO1 List the two types of logs maintained in the Juang and Venkatesan K2
algorithm?
LO2 Demonstrate how does the algorithm identify orphan messages K2
during the recovery process?
LO3 Describe the asynchronous checkpointing mechanism in the Juang K2
and Venkatesan algorithm. How does it handle the recording of
events?
LO4 Explain the recovery algorithm in the Juang and Venkatesan K2
approach. What steps are taken when a processor restarts after a
failure?
The algorithm of Juang and Venkatesan for recovery in a system that uses asynchronous check
pointing.
A. System Model and Assumptions
The algorithm makes the following assumptions about the underlying system:
The communication channels are reliable, deliver the messages in FIFO order and have
infinite buffers.
The message transmission delay is arbitrary, but finite.
– Volatile log: short time to access but lost if processor crash. Move to stable log
periodically.
– Stable log: longer time to access but remained if crashed
– After executing an event, the triplet is recorded without any synchronization with
other processes.
– Local checkpoint consist of set of records, first are stored in volatile log, then
moved to stable log.
B. The Recovery Algorithm
Notations and data structure
The following notations and data structure are used by the algorithm:
Since the algorithm is based on asynchronous check pointing, the main issue in the
recovery is to find a consistent set of checkpoints to which the system can be restored.
The recovery algorithm achieves this by making each processor keep track of both the
number of messages it has sent to other processors as well as the number of messages it
has received from other processors.
Whenever a processor rolls back, it is necessary for all other processors to find out if any
message has become an orphan message. Orphan messages are discovered by comparing
the number of messages sent to and received from neighboring processors.
For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (that is, the number of messages received
by processor pi from processor pj is greater than the number of messages sent by processor pj to
processor pi, according to the current states the processors), then one or more messages at
processor pj are orphan messages.
C. The Algorithm
When a processor restarts after a failure, it broadcasts a ROLLBACK message that it had failed
Procedure RollBack_Recovery
STEP (a)
volatile storage.}
end if
STEP (b)
end for
find the latest event e such that RCVDi←j(e) = c {Such an event e may be in the volatile storage
or stable storage.}
CkPti := e
end if
end for
end for{for k}
D. An Example
Bloom’s
Qn
Question Answer Knowledge
No
Level
1 Which of the following best describes a Volatile log in B K1
the Juang and Venkatesan algorithm?
Marks CO Bloom’s
Qn
Question Knowledge
No
Level
1 List the two types of logs maintained in the Juang and 2 CO4 K2
Venkatesan algorithm?
2 Demonstrate how does the algorithm identify orphan 2 CO4 K2
messages during the recovery process?
3 Describe the asynchronous checkpointing mechanism in 15 CO4 K2
the Juang and Venkatesan algorithm. How does it handle
the recording of events?
4 Explain the recovery algorithm in the Juang and 15 CO4 K2
Venkatesan approach. What steps are taken when a
processor restarts after a failure?
Reference Book