UNIT IV CONSENSUS AND RECOVERY

Consensus and Agreement Algorithms: Problem Definition – Overview of Results – Agreement in a Failure-Free System (Synchronous and Asynchronous) – Agreement in Synchronous Systems with Failures; Checkpointing and Rollback Recovery: Introduction – Background and Definitions – Issues in Failure Recovery – Checkpoint-based Recovery – Coordinated Checkpointing Algorithm – Algorithm for Asynchronous Checkpointing and Recovery

Unit No: 4 (CONSENSUS AND RECOVERY) | Lecture No: 29
Topic: Consensus and Agreement Algorithms: Problem Definition
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom’s Knowledge Level in brackets):
LO1 Compare the Byzantine agreement and consensus problems. [K2]
LO2 Explain the concept of "validity" in the context of agreement problems. [K2]
LO3 Discuss the Byzantine agreement problem in detail, including its definition, requirements (agreement, validity, termination), and the challenges posed by Byzantine failures. [K3]
LO4 Analyze the consensus problem by explaining its key components (agreement, validity, termination), the types of failure models it can handle, and common algorithms used to achieve consensus in distributed systems. [K3]

Problem definition
Agreement among the processes in a distributed system is a fundamental requirement for a wide range
of applications. Many forms of coordination require the processes to exchange information to negotiate
with one another and eventually reach a common understanding or agreement, before taking
application-specific actions. A classical example is that of the commit decision in database systems,
wherein the processes collectively decide whether to commit or abort a transaction that they participate
in.
We first state some assumptions underlying our study of agreement algorithms:
• Failure models: Among the n processes in the system, at most f processes can be faulty. A faulty process can behave in any manner allowed by the failure model assumed. The failure models considered are fail-stop, send omission and receive omission, and Byzantine failures.
• Synchronous/asynchronous communication: If a failure-prone process chooses to send a message to process Pi but fails, then Pi cannot detect the non-arrival of the message in an asynchronous system. In a synchronous system, however, the scenario in which a message has not been sent can be recognized by the intended recipient at the end of the round.
• Network connectivity: The system has full logical connectivity, i.e., each process can communicate with any other by direct message passing.
• Sender identification: A process that receives a message always knows the identity of the sender process.
• Channel reliability: The channels are reliable, and only the processes may fail (under one of various failure models).
• Authenticated vs. non-authenticated messages: With unauthenticated messages, when a faulty process relays a message to other processes, (i) it can forge the message and claim that it was received from another process, and (ii) it can also tamper with the contents of a received message before relaying it. When a process receives a message, it has no way to verify its authenticity. An unauthenticated message is also called an oral message or an unsigned message. Using authentication via techniques such as digital signatures, it is easier to solve the agreement problem because, if some process forges a message or tampers with the contents of a received message before relaying it, the recipient can detect the forgery or tampering. Thus, faulty processes can inflict less damage.
• Agreement variable: The agreement variable may be boolean or multivalued, and need not be an integer.

The Byzantine agreement problem

The Byzantine agreement problem requires a designated process, called the source process, with an initial value, to reach agreement with the other processes about its initial value, subject to the following conditions:
• Agreement: All non-faulty processes must agree on the same value.
• Validity: If the source process is non-faulty, then the value agreed upon by all the non-faulty processes must be the same as the initial value of the source.
• Termination: Each non-faulty process must eventually decide on a value.
The validity condition rules out trivial solutions, such as one in which the agreed-upon value is a constant.

The consensus problem
The consensus problem differs from the Byzantine agreement problem in that each process has an initial value and all the correct processes must agree on a single value:
• Agreement: All non-faulty processes must agree on the same (single) value.
• Validity: If all the non-faulty processes have the same initial value, then the value agreed upon by all the non-faulty processes must be that same value.
• Termination: Each non-faulty process must eventually decide on a value.

The interactive consistency problem
The interactive consistency problem differs from the Byzantine agreement problem in that each process has an initial value, and all the correct processes must agree upon a set of values, with one value for each process:
• Agreement: All non-faulty processes must agree on the same array of values A[v1, …, vn].
• Validity: If process i is non-faulty and its initial value is vi, then all non-faulty processes agree on vi as the ith element of the array A. If process j is faulty, then the non-faulty processes can agree on any value for A[j].
• Termination: Each non-faulty process must eventually decide on the array A.
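The three sets of conditions above can be made concrete as predicates over the outcome of a finished run. The Python sketch below is illustrative only: the function names and the data representation (dictionaries mapping process ids to values, with process ids 0 to n − 1 for interactive consistency) are assumptions, not part of the textbook. Termination cannot be judged from a single snapshot, so the sketch simply requires every non-faulty process to have a recorded decision.

```python
def check_byzantine_agreement(source, source_value, decisions, faulty):
    """Agreement and validity for Byzantine agreement with a single source.

    decisions: dict pid -> decided value (must include every non-faulty pid).
    faulty:    set of faulty pids.
    """
    decided = {v for p, v in decisions.items() if p not in faulty}
    agreement = len(decided) == 1
    # Validity only constrains the outcome when the source is non-faulty.
    validity = source in faulty or decided == {source_value}
    return agreement, validity


def check_consensus(initial, decisions, faulty):
    """Agreement and validity for consensus: every process has an initial value."""
    correct = [p for p in initial if p not in faulty]
    decided = {decisions[p] for p in correct}
    inputs = {initial[p] for p in correct}
    agreement = len(decided) == 1
    # Validity only constrains the outcome when all non-faulty inputs are equal.
    validity = len(inputs) > 1 or decided == inputs
    return agreement, validity


def check_interactive_consistency(initial, decisions, faulty):
    """Agreement and validity for interactive consistency.

    initial:   dict pid -> vi, with pids 0..n-1.
    decisions: dict pid -> decided array A (indexable by pid).
    """
    correct = [p for p in initial if p not in faulty]
    arrays = {tuple(decisions[p]) for p in correct}
    agreement = len(arrays) == 1
    # A[i] must equal vi for every non-faulty i; entries A[j] for faulty j are free.
    validity = all(a[i] == initial[i] for a in arrays for i in correct)
    return agreement, validity
```

For example, if correct processes 0 and 1 both start with 1 but both decide 0, `check_consensus` reports that agreement holds while validity is violated.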

Assessment questions to the lecture

(The answer key and Bloom’s Knowledge Level are given in brackets after each question.)

1. What is the main goal of the Byzantine agreement problem? [Answer: B; K1]
A) Ensure all processes have different values
B) Achieve consensus among processes despite faulty ones
C) Guarantee that all processes always terminate
D) Allow faulty processes to influence decisions

2. In the context of the consensus problem, what does the validity condition state? [Answer: B; K1]
A) All non-faulty processes must agree on the initial values of faulty processes.
B) If all non-faulty processes have the same initial value, they must agree on that value.
C) Faulty processes must be excluded from the agreement.
D) All processes can decide any value regardless of their initial values.

3. Which condition is NOT part of the Byzantine agreement problem? [Answer: C; K1]
A) Agreement
B) Validity
C) Performance
D) Termination

4. What does the interactive consistency problem require all non-faulty processes to agree on? [Answer: B; K1]
A) A single value
B) A set of values, one corresponding to each process
C) The maximum value from all processes
D) A random value chosen by a faulty process

5. In a synchronous system with n processes, what is the maximum number of faulty processes f that can be tolerated in Byzantine agreement? [Answer: C; K1]
A) n − 1
B) f
C) ⌊(n − 1)/3⌋
D) n/2

Students have to prepare answers for the following questions at the end of the lecture

(Marks, CO, and Bloom’s Knowledge Level are given in brackets after each question.)
1. What are the two main differences between the Byzantine agreement and consensus problems? [2 marks; CO4; K1]
2. Explain the concept of "validity" in the context of agreement problems. [2 marks; CO4; K1]
3. What role does authenticated messaging play in solving agreement problems in distributed systems? [2 marks; CO4; K1]
4. Discuss the Byzantine agreement problem in detail, including its definition, requirements (agreement, validity, termination), and the challenges posed by Byzantine failures. Explain one algorithm used to solve the Byzantine agreement problem. [15 marks; CO4; K3]
5. Analyze the consensus problem by explaining its key components (agreement, validity, and termination), the types of failure models it can handle, and common algorithms used to achieve consensus in distributed systems. Provide examples of scenarios where consensus is critical. [15 marks; CO4; K3]

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pp. 510-514.

Unit No: 4 (CONSENSUS AND RECOVERY) | Lecture No: 30
Topic: Overview of Results
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom’s Knowledge Level in brackets):
LO1 Explain the relationship between the consensus problem and the attainability of common knowledge in both synchronous and asynchronous systems, and discuss the implications of various failure modes, including crash and Byzantine failures, on the ability to achieve consensus. [K2]
LO2 Discuss the different variants of the consensus problem, including reliable broadcast, k-set consensus, ε-agreement, and renaming, and describe their definitions, the failure models they address, and the specific conditions required for each variant to be solvable. [K2]

Overview of results:
It is worth understanding the relation between the consensus problem and the problem of attaining common knowledge of the agreement value. For the “no failure” case, consensus is attainable. Further, in a synchronous system, common knowledge of the consensus value is also attainable, whereas in an asynchronous system, concurrent common knowledge of the consensus value is attainable.

Failure mode | Synchronous system (message-passing and shared memory) | Asynchronous system (message-passing and shared memory)
No failure | agreement attainable; common knowledge attainable | agreement attainable; concurrent common knowledge attainable
Crash failure | agreement attainable; f < n processes | agreement not attainable
Byzantine failure | agreement attainable; f ≤ ⌊(n − 1)/3⌋ Byzantine processes | agreement not attainable

The overhead bounds are for the given algorithms, and not necessarily tight bounds for the problem.

Solvable variant | Failure model and overhead | Definition
Reliable broadcast | Crash failure, n > f (MP) | Validity, Agreement, Integrity conditions
k-set consensus | Crash failure, f < k < n (MP and SM) | Size of the set of values agreed upon must be less than k
ε-agreement | Crash failure, n ≥ 5f + 1 (MP) | Values agreed upon are within ε of each other
Renaming | Up to f fail-stop processes, n ≥ 2f + 1 (MP); Crash failure, f ≤ n − 1 (SM) | Select a unique name from a set of names
(MP: message passing; SM: shared memory)
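As an arithmetic aid, the solvability conditions in the two tables can be inverted to give the largest number of faulty processes f allowed for a given n. This is a minimal sketch; the function name and dictionary keys are hypothetical labels, and the asynchronous crash entry encodes the impossibility of consensus in asynchronous systems once even one process may crash.

```python
def max_tolerable_faults(n):
    """Largest number of faulty processes f permitted by each solvability condition."""
    if n < 1:
        raise ValueError("need at least one process")
    return {
        "crash_consensus_sync": n - 1,             # f < n
        "crash_consensus_async": 0,                # not solvable with even one crash
        "byzantine_agreement_sync": (n - 1) // 3,  # f <= floor((n - 1) / 3)
        "reliable_broadcast": n - 1,               # n > f (MP)
        "epsilon_agreement": (n - 1) // 5,         # n >= 5f + 1 (MP)
        "renaming_mp": (n - 1) // 2,               # n >= 2f + 1 (MP)
    }
```

With n = 4, for instance, Byzantine agreement tolerates f = 1, while ε-agreement (n ≥ 5f + 1) tolerates no failures at all.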

Circumventing the impossibility results for consensus in asynchronous systems:

Consensus is not solvable in asynchronous systems even if one process can fail by crashing. To
circumvent this impossibility result, weaker variants of the consensus problem are defined in the above table.

Assessment questions to the lecture

(The answer key and Bloom’s Knowledge Level are given in brackets after each question.)

1. Which of the following holds in a synchronous system with no failures? [Answer: C; K1]
A) Agreement is not possible.
B) Common knowledge of the consensus value is not attainable.
C) Agreement and common knowledge of the consensus value are both attainable.
D) Only agreement is attainable.

2. What does the k-set consensus problem require regarding the size of the set of values? [Answer: B; K1]
A) The size of the set of values must be greater than k.
B) The size of the set of values must be less than k.
C) The size of the set of values must be equal to k.
D) The size of the set of values must be k or more.

3. In the context of ε-agreement, what is the condition regarding the values agreed upon? [Answer: B; K1]
A) They must be distinct.
B) They must be within ε of each other.
C) They can be any random values.
D) They must be the same value.

4. Which of the following statements is true regarding consensus in asynchronous systems? [Answer: C; K1]
A) Consensus is solvable even if one process can fail.
B) Consensus is always attainable regardless of the number of processes.
C) Consensus is not solvable if at least one process can crash.
D) Consensus can be achieved using only crash failures.

Students have to prepare answers for the following questions at the end of the lecture

(Marks, CO, and Bloom’s Knowledge Level are given in brackets after each question.)
1. In a synchronous system with no failures, what can be said about the attainability of both agreement and common knowledge of the consensus value? [2 marks; CO4; K2]
2. Define the k-set consensus problem and its condition regarding the size of the set of values that must be agreed upon. [2 marks; CO4; K2]
3. What is the maximum number of Byzantine processes f that can be tolerated in order to achieve agreement in the Byzantine agreement problem? [2 marks; CO4; K1]
4. Explain the relationship between the consensus problem and the attainability of common knowledge in both synchronous and asynchronous systems. Discuss the implications of various failure modes, including crash and Byzantine failures, on the ability to achieve consensus. [15 marks; CO4; K2]
5. Discuss the different variants of the consensus problem, including reliable broadcast, k-set consensus, ε-agreement, and renaming. Describe their definitions, the failure models they address, and the specific conditions required for each variant to be solvable. [15 marks; CO4; K2]

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pp. 514-515.

Unit No: 4 (CONSENSUS AND RECOVERY) | Lecture No: 31
Topic: Agreement in a Failure-Free System (Synchronous and Asynchronous)
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom’s Knowledge Level in brackets):
LO1 Discuss the methods used to achieve consensus in a failure-free system, including a description of various algorithms and functions that can be applied, and explain the differences in approach between synchronous and asynchronous systems. [K2]
LO2 Evaluate the impact of different logical topologies on consensus algorithms in a failure-free system, discussing how specific topologies might influence the efficiency and effectiveness of reaching consensus in both synchronous and asynchronous environments. [K3]

AGREEMENT IN A FAILURE-FREE SYSTEM (SYNCHRONOUS OR ASYNCHRONOUS)

In a failure-free system, consensus can be reached by collecting information from the different processes, arriving at a “decision,” and distributing this decision in the system.
A distributed mechanism would have each process broadcast its values to the others, and each process computes the same function on the values received.
The decision can be reached by using an application-specific function, some simple examples being the majority, max, and min functions. Algorithms to collect the initial values and then distribute the decision may be based on token circulation on a logical ring, the three-phase tree-based broadcast–convergecast–broadcast, or direct communication with all nodes.
• In a synchronous system, this can be done simply in a constant number of rounds (depending on the specific logical topology and algorithm used). Further, common knowledge of the decision value can be obtained using an additional round.
• In an asynchronous system, consensus can similarly be reached in a constant number of message hops. Further, concurrent common knowledge of the consensus value can also be attained, using any of these algorithms.
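The broadcast-and-decide scheme can be sketched in a few lines of Python for the direct-communication topology. This is a simulation under assumed conditions (no failures, every message delivered); the function names and the smallest-value tie-break in the hypothetical `majority` helper are illustrative choices. Arrival order is shuffled per process to show why the decision function must not depend on the order in which values arrive.

```python
import random
from collections import Counter


def majority(values):
    """Most frequent value; ties are broken toward the smallest value so that
    every process computes the same result."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def failure_free_consensus(initial, decide, seed=0):
    """Direct-communication scheme: every process sends its initial value to every
    other process (n * (n - 1) messages), then applies the same decision function."""
    n = len(initial)
    rng = random.Random(seed)
    decisions = []
    for i in range(n):
        inbox = [initial[j] for j in range(n) if j != i]
        rng.shuffle(inbox)  # arrival order differs between processes
        decisions.append(decide([initial[i]] + inbox))
    return decisions
```

Calling `failure_free_consensus([3, 1, 2], min)` gives the same decision, 1, at every process, using n(n − 1) = 6 messages; `max` or `majority` can be passed instead.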

Assessment questions to the lecture

(The answer key and Bloom’s Knowledge Level are given in brackets after each question.)

1. In a failure-free system, how is consensus typically reached? [Answer: B; K1]
a) By randomly selecting a process
b) By collecting information from different processes, arriving at a decision, and distributing it
c) By only one process making the decision
d) By having no communication between processes

2. What is an example of a decision function used in consensus algorithms in distributed systems? [Answer: C; K1]
a) Sum function
b) Product function
c) Max function
d) Logarithmic function

3. In a synchronous system, consensus can be reached in how many rounds? [Answer: B; K1]
a) Infinite rounds
b) Constant number of rounds
c) One round
d) Varying number of rounds based on failure

4. Which of the following algorithms could be used for consensus in a distributed system? [Answer: A; K1]
a) Tree-based broadcast–convergecast–broadcast
b) Random polling algorithm
c) Majority voting with no communication
d) Random number generation

5. In an asynchronous system, how is consensus reached? [Answer: C; K1]
a) By stopping all processes
b) By direct communication with only one node
c) In a constant number of message hops
d) Without exchanging any messages

Students have to prepare answers for the following questions at the end of the lecture

(Marks, CO, and Bloom’s Knowledge Level are given in brackets after each question.)
1. In a failure-free system, how can consensus be reached among processes? [2 marks; CO4; K1]
2. What is one advantage of using a synchronous system for reaching consensus compared to an asynchronous system? [2 marks; CO4; K1]
3. What additional capability can be obtained in a synchronous system after reaching consensus? [2 marks; CO4; K1]
4. Discuss the methods used to achieve consensus in a failure-free system. Include a description of various algorithms and functions that can be applied, and explain the differences in approach between synchronous and asynchronous systems. [15 marks; CO4; K2]
5. Evaluate the impact of different logical topologies on consensus algorithms in a failure-free system. Discuss how specific topologies might influence the efficiency and effectiveness of reaching consensus in both synchronous and asynchronous environments. [15 marks; CO4; K3]

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pp. 515-516.

Unit No: 4 (CONSENSUS AND RECOVERY) | Lecture No: 32
Topic: Agreement in Synchronous Systems with Failures
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom’s Knowledge Level in brackets):
LO1 Explain in detail the consensus algorithm for crash failures in a synchronous system, including its working mechanism, conditions for correctness, and complexity. [K3]
LO2 Describe the Phase King algorithm used for Byzantine failures in synchronous systems, explaining how it ensures consensus despite faulty processes, and analyze its correctness and complexity. [K3]

AGREEMENT IN (MESSAGE-PASSING) SYNCHRONOUS SYSTEMS WITH FAILURES
CONSENSUS ALGORITHM FOR CRASH FAILURES (SYNCHRONOUS SYSTEM)
• The algorithm reaches consensus among n processes, up to f of which (where f < n) may fail in the fail-stop failure model.
• Here the consensus variable x is an integer value; each process i has an initial value xi. If up to f failures are to be tolerated, the algorithm runs for f + 1 rounds. In each round, a process i sends the value of its variable xi to all other processes, if that value has not been sent before.
• Of all the values received within that round and its own value xi at the start of the round, the process takes the minimum and updates xi. After f + 1 rounds, the local value xi is guaranteed to be the consensus value.
• Example: consider three processes i, j, and k, of which one may be faulty, so f = 1 and agreement requires f + 1 = 2 rounds. Suppose i starts with 0, j and k start with 1, and i crashes during round 1 after its message reaches j but not k.
• At the end of round 1, j has taken the minimum and holds 0, while k still holds 1. In round 2, j broadcasts its new value 0 (it has not sent it before), so k also updates to 0, and both surviving processes decide 0. Had the algorithm stopped after one round, j and k would have disagreed.

• The agreement condition is satisfied because in the f + 1 rounds, there must be at least one round in which no process failed.
• In this round, say round r, all the processes that have not failed so far succeed in broadcasting their values, and all these processes take the minimum of the values broadcast and received in that round.
• Thus, the local values at the end of the round are the same, say x_i^r, for all non-failed processes.
• In further rounds, only this value may be sent by each process at most once, and no process i will update its value x_i^r.
• The validity condition is satisfied because processes do not send fictitious values in this failure model.
• For all i, if the initial value is identical, then the only value sent by any process is the value that has been agreed upon as per the agreement condition.
• The termination condition is clearly satisfied, since every process that does not crash decides after exactly f + 1 rounds.
Complexity: The algorithm requires f + 1 rounds, where f < n. In each round, O(n²) messages are sent and each message carries one integer; hence the total number of messages is O((f + 1) · n²), since there are f + 1 rounds and each round requires O(n²) messages.
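The following Python sketch simulates the (f + 1)-round min-based algorithm described above. The crash model is an assumption made for illustration: a process that crashes in round r delivers its round-r broadcast to only a chosen subset of processes and then stops. The function name and the `crashes` parameter are hypothetical.

```python
def crash_consensus(initial, f, crashes):
    """Simulate the (f + 1)-round min-based consensus algorithm under crash failures.

    initial: list of integer initial values x_i (index = process id).
    f:       number of crash failures to tolerate; the algorithm runs f + 1 rounds.
    crashes: dict pid -> (round, receivers): the process crashes in `round` after
             its broadcast has reached only the processes in `receivers`.
    Returns {pid: decided value} for the processes that never crashed.
    """
    n = len(initial)
    x = list(initial)
    last_sent = [None] * n       # value each process most recently broadcast
    alive = set(range(n))
    for rnd in range(1, f + 2):  # rounds 1 .. f + 1
        inbox = {i: [] for i in range(n)}
        for j in sorted(alive):
            if x[j] == last_sent[j]:
                continue          # this value has already been broadcast
            crash_round, partial = crashes.get(j, (None, None))
            targets = partial if crash_round == rnd else range(n)
            for i in targets:
                inbox[i].append(x[j])
            last_sent[j] = x[j]
        # Processes that crash in this round take no further steps.
        alive -= {j for j, (r, _) in crashes.items() if r == rnd}
        for i in alive:
            x[i] = min([x[i]] + inbox[i])
    return {i: x[i] for i in sorted(alive)}
```

In a three-process run where process 0 starts with 0 and crashes in round 1 after reaching only process 1, two rounds give `{1: 0, 2: 0}`; running a single round (passing f = 0) leaves processes 1 and 2 with different values, which is why f + 1 rounds are needed.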

Consensus algorithms for Byzantine failures (synchronous system)

STEPS FOR BYZANTINE GENERALS (ITERATIVE FORMULATION),
SYNCHRONOUS, MESSAGE-PASSING:
STEPS FOR BYZANTINE GENERALS (RECURSIVE FORMULATION),
SYNCHRONOUS, MESSAGE-PASSING:
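The recursive formulation of the Byzantine generals problem is usually presented as the Oral Messages algorithm OM(m) of Lamport, Shostak, and Pease, which tolerates m traitors when n ≥ 3m + 1. The Python sketch below simulates that algorithm; it is not a reproduction of the textbook's pseudocode. Values are binary, ties in the majority vote resolve to 0, and `lie` is a hypothetical function modelling what a faulty sender transmits.

```python
def majority(votes):
    """Binary majority vote; returns 1 only if 1s form a strict majority, else 0."""
    return 1 if votes.count(1) > len(votes) / 2 else 0


def om(m, commander, lieutenants, value, faulty, lie):
    """Oral Messages algorithm OM(m) for binary values.

    Returns {lieutenant: decided value}. A faulty sender transmits
    lie(sender, receiver, value) instead of `value` (hypothetical adversary model).
    """
    def send(sender, receiver, v):
        return lie(sender, receiver, v) if sender in faulty else v

    # Step 1: the commander sends its value to every lieutenant.
    received = {lt: send(commander, lt, value) for lt in lieutenants}
    if m == 0:
        return received  # OM(0): each lieutenant uses the value it received
    # Step 2: each lieutenant j relays its value, acting as commander in OM(m - 1).
    relayed = {i: [] for i in lieutenants}
    for j in lieutenants:
        others = [lt for lt in lieutenants if lt != j]
        for i, v in om(m - 1, j, others, received[j], faulty, lie).items():
            relayed[i].append(v)
    # Step 3: each lieutenant takes the majority of its direct and relayed values.
    return {i: majority([received[i]] + relayed[i]) for i in lieutenants}
```

With n = 4, m = 1, and a traitorous commander that sends different values to different lieutenants, all three loyal lieutenants still decide the same value. The number of messages grows as O(n^(m+1)), exponentially in m.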

CODE FOR THE PHASE KING ALGORITHM:

Each phase has a unique "phase king" derived, say, from the PID. Each phase has two rounds:
1. In the 1st round, each process sends its estimate to all other processes.
2. In the 2nd round, the "phase king" process arrives at an estimate based on the values it received in the 1st round, and broadcasts its new estimate to all others.
Fig. Message pattern for the phase-king algorithm.

PHASE KING ALGORITHM CODE:

The algorithm uses (f + 1) phases and (f + 1)[(n − 1)(n + 1)] messages, and can tolerate up to f < ⌈n/4⌉ malicious processes.
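The two-round phase structure can be simulated in Python for binary values. Assumptions: the king of phase k is the process with PID k (one way of deriving it from the PID), a process keeps its own majority only when its multiplicity exceeds n/2 + f, and malicious behaviour is modelled by a hypothetical `adversary(phase, sender, receiver)` function. This is a sketch for experimentation, not the textbook's code.

```python
from collections import Counter


def phase_king(initial, faulty, f, adversary):
    """Simulate the Phase King algorithm for binary consensus.

    initial:   list of 0/1 initial estimates (index = process id).
    faulty:    set of malicious process ids.
    f:         number of malicious processes tolerated; f + 1 phases are run.
    adversary: hypothetical function (phase, sender, receiver) -> 0/1 giving the
               value a malicious sender transmits to a given receiver.
    Returns {pid: final estimate} for the non-malicious processes.
    """
    n = len(initial)
    v = list(initial)
    for phase in range(f + 1):
        king = phase  # assumption: the king of phase k is the process with PID k
        majority, mult = [0] * n, [0] * n
        # Round 1: every process sends its estimate to all processes (itself included).
        for i in range(n):
            counts = Counter(
                adversary(phase, j, i) if j in faulty else v[j] for j in range(n)
            )
            majority[i] = 1 if counts[1] > n / 2 else 0
            mult[i] = counts[majority[i]]
        # Round 2: the phase king sends its majority value as a tie-breaker.
        for i in range(n):
            tiebreaker = adversary(phase, king, i) if king in faulty else majority[king]
            # Keep one's own majority only if it is overwhelming; else follow the king.
            v[i] = majority[i] if mult[i] > n / 2 + f else tiebreaker
    return {i: v[i] for i in range(n) if i not in faulty}
```

For n = 9 and f = 2 (so f < ⌈n/4⌉ = 3), with malicious processes 0 and 1 acting as kings in the first two phases and sending 0 to even-numbered and 1 to odd-numbered receivers, the honest processes in this run only converge in the third phase, whose king (process 2) is honest. The message count for these parameters is (f + 1)(n − 1)(n + 1) = 3 · 8 · 10 = 240.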

Correctness Argument

1. Among the f + 1 phases, there is at least one phase k in which the phase king is non-malicious.
2. In phase k, all non-malicious processes Pi and Pj will have the same estimate of the consensus value as Pk does. Three cases arise:
• Pi and Pj both use their own majority values. (Both have mult > n/2 + f, and two such majorities cannot differ.)
• Pi uses its majority value; Pj uses the phase king's tie-breaker value. (Pi's mult > n/2 + f; since at most f of those values came from malicious processes, Pk's mult > n/2 for the same value.)
• Pi and Pj both use the phase king's tie-breaker value. (In the phase in which Pk is non-malicious, it sends the same value to Pi and Pj.)
In all three cases, Pi and Pj end up with the same value as their estimate.
3. If all non-malicious processes have the value x at the start of a phase, they will continue to have x as the consensus value at the end of the phase.

Assessment questions to the lecture

(The answer key and Bloom’s Knowledge Level are given in brackets after each question.)

1. In a crash failure model, how many rounds are required to tolerate up to f failures in a synchronous system with n processes? [Answer: C; K1]
a) f − 1
b) 2f
c) f + 1
d) n − f

2. In the crash failure consensus algorithm, what value does each process use to update its local variable at the end of each round? [Answer: C; K1]
a) The maximum value received
b) The sum of all values received
c) The minimum value received
d) The average of values received

3. Which of the following is true about the Phase King algorithm for Byzantine failures in synchronous systems? [Answer: C; K1]
a) It can tolerate any number of faulty processes
b) It can tolerate up to f < ⌊n/3⌋ faulty processes
c) It can tolerate up to f < ⌈n/4⌉ faulty processes
d) It can tolerate exactly n − 1 faulty processes

4. In the Phase King algorithm, what happens during the second round of each phase? [Answer: B; K1]
a) All processes compute the average of their estimates
b) The phase king broadcasts a new estimate to all other processes
c) All processes update their estimates based on majority voting
d) The processes randomly select a new leader for the next round

5. What is the primary method used by processes in the Phase King algorithm to resolve conflicts and reach consensus? [Answer: C; K1]
a) Voting based on the maximum value received
b) Choosing the phase king's value as the final decision
c) Majority voting combined with the phase king's tie-breaking value
d) Random selection of a new estimate in each phase

Students have to prepare answers for the following questions at the end of the lecture

(Marks, CO, and Bloom’s Knowledge Level are given in brackets after each question.)
1. What is the key difference between crash failures and Byzantine failures in distributed systems? [2 marks; CO4; K1]
2. How many rounds are required to reach consensus in a crash failure model with up to f failures in a synchronous system? [2 marks; CO4; K2]
3. Explain in detail the consensus algorithm for crash failures in a synchronous system. Discuss its working mechanism, conditions for correctness, and its complexity. [15 marks; CO4; K3]
4. Describe the Phase King algorithm used for Byzantine failures in synchronous systems. Discuss how it ensures consensus despite faulty processes and provide an analysis of its correctness and complexity. [15 marks; CO4; K3]
Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pp. 516-529.

Unit No: 4 (CONSENSUS AND RECOVERY) | Lecture No: 33
Topic: Checkpointing and Rollback Recovery: Introduction
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom’s Knowledge Level in brackets):
LO1 Define the domino effect in rollback recovery. [K2]
LO2 Illustrate the key difference between coordinated and uncoordinated checkpointing in rollback recovery. [K2]
LO3 Explain what log-based rollback recovery relies on, and why it is useful. [K1]

Checkpointing and rollback recovery: Introduction

• Rollback recovery protocols restore the system back to a consistent state after a failure.
• They achieve fault tolerance by periodically saving the state of a process during failure-free execution.
• They treat a distributed system application as a collection of processes that communicate over a network.
Checkpoints

The saved state is called a checkpoint, and the procedure of restarting from a previously checkpointed state is called rollback recovery. A checkpoint can be saved on either stable storage or volatile storage.
Why is rollback recovery of distributed systems complicated?
Messages induce inter-process dependencies during failure-free operation.
Rollback propagation
The dependencies among messages may force some of the processes that did not fail to roll back. This phenomenon of cascaded rollback is called the domino effect.
Uncoordinated checkpointing

If each process takes its checkpoints independently, then the system cannot avoid the domino effect; this scheme is called independent or uncoordinated checkpointing.
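A small Python sketch can show how rollback propagation cascades under uncoordinated checkpointing. It searches for a recovery line by starting from each process's latest checkpoint and repeatedly rolling a receiver back whenever it has recorded a receive whose send is not recorded (an orphan message). The event numbering and function name are assumptions for illustration: a checkpoint at position c records the first c events of its process, and position 0 stands for the initial checkpoint.

```python
def recovery_line(checkpoints, messages):
    """Find a consistent set of checkpoints by iterative rollback.

    checkpoints: dict pid -> sorted checkpoint positions; position 0 is the
                 initial checkpoint, and position c records events 1..c.
    messages:    list of (sender, send_event, receiver, recv_event), events from 1.
    Returns dict pid -> chosen checkpoint position.
    """
    cut = {p: positions[-1] for p, positions in checkpoints.items()}  # latest checkpoints
    changed = True
    while changed:
        changed = False
        for sender, s, receiver, r in messages:
            if r <= cut[receiver] and s > cut[sender]:  # orphan message
                # Roll the receiver back to its latest checkpoint before the receive.
                cut[receiver] = max(c for c in checkpoints[receiver] if c < r)
                changed = True
    return cut
```

In the example used for testing, four messages zig-zag between two processes so that each rollback creates a new orphan, and both processes end up back at their initial checkpoints: the domino effect. Removing the last message leaves the latest checkpoints consistent.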
Techniques that avoid the domino effect

1. Coordinated checkpointing rollback recovery: processes coordinate their checkpoints to form a system-wide consistent state.
2. Communication-induced checkpointing rollback recovery: forces each process to take checkpoints based on information piggybacked on the application messages.
3. Log-based rollback recovery: combines checkpointing with logging of non-deterministic events.
• Relies on the piecewise deterministic (PWD) assumption.
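Under the PWD assumption, a process's execution is a sequence of deterministic intervals, each started by a non-deterministic event such as a message receipt. The Python sketch below, with a hypothetical class and an arbitrary order-sensitive transition function, illustrates why logging those events is enough: replaying the logged messages, in their original order, from the last checkpoint reproduces the pre-failure state. A real system would keep the checkpoint and the log on stable storage; here they are plain attributes.

```python
class PWDProcess:
    """Hypothetical piecewise-deterministic process with logging of receive events."""

    def __init__(self):
        self.state = 0
        self.log = []              # logged receive order (assumed to be on stable storage)
        self.checkpoint = (0, 0)   # (saved state, number of log entries it covers)

    @staticmethod
    def step(state, msg):
        # Deterministic transition: the same (state, msg) always gives the same result,
        # but the result depends on the order in which messages are delivered.
        return state * 31 + msg

    def deliver(self, msg):
        self.log.append(msg)       # log the non-deterministic event first
        self.state = self.step(self.state, msg)

    def take_checkpoint(self):
        self.checkpoint = (self.state, len(self.log))

    def recover(self):
        # Restore the checkpoint, then replay logged messages in their original order.
        state, covered = self.checkpoint
        for msg in self.log[covered:]:
            state = self.step(state, msg)
        self.state = state
```

Because `step` is order-sensitive, replaying the same messages in a different order would produce a different state; this is why the receive order itself must be logged.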

Assessment questions to the lecture

(The answer key and Bloom’s Knowledge Level are given in brackets after each question.)

1. What is the purpose of rollback recovery protocols in distributed systems? [Answer: B; K1]
a) To improve system performance
b) To restore the system to a consistent state after a failure
c) To prevent communication between processes
d) To speed up message passing between processes

2. Which phenomenon occurs when dependencies among messages force non-failed processes to roll back during recovery? [Answer: D; K1]
a) Deadlock
b) Rollback propagation
c) Message ordering
d) Domino effect

3. What is the main disadvantage of uncoordinated checkpointing in rollback recovery? [Answer: C; K1]
a) It results in data loss
b) It prevents message passing between processes
c) It cannot avoid the domino effect
d) It requires constant communication between processes

4. Which rollback recovery technique combines checkpointing with logging of non-deterministic events? [Answer: C; K1]
a) Coordinated checkpointing
b) Uncoordinated checkpointing
c) Log-based rollback recovery
d) Communication-induced checkpointing

Students have to prepare answers for the following questions at the end of the lecture

(Marks, CO, and Bloom’s Knowledge Level are given in brackets after each question.)
1. Define the domino effect in rollback recovery. [2 marks; CO4; K2]
2. Illustrate the key difference between coordinated and uncoordinated checkpointing in rollback recovery. [2 marks; CO4; K2]
3. What does log-based rollback recovery rely on, and why is it useful? [2 marks; CO4; K1]

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pp. 456-457.

Unit No: 4 (CONSENSUS AND RECOVERY) | Lecture No: 34
Topic: Background and Definitions
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom’s Knowledge Level in brackets):
LO1 Distinguish between a consistent and an inconsistent global state in a distributed system. [K2]
LO2 Define an orphan message in rollback recovery protocols. [K2]
LO3 Explain the concept of checkpointing and rollback recovery in distributed systems, and discuss the different types of checkpointing strategies (coordinated, uncoordinated, and communication-induced) and their impact on system consistency and recovery. [K3]
LO4 Discuss the types of messages (in-transit, lost, delayed, orphan, and duplicate) that arise in rollback recovery protocols, explain how each type of message can affect the recovery process and system consistency, and describe strategies for handling these message types. [K3]

Background and definitions
System model

• A distributed system consists of a fixed number of processes, P1, P2, …, PN, which communicate only through messages.
• Processes cooperate to execute a distributed application and interact with the outside world by receiving and sending input and output messages, respectively.
• Rollback-recovery protocols generally make assumptions about the reliability of the inter-process communication.
• Some protocols assume that the communication uses first-in-first-out (FIFO) order, while other protocols assume that the communication subsystem can lose, duplicate, or reorder messages.
• Rollback-recovery protocols therefore must maintain information about the internal interactions among processes and also the external interactions with the outside world.
Fig. An example of a distributed system with three processes.

A local checkpoint

• All processes save their local states at certain instants of time.
• A local checkpoint is a snapshot of the state of the process at a given instant.
• Assumptions:
– A process stores all local checkpoints on stable storage.
– A process is able to roll back to any of its existing local checkpoints.
• Ci,k – the kth local checkpoint at process Pi.
• Ci,0 – a process Pi takes a checkpoint Ci,0 before it starts execution.

Consistent states
• A global state of a distributed system is a collection of the individual states of all participating processes and the states of the communication channels.
• Consistent global state
– a global state that may occur during a failure-free execution of a distributed computation
– if a process's state reflects a message receipt, then the state of the corresponding sender must reflect the sending of that message
• A global checkpoint is a set of local checkpoints, one from each process.
• A consistent global checkpoint is a global checkpoint such that no message is sent by a process after taking its local checkpoint that is received by another process before taking its own checkpoint.
• For instance, the figure shows two examples of global states.
• The state in Figure (a) is consistent and the state in Figure (b) is inconsistent.
• Note that the consistent state in Figure (a) shows message m1 to have been sent but not yet received, but that is all right.
• The state in Figure (a) is consistent because, for every message that has been received, there is a corresponding message send event.
• The state in Figure (b) is inconsistent because process P2 is shown to have received m2 but the state of process P1 does not reflect having sent it.
• Such a state is impossible in any failure-free, correct computation. Inconsistent states occur because of failures.
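The consistency condition can also be checked mechanically. The following Python sketch is illustrative only: it assumes each message is described by the position of its send event in the sender's event sequence and of its receive event in the receiver's sequence, and that a global checkpoint records, for each process, how many of its events precede its local checkpoint. The names `Message`, `is_consistent`, and `in_transit` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    name: str
    sender: str
    send_pos: int   # 0-based position of the send event in the sender's event sequence
    receiver: str
    recv_pos: int   # 0-based position of the receive event in the receiver's event sequence


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """cut[p] = number of p's events recorded in its local checkpoint.

    An event at position k is recorded iff k < cut[p]. The global checkpoint
    is consistent iff no receive is recorded whose send is not recorded.
    """
    for m in messages:
        sent = m.send_pos < cut[m.sender]
        received = m.recv_pos < cut[m.receiver]
        if received and not sent:  # receipt recorded without its send
            return False
    return True


def in_transit(cut: Dict[str, int], messages: List[Message]) -> List[str]:
    """Messages whose send is recorded but whose receive is not (allowed)."""
    return [m.name for m in messages
            if m.send_pos < cut[m.sender] and not m.recv_pos < cut[m.receiver]]
```

For example, with one event recorded at each of P1 and P2, a message sent at P1's first event and received at P2's second event is in transit, which is allowed; a message received at P2's first event but sent at P1's second event makes the global checkpoint inconsistent.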
Interactions with outside world

A distributed system often interacts with the outside world to receive input data or deliver the outcome of a computation. If a failure occurs, the outside world cannot be expected to roll back. For example, a printer cannot roll back the effects of printing a character.
Outside World Process (OWP)
• It is a special process that interacts with the rest of the system through message passing.
• Because the outside world cannot roll back, it must see a consistent behavior of the system despite failures.
• Thus, before sending output to the OWP, the system must ensure that the state from which the output is sent will be recovered despite any future failure.
A common approach is to save each input message on stable storage before allowing the application program to process it.
An interaction with the outside world to deliver the outcome of a computation is shown on the process-line by the symbol "||".
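A minimal Python sketch of the input-logging rule, assuming a local append-only file stands in for stable storage and that messages are JSON-serializable. `log_then_process` and `replay_inputs` are hypothetical names, not part of any protocol described in these notes.

```python
import json
import os
from typing import Any, Callable, Iterator


def log_then_process(msg: dict, log_path: str, handler: Callable[[dict], Any]) -> Any:
    """Append msg to the input log and force it to disk before the application sees it."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(msg) + "\n")
        log.flush()
        os.fsync(log.fileno())  # from here on the record survives a crash
    return handler(msg)


def replay_inputs(log_path: str) -> Iterator[dict]:
    """During recovery, re-deliver the logged inputs in their original order."""
    if not os.path.exists(log_path):
        return
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            yield json.loads(line)
```

Calling `os.fsync` before the handler runs is what makes the rule hold: once the application has acted on an input, that input can be replayed after a crash.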
Different types of Messages
1. In-transit messages
• messages that have been sent but not yet received
2. Lost messages
• messages whose "send" is done but "receive" is undone due to rollback
3. Delayed messages
• messages whose "receive" is not recorded because the receiving process was either down or the message arrived after rollback
4. Orphan messages
• messages with "receive" recorded but message "send" not recorded
• do not arise if processes roll back to a consistent global state
5. Duplicate messages
• arise due to message logging and replaying during process recovery
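The definitions above can be summarized as a decision on whether the send and receive events of a message survive in the restored state. The Python sketch below is a simplification under that assumption: delayed and duplicate messages depend on timing and on message logging, so they are not distinguished here, and the names `Kind` and `classify` are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    NORMAL = "send and receive both kept"
    IN_TRANSIT = "in-transit"
    LOST = "lost"
    ORPHAN = "orphan"
    BOTH_UNDONE = "send and receive both undone"


def classify(send_kept: bool, recv_kept: bool, was_received: bool) -> Kind:
    """send_kept / recv_kept: is the event part of the restored (rolled-back) state?
    was_received: had the message been received before the failure?"""
    if recv_kept:
        # a recorded receipt is fine only if its send is recorded too
        return Kind.NORMAL if send_kept else Kind.ORPHAN
    if send_kept:
        # send survives, receive does not: lost if rollback undid the receive
        return Kind.LOST if was_received else Kind.IN_TRANSIT
    # neither event survives; the re-executed sender may send it again
    return Kind.BOTH_UNDONE
```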

In-transit messages

In the figure, the global state {C1,8, C2,9, C3,8, C4,8} shows that message m1 has been sent but not yet received. We call such a message an in-transit message. Message m2 is also an in-transit message.

Delayed messages

Messages whose receive is not recorded because the receiving process was either down or the message arrived after the rollback of the receiving process are called delayed messages. For example, messages m2 and m5 in the figure are delayed messages.

Lost messages

Messages whose send is not undone but whose receive is undone due to rollback are called lost messages. This type of message occurs when the process rolls back to a checkpoint prior to the reception of the message while the sender does not roll back beyond the send operation of the message. In the figure, message m1 is a lost message.
Duplicate messages

• Duplicate messages arise due to message logging and replaying during process recovery. For example, in the figure, message m4 was sent and received before the rollback. However, due to the rollback of process P4 to C4,8 and process P3 to C3,8, both the send and the receipt of message m4 are undone.
• When process P3 restarts from C3,8, it will resend message m4.
• Therefore, P4 should not replay message m4 from its log.
• If P4 replays message m4, then message m4 is called a duplicate message.

Assessment questions to the lecture

1. What is a local checkpoint in a distributed system? [Answer: C; Bloom's Knowledge Level: K1]
a) A snapshot of the entire system
b) A message sent between processes
c) A snapshot of the state of an individual process at a given instant
d) A log of messages exchanged between processes
2. What defines a consistent global checkpoint in a distributed system? [Answer: B; K1]
a) A set of checkpoints taken at the same time by all processes
b) A set of checkpoints in which no messages are sent by a process after taking its checkpoint that are received by another process before taking its checkpoint
c) A set of checkpoints that reflect lost messages
d) A set of checkpoints that include delayed messages
3. What is an in-transit message in the context of rollback recovery? [Answer: B; K1]
a) A message that has been logged but not yet sent
b) A message that has been sent but not yet received
c) A message that has been delayed due to network congestion
d) A message that was lost during recovery
4. What are orphan messages in a rollback recovery protocol? [Answer: B; K1]
a) Messages with both "send" and "receive" undone due to rollback
b) Messages with "receive" recorded but "send" not recorded
c) Messages that are duplicated during recovery
d) Messages whose "send" is recorded but "receive" is not
5. Which type of message arises due to message logging and replaying during process recovery? [Answer: C; K1]
a) Delayed messages
b) Lost messages
c) Duplicate messages
d) Orphan messages

Students have to prepare answers for the following questions at the end of the lecture

1. What is the difference between a consistent and an inconsistent global state in a distributed system? [2 marks; CO4; K2]
2. Define an orphan message in rollback recovery protocols. [2 marks; CO4; K2]
3. Explain the concept of checkpointing and rollback recovery in distributed systems. Discuss the different types of checkpointing strategies (coordinated, uncoordinated, and communication-induced) and their impact on system consistency and recovery. [15 marks; CO4; K3]
4. Discuss the types of messages (in-transit, lost, delayed, orphan, and duplicate) that arise in rollback recovery protocols. Explain how each type of message can affect the recovery process and system consistency, and describe strategies for handling these message types. [15 marks; CO4; K3]

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pages 457-462

Unit No. 4: CONSENSUS AND RECOVERY, Lecture No. 35
Topic: Issues in Failure Recovery
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom's Knowledge Level in brackets)
LO1 Define an orphan message in the context of failure recovery. [K2]
LO2 Discuss the challenges in failure recovery within a distributed system. [K2]

Issues in failure recovery

In failure recovery, we must not only restore the system to a consistent state, but also appropriately handle messages that are left in an abnormal state due to the failure and recovery.
• The computation comprises three processes Pi, Pj, and Pk, connected through a communication network. The processes communicate solely by exchanging messages over fault-free, FIFO communication channels.
• Processes Pi, Pj, and Pk have taken checkpoints {Ci,0, Ci,1}, {Cj,0, Cj,1, Cj,2}, and {Ck,0, Ck,1}, respectively, and these processes have exchanged messages A to J.
Suppose process Pi fails at the instant indicated in the figure. All the contents of the volatile memory of Pi are lost and, after Pi has recovered from the failure, the system needs to be restored to a consistent global state from where the processes can resume their execution.
• Process Pi's state is restored to a valid state by rolling it back to its most recent checkpoint Ci,1. To restore the system to a consistent state, process Pj rolls back to checkpoint Cj,1 because the rollback of process Pi to checkpoint Ci,1 created an orphan message H (the receive event of H is recorded at process Pj while the send event of H has been undone at process Pi).
• Pj does not roll back to checkpoint Cj,2 but to checkpoint Cj,1, because Cj,2 still records the receipt of H. An orphan message I is created due to the rollback of process Pj to checkpoint Cj,1. To eliminate this orphan message, process Pk rolls back to checkpoint Ck,1.

• Messages C, D, E, and F are potentially problematic. Message C is in transit during the failure
and it is a delayed message. The delayed message C has several possibilities: C might arrive at
process Pi before it recovers, it might arrive while Pi is recovering, or it might arrive after Pi has
completed recovery. Each of these cases must be dealt with correctly.
• Message D is a lost message since the send event for D is recorded in the restored state for
process Pj , but the receive event has been undone at process Pi. Process Pj will not resend D
without an additional mechanism.
• Messages E and F are delayed orphan messages and pose perhaps the most serious problem of
all the messages. When messages E and F arrive at their respective destinations, they must be
discarded since their send events have been undone. Processes, after resuming execution from
their checkpoints, will generate both of these messages.
• Lost messages like D can be handled by having processes keep a message log of all the sent
messages. So when a process restores to a checkpoint, it replays the messages from its log to
handle the lost message problem.
• Overlapping failures further complicate the recovery process. If overlapping failures are to be
tolerated, a mechanism must be introduced to deal with amnesia and the resulting
inconsistencies.
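A minimal sketch of such a sender-side log, assuming FIFO channels as in this example, so that the messages a receiver has lost are exactly those following the number of messages its restored state has received from the sender. `SenderLog` and its methods are hypothetical names. Resending the whole suffix can also resend messages that are merely delayed, such as C, so a real protocol would still need to discard duplicates; the sketch does not handle that.

```python
from collections import defaultdict
from typing import Dict, List


class SenderLog:
    """Hypothetical sender-side log: every message sent on a FIFO channel is kept in order."""

    def __init__(self) -> None:
        self.sent: Dict[str, List[str]] = defaultdict(list)

    def send(self, dest: str, payload: str) -> None:
        # record the message in the log as it is transmitted to dest
        self.sent[dest].append(payload)

    def to_resend(self, dest: str, receiver_count: int) -> List[str]:
        """receiver_count = how many of our messages dest's restored state has received.

        On a FIFO channel those are the first receiver_count messages, so the
        lost ones are exactly the suffix after that count.
        """
        return self.sent[dest][receiver_count:]
```

For example, if Pj has sent two messages to Pi and Pi's restored state has received only the first, `to_resend("Pi", 1)` returns the second.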

Assessment questions to the lecture

1. In a failure recovery process, a consistent global state is achieved by ensuring that all processes have __________ corresponding to the same point in their execution. [Answer: checkpoints; Bloom's Knowledge Level: K1]
2. In coordinated checkpointing, processes must synchronize their checkpoints to avoid creating __________ messages, which can lead to recovery complications. [Answer: orphan; K1]
3. Lost messages can be managed by having processes maintain a __________ log of all sent messages to replay them in case of a rollback. [Answer: message; K1]

Students have to prepare answers for the following questions at the end of the lecture

1. What is an orphan message in the context of failure recovery? [2 marks; CO4; K2]
2. Discuss the challenges in failure recovery within a distributed system. [15 marks; CO4; K2]

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pages 462-463

Unit No. 4: CONSENSUS AND RECOVERY, Lecture No. 36
Topic: Checkpoint-based Recovery
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom's Knowledge Level in brackets)
LO1 Write about the delayed message in the context of failure recovery. [K2]
LO2 Illustrate the difference between autonomous and forced checkpoints in communication-induced checkpointing. [K2]
LO3 Describe the process of handling failure recovery in a distributed system, focusing on the challenges posed by messages such as in-transit, lost, delayed, orphan, and duplicate messages. How does checkpoint-based recovery mitigate these issues? [K3]
LO4 Compare and contrast uncoordinated, coordinated, and communication-induced checkpointing strategies in rollback recovery. Discuss their impact on runtime overhead, recovery time, and system consistency. [K3]

Checkpoint-based recovery

Checkpoint-based rollback-recovery techniques can be classified into three categories:

1. Uncoordinated checkpointing

2. Coordinated checkpointing

3. Communication-induced checkpointing

1. Uncoordinated Checkpointing

• Each process has autonomy in deciding when to take checkpoints.
• Advantages
The lower runtime overhead during normal execution
• Disadvantages
1. Domino effect during a recovery
2. Recovery from a failure is slow because processes need to iterate to find a consistent set of checkpoints
3. Each process maintains multiple checkpoints and must periodically invoke a garbage collection algorithm
4. Not suitable for applications with frequent output commits
• The processes record the dependencies among their checkpoints caused by message exchange during failure-free operation.
• The following direct dependency tracking technique is commonly used in uncoordinated checkpointing.
Direct dependency tracking technique

Assume each process 𝑃𝑖 starts its execution with an initial checkpoint 𝐶𝑖,0.

𝐼𝑖,𝑥: checkpoint interval, the interval between 𝐶𝑖,𝑥−1 and 𝐶𝑖,𝑥

When 𝑃𝑖 sends a message m during 𝐼𝑖,𝑥, it piggybacks the pair (i, x) on m. When 𝑃𝑗 receives m during 𝐼𝑗,𝑦, it records the dependency from 𝐼𝑖,𝑥 to 𝐼𝑗,𝑦, which is later saved onto stable storage when 𝑃𝑗 takes 𝐶𝑗,𝑦.

When a failure occurs, the recovering process initiates rollback by broadcasting a dependency request message to collect all the dependency information maintained by each process.

When a process receives this message, it stops its execution and replies with the
dependency information saved on the stable storage as well as with the dependency
information, if any, which is associated with its current state.
The initiator then calculates the recovery line based on the global dependency
information and broadcasts a rollback request message containing the recovery line.

Upon receiving this message, a process whose current state belongs to the recovery line
simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by
the recovery line.
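The recovery-line computation performed by the initiator can be sketched as a rollback-propagation loop over the collected dependencies: whenever a dependency from 𝐼𝑖,𝑥 to 𝐼𝑗,𝑦 has its send undone but its receive kept, 𝑃𝑗 is rolled back to 𝐶𝑗,𝑦−1. This Python sketch assumes the dependency records have already been gathered at the initiator and treats the current state of a live process as a pseudo-checkpoint numbered one past its latest checkpoint; `recovery_line` is a hypothetical name.

```python
from typing import Dict, Iterable, Set, Tuple

# ((i, x), (j, y)): a message sent by process i during interval I_{i,x}
# was received by process j during interval I_{j,y}.
Dependency = Tuple[Tuple[str, int], Tuple[str, int]]


def recovery_line(latest: Dict[str, int], deps: Iterable[Dependency],
                  failed: Set[str]) -> Dict[str, int]:
    """Return, for each process, the index of the checkpoint to restart from.

    latest[p] is the index of p's latest checkpoint on stable storage. A live
    process may keep its current state, encoded as latest[p] + 1 (the end of its
    still-open interval); a failed process can only restart from latest[p].
    """
    deps = list(deps)
    line = {p: k if p in failed else k + 1 for p, k in latest.items()}
    changed = True
    while changed:
        changed = False
        for (i, x), (j, y) in deps:
            send_undone = x > line[i]      # the send lies after i's restart point
            recv_kept = y <= line[j]       # the receive lies before j's restart point
            if send_undone and recv_kept:  # orphan message: j must roll back
                line[j] = y - 1            # to C_{j,y-1}, which opens interval I_{j,y}
                changed = True
    return line
```

For instance, `recovery_line({"P0": 1, "P1": 1}, [(("P0", 2), ("P1", 2))], {"P0"})` rolls P1 back to its checkpoint 1 because P0's failure undoes the send of a message P1 has already received. Every update moves some process strictly to an earlier checkpoint and no process goes below 𝐶𝑖,0, so the loop terminates; a chain of such updates that drags processes back toward their initial checkpoints is the domino effect listed among the disadvantages above.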
2. Coordinated Checkpointing

In coordinated checkpointing, processes orchestrate their checkpointing activities so that all local checkpoints form a consistent global state.
Types
1. Blocking checkpointing: After a process takes a local checkpoint, to prevent orphan messages, it remains blocked until the entire checkpointing activity is complete.
Disadvantage: the computation is blocked during the checkpointing.
2. Non-blocking checkpointing: The processes need not stop their execution while taking checkpoints. A fundamental problem in coordinated checkpointing is to prevent a process from receiving application messages that could make the checkpoint inconsistent.
Example (a): Checkpoint inconsistency
Message m is sent by 𝑃0 after receiving a checkpoint request from the checkpoint coordinator.
Assume m reaches 𝑃1 before the checkpoint request.
This situation results in an inconsistent checkpoint since checkpoint 𝐶1,𝑥 shows the receipt of message m from 𝑃0, while checkpoint 𝐶0,𝑥 does not show m being sent from 𝑃0.

Example (b): A solution with FIFO channels

If channels are FIFO, this problem can be avoided by preceding the first post-checkpoint message on each channel by a checkpoint request, forcing each process to take a checkpoint before receiving the first post-checkpoint message.
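The effect of Example (b) can be illustrated with a small simulation of the channel from 𝑃0 to 𝑃1. This Python sketch is a simplification: it models only one channel, and when `request_first` is false it treats the coordinator's request as reaching 𝑃1 after m, which reproduces the situation of Example (a). The function name `simulate` is hypothetical.

```python
from collections import deque
from typing import List, Tuple


def simulate(request_first: bool) -> List[Tuple[str, bool]]:
    """Return (name, sent_after_C0x) for every message whose receipt P1 records in C1,x."""
    channel: deque = deque()  # FIFO channel P0 -> P1
    # P0 sends m_before, takes checkpoint C0,x, then sends m.
    channel.append(("APP", "m_before", False))
    if request_first:
        # Example (b): the request precedes the first post-checkpoint message.
        channel.append(("CKPT_REQ", None, None))
    channel.append(("APP", "m", True))
    if not request_first:
        # Example (a): the request reaches P1 only after m.
        channel.append(("CKPT_REQ", None, None))

    p1_checkpointed = False
    recorded: List[Tuple[str, bool]] = []
    while channel:  # P1 consumes messages strictly in FIFO order
        kind, name, sent_after = channel.popleft()
        if kind == "CKPT_REQ":
            p1_checkpointed = True  # P1 takes C1,x before processing anything later
        elif not p1_checkpointed:
            recorded.append((name, sent_after))
    return recorded
```

`simulate(True)` records only `m_before`, which was sent before 𝐶0,𝑥; `simulate(False)` also records m, a message sent after 𝐶0,𝑥, which is the inconsistency of Example (a).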

Impossibility of min-process non-blocking checkpointing

A min-process, non-blocking checkpointing algorithm is one that forces only a minimum number of processes to take a new checkpoint, and at the same time does not force any process to suspend its computation. Cao and Singhal showed that it is impossible to design such an algorithm; the algorithm below is min-process but partially blocking.

Algorithm

The algorithm consists of two phases. During the first phase, the checkpoint initiator
identifies all processes with which it has communicated since the last checkpoint and
sends them a request.
Upon receiving the request, each process in turn identifies all processes it has
communicated with since the last checkpoint and sends them a request, and so on, until
no more processes can be identified.
During the second phase, all processes identified in the first phase take a checkpoint. The
result is a consistent checkpoint that involves only the participating processes.

In this protocol, after a process takes a checkpoint, it cannot send any message until the
second phase terminates successfully, although receiving a message after the checkpoint
has been taken is allowable.
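The first phase is a transitive search over the "has communicated with since its last checkpoint" relation. A minimal Python sketch, assuming that relation is available as a dictionary `comm` (a hypothetical input; in the protocol it is discovered by the request messages themselves rather than read from a global table), and with `checkpoint_participants` as a hypothetical name:

```python
from collections import deque
from typing import Dict, Set


def checkpoint_participants(initiator: str, comm: Dict[str, Set[str]]) -> Set[str]:
    """comm[p] = processes p has communicated with since its last checkpoint.

    Returns every process reached by the chain of requests started by the
    initiator; in the second phase exactly these processes take a checkpoint.
    """
    participants = {initiator}
    frontier = deque([initiator])
    while frontier:
        p = frontier.popleft()
        for q in comm.get(p, set()):
            if q not in participants:  # q has not been sent a request yet
                participants.add(q)
                frontier.append(q)
    return participants
```

A process that has not communicated with any participant since its last checkpoint, such as D in the test below, is never asked to checkpoint, which is what makes the algorithm min-process.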
3. Communication-induced Checkpointing

Communication-induced checkpointing is another way to avoid the domino effect while allowing processes to take some of their checkpoints independently. Processes may, however, be forced to take additional checkpoints.
Two types of checkpoints

1. Autonomous checkpoints

2. Forced checkpoints

The checkpoints that a process takes independently are called local (autonomous) checkpoints, while those that a process is forced to take are called forced checkpoints.
Communication-induced checkpointing piggybacks protocol-related information on each application message.
The receiver of each application message uses the piggybacked information to determine if it has to take a forced checkpoint to advance the global recovery line.
The forced checkpoint must be taken before the application may process the contents of the message.
In contrast with coordinated checkpointing, no special coordination messages are exchanged.
Two types of communication-induced checkpointing

1. Model-based checkpointing

2. Index-based checkpointing.

Model-based checkpointing

Model-based checkpointing prevents patterns of communications and checkpoints that could result in inconsistent states among the existing checkpoints.
No control messages are exchanged among the processes during normal operation. All information necessary to execute the protocol is piggybacked on application messages.
There are several domino-effect-free checkpoint and communication models.

The MRS (mark, send, and receive) model of Russell avoids the domino effect by ensuring that within every checkpoint interval all message-receiving events precede all message-sending events.
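The MRS condition can be maintained by taking a forced checkpoint before any receive that follows a send in the same checkpoint interval. The Python sketch below assumes that enforcement rule; the class `MRSProcess` and the checker `satisfies_mrs` are hypothetical names.

```python
from typing import List, Tuple

Interval = List[Tuple[str, str]]  # events ("send" | "recv", message) of one checkpoint interval


class MRSProcess:
    """Hypothetical process that keeps every receive before every send within an interval."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.intervals: List[Interval] = [[]]
        self.forced = 0               # number of forced checkpoints taken
        self.sent_in_interval = False

    def checkpoint(self, forced: bool = False) -> None:
        self.intervals.append([])     # a checkpoint closes the current interval
        self.sent_in_interval = False
        if forced:
            self.forced += 1

    def send(self, msg: str) -> None:
        self.intervals[-1].append(("send", msg))
        self.sent_in_interval = True

    def receive(self, msg: str) -> None:
        if self.sent_in_interval:     # a receive after a send would break MRS
            self.checkpoint(forced=True)
        self.intervals[-1].append(("recv", msg))


def satisfies_mrs(intervals: List[Interval]) -> bool:
    """True iff no interval contains a receive that follows a send."""
    for events in intervals:
        seen_send = False
        for kind, _ in events:
            if kind == "send":
                seen_send = True
            elif seen_send:
                return False
    return True
```

For example, the sequence receive, send, receive, send produces one forced checkpoint between the first send and the second receive.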
Index-based checkpointing

Index-based communication-induced checkpointing assigns monotonically increasing indexes to checkpoints, such that the checkpoints having the same index at different processes form a consistent state.
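One way to realize the index-based rule, assumed here and similar in spirit to the protocol of Briatico, Ciuffoletti, and Simoncini, is to piggyback the sender's current checkpoint index on every message and to force a checkpoint when a message carries a larger index than the receiver's. The class `IndexedProcess` is hypothetical.

```python
from typing import Any, List, Tuple


class IndexedProcess:
    """Hypothetical process for index-based communication-induced checkpointing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index = 0
        self.checkpoints: List[Tuple[int, str]] = [(0, "initial")]

    def autonomous_checkpoint(self) -> None:
        self.index += 1
        self.checkpoints.append((self.index, "autonomous"))

    def send(self, payload: Any) -> Tuple[int, Any]:
        return (self.index, payload)  # piggyback the current index

    def receive(self, message: Tuple[int, Any]) -> Any:
        m_index, payload = message
        if m_index > self.index:
            # forced checkpoint, taken before the application processes the payload
            self.index = m_index
            self.checkpoints.append((self.index, "forced"))
        return payload
```

Because a message sent after the sender's checkpoint with index n carries an index of at least n, the receiver's checkpoint with that index is taken before the message is processed, so the message cannot be an orphan with respect to the checkpoints numbered n.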

Assessment questions to the lecture

1. A rollback recovery protocol restores the system back to a __________ state after a failure. [Answer: consistent; Bloom's Knowledge Level: K1]
2. In a distributed system, a local checkpoint is a snapshot of the __________ of the process at a given instant. [Answer: state; K1]
3. The phenomenon where the rollback of one process forces other processes to roll back to maintain consistency is called the __________ effect. [Answer: domino; K1]
4. In coordinated checkpointing, processes synchronize their checkpoints to form a __________ global state. [Answer: consistent; K1]
5. A message that has been sent but not yet received is termed an __________ message. [Answer: in-transit; K1]

Students have to prepare answers for the following questions at the end of the lecture

1. Write about the delayed message in the context of failure recovery. [2 marks; CO4; K2]
2. Illustrate the difference between autonomous and forced checkpoints in communication-induced checkpointing. [2 marks; CO4; K2]
3. Describe the process of handling failure recovery in a distributed system, focusing on the challenges posed by messages such as in-transit, lost, delayed, orphan, and duplicate messages. How does checkpoint-based recovery mitigate these issues? [15 marks; CO4; K2]
4. Compare and contrast uncoordinated, coordinated, and communication-induced checkpointing strategies in rollback recovery. Discuss their impact on runtime overhead, recovery time, and system consistency. [15 marks; CO4; K3]

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pages 464-470

Unit No. 4: CONSENSUS AND RECOVERY, Lecture No. 37
Topic: Coordinated Checkpointing Algorithm
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom's Knowledge Level in brackets)
LO1 Give the two types of checkpoints used in the Koo and Toueg checkpointing algorithm. [K2]
LO2 Describe what happens if the initiating process Pi learns that not all processes successfully took tentative checkpoints. [K2]
LO3 Explain the two phases of the Koo and Toueg checkpointing algorithm. What are the key actions taken during each phase? [K3]
LO4 Discuss the rollback recovery algorithm of Koo and Toueg. Describe its phases and how it ensures a consistent state after a failure. [K3]

KOO AND TOUEG COORDINATED CHECKPOINTING AND RECOVERY TECHNIQUE:
• The Koo and Toueg coordinated checkpointing and recovery technique takes a consistent set of checkpoints and avoids the domino effect and livelock problems during the recovery.
• It includes two parts: the checkpointing algorithm and the recovery algorithm.

A. The Checkpointing Algorithm

The checkpoint algorithm makes the following assumptions about the distributed system:
Processes communicate by exchanging messages through communication channels.
Communication channels are FIFO.
End-to-end protocols (such as the sliding window protocol) exist to handle message loss due to rollback recovery and communication failure.
Communication failures do not partition the network.
The checkpoint algorithm takes two kinds of checkpoints on stable storage: permanent and tentative.
A permanent checkpoint is a local checkpoint at a process and is a part of a consistent global checkpoint.
A tentative checkpoint is a temporary checkpoint that is made a permanent checkpoint on the successful termination of the checkpoint algorithm.

The algorithm consists of two phases.

First Phase

1. An initiating process Pi takes a tentative checkpoint and requests all other processes to
take tentative checkpoints. Each process informs Pi whether it succeeded in taking a
tentative checkpoint.
2. A process says “no” to a request if it fails to take a tentative checkpoint

3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the tentative checkpoints should be thrown away.
Second Phase

1. Pi informs all the processes of the decision it reached at the end of the first phase.

2. A process, on receiving the message from Pi will act accordingly.

3. Either all or none of the processes advance the checkpoint by taking permanent
checkpoints.
4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot
send messages related to the basic computation until it is informed of Pi’s decision.
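The two phases can be sketched as follows, assuming a single initiator, no failures during the protocol itself, and in-memory objects in place of messages and stable storage. `Participant` and `koo_toueg_checkpoint` are hypothetical names, and the checkpoint labels are illustrative strings.

```python
from typing import List, Optional


class Participant:
    def __init__(self, name: str, can_checkpoint: bool = True) -> None:
        self.name = name
        self.can_checkpoint = can_checkpoint  # False models a process that answers "no"
        self.permanent: List[str] = [f"{name}-C0"]
        self.tentative: Optional[str] = None
        self.blocked = False  # may not send computation messages while True

    def take_tentative(self, label: str) -> bool:
        """Phase 1: try to take a tentative checkpoint and vote yes/no."""
        if not self.can_checkpoint:
            return False
        self.tentative = label
        self.blocked = True   # wait for Pi's decision before sending again
        return True

    def decide(self, commit: bool) -> None:
        """Phase 2: make the tentative checkpoint permanent or discard it."""
        if commit and self.tentative is not None:
            self.permanent.append(self.tentative)
        self.tentative = None
        self.blocked = False


def koo_toueg_checkpoint(initiator: Participant, others: List[Participant], label: str) -> bool:
    everyone = [initiator] + others
    # Phase 1: the initiator and then every other process try a tentative checkpoint.
    votes = [p.take_tentative(f"{p.name}-{label}") for p in everyone]
    commit = all(votes)
    # Phase 2: the initiator's decision is propagated to every process.
    for p in everyone:
        p.decide(commit)
    return commit
```

Polling every process in phase one, even after a "no", keeps the sketch simple; either way the decision is "permanent" only if every vote is "yes", so all or none of the tentative checkpoints become permanent.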
Correctness: for two reasons
i. Either all or none of the processes take permanent checkpoints.
ii. No process sends a computation message between taking its tentative checkpoint and learning Pi's decision, so no message sent after a checkpoint can be recorded as received in another process's checkpoint.
An Optimization
The above protocol may cause a process to take a checkpoint even when it is not necessary for consistency. Since taking a checkpoint is an expensive operation, we would like to avoid taking such unnecessary checkpoints.
B. The Rollback Recovery Algorithm
The rollback recovery algorithm restores the system state to a consistent state after a failure. The rollback
recovery algorithm assumes that a single process invokes the algorithm. It assumes that the checkpoint and
the rollback recovery algorithms are not invoked concurrently. The rollback recovery algorithm has two
phases.
First Phase

1. An initiating process Pi sends a message to all other processes to check if they all are willing to restart from their previous checkpoints.
2. A process may reply "no" to a restart request for any reason (e.g., it is already participating in a checkpointing or a recovery process initiated by some other process).
3. If Pi learns that all processes are willing to restart from their previous checkpoints, Pi decides that all processes should roll back to their previous checkpoints.
4. Otherwise, Pi aborts the rollback attempt and it may attempt a recovery at a later time.

Second Phase

1. Pi propagates its decision to all the processes.

2. On receiving Pi’s decision, a process acts accordingly.

3. During the execution of the recovery algorithm, a process cannot send messages related
to the underlying computation while it is waiting for Pi’s decision.
Correctness: the processes resume from a consistent state.
Optimization: not all processes may need to roll back, since some of the processes did not change anything since their last checkpoint.
For example, in the event of failure of process X, the above protocol will require processes X, Y, and Z to restart from checkpoints x2, y2, and z2, respectively.
However, process Z need not roll back because there has been no interaction between process Z and the other two processes since the last checkpoint at Z.

Assessment questions to the lecture

1. The Koo and Toueg coordinated checkpointing technique aims to take a consistent set of checkpoints while avoiding the __________ and livelock problems during recovery. [Answer: domino effect; Bloom's Knowledge Level: K1]
2. In the checkpointing algorithm, a __________ checkpoint is a temporary checkpoint that becomes permanent upon the successful completion of the checkpointing process. [Answer: tentative; K1]
3. During the rollback recovery algorithm, if the initiating process Pi learns that not all processes are willing to restart from their previous checkpoints, it will __________ the rollback attempt. [Answer: abort; K1]
4. The correctness of the checkpointing algorithm ensures that either all processes take permanent checkpoints or __________. [Answer: none; K1]
5. The rollback recovery algorithm consists of two phases, with the first phase involving Pi checking with all other processes if they are willing to __________ from their previous checkpoints. [Answer: restart; K1]

Students have to prepare answers for the following questions at the end of the lecture

1. Give the two types of checkpoints used in the Koo and Toueg checkpointing algorithm. [2 marks; CO4; K2]
2. What happens if the initiating process Pi learns that not all processes successfully took tentative checkpoints? [2 marks; CO4; K2]
3. Explain the two phases of the Koo and Toueg checkpointing algorithm. What are the key actions taken during each phase? [15 marks; CO4; K3]
4. Discuss the rollback recovery algorithm of Koo and Toueg. Describe its phases and how it ensures a consistent state after a failure. [15 marks; CO4; K3]
Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pages 476-478

Unit No. 4: CONSENSUS AND RECOVERY, Lecture No. 38
Topic: Algorithm for Asynchronous Checkpointing and Recovery
Learning Outcomes (LO): At the end of this lecture, students will be able to (Bloom's Knowledge Level in brackets)
LO1 List the two types of logs maintained in the Juang and Venkatesan algorithm. [K2]
LO2 Demonstrate how the algorithm identifies orphan messages during the recovery process. [K2]
LO3 Describe the asynchronous checkpointing mechanism in the Juang and Venkatesan algorithm. How does it handle the recording of events? [K2]
LO4 Explain the recovery algorithm in the Juang and Venkatesan approach. What steps are taken when a processor restarts after a failure? [K2]

ALGORITHM FOR ASYNCHRONOUS CHECKPOINTING AND RECOVERY:

This lecture presents the algorithm of Juang and Venkatesan for recovery in a system that uses asynchronous checkpointing.
A. System Model and Assumptions

The algorithm makes the following assumptions about the underlying system:
The communication channels are reliable, deliver the messages in FIFO order, and have infinite buffers.
The message transmission delay is arbitrary, but finite.
The underlying computation/application is event-driven: a process P is at state s, receives message m, processes the message, moves to state s' and sends messages out. So the triplet (s, m, msgs_sent) represents the state of P.
Two types of log storage are maintained:
– Volatile log: takes a short time to access but is lost if the processor crashes. Its contents are moved to the stable log periodically.
– Stable log: takes a longer time to access but survives a crash.

A. Asynchronous Checkpointing
– After executing an event, the triplet is recorded without any synchronization with other processes.
– A local checkpoint consists of a set of records, which are first stored in the volatile log and then moved to the stable log.
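A minimal Python sketch of the two logs, assuming records are kept in memory and that `flush` is called periodically by some timer not shown. `EventLog` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

Record = Tuple[Any, Any, Tuple[Any, ...]]  # (state s, message m, msgs_sent)


@dataclass
class EventLog:
    volatile: List[Record] = field(default_factory=list)
    stable: List[Record] = field(default_factory=list)

    def record(self, state: Any, msg: Any, msgs_sent: List[Any]) -> None:
        """Log the triplet after an event, with no coordination with other processes."""
        self.volatile.append((state, msg, tuple(msgs_sent)))

    def flush(self) -> None:
        """Periodically move the volatile records to the stable log."""
        self.stable.extend(self.volatile)
        self.volatile.clear()

    def crash(self) -> None:
        """A processor crash loses the volatile log; the stable log survives."""
        self.volatile.clear()
```

A processor that crashes can therefore restart only from the last record that reached `stable`, which is why STEP (a) of the recovery procedure distinguishes a recovering processor from the others.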
B. The Recovery Algorithm
Notations and data structure
The following notations and data structure are used by the algorithm:

• RCVDi←j(CkPti) represents the number of messages received by processor pi from processor pj, from the beginning of the computation until the checkpoint CkPti.
• SENTi→j(CkPti) represents the number of messages sent by processor pi to processor pj, from the beginning of the computation until the checkpoint CkPti.
Basic idea
• Since the algorithm is based on asynchronous checkpointing, the main issue in the recovery is to find a consistent set of checkpoints to which the system can be restored.
• The recovery algorithm achieves this by making each processor keep track of both the number of messages it has sent to other processors and the number of messages it has received from other processors.
• Whenever a processor rolls back, it is necessary for all other processors to find out if any message has become an orphan message. Orphan messages are discovered by comparing the number of messages sent to and received from neighboring processors.
For example, if RCVDi←j(CkPti) > SENTj→i(CkPtj) (that is, the number of messages received by processor pi from processor pj is greater than the number of messages sent by processor pj to processor pi, according to the current states of the processors), then one or more messages received by pi from pj are orphan messages: their receipt is recorded at pi, but their send has been undone at pj.
C. The Algorithm

When a processor restarts after a failure, it broadcasts a ROLLBACK message announcing that it had failed.

Procedure RollBack_Recovery
processor pi executes the following:

STEP (a)
if processor pi is recovering after a failure then
    CkPti := latest event logged in the stable storage
else
    CkPti := latest event that took place in pi {The latest event at pi can be either in stable or in volatile storage.}
end if

STEP (b)
for k = 1 to N {N is the number of processors in the system} do
    for each neighboring processor pj do
        compute SENTi→j(CkPti)
        send a ROLLBACK(i, SENTi→j(CkPti)) message to pj
    end for
    for every ROLLBACK(j, c) message received from a neighbor j do
        if RCVDi←j(CkPti) > c {Implies the presence of orphan messages} then
            find the latest event e such that RCVDi←j(e) = c {Such an event e may be in the volatile storage or stable storage.}
            CkPti := e
        end if
    end for
end for {for k}
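A runnable Python sketch of Procedure RollBack_Recovery, simulated in synchronous rounds. It assumes every other processor is a neighbor, that each processor's history is given as a list of states with cumulative SENT and RCVD counts (state 0 is the initial state), and that the first `stable_len[p]` states of p are on stable storage. When looking for the new CkPti it takes the latest event whose RCVD count does not exceed c, the reading used in the worked example (RCVDX←Y(ex2) = 1 ≤ 2). `rollback_recovery` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

# history[p][e] describes p's state after event e (e = 0 is the initial state):
#   {"rcvd": {q: RCVD_{p<-q}(e)}, "sent": {q: SENT_{p->q}(e)}}, cumulative counts.
Event = Dict[str, Dict[str, int]]


def rollback_recovery(history: Dict[str, List[Event]], failed: Set[str],
                      stable_len: Dict[str, int]) -> Dict[str, int]:
    """Return the index of the event each processor restarts from."""
    procs = list(history)
    # STEP (a): a failed processor starts from its latest stable event,
    # any other processor from its latest event (stable or volatile).
    ckpt = {p: stable_len[p] - 1 if p in failed else len(history[p]) - 1
            for p in procs}
    # STEP (b): N rounds; every other processor is treated as a neighbor.
    for _ in range(len(procs)):
        inbox: Dict[str, List[Tuple[str, int]]] = {p: [] for p in procs}
        for i in procs:
            for j in procs:
                if j != i:
                    c = history[i][ckpt[i]]["sent"].get(j, 0)  # SENT_{i->j}(CkPt_i)
                    inbox[j].append((i, c))                    # ROLLBACK(i, c) to p_j
        for i in procs:
            for j, c in inbox[i]:
                if history[i][ckpt[i]]["rcvd"].get(j, 0) > c:  # orphan messages present
                    # latest event whose receive count from j does not exceed c
                    ckpt[i] = max(e for e in range(ckpt[i] + 1)
                                  if history[i][e]["rcvd"].get(j, 0) <= c)
    return ckpt
```

For instance, if A sends one message to B, B receives it and replies, A receives the reply, and B then fails having logged only its first receive to stable storage, the sketch rolls A back to the state before it received B's reply and leaves B at its stable state.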
D. An Example

Consider the example shown in Figure 2, consisting of three processors. Suppose processor Y fails and restarts. If event ey2 is the latest checkpointed event at Y, then Y will restart from the state corresponding to ey2.

Figure 2: An example of the Juang-Venkatesan algorithm.

• Because of the broadcast nature of ROLLBACK messages, the recovery algorithm is initiated at processors X and Z.
• Initially, X, Y, and Z set CkPtX ← ex3, CkPtY ← ey2, and CkPtZ ← ez2, respectively, and X, Y, and Z send the following messages during the first iteration:
• Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z;
• X sends ROLLBACK(X,2) to Y and ROLLBACK(X,0) to Z;
• Z sends ROLLBACK(Z,0) to X and ROLLBACK(Z,1) to Y.
Since RCVDX←Y(CkPtX) = 3 > 2 (2 is the value received in the ROLLBACK(Y,2) message from Y), X will set CkPtX to ex2, satisfying RCVDX←Y(ex2) = 1 ≤ 2.
Since RCVDZ←Y(CkPtZ) = 2 > 1, Z will set CkPtZ to ez1, satisfying RCVDZ←Y(ez1) = 1 ≤ 1.
At Y, RCVDY←X(CkPtY) = 1 < 2 and RCVDY←Z(CkPtY) = 1 = SENTZ→Y(CkPtZ).
Y need not roll back further.

In the second iteration, Y sends ROLLBACK(Y,2) to X and ROLLBACK(Y,1) to Z; Z sends ROLLBACK(Z,1) to Y and ROLLBACK(Z,0) to X; X sends ROLLBACK(X,0) to Z and ROLLBACK(X,1) to Y.
If Y rolls back beyond ey3 and loses the message from X that caused ey3, X can resend this message to Y because ex2 is logged at X and this message is available in the log. The second and third iterations will progress in the same manner. The set of recovery points chosen at the end of the first iteration, {ex2, ey2, ez1}, is consistent, and no further rollback occurs.

Assessment questions to the lecture

1. Which of the following best describes a volatile log in the Juang and Venkatesan algorithm? [Answer: B; Bloom's Knowledge Level: K1]
A) A log that remains after a processor crash.
B) A log that is lost if the processor crashes.
C) A log that requires long access times.
D) A log that stores only permanent checkpoints.
2. In the Juang and Venkatesan algorithm, what condition indicates that a message at a neighboring processor has become an orphan? [Answer: C; K1]
A) When the number of sent messages is equal to the number of received messages.
B) When the number of messages received is less than the number of messages sent.
C) When the number of messages received is greater than the number of messages sent.
D) When a processor has no messages in its log.
3. Which of the following is true about the asynchronous checkpointing approach? [Answer: B; K1]
A) It requires synchronization among all processes during checkpointing.
B) It allows processes to log their states independently.
C) It guarantees that no orphan messages will occur.
D) It is less efficient than synchronous checkpointing.
4. What does the algorithm do with messages that are stored in the volatile log? [Answer: C; K1]
A) They are discarded immediately after being processed.
B) They are permanently saved to stable storage.
C) They are transferred to stable logs periodically.
D) They are never used for recovery.
Students have to prepare answers for the following questions at the end of the lecture

1. List the two types of logs maintained in the Juang and Venkatesan algorithm. (2 marks; CO4; Bloom's Knowledge Level: K2)
2. Demonstrate how the algorithm identifies orphan messages during the recovery process. (2 marks; CO4; Bloom's Knowledge Level: K2)
3. Describe the asynchronous checkpointing mechanism in the Juang and Venkatesan algorithm. How does it handle the recording of events? (15 marks; CO4; Bloom's Knowledge Level: K2)
4. Explain the recovery algorithm in the Juang and Venkatesan approach. What steps are taken when a processor restarts after a failure? (15 marks; CO4; Bloom's Knowledge Level: K2)

Reference Book

Ajay D. Kshemkalyani and Mukesh Singhal, Distributed Computing, pp. 478-483.
