0% found this document useful (0 votes)

36 views26 pages

A 161126

a161126

Uploaded by

yarzarmon.yzm

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

36 views26 pages

A 161126

a161126

Uploaded by

yarzarmon.yzm

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 26

Page 2

1. Introduction
Checkpointing and rollback-recovery are well-known techniques that allow
processes to make progress in spite of failures [Rand78]. The failures under con-
sideration are transient problems such as hardware errors and transaction aborts,
i.e., those that are unlikely to recur when a process restarts. With this scheme, a
process takes a checkpoint from time to time by saving its state on stable storage
[Lamp79]. When a failure occurs, the process rolls back to its most recent check-
point, assumes the state saved in that checkpoint, and resumes execution.
We first identify consistency problems that arise in applying this technique to
a distributed system. We then propose a checkpoint algorithm and a rollback-
recovery algorithm to restart the system from a consistent state when failures
occur. Our algorithms prevent the well-known "domino effect" as well as liveness
problems associated with rollback-recovery. In contrast to previous algorithms,
they are fault-tolerant and involve a minimal number of processes. With our
approach, each process stores at most two checkpoints in stable storage. This
storage requirement is shown to be minimal under general assumptions.
The paper is organized as follows: We discuss the notion of consistency in a
distributed system in section 2 and describe our system model in section 3. In sec-
tion 4 we identify the problems to be solved. Sections 5 and 6 contain the check-
point and rollback-recovery algorithms respectively. The algorithms are extended
for concurrent executions in section 7. In section 8 we consider optimizations. Sec-
tions 9 and 10 contain a discussion and our conclusion.

2. Consistent Global States in Distributed Systems

The notion of a consistent global state is central to reasoning about correct-
ness in distributed systems. It was initially studied in [Rand75, Russ77, Pres83]
and later formalized by Chandy and Lamport [Chan85l. We summarise the ideas
in [Chan85]:
In a distributed computation, an euent at a process p can be a spontaneous
change of p's state, or the sending or receipt of a message by p. Event a
directly happens before event b if and only if
(1) there exist states s, 1,2, and ';3 such that event a changes p's process state
from s, to s 2 and event b changes p's process state from s2 to s 3 ; or
(2) event a is the sending of a message m by a process p and event b is the
receiving of m by another process q.

_..--.-....-,.......-....,............-.....,,.........". ........................ "...+....."....."."] i

Page 3

The transitive closure of the directly happens before relation is the happens before
relation. If event a happens before event b, b happens after a. (We abbreviate hap-
pens before, "before" and happens after, 'after".)
The local state of a process at time 0 is its initial state; the local state of a pro-
ces at time t is the state resulting from applying the sequence of events occurring
in(, t to its initial state. If a process has failed by time t, its local state at t is
undefined. A global state of a system: at time t is the set of all processes' local
states at t. The state of a channel at time t is the set of messages sent over that
channel but not yet received at t. We car depict the occurrences of events over
time with a time diagram, in which horizontal lines are time axes of processes,
points are events, and arrows represent messages from the sending process to the
receiving process. In this representation, a global state is a cut dividing the time
diagram into two halves. The channel states are the arrows (messages) that cross
the cut. Figure 1 is a time diagram for a system of four processes.
Informally, a cut (global state) in the time diagram is consistent if no arrow
starts on the right hand side and ends on the left hand side of it. This notion of
consistency fits the observation that a message cannot be received before it is sent
in any temporal frame of reference. For example, the cuts c and c' in Figure 1 are
consistent and inconsistent cuts, respectively. The channel states corresponding to
cut c consists of one message in the channel from p to q, and one in the channel
from s to r. Readers are referred to [Chan85] for a formal discussion of consistent
global states.

3. System Model
The distributed system considered in this paper has the following characteris-
tics:

I..

FIG. 1. Consistent and inconsistent cuts in a distributed system.

,
. , -
.-.
-- , * - ,,, * m
*, . ,. .~'~t, .ia.iadlt * .. .. ..
. . . . . - , - . . . . . . . -. . . . . - . . . . -F. i . 7 , - , r : , - . 2

Page 4

(1) Processes do not share memory or clocks. They communicate via messages
sent through reliable first-;n-first-out (FIFO) channels with variable non-zero
transmission time.
(2) Processes fail by stopping, and whenever a process fails, all other processes are
informed of the failure in finite time. We assume that processes' failures
never partition the communication network.
We want to develop our algorithms under the weakest possible set of assump-
tions. In particular, we do not assume that the underlying system is a database
transaction system ([Fisc82] and [Jose85]). This special case admits simpler solu-
tions: the mechanisms that ensure atomicity of transactions can hide inconsisten-
cies introduced by the failure of a transaction. Furthermore, we do not assume
that processes are deterministic: this simplifying assumption is made in previous
results (e.g., [Stro85] and [Jose85]).

4. Identification of Problems
A checkpoint is a saved state of a process. A set of checkpoints, one per process
in the system, is consistent if the saved states form a consistent global state. For
example, consider the system history in Figure 2. Process p takes a checkpoint at
time X and sends a message to q some time later. After receiving this message, q
takes a checkpoint at time Y. Subsequently, p fails and restarts from the check-
point taken at X. The global state at p's restart is inconsistent because p's local
state shows that no message has been sent to q, while q's local state shows that a
message from p has been received. If p and q are processes supervising a
customer's accounts at different banks, and the message transfers funds from p to
q, the customer will have the funds at both banks when p restarts. This incon-
sistency persists even if q is forced to roll back and restart from its checkpoint
taken at Y.

X failure
P

FIG. 2. Inconsistent checkpoints.

Page 5

The problem of ensuring that the system recovers to a consistent state after
transient failures has two components: checkpoint creation and rollback-recovery;
we examine each one in turn.
4.1. Checkpoint Creation
There are two approaches to creating checkpoints. With the first approach,
processes take checkpoints independently and save all checkpoints on stable
storage. Upon a fMiiure; processes must find and agree upon a. consistent set of
checkpoints among the saved ones. The system is then rolled back to and restarted
from this set of checkpoints (Ande79, Russ8O, Wood81, Hadz82J.
With the second approach, processes coordinate their checkpointing actions
such that each process saves only its most recent checkpoint, and the set of check-
points in the system is guaranteed to be consistent. When a failure occurs, the sys-
tem restarts from these checkpoints ITami84].
A disadvantage of the first approach has long been recognized (Rand75,
Pres831 and is named. the "domino effect". We illustrate this effect in Figure 3. In
this example, processes p and q have independently taken a sequence of check-
points. The interleaving of messages and checkpoints leaves no consistent set of
checkpoints for p and q, except the initial one at {Xo, Y 0}. Consequently, after p
fails, both p and q must roll back to the starting point of the computation. For
time-critical applications that require a guaranteed rate of progress, such as real
time process control, this behavior results in unacceptable delays. An additional
disadvantage of independent checkpoints is the large amount of stable storage
required for the saved states.
To avoid, these drawbacks, we pursue the second approach. In contrast to
£Tami84], our method ensures that when a process takes a checkpoint, a minimal
number of additional processes are forced. to take checkpoints.

X Xt X2 X3 faildu

qI
O r2Y' Y.

FIG. 3. -Domino effect" following a failure.

"..-.
-. .-. '. . . .-' ---................-..-....- ,.."..-.....-".,.."........'.'
.-'.-'.-. ..............-'.-..,..,..-,.-.....-.., -.
Page 6

4.2. Rollback-Recovery
Rollback- recovery from a consistent set of checkpoints appears deceptively
simple. The following scheme seems to work: Whenever a process rolls back to its
checkpoint, it notifies all other processes to also roll back to their respective check-
points. It then installs its checkpointed state and resumes execution. Unfor-
tunately, this simple recovery method has a major flaw. In the absence of syn-
chronization, processes cannot all recover (from their respective checkpoints) simul-
taneously. Recovering processes at different times introduces a liveness problem as
illustrated below.
Consider two processes p and q. Figure 4 illustrates their histories up to the
time p fails. Process p fails before receiving the message n1 , rolls back to its
checkpoint, and notifies q. Then p recovers, it sends m., and receives n 1. After p's
recovery, p has no record of sending m 1, while q has a record of its receipt. There-
fore, the global state is inconsistent. To restore consistency, q must also roll back
(to "forget" the receipt of ml1 ), and notify p. Note that after q rolls back, q has no
record of sending n, while p has a record of its receipt. Hence, the global state is
inconsistent again, and upon notification of q's rollback, p must roll back a second
time. After q recovers, q sends n2 and receives in2 . Suppose p rolls back before
receiving n2 as shown in Figure 5. With the second rollback of p, the sending of
M2 is "forgotten". To restore consistency, q must roll back a second time. After p
recovers it receives n2 , and upon notification of q's rollback, it must roll back a
third time. It is now clear that p and q can be forced to roll back forever, even
though no additional failures occur.
Our ro llback-recovery algorithm solves this liveness problem. It tolerates
failures that occur during its execution, and forces a minimal number of processes
to roll back after a failure. In [Tami84], a single failure forces the system to roll

failure

checkpoints/V

FIG. 4. Histories of p and q tip to p's failure.

Page 7

FIG, S. History of p and q up to p's 2nd rollback.

back as a whole. Furthermore, the system crashes (and does not recover) if a
failure occurs while it is rolling back.

S. Checkpoint Creation

0 5.1. Naive Algorithms

From Figure 2 it is obvious that if every process takes a checkpoint after
every sending of a message, and these two actions are done atomically, the set of'
the most recent checkpoints is always consistent. But creating a checkpoint after
every send is expensive. We may naively reduce the cost of the above method with
a strategy such as "every process takes a checkpoint after every k sends, k > 1" or
"every process takes a checkpoint on the hour". However, the former can be shown
to suffer domino effects by a construction similar to, the one in Figure 3, while the
latter is meaningless for a system that lacks perfectly synchronized clocks.

5.2 Classes of-Checkpoints

Our algorithm saves two kinds of checkpoints on stable storage: permanent
and tentative. A permanent checkpoint cannot be undone. It guarantees that the
computation needed to reach. ther checkpointed state will not be repeated. A tenta-
tive checkpoint, hoL wer, can be undone or changed to be a permanent checkpoint.
When the context is clear, we call permanent checkpoints 'checkpoints".
Consider a system with, a consistent set of permanent checkpoints. A check-
point algorithm is resilient to- failures if the set of permanent checkpoints in the
system is still consistent after the algorithm terminates, even if some processes fail
during its execution. Consider systems where processes cannot afford to take a
checkpoint after every send, or systems where processes cannot combine the send-
ing of a message and the taking of a checkpoint atomically. For these systems,
checkpoint algorithms must store at least two checkpoints in stable storage in
Page 9

rest of the proof is analogous to Case 1. 12

Theorem 1 shows that besides the impractical 'naive" algorithm described in sec-
tion 5.1, any resilient checkpoint algorithm must store at least two checkpoints on
stable storage.

53. Our Checkpoint Algorithm

We assume that, the algorithm is invoked by a single process that wants to
take a permanent checkpoint. We also assume that no failures occur in the system.
I. section 5.3.4 we extend. the algorithm to handle failures, and in section 7 we
describe concurrent invocations of this algorithm.

53.L Motivation
To create consistent checkpoints, processes can execute an algorithm that is
patterned on two-phase-commit protocols. In the first phase, the initiator q takes a
tentative checkpoint and requests all processes to take tentative checkpoints. If q
learns that all processes have taken tentative checkpoints, q decides all tentative
checkpoints should be made permanent; otherwise, q decides tentative checkpoints
should be discarded. In the second phase, q's decision is propagated and carried out
by all processes. Since all or none of the processes take permanent checkpoints, the
most recent set of checkpoints is always consistent.
However, our goal is to force a minimal number of processes to take check-
points. The above algorithm is modified as follows: A process p takes a tentative
checkpoint after it receives a checkpoint request from q only if q's tentative check-
point records the receipt of a message from p, while p's latest permanent check-
point does not record the sending of that message. Process p determines whether
this condition is true using the label appended to q's request. This labeling scheme
is described below.
Messages that are not sent by the checkpoint or rollback-recovery algorithms
are system messages. Every system message n contains a label m.l. Each process
appends outgoing system messages with monotonically increasing labels. We
define I and 7 to be the smallest and largest labels, respectively. For any
processes r and p, let m be the last message that r received from p after r took its
last permanent or tentative checkpoint. Define:

lasLnrmsg (p) - { Lif mn exists

otherwise
Also, let m be the first message that r sent to process p after r took its last

-- ",' ' . ,' . " . ' " ." . . € "' - " m . . . . . .. .m .. ... ' " -. . . . . .i''
Page 10

permanent or tentative checkpoint. Define:

.-if m exists
first_smsgr(p) _ otherwise

When q requests p to take a tentative checkpoint, it appends lastrmsgq(p) to its

request; p takes the checkpoint if /<first-smsgp(q)<_1astrMsgq(p).

5.3.2. Description
Process p is a ckpt-cohort of q if q has taken a tentative checkpoint, and
last_rmsgq(p)>1_ before the tentative checkpoint was taken. The set of
ckptcohorts of q is denoted ckpt.cohortq. Every process p keeps a variable
willing.to-ckpt, to denote its willingness to take checkpoints. Whenever p cannot
be interrupted to run the checkpoint algorithm, willing-to.ckptP is "no". The ini-
tiator q starts the checkpoint algorithm by making a tentative checkpoint and
sending a request "take a tentative checkpoint and last_rmsgq(p)" to all
pEckpt-cohortq. A process p znherits this request if willing_.tockptp is "yes" and
lastrrmsgq(p)-firstsmsgp(q)>-L. After p inherits a request, it takes a tentative
checkpoint and sends "take a tentative checkpoint and last-rmsgp(r)" requests to
all rEckpt-cohort.. If p receives but does not inherit a request from q, p replies
willing.to-ckpt, to q.
After p sends out its requests, it waits for replies that can be either "yes" or
-no", indicating a ckpLcohort's acceptance or rejection of p's request. If at least one
reply is "no", willtng.to-ckptp becomes "no"; otherwise willing-to-ckptp is
unchanged. Process p then sends willing__to-ckptp to the process whose request p
has inherited.
If all the replies from its ckpLcohorts arrive and are all "yes", the initiator
decides to take all tentative checkpoints permanent. Otherwise the decision is to
undo all tentative checkpoints. This decision is propagated in the same fashion as
the request "take a tentative checkpoint" was delivered. Between the times a pro-
cess p takes a tentative checkpoint and it receives the decision from the initiator, p
does not send any system messages. Also, after processes take new permanent
checkpoints, they may discard their previous checkpoints.
The algorithm is presented in Figure 7. For simplicity, we create a fictitious
process called daemon to assume the initiation and decision tasks of -he initiator.
In practice, daemon is a part of the initiator process.

await does not prevent a process from receiving messages.

Page 11

Daemon process:

send(initiator, 'take a tentative checkpoint and T");

await(initiator, reply);'
if reply = "yes" them
send(initiator, "take tentative checkpoint permanent')
else
send(initiator, "undo tentative checkpoint")
ft.

All processes :

INITIAL STATE:
first.smsg,(daemon) = T-,
w'yest if p is willing to take a checkpoint
wiling..to..ckpt "no" otherwise

UPON RECEIPT OF "take a tentative checkpoint and lasL.rmsgq(p)" from q DO

if willing-to-xkptp and lasLjrmsgq(p)-firs.smsgp(q) >.L then
take a. tentative checkpoint;
for all uEckpLcohortp, send(u, 'take a tentative checkpoint and 1asLtrmsgp(u)");
for all uEckpt-cohort., await(u, willing-to-ckptu);
if 3 uEckpt-cohort., willingtoA..ckpt, f lnow then willing-to.ckptpy- "no" f8;

send(q, willing-l-kpt p);

UPON FIRST RECEIPT OF m ="take tentative checkpoint permanent" or

m ="undo tentative checkpoint" DO
if n ="take tentative checkpoint permanent" then
take tentative checkpoint permanent;
else
undo tentative checkpoint;

for all uEckpt-cohort., send(u, m);

odL

FIG. 7. Algorithm CI: the Checkpoint Algorithnm

. .. . *.
Page 12

5.3.3. Proof of Correctness

We consider a single invocation of the algorithm, and we assume no process
fails in the system.
Lemma 2. Every process inherits a request to take a tentative checkpoint at most
once.
Proof- Immediately after a process p inherits a request it takes a tentative
checkpoint. From the time p takes this checkpoint to the time it
receives the initiator's decision, p does not send any system messages.
Therefore, during this interval of time, first_smsgp(q)=.± for all q.
Process p does not inherit additional requests during the execution of
the algorithm. Q
Lemma 3: Every process terminates its execution of Algorithm C1.
Proof: Any process that executes C1 without making a tentative checkpoint
clearly terminates. Let p be a process that takes a tentative check-
point. By lemma 2, p inherits a request to take a tentative checkpoint
at most once. Consequently, to prove that C1 terminates at p, it suffices
to prove that after p takes a tentative checkpoint, it does not wait for-
ever for either the "yes" or "no" from its ckpt.cohorts, or the initiator's
decision.

Let q be a ckpt._cohort of p. If q inherits p's request to take a tentative

checkpoint, it sends willing.Jockptq to p before it waits for the
initiator's decision. If q, on. the other hand, does not inherit p's request,
it sends p willing-o.ckptq immediately after receiving p's request.
Therefore, there can be no deadlock involving p waiting for replies from
its ckpLcohorts and a ckpLcohort of p waiting for the initiator's deci-
sion.

Suppose that p is in a deadlock waiting for replies from its

ckpL.cohorts. Then there exists a circular chain of processes p=po,
ph (kal) such that for 0i_<k, pi waits forever for its
ckpt-cohort, p, + k,
,1md to send wifhing.to.ckptp., If p, waits for-
ever for p, + I od k p, +I wd k must have inherited a request from p,.
Since the initiator does not inherit any requests, it is not in the chain.
And since there is only one initiator, there must exist a process q such
that for some i, p, inherits a request from q, and q ep for all i. But p,

:-""-:-
- : -.'.:
: "' "- :.''.'' i" . '':/'-:- :.- ',' -,.-." - "'.. . . . . . "-"":"
. . ..-. - .:?'"' "

" "
_ :.,: , " "-. . .,, ***** .***.* iai"d m non - '" "' ' "' '.. - '- '" " "n .. "" .
"b. '. - '"
. u. - W w :r ".- 7. . -_ - - . .. .' . . .. .-- .
-r .,., . .. -.. i
-.. . , .,

Page 13

contradicts lemma 2 by inheriting two requests: one from q and one

from Pg-L ,. 1, Consequently, no deadlock can exist and p will receive
replies from all its ckpL-cohorts.

Since every process receives replies from all its ckpt-cohorts, the initia-
tor will receive replies from all its ckpt-cohorts to decide on the tenta-
tive checkpoints. Its decision is guaranteed to reach all processes that
have taken tentative checkpoints because all processes will pass on the
decision and messages are always delivered. Thus we have shown that
n& process waits forever for replies from its ckpt-cohorts or the
initiator's decision. 0
The next lemma shows that C1 takes a consistent set of checkpoints.
Lemma 4: If the set of checkpoints in the system is consistent before the execution
of Algorithm CI, the set of checkpoints in the system is consistent after
the termination of CL
Proof. Without loss of generality, assume new checkpoints are taken in C1.
The proof is by contradiction. Suppose the set of checkpoints after C1
terminate is not consistent. Then there must exist two processes p and
q suchr that p sent q a message m after making its permanent check-
point, and q received m before making its permanent checkpoint. Since
al checkpoints are consistent before the execution of C1, q must have
taken its permanent checkpoint during this execution. Before q took a
tentative checkpoint in C1, last..rmsgq(p) ti.1; therefore, p was in
ckpt-cohortq and received a request to take a tentative checkpoint from
q. When p received the request, willingto.ckpt, had to be "yes"
because q cannot have taken its tentative checkpoint permanent other-
wise. Moreover, if p had not taken a tentative checkpoint when q's
request arrived, t8Lrrmg (p) afirs .smsg.(q) because
first.smsgp(q) 5m. Hence, process p took a tentative checkpoint after
sending m. Process p, however, must take its tentative checkpoint per-
manent if q takes its permanent. Consequently p takes a permanent
checkpoint after sending m, a contradiction. 0
We now show that the nuriber of processes that take new permanent check-
points during the execution of Algorithm C1 is minimal- Let P-pol,Pi -. , Pk}
be the set of processes that take new permanent checkpoints in CI, where Po is the
initiator of CL Let C(P)={c(po), c(p 1 ), • c(pk)j be the permanent checkpoints
Page 14

taken by processes in P. Define an alternate set of checkpoints:

C'(P)={c'(po), c'(pl), ", c'(pk)l where c'(po) = c(po) and for 1:5i~k, c'(pi)=
either c(pi) or the checkpoint pi had before executing C1.
Theorem 5: C'(P) is consistent if and only if C'(P)-C(P).
Proof. Without loss of generality, assume IPI a-2. The if part is by lemma 4.
We show the only if part by contradiction. Suppose C'(P) eC(P) and
C'(P) is consistent. Then there exists a nonempty subset Q of P such
that for all process q in Q, c'(q)ec(q). For any processes p and q, if p
inherits a checkpoint request from q, q's tentative checkpoint is taken
before p's. Therefore, the inherit relation is non-circular. Because of
this non-circularity and the fact that the initiator is in Q (since
c'(po)=c(po)), there exists pjEQ such that pi inherits a checkpoint
request from another process pj EQ. Since pi EP implies pj EP, we know
that c'(p1 ) = c(pj).

When pi inherits pj's request, lastrmsgpj(ps) a first.smsgp(pj)>i.

There exists a message m such that last_rmsg,(p) =m.l. In C'(P), the
sending of m is not recorded in c'(pi) since m.l Z-first..smsgp(pj), but the
receipt of m is recorded in c'(pj). Contrary to the assumption, C'(P) is
not consistent. []
Theorem 5 shows that if po takes a checkpoint, then all processes in P must take a
checkpoint to ensure global consistency.

5.3.4. Coping with Failures

We now extend Algorithm C1 to handle processes' failures. We first consider
the effects of failures on non-faulty processes. When failures occur, a non-faulty
process may receive zero or more of the following messages:
(1) "yes" or "no" from ckpt-cohorts,

(2) "take tentative checkpoint permanent' or "undo tentative checkpoint" from

the initiator.
Suppose process p fails before replying "yes" or "no" to process q's request. By
the assumptions of section 3, q will know of p's failure. Process q can then assume
that p is unwilling to take a permanent checkpoint. This assumption is correct
even if p has taken a tentative checkpoint before it fails, as long as p undoes its
tentative checkpoint when it recovers (see section 5.5). Therefore, to take care of
the case of a missing "yes" or "no", it suffices to change the line in CI from

•
*. . - . . . . .. •,. . . o. . o . . o . q . . . . . . - . . i - o - - . ° . .' - °
Page 15

if 3 uEckpt.cohortp, willinga.kpt. = "no" then willingJockpt.4- "no" fB

to
if 3 v Eckpt.cohort., willing-tockpt, = "no" or u has failed then
willing-to.ckpt,+- t"no" &

Suppose that a process p does not receive the decision regarding its tentative
checkpoint. Ifp undoes its tentative checkpoint or takes it permanent, it risks con-
tradicting the initiator. A common practice in this situation is to have p blocked
until it discovers the initiator's decision [Skee82I. We will discuss ways to obviate
blocking in section S.
We now consider the recovery of faulty processes. When a process restarts
after a failure, its latest checkpoint on stable storage may be tentative or per-
manent. If this checkpoint is tentative, the recovering process must decide whether
to discard it or to take it permanent. The decision is made as follows:
Suppose the recovering process is the initiator. The initiator knows that every
process that has taken a tentative checkpoint is still blocked waiting for its deci-
sion. Hence it is safe for the initiator to decide to undo the tentative checkpoints
and send this decision to its ckpt.coharta
If the recovering process is not the initiator, it must discover the initiator's
decision regarding tentative checkpoints. It may contact either the initiator or
those processes of which it is a ckpt-cohort; it follows the decision accordingly.
Now the recovering process is left with one permanent checkpoint on stable
storage. Recovery is complete when it uses the rollback-recovery algorithm to be
presented in section .6 to restart from this checkpoint.
Let C2 be the Algorithm C1 as modified above. C2 terminates if all processes
that fail during the execution of C2 recover. At termination, the set of checkpoints
in the system is consistent, and the number of processes that took new permanent
checkpoints is minimal The proofs for these properties are similar to those of CI
and are omitted.

6. Rollack-Recovery
We assume that the algorithm is invoked by a single process that wants to roll
back and recover (henceforth. denoted restart). We also assume that the checkpoint
algorithni and the rollback-recovery algorithm are not invoked concurrently. Con-
current invocations of the algorithms are described in section 7.

- . 'm'i m 'l l l iil..l.. ..

Page 16

6.1. Motivation
The rollback-recovery algorithm is patterned on two-phase-commit protocols.
In the first phase, the initiator q requests all processes to indicate their willingness
to restart from their checkpoints. Process q decides to restart all the processes if
and only if they are all willing to restart. In the second phase, q's decision is pro-
pagated and carried out by all processes. We will prove that the two-phase struc-
ture of this algorithm prevents the liveness problem discussed in section 4.2. Since
all or none of the processes restart, when the rollback-recovery algorithm ter-
minates the global state is consistent.
However, our goal is an algorithm that rolls back a minimal number of
processes in order to recover from a failure. If a process p rolls back to a state
saved before an event e occurred, we say that e is undone by p. With our algo-
rithm, process p must restart only if q's rollback will undo the sending of a mes-
sage to p. Process p determines if it must restart using the label appended to q's
request.
For any processes r and p, let m be the last message that r sent to p before r
took its latest permanent checkpoint. Define
m.m if m exists
last-smsgr(p) T otherwise
When q requests p to restart, it appends last_smsgq(p) to its request. Process p
restarts from its permanent checkpoint if last-rmsgp(q) >last.smsgq(p).

6.2. Description
Process p is a roll -cohort of q if q can send messages to it. The set of roll-
cohorts of q is roll-cohortq2 . Every process p keeps a variable willing.Jo...rollP to
denote its willingness to roll back. The initiator q starts the rollback-recovery algo-
rithm by sending a request "prepare to roll back and lastsmsgq(p)" to all
pEroll-cohortq. A process p inherits this request if willing.to..rollp is "yes",
lastrmsgp(q)>last__smsgq(p), and p has not already inherited another request to
roll back. After p inherits the request, it sends "prepare to roll back and
last.smsgp(r)" to all. rEroll -cohort ,; otherwise, it replies willing._to-rollP to q.

'The relationship between roll -cohort and ckpt-cohort is not symmetric. if p is a ckpt-cohort of
q, lastrm.s 4 (p)>I and q must then be a roll -cohort of p. On the other hand, it is possible that
pfckpt-cohortq but qEroll -cohort:, because p can but does not send messages to q.
Page 17

After p sends out its requests, it waits for replies from each process in
rol-cohort.,. The reply can be an explicit "ys or "no message, or an implicit
ttno" when p discovers that r has failed. If at least one reply is "no",
willing-Jo.roll,, becomes "no", otherwise willingjo-rollp is uinchanged. Process p
then sends wiling-to..roll, to the process whose request p inherits. Between the
times p inherits the rollback request and it receives the decision from the initiator,
it does not send any system messages.
If all the replies from its roll-cohorts arrive and are all "yes", the initiator
decides the rollbacks will proceed, otherwise it decides no process will roll back.
This decision is propagated to all processes in the same fashion as the request
"tprepare to roll back"' is delivered. Process p blocks waiting for the discovery of the
initiator's decision, if failures prevent the decision from reaching p. We discuss
non-blocking algorithms in section 8.
The rollback-recovery algorithm is presented in Figure 8. Like the presenta-
tion of Algorithm C1, we introduce a fictitious process called daemon to perform
functions that are unique to the initiator of the algorithm.

6.3. Proof of Correctness

We first assume that the rollback-recovery algorithm is invoked by a single
process that wants to restart. The variable ready-Jo..roll ensures that a process p
inherits at most one request to roll back. Therefore, to prove the termination of
Algorithm R, it suffices to show that Algorithm R is free of deadlock and it rolls
each process back at most once.
Lemma 6: Algorithm R is deadlock free.
Proo Similar to the proof of lemma 3. E
Lemma 7: Every process in the system rolls back at most once.
Proof- Without loss of generality, assume that the initiator decides to roll
back. The initiator receives replies from all its roll-cohorts only after
all processes have received replies from all their respective roll-cohorts.
Therefore, should a process *p receive a rollback request from another
process q after p has received the initiator decision, the initiator must
have decided to roll back before it received all the replies from its roll-
cohorts, an impossibility. C
We next show that for each send event that is undone in Algorithm R, its
corresponding receive event is also undone.
Page 18

Daemon process:

send(initiator, "prepare to roll back and 1_");

await(initiator, reply,-
if reply = "yes" then
send, initiator, "roll back")
else
send(initiator, "do not roll back")
ft.

All processes p:

INITIAL STATE:
ready_to_.ro1I =
" true;
ast..rmsg, daenon) = T;
="yes" if p is willing to roll back
Lwdhnlg.torol1 = l~no" otherwise

UPON RECEIPT OF "prepare to roll back and last_smsgq(p)" from q DO

if tviling-toJroll and last_rmsgP(q) >last_smsgq(p) and readyto.roll. then
readvyto-roll,.--false;
for all rEroll-cohort,, send(r, "prepare to roll back and last_smsg.(r)");
for all rEroll -cohort,, await{r, LLallin-toroll .);
if 3 r~roll-cohortp, ',lling-to-roll_ = "no" or r has failed
then wvilling.toroll ,- "no" fi;
fi:
send(q, willing-to-roll,,);
od;

UPON RECEIPT OF m ="roll back" or

m ="do not roll back" and readvto-roll, = true DO
if m = "roll back" then
roll back to p's permanent checkpoint;
else
resume normal execution:
fi:
for all rEroll -cohort,,, send(r, rn);
od:

FIG 8. Alortthrr R: the Rollback Algorithm

-- . . . .. . . .J
Page 19

Lemma : After every process has terminated its execution of Algorithm R, for
each. send event that was undone, its corresponding receive event was
also undone.
Proof. Without loss of generality, assume that the initiator decides to roll
back. The proof is by contradiction. Suppose that after Algorithm R
terminates, there exists a message m such that the receiver p did not
undo the receipt of n while the sender q undid the sending of M. First,
we show that p inherited a request to roll back. Since q cannot send
system messages after inheriting a rollback request, q must have sent
mnbefore inheriting the request. And since q undid the sending of m,
rn.I >Iastsmsgq(p). Therefore, when p receives q's request,
lasLtrmsg,(q) 2.m.1 >las.snsgq(p). In addition, the variable
willing.toroll, must have been "yes"; otherwise the initiator cannot
have decided to roll back. Consequently, when q's request reached p,
either p had. already inherited a rollback request or it inherited q's
request.

Next we show that p undid the receipt of n. Since p and q received

the same decision, p rolled back. There are two cases to consider-

Case 1: m reached p after p inherited a rollback request. Obvious.

Case : m reached p before p inherited a rollback request. The receipt

of m was not undone only if after receiving m and before inheriting a
rollback request, p took a permanent checkpoint. However, if p took a
permanent checkpoint after receiving m while q did not take a per-
manent checkpoint after sending m (since q can undo the sending of
m), lemma 4. will be contradicted.

In all cases, p undoes the receipt of m when it rolls back, contradicting

our assumptio. Q
Lastly, we show that a minimal number of processes roll back in Algorithm P.
Let P be the set of processes in the system that roll back.
Theorem 9: After Algorithm R terminates, for each send event that is undone, its
corresponding receive event is undone if and only if for all nonempty
QCP such that Q does not contain the initiator, all processes in Q roll
back.

. . . . . •. - o . . . . .
.- ° , .-. . . . . . .-. . . .• ... ,. , ..-. •-.-.-.,. o . . ., , - -
Page 20

Proof Without loss of generality, assume IPI a:2. The if part is by lemma 8.
We show the only if part by contradiction. Suppose that there exists a
Q such that even if all processes in Q do not roll back, for each send
event that is undone by Algorithm R, its corresponding receive event is
undone. For any processes p and q, if p inherits a rollback request
from q, ready-to-rollq becomes true before ready-oroll, becomes true.
Therefore, the inherit relation is non-circular. Because of this non-
circularity and the fact that the initiator is in Q, there exists q EQ such
that q inherits a rollback request from another process p outside of Q.
Since q EP, p EP. When q inherits p's request,
lastrmsgq(p)>last_.smsgp(q). Let m be the message such that
m.l=last_.rmsgq(p). If processes in Q do not roll back while those in
P-Q do, p undoes the sending of m while q does not undo the receipt
of m, a contradiction.

7. Interference
In this section, we consider concurrent invocations of the checkpoint and
rollback-recovery algorithms. An execution of these algorithms by process p is
interfered with if any of the following events occur:
(1) Process p receives a rollback request from another process q while executing
the checkpoint algorithm.
(2) Process p receives a checkpoint request from q while executing the rollback-
recovery algorithm.
(3) Process p, while executing the checkpoint algorithm for initiator i, receives a
checkpoint request from q, but q's request originates from a different initiator
than i.
(4) Process p, while executing the rollback-recovery algorithm for initiator i,
receives a rollback request from q, but q's request originates from a different
initiator than i.
One single rule handles the four cases of interference: once p starts the execu-
tion of a checkpoint [rollback] algorithm, p is unwilling to take a tentative check-
point [roll back] for another initiator, or to roll back [take a tentative checkpoint].
As a result, in all four cases, p replies "no" to q. We can show that this rule
suffices to guarantee that all previuas lemmas and theorems hold despite con-
current invocations of the algorithms. This rule can, however, be modified to per-
mit more concurrency in the system. The modification is that in case (1), instead of
Page 21

sending nor to q, p can begirL executing the rollback-recovery algorithm after it

finishes the checkpoint algorithm. We cannot, however, apply a similar
modification in case (2) lest deadlock may occur.

8. Optimization
When the initiator of the checkpoint or of the rollback-recovery algorithm fails
before propagating its decision to its cohorts, it is desirable for processes not to
block waiting for its recovery. To prevent processes from blocking, we can modify
our algorithms by replacing the underlying two-phase commit protocol with a non-
blocking three-phase commit protocol [Skee82J. However, non-blocking protocols
are inherently more expensive than blocking ones [Dwor83].
We now address the following problem: after a ckpt-cohort q of a process p
fails, p is unable to take a permanent checkpoint until q recovers (p cannot know if
the latest checkpoint of q records the sendings of all messages it received from q).
To avoid waiting for q's recovery, p can remove q from ckpt-cohortp by restarting
frour its checkpoint (using the rollback-recovery algorithm). Thereafter, process p
can take checkpoints.

9. Message Loss.
Rollback-recovery can cause message loss as illustrated in Figure 9. When p
is rolled back to X following a failure at F, the global state is consistent, but the
message m from q is lost It is lost because the set of checkpoints {X, Y}
corresponds to a consistent global state with m in the channel.
One method to circumvent message loss requires that processes use transmis-
sion protocols that transform lossy channels to virtual error-free channels, e.g.,
sliding window protocols [Tane81]. Another method is to ensure that the most
recent set of checkpoints corresponds to a consistent global state with no messages

railum

q
F
FIG. 9. Message loss following p's rollback to X.

K. ... ................... .... .

Page 22

in the channels. We can modify the checkpoint and rollback-recovery algorithms to

satisfy this condition, but this modification increases the number of processes that
are forced to take checkpoints and roll back.

10. Conclusion
We have presented a checkpoint algorithm and a rollback-recovery algorithm
to solve the problem of bringing a distributed system to a consistent state after
transient failures. In contrast to previous algorithms, they tolerate failures that
occur during their executions. Furthermore, when a process takes a checkpoint, a
minimal number of additional processes are forced to take checkpoints. Similarly,
when a process restarts after a failure, a minimal number of additional processes
are forced to restart with it. We also show that the stable storage requirement of
our algorithms is minimal.
Acknowledge ments We would like to thank Amr El Abbadi, Ken Birman, Rance
Cleaveland, and Jennifer Widomn for commenting on earlier drafts of this paper.
Page 23

Bibliography

Ande79 T. Anderson, P. A. Lee and S. K. Shrivastava, System fault tolerance, in

Computing System Reliability, T. Anderson, B. Randell (eds.) Cambridge
University Press, Cambridge, 1979, pp. 153-210.
Bria84 D. Briatico, A. Ciuffoletti, and L. Simoncini, A distributed domino-effect
free recovery algorithm, Proc. of the 4th Symposium on Reliability in
DistributedSoftware and Database Systems, October 1984.
Chan86 K. W.Cbandy and L. LamporM, Distributed snapshots: Determining glo-
bal states of distributed systems, ACM Transactions on Computer"Sys-
tems, vol. 3, no. 1, pp. 63-75, February 1985.
Dwor84 C. Dwork and D. Skeen, The inherent cost of nonblocking commitment,
Proc. ACM Symposium on Principles of Database Systems, March 1983.
Fisc82 M. Fischer, N. Griffeth, and N. Lynch, Global states of a distributed sys-
tem, IEEE Transaction on Software Engineering, May 1982, pp. 198-202
Hadz82 V. Hadzilacos, An algorithm for minimizing rollback cost, Proc. ACM
Symposium on Principlesof DatabaseSystems, March 1982.
Jose85 T. Joseph and K. Birman, Low cost management of replicated data in
fault-tolerant distributed systems, To appear in TOCS.
Lamp78 L Lamport, Time, clocks and the ordering of events in a distributed sys-
tem, Communications of the ACM, vol. 21, no. 7, July 1978, pp. 558-565.
Lamp79 B. Lampson and. H. Sturgis, Crash recovery in a distributed storage sys-
tem, Xerox PARC Tech. Rep., Xerox Palo Alto Research Center, April
1979.
Pres83 D. L. Presotto, Publishing: A reliable broadcast communication mechan-
ism, Tech. Rep. UCB/CSD 83-165, Computer Science Division, University
of California,Berkeley, December 1983.
Rand75 B. Randell, System structure for software fault tolerance, IEEE Transac-
tions On Software Engineering, vol. SE-I, no.2, June 1975, pp. 226-232.
Rand7& B. Randell, P.A. Lee, and P.C. Treleaven, Reliability issues in computing
system designACM Computing Surveys, vol. 10, no. 2, June 1978, pp.
123-166.
Russ77 D. L. Russell, Process backup in producer-consumer systems, Proc. ACM
Symposium on Operating Systems Principles, November, 1977.

-A
Page 24

Russ8O D. L. Russell, State restoration in systems of communicating processes,

IEEE Transactions on Software Engineering, vol. SE-6, no. 2, March
1980, pp. 183-194.
Skee82 D. M. Skeen, Crash recovery in a distributed database system, Ph.D.
dissertation. Computer Science Division, University of California, Berke-
ley, 1982.
Stro85 R. Strom and S. Yemini, Optimistic recovery in distributed systems,
Transactions on Computer Systems, August 1985, pp. 204-226.
Tami84 Y. Tamir and C. H. Sequin, Error recovery in multicomputers using glo-
bal checkpoints, Proc. of 13th International Conference on Parallel Pro-
cessing, August 1984.
TaneSI A. S. Tanenbaum, Computer Networks, Prentice Hill, New Jersey, 1981,
pp. 148-164.
Wood8l W. G. Wood, A decentralized recovery control protocol, Proc. of the 11th
Annual International Symposium on Fault-Tolerant Computing, June
1981.

• 5"..
. "- -.'.. ..
.""-••- -. '.............. ....... ..- .. i.......:- -.... '..... ... :.-...]
:. .....
- . .,.G
. .. .-- --.... '. a ,.::. -, .- : -: ,, ,,.-. . - -. . - . . -.. . .
, .
,,

Checkpointing and Rollback Recovery For Distributed Systems 5cvcuy5txm
No ratings yet
Checkpointing and Rollback Recovery For Distributed Systems 5cvcuy5txm
23 pages
Ordering and Consistent Cuts
No ratings yet
Ordering and Consistent Cuts
35 pages
System Recovery
No ratings yet
System Recovery
38 pages
Dc-3551 Unit IV Notes
No ratings yet
Dc-3551 Unit IV Notes
32 pages
Unit 4
No ratings yet
Unit 4
32 pages
Distributed UNIT IV
No ratings yet
Distributed UNIT IV
60 pages
Rishi Distributed System
No ratings yet
Rishi Distributed System
6 pages
Fault Tolerance:-: Introduction, Process Resilience, Distributed Commit, Recovery
No ratings yet
Fault Tolerance:-: Introduction, Process Resilience, Distributed Commit, Recovery
52 pages
Chap 14
No ratings yet
Chap 14
30 pages
4th Unit Topics Recovery
No ratings yet
4th Unit Topics Recovery
73 pages
Unit - Iv
No ratings yet
Unit - Iv
10 pages
Unit 4 Part 3
No ratings yet
Unit 4 Part 3
21 pages
Rohini 40997394702
No ratings yet
Rohini 40997394702
6 pages
4 Distributed Algorithms
No ratings yet
4 Distributed Algorithms
176 pages
DC Module 2
No ratings yet
DC Module 2
76 pages
Roll Back and Recovery Mechanisms
No ratings yet
Roll Back and Recovery Mechanisms
12 pages
A Minimum-Process Coordinated Checkpointing Protocol For Mobile Distributed System
No ratings yet
A Minimum-Process Coordinated Checkpointing Protocol For Mobile Distributed System
10 pages
Unit 4 Part 2
No ratings yet
Unit 4 Part 2
21 pages
Unit-3 Part2
No ratings yet
Unit-3 Part2
74 pages
GlobalSnapshot ds14
No ratings yet
GlobalSnapshot ds14
31 pages
Unit Iv Recovery
No ratings yet
Unit Iv Recovery
27 pages
Chapter 8 Fault Tolerance
No ratings yet
Chapter 8 Fault Tolerance
20 pages
Lm2-Rollback & Recovery
No ratings yet
Lm2-Rollback & Recovery
34 pages
Distributed Systems Snapshots
No ratings yet
Distributed Systems Snapshots
65 pages
Distributed Systems Analysis
No ratings yet
Distributed Systems Analysis
18 pages
Midterm Solutions
No ratings yet
Midterm Solutions
8 pages
Distributed System Recovery Guide
No ratings yet
Distributed System Recovery Guide
119 pages
A Low Overhead Minimum Process Global Snapshop Collection Algorithm For Mobile Distributed System
No ratings yet
A Low Overhead Minimum Process Global Snapshop Collection Algorithm For Mobile Distributed System
19 pages
M.Tech Course Distributed Computing
No ratings yet
M.Tech Course Distributed Computing
117 pages
Election Algorithm Coordinator
No ratings yet
Election Algorithm Coordinator
7 pages
Distributed Environemnt
No ratings yet
Distributed Environemnt
51 pages
Consistent Global States of Distributed Systems Fundamental Concepts and Mechanisms
No ratings yet
Consistent Global States of Distributed Systems Fundamental Concepts and Mechanisms
43 pages
Failure Recovery in Distributed Systems
No ratings yet
Failure Recovery in Distributed Systems
24 pages
Cs3551 Unit IV Notes
No ratings yet
Cs3551 Unit IV Notes
34 pages
Reliable Communication in The Presence of Failures
No ratings yet
Reliable Communication in The Presence of Failures
17 pages
Distributed System Lecture 5
No ratings yet
Distributed System Lecture 5
24 pages
CST402 Scheme
No ratings yet
CST402 Scheme
9 pages
Distributed Systems: Global State
No ratings yet
Distributed Systems: Global State
31 pages
The Drinking Philosophers Problem: K. M. Chandy and J. Misra University of Texas at Austin
No ratings yet
The Drinking Philosophers Problem: K. M. Chandy and J. Misra University of Texas at Austin
15 pages
Lecture 3
No ratings yet
Lecture 3
22 pages
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms
No ratings yet
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms
33 pages
CheckpointingRecovery ds14
No ratings yet
CheckpointingRecovery ds14
35 pages
Consensus Algorithms for Blockchain
No ratings yet
Consensus Algorithms for Blockchain
32 pages
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms
No ratings yet
Consistent Global States of Distributed Systems: Fundamental Concepts and Mechanisms
53 pages
DC (Unit 4)
No ratings yet
DC (Unit 4)
14 pages
Unit 4 Part 3
No ratings yet
Unit 4 Part 3
33 pages
Unit II 2 Marks With Answer
No ratings yet
Unit II 2 Marks With Answer
3 pages
Chapter 8-Fault Tolerance
100% (1)
Chapter 8-Fault Tolerance
71 pages
Clocks DOS
No ratings yet
Clocks DOS
9 pages
DC Unit4
No ratings yet
DC Unit4
32 pages
SES Algorithm
No ratings yet
SES Algorithm
23 pages
DSCC Unit 2 PDF
No ratings yet
DSCC Unit 2 PDF
10 pages
Distributed Systems: Consensus & Fault Tolerance
No ratings yet
Distributed Systems: Consensus & Fault Tolerance
10 pages
Session 32
No ratings yet
Session 32
3 pages
Kaku - Time and Global State Seminar
No ratings yet
Kaku - Time and Global State Seminar
8 pages
Byzantine Generals Problem MIT6 - 852JF09 - Lec04 (Ocw - Mit.edu)
No ratings yet
Byzantine Generals Problem MIT6 - 852JF09 - Lec04 (Ocw - Mit.edu)
48 pages
Unit 4
No ratings yet
Unit 4
94 pages
Chandy Lamport
No ratings yet
Chandy Lamport
13 pages
SQL Injection Ethical Hacking Guide
No ratings yet
SQL Injection Ethical Hacking Guide
2 pages
Wasp Labeler User Manual
No ratings yet
Wasp Labeler User Manual
209 pages
JSP and Servlet Data Display
No ratings yet
JSP and Servlet Data Display
7 pages
Electricity Billing DBMS Guide
No ratings yet
Electricity Billing DBMS Guide
37 pages
3rd PL - SQL Interview Questions (2022) - Javatpoint
No ratings yet
3rd PL - SQL Interview Questions (2022) - Javatpoint
17 pages
Dokumen - Pub Guide To Data Privacy Models Technologies Solutions 9783031128363 9783031128370
No ratings yet
Dokumen - Pub Guide To Data Privacy Models Technologies Solutions 9783031128363 9783031128370
323 pages
Sri Vidya College of Engineering & Technology Lecture Notes
No ratings yet
Sri Vidya College of Engineering & Technology Lecture Notes
27 pages
RDL Reports Nuts N Bolts
No ratings yet
RDL Reports Nuts N Bolts
8 pages
Numerical Data Analysis Assignment
No ratings yet
Numerical Data Analysis Assignment
4 pages
Caching at Scale With Redis Updated 2021 12 04
No ratings yet
Caching at Scale With Redis Updated 2021 12 04
104 pages
WWW Simplilearn Com Tutorials SQL Tutorial What Is Normalization in SQL
No ratings yet
WWW Simplilearn Com Tutorials SQL Tutorial What Is Normalization in SQL
9 pages
Chapter Four
No ratings yet
Chapter Four
51 pages
STC College Portal
No ratings yet
STC College Portal
49 pages
Test Bank For Succeeding in Business With Microsoft Excel 2013 A Problem-Solving Approach, 1st Edition Fast Download
100% (1)
Test Bank For Succeeding in Business With Microsoft Excel 2013 A Problem-Solving Approach, 1st Edition Fast Download
66 pages
SWE Resume Template
No ratings yet
SWE Resume Template
1 page
Europass CV Template
100% (1)
Europass CV Template
2 pages
Data Preprocessing in Data Mining
No ratings yet
Data Preprocessing in Data Mining
11 pages
Digital Society Guide for Students
No ratings yet
Digital Society Guide for Students
43 pages
Medicine Recommendation System
No ratings yet
Medicine Recommendation System
82 pages
Ibm PROJECT 1 1 Output
No ratings yet
Ibm PROJECT 1 1 Output
10 pages
Accounting & Business Process Quiz
No ratings yet
Accounting & Business Process Quiz
105 pages
New Text Document
No ratings yet
New Text Document
3 pages
Net Developer - Job Description
No ratings yet
Net Developer - Job Description
2 pages
Phase 4 Document
No ratings yet
Phase 4 Document
14 pages
S Rohith Nagarjuna PB
No ratings yet
S Rohith Nagarjuna PB
4 pages
Smartplant Enterprise: Smartplant Adapter For Smartplant 3D
No ratings yet
Smartplant Enterprise: Smartplant Adapter For Smartplant 3D
53 pages
Introduction To Data Mesh
No ratings yet
Introduction To Data Mesh
5 pages
DBMSL Assignment 2 Starting MySQL
No ratings yet
DBMSL Assignment 2 Starting MySQL
12 pages
Oracle Backup & Recovery Q&A Guide
100% (1)
Oracle Backup & Recovery Q&A Guide
2 pages
Oracle HCM Interview
100% (1)
Oracle HCM Interview
160 pages

A 161126

Uploaded by

A 161126

Uploaded by

Page 2

2. Consistent Global States in Distributed Systems

_..--.-....-,.......-....,............-.....,,.........". ........................ "...+....."....."."] i

FIG. 1. Consistent and inconsistent cuts in a distributed system.

FIG. 2. Inconsistent checkpoints.

FIG. 3. -Domino effect" following a failure.

FIG. 4. Histories of p and q tip to p's failure.

FIG, S. History of p and q up to p's 2nd rollback.

0 5.1. Naive Algorithms

5.2 Classes of-Checkpoints

rest of the proof is analogous to Case 1. 12

53. Our Checkpoint Algorithm

lasLnrmsg (p) - { Lif mn exists

permanent or tentative checkpoint. Define:

When q requests p to take a tentative checkpoint, it appends lastrmsgq(p) to its

await does not prevent a process from receiving messages.

send(initiator, 'take a tentative checkpoint and T");

UPON RECEIPT OF "take a tentative checkpoint and lasL.rmsgq(p)" from q DO

send(q, willing-l-kpt p);

UPON FIRST RECEIPT OF m ="take tentative checkpoint permanent" or

for all uEckpt-cohort., send(u, m);

FIG. 7. Algorithm CI: the Checkpoint Algorithnm

5.3.3. Proof of Correctness

Let q be a ckpt._cohort of p. If q inherits p's request to take a tentative

Suppose that p is in a deadlock waiting for replies from its

contradicts lemma 2 by inheriting two requests: one from q and one

taken by processes in P. Define an alternate set of checkpoints:

When pi inherits pj's request, lastrmsgpj(ps) a first.smsgp(pj)>i.

5.3.4. Coping with Failures

(2) "take tentative checkpoint permanent' or "undo tentative checkpoint" from

if 3 uEckpt.cohortp, willinga.kpt. = "no" then willingJockpt.4- "no" fB

- . 'm'i m 'l l l iil..l.. ..

6.3. Proof of Correctness

send(initiator, "prepare to roll back and 1_");

UPON RECEIPT OF "prepare to roll back and last_smsgq(p)" from q DO

UPON RECEIPT OF m ="roll back" or

FIG 8. Alortthrr R: the Rollback Algorithm

Next we show that p undid the receipt of n. Since p and q received

Case 1: m reached p after p inherited a rollback request. Obvious.

Case : m reached p before p inherited a rollback request. The receipt

In all cases, p undoes the receipt of m when it rolls back, contradicting

sending nor to q, p can begirL executing the rollback-recovery algorithm after it

K. ... ................... .... .

in the channels. We can modify the checkpoint and rollback-recovery algorithms to

Ande79 T. Anderson, P. A. Lee and S. K. Shrivastava, System fault tolerance, in

Russ8O D. L. Russell, State restoration in systems of communicating processes,

You might also like