Consensus and Recovery
Syllabus
Consensus and Agreement Algorithms : Problem Definition - Overview of Results - Agreement in a
Failure-Free System (Synchronous and Asynchronous) - Agreement in Synchronous Systems with
* Failures; Check-pointing and Rollback Recovery : Introduction - Background and Definitions ~
Issues in Failure Recovery - Checkpoint-based Recovery - Coordinated Checkpointing Algorithm -
Algorithm for Asynchronous Checkpointing and Recovery.
Contents
4.1 Consensus and Agreement Algorithms : Problem Definition
4.2. Byzantine Agreement Problem
43 Overview of Results
4.4 Solution to Byzantine Agreement Problem
45 Agreement in a Failure-Free System (Synchronous and Asynchronous)
4.6 Agreement in Synchronous Systems
with Failures .
4.7 Introduction of Check-pointing
and Rollback Recovery . .
4.8 Background and Definitions
49 Consistent Set of Checkpoints
Issues in Failure Recovery
Checkpoint-based Recovery
Coordinated Checkpointing Algorithm
Algorithm for Asynchronous Checkpointing and Recovery
Two Marks Questions with Answers
Marks 13
Marks 13
May-22, Dec.-22, «+--+ -- Marks 13
a7a
42
oor ob Dion
itt
HERI Consensus and Agreement Algor
in distributed systems oft
e a common goal. Mutual Trust/agree
len compete 95 well a5 cooperate 4,
rent is very much required.
jan where data managers have
jon”, When there is no
Agreement protocols
Examples : Aj
Several rounds
agreement can be reached.
[EE] Byzantine Agreement Problem
* The Problem : “Several divisions of the Byzantine army are camped outside an
y, each division commanded by its own general, After observing, the
decide upon a common plan of action, Some of the generals
generals from reaching agreement.”
Is are agree fo atack oF to retreat, Once the commander i
vutenants to the commander 3k or retreat
retreating to another.
+ Ifa lieutenant
eacherous, he
him to attack and another that they are Of is pers thatthe commander told
to retreat
* Source processor broadcasts
objectives
Agreement :
its val
ves f0 others, Solution must meet followité
ty processor
Y Processors agree on the same value.
TECHNICAL PUBLICATIONS? ,
99 woth for knowedgo
puted Computing
a
If source is nonfaulty, then the comy
al on agreed value must be the
ice is faulty then all non - fa
pure 'y processors can agree on any common
value”. “Value ageced upon by faulty processor ie erent
«Fig 42.1 shows Byzantine agreement,
Consensus Problem
Me ty processors
ty processors is different then all non -
‘can agree on any common value.
gree on the same single value.
processor is v , then the common
alee nota Pe
ny processors must be
ue rent then all
agreed value by
© “If initial value of non: e
processors can agree on any common vs
Processors is irrelevant”.
EEEA interactive Consistency Problem
i. Every processor has its own
ji, All non faulty processors must 9Br°<
TECHNICAL PUBLICATIONS
reo on a set of common values.
‘an uptist fer hoodail non fully Processors must egg,
rement is on a single value.
In Byzan 1 is on a set of common values,
ment
tency problems aer® initializes the value whe
lem, only one Processor teas
has its own initial value.
‘© In interactive consis
it pro
processor
EEE] overview of Results
in asynchronous system even if one Process can fail by
‘Synchronous system Asynchronous system
‘Agreement not attainable
“Agreement attainable
3. | Byzantine fare
Z [f= (a= 1) 6) byzantine
process
EEZI solution to Byzantine Agreement Problem
+ First defined and solved by Lamport
+ An arbitrary source processor broadcasts its initial value to all others.
+ If the source processor is faulty, other non-faulty processor can agree on sy
‘common value.
+ Faulty processors’ values and agreements do not matter,
1 tty pes ae my, he nay proceso can! 0h
* Number of faulty processors, cannot exced : tune n-)/3]
{This bound canbe relaxed for systems using authenticated messages.
+ Solution must meet following objectives:
\ Agreement: All nonaulty processors agree on the same value.
source is nonfaulty,
by the source processor,
bia Raa"
3” upset for knowledge
then the common agreed value must be the val
Disiio ee ——— 4-5
[EEE impossible Scenario
1 Consider a system with 3 processors: po, pi, pa
Consensus and Recovery
Ource) is not faulty. 2 is faulty. pt should agree upon 1 as the
value. Not possible, ae ee
«Case 2 : p0is faulty. pl may agree on 1 and p2 on 0
Fig. 44.1 processor pO non faulty and processor po faulty
EEE] Lamport-Shostak-Pease Al:
+ This algorithm also known as Oral Message Algorithm OM(m) where m is the
number of
‘+’ = Number of processors and n >= 3m+l
recursively defined as fllows= ——y
ee
Lampert ays a 1p ph. ps source pis ay
aS Hy Land 0.
pppoe oa
jal value to be 1.
to [p2, p3I- pd sends
he faulty one) sends 110 pl and 0 t0 p3.
ction at pl and pis 1, which is the desired result, (yy
fu
algorithm OM|
Fig. 442
Examplo 2:
© System with 4 proce 2, pB. pO is source, and i
‘Assumption : Possible and 0.
+ st ial value to be 1 for pl and p3, For p2, it sends a
* Step 2: OM(0). pl sends 1 to {p2, pi. p3 sends 1 to {pl,p2)
‘= p2 sends 0 to pl and ps.
Step 3: Majority function at
3 Majority * Ply p2, p3 is still the same (1), which is the desired
TECHIICAL PuBLICATIONS®
+8 wiht for knowledge
pare ee ae
—omenting ese cry
Agreement in a Failure-F
and Asynchronous)" S¥8t@m (Synchronous
In a failure-free system, consensus can
ing information from
ing this decision in
A distributed mechanism would have each p
and each process computes the
cast its values to others,
received.
KONOUS syst number of rounds
he decision value can be obtained using an additional
round.
lop processes in a system,
/ global constant ; maximum number of crash failures tolerated
Integer: x+ local value;
Process P, executes the consensus algorithm for up tof crash failures
for round from 1 t0 +1.do
{if tho curtont value of x haa not been broadcast then
broadcast(x);
y) ¢ value (if any) received from process j in this round;
‘The agreement cor
least one round in which no process failed.
ty condition is satisfied because processes do not send fictitious values
+ The val
in this
+ The termination condition is sen to be satisfied
‘There are { + 1 rounds, where f
0) checkpoint of process Pp is assigned a sequence number i and is denoted
by Chi
We also assume that each process Fp
with a virtual checkpoint that represents the last
takes an initial checkpoint C,,o imme«
before execution begins and ends
state attained before termination.
‘The it® checkpoint interval of process Fy, denotes ll
1) checkpoint, including the i
I the computation performed
(® checkpoint but not the
between its i and
+1)" checkpoint.
EE] consistent Set of Checkpoints
Fig, 4.9.1 shows consistent and incons
processes (sites) that interact
with one another es checkpoints,
All the sites sav
one from each
TATIONS? = an pth fr howe?
TECHNICAL PUBLIC(1) Conaatet state (bp inconsistent stato
Fig. 4.9.4
effect Is caused by omphan messages, which in turn are caused by
"y
psrouted COMPUT
an
«Fig, 49.3 shows consistent set of checkpoint
Consensus and Recovery
labelled.
2 Records of
lost abo_recelve,M
‘nat (a message mani
fiat Jel seny8)
TECHNICAL
Fig. 49.4
ing counter with which each message from that node is
last message fom and the first message to all other nodes.
—pusureaTIOnS®- on wast fr MoweConsensus and Recovey
(Detrbuted Computing “on
Druted Computing
Note : “a” denotes a “smallest label” that is < any other label and “II” denotes ,
“largest label” that is > any other label.
‘Simple method for taking consistent set of checkpoint
4+ The set of the most recent checkpoints is always consistent,
EEEEI synchronous checkpointing and Recovery
Livelock problem during fecovery is avoided by taking a consistent set of
checkpoints
‘Algorithm is said to be synchronous when the processes involved coordinate their
local checkpointing actions such that the set of all recent checkpoints in the system
guaranteed to be
EEE chectpoining agortnm
+ Make some simplifying assumptions
1. Processes communicate by exchanging messages through channels.
“2 Channels are FIFO, end-to-end protocols cope with message loss due to
rollback recovery.
3. Communication failures do not partition the network.
* A single process invokes the algorithm. The checkpoint and the rollback recovery
algorithms are not invoked concurrently. a :
A temporary checkpoint that is mad.
: is made a permanent checkpoint on the
‘sctsful termination ofthe checkpoint algorithm _
2 Permanent: A local checkpoint at a process.
Phase One
* Initiating process P, takes a tentati
ive checkpoint :
take tentative and requests that all the process
+ Each process informs P, whether
i N succeeded in taking a tentative checkpoint.
+ IF; leams that all processes ha checkpoint
ve taken tentative na
tentative checkpoints should be made permanen, “eee
* Otherwise F desis tha al etic cecpoints should be disca
s ded.
TT aetase paeanaP
* 8 tet fr inonledge
ated Computing tn
phase TWO
1. P, propagates its decision to all processes
2. Om receiving the message from P, sl process ac
3 h y act accordingly.
+ Between tentative checkpoint and comit/abort of ehckpoin process must hold
Consensus and Recovery
back messages.
+ Does this guarantee we have a strongly consistent state ? Can you construct an
example that shows we can stil have lost messages ?
synchronous Checkpointing : Properties,
+ All processes from which P, has received messages after it has taken its last
checkpoint take a checkpoint to record the sending of those messages.
+ Fig. 49.5 shows the checkpoints taken unnecessarily
Fig. 49.5 Checkpoints taken unnecessarily
1. Process X decides to initiate checkpoint aigoritn ater receiving SSG
2 Te take a tentative checkpoints Xp and sends take tentative checkpoint messages to
processes ¥ and Z,, causing Y and
3. Now { xp, yp, 22) forms a consist os
4. xy, yp, zi) also forms a consistent sf oF ~
aor
Tepe PUBS
Z to take checkpoints yz and 2 respectively
of checkpoints
pointsa Consensus
Daeuted Conpuing eM _______RNN ny
last message received by X from Y way
the last checkpoint (last_recv(x, y) >.
Synchronous Checkpointing Disadvantagos
al messages must be exchanged to coordinate checkpointing..
progress,
algorithm places an unnecessary extra load on the system, which can si
affect performance.
AZEXA Tho Rollback Recovery Algorithm
+ Restore the system st
Phaso One :
Process checks whether al “
Al proess are willing to restart fom thee previous
checkpoints, = ae mre
# A process may reply “no” if it is alread
is already participating in a inting of
secoverng Poss insted by some cer proces * StHPointn ©
+ Wall processes are willing to
ling to restat from their
tha ey rd i 1 rst fom their previous checkpoints, , desis
* Otherwise, P, decides that all the processes continue with the activities.
Phase Two:
+P propagates decison a pce
* On receiving P;s decision, the : accor
the pres set cd
‘Optimization ae
+ Aino nanber
imum mantra
+ wl es rm i peomany
where the sending of one of mare se Pet SHY if X is rolling back to sat
‘Messages from X to Y is being undone.
OR aa gpg
20 upset fr bo
pnbuted COMPUIPS aot
i Consensus and Recovery
1s Fig, 49.6 shows the unnecessary rolack,
=f af —
eee
Ka] Me
iz T
Fig. 46 Unnecessary rock
EEE] Message Types
Intransit message : Messages that have been sent but not yet received
Lost messages : Messages whose “send” is done but "receive" is undone due to
rollback
Delayed messages
receiving process was ei
ages whose "receive" is not recorded because the
1 down or the message arrived after rollback
ive” recorded but message "send" not
Orphan messages : Messages wi
recorded - do not arise if processes roll back to a consistent global state
Duplicate messages : Arise due to message logging and replaying during process
recovery.
eae
ERD issues in Failure Recovery
operational state, Once a failure
i al
+ Recovery rofers to restoring a system 10 is norm re
ou crerrial that the process where the failure happened