DC Unit4
DC Unit4
                                            Page 1 of 32
                                                                                           CS8603 DS
                                           Page 2 of 32
                                                                                       CS8603 DS
                                         Page 3 of 32
                                                                                         CS8603 DS
   1. In-transit message
                   messages that have been sent but not yet received
   2. Lost messages
                                            Page 4 of 32
                                                                                        CS8603 DS
   3. Delayed messages
                                                                      se the receiving process was
                  either down or the message arrived after rollback
   4. Orphan messages
In-transit messages
In Figure , the global state {C1,8 , C2, 9 , C3,8, C4,8} shows that message m1 has been sent but
not yet received. We call such a message an in-transit message. Message m2 is also an in-transit
message.
                                          Page 5 of 32
                                                                                        CS8603 DS
Delayed messages
Messages whose receive is not recorded because the receiving process was either down or the
message arrived after the rollback of the receiving process, are called delayed messages. For
example, messages m2 and m5 in Figure are delayed messages.
Lost messages
Messages whose send is not undone but receive is undone due to rollback are called lost messages.
This type of messages occurs when the process rolls back to a checkpoint prior to reception of the
message while the sender does not rollback beyond the send operation of the message. In Figure ,
message m1 is a lost message.
Duplicate messages
           Duplicate messages arise due to message logging and replaying during process
           recovery. For example, in Figure, message m4 was sent and received before the
           rollback. However, due to the rollback of process P4 to C4,8 and process P3 to C3,8,
           both send and receipt of message m4 are undone.
           When process P3 restarts from C3,8, it will resend message m4.
           Therefore, P4 should not replay message m4 from its log.
           If P4 replays message m4, then message m4 is called a duplicate message.
4.3 Issues in failure recovery
In a failure recovery, we must not only restore the system to a consistent state, but also
appropriately handle messages that are left in an abnormal state due to the failure and recovery
                                          Page 6 of 32
                                                                                       CS8603 DS
The computation comprises of three processes Pi, Pj , and Pk, connected through a communication
network. The processes communicate solely by exchanging messages over fault- free, FIFO
communication channels.
1. Uncoordinated Checkpointing
    Each process has autonomy in deciding when to take checkpoints
    Advantages
               The lower runtime overhead during normal execution
                                         Page 7 of 32
                                                                                          CS8603 DS
    Disadvantages
                 1. Domino effect during a recovery
                 2. Recovery from a failure is slow because processes need to iterate to find a
                     consistent set of checkpoints
                 3. Each process maintains multiple checkpoints and periodically invoke a
                     garbage collection algorithm
                 4. Not suitable for application with frequent output commits
    The processes record the dependencies among their checkpoints caused by message
    exchange during failure-free operation
                                           Page 8 of 32
                                                                                           CS8603 DS
        When a process receives this message, it stops its execution and replies with the
        dependency information saved on the stable storage as well as with the dependency
        information, if any, which is associated with its current state.
        The initiator then calculates the recovery line based on the global dependency information
        and broadcasts a rollback request message containing the recovery line.
        Upon receiving this message, a process whose current state belongs to the recovery line
        simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by
        the recovery line.
2. Coordinated Checkpointing
In coordinated checkpointing, processes orchestrate their checkpointing activities so that all
local checkpoints form a consistent global state
Types
   1. Blocking Checkpointing: After a process takes a local checkpoint, to prevent orphan
        messages, it remains blocked until the entire checkpointing activity is complete
        Disadvantages: The computation is blocked during the checkpointing
   2. Non-blocking Checkpointing: The processes need not stop their execution while taking
        checkpoints. A fundamental problem in coordinated checkpointing is to prevent a process
        from receiving application messages that could make the checkpoint inconsistent.
Example (a) : Checkpoint inconsistency
        Message m is sent by        0 after receiving a checkpoint request from the checkpoint
        coordinator
        Assume m reaches 1 before the checkpoint request
        This situation results in an inconsistent checkpoint since checkpoint      1,      shows the
        receipt of message m from 0, while checkpoint 0, does not show m being sent from
         0
Example (b) : A solution with FIFO channels
        If channels are FIFO, this problem can be avoided by preceding the first post-checkpoint
        message on each channel by a checkpoint request, forcing each process to take a checkpoint
        before receiving the first post-checkpoint message
                                           Page 9 of 32
                                                                                          CS8603 DS
Algorithm
      The algorithm consists of two phases. During the first phase, the checkpoint initiator
      identifies all processes with which it has communicated since the last checkpoint and sends
      them a request.
      Upon receiving the request, each process in turn identifies all processes it has
      communicated with since the last checkpoint and sends them a request, and so on, until
      no more processes can be identified.
      During the second phase, all processes identified in the first phase take a checkpoint. The
      result is a consistent checkpoint that involves only the participating processes.
                                        Page 10 of 32
                                                                                       CS8603 DS
       In this protocol, after a process takes a checkpoint, it cannot send any message until the
       second phase terminates successfully, although receiving a message after the checkpoint
       has been taken is allowable.
3. Communication-induced Checkpointing
Communication-induced checkpointing is another way to avoid the domino effect, while allowing
processes to take some of their checkpoints independently. Processes may be forced to take
additional checkpoints
Two types of checkpoints
       1. Autonomous checkpoints
       2. Forced checkpoints
The checkpoints that a process takes independently are called local checkpoints, while those that
a process is forced to take are called forced checkpoints.
       Communication-induced check pointing piggybacks protocol- related information on
       each application message
       The receiver of each application message uses the piggybacked information to determine
       if it has to take a forced checkpoint to advance the global recovery line
       The forced checkpoint must be taken before the application may process the contents of
       the message
       In contrast with coordinated check pointing, no special coordination messages are
       exchanged
                                          Page 11 of 32
                                                                                          CS8603 DS
Meenakshi.R
                                           Page 12 of 32
                                                                                        CS8603 DS
Suppose a set of processes       crashes. A process p in    becomes an orphan when p itself does
not fail and p                                                               e whose determinant
cannot be recovered from the stable storage or from the volatile memory of a surviving process.
storage or from the volatile memory of a surviving process. Formally, it can be stated as follows
                                            Page 13 of 32
                                                                                              CS8603 DS
Types
1. Pessimistic Logging
        Pessimistic logging protocols assume that a failure can occur after any non-deterministic
        event in the computation. However, in reality failures are rare
        Pessimistic protocols implement the following property, often referred to as synchronous logging,
        which is a stronger than the always-no-orphans condition
        Synchronous logging
                            e:    Stable(e)     |Depend(e)| = 0
        Thai is,if an event has not been logged on the stable storage, then no process can depend
        on it.
   Example:
        Suppose processes 1 and 2 fail as shown, restart from checkpoints B and C, and roll forward
        using their determinant logs to deliver again the same sequence of messages as in the pre-
        failure execution
        Once the recovery is complete, both processes will be consistent with the state of 0
        that includes the receipt of message 7 from 1
                                              Page 14 of 32
                                                                                    CS8603 DS
                                       Page 15 of 32
                                                                                CS8603 DS
Consider the example shown in Figure Suppose process P2 fails before the determinant for
m5 is logged to the stable storage. Process P1 then becomes an orphan process and must
roll back to undo the effects of receiving the orphan message m6. The rollback of P1
further forces P0 to roll back to undo the effects of receiving message m7.
Advantage: better performance in failure-free execution
Disadvantages:
        coordination required on output commit
        more complex garbage collection
Since determinants are logged asynchronously, output commit in optimistic logging
protocols requires a guarantee that no failure scenario can revoke the output. For example,
if process P0 needs to commit output at state X, it must log messages m4 and m7 to the
stable storage and ask P2 to log m2 and m5. In this case, if any process fails, the
computation can be reconstructed up to state X.
                                 Page 16 of 32
                                                                                         CS8603 DS
3. Causal Logging
   Combines the advantages of both pessimistic and optimistic logging at the expense of a more
   complex recovery protocol
   Like optimistic logging, it does not require synchronous access to the stable storage except
   during output commit
   Like pessimistic logging, it allows each process to commit output independently and never
   creates orphans, thus isolating processes from the effects of failures at other processes
   Make sure that the always-no-orphans property holds
   Each process maintains information about all the events that have causally affected its state
      Consider the example in Figure Messages m5 and m6 are likely to be lost on the failures
      of P1 and P2 at the indicated instants. Process
      P0 at state X will have logged the determinants of the nondeterministic events that
      causally precede its state accordi                   happened-before relation.
      These events consist of the delivery of messages m0, m1, m2, m3, and m4.
      The determinant of each of these non-deterministic events is either logged on the stable
      storage or is available in the volatile log of process P0.
      The determinant of each of these events contains the order in which its original receiver
      delivered the corresponding message.
                                           Page 17 of 32
                                                                                          CS8603 DS
       The message sender, as in sender-based message logging, logs the message content. Thus,
       process P                                                     P1 and P2 since it knows the
       order in which P1 should replay messages m1 and m3 to reach the state from which P1 sent
       message m4.
       Similarly, P0 has the order in which P2 should replay message m2 to be consistent with
       both P0 and P1.
       The content of these messages is obtained from the sender log of P0 or regenerated
       deterministically during the recovery of P1 and P2.
       Note that information about messages m5 and m6 is lost due to failures. These messages
       may be resent after recovery possibly in a different order.
       However, since they did not causally affect the surviving process or the outside world, the
       resulting state is consistent.
       Each process maintains information about all the events that have causally affected its state.
                                         Page 18 of 32
                                                                                          CS8603 DS
First Phase
   1. An initiating process Pi takes a tentative checkpoint and requests all other processes to take
        tentative checkpoints. Each process informs Pi whether it succeeded in taking a tentative
        checkpoint.
   2.
   3. If Pi learns that all the processes have successfully taken tentative checkpoints, Pi decides
        that all tentative checkpoints should be made permanent; otherwise, Pi decides that all the
        tentative checkpoints should be thrown-away.
Second Phase
   1. Pi informs all the processes of the decision it reached at the end of the first phase.
   2. A process, on receiving the message from Pi will act accordingly.
   3. Either all or none of the processes advance the checkpoint by taking permanent
        checkpoints.
   4. The algorithm requires that after a process has taken a tentative checkpoint, it cannot
        send messages related to the basic
Correctness: for two reasons
                      i. Either all or none of the processes take permanent checkpoint
                   ii. No process sends message after taking permanent checkpoint
An Optimization
The above protocol may cause a process to take a checkpoint even when it is not necessary for
consistency. Since taking a checkpoint is an expensive operation, we avoid taking checkpoints.
                                           Page 19 of 32
                                                                                         CS8603 DS
        Optimization: May not to recover all, since some of the processes did not change
        anything
                                          Page 20 of 32
                                                                                        CS8603 DS
The above protocol, in the event of failure of process X, the above protocol will require
processes X, Y, and Z to restart from checkpoints x2, y2, and z2, respectively.
Process Z need not roll back because there has been no interaction between process Z and the
other two processes since the last checkpoint at Z.
                                          Page 21 of 32
                                                                                          CS8603 DS
The Algorithm
When a processor restarts after a failure, it broadcasts a ROLLBACK message that it had failed
Procedure RollBack_Recovery
processor pi executes the following:
STEP (a)
if processor pi is recovering after a failure then
CkPti := latest event logged in the stable storage
else
CkPti := latest event that took place in pi {The latest event at pi can be either in stable or in
volatile storage.}
end if
STEP (b)
for k = 1 1 to N {N is the number of processors in the system} do
for each neighboring processor pj do
                                           Page 22 of 32
                                                                                            CS8603 DS
end for
for every ROLLBACK(j, c) message received from a neighbor j do
if RCV                    {Implies the presence of orphan messages} then
find the latest event e such that RCV
or stable storage.}
CkPti := e
end if
end for
end for{for k}
D. An Example
Consider an example shown in Figure 2 consisting of three processors. Suppose processor Y
fails and restarts. If event ey2 is the latest checkpointed event at Y, then Y will restart from the
state corresponding to ey2.
                                            Page 23 of 32
                                                                                          CS8603 DS
                       f < n processes
     Byzantine         agreement attainable                    agreement not attainable
                                            Page 24 of 32
                                                                                      CS8603 DS
Agreement: All non-faulty processes must agree on the same (single) value.
Validity: If all the non-faulty processes have the same initial value, then the agreed upon
value by all the non-faulty processes must be that same value.
The overhead bounds are for the given algorithms, and not necessarily tight bounds for
the
problem.
                                      Page 25 of 32
                                                                                     CS8603 DS
Validity: If the source process is non-faulty, then the agreed upon value by all the non-
faulty processes must be the same as the initial value of the source.
                                       Page 26 of 32
                                                        CS8603 DS
                           Page 27 of 32
                                                                                  CS8603 DS
Each phase has a unique "phase king" derived, say, from PID.
1 in 1st round, each process sends its estimate to all other processes.
       2 in 2nd round, the "Phase king" process arrives at an estimate based on the values
       it received in 1st round, and broadcasts its new estimate to all others.
Meenakshi.R
                                        Page 28 of 32
                                                                                        CS8603 DS
(f + 1) phases, (f + 1)[(n - 1)(n + 1)] messages, and can tolerate up to f < dn=4e
malicious processes
Correctness Argument
Pi and Pj use their own majority values. Pi 's mult > n=2 + f )
       Pi uses its majority value; Pj uses phase-king's tie-breaker value. (Pi s mult > n=2 +
         f , Pj 's mult > n=2 for same value)
       Pi and Pj use the phase-king's tie-breaker value. (In the phase in which P k is
       non- malicious, it sends same value to Pi and Pj )
In all 3 cases, argue that Pi and Pj end up with same value as estimate
       If all non-malicious processes have the value x at the start of a phase, they will
       continue to have x as the consensus value at the end of the phase.
_-Agreement: All non-faulty processes must make a decision and the values decided
upon by any two non-faulty processes must be within range of each other.
Validity: If a non-faulty process Pi decides on some value vi , then that value must be
within the range of values initially proposed by the processes.
Termination: Each non-faulty process must eventually decide on a value. The algorithm
for the message-passing model assumes n 5f + 1, although the problem is solvable for n
> 3f + 1.
                                        Page 29 of 32
                                                                                           CS8603 DS
Not possible to go from bivalent to univalent state if even a single failure is allowed.
       Using memory that is stronger than atomic Read/Write memory to design wait-
       free consensus algorithms. Such a memory would need corresponding access
       primitives.
                                        Page 30 of 32
                                                                                     CS8603 DS
Are there objects (with supporting operations), using which there is a wait-free (i.e., (n -1)-
crash resilient) algorithm for reaching consensus in a n-process system? Yes, e.g., Test&Set,
Swap, Compare&Swap. The crash failure model requires the solutions to be wait-free.
                                       Page 31 of 32
                                                                                      CS8603 DS
An object is defined to be universal if that object along with read/write registers can simulate
any other object in a wait-free manner. In any system containing up to k processes, an object
X such that CN(X) = k is universal.
For any system with up to k processes, the universality of objects X with consensus number k
is shown by giving a universal algorithm to wait-free simulate any object using objects of type
X and read/write registers.
       2 Then, the arbitrary k-process consensus objects are simulated with objects of type
       X, having consensus number k. This trivially follows after the first step.
The linked list stores the linearized sequence of operations and states following each
operation.
Operations to the arbitrary object Z are simulated in a nonblocking way using an arbitrary
consensus object (the field op.next in each record) which is accessed via the Decide call.
Each process attempts to thread its own operation next into the linked
list.
       A single pointer/counter cannot be used instead of the array Head. Because reading
       and updating the pointer cannot be done atomically in a wait-free manner.
Page 32 of 32