EnggTree.
com
                                                                   CS3551 DISTRIBUTED COMPUTING
       When a failure occurs, the recovering process initiates rollback by broadcasting a
        dependency request message to collect all the dependency information maintained by each
        process.
       When a process receives this message, it stops its execution and replies with the
        dependency information saved on the stable storage as well as with the dependency
        information, if any, which is associated with its current state.
                                      www.EnggTree.com
       The initiator then calculates the recovery line based on the global dependency information
        and broadcasts a rollback request message containing the recovery line.
       Upon receiving this message, a process whose current state belongs to the recovery line
        simply resumes execution; otherwise, it rolls back to an earlier checkpoint as indicated by
        the recovery line. 
2. Coordinated Checkpointing
In coordinated checkpointing, processes orchestrate their checkpointing activities so that all
local checkpoints form a consistent global state
Types
   1. Blocking Checkpointing: After a process takes a local checkpoint, to prevent orphan
        messages, it remains blocked until the entire checkpointing activity is complete
        Disadvantages: The computation is blocked during the checkpointing
   2. Non-blocking Checkpointing: The processes need not stop their execution while taking
        checkpoints. A fundamental problem in coordinated checkpointing is to prevent a process
        from receiving application messages that could make the checkpoint inconsistent.
                                                               M.A.M COLLEGE OF ENGINEERING
                    Downloaded from EnggTree.com
                                 EnggTree.com
                                                                CS3551 DISTRIBUTED COMPUTING
Example (a) : Checkpoint inconsistency
      Message m is sent by 𝑃0 after receiving a checkpoint request from the checkpoint
       coordinator
      Assume m reaches 𝑃1 before the checkpoint request
      This situation results in an inconsistent checkpoint since checkpoint 𝐶1,𝑥 shows the receipt
       of message m from 𝑃0, while checkpoint 𝐶0,𝑥 does not show m being sent from
       𝑃0
Example (b) : A solution with FIFO channels
      If channels are FIFO, this problem can be avoided by preceding the first post-checkpoint
       message on each channel by a checkpoint request, forcing each process to take a checkpoint
       before receiving the first post-checkpoint message
                                    www.EnggTree.com
Impossibility of min-process non-blocking checkpointing
      A min-process, non-blocking checkpointing algorithm is one that forces only a minimum
       number of processes to take a new checkpoint, and at the same time it does not force any
       process to suspend its computation.
                                                            M.A.M COLLEGE OF ENGINEERING
                Downloaded from EnggTree.com
                                  EnggTree.com
                                                                 CS3551 DISTRIBUTED COMPUTING
Algorithm
      The algorithm consists of two phases. During the first phase, the checkpoint initiator
       identifies all processes with which it has communicated since the last checkpoint and sends
       them a request.
      Upon receiving the request, each process in turn identifies all processes it has
       communicated with since the last checkpoint and sends them a request, and so on, until
       no more processes can be identified.
      During the second phase, all processes identified in the first phase take a checkpoint. The
       result is a consistent checkpoint that involves only the participating processes.
      In this protocol, after a process takes a checkpoint, it cannot send any message until the
       second phase terminates successfully, although receiving a message after the checkpoint
       has been taken is allowable.
3. Communication-induced Checkpointing
Communication-induced checkpointing is another way to avoid the domino effect, while allowing
processes to take some of their checkpoints independently. Processes may be forced to take
additional checkpoints
                                      www.EnggTree.com
Two types of checkpoints
       1. Autonomous checkpoints
       2. Forced checkpoints
The checkpoints that a process takes independently are called local checkpoints, while those that
a process is forced to take are called forced checkpoints.
      Communication-induced check pointing piggybacks protocol- related information on
       each application message
      The receiver of each application message uses the piggybacked information to determine
       if it has to take a forced checkpoint to advance the global recovery line
      The forced checkpoint must be taken before the application may process the contents of
       the message
      In contrast with coordinated check pointing, no special coordination messages are
       exchanged
Two types of communication-induced checkpointing
   1. Model-based checkpointing
   2. Index-based checkpointing.
                                                             M.A.M COLLEGE OF ENGINEERING
                 Downloaded from EnggTree.com