0% found this document useful (0 votes)
174 views24 pages

Unit 4 Answer Key

unit 4 answers

Uploaded by

nulinjeriba
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
174 views24 pages

Unit 4 Answer Key

unit 4 answers

Uploaded by

nulinjeriba
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

UNIT IV

Q.N CO’ Bloom’s


Questions Level
o s
What is checkpointing in distributed systems? CO5 K1

Its a technique used to save the system's state at specific intervals, allowing
1. recovery from failures. It helps ensure fault tolerance by enabling the system
to roll back to the last checkpointed state after a failure, reducing the loss of
computation and maintaining consistency across nodes.

What are the types of checkpoint-based rollback recovery? CO5 K1


The two types of checkpoints are:

1. Coordinated Checkpoints: All processes take checkpoints


2. simultaneously, ensuring a consistent global state for recovery.
2. Uncoordinated Checkpoints: Each process takes checkpoints
independently, which can lead to inconsistencies and may require
multiple rollbacks to achieve a consistent state.

Why do we need checkpoints? CO5 K1

Checkpoints are needed to ensure fault tolerance and efficient recovery.


3. They allow the system to save its state periodically, enabling it to roll back
to a previous checkpoint in case of failures, minimizing data loss and
reducing the need for restarting processes from the beginning.

State the use of Rollback recovery. CO5 K1

Rollback recovery is used to restore the system to a consistent state after


4. a failure. It helps in minimizing data loss and reducing the need to reprocess
tasks by allowing the system to revert to a previously saved checkpoint,
ensuring fault tolerance and continuity of operations.

What is coordinated checkpointing? CO5 K1

Coordinated checkpointing is a technique where all processes coordinate to


5. take checkpoints at the same time. This ensures a globally consistent state
across the system, making recovery from failures straightforward without
the need for complex rollbacks or dependency tracking between processes.

What is the purpose of checkpointing? CO5 K1


6.
The purpose of checkpointing is to ensure fault tolerance and efficient
recovery. By periodically saving the system's state, checkpointing allows
the system to recover from failures by rolling back to the last saved state,
reducing data loss and avoiding the need to restart processes from the
beginning.

What are the advantages of checkpoint? CO5 K1

The advantages of checkpoints in distributed systems include:

7. 1. Fault Tolerance
2. Reduced Downtime
3. Efficient Resource Utilization
4. Consistency
5. Scalability
6. Enhanced Reliability

What are the issues in asynchronous checkpointing? CO5 K1

The issues in asynchronous checkpointing in distributed systems include:

8. 1. Inconsistent States: Since processes checkpoint independently, it


can result in inconsistent global states, complicating recovery.
2. Domino Effect: A failure may force multiple processes to roll back
to earlier checkpoints, leading to significant overhead and loss of
progress.

What is the difference between rollback and roll-forward CO5 K1


recovery?
The difference between rollback and roll-forward recovery in distributed
systems is:

9.  Rollback Recovery: This method involves reverting the system to a


previous state (checkpoint) after a failure, restoring the system to a
known good condition.
 Roll-Forward Recovery: This approach involves applying logged
actions or operations that occurred after the last consistent state to
recover the system to a later state, effectively moving forward from
that point.

Define the term rollback propagation. CO5 K1

Rollback propagation refers to the process by which a failure in one process


10. leads to the need for other dependent processes to also roll back to earlier
checkpoints. This ensures a consistent global state across the system, but can
result in significant overhead as multiple processes may need to revert to
previous states to maintain consistency.
What are the two types of checkpoints. CO5 K1

Local Checkpoint

 Definition: A local checkpoint is the state saved by an individual


process in a distributed system, allowing it to recover from failures
by restoring its own state.
 Key Point: Each process takes checkpoints independently, which can
lead to inconsistent global states across the system.
11.
Global Checkpoint

 Definition: A global checkpoint is a collection of local checkpoints


from all processes in a distributed system, capturing a consistent
global state.
 Key Point: Ensures a coordinated recovery, avoiding inconsistencies
like the domino effect.

Classify the faults based on failure of components and based CO5 K1


on faulty components

1. Based on Failure of Components:


o Crash Failures: Components become completely
unresponsive.
12. o Omission Failures: Components fail to respond to messages
while remaining operational.
2. Based on Faulty Components:
o Byzantine Failures: Components behave arbitrarily,
providing incorrect or misleading information.
o Transient Failures: Temporary faults that may resolve
themselves, such as brief network outages.

Define fault and failure. CO5 K1

 Fault: A fault refers to an underlying condition or defect in a


component that may lead to failure, such as hardware malfunctions,
13. software bugs, or network issues.
 Failure: A failure occurs when a fault manifests, resulting in the
component no longer performing its intended function, leading to
incorrect behavior or a complete loss of service.

State Byzantine agreement problem. CO5 K1

14. The Byzantine Agreement Problem involves a group of nodes needing to


agree on a single value despite some nodes acting arbitrarily or maliciously.
The goal is for all non-faulty nodes to reach consensus on the same value,
ensuring reliable decision-making in the presence of Byzantine faults.

CO5 K1
What is a consensus algorithm?

A consensus algorithm is a protocol used to achieve agreement among


15. multiple nodes on a single data value or state, even in the presence of faults
or failures. It ensures that all non-faulty nodes reach the same decision,
maintaining consistency and reliability across the system despite potential
network partitions or malicious actors.
PART - B
Discuss the taxonomy of rollback recovery in distributed CO5
system to ensure data consistency and system reliability?
Show how the dominos effect could lead to a cascaded
rollback.
Rollback recovery in distributed systems is essential for ensuring data
consistency and system reliability. It can be categorized into several types
based on how the recovery process is managed. The primary categories are:

1 K2

1. Coordinated Checkpointing:
o All processes in the distributed system synchronize to take
checkpoints simultaneously.
o This method ensures a consistent global state, making
recovery straightforward, as all processes can revert to their
corresponding checkpoints without worrying about
dependencies.
2. Uncoordinated Checkpointing:
o Each process independently takes checkpoints without
synchronization with other processes.
o This method can lead to inconsistencies, as different
processes may have checkpoints that do not align, resulting in
what is known as the "domino effect."
3. Message Logging:
o This involves logging messages exchanged between processes
to ensure that, during recovery, processes can replay messages
to restore their states.
o There are two main types of message logging:
 Pessimistic Logging: Messages are logged before
being sent, ensuring that processes can recover their
state completely.
 Optimistic Logging: Messages are logged after they
are sent, which may lead to the need for rollback if
failures occur.
4. Hybrid Approaches:
o Some systems may use a combination of coordinated
checkpointing and message logging to balance the trade-offs
between performance and consistency.

The Domino Effect and Cascaded Rollback

The domino effect occurs in uncoordinated checkpointing when a failure in


one process forces it to roll back to a previous checkpoint, which in turn
requires other dependent processes to roll back as well. This can lead to a
cascading rollback, where multiple processes are forced to revert to earlier
states.

2
Elaborate in detail about Log based Recovery system with its CO5 K2
types.
Types:
What are the different types of message may be generated on CO5
the process of rollback. Explain each with space time diagram
Rollback recovery treats a distributed system application as a collection of
processes that communicate over a network. It achieves fault tolerance by
periodically/ saving the state of a process during the failure-free execution,
enabling it to restart from a saved state upon a failure to reduce the amount
3 of lost work. The saved state is called a checkpoint, and the procedure of K2
restarting from a previously checkpointed state is called rollback recovery

A process failure and subsequent recovery may leave messages that were
perfectly received (and processed) before the failure in abnormal states. This
is because a rollback of processes for recovery may have to rollback the send
and receive operations of several messages.
Outline the direct dependency tracking technique .Give the CO5
pros and cons of un-coordinated checkpointing.

Direct Dependency Tracking (DDT) Technique

The Direct Dependency Tracking (DDT) technique is used in distributed


systems to detect causal dependencies between processes during message
passing. The technique focuses on recording causal dependencies to aid in
rollback recovery, ensuring that system states can be recovered consistently
after a failure. The primary goal is to track dependencies and manage
consistent global states without relying on synchronization.

4 K2

Steps in Direct Dependency Tracking:

1. Message Passing: Each process in the distributed system sends and


receives messages. These messages create causal dependencies
between the processes.
2. Dependency Recording: Each process maintains a log of the
messages it sends and receives, which includes information on other
processes it is causally dependent on.
3. Tracking Dependencies: Instead of storing all previous states, only
the causal dependencies (directly involved processes) are tracked.
This reduces the storage overhead.
4. Recovery: When a failure occurs, the system identifies the last
consistent global state based on the dependency logs and rolls back to
that point.

Benefits of DDT:

 Efficiency: Reduces the amount of state information that needs to be


tracked compared to other methods.
 Scalability: Scales well with large distributed systems since only
direct dependencies are tracked rather than maintaining a full global
state.
 Faster Recovery: Recovery is often faster because only the
processes that are causally dependent on the failed process need to be
rolled back.

Limitations of DDT:

 Complexity in Failure Scenarios: Handling concurrent failures or


complex communication patterns can be difficult.
 State Explosion: In some systems with high message passing rates,
the number of dependencies may grow rapidly, increasing tracking
overhead.

Uncoordinated Checkpointing

Uncoordinated checkpointing refers to a fault tolerance mechanism where


each process in a distributed system independently takes checkpoints at
different times without coordinating with other processes. If a process fails,
it can roll back to its last checkpoint. However, to ensure consistent global
recovery, all processes may need to roll back to a consistent set of
checkpoints across the system.

Pros of Uncoordinated Checkpointing:

1. Low Overhead: Since there is no need for coordination, processes


can take checkpoints based on their local state, reducing
synchronization delays.
2. Asynchronous Operation: Processes can continue to execute
independently, which is beneficial for systems with high concurrency
and where synchronization could introduce bottlenecks.
3. Flexibility: Each process can determine its checkpointing frequency
based on its workload and resource availability, allowing
optimization for individual process requirements.

Cons of Uncoordinated Checkpointing:

1. Risk of Cascading Rollbacks (Domino Effect): One of the biggest


drawbacks is the possibility of cascading rollbacks. If a failure
occurs, the system may need to roll back multiple processes, leading
to the "domino effect," where all processes might need to roll back to
their initial state.
2. Checkpointing Complexity: Determining a consistent global state
after a failure becomes complex, as processes may have taken
checkpoints at different times, leading to inconsistent snapshots.
3. Message Log Overhead: To ensure consistency during recovery, a
system must log all messages to reapply them after rolling back. This
leads to additional storage and performance overhead.
4. Increased Recovery Time: The process of determining a consistent
recovery point can be time-consuming, especially when many
processes are involved, increasing the overall recovery time.

Discuss in detail about Consensus and Agreement Algorithms.


Describe the Byzantine General Problem in detail.

5 CO5 K2
Byzantine Generals Problem

1. Scenario: Byzantine generals need to agree on a common plan


(attack or retreat), but some generals may be traitors.
2. Goal: Loyal generals must reach consensus on the same action,
despite possible traitorous behavior.
3. Challenge: Traitorous generals can send conflicting or false
messages, making it difficult to agree.
4. Communication: Generals can only communicate via messages, no
direct knowledge of who is loyal or traitorous.
5. Conditions:
o All loyal generals must agree on the same action.
o The system must tolerate some generals acting arbitrarily or
maliciously.

Problem Requirements

1. Consistency: All loyal nodes (generals) should agree on the same


decision.
2. Fault Tolerance: The system should work correctly even if some
nodes (generals) act maliciously.
3. Traitor Detection: It is difficult to distinguish between faulty and
malicious nodes.

Solution Concepts

1. Byzantine Fault Tolerance (BFT): An agreement protocol designed


to handle Byzantine failures.
2. Majority Voting: Loyal generals forward messages to others and use
majority voting to reach a consensus.
3. Tolerance Limit: Byzantine consensus is possible if fewer than one-
third of the nodes are faulty.

Key Algorithms

1. Lamport, Shostak, and Pease Algorithm:


o Requires n > 3f generals (nodes) to tolerate f faults.
2. Practical Byzantine Fault Tolerance (PBFT):
o Requires n = 3f + 1 nodes to tolerate up to f faults.
o Used in modern systems like blockchain (Hyperledger,
Tendermint).

Challenges

1. Communication Complexity: High number of messages exchanged


for consensus.
2. Fault Detection: Hard to differentiate between a faulty and malicious
node.
3. Scalability: BFT protocols become complex and resource-intensive
as the number of nodes increases.

Applications

 Blockchain Technology: BFT is critical in decentralized systems like


Bitcoin and Ethereum to handle malicious actors.
 Distributed Systems: Ensures reliable consensus even in the
presence of arbitrary faults or malicious behavior.

Explain the two popular flavours of the Byzantine agreement


6 problem.Illustrate Lamport-Shostak-Pease Algorithm with a CO5 K2
neatdiagram,
PART – C
Show how would you utilize the Koo-Toueg coordinated CO5
Checkpointing algorithm in maintaining system consistency
during failures in a distributed environment?

The Koo-Toueg Coordinated Checkpointing Algorithm ensures system


consistency during failures in distributed systems by coordinating the
checkpointing of processes without requiring all of them to checkpoint
simultaneously. It handles in-transit messages and ensures a consistent
1 global state. K3

Key Concepts:

 Coordinator Process: A process responsible for managing the


checkpointing protocol.
 In-Transit Messages: Messages sent but not yet received, which
need to be logged to ensure consistency.
 Checkpoint: A saved state of a process, allowing recovery after a
failure.

The Koo-Toueg Algorithm operates as follows:


 Step 1: Checkpoint Initiation: One node, known as the
coordinator, initiates the checkpointing process by broadcasting a
checkpoint request to all other nodes in the distributed system.
 Step 2: Request Propagation: Upon receiving the
checkpoint request, each node propagates it to all its neighbors
to ensure that every node is aware of the checkpointing process.
 Step 3: Freezing Application Execution: Nodes halt their
application processes temporarily to ensure that no new
messages are processed during the checkpointing phase. This
ensures consistency in the captured state.
 Step 4: State Recording: Each node records its local state,
including memory contents, register values, and process states,
to stable storage.
 Step 5: Message Logging: Nodes log all messages sent and
received during the checkpointing period to maintain a
comprehensive record of system communication.
 Step 6: Acknowledgment: After recording their states, nodes
send acknowledgments back to the coordinator to indicate that
they have completed their checkpointing process.
 Step 7: Coordinator Confirmation: The coordinator waits to
receive acknowledgments from all nodes. Once all
acknowledgments are received, the coordinator confirms that the
checkpointing process is complete, and the system can resume
normal operation.

Handling In-Transit Messages:

The key feature of the Koo-Toueg algorithm is its ability to handle in-transit
messages. Without accounting for these messages, checkpoints could be
inconsistent. The algorithm ensures that:

 Sent but unreceived messages are logged.


 These messages are replayed during recovery to ensure that no data
or state is lost.

Advantages of Koo-Toueg Coordinated Checkpointing:

1. Consistency: By coordinating checkpoints and handling in-transit


messages, the system ensures that all processes recover to a globally
consistent state.
2. Minimal Overhead: The algorithm reduces the overhead of
checkpointing by allowing processes to continue without halting until
absolutely necessary.
3. Avoids Domino Effect: Since the algorithm coordinates checkpoints,
it avoids the need for cascading rollbacks (domino effect), ensuring
efficient recovery.

Applications use cases of koo-toueg algorithm

 Distributed Databases: Ensuring data consistency and


reliability in distributed database systems.
 Scientific Computing: Facilitating fault-tolerant computations
in distributed scientific applications.
 Real-time Systems: Supporting fault recovery and
continuous operation in real-time distributed systems.
 High-Performance Computing: Enabling coordinated
checkpointing in parallel and distributed computing
environments.

Identify whether the dominos effect is presented if the process CO5


P3 fails as shown in the below given model of distributed
computation. If so track the path of the dominos effect

2 K3

You might also like