
Unit 4 - DSRM

Distributed database systems use the Two-Phase Commit (2PC) protocol to commit transactions, while recovery is facilitated by logging and checkpoints. 2PC consists of a prepare phase, in which participants vote to commit or abort, followed by a commit (or abort) phase. Logging records changes in a log file before they are applied to the database, and checkpoints periodically save the system state to stable storage to provide a starting point for recovery.

Uploaded by kashish.sharma.batch2021


Unit 4

Failures and Their Classification:

Definition: A failure in a distributed database system refers to any event that disrupts the normal
operation of the system, resulting in the loss of data consistency, availability, or reliability.

Types of Failures:

- Hardware Failures: Failures in physical components such as servers, disks, or network devices.

- Software Failures: Errors or bugs in the software components of the database system, such as the
database management system (DBMS) or applications.

- Network Failures: Communication failures or network outages that prevent data transmission
between distributed nodes.

- Site Failures: Failures that affect an entire site or data center, resulting in the loss of access to all
resources hosted at that location.

- Media Failures: Physical damage or corruption to storage media, such as disks or tapes, leading to
data loss or corruption.

Classification:

- Transient Failures: Temporary failures that can be recovered from quickly, such as a network
glitch or a brief power outage.

- Permanent Failures: Irreversible failures that require more extensive recovery procedures, such
as hardware failures or data corruption.
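The two classes call for different reactions: a transient failure is often handled by simply retrying, while a permanent one has to be escalated to a heavier recovery procedure. The following is a minimal Python sketch, assuming the caller can tell the two kinds apart; `TransientError`, `PermanentError` and `call_with_retry` are hypothetical names used only for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures expected to clear by themselves (e.g. a network glitch)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that retrying cannot fix (e.g. a failed disk)."""


def call_with_retry(operation, max_attempts=3, base_delay=0.01):
    """Run `operation`, retrying transient failures with exponential backoff.

    A PermanentError is not caught, so it propagates immediately to a
    heavier recovery procedure (failover, restore from backup, ...).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # still failing after all retries: escalate
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage: an operation that fails twice with a glitch, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("network glitch")
    return "ok"


print(call_with_retry(flaky))  # ok
```

Bounding the number of attempts matters: a "transient" failure that never clears is effectively permanent and should not be retried forever.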

__________________________________________________________________________________

Checkpoints and Recovery:

1. Checkpoints:

- Definition: Checkpoints are predefined moments in time when the state of a distributed database
system is saved to stable storage, allowing recovery to a consistent state after a failure.

- Purpose: Checkpoints help reduce the amount of work needed during recovery by providing a
consistent starting point.

- Types:

- Periodic Checkpoints: Scheduled at regular intervals to save the current state of the system.

- Forced Checkpoints: Triggered manually or automatically in response to specific events, such as transaction commits.

2. Recovery:
- Definition: Recovery in a distributed database system involves restoring the system to a
consistent state after a failure occurs.

- Phases:

- Analysis: Identifying the transactions that were in progress at the time of failure and
determining the necessary actions for recovery.

- Undo: Reverting the effects of incomplete (uncommitted) transactions by rolling back their changes, restoring the values that existed before those transactions began.

- Redo: Reapplying the effects of committed transactions that were lost due to the failure.

- Techniques:

- Backward Recovery: Restoring the system to a previously saved consistent state (for example, a checkpoint) and, where a log is available, replaying logged transactions from that point forward.

- Forward Recovery: Applying recovery actions directly to the current state of the system without
reverting to a previous state.
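The Analysis, Undo and Redo phases listed above can be sketched over a simple undo/redo log. This is a minimal Python illustration, not a production algorithm: the log record format and the `recover` function are assumptions, the database is a plain dict, and it assumes strict locking so that no two active transactions wrote the same item. It follows the undo-then-redo order given in these notes; some systems (ARIES, for example) instead run redo before undo.

```python
def recover(db, log):
    """Bring `db` (a dict standing in for the on-disk database) back to a
    consistent state after a crash, using an undo/redo log.

    Hypothetical log record formats:
        ("START", txn)
        ("WRITE", txn, key, old_value, new_value)
        ("COMMIT", txn)
    """
    # Analysis: find committed transactions and those still in progress at the crash.
    started = {rec[1] for rec in log if rec[0] == "START"}
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    incomplete = started - committed

    # Undo: scan the log backwards, restoring old values written by incomplete transactions.
    for rec in reversed(log):
        if rec[0] == "WRITE" and rec[1] in incomplete:
            _, _, key, old_value, _ = rec
            db[key] = old_value

    # Redo: scan the log forwards, reapplying new values written by committed transactions.
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:
            _, _, key, _, new_value = rec
            db[key] = new_value

    return incomplete


# Usage: T1 committed but its write never reached disk; T2's uncommitted write did.
log = [
    ("START", "T1"), ("WRITE", "T1", "x", 0, 10),
    ("START", "T2"), ("WRITE", "T2", "y", 5, 7),
    ("COMMIT", "T1"),
]
db = {"x": 0, "y": 7}
print(recover(db, log), db)  # {'T2'} {'x': 10, 'y': 5}
```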

3. Recovery Protocols:

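- Two-Phase Commit (2PC): Ensures atomicity and durability of distributed transactions by coordinating commit or rollback decisions among participating nodes.

- Three-Phase Commit (3PC): Extends 2PC with an additional pre-commit phase between voting and the final commit, so that participants are not left blocked if the coordinator fails after they have voted to commit. This non-blocking behavior assumes no network partitions and reliable failure detection.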

Process Resilience

Definition: Process resilience refers to the ability of a system or application to continue functioning
despite failures or disruptions.

Fault Tolerance:

- Redundancy: Introducing duplicate processes or components to ensure continued operation if one fails.

- Failure Detection: Detecting failures quickly to initiate recovery processes.

- Recovery Mechanisms: Implementing strategies such as checkpointing and rollback to recover from failures.
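Failure detection is commonly done with heartbeats and timeouts. The sketch below is a minimal Python illustration under that assumption; the class `HeartbeatDetector` and its methods are hypothetical. Note that it can only *suspect* a failure: in an asynchronous network a slow process looks exactly like a crashed one.

```python
import time


class HeartbeatDetector:
    """Timeout-based failure detector (illustrative sketch).

    Each monitored process is expected to send a heartbeat at least every
    `timeout` seconds; a process whose last heartbeat is older than that
    is suspected to have failed.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the example can use a fake clock
        self.last_seen = {}         # process id -> time of last heartbeat

    def heartbeat(self, process_id):
        self.last_seen[process_id] = self.clock()

    def suspected(self):
        now = self.clock()
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


# Usage with a fake clock: B keeps sending heartbeats, A goes silent.
now = [0.0]
detector = HeartbeatDetector(timeout=2.0, clock=lambda: now[0])
detector.heartbeat("A")
detector.heartbeat("B")
now[0] = 1.5
detector.heartbeat("B")
now[0] = 3.0
print(detector.suspected())  # {'A'}
```

Choosing the timeout is a trade-off: a short one detects crashes quickly but produces more false suspicions.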

Techniques:

- Replication: Running multiple instances of a process on different nodes to tolerate failures.

- Isolation: Isolating individual processes to prevent failures from propagating to other components.

- Graceful Degradation: Prioritizing essential functions to maintain basic functionality during failure conditions.
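One common way to use replicated processes is to send the same request to every replica and vote on the replies. With 2k + 1 replicas, the correct answer still wins a majority even if k replicas return wrong values (triple modular redundancy is the case k = 1). A minimal Python sketch; `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(replies):
    """Return the value reported by a strict majority of replicas, or None."""
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count > len(replies) // 2 else None


# Usage: three replicas, one of which returns a wrong value.
print(majority_vote([42, 42, 7]))  # 42
print(majority_vote([1, 2, 3]))    # None (no majority, the fault cannot be masked)
```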

Challenges:

- Overhead: Replication and recovery mechanisms can introduce overhead in terms of resources
and performance.

- Consistency: Ensuring consistency across replicated processes while maintaining performance.

- Complexity: Designing and managing resilient systems can be complex and require careful
planning.

__________________________________________________________________________________

Reliable Client-Server Communication:

Definition: Reliable client-server communication ensures that data is transmitted accurately and in
the correct order between clients and servers, even in the presence of failures or network issues.

Techniques:

- Acknowledgments: Using acknowledgments to confirm successful receipt of data and retransmitting if necessary.

- Sequence Numbers: Assigning sequence numbers to data packets to ensure correct ordering.

- Timeouts and Retransmissions: Setting timeouts to detect lost packets and retransmitting them if
no acknowledgment is received.
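The three techniques above work together in a stop-and-wait scheme: the sender transmits one numbered packet, waits for its acknowledgment, and retransmits on timeout, while the receiver uses the sequence number to discard duplicates. The following is a minimal Python simulation, not a real network stack; `LossyChannel`, `Receiver` and `send_all` are hypothetical, and a dropped message (returned as `None`) stands in for a timer expiring.

```python
import random


class LossyChannel:
    """Simulated link that silently drops each message with probability `loss`."""

    def __init__(self, loss, seed=0):
        self.loss = loss
        self.rng = random.Random(seed)

    def transmit(self, msg):
        return None if self.rng.random() < self.loss else msg


class Receiver:
    def __init__(self):
        self.expected = 0  # sequence number of the next packet to deliver
        self.delivered = []

    def on_packet(self, seq, data):
        if seq == self.expected:  # new in-order packet: deliver it exactly once
            self.delivered.append(data)
            self.expected += 1
        return seq  # ACK every packet, including duplicates, so lost ACKs get repaired


def send_all(items, channel, receiver, max_retries=50):
    """Stop-and-wait sender: send one packet, wait for its ACK, retransmit on timeout."""
    for seq, data in enumerate(items):
        for _ in range(max_retries):
            packet = channel.transmit((seq, data))
            if packet is None:
                continue  # packet lost: the sender's timer expires and it retransmits
            ack = channel.transmit(receiver.on_packet(*packet))
            if ack == seq:
                break  # acknowledged: move on to the next packet
            # ACK lost: timeout and retransmit; the receiver discards the duplicate
        else:
            raise TimeoutError(f"no ACK for packet {seq} after {max_retries} tries")
    return receiver.delivered


msgs = ["m0", "m1", "m2", "m3"]
print(send_all(msgs, LossyChannel(loss=0.3, seed=1), Receiver()))
```

Stop-and-wait keeps only one packet in flight, which is simple but slow; protocols such as TCP apply the same ideas with a sliding window of unacknowledged packets.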

Protocols:

- TCP (Transmission Control Protocol): Provides reliable, connection-oriented communication with mechanisms such as acknowledgment, retransmission, and flow control.

- HTTP (Hypertext Transfer Protocol): Typically runs on top of TCP and relies on it for the reliable transfer of web data between clients and servers.

- RPC (Remote Procedure Call): Abstracts communication between distributed systems as procedure calls over the network. Its reliability depends on the underlying transport and on the call semantics it guarantees when messages or servers fail (for example, at-least-once or at-most-once).
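When a client retransmits an RPC because the reply was lost, a non-idempotent operation could run twice. A common remedy for at-most-once semantics is for the client to tag each call with a unique request id, reused on retransmission, and for the server to cache replies by id. A minimal Python sketch; `AtMostOnceServer`, `deposit` and the request-id format are hypothetical.

```python
class AtMostOnceServer:
    """Server-side duplicate suppression for retried RPCs (illustrative sketch)."""

    def __init__(self):
        self.balance = 0
        self.replies = {}  # request_id -> cached reply

    def deposit(self, request_id, amount):
        if request_id in self.replies:
            return self.replies[request_id]  # duplicate: return cached reply, do not re-execute
        self.balance += amount
        self.replies[request_id] = self.balance
        return self.balance


# Usage: the first reply is lost, so the client retransmits the same request.
server = AtMostOnceServer()
server.deposit("client1-req1", 100)
server.deposit("client1-req1", 100)  # retransmission
print(server.balance)  # 100
```

A real server would also need to bound the reply cache (for example, by discarding entries the client has acknowledged) and to survive its own crashes, which this sketch ignores.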

Challenges:

- Performance: Ensuring reliability without sacrificing performance can be challenging.

- Overhead: Adding reliability mechanisms can increase network overhead and latency.

- Scalability: Maintaining reliability in large-scale distributed systems with many clients and servers
can be complex.

_____________________________________________________________________

Reliable Group Communication:

Definition: Reliable group communication ensures that messages are delivered to all members of a
group in a consistent and ordered manner, even in the presence of failures or network partitions.

Techniques:

- Total Order: Ensuring that messages are delivered to all group members in the same order.

- View Synchronization: Keeping group members synchronized to detect failures and maintain
consistency.

- Membership Management: Handling dynamic changes in group membership due to joins, leaves,
or failures.
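Total order can be obtained, for example, with a central sequencer that stamps every group message with a global sequence number; each member holds back early arrivals and delivers strictly in sequence order. The following is a minimal Python sketch assuming every stamped message eventually reaches every member (loss handling omitted); `Sequencer` and `Member` are hypothetical names. The sequencer is a single point of failure, which is one reason systems also use consensus protocols such as Paxos to agree on the order.

```python
import itertools


class Sequencer:
    """Central sequencer: stamps each group message with a global sequence number."""

    def __init__(self):
        self._next = itertools.count()

    def stamp(self, msg):
        return next(self._next), msg


class Member:
    """Group member that delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 0
        self.holdback = {}  # messages that arrived early, keyed by sequence number
        self.delivered = []

    def receive(self, seq, msg):
        self.holdback[seq] = msg
        while self.next_seq in self.holdback:  # deliver every message that is now in order
            self.delivered.append(self.holdback.pop(self.next_seq))
            self.next_seq += 1


sequencer = Sequencer()
stamped = [sequencer.stamp(m) for m in ["a", "b", "c"]]
m1, m2 = Member(), Member()
for s, m in stamped:
    m1.receive(s, m)
for s, m in reversed(stamped):  # m2 sees the messages in a different network order
    m2.receive(s, m)
print(m1.delivered, m2.delivered)  # ['a', 'b', 'c'] ['a', 'b', 'c']
```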

Protocols:

- IP Multicast: Allows for one-to-many communication by sending packets to a group of destination hosts. It is a best-effort service with no delivery or ordering guarantees, so reliable group communication has to be built on top of it.

- Paxos: A consensus protocol used to ensure agreement among a group of nodes in a distributed system.

- Virtual Synchrony: Maintains a consistent view of the group by synchronizing membership changes and message delivery.

Challenges:

- Scalability: Ensuring reliable group communication in large-scale distributed systems with many
members.

- Fault Tolerance: Handling failures and network partitions while maintaining consistency.

- Complexity: Designing and implementing reliable group communication protocols can be complex
and require careful consideration of various factors.

Mechanisms for Commit and Recovery in Distributed Database Systems

Ans: In distributed database systems, the Two-Phase Commit (2PC) protocol is commonly used for
commit, and recovery is often facilitated by techniques such as logging and checkpoints.

Two-Phase Commit Protocol:

1. Prepare Phase:

- The coordinator (typically the transaction manager) sends a prepare request to all participants
(resource managers) involved in the transaction.

- Each participant responds with either a "yes" (vote to commit) or "no" (vote to abort).

- If any participant votes "no" (indicating it cannot commit the transaction), the coordinator
proceeds to the abort phase.

2. Commit Phase:

- If all participants vote "yes" in the prepare phase, the coordinator sends a commit request to all participants.

- Upon receiving the commit request, each participant performs the commit operation, making the transaction's changes permanent.

- After successfully committing, the participant acknowledges the coordinator.

3. Abort Phase (the alternative outcome of the second phase):

- If any participant votes "no" in the prepare phase or if the coordinator times out waiting for
responses, the coordinator sends an abort request to all participants.

- Upon receiving the abort request, each participant rolls back the transaction, undoing any
changes made by the transaction.

- After successfully aborting, the participant acknowledges the coordinator.
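The prepare, commit and abort steps above can be sketched from the coordinator's point of view. This is a minimal, single-process Python illustration; `Vote`, `Participant` and `two_phase_commit` are hypothetical names. A real coordinator would also force-write its decision to its log before the second phase and keep resending the decision until every participant acknowledges it; both are omitted here.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """In-memory stand-in for a resource manager (hypothetical interface)."""

    def __init__(self, name, can_commit=True, responsive=True):
        self.name = name
        self.can_commit = can_commit
        self.responsive = responsive  # False simulates a participant that never answers
        self.state = "INIT"

    def prepare(self):
        if not self.responsive:
            raise TimeoutError(self.name)
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return Vote.YES if self.can_commit else Vote.NO

    def commit(self):
        self.state = "COMMITTED"
        return "ack"

    def abort(self):
        self.state = "ABORTED"
        return "ack"


def two_phase_commit(participants):
    """Coordinator side of 2PC: returns "COMMIT" or "ABORT"."""
    # Phase 1 (prepare): collect a vote from every participant.
    decision = "COMMIT"
    for p in participants:
        try:
            if p.prepare() is Vote.NO:
                decision = "ABORT"
                break
        except TimeoutError:
            decision = "ABORT"  # a missing vote is treated like a "no" vote
            break
    # Phase 2: send the decision to every participant; each call returns its acknowledgment.
    for p in participants:
        p.commit() if decision == "COMMIT" else p.abort()
    return decision


print(two_phase_commit([Participant("A"), Participant("B")]))                    # COMMIT
print(two_phase_commit([Participant("A"), Participant("B", can_commit=False)]))  # ABORT
```

A single "no" vote or a timeout is enough to abort, whereas committing requires unanimous "yes" votes; this is what makes the outcome atomic across all sites.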

Recovery Mechanisms: Logging and Checkpoints

1. Logging:

- Logging involves recording all changes made by transactions to a log file before they are applied
to the database.

- During recovery, the log is replayed to redo committed transactions and to undo transactions that had not committed at the time of the failure, bringing the system to a consistent state.

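- Write-Ahead Logging (WAL) is a common logging protocol in which the log record describing a change is written to stable storage before the changed data is written to the database, and a transaction's commit record is written to the log before the commit is acknowledged. The log therefore always holds enough information to undo incomplete transactions and redo committed ones, which ensures atomicity and durability.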
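The write-ahead rule can be illustrated with a toy store that separates volatile memory from stable storage. This is a minimal Python sketch; the class `WALStore` and its methods are hypothetical. Each log record keeps both the old value (for undo) and the new value (for redo). For simplicity it flushes the whole log buffer before writing any data page, whereas real systems only need to flush up to the last record that touches that page.

```python
class WALStore:
    """Toy store showing the write-ahead logging rule (hypothetical class).

    `stable_log` and `stable_data` model disk; `log_buffer` and `cache`
    model volatile memory that would be lost in a crash.
    """

    def __init__(self):
        self.log_buffer, self.stable_log = [], []
        self.cache, self.stable_data = {}, {}

    def _force_log(self):
        self.stable_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def write(self, txn, key, value):
        old = self.cache.get(key, self.stable_data.get(key))
        self.log_buffer.append(("WRITE", txn, key, old, value))  # old value for undo, new for redo
        self.cache[key] = value  # the change is made in memory only

    def flush_page(self, key):
        self._force_log()  # WAL rule: log records reach disk before the data does
        self.stable_data[key] = self.cache[key]

    def commit(self, txn):
        self.log_buffer.append(("COMMIT", txn))
        self._force_log()  # the commit is durable once its record is on disk


store = WALStore()
store.write("T1", "x", 1)
store.flush_page("x")
store.commit("T1")
print(store.stable_log)  # [('WRITE', 'T1', 'x', None, 1), ('COMMIT', 'T1')]
```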

2. Checkpoints:

- Checkpoints involve periodically saving the system state to stable storage.

- During recovery, the system can roll back to the last checkpoint and replay the log from that point
to recover transactions committed after the checkpoint.

- Checkpoints help reduce the time and resources required for recovery by providing a consistent
starting point.
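Checkpoints and logging combine as described above: recovery restores the last checkpoint and replays only the part of the log written after it. The following is a minimal Python sketch under a simplifying assumption, namely that only committed updates are logged, so recovery is redo-only; `CheckpointedStore` and its methods are hypothetical.

```python
import copy


class CheckpointedStore:
    """Sketch: periodic checkpoints plus a redo log of committed updates."""

    def __init__(self):
        self.state = {}
        self.log = []              # committed (key, value) updates, in order
        self.checkpoint = ({}, 0)  # (saved state, log position it covers)

    def apply(self, key, value):
        self.log.append((key, value))
        self.state[key] = value

    def take_checkpoint(self):
        self.checkpoint = (copy.deepcopy(self.state), len(self.log))

    def recover(self):
        saved, pos = self.checkpoint
        self.state = copy.deepcopy(saved)
        for key, value in self.log[pos:]:  # only the tail after the checkpoint is replayed
            self.state[key] = value
        return len(self.log) - pos         # number of log records replayed


store = CheckpointedStore()
store.apply("a", 1)
store.apply("b", 2)
store.take_checkpoint()
store.apply("a", 3)
store.state = {}  # crash: in-memory state is lost; checkpoint and log survive
print(store.recover(), store.state)  # 1 {'a': 3, 'b': 2}
```

In this redo-only scheme, log records before the checkpoint position are no longer needed and could be truncated, which is how checkpoints keep both recovery time and log size bounded.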
