UNIT-2
Distributed Deadlock Detection
Introduction
In distributed systems, a deadlock occurs when a set of processes wait indefinitely for resources held
by each other. Unlike centralized systems, detecting deadlocks in a distributed environment is more
complex due to the absence of a global state.
Deadlock Handling Strategies in Distributed Systems
1. Prevention – Ensuring that at least one of the necessary deadlock conditions (mutual
exclusion, hold and wait, no preemption, circular wait) does not hold.
2. Avoidance – Using techniques like wait-die or wound-wait to prevent deadlocks.
3. Detection and Recovery – Allowing deadlocks to occur and then detecting and resolving
them using algorithms.
Issues in Deadlock Detection and Resolution
Lack of Global State – No single entity has complete knowledge of resource allocation.
False Positives – Temporary cycles in wait-for graphs can be mistaken for deadlocks.
Overhead – Detection mechanisms introduce communication and computation costs.
Control Organizations for Distributed Deadlock Detection
1. Centralized Control – A single node is responsible for detecting deadlocks.
2. Distributed Control – All nodes cooperate to detect deadlocks.
3. Hierarchical Control – A hybrid approach using multiple levels of detection.
Centralized and Distributed Deadlock Detection Algorithms
Centralized Algorithm – A designated node maintains a global wait-for graph and detects
cycles.
Distributed Algorithm – Each node maintains part of the wait-for graph and exchanges
messages to detect cycles collectively.
Hierarchical Deadlock Detection Algorithms
Uses multiple levels of control where local controllers detect deadlocks at their level and
escalate potential issues to higher levels.
Reduces overhead compared to fully distributed detection.
Agreement Protocols
Introduction
Agreement protocols ensure that all non-faulty processes in a distributed system reach a common
decision, despite failures or misbehavior by some nodes.
The System Model
Synchronous Model – Messages are delivered within a known time bound.
Asynchronous Model – No guarantee on message delivery time.
Failure Types – Crash failures (nodes stop functioning), Byzantine failures (nodes act
arbitrarily).
Classification of Agreement Problems
1. Consensus Problem – Processes must agree on a single value.
2. Byzantine Agreement Problem – Achieving consensus even when some nodes behave
maliciously.
3. Atomic Commit Problem – Ensures that all nodes commit or abort a transaction.
Solutions to the Byzantine Agreement Problem
1. Oral Message Algorithm (OM) – Requires more than 2/3 majority of honest nodes to work.
2. Signed Message Algorithm (SM) – Uses cryptographic signatures to ensure message
authenticity.
Applications of Agreement Algorithms
Fault-tolerant distributed computing
Blockchain and cryptocurrency consensus mechanisms
Reliable distributed database transactions
Distributed Resource Management
Introduction
Managing resources (files, processes, memory) in a distributed system is complex due to autonomy,
concurrency, and fault tolerance requirements.
Architecture
Client-Server Model – Clients request resources from centralized or distributed servers.
Peer-to-Peer Model – Nodes manage resources collaboratively without a central server.
Mechanisms for Building Distributed File Systems
1. Remote Service Model – Files are accessed remotely over the network.
2. Caching – Frequently accessed files are stored locally for efficiency.
3. Replication – Duplicates of files are maintained across multiple nodes for fault tolerance.
Design Issues
Transparency – Users should not be aware of file location.
Concurrency Control – Managing simultaneous access to prevent conflicts.
Fault Tolerance – Ensuring data availability even if nodes fail.
Log-Structured File Systems
Instead of modifying files in place, changes are appended to a log.
Improves write performance and facilitates crash recovery.