Primitives for Distributed Communication
In a distributed system, processes run on different machines and communicate by sending messages.
The basic operations are Send() and Receive().
1. Parameters
Send(dest, buffer) → Sends data from sender’s buffer to the destination process.
Receive(source, buffer) → Receives data from a source into the receiver’s buffer.
2. Buffered vs Unbuffered
Buffered: Data copied to a kernel buffer before sending; sender can continue without waiting
for receiver.
Unbuffered: Data sent directly from user buffer to receiver; sender may wait until receiver is
ready.
3. Blocking vs Non-blocking
Blocking: Process waits until the operation finishes (send acknowledged or message
received).
Non-blocking: Operation starts and control returns immediately; process can do other work
while message is sent/received.
4. Synchronous vs Asynchronous
Synchronous: Sender and receiver meet (handshake). Send completes only when receiver is
ready.
Asynchronous: Send completes after data leaves sender’s buffer, without waiting for
receiver.
5. Four Send Modes
1. Blocking synchronous send – Wait until receiver copies data and sends acknowledgement.
2. Non-blocking synchronous send – Starts send, returns immediately; completion checked
later.
3. Blocking asynchronous send – Wait until data is copied to kernel buffer.
4. Non-blocking asynchronous send – Start copying, return immediately, check completion
later.
6. Receive Modes
Blocking receive: Wait until message arrives.
Non-blocking receive: Return immediately, check later when message arrives.
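As a single-machine analogy (not a distributed implementation), Python's standard queue module can illustrate the difference between a blocking and a non-blocking receive:

```python
import queue
import threading
import time

mailbox = queue.Queue()  # stands in for a communication channel

def receiver():
    try:
        mailbox.get(block=False)   # non-blocking receive: returns immediately
    except queue.Empty:
        print("non-blocking: no message yet; do other work, check later")
    msg = mailbox.get()            # blocking receive: waits until a message arrives
    print("blocking: received", msg)

threading.Thread(target=receiver).start()
time.sleep(0.1)                    # the sender is busy for a while...
mailbox.put("hello")               # ...then the message arrives
```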
7. Examples
MPI: MPI_Send, MPI_Isend, MPI_Recv
RPC / RMI / CORBA: Higher-level communication built on message passing.
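A minimal sketch of these MPI calls using the mpi4py binding (assuming mpi4py and an MPI runtime are installed; run with mpiexec -n 2 python demo.py):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({"msg": "update"}, dest=1, tag=0)       # blocking send (MPI_Send)
    req = comm.isend({"msg": "more"}, dest=1, tag=1)  # non-blocking send (MPI_Isend)
    req.wait()                                        # completion checked later
    comm.ssend({"msg": "sync"}, dest=1, tag=2)        # blocking synchronous send (MPI_Ssend)
elif rank == 1:
    for tag in (0, 1, 2):
        print(comm.recv(source=0, tag=tag))           # blocking receive (MPI_Recv)
```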
Conclusion:
These primitives determine when a process waits and how data is moved. Choosing the right
primitive improves performance, helps avoid deadlocks, and ensures reliable communication in
distributed systems.
Global State of a Distributed System
1. Definition
The global state of a distributed system is the combined condition of:
Local state of each process
State of all communication channels
It represents a possible configuration of the whole system at a specific instant of time.
Since there is no shared memory and no global clock, computing this state is challenging.
2. Components
a) Local State of a Process
Includes: processor registers, program counter, call stack, local memory, and variable values.
Depends on the process’s execution at that moment.
b) State of a Communication Channel
The set of messages in transit (sent but not yet received) on that channel.
3. Why Global State is Important
Debugging and Monitoring – Helps to understand and trace system behavior.
Deadlock Detection – Identify whether processes are blocked waiting for one another in a cycle.
Termination Detection – Find out if computation has completed.
Checkpointing and Recovery – Save system state to recover from failures.
Performance Analysis – Evaluate how processes and channels behave.
4. How It is Computed – Distributed Snapshot
Since no global clock exists, snapshots must be coordinated:
Chandy–Lamport Algorithm (example method):
1. A process initiates the snapshot by recording its local state and sending a marker message on
all outgoing channels.
2. When a process receives a marker for the first time, it records its local state and sends
markers on its own outgoing channels.
3. Each process records messages arriving on a channel after recording its state but before
receiving a marker on that channel — these are the in-transit messages (channel state).
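A minimal simulation sketch of these marker rules (the Process class, channel names, and the use of FIFO lists as channels are illustrative assumptions, not part of the algorithm's original statement):

```python
MARKER = object()  # special marker message, distinct from application data

class Process:
    """One process in a simplified Chandy-Lamport snapshot simulation."""
    def __init__(self, pid, incoming, outgoing):
        self.pid = pid
        self.state = 0              # local state: here, just an event counter
        self.snapshot = None        # recorded local state (None = not yet recorded)
        self.chan_state = {}        # incoming channel name -> recorded messages
        self.closed = set()         # incoming channels whose marker has arrived
        self.incoming = incoming    # list of incoming channel names
        self.outgoing = outgoing    # dict: channel name -> FIFO list (the channel)

    def record(self):
        self.snapshot = self.state                        # step 1: record local state
        self.chan_state = {c: [] for c in self.incoming}  # start recording channels
        for q in self.outgoing.values():
            q.append(MARKER)                              # marker on every out-channel

    def receive(self, chan, msg):
        if msg is MARKER:
            if self.snapshot is None:
                self.record()       # step 2: first marker, so record and relay markers
            self.closed.add(chan)   # channel state for this channel is now complete
        else:
            self.state += 1         # ordinary application message is processed
            if self.snapshot is not None and chan not in self.closed:
                self.chan_state[chan].append(msg)  # step 3: record in-transit message
```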
5. Example
Imagine 3 processes P1, P2, P3 connected in a ring:
Local state: execution variables in each process.
Channel state: messages sent but not yet received (e.g., P1 → P2: “update”, still in transit).
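Using the sketch above, the in-transit "update" can be reproduced: if P2 has already recorded its state (here, as the initiator) when "update" arrives ahead of P1's marker, the message is captured as the channel state of P1 → P2.

```python
# Channel P1 -> P2 as a FIFO list; P1's actions are simulated directly.
c12 = []
p2 = Process(2, incoming=["c12"], outgoing={"c23": []})

c12.append("update")   # P1 sends "update"; it is still in transit
p2.record()            # P2 initiates the snapshot before the message arrives
c12.append(MARKER)     # P1, after recording its own state, sends its marker

while c12:
    p2.receive("c12", c12.pop(0))

print(p2.chan_state)   # {'c12': ['update']}: "update" captured as channel state
```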
Conclusion:
The global state is a snapshot that combines all local and channel states, essential for consistency,
coordination, and fault recovery in distributed systems.
Design Issues and Challenges in Distributed Systems (System Perspective)
1. Introduction
Designing a distributed system is complex because processes run on different machines, communicate
over networks, and must appear as a single system to the user. From a system perspective, the design
must handle communication, coordination, security, scalability, and fault tolerance.
2. Main Issues
a) Communication
Provide reliable and efficient message exchange between processes.
Examples: Remote Procedure Call (RPC), Remote Method Invocation (RMI).
b) Process Management
Creation, scheduling, and termination of processes/threads.
Support migration of code and mobile agents between machines.
c) Naming
Assign unique, location-independent names to resources and processes.
Must be robust and scalable for geographically distributed systems.
d) Synchronization
Coordinate access to shared resources and events.
Techniques: Mutual exclusion, leader election, physical/logical clock synchronization, global
state recording.
e) Data Storage and Access
Design distributed file systems and databases for easy and efficient data access.
f) Consistency and Replication
Replication improves performance and fault tolerance, but must maintain data consistency
across copies.
g) Fault Tolerance
Detect and recover from failures.
Techniques: Reliable communication, checkpointing, distributed commit protocols.
h) Security
Protect data and communication using cryptography, authentication, authorization, and secure
channels.
i) Transparency (hiding system complexity)
Access transparency – hide differences in data representation and in how resources are accessed.
Location transparency – hide the location of resources.
Migration transparency – move resources without changing their names.
Replication transparency – hide whether a resource is replicated.
Concurrency transparency – allow simultaneous access without conflict.
Failure transparency – mask system failures.
j) Scalability and Modularity
System should grow without performance loss.
Use techniques like replication, caching, asynchronous processing.
3. Conclusion
Addressing these system-level design issues ensures that the distributed system is reliable, secure,
scalable, and user-friendly, meeting both technical and performance goals.
Need of Clock Synchronization in Distributed Systems
1. Introduction
In distributed systems, each machine has its own clock.
Due to hardware drift, these clocks do not run at exactly the same rate → no single global
clock.
Many distributed algorithms need events to be ordered consistently across machines.
Clock synchronization ensures all processes have a common notion of time, either physical
or logical.
2. Why Synchronization is Needed
1. Event Ordering – Decide the correct order of events happening in different processes.
2. Causal Relationship – Ensure events that depend on each other are processed in correct
order.
3. Coordination – Tasks like mutual exclusion, leader election need agreed timing.
4. Logging & Debugging – Merge logs from different machines in correct order.
5. Transactions – Maintain consistency in distributed databases.
6. Fault Detection – Timeout-based detection needs accurate timing.
3. Methods of Synchronization
a) Scalar Time (Lamport’s Logical Clock)
Introduced by Lamport to order events without a physical clock.
Rules:
1. All counters start at 0.
2. Increment local counter for every event (internal, send, receive).
3. When sending a message → attach current counter value.
4. When receiving a message carrying timestamp Cm → set Ci = max(Ci, Cm), then
increment by 1.
Properties:
o Consistency: if event a happens before event b, then C(a) < C(b).
o Total ordering: ties between equal clock values are broken using process IDs.
o Event counting: with unit increments, clock value − 1 gives the minimum number of
events that causally precede this event.
o Limitation: no strong consistency; C(a) < C(b) does not imply that a happened before b,
so concurrent events cannot always be detected.
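A minimal sketch of these rules (the class and method names are illustrative):

```python
class LamportClock:
    """Lamport's scalar logical clock, following the four rules above."""
    def __init__(self):
        self.time = 0                              # rule 1: counter starts at 0

    def tick(self):
        self.time += 1                             # rule 2: increment on every event
        return self.time

    def send(self):
        return self.tick()                         # rule 3: attach counter to message

    def receive(self, msg_time):
        self.time = max(self.time, msg_time) + 1   # rule 4: max, then increment
        return self.time

# Example: a process whose clock reads 3 receives a message stamped 5.
p = LamportClock()
p.time = 3
print(p.receive(5))   # 6 = max(3, 5) + 1
```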
b) Vector Time (Vector Clocks)
Extends Lamport’s clock for strong consistency.
For N processes, each process keeps a vector of size N.
Rules:
1. Initially all counters = 0.
2. Increment own counter for each local event.
3. On send: attach vector clock.
4. On receive: increment own counter, then update each element to
max(local[i], received[i]).
Properties:
o Strong consistency: Can determine if two events are causally related.
o Captures concurrency explicitly.
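A corresponding sketch for vector clocks, including the causality test that scalar clocks cannot provide (names are again illustrative):

```python
class VectorClock:
    """Vector clock for process pid among n processes."""
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n                 # rule 1: all counters start at 0

    def tick(self):
        self.v[self.pid] += 1            # rule 2: increment own entry

    def send(self):
        self.tick()
        return list(self.v)              # rule 3: attach a copy of the vector

    def receive(self, received):
        self.tick()                      # rule 4: increment own counter...
        self.v = [max(a, b) for a, b in zip(self.v, received)]  # ...then element-wise max

def happened_before(u, v):
    """u causally precedes v iff u <= v element-wise and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v

def concurrent(u, v):
    """Two events are concurrent iff neither causally precedes the other."""
    return not happened_before(u, v) and not happened_before(v, u)
```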
4. Conclusion
Clock synchronization is essential to maintain order, causality, and coordination in distributed
systems.
Physical synchronization (e.g., NTP) keeps real clocks close; logical clocks (Lamport’s, vector
clocks) provide ordering without depending on real time.