Unit 2
This technique results in a considerable cost saving; only one scalar is piggybacked on every message.
Clock synchronization is the process of ensuring that physically distributed processors have a
common notion of time.
Due to differing clock rates, the clocks at various sites may diverge with time, and periodically a clock synchronization must be performed to correct this clock skew in distributed systems. Clocks are synchronized to an accurate real-time standard like UTC (Universal Coordinated Time). Clocks that must not only be synchronized with each other but also have to adhere to physical time are termed physical clocks. This degree of synchronization additionally enables the coordination and scheduling of actions between multiple computers connected to a common network.
Fig 1.30 (a) Offset and delay estimation between processes from the same server; (b) offset and delay estimation between processes from different servers.
Let T1, T2, T3, T4 be the values of the four most recent timestamps. The clocks A and B are stable and running at the same speed. Let a = T1 − T3 and b = T2 − T4. If the network delay difference from A to B and from B to A, called the differential delay, is small, the clock offset θ and round-trip delay δ of B relative to A at time T4 are approximately given by:
θ = (a + b)/2,  δ = a − b.
Each NTP message includes the latest three timestamps T1, T2, and T3, while T4 is determined upon arrival.
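As an illustration of these relations, the following minimal Python sketch computes θ and δ from one request–reply exchange. The assignment of T1–T4 to points in the exchange follows the standard NTP message exchange (the figure is not reproduced here), and the timestamp values are made up.
--------------------------------------------------------------------------------
def ntp_offset_delay(t1, t2, t3, t4):
    """Estimate the clock offset and round-trip delay of B relative to A.

    Assumed correspondence with the A -> B -> A exchange:
    t3 = request sent by A (A's clock),  t1 = request received by B (B's clock),
    t2 = reply sent by B (B's clock),    t4 = reply received by A (A's clock).
    """
    a = t1 - t3
    b = t2 - t4
    offset = (a + b) / 2   # clock offset of B relative to A
    delay = a - b          # round-trip delay
    return offset, delay

# Illustrative (made-up) timestamps, in seconds.
theta, delta = ntp_offset_delay(t1=100.062, t2=100.063, t3=100.000, t4=100.128)
print("offset:", theta, "delay:", delta)
--------------------------------------------------------------------------------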
In a system of logical clocks, every process has a logical clock that is advanced using
a set of rules. Every event is assigned a timestamp and the causality relation between events
can be generally inferred from their timestamps.
The timestamps assigned to events obey the fundamental monotonicity property; that
is, if an event a causally affects an event b, then the timestamp of a is smaller than the
timestamp of b.
A system of logical clocks consists of a time domain T and a logical clock C. Elements of T form a
partially ordered set over a relation <. This relation is usually called the happened before or causal
precedence.
The logical clock C is a function that maps an event e in a distributed system to an element in the time domain T, denoted C(e), such that for any two events ei and ej: ei → ej ⟹ C(ei) < C(ej).
This monotonicity property is called the clock consistency condition. When T and C satisfy the stronger condition ei → ej ⟺ C(ei) < C(ej), the system of clocks is said to be strongly consistent.
Data structures:
Each process pi maintains data structures with the given capabilities:
• A local logical clock (lci), that helps process pi measure its own progress.
• A logical global clock (gci), that is a representation of process pi's local view of the logical global time. It allows this process to assign consistent timestamps to its local events.
Protocol:
The protocol ensures that a process's logical clock, and thus its view of the global time, is managed consistently with the following rules:
Rule 1: Decides how the logical clock is updated by a process when it executes an event (send, receive, or internal).
Rule 2: Decides how a process updates its global logical clock to update its view of the
global time and global progress. It dictates what information about the logical time is
piggybacked in a message and how this information is used by the receiving process to
update its view of the global time.
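The two rules for scalar (Lamport) clocks can be made concrete with a small sketch. This is an illustrative Python rendering, assuming the common choice d = 1 for the clock increment; it is not taken from the notes.
--------------------------------------------------------------------------------
class ScalarClock:
    """Scalar (Lamport) clock: local and global views are squashed into one integer."""

    def __init__(self, d=1):
        self.c = 0      # current clock value
        self.d = d      # increment used by R1 (d > 0), commonly 1

    def tick(self):
        """R1: before executing an internal or send event, advance the clock."""
        self.c += self.d
        return self.c

    def send(self):
        """A send event: apply R1 and piggyback the clock value on the message."""
        return self.tick()

    def receive(self, piggybacked):
        """R2: on receiving (m, t), set C := max(C, t), then apply R1 for the receive event."""
        self.c = max(self.c, piggybacked)
        return self.tick()

# Illustration: p1 sends a message to p2.
p1, p2 = ScalarClock(), ScalarClock()
t = p1.send()          # p1's send event gets timestamp 1
print(p2.receive(t))   # p2's receive event gets timestamp max(0, 1) + 1 = 2
--------------------------------------------------------------------------------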
2. Total Ordering: Scalar clocks can be used to totally order the events in a distributed system. However, events at different processes may be assigned identical timestamps, so a tie-breaking mechanism is essential to order such events. The tie-breaking is done through:
• Process identifiers are linearly ordered.
• A process with a lower identifier value is given higher priority.
The term (t, i) denotes the timestamp of an event, where t is its time of occurrence and i is the identity of the process where it occurred.
The total order relation (≺) over two events x and y with timestamps (h, i) and (k, j) is given by:
x ≺ y ⟺ (h < k) or (h = k and i < j).
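As a small illustration (not from the notes), the relation can be written directly as a comparison on (timestamp, process id) pairs:
--------------------------------------------------------------------------------
def totally_precedes(x, y):
    """x = (h, i), y = (k, j): scalar timestamps h, k and process identifiers i, j.
    x precedes y in the total order iff h < k, or h = k and i < j."""
    h, i = x
    k, j = y
    return h < k or (h == k and i < j)

print(totally_precedes((3, 2), (3, 5)))   # True: equal timestamps, lower process id wins
--------------------------------------------------------------------------------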
3. Event Counting
If an event e has a timestamp h, then h − 1 represents the minimum logical duration, counted in units of events, required before producing the event e. This is called the height of the event e. That is, h − 1 events have been produced sequentially before the event e, regardless of the processes that produced these events.
4. No strong consistency
The reason that scalar clocks are not strongly consistent is that the local logical clock and the global logical clock of a process are squashed into one, resulting in the loss of causal dependency information among events at different processes.
In vector time, the time domain is represented by a set of n-dimensional non-negative integer vectors.
Rule 2: Each message m is piggybacked with the vector clock vt of the sender process at sending time. On the receipt of such a message (m, vt), process pi executes the following sequence of actions:
1. update its global logical time: vti[k] := max(vti[k], vt[k]) for 1 ≤ k ≤ n;
2. execute R1;
3. deliver the message m.
• There is an isomorphism between the set of partially ordered events produced by a distributed
computation and their vector timestamps.
• If the process at which an event occurred is known, the test to compare two timestamps can be simplified as: if events x and y occurred at processes pi and pj and are assigned timestamps vh and vk respectively, then x → y ⟺ vh[i] ≤ vk[i], and x ∥ y ⟺ vh[i] > vk[i] and vk[j] > vh[j].
2. Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the vector
timestamp of two events, we can determine if the events are causally related.
3. Event counting
If an event e has timestamp vh, vh[j] denotes the number of events executed by
process pj that causally precede e.
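The vector clock rules and the causality test can be illustrated with the following Python sketch (assuming an increment of 1 for R1); the class and function names are illustrative, not part of the notes.
--------------------------------------------------------------------------------
class VectorClock:
    """Vector clock for process pi in a system of n processes (increment of 1 assumed)."""

    def __init__(self, n, i):
        self.vt = [0] * n   # vti[1..n], initialized to zeros
        self.i = i

    def tick(self):
        """R1: before an internal or send event, increment own component vti[i]."""
        self.vt[self.i] += 1

    def send(self):
        """Piggyback a copy of the whole vector clock on the outgoing message."""
        self.tick()
        return list(self.vt)

    def receive(self, piggybacked):
        """R2: component-wise maximum with the piggybacked vector, then apply R1."""
        self.vt = [max(a, b) for a, b in zip(self.vt, piggybacked)]
        self.tick()

def happened_before(vh, vk):
    """vh -> vk iff vh <= vk component-wise and vh != vk (strong consistency)."""
    return all(a <= b for a, b in zip(vh, vk)) and vh != vk

# Illustration with n = 3 processes: p0 sends to p1.
p0, p1 = VectorClock(3, 0), VectorClock(3, 1)
m = p0.send()                       # p0's clock: [1, 0, 0]
p1.receive(m)                       # p1's clock: [1, 1, 0]
print(happened_before(m, p1.vt))    # True: the send causally precedes p1's receive
--------------------------------------------------------------------------------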
− This cuts down the message size, communication bandwidth, and buffer (to store messages) requirements.
− The storage overhead is handled by having each process pi maintain two additional vectors:
• LSi[1..n] (Last Sent): LSi[j] indicates the value of vti[i] when process pi last sent a message to process pj.
• LUi[1..n] (Last Update): LUi[j] indicates the value of vti[i] when process pi last updated the entry vti[j].
− However, this means processes cannot know their transitive dependencies when looking at
the causality of events.
− In order to gain a full view of all dependencies that lead to a specific event, an offline search
must be made across processes.
− Each process pi maintains a dependency vector Di. Initially, Di[j] = 0 for j = 1, …, n.
− Di is updated as follows:
1. Whenever an event occurs at pi: Di[i] := Di[i] + 1.
2. When a process pi sends a message to process pj, it piggybacks the updated value of Di[i] in the message.
3. When pi receives a message from pj with piggybacked value d, pi updates its dependency vector as follows: Di[j] := max{Di[j], d}.
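A minimal Python sketch of these three rules (a form of the direct-dependency technique) is given below; only the sender's own entry is piggybacked, so a receiver records only its direct dependencies. Names are illustrative.
--------------------------------------------------------------------------------
class DependencyVector:
    """Dependency vector Di for process pi; only Di[i] is piggybacked on messages."""

    def __init__(self, n, i):
        self.D = [0] * n    # initially Di[j] = 0 for all j
        self.i = i

    def local_event(self):
        """Rule 1: whenever an event occurs at pi, increment Di[i]."""
        self.D[self.i] += 1

    def send(self):
        """Rule 2: piggyback only the updated value Di[i] on the outgoing message."""
        self.local_event()
        return self.D[self.i]

    def receive(self, sender_j, d):
        """Rule 3: on receiving piggybacked value d from pj, set Di[j] := max{Di[j], d}."""
        self.D[sender_j] = max(self.D[sender_j], d)
        self.local_event()

# Illustration with n = 3: p0 sends to p1.
p0, p1 = DependencyVector(3, 0), DependencyVector(3, 1)
d = p0.send()        # p0 piggybacks D0[0] = 1
p1.receive(0, d)     # p1's vector becomes [1, 1, 0]: only the direct dependency on p0
--------------------------------------------------------------------------------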
Asynchronous Executions
An asynchronous execution (or A-execution) is an execution (E, ≺) for which the causality relation
is a partial order.
• No ordering constraint is imposed on message delivery beyond the causality relation itself.
• Messages can be delivered in any order; the channels need not be FIFO.
• Though there is a physical link that delivers the messages sent on it in FIFO order due to the physical properties of the medium, a logical link may be formed as a composite of physical links, and multiple paths may exist between the two end points of the logical link.
FIFO executions
• A FIFO execution is an A-execution in which, on each logical channel, messages are delivered in the order in which they are sent.
Causally ordered (CO) executions
• If two send events s and s' are related by the causality ordering (not physical time ordering), then a causally ordered execution requires that their corresponding receive events r and r' occur in the same order at all common destinations.
• If s and s’ are not related by causality, then CO is vacuously satisfied.
• Causal order is used in applications that update shared data, distributed shared
memory, or fair resource allocation.
• A message m that arrives in the local OS buffer at Pi may have to be delayed until the
messages that were sent to Pi causally before m was sent have arrived and are
processed by the application.
• The delayed message m is then given to the application for processing. The event of
an application processing an arrived message is referred to as a delivery event.
• No message is overtaken by a chain of messages between the same (sender, receiver) pair.
If send(m1) ≺ send(m2), then for each common destination d of messages m1 and m2, deliver_d(m1) ≺ deliver_d(m2) must be satisfied.
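One way to make this condition concrete is to check it over a recorded execution. The sketch below is illustrative; the causal relation between send events is assumed to be supplied from outside (e.g., computed from vector timestamps).
--------------------------------------------------------------------------------
def satisfies_causal_order(sends, deliveries, precedes):
    """Check the CO condition over a recorded execution.

    sends:      dict message_id -> set of destination process ids
    deliveries: dict process_id -> list of message_ids in delivery order at that process
    precedes:   function (m1, m2) -> True if send(m1) causally precedes send(m2)
    """
    for p, seq in deliveries.items():
        position = {m: k for k, m in enumerate(seq)}
        for m1, dests1 in sends.items():
            for m2, dests2 in sends.items():
                if m1 == m2 or p not in dests1 or p not in dests2:
                    continue
                if precedes(m1, m2) and m1 in position and m2 in position:
                    if position[m1] > position[m2]:
                        return False   # m2 delivered before a causally earlier m1
    return True

# Illustration: m1 causally precedes m2, but P2 delivers them out of order.
sends = {"m1": {2}, "m2": {2}}
deliveries = {2: ["m2", "m1"]}
print(satisfies_causal_order(sends, deliveries, lambda a, b: (a, b) == ("m1", "m2")))  # False
--------------------------------------------------------------------------------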
2. Empty-Interval Execution: An execution (E, ≺) is an empty-interval (EI) execution if for each pair of events (s, r) ∈ T, the open interval set {x ∈ E | s ≺ x ≺ r} in the partial order is empty.
Synchronous Execution
• When all the communication between pairs of processes uses synchronous send and receive primitives, the resulting order is the synchronous order.
• Synchronous communication always involves a handshake between the receiver and the sender; the handshake events may appear to occur instantaneously and atomically.
• The instantaneous communication property of synchronous executions requires a modified
definition of the causality relation because for each (s, r) ∈ T, the send event is not causally
ordered before the receive event.
• The two events are viewed as being atomic and simultaneous, and neither event precedes the
other.
S2: If (s, r) ∈ T, then for all x ∈ E, [(x ≪ s ⟺ x ≪ r) and (s ≪ x ⟺ r ≪ x)].
• An execution can be modeled to give a total order that extends the partial order
(E, ≺).
• In an A-execution, the messages can be made to appear instantaneous if there exists a linear extension of the execution such that each send event is immediately followed by its corresponding receive event in this linear extension.
A non-separated linear extension of (E, ≺) is a linear extension of (E, ≺) such that for each pair (s, r) ∈ T, the interval {x ∈ E | s ≺ x ≺ r} is empty.
An A-execution (E, ≺) is an RSC execution if and only if there exists a non-separated linear extension of the partial order (E, ≺).
• In the non-separated linear extension, if the adjacent send event and its corresponding
receive event are viewed atomically, then that pair of events shares a common past
and a common future with each other.
Crown
Let E be an execution. A crown of size k in E is a sequence <(si, ri), i ∈ {0, …, k−1}> of pairs of corresponding send and receive events such that: s0 ≺ r1, s1 ≺ r2, …, sk−2 ≺ rk−1, sk−1 ≺ r0.
For example, in an execution where two messages cross each other, i.e., s1 ≺ r2 and s2 ≺ r1, the crown is <(s1, r1), (s2, r2)>. Cyclic dependencies
may exist in a crown. The crown criterion states that an A-computation is RSC, i.e., it can be
realized on a system with synchronous communication, if and only if it contains no crown.
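The crown criterion can be checked mechanically: build a graph whose nodes are the messages, with an edge from m to m' whenever send(m) ≺ receive(m'), and look for a cycle. The following is an illustrative sketch; the causality test between events is assumed to be supplied from outside.
--------------------------------------------------------------------------------
def has_crown(messages, precedes):
    """messages: dict message_id -> (send_event, receive_event).
    precedes(a, b): True if event a causally precedes event b.
    Returns True iff the execution contains a crown (and hence is not RSC)."""
    edges = {m: [m2 for m2 in messages
                 if m2 != m and precedes(messages[m][0], messages[m2][1])]
             for m in messages}

    WHITE, GREY, BLACK = 0, 1, 2
    colour = {m: WHITE for m in messages}

    def dfs(m):
        colour[m] = GREY
        for m2 in edges[m]:
            if colour[m2] == GREY:          # back edge: a cycle, i.e., a crown
                return True
            if colour[m2] == WHITE and dfs(m2):
                return True
        colour[m] = BLACK
        return False

    return any(colour[m] == WHITE and dfs(m) for m in messages)

# Illustration: two messages that cross each other (s1 precedes r2 and s2 precedes r1).
msgs = {"m1": ("s1", "r1"), "m2": ("s2", "r2")}
order = {("s1", "r1"), ("s2", "r2"), ("s1", "r2"), ("s2", "r1")}
print(has_crown(msgs, lambda a, b: (a, b) in order))   # True: not realizable synchronously
--------------------------------------------------------------------------------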
− RSC ⊂ CO ⊂ FIFO ⊂ A
− This hierarchy is illustrated in Figure 2.3(a), and example executions of each class are shown side-by-side in Figure 2.3(b).
− The above hierarchy implies that some executions belonging to a class X will not belong to any of the classes included in X. The degree of concurrency is highest in A and lowest in SYNC.
− A program using synchronous communication is easiest to develop and verify.
− A program using non-FIFO communication, resulting in an A-execution, is hardest to design and verify.
2.2.3 Simulations
− The events in the RSC execution are scheduled as per some non-separated linear
extension, and adjacent (s, r) events in this linear extension are executed sequentially
in the synchronous system.
− The partial order of the asynchronous execution remains unchanged.
− If an A-execution is not RSC, then there is no way to schedule the events to make
them RSC, without actually altering the partial order of the given A-execution.
− However, the following indirect strategy that does not alter the partial order can be
used.
− Each channel Ci,j is modeled by a control process Pi,j that simulates the channel buffer.
− An asynchronous communication from i to j becomes a synchronous communication
from i to Pi,j followed by a synchronous communication from Pi,j to j.
− This enables the decoupling of the sender from the receiver, a feature that is essential
in asynchronous systems.
2.3.1 Rendezvous
Rendezvous systems are a form of synchronous communication among an arbitrary
number of asynchronous processes. All the processes involved meet with each other, i.e.,
communicate synchronously with each other at one time. Two types of rendezvous systems
are possible:
• Binary rendezvous: When two processes agree to synchronize.
• Multi-way rendezvous: When more than two processes agree to synchronize.
• Scheduling involves pairing of matching send and receive commands that are both enabled. The communication events for the control messages under the covers do not alter the partial order of the execution.
The message (M) types used are: M, ack(M), request(M), and permission(M). Execution events in the synchronous execution are only the send of the message M and the receive of the message M. The send and receive events for the other message types – ack(M), request(M), and permission(M) – are not part of the synchronous execution; they are control messages. The messages request(M), ack(M), and permission(M) use M's unique tag; the message M is not included in these messages.
--------------------------------------------------------------------------------------------------------------------------
(message types) M, ack(M), request(M), permission(M)
Pi executes send(M) and blocks until it receives ack(M) from Pj . The send event SEND(M) now
completes.
Any message M' (from a higher-priority process) and any request(M') for synchronization (from a lower-priority process) received during the blocking period are queued.
(i) If a message M’ arrives from a higher priority process Pk, Pi accepts M’ by scheduling a
RECEIVE(M’) event and then executes send(ack(M’)) to Pk.
(ii) If a request(M') arrives from a lower priority process Pk, Pi executes send(permission(M')) to Pk and blocks waiting for the message M'. When M' arrives, the RECEIVE(M') event is executed.
(2c) When the permission(M) arrives, Pi knows partner Pj is synchronized and Pi executes send(M).
The SEND(M) now completes.
At the time a request(M) is processed by Pi, process Pi executes send(permission(M)) to Pj and blocks
waiting for the message M. When M arrives, the RECEIVE(M) event is executed and the process
unblocks.
At the time a message M is processed by Pi, process Pi executes RECEIVE(M) (which is assumed to
be always enabled) and then send(ack(M)) to Pj .
When Pi is unblocked, it dequeues the next (if any) message from the queue and processes it as a
message arrival (as per rules 3 or 4).
---------------------------------------------------------------------------------------------------------
The Propagation Constraints also imply that if either (I) or (II) is false, the information
“d ∈ M.Dests” must not be stored or propagated, even to remember that (I) or (II) has been
falsified:
▪ not in the causal future of Deliverd(Mi,a)
▪ not in the causal future of ek,c, where d ∈ Mk,c.Dests and there is no other message sent causally between Mi,a and Mk,c to the same destination d.
The data structures maintained are sorted row–major and then column–major:
1. Explicit tracking:
▪ Tracking of (source, timestamp, destination) information for messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, is done explicitly using the l.Dests field of entries in local logs at nodes and the o.Dests field of entries in messages.
▪ Sets li,a.Dests and oi,a.Dests contain explicit information of destinations to which Mi,a is not guaranteed to be delivered in CO and is not known to be delivered.
▪ The information about d ∈ Mi,a.Dests is propagated up to the earliest events on all causal paths from (i, a) at which it is known that Mi,a is delivered to d or is guaranteed to be delivered to d in CO.
2. Implicit tracking:
▪ Tracking of messages that are either (i) already delivered, or (ii) guaranteed to be delivered in CO, is performed implicitly.
▪ The information about messages (i) already delivered or (ii) guaranteed to be delivered in CO is deleted and not propagated because it is redundant as far as enforcing CO is concerned.
▪ It is useful in determining which information carried in other messages and stored in logs at other nodes has become redundant and thus can be purged.
▪ The semantics are implicitly stored and propagated. This information about messages that are (i) already delivered or (ii) guaranteed to be delivered in CO is tracked without explicitly storing it.
▪ The algorithm derives it from the existing explicit information about messages (i) not known to be delivered and (ii) not guaranteed to be delivered in CO, by examining only oi,a.Dests or li,a.Dests, which is a part of the explicit information.
M5,1.Dests, which was needed for routing, must not be stored in Log6 because of constraint I. In the same way, when M5,1 is delivered to process P4 at event (4, 1), only the new piggybacked information P6 ∈ M5,1.Dests is inserted in Log4 as M5,1.Dests = {P6}, which is later propagated during multicast M4,2.
Multicast M4,3
At event (4, 3), the information P6 ∈ M5,1.Dests in Log4 is propagated on multicast M4,3 only to process P6 to ensure causal delivery using the Delivery Condition. The piggybacked information on message M4,3 sent to process P3 must not contain this information because of constraint II. As long as any future message sent to P6 is delivered in causal order w.r.t. M4,3 sent to P6, it will also be delivered in causal order w.r.t. M5,1. And as M5,1 is already delivered to P4, the information M5,1.Dests = ∅ is piggybacked on M4,3 sent to P3. Similarly, the information P6 ∈ M5,1.Dests must be deleted from Log4 as it will no longer be needed, because of constraint II. M5,1.Dests = ∅ is stored in Log4 to remember that M5,1 has been delivered or is guaranteed to be delivered in causal order to all its destinations.
Processing at P6
When message M5,1 is delivered to P6, only M5,1.Dests = {P4} is added to Log6. Further, P6 propagates only M5,1.Dests = {P4} on message M6,2, and this conveys the implicit information that M5,1 has been delivered to P6, by the very absence of P6 in the explicit information.
• When the information P6 ∈ M5,1.Dests arrives on M4,3, piggybacked as M5,1.Dests = {P6}, it is used only to ensure causal delivery of M4,3 using the Delivery Condition, and is not inserted in Log6 (constraint I). Further, the presence of M5,1.Dests = {P4} in Log6 implies the implicit information that M5,1 has already been delivered to P6. Also, the absence of P4 in M5,1.Dests in the explicit piggybacked information implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4, and, therefore, M5,1.Dests is set to ∅ in Log6.
• When the information P6 ∈ M5,1.Dests arrives on M5,2, piggybacked as M5,1.Dests = {P4, P6}, it is used only to ensure causal delivery of M5,2 using the Delivery Condition, and is not inserted in Log6 because Log6 contains M5,1.Dests = ∅,
which gives the implicit information that M5,1 has been delivered or is guaranteed
to be delivered in causal order to both P4 and P6.
Processing at P1
• When M2,2 arrives carrying piggybacked information M5,1.Dests = {P6}, this (new) information is inserted in Log1.
• When M6,2 arrives with piggybacked information M5,1.Dests = {P4}, P1 learns the implicit information that M5,1 has been delivered to P6 by the very absence of explicit information P6 ∈ M5,1.Dests in the piggybacked information, and hence marks the information P6 ∈ M5,1.Dests for deletion from Log1. Simultaneously, M5,1.Dests = {P6} in Log1 implies the implicit information that M5,1 has been delivered or is guaranteed to be delivered in causal order to P4. Thus, P1 also learns that the explicit piggybacked information M5,1.Dests = {P4} is outdated. M5,1.Dests in Log1 is set to ∅.
• The information "P6 ∈ M5,1.Dests" piggybacked on M2,3, which arrives at P1, is inferred to be outdated using the implicit knowledge derived from "M5,1.Dests = ∅" in Log1.
Complexity: Each message transmission takes two message hops and exactly n messages
in a system of n processes.
Drawbacks: A centralized algorithm has a single point of failure and congestion, and is
not an elegant solution.
Phase 2
• The sender process awaits a reply from all the group members who respond with a
tentative proposal for a revised timestamp for that message M.
Phase 3
• The process multicasts the final timestamp to the group.
Phase 2
• The receiver sends the revised timestamp back to the sender. The receiver then waits
in a non-blocking manner for the final timestamp.
Phase 3
• The final timestamp is received from the multicaster. The corresponding message
entry in temp_Q is identified using the tag, and is marked as deliverable after the
revised timestamp is overwritten by the final timestamp.
• The queue is then resorted using the timestamp field of the entries as the key. As the
queue is already sorted except for the modified entry for the message under
consideration, that message entry has to be placed in its sorted position in the queue.
• If the message entry is at the head of the temp_Q, that entry, and all consecutive
subsequent entries that are also marked as deliverable, are dequeued from temp_Q,
and enqueued in deliver_Q.
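A much-simplified, single-process Python sketch of the receiver-side bookkeeping (temp_Q and deliver_Q) is shown below. The message exchange and the sender's step of taking the maximum of all proposals are only simulated, and all names and values are illustrative.
--------------------------------------------------------------------------------
import heapq

class TotalOrderReceiver:
    """Receiver-side bookkeeping of the three-phase algorithm (simplified sketch)."""

    def __init__(self):
        self.clock = 0        # local logical clock used for proposed timestamps
        self.temp_q = []      # heap of [timestamp, tag, deliverable_flag]
        self.entries = {}     # tag -> heap entry, looked up in phase 3
        self.deliver_q = []

    def phase2_receive(self, tag):
        """Phase 2: buffer the message in temp_Q and return a proposed (revised) timestamp."""
        self.clock += 1
        entry = [self.clock, tag, False]
        heapq.heappush(self.temp_q, entry)
        self.entries[tag] = entry
        return self.clock

    def phase3_final(self, tag, final_ts):
        """Phase 3: overwrite with the final timestamp, mark deliverable, re-sort, deliver."""
        entry = self.entries[tag]
        entry[0], entry[2] = final_ts, True
        self.clock = max(self.clock, final_ts)
        heapq.heapify(self.temp_q)                 # restore sorted order after the update
        while self.temp_q and self.temp_q[0][2]:   # deliverable entries at the head move on
            ts, t, _ = heapq.heappop(self.temp_q)
            self.deliver_q.append((ts, t))

# Simulated run: one sender multicasts m1 and then m2 to this receiver.
r = TotalOrderReceiver()
p1 = r.phase2_receive("m1")
p2 = r.phase2_receive("m2")
r.phase3_final("m2", max(p2, 5))   # sender's final timestamp = max of all proposals (simulated)
r.phase3_final("m1", max(p1, 3))
print(r.deliver_q)                 # [(3, 'm1'), (5, 'm2')]: delivered in final-timestamp order
--------------------------------------------------------------------------------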
Complexity
This algorithm uses three phases, and, to send a message to n − 1 processes, it uses 3(n − 1) messages and incurs a delay of three message hops.
Law of conservation of messages: Every message mij that is recorded as sent in the local state of a process pi must be captured in the state of the channel Cij or in the collected local state of the receiver process pj.
➢ In a consistent global state, every message that is recorded as received is also recorded as sent. Such a global state captures the notion of causality that a message cannot be received if it was not sent.
➢ Consistent global states are meaningful global states and inconsistent global states are not meaningful, in the sense that a distributed system can never be in an inconsistent state.
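The conservation law and the consistency requirement can be phrased as a simple check on a recorded global state, as in the following illustrative sketch (the data structures are assumptions made for the example):
--------------------------------------------------------------------------------
def is_consistent(sent, received, channel_state):
    """Check a recorded global state against the law of conservation of messages.

    sent[i][j]          : set of messages recorded by pi as sent on channel Cij
    received[j][i]      : set of messages recorded by pj as received on Cij
    channel_state[i][j] : set of messages recorded as in transit on Cij
    """
    for i in sent:
        for j, msgs in sent[i].items():
            # Every message recorded as received must also be recorded as sent.
            if not received[j][i] <= msgs:
                return False
            # Every message recorded as sent must be captured in the receiver's
            # local state or in the channel state.
            if not msgs <= received[j][i] | channel_state[i][j]:
                return False
    return True

# Illustration: m1 already delivered, m2 captured as in transit -- a consistent state.
sent = {1: {2: {"m1", "m2"}}}
received = {2: {1: {"m1"}}}
channel_state = {1: {2: {"m2"}}}
print(is_consistent(sent, received, channel_state))   # True
--------------------------------------------------------------------------------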
• Any message that is sent by a process after recording its snapshot must not be recorded in the global snapshot (from C2).
Issue 2:
How to determine the instant when a process takes its snapshot?
Answer:
A process pj must record its snapshot before processing a message mij that was sent by process pi after pi recorded its snapshot.
A snapshot captures the local states of each process along with the state of each communication channel.
2.9.1 Chandy–Lamport algorithm
• The algorithm records a global snapshot consisting of the local state of each process and the state of each channel.
• The Chandy–Lamport algorithm uses a control message, called a marker.
• After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.
• Since channels are FIFO, a marker separates the messages in the channel into those to be included in the snapshot from those not to be recorded in the snapshot.
• This addresses issue I1. The role of markers in a FIFO system is to act as delimiters for the messages in the channels so that the channel state recorded by the process at the receiving end of the channel satisfies the condition C2.
Initiating a snapshot
• Process Pi initiates the snapshot
• Pi records its own state and prepares a special marker message.
• Send the marker message to all other processes.
• Start recording all incoming messages on channels Cji for j not equal to i.
Propagating a snapshot
• For all processes Pj, consider a message on channel Ckj.
• If the marker message is seen for the first time:
− Pj records its own state and marks Ckj as empty.
− Send the marker message to all other processes.
− Record all incoming messages from channels Clj for l not equal to j or k.
− Else, add all messages from inbound channels to the recorded channel state.
Terminating a snapshot
• All processes have received a marker.
• All processes have received a marker on all the N−1 incoming channels.
• A central server can gather the partial state to build a global snapshot.
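The marker rules can be made concrete with a toy, single-threaded Python simulation: channels are FIFO queues, the application state is just a counter, and everything here (process structure, message values) is an illustrative assumption rather than the algorithm as actually deployed.
--------------------------------------------------------------------------------
from collections import deque

MARKER = "MARKER"

class Process:
    def __init__(self, pid, peers):
        self.pid, self.peers = pid, peers
        self.state = 0                 # application state (a simple counter)
        self.recorded_state = None     # local snapshot, once recorded
        self.chan_state = {}           # j -> messages recorded for incoming channel Cji
        self.recording = {}            # j -> still recording channel Cji?

    def record_and_send_markers(self, net):
        """Record the local state and send a marker on every outgoing channel."""
        self.recorded_state = self.state
        for j in self.peers:
            net[(self.pid, j)].append(MARKER)
        self.recording = {j: True for j in self.peers}
        self.chan_state = {j: [] for j in self.peers}

    def on_message(self, sender, msg, net):
        if msg == MARKER:
            if self.recorded_state is None:       # first marker seen: record now
                self.record_and_send_markers(net)
            self.recording[sender] = False        # channel from sender is now recorded
        else:
            self.state += msg                     # apply the application message
            if self.recorded_state is not None and self.recording.get(sender):
                self.chan_state[sender].append(msg)   # belongs to the channel's state

# Three processes, fully connected by FIFO channels (deques).
pids = [0, 1, 2]
procs = {i: Process(i, [j for j in pids if j != i]) for i in pids}
net = {(i, j): deque() for i in pids for j in pids if i != j}

net[(0, 1)].append(7)                  # an application message in flight from P0 to P1
procs[0].record_and_send_markers(net)  # P0 initiates the snapshot

while any(net.values()):               # deliver messages, FIFO per channel
    for (i, j), q in net.items():
        if q:
            procs[j].on_message(i, q.popleft(), net)

print({i: (p.recorded_state, p.chan_state) for i, p in procs.items()})
--------------------------------------------------------------------------------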
Complexity
The recording part of a single instance of the algorithm requires O(e) messages and O(d) time, where e is the number of edges in the network and d is the diameter of the network.