CS3551 – DISTRIBUTED COMPUTING
UNIT I
Definition
A distributed system is a collection of independent entities that cooperate to solve a problem that
cannot be solved individually.
A distributed system can be characterized as a collection of mostly autonomous processors
communicating over a communication network and having the following features:
No common physical clock - This is an important assumption because it introduces the
element of “distribution” into the system and gives rise to the inherent asynchrony
amongst the processors.
No shared memory - This is a key feature that requires message-passing for communication.
This feature implies the absence of a common physical clock.
Geographical separation - The more geographically apart the processors are, the more
representative the system is of a distributed system.
Autonomy and heterogeneity - The processors are “loosely coupled” in that they have different
speeds and each can be running a different operating system. They are usually not part of a dedicated
system, but cooperate with one another by offering services or solving a problem jointly.
Relation to Computer System
Components
Motivation
1. Inherently distributed computations
2. Resource sharing
3. Access to geographically remote data and resources
4. Enhanced reliability
5. Increased performance/cost ratio
6. Scalability
7. Modularity and incremental expandability
Shared Memory Model and
Message Passing Model
Inter-Process Communication (IPC) is a critical aspect of modern operating systems, enabling
processes to exchange data and synchronize their actions. Two primary models for IPC are the
Shared Memory Model and the Message Passing Model.
What is the Shared Memory Model?
In this IPC Model, a shared memory region is established which is used by the processes for data
communication. This memory region is present in the address space of the process which creates the
shared memory segment. The processes that want to communicate with this process should attach
this memory segment to their address space.
Advantages of the Shared Memory Model
Fast Communication: All the processes can directly access the memory, so
communication is fast.
Efficient for Large Data Transfers: Data structures can be passed from one process to another
in large blocks at once, without copying them into every process's memory space.
Less Kernel Involvement: Once the shared memory region has been
established, the kernel does not have to keep transferring data from one process to another, which
would be costly in the long run.
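As a minimal sketch of this model, using Python's multiprocessing.shared_memory module (the segment size and the data written are illustrative), the creating process establishes a segment and a second process attaches to it by name:

```python
from multiprocessing import Process, shared_memory

def writer(name: str) -> None:
    # Attach the existing segment to this process's address space and write.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:5] = b"hello"          # direct memory access: no per-message kernel copy
    shm.close()

def demo() -> bytes:
    # The creating process establishes the segment; the child attaches by name.
    shm = shared_memory.SharedMemory(create=True, size=16)
    try:
        p = Process(target=writer, args=(shm.name,))
        p.start()
        p.join()
        return bytes(shm.buf[:5])   # the parent reads the child's write directly
    finally:
        shm.close()
        shm.unlink()                # release the segment

if __name__ == "__main__":
    print(demo())
```

Note that after the single system call that creates the segment, reads and writes involve no kernel intervention, which is exactly the "less kernel involvement" advantage above.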
What is the Message Passing Model?
In this model, the processes communicate with each other by
exchanging messages. For this purpose, a communication link
must exist between the processes, and it must support at least
two operations: send(message) and receive(message). The size
of messages may be fixed or variable.
Advantages of the Message Passing Model
Simplicity: Memory management is hidden in the message-
passing model, which makes process
communication more straightforward.
No Synchronization Required: Since processes do not share
memory, there is no need to synchronize
access to shared data.
Distributed Systems Friendly: Because it supports
communication between processes on different machines,
this model is widely adopted in distributed systems.
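A minimal sketch of the two required operations, here realized with Python multiprocessing queues acting as the communication link (the process roles and message contents are illustrative):

```python
from multiprocessing import Process, Queue

def server(inbox: Queue, outbox: Queue) -> None:
    # receive(message): blocks until a message arrives on the link
    msg = inbox.get()
    # send(message): the kernel copies the reply to the other process
    outbox.put(msg.upper())

def demo() -> str:
    to_server, to_client = Queue(), Queue()
    p = Process(target=server, args=(to_server, to_client))
    p.start()
    to_server.put("ping")       # send(message)
    reply = to_client.get()     # receive(message)
    p.join()
    return reply

if __name__ == "__main__":
    print(demo())
```

Each put/get crosses the kernel, which is why message passing is slower per byte than shared memory but needs no explicit synchronization between the processes.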
Shared Memory Model vs Message Passing Model

Shared Memory Model:
- A shared memory region is used for communication.
- It is used for communication between processes on a single processor or multiprocessor
  system, where the communicating processes reside on the same machine and share a
  common address space.
- The code for reading and writing the data from the shared memory must be written
  explicitly by the application programmer.
- It provides maximum speed of computation, as communication is done through shared
  memory; system calls are made only to establish the shared memory.
- The processes need to ensure that they are not writing to the same location simultaneously.
- Faster communication strategy.
- No kernel intervention (after setup).
- It can be used for exchanging larger amounts of data.
- Example: data from a client process may need to be transferred to a server process for
  modification before being returned to the client.

Message Passing Model:
- A message-passing facility is used for communication.
- It is typically used in a distributed environment where the communicating processes reside
  on remote machines connected through a network.
- No such code is required, as the message-passing facility provides a mechanism for
  communication and synchronization of the actions performed by the communicating processes.
- It is more time-consuming, as message passing is implemented through kernel
  intervention (system calls).
- It is useful for sharing small amounts of data, as conflicts need not be resolved.
- Relatively slower communication strategy.
- It involves kernel intervention.
- It is used for exchanging small amounts of data.
- Examples: web browsers, web servers, chat programs on the WWW (World Wide Web).
Design issues and challenges
Distributed systems challenges from a
system perspective
1. Communication
2. Processes
3. Naming
4. Synchronization
5. Data Storage and Access
6. Consistency and Replication
7. Fault Tolerance
8. Security
9. Applications Programming Interface (API) and transparency
10. Scalability and modularity
a. Access Transparency hides differences in data representation on
different systems and provides uniform operations to access system
resources.
b. Location transparency makes the locations of resources
transparent to the users.
c. Migration transparency allows relocating resources without
changing names.
d. Relocation transparency is the ability to relocate resources while
they are being accessed.
e. Replication transparency does not let the user become aware of
any replication.
f. Concurrency transparency deals with masking the concurrent use
of shared resources for the user.
g. Failure transparency refers to the system being reliable and fault-
tolerant.
Algorithmic challenges in
distributed computing
a. Designing useful execution models and frameworks - The interleaving model and
partial order model are two widely adopted models of distributed system executions.
They have proved to be particularly useful for operational reasoning and the design of
distributed algorithms.
b. Dynamic distributed graph algorithms and distributed routing algorithms - The
distributed system is modeled as a distributed graph, and the graph algorithms form
the building blocks for a large number of higher-level communication, data
dissemination, object location, and object search functions.
c. Time and global state in a distributed system - The processes in the system are
spread across three-dimensional physical space. Another dimension, time, has to be
superimposed uniformly across space. The challenges pertain to providing accurate
physical time, and to providing a variant of time, called logical time.
d. Synchronization/coordination mechanisms - The processes must be allowed to
execute concurrently, except when they need to synchronize to exchange information,
i.e., communicate about shared data. Synchronization is essential for the distributed
processes to overcome the limited observation of the system state from the viewpoint
of any one process. Some examples of problems requiring synchronization are
physical clock synchronization, leader election, mutual exclusion, deadlock
detection and resolution, termination detection, and garbage collection.
e. Group communication, multicast, and ordered message delivery - A
group is a collection of processes that share a common context and
collaborate on a common task within an application domain. Specific
algorithms need to be designed to enable efficient group
communication and group management wherein processes can join and
leave groups dynamically, or even fail.
f. Monitoring distributed events and predicates - Predicates defined on
program variables that are local to different processes are used for
specifying conditions on the global system state, and are useful for
applications such as debugging, sensing the environment, and
industrial process control.
g. Distributed program design and verification tools - Methodically
designed and verifiably correct programs can greatly reduce the
overhead of software design, debugging, and engineering. Designing
mechanisms to achieve these design and verification goals is a
challenge.
h. Debugging distributed programs - Debugging sequential programs is
hard; debugging distributed programs is that much harder because of
the concurrency in actions and the ensuing uncertainty due to the large
number of possible executions defined by the interleaved concurrent
actions.
i. Data replication, consistency models, and caching - Fast access to data
and other resources requires them to be replicated in the distributed
system. Managing such replicas in the face of updates introduces the
problems of ensuring consistency among the replicas and cached copies.
j. World Wide Web design (caching, searching, scheduling) - The Web is an
example of a widespread distributed system with a direct interface to the
end user, wherein the operations are predominantly read-intensive on
most objects.
k. Distributed shared memory abstraction - A shared memory abstraction
simplifies the task of the programmer because he or she has to deal only
with read and write operations, and no message communication
primitives. However, under the covers in the middleware layer, the
abstraction of a shared address space has to be implemented by using
message-passing. Hence, in terms of overheads, the shared memory
abstraction is not less expensive.
l. Reliable and fault-tolerant distributed systems - A reliable and fault-
tolerant environment has multiple requirements and aspects, and these
can be addressed using various strategies: consensus
algorithms, replication and replica management, voting and quorum
systems, distributed databases and distributed commit, self-stabilizing
systems, checkpointing and recovery algorithms, and failure detectors.
m. Load balancing - The goal of load balancing is to achieve higher
throughput and reduce the user-perceived latency. The following
are some forms of load balancing: data migration, computation
migration, and distributed scheduling.
n. Real-time scheduling - Real-time scheduling is important for
mission-critical applications, to accomplish the task execution on
schedule. The problem becomes more challenging in a
distributed system where a global view of the system state is
absent. On-line or dynamic changes to the schedule are also
harder to make without a global view of the state.
o. Performance - Although high throughput is not the primary goal
of using a distributed system, achieving good performance is
important. Some example issues that arise in determining
performance are metrics and measurement methods/tools.
A Model of Distributed
Computations
A Distributed Program
A distributed program is composed of a set of n asynchronous processes p1, p2…,
pi… ,pn that communicate by message passing over the communication network.
Without loss of generality, we assume that each process is running on a different
processor. The processes do not share a global memory and communicate solely
by passing messages.
Let Cij denote the channel from process pi to process pj and let mij denote a
message sent by pi to pj . The communication delay is finite and unpredictable.
Also, these processes do not share a global clock that is instantaneously accessible
to these processes. Process execution and message transfer are asynchronous – a
process may execute an action spontaneously and a process sending a message
does not wait for the delivery of the message to be complete.
The global state of a distributed computation is composed of the states of the
processes and the communication channels. The state of a process is
characterized by the state of its local memory and depends upon the context. The
state of a channel is characterized by the set of messages in transit in the channel.
A model of distributed executions
The execution of a process consists of a sequential execution of
its actions. The actions are atomic, and the actions of a process
are modeled as three types of events, namely internal events,
message send events, and message receive events. Let e_i^x
denote the xth event at process pi. Subscripts and/or
superscripts will be dropped when they are irrelevant or are clear
from the context. For a message m, let send(m) and rec(m)
denote its send and receive events, respectively. The occurrence
of events changes the states of the respective processes and
channels, thus causing transitions in the global system state. An
internal event changes the state of the process at which it occurs.
A send event (or a receive event) changes the state of the
process that sends (or receives) the message and the state of the
channel on which the message is sent (or received). An internal
event only affects the process at which it occurs. The events at a
process are linearly ordered by their order of occurrence. The
execution of process pi produces a sequence of events
e_i^1, e_i^2, ..., e_i^x, e_i^(x+1), ..., and it is denoted by H_i.
The send and the receive events signify the flow of information
between processes and establish causal dependency from the
sender process to the receiver process. A relation →msg that
captures the causal dependency due to message exchange, is
defined as follows. For every message m that is exchanged
between two processes, we have
send(m) →msg rec(m)
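The send(m) →msg rec(m) dependency can be made concrete with Lamport's logical-clock rule, one standard realization of the "logical time" mentioned earlier; this sketch is illustrative, and the class and method names are not from the text:

```python
# Lamport logical clocks realize the send(m) ->msg rec(m) dependency:
# every receive event is timestamped strictly after the corresponding send.
class LamportProcess:
    def __init__(self):
        self.clock = 0

    def internal(self) -> int:
        self.clock += 1          # internal event: advance the local clock
        return self.clock

    def send(self) -> int:
        self.clock += 1          # send event: advance, then piggyback timestamp
        return self.clock        # timestamp carried on the message m

    def receive(self, msg_ts: int) -> int:
        # Receive event: jump past the sender's timestamp, guaranteeing
        # C(send(m)) < C(rec(m)), i.e. the clock respects send(m) ->msg rec(m).
        self.clock = max(self.clock, msg_ts) + 1
        return self.clock

p1, p2 = LamportProcess(), LamportProcess()
ts_send = p1.send()              # send event at p1
ts_recv = p2.receive(ts_send)    # receive event at p2
assert ts_send < ts_recv         # causal dependency reflected in the timestamps
```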
Logical vs. physical concurrency - In a distributed computation,
two events are logically concurrent if and only if they do not
causally affect each other. Physical concurrency, on the other
hand, has a connotation that the events occur at the same
instant in physical time. Note that two or more events may be
logically concurrent even though they do not occur at the same
instant in physical time. Whether a set of logically concurrent
events coincide in the physical time or in what order in the
physical time they occur does not change the outcome of the
computation. Therefore, even though a set of logically concurrent
events may not have occurred at the same instant in physical
time, for all practical and theoretical purposes, we can assume
that these events occurred at the same instant in physical time.
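As an illustrative aside (vector clocks are one standard mechanism for this, not introduced in the text above), logical concurrency can be tested mechanically: two vector timestamps are concurrent if and only if neither dominates the other componentwise.

```python
# Two events are logically concurrent iff neither vector timestamp
# dominates the other, i.e. neither causally affects the other.
def concurrent(u, v):
    le = all(a <= b for a, b in zip(u, v))   # u causally precedes (or equals) v
    ge = all(a >= b for a, b in zip(u, v))   # v causally precedes (or equals) u
    return not le and not ge                 # neither precedes the other

# Events on p1 and p2 with no messages exchanged: logically concurrent.
assert concurrent([1, 0], [0, 1])
# rec(m) at p2 after send(m) at p1: causally ordered, not concurrent.
assert not concurrent([1, 0], [1, 1])
```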