Chapter 4 Communication
Chapter 4 Communication
Introduction
interprocess communication is at the heart of all distributed
systems
communication in distributed systems is based on message
passing as offered by the underlying network which is
harder as opposed to using shared memory
modern distributed systems consist of thousands of
processes scattered across an unreliable network such as
the Internet
unless the primitive communication facilities of the network
are replaced by more advanced ones, development of large
scale Distributed Systems becomes extremely difficult
2
Objectives of the Chapter
review how processes communicate in a network (the rules
or the protocols) and their structures
introduce the five widely used communication models for
distributed systems:
Remote Procedure Call (RPC) - which hides the details of
message passing and suitable for client-server models
Remote Object (Method) Invocation (RMI)
Message-Oriented Middleware (MOM) - instead of the
client-server model, think in terms of messages and have
a high level message queuing model similar to e-mail
Stream-Oriented Communication - for multimedia to
support the continuous flow of messages with timing
constraints
Multicast Communication - information dissemination for
several recipients
3
4.1 Network Protocols and Standards
why communication in distributed systems? because there is
no shared memory
two communicating processes must agree on the syntax and
semantics of messages
a protocol is a set of rules that governs data communications
a protocol defines what is communicated, how it is
communicated, and when it is communicated
for instance, for one computer to send a message to another
computer, the first computer must perform the following
general steps (highly simplified)
break the data into small sections called packets (message,
datagram, packet, frame)
add addressing information to the packets identifying the
source and destination computers
deliver the data to the network interface card for
transmission over the network
4
the receiving computer must perform the same steps, but in
reverse order
accept the data from the NIC
remove transmitting information that was added by the
transmitting computer
reassemble the packets of data into the original message
the key elements of a protocol are syntax, semantics, and
timing
syntax: refers to the structure or format of the data
semantics: refers to the meaning of each section of bits
timing: refers to when data should be sent and how fast
they can be sent
functions of protocols
each device must perform the same steps the same way
so that the data will arrive and reassemble properly; if
one device uses a protocol with different steps, the two
devices will not be able to communicate with each other
5
Protocols in a layered architecture
protocols that work together to provide a layer or layers of
the model are known as a protocol stack or protocol suite,
e.g. TCP/IP
each layer handles a different part of the communications
process and has its own protocol
Data Communication Standards
standards are essential for interoperability
data communication standards fall into two categories
De facto standards: that have not been approved by an
organized body; mostly set by manufacturers
De jure standards: those legislated by an officially
recognized body such as ISO, ITU, ANSI, IEEE
6
Network (Reference) Models
Layers and Services
within a single machine, each layer uses the services
immediately below it and provides services for the layer
immediately above it
between machines, layer x on one machine communicates
with layer x on another machine
Two important network models or architectures
The ISO OSI (Open Systems Interconnection) Reference
Model
The TCP/IP Reference Model
a. The OSI Reference Model
consists of 7 layers
was never fully implemented as a protocol stack, but a
good theoretical model
Open – to connect open systems or systems that are open
for communication with other systems
7
layers, interfaces, and protocols in the OSI model
8
Media (lower) Layers
Physical: Physical characteristics of the media
Data Link: Reliable data delivery across the link
Network: Managing connections across the network
or routing
Transport: End-to-end connection and reliability (handles
lost packets); TCP (connection-oriented),
UDP (connectionless), etc.
Session: Managing sessions between applications
(dialog control and synchronization); rarely
supported
Presentation: Data presentation to applications; concerned
with the syntax and semantics of the
information transmitted
Application: Network services to applications; contains
protocols that are commonly needed by
users; FTP, HTTP, SMTP, ...
Host (upper) Layers
9
a typical message as it appears on the network
10
b. The TCP/IP Reference Model
TCP/IP - Transmission Control Protocol/Internet Protocol
used by ARPANET and its successor the Internet
design goals
the ability to connect multiple networks (internetworking)
in a seamless way
the network should be able to survive loss of subnet
hardware, i.e., the connection must remain intact as long
as the source and destination machines are properly
functioning
flexible architecture to accommodate requirements of
different applications - ranging from transferring files to
real-time speech transmission
these requirements led to the choice of a packet-switching
network based on a connectionless internetwork layer
has 4 (or 5 depending on how you see it) layers:
Application, Transport, Internet (Internetwork), Host-to-
network (some split it into Physical and Data Link) 11
OSI and TCP/IP Layers Correspondence
12
Layers involved in various hosts (TCP/IP)
when a message is sent from device A to device B, it may
pass through many intermediate nodes
the intermediate nodes usually involve the first three layers
13
Middleware Protocols
a middleware is an application that contains general-purpose
protocols to provide services
example of middleware services
authentication and authorization services
distributed transactions (commit protocols; locking
mechanisms) - see later in Chapter 8
middleware communication protocols (calling a procedure
or invoking an object remotely, synchronizing streams for
real-time data, multicast services) - see later in this Chapter
hence an adapted reference model for networked
communications is required
14
an adapted reference model for networked communication
15
4.2 Remote Procedure Call
the first distributed systems were based on explicit message
exchange between processes through the use of explicit
send and receive procedures; but do not allow access
transparency
in 1984, Birrel and Nelson introduced a different way of
handling communication: RPC
it allows a program to call a procedure located on another
machine
simple and elegant, but there are implementation problems
the calling and called procedures run in different address
spaces
parameters and results have to be exchanged; what if the
machines are not identical?
what happens if both machines crash?
16
Conventional Procedure Call, i.e., on a single machine
e.g. count = read (fd, buf, bytes); a C like statement, where
fd is an integer indicating a file
buf is an array of characters into which data are read
bytes is the number of bytes to be read
Stack pointer
Stack pointer
21
original message on the Pentium
(the numbers in boxes indicate the address of each byte)
22
one approach is to invert the bytes of each word after
receipt
the message after being inverted (correct integer but wrong string)
23
2. Passing Reference Parameters
assume the parameter is a pointer to an array
copy the array into the message and send it to the server
the server stub can then call the server with a pointer to this
array
the server then makes any changes to the array and sends it
back to the client stub which copies it to the client
this is in effect call-by-copy/restore
optimization of the method
one of the copy operations can be eliminated if the stub
knows whether the parameter is input or output to the
server
if it is an input to the server (e.g., in a call to write), it need
not be copied back
if it is an output, it need not be sent over in the first place;
only send the size
the above procedure can handle pointers to simple arrays
and structures, but difficult to generalize it to an arbitrary
data structure
24
Parameter Specification and Stub Generation
the caller and the callee need to use the same protocol
(format of messages) and the same steps; with such rules the
client and server stubs can assemble, communicate, and
interpret messages correctly
consider the following example; the procedure foobar has 3
parameters: a character, a floating point number, and an array
of 5 integers
26
Asynchronous RPC
a shortcoming of the original model is that it is blocking: but
no need of blocking for the client in some cases
two cases
1. if there is no result to be returned
e.g., inserting records in a database, ...
the server immediately sends an ack promising that it
will carryout the request
the client can now proceed without blocking
29
DCE (Distributed Computing Environment) RPC
a middleware and an example RPC system developed by
OSF (Open Software Foundation), now The Open Group; it
is designed to execute as a layer of abstraction between
existing OSs and distributed applications
available as open source and vendors integrate it into their
systems (http://www.opengroup.org/dce/)
it uses the client-server programming model and
communication is by means of RPCs
services
distributed file service: a worldwide file system that
provides a transparent way of accessing files
directory service: to keep track of the location of all
resources in the system (machines, printers, data,
servers, ...); a process can ask for a resource without
knowing its location
security service: for protecting resources; access is only
through authorization
30
distributed time service: to maintain clocks on different
machines synchronized (clock synchronization is covered
in Chapter 6)
Steps in writing a Client and a Server in DCE RPC
the system consists of languages, libraries, daemons,
utility programs, ... for writing clients and servers
IDL (Interface Definition Language) is the interface
language - the glue that holds everything together
it allows procedure declarations (similar to function
prototypes in C++)
it contains type definitions, constant declarations,
information needed to marshal parameters and
unmarshal results, and what the procedures do (only
their syntax)
31
Edit file
33
two steps for locating the server
locate the server’s machine (3)
locate the server process on that machine (with an
endpoint or port) (4)
now the RPC can take place; the above look up information
can be stored for subsequent RPCs
36
common organization of a remote object with client-side proxy
37
Object Servers
an object server is a server to support distributed objects
it does not provide a specific service; services are
implemented by the objects that reside on the server
the server provides only the means to invoke local objects
based on remote client requests
38
different approaches exist
1.assume that all objects look alike and there is only one way to
invoke an object like in DCE; inflexible
2.let a server support different policies
transient versus persistent objects
transient object: create it at first request and destroy it if
no clients are bound to it or
persistent object: it exists even if it is not currently used
separate or shared memory
put each object in a memory segment of its own, i.e.,
objects share neither code nor data; protection of
segments required, probably by the underlying OS or
objects can at least share code
threading
implement the server with a single thread of control; or
the server may have several threads, one for each of its
objects
39
Object Adaptor
activation policies: decisions on how to invoke an object
object adaptor (wrapper): to group objects per policy; it is a
software for implementing a specific activation policy
an object adaptor has one or more objects under its control
42
the situation when passing an object by reference or by value
44
Persistence and Synchronicity in Communication
assume the communication system is organized as a
computer network shown below
45
communication can be
persistent or transient
asynchronous or synchronous
persistent: a message that has been submitted for
transmission is stored by the communication system as long
as it takes to deliver it to the receiver
e.g., e-mail delivery, snail mail delivery
transient: a message that has been submitted for
transmission is stored by the communication system only as
long as the sending and receiving applications are executing
asynchronous: a sender continues immediately after it has
submitted its message for transmission
synchronous: the sender is blocked until its message is
stored in a local buffer at the receiving host or delivered to the
receiver
46
Message-Oriented Transient Communication
many applications are built on top of the simple
message-oriented model offered by the transport layer
standardizing the interface of the transport layer by
providing a set of primitives allows programmers to use
messaging protocols
they also allow porting applications
1. Berkley Sockets
an example is the socket interface as used in Berkley
UNIX
a socket is a communication endpoint to which an
application can write data that are to be sent over the
network, and from which incoming data can be read
47
Primitive Meaning Executed by
Socket Create a new communication endpoint both
Attach a local address to a socket; e.g., IP
Bind
address with a known port number
Announce willingness to accept connections; servers
Listen
non-blocking
Accept Block caller until a connection request arrives
Actively attempt to establish a connection; the
Connect clients
client is blocked until connection is set up
Send Send some data over the connection
Receive Receive some data over the connection both
Close Release the connection
48
connection-oriented communication pattern using sockets
49
2. The Message-Passing Interface (MPI)
sockets were designed to communicate across networks
using general-purpose protocol stacks such as TCP/IP
they were not designed for proprietary protocols
developed for high-speed interconnection networks; of
course portability will suffer
MPI is designed for parallel applications and tailored for
transient communication
MPI assumes communication takes place within a known
group of processes, where each group is assigned an
identifier (groupID)
each process within a group is also assigned an identifier
(processID)
a (groupID, processID) identifies the source or destination
of a message, and is used instead of a transport-level
address
50
Primitive Meaning
Append outgoing message to a local send buffer (for
MPI_bsend
transient asynchronous communication)
Send a message and wait until copied to local or remote
MPI_send
buffer; semantics are implementation dependent
Send a message and wait until receipt starts (for transient
MPI_ssend
synchronous communication)
Send a message and wait for reply; strongest form; similar
MPI_sendrecv
to RPC (synchronous)
Pass reference to outgoing message (not copying the
MPI_isend
message), and continue (asynchronous)
Pass reference to outgoing message (not copying the
MPI_issend
message), and wait until receipt starts (synchronous)
MPI_recv Receive a message; block if there are none (synchronous)
Check if there is an incoming message, but do not block
MPI_irecv
(asynchronous)
51
Message-Oriented Persistent Communication
there are message-oriented middleware services, called
Message-Queuing Systems or Message-Oriented
Middleware (MOM)
they support persistent asynchronous communication
they have intermediate-term storage capacity for messages,
without requiring the sender or the receiver to be active
during message transmission
unlike Berkley sockets and MPI, message transfer may take
minutes instead of seconds or milliseconds
Message-Queuing Model
applications communicate by inserting messages in
specific queues
it permits loosely-coupled communication
the sender may or may not be running; similarly the
receiver may or may not be running, giving four possible
combinations
52
four combinations for loosely-coupled communications using queues
54
General Architecture of a Message-Queuing System
messages can be put only into queues that are local to the
sender (same machine or on a nearby machine on a LAN)
such a queue is called the source queue
messages can also be read only from local queues
a message put into a local queue must contain the
specification of the destination queue; hence a message-
queuing system must maintain a mapping of queues to
network locations; like in DNS
56
the general organization of a message-queuing system with routers
57
Message Brokers
how can applications understand the messages they receive
each receiver can not be made to understand message
formats of new applications
hence, in a message-queuing system conversations are
handled by message brokers
a message broker converts incoming messages to a format
that can be understood by the destination application based
on a set of rules
59
4.5 Stream-Oriented Communication
until now, we focused on exchanging independent and
complete units of information
time has no effect on correctness; a system can be slow or fast
however, there are communications where time has a critical
role
Multimedia
media
storage, transmission, interchange, presentation,
representation and perception of different data types
text, graphics, images, voice, audio, video, animation, ...
movie: video + audio + …
multimedia: handling of a variety of representation media
end user pull
information overload and starvation
technology push
emerging technology to integrate media 60
The Challenge
new applications
multimedia will be pervasive in few years (as graphics)
continuous delivery
e.g., 30 frames/s (NTSC), 25 frames/s (PAL) for video
guaranteed Quality of Service
admission control
storage and transmission
e.g., 2 hours uncompressed HDTV (1920×1080) movie:
1.12 TB (1920×1080x3x25x60x60x2)
videos are extremely large, even after compressed
(actually encoded)
search
can we look at 100… videos to find the proper one?
61
Types of Media
two types
discrete media: text, executable code, graphics, images;
temporal relationships between data items are not
fundamental to correctly interpret the data
continuous media: video, audio, animation; temporal
relationships between data items are fundamental to
correctly interpret the data
a data stream is a sequence of data units and can be applied
to discrete as well as continuous media; e.g., TCP provides
byte-oriented discrete data streams
stream-oriented communication provides facilities for the
exchange of time-dependent information (continuous media)
such as audio and video streams
62
timing in transmission modes
asynchronous transmission mode: data items are
transmitted one after the other, but no timing constraints;
e.g. text transfer
synchronous transmission mode: a maximum end-to-end
delay defined for each data unit; it is possible that data can
be transmitted faster than the maximum delay, but not
slower
isochronous transmission mode: maximum and minimum
end-to-end delay are defined; also called bounded delay
jitter; applicable for distributed multimedia systems
a continuous data stream can be simple or complex
simple stream: consists of a single sequence of data; e.g.,
mono audio, video only (only visual frames)
complex stream: consists of several related simple streams,
called substreams, that must be synchronized; e.g., stereo
audio, video consisting of audio and video (may also contain
subtitles, translation to other languages, ...) 63
movie as a set of simple streams
64
a stream can be considered as a virtual connection between a
source and a sink
the source or the sink could be a process or a device
streaming means a user can listen (or watch) after the
downloading has started
we can stream stored data or live data (compression, actually
encoding is required)
65
setting up a stream directly between two devices - live data
66
the data stream can also be multicasted to several receivers
if devices and the underlying networks have different
capabilities, the stream may be filtered, generally called
adaptation (filtering?, transcoding?)
67
Quality of Service (QoS)
timing and other nonfunctional requirements are expressed
as Quality of Service requirements
QoS requirements describe what is needed from the
underlying distributed system and network to ensure
acceptable delivery; e.g. viewing experience of a user
for continuous data, the concerns are
timeliness: data must be delivered in time
initial delay: maximum delay until a session has been
setup
maximum end-to-end delay
maximum delay variance or jitter
volume/bandwidth: the required throughput (bit rate) must
be met
reliability: a given level of loss of data must not be
exceeded
quality of perception: highly subjective
68
Enforcing QoS
the underlying system offers a best-effort delivery service
however, the Internet also provides mechanisms such as
differentiated services which categorizes packets into many
classes; for example, it has an expedited class to inform the
router to forward a packet with absolute priority
in addition, a distributed system can help to improve QoS
three methods: buffering, forward error correction, and
interleaving frames
1. Buffering - Client Side
buffer (store) flows on the receiving side (client machine)
before delivery (playback)
it smoothes jitter (for audio and video on demand since
jitter is the main problem) - does not affect reliability or
bandwidth, increases delay
69
how long to buffer?
2. Forward Error Correction - Client Side
packets may be lost
retransmission is not applicable for time-dependent data
the overhead for forward error correction may be high
70
3. Interleaving Frames - Server Side
a single packet may contain multiple audio and video frames
if such a packet is lost, there will be a large gap during play
back
hence, the idea is to distribute the effect of a packet loss
over time
but, a larger buffer is required at the receiver/client
for example, to play the first four frames, four packets need
to be delivered and stored
71
The effect of packet loss in (a) non interleaved transmission and
(b) interleaved transmission
72
Stream Synchronization
how to maintain temporal relations between streams
examples: lip synchronization or a slide show enhanced with
audio
two approaches
1. explicitly by operating on the data units of simple
streams; the responsibility of the application (not good
for applications to do it)
74
4.6 Multicast Communication
multicasting: delivery of data from one host to many
destinations; for instance for multimedia applications
a one-to-many relationship
1. Application-Level Multicasting
nodes are organized into an overlay network (a network
which is built on top of another network) and information is
disseminated to its members (routers are not involved as in
network-level routing)
how to construct the overlay network
nodes organize themselves as a tree with a unique path
between two pairs of nodes or
nodes organize into a mesh network and there will be
multiple paths between two nodes; adv: robust
2. Gossip-Based Data Transmission
use epidemic protocols where information is propagated
among a collection of nodes without a coordinator
for details read pages 166-174 75