UDS24201J UNIT V

DISTRIBUTED SYSTEM ALGORITHMS


Distributed systems are the backbone of modern computing, but what keeps them running
smoothly? It's all about the algorithms. These algorithms are like the secret sauce, making sure
everything works together seamlessly. In this article, we'll break down distributed system
algorithms in simple language.
Important Topics for Distributed System Algorithms
 Communication Algorithms
 Synchronization Algorithms
 Consensus Algorithms
 Replication Algorithms
 Distributed Query Processing Algorithms
 Load Balancing Algorithms
 Distributed Data Structures and Algorithms
 Failure Detection and Failure Recovery Algorithms
 Security Algorithms for a Distributed Environment
1. Communication Algorithms
Communication algorithms are the rules that govern data exchange between nodes in a
distributed system. They cover a broad range of communication mechanisms,
message relay algorithms, and routing schemes for efficient data transmission and low latency.
Some examples are MPI (the Message Passing Interface), RPC (Remote Procedure Call), and
publish-subscribe mechanisms, where different schemes are designed to fit different communication patterns
and requirements.
 Message Passing:
o This fundamental paradigm involves sending and receiving messages between nodes.
o Algorithms governing message passing dictate how messages are routed, delivered, and processed,
ensuring reliable and efficient communication.
 Publish-Subscribe:
o In this model, publishers produce messages or events, and subscribers express interest in receiving
specific types of messages.
o Publish-subscribe algorithms manage message dissemination, ensuring that subscribers receive
relevant messages on time (a small sketch follows this list).
 Group Communication:
o Group communication algorithms enable nodes to communicate and collaborate as a cohesive unit.
o They facilitate communication among a defined group of nodes, ensuring that messages are reliably
delivered to all group members.
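As a concrete illustration of the publish-subscribe pattern described above, here is a minimal in-memory sketch in Scala. It is illustrative only: names like PubSubDemo are invented, and a real broker (e.g. Kafka or an MQTT broker) would add network transport, persistence, and delivery guarantees.

object PubSubDemo {
  import scala.collection.mutable

  // topic -> list of subscriber callbacks
  private val subscribers = mutable.Map.empty[String, List[String => Unit]]

  def subscribe(topic: String)(handler: String => Unit): Unit =
    subscribers(topic) = handler :: subscribers.getOrElse(topic, Nil)

  def publish(topic: String, message: String): Unit =
    subscribers.getOrElse(topic, Nil).foreach(handler => handler(message))

  def main(args: Array[String]): Unit = {
    subscribe("alerts")(msg => println(s"subscriber 1 got: $msg"))
    subscribe("alerts")(msg => println(s"subscriber 2 got: $msg"))
    publish("alerts", "disk usage high") // both subscribers receive it
  }
}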
2. Synchronization Algorithms
Synchronization algorithms coordinate parallel executions across distributed nodes. This
synchronization ensures that different processes or threads operating simultaneously
avoid race conditions, deadlocks, and inconsistencies.
For this purpose, distributed locks, semaphores, and distributed clocks play a significant part. Together
these guarantee safe synchronization of the system without compromising its
performance.
 Distributed Locks: Distributed locks are mechanisms used to coordinate access to shared resources
across multiple nodes in a distributed system.
 Semaphores: Semaphores are another synchronization primitive used to control access to shared
resources, particularly in concurrent programming. They can be used to limit the number of
concurrent accesses to a resource or to signal events between processes (see the sketch after this list).


 Distributed Clocks: Distributed clocks are used to maintain synchronized timestamps across
multiple nodes in a distributed system.
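To make the semaphore idea concrete, the sketch below limits concurrent access to a shared resource using java.util.concurrent.Semaphore from the standard library. In a real distributed system the primitive would be provided by a coordination service (such as ZooKeeper or Redis) rather than a local JVM object; this single-process version only illustrates the mechanics.

import java.util.concurrent.Semaphore

object SemaphoreDemo {
  // at most 2 workers may be in the critical section at once
  val permits = new Semaphore(2)

  def worker(id: Int): Thread = new Thread(() => {
    permits.acquire()              // block until a permit is free
    try {
      println(s"worker $id using the shared resource")
      Thread.sleep(100)            // simulated work
    } finally permits.release()    // always return the permit
  })

  def main(args: Array[String]): Unit = {
    val threads = (1 to 5).map(worker)
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}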
3. Consensus Algorithms
Consensus algorithms allow the nodes distributed throughout the system to agree on a single
shared value or outcome in spite of individual node failures and disagreements among the nodes.
 They provide a fundamental basis for distributed applications like distributed DBMSs and
blockchain networks; well-known examples include Paxos, Raft, and Byzantine fault tolerant (BFT) protocols.
 These algorithms guarantee consistency and fault tolerance in the presence of various types of
failures.
4. Replication Algorithms
Replication algorithms manage the process of maintaining multiple copies of data on
different nodes, which boosts fault tolerance, availability, and performance.
 They specify how to distribute, replicate, and synchronize data so as to maintain consistency and
resilience in distributed databases, file systems, and web server environments.
 For instance, approaches like quorum-based replication, eventual consistency, and conflict
resolution algorithms cope with the challenges of keeping replicas consistent while at the same time
reducing overhead and cost.
5. Distributed Query Processing Algorithms
Distributed query processing algorithms in distributed systems involve executing queries across
multiple nodes to retrieve and process data distributed across the network. These algorithms aim to
optimize query performance, minimize communication overhead, and ensure data consistency.
Some distributed query processing algorithms include:
 Parallel Query Execution: Queries are divided into subtasks that can be executed concurrently on
multiple nodes. Results are then combined to form the final query result, reducing overall execution
time.
 Data Partitioning: Data is partitioned across multiple nodes based on a predefined scheme, such as
range partitioning or hash partitioning. Queries are then executed locally on each partition,
minimizing data transfer between nodes (a sketch follows this list).
 Replica-Aware Query Routing: Queries are routed to nodes containing replicas of the required data,
minimizing network traffic and improving query performance by leveraging data locality.
 Join Algorithms: Join operations involving data from multiple nodes are optimized using
distributed join algorithms such as hash join or merge join, which minimize data transfer and
processing overhead.
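The sketch below illustrates the hash partitioning strategy from the list above: a key is hashed to pick one of a fixed number of partitions, so a query for that key can be routed to a single node. The partition count and node names are illustrative.

object HashPartitioning {
  val numPartitions = 4
  val nodes = Vector("node-0", "node-1", "node-2", "node-3")

  // Math.floorMod keeps the result non-negative even for negative hash codes
  def partitionFor(key: String): Int =
    Math.floorMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit =
    for (key <- Seq("alice", "bob", "carol"))
      println(s"$key -> ${nodes(partitionFor(key))}")
}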
6. Load Balancing Algorithms
Load balancing algorithms split and distribute computational tasks or network traffic evenly
among the nodes in order to avoid overloading any single node and to keep resources from being
wasted or exhausted.
 They schedule resources intelligently based on workload variance, node capacity, and
performance metrics.
 This ensures efficient resource usage and decreases response time. Mechanisms like round-
robin scheduling and weighted load balancing ensure efficient sharing of work in changing distributed
systems.
Different Types of Load Balancing Algorithms are:
a. Round Robin: Requests are distributed evenly across servers in a circular manner. Each request is
forwarded to the next server in line, ensuring that all servers receive approximately the same
number of requests.


b. Least Connection: Incoming requests are sent to the server with the fewest active connections at
the time of the request. This helps to distribute the load based on the current capacity of each
server.
c. IP Hash: The IP address of the client is used to determine which server will handle the request.
Requests from the same IP address are consistently routed to the same server, which can be
beneficial for session persistence.
d. Weighted Round Robin: Similar to Round Robin, but servers are assigned weights to reflect their
capacity or performance. Servers with higher weights receive a proportionally higher number of
requests, allowing for more granular control over load distribution (both variants are sketched after this list).
e. Least Response Time: Requests are forwarded to the server with the shortest response time or
lowest latency. This algorithm aims to minimize response times for clients by directing them to the
server that can respond most quickly.
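Below is a small Scala sketch of two of these strategies, Round Robin and Weighted Round Robin. Server names and weights are invented for illustration; production load balancers also track health and live connection counts.

object LoadBalancerDemo {
  val servers = Vector("s1", "s2", "s3")
  private var next = 0

  // Round Robin: rotate through servers in order
  def roundRobin(): String = {
    val s = servers(next % servers.size)
    next += 1
    s
  }

  // Weighted Round Robin: expand each server by its weight, then rotate
  val weighted: Vector[String] =
    Vector("s1" -> 3, "s2" -> 1, "s3" -> 1).flatMap { case (s, w) => Vector.fill(w)(s) }
  private var wNext = 0

  def weightedRoundRobin(): String = {
    val s = weighted(wNext % weighted.size)
    wNext += 1
    s
  }

  def main(args: Array[String]): Unit = {
    println((1 to 6).map(_ => roundRobin()).mkString(", "))         // s1, s2, s3, s1, s2, s3
    println((1 to 5).map(_ => weightedRoundRobin()).mkString(", ")) // s1, s1, s1, s2, s3
  }
}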
7. Distributed Data Structures and Algorithms
Distributed Data Structures and Algorithms is the study of how to store and manipulate data on
multiple computers in a way that optimizes performance and provides high availability while
maintaining consistency of data in the face of concurrent updates by different users.
 The application of distributed data structures and algorithms designs the framework for storing,
retrieving, and working on the data in a distributed manner.
 They include distributed hash tables (DHTs), distributed queues, distributed caches, and consensus-
based data structures catering to particular distributed computing paradigms.
 These types of data structures and algorithms allow data to be stored and retrieved quickly across
numerous nodes and ensure data integrity when any node breaks down.
8. Failure Detection and Failure Recovery Algorithms
Failure detection and recovery algorithms in distributed systems are essential for maintaining
system reliability and availability in the face of node failures or network partitions. These
algorithms monitor the health and status of nodes in the system, detect failures promptly, and take
appropriate actions to recover from failures.
1. Failure Detection Algorithms:
 Heartbeat-Based Detection:
o Nodes periodically send heartbeat messages to indicate their liveness.
o Failure detectors monitor the arrival of these messages and trigger failure detection if a node fails to
send heartbeats within a specified timeout period (a sketch of this approach follows the list below).
 Neighbor Monitoring:
o Nodes monitor the status of their neighboring nodes by exchanging status information or
monitoring network connectivity.
o If a node detects that a neighbor is unresponsive, it assumes that the neighbor has failed.
 Quorum-Based Detection:
o Failure is detected when a quorum of nodes agrees on the unavailability of a particular node.
o This approach ensures that false positives are minimized and enhances the accuracy of failure
detection.
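A minimal sketch of heartbeat-based detection follows: the detector records the last heartbeat time per node and suspects any node whose heartbeat is older than a timeout. The timeout value and node ids are illustrative, and real detectors also have to tolerate clock and network jitter.

object HeartbeatDetector {
  import scala.collection.mutable

  val timeoutMillis = 3000L
  private val lastSeen = mutable.Map.empty[String, Long]

  def onHeartbeat(node: String, at: Long = System.currentTimeMillis()): Unit =
    lastSeen(node) = at

  // nodes whose last heartbeat is older than the timeout are suspected failed
  def suspected(now: Long = System.currentTimeMillis()): Set[String] =
    lastSeen.collect { case (n, t) if now - t > timeoutMillis => n }.toSet

  def main(args: Array[String]): Unit = {
    val t0 = System.currentTimeMillis()
    onHeartbeat("node-A", t0)
    onHeartbeat("node-B", t0)
    onHeartbeat("node-A", t0 + 5000)    // node-B has gone silent
    println(suspected(now = t0 + 5000)) // Set(node-B)
  }
}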
2. Failure Recovery Algorithms:
 Replication and Redundancy:
o Replicating data and services across multiple nodes ensures fault tolerance.
o In the event of a node failure, redundant copies can be used to continue providing service without
interruption.
 Automatic Failover:
o In systems with primary-backup replication, automatic failover mechanisms detect when a primary
node has failed and promote a backup node to become the new primary.
o This ensures continuity of service with minimal manual intervention.
 Recovery Protocols:


o Recovery protocols, such as the Two-Phase Commit (2PC) and Three-Phase Commit (3PC), ensure
data consistency and recover from partially completed transactions in the event of a failure.
9. Security Algorithms for a Distributed Environment
Security algorithms in distributed systems are designed to protect data, communication channels,
and system resources from unauthorized access, tampering, and other security threats. Some
security algorithms in distributed environment are:
 Cryptography: Protects data transmission and storage with encryption, hash functions, and digital
signatures (a hash-based integrity sketch follows this list).
 Authentication and Authorization: Verify user and node identities and grant access based on roles
and permissions.
 Access Control: Enforce policies to restrict access to resources based on user attributes and
permissions.
 Secure Communication Protocols: Encrypt data exchanged between nodes over the network and
provide mutual authentication.
 Intrusion Detection and Prevention: Monitor network traffic and system logs to detect and prevent
security breaches and unauthorized access.
 Key Management: Manage cryptographic keys for encryption, decryption, and authentication
securely.
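As one small, concrete example of these mechanisms, the sketch below uses the standard java.security.MessageDigest API to compute a SHA-256 hash for an integrity check: if a message is altered in transit, the recomputed digest no longer matches the one sent with it. The message text is invented for illustration.

import java.security.MessageDigest

object IntegrityCheck {
  // hex-encoded SHA-256 digest of a string
  def sha256Hex(data: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(data.getBytes("UTF-8"))
      .map(b => f"$b%02x").mkString

  def main(args: Array[String]): Unit = {
    val sent = "transfer 100 to account 42"
    val digest = sha256Hex(sent)                // sent alongside the message
    val received = "transfer 900 to account 42" // tampered in transit
    println(sha256Hex(received) == digest)      // false -> integrity violated
  }
}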
*****
DISTRIBUTED COMPUTING SYSTEM MODELS
Distributed computing is a system where processing and data storage are distributed across multiple
devices or systems, rather than being handled by a single central device. In this article, we will see
Distributed Computing System Models.
 Types of Distributed Computing System Models
o Physical Model
o Architectural Model
o Fundamental Model
Types of Distributed Computing System Models
1. Physical Model
A physical model represents the underlying hardware elements of a distributed system. It
encompasses the hardware composition of a distributed system in terms of computers and other
devices and their interconnections. It is primarily used to design, manage, implement, and
determine the performance of a distributed system.
A physical model majorly consists of the following components:
a. Nodes
Nodes are the end devices that can process data, execute tasks, and communicate with the other
nodes. These end devices are generally the computers at the user end or can be servers,
workstations, etc.
 Nodes provision the distributed system with an interface in the presentation layer that enables the
user to interact with other back-end devices, or nodes, that can be used for storage and database
services, processing, web browsing, etc.
 Each node has an operating system, execution environment, and different middleware
requirements that facilitate communication and other vital tasks.
b. Links
Links are the communication channels between different nodes and intermediate devices. These
may be wired or wireless. Wired links or physical media are implemented using copper wires, fiber
optic cables, etc. The choice of the medium depends on the environmental conditions and the
requirements. Generally, physical links are required for high-performance and real-time computing.
Different connection types that can be implemented are as follows:


 Point-to-point links: Establish a connection and allow data transfer between only two nodes.
 Broadcast links: It enables a single node to transmit data to multiple nodes simultaneously.
 Multi-Access links: Multiple nodes share the same communication channel to transfer data. This
requires protocols to avoid interference during transmission.
c. Middleware
Middleware is the software installed and executed on the nodes. By running middleware on each node,
the distributed computing system achieves decentralised control and decision-making. It handles
various tasks like communication with other nodes, resource management, fault tolerance,
synchronisation of different nodes, and security to prevent malicious and unauthorised access.
d. Network Topology
This defines the arrangement of nodes and links in the distributed computing system. The most
common network topologies that are implemented are bus, star, mesh, ring or hybrid. Choice of
topology is done by determining the exact use cases and the requirements.
e. Communication Protocols
Communication protocols are the set of rules and procedures for transmitting data over the links.
Examples of these protocols include TCP, UDP, HTTPS, MQTT, etc. These allow the nodes to
communicate and interpret the data.
2. Architectural Model
The architectural model of a distributed computing system is the overall design and structure of the
system, and how its different components are organised to interact with each other and provide the
desired functionalities. It is an overview of the system: how development, deployment,
and operations will take place. Construction of a good architectural model is required for efficient cost
usage and highly improved scalability of the applications.
The key aspects of architectural model are:
a. Client-Server model
It is a centralised approach in which the clients initiate requests for services and servers respond by
providing those services. It mainly works on the request-response model, where the client sends a
request to the server and the server processes it and responds to the client accordingly.
 It can be achieved by using TCP/IP, HTTP protocols on the transport layer.
 This is mainly used in web services, cloud computing, database management systems etc.


b. Peer-to-peer model
It is a decentralised approach in which all the distributed computing nodes, known as peers, are all
the same in terms of computing capabilities and can both request as well as provide services to
other peers. It is a highly scalable model because the peers can join and leave the system
dynamically, which makes it an ad-hoc form of network.
 The resources are distributed and the peers need to look out for the required resources as and when
required.
 The communication is directly done amongst the peers without any intermediaries according to
some set rules and procedures defined in the P2P networks.
 The best example of this type of computing is BitTorrent.

c. Layered model
It involves organising the system into multiple layers, where each layer provides a specific
service. Each layer communicates with the adjacent layers using certain well-defined protocols
without affecting the integrity of the system. A hierarchical structure is obtained where each layer
abstracts the underlying complexity of the lower layers.


d. Micro-services model
In this model, a complex application or task is decomposed into multiple independent services, each
running on a different server. Each service performs only a single function and is
focussed on a specific business capability. This makes the overall system more maintainable,
scalable, and easier to understand. Services can be independently developed, deployed, and scaled
without affecting the other running services.

3. Fundamental Model
The fundamental model in a distributed computing system is a broad conceptual framework that
helps in understanding the key aspects of the distributed systems. These are concerned with more
formal description of properties that are generally common in all architectural models. It represents
the essential components that are required to understand a distributed system’s behaviour. The main
fundamental models are as follows:
a. Interaction Model
Distributed computing systems are full of many processes interacting with each other in highly
complex ways. Interaction model provides a framework to understand the mechanisms and patterns
that are used for communication and coordination among various processes. Different components
that are important in this model are –
 Message Passing – It deals with passing messages that may contain data, instructions, a service
request, or process synchronisation between different computing nodes. It may be synchronous or
asynchronous depending on the types of tasks and processes.
 Publish/Subscribe Systems – Also known as pub/sub system. In this the publishing process can
publish a message over a topic and the processes that are subscribed to that topic can take it up and
execute the process for themselves. It is more important in an event-driven architecture.
b. Remote Procedure Call (RPC)
It is a communication paradigm that makes it possible to invoke a procedure or method on a remote
process as if it were a local procedure call. The client process makes a procedure call
using RPC and then the message is passed to the required server process using communication


protocols. These message-passing details are abstracted, and the result, once obtained from the
server process, is sent back to the client process to continue execution.
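A toy sketch of the RPC idea is shown below: the client calls what looks like a local method, while a stub hides the marshalling and the (here simulated) network hop. All names are hypothetical; real RPC frameworks such as gRPC or Java RMI generate these stubs automatically.

trait Calculator { def add(a: Int, b: Int): Int }

// "server side": the real implementation, normally on a remote node
object CalculatorServer extends Calculator {
  def add(a: Int, b: Int): Int = a + b
}

// "client side": a stub that pretends to marshal the call over a network
object CalculatorStub extends Calculator {
  def add(a: Int, b: Int): Int = {
    val request = s"add($a,$b)"    // marshal the call
    println(s"sending request over network: $request")
    CalculatorServer.add(a, b)     // simulated remote dispatch
  }
}

object RpcDemo {
  def main(args: Array[String]): Unit =
    println(CalculatorStub.add(2, 3)) // looks like a local call; prints 5
}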

c. Failure Model
This model addresses the faults and failures that occur in the distributed computing system. It
provides a framework to identify and rectify the faults that occur or may occur in the system. Fault
tolerance mechanisms are implemented so as to handle failures by replication and error detection
and recovery methods. Different failures that may occur are:
 Crash failures – A process or node unexpectedly stops functioning.
 Omission failures – It involves a loss of message, resulting in absence of required communication.
 Timing failures – The process deviates from its expected time quantum and may lead to delays or
unsynchronised response times.
 Byzantine failures – The process may send malicious or unexpected messages that conflict with the
set protocols.
d. Security Model
Distributed computing systems may suffer malicious attacks, unauthorised access and data
breaches. Security model provides a framework for understanding the security requirements,
threats, vulnerabilities, and mechanisms to safeguard the system and its resources. Various aspects
that are vital in the security model are:

 Authentication: It verifies the identity of the users accessing the system. It ensures that only the
authorised and trusted entities get access. It involves –
o Password-based authentication: Users provide a unique password to prove their identity.


o Public-key cryptography: Entities possess a private key and a corresponding public key, allowing
verification of their authenticity.
o Multi-factor authentication: Multiple factors, such as passwords, biometrics, or security tokens, are
used to validate identity.
 Encryption:
o It is the process of transforming data into a format that is unreadable without a decryption key. It
protects sensitive information from unauthorized access or disclosure.
 Data Integrity:
o Data integrity mechanisms protect against unauthorised modifications or tampering of data. They
ensure that data remains unchanged during storage, transmission, or processing. Data integrity
mechanisms include:
o Hash functions – Generating a hash value or checksum from data to verify its integrity.
o Digital signatures – Using cryptographic techniques to sign data and verify its authenticity and
integrity.

*****
MESSAGE PASSING IN DISTRIBUTED SYSTEM
Message passing in distributed systems refers to the communication mechanism used by nodes
(computers or processes) to exchange information and coordinate their actions. It involves
transferring messages between nodes to achieve various goals such as coordination,
synchronization, and data sharing.
What is Message Passing in Distributed Systems?
The method by which entities or processes in a distributed system communicate and exchange data
is known as message passing. It enables several components, which could be operating on different
computers or nodes connected by a network, to plan their actions, exchange data, and work together
to accomplish shared objectives.
 Models like synchronous and asynchronous message passing offer different synchronization and
communication semantics to suit system requirements.
 Synchronous message passing ensures sender and receiver synchronization, while asynchronous
message passing allows concurrent execution and non-blocking communication.


Importance of Message Passing


In distributed systems, where multiple independent components work together to perform tasks,
message passing is crucial for inter-process communication (IPC). It enables applications to
distribute workloads, share resources, synchronize actions, and handle concurrent activities across
different nodes efficiently.
Types of Message Passing in Distributed Systems
Message passing describes the method by which nodes or processes interact and share information
in distributed systems. Message passing can be divided into categories according to the
sender and receiver’s timing and synchronization (synchronous vs. asynchronous), and by the number of recipients (unicast, multicast, broadcast):
1. Synchronous Message Passing
Synchronous message passing involves a tightly coordinated interaction between the sender and
receiver. The key characteristics include:
 Timing Coordination: Before proceeding with execution, the sender waits for the recipient to
confirm receipt of the message or finish processing it.
 Request-Response Pattern: Synchronous systems often use a request-response paradigm in which the sender sends a
message requesting something and then waits for the recipient to react.
 Advantages:
o Ensures precise synchronization between communicating entities.
o Simplifies error handling as the sender knows when the message has been successfully received or
processed.
 Disadvantages:
o May introduce latency if the receiver is busy or unavailable.
o Synchronous blocking can reduce overall system throughput if many processes are waiting for
responses.
2. Asynchronous Message Passing
Asynchronous message passing allows processes to operate independently of each other in terms of
timing. Key features include:
 Decoupled Timing: The sender does not wait for an immediate response from the receiver after
sending a message. It continues its execution without blocking.
 Event-Driven Model: Communication is often event-driven, where processes respond to messages
or events as they occur asynchronously.
 Advantages:
o Enhances system responsiveness and throughput by allowing processes to execute concurrently.
o Allows for interactions that are loosely connected, allowing processes to process messages at their
own speed.
 Disadvantages:
o Requires additional mechanisms (like callbacks or event handlers) to manage responses or
coordinate actions.
o Handling out-of-order messages or ensuring message delivery reliability can be more complex
compared to synchronous communication.
 Time Complexity:
In asynchronous systems, there's no explicit notion of time, but it can be defined by assuming
each message is delivered and processed within a certain time unit after being sent. Time
complexity is then measured by the maximum number of these asynchronous steps required for
an operation to complete.
 Message Complexity:
This refers to the total number of messages exchanged during the execution of a distributed
algorithm or system. It's a crucial factor in evaluating the efficiency of algorithms, especially in
distributed systems.
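To illustrate the asynchronous style described above, the sketch below sends a message via a Scala Future and registers a callback for the reply, so the sender continues without blocking. The remote node is simulated with a sleep; in a real system the Future would wrap an actual network call.

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

object AsyncSendDemo {
  // the "remote node" is simulated by a Future completing after a delay
  def sendToRemoteNode(msg: String): Future[String] = Future {
    Thread.sleep(100)              // simulated network + processing delay
    s"ack for '$msg'"
  }

  def main(args: Array[String]): Unit = {
    sendToRemoteNode("hello").onComplete {
      case Success(reply) => println(reply)
      case Failure(err)   => println(s"delivery failed: $err")
    }
    println("sender keeps working without blocking") // printed first
    Thread.sleep(500)              // keep the JVM alive for the demo
  }
}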


3. Unicast Messaging
Unicast messaging is a one-to-one communication where a message is sent from a single sender to
a specific receiver. The key characteristics include:
 Direct Communication: The message is targeted at a single, specific node or endpoint.
 Efficiency for Point-to-Point: Since only one recipient receives the message, resources are
efficiently used for direct, point-to-point communication.
 Advantages:
o Optimized for targeted communication, as the message is only sent to the intended recipient.
o Minimizes network load compared to group messaging, as it doesn’t broadcast to unnecessary
nodes.
 Disadvantages:
o Not scalable for group communications; sending multiple unicast messages can strain the system in
larger networks.
o Can increase the complexity of managing multiple unicast connections in large-scale applications.
4. Multicast Messaging
Multicast messaging enables one-to-many communication, where a message is sent from one
sender to a specific group of receivers. The key characteristics include:
 Group-Based Communication: Messages are delivered to a subset of nodes that have joined the
multicast group.
 Efficient for Groups: Saves bandwidth by sending the message once to all nodes in the group
instead of individually.
 Advantages:
o Reduces network traffic by sending a single message to multiple recipients, making it ideal for
content distribution or group updates.
o Scales efficiently for applications where data needs to reach specific groups, like video
conferencing or online gaming.
 Disadvantages:
o Complex to implement as nodes need mechanisms to manage group memberships and handle node
join/leave requests.
o Not all network infrastructures support multicast natively, which can limit its applicability.
5. Broadcast Messaging
Broadcast messaging involves sending a message from one sender to all nodes within the network.
The key characteristics include:
 Wide Coverage: The message is sent to every node, ensuring that all nodes in the network receive
it.
 Network-Wide Reach: Suitable for announcements, alerts, or updates intended for all nodes without
targeting specific ones.
 Advantages:
o Guarantees that every node in the network receives the message, which is useful for critical
notifications or status updates.
o Simplifies dissemination of information when all nodes need to be aware of an event or data
change.
 Disadvantages:
o Consumes significant network resources since every node, regardless of relevance, receives the
message.
o Can lead to unnecessary processing at nodes that don’t need the message, potentially causing
inefficiency.
*****


WHAT IS CLOCK SYNCHRONIZATION IN DISTRIBUTED SYSTEMS?


Clock synchronization in distributed systems refers to the process of ensuring that all clocks
across various nodes or computers in the system are set to the same time or at least have their
times closely aligned.
 In a distributed system, where multiple computers communicate and collaborate over a network,
each computer typically has its own local clock.
 However, due to factors such as hardware differences, network delays, and clock drift
(inaccuracies in timekeeping), these local clocks can drift apart over time.
Importance of Clock Synchronization
Below are the importance of clock synchronization in distributed system:
 Consistency and Coherence:
o Clock synchronization ensures that timestamps and time-based decisions made across different
nodes in the distributed system are consistent and coherent. This is crucial for maintaining the
correctness of distributed algorithms and protocols.
 Event Ordering:
o Many distributed systems rely on the notion of event ordering based on timestamps to ensure
causality and maintain logical consistency. Clock synchronization helps in correctly ordering
events across distributed nodes.
 Data Integrity and Conflict Resolution:
o In distributed databases and file systems, synchronized clocks help in timestamping data
operations accurately. This aids in conflict resolution and maintaining data integrity, especially in
scenarios involving concurrent writes or updates.
 Fault Detection and Recovery:
o Synchronized clocks facilitate efficient fault detection and recovery mechanisms in distributed
systems. Timestamps can help identify the sequence of events leading to a fault, aiding in
debugging and recovery processes.
 Security and Authentication:
o Timestamps generated by synchronized clocks are crucial for security protocols, such as in
cryptographic operations and digital signatures. They provide a reliable basis for verifying the
authenticity and temporal validity of transactions and messages.
Bridging Time Gaps
Clock synchronization in distributed systems aims to establish a common reference for time across nodes.
Imagine a scenario where three distinct systems are part of a distributed environment. In order for
data exchange and coordinated operations to take place, these systems must have a shared
understanding of time.
Achieving clock synchronization ensures that data flows seamlessly between them, tasks are
executed coherently, and communication happens without ambiguity.
Types of Clock Synchronization in Distributed Systems
Below are the types of clock synchronization in distributed systems:
1. Physical clock synchronization
In distributed systems, each node operates with its own clock, which can lead to time differences.
The goal of physical clock synchronization is to overcome this challenge. This involves
equipping each node with a clock that is adjusted to match Coordinated Universal Time (UTC), a
recognized standard. By synchronizing their clocks in this way, diverse systems across the
distributed landscape can maintain harmony.
 Addressing Time Disparities: Since each node operates with its own clock, times can drift
apart. The goal of physical clock synchronization is to minimize these disparities by aligning
the clocks.


 Using UTC as a Common Reference Point: The key to achieving this synchronization lies in
adjusting the clocks to adhere to an accepted standard known as Coordinated Universal Time
(UTC), which offers a common reference for all nodes.
2. Logical clock synchronization
In distributed systems, absolute time often takes a backseat to event ordering. Think of logical
clocks as storytellers that prioritize the order of events rather than their exact timing. These clocks
enable the establishment of connections between events, like weaving threads of cause and effect.
By bringing order and structure into play, task coordination within distributed systems becomes
akin to a choreographed dance where steps are sequenced for execution.
 Event Order Over Absolute Time: In the realm of distributed systems, logical clock
synchronization focuses on establishing the order of events rather than relying on absolute time. Its
primary objective is to establish causal connections between events.
 Approach towards Understanding Behavior: Logical clocks serve as storytellers weaving together
a narrative of events. This narrative enhances comprehension and facilitates coordination within
the distributed system.
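A classic realization of logical clocks is Lamport's scheme: increment the counter on every local event and send, and on receive jump past the timestamp carried by the message. Below is a minimal sketch (class and method names are illustrative); note that it orders events, not wall-clock time.

class LamportClock {
  private var time = 0L
  def localEvent(): Long = { time += 1; time }
  def send(): Long = { time += 1; time } // timestamp to attach to the message
  def receive(msgTime: Long): Long = {   // merge the sender's clock
    time = math.max(time, msgTime) + 1
    time
  }
}

object LamportDemo {
  def main(args: Array[String]): Unit = {
    val p1 = new LamportClock
    val p2 = new LamportClock
    p2.localEvent()         // p2 at 1
    val ts = p1.send()      // p1 at 1, message carries timestamp 1
    println(p2.receive(ts)) // p2 jumps to max(1, 1) + 1 = 2
  }
}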
3. Mutual exclusion synchronization
In the bustling symphony of distributed systems, one major challenge is managing shared
resources. Imagine multiple processes competing for access to the same resource simultaneously. To
address this issue, mutual exclusion synchronization comes into play as a technique that
reduces chaos and promotes resource harmony. This approach relies on creating a system where
different processes take turns accessing shared resources.
 Managing Resource Conflicts: In a distributed system, multiple processes often
compete for shared resources simultaneously. To address this issue, mutual exclusion
synchronization enforces a controlled mechanism for accessing resources.
 Enhancing Efficiency through Sequential Access: This synchronization approach ensures that
resources are accessed sequentially, minimizing conflicts and collisions. By orchestrating access
in this manner, resource utilization and overall system efficiency are optimized.
*****
MESSAGE ORDERING IN DISTRIBUTED SYSTEMS
The order in which messages are processed determines the final outcome of the actions in any
distributed system. This is actually more difficult than it appears to be.
Why is the order of messages important?
Consider the following scenario: A user initially dislikes a post but quickly realizes they want to like
it and does so almost immediately.
Since network delays can vary, suppose the "like" request arrives before the "dislike" request. If we
processed the messages in the order they were received, the post would end up disliked, but the user’s
intent was to like the post.
Thus, determining the order of two messages is critical for our applications.
Types of Ordering
There are two ways to order.
1. Total Order
2. Partial Order
Total Order
In a total order, given a set of events A, B, C, D, there exists a single order of all messages.
For instance, if we have the set of numbers 7, 4, 5, 9, 1 and want to order them using the < relation, there
is only one order: 1, 4, 5, 7, 9.
Total order is used in single-machine systems where it is easy to figure out the order of events
because they happen one after the other. For any events that happen at the same time, the machine's
clock can be used since it’s a single global clock for all processes.


Total ordering, on the other hand, is not possible in distributed systems because different machines
have different clocks and concurrent events cannot be ordered.
Partial Order
Only some events can be ordered in a partial order, while others cannot. As a result, multiple orders
are generated for a given set of values.
For example, if we have a list of events A, B, C, D and we know that events A and B are ordered,
but event C occurred at the exact same millisecond as event A and event D occurred at the exact
same millisecond as event B, there is no way to order these four events in a single order because
they occurred concurrently.
Partial order is commonly used in distributed systems because some events occur in order while
others do not, which is consistent with the theme of partial ordering.
Happens Before Relation
When we have two events A and B, we say that A happens before B (A -> B) if:
1. Event A occurs before Event B in the node execution order, or
2. Event A is the transmission of message M and Event B is the reception of the same message,
because a message cannot be received before it is transmitted, or
3. There is an event C such that A happens before C and C happens before B (transitivity).
If we can't establish a happens-before relationship between two events, it means they happened
concurrently ( a || b ).

[Diagram omitted: a space-time diagram of two nodes illustrating the three rules above - events A and B on the same node with A earlier in the execution order, a message whose transmission and reception events are ordered by rule 2, and an intermediate event C ordering A before B by transitivity.]
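In practice, one common way to test the happens-before relation (and to detect concurrency) is with vector clocks, one counter per process. The sketch below is a minimal, illustrative version; the example vectors are invented.

object HappensBefore {
  type VClock = Vector[Int] // one counter per process

  // A -> B iff A's vector is <= B's componentwise and the vectors differ
  def happenedBefore(a: VClock, b: VClock): Boolean =
    a.zip(b).forall { case (x, y) => x <= y } && a != b

  // concurrent iff neither dominates the other
  def concurrent(a: VClock, b: VClock): Boolean =
    !happenedBefore(a, b) && !happenedBefore(b, a)

  def main(args: Array[String]): Unit = {
    val a = Vector(1, 0, 0)
    val b = Vector(2, 1, 0)
    val c = Vector(0, 0, 1)
    println(happenedBefore(a, b)) // true:  a -> b
    println(concurrent(b, c))     // true:  b || c
  }
}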
Causality
Take, for example, social media. Do we really care about the order in which we see two unrelated
posts? Most likely not.
As a result, the system could use Partial Ordering here, where posts that cannot be ordered are
displayed in a random order.
However, there are some events that must be shown in chronological order. For example, if User A
responds to Comment C1 by User B, User A’s comment must appear after User B’s comment. The
conversation would be difficult to follow otherwise.
What we described in the preceding example is the concept of Causality, which states that one event
contributes to the occurrence of another, i.e. Event B occurred solely because Event A occurred.


Based on what we’ve seen so far, we can also conclude that if Event A happened before Event B,
then Event A may have caused Event B. It is critical to emphasise that this is a possibility, not a
guarantee.
However, if Events A and B occur concurrently, we can be certain that Event A did not cause Event
B and vice versa.
What is Group Communication in Distributed Systems?
Group communication in distributed systems refers to the process where multiple nodes or
entities communicate with each other as a group.
 Instead of sending messages to individual recipients, group communication allows a sender to
transmit information to all members of a group simultaneously.
 This method is essential for coordinating actions, sharing data, and ensuring that all participants
in the system are informed and synchronized. It’s particularly useful in scenarios like
collaborative applications and real-time updates.
Importance of Group Communication in Distributed Systems
Group communication is critically important in distributed systems due to several key reasons:
 Multiple nodes must collaborate and synchronize their actions. Group communication helps them
exchange information and stay updated.
 Different nodes can create data that needs to be shared. Group communication helps quickly send
this information to everyone involved, reducing delays and keeping data consistent.
 Group communication protocols enhance reliability by allowing messages to be replicated or
acknowledged across multiple nodes. This ensures robust communication, even during failures or
network issues.
 As distributed systems expand, effective scaling is crucial. Group communication mechanisms
can manage more nodes and messages without sacrificing performance, keeping the system
efficient and responsive.
Types of Group Communication in a Distributed System
Below are the three types of group communication in distributed systems:
1. Unicast Communication
Unicast communication is the point-to-point transmission of data between two nodes in a
network. In the context of distributed systems:
 Unicast is when a sender sends a message to a specific recipient, using their unique network
address.
 Each message targets one recipient, creating a direct connection between the sender and the
receiver.
 You commonly see unicast in client-server setups, where a client makes requests and receives
responses, as well as in direct connections between peers.
 This method makes good use of network resources, is easy to implement, and keeps latency low
because messages go straight to the right person.
 Unicast isn’t efficient for sending messages to many recipients at once, as it requires separate
messages for each one, leading to more work.


2. Multicast Communication

Multicast communication involves sending a single message from one sender to multiple
receivers simultaneously within a network. It is particularly useful in distributed systems where
broadcasting information to a group of nodes is necessary:
 Multicast lets a sender share a message with a specific group of people who want it.
 This way, the sender can reach many people at once, which is more efficient than sending
separate messages.
 This approach is often used to send updates to subscribers or in collaborative applications where
real-time sharing of changes is needed.
 By sending data just once to a group, multicast saves bandwidth, simplifies communication, and
can easily handle a larger number of recipients.
 Managing group membership is necessary to ensure reliable message delivery, and multicast can
run into issues if there are network problems that affect everyone in the group.

3. Broadcast Communication
Broadcast communication involves sending a message from one sender to all nodes in the
network, ensuring that every node receives the message:

 Broadcast is when a sender sends a message to every node in the network without targeting
specific recipients.
 Messages are delivered to all nodes at once using a special address designed for this purpose.
 It’s often used for network management tasks, like sending status updates, or for emergency
alerts that need to reach everyone quickly.
 Broadcast ensures that every node receives the message without needing to specify who the
recipients are, making it efficient for sharing information widely.
 It can cause network congestion in larger networks and raises security concerns since anyone on
the network can access the broadcast message, which might lead to unauthorized access.


Reliable Multicast Protocols for Group Communication


Messages sent from a sender to multiple recipients should be delivered reliably, consistently, and
in a specified order. Types of Reliable Multicast Protocols include:
 FIFO Ordering:
o Ensures that messages are delivered to all group members in the order they were sent by the
sender.
o Achieved by sequencing messages and delivering them sequentially to maintain the correct order.
 Causal Ordering:
o Preserves the causal relationships between messages based on their dependencies.
o Ensures that messages are delivered in an order that respects the causal dependencies observed by
the sender.
 Total Order and Atomicity:
o Guarantees that all group members receive messages in the same global order.
o Ensures that operations based on the multicast messages (like updates to shared data) appear
atomic or indivisible to all recipients.
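As an illustration of FIFO ordering, the sketch below gives a receiver that buffers out-of-order messages and delivers them strictly by sequence number. It handles a single sender for simplicity; a real protocol keeps one expected counter per sender.

object FifoReceiver {
  import scala.collection.mutable

  private var expected = 1
  private val buffer = mutable.Map.empty[Int, String]

  def onMessage(seq: Int, msg: String): Unit = {
    buffer(seq) = msg
    // deliver every consecutive message we now have
    while (buffer.contains(expected)) {
      println(s"delivered #$expected: ${buffer(expected)}")
      buffer.remove(expected)
      expected += 1
    }
  }

  def main(args: Array[String]): Unit = {
    onMessage(2, "second") // arrives early: buffered, not delivered
    onMessage(1, "first")  // delivers #1, then the buffered #2
  }
}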
Scalability for Group Communication
Scalability and performance are vital for effective group communication in distributed systems.
It’s essential for the system to handle more nodes, messages, and participants while still operating
efficiently. Here’s a closer look at these important aspects:
1. Scalability
Scalability refers to the system’s ability to grow without losing efficiency. This includes:
 Managing an increasing number of nodes or participants while keeping communication smooth.
 Handling more messages exchanged among group members, ensuring timely and responsive
communication.
 Supporting connections across distant nodes or networks, which can introduce latency and
bandwidth issues.
2. Challenges in Scalability
As the group size grows, several challenges arise:
 The management of group membership and message routing becomes more complex, adding
overhead.
 The network must have enough bandwidth to support the higher traffic from a larger group to
avoid congestion.
 Keeping distributed nodes consistent and synchronized gets more complicated as the system
scales.
3. Strategies for Scalability
To tackle these challenges, various strategies can be employed:
 Partitioning and Sharding: Breaking the system into smaller parts can make communication and
management more manageable.
 Load Balancing: Evenly distributing the workload across nodes helps prevent bottlenecks and
optimizes resource use.
 Replication and Caching: Duplicating data or messages across nodes can lower access times and
enhance fault tolerance, aiding scalability.
Performance for Group Communication
Performance in group communication is crucial for optimizing message speed, resource
utilization, and addressing challenges like message ordering and concurrent access, ensuring
efficient collaboration in distributed systems.
1. Performance
In group communication, performance focuses on a few key areas:
 It’s crucial to minimize the time it takes for messages to reach their intended recipients.
 We want to enhance the rate at which messages are handled and sent out.


 Efficient use of bandwidth, CPU, and memory helps keep communication fast and effective.
2. Challenges in Performance
There are challenges that come with achieving high performance:
 Making sure messages arrive in the right order can be tough, especially with strict protocols.
 Handling multiple users trying to access shared resources at the same time can lead to
slowdowns.
 Communication needs to adjust based on changing conditions, like slow bandwidth or lost
packets.
3. Strategies for Improvement
To boost performance, consider these strategies:
 Smart Routing: Using efficient routing methods can reduce delays by cutting down on the number
of hops messages take.
 Asynchronous Communication: This allows senders and receivers to work independently,
improving responsiveness.
 Pre-storing Data: Keeping frequently accessed messages or data ready can help lower delays and
speed up responses.
Challenges of Group Communication in Distributed Systems
Group communication in distributed systems comes with several challenges due to the need to
coordinate multiple nodes that may be spread out or connected through unreliable networks. Key
challenges include:
 Reliability: Messages must reach all intended recipients, even during network failures or node
crashes, which can be complicated when nodes frequently join or leave.
 Scalability: As the number of participants grows, managing communication effectively becomes
harder, leading to issues with bandwidth usage and processing delays.
 Concurrency and Consistency: Keeping shared data consistent while allowing simultaneous
updates is tricky, requiring strong synchronization to avoid conflicts.
 Fault Tolerance: The system must handle node failures and communication issues without losing
reliability. This means having mechanisms to detect failures and manage changes in group
membership.
*****
INTRODUCTION TO SCALA
Scala is a general-purpose, high-level, multi-paradigm programming language. It is a pure object-
oriented programming language which also provides support for the functional programming
approach. There is no concept of primitive data, as everything is an object in Scala. It is designed to
express general programming patterns in a refined, succinct, and type-safe way. Scala programs
compile to bytecode and run on the JVM (Java Virtual Machine). Scala stands
for Scalable Language. It can also target JavaScript runtimes. Scala is highly influenced by Java
and some other programming languages like Lisp, Haskell, Pizza, etc.
Evolution of Scala:

Scala was designed by Martin Odersky, a German computer scientist and professor of programming
methods at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland.
Martin Odersky is also a co-creator of javac (the Java compiler), Generic Java, and EPFL’s
Funnel programming language. He started designing Scala in 2001, and it was first released
publicly in 2004 on the Java platform. In June 2004, Scala was also adapted for
the .NET Framework, and a second version (v2.0) followed in 2006.
At the JavaOne conference in 2012, Scala won the ScriptBowl contest. Since
June 2012, Scala no longer provides support for the .NET Framework. The latest version of Scala is
2.12.6, released on 27-Apr-2018.


Why Scala?
Scala has many reasons for being popular among programmers. A few of the reasons are:
 Easy to Start: Scala is a high-level language, so it is close to other popular programming languages
like Java, C, and C++. Thus it becomes very easy for anyone to learn Scala, and for Java programmers
it is even easier.
 Contains the best Features: Scala contains features of different languages like C, C++, Java, etc.,
which makes it more useful, scalable, and productive.
 Close integration with Java: The source code of Scala is designed in such a way that its
compiler can interpret the Java classes. Also, its compiler can utilize frameworks, Java
libraries, tools, etc. After compilation, Scala programs can run on the JVM.
 Web-Based & Desktop Application Development: For web applications it provides
support by compiling to JavaScript. Similarly, for desktop applications it compiles to JVM
bytecode.
 Used by Big Companies: Many popular companies like Apple, Twitter, Walmart, Google, etc.
have moved much of their code to Scala from other languages, the reason being that it is highly scalable and
well suited for backend operations.
Note: People often think that Scala is an extension of Java, but that is not true. It is just completely
interoperable with Java. Scala programs are compiled into .class files containing Java bytecode
after successful compilation, and can then run on the JVM (Java Virtual Machine).
Beginning with Scala Programming
Finding a Compiler: There are various online IDEs such as GeeksforGeeks IDE, Scala Fiddle IDE,
etc. which can be used to run Scala programs without installing.
Programming in Scala: Since Scala is syntactically a lot like other widely used languages,
it is easier to code and learn. Programs can be written in Scala in any of the widely used
text editors like Notepad++, gedit, etc. After writing the program, save
the file with the extension .sc or .scala.
For Windows & Linux: Before installing Scala on Windows or Linux, you must have the Java
Development Kit (JDK) 1.8 or greater installed on your system, because Scala always runs on Java
1.8 or above.
In this article, we will discuss how to run Scala programs on online IDEs.
Example : A simple program to print Hello Geeks! using object-oriented approach.
Scala

// Scala program to print Hello, Geeks!
// by using object-oriented approach

// creating object
object Geeks {
  // Main method
  def main(args: Array[String]): Unit = {
    // prints Hello, Geeks!
    println("Hello, Geeks!")
  }
}

Output:
Hello, Geeks!


Comments: Comments are used for explaining the code and are used in a similar manner as in Java
or C or C++. Compilers ignore the comment entries and do not execute them. Comments can be of
a single line or multiple lines.
 Single line Comments:
Syntax:
// Single line comment
 Multi line comments:
Syntax:
/* Multi-line comments
syntax */
object Geeks: object is the keyword which is used to create the objects. Here “Geeks” is the name
of the object.
def main(args: Array[String]): def is the keyword in Scala which is used to define the function and
“main” is the name of Main Method. args: Array[String] are used for the command line arguments.
println(“Hello, Geeks!”): println is a method in Scala which is used to display the string on console.
Note: There is also a functional approach that can be used in Scala programs; a small example is
sketched below. Some online IDEs don’t provide support for it.
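As a gentle illustration of that style, here is a variant of the same program using Scala's App trait, where the object body itself becomes the program entry point (no explicit main method). The object name is invented.

// functional-style variant using the App trait
object GeeksApp extends App {
  // the object body runs as the program
  List("Hello,", "Geeks!").foreach(word => print(word + " "))
  println()
}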
Features of Scala
There are many features which makes it different from other languages.
 Object- Oriented: Every value in Scala is an object so it is a purely object-oriented programming
language. The behavior and type of objects are depicted by the classes and traits in Scala.
 Functional: It is also a functional programming language as every function is a value and every
value is an object. It provides the support for the high-order functions, nested functions, anonymous
functions, etc.
 Statically Typed: The process of verifying and enforcing the constraints of types is done at compile
time in Scala. Unlike other statically typed programming languages like C++, C, etc., Scala doesn’t
expect the redundant type information from the user. In most cases, the user has no need to specify
a type.
 Extensible: New language constructs can be added to Scala in the form of libraries. Scala is designed to
interoperate with the JRE (Java Runtime Environment).
 Concurrent & Synchronized Processing: Scala allows the user to write code in an immutable
manner, which makes it easy to apply parallelism (synchronization) and concurrency.
 Run on JVM & Can Execute Java Code: Java and Scala have a common runtime environment. So
the user can easily move from Java to Scala. The Scala compiler compiles the program into .class
file, containing the Bytecode that can be executed by JVM. All the classes of Java SDK can be used
by Scala. With the help of Scala user can customize the Java classes.
Advantages:
 Scala’s rich features provide better coding and efficiency in performance.
 Tuples, macros, and functions are among Scala’s advancements.
 It incorporates object-oriented and functional programming, which in turn makes it a powerful
language.
 It is highly scalable and thus provides better support for backend operations.
 It reduces the risk associated with thread-safety, which is higher in Java.
 Due to the functional approach, a user generally ends up with fewer lines of code and fewer bugs,
which results in higher productivity and quality.
 Due to lazy computation, Scala computes expressions only when they are required in the
program.


 There are no static methods or variables in Scala. Instead it uses singleton objects (a class with
exactly one instance per source file).
 It also provides the concept of traits. Traits are collections of abstract and non-abstract methods
which can be compiled into Java interfaces. (Both lazy computation and traits are illustrated in the sketch below.)
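A short sketch of those last two points, lazy evaluation and traits (all names are illustrative):

object LazyAndTraits {
  // evaluated only on first access, then cached
  lazy val expensive: Int = { println("computing..."); 42 }

  // a trait mixes reusable behaviour into classes and objects
  trait Greeter { def greet(name: String): String = s"Hello, $name" }
  object EnglishGreeter extends Greeter

  def main(args: Array[String]): Unit = {
    println("before first use")            // "computing..." has not printed yet
    println(expensive)                     // prints "computing..." then 42
    println(EnglishGreeter.greet("Geeks")) // Hello, Geeks
  }
}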
Disadvantages:
 Sometimes, the presence of two programming approaches makes Scala hard to understand.
 There is a limited number of Scala developers available in comparison to Java developers.
 It has no true tail-recursive optimization, as it runs on the JVM.
 It always revolves around the object-oriented concept, because every function is a value and every
value is an object in Scala.
Applications:
 It is mostly used in data analysis with Apache Spark.
 Used to develop web applications and APIs.
 It provides the facility to develop frameworks and libraries.
 Preferred for backend operations to improve the productivity of developers.
 Parallel batch processing can be done using Scala.
*****
TERMINATION DETECTION ALGORITHM AND REASONING WITH KNOWLEDGE
In the context of distributed systems, termination detection algorithms aim to determine when a
distributed computation has completed, a task complicated by the lack of global knowledge and
time synchronization. Reasoning with knowledge, such as process states and message counts, helps
algorithms make informed decisions about termination.
 Termination Detection Problem:
In distributed systems, where processes operate independently without a central authority,
determining when all processes have finished their work and the computation is complete is a
challenging problem.
 Why it's non-trivial:
 No Global Knowledge: Each process only has local information about its own state and the
messages it receives.
 No Global Time: Distributed systems often lack a shared clock, making it difficult to synchronize
events across processes.
 Termination Detection Algorithms:
 Basic Idea: These algorithms rely on processes communicating to infer when all processes have
become idle and no more messages are in transit.
 Control Messages: Special messages, distinct from the computational messages, are used to
facilitate the termination detection process.
 Examples:
 Snapshot-based algorithms: Processes take snapshots of their local state and the state of the
communication channels to determine if the computation has reached a consistent state.
 Message-counting algorithms: Processes keep track of the number of messages sent and received
to determine when all messages have been processed.
 Reasoning with Knowledge:
 Process States: Algorithms can use information about the states of processes (active/idle) to infer
when the computation is complete.

21
UDS24201J UNIT V

 Message Counts: Tracking the number of messages sent and received can help determine whether
all messages have been processed and no further computation is pending (a small sketch of this
predicate appears below, after Huang's and Chandy–Lamport's algorithms are introduced).
 Logical Clocks: Logical clocks, which are used to order events in a distributed system, can be
used to determine when all events have occurred.
 Huang's Algorithm:
A well-known algorithm for termination detection, which uses a weight-throwing approach
(described below).
 Chandy-Lamport Algorithm:
Another well-known algorithm for snapshot recording, which can be used in conjunction with
termination detection.
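To make the message-counting idea concrete, here is a minimal, hedged C sketch. It shows only the
termination predicate, evaluated with a global view that a single simulating program happens to
have; a real distributed protocol cannot read these counters atomically, which is precisely the
difficulty that algorithms such as Huang's (below) work around.
C
#include <stdio.h>

#define N 3

static int  active[N];      /* 1 = active, 0 = idle (simulated global view) */
static long sent, received; /* global message counters */

/* Termination predicate: every process idle and no message in transit. */
static int terminated(void) {
    for (int i = 0; i < N; i++)
        if (active[i]) return 0;   /* someone is still working */
    return sent == received;       /* and nothing is in transit */
}

int main(void) {
    active[0] = 1;   /* process 0 starts working          */
    sent++;          /* ... and sends a message to 1      */
    active[0] = 0;   /* process 0 goes idle               */
    active[1] = 1;   /* process 1 receives the message    */
    received++;
    active[1] = 0;   /* process 1 finishes and goes idle  */
    printf("terminated: %s\n", terminated() ? "yes" : "no");
    return 0;
}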
Huang’s Termination detection algorithm
Huang’s algorithm is an algorithm for detecting termination in a distributed system. The
algorithm was proposed by Shing-Tsaan Huang in 1989 in the Journal of Computers. In a
distributed system, a process is either in an active state or in an idle state at any given point of
time. Termination occurs when all of the processes become idle and there are no in-transit
(on the way to being delivered) computation messages. Assumptions of the algorithm:
 One of the co-operating processes, which monitors the computation, is called the controlling agent.
 The initial weight of the controlling agent is 1.
 All other processes are initially idle and have weight 0.
 The computation starts when the controlling agent sends a computation message to one of the
processes.
 A process becomes active on receiving a computation message.
 Computation messages can be sent only by the controlling agent or an active process.
 A control message is sent to the controlling agent by an active process when it becomes idle.
 The algorithm assigns a weight W (such that 0 < W < 1) to every active process and every in-
transit message.
Notations used in the algorithm:
 B(DW): Computation message with weight DW
 C(DW): Control message with weight DW
Algorithm:
 Rule to send B(DW):
o Suppose process P with weight W is sending B(DW) to process Q.
o Split the weight of process P into W1 and W2 such that W = W1 + W2, with W1 > 0 and W2 > 0.
o Set the weight of process P to W1 (i.e., W = W1).
o Send B(W2) to process Q; here DW = W2.
o Note: only the controlling agent or an active process can send a computation message.
 On receiving B(DW) by process Q:
o Add the weight DW to the weight of process Q, i.e., for process Q, W = W + DW.
o If process Q was idle, it becomes active on receiving B(DW).
 Rule to send C(DW):
o Any active process having weight W can become idle by sending C(W) to the controlling agent;
here DW = W.
o Set the weight of the process to 0, i.e., W = 0. (After this, the process becomes idle.)
 On receiving C(DW) by the controlling agent:
o Add the weight received through the control message to the weight of the controlling agent, i.e.,
W = W + DW.
o After adding, if the weight of the controlling agent becomes 1, it can be concluded that the
computation has terminated.
Advantages of Huang’s Algorithm:
 Guaranteed detection: termination detection algorithms ensure that, once all processes have become
idle and no messages remain in transit, this fact is eventually detected, so the distributed system is
not left waiting indefinitely on a computation that has already finished.
 Minimal overhead: Termination Detection Algorithms have a low communication overhead,
which means that they do not significantly impact the performance of the distributed system. This
is important because many distributed systems have strict latency requirements.
 Scalability: Termination Detection Algorithms are scalable, which means that they can handle
large distributed systems with many processes.
 Fault tolerance: some termination detection algorithms are designed to cope with process crashes,
message loss, and other failures that can occur in a distributed system; note, however, that Huang's
basic algorithm assumes reliable message delivery.
 Easy to implement: Termination Detection Algorithms are easy to implement and do not require
any special hardware or software.
Limitations of Huang’s Algorithm:
 Weight underflow: repeated splitting produces smaller and smaller weights. With finite-precision
arithmetic the weights can underflow, so practical implementations must encode weights
carefully (for example, as exponents of negative powers of two).
 No fault tolerance: the algorithm assumes reliable message delivery. If a computation or control
message is lost, its weight is lost with it; the controlling agent's weight can then never return
to 1, and termination is never detected.
 Centralized bottleneck: all control messages flow to a single controlling agent, which can become
a communication hot spot and a single point of failure in large systems.
 Detection delay: termination is detected only after every outstanding weight has been returned to
the controlling agent, so there can be a lag between actual termination and its detection.
Chandy–Lamport’s global state recording algorithm
Each distributed system has a number of processes running on a number of different physical
servers. These processes communicate with each other via communication channels by passing
messages. The processes have neither shared memory nor a common physical clock, which
makes determining an instantaneous global state difficult.
A process could record its own local state at a given time, but the messages that are in transit (on
their way to being delivered) would not be included in the recorded state, and hence the recorded
state of the system would be incorrect once an in-transit message is delivered.
Chandy and Lamport were the first to propose an algorithm to capture a consistent global state of
a distributed system. The main idea behind the algorithm is that if we know that all messages
sent by one process have been received by another, then we can record the global state of the
system.
Any process in the distributed system can initiate this global state recording algorithm using a
special message called a MARKER. The marker traverses the distributed system across all
communication channels and causes each process to record its own state. In the end, the state of
the entire system (the global state) is recorded. The algorithm does not interfere with the normal
execution of the processes.
Assumptions of the algorithm:
 There is a finite number of processes in the distributed system, and they share neither memory nor
clocks.
 There is a finite number of communication channels; they are unidirectional and FIFO-ordered.
 There exists a communication path between any two processes in the system.
 On a channel, messages are received in the same order as they are sent.
Algorithm:
 Marker sending rule for a process P:
o Process P records its own local state.
o For each outgoing channel C from process P, P sends a marker along C before sending any other
messages along C.
(Note: process Q will receive this marker on its incoming channel C1.)
 Marker receiving rule for a process Q (on receiving a marker along channel C1 from process P):
o If process Q has not yet recorded its own local state, then:
o Record the state of the incoming channel C1 as an empty sequence (null), and
o Follow the marker sending rule (which records Q's own local state and sends markers).
o If process Q has already recorded its state:
o Record the state of the incoming channel C1 as the sequence of messages received along C1
after the state of Q was recorded and before Q received the marker along C1 from process P.
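The following hedged C sketch (process names, the channel layout, and the integer messages are all
made up for illustration) simulates the two rules for two processes P and Q joined by FIFO channels.
A message already in transit from Q to P when P initiates the snapshot is correctly captured as the
state of channel Q→P:
C
#include <stdio.h>

#define MARKER -1   /* reserved value standing in for the marker */
#define QCAP   32

typedef struct { int buf[QCAP]; int head, tail; } Channel;
static void ch_send(Channel *c, int m) { c->buf[c->tail++] = m; }
static int  ch_recv(Channel *c)        { return c->buf[c->head++]; }
static int  ch_empty(const Channel *c) { return c->head == c->tail; }

typedef struct {
    const char *name;
    int state;           /* a counter standing in for real local state  */
    int recorded;        /* has this process recorded its state?        */
    int saved_state;     /* the recorded local state                    */
    int logging;         /* recording messages on the incoming channel? */
    int log[QCAP], logn; /* recorded state of the incoming channel      */
} Process;

/* Marker sending rule: record local state, then send the marker on the
   outgoing channel before any other message. The initiator also starts
   logging its incoming channel. */
static void record_state(Process *p, Channel *out, int log_incoming) {
    p->saved_state = p->state;
    p->recorded = 1;
    p->logging = log_incoming;
    ch_send(out, MARKER);
}

/* Marker receiving rule for a message arriving on the incoming channel. */
static void on_receive(Process *p, Channel *out, int msg) {
    if (msg == MARKER) {
        if (!p->recorded)
            record_state(p, out, 0); /* first marker: channel recorded empty  */
        else
            p->logging = 0;          /* channel state = messages logged so far */
    } else {
        p->state += msg;             /* ordinary computation message */
        if (p->logging) p->log[p->logn++] = msg;
    }
}

int main(void) {
    Channel pq = {{0}}, qp = {{0}};  /* FIFO channels P->Q and Q->P */
    Process P = {"P"}, Q = {"Q"};

    ch_send(&qp, 7);                 /* message already in transit Q->P    */
    record_state(&P, &pq, 1);        /* P initiates the snapshot           */
    ch_send(&pq, 3);                 /* P keeps computing after the marker */

    while (!ch_empty(&pq)) on_receive(&Q, &qp, ch_recv(&pq));
    while (!ch_empty(&qp)) on_receive(&P, &pq, ch_recv(&qp));

    printf("%s: saved state=%d, channel Q->P log has %d message(s)\n",
           P.name, P.saved_state, P.logn);
    printf("%s: saved state=%d, channel P->Q recorded empty\n",
           Q.name, Q.saved_state);
    return 0;
}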
Need for taking a snapshot or recording the global state of the system:
 Checkpointing: it helps in creating a checkpoint; if the application fails, the checkpoint can be
reused.
 Garbage collection: it can be used to remove objects that have no remaining references.
 It can be used in deadlock and termination detection.
 It is also helpful in debugging.
*****
OPENMP | INTRODUCTION WITH INSTALLATION GUIDE
After a long thirst for parallelizing highly regular loops in matrix-oriented numerical programming,
OpenMP was introduced by the OpenMP Architecture Review Board (ARB) in 1997. In subsequent
releases, the OpenMP team added many features, including task parallelism, support for
accelerators, user-defined reductions, and more. The latest major release, OpenMP 5.0, was made
in November 2018.
Open Multi-Processing (OpenMP) is a technique for parallelizing sections of C/C++/Fortran code.
OpenMP can also be seen as an extension to the C/C++/Fortran languages, adding parallelizing
features to them. In general, OpenMP uses a portable, scalable model that gives programmers a
simple and flexible interface for developing parallel applications, for platforms that range from a
normal desktop computer to high-end supercomputers.
THREAD Vs PROCESS
A process is created by the OS to execute a program with given resources (memory, registers);
generally, different processes do not share their memory with one another. A thread is a subset of a
process, and it shares the resources of its parent process but has its own stack to keep track of
function calls. Multiple threads of a process have access to the same memory.
Parallel Memory Architectures
Before getting deep into OpenMP, let's review the basic parallel memory architectures. These are
divided into three categories:
 Shared memory: OpenMP comes under the shared memory concept. Here, different CPUs
(processors) have access to the same memory location. Since all CPUs connect to the same
memory, memory access must be handled carefully.
 Distributed memory: here, each CPU (processor) has its own memory to access and use. To make
the CPUs communicate, the independent systems are connected together using a network. MPI is
based on this distributed architecture.
 Hybrid: a combination of both shared and distributed architectures.
A simple scenario to showcase the power of OpenMP is comparing the execution time of a normal
C/C++ program with that of its OpenMP version.
Steps for Installation of OpenMP
 STEP 1: Check the GCC version of the compiler
gcc --version
GCC provides support for OpenMP starting from version 4.2.0. So if the system has a GCC
compiler with version 4.2.0 or higher, OpenMP features will be available.
If the system doesn't have the GCC compiler, we can install it with the following command:
sudo apt install gcc
 STEP 2: Configuring OpenMP. We can check whether OpenMP features are configured into our
compiler or not using the command
echo |cpp -fopenmp -dM |grep -i open
If OpenMP is not featured in the compiler, we can install it using the command
sudo apt install libomp-dev
 STEP 3: Setting the number of threads. In OpenMP, before running the code, we can set the
number of threads to be used with the following command. Here, we set the number of threads
to 8.
export OMP_NUM_THREADS=8
Running First Code in OpenMP
C
// OpenMP header
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    int nthreads, tid;

    // Begin of parallel region
    #pragma omp parallel private(nthreads, tid)
    {
        // Getting thread number
        tid = omp_get_thread_num();
        printf("Welcome to GFG from thread = %d\n", tid);
        if (tid == 0) {
            // Only master thread does this
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    }
    return 0;
}

Output:
This program prints a message from each of the threads that execute the parallel region.
Compile:
gcc -o gfg -fopenmp geeksforgeeks.c

Execute:
./gfg
*****
GETTING STARTED WITH MEMORY PROGRAMMING


To get started with memory programming, understand the basic concepts of memory organization
(stack, heap, static), explore memory management techniques (allocation/deallocation), and
practice with languages like C/C++ that require manual memory management.
Here's a more detailed breakdown:
1. Understanding Memory Organization:
 Stack:
Used for storing local variables and function call information. It's a last-in, first-out (LIFO)
structure.
 Heap:
Dynamically allocated memory, where you can request memory during program execution and
free it when no longer needed.
 Static:
Memory allocated for global variables and static local variables, which persist throughout the
program's execution.
2. Memory Management Techniques:
 Allocation:
Requesting memory from the heap using functions like malloc() (in C) or new (in C++).
 Deallocation:
Releasing memory back to the heap, preventing memory leaks, using functions like free() (in C)
or delete (in C++).
 Memory Leaks:
Occur when you allocate memory but forget to deallocate it, leading to a gradual reduction in
available memory.
 Dangling Pointers:
Occur when you try to access memory that has already been deallocated, leading to unpredictable
program behavior.
3. Languages for Memory Programming:
 C/C++:
These languages offer fine-grained control over memory management, requiring developers to
allocate and deallocate memory manually.
 Other Languages:
Languages like Java, Python, and Rust have garbage collectors or other mechanisms that
automate memory management, making them safer but less flexible.
4. Practical Steps for Beginners:
 Start with the basics: Learn about data types, variables, and pointers in C/C++.
 Practice memory allocation and deallocation: Write programs that allocate memory, store data, and
then deallocate it.
 Debug memory issues: Use debugging tools to identify memory leaks and dangling pointers.
 Explore memory-safe languages: Once you understand the fundamentals, you can explore
languages like Rust, which prioritize memory safety.
 Read documentation and online resources: There are many tutorials and articles available online
that can help you learn about memory programming.
*****
FUNDAMENTALS OF SHARED MEMORY PROGRAMMING
In distributed systems, shared memory programming enables multiple processes, potentially
running on different machines, to access and modify the same memory space, facilitating fast
communication and data sharing, but requires careful synchronization to avoid inconsistencies.
Key Concepts:
 Distributed Shared Memory (DSM):
A memory architecture where physically separated memories can be addressed as a single shared
address space, allowing processes on different nodes to access the same data.
 Shared Address Space:
Processes can directly access data in shared memory without explicit communication, leading to
faster data transfer and reduced overhead.
 Synchronization:
Mechanisms like mutexes, semaphores, and atomic operations are crucial to ensure that multiple
processes access and modify shared memory concurrently without causing data corruption or race
conditions (a small sketch appears after this list).
 Memory Consistency:
Ensuring that all processes see a consistent view of the shared memory, even when multiple
processes are updating it concurrently. This is achieved through different consistency models like
sequential consistency, causal consistency, and others.
 Challenges:
 Latency: Accessing memory on a remote node can be slower than accessing local memory,
impacting performance.
 Scalability: Ensuring that the shared memory system can handle a large number of processes and
nodes without performance degradation.
 Fault Tolerance: Designing the system to continue operating even if some nodes or processes
fail.
 Advantages:
 Fast Communication: Direct memory access between processes is significantly faster than
message passing.
 Efficient Data Sharing: Processes can easily share data without the overhead of copying data
between processes.
 Simplified Programming: Shared memory programming can simplify the development of
distributed applications compared to message-passing paradigms.
 Examples:
 Distributed Databases: Databases that allow multiple clients to access and modify the same data
concurrently.
 Parallel Processing: Applications that utilize multiple processors to perform computations in
parallel.
 Real-time Systems: Systems that require fast and reliable communication between processes.
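To make the synchronization point concrete, here is a hedged single-machine C sketch using POSIX
threads (a stand-in for true distributed shared memory, which requires DSM middleware): two
threads update one shared counter, and a mutex prevents a race condition. Compile with -pthread.
C
#include <pthread.h>
#include <stdio.h>

static long counter = 0; /* shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* enter critical section        */
        counter++;                   /* safe: one thread at a time    */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter); /* always 200000 */
    return 0;
}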
*****
BASIC OPENMP CONCEPTS
OpenMP is an Application Program Interface (API) that may be used to explicitly direct multi-
threaded, shared memory parallelism in C/C++ programs. It is not intrusive on the original serial
code in that the OpenMP instructions are made in pragmas interpreted by the compiler.
OpenMP uses the fork-join model of parallel execution. All OpenMP programs begin with a single
master thread which executes sequentially until a parallel region is encountered, when it creates a
team of parallel threads (FORK). When the team threads complete the parallel region, they
synchronize and terminate, leaving only the master thread that executes sequentially (JOIN).
Hello World Example

Here is a basic example showing how to parallelize a hello world program. First, the serial version:
#include <stdio.h>
int main() {
printf( "Hello, World from just me!\n" );
return 0;
}
To do this in parallel (have a series of threads print out a “Hello World!” statement), we would do
the following:

#include <stdio.h>
#include <omp.h>
int main() {
int thread_id;
#pragma omp parallel private(thread_id)
{
thread_id = omp_get_thread_num();
printf( "Hello, World from thread %d!\n", thread_id );
}
return 0;
}
Running an OpenMP program

To compile and run the above omphello.c program:


wget https://carleton.ca/rcs/wp-content/uploads/omphello.c
gcc -o omphello omphello.c -fopenmp
export OMP_NUM_THREADS=4
./omphello
OpenMP General Code Structure

The snippet below shows the general structure of a C/C++ program using OpenMP.
#include <omp.h>

main() {
    int var1, var2, var3;

    // Serial code
    ...

    // Beginning of parallel section
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads */
        ...
        /* All threads join master thread and disband */
    }

    // Resume serial code
    ...

    return 0;
}
When looking at this example you should notice a few things. First, we need to include the
OpenMP header (omp.h). Second, we notice a few variables that are declared outside of the parallel
region of the code. If these variables are used within the parallel region, we need to know whether
they are shared or private. A variable being private means that every thread has its own copy of the
variable, and changes to that variable by one thread are not seen by other threads. A variable defined
within the parallel region is private. On the other hand, a shared variable is one that is shared among
all of the threads, and any change made by one thread is seen by all of the threads. Any read-only
variables can safely be shared. Caution must be taken when having multiple threads read and write
the same variable; ensuring that this is done in the proper order avoids what are called "race
conditions".
Parallel For Loops

OpenMP can be used to easily parallelize for loops. This can only be done when the loop iterations
are independent (i.e., one iteration of the loop does not depend on the result of previous iterations).
Here is an example of a serial for loop:
for (i = 0; i < 25; i++) {
    printf("Foo");
}
The parallel version of this loop is:
#pragma omp parallel for
for (i = 0; i < 25; i++) {
    printf("Foo");
}
*****
OPENMP DIRECTIVES

In the previous sections, examples of OpenMP directives have been given. The general format of
these directives is:
#pragma omp directive-name [clause, ...] newline
The scope of a directive is the block of statements surrounded by { }. A variety of clauses are
available, including:

 if (expression): only in parallel if expression evaluates to true
 private(list): variables private and local to a thread
 shared(list): data accessed by all threads
 default (none|shared)
 reduction (operator: list)
The reduction clause is used when the result of a parallel region is a single value. For example,
imagine we have an array of integers we would like the sum of. We can do this in parallel as
follows:
int sum = 0;
#pragma omp parallel for default(none) shared(n, x) reduction(+ : sum)
for (i = 0; i < n; i++)
    sum = sum + x[i];
Since sum is a shared variable, we must be careful to avoid race conditions around it. Using a
reduction clause ensures that the generated code avoids such situations: each thread accumulates
into a private copy of sum, and the private copies are combined at the end.
*****
PARALLEL DIRECTIVES, DATA SCOPING RULES
In the context of parallel programming with directives like OpenMP, data scoping rules determine
how variables are shared or kept private among threads within a parallel region, ensuring correct
and efficient execution.
Here's a breakdown of key concepts:
1. Parallel Regions and Threads:
 A "parallel region" is a section of code that's designed to be executed by multiple threads
concurrently.
 Each thread within a parallel region has its own execution context, and data scoping rules dictate
how variables are accessed and modified by these threads.
2. Data Scoping Attributes (OpenMP):
 private:
Each thread gets its own, independent copy of the variable, ensuring no interference between
threads.
 shared:
All threads share the same variable, allowing for data exchange and synchronization.
 firstprivate:
Each thread gets its own copy, initialized with the value of the variable before the parallel
region.
 lastprivate:
Each thread gets its own copy, and after the parallel region the variable takes the value from the
sequentially last loop iteration (or the last section).
 reduction:
A special clause that allows for efficient accumulation of data across threads, such as summing
values or finding maximums.
 copyin:
Copies the value of a variable from the master thread to the private copies of the threads before
the parallel region.
3. Scoping Rules and Compiler Behavior:
 Automatic Scoping:
Some compilers can automatically determine the scope of variables based on how they are used
within the parallel region.
 Explicit Scoping:
You can use data scoping attributes in OpenMP directives to explicitly control the scope of
variables.
 Data Races:
Incorrect scoping can lead to data races, where multiple threads access or modify the same
variable without proper synchronization, resulting in unpredictable behavior.
 Synchronization:
When using shared variables, you need to ensure that access is synchronized to prevent data
races, often using mechanisms like mutexes or atomic operations.
 Lexical/Static Extent:
Data scope attribute clauses are effective only within their lexical/static extent.
 Orphaned Directives:
An OpenMP directive that appears independently from another enclosing directive is said to be
an orphaned directive.
 Directive Binding and Nesting Rules:
OpenMP specifies a number of scoping rules on how directives may associate (bind) and nest
within each other.
4. Example (OpenMP):
C++
#include <iostream>
#include <omp.h>

int main() {
    int x = 0; // Shared variable

    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        int seen;
        #pragma omp critical
        {
            x++;      // Increment x (needs synchronization)
            seen = x; // read inside the critical section to avoid a race
        }
        std::cout << "Thread " << thread_id << " x: " << seen << std::endl;
    }

    std::cout << "Final x: " << x << std::endl;
    return 0;
}
In this example, x is a shared variable, and the #pragma omp critical block ensures that only one
thread can increment (and read) x at a time, preventing data races.

*****

OPENMP CONSTRUCTS
OpenMP provides several constructs to enable parallel execution, including parallel, for, sections,
and single, which allow developers to divide code into parallel regions, loops, sections, and single-
thread execution, respectively.
Here's a breakdown of the basic OpenMP constructs:
1. parallel Construct:
 Purpose: Defines a parallel region where multiple threads can execute the code concurrently.
 Syntax: #pragma omp parallel { /* code to be executed in parallel */ }
 Example:
C++
#include <iostream>
#include <omp.h>

int main() {
#pragma omp parallel
{
std::cout << "Hello from thread " << omp_get_thread_num() << std::endl;
}
return 0;
}
According to the Cornell Virtual Workshop, this code will print "Hello from thread..." multiple
times, each from a different thread.
2. for Construct:
 Purpose: Divides a loop's iterations among threads, enabling parallel execution of the loop body.
 Syntax: #pragma omp parallel for followed by the for loop (or #pragma omp for inside an
existing parallel region)
 Example:
C++
#include <iostream>
#include <omp.h>

int main() {
#pragma omp parallel
{
#pragma omp for
for (int i = 0; i < 10; ++i) {
std::cout << "Thread " << omp_get_thread_num() << " processing iteration " << i <<
std::endl;
}
}
return 0;
}
According to the Cornell Virtual Workshop, this code will divide the loop iterations among the
threads, with each thread processing a different set of iterations.
3. sections Construct:
 Purpose: Divides a code block into sections, where each section is executed by a different thread.
 Syntax: #pragma omp parallel { #pragma omp sections { #pragma omp section { /* section 1 */ }
#pragma omp section { /* section 2 */ } } }
 Example:
C++
#include <iostream>
#include <omp.h>

int main() {
#pragma omp parallel
{
#pragma omp sections
{
#pragma omp section
{
std::cout << "Section 1 executed by thread " << omp_get_thread_num() << std::endl;
}
#pragma omp section
{
std::cout << "Section 2 executed by thread " << omp_get_thread_num() << std::endl;
}
}
}
return 0;
}
According to the Cornell Virtual Workshop, this code will execute section 1 and section 2
concurrently, each by a different thread.
4. single Construct:
 Purpose: Ensures that a code block is executed by only one thread in the team.
 Syntax: #pragma omp parallel { #pragma omp single { /* code to be executed by a single thread */
}}
 Example:
C++
#include <iostream>
#include <omp.h>

int main() {
#pragma omp parallel
{
#pragma omp single
{
std::cout << "This code is executed by only one thread: " << omp_get_thread_num() <<
std::endl;
}
}
return 0;
}

*****
OPENMP DIRECTIVES
For parallel work-sharing:
Directive Description
parallel Defines a parallel region, which is code that will be executed by multiple threads in
parallel.
for Causes the work done in a for loop inside a parallel region to be divided among
threads.
sections Identifies code sections to be divided among all threads.
single Lets you specify that a section of code should be executed on a single thread, not
necessarily the main thread.

For main thread and synchronization:


Directive Description
master Specifies that only the main thread should execute a section of the program.
critical Specifies that code is only executed on one thread at a time.
barrier Synchronizes all threads in a team; all threads pause at the barrier, until all threads
execute the barrier.
atomic Specifies a memory location that will be updated atomically.
flush Specifies that all threads have the same view of memory for all shared objects.
ordered Specifies that code under a parallelized for loop should be executed like a
sequential loop.
For data environment:
Directive Description
threadprivate Specifies that a variable is private to a thread.
atomic
Specifies a memory location that will be updated atomically.
C++
#pragma omp atomic
expression
Parameters
expression
The statement that has the lvalue, whose memory location you want to protect against more than
one write.
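For instance, a hedged minimal use (the counter and loop bounds are illustrative): the atomic
directive protects a single update without the cost of a full critical section.
C
#include <omp.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        #pragma omp atomic
        count++; /* atomic read-modify-write on the shared counter */
    }
    printf("count = %d\n", count); /* always 1000 */
    return 0;
}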
barrier
Synchronizes all threads in a team; all threads pause at the barrier, until all threads execute the
barrier.
C++
#pragma omp barrier
critical
Specifies that code is only executed on one thread at a time.
C++
#pragma omp critical [(name)]
{
code_block
}
Parameters
name
(Optional) A name to identify the critical code. The name must be enclosed in parentheses.
flush
Specifies that all threads have the same view of memory for all shared objects.
C++
#pragma omp flush [(var)]
Parameters
var
(Optional) A comma-separated list of variables that represent objects you want to synchronize.
If var isn't specified, all memory is flushed.
for
Causes the work done in a for loop inside a parallel region to be divided among threads.
C++
#pragma omp [parallel] for [clauses]
for_statement
Parameters
clauses
(Optional) Zero or more clauses, see the Remarks section.
for_statement
A for loop. Undefined behavior will result if user code in the for loop changes the index variable.
master
Specifies that only the main thread should execute a section of the program.
C++
#pragma omp master
{
code_block
}
ordered
Specifies that code under a parallelized for loop should be executed like a sequential loop.
C++
#pragma omp ordered
structured-block
parallel
Defines a parallel region, which is code that will be executed by multiple threads in parallel.
C++
#pragma omp parallel [clauses]
{
code_block
}
Parameters
clauses
(Optional) Zero or more clauses, see the Remarks section.
sections
Identifies code sections to be divided among all threads.
C++
#pragma omp [parallel] sections [clauses]
{
#pragma omp section
{
code_block
}
}
Parameters
clauses
single
Lets you specify that a section of code should be executed on a single thread, not necessarily the
main thread.
C++
#pragma omp single [clauses]
{
code_block
}
Parameters
clauses
(Optional) Zero or more clauses, see the Remarks section.
threadprivate
Specifies that a variable is private to a thread.
C++
#pragma omp threadprivate(var)
Parameters
var
A comma-separated list of variables that you want to make private to a thread. var must be either a
global- or namespace-scoped variable or a local static variable.
OPENMP CALLS
OpenMP is an API that allows developers to write applications that effectively use multiple
processors by adding compiler directives and library routines to C, C++, and Fortran code, enabling
shared-memory parallelism.
Here's a more detailed explanation:
 What it is:
OpenMP (Open Multi-Processing) is a standardized API for shared-memory parallel
programming in C, C++, and Fortran.
 How it works:
It uses compiler directives (like #pragma omp parallel) and runtime library routines to specify
parallel regions of code, allowing multiple threads to execute concurrently.
 Key Concepts:
 Parallel Regions: Sections of code that are executed by multiple threads in parallel.
 Work-Sharing Constructs: Mechanisms to divide work among threads, such as #pragma omp
for for parallelizing loops.
 Threads: OpenMP creates and manages threads to execute parallel regions.
 Directives: Compiler directives (e.g., #pragma omp parallel) that instruct the compiler to
parallelize code.
 Runtime Library: Provides functions for managing threads, synchronization, and other OpenMP-
related tasks.
 Benefits:
 Portability: OpenMP is a widely supported standard, making it easier to write parallel code that
can run on different platforms.
 Ease of Use: It simplifies the process of writing parallel applications compared to lower-level
threading APIs.
 Performance: Enables applications to leverage multiple cores and processors, leading to
significant performance gains.
 Example:
C++
#include <iostream>
#include <omp.h> // Include OpenMP header

int main() {
#pragma omp parallel // Parallel region
{
int thread_id = omp_get_thread_num(); // Get thread ID
std::cout << "Hello from thread " << thread_id << std::endl;
}
return 0;
}
*****
PARALLELIZING AN EXISTING CODE WITH OPENMP

Specifying the parallel region (creating threads)

The basic directive is:


#pragma omp parallel
{
}
This is used to fork additional threads to carry out the work enclosed in the block following the
#pragma construct. The block is executed by all threads in parallel. The original thread will be
denoted as master thread with thread-id 0.

Example (C program): Display "Hello, world." using multiple threads.

#include <stdio.h>

int main(void)
{
#pragma omp parallel
{
printf("Hello, world.\n");
}

return 0;
}
Use flag -fopenmp to compile using gcc:
$ gcc -fopenmp hello.c -o hello
Output on a computer with two cores, and thus two threads:
Hello, world.
Hello, world.
On dover, I got 24 hellos, for 24 threads. On my desktop I get (only) 8. How many do you get?

Note that the threads are all writing to the standard output, and there is a race to share it. The way
the threads are interleaved is completely arbitrary, and you can get garbled output:

Hello, wHello, woorld.
rld.

Private and shared variables

In a parallel section variables can be private (each thread owns a copy of the variable) or shared
among all threads. Shared variables must be used with care because they cause race conditions.

 shared: the data within a parallel region is shared, which means visible and accessible by all threads
simultaneously. By default, all variables in the work sharing region are shared except the loop
iteration counter.
 private: the data within a parallel region is private to each thread, which means each thread will
have a local copy and use it as a temporary variable. A private variable is not initialized and the
value is not maintained for use outside the parallel region. By default, the loop iteration counters in
the OpenMP loop constructs are private.
The sharing attribute of each variable is specified in a clause following #pragma omp.

Example:

int main(int argc, char *argv[]) {
    int th_id, nthreads;

    /* th_id is declared above; it is specified as private, so each
       thread will have its own copy of th_id */
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", th_id);
    }
    return 0;
}
Sharing variables is sometimes what you want; other times it's not, and it can lead to race conditions.
Put differently, some variables need to be shared, some need to be private, and you, the programmer,
have to specify what you want.
*****

MESSAGE PASSING INTERFACE


MPI (Message Passing Interface) is a standard for writing parallel programs on distributed memory
systems, where processes communicate by exchanging messages, and it is a widely used standard in
high-performance computing.
What is MPI?
 Standard Interface:
MPI defines a standardized application programming interface (API) for writing parallel
programs, allowing developers to create portable and scalable applications.
 Distributed Memory:
MPI is designed for distributed memory architectures, where each processor has its own memory
space, and communication happens through message passing.
 Message Passing:
Processes communicate by explicitly sending and receiving messages, allowing them to exchange
data and synchronize their actions.
 Not a Programming Language:
MPI is not a programming language itself, but rather a library of functions that programmers can
call from languages like C, C++, or Fortran to write parallel programs.
 Widely Used:
MPI is a dominant model used in high-performance computing and is a de facto standard for
communication among processes in distributed memory systems.
Key Concepts of MPI:
 Processes:
MPI programs consist of multiple processes, each running on a different processor or core.
 Communication:
Processes communicate by sending and receiving messages, which can contain data or
instructions.
 Synchronization:
MPI provides mechanisms for processes to synchronize their actions, ensuring that they cooperate
correctly.
 Point-to-Point Communication:
Messages can be sent between specific pairs of processes.
 Collective Communication:
MPI also supports collective operations, where a group of processes perform a task together, like
broadcasting or gathering data.
 MPI Communicators:
MPI uses communicators to group processes together, allowing for different communication
patterns.
 MPI Ranks:
Each process has a unique rank within a communicator, which is used to identify it.
 Memory Management:
In MPI, each process has its own local memory, and data must be explicitly shared by passing
messages between processes.
Benefits of Using MPI:
 Portability:
MPI programs can be run on various distributed memory architectures without significant
changes.
 Scalability:
MPI allows programs to scale to large numbers of processors and cores.
 Performance:
MPI provides efficient mechanisms for communication and synchronization, leading to high
performance in parallel applications.
 Flexibility:
MPI supports various communication patterns, allowing developers to optimize their programs
for different architectures and workloads.
*****
MESSAGE PASSING MODELS
Communication Protocols for Message Passing in Distributed Systems
Communication protocols for message passing are crucial in distributed systems to guarantee the
safe, effective, and dependable transfer of data between nodes or processes over a network. These
protocols specify the formats, guidelines, and practices that control the construction, transmission,
reception, and interpretation of messages. Typical protocols for communication in distributed
systems include:
 Transmission Control Protocol (TCP):
o Data packets are reliably and systematically delivered between network nodes via TCP, which is a
connection-oriented protocol.
o It ensures that data sent from one endpoint (sender) reaches the intended endpoint (receiver)
without errors and in the correct order.
o Suitable for applications where data integrity and reliability are paramount, such as file transfers,
web browsing, and database communication.
 User Datagram Protocol (UDP):
o UDP is a connectionless protocol that provides fast, lightweight communication by transmitting
data packets without establishing a connection or ensuring reliability.
o Used in real-time applications where low latency and speed are critical, such as streaming media,
online gaming, and VoIP (Voice over IP).
 Message Queuing Telemetry Transport (MQTT):
o MQTT is a lightweight publish-subscribe messaging protocol designed for IoT (Internet of Things)
and M2M (Machine-to-Machine) communication.
o It enables effective data transfer between devices with limited computing power and bandwidth.
o Ideal for IoT applications, smart home automation, remote monitoring, and telemetry data
collection.
 Hypertext Transfer Protocol (HTTP):
o HTTP is a protocol used for transmitting hypermedia documents, such as HTML pages and
multimedia content, over the World Wide Web.
o While not typically considered a message passing protocol in the traditional sense, it’s essential for
web-based communication and distributed applications.
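As a hedged sketch of connectionless messaging (the address, port, and payload are illustrative), the
following C program sends a single UDP datagram to a hypothetical local receiver using BSD
sockets; note that there is no connection setup and no delivery guarantee:
C
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0); /* UDP socket */
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in dest = {0};
    dest.sin_family = AF_INET;
    dest.sin_port   = htons(9000);             /* illustrative port */
    inet_pton(AF_INET, "127.0.0.1", &dest.sin_addr);

    const char *msg = "order #17 placed";
    /* one self-contained datagram: fast, but may be lost or reordered */
    sendto(sock, msg, strlen(msg), 0,
           (struct sockaddr *)&dest, sizeof dest);

    close(sock);
    return 0;
}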
Challenges for Message Passing In Distributed Systems
Below are some challenges for message passing in distributed systems:
 Scalability: Balancing the system’s expanding size and message volume while preserving
responsiveness, performance, and effective resource use.
 Fault Tolerance: Ensuring system resilience against node failures, network partitions, and message
loss through redundancy, replication, error handling mechanisms, and recovery strategies.
 Security: protecting messages’ confidentiality, integrity, and authenticity while guarding against
illegal access, interception, and manipulation.
 Message Ordering: Ensuring that messages arrive in the same order they were sent, especially in
systems where order affects the outcome.
Message Passing in Distributed Systems Examples
Example 1:
Scenario:
Let's take the example of an e-commerce platform where users place orders. When an order is
placed, the order system sends a message to the payment processing service to handle payment, and
then continues to process other orders without waiting for an immediate response.
Here, the order system doesn’t wait for confirmation that the payment was completed, allowing it to
keep processing orders efficiently. This asynchronous message passing helps the platform handle a
large volume of orders smoothly.
Example 2:
Scenario:
Let's take the example of a stock trading app. When a user wants to buy or sell shares, the app sends
a request to the trading server. The server checks whether there are enough shares, processes the
transaction, and confirms to the user before the app allows further actions.
This synchronous message passing means the app waits for the server’s response to ensure the
transaction is complete. This is important for accuracy in trading, where users need immediate
confirmation of each transaction.
Message Passing System vs. Shared Memory System in Distributed Systems
In distributed systems, communication between processes or nodes can be achieved through either
message passing or shared memory systems. Below are some important differences between these
two approaches:
Aspect | Message Passing System | Shared Memory System
Communication Model | Asynchronous or synchronous communication between processes. | Processes access and modify shared memory locations.
Data Exchange | Messages are copied from sender to receiver. | Processes share data directly through shared memory.
Decoupling | Processes are loosely coupled and do not access each other's memory directly. | Processes interact by reading and writing to shared memory.
Scalability | Scales well, as communication is based on network protocols. | More challenging to scale across distributed nodes.
Performance | Introduces serialization/deserialization and network overhead. | Generally faster due to direct memory access.
Isolation | Processes are isolated, enhancing fault tolerance and security. | Requires synchronization mechanisms for data consistency.
Complexity | Management of message queues, synchronization, and delivery can be complex. | Simpler programming model, but needs careful synchronization.
Use Cases | Distributed systems, cloud computing, IoT where nodes are geographically dispersed. | Multiprocessor systems, tightly coupled clusters.