Distributed Systems: Software Concepts
(Figure: taxonomy of DS software. Multiprocessor systems: OS for each CPU, asymmetric, symmetric time-sharing. Multicomputer systems: distributed OS, network OS.)
Distributed Systems: Software
System | Software | Hardware | Main goal
Network operating system | loosely coupled | loosely coupled | offer local services to remote clients
Middleware | additional layer on top of NOS | - | provide distribution transparency
Distributed operating system | tightly coupled | loosely coupled | provide users with transparency
Multiprocessor time-sharing systems | tightly coupled | tightly coupled | provide the feeling of working with a single processor
Multiprocessor OS
OS for each CPU
Each CPU Has Its Own Operating System:
The simplest possible way to organize a
multiprocessor operating system is to:
divide memory into as many partitions as there are CPUs
give each CPU its own
▪ private memory
▪ private copy of the operating system.
The n CPUs then operate as n independent computers (a minimal sketch follows).
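A minimal sketch of this static partitioning, assuming hypothetical values for NUM_CPUS and MEM_SIZE:

```c
/* Hypothetical sketch: carving physical memory into one private
 * partition per CPU, as in the "one OS per CPU" design.
 * NUM_CPUS, MEM_SIZE, and the struct are illustrative assumptions. */
#include <stdio.h>

#define NUM_CPUS  4
#define MEM_SIZE  (1UL << 30)           /* 1 GiB of physical memory (assumed) */

struct partition {
    unsigned long base;                 /* start of this CPU's private region */
    unsigned long size;                 /* bytes for its OS copy and data     */
};

int main(void) {
    struct partition part[NUM_CPUS];
    unsigned long slice = MEM_SIZE / NUM_CPUS;

    for (int cpu = 0; cpu < NUM_CPUS; cpu++) {
        part[cpu].base = (unsigned long)cpu * slice;
        part[cpu].size = slice;
        printf("CPU %d: private memory [%#lx, %#lx)\n",
               cpu, part[cpu].base, part[cpu].base + part[cpu].size);
    }
    return 0;
}
```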
OS for each CPU
Asymmetric Multiprocessors
Also called master-slave multiprocessing.
One copy of the operating system and its tables resides on one CPU (the master) and not on any of the others.
All system calls are redirected to the master CPU for processing there (sketched below).
The master CPU may also run user processes if there is CPU time left over.
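A toy illustration of the redirection idea; the request structure and handler names are assumptions, not a real kernel interface:

```c
/* Minimal sketch (not a real kernel): in master-slave (AMP) mode every
 * system call made on a slave CPU is forwarded to the master CPU. */
#include <stdio.h>

#define MASTER_CPU 0

struct syscall_req {
    int cpu;        /* CPU the call originated on */
    int number;     /* system-call number         */
    long arg;       /* single argument, for illustration */
};

/* Only the master actually executes operating-system code. */
static long master_handle(struct syscall_req *r) {
    printf("master: handling syscall %d from CPU %d\n", r->number, r->cpu);
    return 0;
}

static long do_syscall(int cpu, int number, long arg) {
    struct syscall_req r = { cpu, number, arg };
    if (cpu != MASTER_CPU)
        /* Slave CPUs never run the OS: redirect the request to the master. */
        printf("CPU %d: redirecting syscall %d to master\n", cpu, number);
    return master_handle(&r);
}

int main(void) {
    do_syscall(2, 4, 42);           /* issued on a slave CPU          */
    do_syscall(MASTER_CPU, 4, 42);  /* the master can serve itself too */
    return 0;
}
```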
Asymmetric Multiprocessors
Asymmetric Multiprocessors
Advantages of AMP:
Simplified control
Reduced resource conflicts
Specialized processors
Disadvantages of AMP:
Single point of failure
Limited scalability
Less efficient resource utilization
Symmetric Multiprocessors
In SMP, two or more identical processors:
are connected to a single, shared main memory
have full access to all input/output devices
are controlled by a single OS instance that treats all processors equally.
There is one copy of the operating system in memory, but any CPU can run it.
When a system call is made, the CPU on which it was made traps to the kernel and processes the system call.
A mutex (i.e., a lock) is associated with the OS.
When a CPU wants to run operating system code, it must first acquire the mutex (see the sketch below).
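A small sketch of the single-mutex idea, using POSIX threads to stand in for CPUs; all names are illustrative assumptions:

```c
/* Sketch of the "one mutex around the whole OS" idea in SMP: whichever
 * CPU traps into the kernel must grab a single global lock first. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static pthread_mutex_t kernel_lock = PTHREAD_MUTEX_INITIALIZER;

static void run_os_code(long cpu) {
    pthread_mutex_lock(&kernel_lock);     /* only one CPU in the kernel at a time */
    printf("CPU %ld executing kernel code\n", cpu);
    pthread_mutex_unlock(&kernel_lock);   /* let another CPU enter the kernel     */
}

static void *cpu_thread(void *arg) {
    run_os_code((long)(intptr_t)arg);     /* simulate a system call trapping in   */
    return NULL;
}

int main(void) {
    pthread_t cpus[4];
    for (long i = 0; i < 4; i++)
        pthread_create(&cpus[i], NULL, cpu_thread, (void *)(intptr_t)i);
    for (int i = 0; i < 4; i++)
        pthread_join(cpus[i], NULL);
    return 0;
}
```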
Symmetric Multiprocessors
Symmetric Multiprocessors
Advantages of SMP:
Better resource utilization
Scalability
Fault tolerance
Disadvantages of SMP:
Complex scheduling
Resource contention
Increased hardware costs
SMP vs AMP
Parameter | Symmetric Multiprocessing | Asymmetric Multiprocessing
Processor types | Identical | Not identical
Tasks of the OS | Performed by any CPU | Performed by the master processor
Communication | Via shared memory | The master manages all the slave processors
Task assignment | Via a prepared ready-queue list | Via the master processor
Architecture | Very complex, due to the need for CPU synchronization | Very simple and straightforward
Expense | Expensive, due to the complicated architectural design | Comparatively cheaper, due to the simpler architectural design
Multiprocessor Scheduling
On a uniprocessor, scheduling is one-dimensional. The only question that must be answered (repeatedly) is: "Which process should be run next?"
On a multiprocessor, scheduling is two-dimensional:
The scheduler has to decide which process to run and which CPU to run it on.
This extra dimension greatly complicates scheduling on multiprocessors.
Another complicating factor is that in some systems all the processes are unrelated, whereas in others they come in groups.
An example of the former situation is a timesharing system in which independent users start up independent processes.
These processes are unrelated, and each one can be scheduled without regard to the others.
Timesharing
The simplest scheduling algorithm for dealing with unrelated processes (or threads) is to have a single system-wide data structure for ready processes: possibly just a list, but more likely a set of lists for processes at different priorities, as sketched below.
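A hedged sketch of such a system-wide structure: an array of per-priority lists guarded by one lock, shared by every CPU; all identifiers are assumptions for illustration:

```c
#include <pthread.h>
#include <stddef.h>

#define PRIORITIES 8

struct process {
    int pid;
    struct process *next;
};

static struct process *ready[PRIORITIES];          /* one list per priority level */
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called by whichever CPU becomes idle: lock the shared queues and take
 * the highest-priority runnable process (NULL if nothing is ready). */
struct process *pick_next(void) {
    struct process *p = NULL;
    pthread_mutex_lock(&sched_lock);
    for (int prio = 0; prio < PRIORITIES; prio++) {
        if (ready[prio] != NULL) {
            p = ready[prio];
            ready[prio] = p->next;                  /* dequeue it */
            break;
        }
    }
    pthread_mutex_unlock(&sched_lock);
    return p;
}
```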
Timesharing
Here the 16 CPUs are all currently busy, and a prioritized set of 14 processes is waiting to run.
The first CPU to finish its current work (or have its process block) is CPU 4, which then locks the scheduling queues and selects the highest-priority process, A.
Next, CPU 12 goes idle and chooses process B.
As long as the processes are completely unrelated, doing scheduling this way is a reasonable choice.
Multiprocessor Time Sharing
Having a single scheduling data structure used by all CPUs timeshares the CPUs, much as they would be in a uniprocessor system.
It also provides automatic load balancing, because it can never happen that one CPU is idle while others are overloaded.
Two disadvantages of this approach are:
the potential contention for the scheduling data structure as the number of CPUs grows
the usual overhead of a context switch when a process blocks for I/O.
A context switch may also happen when a process' quantum expires.
Challenges and Overheads
Synchronization: spin locks vs. blocking locks; hardware vs. software support (see the sketch below)
Memory access: the number of accesses increases with the number of CPUs; memory speed must scale with the CPU count or become a constraint
Caching and consistency: the memory bottleneck increases the need to cache, and multiple copies in caches mean coherency problems
System bus: traffic increases with the number of CPUs
File system: serializes access if designed conventionally
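A minimal spin-lock sketch built on C11 atomics (illustrative, not a production lock); the contrast with a blocking lock is that the waiting CPU keeps spinning instead of context-switching:

```c
#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void) {
    /* Busy-wait until the flag was previously clear:
     * no context switch, so only suitable for very short critical sections. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;                               /* spinning */
}

static void spin_unlock(void) {
    atomic_flag_clear_explicit(&lock, memory_order_release);
}
```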
Network Operating System
Network Operating System
Each computer has its own operating system with networking facilities.
Computers work independently
(i.e., they may even have different operating systems).
Services are tied to individual nodes (ftp, WWW).
Highly file oriented (basically, processors share only files).
(Figure: machines A, B, and C each run their own kernel with NOS services on top; distributed applications run across them over the network.)
Network Operating System
Intermediate stage between
independent individual workstations and
a distributed SW environment.
Independent machines with significant interaction:
They may run different operating systems.
Servers support interaction of distributed pieces.
Network File System (NFS) is the most obvious and the strongest
combining force.
Extensive communication - location dependent.
Client/server - service model.
File servers - file sharing.
WWW service - newest addition.
NOS for users
Users are aware of multiplicity of machines
Access to resources of various machines is done explicitly
by:
Remote logging into the appropriate remote machine
Transferring data from remote machines to local machines,
via the File Transfer Protocol (FTP) mechanism
Upload, download, access, or share files through cloud storage
Users must change paradigms – establish a session, give
network-based commands, use a web browser
More difficult for users
PC vs. NOS
The NOS enhances the reach of the
client PC by making remote services
available as extensions of the local
operating system.
Although a number of users may have
accounts on a PC, only a single account is
active on the system at any given time.
NOS supports multiple user accounts at
the same time and enables concurrent
access to shared resources by multiple
clients (multitasking and multiuser
environment).
Multiuser, Multitasking, and
Multiprocessor Systems
A NOS server is a multitasking system: its OS is capable of executing multiple tasks at the same time.
Some systems are equipped with more than one processor; these are called multiprocessing systems.
Multiprocessing systems can execute multiple tasks in parallel by assigning each task to a different processor.
The total amount of work that the server
can perform in a given time is greatly
enhanced in multiprocessor systems.
NOS: Where to use?
NOS can be used in:
Routers, switches, and hardware firewalls.
PCs in Peer-to-peer networks
Client-server Architecture
Middleware
Middleware is software that connects software components or people and their applications.
It consists of a set of services that allow multiple processes, running on one or more machines, to interact.
It is interoperable in support of distributed systems.
Middleware sits "in the middle", between application software that may be running on different operating systems.
Middleware
OS on each computer need
not know about the other
computers.
OS on different computers
need not generally be the
same.
Services are generally
(transparently) distributed
across computers.
Distributed Operating System
Multicomputer OS
A multicomputer OS has a totally different structure and complexity from a multiprocessor OS, because there is no physically shared memory in which to store the data structures needed for system-wide resource management.
Users are not aware of multiplicity of machines
Access to remote resources similar to access to local resources
(Figure: machines A, B, and C each run a kernel under a single distributed operating system layer; distributed applications run across them over the network.)
Fully Distributed Systems
The programming and user interface provide a virtual uniprocessor.
Built on top of distributed, heterogeneous machines.
New approaches to, and implementations of, user and system software are generally required.
Long development time.
Execution overhead is the REAL challenge
Message latency and overhead are usually the right
place to start looking
Fully Distributed Systems
Familiar uni-processor challenges are all
present
They become more challenging when generalized
for a truly distributed implementation
Synchronization Squared.
Scheduling multiplied.
Programming model complication.
Shared and Virtual Memory complications.
Fully Distributed Systems
Distribution brings its own complications
Load balancing.
Cache coherence.
Fault tolerance.
Process migration.
Migrations
Data migration – transfer data by transferring the entire file, or only those portions of the file necessary for the immediate task.
Computation migration – transfer the computation, rather than the data, across the system (sketched after this list):
Via remote procedure calls (RPCs)
Via a messaging system
Process migration – execute an entire process, or parts of it, at different sites:
Load balancing – distribute processes across the network to even the workload
Computation speedup – subprocesses can run concurrently on different sites
Hardware preference – process execution may require a specialized processor
Software preference – the required software may be available at only a particular site
Data access – run the process remotely rather than transfer all the data locally
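An illustrative sketch of the computation-migration idea above: ship a small request to the site holding the data instead of pulling the data locally. send_to_site() and recv_from_site() are assumed stand-ins for a real transport, not an actual API:

```c
#include <stdio.h>

struct rpc_request {
    char operation[32];     /* e.g. "count_lines"            */
    char filename[64];      /* the data stays on the remote site */
};

struct rpc_reply {
    long result;
};

/* Stand-ins for a real messaging layer (assumptions). */
static void send_to_site(int site, const struct rpc_request *req) {
    printf("site %d <- run %s on %s\n", site, req->operation, req->filename);
}
static struct rpc_reply recv_from_site(int site) {
    (void)site;
    return (struct rpc_reply){ .result = 12345 };   /* pretend answer */
}

int main(void) {
    struct rpc_request req = { "count_lines", "/data/huge.log" };
    send_to_site(2, &req);                  /* migrate the computation, not the data */
    struct rpc_reply rep = recv_from_site(2);
    printf("result computed remotely: %ld\n", rep.result);
    return 0;
}
```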
Design questions!
Robustness
Can the distributed system withstand failures?
Transparency
Can the distributed system be transparent to the
user both in terms of where files are stored and user
mobility?
Scalability
Can the distributed system be scalable to allow
addition of more computation power, storage, or
users?
Robustness
Hardware failures can include failure of a link,
failure of a site, and loss of a message.
A fault-tolerant system can tolerate a certain
level of failure
Degree of fault tolerance depends on design of
system and the specific fault
The more fault tolerance, the better!
Involves failure detection, reconfiguration,
and recovery
Failure Detection
Detecting hardware failure is difficult
To detect a link failure, a heartbeat protocol can be used (sketched below)
Assume Site A and Site B have established a link
▪ At fixed intervals, each site will exchange an I-am-up message indicating that they are up and running
▪ If Site A does not receive a message within the fixed interval, it assumes either (a) the other site is not up or (b) the
message was lost
▪ Site A can now send an Are-you-up? message to Site B
▪ If Site A does not receive a reply, it can repeat the message or try an alternate route to Site B
▪ If Site A does not ultimately receive a reply from Site B, it concludes some type of failure has occurred
Types of failures:
Site B is down
The direct link between A and B is down
The alternate link from A to B is down
The message has been lost
However, Site A cannot determine exactly why the failure has occurred
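A hedged sketch of this check from Site A's point of view. recv_i_am_up() and send_are_you_up_and_wait() are assumed stand-ins for the real messaging layer, stubbed here so the example runs:

```c
#include <stdbool.h>
#include <stdio.h>

#define HEARTBEAT_INTERVAL_MS 1000
#define MAX_RETRIES           3

/* Stubs for the messaging layer (assumptions); they always report
 * "no answer" so the failure path below is exercised. */
static bool recv_i_am_up(int site, int timeout_ms) { (void)site; (void)timeout_ms; return false; }
static bool send_are_you_up_and_wait(int site, int timeout_ms) { (void)site; (void)timeout_ms; return false; }

/* Site A's view of the heartbeat check on one remote site. */
static void check_site(int site_b) {
    if (recv_i_am_up(site_b, HEARTBEAT_INTERVAL_MS))
        return;                              /* I-am-up arrived: B looks fine */

    /* No heartbeat: B may be down, or the message may simply be lost,
     * so ask explicitly and retry (possibly over an alternate route). */
    for (int attempt = 0; attempt < MAX_RETRIES; attempt++)
        if (send_are_you_up_and_wait(site_b, HEARTBEAT_INTERVAL_MS))
            return;                          /* a reply came back after all */

    /* Still silent: a site, a link, or the messages have failed,
     * but A cannot tell exactly which. */
    printf("site %d: failure suspected\n", site_b);
}

int main(void) {
    check_site(1);
    return 0;
}
```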
Reconfiguration and Recovery
When Site A determines a failure has occurred, it
must reconfigure the system:
If the link from A to B has failed, this must be
broadcast to every site in the system
If a site has failed, every other site must also be notified
indicating that the services offered by the failed site are
no longer available
When the link or the site becomes available again, this information must again be broadcast to all other sites (a minimal sketch follows).
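A minimal sketch of such a broadcast; notify_site() and the event names are assumed helpers, not a real API:

```c
#include <stdio.h>

#define NUM_SITES 5

enum event { LINK_DOWN, SITE_DOWN, LINK_UP, SITE_UP };

/* Stand-in for the real messaging primitive (assumption). */
static void notify_site(int site, enum event e, int subject) {
    printf("notify site %d: event %d concerning %d\n", site, e, subject);
}

/* Site A broadcasts a reconfiguration event to every other site so they
 * stop (or resume) using the affected link or site. */
static void broadcast(int self, enum event e, int subject) {
    for (int s = 0; s < NUM_SITES; s++)
        if (s != self)
            notify_site(s, e, subject);
}

int main(void) {
    broadcast(0, LINK_DOWN, 3);   /* the link from site 0 to site 3 failed */
    broadcast(0, SITE_UP,   3);   /* later: site 3 is reachable again      */
    return 0;
}
```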
Transparency
The distributed system should appear as a
conventional, centralized system to the user
User interface should not distinguish between
local and remote resources
User mobility allows a user to log into any machine in the environment and see his or her own environment
Scalability
As demands increase, the system should easily
accept the addition of new resources to
accommodate the increased demand
Reacts gracefully to increased load
Adding more resources may generate additional
indirect load on other resources if not careful
Data compression or deduplication can cut down
on storage and network resources used
Comparison between Systems
Item | Multiproc. DOS | Multicomp. DOS | Network OS | Middleware-based OS
Degree of transparency | Very high | High | Low | High
Same OS on all nodes | Yes | Yes | No | No
Number of copies of OS | 1 | N | N | N
Basis for communication | Shared memory | Messages | Files | Model specific
Resource management | Global, central | Global, distributed | Per node | Per node
Scalability | No | Moderately | Yes | Varies
Openness | Closed | Closed | Open | Open