DISTRIBUTED AND CLOUD SYSTEMS
Chapter 1: Distributed System Models and Enabling Technologies
Objectives of the course
• Introduce the student to fundamental concepts, technologies and tools of distributed
and cloud systems.
• Provide the student with practical skills in developing and deploying distributed and
cloud-based applications and services.
• Familiarize the student with the latest tools and technologies used for developing and
deploying cloud services.
Course outline
• Distributed system models and enabling technologies
• Cloud architectures and classification
• Virtualization and resource management
• Cloud-based environments and deployments
• Parallel Programming in the Cloud: MapReduce, Hadoop and Spark
• Data Storage, Management and replication
• Security management in the cloud
Assessment method
• Quiz (1) 15%
• Quiz (2) 15%
• Final Exam 70%
Computing Paradigm Distinctions
• Computing is the process of using computer technology to complete a given
goal-oriented task.
• Computing may encompass the design and development of software and
hardware systems for a broad range of purposes.
• Centralized computing
• All computer resources are centralized in one physical system.
• All resources (processors, memory, and storage) are fully shared and tightly coupled within one
integrated OS.
• Many data centers and supercomputers are centralized systems, but they are used in parallel,
distributed, and cloud computing applications.
• Parallel computing (parallel processing)
• All processors are either tightly coupled with centralized shared memory or loosely coupled with
distributed memory.
• Interprocessor communication is accomplished through shared memory or via message passing.
• A computer system capable of parallel computing is commonly known as a parallel computer.
• Programs running in a parallel computer are called parallel programs.
• The process of writing parallel programs is often referred to as parallel programming.
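The two styles of interprocessor communication named above can be illustrated with a minimal Python sketch. It uses threads inside one process purely to contrast the communication styles, and a `queue.Queue` stands in for a message channel; it is not meant to demonstrate real parallel speedup. The function names are illustrative and not taken from the course material.

```python
import queue
import threading


def sum_with_shared_memory(chunks):
    """Workers add partial sums into one shared variable guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # coordinate access to the shared state
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_with_message_passing(chunks):
    """Workers keep their state private and send partial sums as messages."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # the only interaction is a message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect exactly one message per worker.
    return sum(mailbox.get() for _ in chunks)


chunks = [range(0, 100), range(100, 200), range(200, 300)]
shared_result = sum_with_shared_memory(chunks)
message_result = sum_with_message_passing(chunks)
```

Both functions compute the same total; the difference lies in whether the workers coordinate through a shared variable or through explicit messages.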
• Distributed computing
• A distributed system consists of multiple autonomous computers, each having its own private
memory, communicating through a computer network.
• Information exchange in a distributed system is accomplished through message passing.
• A computer program that runs in a distributed system is known as a distributed program.
• The process of writing distributed programs is referred to as distributed programming.
• Cloud computing
• An Internet cloud of resources can be either a centralized or a distributed computing system.
• The cloud applies parallel or distributed computing, or both.
• Clouds can be built with physical or virtualized resources over large data centers that are
centralized or distributed.
• Some authors consider cloud computing to be a form of utility computing or service computing.
Computer architectures
• Computer architectures consisting of interconnected,
multiple processors are basically of two types:
• Tightly coupled systems:
• In these systems, there is a single system wide primary
memory (address space) that is shared by all the processors.
• In these systems, any communication between the
processors usually takes place through the shared memory.
• Loosely coupled systems:
• In these systems, the processors do not share memory, and
each processor has its own local memory.
• In these systems, all physical communication between the
processors is done by passing messages across the network
that interconnects the processors.
• Tightly coupled systems are referred to as parallel processing systems.
• Loosely coupled systems are referred to as distributed systems.
• In contrast to tightly coupled systems, the processors of distributed computing systems
can be located far from each other to cover a wider geographical area.
• Furthermore, in tightly coupled systems, the number of processors that can be usefully
deployed is usually small and limited by the bandwidth of the shared memory.
• This is not the case with distributed computing systems, which are more freely expandable and
can have an almost unlimited number of processors.
Distributed computing
• Distributed computing system:
• Is basically a collection of processors interconnected by a communication network in which
each processor has its own local memory, and the communication between any two
processors of the system takes place by message passing over the communication network.
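As a rough illustration of this definition, the sketch below runs two "nodes" that share no variables and interact only by exchanging messages over a TCP connection. To keep it self-contained, both nodes run on the local machine (loopback address, operating-system-chosen port); in a real distributed system they would be separate computers. The function names and the `ack:` reply format are assumptions made for the example.

```python
import socket
import threading


def run_node_b(server_sock):
    """Node B: receive one request message and send back a reply message."""
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode()
        conn.sendall(f"ack:{request}".encode())


def send_from_node_a(port, message):
    """Node A: send a message to node B and wait for the reply."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(message.encode())
        return sock.recv(1024).decode()


def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    node_b = threading.Thread(target=run_node_b, args=(server_sock,))
    node_b.start()
    reply = send_from_node_a(port, "hello")
    node_b.join()
    server_sock.close()
    return reply
```

Calling `demo()` returns the reply that node B sent over the connection; neither node reads the other's memory directly.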
Hardware and Software Architectures
• Definition of a distributed system:
• Independent computers (hardware)
• Performing a task and providing a service (software)
• From a hardware point of view distributed systems are generally implemented on
multicomputers.
• From a software point of view they are generally implemented as distributed operating
systems or middleware.
• Hardware - Multicomputers
• A multicomputer consists of separate computing nodes connected to each other over a
network. Multicomputers generally differ from each other in three ways:
1) Node resources. This includes the processors, amount of memory, amount of secondary storage,
etc. available on each node.
2) Network connection. The network connection between the various nodes can have a large
impact on the functionality and applications that such a system can be used for.
3) Homogeneity. A homogeneous multicomputer is one where all the nodes are the same. A
heterogeneous multicomputer is one where the nodes are not expected to be the same.
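The node-resource and homogeneity criteria can be sketched in a few lines of Python. The `Node` fields and the example clusters are hypothetical, and "the same" is simplified here to mean identical resource figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu_cores: int
    memory_gb: int
    storage_gb: int

    def resources(self):
        """Resource profile used to compare nodes (the name is ignored)."""
        return (self.cpu_cores, self.memory_gb, self.storage_gb)


def is_homogeneous(nodes):
    """True when every node offers an identical resource profile."""
    return len({node.resources() for node in nodes}) <= 1


cluster_a = [Node("n1", 8, 32, 500), Node("n2", 8, 32, 500)]
cluster_b = [Node("n1", 8, 32, 500), Node("n2", 16, 64, 1000)]
```

Under this simplification `cluster_a` is homogeneous and `cluster_b` is heterogeneous. A fuller model would also compare processor type, operating system, and network connection.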
• Software - Distributed Operating System
• An operating system built to provide and manage
distributed services:
• Designed for homogeneous multicomputer
systems.
• Optimized for Local Area Network (LAN)
environments.
• Shift to Middleware Systems:
• Middleware offers greater flexibility; not tied to a
specific OS.
• Better suited for heterogeneous systems and
wide-area networks.
DS Characteristics:
• There are several key properties that we wish the systems to have:
• Transparency
• Scalability
• Dependability
• Performance
• Flexibility
• Providing systems with these properties leads to many of the challenges of building distributed systems.
DS Characteristics: Transparency
• Transparency is the concealment from the user and the application programmer of the
separation of the components of a distributed system (i.e., a single image view).
• There are a number of different forms of transparency including the following:
• Access Transparency: Local and remote resources are accessed in the same way
• Location Transparency: Users are unaware of the location of resources
• Migration Transparency: Resources can migrate without name change
• Replication Transparency: Users are unaware of the existence of multiple copies of resources
• Failure Transparency: Users are unaware of the failure of individual components
• Concurrency Transparency: Users are unaware of sharing resources with others
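Access, location, and migration transparency can be sketched together, assuming a hypothetical name service that maps resource names to whichever store currently holds them. `RemoteStore` is only a stand-in: a real one would talk to another node over the network, and the host name is invented for the example.

```python
class LocalStore:
    """A store on the client's own machine."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value


class RemoteStore:
    """Stand-in for a store on another node (no real network calls here)."""

    def __init__(self, host):
        self.host = host
        self._data = {}

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value


class NameService:
    """Clients use names only; the current location stays hidden."""

    def __init__(self):
        self._location = {}

    def register(self, name, store):
        self._location[name] = store

    def read(self, name):
        # Same call whether the resource is local or remote.
        return self._location[name].read(name)

    def write(self, name, value):
        self._location[name].write(name, value)

    def migrate(self, name, new_store):
        # Move the resource; its name does not change.
        new_store.write(name, self.read(name))
        self._location[name] = new_store
```

A client that calls `read("report")` before and after `migrate("report", ...)` uses the same name and the same call, which is the point of the three properties.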
DS Characteristics: Scalability
• Scalable system: can handle the addition of users and resources without suffering a
noticeable loss of performance.
• Scalability is important in distributed systems, and in particular in wide area distributed
systems, and those expected to experience large growth.
• This growth has two dimensions:
• Size: A distributed system can grow with regard to the number of users or resources (e.g.,
computers) that it supports.
• Geography: A distributed system can grow with regard to geography or the distance
between nodes.
DS Characteristics: Dependability
• The dependability of a system reflects the extent of the user's confidence that it will operate
as users expect and that it will not corrupt data or other systems and will not 'fail' in normal
use.
• Dependability refers to the system's ability to operate reliably and correctly, even in the
presence of failures, security threats, or other challenges.
• Dependability requires consistency, security, and fault tolerance.
• To achieve dependability, the system must ensure three key aspects:
• Consistency:
• All parts of the system should agree on the current state of data, even if they are spread across
different locations.
• Security:
• The system must protect data and operations from unauthorized access, breaches, or attacks.
• Fault Tolerance:
• The system must continue to function correctly even if some components fail.
• Together, these ensure that the system is dependable, meaning users can trust it to work
correctly and securely at all times.
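The fault-tolerance aspect can be illustrated with a minimal failover read, assuming the data is replicated on several nodes and that a failed replica signals its failure with an exception. All class and function names are hypothetical.

```python
class ReplicaDown(Exception):
    """Raised when a replica cannot answer."""


class Replica:
    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = dict(data)
        self.alive = alive

    def read(self, key):
        if not self.alive:
            raise ReplicaDown(self.name)
        return self.data[key]


def fault_tolerant_read(replicas, key):
    """Try each replica in turn; fail only if every replica is down."""
    for replica in replicas:
        try:
            return replica.read(key)
        except ReplicaDown:
            continue  # mask the failure and try the next copy
    raise ReplicaDown("all replicas failed")


replicas = [
    Replica("r1", {"balance": 100}, alive=False),  # simulated failure
    Replica("r2", {"balance": 100}),
    Replica("r3", {"balance": 100}),
]
```

With `r1` down, `fault_tolerant_read(replicas, "balance")` still returns the value from `r2`. The sketch assumes the replicas already hold consistent data; keeping them consistent is a separate problem.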
DS Characteristics: Performance
• While high performance is always a goal, achieving it often conflicts with other important
properties such as transparency, security, dependability, and scalability.
• Transparency vs. Performance:
• A distributed file system might hide the complexity of data location from the user (transparency). However, to
do so, the system may need to constantly check multiple locations to find data, which can slow down
performance.
• Security vs. Performance:
• Encrypting all data sent between distributed system nodes enhances security, but it adds processing overhead,
reducing overall system speed.
• Dependability vs. Performance:
• To ensure dependability, a system might replicate data across multiple servers. While this increases fault
tolerance, it also means more data syncing and management, which can slow down operations.
• Scalability vs. Performance:
• A scalable system is designed to add more resources (like servers) easily. However, as the system scales,
coordination between these resources can introduce delays, reducing performance.
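The dependability-versus-performance trade-off can be made concrete with a toy cost model: every write to a replicated store sends one update message per replica. The store below is a hypothetical in-memory sketch that only counts messages; it does not measure real latency.

```python
class ReplicatedStore:
    """Keeps a full copy of the data on every replica."""

    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]
        self.messages_sent = 0

    def write(self, key, value):
        for replica in self.replicas:
            replica[key] = value
            self.messages_sent += 1  # one update message per replica

    def read(self, key):
        return self.replicas[0][key]


def messages_for_writes(replica_count, writes):
    """Count the update messages needed for a given number of writes."""
    store = ReplicatedStore(replica_count)
    for i in range(writes):
        store.write(f"k{i}", i)
    return store.messages_sent
```

In this model the message count for the same workload grows linearly with the number of replicas, which is the synchronization cost the bullet describes.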
DS Characteristics: Flexibility
• A flexible distributed system allows users or programmers to configure it to meet specific
needs or requirements.
• This means that the system can be adapted to provide only the services that are necessary,
without forcing users to deal with unnecessary or irrelevant features.
• Imagine a cloud computing platform that offers a wide range of services, such as data
storage, machine learning, and web hosting.
• A flexible system would allow a developer to configure the hosting service to match the exact
needs of their application, such as the amount of server resources or security features
required.
• Extensibility allows one to add or replace system components in order to
extend or modify system functionality.
• Openness means that a system provides its services according to standard
rules regarding invocation syntax and semantics.
• Openness allows multiple implementations of standard components to be
produced. This provides choice and flexibility.
• Interoperability ensures that systems implementing the same standards (and
possibly even those that do not) can interoperate.
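Openness and extensibility can be sketched with an abstract interface that plays the role of the "standard rules" and two independent implementations of it. The interface and class names are invented for this example.

```python
from abc import ABC, abstractmethod


class KeyValueService(ABC):
    """The agreed interface: invocation syntax that every implementation follows."""

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> str: ...


class InMemoryService(KeyValueService):
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


class LoggingService(KeyValueService):
    """Extends an existing service without changing its clients."""

    def __init__(self, inner: KeyValueService):
        self.inner = inner
        self.log = []

    def put(self, key, value):
        self.log.append(("put", key))
        self.inner.put(key, value)

    def get(self, key):
        self.log.append(("get", key))
        return self.inner.get(key)


def client_code(service: KeyValueService) -> str:
    """Written against the interface only, so any implementation will do."""
    service.put("course", "distributed systems")
    return service.get("course")
```

`client_code` works unchanged with either implementation, so a component can be replaced or extended without touching the code that uses it.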
Advantages of Distributed Systems
• Cost: Better price/performance as long as commodity hardware is used for the component
computers. A collection of microprocessors offers better price/performance than a
mainframe.
• Performance: By combining the processing and storage capacity of many nodes, a distributed
system may have more total computing power than a mainframe.
• Scalability: Resources such as processing and storage capacity can be increased
incrementally.
• Reliability: By having redundant components the impact of hardware and software faults on
users can be reduced.
• Inherent distribution: Some applications, such as email and the Web (where users are spread
out over the whole world), are naturally distributed.
• Banking, Airline reservation etc. are examples of the applications that are inherently
distributed.
System Architectures
• The architecture of a system includes:
• The division of responsibilities between system components.
• The placement of the components on computers in the network.
• Client-server model:
• The most important and most widely used distributed
system architecture.
• Client and server roles are assigned and
changeable.
• Servers may in turn be clients of other servers.
• Services may be implemented as several
interacting processes in different host computers
to provide a service to client processes:
• Servers partition the set of objects on which the
service is based and distribute them among
themselves (e.g. Web data and web servers)
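One simple way for servers to partition a set of objects is to hash each object's key and use the result to pick the responsible server. This is only one possible scheme, chosen here for brevity; the `Server` class and helper functions are hypothetical.

```python
import hashlib


class Server:
    def __init__(self, name):
        self.name = name
        self.objects = {}


def server_for(key, servers):
    """Pick the server responsible for a key using a stable hash."""
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def put(key, value, servers):
    server_for(key, servers).objects[key] = value


def get(key, servers):
    return server_for(key, servers).objects[key]


servers = [Server("s0"), Server("s1"), Server("s2")]
for page in ["index.html", "about.html", "news.html", "contact.html"]:
    put(page, f"<contents of {page}>", servers)
```

Each object lives on exactly one server, and any client can find it by recomputing the hash. With this modulo scheme, changing the number of servers remaps most keys, a limitation that more elaborate partitioning schemes address.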
• Web proxy server
• Provides a shared cache of web resources for client machines at a site or across several sites.
• Increases the availability and performance of a service by reducing the load on the WAN and web
servers.
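The shared-cache idea can be sketched as follows. The origin server is simulated in memory, the URL is a placeholder, and the proxy never expires or revalidates entries, which a real web proxy would need to do.

```python
class OriginServer:
    """Simulated web server that counts how many requests reach it."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.requests_served = 0

    def fetch(self, url):
        self.requests_served += 1
        return self.pages[url]


class CachingProxy:
    """Shared cache placed between the clients at a site and the origin server."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def fetch(self, url):
        if url not in self.cache:  # cache miss: go to the origin once
            self.cache[url] = self.origin.fetch(url)
        return self.cache[url]  # cache hit: no WAN traffic


origin = OriginServer({"http://example.org/index.html": "<html>home</html>"})
proxy = CachingProxy(origin)

# Two different clients at the same site request the same page.
page_for_client_1 = proxy.fetch("http://example.org/index.html")
page_for_client_2 = proxy.fetch("http://example.org/index.html")
```

Both clients receive the page, but only the first request reaches the origin server, which is how the proxy reduces load on the WAN and the web servers.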
• Peer processes Model
• All processes play similar roles, without distinction between clients and servers.
• Interacting cooperatively to perform a distributed activity.
• Communications pattern will depend on application requirements.
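A minimal sketch of the peer model: every peer runs the same code and can both serve and request resources. The file-sharing scenario and all names are assumptions made for the example, and the peers are in-process objects rather than networked processes.

```python
class Peer:
    """Every peer has the same capabilities: it can serve and it can request."""

    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)
        self.neighbours = []

    def connect(self, other):
        # Connections are symmetric; neither side is "the server".
        self.neighbours.append(other)
        other.neighbours.append(self)

    def serve(self, filename):
        """Answer a request from another peer (server-like role)."""
        return self.files.get(filename)

    def request(self, filename):
        """Look locally, then ask neighbours (client-like role)."""
        if filename in self.files:
            return self.files[filename]
        for peer in self.neighbours:
            content = peer.serve(filename)
            if content is not None:
                self.files[filename] = content  # keep a copy to serve later
                return content
        return None


alice = Peer("alice", {"notes.txt": "lecture notes"})
bob = Peer("bob", {"slides.pdf": "chapter 1 slides"})
alice.connect(bob)
```

Here `alice.request("slides.pdf")` is answered by `bob`, and `bob.request("notes.txt")` is answered by `alice`: each peer acts as a client in one exchange and as a server in the other.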
END