COMPUTER
ARCHITECTURE
PCC-CS 402
Department of Computer Science & Engineering
2nd Year, 4th Semester
2022
Overview
Module 4
Multiprocessor architecture: taxonomy of
parallel architectures;
Centralized shared memory architecture:
synchronization, memory consistency
Interconnection networks.
Distributed shared memory architecture.
Cluster computers.
Non von Neumann architectures: data flow
computers, reduction computer
architectures,
Systolic architectures.
Multiprocessor:
A multiprocessor is a computer system in which two or more central
processing units (CPUs) share full access to a common RAM. The main
objective of using a multiprocessor is to boost the system's execution
speed; other objectives are fault tolerance and application matching.
There are two types of multiprocessors: shared memory
multiprocessors and distributed memory multiprocessors.
In a shared memory multiprocessor, all the CPUs share a common
memory, whereas in a distributed memory multiprocessor, every CPU has its
own private memory.
• A multiprocessor system is an interconnection of two or more CPU’s with memory and input-output
equipment.
• Multiprocessor systems are classified as multiple instruction stream, multiple data stream
(MIMD) systems.
• A distinction exists between multiprocessors and multicomputers, even though both support
concurrent operations.
• In a multicomputer, several autonomous computers are connected through a network and may
or may not communicate, whereas in a multiprocessor system a single OS controls the
interaction between the processors, and all the components of the system cooperate in the solution of
the problem.
• VLSI circuit technology has reduced the cost of computers to such a low level that the concept
of applying multiple processors to meet system performance requirements has become an attractive
design possibility.
Applications of Multiprocessor –
• As a uniprocessor, such as single instruction, single data stream (SISD).
• As a multiprocessor, such as single instruction, multiple data stream (SIMD), which is
usually used for vector processing.
• For multiple series of instructions in a single perspective, such as multiple instruction, single
data stream (MISD), which is used to describe hyper-threaded or pipelined
processors.
• Inside a single system for executing multiple, individual series of instructions in
multiple perspectives, such as multiple instruction, multiple data stream (MIMD).
Benefits of using a Multiprocessor –
• Enhanced performance.
• Multiple applications.
• Multi-tasking inside an application.
• High throughput and responsiveness.
• Hardware sharing among CPUs.
Multicomputer:
A multicomputer system is a computer system with multiple processors that are connected together to
solve a problem. Each processor has its own memory, accessible only by that particular processor, and
the processors communicate with each other via an interconnection network.
Because a multicomputer is capable of message passing between the processors, a task can be divided
among the processors to complete it. Hence, a multicomputer can be used for
distributed computing. It is more cost-effective and easier to build than a multiprocessor.
Difference between multiprocessor and Multicomputer:
• A multiprocessor is a system with two or more central processing units (CPUs) that is capable of
performing multiple tasks, whereas a multicomputer is a system with multiple processors that are
attached via an interconnection network to perform a computation task.
• A multiprocessor system is a single computer that operates with multiple CPUs, whereas a
multicomputer system is a cluster of computers that operate as a single computer.
• Construction of a multicomputer is easier and more cost-effective than that of a multiprocessor.
• In a multiprocessor system, programming tends to be easier, whereas in a multicomputer system,
programming tends to be more difficult.
• A multiprocessor supports parallel computing; a multicomputer supports distributed computing.
COUPLING OF PROCESSORS
Tightly Coupled System/Shared Memory:
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicate through a common global shared memory
- Shared memory system; this doesn't preclude each processor from having
its own local memory (cache memory)
Loosely Coupled System/Distributed Memory
- Tasks or processors do not communicate in a synchronized fashion
- Communicate by message passing packets consisting of an address, the
data content, and some error detection code
- Overhead for data exchange is high
- Distributed memory system
Loosely coupled systems are more efficient when the interaction between tasks is
minimal, whereas tightly coupled systems can tolerate a higher degree of interaction
between tasks.
Shared (Global) Memory
- A Global Memory Space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's memory a message must be
sent there
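The contrast between the two memory models can be sketched with Python's multiprocessing module; the worker functions and the message counts below are illustrative assumptions, not part of any particular machine design.

```python
from multiprocessing import Process, Queue, Value

def shared_memory_worker(counter, n):
    # Tightly coupled style: every worker updates one global location,
    # so updates must be synchronized (here, with a lock).
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1

def message_passing_worker(inbox, outbox):
    # Loosely coupled style: the worker owns its data privately and
    # communicates only by exchanging message packets.
    total = sum(inbox.get() for _ in range(3))
    outbox.put(total)

if __name__ == "__main__":
    # Shared (global) memory: two processes increment a common counter.
    counter = Value("i", 0)
    procs = [Process(target=shared_memory_worker, args=(counter, 1000))
             for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)          # 2000

    # Distributed (message-passing) memory: values travel as messages.
    inbox, outbox = Queue(), Queue()
    p = Process(target=message_passing_worker, args=(inbox, outbox))
    p.start()
    for x in (1, 2, 3):
        inbox.put(x)
    print(outbox.get())           # 6
    p.join()
```

Note how the message-passing version never touches shared state: all coordination cost is in the packet exchange, mirroring the higher data-exchange overhead of loosely coupled systems.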
Uniform Memory
- All processors take the same time to reach all memory locations
Non-uniform (NUMA) Memory
- Memory access is not uniform
Interconnection Structures:
The interconnection between the components of a multiprocessor system can have different physical
configurations, depending on the number of transfer paths that are available between the processors and
memory in a shared memory system, and among the processing elements in a loosely coupled system.
Some of the schemes are as:
- Time-Shared Common Bus
- Multiport Memory
- Crossbar Switch
- Multistage Switching Network
- Hypercube System
Multistage Switching Network: -
The basic component of a multistage switching network is a two-input, two-output interchange switch.
- Some request patterns cannot be connected simultaneously; for example, two sources cannot always
be connected simultaneously to destinations 000 and 001.
- In a tightly coupled multiprocessor system, the source is a processor and the destination is a memory
module. The sequence is: set up the path, transfer the address into memory, then transfer the data.
- In a loosely coupled multiprocessor system, both the source and destination are processing
elements.
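Routing through a network of such switches can be sketched in Python. The sketch below assumes an 8x8 omega-style network (perfect-shuffle wiring before each stage of 2x2 switches) with standard destination-tag routing; the function names are illustrative.

```python
N_BITS = 3  # 8 inputs/outputs, log2(8) = 3 stages of 2x2 switches

def shuffle(port, n_bits=N_BITS):
    # Perfect-shuffle wiring: rotate the line address left by one bit.
    msb = (port >> (n_bits - 1)) & 1
    return ((port << 1) | msb) & ((1 << n_bits) - 1)

def route(source, dest, n_bits=N_BITS):
    # Destination-tag routing: at stage i the 2x2 switch forwards the
    # message to its upper (0) or lower (1) output according to bit i
    # of the destination address, most significant bit first.
    path, port = [], source
    for stage in range(n_bits):
        port = shuffle(port, n_bits)              # wiring before the stage
        bit = (dest >> (n_bits - 1 - stage)) & 1  # tag bit for this stage
        port = (port & ~1) | bit                  # switch output taken
        path.append((stage, port >> 1, bit))      # (stage, switch, output)
    return path, port

path, arrived = route(0b101, 0b000)
print(arrived)  # 0 -- the message reaches destination 000

# Messages for destinations 000 and 001 always need the same last-stage
# switch, which is why some request patterns cannot be connected
# simultaneously, as noted above.
```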
Hypercube System:
The hypercube or binary n-cube multiprocessor structure is a loosely coupled system composed of N = 2^n
processors interconnected in an n-dimensional binary cube.
- Each processor forms a node of the cube; in effect, each node contains not only a CPU but also local
memory and an I/O interface.
- Each processor's address differs from that of each of its n neighbors by exactly one bit position.
- Fig. below shows the hypercube structure for n = 1, 2, and 3.
- Routing a message through an n-cube structure may take from one to n links from a source node to a
destination node.
- A routing procedure can be developed by computing the exclusive-OR of the source node address with
the destination node address.
- The resulting binary value has 1 bits in the positions corresponding to the axes on which the two nodes
differ; the message is then sent along any one of those axes.
- A representative of the hypercube architecture is the Intel iPSC computer complex.
- It consists of 128 (n = 7) microcomputers; each node consists of a CPU, a floating-point processor, local
memory, and serial communication interface units.
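The exclusive-OR routing rule above can be sketched directly; the function name is illustrative, and the path shown flips the differing bits in ascending axis order (any order of the differing axes is equally valid).

```python
def hypercube_route(src, dst, n):
    # The XOR of source and destination has 1 bits exactly on the axes
    # where the two addresses differ; traverse one such axis per hop.
    path = [src]
    diff = src ^ dst
    for axis in range(n):
        if diff & (1 << axis):
            path.append(path[-1] ^ (1 << axis))  # cross one link
    return path

# Routing 000 -> 101 in a 3-cube takes two links (axes 0 and 2):
print(hypercube_route(0b000, 0b101, 3))  # [0, 1, 5]
```

The path length equals the number of 1 bits in the XOR, so a message takes between one and n links, as stated above.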
Inter-processor Arbitration
- Only one of the CPUs, IOPs, and memory modules can be granted use of the bus at a time
- An arbitration mechanism is needed to handle multiple requests for the shared resources and to
resolve contention
- SYSTEM BUS:
o A bus that connects the major components such as CPUs, IOPs and memory
o A typical system bus consists of 100 signal lines divided into three functional groups: data, address and
control lines. In addition, there are power distribution lines to the components.
- Synchronous Bus
o Each data item is transferred during a time slice
known to both the source and destination units
o A common clock source is used, or a separate clock and synchronization signal is transmitted periodically
to synchronize the clocks in the system
- Asynchronous Bus
o Each data item is transferred by a handshake mechanism
- The unit that transmits the data sends a control signal that indicates the presence of data
- The unit that receives the data responds with another control signal to acknowledge receipt of the
data
o Strobe pulse - supplied by one of the units to indicate to the other unit when the data transfer has to occur
Cluster Computers
Cluster Computing
• It is a form of computing in which a group of computers are linked
together so that they can act like a single entity.
• It is the technique of linking two or more computers into a
network in order to take advantage of the parallel processing
power of those computers.
• A computer cluster is a set of loosely or tightly connected
computers that work together so that, in many respects, they can
be viewed as a single system.
• To date clusters do not typically use physically shared memory.
Cluster Computing
• Main Components
• Memory
• Network Components
• Processors
• Cluster Computer Types
• High Availability Clusters
• Load Balancing Clusters
• High-Performance Clusters
Cluster Computer Benefits
• Awesome Processing Power
• The processing power of a high-performance computer cluster can match that of a
mainframe computer. As the number of computers in the cluster increases, the processing power
may exceed that of the mainframe.
• Cost Efficient
• Purchasing several good-quality computers and joining them into a cluster is less
expensive than purchasing a supercomputer.
• Expandability
• Computer clusters can be easily expanded by adding further computers to the
cluster. A mainframe computer has a fixed capacity.
• Availability
• When a mainframe computer fails, the entire system fails. However, if a node in a cluster fails,
the operations of that node can easily be transferred to another node within the cluster, ensuring
uninterrupted service.
Cluster Computer Challenges
• Size Scalability (Physical and Application)
• Single System Image (Look and feel of one system)
• Security and Encryption (Clusters of Clusters)
• Enhanced availability (Failure Management)
Cluster Computer Applications
• Google Search Engine
• Petroleum Reservoir Simulation
• Protein Explorer
• Earthquake Simulation
• Image Rendering
• Weather Forecasting
Non Von Neumann Computers
• Any computer architecture in which the underlying model of computation is different from what has come to
be called the standard von Neumann model.
• A non von Neumann machine may be without:
• The concept of sequential flow of control (i.e. without any register corresponding to a “program
counter” that indicates the current point that has been reached in execution of a program) .
• And/or without the concept of a variable (i.e. without “named” storage locations in which a value
may be stored and subsequently referenced or changed).
• Examples:
• Data Flow Computers
• Reduction Computers
• In both of these cases there is a high degree of parallelism, and instead of variables there are immutable
bindings between names and constant values.
The Von Neumann Architecture
• A processing unit that contains an arithmetic logic unit
and processor registers.
• A control unit that contains an instruction register and
program counter.
• Memory that stores data and instructions
• External mass storage.
• Input and output mechanisms.
• Instructions are completed one after another.
• Each instruction is limited by the results of the previous instructions.
The Von Neumann Bottlenecks
• The shared bus between the program memory and data memory leads to the von Neumann bottleneck:
• limited throughput (data transfer rate) between the central processing unit (CPU) and memory
compared to the amount of memory.
• Because the single bus can access only one of the two classes of memory at a time, throughput is
lower than the rate at which the CPU can work.
• The CPU is continually forced to wait for needed data to move to or from memory.
The Von Neumann Mitigations
• Providing a cache between the CPU and the main memory.
• Providing separate caches or separate access paths for data and instructions (the so-called Modified
Harvard architecture).
• Using branch predictor algorithms and logic.
• Providing a limited CPU stack or other on-chip scratchpad memory to reduce memory accesses.
• Implementing the CPU and the memory hierarchy as a system on chip, providing greater locality of
reference and thus reducing latency and increasing throughput between processor registers and
main memory.
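The first mitigation above (a cache between the CPU and main memory) can be illustrated with a toy direct-mapped, word-addressed cache; the class name, sizes, and hit/miss counters are illustrative assumptions rather than a model of any real CPU.

```python
class DirectMappedCache:
    def __init__(self, memory, n_lines=8):
        self.memory = memory            # backing store (models main memory)
        self.n_lines = n_lines
        self.tags = [None] * n_lines    # tag held by each cache line
        self.data = [None] * n_lines
        self.hits = self.misses = 0

    def read(self, addr):
        line, tag = addr % self.n_lines, addr // self.n_lines
        if self.tags[line] == tag:
            self.hits += 1              # served without using the shared bus
        else:
            self.misses += 1            # bus transfer from main memory
            self.tags[line], self.data[line] = tag, self.memory[addr]
        return self.data[line]

memory = list(range(100))
cache = DirectMappedCache(memory)
for _ in range(3):                      # a loop re-reading the same words
    for addr in (0, 1, 2, 3):
        cache.read(addr)
print(cache.hits, cache.misses)         # 8 4 -- only 4 reads cross the bus
```

After the first pass through the loop the working set sits in the cache, so the remaining accesses never contend for the CPU-memory bus.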
Harvard architecture
The Harvard architecture is a computer architecture with physically separate storage and
signal pathways for instructions and data. The term originated from the Harvard Mark I
relay-based computer, which stored instructions on punched tape (24 bits wide) and data in
electro-mechanical counters. These early machines had data storage entirely contained
within the central processing unit, and provided no access to the instruction storage as data.
Programs needed to be loaded by an operator; the processor could not boot itself.
Difference between Harvard and
Von Neumann Computer Architectures
VON NEUMANN ARCHITECTURE vs HARVARD ARCHITECTURE
• Von Neumann is an older architecture based on the stored-program computer concept; Harvard is a
more modern architecture based on the Harvard Mark I relay-based model.
• Von Neumann uses the same physical memory address space for instructions and data; Harvard uses
separate physical memory address spaces for instructions and data.
• Von Neumann has a common bus for data and instruction transfer; Harvard uses separate buses for
transferring data and instructions.
• In Von Neumann, two clock cycles are required to execute a single instruction; in Harvard, an
instruction is executed in a single cycle.
• Von Neumann is cheaper; Harvard is costlier than the von Neumann architecture.
• In Von Neumann, the CPU cannot access instructions and read/write data at the same time; in
Harvard, it can.
• Von Neumann is used in personal computers and small computers; Harvard is used in
microcontrollers and signal processing.
Other Non Von Neumann Architectures
• Analog Computers
• Optical Computers
• Quantum Computers
• Cell Processors
• DNA
• Neural Networks
• MIMD Architectures
Non Von Neumann Design Problems
• Processor designing
• Physical organization
• Interconnection structure
• Inter-processor communication protocols
• Memory hierarchy
• Cache organization and coherency
• Operating system design
• Parallel programming languages
Dataflow Architectures
• A computer architecture that directly contrasts the traditional von Neumann architecture or control flow
architecture.
• Dataflow architectures do not have a program counter.
• The execution of instructions is determined solely by the availability of input arguments to the
instructions.
• The order of instruction execution is therefore unpredictable.
Dataflow Nodes
• In a Data Flow Machine, a program consists of Data Flow Nodes.
• A Data Flow Node fires when all its inputs are ready, i.e. when all its inputs have tokens.
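The firing rule above can be sketched as a toy interpreter in Python; the DataflowGraph class and its method names are illustrative assumptions, not any specific machine's design.

```python
from collections import defaultdict

class DataflowGraph:
    def __init__(self):
        self.nodes = {}                  # name -> (function, input arc names)
        self.tokens = defaultdict(list)  # arc name -> queued token values

    def add_node(self, name, func, inputs):
        self.nodes[name] = (func, inputs)

    def send(self, arc, value):
        self.tokens[arc].append(value)   # place a token on an arc

    def step(self):
        # Fire every node whose inputs all hold a token; return results.
        fired = {}
        for name, (func, inputs) in self.nodes.items():
            if all(self.tokens[arc] for arc in inputs):
                args = [self.tokens[arc].pop(0) for arc in inputs]
                fired[name] = func(*args)
        return fired

g = DataflowGraph()
g.add_node("add", lambda a, b: a + b, ["x", "y"])
g.send("x", 2)
print(g.step())        # {} -- 'add' cannot fire: no token on input 'y'
g.send("y", 3)
print(g.step())        # {'add': 5} -- both inputs hold tokens, node fires
```

There is no program counter anywhere in the sketch: execution is driven entirely by token availability, exactly as the bullets above describe.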
Dataflow Architectures
A small set of data flow operators can be used to define a programming language
Dataflow Models
• Static:
• Allows only one instance of a node to be enabled for firing.
• Uses conventional memory addresses as data dependency tags.
• These machines did not allow multiple instances of the same routine to be executed simultaneously,
because the simple tags could not differentiate between them.
• Does not support re-entrant code (function calls, loops) or data structures.
• Dynamic:
• Designs that use content-addressable memory (CAM).
• They use tags in memory to facilitate parallelism.
Static Dataflow Architecture
Manchester Dynamic Dataflow Architecture
Dataflow Model Execution
There are three types of execution sequences between modules:
• Batch Sequential
• Pipe and Filter
• Process Control
Process Control Execution
• Process control architecture is a type of data flow architecture in which data is neither batch-sequential
nor a pipe stream.
• In process control architecture, the flow of data comes from a set of variables which control the execution
of the process.
• This architecture decomposes the entire system into subsystems or modules and connects them.
• Process control architecture is suitable for embedded system software design, where the system is
manipulated by process control variable data, and for real-time system software, where it is used to
control automobile anti-lock brakes, nuclear power plants, etc.
• This architecture is also applicable to car cruise control and building temperature control systems.
Dataflow Model Advantages
• Highly parallel by its nature.
• Not constrained by artificial dependencies.
• Provides simpler divisions on subsystems.
• Each subsystem can be an independent program working on input data and producing output data.
Dataflow Model Complications
• It is difficult for a node to proceed when its inputs arrive at different times.
• Some models may require merging of inputs.
• All dataflow models need some control; external control is required for implementation.
• Synchronization: suppose a node receives inputs from two iterations; the relative speed of the
iterations then matters.
• May exhibit high latency and low throughput.
• Does not provide concurrency control or an interactive interface.
Systolic Architectures
• Idea: Data flows from the computer memory in a rhythmic fashion, passing through many processing
elements before it returns to memory.
• Similar to an assembly line of processing elements:
• Different people work on the same car
• Many cars are assembled simultaneously
• Typically fully pipelined: all communication between PEs contains a delay element, and
communication occurs between neighboring PEs only.
• Some processors (especially boundary ones) may be different from the rest.
Systolic Architectures
• Basic Principle:
• Replace one PE with a regular array of PEs and carefully orchestrate flow of data between the
PEs.
• Balance computation and memory bandwidth.
• Differences from pipelining:
• These are individual PEs
• Array structure can be non-linear and multi-dimensional
• PE connections can be multidirectional (and different speed)
• PEs can have local memory and execute kernels (rather than a piece of the
instruction)
• In short, a Systolic Array is
• A specialized form of parallel computing.
• Multiple processors connected by short wires.
• Cells (processors) compute data and store it independently of each other.
• Systolic Cell (unit)
• Each unit is an independent processor
• Every processor has some registers and ALU.
• The cells share information with their neighbors, after performing the
needed operations on the data.
Systolic Architectures
• Characteristics:
• Parallel Computing –
• Many processes are carried out simultaneously. As the arrays have a non-centralized structure,
parallel computing is implemented.
• Pipelinability –
• The array can achieve high speed; it exhibits linear-rate pipelinability.
• Synchronous evaluation –
• Computation of data is timed by a global clock, and the data is then passed through the network. The
global clock synchronizes the array and has fixed-length clock cycles.
• Repeatability –
• Most arrays consist of the repetition and interconnection of a single type of PE throughout the entire
network.
• Spatial Locality –
• The cells have a local communication interconnection.
• Temporal Locality –
• At least one unit of time delay is required for the transmission of signals from one cell to another.
• Modularity and regularity –
• A systolic array consists of processing units that are modular and have homogeneous
interconnections, and the network can be extended indefinitely.
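The characteristics above (a global clock, nearest-neighbor communication, and a regular grid of identical PEs) can be seen in a toy simulation of an n x n output-stationary systolic array performing matrix multiplication; the skewed injection of A's rows and B's columns is the standard arrangement, and the function name is an illustrative assumption.

```python
def systolic_matmul(A, B):
    # Simulate an n x n grid of PEs under one global clock. Each PE keeps
    # a running sum C[i][j], reads an A value from its left neighbor and
    # a B value from its upper neighbor, then forwards both onward --
    # local communication with one unit of delay per hop.
    n = len(A)
    a_reg = [[0] * n for _ in range(n)]   # value each PE forwards rightward
    b_reg = [[0] * n for _ in range(n)]   # value each PE forwards downward
    C = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):            # enough ticks to drain the array
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Boundary PEs receive skewed input: row i of A is delayed
                # by i ticks, column j of B by j ticks.
                a_in = a_reg[i][j - 1] if j > 0 else \
                    (A[i][t - i] if 0 <= t - i < n else 0)
                b_in = b_reg[i - 1][j] if i > 0 else \
                    (B[t - j][j] if 0 <= t - j < n else 0)
                C[i][j] += a_in * b_in    # output-stationary accumulation
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

The skew guarantees that A[i][k] and B[k][j] meet at PE (i, j) on the same tick, so the correct product accumulates with nothing but rhythmic nearest-neighbor transfers.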