Aca Mod4&5

The document discusses multiprocessor systems and cache coherence mechanisms. It covers multiprocessor interconnect topologies and caching hierarchies. It also describes the cache coherence problem that can result in inconsistent data copies across caches. Common cache coherence protocols like snoopy bus and directory-based approaches are examined. Write policies involving write-back, write-through with write-invalidate or write-update are discussed. Cache coherence states and the snoopy bus write-invalidate protocol are outlined.

Uploaded by

Radhika K R


Advanced Computer Architecture

Chapter 7
Multiprocessors and Multicomputers
Book: “Advanced Computer Architecture – Parallelism, Scalability, Programmability”, Hwang & Jotwani

Source diginotes.in Save the earth. Go paperless

In this chapter…

• Multiprocessor System Interconnects

• Cache Coherence and Synchronization Mechanisms
• Three Generations of Multi-computers
• Message Routing Schemes

Dr. Vasanthakumar G U, Department of CSE, Cambridge Institute of Technology
MULTIPROCESSOR SYSTEM INTERCONNECTS


• Network Characteristics
o Topology
• Dynamic Networks
o Timing control protocol
• Synchronous (with global clock)
• Asynchronous (with handshake or interlocking mechanism)
o Switching method
• Circuit switching
• Packet switching
o Control Strategy
• Centralized (global controller to receive requests from all devices and grant network access)
• Distributed (requests handled by local devices independently)


• Hierarchical Bus System

o Local Bus (board level)
• Memory bus, data bus
o Backplane Bus (backplane level)
• VME bus (IEEE 1014-1987), Multibus II (IEEE 1296-1987), Futurebus+ (IEEE 896.1-1991)
o I/O Bus (I/O level)
o E.g. Encore Multimax multiprocessor's Nanobus
• 20 slots
• 32-bit address path
• 64-bit data path
• Clock rate: 12.5 MHz
• Total Memory bandwidth: 100 Megabytes per second
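
The Nanobus figures above are mutually consistent, which a short calculation can confirm. This is a minimal sketch that assumes one full-width data transfer per clock cycle; the function name is illustrative and not from the source.

```python
def bus_bandwidth_mb_per_s(data_path_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in megabytes per second.

    Assumption: one transfer of the full data path width per clock cycle.
    """
    bytes_per_transfer = data_path_bits / 8
    return bytes_per_transfer * clock_mhz


# 64-bit data path at 12.5 MHz, as listed for the Nanobus
nanobus_bandwidth = bus_bandwidth_mb_per_s(data_path_bits=64, clock_mhz=12.5)
print(nanobus_bandwidth)
```

With an 8-byte data path and 12.5 million cycles per second, the result matches the 100 Megabytes per second quoted on the slide.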


• Hierarchical Buses and Caches

o Cache Levels
• First level caches
• Second level caches
o Buses
• (Intra) Cluster Bus
• Inter-cluster bus
o Cache coherence
• Snoopy cache protocol for coherence among first level caches of same cluster
• Inter-cluster cache coherence controlled among second level caches, with results passed to first level caches
o Use of Bridges between multiprocessor clusters


• Crossbar Switch Design

o Based on number of network stages
• Single stage (or recirculating) networks
• Multistage networks
o Blocking networks
o Non-blocking (re-arranging) networks
• Crossbar networks
o n × m and n² cross-point switch design
o Crossbar benefits and limitations

• Multiport Memory Design

o Multiport Memory

CACHE COHERENCE MECHANISMS

• Cache Coherence Problem

o Inconsistent copies of same memory block in different caches
o Sources of inconsistency:
• Sharing of writable data
• Process migration
• I/O activity

• Protocol Approaches
o Snoopy Bus Protocol
o Directory Based Protocol

• Write Policies
o (Write-back, Write-through) x (Write-invalidate, Write-update)


• Snoopy Bus Protocols

o Write-through caches
• Write invalidate coherence protocol for write-through caches
• Write-update coherence protocol for write-through caches
• Data item states:
o VALID
o INVALID
• Possible operations:
o Read by same processor: R(i); read by a different processor: R(j)
o Write by same processor: W(i); write by a different processor: W(j)
o Replace by same processor: Z(i); replace by a different processor: Z(j)


• Snoopy Bus Protocols

o Write-through caches – write invalidate scheme

Operation   New State (from Valid)   New State (from Invalid)
R(i)        Valid                    Valid
W(i)        Valid                    Valid
Z(i)        Invalid                  Invalid
R(j)        Valid                    Invalid
W(j)        Invalid                  Invalid
Z(j)        Valid                    Invalid
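
The state table above can be read as a small finite-state machine for one block in the cache of processor i. The following is a minimal sketch of that reading; the dictionary layout and the helper name `run` are illustrative choices, not part of the source.

```python
VALID, INVALID = "Valid", "Invalid"
OPS = ["R(i)", "W(i)", "Z(i)", "R(j)", "W(j)", "Z(j)"]

# Transition table copied from the slide: current state -> operation -> new state
WRITE_THROUGH = {
    VALID: dict(zip(OPS, [VALID, VALID, INVALID, VALID, INVALID, VALID])),
    INVALID: dict(zip(OPS, [VALID, VALID, INVALID, INVALID, INVALID, INVALID])),
}


def run(ops, state=INVALID):
    """Apply a sequence of operations to one cache block and return its final state."""
    for op in ops:
        state = WRITE_THROUGH[state][op]
    return state


# A local read loads the block; a write by another processor invalidates it.
print(run(["R(i)", "W(j)"]))
# Remote reads and remote replacements leave a valid local copy untouched.
print(run(["W(i)", "R(j)", "Z(j)"]))
```

Only a remote write W(j) or a local replacement Z(i) moves a valid block to Invalid, which is the defining behaviour of a write-invalidate scheme.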

• Snoopy Bus Protocols

o Write-back caches
• Ownership protocol: write-invalidate coherence protocol for write-back caches
• Data item states:
o RO : Read Only (Valid state)
o RW : Read Write (Valid state)
o INV : Invalid state
• Possible operations:
o Read by same processor: R(i); read by a different processor: R(j)
o Write by same processor: W(i); write by a different processor: W(j)
o Replace by same processor: Z(i); replace by a different processor: Z(j)


• Snoopy Bus Protocols

o Write-back caches – write invalidate (ownership protocol) scheme

Operation   New State (from RO)   New State (from RW)   New State (from INV)
R(i)        RO                    RW                    RO
W(i)        RW                    RW                    RW
Z(i)        INV                   INV                   INV
R(j)        RO                    RO                    INV
W(j)        INV                   INV                   INV
Z(j)        RO                    RW                    INV
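
To see how the ownership table above plays out across several caches, the sketch below applies the "(i)" column to the processor issuing an operation and the "(j)" column to every other cache. The `step` helper and the two-processor trace are illustrative assumptions, not taken from the source.

```python
RO, RW, INV = "RO", "RW", "INV"
OPS = ["R(i)", "W(i)", "Z(i)", "R(j)", "W(j)", "Z(j)"]

# Transition table copied from the slide: current state -> operation -> new state
OWNERSHIP = {
    RO: dict(zip(OPS, [RO, RW, INV, RO, INV, RO])),
    RW: dict(zip(OPS, [RW, RW, INV, RO, INV, RW])),
    INV: dict(zip(OPS, [RO, RW, INV, INV, INV, INV])),
}


def step(states, proc, op):
    """Processor `proc` performs op ('R', 'W' or 'Z'); all other caches snoop it."""
    return [
        OWNERSHIP[state][f"{op}(i)" if p == proc else f"{op}(j)"]
        for p, state in enumerate(states)
    ]


states = [INV, INV]  # one shared block, two caches
trace = []
for proc, op in [(0, "R"), (1, "R"), (1, "W"), (0, "R")]:
    states = step(states, proc, op)
    trace.append(list(states))

for snapshot in trace:
    print(snapshot)
```

In this trace the write by processor 1 takes ownership (RW) and invalidates the other copy, and the later read by processor 0 demotes the owner back to RO, so at most one cache holds the block in RW at any time.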

• Snoopy Bus Protocols

o Write-once Protocol
• First write using write-through policy
• Subsequent writes using write-back policy
• In both cases, data item copy in remote caches is invalidated
• Data item states:
o Valid: cache block is consistent with the main memory copy
o Reserved: block has been written exactly once and is consistent with the main memory copy
o Dirty: block has been written more than once and is not consistent with the main memory copy
o Invalid: block is not found in the cache or is inconsistent with the main memory copy


• Snoopy Bus Protocols

o Write-once Protocol
• Cache events and actions:
o Read-miss
o Read-hit
o Write-miss
o Write-hit
o Block replacement
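
The slide lists the cache events without their resulting states. The sketch below fills them in under an assumption: it follows the commonly described behaviour of the write-once protocol (first write is written through and leaves the block Reserved, later writes leave it Dirty, a write-miss ends in Dirty). The event names and both functions are hypothetical labels for illustration.

```python
VALID, RESERVED, DIRTY, INVALID = "Valid", "Reserved", "Dirty", "Invalid"


def local_event(state, event):
    """New state of a block after an event issued by the local processor."""
    if event == "read-hit":
        return state                      # no state change on a hit
    if event == "read-miss":
        return VALID                      # block loaded, consistent with memory
    if event == "write-hit":
        # First write goes through to memory (Reserved); later writes stay local (Dirty).
        return RESERVED if state == VALID else DIRTY
    if event == "write-miss":
        return DIRTY                      # assumed: block fetched, other copies invalidated
    if event == "replace":
        return INVALID                    # block leaves the cache
    raise ValueError(f"unknown event: {event}")


def snooped_event(state, event):
    """New state of a local block after observing a remote processor's bus action."""
    if state == INVALID:
        return INVALID
    if event == "remote-read":
        return VALID                      # a Reserved or Dirty copy becomes shared again
    if event == "remote-write":
        return INVALID                    # every write invalidates remote copies
    raise ValueError(f"unknown event: {event}")


state = INVALID
history = []
for kind, event in [("local", "read-miss"), ("local", "write-hit"),
                    ("local", "write-hit"), ("snoop", "remote-read"),
                    ("snoop", "remote-write")]:
    state = local_event(state, event) if kind == "local" else snooped_event(state, event)
    history.append(state)
print(history)
```

The write-back of a Dirty block to memory on replacement or on a remote read is a side effect that this state-only sketch does not model.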


• Directory Protocol

• Protocol Performance issues
o Snoopy Cache Protocol Performance determinants:
• Workload Patterns
• Implementation Efficiency
o Goals/Motivation behind using snooping mechanism
• Reduce bus traffic
• Reduce effective memory access time
o Data Pollution Point
• Miss ratio decreases as block size increases, up to a data pollution point (as blocks become larger, the probability of finding a desired data item in the cache increases).
• Beyond the data pollution point, the miss ratio starts to increase as the block size increases further.
o Ping-Pong effect on data shared between multiple caches
• If two processes update a data item alternately, data will continually migrate between two caches
with high miss-rate

THREE GENERATIONS OF MULTICOMPUTERS

• Multicomputer v/s Multiprocessor

• Design Choices for Multi-computers
o Processors
• Low cost commodity (off-the-shelf) processors
o Memory Structure
• Distributed memory organization
• Local memory with each processor
o Interconnection Schemes
• Message-passing, point-to-point direct networks with send/receive semantics, with or without uniform message communication speed
o Control Strategy
• Asynchronous MIMD, MPMD and SPMD operations


• The Past, Present and Future Development

o First Generation
• Example Systems: Caltech’s Cosmic Cube, Intel iPSC/1, Ametek S/14, nCube/10
o Second Generation
• Example Systems: iPSC/2, i860, Delta, nCube/2, Supernode 1000, Ametek Series 2010
o Third Generation
• Example Systems: Caltech’s Mosaic C, J-Machine, Intel Paragon
o First and second generation multi-computers are regarded as medium-grain systems
o Third generation multi-computers were regarded as fine-grain systems.
o Fine-grain and shared memory approach can, in theory, combine the relative merits of
multiprocessors and multi-computers in a heterogeneous processing environment.

Attribute                                1st Generation   2nd Generation   3rd Generation
Typical Node Attributes
  MIPS                                   1                10               100
  MFLOPS (scalar)                        0.1              2                40
  MFLOPS (vector)                        10               40               200
  Memory Size (in MB)                    0.5              4                32
Typical System Attributes
  Number of Nodes (N)                    64               256              1024
  MIPS                                   64               2560             100 K
  MFLOPS (scalar)                        6.4              512              40 K
  MFLOPS (vector)                        640              10 K             200 K
  Memory Size (in MB)                    32               1 K              32 K
Communication Latency
  Local Neighbour (in microseconds)      2000             5                0.5
  Non-local node (in microseconds)       6000             5                0.5
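
The system attributes in the table above follow from multiplying each typical node attribute by the number of nodes N. A minimal sketch of that check, with dictionary keys chosen here for illustration:

```python
# Typical node attributes per generation, copied from the table
NODE = {
    "1st": {"mips": 1, "mflops_scalar": 0.1, "mflops_vector": 10, "memory_mb": 0.5},
    "2nd": {"mips": 10, "mflops_scalar": 2, "mflops_vector": 40, "memory_mb": 4},
    "3rd": {"mips": 100, "mflops_scalar": 40, "mflops_vector": 200, "memory_mb": 32},
}
NODES = {"1st": 64, "2nd": 256, "3rd": 1024}


def system_attributes(generation):
    """Scale every node attribute by the number of nodes in that generation."""
    n = NODES[generation]
    return {name: value * n for name, value in NODE[generation].items()}


for generation in NODE:
    print(generation, system_attributes(generation))
```

The products for the third generation (for example 1024 × 100 = 102,400 MIPS) are rounded in the table to 100 K, 40 K, 200 K and 32 K.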
MESSAGE PASSING SCHEMES

• Message Routing Schemes

• Message Formats
o Messages
o Packets
o Flits (Flow Control Digits)
• Data Only Flits
• Sequence Number
• Routing Information

• Store-and-forward routing
• Wormhole routing


• Asynchronous Pipelining


• Latency Analysis
o L: Packet length (in bits)
o W: Channel Bandwidth (in bits per second)
o D: Distance (number of nodes traversed minus 1)
o F: Flit length (in bits)
o Communication Latency in Store-and-forward Routing
• $T_{SF} = \frac{L}{W}(D + 1)$
o Communication Latency in Wormhole Routing
• $T_{WH} = \frac{L}{W} + \frac{F}{W} D$
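
The two latency formulas above can be compared numerically. The sketch below implements them directly; the sample values for L, W, D and F are hypothetical and chosen only to show the effect of distance.

```python
def store_and_forward_latency(L, W, D):
    """T_SF = (L / W) * (D + 1): the whole packet is retransmitted at every hop."""
    return L * (D + 1) / W


def wormhole_latency(L, W, D, F):
    """T_WH = L / W + (F / W) * D: only the header flit pays the per-hop cost."""
    return L / W + F * D / W


# Hypothetical example: 1000-bit packet, 1000 bits/s channel, distance 4, 10-bit flits
t_sf = store_and_forward_latency(L=1000, W=1000, D=4)
t_wh = wormhole_latency(L=1000, W=1000, D=4, F=10)
print(t_sf, t_wh)
```

Because F is much smaller than L, the distance term contributes little to the wormhole latency, while the store-and-forward latency grows in proportion to D + 1.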

Advanced Computer Architecture

Chapter 8
Multivector and SIMD Computers

In this chapter…

• Vector Processing Principles

• Compound Vector Operations
• Vector Loops and Chaining
• SIMD Computer Implementation Models

VECTOR PROCESSING PRINCIPLES

• Vector Processing Definitions

o Vector
o Stride
o Vector Processor
o Vector Processing
o Vectorization
o Vectorizing Compiler or Vectorizer

• Vector Instruction Types

o Vector-vector instructions
o Vector-scalar instructions
o Vector-memory instructions


• Vector-Vector Instructions
o F1: Vi → Vj
o F2: Vi × Vj → Vk
o Examples: V1 = sin(V2), V3 = V1 + V2

• Vector-Scalar Instructions
o F3: s × Vi → Vj
o Examples: V2 = 6 + V1

• Vector-Memory Instructions
o F4: M → V (Vector Load)
o F5: V → M (Vector Store)
o Examples: X = V1, V2 = Y


• Vector Reduction Instructions

o F6: Vi → s
o F7: Vi × Vj → s

• Gather and Scatter Instructions

o F8: M → Vi × Vj (Gather)
o F9: Vi × Vj → M (Scatter)

• Masking
o F10: Vi × Vm → Vj (Vm is a binary vector)

• Examples…
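
As one possible set of examples for the instruction types above, the sketch below mimics each mapping with plain Python lists standing in for vector registers and memory. The data values, the index vector and the use of compression for the masking instruction are assumptions made for illustration.

```python
import math

V1 = [1.0, 2.0, 3.0, 4.0]
V2 = [10.0, 20.0, 30.0, 40.0]
M = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]        # stand-in for main memory

f1 = [math.sqrt(x) for x in V1]             # vector-vector, one operand
f2 = [a + b for a, b in zip(V1, V2)]        # vector-vector, two operands
f3 = [6 + x for x in V1]                    # vector-scalar: V2 = 6 + V1

f6 = sum(V1)                                # reduction of one vector to a scalar
f7 = sum(a * b for a, b in zip(V1, V2))     # reduction of two vectors (dot product)

index = [4, 0, 2]                           # index vector used by gather and scatter
gathered = [M[i] for i in index]            # gather: fetch M[index[k]] into a vector

scattered = list(M)                         # scatter: store vector elements at M[index[k]]
for i, value in zip(index, [100.0, 200.0, 300.0]):
    scattered[i] = value

mask = [1, 0, 1, 0]                         # binary mask vector Vm
compressed = [x for x, m in zip(V1, mask) if m]   # masking: keep elements where the mask is 1

print(f2, f3, f6, f7, gathered, scattered, compressed)
```

Gather and scatter are what make sparse or irregularly indexed data usable in vector registers, since the index vector decouples register positions from memory addresses.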


• Vector-Access Memory Schemes

o Vector-operand Specifications
• Base address, stride and length
o C-Access Memory Organization
• Low-order m-way interleaved memory
o S-access Memory Organizations
• High-order m-way interleaved memory
o C/S Access Memory Organization

• Early Supercomputers (Vector Processors)

o Cray Series, ETA 10E, NEC SX-X 44
o CDC Cyber, Fujitsu VP2600, Hitachi 820/80
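
For the vector-access memory schemes listed above, the difference between low-order and high-order interleaving comes down to which address bits select the memory module. The sketch below shows both mappings and which modules a strided vector access touches under low-order interleaving; the function names and the 8-module configuration are assumptions.

```python
def low_order_interleave(address, m):
    """Low-order interleaving: consecutive addresses fall in consecutive modules."""
    return address % m, address // m          # (module, offset within module)


def high_order_interleave(address, words_per_module):
    """High-order interleaving: consecutive addresses stay in the same module."""
    return address // words_per_module, address % words_per_module


def modules_touched(base, stride, length, m):
    """Modules accessed by a vector operand (base, stride, length) with low-order interleaving."""
    return [low_order_interleave(base + k * stride, m)[0] for k in range(length)]


print(modules_touched(base=0, stride=1, length=8, m=8))   # spreads over all modules
print(modules_touched(base=0, stride=8, length=4, m=8))   # every access hits module 0
```

A unit stride spreads accesses across all modules, whereas a stride equal to the number of modules sends every access to the same module and serializes them.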


• Relative Vector/Scalar Performance

o Vector/scalar speed ratio r
o Vectorization ratio in program f
o Relative Performance P is given by:
• $P = \frac{1}{(1 - f) + f/r} = \frac{r}{(1 - f)\,r + f}$
o When f is low, the speedup cannot be high even with very high r
o Limiting Case:
• P → 1 as f → 0
o Maximum Case:
• P → r as f → 1
o Powerful single chip processors and multicore system-on-a-chip provide High-Performance
Computing (HPC) using MIMD and/or SPMD configurations with large no. of processors.
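
The relative performance formula above is easy to evaluate for a few values of f and r. This is a minimal sketch; the sample values are hypothetical.

```python
def relative_performance(f, r):
    """P = 1 / ((1 - f) + f / r) for vectorization ratio f and vector/scalar speed ratio r."""
    return 1.0 / ((1.0 - f) + f / r)


print(relative_performance(0.0, 10))      # no vectorization: no speedup
print(relative_performance(1.0, 10))      # full vectorization: speedup approaches r
print(relative_performance(0.5, 1000))    # half vectorized: stays below 2 despite a huge r
print(relative_performance(0.9, 10))
```

With f = 0.5 the speedup stays below 2 however large r becomes, which is why the vectorization ratio matters more than the raw vector speed.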

COMPOUND VECTOR PROCESSING

• Compound Vector Operations

o Compound Vector Functions (CVFs)
• Composite function of vector operations converted from a looping structure of linked scalar
operations
o CVF Example: The SAXPY (Single-precision A multiply X Plus Y) Code
• For I = 1 to N
o Load R1, X(I)
o Load R2, Y(I)
o Multiply R1, A
o Add R2, R1
o Store Y(I), R2
• (End of Loop)
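
The scalar loop above and its compound vector form compute the same result, Y(I) = A × X(I) + Y(I). The sketch below mirrors the five scalar steps and then expresses the whole loop as a single element-wise operation; the function names are illustrative.

```python
def saxpy_scalar(a, x, y):
    """Follows the scalar loop step by step: load, load, multiply, add, store."""
    y = list(y)
    for i in range(len(x)):
        r1 = x[i]          # Load R1, X(I)
        r2 = y[i]          # Load R2, Y(I)
        r1 = r1 * a        # Multiply R1, A
        r2 = r2 + r1       # Add R2, R1
        y[i] = r2          # Store Y(I), R2
    return y


def saxpy_vector(a, x, y):
    """The same computation written as one compound vector function."""
    return [a * xi + yi for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
print(saxpy_scalar(2.0, x, y))
print(saxpy_vector(2.0, x, y))
```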


• One-dimensional CVF Examples

o V1(I) = V2(I) + V3(I) × V4(I)
o V1(I) = B(I) + C(I)
o A(I) = V1(I) × S + B(I)
o A(I) = V1(I) + B(I) + C(I)
o A(I) = Q × V1(I) × (R × B(I) + C(I)), etc.
Legend:
o Vi(I) are vector registers
o A(I), B(I), C(I) are vectors in memory
o Q, R, S are scalars available from scalar registers or memory


• Vector Loops
o Vector segmentation or strip-mining approach
o Example

• Vector Chaining
o Example: SAXPY code
• Limited Chaining using only one memory-access pipe in Cray 1
• Complete Chaining using three memory-access pipes in Cray X-MP

• Functional Unit Independence

• Vector Recurrence
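
Strip-mining, mentioned under Vector Loops above, splits a long vector into segments that fit the vector registers. The sketch below applies it to SAXPY; the maximum vector length of 64 is an assumption used for illustration, and a real compiler may order the short leftover segment differently.

```python
def segment_lengths(n, max_vector_length=64):
    """Lengths of the segments a vector of n elements is split into."""
    return [min(max_vector_length, n - start) for start in range(0, n, max_vector_length)]


def strip_mined_saxpy(a, x, y, max_vector_length=64):
    """SAXPY processed one register-sized segment at a time."""
    result = []
    for start in range(0, len(x), max_vector_length):
        xs = x[start:start + max_vector_length]
        ys = y[start:start + max_vector_length]
        # Each pass stands for one set of vector instructions on a full or partial segment.
        result.extend(a * xi + yi for xi, yi in zip(xs, ys))
    return result


print(segment_lengths(150))
print(strip_mined_saxpy(2, list(range(150)), [1] * 150)[:5])
```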

SIMD COMPUTER ORGANIZATIONS

• SIMD Computer Variants

o Array Processor
o Associative Processor
• SIMD Processor v/s SISD v/s Vector Processor Operation
o Illustration: for(i=0;i<5;i++) a[i] = a[i]+2;
o Lockstep mode of operation in SIMD processor
o Relative Performance comparison
• SIMD Implementation Models
o Distributed Memory Model
• E.g. Illiac IV
o Shared memory Model
• E.g. BSP (Burroughs Scientific Processor)


• SIMD Instructions
o Scalar Operations
• Arithmetic/Logical
o Vector Operations
• Arithmetic/Logical
o Data Routing Operations
• Permutations, broadcasts, multicasts, rotation and shifting
o Masking Operations
• Enable/Disable PEs
• Host and I/O
• Bit-slice and Word-slice Processing
o WSBS, WSBP, WPBS, WPBP

Advanced Computer Architecture

Chapter 9
…Dataflow Architectures

In this chapter…

• Evolution of Dataflow Computers

• Dataflow Graphs
• Static v/s Dynamic Data Flow Computers
• Pure Dataflow Machines
• Explicit Token Store Machines
• Hybrid and Unified Architectures
• Dataflow v/s Control flow Computers

DATAFLOW AND HYBRID ARCHITECTURES

• Data-driven machines
• Evolution of Dataflow Machines
• Dataflow Graphs
o Dataflow Graphs examples.
o Activity Templates and Activity Store
o Example: dataflow graph for cos x
• $\cos x \approx 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!}$
o More examples
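
The truncated series for cos x above can be checked against the library cosine. In this sketch the intermediate value `x2` plays the role of a result token shared by several nodes of the dataflow graph; the function name is illustrative.

```python
import math


def cos_series(x):
    """Four-term approximation: 1 - x^2/2! + x^4/4! - x^6/6!."""
    x2 = x * x                       # computed once, reused by the higher-order terms
    return (1
            - x2 / math.factorial(2)
            + x2 ** 2 / math.factorial(4)
            - x2 ** 3 / math.factorial(6))


print(cos_series(0.5), math.cos(0.5))
```

The three terms after the constant depend only on x², so a dataflow machine can compute them in parallel once that token is available.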


• Static Dataflow Computers

o Special Feedback Acknowledge Signals between nodes
o Firing Rule:
• A node is enabled as soon as tokens are present on all of its input arcs and there is no token on any of its output arcs
o Example: Dennis Machine (1974)

• Dynamic Dataflow Computers

o Tagged Tokens
o Firing Rule:
• A node is enabled as soon as tokens with identical tags are present at each of its input arcs
o Example: MIT Tagged Token Dataflow Architecture (TTDA) machine (just simulation, never built)
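
The two firing rules above differ only in what counts as a matching set of inputs. The sketch below states each rule as a predicate; representing an arc as a single slot (static) or as a tag-to-token dictionary (dynamic) is a modelling assumption.

```python
def static_enabled(input_arcs, output_arcs):
    """Static rule: a token on every input arc and no token on any output arc.

    Each arc holds at most one token; None means the arc is empty.
    """
    return (all(token is not None for token in input_arcs)
            and all(token is None for token in output_arcs))


def dynamic_enabled_tags(input_arcs):
    """Dynamic rule: the node can fire once for every tag present on all input arcs.

    Each arc is a dict mapping a tag to the token carrying that tag.
    """
    if not input_arcs:
        return set()
    common = set(input_arcs[0])
    for arc in input_arcs[1:]:
        common &= set(arc)
    return common


print(static_enabled([3, 4], [None]))                         # enabled
print(static_enabled([3, 4], [7]))                            # blocked: output arc occupied
print(dynamic_enabled_tags([{0: "a0", 1: "a1"}, {1: "b1"}]))  # only tag 1 is matched
```

Tagged tokens let several instances of the same node, such as different loop iterations, be in flight at once, which the one-token-per-arc static rule does not allow.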


• Diagrams of static dataflow and dynamic dataflow

• from Hwang and Briggs….


• Pure Dataflow Machines

o TTDA (1983)
• TTDA was simulated but never built
o Manchester Dataflow Computer (1982)
• Operated asynchronously using a separate clock for each PE
o ETL Sigma-1 (1987)
• 128 PEs fully synchronous with a 10-MHz clock
• Implemented the I-structure memory proposed in TTDA
• Lacked High Level Languages for users


• Explicit Token Store Machines

o Eliminate associative token matching.
o Waiting token memory is directly accessed using full/empty bits.
o Examples
• MIT/Motorola Monsoon (proposed 1988; operational 1991)
o Multithreading support
o 8 processors
o 8 I-structure memory modules
o 8x8 crossbar network
• ETL EM-4 (1989)
o Extension of Sigma-1
o Proposed 1024 nodes; Operational Implementation 80 nodes


• Hybrid and Unified Architectures

o Combining Features of von-Neumann and Dataflow architectures
o Examples:
• MIT P-RISC (1988)
• IBM Empire (1991)
• MIT/Motorola *T (1991)
o “RISC-ified” dataflow architecture
• Implemented in P-RISC
• Splitting complex dataflow instructions into separate simple component instructions
• Tighter encoding and longer threads for better performance

• Dataflow Processing v/s Control Flow Processing


• Computing ab + a/c with:

(a) control flow; (b) dataflow. Pure dataflow basic execution pipeline: (c) single-token-per-arc dataflow;
(d) tagged-token dataflow; (e) explicit token store dataflow
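
The dataflow computation of ab + a/c named in the caption above can be imitated with a small data-driven evaluator: a node fires as soon as all of its input tokens exist, with no program counter imposing an order. The graph encoding, node names and `run_dataflow` helper are hypothetical and stand in for the figure, which did not survive extraction.

```python
import operator

# Each node: (operation, names of the tokens it consumes)
GRAPH = {
    "mul": (operator.mul, ("a", "b")),
    "div": (operator.truediv, ("a", "c")),
    "add": (operator.add, ("mul", "div")),
}


def run_dataflow(graph, inputs):
    """Fire every enabled node in waves until all nodes have produced a token."""
    tokens = dict(inputs)
    pending = dict(graph)
    waves = []
    while pending:
        ready = sorted(name for name, (_, sources) in pending.items()
                       if all(source in tokens for source in sources))
        if not ready:
            raise ValueError("no node can fire: missing input tokens")
        results = {}
        for name in ready:
            function, sources = pending.pop(name)
            results[name] = function(*(tokens[source] for source in sources))
        tokens.update(results)
        waves.append(ready)
    return tokens, waves


tokens, waves = run_dataflow(GRAPH, {"a": 6, "b": 2, "c": 3})
print(tokens["add"], waves)
```

The multiply and divide nodes fire in the same wave because neither depends on the other, whereas a control flow program would execute them one after the other.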
