Isscc2021 T8

On-Chip Interconnects:

Basic Concepts, Designs


and Future Opportunities

Yvain Thonnart
CEA-LIST

Live Q&A Session: Feb. 13, 2021, 7:20-7:40 am, PST


Yvain Thonnart T8: On-Chip Interconnects: Basic Concepts, 1 of 100
Designs and Future Opportunities
Self-introduction
 Yvain Thonnart

 French “Grandes Écoles” higher education


 M. Sc École Polytechnique in 2005
 Engineering degree Télécom ParisTech in 2005
 Joined the Technological Research Division
of CEA, France in 2005
 15 years in CEA-LETI institute
 Now within CEA-LIST institute
 My interests are in on-chip communication
and integration with new technologies
 Asynchronous logic
 Photonics
 Cryoelectronics

Motivation
 With technology advances, computation logic in itself has become very efficient and integrated, with massive parallelism, especially for AI
 Data movement is crucial in the performance of systems
 High throughput interconnect fabrics across the chip need to achieve low latency in a low power budget.
[Figure: Cerebras CS-1 [K. Rocki et al., SC '20]. Photo: Cerebras Systems]

Outline
 Background and motivation
 Architectural level
 Communication patterns
 Topologies & switching schemes
 Traffic regulation
 Microarchitecture level
 Pipelining
 Routing, Arbitration
 Size & frequency conversion
 Circuit level
 Clock domains
 Resynchronization
 Clock-less pipelines
 3D-chip integration for on-chip communication
 Summary & conclusions
Background: CPU, SoC, ML chip needs

 CPU, IBM, ISSCC’20
 Up to 25mm (21.7km signal wire)
 Up to 1200b wide
 Cache miss latency
 SoC, Mediatek, ISSCC’20
 Multiple interconnects
 Multiple domains
 Diversity, latency, power
 ML, KAIST, ISSCC’20
 Streamed interconnect
 Mesh structure
 Semi-static
 Latency, power

 Objectives: throughput, latency, power


 Challenges: complexity & distance, multiple communication patterns
Background: the cost of moving data
Level                    Local            Cluster       Chip-level     External I/O
e.g.                     L1 cache         L2 cache      L3/LLC         Main memory
Distance                 ~0.1 mm          ~1 mm         ~10 mm         ~100 mm
Latency                  ~4 cycles        ~12 cycles    ~36 cycles     ~100 ns
Power (energy per bit)   ~0.3-1 pJ/bit*   ~1-3 pJ/bit   ~3-10 pJ/bit   ~10-30 pJ/bit
*Technology dependent

Each level outward costs roughly x10 in distance and x3 in latency and power.
[Figure: four cores with private L1 caches, two shared L2 caches, one L3 and external memory (MEM)]

Interconnects:
from metal to fabric

Interconnects: bare metal line ?

Tx Rx

 Bare metal line : single-ended


 Need common reference or differential signaling to get a proper voltage
 Resistive or capacitive load
 Parasitic resistance & capacitance
 Can create large delays, can exhibit large crosstalk

Interconnects: transmission lines
Analog approach: control the environment

Tx Zc Rx

 Resistive termination for matched line impedance


 No capacitive load
 Wave propagation → reduced delay
 Analog domain → low swing
 Reflections must be avoided; requires custom design

Interconnects: buffered wiring
Digital approach: break long wires and restore signal

Tx Rx

 Use CMOS inverters to restore signal


 Shorter wires between inverters
 Lower capacitance
 Lower crosstalk
 Full-swing signal at receiver

Interconnects: beyond point-to-point
Digital approach: bring more functionality with digital logic

Tx Rx

Tx Rx

 e.g. multiplex data over a shared line


 Digital on-chip communication enables bus architectures

On-chip interconnect: system view

Core0 CoreN Accelerator Memory I/O

Interconnect

 The whole communication infrastructure connecting all chip modules together


 Multiple interface protocols
 Multiple clock & power domains

Interconnects: from metal to networks
 “Interconnect” for technologists designates the metal lines above the CMOS

 “Interconnect” for SoC engineers is the whole communication infrastructure

 A diversity of design and architecture solutions


 Transmission Lines: bare metal, complexity at the ends
 Buffered wiring: repeater logic all along the line
 Buses: shared medium to connect several instances
 Networks on chip (NoC): pipelined topologies with routing and arbitration

Some terminology to be clarified in next slides

Interconnects:
from bus to NoC

Interconnect/bus/NoC?: terminology
 Interconnect:
 Communication infrastructure between modules

 Network on chip (NoC):


 A structure of interconnect with switches (routers) between modules

 Bus:
 As a system function:
Command - Address - Data protocol between system modules
 As a structure: “Shared Bus”
interconnect with shared command/address/data lines

Bus and/or NoC: what is a “bus” ?
 Shared “bus” as a structure
 From board-level integration
 Historically tri-state drivers (high impedance to disconnect)
[Figure: an initiator (e.g. processor) with an FSM, and targets 1 to N (e.g. memory) with address decoders, all attached to shared Data, Address and Command lines]

 Issues:
 No buffering possible
 Long wiring
 Large capacitance

Bus and/or NoC: what is a “bus” ?
 Shared “Bus” as a structure : basic on-chip implementation
 Differentiate Read & Write Data channels
 Replace tri-state buffers by multiplexers

[Figure: initiator (FSM) and targets 1 to N (Decode) connected by Write Data, Address and Command lines, with Read Data multiplexed back to the initiator]

 Issue:
 Long combinational paths → low frequency

Bus and/or NoC: pipelined buses
 High-performance Bus architectures require pipelining → need to queue data
 Straightforward pipelining of previous architecture with First-in-First-out queues
[Figure: initiator (FSM) and targets 1 to N (Decode) connected through queued Write Data, Address, Command and Read Data paths, with arbiters on the shared paths]

Bus and/or NoC: FIFOs and switches
 FIFO: first-in first-out queue
 Incoming data enters the queue when not full
 Data order is preserved inside the queue
 Data is dequeued when consumer is ready

 Routers – switches
 Send data from input ports to appropriate output ports
 Roughly equivalent terms for on-chip interconnects: terminology comes from Internet networks. Preferred use:
 Switch: little or no queuing, ports not directional
 Router: often more queuing, directional ports
[Figure: switch and router symbols; directional ports labeled N, W, E, S]
Bus and/or NoC: buses meet networks
 Local handshake between FIFOs & switches enables complex routing topologies
 Advanced buses are NoCs!
 Interconnect as a general term

[Figure: several Tx and Rx endpoints connected through a network of three switches]

Outline
 Background and motivation
 Architecture level
 Communication patterns
 Topologies
 Switching schemes
 Traffic regulation
 Microarchitecture level
 Circuit level
 3D-chip integration for on-chip communication
 Summary & conclusions

A functional view of interconnects
 Transport data (FIFOs, elastic buffers)
 Pipeline & flow-control
 Route data to the destination (demuxes, routing tables)
 Extract routing information
 Steer data to the appropriate ports
 Handle concurrent data (arbiters, muxes)
 Arbitrate between competing transactions
 Preserve transaction integrity (do not mix data from different transactions)
 Format data
 Convert word sizes (size converters)
 Convert frequencies across clock domains (bi-synchronous FIFOs)
 Convert protocols (network IF, bridges)
Architecture:
Communication patterns

Communication: systolic architectures
 Systolic architectures
 Nearest-neighbor synchronous data transfers between adjacent cells
[Figure: 3x3 array of processing elements (PE) connected to their neighbors]

Communication: dataflow streams
 Systolic architectures
 Nearest-neighbor synchronous data transfers between adjacent cells
 Dataflow architectures
 According to a dataflow graph between tasks
 Static/dynamic data streams
[Figure: example dataflow graph with tasks 0 to 5, mapped onto PE0 to PE5 exchanging streams through the interconnect (e.g. PE2 to PE3, PE2 to PE4)]

Protocols: dataflow streams
 Transaction contains Source and Destination IDs
 Identify data stream
 To route in the interconnect
 To demultiplex at destination
 End-to-end flow control: credits
 To control input queue saturation at destination
 Backward stream to allow new data
 Credit counter maintained in sender
[Figure: data stream Src ID:2→Dst ID:3 sent from PE2 to PE3 through the interconnect, with a backward credit stream for 2→3]
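The end-to-end credit scheme above can be sketched as follows. This is a minimal illustration, not the tutorial's implementation: the class names `CreditSender` and `CreditReceiver`, the dictionary layout of a transaction and the queue depth of 2 are all assumptions made for the example.

```python
from collections import deque


class CreditSender:
    """Sender side of one data stream; holds the credit counter."""

    def __init__(self, src_id, dst_id, initial_credits):
        self.src_id = src_id
        self.dst_id = dst_id
        # One credit per free slot in the destination input queue.
        self.credits = initial_credits

    def try_send(self, payload):
        """Return a transaction, or None when no credit is left."""
        if self.credits == 0:
            return None
        self.credits -= 1
        return {"src": self.src_id, "dst": self.dst_id, "data": payload}

    def on_credit(self, count=1):
        """Backward stream: the destination freed `count` slots."""
        self.credits += count


class CreditReceiver:
    """Destination input queue, sized to the credits granted to the sender."""

    def __init__(self, depth):
        self.depth = depth
        self.queue = deque()

    def accept(self, transaction):
        assert len(self.queue) < self.depth, "sender overran its credits"
        self.queue.append(transaction)

    def consume(self):
        """Pop one transaction; the caller then returns one credit."""
        return self.queue.popleft()


sender = CreditSender(src_id=2, dst_id=3, initial_credits=2)
receiver = CreditReceiver(depth=2)

sent = [sender.try_send(word) for word in ("a", "b", "c")]
for transaction in sent:
    if transaction is not None:
        receiver.accept(transaction)
blocked_third = sent[2] is None   # third word is held back: no credit left

receiver.consume()                # destination frees one slot...
sender.on_credit()                # ...and a credit travels back to the sender
retry = sender.try_send("c")      # now the third word can go
```

The destination queue can never overflow, because the sender stops by itself once its counter reaches zero.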

Communication: memory transactions
 Systolic architectures
 Nearest-neighbor synchronous data transfers between adjacent cells
 Dataflow architectures
 Static/dynamic data streams
 Transaction-based architectures
 Memory-mapped transactions
 Requests & Responses
[Figure: initiators (Core0, DMA, Core1) send requests through the interconnect to the on-chip memory, the off-chip memory interface and the I/O interface, which send responses back]

Protocols: memory transactions
 Bus protocol between cores & interfaces
 Internal NoC protocol routed from addresses & source ID
 Request/Response
 Two independent channels
 Possible split networks
 Logical view: two interconnects
 End-to-end flow control: outstanding transactions
 Multiple requests allowed before first response
 Limited number
 Responses in-order or out-of-order
 Reorder buffer
[Figure: logical view with two interconnects: a request interconnect from the initiator interfaces (Core0, DMA, Core1) to the target interfaces (off-chip memory IF, on-chip memory, I/O interface), and a response interconnect from the targets back to the initiators]

Protocols: memory map coherence
 Data sharing in multi-core architectures
 Uses complex protocols
 Needs more than two channels (request/response), e.g. forward & completion
 For instance:
[Figure: message sequence over time between Core0 (requesting data), Memory (home of data) and Core1 (sharing data): Request, Forward, Response, Completion, Completion]

Communication patterns
 Systolic architectures
 Nearest-neighbor synchronous data transfers between adjacent cells
Minimal interconnect:
 FIFOs between cells
 No contention

 Dataflow architectures
 Static/dynamic data streams
 Transaction-based architectures
 Memory-mapped transactions
 Requests & Responses
Complex interconnect:
 Longer transfers
 Routing and arbitration needed
 Contention
 Different traffic classes

Architecture:
Topologies

Interconnect kinds

 Bipartite: Source ports to destination ports


 A given module can have multiple source and/or destination ports
 Links in the topology are unidirectional
 Request-Response bus-like, Initiators & targets

 Peer-to-Peer: bidirectional communication


 All module interfaces made identical on network side
 For Dataflow architectures or clustered tiles containing initiators & targets
 Can be leveraged with Regular architectures

Bipartite interconnect topologies
 Fully-connected (crossbar)
 Single switch connecting all inputs to all outputs
[Figure: two equivalent drawings (⇔) of a crossbar: direct links from every source (Src) to every destination (Dst), and a single switch between them]

Bipartite interconnect topologies
 Fully-connected (Crossbar)
 Single switch connecting all inputs to all outputs
 Partly-connected
 According to connectivity matrix

      D0   D1   D2
S0    X    X    X
S1    X    ∅    X
S2    ∅    X    X
S3    X    X    ∅

[Figure: sources S0 to S3 wired to destinations D0 to D2 according to the matrix]

Bipartite interconnect topologies
 Fully-connected (Crossbar)
 Single switch connecting all inputs to all outputs
 Partly-connected
 According to connectivity matrix
 Multi-layered
 To limit the design complexity of many-port switches
 Pipelined as necessary
[Figure: sources S0 to S3 connected to destinations D0 to D3 through two layers of smaller switches]

Bipartite interconnect topologies
 Fully-connected (Crossbar)
 Single switch connecting all inputs to all outputs
 Partly-connected
 According to connectivity matrix
 Multi-layered
 To limit the design complexity of many-port switches
 Pipelined as necessary
 Symmetrical/asymmetrical
 For request-response interconnects
 Response network can be mirrored from request network or not
[Figure: request network between S0 to S3 and D0 to D3, and the symmetrical response network between the same ports]

Regular interconnect topologies
 Hierarchical
 from Ethernet networks
 e.g. Star/Fat-Tree
 Distributed
 Suited to tiling
 e.g. Folded Ring (1D), Mesh (2D)

Architecture:
Routing schemes

Routing: circuit-switched interconnects
 For application-specific needs, maintain channels between endpoints
 Circuits established for many cycles
 Separate control signaling
 Dedicated control infrastructure driving the switches
 Most often centralized control
 Quasi-static scheduling or centralized arbitration
 Potential time multiplexing: alternating switch configurations every cycle
[Figure: a scheduling/arbiter block driving the configuration of a switch]

Routing: packet-switched interconnects
 Data grouped in packets
 1 packet : 1 or more data words
 One word is a “Flit” (Flow-control unit)
 e.g.: 1st flit = base address & command, 2nd flit & next = data burst
 Each packet contains routing information in the header flit
 Packet routing is atomic
 No flit interleaving with other packets
 Can span multiple blocks
[Figure: a packet made of an addr header flit followed by four data flits]

Routing: cyclic topologies & deadlocks
 Queuing behind a stalled packet waiting for an output
 Potential trail accumulating
 Invalid routing algorithms may create cycles of stalled packets
 Potential deadlock
 No packet can make progress to destination
 Solved by forbidding some turns
 E.g. X-Y routing: always X first
 No turn from vertical to horizontal
Dependency removal: channels
 Same kind of deadlock could arise from chained dependencies in transactions
 e.g. responses blocked by and blocking new requests
 Need to allow progress of different messages independently from other channels
 Physical channels solution: split networks for each channel
 Virtual channels solution: shared infrastructure, split pipelining
[Figure: on a single shared channel, stalled traffic blocks the messages behind it; with physical or virtual channels, the request channel stays free while the response channel is stalled]
Architecture:
Traffic regulation

Contention: network saturation
 Latency vs network load
 Simultaneous transfers slow down the network
 Depends on architecture
 Topology bottlenecks
 Amount of pipelining in the network
 Depends on traffic patterns
 Routing strategies
 Traffic burstiness
[Figure: average end-to-end latency (ns) versus injection rate at each source (%); latency starts at the zero-load latency at 0% and diverges at the saturation threshold, drawn around ~30%]

Contention: queue sizing
 Wormhole routing
 Low area: minimal queue sizing
 possibly sub-packet size
 Packet can span several routers
 Could block other routes upstream
 Store and Forward
 Full packet (1 or more) queue sizing
 But transmission only when complete packet received
 Cut through
 Full packet (1 or more) queue sizing
 Start transmission as soon as header flit received
[Figure: router chains for each scheme, with links marked stalled, busy or free]

Contention: bandwidth limitation
 Used to avoid saturation by big producers blocking other traffic

 Done in interfaces with initiator cores


 Limit average injection rate by delaying transactions

 Should be used with caution:


 Used to limit latency in the network
 But can create large application latency

 Mathematical traffic model preferred


 Queuing theory

Outline
 Background and motivation
 Architecture level
 Microarchitecture level
 Pipelining
 Routing
 Arbitration
 Size & frequency conversion
 Circuit level
 3D-chip integration for on-chip communication
 Summary & conclusions

Microarchitecture:
Pipelining

Pipelining: backpressure needed
 Register insertion to remove long combinational paths and reduce cycle time
 Requires backward signaling to avoid data loss when stalled
 Issue: Naïve approach creates new long combinational backward paths
 Solutions needed to pipeline the backward signal as well

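[Figure: sender-to-receiver pipeline of registers with enable (En) and valid (V) bits, and a combinational ready/stalled signal running backward through all stages]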

Pipelining: credit-based flow-control
 Forward & backward register insertion to reduce cycle time
 Backward signals are low-level credits that allow the sender to send data
 Twice as much data buffering needed at destination to maintain throughput
[Figure: Lf forward pipeline stages from sender to receiver and Lb backward pipeline stages returning credits; the sender keeps a credit counter, the receiver returns +1 when a slot is consumed, and its queue holds (Lf+Lb) slots]
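A small cycle-level model can illustrate why (Lf+Lb) slots are needed to sustain one word per cycle. This is a sketch under simplifying assumptions: the function name `credit_link_throughput` is hypothetical, the sender always has data, and the receiver consumes every word in the cycle it arrives.

```python
from collections import deque


def credit_link_throughput(lf, lb, slots, cycles=1000):
    """Words delivered per cycle over a credit-controlled pipelined link.

    lf, lb : forward and backward pipeline stages (both >= 1)
    slots  : receiver buffer slots, i.e. initial credits at the sender
    """
    credits = slots
    forward = deque([0] * lf)    # 1 marks a valid word in a pipeline register
    backward = deque([0] * lb)   # 1 marks a credit travelling back
    delivered = 0
    for _ in range(cycles):
        arriving_word = forward.popleft()
        returning_credit = backward.popleft()
        credits += returning_credit
        delivered += arriving_word
        send = 1 if credits > 0 else 0   # sender always has data to send
        credits -= send
        forward.append(send)
        # The receiver consumes at once, so each arrival frees a slot.
        backward.append(arriving_word)
    return delivered / cycles


full_rate = credit_link_throughput(lf=2, lb=2, slots=4)   # slots = Lf + Lb
half_rate = credit_link_throughput(lf=2, lb=2, slots=2)   # slots = Lf only
```

In this model a credit spent at cycle t comes back at cycle t + Lf + Lb, so the throughput is limited to slots/(Lf+Lb) words per cycle when fewer slots are provided.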
Pipelining: elastic buffers
 Depth-2 FIFOs break the backward path without credit counters
 No additional buffering at the receiver: uses a lower total amount of storage
 Second slot in each FIFO is used to absorb stalls
[Figure: Lf elastic buffers between sender and receiver, 2Lf data capacity; the ready/stall signal propagates backward one stage per cycle]
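The behaviour of a chain of depth-2 elastic buffers can be sketched with a cycle-level model. The names `ElasticBuffer` and `clock` are hypothetical; the only property modelled is that every ready and valid signal is taken from the state stored before the clock edge, so no handshake crosses more than one stage combinationally.

```python
class ElasticBuffer:
    """Depth-2 FIFO stage; ready/valid depend only on stored state."""

    def __init__(self):
        self.slots = []          # holds at most 2 words

    @property
    def ready(self):             # backward signal, seen by the upstream stage
        return len(self.slots) < 2

    @property
    def valid(self):             # forward signal, seen by the downstream stage
        return len(self.slots) > 0


def clock(stages, source, sink_ready):
    """Advance the chain by one clock edge; return the delivered word or None."""
    # Sample all handshake signals from the state before the edge.
    valid = [stage.valid for stage in stages]
    ready = [stage.ready for stage in stages] + [sink_ready]
    moving = []
    for i, stage in enumerate(stages):
        word = stage.slots.pop(0) if valid[i] and ready[i + 1] else None
        moving.append(word)
    for i, word in enumerate(moving[:-1]):
        if word is not None:
            stages[i + 1].slots.append(word)
    if source and ready[0]:
        stages[0].slots.append(source.pop(0))
    return moving[-1]


# Free-flowing case: one word per cycle after the pipeline has filled.
stages = [ElasticBuffer() for _ in range(3)]
source = list(range(10))
received = []
for _ in range(13):
    word = clock(stages, source, sink_ready=True)
    if word is not None:
        received.append(word)

# Stalled case: the chain absorbs 2 words per stage, then backpressures.
stalled = [ElasticBuffer() for _ in range(3)]
backlog = list(range(10))
for _ in range(20):
    clock(stalled, backlog, sink_ready=False)
stored = sum(len(stage.slots) for stage in stalled)
```

With three stages the stalled chain holds six words, which matches the 2Lf data capacity quoted for Lf elastic buffers.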

Pipelining: low-level handshake
 Minimal flow control required
 Data: payload word
 Valid (or Send or Request): flow control bit from sender
 Ready (or Accept or Grant): flow control bit from receiver

[Figure: Valid, Data and Ready signals linking a network interface from the core, a FIFO, a size converter and a demux]

 Local handshake decouples communication between all blocks

Low-level handshake protocol
 Minimal flow control required
 Data: payload word
 Valid (or Request or Send): flow control bit from sender
 Ready (or Grant or Accept): flow control bit from receiver for backpressure
 Various timing diagrams possible
 Valid/Ready preferred without combinational path between sender and receiver

Handshake (with Data, relative to clock edges)    Response path
Send / Accept                                     Combinational
Request / Grant                                   Combinational
Valid / Ready                                     Sequential

Microarchitecture:
Routing & arbitration

Routers/switches
 Routers/switches group routing and arbitration in one block
 In: steering demuxes, routing tables
 Out: arbiters, muxes
[Figure: a 3x2 switch for bipartite interconnects, and a 5-port router for a 2D mesh]

Routing & packet format: header
 Routing information extracted from header flit
 Encoded from source network interface
 Local association tables in input port of router
 To a local output port of the router
 Packet delimiters to route the whole packet to the same port
 Begin of Packet on header flit: BOP (optional)
 End of Packet on last flit: EOP
 Demux set on BOP, released on EOP

BOP  EOP  Flit          Content
1    0    Header flit   Route info
0    0                  Data payload
0    1    Tail flit     Data payload

[Figure: Decode and Steer blocks in the router input port]

Routing & packet format: route
 Option 1: route identifier & routing tables
 From address range or destination ID
 Local table different for each input port
[Figure: the source network interface maps address 0x00FF… to route ID R3 and emits a header flit (BOP/EOP = 10, R3); between initiators I0 to I3 and targets T0 to T3, two router tables map R3 to port #2 and to port #1 respectively]

Routing & packet format: route
 Option 2: Explicit path to target
 Sequence of turns encoded in header flit, e.g. “East, East, North, Local”
 Each router pops its own info
[Figure: header flit (BOP/EOP = 10, xxxEENL); the path shrinks hop by hop: EENL, then E:ENL, E:NL, N:L, and finally L: at the target]
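The explicit-path option can be sketched in a few lines. The helper names `route_hop` and `follow` are hypothetical, and the path is modelled as a plain string of turns rather than as packed header bits.

```python
def route_hop(path):
    """One router: pop the first turn, forward the rest of the path."""
    return path[0], path[1:]


def follow(path):
    """List the (turn taken, remaining path) pairs seen along the route."""
    hops = []
    while path:
        turn, path = route_hop(path)
        hops.append((turn, path))
    return hops


hops = follow("EENL")   # East, East, North, Local
```

Each router only needs to read the first field of the header, so no routing table is required along the way.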

Routing & packet format: route
 Option 3: coordinates-based routing, e.g. 2D mesh
 Destination coordinates in header
 Comparison to router coordinates for X-Y routing
[Figure: header flit (BOP/EOP = 10, destination (0,1)); R(2,0): X<2, go W; R(1,0): X<1, go W; R(0,0): X=0 and Y>0, go N; R(0,1): X=0 and Y=1, go L]
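The coordinate comparison for X-Y routing can be sketched as below. The function names `xy_route` and `trace` are hypothetical, and the orientation (East is +X, North is +Y) is an assumption chosen to match the slide example.

```python
def xy_route(router, dest):
    """Output port chosen at `router` for a packet going to `dest`."""
    rx, ry = router
    dx, dy = dest
    if dx > rx:
        return "E"
    if dx < rx:
        return "W"
    # X is resolved first; only then may the packet turn vertically.
    if dy > ry:
        return "N"
    if dy < ry:
        return "S"
    return "L"              # arrived: deliver to the local port


STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def trace(src, dest):
    """Ports taken from `src` to `dest`, ending with the local port."""
    position, ports = src, []
    while True:
        port = xy_route(position, dest)
        ports.append(port)
        if port == "L":
            return ports
        position = (position[0] + STEP[port][0], position[1] + STEP[port][1])


ports = trace((2, 0), (0, 1))   # the slide example: W, W, N, then local
```

Because a packet never turns from vertical back to horizontal, the cyclic dependencies that cause routing deadlocks cannot form.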

Arbitration: different needs
 Between input directions:
 Atomic transmission of all flits in the packet
 To maintain consistent routing information between flits of a packet
 Between virtual channels:
 Flits can be interleaved
 Some virtual channels may have higher priority
 High-priority packets can preempt low priority packets
[Figure: arbiter (Arb.) and mux selecting between two packets, from in #1 and from in #2, each made of a header flit (BOP/EOP = 10) followed by payload flits (00, then 01 on the last flit)]
Arbitration: policies
 For arbitration between directions, priority meaningless
 Issue: All directions must be served equally
 Fairness
 Solution: Balanced arbiters
 Round-Robin : rotating priority between inputs => cost: 1 counter
 Least recently used (LRU) => cost: ordered list of used ports

 For arbitration between virtual channels, priority can be relevant


 Issue: if saturation of high-priority traffic, other sources never served
 Starvation
 Solution: Bandwidth-limited priority
 Temporarily decrease priority level
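The round-robin policy listed above can be sketched as follows. The class name `RoundRobinArbiter` is hypothetical; the single integer `next_first` plays the role of the counter mentioned as the cost of this policy.

```python
class RoundRobinArbiter:
    """Rotating-priority arbiter between n inputs."""

    def __init__(self, n_inputs):
        self.n = n_inputs
        self.next_first = 0     # input that currently has highest priority

    def grant(self, requests):
        """requests: list of booleans. Returns the granted index or None."""
        for offset in range(self.n):
            index = (self.next_first + offset) % self.n
            if requests[index]:
                # The winner gets lowest priority for the next round.
                self.next_first = (index + 1) % self.n
                return index
        return None


arbiter = RoundRobinArbiter(3)
all_busy = [arbiter.grant([True, True, True]) for _ in range(4)]

sparse = RoundRobinArbiter(3)
two_sources = [sparse.grant([True, False, True]) for _ in range(4)]
```

In a packet-switched router the grant would typically be held until the EOP flit, so that the flits of one packet stay together.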

Microarchitecture:
Conversion

Size conversion
 Parts of the interconnect can have various data widths
 Preservation of the underlying protocol on serialized words
 e.g. transform masked write of 64b words to masked 32b words
 May require doubling the frequency to maintain throughput

Flit sequences (BOP/EOP bits first):
64b side                         32b side
10 Header routing                10 Header routing
                                 00 2nd part of Header (only if required)
00 byte mask + 8-byte data       00 mask + 4-byte data
                                 00 mask + 4-byte data
01 byte mask + 8-byte data       00 mask + 4-byte data
                                 01 mask + 4-byte data
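The 64b-to-32b masked write conversion can be illustrated with a short sketch. The function name `split_masked_write` is hypothetical, and sending the low half first is an assumption: the slides do not state the ordering.

```python
def split_masked_write(data64, mask8):
    """Split one 64b word with its 8b byte mask into two 32b flits.

    Returns [(data32, mask4), (data32, mask4)], low half first (assumed).
    """
    low = (data64 & 0xFFFFFFFF, mask8 & 0xF)
    high = ((data64 >> 32) & 0xFFFFFFFF, (mask8 >> 4) & 0xF)
    return [low, high]


flits = split_masked_write(0x1122334455667788, 0b11110001)
```

Each 64b flit becomes two 32b flits, which is why the narrow side may need twice the frequency to keep the same throughput.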

Size and frequency conversion
 Size and frequency conversion done with special FIFOs
 E.g. doubled reading vs writing clock frequency
[Figure: FIFO with write logic on a ½ clock-gated clock and read logic on clk; bits of even flits and bits of odd flits are selected by an even/odd signal, with protocol-compatible bit reorder]

Outline
 Background and motivation
 Architecture level
 Microarchitecture level
 Circuit level
 Clock domains
 Resynchronization
 Clock-less pipelines
 3D-chip integration for on-chip communication
 Summary & conclusions

Circuit:
Clock domains

Multi-synchronous designs
 Various frequencies for each core: no faster than needed
 Enable DVFS: adjust voltage accordingly
 Unrelated clocks: smooth current peaks
 GALS: Globally Asynchronous, Locally Synchronous
 Requires bi-synchronous FIFOs
[Figure: two domains, clk1@F1, V1 and clk2@F2, V2]

Mesochronous designs
 Same frequency (from same clock source), relaxed phase constraints
 E.g. up to half a period
 Relax timing closure at chip top level for clock tree synthesis
[Figure: one clock source distributed to several blocks, with skew between them and timing margin]

Resynchronization & metastability
 How to capture data from a different clock domain ?
 No possibility to control setup/hold margins
 When master latch closes while D is not settled → metastable point
 Indefinitely long undefined logic value
 “Memory-less” stochastic Poisson process
[Figure: D flip-flop built from two CK-gated latches (nodes N1, P1, N2, P2/Q), with the voltage transfer curves VP1/VN1 and VP2/VN2]
Time characterization of metastability
 Forced transitions on D close to clock edge
 Measure clock-to-Q delay
 Deterministic delay increases near metastable window
 Used for setup/hold characterization
 Stochastic delay becomes unbounded within metastability window Tw
 Poisson process: can delay indefinitely
 Metastability average exit time τ is time after which 63% of metastable events are resolved
 Lower τ for high loop gain in latch [A. Agarwal, ISSCC’20]
 Dedicated flip-flops for synchronization
[Figure: clock-to-Q delay versus data-to-clock offset Δt: deterministic in the setup and hold margins on either side, stochastic inside the window Tw; τ marks the 63% point]

Resynchronization & metastability
 Major problem: logic divergence & reconvergence after the unstable node
[Figure: the metastable output Q of a D flip-flop feeds two gates, one with a trip point skewed toward 1 and one skewed toward 0 on the Vin/Vout transfer curve; they read different values, producing an error that should never happen]
Resynchronization & metastability
 Major problem: logic divergence & reconvergence after the unstable node
 Solution: no divergence until certain to have resolved metastability
 Output of the 2nd flip-flop
 Or more if needed
[Figure: chain of flip-flops with outputs Q1, Q2, Q3, and waveforms of CK, D, Q1, Q2, Q3]
Synchronization failures
 Mean time between failures (MTBF)
 Depends on
 Resync flip-flop parameters τ & Tw
 Number of flip-flop stages N
 Sampling frequency Fck (period Tck)
 Input data toggle rate Fd

MTBF = \frac{1}{\frac{T_w}{T_{ck}} \, F_d \; e^{-(N-1)\,T_{ck}/\tau}}

where (Tw/Tck)·Fd is the metastable events rate and the exponential term is the probability of staying metastable after N flip-flops.

Example for Fd = 100MHz, Fck = 1GHz:

Tw                        τ      N   MTBF
40ps                      60ps   2   4 seconds
40ps                      60ps   3   2 years
40ps                      30ps   2   2 years
40ps                      30ps   3   700 trillion years
Worst case (Tw/Tck=1)     30ps   2   1 month
Worst case (Tw/Tck=1)     30ps   3   30 trillion years

→ resync. flip-flops tailored for low τ
 Tw less important
 N costly in latency
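The example figures can be checked numerically. The sketch below assumes that the MTBF expression reads MTBF = 1 / ((Tw/Tck) · Fd · exp(-(N-1)·Tck/τ)), with Tw = 40ps for the first four rows; this reading is an assumption, chosen because it reproduces the quoted orders of magnitude. The function name `mtbf_seconds` is hypothetical.

```python
import math


def mtbf_seconds(tw, tau, n_stages, fck, fd):
    """Mean time between synchronization failures, in seconds."""
    tck = 1.0 / fck
    metastable_event_rate = (tw / tck) * fd
    # Probability of still being metastable after (N-1) clock periods.
    p_still_metastable = math.exp(-(n_stages - 1) * tck / tau)
    return 1.0 / (metastable_event_rate * p_still_metastable)


YEAR = 365.25 * 24 * 3600
FCK, FD = 1e9, 100e6

slow_2ff = mtbf_seconds(40e-12, 60e-12, 2, FCK, FD)            # a few seconds
slow_3ff = mtbf_seconds(40e-12, 60e-12, 3, FCK, FD) / YEAR     # a few years
fast_3ff = mtbf_seconds(40e-12, 30e-12, 3, FCK, FD) / YEAR     # ~7e14 years
worst_2ff = mtbf_seconds(1e-9, 30e-12, 2, FCK, FD) / 86400     # in days
```

Halving τ has the same effect as adding a flip-flop stage, without the extra cycle of latency, which is why synchronizer flip-flops are tailored for low τ.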

Bi-synchronous FIFO buffers
 Synchronize read & write pointers
 Read stable data
 Dedicated pointer value encoding
 One bit change between values
[Figure: bi-synchronous FIFO; the write side is clocked by wclk (wdata, wvalid, wready, wptr +1, "full?" test) and the read side by rclk (rdata, rready, rvalid, rptr +1, "empty?" test); each pointer crosses to the other side through a synchronizer]

Data synchronization
 Never synchronize a whole bus:
   Inconsistent metastability resolution between bits!
 Only a single bit change can be resynchronized (Hamming distance of 1)
 For value passing, e.g. incrementing FIFO pointers, a specific encoding is needed

 For large depths: Gray
   Derived from binary code
   2^N values for N bits
   3-bit sequence: 000, 001, 011, 010, 110, 111, 101, 100

 For small depths: Johnson (from 1-hot code)
   Shift left & update lsb with inverted msb
   2N values for N bits
   4-bit sequence: 0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000
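Both encodings can be generated and checked in a few lines. This is a small Python sketch; the helper names are hypothetical, and the Gray code is produced with the usual binary-to-Gray conversion `i ^ (i >> 1)`.

```python
def gray_sequence(n_bits):
    """All 2**n_bits Gray codes in counting order (binary i -> i ^ (i >> 1))."""
    return [i ^ (i >> 1) for i in range(1 << n_bits)]


def johnson_sequence(n_bits):
    """All 2*n_bits Johnson codes: shift left, feed the inverted msb into the lsb."""
    mask = (1 << n_bits) - 1
    code, seq = 0, []
    for _ in range(2 * n_bits):
        seq.append(code)
        inverted_msb = 1 - ((code >> (n_bits - 1)) & 1)
        code = ((code << 1) & mask) | inverted_msb
    return seq


def is_single_bit_cyclic(seq):
    """True if every consecutive pair, wrap-around included, differs by exactly one bit."""
    return all(bin(a ^ b).count("1") == 1 for a, b in zip(seq, seq[1:] + seq[:1]))


if __name__ == "__main__":
    print([format(c, "03b") for c in gray_sequence(3)])
    print([format(c, "04b") for c in johnson_sequence(4)])
```

A plain binary counter fails `is_single_bit_cyclic` (for instance 001 to 010 flips two bits), which is why it cannot be passed through a synchronizer directly.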

Crossover issue
 Equal read & write pointers: full or empty?
 Basic solution: leave an empty slot (“almost-full”)
 Alternative: add a “parity bit” to pointers
   Empty: equal pointers, equal parity
   Full: equal pointers, opposite parity

[Figure: FIFO slots with the write pointer (p=1), the read pointer (p=0) and the next position]

 Gray on 4 inputs + parity: 3 bits
   True value decoded by xor of 2 msbs

Code    Decoded
00 0    00
00 1    01
01 1    11
01 0    10
11 0    00
11 1    01
10 1    11
10 0    10

 Johnson on 4 inputs + parity: 4 bits
   True value decoded by xor with msb

Code    Decoded
0000    0001
0001    0010
0011    0100
0111    1000
1111    0001
1110    0010
1100    0100
1000    1000
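The full/empty test with a parity bit can be sketched in Python for the 3-bit Gray case. This is an illustration under assumptions: a 4-slot FIFO, pointers modelled as free-running counters that are Gray-encoded modulo 8, and hypothetical helper names.

```python
DEPTH = 4  # number of FIFO slots (assumed, matching the 4-entry example)


def to_gray(count):
    """Gray-encode a free-running counter modulo 2*DEPTH (3-bit pointer)."""
    count %= 2 * DEPTH
    return count ^ (count >> 1)


def slot(ptr):
    """2-bit slot address: xor of the pointer's 2 msbs, followed by its lsb."""
    msb2, msb1, lsb = (ptr >> 2) & 1, (ptr >> 1) & 1, ptr & 1
    return ((msb2 ^ msb1) << 1) | lsb


def parity(ptr):
    """The pointer msb acts as the parity (wrap-around) bit."""
    return (ptr >> 2) & 1


def is_empty(wptr, rptr):
    return slot(wptr) == slot(rptr) and parity(wptr) == parity(rptr)


def is_full(wptr, rptr):
    return slot(wptr) == slot(rptr) and parity(wptr) != parity(rptr)


if __name__ == "__main__":
    for writes in range(5):
        w, r = to_gray(writes), to_gray(0)
        print(writes, format(w, "03b"), "empty" if is_empty(w, r) else "full" if is_full(w, r) else "-")
```

In a real design each test would compare the local pointer with the synchronized copy of the remote one, so the flags are conservative rather than exact.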

Bi-synchronous FIFO depth
 Flow-control cycle:
   Nw+1 write cycles + Nr+1 read cycles

[Figure: bi-synchronous FIFO pointer loop (wvalid, wready, wptr +1, "full?" test; rready, rvalid, rptr +1, "empty?" test) with synchronizers of Nw and Nr stages]

 Slow side may require fewer resynchronization flip-flops than fast side
   Tck larger in MTBF equation
 To keep peak throughput, need 1 transfer per slow cycle:
   FIFO Depth ≥ (Nslow+1) + (Nfast+1)*Tfast/Tslow
 Larger FIFO depth can be considered if interconnect must be freed upstream
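The depth bound is easy to evaluate for a given pair of clocks. Below is a minimal Python sketch; rounding up to an integer number of slots is an assumption of this example, and the function and argument names are hypothetical.

```python
import math


def min_fifo_depth(n_slow, n_fast, t_slow, t_fast):
    """Smallest integer depth satisfying
    depth >= (Nslow + 1) + (Nfast + 1) * Tfast / Tslow.

    n_slow, n_fast : resynchronization flip-flops on the slow and fast sides
    t_slow, t_fast : clock periods of the slow and fast sides (same unit)
    """
    if t_fast > t_slow:
        raise ValueError("t_fast must not exceed t_slow")
    return math.ceil((n_slow + 1) + (n_fast + 1) * t_fast / t_slow)


if __name__ == "__main__":
    # Equal clocks, 2 flip-flops on each side.
    print(min_fifo_depth(2, 2, t_slow=1.0, t_fast=1.0))
    # Fast side twice as fast, with one more flip-flop.
    print(min_fifo_depth(2, 3, t_slow=2.0, t_fast=1.0))
```

The faster the fast side is relative to the slow side, the less its synchronizer latency weighs in slow-clock cycles, so the required depth shrinks toward Nslow+1.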

Circuit:
Clock-less communication

Asynchronous logic for interconnects
 Synchronous pipeline: ready/valid on clock edge

[Figure: Sender to Receiver pipeline with clocks CKS, CKI, CKR]

 Asynchronous pipeline: request/acknowledge agreement

[Figure: Sender to Receiver pipeline with clocks CKS and CKR only]

Asynchronous logic basics
 Local handshaking: request & acknowledge
 Self-synchronized logic

wait( Ireq &  Oack); Oreq↑; Iack↓;
wait(!Ireq & !Oack); Oreq↓; Iack↑;

[Figure: handshake stage with ports Ireq/Iack on the input side and Oreq/Oack on the output side, and waveforms of Ireq, Iack, Oreq, Oack]

Asynchronous logic basics
 Wait on equal inputs
 Implemented by Muller Gate
   “C-element”

[Figure: C-element symbol (inputs A, B, output Z) and transistor-level schematic]

A B Z
0 0 0
0 1 Z-1   (hold last value)
1 0 Z-1   (hold last value)
1 1 1
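The truth table translates directly into a behavioural model. The Python sketch below is illustrative only: `CElement` follows the table, while `HandshakeStage` is one assumed way to map the handshake pseudo-code of the previous slide onto a single C-element (Oreq = C(Ireq, Oack), Iack = not Oreq).

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, init=0):
        self.z = init

    def update(self, a, b):
        if a == b:
            self.z = a
        return self.z  # unchanged when a != b (hold last value)


class HandshakeStage:
    """Assumed mapping of the req/ack pseudo-code onto one C-element.

    wait( Ireq &  Oack); Oreq rises, Iack falls
    wait(!Ireq & !Oack); Oreq falls, Iack rises
    """

    def __init__(self):
        self.c = CElement(0)

    def update(self, ireq, oack):
        oreq = self.c.update(ireq, oack)
        iack = 1 - oreq
        return oreq, iack


if __name__ == "__main__":
    stage = HandshakeStage()
    for ireq, oack in [(1, 1), (0, 1), (0, 0), (1, 0)]:
        print((ireq, oack), "->", stage.update(ireq, oack))
```

The stage only changes its outputs when both neighbours agree, which is what makes the handshake insensitive to the relative delays of the two sides.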

Asynchronous pipeline
 Quasi-delay-insensitive pipeline
 Dual-rail or 4-rail encoding of data in the request signals

[Figure: dual-rail pipeline with one row of C-elements for the Data=0 rail, one row for the Data=1 rail, and the Ack signal]

Asynchronous logic: flexible pipelining
 Considering long-distance links with buffering
 Long cycle time for the req/ack handshake

[Figure: long buffered link between two C-element stages]

Asynchronous logic: flexible pipelining
 Considering long-distance links with buffering
 Replacement with asynchronous pipeline stages
 Increased throughput with no additional latency

[Figure: same link with C-element pipeline stages inserted along the way]

Asynchronous mux & demux
 Using 3-input Muller gates, it is possible to add conditions
   Select signal of a mux
   Enable signal of a demux

[Figure: mux/demux built from C-elements, with signals sel, sel0, sel1, ack, way0, way1]

Asynchronous arbiters
 Using a Mutex cell, it is possible to build arbiters
 Mutex is an RS-latch with metastability filter
   Inverter supplied by other output

[Figure: arbiter built from a MUTEX cell and C-elements, with signals req0, req1, winner, ack, ack0, ack1]
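The mutual-exclusion property can be illustrated with a purely behavioural Python model. This is a sketch under assumptions: it does not model the RS-latch or the metastability filter, and simultaneous requests are resolved by a fixed tie-break in favour of req0, whereas the real cell resolves them arbitrarily after metastability settles.

```python
class Mutex:
    """Behavioural mutual-exclusion element: at most one grant at a time."""

    def __init__(self):
        self.owner = None  # None, 0 or 1

    def update(self, req0, req1):
        reqs = (req0, req1)
        # The current holder keeps its grant until it withdraws its request.
        if self.owner is not None and not reqs[self.owner]:
            self.owner = None
        if self.owner is None:
            if req0:
                self.owner = 0  # tie-break assumption of this sketch
            elif req1:
                self.owner = 1
        return int(self.owner == 0), int(self.owner == 1)


if __name__ == "__main__":
    m = Mutex()
    for reqs in [(0, 1), (1, 1), (1, 0), (0, 0)]:
        print(reqs, "->", m.update(*reqs))
```

A late request never steals the grant from the current winner, which is the behaviour the arbiter relies on.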

Pseudo bisynchronous GALS interfaces
 Event-driven pseudo-clock generation from asynchronous world
   No need for synchronizer
   Reduced round-trip time
   Reduced FIFO depth

[Figure: FIFO with write side clocked by wclk (wdata, wvalid, wready, wptr +1, "full?" test) and read side driven by the pseudo-clock aclk, generated with a C-element from ack (rptr +1, "empty?" test, binary-to-QDI data conversion); a single synchronizer remains]

Asynchronous networks on chip

[Figure: senders and receivers connected through an asynchronous network on chip]

Outline
 Background and motivation
 Architectural level
 Microarchitecture level
 Circuit level
 3D-chip integration for on-chip communication
 Chiplets
 Passive and active interposers
 Interposer-level NoC comparison
 Summary & conclusions

Chiplet partitioning
 Break architecture into multiple dies
   Focused on specific function
   Tile-able for parallelism

 Chiplet motivations
   Cost driven using 3D stacking
   Modularity driven technologies
   Heterogeneous integration

 Chiplet challenges
   Ecosystem maturity
   Technology & architecture partitioning
   Chiplet interfaces, testability, 3D EDA flow, etc. [D. Dutoit, Keynote, 3DIC’2014]

Passive interposer

[Figure: chiplets (clusters of cores) on a passive interposer with passive nearest-neighbor connections; SoC infrastructure: analog, IOs, DFT]

Active interposer

[Figure: chiplets (clusters of cores) on an active interposer providing scalable & distributed NoCs for any chiplet-to-chiplet traffic, power management close to cores, SoC infrastructure (analog, IOs, DFT) and additional features]

P. Vivet, ISSCC’20: chiplet & 3D plugs

3D-Plug:
• Logic interface
• µ-bumps (20µm pitch)
• µ-buffer std-cells
• DFT

[Figure: chiplet layout with 3D-Plug interfaces, showing µ-buffer std-cells and µ-bumps]
[P. Vivet, ISSCC’2020]

P. Vivet, ISSCC’20: active interposer
 6 chiplets, 96 cores (FDSOI28)
 Active interposer (CMOS65)
 3 distributed flexible interconnects
   Passive connections on interposer between multisynchronous NoCs in chiplets
   Synchronous NoC on interposer
   Asynchronous NoC on interposer
[P. Vivet, ISSCC’2020]

Inter-chiplet interconnects: comparison

                      Inter-chiplet MS-NoC   Synchronous NoC   Asynchronous NoC    Units
Routing               in chiplets            on interposer     on interposer
2D NoC frequency      1.00                   0.75              0.97                GHz
End to end latency    44                     37                4 + async (15ns)    cycles
Propagation speed     2.9                    2.0               0.6                 ns/mm

A to B end-to-end latency, 25 mm path on interposer

 Similar throughput between synchronous NoC and asynchronous NoC (~1GHz)
 Best latency for asynchronous NoC (3-5x wrt. sync.)
   System latency reduction
   Useful e.g. for cache coherency traffic
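The "3-5x" latency figure follows from the propagation speeds in the comparison. The short Python sketch below redoes that arithmetic; the dictionary keys and function names are hypothetical labels for the three columns, and only the asynchronous NoC is evaluated over the 25 mm path because its 15 ns is the value quoted alongside.

```python
PATH_MM = 25.0  # A to B path length on the interposer

# Propagation speed of each interconnect, in ns/mm.
PROPAGATION_NS_PER_MM = {
    "ms_noc_in_chiplets": 2.9,
    "sync_noc_on_interposer": 2.0,
    "async_noc_on_interposer": 0.6,
}


def slowdown_vs_async(name):
    """How many times slower per mm than the asynchronous NoC."""
    return PROPAGATION_NS_PER_MM[name] / PROPAGATION_NS_PER_MM["async_noc_on_interposer"]


def async_path_latency_ns(path_mm=PATH_MM):
    """Asynchronous NoC propagation time over the path."""
    return PROPAGATION_NS_PER_MM["async_noc_on_interposer"] * path_mm


if __name__ == "__main__":
    print(f"sync / async   : {slowdown_vs_async('sync_noc_on_interposer'):.1f}x")
    print(f"MS-NoC / async : {slowdown_vs_async('ms_noc_in_chiplets'):.1f}x")
    print(f"async over {PATH_MM:.0f} mm: {async_path_latency_ns():.0f} ns")
```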
Outline
 Background and motivation
 Architectural level
 Microarchitecture level
 Circuit level
 3D-chip integration for on-chip communication
 Summary & conclusions

Summary
 On-chip interconnects are much more than wires

 In this tutorial, we covered:
   At architecture level, communication patterns, topologies and switching schemes
   At microarchitecture level, routing, arbitration and pipelining of interconnects
   At circuit level, low-level techniques to handle multiple clock domains

 We presented a comparative benchmark of different interconnect solutions for chiplet stacking on an active interposer
 Clock-domain crossing is a major challenge in larger interconnects
 Digital circuit techniques are key to high-performance on-chip and in-package communication

ISSCC’2021 papers of interest
 Session 3: Highlighted Chip Releases: Modern Digital SoCs
 Session 4: Processors
 Session 9: ML Processors from Cloud to Edge
   Various multi-core digital architectures with on-chip interconnects between multiple clock domains, with dataflow and memory mapped communication schemes

 Session 29: Digital Circuits for Computing, Clocking and Power Management
   Paper 29.2: Systolic communication architecture
 Session 35: Adaptive Digital Techniques for Variation Tolerant Systems
   Papers 35.1, 35.2, 35.3: Adaptive architectures with decoupled frequency & power domains, GALS

Key references
 W.J. Dally, B. Towles, Principles and Practices of Interconnection Networks, Amsterdam: Morgan Kaufmann Publishers, 2004.
 G. Chen et al., “A 340mV-to-0.9V 20.2Tb/s source-synchronous hybrid packet/circuit-switched 16×16 network-on-chip in 22nm tri-gate CMOS,” IEEE International Solid-State Circuits Conference, 2014.
 P. Vivet et al., “A 4×4×2 homogeneous scalable 3D network-on-chip circuit with 326MFlit/s 0.66pJ/b robust and fault-tolerant asynchronous 3D links,” IEEE International Solid-State Circuits Conference, 2016.
 S. M. Tam et al., “SkyLake-SP: A 14nm 28-Core Xeon® Processor,” IEEE International Solid-State Circuits Conference, 2018.
 P. Vivet et al., “A 220GOPS 96-Core Processor with 6 Chiplets 3D-Stacked on an Active Interposer Offering 0.6ns/mm Latency, 3Tb/s/mm2 Inter-Chiplet Interconnects and 156mW/mm2@ 82%-Peak-Efficiency DC-DC Converters,” IEEE International Solid-State Circuits Conference, 2020.
 S. Kang et al., “GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation,” IEEE International Solid-State Circuits Conference, 2020.
 R. Ginosar, “Fourteen ways to fool your synchronizer,” 9th International Symposium on Asynchronous Circuits and Systems, 2003.
 A. Agarwal et al., “Time-Borrowing Fast Mux-D Scan Flip-Flop with On-Chip Timing/Power/VMIN Characterization Circuits in 10nm CMOS,” IEEE International Solid-State Circuits Conference, 2020.
 K. Rocki et al., “Fast stencil-code computation on a wafer-scale processor,” International Conference for High Performance Computing, Networking, Storage and Analysis (SC '20), 2020.
