Fundamentals of Ultra-Low Voltage Embedded Memory Design

Eric Karl
Intel Fellow, Director of Embedded Memory Circuits & Technology
Technology Development, Intel Corporation

Live Q&A Session: February 20th, 2022, 7-7:20 AM PST

E. Karl T7: Fundamentals of Ultra-Low Voltage Embedded Memory Design 1 of 66


© 2023 IEEE International Solid-State Circuits Conference
Introduction
Eric Karl, Intel Fellow, Technology Development

• BS & MS in Electrical Engineering from Univ. of Michigan
• Ph.D. in Electrical Engineering from Univ. of Michigan
• SRAM Design Lead at Intel TD, 2008-2013
• Memory Circuit Technology Leader at Intel TD, 2013-2020
• Director of Advanced Design at Intel TD, 2020-Present

• Area of Expertise: Memory Design & DTCO
  • Embedded Memory Design – SRAM, Register Files, Fuse/OTP
  • Advanced Logic Process Technologist
  • Design Technology Co-optimization
  • Digital Logic Library Development & Optimization

Outline
• Embedded Memory Trends
• SRAM: The Embedded Memory Workhorse
• Register Files: Special, High-Performance Memory for “XPU”
• Logic Sequentials: Ultra-Low Voltage Flip-Flops
• System Design: Putting it all together
• Conclusion

Modern Memory Hierarchy

In a hierarchical memory system, the entire addressable memory space is
available in the largest, slowest memory, and incrementally smaller and faster
memories, each containing a subset of the memory below it, proceed in steps
up toward the processor.

Source: Shanthi, Web

Modern Memory Hierarchy

This hierarchical organization of memory works because of the Principle of
Locality. Programs access a relatively small portion of the address space at
any moment. There are two different types of locality:
• Temporal Locality: If an item is referenced, it will tend to be referenced again soon
• Spatial Locality: If an item is referenced, items whose addresses are close by tend to be referenced soon
Source: Shanthi, Web

• Modern hardware relies upon temporal and spatial locality of memory accesses to create a high-capacity, high-performance and energy-efficient virtual memory system

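The two kinds of locality can be illustrated with a toy cache model. This is a minimal sketch, not something from the tutorial: the fully associative LRU cache, its 64-line by 64-byte geometry, and the three address streams are all assumptions chosen for illustration.

```python
from collections import OrderedDict
import random

def hit_rate(addresses, num_lines=64, line_bytes=64):
    """Hit rate of a small fully associative LRU cache (illustrative model only)."""
    cache = OrderedDict()              # keys are cache-line tags, ordered by recency
    hits = 0
    for addr in addresses:
        tag = addr // line_bytes       # all bytes in one line share a tag
        if tag in cache:
            hits += 1
            cache.move_to_end(tag)     # mark as most recently used
        else:
            cache[tag] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)   # evict the least recently used line
    return hits / len(addresses)

# Spatial locality: consecutive 4-byte words, 16 words share each 64-byte line.
sequential = [4 * i for i in range(10_000)]
# Temporal locality: a 1 KB working set that is revisited over and over.
hot_loop = [4 * (i % 256) for i in range(10_000)]
# No locality: addresses scattered over a 1 GiB space (hypothetical workload).
rng = random.Random(0)
scattered = [rng.randrange(0, 1 << 30) for _ in range(10_000)]

for name, stream in [("sequential", sequential), ("hot loop", hot_loop), ("scattered", scattered)]:
    print(f"{name:10s} hit rate = {hit_rate(stream):.4f}")
```

The streams with locality hit in the small cache almost every time, while the scattered stream almost never does, which is why a small, fast memory near the processor can serve most accesses.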
System Memory Bandwidth Challenges
[Figure: Compute Performance vs. Memory Bandwidth; the gap between them is labeled System Design Challenge. Source: Gholami, Web]

• Peak compute performance has improved at a much faster rate than bandwidth to high-capacity system memory over the last decade

Bandwidth Gap driving SRAM Cache

Intel Ponte Vecchio            AMD Zen3 with 3D V-Cache
16 Compute Tiles               8 Compute Cores
64MB Total L1 Cache            512kB Total L1 Cache
408MB Total L2 Cache           4MB Total L2 Cache
8 HBM2e Tiles                  Up to 96MB Shared L3 Cache

• Bandwidth gap to main memory is driving increased SRAM in package
• Latest products exceeding Gbit levels of integration in last level SRAM cache

Demand for Energy Efficient Cache
[Figure: hierarchy from Compute (high access rate, frequent access) through Memory to Storage (low access rate, infrequent access)]

• Compute and associated high-activity embedded cache face ever-increasing pressure to deliver sustained energy efficiency improvement and operate at ultra-low voltages

Memory-Centric Computing
Compute inside Memory
• Reduced Data Movement Energy
• Energy Efficient Compute using Array Structure
Compute near SRAM Cache
• Reduced Data Movement Energy
Compute near DRAM Memory
• Reduced Data Movement Energy

• Emerging Memory-Centric computing models targeting energy efficiency
• Most schemes will require basic memory operation at ultra-low voltages

Outline
• Embedded Memory Trends
• SRAM: The Embedded Memory Workhorse
• Register Files: Special, High-Performance Memory for “XPU”
• Logic Sequentials: Ultra-Low Voltage Flip-Flops
• System Design: Putting it all together
• Conclusion

Static Random Access Memory (SRAM)
• The workhorse embedded memory element in advanced CMOS technology
• Bi-stable, volatile storage element
• Can be constructed without logic process cost adders in advanced nodes

M1 / M6: Pass-gate (PG)
M2 / M4: Pull-up (PU)
M3 / M5: Pull-down (PD)

[Figure: 6T SRAM bitcell schematic and layout (NMOS / PMOS / NMOS columns) with supply VCC, wordline voltage VWL, bitlines BL and BLB, and internal nodes N0 and N1]

6T SRAM Read: Basic Operation
• Precharge and equalize bitlines (BL & BLB)
• Assert bitcell wordline signal (WL)
• Develop differential signal on bitlines (~50+ mV)
• Trigger sense amplifier (SAEN) to amplify small signal to large output signal

[Figure: read waveforms (Precharge & Equalize, Wordline (WL), 50-150mV small signal between Bitline (BL) and Bitline Bar (BLB), Sense Amp Enable (SAEN), Sensing) and read path (bitlines, optional Multiplexor, Sense Amplifier)]

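The signal-development step above can be approximated to first order as a constant read current discharging the bitline capacitance, dV = I_read * t / C_BL. The sketch below assumes that simple model; the function names and all numeric values (10 µA, 20 fF, 0.2 ns) are hypothetical and only meant to show the arithmetic.

```python
def bitline_differential_mv(i_read_ua, c_bl_ff, t_ns):
    """First-order bitline differential in mV: dV = I_read * t / C_BL.

    Units: uA * ns / fF works out to volts, so scale by 1000 for mV.
    """
    return 1000.0 * i_read_ua * t_ns / c_bl_ff

def time_to_signal_ns(i_read_ua, c_bl_ff, dv_mv):
    """Time in ns needed to develop dv_mv of differential, same model."""
    return (dv_mv / 1000.0) * c_bl_ff / i_read_ua

# Hypothetical numbers: 20 fF bitline, sense amplifier fired 0.2 ns after WL.
typical_mv = bitline_differential_mv(10.0, 20.0, 0.2)   # typical cell, 10 uA
weak_mv = bitline_differential_mv(4.0, 20.0, 0.2)       # weak (high-sigma) cell, 4 uA

print(f"typical cell: {typical_mv:.0f} mV, weak cell: {weak_mv:.0f} mV")
print(f"weak cell needs {time_to_signal_ns(4.0, 20.0, 50.0):.2f} ns to reach 50 mV")
```

Under these assumed values the weak cell develops less than the ~50 mV target at the chosen sense time, so it would need either a later sense-amp enable or a lower bitline capacitance.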
6T SRAM Read: Performance Concerns
• Read performance failures are influenced by variation in the following parameters:
  • Bitcell Read Current
  • Bitline Capacitance
  • Bitline Resistance
  • Wordline RC
  • Sense Amplifier Mismatch

[Figure: Read Performance Failure. 6T bitcell during read (wordline at VCC - ∆VWLUD, bitlines precharged to VCC) and bitline waveforms showing the sensing margin ∆V at SENSE ENABLE for a typical cell and a failing high-sigma cell]

• Read performance failures in SRAM arrays are driven by insufficient voltage differential development between BL and BLB nodes, leading to sense amplifier failures

6T SRAM Read: Stability Concerns
[Figure: Read Stability Failure. 6T bitcell during read (wordline at VCC - ∆VWLUD, both bitlines at VCC) with waveforms of WL, BL, BLB and internal nodes N0 and N1 flipping state]
[Figure: Read Stability vs. Targeting. Contours from Stable to Unstable over NMOS Targeting (strong to weak) and PMOS Targeting (strong to weak)]

• Read stability failures result in a bitcell that loses state during a read operation due to charge injection from bitline to internal nodes

Static Noise Margin Analysis (SNM)
• Left and right sides are decoupled and BL/BLB nodes are held at VCC
• Voltage transfer curves are superimposed to find SNM (the largest square that fits between the transfer curves)

[Figure: 6T bitcell (PUL, PUR, PGL, PGR, PDL, PDR; internal nodes n0 and n1) and butterfly curves of n1 vs. n0 for Retention SNM (WL=‘0’) and Read SNM (WL=‘1’)]

Source: Seevinck, JSSC 1987

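The largest-square construction can be sketched numerically. The code below is an illustration under strong assumptions, not the tutorial's method: each half-cell is modeled as an idealized tanh-shaped inverter, the cell is assumed symmetric (so one butterfly lobe is enough), and read disturb is modeled only as a raised output-low level `v_low`. All parameter values are hypothetical.

```python
import math

def inverter_vtc(v_in, vdd=0.7, vm=0.35, gain=20.0, v_low=0.0):
    """Idealized inverter transfer curve (assumed tanh shape, not a device model)."""
    return v_low + (vdd - v_low) * 0.5 * (1.0 - math.tanh(gain * (v_in - vm)))

def static_noise_margin(vtc, vdd=0.7, steps=400):
    """Side of the largest square in the upper-left butterfly lobe.

    The lobe is bounded by y = vtc(x) and the mirrored curve x = vtc(y).
    For each y, the square's lower-left corner sits on the mirrored curve and
    its side s grows until the upper-right corner touches y = vtc(x).
    """
    best = 0.0
    for i in range(steps + 1):
        y = vdd * i / steps
        x = vtc(y)                      # lower-left corner on the mirrored curve
        if vtc(x) - y <= 0.0:
            continue                    # this y is outside the lobe
        lo, hi = 0.0, vdd
        for _ in range(40):             # bisection on the square side
            mid = 0.5 * (lo + hi)
            if vtc(x + mid) - (y + mid) >= 0.0:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best

snm_hold = static_noise_margin(inverter_vtc)                              # WL = '0'
snm_read = static_noise_margin(lambda v: inverter_vtc(v, v_low=0.1))      # WL = '1'
print(f"retention SNM ~ {snm_hold * 1000:.0f} mV, read SNM ~ {snm_read * 1000:.0f} mV")
```

With these assumed curves the read SNM comes out smaller than the retention SNM, which is the qualitative behavior the two butterfly plots describe.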
Dynamic Stability
• Static noise margin (SNM) analysis is a safe, worst-case bound on read stability
• Static bitline voltage can be pessimistic if array design allows discharge
• Read stability “state flips” have a latency → static assumption is worst case
• Dynamic stability margin analysis is more accurate for modeling to absolute lowest voltages possible

[Figure: Read Stable vs. Read Unstable waveforms; measured 45nm SRAM data]

6T SRAM Write: Basic Operation
• Drive bitline pair to opposite states through write driver and multiplexor circuit
• Assert bitcell wordline control signal
• Internal nodes of bitcell (N0 and N1) change state to reflect new written data

[Figure: write path (Write Driver, optional Multiplexor, Bitline (BL) and Bitline Bar (BLB)) and waveforms of Wordline (WL), bitlines and internal nodes N0 and N1]

6T SRAM Write: Margin Concerns
[Figure: Write Failure. 6T bitcell during write (wordline at VCC - ∆VWLUD, cell supply VCS from VCC → VCSMIN, bitlines driven to VSS and VCC) with waveforms of WL, BL, BLB and internal nodes N0 and N1 failing to flip]
[Figure: Write Margin vs. Targeting. Contours from High Write Margin to Low Write Margin over NMOS Targeting (strong to weak) and PMOS Targeting (strong to weak)]

• Write margin failures occur when NMOS passgate devices are incapable of overwriting state held by the cross-coupled inverter pair
• Driven by NMOS passgate contention with PMOS pullup device

Static Write Voltage Margin (WVM)

[Figure: 6T bitcell (PUL, PUR, PGL, PGR, PDL, PDR) with BL = VCC, and voltage vs. time waveforms of WL, BLB, n0 and n1; write voltage margin is the BLB voltage at which the internal nodes flip]

• In static write voltage margin measurement, bitline (BL) is held at VCC and bitline bar (BLB) is ramped down
• Random variation analysis is usually applied to assess functionality under specific process skew, voltage and temperature conditions
• Write Voltage Margin is the bitline voltage at which the bit flips

Dynamic Write Margin
• Static write voltage margin is usually optimistic
• Dynamic write margin simulation is able to best capture the interaction between the bitcell and peripheral circuits
  • Time-dependent writability (i.e. write-at-speed)
  • Bitline resistance and capacitance impact
  • Peripheral circuit impact
  • Cross-coupling effect from neighboring signals

[Figure: write path schematic (BLPCH_b, WL, bitcell nodes n0 and n1, BL and BL_b, sramvcc, TVC, WABIAS, WRDATA and WRDATA_b, vsswrdrv, DIN and DIN_b, WREN_b) and simulation waveform of a write operation with TVC showing sramvcc, WL, n0 and n1]

6T SRAM: Bitcell Design Considerations
SRAM design must balance conflicting requirements for stability and write margin

[Figure: Bitcell Margin vs. Targeting. Stability Margin + Write Margin = combined bitcell margin, plotted over NMOS Targeting (strong to weak) and PMOS Targeting (strong to weak), from High Margins (better) to Low Margins (worse)]

• Device sizing and targeting in SRAM are central to delivering adequate margins
• Higher margin cells can operate at lower voltages without electrical failures

6T SRAM: Bitcell Design Considerations
• Bitcell read current and leakage requirements place additional constraints on cell design
• NMOS targeting is constrained by minimum acceptable read current (cell performance)
• NMOS and PMOS targeting is constrained on the strong side by maximum acceptable leakage current
• Landing zone for SRAM to meet VMIN (margins-driven), performance and leakage requirements is challenging in advanced CMOS technologies

[Figure: Bitcell Margin vs. Targeting over NMOS Targeting (strong to weak) and PMOS Targeting (strong to weak), from High Margins (better) to Low Margins (worse), with excluded regions labeled Unacceptable Read Current and Unacceptable Leakage]

Variation: Random vs. Systematic
[Figure: contours from Low VMIN to High VMIN, with the Read Limited region marked, over PMOS VTH (stronger PMOS, lower VT →) and NMOS VTH (weaker NMOS, higher VT →), shown for Typical Random Variation and High Random Variation]

• Random variation determines the contours of constant memory VMIN
• Random variation is a top concern for memory VMIN in scaled technology
• Memory VMIN target and random variation dictate the process systematic variation landing zone

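One way to see why random variation dominates large arrays is to ask how many standard deviations of margin each bitcell needs for the whole array to work. The sketch below is an assumption-laden illustration: it treats cell margin as a single Gaussian variable, treats bits as independent, ignores repair and ECC, and uses a hypothetical 99% array yield target.

```python
import math
from statistics import NormalDist

def required_sigma(n_bits, array_yield=0.99):
    """Sigma of margin each bit needs so an n_bits array meets array_yield.

    Assumes independent, Gaussian-distributed cell margins and no repair.
    """
    # Allowed per-bit failure probability: 1 - array_yield ** (1 / n_bits),
    # written with expm1 to stay accurate for very small probabilities.
    p_bit = -math.expm1(math.log(array_yield) / n_bits)
    return -NormalDist().inv_cdf(p_bit)

for label, bits in [("1 Mb", 2**20), ("128 Mb", 2**27)]:
    print(f"{label:>6s} array needs about {required_sigma(bits):.2f} sigma per bit")
```

Under these assumptions, growing the array by 128x pushes the requirement from roughly 5.6 sigma to roughly 6.4 sigma, so VMIN is set further out on the tail of the random-variation distribution as capacity increases.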
Systematic Variation Challenges
[Figure: sources of systematic variation: Stress, Litho, Polish, Anneal, Etch]

• New technologies continue to introduce new forms of systematic variation
• Challenging to meet goals of memories accounting for systematic variation

Write Assist Circuits for SRAM
• Negative bitline and voltage collapse are the most common write assists

Assist: Negative Bitline
  Description: Coupling of low bitline below ground to increase passgate gate-source voltage
  Type: Strengthens Pass-gate
  Adoption: Common
Assist: Supply Voltage Collapse
  Description: Collapse of active column cell supply to weaken latch contention
  Type: Reduces Latch Strength
  Adoption: Common
Assist: Ground Voltage Increase
  Description: Increase of active column ground supply to weaken latch contention
  Type: Reduces Latch Strength
  Adoption: Uncommon
Assist: Wordline Boost
  Description: Increase in wordline voltage during write to increase passgate gate-source voltage
  Type: Strengthens Pass-gate
  Adoption: Uncommon

Source: Mann, SSE 2010

Voltage Collapse Write Assist
[Figure: 6T bitcell during write with wordline at VCC - ∆VWLUD, cell supply VCS collapsed from VCC → VCSMIN, and bitlines driven to VSS and VCC]

• Collapse bitcell VCC supply node to weaken PU and reduce write contention

Voltage Collapse Write Assist
[Figure: column of bitcells sharing the collapsed supply VCS (VCC → VCSMIN, VCS < VCC during the write). The written cell has Wordline = VWL; the other cells in the column have Wordline = VSS and rely on dynamic retention]

• Unselected bitcells along the column have dynamic retention risks (lose state)

Transient Voltage Collapse (TVC) Assist
[Figure: bitcell storing “0” at N0 and “1” at N1, with node capacitance CN1 and leakage paths; voltage vs. time waveform of VCS collapsing from VCC below DRV while the voltage at N1 decays toward VTH,PMOS over the Dynamic Data Retention Time. Source: Wang, IEDM 2011]

• VCS can be temporarily collapsed below Data Retention Voltage (DRV)
• Timing and level are sensitive to leakage paths (transistor, defects, etc.)

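The sensitivity to leakage can be approximated by treating the high internal node as a capacitor discharged by a constant leakage current, t = C * dV / I_leak. This is a first-order sketch only; the function names and every value below (0.2 fF node, 0.3 V of allowed droop, 10 nA vs. 1 µA leakage, 1 ns collapse pulse) are hypothetical.

```python
def dynamic_retention_time_ns(c_node_ff, dv_allowed_v, i_leak_na):
    """Time in ns for leakage to discharge the node by dv_allowed_v.

    First-order model t = C * dV / I; fF * V / nA is 1e-6 s, i.e. 1000 ns.
    """
    return 1000.0 * c_node_ff * dv_allowed_v / i_leak_na

def collapse_is_safe(pulse_ns, c_node_ff, dv_allowed_v, i_leak_na):
    """True if the supply collapse pulse is shorter than the retention time."""
    return pulse_ns < dynamic_retention_time_ns(c_node_ff, dv_allowed_v, i_leak_na)

nominal_ns = dynamic_retention_time_ns(0.2, 0.3, 10.0)      # healthy cell
defect_ns = dynamic_retention_time_ns(0.2, 0.3, 1000.0)     # leaky (defective) cell
print(f"nominal retention ~ {nominal_ns:.1f} ns, leaky cell ~ {defect_ns:.2f} ns")
print("1 ns collapse safe for nominal cell:", collapse_is_safe(1.0, 0.2, 0.3, 10.0))
print("1 ns collapse safe for leaky cell:  ", collapse_is_safe(1.0, 0.2, 0.3, 1000.0))
```

In this toy model a 100x increase in leakage shortens the retention window by 100x, so a pulse width that is comfortable for a nominal cell can flip a leaky one.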
Negative Bitline (NBL) Write Assist
[Figure: NBL write assist (NBL-WA) circuit on a column of “Unselected” bitcells along WL0: write driver with WREN, DATA and DATA#, NBLPULSE#, 1X / 2X / 4X elements selected by 3b TRIM, and a 2b TRIM; waveform shows BL/BL# between VCC and VSS, then coupled below VSS]

• Negative voltage on BL → strengthen passgate → break write contention

Source: Karl, IEDM 2012

NBL Write Assist Implementation
• Coupling timing tricky
  • Early → Degraded VMIN
  • Late → Timing Impact
• Risk: Large negative voltage
  • Reduce charge on cap prior to coupling
  • Disable NBL at high VCC

Source: Chang, ISSCC 2015

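The "large negative voltage" risk follows from the capacitive divider that generates the negative level. The sketch below assumes an idealized model: the bitline floats at 0 V, a coupling capacitor precharged to VCC has its driven plate stepped from VCC to 0, and the bitline moves by -VCC * Cc / (Cc + C_BL). The trim mapping (code times a unit capacitor) and all capacitance values are hypothetical.

```python
def nbl_voltage(vcc, c_couple_ff, c_bl_ff):
    """Bitline voltage after coupling, from an ideal capacitive divider."""
    return -vcc * c_couple_ff / (c_couple_ff + c_bl_ff)

def trimmed_nbl(vcc, trim_code, unit_ff=5.0, c_bl_ff=30.0):
    """Negative bitline level for a trim code that enables code * unit_ff of coupling cap."""
    return nbl_voltage(vcc, trim_code * unit_ff, c_bl_ff)

for vcc in (0.6, 1.0):
    levels = ", ".join(f"{trimmed_nbl(vcc, code) * 1000:.0f}" for code in range(8))
    print(f"VCC = {vcc:.1f} V, NBL level (mV) for trim codes 0-7: {levels}")
```

Because the coupled level scales with VCC, a trim setting that gives a modest boost at low voltage produces a much larger negative voltage at high VCC, which is consistent with reducing the capacitor charge or disabling NBL there.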
Active Write Assist Comparison
[Figure: 22nm FinFET (VLSI’18): VMIN (A.U.) vs. Normalized Write Power at 675mV (1.0 to 1.6) for NOWA, TVC (Narrow, Mid, Wide) and NBL, with a 225mV VMIN difference marked. Source: Kim, VLSI 2018]
[Figure: 16nm FinFET (ISSCC’14): LCV = Voltage Collapse, NBL = Negative Bitline. Source: Chen, ISSCC 2014]

• Both TVC and NBL circuits demonstrated to deliver 200-300mV VMIN enhancements on multiple nodes

Read Assist Circuits for SRAM
• Wordline Underdrive is the most common SRAM read assist circuit

Assist: Increase Global VCC
  Description: Raise the global supply voltage to strengthen the latch
  Type: Strengthens Latch
  Adoption: Uncommon
Assist: Negative VSS at Bitcell
  Description: Drive the bitcell ground supply below VSS during read to strengthen the latch
  Type: Strengthens Latch
  Adoption: Uncommon
Assist: Boosted VCC at Bitcell
  Description: Boost the bitcell supply above VCC during read to strengthen the latch
  Type: Strengthens Latch
  Adoption: Uncommon
Assist: Wordline Underdrive
  Description: Reduce wordline voltage during read to weaken the passgate and reduce noise injected into the latch
  Type: Reduces Noise into Latch
  Adoption: Common

Source: Mann, SSE 2010

Wordline Underdrive (WLUD) Read Assist
[Figure: 6T bitcell during read with wordline voltage VWL = VCC - ∆VWLUD and bitlines at VCC; waveforms show the underdriven wordline and the disturb ∆V on internal node N0]

• WL voltage adjusted to trade off read stability vs. write margin / performance

Wordline Underdrive (WLUD) Read Assist
• It’s a tradeoff knob → no net enhancement of read vs. write margin
• Usage can eliminate passgate targeting masks in some technologies
• Levels can be adjusted post-fabrication to correct for systematic variation

[Figure: LVC 90%-ile Vmin (AU) vs. WLUD-RA VCCWL (V), from VCC to VCC - ∆V, for a 576kB array with no repair; curves for Write (-10C) and Read (90C). Source: Karl, ISSCC 2012]

WLUD Read Assist Implementations
• Often implemented with simple voltage divider in SRAM row decoder driver
• Tunable pull-down NMOS or PMOS devices utilized to adjust strength
• Energy and area overhead are very modest, on the order of 1-2%

[Figure: wordline driver (XDH[63], WLCLK[3], WLSLP, x7) driving WL[255], with WLUD-RA pull-down devices controlled by PUD[2:0], connected to a 6T-SRAM cell (PU1, PU2, PG1, PG2, PD1, PD2, N0, N1, BL, BL#, VCS[0])]

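The voltage-divider behavior can be sketched with a simple conductance model. This is an assumption for illustration only: the wordline driver pull-up and the tunable pull-down legs are treated as linear conductances, and the trim code (named after the PUD[2:0] control in the figure) is assumed to enable `code` unit legs. The conductance values are hypothetical.

```python
def wl_voltage(vcc, pud_code, g_pullup=20.0, g_unit=0.5):
    """Wordline level from a resistive divider between driver pull-up and pull-down legs.

    pud_code: 0..7, number of unit pull-down legs enabled (assumed mapping).
    Conductances are in arbitrary but consistent units.
    """
    g_pulldown = pud_code * g_unit
    return vcc * (g_pullup / (g_pullup + g_pulldown))

vcc = 0.7
for code in range(8):
    v_wl = wl_voltage(vcc, code)
    print(f"PUD code {code}: VWL = {v_wl * 1000:.0f} mV, underdrive = {(vcc - v_wl) * 1000:.0f} mV")
```

Code 0 leaves the wordline at full VCC, and each additional pull-down leg lowers it a little further, which is the kind of post-fabrication tuning knob the slides describe.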
Outline
• Embedded Memory Trends
• SRAM: The Embedded Memory Workhorse
• Register Files: Special, High-Performance Memory for “XPU”
• Logic Sequentials: Ultra-Low Voltage Flip-Flops
• System Design: Putting it all together
• Conclusion

What is a Register File?
• A register file is an array of memory registers located in a CPU/GPU
• Register files are typically the lowest level of the memory hierarchy, and have relatively high memory access rates to support co-located execution units
• Modern register files are often implemented with multi-ported memories to avoid data access conflicts that can stall parallel execution units

[Figure: Intel Arc A-series GPU]

How Multi-ported Memory Helps
[Figure: Single-Port Memory (Sequential Access) vs. Dual-Port Memory (Parallel Access)]

• Single-port memories have address access restrictions when attempting to increase read/write bandwidth per cycle
• Native, multi-ported memories can be used to avoid access restrictions

2RW Dual-Port SRAM
• Simple Idea → More passgates!
• Arrays support:
  • 2 reads concurrently
  • 2 writes concurrently
  • 1 read/1 write concurrently
• Requires stronger M3/M5 to tolerate additional noise injection from M1/M6/M7/M8 devices during reads

[Figure: 8T dual-port bitcell schematic with pull-ups M2/M4, pull-downs M3/M5, passgates M1/M6/M7/M8, wordlines WL1 and WL2, bitline pairs BL1/BLB1 and BL2/BLB2, and internal nodes N0 and N1]

• “True” 2RW dual-port SRAM provides the most flexible access characteristics for a 2-port memory, but introduces unique margin and timing challenges

2RW DP-SRAM: Supported Array Accesses
[Figure: four access cases (Different Row / Different Column, Different Row / Same Column, Same Row / Different Column, Same Row / Same Column), marked as Identical to SP-SRAM or Unique to DP-SRAM, with cells under Single Dummy Read (one active wordline) and Double Dummy Read (two active wordlines) highlighted]

• Same-row access introduces “Double Dummy Read” in un-accessed cells
• Same-row different column introduces “Single Dummy Read” interacting with read/write operations in the selected cells
• Write on both ports to the same cell is not commonly supported

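The access cases above can be captured in a small classifier, for example as a check inside a behavioral model or testbench. This is only a sketch of the rules stated on these slides; the function name, the `(row, column, op)` tuple format and the returned field names are hypothetical.

```python
def classify_2rw_access(port_a, port_b):
    """Classify a pair of concurrent accesses to a 2RW dual-port array.

    Each port is (row, column, op) with op "R" or "W" (assumed encoding).
    """
    row_a, col_a, op_a = port_a
    row_b, col_b, op_b = port_b
    same_row = row_a == row_b
    same_col = col_a == col_b

    if not same_row:
        # Only one wordline is active in each row, as in a single-port array.
        return {"case": "different-row", "like_sp_sram": True,
                "supported": True, "disturbs": []}

    # Both wordlines are active on the same row: un-accessed cells in that
    # row see a double dummy read.
    disturbs = ["double dummy read in un-accessed cells"]
    if not same_col:
        # The other port's wordline dummy-reads each selected cell.
        disturbs.append("single dummy read on selected cells")
    # Writing the same cell from both ports is not commonly supported.
    supported = not (same_col and op_a == "W" and op_b == "W")
    return {"case": "same-row", "like_sp_sram": False,
            "supported": supported, "disturbs": disturbs}

print(classify_2rw_access((3, 0, "R"), (7, 0, "W")))
print(classify_2rw_access((3, 0, "R"), (3, 5, "W")))
print(classify_2rw_access((3, 5, "W"), (3, 5, "W")))
```

A model like this makes it easy to bin simulated or measured accesses into the worst-case categories before margining them separately.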
2RW DP-SRAM: Bitcell Stability
[Figure: butterfly curves for Different-Row access (like SP-SRAM) and Same-Row access (unique to DP-SRAM); SNMDiff > SNMSame. Source: Nii, JSSC 2009]

• Same-row accesses introduce concurrent read or dummy read operations
• This reduces the noise margin (cell read stability) and requires a larger bitcell design (larger NMOS PD) or external circuit assists to stabilize the cell

2RW DP-SRAM: Read Disturb Write
[Figure: Worst Case Write (Same-Row Access); worse for same-row access. Source: Ishii, JSSC 2011]

• Dummy read creates additional contention with write operation
• Dummy read bitline is commonly held in precharge state (at VCC) to avoid interfering with write operations on other ports

2RW DP-SRAM: Read Disturb Read
[Figure: Worst Case Read (Same-Row Access); worse for same-row access. Source: Ishii, JSSC 2011]

• Same-row access also sets the worst case read current
• Dummy read colliding with read operation in the same row leads to the lowest read current (since dummy BLs are held at VCC)

2RW DP-SRAM: WL Skew Sensitivity
[Figure: Access WL (Read or Write) vs. Disturb WL (Dummy Read) timing for Negative Skew, Zero Skew and Positive Skew, with plots for Read disturb Write and Read disturb Read. Source: Ishii, JSSC 2011]

• DP-SRAM ports are often clocked independently (asynchronous access)
• Timing alignment of disturb WL to access WL in Same-Row access is critical to identify worst case timing and cell margins

1R1W Dual-Port 8T SRAM
• 6T SRAM + Dedicated Read Port
• Arrays can support:
  • 1 read/1 write concurrently
  • 2 reads concurrently (less common)
• Common reuse of 6T SRAM bitcell design in the core of this 8T SRAM
• Simpler array IP design and technology margining due to fewer unique access conditions
• Single-ended read port → different tradeoffs for read performance

[Figure: 8T bitcell schematic: 6T core (M1-M6, internal nodes N0 and N1, wordline WL, write bitlines) plus read port devices M7/M8 with read wordline RDWL and read bitline RDBL]

• 1R1W dual-port SRAM doesn’t support two concurrent write operations, but has fewer timing and margin challenges than 2RW dual-port

Single-Ended Small Signal Sensing
• Primary difference between 1R1W 8T SRAM and 1RW 6T SRAM is implementing read sensing scheme without differential bitlines

[Figure: sense amplifier with a Single Read Bitline (with 1R1W SRAM) and a reference input]

How are we going to generate this reference?
Obviously, an external analog bias generator is an option.
Can you keep the analog bias noise-free while routing it across the memory?
Can you afford the area/power to integrate analog bias generation closer to the memory?

Large Signal Hierarchical Sensing
• Hierarchical local-global bitlines can be utilized with wide-OR domino circuits for power-efficient sensing of 1R1W read port
• Keeper circuit contention across interconnect is key technical challenge

[Figure: hierarchical 8T SRAM array floorplan and read path]
Local Bitline (LBL): M0 or M2, ~32-128PP length, 16b/LBL
LBL Keeper: keeps LBL at VCC when DATA = 0
Merge / GBL Pulldown: discharges GBL and breaks GBL keeper when DATA = 1
Global Bitline (GBL): M2 or M4, ~600-1200PP length, 128b/GBL (M4)
GBL Keeper: keeps GBL at VCC when DATA = 0
Floorplan: IO Region (1x), Merge Region (2-16x) with GBL Merge and SDL, RF Bitcell (16-64x), 128 Row Decoder, 64b/WL

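The local/global bitline read can be sketched at the logic level. The model below is a purely behavioral assumption: bitlines are booleans (True means precharged to VCC), keepers are ideal, and timing and contention are ignored. The 128-row column with 16 bits per local bitline follows the figure labels; the function name is hypothetical.

```python
def hierarchical_read(column_bits, row, bits_per_lbl=16):
    """Logical model of a domino read of one column through LBL and GBL.

    column_bits: list of stored bits for one column (one entry per row).
    Returns the read data as a bool.
    """
    n_segments = len(column_bits) // bits_per_lbl
    # Precharge phase: every local bitline and the global bitline are high.
    lbl = [True] * n_segments
    gbl = True
    # Evaluate phase: only the selected read wordline is asserted, so only
    # that cell can discharge its local bitline, and only if it stores a 1.
    segment = row // bits_per_lbl
    if column_bits[row]:
        lbl[segment] = False
    # Merge stage: any discharged LBL turns on a GBL pulldown (wide OR).
    if any(not level for level in lbl):
        gbl = False
    # DATA = 1 discharges the GBL; DATA = 0 leaves the keeper holding it high.
    return not gbl

stored = [(r * 7) % 3 == 0 for r in range(128)]   # arbitrary test pattern
readback = [hierarchical_read(stored, r) for r in range(128)]
print("read-back matches stored data:", readback == stored)
```

The logical function is trivial; the design challenge noted on the slide lies in the keepers, which must hold the precharged level against leakage yet lose the contention quickly when a cell storing a 1 is read.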
Outline
• Embedded Memory Trends
• SRAM: The Embedded Memory Workhorse
• Register Files: Special, High-Performance Memory for “XPU”
• Logic Sequentials: Ultra-Low Voltage Flip-Flops
• System Design: Putting it all together
• Conclusion

Logic Sequentials as Memory
• Traditional, compact SRAM and multiported memory rely upon ratioed transistors to ensure functionality
• This is a fundamental limiter to reaching lowest operating voltages!
• Interruptible Logic Sequentials can be used as memory to take you to even lower voltage operating points

[Figure: Primary-Secondary Flip-Flop with Fully Interruptible Feedback on Latch Elements]
Eliminates circuit functionality dependence on ratio between transistors, enabling more reliable operation at low voltage

Vmin Failures in Logic Sequentials
• Even fully interruptible state elements can fail at lower voltages, depending upon the circuit topology employed
• For the primary-secondary flip-flop from our example, the typical limiters at lower voltages include:
  • Write-back charge-sharing across the pass-gate
  • Internal min-delay failures related to internal clock slopes

Standby Vmin
Failure Mode: Retention
  Description: Lowest static retention voltage under variation

Active Vmin
Failure Mode: Write-Back (Common Limiter)
  Description: Charge-sharing glitch from secondary latch disturbs primary latch state
Failure Mode: Write-Disturb
  Description: Excessive keeper leakage disturbs primary/secondary latch state when keeper loop closes
Failure Mode: Internal Hold (Common Limiter)
  Description: Min-delay failure between primary and secondary latch
Failure Mode: Scan-Stitch
  Description: Min-delay failure between bits in internally stitched multi-bit FF scan chain

Write-back Failure Mechanism

• Momentary charge-sharing from nk15 node to nk14 can compromise primary latch before clock transitions are complete
• Typically failing first on higher threshold voltage flip-flops, at low temperature

Internal Min-Delay Failure

• Glitches through internal passgates during clock transitions also start to cause min-delay failure at secondary latch
• Typically failing first on lower threshold voltage flip-flops, at high temperature

Clock Optimization at Ultra Low Voltage

Mind the clocks! All logic sequential failure modes are impacted by clock signal degradation

• Several techniques can extend to lower voltage operation:
  • Local clock driver upsizing and Vth mapping optimization
  • Local clock layout optimization
  • Value of large vector flops grows, since the upsized local clocking is amortized across more bits

Circuit Topologies for Ultra-low Voltage
• Circuit topology adjustments and restrictions enable lower VMIN:
  • Stacked Device Restrictions
  • High Vth Device Usage Restriction
  • Small Device Usage Restrictions
  • Transmission Gate Avoidance
• Results in tradeoffs to area, leakage and capacitance

Outline
• Embedded Memory Trends
• SRAM: The Embedded Memory Workhorse
• Register Files: Special, High-Performance Memory for “XPU”
• Logic Sequentials: Ultra-Low Voltage Flip-Flops
• System Design: Putting it all together
• Conclusion

Supply Voltage Strategies
[Figure: for each strategy, a block diagram of Decoder, SRAM Bitcells, Column circuits and External Logic, shaded by Logic Supply vs. SRAM Supply]

Single Rail
• Simple, Easy Integration
• Memory can limit Logic operating voltages
Interface Dual Rail
• Allows wide flexibility between Logic/SRAM supply voltages
Array Dual Rail
• Lower array energy
• Supply delta limitations between SRAM/Logic

System Vmin with Single Rail Memory
• “Single Rail” design solutions make sense in applications pursuing energy efficiency with high access rates to memory

Will this memory work on my existing system supply voltage rail?

[Figure: Time-0 Vmin, starting from Time-0 Vmin @ Foundry FoM]

System Vmin: Some Assembly Required
• Care is required to translate memory Vmin to system Vmin

[Figure: guardband stack-up, listed here from System Vmin at the top down to Time-0 Vmin @ Foundry FoM at the bottom]
System Vmin
Test Coverage Guardband: guardband for the very real, practical limitations in time-0 test coverage
Regulator Tolerance
Package Droop
On-Die Droop
RTN Noise: depending upon application and technology, random telegraph noise can introduce bit errors during operation
EOL Vmin
Aging: aging guardband dependent upon lifetime stress condition required for application; some JEDEC standards may apply
FoM Adjustment: Figure of Merit Vmin (standard way to quote Vmin) may need adjustment based upon temperature range, array size and distribution of functional parts desired
Time-0 Vmin
Time-0 Vmin @ Foundry FoM

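The stack-up can be written as a running sum from the quoted figure-of-merit Vmin to the system Vmin. The sketch below assumes the guardbands simply add, as the stacked figure suggests; every millivolt value is a hypothetical placeholder, not data from the tutorial.

```python
# Guardbands in stack order, bottom to top (all values hypothetical, in mV).
GUARDBANDS_MV = [
    ("FoM adjustment", 30),            # temperature range, array size, part distribution
    ("Aging", 25),                     # lifetime stress condition
    ("RTN noise", 10),
    ("On-die droop", 20),
    ("Package droop", 15),
    ("Regulator tolerance", 15),
    ("Test coverage guardband", 10),
]

def vmin_stackup_mv(time0_fom_mv, guardbands):
    """Return (label, level_mv) pairs for each level of a linear Vmin stack-up."""
    levels = [("Time-0 Vmin @ Foundry FoM", time0_fom_mv)]
    level = time0_fom_mv
    for name, guardband_mv in guardbands:
        level += guardband_mv
        levels.append((f"after {name}", level))
    return levels

levels = vmin_stackup_mv(550, GUARDBANDS_MV)
for label, level_mv in levels:
    print(f"{label:32s} {level_mv} mV")
system_vmin_mv = levels[-1][1]
```

With these placeholder numbers the level after the FoM adjustment corresponds to the time-0 Vmin, the level after aging to the end-of-life (EOL) Vmin, and the final level to the system Vmin, well above the quoted foundry figure.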
Aging Effects
 Bias Temperature Instability (BTI) effects drive Vmin failure mechanisms
 SRAM Read Stability degrades with NBTI (PMOS weakening)
 SRAM Write Margin improves with NBTI (PMOS weakening)
 The effect is statistical in nature, and margining for it requires understanding:
 Technology characteristics (Intrinsic BTI or other aging effects)
 Stress conditions (voltage, temperature, etc.)
 Memory array size
[Figure: two panels, Read and Write]

Source: Jain, IEDM 2012
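
One way to see why memory array size enters the margining is to ask how many standard deviations of margin each cell needs so that a whole array works. The sketch below is illustrative only: it assumes independent cells and a one-sided Gaussian margin distribution, and the function names and the 99% array yield target are hypothetical choices for the example.

```python
import math
from statistics import NormalDist


def per_cell_fail_budget(num_cells, array_yield):
    """Largest per-cell failure probability p such that (1 - p)**num_cells
    equals array_yield, assuming independent cells."""
    # p = 1 - array_yield**(1/num_cells); expm1 keeps precision for tiny p.
    return -math.expm1(math.log(array_yield) / num_cells)


def required_sigma(num_cells, array_yield):
    """Number of sigma of margin each cell needs (one-sided Gaussian tail)."""
    p = per_cell_fail_budget(num_cells, array_yield)
    return -NormalDist().inv_cdf(p)


for megabits in (1, 32, 256):
    cells = megabits * 2**20
    print(f"{megabits:>4} Mb array: {required_sigma(cells, 0.99):.2f} sigma per cell")
```

The required sigma grows with array size, so the same aging-induced shift in the per-cell distribution costs more Vmin in a larger array. The technology characteristics and stress conditions listed above would determine the distribution that this sigma multiplies.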

Erratic Bits: Random Telegraph Noise
 Erratic bit phenomena have been observed in SRAM and attributed to
trapping/detrapping effects that modify transistor threshold voltage
 Depending upon technology optimization, operating voltage range and usage
application, voltage guardbands may be required to avoid in-field errors
[Figure: two panels, "Distribution of SRAM Array Vmin with Repeated Measurement" and "Repeated Single Cell Measurement of Erratic Bit"]

Source: Agostinelli, IEDM 2005
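
A toy model can illustrate why a single time-0 measurement may miss an erratic bit. The sketch assumes a single trap that toggles between two states and shifts the cell Vmin by a fixed amount while occupied; the two-state model, the function name and all parameter values are assumptions made for illustration, not data from the cited work.

```python
import random


def simulate_erratic_bit(num_measurements, base_vmin_mv, rtn_shift_mv,
                         p_capture, p_emit, seed=0):
    """Repeated Vmin readings of one cell with a single two-state trap.

    Hypothetical model: the trap captures with probability p_capture and
    emits with probability p_emit between consecutive measurements.
    """
    rng = random.Random(seed)
    trapped = False
    readings = []
    for _ in range(num_measurements):
        if trapped:
            if rng.random() < p_emit:
                trapped = False
        elif rng.random() < p_capture:
            trapped = True
        readings.append(base_vmin_mv + (rtn_shift_mv if trapped else 0.0))
    return readings


readings = simulate_erratic_bit(
    num_measurements=1000,
    base_vmin_mv=650.0,   # hypothetical
    rtn_shift_mv=40.0,    # hypothetical
    p_capture=0.05,
    p_emit=0.2,
)
spread_mv = max(readings) - min(readings)
print(f"first reading: {readings[0]:.0f} mV, observed spread: {spread_mv:.0f} mV")
```

In this model, the spread between the lowest and highest reading is the amount of voltage guardband a single measurement could fail to reveal.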

Memory Repair and ECC
 Repair schemes to replace outlier memory cells with alternate cells are an extremely effective approach to mitigate numerous memory failure mechanisms
 Manufacturing Defects / Yield
 Minimum Operating Voltage Limiters
 Performance Outliers
 Error-correcting codes can take bit error rate (BER) reductions further by fixing 1-2 errors per data word
 2-4 orders of magnitude reduction in BER possible with minimal overhead

Source: Zimmer, ASSCC 2016
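
The effect of repair and ECC can be illustrated with two small calculations. This is a sketch under stated assumptions: bit errors are taken as independent, repair is modeled at single-cell granularity with spares assumed to be good, and the function names, the 64+8-bit word and the raw BER of 1e-6 are hypothetical choices for the example.

```python
import math


def word_failure_prob(word_bits, correctable, ber):
    """Probability that a word holds more bit errors than the code can fix,
    assuming independent bit errors with probability ber each."""
    return sum(
        math.comb(word_bits, k) * ber**k * (1.0 - ber) ** (word_bits - k)
        for k in range(correctable + 1, word_bits + 1)
    )


def vmin_after_repair(cell_vmins_mv, repairs):
    """Array Vmin once the `repairs` worst cells are swapped for spares.

    Assumes cell-level repair and spares with lower Vmin than the cells
    they replace (a simplification; `repairs` must be < number of cells).
    """
    ranked = sorted(cell_vmins_mv, reverse=True)
    return ranked[repairs]


raw_ber = 1e-6  # hypothetical raw bit error rate at the target voltage
unprotected = word_failure_prob(word_bits=64, correctable=0, ber=raw_ber)
protected = word_failure_prob(word_bits=72, correctable=1, ber=raw_ber)
print(f"64-bit word, no ECC:          {unprotected:.3e}")
print(f"64+8-bit word, 1-bit correct: {protected:.3e}")

cells = [600, 720, 650, 700]  # hypothetical per-cell Vmin values in mV
print("array Vmin, no repair:", vmin_after_repair(cells, 0), "mV")
print("array Vmin, 2 repairs:", vmin_after_repair(cells, 2), "mV")
```

Repair removes the few worst outliers that set array Vmin, while ECC tolerates the residual random errors, which is why the two are commonly combined.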

A Roadmap to Low Voltage Memory
[Figure: Component Operating Voltage Range (AU), spanning Vmax, Nominal and Near-Threshold, for the columns Analog & I/O, HD SRAM, HP SRAM, 2P SRAM, Logic Sequentials and Logic Combinational, with a "Storage Elements" group label. Bar segment labels include Varies by IP, Unassisted, +Assist Ckts, +ECC/Repair, General Purpose, LV Optimized and LV Optimized Topologies. Lower limit: Vth + Random Variation + Systematic Variation]
Low voltage operation requires careful selection and optimization of storage elements
 Assisted SRAM enables operation below nominal
 2P SRAM, with independent optimization for read/write ports, can go further
 Optimized Logic Sequentials can reach near-threshold regime

Conclusion
 Embedded memory usage is increasing due to off-package bandwidth
challenges and emerging memory-centric computing models
 Energy efficiency and low voltage operation of memory elements are critical
considerations for high throughput compute systems of the future
 A rough roadmap for memory solutions:
 SRAM designers have tools to take Vmin below nominal operating voltages
 Customized, application-specific multiport memories can take a further step
 Logic Sequential arrays can be designed to reach near-threshold operation
 System designers have additional tools to enable Vmin
 Dual-Rail supply schemes
 Vmin Repair
 Error Correcting Code Schemes

References
1. R. Iyer et al., "Advances in Microprocessor Cache Architectures Over the Last 25 Years," in IEEE Micro, vol. 41, no. 6, pp. 78-88, 1 Nov.-Dec. 2021, doi: 10.1109/MM.2021.3114903.
2. Shanthi, A. P., https://www.cs.umd.edu/~meesh/411/CA-online/chapter/memory-hierarchy-design-basics/index.html
3. Gholami, Amir, https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8
4. W. Gomes et al., "Ponte Vecchio: A Multi-Tile 3D Stacked Processor for Exascale Computing," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 42-44, doi: 10.1109/ISSCC42614.2022.9731673.
5. J. Wuu et al., "3D V-Cache: the Implementation of a Hybrid-Bonded 64MB Stacked Cache for a 7nm x86-64 CPU," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 428-429, doi: 10.1109/ISSCC42614.2022.9731565.
6. G. K. Chen, P. C. Knag, C. Tokunaga and R. K. Krishnamurthy, "An 8-core RISC-V Processor with Compute near Last Level Cache in Intel 4 CMOS," 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, pp. 68-69, doi: 10.1109/VLSITechnologyandCir46769.2022.9830518.
7. J. H. Kim et al., "Aquabolt-XL: Samsung HBM2-PIM with in-memory processing for ML accelerators and beyond," 2021 IEEE Hot Chips 33 Symposium (HCS), 2021, pp. 1-26, doi: 10.1109/HCS52781.2021.9567191.
8. S. Lee et al., "A 1ynm 1.25V 8Gb, 16Gb/s/pin GDDR6-based Accelerator-in-Memory supporting 1TFLOPS MAC Operation and Various Activation Functions for Deep-Learning Applications," 2022 IEEE International Solid-State Circuits Conference (ISSCC), 2022, pp. 1-3, doi: 10.1109/ISSCC42614.2022.9731711.
9. E. Seevinck, F. J. List and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," in IEEE Journal of Solid-State Circuits, vol. 22, no. 5, pp. 748-754, Oct. 1987, doi: 10.1109/JSSC.1987.1052809.
10. S. O. Toh, Z. Guo and B. Nikolić, "Dynamic SRAM stability characterization in 45nm CMOS," 2010 Symposium on VLSI Circuits, 2010, pp. 35-36, doi: 10.1109/VLSIC.2010.5560259.
11. R. Mann et al., "Impact of circuit assist methods on margin and performance in 6T SRAM," Solid State Electronics, 2010, pp. 1398-1407, doi: 10.1016/j.sse.2010.06.009.
12. Y. Wang et al., "Dynamic behavior of SRAM data retention and a novel transient voltage collapse technique for 0.6V 32nm LP SRAM," 2011 International Electron Devices Meeting, 2011, pp. 32.1.1-32.1.4, doi: 10.1109/IEDM.2011.6131655.
13. E. Karl, Z. Guo, Y.-G. Ng, J. Keane, U. Bhattacharya and K. Zhang, "The impact of assist-circuit design for 22nm SRAM and beyond," 2012 International Electron Devices Meeting, 2012, pp. 25.1.1-25.1.4, doi: 10.1109/IEDM.2012.6479099.

14. E. Karl et al., "A 4.6 GHz 162 Mb SRAM Design in 22 nm Tri-Gate CMOS Technology With Integrated Read and Write Assist Circuitry," in IEEE Journal of Solid-State Circuits, vol. 48, no. 1, pp. 150-158, Jan. 2013, doi: 10.1109/JSSC.2012.2213513.
15. M.-F. Chang, C.-F. Chen, T.-H. Chang, C.-C. Shuai, Y.-Y. Wang and H. Yamauchi, "17.3 A 28nm 256kb 6T-SRAM with 280mV improvement in VMIN using a dual-split-control assist scheme," 2015 IEEE International Solid-State Circuits Conference - (ISSCC) Digest of Technical Papers, 2015, pp. 1-3, doi: 10.1109/ISSCC.2015.7063052.
16. D. Kim et al., "Sub-550mV SRAM Design in 22nm FinFET Low Power (22FFL) Technology with Self-Induced Collapse Write Assist," 2018 IEEE Symposium on VLSI Technology, 2018, pp. 151-152, doi: 10.1109/VLSIT.2018.8510704.
17. Y.-H. Chen et al., "13.5 A 16nm 128Mb SRAM in high-κ metal-gate FinFET technology with write-assist circuitry for low-VMIN applications," 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014, pp. 238-239, doi: 10.1109/ISSCC.2014.6757416.
18. Z. Guo, D. Kim, S. Nalam, J. Wiedemer, X. Wang and E. Karl, "A 23.6-Mb/mm2 SRAM in 10-nm FinFET Technology With Pulsed-pMOS TVC and Stepped-WL for Low-Voltage Applications," in IEEE Journal of Solid-State Circuits, vol. 54, no. 1, pp. 210-216, Jan. 2019, doi: 10.1109/JSSC.2018.2861873.
19. K. Nii et al., "Synchronous Ultra-High-Density 2RW Dual-Port 8T-SRAM With Circumvention of Simultaneous Common-Row-Access," in IEEE Journal of Solid-State Circuits, vol. 44, no. 3, pp. 977-986, March 2009, doi: 10.1109/JSSC.2009.2013766.
20. Y. Ishii et al., "A 28 nm Dual-Port SRAM Macro With Screening Circuitry Against Write-Read Disturb Failure Issues," in IEEE Journal of Solid-State Circuits, vol. 46, no. 11, pp. 2535-2544, Nov. 2011, doi: 10.1109/JSSC.2011.2164021.
21. J. P. Kulkarni et al., "5.6 Mb/mm2 1R1W 8T SRAM Arrays Operating Down to 560 mV Utilizing Small-Signal Sensing With Charge Shared Bitline and Asymmetric Sense Amplifier in 14 nm FinFET CMOS Technology," in IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 229-239, Jan. 2017, doi: 10.1109/JSSC.2016.2607219.
22. S. Vangal et al., "Near-Threshold Voltage Design Techniques for Heterogeneous Manycore System-on-Chips," in Journal of Low Power Electronics and Applications, vol. 10, no. 16, May 2020, doi: 10.3390/jlpea10020016.
23. P. Jain, A. Paul, X. Wang and C. H. Kim, "A 32nm SRAM reliability macro for recovery free evaluation of NBTI and PBTI," 2012 International Electron Devices Meeting, 2012, pp. 9.7.1-9.7.4, doi: 10.1109/IEDM.2012.6479014.
24. M. Agostinelli et al., "Erratic fluctuations of SRAM cache Vmin at the 90nm process technology node," IEEE International Electron Devices Meeting, 2005. IEDM Technical Digest., 2005, pp. 655-658, doi: 10.1109/IEDM.2005.1609436.

25. B. Zimmer, P.-F. Chiu, B. Nikolić and K. Asanović, "Reprogrammable redundancy for cache Vmin reduction in a 28nm RISC-V processor," 2016 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2016, pp. 121-124, doi: 10.1109/ASSCC.2016.7844150.
