Lec07 Memory sp17
CS 250, Spring 2017
John Wawrzynek
with James Martin (GSI)
But the role of cache in computer design has varied widely over time.
CS 152 L14: Cache I UC Regents Spring 2005 © UCB
1977: DRAM faster than microprocessors
Apple ][ (1977)
CPU: 1000 ns
DRAM: 400 ns
Steve Jobs, Steve Wozniak
[Chart: CPU vs. DRAM speed over the years 1980-2005.]
CS 250 L07: Memory UC Regents S17 © UCB
Caches: Variable-latency memory ports
[Figure: the processor talks to a small, fast Upper Level Memory backed by a large, slow Lower Level Memory; blocks X and Y migrate between the levels. Data in upper memory is returned to the processor with lower latency; data in the lower level is returned with higher latency.]
Temporal Locality, Spatial Locality
[Figure: memory addresses referenced over time; reuse of the same address shows temporal locality, and clustering of nearby addresses shows spatial locality.]
Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)
The caching algorithm in one slide
[Figure: blocks X and Y moving between the processor and memory levels, with registers (1K) at the top of the hierarchy.]
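The algorithm the slide gestures at — keep recently used blocks in the small fast memory, and on a miss fetch the whole containing block from the slower level — can be sketched as a toy direct-mapped cache (illustrative Python; the class and parameter names are our own):

```python
class DirectMappedCache:
    """Toy direct-mapped cache: exploits temporal locality (recently
    used blocks stay cached) and spatial locality (whole blocks are
    fetched, so neighbors of an accessed address also hit)."""
    def __init__(self, num_lines=64, block_size=16):
        self.num_lines = num_lines
        self.block_size = block_size
        self.tags = [None] * num_lines   # one tag per cache line
        self.hits = self.misses = 0

    def access(self, addr):
        block = addr // self.block_size  # block number
        index = block % self.num_lines   # which line it maps to
        tag = block // self.num_lines    # rest of the address
        if self.tags[index] == tag:
            self.hits += 1
            return True                  # hit: low latency
        self.tags[index] = tag           # miss: fetch block, evict old
        self.misses += 1
        return False

cache = DirectMappedCache()
# Sequential sweep: spatial locality gives 15 hits per 16-byte block.
for a in range(256):
    cache.access(a)
print(cache.hits, cache.misses)  # → 240 16
```

A second pass over the same 256 addresses would hit every time: that is temporal locality.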
[Figure: DRAM cell cross-section; the Word Line and Vdd run on the "z-axis", and the storage capacitor connects through the access transistor to the "Bit Line".]
State is read by sensing the amount of energy stored on the capacitor.
Values start out at Vdd and ground, but diode leakage current slowly degrades them.
A 4 x 4 DRAM array (16 bits) ....
[Figure: 4 x 4 array of cells; writing a cell charges its capacitor (Vc) through the access nFET.]
Why do we not get Vdd on the capacitor? The access transistor obeys Ids = k [Vgs - Vth]^2, but "turns off" when Vgs <= Vth. Since Vgs = Vdd - Vc, when Vdd - Vc == Vth charging effectively stops, and the cell reaches only Vdd - Vth. Bad: we store less charge.
DRAM Challenge #2: Destructive Reads
[Figure: raising the word line (0 -> Vdd) dumps the stored cell charge onto the bit line, which was initialized to a low voltage; the cell voltage collapses (Vc -> 0).]
Assume Ccell = 1 fF. The bit line may have 2000 nFET drains on it; assume a bit line capacitance of 100 fF, or 100*Ccell. The cell holds only Q = Ccell*(Vdd - Vth), so sharing that charge with 100*Ccell produces a tiny bit line swing.
A "sense amp" compares the bit line against a "dummy" bit line whose cells hold no charge, and resolves the small difference into a full 0 or 1.
[Figure: cell cross-section; diode leakage slowly drains the stored charge.]
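The charge-sharing arithmetic behind the "tiny swing" is short (a sketch; the Vdd and Vth values are illustrative, the capacitances are from the slide):

```python
Ccell = 1e-15          # 1 fF cell capacitance (from the slide)
Cbit  = 100e-15        # bit line: ~2000 nFET drains, 100 fF = 100*Ccell
Vdd, Vth = 1.8, 0.4    # illustrative supply and threshold voltages
Vcell = Vdd - Vth      # cell only charges to Vdd - Vth (Challenge #1)
Vbit0 = 0.0            # bit line initialized to a low voltage

# Charge is conserved when the word line connects cell to bit line:
Q = Ccell * Vcell + Cbit * Vbit0
Vfinal = Q / (Ccell + Cbit)    # shared voltage after the (destructive) read
swing = Vfinal - Vbit0
print(f"bit line swing: {swing*1000:.1f} mV")   # ~14 mV: needs a sense amp
```

With the signal this small, the differential sense amp against a dummy bit line is not a luxury but a necessity.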
DRAM Challenge #5: Cosmic Rays ...
[Figure: a charged cell struck by a cosmic ray.]
The cell capacitor holds 25,000 electrons (or less). Cosmic rays that constantly bombard us can release the charge!
Solution: Store extra bits to detect and correct random bit flips (ECC).
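A minimal instance of the ECC idea is a Hamming(7,4) code: 3 parity bits protect 4 data bits, and the parity-check "syndrome" directly names the flipped bit (illustrative Python; real DRAM ECC uses wider SECDED codes over 64-bit words, but the mechanism is the same):

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword with 3 parity bits."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]   # bit positions 1..7

def hamming74_correct(c):
    """Return (corrected codeword, error position or 0 if clean)."""
    c = c[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3       # syndrome = 1-based error position
    if pos:
        c[pos - 1] ^= 1              # flip the bad bit back
    return c, pos

word = hamming74_encode([1, 0, 1, 1])
flipped = word[:]
flipped[4] ^= 1                      # a "cosmic ray" flips bit 5
fixed, pos = hamming74_correct(flipped)
print(pos, fixed == word)            # → 5 True
```

The stored word is 7 bits for 4 bits of data here; at DRAM word widths the overhead drops to the familiar 8 check bits per 64 data bits.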
DRAM Challenge #6: Yield
[Figure: planar cell cross-section (capacitor, bit lines; Word Line and Vdd run on the "z-axis").]
Early replacement: “Trench” capacitors
The companies that kept scaling trench capacitors for commodity DRAM chips went out of business.
Samsung 90nm stacked capacitor bitcell.
(IEEE Journal of Solid-State Circuits, Vol. 45, No. 4, April 2010)
DDR2 SDRAM
MT47H128M4 – 32 Meg x 4 x 4 banks
MT47H64M8 – 16 Meg x 8 x 4 banks
MT47H32M16 – 8 Meg x 16 x 4 banks
[Figure: array organization; a "Word Line" selects a "Row" of cells.]
People buy DRAM for the bits. "Edge" circuits are overhead. So, we amortize the edge circuits over big arrays.
Latency versus bandwidth
[Figure: a 13-bit row address feeds a 1-of-8192 row decoder; the array is 8192 rows x 16384 columns = 134,217,728 usable bits (the tester found the good bits in a bigger array). A row access delivers 16384 bits to the sense amps; the chip then selects the requested bits and sends them off the chip.]
What if we want all of the 16384 bits? In the row access time (55 ns) we can do 22 transfers at 400 MT/s. With a 16-bit chip bus, 22 x 16 = 352 bits << 16384. Now the row access time looks fast! Thus, the push to faster DRAM interfaces.
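The mismatch on this slide is plain arithmetic (a sketch using the slide's device parameters):

```python
row_access_ns = 55        # time to open a row (from the slide)
transfer_rate = 400e6     # 400 MT/s interface
bus_bits = 16             # 16-bit chip bus
row_bits = 16384          # bits delivered by the sense amps per row

# Transfers that fit inside one row access time:
transfers = int(row_access_ns * 1e-9 * transfer_rate)
delivered = transfers * bus_bits
print(transfers, delivered, f"{delivered / row_bits:.1%}")  # → 22 352 2.1%
```

Only about 2% of the bits the sense amps already hold can leave the chip per row access time, which is why the interface speed, not the array, is the bandwidth bottleneck.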
DRAM latency/bandwidth chip features
Columns: design the right interface for the CPU to request the subset of a column of data it wishes.
DRAM is controlled via commands (READ, WRITE, REFRESH, ...), with synchronous data output.
[Timing diagrams: a READ command on CK/CK#, followed CL cycles later by the data burst DO n ... DO n+3 on DQ, framed by the DQS/DQS# strobes; shown for CL = 3 and CL = 4 (AL = 0).]
Opening a row before reading: Auto-Precharge
[Timing diagram (Micron "Bank Read with Auto Precharge"): ACT opens row RA in bank x; tRCD = 15 ns later, a READ with auto precharge (A10 high) starts the burst (AL = 1, CL = 3); tRTP, tRAS, and tRP then govern when the bank can be activated again. The row cycle time tRC means 55 ns between row opens. The DQ/DQS cases show tAC and tDQSCK at MIN and MAX, with tRPRE/tRPST strobe preamble and postamble.]
However, we can read columns quickly
[Timing diagram (512Mb x4/x8/x16 DDR2 SDRAM, consecutive READ bursts): two READ commands separated by tCCD; both READs are to the same bank, but different columns; data bursts DO n and DO b follow at RL = 3 (the RL = 4 case is also shown).]
Why can we read columns quickly?
Column reads select from the 16384 bits already held by the sense amps, so no new row access is needed.
[Figure: the same array as before: 13-bit row address, 1-of-8192 row decoder, 8192 rows x 16384 columns, 134,217,728 usable bits (tester found good bits in a bigger array); 16384 bits delivered by sense amps; select requested bits, send off the chip.]
[Timing diagram: interleaved ACT/READ pairs to banks a, b, c, d, e; tRRD (MIN) spaces ACTIVATEs to different banks, and tFAW (MIN) limits how many ACTIVATEs fit in a rolling window.]
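The payoff can be put in numbers: a column access to an already-open row pays only the CAS latency, while a closed row pays precharge plus activate first. A sketch with illustrative DDR2-800 timings (the 15 ns tRCD and CL = 3 are from the slides; tCK and tRP are our assumptions):

```python
tCK  = 2.5   # ns per clock at DDR2-800 (assumed)
CL   = 3     # CAS latency in cycles (from the slide)
tRCD = 15.0  # ns, activate-to-read delay (from the slide)
tRP  = 15.0  # ns, precharge time (assumed)

row_hit  = CL * tCK                # open row: column access only
row_miss = tRP + tRCD + CL * tCK   # close old row, open new row, then read
print(f"hit {row_hit:.1f} ns, miss {row_miss:.1f} ns")  # → hit 7.5 ns, miss 37.5 ns
```

A 5x latency gap between row hits and row misses is exactly what makes request reordering (next slides) worthwhile.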
DDR2 control: commands and mode registers
[Figure (Micron datasheet): control logic block diagram with CK/CK#, CKE, CS#, ODT, and command decode.]
[Figure: DDR2 command state machine. Commands: ACT = ACTIVATE; PRE = PRECHARGE; PRE_A = PRECHARGE ALL; READ / READ A = READ with auto precharge; WRITE / WRITE A = WRITE with auto precharge; REFRESH; SR = SELF REFRESH; (E)MRS = (Extended) mode register set; CKE_H = CKE HIGH, exit power-down or self refresh; CKE_L = CKE LOW, enter power-down. States include active power-down, bank active, activating, and precharging.]
[Table (mode register fields): M3 selects burst type (0 = sequential, 1 = interleaved); M6-M4 set the CAS latency (CL); M11-M9 set write recovery. Reserved encodings must be programmed to "0"; mode bits (Mn) map to the corresponding address balls (An).]
DRAM controllers: reorder requests
[Figure 1, from "Memory Access Scheduling" (Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens): time to complete a series of memory references without (A) and with (B) access reordering. The references are labeled (Bank, Row, Column): (0,0,0), (0,1,0), (0,0,1), (0,1,3), (1,0,0), (1,1,1), (1,0,1), (1,1,2). P: bank precharge (3-cycle occupancy); A: row activation (3-cycle occupancy); C: column access (1-cycle occupancy). Without access scheduling: 56 DRAM cycles; with access scheduling: 19 cycles.]
Without access scheduling, each reference requires a precharge, a row access, and a column access for a total of seven cycles per reference, or 56 cycles for all eight references. If we reschedule these operations as shown in Figure 1B, they can be performed in 19 cycles.
... before other column accesses can proceed, the cached row must be written back to the memory array by an explicit operation (precharge), which prepares the bank for a subsequent activation.
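Figure 1's benefit can be approximated with a toy cost model (illustrative Python; it is serial, so it captures the row-locality savings of reordering but not the bank overlap the paper also exploits to reach 19 cycles):

```python
P, A, C = 3, 3, 1   # cycle occupancy: precharge, activate, column access

def cost(refs):
    """Serial cycle count; each bank tracks its currently open row."""
    open_row = {}
    cycles = 0
    for bank, row, col in refs:
        if open_row.get(bank) == row:
            cycles += C              # row hit: column access only
        else:
            cycles += P + A + C      # close old row, open new one, access
            open_row[bank] = row
    return cycles

# The eight references from Figure 1, as (Bank, Row, Column):
refs = [(0,0,0), (0,1,0), (0,0,1), (0,1,3),
        (1,0,0), (1,1,1), (1,0,1), (1,1,2)]

in_order  = cost(refs)   # rows alternate within each bank: every access misses
reordered = cost(sorted(refs, key=lambda r: (r[0], r[1])))   # group by bank+row
print(in_order, reordered)   # → 56 32
```

Grouping accesses by (bank, row) turns half the references into 1-cycle row hits; overlapping the P and A operations of one bank under the C accesses of another is what closes the remaining gap from 32 down to the paper's 19.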
Memory Packaging
[Figure (Micron DDR2 registered DIMM schematic): nine DDR2 SDRAM chips, plus a PLL for clock distribution, a command/address register, and an SPD EEPROM.]
Each chip is responsible for 8 lines of the data bus (DQ0-DQ63, with the ninth chip carrying the check bits CB0-CB7). Commands are sent to all 9 chips, qualified by per-chip select lines.
MacBook Air ... too thin to use DIMMs
[Video]
CS 250 L10: Memory UC Regents Fall 2013 © UCB
Static Memory Circuits
Dynamic Memory: Circuit remembers
for a fraction of a second.
Static Memory: Circuit remembers
as long as the power is on.
Non-volatile Memory: Circuit remembers
for many years, even if power is off.
Recall DRAM cell: 1 T + 1 C
[Figure: the word line ("row") gates a single transistor connecting the storage capacitor to the bit line ("column").]
Idea: Store each bit with its complement
Why? We can use the redundant representation to compensate for noise and leakage.
[Figure: cross-coupled inverters hold x and its complement. The internal nodes y and its complement sit at Gnd/Vdd or Vdd/Gnd, and the restoring currents Ids and Isd continuously fight noise and leakage.]
SRAM Challenge #1: It’s so big!
SRAM area is 6X-10X DRAM area, same generation ...
[Figure (writing the cell): with initial state Vdd/Gnd, the bitlines drive Gnd/Vdd to overpower the cross-coupled inverters and flip the cell.]
Challenge #3: Preserving state on read
When the word line goes high on a read, the cell inverters must drive the large bitline capacitance quickly, to preserve the state on their small cell capacitances.
[Figure: cell state nodes at Vdd/Gnd, each bitline a big capacitor.]
[Figure (multi-ported cell): differential read or write ports (a wordline per port, with bitline pairs BitA and BitB and their complements), plus an optional single-ended read port with its own read bitline.]
Lecture 9, Memory 15 CS250, UC Berkeley, Fall 2012
[Figure (CS152 / Kubiatowicz, ©UCB Spring 2004): a 16-word x 4-bit SRAM array. The address decoder (A0-A3) drives word lines Word 0 through Word 15; each column of SRAM cells has a precharger and write driver at the top and a sense amp at the bottom, producing Dout 3 ... Dout 0 on the parallel data I/O lines. Add muxes to select a subset of bits.]
Q: Which is longer: the word line or the bit line?
SRAM advantages
SRAM has deterministic latency:
its cells do not need to be refreshed.
SRAM is much faster: transistors
drive bitlines on reads.
SRAM is easy to design in a logic
fabrication process (and premium
logic processes have SRAM add-ons).
Flip Flops Revisited
[Figure: a latch built from cross-coupled inverters holding x and x!, with transmission gates clocked by clk and clk'.]
Recall: Positive edge-triggered flip-flop
[Figure: two latches in series; D passes through the first latch while clk is low, and the second latch drives Q after the rising edge.]
A flip-flop "samples" right before the edge, and then "holds" the value.
• Setup time results from the delay through the first latch.
• Clock-to-Q delay results from the delay through the second latch.
What do we get for the 10 extra transistors? Clocked logic semantics.
Small Memories from Stdcell Latches
[Figure: write address and write data are clocked into an array of latches ("write by clocking latch"); synthesized combinational logic forms the read port; optional read output latch on Clk.]
Add additional ports by replicating the read and write port logic (multiple write ports need a mux in front of the latch). Expensive to add many ports.
For small register files, logic synthesis is competitive. Not clear if the SRAM data points include area ...
[Figure 3 (Bhupesh Dasila): raw area of synthesized register files vs. SRAMs; using the raw area data, the physical implementation team can get a more accurate area estimation early in the RTL development stage for floorplanning purposes. Shown for a 1-port, 32-bit-wide SRAM.]
Memory Design Patterns
[Figure: Cache A and Cache B sharing a Common Memory.]
The arbiter and interconnect on the last slide are how the two caches on this chip share access to DRAM.
Stream-Buffered Multiport Memory
Problem: Require simultaneous read and write access by multiple
independent agents to a large shared common memory, where each
requester usually makes multiple sequential accesses.
Solution: Organize memory to have a single wide port. Provide each
requester with an internal stream buffer that holds the width of data returned/
consumed by each memory access. Each requester can access its own stream
buffer without contention, but arbitrates with the others to read/write its stream
buffer from/to memory.
Applicability: Requesters make mostly sequential requests and can
tolerate variable latency for accesses.
Consequences: Requesters must wait arbitration delay to determine if
request will complete. Have to provide stream buffers for each requester.
Need sufficient access width to serve aggregate bandwidth demands of all
requesters, but wide data access can be wasted if not all used by requester.
Have to specify memory consistency model between ports (e.g., provide
stream flush operations).
[Figure (Stream-Buffered Multiport Memory): requesters with stream buffers share a single wide memory (Copy 0, Copy 1).]
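The pattern above can be sketched as a wide single-port memory with a per-requester stream buffer (illustrative Python; the class and names are our own, and arbitration is modeled only as a count of wide-port accesses):

```python
class StreamBufferedMemory:
    """One wide port; each requester gets a stream buffer holding one
    wide word. Sequential reads hit the buffer without arbitration."""
    def __init__(self, data, width=4):
        self.data = data             # backing memory, word-addressed
        self.width = width           # words returned per wide access
        self.buffers = {}            # requester -> (base_addr, words)
        self.wide_accesses = 0       # traffic visible to the arbiter

    def read(self, requester, addr):
        base = (addr // self.width) * self.width
        buf = self.buffers.get(requester)
        if buf is None or buf[0] != base:
            # Miss: arbitrate for the single wide port, refill the buffer.
            self.wide_accesses += 1
            buf = (base, self.data[base:base + self.width])
            self.buffers[requester] = buf
        return buf[1][addr - buf[0]]

mem = StreamBufferedMemory(list(range(100)), width=4)
# Two agents streaming sequentially: 8 reads each, but only 2 wide
# accesses each reach the shared port.
out = [mem.read("A", i) for i in range(8)] + [mem.read("B", i) for i in range(8)]
print(mem.wide_accesses)   # → 4
```

This shows the pattern's trade-offs directly: sequential requesters see mostly contention-free buffer hits, while a non-sequential access stream would thrash its buffer and waste most of each wide word.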
Intel Micron 8 GB NAND flash device, 2 bit per cell, 25 nm minimum feature, 16.5 mm by 10.1 mm.
[Figure: MOSFET cross-section with a "floating gate" buried in the gate dielectric; charge trapped on it shifts the transistor's threshold, visible in the Ids vs. Vg curve.]
10,000 electrons on the floating gate shift the transistor threshold by 2V.
In a memory array, shifted transistors hold "0", unshifted hold "1".
Moving electrons on/off floating gate
A high drain voltage injects "hot electrons" onto the floating gate. A high gate voltage "tunnels" electrons off of the floating gate.
[Figure: transistor cross-section with Vg, Vd, Vs and the dielectric surrounding the floating gate.]
Page format:
[Timing diagram: page address in: 175 ns; first byte out: 10,000 ns; clock out page bytes: 52,800 ns.]
Where Time Goes
[Figure (Samsung K9K8G08U0A functional block diagram; also K9WAG08U1A, K9NBG08U5A): command register and control logic with high-voltage generator, X/Y gating, data registers, and I/O buffers & latches. Annotated with the access times: address in 175 ns; first byte out 10,000 ns; clock out page bytes 52,800 ns.]
Writing a Page ...
A page lives in a block of 64 pages. 1GB Flash: 8K blocks.
To write a page:
1. Erase all pages in the block (cannot erase just one page). Time: 1,500,000 ns.
Even when new, not all blocks work! 1GB: 8K blocks, 160 may be bad. During factory testing, Samsung writes good/bad info for each block in the meta data bytes.
Page format: 2048 bytes (user data) + 64 bytes (meta data).
After an erase/program, the chip can say "write failed", and the block is now "bad". The OS must recover (migrate the bad block's data to a new block). Bits can also go bad "silently" (!!!).
[Figure: blocks of 64 pages each.]
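The erase-before-write and bad-block rules above can be sketched as controller pseudologic (illustrative Python; the timings come from the slides, and runtime write-failure handling is simplified away):

```python
ERASE_NS = 1_500_000        # erase a whole 64-page block (from the slide)
PAGES_PER_BLOCK = 64

class FlashBlock:
    def __init__(self, bad=False):
        self.bad = bad                        # factory-marked in metadata
        self.pages = [None] * PAGES_PER_BLOCK
        self.erased = True

def write_page(blocks, page_data):
    """Write one page: pick a good block, erasing it first if needed.
    Returns (block_index, time_ns); raises if no good block remains."""
    for i, blk in enumerate(blocks):
        if blk.bad:
            continue                           # skip bad blocks
        t = 0
        if not blk.erased:
            blk.pages = [None] * PAGES_PER_BLOCK
            t += ERASE_NS                      # cannot erase just one page
            blk.erased = True
        slot = blk.pages.index(None)
        blk.pages[slot] = page_data
        if slot == PAGES_PER_BLOCK - 1:
            blk.erased = False                 # full; next use must erase
        return i, t
    raise RuntimeError("no good blocks: OS must migrate data and recover")

blocks = [FlashBlock(bad=True), FlashBlock()]  # first block factory-bad
idx, t = write_page(blocks, b"data")
print(idx, t)   # → 1 0  (bad block skipped; fresh block needs no erase)
```

The 1.5 ms erase dominating a 64-page block is exactly why real controllers batch writes and remap logical pages rather than erasing in place.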
Flash controllers: Chips or Verilog IP ...
A flash memory controller manages write lifetime, block failures, silent bit errors ...