Memory
CSE 333
Computer Architecture and Assembly Language
[Adapted from slides of Dr. M. Mudawar, ICS 233, KFUPM]
Presentation Outline
Random Access Memory and its Structure
Memory Hierarchy and the need for Cache Memory
The Basics of Caches
Random Access Memory
Large arrays of storage cells
Volatile memory
Holds the stored data only as long as power is on
Random Access
Access time is practically the same for any location on a RAM chip
Output Enable (OE) control signal: specifies a read operation
Write Enable (WE) control signal: specifies a write operation
2^n × m RAM chip: n-bit address and m-bit data (a small behavioral sketch follows below)
[Figure: RAM chip block diagram — n-bit Address input, m-bit Data bus, OE and WE control inputs]
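Below is a minimal behavioral sketch in C of the 2^n × m chip just described. The widths (n = 10, m = 8) and the names ram_chip, ram_read, and ram_write are illustrative assumptions, not part of the original slides.

#include <stdint.h>

/* Illustrative 2^n x m RAM model: n = 10 address bits, m = 8 data bits. */
#define ADDR_BITS 10
#define NUM_WORDS (1u << ADDR_BITS)

typedef struct {
    uint8_t cells[NUM_WORDS];       /* 2^n words, each m = 8 bits wide */
} ram_chip;

/* OE asserted: drive the addressed word onto the data bus (read operation). */
uint8_t ram_read(const ram_chip *ram, uint16_t addr) {
    return ram->cells[addr & (NUM_WORDS - 1u)];
}

/* WE asserted: store the data-bus value into the addressed word (write operation). */
void ram_write(ram_chip *ram, uint16_t addr, uint8_t data) {
    ram->cells[addr & (NUM_WORDS - 1u)] = data;
}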
Memory Technology
Static RAM (SRAM) for Cache
Requires 6 transistors per bit
Requires low power to retain bit
Dynamic RAM (DRAM) for Main Memory
One transistor + capacitor per bit
Must be re-written after being read
Must also be periodically refreshed
All cells in a selected row are refreshed simultaneously
Address lines are multiplexed
Upper half of address: row, latched with the Row Address Strobe (RAS)
Lower half of address: column, latched with the Column Address Strobe (CAS)
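As a rough illustration of the multiplexed address lines, the sketch below splits a word address into its row and column halves; the 24-bit address and 12-bit column width are assumptions made for the example, not values from the slides.

#include <stdint.h>

/* Assumed geometry: 24-bit DRAM address, upper 12 bits = row (latched with RAS),
   lower 12 bits = column (latched with CAS). */
#define COL_BITS 12u

uint32_t dram_row(uint32_t addr) { return addr >> COL_BITS; }               /* upper half */
uint32_t dram_col(uint32_t addr) { return addr & ((1u << COL_BITS) - 1u); } /* lower half */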
Burst Mode Operation
Block Transfer
Row address is latched and decoded
A read operation causes all cells in a selected row to be read
Selected row is latched internally inside the SDRAM chip
Column address is latched and decoded
Selected column data is placed in the data output register
Column address is incremented automatically
Multiple data items are read depending on the burst length (see the sketch below)
Fast transfer of blocks between memory and cache
Fast transfer of pages between memory and disk
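The sketch below models the burst read sequence described above: the selected row is latched into an internal buffer, then consecutive columns are streamed out while the column address auto-increments. The bank geometry (4096 × 1024) and all identifiers are illustrative assumptions.

#include <stdint.h>
#include <stddef.h>

#define ROWS 4096
#define COLS 1024               /* assumed SDRAM bank geometry */

typedef struct {
    uint8_t cell[ROWS][COLS];
    uint8_t row_buffer[COLS];   /* selected row latched inside the chip */
} sdram_bank;

/* Burst read: latch the addressed row once, then stream burst_len consecutive
   columns while the column address is incremented automatically. */
void burst_read(sdram_bank *bank, uint32_t row, uint32_t col,
                uint8_t *dst, size_t burst_len) {
    for (size_t i = 0; i < COLS; i++)            /* row access: read the whole row */
        bank->row_buffer[i] = bank->cell[row][i];
    for (size_t i = 0; i < burst_len; i++)       /* column access with auto-increment */
        dst[i] = bank->row_buffer[(col + i) % COLS];
}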
Trends in DRAM
Year Produced   Chip Size   Type    Row Access   Column Access   Cycle Time (New Request)
1980            64 Kbit     DRAM    170 ns       75 ns           250 ns
1983            256 Kbit    DRAM    150 ns       50 ns           220 ns
1986            1 Mbit      DRAM    120 ns       25 ns           190 ns
1989            4 Mbit      DRAM    100 ns       20 ns           165 ns
1992            16 Mbit     DRAM    80 ns        15 ns           120 ns
1996            64 Mbit     SDRAM   70 ns        12 ns           110 ns
1998            128 Mbit    SDRAM   70 ns        10 ns           100 ns
2000            256 Mbit    DDR1    65 ns        7 ns            90 ns
2002            512 Mbit    DDR1    60 ns        5 ns            80 ns
2004            1 Gbit      DDR2    55 ns        5 ns            70 ns
2006            2 Gbit      DDR2    50 ns        3 ns            60 ns
2010            4 Gbit      DDR3    35 ns        1 ns            37 ns
2012            8 Gbit      DDR3    30 ns        0.5 ns          31 ns
SDRAM and DDR SDRAM
SDRAM is Synchronous Dynamic RAM
Added clock to DRAM interface
SDRAM is synchronous with the system clock
Older DRAM technologies were asynchronous
As the system bus clock improved, SDRAM delivered higher performance than asynchronous DRAM
DDR is Double Data Rate SDRAM
Like SDRAM, DDR is synchronous with the system clock, but DDR transfers data on both the rising and falling edges of the clock signal
Transfer Rates & Peak Bandwidth
Standard Name   Memory Bus Clock   Transfer Rate   Module Name   Peak Bandwidth
DDR-200         100 MHz            200 MT/s        PC-1600       1600 MB/s
DDR-333         167 MHz            333 MT/s        PC-2700       2667 MB/s
DDR-400         200 MHz            400 MT/s        PC-3200       3200 MB/s
DDR2-667        333 MHz            667 MT/s        PC-5300       5333 MB/s
DDR2-800        400 MHz            800 MT/s        PC-6400       6400 MB/s
DDR2-1066       533 MHz            1066 MT/s       PC-8500       8533 MB/s
DDR3-1066       533 MHz            1066 MT/s       PC-8500       8533 MB/s
DDR3-1333       667 MHz            1333 MT/s       PC-10600      10667 MB/s
DDR3-1600       800 MHz            1600 MT/s       PC-12800      12800 MB/s
DDR4-3200       1600 MHz           3200 MT/s       PC-25600      25600 MB/s
1 Transfer = 64 bits = 8 bytes of data, so peak bandwidth in MB/s is the transfer rate in MT/s × 8
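A small C check of that relationship for a few rows of the table; the function name peak_mb_per_s is arbitrary.

#include <stdio.h>

/* Peak bandwidth in MB/s = transfers per second (in millions) x 8 bytes per transfer. */
static unsigned peak_mb_per_s(unsigned mt_per_s) { return mt_per_s * 8u; }

int main(void) {
    printf("DDR-400:   %u MB/s\n", peak_mb_per_s(400));   /* 3200 MB/s  -> PC-3200  */
    printf("DDR3-1333: %u MB/s\n", peak_mb_per_s(1333));
    /* 10664 MB/s here; the table lists 10667 because the true rate is 1333.33 MT/s */
    printf("DDR4-3200: %u MB/s\n", peak_mb_per_s(3200));  /* 25600 MB/s -> PC-25600 */
    return 0;
}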
DRAM Refresh Cycles
The refresh period (the time within which every row must be refreshed) is on the order of tens of milliseconds
Refreshing is done for the entire memory
Each row is read and written back to restore the charge
Some of the memory bandwidth is lost to refresh cycles
[Figure: DRAM cell voltage vs. time — a written 1 decays toward the threshold voltage and is restored by each refresh cycle, while a stored 0 remains at the low voltage]
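To put the "bandwidth lost to refresh" point in perspective, here is a back-of-the-envelope estimate in C; the row count, refresh period, and per-row refresh time are purely illustrative assumptions, not figures from the slides.

#include <stdio.h>

int main(void) {
    /* Illustrative parameters only: */
    double rows             = 8192.0;  /* rows that must be refreshed each period */
    double refresh_period_s = 64e-3;   /* every row refreshed within 64 ms        */
    double row_refresh_s    = 50e-9;   /* time to read and write back one row     */

    double busy_s = rows * row_refresh_s;    /* total time spent refreshing per period */
    printf("Bandwidth lost to refresh: %.2f%%\n",
           100.0 * busy_s / refresh_period_s);   /* about 0.64% with these numbers */
    return 0;
}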
Processor-Memory Performance Gap
CPU performance improved about 55% per year, slowing down after 2004
DRAM performance improved only about 7% per year
The result is a steadily widening processor-memory performance gap
1980 – no cache in microprocessors
1995 – two-level cache on the microprocessor chip
The Need for Cache Memory
Widening speed gap between CPU and main memory
Processor operation takes less than 1 ns
Main memory requires more than 50 ns to access
Each instruction involves at least one memory access
One memory access to fetch the instruction
A second memory access for load and store instructions
Memory bandwidth limits the instruction execution rate
Cache memory can help bridge the CPU-memory gap
Cache memory is small in size but fast
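A quick calculation shows how memory bandwidth alone caps the instruction rate if there is no cache. The 50 ns latency comes from the slide above; the 1.3 accesses per instruction is an assumed mix (1 instruction fetch plus roughly 0.3 loads/stores).

#include <stdio.h>

int main(void) {
    double mem_latency_ns     = 50.0;  /* main memory access time from the slide above    */
    double accesses_per_instr = 1.3;   /* assumed: 1 instruction fetch + ~0.3 loads/stores */

    /* Without a cache, every access goes to main memory. */
    double ns_per_instr = accesses_per_instr * mem_latency_ns;
    printf("Peak rate without a cache: %.1f million instructions/s\n",
           1000.0 / ns_per_instr);     /* about 15 MIPS, far below what a <1 ns processor can sustain */
    return 0;
}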
Typical Memory Hierarchy
Registers are at the top of the hierarchy
Typical size < 1 KB
Access time < 0.5 ns
Level 1 Cache (8 – 64 KB): access time ~1 ns, on the microprocessor chip
L2 Cache (512 KB – 8 MB): access time 3 – 10 ns
Main Memory (4 – 16 GB): access time 50 – 100 ns, reached over the memory bus
Disk Storage (> 200 GB): magnetic or flash disk, access time 5 – 10 ms, reached over the I/O bus
Going down the hierarchy, each level is bigger but slower than the one above it
[Figure: memory hierarchy — registers, L1 and L2 caches inside the microprocessor, main memory on the memory bus, disk storage on the I/O bus]
Principle of Locality of Reference
Programs access only a small portion of their address space
At any time, only a small set of instructions & data is needed
Temporal Locality (in time)
If an item is accessed, it will probably be accessed again soon
Same loop instructions are fetched each iteration
Same procedure may be called and executed many times
Spatial Locality (in space)
Tendency to access contiguous instructions/data in memory
Sequential execution of instructions
Traversing arrays element by element (see the sketch below)
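A standard C illustration of spatial locality, referenced in the array-traversal bullet above: summing a 2-D array row by row touches contiguous addresses, while summing it column by column strides across memory and misses far more often. The array dimension N and the function names are arbitrary.

#include <stddef.h>

#define N 1024   /* arbitrary array dimension */

/* Row-major traversal: consecutive accesses touch adjacent addresses,
   so most of them hit in the cache (good spatial locality). */
long sum_row_major(int a[N][N]) {
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: consecutive accesses are N * sizeof(int) bytes
   apart, so spatial locality is poor and cache misses are frequent. */
long sum_col_major(int a[N][N]) {
    long sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}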
What is a Cache Memory ?
Small and fast (SRAM) memory technology
Stores the subset of instructions & data currently being accessed
Used to reduce average access time to memory
Caches exploit temporal locality by …
Keeping recently accessed data closer to the processor
Caches exploit spatial locality by …
Moving blocks consisting of multiple contiguous words
Goal is to achieve
The fast access speed of the cache memory for most accesses
While keeping the cost of the overall memory system balanced (see the sketch below)
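One common way to express this speed/cost balance, though it is not spelled out on the slide, is the average memory access time, AMAT = hit time + miss rate × miss penalty. The numbers below are assumed for illustration only.

#include <stdio.h>

int main(void) {
    /* Assumed illustrative values: */
    double hit_time_ns     = 1.0;   /* cache hit time                    */
    double miss_penalty_ns = 50.0;  /* main memory access time on a miss */
    double miss_rate       = 0.05;  /* 5% of accesses miss in the cache  */

    double amat = hit_time_ns + miss_rate * miss_penalty_ns;
    printf("Average memory access time: %.1f ns\n", amat);   /* 3.5 ns */
    return 0;
}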
Almost Everything is a Cache !
In computer architecture, almost everything is a cache!
Registers: a cache on variables – software managed
First-level cache: a cache on second-level cache
Second-level cache: a cache on memory
Memory: a cache on hard disk
Stores recent programs and their data
Hard disk can be viewed as an extension to main memory
Branch target and prediction buffer
Cache on branch target and prediction information