Memory-System Design
A processor needs to retrieve instructions and data from memory, and store results into memory.
We call this memory Random Access Memory (RAM).
[Figure: the Processor exchanges Instructions and Data with Memory (RAM).]
There are two general types of Random Access Memory (RAM):
- Static RAM: fast and expensive.
- Dynamic RAM: slow and cheap.
It would be nice if we could use dynamic RAM (DRAM) for all our main memory needs, but:
- Best access times for DRAM are 50 to 60 nanoseconds (ns).
- Processor logic speeds are over 500 MHz, which corresponds to a 2 ns access requirement.
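As a rough sketch of this gap, the arithmetic can be checked in a few lines of Python. The function names are illustrative only and are not part of the course material.

```python
def cycle_time_ns(clock_mhz):
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / clock_mhz


def clocks_per_access(dram_access_ns, clock_mhz):
    """Processor clock periods that elapse during one DRAM access."""
    return dram_access_ns / cycle_time_ns(clock_mhz)


# A 500 MHz processor has a 2 ns clock period.
print(cycle_time_ns(500))
# Number of clock periods spanned by a 50 ns and a 60 ns DRAM access.
for access_ns in (50, 60):
    print(access_ns, clocks_per_access(access_ns, 500))
```

At 500 MHz, a 50 to 60 ns DRAM access spans 25 to 30 processor clock periods.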
Memory Systems
Architecture of Parallel Computers
Memory Addressing
A 32-bit memory array takes a 5-bit address and reads or writes 1 bit of data at a time:
[Figure: Memory Array with a 5-bit address in (A0 to A4) and 1-bit data in or out (D0).]
The actual memory is organized in an array of rows and columns, and addressed with a multiplexed address using a Row Address Strobe (RAS) and a Column Address Strobe (CAS):
[Figure: the memory array drawn as a grid of 32 one-bit cells. Address lines A0 to A2 feed the row decode and latch, strobed by RAS; the column decode and sense logic, strobed by CAS, connects the selected cell to the 1-bit data line D0 (in or out).]
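A minimal sketch of multiplexed addressing, assuming the 32 cells are arranged as 8 rows by 4 columns and that the upper three address bits select the row. The exact split and all names below are assumptions for illustration.

```python
ROW_BITS = 3  # assumed: 8 rows
COL_BITS = 2  # assumed: 4 columns


def split_address(addr):
    """Split a 5-bit cell address into (row, column)."""
    if not 0 <= addr < (1 << (ROW_BITS + COL_BITS)):
        raise ValueError("address out of range for a 32-bit array")
    row = addr >> COL_BITS
    col = addr & ((1 << COL_BITS) - 1)
    return row, col


class BitArray:
    """Toy model of a 32 x 1 array addressed with RAS and CAS."""

    def __init__(self):
        self.cells = [[0] * (1 << COL_BITS) for _ in range(1 << ROW_BITS)]
        self.latched_row = None

    def ras(self, row):
        # The RAS strobe latches the row address.
        self.latched_row = row

    def cas_write(self, col, bit):
        # The CAS strobe selects one cell in the latched row.
        self.cells[self.latched_row][col] = bit & 1

    def cas_read(self, col):
        return self.cells[self.latched_row][col]


array = BitArray()
row, col = split_address(0b10110)
array.ras(row)
array.cas_write(col, 1)
print(row, col, array.cas_read(col))
```

Because the row and column parts are presented one after the other, the same address pins can carry both.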
1997, 1999 G.Q. Kenney
CSC 506, Summer 1999
Although the memory array addresses a single bit, we can replicate the arrays so we can get multiple bits in or out in parallel. For example, if we put four 32-bit arrays into a single chip, we get a total of 128 bits, organized as 32 x 4:
[Figure: Memory Array with a 5-bit address in (A0 to A4) and 4 bits of data in or out (D0 to D3).]
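The 32 x 4 organization can be sketched as four 32 x 1 arrays that share one address, each supplying one data bit. The class name and methods are hypothetical.

```python
class Chip32x4:
    """Toy model: four 32 x 1 arrays addressed in parallel."""

    DEPTH = 32  # addresses per array
    WIDTH = 4   # number of arrays, one per data bit

    def __init__(self):
        self.arrays = [[0] * self.DEPTH for _ in range(self.WIDTH)]

    def write(self, addr, value):
        # Data bit i goes to array i at the shared address.
        for bit in range(self.WIDTH):
            self.arrays[bit][addr] = (value >> bit) & 1

    def read(self, addr):
        # Reassemble the 4-bit value from the four arrays.
        return sum(self.arrays[bit][addr] << bit for bit in range(self.WIDTH))


chip = Chip32x4()
chip.write(5, 0b1010)
print(chip.read(5), Chip32x4.DEPTH * Chip32x4.WIDTH)
```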
Most computer systems today address memory in byte (8-bit) increments. That is, each sequential address is assumed to identify a byte.

Bits, bytes and words
Memory is really stored in a bit array.
- Eight-bit computers access memory and manipulate data a byte (8 bits) at a time. Each sequential memory address denotes a byte of data. Data are stored and retrieved from memory a byte at a time.
- 16-bit computers access memory and manipulate data up to 16 bits at a time. 16 bits is the word length. In most 16-bit computers, each sequential memory address still denotes a byte of data. However, data are stored and retrieved from memory two bytes at a time.
- 32-bit computers access memory and manipulate data up to 32 bits at a time. 32 bits is the word length. In most 32-bit computers, each sequential memory address still denotes a byte of data. However, data are stored and retrieved four bytes at a time.
- Some 32-bit computers access memory 64 bits at a time. 64 bits is the word length. Each sequential memory address still denotes a byte of data. However, data are stored and retrieved from memory eight bytes at a time.
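To make the relationship between byte addresses and words concrete, here is a small sketch that maps a byte address to the word that holds it and the byte's position within that word. The function name is an assumption for illustration.

```python
def locate_byte(byte_addr, word_bits):
    """Return (word index, byte offset within the word) for a byte address."""
    bytes_per_word = word_bits // 8
    return byte_addr // bytes_per_word, byte_addr % bytes_per_word


# The same byte address falls in different words as the word length grows.
for word_bits in (8, 16, 32, 64):
    print(word_bits, locate_byte(13, word_bits))
```

With a 32-bit word, for example, byte address 13 is byte 1 of word 3, so the memory delivers bytes 12 to 15 together.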
Standard Memory Packaging
Early memory modules (chips) were always organized x 1 bit. They are called DIP (dual in-line package) chips. We need eight of them at a time for a computer that uses an 8-bit memory bus:
[Figure: eight x 1 DIP chips sharing a 20-bit address in (A0 to A19), each supplying one of the data bits D0 to D7, for 8 bits of data in or out.]
Later, the memory chips were packaged onto a small board called a SIMM (Single In-line Memory Module) that provided eight data bits in parallel. These are called 30-pin SIMMs. 30-pin SIMMs must be installed in groups of four on computers that access 32 bits in parallel.

To accommodate 32-bit computers, the 72-pin SIMM was developed. It provided a single package with 32 bits in parallel.

More recent computers use a 64-bit bus width. They need 72-pin SIMMs installed in pairs. So, the 168-pin Dual In-line Memory Module (DIMM) was developed. It has 168 pins and a data width of 64 bits.

[Figure: an example 168-pin DIMM package.]
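The rule behind these groupings is that the modules installed together must fill the full bus width. A small sketch, using the data widths given above (the dictionary and function names are illustrative):

```python
# Data width in bits of each package type, as described in the text.
MODULE_WIDTH_BITS = {
    "DIP x1": 1,
    "30-pin SIMM": 8,
    "72-pin SIMM": 32,
    "168-pin DIMM": 64,
}


def modules_per_group(bus_bits, module):
    """Number of modules that must be installed together to fill the bus."""
    width = MODULE_WIDTH_BITS[module]
    if bus_bits % width != 0:
        raise ValueError("bus width is not a multiple of the module width")
    return bus_bits // width


print(modules_per_group(8, "DIP x1"))
print(modules_per_group(32, "30-pin SIMM"))
print(modules_per_group(64, "72-pin SIMM"))
print(modules_per_group(64, "168-pin DIMM"))
```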
DRAM Timing
Access to dynamic RAM requires:
1. Row address placed on the address pins
2. RAS signal given
3. Column address placed on the address pins
4. CAS signal given
5. Read data from the output pin(s)

There is a minimum time required between each of the steps, and these taken together determine the minimum time required to retrieve data from the DRAM. The timings for individual memory parts are given in AC timing diagrams.

The access time of asynchronous DRAM is normally specified as the time from when the RAS signal is given to the time that data is valid on the output pins. Note that there is also a minimum row address setup time that needs to be added to that time.

The cycle time of DRAM is the minimum time between the starts of successive random accesses to the RAM. For example, RAM with a 60 ns access time can be cycled only every 110 ns.
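The difference between access time and cycle time can be sketched with the 60 ns and 110 ns figures from the example. The row address setup time is left as a parameter because no value is given here, and the function names are assumptions.

```python
def read_latency_ns(t_rac_ns, row_setup_ns=0.0):
    """Time until data is valid: row address setup plus RAS access time."""
    return row_setup_ns + t_rac_ns


def max_random_accesses_per_second(t_cycle_ns):
    """Upper bound on random accesses per second for a given cycle time."""
    return 1e9 / t_cycle_ns


def time_after_data_valid_ns(t_rac_ns, t_cycle_ns):
    """Part of each cycle that remains after the data has become valid."""
    return t_cycle_ns - t_rac_ns


print(read_latency_ns(60))
print(round(max_random_accesses_per_second(110)))
print(time_after_data_valid_ns(60, 110))
```

With these numbers, almost half of every cycle passes after the data is already valid, which is the slack that interleaving tries to hide.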
IBM FPM DRAM block diagram and Read cycle timing diagram from Fast Page Mode DRAM, and EDO DRAM on class website.
Typical RAM Parameters

Fast page mode DRAM parameters
Parameter                                -50     -60    Units
tRAC   RAS Access Time                    50      60    ns
tCAC   CAS Access Time                    13      15    ns
tAA    Column Address Access Time         25      30    ns
tRC    Cycle Time                         95     110    ns
tPC    Fast Page Mode Cycle Time          35      40    ns

Extended Data Out (EDO) DRAM parameters
Parameter                                -50     -60    Units
tRAC   RAS Access Time                    50      60    ns
tCAC   CAS Access Time                    13      15    ns
tAA    Column Address Access Time         25      30    ns
tRC    Cycle Time                         84     104    ns
tHPC   EDO (Hyper Page) Mode Cycle Time   20      25    ns

10 ns Synchronous DRAM (SDRAM) parameters
Parameter                                               Units
fCK    Clock Frequency                   100    66   33   MHz
tCK    Clock Cycle Time                   10    15   30   ns
tAA    CAS Latency                         3     2    1   CLK
tRL    RAS Latency                         6     4    2   CLK
tRC    Bank Cycle Time                     8     5    3   CLK

Pipeline burst SRAM parameters
Parameter                                 -4      -5    Units
tCYCLE Cycle Time                       10.0    10.0    ns
tAS    Address Setup Time                2.5     2.5    ns
tAH    Address Hold Time                 0.5     0.5    ns
tCQ    Clock to Output Valid             4.0     5.0    ns
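As a rough illustration of what the page mode cycle times buy, the sketch below compares reading four words from one row in page mode with four fully random accesses. It uses a deliberately simplified model (first word after tRAC, each later word one page mode cycle after the previous one, no setup or precharge overheads), which is an assumption and not a timing taken from the data sheets.

```python
# Values in ns from the tables above; "page" is tPC for FPM and tHPC for EDO.
FPM = {
    "-50": {"tRAC": 50, "tRC": 95, "page": 35},
    "-60": {"tRAC": 60, "tRC": 110, "page": 40},
}
EDO = {
    "-50": {"tRAC": 50, "tRC": 84, "page": 20},
    "-60": {"tRAC": 60, "tRC": 104, "page": 25},
}


def page_mode_read_ns(part, n_words):
    """Simplified time to read n_words from one row in page mode."""
    return part["tRAC"] + (n_words - 1) * part["page"]


def random_read_ns(part, n_words):
    """Simplified time to read n_words with a full cycle per access."""
    return (n_words - 1) * part["tRC"] + part["tRAC"]


for name, family in (("FPM", FPM), ("EDO", EDO)):
    part = family["-60"]
    print(name, page_mode_read_ns(part, 4), random_read_ns(part, 4))
```

Under this model a -60 part delivers four words from one row in 180 ns (FPM) or 135 ns (EDO), compared with 390 ns and 372 ns for four random accesses.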
Interleaved Memory
Because access time is shorter than cycle time, memory may be installed in two banks, with successive words alternating between the banks. We can then partially overlap access to successive memory words by starting the access to the second bank while the first one is finishing.
[Figure: two banks, Bank 0 and Bank 1, each a Memory Array with an address latch on A0 to A4 and data lines D0 to D7. The system address is A0 to A5, with A0 used as the bank select (BS); both banks connect to the same 8-bit data bus D0 to D7.]
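The benefit of interleaving for sequential accesses can be estimated with a simple model: each bank still needs a full cycle time between its own accesses, but the banks are started at evenly staggered times. This is a sketch under that assumption, with hypothetical names, and it ignores bus and control overheads.

```python
def sequential_read_ns(n_words, t_access_ns, t_cycle_ns, n_banks):
    """Estimated time until the last of n_words sequential words is valid.

    Word i starts i * (t_cycle_ns / n_banks) after word 0, so every bank
    is restarted exactly one cycle time after its previous access.
    """
    if n_words < 1 or n_banks < 1:
        raise ValueError("n_words and n_banks must be at least 1")
    stagger_ns = t_cycle_ns / n_banks
    return t_access_ns + (n_words - 1) * stagger_ns


# Eight sequential words from 60 ns access, 110 ns cycle DRAM.
print(sequential_read_ns(8, 60, 110, 1))
print(sequential_read_ns(8, 60, 110, 2))
```

With one bank the eight words take 830 ns; with two banks the estimate drops to 445 ns.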
With a byte-wide bus, sequential bytes are in alternate banks. Bank 0 contains byte addresses 0, 2, 4, 6, etc. and bank 1 contains addresses 1, 3, 5, 7, etc. If we have a word width (bus) greater than a byte, sequential words are in alternate banks. For example, if we have a word width of 32 bits (4 bytes), 32 words in each bank, and two banks of memory (for a total of 2048 bits or 256 bytes of memory), the following diagram shows the addressing:
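[Figure: Bank 0 and Bank 1, each 32 bits wide with its own data latch and address latch. Address bits A3 - A7 (5 bits) go to both address latches, A2 is the bank select (BS), and the data bus is carried as four 8-bit lanes: D0 - D7, D8 - D15, D16 - D23 and D24 - D31.]

Memory Address (8-bit address)
5 bits   Bank address: one of 32 words in each bank
1 bit    One of two banks
2 bits   One of 4 bytes within a word

Byte addresses held at each bank address:
Bank address   Bank 0          Bank 1
00000           0  1  2  3      4  5  6  7
00001           8  9 10 11     12 13 14 15
00010          16 17 18 19     20 21 22 23
00011          24 25 26 27     28 29 30 31
00100
00101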
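The address split for this example (two banks of 32 words, 4 bytes per word) can be sketched as a few shifts and masks. The function name and the order of the returned tuple are illustrative choices.

```python
BYTE_BITS = 2  # 4 bytes within a word
BANK_BITS = 1  # one of two banks
WORD_BITS = 5  # 32 words in each bank


def decode_address(addr):
    """Split an 8-bit byte address into (bank address, bank, byte in word)."""
    if not 0 <= addr < (1 << (BYTE_BITS + BANK_BITS + WORD_BITS)):
        raise ValueError("address out of range for 256 bytes of memory")
    byte_in_word = addr & ((1 << BYTE_BITS) - 1)
    bank = (addr >> BYTE_BITS) & ((1 << BANK_BITS) - 1)
    bank_address = addr >> (BYTE_BITS + BANK_BITS)
    return bank_address, bank, byte_in_word


# Byte 13 lives at bank address 00001 in Bank 1, as byte 1 of that word.
print(decode_address(13))
```

Because the bank select bit sits just above the byte offset, consecutive 32-bit words alternate between Bank 0 and Bank 1.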
Synchronous DRAM (SDRAM)
SDRAM has actually taken the step of packaging interleaved memory into the memory chip.
IBM Synchronous DRAM (SDRAM) block diagram and timing diagram from Synchronous DRAM on class website.
Static RAM
Static RAM (SRAM) does not need separate RAS and CAS signals. The data is stored in flip-flops, so it is not destroyed on access and does not have to be re-written, and it does not need to be periodically refreshed. SRAM is expensive (it takes more than four times the chip area of DRAM for an equivalent number of bits), but access times are fast.
IBM Burst Pipeline SRAM timing diagram from Static RAM on class website.