Module 4
• The data-input and data-output of each Sense/Write circuit are connected to a single bidirectional
data-line.
• Data-line can be connected to a data-bus of the computer.
• The following 2 control lines are also used:
1) R/W’ Specifies the required operation (Read or Write).
2) CS’ Chip-Select input; selects a given chip in a multi-chip memory-system.
• Memories that consist of circuits capable of retaining their state as long as power is applied are known as static memories.
CMOS Cell
• Transistor pairs (T3, T5) and (T4, T6) form the inverters in the latch (Figure 8.5).
• In state 1, the voltage at point X is maintained high by having T3 and T6 ON, while T4 and T5 are OFF.
• Thus, when T1 and T2 are turned ON (closed), bit-lines b and b’ will have high and low signals respectively.
• Advantages:
1) It has low power consumption because current flows in the cell only when the cell is being accessed.
2) Static RAMs can be accessed very quickly; their access time is a few nanoseconds.
• Disadvantage: SRAMs are volatile memories because their contents are lost when power
is interrupted.
ASYNCHRONOUS DRAM
• Information is stored in a dynamic memory-cell in the form of a charge on a capacitor (Figure 8.6).
• To store information in the cell, the appropriate voltage is applied to the bit-line, which charges the capacitor.
• After the transistor is turned off, the capacitor begins to discharge.
• Hence, the information stored in the cell can be retrieved correctly only if it is read before the charge
on the capacitor drops below a threshold value.
• During a read-operation,
→ the transistor is turned ON &
→ a sense amplifier detects whether the charge on the capacitor is above the
threshold value.
If (charge on capacitor) > (threshold value), the bit-line will have logic value ‘1’.
If (charge on capacitor) < (threshold value), the bit-line will have logic value ‘0’.
• During Read/Write-operation,
→ row-address is applied first.
→ row-address is loaded into row-latch in response to a signal pulse on RAS’ input of chip.
(RAS = Row-Address Strobe, CAS = Column-Address Strobe)
• When a Read-operation is initiated, all cells on the selected row are read and
refreshed.
• Shortly after the row-address is loaded, the column-address is
→ applied to the address pins &
→ loaded into the column-latch under control of the CAS’ signal.
SYNCHRONOUS DRAM
• The operations are directly synchronized with a clock signal (Figure 8.8).
• The address and data connections are buffered by means of registers.
• The output of each sense amplifier is connected to a latch.
• A Read-operation causes the contents of all cells in the selected row to be loaded in these latches.
• Data held in latches that correspond to selected columns are transferred into data-output register.
• Thus, data become available on the data-output pins.
• First, the row-address is latched under control of the RAS’ signal (Figure 8.9).
• The memory typically takes 2 or 3 clock cycles to activate the selected row.
• Then, the column-address is latched under the control of the CAS’ signal.
• After a delay of one clock cycle, the first set of data bits is placed on the data-lines.
• SDRAM automatically increments column-address to access next 3 sets of bits in the selected row.
MEMORY-SYSTEM CONSIDERATIONS
MEMORY CONTROLLER
• To reduce the number of pins, the dynamic memory-chips use multiplexed-address inputs.
• The address is divided into 2 parts:
1) High-Order Address Bits
Select a row in the cell array.
They are provided first and latched into the memory-chips under the control of the RAS’ signal.
2) Low-Order Address Bits
Select a column.
They are provided on the same address pins and latched under the control of the CAS’ signal.
• The multiplexing of the address bits is usually done by a Memory-Controller circuit (Figure 5.11).
• The controller accepts a complete address & the R/W’ signal from the processor.
• A Request signal indicates that a memory access operation is needed.
• Then, the controller
→ forwards the row & column portions of the address to the memory,
→ generates the RAS’ & CAS’ signals &
→ sends the R/W’ & CS’ signals to the memory.
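The address-splitting step performed by the controller can be sketched in Python. This is a minimal illustration only; the 24-bit address with a 12-bit row and 12-bit column split is an assumed example, not a fixed standard.

```python
# Sketch of the address multiplexing done by a memory controller.
# Assumption (for illustration): a 24-bit address with a 12-bit row
# part and a 12-bit column part.
ROW_BITS = 12
COL_BITS = 12

def split_address(addr):
    """Return the (row, column) portions of a full memory address."""
    col = addr & ((1 << COL_BITS) - 1)                # low-order bits -> column
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)  # high-order bits -> row
    return row, col

def controller_sequence(addr):
    """Order in which the controller drives the multiplexed address pins."""
    row, col = split_address(addr)
    return [("RAS'", row),   # row address is latched first, on RAS'
            ("CAS'", col)]   # column address is latched next, on CAS'

row, col = split_address(0xABC123)
print(f"row=0x{row:03X}, col=0x{col:03X}")  # row=0xABC, col=0x123
```

Because row and column share the same pins, the chip needs only half the address pins, at the cost of the two-step RAS’/CAS’ latching sequence.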
RAMBUS MEMORY
• The usage of a wide bus is expensive.
• Hence, Rambus developed an implementation using a narrow bus.
• Rambus technology is a fast signaling method used to transfer information between chips.
• The signals consist of much smaller voltage swings around a reference voltage Vref.
• The reference voltage is about 2V.
• The two logical values are represented by 0.3 V swings above and below Vref.
• This type of signaling is generally known as Differential Signaling.
• Rambus provides a complete specification for the design of a communication link called the Rambus Channel.
• Rambus memory has a clock frequency of 400 MHz.
• The data are transmitted on both edges of the clock, so that the effective data-transfer rate is 800 MHz.
• Circuitry needed to interface to Rambus channel is included on chip. Such chips are called RDRAM.
(RDRAM = Rambus DRAMs).
• A Rambus channel has:
1) 9 Data-lines (lines 1-8 transfer the data; line 9 is used for parity checking),
2) Control-lines &
3) A power line.
• A two-channel Rambus has 18 data-lines. There are no separate address-lines.
• Communication between processor and RDRAM modules is carried out by means of packets
transmitted on the data-lines.
• There are 3 types of packets:
1) Request
2) Acknowledge &
3) Data.
TYPES OF ROM
• Different types of non-volatile memory are
1) PROM
2) EPROM
3) EEPROM &
4) Flash Memory (Flash Cards & Flash Drives)
FLASH MEMORY
• In an EEPROM, it is possible to read & write the contents of a single cell.
• In a flash device, it is possible to read the contents of a single cell, but a write-operation involves the entire contents of a block.
• Prior to writing, the previous contents of the block are erased.
Eg. In MP3 player, the flash memory stores the data that represents sound.
• A single flash chip cannot provide sufficient storage capacity for an embedded system.
• Advantages:
1) Flash devices have greater density, which leads to higher capacity & a lower cost per bit.
2) They require a single power supply voltage & consume less power.
• There are 2 methods for implementing larger memory: 1) Flash Cards & 2) Flash Drives
1) Flash Cards
One way of constructing a larger module is to mount flash-chips on a small card.
Such flash-cards have a standard interface.
The card is simply plugged into a conveniently accessible slot.
The memory-size of the card can be 8, 32 or 64 MB.
Eg: A minute of music can be stored in 1MB of memory. Hence 64MB flash cards can store an
hour of music.
2) Flash Drives
Larger flash memory modules, called flash drives, are developed to replace hard disk-drives.
The flash drives are designed to fully emulate the hard disk.
The flash drives are solid state electronic devices that have no movable parts.
Advantages:
1) They have shorter seek & access times, which results in a faster response.
2) They have low power consumption. Therefore, they are attractive for battery-driven
applications.
3) They are insensitive to vibration.
Disadvantages:
1) The capacity of a flash drive (<1 GB) is less than that of a hard disk (>1 GB).
2) It leads to a higher cost per bit.
COMPUTER ORGANIZATION | DEPT. OF ELECTRONICS & COMMUNICATION ENGG.
COMPUTER ORGANIZATION|MODULE 4: MEMORY SYSTEM 18EC35
3) Flash memory will weaken after it has been written a number of times (typically at
least 1 million times).
SPEED, SIZE, AND COST
The word in memory will be updated later, when the marked-block is removed from cache.
During Read-operation
• If the requested-word does not currently exist in the cache, a read-miss will occur.
• To overcome the read miss, Load–through/Early restart protocol is used.
Load–Through Protocol
The block of words that contains the requested-word is copied from the memory into the cache.
As soon as the requested-word is read, it is forwarded to the processor, without waiting for the entire block to be loaded into the cache.
During Write-operation
• If the requested-word does not exist in the cache, then a write-miss will occur.
1) If Write Through Protocol is used, the information is written directly into main-memory.
2) If Write Back Protocol is used,
→ then block containing the addressed word is first brought into the cache &
→ then the desired word in the cache is over-written with the new information.
MAPPING-FUNCTION
• Here we discuss 3 different mapping-functions:
1) Direct Mapping
2) Associative Mapping
3) Set-Associative Mapping
DIRECT MAPPING
• The block-j of the main-memory maps onto block-j modulo-128 of the cache (Figure 8.16).
• When memory-blocks 0, 128, & 256 are loaded into the cache, each is stored in cache-block 0.
Similarly, memory-blocks 1, 129, & 257 are stored in cache-block 1.
• Contention may arise
1) when the cache is full &
2) when more than one memory-block is mapped onto a given cache-block position.
• The contention is resolved by
allowing the new blocks to overwrite the currently resident-block.
• The memory-address determines the placement of a block in the cache.
The high-order tag-bits of the address are compared with the tag-bits of the cache-block.
If they match, then the desired word is in that block of the cache.
Otherwise, the block containing the required word must first be read from the memory
and then loaded into the cache.
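The block-placement rule above can be sketched in Python for the 128-block cache of Figure 8.16. The tag bookkeeping shown is a simplified illustration of the lookup, not a full cache model.

```python
# Direct mapping: memory-block j maps onto cache-block (j mod 128).
NUM_CACHE_BLOCKS = 128

cache_tags = [None] * NUM_CACHE_BLOCKS  # tag stored with each cache-block

def lookup(mem_block):
    """Return True on a hit; on a miss, load the block and return False."""
    index = mem_block % NUM_CACHE_BLOCKS   # fixed cache-block position
    tag = mem_block // NUM_CACHE_BLOCKS    # identifies which block is resident
    if cache_tags[index] == tag:
        return True                        # hit: desired word is in the cache
    cache_tags[index] = tag                # miss: overwrite the resident block
    return False

print(lookup(0))     # False (miss: block 0 loaded into cache-block 0)
print(lookup(0))     # True  (hit)
print(lookup(128))   # False (block 128 evicts block 0 from cache-block 0)
print(lookup(0))     # False (contention: block 0 must be reloaded)
```

The last two calls show the contention described above: blocks 0 and 128 compete for the same cache-block position even though the rest of the cache may be empty.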
ASSOCIATIVE MAPPING
1) The memory-block can be placed into any cache-block position. (Figure 8.17).
2) 12 tag-bits will identify a memory-block when it is resident in the cache.
3) Tag-bits of an address received from processor are compared to the tag-bits of each block of
cache.
4) This comparison is done to see if the desired block is present.
SET-ASSOCIATIVE MAPPING
• It is the combination of direct and associative mapping. (Figure 8.18).
• The blocks of the cache are grouped into sets.
• The mapping allows a block of the main-memory to reside in any block of the
specified set.
• The cache has 2 blocks per set, so the memory-blocks 0, 64, 128, …, 4032 map
into cache-set 0.
• The memory-block can occupy either of the two block positions within the set.
6-bit Set Field
Determines which set of the cache contains the desired block.
6-bit Tag Field
The tag-field of the address is compared to the tags of the two blocks of the set.
This comparison is done to check if the desired block is present.
• A cache that contains 1 block per set is a direct-mapped cache.
• A cache that has ‘k’ blocks per set is called a k-way set-associative cache.
• Each block contains a control-bit called a valid-bit.
• The valid-bit indicates whether the block contains valid data.
• The dirty-bit indicates whether the block has been modified during its cache
residency.
Valid-bit = 0 when power is initially applied to the system; it is set to 1 when the block is first loaded into the cache.
If a main-memory-block is updated by a source that bypasses the cache while a copy of the block exists in the cache, the valid-bit of that cache-block is cleared to 0.
• The problem of keeping the cached copy and the main-memory copy of the data consistent (e.g. when both the processor & DMA access the same data) is called the Cache-Coherence Problem.
• Advantages:
1) The contention problem of direct mapping is eased by having a few choices for block placement.
2) The hardware cost is reduced by decreasing the size of the associative search.
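The 2-way set-associative placement described above (64 sets, memory-block j mapping to set j mod 64) can be sketched as follows. The FIFO eviction inside a set is a simplifying assumption for illustration; the replacement policy itself is discussed in the next section.

```python
# 2-way set-associative mapping with 64 sets (6-bit set field).
NUM_SETS = 64
WAYS = 2

# Each set holds up to WAYS resident tags.
cache = [[] for _ in range(NUM_SETS)]

def access(mem_block):
    """Return True on a hit; on a miss, place the block in its set."""
    set_index = mem_block % NUM_SETS   # which set may hold the block
    tag = mem_block // NUM_SETS        # tag identifying the block in the set
    if tag in cache[set_index]:
        return True
    if len(cache[set_index]) == WAYS:  # set full: evict one block
        cache[set_index].pop(0)        # simple FIFO choice for illustration
    cache[set_index].append(tag)
    return False

# Memory-blocks 0, 64 and 128 all map into set 0:
print(access(0), access(64), access(128))  # False False False
print(access(64))  # True: block 64 is still resident in the 2-way set
```

Unlike direct mapping, blocks 64 and 128 can reside in set 0 at the same time, which is exactly how set-associativity eases the contention problem.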
REPLACEMENT ALGORITHM
• In direct mapping method,
the position of each block is pre-determined and there is no need of replacement strategy.
• In associative & set associative method,
The block position is not pre-determined.
If the cache is full and if new blocks are brought into the cache,
then the cache-controller must decide which of the old blocks has to be replaced.
• When a block is to be overwritten, the block that has gone the longest time without being referenced is over-written.
• This block is called the Least Recently Used (LRU) block & the technique is called the LRU algorithm.
• The cache-controller tracks references to all blocks with the help of a block-counter.
• The performance of the LRU algorithm can be improved by introducing a small amount of randomness in deciding which block is to be over-written.
Eg:
Consider 4 blocks/set in a set-associative cache.
A 2-bit counter can be used for each block.
When a ‘hit’ occurs, the counter of the referenced block is set to 0; counters with values originally lower than the
referenced one are incremented by 1 & all others remain unchanged.
When a ‘miss’ occurs & the set is full, the block with counter value 3 is removed, the
new block is put in its place with its counter set to 0, & the other three block-counters are incremented
by 1.
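The 2-bit counter scheme in this example can be simulated directly. This sketch models only the counter updates for one full 4-block set, not a complete cache.

```python
# LRU tracking with 2-bit counters for one 4-block set.
# counters[i] is the age of block i: 0 = most recently used, 3 = LRU.
counters = [0, 1, 2, 3]   # assume the set is already full

def hit(block):
    """Referenced block becomes most recent; younger blocks age by 1."""
    ref = counters[block]
    for i in range(4):
        if counters[i] < ref:   # only counters originally lower than the
            counters[i] += 1    # referenced one are incremented
    counters[block] = 0

def miss():
    """Evict the block whose counter is 3; the new block starts at 0."""
    victim = counters.index(3)
    for i in range(4):
        counters[i] += 1        # all other counters age by 1
    counters[victim] = 0        # new block occupies the victim's slot
    return victim

hit(2)                 # block 2 is referenced
print(counters)        # [1, 2, 0, 3]
print(miss())          # 3 -> block 3 (counter was 3) is replaced
print(counters)        # [2, 3, 1, 0]
```

Note that the counters always remain a permutation of {0, 1, 2, 3}, so 2 bits per block are sufficient to order all four blocks by recency.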
PERFORMANCE CONSIDERATION
• Two key factors in the commercial success are 1) performance & 2) cost.
• In other words, the best possible performance at low cost.
• A common measure of success is called the Price/Performance ratio.
• Performance depends on
→ how fast the machine instructions are brought to the processor &
→ how fast the machine instructions are executed.
• To achieve parallelism in memory accesses, interleaving is used.
• Parallelism allows several memory access operations to be in progress at the same time.
INTERLEAVING
• The main-memory of a computer is structured as a collection of physically separate modules.
• Each module has its own
1) ABR (address buffer register) &
2) DBR (data buffer register).
• So, memory access operations may proceed in more than one module at the same time (Fig 5.25).
• Thus, the aggregate-rate of transmission of words to/from the main-memory can be increased.
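With interleaving, the low-order address bits select the module, so consecutive words fall in different modules and can be fetched in parallel. A sketch (the choice of 4 modules is an assumption for illustration):

```python
# Interleaved addressing: low-order address bits select the module,
# so consecutive addresses land in consecutive modules and their
# accesses can overlap. 4 modules assumed for illustration.
NUM_MODULES = 4   # must be a power of 2 for a clean bit split

def interleaved(addr):
    """Return (module, address_within_module) for an interleaved layout."""
    return addr % NUM_MODULES, addr // NUM_MODULES

for addr in range(8):
    print(addr, "->", interleaved(addr))
# Addresses 0..3 land in modules 0..3, so a 4-word block can be
# read from all four modules at (nearly) the same time.
```

The alternative, placing consecutive words in the same module (high-order bits select the module), would serialize accesses to a block and lose the aggregate-rate benefit described above.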
• When OS changes contents of page-table, the control-bit will invalidate corresponding entry in TLB.
• Given a virtual-address, the MMU looks in TLB for the referenced-page.
If page-table entry for this page is found in TLB, the physical-address is obtained immediately.
Otherwise, the required entry is obtained from the page-table & TLB is updated.
Page Faults
• Page-fault occurs when a program generates an access request to a page that is not in memory.
• When MMU detects a page-fault, the MMU asks the OS to generate an interrupt.
• The OS
→ suspends the execution of the task that caused the page-fault and
→ begins execution of another task whose pages are in memory.
• When the task resumes the interrupted instruction must continue from the point of interruption.
• If a new page is brought from the disk when the memory is full, one of the resident pages must be replaced.
In this case, the LRU algorithm is used to remove the least recently used page from the memory.
• A modified page has to be written back to the disk before it is removed from the memory.
In this case, the Write-Back Protocol is used.
4.6 SECONDARY-STORAGE
• Semiconductor memories alone cannot provide all the storage capability needed in a computer.
• Secondary-storage devices meet these larger storage requirements.
• Some of the secondary-storage devices are:
1) Magnetic Disk
2) Optical Disk &
3) Magnetic Tapes.
MAGNETIC DISK
• A magnetic-disk system consists of one or more disks mounted on a common spindle.
• A thin magnetic film is deposited on each disk (Figure 8.27).
• Disk is placed in a rotary-drive so that magnetized surfaces move in close proximity to R/W heads.
• Each R/W head consists of 1) Magnetic Yoke & 2) Magnetizing-Coil.
• Digital information is stored on magnetic film by applying current pulse to the magnetizing-coil.
• Only changes in the magnetic field under the head can be sensed during the Read-operation.
• Therefore, if the binary states 0 & 1 are represented by two opposite states of magnetization,
then a voltage is induced in the head only at 0-1 and 1-0 transitions in the bit-stream.
• Long strings of consecutive 0s or 1s are determined with the help of a clock.
• Manchester Encoding technique is used to combine the clocking information with data.
• R/W heads are maintained at small distance from disk-surfaces in order to achieve high bit densities.
• When the disk rotates at its steady rate, air pressure develops between the disk-surface & the head.
This air pressure forces the head away from the surface.
• The flexible spring connection between head and its arm mounting permits the head to fly at the
desired distance away from the surface.
Winchester Technology
• Read/Write heads are placed in a sealed, air-filtered enclosure; this is known as Winchester technology.
• The read/write heads can operate closer to the magnetic track surfaces because
the dust particles, which are a problem in unsealed assemblies, are absent.
Advantages
• It has a larger capacity for a given physical size.
• The data integrity (reliability) is high because
the storage medium is not exposed to contaminating elements.
• The read/write heads of a disk system are movable.
• The disk system has 3 parts: 1) Disk Platter (Usually called Disk)
2) Disk-drive (spins the disk & moves Read/write heads)
3) Disk Controller (controls the operation of the system.)
DATA BUFFER/CACHE
• A disk-drive that incorporates the required SCSI circuitry is referred to as a SCSI drive.
• The SCSI bus can transfer data at a higher rate than the rate at which data can be read from the disk tracks.
• A data buffer can be used to deal with the possible difference in transfer rates between the disk and the SCSI bus.
• The buffer is a semiconductor memory.
• The buffer can also provide cache mechanism for the disk.
i.e. when a read request arrives at the disk, the controller first checks if the data is available in
the cache/buffer.
If the data is available in the cache,
then the data can be accessed & placed on the SCSI bus.
Otherwise, the data will be retrieved from the disk.
DISK CONTROLLER
• The disk controller acts as interface between disk-drive and system-bus (Figure 8.13).
• The disk controller uses DMA scheme to transfer data between disk and memory.
• When the OS initiates a transfer by issuing a R/W’ request, the controller’s registers are loaded with the
following information:
1) Memory Address: Address of first memory-location of the block of words involved in the
transfer.
2) Disk Address: Location of the sector containing the beginning of the desired block of words.
3) Word Count: Number of words in the block to be transferred.
Problem 1:
Consider the dynamic memory cell. Assume that C = 30 femtofarads (10⁻¹⁵ F) and that the leakage current
through the transistor is about 0.25 picoamperes (10⁻¹² A). The voltage across the capacitor when it is
fully charged is 1.5 V. The cell must be refreshed before this voltage drops below 0.9 V. Estimate the
minimum refresh rate.
Solution:
The capacitor may discharge from 1.5 V to 0.9 V, i.e. by ΔV = 0.6 V, before a refresh is needed.
The time to lose this much charge at a constant leakage current I is
t = C × ΔV / I = (30 × 10⁻¹⁵ × 0.6) / (0.25 × 10⁻¹²) = 72 × 10⁻³ s.
Hence, each cell must be refreshed at least once every 72 ms.
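The arithmetic for the refresh interval can be checked directly:

```python
# Minimum refresh interval for the dynamic memory cell of Problem 1.
C = 30e-15        # capacitance: 30 femtofarads
I = 0.25e-12      # leakage current: 0.25 picoamperes
V_full = 1.5      # fully charged voltage (V)
V_min = 0.9       # voltage below which the stored bit is lost (V)

# Charge that may leak away before a refresh is required: Q = C * dV.
# Time for that charge to leak at constant current I: t = Q / I.
t = C * (V_full - V_min) / I
print(f"refresh every {t * 1e3:.0f} ms")   # refresh every 72 ms
```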
Problem 2:
Consider a main-memory built with SDRAM chips. Data are transferred in bursts & the burst length is
8. Assume that 32 bits of data are transferred in parallel. If a 400-MHz clock is used, how much time
does it take to transfer:
(a) 32 bytes of data
(b) 64 bytes of data
What is the latency in each case?
Solution:
(a) It takes 5 + 8 = 13 clock cycles, i.e. 13 × 2.5 = 32.5 ns at 400 MHz.
(b) It takes twice as long to transfer 64 bytes, because two independent 32-byte transfers have
to be made. The latency is the same in each case, i.e. 5 clock cycles = 12.5 ns.
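Under the stated assumptions (5 clock cycles of latency before the first word, then one 4-byte word per cycle, 400 MHz clock, and independent back-to-back bursts), the timing works out as:

```python
# SDRAM burst-transfer timing for Problem 2 (assumed parameters).
CLOCK_MHZ = 400
CYCLE_NS = 1000 / CLOCK_MHZ      # 2.5 ns per clock cycle
LATENCY_CYCLES = 5               # cycles before the first word appears
WORD_BYTES = 4                   # 32 bits transferred in parallel
BURST_LEN = 8                    # words per burst (32 bytes)

def transfer_ns(total_bytes):
    """Time for back-to-back independent bursts of BURST_LEN words."""
    bursts = total_bytes // (WORD_BYTES * BURST_LEN)
    cycles = bursts * (LATENCY_CYCLES + BURST_LEN)
    return cycles * CYCLE_NS

print(transfer_ns(32))   # 32.5 ns (13 cycles)
print(transfer_ns(64))   # 65.0 ns (two independent 32-byte bursts)
```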
Problem 3:
Give a critique of the following statement: “Using a faster processor chip results in a corresponding
increase in performance of a computer even if the main-memory speed remains the same.”
Solution:
A faster processor chip will result in increased performance, but the amount of increase will not
be directly proportional to the increase in processor speed, because the cache miss penalty will
remain the same if the main-memory speed is not improved.
Problem 4:
A block-set-associative cache consists of a total of 64 blocks, divided into 4-block sets. The main-
memory contains 4096 blocks, each consisting of 32 words. Assuming a 32-bit byte-addressable
address-space,
(a) how many bits are there in main-memory address
(b) how many bits are there in each of the Tag, Set, and Word fields?
Solution:
(a) 4096 blocks of 128 bytes each require 12 + 7 = 19 bits for the main-memory address.
(b) TAG field is 8 bits. SET field is 4 bits. WORD field is 7 bits.
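The field widths in this solution can be verified with a quick log2 calculation:

```python
# Address-field widths for the set-associative cache of Problem 4.
from math import log2

MEM_BLOCKS = 4096         # main-memory blocks
WORDS_PER_BLOCK = 32
BYTES_PER_WORD = 4        # byte-addressable, 32-bit words
CACHE_BLOCKS = 64
BLOCKS_PER_SET = 4

word_bits = int(log2(WORDS_PER_BLOCK * BYTES_PER_WORD))   # 7: 128 bytes/block
addr_bits = int(log2(MEM_BLOCKS)) + word_bits             # 12 + 7 = 19
set_bits = int(log2(CACHE_BLOCKS // BLOCKS_PER_SET))      # 16 sets -> 4 bits
tag_bits = addr_bits - set_bits - word_bits               # 19 - 4 - 7 = 8

print(addr_bits, tag_bits, set_bits, word_bits)   # 19 8 4 7
```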
Problem 5:
The cache block size in many computers is in the range of 32 to 128 bytes. What would be the main
advantages and disadvantages of making the size of cache blocks larger or smaller?
Solution:
Larger size:
→ Fewer misses if most of the data in the block are actually used.
→ Wasteful if much of the data are not used before the cache-block is ejected from the cache.
Smaller size:
→ More misses.
Problem 6:
Consider a computer system in which the available pages in the physical memory are divided among
several application programs. The operating system monitors the page transfer activity and
dynamically adjusts the number of pages allocated to various programs. Suggest a suitable strategy
that the operating system can use to minimize the overall rate of page transfers.
Solution:
The operating system may increase the main-memory pages allocated to a program that has a
large number of page faults, using space previously allocated to a program with few page
faults.
Problem 7:
In a computer with a virtual-memory system, the execution of an instruction may be interrupted by a
page fault. What state information has to be saved so that this instruction can be resumed later? Note
that bringing a new page into the main-memory involves a DMA transfer, which requires execution of
other instructions. Is it simpler to abandon the interrupted instruction and completely re-execute it
later? Can this be done?
Solution:
Continuing the execution of an instruction interrupted by a page fault requires saving the entire
state of the processor, which includes saving all registers that may have been affected by the
instruction as well as the control information that indicates how far the execution has
progressed. The alternative of re-executing the instruction from the beginning requires a
capability to reverse any changes that may have been caused by the partial execution of the
instruction.
Problem 8:
When a program generates a reference to a page that does not reside in the physical main-memory,
execution of the program is suspended until the requested page is loaded into the main-memory from
a disk. What difficulties might arise when an instruction in one page has an operand in a different
page? What capabilities must the processor have to handle this situation?
Solution:
The problem is that a page fault may occur during intermediate steps in the execution of a
single instruction. The page containing the referenced location must be transferred from the
disk into the main-memory before execution can proceed.
Since the time needed for the page transfer (a disk operation) is very long, as compared to
instruction execution time, a context-switch will usually be made.
(A context-switch consists of preserving the state of the currently executing program, and
“switching” the processor to the execution of another program that is resident in the
main-memory.) The page transfer, via DMA, takes place while this other program executes.
When the page transfer is complete, the original program can be resumed.
Therefore, one of two features is needed in a system where the execution of an individual
instruction may be suspended by a page fault. The first possibility is to save the state of
instruction execution. This involves saving more information (temporary programmer-transparent
registers, etc.) than is needed when a program is interrupted between instructions.
The second possibility is to “unwind” the effects of the portion of the instruction completed
when the page fault occurred, and then execute the instruction from the beginning when the
program is resumed.
Problem 9:
Magnetic disks are used as the secondary storage for program and data files in most virtual-memory
systems. Which disk parameter(s) should influence the choice of page size?
Solution:
The sector size should influence the choice of page size, because the sector is the smallest
directly addressable block of data on the disk that is read or written as a unit. Therefore, pages
should be some small integral number of sectors in size.
Problem 10:
A disk unit has 24 recording surfaces. It has a total of 14,000 cylinders. There is an average of 400
sectors per track. Each sector contains 512 bytes of data.
(a) What is the maximum number of bytes that can be stored in this unit?
(b) What is the data-transfer rate in bytes per second at a rotational speed of 7200 rpm?
(c) Using a 32-bit word, suggest a suitable scheme for specifying the disk address.
Solution:
(a) The maximum number of bytes that can be stored on this disk is 24 × 14000 × 400 × 512 =
68.8 × 10⁹ bytes.
(b) The data-transfer rate is (400 × 512 × 7200)/60 = 24.58 × 10⁶ bytes/s.
(c) Need 9 bits to identify a sector, 14 bits for a track, and 5 bits for a surface.
Thus, a possible scheme is to use address bits A8-0 for sector, A22-9 for track, and A27-23 for surface
identification. Bits A31-28 are not used.
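All three parts of this solution can be reproduced numerically:

```python
# Capacity, transfer rate, and address-field widths for the disk unit.
SURFACES = 24
CYLINDERS = 14000         # equivalently, tracks per surface
SECTORS_PER_TRACK = 400
BYTES_PER_SECTOR = 512
RPM = 7200

# (a) Total capacity: surfaces x tracks x sectors x bytes per sector.
capacity = SURFACES * CYLINDERS * SECTORS_PER_TRACK * BYTES_PER_SECTOR
print(f"{capacity / 1e9:.1f} GB")          # 68.8 GB

# (b) Bytes passing under a head per revolution, times revolutions/sec.
rate = SECTORS_PER_TRACK * BYTES_PER_SECTOR * RPM / 60
print(f"{rate / 1e6:.2f} MB/s")            # 24.58 MB/s

# (c) Address-field widths: bits needed to count each quantity.
def bits_needed(n):
    """Smallest number of bits that can represent values 0..n-1."""
    return (n - 1).bit_length()

print(bits_needed(SECTORS_PER_TRACK),   # 9 bits for the sector
      bits_needed(CYLINDERS),           # 14 bits for the track
      bits_needed(SURFACES))            # 5 bits for the surface
```

The three widths sum to 28, which is why bits A27-0 suffice and A31-28 are left unused in the suggested 32-bit scheme.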