2co N3 PDF
2co N3 PDF
In this lecture we develop models for the control unit, ALU and memory. It is the first
that will require most care.
Output Enable(s):   We know that every register has an Output Enable (OE) input which
determines whether the tri-states on its output lines have a high or low resistance.
These are driven by levels such as OEac, OEpc and so on. We assume that OEac=1,
and similar, sets the tri-state for the AC in low resistance mode.
Next, as a more careful study of the CPU diagram Fig. 3.1 shows, some extra tri-state
buffers are required to satisfy every register input that has more than one potential
                                            1
3/2                                            LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
                                                                  MAR
                  OE1       OEpc           OEsp     OE2                                  Memory
                                                                       OEmar
                                PC             SP                    CLKmem
       INCpc/LOADpc                                                WRITE/READ
                                                                                 OEmem
                 OEad                                             MBR
      IR                                                   OEmbr
       IR(opcode) IR(address)
                                             OE3         OE4 OE5           OE6      OE7
  OEop
                                                      AC
                                               OEac
           CU            Status
                                              ALU
                                 SETalu
                                 SETshft
   Control Lines
ALU:    We develop internal hardware for the ALU in §3.2. For now, we treat it as
a black box with 8 or fewer functions, which thus require 3 level bits SETalu[2:0] to
define. These are specified in the following table.
 SETalu         Operation     Comment
  000           ALUnoop       Do nothing. Let the AC input appear on output
  001           ALUnot        Invert each bit in the AC input
  010           ALUor         Output = AC .OR. MBR
  011           ALUand        Output = AC .AND. MBR
  100           ALUadd        Output = AC + MBR
   ..           ..
    .            .
PC:   The Program Counter has a one-bit level input which tells it whether to load the
input or to increment when the clock pulse is received.
SP:   The Stack Pointer has a two-bit level input which tells it whether to load the
input, increment, or decrement, when the clock pulse is received.
3.1. THE CONTROL UNIT                                                                3/3
Now consider the execution phase of a few of the instructions in terms of levels and
pulses
   LDA x (levels and pulses)
   10. OEad=1; OE1=1; CLKmar;
   11. OEmar=1; WRITE=0; OEmem=1; CLKmbr;
   12. OEmbr=1; OE4=1; CLKac;
We can already start sketching out a one-hot controller design — that in Fig. 3.2. The
first three D-types handle the fetch, then at D-type #4 comes the decoding. Then if
STA were high, say, the “hot 1” would pass to D-type #13 then #14 which execute
STA, then back to the fetch, and so on.
3/4                                                             LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
IR(opcode)
             CSL1          CSP1
                                                            Decoder
                                                            8−to−256
                           CLK
                                                                        LDA
               D       Q   D       Q   D       Q    D       Q                 D    Q   D    Q      D    Q
                   1           2           3            4                         10       11          12
                                                                        STA
             CLK                                                              D    Q   D    Q
                                                                                  13       14
                                                                        ADD
                                                                              D    Q   D    Q      D    Q
                                                                                  15       16          17
                                                                        AND
                                                                              D    Q
                                                                                  18
Figure 3.2: The control unit for the first four opcodes.
                                                                  HALT        Decoder
                                                                              8−to−256
                                                                  LDA
                                                                                            HALT
                                                                  STA                      LDA
                                                                                         STA
                                                                                        ADD
                                                                  ADD                  AND
and so on
        Figure 3.3: Decoding the opcode. (a) With gates for using bits, and (b) as a black-box.
3.1. THE CONTROL UNIT                                                                                    3/5
         SETalu[2]
                                                               SETalu[1]
                                                                           SETalu[0]
                                                                                       CLKmem
         LOADpc
         LOADsp
                                                                                       CLKmbr
                                                                                       CLKmar
                                                                                       OEmem
         OEmbr
         OEmar
CLKpc
                                                                                       CLKac
                                                                                       CLKsp
         INCsp
                                                                                       CLKir
         OEop
         OEad
         OEpc
         OEac
What
OEsp
         OE1
         OE2
         OE3
         OE4
         OE5
         OE6
         OE7
         Line
Ftch     1.        1                                       X   X           X             1
         2.                     1                          X   X           X 1               1
         3.                         1                      X   X           X   1                 1
Dcd      4.                 1                              X   X           X
LDA     10.             1                 1                X   X           X             1
        11.                     1                          X   X           X 1               1
        12.                         1            1         X   X           X                         1
STA     13.             1               1 1              1 0   0           0             1 1
        14.                     1 1                  1                                                   1
ADD     15.
        16.
        17.
AND     18.
        19.
        20.
JMP     21.
BZ      22.
        23.
NOT     24.
SHR     25.
HLT     99.
Now look down the columns for each signal. As a work in progress, after filling in the
Fetch, Decode, and the execute phases of LDA and STA, we have already
OEpc = CSL1
OEad = CSL10 .OR. CSL13
OEmar = CSL2 .OR. CSL11 .OR. CSL14
CLKmar = CSP1 .OR. CSP10 .OR. CSP13
3/6                                         LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
ALU Out
                                                  Full Adder
         Carry In
         An
         Bn
                         Decoder
         setALU                    0
              0                    1
              1                    2
              2                    3
                                   4
Carry Out
carry is slow, and that speed-up can be achieved by inserting carry-look-ahead circuitry
between a number of bits.
The ALU contains a none/left/right shifter at its output. This is operated separately
from the other functions using a 2-bit setSHFT input (so one can add two numbers
and rightshift them all in one pass). Some designs use a shift register, but here we
regard it a black box which transparently allows bit[n] of the input to appear as one of
bit[n], bit[n-1], or bit[n+1] of the output.
                                   BA 15      BA 14                  BA 2     BA 1     BA 0
         setALU
                       3                                                                        0
Carry O 15 O 14 O2 O1 O0
         setSHFT                                          Shifter
                       2
                   C
                   N                           ZNV
                   Z                         generation
                   V                                            16
Output
Figure 3.5: Multi-bit ALU. The carry-in on the right must be 0. A shifter is placed on the back end. In
our design it is not a clocked register.
  Z Zero flag: This is set to 1 whenever the output from the ALU is zero.
  N Negative flag: This is set to 1 whenever the most significant bit of the output is 1.
    Note that it is not correct to say that it is set when the output of the ALU is neg-
3/8                                    LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
      ative: the ALU doesn’t know or care whether you are working in 2’s complement.
      However, this flag is used by the controller for just such interpretations.
  C Carry flag: Set to 1 when there is a carry from the adder.
  V oVerflow flag: Set to 1 when Amsb = 1, Bmsb = 1, but Omsb = 0; or when
    Amsb = 0, Bmsb = 0, but Omsb = 1. Allows the controller to detect overflow
    during 2’s complement addition. Note (i) that the ALU does not know why it is
    setting the flag, and (ii) in our BSA, the msb=15.
3.4     Memory
You will have gathered that memory is no more than a very large collection of registers
held on an array on a chip, one register being accessible at a time via an addressing
mechanism.
To write to memory, one needs to set the address, place the data on the data bus,
and clock the recipient register. To read from memory, one needs to set the address,
output enable the relevant register onto the data bus, and clock the MBR.
                      Address Bus
               MAR                              D Q      D Q          D Q
                              Address Decoder
D Q D Q D Q
                ChipSelect
                Write/Read
OE
               MBR
                                                           Data Bus
Figure 3.6: Memory hardware Note that the input data lines (top) and output data lines (bottom) are
different. They are not of course — they are the same bus and have been drawn like this to avoid
crossing wires.
Figure 3.6 shows the design of a memory with contents 3 bits wide.
The address lines enter a decoder, which selects one register, ie one row of three D-
type latches with couple clock inputs. Notice that that each output from the decoder
is used in two ways: first for writing it is ANDed with the clock signal, and second for
reading it is ANDed with the register outputs.
The ChipSelect (CS) input is unexpected, as one might expect the memory to be
permanently selected! We shall see in a moment that it actually used as part of
address selection in a memory made from multiple chips. Assume it is CS=1 for now.
3/10                                      LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
The OE input is expected, but instead of a CLKmem input we now use the more usual
name WRITE/READ. (By the way, this notation means WRITE is the “equivalent” of
READ.)
At first this looks like a level selector signal but, by process of elimination, this must
be the clocked input.
          1
W/R       0
                                           Bus no longer floats
          1   Bus floating
                                                              Data Valid
Data
          0
                               Read access time                            Data hold time
                      Figure 3.7: Timing of signals for (a) memory read.
3.5. MEMORY ORGANIZATION IN HARDWARE                                                            3/11
Figure 3.8: Time of signals for the memory write. Notice that the signal which clocks the register on
the memory during write is the WRITE/READ signal.
                                                          A 23 A 0 go to
                                    Address Bus           both chips
                        MAR
                                          High              Low
                      CS
                                          Byte              Byte
                      OE
                     W/R
MBR
Figure 3.9: Using two Byte wide chips to make a 16-bit wide memory.
to generate an array 2 chips high and, as before, 2 wide, as shown in Fig. 3.10.
But each chip has only 23 address lines, A0 − A22 . What happens to A23 ? It is input
into a 1-to-2 line decoder, whose output is connected to the ChipSelect inputs. If
A23 = 0, the lower pair is selected, and if A23 = 1 the upper pair is selected. The OE
and W/R inputs are connected to all chips.
                                                           A 22 A 0 go to
                                     Address Bus           all chips
                       MAR
                     A23
                              1           CS                CS
                      1 to 2−line         OE                OE
                       Decoder            W/R               W/R
                              0
                                          CS                CS
                      OE                  OE                OE
                     W/R                  W/R               W/R
MBR
  Figure 3.10: Using 8MByte chips to create a memory with 16M locations and a width of 2 Bytes.
3.6. MEMORY ORGANIZATION IN SOFTWARE: MEMORY ADDRESSING MODES                                     3/13
  Data
      Figure 3.11: Full address decoding, but using only 2× 1K memories in an 8K address space.
(1) Immediate, (2) Direct, (3) Indirect, and (4) Indexed addressing.
Immediate addressing does not involve further memory addressing after the instruction
fetch. It provides a method of specifying a constant number rather than an address
in the operand. For example LDA# x loads the accumulator immediately with the
operand x.
                                         CPU                                           MAR
                                                                                         Address Bus
                                                                      PC    SP                         Memory
  LDA# x                                     INCpc/LOADpc
                                                                                             CLKmem
  AC←IR (address)                          IR
                                            IR(opcode) IR(address)                     MBR
                                                                                                  Data Bus
                                                                                  AC
  ADD# 22
  AC←AC + 22                                    CU             Status
                                                                            ALU
                                                                   SETalu
                                          Control Lines
                                          to Registers, ALU, Memory, etc                       Outside the CPU
Looking back at our Standard architecture, you will see that there is a direct link from
the IR (address) to the AC to allow this to happen.
Immediate addressing allows statements like “n=n+10” written in some high level lan-
guage to be turned into assembler, using modified versions of other instructions. For
example ADD# 22. How would you actually realize this on our machine?
We have already use direct addressing in the lectures. This is where the operand is the
address of the data you require. Another way of saying this is that the operand is a
pointer to the data.
  LDA x                                 ♣ Quick Example: What does location 23 and
  MAR←IR (address)                      the AC contain after this code snippet?
  MBR←hMAR i                            Code      AC Loc22 Loc23
  AC←MBR                                LDA #21
                                        STA 22
  ADD x
                                        ADD #1
  MAR←IR (address)
                            etc         ADD 22
  MBR←hMAR i
                                        STA 23
  AC←MBR + AC
3.6. MEMORY ORGANIZATION IN SOFTWARE: MEMORY ADDRESSING MODES                       3/15
LDA x,X
Cpus often provide a number of registers for temporary storage which avoid the need
to hold and access pointers in main memory (with obvious savings in time). Methods
which use these registers are known as register addressing modes.
Indexed addressing is a straightforward example of register addressing, and the only
one we consider in these lectures. In the example above, x is an address, and X is an
index register holding an offset. The effective address given to LDA is x+X, the sum
of the two.
The register X will be a counter register, and can be loaded separately, incremented
and decremented.
Here is some half-baked code for Matlab-esque addition of two arrays of length 100,
and placing the result in a third array ...
3/16                                      LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
47 38 38 38 38
         6       52               52               52                  52
         5
         4
         3
         2       47               47               47                  47
         1
         0
                                                                                    Register X
              LDA #2            LDA 2             LDA (2)            LDA 2,X
                                                                                        4
       AC        2                47                38                  52
              Immediate          Direct           Indirect           Indexed
Figure 3.12: Addressing modes: the contents of the AC after LDA immediate, direct, indirect and
indexed
3.7. A SMALL PROGRAM                                                               3/17
Initialize PC to 0 and start the controller running by starting the clock. The AC gets
loaded with 5 (from location 20), and then has 300 (from location 22) subtracted from
it. Because 5 isn’t equal to 300, the Z flag does not get set, and so we reload the AC
with 5, add 1 (from location 21) to it, and then restore it in location 20. The program
eventually halts when location 20 is incremented up to 300. (Halting simply stops the
clock.)
Notice that there is no requirement for assembler mnemonics or for that matter a high
level language. The raw form of the executable program and data is binary code. In
the good old days, when folk were short of things to do, binary code was loaded into
the memory by hand.
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING)                           3/19
  • it reads the file of high level statements, checks the syntax, and replaces each
    statement with the relevant sequence of assembler instructions.
  • it places declared variables, and the names of labels, into a symbol table.
  • each time the variable or label is encountered, it replaces the name in the code by
    its location in the symbol table.
The instructions are written out into a temporary file. Since each instruction is of
known length, the compiler can associate with it provisional memory addresses, relative
to the Beginning of the Program (BOP). So, after the first pass, the compiler knows
the full extent of the program — or, more strictly, the extent of the instructions. After
the first pass, the compiler therefor knows where it can start placing allocated data.
We will call this memory location BOD (for Beginning of Data).
On the second pass the compiler
After reading the declaration of variables, the compiler could start making the symbol
table, where BOD indicates the unknown Beginning Of Data.
 Name      SymTab       Actual Location     Words
   a         var0            BOD             1
   b         var1          BOD+1             1
   c         var2          BOD+2             1
Let the beginning of the program be located at BOP, and let us count in words. The
compiler would start making its temporary file:
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING)                         3/21
 Address     Instruction
 BOP         LDA #1
 BOP+1       STA var0
 BOP+2       LDA #2
 BOP+3       STA var1
 BOP+4       ADD var0
 BOP+5       STA var2
 BOP+6       BZ lab0
At this point, the compiler knows that the label will be just after the closing bracket,
but doesn’t know where that is. So it adds an entry to the symbol table
 Name      SymTab       Actual Location    Words
   a         var0            BOD            1
   b         var1          BOD+1            1
   c         var2          BOD+2            1
             lab0              ?            -
then carries on.
  Address      Instruction
  BOP          LDA #1
  BOP+1        STA var0
  BOP+2        LDA #2
  BOP+3        STA var1
  BOP+4        ADD var0
  BOP+5        STA var2
  BOP+6        BZ label0
  BOP+7        LDA #3
  BOP+8        STA var0
  BOP+9        LDA #4
  BOP+10       STA var1
  BOP+11       HALT
By the LDA #4 instruction, the compiler knows that lab0 is actual BOP+9, so the
symbol table is updated
 Name      SymTab       Actual Location    Words
   a         var0            BOD            1
   b         var1          BOD+1            1
   c         var2          BOD+2            1
             lab0          BOP+9            -
3/22                                   LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
When it reached the end of the program, the compiler knows that the beginning of
data BOD = BOP+12. It then rewrites the symbol table as:
 Name      SymTab       Actual Location     Words
   a         var0          BOP+12            1
   b         var1          BOP+13            1
   c         var2          BOP+14            1
             lab0          BOP+9             -
There are now two possibilities for the second pass. The compiler can either rewrite
var0, var1 and var2 and lab0 leaving BOP as a variable to be filled in at run time, or it
can set a value a value for BOP. Choosing the latter with BOP=100 (dec), we end up
with:
  Address     Instruction
  100         LDA #1
  101         STA 112
  102         LDA #2
  103         STA 113
  104         ADD 112
  105         STA 114
  106         BZ 109
  107         LDA #3
  108         STA 112
  109         LDA #4
  110         STA 113
  111         HALT
  112         0
  113         0
  114         0
Now simply replace the opcodes by the relevant binary, and we have executable machine
code. Notice that the assembler has to use a different opcode for LDA, LDA#, and
so on.
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING)                        3/23
                                                                                       Free memory
                                                                   Memory                  big2
                                                                  allocated    1097
                                                                   on heap
                                                Free memory         during                 big1
Loc
                                                                  execution
                                                                                 97
       Free memory                        96   address of big2                   96        1097
                                          95   address of big1                   95         97
 94         x                             94         x                           94          x
 93     array[10]                         93     array[10]                       93     array[10]
                                                                    Declared
                            Declared                                 data
                            data
 84      array[1]                         84      array[1]                       84      array[1]
 83         d2                            83         d2                          83         d2
 82         d1                            82         d1                          82         d1
 81     instruction                       81     instruction                     81     instruction
 80     instruction                       80     instruction                     80     instruction
Program Program
Figure 3.13: (a) All data of known size when the program is written. (b) Sizes of big1 and big2 unknown
beforehand. Space is allocated on the heap. Only the address of the address is known beforehand, not
the address itself.
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING)                                            3/25
  1
      often called POPing
3/26                                   LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
3.8.4   Subroutines
Subroutines allow the programmer to modularize code into small chunks which do a
specific task and which may be reused. For example,
main ()                     /* main program */
{
int a ,b , v1 ;
a = 1;      b = 10;
v1    = mysub (a , b );     /* subroutine call */
c = 0;
..
}
How can we call a subroutine in assembler language. We need to (1) jump to the
subroutine’s instructions, (2) transfer the necessary data to the subroutine, (3) arrange
for the result to be transferred back to the calling routine, and (4) jump back to handle
the instruction after the calling point.
Let us consider the assembler for a subroutine where there are two parameters. One
is in location 47, the other in location 48.
...  etc ...
     LDA #1         \\ a =1
     STA 47
     LDA #10        \\ b =10
     STA 48
     LDA 48         \\    2 nd parameter is datum b in loc 48
     PUSH           \\    push it onto stack
     LDA 47         \\    1 st parameter is datum a in loc 47
     PUSH           \\    push it onto stack
     JSR MYSUB      \\    Now Jump to SubRoutine at location MYSUB
     LDA #0         \\    rest of prog : these lines do c =0
     STA 49
... etc ...
                         \\ Subroutine code from here
MYSUB LDA SP ,2
      ADD SP ,3
      SHR
      RTS            \\ ReTurn from Subroutine
3.8. HARDWARE SUPPORT FOR HIGHER LEVEL LANGUAGES (READING)                         3/27
Notice that the subroutine starts at labelled address, so JSR is very like JMP. The
difference is that we have to get back after the subroutine. After the fetch, the
program counter is already incremented to point at the next instruction in the calling
program (ADD in this example). So in its execute phase, JSR pushes the current value
of the PC onto the stack, and then loads the operand into the PC, causing the next
instruction fetched to be the first in the subroutine. The RTS command ends the
subroutine by pulling the stored program counter from the stack. Because the stack is
a LIFO buffer, subroutines can be “nested” to any level until memory runs out.
Missing out the detail the JSR and RTS instructions perform the following:
   JSR x                                      RTS
   hSP i ←PC                                  SP←SP +1
   PC←IR (address); SP←SP-1                   PC←hSP i
When the subroutine has parameters, we also have to worry how to transfer the param-
eters to the subroutine. There are various ways, described in Clements §6.5 and Hill
and Peterson § 15.2. Here we mention just one method which again uses the stack.
Two methods are
In the latter method, the calling routine pushes the parameters onto the stack in order,
then uses JSR which of course pushes the return PC value onto the stack.
Figure 3.15 shows a very correct way of using the parameters from the stack by popping
and pushing. The return PC is pulled and stored temporarily, and then the parameters
are pulled, and the return PC pushed.
The problem with this is that it is very time consuming. A more efficient method is to
use the parameters by indexed addressing relative to the stack pointer, SP. That is,
the subroutine would access parameter 1, for example, by writing
LDA SP,2
When the subroutine RTS’s it will pull the return PC off the stack, but the parameters
will be left on. However, the calling routine knows how many parameters there were.
However, rather than pulling them off, it can simply increase the stack pointer by the
number of parameters (ie, by three in our example). This leaves the parameters in
memory, but as they are now outside the stack, they will get overwritten on the next
PUSH.
3/28                                   LECTURE 3. CONTROL UNIT, ALU, AND MEMORY
                                                         Param2       SP
        During subroutine                                Param1
        Access Param1 as SP+2                SP
        Access Param2 as SP+3
        Leave return value in AC
RTS SP:=SP+2