Ch 5: Designing a Single Cycle Datapath
Computer Systems Architecture
CS 365
The Big Picture: Where are We Now?
 The Five Classic Components of a Computer
 Today's Topic: Design a Single Cycle Processor
[Figure: the five classic components: Processor (Control and Datapath), Memory, Input, Output. Machine design draws on instruction set design (Ch 3), technology, and arithmetic (Ch 4).]
The Big Picture: The Performance Perspective
 Performance of a machine is determined by:
 Instruction count
 Clock cycle time
 Clock cycles per instruction
 Processor design (datapath and control) will determine:
 Clock cycle time
 Clock cycles per instruction
 Today:
 Single cycle processor:
 Advantage: One clock cycle per instruction
 Disadvantage: long cycle time
[Figure: performance depends on Inst. Count, CPI, and Cycle Time]
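The three factors listed above multiply together: CPU time = instruction count x CPI x clock cycle time. A minimal sketch of that relationship follows; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical workload: one million instructions on a single cycle
# processor (CPI = 1) with an assumed 8 ns clock cycle.
single_cycle_total = cpu_time_ns(instruction_count=1_000_000, cpi=1, cycle_time_ns=8)
print(single_cycle_total)
```

A single cycle design fixes CPI at 1, so the only way it can run faster is a shorter cycle time or fewer instructions.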
How to Design a Processor: step-by-step
1. Analyze instruction set => datapath requirements
 the meaning of each instruction is given by the register transfers
 datapath must include storage element for ISA registers
 possibly more
 datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control 
points that effects the register transfer.
5. Assemble the control logic
The MIPS Instruction Formats
 All MIPS instructions are 32 bits long.  The three  instruction formats:
 R-type
 I-type
 J-type
 The different fields are:
 op: operation of the instruction
 rs, rt, rd: the source and destination register specifiers
 shamt: shift amount
 funct: selects the variant of the operation in the op field
 address / immediate: address offset or immediate value
 target address: target address of the jump instruction 
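R-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
I-type: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
J-type: op [31:26] (6 bits) | target address [25:0] (26 bits)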
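Because every field sits at a fixed bit position, decoding is only shifting and masking. The sketch below extracts all fields from a 32-bit word; the function name `decode` and the dictionary layout are assumptions made for illustration. The two test words are the encodings of `add $8, $17, $18` and `lw $1, 100($2)`, the examples used later in these slides.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction word into its named fields."""
    return {
        "op": (instruction >> 26) & 0x3F,     # bits 31-26
        "rs": (instruction >> 21) & 0x1F,     # bits 25-21
        "rt": (instruction >> 16) & 0x1F,     # bits 20-16
        "rd": (instruction >> 11) & 0x1F,     # bits 15-11 (R-type)
        "shamt": (instruction >> 6) & 0x1F,   # bits 10-6  (R-type)
        "funct": instruction & 0x3F,          # bits 5-0   (R-type)
        "imm16": instruction & 0xFFFF,        # bits 15-0  (I-type)
        "target": instruction & 0x3FFFFFF,    # bits 25-0  (J-type)
    }


add_fields = decode(0x02324020)  # add $8, $17, $18
lw_fields = decode(0x8C410064)   # lw $1, 100($2)
print(add_fields["rs"], add_fields["rt"], add_fields["rd"], add_fields["funct"])
print(lw_fields["op"], lw_fields["rs"], lw_fields["rt"], lw_fields["imm16"])
```

All fields are extracted for every word; which ones are meaningful depends on the format selected by `op`.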
Step 1a: The MIPS-lite Subset
 ADD, SUB, AND, OR
 add rd, rs, rt
 sub rd, rs, rt
 and rd, rs, rt
 or rd, rs, rt
 LOAD and STORE Word
 lw rt, rs, imm16
 sw rt, rs, imm16
 BRANCH:
 beq rs, rt, imm16
R-type (add, sub, and, or): op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
I-type (lw, sw, beq): op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
Logical Register Transfers
 RTL gives the meaning of the instructions
 First step is to fetch the instruction from memory
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt |   Imm16                = MEM[ PC ]
inst    Register Transfers
ADD     R[rd] <- R[rs] + R[rt]; PC <- PC + 4
SUB     R[rd] <- R[rs] - R[rt]; PC <- PC + 4
OR      R[rd] <- R[rs] | R[rt]; PC <- PC + 4
LOAD    R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
STORE   MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
BEQ     if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 )
        else PC <- PC + 4
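The register transfers above can be exercised with a small interpreter. This is a sketch under stated assumptions, not a model of the hardware: the names `step`, `R`, and `MEM` are hypothetical, memory is a Python dict keyed by byte address, and the special behavior of MIPS register 0 is not modeled.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(inst, rs, rt, rd, imm16, R, MEM, pc):
    """Apply one register transfer to R and MEM and return the next PC."""
    next_pc = pc + 4
    if inst == "ADD":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif inst == "SUB":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif inst == "OR":
        R[rd] = R[rs] | R[rt]
    elif inst == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK32, 0)
    elif inst == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK32] = R[rt]
    elif inst == "BEQ":
        if R[rs] == R[rt]:
            # sign_ext(Imm16) || 00 is the offset shifted left by two bits.
            next_pc = pc + 4 + (sign_ext(imm16) << 2)
    else:
        raise ValueError(f"unsupported instruction: {inst}")
    return next_pc & MASK32


R = [0] * 32
MEM = {}
R[1], R[2] = 5, 7
pc = step("ADD", 1, 2, 3, 0, R, MEM, 0)        # R[3] <- R[1] + R[2]
pc = step("STORE", 0, 3, 0, 100, R, MEM, pc)   # MEM[R[0] + 100] <- R[3]
pc = step("LOAD", 0, 4, 0, 100, R, MEM, pc)    # R[4] <- MEM[R[0] + 100]
pc = step("BEQ", 3, 4, 0, 0xFFFF, R, MEM, pc)  # taken, offset -1 word
print(R[3], R[4], MEM[100], pc)
```

The final branch has offset -1 word, so it lands back on its own address: 12 + 4 - 4 = 12.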
Step 1: Requirements of the Instruction Set
 Memory
 instruction & data
 Registers (32 x 32)
 read RS
 read RT
 Write RT or RD
 PC
 Extender
 Add and Sub register or extended immediate
 Add 4 or extended immediate to PC
Step 2: Components of the Datapath
 Combinational Elements
 Storage Elements
 Clocking methodology
Abstract/Simplified View of Datapath
[Figure: PC -> Instruction memory (Address, Instruction) -> Registers (Register #, Data) -> ALU -> Data memory (Address, Data)]
 Two types of functional units:
 elements that operate on data values (combinational)
 elements that contain state (sequential)
Combinational Logic Elements (Basic Building Blocks)
 Adder
 MUX
 ALU
[Figure: Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry. MUX: 32-bit inputs A and B plus Select; output 32-bit Y. ALU: 32-bit inputs A and B plus OP; output 32-bit Result.]
State Elements: Review
 Unclocked vs. Clocked
 Clocks used in synchronous logic
 when should an element that contains state be updated?
[Figure: clock waveform marking the cycle time, rising edge, and falling edge]
An unclocked state element
 The set-reset latch
 output depends on present inputs and also on past inputs
[Figure: set-reset latch with inputs R and S and outputs Q and its complement]
Latches and Flip-flops
 Output is equal to the stored value inside the element
 (don't need to ask for permission to look at the value)
 Change of state (value) is based on the clock
 Latches: state changes whenever the inputs change and the clock is asserted
 ("asserted" means logically true, which could mean electrically low)
 Flip-flop: state changes only on a clock edge (edge-triggered methodology)
 A clocking methodology defines when signals can be read and written:
 we wouldn't want to read a signal at the same time it was being written
D-latch
 Two inputs:
 the data value to be stored (D)
 the clock signal (C) indicating when to read & store D
 Two outputs:
 the value of the internal state (Q) and its complement
[Figure: D-latch circuit and timing diagram (D, C, Q)]
D flip-flop
 Output changes only on the clock edge
[Figure: D flip-flop built from two D latches, with timing diagram (D, C, Q)]
Our Implementation
 An edge triggered methodology
 Typical execution:
 read contents of some state elements, 
 send values through some combinational logic
 write results to one or more state elements
[Figure: within one clock cycle, State element 1 -> Combinational logic -> State element 2; a single state element can also feed combinational logic whose result is written back to it]
Storage Element: Register (Basic Building Block)
 Register
 Similar to the D Flip Flop except
 N-bit input and output
 Write Enable input
 Write Enable:
 negated  (0): Data Out will not change
 asserted (1): Data Out will become Data In
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]
Register File
 Built using D flip-flops
[Figure: read ports: Read register number 1 and Read register number 2 each control a mux that selects among Register 0 ... Register n, producing Read data 1 and Read data 2]
Register File
 Note:  we still use the clock to determine when to write
[Figure: write port: the Register number feeds a decoder with outputs 0 ... n; together with Write these drive the C inputs of Register 0 ... Register n, and Register data drives the D inputs]
Storage Element: Register File
 Register File consists of 32 registers:
 Two 32-bit output busses: busA and busB
 One 32-bit input bus: busW
 Register is selected by:
 RA (number) selects the register to put on busA (data)
 RB (number) selects the register to put on busB (data)
 RW (number) selects the register to be written via busW (data) when Write Enable is 1
 Clock input (CLK)
 The CLK input is a factor ONLY during write operation
 During read operation, behaves as a combinational logic block:
 RA or RB valid => busA or busB valid after access time.
[Figure: 32 32-bit Registers with 5-bit RW, RA, and RB inputs, 32-bit busW input, 32-bit busA and busB outputs, Write Enable, and Clk]
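The register file's behavior (combinational reads, clocked writes gated by Write Enable) can be sketched as a small class. The class and method names here are hypothetical, and calling `clock_edge` stands in for the arrival of the clock edge.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: return (busA, busB) for register numbers RA, RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """At the clock edge, store busW into register RW if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=8, bus_w=42, write_enable=1)
rf.clock_edge(rw=9, bus_w=99, write_enable=0)  # ignored: Write Enable is 0
bus_a, bus_b = rf.read(8, 9)
print(bus_a, bus_b)
```

Reads take no clock argument at all, which mirrors the point that CLK matters only for writes.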
Storage Element: Idealized Memory
 Memory (idealized)
 One input bus: Data In
 One output bus: Data Out
 Memory word is selected by:
 Address selects the word to put on Data Out
 Write Enable = 1: address selects the memory word to be written via the Data In bus
 Clock input (CLK)
 The CLK input is a factor ONLY during write operation
 During read operation, behaves as a combinational logic block:
 Address valid => Data Out valid after access time.
[Figure: memory with 32-bit Data In, 32-bit DataOut, Address, Write Enable, and Clk]
Clocking Methodology
 All storage elements are clocked by the same clock edge
 Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
 (CLK-to-Q + Shortest Delay Path - Clock Skew)  >  Hold Time
[Figure: clock waveform with Setup and Hold windows around each clock edge; signal values are Don't Care outside those windows]
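The two timing constraints above can be checked numerically. The sketch below encodes them directly; the function names and all delay values (in picoseconds) are hypothetical and chosen only to show one passing and one failing hold check.

```python
def min_cycle_time(clk_to_q, longest_path, setup, skew):
    """Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew."""
    return clk_to_q + longest_path + setup + skew


def hold_ok(clk_to_q, shortest_path, skew, hold):
    """Hold constraint: (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time."""
    return (clk_to_q + shortest_path - skew) > hold


# Hypothetical delays in picoseconds.
cycle = min_cycle_time(clk_to_q=100, longest_path=7000, setup=50, skew=20)
safe = hold_ok(clk_to_q=100, shortest_path=200, skew=20, hold=60)
unsafe = hold_ok(clk_to_q=100, shortest_path=0, skew=50, hold=60)
print(cycle, safe, unsafe)
```

Clock skew hurts both constraints: it lengthens the minimum cycle time and eats into the hold margin.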
Step 3
 Register Transfer Requirements => Datapath Assembly
 Instruction Fetch
 Read Operands and Execute Operation
3a: Overview of the Instruction Fetch Unit
 The common RTL operations
 Fetch the Instruction: mem[PC]
 Update the program counter:
 Sequential Code: PC <- PC + 4 
 Branch and Jump:   PC <- something else
 We don't know whether an instruction is a Branch/Jump or one of the other instructions until we have fetched and interpreted it from memory, so all instructions initially increment the PC.
[Figure: a. Instruction memory (Instruction address in, Instruction out)  b. Program counter (PC)  c. Adder (Add, Sum)]
Datapath for Instruction Fetch
[Figure: the PC drives the instruction memory Read address; an adder computes PC + 4]
3b: R-format instructions: add, sub, and, or, slt
 R[rd] <- R[rs] op R[rt]  Example: add    rd, rs, rt
 Read register 1, Read register 2, and Write register come from the instruction's rs, rt, and rd fields
 ALU control and RegWrite: set by the control logic after decoding the instruction
op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
[Figure: a. Registers (5-bit Read register 1, Read register 2, and Write register; Write data; Read data 1 and Read data 2; RegWrite)  b. ALU (3-bit ALU control; Zero and ALU result outputs)]
Datapath for R-format instructions
[Figure: the Instruction supplies Read register 1, Read register 2, and Write register; Read data 1 and Read data 2 feed the ALU (3-bit ALU operation, Zero output); the ALU result returns to Write data; RegWrite enables the write]
Register-Register Timing
[Figure: register file (Rw, Ra, Rb from Rd, Rs, Rt; busA, busB, busW; RegWr; Clk) feeding an ALU controlled by ALUctr]
[Figure: timing diagram. After the clock edge, PC changes from old to new value after Clk-to-Q; Rs, Rt, Rd, Op, Func after the Instruction Memory Access Time; ALUctr and RegWr after the Delay through Control Logic; busA and busB after the Register File Access Time; busW after the ALU Delay. The register write occurs at the next clock edge.]
3d: Load & Store Operations
 R[rt] <- Mem[R[rs] + SignExt[imm16]] Example: lw    rt, rs, imm16
 Mem[ R[rs] + SignExt[imm16] ] <- R[rt]  Example: sw    rt, rs, imm16
op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
[Figure: a. Data memory unit (Address, Write data, Read data, MemRead, MemWrite)  b. Sign-extension unit (16 bits in, 32 bits out)]
Datapath for lw & sw
[Figure: Read data 1 and the sign-extended 16-bit immediate feed the ALU; the ALU result is the data memory Address; Read data 2 is the memory Write data; the memory Read data returns to the register file Write data; control signals RegWrite, ALU operation (3 bits), MemRead, MemWrite]
3f: The Branch Instruction
 beq rs, rt, imm16
 mem[PC] Fetch the instruction from memory
 Equal <- R[rs] == R[rt] Calculate the branch condition
if (Equal) Calculate the next instruction's address
PC  <- PC + 4 + ( SignExt(imm16) x 4 )
else
PC  <- PC + 4
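The next-address calculation for beq can be sketched in a few lines. The function names are hypothetical; `equal` stands for the branch condition R[rs] == R[rt] computed above, and multiplying by 4 is done as a left shift by two bits, as in the shift-left-2 unit of the datapath.

```python
def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, imm16, equal):
    """Return PC + 4 + (SignExt(imm16) x 4) if the branch is taken, else PC + 4."""
    if equal:
        return (pc + 4 + (sign_ext(imm16) << 2)) & 0xFFFFFFFF
    return (pc + 4) & 0xFFFFFFFF


forward = next_pc(0x1000, 3, equal=True)        # 3 words past PC + 4
backward = next_pc(0x1000, 0xFFFE, equal=True)  # 2 words before PC + 4
not_taken = next_pc(0x1000, 3, equal=False)
print(hex(forward), hex(backward), hex(not_taken))
```

The offset is relative to PC + 4, not to the address of the branch itself, because every instruction increments the PC before the branch decision is known.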
op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
Datapath for branch instruction
[Figure: the register file Read data 1 and Read data 2 feed the ALU, whose Zero output goes to the branch control logic; the 16-bit immediate is sign extended to 32 bits, shifted left 2, and added to PC + 4 from the instruction datapath to form the Branch target]
Using multiplexors to stitch together the datapath for memory access and R-format instructions
[Figure: PC, instruction memory, adder (PC + 4), register file, sign extend (16 to 32 bits), ALU, and data memory; one mux (ALUSrc) selects the second ALU operand and one mux (MemtoReg) selects the register Write data; control signals RegWrite, ALU operation (3 bits), MemRead, MemWrite]
Putting it all together
[Figure: the same datapath plus the branch hardware: a shift-left-2 unit and a second adder compute the branch target, and a mux controlled by PCSrc selects between PC + 4 and the branch target; control signals RegWrite, ALUSrc, ALU operation (3 bits), MemRead, MemWrite, MemtoReg, PCSrc]
Putting it all together (cont'd)
[Figure: the datapath with instruction fields labeled: Instruction [31-0]; [25-21] to Read register 1; [20-16] to Read register 2; a RegDst mux choosing [20-16] or [15-11] as Write register; [15-0] to sign extend; [5-0] to ALU control, which also takes ALUOp; control signals RegDst, RegWrite, ALUSrc, PCSrc, MemRead, MemWrite, MemtoReg]
Adding the control unit
[Figure: the same datapath with a Control block driven by Instruction [31-26]; its outputs are RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; PCSrc is derived from Branch and the ALU Zero output]
An Abstract View of the Critical Path
 Register file and ideal memory:
 The CLK input is a factor ONLY during write operation
 During read operation, behave as combinational logic:
 Address valid => Output valid after access time.
Critical Path (Load Operation) = 
PC's Clk-to-Q +
Instruction Memory's Access Time +
Register File's Access Time +
ALU to Perform a 32-bit Add +
Data Memory Access Time +
Setup Time for Register File Write +
Clock Skew
[Figure: PC (Clk) -> Ideal Instruction Memory (Instruction Address, Instruction) -> 32 32-bit Registers (5-bit Rw, Ra, Rb from Rd, Rs, Rt) -> ALU (32-bit A and B; 16-bit Imm) -> Ideal Data Memory (Data Address, Data In) -> back to the register file; Next Address logic updates the PC]
Step 4: Given Datapath: RTL -> Control
[Figure: Inst Memory (Adr) outputs Instruction<31:0>, split into Op <31:26>, Rs <25:21>, Rt <20:16>, Rd <15:11>, Imm16 <15:0>, and Fun <5:0>. Control takes Op and Fun and drives Branch, RegWr, RegDst, ALUSrc, ALUop, MemRd, MemWr, and MemtoReg into the DATA PATH, which reports Zero.]
Control
 Selecting the operations to perform (ALU, read/write, etc.): design the ALU Control Unit
 Controlling the flow of data (multiplexor inputs): design the Main Control Unit
 Information comes from the 32 bits of the instruction
 Example:
add $8, $17, $18  Instruction Format:
000000   10001   10010   01000   00000 100000
op   rs   rt   rd   shamt   funct
ALU Control
 ALU's operation based on instruction type and function code
 e.g., what should the ALU do with this instruction
 Example: lw $1, 100($2)
 35      2       1       100
 op      rs      rt      16 bit offset
 ALU control input
 000   AND
 001   OR
 010   add
 110   subtract
 111   set-on-less-than
 Why is the code for subtract 110 and not 011? (Recall the design of the ALU from Chapter 4: the Bnegate input of the adder is set to 1 for subtraction.)
ALU Control Design
Instruction opcode   ALUOp   Instruction operation   Funct field   Desired ALU action   ALU control input
LW                   00      Load word               xxxxxx        Add                  010
SW                   00      Store word              xxxxxx        Add                  010
BEQ                  01      Branch eq               xxxxxx        Subtract             110
R-type               10      Add                     100000        Add                  010
R-type               10      Subtract                100010        Subtract             110
R-type               10      AND                     100100        And                  000
R-type               10      OR                      100101        Or                   001
R-type               10      Set on less than        101010        Set on less than     111
 Must describe hardware to compute 3-bit ALU control input
 given instruction type 
00 = lw, sw
01 = beq 
10 = arithmetic
 function code for arithmetic
 Describe it using a truth table (can turn into gates):
(ALUOp is computed from the instruction type by the main Control unit.)
ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 Operation
0 0 X X X X X X 010
X 1 X X X X X X 110
1 X X X 0 0 0 0 010
1 X X X 0 0 1 0 110
1 X X X 0 1 0 0 000
1 X X X 0 1 0 1 001
1 X X X 1 0 1 0 111
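The truth table above maps directly onto a small function. This is a sketch rather than a gate-level design: the function name is hypothetical, the result is returned as a bit string for readability, and an unsupported funct code raises `KeyError`. The don't-care entries are handled by testing only the bits the table actually specifies.

```python
def alu_control(alu_op, funct):
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:   # lw, sw: add to form the address
        return "010"
    if alu_op & 0b01:    # ALUOp0 = 1 (beq): subtract; ALUOp1 is a don't care
        return "110"
    # ALUOp1 = 1 (R-type): only F3..F0 matter; F5 and F4 are don't cares.
    by_funct = {
        0b0000: "010",   # add
        0b0010: "110",   # subtract
        0b0100: "000",   # AND
        0b0101: "001",   # OR
        0b1010: "111",   # set on less than
    }
    return by_funct[funct & 0xF]


print(alu_control(0b00, 0b000000))  # lw or sw
print(alu_control(0b01, 0b000000))  # beq
print(alu_control(0b10, 0b100101))  # R-type OR
```

The combination ALUOp = 11 never occurs in this design, which is what allows the table to use don't cares for ALUOp1 and ALUOp0.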
Design the main control unit
 Seven control signals
RegDst
RegWrite
ALUSrc
PCSrc
MemRead
MemWrite
MemtoReg
Control Signals
1. RegDst = 0 => Register destination number for the Write register 
comes from the rt field (bits 20-16)
RegDst = 1 => Register destination number for the Write register 
comes from the rd field (bits 15-11)
2. RegWrite = 1 => The register on the Write register input is written with 
the data on the Write data input (at the next clock edge)
3. ALUSrc = 0 => The second ALU operand comes from Read data 2
ALUSrc = 1 => The second ALU operand comes from the sign-
extension unit
4. PCSrc = 0 => The PC is replaced with  PC+4
PCSrc = 1 => The PC is replaced with the branch target address
5. MemtoReg = 0 => The value fed to the register write data input comes 
from the ALU
MemtoReg = 1 => The value fed to the register write data input comes 
from the data memory
6.  MemRead = 1 => Read data memory
7.  MemWrite = 1 => Write data memory
R-format instructions
RegDst = 1
RegWrite = 1
ALUSrc = 0
Branch = 0
MemtoReg = 0
MemRead = 0
MemWrite = 0
ALUOp = 10
Memory access instructions
Load word
RegDst = 0
RegWrite = 1
ALUSrc = 1
Branch = 0
MemtoReg = 1
MemRead = 1
MemWrite = 0
ALUOp = 00
Store Word
RegDst = X
RegWrite = 0
ALUSrc = 1
Branch = 0
MemtoReg = X
MemRead = 0
MemWrite = 1
ALUOp = 00
Branch Equal
RegDst =  X
RegWrite = 0
ALUSrc = 0
Branch = 1
MemtoReg = X
MemRead = 0
MemWrite = 0
ALUOp = 01
Control
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
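The control table can be written as a lookup keyed by opcode. In this sketch the names `CONTROL`, `OPCODES`, `main_control`, and `pc_src` are hypothetical, don't-care entries (X) are represented as `None`, and the opcode values are the standard MIPS encodings (0 for R-format, 35 for lw, 43 for sw, 4 for beq). PCSrc is formed as Branch AND Zero, as in the textbook datapath.

```python
CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":       dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                     MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":       dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":      dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                     MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# Standard MIPS opcode values for the four instruction classes.
OPCODES = {0: "R-format", 35: "lw", 43: "sw", 4: "beq"}


def main_control(opcode):
    """Return the control signal settings for a 6-bit opcode."""
    return CONTROL[OPCODES[opcode]]


def pc_src(branch, zero):
    """PCSrc = Branch AND Zero: take the branch only if beq sees equal operands."""
    return branch & zero


signals = main_control(35)  # lw
print(signals["MemRead"], signals["RegWrite"], signals["ALUOp"])
print(pc_src(main_control(4)["Branch"], zero=1))
```

Only the state-changing signals (RegWrite, MemWrite) must never be don't cares, since an accidental write would corrupt a register or memory.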
[Figure: the single cycle datapath with the Control block, showing where each control signal in the table connects]
Step 5: Implementing Control
 Simple combinational logic (truth tables)
[Figure: ALU Control Unit: the ALU control block takes ALUOp (ALUOp1, ALUOp0) and F (5-0) (F3, F2, F1, F0) and produces Operation (Operation2, Operation1, Operation0)]
[Figure: Main Control Unit: inputs Op5 ... Op0 are decoded into R-format, lw, sw, and beq; outputs are RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
Our Simple Control Structure
 All of the logic is combinational
 We wait for everything to settle down, and the right thing to be done
 ALU might not produce right answer right away
 we use write signals along with clock to determine when to write
 Cycle time determined by length of the longest path
[Figure: within one clock cycle, State element 1 -> Combinational logic -> State element 2]
Single Cycle Implementation
 Calculate cycle time assuming negligible delays except:
 memory (2ns), ALU and adders (2ns), register file access (1ns)
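With the delays given above, the cycle time is set by the slowest instruction class. The sketch below adds up the delays along each class's path; which units each class passes through is an assumption based on the datapath shown earlier, and the names `DELAY_NS` and `PATHS` are hypothetical.

```python
DELAY_NS = {"memory": 2, "alu": 2, "regfile": 1}

# Assumed sequence of units used by each instruction class:
# instruction fetch, register read, ALU, data memory access, register write.
PATHS = {
    "R-format": ["memory", "regfile", "alu", "regfile"],
    "lw":       ["memory", "regfile", "alu", "memory", "regfile"],
    "sw":       ["memory", "regfile", "alu", "memory"],
    "beq":      ["memory", "regfile", "alu"],
}

latency_ns = {name: sum(DELAY_NS[unit] for unit in units)
              for name, units in PATHS.items()}
cycle_time_ns = max(latency_ns.values())

for name, total in latency_ns.items():
    print(name, total)
print("cycle time:", cycle_time_ns)
```

Under these assumptions the load is the critical path, so every instruction pays the load's latency even though a branch needs much less.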
[Figure: the complete single cycle datapath with its control signals]
A Real MIPS Datapath (CNS T0)
Summary
 5 steps to design a processor
 1. Analyze instruction set => datapath requirements
 2. Select set of datapath components & establish clock methodology
 3. Assemble datapath meeting the requirements
 4. Analyze implementation of each instruction to determine setting of control 
points that effects the register transfer.
 5. Assemble the control logic
 MIPS makes it easier
 Instructions same size
 Source registers always in same place
 Immediates same size, location
 Operations always on registers/immediates
 Single cycle datapath => CPI=1, Clock Cycle Time => long