Chapter 4
The Processor
§4.1 Introduction
Introduction
CPU performance factors
Instruction count
Determined by ISA and compiler
CPI and Cycle time
Determined by CPU hardware
We will examine two MIPS implementations
A simplified version
A more realistic pipelined version
Simple subset, shows most aspects
Memory reference: lw, sw
Arithmetic/logical: add, sub, and, or, slt
Control transfer: beq, j
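The performance factors listed at the top of this slide combine in the standard relation CPU time = instruction count × CPI × clock cycle time. A minimal Python sketch of that arithmetic follows; the function name `cpu_time` and the sample numbers are illustrative assumptions, not values from the slides.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 1 million instructions, CPI of 2.0, 1 ns clock cycle.
print(cpu_time(1_000_000, 2.0, 1e-9))
```

The ISA and compiler set the first argument; the CPU hardware sets the other two.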
Chapter 4 — The Processor — 2
Instruction Execution
PC → instruction memory, fetch instruction
Register numbers → register file, read registers
Depending on instruction class
Use ALU to calculate
Arithmetic result
Memory address for load/store
Branch target address
Access data memory for load/store
PC ← target address or PC + 4
CPU Overview
Multiplexers
Can’t just join wires together
Use multiplexers
Control
§4.2 Logic Design Conventions
Logic Design Basics
Information encoded in binary
Low voltage = 0, High voltage = 1
One wire per bit
Multi-bit data encoded on multi-wire buses
Combinational element
Operate on data
Output is a function of input
State (sequential) elements
Store information
Combinational Elements
AND-gate: Y = A & B
Adder: Y = A + B
Multiplexer: Y = S ? I1 : I0
Arithmetic/Logic Unit: Y = F(A, B)
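Each combinational element is a pure function of its inputs, so it can be modeled as a stateless Python function. This is only a sketch: the 32-bit bus width, the function names, and the string operation selector `f` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit buses


def and_gate(a, b):
    """Y = A & B"""
    return a & b


def adder(a, b):
    """Y = A + B, wrapping around at 32 bits."""
    return (a + b) & MASK32


def mux(s, i0, i1):
    """Y = S ? I1 : I0"""
    return i1 if s else i0


def alu(f, a, b):
    """Y = F(A, B); f is a hypothetical selector naming the operation."""
    ops = {
        "and": a & b,
        "or": a | b,
        "add": (a + b) & MASK32,
        "sub": (a - b) & MASK32,
    }
    return ops[f]


print(mux(1, 5, 7))  # 7
```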
Sequential Elements
Register: stores data in a circuit
Uses a clock signal to determine when to update the stored value
Edge-triggered: update when Clk changes from 0 to 1
[Figure: register symbol (inputs D and Clk, output Q) and timing diagram of Clk, D, and Q]
Sequential Elements
Register with write control
Only updates on clock edge when write control input is 1
Used when stored value is required later
[Figure: register symbol (inputs D, Write, and Clk, output Q) and timing diagram of Clk, Write, D, and Q]
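The edge-triggered behavior with write control can be sketched as a small Python class. The class name `Register`, the `tick` method, and the convention of passing the clock level on each call are assumptions for this illustration, not part of the slides.

```python
class Register:
    """Rising-edge-triggered register with a write-control input."""

    def __init__(self, value=0):
        self.q = value       # stored value, visible on output Q
        self._prev_clk = 0   # clock level seen on the previous call

    def tick(self, clk, d, write=1):
        """Apply clock level clk, data input d, and Write; return Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        if rising_edge and write:
            self.q = d       # update only on a 0 -> 1 edge with Write = 1
        self._prev_clk = clk
        return self.q


r = Register()
r.tick(0, 5)                    # clock low: nothing stored yet
print(r.tick(1, 5))             # rising edge, Write = 1 -> 5
r.tick(0, 9)
print(r.tick(1, 9, write=0))    # rising edge, Write = 0 -> still 5
```

Leaving `write` at 1 gives the plain edge-triggered register from the previous slide.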
Clocking Methodology
Combinational logic transforms data during clock cycles
Between clock edges
Input from state elements, output to state element
Longest delay determines clock period
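Because every combinational path must settle between two clock edges, the clock period is bounded below by the slowest path. A minimal sketch, assuming made-up path delays in picoseconds (the path names and numbers are hypothetical, chosen only to show the calculation):

```python
def min_clock_period(path_delays_ps):
    """The clock period must cover the longest combinational delay."""
    return max(path_delays_ps.values())


# Hypothetical delays (ps) from one state element to the next.
paths = {"R-type": 400, "load": 600, "store": 550, "branch": 350}
print(min_clock_period(paths))  # 600
```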
§4.3 Building a Datapath
Building a Datapath
Datapath
Elements that process data and addresses in the CPU
Registers, ALUs, muxes, memories, …
We will build a MIPS datapath incrementally
Refining the overview design
Instruction Fetch
PC: 32-bit register
Increment by 4 for next instruction
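The fetch step reads the word at the address held in the PC and computes PC + 4 in parallel. A sketch under stated assumptions: instruction memory is modeled as a Python `bytes` object, byte order is taken to be big-endian, and the function name `fetch` is illustrative.

```python
def fetch(instr_mem, pc):
    """Return (32-bit instruction word at pc, PC + 4)."""
    word = int.from_bytes(instr_mem[pc:pc + 4], "big")  # assumed big-endian
    next_pc = (pc + 4) & 0xFFFFFFFF                     # PC is 32 bits wide
    return word, next_pc


# Two sample words: add $t0, $t1, $t2 and lw $t0, 4($t1).
mem = bytes.fromhex("012a40208d280004")
word, pc = fetch(mem, 0)
print(hex(word), pc)  # 0x12a4020 4
```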
R-Format Instructions
Read two register operands
Perform arithmetic/logical operation
Write register result
Load/Store Instructions
Read register operands
Calculate address using 16-bit offset
Use ALU, but sign-extend offset
Load: Read memory and update register
Store: Write register value to memory
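The address calculation for lw and sw adds the base register to the sign-extended 16-bit offset. A minimal sketch; the helper names `sign_extend16` and `effective_address` are assumptions for illustration.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, offset16):
    """Base register value plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


print(sign_extend16(0xFFFC))                   # -4
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```

Without sign extension, the offset 0xFFFC would be treated as +65532 instead of -4, so negative displacements from the base register would not work.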
Branch Instructions
Read register operands
Compare operands
Use ALU, subtract and check Zero output
Calculate target address
Sign-extend displacement
Shift left 2 places (word displacement)
Add to PC + 4
Already calculated by instruction fetch
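The branch steps above can be sketched in Python as follows. The function names are illustrative assumptions; the comparison mirrors the ALU subtract and Zero check described on this slide.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """(PC + 4) + (sign-extended displacement shifted left 2 places)."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def beq_next_pc(pc, imm16, rs_val, rt_val):
    """Next PC for beq: subtract the operands and check the Zero output."""
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF


print(hex(branch_target(0x100, 3)))       # 0x110
print(hex(branch_target(0x100, 0xFFFF)))  # 0x100 (displacement of -1 word)
```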
Branch Instructions
Shift left 2: just re-routes wires
Sign-extend: sign-bit wire replicated
Composing the Elements
First-cut datapath does an instruction in one clock cycle
Each datapath element can only do one function at a time
Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions
R-Type/Load/Store Datapath
Full Datapath
§4.4 A Simple Implementation Scheme
ALU Control
ALU used for
Load/Store: F = add
Branch: F = subtract
R-type: F depends on funct field
ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
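The table of ALU control codes can be read as a dispatch on the 4-bit value. The sketch below assumes 32-bit operands and returns the Zero flag alongside the result; the names `alu` and `to_signed` are illustrative, and the signed comparison for set-on-less-than follows the MIPS slt definition.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands


def to_signed(x):
    """Reinterpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for a 4-bit ALU control code."""
    if control == 0b0000:
        result = a & b                                     # AND
    elif control == 0b0001:
        result = a | b                                     # OR
    elif control == 0b0010:
        result = a + b                                     # add
    elif control == 0b0110:
        result = a - b                                     # subtract
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0   # set-on-less-than
    elif control == 0b1100:
        result = ~(a | b)                                  # NOR
    else:
        raise ValueError(f"unsupported ALU control: {control:04b}")
    result &= MASK32
    return result, int(result == 0)


print(alu(0b0110, 7, 7))  # (0, 1): equal operands set Zero, as beq needs
```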
ALU Control
Assume 2-bit ALUOp derived from opcode
Combinational logic derives ALU control
opcode   ALUOp   Operation          funct    ALU function       ALU control
lw       00      load word          XXXXXX   add                0010
sw       00      store word         XXXXXX   add                0010
beq      01      branch equal       XXXXXX   subtract           0110
R-type   10      add                100000   add                0010
                 subtract           100010   subtract           0110
                 AND                100100   AND                0000
                 OR                 100101   OR                 0001
                 set-on-less-than   101010   set-on-less-than   0111
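The table maps directly onto a small piece of combinational logic. A sketch in Python, with the function name `alu_control` and the lookup-table name chosen for illustration; the bit patterns are the ones in the table above.

```python
# funct field -> 4-bit ALU control, used only when ALUOp = 10 (R-type)
FUNCT_TO_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set-on-less-than
}


def alu_control(alu_op, funct=0):
    """Derive the 4-bit ALU control from 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:   # lw, sw: add (funct is a don't-care)
        return 0b0010
    if alu_op == 0b01:   # beq: subtract (funct is a don't-care)
        return 0b0110
    if alu_op == 0b10:   # R-type: decided by funct
        return FUNCT_TO_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


print(format(alu_control(0b10, 0b101010), "04b"))  # 0111
```

Splitting the decode into ALUOp plus funct keeps the main control unit small: it only needs to look at the opcode, and this second-level logic handles the funct field.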
The Main Control Unit
Control signals derived from instruction
R-type       0          rs      rt      rd      shamt   funct
             31:26      25:21   20:16   15:11   10:6    5:0
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Branch       4          rs      rt      address
             31:26      25:21   20:16   15:0
Bits 31:26: opcode
rs (25:21): always read
rt (20:16): read, except for load
rd (R-type) or rt (load): write for R-type and load
address (15:0): sign-extend and add
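The field positions above translate into shifts and masks on the 32-bit instruction word. A minimal sketch; the function name `decode_fields` and the dictionary layout are assumptions for illustration.

```python
def decode_fields(instr):
    """Extract every field at its fixed bit position; control decides which are used."""
    return {
        "opcode":  (instr >> 26) & 0x3F,  # bits 31:26
        "rs":      (instr >> 21) & 0x1F,  # bits 25:21
        "rt":      (instr >> 16) & 0x1F,  # bits 20:16
        "rd":      (instr >> 11) & 0x1F,  # bits 15:11 (R-type)
        "shamt":   (instr >> 6) & 0x1F,   # bits 10:6  (R-type)
        "funct":   instr & 0x3F,          # bits 5:0   (R-type)
        "address": instr & 0xFFFF,        # bits 15:0  (load/store/branch)
    }


fields = decode_fields(0x8D280004)  # lw $t0, 4($t1)
print(fields["opcode"], fields["rs"], fields["rt"], fields["address"])  # 35 9 8 4
```

Because the register fields sit at the same positions in every format, the hardware can start reading registers before the opcode has been fully decoded.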
Datapath With Control
R-Type Instruction
Load Instruction
Branch-on-Equal Instruction
Implementing Jumps
Jump   2       address
       31:26   25:0
Jump uses word address
Update PC with concatenation of
Top 4 bits of old PC
26-bit jump address
00
Need an extra control signal decoded from opcode
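The concatenation can be sketched with a mask, a shift, and an OR. One assumption to flag: the sketch takes the top 4 bits from PC + 4, which is how MIPS defines the jump target; this matches "old PC" except when the jump is the last instruction in a 256 MB region. The function name `jump_target` is illustrative.

```python
def jump_target(pc, instr):
    """Top 4 bits of PC + 4, then the 26-bit jump address, then 00."""
    addr26 = instr & 0x03FFFFFF                 # bits 25:0 of the instruction
    upper4 = ((pc + 4) & 0xFFFFFFFF) & 0xF0000000
    return upper4 | (addr26 << 2)               # shifting left 2 appends 00


# Jump instruction (opcode 2) with 26-bit word address 0x0100004.
print(hex(jump_target(0x00400000, 0x08100004)))  # 0x400010
```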
Datapath With Jumps Added