Control
ALU Control
This section introduces a single-cycle implementation for a subset of MIPS instructions—load word (lw),
store word (sw), branch equal (beq), and R-type arithmetic-logical instructions
(add, sub, AND, OR, slt)—using the datapath from the previous section and adding a control function.
It focuses on designing the ALU control as the first step in building the control unit,
with plans to later add support for the jump (j) instruction.
ALU Functions
Load/Store (lw/sw)
ALU performs addition (0010) to compute the memory address (base register + sign-extended offset).
Function
Combines ALUOp and the funct field to generate the appropriate 4-bit ALU control input.
For lw/sw and beq, ALUOp alone determines the operation.
For R-type, ALUOp (10) signals the control unit to decode the funct field and select the
corresponding ALU operation.
Inputs
6-bit funct field: Specifies the exact operation for R-type instructions
(e.g., 100000 for add, 101010 for slt).
2-bit ALUOp: A control signal indicating the instruction class:
• 00: Add (for lw and sw address calculation).
• 01: Subtract (for beq comparison).
• 10: Use funct field (for R-type instructions to select add, sub, AND, OR, or slt).
Output
A 4-bit ALU control signal (e.g., 0010 for add,
0110 for subtract) directly controls the ALU operation.
The ALU is reused across all instruction types, with its operation tailored by the ALU control unit.
The ALUOp field simplifies control by categorizing operations,
while the funct field provides fine-grained control for R-type instructions.
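The decoding described above can be sketched in a few lines of Python. This is a minimal illustration, not a hardware description: the names alu_control and FUNCT_TO_ALU_CONTROL are assumptions made for this example, and the funct and control encodings are the ones listed for this instruction subset.

```python
# Sketch of the ALU control unit: (ALUOp, funct) -> 4-bit ALU control input.
# Names are illustrative and do not come from the slides.

FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add for the address calculation
        return 0b0010
    if alu_op == 0b01:      # beq: subtract for the comparison
        return 0b0110
    if alu_op == 0b10:      # R-type: the funct field selects the operation
        return FUNCT_TO_ALU_CONTROL[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


for name, op, funct in [("lw", 0b00, 0), ("beq", 0b01, 0), ("slt", 0b10, 0b101010)]:
    print(f"{name}: {alu_control(op, funct):04b}")
```

For lw and beq the funct argument is ignored, which mirrors how ALUOp alone fixes the operation for those instruction classes.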
Control Mechanism
Multi-Level Decoding
Structure
The main control unit generates the ALUOp bits, which are then decoded by a smaller ALU control
unit to produce the final ALU control signals.
Advantages
Reduces the size of the main control unit by delegating detailed decoding to a secondary unit.
Potentially increases control unit speed, critical for minimizing clock cycle time.
Implementation Approach
Mapping
Only a subset of the 64 possible funct field values (2⁶) is relevant, and the funct field is used only
when ALUOp is 10 (R-type).
Logic Design
A small piece of logic recognizes the relevant funct values and sets the ALU control bits accordingly.
A truth table (Figure 4.13) lists the combinations of ALUOp and funct field that require specific
ALU control values.
The full truth table (256 entries, 2⁸) is simplified by showing only entries where the ALU control
must be asserted, omitting “don’t care” or deasserted cases.
Truth Table
A logical representation showing input values and corresponding outputs.
Control Mechanism (Cont.)
Figure 4.12
Illustrates how ALU control bits are derived from ALUOp and funct field, showing the mapping for
lw, sw, beq, and R-type instructions in binary.
Instruction opcode   ALUOp   Instruction operation   Funct field   Desired ALU action   ALU control input
LW                   00      load word               XXXXXX        add                  0010
SW                   00      store word              XXXXXX        add                  0010
Branch equal         01      branch equal            XXXXXX        subtract             0110
R-type               10      add                     100000        add                  0010
R-type               10      subtract                100010        subtract             0110
R-type               10      AND                     100100        AND                  0000
R-type               10      OR                      100101        OR                   0001
R-type               10      set on less than        101010        set on less than     0111
Notes
When ALUOp is 00 or 01, the funct field is irrelevant (don’t care).
When ALUOp is 10, the funct field determines the ALU operation.
Don’t-Care Term
An input where the output is independent of its value (marked as X), simplifying logic design.
Don’t-care terms reduce complexity by allowing flexibility in unused input combinations.
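To make the role of don't-care terms concrete, the sketch below stores the rows of the ALU control table as bit patterns and treats X as a wildcard. The names ROWS, matches, and lookup are hypothetical helpers introduced for this example only.

```python
# Each row: (ALUOp pattern, funct pattern, ALU control output); "X" = don't care.
ROWS = [
    ("00", "XXXXXX", "0010"),  # lw / sw  -> add
    ("01", "XXXXXX", "0110"),  # beq      -> subtract
    ("10", "100000", "0010"),  # R-type add
    ("10", "100010", "0110"),  # R-type subtract
    ("10", "100100", "0000"),  # R-type AND
    ("10", "100101", "0001"),  # R-type OR
    ("10", "101010", "0111"),  # R-type set on less than
]


def matches(pattern: str, bits: str) -> bool:
    """True if every pattern position is 'X' or equals the input bit."""
    return len(pattern) == len(bits) and all(
        p in ("X", b) for p, b in zip(pattern, bits)
    )


def lookup(alu_op: str, funct: str):
    """Return the ALU control bits for the first matching row, else None."""
    for op_pattern, funct_pattern, output in ROWS:
        if matches(op_pattern, alu_op) and matches(funct_pattern, funct):
            return output
    return None


# Count how many of the 2**8 = 256 input combinations the 7 rows cover.
covered = sum(
    lookup(f"{i >> 6:02b}", f"{i & 0x3F:06b}") is not None for i in range(256)
)
print(f"{len(ROWS)} rows cover {covered} of 256 input combinations")
```

Two rows with six don't-care bits each stand in for 64 fully specified rows apiece, which is why the table stays small.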
Main Control Unit
Objective: Design the main control unit to manage the datapath (Figure 4.11),
generating write signals, multiplexor selectors, and ALU control inputs.
Builds on the ALU control design (previous section) by connecting instruction fields to datapath operations.
Figure 4.13
Design Process
The opcode (Op[5:0]) drives the main control unit to set ALUOp and other signals.
The ALUOp and funct field feed into the ALU control unit (previously designed) to generate the 4-bit ALU control.
A multiplexor is added to select the destination register (rt for lw, rd for R-type) based on instruction type.
Enhanced Datapath
[Figure 4.15: datapath diagram with the multiplexors and control lines labeled: PCSrc, RegWrite, MemWrite, MemtoReg, ALUSrc, RegDst, plus the ALU Zero output.]
• Instruction labels: Indicate fields like opcode, rs, rt, rd, and offset.
• Multiplexor: Selects the write register number for the register file
(rt [20:16] for lw, rd [15:11] for R-type).
• ALU control block: Generates the 4-bit ALU control signal (from prior design).
Control lines (shown in color):
• Write signals: For register file (RegWrite) and data memory (MemWrite).
• Read signal: For data memory (MemRead).
• Multiplexor controls: For ALU input, register write data, PC source, and write register number.
Signal effects
RegDst
• Deasserted (0): Write register is rt (bits 20:16, for lw).
• Asserted (1): Write register is rd (bits 15:11, for R-type).
RegWrite
• Deasserted (0): No register write.
• Asserted (1): Write register with data on the write input.
ALUSrc
• Deasserted (0): Second ALU operand is register data (Read data 2, for R-type/beq).
• Asserted (1): Second ALU operand is sign-extended 16-bit offset (for lw/sw).
PCSrc
• Deasserted (0): PC = PC + 4 (sequential instruction).
• Asserted (1): PC = branch target address (for beq if taken).
MemRead
• Deasserted (0): No memory read.
• Asserted (1): Memory outputs data to read data line (for lw).
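As a rough illustration of how the instruction fields and two of the multiplexors fit together, the sketch below extracts the fields from a 32-bit instruction word and applies RegDst and ALUSrc. The helper names (decode_fields, mux, write_register, alu_operand_b) are assumptions for this example. The lw opcode (100011) and the register numbers of $t1 and $t2 (9 and 10) are standard MIPS values that the slides do not list.

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode_fields(instr: int) -> dict:
    """Split a 32-bit instruction word into the fields used by the datapath."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31:26
        "rs": (instr >> 21) & 0x1F,       # bits 25:21
        "rt": (instr >> 16) & 0x1F,       # bits 20:16
        "rd": (instr >> 11) & 0x1F,       # bits 15:11
        "offset": sign_extend16(instr),   # bits 15:0, sign-extended
    }


def mux(select: int, input0, input1):
    """Two-way multiplexor: input0 when select is 0, input1 when select is 1."""
    return input1 if select else input0


def write_register(fields: dict, reg_dst: int) -> int:
    return mux(reg_dst, fields["rt"], fields["rd"])


def alu_operand_b(fields: dict, read_data2: int, alu_src: int) -> int:
    return mux(alu_src, read_data2, fields["offset"])


# lw $t1, -4($t2): opcode 100011, rs = $t2 (10), rt = $t1 (9), offset = -4
lw = (0b100011 << 26) | (10 << 21) | (9 << 16) | (-4 & 0xFFFF)
fields = decode_fields(lw)
print(write_register(fields, reg_dst=0))                 # rt is the destination
print(alu_operand_b(fields, read_data2=0, alu_src=1))    # sign-extended offset
```

With RegDst = 0 and ALUSrc = 1, as for lw, the write register comes from bits 20:16 and the second ALU operand is the offset rather than Read data 2.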
Next Steps
Figure 4.17
Shows the datapath with the control unit and all nine control signals (7 single-bit + 2-bit ALUOp).
Figure 4.18
Informally defines control signal values (0, 1, or X [don’t care]) for each opcode,
derived from Figures 4.12 (ALU control), 4.16 (signal effects), and 4.17 (datapath).
Operation of the Datapath
[Figure 4.17: datapath diagram with the control unit. Input: Instruction [31:26]. Outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
Outputs
Nine control signals (7 single-bit + 2-bit ALUOp).
Signal Settings
Instruction   RegDst   ALUSrc   MemtoReg   RegWrite   MemRead   MemWrite   Branch   ALUOp1   ALUOp0
R-format      1        0        0          1          0         0          0        1        0
lw            0        1        1          1          1         0          0        0        0
sw            X        1        X          0          0         1          0        0        0
beq           X        0        X          0          0         0          1        0        1
Figure 4.18: Control signal settings per opcode, with X for don’t cares.
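The settings in Figure 4.18 amount to a lookup keyed by opcode. The following sketch encodes that table. The Control tuple and main_control function are names made up for this example, None stands in for a don't-care (X), and the opcode values are the standard MIPS encodings, which the slides do not list.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    reg_dst: Optional[int]      # None models a don't-care (X)
    alu_src: int
    mem_to_reg: Optional[int]
    reg_write: int
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int                 # 2-bit ALUOp (ALUOp1, ALUOp0)


X = None

# Opcode values are the standard MIPS encodings (an assumption of this sketch).
MAIN_CONTROL = {
    0b000000: Control(1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: Control(0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    0b101011: Control(X, 1, X, 0, 0, 1, 0, 0b00),  # sw
    0b000100: Control(X, 0, X, 0, 0, 0, 1, 0b01),  # beq
}


def main_control(opcode: int) -> Control:
    """Look up the control signal settings for a 6-bit opcode."""
    try:
        return MAIN_CONTROL[opcode & 0x3F]
    except KeyError:
        raise ValueError(f"unsupported opcode: {opcode:06b}") from None


signals = main_control(0b100011)  # lw
print(signals)
```

Only sw and beq carry don't-cares, because neither writes a register, so the RegDst and MemtoReg multiplexor outputs are never used.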
[Figure 4.19: datapath diagram with the control unit. Input: Instruction [31:26]. Outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
R-type Execution
Example: add $t1, $t2, $t3
Steps
01 Fetch: Instruction fetched from instruction memory; PC incremented by 4 (adder active).
02 Register Read: Register file reads $t2 (rs) and $t3 (rt); control unit sets
signals (e.g., RegDst = 1, RegWrite = 1, ALUOp = 10).
Signals
RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
1 0 0 1 0 0 0 1 0
[Figure 4.20: datapath diagram with the control unit. Input: Instruction [31:26]. Outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
Steps (load word, lw)
01 Fetch: Instruction fetched; PC incremented by 4.
02 Register Read: Register file reads $t2 (rs, base register).
05 Register Write: Memory data written to $t1 (rt, bits 20:16)
(RegWrite = 1, MemtoReg = 1).
Signals
RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
0 1 1 1 1 0 0 0 0
[Figure 4.21: datapath diagram with the control unit. Input: Instruction [31:26]. Outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite.]
Steps (branch equal, beq)
01 Fetch: Instruction fetched; PC incremented by 4.
02 Register Read: Register file reads $t1 (rs) and $t2 (rt).
Signals
RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
X 0 X 0 0 0 1 0 1
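The three walkthroughs above (R-type, lw, beq) can be tied together with a small behavioral sketch of one clock cycle. Everything here is an assumption made for illustration: the step function, the list-and-dict model of registers and memory, the standard MIPS opcode values, and the choice to tie don't-care signals to 0. It models behavior only, not timing or hardware structure.

```python
# Sketch of one single-cycle step. Layout and names are illustrative assumptions.
MASK = 0xFFFFFFFF

# opcode -> (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp)
# Don't-care (X) entries from Figure 4.18 are tied to 0 here.
CONTROL = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0b00),  # lw
    0b101011: (0, 1, 0, 0, 0, 1, 0, 0b00),  # sw
    0b000100: (0, 0, 0, 0, 0, 0, 1, 0b01),  # beq
}
FUNCT = {0b100000: "add", 0b100010: "sub", 0b100100: "and",
         0b100101: "or", 0b101010: "slt"}


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def alu(op, a, b):
    if op == "add":
        return (a + b) & MASK
    if op == "sub":
        return (a - b) & MASK
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "slt":
        return int(to_signed(a) < to_signed(b))
    raise ValueError(op)


def step(pc, regs, mem, instr):
    """Run one instruction; mutate regs/mem and return the next PC."""
    opcode = (instr >> 26) & 0x3F
    rs, rt, rd = (instr >> 21) & 0x1F, (instr >> 16) & 0x1F, (instr >> 11) & 0x1F
    imm = instr & 0xFFFF
    imm = imm - 0x10000 if imm & 0x8000 else imm        # sign-extend the offset
    (reg_dst, alu_src, mem_to_reg, reg_write,
     mem_read, mem_write, branch, alu_op) = CONTROL[opcode]

    read1, read2 = regs[rs], regs[rt]                   # register read
    operand_b = (imm & MASK) if alu_src else read2      # ALUSrc multiplexor
    op = {0b00: "add", 0b01: "sub"}.get(alu_op) or FUNCT[instr & 0x3F]
    result = alu(op, read1, operand_b)
    zero = result == 0

    read_data = mem.get(result, 0) if mem_read else 0   # data memory read
    if mem_write:
        mem[result] = read2                             # data memory write
    if reg_write:
        dest = rd if reg_dst else rt                    # RegDst multiplexor
        if dest != 0:                                   # $zero stays 0
            regs[dest] = read_data if mem_to_reg else result  # MemtoReg mux

    pc_plus4 = (pc + 4) & MASK
    branch_target = (pc_plus4 + (imm << 2)) & MASK
    return branch_target if (branch and zero) else pc_plus4


def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def i_type(opcode, rs, rt, imm):
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


T1, T2, T3 = 9, 10, 11            # register numbers of $t1, $t2, $t3
regs = [0] * 32
regs[T2], regs[T3] = 0x100, 5
mem = {0x108: 42}                 # word at address $t2 + 8

pc = step(0, regs, mem, r_type(T2, T3, T1, 0b100000))   # add $t1, $t2, $t3
print(pc, regs[T1])
pc = step(pc, regs, mem, i_type(0b100011, T2, T1, 8))   # lw $t1, 8($t2)
print(pc, regs[T1])
regs[T2] = 42
pc = step(pc, regs, mem, i_type(0b000100, T1, T2, 3))   # beq $t1, $t2, 3
print(pc)
```

The same ALU call serves all three instructions; only the control tuple changes which operand it sees and where the result goes.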
Finalizing Control
Purpose: Precisely define the control function for the single-cycle datapath to execute
R-format, load word (lw), store word (sw), and branch equal (beq) instructions.
Method: Use a truth table (Figure 4.22) to map opcode values to control signal settings, derived from Figure 4.18.
[Figure 4.24: datapath diagram; surviving labels include PC, Instruction memory, Instruction [31:0], ALUSrc, RegWrite, Zero, and ALU control.]
Jump Instruction
Instruction format: opcode (bits 31:26), address (bits 25:0).
New Components
Multiplexor: Added to select the PC source from three options:
• PC + 4: Sequential instruction (for non-branch, non-jump).
• Branch target address: For beq when taken (from branch adder).
• Jump target address: For jump instructions.
Jump Target Address Calculation:
• Shift Left 2: The 26-bit immediate is shifted left by 2 bits (append 00),
implemented by wiring (no hardware shift needed, as it’s a fixed operation).
• Concatenation: Combines the resulting 28 bits with the upper 4 bits of PC + 4
(bits 31:28) to form a 32-bit address.
Integration: The new multiplexor replaces the previous two-way PC source multiplexor
(PC + 4 vs. branch target) with a three-way multiplexor to include the jump target.
Control Modifications
New Control Signal: Jump (single-bit):
• Asserted (1): When the instruction is a jump (opcode = 000010, decimal 2).
• Deasserted (0): For all other instructions (R-type, lw, sw, beq).
Function: Controls the new PC source multiplexor to select the jump target address when asserted.
Jump = 1: Selects jump target address for PC.
Signals
RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
X X X 0 0 0 0 X X
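The jump target calculation and the three-way PC selection can be sketched as follows. The function names jump_target and next_pc, and the example PC and address values, are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF


def jump_target(pc: int, instr: int) -> int:
    """Upper 4 bits of PC + 4, then the 26-bit address field shifted left by 2."""
    pc_plus4 = (pc + 4) & MASK32
    address26 = instr & 0x03FFFFFF            # bits 25:0 of the instruction
    return (pc_plus4 & 0xF0000000) | (address26 << 2)


def next_pc(pc: int, instr: int, branch_target: int,
            jump: int, branch: int, zero: int) -> int:
    """Three-way PC source selection: jump target, branch target, or PC + 4."""
    if jump:
        return jump_target(pc, instr)
    if branch and zero:
        return branch_target
    return (pc + 4) & MASK32


# j with opcode 000010 and a hypothetical 26-bit address field of 0x0100009
j_instr = (0b000010 << 26) | 0x0100009
print(hex(jump_target(0x00400018, j_instr)))
print(hex(next_pc(0x00400018, j_instr, branch_target=0, jump=1, branch=0, zero=0)))
```

Because the shift is by a fixed amount, the address << 2 in the sketch corresponds to pure wiring in hardware, as noted above.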
Inefficiency
Single-Cycle Implementation Overview
Functionality: Correctly executes instructions (e.g., R-type, lw, sw, beq, j) in one clock cycle
using the datapath and control described previously.
Design: Each instruction uses the entire datapath, with resources (e.g., ALU, memory) allocated
for the full cycle, even if not needed.
Inefficiency of Single-Cycle Design
Fixed Clock Cycle: The clock cycle length is determined by the longest possible path in the
processor, typically the load instruction (lw), which uses five functional units in series.
Consequence: All instructions, even simpler ones (e.g., R-type or beq), must use the same long
clock cycle, leading to inefficiency.
Performance: Despite a CPI (Cycles Per Instruction) of 1, the long clock cycle results in poor
overall performance due to low clock frequency.
Modern Challenges
Floating-point units: Require longer computation times.
Complex instructions: Increase the worst-case delay, further lengthening the clock cycle.
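A short worked example shows why the load instruction sets the clock period. The latencies below are hypothetical numbers picked for illustration and do not come from the slides. The path lists assume lw passes through instruction memory, register read, ALU, data memory, and register write in series.

```python
# Hypothetical unit latencies in picoseconds (assumed for illustration only).
LATENCY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Functional units each instruction class uses in series.
PATHS = {
    "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "lw": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw": ["instr_mem", "reg_read", "alu", "data_mem"],
    "beq": ["instr_mem", "reg_read", "alu"],
}

delays = {name: sum(LATENCY_PS[unit] for unit in units)
          for name, units in PATHS.items()}
clock_ps = max(delays.values())   # the slowest instruction sets the clock period

for name, delay in delays.items():
    print(f"{name}: needs {delay} ps, idle for {clock_ps - delay} ps per cycle")
print(f"clock period = {clock_ps} ps, frequency = {1e3 / clock_ps:.2f} GHz")
```

Under these assumed numbers every beq wastes 300 ps of an 800 ps cycle, which is the inefficiency the slide describes.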
Alternative Approach
The next section introduces pipelining,
which uses a similar datapath but achieves
higher throughput.
How It Works
Pipelining executes multiple instructions simultaneously
by dividing the datapath into stages,
allowing each stage to process a different instruction in parallel.
Benefit
Significantly improves efficiency by reducing the effective
cycle time and increasing instruction throughput.