2010 R&E Computer System Education & Research
Lecture 9. MIPS Processor Design Single-Cycle Processor Design
Prof. Taeweon Suh Computer Science Education Korea University
Single-Cycle MIPS Processor
Again, the microarchitecture (CPU implementation) is divided into two interacting parts: the datapath and the control
Korea Univ
Single-Cycle Processor Design
Let's start with a memory access instruction: lw
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
STEP 1: Instruction Fetch
[Figure: datapath elements: PC register (PC', PC), instruction memory (A, RD, output Instr), register file (A1, A2, A3, WD3, WE3, RD1, RD2), and data memory (A, RD, WD, WE)]
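The PC in the fetch step is a 32-bit register that loads PC' on each rising clock edge. A minimal sketch of such a register is shown below. The module name flopr, the reset input, and the WIDTH parameter are assumptions made for illustration; the slide itself only shows CLK, PC', and PC.

```verilog
// Resettable D flip-flop, used here as the program counter (PC) register.
// Assumptions (not from the slides): the name flopr, the asynchronous
// reset input, and the WIDTH parameter.
module flopr #(parameter WIDTH = 32)
              (input                  clk, reset,
               input      [WIDTH-1:0] d,   // PC' (next PC)
               output reg [WIDTH-1:0] q);  // PC  (current PC)
  always @(posedge clk or posedge reset)
    if (reset) q <= {WIDTH{1'b0}};  // start fetching at address 0
    else       q <= d;              // load the next PC on the clock edge
endmodule
```

With this register, the instruction memory address input A is simply q, and the value presented on d becomes the fetch address in the following cycle.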
Single-Cycle Processor Design
STEP 2: Decoding
Read source operands from register file
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: Instr[25:21] (rs) connected to register file address input A1; the source operand is read on RD1]
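The register file in the figure has two read ports (A1/RD1, A2/RD2) and one write port (A3/WD3/WE3). One possible sketch is given below, assuming combinational reads and a write on the rising clock edge. The module name regfile and the internal array name rf are assumptions; register $0 reads as zero, as the MIPS architecture requires.

```verilog
// Three-ported register file sketch: two combinational read ports,
// one clocked write port. Names regfile and rf are illustrative.
module regfile(input         clk,
               input         we3,          // write enable (RegWrite)
               input  [4:0]  a1, a2, a3,   // read addresses, write address
               input  [31:0] wd3,          // write data
               output [31:0] rd1, rd2);    // read data
  reg [31:0] rf[31:0];

  // Write port: update the selected register on the rising edge.
  always @(posedge clk)
    if (we3) rf[a3] <= wd3;

  // Read ports: register 0 is hardwired to 0.
  assign rd1 = (a1 != 5'd0) ? rf[a1] : 32'b0;
  assign rd2 = (a2 != 5'd0) ? rf[a2] : 32'b0;
endmodule
```

For lw $2, 80($0), a1 is Instr[25:21] = 0, so rd1 supplies the base value 0.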
Single-Cycle Processor Design
STEP 2: Decoding
Sign-extend the immediate
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: Instr[15:0] (imm) passes through the Sign Extend unit to produce SignImm]
module signext(input  [15:0] a,
               output [31:0] y);
  assign y = {{16{a[15]}}, a};  // replicate the sign bit into the upper 16 bits
endmodule
Single-Cycle Processor Design
STEP 3: Execution
Compute the memory address
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: the ALU takes SrcA (RD1) and SrcB (SignImm) and, with ALUControl2:0 = 010 (add), produces ALUResult, the memory address; the ALU also has a Zero output]
Single-Cycle Processor Design
STEP 4: Memory Access and Write-Back
Read data from memory and write it back to the register file
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: ALUResult drives data memory address A; ReadData returns to register file write data input WD3; Instr[20:16] (rt) drives write address A3; RegWrite = 1, ALUControl2:0 = 010]
Single-Cycle Processor Design
We are done with lw
The CPU starts fetching the next instruction from PC+4
module adder(input  [31:0] a, b,
             output [31:0] y);
  assign y = a + b;
endmodule

adder pcadd1(pc, 32'b100, pcplus4);  // PCPlus4 = PC + 4
[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC'; Result (ReadData) is written to the register file with RegWrite = 1]
Single-Cycle Processor Design
Let's consider another memory access instruction: sw
The sw instruction needs to write data to the data memory
Example: sw $2, 84($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: Instr[20:16] (rt) also drives register file read address A2; RD2 becomes WriteData on data memory input WD; RegWrite = 0, MemWrite = 1, ALUControl2:0 = 010]
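The data memory used by lw and sw can be sketched as a small word-addressed RAM with a combinational read and a clocked write. The module name dmem, the array name RAM, and the 64-word size are assumptions chosen only to keep the example small.

```verilog
// Data memory sketch: combinational read, write on the rising clock edge.
// Assumptions: 64 words of storage and word-aligned addresses, so only
// address bits [7:2] are used to index the array.
module dmem(input         clk, we,    // we = MemWrite
            input  [31:0] a,          // byte address (ALUResult)
            input  [31:0] wd,         // WriteData (RD2 of the register file)
            output [31:0] rd);        // ReadData
  reg [31:0] RAM[63:0];

  assign rd = RAM[a[7:2]];            // read the addressed word

  always @(posedge clk)
    if (we) RAM[a[7:2]] <= wd;        // sw: store the word
endmodule
```

For sw $2, 84($0), the ALU produces a = 84, so word index 21 is written when we = 1.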
Single-Cycle Processor Design
Let's consider arithmetic and logical instructions: add, sub, and, or
Write ALUResult to the register file
Note that R-type instructions write to the rd field of the instruction (instead of rt)
R-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
[Figure: three multiplexers are added. RegDst = 1 selects Instr[15:11] (rd) instead of Instr[20:16] (rt) as WriteReg4:0; ALUSrc = 0 selects RD2 (WriteData) instead of SignImm as SrcB; MemtoReg = 0 selects ALUResult instead of ReadData as Result. RegWrite = 1, MemWrite = 0, ALUControl2:0 varies with the instruction]
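The three multiplexers introduced for R-type instructions can be built from one parameterized 2:1 mux. The sketch below shows one way to wire them. The module names mux2 and rtype_muxes and the instance names are assumptions for illustration; the select polarities follow the 0/1 labels in the figure.

```verilog
// Parameterized 2:1 multiplexer: y = d0 when s = 0, d1 when s = 1.
module mux2 #(parameter WIDTH = 32)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;
endmodule

// Hypothetical wrapper showing the three muxes of the R-type datapath.
module rtype_muxes(input  [31:0] instr, writedata, signimm,
                   input  [31:0] aluresult, readdata,
                   input         regdst, alusrc, memtoreg,
                   output [4:0]  writereg,
                   output [31:0] srcb, result);
  // RegDst: 0 selects rt (Instr[20:16]), 1 selects rd (Instr[15:11]).
  mux2 #(5)  wrmux  (instr[20:16], instr[15:11], regdst, writereg);
  // ALUSrc: 0 selects RD2 (WriteData), 1 selects SignImm.
  mux2 #(32) srcbmux(writedata, signimm, alusrc, srcb);
  // MemtoReg: 0 selects ALUResult, 1 selects ReadData.
  mux2 #(32) resmux (aluresult, readdata, memtoreg, result);
endmodule
```

With regdst = 1, alusrc = 0, and memtoreg = 0, the wrapper reproduces the R-type settings listed on the slide.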
Single-Cycle Processor Design
Let's consider a branch instruction: beq
Determine whether the register values are equal
Calculate the branch target address (BTA) from the sign-extended immediate and PC+4
Example: beq $4, $0, around
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: SignImm is shifted left by 2 (<<2) and added to PCPlus4 to give PCBranch; PCSrc, derived from Branch and Zero, selects between PCPlus4 (0) and PCBranch (1) for PC'. RegWrite = 0, RegDst = x, ALUSrc = 0, Branch = 1, MemWrite = 0, MemtoReg = x, ALUControl2:0 = 110]
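The branch hardware can be sketched as a shift-left-by-2 unit, an adder, and a gate that forms PCSrc. The sketch below assumes that PCSrc is the AND of Branch and Zero, which is what the figure labels suggest; the module names sl2 and branchlogic and the wire name signimmsh are illustrative.

```verilog
// Shift left by 2 (multiply the word offset by 4 to get a byte offset).
module sl2(input  [31:0] a,
           output [31:0] y);
  assign y = {a[29:0], 2'b00};
endmodule

// Hypothetical wrapper for the beq-specific logic.
module branchlogic(input  [31:0] pcplus4, signimm,
                   input         branch, zero,
                   output [31:0] pcbranch,
                   output        pcsrc);
  wire [31:0] signimmsh;

  sl2 immsh(signimm, signimmsh);
  assign pcbranch = pcplus4 + signimmsh;  // BTA = (PC + 4) + (SignImm << 2)
  assign pcsrc    = branch & zero;        // take the branch only if equal
endmodule
```

For example, with pcplus4 = 0x14 and signimm = 3, pcbranch is 0x20; a negative immediate of -1 gives 0x10.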
Single-Cycle Datapath Example
We are done with the implementation of the basic instructions
Let's see how the or instruction works in this implementation
R-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
[Figure: complete datapath with the control unit (inputs Op = Instr[31:26] and Funct = Instr[5:0]) executing or: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUControl2:0 = 001, PCSrc = 0]
Single-Cycle Processor - Control
As mentioned, the CPU is designed with a datapath and control
Now, let's delve into the design of the control part
[Figure: complete datapath with the control unit, which takes Op (Instr[31:26]) and Funct (Instr[5:0]) and generates MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite; PCSrc selects the next PC]
Control Unit
Opcode and funct fields come from the fetched instruction
[Figure: control unit. The Main Decoder takes Opcode5:0 and outputs MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0; the ALU Decoder takes ALUOp1:0 and Funct5:0 and outputs ALUControl2:0]
ALU Implementation and Control
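F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT
N = 32 in a 32-bit processor
slt: set less than
Example: slt $t0, $t1, $t2 // $t0 = 1 if $t1 < $t2
[Figure: N-bit ALU with inputs A and B, 3-bit control F, and output Y. F2 selects B or ~B as the second adder operand; sum bit [N-1], zero-extended, gives the SLT result; F1:0 selects among the AND, OR, adder, and SLT outputs through a 4:1 multiplexer]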
Control Unit: ALU Control
The implementation is entirely up to the hardware designers, but the designers should make sure it is reasonable
Memory access instructions (lw, sw) need to use the ALU to calculate the memory target address (addition)
Branch instructions (beq, bne) need to use the ALU for the equality check (subtraction)

ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct
11         Not Used

ALUOp1:0   Funct           ALUControl2:0
00         X               010 (Add)
X1         X               110 (Subtract)
1X         100000 (add)    010 (Add)
1X         100010 (sub)    110 (Subtract)
1X         100100 (and)    000 (And)
1X         100101 (or)     001 (Or)
1X         101010 (slt)    111 (SLT)

[Figure: control unit with Main Decoder (Opcode5:0 in; MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, ALUOp1:0 out) and ALU Decoder (ALUOp1:0 and Funct5:0 in; ALUControl2:0 out)]
Control Unit: Main Decoder
Instruction   Op5:0    RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type        000000   1         1       0       0       0         0         10
lw            100011   1         0       1       0       0         1         00
sw            101011   0         X       1       0       1         X         00
beq           000100   0         X       0       1       0         X         01

ALUOp1:0: 00 = Add, 01 = Subtract, 10 = Look at Funct field, 11 = Not Used
[Figure: control unit with Main Decoder and ALU Decoder]
How about Other Instructions?
Now we are done with the control part design
Let's examine whether the design can execute other instructions
addi
Example: addi $t0, $t1, -14
[Figure: complete datapath with the control unit; no new hardware is added for addi]
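For the addi example, the immediate -14 must be sign-extended before it reaches the ALU. The sketch below reuses the signext module from the decoding step and drives it with the 16-bit two's complement encoding of -14 (16'hFFF2). The testbench module name signext_demo is an assumption made for illustration.

```verilog
// Sign-extension unit, as defined in the decoding step.
module signext(input  [15:0] a,
               output [31:0] y);
  assign y = {{16{a[15]}}, a};
endmodule

// Hypothetical demo: sign-extend the immediate of addi $t0, $t1, -14.
module signext_demo;
  reg  [15:0] imm;
  wire [31:0] signimm;

  signext se(imm, signimm);

  initial begin
    imm = 16'hFFF2;                         // -14 in 16-bit two's complement
    #1 $display("imm = %h, SignImm = %h", imm, signimm);
  end
endmodule
```

Because bit 15 of the immediate is 1, the upper 16 bits of SignImm are filled with ones, giving 32'hFFFFFFF2, which is -14 as a 32-bit value.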
Control Unit: Main Decoder
Instruction   Op5:0    RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type        000000   1         1       0       0       0         0         10
lw            100011   1         0       1       0       0         1         00
sw            101011   0         X       1       0       1         X         00
beq           000100   0         X       0       1       0         X         01
addi          001000   1         0       1       0       0         0         00
How about Other Instructions?
OK. So far, so good
How about jump instructions?
j
J-Type: op (6 bits) | addr (26 bits)
[Figure: complete datapath with the control unit, before any jump hardware is added]
How about Other Instructions?
We need to add some hardware to support the j instruction
Logic to compute the target address
A mux and a control signal (Jump)
J-Type: op (6 bits) | addr (26 bits)
[Figure: Instr[25:0] is shifted left by 2 (<<2) to form bits 27:0 of PCJump, and PCPlus4[31:28] supplies bits 31:28; a new multiplexer controlled by Jump selects PCJump (1) over the branch/PCPlus4 path (0) for PC'. The control unit gains a Jump output]
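The jump target logic amounts to a concatenation and one more 2:1 selection. A minimal sketch follows; the module name jumplogic and the input name pcnextbr (the output of the existing PCSrc mux) are assumptions introduced for illustration.

```verilog
// Hypothetical wrapper for the j-specific logic.
module jumplogic(input  [31:0] pcplus4,   // PC + 4
                 input  [31:0] instr,     // fetched instruction
                 input  [31:0] pcnextbr,  // output of the PCSrc mux (assumed name)
                 input         jump,      // Jump control signal
                 output [31:0] pcjump,
                 output [31:0] pcnext);   // PC'
  // Target = upper 4 bits of PC + 4, then the 26-bit addr field, then 00.
  assign pcjump = {pcplus4[31:28], instr[25:0], 2'b00};
  // Jump = 1 overrides the branch/sequential next PC.
  assign pcnext = jump ? pcjump : pcnextbr;
endmodule
```

For instance, with pcplus4 = 0x10000004 and an addr field of 0x11, pcjump is 0x10000044.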
Control Unit: Main Decoder
There is one more output in the main decoder to support the jump instruction: Jump

Instruction   Op5:0    RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type        000000   1         1       0       0       0         0         10        0
lw            100011   1         0       1       0       0         1         00        0
sw            101011   0         X       1       0       1         X         00        0
beq           000100   0         X       0       1       0         X         01        0
addi          001000   1         0       1       0       0         0         00        0
j             000010   0         X       X       X       0         X         XX        1
Verilog Code - Main Decoder and ALU Control
[Figure: control unit with Main Decoder (Opcode5:0 in) and ALU Decoder (ALUOp1:0 and Funct5:0 in, ALUControl2:0 out)]

module maindec(input  [5:0] op,
               output       memtoreg, memwrite,
               output       branch, alusrc,
               output       regdst, regwrite,
               output       jump,
               output [1:0] aluop);
  reg [8:0] controls;

  assign {regwrite, regdst, alusrc, branch, memwrite,
          memtoreg, jump, aluop} = controls;

  always @(*)
    case(op)
      6'b000000: controls = 9'b110000010; // R-type
      6'b100011: controls = 9'b101001000; // lw
      6'b101011: controls = 9'b001010000; // sw
      6'b000100: controls = 9'b000100001; // beq
      6'b001000: controls = 9'b101000000; // addi
      6'b000010: controls = 9'b000000100; // j
      default:   controls = 9'bxxxxxxxxx; // ???
    endcase
endmodule

module aludec(input      [5:0] funct,
              input      [1:0] aluop,
              output reg [2:0] alucontrol);
  always @(*)
    case(aluop)
      2'b00: alucontrol = 3'b010;  // add
      2'b01: alucontrol = 3'b110;  // sub
      default: case(funct)         // RTYPE
        6'b100000: alucontrol = 3'b010; // ADD
        6'b100010: alucontrol = 3'b110; // SUB
        6'b100100: alucontrol = 3'b000; // AND
        6'b100101: alucontrol = 3'b001; // OR
        6'b101010: alucontrol = 3'b111; // SLT
        default:   alucontrol = 3'bxxx; // ???
      endcase
    endcase
endmodule
Verilog Code ALU
[Figure: ALU block diagram and F2:0 function table (000 A & B, 001 A | B, 010 A + B, 011 not used, 100 A & ~B, 101 A | ~B, 110 A - B, 111 SLT)]

module alu(input      [31:0] a, b,
           input      [2:0]  alucont,
           output reg [31:0] result,
           output            zero);
  wire [31:0] b2, sum, slt;

  assign b2  = alucont[2] ? ~b : b;   // F2 selects B or ~B
  assign sum = a + b2 + alucont[2];   // F2 is also the carry-in: A + ~B + 1 = A - B
  assign slt = sum[31];               // sign bit of A - B, zero-extended

  always @(*)
    case(alucont[1:0])
      2'b00: result = a & b2;
      2'b01: result = a | b2;
      2'b10: result = sum;
      2'b11: result = slt;
    endcase

  assign zero = (result == 32'b0);
endmodule
Single-Cycle Processor Performance
How fast is the single-cycle processor?
The clock cycle time (and therefore the frequency) is limited by the critical path
The critical path is the path that takes the longest time
What do you think the critical path is?
The path that the lw instruction goes through
[Figure: complete datapath with the control signal values for lw: RegWrite = 1, RegDst = 0, ALUSrc = 1, Branch = 0, MemWrite = 0, MemtoReg = 1, ALUControl2:0 = 010]
Single-Cycle Processor Performance
Single-cycle critical path:
Tc = tpcq_PC + tmem + max(tRFread, tsext) + tmux + tALU + tmem + tmux + tRFsetup
In most implementations, the limiting paths are the memory (instruction and data), the ALU, and the register file. Thus,
Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup

Element               Parameter
Register clock-to-Q   tpcq_PC
Multiplexer           tmux
ALU                   tALU
Memory read           tmem
Register file read    tRFread
Register file setup   tRFsetup

[Figure: complete datapath with the control signal values for lw, showing the path from PC through instruction memory, register file, ALU, data memory, and back to the register file]
Single-Cycle Processor Performance Example
Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 2(25) + 200 + 20] ps
   = 950 ps
fc = 1/Tc = 1/(950 ps) ≈ 1.053 GHz

Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a single-cycle MIPS processor?
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
               = (100 × 10^9)(1)(950 × 10^-12 s)
               = 95 seconds