Asic Project 1.0
Project Specification
EECS 151/251A RISC-V Processor Design
1 Introduction
The primary goal of this project is to familiarize students with the methods and tools of digital design.
In order to make the project both interesting and useful, we will guide you through the implementation
of a CPU that is intended to be integrated on a modern SOC. Working alone or in teams of 2, you will
be designing a simple 3-stage CPU that implements the RISC-V ISA, developed here at UC Berkeley.
If you work in a team, you both must work on the project together (i.e. you are not allowed to divide up
the work), and you will both receive the same grade.
Your first and most important goal is to write a functional implementation of your processor. To
better expose you to real design decisions, you will also be tasked with improving the performance of
your processor. You will be required to meet a minimum performance to be specified later in the project.
You will use Verilog HDL to implement this system. You will be provided with some testbenches to
verify your design, but you will be responsible for creating additional testbenches to exercise your entire
design. Your target implementation technology will be the ASAP7 7nm Educational PDK, a predictive
model technology used for instruction. The project will give you experience designing synthesizable
RTL (Register Transfer Level) code, resolving hazards in a simple pipeline, building interfaces, and
approaching system-level optimization.
Your first step will be to map our high level specification to a design which can be translated into
a hardware implementation. You will then generate and debug that implementation in Verilog. These
steps may take significant time if you do not put effort into your system architecture before attempting
implementation. After you have built a working design, you will be optimizing it for speed in the 7nm
technology that we have been using this semester.
1.1 RISC-V
The final project for this class will be a VLSI implementation of a RISC-V (pronounced risk-five) CPU.
RISC-V is a new instruction set architecture (ISA) developed here at UC Berkeley. It was originally
developed for computer architecture research and education purposes, but recently there has been a
push towards commercialization and industry adoption. For the purposes of this lab, you don’t need to
delve too deeply into the details of RISC-V. However, it may be good to familiarize yourself with it, as
this will be at the core of your final project. Check out the official Instruction Set Manual and explore
http://riscv.org for more information.
• Read through sections 2.2 and 2.3 starting on page 11 in the RISC-V Instruction Set Manual to
understand how the different types of instructions are encoded.
• Read through sections 2.4, 2.5, and 2.6 starting on page 13 in the Instruction Set Manual and
think about how each of the instructions will use the ALU.
You do not need to read 2.7 or 2.8, as you will not be implementing those instructions in the project.
In the first phase (front-end), you will design and implement a 3-stage RISC-V processor in Verilog,
and run simulations to test for functionality. At this point, you will only have a functional description
of your processor that is independent of technology (there are no standard cells yet). You have about
5 weeks to complete the first phase, but you are highly encouraged to try to finish each checkpoint
early, as each checkpoint will be released before the due date of the ongoing one. Everything will take
much longer than you expect, and finishing early gives you more time to improve your QOR (Quality
of Results, e.g. clock period).
In the second phase (back-end), you will implement your front-end design in the ASAP7 7nm kit
using the VLSI tools you used in lab. When you have finished phase 2, you will have a design that could
actually be fabricated if this were a real process. You will have about 2 weeks to complete the second
phase after its release.
1.3 Philosophy
This document is meant to describe a high-level specification for the project and its associated support
hardware. You can also use it to help lay out a plan for completing the project. As with any design
you will encounter in the professional world, we are merely providing a framework within which your
project must fit.
You should consider the GSI(s) a source of direction and clarification, but it is up to you to produce
a fully functional design, as well as a physical implementation. We will attempt to help, when possible,
but ultimately the burden of designing and debugging your solution lies on you.
The most important goal is to design a functional processor; this alone is 50-60% of the final grade,
and you must have it working completely to receive any credit for performance.
• Checkpoint 1: ALU design and pipeline diagram (due by the end of the week of 03/16-03/22, 2020)
As soon as you start your project, you must post your group information as a private note on Piazza.
Please provide each group member’s name, student ID number, and instructional account name
(e.g. eecs151-aa). Please do this even if you are working alone, as the team git repos will
be used for part of the final checkoff. Once your group is set up, you will be given a team number and
a repo hosted on the servers for version control of the project. You should be able to add the
remote host “geecs151:teamXX”, where “XX” is the team number that you are assigned. An example
workflow for pulling from the skeleton as well as pushing to and pulling from your team repository is
shown below:
Then to pull changes from the skeleton, you would need to type:
And to push changes to your team repository, you would usually want to pull first (above) and then
type:
31 27 26 25   24 20   19 15   14 12   11 7         6 0
funct7        rs2     rs1     funct3  rd           opcode  R-type
imm[11:0]             rs1     funct3  rd           opcode  I-type
imm[11:5]     rs2     rs1     funct3  imm[4:0]     opcode  S-type
imm[12|10:5]  rs2     rs1     funct3  imm[4:1|11]  opcode  SB-type
imm[31:12]                            rd           opcode  U-type
imm[20|10:1|11|19:12]                 rd           opcode  UJ-type
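The field boundaries in the formats listed above can be checked with a few lines of Python. This is a minimal sketch for illustration, not part of the provided project files: the helper names (bits, sign_extend, decode_fields, imm_i, imm_s, imm_sb) are hypothetical.

```python
def bits(word, hi, lo):
    """Return word[hi:lo] as an unsigned integer (hypothetical helper)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend(value, width):
    """Interpret a width-bit value as a two's-complement signed integer."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def decode_fields(inst):
    """Extract the fixed-position fields shared by the formats."""
    return {
        "opcode": bits(inst, 6, 0),
        "rd": bits(inst, 11, 7),
        "funct3": bits(inst, 14, 12),
        "rs1": bits(inst, 19, 15),
        "rs2": bits(inst, 24, 20),
        "funct7": bits(inst, 31, 25),
    }

def imm_i(inst):
    """I-type immediate: imm[11:0] sits in inst[31:20]."""
    return sign_extend(bits(inst, 31, 20), 12)

def imm_s(inst):
    """S-type immediate: imm[11:5] in inst[31:25], imm[4:0] in inst[11:7]."""
    return sign_extend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)

def imm_sb(inst):
    """SB-type immediate: imm[12|10:5] in inst[31:25], imm[4:1|11] in inst[11:7]."""
    raw = ((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11) |
           (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1))
    return sign_extend(raw, 13)

inst = 0x00500093  # addi x1, x0, 5
fields = decode_fields(inst)
print(fields["opcode"], fields["rd"], imm_i(inst))  # prints "19 1 5"
```

Note that rs1, rs2, rd, funct3 and opcode stay in the same bit positions in every format that has them; only the immediate bits move around.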
• `timescale 1ns / 1ps - This specifies, in order, the reference time unit and the precision.
This example sets the unit delay in the simulation to 1ns (i.e. #1 = 1ns) and the precision to 1ps
(i.e. the finest delay you can set is #0.001 = 1ps).
• The clock is generated by the code below. Since the ALU is only combinational logic, this is not
necessary, but it will be a helpful reference once you have sequential elements.
– The initial block sets the clock to 0 at the beginning of the simulation. You should be
sure to only change your stimulus when the clock is falling, since the data is captured on
the rising edge. Otherwise, it will not only be difficult to debug your design, but it will also
cause hold time violations when you run gate level simulation.
– You must use an always block without a sensitivity list (the @ part of an always statement)
to cause the clock to run automatically.
parameter Halfcycle = 5; //half period is 5ns
localparam Cycle = 2*Halfcycle;
reg Clock;
// Clock Signal generation:
initial Clock = 0;
always #(Halfcycle) Clock = ~Clock;
• task checkOutput; - this task contains Verilog code that you would otherwise have to copy
and paste many times. Note that a task is not the same thing as a function (Verilog also has functions).
Version 3.3 March 14, 2020
For these two modules, the inputs and outputs that you care about are opcode, funct, add_rshift_type,
A, B and Out. To test your design thoroughly, you should work through every possible opcode,
funct, and add_rshift_type that you care about, and verify that the correct Out is generated
from the A and B that you pass in.
The test bench generates random values for A and B and computes REFout = A + B. It also
contains calls to checkOutput for load and store instructions, for which the ALU should perform
addition. It will be up to you to write tests for the remaining combinations of opcode, funct, and
add_rshift_type to test your other instructions.
Remember to restrict A and B to reasonable values (e.g. masking them, or making sure that they are
not zero) if necessary to guarantee that a function is sufficiently tested. Please also write tests where
the inputs and the output are hard-coded. These should be corner cases that you want to be certain are
stressed during testing.
[106:100] = opcode
[99:97] = funct
[96] = add_rshift_type
[95:64] = A
[63:32] = B
[31:0] = REFout
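To make the bit layout above concrete, here is a minimal Python sketch that packs one test vector into its 107-character binary string and unpacks it again. The names FIELDS, pack_vector and unpack_vector are hypothetical and are not taken from the provided skeleton; the example values use the standard RISC-V load opcode (0000011) with funct 010.

```python
# Field widths, listed from the left-most (most significant) bit to the right.
FIELDS = [("opcode", 7), ("funct", 3), ("add_rshift_type", 1),
          ("A", 32), ("B", 32), ("REFout", 32)]

def pack_vector(values):
    """Return the 107-character string of zeros and ones for one test vector."""
    return "".join(
        format(values[name] & ((1 << width) - 1), "0{}b".format(width))
        for name, width in FIELDS
    )

def unpack_vector(line):
    """Split a 107-character test vector back into its integer fields."""
    values, pos = {}, 0
    for name, width in FIELDS:
        values[name] = int(line[pos:pos + width], 2)
        pos += width
    return values

# A load uses the ALU for addition, so REFout = A + B (truncated to 32 bits).
a, b = 0x00000010, 0x00000004
vec = pack_vector({"opcode": 0b0000011, "funct": 0b010, "add_rshift_type": 0,
                   "A": a, "B": b, "REFout": (a + b) & 0xFFFFFFFF})
print(len(vec))  # prints 107
print(vec)
```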
Open up the skeleton provided to you in ALUTestVectorTestbench.v. You need to complete the
module by making use of $readmemb to read in the test vector file (named testvectors.input),
writing some assign statements to assign the parts of the test vectors to registers, and writing a for loop
to iterate over the test vectors.
The syntax for a for loop can be found in ALUTestbench.v. $readmemb takes as its arguments
a filename and a reg vector, e.g.:
Having already written a Verilog test bench for our ALU and decoder, we will tackle writing a few test vectors
by hand, then use a script to generate test vectors more quickly.
Test vectors are of the format specified above, with the 7 opcode bits occupying the left-most bits.
Open up the file tests/testvectors.input and add test vectors for the following instructions
to the end (i.e. manually type the 107 zeros and ones required for each test vector): SLT, SLTU, SRA,
and SRL.
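When writing these vectors by hand, it helps to compute the expected REFout independently of your Verilog. The following Python sketch gives one possible set of reference models for the four instructions. It assumes that A is the operand being compared or shifted and that the shift amount is the low five bits of B; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def slt(a, b):
    """Set if less than, signed comparison."""
    return 1 if to_signed(a) < to_signed(b) else 0

def sltu(a, b):
    """Set if less than, unsigned comparison."""
    return 1 if (a & MASK32) < (b & MASK32) else 0

def srl(a, b):
    """Logical right shift: zeros are shifted in from the left."""
    return (a & MASK32) >> (b & 0x1F)

def sra(a, b):
    """Arithmetic right shift: the sign bit is replicated from the left."""
    return (to_signed(a) >> (b & 0x1F)) & MASK32

# The same inputs give different answers for the signed and unsigned variants.
for name, op in [("SLT", slt), ("SLTU", sltu), ("SRL", srl), ("SRA", sra)]:
    print(name, format(op(0x80000000, 4), "032b"))
```

Inputs with the most significant bit of A set, as in this example, are the ones that distinguish SLT from SLTU and SRA from SRL, so they make good hand-written corner cases.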
In the same directory, we’ve also provided a test vector generator written in Python, which is a
popular language used for scripting. We used this generator to generate the test vectors provided to you.
If you’re curious, you can read the next paragraph and poke around in the file. If not, feel free to skip
ahead to the next section.
The script ALUTestGen.py is located in tests. Run it so that it generates a test vector file in
the tests folder. Keep in mind that this script makes a couple of assumptions that aren’t necessary and
may differ from your implementation:
• Jump, branch, load and store instructions will use the ALU to compute the target address.
• For all shift instructions, A is shifted by B. In other words, B is the shift amount.
• For the LUI instruction, the value to load into the register is fed in through the B input.
You can either match these assumptions or modify the script to fit your implementation. All the
methods to generate test vectors are located in the two Python dictionaries opcodes and functs.
Each entry holds three lambda functions, separated by commas. They are, respectively: the function that
the operation should perform, a function to restrict the A input to a particular range, and a function to
restrict the B input to a particular range.
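The three-function pattern described above can be illustrated with a small stand-alone sketch. This is not the contents of ALUTestGen.py: the dictionary example_ops, its keys, and the helper make_case are hypothetical and only show the general shape of "operation, restrict A, restrict B".

```python
import random

MASK32 = 0xFFFFFFFF

# Hypothetical table: name -> (operation, restrict A, restrict B).
example_ops = {
    "add":  (lambda a, b: (a + b) & MASK32, lambda a: a, lambda b: b),
    # Shifts only use a 5-bit shift amount, so B is masked to 0..31.
    "sll":  (lambda a, b: (a << b) & MASK32, lambda a: a, lambda b: b & 0x1F),
    "sltu": (lambda a, b: int(a < b), lambda a: a, lambda b: b),
}

def make_case(name, rng):
    """Draw random 32-bit inputs, restrict them, and compute the expected output."""
    operation, restrict_a, restrict_b = example_ops[name]
    a = restrict_a(rng.getrandbits(32))
    b = restrict_b(rng.getrandbits(32))
    return a, b, operation(a, b)

rng = random.Random(151)  # fixed seed so the run is repeatable
for name in example_ops:
    a, b, ref = make_case(name, rng)
    print("{:5s} A={:08x} B={:08x} REFout={:08x}".format(name, a, b, ref))
```

Keeping the restriction functions next to the operation makes it easy to see which inputs a given instruction is actually exercised with.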
If you modify the Python script, run the generator to make new test vectors. This will overwrite
the file, so if you want to save your handwritten test vectors, rename the file before running the script,
then append them once the file has been generated.
% python3 ALUTestGen.py
This will write the test vectors into the file testvectors.input. Use this file as the target test vector
file when loading the test vectors with $readmemb.
always@(*) begin
case(foo)
3'b000: // something happens here
3'b001: // something else happens here
3'b010, 3'b011: // you can have more than
// one case do the same thing
default: // everything else
endcase
end
To make your job easier, we have provided two Verilog header files: Opcode.vh and ALUop.vh.
They provide, respectively, macros for the opcodes and functs in the ISA and macros for the different
ALU operations. You should feel free to change ALUop.vh to optimize the ALUop encoding, but if
you change Opcode.vh, you will break the test bench skeleton provided to you. You can use these
macros by placing a backtick in front of the macro name, e.g.:
case(opcode)
`OPC_STORE:
is the equivalent of:
case(opcode)
7'b0100011:
As an example of how to use the waveform viewer, suppose you get the following output when you run
ALUTestbench:
The $display() statement actually already tells you everything you need to know to fix your bug,
but you’ll find that this is not always the case. For example, if you have an FSM and you need to look
at multiple time steps, the waveform viewer presents the data in a much neater format. If your design
had more than one clock domain, it would also be nearly impossible to tell what was going on with only
$display() statements.
Add all the signals from ALUTestbench to the waveform viewer and you will see the following
window. The two highlighted boxes contain the tools for navigation and zoom. You can hover over the
icons to find out more about what each of them does. You can find the location (time) in the waveform
viewer where the test bench failed by searching for the value of DUTout output by the $display()
statement above (in this case, 0x490a9a92):
1. Selecting DUTout
2. Clicking Edit > Wave Signal Search > Search for Signal Value > 0x490a9a92
Now you can examine all the other signal values at this time. Compare the DUTout and REFout
values at this time, and you should see that they are similar but not quite the same. From the opcode,
funct, and add_rshift_type, you know that this is supposed to be an SRA instruction, but it
looks like your ALU performed an SRL instead. However, you wrote
That looks like it should work, but it doesn’t! It turns out you need to tell Verilog to treat the operand
being shifted as a signed number for SRA to work as you wish. You change the line to say:
After making this change, you run the tests again and cross your fingers. Hopefully, you will see the
line:
If not, you will need to debug your module until all tests from the test vector file and the hard-coded test
cases pass.
1. Show your pipeline diagram, and explain when writes and reads occur in the register file and
memory relative to the pipeline stages.
2. Show your working ALU test bench files to your TA and explain your hard-coded cases. You
should also be able to show that the tests for the test vectors generated by the Python script and
your hard-coded test vectors both pass.
3. In ALUTestbench, the inputs to the ALU were generated randomly. When would it be preferable
to perform an exhaustive test rather than a random test?
4. What bugs, if any, did your test bench help you catch?
5. For one of your bugs, come up with a short assembly program that would have failed had you not
caught the bug. In the event that you had no bugs and wrote perfect code the first time, come up
with an assembly program to stress the SRA bug mentioned in the above section.
csrw writes the value of the register specified by rs1 to the addressed CSR. csrwi writes the immediate
(stored in the rs1 field) to the addressed CSR. Note that you do not need to write to rd (writing to x0 does nothing).
31 20        19 15      14 12   11 7  6 0
csr          rs1        funct3  rd    opcode
12           5          3       5     7
source/dest  source     CSRRW   dest  SYSTEM
source/dest  zimm[4:0]  CSRRWI  dest  SYSTEM
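The CSR instruction layout above can be turned into a small encoder to check your decoder against. This Python sketch is illustrative only: the function names are hypothetical, and the CSR address 0x51E is an arbitrary example value, not one specified in this section. The funct3 values (001 for CSRRW, 101 for CSRRWI) and the SYSTEM opcode (1110011) come from the RISC-V ISA.

```python
OPC_SYSTEM = 0b1110011   # SYSTEM major opcode
FUNCT3_CSRRW = 0b001
FUNCT3_CSRRWI = 0b101

def encode_csr(csr, rs1_or_zimm, funct3, rd=0):
    """Assemble csr[31:20] | rs1 or zimm[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]."""
    return (((csr & 0xFFF) << 20) | ((rs1_or_zimm & 0x1F) << 15) |
            ((funct3 & 0x7) << 12) | ((rd & 0x1F) << 7) | OPC_SYSTEM)

def csrw(csr, rs1):
    """csrw: CSRRW with rd = x0, so nothing is written back to the register file."""
    return encode_csr(csr, rs1, FUNCT3_CSRRW)

def csrwi(csr, zimm):
    """csrwi: CSRRWI with rd = x0; the 5-bit immediate occupies the rs1 field."""
    return encode_csr(csr, zimm, FUNCT3_CSRRWI)

print(hex(csrw(0x51E, 1)))   # prints 0x51e09073
print(hex(csrwi(0x51E, 3)))  # prints 0x51e1d073
```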
4.2 Details
Your job is to implement the core of the 3-stage RISC-V CPU.
This will generate .out files in the asm output/ directory, and summarize which tests passed and
failed. You can also run a single asm test with the following command:
, where ’simple’ gets replaced with any of the available tests defined in the Makefile.
You can read the assembly code of the programs by looking at the dump file. Comments in the code
will help you understand what is happening.
cd tests/asm/
vim rv32ui-p-addi.dump
Last, you can see the hex code that is loaded directly into the memory by looking at the hex file.
cd tests/asm/
vim rv32ui-p-addi.hex
Congratulations! You’ve started the design of your datapath by implementing your pipeline
diagram, and written and thoroughly tested a key component in your processor and should now be
well-versed in testing Verilog modules. Please answer the following questions to be checked off by a TA.