RISC-V “Rocket Chip”
SoC Generator in Chisel
Yunsup Lee
UC Berkeley
yunsup@eecs.berkeley.edu
What is the Rocket Chip SoC Generator?
! Parameterized SoC generator written in Chisel
! Generates Tiles
- (Rocket) Core + Private Caches
! Generates Uncore (Outer Memory System)
- Coherence Agent
- Shared Caches
- DMA Engines
- Memory Controllers
! Glues all the pieces together
“Rocket Chip” SoC Generator
! Generates n Tiles
- (Rocket) Core
- RoCC Accelerator
- L1 I$
- L1 D$
! Generates HTIF
- Host DMA Engine
! Generates Uncore
- L1 Crossbar
- Coherence Manager
- Exports MemIO Interface
[Diagram: two Rocket Tiles (each with Core, ROCC Accel., FPU, L1 Inst and L1 Data caches parameterized by sets and ways) and HTIF, connected through the L1 Network and Coherence Manager to the TileLink / MemIO Converter]
Why SoC Generators?
! Helps tune the design under different
performance, power, area constraints, and
diverse technology nodes
! Parameters include:
- number of cores
- instantiation of floating-point units, vector units
- cache sizes, associativity, number of TLB entries,
cache-coherence protocol
- number of floating-point pipeline stages
- width of off-chip I/O, and more
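The parameter list above can be pictured as a single configuration record that the generator consumes. The following is a minimal Python sketch of that idea, not the generator's actual Chisel parameter API: the class name, field names, and default values are all hypothetical and chosen only to mirror the bullets.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SoCParams:
    # Every field name and default below is hypothetical; only the kinds of
    # knobs (core count, FPU, cache geometry, ...) come from the slide.
    n_cores: int = 1
    has_fpu: bool = True
    has_vector_unit: bool = False
    l1d_sets: int = 64
    l1d_ways: int = 4
    block_bytes: int = 64
    n_tlb_entries: int = 8
    fpu_pipeline_stages: int = 3
    offchip_io_bits: int = 16

    def l1d_bytes(self) -> int:
        # Cache capacity follows from sets * ways * block size.
        return self.l1d_sets * self.l1d_ways * self.block_bytes


def describe(p: SoCParams) -> List[str]:
    """List the blocks a generator would instantiate for this configuration."""
    blocks = []
    for i in range(p.n_cores):
        units = ["core"]
        if p.has_fpu:
            units.append("fpu")
        if p.has_vector_unit:
            units.append("vector")
        blocks.append("tile%d:%s" % (i, "+".join(units)))
    blocks.append("uncore")
    return blocks


small = SoCParams()
# Derive a second design point by overriding a few parameters.
big = replace(small, n_cores=2, has_vector_unit=True, l1d_sets=128)
```

Deriving `big` from `small` with a handful of overrides is the point of a generator: each design point is a small delta on a shared description rather than a separate RTL code base.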
Why Chisel?
! RTL generator written in Chisel
- HDL embedded in Scala
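! Full power of Scala for writing generators
- object-oriented programming
- functional programming
[Diagram: a Chisel Program running on Scala/JVM emits C++ code, FPGA Verilog, and ASIC Verilog; the C++ code goes through a C++ Compiler to a Software Simulator, the FPGA Verilog through FPGA Tools to FPGA Emulation, and the ASIC Verilog through ASIC Tools to a GDS Layout]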
Rocket Scalar Core
[Diagram: Rocket Pipeline with PC Gen., IF, ID, EX, MEM, WB stages (ITLB, I$ Access, Inst. Decode, Int.RF, Int.EX, DTLB, D$ Access, Commit; FP.RF, FP.EX1, FP.EX2, FP.EX3; To RoCC Accelerator, to Hwacha; bypass paths omitted for simplicity) and Hwacha Pipeline fed from Rocket (PC Gen., VITLB, VI$ Access, VInst. Decode, Sequencer, Expand, Bank1 ... Bank8 R/W)]
- 64-bit 5-stage single-issue in-order pipeline
- Design minimizes impact of long clock-to-output delays of compiler-generated RAMs
- 64-entry BTB, 256-entry BHT, 2-entry RAS
- MMU supports page-based virtual memory
- IEEE 754-2008-compliant FPU
- Supports SP, DP FMA with hw support for subnormals
ARM Cortex-A5 vs. RISC-V Rocket
Category             | ARM Cortex-A5         | RISC-V Rocket
ISA                  | 32-bit ARM v7         | 64-bit RISC-V v2
Architecture         | Single-Issue In-Order | Single-Issue In-Order 5-stage
Performance          | 1.57 DMIPS/MHz        | 1.72 DMIPS/MHz
Process              | TSMC 40GPLUS          | TSMC 40GPLUS
Area w/o Caches      | 0.27 mm^2             | 0.14 mm^2
Area with 16K Caches | 0.53 mm^2             | 0.39 mm^2
Area Efficiency      | 2.96 DMIPS/MHz/mm^2   | 4.41 DMIPS/MHz/mm^2
Frequency            | >1 GHz                | >1 GHz
Dynamic Power        | <0.08 mW/MHz          | 0.034 mW/MHz
- PPA (power, performance, area) reporting conditions: 85% utilization, Dhrystone as the benchmark, frequency/power at the TT corner, 0.9 V, 25 C, all regular-VT transistors
- Rocket is 10% higher in DMIPS/MHz and 49% more area-efficient
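The derived figures in the comparison can be reproduced from the raw rows. The short Python check below uses only numbers from the table. The efficiency row matches DMIPS/MHz divided by the area with 16K caches, which is an inference from the arithmetic, since the slide does not say which area it uses.

```python
# Figures copied from the comparison table.
cortex_a5 = {"dmips_per_mhz": 1.57, "area_with_caches_mm2": 0.53}
rocket = {"dmips_per_mhz": 1.72, "area_with_caches_mm2": 0.39}


def area_efficiency(core):
    """DMIPS/MHz per mm^2, using the area that includes the 16K caches."""
    return core["dmips_per_mhz"] / core["area_with_caches_mm2"]


a5_eff = area_efficiency(cortex_a5)
rocket_eff = area_efficiency(rocket)

# Relative advantages of Rocket, expressed as fractions.
perf_gain = rocket["dmips_per_mhz"] / cortex_a5["dmips_per_mhz"] - 1
eff_gain = rocket_eff / a5_eff - 1

print("A5 efficiency:     %.2f DMIPS/MHz/mm^2" % a5_eff)
print("Rocket efficiency: %.2f DMIPS/MHz/mm^2" % rocket_eff)
print("DMIPS/MHz gain:    %.0f%%" % (perf_gain * 100))
print("Efficiency gain:   %.0f%%" % (eff_gain * 100))
```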
HTIF: Host-Target Interface
! UC Berkeley specific block mainly used to
emulate devices for simple test chips
- Emulates system calls, console, block devices,
frame buffer, network devices
- No need for this block once the SoC has actual
devices on the target machine
! Consider it as a “host DMA engine”
! A port for the host system to read/write
- Core CSRs (control and status registers)
- Target Memory
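As a rough illustration of the "host DMA engine" view, the Python sketch below models a host-side port that can read and write per-core CSRs and target memory. It is a toy behavioral model under stated assumptions: the class names, method names, and the CSR number used in the example are hypothetical and do not reflect the real HTIF protocol or its encoding.

```python
class ToyTarget:
    """Hypothetical target: per-core CSR files plus a flat byte-addressed memory."""

    def __init__(self, n_cores, mem_bytes):
        self.csrs = [dict() for _ in range(n_cores)]
        self.mem = bytearray(mem_bytes)


class ToyHostPort:
    """Hypothetical host-side view of the port described on the slide."""

    def __init__(self, target):
        self.target = target

    def write_csr(self, core_id, csr, value):
        self.target.csrs[core_id][csr] = value

    def read_csr(self, core_id, csr):
        # Unwritten CSRs read as zero in this toy model.
        return self.target.csrs[core_id].get(csr, 0)

    def write_mem(self, addr, data):
        self.target.mem[addr:addr + len(data)] = data

    def read_mem(self, addr, n_bytes):
        return bytes(self.target.mem[addr:addr + n_bytes])


target = ToyTarget(n_cores=2, mem_bytes=1024)
host = ToyHostPort(target)

# The host loads a few bytes into target memory, then pokes a CSR on core 1.
host.write_mem(0x100, b"\x13\x00\x00\x00")
host.write_csr(core_id=1, csr=0x7C0, value=1)  # CSR number is made up
```

This is how a host could load a program and then release a core without the target having any real devices of its own.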
Important Interfaces in the Rocket Chip
! ROCCIO
- Interface between Rocket/Accelerator
! HTIFIO
- Read/Write CSRs
! TileLinkIO
- Coherence Fabric
! MemIO
- Simple AXI-like memory interface
! HostIO
- Host Interface to HTIF
[Diagram: two Tiles (Rocket Core, ROCC Accel., FPU, L1 Inst and L1 Data caches) and HTIF act as TileLink clients of the L1 Network and Coherence Manager; ROCCIO sits between each core and its accelerator, HTIFIO between HTIF and the tiles, HostIO on the host side of HTIF, TileLinkIO between clients and managers, and MemIO below the TileLink / MemIO Converter]
TileLinkIO
[Diagram: two Clients, each with a Cache, connected to a Manager through Acquire, Probe, Release, Grant, and Finish channels]
- TileLinkIO consists of Acquire, Probe, Release, Grant, Finish
UncachedTileLinkIO
[Diagram: Clients (one with a Cache) and a Manager, with channel labels Acquire, Grant, Finish, Release, Probe]
- UncachedTileLinkIO consists of Acquire, Grant, Finish
- Converters between TileLinkIO and UncachedTileLinkIO are provided in the uncore library
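The relationship between the two interfaces can be written down as data. The Python sketch below lists the five TileLinkIO channels and derives UncachedTileLinkIO as the three-channel subset named above. The channel directions are an assumption based on the usual reading of the TileLink protocol of this era; the slides name the channels but do not state directions.

```python
from enum import Enum


class Direction(Enum):
    CLIENT_TO_MANAGER = "client -> manager"
    MANAGER_TO_CLIENT = "manager -> client"


# Channel names come from the slides; directions are an assumption.
TILELINK_IO = {
    "Acquire": Direction.CLIENT_TO_MANAGER,
    "Probe": Direction.MANAGER_TO_CLIENT,
    "Release": Direction.CLIENT_TO_MANAGER,
    "Grant": Direction.MANAGER_TO_CLIENT,
    "Finish": Direction.CLIENT_TO_MANAGER,
}

# The uncached variant keeps only the channels a non-caching client needs.
UNCACHED_TILELINK_IO = {
    name: TILELINK_IO[name] for name in ("Acquire", "Grant", "Finish")
}


def coherence_only_channels():
    """Channels present in TileLinkIO but absent from UncachedTileLinkIO."""
    return sorted(set(TILELINK_IO) - set(UNCACHED_TILELINK_IO))


def uncached_transaction():
    """Channel sequence for one uncached transaction in this sketch."""
    return [(name, UNCACHED_TILELINK_IO[name].value)
            for name in ("Acquire", "Grant", "Finish")]
```

Under these assumptions, `coherence_only_channels()` returns Probe and Release, the two channels that only matter when the client holds cached copies that the manager may need to recall.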
MemIO
[Diagram: Master and Slave connected by MemReqCmd (MemReqCmd.valid, MemReqCmd.ready), Decoupled(MemData), and Decoupled(MemResp)]
- MemReqCmd consists of addr, rw (write=true), tag
- MemData consists of a 128-bit data payload
- MemResp consists of a 128-bit data payload and a tag
- Decoupled(interface) means an interface with ready/valid signals
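The ready/valid rule behind Decoupled can be shown with a small cycle-by-cycle model: a transfer happens only in a cycle where the sender asserts valid and the receiver asserts ready. The Python sketch below is a behavioral illustration, not Chisel. The record fields follow the bullets above, while the function name and the simulation loop are hypothetical.

```python
from collections import deque
from dataclasses import dataclass

DATA_BITS = 128  # payload width stated on the slide


@dataclass(frozen=True)
class MemReqCmd:
    addr: int
    rw: bool  # True means write, as on the slide
    tag: int


@dataclass(frozen=True)
class MemData:
    data: int

    def __post_init__(self):
        # Reject payloads that would not fit in the 128-bit field.
        if not 0 <= self.data < (1 << DATA_BITS):
            raise ValueError("payload does not fit in 128 bits")


def simulate_decoupled(items, ready_per_cycle):
    """Return (cycle, item) for each cycle in which valid and ready are both high.

    The sender keeps valid high while it has items pending; the receiver's
    ready signal for each cycle is given by ready_per_cycle.
    """
    pending = deque(items)
    transfers = []
    for cycle, ready in enumerate(ready_per_cycle):
        valid = len(pending) > 0
        if valid and ready:
            transfers.append((cycle, pending.popleft()))
    return transfers


reqs = [
    MemReqCmd(addr=0x00, rw=False, tag=0),  # read
    MemReqCmd(addr=0x10, rw=True, tag=1),   # write
]
# The receiver is ready only on cycles 1 and 3, so the requests stall in between.
fired = simulate_decoupled(reqs, [False, True, False, True])
```

Neither side needs to know the other's timing in advance, which is what lets the same MemIO port talk to a simulator, an FPGA bridge, or a real memory controller.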
ROCCIO
! Rocket sends coprocessor instruction via the Cmd interface
! Accelerator responds through Resp interface
! Accelerator sends memory requests to L1D$ via CacheIO
! busy bit for fences
! IRQ, S (supervisor bit), exception bit used for virtualization
! UncachedTileLinkIO for instruction cache on accelerator
! PTWIO for page-table walker ports on accelerator
[Diagram: Rocket and ROCC Accel. connected by Decoupled(Cmd), Decoupled(Resp), CacheIO, busy, IRQ, supervisor bit, exception, UncachedTileLinkIO, and PTWIO]
HTIFIO
[Diagram: HTIF and Tile connected by reset, core_id, Decoupled(CSRReq), Decoupled(CSRResp), Decoupled(IPIReq), and Decoupled(IPIResp)]
- reset signal and core_id are routed from HTIF (for historical reasons, not technical ones)
- CSR Read/Write requests go through CSRReq/CSRResp
- IPI Requests go through IPIReq/IPIResp
- HTIFIO likely to be modified in the near future
Rocket Chip C++ Emulator Setup
[Diagram: Rocket Chip inside the Verilog Simulator; HostIO connects to the RISC-V Frontend Server (pthread) and MemIO connects to DRAMSim2 (pthread)]
Rocket Chip FPGA Setup
[Diagram: Rocket Chip inside the ZYNQ FPGA; HostIO goes through a HostIO/AXI Converter to the AXI Master port and MemIO goes through a MemIO/AXIHP Converter to the AXI HP Slave port of the ARM Processing System, which has DDR3 DRAM and the RISC-V Frontend Server]
Rocket Chip Berkeley Test Chip Setup
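[Diagram: on the Test Chip, Rocket Chip's MemIO passes through a MemIO Serializer; MemSerialIO and HostIO connect the Test Chip to the ZYNQ FPGA, where a MemIO Deserializer restores MemIO; a HostIO/AXI Converter and a MemIO/AXIHP Converter attach to the AXI Master and AXI HP Slave ports of the ARM Processing System, which has DDR3 DRAM and the RISC-V Frontend Server]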
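The serializer/deserializer pair exists because a test chip has far fewer pins than a 128-bit MemIO payload needs. The Python sketch below shows the general idea of splitting a wide word into narrow beats and reassembling it. The 16-bit link width and the least-significant-beat-first order are assumptions made for illustration; the slides do not give the actual serial link width or ordering.

```python
def serialize(payload, payload_bits=128, link_bits=16):
    """Split a wide payload into link_bits-wide beats, least significant first.

    link_bits=16 and the beat order are illustrative assumptions.
    """
    if payload_bits % link_bits != 0:
        raise ValueError("payload width must be a multiple of the link width")
    if not 0 <= payload < (1 << payload_bits):
        raise ValueError("payload does not fit in payload_bits")
    mask = (1 << link_bits) - 1
    n_beats = payload_bits // link_bits
    return [(payload >> (i * link_bits)) & mask for i in range(n_beats)]


def deserialize(beats, link_bits=16):
    """Reassemble beats produced by serialize() into the original payload."""
    value = 0
    for i, beat in enumerate(beats):
        value |= beat << (i * link_bits)
    return value


word = 0x0123456789ABCDEF_FEDCBA9876543210
beats = serialize(word)
restored = deserialize(beats)
```

With these assumed widths one 128-bit payload takes eight beats on the link, which is the bandwidth cost paid for the reduced pin count.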
Rocket Chip “SoC” Setup
[Diagram: Rocket Chip receives Interrupts from Devices, connects to Devices over TileLinkIO and UncachedTileLinkIO, and connects over MemIO to an LPDDR3 Memory Controller that drives LPDDR3 DRAM]
Who should use the Rocket Chip Generator?
People who would like to develop …
! A RISC-V SoC
- Look into Chisel parameters
! New Accelerators
- Drop in at ROCCIO level
! Own RISC-V Core
- Drop in at TileLinkIO level or MemIO level
! Own Device
- Drop in at TileLinkIO or UncachedTileLinkIO
[Diagram: Rocket Chip block diagram with the ROCCIO, HTIFIO, HostIO, TileLinkIO, and MemIO interfaces marked as drop-in points]
New Features: L2$ with Directory Bits
! Shared L2$ with multiple banks
! Each L2$ will act as a coherence manager with directory bits (snoop filter)
! These caches can be composed to build outer-level caches such as an L3$
[Diagram: Tiles and HTIF as TileLink clients of the L1 Network; several L2Cache banks (sets, ways) act as managers toward the L1 side and as clients toward the TileLink / MemIO Converter]
New Features: ROCC interfaces with L2$
! ROCC talks directly to the L2$ to address more data
[Diagram: each ROCC Accel. attaches to the L1 Network as its own TileLink client alongside the L1 Inst and L1 Data caches and HTIF; L2Cache banks sit between the L1 Network and the TileLink / MemIO Converter]
New Features on the Deck
! Dual-issue Rocket Core
! Hwacha Vector Unit (check out hwacha.org)
! Dump MemIO and use AXI