ADA Project Report

This document outlines the design and implementation of an AES hardware accelerator integrated into a RISC-V system using the Chipyard framework. The project aims to enhance AES encryption and decryption performance through a Memory-Mapped I/O interface, demonstrating the advantages of hardware acceleration over software-based solutions. The document details system specifications, design choices, testing methodologies, and potential future applications in secure computing environments.


Advanced Digital Architecture

AES Accelerator with Chipyard MMIO

Group Members
Debby Miressa MIJENA, Kassem KHALIL

Professor: Dragomir MILOJEVIC, Jan Tobias Mühlberg


Teaching assistant: Navid Ladner
June, 2025
Table of Contents

Introduction
Objective
System Specification
System Design
    Accelerator Design in Chisel
    System Design in Chipyard
System Implementation
Testing and Simulation
    Chisel Test (Unit Test)
    Chipyard Test (Integration Test)
Discussion
Conclusion
Appendix
References

Introduction

The Advanced Encryption Standard (AES) stands as a cornerstone of modern cryptography,
offering robust, symmetric-key encryption widely adopted to protect sensitive digital
information. Defined by the National Institute of Standards and Technology (NIST) as a
replacement for the aging DES algorithm, AES provides strong security guarantees and supports
key sizes of 128, 192, or 256 bits, operating on fixed 128-bit data blocks.

Although software-based AES is adequate for many applications, others, such as embedded
systems, secure communication links, device-to-device communication, and wireless networking,
require higher throughput and lower latency than software can deliver. To overcome this
drawback, this project adopts hardware acceleration, which makes the computation faster and
more efficient and lifts the computing burden from the main processor.

In this project, AES is implemented in Electronic Codebook (ECB) mode due to its simplicity
and suitability for hardware testing, despite its known limitation that identical plaintext
blocks produce identical ciphertext blocks, which exposes data patterns. ECB mode
facilitates easier validation with reference test vectors, such as those provided by the
Cryptographic Algorithm Validation Program (CAVP).

We implement and evaluate an AES accelerator integrated via a Memory-Mapped I/O (MMIO)
interface within a Chipyard-generated RISC-V SoC. The use of MMIO provides a lightweight
and non-intrusive method for connecting the custom AES module to the processor, avoiding the
need to modify the instruction set architecture. The RocketCore, a simple and extensible in-order
RISC-V processor, serves as the software-accessible frontend for driving the accelerator.

We use Chipyard, a hardware design framework based on Chisel and RocketChip, for its
flexibility in composing, customizing, and simulating SoC architectures. This integration allows
us to prototype, simulate, and evaluate the performance of AES encryption in a controlled RISC-
V environment, demonstrating the potential of MMIO-based hardware acceleration in secure and
efficient system designs.

While FPGAs are widely used for prototyping due to their reconfigurability and rapid
development cycle, Application-Specific Integrated Circuits (ASICs) offer distinct advantages in
deployment scenarios. ASIC implementations provide significantly higher performance and
energy efficiency, making them ideal for mass-produced embedded devices where size, power,
and speed are critical. Thus, the hardware architecture designed in this project could serve as a
blueprint for future ASIC implementations of AES accelerators in secure computing
environments.

Objective

The goal of this project is to design and evaluate a custom AES hardware accelerator and
integrate it into a RISC-V system using the Chipyard framework. This includes implementing the
accelerator in Chisel, connecting it to a RocketCore processor through a Memory-Mapped I/O
(MMIO) interface, and evaluating the functional correctness and performance improvements
over software-based AES.

The design supports both AES encryption and decryption using a 128-bit key. The
implementation is validated using standardized test vectors from the Cryptographic Algorithm
Validation Program (CAVP) to ensure functional correctness and compliance. The system will
be validated using testbenches and waveform analysis.

Additionally, the project will analyze the security implications of hardware sharing and evaluate
the accelerator’s performance impact. As an optional enhancement, the design will be prototyped
on an FPGA using Vivado to compare RTL simulation results with actual hardware deployment
characteristics (if time permits).

System Specification

This section outlines the structural and functional components of the AES accelerator system, the
interfaces involved, and the environment in which it is developed and tested.

1. AES Accelerator

• The core of the system is a custom-designed AES encryption/decryption module
  implemented in Chisel, targeting the AES-128 standard.
• The accelerator is designed to process 128-bit plaintext blocks using a 128-bit key,
  producing 128-bit ciphertext as output.
• We use ECB mode, where each 128-bit block is processed independently. This mode
  simplifies hardware validation and is aligned with CAVP test vector structure.
• Internal modules include:
    o Key expansion logic
    o SubBytes, ShiftRows, MixColumns transformations
    o AddRoundKey stages
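
The block independence of ECB mode can be illustrated with a minimal C sketch. The names ecb_process, block_fn, and toy_block are hypothetical and are not part of the project's code, and toy_block is a simple XOR stand-in rather than AES. The sketch only shows that every 16-byte block is transformed on its own, so identical input blocks give identical output blocks.

```c
#include <stddef.h>
#include <stdint.h>

#define AES_BLOCK_BYTES 16

/* Any 128-bit block transform: reads 16 bytes, writes 16 bytes. */
typedef void (*block_fn)(const uint8_t in[AES_BLOCK_BYTES],
                         uint8_t out[AES_BLOCK_BYTES],
                         const void *ctx);

/* ECB: apply the block transform to each block independently.
 * No state is carried from one block to the next. */
static void ecb_process(block_fn f, const void *ctx,
                        const uint8_t *in, uint8_t *out, size_t nblocks)
{
    for (size_t b = 0; b < nblocks; b++) {
        f(in + b * AES_BLOCK_BYTES, out + b * AES_BLOCK_BYTES, ctx);
    }
}

/* Toy stand-in for the cipher (NOT AES, for illustration only):
 * XOR every byte of the block with the matching key byte. */
static void toy_block(const uint8_t in[AES_BLOCK_BYTES],
                      uint8_t out[AES_BLOCK_BYTES],
                      const void *ctx)
{
    const uint8_t *key = (const uint8_t *)ctx;
    for (int i = 0; i < AES_BLOCK_BYTES; i++) {
        out[i] = (uint8_t)(in[i] ^ key[i]);
    }
}
```

In a real driver the block_fn would wrap one accelerator invocation per block; the loop structure stays the same.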

2. MMIO Interface

• The AES accelerator is integrated with the RocketCore processor using a Memory-
  Mapped I/O (MMIO) interface.
• The interface consists of:
    o Control inputs (e.g., start and operation select (encrypt or decrypt)) and
      status outputs (e.g., input_ready and output_valid (done) flags)
    o Input data registers for plaintext and key
    o Output data registers for ciphertext
• Communication follows a polling model, without interrupts, for simplicity.

3. SoC Platform

• The system is built using Chipyard, a customizable SoC generation framework.
• The processor used is RocketCore, a 5-stage in-order RISC-V CPU core.
• The memory system includes L1 caches and on-chip memory to simulate real-world
  execution flow.
• All modules (processor, MMIO bus, accelerator) communicate via TileLink, Chipyard’s
  default interconnect protocol.

4. Test Infrastructure

• Simulation is performed using Verilator and ChiselTest.
• Functional verification includes:
    o Standalone unit tests of the AES module
    o Full system tests verifying end-to-end software-hardware interaction
• Waveform analysis is conducted using GTKWave for signal tracing and debugging.

5. Software Driver

• A simple C driver is used to interact with the MMIO-mapped AES registers.
• The CPU handles all message preparation. For encryption, it writes the plaintext and
  key into the MMIO registers and reads back the resulting ciphertext. For decryption,
  it writes the ciphertext and key and reads back the plaintext.
• The driver performs:
    o Data formatting and loading
    o Start/Done signal handling
    o Polling for results
• The software runs on the RocketCore. It is compiled with RISC-V GCC and executed in
  Verilator simulation.
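
The "data formatting and loading" step can be sketched in C as follows. This is an illustration only: the helper names pack_block and unpack_block are hypothetical, and the byte order (most significant byte of each 32-bit word first, first word first) is an assumption, since the report does not state how the 128-bit values are split across the four data registers.

```c
#include <stdint.h>

/* Pack a 16-byte block into four 32-bit register words.
 * ASSUMPTION: big-endian packing, word 0 holds bytes 0..3. */
static void pack_block(const uint8_t bytes[16], uint32_t words[4])
{
    for (int i = 0; i < 4; i++) {
        words[i] = ((uint32_t)bytes[4 * i]     << 24) |
                   ((uint32_t)bytes[4 * i + 1] << 16) |
                   ((uint32_t)bytes[4 * i + 2] <<  8) |
                    (uint32_t)bytes[4 * i + 3];
    }
}

/* Inverse of pack_block: split four words back into 16 bytes. */
static void unpack_block(const uint32_t words[4], uint8_t bytes[16])
{
    for (int i = 0; i < 4; i++) {
        bytes[4 * i]     = (uint8_t)(words[i] >> 24);
        bytes[4 * i + 1] = (uint8_t)(words[i] >> 16);
        bytes[4 * i + 2] = (uint8_t)(words[i] >>  8);
        bytes[4 * i + 3] = (uint8_t)(words[i]);
    }
}
```

Under this assumed packing, the key 2B7E151628AED2A6ABF7158809CF4F3C from the Appendix becomes the words 0x2B7E1516, 0x28AED2A6, 0xABF71588, 0x09CF4F3C.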

System Design

This section describes the internal architecture of the AES accelerator, the MMIO interface to the
RocketCore, and the overall system composition within Chipyard. The design supports both
encryption and decryption in Electronic Codebook (ECB) mode using a fixed 128-bit key size.

The system design is separated into two main sections:

1) Accelerator Design in Chisel


2) System Design in Chipyard

Accelerator Design in Chisel

The AES accelerator is implemented in Chisel, and adheres to the AES-128 specification. It is
structured as a modular design to support both encryption and decryption of 128-bit messages
using 128-bit symmetric keys. The design operates in Electronic Codebook (ECB) mode, where
each 128-bit block is encrypted or decrypted independently. Although ECB lacks semantic
security for structured data, it simplifies hardware control and testing.

Figure 1: AES Encryption/Decryption Architecture

To describe how control passes from one module to the next, we model the modules as states
and use a Moore state machine to describe the transitions. The figure below captures the
architectural flow shown in Figure 1 more compactly.

(a) (b)

Figure 2: Moore State Machine for AES Algorithm (a) Encryption (b) Decryption

We use the round counter as the transition condition, since each critical transition happens
at a specific round. For instance, the transition to the EXT (Exit) state requires the round to
be 11, i.e. the final round. Mix Columns is also skipped in this round, as shown in Figure 2.
The round counter spans 1 to 11, matching the 11 round keys of AES-128.

Description of the State Symbols:


BGN – Begin
ARK – Add Round Key
SB  – Substitute Bytes      ISB – Inverse Substitute Bytes
SR  – Shift Rows            ISR – Inverse Shift Rows
MC  – Mix Columns           IMC – Inverse Mix Columns
EXT – Exit
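
The encryption transitions can be sketched as a small software model in C. This is a hedged illustration and not the Chisel implementation: the exact arcs are those of Figure 2, which is not reproduced here, so the sketch assumes that round 1 consists of the initial Add Round Key only and that rounds 2 to 11 each run SB, SR, MC (skipped in round 11), and ARK. The names aes_state_t, next_state, and LAST_ROUND are hypothetical.

```c
#include <stdint.h>

/* State symbols as listed in the report (encryption path only). */
typedef enum { BGN, ARK, SB, SR, MC, EXT } aes_state_t;

#define LAST_ROUND 11  /* round counter spans 1..11 */

/* Compute the next state. The round counter is advanced when a
 * new round starts, i.e. when leaving ARK for SB.
 * ASSUMPTION: round 1 is the initial Add Round Key. */
static aes_state_t next_state(aes_state_t s, int *round)
{
    switch (s) {
    case BGN:
        *round = 1;
        return ARK;
    case ARK:
        if (*round == LAST_ROUND) {
            return EXT;              /* final round key applied */
        }
        (*round)++;
        return SB;
    case SB:
        return SR;
    case SR:
        /* Mix Columns is skipped in the final round. */
        return (*round == LAST_ROUND) ? ARK : MC;
    case MC:
        return ARK;
    default:
        return EXT;
    }
}
```

Walking this model from BGN to EXT visits ARK 11 times, SB 10 times, and MC 9 times, which is the operation count of AES-128 encryption. The model describes the ordering of states only and makes no claim about how many clock cycles each state takes in hardware.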

Each module is designed to be reusable, for better programming efficiency and to remove
redundancy. In addition, we replicated modules such as Substitute Bytes multiple times in
hardware to maximize the speed of the accelerator by performing independent tasks
in parallel. This allows each stage/step to be computed in one clock cycle.

AesCore Module

The AesCore module acts as the top-level wrapper, instantiating both the encryption and
decryption pipelines. A control signal (e.g., decrypt) determines which of the two is active during
a given operation. This enables runtime selection between AES encryption and AES decryption
using a unified interface.

AesEncryptionCore

Responsible for 128-bit AES encryption, this core contains:

• KeyExpansion: Implements the Rijndael key schedule to generate round keys from the
  original 128-bit key. It supports generating all 11 round keys needed for AES-128.
• AddRoundKey: Performs XOR between the state and round key. It's applied before the
  first round and at the end of each subsequent round.
• AddRound Module:
    o ShiftRows: Performs cyclic left shifts on the state’s rows.
    o MixColumns: Applies a matrix multiplication in GF(2^8) to each column of the
      state.
    o SubBytes: Replaces each byte using the AES S-Box.
        ▪ SBoxes: Contains 16 parallel S-Box lookup tables, one for each byte of
          the state.
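
The GF(2^8) arithmetic behind MixColumns can be illustrated with a short C sketch. It is a software model for intuition, not the project's Chisel code, and the function names xtime and mix_column are hypothetical. It uses the standard AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x1B after dropping the x^8 term) and the standard MixColumns matrix with rows (2 3 1 1), (1 2 3 1), (1 1 2 3), (3 1 1 2).

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8): shift left, then reduce with the
 * AES polynomial if the top bit was set. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Multiply by 3 in GF(2^8): 3*a = 2*a XOR a. */
static uint8_t mul3(uint8_t a)
{
    return (uint8_t)(xtime(a) ^ a);
}

/* Apply the MixColumns matrix to one 4-byte column in place. */
static void mix_column(uint8_t c[4])
{
    const uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];

    c[0] = (uint8_t)(xtime(a0) ^ mul3(a1) ^ a2 ^ a3);
    c[1] = (uint8_t)(a0 ^ xtime(a1) ^ mul3(a2) ^ a3);
    c[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ mul3(a3));
    c[3] = (uint8_t)(mul3(a0) ^ a1 ^ a2 ^ xtime(a3));
}
```

For example, the column db 13 53 45 is mapped to 8e 4d a1 bc. Because addition in GF(2^8) is XOR, the whole step reduces to shifts and XORs, which is why it maps well to combinational hardware.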

AesDecryptionCore

Handles AES decryption, and mirrors the encryption path with inverse operations:

• KeyExpansion: Reused to generate round keys. For decryption, keys are applied in
  reverse order.
• InvAddRoundKey: XORs the state with the correct round key, but applied in the inverse
  round order.
• InvAddRound Module:
    o InvShiftRows: Performs cyclic right shifts on rows to reverse ShiftRows.
    o InvMixColumns: Applies the inverse matrix multiplication in GF(2^8) to reverse
      MixColumns.
    o InvSubBytes: Uses the inverse S-Box to reverse the substitution layer.
        ▪ InvSBoxes: Contains inverse lookup tables corresponding to the S-Box
          used in encryption.
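
The relationship between ShiftRows and InvShiftRows can be sketched in C. This is a software illustration with hypothetical function names, assuming the usual AES column-major state layout in which byte index i holds row i % 4 and column i / 4. Row r is rotated left by r positions for encryption and right by r positions for decryption.

```c
#include <stdint.h>
#include <string.h>

/* ShiftRows: row r is cyclically shifted left by r columns.
 * State layout (assumed): s[r + 4*c] is row r, column c. */
static void shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            t[r + 4 * c] = s[r + 4 * ((c + r) % 4)];
        }
    }
    memcpy(s, t, 16);
}

/* InvShiftRows: row r is cyclically shifted right by r columns,
 * which undoes shift_rows. */
static void inv_shift_rows(uint8_t s[16])
{
    uint8_t t[16];
    for (int c = 0; c < 4; c++) {
        for (int r = 0; r < 4; r++) {
            t[r + 4 * ((c + r) % 4)] = s[r + 4 * c];
        }
    }
    memcpy(s, t, 16);
}
```

Applying inv_shift_rows after shift_rows returns the original state, which is the property a unit test of the decryption component can check. In hardware both operations are pure rewiring of bytes and need no logic.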

Figure 3 illustrates the internal module hierarchy and class structure of the AES hardware
accelerator. The design follows a clean separation between encryption and decryption logic, each
encapsulated in its own core submodule for modularity, maintainability, and testing flexibility.

Aes
    AesCore
        AesEncryptionCore
            KeyExpansion
            AddRoundKey
            AddRound
                ShiftRows
                MixColumns
                SubBytes
                    SBoxes
        AesDecryptionCore
            KeyExpansion
            InvAddRoundKey
            InvAddRound
                InvShiftRows
                InvMixColumns
                InvSubBytes
                    InvSBoxes

Figure 3: Internal Class Structure/Hierarchy of AES Encryption and Decryption Modules

System Design in Chipyard

The following is the regmap used for the AES-MMIO interfacing using TileLink to the CPU.

Register      Address Offset   Width, Direction            Description
Control       0x00             1-bit, input from CPU       Start and Operation Select (0 for Encrypt / 1 for Decrypt)
X0 – X3       0x04 – 0x10      128 bits, input from CPU    Data In (plaintext for encryption, ciphertext for decryption)
Y0 – Y3       0x14 – 0x20      128 bits, input from CPU    Key
aes0 – aes3   0x24 – 0x30      128 bits, output to CPU     Data Out (plaintext for decryption, ciphertext for encryption)
Flag          0x34             1-bit, output to CPU        input_ready (ready to accept input) and output_valid (done)

Figure 4 illustrates the high-level architecture of the AES accelerator system integrated with
RocketCore via a memory-mapped interface. The RocketCore communicates with the AES
module through the system bus using standard MMIO operations. Input data (plaintext or
ciphertext) and the encryption key are written to dedicated registers. The control register triggers
the operation mode (encryption or decryption), and the result is read from the output registers. A
flag register is used to indicate completion and input readiness, enabling smooth synchronization
between the CPU and accelerator.
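
The write, trigger, poll, and read sequence described above can be sketched as a C driver routine. This is a minimal sketch under stated assumptions, not the project's aes-hardware.c. The register offsets come from the register map, but the bit positions inside the Control and Flag registers (AES_CTRL_START, AES_CTRL_DECRYPT, AES_FLAG_DONE) and the function name aes_mmio_run are hypothetical. The base address is passed in by the caller because the report does not state where the peripheral is mapped.

```c
#include <stdint.h>

/* Word indices derived from the byte offsets in the register map. */
enum {
    AES_CTRL = 0x00 / 4,  /* Control: start and operation select   */
    AES_X0   = 0x04 / 4,  /* Data In,  X0..X3                      */
    AES_Y0   = 0x14 / 4,  /* Key,      Y0..Y3                      */
    AES_OUT0 = 0x24 / 4,  /* Data Out, aes0..aes3                  */
    AES_FLAG = 0x34 / 4   /* Flag: input_ready / output_valid      */
};

/* ASSUMED bit layout, not taken from the report. */
#define AES_CTRL_START    (1u << 0)
#define AES_CTRL_DECRYPT  (1u << 1)
#define AES_FLAG_DONE     (1u << 0)

/* Run one 128-bit operation through the accelerator by polling.
 * base points at the first MMIO register of the peripheral. */
static void aes_mmio_run(volatile uint32_t *base,
                         const uint32_t in[4], const uint32_t key[4],
                         int decrypt, uint32_t out[4])
{
    /* 1. Load the data block and the key. */
    for (int i = 0; i < 4; i++) {
        base[AES_X0 + i] = in[i];
        base[AES_Y0 + i] = key[i];
    }

    /* 2. Select the operation and start it. */
    base[AES_CTRL] = AES_CTRL_START | (decrypt ? AES_CTRL_DECRYPT : 0u);

    /* 3. Poll the flag register until the result is valid. */
    while ((base[AES_FLAG] & AES_FLAG_DONE) == 0u) {
        /* busy-wait, no interrupts are used */
    }

    /* 4. Read back the result. */
    for (int i = 0; i < 4; i++) {
        out[i] = base[AES_OUT0 + i];
    }
}
```

Taking the base pointer as a parameter also lets the routine be exercised on a host machine against an ordinary array that stands in for the register file, before running it on the simulated SoC.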

[Block diagram: RocketCore, Memory, System Bus, Periphery Bus, and the AES module with
signals data_in, key, decrypt, start, data_out, and done]

Figure 4: Top-level SoC representation of AES Accelerator Modules

Trade-Offs in Design Choices
The AES accelerator is connected to the RocketCore processor using a Memory-Mapped I/O
(MMIO) interface. This integration enables the software to write input data, set control signals,
and read back results using simple load/store instructions without modifying the processor's
instruction set.

We chose RocketCore over BOOM because Rocket is an in-order, single-issue RISC-V core,
offering a simpler and more deterministic execution model ideal for testing and verifying
accelerator behavior. BOOM, being out-of-order, introduces more architectural complexity and
non-determinism that is unnecessary for this project’s focus on functional and performance
validation of the accelerator.

Similarly, we selected MMIO over RoCC (Rocket Custom Coprocessor interface) to reduce
integration complexity. While RoCC allows tighter coupling with the core through custom
instructions, software must then issue those instructions through inline assembly or
dedicated compiler support. MMIO offers a simpler software interface, allowing us to control
the accelerator with standard memory-mapped register accesses.

Finally, the accelerator is exposed to the system via the TileLink (TL) protocol, Chipyard's
default interconnect for peripheral communication. TileLink was chosen over AXI4 due to its
tight integration with the RocketChip ecosystem, built-in support in the TLRegisterRouter, and
native compatibility with Chisel/Chipyard tooling. This streamlines address decoding, register
mapping, and simulation workflows while maintaining performance and extensibility.

The AES module thus appears as a TileLink-compliant memory-mapped slave on the periphery
bus, allowing the RocketCore to access its registers directly. The TLRegisterRouter module
automatically handles address decoding, data routing, and response signaling, making the
integration efficient and hardware-agnostic.

System Implementation
After verifying that everything works in Chisel, we move to the Chipyard environment to build
the complete system and analyse our design using various tools. In this section, we discuss
our implementation details step by step, then proceed to testing and simulation.

First, we moved the AES accelerator folder (the updated Chisel template project) from HOME to
the following path: /home/elech505/chipyard/generators/chipyard/src/main/scala

Next, we create AES.scala inside the following path:
/home/elech505/chipyard/generators/chipyard/src/main/scala/example

The module connects to the Rocket core via TileLink, enabling software to trigger AES
operations and retrieve results using regular memory operations.

AES.scala defines and integrates an AES encryption/decryption accelerator into the Chipyard
system using a memory-mapped I/O (MMIO) interface. It includes:

AESMMIOChiselModule: Implements AES control logic with a simple state machine to manage
input, computation, and output stages.

AESTL: Wraps the AES core with TileLink MMIO support and register mapping for software
access.

CanHavePeripheryAES: Allows optional inclusion of the AES peripheral in the system's
peripheral bus (PBUS).

WithAES: A configuration fragment to include the AES accelerator in the system with
customizable backend options (AXI4, BlackBox, HLS), in our case, all set to false.

Next, we create AESConfigs.scala inside the following path:

/home/elech505/chipyard/generators/chipyard/src/main/scala/config

Here, AESConfigs.scala defines a Chipyard system configuration that includes the
AES accelerator.

Lastly, we edit the DigitalTop.scala found in
/home/elech505/chipyard/generators/chipyard/src/main/scala to include the following trait.

with chipyard.example.CanHavePeripheryAES

Next, we run the following commands to build our device in the Chipyard environment
(AES accelerator with RocketCore processor using an MMIO interface):

elech505@elech505-VirtualBox:~$ chipyard-start
(/home/elech505/chipyard/.conda-env) elech505@elech505-VirtualBox:~/chipyard$ cd sims/verilator/
(/home/elech505/chipyard/.conda-env) elech505@elech505-VirtualBox:~/chipyard/sims/verilator$ make CONFIG=AESTLRocketConfig

The make command takes a long time to complete and produces the test harness used in our
integration tests. The test harness is inside the following path: /home/elech505/chipyard/sims/verilator

Figure 5: Successful make config AESTLRocketConfig

Testing and Simulation

To perform the tests, we needed to update/create the Makefile inside the following path:
/home/elech505/chipyard/tests

- Add executable tests
- Add dump target
- Make shorthand (alias) for build commands for each test we plan to implement

We also updated CMakeLists.txt inside the same path as the Makefile:

- Add dump target
- Add executable tests

We also updated the Makefile inside the path /home/elech505/chipyard/sims/verilator:

- Make shorthand (alias) for build commands for each test we plan to implement

Chisel Test (Unit Test)
The first test exercised the AES accelerator in isolation: we tested both the individual
components and the accelerator as a whole.

For the individual components in the decryption core, we prepared testbenches for InvSBox,
InvMixColumn, InvShiftRows, and InvSubBytes.

Figure 6: Unit test for Decryptor Core components

Overall, the AES Accelerator is tested with the NIST test vectors provided.

Figure 7: Complete test for the AesCore/AesTop

When testing, we used debug statements (e.g., printing the output of each step) to verify that
the accelerator works as expected. The appendix contains the debug output for a decryptor
component.

Chipyard Test (Integration Test)

First, we define the aes-hardware.c and aes-software.c files inside /home/elech505/chipyard/tests
to test the AES accelerator implementation and the plain C (sequential) AES implementation
executed by the CPU, respectively.

Then we build the tests using the commands below:

cmake --build ./build/ --target aes-hardware

cmake --build ./build/ --target aes-software

Now, the builds (.riscv files) can be found inside /home/elech505/chipyard/tests/build

Next, we enter /home/elech505/chipyard/sims/verilator and run make aes-hardware-c to generate
simulation output in the path:
/home/elech505/chipyard/sims/verilator/output/chipyard.harness.TestHarness.AESTLRocketConfig

Next, we enter the output path and run gtkwave aes-hardware.fst to view the waveforms.

Figure 8: Output of simulation using GTKWave

From the simulation in Figure 8, we read the time at which encryption starts and the time at
which the output is returned. The difference between the two is 22 ns, which means each round
took 2 ns to complete. (In this case, the assumed frequency is 500 MHz.)
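
As a consistency check on these figures, the arithmetic can be written out, assuming that the 22 ns interval covers the 11 rounds counted by the round counter and that one round completes per clock period:

```latex
T_{clk} = \frac{1}{f} = \frac{1}{500\,\text{MHz}} = 2\,\text{ns},
\qquad
T_{AES} = 11 \times T_{clk} = 11 \times 2\,\text{ns} = 22\,\text{ns}
```

Under that assumption, the measured delta and the per-round time agree with the 500 MHz clock.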

To compare with a software implementation, we run aes-software.c with a RocketConfig
(without AES). The following is the result obtained:

Encryption Time: 0.009 ms
Decryption Time: 0.019 ms
Success: Decrypted text matches the original plaintext!

Therefore, the performance gain by using AES Accelerator is:

$$\text{Gain} = \frac{T_C}{T_{AES}} = \frac{9\,\mu\text{s}}{22\,\text{ns}} \approx 409.1$$

Therefore, we can clearly see the advantage of using a dedicated hardware accelerator for
time-critical tasks.

Discussion
This project not only demonstrated the functional integration of an AES accelerator into a RISC-
V SoC, but also revealed several important design trade-offs and security considerations relevant
to real-world deployment.

Key Management and Storage


In our current design, keys are written to the accelerator via MMIO registers directly from
software. While this method is sufficient for simulation and testing, it is insecure in real
applications. In a more robust system, key material should be handled securely through:

• Write-once or locked registers
• Volatile key storage cleared after use
• External secure elements or hardware key managers

Side-Channel Resistance
As with any hardware cryptographic module, the AES accelerator is susceptible to side-channel
attacks, particularly those based on power consumption and timing differences. Our current
prototype does not implement countermeasures, but several techniques should be considered for
secure deployment:

• Masking and randomization in the S-Box lookup process
• Dummy operations to balance timing
• Constant-time computation
• Dual-rail logic to equalize power traces


These enhancements would increase area and latency but significantly strengthen resistance to
differential power analysis (DPA) and electromagnetic analysis (EMA).

Multi-Tenant Access and Isolation


If the accelerator were to be shared between multiple applications, proper isolation is critical to
prevent leakage through residual state or timing behavior. Recommendations include:

• Explicit hardware resets after each operation
• Privilege-based access control (e.g., supervisor/user modes)
• Hardware-enforced timing budgets or execution windows

While these features were not implemented in this project, they are essential for multi-user
environments or deployment in operating systems.

Conclusion
This project successfully demonstrated the design, integration, and evaluation of an AES-128
encryption and decryption accelerator within a RISC-V SoC using the Chipyard framework. The
accelerator, implemented in Chisel, supports both encryption and decryption in ECB mode and is
connected to a RocketCore processor via a lightweight MMIO interface.

The choice of RocketCore, MMIO, and the TileLink protocol contributed to a clean and
manageable hardware-software integration. Functional correctness was validated using official
NIST CAVP test vectors, ensuring that both the encryption and decryption logic met standardized
expectations. Unit tests were conducted for key AES modules, and integration tests confirmed
system-level behavior across the CPU and accelerator through simulation.

While MMIO provided a simple communication model, the project also considered potential
security risks such as side-channel leakage and key exposure. Ideas such as masking, timing
balancing with dummy operations, and secure key storage were discussed as future
enhancements for improving hardware security.

An optional FPGA implementation using Vivado was also explored to assess the real-world
feasibility and performance characteristics of the accelerator.

Overall, this project provided deep insights into accelerator design, Chisel-based hardware
development, SoC integration in Chipyard, and RISC-V-based secure processing. The resulting
architecture can serve as a basis for more advanced hardware cryptographic engines or ASIC
implementations in secure embedded systems.

Appendix

Ciphertext: 3AD77BB40D7A3660A89ECAF32466EF97
Initial Key: 2B7E151628AED2A6ABF7158809CF4F3C

Expected Plaintext: 6BC1BEE22E409F96E93D7E117393172A

References
[1] NIST, "Cryptographic Algorithm Validation Program (CAVP): Block Ciphers," National
Institute of Standards and Technology. [Online]. Available:
https://csrc.nist.gov/Projects/Cryptographic-Algorithm-Validation-Program/Block-Ciphers
[2] N. H. M. Ali and A. M. S. Rahma, An Improved AES Encryption of Audio Wave Files,
Thesis, University of Baghdad, Apr. 2015. [Online]. Available:
https://www.researchgate.net/publication/312277403
[3] U. Blumenthal, F. Maino, and K. McCloghrie, The Advanced Encryption Standard (AES)
Cipher Algorithm in the SNMP User-based Security Model, RFC 3826, IETF, June 2004.
[Online]. Available: https://datatracker.ietf.org/doc/html/rfc3826
[4] N. Ladner, Chipyard: Basic Design Flow & MMIO Accelerators Design, École
Polytechnique de Bruxelles, BEAMS - Cybersecurity, Presentation, Apr. 14, 2025.
[5] N. Ladner, Chisel – A Hardware Construction Language: Introduction, École Polytechnique
de Bruxelles, BEAMS - Cybersecurity, Presentation, Mar. 23, 2025.
