GitHub - momalab/AtNTT: Artifact for ISCAS26 paper

The @NTT framework accelerates post-quantum cryptography (PQC) algorithms by addressing the computational bottleneck of the Number Theoretic Transform (NTT) through design-time constant optimization and a fully pipelined architecture.

Specifically, @NTT employs the following strategies to enhance performance and efficiency:

1. Exploitation of Fixed Ring Parameters

The ring parameters of standardized PQC algorithms such as Kyber and Dilithium (for example the coefficient modulus Q and the maximum polynomial degree N−1) are fixed, so @NTT treats these values as synthesis-time constants rather than storing them in registers or memory.

This approach eliminates the need for complex memory access patterns and data movement logic traditionally used to retrieve twiddle factors.
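As an illustration of what "synthesis-time constants" can mean in practice, the sketch below precomputes the ML-KEM (Kyber) twiddle factors from the public parameters q = 3329 and ζ = 17 and prints them as hard-wired constants. The helper names (`twiddle_table`, `emit_localparams`) and the Verilog-style output format are hypothetical and are not taken from the repository; this only shows that the whole table is known before synthesis.

```python
# Public ML-KEM (Kyber) parameters: modulus q, ring size N, and zeta,
# a primitive 256th root of unity modulo q.
KYBER_Q, KYBER_N, KYBER_ZETA = 3329, 256, 17


def bit_reverse(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def twiddle_table(q: int, zeta: int, count: int) -> list[int]:
    """Powers of zeta in bit-reversed order; `count` must be a power of two."""
    width = count.bit_length() - 1
    return [pow(zeta, bit_reverse(i, width), q) for i in range(count)]


def emit_localparams(name: str, table: list[int], bits: int) -> list[str]:
    """Hypothetical helper: format each twiddle as a Verilog-style constant."""
    return [
        f"localparam [{bits - 1}:0] {name}_{i} = {bits}'d{w};"
        for i, w in enumerate(table)
    ]


# The full table is fixed once q and zeta are fixed, so nothing needs to be
# fetched from memory at run time.
KYBER_TWIDDLES = twiddle_table(KYBER_Q, KYBER_ZETA, KYBER_N // 2)

if __name__ == "__main__":
    for line in emit_localparams("ZETA", KYBER_TWIDDLES[:4], 12):
        print(line)
```

Because every entry is a literal, a synthesis tool (or a dedicated generator) is free to specialize the logic around each value instead of building address generation and a twiddle ROM.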

2. Multiplierless Constant Optimization

Multipliers are the most resource-intensive components of an NTT butterfly unit.

@NTT optimizes these by:

Decomposing constant multiplications: the framework replaces traditional multipliers with a minimal set of shifts and adders/subtractors.
RTL generation: it generates optimized RTL code specifically for the target algorithm, achieving significantly better area and performance results than general-purpose industry-standard synthesis tools.
Merging twiddle factors: twiddle factors are directly merged into the design logic, saving both power and area by eliminating dedicated hardware for twiddle factor generation or storage.
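The shift-and-add idea can be sketched with a canonical signed digit (non-adjacent form) recoding of a single constant. This is a textbook technique used here for illustration only; the source does not say which decomposition algorithm @NTT uses, and a real generator would likely share intermediate terms across many constants. The function names are hypothetical, and modular reduction is left out.

```python
def csd_terms(constant: int) -> list[tuple[int, int]]:
    """Return (shift, sign) pairs such that constant == sum(sign << shift).

    Uses the non-adjacent form, so no two consecutive bit positions are
    both nonzero.
    """
    terms, shift = [], 0
    while constant > 0:
        if constant & 1:
            # +1 when constant % 4 == 1, -1 when constant % 4 == 3
            sign = 2 - (constant & 3)
            terms.append((shift, sign))
            constant -= sign
        constant >>= 1
        shift += 1
    return terms


def multiply_by_constant(x: int, terms: list[tuple[int, int]]) -> int:
    """Multiply x by the recoded constant using only shifts, adds and subtracts."""
    acc = 0
    for shift, sign in terms:
        if sign > 0:
            acc += x << shift
        else:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    # 1729 is one of the Kyber twiddle factors (17**64 mod 3329).
    terms = csd_terms(1729)
    print(terms)                       # shift/sign pairs for 2048 - 256 - 64 + 1
    print(multiply_by_constant(5, terms) == 5 * 1729)
```

For 1729 the recoding needs four shifted terms, so three adders/subtractors replace a general multiplier, compared with four adders for the plain binary form (five set bits).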

3. Fully Pipelined Architecture for High Throughput

To maximize performance, @NTT uses a deeply pipelined architecture that implements every NTT stage in hardware.

This allows the design to achieve the maximum possible throughput of one N-point NTT per clock cycle.
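A toy cycle-level model makes the throughput claim concrete. The sketch below assumes a small cyclic NTT (N = 8, q = 17, ω = 2) rather than the negacyclic transforms of Kyber or Dilithium, and models each butterfly stage as one pipeline register level; the names `stage` and `run_pipeline` are hypothetical and do not describe the repository's RTL.

```python
Q, N, OMEGA = 17, 8, 2        # toy parameters: 2 is a primitive 8th root of unity mod 17
STAGES = N.bit_length() - 1   # log2(N) butterfly stages, one register level each


def stage(s: int, a: list[int]) -> list[int]:
    """Apply decimation-in-frequency butterfly stage `s` to vector `a`."""
    half = N >> (s + 1)
    step = N // (2 * half)    # twiddle exponent stride for this stage
    out = list(a)
    for start in range(0, N, 2 * half):
        for j in range(half):
            u, v = a[start + j], a[start + j + half]
            out[start + j] = (u + v) % Q
            out[start + j + half] = (u - v) * pow(OMEGA, j * step, Q) % Q
    return out


def run_pipeline(inputs: list[list[int]]) -> list[tuple[int, list[int]]]:
    """Feed one input vector per cycle; return (cycle, result) for each output.

    Results come out in bit-reversed index order, as is usual for a
    decimation-in-frequency network.
    """
    regs = [None] * STAGES    # regs[s] holds the registered output of stage s
    finished = []
    for cycle in range(len(inputs) + STAGES):
        feed = inputs[cycle] if cycle < len(inputs) else None
        sources = [feed] + regs[:-1]   # every stage reads last cycle's values
        regs = [
            stage(s, src) if src is not None else None
            for s, src in enumerate(sources)
        ]
        if regs[-1] is not None:
            finished.append((cycle, regs[-1]))
    return finished


if __name__ == "__main__":
    vectors = [[(3 * i + j) % Q for j in range(N)] for i in range(5)]
    for cycle, result in run_pipeline(vectors):
        print(cycle, result)
```

After an initial latency of log2(N) cycles, a finished transform leaves the last stage on every cycle, because all stages work on different input vectors at the same time.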

4. Impact on Key PQC Algorithms

The framework specifically targets standardized lattice-based algorithms where NTT accounts for the majority of execution time.

ML-KEM (Kyber)

On FPGA, @NTT reduces LUT usage by approximately 28%, improves the achievable clock frequency, and achieves a throughput-per-LUT efficiency 8.5× higher than state-of-the-art implementations.

ML-DSA (Dilithium)

@NTT delivers a throughput-per-LUT efficiency 5.2× higher than existing solutions and produces up to 305,000 NTT/ms on FPGA.

ASIC Performance

In ASIC implementations (TSMC 28nm), the design can deliver one N-point NTT every nanosecond.
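The throughput figures can be related to clock frequency through the one-transform-per-cycle property described in Section 3. The short calculation below assumes that the reported numbers come from a design sustaining exactly one NTT per clock cycle; the clock frequencies it prints are therefore implied values, not figures stated in the source, and `implied_clock_hz` is a hypothetical helper.

```python
def implied_clock_hz(ntt_per_ms: float, ntt_per_cycle: float = 1.0) -> float:
    """Clock frequency implied by a throughput, assuming a fixed NTT rate per cycle."""
    ntt_per_second = ntt_per_ms * 1_000
    return ntt_per_second / ntt_per_cycle


if __name__ == "__main__":
    # FPGA, ML-DSA: up to 305,000 NTT/ms
    print(implied_clock_hz(305_000) / 1e6, "MHz")
    # ASIC: one NTT per nanosecond is 1,000,000 NTT/ms
    print(implied_clock_hz(1_000_000) / 1e9, "GHz")
```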

By optimizing the arithmetic units and the overall data flow at the design stage, @NTT creates highly compact and efficient hardware that fits more processing elements into a smaller area compared to traditional non-optimized designs.

Repository contents:

fpga_synth
include
rtl_D
rtl_K
src
synth
README.md
