
momalab/AtNTT


The @NTT framework accelerates post-quantum cryptography (PQC) algorithms by addressing the computational bottleneck of the Number Theoretic Transform (NTT) through design-time constant optimization and a fully pipelined architecture.

Specifically, @NTT employs the following strategies to enhance performance and efficiency:

1. Exploitation of Fixed Ring Parameters

The ring parameters of standardized PQC algorithms such as Kyber and Dilithium are fixed: the coefficient modulus Q and the polynomial degree N−1 never change at run time. @NTT therefore treats these values, and the twiddle factors derived from them, as synthesis-time constants rather than storing them in registers or memory.

This approach eliminates the need for complex memory access patterns and data movement logic traditionally used to retrieve twiddle factors.
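As a rough illustration of what "synthesis-time constant" means here, the sketch below precomputes a twiddle table for the ML-KEM (Kyber) ring in Python. The values Q = 3329, N = 256 and the root of unity 17 come from the Kyber specification, not from this repository; the function name `twiddle_table` and the natural (non-bit-reversed) ordering are assumptions made for the example.

```python
# Ring parameters standardized for ML-KEM (Kyber). ZETA is taken from the
# Kyber specification; it is an assumption here, not read from this repository.
Q = 3329     # coefficient modulus
N = 256      # number of coefficients (polynomial degree is N - 1)
ZETA = 17    # primitive 256th root of unity modulo Q

def twiddle_table(q: int, root: int, count: int) -> list[int]:
    """Return [root^0, root^1, ..., root^(count-1)] reduced modulo q."""
    table = [1]
    for _ in range(count - 1):
        table.append(table[-1] * root % q)
    return table

# Hypothetical design-time step: every entry is a plain integer literal that a
# generator could bake into the datapath instead of fetching it from memory.
TWIDDLES = twiddle_table(Q, ZETA, N // 2)
```

Because the table depends only on Q and the chosen root, it can be computed once, before synthesis, and never needs to be addressed at run time.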


2. Multiplierless Constant Optimization

Multipliers are the most resource-intensive components of an NTT butterfly unit.

@NTT optimizes these by:

  • Decomposing constant multiplications
    The framework replaces traditional multipliers with a minimal set of shifts and adders/subtractors.

  • RTL Generation
    It generates optimized RTL code specifically for the target algorithm, achieving significantly better area and performance results than general-purpose industry-standard synthesis tools.

  • Merging Twiddle Factors
    Twiddle factors are directly merged into the design logic, saving both power and area by eliminating dedicated hardware for twiddle factor generation or storage.
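The shift-and-add decomposition can be sketched with a canonical signed-digit (CSD) recoding, a common textbook technique for multiplierless constant multiplication. This is only an illustration: the source does not state which decomposition @NTT uses, the helper names `csd_digits` and `shift_add_multiply` are hypothetical, and the modular reduction that follows each multiplication is left out.

```python
def csd_digits(c: int) -> list[tuple[int, int]]:
    """Recode a positive constant as (sign, shift) pairs with
    c == sum(sign * 2**shift) and no two adjacent nonzero digits."""
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            # c % 4 == 1 -> digit +1, c % 4 == 3 -> digit -1, so the value
            # left after subtracting the digit is divisible by 4.
            digit = 2 - (c & 3)
            terms.append((digit, shift))
            c -= digit
        c >>= 1
        shift += 1
    return terms

def shift_add_multiply(x: int, terms: list[tuple[int, int]]) -> int:
    """Multiply x by the recoded constant using only shifts, adds and subtracts."""
    return sum(sign * (x << shift) for sign, shift in terms)
```

For example, `csd_digits(7)` returns `[(-1, 0), (1, 3)]`, so 7·x becomes `(x << 3) - x`: one subtractor instead of the two adders that the plain binary form 4x + 2x + x would need.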


3. Fully Pipelined Architecture for High Throughput

To maximize performance, @NTT uses a deeply pipelined architecture that implements every NTT stage in hardware.

This allows the design to achieve the maximum possible throughput of one N-point NTT per clock cycle.
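To make the "one stage per hardware layer" idea concrete, here is a small software model of an unrolled, pipelined NTT. It is a sketch under loud assumptions: toy parameters (N = 8, Q = 17, root 2), a plain cyclic transform rather than the negacyclic variants used by Kyber and Dilithium, and hypothetical function names. It does not describe the repository's actual RTL.

```python
Q, N, OMEGA = 17, 8, 2        # toy ring: 2 is a primitive 8th root of unity mod 17
STAGES = N.bit_length() - 1   # log2(N) butterfly layers

def bit_reversed(coeffs):
    """Reorder the inputs so the stages below yield outputs in natural order."""
    index = lambda i: int(format(i, f"0{STAGES}b")[::-1], 2)
    return [coeffs[index(i)] for i in range(N)]

def ntt_stage(values, stage):
    """One layer of N/2 butterflies; every twiddle w is a design-time constant."""
    half = 1 << stage             # distance between the two butterfly inputs
    step = N // (2 * half)        # twiddle exponent stride for this layer
    out = list(values)
    for start in range(0, N, 2 * half):
        for j in range(half):
            w = pow(OMEGA, j * step, Q)
            a = values[start + j]
            b = values[start + j + half] * w % Q
            out[start + j] = (a + b) % Q
            out[start + j + half] = (a - b) % Q
    return out

def ntt_pipeline(stream):
    """Feed one polynomial per cycle; regs[s] models the register after stage s."""
    regs = [None] * STAGES
    results = []
    for cycle_input in list(stream) + [None] * STAGES:   # extra cycles flush the pipe
        if regs[-1] is not None:
            results.append(regs[-1])                     # one finished NTT leaves per cycle
        for s in range(STAGES - 1, 0, -1):               # update back to front
            regs[s] = None if regs[s - 1] is None else ntt_stage(regs[s - 1], s)
        regs[0] = None if cycle_input is None else ntt_stage(bit_reversed(cycle_input), 0)
    return results

def ntt_reference(coeffs):
    """Direct O(N^2) evaluation, used only to check the staged version."""
    return [sum(c * pow(OMEGA, i * k, Q) for i, c in enumerate(coeffs)) % Q
            for k in range(N)]
```

After an initial latency of log2(N) cycles, the model emits one complete transform per cycle, which is the behaviour the fully pipelined design aims for in hardware.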


4. Impact on Key PQC Algorithms

The framework specifically targets standardized lattice-based algorithms where NTT accounts for the majority of execution time.

ML-KEM (Kyber)

  • Reduces FPGA LUT usage by approximately 28%
  • Improves the achievable clock frequency
  • Achieves a throughput-per-LUT efficiency 8.5× higher than state-of-the-art implementations

ML-DSA (Dilithium)

  • Delivers a throughput-per-LUT efficiency 5.2× higher than existing solutions
  • Produces up to 305,000 NTT/ms on FPGA

ASIC Performance

  • In ASIC implementations (TSMC 28nm), the design can deliver one N-point NTT every nanosecond
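The throughput figures above can be related to clock frequency with simple arithmetic. The sketch below assumes that the "one N-point NTT per clock cycle" rate applies to each reported figure; the implied clock frequencies are therefore an inference, not numbers stated in the source, and the function names are hypothetical.

```python
def ntt_per_ms(clock_mhz: float, ntts_per_cycle: float = 1.0) -> float:
    """Throughput in NTT/ms for a given clock, assuming a full pipeline."""
    cycles_per_ms = clock_mhz * 1_000       # 1 MHz = 1,000 cycles per millisecond
    return cycles_per_ms * ntts_per_cycle

def implied_clock_mhz(throughput_ntt_per_ms: float) -> float:
    """Clock frequency implied by a throughput figure at one NTT per cycle."""
    return throughput_ntt_per_ms / 1_000

fpga_clock = implied_clock_mhz(305_000)     # ML-DSA figure reported for FPGA
asic_rate = ntt_per_ms(1_000)               # one NTT per nanosecond = 1 GHz
```

Under that assumption, 305,000 NTT/ms corresponds to a clock of about 305 MHz, and one NTT per nanosecond corresponds to 1,000,000 NTT/ms at 1 GHz.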

By optimizing the arithmetic units and the overall data flow at design time, @NTT produces compact, efficient hardware that fits more processing elements into a smaller area than traditional non-optimized designs.

About

Artifact for ISCAS26 paper
