The @NTT framework accelerates post-quantum cryptography (PQC) algorithms by addressing the computational bottleneck of the Number Theoretic Transform (NTT) through design-time constant optimization and a fully pipelined architecture.
Specifically, @NTT employs the following strategies to enhance performance and efficiency:
Since the ring parameters (such as the coefficient modulus Q and polynomial degree N−1) for standardized PQC algorithms like Kyber and Dilithium are fixed, @NTT treats these values as synthesis-time constants rather than storing them in registers or memory.
This approach eliminates the need for complex memory access patterns and data movement logic traditionally used to retrieve twiddle factors.
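As an illustration of what treating ring parameters as design-time constants can look like, the sketch below precomputes a twiddle-factor table once and renders it as constant declarations. It assumes Kyber's published parameters (Q = 3329, N = 256, with 17 as a primitive 256th root of unity). The helper names `twiddle_table` and `emit_localparams` and the Verilog-style output are hypothetical and are not taken from @NTT itself.

```python
# Kyber ring parameters, fixed by the standard (assumed here for illustration).
Q = 3329     # coefficient modulus
N = 256      # number of coefficients per polynomial
ZETA = 17    # primitive 256th root of unity modulo Q


def twiddle_table(q, root, count):
    """Return [root^0, root^1, ..., root^(count-1)] reduced modulo q."""
    return [pow(root, i, q) for i in range(count)]


def emit_localparams(table, width=12):
    """Render each twiddle factor as a Verilog-style constant (hypothetical format)."""
    return [
        f"localparam [{width - 1}:0] W{i} = {width}'d{w};"
        for i, w in enumerate(table)
    ]


# Computed once at "design time"; nothing needs to be fetched at run time.
TWIDDLES = twiddle_table(Q, ZETA, N // 2)
RTL_CONSTANTS = emit_localparams(TWIDDLES)
```

Because every value in `TWIDDLES` is known before synthesis, a generator in this style has no need for a twiddle ROM or the address logic that would read it.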
Multipliers are the most resource-intensive components of an NTT butterfly unit.
@NTT optimizes these by:
- Decomposing constant multiplications: the framework replaces traditional multipliers with a minimal set of shifts and adders/subtractors.
- RTL generation: it generates optimized RTL code specifically for the target algorithm, achieving significantly better area and performance results than general-purpose industry-standard synthesis tools.
- Merging twiddle factors: twiddle factors are merged directly into the design logic, saving both power and area by eliminating dedicated hardware for twiddle factor generation or storage.
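To make the shift-and-add idea concrete, the sketch below decomposes a constant into signed powers of two using the canonical signed-digit (non-adjacent) form, a standard technique for multiplierless constant multiplication. Whether @NTT uses this particular decomposition is an assumption; the function names are hypothetical, and modular reduction is left out to keep the example short.

```python
def csd_digits(c):
    """Return the non-adjacent form of a positive integer c.

    The result is a list of (shift, sign) pairs with sign in {+1, -1}
    such that c == sum(sign * 2**shift).
    """
    terms = []
    shift = 0
    while c != 0:
        if c & 1:
            # Pick +1 when c % 4 == 1 and -1 when c % 4 == 3, so that the
            # next bit becomes zero and no two nonzero digits are adjacent.
            digit = 2 - (c & 3)
            terms.append((shift, digit))
            c -= digit
        c >>= 1
        shift += 1
    return terms


def shift_add_multiply(x, terms):
    """Compute constant * x using only shifts, additions, and subtractions."""
    acc = 0
    for shift, sign in terms:
        if sign > 0:
            acc += x << shift
        else:
            acc -= x << shift
    return acc
```

For example, `csd_digits(15)` yields `[(0, -1), (4, 1)]`, so 15·x becomes `(x << 4) - x`: one subtractor in place of the three adders that the plain binary form 1111 would need.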
To maximize performance, @NTT uses a deeply pipelined architecture that implements every NTT stage in hardware.
This allows the design to achieve the maximum possible throughput of one N-point NTT per clock cycle.
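The cost and payoff of full unrolling can be estimated with simple arithmetic. The sketch below assumes a textbook radix-2 structure in which each stage holds N/2 butterflies and a complete transform has log2(N) stages; the actual stage count and butterfly organization inside @NTT are not specified here, and the function names are hypothetical.

```python
def unrolled_butterflies(n, stages=None):
    """Butterfly units needed when every stage of an n-point radix-2 NTT is in hardware.

    `stages` defaults to log2(n); pass a smaller value for schemes that stop
    the transform early.
    """
    if stages is None:
        stages = n.bit_length() - 1  # exact log2 for a power-of-two n
    return (n // 2) * stages


def ntt_per_ms(clock_mhz):
    """Throughput of a pipeline that completes one NTT per clock cycle.

    A clock of f MHz gives f NTTs per microsecond, i.e. f * 1000 per millisecond.
    """
    return clock_mhz * 1000
```

Under these assumptions, `unrolled_butterflies(256)` gives 1024 butterfly units, and a pipeline that sustains one NTT per cycle at 305 MHz would produce `ntt_per_ms(305) == 305000` transforms per millisecond.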
The framework specifically targets standardized lattice-based algorithms where NTT accounts for the majority of execution time.
- On FPGA, @NTT reduces LUT usage by approximately 28%
- Improves the achievable clock frequency
- Achieves a throughput-per-LUT efficiency 8.5× higher than state-of-the-art implementations
- Delivers a throughput-per-LUT efficiency 5.2× higher than existing solutions
- Produces up to 305,000 NTT/ms on FPGA
- In ASIC implementations (TSMC 28nm), the design can deliver one N-point NTT every nanosecond
By optimizing the arithmetic units and the overall data flow at the design stage, @NTT creates highly compact and efficient hardware, fitting more processing elements into less area than traditional non-optimized designs.