95 results sorted by ID
Unified MEDS Accelerator
Sanjay Deshpande, Yongseok Lee, Mamuri Nawan, Kashif Nawaz, Ruben Niederhagen, Yunheung Paek, Jakub Szefer
Implementation
The Matrix Equivalence Digital Signature (MEDS) scheme, a code-based candidate in the first round of NIST’s Post-Quantum Cryptography (PQC) standardization process, offers competitively small signature sizes but incurs high computational costs for signing and verification. This work explores how a high-performance FPGA-based hardware implementation can enhance MEDS performance by leveraging the inherent parallelism of its computations, while examining the trade-offs between performance gains...
SPHINCSLET: An Area-Efficient Accelerator for the Full SPHINCS+ Digital Signature Algorithm
Sanjay Deshpande, Yongseok Lee, Cansu Karakuzu, Jakub Szefer, Yunheung Paek
Implementation
This work presents SPHINCSLET, the first fully standard-compliant and area-efficient hardware implementation of the SLH-DSA algorithm, formerly known as SPHINCS+, a post-quantum digital signature scheme. SPHINCSLET is designed to be parameterizable across different security levels and hash functions, offering a balanced trade-off between area efficiency and performance. Existing hardware implementations either feature a large area footprint to achieve fast signing and verification or adopt a...
Need for zkSpeed: Accelerating HyperPlonk for Zero-Knowledge Proofs
Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Ramesh Karri, Siddharth Garg, Brandon Reagen
Implementation
(Preprint) Zero-Knowledge Proofs (ZKPs) are rapidly gaining importance in privacy-preserving and verifiable computing. ZKPs enable a proving party to prove the truth of a statement to a verifying party without revealing anything else. ZKPs have applications in blockchain technologies, verifiable machine learning, and electronic voting, but have yet to see widespread adoption due to the computational complexity of the proving process. Recent works have accelerated the key primitives of...
PHOENIX: Crypto-Agile Hardware Sharing for ML-KEM and HQC
Antonio Ras, Antoine Loiseau, Mikaël Carmona, Simon Pontié, Guénaël Renault, Benjamin Smith, Emanuele Valea
Implementation
The transition to quantum-safe public-key cryptography has begun: for key agreement, NIST has standardized ML-KEM and selected HQC for future standardization. The relative immaturity of these schemes encourages crypto-agile implementations, to facilitate easy transitions between them. Intelligent crypto-agility requires efficient sharing strategies to compute operations from different cryptosystems using the same resources. This is particularly challenging for cryptosystems with distinct...
Is Your Bluetooth Chip Leaking Secrets via RF Signals?
Yanning Ji, Elena Dubrova, Ruize Wang
Attacks and cryptanalysis
In this paper, we present a side-channel attack on the hardware AES accelerator of a Bluetooth chip used in millions of devices worldwide, ranging from wearables and smart home products to industrial IoT. The attack leverages information about AES computations unintentionally transmitted by the chip together with RF signals to recover the encryption key. Unlike traditional side-channel attacks that rely on power or near-field electromagnetic emissions as sources of information, RF-based...
Masking-Friendly Post-Quantum Signatures in the Threshold-Computation-in-the-Head Framework
Thibauld Feneuil, Matthieu Rivain, Auguste Warmé-Janville
Cryptographic protocols
Side-channel attacks pose significant threats to cryptographic implementations, which require the inclusion of countermeasures to mitigate these attacks. In this work, we study the masking of state-of-the-art post-quantum signatures based on the MPC-in-the-head paradigm. More precisely, we focus on the recent threshold-computation-in-the-head (TCitH) framework that applies to some NIST candidates of the post-quantum standardization process. We first provide an analysis of side-channel attack...
FINAL bootstrap acceleration on FPGA using DSP-free constant-multiplier NTTs
Jonas Bertels, Hilder V. L. Pereira, Ingrid Verbauwhede
Implementation
This work showcases Quatorze-bis, a state-of-the-art Number Theoretic Transform circuit for TFHE-like cryptosystems on FPGAs. It contains a novel design for modular multiplication by a constant under a constant modulus. This modular multiplication design does not require any DSP units or any dedicated multiplier unit, nor does it require extra logic when compared to the state-of-the-art modular multipliers. Furthermore, we present an implementation of a constant...
On symbolic computations over arbitrary commutative rings and cryptography with the temporal Jordan-Gauss graphs.
Vasyl Ustimenko
Foundations
The paper is dedicated to Multivariate Cryptography over a general commutative ring K and protocols of symbolic computations for safe delivery of multivariate maps. We consider an iterative algorithm for generating multivariate maps of prescribed degree or density with a trapdoor accelerator, i.e. a piece of information which allows computing the reimage of the map in polynomial time. The concept of Jordan-Gauss temporal graphs is used for the obfuscation of known graph based public keys ...
Accelerating Hash-Based Polynomial Commitment Schemes with Linear Prover Time
Florian Hirner, Florian Krieger, Constantin Piber, Sujoy Sinha Roy
Implementation
Zero-knowledge proofs (ZKPs) are cryptographic protocols that enable one party to prove the validity of a statement without revealing any information beyond its truth. A central building block in many ZKPs is the polynomial commitment scheme (PCS), where constructions with linear-time provers are especially attractive. Two such examples are Brakedown and its extension Orion, which enable linear-time and quantum-resistant proving by leveraging linear-time encodable Spielman codes....
OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM
Xander Pottier, Thomas de Ruijter, Jonas Bertels, Wouter Legiest, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
OpenNTT: An Automated Toolchain for Compiling High-Performance NTT Accelerators in FHE
Florian Krieger, Florian Hirner, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Modern cryptographic techniques such as fully homomorphic encryption (FHE) have recently gained broad attention. Most of these cryptosystems rely on lattice problems wherein polynomial multiplication forms the computational bottleneck. A popular method to accelerate these polynomial multiplications is the Number-Theoretic Transformation (NTT). Recent works aim to improve the practical deployability of NTT and propose toolchains supporting the NTT hardware accelerator design processes....
On Schubert cells of Projective Geometry and quadratic public keys of Multivariate Cryptography
Vasyl Ustimenko
Public-key cryptography
Jordan-Gauss graphs are bipartite graphs given by special quadratic equations over the commutative ring K with unity with partition sets K^n and K^m, n ≥ m, such that the neighbour of each vertex is defined by the system of linear equations given in its row-echelon form.
We use families of these graphs for the construction of new quadratic and cubic surjective multivariate maps F of K^n onto K^m (or K^n onto K^n) with the trapdoor accelerators T, i.e. pieces of information which...
FELIX (XGCD for FALCON): FPGA-based Scalable and Lightweight Accelerator for Large Integer Extended GCD
Sam Coulon, Tianyou Bao, Jiafeng Xie
Implementation
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof
Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang
Implementation
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
Towards ML-KEM & ML-DSA on OpenTitan
Amin Abdulrahman, Felix Oberhansl, Hoang Nguyen Hien Pham, Jade Philipoom, Peter Schwabe, Tobias Stelzer, Andreas Zankl
Implementation
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
An NVMe-based Secure Computing Platform with FPGA-based TFHE Accelerator
Yoshihiro Ohba, Tomoya Sanuki, Claude Gravel, Kentaro Mihara, Asuka Wakasugi, Kenta Adachi
Implementation
In this study, we introduce a new approach to secure computing by implementing a platform that utilizes a non-volatile memory express (NVMe)-based system with an FPGA-based Torus fully homomorphic encryption (TFHE) accelerator, solid state drive (SSD), and middleware on the host-side. Our platform is the first to offer completely secure computing capabilities for TFHE by using an FPGA-based accelerator. We defined secure computing instructions to evaluate 14-bit to 14-bit functions using...
Vision Mark-32: ZK-Friendly Hash Function Over Binary Tower Fields
Tomer Ashur, Mohammad Mahzoun, Jim Posen, Danilo Šijačić
Implementation
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
On the cryptosystems based on two Eulerian transformations defined over the commutative rings $Z_{2^s}, s>1$.
Vasyl Ustimenko
Cryptographic protocols
We suggest the family of ciphers s^E^n, n = 2, 3, ..., with the space of plaintexts (Z*_{2^s})^n, s > 1, such that the encryption map is the composition of kind G = G_1A_1G_2A_2, where A_i are the affine transformations from AGL_n(Z_{2^s}) preserving the variety (Z*_{2^s})^n, and the Eulerian endomorphisms G_i, i = 1, 2, of K[x_1, x_2, ..., x_n] move x_i to a monomial term μ(x_1)^{d(1)}(x_2)^{d(2)}...(x_n)^{d(n)}, μ ∈ Z*_{2^s}, and act on (Z*_{2^s})^n as bijective transformations.
The cipher is...
Exploring the Advantages and Challenges of Fermat NTT in FHE Acceleration
Andrey Kim, Ahmet Can Mert, Anisha Mukherjee, Aikata Aikata, Maxim Deryabin, Sunmin Kwon, HyungChul Kang, Sujoy Sinha Roy
Implementation
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation.
We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption
David Du Pont, Jonas Bertels, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
Aloha-HE: A Low-Area Hardware Accelerator for Client-Side Operations in Homomorphic Encryption
Florian Krieger, Florian Hirner, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the...
Accelerating Isogeny Walks for VDF Evaluation
David Jacquemin, Anisha Mukherjee, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
VDFs are characterized by sequential function evaluation but an immediate output verification. In order to ensure secure use of VDFs in real-world applications, it is important to determine the fastest implementation. Considering the point of view of an attacker (say with unbounded resources), this paper aims to accelerate the isogeny-based VDF proposed by De Feo-Mason-Petit-Sanso in 2019. It is the first work that implements a hardware accelerator for the evaluation step of an isogeny VDF....
Whipping the MAYO Signature Scheme using Hardware Platforms
Florian Hirner, Michael Streibl, Florian Krieger, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
NIST issued a new call in 2023 to diversify the portfolio of quantum-resistant digital signature schemes since the current portfolio relies on lattice problems. The MAYO scheme, which builds on the Unbalanced Oil and Vinegar (UOV) problem, is a promising candidate for this new call. MAYO introduces emulsifier maps and a novel 'whipping' technique to significantly reduce the key sizes compared to previous UOV schemes.
This paper provides a comprehensive analysis of the implementation...
REED: Chiplet-Based Accelerator for Fully Homomorphic Encryption
Aikata Aikata, Ahmet Can Mert, Sunmin Kwon, Maxim Deryabin, Sujoy Sinha Roy
Implementation
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they encounter common challenges associated with large-scale...
Towards High-speed ASIC Implementations of Post-Quantum Cryptography
Malik Imran, Aikata Aikata, Sujoy Sinha Roy, Samuel Pagliarini
Implementation
In this brief, we realize different architectural techniques towards improving the performance of post-quantum cryptography (PQC) algorithms when implemented as hardware accelerators on an application-specific integrated circuit (ASIC) platform. Having SABER as a case study, we designed a 256-bit wide architecture geared for high-speed cryptographic applications that incorporates smaller and distributed SRAM memory blocks. Moreover, we have adapted the building blocks of SABER to process...
Efficient Accelerator for NTT-based Polynomial Multiplication
Raziyeh Salarifard, Hadi Soleimany
Implementation
The Number Theoretic Transform (NTT) is used to efficiently execute polynomial multiplication. It has become an important part of lattice-based post-quantum methods and the subsequent generation of standard cryptographic systems. However, implementing post-quantum schemes is challenging since they rely on intricate structures. This paper demonstrates how to develop a high-speed NTT multiplier highly optimized for FPGAs with few logical resources. We describe a novel architecture for NTT...
A 334µW 0.158mm² ASIC for Post-Quantum Key-Encapsulation Mechanism Saber with Low-latency Striding Toom-Cook Multiplication Extended Version
Archisman Ghosh, Jose Maria Bermudo Mera, Angshuman Karmakar, Debayan Das, Santosh Ghosh, Ingrid Verbauwhede, Shreyas Sen
Public-key cryptography
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) are broken if and when a quantum computer appears, rendering them ineffective for use in the quantum era. Lattice-based cryptography is a novel approach to public-key cryptography, of which the mathematical investigation (so far) resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, National Institute of Standard &...
Hardware Acceleration of FHEW
Jonas Bertels, Michiel Van Beirendonck, Furkan Turan, Ingrid Verbauwhede
Implementation
The magic of Fully Homomorphic Encryption (FHE) is that it allows operations on encrypted data without decryption. Unfortunately, the slow computation time limits its adoption. The slow computation time results from the vast memory requirements (64Kbits per ciphertext), a bootstrapping key of 1.3 GB, and sizeable computational overhead (10240 NTTs, each NTT requiring 5120 32-bit multiplications). We accelerate the FHEW bootstrapping in hardware on a high-end U280 FPGA.
To reduce the...
TREBUCHET: Fully Homomorphic Encryption Accelerator for Deep Computation
David Bruce Cousins, Yuriy Polyakov, Ahmad Al Badawi, Matthew French, Andrew Schmidt, Ajey Jacob, Benedict Reynwar, Kellie Canida, Akhilesh Jaiswal, Clynn Mathew, Homer Gamil, Negar Neda, Deepraj Soni, Michail Maniatakos, Brandon Reagen, Naifeng Zhang, Franz Franchetti, Patrick Brinich, Jeremy Johnson, Patrick Broderick, Mike Franusich, Bo Zhang, Zeming Cheng, Massoud Pedram
Implementation
Secure computation is of critical importance to not only the DoD, but across financial institutions, healthcare, and anywhere personally identifiable information (PII) is accessed. Traditional security techniques require data to be decrypted before performing any computation. When processed on untrusted systems the decrypted data is vulnerable to attacks to extract the sensitive information. To address these vulnerabilities Fully Homomorphic Encryption (FHE) keeps the data encrypted...
RPU: The Ring Processing Unit
Deepraj Soni, Negar Neda, Naifeng Zhang, Benedict Reynwar, Homer Gamil, Benjamin Heyman, Mohammed Nabeel Thari Moopan, Ahmad Al Badawi, Yuriy Polyakov, Kellie Canida, Massoud Pedram, Michail Maniatakos, David Bruce Cousins, Franz Franchetti, Matthew French, Andrew Schmidt, Brandon Reagen
Applications
Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have received limited use due to their extreme overheads of running on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512,...
AI Attacks AI: Recovering Neural Network architecture from NVDLA using AI-assisted Side Channel Attack
Naina Gupta, Arpan Jati, Anupam Chattopadhyay
Attacks and cryptanalysis
During the last decade, there has been stunning progress in the domain of AI with adoption in both safety-critical and security-critical applications. A key requirement for this is highly trained Machine Learning (ML) models, which are valuable Intellectual Property (IP) of the respective organizations. Naturally, these models have become targets for model recovery attacks through side-channel leakage. However, the majority of the attacks reported in literature are either on simple embedded...
Unlimited Results: Breaking Firmware Encryption of ESP32-V3
Karim M. Abdellatif, Olivier Hériveaux, Adrian Thillard
Attacks and cryptanalysis
Because of the rapid growth of Internet of Things (IoT), embedded systems have become an interesting target for experienced attackers. ESP32 is a low-cost and low-power system on chip (SoC) series created by Espressif Systems. The firmware extraction of such embedded systems is a real threat to the manufacturer as it breaks its intellectual property and raises the risk of creating equivalent systems with less effort and resources. In 2019,...
FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption
Michiel Van Beirendonck, Jan-Pieter D'Anvers, Furkan Turan, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation.
We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT...
On Extremal Algebraic Graphs and Multivariate Cryptosystems
Vasyl Ustimenko
Public-key cryptography
Multivariate rule x_i -> f_i, i = 1, 2, ..., n, f_i from K[x_1, x_2, ..., x_n] over commutative ring K defines endomorphism σ_n of K[x_1, x_2, ..., x_n] into itself given by its values on variables x_i. Degree of σ_n can be defined as maximum of degrees of polynomials f_i. We say that family σ_n, n = 2, 3, ..., has trapdoor accelerator ^nT if the knowledge of the piece of information ^nT allows to compute reimage x of y = σ_n(x) in time O(n^2). We use extremal algebraic graphs for the...
HPKA: A High-Performance CRYSTALS-Kyber Accelerator Exploring Efficient Pipelining
Ziying Ni, Ayesha Khalid, Dur-e-Shahwar Kundi, Máire O’Neill, Weiqiang Liu
Implementation
CRYSTALS-Kyber (Kyber) was recently chosen as the first quantum resistant Key Encapsulation Mechanism (KEM) scheme for standardisation, after three rounds of the National Institute of Standards and Technology (NIST) initiated PQC competition, which began in 2016 in search of the best quantum resistant KEMs and digital signatures. Kyber is based on the Module-Learning with Errors (M-LWE) class of Lattice-based Cryptography, that is known to manifest efficiently on FPGAs. This work explores...
A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator
Lipeng Wan, Fangyu Zheng, Guang Fan, Rong Wei, Lili Gao, Jiankuo Dong, Jingqiang Lin, Yuewu Wang
Implementation
Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. Noting the extraordinary computing power of AI accelerators, in this paper we further explore the feasibility of introducing AI accelerators into high-performance cryptographic computing.
Since AI accelerators are dedicated to machine learning or neural networks, the biggest challenge is how to transform cryptographic workloads into their operations,...
BASALISC: Programmable Hardware Accelerator for BGV Fully Homomorphic Encryption
Robin Geelen, Michiel Van Beirendonck, Hilder V. L. Pereira, Brian Huffman, Tynan McAuley, Ben Selfridge, Daniel Wagner, Georgios Dimou, Ingrid Verbauwhede, Frederik Vercauteren, David W. Archer
Implementation
Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, huge memory size, computational cost and bandwidth requirements limit its practicality. We present BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping – the noise removal capability necessary for arbitrary-depth computation. It supports a...
High-Performance Polynomial Multiplication Hardware Accelerators for KEM Saber and NTRU
Elizabeth Carter, Pengzhou He, Jiafeng Xie
Implementation
Along with the rapid development in building large-scale quantum computers, post-quantum cryptography (PQC) has drawn significant attention from the research community recently as it is proven that the existing public-key cryptosystems are vulnerable to quantum attacks. Following this direction, this paper presents a novel implementation of high-performance polynomial multiplication hardware accelerators for key encapsulation mechanism (KEM) Saber and NTRU, two PQC algorithms that are currently...
High-speed SABER Key Encapsulation Mechanism in 65nm CMOS
Malik Imran, Felipe Almeida, Andrea Basso, Sujoy Sinha Roy, Samuel Pagliarini
Public-key cryptography
Quantum computers will break cryptographic primitives that are based on integer factorization and discrete logarithm problems. SABER is a key agreement scheme based on the Learning With Rounding problem that is quantum-safe, i.e., resistant to quantum computer attacks. This article presents a high-speed silicon implementation of SABER in a 65nm technology as an Application Specific Integrated Circuit. The chip measures 1 mm$^2$ in size and can operate at a maximum frequency of 715 MHz at a...
Lightweight Hardware Accelerator for Post-Quantum Digital Signature CRYSTALS-Dilithium
Naina Gupta, Arpan Jati, Anupam Chattopadhyay, Gautam Jha
Implementation
The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature.
By invoking multiple...
Medha: Microcoded Hardware Accelerator for computing on Encrypted Data
Ahmet Can Mert, Aikata, Sunmin Kwon, Youngsam Shin, Donghoon Yoo, Yongwoo Lee, Sujoy Sinha Roy
Implementation
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data.
First, we propose a divide-and-conquer technique that...
Accelerator for Computing on Encrypted Data
Sujoy Sinha Roy, Ahmet Can Mert, Aikata, Sunmin Kwon, Youngsam Shin, Donghoon Yoo
Implementation
Fully homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations. In this paper, we present a complete instruction-set processor architecture ‘Medha’ for accelerating the cloud-side operations of an RNS variant of the HEAAN homomorphic encryption scheme. Medha has been designed following a modular hardware design approach to attain a fast computation time for computationally expensive homomorphic...
CoHA-NTT: A Configurable Hardware Accelerator for NTT-based Polynomial Multiplication
Kemal Derya, Ahmet Can Mert, Erdinç Öztürk, Erkay Savaş
Public-key cryptography
In this paper, we introduce a configurable hardware architecture that can be used to generate unified and parametric NTT-based polynomial multipliers that support a wide range of parameters of lattice-based cryptographic schemes proposed for post-quantum cryptography. Both NTT and inverse NTT operations can be performed using the unified butterfly unit of our architecture, which constitutes the core building block in NTT operations. The multitude of this unit plays an essential role in...
A Fast Large-Integer Extended GCD Algorithm and Hardware Design for Verifiable Delay Functions and Modular Inversion
Kavya Sreedhar, Mark Horowitz, Christopher Torng
Implementation
The extended GCD (XGCD) calculation, which computes Bézout coefficients b_a, b_b such that b_a ∗ a_0 + b_b ∗ b_0 = GCD(a_0, b_0), is a critical operation in many cryptographic applications. In particular, large-integer XGCD is computationally dominant for two applications of increasing interest: verifiable delay functions that square binary quadratic forms within a class group and constant-time modular inversion for elliptic curve cryptography. Most prior work has focused on fast software...
Design Space Exploration of SABER in 65nm ASIC
Malik Imran, Felipe Almeida, Jaan Raik, Andrea Basso, Sujoy Sinha Roy, Samuel Pagliarini
Public-key cryptography
This paper presents a design space exploration for SABER, one of the finalists in NIST’s quantum-resistant public-key cryptographic standardization effort. Our design space exploration targets a 65nm ASIC platform and has resulted in the evaluation of 6 different architectures. Our exploration is initiated by setting a baseline architecture which is ported from FPGA. In order to improve the clock frequency (the primary goal in our exploration), we have employed several optimizations: (i) use...
A Configurable Crystals-Kyber Hardware Implementation with Side-Channel Protection
Arpan Jati, Naina Gupta, Anupam Chattopadhyay, Somitra Kumar Sanadhya
Implementation
In this work, we present a configurable and side channel resistant implementation of the post-quantum key-exchange algorithm Crystals-Kyber. The implemented design can be configured for different performance and area requirements leading to different trade-offs for different applications. A low area implementation can be achieved in 5269 LUTs and 2422 FFs, whereas a high performance implementation required 7151 LUTs and 3730 FFs. Due to a deeply pipelined architecture, a high operating speed...
A Multiplatform Parallel Approach for Lattice Sieving Algorithms
Michał Andrzejczak, Kris Gaj
Implementation
Lattice sieving is currently the leading class of algorithms for solving the shortest vector problem over lattices. The computational difficulty of this problem is the basis for constructing secure post-quantum public-key cryptosystems based on lattices. In this paper, we present a novel massively parallel approach for solving the shortest vector problem using lattice sieving and hardware acceleration.
We combine previously reported algorithms with a proper caching strategy and develop...
CARiMoL: A Configurable Hardware Accelerator for Ring and Module Lattice-Based Post-Quantum Cryptography
Afifa Ishtiaq, Dr. Muhammad Shafique, Dr. Osman Hassan
Implementation
CARiMoL is a novel run-time Configurable Hardware Accelerator for Ring and Module Lattice-based post-quantum cryptography. Its flexible design can be configured for key-pair generation, encapsulation, and decapsulation for NewHope and CRYSTALS-Kyber schemes using the same hardware. CARiMoL offers run-time configurability for multiple security levels of NewHope and CRYSTALS-Kyber schemes, supporting both Chosen-Plaintext Attack (CPA) and Chosen-Ciphertext Attack (CCA) secure...
Accelerated RISC-V for Post-Quantum SIKE
Rami Elkhatib, Reza Azarderakhsh, Mehran Mozaffari-Kermani
Public-key cryptography
Software implementations of cryptographic algorithms are slow but highly flexible and relatively easy to implement. On the other hand, hardware implementations are usually faster but provide little flexibility and require a lot of time to implement efficiently. In this paper, we develop a hybrid software-hardware implementation of the third round of Supersingular Isogeny Key Encapsulation (SIKE), a post-quantum cryptography algorithm candidate for NIST. We implement an isogeny field...
High-Speed NTT-based Polynomial Multiplication Accelerator for CRYSTALS-Kyber Post-Quantum Cryptography
Mojtaba Bisheh-Niasar, Reza Azarderakhsh, Mehran Mozaffari-Kermani
Implementation
This paper demonstrates an architecture for accelerating the polynomial multiplication using number theoretic transform (NTT). Kyber is one of the finalists in the third round of the NIST post-quantum cryptography standardization process. Simultaneously, the performance of NTT execution is its main challenge, requiring large memory and complex memory access pattern. In this paper, an efficient NTT architecture is presented to improve the respective computation time. We propose several...
Kyber on ARM64: Compact Implementations of Kyber on 64-bit ARM Cortex-A Processors
Pakize Sanal, Emrah Karagoz, Hwajeong Seo, Reza Azarderakhsh, Mehran Mozaffari-Kermani
Implementation
Public-key cryptography based on the lattice problem is efficient and believed to be secure in a post-quantum era. In this paper, we introduce carefully optimized implementations of Kyber encryption schemes for 64-bit ARM Cortex-A processors. Our research contribution includes several optimizations for Number Theoretic Transform (NTT), noise sampling, and AES accelerator based symmetric function implementations. The proposed Kyber512 implementation on ARM64 improved previous works by 1.72×,...
A Hardware Accelerator for Polynomial Multiplication Operation of CRYSTALS-KYBER PQC Scheme
Ferhat Yaman, Ahmet Can Mert, Erdinç Öztürk, Erkay Savaş
Cryptographic protocols
Polynomial multiplication is one of the most time-consuming operations utilized in lattice-based post-quantum cryptography (PQC) schemes. CRYSTALS-KYBER is a lattice-based key encapsulation mechanism (KEM) and it was recently announced as one of the four finalists at round three in NIST's PQC Standardization. Therefore, efficient implementations of polynomial multiplication operation are crucial for high-performance CRYSTALS-KYBER applications. In this paper, we propose three different...
A Configurable Hardware Implementation of XMSS
Jan Philipp Thoma, Tim Güneysu
Implementation
Quantum computers are about to herald a new age of cryptography. As a fundamental building block in today’s digitalized world, Digital Signature Schemes (DSS) provide the ability to authenticate messages exchanged over untrusted channels. Unfortunately, virtually all currently used DSS are built upon mathematical problems that can efficiently be solved using quantum computers, thus rendering schemes such as RSA and ECC insecure. Due to its conservative security properties, the eXtended...
Efficient Number Theoretic Transform Implementation on GPU for Homomorphic Encryption
Ozgun Ozerk, Can Elgezen, Ahmet Can Mert, Erdinc Ozturk, Erkay Savas
Implementation
Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...
An Area Aware Accelerator for Elliptic Curve Point Multiplication
Malik Imran, Samuel Pagliarini, Muhammad Rashid
Public-key cryptography
This work presents a hardware accelerator, for the optimization of latency and area at the same time, to improve the performance of point multiplication process in Elliptic Curve Cryptography. In order to reduce the overall computation time in the proposed 2-stage pipelined architecture, a rescheduling of point addition and point doubling instructions is performed along with an efficient use of required memory locations. Furthermore, a 41-bit multiplier is also proposed. Consequently, the...
A Fast and Compact RISC-V Accelerator for Ascon and Friends
Stefan Steinegger, Robert Primas
Implementation
Ascon-p is the core building block of Ascon, the winner in the lightweight category of the CAESAR competition. With ISAP, another Ascon-p-based AEAD scheme is currently competing in the 2nd round of the NIST lightweight cryptography standardization project. In contrast to Ascon, ISAP focuses on providing hardening/protection against a large class of implementation attacks, such as DPA, DFA, SFA, and SIFA, entirely on mode-level. Consequently, Ascon-p can be used to realize a wide range of...
Proxy Re-Encryption for Accelerator Confidentiality in FPGA-Accelerated Cloud
Furkan Turan, Ingrid Verbauwhede
Cryptographic protocols
FPGAs offer many-fold acceleration to various application domains, and have become a part of cloud-based computation. However, their use in the cloud introduces Cloud Service Providers (CSPs) as trusted parties, who can access the hardware designs in plaintext. Therefore, the intellectual property of hardware designers is not protected against a dishonest cloud. In this paper, we propose a scheme for the confidentiality of accelerators on cloud, without limiting the CSPs' ability to maintain their resources freely....
CryptoPIM: In-memory Acceleration for Lattice-based Cryptographic Hardware
Hamid Nejatollahi, Saransh Gupta, Mohsen Imani, Tajana Simunic Rosing, Rosario Cammarota, Nikil Dutt
Implementation
Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptography (such as digital signature and key agreement) insecure. Lattice-Based Cryptography (LBC) is a promising post-quantum public-key cryptographic protocol that could replace standardized public-key cryptography, thanks to the inherent post-quantum resistant properties, efficiency, and versatility. A key mathematical...
A >100 Gbps Inline AES-GCM Hardware Engine and Protected DMA Transfers between SGX Enclave and FPGA Accelerator Device
Santosh Ghosh, Luis S Kida, Soham Jayesh Desai, Reshma Lal
Implementation
This paper proposes a method to protect DMA data transfer that can be used to offload computation to an accelerator. The proposal minimizes changes in the hardware platform and to the application and SW stack. The paper describes the end-to-end scheme to protect communication between an application running inside an SGX enclave and an FPGA accelerator optimized for bandwidth and latency and details the implementation of AES-GCM hardware engines with high bandwidth and low latency.
Investigating Profiled Side-Channel Attacks Against the DES Key Schedule
Johann Heyszl, Katja Miller, Florian Unterstein, Marc Schink, Alexander Wagner, Horst Gieser, Sven Freud, Tobias Damm, Dominik Klein, Dennis Kügler
Implementation
Recent publications describe profiled side-channel attacks (SCAs) against the DES key-schedule of a “commercially available security controller”. They report a significant reduction of the average remaining entropy of cryptographic keys after the attack, with large, key-dependent variations and results as low as a few bits using only a single attack trace. Unfortunately, they leave important questions unanswered: Is the reported wide distribution of results plausible? Are the results...
2019/1403
Last updated: 2019-12-14
No RISC, no Fun: Comparison of Hardware Accelerated Hash Functions for XMSS
Ingo Braun, Fabio Campos, Steffen Reith, Marc Stöttinger
Implementation
We investigate multiple implementations of a hash-based digital signature scheme in software and hardware for a RISC-V processor. For this, different instantiations of XMSS by leveraging SHA-256 and SHA-3 are considered. Moreover, we propose various optimisations for accelerating the signature scheme on resource-constrained FPGAs.
Compared to the pure software version, the implemented hardware accelerators for SHA-256 and SHA-3 achieve a significant speedup of 25x and 87x respectively for...
Voltage-based Covert Channels using FPGAs
Dennis R. E. Gnad, Cong Dang Khoa Nguyen, Syed Hashim Gillani, Mehdi B. Tahoori
Implementation
FPGAs are increasingly used in cloud applications and being integrated into Systems-on-Chip (SoCs). For these systems, various side-channel attacks on cryptographic implementations have been reported, motivating the application of proper countermeasures. Beyond cryptographic implementations, maliciously introduced covert channel receivers and transmitters can allow the exfiltration of other secret information from the FPGA. In this paper, we present a fast covert channel on FPGAs, which exploits the on-chip...
Hardware-Software Co-Design Based Obfuscation of Hardware Accelerators
Abhishek Chakraborty, Ankur Srivastava
Implementation
Existing logic obfuscation approaches aim to protect hardware design IPs from SAT attack by increasing query count and output corruptibility of a locked netlist. In this paper, we demonstrate the ineffectiveness of such techniques to obfuscate hardware accelerator platforms. Subsequently, we propose a Hardware/software co-design based Accelerator Obfuscation (HSCAO) scheme to provably safeguard the IP of such designs against SAT as well as removal/bypass type of attacks while still...
SPQCop: Side-channel protected Post-Quantum Cryptoprocessor
Arpan Jati, Naina Gupta, Anupam Chattopadhyay, Somitra Kumar Sanadhya
Implementation
The past few decades have seen significant progress in practically realizable quantum technologies. It is well known since the work of Peter Shor that large scale quantum computers will threaten the security of most of the currently used public key cryptographic algorithms. This has spurred the cryptography community to design algorithms which will remain safe even with the emergence of large scale quantum computing systems. An effort in this direction is the currently ongoing post-quantum...
SIKE'd Up: Fast and Secure Hardware Architectures for Supersingular Isogeny Key Encapsulation
Brian Koziel, A-Bon Ackie, Rami El Khatib, Reza Azarderakhsh, Mehran Mozaffari-Kermani
Implementation
In this work, we present a fast parallel architecture to perform supersingular isogeny key encapsulation (SIKE). We propose and implement a fast isogeny accelerator architecture that uses fast and parallelized isogeny formulas. On top of our isogeny accelerator, we build a novel architecture for the SIKE primitive, which provides both quantum and IND-CCA security. Since SIKE can support static keys, we propose and implement additional differential power analysis countermeasures. We...
FPGA-based High-Performance Parallel Architecture for Homomorphic Computing on Encrypted Data
Sujoy Sinha Roy, Furkan Turan, Kimmo Jarvinen, Frederik Vercauteren, Ingrid Verbauwhede
Implementation
Homomorphic encryption is a tool that enables computation on encrypted data and thus has applications in privacy-preserving cloud computing. Though conceptually amazing, implementation of homomorphic encryption is very challenging and typically software implementations on general purpose computers are extremely slow. In this paper we present our year long effort to design a domain specific architecture in a heterogeneous Arm+FPGA platform to accelerate homomorphic computing on encrypted...
Anonymous Attestation for IoT
Santosh Ghosh, Andrew H. Reinders, Rafael Misoczki, Manoj R. Sastry
Implementation
The Internet of Things (IoT) has seen tremendous growth, and IoT devices are being deployed pervasively in areas such as home, surveillance, health-care and transportation. These devices collect and process sensitive data with respect to the user's privacy. Protecting the privacy of the user is an essential aspect of security, and anonymous attestation of IoT devices is critical to enable privacy-preserving mechanisms. Enhanced Privacy ID (EPID) is an industry-standard cryptographic scheme that offers...
Design and Implementation of a Fast and Scalable NTT-Based Polynomial Multiplier Architecture
Ahmet Can Mert, Erdinc Ozturk, Erkay Savas
Implementation
In this paper, we present an optimized FPGA implementation of a novel, fast and highly parallelized NTT-based polynomial multiplier architecture, which proves to be effective as an accelerator for lattice-based homomorphic cryptographic schemes. As input-output (I/O) operations are as time-consuming as NTT operations during homomorphic computations in a host processor/accelerator setting, instead of achieving the fastest NTT implementation possible on the target FPGA, we focus on a balanced...
XMSS and Embedded Systems - XMSS Hardware Accelerators for RISC-V
Wen Wang, Bernhard Jungk, Julian Wälde, Shuwen Deng, Naina Gupta, Jakub Szefer, Ruben Niederhagen
Implementation
We describe a software-hardware co-design for the hash-based post-quantum signature scheme XMSS on a RISC-V embedded processor. We provide software optimizations for the XMSS reference implementation for SHA-256 parameter sets and several hardware accelerators that allow balancing area usage and performance based on individual needs. By integrating our hardware accelerators into the RISC-V processor, the version with the best time-area product generates a key pair (that can be used to...
Remote Inter-Chip Power Analysis Side-Channel Attacks at Board-Level
Falk Schellenberg, Dennis R. E. Gnad, Amir Moradi, Mehdi B. Tahoori
Implementation
The current practice in board-level integration is to incorporate chips and components from numerous vendors. A fully trusted supply chain for all used components and chipsets is an important, yet extremely difficult to achieve, prerequisite to validate a complete board-level system for safe and secure operation. An increasing risk is that most chips nowadays run software or firmware, typically updated throughout the system lifetime, making it practically impossible to validate the full...
Thunderella: Blockchains with Optimistic Instant Confirmation
Rafael Pass, Elaine Shi
Cryptographic protocols
State machine replication, or “consensus”, is a central abstraction for distributed systems where a set of nodes seek to agree on an ever-growing, linearly-ordered log. In this paper, we propose a practical new paradigm called Thunderella for achieving state machine replication by combining a fast, asynchronous path with a (slow) synchronous “fall-back” path (which only gets executed if something goes wrong); as a consequence, we get simple state machine replications that essentially are as...
End-to-end Design of a PUF-based Privacy Preserving Authentication Protocol
Aydin Aysu, Ege Gulcan, Daisuke Moriyama, Patrick Schaumont, Moti Yung
Implementation
We demonstrate a prototype implementation of a provably secure protocol that supports privacy-preserving mutual authentication between a server and a constrained device. Our proposed protocol is based on a physically unclonable function (PUF) and it is optimized for resource-constrained platforms. The reported results include a full protocol analysis, the design of its building blocks, their integration into a constrained device, and finally its performance evaluation. We show how to obtain...
cuHE: A Homomorphic Encryption Accelerator Library
Wei Dai, Berk Sunar
Implementation
We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using...
Power Analysis Attacks against IEEE 802.15.4 Nodes
Colin O'Flynn, Zhizhang Chen
Implementation
IEEE 802.15.4 is a wireless standard used by a variety of higher-level protocols, including many used in the Internet of Things (IoT). A number of system on a chip (SoC) devices that combine a radio transceiver with a microcontroller are available for use in IEEE 802.15.4 networks. IEEE 802.15.4 supports the use of AES-CCM* for encryption and authentication of messages, and a SoC normally includes an AES accelerator for this purpose. This work measures the leakage characteristics of the AES...
VLSI Implementation of Double-Base Scalar Multiplication on a Twisted Edwards Curve with an Efficiently Computable Endomorphism
Zhe Liu, Husen Wang, Johann Großschädl, Zhi Hu, Ingrid Verbauwhede
Implementation
The verification of an ECDSA signature requires a double-base scalar multiplication, an operation of the form $k \cdot G + l \cdot Q$ where $G$ is a generator of a large elliptic curve group of prime order $n$, $Q$ is an arbitrary element of said group, and $k$, $l$ are two integers in the range of $[1, n-1]$. We introduce in this paper an area-optimized VLSI design of a Prime-Field Arithmetic Unit (PFAU) that can serve as a loosely-coupled or tightly-coupled hardware accelerator in a...
Accelerating Somewhat Homomorphic Evaluation using FPGAs
Erdinç Öztürk, Yarkın Doröz, Berk Sunar, Erkay Savaş
Implementation
After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...
Side-Channel Security Analysis of Ultra-Low-Power FRAM-based MCUs
Amir Moradi, Gesine Hinterwälder
Implementation
By shrinking the technology and reducing the energy requirements of integrated circuits, producing ultra-low-power devices has practically become possible. Texas Instruments as a pioneer in developing FRAM-based products announced a couple of different microcontroller (MCU) families based on the low-power and fast Ferroelectric RAM technology. Such MCUs come with embedded cryptographic module(s) as well as the assertion that - due to the underlying ultra-low-power technology - mounting...
Efficient Pairings and ECC for Embedded Systems
Thomas Unterluggauer, Erich Wenger
The research on pairing-based cryptography brought forth a wide range of protocols interesting for future embedded applications. One significant obstacle for the widespread deployment of pairing-based cryptography is its tremendous hardware and software requirements. In this paper we present three side-channel protected hardware/software designs for pairing-based cryptography that are nonetheless small and practically fast: our plain ARM Cortex-M0+-based design computes a pairing in less than one second....
Three Snakes in One Hole: The First Systematic Hardware Accelerator Design for SOSEMANUK with Optional Serpent and SNOW 2.0 Modes
Goutam Paul, Anupam Chattopadhyay
With increasing usage of hardware accelerators in modern heterogeneous System-on-Chips (SoCs), the distinction between hardware and software is no longer rigid. The domain of cryptography is no exception and efficient hardware designs of so-called software ciphers are becoming increasingly popular. In this paper, for the first time we propose an efficient hardware accelerator design for SOSEMANUK, one of the finalists of the eSTREAM stream cipher competition in the software category. Since...
High-Throughput Hardware Architecture for the SWIFFT / SWIFFTX Hash Functions
Tamas Gyorfi, Octavian Cret, Guillaume Hanrot, Nicolas Brisebarre
Introduced in 1996 and greatly developed over the last few years, lattice-based cryptography offers a whole set of primitives with nice features, including provable security and asymptotic efficiency. Going from “asymptotic” to “real-world” efficiency seems important as the set of available primitives increases in size and functionality. In this present paper, we explore the improvements that can be obtained through the use of an FPGA architecture for implementing an ideal-lattice based...
Designing Integrated Accelerator for Stream Ciphers with Structural Similarities
Sourav Sen Gupta, Anupam Chattopadhyay, Ayesha Khalid
Implementation
To date, the basic idea for implementing stream ciphers has been confined to individual standalone designs. In this paper, we introduce the notion of integrated implementation of multiple stream ciphers within a single architecture, where the goal is to achieve area and throughput efficiency by exploiting the structural similarities of the ciphers at an algorithmic level. We present two case studies to support our idea.
First, we propose the merger of SNOW 3G and ZUC stream ciphers, which...
Optimal Eta Pairing on Supersingular Genus-2 Binary Hyperelliptic Curves
Diego F. Aranha, Jean-Luc Beuchat, Jérémie Detrey, Nicolas Estibals
Public-key cryptography
This article presents a novel pairing algorithm over supersingular genus-$2$ binary hyperelliptic curves. Starting from Vercauteren's work on optimal pairings, we describe how to exploit the action of the $2^{3m}$-th power Verschiebung in order to reduce the loop length of Miller's algorithm even further than the genus-$2$ $\eta_T$ approach.
As a proof of concept, we detail an optimized software implementation and an FPGA accelerator for computing the proposed optimal Eta pairing on a...
Compact hardware for computing the Tate pairing over 128-bit-security supersingular curves
Nicolas Estibals
Implementation
This paper presents a novel method for designing compact yet efficient hardware implementations of the Tate pairing over supersingular curves in small characteristic. Since such curves are usually restricted to lower levels of security because of their bounded embedding degree, aiming for the recommended security of 128 bits implies considering them over very large finite fields. We however manage to mitigate this effect by considering curves over field extensions of moderately-composite...
Garbled Circuits for Leakage-Resilience: Hardware Implementation and Evaluation of One-Time Programs
Kimmo Järvinen, Vladimir Kolesnikov, Ahmad-Reza Sadeghi, Thomas Schneider
The power of side-channel leakage attacks on cryptographic implementations is evident.
Today's practical defenses are typically attack-specific countermeasures against certain classes of side-channel attacks.
The demand for a more general solution has given rise to the recent theoretical research that aims to build provably leakage-resilient cryptography.
This direction is, however, very new and still largely lacks practitioners' evaluation with regard to both efficiency and practical...
Practical Improvements of Profiled Side-Channel Attacks on a Hardware Crypto-Accelerator
M. Abdelaziz Elaabid, Sylvain Guilley
This article investigates the relevance of the theoretical framework on profiled side-channel attacks presented by F.-X. Standaert et al. at Eurocrypt 2009. The analyses consist of a case study based on side-channel measurements acquired experimentally from a hardwired cryptographic accelerator. Therefore, with respect to previous formal analyses carried out on software measurements or on simulated data, the investigations we describe are more complex, due to the underlying chip’s...
Fast Architectures for the $\eta_T$ Pairing over Small-Characteristic Supersingular Elliptic Curves
Jean-Luc Beuchat, Jérémie Detrey, Nicolas Estibals, Eiji Okamoto, Francisco Rodríguez-Henríquez
Implementation
This paper is devoted to the design of fast parallel accelerators for the cryptographic $\eta_T$ pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of...
Hardware Accelerator for the Tate Pairing in Characteristic Three Based on Karatsuba-Ofman Multipliers
Jean-Luc Beuchat, Jérémie Detrey, Nicolas Estibals, Eiji Okamoto, Francisco Rodríguez-Henríquez
Implementation
This paper is devoted to the design of fast parallel accelerators for the cryptographic Tate pairing in characteristic three over supersingular elliptic curves. We propose here a novel hardware implementation of Miller's loop based on a pipelined Karatsuba-Ofman multiplier. Thanks to a careful selection of algorithms for computing the tower field arithmetic associated to the Tate pairing, we manage to keep the pipeline busy. We also describe the strategies we considered to design our...
FPGA and ASIC Implementations of the $\eta_T$ Pairing in Characteristic Three
Jean-Luc Beuchat, Hiroshi Doi, Kaoru Fujita, Atsuo Inomata, Piseth Ith, Akira Kanaoka, Masayoshi Katouno, Masahiro Mambo, Eiji Okamoto, Takeshi Okamoto, Takaaki Shiga, Masaaki Shirase, Ryuji Soga, Tsuyoshi Takagi, Ananda Vithanage, Hiroyasu Yamamoto
Implementation
Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. As they rely critically on efficient algorithms and implementations of pairing primitives, the study of hardware accelerators became an active research area.
In this paper, we propose two coprocessors for the reduced $\eta_T$ pairing introduced by Barreto et al. as an alternative means of computing the Tate pairing on...
A Comparison Between Hardware Accelerators for the Modified Tate Pairing over $\mathbb{F}_{2^m}$ and $\mathbb{F}_{3^m}$
Jean-Luc Beuchat, Nicolas Brisebarre, Jérémie Detrey, Eiji Okamoto, Francisco Rodríguez-Henríquez
Implementation
In this article we propose a study of the modified Tate pairing in characteristics two and three. Starting from the $\eta_T$ pairing introduced by Barreto et al. (Des Codes Crypt, 2007), we detail various algorithmic improvements in the case of characteristic two. As far as characteristic three is concerned, we refer to the survey by Beuchat et al. (ePrint 2007-417). We then show how to get back to the modified Tate pairing at almost no extra cost. Finally, we explore the...
Algorithms and Arithmetic Operators for Computing the $\eta_T$ Pairing in Characteristic Three
Jean-Luc Beuchat, Nicolas Brisebarre, Jérémie Detrey, Eiji Okamoto, Masaaki Shirase, Tsuyoshi Takagi
Implementation
Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. Software implementations being rather slow, the study of hardware architectures became an active research area.
In this paper, we discuss several algorithms to compute the $\eta_T$ pairing in characteristic three and suggest further improvements. These algorithms involve addition, multiplication, cubing, inversion, and...
Executing Modular Exponentiation on a Graphics Accelerator
Andrew Moss, Dan Page, Nigel Smart
Implementation
Demand in the consumer market for graphics hardware that accelerates rendering of 3D images has resulted in commodity devices capable of astonishing levels of performance. These results were achieved by specifically tailoring the hardware for the target domain. As graphics accelerators become increasingly programmable this performance makes them an attractive target for other domains. Specifically, they have motivated the transformation of costly algorithms from a general purpose...
Arithmetic Operators for Pairing-Based Cryptography
Jean-Luc Beuchat, Nicolas Brisebarre, Jérémie Detrey, Eiji Okamoto
Implementation
Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. Software implementations being rather slow, the study of hardware architectures became an active research area. In this paper, we first study an accelerator for the $\eta_T$ pairing over $\mathbb{F}_3[x]/(x^{97}+x^{12}+2)$. Our architecture is based on a unified arithmetic operator which performs addition, multiplication, and...
A Coprocessor for the Final Exponentiation of the $\eta_T$ Pairing in Characteristic Three
Jean-Luc Beuchat, Nicolas Brisebarre, Masaaki Shirase, Tsuyoshi Takagi, Eiji Okamoto
Implementation
Since the introduction of pairings over (hyper)elliptic curves in constructive cryptographic applications, an ever increasing number of protocols based on pairings have appeared in the literature. Software implementations being rather slow, the study of hardware architectures became an active research area. Beuchat et al. proposed for instance a coprocessor which computes the characteristic three $\eta_T$ pairing, from which the Tate pairing can easily be derived, in $33\,\mu$s on a...
An Algorithm for the $\eta_T$ Pairing Calculation in Characteristic Three and its Hardware Implementation
Jean-Luc Beuchat, Masaaki Shirase, Tsuyoshi Takagi, Eiji Okamoto
Implementation
In this paper, we propose a modified $\eta_T$ pairing algorithm in characteristic three which does not need any cube root extraction. We also discuss its implementation on a low cost platform which hosts an Altera Cyclone II FPGA device. Our pairing accelerator is ten times faster than previous known FPGA implementations in characteristic three.
A Parallelization of ECDSA Resistant to Simple Power Analysis Attacks
Sarang Aravamuthan, Viswanatha Rao Thumparthy
Implementation
The Elliptic Curve Digital Signature Algorithm admits a natural parallelization wherein the point multiplication step can be split in two parts and executed in parallel. Further parallelism is achieved by executing a portion of the multiprecision arithmetic operations in parallel with point multiplication. This results in a saving in timing as well as gate count when the two paths are implemented in hardware and software. This article attempts to exploit this parallelism in a typical system...
The transition to quantum-safe public-key cryptography has begun: for key agreement, NIST has standardized ML-KEM and selected HQC for future standardization. The relative immaturity of these schemes encourages crypto-agile implementations, to facilitate easy transitions between them. Intelligent crypto-agility requires efficient sharing strategies to compute operations from different cryptosystems using the same resources. This is particularly challenging for cryptosystems with distinct...
In this paper, we present a side-channel attack on the hardware AES accelerator of a Bluetooth chip used in millions of devices worldwide, ranging from wearables and smart home products to industrial IoT. The attack leverages information about AES computations unintentionally transmitted by the chip together with RF signals to recover the encryption key. Unlike traditional side-channel attacks that rely on power or near-field electromagnetic emissions as sources of information, RF-based...
Side-channel attacks pose significant threats to cryptographic implementations, which require the inclusion of countermeasures to mitigate these attacks. In this work, we study the masking of state-of-the-art post-quantum signatures based on the MPC-in-the-head paradigm. More precisely, we focus on the recent threshold-computation-in-the-head (TCitH) framework that applies to some NIST candidates of the post-quantum standardization process. We first provide an analysis of side-channel attack...
This work showcases Quatorze-bis, a state-of-the-art Number Theoretic Transform circuit for TFHE-like cryptosystems on FPGAs. It contains a novel design for modular multiplication by a constant under a constant modulus. This design does not require any DSP units or any dedicated multiplier unit, nor does it require extra logic when compared to the state-of-the-art modular multipliers. Furthermore, we present an implementation of a constant...
The paper is dedicated to Multivariate Cryptography over a general commutative ring K and to protocols of symbolic computations for safe delivery of multivariate maps. We consider an iterative algorithm for generating multivariate maps of prescribed degree or density with a trapdoor accelerator, i.e. a piece of information that allows one to compute the reimage of the map in polynomial time. The concept of Jordan-Gauss temporal graphs is used for the obfuscation of known graph based public keys...
Zero-knowledge proofs (ZKPs) are cryptographic protocols that enable one party to prove the validity of a statement without revealing any information beyond its truth. A central building block in many ZKPs is the polynomial commitment scheme (PCS), where constructions with linear-time provers are especially attractive. Two such examples are Brakedown and its extension Orion, which enable linear-time and quantum-resistant proving by leveraging linear-time encodable Spielman codes....
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
Modern cryptographic techniques such as fully homomorphic encryption (FHE) have recently gained broad attention. Most of these cryptosystems rely on lattice problems wherein polynomial multiplication forms the computational bottleneck. A popular method to accelerate these polynomial multiplications is the Number-Theoretic Transformation (NTT). Recent works aim to improve the practical deployability of NTT and propose toolchains supporting the NTT hardware accelerator design processes....
Jordan-Gauss graphs are bipartite graphs given by special quadratic equations over the commutative ring K with unity with partition sets K^n and K^m, n ≥ m, such that the neighbour of each vertex is defined by a system of linear equations given in its row-echelon form. We use families of these graphs for the construction of new quadratic and cubic surjective multivariate maps F of K^n onto K^m (or K^n onto K^n) with the trapdoor accelerators T, i.e. pieces of information which...
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
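The Bézout identity quoted in the abstract above can be illustrated with a minimal sketch of the textbook extended Euclidean algorithm. This is not the accelerated algorithm the paper proposes; the function name `xgcd` and the sample inputs are assumptions chosen for illustration, and the sketch assumes non-negative integer inputs.

```python
from typing import Tuple


def xgcd(a: int, b: int) -> Tuple[int, int, int]:
    """Textbook extended Euclid (illustrative only, not constant-time).

    Returns (g, b_a, b_b) such that g = GCD(a, b) = a*b_a + b*b_b.
    Assumes a and b are non-negative integers.
    """
    old_r, r = a, b      # remainders
    old_s, s = 1, 0      # running coefficients of a
    old_t, t = 0, 1      # running coefficients of b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


# Usage: the returned coefficients satisfy the Bezout identity.
g, b_a, b_b = xgcd(240, 46)
assert g == 240 * b_a + 46 * b_b
```

The loop keeps the invariant `old_r == a*old_s + b*old_t` at every step, which is why the final coefficients satisfy the identity. Hardware-oriented XGCD designs typically replace the division with shift-and-subtract steps, which this sketch does not attempt.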
Multi-scalar multiplication (MSM) is the most computation-intensive part of proof generation in zero-knowledge proofs (ZKPs). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes the pipelined Point Addition Unit (PAU) with a hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
In this study, we introduce a new approach to secure computing by implementing a platform that utilizes a non-volatile memory express (NVMe)-based system with an FPGA-based Torus fully homomorphic encryption (TFHE) accelerator, solid state drive (SSD), and middleware on the host-side. Our platform is the first to offer completely secure computing capabilities for TFHE by using an FPGA-based accelerator. We defined secure computing instructions to evaluate 14-bit to 14-bit functions using...
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
We suggest the family of ciphers s^E^n, n = 2, 3, ..., with the space of plaintexts (Z*_{2^s})^n, s > 1, such that the encryption map is a composition of the kind G = G_1A_1G_2A_2, where the A_i are affine transformations from AGL_n(Z_{2^s}) preserving the variety (Z*_{2^s})^n, and the Eulerian endomorphisms G_i, i = 1, 2, of K[x_1, x_2, ..., x_n] move x_i to a monomial term ϻ(x_1)^{d(1)}(x_2)^{d(2)}...(x_n)^{d(n)}, ϻ ∈ Z*_{2^s}, and act on (Z*_{2^s})^n as bijective transformations. The cipher is...
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation. We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
Homomorphic encryption (HE) has gained broad attention in recent years as it allows computations on encrypted data enabling secure cloud computing. Deploying HE presents a notable challenge since it introduces a performance overhead by orders of magnitude. Hence, most works target accelerating server-side operations on hardware platforms, while little attention has been given to client-side operations. In this paper, we present a novel design methodology to implement and accelerate the...
VDFs are characterized by sequential function evaluation but an immediate output verification. In order to ensure secure use of VDFs in real-world applications, it is important to determine the fastest implementation. Considering the point of view of an attacker (say with unbounded resources), this paper aims to accelerate the isogeny-based VDF proposed by De Feo-Mason-Petit-Sanso in 2019. It is the first work that implements a hardware accelerator for the evaluation step of an isogeny VDF....
NIST issued a new call in 2023 to diversify the portfolio of quantum-resistant digital signature schemes since the current portfolio relies on lattice problems. The MAYO scheme, which builds on the Unbalanced Oil and Vinegar (UOV) problem, is a promising candidate for this new call. MAYO introduces emulsifier maps and a novel 'whipping' technique to significantly reduce the key sizes compared to previous UOV schemes. This paper provides a comprehensive analysis of the implementation...
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they encounter common challenges associated with large-scale...
In this brief, we realize different architectural techniques towards improving the performance of post-quantum cryptography (PQC) algorithms when implemented as hardware accelerators on an application-specific integrated circuit (ASIC) platform. Having SABER as a case study, we designed a 256-bit wide architecture geared for high-speed cryptographic applications that incorporates smaller and distributed SRAM memory blocks. Moreover, we have adapted the building blocks of SABER to process...
The Number Theoretic Transform (NTT) is used to efficiently execute polynomial multiplication. It has become an important part of lattice-based post-quantum methods and the subsequent generation of standard cryptographic systems. However, implementing post-quantum schemes is challenging since they rely on intricate structures. This paper demonstrates how to develop a high-speed NTT multiplier highly optimized for FPGAs with few logical resources. We describe a novel architecture for NTT...
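How an NTT turns polynomial multiplication into pointwise multiplication can be sketched with toy parameters. The modulus `Q = 17`, length `N = 8`, and root `OMEGA = 2` below are assumptions chosen only so the example is easy to check by hand; they are not parameters from the paper above. The sketch also computes a cyclic product (modulo x^N - 1), whereas lattice schemes usually need the negacyclic variant.

```python
Q = 17      # toy prime modulus (assumption for illustration)
N = 8       # transform length, a power of two
OMEGA = 2   # primitive N-th root of unity mod Q: 2^8 = 1 and 2^4 = -1 (mod 17)


def ntt(a, omega, q):
    """Recursive radix-2 decimation-in-time NTT of a list whose length is a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = ntt(a[0::2], omega * omega % q, q)
    odd = ntt(a[1::2], omega * omega % q, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q            # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % q   # butterfly: lower half
        w = w * omega % q
    return out


def cyclic_mul_ntt(a, b, omega=OMEGA, q=Q):
    """Multiply a and b modulo (x^n - 1, q) via forward NTT, pointwise product, inverse NTT."""
    n = len(a)
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    fc = [x * y % q for x, y in zip(fa, fb)]
    inv_omega = pow(omega, -1, q)   # modular inverse, Python 3.8+
    inv_n = pow(n, -1, q)
    return [x * inv_n % q for x in ntt(fc, inv_omega, q)]


def cyclic_mul_schoolbook(a, b, q=Q):
    """Reference O(n^2) cyclic convolution used to check the NTT result."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
assert cyclic_mul_ntt(a, b) == cyclic_mul_schoolbook(a, b)
```

The inverse transform reuses the same routine with the inverse root and a final scaling by the inverse of the length, which is one reason hardware designs can share a single butterfly unit between the forward and inverse NTT.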
The hard mathematical problems that assure the security of our current public-key cryptography (RSA, ECC) will be broken if and when a quantum computer appears, rendering them ineffective for use in the quantum era. Lattice-based cryptography is a novel approach to public-key cryptography which, according to the mathematical investigation so far, resists attacks from quantum computers. By choosing a module learning with errors (MLWE) algorithm as the next standard, National Institute of Standard &...
The magic of Fully Homomorphic Encryption (FHE) is that it allows operations on encrypted data without decryption. Unfortunately, the slow computation time limits its adoption. The slow computation time results from the vast memory requirements (64 Kbits per ciphertext), a bootstrapping key of 1.3 GB, and sizeable computational overhead (10240 NTTs, each NTT requiring 5120 32-bit multiplications). We accelerate the FHEW bootstrapping in hardware on a high-end U280 FPGA. To reduce the...
Secure computation is of critical importance not only to the DoD, but also across financial institutions, healthcare, and anywhere personally identifiable information (PII) is accessed. Traditional security techniques require data to be decrypted before performing any computation. When processed on untrusted systems, the decrypted data is vulnerable to attacks that extract the sensitive information. To address these vulnerabilities, Fully Homomorphic Encryption (FHE) keeps the data encrypted...
Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have received limited use due to their extreme overheads of running on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512,...
During the last decade, there has been stunning progress in the domain of AI, with adoption in both safety-critical and security-critical applications. A key requirement for this is highly trained Machine Learning (ML) models, which are valuable Intellectual Property (IP) of the respective organizations. Naturally, these models have become targets for model recovery attacks through side-channel leakage. However, the majority of the attacks reported in the literature are either on simple embedded...
Because of the rapid growth of the Internet of Things (IoT), embedded systems have become an interesting target for experienced attackers. ESP32 is a low-cost and low-power system on chip (SoC) series created by Espressif Systems. The firmware extraction of such embedded systems is a real threat to the manufacturer as it breaks its intellectual property and raises the risk of creating equivalent systems with less effort and resources. In 2019,...
Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation. We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT...
A multivariate rule x_i -> f_i, i = 1, 2, ..., n, with f_i from K[x_1, x_2, ..., x_n] over a commutative ring K defines an endomorphism σ_n of K[x_1, x_2, ..., x_n] into itself given by its values on the variables x_i. The degree of σ_n can be defined as the maximum of the degrees of the polynomials f_i. We say that a family σ_n, n = 2, 3, ..., has a trapdoor accelerator ^nT if knowledge of the piece of information ^nT allows one to compute the reimage x of y = σ_n(x) in time O(n^2). We use extremal algebraic graphs for the...
CRYSTALS-Kyber (Kyber) was recently chosen as the first quantum resistant Key Encapsulation Mechanism (KEM) scheme for standardisation, after three rounds of the National Institute of Standards and Technology (NIST) initiated PQC competition, which began in 2016 in search of the best quantum resistant KEMs and digital signatures. Kyber is based on the Module-Learning with Errors (M-LWE) class of Lattice-based Cryptography, which is known to manifest efficiently on FPGAs. This work explores...
Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. Noting the extraordinary computing power of AI accelerators, in this paper we further explore the feasibility of introducing AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning or neural networks, the biggest challenge is how to transform cryptographic workloads into their operations,...
Fully Homomorphic Encryption (FHE) allows for secure computation on encrypted data. Unfortunately, huge memory size, computational cost and bandwidth requirements limit its practicality. We present BASALISC, an architecture family of hardware accelerators that aims to substantially accelerate FHE computations in the cloud. BASALISC is the first to implement the BGV scheme with fully-packed bootstrapping – the noise removal capability necessary for arbitrary-depth computation. It supports a...
Along with the rapid development in building large-scale quantum computers, post-quantum cryptography (PQC) has recently drawn significant attention from the research community, as it is proven that the existing public-key cryptosystems are vulnerable to quantum attacks. Following this direction, this paper presents a novel implementation of high-performance polynomial multiplication hardware accelerators for key encapsulation mechanism (KEM) Saber and NTRU, two PQC algorithms that are currently...
Quantum computers will break cryptographic primitives that are based on integer factorization and discrete logarithm problems. SABER is a key agreement scheme based on the Learning With Rounding problem that is quantum-safe, i.e., resistant to quantum computer attacks. This article presents a high-speed silicon implementation of SABER in a 65nm technology as an Application Specific Integrated Circuit. The chip measures 1 mm$^2$ in size and can operate at a maximum frequency of 715 MHz at a...
The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature. By invoking multiple...
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data. First, we propose a divide-and-conquer technique that...
Fully homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations. In this paper, we present a complete instruction-set processor architecture ‘Medha’ for accelerating the cloud-side operations of an RNS variant of the HEAAN homomorphic encryption scheme. Medha has been designed following a modular hardware design approach to attain a fast computation time for computationally expensive homomorphic...
In this paper, we introduce a configurable hardware architecture that can be used to generate unified and parametric NTT-based polynomial multipliers that support a wide range of parameters of lattice-based cryptographic schemes proposed for post-quantum cryptography. Both NTT and inverse NTT operations can be performed using the unified butterfly unit of our architecture, which constitutes the core building block in NTT operations. The multitude of this unit plays an essential role in...
The extended GCD (XGCD) calculation, which computes Bézout coefficients b_a, b_b such that b_a ∗ a_0 + b_b ∗ b_0 = GCD(a_0, b_0), is a critical operation in many cryptographic applications. In particular, large-integer XGCD is computationally dominant for two applications of increasing interest: verifiable delay functions that square binary quadratic forms within a class group and constant-time modular inversion for elliptic curve cryptography. Most prior work has focused on fast software...
This paper presents a design space exploration for SABER, one of the finalists in NIST’s quantum-resistant public-key cryptographic standardization effort. Our design space exploration targets a 65nm ASIC platform and has resulted in the evaluation of 6 different architectures. Our exploration is initiated by setting a baseline architecture which is ported from FPGA. In order to improve the clock frequency (the primary goal in our exploration), we have employed several optimizations: (i) use...
In this work, we present a configurable and side-channel-resistant implementation of the post-quantum key-exchange algorithm Crystals-Kyber. The implemented design can be configured for different performance and area requirements leading to different trade-offs for different applications. A low area implementation can be achieved in 5269 LUTs and 2422 FFs, whereas a high performance implementation requires 7151 LUTs and 3730 FFs. Due to a deeply pipelined architecture, a high operating speed...
Lattice sieving is currently the leading class of algorithms for solving the shortest vector problem over lattices. The computational difficulty of this problem is the basis for constructing secure post-quantum public-key cryptosystems based on lattices. In this paper, we present a novel massively parallel approach for solving the shortest vector problem using lattice sieving and hardware acceleration. We combine previously reported algorithms with a proper caching strategy and develop...
CARiMoL is a novel run-time Configurable Hardware Accelerator for Ring and Module Lattice-based post-quantum cryptography. Its flexible design can be configured for key-pair generation, encapsulation, and decapsulation for NewHope and CRYSTALS-Kyber schemes using the same hardware. CARiMoL offers run-time configurability for multiple security levels of NewHope and CRYSTALS-Kyber schemes, supporting both Chosen-Plaintext Attack (CPA) and Chosen-Ciphertext Attack (CCA) secure...
Software implementations of cryptographic algorithms are slow but highly flexible and relatively easy to implement. On the other hand, hardware implementations are usually faster but provide little flexibility and require a lot of time to implement efficiently. In this paper, we develop a hybrid software-hardware implementation of the third round of Supersingular Isogeny Key Encapsulation (SIKE), a post-quantum cryptography algorithm candidate for NIST. We implement an isogeny field...
This paper demonstrates an architecture for accelerating the polynomial multiplication using number theoretic transform (NTT). Kyber is one of the finalists in the third round of the NIST post-quantum cryptography standardization process. Simultaneously, the performance of NTT execution is its main challenge, requiring large memory and complex memory access pattern. In this paper, an efficient NTT architecture is presented to improve the respective computation time. We propose several...
Public-key cryptography based on the lattice problem is efficient and believed to be secure in a post-quantum era. In this paper, we introduce carefully optimized implementations of Kyber encryption schemes for 64-bit ARM Cortex-A processors. Our research contribution includes several optimizations for Number Theoretic Transform (NTT), noise sampling, and AES accelerator based symmetric function implementations. The proposed Kyber512 implementation on ARM64 improved previous works by 1.72×,...
Polynomial multiplication is one of the most time-consuming operations utilized in lattice-based post-quantum cryptography (PQC) schemes. CRYSTALS-KYBER is a lattice-based key encapsulation mechanism (KEM) and it was recently announced as one of the four finalists at round three in NIST's PQC Standardization. Therefore, efficient implementations of polynomial multiplication operation are crucial for high-performance CRYSTALS-KYBER applications. In this paper, we propose three different...
Quantum computers are about to herald a new age of cryptography. As a fundamental building block in today’s digitalized world, Digital Signature Schemes (DSS) provide the ability to authenticate messages exchanged over untrusted channels. Unfortunately, virtually all currently used DSS are built upon mathematical problems that can efficiently be solved using quantum computers, thus rendering schemes such as RSA and ECC insecure. Due to its conservative security properties, the eXtended...
Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...
This work presents a hardware accelerator that optimizes latency and area simultaneously to improve the performance of point multiplication in Elliptic Curve Cryptography. In order to reduce the overall computation time in the proposed 2-stage pipelined architecture, a rescheduling of point addition and point doubling instructions is performed along with an efficient use of required memory locations. Furthermore, a 41-bit multiplier is also proposed. Consequently, the...
Ascon-p is the core building block of Ascon, the winner in the lightweight category of the CAESAR competition. With ISAP, another Ascon-p-based AEAD scheme is currently competing in the 2nd round of the NIST lightweight cryptography standardization project. In contrast to Ascon, ISAP focuses on providing hardening/protection against a large class of implementation attacks, such as DPA, DFA, SFA, and SIFA, entirely on mode-level. Consequently, Ascon-p can be used to realize a wide range of...
FPGAs offer many-fold acceleration to various application domains, and have become a part of cloud-based computation. However, their cloud use introduces Cloud Service Providers (CSPs) as trusted parties, who can access the hardware designs in plaintext. Therefore, the intellectual property of hardware designers is not protected against a dishonest cloud. In this paper, we propose a scheme for the confidentiality of accelerators in the cloud that does not limit the CSP's freedom to maintain its resources....
Quantum computers promise to solve hard mathematical problems such as integer factorization and discrete logarithms in polynomial time, making standardized public-key cryptography (such as digital signature and key agreement) insecure. Lattice-Based Cryptography (LBC) is a promising post-quantum public-key cryptographic protocol that could replace standardized public-key cryptography, thanks to the inherent post-quantum resistant properties, efficiency, and versatility. A key mathematical...
This paper proposes a method to protect DMA data transfer that can be used to offload computation to an accelerator. The proposal minimizes changes in the hardware platform and to the application and SW stack. The paper describes the end-to-end scheme to protect communication between an application running inside an SGX enclave and an FPGA accelerator optimized for bandwidth and latency, and details the implementation of AES-GCM hardware engines with high bandwidth and low latency.
Recent publications describe profiled side-channel attacks (SCAs) against the DES key-schedule of a “commercially available security controller”. They report a significant reduction of the average remaining entropy of cryptographic keys after the attack, with large, key-dependent variations and results as low as a few bits using only a single attack trace. Unfortunately, they leave important questions unanswered: Is the reported wide distribution of results plausible? Are the results...
We investigate multiple implementations of a hash-based digital signature scheme in software and hardware for a RISC-V processor. For this, different instantiations of XMSS by leveraging SHA-256 and SHA-3 are considered. Moreover, we propose various optimisations for accelerating the signature scheme on resource-constrained FPGAs. Compared to the pure software version, the implemented hardware accelerators for SHA-256 and SHA-3 achieve a significant speedup of 25x and 87x respectively for...
FPGAs are increasingly used in cloud applications and being integrated into Systems-on-Chip (SoCs). For these systems, various side-channel attacks on cryptographic implementations have been reported, motivating the application of proper countermeasures. Beyond cryptographic implementations, maliciously introduced covert channel receivers and transmitters can allow the exfiltration of other secret information from the FPGA. In this paper, we present a fast covert channel on FPGAs, which exploits the on-chip...
Existing logic obfuscation approaches aim to protect hardware design IPs from SAT attack by increasing query count and output corruptibility of a locked netlist. In this paper, we demonstrate the ineffectiveness of such techniques to obfuscate hardware accelerator platforms. Subsequently, we propose a Hardware/software co-design based Accelerator Obfuscation (HSCAO) scheme to provably safeguard the IP of such designs against SAT as well as removal/bypass type of attacks while still...
The past few decades have seen significant progress in practically realizable quantum technologies. It is well known since the work of Peter Shor that large scale quantum computers will threaten the security of most of the currently used public key cryptographic algorithms. This has spurred the cryptography community to design algorithms which will remain safe even with the emergence of large scale quantum computing systems. An effort in this direction is the currently ongoing post-quantum...
In this work, we present a fast parallel architecture to perform supersingular isogeny key encapsulation (SIKE). We propose and implement a fast isogeny accelerator architecture that uses fast and parallelized isogeny formulas. On top of our isogeny accelerator, we build a novel architecture for the SIKE primitive, which provides both quantum and IND-CCA security. Since SIKE can support static keys, we propose and implement additional differential power analysis countermeasures. We...
Homomorphic encryption is a tool that enables computation on encrypted data and thus has applications in privacy-preserving cloud computing. Though conceptually amazing, implementation of homomorphic encryption is very challenging and typically software implementations on general purpose computers are extremely slow. In this paper we present our year-long effort to design a domain specific architecture in a heterogeneous Arm+FPGA platform to accelerate homomorphic computing on encrypted...
Internet of Things (IoT) devices have seen tremendous growth and are being deployed pervasively in areas such as home, surveillance, health-care and transportation. These devices collect and process sensitive data with respect to the user's privacy. Protecting the privacy of the user is an essential aspect of security, and anonymous attestation of IoT devices is critical to enable privacy-preserving mechanisms. Enhanced Privacy ID (EPID) is an industry-standard cryptographic scheme that offers...
In this paper, we present an optimized FPGA implementation of a novel, fast and highly parallelized NTT-based polynomial multiplier architecture, which proves to be effective as an accelerator for lattice-based homomorphic cryptographic schemes. As input-output (I/O) operations are as time-consuming as NTT operations during homomorphic computations in a host processor/accelerator setting, instead of achieving the fastest NTT implementation possible on the target FPGA, we focus on a balanced...
We describe a software-hardware co-design for the hash-based post-quantum signature scheme XMSS on a RISC-V embedded processor. We provide software optimizations for the XMSS reference implementation for SHA-256 parameter sets and several hardware accelerators that allow balancing area usage and performance based on individual needs. By integrating our hardware accelerators into the RISC-V processor, the version with the best time-area product generates a key pair (that can be used to...
The current practice in board-level integration is to incorporate chips and components from numerous vendors. A fully trusted supply chain for all used components and chipsets is an important, yet extremely difficult to achieve, prerequisite to validate a complete board-level system for safe and secure operation. An increasing risk is that most chips nowadays run software or firmware, typically updated throughout the system lifetime, making it practically impossible to validate the full...
State machine replication, or “consensus”, is a central abstraction for distributed systems where a set of nodes seek to agree on an ever-growing, linearly-ordered log. In this paper, we propose a practical new paradigm called Thunderella for achieving state machine replication by combining a fast, asynchronous path with a (slow) synchronous “fall-back” path (which only gets executed if something goes wrong); as a consequence, we get simple state machine replications that essentially are as...
We demonstrate a prototype implementation of a provably secure protocol that supports privacy-preserving mutual authentication between a server and a constrained device. Our proposed protocol is based on a physically unclonable function (PUF) and it is optimized for resource-constrained platforms. The reported results include a full protocol analysis, the design of its building blocks, their integration into a constrained device, and finally its performance evaluation. We show how to obtain...
We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using...
IEEE 802.15.4 is a wireless standard used by a variety of higher-level protocols, including many used in the Internet of Things (IoT). A number of system on a chip (SoC) devices that combine a radio transceiver with a microcontroller are available for use in IEEE 802.15.4 networks. IEEE 802.15.4 supports the use of AES-CCM* for encryption and authentication of messages, and a SoC normally includes an AES accelerator for this purpose. This work measures the leakage characteristics of the AES...
The verification of an ECDSA signature requires a double-base scalar multiplication, an operation of the form $k \cdot G + l \cdot Q$ where $G$ is a generator of a large elliptic curve group of prime order $n$, $Q$ is an arbitrary element of said group, and $k$, $l$ are two integers in the range of $[1, n-1]$. We introduce in this paper an area-optimized VLSI design of a Prime-Field Arithmetic Unit (PFAU) that can serve as a loosely-coupled or tightly-coupled hardware accelerator in a...
After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...
By shrinking the technology and reducing the energy requirements of integrated circuits, producing ultra-low-power devices has practically become possible. Texas Instruments, as a pioneer in developing FRAM-based products, announced a couple of different microcontroller (MCU) families based on the low-power and fast Ferroelectric RAM technology. Such MCUs come with embedded cryptographic module(s) as well as the assertion that, due to the underlying ultra-low-power technology, mounting...
The research on pairing-based cryptography brought forth a wide range of protocols interesting for future embedded applications. One significant obstacle to the widespread deployment of pairing-based cryptography is its tremendous hardware and software requirements. In this paper we present three side-channel protected hardware/software designs for pairing-based cryptography that are nonetheless small and practically fast: our plain ARM Cortex-M0+-based design computes a pairing in less than one second....
With increasing usage of hardware accelerators in modern heterogeneous System-on-Chips (SoCs), the distinction between hardware and software is no longer rigid. The domain of cryptography is no exception, and efficient hardware designs of so-called software ciphers are becoming increasingly popular. In this paper, for the first time we propose an efficient hardware accelerator design for SOSEMANUK, one of the finalists of the eSTREAM stream cipher competition in the software category. Since...
Introduced in 1996 and greatly developed over the last few years, lattice-based cryptography offers a whole set of primitives with nice features, including provable security and asymptotic efficiency. Going from "asymptotic" to "real-world" efficiency seems important as the set of available primitives increases in size and functionality. In this paper, we explore the improvements that can be obtained through the use of an FPGA architecture for implementing an ideal-lattice based...
Till date, the basic idea for implementing stream ciphers has been confined to individual standalone designs. In this paper, we introduce the notion of integrated implementation of multiple stream ciphers within a single architecture, where the goal is to achieve area and throughput efficiency by exploiting the structural similarities of the ciphers at an algorithmic level. We present two case studies to support our idea. First, we propose the merger of SNOW 3G and ZUC stream ciphers, which...
This article presents a novel pairing algorithm over supersingular genus-$2$ binary hyperelliptic curves. Starting from Vercauteren's work on optimal pairings, we describe how to exploit the action of the $2^{3m}$-th power Verschiebung in order to reduce the loop length of Miller's algorithm even further than the genus-$2$ $\eta_T$ approach. As a proof of concept, we detail an optimized software implementation and an FPGA accelerator for computing the proposed optimal Eta pairing on a...
This paper presents a novel method for designing compact yet efficient hardware implementations of the Tate pairing over supersingular curves in small characteristic. Since such curves are usually restricted to lower levels of security because of their bounded embedding degree, aiming for the recommended security of 128 bits implies considering them over very large finite fields. We however manage to mitigate this effect by considering curves over field extensions of moderately-composite...
The power of side-channel leakage attacks on cryptographic implementations is evident. Today's practical defenses are typically attack-specific countermeasures against certain classes of side-channel attacks. The demand for a more general solution has given rise to the recent theoretical research that aims to build provably leakage-resilient cryptography. This direction is, however, very new and still largely lacks practitioners' evaluation with regard to both efficiency and practical...
This article investigates the relevance of the theoretical framework on profiled side-channel attacks presented by F.-X. Standaert et al. at Eurocrypt 2009. The analyses consist of a case study based on side-channel measurements acquired experimentally from a hardwired cryptographic accelerator. Therefore, with respect to previous formal analyses carried out on software measurements or on simulated data, the investigations we describe are more complex, due to the underlying chip’s...
This paper is devoted to the design of fast parallel accelerators for the cryptographic $\eta_T$ pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of...
This paper is devoted to the design of fast parallel accelerators for the cryptographic Tate pairing in characteristic three over supersingular elliptic curves. We propose here a novel hardware implementation of Miller's loop based on a pipelined Karatsuba-Ofman multiplier. Thanks to a careful selection of algorithms for computing the tower field arithmetic associated to the Tate pairing, we manage to keep the pipeline busy. We also describe the strategies we considered to design our...
Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. As they rely critically on efficient algorithms and implementations of pairing primitives, the study of hardware accelerators became an active research area. In this paper, we propose two coprocessors for the reduced $\eta_T$ pairing introduced by Barreto et al. as an alternative means of computing the Tate pairing on...
In this article we propose a study of the modified Tate pairing in characteristics two and three. Starting from the $\eta_T$ pairing introduced by Barreto et al. (Des Codes Crypt, 2007), we detail various algorithmic improvements in the case of characteristic two. As far as characteristic three is concerned, we refer to the survey by Beuchat et al. (ePrint 2007-417). We then show how to get back to the modified Tate pairing at almost no extra cost. Finally, we explore the...
Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. Software implementations being rather slow, the study of hardware architectures became an active research area. In this paper, we discuss several algorithms to compute the $\eta_T$ pairing in characteristic three and suggest further improvements. These algorithms involve addition, multiplication, cubing, inversion, and...
Demand in the consumer market for graphics hardware that accelerates rendering of 3D images has resulted in commodity devices capable of astonishing levels of performance. These results were achieved by specifically tailoring the hardware for the target domain. As graphics accelerators become increasingly programmable this performance makes them an attractive target for other domains. Specifically, they have motivated the transformation of costly algorithms from a general purpose...
Since their introduction in constructive cryptographic applications, pairings over (hyper)elliptic curves are at the heart of an ever increasing number of protocols. Software implementations being rather slow, the study of hardware architectures became an active research area. In this paper, we first study an accelerator for the $\eta_T$ pairing over $\mathbb{F}_3[x]/(x^{97}+x^{12}+2)$. Our architecture is based on a unified arithmetic operator which performs addition, multiplication, and...
Since the introduction of pairings over (hyper)elliptic curves in constructive cryptographic applications, an ever increasing number of protocols based on pairings have appeared in the literature. Software implementations being rather slow, the study of hardware architectures became an active research area. Beuchat et al. proposed for instance a coprocessor which computes the characteristic three $\eta_T$ pairing, from which the Tate pairing can easily be derived, in $33\,\mu$s on a...
In this paper, we propose a modified $\eta_T$ pairing algorithm in characteristic three which does not need any cube root extraction. We also discuss its implementation on a low-cost platform which hosts an Altera Cyclone II FPGA device. Our pairing accelerator is ten times faster than previously known FPGA implementations in characteristic three.
The Elliptic Curve Digital Signature Algorithm admits a natural parallelization wherein the point multiplication step can be split in two parts and executed in parallel. Further parallelism is achieved by executing a portion of the multiprecision arithmetic operations in parallel with point multiplication. This results in a saving in timing as well as gate count when the two paths are implemented in hardware and software. This article attempts to exploit this parallelism in a typical system...