121 results sorted by ID
PASTA on Edge: Cryptoprocessor for Hybrid Homomorphic Encryption
Aikata Aikata, Daniel Sanz Sobrino, Sujoy Sinha Roy
Implementation
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation but imposes significant computational and communication overhead on the client for the public-key encryption. To alleviate this burden, previous works have introduced the Hybrid Homomorphic Encryption (HHE) paradigm, which combines symmetric encryption with homomorphic decryption to enhance performance for the FHE client. While early HHE schemes focused on binary data, modern versions now support integer prime fields,...
Efficient Modular Multiplication Hardware for Number Theoretic Transform on FPGA
Tolun Tosun, Selim Kırbıyık, Emre Koçer, Erkay Savaş, Ersin Alaybeyoğlu
Implementation
In this paper, we present a comprehensive analysis of various modular multiplication methods for Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature, and implement particular methods on FPGA. Specifically Word-Level Montgomery...
Improved ML-DSA Hardware Implementation With First Order Masking Countermeasure
Kamal Raj, Prasanna Ravi, Tee Kiah Chia, Anupam Chattopadhyay
Implementation
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor
Zewen Ye, Junhao Huang, Tianshun Huang, Yudan Bai, Jinze Li, Hao Zhang, Guangyan Li, Donglong Chen, Ray C.C. Cheung, Kejie Huang
Implementation
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
The SMAesH dataset
Gaëtan Cassiers, Charles Momin
Implementation
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
A Combined Design of 4-PLL-TRNG and 64-bit CDC-7-XPUF on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
32-bit and 64-bit CDC-7-XPUF Implementations on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
Physically (or Physical) Unclonable Functions (PUFs) are basic and useful primitives in designing cryptographic systems. PUFs are designed to facilitate device authentication, secure boot, firmware integrity, and secure communications. To achieve these objectives, PUFs must exhibit both consistent repeatability and instance-specific randomness. The Arbiter PUF (APUF), recognized as the first silicon PUF, is capable of generating a substantial number of secret keys instantaneously based on...
Design and Implementation of a Fast, Platform-Adaptive, AIS-20/31 Compliant PLL-Based True Random Number Generator on a Zynq 7020 SoC FPGA
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1...
FELIX (XGCD for FALCON): FPGA-based Scalable and Lightweight Accelerator for Large Integer Extended GCD
Sam Coulon, Tianyou Bao, Jiafeng Xie
Implementation
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
Rudraksh: A compact and lightweight post-quantum key-encapsulation mechanism
Suparna Kundu, Archisman Ghosh, Angshuman Karmakar, Shreyas Sen, Ingrid Verbauwhede
Public-key cryptography
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
Reading It like an Open Book: Single-trace Blind Side-channel Attacks on Garbled Circuit Frameworks
Sirui Shen, Chenglu Jin
Attacks and cryptanalysis
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers.
We investigate popular open-source GC frameworks and discover that the AES...
Side-Channel and Fault Resistant ASCON Implementation: A Detailed Hardware Evaluation (Extended Version)
Aneesh Kandi, Anubhab Baksi, Peizhou Gan, Sylvain Guilley, Tomáš Gerlich, Jakub Breier, Anupam Chattopadhyay, Ritu Ranjan Shrivastwa, Zdeněk Martinásek, Shivam Bhasin
Implementation
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
An NVMe-based Secure Computing Platform with FPGA-based TFHE Accelerator
Yoshihiro Ohba, Tomoya Sanuki, Claude Gravel, Kentaro Mihara
Implementation
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
Generalized Feistel Ciphers for Efficient Prime Field Masking - Full Version
Lorenzo Grassi, Loïc Masure, Pierrick Méaux, Thorben Moos, François-Xavier Standaert
Secret-key cryptography
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
1/0 Shades of UC: Photonic Side-Channel Analysis of Universal Circuits
Dev M. Mehta, Mohammad Hashemi, Domenic Forte, Shahin Tajik, Fatemeh Ganji
Attacks and cryptanalysis
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may...
SDitH in Hardware
Sanjay Deshpande, James Howe, Jakub Szefer, Dongze Yue
Implementation
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction.
This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
Accelerating Polynomial Multiplication for RLWE using Pipelined FFT
Neil Thanawala, Hamid Nejatollahi, Nikil Dutt
Implementation
The evolution of quantum algorithms threatens to break public key cryptography in polynomial time. The development of quantum-resistant algorithms for the post-quantum era has seen a significant growth in the field of
post quantum cryptography (PQC). Polynomial multiplication is the core of
Ring Learning with Error (RLWE) lattice based cryptography (LBC) which
is one of the most promising PQC candidates. In this work, we present the
design of fast and energy-efficient pipelined Number...
StaTI: Protecting against Fault Attacks Using Stable Threshold Implementations
Siemen Dhooghe, Artemii Ovchinnikov, Dilara Toprakhisar
Secret-key cryptography
Fault attacks impose a serious threat against the practical implementations of cryptographic algorithms. Statistical Ineffective Fault Attacks (SIFA), exploiting the dependency between the secret data and the fault propagation overcame many of the known countermeasures. Later, several countermeasures have been proposed to tackle this attack using error detection methods. However, the efficiency of the countermeasures, in part governed by the number of error checks, still remains a...
Threshold Implementations with Non-Uniform Inputs
Siemen Dhooghe, Artemii Ovchinnikov
Implementation
Modern block ciphers designed for hardware and masked with Threshold Implementations (TIs) provide provable security against first-order attacks. However, the application of TIs leaves designers to deal with a trade-off between its security and its cost, for example, the process to generate its required random bits. This generation cost comes with an increased overhead in terms of area and latency. Decreasing the number of random bits for the masking allows to reduce the aforementioned...
HaMAYO: A Fault-Tolerant Reconfigurable Hardware Implementation of the MAYO Signature Scheme
Oussama Sayari, Soundes Marzougui, Thomas Aulbach, Juliane Krämer, Jean-Pierre Seifert
Implementation
MAYO is a topical modification of the established multivariate signature scheme UOV. Signer and Verifier locally enlarge the public key map, such that the dimension of the oil space and therefore, the parameter sizes in general, can be reduced. This significantly reduces the public key size while maintaining the appealing properties of UOV, like short signatures and fast verification. Therefore, MAYO is considered as an attractive candidate in the NIST call for additional digital signatures...
Randomness Generation for Secure Hardware Masking - Unrolled Trivium to the Rescue
Gaëtan Cassiers, Loïc Masure, Charles Momin, Thorben Moos, Amir Moradi, François-Xavier Standaert
Implementation
Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are...
A Side-Channel Attack on a Masked Hardware Implementation of CRYSTALS-Kyber
Yanning Ji, Elena Dubrova
Attacks and cryptanalysis
NIST has recently selected CRYSTALS-Kyber as a new public key encryption and key establishment algorithm to be standardized. This makes it important to evaluate the resistance of CRYSTALS-Kyber implementations to side-channel attacks. Software implementations of CRYSTALS-Kyber have already been thoroughly analysed. The discovered vulnerabilities helped improve the subsequently released versions and promoted stronger countermeasures against side-channel attacks. In this paper, we present the...
BALoo: First and Efficient Countermeasure dedicated to Persistent Fault Attacks
Pierre-Antoine Tissot, Lilian Bossuet, Vincent Grosso
Implementation
Persistent fault analysis is a novel and efficient cryptanalysis method. The persistent fault attacks take advantage of a persistent fault injected in a non-volatile memory, then present on the device until the reboot of the device. Contrary to classical physical fault injection, where differential analysis can be performed, persistent fault analysis requires new analyses and dedicated countermeasures. Persistent fault analysis requires a persistent fault injected in the S-box such that the...
Automated Generation of Masked Nonlinear Components: From Lookup Tables to Private Circuits
Lixuan Wu, Yanhong Fan, Bart Preneel, Weijia Wang, Meiqin Wang
Implementation
Masking is considered to be an essential defense mechanism against side-channel attacks, but it is challenging to be adopted for hardware cryptographic implementations, especially for high security orders. Recently, Knichel et al. proposed an automated tool called AGEMA that enables the generation of masked implementations in hardware for arbitrary security orders using composable gadgets. This accelerates the construction and practical application of masking schemes. This article proposes a...
Bake It Till You Make It: Heat-induced Power Leakage from Masked Neural Networks
Dev M. Mehta, Mohammad Hashemi, David S. Koblah, Domenic Forte, Fatemeh Ganji
Applications
Masking has become one of the most effective approaches for securing hardware designs against side-channel attacks. Regardless of the effort put into correctly implementing masking schemes on a field-programmable gate array (FPGA), leakage can be unexpectedly observed. This is due to the fact that the assumption underlying all masked designs, i.e., the leakages of different shares are independent of each other, may no longer hold in practice. In this regard, extreme temperatures have been...
RDS: FPGA Routing Delay Sensors for Effective Remote Power Analysis Attacks
David Spielmann, Ognjen Glamocanin, Mirjana Stojilovic
Implementation
State-of-the-art sensors for measuring FPGA voltage fluctuations are time-to-digital converters (TDCs). They allow detecting voltage fluctuations in the order of a few nanoseconds. The key building component of a TDC is a delay line, typically implemented as a chain of fast carry propagation multiplexers. In FPGAs, the fast carry chains are constrained to dedicated logic and routing, and need to be routed strictly vertically. In this work, we present an alternative approach to designing...
A Closer Look at the Chaotic Ring Oscillators based TRNG Design
Shuqin Su, Bohan Yang, Vladimir Rožić, Mingyuan Yang, Min Zhu, Shaojun Wei, Leibo Liu
Implementation
TRNG is an essential component for security applications. A vulnerable TRNG could be exploited to facilitate potential attacks or be related to a reduced key space, and eventually results in a compromised cryptographic system. A digital FIRO-/GARO-based TRNG with high throughput and high entropy rate was introduced by Jovan Dj. Golić (TC’06). However, the fact that periodic oscillation is a main failure of FIRO-/GARO-based TRNGs is noticed in the paper (Markus Dichtl, ePrint’15). We verify...
FPT: a Fixed-Point Accelerator for Torus Fully Homomorphic Encryption
Michiel Van Beirendonck, Jan-Pieter D'Anvers, Furkan Turan, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation.
We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT...
Side-Channel Attack Countermeasures Based On Clock Randomization Have a Fundamental Flaw
Martin Brisfors, Michail Moraitis, Elena Dubrova
Implementation
Clock randomization is one of the oldest countermeasures against side-channel attacks. Various implementations have been presented in the past, along with positive security evaluations. However, in this paper we show that it is possible to break countermeasures based on a randomized clock by sampling side-channel measurements at a frequency much higher than the encryption clock, synchronizing the traces with pre-processing, and targeting the beginning of the encryption.
We demonstrate a...
Exploring RNS for Isogeny-based Cryptography
David Jacquemin, Ahmet Can Mert, Sujoy Sinha Roy
Implementation
Isogeny-based cryptography suffers from a long-running time due to its requirement of a great amount of large integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. Also, we evaluate our modular reduction method...
Second-Order Low-Randomness $d+1$ Hardware Sharing of the AES
Siemen Dhooghe, Aein Rezaei Shahmirzadi, Amir Moradi
Implementation
In this paper, we introduce a second-order masking of the AES using the minimal number of shares and a total of 1268 bits of randomness including the sharing of the plaintext and key. The masking of the S-box is based on the tower field decomposition of the inversion over bytes where the changing of the guards technique is used in order to re-mask the middle branch of the decomposition. The sharing of the S-box is carefully crafted such that it achieves first-order probing security without...
XOR Compositions of Physically Related Functions
Harishma Boyapally, Sikhar Patranabis, Debdeep Mukhopadhyay
Foundations
Physically related functions~(PReFs) are hardware primitives proposed to establish key-exchange between resource-constrained devices with no pre-established secrets. In this paper, we introduce XOR composition of PReFs to eliminate the requirement of revealing the complete functionality of the hardware primitive during the setup phase, which is a prerequisite to setup PReFs. We evaluate the quality of XOR\_PReF design by implementing them on Artix-7 FPGAs.
Logic Locking - Connecting Theory and Practice
Elisaweta Masserova, Deepali Garg, Ken Mai, Lawrence Pileggi, Vipul Goyal, Bryan Parno
Due to the complexity and the cost of producing integrated circuits, most hardware circuit designers outsource the manufacturing of their circuits to a third-party foundry. However, a dishonest foundry may abuse its access to the circuit's design in a variety of ways that undermine the designer's investment or potentially introduce vulnerabilities.
To combat these issues, the hardware community has developed the notion of logic locking, which allows the designer to send the foundry a...
A Key-Recovery Side-Channel Attack on Classic McEliece
Qian Guo, Andreas Johansson, Thomas Johansson
Public-key cryptography
In this paper, we propose the first key-recovery side-channel attack on Classic McEliece, a KEM finalist in the NIST Post-quantum Cryptography Standardization Project. Our novel idea is to design an attack algorithm where we submit special ciphertexts to the decryption oracle that correspond to cases of single errors. Decoding of such cipher-texts involves only a single entry in a large secret permutation, which is part of the secret key. Through an identified leakage in the additive...
Lightweight Hardware Accelerator for Post-Quantum Digital Signature CRYSTALS-Dilithium
Naina Gupta, Arpan Jati, Anupam Chattopadhyay, Gautam Jha
Implementation
The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature.
By invoking multiple...
Medha: Microcoded Hardware Accelerator for computing on Encrypted Data
Ahmet Can Mert, Aikata, Sunmin Kwon, Youngsam Shin, Donghoon Yoo, Yongwoo Lee, Sujoy Sinha Roy
Implementation
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data.
First, we propose a divide-and-conquer technique that...
Complete and Improved FPGA Implementation of Classic McEliece
Po-Jen Chen, Tung Chou, Sanjay Deshpande, Norman Lahr, Ruben Niederhagen, Jakub Szefer, Wen Wang
Implementation
We present the first specification-compliant constant-time FPGA implementation of the Classic McEliece cryptosystem from the third-round of NIST's Post-Quantum Cryptography standardization process. In particular, we present the first complete implementation including encapsulation and decapsulation modules as well as key generation with seed expansion. All the hardware modules are parametrizable, at compile time, with security level and performance parameters. As the most time consuming...
High-Performance Hardware Implementation of Lattice-Based Digital Signatures
Luke Beckwith, Duc Tri Nguyen, Kris Gaj
Implementation
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. As a part of this evaluation,...
Where Star Wars Meets Star Trek: SABER and Dilithium on the Same Polynomial Multiplier
Andrea Basso, Furkan Aydin, Daniel Dinu, Joseph Friel, Avinash Varna, Manoj Sastry, Santosh Ghosh
Implementation
Secure communication often require both encryption and digital signatures to guarantee the confidentiality of the message and the authenticity of the parties. However, post-quantum cryptographic protocols are often studied independently. In this work, we identify a powerful synergy between two finalist protocols in the NIST standardization process. In particular, we propose a technique that enables SABER and Dilithium to share the exact same polynomial multiplier. Since polynomial...
Ark of the ECC: An open-source ECDSA power analysis attack on a FPGA based Curve P-256 implementation
Jean-Pierre Thibault, Colin O’Flynn, Alex Dewar
Public-key cryptography
Power analysis attacks on ECC have been presented since almost the very beginning of DPA itself, even before the standardization of AES. Given that power analysis attacks against AES are well known and have a large body of practical artifacts to demonstrate attacks on both software and hardware implementations, it is surprising that these artifacts are generally lacking for ECC. In this work we begin to remedy this by providing a complete open-source ECDSA attack artifact, based on a...
High-Performance Hardware Implementation of CRYSTALS-Dilithium
Luke Beckwith, Duc Tri Nguyen, Kris Gaj
Implementation
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. One such algorithm is CRYSTALS-Dilithium,...
Improving First-Order Threshold Implementations of SKINNY
Andrea Caforio, Daniel Collins, Ognjen Glamocanin, Subhadeep Banik
Implementation
Threshold Implementations have become a popular generic technique to construct
circuits resilient against power analysis attacks. In this paper, we look to devise efficient
threshold circuits for the lightweight block cipher family SKINNY. The only threshold circuits
for this family are those proposed by its designers who decomposed the 8-bit S-box into four
quadratic S-boxes, and constructed a 3-share byte-serial threshold circuit that executes the
substitution layer over four cycles. In...
Design Space Exploration of SABER in 65nm ASIC
Malik Imran, Felipe Almeida, Jaan Raik, Andrea Basso, Sujoy Sinha Roy, Samuel Pagliarini
Public-key cryptography
This paper presents a design space exploration for SABER, one of the finalists in NIST’s quantum-resistant public-key cryptographic standardization effort. Our design space exploration targets a 65nmASIC platform and has resulted in the evaluation of 6 different architectures. Our exploration is initiated by setting a baseline architecture which is ported from FPGA. In order to improve the clock frequency (the primary goal in our exploration), we have employed several optimizations: (i) use...
XDIVINSA: eXtended DIVersifying INStruction Agent to Mitigate Power Side-Channel Leakage
Thinh H. Pham, Ben Marshall, Alexander Fell, Siew-Kei Lam, Daniel Page
Implementation
Side-channel analysis (SCA) attacks pose a major threat to embedded systems due to their ease of accessibility. Realising SCA resilient cryptographic algorithms on embedded systems under tight intrinsic constraints, such as low area cost, limited computational ability, etc., is extremely challenging and often not possible. We propose a seamless and effective approach to realise a generic countermeasure against SCA attacks. XDIVINSA, an extended diversifying instruction agent, is introduced...
Guarding the First Order: The Rise of AES Maskings
Amund Askeland, Siemen Dhooghe, Svetla Nikova, Vincent Rijmen, Zhenda Zhang
Implementation
We provide three first-order hardware maskings of the AES, each allowing for a different trade-off between the number of shares and the number of register stages. All maskings use a generalization of the changing of the guards method enabling the re-use of randomness between masked S-boxes. As a result, the maskings do not require fresh randomness while still allowing for a minimal number of shares and providing provable security in the glitch-extended probing model.
The low-area...
Hardware Deployment of Hybrid PQC
Reza Azarderakhsh, Rami El Khatib, Brian Koziel, Brandon Langenberg
Implementation
In this work, we present a small architecture for quantum-safe hybrid key exchange targeting ECDH and SIKE. This is the first known hardware implementation of ECDH/SIKE-based hybrid key exchange in the literature. We propose new ECDH and EdDSA parameter sets defined over the SIKE primes. As a proof-of-concept, we evaluate SIKEX434, a hybrid PQC scheme composed of SIKEp434 and our proposed ECDH scheme X434 over a new, low-footprint architecture. Both schemes utilize the same 434-bit prime to...
2021/536
Last updated: 2021-12-29
Analyzing the Potential of Transport Triggered Architecture for Lattice-based Cryptography Algorithms
Latif AKÇAY, Berna ÖRS
Implementation
Lattice-based structures offer considerable possibilities for post-quantum cryptography. Recently, many algorithms have been built on hard lattice problems. The three of the remaining four in the final round of the post-quantum cryptography standardization process use lattice-based methods. Especially in embedded systems, these algorithms should be operated effectively. In this study, the potential of transport triggered architecture is examined in this sense. We try to compare open source...
A Hard Crystal - Implementing Dilithium on Reconfigurable Hardware
Georg Land, Pascal Sasdrich, Tim Güneysu
Implementation
CRYSTALS-Dilithium as a lattice-based digital signature scheme has been selected as a finalist in the PQC standardization process of NIST. As part of this selection, a variety of software implementations have been evaluated regarding their performance and memory requirements for platforms like x86 or ARM Cortex-M4. In this work, we present a first set of FPGA implementations for the low-end Xilinx Artix-7 platform, evaluating the peculiarities of the scheme in hardware, reflecting all...
\(\chi\)perbp: a Cloud-based Lightweight Mutual Authentication Protocol
Morteza Adeli, Nasour Bagheri, Sadegh Sadeghi, Saru Kumari
Cryptographic protocols
Alongside the development of cloud computing and Internet of Things(IoT), cloud-based RFID is receiving more attention nowadays.
Cloud-based RFID system is specifically developed to providing real-time data that can be fed to the cloud for easy access and instant data interpretation.
Security and privacy of constrained devices in these systems is a challenging issue for many applications. To deal with this problem, we propose \(\chi\)perbp, a lightweight authentication protocol based on...
Coco: Co-Design and Co-Verification of Masked Software Implementations on CPUs
Barbara Gigerl, Vedad Hadzic, Robert Primas, Stefan Mangard, Roderick Bloem
Implementation
The protection of cryptographic implementations against power analysis attacks is of critical importance for many applications in embedded systems. The typical approach of protecting against these attacks is to implement algorithmic countermeasures, like masking. However, implementing these countermeasures in a secure and correct manner is challenging. Masking schemes require the independent processing of secret shares, which is a property that is often violated by CPU microarchitectures in...
Implementation and Benchmarking of Round 2 Candidates in the NIST Post-Quantum Cryptography Standardization Process Using Hardware and Software/Hardware Co-design Approaches
Viet Ba Dang, Farnoud Farahmand, Michal Andrzejczak, Kamyar Mohajerani, Duc Tri Nguyen, Kris Gaj
Implementation
Performance in hardware has typically played a major role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance...
DANA - Universal Dataflow Analysis for Gate-Level Netlist Reverse Engineering
Nils Albartus, Max Hoffmann, Sebastian Temme, Leonid Azriel, Christof Paar
Applications
Reverse engineering of integrated circuits, i.e., understanding the internals
of IC, is required for many benign and malicious applications. Examples of the
former are detection of patent infringements, hardware Trojans or IP-theft, as
well as interface recovery and defect analysis, while malicious applications
include IP-theft and finding insertion points for hardware Trojans. However,
regardless of the application, the reverse engineer initially starts with a
large unstructured netlist,...
RISQ-V: Tightly Coupled RISC-V Accelerators for Post-Quantum Cryptography
Tim Fritzmann, Georg Sigl, Johanna Sepúlveda
Public-key cryptography
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
ISA Extensions for Finite Field Arithmetic - Accelerating Kyber and NewHope on RISC-V
Erdem Alkim, Hülya Evkan, Norman Lahr, Ruben Niederhagen, Richard Petri
Implementation
We present and evaluate a custom extension to the RISC-V instruction set
for finite fields arithmetic. The result serves as a very compact
approach to software-hardware co-design of PQC implementations in the
context of small embedded processors such as smartcards. The extension
provides instructions that implement finite field operations with
subsequent reduction of the result. As small finite fields are used in
various PQC schemes, such instructions can provide a considerable
speedup for...
Post-Quantum Secure Architectures for Automotive Hardware Secure Modules
Wen Wang, Marc Stöttinger
Applications
The rapid development of information technology in the automotive industry has driven increasing requirements on incorporating security functionalities in the in-vehicle architecture, which is usually realized by adding a Hardware Secure Module (HSM) in the Electronic Central Unit (ECU). Therefore, secure communications can be enforced by carrying out secret cryptographic computations within the HSM by use of the embedded hardware accelerators. However, there is no common standard for...
Side Channel Information Set Decoding using Iterative Chunking
Norman Lahr, Ruben Niederhagen, Richard Petri, Simona Samardjiska
Implementation
This paper presents an attack based on side-channel information and Information Set Decoding (ISD) on the Niederreiter cryptosystem and an evaluation of the practicality of the attack using an electromagnetic side channel. First, we describe a basic plaintext-recovery attack on the decryption algorithm of the Niederreiter cryptosystem. In case the cryptosystem is used as Key-Encapsulation Mechanism (KEM) in a key exchange, the plaintext corresponds to a session key. Our attack is an...
Efficient Utilization of DSPs and BRAMs Revisited: New AES-GCM Recipes on FPGAs
Elif Bilge Kavun, Nele Mentens, Jo Vliegen, Tolga Yalcin
Implementation
In 2008, Drimer et al. proposed different AES implementations on a Xilinx Virtex-5 FPGA, making efficient use of the DSP slices and BRAM tiles available on the device. Inspired by their work, in this paper, we evaluate the feasibility of extending AES with the popular GCM mode of operation, still concentrating on the optimal use of DSP slices and BRAM tiles. We make use of a Xilinx Zynq UltraScale+ MPSoC FPGA with improved DSP features.
For the AES part, we implement Drimer’s round-based and...
Efficient FPGA Implementations of LowMC and Picnic
Daniel Kales, Sebastian Ramacher, Christian Rechberger, Roman Walch, Mario Werner
Implementation
Post-quantum cryptography has received increased attention in recent years, in particular, due to the standardization effort by NIST. One of the second-round candidates in the NIST post-quantum standardization project is Picnic, a post-quantum secure signature scheme based on efficient zero-knowledge proofs of knowledge. In this work, we present the first FPGA implementation of Picnic. We show how to efficiently calculate LowMC, the block cipher used as a one-way function in Picnic, in...
A Comparison of Chi^2-Test and Mutual Information as Distinguisher for Side-Channel Analysis
Bastian Richter, David Knichel, Amir Moradi
Implementation
Masking is known as the most widely studied countermeasure against side-channel analysis attacks.
Since a masked implementation is based on a certain number of shares (referred to as the order of masking), it still exhibits leakages at higher orders.
In order to exploit such leakages, higher-order statistical moments individually at each order need to be estimated reflecting the higher-order attacks.
Instead, Mutual Information Analysis (MIA) known for more than 10 years avoids such a...
SNEIK on Microcontrollers: AVR, ARMv7-M, and RISC-V with Custom Instructions
Markku-Juhani O. Saarinen
Implementation
SNEIK is a family of lightweight cryptographic algorithms derived from a
single 512-bit permutation. The SNEIGEN ``entropy distribution
function'' was designed to speed up certain functions in post-quantum
and lattice-based public key algorithms.
We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit
ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an
implementation for the open-source RISC-V (RV32I) Instruction Set
Architecture (ISA). Our results demonstrate...
Hardware Implementations of NIST Lightweight Cryptographic Candidates: A First Look
Behnaz Rezvani, Flora Coleman, Sachin Sachin, William Diehl
Implementation
Achieving security in the Internet of Things (IoT) is challenging. The need for lightweight yet robust cryptographic solutions suitable for the IoT calls for improved design and implementation of constructs such as authenticated encryption with associated data (AEAD) which can ensure confidentiality, integrity, and authenticity of data in one algorithm. The U.S. National Institute of Standards and Technology (NIST) has embarked on a multi-year effort called the lightweight cryptography...
DL-LA: Deep Learning Leakage Assessment: A modern roadmap for SCA evaluations
Thorben Moos, Felix Wegener, Amir Moradi
Implementation
In recent years, deep learning has become an attractive ingredient to side-channel analysis (SCA) due to its potential to improve the success probability or enhance the performance of certain frequently executed tasks. One task that is commonly assisted by machine learning techniques is the profiling of a device's leakage behavior in order to carry out a template attack. At CHES 2019, deep learning has also been applied to non-profiled scenarios for the first time, extending its reach within...
FPGA-based High-Performance Parallel Architecture for Homomorphic Computing on Encrypted Data
Sujoy Sinha Roy, Furkan Turan, Kimmo Jarvinen, Frederik Vercauteren, Ingrid Verbauwhede
Implementation
Homomorphic encryption is a tool that enables computation on encrypted data and thus has applications in privacy-preserving cloud computing. Though conceptually amazing, implementation of homomorphic encryption is very challenging and typically software implementations on general purpose computers are extremely slow. In this paper we present our year long effort to design a domain specific architecture in a heterogeneous Arm+FPGA platform to accelerate homomorphic computing on encrypted...
NIST Post-Quantum Cryptography- A Hardware Evaluation Study
Kanad Basu, Deepraj Soni, Mohammed Nabeel, Ramesh Karri
Implementation
Experts forecast that quantum computers can break classical cryptographic algorithms. Scientists are developing post quantum cryptographic (PQC) algorithms, that are invulnerable to quantum computer attacks. The National Institute of
Standards and Technology (NIST) started a public evaluation process to standardize quantum-resistant public key algorithms. The objective of our study is to provide a hardware comparison of the NIST PQC competition candidates. For this, we use a High-Level...
The BIG Cipher: Design, Security Analysis, and Hardware-Software Optimization Techniques
Anthony Demeri, Thomas Conroy, Alex Nolan, William Diehl
Secret-key cryptography
Secure block cipher design is a complex discipline which combines mathematics, engineering, and computer science. In order to develop cryptographers who are grounded in all three disciplines, it is necessary to undertake synergistic research as early as possible in technical curricula, particularly at the undergraduate university level. In this work, students are presented with a new block cipher, which is designed to offer moderate security while providing engineering and analysis...
Horizontal DEMA Attack as the Criterion to Select the Best Suitable EM Probe
Christian Wittke, Ievgen Kabin, Dan Klann, Zoya Dyka, Anton Datsuk, Peter Langendoerfer
Implementing cryptographic algorithms in a tamper resistant way is an extremely complex task as the algorithm used and the target platform have a significant impact on the potential leakage of the implementation. In addition the quality of the tools used for the attacks is of importance. In order to evaluate the resistance of a certain design against electromagnetic emanation attacks – as a highly relevant type of attacks – we discuss the quality of different electromagnetic (EM) probes as...
High-speed Side-channel-protected Encryption and Authentication in Hardware
Nele Mentens, Vojtech Miskovsky, Martin Novotny, Jo Vliegen
Implementation
This paper describes two FPGA implementations for the encryption and authentication of data, based on the AES algorithm running in Galois/Counter mode (AES-GCM). Both architectures are protected against side-channel analysis attacks through the use of a threshold implementation (TI). The first architecture is fully unrolled and optimized for throughput. The second architecture uses a round-based structure, fits on a relatively small FPGA board, and is evaluated for side-channel attack...
Jitter Estimation with High Accuracy for Oscillator-Based TRNGs
Shaofeng Zhu, Hua Chen, Limin Fan, Meihui Chen, Wei Xi, Dengguo Feng
Implementation
Ring oscillator-based true random number generators (RO-based TRNGs) are widely used to provide unpredictable random numbers for cryptographic systems. The unpredictability of the output numbers, which can be measured by entropy, is extracted from the jitter
of the oscillatory signal. To quantitatively evaluate the entropy, several stochastic models have been proposed, all of which take the jitter as a key input parameter. So it is crucial to accurately estimate the jitter in the process of...
Multiplicative Masking for AES in Hardware
Lauren De Meyer, Oscar Reparaz, Begül Bilgin
Implementation
Hardware masked AES designs usually rely on Boolean masking and perform the computation of the S-box using the tower-field decomposition. On the other hand, splitting sensitive variables in a multiplicative way is more amenable for the computation of the AES S-box, as noted by Akkar and Giraud. However, multiplicative masking needs to be implemented carefully not to be vulnerable to first-order DPA with a zero-value power model. Up to now, sound higher-order multiplicative masking schemes...
Rethinking Secure FPGAs: Towards a Cryptography-friendly Configurable Cell Architecture and its Automated Design Flow
Nele Mentens, Edoardo Charbon, Francesco Regazzoni
Implementation
This work proposes the first fine-grained configurable cell array specifically tailored for cryptographic implementations. The proposed architecture can be added to future FPGAs as an application-specific configurable building block, or to an ASIC as an embedded FPGA (eFPGA). The goal is to map cryptographic ciphers on combinatorial cells that are more efficient than general purpose lookup tables in terms of silicon area, configuration memory and combinatorial delay. As a first step in this...
Standard Lattice-Based Key Encapsulation on Embedded Devices
James Howe, Tobias Oder, Markus Krausz, Tim Güneysu
Implementation
Lattice-based cryptography is one of the most promising candidates being considered to replace current public-key systems in the era of quantum computing. In 2016, Bos et al. proposed the key exchange scheme FrodoCCS, that is also a submission to the NIST post-quantum standardization process, modified as a key encapsulation mechanism (FrodoKEM). The security of the scheme is based on standard lattices and the learning with errors problem. Due to the large parameters, standard lattice-based...
Secure Boot and Remote Attestation in the Sanctum Processor
Ilia Lebedev, Kyle Hogan, Srinivas Devadas
Foundations
During the secure boot process for a trusted execution environment, the processor must provide a chain of certificates to the remote client demonstrating that their secure container was established as specified. This certificate chain is rooted at the hardware manufacturer who is responsible for constructing chips according to the correct specification and provisioning them with key material. We consider a semi-honest manufacturer who is assumed to construct chips correctly, but may attempt...
Compact, Scalable, and Efficient Discrete Gaussian Samplers for Lattice-Based Cryptography
Ayesha Khalid, James Howe, Ciara Rafferty, Francesco Regazzoni, Maire O’Neill
Public-key cryptography
Lattice-based cryptography, one of the leading
candidates for post-quantum security, relies heavily on discrete
Gaussian samplers to provide necessary uncertainty, obfuscating
computations on secret information. For reconfigurable hardware,
the cumulative distribution table (CDT) scheme has previously
been shown to achieve the highest throughput and the
smallest resource utilisation, easily outperforming other existing
samplers. However, the CDT sampler does not scale well. In
fact, for...
A Comprehensive Performance Analysis of Hardware Implementations of CAESAR Candidates
Sachin Kumar, Jawad Haj-Yahya, Mustafa Khairallah, Mahmoud A. Elmohr, Anupam Chattopadhyay
Implementation
Authenticated encryption with Associated Data (AEAD) plays a significant role in cryptography because of its ability to provide integrity, confidentiality and authenticity at the same time. Due to the emergence of security at the edge of computing fabric, such as, sensors and smartphone devices, there is a growing need of lightweight AEAD ciphers. Currently, a worldwide contest, titled CAESAR, is being held to decide on a set of AEAD ciphers, which are distinguished by their security,...
Transparent Memory Encryption and Authentication
Mario Werner, Thomas Unterluggauer, Robert Schilling, David Schaffenrath, Stefan Mangard
Implementation
Security features of modern (SoC) FPAGs permit to protect the confidentiality of hard- and software IP when the devices are powered off as well as to validate the authenticity of IP when being loaded at startup. However, these approaches are insufficient since attackers with physical access can also perform attacks during runtime, demanding for additional security measures. In particular, RAM used by modern (SoC) FPGAs is under threat since RAM stores software IP as well as all kinds of...
Your Rails Cannot Hide From Localized EM: How Dual-Rail Logic Fails on FPGAs
Vincent Immler, Robert Specht, Florian Unterstein
Protecting cryptographic implementations against side-channel attacks is a must to prevent leakage of processed secrets. As a cell-level countermeasure, so called DPA-resistant logic styles have been proposed to prevent a data-dependent power consumption.
As most of the DPA-resistant logic is based on dual-rails, properly implementing them is a challenging task on FPGAs which is due to their fixed architecture and missing freedom in the design tools.
While previous works show a significant...
FPGA-based Key Generator for the Niederreiter Cryptosystem using Binary Goppa Codes
Wen Wang, Jakub Szefer, Ruben Niederhagen
This paper presents a post-quantum secure, efficient, and tunable
FPGA implementation
of the key-generation algorithm for the Niederreiter cryptosystem
using binary Goppa codes.
Our key-generator implementation requires as few as 896,052 cycles
to produce both public and private portions of a key,
and can achieve an estimated frequency Fmax
of over 240 MHz when synthesized for Stratix V FPGAs.
To the best of our knowledge,
this work is the first hardware-based implementation
that works with...
Security Analysis of Arbiter PUF and Its Lightweight Compositions Under Predictability Test
Phuong Ha Nguyen, Durga Prasad Sahoo, Rajat Subhra Chakraborty, Debdeep Mukhopadhyay
Applications
Unpredictability is an important security property of Physically Unclonable Function (PUF) in the context of statistical attacks, where the correlation between challenge-response pairs is explicitly exploited. In existing literature on PUFs, Hamming Distance test, denoted by $\mathrm{HDT}(t)$, was proposed to evaluate the unpredictability of PUFs, which is a simplified case of the Propagation Criterion test $\mathrm{PC}(t)$. The objective of these testing schemes is to estimate the output...
SafeDRP: Yet Another Way Toward Power-Equalized Designs in FPGA
Maik Ender, Alexander Wild, Amir Moradi
Implementation
Side-channel analysis attacks, particularly power analysis attacks, have become one of the major threats, that hardware designers have to deal with. To defeat them, the majority of the known concepts are based on either masking, hiding, or rekeying (or a combination of them). This work deals with a hiding scheme, more precisely a power-equalization technique which is ideally supposed to make the amount of power consumption of the device independent of its processed data. We propose and...
Dissecting Leakage Resilient PRFs with Multivariate Localized EM Attacks - A Practical Security Evaluation on FPGA
Florian Unterstein, Johann Heyszl, Fabrizio De Santis, Robert Specht
In leakage-resilient symmetric cryptography, two important concepts have been proposed in order to decrease the success rate of differential side-channel attacks. The first one is to limit the attacker’s data complexity by restricting the number of observable inputs; the second one is to create correlated algorithmic noise by using parallel S-boxes with equal inputs. The latter hinders the typical divide and conquer approach of differential side-channel attacks and makes key recovery much...
IPcore implementation susceptibility: A case study of Low latency ciphers
Dillibabu Shanmugam, Ravikumar Selvam, Suganya Annadurai
Implementation
Security evaluation of third-party cryptographic IP (Intellectual Property) cores is often ignored due to several reasons including, lack of awareness about its adversity, lack of trust validation methodology otherwise view security as a byproduct. Particularly, the validation of low latency cipher IP core on Internet of Things (IoT) devices is crucial as they may otherwise become vulnerable for information theft. In this paper, we share an (Un)intentional way of cipher implementation as IP...
Side-Channel Plaintext-Recovery Attacks on Leakage-Resilient Encryption
Thomas Unterluggauer, Mario Werner, Stefan Mangard
Implementation
Differential power analysis (DPA) is a powerful tool to extract the key of a cryptographic implementation from observing its power consumption during the en-/decryption of many different inputs. Therefore, cryptographic schemes based on frequent re-keying such as leakage-resilient encryption aim to inherently prevent DPA on the secret key by limiting the amount of data being processed under one key. However, the original asset of encryption, namely the plaintext, is disregarded.
This paper...
Fast Hardware Architectures for Supersingular Isogeny Diffie-Hellman Key Exchange on FPGA
Brian Koziel, Reza Azarderakhsh, Mehran Mozaffari Kermani
In this paper, we present a constant-time hardware implementation that achieves new speed records for the supersingular isogeny Diffie-Hellman (SIDH), even when compared to highly optimized Haswell computer architectures. We employ inversion-free projective isogeny formulas presented by Costello et al. at CRYPTO 2016 on an FPGA. Modern FPGA's can take advantage of heavily parallelized arithmetic in $\mathbb{F}_{p^{2}}$, which lies at the foundation of supersingular isogeny arithmetic....
Tile-Based Modular Architecture for Accelerating Homomorphic Function Evaluation on FPGA
Mustafa Khairallah, Maged Ghoneima
In this paper, a new architecture for accelerating homomorphic function evaluation on FPGA is proposed. A parallel cached NTT algorithm with an overall time complexity O(sqrt(N)log(sqrt(N)) is presented. The architecture has been implemented on Xilinx Virtex 7 XC7V1140T FPGA. achieving a 60% utilization ratio. The implementation performs 32-bit 2^(16)-point NTT algorithm in 23.8 us, achieving speed-up of 2x over the state of the art architectures. The architecture has been evaluated by...
Strong Machine Learning Attack against PUFs with No Mathematical Model
Fatemeh Ganji, Shahin Tajik, Fabian Fäßler, Jean-Pierre Seifert
Although numerous attacks revealed the vulnerability of different PUF families to non-invasive Machine Learning (ML) attacks, the question is still open whether all PUFs might be learnable. Until now, virtually all ML attacks rely on the assumption that a mathematical model of the PUF functionality is known a priori. However, this is not always the case, and attention should be paid to this important aspect of ML attacks. This paper aims to address this issue by providing a provable...
Efficient High-Speed WPA2 Brute Force Attacks using Scalable Low-Cost FPGA Clustering
Markus Kammerstetter, Markus Muellner, Daniel Burian, Christian Kudera, Wolfgang Kastner
WPA2-Personal is widely used to protect Wi-Fi networks against illicit access.
While attackers typically use GPUs to speed up the discovery of weak network passwords, attacking random passwords is considered to quickly become infeasible with increasing password length.
Professional attackers may thus turn to commercial high-end FPGA-based cluster solutions to significantly increase the speed of those attacks.
Well known manufacturers such as Elcomsoft have succeeded in creating...
White-Box Cryptography in the Gray Box - A Hardware Implementation and its Side Channels
Pascal Sasdrich, Amir Moradi, Tim Güneysu
Implementations of white-box cryptography aim to protect a secret key in a white-box environment in which an adversary has full control over the execution process and the entire environment. Its fundamental principle is the map of the cryptographic architecture, including the secret key, to a number of encoded tables that shall resist the inspection and decomposition of an attacker. In a gray-box scenario, however, the property of hiding required implementation details from the attacker...
Side-Channel Watchdog: Run-Time Evaluation of Side-Channel Vulnerability in FPGA-Based Crypto-systems
Souvik Sonar, Debapriya Basu Roy, Rajat Subhra Chakraborty, Debdeep Mukhopadhyay
Implementation
Besides security against classical cryptanalysis, its important
for cryptographic implementations to have sufficient robustness against
side-channel attacks. Many countermeasures have been proposed to thwart
side channel attacks, especially power trace measurement based side
channel attacks. Additionally, researchers have proposed several evaluation
metrics to evaluate side channel security of crypto-system. However,
evaluation of any crypto-system is done during the testing phase and is
not...
Secure Execution Architecture based on PUF-driven Instruction Level Code Encryption
Stephan Kleber, Florian Unterstein, Matthias Matousek, Frank Kargl, Frank Slomka, Matthias Hiller
Implementation
A persistent problem with program execution, despite numerous mitigation attempts, is its inherent vulnerability to the injection of malicious code. Equally unsolved is the susceptibility of firmware to reverse engineering, which undermines the manufacturer's code confidentiality. We propose an approach that solves both kinds of security problems employing instruction-level code encryption combined with the use of a physical unclonable function (PUF). Our novel Secure Execution PUF-based...
FROPUF: How to Extract More Entropy from Two Ring Oscillators in FPGA-Based PUFs
Qinglong Zhang, Zongbin Liu, Cunqing Ma, Changting Li, Jiwu Jing
Foundations
Ring oscillator (RO) based physically unclonable function
(PUF) on FPGAs is crucial and popular for its nice properties and easy
implementation. The compensated measurement based on the ratio of
two ring oscillators’ frequencies proves to be particularly effective to extract
entropy of process variations. However from two ring oscillators
only one bit entropy is extracted and RO PUFs will occupy numerous
resource with the size of private information increasing. Motivated by this
inefficient...
Smaller Keys for Code-Based Cryptography: QC-MDPC McEliece Implementations on Embedded Devices
Stefan Heyse, Ingo von Maurich, Tim Güneysu
Implementation
In the last years code-based cryptosystems were established as promising alternatives for asymmetric cryptography since they base their security on well-known NP-hard problems and still show decent performance on a wide range of computing platforms. The main drawback of code-based schemes, including the popular proposals by McEliece and Niederreiter, are the large keys whose size is inherently determined by the underlying code. In a very recent approach, Misoczki et al. proposed to use...
Modular Hardware Architecture for Somewhat Homomorphic Function Evaluation
Sujoy Sinha Roy, Kimmo Järvinen, Frederik Vercauteren, Vassil Dimitrov, Ingrid Verbauwhede
Implementation
We present a hardware architecture for all building blocks required in polynomial ring based fully homomorphic schemes and use it to instantiate the somewhat homomorphic encryption scheme YASHE. Our implementation is the first FPGA implementation that is designed for evaluating functions on homomorphically encrypted data (up to a certain multiplicative depth) and we illustrate this capability by evaluating the SIMON-64/128 block cipher in the encrypted domain. Our implementation provides a...
Accelerating Somewhat Homomorphic Evaluation using FPGAs
Erdi̇̀nç Öztürk, Yarkın Doröz, Berk Sunar, Erkay Savaş
Implementation
After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...
Achieving Side-Channel Protection with Dynamic Logic Reconfiguration on Modern FPGAs
Pascal Sasdrich, Amir Moradi, Oliver Mischke, Tim Güneysu
Implementation
Reconfigurability is a unique feature of modern FPGA devices to load hardware circuits just on demand.
This also implies that a completely different set of circuits might operate at the exact same location of the FPGA at different time slots, making it difficult for an external observer or attacker to predict what will happen at what time.
In this work we present and evaluate a novel hardware implementation of the lightweight cipher PRESENT with built-in side-channel countermeasures based on...
Evaluating the Duplication of Dual-Rail Precharge Logics on FPGAs
Alexander Wild, Amir Moradi, Tim Güneysu
Implementation
Power-equalization schemes for digital circuits aim to harden cryptographic designs against power analysis attacks. With respect to dual-rail logics most of these schemes have originally been designed for ASIC platforms, but much efforts have been spent to map them to FPGAs as well. A particular challenge is here to apply those schemes to the predefined logic structures of FPGAs (i.e., slices, LUTs, FFs, and routing switch boxes) for which special tools are required. Due to the absence of...
Side-Channel Protection by Randomizing Look-Up Tables on Reconfigurable Hardware - Pitfalls of Memory Primitives
Pascal Sasdrich, Oliver Mischke, Amir Moradi, Tim Güneysu
Implementation
Block Memory Content Scrambling (BMS), presented at CHES 2011, enables an effective way of first-order side-channel protection for cryptographic primitives at the cost of a significant reconfiguration time for the mask update. In this work we analyze alternative ways to implement dynamic first-order masking of AES with randomized look-up tables that can reduce this mask update time. The memory primitives we consider in this work include three distributed RAM components (RAM32M, RAM64M, and...
GliFreD: Glitch-Free Duplication - Towards Power-Equalized Circuits on FPGAs
Alexander Wild, Amir Moradi, Tim Güneysu
Implementation
Designers of secure hardware are required to harden their implementations against physical threats, such as power analysis attacks. In particular, cryptographic hardware circuits are required to decorrelate their current consumption from the information inferred by processing (secret) data. A common technique to achieve this goal is the use of special logic styles that aim at equalizing the current consumption at each single processing step. However, since all hiding techniques like...
How Different Electrical Circuits of ECC Designs Influence the Shape of Power Traces measured on FPGA
Thomas Basmer, Christian Wittke, Zoya Dyka, Peter Langendoerfer
Public-key cryptography
Side channel and fault attacks take advantage from the fact that the behavior of crypto implementations can be observed and provide hints that simplify revealing keys. These attacks use identical devices either for preparation of attacks or for measurements. By the preparation of attacks the structure and the electrical circuit of devices, that are identical to the target, is analyzed. By side channel attacks usually the same device is used many times for measurements, i.e. measurements on...
Wire-Tap Codes as Side-Channel Countermeasure - an FPGA-based experiment
Amir Moradi
Implementation
In order to provide security against side-channel attacks a masking scheme which makes use of wire-tap codes has recently been proposed. The scheme benefits from the features of binary linear codes, and its application to AES has been presented in the seminal article. In this work – with respect to the underlying scheme – we re-iterate the fundamental operations of the AES cipher in a hopefully more understandable terminology. Considering an FPGA platform we address the challenges each AES...
FPGA Trojans through Detecting and Weakening of Cryptographic Primitives
Pawel Swierczynski, Marc Fyrbiak, Philipp Koppe, Christof Paar
This paper investigates a novel attack vector against
cryptography realized on FPGAs, which poses a serious threat to
real-world applications.We demonstrate how a targeted bitstream
modification can seriously weaken cryptographic algorithms,
which we show with the examples of AES and 3DES. The attack
is performed by modifying the FPGA bitstream that configures
the hardware elements during initialization. Recently, it has
been shown that cloning of FPGA designs is feasible, even if
the...
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation but imposes significant computational and communication overhead on the client for the public-key encryption. To alleviate this burden, previous works have introduced the Hybrid Homomorphic Encryption (HHE) paradigm, which combines symmetric encryption with homomorphic decryption to enhance performance for the FHE client. While early HHE schemes focused on binary data, modern versions now support integer prime fields,...
In this paper, we present a comprehensive analysis of various modular multiplication methods for Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature, and implement particular methods on FPGA. Specifically Word-Level Montgomery...
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
Physically (or Physical) Unclonable Functions (PUFs) are basic and useful primitives in designing cryptographic systems. PUFs are designed to facilitate device authentication, secure boot, firmware integrity, and secure communications. To achieve these objectives, PUFs must exhibit both consistent repeatability and instance-specific randomness. The Arbiter PUF (APUF), recognized as the first silicon PUF, is capable of generating a substantial number of secret keys instantaneously based on...
Phase-locked loops (PLLs) integrated within field-programmable gate arrays (FPGAs) or System-on-Chip FPGAs (SoCs) represent a promising approach for generating random numbers. Their widespread deployment, isolated functionality within these devices, and robust entropy, as demonstrated in prior studies, position PLL-based true random number generators (PLL-TRNGs) as highly viable solutions for this purpose. This study explicitly examines PLL-TRNG implementations using the ZC702 Rev1.1...
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers. We investigate popular open-source GC frameworks and discover that the AES...
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
A universal circuit (UC) can be thought of as a programmable circuit that can simulate any circuit up to a certain size by specifying its secret configuration bits. UCs have been incorporated into various applications, such as private function evaluation (PFE). Recently, studies have attempted to formalize the concept of semiconductor intellectual property (IP) protection in the context of UCs. This is despite the observations made in theory and practice that, in reality, the adversary may...
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
The evolution of quantum algorithms threatens to break public key cryptography in polynomial time. The development of quantum-resistant algorithms for the post-quantum era has seen a significant growth in the field of post quantum cryptography (PQC). Polynomial multiplication is the core of Ring Learning with Error (RLWE) lattice based cryptography (LBC) which is one of the most promising PQC candidates. In this work, we present the design of fast and energy-efficient pipelined Number...
Fault attacks impose a serious threat against the practical implementations of cryptographic algorithms. Statistical Ineffective Fault Attacks (SIFA), exploiting the dependency between the secret data and the fault propagation overcame many of the known countermeasures. Later, several countermeasures have been proposed to tackle this attack using error detection methods. However, the efficiency of the countermeasures, in part governed by the number of error checks, still remains a...
Modern block ciphers designed for hardware and masked with Threshold Implementations (TIs) provide provable security against first-order attacks. However, the application of TIs leaves designers to deal with a trade-off between its security and its cost, for example, the process to generate its required random bits. This generation cost comes with an increased overhead in terms of area and latency. Decreasing the number of random bits for the masking allows to reduce the aforementioned...
MAYO is a topical modification of the established multivariate signature scheme UOV. Signer and Verifier locally enlarge the public key map, such that the dimension of the oil space and therefore, the parameter sizes in general, can be reduced. This significantly reduces the public key size while maintaining the appealing properties of UOV, like short signatures and fast verification. Therefore, MAYO is considered as an attractive candidate in the NIST call for additional digital signatures...
Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are...
NIST has recently selected CRYSTALS-Kyber as a new public key encryption and key establishment algorithm to be standardized. This makes it important to evaluate the resistance of CRYSTALS-Kyber implementations to side-channel attacks. Software implementations of CRYSTALS-Kyber have already been thoroughly analysed. The discovered vulnerabilities helped improve the subsequently released versions and promoted stronger countermeasures against side-channel attacks. In this paper, we present the...
Persistent fault analysis is a novel and efficient cryptanalysis method. The persistent fault attacks take advantage of a persistent fault injected in a non-volatile memory, then present on the device until the reboot of the device. Contrary to classical physical fault injection, where differential analysis can be performed, persistent fault analysis requires new analyses and dedicated countermeasures. Persistent fault analysis requires a persistent fault injected in the S-box such that the...
Masking is considered to be an essential defense mechanism against side-channel attacks, but it is challenging to be adopted for hardware cryptographic implementations, especially for high security orders. Recently, Knichel et al. proposed an automated tool called AGEMA that enables the generation of masked implementations in hardware for arbitrary security orders using composable gadgets. This accelerates the construction and practical application of masking schemes. This article proposes a...
Masking has become one of the most effective approaches for securing hardware designs against side-channel attacks. Regardless of the effort put into correctly implementing masking schemes on a field-programmable gate array (FPGA), leakage can be unexpectedly observed. This is due to the fact that the assumption underlying all masked designs, i.e., the leakages of different shares are independent of each other, may no longer hold in practice. In this regard, extreme temperatures have been...
State-of-the-art sensors for measuring FPGA voltage fluctuations are time-to-digital converters (TDCs). They allow detecting voltage fluctuations in the order of a few nanoseconds. The key building component of a TDC is a delay line, typically implemented as a chain of fast carry propagation multiplexers. In FPGAs, the fast carry chains are constrained to dedicated logic and routing, and need to be routed strictly vertically. In this work, we present an alternative approach to designing...
TRNG is an essential component for security applications. A vulnerable TRNG could be exploited to facilitate potential attacks or be related to a reduced key space, and eventually results in a compromised cryptographic system. A digital FIRO-/GARO-based TRNG with high throughput and high entropy rate was introduced by Jovan Dj. Golić (TC’06). However, the fact that periodic oscillation is a main failure of FIRO-/GARO-based TRNGs is noticed in the paper (Markus Dichtl, ePrint’15). We verify...
Fully Homomorphic Encryption (FHE) is a technique that allows computation on encrypted data. It has the potential to drastically change privacy considerations in the cloud, but high computational and memory overheads are preventing its broad adoption. TFHE is a promising Torus-based FHE scheme that heavily relies on bootstrapping, the noise-removal tool invoked after each encrypted logical/arithmetical operation. We present FPT, a Fixed-Point FPGA accelerator for TFHE bootstrapping. FPT...
Clock randomization is one of the oldest countermeasures against side-channel attacks. Various implementations have been presented in the past, along with positive security evaluations. However, in this paper we show that it is possible to break countermeasures based on a randomized clock by sampling side-channel measurements at a frequency much higher than the encryption clock, synchronizing the traces with pre-processing, and targeting the beginning of the encryption. We demonstrate a...
Isogeny-based cryptography suffers from a long-running time due to its requirement of a great amount of large integer arithmetic. The Residue Number System (RNS) can compensate for that drawback by making computation more efficient via parallelism. However, performing a modular reduction by a large prime which is not part of the RNS base is very expensive. In this paper, we propose a new fast and efficient modular reduction algorithm using RNS. Also, we evaluate our modular reduction method...
In this paper, we introduce a second-order masking of the AES using the minimal number of shares and a total of 1268 bits of randomness including the sharing of the plaintext and key. The masking of the S-box is based on the tower field decomposition of the inversion over bytes where the changing of the guards technique is used in order to re-mask the middle branch of the decomposition. The sharing of the S-box is carefully crafted such that it achieves first-order probing security without...
Physically related functions~(PReFs) are hardware primitives proposed to establish key-exchange between resource-constrained devices with no pre-established secrets. In this paper, we introduce XOR composition of PReFs to eliminate the requirement of revealing the complete functionality of the hardware primitive during the setup phase, which is a prerequisite to setup PReFs. We evaluate the quality of XOR\_PReF design by implementing them on Artix-7 FPGAs.
Due to the complexity and the cost of producing integrated circuits, most hardware circuit designers outsource the manufacturing of their circuits to a third-party foundry. However, a dishonest foundry may abuse its access to the circuit's design in a variety of ways that undermine the designer's investment or potentially introduce vulnerabilities. To combat these issues, the hardware community has developed the notion of logic locking, which allows the designer to send the foundry a...
In this paper, we propose the first key-recovery side-channel attack on Classic McEliece, a KEM finalist in the NIST Post-quantum Cryptography Standardization Project. Our novel idea is to design an attack algorithm where we submit special ciphertexts to the decryption oracle that correspond to cases of single errors. Decoding of such cipher-texts involves only a single entry in a large secret permutation, which is part of the secret key. Through an identified leakage in the additive...
The looming threat of an adversary with Quantum computing capability led to a worldwide research effort towards identifying and standardizing novel post-quantum cryptographic primitives. Post-standardization, all existing security protocols will need to support efficient implementation of these primitives. In this work, we contribute to these efforts by reporting the smallest implementation of CRYSTALS-Dilithium, a finalist candidate for post-quantum digital signature. By invoking multiple...
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data. First, we propose a divide-and-conquer technique that...
We present the first specification-compliant constant-time FPGA implementation of the Classic McEliece cryptosystem from the third-round of NIST's Post-Quantum Cryptography standardization process. In particular, we present the first complete implementation including encapsulation and decapsulation modules as well as key generation with seed expansion. All the hardware modules are parametrizable, at compile time, with security level and performance parameters. As the most time consuming...
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. As a part of this evaluation,...
Secure communication often require both encryption and digital signatures to guarantee the confidentiality of the message and the authenticity of the parties. However, post-quantum cryptographic protocols are often studied independently. In this work, we identify a powerful synergy between two finalist protocols in the NIST standardization process. In particular, we propose a technique that enables SABER and Dilithium to share the exact same polynomial multiplier. Since polynomial...
Power analysis attacks on ECC have been presented since almost the very beginning of DPA itself, even before the standardization of AES. Given that power analysis attacks against AES are well known and have a large body of practical artifacts to demonstrate attacks on both software and hardware implementations, it is surprising that these artifacts are generally lacking for ECC. In this work we begin to remedy this by providing a complete open-source ECDSA attack artifact, based on a...
Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. One such algorithm is CRYSTALS-Dilithium,...
Threshold Implementations have become a popular generic technique to construct circuits resilient against power analysis attacks. In this paper, we look to devise efficient threshold circuits for the lightweight block cipher family SKINNY. The only threshold circuits for this family are those proposed by its designers who decomposed the 8-bit S-box into four quadratic S-boxes, and constructed a 3-share byte-serial threshold circuit that executes the substitution layer over four cycles. In...
This paper presents a design space exploration for SABER, one of the finalists in NIST’s quantum-resistant public-key cryptographic standardization effort. Our design space exploration targets a 65nmASIC platform and has resulted in the evaluation of 6 different architectures. Our exploration is initiated by setting a baseline architecture which is ported from FPGA. In order to improve the clock frequency (the primary goal in our exploration), we have employed several optimizations: (i) use...
Side-channel analysis (SCA) attacks pose a major threat to embedded systems due to their ease of accessibility. Realising SCA resilient cryptographic algorithms on embedded systems under tight intrinsic constraints, such as low area cost, limited computational ability, etc., is extremely challenging and often not possible. We propose a seamless and effective approach to realise a generic countermeasure against SCA attacks. XDIVINSA, an extended diversifying instruction agent, is introduced...
We provide three first-order hardware maskings of the AES, each allowing for a different trade-off between the number of shares and the number of register stages. All maskings use a generalization of the changing of the guards method enabling the re-use of randomness between masked S-boxes. As a result, the maskings do not require fresh randomness while still allowing for a minimal number of shares and providing provable security in the glitch-extended probing model. The low-area...
In this work, we present a small architecture for quantum-safe hybrid key exchange targeting ECDH and SIKE. This is the first known hardware implementation of ECDH/SIKE-based hybrid key exchange in the literature. We propose new ECDH and EdDSA parameter sets defined over the SIKE primes. As a proof-of-concept, we evaluate SIKEX434, a hybrid PQC scheme composed of SIKEp434 and our proposed ECDH scheme X434 over a new, low-footprint architecture. Both schemes utilize the same 434-bit prime to...
Lattice-based structures offer considerable possibilities for post-quantum cryptography. Recently, many algorithms have been built on hard lattice problems. The three of the remaining four in the final round of the post-quantum cryptography standardization process use lattice-based methods. Especially in embedded systems, these algorithms should be operated effectively. In this study, the potential of transport triggered architecture is examined in this sense. We try to compare open source...
CRYSTALS-Dilithium as a lattice-based digital signature scheme has been selected as a finalist in the PQC standardization process of NIST. As part of this selection, a variety of software implementations have been evaluated regarding their performance and memory requirements for platforms like x86 or ARM Cortex-M4. In this work, we present a first set of FPGA implementations for the low-end Xilinx Artix-7 platform, evaluating the peculiarities of the scheme in hardware, reflecting all...
Alongside the development of cloud computing and Internet of Things(IoT), cloud-based RFID is receiving more attention nowadays. Cloud-based RFID system is specifically developed to providing real-time data that can be fed to the cloud for easy access and instant data interpretation. Security and privacy of constrained devices in these systems is a challenging issue for many applications. To deal with this problem, we propose \(\chi\)perbp, a lightweight authentication protocol based on...
The protection of cryptographic implementations against power analysis attacks is of critical importance for many applications in embedded systems. The typical approach of protecting against these attacks is to implement algorithmic countermeasures, like masking. However, implementing these countermeasures in a secure and correct manner is challenging. Masking schemes require the independent processing of secret shares, which is a property that is often violated by CPU microarchitectures in...
Performance in hardware has typically played a major role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance...
Reverse engineering of integrated circuits, i.e., understanding the internals of IC, is required for many benign and malicious applications. Examples of the former are detection of patent infringements, hardware Trojans or IP-theft, as well as interface recovery and defect analysis, while malicious applications include IP-theft and finding insertion points for hardware Trojans. However, regardless of the application, the reverse engineer initially starts with a large unstructured netlist,...
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
We present and evaluate a custom extension to the RISC-V instruction set for finite fields arithmetic. The result serves as a very compact approach to software-hardware co-design of PQC implementations in the context of small embedded processors such as smartcards. The extension provides instructions that implement finite field operations with subsequent reduction of the result. As small finite fields are used in various PQC schemes, such instructions can provide a considerable speedup for...
The rapid development of information technology in the automotive industry has driven increasing requirements on incorporating security functionalities in the in-vehicle architecture, which is usually realized by adding a Hardware Secure Module (HSM) in the Electronic Central Unit (ECU). Therefore, secure communications can be enforced by carrying out secret cryptographic computations within the HSM by use of the embedded hardware accelerators. However, there is no common standard for...
This paper presents an attack based on side-channel information and Information Set Decoding (ISD) on the Niederreiter cryptosystem and an evaluation of the practicality of the attack using an electromagnetic side channel. First, we describe a basic plaintext-recovery attack on the decryption algorithm of the Niederreiter cryptosystem. In case the cryptosystem is used as Key-Encapsulation Mechanism (KEM) in a key exchange, the plaintext corresponds to a session key. Our attack is an...
In 2008, Drimer et al. proposed different AES implementations on a Xilinx Virtex-5 FPGA, making efficient use of the DSP slices and BRAM tiles available on the device. Inspired by their work, in this paper, we evaluate the feasibility of extending AES with the popular GCM mode of operation, still concentrating on the optimal use of DSP slices and BRAM tiles. We make use of a Xilinx Zynq UltraScale+ MPSoC FPGA with improved DSP features. For the AES part, we implement Drimer’s round-based and...
Post-quantum cryptography has received increased attention in recent years, in particular, due to the standardization effort by NIST. One of the second-round candidates in the NIST post-quantum standardization project is Picnic, a post-quantum secure signature scheme based on efficient zero-knowledge proofs of knowledge. In this work, we present the first FPGA implementation of Picnic. We show how to efficiently calculate LowMC, the block cipher used as a one-way function in Picnic, in...
Masking is known as the most widely studied countermeasure against side-channel analysis attacks. Since a masked implementation is based on a certain number of shares (referred to as the order of masking), it still exhibits leakages at higher orders. In order to exploit such leakages, higher-order statistical moments individually at each order need to be estimated reflecting the higher-order attacks. Instead, Mutual Information Analysis (MIA) known for more than 10 years avoids such a...
SNEIK is a family of lightweight cryptographic algorithms derived from a single 512-bit permutation. The SNEIGEN ``entropy distribution function'' was designed to speed up certain functions in post-quantum and lattice-based public key algorithms. We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an implementation for the open-source RISC-V (RV32I) Instruction Set Architecture (ISA). Our results demonstrate...
Achieving security in the Internet of Things (IoT) is challenging. The need for lightweight yet robust cryptographic solutions suitable for the IoT calls for improved design and implementation of constructs such as authenticated encryption with associated data (AEAD) which can ensure confidentiality, integrity, and authenticity of data in one algorithm. The U.S. National Institute of Standards and Technology (NIST) has embarked on a multi-year effort called the lightweight cryptography...
In recent years, deep learning has become an attractive ingredient to side-channel analysis (SCA) due to its potential to improve the success probability or enhance the performance of certain frequently executed tasks. One task that is commonly assisted by machine learning techniques is the profiling of a device's leakage behavior in order to carry out a template attack. At CHES 2019, deep learning has also been applied to non-profiled scenarios for the first time, extending its reach within...
Homomorphic encryption is a tool that enables computation on encrypted data and thus has applications in privacy-preserving cloud computing. Though conceptually amazing, implementation of homomorphic encryption is very challenging and typically software implementations on general purpose computers are extremely slow. In this paper we present our year long effort to design a domain specific architecture in a heterogeneous Arm+FPGA platform to accelerate homomorphic computing on encrypted...
Experts forecast that quantum computers can break classical cryptographic algorithms. Scientists are developing post quantum cryptographic (PQC) algorithms, that are invulnerable to quantum computer attacks. The National Institute of Standards and Technology (NIST) started a public evaluation process to standardize quantum-resistant public key algorithms. The objective of our study is to provide a hardware comparison of the NIST PQC competition candidates. For this, we use a High-Level...
Secure block cipher design is a complex discipline which combines mathematics, engineering, and computer science. In order to develop cryptographers who are grounded in all three disciplines, it is necessary to undertake synergistic research as early as possible in technical curricula, particularly at the undergraduate university level. In this work, students are presented with a new block cipher, which is designed to offer moderate security while providing engineering and analysis...
Implementing cryptographic algorithms in a tamper resistant way is an extremely complex task as the algorithm used and the target platform have a significant impact on the potential leakage of the implementation. In addition the quality of the tools used for the attacks is of importance. In order to evaluate the resistance of a certain design against electromagnetic emanation attacks – as a highly relevant type of attacks – we discuss the quality of different electromagnetic (EM) probes as...
This paper describes two FPGA implementations for the encryption and authentication of data, based on the AES algorithm running in Galois/Counter mode (AES-GCM). Both architectures are protected against side-channel analysis attacks through the use of a threshold implementation (TI). The first architecture is fully unrolled and optimized for throughput. The second architecture uses a round-based structure, fits on a relatively small FPGA board, and is evaluated for side-channel attack...
Ring oscillator-based true random number generators (RO-based TRNGs) are widely used to provide unpredictable random numbers for cryptographic systems. The unpredictability of the output numbers, which can be measured by entropy, is extracted from the jitter of the oscillatory signal. To quantitatively evaluate the entropy, several stochastic models have been proposed, all of which take the jitter as a key input parameter. So it is crucial to accurately estimate the jitter in the process of...
Hardware masked AES designs usually rely on Boolean masking and perform the computation of the S-box using the tower-field decomposition. On the other hand, splitting sensitive variables in a multiplicative way is more amenable for the computation of the AES S-box, as noted by Akkar and Giraud. However, multiplicative masking needs to be implemented carefully not to be vulnerable to first-order DPA with a zero-value power model. Up to now, sound higher-order multiplicative masking schemes...
This work proposes the first fine-grained configurable cell array specifically tailored for cryptographic implementations. The proposed architecture can be added to future FPGAs as an application-specific configurable building block, or to an ASIC as an embedded FPGA (eFPGA). The goal is to map cryptographic ciphers on combinatorial cells that are more efficient than general purpose lookup tables in terms of silicon area, configuration memory and combinatorial delay. As a first step in this...
Lattice-based cryptography is one of the most promising candidates being considered to replace current public-key systems in the era of quantum computing. In 2016, Bos et al. proposed the key exchange scheme FrodoCCS, that is also a submission to the NIST post-quantum standardization process, modified as a key encapsulation mechanism (FrodoKEM). The security of the scheme is based on standard lattices and the learning with errors problem. Due to the large parameters, standard lattice-based...
During the secure boot process for a trusted execution environment, the processor must provide a chain of certificates to the remote client demonstrating that their secure container was established as specified. This certificate chain is rooted at the hardware manufacturer who is responsible for constructing chips according to the correct specification and provisioning them with key material. We consider a semi-honest manufacturer who is assumed to construct chips correctly, but may attempt...
Lattice-based cryptography, one of the leading candidates for post-quantum security, relies heavily on discrete Gaussian samplers to provide necessary uncertainty, obfuscating computations on secret information. For reconfigurable hardware, the cumulative distribution table (CDT) scheme has previously been shown to achieve the highest throughput and the smallest resource utilisation, easily outperforming other existing samplers. However, the CDT sampler does not scale well. In fact, for...
Authenticated encryption with Associated Data (AEAD) plays a significant role in cryptography because of its ability to provide integrity, confidentiality and authenticity at the same time. Due to the emergence of security at the edge of computing fabric, such as, sensors and smartphone devices, there is a growing need of lightweight AEAD ciphers. Currently, a worldwide contest, titled CAESAR, is being held to decide on a set of AEAD ciphers, which are distinguished by their security,...
Security features of modern (SoC) FPAGs permit to protect the confidentiality of hard- and software IP when the devices are powered off as well as to validate the authenticity of IP when being loaded at startup. However, these approaches are insufficient since attackers with physical access can also perform attacks during runtime, demanding for additional security measures. In particular, RAM used by modern (SoC) FPGAs is under threat since RAM stores software IP as well as all kinds of...
Protecting cryptographic implementations against side-channel attacks is a must to prevent leakage of processed secrets. As a cell-level countermeasure, so called DPA-resistant logic styles have been proposed to prevent a data-dependent power consumption. As most of the DPA-resistant logic is based on dual-rails, properly implementing them is a challenging task on FPGAs which is due to their fixed architecture and missing freedom in the design tools. While previous works show a significant...
This paper presents a post-quantum secure, efficient, and tunable FPGA implementation of the key-generation algorithm for the Niederreiter cryptosystem using binary Goppa codes. Our key-generator implementation requires as few as 896,052 cycles to produce both public and private portions of a key, and can achieve an estimated frequency Fmax of over 240 MHz when synthesized for Stratix V FPGAs. To the best of our knowledge, this work is the first hardware-based implementation that works with...
Unpredictability is an important security property of Physically Unclonable Function (PUF) in the context of statistical attacks, where the correlation between challenge-response pairs is explicitly exploited. In existing literature on PUFs, Hamming Distance test, denoted by $\mathrm{HDT}(t)$, was proposed to evaluate the unpredictability of PUFs, which is a simplified case of the Propagation Criterion test $\mathrm{PC}(t)$. The objective of these testing schemes is to estimate the output...
Side-channel analysis attacks, particularly power analysis attacks, have become one of the major threats, that hardware designers have to deal with. To defeat them, the majority of the known concepts are based on either masking, hiding, or rekeying (or a combination of them). This work deals with a hiding scheme, more precisely a power-equalization technique which is ideally supposed to make the amount of power consumption of the device independent of its processed data. We propose and...
In leakage-resilient symmetric cryptography, two important concepts have been proposed in order to decrease the success rate of differential side-channel attacks. The first one is to limit the attacker’s data complexity by restricting the number of observable inputs; the second one is to create correlated algorithmic noise by using parallel S-boxes with equal inputs. The latter hinders the typical divide and conquer approach of differential side-channel attacks and makes key recovery much...
Security evaluation of third-party cryptographic IP (Intellectual Property) cores is often ignored due to several reasons including, lack of awareness about its adversity, lack of trust validation methodology otherwise view security as a byproduct. Particularly, the validation of low latency cipher IP core on Internet of Things (IoT) devices is crucial as they may otherwise become vulnerable for information theft. In this paper, we share an (Un)intentional way of cipher implementation as IP...
Differential power analysis (DPA) is a powerful tool to extract the key of a cryptographic implementation from observing its power consumption during the en-/decryption of many different inputs. Therefore, cryptographic schemes based on frequent re-keying such as leakage-resilient encryption aim to inherently prevent DPA on the secret key by limiting the amount of data being processed under one key. However, the original asset of encryption, namely the plaintext, is disregarded. This paper...
In this paper, we present a constant-time hardware implementation that achieves new speed records for the supersingular isogeny Diffie-Hellman (SIDH), even when compared to highly optimized Haswell computer architectures. We employ inversion-free projective isogeny formulas presented by Costello et al. at CRYPTO 2016 on an FPGA. Modern FPGA's can take advantage of heavily parallelized arithmetic in $\mathbb{F}_{p^{2}}$, which lies at the foundation of supersingular isogeny arithmetic....
In this paper, a new architecture for accelerating homomorphic function evaluation on FPGA is proposed. A parallel cached NTT algorithm with an overall time complexity O(sqrt(N)log(sqrt(N)) is presented. The architecture has been implemented on Xilinx Virtex 7 XC7V1140T FPGA. achieving a 60% utilization ratio. The implementation performs 32-bit 2^(16)-point NTT algorithm in 23.8 us, achieving speed-up of 2x over the state of the art architectures. The architecture has been evaluated by...
Although numerous attacks revealed the vulnerability of different PUF families to non-invasive Machine Learning (ML) attacks, the question is still open whether all PUFs might be learnable. Until now, virtually all ML attacks rely on the assumption that a mathematical model of the PUF functionality is known a priori. However, this is not always the case, and attention should be paid to this important aspect of ML attacks. This paper aims to address this issue by providing a provable...
WPA2-Personal is widely used to protect Wi-Fi networks against illicit access. While attackers typically use GPUs to speed up the discovery of weak network passwords, attacking random passwords is considered to quickly become infeasible with increasing password length. Professional attackers may thus turn to commercial high-end FPGA-based cluster solutions to significantly increase the speed of those attacks. Well known manufacturers such as Elcomsoft have succeeded in creating...
Implementations of white-box cryptography aim to protect a secret key in a white-box environment in which an adversary has full control over the execution process and the entire environment. Its fundamental principle is the map of the cryptographic architecture, including the secret key, to a number of encoded tables that shall resist the inspection and decomposition of an attacker. In a gray-box scenario, however, the property of hiding required implementation details from the attacker...
Besides security against classical cryptanalysis, its important for cryptographic implementations to have sufficient robustness against side-channel attacks. Many countermeasures have been proposed to thwart side channel attacks, especially power trace measurement based side channel attacks. Additionally, researchers have proposed several evaluation metrics to evaluate side channel security of crypto-system. However, evaluation of any crypto-system is done during the testing phase and is not...
A persistent problem with program execution, despite numerous mitigation attempts, is its inherent vulnerability to the injection of malicious code. Equally unsolved is the susceptibility of firmware to reverse engineering, which undermines the manufacturer's code confidentiality. We propose an approach that solves both kinds of security problems employing instruction-level code encryption combined with the use of a physical unclonable function (PUF). Our novel Secure Execution PUF-based...
Ring oscillator (RO) based physically unclonable function (PUF) on FPGAs is crucial and popular for its nice properties and easy implementation. The compensated measurement based on the ratio of two ring oscillators’ frequencies proves to be particularly effective to extract entropy of process variations. However from two ring oscillators only one bit entropy is extracted and RO PUFs will occupy numerous resource with the size of private information increasing. Motivated by this inefficient...
In the last years code-based cryptosystems were established as promising alternatives for asymmetric cryptography since they base their security on well-known NP-hard problems and still show decent performance on a wide range of computing platforms. The main drawback of code-based schemes, including the popular proposals by McEliece and Niederreiter, are the large keys whose size is inherently determined by the underlying code. In a very recent approach, Misoczki et al. proposed to use...
We present a hardware architecture for all building blocks required in polynomial ring based fully homomorphic schemes and use it to instantiate the somewhat homomorphic encryption scheme YASHE. Our implementation is the first FPGA implementation that is designed for evaluating functions on homomorphically encrypted data (up to a certain multiplicative depth) and we illustrate this capability by evaluating the SIMON-64/128 block cipher in the encrypted domain. Our implementation provides a...
After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...
Reconfigurability is a unique feature of modern FPGA devices to load hardware circuits just on demand. This also implies that a completely different set of circuits might operate at the exact same location of the FPGA at different time slots, making it difficult for an external observer or attacker to predict what will happen at what time. In this work we present and evaluate a novel hardware implementation of the lightweight cipher PRESENT with built-in side-channel countermeasures based on...
Power-equalization schemes for digital circuits aim to harden cryptographic designs against power analysis attacks. With respect to dual-rail logics most of these schemes have originally been designed for ASIC platforms, but much efforts have been spent to map them to FPGAs as well. A particular challenge is here to apply those schemes to the predefined logic structures of FPGAs (i.e., slices, LUTs, FFs, and routing switch boxes) for which special tools are required. Due to the absence of...
Block Memory Content Scrambling (BMS), presented at CHES 2011, enables an effective way of first-order side-channel protection for cryptographic primitives at the cost of a significant reconfiguration time for the mask update. In this work we analyze alternative ways to implement dynamic first-order masking of AES with randomized look-up tables that can reduce this mask update time. The memory primitives we consider in this work include three distributed RAM components (RAM32M, RAM64M, and...
Designers of secure hardware are required to harden their implementations against physical threats, such as power analysis attacks. In particular, cryptographic hardware circuits are required to decorrelate their current consumption from the information inferred by processing (secret) data. A common technique to achieve this goal is the use of special logic styles that aim at equalizing the current consumption at each single processing step. However, since all hiding techniques like...
Side channel and fault attacks take advantage from the fact that the behavior of crypto implementations can be observed and provide hints that simplify revealing keys. These attacks use identical devices either for preparation of attacks or for measurements. By the preparation of attacks the structure and the electrical circuit of devices, that are identical to the target, is analyzed. By side channel attacks usually the same device is used many times for measurements, i.e. measurements on...
In order to provide security against side-channel attacks a masking scheme which makes use of wire-tap codes has recently been proposed. The scheme benefits from the features of binary linear codes, and its application to AES has been presented in the seminal article. In this work – with respect to the underlying scheme – we re-iterate the fundamental operations of the AES cipher in a hopefully more understandable terminology. Considering an FPGA platform we address the challenges each AES...
This paper investigates a novel attack vector against cryptography realized on FPGAs, which poses a serious threat to real-world applications.We demonstrate how a targeted bitstream modification can seriously weaken cryptographic algorithms, which we show with the examples of AES and 3DES. The attack is performed by modifying the FPGA bitstream that configures the hardware elements during initialization. Recently, it has been shown that cloning of FPGA designs is feasible, even if the...