18 results sorted by ID
Artificial Results From Hardware Synthesis
Ahmed Alharbi, Charles Bouillaguet
Implementation
In this paper, we revisit venerable lower-bounds on the $AT$ or $AT^2$
performance metric of hardware circuits. A series of works started in the late 1970's has established that if a hardware circuit of area $A$ computes a function $f : \{0, 1\}^n \rightarrow \{0, 1\}^m$ in $T$ clock cycles, then $AT^2$ is asymptotically larger than (a form of) the communication complexity of $f$. These lower-bounds ignore the active component of the circuit such as the logic gates and only take into...
Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption
David Du Pont, Jonas Bertels, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
Cryptanalysis of Strong Physically Unclonable Functions
Liliya Kraleva, Mohammad Mahzoun, Raluca Posteuca, Dilara Toprakhisar, Tomer Ashur, Ingrid Verbauwhede
Attacks and cryptanalysis
Physically Unclonable Functions (PUFs) are being proposed as a low cost alternative to permanently store secret keys or provide device authentication without requiring non-volatile memory, large e-fuses or other dedicated processing steps. In the literature, PUFs are split into two main categories. The so-called strong PUFs are mainly used for authentication purposes, hence also called authentication PUFs. They promise to be lightweight by avoiding extensive digital post-processing and...
Low-Delay 4, 5 and 6-Term Karatsuba Formulae in $\mathbb{F}_2[x]$ Using Overlap-free Splitting
Haining Fan
Implementation
The overlap-free splitting method, i.e., even-odd splitting and its generalization, can reduce the XOR delay of a Karatsuba multiplier. We use this method to derive Karatsuba formulae with one less XOR delay in each recursive iteration. These formulae need more multiplication operations, and are trade-offs between space and time.
We also show that ``finding common subexpressions'' performs better than ``the refined identity'' in 4-term formula: we reduce the number of XOR gates given by...
zk-Sherlock: Exposing Hardware Trojans in Zero-Knowledge
Dimitris Mouris, Charles Gouert, Nektarios Georgios Tsoutsos
Applications
As integrated circuit (IC) design and manufacturing have become highly globalized, hardware security risks become more prominent as malicious parties can exploit multiple stages of the supply chain for profit. Two potential targets in this chain are third-party intellectual property (3PIP) vendors and their customers. Untrusted parties can insert hardware Trojans into 3PIP circuit designs that can both alter device functionalities when triggered or create a side channel to leak sensitive...
Nice Attacks --- but What is the Cost? Computational Models for Cryptanalysis
Charles Bouillaguet
Foundations
This paper discusses the implications of choosing a computational model to
study the cost of cryptographic attacks and therefore quantify how dangerous
they are. This choice is often unconscious and the chosen model itself is
usually implicit; but it has repercussions on security evaluations.
We compare three reasonable computational models: $i$) the usual Random Access
Machine (RAM) model; $ii$) the ``Expensive Memory Model'' explicitly introduced by
several 3rd-round submissions to the...
Image PUF: A Physical Unclonable Function for Printed Electronics based on Optical Variation of Printed Inks
Ahmet Turan Erozan, Michael Hefenbrock, Michael Beigl, Jasmin Aghassi-Hagmann, Mehdi B. Tahoori
Applications
Printed Electronics (PE) has a rapidly growing market, thus, the counterfeiting/overbuilding of PE components is anticipated to grow. The common solution for the counterfeiting is Physical Unclonable Functions (PUFs). In PUFs, a unique fingerprint is extracted from (irreproducible) process variations in the production and used in the authentication of valid components. Many commonly used PUFs are electrical PUFs by leveraging the impact of process variations on electrical properties of...
High-Speed Modular Multipliers for Isogeny-Based Post-Quantum Cryptography
Jing Tian, Zhe Liu, Jun Lin, Zhongfeng Wang, Binjing Li
Implementation
As one of the post-quantum protocol candidates, the supersingular isogeny key encapsulation (SIKE) protocol delivers promising public and secret key sizes over other candidates. Nevertheless, the considerable computations form the bottleneck and limit its practical applications. The modular multiplication operations occupy a large proportion of the overall computations required by the SIKE protocol. The VLSI implementation of the high-speed modular multiplier remains a big challenge. In this...
Hardware-Software Co-Design Based Obfuscation of Hardware Accelerators
Abhishek Chakraborty, Ankur Srivastava
Implementation
Existing logic obfuscation approaches aim to protect
hardware design IPs from SAT attack by increasing query count
and output corruptibility of a locked netlist. In this paper, we
demonstrate the ineffectiveness of such techniques to obfuscate
hardware accelerator platforms. Subsequently, we propose
a Hardware/software co-design based Accelerator Obfuscation
(HSCAO) scheme to provably safeguard the IP of such designs
against SAT as well as removal/bypass type of attacks while
still...
Testing the Trustworthiness of IC Testing: An Oracle-less Attack on IC Camouflaging
Muhammad Yasin, Ozgur Sinanoglu, Jeyavijayan Rajendran
Test of integrated circuits (ICs) is essential to ensure their quality; the test is meant to prevent defective and out-of-spec ICs from entering into the supply chain. The test is conducted by comparing the observed IC output with the expected test responses for a set of test patterns; the test patterns are generated using automatic test pattern generation algorithms. Existing test-pattern generation algorithms aim to achieve higher fault coverage at...
High-speed VLSI implementation of Digit-serial Gaussian normal basis Multiplication over GF(2m)
Bahram Rashidi, Sayed Masoud Sayedi, Reza Rezaeian Farashahi
Implementation
In this paper, by employing the logical effort technique an efficient and high-speed VLSI implementation of the digit-serial Gaussian normal basis multiplier is presented. It is constructed by using AND, XOR and XOR tree components. To have a low-cost implementation with low number of transistors, the block of AND gates are implemented by using NAND gates based on the property of the XOR gates in the XOR tree. To optimally decrease the delay and increase the drive ability of the circuit the...
A High Reliability PUF Using Hot Carrier Injection Based Response Reinforcement
Mudit Bhargava, Ken Mai
Implementation
Achieving high reliability across environmental variations and over aging in physical unclonable functions (PUFs) remains a challenge for PUF designers. The conventional method to improve PUF reliability is to use powerful error correction codes (ECC) to correct the errors in the raw response from the PUF core. Unfortunately, these ECC blocks generally have high VLSI overheads, which scale up quickly with the error correction capability. Alternately, researchers have proposed techniques to...
VLSI Implementation of Double-Base Scalar Multiplication on a Twisted Edwards Curve with an Efficiently Computable Endomorphism
Zhe Liu, Husen Wang, Johann Großschädl, Zhi Hu, Ingrid Verbauwhede
Implementation
The verification of an ECDSA signature requires a double-base scalar multiplication, an operation of the form $k \cdot G + l \cdot Q$ where $G$ is a generator of a large elliptic curve group of prime order $n$, $Q$ is an arbitrary element of said group, and $k$, $l$ are two integers in the range of $[1, n-1]$. We introduce in this paper an area-optimized VLSI design of a Prime-Field Arithmetic Unit (PFAU) that can serve as a loosely-coupled or tightly-coupled hardware accelerator in a...
2014/377
Last updated: 2019-05-09
Logic Synthesis based Public Key Scheme
Boaz Shahar
Public-key cryptography
This article proposes a method for the construction of a public key system that is based on VLSI logic synthesis algorithms. First, we discuss the properties of VLSI logic synthesis algorithms. Then we view them in the context of cryptographic primitives. Then we propose a public key encryption system and finally discuss its security properties.
Investigating the Potential of Custom Instruction Set Extensions for SHA-3 Candidates on a 16-bit Microcontroller Architecture
Jeremy Constantin, Andreas Burg, Frank K. Gurkaynak
Implementation
In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide reference for complexity of the algorithms on 16-bit architectures, commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified...
Cryptanalysis of the Full AES Using GPU-Like Special-Purpose Hardware
Alex Biryukov, Johann Großschädl
Secret-key cryptography
The block cipher Rijndael has undergone more than ten years of extensive cryptanalysis since its submission as a candidate for the Advanced Encryption Standard (AES) in April 1998. To date, most of the publicly-known cryptanalytic results are based on reduced-round variants of the AES (respectively Rijndael) algorithm. Among the few exceptions that target the full AES are the Related-Key Cryptanalysis (RKC) introduced at ASIACRYPT 2009 and attacks exploiting Time-Memory-Key (TMK) trade-offs...
Overlap-free Karatsuba-Ofman Polynomial Multiplication Algorithms
Haining Fan, Jiaguang Sun, Ming Gu, Kwok-Yan Lam
We describe how a simple way to split input operands allows for fast VLSI implementations of subquadratic $GF(2)[x]$ Karatsuba-Ofman multipliers. The theoretical XOR gate delay of the resulting multipliers is reduced significantly. For example, it is reduced by about 33\% and 25\% for $n=2^{t}$ and $n=3^{t}$ $(t>1)$, respectively. To the best of our knowledge, this parameter has never been improved since the original Karatsuba-Ofman algorithm was first used to design $GF(2^n)$ multipliers in 1990.
2004/140
Last updated: 2005-03-23
Architectures and Hardware Implementations of the 64-bit MISTY1 Block Cipher
P. Kitsos, M. D. Galanis, O. Koufopavlou
Two alternative architectures and VLSI implementations of the 64-bit NESSIE proposal, MISTY1 block cipher, are presented in this paper. For these implementations, FPGA devices were used. The first architecture is suitable for applications with high throughput requirements. A throughput of up to 7.2 Gbps can be achieved at a clock frequency of 96 MHz. The main characteristic of this implementation is that uses RAM blocks that are embedded in the FPGA device in order to implement the necessary...
In this paper, we revisit venerable lower-bounds on the $AT$ or $AT^2$ performance metric of hardware circuits. A series of works started in the late 1970's has established that if a hardware circuit of area $A$ computes a function $f : \{0, 1\}^n \rightarrow \{0, 1\}^m$ in $T$ clock cycles, then $AT^2$ is asymptotically larger than (a form of) the communication complexity of $f$. These lower-bounds ignore the active component of the circuit such as the logic gates and only take into...
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
Physically Unclonable Functions (PUFs) are being proposed as a low cost alternative to permanently store secret keys or provide device authentication without requiring non-volatile memory, large e-fuses or other dedicated processing steps. In the literature, PUFs are split into two main categories. The so-called strong PUFs are mainly used for authentication purposes, hence also called authentication PUFs. They promise to be lightweight by avoiding extensive digital post-processing and...
The overlap-free splitting method, i.e., even-odd splitting and its generalization, can reduce the XOR delay of a Karatsuba multiplier. We use this method to derive Karatsuba formulae with one less XOR delay in each recursive iteration. These formulae need more multiplication operations, and are trade-offs between space and time. We also show that ``finding common subexpressions'' performs better than ``the refined identity'' in 4-term formula: we reduce the number of XOR gates given by...
As integrated circuit (IC) design and manufacturing have become highly globalized, hardware security risks become more prominent as malicious parties can exploit multiple stages of the supply chain for profit. Two potential targets in this chain are third-party intellectual property (3PIP) vendors and their customers. Untrusted parties can insert hardware Trojans into 3PIP circuit designs that can both alter device functionalities when triggered or create a side channel to leak sensitive...
This paper discusses the implications of choosing a computational model to study the cost of cryptographic attacks and therefore quantify how dangerous they are. This choice is often unconscious and the chosen model itself is usually implicit; but it has repercussions on security evaluations. We compare three reasonable computational models: $i$) the usual Random Access Machine (RAM) model; $ii$) the ``Expensive Memory Model'' explicitly introduced by several 3rd-round submissions to the...
Printed Electronics (PE) has a rapidly growing market, thus, the counterfeiting/overbuilding of PE components is anticipated to grow. The common solution for the counterfeiting is Physical Unclonable Functions (PUFs). In PUFs, a unique fingerprint is extracted from (irreproducible) process variations in the production and used in the authentication of valid components. Many commonly used PUFs are electrical PUFs by leveraging the impact of process variations on electrical properties of...
As one of the post-quantum protocol candidates, the supersingular isogeny key encapsulation (SIKE) protocol delivers promising public and secret key sizes over other candidates. Nevertheless, the considerable computations form the bottleneck and limit its practical applications. The modular multiplication operations occupy a large proportion of the overall computations required by the SIKE protocol. The VLSI implementation of the high-speed modular multiplier remains a big challenge. In this...
Existing logic obfuscation approaches aim to protect hardware design IPs from SAT attack by increasing query count and output corruptibility of a locked netlist. In this paper, we demonstrate the ineffectiveness of such techniques to obfuscate hardware accelerator platforms. Subsequently, we propose a Hardware/software co-design based Accelerator Obfuscation (HSCAO) scheme to provably safeguard the IP of such designs against SAT as well as removal/bypass type of attacks while still...
Test of integrated circuits (ICs) is essential to ensure their quality; the test is meant to prevent defective and out-of-spec ICs from entering into the supply chain. The test is conducted by comparing the observed IC output with the expected test responses for a set of test patterns; the test patterns are generated using automatic test pattern generation algorithms. Existing test-pattern generation algorithms aim to achieve higher fault coverage at...
In this paper, by employing the logical effort technique an efficient and high-speed VLSI implementation of the digit-serial Gaussian normal basis multiplier is presented. It is constructed by using AND, XOR and XOR tree components. To have a low-cost implementation with low number of transistors, the block of AND gates are implemented by using NAND gates based on the property of the XOR gates in the XOR tree. To optimally decrease the delay and increase the drive ability of the circuit the...
Achieving high reliability across environmental variations and over aging in physical unclonable functions (PUFs) remains a challenge for PUF designers. The conventional method to improve PUF reliability is to use powerful error correction codes (ECC) to correct the errors in the raw response from the PUF core. Unfortunately, these ECC blocks generally have high VLSI overheads, which scale up quickly with the error correction capability. Alternately, researchers have proposed techniques to...
The verification of an ECDSA signature requires a double-base scalar multiplication, an operation of the form $k \cdot G + l \cdot Q$ where $G$ is a generator of a large elliptic curve group of prime order $n$, $Q$ is an arbitrary element of said group, and $k$, $l$ are two integers in the range of $[1, n-1]$. We introduce in this paper an area-optimized VLSI design of a Prime-Field Arithmetic Unit (PFAU) that can serve as a loosely-coupled or tightly-coupled hardware accelerator in a...
This article proposes a method for the construction of a public key system that is based on VLSI logic synthesis algorithms. First, we discuss the properties of VLSI logic synthesis algorithms. Then we view them in the context of cryptographic primitives. Then we propose a public key encryption system and finally discuss its security properties.
In this paper, we investigate the benefit of instruction set extensions for software implementations of all five SHA-3 candidates. To this end, we start from optimized assembly code for a common 16-bit microcontroller instruction set architecture. By themselves, these implementations provide reference for complexity of the algorithms on 16-bit architectures, commonly used in embedded systems. For each algorithm, we then propose suitable instruction set extensions and implement the modified...
The block cipher Rijndael has undergone more than ten years of extensive cryptanalysis since its submission as a candidate for the Advanced Encryption Standard (AES) in April 1998. To date, most of the publicly-known cryptanalytic results are based on reduced-round variants of the AES (respectively Rijndael) algorithm. Among the few exceptions that target the full AES are the Related-Key Cryptanalysis (RKC) introduced at ASIACRYPT 2009 and attacks exploiting Time-Memory-Key (TMK) trade-offs...
We describe how a simple way to split input operands allows for fast VLSI implementations of subquadratic $GF(2)[x]$ Karatsuba-Ofman multipliers. The theoretical XOR gate delay of the resulting multipliers is reduced significantly. For example, it is reduced by about 33\% and 25\% for $n=2^{t}$ and $n=3^{t}$ $(t>1)$, respectively. To the best of our knowledge, this parameter has never been improved since the original Karatsuba-Ofman algorithm was first used to design $GF(2^n)$ multipliers in 1990.
Two alternative architectures and VLSI implementations of the 64-bit NESSIE proposal, MISTY1 block cipher, are presented in this paper. For these implementations, FPGA devices were used. The first architecture is suitable for applications with high throughput requirements. A throughput of up to 7.2 Gbps can be achieved at a clock frequency of 96 MHz. The main characteristic of this implementation is that uses RAM blocks that are embedded in the FPGA device in order to implement the necessary...