43 results sorted by ID
Possible spell-corrected query: is
The Large Block Cipher Family Vistrutah
Roberto Avanzi, Bishwajit Chakraborty, Eik List
Secret-key cryptography
Vistrutah is a large block cipher with block sizes of 256 and 512 bits. It iterates a step function that applies two AES rounds to each 128-bit block of the state, followed by a state-wide cell permutation. Like Simpira, Haraka, Pholkos, and ASURA, Vistrutah leverages AES instructions to achieve high performance.
For each component of Vistrutah, we conduct a systematic evaluation of functions that can be efficiently implemented on both Intel and Arm architectures. We therefore expect...
CRAFT: Characterizing and Root-Causing Fault Injection Threats at Pre-Silicon
Arsalan Ali Malik, Harshvadan Mihir, Aydin Aysu
Attacks and cryptanalysis
Fault injection attacks represent a class of threats that can compromise embedded systems across multiple layers of abstraction, such as system software, instruction set architecture (ISA), microarchitecture, and physical implementation. Early detection of these vulnerabilities and understanding their root causes, along with their propagation from the physical layer to the system software, is critical in securing the cyberinfrastructure.
This work presents a comprehensive methodology for...
Verifying Jolt zkVM Lookup Semantics
Carl Kwan, Quang Dao, Justin Thaler
Applications
Lookups are a popular way to express repeated constraints in state-of-the art SNARKs. This is especially the case for zero-knowledge virtual machines (zkVMs), which produce succinct proofs of correct execution for programs expressed as bytecode according to a specific instruction set architecture (ISA). The Jolt zkVM (Arun, Setty & Thaler, Eurocrypt 2024) for RISC-V ISA employs Lasso (Setty, Thaler & Wahby, Eurocrypt 2024), an efficient lookup argument for massive structured tables, to prove...
Marian: An Open Source RISC-V Processor with Zvk Vector Cryptography Extensions
Thomas Szymkowiak, Endrit Isufi, Markku-Juhani Saarinen
Implementation
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
SoK: Instruction Set Extensions for Cryptographers
Hao Cheng, Johann Großschädl, Ben Marshall, Daniel Page, Markku-Juhani O. Saarinen
Implementation
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof
Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang
Implementation
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
ECO-CRYSTALS: Efficient Cryptography CRYSTALS on Standard RISC-V ISA
Xinyi Ji, Jiankuo Dong, Junhao Huang, Zhijian Yuan, Wangchen Dai, Fu Xiao, Jingqiang Lin
Implementation
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
Towards ML-KEM & ML-DSA on OpenTitan
Amin Abdulrahman, Felix Oberhansl, Hoang Nguyen Hien Pham, Jade Philipoom, Peter Schwabe, Tobias Stelzer, Andreas Zankl
Implementation
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography.
We start by carefully optimizing ML-KEM and ML-DSA --- the two algorithms primarily recommended and standardized by NIST --- in software targeting the OTBN accelerator.
Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
Juliet: A Configurable Processor for Computing on Encrypted Data
Charles Gouert, Dimitris Mouris, Nektarios Georgios Tsoutsos
Applications
Fully homomorphic encryption (FHE) has become progressively more viable in the years since its original inception in 2009. At the same time, leveraging state-of-the-art schemes in an efficient way for general computation remains prohibitively difficult for the average programmer. In this work, we introduce a new design for a fully homomorphic processor, dubbed Juliet, to enable faster operations on encrypted data using the state-of-the-art TFHE and cuFHE libraries for both CPU and GPU...
Towards a Polynomial Instruction Based Compiler for Fully Homomorphic Encryption Accelerators
Sejun Kim, Wen Wang, Duhyeong Kim, Adish Vartak, Michael Steiner, Rosario Cammarota
Applications
Fully Homomorphic Encryption (FHE) is a transformative technology that enables computations on encrypted data without requiring decryption, promising enhanced data privacy. However, its adoption has been limited due to significant performance overheads. Recent advances include the proposal of domain-specific, highly-parallel hardware accelerators designed to overcome these limitations.
This paper introduces PICA, a comprehensive compiler framework designed to simplify the programming of...
CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors
Oren Ganon, Itamar Levi
Implementation
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
Fast polynomial multiplication using matrix multiplication accelerators with applications to NTRU on Apple M1/M3 SoCs
Décio Luiz Gazzoni Filho, Guilherme Brandão, Julio López
Implementation
Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level of processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions,...
PQ.V.ALU.E: Post-Quantum RISC-V Custom ALU Extensions on Dilithium and Kyber
Konstantina Miteloudi, Joppe Bos, Olivier Bronchain, Björn Fay, Joost Renes
Implementation
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices.
The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation.
This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
Jolt: SNARKs for Virtual Machines via Lookups
Arasu Arun, Srinath Setty, Justin Thaler
Cryptographic protocols
Succinct Non-interactive Arguments of Knowledge (SNARKs) allow an untrusted prover to establish that it correctly ran some "witness-checking procedure" on a witness. A zkVM (short for zero-knowledge Virtual Machine) is a SNARK that allows the witness-checking procedure to be specified as a computer program written in the assembly language of a specific instruction set architecture (ISA).
A $\textit{front-end}$ converts computer programs into a lower-level representation such as an...
EDEN - a practical, SNARK-friendly combinator VM and ISA
Logan Allen, Brian Klatt, Philip Quirk, Yaseen Shaikh
Cryptographic protocols
Succinct Non-interactive Arguments of Knowledge (SNARKs) enable a party to cryptographically prove a statement regarding a computation to another party that has constrained resources. Practical use of SNARKs often involves a Zero-Knowledge Virtual Machine (zkVM) that receives an input program and input data, then generates a SNARK proof of the correct execution of the input program. Most zkVMs emulate the von Neumann architecture and must prove relations between a program's execution and its...
RPU: The Ring Processing Unit
Deepraj Soni, Negar Neda, Naifeng Zhang, Benedict Reynwar, Homer Gamil, Benjamin Heyman, Mohammed Nabeel Thari Moopan, Ahmad Al Badawi, Yuriy Polyakov, Kellie Canida, Massoud Pedram, Michail Maniatakos, David Bruce Cousins, Franz Franchetti, Matthew French, Andrew Schmidt, Brandon Reagen
Applications
Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have received limited use due to their extreme overheads of running on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512,...
Recommendation for a holistic secure embedded ISA extension
Florian Stolz, Marc Fyrbiak, Pascal Sasdrich, Tim Güneysu
Foundations
Embedded systems are a cornerstone of the ongoing digitization of our society, ranging from expanding markets around IoT and smart-X devices over to sensors in autonomous driving, medical equipment or critical infrastructures. Since a vast amount of embedded systems are safety-critical (e.g., due to their operation site), security is a necessity for their operation. However, unlike mobile, desktop, and server systems, where adversaries typically only act have remote access, embedded systems...
RISC-V Instruction Set Extensions for Lightweight Symmetric Cryptography
Hao Cheng, Johann Großschädl, Ben Marshall, Dan Page, Thinh Pham
Implementation
The NIST LightWeight Cryptography (LWC) selection process aims to standardise cryptographic functionality which is suitable for resource-constrained devices. Since the outcome is likely to have significant, long-lived impact, careful evaluation of each submission with respect to metrics explicitly outlined in the call is imperative. Beyond the robustness of submissions against cryptanalytic attack, metrics related to their implementation (e.g., execution latency and memory footprint) form an...
Low-latency implementation of the GIFT cipher on RISC-V architectures
Gheorghe Pojoga, Kostas Papagiannopoulos
Implementation
Lightweight cryptography is a viable solution for constrained computational environments that require a secure communication channel. To standardize lightweight primitives, NIST has published a call for algorithms that address needs like compactness, low-latency, low-power/energy, etc. Among the candidates, the GIFT family of block ciphers was utilized in various NIST candidates due to its high-security margin and small gate footprint. As a result of their hardware-oriented design, software...
Efficient Proofs of Software Exploitability for Real-world Processors
Matthew Green, Mathias Hall-Andersen, Eric Hennenfent, Gabriel Kaptchuk, Benjamin Perez, Gijs Van Laer
Applications
We consider the problem of proving in zero-knowledge the existence of vulnerabilities in executables compiled to run on real-world processors. We demonstrate that it is practical to prove knowledge of real exploits for real-world processor architectures without the need for source code and without limiting our consideration to narrow vulnerability classes. To achieve this, we devise a novel circuit compiler and a toolchain that produces highly optimized, non-interactive zero-knowledge proofs...
On the Performance Gap of a Generic C Optimized Assembler and Wide Vector Extensions for Masked Software with an Ascon-{\it{p}} test case
Dor Salomon, Itamar Levi
Implementation
Efficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to...
A Scalable SIMD RISC-V based Processor with Customized Vector Extensions for CRYSTALS-Kyber
Huimin Li, Nele Mentens, Stjepan Picek
Implementation
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a $5\times5\times64{-}bit$ matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of...
SME: Scalable Masking Extensions
Ben Marshall, Dan Page
Supporting masking countermeasures for non-invasive side-channel security in instructions set architectures is a hard problem. Masked operations often have a large number of inputs and outputs, and enabling portable higher order masking has remained a difficult. However, there are clear benefits to enabling this in terms of performance, code density and security guarantees. We present SME, an instruction set extension for enabling secure and efficient software masking of cryptographic
code...
Architecture Support for Bitslicing
Pantea Kiaei, Tom Conroy, Patrick Schaumont
Implementation
The bitsliced programming model has shown to boost the throughput of software programs. However, on a standard architecture, it exerts a high pressure on register access, causing memory spills and restraining the full potential of bitslicing. In this work, we present architecture support for bitslicing in a System-on-Chip. Our hardware extensions are of two types; internal to the processor core, in the form of custom instructions, and external to the processor, in the form of direct memory...
A lightweight ISE for ChaCha on RISC-V
Ben Marshall, Daniel Page, Thinh Hung Pham
Implementation
ChaCha is a high-throughput stream cipher designed with the aim of ensuring high-security margins while achieving high performance on software platforms. RISC-V, an emerging, free, and open Instruction Set Architecture (ISA) is being developed with many instruction set extensions (ISE). ISEs are a native concept in RISC-V to support a relatively small RISC-V ISA to suit different use-cases including cryptographic acceleration via either standard or custom ISEs. This paper proposes a...
The Nested Subset Differential Attack: A Practical Direct Attack Against LUOV which Forges a Signature within 210 Minutes
Jintai Ding, Joshua Deaton, Vishakha, Bo-Yin Yang
Secret-key cryptography
In 2017, Ward Beullenset al.submitted Lifted Unbalanced Oil andVinegar, which is a modification to the Unbalanced Oil and Vinegar Schemeby Patarin. Previously, Dinget al.proposed the Subfield Differential Attack which prompted a change of parameters by the authors of LUOV for the sec-ond round of the NIST post quantum standardization competition. In this paper we propose a modification to the Subfield Differential Attack called the Nested Subset Differential Attack which fully...
The design of scalar AES Instruction Set Extensions for RISC-V
Ben Marshall, G. Richard Newell, Dan Page, Markku-Juhani O. Saarinen, Claire Wolf
Implementation
Secure, efficient execution of AES is an essential requirement on most computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. RISC-V is a (relatively) new ISA that lacks such a standardised ISE. We survey the state-of-the-art industrial and academic ISEs for AES, implement and evaluate five different ISEs, one of which is novel. We recommend separate ISEs for 32 and 64-bit base architectures, with measured performance improvements for an AES-128...
Development of The RISC-V Entropy Source Interface
Markku-Juhani O. Saarinen, G. Richard Newell, Ben Marshall
Implementation
The RISC-V True Random Number Generator (TRNG) architecture breaks with previous ISA TRNG practice by splitting the Entropy Source (ES) component away from cryptographic DRBGs into a separate privileged interface, and in its use of polling. The modular approach is suitable for the RISC-V hardware IP ecosystem, allows a significantly smaller implementation footprint on platforms that need it, while directly supporting current standards compliance testing methods. We describe the interface,...
An Instruction Set Extension to Support Software-Based Masking
Si Gao, Johann Großschädl, Ben Marshall, Dan Page, Thinh Pham, Francesco Regazzoni
Implementation
In both hardware and software, masking can represent an effective means of hardening an implementation against side channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper...
Domain-Oriented Masked Instruction Set Architecture for RISC-V
Pantea Kiaei, Patrick Schaumont
Implementation
An important selling point for the RISC-V instruction set is the separation between ISA and the implementation of the ISA, leading to flexibility in the design. We argue that for secure implementations, this flexibility is often a vulnerability. With a hardware attacker, the side-effects of instruction execution cannot be ignored. As a result, a strict separation between the ISA interface and implementation is undesirable. We suggest that secure ISA may require additional implementation...
RISQ-V: Tightly Coupled RISC-V Accelerators for Post-Quantum Cryptography
Tim Fritzmann, Georg Sigl, Johanna Sepúlveda
Public-key cryptography
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
ISA Extensions for Finite Field Arithmetic - Accelerating Kyber and NewHope on RISC-V
Erdem Alkim, Hülya Evkan, Norman Lahr, Ruben Niederhagen, Richard Petri
Implementation
We present and evaluate a custom extension to the RISC-V instruction set
for finite fields arithmetic. The result serves as a very compact
approach to software-hardware co-design of PQC implementations in the
context of small embedded processors such as smartcards. The extension
provides instructions that implement finite field operations with
subsequent reduction of the result. As small finite fields are used in
various PQC schemes, such instructions can provide a considerable
speedup for...
A Lattice-based Enhanced Privacy ID
Nada EL Kassem, Luis Fiolhais, Paulo Martins, Liqun Chen, Leonel Sousa
Cryptographic protocols
The Enhanced Privacy ID (EPID) scheme is currently used for hardware enclave attestation by an increasingly large number of platforms that implement Intel Software Guard Extensions (SGX). However, the scheme currently deployed by Intel is supported on Elliptic Curve Cryptography (ECC), and will become insecure should a large quantum computer become available. As part of National Institute of Standards and Technology (NIST)'s effort for the standardisation of post-quantum cryptography, there...
SNEIK on Microcontrollers: AVR, ARMv7-M, and RISC-V with Custom Instructions
Markku-Juhani O. Saarinen
Implementation
SNEIK is a family of lightweight cryptographic algorithms derived from a
single 512-bit permutation. The SNEIGEN ``entropy distribution
function'' was designed to speed up certain functions in post-quantum
and lattice-based public key algorithms.
We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit
ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an
implementation for the open-source RISC-V (RV32I) Instruction Set
Architecture (ISA). Our results demonstrate...
Data Oblivious ISA Extensions for Side Channel-Resistant and High Performance Computing
Jiyong Yu, Lucas Hsiung, Mohamad El Hajj, Christopher W. Fletcher
Foundations
Blocking microarchitectural (digital) side channels is one of the most pressing challenges in hardware security today. Recently, there has been a surge of effort that attempts to block these leakages by writing programs data obliviously. In this model, programs are written to avoid placing sensitive data-dependent pressure on shared resources. Despite recent efforts, however, running data oblivious programs on modern machines today is insecure and low performance. First, writing programs...
Fully Verifiable Secure Delegation of Pairing Computation: Cryptanalysis and An Efficient Construction
Osmanbey Uzunkol, Öznur Kalkar, İsa Sertkaya
We address the problem of secure and verifiable delegation of general pairing computation. We first analyze some recently proposed pairing delegation schemes and present several attacks on their security and/or verifiability properties. In particular, we show that none of these achieve the claimed security and verifiability properties simultaneously. We then provide a fully verifiable secure delegation scheme ${\sf VerPair}$ under one-malicious version of a two-untrusted-program model...
2016/1125
Last updated: 2017-11-06
Estonian Voting Verification Mechanism Revisited
Koksal Mus, Mehmet Sabir Kiraz, Murat Cenk, Isa Sertkaya
After the Estonian Parliamentary Elections held in 2011, an additional verification mechanism was integrated into the i-voting system in order to resist corrupted voting devices, including the so called Student's Attack where a student practically showed that the voting system is indeed not verifiable by developing several versions of malware capable of blocking or even changing the vote. This mechanism gives voters the opportunity to verify whether the vote they cast is stored in the...
More Efficient Secure Outsourcing Methods for Bilinear Maps
Öznur Arabacı, Mehmet Sabir Kiraz, İsa Sertkaya, Osmanbey Uzunkol
Public-key cryptography
Bilinear maps are popular cryptographic primitives which have been commonly used in various modern cryptographic protocols. However, the cost of computation for bilinear maps is expensive because of their realization using variants of Weil and Tate pairings of elliptic curves. Due to increasing availability of cloud computing services, devices with limited computational resources can outsource this heavy computation to more powerful external servers. Currently, the checkability probability...
Characterising and Comparing the Energy Consumption of Side Channel Attack Countermeasures and Lightweight Cryptography on Embedded Devices
David McCann, Kerstin Eder, Elisabeth Oswald
Implementation
This paper uses an Instruction Set Architecture (ISA) based statistical energy model of an ARM Cortex-M4 microprocessor to evaluate the energy consumption of an implementation of AES with different side channel attack (SCA) countermeasures and an implementation of lightweight ciphers PRESENT, KLEIN and ZORRO with and without Boolean first order masking. In this way, we assess the additional energy consumption of using different SCA countermeasures and using lightweight block ciphers on 32...
An Efficient ID-Based Message Recoverable Privacy-Preserving Auditing Scheme
Mehmet Sabır Kiraz, İsa Sertkaya, Osmanbey Uzunkol
Applications
One of the most important benefits of public cloud storage is outsourcing of management and maintenance with easy accessibility and retrievability over the internet. However, outsourcing data on the cloud brings new challenges such as integrity verification and privacy of data. More concretely, once the users outsource their data on the cloud they have no longer physical control over the data and this leads to the integrity protection issue. Hence, it is crucial to guarantee proof of data...
Simple AEAD Hardware Interface (SÆHI) in a SoC: Implementing an On-Chip Keyak/WhirlBob Coprocessor
Markku-Juhani O. Saarinen
Simple AEAD Hardware Interface (SÆHI) is a hardware cryptographic interface aimed at CAESAR Authenticated Encryption with Associated Data (AEAD) algorithms. Cryptographic acceleration is typically achieved either with a coprocessor or via instruction set extensions. ISA modifications require re-engineering the CPU core, making the approach inapplicable outside the realm of open source processor cores. At minimum, we suggest implementing CAESAR AEADs as universal memory-mapped cryptographic...
Ultra Low-Power implementation of ECC on the ARM Cortex-M0+
Ruan de Clercq, Leif Uhsadel, Anthony Van Herrewege, Ingrid Verbauwhede
Public-key cryptography
In this work, elliptic curve cryptography (ECC) is used to make an efficient implementation of a public-key cryptography algorithm on the ARM Cortex-M0+. The goal of this implementation is to make not only a fast, but also a very low-power software implementation. To aid in the elliptic curve parameter selection, the energy consumption of different instructions on the ARM Cortex-M0+ was measured and it was found that there is a variation of up to 22.5% between different instructions. The...
On the Affine Equivalence and Nonlinearity Preserving Bijective Mappings
İsa Sertkaya, Ali Doğanaksoy
Secret-key cryptography
It is well-known that affine equivalence relations keep nonlineaerity invariant for all Boolean functions. The set of all Boolean functions, $\mathcal{F}_n$, over $\bbbf_2^n$, is naturally regarded as the $2^n$ dimensional vector space, $\bbbf_2^{2^n}$. Thus, while analyzing the transformations acting on $\mathcal{F}_n$, $S_{2^{2^n}}$, the group of all bijective mappings, defined from $\bbbf_2^{2^n}$ onto itself should be considered. As it is shown in \cite{ser,ser:dog,ser:dog:2}, there...
Vistrutah is a large block cipher with block sizes of 256 and 512 bits. It iterates a step function that applies two AES rounds to each 128-bit block of the state, followed by a state-wide cell permutation. Like Simpira, Haraka, Pholkos, and ASURA, Vistrutah leverages AES instructions to achieve high performance. For each component of Vistrutah, we conduct a systematic evaluation of functions that can be efficiently implemented on both Intel and Arm architectures. We therefore expect...
Fault injection attacks represent a class of threats that can compromise embedded systems across multiple layers of abstraction, such as system software, instruction set architecture (ISA), microarchitecture, and physical implementation. Early detection of these vulnerabilities and understanding their root causes, along with their propagation from the physical layer to the system software, is critical in securing the cyberinfrastructure. This work presents a comprehensive methodology for...
Lookups are a popular way to express repeated constraints in state-of-the art SNARKs. This is especially the case for zero-knowledge virtual machines (zkVMs), which produce succinct proofs of correct execution for programs expressed as bytecode according to a specific instruction set architecture (ISA). The Jolt zkVM (Arun, Setty & Thaler, Eurocrypt 2024) for RISC-V ISA employs Lasso (Setty, Thaler & Wahby, Eurocrypt 2024), an efficient lookup argument for massive structured tables, to prove...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...
The field of post-quantum cryptography (PQC) is continuously evolving. Many researchers are exploring efficient PQC implementation on various platforms, including x86, ARM, FPGA, GPU, etc. In this paper, we present an Efficient CryptOgraphy CRYSTALS (ECO-CRYSTALS) implementation on standard 64-bit RISC-V Instruction Set Architecture (ISA). The target schemes are two winners of the National Institute of Standards and Technology (NIST) PQC competition: CRYSTALS-Kyber and CRYSTALS-Dilithium,...
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA --- the two algorithms primarily recommended and standardized by NIST --- in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
Fully homomorphic encryption (FHE) has become progressively more viable in the years since its original inception in 2009. At the same time, leveraging state-of-the-art schemes in an efficient way for general computation remains prohibitively difficult for the average programmer. In this work, we introduce a new design for a fully homomorphic processor, dubbed Juliet, to enable faster operations on encrypted data using the state-of-the-art TFHE and cuFHE libraries for both CPU and GPU...
Fully Homomorphic Encryption (FHE) is a transformative technology that enables computations on encrypted data without requiring decryption, promising enhanced data privacy. However, its adoption has been limited due to significant performance overheads. Recent advances include the proposal of domain-specific, highly-parallel hardware accelerators designed to overcome these limitations. This paper introduces PICA, a comprehensive compiler framework designed to simplify the programming of...
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
Efficient polynomial multiplication routines are critical to the performance of lattice-based post-quantum cryptography (PQC). As PQC standards only recently started to emerge, CPUs still lack specialized instructions to accelerate such routines. Meanwhile, deep learning has grown immeasurably in importance. Its workloads call for teraflops-level of processing power for linear algebra operations, mainly matrix multiplication. Computer architects have responded by introducing ISA extensions,...
This paper explores the challenges and potential solutions of implementing the recommended upcoming post-quantum cryptography standards (the CRYSTALS-Dilithium and CRYSTALS-Kyber algorithms) on resource constrained devices. The high computational cost of polynomial operations, fundamental to cryptography based on ideal lattices, presents significant challenges in an efficient implementation. This paper proposes a hardware/software co-design strategy using RISC-V extensions to optimize...
Succinct Non-interactive Arguments of Knowledge (SNARKs) allow an untrusted prover to establish that it correctly ran some "witness-checking procedure" on a witness. A zkVM (short for zero-knowledge Virtual Machine) is a SNARK that allows the witness-checking procedure to be specified as a computer program written in the assembly language of a specific instruction set architecture (ISA). A $\textit{front-end}$ converts computer programs into a lower-level representation such as an...
Succinct Non-interactive Arguments of Knowledge (SNARKs) enable a party to cryptographically prove a statement regarding a computation to another party that has constrained resources. Practical use of SNARKs often involves a Zero-Knowledge Virtual Machine (zkVM) that receives an input program and input data, then generates a SNARK proof of the correct execution of the input program. Most zkVMs emulate the von Neumann architecture and must prove relations between a program's execution and its...
Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have received limited use due to their extreme overheads of running on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512,...
Embedded systems are a cornerstone of the ongoing digitization of our society, ranging from expanding markets around IoT and smart-X devices over to sensors in autonomous driving, medical equipment or critical infrastructures. Since a vast amount of embedded systems are safety-critical (e.g., due to their operation site), security is a necessity for their operation. However, unlike mobile, desktop, and server systems, where adversaries typically only act have remote access, embedded systems...
The NIST LightWeight Cryptography (LWC) selection process aims to standardise cryptographic functionality which is suitable for resource-constrained devices. Since the outcome is likely to have significant, long-lived impact, careful evaluation of each submission with respect to metrics explicitly outlined in the call is imperative. Beyond the robustness of submissions against cryptanalytic attack, metrics related to their implementation (e.g., execution latency and memory footprint) form an...
Lightweight cryptography is a viable solution for constrained computational environments that require a secure communication channel. To standardize lightweight primitives, NIST has published a call for algorithms that address needs like compactness, low-latency, low-power/energy, etc. Among the candidates, the GIFT family of block ciphers was utilized in various NIST candidates due to its high-security margin and small gate footprint. As a result of their hardware-oriented design, software...
We consider the problem of proving in zero-knowledge the existence of vulnerabilities in executables compiled to run on real-world processors. We demonstrate that it is practical to prove knowledge of real exploits for real-world processor architectures without the need for source code and without limiting our consideration to narrow vulnerability classes. To achieve this, we devise a novel circuit compiler and a toolchain that produces highly optimized, non-interactive zero-knowledge proofs...
Efficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to...
SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a $5\times5\times64{-}bit$ matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of...
Supporting masking countermeasures for non-invasive side-channel security in instructions set architectures is a hard problem. Masked operations often have a large number of inputs and outputs, and enabling portable higher order masking has remained a difficult. However, there are clear benefits to enabling this in terms of performance, code density and security guarantees. We present SME, an instruction set extension for enabling secure and efficient software masking of cryptographic code...
The bitsliced programming model has shown to boost the throughput of software programs. However, on a standard architecture, it exerts a high pressure on register access, causing memory spills and restraining the full potential of bitslicing. In this work, we present architecture support for bitslicing in a System-on-Chip. Our hardware extensions are of two types; internal to the processor core, in the form of custom instructions, and external to the processor, in the form of direct memory...
ChaCha is a high-throughput stream cipher designed with the aim of ensuring high-security margins while achieving high performance on software platforms. RISC-V, an emerging, free, and open Instruction Set Architecture (ISA) is being developed with many instruction set extensions (ISE). ISEs are a native concept in RISC-V to support a relatively small RISC-V ISA to suit different use-cases including cryptographic acceleration via either standard or custom ISEs. This paper proposes a...
In 2017, Ward Beullenset al.submitted Lifted Unbalanced Oil andVinegar, which is a modification to the Unbalanced Oil and Vinegar Schemeby Patarin. Previously, Dinget al.proposed the Subfield Differential Attack which prompted a change of parameters by the authors of LUOV for the sec-ond round of the NIST post quantum standardization competition. In this paper we propose a modification to the Subfield Differential Attack called the Nested Subset Differential Attack which fully...
Secure, efficient execution of AES is an essential requirement on most computing platforms. Dedicated Instruction Set Extensions (ISEs) are often included for this purpose. RISC-V is a (relatively) new ISA that lacks such a standardised ISE. We survey the state-of-the-art industrial and academic ISEs for AES, implement and evaluate five different ISEs, one of which is novel. We recommend separate ISEs for 32 and 64-bit base architectures, with measured performance improvements for an AES-128...
The RISC-V True Random Number Generator (TRNG) architecture breaks with previous ISA TRNG practice by splitting the Entropy Source (ES) component away from cryptographic DRBGs into a separate privileged interface, and in its use of polling. The modular approach is suitable for the RISC-V hardware IP ecosystem, allows a significantly smaller implementation footprint on platforms that need it, while directly supporting current standards compliance testing methods. We describe the interface,...
In both hardware and software, masking can represent an effective means of hardening an implementation against side channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper...
An important selling point for the RISC-V instruction set is the separation between ISA and the implementation of the ISA, leading to flexibility in the design. We argue that for secure implementations, this flexibility is often a vulnerability. With a hardware attacker, the side-effects of instruction execution cannot be ignored. As a result, a strict separation between the ISA interface and implementation is undesirable. We suggest that secure ISA may require additional implementation...
Empowering electronic devices to support Post-Quantum Cryptography (PQC) is a challenging task. PQC introduces new mathematical elements and operations which are usually not easy to implement on standard processors. Especially for low cost and resource constraint devices, hardware acceleration is usually required. In addition, as the standardization process of PQC is still ongoing, a focus on maintaining flexibility is mandatory. To cope with such requirements, hardware/software co-design...
We present and evaluate a custom extension to the RISC-V instruction set for finite fields arithmetic. The result serves as a very compact approach to software-hardware co-design of PQC implementations in the context of small embedded processors such as smartcards. The extension provides instructions that implement finite field operations with subsequent reduction of the result. As small finite fields are used in various PQC schemes, such instructions can provide a considerable speedup for...
The Enhanced Privacy ID (EPID) scheme is currently used for hardware enclave attestation by an increasingly large number of platforms that implement Intel Software Guard Extensions (SGX). However, the scheme currently deployed by Intel is supported on Elliptic Curve Cryptography (ECC), and will become insecure should a large quantum computer become available. As part of National Institute of Standards and Technology (NIST)'s effort for the standardisation of post-quantum cryptography, there...
SNEIK is a family of lightweight cryptographic algorithms derived from a single 512-bit permutation. The SNEIGEN ``entropy distribution function'' was designed to speed up certain functions in post-quantum and lattice-based public key algorithms. We implement and evaluate SNEIK algorithms on popular 8-bit AVR and 32-bit ARMv7-M (Cortex M3/M4) microcontrollers, and also describe an implementation for the open-source RISC-V (RV32I) Instruction Set Architecture (ISA). Our results demonstrate...
Blocking microarchitectural (digital) side channels is one of the most pressing challenges in hardware security today. Recently, there has been a surge of effort that attempts to block these leakages by writing programs data obliviously. In this model, programs are written to avoid placing sensitive data-dependent pressure on shared resources. Despite recent efforts, however, running data oblivious programs on modern machines today is insecure and low performance. First, writing programs...
We address the problem of secure and verifiable delegation of general pairing computation. We first analyze some recently proposed pairing delegation schemes and present several attacks on their security and/or verifiability properties. In particular, we show that none of these achieve the claimed security and verifiability properties simultaneously. We then provide a fully verifiable secure delegation scheme ${\sf VerPair}$ under one-malicious version of a two-untrusted-program model...
After the Estonian Parliamentary Elections held in 2011, an additional verification mechanism was integrated into the i-voting system in order to resist corrupted voting devices, including the so called Student's Attack where a student practically showed that the voting system is indeed not verifiable by developing several versions of malware capable of blocking or even changing the vote. This mechanism gives voters the opportunity to verify whether the vote they cast is stored in the...
Bilinear maps are popular cryptographic primitives which have been commonly used in various modern cryptographic protocols. However, the cost of computation for bilinear maps is expensive because of their realization using variants of Weil and Tate pairings of elliptic curves. Due to increasing availability of cloud computing services, devices with limited computational resources can outsource this heavy computation to more powerful external servers. Currently, the checkability probability...
This paper uses an Instruction Set Architecture (ISA) based statistical energy model of an ARM Cortex-M4 microprocessor to evaluate the energy consumption of an implementation of AES with different side channel attack (SCA) countermeasures and an implementation of lightweight ciphers PRESENT, KLEIN and ZORRO with and without Boolean first order masking. In this way, we assess the additional energy consumption of using different SCA countermeasures and using lightweight block ciphers on 32...
One of the most important benefits of public cloud storage is outsourcing of management and maintenance with easy accessibility and retrievability over the internet. However, outsourcing data on the cloud brings new challenges such as integrity verification and privacy of data. More concretely, once the users outsource their data on the cloud they have no longer physical control over the data and this leads to the integrity protection issue. Hence, it is crucial to guarantee proof of data...
Simple AEAD Hardware Interface (SÆHI) is a hardware cryptographic interface aimed at CAESAR Authenticated Encryption with Associated Data (AEAD) algorithms. Cryptographic acceleration is typically achieved either with a coprocessor or via instruction set extensions. ISA modifications require re-engineering the CPU core, making the approach inapplicable outside the realm of open source processor cores. At minimum, we suggest implementing CAESAR AEADs as universal memory-mapped cryptographic...
In this work, elliptic curve cryptography (ECC) is used to make an efficient implementation of a public-key cryptography algorithm on the ARM Cortex-M0+. The goal of this implementation is to make not only a fast, but also a very low-power software implementation. To aid in the elliptic curve parameter selection, the energy consumption of different instructions on the ARM Cortex-M0+ was measured and it was found that there is a variation of up to 22.5% between different instructions. The...
It is well-known that affine equivalence relations keep nonlineaerity invariant for all Boolean functions. The set of all Boolean functions, $\mathcal{F}_n$, over $\bbbf_2^n$, is naturally regarded as the $2^n$ dimensional vector space, $\bbbf_2^{2^n}$. Thus, while analyzing the transformations acting on $\mathcal{F}_n$, $S_{2^{2^n}}$, the group of all bijective mappings, defined from $\bbbf_2^{2^n}$ onto itself should be considered. As it is shown in \cite{ser,ser:dog,ser:dog:2}, there...