Search results

2024/1862 (PDF) Last updated: 2024-11-14

BatchZK: A Fully Pipelined GPU-Accelerated System for Batch Generation of Zero-Knowledge Proofs

Tao Lu, Yuxun Chen, Zonghui Wang, Xiaohang Wang, Wenzhi Chen, Jiaheng Zhang

Implementation

Zero-knowledge proof (ZKP) is a cryptographic primitive that enables one party to prove the validity of a statement to other parties without disclosing any secret information. With its widespread adoption in applications such as blockchain and verifiable machine learning, the demand for generating zero-knowledge proofs has increased dramatically. In recent years, considerable efforts have been directed toward developing GPU-accelerated systems for proof generation. However, these previous...

2024/1827 (PDF) Last updated: 2024-11-07

OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM

Xander Pottier, Thomas de Ruijter, Jonas Bertels, Wouter Legiest, Michiel Van Beirendonck, Ingrid Verbauwhede

Implementation

The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...

2024/1629 (PDF) Last updated: 2024-10-11

Efficient Key-Switching for Word-Type FHE and GPU Acceleration

Shutong Jin, Zhen Gu, Guangyan Li, Donglong Chen, Çetin Kaya Koç, Ray C. C. Cheung, Wangchen Dai

Implementation

Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...

2024/1543 (PDF) Last updated: 2024-10-02

HEonGPU: a GPU-based Fully Homomorphic Encryption Library 1.0

Ali Şah Özcan, Erkay Savaş

Implementation

HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Units (GPUs). By leveraging the parallel processing capacity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operations concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learning and secure...

2024/1436 (PDF) Last updated: 2024-09-13

Eva: Efficient IVC-Based Authentication of Lossy-Encoded Videos

Chengru Zhang, Xiao Yang, David Oswald, Mark Ryan, Philipp Jovanovic

Applications

With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...

2024/1371 (PDF) Last updated: 2024-12-01

PIGEON: A Framework for Private Inference of Neural Networks

Christopher Harth-Kitzerow, Yongqin Wang, Rachit Rajat, Georg Carle, Murali Annavaram

Cryptographic protocols

Privacy-Preserving Machine Learning (PPML) is one of the most relevant use cases for Secure Multiparty Computation (MPC). While private training of large neural networks such as VGG-16 or ResNet-50 on state-of-the-art datasets such as ImageNet is still out of reach due to the performance overhead of MPC, GPU-based MPC frameworks are starting to achieve practical runtimes for private inference. However, we show that, in contrast to plaintext machine learning, the usage of GPU acceleration for...

2024/1246 (PDF) Last updated: 2024-08-06

MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof

Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang

Implementation

Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...

2024/1089 (PDF) Last updated: 2024-07-04

Juliet: A Configurable Processor for Computing on Encrypted Data

Charles Gouert, Dimitris Mouris, Nektarios Georgios Tsoutsos

Applications

Fully homomorphic encryption (FHE) has become progressively more viable in the years since its original inception in 2009. At the same time, leveraging state-of-the-art schemes in an efficient way for general computation remains prohibitively difficult for the average programmer. In this work, we introduce a new design for a fully homomorphic processor, dubbed Juliet, to enable faster operations on encrypted data using the state-of-the-art TFHE and cuFHE libraries for both CPU and GPU...

2024/1030 (PDF) Last updated: 2024-06-26

GRASP: Accelerating Hash-based PQC Performance on GPU Parallel Architecture

Yijing Ning, Jiankuo Dong, Jingqiang Lin, Fangyu Zheng, Yu Fu, Zhenjiang Dong, Fu Xiao

Implementation

$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of...

2024/744 (PDF) Last updated: 2024-08-28

An NVMe-based Secure Computing Platform with FPGA-based TFHE Accelerator

Yoshihiro Ohba, Tomoya Sanuki, Claude Gravel, Kentaro Mihara

Implementation

In this paper, we introduce a new approach to secure computing by implementing a platform that utilizes an NVMe-based system with an FPGA-based Torus FHE accelerator, SSD, and middleware on the host-side. Our platform is the first of its kind to offer complete secure computing capabilities for TFHE using an FPGA-based accelerator. We have defined secure computing instructions to evaluate 14-bit to 14-bit functions using TFHE, and our middleware allows for communication of ciphertexts, keys,...

2024/705 (PDF) Last updated: 2024-10-17

Large-Scale MPC: Scaling Private Iris Code Uniqueness Checks to Millions of Users

Remco Bloemen, Bryan Gillespie, Daniel Kales, Philipp Sippl, Roman Walch

Cryptographic protocols

In this work we tackle privacy concerns in biometric verification systems that typically require server-side processing of sensitive data (e.g., fingerprints and Iris Codes). Concretely, we design a solution that allows us to query whether a given Iris Code is similar to one contained in a given database, while all queries and datasets are being protected using secure multiparty computation (MPC). Addressing the substantial performance demands of operational systems like World ID and aid...

2024/529 (PDF) Last updated: 2024-04-05

Fully Homomorphic Training and Inference on Binary Decision Tree and Random Forest

Hojune Shin, Jina Choi, Dain Lee, Kyoungok Kim, Younho Lee

This paper introduces a new method for training decision trees and random forests using CKKS homomorphic encryption (HE) in cloud environments, enhancing data privacy from multiple sources. The innovative Homomorphic Binary Decision Tree (HBDT) method utilizes a modified Gini Impurity index (MGI) for node splitting in encrypted data scenarios. Notably, the proposed training approach operates in a single cloud security domain without the need for decryption, addressing key challenges in...

2024/136 (PDF) Last updated: 2024-09-16

Secure Transformer Inference Made Non-interactive

Jiawen Zhang, Xinpeng Yang, Lipeng He, Kejia Chen, Wen-jie Lu, Yinghao Wang, Xiaoyang Hou, Jian Liu, Kui Ren, Xiaohu Yang

Cryptographic protocols

Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The protocol requires the client to engage in just one round of communication with the server during the whole inference...

2024/118 (PDF) Last updated: 2024-01-26

Data Privacy Made Easy: Enhancing Applications with Homomorphic Encryption

Charles Gouert, Nektarios Georgios Tsoutsos

Applications

Homomorphic encryption is a powerful privacy-preserving technology that is notoriously difficult to configure and use, even for experts. The key difficulties include restrictive programming models of homomorphic schemes and choosing suitable parameters for an application. In this tutorial, we outline methodologies to solve these issues and allow for conversion of any application to the encrypted domain using both leveled and fully homomorphic encryption. The first approach, called...

2024/057 (PDF) Last updated: 2024-08-16

Elastic MSM: A Fast, Elastic and Modular Preprocessing Technique for Multi-Scalar Multiplication Algorithm on GPUs

Xudong Zhu, Haoqi He, Zhengbang Yang, Yi Deng, Lutan Zhao, Rui Hou

Implementation

Zero-knowledge proof (ZKP) is a cryptographic primitive that enables a prover to convince a verifier that a statement is true, without revealing any other information beyond the correctness of the statement itself. Due to its powerful capabilities, its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been widely deployed in various privacy preserving applications such as cryptocurrencies and verifiable computation. Although...

2023/1541 (PDF) Last updated: 2023-10-08

TMVP-based Polynomial Convolution for Saber and Sable on GPU using CUDA-cores and Tensor-cores

Muhammad Asfand Hafeez, Wai-Kong Lee, Angshuman Karmakar, Seong Oun Hwang

Implementation

Recently proposed lattice-based cryptography algorithms can be used to protect the IoT communication against the threat from quantum computers, but they are computationally heavy. In particular, polynomial multiplication is one of the most time-consuming operations in lattice-based cryptography. To achieve efficient implementation, the Number Theoretic Transform (NTT) algorithm is an ideal choice, but it has certain limitations on the parameters, which not all lattice-based schemes can...

2023/1522 (PDF) Last updated: 2023-10-06

cuML-DSA: Optimized Signing Procedure and Server-Oriented GPU Design for ML-DSA

Shiyu Shen, Hao Yang, Wenqian Li, Yunlei Zhao

Implementation

The threat posed by quantum computing has precipitated an urgent need for post-quantum cryptography. Recently, the post-quantum digital signature draft FIPS 204 has been published, delineating the details of the ML-DSA, which is derived from the CRYSTALS-Dilithium. Despite these advancements, server environments, especially those equipped with GPU devices necessitating high-throughput signing, remain entrenched in classical schemes. A conspicuous void exists in the realm of GPU...

2023/1467 (PDF) Last updated: 2023-09-28

GPU Acceleration of High-Precision Homomorphic Computation Utilizing Redundant Representation

Shintaro Narisada, Hiroki Okada, Kazuhide Fukushima, Shinsaku Kiyomoto, Takashi Nishide

Implementation

Fully homomorphic encryption (FHE) can perform computations on encrypted data, allowing us to analyze sensitive data without losing its security. The main issue for FHE is its lower performance, especially for high-precision computations, compared to calculations on plaintext data. Making FHE viable for practical use requires both algorithmic improvements and hardware acceleration. Recently, Klemsa and Önen (CODASPY'22) presented fast homomorphic algorithms for high-precision integers,...

2023/1429 (PDF) Last updated: 2023-09-21

Leveraging GPU in Homomorphic Encryption: Framework Design and Analysis of BFV Variants

Shiyu Shen, Hao Yang, Wangchen Dai, Lu Zhou, Zhe Liu, Yunlei Zhao

Implementation

Homomorphic Encryption (HE) enhances data security by facilitating computations on encrypted data, opening new paths for privacy-focused computations. The Brakerski-Fan-Vercauteren (BFV) scheme, a promising HE scheme, raises considerable performance challenges. Graphics Processing Units (GPUs), with considerable parallel processing abilities, have emerged as an effective solution. In this work, we present an in-depth study focusing on accelerating and comparing BFV variants on GPUs,...

2023/1428 (PDF) Last updated: 2023-09-21

XNET: A Real-Time Unified Secure Inference Framework Using Homomorphic Encryption

Hao Yang, Shiyu Shen, Siyang Jiang, Lu Zhou, Wangchen Dai, Yunlei Zhao

Applications

Homomorphic Encryption (HE) presents a promising solution to securing neural networks for Machine Learning as a Service (MLaaS). Despite its potential, the real-time applicability of current HE-based solutions remains a challenge, and the diversity in network structures often results in inefficient implementations and maintenance. To address these issues, we introduce a unified and compact network structure for real-time inference in convolutional neural networks based on HE. We further...

2023/1410 (PDF) Last updated: 2023-10-06

Two Algorithms for Fast GPU Implementation of NTT

Ali Şah Özcan, Erkay Savaş

Implementation

The number theoretic transform (NTT) permits a very efficient method to perform multiplication of very large degree polynomials, which is the most time-consuming operation in fully homomorphic encryption (FHE) schemes and a class of non-interactive succinct zero-knowledge proof systems such as zk-SNARK. Efficient modular arithmetic plays an important role in the performance of NTT, and therefore it is studied extensively. The access pattern to the memory, on the other hand, may play much...

2023/1352 (PDF) Last updated: 2023-09-11

ACE-HoT: Accelerating an extreme amount of symmetric Cipher Evaluations for High-Order avalanche Tests

Emanuele Bellini, Juan Grados, Mohamed Rachidi, Nitin Satpute, Joan Daemen, Solane Elhirch

Implementation

In this work, we tackle the problem of estimating the security of iterated symmetric ciphers in an efficient manner, with tests that do not require a deep analysis of the internal structure of the cipher. This is particularly useful during the design phase of these ciphers, especially for quickly testing several combinations of possible parameters defining several cipher design variants. We consider a popular statistical test that allows us to determine the probability of flipping each...

2023/1309 (PDF) Last updated: 2023-09-02

A Lattice-based Publish-Subscribe Communication Protocol using Accelerated Homomorphic Encryption Primitives

Anes Abdennebi, Erkay Savaş

Implementation

Key-policy attribute-based encryption scheme (KP-ABE) uses a set of attributes as public keys for encryption. It allows homomorphic evaluation of ciphertext into another ciphertext of the same message, which can be decrypted if a certain access policy based on the attributes is satisfied. A lattice-based KP-ABE scheme is reported in several works in the literature, and its software implementation is available in an open-source library called PALISADE. However, as the cryptographic primitives...

2023/1194 (PDF) Last updated: 2023-08-06

HI-Kyber: A novel high-performance implementation scheme of Kyber based on GPU

Xinyi Ji, Jiankuo Dong, Pinchang Zhang, Deng Tonggui, Hua Jiafeng, Fu Xiao

Implementation

CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses complex discrete logarithm problems on lattices to build secure encryption and decryption systems to resist attacks from quantum computing. Performance is an important bottleneck affecting the promotion of post quantum...

2023/804 (PDF) Last updated: 2023-06-01

Falkor: Federated Learning Secure Aggregation Powered by AES-CTR GPU Implementation

Mariya Georgieva Belorgey, Sofia Dandjee, Nicolas Gama, Dimitar Jetchev, Dmitry Mikushin

Cryptographic protocols

We propose a novel protocol, Falkor, for secure aggregation for Federated Learning in the multi-server scenario based on masking of local models via a stream cipher based on AES in counter mode and accelerated by GPUs running on the aggregating servers. The protocol is resilient to client dropout and has reduced clients/servers communication cost by a factor equal to the number of aggregating servers (compared to the naïve baseline method). It scales simultaneously in the two major...

2023/641 (PDF) Last updated: 2023-05-12

Accelerated Encrypted Execution of General-Purpose Applications

Charles Gouert, Vinu Joseph, Steven Dalton, Cedric Augonnet, Michael Garland, Nektarios Georgios Tsoutsos

Implementation

Fully Homomorphic Encryption (FHE) is a cryptographic method that guarantees the privacy and security of user data during computation. FHE algorithms can perform unlimited arithmetic computations directly on encrypted data without decrypting it. Thus, even when processed by untrusted systems, confidential data is never exposed. In this work, we develop new techniques for accelerated encrypted execution and demonstrate the significant performance advantages of our approach. Our current focus...

2023/532 (PDF) Last updated: 2023-04-12

HLG: A framework for computing graphs in Residue Number System and its application in Fully Homomorphic Encryption

Shuang Wu, Chunhuan Zhao, Ye Yuan, Shuzhou Sun, Jie Li, Yamin Liu

Implementation

Implementation of Fully Homomorphic Encryption (FHE) is challenging. Especially when considering hardware acceleration, the major performance bottleneck is data transfer. Here we propose an algebraic framework called Heterogenous Lattice Graph (HLG) to build and process computing graphs in Residue Number System (RNS), which is the basis of high performance implementation of mainstream FHE algorithms. There are three main design goals for HLG framework: • Design a dedicated IR (HLG...

2023/399 (PDF) Last updated: 2023-03-21

High Throughput Lattice-based Signatures on GPUs: Comparing Falcon and Mitaka

Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang

Implementation

The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was...

2023/206 (PDF) Last updated: 2024-05-10

Orca: FSS-based Secure Training and Inference with GPUs

Neha Jawalkar, Kanav Gupta, Arkaprava Basu, Nishanth Chandran, Divya Gupta, Rahul Sharma

Cryptographic protocols

Secure Two-party Computation (2PC) allows two parties to compute any function on their private inputs without revealing their inputs to each other. In the offline/online model for 2PC, correlated randomness that is independent of all inputs to the computation, is generated in a preprocessing (offline) phase and this randomness is then utilized in the online phase once the inputs to the parties become available. Most 2PC works focus on optimizing the online time as this overhead lies on the...

2023/049 (PDF) Last updated: 2024-02-19

Phantom: A CUDA-Accelerated Word-Wise Homomorphic Encryption Library

Hao Yang, Shiyu Shen, Wangchen Dai, Lu Zhou, Zhe Liu, Yunlei Zhao

Implementation

Homomorphic encryption (HE) is a promising technique for privacy-preserving computations, especially the word-wise HE schemes that allow batching. However, the high computational overhead hinders the deployment of HE in real-world applications. GPUs are often used to accelerate execution, but a comprehensive performance comparison of different schemes on the same platform is still missing. In this work, we fill this gap by implementing three word-wise HE schemes BGV, BFV, and CKKS on GPU,...

2022/1621 (PDF) Last updated: 2022-11-21

cuXCMP: CUDA-Accelerated Private Comparison Based on Homomorphic Encryption

Hao Yang, Shiyu Shen, Zhe Liu, Yunlei Zhao

Applications

Private comparison schemes constructed on homomorphic encryption offer the noninteractive, output expressive and parallelizable features, and have advantages in communication bandwidth and performance. In this paper, we propose cuXCMP, which allows negative and float inputs, offers fully output expressive feature, and is more extensible and practical compared to XCMP (AsiaCCS 2018). Meanwhile, we introduce several memory-centric optimizations of the constant term extraction kernel tailored for...

2022/1321 (PDF) Last updated: 2023-04-15

cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs

Tao Lu, Chengkun Wei, Ruijing Yu, Chaochao Chen, Wenjing Fang, Lei Wang, Zeke Wang, Wenzhi Chen

Implementation

Zero-knowledge proof is a critical cryptographic primitive. Its most practical type, called zero-knowledge Succinct Non-interactive ARgument of Knowledge (zkSNARK), has been deployed in various privacy-preserving applications such as cryptocurrencies and verifiable machine learning. Unfortunately, zkSNARK like Groth16 has a high overhead on its proof generation step, which consists of several time-consuming operations, including large-scale matrix-vector multiplication (MUL),...

2022/1222 (PDF) Last updated: 2022-11-17

Homomorphic Encryption on GPU

Ali Şah Özcan, Can Ayduman, Enes Recep Türkoğlu, Erkay Savaş

Implementation

Homomorphic encryption (HE) is a cryptosystem that allows secure processing of encrypted data. One of the most popular HE schemes is the Brakerski-Fan-Vercauteren (BFV), which supports somewhat (SWHE) and fully homomorphic encryption (FHE). Since overly involved arithmetic operations of HE schemes are amenable to concurrent computation, GPU devices can be instrumental in facilitating the practical use of HE in real world applications thanks to their superior parallel processing capacity....

2022/999 (PDF) Last updated: 2022-08-03

PipeMSM: Hardware Acceleration for Multi-Scalar Multiplication

Charles F. Xavier

Foundations

Multi-Scalar Multiplication (MSM) is a fundamental computational problem. Interest in this problem was recently prompted by its application to ZK-SNARKs, where it often turns out to be the main computational bottleneck. In this paper we set forth a pipelined design for computing MSM. Our design is based on a novel algorithmic approach and hardware-specific optimizations. At the core, we rely on a modular multiplication technique which we deem to be of independent interest. We implemented...

2022/892 (PDF) Last updated: 2022-08-26

Piranha: A GPU Platform for Secure Computation

Jean-Luc Watson, Sameer Wagh, Raluca Ada Popa

Implementation

Secure multi-party computation (MPC) is an essential tool for privacy-preserving machine learning (ML). However, secure training of large-scale ML models currently requires a prohibitively long time to complete. Given that large ML inference and training tasks in the plaintext setting are significantly accelerated by Graphical Processing Units (GPUs), this raises the natural question: can secure MPC leverage GPU acceleration? A few recent works have studied this question in the context of...

2022/881 (PDF) Last updated: 2022-08-16

A Novel High-performance Implementation of CRYSTALS-Kyber with AI Accelerator

Lipeng Wan, Fangyu Zheng, Guang Fan, Rong Wei, Lili Gao, Jiankuo Dong, Jingqiang Lin, Yuewu Wang

Implementation

Public-key cryptography, including conventional cryptosystems and post-quantum cryptography, involves computation-intensive workloads. Noting the extraordinary computing power of AI accelerators, in this paper, we further explore the feasibility of introducing AI accelerators into high-performance cryptographic computing. Since AI accelerators are dedicated to machine learning or neural networks, the biggest challenge is how to transform cryptographic workloads into their operations,...

2022/633 (PDF) Last updated: 2022-05-23

CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes

Shiyu Shen, Hao Yang, Yu Liu, Zhe Liu, Yunlei Zhao

Implementation

Homomorphic encryption (HE), which allows computation over encrypted data, has often been used to preserve privacy. However, the computationally heavy nature and complexity of network topologies make the deployment of HE schemes in the Internet of Things (IoT) scenario difficult. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV and CKKS, targeting the acceleration of homomorphic multiplication using GPUs in heterogeneous IoT systems. We offer...

2022/482 (PDF) Last updated: 2022-04-23

cuFE: High Performance Privacy Preserving Support Vector Machine with Inner-Product Functional Encryption

KyungHyun Han, Wai-Kong Lee, Angshuman Karmakar, Jose Maria Bermudo Mera, Seong Oun Hwang

Public-key cryptography

Privacy preservation is a sensitive issue in our modern society. It is becoming increasingly important in many applications in this ever-growing and highly connected digital era. Functional encryption is a computation on encrypted data paradigm that allows users to retrieve the evaluation of a function on encrypted data without revealing the data, thus effectively protecting users' privacy. However, existing functional encryption implementations are still very time-consuming for practical...

2022/183 (PDF) Last updated: 2024-09-28

Improving Differential-Neural Cryptanalysis

Liu Zhang, Zilong Wang, Baocang Wang

Attacks and cryptanalysis

In CRYPTO'19, Gohr introduced a novel cryptanalysis method by developing a differential-neural distinguisher using neural networks as the underlying distinguisher. He effectively integrated this distinguisher with classical differentials, facilitating a 12-round key recovery attack on Speck32/64 (from a total of 22 rounds). Bao et al. refined the concept of neutral bits, enabling key recovery attacks up to 13 rounds for Speck32/64 and 16 rounds (from a total of 32) for Simon32/64. Our...

2021/1389 (PDF) Last updated: 2022-06-13

DPCrypto: Acceleration of Post-quantum Cryptographic Algorithms using Dot-Product Instruction on GPUs

Wai-Kong Lee, Hwajeong Seo, Seong Oun Hwang, Angshuman Karmakar, Jose Maria Bermudo Mera, Ramachandra Achar

Implementation

Dot-product is a widely used operation in many machine learning and scientific computing algorithms. Recently, NVIDIA has introduced dot-product instructions (DP2A and DP4A) in modern GPU architectures, with the aim of accelerating machine learning and scientific computing applications. These dot-product instructions allow the computation of multiply-and-add instructions in a clock cycle, effectively achieving higher throughput compared to conventional 32-bit integer units. In this paper,...

2021/1155 (PDF) Last updated: 2022-05-05

GPS: Integration of Graphene, PALISADE, and SGX for Large-scale Aggregations of Distributed Data

Jonathan Takeshita, Colin McKechney, Justin Pajak, Antonis Papadimitriou, Ryan Karl, Taeho Jung

Implementation

Secure computing methods such as fully homomorphic encryption and hardware solutions such as Intel Software Guard Extension (SGX) have been applied to provide security for user input in privacy-oriented computation outsourcing. Fully homomorphic encryption is amenable to parallelization and hardware acceleration to improve its scalability and latency, but is limited in the complexity of functions it can efficiently evaluate. SGX is capable of arbitrarily complex calculations, but due to...

2021/1100 (PDF) Last updated: 2022-10-25

REDsec: Running Encrypted Discretized Neural Networks in Seconds

Lars Folkerts, Charles Gouert, Nektarios Georgios Tsoutsos

Applications

Machine learning as a service (MLaaS) has risen to become a prominent technology due to the large development time, amount of data, hardware costs, and level of expertise required to develop a machine learning model. However, privacy concerns prevent the adoption of MLaaS for applications with sensitive data. A promising privacy preserving solution is to use fully homomorphic encryption (FHE) to perform the ML computations. Recent advancements have lowered computational costs by several...

2021/1024 (PDF) Last updated: 2021-08-30

Efficient Implementation of Lightweight Hash Functions on GPU and Quantum Computers for IoT Applications

Wai-Kong Lee, Kyungbae Jang, Gyeongju Song, Hyunji Kim, Seong Oun Hwang, Hwajeong Seo

Implementation

Secure communication is an important aspect of Internet of Things (IoT) applications in order to avoid cyber-security attacks and privacy issues. One of the key security aspects is data integrity, which can be protected by employing cryptographic hash functions. Recently, the US National Institute of Standards and Technology (NIST) initiated a competition to standardize lightweight hash functions targeting constrained devices, which can be used in IoT applications. The communication in IoT...

2021/823 (PDF) Last updated: 2022-06-22

GPU-accelerated PIR with Client-Independent Preprocessing for Large-Scale Applications

Daniel Günther, Maurice Heymann, Benny Pinkas, Thomas Schneider

Cryptographic protocols

Multi-Server Private Information Retrieval (PIR) is a cryptographic protocol that allows a client to securely query a database entry from $n \geq 2$ servers of which less than $t$ can collude, s.t. the servers learn no information about the query. Highly efficient PIR could be used for large-scale applications like Compromised Credential Checking (C3) (USENIX Security'19), which allows users to check whether their credentials have been leaked in a data breach. However, state-of-the-art PIR...

2021/720 (PDF) Last updated: 2021-05-31

On Reverse Engineering Neural Network Implementation on GPU

Łukasz Chmielewski, Léo Weissbart

Applications

In recent years machine learning has become increasingly mainstream across industries. Additionally, Graphical Processing Unit (GPU) accelerators are widely deployed in various neural network (NN) applications, including image recognition for autonomous vehicles and natural language processing, among others. Since training a powerful network requires expensive data collection and computing power, its design and parameters are often considered a secret intellectual property of their...

2021/508 (PDF) Last updated: 2021-04-23

Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs

Wonkyung Jung, Sangpyo Kim, Jung Ho Ahn, Jung Hee Cheon, Younho Lee

Implementation

Fully homomorphic encryption (FHE) has been gaining popularity as an emerging way of enabling an unlimited number of operations on the encrypted message without decryption. A major drawback of FHE is its high computational cost. In particular, bootstrapping, which refreshes the noise accumulated through consecutive FHE operations on the ciphertext, can take minutes. This significantly limits the practical use of FHE in numerous real applications. By exploiting massive parallelism available...

2021/173 (PDF) Last updated: 2022-02-17

TensorCrypto

Wai-Kong Lee, Hwajeong Seo, Zhenfei Zhang, Seongoun Hwang

Implementation

Tensor core is a specially designed hardware included in new NVIDIA GPU chips, aimed at accelerating deep learning applications. With the introduction of tensor core, the matrix multiplication at low precision can be computed much faster than using conventional integer and floating point units in NVIDIA GPU. In the past, applications of tensor core were mainly restricted to machine learning and mixed precision scientific computing. In this paper, we show that for the first time, tensor core...

2021/141 (PDF) Last updated: 2021-02-10

Advanced Lattice Sieving on GPUs, with Tensor Cores

Léo Ducas, Marc Stevens, Wessel van Woerden

Public-key cryptography

In this work, we study GPU implementations of various state-of-the-art sieving algorithms for lattices (Becker-Gama-Joux 2015, Becker-Ducas-Gama-Laarhoven 2016, Herold-Kirshanova 2017) inside the General Sieve Kernel (G6K, Albrecht et al. 2019). In particular, we extensively exploit the recently introduced *Tensor Cores* -- originally designed for raytracing and machine learning -- and demonstrate their fitness for the cryptanalytic task at hand. We also propose a new *dual-hash* technique...

2021/124 (PDF) Last updated: 2021-02-05

Efficient Number Theoretic Transform Implementation on GPU for Homomorphic Encryption

Ozgun Ozerk, Can Elgezen, Ahmet Can Mert, Erdinc Ozturk, Erkay Savas

Implementation

Lattice-based cryptography forms the mathematical basis for homomorphic encryption, which allows computation directly on encrypted data. Homomorphic encryption enables privacy-preserving applications such as secure cloud computing; yet, its practical applications suffer from the high computational complexity of homomorphic operations. Fast implementations of the homomorphic encryption schemes heavily depend on efficient polynomial arithmetic; multiplication of very large degree polynomials...

2020/1223 (PDF) Last updated: 2021-05-17

Algorithmic Acceleration of B/FV-like Somewhat Homomorphic Encryption for Compute-Enabled RAM

Jonathan Takeshita, Dayane Reis, Ting Gong, Michael Niemier, X. Sharon Hu, Taeho Jung

Implementation

Somewhat Homomorphic Encryption (SHE) allows arbitrary computation with finite multiplicative depths to be performed on encrypted data, but its overhead is high due to memory transfer incurred by large ciphertexts. Recent research has recognized the shortcomings of general-purpose computing for high-performance SHE, and has begun to pioneer the use of hardware-based SHE acceleration with hardware including FPGAs, GPUs, and Compute-Enabled RAM (CE-RAM). CE-RAM is well-suited for SHE, as it is...

2019/1216 (PDF) Last updated: 2020-08-10

Automated Search for Block Cipher Differentials: A GPU-Accelerated Branch-and-Bound Algorithm

Wei-Zhu Yeoh, Je Sen Teh, Jiageng Chen

Secret-key cryptography

Differential cryptanalysis of block ciphers requires the identification of differential characteristics with high probability. For block ciphers with large block sizes and number of rounds, identifying these characteristics is computationally intensive. The branch-and-bound algorithm was proposed by Matsui to automate this task. Since then, numerous improvements were made to the branch-and-bound algorithm by bounding the number of active s-boxes, incorporating a meet-in-the-middle approach,...

2016/471 (PDF) Last updated: 2016-07-14

NTRU Modular Lattice Signature Scheme on CUDA GPUs

Wei Dai, John Schanck, Berk Sunar, William Whyte, Zhenfei Zhang

Public-key cryptography

In this work we show how to use Graphics Processing Units (GPUs) with Compute Unified Device Architecture (CUDA) to accelerate a lattice based signature scheme, namely, the NTRU modular lattice signature (NTRU-MLS) scheme. Lattice based schemes require operations on large vectors that are perfect candidates for GPU implementations. In addition, similar to most lattice based signature schemes, NTRU-MLS provides transcript security with a rejection sampling technique. With a GPU...

2016/445 (PDF) Last updated: 2017-11-20

SecureMed: Secure Medical Computation using GPU-Accelerated Homomorphic Encryption Scheme

Alhassan Khedr, Glenn Gulak

Sharing the medical records of individuals among healthcare providers and researchers around the world can accelerate advances in medical research. While the idea seems increasingly practical due to cloud data services, maintaining patient privacy is of paramount importance. Standard encryption algorithms help protect sensitive data from outside attackers but they cannot be used to compute on this sensitive data while being encrypted. Homomorphic Encryption (HE) presents a very useful tool...

2015/967 (PDF) Last updated: 2016-02-22

Freestart collision for full SHA-1

Marc Stevens, Pierre Karpman, Thomas Peyrin

This article presents an explicit freestart colliding pair for SHA-1, i.e. a collision for its internal compression function. This is the first practical break of the full SHA-1, reaching all 80 out of 80 steps. Only 10 days of computation on a 64-GPU cluster were necessary to perform this attack, for a cost of approximately $2^{57.5}$ calls to the compression function of SHA-1. This work builds on a continuous series of cryptanalytic advancements on SHA-1 since the theoretical collision...

2015/818 (PDF) Last updated: 2015-08-18

cuHE: A Homomorphic Encryption Accelerator Library

Wei Dai, Berk Sunar

Implementation

We introduce a CUDA GPU library to accelerate evaluations with homomorphic schemes defined over polynomial rings enabled with a number of optimizations including algebraic techniques for efficient evaluation, memory minimization techniques, memory and thread scheduling and low level CUDA hand-tuned assembly optimizations to take full advantage of the mass parallelism and high memory bandwidth GPUs offer. The arithmetic functions constructed to handle very large polynomial operands using...

2015/294 (PDF) Last updated: 2015-04-01

Accelerating Somewhat Homomorphic Evaluation using FPGAs

Erdinç Öztürk, Yarkın Doröz, Berk Sunar, Erkay Savaş

Implementation

After being introduced in 2009, the first fully homomorphic encryption (FHE) scheme has created significant excitement in academia and industry. Despite rapid advances in the last 6 years, FHE schemes are still not ready for deployment due to an efficiency bottleneck. Here we introduce a custom hardware accelerator optimized for a class of reconfigurable logic to bring LTV based somewhat homomorphic encryption (SWHE) schemes one step closer to deployment in real-life applications. The...

2013/204 (PDF) Last updated: 2013-04-14

Computing Privacy-Preserving Edit Distance and Smith-Waterman Problems on the GPU Architecture

Shi Pu, Jyh-Charn Liu

Applications

This paper presents privacy-preserving, parallel computing algorithms on a graphic processing unit (GPU) architecture to solve the Edit-Distance (ED) and the Smith-Waterman (SW) problems. The ED and SW problems are formulated into dynamic programming (DP) computing problems, which are solved using the Secure Function Evaluation (SFE) to meet privacy protection requirements, based on the semi-honest security model. Major parallelization techniques include mapping of variables to support...

2013/059 (PDF) Last updated: 2013-02-06

Optimized GPU Implementation and Performance Analysis of HC Series of Stream Ciphers

Ayesha Khalid, Deblin Bagchi, Goutam Paul, Anupam Chattopadhyay

Secret-key cryptography

The ease of programming offered by the CUDA programming model attracted a lot of programmers to try the platform for acceleration of many non-graphics applications. Cryptography, being no exception, also found its share of exploration efforts, especially block ciphers. In this contribution we present a detailed walk-through of effective mapping of HC-128 and HC-256 stream ciphers on GPUs. Due to inherent inter-S-Box dependencies, intra-S-Box dependencies and a high number of memory accesses...

2011/097 (PDF) Last updated: 2011-02-28

Fastplay-A Parallelization Model and Implementation of SMC on CUDA based GPU Cluster Architecture

Shi Pu, Pu Duan, Jyh-Charn Liu

Implementation

We propose a four-tiered parallelization model for acceleration of the secure multiparty computation (SMC) on the CUDA based Graphic Processing Unit (GPU) cluster architecture. Specification layer is the top layer, which adopts the SFDL of Fairplay for specification of secure computations. The SHDL file generated by the SFDL compiler of Fairplay is used as inputs to the function layer, for which we developed both multi-core and GPU based control functions for garbling of various types of...

59 results sorted by ID