jayeshthk/secure-llm-fhe

Privacy-preserving LLM inference with CKKS homomorphic encryption and Private Linear Layer (PLL) protection for LoRA fine-tuned models

Secure LLM Inference with Homomorphic Encryption

A Python implementation based on the paper: "Practical Secure Inference Algorithm for Fine-tuned Large Language Model Based on Fully Homomorphic Encryption" by Zhang Ruoyan, Zheng Zhongxiang, Bao Wankang (2025)

Paper: https://arxiv.org/abs/2501.01672

Overview

This implementation demonstrates the key concepts from the paper:

  1. Open-LLM + Private-LoRA Architecture: Splits computation between client (base model) and server (LoRA weights)
  2. Private Linear Layer (PLL): Protects LoRA weights against model extraction attacks using LWE-hard problem
  3. CKKS Homomorphic Encryption: Enables computation on encrypted user inputs

Key Features

  • ✅ Privacy-Preserving: User inputs are encrypted before transmission
  • ✅ Model Protection: LoRA weights protected by PLL with cryptographic guarantees
  • ✅ Practical Efficiency: Minimizes expensive ciphertext operations (1.61 s/token in the paper)
  • ✅ Easy to Understand: Simplified implementation showing core concepts

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        CLIENT SIDE                          │
├─────────────────────────────────────────────────────────────┤
│  1. Base LLM (plaintext)                                    │
│     - Open-source model (e.g., ChatGLM2-6B)                 │
│     - Runs locally, no encryption needed                    │
│                                                              │
│  2. Encrypt intermediate result                             │
│     - CKKS encryption before sending to server              │
│                                                              │
│  4. Decrypt final result                                    │
│     - Receive and decrypt server response                   │
└─────────────────────────────────────────────────────────────┘
                           ↓  ↑  (encrypted)
┌─────────────────────────────────────────────────────────────┐
│                        SERVER SIDE                          │
├─────────────────────────────────────────────────────────────┤
│  3. Private LoRA Inference                                  │
│     - LoRA matrices (A1, A2) protected by PLL               │
│     - Computation on encrypted data                         │
│     - Returns encrypted result                              │
└─────────────────────────────────────────────────────────────┘
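The round trip in the diagram can be sketched as plain Python control flow. The function names below are illustrative stand-ins, not this repo's API:

```python
import numpy as np

def client_round(token_embedding, encrypt, decrypt, server_lora):
    """One secure inference round, following steps 1-4 in the diagram."""
    hidden = token_embedding        # 1. base LLM runs locally on plaintext
    ct = encrypt(hidden)            # 2. encrypt the intermediate activation (CKKS)
    ct_out = server_lora(ct)        # 3. server applies private LoRA on ciphertext
    return decrypt(ct_out)          # 4. client decrypts the server's response

# Toy stand-ins (identity "encryption") just to exercise the control flow.
out = client_round(
    np.ones(4),
    encrypt=lambda v: v,
    decrypt=lambda v: v,
    server_lora=lambda v: v * 2.0,
)
assert np.allclose(out, 2.0)
```

The key property is that the server function only ever sees ciphertexts; in the real system the `encrypt`/`decrypt` stand-ins would be CKKS operations holding the client's secret key.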

Installation

Basic Installation

pip install numpy tenseal

Full Installation (for actual LLM integration)

pip install -r requirements.txt

Install TenSEAL from source (if needed)

# Install dependencies
pip install cmake

# Clone and build
git clone https://github.com/OpenMined/TenSEAL.git
cd TenSEAL
pip install .

Usage

Quick Start

from secure_llm_inference import SecureLLMInference
import numpy as np

# Initialize system
model_dim = 128
lora_rank = 8
system = SecureLLMInference(model_dim, lora_rank)

# User input (e.g., token embedding)
user_input = np.random.randn(1, model_dim)

# Run secure inference (encrypt → server-side LoRA on ciphertext → decrypt)
result = system.full_inference(user_input)

Run Demo

from secure_llm_inference import demo_secure_inference

# This will run a complete demonstration
demo_secure_inference()

Implementation Details

1. Private Linear Layer (PLL)

The PLL transforms a standard linear layer to protect against model extraction:

y = xA + x'(E' ⊙ P) + sA + kq mod q

Where:

  • A: Original weight matrix (LoRA matrices)
  • x': Auxiliary vector filled with 1s
  • E': Small Gaussian noise matrix
  • P: Random Bernoulli matrix (dropout-like)
  • s: Secret vector with fixed length γ
  • k: Random integer matrix
  • q: Modulus parameter

Security: Breaking PLL is as hard as solving the Learning with Errors (LWE) problem.
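A minimal numpy sketch of the transform above. The shapes and parameter choices here (q, the Gaussian width, the Bernoulli probability, and letting s span the full input dimension rather than a separate length γ) are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def pll_forward(x, A, q=2**16, sigma=3.0, p=0.5, rng=None):
    """Illustrative PLL pass: y = xA + x'(E' ⊙ P) + sA + kq mod q."""
    rng = rng or np.random.default_rng(0)
    d_in, d_out = A.shape
    x_aux = np.ones((x.shape[0], d_in))               # x': auxiliary all-ones vector
    E = rng.normal(0.0, sigma, size=(d_in, d_out))    # E': small Gaussian noise
    P = rng.binomial(1, p, size=(d_in, d_out))        # P: Bernoulli (dropout-like) mask
    s = rng.integers(0, q, size=(1, d_in))            # s: secret vector (γ = d_in here)
    k = rng.integers(0, 4, size=(x.shape[0], d_out))  # k: random integer matrix
    y = x @ A + x_aux @ (E * P) + s @ A + k * q       # masked linear layer
    return np.mod(y, q)                               # reduce modulo q

x = np.random.default_rng(1).normal(size=(1, 16))
A = np.random.default_rng(2).normal(size=(16, 8))
y = pll_forward(x, A)
assert y.shape == (1, 8)
```

The masking terms hide A from an observer of y; the client, who knows s and can remove the noise structure, recovers the useful linear output.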

2. LoRA Layer

Low-Rank Adaptation reduces trainable parameters:

h = Wx + (α/r)ABx

Where:

  • W: Frozen pre-trained weights
  • A, B: Low-rank matrices (r << d)
  • α: Scaling factor
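A toy numerical check of the formula, using the A·B ordering written above and zero-initializing the second factor so the adapted output starts equal to the frozen one (a common LoRA initialization; variable names are illustrative):

```python
import numpy as np

d, r, alpha = 64, 8, 16                 # model dim, LoRA rank (r << d), scaling
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))             # frozen pre-trained weight
A = rng.normal(size=(d, r)) * 0.01      # low-rank factor, small random init
B = np.zeros((r, d))                    # second factor zero-initialized
x = rng.normal(size=(d,))

h = W @ x + (alpha / r) * (A @ (B @ x))  # h = Wx + (α/r)ABx

# With B = 0 the LoRA term vanishes, so h equals the frozen model's output.
assert np.allclose(h, W @ x)
```

Only A and B (d·r + r·d parameters instead of d²) are trained, which is what makes keeping them private on the server cheap enough to protect with PLL.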

3. CKKS Encryption

  • Supports approximate arithmetic on encrypted real numbers
  • Enables addition and multiplication on ciphertexts
  • Implements rotation for efficient matrix operations

Performance

From the paper (ChatGLM2-6B):

  • Inference Speed: 1.61 seconds/token
  • Comparison: PUMA achieves 200s/token on LLAMA-7B
  • Efficiency Gain: ~124x faster

Security Guarantees

  1. User Input Protection:

    • All user data encrypted with CKKS before transmission
    • Server never sees plaintext inputs
  2. Model Weight Protection:

    • LoRA weights protected by PLL
    • Model extraction reduced to LWE problem (provably hard)
  3. Communication Security:

    • Only encrypted data transmitted between client/server

Limitations

This is a simplified demonstration implementation. The following are not implemented and would be required for production use:

  1. ✗ Full CKKS matrix multiplication implementation
  2. ✗ Integration with actual LLMs (ChatGLM2, LLaMA, etc.)
  3. ✗ Optimized rotation and packing schemes
  4. ✗ Network protocol for client-server communication
  5. ✗ Key management system

Paper Citation

@article{zhang2025practical,
  title={Practical Secure Inference Algorithm for Fine-tuned Large Language Model Based on Fully Homomorphic Encryption},
  author={Zhang, Ruoyan and Zheng, Zhongxiang and Bao, Wankang},
  journal={arXiv preprint arXiv:2501.01672},
  year={2025}
}

Related Papers

  • PUMA: Secure Transformer Inference (Dong et al., 2023)
  • Iron: Private Transformer Inference (He et al., 2022)
  • LoRA: Low-Rank Adaptation (Hu et al., 2021)
  • CKKS Homomorphic Encryption (Cheon et al., 2017)

Contributing

This is an educational implementation. For the official implementation, contact the paper authors:

  • Corresponding Author: Zheng Zhongxiang (zhengzx@cuc.edu.cn)
  • Affiliation: Communication University of China

License

Educational/Research purposes. Check with paper authors for commercial use.

Acknowledgments

Implementation based on concepts from:

  • Zhang et al. (2025) - Original paper
  • Microsoft Research - SEAL library
  • OpenMined - TenSEAL library
