A Python implementation based on the paper: "Practical Secure Inference Algorithm for Fine-tuned Large Language Model Based on Fully Homomorphic Encryption" by Zhang Ruoyan, Zheng Zhongxiang, Bao Wankang (2025)
Paper: https://arxiv.org/abs/2501.01672
This implementation demonstrates the key concepts from the paper:
- Open-LLM + Private-LoRA Architecture: Splits computation between client (base model) and server (LoRA weights)
- Private Linear Layer (PLL): Protects LoRA weights against model extraction attacks using LWE-hard problem
- CKKS Homomorphic Encryption: Enables computation on encrypted user inputs
- ✅ Privacy-Preserving: User inputs are encrypted before transmission
- ✅ Model Protection: LoRA weights protected by PLL with cryptographic guarantees
- ✅ Practical Efficiency: Minimizes expensive ciphertext operations (1.61 s/token in the paper)
- ✅ Easy to Understand: Simplified implementation showing core concepts
┌─────────────────────────────────────────────────────────────┐
│ CLIENT SIDE │
├─────────────────────────────────────────────────────────────┤
│ 1. Base LLM (plaintext) │
│ - Open-source model (e.g., ChatGLM2-6B) │
│ - Runs locally, no encryption needed │
│ │
│ 2. Encrypt intermediate result │
│ - CKKS encryption before sending to server │
│ │
│ 4. Decrypt final result │
│ - Receive and decrypt server response │
└─────────────────────────────────────────────────────────────┘
↓ ↑ (encrypted)
┌─────────────────────────────────────────────────────────────┐
│ SERVER SIDE │
├─────────────────────────────────────────────────────────────┤
│ 3. Private LoRA Inference │
│ - LoRA matrices (A1, A2) protected by PLL │
│ - Computation on encrypted data │
│ - Returns encrypted result │
└─────────────────────────────────────────────────────────────┘
```bash
# Install dependencies
pip install numpy tenseal
# or install from the requirements file:
pip install -r requirements.txt
```

If TenSEAL must be built from source:

```bash
pip install cmake

# Clone and build
git clone https://github.com/OpenMined/TenSEAL.git
cd TenSEAL
pip install .
```

Basic usage:

```python
from secure_llm_inference import SecureLLMInference
import numpy as np

# Initialize system
model_dim = 128
lora_rank = 8
system = SecureLLMInference(model_dim, lora_rank)

# User input (e.g., token embedding)
user_input = np.random.randn(1, model_dim)

# Run secure inference
result = system.full_inference(user_input)
```

Or run the bundled end-to-end demonstration:

```python
from secure_llm_inference import demo_secure_inference

# This will run a complete demonstration
demo_secure_inference()
```

The PLL transforms a standard linear layer to protect against model extraction:
y = xA + x'(E' ⊙ P) + sA + kq mod q
Where:
- A: Original weight matrix (LoRA matrices)
- x': Auxiliary vector filled with 1s
- E': Small Gaussian noise matrix
- P: Random Bernoulli matrix (dropout-like)
- s: Secret vector with fixed length γ
- k: Random integer matrix
- q: Modulus parameter
Security: Breaking PLL is as hard as solving the Learning with Errors (LWE) problem.
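The masking formula above can be sketched in plain numpy. This is a minimal illustration of the algebra only (the dimensions, modulus, and noise width are illustrative choices, not parameters from the paper), showing in particular that the `kq` term vanishes modulo `q`:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, q = 16, 4, 2**16                      # dimension, rank, modulus (illustrative)

A = rng.integers(0, q, size=(d, r))          # weight matrix A to protect
x = rng.integers(0, q, size=(1, d))          # input row vector x
x_aux = np.ones((1, d), dtype=np.int64)      # auxiliary all-ones vector x'
E = np.rint(rng.normal(0, 2, size=(d, r))).astype(np.int64)  # small Gaussian noise E'
P = rng.integers(0, 2, size=(d, r))          # Bernoulli mask P (dropout-like)
s = rng.integers(0, q, size=(1, d))          # secret vector s
k = rng.integers(0, 4, size=(1, r))          # random integer matrix k

# y = xA + x'(E' ⊙ P) + sA + kq  (mod q)
y = (x @ A + x_aux @ (E * P) + s @ A + k * q) % q
```

Since `k * q ≡ 0 (mod q)`, the `kq` term only randomizes the pre-reduction representation; the published output depends on `A` only through noisy, masked combinations, which is where the LWE-style hardness argument enters.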
Low-Rank Adaptation reduces trainable parameters:
h = Wx + (α/r)ABx
Where:
- W: Frozen pre-trained weights
- A, B: Low-rank matrices (r << d)
- α: Scaling factor
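A minimal numpy sketch of the LoRA forward pass, following the document's `h = Wx + (α/r)ABx` notation (shapes and the zero-initialization of B follow the standard LoRA setup; the sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 64, 8, 16          # hidden dim, LoRA rank, scaling (illustrative)

W = rng.standard_normal((d, d))  # frozen pre-trained weight
A = rng.standard_normal((d, r))  # low-rank up-projection
B = np.zeros((r, d))             # low-rank down-projection, zero-initialized as in LoRA

x = rng.standard_normal(d)

# h = Wx + (alpha/r) * A B x  -- the full-rank update AB is never materialized
h = W @ x + (alpha / r) * (A @ (B @ x))
```

Because B starts at zero, the adapted model initially matches the frozen base model exactly; only the 2·d·r low-rank parameters are trained, which is what makes keeping them private on the server cheap.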
- Supports approximate arithmetic on encrypted real numbers
- Enables addition and multiplication on ciphertexts
- Implements rotation for efficient matrix operations
From the paper (ChatGLM2-6B):
- Inference Speed: 1.61 seconds/token
- Comparison: PUMA achieves 200s/token on LLAMA-7B
- Efficiency Gain: ~124x faster
1. User Input Protection:
   - All user data encrypted with CKKS before transmission
   - Server never sees plaintext inputs

2. Model Weight Protection:
   - LoRA weights protected by PLL
   - Model extraction reduced to the LWE problem (provably hard)

3. Communication Security:
   - Only encrypted data transmitted between client and server
This is a simplified demonstration implementation. For production use, you need:
- ✗ Full CKKS matrix multiplication implementation
- ✗ Integration with actual LLMs (ChatGLM2, LLaMA, etc.)
- ✗ Optimized rotation and packing schemes
- ✗ Network protocol for client-server communication
- ✗ Key management system
```bibtex
@article{zhang2025practical,
  title={Practical Secure Inference Algorithm for Fine-tuned Large Language Model Based on Fully Homomorphic Encryption},
  author={Zhang, Ruoyan and Zheng, Zhongxiang and Bao, Wankang},
  journal={arXiv preprint arXiv:2501.01672},
  year={2025}
}
```

- PUMA: Secure Transformer Inference (Dong et al., 2023)
- Iron: Private Transformer Inference (He et al., 2022)
- LoRA: Low-Rank Adaptation (Hu et al., 2021)
- CKKS Homomorphic Encryption (Cheon et al., 2017)
- Paper: https://arxiv.org/abs/2501.01672
- Microsoft SEAL: https://github.com/microsoft/SEAL
- TenSEAL: https://github.com/OpenMined/TenSEAL
- LoRA: https://github.com/microsoft/LoRA
This is an educational implementation. For the official implementation, contact the paper authors:
- Corresponding Author: Zheng Zhongxiang (zhengzx@cuc.edu.cn)
- Affiliation: Communication University of China
Educational/Research purposes. Check with paper authors for commercial use.
Implementation based on concepts from:
- Zhang et al. (2025) - Original paper
- Microsoft Research - SEAL library
- OpenMined - TenSEAL library