Our paper "Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation" [Link][arXiv] has been accepted to the International Journal of Computer Vision (IJCV), February 2026.
Text-to-image diffusion models, such as Stable Diffusion, generate highly realistic images from text descriptions. However, the generation of certain content at such high quality raises concerns. A prominent issue is the accurate depiction of identifiable facial images, which could lead to malicious deepfake generation and privacy violations. In this paper, we propose Anonymization Prompt Learning (APL) to address this problem. Specifically, we train a learnable prompt prefix for text-to-image diffusion models, which forces the model to generate anonymized facial identities, even when prompted to produce images of specific individuals. Extensive quantitative and qualitative experiments demonstrate the successful anonymization performance of APL, which anonymizes specific individuals without compromising the quality of non-identity-specific image generation. Furthermore, we reveal the plug-and-play property of the learned prompt prefix, enabling its effective application across different pretrained text-to-image models for transferable privacy and security protection against the risks of deepfakes.
The `EmbeddingManager` class (in `ldm/modules/embedding_manager.py`) manages learnable prompt embeddings. It:
- Initializes placeholder token embeddings (default: 'alz')
- Supports multi-vector tokens (default: 10 vectors per token)
- Handles embedding save/load operations
- Integrates with CLIP text encoders
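To make the multi-vector behavior concrete, here is a minimal conceptual sketch (hypothetical names and shapes, not the repository's actual API; the real class also handles tokenizer integration, sequence-length bookkeeping, and checkpointing):

```python
# Conceptual sketch only: the single placeholder token is expanded into
# several learnable embedding vectors before the text encoder sees them.
import torch

embed_dim = 768      # CLIP ViT-L/14 text embedding size used by SD v1.5
num_vectors = 10     # default: 10 learnable vectors per placeholder token

# The learnable prompt prefix: optimized during training, frozen afterwards.
prefix = torch.nn.Parameter(torch.randn(num_vectors, embed_dim) * 0.02)

def expand_placeholder(token_embeddings: torch.Tensor, pos: int) -> torch.Tensor:
    """Replace the single placeholder embedding at `pos` with the
    `num_vectors` learned prefix vectors."""
    return torch.cat([token_embeddings[:pos], prefix,
                      token_embeddings[pos + 1:]], dim=0)

# Example: a 77-token CLIP sequence with the placeholder at position 1.
seq = torch.randn(77, embed_dim)
print(expand_placeholder(seq, pos=1).shape)  # torch.Size([86, 768])
```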
The training alternates between:
- Identity anonymization: transforming prompts that name specific people into anonymized face attribute descriptions. The mapping from person names to anonymized attributes is defined in `people.py` (see the dataset preparation section).
- General image preservation: maintaining generation quality for non-identity-specific prompts.
The loss function encourages the learned prefix to push identity-specific prompts toward anonymized representations while preserving general image quality.
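One plausible formulation of these two objectives is sketched below (hypothetical names, not the repository's exact loss; `unet(x_t, t, cond)` stands for the frozen Stable Diffusion noise predictor, and only the prefix embeddings receive gradients):

```python
# Illustrative sketch of the alternating objective; all names are assumptions.
import torch
import torch.nn.functional as F

def identity_anonymization_loss(unet, x_t, t, cond_prefix_name, cond_attrs):
    # Teacher: noise prediction conditioned on the anonymized attribute
    # description (no person name), computed without gradients.
    with torch.no_grad():
        target = unet(x_t, t, cond_attrs)
    # Student: prediction conditioned on "<prefix> ... <person name>".
    pred = unet(x_t, t, cond_prefix_name)
    return F.mse_loss(pred, target)

def preservation_loss(unet, x_t, t, noise, cond_prefix_generic):
    # Standard denoising loss on generic text-image pairs, so the prefix
    # does not degrade non-identity-specific generation.
    pred = unet(x_t, t, cond_prefix_generic)
    return F.mse_loss(pred, noise)
```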
- Clone this repository:

```bash
git clone https://github.com/shiliang26/APL
cd APL-main
```

- Install the required Python packages:

```bash
pip install -r requirements.txt
```

- Download the Stable Diffusion v1.5 checkpoint, then either:
  - place the checkpoint file at `./models/Stable-diffusion/v1-5-pruned-emaonly.ckpt`, or
  - update the `--ckpt_path` argument in the training scripts.
- Prepare datasets. For training, you will need:
  - Face identity dataset (organized in `./data/face/`).
  - Generic text-image pair dataset: you can use LAION images (in the `./data/laion/` directory) or your own generic text-image pairs. The dataset should contain image files and corresponding text prompts.
  - Prompt templates file (`./templates.txt`).
  - General prompts file (`./prompts.txt`).
  - Person attributes mapping: the `people.py` file contains a dictionary mapping person names to anonymized face attribute descriptions, used during training to transform identity-specific prompts into anonymized representations. You can modify this file to include your own person names and their corresponding anonymized attributes; an illustrative entry is sketched right after this list.
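For reference, an entry in this mapping might look like the following (illustrative format and values only; the real names and attribute strings live in `people.py`):

```python
# Illustrative format only; see people.py in the repository for the actual
# dictionary. Keys are person names as they appear in prompts; values are
# anonymized face-attribute descriptions used as training targets.
people = {
    "Person A": "a middle-aged man with short dark hair and glasses",
    "Person B": "a young woman with long brown hair and brown eyes",
}
```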
Train an anonymization prompt prefix using the main training script:
```bash
python train.py \
    --prompt <prompt_name> \
    --train_method <method> \
    --iterations <num_iterations> \
    --lr <learning_rate> \
    --config_path configs/stable-diffusion/v1-inference.yaml \
    --ckpt_path ./models/Stable-diffusion/v1-5-pruned-emaonly.ckpt \
    --devices 0,1 \
    --ddim_steps 50
```

Key arguments:
- `--prompt`: Name/identifier for the prompt being trained
- `--train_method`: Training method identifier
- `--iterations`: Number of training iterations (default: 1000)
- `--lr`: Learning rate (default: 1e-5)
- `--devices`: CUDA device IDs (e.g., "0,1" for multi-GPU)
- `--ddim_steps`: Number of DDIM sampling steps during training (default: 50)
The trained embedding will be saved in the `./embeddings/` directory.
Test the anonymization performance on face identities:
```bash
python test_id.py
```

This script evaluates the identity distance between generated images and ground-truth face embeddings using InsightFace.
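For intuition, identity distance can be computed along these lines (a sketch assuming cosine distance between ArcFace embeddings; the script's exact metric and settings may differ):

```python
# Sketch of identity-distance evaluation with InsightFace; assumes a face
# is detected in each image and that insightface + onnxruntime are installed.
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name='buffalo_l')
app.prepare(ctx_id=0, det_size=(640, 640))

def identity_distance(path_a: str, path_b: str) -> float:
    embs = []
    for path in (path_a, path_b):
        faces = app.get(cv2.imread(path))  # BGR image, as OpenCV loads it
        embs.append(faces[0].normed_embedding)
    # Embeddings are L2-normalized, so cosine distance = 1 - dot product.
    return float(1.0 - np.dot(embs[0], embs[1]))

print(identity_distance('generated.png', 'reference.png'))
```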
Evaluate generation quality using FID and CLIP scores:
```bash
python test_fid_clip_score.py
```

Configure the test by modifying the script variables:
- `embedding`: Path to the trained embedding file
- `benchmark`: Dataset to evaluate on (e.g., 'coco_person_prompts')
- `anonymize`: Boolean flag to enable/disable anonymization
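For reference, one way to compute these two metrics is via torchmetrics (an assumption for illustration; the repository script may use different implementations or model weights):

```python
# Illustrative FID and CLIP-score computation with torchmetrics; requires
# torchmetrics with the image/multimodal extras (torch-fidelity, transformers).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.multimodal.clip_score import CLIPScore

fid = FrechetInceptionDistance(feature=2048)
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")

# Images as uint8 tensors of shape (N, 3, H, W); random here as stand-ins.
real = torch.randint(0, 255, (8, 3, 256, 256), dtype=torch.uint8)
fake = torch.randint(0, 255, (8, 3, 256, 256), dtype=torch.uint8)

fid.update(real, real=True)
fid.update(fake, real=False)
print("FID:", fid.compute().item())

prompts = ["a close-up portrait of a person"] * 8
print("CLIP score:", clip_score(fake, prompts).item())
```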
To generate anonymized images with a trained embedding:
- Load the embedding in your script:

```python
from ldm.modules.embedding_manager import EmbeddingManager

embedding_manager = EmbeddingManager(...)
embedding_manager.load('./embeddings/APL-v1.1-9999.pt')
```

- Use the placeholder token 'alz' in your prompts:

```python
prompt = "alz A close-up portrait of [person name]"
```

Project structure:

```
APL-main/
├── train.py                      # Main training script
├── test_id.py                    # Identity anonymization testing
├── test_fid_clip_score.py        # FID and CLIP score evaluation
├── dataset_generation.py         # Dataset generation utilities
├── people.py                     # Person names to anonymized attributes mapping
├── args/
│   └── clip_args.py              # CLIP evaluation arguments
├── configs/
│   └── stable-diffusion/         # Stable Diffusion configuration files
├── ldm/                          # Latent Diffusion Model implementation
│   ├── models/                   # Model definitions
│   ├── modules/                  # Core modules
│   │   └── embedding_manager.py  # Embedding manager for prompt learning
│   └── data/                     # Dataset loaders
└── README.md
```
If you find our work helpful, please cite our paper:
@article{shi2026anonymization,
title={Anonymization prompt learning for facial privacy-preserving text-to-image generation},
author={Shi, Liang and Zhang, Jie and Shan, Shiguang},
journal={International Journal of Computer Vision},
volume={134},
number={4},
pages={192},
year={2026},
publisher={Springer}
}