This repository contains the implementation of the following paper:
Collaborative Diffusion for Multi-Modal Face Generation and Editing
Ziqi Huang, Kelvin C.K. Chan, Yuming Jiang, Ziwei Liu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
From MMLab@NTU affiliated with S-Lab, Nanyang Technological University
[Paper] | [Project Page] | [Video]
We propose Collaborative Diffusion, where users can use multiple modalities to control face generation and editing. (a) Face Generation. Given multi-modal controls, our framework synthesizes high-quality images consistent with the input conditions. (b) Face Editing. Collaborative Diffusion also supports multi-modal editing of real images with promising identity preservation capability.
We use pre-trained uni-modal diffusion models to perform multi-modal guided face generation and editing. At each step of the reverse process (i.e., from timestep t to t − 1), the dynamic diffuser predicts the spatially varying and temporally varying influence functions to selectively enhance or suppress the contributions of the given modality.
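To make the collaboration mechanism concrete, here is a minimal sketch (not the code in this repo) of how two uni-modal noise predictions could be fused with influence functions at one reverse step. The names `text_model`, `mask_model`, and `dynamic_diffuser`, and the `modality` argument, are illustrative placeholders.

```python
import torch

def collaborative_step(z_t, t, text_cond, mask_cond,
                       text_model, mask_model, dynamic_diffuser):
    """Hypothetical sketch of one reverse step (t -> t-1).

    Each pre-trained uni-modal model predicts the noise for its own
    modality; the dynamic diffuser predicts a spatially and temporally
    varying influence map per modality, and the maps are normalized
    across modalities before the predictions are fused.
    """
    eps_text = text_model(z_t, t, text_cond)   # noise prediction, text branch
    eps_mask = mask_model(z_t, t, mask_cond)   # noise prediction, mask branch

    # Influence maps, one per modality, same spatial size as the latent.
    infl_text = dynamic_diffuser(z_t, t, text_cond, modality="text")
    infl_mask = dynamic_diffuser(z_t, t, mask_cond, modality="mask")

    # Normalize so the per-pixel influences sum to one across modalities.
    weights = torch.softmax(torch.stack([infl_text, infl_mask]), dim=0)

    # Weighted fusion of the uni-modal predictions.
    eps_fused = weights[0] * eps_text + weights[1] * eps_mask
    return eps_fused
```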
- [06/2023] We provide the preprocessed multi-modal annotations here.
- [05/2023] Training code for Collaborative Diffusion (512x512) released.
- [04/2023] Project page and video available.
- [04/2023] arXiv paper available.
- [04/2023] Checkpoints for multi-modal face generation (512x512) released.
- [04/2023] Inference code for multi-modal face generation (512x512) released.
1. Clone the repo.

   ```
   git clone https://github.com/ziqihuangg/Collaborative-Diffusion
   cd Collaborative-Diffusion
   ```

2. Create the conda environment.

   If you already have an `ldm` environment installed according to LDM, you do not need to go through this step (i.e., step 2). You can simply `conda activate ldm` and jump to step 3.

   ```
   conda env create -f environment.yaml
   conda activate codiff
   ```

3. Install dependencies.

   ```
   pip install transformers==4.19.2 scann kornia==0.6.4 torchmetrics==0.6.0
   conda install -c anaconda git
   pip install git+https://github.com/arogozhnikov/einops.git
   ```

4. Download the pre-trained models from here.

5. Put the models under `pretrained` as follows:

   ```
   Collaborative-Diffusion
   └── pretrained
       ├── 512_codiff_mask_text.ckpt
       ├── 512_mask.ckpt
       ├── 512_text.ckpt
       └── 512_vae.ckpt
   ```
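As an optional sanity check (not part of the official instructions), you can verify that the downloaded checkpoints load correctly. A minimal sketch, assuming the `.ckpt` files are standard PyTorch/Lightning checkpoints with a `state_dict` entry:

```python
import torch

# Directory layout as shown above.
ckpt_paths = [
    "pretrained/512_codiff_mask_text.ckpt",
    "pretrained/512_mask.ckpt",
    "pretrained/512_text.ckpt",
    "pretrained/512_vae.ckpt",
]

for path in ckpt_paths:
    # map_location="cpu" avoids allocating GPU memory just for this check.
    ckpt = torch.load(path, map_location="cpu")
    # Lightning-style checkpoints usually keep the weights under "state_dict".
    state = ckpt["state_dict"] if isinstance(ckpt, dict) and "state_dict" in ckpt else ckpt
    print(f"{path}: loaded, {len(state)} entries")
```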
We provide the preprocessed data used in this project (see Acknowledgement for the data source). You only need to download them if you want to reproduce the training of Collaborative Diffusion. You can skip this step if you simply want to use our pre-trained models for inference.

1. Download the preprocessed training data from here.

2. Put the datasets under `dataset` as follows:

   ```
   Collaborative-Diffusion
   └── dataset
       ├── image
       |   └── image_512_downsampled_from_hq_1024
       ├── text
       |   └── captions_hq_beard_and_age_2022-08-19.json
       ├── mask
       |   └── CelebAMask-HQ-mask-color-palette_32_nearest_downsampled_from_hq_512_one_hot_2d_tensor
       └── sketch
           └── sketch_1x1024_tensor
   ```
For more details about the annotations, please refer to CelebA-Dialog.
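If you want to inspect the annotations before training, a minimal sketch like the one below may help. The caption JSON path follows the layout above; the mask filename (`0.pt`) and the assumption that masks are saved with `torch.save` are hypothetical, so adapt them to what the download actually contains.

```python
import json
import torch

# Text captions: a JSON file of per-image captions (path as in the layout above).
with open("dataset/text/captions_hq_beard_and_age_2022-08-19.json") as f:
    captions = json.load(f)
print(f"{len(captions)} caption entries")

# Masks: assumed to be per-image one-hot tensors under the directory shown above.
# The filename and serialization format are assumptions for illustration only.
mask = torch.load(
    "dataset/mask/CelebAMask-HQ-mask-color-palette_32_nearest_downsampled_from_hq_512_one_hot_2d_tensor/0.pt"
)
print("mask tensor shape:", tuple(mask.shape))
```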
You can control face generation using text and segmentation mask.
1. `mask_path` is the path to the segmentation mask, and `input_text` is the text condition.

   ```
   python generate_512.py \
   --mask_path test_data/512_masks/27007.png \
   --input_text "This man has beard of medium length. He is in his thirties."
   ```

   ```
   python generate_512.py \
   --mask_path test_data/512_masks/29980.png \
   --input_text "This woman is in her forties."
   ```

2. You can view different types of intermediate outputs by setting the corresponding flags to `1`. For example, to view the influence functions, set `return_influence_function` to `1`.

   ```
   python generate_512.py \
   --mask_path test_data/512_masks/27007.png \
   --input_text "This man has beard of medium length. He is in his thirties." \
   --ddim_steps 10 \
   --batch_size 1 \
   --save_z 1 \
   --return_influence_function 1 \
   --display_x_inter 1 \
   --save_mixed 1
   ```

   Note that producing intermediate results might consume a lot of GPU memory, so we suggest setting `batch_size` to `1` and `ddim_steps` to a smaller value (e.g., `10`) to save memory and computation time.
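To generate results for several mask/text pairs in one go, a simple driver script can loop over the same `generate_512.py` interface shown above. A minimal sketch; the specific mask/text pairs are just the examples from this README, and any extra flags follow the options listed above.

```python
import subprocess

# Example (mask, text) pairs; replace with your own conditions.
pairs = [
    ("test_data/512_masks/27007.png",
     "This man has beard of medium length. He is in his thirties."),
    ("test_data/512_masks/29980.png",
     "This woman is in her forties."),
]

for mask_path, input_text in pairs:
    # Calls the same CLI as above; add flags such as --ddim_steps if needed.
    subprocess.run(
        ["python", "generate_512.py",
         "--mask_path", mask_path,
         "--input_text", input_text],
        check=True,
    )
```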
We provide the entire training pipeline, including training the VAE, uni-modal diffusion models, and our proposed dynamic diffusers.
If you are only interested in training the dynamic diffusers, you can use our provided checkpoints for the VAE and the uni-modal diffusion models. Simply skip steps 1 and 2 and go directly to step 3.
1. Train the VAE.

   LDM compresses images into VAE latents to save computational cost, and the UNet diffusion models are later trained on these latents. This step reproduces `pretrained/512_vae.ckpt`.

   ```
   python main.py \
   --logdir 'outputs/512_vae' \
   --base 'configs/512_vae.yaml' \
   -t --gpus 0,1,2,3,
   ```

2. Train the uni-modal diffusion models.

   (1) Train the text-to-image model. This step reproduces `pretrained/512_text.ckpt`.

   ```
   python main.py \
   --logdir 'outputs/512_text' \
   --base 'configs/512_text.yaml' \
   -t --gpus 0,1,2,3,
   ```

   (2) Train the mask-to-image model. This step reproduces `pretrained/512_mask.ckpt`.

   ```
   python main.py \
   --logdir 'outputs/512_mask' \
   --base 'configs/512_mask.yaml' \
   -t --gpus 0,1,2,3,
   ```

3. Train the dynamic diffusers.

   The dynamic diffusers are the meta-networks that determine how the uni-modal diffusion models collaborate. This step reproduces `pretrained/512_codiff_mask_text.ckpt`.

   ```
   python main.py \
   --logdir 'outputs/512_codiff_mask_text' \
   --base 'configs/512_codiff_mask_text.yaml' \
   -t --gpus 0,1,2,3,
   ```
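After training, you may want to check which checkpoint to use before swapping it in for `pretrained/512_codiff_mask_text.ckpt`. A minimal sketch, assuming PyTorch Lightning's usual checkpoint contents (`global_step`, `state_dict`) and that checkpoints are written somewhere under the `--logdir` you passed (the exact subdirectory layout depends on the training config):

```python
import glob
import torch

# The glob pattern is a guess at where Lightning places checkpoints.
candidates = glob.glob("outputs/512_codiff_mask_text/**/*.ckpt", recursive=True)

for path in sorted(candidates):
    ckpt = torch.load(path, map_location="cpu")
    step = ckpt.get("global_step", "unknown")
    n_tensors = len(ckpt.get("state_dict", {}))
    print(f"{path}: global_step={step}, {n_tensors} tensors in state_dict")
```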
If you find our repo useful for your research, please consider citing our paper:
```
@InProceedings{huang2023collaborative,
    author    = {Huang, Ziqi and Chan, Kelvin C.K. and Jiang, Yuming and Liu, Ziwei},
    title     = {Collaborative Diffusion for Multi-Modal Face Generation and Editing},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year      = {2023},
}
```

The codebase is maintained by Ziqi Huang.
This project is built on top of LDM. We trained on data provided by CelebA-HQ, CelebA-Dialog, CelebAMask-HQ, and MM-CelebA-HQ-Dataset.