Official repository for the paper
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models, CVPR 2025 (Oral).
Felix Taubner<sup>1,2</sup>, Ruihang Zhang<sup>1</sup>, Mathieu Tuli<sup>3</sup>, David B. Lindell<sup>1,2</sup>
<sup>1</sup>University of Toronto, <sup>2</sup>Vector Institute, <sup>3</sup>LG Electronics
TL;DR: CAP4D turns any number of reference images into an animatable avatar.
# 1. Clone repo
git clone https://github.com/felixtaubner/cap4d/
cd cap4d
# 2. Create conda environment for CAP4D:
conda create --name cap4d_env python=3.10
conda activate cap4d_env
# 3. Install requirements
pip install -r requirements.txt

Follow the instructions to install PyTorch3D. Make sure to install it with CUDA support. We recommend installing from source:

pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"
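To verify that PyTorch3D was installed with CUDA support, you can run a quick optional sanity check (this assumes the cap4d_env environment is active):

# Print the installed versions and whether CUDA is available
python -c "import torch, pytorch3d; print(torch.__version__, pytorch3d.__version__, torch.cuda.is_available())"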
Follow the instructions on the FLAME website to download the FLAME blendshape files. Locate flame2023_no_jaw.pkl and place it in data/assets/flame/.
Download the MMDM weights with this link, and place cap4d_mmdm_100k.ckpt in data/weights/mmdm/checkpoints/.
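For example, both downloads can be moved into place as follows (a sketch; the ~/Downloads source paths are assumptions, so adjust them to wherever your browser saved the files):

# Create the expected directories from the repository root
mkdir -p data/assets/flame data/weights/mmdm/checkpoints
# Move the downloaded files into place (adjust the source paths as needed)
mv ~/Downloads/flame2023_no_jaw.pkl data/assets/flame/
mv ~/Downloads/cap4d_mmdm_100k.ckpt data/weights/mmdm/checkpoints/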
Run the pipeline in debug settings to test the installation:

bash scripts/test_pipeline.sh

Check that a video is exported to examples/debug_output/tesla/sequence_00/renders.mp4. If it appears to show a blurry cartoon Nikola Tesla, you're all set!
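You can also confirm from the command line that the render was written:

# List the debug render; the command fails if the file is missing
ls -lh examples/debug_output/tesla/sequence_00/renders.mp4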
Run the provided scripts to generate avatars and animate them, each with a single command:

bash scripts/generate_felix.sh
bash scripts/generate_lincoln.sh
bash scripts/generate_tesla.sh

The output directories contain exported animations which you can view in real time.
Open the real-time viewer in your browser (powered by Brush). Click Load file and
upload the exported animation found in examples/output/{SUBJECT}/animation_{ID}/exported_animation.ply.
Coming soon! For now, only generations using the provided identities with precomputed FlowFace annotations are supported.
# Generate images with a single reference image
python cap4d/inference/generate_images.py --config_path configs/generation/single_ref.yaml --reference_data_path examples/input/lincoln/ --output_path examples/output/lincoln/

# Generate images with multiple reference images
python cap4d/inference/generate_images.py --config_path configs/generation/multi_ref.yaml --reference_data_path examples/input/felix/ --output_path examples/output/felix/

Note: the generation script will use all visible CUDA devices, and the more devices are available, the faster it runs. Generation takes hours and requires a lot of RAM (ideally more than 64 GB) to run smoothly.
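Since the script uses all visible CUDA devices, the standard CUDA_VISIBLE_DEVICES environment variable should control which GPUs it sees (shown here for the single-reference example):

# Restrict generation to GPUs 0 and 1
CUDA_VISIBLE_DEVICES=0,1 python cap4d/inference/generate_images.py --config_path configs/generation/single_ref.yaml --reference_data_path examples/input/lincoln/ --output_path examples/output/lincoln/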
python gaussianavatars/train.py --config_path configs/avatar/default.yaml --source_paths examples/output/{SUBJECT}/reference_images/ examples/output/{SUBJECT}/generated_images/ --model_path examples/output/{SUBJECT}/avatar/ --interval 5000
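If training fails to find its inputs, check that both source directories from the generation step exist (shown here for the lincoln subject):

# Both directories should contain images before training starts
ls examples/output/lincoln/reference_images/ examples/output/lincoln/generated_images/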
For now, only animations with precomputed FLAME annotations are supported. These animations are located in examples/input/animation/.
python gaussianavatars/animate.py --model_path examples/output/lincoln/avatar/ --target_animation_path examples/input/animation/sequence_00/fit.npz --target_cam_trajectory_path examples/input/animation/sequence_00/orbit.npz --output_path examples/output/lincoln/animation_00/ --export_ply 1 --compress_ply 0

The file passed to --target_animation_path contains the FLAME expressions and pose, while the (optional) --target_cam_trajectory_path contains the relative camera trajectory.
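To see which arrays an animation file provides, you can inspect it with NumPy (a sketch; the exact key names depend on what the FLAME tracker exports):

# Print each array name and its shape in the animation file
python -c "import numpy as np; d = np.load('examples/input/animation/sequence_00/fit.npz'); print({k: d[k].shape for k in d.files})"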
The MMDM code is based on ControlNet. The 4D Gaussian avatar code is based on GaussianAvatars. Special thanks to the authors for making their code public!
Related work:
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models
- GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians
- FlowFace: 3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
- StableDiffusion: High-Resolution Image Synthesis with Latent Diffusion Models
Awesome concurrent work:
- Pippo: High-Resolution Multi-View Humans from a Single Image
- Avat3r: Large Animatable Gaussian Reconstruction Model for High-fidelity 3D Head Avatars
@inproceedings{taubner2025cap4d,
author = {Taubner, Felix and Zhang, Ruihang and Tuli, Mathieu and Lindell, David B.},
title = {{CAP4D}: Creating Animatable {4D} Portrait Avatars with Morphable Multi-View Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
pages = {5318-5330}
}

This work was developed in collaboration with and with sponsorship from LG Electronics. We gratefully acknowledge their support and contributions throughout the course of this project.