CharacterShot: Controllable and Consistent 4D Character Animation
Junyao Gao‡, Jiaxing Li‡, Wenran Liu, Yanhong Zeng, Fei Shen, Kai Chen, Yanan Sun*, Cairong Zhao*
(‡ equal contribution, * corresponding authors)
CharacterShot supports diverse character designs and custom motion control (2D pose sequence), enabling 4D character animation in minutes and without specialized hardware.
Your star is our fuel! We're revving up the engines with it!
- [2026/2/27] 🔥 We release the training/inference code, models, and dataset of CharacterShot!!!
- [2025/8/12] 🔥 We release the paper of CharacterShot!!!
- Character4D Dataset.
- Training Code.
- Inference Code.
- 4D Optimization Code.
CharacterShot supports: 1) 2D character animation from a character image and a pose video; 2) multi-view video generation from multi-view images of a character plus pose images; 3) 4D optimization from the generated multi-view videos.
git clone git@github.com:Jeoyal/CharacterShot.git
cd ./CharacterShot
This setup has been tested with CUDA 12.4.
conda create -n charactershot python=3.10
conda activate charactershot
pip install -r requirements.txt
cd submodules
pip install -e ./simple-knn
pip install -e ./depth-diff-gaussian-rasterization
cd ..
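After installation, a quick sanity check (our own suggestion, not part of the official instructions) helps confirm the CUDA toolkit is visible, since the two submodules above build CUDA extensions against it:

```shell
# Optional sanity check: confirm the CUDA toolkit is on PATH.
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | tail -n 1
else
  echo "nvcc not found; CUDA toolkit may not be on PATH"
fi
```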
- Download the checkpoints for 2D character animation and multi-view generation from here and here.
- Download the DWPose pretrained models:
mkdir -p inference/dwpose/models/
wget https://huggingface.co/yzd-v/DWPose/resolve/main/yolox_l.onnx?download=true -O inference/dwpose/models/yolox_l.onnx
wget https://huggingface.co/yzd-v/DWPose/resolve/main/dw-ll_ucoco_384.onnx?download=true -O inference/dwpose/models/dw-ll_ucoco_384.onnx
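As a hedge against partial downloads, a small check (ours, not part of the repo) verifies that both ONNX files exist and are non-empty:

```shell
# Report whether each DWPose model file downloaded completely (non-empty).
for f in inference/dwpose/models/yolox_l.onnx \
         inference/dwpose/models/dw-ll_ucoco_384.onnx; do
  if [ -s "$f" ]; then
    echo "OK       $f"
  else
    echo "MISSING  $f"
  fi
done
```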
Construct your inference samples in the following structure:
├── inference/
│   └── examples/
│       ├── 4d/
│       │   ├── images/
│       │   │   └── 001/        # character images in 21 views.
│       │   │       ├── view0.png
│       │   │       └── ...
│       │   └── poses/
│       │       └── 001/        # pose images.
│       │           ├── 0.png
│       │           └── ...
│       └── 2d/
│           ├── images/
│           │   ├── 001.png
│           │   └── ...         # character images.
│           └── poses/
│               └── 001/        # pose images.
│                   ├── 0.png
│                   └── ...
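The layout above can be scaffolded in one command (sample ID 001 follows the example tree; substitute your own IDs and drop your images and poses into the leaf folders):

```shell
# Create the expected inference directory skeleton.
mkdir -p inference/examples/4d/images/001 \
         inference/examples/4d/poses/001 \
         inference/examples/2d/images \
         inference/examples/2d/poses/001
```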
For 2D character animation:
python -m inference.cli_demo_4d --image_path inference/examples/2d/images/ --func_type 2dpretrain --model_path Gaojunyao/Character2D/
For multi-view video generation:
python -m inference.cli_demo_4d --image_path inference/examples/4d/images/ --func_type 4dfinetune --model_path Gaojunyao/CharacterShot/
Navigate into ./finetune and download the checkpoints of CogVideoX-5b-I2V.
For 2D character animation pretraining, you should prepare your own dataset into ./data/i2v/2dpretrain and start training with:
bash train_2d_pretrain.sh
After that, to fine-tune the model for multi-view video generation, download our proposed 4D dataset Character4D and follow the steps below to prepare the cached input latents:
python prepare_multiview_cache.py
python convert2meta.py
Then start training with:
bash train_4d_finetune.sh
Please set --pose_model_path in train_4d_finetune.sh to the checkpoint from the 2D pretraining stage, or continue training from Gaojunyao/Character2D.
After generating multi-view videos via inference, first prepare the data, then run optimization:
cd 4D_optimization
# Step 1: Prepare data (split the inference mp4 into per-view frames, copy camera templates)
# Edit prepare_optimization_data.sh to set INFERENCE_VIDEO and MULTIVIEW_VIDEO_FOLDER paths
bash prepare_optimization_data.sh
# Step 2: Train
# Edit train.sh to set MULTIVIEW_VIDEO_FOLDER path
bash train.sh
# Step 3: Render
# Edit render.sh to set 4DGS_MODEL_PATH to the training output path
bash render.sh
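To avoid re-running the stages by hand, the three steps can be chained in a small driver script (a sketch of ours; `run_all.sh` is a hypothetical name, and it assumes the three scripts have been edited as noted above):

```shell
# Hypothetical run_all.sh: chain the three 4D-optimization steps,
# stopping at the first failure.
cat > run_all.sh <<'EOF'
#!/usr/bin/env bash
set -e                              # abort on the first failing step
bash prepare_optimization_data.sh   # step 1: split frames, copy cameras
bash train.sh                       # step 2: optimize the 4D representation
bash render.sh                      # step 3: render from the trained model
EOF
chmod +x run_all.sh
```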
We construct a large-scale 4D character dataset by filtering high-quality characters from VRoid Hub, collecting a total of 13,115 characters in OBJ format. We then retarget 40 diverse motions (e.g., dancing, singing, and jumping), built on Mixamo skeletons, and bind them to these characters. Next, we render all characters from 21 viewpoints, both in the A-pose and under the various motions. Finally, we release the raw and rigged OBJ files, along with the rendered images and pose visualizations, at this link.
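For a rough sense of scale: if every character were paired with every motion and rendered from all 21 viewpoints (an upper-bound assumption on our part; the exact character-motion pairing is not stated here), the render count would be:

```shell
# 13,115 characters x 40 motions x 21 viewpoints (upper-bound estimate).
echo $((13115 * 40 * 21))   # prints 11016600
```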
All assets and code are under the license unless specified otherwise.
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{gao2025charactershot,
title={CharacterShot: Controllable and Consistent 4D Character Animation},
author={Gao, Junyao and Li, Jiaxing and Liu, Wenran and Zeng, Yanhong and Shen, Fei and Chen, Kai and Sun, Yanan and Zhao, Cairong},
journal={arXiv preprint arXiv:2508.07409},
year={2025}
}
The code is built upon CogVideo, WideRange4D, and 4DGaussians.