CharacterShot: Controllable and Consistent 4D Character Animation

Junyao Gao†, Jiaxing Li†, Wenran Liu, Yanhong Zeng, Fei Shen, Kai Chen, Yanan Sun*, Cairong Zhao*

(† equal contributions, * corresponding authors)

CharacterShot supports diverse character designs and custom motion control via 2D pose sequences, enabling 4D character animation in minutes and without specialized hardware.

Your star is our fuel! We're revving up the engines with it!

News

  • [2026/2/27] 🔥 We release the training/inference code, models, and dataset of CharacterShot!
  • [2025/8/12] 🔥 We release the paper of CharacterShot!

TODO List

  • Character4D Dataset.
  • Training Code.
  • Inference Code.
  • 4D Optimization Code.

Get Started

CharacterShot supports: 1) 2D character animation from a character image and a pose video; 2) multi-view video generation from multi-view images of a character and pose images; 3) 4D optimization from multi-view videos.

Clone the Repository

git clone git@github.com:Jeoyal/CharacterShot.git
cd ./CharacterShot

Environment Setup

This setup has been tested with CUDA 12.4.

conda create -n charactershot python==3.10
conda activate charactershot
pip install -r requirements.txt
cd submodules
pip install -e ./simple-knn
pip install -e ./depth-diff-gaussian-rasterization
cd ..

Downloading Checkpoints

  1. Download the checkpoints of 2D character animation and multi-view generation from here and here.

  2. Download DWPose pretrained model:

    mkdir -p inference/dwpose/models/
    wget https://huggingface.co/yzd-v/DWPose/resolve/main/yolox_l.onnx?download=true -O inference/dwpose/models/yolox_l.onnx
    wget https://huggingface.co/yzd-v/DWPose/resolve/main/dw-ll_ucoco_384.onnx?download=true -O inference/dwpose/models/dw-ll_ucoco_384.onnx
    

Preparing Inference Samples

Construct your inference samples in the following structure:

├── inference/
│   ├── examples/
│       ├── 4d/
│           ├── images/
│               ├── 001/ # character images in 21 views.
│                   ├── view0.png
│                   ├── ...
│           ├── poses/
│               ├── 001/ # pose images.
│                   ├── 0.png
│                   ├── ...
│       ├── 2d/
│           ├── images/
│               ├── 001.png
│               ├── ... # character images.
│           ├── poses/
│               ├── 001/ # pose images.
│                   ├── 0.png
│                   ├── ...

Running Inference

For 2D character animation:

python -m inference.cli_demo_4d --image_path inference/examples/2d/images/ --func_type 2dpretrain --model_path Gaojunyao/Character2D/

For multi-view video generation:

python -m inference.cli_demo_4d --image_path inference/examples/4d/images/ --func_type 4dfinetune --model_path Gaojunyao/CharacterShot/

Training

Navigate into ./finetune and download the checkpoints of CogVideoX-5b-I2V.

For 2D character animation pretraining, you should prepare your own dataset into ./data/i2v/2dpretrain and start training with:

bash train_2d_pretrain.sh

After that, to fine-tune the model for multi-view video generation, download our proposed 4D dataset, Character4D, and follow the steps below to prepare cached input latents:

python prepare_multiview_cache.py
python convert2meta.py

Then start training with:

bash train_4d_finetune.sh

Please set --pose_model_path in train_4d_finetune.sh to the checkpoint from the 2D pretraining stage, or continue training from Gaojunyao/Character2D.

4D Optimization

After generating multi-view videos via inference, first prepare the data, then run optimization:

cd 4D_optimization

# Step 1: Prepare data: split the inference mp4 into per-view frames and copy camera templates
# Edit prepare_optimization_data.sh to set INFERENCE_VIDEO and MULTIVIEW_VIDEO_FOLDER paths
bash prepare_optimization_data.sh

# Step 2: Train
# Edit train.sh to set MULTIVIEW_VIDEO_FOLDER path
bash train.sh

# Step 3: Render
# Edit render.sh to set 4DGS_MODEL_PATH to training output path
bash render.sh

Character4D Dataset

We construct a large-scale 4D character dataset by filtering high-quality characters from VRoid Hub, collecting a total of 13,115 characters in OBJ format. We then bind skeletons from Mixamo to these characters and retarget 40 diverse motions (e.g., dancing, singing, and jumping) onto them. Next, we render all characters from 21 viewpoints, both in the A-pose and under various motions. Finally, we release the raw and rigged OBJ files, along with the rendered images and pose visualizations, at this link.
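The figures above imply the dataset's rendering scale. A quick back-of-the-envelope check, assuming every character is paired with every motion and rendered from all 21 viewpoints (the text suggests this but does not state the exact pairing):

```python
characters = 13_115   # high-quality characters filtered from VRoid Hub
motions = 40          # diverse motions retargeted from Mixamo
viewpoints = 21       # rendering cameras per character

# A-pose renders: one static render per character per viewpoint.
apose_renders = characters * viewpoints

# Motion renders: one view-sequence per (character, motion, viewpoint) triple,
# under the all-pairs assumption stated above.
motion_sequences = characters * motions * viewpoints

print(f"A-pose renders:        {apose_renders:,}")
print(f"Motion view-sequences: {motion_sequences:,}")
```

Under that assumption this works out to roughly 275 thousand A-pose renders and about 11 million motion view-sequences.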

License and Citation

All assets and code are under the license unless specified otherwise.

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{gao2025charactershot,
  title={CharacterShot: Controllable and Consistent 4D Character Animation},
  author={Gao, Junyao and Li, Jiaxing and Liu, Wenran and Zeng, Yanhong and Shen, Fei and Chen, Kai and Sun, Yanan and Zhao, Cairong},
  journal={arXiv preprint arXiv:2508.07409},
  year={2025}
}

Acknowledgements

The code is built upon CogVideo, WideRange4D and 4DGaussians.

About

Official implementation of CharacterShot: Controllable and Consistent 4D Character Animation
