SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations
This repository contains the official implementation code for SCAIL (Studio-Grade Character Animation via In-Context Learning), a framework that enables high-fidelity character animation under diverse and challenging conditions, including large motion variations, stylized characters, and multi-character interactions.
SCAIL identifies the key bottlenecks that keep character animation from reaching production level: limited generalization across characters and incoherent motion under complex scenarios (e.g., the long-standing challenge of multi-character interactions, as well as common failures in basic motions like flipping and turning). We revisit the core components of character animation: how to design the pose representation and how to inject the pose into the model. Our framework resolves the tension whereby pose representations cannot simultaneously prevent identity leakage and preserve rich motion information, and compels the model to perform spatiotemporal reasoning over the entire motion sequence for more natural and coherent movements. Check out our method, results gallery, and comparisons against other baselines on our project page.
- 2025.12.08: We release the inference code of SCAIL on SAT.
- 2025.12.11: We've added more interesting cases to our gallery on the project page! Check it out!
- 2025.12.11: SCAIL is now officially open-sourced on Hugging Face and ModelScope!
- 2025.12.14: Thanks to friends in the community for testing the work! Even though only 1.5% of SCAIL's training samples are anime data, and we did not intentionally collect any multi-character anime data, we were pleasantly surprised to see that the model can already handle many complex anime characters and even support multi-character anime interactions. The release of SCAIL-Preview is intended to demonstrate the soundness of our proposed pose representation and model architecture, with clear potential for further scaling and enhancement.
- 2025.12.16: Huge thanks to KJ for the adaptation work: SCAIL is now available in ComfyUI-WanVideoWrapper! Meanwhile, the pose extraction & rendering pipeline has also been partly adapted to ComfyUI in ComfyUI-SCAIL-Pose, currently without multi-character tracking and multi-character facial keypoints.
- 2025.12.17: Thanks to VantageWithAI, a GGUF version is now available at SCAIL-Preview-GGUF!
- SCAIL-14B-Preview Model Weights (512p, 5s) and Inference Config
- Prompt Optimization Snippets
- SCAIL-Official (1.3B/14B) Model Weights (Improved Stability and Clarity, Native Long-Video Generation Capability) and Inference Config
| ckpts | Download Link | Notes |
|---|---|---|
| SCAIL-Preview (14B) | Hugging Face / ModelScope | Supports 512P |
Use the following commands to download the model weights (we have integrated both the Wan VAE and T5 modules into this checkpoint for convenience).
# Download the repository (skip automatic LFS file downloads)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/zai-org/SCAIL-Preview

The files should be organized like:
SCAIL-Preview/
├── Wan2.1_VAE.pth
├── model
│   ├── 1
│   │   └── mp_rank_00_model_states.pt
│   └── latest
├── umt5-xxl
└── ...
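Since GIT_LFS_SKIP_SMUDGE=1 clones only the LFS pointers, you still need to fetch the large weight files afterwards. A minimal sketch of two ways to do this, assuming git-lfs (and, for the second route, the huggingface_hub CLI) is installed:

# Option 1: pull the LFS-tracked weight files into the cloned repository
cd SCAIL-Preview
git lfs pull

# Option 2: download the full snapshot with the Hugging Face CLI instead
huggingface-cli download zai-org/SCAIL-Preview --local-dir SCAIL-Preview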
Please make sure your Python version is between 3.10 and 3.12 (inclusive).
pip install -r requirements.txt
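For example, a fresh environment can be set up as follows (a sketch assuming conda; any Python interpreter in the 3.10-3.12 range works):

# A sketch assuming conda is available; any Python 3.10-3.12 interpreter works
conda create -n scail python=3.10 -y
conda activate scail
pip install -r requirements.txt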
The input data should be organized as follows; we have provided some example data in examples/:
examples/
├── 001
│   ├── driving.mp4
│   └── ref.jpg
└── 002
    ├── driving.mp4
    └── ref.jpg
...
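To add your own case, create a new folder that follows the same layout (a sketch; the folder name 003 and the source paths are placeholders, while driving.mp4 and ref.jpg are the file names the pipeline expects):

# Prepare a new case folder following the layout above (source paths are placeholders)
mkdir -p examples/003
cp /path/to/your_motion_video.mp4 examples/003/driving.mp4
cp /path/to/your_character_image.jpg examples/003/ref.jpg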
We provide our pose extraction and rendering code in a separate repo, SCAIL-Pose, which can be used to extract poses from the driving video and render them. We recommend using a separate environment for pose extraction due to dependency conflicts. Clone that repo into a SCAIL-Pose folder and follow the instructions in it.
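One possible workflow is sketched below; the clone URL is a placeholder (use the SCAIL-Pose link above), and the Python version for that environment is an assumption, so defer to that repository's own instructions:

# A sketch: keep pose extraction in its own environment to avoid dependency conflicts
conda create -n scail-pose python=3.10 -y   # Python version is an assumption; check SCAIL-Pose's requirements
conda activate scail-pose
git clone <SCAIL-Pose-repo-URL> SCAIL-Pose  # placeholder URL: use the SCAIL-Pose link referenced above
cd SCAIL-Pose                               # then follow the setup steps in that repo's README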
After pose extraction and rendering, the input data should be organized as follows:
examples/
├── 001
│   ├── driving.mp4
│   ├── ref.jpg
│   └── rendered.mp4 (or rendered_aligned.mp4)
└── 002
...
Run the following command to start the inference:
bash scripts/sample_sgl_1Bsc_xc_cli.sh
The CLI will ask for input in the format <prompt>@@<example_dir>, e.g. the girl is dancing@@examples/001. The example_dir should contain rendered.mp4 or rendered_aligned.mp4 after pose extraction and rendering. Results will be saved to samples/.
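If you want to script a run, the input line can also be piped in (a sketch assuming the CLI reads the line from stdin; otherwise type it interactively when prompted):

# A sketch assuming the CLI reads its input line from stdin
echo "the girl is dancing@@examples/001" | bash scripts/sample_sgl_1Bsc_xc_cli.sh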
Note that our model is trained with long, detailed prompts. A short or even empty prompt can be used, but the results may not be as good as with a long prompt. We will provide our prompt-generation snippets, which use Google Gemini to read the reference image and the driving motion and generate a detailed prompt such as: A woman with curly hair is joyfully dancing along a rocky shoreline, wearing a sleek blue two-piece outfit. She performs various dance moves, including twirling, raising her hands, and embracing the lively seaside atmosphere, her tattoos and confident demeanor adding to her dynamic presence.
You can further adjust sampling configurations such as resolution in the yaml files under configs/sampling/, or directly modify sample_video.py for customized sampling logic.
Our implementation is built upon Wan 2.1, and the overall project architecture is built using SAT. We use NLFPose for reliable pose extraction. Thanks for their remarkable contributions and released code.
If you find this work useful in your research, please cite:
@article{yan2025scail,
title={SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations},
author={Yan, Wenhao and Ye, Sheng and Yang, Zhuoyi and Teng, Jiayan and Dong, ZhenHui and Wen, Kairui and Gu, Xiaotao and Liu, Yong-Jin and Tang, Jie},
journal={arXiv preprint arXiv:2512.05905},
year={2025}
}

This project is licensed under the Apache License 2.0. See the LICENSE file for details.