
SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations

This repository contains the official implementation code for SCAIL (Studio-Grade Character Animation via In-Context Learning), a framework that enables high-fidelity character animation under diverse and challenging conditions, including large motion variations, stylized characters, and multi-character interactions.

Teaser

🔎 Motivation and Results

SCAIL identifies the key bottlenecks that keep character animation short of production quality: limited generalization across characters, and incoherent motion in complex scenarios (e.g., the long-standing challenge of multi-character interaction, as well as common failures in basic motions such as flipping and turning). We revisit the core components of character animation -- how to design the pose representation and how to inject the pose. Our framework resolves the tension that pose representations cannot simultaneously prevent identity leakage and preserve rich motion information, and it compels the model to perform spatiotemporal reasoning over the entire motion sequence for more natural and coherent movements. See our method, results gallery, and comparisons against other baselines on our project page.

πŸ—žοΈ Update and News

  • 2025.12.08: 🔥 We release the inference code of SCAIL on SAT.
  • 2025.12.11: 👀 We've added more interesting cases to the gallery on our project page! Check it out!
  • 2025.12.11: 💥 SCAIL is now officially open-sourced on Hugging Face and ModelScope!
  • 2025.12.14: 🥳 Thanks to friends in the community for testing the work! Although only 1.5% of SCAIL's training samples are anime data, and we did not intentionally collect any multi-character anime data, we were surprised to see that the model can already handle many complex anime characters and even support multi-character anime interactions. The release of SCAIL-Preview is intended to demonstrate the soundness of our proposed pose representation and model architecture, with clear potential for further scaling and enhancement.
  • 2025.12.16: ❤️ Huge thanks to KJ for the adaptation work: SCAIL is now available in ComfyUI-WanVideoWrapper! Meanwhile, the pose extraction & rendering pipeline has also been partly adapted to ComfyUI in ComfyUI-SCAIL-Pose, currently without multi-character tracking and multi-character facial keypoints.
  • 2025.12.17: ❤️ Thanks to VantageWithAI, a GGUF version is now available at SCAIL-Preview-GGUF!

📋 TODOs

  • SCAIL-14B-Preview Model Weights (512p, 5s) and Inference Config
  • Prompt Optimization Snippets
  • SCAIL-Official (1.3B/14B) Model Weights (Improved Stability and Clarity, Innate Long Video Generation Capability) and Inference Config

🚀 Getting Started

Checkpoints Download

| Checkpoint | Download Link | Notes |
| --- | --- | --- |
| SCAIL-Preview (14B) | 🤗 Hugging Face / 🤖 ModelScope | Supports 512P |

Use the following command to download the model weights (we have integrated both the Wan VAE and T5 modules into this checkpoint for convenience).

# Download the repository (skip automatic LFS file downloads)
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/zai-org/SCAIL-Preview
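
The flag above clones only lightweight LFS pointers, not the weight files themselves. As a sketch, two common ways to then fetch the actual files (assuming git-lfs and the huggingface_hub CLI are installed):

# Option 1: fetch the LFS-tracked weight files into the cloned repo
cd SCAIL-Preview && git lfs pull && cd ..

# Option 2: download directly with the Hugging Face CLI instead of git
huggingface-cli download zai-org/SCAIL-Preview --local-dir SCAIL-Preview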

The files should be organized as follows:

SCAIL-Preview/
├── Wan2.1_VAE.pth
├── model
│   ├── 1
│   │   └── mp_rank_00_model_states.pt
│   └── latest
└── umt5-xxl
    └── ...

Environment Setup

Please make sure your Python version is between 3.10 and 3.12 (inclusive).

pip install -r requirements.txt
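
A minimal sketch of an isolated setup using the standard library's venv (any environment manager works; the Python version and directory name below are illustrative):

# Create and activate a virtual environment with a supported Python version
python3.11 -m venv .venv
source .venv/bin/activate

# Install the project dependencies
pip install -r requirements.txt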

🦾 Usage

Input preparation

The input data should be organized as follows; we have provided some example data in examples/:

examples/
├── 001
│   ├── driving.mp4
│   └── ref.jpg
├── 002
│   ├── driving.mp4
│   └── ref.jpg
└── ...
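
To add your own data, create a new example directory matching this layout (the source paths below are placeholders):

# Each example needs one driving video and one reference image with these names
mkdir -p examples/003
cp /path/to/your_motion.mp4 examples/003/driving.mp4
cp /path/to/your_character.jpg examples/003/ref.jpg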

Pose Extraction & Rendering

We provide our pose extraction and rendering code in a separate repo, SCAIL-Pose, which can be used to extract poses from the driving video and render them. We recommend using a separate environment for pose extraction due to dependency issues. Clone that repo into a SCAIL-Pose folder and follow the instructions in it.
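
A rough sketch of that setup (the repository URL, Python version, and dependency file are assumptions; defer to the SCAIL-Pose README for the exact steps):

# Clone the pose repo into a SCAIL-Pose folder (URL assumed from the org name)
git clone https://github.com/zai-org/SCAIL-Pose SCAIL-Pose

# Use a separate virtual environment to avoid dependency conflicts
python3.10 -m venv .venv-pose
source .venv-pose/bin/activate
cd SCAIL-Pose && pip install -r requirements.txt

After pose extraction and rendering, the input data should be organized as follows: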

examples/
├── 001
│   ├── driving.mp4
│   ├── ref.jpg
│   └── rendered.mp4 (or rendered_aligned.mp4)
└── 002
    └── ...

Model Inference

Run the following command to start inference:

bash scripts/sample_sgl_1Bsc_xc_cli.sh

The CLI will ask you for input in the format <prompt>@@<example_dir>, e.g. the girl is dancing@@examples/001. The example_dir should contain rendered.mp4 or rendered_aligned.mp4 produced by pose extraction and rendering. Results will be saved to samples/.
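
If the CLI reads this input from standard input (an assumption about the script's interface; it may instead prompt interactively only), a run could be scripted like so:

# Hypothetical non-interactive invocation; the @@ format is from the CLI usage above
echo "the girl is dancing@@examples/001" | bash scripts/sample_sgl_1Bsc_xc_cli.sh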

Note that our model is trained with long, detailed prompts; a short or even empty prompt can be used, but the results may not be as good as with a long prompt. We will provide our prompt-generation snippets, which use Google Gemini to read the reference image and the driving motion and produce a detailed prompt such as: A woman with curly hair is joyfully dancing along a rocky shoreline, wearing a sleek blue two-piece outfit. She performs various dance moves, including twirling, raising her hands, and embracing the lively seaside atmosphere, her tattoos and confident demeanor adding to her dynamic presence.

You can further adjust sampling configurations such as resolution in the YAML files under configs/sampling/, or directly modify sample_video.py for customized sampling logic.

✨ Acknowledgements

Our implementation is built upon the foundation of Wan 2.1, and the overall project architecture is built using SAT. We use NLFPose for reliable pose extraction. We thank them for their remarkable contributions and released code.

📄 Citation

If you find this work useful in your research, please cite:

@article{yan2025scail,
  title={SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations},
  author={Yan, Wenhao and Ye, Sheng and Yang, Zhuoyi and Teng, Jiayan and Dong, ZhenHui and Wen, Kairui and Gu, Xiaotao and Liu, Yong-Jin and Tang, Jie},
  journal={arXiv preprint arXiv:2512.05905},
  year={2025}
}

πŸ—οΈ License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.
