📖 Docs | SANA | SANA-1.5 | SANA-Sprint | SANA-Video | Demo | 🤗 HuggingFace | ComfyUI
SANA is an efficiency-oriented codebase for high-resolution image and video generation, providing complete training and inference pipelines. This repository contains code for SANA, SANA-1.5, SANA-Sprint, and SANA-Video. More details can be found in our 📖 documentation.
Join our Discord to engage in discussions with the community! If you have any questions, run into issues, or are interested in contributing, don't hesitate to reach out!
- (🔥 New) [2025/12/09] 🎬 LongSANA training and inference code is released. Thanks to the LongLive team. Refer to: [Train] | [Test] | [Weight]
- (🔥 New) [2025/11/24] 🪶 Blog: how Causal Linear Attention unlocks infinite context for LLMs and long video generation.
- (🔥 New) [2025/11/21] 🎬 LongSANA, a 27 FPS real-time minute-length video generation model, is released. Thanks to LongLive. [Weight]
- (🔥 New) [2025/11/9] 🎬 An introduction video shows how Block Causal Linear Attention and Causal Mix-FFN work.
- (🔥 New) [2025/11/6] 📺 SANA-Video is merged into diffusers. [How to use]
- (🔥 New) [2025/10/27] 📺 SANA-Video is released, supporting Text-to-Video and TextImage-to-Video. [README] | [Weights]
- (🔥 New) [2025/10/13] 📺 SANA-Video is coming: 1) a 5s Linear DiT video model, and 2) real-time minute-length video generation (with LongLive). [Paper] | [Page]
- (🔥 New) [2025/8/20] We release DC-AE-Lite for faster inference and a smaller memory footprint. [How to config] | [diffusers PR] | [Weight]
- (🔥 New) [2025/6/25] SANA-Sprint is accepted to ICCV'25 🎉
- (🔥 New) [2025/6/4] The SANA-Sprint ComfyUI node is released. [Example]
- (🔥 New) [2025/5/8] SANA-Sprint (one-step diffusion) diffusers training code is released. [Guidance]
Click to show all updates
- (🔥 New) [2025/5/4] SANA-1.5 (inference-time scaling) is accepted by ICML 2025. 🎉🎉🎉
- (🔥 New) [2025/3/22] 🔥 The SANA-Sprint demo is hosted on Hugging Face, try it! [Demo Link]
- (🔥 New) [2025/3/22] 🔥 SANA-1.5 is supported in ComfyUI! ComfyUI Guidance | ComfyUI Workflow (SANA-1.5 4.8B)
- (🔥 New) [2025/3/22] 🔥 SANA-Sprint code & weights are released! Training & inference code and weights (HF) are all released. [Guidance]
- (🔥 New) [2025/3/21] Sana + inference scaling is released. [Guidance]
- (🔥 New) [2025/3/16] 🔥 SANA-1.5 code & weights are released! DDP/FSDP training, TAR-file WebDataset support, multi-scale training code, and weights (HF) are all released.
- (🔥 New) [2025/3/14] SANA-Sprint is coming out! A new one/few-step generator for Sana: 0.1s per 1024px image on H100, 0.3s on an RTX 4090. Find out more details: [Page] | [Arxiv]. Code is coming very soon, along with diffusers support.
- (🔥 New) [2025/2/10] Sana + ControlNet is released. [Guidance] | [Model] | [Demo]
- (🔥 New) [2025/1/30] CAME-8bit optimizer code is released, saving more GPU memory during training. [How to config]
- (🔥 New) [2025/1/29] SANA-1.5 is out! Find out how to do efficient training & inference scaling! [Tech Report]
- (🔥 New) [2025/1/24] 4bit-Sana is released, powered by SVDQuant and the Nunchaku inference engine. Now run Sana within 8GB GPU VRAM. [Guidance] | [Demo] | [Model]
- (🔥 New) [2025/1/24] DC-AE 1.1 is released with better reconstruction quality. [Model] | [diffusers]
- (🔥 New) [2025/1/23] Sana is accepted as an Oral at ICLR 2025. 🎉🎉🎉
- (🔥 New) [2025/1/12] DC-AE tiling lets Sana-4K generate 4096x4096px images within 22GB GPU memory; with model offload and 8bit/4bit quantization, 4K Sana runs within 8GB GPU VRAM. [Guidance]
- (🔥 New) [2025/1/11] The Sana codebase license is changed to Apache 2.0.
- (🔥 New) [2025/1/10] Run Sana inference with 8bit quantization. [Guidance]
- (🔥 New) [2025/1/8] 4K-resolution Sana models are supported in Sana-ComfyUI, and a workflow is also prepared. [4K guidance]
- (🔥 New) [2025/1/8] 1.6B 4K-resolution Sana models are released: [BF16 pth] | [BF16 diffusers]. Get your 4096x4096 images within 20 seconds! Find more samples on the Sana page. Thanks to SUPIR for their wonderful work and support.
- (🔥 New) [2025/1/2] A bug in the diffusers pipeline is solved. [Solved PR]
- (🔥 New) [2025/1/2] 2K-resolution Sana models are supported in Sana-ComfyUI, and a workflow is also prepared.
- ✅ [2024/12] 1.6B 2K-resolution Sana models are released: [BF16 pth] | [BF16 diffusers]. Get your 2K images within 4 seconds! Find more samples on the Sana page. Thanks to SUPIR for their wonderful work and support.
- ✅ [2024/12] diffusers supports Sana-LoRA fine-tuning! Sana-LoRA training and convergence are super fast. [Guidance] | [diffusers docs]
- ✅ [2024/12] diffusers has Sana! All Sana models are released as diffusers safetensors, and the diffusers pipelines SanaPipeline, SanaPAGPipeline, and DPMSolverMultistepScheduler (with FlowMatching) are all supported. We prepared a Model Card for you to choose from.
- ✅ [2024/12] The 1.6B BF16 Sana model is released for stable fine-tuning.
- ✅ [2024/12] We release the ComfyUI node for Sana. [Guidance]
- ✅ [2024/11] All multilingual (Emoji & Chinese & English) SFT models are released: 1.6B-512px, 1.6B-1024px, 600M-512px, 600M-1024px. The metric performance is shown here.
- ✅ [2024/11] The Sana Replicate API is launched at Sana-API.
- ✅ [2024/11] 1.6B Sana models are released.
- ✅ [2024/11] Training, inference, and metrics code are released.
- ✅ [2024/11] Working on diffusers.
- [2024/10] Demo is released.
- [2024/10] DC-AE code and weights are released!
- [2024/10] Paper is on arXiv!
We introduce SANA, a series of efficient diffusion models for high-resolution image and video generation:
- SANA: Text-to-image generation up to 4K resolution, 20× smaller and 100× faster than Flux-12B.
- SANA-1.5: Efficient training-time and inference-time compute scaling for better quality.
- SANA-Sprint: One/few-step generation via sCM distillation, 0.1s per 1024px image on H100.
- SANA-Video / LongSANA: Efficient video generation with Block Linear Attention, paired with LongLive for real-time minute-length videos.
Key Techniques:
- Linear Attention: Replaces vanilla attention in DiT with linear attention for efficiency at high resolutions (see the sketch after this list).
- DC-AE: 32× image compression (vs. the traditional 8×) to reduce latent tokens.
- Decoder-only Text Encoder: Modern decoder-only LLM with in-context learning for better text-image alignment.
- Block Causal Linear Attention & Causal Mix-FFN: Efficient attention and feedforward for long video generation.
- Flow-DPM-Solver: Reduces sampling steps for both efficient training and efficient sampling.
- sCM Distillation: One/few-step generation with continuous-time consistency distillation.
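To make the linear-attention bullet concrete, here is a minimal PyTorch sketch of softmax-free attention with a ReLU feature map, plus the causal variant that conceptually underlies long-video generation. This is an illustrative sketch under assumptions, not the repository's implementation: the function names, tensor layout, and explicitly materialized running state are our own simplifications.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """O(N) attention: softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V).

    q, k, v: (batch, heads, tokens, dim). Sketch only -- not SANA's actual code.
    """
    q, k = F.relu(q), F.relu(k)                    # ReLU feature map
    kv = torch.einsum("bhnd,bhne->bhde", k, v)     # (dim, dim) summary, linear in N
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)  # normalizer
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

def causal_linear_attention(q, k, v, eps=1e-6):
    """Causal variant: each token sees only its prefix via a running KV state.

    Materializes the per-token state for clarity; real implementations stream
    it, which is what keeps memory constant for minute-length video.
    """
    q, k = F.relu(q), F.relu(k)
    kv = torch.cumsum(torch.einsum("bhnd,bhne->bhnde", k, v), dim=2)
    z = 1.0 / (torch.einsum("bhnd,bhnd->bhn", q, torch.cumsum(k, dim=2)) + eps)
    return torch.einsum("bhnd,bhnde,bhn->bhne", q, kv, z)

x = torch.randn(1, 8, 1024, 64)  # roughly the token count of a 1024px image under DC-AE
print(linear_attention(x, x, x).shape)         # torch.Size([1, 8, 1024, 64])
print(causal_linear_attention(x, x, x).shape)  # torch.Size([1, 8, 1024, 64])
```

The payoff compounds with DC-AE: at 32× downsampling, a 1024×1024 image becomes a 32×32 latent grid, roughly 1,024 tokens (versus 16,384 with a conventional 8× VAE, ignoring patchification), so the linear-in-N cost above stays small.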
In summary, SANA is a fully open-source framework integrating efficient training, fast inference, and flexible deployment for both image and video generation. It is deployable on laptop GPUs with <8GB VRAM via 4-bit quantization.
```bash
git clone https://github.com/NVlabs/Sana.git
cd Sana && ./environment_setup.sh sana
```

```python
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
# Keep the VAE and text encoder in bf16 as well to save memory.
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]  # indexing the pipeline output returns the list of generated images
image[0].save("sana.png")
```
Tip: Upgrade to `diffusers>=0.32.0` to use `SanaPipeline`. More details can be found in the 📖 Docs.
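For SANA-Sprint's one/few-step generation, recent diffusers releases also expose a dedicated pipeline. The sketch below is hedged: `SanaSprintPipeline` and the checkpoint name should be verified against your installed diffusers version and the Model Zoo.

```python
import torch
from diffusers import SanaSprintPipeline  # requires a diffusers release that includes SANA-Sprint

pipe = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",  # verify against the Model Zoo
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# sCM distillation means 1-2 steps suffice instead of the ~20 used above.
image = pipe(
    prompt='a cyberpunk cat with a neon sign that says "Sana"',
    num_inference_steps=2,
).images[0]
image.save("sana_sprint.png")
```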
- 📖 Full Documentation
- Installation Guide
- Model Zoo
- Sana Inference & Training
- SANA-Sprint
- SANA-Video
- LongSANA
- ControlNet
- LoRA / DreamBooth
- Quantization (4bit / 8bit), with a low-VRAM sketch after this list
- ComfyUI
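For constrained GPUs, the standard diffusers memory savers compose with `SanaPipeline`; the Quantization and 4K guides above cover the full 4-bit/8-bit path, while the sketch below shows only the offload-plus-tiling part. It assumes your diffusers version exposes `enable_model_cpu_offload()` on the pipeline and `enable_tiling()` on the DC-AE autoencoder.

```python
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
)

# Move sub-modules to the GPU only while they run; the rest waits in CPU RAM.
pipe.enable_model_cpu_offload()

# Decode the latent in tiles so high-resolution outputs fit in limited VRAM
# (assumes the DC-AE autoencoder in your diffusers build supports tiling).
pipe.vae.enable_tiling()

image = pipe(
    prompt="a tiny astronaut hatching from an egg on the moon",
    height=1024,
    width=1024,
).images[0]
image.save("sana_low_vram.png")
```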
| Methods (1024x1024) | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID ↓ | CLIP ↑ | GenEval ↑ | DPG ↑ |
|---|---|---|---|---|---|---|---|---|
| FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | 0.67 | 84.0 |
| Sana-0.6B | 1.7 | 0.9 | 0.6 | 39.5× | 5.81 | 28.36 | 0.64 | 83.6 |
| Sana-0.6B | 1.7 | 0.9 | 0.6 | 39.5× | 5.61 | 28.80 | 0.68 | 84.2 |
| Sana-1.6B | 1.0 | 1.2 | 1.6 | 23.3× | 5.92 | 28.94 | 0.69 | 84.5 |
| Sana-1.5 1.6B | 1.0 | 1.2 | 1.6 | 23.3× | 5.70 | 29.12 | 0.82 | 84.5 |
| Sana-1.5 4.8B | 0.26 | 4.2 | 4.8 | 6.5× | 5.99 | 29.23 | 0.81 | 84.7 |
| Models | Latency (s) | Params (B) | VBench Total ↑ | Quality ↑ | Semantic ↑ |
|---|---|---|---|---|---|
| Wan-2.1-14B | 1897 | 14 | 83.73 | 85.77 | 75.58 |
| Wan-2.1-1.3B | 400 | 1.3 | 83.38 | 85.67 | 74.22 |
| SANA-Video-2B | 36 | 2 | 84.05 | 84.63 | 81.73 |
We will try our best to achieve:
- [✅] Training code
- [✅] Inference code
- [✅] Model zoo
- [✅] ComfyUI nodes (SANA, SANA-1.5, SANA-Sprint)
- [✅] DC-AE in diffusers
- [✅] Sana merged into diffusers (huggingface/diffusers#9982)
- [✅] LoRA training by @paul (diffusers: https://github.com/huggingface/diffusers/pull/10234)
- [✅] 2K/4K resolution models (thanks to @SUPIR for providing a 4K super-resolution model)
- [✅] 8bit / 4bit laptop development
- [✅] ControlNet (train & inference & models)
- [✅] FSDP training
- [✅] SANA-1.5 (larger model size / inference scaling)
- [✅] SANA-Sprint: few-step generator
- [✅] Faster DC-AE-Lite weights
- [✅] Better-reconstruction F32/F64 VAEs
- [✅] SANA-Video: Linear DiT video model and real-time minute-length video generation
- [ ] See you in the future
Thanks to the following open-source projects for their wonderful work and codebases!
- PixArt-α
- PixArt-Σ
- diffusers
- Efficient-ViT
- ComfyUI_ExtraModels
- SVDQuant and Nunchaku
- Open-Sora
- Wan
- LongLive
Thanks to Paper2Video for generating the video of Jeason presenting SANA; refer to Paper2Video for more details.
Thanks go to these wonderful contributors:
```bibtex
@misc{xie2024sana,
      title={Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer},
      author={Xie, Enze and Chen, Junsong and Chen, Junyu and Cai, Han and Tang, Haotian and Lin, Yujun and Zhang, Zhekai and Li, Muyang and Zhu, Ligeng and Lu, Yao and Han, Song},
      year={2024},
      eprint={2410.10629},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.10629},
}

@misc{xie2025sana,
      title={SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer},
      author={Xie, Enze and Chen, Junsong and Zhao, Yuyang and Yu, Jincheng and Zhu, Ligeng and Lin, Yujun and Zhang, Zhekai and Li, Muyang and Chen, Junyu and Cai, Han and others},
      year={2025},
      eprint={2501.18427},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.18427},
}

@misc{chen2025sanasprint,
      title={SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation},
      author={Chen, Junsong and Xue, Shuchen and Zhao, Yuyang and Yu, Jincheng and Paul, Sayak and Chen, Junyu and Cai, Han and Han, Song and Xie, Enze},
      year={2025},
      eprint={2503.09641},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.09641},
}

@misc{chen2025sana,
      title={SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer},
      author={Chen, Junsong and Zhao, Yuyang and Yu, Jincheng and Chu, Ruihang and Chen, Junyu and Yang, Shuai and Wang, Xianbang and Pan, Yicheng and Zhou, Daquan and Ling, Huan and others},
      year={2025},
      eprint={2509.24695},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.24695},
}
```