Derf (Dynamic erf) - Normalization-Free Transformer Activation. Reimplementation of arXiv:2512.10938
Updated Dec 13, 2025 - Python
InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models
Official implementation for "RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers" (ICML 2025) and "UltraViCo: Breaking Extrapolation Limits in Video Diffusion Transformers"
Official implementation of the paper "GenCompositor: Generative Video Compositing with Diffusion Transformer"
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[NeurIPS 2025] Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Surpasses GPT-4o in ID persistence~ MoE ckpt released! Only 4GB VRAM is enough to run!
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Implementation of RIFT-SVC, a singing voice conversion model based on Rectified Flow Transformer.
Unofficial LeetArxiv Implementation of the paper Scalable Diffusion Models with Transformers
[NeurIPS 2025] Official implementation for our paper "Scaling Diffusion Transformers Efficiently via μP".
(ICCV 2025) 🎨 Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Official training code for MUG-V 10B video generation model. Built on Megatron-LM (v0.14.0) with production-ready distributed training for 10B DiT.
[NeurIPS2024 (Spotlight)] "Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement" by Zhehao Huang, Xinwen Cheng, JingHao Zheng, Haoran Wang, Zhengbao He, Tao Li, Xiaolin Huang
Torchsmith is a minimalist library that focuses on understanding generative AI by building it using primitive PyTorch operations
HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
[SIGGRAPH Asia 25] Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off
DiffInk: Glyph- and Style-Aware Latent Diffusion Transformer for Text to Online Handwriting Generation
[NeurIPS 2025] OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication
A repo of a modified version of Diffusion Transformer
Democratising RGBA Image Generation With No $$$ (AI4VA@ECCV24)