- Alibaba-DAMO
- Beijing
Stars
Efficient Triton Kernels for LLM Training
[CVPR 2025] Teaching Large Language Models to Regress Accurate Image Quality Scores using Score Distribution
[CVPR2023] Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective
Sliding Window Attention Training for Efficient Large Language Models
Flash Attention Triton kernel with support for second-order derivatives
PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
Triton-based sparse quantization attention kernel collection
Muon is an optimizer for hidden layers in neural networks
Reference PyTorch implementation and models for DINOv3
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Evaluating text-to-image/video/3D models with VQAScore
A toolkit designed for the CapsBench Caption Evaluation Framework, as introduced in the paper Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
Scalable and memory-optimized training of diffusion models
QuAcK: a software for emerging quantum electronic structure methods
GoatWu / Self-Forcing-Plus
Forked from guandeh17/Self-Forcing
Unofficial extension implementation of Self-Forcing to support I2V and 14B training.
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
A unified inference and post-training framework for accelerated video generation.
Repo for SeedVR2 & SeedVR (CVPR2025 Highlight)
A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Let's make video diffusion practical!
A more flexible framework that can generate videos at any resolution and create videos from images.
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
(NeurIPS 2024 Oral) Improved Distribution Matching Distillation for Fast Image Synthesis
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models