Stars
A hardware-aware, efficient implementation of "Mixture-of-Depths Attention".
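LLM-powered stock analysis system for A-share, H-share, and US markets: multi-source market data + real-time news + LLM decision dashboard + multi-channel push notifications, running on a schedule at zero cost, entirely on free resources.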
[Tech Report] Alive: A Unified Audio-Video Generation Model
Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory
HunyuanVideo-1.5: A leading lightweight video generation model
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
Repo for SeedVR2 (ICLR2026) & SeedVR (CVPR2025 Highlight)
MTVCraft: An Open Veo3-style Audio-Video Generation Demo
Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
Structured Video Comprehension of Real-World Shorts
InternRobotics' open platform for building generalized navigation foundation models.
Official PyTorch Implementation of "Optimal Stepsize for Diffusion Sampling".
MAGI-1: Autoregressive Video Generation at Scale
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
ReNeg: Learning Negative Embedding with Reward Guidance
(Accepted by IJCV) Liquid: Language Models are Scalable and Unified Multi-modal Generators
[CVPR 2025] StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
The official implementation of "[MASK] is All You Need"
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[ECCV 2024] Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
[CVPR 2025 Highlight] Truncated Diffusion Model for Real-Time End-to-End Autonomous Driving
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Bridging Large Vision-Language Models and End-to-End Autonomous Driving