- Tsinghua University
- Beijing, China
Stars
🚀🚀 Efficient implementations of Native Sparse Attention
Official repo for paper "SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass"
Development repository for the Triton language and compiler
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
[NeurIPS 2025] PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
DiffusionNFT: Online Diffusion Reinforcement with Forward Process
Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
[PG2025] PaMO: Parallel Mesh Optimization for Intersection-Free Low-Poly Modeling on the GPU
A video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.
Import a 3D Model and automatically assign and export animations
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
A Python framework for accelerated simulation, data generation and spatial computing.
[NeurIPS 2025 Spotlight] Official repository for “Puppeteer: Rig and Animate Your 3D Models”
Reference PyTorch implementation and models for DINOv3
ViPE: Video Pose Engine for Geometric 3D Perception
Efficient Triton implementation of Native Sparse Attention.
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
🔥 Official implementation of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction"
Wan: Open and Advanced Large-Scale Video Generative Models
Lumos Project: Frontier generative model research by Alibaba DAMO Academy, including Lumos-1, etc.
Code for Streaming 4D Visual Geometry Transformer
Code of π^3: Permutation-Equivariant Visual Geometry Learning
CoPart (ICCV 2025): A part-based 3D generation framework & the first large-scale part-level 3D dataset.