UC San Diego
La Jolla
alexzms.github.io
in/minshen-zhang-416a0b291
Stars
Accelerating MoE with IO and Tile-aware Optimizations
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model
[NeurIPS 2025] Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation
NVIDIA FastGen: Fast Generation from Diffusion Models
Jacobi Forcing: Fast and Accurate Diffusion-style Decoding
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
A feature-rich command-line audio/video downloader
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Helpful kernel tutorials and examples for tile-based GPU programming
A high-throughput and memory-efficient inference and serving engine for LLMs
Introduction to Machine Learning Systems
🚀🚀 Efficient implementations of Native Sparse Attention
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
SGLang is a high-performance serving framework for large language models and multimodal models.
Aims to teach Python3 by example
Optimized primitives for collective multi-GPU communication
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Model Compression Toolbox for Large Language Models and Diffusion Models
A unified inference and post-training framework for accelerated video generation.
Tile primitives for speedy kernels
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels