Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Universal LLM Deployment Engine with ML Compilation
Fast, Flexible and Portable Structured Generation
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Training and serving large-scale neural networks with automatic parallelization.
A tool for visually modifying ONNX models, based on Netron and Flask.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Automatically Generated Notebook Slides
Hummingbird compiles trained ML models into tensor computation for faster inference.
High-performance automatic differentiation of LLVM and MLIR.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A list of awesome compiler projects and papers for tensor computation and deep learning.
A library for syntactically rewriting Python programs, pronounced "sinner".
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖