- Shanghai
Stars
🎓 Path to a free self-taught education in Computer Science!
Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.
Visualizer for neural network, deep learning and machine learning models
Universal LLM Deployment Engine with ML Compilation
Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Repository which contains links and resources on different topics of Computer Science.
Hummingbird compiles trained ML models into tensor computation for faster inference.
Training and serving large-scale neural networks with auto parallelization.
A list of awesome compiler projects and papers for tensor computation and deep learning.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
"Multi-Level Intermediate Representation" Compiler Infrastructure
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
A tool for visually editing ONNX models, based on Netron and Flask.
High-performance automatic differentiation of LLVM and MLIR.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Fast, Flexible and Portable Structured Generation