Stars
Universal LLM Deployment Engine with ML Compilation
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉
Hummingbird compiles trained ML models into tensor computations for faster inference (see the sketch after this list).
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A library for syntactically rewriting Python programs, pronounced "sinner".
An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
TFLite Python API package for parsing TFLite models
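As a small illustration of the Hummingbird workflow mentioned above, here is a minimal sketch of compiling a trained scikit-learn model into tensor computations. The model choice, data shapes, and PyTorch backend are illustrative assumptions, not taken from any of the repositories listed here.

```python
# Minimal sketch (assumed setup): compile a scikit-learn model with
# Hummingbird into a PyTorch-backed tensor-computation model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert

# Train a small scikit-learn model on synthetic data (illustrative only).
X = np.random.rand(1000, 20).astype(np.float32)
y = np.random.randint(2, size=1000)
skl_model = RandomForestClassifier(n_estimators=10).fit(X, y)

# Compile the trained model to tensor computations using the PyTorch backend.
hb_model = convert(skl_model, "pytorch")

# Run inference with the compiled model.
preds = hb_model.predict(X[:5])
print(preds)
```

The compiled model exposes the same `predict` interface as the original estimator, so it can be swapped into an existing inference path with little change.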