Stars
Let your Claude think
Advanced quantization toolkit for LLMs and VLMs. Supports WOQ, MXFP4, NVFP4, GGUF, and adaptive schemes, with seamless integration into Transformers, vLLM, SGLang, and llm-compressor
Intel® NPU Acceleration Library
An innovative library for efficient LLM inference via low-bit quantization
How to optimize algorithms in CUDA.
⚡ Build your chatbot within minutes on your favorite device, with SOTA compression techniques for LLMs and efficient LLM inference on Intel platforms ⚡
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
Writing a minimal x86-64 JIT compiler in C++
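As a rough sketch of the core trick behind a minimal JIT of this kind (my own illustration, not code from the repo): allocate an executable page, copy raw x86-64 machine code into it, and call it through a function pointer.

```cpp
// Illustrative minimal x86-64 JIT (POSIX): copy machine code into
// executable memory and call it. Not taken from the repository.
#include <cstdint>
#include <cstring>
#include <cstdio>
#include <sys/mman.h>

int main() {
    // x86-64 System V: mov eax, edi ; add eax, eax ; ret  -> returns 2 * arg
    const uint8_t code[] = { 0x89, 0xF8, 0x01, 0xC0, 0xC3 };

    // Allocate a writable, executable page (real JITs map writable, then re-protect to executable).
    void *mem = mmap(nullptr, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    std::memcpy(mem, code, sizeof(code));

    auto fn = reinterpret_cast<int (*)(int)>(mem);
    std::printf("%d\n", fn(21));   // prints 42

    munmap(mem, sizeof(code));
    return 0;
}
```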
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
Intel® Extension for TensorFlow*
MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com
An implementation of an SGEMM kernel tuned for the L1d cache.
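The repo's actual kernel isn't reproduced here, but a simple cache-blocked SGEMM in C++ illustrates the idea: tile the loops so the working set of the A, B, and C tiles stays resident in the L1d cache (the `BLOCK` size below is an assumed placeholder, not a value from the project).

```cpp
// Illustrative cache-blocked SGEMM sketch (not the repository's kernel).
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t BLOCK = 64;  // assumed tile size; tune to L1d capacity

// Row-major C[m x n] += A[m x k] * B[k x n]
void sgemm_blocked(std::size_t m, std::size_t n, std::size_t k,
                   const float *A, const float *B, float *C) {
    for (std::size_t i0 = 0; i0 < m; i0 += BLOCK)
        for (std::size_t k0 = 0; k0 < k; k0 += BLOCK)
            for (std::size_t j0 = 0; j0 < n; j0 += BLOCK)
                // Micro-kernel over one tile; real kernels add SIMD and register blocking.
                for (std::size_t i = i0; i < std::min(i0 + BLOCK, m); ++i)
                    for (std::size_t kk = k0; kk < std::min(k0 + BLOCK, k); ++kk) {
                        float a = A[i * k + kk];
                        for (std::size_t j = j0; j < std::min(j0 + BLOCK, n); ++j)
                            C[i * n + j] += a * B[kk * n + j];
                    }
}

int main() {
    std::vector<float> A(128 * 128, 1.f), B(128 * 128, 1.f), C(128 * 128, 0.f);
    sgemm_blocked(128, 128, 128, A.data(), B.data(), C.data());
    // Each entry of C is a dot product of 128 ones with 128 ones: 128.
    return C[0] == 128.f ? 0 : 1;
}
```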
An educational compiler intermediate representation
A list of awesome compiler projects and papers for tensor computation and deep learning.
An LLVM optimization that extracts a function, embeds its intermediate representation in the binary, and executes it using the LLVM just-in-time compiler.
Transform an ONNX model into a PyTorch representation
Intel Data Parallel C++ (and SYCL 2020) Tutorial.
LightSeq: A High Performance Library for Sequence Processing and Generation
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
A Python framework for sparse neural networks
A C++/CUDA template library for lazy tensor evaluation
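A minimal, CPU-only sketch of the lazy-evaluation idea behind such a library (illustrative names, not the library's API): `operator+` builds a lightweight expression node, and the whole expression is evaluated in a single fused loop only when it is assigned to a `Tensor`.

```cpp
// Expression-template sketch of lazy evaluation (illustrative only):
// "a + b + c" builds nodes; the work happens at assignment.
#include <cstddef>
#include <iostream>
#include <vector>

template <typename L, typename R>
struct AddExpr {
    const L &lhs; const R &rhs;
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Tensor {
    std::vector<float> data;
    explicit Tensor(std::size_t n, float v = 0.f) : data(n, v) {}
    float  operator[](std::size_t i) const { return data[i]; }
    float &operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Evaluation happens here: one fused loop over the whole expression tree.
    template <typename Expr>
    Tensor &operator=(const Expr &e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

template <typename L, typename R>
AddExpr<L, R> operator+(const L &l, const R &r) { return {l, r}; }

int main() {
    Tensor a(4, 1.f), b(4, 2.f), c(4, 3.f), out(4);
    out = a + b + c;              // no temporaries materialized; single loop at assignment
    std::cout << out[0] << "\n";  // 6
}
```

The point of the design is that intermediate results are never materialized: each `+` only records its operands, so arbitrarily long elementwise chains compile down to one loop (or, in the CUDA case, one kernel launch).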