View thuang6's full-sized avatar

🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantization, MXFP4, NVFP4, GGUF, and adaptive schemes.

Python · 844 stars · 78 forks · Updated Feb 5, 2026

Model compression for ONNX

Python · 98 stars · 9 forks · Updated Nov 18, 2024

oneAPI Deep Neural Network Library (oneDNN)

C++ · 3,960 stars · 1,102 forks · Updated Feb 5, 2026

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python · 2,174 stars · 216 forks · Updated Oct 8, 2024

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Python · 2,582 stars · 295 forks · Updated Feb 5, 2026

Machine learning, in numpy

Python · 16,240 stars · 3,785 forks · Updated Oct 29, 2023

Repo for counting stars and contributing. Press F to pay respect to glorious developers.

275,428 stars · 20,973 forks · Updated Aug 22, 2025