thuang6

SOTA rounding quantization for high-accuracy low-bit LLM inference. Seamlessly optimized for vLLM, sglang, and CPU/GPU/CUDA with multi-datatype support.

Python · 919 stars · 90 forks · Updated Mar 24, 2026

Model compression for ONNX

Python · 100 stars · 9 forks · Updated Mar 1, 2026

oneAPI Deep Neural Network Library (oneDNN)

C++ · 3,965 stars · 1,114 forks · Updated Mar 24, 2026

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python · 2,179 stars · 216 forks · Updated Oct 8, 2024

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Python · 2,604 stars · 300 forks · Updated Mar 24, 2026

Machine learning, in numpy

Python · 16,299 stars · 3,776 forks · Updated Oct 29, 2023

Repo for counting stars and contributing. Press F to pay respect to glorious developers.

275,803 stars · 20,931 forks · Updated Aug 22, 2025