airMeng
🇸🇭
I will not serve

Organizations

@RunoobHelpsRunoob

Let your Claude be able to think

TypeScript 16,768 1,980 Updated Nov 4, 2025

🎯An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantization, MXFP4, NVFP4, GGUF, and adaptive schemes.

Python 844 77 Updated Feb 4, 2026

Intel® NPU Acceleration Library

Python 703 82 Updated Apr 24, 2025

An innovative library for efficient LLM inference via low-bit quantization

C++ 352 39 Updated Aug 30, 2024
C++ 61 20 Updated Dec 18, 2024

How to optimize some algorithms in CUDA.

Cuda 2,815 256 Updated Jan 31, 2026

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,174 216 Updated Oct 8, 2024

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Python 2,581 295 Updated Feb 4, 2026

Writing a minimal x86-64 JIT compiler in C++

C++ 106 17 Updated Apr 28, 2018

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Jupyter Notebook 532 185 Updated Feb 3, 2026

Intel® Extension for TensorFlow*

C++ 349 45 Updated Oct 29, 2025
Jupyter Notebook 221 81 Updated Nov 22, 2024

MLIR Sample dialect

C++ 136 36 Updated Dec 23, 2025

MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com

38 9 Updated Dec 1, 2023

Parallel Algorithm Scheduling Library

C++ 105 20 Updated Jul 24, 2017

This is an implementation of sgemm_kernel on L1d cache.

Assembly 233 33 Updated Feb 26, 2024

An educational compiler intermediate representation

Rust 732 323 Updated Jan 5, 2026

A primitive library for neural networks

C++ 1,369 223 Updated Nov 24, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,725 324 Updated Oct 19, 2024

Fast sparse deep learning on CPUs

Python 56 8 Updated Sep 28, 2022

Assembler for NVIDIA Maxwell architecture

Sass 1,060 171 Updated Jan 3, 2023

Samples for Intel® oneAPI Toolkits

C++ 1,125 744 Updated Jan 29, 2026

LLVM Optimization to extract a function, embedded in its intermediate representation in the binary, and execute it using the LLVM Just-In-Time compiler.

C++ 531 31 Updated May 15, 2021

Transform ONNX model to PyTorch representation

Python 345 70 Updated Nov 4, 2025

Intel Data Parallel C++ (and SYCL 2020) Tutorial.

C++ 95 16 Updated Dec 15, 2021

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,305 335 Updated May 16, 2023

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 974 400 Updated Feb 4, 2026

IPEX verbose toolkit

Python 2 Updated Mar 10, 2022

Python Framework for sparse neural networks

Cuda 19 5 Updated Apr 28, 2017

A C++/CUDA template library for tensor lazy evaluation

C++ 164 38 Updated May 8, 2023