🇸🇭 I will not serve

Organizations: @RunoobHelpsRunoob


Let your Claude be able to think

TypeScript 16,620 1,964 Updated Nov 4, 2025

Advanced quantization toolkit for LLMs and VLMs. Supports WOQ, MXFP4, NVFP4, GGUF, and adaptive schemes, with seamless integration into Transformers, vLLM, SGLang, and llm-compressor

Python 775 64 Updated Dec 22, 2025
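
The entry above describes a weight-only quantization toolkit with a Transformers-style interface. Below is a minimal sketch of such a flow, assuming an AutoRound-style entry point; the class name, constructor arguments, and quantize_and_save method are assumptions and may differ between releases.

```python
# Hypothetical weight-only quantization (WOQ) flow; API names are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound  # assumed package and entry point

model_name = "facebook/opt-125m"  # small model, purely for illustration
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit weights with a group size of 128 are typical WOQ settings.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize_and_save("./opt-125m-w4")  # assumed convenience method
```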

Intel® NPU Acceleration Library

Python 703 80 Updated Apr 24, 2025
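
A sketch of how such an NPU library is typically used from PyTorch, assuming a top-level compile() entry point that rewrites supported ops for the NPU; the import name and exact signature are assumptions.

```python
# Offload a small PyTorch module to the NPU; compile() signature is an assumption.
import torch
import intel_npu_acceleration_library  # assumed import name

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

npu_model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

with torch.no_grad():
    out = npu_model(torch.randn(1, 256))
print(out.shape)
```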

An innovative library for efficient LLM inference via low-bit quantization

C++ 351 39 Updated Aug 30, 2024
C++ 61 20 Updated Dec 18, 2024

How to optimize some algorithms in CUDA.

Cuda 2,705 244 Updated Dec 22, 2025
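
To keep the examples in Python, here is a Numba CUDA sketch of one of the basic patterns such optimization write-ups usually start from: a grid-stride loop, which lets a fixed launch configuration cover inputs of any size. This is a generic illustration, not code from the repository.

```python
# Grid-stride vector add: a common first step in CUDA optimization tutorials.
import numpy as np
from numba import cuda

@cuda.jit
def vec_add(a, b, out):
    start = cuda.grid(1)       # global thread index
    stride = cuda.gridsize(1)  # total number of launched threads
    for i in range(start, a.size, stride):
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

d_a, d_b = cuda.to_device(a), cuda.to_device(b)
d_out = cuda.device_array_like(a)
vec_add[64, 256](d_a, d_b, d_out)  # fewer threads than elements; the stride loop covers the rest
assert np.allclose(d_out.copy_to_host(), a + b)
```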

⚡ Build your chatbot within minutes on your favorite device, with SOTA compression techniques for LLMs and efficient LLM inference on Intel platforms ⚡

Python 2,169 216 Updated Oct 8, 2024
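
A sketch of the drop-in, Transformers-like usage the project advertises, with 4-bit weight-only loading; the import path and the load_in_4bit flag follow the project's published examples but may differ between versions.

```python
# Transformers-style 4-bit loading; treat the import path and flags as assumptions.
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)

inputs = tokenizer("Once upon a time, there existed a little girl,", return_tensors="pt").input_ids
outputs = model.generate(inputs, streamer=TextStreamer(tokenizer), max_new_tokens=64)
```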

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Python 2,552 287 Updated Dec 22, 2025
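
A minimal post-training quantization sketch in the style of the 2.x PyTorch API (PostTrainingQuantConfig plus quantization.fit); entry points differ across major versions, so treat the names as assumptions.

```python
# Post-training static quantization of a small PyTorch model (2.x-style API; names may differ).
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4)
).eval()

# A toy calibration set: (input, label) batches drawn from random data.
calib = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.zeros(64, dtype=torch.long)), batch_size=8
)

q_model = quantization.fit(model=model, conf=PostTrainingQuantConfig(), calib_dataloader=calib)
q_model.save("./quantized_model")
```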

Writing a minimal x86-64 JIT compiler in C++

C++ 104 17 Updated Apr 28, 2018
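
The core trick of such a JIT (copy machine code into executable memory, then call it through a function pointer) fits in a few lines; here is a Python analogue of that C++ technique, assuming Linux on x86-64.

```python
# Minimal JIT idea: put raw machine code in executable memory and call it.
import ctypes
import mmap

code = b"\xb8\x2a\x00\x00\x00\xc3"  # x86-64: mov eax, 42 ; ret

buf = mmap.mmap(-1, mmap.PAGESIZE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
buf.write(code)

addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))  # address of the code buffer
func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)              # treat it as int (*)(void)
print(func())  # 42
```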

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Jupyter Notebook 521 167 Updated Dec 22, 2025
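
A minimal sketch of the OpenVINO path: export a Transformers checkpoint on the fly and reuse the ordinary pipeline API; the IPEX and neural-compressor backends follow a similar from_pretrained pattern.

```python
# Export a Transformers model to OpenVINO and run it through the usual pipeline API.
from optimum.intel import OVModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Swapping in the OpenVINO backend is a one-line change."))
```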

Intel® Extension for TensorFlow*

C++ 350 45 Updated Oct 29, 2025
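
The extension hooks into TensorFlow's pluggable-device mechanism, so in principle importing it is enough to register Intel devices; a sketch, with the caveat that the import name and device listing below are assumptions about the packaged wheel.

```python
# Importing the extension should register Intel devices via TF's plugin mechanism (assumption).
import tensorflow as tf
import intel_extension_for_tensorflow as itex  # assumed import name

print("ITEX version:", itex.__version__)
print(tf.config.list_physical_devices())  # expect an XPU entry on supported hardware
```
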
Jupyter Notebook 216 79 Updated Nov 22, 2024

MLIR Sample dialect

C++ 133 36 Updated Feb 18, 2025

MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com

38 9 Updated Dec 1, 2023

Parallel Algorithm Scheduling Library

C++ 107 20 Updated Jul 24, 2017

This is an implementation of an SGEMM kernel tuned for the L1d cache.

Assembly 233 33 Updated Feb 26, 2024
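
A NumPy sketch of the cache-blocking idea behind such kernels: walk the matrices in small tiles so each tile triple stays resident in the L1 data cache. The tile size here is illustrative, and the real kernel does this in assembly with register blocking on top.

```python
import numpy as np

def sgemm_blocked(A, B, tile=64):
    """C = A @ B computed tile-by-tile so the working set fits in L1d."""
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(sgemm_blocked(A, B), A @ B, atol=1e-3)
```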

An educational compiler intermediate representation

Rust 725 320 Updated Dec 22, 2025

A primitive library for neural networks

C++ 1,369 223 Updated Nov 24, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,702 323 Updated Oct 19, 2024

Fast sparse deep learning on CPUs

Python 56 8 Updated Sep 28, 2022

Assembler for NVIDIA Maxwell architecture

Sass 1,058 172 Updated Jan 3, 2023

Samples for Intel® oneAPI Toolkits

C++ 1,117 741 Updated Nov 21, 2025

An LLVM optimization that extracts a function, embedded as its intermediate representation in the binary, and executes it using the LLVM just-in-time (JIT) compiler.

C++ 528 31 Updated May 15, 2021
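
The same end-to-end idea (hand a function's IR to LLVM and execute it with the JIT) can be sketched in Python with llvmlite; this mirrors the workflow, not the repository's C++ pass.

```python
# Parse LLVM IR, JIT-compile it with MCJIT, and call the resulting function.
import ctypes
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

ir = r"""
define i32 @add(i32 %a, i32 %b) {
entry:
  %s = add i32 %a, %b
  ret i32 %s
}
"""

mod = llvm.parse_assembly(ir)
mod.verify()
engine = llvm.create_mcjit_compiler(mod, llvm.Target.from_default_triple().create_target_machine())
engine.finalize_object()

add = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)(engine.get_function_address("add"))
print(add(2, 40))  # 42
```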

Transform ONNX model to PyTorch representation

Python 344 70 Updated Nov 4, 2025
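
Typical usage is a single convert() call that returns a torch.nn.Module; the file name and input shape below are placeholders.

```python
# Convert an existing ONNX graph into a torch.nn.Module and run it.
import torch
from onnx2torch import convert

torch_model = convert("model.onnx")  # also accepts an onnx.ModelProto
torch_model.eval()

with torch.no_grad():
    out = torch_model(torch.randn(1, 3, 224, 224))  # shape depends on the exported model
```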

Intel Data Parallel C++ (and SYCL 2020) Tutorial.

C++ 95 16 Updated Dec 15, 2021

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,299 333 Updated May 16, 2023

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 953 386 Updated Dec 10, 2025

IPEX (Intel® Extension for PyTorch) verbose toolkit

Python 2 Updated Mar 10, 2022

Python Framework for sparse neural networks

Cuda 19 5 Updated Apr 28, 2017

A C++/CUDA template library for tensor lazy evaluation

C++ 164 38 Updated May 8, 2023
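
A tiny Python sketch of the lazy-evaluation idea such expression-template libraries are built on: operators only build an expression graph, and the arithmetic runs once the expression is explicitly evaluated.

```python
import numpy as np

class Lazy:
    """Wraps either a concrete array or a deferred binary op; nothing runs until eval()."""
    def __init__(self, value=None, op=None, lhs=None, rhs=None):
        self.value, self.op, self.lhs, self.rhs = value, op, lhs, rhs

    def __add__(self, other):
        return Lazy(op=np.add, lhs=self, rhs=other)

    def __mul__(self, other):
        return Lazy(op=np.multiply, lhs=self, rhs=other)

    def eval(self):
        if self.op is None:
            return self.value
        return self.op(self.lhs.eval(), self.rhs.eval())

a = Lazy(np.ones(4))
b = Lazy(np.full(4, 2.0))
expr = a + b * b      # builds a graph; no arithmetic has happened yet
print(expr.eval())    # [5. 5. 5. 5.]
```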