airMeng

🇸🇭 I will not serve

Organizations

@RunoobHelpsRunoob

Let your Claude think

TypeScript 16,999 1,971 Updated Apr 7, 2026

A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support and full compatibility with vLLM, SGLang, and Transformers.

Python 1,068 116 Updated Apr 29, 2026

Intel® NPU Acceleration Library

Python 710 82 Updated Apr 24, 2025

An innovative library for efficient LLM inference via low-bit quantization

C++ 352 38 Updated Aug 30, 2024
C++ 61 20 Updated Dec 18, 2024

How to optimize algorithms in CUDA.

Cuda 2,953 272 Updated Apr 22, 2026

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,178 217 Updated Oct 8, 2024

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

Python 2,627 304 Updated Apr 29, 2026

Writing a minimal x86-64 JIT compiler in C++

C++ 105 17 Updated Apr 28, 2018

🤗 Optimum Intel: Accelerate inference with Intel optimization tools

Jupyter Notebook 578 229 Updated Apr 29, 2026

Intel® Extension for TensorFlow*

C++ 351 45 Updated Oct 29, 2025
Jupyter Notebook 228 81 Updated Nov 22, 2024

MLIR Sample dialect

C++ 137 36 Updated Dec 23, 2025

MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com

38 9 Updated Dec 1, 2023

Parallel Algorithm Scheduling Library

C++ 106 20 Updated Jul 24, 2017

An implementation of an SGEMM kernel blocked for the L1d cache.

Assembly 234 33 Updated Feb 26, 2024

An educational compiler intermediate representation

Rust 759 329 Updated Feb 6, 2026

A primitives library for neural networks

C++ 1,368 220 Updated Nov 24, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,739 324 Updated Oct 19, 2024

Fast sparse deep learning on CPUs

Python 55 8 Updated Sep 28, 2022

Assembler for NVIDIA Maxwell architecture

Sass 1,062 171 Updated Jan 3, 2023

Samples for Intel® oneAPI Toolkits

C++ 1,138 746 Updated Apr 8, 2026

LLVM Optimization to extract a function, embedded in its intermediate representation in the binary, and execute it using the LLVM Just-In-Time compiler.

C++ 530 33 Updated May 15, 2021

Transform ONNX model to PyTorch representation

Python 348 71 Updated Nov 4, 2025

Intel Data Parallel C++ (and SYCL 2020) Tutorial.

C++ 96 16 Updated Dec 15, 2021

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,300 333 Updated May 16, 2023

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 1,009 423 Updated Apr 29, 2026

ipex verbose toolkit

Python 2 Updated Mar 10, 2022

A Python framework for sparse neural networks

Cuda 19 4 Updated Apr 28, 2017

A C++/CUDA template library for lazy tensor evaluation

C++ 165 38 Updated May 8, 2023