🎯
Focusing
  • Shanghai
  • UTC +08:00

Organizations

@apache @DougongAI


Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,930 286 Updated May 15, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,660 319 Updated Aug 19, 2025

Universal LLM Deployment Engine with ML Compilation

Python 21,564 1,851 Updated Nov 4, 2025

Fast, Flexible and Portable Structured Generation

C++ 1,349 97 Updated Nov 4, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,194 276 Updated Nov 4, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,622 256 Updated Oct 28, 2025
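The entry above describes quantized attention. A minimal NumPy sketch of the underlying idea, assuming symmetric per-tensor int8 quantization of Q and K with int32 accumulation (not the repository's actual CUDA kernels):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: returns codes and a scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq, d = 16, 32
Q = rng.standard_normal((seq, d)).astype(np.float32)
K = rng.standard_normal((seq, d)).astype(np.float32)
V = rng.standard_normal((seq, d)).astype(np.float32)

# Full-precision reference attention.
ref = softmax(Q @ K.T / np.sqrt(d)) @ V

# Quantized path: int8 Q/K, int32 matmul, dequantize before the softmax.
qQ, sQ = quantize_int8(Q)
qK, sK = quantize_int8(K)
scores = (qQ.astype(np.int32) @ qK.astype(np.int32).T) * (sQ * sK) / np.sqrt(d)
out = softmax(scores) @ V

print(np.abs(out - ref).max())  # small quantization error
```

The score matmul dominates attention cost, so doing it in int8 while keeping the softmax and the V matmul in floating point preserves end-to-end accuracy in this toy setting.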

Triton to TVM transpiler.

C++ 22 2 Updated Oct 14, 2024

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,934 148 Updated Nov 5, 2025

LLM101n: Let's build a Storyteller

35,454 1,929 Updated Aug 1, 2024

LLM training in simple, raw C/CUDA

Cuda 28,073 3,264 Updated Jun 26, 2025

An extension of TVMScript for writing simple, high-performance GPU kernels with tensor cores.

Python 51 3 Updated Jul 23, 2024

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,728 1,514 Updated Nov 5, 2025

Inference Llama 2 in one file of pure C

C 18,912 2,399 Updated Aug 6, 2024

GPTQ inference TVM kernel

Cuda 39 1 Updated Apr 25, 2024
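A GPTQ inference kernel spends its time unpacking 4-bit weight codes and dequantizing them on the fly. A minimal NumPy sketch of that pack/unpack/dequantize step, assuming a single per-tensor scale and a zero-point of 8 (real kernels use per-group scales and fused matmuls):

```python
import numpy as np

rng = np.random.default_rng(1)
scale = 0.1
w_int4 = rng.integers(0, 16, size=8).astype(np.uint8)  # 4-bit codes, 0..15

# Pack two 4-bit codes per byte (low nibble first) for storage.
packed = (w_int4[0::2] | (w_int4[1::2] << 4)).astype(np.uint8)

# Unpack and dequantize at inference time: w ≈ scale * (code - zero_point).
lo = packed & 0x0F
hi = packed >> 4
codes = np.empty(8, dtype=np.uint8)
codes[0::2], codes[1::2] = lo, hi
w = scale * (codes.astype(np.float32) - 8)  # assumed zero_point = 8

print(np.array_equal(codes, w_int4))  # round-trip is exact
```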

Training and serving large-scale neural networks with auto parallelization.

Python 3,162 353 Updated Dec 9, 2023

Awesome resources for GPUs

600 56 Updated Jul 1, 2023

A tool to modify ONNX models visually, based on Netron and Flask.

JavaScript 1,572 192 Updated Feb 25, 2025

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,481 582 Updated Feb 15, 2025

Automatically Generated Notebook Slides

Jupyter Notebook 239 82 Updated Aug 18, 2023

Hummingbird compiles trained ML models into tensor computation for faster inference.

Python 3,495 286 Updated Jul 17, 2025
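Compiling a tree model into tensor computation, as the entry above describes, can be sketched with the GEMM strategy: evaluate every node test with one matmul, then match each leaf's path pattern with another. A minimal NumPy illustration on a hypothetical depth-2 tree (not Hummingbird's API):

```python
import numpy as np

# Hypothetical depth-2 decision tree:
#   n0: x[0] < 0.5 ? -> n1 : n2
#   n1: x[1] < 0.5 ? -> L0 : L1
#   n2: x[1] < 0.7 ? -> L2 : L3
def tree_predict(x):
    if x[0] < 0.5:
        return 0 if x[1] < 0.5 else 1
    return 2 if x[1] < 0.7 else 3

# Tensor encoding of the same tree.
A = np.array([[1, 0, 0],      # feature selector: n0 reads x[0]; n1, n2 read x[1]
              [0, 1, 1]], dtype=np.float32)
B = np.array([0.5, 0.5, 0.7], dtype=np.float32)  # node thresholds
# C[node, leaf]: +1 if the node's test is true on the leaf's path,
# -1 if false on the path, 0 if the node is not on the path.
C = np.array([[ 1,  1, -1, -1],
              [ 1, -1,  0,  0],
              [ 0,  0,  1, -1]], dtype=np.float32)
D = np.array([2, 1, 1, 0], dtype=np.float32)     # true tests per leaf path

def tensor_predict(X):
    T = (X @ A < B).astype(np.float32)    # evaluate all node tests at once
    return np.argmax(T @ C == D, axis=1)  # leaf whose path pattern matches

X = np.array([[0.2, 0.3], [0.8, 0.3], [0.8, 0.9], [0.2, 0.6]], dtype=np.float32)
print(tensor_predict(X))             # [0 2 3 1]
print([tree_predict(x) for x in X])  # [0, 2, 3, 1]
```

Branchy control flow becomes two batched matmuls, which is what lets tree ensembles run on GPU tensor runtimes.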

High-performance automatic differentiation of LLVM and MLIR.

LLVM 1,485 144 Updated Nov 5, 2025
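The entry above concerns automatic differentiation at the compiler-IR level. For intuition only, here is a minimal dual-number sketch of forward-mode AD in Python (Enzyme itself transforms LLVM/MLIR IR rather than using operator overloading):

```python
class Dual:
    """Forward-mode AD: carry a value and its derivative together."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1  # f'(x) = 6x + 2

y = f(Dual(4.0, 1.0))  # seed the input's derivative with 1
print(y.val, y.der)    # 57.0 26.0
```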

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python 2,488 424 Updated Nov 5, 2025

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,666 320 Updated Oct 19, 2024

A library for syntactically rewriting Python programs, pronounced "sinner".

Python 68 11 Updated Feb 22, 2022

The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,669 610 Updated Nov 4, 2025

Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖

14,750 1,127 Updated Oct 27, 2025