View FrozenGene's full-sized avatar
🎯
Focusing
  • Shanghai
  • 08:11 (UTC +08:00)

Organizations

@apache @DougongAI


The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,672 612 Updated Nov 7, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,210 279 Updated Nov 7, 2025

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Python 2,492 424 Updated Nov 7, 2025

Visualizer for neural network, deep learning and machine learning models

JavaScript 31,744 3,021 Updated Nov 7, 2025

Fast, Flexible and Portable Structured Generation

C++ 1,352 98 Updated Nov 7, 2025

High-performance automatic differentiation of LLVM and MLIR.

LLVM 1,487 144 Updated Nov 7, 2025

Open deep learning compiler stack for cpu, gpu and specialized accelerators

Python 12,797 3,693 Updated Nov 7, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,741 1,520 Updated Nov 7, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,634 259 Updated Nov 6, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,938 148 Updated Nov 5, 2025

Universal LLM Deployment Engine with ML Compilation

Python 21,576 1,851 Updated Nov 4, 2025

Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖

14,751 1,128 Updated Oct 27, 2025

Original Apollo 11 Guidance Computer (AGC) source code for the command and lunar modules.

Assembly 63,826 7,357 Updated Oct 22, 2025

🎓 Path to a free self-taught education in Computer Science!

HTML 197,223 24,586 Updated Aug 23, 2025

📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

Python 4,668 319 Updated Aug 19, 2025

Repository which contains links and resources on different topics of Computer Science.

CSS 4,224 1,161 Updated Aug 15, 2025

Hummingbird compiles trained ML models into tensor computation for faster inference.

Python 3,496 286 Updated Jul 17, 2025

LLM training in simple, raw C/CUDA

Cuda 28,099 3,267 Updated Jun 26, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,929 286 Updated May 15, 2025

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

JavaScript 1,572 192 Updated Feb 25, 2025

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,483 583 Updated Feb 15, 2025
C++ 145 21 Updated Jan 30, 2025
Jupyter Notebook 210 75 Updated Nov 22, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,667 320 Updated Oct 19, 2024

Triton to TVM transpiler.

C++ 22 2 Updated Oct 14, 2024

Inference Llama 2 in one file of pure C

C 18,917 2,400 Updated Aug 6, 2024

LLM101n: Let's build a Storyteller

35,480 1,931 Updated Aug 1, 2024

An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.

Python 51 3 Updated Jul 23, 2024
Python 619 65 Updated Jun 4, 2024

GPTQ inference TVM kernel

Cuda 39 1 Updated Apr 25, 2024