Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
Universal LLM Deployment Engine with ML Compilation
Fast, Flexible and Portable Structured Generation
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Training and serving large-scale neural networks with automatic parallelization.
A tool for visually modifying ONNX models, based on Netron and Flask.
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Automatically Generated Notebook Slides
Hummingbird compiles trained ML models into tensor computation for faster inference.
High-performance automatic differentiation of LLVM and MLIR.
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A list of awesome compiler projects and papers for tensor computation and deep learning.
A library for syntactically rewriting Python programs, pronounced "sinner".
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for 📖