jeejeelee
🎯
Focusing
  • Chengdu, China
  • 19:00 (UTC +08:00)

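Starred repositories. Each entry gives the repository description, then its primary language (when set), star count, fork count, and last update date.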

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 770 26 Updated Oct 13, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,453 475 Updated Dec 20, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 804 178 Updated Dec 19, 2025

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

Cuda 507 87 Updated Sep 8, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 732 55 Updated Aug 6, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,987 877 Updated Dec 4, 2025

A curated list for Efficient Large Language Models

Python 1,916 146 Updated Jun 17, 2025

Awesome LLM compression research papers and tools.

1,737 112 Updated Nov 10, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,847 328 Updated Nov 28, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,312 606 Updated Dec 20, 2025

A collection of benchmarks to measure basic GPU capabilities.

C++ 475 72 Updated Oct 24, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,434 1,969 Updated Dec 20, 2025

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python 491 71 Updated Aug 1, 2024

Cuda 116 29 Updated Apr 11, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,016 583 Updated Dec 20, 2025

Development repository for the Triton language and compiler

MLIR 17,887 2,461 Updated Dec 20, 2025

TinyChatEngine: On-Device LLM Inference Library

C++ 932 94 Updated Jul 4, 2024

A simple high performance CUDA GEMM implementation.

Cuda 421 42 Updated Jan 4, 2024

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 884 72 Updated Nov 26, 2025

List of papers related to neural network quantization in recent AI conferences and journals.

771 59 Updated Mar 27, 2025

The official repository for the gem5 computer-system architecture simulator.

C++ 2,345 1,631 Updated Dec 18, 2025

How to optimize some algorithms in CUDA.

Cuda 2,698 244 Updated Dec 6, 2025

A beginner's guide to LLVM IR (LLVM IR入门指南)

LLVM 1,487 162 Updated Jan 31, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,816 12,084 Updated Dec 20, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,407 635 Updated Dec 20, 2025

row-major matmul optimization

C++ 692 94 Updated Aug 20, 2025

TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.

C++ 98 10 Updated Apr 22, 2023

LLM deployment project based on MNN. This project has been merged into MNN.

C++ 1,615 176 Updated Jan 20, 2025

[ICCV 2023] Consistent Image Synthesis and Editing

Python 828 36 Updated Aug 19, 2024

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python 9,326 390 Updated Nov 24, 2025