Stars
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
FlagGems is an operator library for large language models implemented in the Triton Language.
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
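The common thread in HGEMM kernels like these is tiling: each thread block computes a small output tile so its operands stay in fast memory while the reduction proceeds tile by tile. A pure-Python sketch of that blocking idea (the tile size and plain-list matrices are illustrative only; real WMMA kernels use e.g. 16x16 hardware fragments):

```python
# Toy blocked matrix multiply illustrating the tiling idea behind
# WMMA/MMA HGEMM kernels: C is computed in TILE x TILE output blocks,
# accumulating over TILE-wide slices of A and B.
TILE = 2  # illustrative; hardware tensor-core fragments are larger

def blocked_matmul(A, B, n):
    """Multiply two n x n matrices (lists of lists) tile by tile."""
    C = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, TILE):          # output tile row
        for bj in range(0, n, TILE):      # output tile column
            for bk in range(0, n, TILE):  # reduction dimension, in tiles
                for i in range(bi, min(bi + TILE, n)):
                    for j in range(bj, min(bj + TILE, n)):
                        acc = 0.0
                        for k in range(bk, min(bk + TILE, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] += acc
    return C
```

On a GPU the per-tile loops map to threads and the tiles of A and B are staged through shared memory or tensor-core fragments; the loop structure is the same.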
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
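Libraries like BitBLAS run GEMMs where weights are stored at low precision and dequantized on the fly. A hedged pure-Python sketch of the underlying idea, symmetric per-tensor int4 quantization (the scale formula and clamp range are standard textbook choices, not BitBLAS's actual scheme):

```python
# Symmetric quantization of weights to 4-bit integers plus one scale,
# the storage format that mixed-precision GEMM kernels dequantize on the fly.
def quantize_int4(weights):
    """Map floats to integers in [-8, 7] with a shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from int4 codes."""
    return [v * scale for v in q]
```

The payoff is that a weight matrix shrinks 4x versus FP16, and the kernel multiplies dequantized values against full-precision activations.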
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A curated list of resources on efficient Large Language Models.
Awesome LLM compression research papers and tools.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
FlashInfer: Kernel Library for LLM Serving
A collection of benchmarks to measure basic GPU capabilities.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
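H2O's core observation is that a small set of "heavy-hitter" tokens accumulates most of the attention mass, so the KV cache can evict the rest under a fixed budget. A toy sketch of that selection policy (the budget, window, and scores here are made up; the real method ranks positions by per-head accumulated attention inside the model):

```python
def select_kv_cache(accumulated_attention, recent_window, budget):
    """Keep the most recent `recent_window` positions plus the
    highest-scoring older positions ('heavy hitters'), up to `budget`."""
    n = len(accumulated_attention)
    recent = set(range(max(0, n - recent_window), n))
    # rank the older positions by accumulated attention score
    older = sorted(
        (i for i in range(n) if i not in recent),
        key=lambda i: accumulated_attention[i],
        reverse=True,
    )
    keep = recent | set(older[: max(0, budget - len(recent))])
    return sorted(keep)
```

Everything outside the returned set is evicted from the cache, bounding KV memory regardless of sequence length.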
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Development repository for the Triton language and compiler
TinyChatEngine: On-Device LLM Inference Library
A simple high performance CUDA GEMM implementation.
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
List of papers related to neural network quantization in recent AI conferences and journals.
The official repository for the gem5 computer-system architecture simulator.
How to optimize common algorithms in CUDA.
A high-throughput and memory-efficient inference and serving engine for LLMs
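vLLM's memory efficiency comes largely from PagedAttention: the KV cache is allocated in fixed-size physical blocks, and each sequence maps logical token positions to blocks through a block table, much like virtual-memory pages. A minimal allocator sketch (the class, block size, and method names are illustrative, not vLLM's API):

```python
class PagedKVAllocator:
    """Toy block allocator: each sequence maps logical token positions
    to physical KV-cache blocks via a per-sequence block table."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, position):
        """Return the physical block holding this position,
        allocating a fresh block when the sequence crosses a boundary."""
        table = self.block_tables.setdefault(seq_id, [])
        if position // self.block_size >= len(table):
            table.append(self.free_blocks.pop())
        return table[position // self.block_size]

    def free_sequence(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

Because blocks are allocated on demand and returned on completion, no sequence reserves cache for its maximum possible length, which is what enables the high batch sizes.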
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
An LLM deployment project based on MNN; it has since been merged into MNN.
[ICCV 2023] Consistent Image Synthesis and Editing
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
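einops replaces chains of reshape/transpose calls with a single pattern string. For instance, `rearrange(x, 'b h w c -> b (h w) c')` flattens the spatial axes, which in plain NumPy is the reshape below (the pattern is a common einops example; this sketch shows only the equivalence, not einops itself):

```python
import numpy as np

def rearrange_b_hw_c(x):
    """NumPy equivalent of einops.rearrange(x, 'b h w c -> b (h w) c'):
    merge the h and w axes into a single axis of length h * w."""
    b, h, w, c = x.shape
    return x.reshape(b, h * w, c)
```

The readability win is that the einops pattern names every axis, so a reader can check the transformation without reconstructing shapes in their head.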