LLM Inference, AI Infra, CUDA
- Tsinghua University
- https://www.zhihu.com/people/mu-zi-zhi-6-28
- https://bruce-lee-ly.medium.com
Stars
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
DeepGEMM: Clean and efficient FP8 GEMM kernels with fine-grained scaling.
FlashMLA: Efficient Multi-head Latent Attention kernels.
CUTLASS: CUDA templates and Python DSLs for high-performance linear algebra.
SGLang: A high-performance serving framework for large language models and multimodal models.
Triton: Development repository for the Triton language and compiler.
open-gpu-kernel-modules: NVIDIA's Linux open GPU kernel module source.
FlashAttention: Fast and memory-efficient exact attention.
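For context on the last entry: FlashAttention computes the same exact softmax attention as a naive implementation, just without materializing the full score matrix. A minimal NumPy reference of that naive baseline (a sketch for comparison, not FlashAttention's tiled kernel; the function name and shapes are illustrative):

```python
import numpy as np

def naive_attention(q, k, v):
    """Reference softmax attention: O = softmax(Q K^T / sqrt(d)) V.

    Materializes the full (n, n) score matrix, which is exactly the
    O(n^2) memory cost that FlashAttention's tiled kernels avoid.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (n, n) attention scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # (n, d) output

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)
```

FlashAttention reproduces this result by processing K/V in blocks and rescaling the running softmax, so peak memory stays linear in sequence length.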