JensenFire
  • MM
  • Beijing


My learning notes for ML SYS.

Python 4,770 303 Updated Dec 22, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,026 886 Updated Dec 4, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,827 1,036 Updated Dec 23, 2025

Ring attention implementation with flash attention

Python 952 91 Updated Sep 10, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 616 74 Updated Dec 17, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,466 478 Updated Dec 23, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 808 181 Updated Dec 23, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,208 178 Updated Jul 29, 2023

A book about LLVM & Clang (open-source Chinese-language book: 玩转 LLVM, "Playing with LLVM")

C++ 401 52 Updated Jul 3, 2020

Python 103 8 Updated Sep 9, 2024

Ahead of Time (AOT) Triton Math Library

Python 84 35 Updated Dec 12, 2025

Development repository for the Triton language and compiler

MLIR 17,914 2,465 Updated Dec 23, 2025