Jinlin withlin

🧸

AI Infra, Docker, Kubernetes, SRE, eBPF, Observability, Middleware, OpenTelemetry, Go, C#, Java, Rust, TypeScript.

435 followers · 694 following

Guangzhou, China

Achievements

x3 x3

Organizations

Lists (17)

AI

ai-learing

ebpf

go-hack

✨ Inspiration

interview · 6 repositories

iterm · 1 repository

k8s · 25 repositories

k8s-network · 2 repositories

k8s-operator · 1 repository

kernel · 4 repositories

leetcode · 1 repository

network · 47 repositories

remote · 1 repository

rust · 23 repositories

WebAssembly

区块链 (Blockchain) · 5 repositories

Starred repositories

12 stars written in Cuda


xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 10,259 stars · 1,041 forks · Updated Apr 12, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 9,117 stars · 1,148 forks · Updated Apr 9, 2026

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda · 2,915 stars · 267 forks · Updated Apr 9, 2026

NVIDIA / nccl-tests

NCCL Tests

Cuda · 1,485 stars · 363 forks · Updated Mar 11, 2026

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,277 stars · 180 forks · Updated Jul 29, 2023

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda · 1,118 stars · 111 forks · Updated Dec 30, 2024

openai / blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Cuda · 1,065 stars · 198 forks · Updated Jun 8, 2023

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 883 stars · 149 forks · Updated Sep 26, 2025

jinbooooom / ai-infra-hpc

HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more

Cuda · 406 stars · 44 forks · Updated Apr 7, 2026

AndreSlavescu / mHC.cu

mHC kernels implemented in CUDA

Cuda · 259 stars · 20 forks · Updated Mar 9, 2026

osayamenja / FlashMoE

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda · 245 stars · 33 forks · Updated Apr 6, 2026

KuangjuX / NVSHMEM-Tutorial

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda · 177 stars · 14 forks · Updated Feb 11, 2026
