JFDuan JF-D

🎯

Focusing

Interested in AI for systems and efficient LLM training and serving!

98 followers · 182 following

Ph.D. Candidate @ CUHK-MMLab, B.E. @ UCAS
Hong Kong
https://jf-d.github.io/

Lists (1)

🔮 Future ideas

Stars

8 starred source repositories written in CUDA

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,696 stars · 972 forks · Updated Nov 6, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,863 stars · 737 forks · Updated Oct 15, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 4,021 stars · 558 forks · Updated Nov 6, 2025

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 2,868 stars · 192 forks · Updated Nov 6, 2025

NVIDIA / nccl-tests

NCCL Tests

Cuda · 1,325 stars · 326 forks · Updated Nov 3, 2025

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,183 stars · 172 forks · Updated Jul 29, 2023

wangzyon / NVIDIA_SGEMM_PRACTICE

Step-by-step optimization of CUDA SGEMM

Cuda · 393 stars · 51 forks · Updated Mar 30, 2022

AlibabaResearch / flash-llm

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda · 222 stars · 22 forks · Updated Sep 24, 2023
