JFDuan JF-D

Interested in AI for systems and efficient LLM training and serving!

99 followers · 182 following

Ph.D. Candidate @ CUHK-MMLab, B.E. @ UCAS
Hong Kong
https://jf-d.github.io/

Lists (1)

🔮 Future ideas

Stars

linux-rdma / perftest

Infiniband Verbs Performance Tests

C 858 353 Updated Oct 27, 2025

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 606 51 Updated Nov 4, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes and code for ML SYS.

Python 4,066 247 Updated Oct 6, 2025

MoonshotAI / checkpoint-engine

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 805 61 Updated Nov 4, 2025

leigest519 / ScreenCoder

ScreenCoder — Turn any UI screenshot into clean, editable HTML/CSS with full control. Fast, accurate, and easy to customize.

Python 2,469 232 Updated Oct 22, 2025

alibaba / ROLL

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,195 136 Updated Nov 5, 2025

InternRobotics / MMSI-Bench

[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

Python 55 Updated Oct 23, 2025

facebookresearch / Multi-SpatialMLLM

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

Python 158 6 Updated Oct 10, 2025

zartbot / shallowsim

DeepSeek-V3/R1 inference performance simulator

Jupyter Notebook 169 23 Updated Mar 27, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,441 957 Updated Oct 24, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 3,849 298 Updated Nov 5, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,861 736 Updated Oct 15, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,691 972 Updated Nov 5, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,842 896 Updated Sep 30, 2025

facebookresearch / MLGym

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Python 566 55 Updated Aug 10, 2025

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,043 3,177 Updated Nov 5, 2025

srush / awesome-o1

A bibliography and survey of the papers surrounding o1

TeX 1,208 51 Updated Nov 16, 2024

hao-ai-lab / vllm-ltr

[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank

Python 62 15 Updated Nov 4, 2024

flexflow / flexflow-serve

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 63 5 Updated Sep 15, 2025

hao-ai-lab / FastVideo

A unified inference and post-training framework for accelerated video generation.

Python 2,520 192 Updated Nov 5, 2025

LoongServe / LoongServe

Jupyter Notebook 124 12 Updated Nov 11, 2024

NVlabs / COAT

[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training

Python 244 22 Updated Aug 9, 2025

chengzeyi / ParaAttention

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Python 385 38 Updated Jul 5, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,128 2,426 Updated Nov 5, 2025

JamesAslan / MicroArchBench

Python 74 12 Updated Oct 29, 2024

cat538 / SKVQ

[COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models

Python 24 3 Updated Oct 5, 2024

InternRobotics / VLM-Grounder

[CoRL 2024] VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

Python 119 1 Updated May 22, 2025

Open-Source-O1 / Open-O1

Python 1,348 54 Updated Nov 21, 2024

SocialAI-tianji / Tianji

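Building large language models that understand social etiquette and human relationships | Covers tutorials on prompt engineering, RAG, agents, and LLM fine-tuning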

Python 1,589 130 Updated Apr 29, 2025

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 910 44 Updated Oct 29, 2025