JF-D

🎯 Focusing

JFDuan JF-D

Interested in AI for system, efficient LLM training and serving!

98 followers · 182 following

Ph.D. Candidate @ CUHK-MMLab, B.E. @ UCAS
Hong Kong
https://jf-d.github.io/

Lists (1): 🔮 Future ideas

Stars

33 stars written in C++ (each entry lists language, star count, fork count, and last update date)

tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

C++ 192,326 74,958 Updated Nov 6, 2025

ggml-org / llama.cpp

LLM inference in C/C++

C++ 89,236 13,583 Updated Nov 6, 2025

nlohmann / json

JSON for Modern C++

C++ 47,801 7,202 Updated Nov 5, 2025

BVLC / caffe

Caffe: a fast open framework for deep learning.

C++ 34,719 18,598 Updated Jul 31, 2024

taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.

C++ 27,672 2,362 Updated Oct 6, 2025

pybind / pybind11

Seamless operability between C++11 and Python

C++ 17,420 2,235 Updated Nov 3, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,052 1,844 Updated Nov 6, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,847 896 Updated Sep 30, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,442 957 Updated Oct 24, 2025

Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,368 1,010 Updated Aug 20, 2025

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 8,377 450 Updated Aug 2, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,229 420 Updated Nov 6, 2025

NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

C++ 4,211 1,063 Updated Nov 6, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,867 301 Updated Nov 6, 2025

tqchen / tinyflow

Tutorial code on how to build your own Deep Learning System in 2k Lines

C++ 2,018 368 Updated Oct 4, 2018

facebookincubator / oomd

A userspace out-of-memory killer

C++ 1,990 158 Updated Oct 29, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,934 148 Updated Nov 5, 2025

flexflow / flexflow-train

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025

dmlc / nnvm

C++ 1,655 281 Updated Sep 11, 2018

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,162 82 Updated Aug 28, 2025

mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

C++ 917 94 Updated Jul 4, 2024

maxbbraun / llama4micro

A "large" language model running on a microcontroller

C++ 540 36 Updated Dec 9, 2023

Tencent / KsanaLLM

C++ 510 41 Updated Sep 12, 2025

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

C++ 481 37 Updated Nov 4, 2025

microsoft / mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 433 73 Updated Nov 6, 2025

cythonbook / examples

Code examples from the book.

C++ 427 159 Updated May 7, 2016

microsoft / msccl

Microsoft Collective Communication Library

C++ 372 32 Updated Sep 20, 2023

InfiniTensor / InfiniTensor

C++ 267 67 Updated Oct 30, 2025

nv-legate / legate

The Foundation for All Legate Libraries

C++ 231 63 Updated Nov 6, 2025

mit-han-lab / inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 200 34 Updated Apr 27, 2022