JFDuan (JF-D)

🎯 Focusing

Interested in AI for systems and in efficient LLM training and serving!

98 followers · 182 following

Ph.D. Candidate @ CUHK-MMLab, B.E. @ UCAS
Hong Kong
https://jf-d.github.io/

Lists (1)

🔮 Future ideas

Stars

33 starred repositories written in C++ (each entry lists language, stars, forks, and last update)

tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

C++ 192,384 74,975 Updated Nov 10, 2025

ggml-org / llama.cpp

LLM inference in C/C++

C++ 89,521 13,627 Updated Nov 10, 2025

nlohmann / json

JSON for Modern C++

C++ 47,815 7,206 Updated Nov 5, 2025

BVLC / caffe

Caffe: a fast open framework for deep learning.

C++ 34,720 18,596 Updated Jul 31, 2024

taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.

C++ 27,692 2,363 Updated Oct 6, 2025

pybind / pybind11

Seamless operability between C++11 and Python

C++ 17,435 2,236 Updated Nov 10, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,090 1,854 Updated Nov 10, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,857 898 Updated Sep 30, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,454 958 Updated Oct 24, 2025

Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,369 1,010 Updated Aug 20, 2025

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 8,383 450 Updated Aug 2, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,247 422 Updated Nov 10, 2025

NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

C++ 4,216 1,063 Updated Nov 8, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,883 303 Updated Nov 10, 2025

tqchen / tinyflow

Tutorial code on how to build your own Deep Learning System in 2k Lines

C++ 2,018 368 Updated Oct 4, 2018

facebookincubator / oomd

A userspace out-of-memory killer

C++ 1,991 157 Updated Oct 29, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,943 149 Updated Nov 8, 2025

flexflow / flexflow-train

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025

dmlc / nnvm

C++ 1,655 281 Updated Sep 11, 2018

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,165 82 Updated Aug 28, 2025

mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

C++ 922 95 Updated Jul 4, 2024

maxbbraun / llama4micro

A "large" language model running on a microcontroller

C++ 540 36 Updated Dec 9, 2023

Tencent / KsanaLLM

C++ 512 41 Updated Sep 12, 2025

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

C++ 482 37 Updated Nov 6, 2025

microsoft / mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 433 72 Updated Nov 10, 2025

cythonbook / examples

Code examples from the book.

C++ 427 159 Updated May 7, 2016

microsoft / msccl

Microsoft Collective Communication Library

C++ 372 32 Updated Sep 20, 2023

InfiniTensor / InfiniTensor

C++ 268 67 Updated Oct 30, 2025

nv-legate / legate

The Foundation for All Legate Libraries

C++ 232 63 Updated Nov 9, 2025

mit-han-lab / inter-operator-scheduler

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 200 34 Updated Apr 27, 2022