JF-D
33 stars written in C++

An Open Source Machine Learning Framework for Everyone

C++ 192,326 74,958 Updated Nov 6, 2025

LLM inference in C/C++

C++ 89,236 13,583 Updated Nov 6, 2025

JSON for Modern C++

C++ 47,801 7,202 Updated Nov 5, 2025

Caffe: a fast open framework for deep learning.

C++ 34,719 18,598 Updated Jul 31, 2024

Productive, portable, and performant GPU programming in Python.

C++ 27,672 2,362 Updated Oct 6, 2025

Seamless operability between C++11 and Python

C++ 17,420 2,235 Updated Nov 3, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,052 1,844 Updated Nov 6, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,847 896 Updated Sep 30, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,442 957 Updated Oct 24, 2025

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,368 1,010 Updated Aug 20, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,377 450 Updated Aug 2, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,229 420 Updated Nov 6, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,211 1,063 Updated Nov 6, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,867 301 Updated Nov 6, 2025

Tutorial code on how to build your own Deep Learning System in 2k Lines

C++ 2,018 368 Updated Oct 4, 2018

A userspace out-of-memory killer

C++ 1,990 158 Updated Oct 29, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,934 148 Updated Nov 5, 2025

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025
C++ 1,655 281 Updated Sep 11, 2018

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,162 82 Updated Aug 28, 2025

TinyChatEngine: On-Device LLM Inference Library

C++ 917 94 Updated Jul 4, 2024

A "large" language model running on a microcontroller

C++ 540 36 Updated Dec 9, 2023
C++ 510 41 Updated Sep 12, 2025

A high-performance inference system for large language models, designed for production environments.

C++ 481 37 Updated Nov 4, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 433 73 Updated Nov 6, 2025

Code examples from the book.

C++ 427 159 Updated May 7, 2016

Microsoft Collective Communication Library

C++ 372 32 Updated Sep 20, 2023

The Foundation for All Legate Libraries

C++ 231 63 Updated Nov 6, 2025

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 200 34 Updated Apr 27, 2022