JF-D

33 stars written in C++

An Open Source Machine Learning Framework for Everyone

C++ 192,384 74,975 Updated Nov 10, 2025

LLM inference in C/C++

C++ 89,521 13,627 Updated Nov 10, 2025

JSON for Modern C++

C++ 47,815 7,206 Updated Nov 5, 2025

Caffe: a fast open framework for deep learning.

C++ 34,720 18,596 Updated Jul 31, 2024

Productive, portable, and performant GPU programming in Python.

C++ 27,692 2,363 Updated Oct 6, 2025

Seamless operability between C++11 and Python

C++ 17,435 2,236 Updated Nov 10, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,090 1,854 Updated Nov 10, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,857 898 Updated Sep 30, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,454 958 Updated Oct 24, 2025

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,369 1,010 Updated Aug 20, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,383 450 Updated Aug 2, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,247 422 Updated Nov 10, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,216 1,063 Updated Nov 8, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,883 303 Updated Nov 10, 2025

Tutorial code on how to build your own Deep Learning System in 2k Lines

C++ 2,018 368 Updated Oct 4, 2018

A userspace out-of-memory killer

C++ 1,991 157 Updated Oct 29, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,943 149 Updated Nov 8, 2025

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025
C++ 1,655 281 Updated Sep 11, 2018

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,165 82 Updated Aug 28, 2025

TinyChatEngine: On-Device LLM Inference Library

C++ 922 95 Updated Jul 4, 2024

A "large" language model running on a microcontroller

C++ 540 36 Updated Dec 9, 2023
C++ 512 41 Updated Sep 12, 2025

A high-performance inference system for large language models, designed for production environments.

C++ 482 37 Updated Nov 6, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 433 72 Updated Nov 10, 2025

Code examples from the book.

C++ 427 159 Updated May 7, 2016

Microsoft Collective Communication Library

C++ 372 32 Updated Sep 20, 2023

The Foundation for All Legate Libraries

C++ 232 63 Updated Nov 9, 2025

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 200 34 Updated Apr 27, 2022