Organizations

@thu-nics @thu-ml

45 stars written in C++

LLM inference in C/C++

C++ 94,524 14,789 Updated Feb 6, 2026

Productive, portable, and performant GPU programming in Python.

C++ 27,950 2,380 Updated Jan 5, 2026

Tensor library for machine learning

C++ 13,917 1,460 Updated Jan 30, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,453 981 Updated Feb 6, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,703 1,004 Updated Feb 4, 2026

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,390 1,015 Updated Dec 4, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,232 1,661 Updated Feb 4, 2026

High-speed Large Language Model Serving for Local Deployment

C++ 8,635 482 Updated Jan 24, 2026

Implementation of popular deep learning networks with TensorRT network definition API

C++ 7,674 1,868 Updated Feb 2, 2026

lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,725 598 Updated Feb 5, 2026

Transformer related optimization, including BERT, GPT

C++ 6,392 929 Updated Mar 27, 2024

header only, dependency-free deep learning framework in C++14

C++ 6,016 1,397 Updated Apr 17, 2022

Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++

C++ 5,352 526 Updated Feb 4, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,694 549 Updated Feb 6, 2026

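fastllm is a high-performance LLM inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full-size DeepSeek model. Deploying the original full-size, full-precision DeepSeek model on a dual-socket 9004/9005 server with a single GPU reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps with multiple concurrent requests.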

C++ 4,144 418 Updated Jan 29, 2026

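A good project for campus recruiting (fall and spring hiring) and internships! Walks you through implementing a high-performance deep learning inference library from scratch, step by step, with support for inference of models such as llama2, Unet, Yolov5, and Resnet.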

C++ 3,311 357 Updated Jun 22, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,119 172 Updated Jan 29, 2026

Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO

C++ 1,899 83 Updated Dec 23, 2025

SPlisHSPlasH is an open-source library for the physically-based simulation of fluids.

C++ 1,794 317 Updated Nov 28, 2025

Large-scale LLM inference engine

C++ 1,647 185 Updated Jan 21, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,242 89 Updated Aug 28, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 906 102 Updated Feb 6, 2026

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 810 58 Updated Mar 6, 2025

Hands-On Practical MLIR Tutorial

C++ 716 106 Updated Oct 20, 2023

Perplexity GPU Kernels

C++ 558 75 Updated Nov 7, 2025

collection of benchmarks to measure basic GPU capabilities

C++ 492 78 Updated Oct 24, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 462 82 Updated Feb 6, 2026

A sparse attention kernel supporting mixed sparse patterns

C++ 452 45 Updated Jan 18, 2026

A header-only C++ library for deep neural networks

C++ 431 96 Updated Apr 16, 2021

Demonstration of various hardware effects on CUDA GPUs.

C++ 391 30 Updated Nov 22, 2023