jason-huang03
  • Tsinghua University
  • Beijing, China

Organizations

@thu-nics @thu-ml

44 results for source starred repositories written in C++

LLM inference in C/C++

C++ 89,598 13,644 Updated Nov 11, 2025

Productive, portable, and performant GPU programming in Python.

C++ 27,698 2,363 Updated Oct 6, 2025

Tensor library for machine learning

C++ 13,537 1,387 Updated Nov 9, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,102 1,858 Updated Nov 11, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,857 899 Updated Sep 30, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,457 958 Updated Oct 24, 2025

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,368 1,011 Updated Aug 20, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,758 1,522 Updated Nov 10, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,384 450 Updated Aug 2, 2025

Implementation of popular deep learning networks with TensorRT network definition API

C++ 7,555 1,850 Updated Nov 6, 2025

lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,611 573 Updated Nov 11, 2025

Transformer related optimization, including BERT, GPT

C++ 6,344 920 Updated Mar 27, 2024

header only, dependency-free deep learning framework in C++14

C++ 5,995 1,398 Updated Apr 17, 2022

Diffusion model(SD,Flux,Wan,Qwen Image,...) inference in pure C/C++

C++ 4,543 442 Updated Nov 11, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,246 424 Updated Nov 11, 2025

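fastllm is a high-performance LLM inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and hybrid-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. A dual-socket 9004/9005 server with a single GPU can deploy the original full-precision DeepSeek model at 20 tps for a single request; the INT4-quantized model reaches 30 tps for a single request and 60+ tps with multiple concurrent requests.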

C++ 4,066 413 Updated Oct 28, 2025

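A good project for campus recruiting (autumn and spring hiring rounds) and internships! Implement a high-performance deep learning inference library step by step from scratch, supporting inference for models such as llama2, Unet, Yolov5, and Resnet.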

C++ 3,175 349 Updated Jun 22, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 1,942 149 Updated Nov 11, 2025

Playing around "Less Slow" coding practices in C++ 20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking and user-space IO

C++ 1,871 76 Updated Sep 10, 2025

SPlisHSPlasH is an open-source library for the physically-based simulation of fluids.

C++ 1,769 313 Updated Aug 22, 2025

Large-scale LLM inference engine

C++ 1,585 176 Updated Nov 10, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,165 82 Updated Aug 28, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 903 103 Updated Jul 10, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 776 54 Updated Mar 6, 2025

Hands-On Practical MLIR Tutorial

C++ 649 95 Updated Oct 20, 2023

Perplexity GPU Kernels

C++ 526 69 Updated Nov 7, 2025

collection of benchmarks to measure basic GPU capabilities

C++ 451 68 Updated Oct 24, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 433 72 Updated Nov 11, 2025

A header-only C++ library for deep neural networks

C++ 427 96 Updated Apr 16, 2021

Demonstration of various hardware effects on CUDA GPUs.

C++ 389 30 Updated Nov 22, 2023