Haofeng Huang jason-huang03

Undergraduate student from IIIS (Yao Class), Tsinghua University | Training Framework | Kernel | Model Arch & GenAI

208 followers · 37 following

Tsinghua University
Beijing, China
https://jason-huang03.github.io/

Stars

45 stars written in C++

ggml-org / llama.cpp

LLM inference in C/C++

C++ 94,524 14,789 Updated Feb 6, 2026

taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.

C++ 27,950 2,380 Updated Jan 5, 2026

ggml-org / ggml

Tensor library for machine learning

C++ 13,917 1,460 Updated Jan 30, 2026

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,453 981 Updated Feb 6, 2026

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,703 1,004 Updated Feb 4, 2026

Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,390 1,015 Updated Dec 4, 2025

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,232 1,661 Updated Feb 4, 2026

Tiiny-AI / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 8,635 482 Updated Jan 24, 2026

wang-xinyu / tensorrtx

Implementation of popular deep learning networks with TensorRT network definition API

C++ 7,674 1,868 Updated Feb 2, 2026

google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,725 598 Updated Feb 5, 2026

NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT

C++ 6,392 929 Updated Mar 27, 2024

tiny-dnn / tiny-dnn

header only, dependency-free deep learning framework in C++14

C++ 6,016 1,397 Updated Apr 17, 2022

leejet / stable-diffusion.cpp

Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++

C++ 5,352 526 Updated Feb 4, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,694 549 Updated Feb 6, 2026

ztxz16 / fastllm

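fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. On a dual-socket 9004/9005 server with a single GPU, the original full-precision DeepSeek model reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps with multiple concurrent requests.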

C++ 4,144 418 Updated Jan 29, 2026

zjhellofss / KuiperInfer

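A good project for campus recruiting (autumn and spring hiring) and internships! Walks you through implementing a high-performance deep learning inference library from scratch, step by step, supporting inference for models such as llama2, Unet, Yolov5, and Resnet.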

C++ 3,311 357 Updated Jun 22, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,119 172 Updated Jan 29, 2026

ashvardanian / less_slow.cpp

Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO

C++ 1,899 83 Updated Dec 23, 2025

InteractiveComputerGraphics / SPlisHSPlasH

SPlisHSPlasH is an open-source library for the physically-based simulation of fluids.

C++ 1,794 317 Updated Nov 28, 2025

aphrodite-engine / aphrodite-engine

Large-scale LLM inference engine

C++ 1,647 185 Updated Jan 21, 2026

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,242 89 Updated Aug 28, 2025

zhihu / ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 906 102 Updated Feb 6, 2026

mit-han-lab / omniserve

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 810 58 Updated Mar 6, 2025

KEKE046 / mlir-tutorial

Hands-On Practical MLIR Tutorial

C++ 716 106 Updated Oct 20, 2023

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 558 75 Updated Nov 7, 2025

RRZE-HPC / gpu-benches

collection of benchmarks to measure basic GPU capabilities

C++ 492 78 Updated Oct 24, 2025

microsoft / mscclpp

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 462 82 Updated Feb 6, 2026

mit-han-lab / Block-Sparse-Attention

A sparse attention kernel supporting mixed sparse patterns

C++ 452 45 Updated Jan 18, 2026

yixuan / MiniDNN

A header-only C++ library for deep neural networks

C++ 431 96 Updated Apr 16, 2021

Kobzol / hardware-effects-gpu

Demonstration of various hardware effects on CUDA GPUs.

C++ 391 30 Updated Nov 22, 2023