Stars
Productive, portable, and performant GPU programming in Python.
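This tagline matches OpenAI's Triton. As an illustration of what GPU programming in Python looks like in that style, here is a minimal vector-add kernel sketch (array size and block size are arbitrary examples):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)               # one program per block of elements
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                           # guard the ragged last block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)        # enough programs to cover n
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
```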
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
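As a hedged sketch of what defining and running a model through this kind of high-level Python LLM API can look like (the exact surface varies by TensorRT LLM release, and the model name here is only an example):

```python
# Sketch of TensorRT LLM's high-level LLM API; details vary by version.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # loads/builds an engine
params = SamplingParams(max_tokens=64, temperature=0.8)

for out in llm.generate(["GPU inference is fast because"], params):
    print(out.outputs[0].text)
```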
FlashMLA: Efficient Multi-head Latent Attention Kernels
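For intuition, a simplified PyTorch sketch of the latent-KV idea behind multi-head latent attention (as described for DeepSeek-V2): K and V are reconstructed from a shared low-rank latent, so the cache stores the small latent instead of full per-head K/V. All dimensions are illustrative, and this omits details such as decoupled RoPE:

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 512, 64, 8, 64
down = nn.Linear(d_model, d_latent, bias=False)          # compress to latent
up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand latent -> K
up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand latent -> V
q_proj = nn.Linear(d_model, n_heads * d_head, bias=False)

x = torch.randn(1, 16, d_model)              # (batch, seq, dim)
latent = down(x)                             # cache this: (1, 16, 64), not K/V
q = q_proj(x).view(1, 16, n_heads, d_head).transpose(1, 2)
k = up_k(latent).view(1, 16, n_heads, d_head).transpose(1, 2)
v = up_v(latent).view(1, 16, n_heads, d_head).transpose(1, 2)
out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```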
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
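To make the core idea concrete, a NumPy reference sketch of the tiled GEMM decomposition that such CUDA templates implement on-device (tile sizes are arbitrary examples, not the library's algorithm):

```python
import numpy as np

def tiled_gemm(A, B, TM=32, TN=32, TK=32):
    # Compute C = A @ B one (TM, TN) output tile at a time,
    # accumulating over (TK)-wide slices of the shared K dimension.
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, TM):
        for j in range(0, N, TN):
            acc = np.zeros((min(TM, M - i), min(TN, N - j)), dtype=A.dtype)
            for k in range(0, K, TK):
                acc += A[i:i+TM, k:k+TK] @ B[k:k+TK, j:j+TN]
            C[i:i+TM, j:j+TN] = acc
    return C

A = np.random.rand(128, 96).astype(np.float32)
B = np.random.rand(96, 64).astype(np.float32)
assert np.allclose(tiled_gemm(A, B), A @ B, atol=1e-3)
```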
High-speed Large Language Model Serving for Local Deployment
Implementation of popular deep learning networks with TensorRT network definition API
Lightweight, standalone C++ inference engine for Google's Gemma models.
Transformer-related optimizations, including BERT and GPT
Header-only, dependency-free deep learning framework in C++14
Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
fastllm is a high-performance LLM inference library with no backend dependencies. It supports both tensor-parallel inference of dense models and mixed-mode inference of MoE models; any GPU with 10 GB+ memory can run the full DeepSeek model. A dual-socket 9004/9005 server plus a single GPU can serve the original full-precision DeepSeek model at 20 tps for a single request; the INT4-quantized model reaches 30 tps single-request and 60+ tps under concurrency.
A great project for campus recruiting (fall/spring hiring) and internships! Build a high-performance deep learning inference library from scratch, supporting inference for llama2, Unet, Yolov5, Resnet, and other models. Implement a high-performance deep learning inference library step by step.
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Playing around with "Less Slow" coding practices in C++20, C, CUDA, PTX, and Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO
SPlisHSPlasH is an open-source library for the physically-based simulation of fluids.
Large-scale LLM inference engine
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
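The overlap pattern itself can be sketched generically in PyTorch (this is not the library's API): launch an asynchronous collective, run independent computation while it is in flight, then wait before consuming the result:

```python
import os
import torch
import torch.distributed as dist

# Single-process demo so the sketch runs anywhere (gloo backend, CPU).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

grad = torch.randn(1024, 1024)
handle = dist.all_reduce(grad, async_op=True)  # communication now in flight
a, b = torch.randn(1024, 512), torch.randn(512, 1024)
local = a @ b                                  # independent, overlapped compute
handle.wait()                                  # sync before reading `grad`
dist.destroy_process_group()
```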
A highly optimized LLM inference acceleration engine for Llama and its variants.
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
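A hedged numeric sketch of the W4A8 part of that scheme (4-bit weights, 8-bit activations; the 4-bit KV cache follows the same quantize/dequantize pattern), using plain symmetric per-tensor quantization rather than QServe's actual kernels:

```python
import numpy as np

def quantize(x, bits):
    # Symmetric quantization: qmax is 7 for int4, 127 for int8.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32), scale

W = np.random.randn(64, 64).astype(np.float32)   # weights -> 4-bit
A = np.random.randn(8, 64).astype(np.float32)    # activations -> 8-bit
Wq, w_scale = quantize(W, bits=4)
Aq, a_scale = quantize(A, bits=8)

# Integer-domain matmul, then dequantize with the product of the scales.
Y = (Aq @ Wq.T).astype(np.float32) * (a_scale * w_scale)
print("max abs error vs. fp32:", np.abs(Y - A @ W.T).max())
```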
Collection of benchmarks to measure basic GPU capabilities
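As an example of one such basic measurement, a CuPy sketch that estimates device-memory bandwidth from a large device-to-device copy (sizes are arbitrary; real suites measure many more effects):

```python
import cupy as cp

n = 1 << 26                                   # 64 Mi float32 (256 MiB)
src = cp.random.rand(n, dtype=cp.float32)
dst = cp.empty_like(src)
cp.copyto(dst, src)                           # warm-up
cp.cuda.Device().synchronize()

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()
cp.copyto(dst, src)                           # timed device-to-device copy
end.record()
end.synchronize()

ms = cp.cuda.get_elapsed_time(start, end)
# Count both read and write traffic for the copy.
print(f"copy bandwidth: {2 * src.nbytes / (ms / 1e3) / 1e9:.1f} GB/s")
```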
MSCCL++: A GPU-driven communication stack for scalable AI applications
Demonstration of various hardware effects on CUDA GPUs.