SEU
Stars
Some useful scripts for Linux operations and maintenance.
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Tile primitives for speedy kernels
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A Large-Scale Computation Graph Database for Tensor Compiler Research
A torch model extraction tool that helps build torch unit-test files.
RbRe145 / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A Datacenter Scale Distributed Inference Serving Framework
Minimalistic large language model 3D-parallelism training
Fast and memory-efficient exact attention
My learning notes for ML SYS.
Extends OpenRLHF to support LMM RL training, reproducing DeepSeek-R1 on multimodal tasks.
AIInfra (AI infrastructure) covers the AI systems stack, from low-level hardware such as chips up to the software stack that supports training and inference of large AI models.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Tutorials for writing high-performance GPU operators in AI frameworks.
《Machine Learning Systems: Design and Implementation》 (V2 is launching soon)
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference