RbRe145
🎯
Focusing
  • SEU
  • 18:56 (UTC -12:00)

Python 1 Updated Oct 20, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 718 101 Updated Dec 19, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,207 85 Updated Aug 28, 2025

A Large-Scale Computation Graph Database for Tensor Compiler Research

Python 77 43 Updated Dec 18, 2025
Python 4 1 Updated Jul 18, 2025

A torch model extraction tool that helps build torch unit test files.

Python 1 1 Updated Jul 17, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1 Updated May 29, 2025

A working stiff with a working stiff's spirit

C++ 1 Updated Jun 19, 2025

hpc-learning

766 47 Updated May 30, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,971 877 Updated Dec 4, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,651 748 Updated Dec 19, 2025

Minimalistic large language model 3D-parallelism training

Python 2,370 260 Updated Dec 11, 2025

Fast and memory-efficient exact attention

Python 21,182 2,230 Updated Dec 18, 2025

My learning notes for ML SYS.

Python 4,690 298 Updated Dec 19, 2025

Southeast University SRTP: research on multi-node MoE model parallelism strategies

Python 3 Updated May 13, 2025

Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.

Python 832 54 Updated May 14, 2025

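AIInfra (AI infrastructure) covers the AI system stack, from underlying hardware such as chips up to the software stack that supports training and inference of large AI models.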

Jupyter Notebook 5,462 758 Updated Dec 3, 2025

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,476 353 Updated Jul 25, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,033 4,669 Updated Dec 18, 2025

🙌 OpenHands: AI-Driven Development

Python 65,773 8,081 Updated Dec 19, 2025

Tutorials for writing high-performance GPU operators in AI frameworks.

Cuda 132 15 Updated Aug 12, 2023

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,719 478 Updated Apr 13, 2024

NVMe over Fabrics user space initiator library.

C 37 4 Updated Sep 2, 2024

Running BERT without Padding

C++ 476 53 Updated Mar 18, 2022

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 276 18 Updated May 1, 2025

A CUDA tutorial for learning CUDA programming from scratch

Cuda 262 65 Updated Jul 9, 2024

Learning materials for Stanford CS149 : Parallel Computing

C 258 43 Updated Jul 31, 2021

An optimized version of the JD.com Moutai purchase-sniping tool

Python 256 4,849 Updated Dec 30, 2020