RbRe145
🎯
Focusing
  • SEU
  • 07:20 (UTC -12:00)

96 results for source starred repositories

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 790 131 Updated Jan 20, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,241 89 Updated Aug 28, 2025

A Large-Scale Computation Graph Database for Tensor Compiler Research

Python 84 45 Updated Feb 3, 2026
Python 4 1 Updated Jul 18, 2025

A torch model extraction tool that helps build torch unit test files.

Python 1 1 Updated Jul 17, 2025

hpc-learning

778 46 Updated May 30, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,567 943 Updated Feb 4, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,040 840 Updated Feb 4, 2026

Minimalistic large language model 3D-parallelism training

Python 2,536 278 Updated Dec 11, 2025

Fast and memory-efficient exact attention

Python 22,078 2,346 Updated Feb 4, 2026

My learning notes for ML SYS.

Python 5,267 342 Updated Jan 30, 2026

Southeast University SRTP: research on parallelism strategies for multi-node MoE models

Python 3 Updated May 13, 2025

Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.

Python 840 53 Updated May 14, 2025

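AIInfra (AI infrastructure) covers the AI system stack, from underlying hardware such as chips up to the software stack, that supports training and inference of large AI models.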

Jupyter Notebook 5,974 819 Updated Dec 22, 2025

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,658 363 Updated Jul 25, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,532 4,703 Updated Feb 3, 2026

🙌 OpenHands: AI-Driven Development

Python 67,496 8,402 Updated Feb 4, 2026

Tutorials for writing high-performance GPU operators in AI frameworks.

Cuda 136 15 Updated Aug 12, 2023

《Machine Learning Systems: Design and Implementation》- Chinese Version

TeX 4,757 475 Updated Apr 13, 2024

NVMe over Fabrics user space initiator library.

C 37 4 Updated Sep 2, 2024

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 283 21 Updated May 1, 2025

A CUDA tutorial for learning CUDA programming from scratch

Cuda 266 67 Updated Jul 9, 2024

Learning materials for Stanford CS149 : Parallel Computing

C 271 43 Updated Jul 31, 2021

An optimized version of the JD.com Moutai flash-purchase tool

Python 257 4,833 Updated Dec 30, 2020

The pintos source distribution for PKU Operating System Course projects

C 55 74 Updated Feb 17, 2025

GPU programming related news and material links

1,953 113 Updated Sep 17, 2025

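This project shares the technical principles behind large models along with hands-on experience (large-model engineering and real-world deployment of large-model applications)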

HTML 23,058 2,677 Updated Dec 30, 2025

A CPU-friendly client puzzle with instant verification

C 118 5 Updated Sep 25, 2020