Jupyter Notebook 34 Updated Apr 16, 2026

Some useful scripts for Linux operation and maintenance.

Shell 1 Updated Apr 18, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,316 127 Updated Mar 19, 2026

Tile primitives for speedy kernels

Cuda 3,328 276 Updated Apr 25, 2026
Python 1 Updated Feb 7, 2026

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 952 159 Updated Mar 24, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,295 100 Updated Aug 28, 2025

A Large-Scale Computation Graph Database for Tensor Compiler Research

Python 95 51 Updated Apr 24, 2026
Python 4 1 Updated Jul 18, 2025

A torch model extraction tool that helps build torch unit test files.

Python 1 1 Updated Jul 17, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1 Updated May 29, 2025

A working stiff with a working soul

C++ 1 Updated Jun 19, 2025

hpc-learning

783 45 Updated May 30, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,793 1,090 Updated Apr 20, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,675 1,065 Updated Apr 27, 2026

Minimalistic large language model 3D-parallelism training

Python 2,667 301 Updated Apr 7, 2026

Fast and memory-efficient exact attention

Python 23,561 2,650 Updated Apr 27, 2026

My learning notes for ML SYS.

Python 6,136 401 Updated Apr 23, 2026

Southeast University SRTP: research on parallelism strategies for multi-node MoE models

Python 3 Updated May 13, 2025

Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.

Python 844 53 Updated May 14, 2025

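AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper software stack, that supports training and inference of large AI models.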

Jupyter Notebook 6,873 895 Updated Dec 22, 2025

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,928 379 Updated Jul 25, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 42,202 4,807 Updated Apr 24, 2026

🙌 OpenHands: AI-Driven Development

Python 72,192 9,116 Updated Apr 27, 2026

Tutorials for writing high-performance GPU operators in AI frameworks.

Cuda 135 14 Updated Aug 12, 2023

《Machine Learning Systems: Design and Implementation》 (V2 is launching soon)

TeX 4,802 476 Updated Mar 15, 2026

NVMe over Fabrics user space initiator library.

C 40 6 Updated Sep 2, 2024

Running BERT without Padding

C++ 479 53 Updated Mar 18, 2022

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 295 23 Updated May 1, 2025