SEU
Stars
Some useful scripts for Linux operations and maintenance.
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Tile primitives for speedy kernels
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A Large-Scale Computation Graph Database for Tensor Compiler Research
A torch model extraction tool that helps build torch unit-test files.
RbRe145 / vllm
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
A Datacenter Scale Distributed Inference Serving Framework
Minimalistic large language model 3D-parallelism training
Fast and memory-efficient exact attention
My learning notes for ML SYS.
Extends OpenRLHF to support LMM RL training, reproducing DeepSeek-R1 on multimodal tasks.
AIInfra (AI infrastructure) covers the AI systems stack, from low-level hardware such as chips up to the software stack that supports training and inference of large AI models.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Tutorials for writing high-performance GPU operators in AI frameworks.
《Machine Learning Systems: Design and Implementation》 (V2 is launching soon)
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference