- RadixArk | AMD | Tsinghua University
- California, USA
- https://yushengsu-thu.github.io/
- @thu_yushengsu
Stars
Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language
How to optimize algorithms in CUDA.
Train the smallest LM you can that fits in 16MB. Best model wins!
A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-performance systems.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Claude Opus 4.6 wrote a dependency-free C compiler in Rust, with backends targeting x86 (64- and 32-bit), ARM, and RISC-V, capable of compiling a booting Linux kernel.
Tutorials for Triton, a language for writing GPU kernels
HuggingFace conversion and training library for Megatron-based models
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Getting Started with Triton: A Tutorial for Python Beginners
A simple, performant and scalable Jax LLM!
The absolute trainer to light up AI agents.
Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your claude code/codex/gemini agent will be an AI research agent with full horsepowe…
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench
Open-source coding LLM for software engineering tasks
yushengsu-thu / sglang
Forked from sgl-project/sglang. SGLang is a fast serving framework for large language models and vision language models.
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM training.