FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 1,063 87 Updated Sep 4, 2024

Accelerating MoE with IO and Tile-aware Optimizations

Python 663 80 Updated Apr 29, 2026

Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming

Python 708 68 Updated Apr 29, 2026

A collection of compiler learning resources.

Python 2,722 370 Updated Mar 19, 2025

Learning notes for NVIDIA cuTile

Python 167 2 Updated Dec 9, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,419 139 Updated Apr 22, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 2,032 134 Updated Apr 28, 2026

slime is an LLM post-training framework for RL Scaling.

Python 5,528 758 Updated Apr 29, 2026

Nano vLLM

Python 13,181 2,017 Updated Apr 26, 2026

📚LeetCUDA: Modern CUDA Learning Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,818 1,093 Updated Apr 20, 2026

A Survey of Reinforcement Learning for Large Reasoning Models

TeX 2,448 130 Updated Nov 9, 2025

Code repo for efficient quantized MoE inference with mixture of low-rank compensators

Python 36 Updated Apr 14, 2025

A Unix-like Operating System

C 5 Updated Jan 21, 2024

Algorithms implementation in C++ and solutions of questions (both code and math proof) from “Introduction to Algorithms” (3e) (CLRS) in LaTeX.

C++ 54 7 Updated Dec 27, 2022