Organizations

@thu-nics @thu-ml

Python 1,391 102 Updated Dec 18, 2025

Accelerating MoE with IO and Tile-aware Optimizations

Python 330 14 Updated Dec 18, 2025

Boosting 4-bit inference kernels with 2:4 Sparsity

Cuda 89 5 Updated Sep 4, 2024

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 732 55 Updated Aug 6, 2025

A Triton-only attention backend for vLLM

Python 23 7 Updated Dec 18, 2025

NVIDIA cuTile learn

Python 130 Updated Dec 9, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,436 326 Updated Dec 19, 2025

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,628 83 Updated Dec 19, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 455 22 Updated Dec 19, 2025

Basic Analysis, undergraduate real analysis textbook

TeX 86 30 Updated Dec 11, 2025

Quarto code for marksmath.org

JavaScript 1 Updated Dec 4, 2025

Code and content for andrewheiss.com

HTML 147 21 Updated Dec 3, 2025

C++ 209 6 Updated Nov 19, 2025

GPU documentation for humans

Python 416 51 Updated Dec 9, 2025

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

Cuda 242 12 Updated Nov 18, 2025

torchcomms: a modern PyTorch communications API

C++ 310 46 Updated Dec 19, 2025

Efficient Triton implementation of Native Sparse Attention.

Python 255 18 Updated May 23, 2025

🚀🚀 Efficient implementations of Native Sparse Attention

Python 1,042 12 Updated Sep 29, 2025

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLM, VLM, and video generation models.

Python 643 64 Updated Nov 19, 2025

High-throughput tensor loading for PyTorch

Python 214 13 Updated Dec 3, 2025

Development repository for the Triton language and compiler

MLIR 17,882 2,458 Updated Dec 19, 2025

Propositions of solutions to the exercises from Terence Tao's textbooks, Analysis I & II. Mirrored from https://gitlab.com/f-santos/taoanalysissolutions

TeX 103 11 Updated Jan 17, 2023

Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Large Language Models.

Python 86 15 Updated Nov 21, 2025

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 146 13 Updated Sep 18, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 444 77 Updated Dec 19, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,909 347 Updated Dec 19, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,762 103 Updated Nov 4, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 431 14 Updated Dec 16, 2025