jason-huang03
  • Tsinghua University
  • Beijing, China

Organizations

@thu-nics @thu-ml

270 results for source starred repositories

Efficient Triton implementation of Native Sparse Attention.

Python 246 18 Updated May 23, 2025

🚀🚀 Efficient implementations of Native Sparse Attention

Python 1,011 8 Updated Sep 29, 2025

[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLM, VLM, and video generation models.

Python 614 63 Updated Nov 10, 2025

High-throughput tensor loading for PyTorch

Python 194 11 Updated Oct 27, 2025

Development repository for the Triton language and compiler

MLIR 17,520 2,378 Updated Nov 10, 2025

Propositions of solutions to the exercises from Terence Tao's textbooks, Analysis I & II. Mirrored from https://gitlab.com/f-santos/taoanalysissolutions

TeX 97 11 Updated Jan 17, 2023

Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Large Language Models.

Python 72 14 Updated Oct 1, 2025

NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer

Cuda 142 11 Updated Sep 18, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 433 72 Updated Nov 10, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,434 246 Updated Nov 7, 2025

[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 1,563 88 Updated Nov 4, 2025

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python 396 8 Updated Nov 8, 2025

Light Video Generation Inference Framework

Python 779 49 Updated Nov 10, 2025

Qwen-Image-Lightning: Speed up Qwen-Image model with distillation

Python 925 36 Updated Oct 14, 2025

CUDA Kernel Benchmarking Library

Cuda 762 90 Updated Oct 21, 2025
Python 120 6 Updated Aug 18, 2025

青稞Talk

160 1 Updated Nov 5, 2025

Hands-On Practical MLIR Tutorial

C++ 649 94 Updated Oct 20, 2023

DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.

Python 66 4 Updated Nov 5, 2025
C++ 312 28 Updated Nov 6, 2025
Python 9 Updated Jul 25, 2025

This is a repo to track the latest autoregressive visual generation papers.

409 5 Updated Jun 25, 2025

😼 Use a clash/mihomo-based proxy environment elegantly

Shell 5,673 719 Updated Nov 7, 2025

CUDA on non-NVIDIA GPUs

Rust 13,409 849 Updated Nov 10, 2025

The missing star history graph of GitHub repos - https://star-history.com

TypeScript 8,037 302 Updated Nov 7, 2025

Distributed query engine providing simple and reliable data processing for any modality and scale

Rust 4,687 334 Updated Nov 10, 2025

A compiler for the SYSY language (a subset of C). My homework for the course "compiler principles"

C++ 8 Updated Aug 6, 2024

NanoGPT (124M) in 3 minutes

Python 3,789 492 Updated Nov 6, 2025