Stars
🚀🚀 Efficient implementations of Native Sparse Attention
A powerful toolkit for compressing large models, including LLMs, VLMs, and video generation models.
Development repository for the Triton language and compiler
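For a taste of what Triton programs look like, here is a minimal vector-add kernel in the style of the official tutorials (a sketch, not code taken from the repository itself):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds lanes
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```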
Proposed solutions to the exercises from Terence Tao's textbooks, Analysis I & II. Mirrored from https://gitlab.com/f-santos/taoanalysissolutions
ComputeEval: a framework for generating and evaluating CUDA code from Large Language Models.
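Benchmarks of this kind usually report functional correctness as pass@k. A minimal sketch of the standard unbiased estimator from the HumanEval paper (assumed here; ComputeEval's exact metric may differ), where n samples were generated and c of them passed the tests:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Probability that at least one of k samples drawn from n (c correct) passes.
    if n - c < k:
        return 1.0  # fewer than k incorrect samples, so any k-draw hits a correct one
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

print(pass_at_k(n=20, c=3, k=5))  # ≈ 0.60
```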
NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer
MSCCL++: A GPU-driven communication stack for scalable AI applications
slime is an LLM post-training framework for RL Scaling.
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
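For context, GRPO scores each sampled trajectory relative to the other samples drawn for the same prompt rather than using a learned value function. A minimal sketch of that group-relative advantage (illustrative names; not code from the Flow-GRPO repository):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # rewards: (num_prompts, group_size), one row of sampled rewards per prompt.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)  # standardize within each group
```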
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Qwen-Image-Lightning: Speed up the Qwen-Image model with distillation.
DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.
A repository tracking the latest autoregressive visual generation papers.
The missing star history graph of GitHub repos - https://star-history.com
Distributed query engine providing simple and reliable data processing for any modality and scale
A compiler for the SYSY language (a subset of C). My homework for the course "compiler principles"