- Tsinghua University
- Beijing, China
- https://jason-huang03.github.io/
Stars
Efficient, Python-native, end-to-end MoE training in ~10K lines of code.
A high-performance linear attention kernel library built on TileLang
Codes & examples for "CUDA - From Correctness to Performance"
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
Beginner, advanced, expert level Rust training material
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language
Building the Virtuous Cycle for AI-driven LLM Systems
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
Official Code Implementation of Translating Flow to Policy via Hindsight Online Imitation
Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
A Claude Code skill to delegate prompts to Codex
Lightweight coding agent that runs in your terminal
omo: the best agent harness (previously oh-my-opencode)
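Formerly the [chatlog] project (open-source software that decrypts and reads WeChat databases and provides MCP and HTTP services). It now supports pushing messages through the WeChat clawbot interface and can forward all or selected messages to clawbot in real time.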