- Tsinghua University
- Beijing, China
- https://jason-huang03.github.io/
Stars
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
Beginner-, advanced-, and expert-level Rust training material
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language
Building the Virtuous Cycle for AI-driven LLM Systems
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
Official Code Implementation of Translating Flow to Policy via Hindsight Online Imitation
Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
A Claude Code skill to delegate prompts to Codex
Lightweight coding agent that runs in your terminal
omo; the best agent harness - previously oh-my-opencode
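A secondary development of the original [chatlog] project (open-source software that reads WeChat databases and provides an MCP service); it will stay in sync with the latest open-source decryption source code as far as possible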
Vercel's official collection of agent skills
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning