Starred repositories

Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, NSys, and an interactive Explorer.

Python 118 11 Updated Apr 17, 2026

A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.

Python 183 12 Updated May 15, 2026

🍀 Codebase for CloverLM

Python 7 Updated Apr 26, 2026

🐜 Research-friendly Deep Learning framework

Python 8 1 Updated May 4, 2026

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 314 23 Updated May 31, 2026

stdgpu: Efficient STL-like Data Structures on the GPU

C++ 1,264 99 Updated Jun 8, 2026

An MLIR-based compiler that takes GPU kernels and compiles them to real hardware instructions. Interactive web visualizer included.

TypeScript 131 16 Updated Mar 21, 2026

Review automated kernel generation in the era of LLMs

233 18 Updated May 26, 2026
Cuda 61 12 Updated Dec 10, 2025

Code examples for tutoring modern C++

C++ 100 9 Updated Jul 21, 2025

GPU-accelerated Schulze voting method in Python, Numba, CUDA, and Mojo 🔥, using ideas from Algebraic Graph Theory

Mojo 19 1 Updated Oct 28, 2025

Nvidia Instruction Set Specification Generator

Python 338 23 Updated Jul 9, 2024

A collection of study materials for AI compilers and systems.

58 2 Updated Nov 14, 2025

A chronologically sorted list of influential papers on compiler optimization, from the seminal works of 1952 through the advanced techniques of 1994

TeX 79 7 Updated May 26, 2026

SBLP 2025 MLIR Tutorial

C++ 75 4 Updated Mar 25, 2026

An MLIR Rust workshop

Rust 8 1 Updated Dec 11, 2024

A concise explanation of Rust types and Memory Layout.

137 15 Updated Jul 9, 2025

🚴 Call stack profiler for Python. Shows you why your code is slow!

Python 7,929 289 Updated Jun 9, 2026

Minimal and annotated implementations of key ideas from modern deep learning research.

Python 1,320 109 Updated Jan 29, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,458 151 Updated Apr 22, 2026

A simple calculation for LLM MFU.

Jupyter Notebook 78 4 Updated Sep 10, 2025
C++ 182 45 Updated May 11, 2026

Low-Level Programming Roadmap and Resources

1,333 89 Updated Mar 26, 2026

Tutorial on building a GPU compiler backend in LLVM

C++ 58 11 Updated Jan 11, 2025

Compiling useful links, papers, benchmarks, ideas, etc.

46 1 Updated Mar 16, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB 16,476 1,557 Updated May 26, 2026

Awesome Reasoning LLM Tutorial/Survey/Guide

Python 2,441 164 Updated Apr 6, 2026

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ 3,738 381 Updated Jun 4, 2026

Expert Parallelism Load Balancer

Python 1,388 203 Updated Mar 24, 2025

Analyze computation-communication overlap in V3/R1.

1,160 147 Updated Mar 21, 2025