Stars
KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
A kernel library written in tilelang
OpenCode plugin that uses your existing Claude Code credentials — no separate login needed.
Open-source CUDA, Triton and HIP compiler targeting multiple GPU and CPU architectures.
Documentation for the Mainboard and printable mechanical parts in the Framework Desktop
A project trying to build a hoverboard controller without semiconductors
A machine learning accelerator core designed for energy-efficient AI at the edge.
Memory Optimizations for Deep Learning (ICML 2023)
Exocompilation for productive programming of hardware accelerators
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Minimal reproduction of DeepSeek R1-Zero
Open-source high-performance RISC-V processor
Type annotations and runtime checking for shape and dtype of JAX/NumPy/PyTorch/etc. arrays. https://docs.kidger.site/jaxtyping/
The official Rust and C implementations of the BLAKE3 cryptographic hash function
Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wikitext-103 on a single A100 in <100 seconds. Scales to large…
Entropy Based Sampling and Parallel CoT Decoding
A free and strong UCI chess engine
Parallelized hyperdimensional tic-tac-toe
Nvidia Instruction Set Specification Generator