ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…

Python 12,454 1,131 Updated Jun 21, 2026

wafer-ai / gpu-perf-engineering-resources

A curriculum for learning GPU performance engineering, from scratch to what the frontier AI labs do

843 102 Updated Apr 27, 2026

ai-infra-curriculum / ai-infra-performance-learning

AI Infrastructure Performance Engineer Learning Track - GPU optimization, inference optimization, and cost reduction

Python 38 9 Updated Jun 21, 2026

mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Python 22,835 2,069 Updated May 11, 2026

apple / axlearn

An Extensible Deep Learning Library

Python 2,367 406 Updated May 16, 2026

NVIDIA / TileGym

Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming

Python 757 78 Updated Jun 17, 2026

scai-tech / Nest

Python 4 Updated Mar 4, 2026

HazyResearch / Megakernels

Kernels, of the mega variety :)

Python 757 60 Updated May 26, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,464 299 Updated Jun 15, 2026

LMCache / LMCache

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python 9,574 1,367 Updated Jun 22, 2026

NVIDIA / cccl

CUDA Core Compute Libraries

C++ 2,391 413 Updated Jun 22, 2026

Lightning-AI / lightning-thunder

PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python 1,463 114 Updated Jun 15, 2026

KarnbirKhera / CUDA-TwoTreeFramework

A systematic and pedagogical way to derive the correctness structure of 2D Register Allocated GEMM before coding.

HTML 8 Updated May 2, 2026

KarnbirKhera / MLSys2026-9Week-LearningPlan

Cuda 10 1 Updated Apr 17, 2026

antirez / ds4

DeepSeek 4 Flash and PRO local inference engine for Metal, CUDA and ROCm

C 14,923 1,304 Updated Jun 17, 2026

florianmattana / sass-king

Reverse engineering NVIDIA SASS instruction dictionary, kernel audits and pattern recognition across GPU architectures.

Cuda 300 14 Updated May 18, 2026