A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,392 749 Updated Jun 14, 2026

flagos-ai / FlagGems

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 1,023 412 Updated Jun 14, 2026

pytorch / ao

PyTorch native quantization and sparsity for training and inference

Python 2,857 527 Updated Jun 12, 2026

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,442 2,937 Updated Jun 14, 2026

yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 418 52 Updated Jan 2, 2025

meta-pytorch / tlparse

TORCH_TRACE parser for PT2

Rust 86 28 Updated May 11, 2026

luizalabs / shared-memory-dict

A very simple shared memory dict implementation

Python 176 23 Updated Jan 19, 2026

johnma2006 / mamba-minimal

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

Python 2,954 221 Updated Mar 8, 2024

pybind / pybind11

Seamless operability between C++11 and Python

C++ 17,907 2,308 Updated Jun 10, 2026

lucidrains / denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Python 10,604 1,283 Updated Feb 11, 2026

hojonathanho / diffusion

Denoising Diffusion Probabilistic Models

Python 5,240 484 Updated Aug 29, 2023

tanelp / tiny-diffusion

A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.

Jupyter Notebook 1,010 79 Updated May 7, 2024

mlfoundations / open_clip

An open source implementation of CLIP.

Python 13,912 1,286 Updated Jun 12, 2026

teddykoker / image-gpt

PyTorch Implementation of OpenAI's Image GPT

Python 258 33 Updated Oct 3, 2023

Lin idning

Stars

dair-ai / Mathematics-for-ML

jax-ml / scaling-book

linkedin / Liger-Kernel

tspeterkim / flash-attention-minimal

kyegomez / FlashAttention20Triton

open-thought / reasoning-gym

deepseek-ai / DeepGEMM

kvcache-ai / ktransformers

open-thought / tiny-grpo

MaximeVandegar / Papers-in-100-Lines-of-Code

OpenRLHF / OpenRLHF

Jiayi-Pan / TinyZero

BlackHC / batch_pong_poc

huggingface / open-r1

fla-org / flash-linear-attention

pytorch / torchtitan

NVIDIA / TransformerEngine