🧮 A collection of resources to learn mathematics for machine learning

6,179 703 Updated Jan 24, 2023

Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs

HTML 1,078 161 Updated Jun 11, 2026

Efficient Triton Kernels for LLM Training

Python 6,430 540 Updated Jun 12, 2026

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,157 113 Updated Dec 30, 2024

Triton implementation of Flash Attention 2.0

Python 54 6 Updated Jul 31, 2023

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,444 120 Updated Apr 17, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,375 1,046 Updated Jun 4, 2026

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 17,280 1,313 Updated Jun 7, 2026

Minimal hackable GRPO implementation

Python 341 45 Updated Jan 31, 2025

Implementation of papers in 100 lines of code.

Python 2,810 253 Updated Apr 8, 2026

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Python 9,637 969 Updated Jun 9, 2026

Minimal reproduction of DeepSeek R1-Zero

Python 13,165 1,585 Updated Feb 27, 2026

Instead of running one environment at a time or one per thread, run everything in batch using numpy on a single core.

Jupyter Notebook 5 2 Updated Feb 19, 2018

Fully open reproduction of DeepSeek-R1

Python 26,311 2,439 Updated Apr 2, 2026

🚀 Efficient implementations for emerging model architectures

Python 5,217 556 Updated Jun 11, 2026

A PyTorch native platform for training generative AI models

Python 5,436 859 Updated Jun 14, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,392 749 Updated Jun 14, 2026

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 1,023 412 Updated Jun 14, 2026

PyTorch native quantization and sparsity for training and inference

Python 2,857 527 Updated Jun 12, 2026

Development repository for the Triton language and compiler

MLIR 19,442 2,937 Updated Jun 14, 2026

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 418 52 Updated Jan 2, 2025

TORCH_TRACE parser for PT2

Rust 86 28 Updated May 11, 2026

A very simple shared memory dict implementation

Python 176 23 Updated Jan 19, 2026

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

Python 2,954 221 Updated Mar 8, 2024

Seamless operability between C++11 and Python

C++ 17,907 2,308 Updated Jun 10, 2026

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Python 10,604 1,283 Updated Feb 11, 2026

Denoising Diffusion Probabilistic Models

Python 5,240 484 Updated Aug 29, 2023

A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.

Jupyter Notebook 1,010 79 Updated May 7, 2024

An open source implementation of CLIP.

Python 13,912 1,286 Updated Jun 12, 2026

PyTorch Implementation of OpenAI's Image GPT

Python 258 33 Updated Oct 3, 2023