Stars
Muon is an optimizer for hidden layers in neural networks
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
A neural network layer able to express distributions over anything
Establishing Scaling Laws for Crypto Market Forecasting
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Conformal prediction for time-series applications.
[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
Instruct-tune LLaMA on consumer hardware
Series of lectures on Scientific Methodology and Performance Evaluation
You like pytorch? You like micrograd? You love tinygrad! ❤️
Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch.
A set of tools to play with deep learning
Implementation of a flat MNIST Generative Adversarial Neural Network using low-level features of tfjs