Stars
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
The best OSS video generation models, created by Genmo
LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using compile and cudagraphs.
Helpful tools and examples for working with flex-attention
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Simple and efficient PyTorch-native transformer training and inference (batched)
The official PyTorch implementation of Google's Gemma models
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
A batched offline inference oriented version of segment-anything
Converts profiling output to a dot graph.
A PyTorch-based library for embedding large graphs to low-dimensional space using force-directed layouts with GPU acceleration.
Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.
SMT-LIB benchmarks for shape computations from deep learning models in PyTorch
Fast Differentiable Tensor Library in JavaScript and TypeScript with Bun + Flashlight
Graph dump of torchbench models, huggingface models, and TIMM models.
TorchOpt is an efficient library for differentiable optimization built upon PyTorch.
Convert scikit-learn models to PyTorch modules
Optimizing AlphaFold Training and Inference on GPU Clusters
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
The most customizable typing website with a minimalistic design and a ton of features. Test yourself in various modes, track your progress and improve your speed.
A library for differentiable nonlinear optimization
Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2