Stars
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
The best OSS video generation models, created by Genmo
LeanRL is a fork of CleanRL in which selected PyTorch scripts are optimized for performance using compile and cudagraphs.
Helpful tools and examples for working with flex-attention
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Simple and efficient PyTorch-native transformer training and inference (batched)
The official PyTorch implementation of Google's Gemma models
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
A batched, offline-inference-oriented version of segment-anything
Converts profiling output to a dot graph.
A PyTorch-based library for embedding large graphs into a low-dimensional space using force-directed layouts with GPU acceleration.
Inference code and configs for the ReplitLM model family
Advanced evolutionary computation library built directly on top of PyTorch, created at NNAISENSE.
SMT-LIB benchmarks for shape computations from deep learning models in PyTorch
Fast Differentiable Tensor Library in JavaScript and TypeScript with Bun + Flashlight
Graph dump of torchbench models, huggingface models, and TIMM models.
Named tensors with first-class dimensions for PyTorch
TorchOpt is an efficient library for differentiable optimization built upon PyTorch.
Convert scikit-learn models to PyTorch modules
Optimizing AlphaFold Training and Inference on GPU Clusters
GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compression of numerical and other data types in HPC/ML applications.
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.