Stars
[ICML 2025] Official PyTorch Implementation of "History-Guided Video Diffusion"
Official codebase for "Self Forcing: Bridging Training and Inference in Autoregressive Video Diffusion" (NeurIPS 2025 Spotlight)
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
Collections of optimizers, LR schedulers, and loss functions in PyTorch
Reference PyTorch implementation and models for DINOv3
The GrandTour Dataset: A Legged Robotics Dataset in the Wild
Official implementation of the paper "Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training", [TMLR & Workshop (SLLM) @ ICLR 2025]
Combining Grouped-Query Attention (https://arxiv.org/abs/2305.13245) with Deformable Attention (https://arxiv.org/abs/2201.00520) in PyTorch.
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
Implementation of Deformable Attention in PyTorch from the paper "Vision Transformer with Deformable Attention"
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TensorFlow, and others)
Helpful tools and examples for working with flex-attention
The open-source implementation of grouped-query attention from the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (https://arxiv.org/pdf/2305.13245.pdf)
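The core idea behind both GQA entries above, a small number of key/value heads each shared by a group of query heads, can be sketched in a few lines (a NumPy illustration under arbitrary head counts and shapes, not the repos' actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v):
    """q: (Hq, S, D) query heads; k, v: (Hkv, S, D) with Hq % Hkv == 0.
    Each key/value head is shared by Hq // Hkv query heads."""
    hq, _, d = q.shape
    group = hq // k.shape[0]
    # replicate each KV head across its group of query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (Hq, S, S)
    return softmax(scores) @ v                        # (Hq, S, D)

# hypothetical sizes: 8 query heads sharing 2 KV heads over 6 tokens
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 6, 4))
k = rng.standard_normal((2, 6, 4))
v = rng.standard_normal((2, 6, 4))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (8, 6, 4)
```

With `Hkv == 1` this degenerates to multi-query attention, and with `Hkv == Hq` to standard multi-head attention; GQA interpolates between the two to shrink the KV cache with little quality loss.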
Several types of attention modules written in PyTorch for learning purposes
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
You like pytorch? You like micrograd? You love tinygrad! ❤️
Development repository for the Triton language and compiler
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
MineStudio: A Streamlined Package for Minecraft AI Agent Development
This repository contains a collection of surveys, datasets, papers, and code for predictive uncertainty estimation in deep learning models.
[CVPR 2025] DEIM: DETR with Improved Matching for Fast Convergence