Stars
Gradient Descent Lab: Methods and Empirical Behavior
Accelerated Gradient Methods: Momentum, Nesterov, and When Theory Misbehaves
[ICML 2025] Official Repository of "TruthLens: Training-Free Data Verification for Deepfake Images via VQA-style Probing"
Attention Is All You Need: A PyTorch Implementation from Scratch
Merge Sort: Parallelization Study
A Comprehensive Benchmark of Sparse Attention Mechanisms in Vision Transformers
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
Manipulate audio with a simple and easy high-level interface