    Repositories list

    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      2.2k21k91692 Updated Dec 18, 2025
    • sonic-moe

      Public
      Accelerating MoE with IO and Tile-aware Optimizations
      Python
      1534310 Updated Dec 18, 2025
    • quack

      Public
      A Quirky Assortment of CuTe Kernels
      Python
      64700141 Updated Dec 16, 2025
    • causal-conv1d

      Public
      Causal depthwise conv1d in CUDA, with a PyTorch interface
      Cuda
      1476773311 Updated Oct 20, 2025
    • fast-hadamard-transform

      Public
      Fast Hadamard transform in CUDA, with a PyTorch interface
      C
      4926782 Updated Oct 19, 2025
    • cutlass

      Public
      CUDA Templates for Linear Algebra Subroutines
      C++
      1.6k100 Updated Jun 8, 2025
    • grouped-latent-attention

      Public
      Python
      413350 Updated May 29, 2025
    • gemm-cublas

      Public
      Python
      12300 Updated May 5, 2025