# flashattention

Here are 11 public repositories matching this topic...

A high-performance kernel implementation of multi-head attention using Triton. Focused on minimizing memory overhead and maximizing throughput for large-scale transformer layers. Includes clean tensor layouts, head-grouping optimisations, and ready-to-benchmark code you can plug into custom models (a reference sketch of the underlying tiling idea follows this entry).

  • Updated Aug 12, 2025
  • Python
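
The memory saving this kind of kernel targets comes from the FlashAttention idea: process queries and keys in tiles and keep a running (online) softmax, so the full seq_len × seq_len score matrix is never materialized. Below is a minimal, pure-PyTorch sketch of that tiling scheme for reference only; it is not the repository's Triton kernel, and the `tiled_attention` name, the `block_size` default, and the comparison against `torch.nn.functional.scaled_dot_product_attention` are illustrative assumptions.

```python
# Minimal reference sketch of FlashAttention-style tiled attention with an
# online softmax. Illustrative only; not the repository's Triton kernel.
import math
import torch


def tiled_attention(q, k, v, block_size=128):
    """q, k, v: (batch, heads, seq_len, head_dim). Returns the same shape as q."""
    b, h, n, d = q.shape
    scale = 1.0 / math.sqrt(d)
    out = torch.zeros_like(q)

    for start_q in range(0, n, block_size):
        q_blk = q[:, :, start_q:start_q + block_size] * scale    # (b, h, Bq, d)
        # Running statistics for the online softmax over this query block.
        row_max = torch.full(q_blk.shape[:-1], float("-inf"),
                             device=q.device, dtype=q.dtype)     # (b, h, Bq)
        row_sum = torch.zeros_like(row_max)                      # (b, h, Bq)
        acc = torch.zeros_like(q_blk)                            # (b, h, Bq, d)

        for start_k in range(0, n, block_size):
            k_blk = k[:, :, start_k:start_k + block_size]
            v_blk = v[:, :, start_k:start_k + block_size]
            # Only a (Bq, Bk) block of scores is ever held in memory.
            scores = q_blk @ k_blk.transpose(-1, -2)             # (b, h, Bq, Bk)

            # Online softmax update: rescale previously accumulated results.
            new_max = torch.maximum(row_max, scores.amax(dim=-1))
            correction = torch.exp(row_max - new_max)
            p = torch.exp(scores - new_max.unsqueeze(-1))
            row_sum = row_sum * correction + p.sum(dim=-1)
            acc = acc * correction.unsqueeze(-1) + p @ v_blk
            row_max = new_max

        out[:, :, start_q:start_q + block_size] = acc / row_sum.unsqueeze(-1)
    return out


if __name__ == "__main__":
    q, k, v = (torch.randn(2, 8, 512, 64) for _ in range(3))
    ref = torch.nn.functional.scaled_dot_product_attention(q, k, v)
    print("max abs error:", (tiled_attention(q, k, v) - ref).abs().max().item())
```

Because the inner loop only ever holds one Bq × Bk block of scores, peak memory scales with the block size rather than with the sequence length; a Triton kernel applies the same recurrence per thread block to trade a small amount of recomputation for much lower memory traffic.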
