The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Updated Mar 5, 2026 - Python
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
MoBA: Mixture of Block Attention for Long-Context LLMs
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
Trainable fast and memory-efficient sparse attention
[ICLR 2026] rCM: SOTA JVP-Based Diffusion Distillation & Few-Step Video Generation & Scaling Up sCM/MeanFlow
FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton. Streaming Sinkhorn with O(nd) memory.
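For context, the Sinkhorn scaling that FlashSinkhorn streams in blocks can be sketched in a few lines. This is a minimal, dependency-free illustration of entropic optimal transport, not the FlashSinkhorn API; the function name and parameters here are hypothetical.

```python
# Minimal Sinkhorn iteration for entropic optimal transport between two
# uniform distributions. Illustrative sketch only -- a real IO-aware
# implementation streams the kernel in blocks instead of materializing it.
import math

def sinkhorn(cost, eps=0.1, iters=200):
    """Return the transport plan P for an n x m cost matrix (list of lists)."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    a, b = 1.0 / n, 1.0 / m  # uniform source/target marginals
    for _ in range(iters):
        # Alternate scalings so that P = diag(u) K diag(v) matches the marginals.
        u = [a / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# The plan's entries sum to 1, and each row sums to its marginal (0.5 here).
P = sinkhorn([[0.0, 1.0], [1.0, 0.0]])
```

The O(nd) memory claim in the description comes from never storing the full n x m kernel; the toy version above keeps it dense purely for readability.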
Triton implementation of FlashAttention2 that adds Custom Masks.
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline parallelism. Faster than ZeRO/ZeRO++/FSDP.
Automatically benchmark and optimize attention in diffusion models. 1.5-2x speedup on RTX 4090.
Python package for rematerialization-aware gradient checkpointing
Pre-built wheels that erase Flash Attention 3 installation headaches.
Utilities for efficient fine-tuning, inference and evaluation of code generation models
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
A long-term project about a custom AI architecture. Consists of cutting-edge machine learning techniques such as Flash-Attention, Group-Query-Attention, ZeRO-Infinity, BitNet, etc.
Toy Flash Attention implementation in torch
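The core trick such toy implementations demonstrate is the online (streaming) softmax: scores are consumed block by block while a running max and normalizer are maintained, so the full score row is never materialized. A pure-Python sketch of that accumulation, under the assumption of a single 1-D row of scores and scalar values (the function name is illustrative, not any repo's API):

```python
# Streaming softmax-weighted sum, the numerical core of FlashAttention-style
# kernels. Processes `scores` in blocks, rescaling previously accumulated
# statistics whenever a new running maximum is found.
import math

def online_softmax_weighted_sum(scores, values, block=2):
    """Compute softmax(scores) . values one block at a time."""
    m = float("-inf")   # running max of all scores seen so far
    l = 0.0             # running softmax normalizer
    acc = 0.0           # running unnormalized weighted sum of values
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        scale = math.exp(m - m_new)  # rescale old stats to the new max
        l = l * scale + sum(math.exp(s - m_new) for s in s_blk)
        acc = acc * scale + sum(math.exp(s - m_new) * v
                                for s, v in zip(s_blk, v_blk))
        m = m_new
    return acc / l
```

Because every block is rescaled to the current running maximum, the result matches a naive two-pass softmax while only ever holding one block of scores in memory, which is what lets the real kernels keep the attention matrix out of HBM.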