Utilities for efficient fine-tuning, inference and evaluation of code generation models
Python package for rematerialization-aware gradient checkpointing
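The rematerialization idea above (store only sparse checkpoints in the forward pass, recompute intermediate activations during backward) can be sketched in pure Python on a toy chain of layers. Everything here is hypothetical illustration, not the package's API; the layer `f(x) = 2x + 1` is chosen only so the gradient is easy to verify by hand.

```python
# Toy rematerialization-aware checkpointing sketch (hypothetical, not a real API).
# Chain: y = f(f(...f(x))); each layer is f(x) = 2x + 1, so f'(x) = 2.
def layer(x):
    return 2.0 * x + 1.0

def layer_grad(x):
    return 2.0

def forward_checkpointed(x, n_layers, segment):
    """Run the chain, storing only every `segment`-th activation (the checkpoints).

    Plain backprop would store all n_layers activations; here we keep
    n_layers / segment of them, trading memory for recompute.
    """
    checkpoints = {0: x}
    a = x
    for i in range(n_layers):
        a = layer(a)
        if (i + 1) % segment == 0:
            checkpoints[i + 1] = a
    return a, checkpoints

def backward_checkpointed(checkpoints, n_layers, segment, grad_out=1.0):
    """Backward pass: rematerialize each segment's activations from its checkpoint."""
    grad = grad_out
    for seg_end in range(n_layers, 0, -segment):
        seg_start = seg_end - segment
        # Recompute the activations inside this segment from the stored checkpoint.
        acts = [checkpoints[seg_start]]
        for _ in range(segment):
            acts.append(layer(acts[-1]))
        # Chain rule, walking the segment in reverse.
        for i in range(segment - 1, -1, -1):
            grad *= layer_grad(acts[i])
    return grad
```

With 8 layers and `segment=4`, only 3 activations are stored instead of 9, and the recovered gradient is still the exact `2**8`.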
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
Triton implementation of FlashAttention2 that adds Custom Masks.
Toy Flash Attention implementation in torch
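The core trick such a toy implementation demonstrates is the online softmax: stream over key/value blocks, keeping a running max and running denominator so the full attention matrix is never materialized. A minimal NumPy sketch (my own illustration, not code from the repo above):

```python
import numpy as np

def flash_attention_toy(q, k, v, block=2):
    """Attention computed one query row at a time, streaming over K/V blocks
    with an online softmax -- the memory-saving idea behind FlashAttention."""
    n, d = q.shape
    out = np.zeros_like(v, dtype=np.float64)
    scale = 1.0 / np.sqrt(d)
    for i in range(n):
        m = -np.inf                    # running max of the logits seen so far
        l = 0.0                        # running softmax denominator
        acc = np.zeros(v.shape[1])     # running weighted sum of values
        for j0 in range(0, n, block):
            s = q[i] @ k[j0:j0 + block].T * scale   # logits for this K/V block
            m_new = max(m, s.max())
            correction = np.exp(m - m_new)          # rescale the old accumulator
            p = np.exp(s - m_new)
            l = l * correction + p.sum()
            acc = acc * correction + p @ v[j0:j0 + block]
            m = m_new
        out[i] = acc / l
    return out
```

Because each block only updates the running statistics, peak memory per query is O(block) instead of O(n), while the result matches standard softmax attention exactly.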
Long-term project about a custom AI architecture. Consists of cutting-edge machine learning techniques such as Flash-Attention, Grouped-Query-Attention, ZeRO-Infinity, BitNet, etc.
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
Decoder-only LLM trained on the Harry Potter books.
Training GPT-2 on FineWeb-Edu in JAX/Flax
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
Building Native Sparse Attention
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
MoBA: Mixture of Block Attention for Long-Context LLMs
Grouped-Tied Attention by Zadouri, Strauss, Dao (2025).
A from-scratch implementation of a T5 model modified with Rotary Position Embeddings (RoPE). This project includes the code for pre-training on the C4 dataset in streaming mode with Flash Attention 2.
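Rotary Position Embeddings, as used in the project above, rotate each consecutive pair of query/key features by a position-dependent angle, so that dot products depend only on the relative offset between positions. A small NumPy sketch of the standard formulation (an illustration under my own naming, not the repo's code):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply Rotary Position Embeddings: rotate each consecutive pair of
    features of x by an angle pos * inv_freq for that pair's frequency."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # one frequency per pair
    theta = pos * inv_freq
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

The defining property is relative-position invariance: `rope(q, m) @ rope(k, n)` is unchanged when both positions are shifted by the same amount.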
Ring sliding window attention implementation with flash attention
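In sliding-window attention, each query attends only to the most recent `window` keys (causally), which is what makes ring/blockwise implementations tractable for long sequences. The mask itself is simple to state; a NumPy sketch of this standard mask (my own helper name, not the repo's):

```python
import numpy as np

def sliding_window_mask(n, window):
    """Boolean attention mask: query i may attend to keys j with
    i - window < j <= i (causal, limited to the last `window` positions)."""
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)
```

With `n=5, window=2`, row 3 permits only keys 2 and 3; a flash-attention kernel would skip key blocks that fall entirely outside the band.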
Implementing modern DL systems from scratch — Transformers, Diffusion, Multimodal LLMs, FlashAttention, RLHF.
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.