The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
MoBA: Mixture of Block Attention for Long-Context LLMs
Trainable fast and memory-efficient sparse attention
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.
Triton implementation of FlashAttention2 that adds Custom Masks.
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
Python package for rematerialization-aware gradient checkpointing
Utilities for efficient fine-tuning, inference and evaluation of code generation models
A monorepo containing various utility scripts, tools, and applications for development, automation, and AI-powered tasks.
Ring sliding window attention implementation with flash attention
Long-term project about a custom AI architecture, consisting of cutting-edge machine learning techniques such as Flash-Attention, Grouped-Query-Attention, ZeRO-Infinity, BitNet, etc.
Pre-built wheels that erase Flash Attention 3 installation headaches.
Cross-platform FlashAttention-2 Triton implementation for Turing+ with custom configuration mode
Training GPT-2 on FineWeb-Edu in JAX/Flax
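For orientation beyond the repository descriptions above: the easiest way to try FlashAttention is through PyTorch's built-in scaled dot-product attention, which can dispatch to a FlashAttention kernel. A minimal sketch, assuming PyTorch 2.3+ and a CUDA GPU (a generic illustration, not code from any repository listed above):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# (batch, heads, seq_len, head_dim) in fp16 on GPU -- the layout the flash kernel expects
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Restrict dispatch to the FlashAttention backend; the computation is tiled so the
# full seq_len x seq_len attention matrix is never materialized in memory.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 1024, 64])
```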
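The rematerialization-aware gradient checkpointing entry above builds on the baseline idea PyTorch ships as torch.utils.checkpoint: recompute (rematerialize) activations during the backward pass instead of storing them. A minimal sketch of that baseline, assuming PyTorch 2.x (not that package's own API):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A block whose intermediate activations are NOT kept for backward;
# they are recomputed (rematerialized) when gradients are needed.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
)

x = torch.randn(4, 512, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # forward pass, activations discarded
y.sum().backward()                             # block re-runs here to rebuild them
```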