The official repo of Qwen (通义千问), the chat and pretrained large language models proposed by Alibaba Cloud.
Updated Mar 5, 2026 - Python
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
MoBA: Mixture of Block Attention for Long-Context LLMs
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
Trainable fast and memory-efficient sparse attention
[ICLR 2026] rCM: SOTA JVP-Based Diffusion Distillation & Few-Step Video Generation & Scaling Up sCM/MeanFlow
FlashSinkhorn: IO-Aware Entropic Optimal Transport in PyTorch + Triton. Streaming Sinkhorn with O(nd) memory.
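For context, the Sinkhorn scaling that FlashSinkhorn streams in blocks can be sketched in a few lines. This is a minimal, dependency-free illustration of entropic optimal transport, not the FlashSinkhorn API; the function name and parameters here are hypothetical.

```python
# Minimal Sinkhorn iteration for entropic optimal transport between two
# uniform distributions. Illustrative sketch only -- a real IO-aware
# implementation streams the kernel in blocks instead of materializing it.
import math

def sinkhorn(cost, eps=0.1, iters=200):
    """Return the transport plan P for an n x m cost matrix (list of lists)."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    a, b = 1.0 / n, 1.0 / m  # uniform source/target marginals
    for _ in range(iters):
        # Alternate scalings so that P = diag(u) K diag(v) matches the marginals.
        u = [a / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# The plan's entries sum to 1, and each row sums to its marginal (0.5 here).
P = sinkhorn([[0.0, 1.0], [1.0, 0.0]])
```

The O(nd) memory claim in the description comes from never storing the full n x m kernel; the toy version above keeps it dense purely for readability.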
Triton implementation of FlashAttention2 that adds Custom Masks.
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline parallelism. Faster than ZeRO/ZeRO++/FSDP.
Automatically benchmark and optimize attention in diffusion models. 1.5-2x speedup on RTX 4090.
Python package for rematerialization-aware gradient checkpointing
Pre-built wheels that erase Flash Attention 3 installation headaches.
Utilities for efficient fine-tuning, inference and evaluation of code generation models
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
A long-term project about a custom AI architecture. Consists of cutting-edge machine learning techniques such as Flash-Attention, Group-Query-Attention, ZeRO-Infinity, BitNet, etc.
Toy Flash Attention implementation in torch
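The core trick such toy implementations demonstrate is the online (streaming) softmax: scores are consumed block by block while a running max and normalizer are maintained, so the full score row is never materialized. A pure-Python sketch of that accumulation, under the assumption of a single 1-D row of scores and scalar values (the function name is illustrative, not any repo's API):

```python
# Streaming softmax-weighted sum, the numerical core of FlashAttention-style
# kernels. Processes `scores` in blocks, rescaling previously accumulated
# statistics whenever a new running maximum is found.
import math

def online_softmax_weighted_sum(scores, values, block=2):
    """Compute softmax(scores) . values one block at a time."""
    m = float("-inf")   # running max of all scores seen so far
    l = 0.0             # running softmax normalizer
    acc = 0.0           # running unnormalized weighted sum of values
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        scale = math.exp(m - m_new)  # rescale old stats to the new max
        l = l * scale + sum(math.exp(s - m_new) for s in s_blk)
        acc = acc * scale + sum(math.exp(s - m_new) * v
                                for s, v in zip(s_blk, v_blk))
        m = m_new
    return acc / l
```

Because every block is rescaled to the current running maximum, the result matches a naive two-pass softmax while only ever holding one block of scores in memory, which is what lets the real kernels keep the attention matrix out of HBM.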