The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
MoBA: Mixture of Block Attention for Long-Context LLMs
Trainable fast and memory-efficient sparse attention
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
[CVPR 2025 Highlight] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A super memory-efficient CLIP training scheme.
Triton implementation of FlashAttention2 that adds Custom Masks.
Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
Python package for rematerialization-aware gradient checkpointing
Utilities for efficient fine-tuning, inference and evaluation of code generation models
A monorepo containing various utility scripts, tools, and applications for development, automation, and AI-powered tasks.
Ring sliding window attention implementation with flash attention
Long-term project about a custom AI architecture, consisting of cutting-edge machine learning techniques such as Flash-Attention, Grouped-Query-Attention, ZeRO-Infinity, BitNet, etc.
Pre-built wheels that erase Flash Attention 3 installation headaches.
Cross-platform FlashAttention-2 Triton implementation for Turing+ with custom configuration mode
Training GPT-2 on FineWeb-Edu in JAX/Flax
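For orientation beyond the repository descriptions above: the easiest way to try FlashAttention is through PyTorch's built-in scaled dot-product attention, which can dispatch to a FlashAttention kernel. A minimal sketch, assuming PyTorch 2.3+ and a CUDA GPU (a generic illustration, not code from any repository listed above):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# (batch, heads, seq_len, head_dim) in fp16 on GPU -- the layout the flash kernel expects
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)

# Restrict dispatch to the FlashAttention backend; the computation is tiled so the
# full seq_len x seq_len attention matrix is never materialized in memory.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print(out.shape)  # torch.Size([1, 8, 1024, 64])
```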
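The rematerialization-aware gradient checkpointing entry above builds on the baseline idea PyTorch ships as torch.utils.checkpoint: recompute (rematerialize) activations during the backward pass instead of storing them. A minimal sketch of that baseline, assuming PyTorch 2.x (not that package's own API):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A block whose intermediate activations are NOT kept for backward;
# they are recomputed (rematerialized) when gradients are needed.
block = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
)

x = torch.randn(4, 512, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # forward pass, activations discarded
y.sum().backward()                             # block re-runs here to rebuild them
```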