Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
VideoNSA: Native Sparse Attention Scales Video Understanding
An intuitive and low-overhead instrumentation tool for Python
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Efficient Triton Kernels for LLM Training
toothacher17 / Megatron-LM
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer models at scale
🚀 Efficient implementations of state-of-the-art linear attention models
Development repository for the Triton language and compiler
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient Multi-head Latent Attention Kernels
OpenSeek aims to unite the global open-source community to drive collaborative innovation in algorithms, data, and systems to develop next-generation models.