

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,813 12,083 Updated Dec 20, 2025

VideoNSA: Native Sparse Attention Scales Video Understanding

Python 73 1 Updated Nov 16, 2025

An intuitive and low-overhead instrumentation tool for Python

Python 1,184 39 Updated Jul 8, 2025

A Quirky Assortment of CuTe Kernels

Python 702 64 Updated Dec 16, 2025

Python 78 6 Updated Dec 2, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,261 350 Updated Dec 19, 2025

Efficient Triton Kernels for LLM Training

Python 5,962 451 Updated Dec 19, 2025

Ongoing research training transformer models at scale

Python 3 2 Updated Feb 24, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,087 333 Updated Dec 20, 2025

Development repository for the Triton language and compiler

MLIR 17,887 2,461 Updated Dec 20, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,984 778 Updated Dec 8, 2025

qwen-nsa

Jupyter Notebook 85 7 Updated Oct 14, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,926 918 Updated Dec 15, 2025

OpenSeek aims to unite the global open source community to drive collaborative innovation in algorithms, data and systems to develop next-generation models.

Python 241 39 Updated Dec 15, 2025

Jupyter Notebook 148 13 Updated Jul 4, 2025