catswe

A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.

Python · 69 stars · 9 forks · Updated Jun 22, 2026

🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS

Cuda · 111 stars · 2 forks · Updated Sep 8, 2025

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 909 stars · 193 forks · Updated May 9, 2026

Ongoing research training transformer language models at scale, including BERT & GPT-2

Python · 1,446 stars · 226 forks · Updated Mar 20, 2024

[AAAI 2026] Official implementation of "FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models". If you find this repository helpful, please consider starring 🌟 it to support the p…

Python · 17 stars · 2 forks · Updated May 1, 2026

Official repository of the xLSTM.

Python · 2,175 stars · 184 forks · Updated May 28, 2026

Algorithms for latent compaction

Python · 250 stars · 27 forks · Updated Apr 22, 2026

Zonos2 is a leading open-weight text-to-speech MoE.

Python · 236 stars · 25 forks · Updated Jun 16, 2026

Implementation of 2-simplicial attention, proposed by Clift et al. (2019), and of the recent attempt to make it practical in Fast and Simplex (Roy et al., 2025)

Python · 49 stars · 1 fork · Updated Sep 2, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda · 2,329 stars · 223 forks · Updated Jun 22, 2026

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Python · 216 stars · 22 forks · Updated Jun 23, 2026

AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)

Python · 45 stars · 2 forks · Updated May 20, 2026

Triton kernels for dynamic causal short convolutions.

Python · 22 stars · 1 fork · Updated Jun 4, 2026

LM engine is a library for pretraining/finetuning LLMs

Python · 180 stars · 30 forks · Updated Jun 23, 2026

Official PyTorch Implementation of Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Python · 223 stars · 18 forks · Updated May 25, 2026

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python · 610 stars · 32 forks · Updated Mar 13, 2026

Learn CUDA with PyTorch

Cuda · 337 stars · 50 forks · Updated Jun 1, 2026

CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python · 524 stars · 65 forks · Updated Jun 23, 2026

Official repository for Parallax (Parameterized Local Linear Attention)

Python · 61 stars · 5 forks · Updated Jun 20, 2026

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

JavaScript · 220,052 stars · 33,719 forks · Updated Jun 22, 2026

Delta Attention Residuals: supplementary code and pretrained models

Python · 35 stars · 1 fork · Updated May 20, 2026

SpectralQuant: Calibrated Eigenbasis Rotation and Water-Filled Bit Allocation for KV-Cache Compression

Python · 195 stars · 22 forks · Updated May 15, 2026

State-of-the-art TTS model under 25 MB 😻

Python · 14,148 stars · 773 forks · Updated Jun 11, 2026

On-device intelligence.

Python · 410 stars · 34 forks · Updated Mar 24, 2025

JAX infrastructure for model optimisation

Python · 86 stars · 19 forks · Updated Jun 22, 2026

Mamba SSM architecture

Python · 18,472 stars · 1,760 forks · Updated Jun 15, 2026

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

Python · 489 stars · 32 forks · Updated Mar 10, 2025

A kernel library written in tilelang

Python · 1,597 stars · 142 forks · Updated Apr 23, 2026

Experimental GPU language with meta-programming

Jupyter Notebook · 31 stars · Updated Sep 6, 2024

Sequential Monte Carlo Speculative Decoding

Python · 48 stars · 6 forks · Updated Jun 20, 2026