catswe

will b. catswe

doing things at the edge of stability

39 followers · 6 following

Stars

facebookexperimental / CUTracer

A dynamic binary instrumentation tool for tracing and analyzing CUDA kernel instructions.

Python · 69 stars · 9 forks · Updated Jun 22, 2026

sandyresearch / chipmunk

🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS

Cuda · 111 stars · 2 forks · Updated Sep 8, 2025

Dao-AILab / causal-conv1d

Causal depthwise conv1d in CUDA, with a PyTorch interface

Cuda · 909 stars · 193 forks · Updated May 9, 2026

bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python · 1,446 stars · 226 forks · Updated Mar 20, 2024

Zishan-Shao / FlashSVD

[AAAI 2026] Official implementation of "FlashSVD: Memory-Efficient Inference with Streaming for Low-Rank Models". If you find this repository helpful, please consider starring 🌟 it to support the p…

Python · 17 stars · 2 forks · Updated May 1, 2026

NX-AI / xlstm

Official repository of the xLSTM.

Python · 2,175 stars · 184 forks · Updated May 28, 2026

adamzweiger / compaction

Algorithms for latent compaction

Python · 250 stars · 27 forks · Updated Apr 22, 2026

Zyphra / ZONOS2

Zonos2 is a leading open-weight text-to-speech MoE.

Python · 236 stars · 25 forks · Updated Jun 16, 2026

lucidrains / simplicial-attention

Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make it practical in Fast and Simplex, Roy et al. (2025)

Python · 49 stars · 1 fork · Updated Sep 2, 2025

mirage-project / mirage

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda · 2,329 stars · 223 forks · Updated Jun 22, 2026

open-lm-engine / coda-kernels

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Python · 216 stars · 22 forks · Updated Jun 23, 2026

deep-spin / adasplash

AdaSplash: Adaptive Sparse Flash Attention (aka Flash Entmax Attention)

Python · 45 stars · 2 forks · Updated May 20, 2026

OliverSieberling / dynamic-conv1d

Triton kernels for dynamic causal short convolutions.

Python · 22 stars · 1 fork · Updated Jun 4, 2026

open-lm-engine / lm-engine

LM engine is a library for pretraining/finetuning LLMs

Python · 180 stars · 30 forks · Updated Jun 23, 2026

NVlabs / GatedDeltaNet-2

Official PyTorch Implementation of Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Python · 223 stars · 18 forks · Updated May 25, 2026

NVlabs / GatedDeltaNet

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python · 610 stars · 32 forks · Updated Mar 13, 2026

gau-nernst / learn-cuda

Learn CUDA with PyTorch

Cuda · 337 stars · 50 forks · Updated Jun 1, 2026

inclusionAI / cuLA

CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python · 524 stars · 65 forks · Updated Jun 23, 2026

Yifei-Zuo / Parallax

Official repository for Parallax (Parameterized Local Linear Attention)

Python · 61 stars · 5 forks · Updated Jun 20, 2026

affaan-m / ECC

The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.

JavaScript · 220,052 stars · 33,719 forks · Updated Jun 22, 2026

wdlctc / delta-attention-residuals-code

Delta Attention Residuals - supplementary code and pretrained models

Python · 35 stars · 1 fork · Updated May 20, 2026

Dynamis-Labs / spectralquant

SpectralQuant: Calibrated Eigenbasis Rotation and Water-Filled Bit Allocation for KV-Cache Compression

Python · 195 stars · 22 forks · Updated May 15, 2026

KittenML / KittenTTS

State-of-the-art TTS model under 25MB 😻

Python · 14,148 stars · 773 forks · Updated Jun 11, 2026

cartesia-ai / edge

On-device intelligence.

Python · 410 stars · 34 forks · Updated Mar 24, 2025

trymirai / lalamo

JAX infrastructure for model optimisation

Python · 86 stars · 19 forks · Updated Jun 22, 2026

state-spaces / mamba

Mamba SSM architecture

Python · 18,472 stars · 1,760 forks · Updated Jun 15, 2026

rkinas / triton-resources

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

Python · 489 stars · 32 forks · Updated Mar 10, 2025

deepseek-ai / TileKernels

A kernel library written in tilelang

Python · 1,597 stars · 142 forks · Updated Apr 23, 2026

kuterd / opal_ptx

Experimental GPU language with meta-programming

Jupyter Notebook · 31 stars · Updated Sep 6, 2024

abdelfattah-lab / smcsd

Sequential Monte Carlo Speculative Decoding

Python · 48 stars · 6 forks · Updated Jun 20, 2026
