alexarmbr

Python · 342 stars · 30 forks · Updated Jun 15, 2026

[CVPR 2026 Highlight] Official implementation of Log-linear Sparse Attention (LLSA).

Python · 86 stars · 3 forks · Updated May 1, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda · 240 stars · 22 forks · Updated Jun 21, 2026

Official native C++ client SDK for LiveKit: build realtime audio, video, and data applications using the LiveKit protocol.

C++ · 63 stars · 28 forks · Updated Jun 22, 2026

Cuda · 62 stars · 12 forks · Updated Dec 10, 2025

A simple, performant and scalable Jax LLM!

Python · 2,332 stars · 540 forks · Updated Jun 22, 2026

Puffing up reinforcement learning

C · 6,035 stars · 495 forks · Updated Jun 15, 2026

The best Claude Code that $200 can buy

Python · 265 stars · 31 forks · Updated Apr 6, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python · 1,422 stars · 143 forks · Updated Mar 19, 2026

AI agents running research on single-GPU nanochat training automatically

Python · 88,058 stars · 12,751 forks · Updated Mar 26, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python · 314 stars · 19 forks · Updated Feb 24, 2026

A place to store reusable transformer components of my own creation or found on the interwebs

Python · 80 stars · 12 forks · Updated May 30, 2026

RL agent fusing real-time Binance futures data into Polymarket prediction markets. On-device training with MLX on Apple Silicon.

Python · 383 stars · 98 forks · Updated Jan 3, 2026

High-throughput tensor loading for PyTorch

Python · 256 stars · 15 forks · Updated Mar 11, 2026

fresh directories for every vibe

Shell · 3,681 stars · 145 forks · Updated May 20, 2026

The best ChatGPT that $100 can buy.

Python · 55,316 stars · 7,592 forks · Updated May 5, 2026

Triton-based Symmetric Memory operators and examples

Python · 102 stars · 14 forks · Updated May 15, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python · 100,956 stars · 28,084 forks · Updated Jun 22, 2026

NSA Triton Kernels written with GPT5 and Opus 4.1

Python · 70 stars · 6 forks · Updated Aug 12, 2025

Cog wrapper of Black Forest Labs' FLUX.1 Kontext [dev]

Python · 21 stars · 10 forks · Updated Aug 7, 2025

A Quirky Assortment of CuTe Kernels

Python · 1,026 stars · 136 forks · Updated Jun 20, 2026

Interactive visualizations of the geometric intuition behind diffusion models.

Svelte · 1,148 stars · 59 forks · Updated Jun 1, 2026

A PyTorch native platform for training generative AI models

Python · 5,456 stars · 868 forks · Updated Jun 22, 2026

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Python · 810 stars · 29 forks · Updated Oct 13, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 12,709 stars · 1,063 forks · Updated Apr 30, 2026

Cuda · 132 stars · 16 forks · Updated Mar 19, 2026

Context parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)

Python · 428 stars · 47 forks · Updated Jul 5, 2025

A bunch of kernels that might make stuff slower 😉

Python · 91 stars · 11 forks · Updated Jun 18, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 7,399 stars · 1,059 forks · Updated Jun 4, 2026

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook · 1,073 stars · 174 forks · Updated Mar 24, 2026