Python 336 29 Updated Jun 15, 2026

[CVPR 2026 Highlight] Official implementation of Log-linear Sparse Attention (LLSA).

Python 86 3 Updated May 1, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 239 22 Updated Jun 8, 2026

Official native C++ client SDK for LiveKit: build realtime audio, video, and data applications using the LiveKit protocol.

C++ 63 28 Updated Jun 21, 2026
Cuda 62 12 Updated Dec 10, 2025

A simple, performant and scalable Jax LLM!

Python 2,330 540 Updated Jun 21, 2026

Puffing up reinforcement learning

C 6,028 494 Updated Jun 15, 2026

The best Claude Code that $200 can buy

Python 265 31 Updated Apr 6, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,418 143 Updated Mar 19, 2026

AI agents running research on single-GPU nanochat training automatically

Python 87,848 12,720 Updated Mar 26, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 313 19 Updated Feb 24, 2026

A place to store reusable transformer components of my own creation or found on the interwebs

Python 80 12 Updated May 30, 2026

RL agent fusing real-time Binance futures data into Polymarket prediction markets. On-device training with MLX on Apple Silicon.

Python 383 98 Updated Jan 3, 2026

High-throughput tensor loading for PyTorch

Python 256 15 Updated Mar 11, 2026

fresh directories for every vibe

Shell 3,681 145 Updated May 20, 2026

The best ChatGPT that $100 can buy.

Python 55,281 7,590 Updated May 5, 2026

Triton-based Symmetric Memory operators and examples

Python 102 14 Updated May 15, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 100,916 28,071 Updated Jun 21, 2026

NSA Triton Kernels written with GPT5 and Opus 4.1

Python 70 6 Updated Aug 12, 2025

Cog wrapper of Black Forest Lab's / FLUX.1 Kontext [dev]

Python 21 10 Updated Aug 7, 2025

A Quirky Assortment of CuTe Kernels

Python 1,023 136 Updated Jun 20, 2026

Interactive visualizations of the geometric intuition behind diffusion models.

Svelte 1,146 59 Updated Jun 1, 2026

A PyTorch native platform for training generative AI models

Python 5,452 866 Updated Jun 21, 2026

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 810 28 Updated Oct 13, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,708 1,063 Updated Apr 30, 2026
Cuda 132 16 Updated Mar 19, 2026

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Python 428 47 Updated Jul 5, 2025

A bunch of kernels that might make stuff slower 😉

Python 91 11 Updated Jun 18, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,397 1,056 Updated Jun 4, 2026

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 1,071 174 Updated Mar 24, 2026