View alexarmbr's full-sized avatar

Python 333 29 Updated Jun 15, 2026

[CVPR 2026 Highlight] Official implementation of Log-linear Sparse Attention (LLSA).

Python 86 3 Updated May 1, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 239 22 Updated Jun 8, 2026

Official native C++ client SDK for LiveKit: build realtime audio, video, and data applications using the LiveKit protocol.

C++ 63 28 Updated Jun 18, 2026
Cuda 62 12 Updated Dec 10, 2025

A simple, performant and scalable Jax LLM!

Python 2,330 539 Updated Jun 19, 2026

Puffing up reinforcement learning

C 6,017 491 Updated Jun 15, 2026

The best Claude Code that $200 can buy

Python 265 31 Updated Apr 6, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,418 143 Updated Mar 19, 2026

AI agents running research on single-GPU nanochat training automatically

Python 87,683 12,677 Updated Mar 26, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 313 19 Updated Feb 24, 2026

A place to store reusable transformer components of my own creation or found on the interwebs

Python 80 12 Updated May 30, 2026

RL agent fusing real-time Binance futures data into Polymarket prediction markets. On-device training with MLX on Apple Silicon.

Python 383 98 Updated Jan 3, 2026

High-throughput tensor loading for PyTorch

Python 257 15 Updated Mar 11, 2026

fresh directories for every vibe

Shell 3,681 145 Updated May 20, 2026

The best ChatGPT that $100 can buy.

Python 55,231 7,588 Updated May 5, 2026

Triton-based Symmetric Memory operators and examples

Python 102 14 Updated May 15, 2026

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 100,892 28,060 Updated Jun 19, 2026

NSA Triton Kernels written with GPT5 and Opus 4.1

Python 70 6 Updated Aug 12, 2025

Cog wrapper for Black Forest Labs' FLUX.1 Kontext [dev]

Python 21 10 Updated Aug 7, 2025

A Quirky Assortment of CuTe Kernels

Python 1,021 136 Updated Jun 19, 2026

Interactive visualizations of the geometric intuition behind diffusion models.

Svelte 1,146 59 Updated Jun 1, 2026

A PyTorch native platform for training generative AI models

Python 5,453 864 Updated Jun 19, 2026

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 810 28 Updated Oct 13, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,706 1,062 Updated Apr 30, 2026
Cuda 132 16 Updated Mar 19, 2026

https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching

Python 429 47 Updated Jul 5, 2025

A bunch of kernels that might make stuff slower 😉

Python 91 11 Updated Jun 18, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,394 1,054 Updated Jun 4, 2026

KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs)

Jupyter Notebook 1,069 173 Updated Mar 24, 2026