Starred repositories
Memory-bounded compressed sparse attention via streaming top-k. Triton kernels for the DeepSeek-V4 lightning indexer. 32x regime extension on a single H200 | by RightNow https://www.rightnowai.co/
Research artifacts from Recursive's automated AI research system
Cafe and Cowork. Find places to work. Open and collaborative.
Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x
Model export recipes, Python primitives, and Swift runtime utilities for on-device AI
Inference-native Tokenmaxxing Agent Harness for Loop Engineering
An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode, paper: https://arxiv.org/abs/2606.09682
Community edition of RepoPrompt: a native macOS context engineering app for AI coding agents, with an MCP CLI.
NVFP4 KV cache for vLLM on SM120 (RTX PRO 6000) via FlashInfer FA2 explicit-SF-stride patch — ~1.5x fp8 pool at ~95-104% speed
A voice companion for AI coding agents. Speaks your agent's replies so you can keep working.
Fast LLM speculative inference server for consumer hardware.
Foundry materializes CUDA graphs, along with their execution context, to disk to support fast cold starts of serving engines.
Perplexity open source garden for inference technology
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
Garry's Opinionated OpenClaw/Hermes Agent Brain
ThunderKittens LCF forward non-causal attention kernel benchmarked against FlashAttention-2 and FlashAttention-3 on Hopper.
Benchmarking Open-Ended Inference Optimization by AI Agents
SpectralQuant: Calibrated Eigenbasis Rotation and Water-Filled Bit Allocation for KV-Cache Compression
CPU-GPU co-design analysis for agentic LLM inference. Blog: andyluo7.github.io
SkyRL: A Modular Full-stack RL Library for LLMs
OCWC22 / hermes-agent
Forked from NousResearch/hermes-agent
The agent that grows with you
A PyTorch native library for training speculative decoding models
Open source skill library for AI coding agents to write, optimize, and debug high performance compute kernels across CUDA, Triton, and quantized workloads.
AtomicBot-ai / Atomic-Chat
Forked from janhq/jan
Local AI app and inference engine for agents. Run open-weight LLMs locally: private and 100% offline on your computer.