View OCWC22's full-sized avatar

Highlights

  • Pro

Starred repositories

Memory-bounded compressed sparse attention via streaming top-k. Triton kernels for the DeepSeek-V4 lightning indexer. 32x regime extension on a single H200 | by RightNow https://www.rightnowai.co/

Python 20 5 Updated May 5, 2026

Research artifacts from Recursive's automated AI research system

Python 123 12 Updated Jun 11, 2026

Cafe and Cowork. Find places to work. Open and collaborative.

Pug 79 33 Updated Jun 2, 2026

Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x

Python 140 10 Updated Jun 10, 2026

Model export recipes, Python primitives, and Swift runtime utilities for on-device AI

Swift 1,076 83 Updated Jun 18, 2026

Inference-native Tokenmaxxing Agent Harness for Loop Engineering

TypeScript 213 35 Updated Jun 18, 2026

An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode, paper: https://arxiv.org/abs/2606.09682

Python 67 8 Updated Jun 18, 2026

Fast FP8 GEMM on AMD CDNA4

C++ 3 Updated May 27, 2026

Community edition of RepoPrompt: a native macOS context engineering app for AI coding agents, with an MCP CLI.

Swift 294 65 Updated Jun 20, 2026

NVFP4 KV cache for vLLM on SM120 (RTX PRO 6000) via FlashInfer FA2 explicit-SF-stride patch — ~1.5x fp8 pool at ~95-104% speed

Python 15 1 Updated Jun 5, 2026

A voice companion for AI coding agents. Speaks your agent's replies so you can keep working.

Python 115 15 Updated Jun 19, 2026

A Raspberry Pi AirPlay visualizer.

Rust 4 Updated Jun 17, 2026

Config files for my GitHub profile.

1 Updated May 30, 2026

Fast LLM speculative inference server for consumer hardware.

C++ 2,574 241 Updated Jun 20, 2026

Foundry materializes CUDA graphs along with their execution context to disk to support fast cold start of serving engines.

C++ 36 4 Updated Jun 15, 2026

Perplexity open source garden for inference technology

Rust 581 56 Updated May 27, 2026

AI/GPU flame graph

C++ 259 9 Updated Jun 9, 2026

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

Cuda 441 28 Updated Mar 30, 2026

Garry's Opinionated OpenClaw/Hermes Agent Brain

TypeScript 23,533 3,379 Updated Jun 18, 2026

ThunderKittens LCF forward non-causal attention kernel benchmarked against FlashAttention-2 and FlashAttention-3 on Hopper.

Cuda 11 Updated May 23, 2026

Benchmarking Open-Ended Inference Optimization by AI Agents

Python 27 4 Updated May 16, 2026

SpectralQuant: Calibrated Eigenbasis Rotation and Water-Filled Bit Allocation for KV-Cache Compression

Python 195 22 Updated May 15, 2026

CPU-GPU co-design analysis for agentic LLM inference. Blog: andyluo7.github.io

Python 7 1 Updated May 14, 2026

SkyRL: A Modular Full-stack RL Library for LLMs

Python 2,009 356 Updated Jun 20, 2026

The agent that grows with you

Python 1 Updated May 29, 2026

A PyTorch native library for training speculative decoding models

Python 168 39 Updated Jun 12, 2026

Open source skill library for AI coding agents to write, optimize, and debug high performance compute kernels across CUDA, Triton, and quantized workloads.

TypeScript 23 5 Updated Jun 11, 2026

Local AI app and inference engine for agents. Run open-weight LLMs locally — private, 100% offline on your computer.

TypeScript 931 87 Updated Jun 19, 2026