William Chen OCWC22

22 followers · 57 following

Achievements

Highlights

Pro

Lists (2)

🔮 Future ideas

GPU Programming

Starred repositories

RightNow-AI / StreamIndex

Memory-bounded compressed sparse attention via streaming top-k. Triton kernels for the DeepSeek-V4 lightning indexer. 32x regime extension on a single H200 | by RightNow https://www.rightnowai.co/

Python 20 5 Updated May 5, 2026

recursive-org / first-steps-toward-automated-ai-research

Research artifacts from Recursive's automated AI research system

Python 123 12 Updated Jun 11, 2026

pqvst / cafeandcowork

Cafe and Cowork. Find places to work. Open and collaborative.

Pug 79 33 Updated Jun 2, 2026

Dogacel / auto-gpu-kernel

Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x

Python 140 10 Updated Jun 10, 2026

apple / coreai-models

Model export recipes, Python primitives, and Swift runtime utilities for on-device AI

Swift 1,076 83 Updated Jun 18, 2026

agentic-in / inferoa

Inference-native Tokenmaxxing Agent Harness for Loop Engineering

TypeScript 213 35 Updated Jun 18, 2026

mcrao / infert-tutor-arena-capstone

Python 1 Updated Jun 8, 2026

RightNow-AI / AutoMegaKernel

An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode, paper: https://arxiv.org/abs/2606.09682

Python 67 8 Updated Jun 18, 2026

nirw4nna / hipgemm

Fast FP8 GEMM on AMD CDNA4

C++ 3 Updated May 27, 2026

repoprompt / repoprompt-ce

Community edition of RepoPrompt: a native macOS context engineering app for AI coding agents, with an MCP CLI.

Swift 294 65 Updated Jun 20, 2026

hikarioyama / vllm-nvfp4-kv-sm120

NVFP4 KV cache for vLLM on SM120 (RTX PRO 6000) via FlashInfer FA2 explicit-SF-stride patch — ~1.5x fp8 pool at ~95-104% speed

Python 15 1 Updated Jun 5, 2026

heardlabs / heard

A voice companion for AI coding agents. Speaks your agent's replies so you can keep working.

Python 115 15 Updated Jun 19, 2026

Weekendsuperhero-io / NanoViz

A Raspberry Pi AirPlay visualizer.

Rust 4 Updated Jun 17, 2026

WeekendSuperhero / WeekendSuperhero

Config files for my GitHub profile.

1 Updated May 30, 2026

Luce-Org / lucebox-hub

Fast LLM speculative inference server for consumer hardware.

C++ 2,574 241 Updated Jun 20, 2026

foundry-org / foundry

Foundry materializes CUDA graphs along with their execution context to disk to support fast cold start of serving engines.

C++ 36 4 Updated Jun 15, 2026

perplexityai / pplx-garden

Perplexity open source garden for inference technology

Rust 581 56 Updated May 27, 2026

intel / iaprof

AI/GPU flame graph

C++ 259 9 Updated Jun 9, 2026

deepreinforce-ai / CUDA-L2

CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning

Cuda 441 28 Updated Mar 30, 2026

mit-han-lab / kernel-design-agents

613 51 Updated Jun 2, 2026

garrytan / gbrain

Garry's Opinionated OpenClaw/Hermes Agent Brain

TypeScript 23,533 3,379 Updated Jun 18, 2026

HamzaElshafie / tk_attention

ThunderKittens LCF forward non-causal attention kernel benchmarked against FlashAttention-2 and FlashAttention-3 on Hopper.

Cuda 11 Updated May 23, 2026

aisa-group / InferenceBench

Benchmarking Open-Ended Inference Optimization by AI Agents

Python 27 4 Updated May 16, 2026

Dynamis-Labs / spectralquant

SpectralQuant: Calibrated Eigenbasis Rotation and Water-Filled Bit Allocation for KV-Cache Compression

Python 195 22 Updated May 15, 2026

andyluo7 / cpu-gpu-codesign-agentic-inference

CPU-GPU co-design analysis for agentic LLM inference. Blog: andyluo7.github.io

Python 7 1 Updated May 14, 2026

NovaSky-AI / SkyRL

SkyRL: A Modular Full-stack RL Library for LLMs

Python 2,009 356 Updated Jun 20, 2026

OCWC22 / hermes-agent

Forked from NousResearch/hermes-agent

The agent that grows with you

Python 1 Updated May 29, 2026

lightseekorg / TorchSpec

A PyTorch native library for training speculative decoding models

Python 168 39 Updated Jun 12, 2026

tensormux / kernel-skills

Open source skill library for AI coding agents to write, optimize, and debug high performance compute kernels across CUDA, Triton, and quantized workloads.

TypeScript 23 5 Updated Jun 11, 2026

AtomicBot-ai / Atomic-Chat

Forked from janhq/jan

Local AI app and inference engine for agents. Run open-weight LLMs locally — private, 100% offline on your computer.

TypeScript 931 87 Updated Jun 19, 2026

Starred topics

React