View dukebw's full-sized avatar

Python 328 29 Updated Jun 15, 2026

TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration

Python 1,574 183 Updated Mar 27, 2026

Adds /goal functionality, similar to that used in Codex and Claude Code, to OpenCode.

JavaScript 102 8 Updated Jun 15, 2026

Ideogram 4: Open image model at the forefront of design

Python 2,125 209 Updated Jun 4, 2026

NVIDIA FastGen: Fast Generation from Diffusion Models

Python 815 65 Updated Jun 7, 2026

Conveniently export torch.compile compiled products into self-contained Python files

Python 33 2 Updated Jun 5, 2026

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,452 160 Updated Jun 18, 2026

Ready-to-use ML training recipes to help you build and deploy models on Baseten.

Python 55 8 Updated Jun 17, 2026

The lightweight framework for building agents

Python 453 49 Updated Jun 16, 2026

AI Tensor Engine for ROCm

Python 465 359 Updated Jun 18, 2026

A modern alternative to ls

Rust 22,338 456 Updated May 31, 2026

A fast type checker and language server for Python

Rust 6,666 407 Updated Jun 18, 2026

Performant kernels, and other ML Systems integrations

Cuda 7 2 Updated Jun 8, 2026

From a+b to sparsemax(QK^T)V in Triton!

Jupyter Notebook 34 Updated Jun 19, 2025

Anthropic's original performance take-home, now open for you to try!

Python 3,897 888 Updated Jan 22, 2026

Domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels

Python 6,515 606 Updated Jun 18, 2026

A kernel library written in tilelang

Python 1,595 140 Updated Apr 23, 2026

CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.

Python 523 64 Updated Jun 17, 2026

Use Codex from Claude Code to review code or delegate tasks.

JavaScript 21,240 1,287 Updated Jun 14, 2026

Region-level profiling for CUDA kernels with trace, NVBit, CUPTI, NSys, and an interactive Explorer.

Python 118 11 Updated Apr 17, 2026

Dashboard for InferenceX™, Open Source Continuous Inference

TypeScript 30 9 Updated Jun 18, 2026
Python 232 8 Updated Oct 27, 2025

Skills for Real Engineers. Straight from my .claude directory.

Shell 134,666 11,678 Updated Jun 18, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 896 259 Updated Jun 16, 2026

Agent Lattice: a knowledge graph for your codebase, written in markdown.

TypeScript 1,653 108 Updated Apr 2, 2026

A monitor of resources

C++ 32,908 1,055 Updated Jun 6, 2026

A markdown native slides tool for academics building with agents.

Python 208 11 Updated Jun 4, 2026

CLI proxy that reduces LLM token consumption by 60-90% on common dev commands. Single Rust binary, zero dependencies

Rust 63,519 3,907 Updated Jun 17, 2026

Gemini auth plugin for opencode

TypeScript 1,675 114 Updated Jun 3, 2026