woodx9

TokenSpeed is a speed-of-light LLM inference engine.

Python · 1,435 stars · 156 forks · Updated Jun 15, 2026

System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge

Go · 4,383 stars · 707 forks · Updated Jun 15, 2026

LMCache: Supercharge Your LLM with the Fastest KV Cache Layer

Python · 9,131 stars · 1,329 forks · Updated Jun 15, 2026

The official Go library for the OpenAI API

Go · 3,297 stars · 327 forks · Updated Jun 11, 2026

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

Go · 44,794 stars · 4,066 forks · Updated Jun 15, 2026

Based on the RV32I ISA; aims to implement the CPU's complete functionality without considering synthesis, timing, or latency.

Verilog · 2 stars · Updated Jun 20, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python · 891 stars · 253 forks · Updated Jun 15, 2026

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python · 3,403 stars · 545 forks · Updated Jun 15, 2026

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 1,086 stars · 88 forks · Updated Sep 4, 2024

Building LLMs from Scratch: A Complete Hands-On Guide from Pre-training to RLHF

Python · 2,665 stars · 207 forks · Updated May 20, 2026

Algorithm powering the For You feed on X

Rust · 26,183 stars · 4,499 forks · Updated May 15, 2026

An embedding inference server faster than vLLM and SGLang

Python · 17 stars · 1 fork · Updated Feb 10, 2026

Python · 1,292 stars · 134 forks · Updated May 20, 2026

Nano vLLM

Python · 14,041 stars · 2,219 forks · Updated Apr 26, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python · 4,403 stars · 700 forks · Updated May 17, 2026

FlashInfer: Kernel Library for LLM Serving

Python · 5,797 stars · 1,050 forks · Updated Jun 15, 2026

Fast and memory-efficient exact attention

Python · 24,157 stars · 2,832 forks · Updated Jun 10, 2026

Python · 62 stars · 11 forks · Updated Jun 19, 2024

LeetGPU Solutions

Python · 119 stars · 5 forks · Updated Oct 9, 2025

leetTriton

Python · 2 stars · Updated Sep 9, 2025

Getting Started with Triton: A Tutorial for Python Beginners

HTML · 60 stars · 5 forks · Updated Mar 26, 2026

A powerful MCP toolkit for coding, providing semantic retrieval and editing capabilities: the IDE for your agent

Python · 25,392 stars · 1,702 forks · Updated Jun 15, 2026

Kode CLI — Design for post-human workflows. One unit agent for every human & computer task.

TypeScript · 5,125 stars · 765 forks · Updated Jun 9, 2026

Merge SuperPoint, LightGlue, and MixVPR into VINS-Fusion for loop closure with TensorRT

C++ · 157 stars · 21 forks · Updated Nov 12, 2024

Code search MCP for Claude Code. Make the entire codebase the context for any coding agent.

TypeScript · 11,852 stars · 874 forks · Updated Jun 8, 2026

Extract and compare system prompts and tools from different Claude Code versions

TypeScript · 456 stars · 36 forks · Updated Oct 30, 2025

A hybrid graph node editor built with WPF, litegraph.js, and Webview

JavaScript · 26 stars · 4 forks · Updated May 2, 2025

Build a Claude Code–like CLI coding agent from scratch.

Python · 163 stars · 32 forks · Updated May 18, 2026

Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic.

TypeScript · 35,003 stars · 2,873 forks · Updated Mar 4, 2026

A great VS Code extension

TypeScript · 1 star · Updated Aug 6, 2025