
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python · 2,383 stars · 217 forks · Updated Dec 23, 2025

Python · 629 stars · 61 forks · Updated Dec 25, 2025

Nano vLLM

Python · 10,156 stars · 1,272 forks · Updated Nov 3, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 2,008 stars · 161 forks · Updated Dec 20, 2025

Accelerate inference without tears

Python · 370 stars · 22 forks · Updated Nov 17, 2025

TPU inference for vLLM, with unified JAX and PyTorch support.

Python · 201 stars · 64 forks · Updated Dec 25, 2025

NanoGPT (124M) in 3 minutes

Python · 4,010 stars · 532 forks · Updated Dec 25, 2025

Tenstorrent MLIR compiler

C++ · 226 stars · 87 forks · Updated Dec 26, 2025

🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.

C++ · 1,294 stars · 314 forks · Updated Dec 26, 2025

Universal LLM Deployment Engine with ML Compilation

Python · 21,781 stars · 1,891 forks · Updated Dec 24, 2025

Open Machine Learning Compiler Framework

Python · 12,965 stars · 3,747 forks · Updated Dec 26, 2025

Efficient Triton Kernels for LLM Training

Python · 5,981 stars · 455 forks · Updated Dec 25, 2025
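
As a rough illustration of how a kernel-patching library like the one above is typically used, here is a minimal, hedged sketch assuming the `apply_liger_kernel_to_llama` helper advertised in the Liger Kernel README; the model name is only an example, and the exact import path should be checked against the installed version.

```python
# Hedged sketch: patch a Hugging Face Llama-architecture model with fused
# Triton kernels before training. apply_liger_kernel_to_llama is the patching
# helper described in the Liger Kernel README (verify against your version).
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_llama

# Swap in fused RMSNorm, RoPE, SwiGLU, and cross-entropy kernels for
# Llama-architecture models instantiated after this call.
apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
# ...continue with a normal training loop; the forward pass now routes
# through the patched Triton kernels.
```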

Blazingly fast LLM inference.

Rust · 6,301 stars · 497 forks · Updated Dec 19, 2025

Open-source search and retrieval database for AI applications.

Rust · 25,139 stars · 1,981 forks · Updated Dec 25, 2025

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Rust · 10,724 stars · 746 forks · Updated Dec 26, 2025

Efficient platform for inference and serving local LLMs, including an OpenAI-compatible API server.

Rust · 552 stars · 64 forks · Updated Dec 24, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 7,434 stars · 638 forks · Updated Dec 25, 2025

A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1

SystemVerilog · 1,079 stars · 84 forks · Updated Aug 21, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust · 5,684 stars · 755 forks · Updated Dec 25, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 41,087 stars · 4,672 forks · Updated Dec 24, 2025
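
For context on what the library above wraps, here is a minimal, hedged sketch of its usual `deepspeed.initialize` entry point with a small ZeRO config; the model and config values are illustrative assumptions, not recommendations.

```python
# Hedged sketch of DeepSpeed's usual entry point: wrap an existing PyTorch
# model with deepspeed.initialize and drive training through the returned
# engine. The model and config values are illustrative placeholders.
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)  # stand-in for a real model

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# deepspeed.initialize returns (engine, optimizer, dataloader, lr_scheduler).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(8, 1024).to(engine.device).half()
loss = engine(x).float().pow(2).mean()  # toy loss
engine.backward(loss)   # handles loss scaling and ZeRO partitioning
engine.step()           # optimizer step plus gradient clearing
```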

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!

Python · 8,331 stars · 898 forks · Updated Dec 23, 2025
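
Since the entry above is pitched at building model-inference APIs, here is a minimal, hedged sketch in the decorator-based service style that recent BentoML releases document; the `Summarizer` class and its logic are made up for illustration, and the decorators should be checked against the installed version.

```python
# Hedged sketch of a decorator-based BentoML service (1.2+ style API).
# The Summarizer class and its truncation logic are illustrative placeholders.
import bentoml


@bentoml.service
class Summarizer:
    """Toy service exposing one HTTP endpoint."""

    @bentoml.api
    def summarize(self, text: str) -> str:
        # Stand-in for real model inference.
        return text[:200] + ("..." if len(text) > 200 else "")
```

If the decorators match the installed release, running `bentoml serve <module>:Summarizer` should start an HTTP server exposing the `summarize` endpoint.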

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 66,202 stars · 12,193 forks · Updated Dec 26, 2025
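
As a point of reference for the serving engine above, this is a minimal, hedged sketch of vLLM's offline batch-inference API; the model name and prompts are arbitrary examples.

```python
# Hedged sketch of vLLM's offline inference API: load a model once, then
# generate for a batch of prompts while continuous batching and paged KV-cache
# management are handled internally. The model name is an arbitrary example.
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV-cache paging in one sentence:",
    "Summarize what an inference server does:",
]
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")        # downloads/loads the weights
outputs = llm.generate(prompts, sampling)   # one RequestOutput per prompt

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```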

Use your Neovim like the Cursor AI IDE!

Lua · 16,874 stars · 773 forks · Updated Dec 22, 2025