Sutskever 30 implementations inspired by https://papercode.vercel.app/

Jupyter Notebook · 3,123 stars · 424 forks · Updated Feb 8, 2026

LLMRouter: An Open-Source Library for LLM Routing

Python · 1,356 stars · 125 forks · Updated Feb 12, 2026

Agentic networking policies and governance for agents and tools in Kubernetes

Go · 48 stars · 16 forks · Updated Feb 11, 2026

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

Go · 1,150 stars · 126 forks · Updated Feb 13, 2026

Kubernetes-native Job Queueing

Go · 2,313 stars · 522 forks · Updated Feb 13, 2026

Proposals and discussions for the AI Conformance Working Group.

20 stars · 6 forks · Updated Dec 17, 2025

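🚀 Next Generation AI One-Stop Internationalization Solution. 🚀 Next-generation AI one-stop solution for business- and consumer-facing use, supporting OpenAI, Midjourney, Claude, iFlytek Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …

TypeScript · 8,939 stars · 1,180 forks · Updated Jan 23, 2026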

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ · 1,052 stars · 140 forks · Updated Feb 13, 2026

SGLang is a high-performance serving framework for large language models and multimodal models.

Python · 23,559 stars · 4,432 forks · Updated Feb 13, 2026

A flexible, high-performance serving system for machine learning models

C++ · 6,353 stars · 2,204 forks · Updated Dec 18, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 9,713 stars · 1,006 forks · Updated Feb 4, 2026

Bootstrap Kubernetes the hard way. No scripts.

47,353 stars · 15,486 forks · Updated Apr 10, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript · 94,414 stars · 11,150 forks · Updated Feb 13, 2026

llm-d benchmark scripts and tooling

Jupyter Notebook · 47 stars · 48 forks · Updated Feb 13, 2026

Inference scheduler for llm-d

Go · 129 stars · 124 forks · Updated Feb 12, 2026

Run Slurm on Kubernetes. A Slinky project.

Go · 236 stars · 63 forks · Updated Feb 12, 2026

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python · 8,435 stars · 912 forks · Updated Feb 11, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python · 6,887 stars · 898 forks · Updated Feb 13, 2026

Helm charts for deploying models with llm-d

Go Template · 27 stars · 49 forks · Updated Feb 11, 2026

Cloud-native high-performance edge/middle/service proxy

C++ · 27,509 stars · 5,247 forks · Updated Feb 13, 2026

Cost-efficient and pluggable infrastructure components for GenAI inference

Go · 4,615 stars · 524 forks · Updated Feb 13, 2026

vLLM’s reference system for Kubernetes-native, cluster-wide deployment, with community-driven performance optimization

Python · 2,166 stars · 367 forks · Updated Feb 13, 2026

NVIDIA Inference Xfer Library (NIXL)

C++ · 882 stars · 240 forks · Updated Feb 13, 2026

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation…

Go · 5,323 stars · 736 forks · Updated Feb 13, 2026

Text-audio foundation model from Boson AI

Python · 7,914 stars · 606 forks · Updated Jan 18, 2026

GenAI inference performance benchmarking tool

Python · 145 stars · 70 forks · Updated Feb 13, 2026

An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.

Python · 17,613 stars · 2,888 forks · Updated Feb 13, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 70,221 stars · 13,434 forks · Updated Feb 13, 2026

Nano vLLM

Python · 11,673 stars · 1,569 forks · Updated Nov 3, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,095 stars · 856 forks · Updated Feb 13, 2026