fmchisel: Efficient Compression and Training Algorithms for Foundation Models

Python 61 6 Updated Oct 9, 2025

Efficient Triton Kernels for LLM Training

Python 5,725 413 Updated Oct 8, 2025

Make SGLang go brrr

35 8 Updated Sep 30, 2025

Ultra and Unified CCL

C++ 578 49 Updated Oct 9, 2025

The official Rust SDK for the Model Context Protocol

Rust 2,354 375 Updated Oct 9, 2025

Build resilient language agents as graphs.

Python 19,530 3,432 Updated Oct 9, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,262 631 Updated Oct 9, 2025

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Rust 10,396 711 Updated Oct 9, 2025

A command-line interface tool for serving LLMs using vLLM.

Python 425 18 Updated Aug 25, 2025

Genesis is a lightweight deep learning framework written from scratch in Python, with Triton as its backend for high-performance computing.

Python 36 6 Updated Sep 25, 2025

A minimal, easy-to-read PyTorch reimplementation of the Qwen3 and Qwen2.5 VL with a fancy CLI

Python 167 7 Updated Sep 3, 2025

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,298 217 Updated Sep 26, 2025

some tiny learning projects in Rust

Rust 701 87 Updated Aug 29, 2024

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

JavaScript 111,932 15,478 Updated Oct 8, 2025

Examples and guides for using the OpenAI API

Jupyter Notebook 68,380 11,407 Updated Oct 9, 2025

MIT 6.828: Operating System Engineering lab / JOS

C 68 15 Updated Aug 1, 2018

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 100 17 Updated Oct 3, 2025

Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.

Python 218 23 Updated Oct 4, 2025

OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)

Go 286 45 Updated Oct 9, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,097 392 Updated Sep 10, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 44,907 7,653 Updated Dec 9, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,069 389 Updated Oct 9, 2025

Nano vLLM

Python 7,005 891 Updated Aug 31, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 14,114 2,515 Updated Oct 9, 2025

A Java library to use the OpenAI API in the simplest possible way.

Java 344 49 Updated Sep 17, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,920 285 Updated May 15, 2025