Organizations

@IBM @stolostron @neuralmagic @llm-d

A personal PR-review extension.

Python 1 Updated Mar 9, 2026

Standardized Serverless ML Inference Platform on Kubernetes

Go 3 19 Updated Mar 31, 2026

A framework for efficient model inference with omni-modality models

Python 4,088 669 Updated Apr 1, 2026

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 832 95 Updated Apr 1, 2026

Span Queries: What if we had a way to plan and optimize GenAI like we do for SQL?

Rust 13 8 Updated Mar 31, 2026

Main Kagenti repo - installer, UI and docs

Python 164 65 Updated Apr 1, 2026

GenAI inference performance benchmarking tool

Python 162 77 Updated Mar 25, 2026

Incubating P/D sidecar for llm-d

Go 16 29 Updated Nov 13, 2025

llm-d benchmark scripts and tooling

Python 53 64 Updated Apr 1, 2026

A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual heavy models.

Go 105 68 Updated Apr 1, 2026

Helm charts for llm-d

Shell 52 56 Updated Jul 22, 2025

Inference scheduler for llm-d

Go 158 150 Updated Apr 1, 2026

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,875 384 Updated Apr 1, 2026

Distributed KV cache scheduling & offloading libraries

Go 122 107 Updated Mar 31, 2026
Python 104 27 Updated Jul 21, 2025

Gateway API Inference Extension

Go 634 274 Updated Apr 1, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 74,919 15,050 Updated Apr 1, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,785 1,025 Updated Mar 30, 2026

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,249 380 Updated Mar 31, 2026
Go 1 Updated Jan 28, 2025

LangChain for Go, the easiest way to write LLM-based programs in Go

Go 8,981 1,069 Updated Jan 11, 2026

GUI tool for visualizing the result data of a de Bruijn sequence complexity distribution study

C++ 2 Updated Feb 20, 2024

KubeStellar - a flexible solution for multi-cluster configuration management for edge, multi-cloud, and hybrid cloud

Go 651 258 Updated Mar 30, 2026

The main repository for the multicluster global hub

Go 22 35 Updated Apr 1, 2026