vMaroon

Organizations

@IBM @stolostron @neuralmagic @llm-d

General agent evaluation framework

Python 51 9 Updated Apr 15, 2026

A personal PR-review extension.

Python 1 Updated Mar 9, 2026

Standardized Serverless ML Inference Platform on Kubernetes

Go 3 20 Updated Apr 17, 2026

A framework for efficient model inference with omni-modality models

Python 4,383 784 Updated Apr 18, 2026

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 859 101 Updated Apr 7, 2026

Span Queries: What if we had a way to plan and optimize GenAI like we do for SQL?

Rust 13 8 Updated Apr 18, 2026

Main Kagenti repo - installer, UI and docs

Python 182 74 Updated Apr 17, 2026

GenAI inference performance benchmarking tool

Python 174 85 Updated Apr 16, 2026

Incubating P/D sidecar for llm-d

Go 17 29 Updated Nov 13, 2025

llm-d benchmark scripts and tooling

Python 57 70 Updated Apr 18, 2026

A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual heavy models.

Go 115 74 Updated Apr 16, 2026

Helm charts for llm-d

Shell 52 57 Updated Jul 22, 2025

Inference scheduler for llm-d

Go 168 166 Updated Apr 17, 2026

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 3,015 419 Updated Apr 17, 2026

Distributed KV cache scheduling & offloading libraries

Go 129 111 Updated Apr 16, 2026
Python 105 26 Updated Jul 21, 2025

Gateway API Inference Extension

Go 644 279 Updated Apr 18, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 77,143 15,770 Updated Apr 18, 2026

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,807 1,033 Updated Mar 30, 2026

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 2,281 389 Updated Apr 16, 2026
Go 1 Updated Jan 28, 2025

LangChain for Go, the easiest way to write LLM-based programs in Go

Go 9,103 1,083 Updated Jan 11, 2026

GUI tool for visualizing the result data of deBruijn sequence complexity distribution study

C++ 2 Updated Feb 20, 2024

KubeStellar - a flexible solution for multi-cluster configuration management for edge, multi-cloud, and hybrid cloud

Go 655 262 Updated Apr 16, 2026

The main repository for the multicluster global hub

Go 22 35 Updated Apr 18, 2026