Jeffwan
  • Bytedance
  • Seattle, WA

Highlights

  • Pro

Organizations

@todogroup @kubernetes @ray-project @kubeflow @kubernetes-sigs @volcano-sh @aibrix

A framework for efficient model inference with omni-modality models

Python 4,119 687 Updated Apr 4, 2026

Easy, Fast, and Scalable Multimodal AI

Python 121 8 Updated Apr 3, 2026

High Performance KV Cache Store for LLM

C 52 8 Updated Mar 31, 2026

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 840 95 Updated Apr 4, 2026

An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, subagents and message gateway, it handles different levels of…

Python 57,718 7,162 Updated Apr 4, 2026

KV cache store for distributed LLM inference

C++ 402 36 Updated Nov 13, 2025

A fast, clean, responsive Hugo theme.

HTML 13,323 3,361 Updated Mar 22, 2026

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,708 546 Updated Apr 4, 2026

Universal LLM Deployment Engine with ML Compilation

Python 22,337 1,987 Updated Apr 2, 2026

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 633 84 Updated Sep 11, 2024

Command line tool to create and query container image manifest list/indexes

Go 835 99 Updated Mar 30, 2026

Reproduction of Pre-warming is Not Enough (SoCC'24)

Scala 7 2 Updated Aug 12, 2025

Jupyter Notebook 178 11 Updated Mar 12, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,040 651 Updated Apr 3, 2026

Predict the performance of LLM inference services

Jupyter Notebook 23 1 Updated Sep 18, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 952 48 Updated Mar 29, 2026

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

C++ 947 131 Updated Jan 22, 2026

Efficient and easy multi-instance LLM serving

Python 541 47 Updated Mar 12, 2026

A library developed by Volcano Engine for high-performance reading and writing of PyTorch model files.

Python 25 7 Updated Jan 2, 2025

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,905 123 Updated Jan 21, 2024

Stateless cluster local OCI registry mirror.

Go 3,566 141 Updated Apr 3, 2026

Serverless LLM Serving for Everyone.

Python 667 69 Updated Mar 6, 2026

Fast Distributed Inference Serving for Large Language Models

4 Updated Oct 18, 2023

Distributed Model Serving Framework

Java 187 79 Updated Mar 19, 2026

Custom controller that extends the Horizontal Pod Autoscaler

Go 240 32 Updated Apr 1, 2026

Papers and their code for AI systems

357 23 Updated Feb 10, 2026

SpotServe: Serving Generative Large Language Models on Preemptible Instances

134 15 Updated Feb 22, 2024

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python 3,743 310 Updated May 21, 2025

[COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild

Python 4,745 525 Updated Nov 18, 2024