
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat histor…

Rust 173 49 Updated Apr 19, 2026

Self-hosted, open-source agent skill registry for enterprises. Publish & version skill packages, govern with RBAC and audit logs, deploy on-premise with Docker or Kubernetes.

Java 2,670 332 Updated Apr 17, 2026

A collection of Chinese text-to-image Stable Diffusion models

418 24 Updated Feb 11, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 4,024 585 Updated Mar 13, 2026

rdma_demo

Python 1 Updated Apr 9, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 1 Updated Apr 17, 2026

Astron-xmod-shim — Lightweight, declarative middleware for reliably converging AI service workloads.

Go 101 16 Updated Nov 3, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 30 5 Updated Apr 18, 2026

Cross-platform AI workflow DSL converter supporting iFlytek Spark, Dify, and Coze platforms with unified intermediate representation and bidirectional transformation capabilities.

Go 23 3 Updated Mar 3, 2026
Go 81 6 Updated Sep 15, 2025

A workload for deploying LLM inference services on Kubernetes

Go 206 54 Updated Apr 14, 2026

OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go

Go 2 Updated Sep 22, 2025

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

Go 424 74 Updated Apr 18, 2026

SGLang is a fast serving framework for large language models and vision language models.

Python 7 1 Updated Apr 6, 2026

This is a simple implementation of an MCP server using iFlytek. It enables calling iFlytek workflows through MCP tools.

Python 27 8 Updated Mar 28, 2025

An easy version of pyverbs

Python 6 2 Updated Apr 16, 2023

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,947 442 Updated Mar 5, 2025

Expert Parallelism Load Balancer

Python 1,359 201 Updated Mar 24, 2025

Analyze computation-communication overlap in V3/R1.

1,149 145 Updated Mar 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,943 320 Updated Jan 14, 2026

My learning notes for ML SYS.

Python 6,048 396 Updated Apr 8, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,618 887 Updated Apr 17, 2026

DeepEP: an efficient expert-parallel communication library

Cuda 9,136 1,154 Updated Apr 16, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,555 1,008 Updated Apr 7, 2026

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,973 287 Updated May 15, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 26,074 5,444 Updated Apr 19, 2026

🪄 Turns your machine learning code into microservices with web API, interactive GUI, and more.

Python 3,137 166 Updated Mar 30, 2026

A collection of community maintained NRI plugins

Go 102 34 Updated Apr 16, 2026

SciLifeLab Serve is a platform offering machine learning model serving, data science app hosting (Shiny, Gradio, Streamlit, Dash, etc.), and other tools to life science researchers affiliated with …

Python 14 3 Updated Apr 17, 2026

Examples of models deployable with Truss

Python 223 60 Updated Apr 16, 2026