NIVIC
- HeFei
Stars
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Industry-first gRPC pipeline, KV cache-aware routing, chat histor…
Self-hosted, open-source agent skill registry for enterprises. Publish & version skill packages, govern with RBAC and audit logs, deploy on-premise with Docker or Kubernetes.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
whybeyoung / sglang
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Astron-xmod-shim — Lightweight, declarative middleware for reliably converging AI service workloads.
antgroup / sglang
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Cross-platform AI workflow DSL converter supporting iFlytek Spark, Dify, and Coze platforms with unified intermediate representation and bidirectional transformation capabilities.
A workload for deploying LLM inference services on Kubernetes
whybeyoung / go-openai
Forked from sashabaranov/go-openai
OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
fzyzcjy / sglang
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
This is a simple implementation of an MCP server using iFlytek. It enables calling iFlytek workflows through MCP tools.
A lightweight data processing framework built on DuckDB and 3FS.
Analyze computation-communication overlap in V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
My learning notes for ML SYS.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
SGLang is a high-performance serving framework for large language models and multimodal models.
🪄 Turns your machine learning code into microservices with web API, interactive GUI, and more.
A collection of community maintained NRI plugins
SciLifeLab Serve is a platform offering machine learning model serving, data science app hosting (Shiny, Gradio, Streamlit, Dash, etc.), and other tools to life science researchers affiliated with …
Examples of models deployable with Truss