Starred repositories
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
This project shares technical principles and hands-on experience with large models (LLM engineering and real-world LLM application deployment).
Kubernetes-native AI serving platform for scalable model serving.
A Datacenter Scale Distributed Inference Serving Framework
Community maintained hardware plugin for vLLM on Ascend
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A high-throughput and memory-efficient inference and serving engine for LLMs
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm and application engineers.
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
AI fundamentals - GPU architecture, CUDA programming, LLM basics, and AI Agent topics.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
SGLang is a high-performance serving framework for large language models and multimodal models.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
tanjunchen / open-webui
Forked from open-webui/open-webui
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
AIInfra (AI infrastructure) covers the AI system stack, from underlying hardware such as chips up through the software layers that support large-model training and inference.
My learning notes for ML SYS.
Large-model knowledge sharing that anyone can understand - a must-read before LLM interviews in spring/autumn recruiting, so you can discuss the material confidently with interviewers.
FlashMLA: Efficient Multi-head Latent Attention Kernels