Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents
verl: Volcano Engine Reinforcement Learning for LLMs
Wan: Open and Advanced Large-Scale Video Generative Models
The official GitHub page for the survey paper "A Survey of Large Language Models".
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
My learning notes/code for ML SYS.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Efficient and easy multi-instance LLM serving
Chat2Graph: Graph Native Agentic System.
Integrating SSE with NVIDIA Triton Inference Server using a Python backend and Zephyr model. There is very little documentation on how to use NVIDIA Triton in streaming use cases (hard to find in their…