Stars
A collection of modern C++ libraries, including coro_http, coro_rpc, compile-time reflection, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple, etc.
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
[ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
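A minimal sketch of the general idea behind LSH-sampled attention, not MagicPIG's actual implementation: keys are hashed with random hyperplanes, and attention is estimated over only the keys whose signature collides with the query's. All names and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_signatures(x, planes):
    # Sign of the projection onto each random hyperplane -> bit signature.
    return (x @ planes.T > 0).astype(np.int8)

def sampled_attention(q, K, V, planes):
    """Approximate attention over only the keys whose LSH signature
    matches the query's (a crude stand-in for sampling-based attention)."""
    q_sig = lsh_signatures(q[None, :], planes)[0]
    k_sig = lsh_signatures(K, planes)
    hits = np.where((k_sig == q_sig).all(axis=1))[0]
    if hits.size == 0:                      # no collisions: fall back to full attention
        hits = np.arange(K.shape[0])
    scores = K[hits] @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[hits]

d, n, bits = 64, 4096, 8
planes = rng.standard_normal((bits, d))
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
print(sampled_attention(q, K, V, planes).shape)  # (64,)
```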
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
A scalable and robust tree-based speculative decoding algorithm
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
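Both this entry and the previous one build on speculative decoding. Below is a toy sketch of the core draft-then-verify loop in its simple chain form (tree-based methods such as Sequoia and EAGLE verify a whole tree of candidate continuations instead); the "models" are random stand-ins and every name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB = 16

def toy_dist(ctx, temperature):
    # Stand-in for a model's next-token distribution (hash of context -> probs).
    r = np.random.default_rng(abs(hash(tuple(ctx))) % (2**32))
    logits = r.standard_normal(VOCAB) / temperature
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(ctx, k=4):
    """One round of draft-then-verify speculative decoding (chain version)."""
    draft_tokens, draft_probs = [], []
    c = list(ctx)
    for _ in range(k):                      # cheap draft model proposes k tokens
        p = toy_dist(c, temperature=2.0)
        t = rng.choice(VOCAB, p=p)
        draft_tokens.append(t); draft_probs.append(p)
        c.append(t)
    accepted = []
    c = list(ctx)
    for t, p_d in zip(draft_tokens, draft_probs):
        p_t = toy_dist(c, temperature=1.0)  # expensive target model verifies
        if rng.random() < min(1.0, p_t[t] / p_d[t]):
            accepted.append(t); c.append(t)
        else:                               # reject: resample from the residual
            resid = np.maximum(p_t - p_d, 0)
            resid = resid / resid.sum() if resid.sum() > 0 else p_t
            accepted.append(rng.choice(VOCAB, p=resid))
            break
    return accepted

print(speculative_step([0, 1, 2]))
```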
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Efficient and easy multi-instance LLM serving
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
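As a rough illustration of score-based KV cache management (not InfiniGen's actual algorithm, which speculatively prefetches the KV entries the next layer will attend to from CPU memory), the sketch below evicts the entries with the least accumulated attention mass:

```python
import numpy as np

def prune_kv(K, V, attn_history, budget):
    """Keep only the `budget` KV entries with the largest accumulated
    attention mass -- a generic score-based eviction policy."""
    keep = np.argsort(attn_history)[-budget:]
    keep.sort()                          # preserve positional order
    return K[keep], V[keep], keep

rng = np.random.default_rng(2)
n, d = 1024, 64
K, V = rng.standard_normal((n, d)), rng.standard_normal((n, d))
attn_history = rng.random(n)             # e.g. attention weights summed over steps
K2, V2, kept = prune_kv(K, V, attn_history, budget=256)
print(K2.shape, kept[:5])
```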
A high-throughput and memory-efficient inference and serving engine for LLMs
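Much of vLLM's memory efficiency comes from PagedAttention, which stores the KV cache in fixed-size blocks addressed through per-sequence block tables, much like virtual-memory paging. A toy allocator sketching that idea (not vLLM's actual code):

```python
class PagedKVCache:
    """Toy paged KV-cache allocator: each sequence maps logical token
    positions to fixed-size physical blocks through a block table, so
    memory is allocated on demand and reclaimed blocks are reused."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.seq_lens = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a KV slot for one new token; allocate a block if needed."""
        n = self.seq_lens.get(seq_id, 0)
        if n % self.block_size == 0:            # current block full (or none yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; preempt a sequence")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = n + 1
        block = self.block_tables[seq_id][n // self.block_size]
        return block, n % self.block_size       # physical (block, offset)

    def free_sequence(self, seq_id):
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=16)
for _ in range(20):
    slot = cache.append_token("req-0")
print(slot)                  # (physical_block, offset) for the 20th token
cache.free_sequence("req-0")
```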
Cost-efficient and pluggable Infrastructure components for GenAI inference
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
Supercharge Your LLM with the Fastest KV Cache Layer
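The core pattern behind a KV cache layer is reusing previously computed KV tensors across requests that share a token prefix. A toy prefix-keyed store illustrating that pattern (hypothetical API, not LMCache's; the real system adds chunking, eviction, and cross-node sharing):

```python
import hashlib

class PrefixKVStore:
    """Toy prefix-keyed KV store: computed KV tensors are cached under a
    hash of the token prefix so later requests sharing that prefix can
    skip recomputation."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(str(token_ids).encode("utf-8")).hexdigest()

    def lookup(self, token_ids):
        """Return cached KV for the longest cached prefix of token_ids."""
        for end in range(len(token_ids), 0, -1):
            kv = self._store.get(self._key(token_ids[:end]))
            if kv is not None:
                return end, kv          # tokens covered, cached KV payload
        return 0, None

    def insert(self, token_ids, kv):
        self._store[self._key(token_ids)] = kv

store = PrefixKVStore()
store.insert([1, 2, 3], kv="kv-for-[1,2,3]")      # payload is a placeholder
print(store.lookup([1, 2, 3, 4]))                  # (3, 'kv-for-[1,2,3]')
```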
A generative world for general-purpose robotics & embodied AI learning.
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
🔥 SpatialVLA: a spatial-enhanced vision-language-action model trained on 1.1 million real robot episodes. Accepted at RSS 2025.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
This repository collects papers on VLLM applications; new papers will be added irregularly.
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
OpenVLA: An open-source vision-language-action model for robotic manipulation (forked from TRI-ML/prismatic-vlms).
Collection of AWESOME vision-language models for vision tasks
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
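The paper's cache policy can be paraphrased in a few lines: always retain a handful of initial "attention sink" tokens plus a sliding window of the most recent tokens. A sketch with illustrative parameter values:

```python
def streaming_keep_indices(seq_len, n_sink=4, window=1020):
    """Indices of KV entries retained under an attention-sink policy:
    always keep the first `n_sink` tokens plus the most recent `window`
    tokens (a paraphrase of StreamingLLM's cache policy; the parameter
    values here are illustrative)."""
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    sinks = list(range(n_sink))
    recent = list(range(seq_len - window, seq_len))
    return sinks + recent

print(len(streaming_keep_indices(10_000)))   # 1024 entries kept
print(streaming_keep_indices(10_000)[:6])    # [0, 1, 2, 3, 8980, 8981]
```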
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
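A rough sketch of the chunk-sparse access pattern in ShadowKV-style systems, hedged as a simplification (the paper itself additionally keeps low-rank pre-RoPE keys on GPU): score each chunk of offloaded keys by a "landmark" and fetch only the top-scoring chunks for attention.

```python
import numpy as np

def select_chunks(q, K_cpu, chunk=32, top_k=8):
    """Generic chunk-sparse KV selection: score each chunk of offloaded
    keys by its mean ("landmark") key and gather only the top-k chunks.
    A sketch of the high-level pattern, not the paper's exact method."""
    n, d = K_cpu.shape
    n_chunks = n // chunk
    landmarks = K_cpu[: n_chunks * chunk].reshape(n_chunks, chunk, d).mean(axis=1)
    scores = landmarks @ q                      # one score per chunk
    best = np.argsort(scores)[-top_k:]
    idx = np.concatenate([np.arange(c * chunk, (c + 1) * chunk) for c in best])
    return np.sort(idx)                         # token indices to fetch to GPU

rng = np.random.default_rng(3)
q, K_cpu = rng.standard_normal(64), rng.standard_normal((4096, 64))
print(select_chunks(q, K_cpu).shape)            # (256,) = 8 chunks x 32 tokens
```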
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
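One common family of techniques in this literature is KV cache quantization; the sketch below shows generic per-row symmetric int8 quantization, not any specific paper's method (others prune, merge, or low-rank-project the cache).

```python
import numpy as np

def quantize_kv(x):
    """Per-row symmetric int8 quantization of a KV tensor: each row is
    scaled so its max magnitude maps to 127, cutting memory to a quarter
    of float32."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)   # avoid divide-by-zero rows
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(4)
K = rng.standard_normal((1024, 64)).astype(np.float32)
q8, s = quantize_kv(K)
err = np.abs(dequantize_kv(q8, s) - K).max()
print(q8.nbytes / K.nbytes, f"max abs error {err:.4f}")   # 0.25x memory
```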