Stars
A generative world for general-purpose robotics & embodied AI learning.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Achieve state-of-the-art inference performance with modern accelerators on Kubernetes
A high-throughput and memory-efficient inference and serving engine for LLMs
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
The official Python client for the Hugging Face Hub.
Cost-efficient and pluggable Infrastructure components for GenAI inference
Supercharge Your LLM with the Fastest KV Cache Layer
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A debugging and profiling tool that can trace and visualize python code execution
A collection of modern C++ libraries, including coro_http, coro_rpc, compile-time reflection, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple, etc.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
A curated list of research on embodied AI and robots with Large Language Models. Watch this repository for the latest updates!
A throughput-oriented high-performance serving framework for LLMs
A comprehensive list of papers using large language/multi-modal models for Robotics/RL, including papers, codes, and related websites
Collection of AWESOME vision-language models for vision tasks
Must-read papers on KV Cache Compression (constantly updated).
[Lumina Embodied AI] A technical guide to embodied intelligence (Embodied-AI-Guide)
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
This repository collects papers on VLLM applications. New papers are added irregularly.
Efficient and easy multi-instance LLM serving
TAPA compiles task-parallel HLS programs into high-performance FPGA accelerators.