Distributed Compiler based on Triton for Parallel Systems
🚀 Efficient implementations of state-of-the-art linear attention models
A Datacenter Scale Distributed Inference Serving Framework
Production-tested AI infrastructure tools for efficient AGI development, built through community-driven innovation
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
PaddlePaddle's high-performance deep learning inference engine for mobile and edge devices
A toolkit for visual classification and recognition powered by PaddlePaddle
PaddleSlim is an open-source library for deep model compression and neural architecture search.