Works at Tencent WXG, focusing on model inference optimization: inference engines and model compression.
- Shanghai
Stars: 6 repositories (written in Python)
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
A treasure chest for visual classification and recognition powered by PaddlePaddle
🚀 Efficient implementations of state-of-the-art linear attention models
PaddleSlim is an open-source library for deep model compression and architecture search.
Distributed Compiler based on Triton for Parallel Systems