Works at Tencent WXG, focusing on model inference optimization: inference engines and model compression.
- Shanghai
Stars: 6 repositories (written in Python)
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
A treasure chest for visual classification and recognition powered by PaddlePaddle
🚀 Efficient implementations of state-of-the-art linear attention models
PaddleSlim is an open-source library for deep model compression and architecture search.
Distributed Compiler based on Triton for Parallel Systems