Works at Tencent-WXG, focusing on model inference optimization, including inference engines and model compression.
- Shanghai
Stars
A high-throughput and memory-efficient inference and serving engine for LLMs