Stars
3 results for sponsorable starred repositories written in Python
A high-throughput and memory-efficient inference and serving engine for LLMs
Core Python libraries ported to MicroPython
PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".