Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
FlashMLA: Efficient Multi-head Latent Attention Kernels
Transformer-related optimization, including BERT and GPT
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Revive unavailable songs for Netease Cloud Music