🐒
Making AI Safer
Making AI Safer. Focus on LLM, RL, Infra
3 stars written in C++
FlashMLA: Efficient Multi-head Latent Attention Kernels
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention