🎯 Focusing
Ph.D. @ HKUST; B.Eng @ BUAA
- The Hong Kong University of Science and Technology, Hong Kong SAR
- (UTC +08:00)
- https://harahan.github.io/
- https://orcid.org/0009-0002-7898-8402
Stars (3 repositories, written in C++)
- FlashMLA: Efficient Multi-head Latent Attention Kernels
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
- [ACL 2025] RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation