🎯 Focusing
Undergrad @ SJTU ACM Class; RA @uw-syfi. Distributed & ML Systems.
- Shanghai Jiao Tong University
- Seattle, WA ⇌ Shanghai, China
- Time zone: UTC -08:00
- https://conless.dev/
- @conlesspan
Highlights
- Pro
Stars
3 stars written in CUDA
FlashInfer: Kernel Library for LLM Serving
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference