FastLM

CXL-SpecKV Public

[FPGA'26 Best Paper Nomination] CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving

C++ 34 6

tinyserve-vllm Public

[ACM MM 2025 Oral] TinyServe: Query-Aware Page Allocation Optimization

Shell 12 2

CSV-Decode Public

CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference

Python 12

SPI_VecDB Public

[VecDB @ VLDB 2026] SPI: Query-Depth-Adaptive Indexing for Streaming RAG in Vector Databases

Go 10

HSGM Public

[ICPADS 2025 Oral, *SEM 2025 Oral] HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics

Python 8

MKA Public

[ACM CF'26 Oral] MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning

Python 8 1
