Zhejiang University
Stars
A guidance language for controlling large language models.
An open-source, low-code machine learning library in Python
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
A throughput-oriented high-performance serving framework for LLMs
Disaggregated serving system for Large Language Models (LLMs).
FastKAN: Very Fast Implementation of Kolmogorov-Arnold Networks (KAN)
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)