Stars
🚀 Efficient implementations for emerging model architectures
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
SGLang is a high-performance serving framework for large language models and multimodal models.
LLM Inference with Deep Learning Accelerator.
Updated Jan 23, 2025
Fully open reproduction of DeepSeek-R1