Stars
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Official Repo for paper "VLCache: Computing 2% Vision Tokens and Reusing 98% for Vision-Language Inference"
[NeurIPS'25] The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"
A safetensors extension to efficiently store sparse quantized tensors on disk
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
A collection of noteworthy MLSys bloggers (Algorithms/Systems)
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling
Fully open reproduction of DeepSeek-R1
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
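Llama Chinese Community: a real-time roundup of the latest Llama learning resources, building the best open-source ecosystem for Chinese Llama large models; fully open source and available for commercial use
ChatGLM2-6B: An Open Bilingual Chat LLM
A repository for pretraining from scratch and then SFT-ing a small-parameter Chinese LLaMa2; a single 24 GB GPU is enough to produce a chat-llama2 with basic Chinese question-answering ability.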
Google Research
PyTorch implementation of the paper: Long-tail Learning via Logit Adjustment
EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit