- Univ. of Sci. & Tech. of China (USTC)
- China
- https://gitee.com/wangxuan95
- https://www.zhihu.com/people/wang-xuan-12-89/posts
AI & LLM
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
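Unstructured sparsity prunes individual weights with no block pattern, so the speedup comes from storing and multiplying only the nonzeros. Flash-LLM does this with custom GPU kernels; the SciPy sketch below only illustrates the store-and-skip idea on CPU, with illustrative names and shapes that are not from the repo.

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)

# Dense weight matrix with ~80% of entries pruned to zero (no block pattern).
W = rng.standard_normal((1024, 1024)).astype(np.float32)
W[rng.random(W.shape) < 0.8] = 0.0

W_sparse = csr_matrix(W)            # store only the ~20% nonzeros
x = rng.standard_normal((1024, 1)).astype(np.float32)

y_dense = W @ x                     # reference dense matmul
y_sparse = W_sparse @ x             # SpMM touches only nonzero weights

assert np.allclose(y_dense, y_sparse, atol=1e-4)
print(f"stored nonzeros: {W_sparse.nnz} of {W.size}")
```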
Evaluating LLMs on the MMLU dataset.
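A common MMLU recipe formats each question with its four options, scores each answer letter by model log-likelihood, and takes the argmax. A minimal sketch, assuming a scoring callable `loglikelihood(prompt, continuation)` that a real harness would supply; the names here are hypothetical, not this repo's API.

```python
import random

CHOICES = "ABCD"

def answer_mmlu(question, options, loglikelihood):
    """Score each choice letter with the model and pick the argmax."""
    prompt = question + "\n" + "\n".join(
        f"{c}. {o}" for c, o in zip(CHOICES, options)
    ) + "\nAnswer:"
    scores = [loglikelihood(prompt, f" {c}") for c in CHOICES]
    return CHOICES[scores.index(max(scores))]

def accuracy(dataset, loglikelihood):
    """dataset: iterable of (question, options, gold_letter) triples."""
    hits = [answer_mmlu(q, o, loglikelihood) == g for q, o, g in dataset]
    return sum(hits) / len(hits)

# Placeholder scorer so the sketch runs; swap in a real model's
# log-likelihood to get meaningful numbers.
demo = [("2 + 2 = ?", ["3", "4", "5", "6"], "B")]
print(accuracy(demo, lambda p, c: random.random()))
```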
A collection of benchmarks and datasets for evaluating LLMs.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
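At its core, such training is a plain next-token cross-entropy loop; nanoGPT's actual train.py layers AMP, gradient accumulation, and LR scheduling on top of this shape. A minimal sketch, assuming a model whose forward returns logits of shape (batch, seq, vocab); the tiny stand-in model is illustrative only.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x, y):
    """One next-token training step.
    x, y: (batch, seq) int64 tensors; y is x shifted left by one token."""
    logits = model(x)                                  # (batch, seq, vocab)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    return loss.item()

vocab, seq, batch = 256, 32, 4
model = torch.nn.Sequential(                # stand-in for a GPT; real use
    torch.nn.Embedding(vocab, 64),          # would pass a transformer here
    torch.nn.Linear(64, vocab),
)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
x = torch.randint(0, vocab, (batch, seq))
y = torch.roll(x, shifts=-1, dims=1)        # next-token targets
print(train_step(model, opt, x, y))
```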
📰 Must-read papers on KV Cache Compression (constantly updated 🤗).
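For context, the KV cache stores every past token's key/value tensors, so memory grows linearly with context length; compression methods evict or quantize entries to cap that growth. Below is a toy sketch of one eviction policy from this literature (attention sinks plus a recent window, in the spirit of StreamingLLM); shapes and parameters are made up.

```python
import torch

def evict(keys, values, n_sink=4, window=1024):
    """keys, values: (seq_len, n_heads, head_dim) cached tensors.
    Keep the first n_sink tokens plus the most recent `window` tokens;
    drop everything in between."""
    seq_len = keys.size(0)
    if seq_len <= n_sink + window:
        return keys, values                      # nothing to evict yet
    keep = torch.cat([
        torch.arange(n_sink),                    # attention-sink tokens
        torch.arange(seq_len - window, seq_len)  # recent window
    ])
    return keys[keep], values[keep]

k = torch.randn(5000, 8, 64)
v = torch.randn(5000, 8, 64)
k2, v2 = evict(k, v)
print(k.shape, "->", k2.shape)   # [5000, 8, 64] -> [1028, 8, 64]
```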
AISystem refers to AI systems, covering full-stack low-level technologies such as AI chips, AI compilers, and AI inference and training frameworks.
Official implementation of Yuan, Liu, Zhong et al., "KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches" (EMNLP Findings 2024).
BitBLAS is a library supporting mixed-precision matrix multiplication, especially for quantized LLM deployment.
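Mixed precision here typically means, e.g., fp16 activations against int4 weights, and the numerics reduce to per-group dequantization folded into the matmul. The sketch below shows those numerics in plain PyTorch; it is not BitBLAS's API, which generates fused GPU kernels instead, and the group size is an assumed example value.

```python
import torch

def quantize_int4(w, group=128):
    """Symmetric per-group int4 quantization of a (out, in) weight matrix."""
    g = w.reshape(-1, group)
    scale = g.abs().amax(dim=1, keepdim=True) / 7     # int4 range [-8, 7]
    q = torch.clamp(torch.round(g / scale), -8, 7)
    return q, scale

def matmul_w4a16(x, q, scale, out_features):
    """Dequantize int4 weights to the activation dtype, then matmul.
    A fused mixed-precision kernel does this inline, never materializing
    the full-precision weight in memory."""
    w = (q * scale).reshape(out_features, -1).to(x.dtype)
    return x @ w.t()

x = torch.randn(2, 512)             # activations (fp16 on GPU in practice)
w = torch.randn(256, 512)
q, s = quantize_int4(w)
y = matmul_w4a16(x, q, s, out_features=256)
print(y.shape, (y - x @ w.t()).abs().max())   # shape and quantization error
```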
Generative Models by Stability AI
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache papers with code.
Daily updated LLM papers. Subscribe if interested 👏, and leave a 🌟 if you like it.
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
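The idea is to reparameterize each weight as a few signed powers of two, so multiplying by it needs only exponent shifts and adds. The toy sketch below illustrates that arithmetic only; the paper's actual post-training method (binary-coded quantization with lookup tables) is more involved, and the greedy fit here is an assumption for illustration.

```python
import math

def to_shift_add(w, terms=3):
    """Approximate weight w as a sum of signed powers of two:
    w ~= sum(sign_i * 2**k_i). Greedy fit on the residual."""
    pairs, r = [], w
    for _ in range(terms):
        if r == 0:
            break
        k = round(math.log2(abs(r)))
        s = 1 if r > 0 else -1
        pairs.append((s, k))
        r -= s * 2.0 ** k
    return pairs

def shift_add_mul(x, pairs):
    """Multiply x by the reparameterized weight using only exponent
    shifts (math.ldexp) plus sign flips and adds; no general multiply."""
    return sum(s * math.ldexp(x, k) for s, k in pairs)

w = 0.8125                          # example weight
pairs = to_shift_add(w)             # [(1, 0), (-1, -2), (1, -4)]
x = 3.0
print(shift_add_mul(x, pairs), x * w)   # both 2.4375 for this weight
```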