- Xiamen University
- Xiamen, Fujian Province, China
- https://zyxxmu.github.io/
Stars
[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
Unified KV Cache Compression Methods for Auto-Regressive Models
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, attention is computed with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filling…
Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".
[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation (a minimal decomposition sketch follows this list).
📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
[ICML 2024 Oral] Official implementation of our paper "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention".
[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (a toy eviction sketch follows this list).
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
[ICLR 2024] Official PyTorch implementation of "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs".
Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity".
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Instruct-tune LLaMA on consumer hardware
An open-source tool-augmented conversational language model from Fudan University
【LLMs 九层妖塔 (Nine-Story Demon Tower)】Hands-on practice and experience with LLMs across natural language processing (ChatGLM, Chinese-LLaMA-Alpaca, Vicuna, LLaMA, GPT4ALL, etc.), information retrieval (langchain), speech synthesis, speech recognition, multimodality (Stable Diffusion, MiniGPT-4, VisualGLM-6B, Ziya-Visual, etc.), and other areas.
ImageBind One Embedding Space to Bind Them All
Unified Normalization (ACM MM'22), by Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang Pu. This repository is the official implementation of "Unifie…
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
A framework for few-shot evaluation of language models.
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
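To make the parameter-efficient fine-tuning entries above concrete, here is a minimal LoRA sketch using the 🤗 PEFT and Transformers APIs; the base checkpoint path, target modules, and hyperparameters are illustrative assumptions rather than settings taken from any repository listed here.

```python
# Minimal LoRA fine-tuning setup with 🤗 PEFT (illustrative sketch).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "path/to/base-model"  # placeholder checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # low-rank dimension
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in LLaMA-style blocks
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()         # only the LoRA adapters require gradients
```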
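The DoRA entry decomposes each pretrained weight into a magnitude vector and a direction that is updated through a LoRA branch, W' = m · (W0 + BA) / ||W0 + BA||. The self-contained PyTorch toy below sketches that decomposition; it is an illustration of the idea, not the official implementation, and the dimensions and initialization scales are assumptions.

```python
# Toy sketch of DoRA-style weight decomposition (not the official code).
# Merged weight: W' = m * (W0 + B @ A) / column_norm(W0 + B @ A),
# with m initialized to the column norms of the pretrained W0.
import torch

torch.manual_seed(0)
d_out, d_in, r = 64, 64, 8

W0 = torch.randn(d_out, d_in)          # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01        # LoRA down-projection (trainable)
B = torch.zeros(d_out, r)              # LoRA up-projection (trainable, zero-init)
m = W0.norm(dim=0, keepdim=True)       # magnitude vector, one entry per column

V = W0 + B @ A                         # updated direction
W_merged = m * V / V.norm(dim=0, keepdim=True)

# With B zero-initialized, the merged weight equals W0 before any training step.
assert torch.allclose(W_merged, W0, atol=1e-5)
```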
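Several KV-cache entries in the list (RocketKV, the unified compression toolkit, H2O) center on evicting cached tokens that attract little attention. The toy PyTorch sketch below illustrates H2O-style heavy-hitter scoring, keeping the top-scoring tokens plus a recent window; the budget split and shapes are assumptions, and this is not the paper's implementation.

```python
# Toy sketch of heavy-hitter KV-cache eviction (illustrative only).
import torch

torch.manual_seed(0)
seq_len, budget, recent = 128, 24, 8   # keep 24 heavy hitters + 8 most recent tokens

# Causal attention weights for one head: rows = queries, cols = cached keys.
mask = torch.full((seq_len, seq_len), float("-inf")).triu(1)
attn = torch.softmax(torch.randn(seq_len, seq_len) + mask, dim=-1)

# Accumulated attention each cached token has received ("heavy-hitter" score).
scores = attn.sum(dim=0)

# Always keep the most recent tokens; fill the rest of the budget with top scorers.
recent_ids = torch.arange(seq_len - recent, seq_len)
scores[recent_ids] = float("inf")      # protect recent tokens from eviction
keep_ids = torch.topk(scores, k=budget + recent).indices.sort().values

print(f"kept {keep_ids.numel()} of {seq_len} cached tokens:", keep_ids.tolist())
```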