Common tools for data processing

Python 22 3 Updated Dec 8, 2025

[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression

Python 28 3 Updated Aug 7, 2025

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,294 159 Updated Jan 4, 2025

The HELMET Benchmark

Jupyter Notebook 198 37 Updated Dec 4, 2025

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python 1,170 73 Updated Sep 30, 2025

Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".

Python 184 17 Updated Apr 18, 2024

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Python 897 61 Updated Oct 1, 2024

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 4,873 330 Updated Nov 28, 2025

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Python 1,637 163 Updated Oct 28, 2024

[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Python 67 5 Updated Apr 15, 2024

[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

18 Updated Apr 16, 2024

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

Python 495 71 Updated Aug 1, 2024

The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Python 390 35 Updated Jul 9, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,164 395 Updated Jul 11, 2024

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

Python 12,481 1,979 Updated Dec 27, 2025

Official PyTorch implementation of our ICLR 2024 paper "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs"

Python 50 9 Updated Apr 9, 2024

Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"

Python 74 8 Updated Jul 7, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 41,094 4,671 Updated Dec 24, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,022 1,599 Updated Dec 24, 2025

Instruct-tune LLaMA on consumer hardware

Jupyter Notebook 18,987 2,217 Updated Jul 29, 2024

An open-source tool-augmented conversational language model from Fudan University

Python 12,084 1,140 Updated Jul 13, 2024

A simple and effective LLM pruning approach.

Python 832 121 Updated Aug 9, 2024

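[LLMs九层妖塔 (LLMs Nine-Story Demon Tower)] Sharing hands-on practice and experience with LLMs in natural language processing (ChatGLM, Chinese-LLaMA-Alpaca, Vicuna, LLaMA, GPT4ALL, etc.), information retrieval (langchain), speech synthesis, speech recognition, and multimodal domains (Stable Diffusion, MiniGPT-4, VisualGLM-6B, Ziya-Visual, etc.).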

2,142 210 Updated Mar 30, 2024

ImageBind One Embedding Space to Bind Them All

Python 8,912 835 Updated Nov 21, 2025

Unified Normalization (ACM MM'22), by Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang Pu. This repository is the official implementation of "Unifie…

Python 34 1 Updated Mar 16, 2023

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,930 383 Updated Mar 14, 2024

A framework for few-shot evaluation of language models.

Python 11,039 2,924 Updated Dec 23, 2025

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 858 115 Updated Aug 20, 2024

CAT: Collaborative Adversarial Training

Python 5 Updated May 31, 2023

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 20,346 2,138 Updated Dec 18, 2025