- Xiamen University
- Xiamen, Fujian Province, China
- https://zyxxmu.github.io/
Stars
[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
Unified KV Cache Compression Methods for Auto-Regressive Models
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, attention is computed with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filling…
Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization".
[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation (a minimal decomposition sketch follows this list).
📚 A curated list of awesome LLM/VLM inference papers with code: Flash-Attention, Paged-Attention, WINT8/4, parallelism, etc. 🎉
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
[ICML 2024 Oral] Official implementation of our paper "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention".
[CVPR 2024] DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models (a toy eviction sketch follows this list).
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
[ICLR 2024] Official PyTorch implementation of "Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs".
Official PyTorch implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity".
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Instruct-tune LLaMA on consumer hardware
An open-source tool-augmented conversational language model from Fudan University
【LLMs 九层妖塔 (Nine-Story Demon Tower)】Hands-on practice and experience with LLMs across natural language processing (ChatGLM, Chinese-LLaMA-Alpaca, Vicuna, LLaMA, GPT4ALL, etc.), information retrieval (langchain), speech synthesis, speech recognition, multimodality (Stable Diffusion, MiniGPT-4, VisualGLM-6B, Ziya-Visual, etc.), and other areas.
ImageBind One Embedding Space to Bind Them All
Unified Normalization (ACM MM'22), by Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang Pu. This repository is the official implementation of "Unifie…
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
A framework for few-shot evaluation of language models.
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
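To make the parameter-efficient fine-tuning entries above concrete, here is a minimal LoRA sketch using the 🤗 PEFT and Transformers APIs; the base checkpoint path, target modules, and hyperparameters are illustrative assumptions rather than settings taken from any repository listed here.

```python
# Minimal LoRA fine-tuning setup with 🤗 PEFT (illustrative sketch).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "path/to/base-model"  # placeholder checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # low-rank dimension
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in LLaMA-style blocks
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()         # only the LoRA adapters require gradients
```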
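The DoRA entry decomposes each pretrained weight into a magnitude vector and a direction that is updated through a LoRA branch, W' = m · (W0 + BA) / ||W0 + BA||. The self-contained PyTorch toy below sketches that decomposition; it is an illustration of the idea, not the official implementation, and the dimensions and initialization scales are assumptions.

```python
# Toy sketch of DoRA-style weight decomposition (not the official code).
# Merged weight: W' = m * (W0 + B @ A) / column_norm(W0 + B @ A),
# with m initialized to the column norms of the pretrained W0.
import torch

torch.manual_seed(0)
d_out, d_in, r = 64, 64, 8

W0 = torch.randn(d_out, d_in)          # frozen pretrained weight
A = torch.randn(r, d_in) * 0.01        # LoRA down-projection (trainable)
B = torch.zeros(d_out, r)              # LoRA up-projection (trainable, zero-init)
m = W0.norm(dim=0, keepdim=True)       # magnitude vector, one entry per column

V = W0 + B @ A                         # updated direction
W_merged = m * V / V.norm(dim=0, keepdim=True)

# With B zero-initialized, the merged weight equals W0 before any training step.
assert torch.allclose(W_merged, W0, atol=1e-5)
```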
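Several KV-cache entries in the list (RocketKV, the unified compression toolkit, H2O) center on evicting cached tokens that attract little attention. The toy PyTorch sketch below illustrates H2O-style heavy-hitter scoring, keeping the top-scoring tokens plus a recent window; the budget split and shapes are assumptions, and this is not the paper's implementation.

```python
# Toy sketch of heavy-hitter KV-cache eviction (illustrative only).
import torch

torch.manual_seed(0)
seq_len, budget, recent = 128, 24, 8   # keep 24 heavy hitters + 8 most recent tokens

# Causal attention weights for one head: rows = queries, cols = cached keys.
mask = torch.full((seq_len, seq_len), float("-inf")).triu(1)
attn = torch.softmax(torch.randn(seq_len, seq_len) + mask, dim=-1)

# Accumulated attention each cached token has received ("heavy-hitter" score).
scores = attn.sum(dim=0)

# Always keep the most recent tokens; fill the rest of the budget with top scorers.
recent_ids = torch.arange(seq_len - recent, seq_len)
scores[recent_ids] = float("inf")      # protect recent tokens from eviction
keep_ids = torch.topk(scores, k=budget + recent).indices.sort().values

print(f"kept {keep_ids.numel()} of {seq_len} cached tokens:", keep_ids.tolist())
```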