CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

Welcome to the official implementation of CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models. We are thrilled to announce that our paper has been accepted for presentation at COLM 2024!

If you find our work useful in your research, please consider citing our paper:

@article{lee2024cats,
  title={CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models},
  author={Lee, Je-Yong and Lee, Donghyun and Zhang, Genghan and Tiwari, Mo and Mirhoseini, Azalia},
  journal={arXiv preprint arXiv:2404.08763},
  year={2024}
}

Overview

Our paper, "CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models," introduces CATS, a new method for reducing the computational cost of deploying LLMs without sacrificing their performance on downstream tasks. The method centers on a novel activation function that increases activation sparsity effectively and efficiently.

CATS can be applied to various base models, such as Mistral-7B and Llama2-7B, with a minimal performance drop (within 1-2% of the base models) even at 50% activation sparsity. CATS also accelerates convergence, and its custom GPU kernel improves inference speed by approximately 15%.
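
To make the idea of a thresholded activation concrete, here is a minimal, dependency-free sketch. It is an illustration under stated assumptions, not the code in this repository: it assumes the base activation is SiLU and that the cutoff is chosen as a percentile of activation magnitudes collected from calibration inputs. The function names (silu, calibrate_threshold, thresholded_activation) are hypothetical and do not refer to anything in this project.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def calibrate_threshold(activations, sparsity):
    """Pick a cutoff so that roughly `sparsity` of |activations| fall below it.

    Assumption: the cutoff is an empirical percentile of the magnitudes
    observed on calibration data.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    magnitudes = sorted(abs(a) for a in activations)
    if not magnitudes:
        raise ValueError("need at least one calibration activation")
    # Everything strictly below this element will be zeroed later.
    return magnitudes[int(sparsity * len(magnitudes))]


def thresholded_activation(pre_activations, threshold):
    """Apply SiLU, then zero every output whose magnitude is below `threshold`."""
    outputs = []
    for x in pre_activations:
        y = silu(x)
        outputs.append(y if abs(y) >= threshold else 0.0)
    return outputs
```

For example, calibrating on the SiLU outputs of eight distinct inputs with sparsity=0.5 and then applying thresholded_activation to the same inputs zeroes the four smallest-magnitude outputs and passes the other four through unchanged. Zeroed entries are what a sparse kernel can skip, which is where an inference speedup would come from.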

Reproducing Results

To reproduce the experimental results and figures presented in our work, please follow the steps outlined below. The process has been simplified into a single script to ensure ease of use and to maintain consistency across different environments.

Prerequisites

Ensure the following prerequisites are in place:

Bash shell (Unix/Linux/Mac)
Required Python packages (listed in requirements.txt)
An accelerate configuration file for your environment, created by running accelerate config

Steps

Open a terminal in the root directory of the project.
Run the following command:

bash reproduction_script.sh [path1] [path2]

[path1]: Directory where the checkpoints for fine-tuned models will be stored.
[path2]: Directory where the results of the experiments, such as figures and histograms, will be saved.
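
If you prefer to launch the run from Python, the command above can be assembled as in the sketch below. This is a convenience wrapper under assumptions, not part of the repository: the helper name build_reproduction_command is hypothetical, and creating both directories up front is a precaution, since the README does not say whether the script creates them itself.

```python
from pathlib import Path


def build_reproduction_command(checkpoint_dir, results_dir,
                               script="reproduction_script.sh"):
    """Return the argv list for the reproduction run.

    checkpoint_dir corresponds to [path1] and results_dir to [path2].
    Both directories are created if missing (an assumption; the script
    may already handle this).
    """
    checkpoint_path = Path(checkpoint_dir).expanduser().resolve()
    results_path = Path(results_dir).expanduser().resolve()
    for path in (checkpoint_path, results_path):
        path.mkdir(parents=True, exist_ok=True)
    return ["bash", script, str(checkpoint_path), str(results_path)]


# Example, run from the project root (directory names are placeholders):
#   import subprocess
#   cmd = build_reproduction_command("./checkpoints", "./results")
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.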

Work in progress

We are currently developing a framework that will enable CATS to be easily integrated with any model from the HuggingFace library.
