Cache-to-Cache

Direct Semantic Communication Between Large Language Models

🌐 Project Page • 📑 Paper • 🤗 HuggingFace • 🚀 Live Demo

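The original repository uses a gate to concatenate the base model's KV cache with the teacher model's KV cache after it has passed through the projector. This branch keeps only the projected teacher KV cache and drops the base model's KV cache, in order to ablate the effect of the projector. See here for the implementation details.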
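The difference between the two variants can be sketched on toy data. This is an illustration on plain Python lists, not the repository's code: the function names `gated_fuse` and `pure_projected`, the scalar gate, and the blend formula are all assumptions chosen for readability, and the actual gating and projector in the `rosetta` package may combine the caches differently.

```python
from typing import List

Vector = List[float]


def gated_fuse(base_kv: Vector, projected_teacher_kv: Vector, gate: float) -> Vector:
    """Blend the base cache with the projected teacher cache.

    Assumed form of gating, for illustration only: gate=0 keeps the base
    cache, gate=1 keeps the projected teacher cache.
    """
    if len(base_kv) != len(projected_teacher_kv):
        raise ValueError("cache entries must have the same length")
    return [(1.0 - gate) * b + gate * t for b, t in zip(base_kv, projected_teacher_kv)]


def pure_projected(base_kv: Vector, projected_teacher_kv: Vector) -> Vector:
    """Ablation variant: ignore the base cache and keep only the projected teacher cache."""
    return list(projected_teacher_kv)


base = [1.0, 2.0]      # stand-in for one base-model cache entry
teacher = [3.0, 6.0]   # stand-in for the teacher cache entry after the projector

fused = gated_fuse(base, teacher, gate=0.5)  # depends on both caches
pure = pure_projected(base, teacher)         # depends on the teacher cache only
```

In the pure variant the output no longer changes with the base cache, so any downstream quality comes from the projector alone, which is the point of the ablation.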

Demo

Environment Setup

Create a new environment:

conda create -n rosetta python=3.10
conda activate rosetta

Install the package:

pip install -e .

For training and evaluation, install additional dependencies:

pip install -e ".[training,evaluation]"

How to

Train C2C Projectors

Prepare a training configuration file in recipe/train_recipe/. Specify the base model, teacher model, projector type and parameters, training hyperparameters, dataset, and output directory. See recipe/train_recipe/C2C_0.6+0.5.json for a complete example.
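As a rough sketch of what such a file might contain, the snippet below writes a JSON recipe with one entry per item listed above and checks that none is missing. Every key name and value here is a hypothetical placeholder, and the file name `my_recipe.json` is made up; the real schema is whatever recipe/train_recipe/C2C_0.6+0.5.json uses, so copy that file rather than this one.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical top-level keys, one per item the recipe is said to specify.
REQUIRED_KEYS = [
    "base_model",
    "teacher_model",
    "projector",
    "training",
    "dataset",
    "output_dir",
]


def missing_keys(cfg: dict) -> list:
    """Return the required top-level keys that are absent from cfg."""
    return [key for key in REQUIRED_KEYS if key not in cfg]


# Placeholder values only; none of these come from the repository.
config = {
    "base_model": "<base-model-name-or-path>",
    "teacher_model": "<teacher-model-name-or-path>",
    "projector": {"type": "<projector-type>", "params": {}},
    "training": {"learning_rate": 1e-4, "num_epochs": 1, "batch_size": 8},
    "dataset": "<dataset-name-or-path>",
    "output_dir": "<output-directory>",
}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "my_recipe.json"
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    loaded = json.loads(config_path.read_text(encoding="utf-8"))
```

A check like `missing_keys` is cheap to run before launching a multi-GPU job, where a malformed recipe would otherwise fail only after all processes have started.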

Run training:

# Single GPU
python script/train/SFT_train.py --config recipe/train_recipe/C2C_0.6+0.5.json

# Multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 script/train/SFT_train.py \
    --config recipe/train_recipe/C2C_0.6+0.5.json
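In the multi-GPU command above, `--nproc_per_node` equals the number of GPUs listed in `CUDA_VISIBLE_DEVICES`. A small helper can keep the two in sync when the GPU list changes. The function name `build_torchrun_command` is hypothetical and not part of this repository; the script path and config path are taken from the commands above.

```python
import shlex
from typing import Dict, List, Tuple


def build_torchrun_command(gpu_ids: List[int], config_path: str) -> Tuple[Dict[str, str], List[str]]:
    """Build the environment and argv for a single-node torchrun launch.

    One worker process is started per visible GPU.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    argv = [
        "torchrun",
        f"--nproc_per_node={len(gpu_ids)}",
        "script/train/SFT_train.py",
        "--config",
        config_path,
    ]
    return env, argv


env, argv = build_torchrun_command([0, 1, 2, 3], "recipe/train_recipe/C2C_0.6+0.5.json")

# Render a copy-pasteable shell line; shlex.join quotes arguments where needed.
command_line = f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} " + shlex.join(argv)
print(command_line)
```

To launch from Python instead of printing, the same `env` and `argv` could be passed to `subprocess.run(argv, env={**os.environ, **env})`.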

Run pure C2C training (this branch's ablation, which uses only the projected teacher KV cache):

# example 1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 script/train/SFT_train.py \
--config recipe/train_recipe/C2C_0.6+0.5_pure.json

# example 2
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 script/train/SFT_train.py \
--config recipe/train_recipe/C2C_base_2.5_teacher_3.json.json

During training, only the C2C projector parameters are updated; the base model and the teacher model both remain frozen.

Evaluate C2C

Prepare an evaluation configuration file in recipe/eval_recipe/. Specify the model configuration with base model, teacher model, checkpoint directory, generation config, evaluation dataset, and output directory. See recipe/eval_recipe/unified_eval.yaml for a complete example.

Run evaluation:

python script/evaluation/unified_evaluator.py --config recipe/eval_recipe/unified_eval.yaml
