san-tian/C2C

 
 


Cache-to-Cache

Direct Semantic Communication Between Large Language Models

🌐 Project Page | 📑 Paper | 🤗 HuggingFace | 🚀 Live Demo

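The original repository uses a gate to combine the base model's KV cache with the teacher model's KV cache after the latter has passed through the projector. This branch keeps only the teacher model's projected KV cache and drops the base model's KV cache, which ablates the contribution of the projector on its own. See here for the implementation details.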
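To make the difference between the two variants concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code in this repository: the function names `project`, `gated_fusion`, and `pure_fusion` are hypothetical, the projector is reduced to a single linear map, and the gate is assumed to be a scalar blend weight. The actual projector and gating in the repository may be structured differently.

```python
from typing import List

Vector = List[float]


def project(teacher_kv: Vector, weight: List[Vector]) -> Vector:
    """Hypothetical linear projector: map a teacher KV vector into the base model's KV space."""
    return [sum(w * x for w, x in zip(row, teacher_kv)) for row in weight]


def gated_fusion(base_kv: Vector, projected_kv: Vector, gate: float) -> Vector:
    """Assumed form of the original variant: blend base KV with projected teacher KV."""
    return [(1.0 - gate) * b + gate * p for b, p in zip(base_kv, projected_kv)]


def pure_fusion(base_kv: Vector, projected_kv: Vector) -> Vector:
    """Ablation in this branch: ignore base KV and use only the projected teacher KV."""
    return list(projected_kv)


teacher_kv = [1.0, 2.0, 3.0]                   # toy teacher KV entry (dimension 3)
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]    # toy projector weights (3 -> 2)
base_kv = [0.0, 1.0]                           # toy base KV entry (dimension 2)

projected = project(teacher_kv, weight)
print(gated_fusion(base_kv, projected, gate=0.5))
print(pure_fusion(base_kv, projected))
```

In the pure variant the output depends only on the teacher cache and the projector, so any quality change relative to the gated variant can be attributed to the projector alone.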

Demo

Environment Setup

Create a new environment:

conda create -n rosetta python=3.10
conda activate rosetta

Install the package:

pip install -e .

For training and evaluation, install additional dependencies:

pip install -e ".[training,evaluation]"

How to

Train C2C Projectors

Prepare a training configuration file in recipe/train_recipe/. Specify the base model, teacher model, projector type and parameters, training hyperparameters, dataset, and output directory. See recipe/train_recipe/C2C_0.6+0.5.json for a complete example.
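Before editing a recipe, it can help to see which top-level sections the example file actually contains. The sketch below only reads the JSON and lists its keys; it assumes the recipe's top level is a JSON object, and the helper name `summarize_recipe` is hypothetical (it is not part of this repository).

```python
import json
from pathlib import Path


def summarize_recipe(path):
    """Return {top-level key: type name of its value} for a JSON recipe file."""
    with Path(path).open(encoding="utf-8") as f:
        recipe = json.load(f)
    return {key: type(value).__name__ for key, value in recipe.items()}


if __name__ == "__main__":
    recipe_path = Path("recipe/train_recipe/C2C_0.6+0.5.json")
    # Only print when run from the repository root, where the example recipe exists.
    if recipe_path.exists():
        for key, kind in summarize_recipe(recipe_path).items():
            print(f"{key}: {kind}")
```

Copying the example recipe and changing one section at a time is usually the least error-prone way to create a new configuration.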

Run training:

# Single GPU
python script/train/SFT_train.py --config recipe/train_recipe/C2C_0.6+0.5.json

# Multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 script/train/SFT_train.py \
    --config recipe/train_recipe/C2C_0.6+0.5.json

Run pure C2C training (projected teacher KV cache only, without the base model's KV cache):

# example 1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 script/train/SFT_train.py \
--config recipe/train_recipe/C2C_0.6+0.5_pure.json

# example 2
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 script/train/SFT_train.py \
--config recipe/train_recipe/C2C_base_2.5_teacher_3.json.json
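In the multi-GPU commands above, `--nproc_per_node` matches the number of devices listed in `CUDA_VISIBLE_DEVICES`. A small helper can keep the two in sync when switching machines. This is a sketch only: `build_torchrun_command` is a hypothetical name, and it merely assembles the same command line shown above without launching anything.

```python
import shlex


def build_torchrun_command(gpu_ids, config_path, script="script/train/SFT_train.py"):
    """Return (env, argv) for a torchrun launch with one process per listed GPU."""
    if not gpu_ids:
        raise ValueError("need at least one GPU id")
    # Expose exactly the requested GPUs to the training processes.
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    # One worker process per visible GPU.
    argv = ["torchrun", f"--nproc_per_node={len(gpu_ids)}", script, "--config", config_path]
    return env, argv


env, argv = build_torchrun_command([0, 1, 2, 3], "recipe/train_recipe/C2C_0.6+0.5_pure.json")
print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
```

To actually launch, the returned `argv` could be passed to `subprocess.run` with `env` merged into `os.environ`.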

During training, only the C2C projector parameters are updated; both the base model and the teacher model remain frozen.

Evaluate C2C

Prepare an evaluation configuration file in recipe/eval_recipe/. Specify the model configuration with base model, teacher model, checkpoint directory, generation config, evaluation dataset, and output directory. See recipe/eval_recipe/unified_eval.yaml for a complete example.

Run evaluation:

python script/evaluation/unified_evaluator.py --config recipe/eval_recipe/unified_eval.yaml

About

The official code implementation for "Cache-to-Cache: Direct Semantic Communication Between Large Language Models"
