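The original repository uses a gate to concatenate the base model's KVCache with the teacher model's KVCache after it has passed through the projector. This branch keeps only the teacher model's projected KVCache and removes the base model's KVCache, which ablates the effect of the projector. See here for the implementation details.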
Create a new environment:
conda create -n rosetta python=3.10
conda activate rosetta
Install the package:
pip install -e .
For training and evaluation, install additional dependencies:
pip install -e ".[training,evaluation]"
Prepare a training configuration file in recipe/train_recipe/. Specify the base model, teacher model, projector type and parameters, training hyperparameters, dataset, and output directory. See recipe/train_recipe/C2C_0.6+0.5.json for a complete example.
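As a rough illustration of the fields a training configuration needs to cover, the sketch below builds a skeleton and checks that no section is missing. Every key name and value here is a hypothetical placeholder and not the repository's actual schema; recipe/train_recipe/C2C_0.6+0.5.json remains the authoritative example.

```python
import json

# NOTE: all key names and values are hypothetical placeholders chosen for
# illustration; they are NOT the repository's real configuration schema.
train_config = {
    "model": {
        "base_model": "<base-model-name-or-path>",
        "teacher_model": "<teacher-model-name-or-path>",
        "projector": {"type": "<projector-type>", "params": {}},
    },
    "training": {"learning_rate": 1e-4, "num_epochs": 1, "batch_size": 8},
    "data": {"dataset": "<dataset-name-or-path>"},
    "output": {"output_dir": "<output-directory>"},
}

# The four areas the text asks a training recipe to specify (names assumed).
REQUIRED_SECTIONS = ("model", "training", "data", "output")


def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections that are absent from config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


if __name__ == "__main__":
    # Print the skeleton as JSON so it can be compared against a real recipe.
    print(json.dumps(train_config, indent=2))
    print("missing sections:", missing_sections(train_config))
```

A check like this only catches a forgotten section; it does not validate that the values are ones the training script accepts.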
Run training:
# Single GPU
python script/train/SFT_train.py --config recipe/train_recipe/C2C_0.6+0.5.json
# Multi-GPU
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node=8 script/train/SFT_train.py \
--config recipe/train_recipe/C2C_0.6+0.5.json
# example 1
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 script/train/SFT_train.py \
--config recipe/train_recipe/C2C_0.6+0.5_pure.json
# example 2
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 script/train/SFT_train.py \
--config recipe/train_recipe/C2C_base_2.5_teacher_3.json.json
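In the multi-GPU commands above, --nproc_per_node matches the number of devices listed in CUDA_VISIBLE_DEVICES. A minimal sketch of a helper that keeps the two in sync is shown here; the function name build_torchrun_command is hypothetical and not part of the repository.

```python
import shlex


def build_torchrun_command(gpu_ids, config_path):
    """Return (env, argv) for a torchrun launch of script/train/SFT_train.py.

    Hypothetical helper: it derives --nproc_per_node from the GPU list so the
    process count always equals the number of visible devices.
    """
    if not gpu_ids:
        raise ValueError("gpu_ids must contain at least one device index")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids)}
    argv = [
        "torchrun",
        f"--nproc_per_node={len(gpu_ids)}",
        "script/train/SFT_train.py",
        "--config",
        config_path,
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_torchrun_command(
        [0, 1, 2, 3], "recipe/train_recipe/C2C_0.6+0.5_pure.json"
    )
    # Show the equivalent shell command; to launch it, one could call
    # subprocess.run(argv, env={**os.environ, **env}, check=True).
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
```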
During training, only the C2C projector parameters are updated while both source and target models remain frozen.
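The freezing scheme can be sketched in PyTorch as follows. This is not the repository's training code: the three nn.Linear modules are hypothetical stand-ins for the base model, teacher model, and projector, and the loss is a dummy used only to show where gradients flow.

```python
import torch
from torch import nn

# Hypothetical stand-ins; the real models and projector are far larger.
base_model = nn.Linear(8, 8)
teacher_model = nn.Linear(8, 8)
projector = nn.Linear(8, 8)


def freeze(module: nn.Module) -> None:
    """Put a module in eval mode and disable gradients for its parameters."""
    module.eval()
    for param in module.parameters():
        param.requires_grad_(False)


freeze(base_model)
freeze(teacher_model)

# Only the projector's parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(projector.parameters(), lr=1e-4)

x = torch.randn(4, 8)
with torch.no_grad():
    teacher_features = teacher_model(x)  # frozen forward pass, no graph kept

projected = projector(teacher_features)
loss = base_model(projected).pow(2).mean()  # dummy loss for illustration

optimizer.zero_grad()
loss.backward()  # gradients pass through base_model but only reach projector
optimizer.step()

# Collect gradients of the frozen models; each entry stays None.
frozen_grads = [
    p.grad for m in (base_model, teacher_model) for p in m.parameters()
]
```

Gradients still flow through the frozen base model's computation to reach the projector, but no gradient is stored for the frozen weights, so the optimizer step leaves them unchanged.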
Prepare an evaluation configuration file in recipe/eval_recipe/. Specify the model configuration with base model, teacher model, checkpoint directory, generation config, evaluation dataset, and output directory. See recipe/eval_recipe/unified_eval.yaml for a complete example.
Run evaluation:
python script/evaluation/unified_evaluator.py --config recipe/eval_recipe/unified_eval.yaml