Official implementation of GeRe, a novel replay framework for continual learning of Large Language Models, designed to mitigate forgetting through general-sample replay and an activation-state constraint.
Traditional replay vs. GeRe: unlike traditional replay, which requires the laborious collection of an ever-growing set of downstream replay samples, GeRe employs a fixed set of general replay samples that not only retains general capabilities during continual learning, but also enhances overall performance on the learned downstream tasks. The blue oval denotes the threshold-based margin loss that imposes a consistency constraint on neural activation states under the GeRe framework.
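For intuition, the threshold-based margin loss described above can be pictured as a hinge penalty on hidden-state drift. The sketch below is purely illustrative and is not the actual GeRe loss; the function name, the `threshold`/`margin` parameters, and the hinge form are all assumptions for exposition:

```python
import torch

def activation_margin_loss(cur_hidden: torch.Tensor,
                           ref_hidden: torch.Tensor,
                           threshold: float = 0.0,
                           margin: float = 0.1) -> torch.Tensor:
    """Penalize drift of hidden activations on general replay samples away
    from their frozen reference values, but only beyond a tolerance margin
    (a hypothetical sketch, not the GeRe implementation)."""
    # Activation state of each unit under the reference model.
    active = ref_hidden > threshold
    # Signed drift away from the reference state: active units should not
    # shrink, inactive units should not grow.
    drift = torch.where(active, ref_hidden - cur_hidden, cur_hidden - ref_hidden)
    # Hinge at the margin: small fluctuations inside the margin are free.
    return torch.clamp(drift - margin, min=0.0).mean()
```

The key property is that the current model is constrained only where its activation states on replay samples deviate noticeably from the reference, leaving slack for benign parameter drift.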
- [2025/08/06] Initial release of the paper and the GeRe framework! GeRe is now available as a plug-and-play package, enabling seamless integration into your existing applications.
```bash
git clone https://github.com/Qznan/GeRe
cd GeRe
```

💡 Tip: We recommend the following package versions for optimal compatibility (optional):

```
torch==2.2.0 transformers==4.44.2 tokenizers==0.19.1 accelerate==0.30.1 deepspeed==0.14.4
```

Single GPU:

```bash
python train_demo.py
```

Multi GPU:

```bash
bash run_multi_gpu.sh
```

💡 Note: The demo automatically uses a lightweight base_llms/llama-3.1-tiny-random base model for fast verification.
All you need to do is replace the default `Trainer` from Hugging Face Transformers with our `GeReTrainer` and configure the GeRe arguments:
```python
# # Initialize default Trainer
# trainer = Trainer(
#     model=model,
#     args=training_args,
#     train_dataset=dataset,
#     data_collator=data_collator,
#     tokenizer=tokenizer,
# )

from gere import GeReTrainer

# Initialize GeRe Trainer
trainer = GeReTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
    # GeRe-specific configurations:
    gere_hidden_state_saving_dir='./tiny_gere_saving',  # dir to save GeRe hidden states and statistics
    reuse_gere_hidden_state=True,  # if False, forces regeneration of hidden states and statistics in the
                                   # specified directory; note that existing hidden states are skipped
                                   # (only missing ones are generated and the statistics are updated)
    num_interpolate_per_batch=0,   # BI ratio; set to 0 or None to disable
    w_strategy='100',              # weight strategy for the margin loss: '1', '100', or 'dy' (dynamic)
)
```

| Method | Humanities | STEM | Social Sciences | Other | Average |
|---|---|---|---|---|---|
| Trainer | 56.32 | 51.98 | 72.60 | 69.30 | 61.79 |
| GeReTrainer | 59.23 (+2.91) | 54.74 (+2.76) | 75.20 (+2.60) | 71.93 (+2.63) | 64.54 (+2.75) |
- Both tested on LLaMA-3.1-8B with identical training settings
- Both trained on the Yelp dataset for 3 epochs
- GeRe configuration: num_interpolate_per_batch=0, w_strategy='dy'
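As a toy illustration of what a knob like `num_interpolate_per_batch` (the BI ratio above) could control, the sketch below mixes a fixed pool of general replay samples into each downstream batch. All names and the sampling scheme here are hypothetical and do not reflect the actual GeRe implementation:

```python
import random

def interpolate_batch(task_batch, replay_pool, num_interpolate_per_batch):
    """Return the downstream batch with `num_interpolate_per_batch` general
    replay samples mixed in; 0 or None disables replay mixing entirely
    (illustrative sketch only)."""
    if not num_interpolate_per_batch:
        return list(task_batch)
    # Draw replay samples without replacement from the fixed general pool.
    replayed = random.sample(replay_pool, num_interpolate_per_batch)
    mixed = list(task_batch) + replayed
    random.shuffle(mixed)  # avoid a fixed task/replay ordering within the batch
    return mixed
```

Because the replay pool is fixed and general-purpose, no per-task replay data ever needs to be collected, which is the point of contrast with traditional replay in the table above.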
If you want to integrate GeRe into LLaMA-Factory's SFT training, modify the trainer file at LLaMA-Factory/src/llamafactory/train/sft/trainer.py as follows:
```python
# tested LLaMA-Factory version: 0.9.4.dev0
...
logger = logging.get_logger(__name__)

import sys; sys.path.insert(0, 'dir/to/gere')  # GeRe add
from gere import GeReTrainer  # GeRe add


class CustomSeq2SeqTrainer(GeReTrainer):  # GeRe modify
    r"""Inherits Seq2SeqTrainer to compute generative metrics such as BLEU and ROUGE."""

    def __init__(
        self,
        finetuning_args: "FinetuningArguments",
        processor: Optional["ProcessorMixin"],
        gen_kwargs: Optional[dict[str, Any]] = None,
        **kwargs,
    ) -> None:
        if is_transformers_version_greater_than("4.46"):
            kwargs["processing_class"] = kwargs.pop("tokenizer")
        else:
            self.processing_class: PreTrainedTokenizer = kwargs.get("tokenizer")

        super().__init__(
            gere_hidden_state_saving_dir='./gere_saving',  # set GeRe-specific configuration here!
            # w_strategy='100',  # optional; for default values refer to GeReTrainer
            **kwargs,
        )
        if processor is not None:
            ...
```

- 🧩 Plug-and-Play - Simply replace your existing `transformers.Trainer` with our `GeReTrainer`
- 🚀 Continual Learning - Better retention of general capabilities for LLMs when fine-tuning on downstream tasks
| Feature | Traditional Methods | GeRe |
|---|---|---|
| Forgetting | ❌ Severe | ✅ Reduced |
| Downstream-task replay | ❌ Required | ✅ Not needed |
| Implementation | ❌ Cumbersome | ✅ Simple |
```
GeRe/
├── gere/                      # Core package
│   ├── __init__.py
│   ├── gere_trainer.py        # Main GeRe trainer class
│   └── gere_dataset.py        # General replay samples
├── ckpts/                     # Saved checkpoint files
├── base_llms/                 # Base LLMs used by the demos
├── train_demo.py              # Quick-start example (single GPU)
├── train_demo_multi_gpu.py    # Quick-start example (multi GPU)
├── run_multi_gpu.sh           # Quick-start launch script (multi GPU)
└── requirements.txt           # Dependencies
```
If you find GeRe helpful, please cite:
```bibtex
@misc{zhang2025gereantiforget,
  title={GeRe: Towards Efficient Anti-Forgetting in Continual Learning of LLM via General Samples Replay},
  author={Yunan Zhang and Shuoran Jiang and Mengchen Zhao and Yuefeng Li and Yang Fan and Xiangping Wu and Qingcai Chen},
  year={2025},
  eprint={2508.04676},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.04676},
}
```

GeRe is released under the Apache License 2.0.
We sincerely appreciate your support! Please consider giving us a star ⭐ on GitHub to stay updated with the latest developments.
We welcome all forms of contributions, including new features, code improvements, or documentation enhancements.
For questions or suggestions, please open an issue or contact us via email.