Author's PyTorch implementation of Importance Weighted Supervised Fine-Tuning (iw-SFT). iw-SFT uses importance weights to adaptively upweight or downweight data points during training; we show that this gives a much tighter bound on the RL training objective than SFT alone.
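To make the idea concrete, here is a minimal sketch of an importance-weighted SFT loss in PyTorch. This is an illustration only, not the implementation in bounding_trainers.py: the function name, the per-sequence log-probability-ratio weights against a reference model, and the clipping constant are all assumptions; the exact weighting scheme is the one derived in the paper.

```python
import torch
import torch.nn.functional as F

def iw_sft_loss(logits, labels, ref_logits, clip_max=10.0):
    """Illustrative importance-weighted SFT loss (not the paper's exact form).

    logits:     (B, T, V) current-model logits
    ref_logits: (B, T, V) reference-model logits (computed without grad)
    labels:     (B, T) target token ids, with -100 marking ignored positions
    """
    # Per-token cross entropy under the current model; 0 at ignored positions.
    ce = F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
    )  # (B, T)
    mask = (labels != -100).float()

    with torch.no_grad():
        # Per-sequence log-prob ratio between current and reference model,
        # used as a detached importance weight (clipped for stability).
        ref_ce = F.cross_entropy(
            ref_logits.transpose(1, 2), labels, ignore_index=-100, reduction="none"
        )
        log_ratio = ((ref_ce - ce) * mask).sum(-1)  # = log p_theta - log p_ref
        weights = log_ratio.exp().clamp(max=clip_max)

    # Token-averaged loss per sequence, then importance-weighted batch mean.
    seq_ce = (ce * mask).sum(-1) / mask.sum(-1).clamp(min=1.0)
    return (weights * seq_ce).mean()
```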
We have also published the model we trained on Hugging Face: ChongliQin/iw-SFT-32B.
For the arXiv paper, see here. For the blog post, see here.
For more general information, see here.
There are two Python files; bounding_trainers.py contains iw-SFT as interpreted and described in our paper. Running this code requires 8 GPUs. First, install the dependencies listed in requirements.txt; we recommend using uv, as below:
```bash
pip install uv
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```
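Since training assumes 8 GPUs, it can be worth confirming that PyTorch sees all of them before launching (a trivial sanity check, not part of the repo):

```python
import torch

# Quick sanity check that all 8 GPUs are visible before launching training.
n = torch.cuda.device_count()
assert n == 8, f"expected 8 GPUs, found {n}"
```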
To run the code, simply do:

```bash
./iw_sft.sh
```
Make sure to set up wandb (Weights & Biases) the first time you run.
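If wandb has not been configured on this machine before, a one-time login is enough, either via `wandb login` in the shell or from Python:

```python
# One-time Weights & Biases setup; the training run itself is expected to
# create its own wandb run. Equivalent to running `wandb login` in a shell.
import wandb

wandb.login()  # prompts for, or reads, your WANDB_API_KEY
```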
The evaluation code is cloned from the S1 repository. To set it up:

```bash
cd eval/lm-evaluation-harness
pip install -e .[math,vllm]
```
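Once installed, the harness can be driven from its CLI or from Python. A minimal programmatic sketch follows; the model and task here are illustrative, and the actual evaluation scripts under eval/ may use different settings:

```python
# Illustrative use of lm-evaluation-harness; the model path and task are
# examples, not necessarily this repo's evaluation configuration.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",  # or "vllm", since the vllm extra is installed above
    model_args="pretrained=ChongliQin/iw-SFT-32B",
    tasks=["gsm8k"],
    batch_size=8,
)
print(results["results"])
```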