liumy2010/UFT

Results

Figure: Accuracy of different algorithms averaged over Qwen2.5-0.5B/1.5B/3B

Figure: Accuracy of different algorithms on Qwen2.5-0.5B

Figure: Accuracy of different algorithms on Qwen2.5-3B

Installation

conda create -n uft python=3.9
conda activate uft
bash install.sh

Usage

Training

python run.py
  --algo              Algorithm to use: {sft, rft, stage, r3, uft}
  --n_gpu             Number of GPUs
  --visible-devices   GPU indices to use, e.g., "0,1,2,3"
  --T                 Total training steps (default: 500)
  --T_hint            Maximum training steps with hint (default: 300)
  --data              Dataset: {countdown, math, kk_logic, others}
  --model             Model name (e.g., Qwen/Qwen2.5-1.5B)
  --tp_size           
  --eval              Evaluate the model instead of training
  --idx IDX           Index of the current process (default: 0)
  --sft_loss_coef     Coefficient for the additional log-likelihood term on the hint
  --n_rollout         Number of trajectory rollouts (default: 4)

Example

python run.py --model Qwen/Qwen2.5-1.5B --data countdown
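For scripted runs, the same command can be assembled programmatically. The sketch below is only an illustration and is not part of the repository: `build_train_command` is a hypothetical helper, it uses only flags listed in the options above, and it assumes `run.py` sits in the current working directory.

```python
import shlex

# Choices documented for --algo in the options list.
ALGOS = {"sft", "rft", "stage", "r3", "uft"}


def build_train_command(model, data, algo="uft", n_gpu=None,
                        total_steps=None, hint_steps=None):
    """Return the argv list for one training run (hypothetical helper)."""
    if algo not in ALGOS:
        raise ValueError(f"unknown algorithm: {algo!r}")
    cmd = ["python", "run.py", "--algo", algo, "--model", model, "--data", data]
    # Optional flags are appended only when set, so run.py keeps its own defaults.
    optional = {"--n_gpu": n_gpu, "--T": total_steps, "--T_hint": hint_steps}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_train_command("Qwen/Qwen2.5-1.5B", "countdown", n_gpu=2)
print(shlex.join(cmd))
# python run.py --algo uft --model Qwen/Qwen2.5-1.5B --data countdown --n_gpu 2
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.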

Requirement

  • Qwen2.5-0.5B/1.5B and Llama-3.2-1B: 2 H100 GPUs
  • Qwen2.5-3B and Llama-3.2-3B: 4 H100 GPUs

Qwen2.5-0.5B/1.5B and Llama-3.2-1B can be trained on a single H100 by setting --n_rollout 2.
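The hardware guidance above can be encoded as a small lookup when launching many runs. This is a sketch under assumptions: `suggest_resources` is a hypothetical helper that is not in the repository, it treats every available GPU as an H100, and it uses the documented default of 4 rollouts wherever the text does not ask for fewer.

```python
SMALL_MODELS = {"Qwen2.5-0.5B", "Qwen2.5-1.5B", "Llama-3.2-1B"}
LARGE_MODELS = {"Qwen2.5-3B", "Llama-3.2-3B"}


def suggest_resources(model, available_gpus):
    """Return values for --n_gpu and --n_rollout (hypothetical helper)."""
    name = model.split("/")[-1]  # drop an organization prefix such as "Qwen/"
    if name in SMALL_MODELS:
        if available_gpus >= 2:
            return {"n_gpu": 2, "n_rollout": 4}
        if available_gpus == 1:
            # Single-GPU fallback: halve the rollouts as suggested above.
            return {"n_gpu": 1, "n_rollout": 2}
    elif name in LARGE_MODELS:
        if available_gpus >= 4:
            return {"n_gpu": 4, "n_rollout": 4}
    else:
        raise ValueError(f"no hardware guidance for model {model!r}")
    raise RuntimeError(f"not enough GPUs for {model!r}: {available_gpus} available")


print(suggest_resources("Qwen/Qwen2.5-1.5B", available_gpus=1))
```

Models outside the two listed groups raise an error instead of guessing, since the requirements above cover only those models.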

Major Modifications from VERL

Evaluate

Set {model} to the model name (e.g., Qwen/Qwen2.5-1.5B) and {dataset} to the dataset name (e.g., countdown) that you want to evaluate:

python run.py --model {model} --data {dataset} --eval
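To evaluate several model and dataset combinations, the template above can be expanded in a loop. The sketch below is illustrative only: `eval_commands` is a hypothetical helper, and the model and dataset lists are example values taken from names mentioned in this README.

```python
from itertools import product


def eval_commands(models, datasets):
    """Build one evaluation argv list per (model, dataset) pair (hypothetical helper)."""
    return [
        ["python", "run.py", "--model", model, "--data", dataset, "--eval"]
        for model, dataset in product(models, datasets)
    ]


models = ["Qwen/Qwen2.5-0.5B", "Qwen/Qwen2.5-1.5B"]
datasets = ["countdown", "math", "kk_logic"]

for cmd in eval_commands(models, datasets):
    # Replace print with subprocess.run(cmd, check=True) to actually launch the run.
    print(" ".join(cmd))
```

Printing the commands first gives a dry run, which is a cheap way to check the sweep before occupying GPUs.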

Acknowledgement

Citation

@article{UFT,
author       = {Liu, Mingyang and Farina, Gabriele and Ozdaglar, Asuman},
title        = {UFT: Unifying Supervised and Reinforcement Fine-Tuning},
journal      = {arXiv preprint arXiv:2505.16984},
year         = {2025}
}
