mm-vl/ULM-R1

Co-Reinforcement Learning for Unified Multimodal Understanding and Generation

Paper Github Hugging Face Collection

Introduction

CoRL is a GRPO-based RL framework designed to simultaneously enhance the generation and understanding capabilities of ULMs within a shared policy optimization paradigm. It comprises a unified RL stage for joint optimization and a refined RL stage for task-specific enhancement.

method overview.
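
For readers unfamiliar with GRPO, the step that gives it its name is the group-relative advantage: several responses are sampled for the same prompt, and each reward is normalized against the statistics of its own group. The sketch below illustrates only that normalization. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration; this is not taken from the CoRL code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one sampled group (same prompt).

    Illustrative only: `eps` and the population standard deviation are
    assumptions, not values read from the CoRL implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Samples above the group mean get a positive advantage, below it negative.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses for one prompt, two of them rewarded.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.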

📢 Latest Updates

  • [2025-06-25] 📌 Code of TTRL for MM2T and T2I.
  • [2025-06-01] 📌 Core code of CoRL.

Environment

git clone https://github.com/mm-vl/ULM-R1.git
cd ULM-R1
conda create -n corl python=3.10 -y
conda activate corl
pip install -e .
pip install flash-attn --no-build-isolation
# pip install flash-attn --no-build-isolation --use-pep517

Please refer to install.md for more details.

Training Data

training example for unified RL.

Training Pipeline

  • Unified RL
bash corl/scripts/corl_unified.sh

Test-Time Reinforcement Learning (TTRL)

We simply adapt the TTRL algorithm to multimodal understanding and text-to-image generation, aiming to explore the potential of RL in enhancing both understanding and generation performance at inference time.
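
As background, TTRL works without ground-truth labels: it samples several answers per question, takes the majority answer as a pseudo-label, and rewards the samples that agree with it. The sketch below shows that idea for answer-style outputs such as multiple-choice understanding questions. The function name and the 0/1 reward values are assumptions for illustration. How the scripts in this repository compute rewards, especially for text-to-image generation, is not described here, so the sketch does not cover that case.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Return (pseudo_label, rewards) for one question's sampled answers.

    Hypothetical helper: the most frequent answer serves as the
    pseudo-label, and each sample is rewarded 1.0 if it matches it.
    """
    counts = Counter(answers)
    pseudo_label, _ = counts.most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards


if __name__ == "__main__":
    label, rewards = majority_vote_rewards(["B", "A", "B", "C"])
    print(label, rewards)
```

These rewards can then be fed to the same policy-gradient update used during training, which is what lets the model adapt at inference time.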

Tip

required: trl>=0.18.1
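
One way to confirm this requirement before launching the scripts is to compare the installed version against the minimum. The sketch below uses only the standard library. The helper names are hypothetical, and the version parsing is deliberately simplified: it compares the leading numeric components only and ignores pre-release ordering.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Turn '0.19.0.dev0' into (0, 19, 0); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="0.18.1"):
    """Simplified comparison of numeric release components."""
    return parse_release(installed) >= parse_release(minimum)


def check_trl(minimum="0.18.1"):
    """Return True if trl is installed and satisfies the minimum version."""
    try:
        return meets_minimum(version("trl"), minimum)
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("trl >= 0.18.1:", check_trl())
```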

multimodal understanding

bash ttrl/scripts/mm2t_mmmu.sh
bash ttrl/scripts/mm2t_mmstar.sh

| Model | MMMU | MMStar |
| --- | --- | --- |
| Janus-Pro-1B | 36.3 | 43.1 |
| + TTRL | 39.8 | 46.9 |

text-to-image generation

bash ttrl/scripts/t2i_geneval.sh
bash ttrl/scripts/t2i_unieval.sh

| Model | GenEval | UniEval (UniScore) |
| --- | --- | --- |
| Janus-Pro-1B | 0.73 | 0.370 |
| + TTRL | 0.76 | 0.455 |

Acknowledgement

Janus-Pro | open-r1-multimodal | R1-V
