NEUIR/ReAlign

ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment

Hao Yang¹, Yifan Ji¹, Zhipeng Xu¹, Zhenghao Liu¹, Yukun Yan², Zulong Chen³, Shuo Wang², Yu Gu¹, Ge Yu¹

¹Northeastern University, ²Tsinghua University, ³Alibaba Group

Overview

We introduce Reasoning-Guided Alignment (ReAlign), a method that enhances visual document retrieval by leveraging the reasoning capability of VLMs to provide fine-grained visual document descriptions as supervision signals for training. The framework supports multiple multimodal backbone models including Phi3 Vision and Qwen2.5 VL.

Our work has been accepted to SIGIR 2026 🎉🎉🎉!

If you find this project useful, please give us a star🌟.

[Figure: the ReAlign method]

Collections

We have made the following resources available in the 🤗ReAlign collection.

| Resource | Description | Link |
| --- | --- | --- |
| ReAlign-Phi3v | The visual document retriever based on Phi-3-vision-128k-instruct | 🤗ReAlign-Phi3v |
| ReAlign-Qwen | The visual document retriever based on Qwen2.5-VL-7B-Instruct | 🤗ReAlign-Qwen |
| Training Data | The data used to train the ReAlign retriever | 🤗ReAlign-Trainset |
| ReAlign-Set | All-in-one package: model weights, training set, and evaluation set | 🤗ReAlign-Set |

Setup

(1) Clone this repository:

git clone git@github.com:NEUIR/ReAlign.git
cd ReAlign

(2) Create and activate a Conda environment (Python 3.10):

conda create -n realign python=3.10 -y
conda activate realign

(3) Install dependencies and the editable package:

pip install -r requirements.txt
pip install -e .

Training

1. Prepare Data and Model Paths

Use the following command to download all required data, including model checkpoints, training set, and evaluation set:

huggingface-cli download --repo-type dataset yanghaoir/ReAlign-Set --local-dir ./dataset

The default paths in config/dir_config.sh work out of the box, so no changes are needed. To customize model or dataset locations, edit that file; its entries look like:

export REALIGN_TRAIN_DATASET_PATH="path/to/train_data"

You do not need to run this file manually; it is sourced automatically during training and evaluation.
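
If you want to inspect the configured paths yourself, the following is a minimal sketch of sourcing the config file and checking that a required variable is set. The helper `require_var` is hypothetical and not part of the repository; only `config/dir_config.sh` and `REALIGN_TRAIN_DATASET_PATH` come from the text above.

```shell
# Hypothetical helper (not part of the repository): succeed only if the
# variable whose name is passed as $1 is set to a non-empty value.
require_var() {
    eval "_value=\${$1:-}"
    if [ -z "$_value" ]; then
        echo "error: $1 is not set; check config/dir_config.sh" >&2
        return 1
    fi
}

# Source the config only when it exists. The training and evaluation
# scripts already do this themselves; this is for ad-hoc inspection.
if [ -f config/dir_config.sh ]; then
    . config/dir_config.sh
    require_var REALIGN_TRAIN_DATASET_PATH \
        && echo "train data: ${REALIGN_TRAIN_DATASET_PATH}"
fi
```

Run it from the repository root so that the relative path `config/dir_config.sh` resolves.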

2. Build Synthetic Training Data (Optional)

If you want to reproduce the data construction pipeline from scratch, run the following steps after downloading the dataset:

# Step 1: Extract corpus images from parquet shards
python src/realign/data_construction/data_unzip.py

# Step 2: Call the grounding model to build synthetic annotations
export DASHSCOPE_API_KEY="your-key"
python src/realign/data_construction/build_from_parquet.py

The output CSV will be saved to synthetic_data/OpenDocVQA-Query-1.csv. The pre-built training data is already included in dataset/train_data/train.parquet, so this step can be skipped if you do not need to regenerate it.
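
To decide whether this optional step is needed, you could first check for the pre-built file. This is a small sketch, assuming the dataset was downloaded to `./dataset` as shown earlier; the function name `has_prebuilt_data` is hypothetical.

```shell
# Hypothetical helper: succeed if the pre-built training parquet exists.
# $1 optionally overrides the default path taken from the text above.
has_prebuilt_data() {
    [ -f "${1:-dataset/train_data/train.parquet}" ]
}

if has_prebuilt_data; then
    echo "pre-built training data found; data construction can be skipped"
else
    echo "pre-built training data not found; download the dataset or run the data construction steps"
fi
```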

3. Create Log Directory

mkdir -p log

4. Run Training

Phi3 Vision:

bash sh/train_phi3v.sh > log/realign-phi3v.log 2>&1

Qwen2.5 VL:

bash sh/train_qwen.sh > log/realign-qwen.log 2>&1
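
Steps 3 and 4 can be combined in a small wrapper that creates the log directory and captures both stdout and stderr in one place. This is only a sketch; `run_logged` is a hypothetical helper, not something the repository provides.

```shell
# Hypothetical helper: run a command and write stdout and stderr to a log
# file, creating the log file's parent directory first.
# Usage: run_logged <log_file> <command> [args...]
run_logged() {
    log_file="$1"
    shift
    mkdir -p "$(dirname "$log_file")"
    "$@" > "$log_file" 2>&1
}

# Example invocations, equivalent to the commands above:
#   run_logged log/realign-phi3v.log bash sh/train_phi3v.sh
#   run_logged log/realign-qwen.log bash sh/train_qwen.sh
```

The wrapper returns the exit status of the wrapped command, so a failed training run can still be detected by the caller.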

Evaluation

The second argument of each evaluation script is a comma-separated list of GPU IDs. The examples below use four GPUs; adjust to match your hardware (e.g., use 0 for a single GPU).

Phi3 Vision:

bash sh/eval.sh realign-phi3v 0,1,2,3

Qwen2.5 VL:

bash sh/eval_qwen.sh realign-qwen 0,1,2,3
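
To illustrate the GPU argument format, here is a sketch of how a comma-separated ID list can be counted in a shell script. How `sh/eval.sh` and `sh/eval_qwen.sh` actually consume the argument is not shown in this README, so treat `count_gpus` and the variable names as hypothetical.

```shell
# Hypothetical helper: count the IDs in a comma-separated list such as
# "0,1,2,3" by splitting on commas and counting the non-empty fields.
count_gpus() {
    echo "$1" | tr ',' '\n' | grep -c .
}

# Fall back to GPU 0 when no list is given.
GPU_IDS="${1:-0}"
NUM_GPUS=$(count_gpus "$GPU_IDS")
echo "using ${NUM_GPUS} GPU(s): ${GPU_IDS}"
```

With the four-GPU list from the examples above, `count_gpus 0,1,2,3` prints `4`; with a single ID it prints `1`.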

Acknowledgement

Parts of our code and data build upon the following works. We sincerely thank the authors for their contributions.

Citation

@article{yang2026realign,
      title={ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment},
      author={Yang, Hao and Ji, Yifan and Xu, Zhipeng and Liu, Zhenghao and Yan, Yukun and Chen, Zulong and Wang, Shuo and Gu, Yu and Yu, Ge},
      year={2026},
      url={https://arxiv.org/abs/2604.07419}
}

Contact

If you have questions, suggestions, or bug reports, please email:

yanghao123@mails.neu.edu.cn

About

[SIGIR '26] This is the code repo for our SIGIR’26 paper "ReAlign: Optimizing the Visual Document Retriever with Reasoning-Guided Fine-Grained Alignment".
