📚 VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation

📌 Benchmark for Utility of Retrieved Documents
🏆 COLING 2025

📢 Authors

🟨 Hyeonseok Lim
🟨 Dongjae Shin
🟨 Seohyun Song
🟨 Inho Won
🟨 Minjun Kim
🟪 Junghun Yuk
🟪 Haneol Jang
🟨 KyungTae Lim

Affiliations:
🟨 Seoul National University of Science and Technology
🟪 Hanbat National University

🔗 Links

📖 Abstract

VLR-Bench
- We propose VLR-BENCH, a visual question answering (VQA) benchmark for evaluating vision-language models (VLMs) using retrieval-augmented generation (RAG).
- VLR-BENCH provides five input passages per query, which tests whether a model can determine which passage is most relevant for answering the query. Prior research has often overlooked this aspect.
VLR-IF
- We introduce VLR-IF, a dataset of 32,000 instruction-following examples to enhance VLMs' ability to generate accurate responses from retrieved information.
Open-Source
- Both VLR-BENCH and VLR-IF datasets are publicly available online.

📊 VLR-Bench Dataset

📌 Dataset Summary

- 150 images from BOK-VQA
- 150 images from Wikimedia Commons (reflecting cultural elements)
- Multilingual parallel corpus: English, Chinese, and Korean
📂 Dataset Link

📷 Example Images from VLR-Bench

📊 VLR-IF Dataset

📌 Dataset Summary

- 9,000 images from COCO
- 32,000 entries (valid/invalid passages)
- Languages: English, Chinese, and Korean
📂 Dataset Link

📷 Example Images from VLR-IF

📜 BibTeX

@article{lim2024vlr,
  title={VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation},
  author={Lim, Hyeonseok and Shin, Dongjae and Song, Seohyun and Won, Inho and Kim, Minjun and Yuk, Junghun and Jang, Haneol and Lim, KyungTae},
  journal={arXiv preprint arXiv:2412.10151},
  year={2024}
}

@inproceedings{lim-etal-2025-vlr,
  title = "{VLR}-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation",
  author = "Lim, Hyeonseok and Shin, Dongjae and Song, Seohyun and Won, Inho and Kim, Minjun and Yuk, Junghun and Jang, Haneol and Lim, KyungTae",
  booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
  month = jan,
  year = "2025",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.coling-main.411/"
}

📢 Acknowledgement

This work was supported by:

- Institute of Information & Communications Technology Planning & Evaluation (IITP)
- Artificial Intelligence Industrial Convergence Cluster Development Project

⚠️ Usage and License Notices

- The data and code are intended and licensed for research use only.
- Use of the data and code must also comply with the license agreement of GPT-4.
- The dataset is released under CC BY-NC 4.0 (non-commercial use only).
- Models trained on this dataset should not be used outside of research purposes.

🛠 How to Use

Clone this repository:

git clone https://github.com/MLP-Lab/VLR-Bench.git
cd VLR-Bench

Install dependencies:
```
pip install -r requirements.txt
```
Generate the model's inference results:
- The dataset can be loaded directly from Hugging Face.
- You can choose a specific language ('en', 'ko', or 'zh') from the language column and use the filtered dataset for inference.
- Use the following code to load the dataset and filter by language:
```
from datasets import load_dataset

dataset = load_dataset("MLP-KTLim/VLR-Bench")

# Select a specific language ('en', 'ko', or 'zh')
selected_language = "en"  # Change to 'ko' or 'zh' if needed
filtered_data = dataset.filter(lambda x: x["language"] == selected_language)
```
Prepare the inference result JSON file:
The JSON file containing the inference results of the model to be evaluated must include the following fields:

{
  "result": "The model's inference output",
  "label": "The output value from MLP-KTLim/VLR-Bench(filtered_data)",
  "answer_keyword1": "The keyword1 value from MLP-KTLim/VLR-Bench(filtered_data)",
  "answer_keyword2": "The keyword2 value from MLP-KTLim/VLR-Bench(filtered_data)"
}
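The field mapping above can be sketched in Python as follows. This is a minimal sketch under several assumptions that this README does not confirm: the dataset columns are assumed to be named `output`, `keyword1`, and `keyword2` (inferred from the field descriptions), the file is assumed to hold a JSON list with one object per example, and `predict` and `results.json` are hypothetical names standing in for your own model call and output path. Check `src` and `eval.sh` for the layout the evaluation script actually expects.

```python
import json
from typing import Any, Callable, Iterable, Mapping


def build_results(
    rows: Iterable[Mapping[str, Any]],
    predict: Callable[[Mapping[str, Any]], str],
) -> list:
    """Map dataset rows and model outputs onto the four required fields.

    Assumption: each row exposes `output`, `keyword1`, and `keyword2`.
    `predict` is a hypothetical callable wrapping your own model.
    """
    results = []
    for row in rows:
        results.append(
            {
                "result": predict(row),
                "label": row["output"],
                "answer_keyword1": row["keyword1"],
                "answer_keyword2": row["keyword2"],
            }
        )
    return results


def save_results(results: list, path: str) -> None:
    """Write the records as UTF-8 JSON so Korean and Chinese text stays readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
```

With a loaded split, usage would look like `save_results(build_results(split, predict), "results.json")`, where `split` is one split of `filtered_data`.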

Run eval:

sh eval.sh
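
Before running `eval.sh`, it can help to confirm that every record in the results file carries the four required fields. The check below is a minimal sketch and not part of the repository: it assumes the file holds a JSON list of objects, and the function names are hypothetical.

```python
import json

# The four fields this README lists as required in the results file.
REQUIRED_FIELDS = ("result", "label", "answer_keyword1", "answer_keyword2")


def find_invalid_records(records):
    """Return (index, missing_fields) pairs for records lacking a required field."""
    problems = []
    for index, record in enumerate(records):
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((index, missing))
    return problems


def check_results_file(path):
    """Load a results file (assumed to be a JSON list of objects) and validate it."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return find_invalid_records(records)
```

An empty list from `check_results_file` means every record has all four fields; otherwise each entry names the record index and the fields it is missing.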
