DocVXQA: Context-Aware Visual Explanations for Document Question Answering
PyTorch implementation of our ICML 2025 paper DocVXQA: Context-Aware Visual Explanations for Document Question Answering. This model not only produces accurate answers to questions grounded in document images but also generates visual explanations — heatmaps that highlight semantically and contextually important regions, enabling interpretability in document understanding tasks.
Clone the repository:
git clone https://github.com/dali92002/DocVXQA
cd DocVXQA
Create the conda environment and install the dependencies:
conda env create -f environment.yml
conda activate docvxqa
You can download the pretrained model weights from this link.
After downloading, place the weights in your preferred directory.
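If you want to load the checkpoint outside the provided scripts, a minimal sketch with plain PyTorch is shown below; the checkpoint filename and the model class are assumptions, so adapt them to the actual code in this repository (see demo.ipynb and train.py for how the model is constructed).

```python
import torch

# Assumed location of the downloaded weights; point this at wherever you placed them.
CKPT_PATH = "weights/docvxqa.ckpt"

# Load the checkpoint on CPU first; move the model to GPU after it is built.
state = torch.load(CKPT_PATH, map_location="cpu")

# Hypothetical model class and construction; replace with the real ones from this repo.
# model = DocVXQA(**model_args)
# model.load_state_dict(state.get("state_dict", state))
# model.eval()
```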
You can try out the model quickly using our provided Jupyter notebook demo.ipynb.
First, similarity maps must be extracted using ColPali. For each data point, two maps are generated: one between the question and the document image, and another between the answer and the document image. These maps are stored and later loaded by the dataloader for training with the token-interactions loss. For an example implementation of similarity map extraction, see this reference, though you are free to implement your own approach.
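The extraction pipeline itself is up to you; as a concrete illustration, here is a minimal sketch of turning ColPali-style multi-vector embeddings into a per-patch similarity map. The function name, tensor shapes, patch-grid size, and the saved-file layout are assumptions for illustration, not the exact code used in this repository.

```python
import torch


def similarity_map(text_emb: torch.Tensor,
                   patch_emb: torch.Tensor,
                   grid_hw: tuple[int, int]) -> torch.Tensor:
    """Build a late-interaction style similarity map.

    text_emb:  (n_text_tokens, dim)  embeddings of the question or answer tokens.
    patch_emb: (n_patches, dim)      embeddings of the document image patches.
    grid_hw:   (H, W) patch grid, with H * W == n_patches.
    Returns an (H, W) map of per-patch relevance scores.
    """
    # Cosine-style similarity between every text token and every image patch.
    text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
    patch_emb = torch.nn.functional.normalize(patch_emb, dim=-1)
    sim = text_emb @ patch_emb.T              # (n_text_tokens, n_patches)
    # Keep, for each patch, its strongest interaction with any text token.
    per_patch = sim.max(dim=0).values         # (n_patches,)
    return per_patch.reshape(grid_hw)


# Usage sketch: store one map for the question and one for the answer per data point,
# e.g. torch.save({"q_map": q_map, "a_map": a_map}, "maps/sample_0001.pt"),
# then read them back in the dataloader for the token-interactions loss.
```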
After setting your desired args, you can train with:
python train.py
After setting your desired args, you can evaluate with:
python evaluate.py
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License 🛡.
If you find this useful for your research, please cite it as follows:
@inproceedings{
souibgui2025docvxqa,
title={Doc{VXQA}: Context-Aware Visual Explanations for Document Question Answering},
author={Mohamed Ali Souibgui and Changkyu Choi and Andrey Barsky and Kangsoo Jung and Ernest Valveny and Dimosthenis Karatzas},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=wex0vL4c2Y}
}