Xiang Feng1,
Jiawei Zhou1,
Zhangfeng Huang2,
Kewei Wang3,
Shanshan Ye4,
Jinxin Hu2,
Zulong Chen2,†,
Yong Luo1,†,
Jing Zhang1,†,‡
1 School of Computer Science, National Engineering Research Center for Multimedia Software
and Hubei Key Laboratory of Multimedia and Network Communication Engineering,
Wuhan University, China
2 Alibaba Group, Hangzhou, China
3 Department of Electronic Engineering and Information Science,
University of Science and Technology of China, China
4 Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, United Arab Emirates
† Corresponding author, ‡ Project leader
DocScope is a benchmark for evaluating trustworthy long-document understanding. It tests whether multimodal large language models can produce verifiable reasoning trajectories over complete PDF documents, including evidence pages, grounded evidence regions, factual statements, and final answers.
Figure 1. Overview of DocScope.
@article{feng2026docscopebenchmarkingverifiablereasoning,
title={DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding},
author={Xiang Feng and Jiawei Zhou and Zhangfeng Huang and Kewei Wang and Shanshan Ye and Jinxin Hu and Zulong Chen and Yong Luo and Jing Zhang},
journal={arXiv preprint arXiv:2605.08888},
year={2026},
}