This repository provides a comparative evaluation of DeepSeek-OCR and OLMOCR-2 on the OmniDocBench benchmark. The evaluation assesses document parsing capabilities across text, formulas, tables, and reading order.
- DeepSeek-OCR: A vLLM-based multimodal pipeline for document understanding.
- OLMOCR-2: An efficient OCR system using open visual language models.
- OmniDocBench: A comprehensive benchmark with 1,355 annotated PDF pages covering diverse document types.
- Follow the setup instructions in `OmniDocBench/README.md`.
- Follow the setup instructions in `olmocr/README.md`.
- Follow the installation guide in `DeepSeek-OCR-master/README.md`.
We used the HuggingFace version of OmniDocBench and based all our evaluations on it (available at [link]).
For OLMOCR-2, first convert the OmniDocBench images to PDFs:

```bash
python utils/image_to_pdf.py
```

Our outputs can be found in the `markdown_olmo_ocr_2` folder.
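If you need a reference for the conversion step, here is a minimal sketch of what an image-to-PDF pass can look like (an illustration only, not the actual contents of `utils/image_to_pdf.py`; paths are examples):

```python
# Illustrative sketch only -- NOT the actual utils/image_to_pdf.py.
# Converts every image in a directory into a single-page PDF.
from pathlib import Path
from PIL import Image

def images_to_pdfs(image_dir: str, pdf_dir: str) -> None:
    out = Path(pdf_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_path in sorted(Path(image_dir).iterdir()):
        if img_path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        with Image.open(img_path) as img:
            # PDF export needs RGB; images with an alpha channel would fail.
            img.convert("RGB").save(out / f"{img_path.stem}.pdf")

if __name__ == "__main__":
    images_to_pdfs("OmniDocBench/images", "OmniDocBench/pdfs")
```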
- Navigate to the DeepSeek-OCR directory:

  ```bash
  cd DeepSeek-OCR-master/DeepSeek-OCR-vllm
  ```

- Configure paths in `config.py` (a sketch follows below):

  - Set `INPUT_PATH` to the OmniDocBench images directory (e.g., `../../OmniDocBench/images/`)
  - Set `OUTPUT_PATH` to a directory for output `.md` files (e.g., `../../outputs/deepseek_ocr/`)
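  A minimal sketch of these two settings (values are the example paths above; any other variables in `config.py` stay as the repository defines them):

  ```python
  # config.py -- illustrative values; only INPUT_PATH and OUTPUT_PATH are
  # documented above, everything else in the file is left as shipped.
  INPUT_PATH = "../../OmniDocBench/images/"    # OmniDocBench page images
  OUTPUT_PATH = "../../outputs/deepseek_ocr/"  # where .md predictions land
  ```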
- Run inference on images:

  ```bash
  python run_dpsk_ocr_eval_batch.py
  ```

  This will process all images and generate corresponding `.md` files in the output directory. For evaluation, remember to use the cleaned `.md` files, which we generated and placed in `./tools/cleaned_markdown/`; a hypothetical cleaning pass is sketched below.
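  The cleaned files ship in `tools/cleaned_markdown/`; as an assumption about what such a cleaning pass can look like (not the repository's actual method), here is a sketch that strips image references and collapses extra blank lines:

  ```python
  # Hypothetical cleaning pass (NOT the repository's actual script):
  # strip markdown image references and collapse runs of blank lines.
  import re
  from pathlib import Path

  def clean_markdown(src_dir: str, dst_dir: str) -> None:
      out = Path(dst_dir)
      out.mkdir(parents=True, exist_ok=True)
      for md in Path(src_dir).glob("*.md"):
          text = md.read_text(encoding="utf-8")
          text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)  # drop image refs
          text = re.sub(r"\n{3,}", "\n\n", text)            # collapse blanks
          (out / md.name).write_text(text, encoding="utf-8")
  ```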
- Navigate to the olmocr directory:

  ```bash
  cd olmocr
  ```

- Run inference on PDFs:

  ```bash
  python -m olmocr.pipeline ./localworkspace --markdown --pdfs tests/gnarly_pdfs/*.pdf
  ```

  Replace `tests/gnarly_pdfs/` with the directory that contains your PDF files. The `--markdown` flag ensures `.md` files are generated in the workspace's `markdown/` subdirectory.
Use OmniDocBench's evaluation scripts to compare the generated outputs.
- Configure `OmniDocBench/configs/md2md.yaml` (a sketch follows below):

  - Set `ground_truth.data_path` to `OmniDocBench/OmniDocBench.json`
  - Set `ground_truth.page_info` to `OmniDocBench/OmniDocBench.json`
  - Set `prediction.data_path` to the directory containing model outputs (e.g., `outputs/deepseek_ocr/` or `olmocr_workspace/markdown/`)
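  For reference, a sketch of the relevant portion of the config (nesting is inferred from the dotted keys above; the exact schema and all other fields are as OmniDocBench ships them):

  ```yaml
  # Sketch only: nesting inferred from the dotted keys above.
  ground_truth:
    data_path: OmniDocBench/OmniDocBench.json
    page_info: OmniDocBench/OmniDocBench.json
  prediction:
    data_path: outputs/deepseek_ocr/  # or olmocr_workspace/markdown/
  ```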
- Run the evaluation:

  ```bash
  cd OmniDocBench
  python pdf_validation.py --config configs/md2md.yaml
  ```
After evaluation, results are stored in `OmniDocBench/result/`. Use the notebooks in `OmniDocBench/tools/` to generate comparison tables and visualizations. Our results are also available in the `results/` folder.
Key metrics include:
- Text accuracy (normalized edit distance)
- Formula accuracy (edit-distance score)
- Table TEDS score
- Reading order accuracy
- Overall score: `((1 - text_edit) × 100 + table_teds + (1 - edit_distance) × 100) / 3`, i.e., the mean of three per-task scores on a 0-100 scale (a sketch of this computation follows the list)
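A minimal sketch of that computation (parameter names mirror the formula above; `edit_distance` is whichever edit-distance metric the formula's third term refers to):

```python
def overall_score(text_edit: float, table_teds: float, edit_distance: float) -> float:
    """Mean of three per-task scores on a 0-100 scale.

    text_edit and edit_distance are normalized edit distances in [0, 1]
    (lower is better); table_teds is a TEDS score already on [0, 100].
    """
    return ((1 - text_edit) * 100 + table_teds + (1 - edit_distance) * 100) / 3

# Example: text edit 0.08, TEDS 85.0, other edit distance 0.25 -> 84.0
print(round(overall_score(0.08, 85.0, 0.25), 2))
```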
Based on the evaluation:
- DeepSeek-OCR achieves an overall score of 84.24%
- OLMOCR-2 achieves an overall score of 81.56%
- DeepSeek-OCR shows strengths in text and table recovery
- Both models perform well on reading order but have room for improvement in formula parsing
See REPORT.md for detailed results and visualizations.