29 Oct 25

A toolkit for converting PDFs and other image-based document formats into clean, readable plain text.

Try the online demo: https://olmocr.allenai.org/

Features:

- Convert PDF, PNG, and JPEG based documents into clean Markdown
- Support for equations, tables, handwriting, and complex formatting
- Automatically removes headers and footers
- Convert into text with a natural reading order, even in the presence of figures, multi-column layouts, and insets
- Efficient, less than $200 USD per million pages converted
- (Based on a 7B parameter VLM, so it requires a GPU)
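To put the cost figure from the feature list in perspective, here is a rough back-of-the-envelope sketch. It treats the quoted "less than $200 USD per million pages" as a flat upper bound that scales linearly with page count, which is an assumption; actual cost will depend on the GPU used and how it is priced. The function name `max_cost_usd` is made up for illustration and is not part of the toolkit.

```python
# Upper bound quoted in the feature list: under $200 USD per million pages.
# Assumption: the cost scales linearly with the number of pages.
COST_PER_MILLION_PAGES_USD = 200.0


def max_cost_usd(pages: int) -> float:
    """Return the upper-bound cost in USD for converting `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * COST_PER_MILLION_PAGES_USD / 1_000_000


if __name__ == "__main__":
    # Hypothetical workloads: a single page, a small archive, a large corpus.
    for pages in (1, 10_000, 2_500_000):
        print(f"{pages:>9,} pages -> under ${max_cost_usd(pages):,.4f}")
```

At that rate a single page costs at most a small fraction of a cent, so the GPU requirement is likely to matter more than the per-page price for small jobs.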

by tmfnk 1 month ago