02 Nov 25
Datalab’s Chandra topped independent benchmarks, beating the previous leader, dots-ocr.
- Support for 40+ languages
- Handles text, tables, formulas seamlessly
by tmfnk
1 month ago
29 Oct 25
A toolkit for converting PDFs and other image-based document formats into clean, readable plain text.
Try the online demo: https://olmocr.allenai.org/
Features:
- Convert PDF, PNG, and JPEG based documents into clean Markdown
- Support for equations, tables, handwriting, and complex formatting
- Automatically removes headers and footers
- Convert into text with a natural reading order, even in the presence of figures, multi-column layouts, and insets
- Efficient, less than $200 USD per million pages converted
(Based on a 7B parameter VLM, so it requires a GPU)
by tmfnk
1 month ago
30 Aug 25
by shubxam
3 months ago