25 Nov 25
OCR Arena is a free playground for testing and evaluating leading foundation VLMs and open source OCR models side-by-side. Upload a document, measure accuracy, and vote for the best models on a public leaderboard.
02 Nov 25
Datalab’s Chandra topped independent benchmarks, beating the previous best model, dots-ocr.
- Support for 40+ languages
- Handles text, tables, and formulas seamlessly
29 Oct 25
A toolkit for converting PDFs and other image-based document formats into clean, readable plain text.
Try the online demo: https://olmocr.allenai.org/
Features:
- Convert PDF, PNG, and JPEG based documents into clean Markdown
- Support for equations, tables, handwriting, and complex formatting
- Automatically removes headers and footers
- Convert into text with a natural reading order, even in the presence of figures, multi-column layouts, and insets
- Efficient, less than $200 USD per million pages converted
- (Based on a 7B parameter VLM, so it requires a GPU)
27 Oct 25
11 Sep 25
This article will cover the top ten OCR libraries in Python, highlighting their strengths, unique features, and code examples to help you get started.
30 Aug 25
23 Sep 24
01 Apr 24
29 Dec 23
OCR-powered screenshot tool to capture text instead of images.
19 Oct 23
27 Feb 23
Open Source Document Management System for Digital Archives (Scanned Documents) - papermerge/docker-compose.yml at master · ciur/papermerge
11 Jan 23
Frustrated by Tesseract OCR's limitations in extracting text from meme images, the author found a way to use the iOS Vision API on older iPhone models connected to a Raspberry Pi to build his own OCR service.
14 Dec 22
22 May 20
16 Jan 20
20 Dec 19
05 Nov 17
01 Dec 11
A Clojure wrapper for the Tesseract OCR software.