Best Open Source Linux OCR Software 2025

OCR Software for Linux

Browse free open source OCR software and projects for Linux below.

1

OpenKM Document Management - DMS

Document Management System and Content Management System

OpenKM is an electronic document management system and records management system, or EDRMS (DMS, RMS, CMS). It provides a modern, flexible architecture that meets today's IT demands: a powerful, scalable, multiplatform application built on open technology (Java, Tomcat, GWT, Lucene, Hibernate, Spring and jBPM). OpenKM is a Web 2.0 application that works with Internet Explorer, Firefox, Safari and Opera. It can be configured with major DBMSs such as Oracle, PostgreSQL and MySQL, among others. Due to its architecture, OpenKM meets the document management needs of businesses of all sizes (from SMEs to big corporations). Thanks to its elegant and intuitive interface, OpenKM transforms complex operations into easy tasks. One of the most relevant functions of OpenKM is indexing the most common file types: text, Office, Office 2007, OpenOffice, PDF, HTML, XML, MP3, JPEG, etc. For a complete feature list, take a look at http://goo.gl/au8cQy

33 Reviews

Downloads: 922 This Week

Last Update: 2022-11-25
See Project
2

pdfsandwich

pdfsandwich generates "sandwich" OCR pdf files, i.e. pdf files which contain only images (but no editable text) will be processed by optical character recognition (OCR) and the text will be added to each page invisibly "behind" the images. pdfsandwich is a command line tool which is supposed to be useful to OCR scanned books or journals. It is able to recognize the page layout even for multicolumn text. Essentially, pdfsandwich is a wrapper script which calls the following binaries: convert, unpaper, tesseract, gs, and hocr2pdf (if tesseract < 3.03). It is known to run on Unix systems and has been tested on Linux and MacOS X. It supports parallel processing on multiprocessor systems. In contrast to most competing sandwich programs, it performs preprocessing of the scanned images, such as de-skewing or removal of dark edges etc. For further information please read the manual: http://www.tobias-elze.de/pdfsandwich/index.html
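
The OCR step that such a wrapper delegates to tesseract can be sketched as follows. This is not pdfsandwich's own code: it only builds the command line for a single page image, assuming a tesseract version (3.03 or later) whose `pdf` output mode writes a searchable PDF. The function name and file names are hypothetical, and the convert, unpaper and gs steps are left out.

```python
import shlex


def tesseract_pdf_command(image_path, output_base, lang="eng"):
    """Build the argv for: tesseract IMAGE OUTPUTBASE -l LANG pdf

    tesseract appends ".pdf" to OUTPUTBASE itself, so the result of the
    command below would be written to "page-001.pdf".
    """
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


# Hypothetical file names, for illustration only.
cmd = tesseract_pdf_command("page-001.png", "page-001", lang="eng")
print(shlex.join(cmd))  # tesseract page-001.png page-001 -l eng pdf
```

To actually run the step, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where tesseract is installed.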

8 Reviews

Downloads: 389 This Week

Last Update: 2018-08-12
See Project
3

gscan2pdf

A GUI to ease the process of producing a multipage PDF from a scan. gscan2pdf should work on almost any Linux/BSD machine.

22 Reviews

Downloads: 284 This Week

Last Update: 2025-11-05
See Project
4

Papermerge

Open Source Document Management System for Digital Archives

Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full text, tags and metadata-based search. Papermerge is free and open-source software, which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. Each user can be assigned different permissions to perform only specific kinds of actions, e.g. view only documents from a specific folder. OCR technology is a vital part of Papermerge. It extracts text from scanned documents and from PDF, JPEG and TIFF files.

Downloads: 27 This Week

Last Update: 2025-07-24
See Project
5

Super-PDF-Editor

World's most comprehensive, powerful, process-based PDF editor

World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. PDF editing with 60+ feature-rich tools and functions, such as OCR of PDFs and images with output as searchable PDF, Text, hOCR, Box, or UNLV. Image enhancement before the OCR operation improves OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry. Easy PDF imposition, booklet, n-up pages, and more. OCR works on PDF files, including scanned PDFs. OCR also works on image files in multiple formats. Auto and manual image enhancement for better OCR accuracy and quality. Supports 165+ languages with three language data sets. Use multiple languages at once. International languages: 127 languages in High, Medium, and Fast quality. Input: scanned images (jpg, png, gif, tiff, bmp), multi-page TIFF and GIF, and scanned PDFs.

3 Reviews

Downloads: 19 This Week

Last Update: 2023-02-02
See Project
6

Super-PDF-Editor-Lite

World's most comprehensive, powerful, process-based PDF editor

World's most comprehensive, powerful, process-based and lightning-fast PDF reader, editor and batch processor. Features include creating PDFs from images, HTML, and text files, and creating a processing log file. Page operations: extract, split, rotate, merge, duplicate, move, print, and compress. Image enhancement before the OCR operation improves OCR performance. PDF imposition, etc. Super PDF Editor is best for bulk PDF processing, especially for the printing industry. Easy PDF imposition, booklet, n-up pages, and more. OCR works on PDF files, including scanned PDFs. OCR also works on image files in multiple formats. Auto and manual image enhancement for better OCR accuracy and quality. Supports 165+ languages with three language data sets. Use multiple languages at once. International languages: 127 languages in High, Medium, and Fast quality. Input: scanned images (jpg, png, gif, tiff, bmp), multi-page TIFF and GIF, and scanned PDFs.

3 Reviews

Downloads: 16 This Week

Last Update: 2023-02-02
See Project
7

e-Dokyumento

e-Dokyumento is a web-based Document Management System (DMS)

e-Dokyumento is an open source, web-based Document Management System (DMS) that automates the basic office document workflow (receiving, filing, routing, and approving) by capturing (scanning), digitizing (OCR reading), storing, tagging, and electronically routing and approving (e-signature) electronic documents.

# Demo: https://e-dokyumento.herokuapp.com/ https://edokyu.seillig.com/ (refer to Readme.md for the accounts)

# Dockerhub: https://hub.docker.com/r/nelsonmaligro/edokyumento

# Install using the ISO:
1. Download: https://sourceforge.net/projects/e-dokyumento/files/Releases/e-DokyuV3.iso/download
2. Boot and log in with: "root" and "admin@123"
3. Create 2 partitions: SWAP and a / mount
4. Log in and move the "/opt/drive" folder to root: "mv /opt/drive /"

# Install on Ubuntu: https://sourceforge.net/projects/e-dokyumento/files/Install%20e-Dokyumento%20on%20Ubuntu%20Linux.pdf/download

2 Reviews

Downloads: 10 This Week

Last Update: 2022-05-14
See Project
8

yagf

YAGF is a tesseract and cuneiform wrapper and helper

YAGF is a graphical front-end for the cuneiform and tesseract OCR tools. With YAGF you can open already scanned image files or obtain new images via XSane (scanning results are automatically passed to YAGF). Once you have a scanned image, you can prepare it for recognition, select particular image areas for recognition, set the recognition language and so on. Recognized text is displayed in an editor window where it can be corrected, saved to disk or copied to the clipboard. YAGF also provides some facilities for multi-page recognition (see the online help for more details).

2 Reviews

Downloads: 7 This Week

Last Update: 2016-11-25
See Project
9

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout AI-driven data processing tool written in C++20, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems. DocWire SDK is dedicated to streamlining data processing, reducing development time and costs, and harnessing the potential of AI. Its advancements promise a superior experience compared to its predecessor, DocToText.

Downloads: 19 This Week

Last Update: 2025-11-01
See Project
10

MyBox

Easy Tools of PDF, Image, File, Network, Data, and Medias

Keywords: javafx-desktop-apps, pdf, image, ocr, icc, barcode, color-palette, text, bytes, markdown, html, archive, compress, digest, video, audio, editor, converter, media.
https://github.com/Mararsh/MyBox
Self-contained packages need neither a Java environment nor installation. Jar packages need Java 16 or higher.

Downloads: 8 This Week

Last Update: 2025-10-02
See Project
11

DJVU++

The complete DjVu solution, with OCR technology (Arabic, English).

DjVu++ is a user-friendly program for manipulating DjVu files, such as eBooks, with plenty of editing features. The program offers a free replacement for the proprietary PDF format, with similar resolution and smaller file size. DjVu++ also supports OCR to handle text in scanned books and images. The program shows good performance for English, and also for Arabic, where it aims to lead free and commercial software in this area. The main features of DjVu++ are:
o Manipulate DjVu files.
o Support smaller size than PDF with the same performance.
o Support two languages in the OCR technique (Arabic and English).
o Read multiple documents at the same time with the new tabs feature.
o Support multiple formats:
  - Convert a PDF document into DjVu format with smaller file size and the same performance.
  - Convert DjVu into PDF format.
  - Combine images into a single DjVu document.
o Perform OCR operations on multiple image formats.

4 Reviews

Downloads: 4 This Week

Last Update: 2015-08-24
See Project
12

MyOCR

Start Your Own Captcha Solving Business Portal

Captcha Solutions OCR captcha solver: a reseller website for starting your own captcha-solving business portal.

Downloads: 2 This Week

Last Update: 2016-03-16
See Project
13

DIY Book Scanner Image Postprocessor

An image postprocessor for the DIY Book Scanner described on instructables.com and diybookscanner.org. Gets images ready for OCR or for PDF. Written in Java based on a partial port of the Leptonica image processing library.

1 Review

Downloads: 1 This Week

Last Update: 2013-04-18
See Project
14

Kuto

When translating becomes a game! Text to translate can be selected graphically. Several dictionaries can be sorted according to the context. A large choice of matching strategies is available. The OCR engine is tunable.

Downloads: 0 This Week

Last Update: 2013-02-22
See Project
15

Lince

Artificial vision library. The objective is to build OCR, fingerprint and face identification as applications of a general-purpose algorithm for learning and pattern relationships (currently performs only very basic identification).

Downloads: 0 This Week

Last Update: 2013-03-22
See Project
16

Merge PDF Files

It is a Windows library that merges standard PDFs into a final PDF

The library is intended for developers, for inclusion in desktop applications or server services. There are lots of SDKs on the market for creating (merging) PDFs, and almost all of them have limitations. Our Windows library (MergePDFByNMI.dll) only merges standard PDF files (there are several PDF formats). You can send the input PDFs (by file name or as a byte array) and receive the final PDF (saved to a file or returned as a byte array). The library calls can be synchronous or asynchronous. As a benchmark, the library was used to build one PDF from single-page scanned images processed by an OCR SDK (not included in our library; you can use any on the market): 20,000 images (the OCR SDK creates single-page, text-searchable PDFs, running 50 threads) in 80 minutes. The final searchable PDF was 800Mb. If you download the library, we provide a sample that covers all possible scenarios (synchronous and asynchronous).
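
The benchmark figures can be turned into rough per-image rates. The sketch below only does the arithmetic on the numbers quoted above. It assumes that "800Mb" means 800 megabytes (decimal), that the 80 minutes cover the whole run, and that all 50 threads stayed busy; the description confirms none of these.

```python
# Figures quoted in the project description.
images = 20_000
minutes = 80
threads = 50
final_size_mb = 800  # assumption: "800Mb" read as 800 megabytes

# Overall throughput across all threads.
images_per_minute = images / minutes          # 250.0
images_per_second = images_per_minute / 60    # about 4.17

# Average time one thread spends per image, if all threads are kept busy.
seconds_per_image_per_thread = minutes * 60 * threads / images  # 12.0

# Average size contributed by each page to the final PDF (decimal units).
kb_per_page = final_size_mb * 1000 / images   # 40.0

print(images_per_minute, round(images_per_second, 2),
      seconds_per_image_per_thread, kb_per_page)
```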

Downloads: 0 This Week

Last Update: 2020-02-12
See Project
17

OpenBizCard by ThaiLife Insurance Co,Ltd

Java open source scanner for all platforms. This application makes use of JSane. It also includes OCR for Thai and English characters. This project is supported and funded by Thai Life Insurance Company - A Thai Company for the Thai people (http://

Downloads: 0 This Week

Last Update: 2013-04-22
See Project
18

PDF OCR Wrapper

Note as of 2013-09-13: I'm moving this project over to github due to this: http://www.gluster.org/2013/08/how-far-the-once-mighty-sourceforge-has-fallen/ Feel free to rejoin the more updated versions on https://github.com/mnott/PDFOCRWrapper Thanks. Matthias -- This is a wrapper written in Java that recursively iterates a directory structure and calls an OCR engine on each PDF it finds, on the condition that the engine has not yet been called for that PDF. It works well with the ABBYY OCR Engine for Linux.
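
The "walk the tree, skip what was already processed" idea can be sketched in a few lines. The project itself is written in Java, so this Python version is an illustration rather than its code. The marker-file convention (`.ocr-done`), the function name, and the `run_ocr` callback are all assumptions, since the description does not say how the wrapper remembers which PDFs it has handled.

```python
import os
from pathlib import Path


def ocr_pending_pdfs(root, run_ocr, marker_suffix=".ocr-done"):
    """Call run_ocr(pdf_path) once for every PDF under root.

    A PDF counts as already processed when a sibling marker file
    "<name>.pdf<marker_suffix>" exists (hypothetical convention).
    Returns the list of PDFs processed in this pass.
    """
    processed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(".pdf"):
                continue
            pdf = Path(dirpath) / name
            marker = pdf.with_name(pdf.name + marker_suffix)
            if marker.exists():
                continue  # OCR engine was already called for this PDF
            run_ocr(pdf)
            marker.touch()  # remember that this PDF is done
            processed.append(pdf)
    return processed
```

A caller would pass a function that invokes its OCR engine of choice, for example `ocr_pending_pdfs("/srv/scans", my_engine)`, where both the path and `my_engine` are placeholders.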

1 Review

Downloads: 0 This Week

Last Update: 2014-10-12
See Project
19

PyCodeOCR

Turn your scanner into a free document reader for invoices (e.g. for e-banking) with the help of tesseract-ocr available for many unix (and also windows) platforms.

1 Review

Downloads: 0 This Week

Last Update: 2014-09-05
See Project
20

SecureJDMS

SecureJDMS is an attempt to develop a secure DMS, based on an RCP and an (even untrusted) DB for data storage. For now, it's all about managing scanned documents and searching them by content (using OCR). All data sent and stored will be strongly encrypted.

Downloads: 0 This Week

Last Update: 2013-04-23
See Project
21

Socr3

Socr3 is a plugin-oriented, open source platform upon which I'm building an OCR suite. The name Socr3 stands for "Open Source Optical Character Recognition, Reading, Rendering, and Exporting", and is subject to change in the future.

Downloads: 0 This Week

Last Update: 2016-11-29
See Project
22

edocias

Electronic Document Index And Search

EDocIAS (Electronic Document Index And Search) is a PHP-based tool for indexing and searching files of various types. Third-party tools (tesseract, xpdf, etc.) can be configured to support any type of file.
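
One way to picture "third-party tools configured per file type" is a table that maps file extensions to text-extraction commands. EDocIAS is PHP-based and its real configuration format is not shown here, so the Python mapping below is a hypothetical sketch. It relies only on two command forms: `pdftotext FILE -` (xpdf) and `tesseract IMAGE stdout`, both of which write the extracted text to standard output.

```python
from pathlib import Path

# Hypothetical configuration: extension -> command template.
# "{path}" is replaced with the file being indexed.
EXTRACTORS = {
    ".pdf": ["pdftotext", "{path}", "-"],
    ".png": ["tesseract", "{path}", "stdout"],
    ".tif": ["tesseract", "{path}", "stdout"],
}


def extractor_command(path, extractors=EXTRACTORS):
    """Return the argv that extracts text from path, or None if no
    tool is configured for its extension."""
    template = extractors.get(Path(path).suffix.lower())
    if template is None:
        return None
    return [part.format(path=path) for part in template]


print(extractor_command("invoice.pdf"))  # ['pdftotext', 'invoice.pdf', '-']
```

Supporting a new file type then means adding one entry to the table rather than changing the indexing code.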

Downloads: 0 This Week

Last Update: 2015-07-10
See Project
23

hocr - Hebrew OCR

hocr - Hebrew OCR C/C++ library

Downloads: 0 This Week

Last Update: 2014-06-09
See Project
24

ocr2data

Full OCR stack for document digitization, analysis and OCR that provides external connection by API, standard document exchange formats and database.

Downloads: 0 This Week

Last Update: 2015-08-06
See Project
25

opendias

NB: openDIAS is moving away from SF.net. Please visit the homepage link for the most up-to-date information, support and files. Document Imaging Archive System. Home document imaging, with OCR. Scan documents (with SANE) or import ODF documents, and assign tags. Use openDIAS to store all your letters, bills, statements, etc. in a convenient, safe and easily retrievable way.

Downloads: 0 This Week

Last Update: 2013-04-22
See Project