Browse free open source OCR software and projects for Linux below.

  • 1
    pdfsandwich generates "sandwich" OCR PDF files: PDF files that contain only images (and no editable text) are processed by optical character recognition (OCR), and the recognized text is added to each page invisibly "behind" the images. pdfsandwich is a command-line tool intended for OCR of scanned books or journals. It can recognize the page layout even for multicolumn text. Essentially, pdfsandwich is a wrapper script that calls the following binaries: convert, unpaper, tesseract, gs, and hocr2pdf (if tesseract < 3.03). It is known to run on Unix systems and has been tested on Linux and Mac OS X. It supports parallel processing on multiprocessor systems. In contrast to most competing sandwich programs, it preprocesses the scanned images, for example by de-skewing them or removing dark edges. For further information, please read the manual: http://www.tobias-elze.de/pdfsandwich/index.html
    Downloads: 368 This Week
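The wrapper idea behind pdfsandwich can be sketched roughly as follows. This is a minimal illustration, not pdfsandwich's actual invocation: it only builds the command lines for two of the listed binaries (tesseract and gs), skips the convert and unpaper preprocessing steps, and the function names are hypothetical. It assumes a tesseract version that can write searchable PDFs directly via its `pdf` output config.

```python
from pathlib import Path


def tesseract_pdf_command(image, lang="eng"):
    """Build a tesseract call that writes a searchable PDF for one page image."""
    image = Path(image)
    output_base = image.with_suffix("")  # tesseract appends ".pdf" itself
    return ["tesseract", str(image), str(output_base), "-l", lang, "pdf"]


def merge_command(page_pdfs, output):
    """Build a Ghostscript call that concatenates per-page PDFs into one file."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite", f"-sOutputFile={output}",
        *[str(p) for p in page_pdfs],
    ]


def sandwich_commands(page_images, output, lang="eng"):
    """Return the commands to OCR each page image and merge the results.

    The commands are only built, not executed; a caller could pass each
    one to subprocess.run(cmd, check=True).
    """
    commands = [tesseract_pdf_command(img, lang) for img in page_images]
    page_pdfs = [Path(img).with_suffix(".pdf") for img in page_images]
    commands.append(merge_command(page_pdfs, output))
    return commands


if __name__ == "__main__":
    for cmd in sandwich_commands(["page1.tif", "page2.tif"], "book_ocr.pdf"):
        print(" ".join(cmd))
```

Building the commands separately from running them keeps the sketch testable on machines where tesseract and Ghostscript are not installed.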
  • 2
    A Java JNA wrapper for Tesseract OCR API
    Downloads: 197 This Week
  • 3
    Papermerge

    Open Source Document Management System for Digital Archives

    Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Store, organize and index scanned documents in PDF, JPEG and TIFF formats. Instantly find relevant information using full-text, tag and metadata-based search. Papermerge is free and open-source software, which means that transparency is the core value of our software development. Source code can be reviewed and improved by anyone from anywhere. Papermerge supports multiple users. Each user can be assigned different permissions to perform only a specific kind of action, e.g. only view documents from a specific folder. OCR technology is a vital part of Papermerge. It extracts text information from scanned documents and from PDF, JPEG and TIFF files.
    Downloads: 27 This Week
  • 4
    Common Resource Grep - crgrep

    Common Resource Grep

    CRGREP searches for matching text in databases, various document formats, archives and other difficult-to-access resources. It is a command-line tool for name and content text matching in database tables, plain files, MS Office documents, PDF, archives, MP3 audio, image metadata, scanned documents, Maven dependencies and web resources. CRGREP will search resources within resources in any combination and to any depth, for example text within a document within a zip archive, and so on. Here you will find binary downloads and discussion (https://sourceforge.net/p/crgrep/discussion/). The actual development and issue tracking can be found here: https://bitbucket.org/cryanfuse/crgrep
    Downloads: 10 This Week
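The "resources within resources" idea can be illustrated with a small sketch. This is not CRGREP's implementation or output format: it handles only zip archives and plain text, uses simple substring matching, and the function name and the `outer[inner]` path notation are made up for this example.

```python
import io
import zipfile


def grep_resource(name, data, pattern):
    """Yield (path, line_number, line) for each match in `data`.

    If `data` is a zip archive, every member is searched recursively,
    so text inside a zip inside a zip is found as well.
    """
    stream = io.BytesIO(data)
    if zipfile.is_zipfile(stream):
        stream.seek(0)
        with zipfile.ZipFile(stream) as archive:
            for member in archive.namelist():
                if member.endswith("/"):  # skip directory entries
                    continue
                nested_name = f"{name}[{member}]"
                yield from grep_resource(nested_name, archive.read(member), pattern)
        return

    # Anything that is not an archive is treated as text in this sketch.
    text = data.decode("utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if pattern in line:
            yield name, number, line


if __name__ == "__main__":
    # Build a zip that contains another zip, entirely in memory.
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as z:
        z.writestr("notes.txt", "alpha\nneedle here\n")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as z:
        z.writestr("inner.zip", inner.getvalue())

    for hit in grep_resource("outer.zip", outer.getvalue(), "needle"):
        print(hit)
```

Supporting a further container type would mean adding another branch that unpacks it and calls the same function on each part.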
  • 5
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, supporting efficient text extraction, web data extraction, and document analysis. It offers comprehensive document format support and the ability to extract insights from email boxes, databases, and websites using AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and integration with various systems. DocWire SDK is dedicated to streamlining data processing and reducing development time and costs. It is the successor to DocToText.
    Downloads: 19 This Week
  • 6
    DJVU++

    The complete DjVu solution, with OCR technology (Arabic, English).

    DjVu++ is a user-friendly program for manipulating DjVu files such as eBooks, with plenty of editing features. The program offers a free replacement for the proprietary PDF format, with similar resolution and smaller file size. DjVu++ also supports OCR to handle text in scanned books and images. The program shows good performance for English and also supports Arabic, with the aim of leading free and commercial software in this area. The main features of DjVu++ are:
    o Manipulate DjVu files.
    o Smaller file size than PDF with the same performance.
    o OCR in two languages (Arabic and English).
    o Read multiple documents at the same time with the new tabs feature.
    o Support for multiple formats: convert PDF documents into DjVu format with smaller file size and the same performance; convert DjVu into PDF format; combine images into a single DjVu document; perform OCR operations on multiple image formats.
    Downloads: 4 This Week
  • 7
    Neuroph OCR - Handwriting Recognition
    Neuroph OCR - Handwriting Recognition was developed to recognize handwritten letters and characters. Its engine is derived from the Java Neural Network Framework - Neuroph, and as such it can be used as a standalone project or as a Neuroph plug-in.
    Downloads: 1 This Week
  • 8

    Immutable Sparse Wave Trees (WaveTree)

    Realtime bigdata tool for bit strings up to 2^63 based on AVL forest

    Realtime bigdata tool at the bit level, based on an immutable AVL forest which can be run in memory or, in future versions, as a Merkle forest like a blockchain. The main object is a sparse bit string (Bits) that efficiently scales up to 2^63 bits and is normally compressed, because the forest shares duplicated substrings. Bits objects support reading a bit, byte, short, int, or long (Java primitives) at any bit index in the 64-bit range. Example: instead of building a class to hold a header and then data, represent all of that as Bits, subranges of them, and ints for the sizes of its parts. Other kinds of compression can be added, since Bits is a Java interface. The main functions on bits are substring, concat, number of 0 or 1 bits, and number of bits (size). All those operations can be done millions of times per second regardless of size, because the AVL forest reuses existing branches recursively. There is a scalar Java package (originally for copy/pasting subranges of sounds) and a bit Java package. Sparse n-dimensional matrix.
    Downloads: 1 This Week
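The way an immutable tree can represent very long bit strings by reusing branches can be sketched as follows. This is an illustration only, written in Python rather than the project's Java, and it is not the WaveTree API: the tree here is not AVL-balanced, and all names (`Bits`, `leaf`, `concat`, `substring`, `bit_at`) are chosen for this example. Each node caches its size and its count of 1 bits, so those queries do not depend on the length of the string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, eq=False)
class Bits:
    """Immutable bit string node: a single-bit leaf, or the concatenation of two children."""
    size: int                      # number of bits below this node
    ones: int                      # number of 1 bits below this node
    bit: Optional[int] = None      # set only on leaves
    left: Optional["Bits"] = None
    right: Optional["Bits"] = None


EMPTY = Bits(0, 0)


def leaf(bit):
    return Bits(1, bit, bit=bit)


def concat(a, b):
    """Join two bit strings by sharing both operands instead of copying them."""
    if a.size == 0:
        return b
    if b.size == 0:
        return a
    return Bits(a.size + b.size, a.ones + b.ones, left=a, right=b)


def bit_at(bits, index):
    """Read one bit by walking down from the root."""
    if not 0 <= index < bits.size:
        raise IndexError(index)
    while bits.left is not None:
        if index < bits.left.size:
            bits = bits.left
        else:
            index -= bits.left.size
            bits = bits.right
    return bits.bit


def substring(bits, start, end):
    """Return bits[start:end], reusing whole subtrees wherever possible."""
    start, end = max(start, 0), min(end, bits.size)
    if start >= end:
        return EMPTY
    if start == 0 and end == bits.size:
        return bits  # the entire node is reused as-is
    mid = bits.left.size
    return concat(substring(bits.left, start, end),
                  substring(bits.right, start - mid, end - mid))


if __name__ == "__main__":
    big = leaf(1)
    for _ in range(40):          # doubling 40 times gives 2**40 bits from 41 nodes
        big = concat(big, big)
    print(big.size, big.ones)
```

Because nodes are never modified, `concat(big, big)` can point at the same child twice, which is why a highly repetitive string of 2^40 bits needs only 41 nodes in this sketch.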
  • 9
    Paperless-ng

    A supercharged version of paperless: scan, index and archive docs

    Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to worry about finding stuff again. I feed documents right from the post box into the scanner and then shred them. Perhaps you might find it useful too. Paperless-ng is a fork of the original paperless project. It changes many things both on the surface and under the hood. Paperless-ng was created because I feel that these changes are too big to be pushed into the main repository right away.
    Downloads: 0 This Week
  • 10
    TCR Neuroph - Text Character Recognition
    TCR Neuroph - Text Character Recognition is a Java tool developed to recognize scanned text, using the Java Neural Network Framework - Neuroph.
    Downloads: 0 This Week
  • 11
    TNN

    Uniform deep learning inference framework for mobile

    TNN is a high-performance, lightweight neural network inference framework open sourced by Tencent Youtu Lab. Its advantages include cross-platform support, high performance, model compression, and code tailoring. The TNN framework builds on the original Rapidnet and ncnn frameworks and further strengthens support and performance optimization for mobile devices. It also draws on the high performance and good scalability of the industry's mainstream open source frameworks, and expands support to X86 and NV GPUs. On mobile phones, TNN has been used by many applications such as mobile QQ, weishi, and Pitu. As a basic acceleration framework for Tencent Cloud AI, TNN has provided acceleration support for many business deployments. Everyone is welcome to contribute to the further improvement of the TNN inference framework.
    Downloads: 0 This Week
  • 12
    nunn

    This is an implementation of a machine learning library in C++17

    nunn is a collection of ML algorithms and related examples written in modern C++17.
    Downloads: 0 This Week
  • 13
    DjVuPlus

    Read DjVu documents, with OCR technology (Arabic, English), small size

    The DjVu Reference Library 3.5 was released by LizardTech under the GNU General Public License version 2. DjVuLibre-3.5 was developed by Leon Bottou and others as a "Derived Work" of the DjVu Reference Library 3.5. As such, it is also subject to the GNU General Public License version 2. Several patents apply to two very specific aspects of DjVu and DjVuLibre. The patents cover a particular aspect of the ZP-coder (the arithmetic coder used in DjVu and implemented in libdjvu/ZPCodec.cpp) and the background masking technique used in the IW44 wavelet encoder (implemented in libdjvu/IW44EncodeCodec.cpp). Most patents are owned by AT&T. LizardTech has very broad rights to them and grants free and permanent licenses to them for the purpose of building GPL software with the DjVu Reference Library. The grant is materialized by two paragraphs in the headers of the DjVu Reference Library source files. LizardTech also published an official statement about the open source.
    Downloads: 0 This Week