A self‑hosted search engine for documents
Bachelor Thesis | Text extraction from complex video scenes
A Tika-based server that extracts PDF content page by page and returns it as JSON.
Tess4J CLI OCR Tool is a command-line application that extracts text from images and PDFs using the Tess4J library, with support for multiple languages. The extracted text is automatically copied to the clipboard for easy access.
Simple server to extract text from a PDF
A Cloud-Native Infrastructure for License Plate Recognition and Text Extraction with Python Integration
Arachnio client library for Java 11+
A Spring Boot-based OCR Exporter tool that extracts text from image or PDF files using the OCR Space API and exports the results to various formats such as PDF, text, Word, or a database.
A Java utility for extracting text from multiple file formats (PDF, DOC, DOCX, XLSX, XLS, CSV).
Text extraction: a highway to systematically process car reviews
A simple Java CLI tool for batch-converting PDF files to TXT format. Supports file filtering by filename wildcards and last modified date.
Run Apache Tika as a service in AWS Lambda by scanning documents in S3 and storing the extracted text back to S3
Yet Another Document 2 Text for PDF, DOC, HTML, RTF, etc.: extracts text, or converts to simplified HTML to retain layout information.
Detects and extracts text from captured images and from images selected from the gallery.