Tesseract Open Source OCR Engine (main repository)
Updated Jan 8, 2026 - C++
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
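Transforms complex documents like PDFs into LLM-ready Markdown/JSON for your agentic workflows.
Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in multi-language libraries.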
ShareX is a free and open-source application that enables users to capture or record any area of their screen with a single keystroke. It also supports uploading images, text, and various file types to a wide range of destinations.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
A privacy-first, self-hosted, fully open-source personal knowledge management application, written in TypeScript and Go.
Pure JavaScript OCR for more than 100 languages 📖🎉🖥
A community-supported supercharged document management system: scan, index and archive all your documents
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
BISHENG is an open LLM DevOps platform for next-generation enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, unified model management, evaluation, SFT, dataset management, enterprise-level system management, observability, and more.
yolo3+ocr
Experience, Learn and Code the latest breakthrough innovations with Microsoft AI
Text detection based mainly on the CTPN (Connectionist Text Proposal Network) model in TensorFlow, with ID card detection.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
All-in-One Development Tool based on PaddlePaddle