Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Updated Nov 3, 2025 · Python
📄 Awesome OCR toolkits in multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
OpenOCR: a general OCR system that balances accuracy and efficiency. It supports 24 scene text recognition methods trained from scratch on large-scale real datasets, and the latest methods will continue to be added.
A program for extracting hard-coded (burned-in) subtitles from a video and generating an external subtitle file.
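As a rough illustration of the final step such a tool might perform, the sketch below assumes OCR has already produced one text string per sampled video frame. It merges consecutive frames showing identical text into timed cues and writes them in SubRip (`.srt`) format. The function names, the fixed sampling interval, and the exact-match merging rule are hypothetical choices for this example, not the listed program's actual implementation; real OCR output is noisy, so a practical tool would likely compare text with a similarity threshold instead.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def merge_frames(frames, frame_interval):
    """Merge consecutive frames with identical OCR text into cues (hypothetical helper).

    frames: (timestamp_seconds, text) pairs sorted by time; empty text means no subtitle.
    frame_interval: seconds between sampled frames, used to extend each cue's end time.
    """
    cues = []
    current = None
    for t, text in frames:
        text = text.strip()
        if current is not None and text == current.text:
            current.end = t + frame_interval  # same subtitle is still on screen
            continue
        if current is not None:
            cues.append(current)  # text changed or vanished: close the open cue
            current = None
        if text:
            current = Cue(t, t + frame_interval, text)
    if current is not None:
        cues.append(current)
    return cues


def to_srt(cues):
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(c.start)} --> {srt_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

For example, `merge_frames([(0.0, ""), (0.5, "Hello"), (1.0, "Hello"), (1.5, ""), (2.0, "World")], 0.5)` yields two cues, "Hello" from 0.5 s to 1.5 s and "World" from 2.0 s to 2.5 s, which `to_srt` numbers 1 and 2.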
Boosting Document Intelligence
This repository offers a simple OCR library that uses system APIs such as VisionKit and Media OCR for accurate text recognition. Check out the examples and start integrating with ease! 🐙✨
Open Models For Document Intelligence
🖼️ Enhance text recognition efficiency with this AI-driven multilingual OCR tool, designed for high accuracy and automated image preprocessing.