Stars

OCR

30 repositories

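Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multi-language libraries.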

Python 40,708 4,031 Updated Nov 20, 2025

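Chinese and English sensitive words, language detection, Chinese and foreign mobile/phone number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, company name collection, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …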

Python 77,896 15,096 Updated May 10, 2024

Tesseract Open Source OCR Engine (main repository)

C++ 71,450 10,424 Updated Dec 15, 2025

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Python 66,428 9,511 Updated Dec 16, 2025

Pure Javascript OCR for more than 100 Languages 📖🎉🖥

JavaScript 37,600 2,351 Updated Dec 15, 2025

A community-supported supercharged document management system: scan, index and archive all your documents

Python 34,965 2,202 Updated Dec 17, 2025

A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.

TypeScript 39,863 2,469 Updated Dec 17, 2025

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Python 32,030 2,233 Updated Dec 15, 2025

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 16,036 1,270 Updated Jan 18, 2025

Ultra-lightweight Chinese OCR with support for vertical text recognition and ncnn, mnn, and tnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M

C++ 12,244 2,291 Updated Aug 14, 2023

🌈 A cross-platform software for selected-text translation and OCR text recognition.

JavaScript 16,326 779 Updated Nov 9, 2025

带带弟弟: general-purpose CAPTCHA recognition OCR, PyPI version

Python 13,222 2,162 Updated Jun 9, 2025

Trained models with fast variant of the "best" LSTM models + legacy models

7,314 2,415 Updated Mar 9, 2024

Translate manga/image: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)

Python 9,040 882 Updated Dec 17, 2025

Use OCR in Windows quickly and easily with Text Grab. With optional background process and notifications.

C# 4,339 278 Updated Dec 17, 2025

Advanced real-time screen translator for games, hardcoded subtitles in videos, static text, etc.

C# 4,362 241 Updated Nov 25, 2025

Windrecorder is a memory search app that records everything on your screen at a small storage size, letting you rewind what you have seen, query through OCR text or image description, and get activity statistic…

Python 3,721 168 Updated Sep 16, 2025

📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.

Python 5,439 534 Updated Dec 17, 2025

Java JNA wrapper for Tesseract OCR API

Java 1,720 385 Updated Dec 11, 2025

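Offline OCR command-line program for Windows that recognizes text in images and outputs the results as a JSON string, making it easy for other programs to call. Provides APIs for various languages. Compiled from PaddleOCR C++.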

C++ 1,392 182 Updated Apr 7, 2025

Toolkit for linearizing PDFs for LLM datasets/training

Python 16,207 1,250 Updated Dec 12, 2025

C++ 829 278 Updated Jul 9, 2025

OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex layout handling, complicated table parsing and cross-page conte…

Python 2,413 146 Updated Aug 4, 2025

OpenMMLab Text Detection, Recognition and Understanding Toolbox

Python 4,696 780 Updated Nov 27, 2024

An OCR tool based on large-model APIs.

Python 961 27 Updated Jul 19, 2025

A lightweight LMM-based Document Parsing Model

Python 6,368 440 Updated Dec 8, 2025

🔥🔥🔥 Java code for calling RapidOCR (based on PaddleOCR); works on Mac, Windows, and Linux, and supports the latest PP-OCRv4

Java 528 73 Updated Jun 5, 2024

All-in-One Development Tool based on PaddlePaddle

Python 5,937 1,123 Updated Dec 17, 2025

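Open-source, easy-to-use offline Chinese OCR with recognition accuracy comparable to major vendors. It also provides an easy-to-use web page and web API, convenient for everyday use by people or for calls from other programs.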

Python 2,845 625 Updated Jun 14, 2023