OCR
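OCR software, free and offline. Open source. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning and generating QR codes. Includes built-in language libraries for multiple languages.
Chinese and English sensitive words, language detection, location and carrier lookup for Chinese and foreign mobile/landline numbers, gender inference from names, phone number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, segmentation of continuous English text, various Chinese word vectors, company name collection, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …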
Tesseract Open Source OCR Engine (main repository)
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Pure JavaScript OCR for more than 100 languages 📖🎉🖥
A community-supported supercharged document management system: scan, index and archive all your documents
A privacy-first, self-hosted, fully open source personal knowledge management software, written in TypeScript and Go.
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Ultra-lightweight Chinese OCR with support for vertical text recognition and for ncnn, MNN, and TNN inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M
🌈 A cross-platform application for selected-text translation and OCR.
Trained models with fast variant of the "best" LSTM models + legacy models
Translate manga/images: one-click translation of the text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)
Use OCR in Windows quickly and easily with Text Grab. With optional background process and notifications.
Advanced real-time screen translator for games, hardcoded subtitles in videos, static text, etc.
Windrecorder is a memory search app that records everything on your screen in a small size, letting you rewind what you have seen, query through OCR text or image descriptions, and get activity statistic…
📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.
Offline OCR command-line program for Windows that recognizes text in images and outputs the results as JSON strings, making it easy for other programs to call. Provides APIs for various languages. Compiled from PaddleOCR C++.
Toolkit for linearizing PDFs for LLM datasets/training
OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex layout handling, complicated table parsing and cross-page conte…
OpenMMLab Text Detection, Recognition and Understanding Toolbox
A lightweight LMM-based Document Parsing Model
🔥🔥🔥 Java code for calling RapidOCR (based on PaddleOCR); works on Mac, Windows, and Linux and supports the latest PP-OCRv4
All-in-One Development Tool based on PaddlePaddle
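Open-source, easy-to-use offline Chinese OCR with recognition accuracy comparable to that of major vendors. It also provides an easy-to-use web page and web API, convenient for everyday human use or for calls from other programs.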