text-extraction

Here are 188 public repositories matching this topic...

Pranav-Nagpure / Text-Extraction

Web application to extract text from images

python image-processing text-extraction flask-application tesseract-ocr html-css-javascript

Updated May 19, 2023
Python

ajaywanekar / BizCardX_OCR

This web application uses OCR to recognize text in uploaded images, then applies spelling correction to improve the quality of the extracted words. Users upload images containing text and receive the corrected text.

python text-extraction sqlite3 easyocr

Updated Apr 19, 2023
Python

adamrangwala / DirCity_Directory_Crop-out-with-Key-Lines

Turn Old City Directory scans into searchable data. Automated pipeline handles column detection, OCR processing, and accuracy evaluation for historical document digitization.

python opencv ocr computer-vision image-processing tesseract text-extraction genealogy digital-humanities minneapolis document-processing historical-documents city-directories

Updated Jun 28, 2025
Python
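OCR accuracy evaluation of the kind described above is often measured with character error rate (CER): the edit distance between the OCR output and a hand-checked transcription, divided by the transcription's length. The sketch below assumes CER is the metric. The repository may use a different measure, and the sample directory line is invented for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between a and b."""
    # Keep only the previous row of the dynamic-programming table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / length of the ground-truth reference."""
    if not reference:
        # Convention chosen here: empty truth is perfect only if OCR is empty too.
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "Anderson John, clerk, 412 Hennepin av"  # hypothetical transcription
    ocr = "Andersen John, clerk. 412 Hennepin av"    # hypothetical OCR output
    print(f"CER: {character_error_rate(truth, ocr):.3f}")
```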

akomarla / slack_msg_processing

Processing and hashing Slack communication to enable language modelling

nlp slack text-extraction hashing-algorithm md-5

Updated Jun 7, 2023
Python
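A minimal sketch of hashing Slack identities with MD5 before language modelling, assuming messages arrive as dicts with hypothetical `user` and `text` keys. Hashing the user ID gives a model consistent speaker tokens without exposing raw identifiers; note that MD5 of a short, guessable ID is pseudonymization, not strong anonymization.

```python
import hashlib
import json


def md5_hex(value: str) -> str:
    """Return the hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def anonymize_messages(messages: list[dict]) -> list[dict]:
    """Replace each message's hypothetical 'user' field with its MD5 hash and trim the text."""
    out = []
    for msg in messages:
        out.append({
            "user": md5_hex(msg["user"]),
            "text": msg.get("text", "").strip(),
        })
    return out


if __name__ == "__main__":
    raw = [{"user": "U012ABCDEF", "text": "  Deploy is done  "}]  # made-up sample
    print(json.dumps(anonymize_messages(raw), indent=2))
```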

w3labkr / py-image-toolkit

A fast and easy-to-use Python toolkit for image processing with CLI tools for resizing, cropping, OCR, and optimization, including batch processing support.

python cli opencv ocr image-processing text-extraction face-detection py image-cropping image-resizing paddleocr

Updated Jun 5, 2025
Python

tirth8205 / RAG_using_NLP

A local GPU-accelerated Retrieval-Augmented Generation (RAG) pipeline for PDF question-answering with multi-LLM support and modular NLP components. Process documents locally with privacy-focused information retrieval.

nlp text-extraction embeddings gemini openai question-answering chunking semantic-search grok document-retrieval faiss fastapi vector-search huggingface pdf-processing sentence-transformers llms rag-pipeline

Updated May 20, 2025
Python
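The chunking step in a RAG pipeline like this one can be sketched as overlapping word windows, so that a sentence cut at one boundary still appears whole in the neighbouring chunk. The window and overlap sizes are illustrative defaults, not the repository's settings; many pipelines chunk by tokens or sentences instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words, consecutive windows sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and indexed (for example with sentence-transformers and FAISS, both tagged above).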

terry-li-hm / prometheus

PDF Liberation MCP Server - Break large PDFs into digestible chunks for Claude

python prometheus text-extraction document-processing pymupdf pdf-splitter pdf-processing ai-tools mcp-server fastmcp claude-code

Updated Aug 30, 2025
Python
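Breaking a large PDF into smaller pieces can be sketched with PyMuPDF (tagged above), assuming fixed-size page ranges. The `split_pdf` helper and its output file naming are hypothetical and are not the MCP server's actual interface.

```python
from pathlib import Path


def page_ranges(page_count: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Return inclusive, 0-based (first, last) page ranges covering the document."""
    if pages_per_chunk <= 0:
        raise ValueError("pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk, page_count) - 1)
        for start in range(0, page_count, pages_per_chunk)
    ]


def split_pdf(path: str, pages_per_chunk: int = 20) -> list[Path]:
    """Write each page range of `path` to its own PDF next to the original."""
    import fitz  # PyMuPDF; imported here so page_ranges() works without it

    src = fitz.open(path)
    outputs = []
    for first, last in page_ranges(src.page_count, pages_per_chunk):
        part = fitz.open()  # new, empty PDF
        part.insert_pdf(src, from_page=first, to_page=last)
        out = Path(path).with_name(f"{Path(path).stem}_p{first + 1}-{last + 1}.pdf")
        part.save(str(out))
        part.close()
        outputs.append(out)
    src.close()
    return outputs
```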

desertigorr / docs-classifier

Educational NLP project

nlp sklearn text-extraction classification

Updated Aug 3, 2025
Python

MansurPro / DocuParse

DocuParse is a high-performance tool for converting PDF documents into clean, structured Markdown files. Designed for speed and accuracy, it extracts and formats content while minimizing errors like hallucinations and repetitions.

text-extraction tesseract-ocr pdf-parsing digital-archive markdown-conversion document-layout-analysis google-colab huggingface-transformers pdf-to-markdown

Updated Aug 10, 2025
Python

ceodaniyal / Image_To_Text_GPT-4o-mini

This repository contains a Python script that extracts text from images using OpenAI's GPT-4 API. It accepts both online image URLs and locally stored images (converted to base64), aims to produce accurate, structured text for OCR-like tasks, and saves the extracted text to a file.

python ocr base64 image-processing text-analysis text-extraction openai image-to-text api-integration gpt-4 image-ocr gpt-4o gpt-4o-mini

Updated Dec 14, 2024
Python
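Converting a local image to base64 usually means building a data URL. The sketch below uses only the standard library, and the function name is hypothetical. A string of this form can be passed as the `url` of an `image_url` content part in OpenAI's Chat Completions API.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL (data:<mime>;base64,<payload>)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```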

Mrigank005 / OCR

This Python script automates the extraction of text from images using Tesseract OCR. It processes all images in the test_images/ folder and saves the extracted text as .txt files in the extracted_texts/ directory, maintaining the original image filenames.

python automation ocr image-processing text-extraction multi-language image-to-text batch-processing python-project

Updated May 2, 2025
Python
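The batch loop described above can be sketched with `pytesseract` and Pillow, assuming the `test_images/` and `extracted_texts/` directory names from the description. The suffix list and helper names are assumptions. Tesseract combines language models with a `+`, so passing `lang="eng+fra"` is one way to cover multi-language input.

```python
from pathlib import Path

# Assumed set of image extensions to process; other files are skipped.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def output_path(image: Path, out_dir: Path) -> Path:
    """Map test_images/foo.png to extracted_texts/foo.txt, keeping the base name."""
    return out_dir / f"{image.stem}.txt"


def batch_ocr(in_dir: str = "test_images", out_dir: str = "extracted_texts",
              lang: str = "eng") -> int:
    """OCR every image in in_dir with Tesseract and write one .txt per image."""
    import pytesseract  # needs the Tesseract binary installed and on PATH
    from PIL import Image

    src, dst = Path(in_dir), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for image in sorted(src.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        with Image.open(image) as img:
            text = pytesseract.image_to_string(img, lang=lang)
        output_path(image, dst).write_text(text, encoding="utf-8")
        count += 1
    return count
```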

zanachka / price-parser

Extract price amount and currency symbol from a raw text string

text-extraction html-extraction

Updated Oct 6, 2025
Python
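For illustration only, a much simpler regex approach to the same task is sketched below. It is not price-parser's algorithm, recognises only a handful of currency markers, and takes the first number in the string. It treats the last `.` or `,` followed by one or two digits as the decimal separator and drops every other separator.

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Deliberately small, assumed set of currency markers.
CURRENCY_RE = re.compile(r"[$€£¥]|\b(?:USD|EUR|GBP)\b")
# A digit followed by more digits, separators, or spaces (e.g. "1 299,00").
AMOUNT_RE = re.compile(r"\d[\d.,\s]*")


def normalize_amount(raw: str) -> Decimal:
    """Turn '1.299,00' or '1,299.00' into Decimal('1299.00')."""
    raw = re.sub(r"\s", "", raw).rstrip(".,")
    m = re.search(r"[.,](\d{1,2})$", raw)
    if m:
        whole = re.sub(r"[.,]", "", raw[: m.start()])
        return Decimal(f"{whole}.{m.group(1)}")
    return Decimal(re.sub(r"[.,]", "", raw))


def parse_price(text: str) -> Tuple[Optional[Decimal], Optional[str]]:
    """Return (amount, currency) from a raw price string; None for a missing part."""
    cur = CURRENCY_RE.search(text)
    amt = AMOUNT_RE.search(text)
    amount = normalize_amount(amt.group()) if amt else None
    return amount, (cur.group() if cur else None)
```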

Nath9666 / Lexo

OCR tool to extract and structure text from images and scanned PDFs (outputs .docx / .txt); supports French and English (FR/EN)

desktop-app multilingual python gui ocr document-conversion drag-and-drop logging image-processing text-extraction docx tesseract-ocr python-docx txt pdf2image pdf2text scanned-pdf

Updated May 29, 2025
Python

Aashish-1008 / ml-trainer

Package for ML training on Google Cloud Platform (GCP)

python machine-learning deep-learning text-classification tensorflow text-extraction neural-networks image-classification google-cloud-ml-engine

Updated Apr 3, 2019
Python

fonckchain / pdf-text-converter

Python tool for converting PDF files to text. Simplify your document processing tasks.

python automation pdf-converter text-extraction document-processing

Updated Jul 14, 2023
Python

ParisaArbab / Data-Modeling

Retrieves data from two different websites, loads it into a PostgreSQL database using Python, and combines the two sources to derive and present new information.

postgresql text-extraction constraints data-statictics extract-data data-conversion python-scrapy python-connector categorize-products join-query

Updated Dec 5, 2023
Python

cyberfantics / form_digitilization

A Python application that extracts text from live video using the Gemini 1.5 Flash API, with hand-gesture detection and an on-screen UI.

python opensource computer-vision pillow text-extraction hacktoberfest gemini-api hand-gesture-detection

Updated Dec 7, 2024
Python

comoysha / scrollunroll

Convert scrolling article videos into long images and extract text with OCR.

text-extraction stitcher video-ocr long-screenshot scrollunroll scroll-capture

Updated Sep 15, 2025
Python

SriMathi-2705 / Single_Resume_Parsing

The project focuses on extracting and structuring key details from resumes, such as names, contact information, education, and work experience, into a user-friendly interface for efficient review and management.

css python html pdf natural-language-processing annotation python3 text-extraction text-processing regular-expressions real-time-processing resume-parsing resume-pa

Updated Dec 30, 2024
Python

karan3691 / dataset-builder

A complete Python pipeline that automates the creation of structured datasets from natural language search queries. This tool searches the web for content matching your query, scrapes and cleans the content, and outputs a structured dataset in multiple formats.

nlp search-engine machine-learning text-extraction web-scraping dataset-creation data-collection data-pipeline fastapi huggingface ai-tools

Updated Apr 4, 2025
Python
