text-extraction

Here are 188 public repositories matching this topic...

akomarla / slack_msg_processing

Processing and hashing Slack communication to enable language modelling

nlp slack text-extraction hashing-algorithm md-5

Updated Jun 7, 2023
Python
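
As a rough illustration of the MD5 hashing this entry mentions, the sketch below pseudonymizes Slack user mentions before text is used for modelling. Slack writes mentions as `<@USERID>` in message text, and the sketch replaces each ID with a truncated MD5 digest so the same user always maps to the same token. It assumes pseudonymization is the goal and handles only the plain `<@USERID>` form. It is not the repository's code, and `md5_token` and `pseudonymize_mentions` are hypothetical names.

```python
import hashlib
import re

# Slack encodes user mentions in message text as <@USERID>, e.g. "<@U024BE7LH>".
# Assumption: only this plain form is handled, not variants with a "|label" suffix.
MENTION_RE = re.compile(r"<@([A-Z0-9]+)>")


def md5_token(value: str, length: int = 10) -> str:
    """Return a short, stable pseudonym: the first `length` hex chars of MD5(value)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:length]


def pseudonymize_mentions(text: str) -> str:
    """Replace every Slack user mention with a hashed placeholder token."""
    return MENTION_RE.sub(lambda m: f"<user_{md5_token(m.group(1))}>", text)


if __name__ == "__main__":
    msg = "<@U024BE7LH> can you review this before <@U0G9QF9C6> merges?"
    print(pseudonymize_mentions(msg))
```

Because user IDs are low-entropy, a plain MD5 digest can be reversed by hashing candidate IDs. A keyed hash such as `hmac.new(secret_key, value_bytes, hashlib.sha256)` would resist that better.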

desertigorr / docs-classifier

Educational NLP project

nlp sklearn text-extraction classification

Updated Aug 3, 2025
Python

tirth8205 / RAG_using_NLP

A local GPU-accelerated Retrieval-Augmented Generation (RAG) pipeline for PDF question-answering with multi-LLM support and modular NLP components. Process documents locally with privacy-focused information retrieval.

nlp text-extraction embeddings gemini openai question-answering chunking semantic-search grok document-retrieval faiss fastapi vector-search huggingface pdf-processing sentence-transformers llms rag-pipeline

Updated May 20, 2025
Python
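
Chunking is one of the components this pipeline lists. A common approach, sketched here under the assumption that chunk sizes are counted in whitespace-delimited words rather than model tokens, is a sliding window with overlap, so that text cut at one boundary still appears whole in the neighbouring chunk. `chunk_words` is a hypothetical helper, not taken from the repository.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    print(chunk_words(sample, chunk_size=4, overlap=1))
```

With `chunk_size=4` and `overlap=1`, the words `0` through `9` become `0 1 2 3`, `3 4 5 6` and `6 7 8 9`. Pipelines that feed an embedding model often measure chunks in that model's tokens instead, since the model's input limit is defined in tokens.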

w3labkr / py-image-toolkit

A fast and easy-to-use Python toolkit for image processing with CLI tools for resizing, cropping, OCR, and optimization, including batch processing support.

python cli opencv ocr image-processing text-extraction face-detection py image-cropping image-resizing paddleocr

Updated Jun 5, 2025
Python

terry-li-hm / prometheus

PDF Liberation MCP Server - Break large PDFs into digestible chunks for Claude

python prometheus text-extraction document-processing pymupdf pdf-splitter pdf-processing ai-tools mcp-server fastmcp claude-code

Updated Aug 30, 2025
Python

ajaywanekar / BizCardX_OCR

This web application utilizes OCR technology to recognize text in uploaded images and provides spelling correction and word performance improvement. Users can easily upload images containing text and receive accurate and enhanced text results.

python text-extraction sqlite3 easyocr

Updated Apr 19, 2023
Python

Pranav-Nagpure / Text-Extraction

Web application to extract text from images

python image-processing text-extraction flask-application tesseract-ocr html-css-javascript

Updated May 19, 2023
Python

fonckchain / pdf-text-converter

Python tool for converting PDF files to text. Simplify your document processing tasks.

python automation pdf-converter text-extraction document-processing

Updated Jul 14, 2023
Python

zanachka / price-parser

Extract price amount and currency symbol from a raw text string

text-extraction html-extraction

Updated Oct 6, 2025
Python
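
The task this entry describes can be approximated with a regular expression plus a heuristic for decimal separators. The sketch below is a simplified stand-in, not the library's algorithm: it recognises only a handful of currency symbols, treats a final `.` or `,` followed by one or two digits as the decimal mark, and uses hypothetical names (`extract_price`, `normalize_amount`).

```python
import re
from decimal import Decimal
from typing import Optional, Tuple

# Assumption: only these symbols are recognised; ISO codes like "USD" are ignored.
CURRENCY_SYMBOLS = "$€£¥₹"

# An optional leading symbol, a number made of digits and separators, an optional trailing symbol.
PRICE_RE = re.compile(
    rf"(?P<pre>[{CURRENCY_SYMBOLS}])?\s*(?P<num>\d[\d.,]*)\s*(?P<post>[{CURRENCY_SYMBOLS}])?"
)


def normalize_amount(raw: str) -> Decimal:
    """Turn '1,299.99' or '22,90' into a Decimal using a last-separator heuristic."""
    raw = raw.rstrip(".,")  # drop trailing punctuation such as a sentence-ending period
    sep = max(raw.rfind("."), raw.rfind(","))
    # A final separator followed by one or two digits is treated as the decimal mark.
    if sep != -1 and len(raw) - sep - 1 in (1, 2):
        integer = re.sub(r"[.,]", "", raw[:sep])
        return Decimal(f"{integer}.{raw[sep + 1:]}")
    # Otherwise every separator is assumed to group thousands.
    return Decimal(re.sub(r"[.,]", "", raw))


def extract_price(text: str) -> Tuple[Optional[Decimal], Optional[str]]:
    """Return (amount, currency symbol), preferring the first number that carries a symbol."""
    for match in PRICE_RE.finditer(text):
        currency = match.group("pre") or match.group("post")
        if currency:
            return normalize_amount(match.group("num")), currency
    # No symbol anywhere: fall back to the first bare number, if any.
    match = PRICE_RE.search(text)
    if match:
        return normalize_amount(match.group("num")), None
    return None, None


if __name__ == "__main__":
    for sample in ["Price: $1,299.99", "22,90 €", "Only 3 left, now €15"]:
        print(sample, "->", extract_price(sample))
```

The heuristic misreads ambiguous inputs: `1,000` is parsed as one thousand but `1,50` as 1.50, and currency written as a code rather than a symbol is not detected.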

Nath9666 / Lexo

OCR tool to extract and structure text from images and scanned PDFs (outputs .docx / .txt) — FR/EN

desktop-app multilingual python gui ocr document-conversion drag-and-drop logging image-processing text-extraction docx tesseract-ocr python-docx txt pdf2image pdf2text scanned-pdf

Updated May 29, 2025
Python

ParisaArbab / Data-Modeling

Retrieves data from two different websites, loads it into a PostgreSQL database using Python, and combines the two sources to derive and present new information

postgresql text-extraction constraints data-statictics extract-data data-conversion python-scrapy python-connector categorize-products join-query

Updated Dec 5, 2023
Python

cyberfantics / form_digitilization

A Python-based application for live video text extraction using the Gemini 1.5 Flash API, hand gesture detection, and UI display.

python opensource computer-vision pillow text-extraction hacktoberfest gemini-api hand-gesture-detection

Updated Dec 7, 2024
Python

comoysha / scrollunroll

Convert scrolling article videos into long images and extract text with OCR.

text-extraction stitcher video-ocr long-screenshot scrollunroll scroll-capture

Updated Sep 15, 2025
Python

karan3691 / dataset-builder

A complete Python pipeline that automates the creation of structured datasets from natural language search queries. This tool searches the web for content matching your query, scrapes and cleans the content, and outputs a structured dataset in multiple formats.

nlp search-engine machine-learning text-extraction web-scraping dataset-creation data-collection data-pipeline fastapi huggingface ai-tools

Updated Apr 4, 2025
Python

zanachka / python-readability

Fast Python port of arc90's Readability tool, updated to match the latest readability.js.

text-extraction html-extraction

Updated May 4, 2025
Python

ashithapallath / Whatsapp_Parser

A Flask-based web app integrated with Twilio that automatically receives resumes via WhatsApp, extracts candidate details (name, email, phone), and stores them in Google Sheets and Drive using NLP and regex-based text extraction.

python nlp flask dotenv text-extraction spacy google-sheets-api data-extraction twilio-api docx2txt google-drive-api resume-parsing pdfplumber regrex

Updated Oct 30, 2025
Python
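
The regex half of that extraction might look like the sketch below, which pulls the first email address and phone number out of plain resume text. The patterns are deliberately loose assumptions rather than the project's own, `extract_contacts` is a hypothetical name, and candidate names would still need an NER step (for example spaCy's `PERSON` entities).

```python
import re
from typing import Dict, Optional

# Loose email pattern: local part, "@", domain labels, and a top-level domain of 2+ letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Loose phone pattern (assumption): optional "+", then at least 10 characters of digits,
# spaces, dots, dashes or parentheses, starting and ending with a digit. Spaces only,
# not newlines, so numbers on different lines are not joined.
PHONE_RE = re.compile(r"\+?\d[\d ().-]{8,}\d")


def extract_contacts(text: str) -> Dict[str, Optional[str]]:
    """Return the first email and phone number found; the phone keeps only digits and '+'."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": re.sub(r"[^\d+]", "", phone.group(0)) if phone else None,
    }


if __name__ == "__main__":
    resume = "Jane Doe\nEmail: jane.doe@example.com\nPhone: +1 (555) 123-4567\n"
    print(extract_contacts(resume))
```

The phone pattern accepts any run of at least ten characters that starts and ends with a digit, so date ranges or ID numbers written with spaces can produce false positives.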

Samuelson777 / DeepDive-AI

DeepDive AI - PaperInsight is an innovative tool that enables users to upload AI research papers in PDF format, ask questions, and receive context-aware insights. Streamline your research process and unlock valuable information effortlessly!

css python application text-extraction api-integration llm-integration research-paper-insight-app

Updated Apr 27, 2025
Python

pate0304 / webpage-to-text

LlamaIndex-powered web content extractor for RAG applications

python machine-learning text-extraction web-scraping cli-tool rag llama-index

Updated Jul 9, 2025
Python

MyGovHub-Goodbye-World / document-ingestion-and-text-extraction

AI-powered document analysis service combining AWS Textract, Bedrock, and intelligent blur detection. Supports CLI and serverless Lambda API for Malaysian documents (licenses, receipts, ID cards, utility bills).

python automation ocr serverless text-extraction blur-detection document-processing aws-textract aws-bedrock malaysian-documents

Updated Oct 9, 2025
Python

m8r1x / typeformx

A simple Python script that fetches data from the Typeform API.

python text-extraction typeform typeform-api

Updated Sep 8, 2017
Python
