text-extraction

Here are 384 public repositories matching this topic...

akomarla / slack_msg_processing

Processing and hashing Slack communication to enable language modelling

nlp slack text-extraction hashing-algorithm md-5

Updated Jun 7, 2023
Python

desertigorr / docs-classifier

An educational NLP project

nlp sklearn text-extraction classification

Updated Aug 3, 2025
Python

tirth8205 / RAG_using_NLP

A local GPU-accelerated Retrieval-Augmented Generation (RAG) pipeline for PDF question-answering with multi-LLM support and modular NLP components. Process documents locally with privacy-focused information retrieval.

nlp text-extraction embeddings gemini openai question-answering chunking semantic-search grok document-retrieval faiss fastapi vector-search huggingface pdf-processing sentence-transformers llms rag-pipeline

Updated May 20, 2025
Python

w3labkr / py-image-toolkit

A fast and easy-to-use Python toolkit for image processing with CLI tools for resizing, cropping, OCR, and optimization, including batch processing support.

python cli opencv ocr image-processing text-extraction face-detection py image-cropping image-resizing paddleocr

Updated Jun 5, 2025
Python

terry-li-hm / prometheus

PDF Liberation MCP Server - Break large PDFs into digestible chunks for Claude

python prometheus text-extraction document-processing pymupdf pdf-splitter pdf-processing ai-tools mcp-server fastmcp claude-code

Updated Aug 30, 2025
Python

ajaywanekar / BizCardX_OCR

This web application uses OCR to recognize text in uploaded images, then applies spelling correction and word-level improvements. Users upload images containing text and receive the corrected, enhanced text.

python text-extraction sqlite3 easyocr

Updated Apr 19, 2023
Python

Pranav-Nagpure / Text-Extraction

Web application that extracts text from images

python image-processing text-extraction flask-application tesseract-ocr html-css-javascript

Updated May 19, 2023
Python

adamrangwala / DirCity_Directory_Crop-out-with-Key-Lines

Turn old city directory scans into searchable data. An automated pipeline handles column detection, OCR processing, and accuracy evaluation for historical document digitization.

python opencv ocr computer-vision image-processing tesseract text-extraction genealogy digital-humanities minneapolis document-processing historical-documents city-directories

Updated Jun 28, 2025
Python

bytesnack114 / subtexty

Extract clean plain text from subtitle files.

utility text-extraction subtitle

Updated Jul 13, 2025
TypeScript

louisho5 / fastsnip

FastSnip - Free OCR screen capture tool for Windows. Extract text from anywhere on your screen with Ctrl+Shift+T. Perfect TextSniper alternative with multi-language support.

electron windows productivity open-source ocr tesseract text-extraction screen-capture free optical-character-recognition textsniper

Updated Aug 24, 2025
JavaScript

Aashish-1008 / ml-trainer

Package for ML training on GCP

python machine-learning deep-learning text-classification tensorflow text-extraction neural-networks image-classification google-cloud-ml-engine

Updated Apr 3, 2019
Python

fonckchain / pdf-text-converter

Python tool for converting PDF files to text. Simplify your document processing tasks.

python automation pdf-converter text-extraction document-processing

Updated Jul 14, 2023
Python

copyleftdev / wt

A command-line tool in Go that extracts meaningful text from web pages, filters out unwanted elements, and outputs clean text for easy integration with AI applications, data mining, and web scraping.

golang natural-language-processing data-mining text-extraction web-scraping data-collection command-line-tool text-processing seo-tools ai-integration

Updated Sep 15, 2024
Go

xsukax / xsukax-ReadClean-PDF

A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript—no backend required, no tracking, complete privacy.

Updated Oct 5, 2025
HTML

alokkumary2j / RapidAutomaticKPExtractionRAKE

This repository contains my experiments with RAKE and its variants. RAKE is one of the most popular unsupervised approaches for automatically extracting key phrases and keywords from unstructured data sources such as reviews, news, articles, and documents.

python natural-language-processing rake text-extraction unsupervised-learning

Updated Jun 2, 2016
Jupyter Notebook

MansurPro / DocuParse

DocuParse is a high-performance tool for converting PDF documents into clean, structured Markdown files. Designed for speed and accuracy, it extracts and formats content while minimizing errors like hallucinations and repetitions.

text-extraction tesseract-ocr pdf-parsing digital-archive markdown-conversion document-layout-analysis google-colab huggingface-transformers pdf-to-markdown

Updated Aug 10, 2025
Python

ceodaniyal / Image_To_Text_GPT-4o-mini

This repository contains a Python script to extract text from images using OpenAI's GPT-4 API. The script supports text extraction from both online image URLs and locally stored images (converted to base64). It aims for accurate, structured text extraction, making it useful for OCR-like tasks. The extracted text is saved to a file.

python ocr base64 image-processing text-analysis text-extraction openai image-to-text api-integration gpt-4 image-ocr gpt-4o gpt-4o-mini

Updated Dec 14, 2024
Python

rachhek / pdf-search-assistant

This assistant tool (WIP) helps you search, browse, and summarize answers to your questions from an uploaded PDF using text analytics, semantic search, and a large language model (LLM).

search nlp natural-language-processing text-extraction embeddings semantic-search open-ai llm

Updated Jul 30, 2023
Bicep

Banner-19 / Extraction-and-Analysis-of-Text

The objective is to analyze text content from a list of URLs. This involves extracting article titles and text, then performing natural language processing to generate metrics like sentiment, readability, and word usage. Finally, the results are stored for further analysis or visualization.

nlp data-science text-analysis python3 text-extraction nltk data-analytics data-analysis

Updated Apr 11, 2024
Jupyter Notebook

Mrigank005 / OCR

This Python script automates the extraction of text from images using Tesseract OCR. It processes all images in the test_images/ folder and saves the extracted text as .txt files in the extracted_texts/ directory, maintaining the original image filenames.

python automation ocr image-processing text-extraction multi-language image-to-text batch-processing python-project

Updated May 2, 2025
Python
