text-extraction

Here are 384 public repositories matching this topic...

akomarla / slack_msg_processing

Processing and hashing Slack communication to enable language modelling

nlp slack text-extraction hashing-algorithm md-5

Updated Jun 7, 2023
Python

w3labkr / py-image-toolkit

A fast and easy-to-use Python toolkit for image processing with CLI tools for resizing, cropping, OCR, and optimization, including batch processing support.

python cli opencv ocr image-processing text-extraction face-detection py image-cropping image-resizing paddleocr

Updated Jun 5, 2025
Python

tirth8205 / RAG_using_NLP

A local GPU-accelerated Retrieval-Augmented Generation (RAG) pipeline for PDF question-answering with multi-LLM support and modular NLP components. Process documents locally with privacy-focused information retrieval.

nlp text-extraction embeddings gemini openai question-answering chunking semantic-search grok document-retrieval faiss fastapi vector-search huggingface pdf-processing sentence-transformers llms rag-pipeline

Updated May 20, 2025
Python

terry-li-hm / prometheus

PDF Liberation MCP Server - Break large PDFs into digestible chunks for Claude

python prometheus text-extraction document-processing pymupdf pdf-splitter pdf-processing ai-tools mcp-server fastmcp claude-code

Updated Aug 30, 2025
Python

desertigorr / docs-classifier

Educational project on NLP

nlp sklearn text-extraction classification

Updated Aug 3, 2025
Python

Pranav-Nagpure / Text-Extraction

Web application to extract text from images

python image-processing text-extraction flask-application tesseract-ocr html-css-javascript

Updated May 19, 2023
Python

ajaywanekar / BizCardX_OCR

This web application uses OCR to recognize text in uploaded images, then applies spelling correction and word-level improvements. Users upload images containing text and receive corrected, enhanced text results.

python text-extraction sqlite3 easyocr

Updated Apr 19, 2023
Python

louisho5 / fastsnip

FastSnip - Free OCR screen capture tool for Windows. Extract text from anywhere on your screen with Ctrl+Shift+T. Perfect TextSniper alternative with multi-language support.

electron windows productivity open-source ocr tesseract text-extraction screen-capture free optical-character-recognition textsniper

Updated Aug 24, 2025
JavaScript

rachhek / pdf-search-assistant

This assistant tool (WIP) helps you search, browse, and summarize answers to your questions from an uploaded PDF, using advanced text analytics, semantic search, and a Large Language Model (LLM).

search nlp natural-language-processing text-extraction embeddings semantic-search open-ai llm

Updated Jul 30, 2023
Bicep

Banner-19 / Extraction-and-Analysis-of-Text

The objective is to analyze text content from a list of URLs. This involves extracting article titles and text, then performing natural language processing to generate metrics like sentiment, readability, and word usage. Finally, the results are stored for further analysis or visualization.

nlp data-science text-analysis python3 text-extraction nltk data-analytics data-analysis

Updated Apr 11, 2024
Jupyter Notebook

zanachka / price-parser

Extract price amount and currency symbol from a raw text string

text-extraction html-extraction

Updated Oct 6, 2025
Python

Nath9666 / Lexo

OCR tool to extract and structure text from images and scanned PDFs (outputs .docx / .txt) — FR/EN

desktop-app multilingual python gui ocr document-conversion drag-and-drop logging image-processing text-extraction docx tesseract-ocr python-docx txt pdf2image pdf2text scanned-pdf

Updated May 29, 2025
Python

fonckchain / pdf-text-converter

Python tool for converting PDF files to text. Simplify your document processing tasks.

python automation pdf-converter text-extraction document-processing

Updated Jul 14, 2023
Python

xsukax / xsukax-ReadClean-PDF

A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript—no backend required, no tracking, complete privacy.

Updated Oct 5, 2025
HTML

eitanflor / ShellHacks-2020

A Cloud-Native Infrastructure for License Plate Recognition and Text Extraction with Python Integration

python java machine-learning javafx text-extraction artificial-intelligence sqlserver google-cloud-platform cloud-sql cloudvision license-plate-recognition

Updated Oct 26, 2020
Java

Aytijha / CNN-Text-Summarization

Implementing text summarization techniques on the 'CNN DailyMail' dataset, using both extractive and abstractive strategies.

python nlp natural-language-processing text-extraction abstraction bart text-summarization seq2seq genism

Updated Sep 18, 2022
Jupyter Notebook

cyberfantics / form_digitilization

A Python-based application for live video text extraction using the Gemini 1.5 Flash API, hand gesture detection, and UI display.

python opensource computer-vision pillow text-extraction hacktoberfest gemini-api hand-gesture-detection

Updated Dec 7, 2024
Python

comoysha / scrollunroll

Convert scrolling article videos into long images and extract text with OCR.

text-extraction stitcher video-ocr long-screenshot scrollunroll scroll-capture

Updated Sep 15, 2025
Python

karan3691 / dataset-builder

A complete Python pipeline that automates the creation of structured datasets from natural language search queries. This tool searches the web for content matching your query, scrapes and cleans the content, and outputs a structured dataset in multiple formats.

nlp search-engine machine-learning text-extraction web-scraping dataset-creation data-collection data-pipeline fastapi huggingface ai-tools

Updated Apr 4, 2025
Python

ParisaArbab / Data-Modeling

Retrieves data from two different websites, loads it into a PostgreSQL database using Python, and combines the two sources to derive and present new information.

postgresql text-extraction constraints data-statictics extract-data data-conversion python-scrapy python-connector categorize-products join-query

Updated Dec 5, 2023
Python
