Skip to content
#

text-extraction

Here are 384 public repositories matching this topic...

A local GPU-accelerated Retrieval-Augmented Generation (RAG) pipeline for PDF question-answering with multi-LLM support and modular NLP components. Process documents locally with privacy-focused information retrieval.

  • Updated May 20, 2025
  • Python
xsukax-ReadClean-PDF

A privacy-focused, client-side web application that extracts clean, readable content from any webpage and converts it to PDF format. Built with pure HTML, CSS, and JavaScript—no backend required, no tracking, complete privacy.

  • Updated Oct 5, 2025
  • HTML

This repository contains a Python script to extract text from images using OpenAI's GPT-4 API. The script supports text extraction from both online image URLs and locally stored images (converted to base64). It ensures accurate and structured text extraction, making it a powerful tool for OCR-like tasks. The extracted text is saved to a file

  • Updated Dec 14, 2024
  • Python

The objective is to analyze text content from a list of URLs. This involves extracting article titles and text, then performing natural language processing to generate metrics like sentiment, readability, and word usage. Finally, the results are stored for further analysis or visualization.

  • Updated Apr 11, 2024
  • Jupyter Notebook

Improve this page

Add a description, image, and links to the text-extraction topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the text-extraction topic, visit your repo's landing page and select "manage topics."

Learn more