extract-text

Here are 58 public repositories matching this topic...

A Python tool for extracting highlighted text from PDF files while preserving formatting attributes (headers, bold, italic) and removing unwanted line breaks and page breaks. Perfect for integrating with content management systems.

  • Updated Nov 13, 2025
  • Python

Apache Tika: extract text and metadata from any document format with this pre-built, containerised solution. Kubernetes-ready deployment with an intuitive UI, an API, and text-to-speech capabilities; well suited to content indexing, analysis, and document-processing workflows.

  • Updated Nov 3, 2025
  • JavaScript

pdfRest API Toolkit is a REST API service for processing PDF documents, made by developers, for developers. Rapidly integrate PDF workflows with your existing projects and applications, simply and seamlessly. Get started for free in seconds.

  • Updated Oct 23, 2025
  • C#