Golang module for extracting text from XML-based MS Office documents
Updated Jan 29, 2023 · Go
A command-line tool in Go that extracts meaningful text from web pages, filters out unwanted elements, and outputs clean text for easy integration with AI applications, data mining, and web scraping.
Highlight/colourise command output, log files (and anything else, really) based on regex pattern matching.
Extract text from plaintext, .docx, .odt and .rtf files. Pure Go.
[UNMAINTAINED] Extract values from strings and fill your structs with NLP.
This repository has moved! https://github.com/unidoc/unipdf
Golang PDF library for creating and processing PDF files (pure Go)