Arky is a technologist and visual storyteller who helps non-profits use ICT.
Stars: 7 stars written in Java
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
A language detection library implemented in plain Java (also known as language identification or language guessing).
A cross-platform command line tool for parallelised content extraction and analysis.