This project performs basic text processing. It can download web pages, clean the text and compute TF-IDF.
The project consists of six modules, grouped by functionality:
- lib-api-classes - contains shared interfaces and classes
- lib-calculator - contains classes that provide some computations
- lib-configuration - contains classes for reading configuration and connecting to the database, plus some helper classes
- app-text-preparation - module that cleans and prepares text
- app-tf-idf-processor - module that computes TF-IDF for each word of each downloaded text
- app-web-downloader - module that downloads web pages
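To illustrate what "TF-IDF for each word of each text" means, here is a minimal, self-contained sketch. It is not the code of app-tf-idf-processor: the class name TfIdfSketch, the methods tokenize and tfIdf, and the choice of formula (term frequency divided by document length, multiplied by the natural logarithm of N over the document frequency) are assumptions made for this example only. The tokenizer is also just a stand-in for the real text cleaning step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical example class, not part of the project modules.
public class TfIdfSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns one map per document: word -> TF-IDF score.
    // Assumed formula: tf = count / wordsInDocument, idf = ln(N / documentsContainingWord).
    static List<Map<String, Double>> tfIdf(List<String> documents) {
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : documents) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            // Count each word once per document.
            for (String word : new HashSet<>(tokens)) {
                docFreq.merge(word, 1, Integer::sum);
            }
        }

        int n = documents.size();
        List<Map<String, Double>> result = new ArrayList<>();
        for (List<String> tokens : tokenized) {
            Map<String, Integer> counts = new HashMap<>();
            for (String word : tokens) {
                counts.merge(word, 1, Integer::sum);
            }
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double tf = (double) e.getValue() / tokens.size();
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                scores.put(e.getKey(), tf * idf);
            }
            result.add(scores);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("The cat sat.", "The dog barked!");
        System.out.println(tfIdf(docs));
    }
}
```

With this formula a word that occurs in every text (here "the") gets a score of zero, while words unique to one text score highest.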
This project is provided under the Apache License 2.0.
If you have questions about Text Processing, suggestions for improvements, or any other feedback, you can contact me at projects@yss.sk.
Java, Text Processing, simple, basic text processing, examples, clean text, TF-IDF, stemmer, downloader