janissl/stemmer

stemmer

A utility that converts every natural-language plaintext document in a user-defined directory into a document containing only word stems.


Usage

java -cp "*" com.github.janissl.DirectoryStemmer ${source_directory} ${destination_directory}

The plaintext files must be UTF-8 encoded and named according to the pattern ${title}_${language}.snt, where ${language} is the ISO 639-1 code of the language the file is written in (for example, report_en.snt for an English document).
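
The naming rule above can be checked before running the stemmer. The following is a minimal sketch, not code from this repository: the class name SntFileName and its parse method are hypothetical. It assumes that the language code is lowercase and is the part after the last underscore, and it uses the JDK's own list of two-letter language codes (Locale.getISOLanguages()) as the set of valid values. The stemmer itself may accept a different set of languages.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the stemmer: validates ${title}_${language}.snt names.
class SntFileName {
    // Assumption: the language code is lowercase and follows the LAST underscore,
    // so a title may itself contain underscores.
    private static final Pattern SNT = Pattern.compile("^(.+)_([a-z]{2})\\.snt$");

    // Two-letter language codes known to the JDK.
    private static final Set<String> LANGUAGES =
            new HashSet<>(Arrays.asList(Locale.getISOLanguages()));

    /** Returns {title, language} if the name matches the pattern, otherwise empty. */
    static Optional<String[]> parse(String fileName) {
        Matcher m = SNT.matcher(fileName);
        if (!m.matches() || !LANGUAGES.contains(m.group(2))) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group(1), m.group(2) });
    }

    public static void main(String[] args) {
        // Report, for each file name given on the command line, whether it is usable.
        for (String name : args) {
            Optional<String[]> parts = parse(name);
            if (parts.isPresent()) {
                System.out.println(name + ": title=" + parts.get()[0]
                        + ", language=" + parts.get()[1]);
            } else {
                System.out.println(name + ": does not match ${title}_${language}.snt");
            }
        }
    }
}
```

Running the class with a few file names as arguments prints, for each one, either the extracted title and language or a note that the name does not match the pattern.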

About

A simple stemmer that uses the analysis package from Apache Lucene
