Open
Description
From #78 (comment) by @joshlk
I think it would be sensible to identify languages throughout the package using ISO 639-1 two-letter codes (e.g. en, fr, de ...).
In particular, we should implement this for the Snowball stemmer in Python, which currently uses the full language names.
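To make the idea concrete, here is a minimal sketch of a lookup from two-letter codes to the full names that Snowball stemmers conventionally use. The function name `snowball_name_from_iso` is hypothetical and not part of the package, and the table is deliberately partial.

```rust
/// Hypothetical helper (not part of the package): map an ISO 639-1
/// two-letter code to the full language name used by Snowball stemmers.
/// Only a handful of languages are listed for illustration.
pub fn snowball_name_from_iso(code: &str) -> Option<&'static str> {
    // Accept codes case-insensitively, so "EN" and "en" are equivalent.
    match code.to_ascii_lowercase().as_str() {
        "en" => Some("english"),
        "fr" => Some("french"),
        "de" => Some("german"),
        "es" => Some("spanish"),
        "it" => Some("italian"),
        "nl" => Some("dutch"),
        "pt" => Some("portuguese"),
        "ru" => Some("russian"),
        "sv" => Some("swedish"),
        // Unknown or unsupported code: let the caller decide how to fail.
        _ => None,
    }
}
```

A table like this would let the public API accept short codes while the stemmer backend keeps working with full names.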
I am also wondering whether, in Rust, we should use a String for the language parameter or define an Enum, e.g.
use vtext::lang;
let stemmer = SnowballStemmerParams::default().lang(lang::en).build();
The latter is probably simpler, but it makes the package a bit harder to extend: if someone designs a custom estimator for a language not in the list (e.g. some ancient, infrequently used language), they would have to create a new enum.
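One possible middle ground, sketched below under assumptions, is an enum with a catch-all variant that carries an arbitrary code. The type `Lang`, its variants and the `code` method are all hypothetical and are not the package's actual API. The variants use Rust's CamelCase convention rather than the lowercase `lang::en` spelling above.

```rust
use std::convert::Infallible;
use std::str::FromStr;

/// Hypothetical language identifier (not the package's actual API).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Lang {
    En,
    Fr,
    De,
    /// Escape hatch for languages missing from the built-in list,
    /// so custom estimators do not need to define their own enum.
    Other(String),
}

impl Lang {
    /// Return the short code for this language.
    pub fn code(&self) -> &str {
        match self {
            Lang::En => "en",
            Lang::Fr => "fr",
            Lang::De => "de",
            Lang::Other(code) => code.as_str(),
        }
    }
}

impl FromStr for Lang {
    // Parsing never fails: unknown codes fall through to `Other`.
    type Err = Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s.to_ascii_lowercase().as_str() {
            "en" => Lang::En,
            "fr" => Lang::Fr,
            "de" => Lang::De,
            other => Lang::Other(other.to_string()),
        })
    }
}
```

With this shape, `"fr".parse::<Lang>()` yields `Lang::Fr`, while an unlisted code such as `"akk"` becomes `Lang::Other("akk".to_string())`. The trade-off is that `Other` gives up the compile-time check that a plain enum provides.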
Also, just to be consistent, the parameter name would be "lang", not "language", right?