Lemmatizer
The Lemmatizer class supports simple part-of-speech-sensitive suffix rules and
lookup tables.
Lemmatizer.__init__ (method)
Initialize a Lemmatizer. Typically, this happens under the hood within spaCy
when a Language subclass and its Vocab are initialized.
| Name | Type | Description |
|---|---|---|
| lookups (new in v2.2) | Lookups | The lookups object containing the (optional) tables "lemma_rules", "lemma_index", "lemma_exc" and "lemma_lookup". |
| RETURNS | Lemmatizer | The newly created object. |
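Because all four tables are optional, a lemmatizer can only apply the strategies whose data is present. The sketch below illustrates that idea with a plain dict standing in for the Lookups object. The per-table layouts shown (rules, index and exceptions keyed by a lowercase POS name; the lookup table as a flat mapping) are assumptions for illustration, and supports_rules and supports_lookup are hypothetical helpers, not part of spaCy's API.

```python
# Hypothetical stand-in for a Lookups object: a plain dict of named tables.
# The layouts below are illustrative assumptions, not spaCy's data files.
lookups = {
    # Suffix rules per POS: (old_suffix, new_suffix) pairs.
    "lemma_rules": {"noun": [["ies", "y"], ["s", ""]], "verb": [["ing", ""], ["ed", ""]]},
    # Known base forms per POS, used to prefer in-vocabulary rule outputs.
    "lemma_index": {"noun": ["cat", "city"], "verb": ["walk", "jump"]},
    # Irregular forms per POS: surface form -> list of lemmas.
    "lemma_exc": {"noun": {"mice": ["mouse"]}, "verb": {"went": ["go"]}},
    # POS-independent surface form -> lemma mapping.
    "lemma_lookup": {"mice": "mouse", "went": "go", "cats": "cat"},
}

# Tables needed for POS-sensitive suffix-rule lemmatization.
RULE_TABLES = ("lemma_rules", "lemma_index", "lemma_exc")


def supports_rules(tables):
    """True if every table needed for rule-based lemmatization is present."""
    return all(name in tables for name in RULE_TABLES)


def supports_lookup(tables):
    """True if a plain lookup table is present."""
    return "lemma_lookup" in tables


print(supports_rules(lookups))                  # True
print(supports_rules({"lemma_lookup": {}}))     # False
print(supports_lookup({"lemma_lookup": {}}))    # True
```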
Lemmatizer.__call__ (method)
Lemmatize a string.
| Name | Type | Description |
|---|---|---|
| string | unicode | The string to lemmatize, e.g. the token text. |
| univ_pos | unicode / int | The token’s universal part-of-speech tag. |
| morphology | dict / None | Morphological features following the Universal Dependencies scheme. |
| RETURNS | list | The available lemmas for the string. |
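Part-of-speech-sensitive suffix rules can be sketched in plain Python as follows. This is a simplified illustration of the general technique, not spaCy's implementation: rule_lemmatize is a hypothetical name, univ_pos is assumed to be a POS name string (the documented method also accepts an int), morphology is ignored, and the table layout (keyed by lowercase POS name) is an assumption. Rule outputs found in the index are preferred, exceptions are placed first, and the original string is returned if nothing else applies.

```python
def rule_lemmatize(string, univ_pos, tables):
    """Return candidate lemmas for `string` using per-POS suffix rules.

    Simplified sketch: `tables` may contain "lemma_rules", "lemma_index"
    and "lemma_exc", each keyed by a lowercase POS name (assumed layout).
    """
    pos = univ_pos.lower()
    rules = tables.get("lemma_rules", {}).get(pos, [])
    index = set(tables.get("lemma_index", {}).get(pos, []))
    exceptions = tables.get("lemma_exc", {}).get(pos, {})
    lower = string.lower()
    forms, oov_forms = [], []
    for old, new in rules:
        if lower.endswith(old):
            form = lower[: len(lower) - len(old)] + new
            if not form:
                continue  # the rule consumed the whole word; skip it
            if form in index or not form.isalpha():
                forms.append(form)  # known base form (or non-alphabetic)
            else:
                oov_forms.append(form)  # plausible but unknown form
    forms = list(dict.fromkeys(forms))  # drop duplicates, keep rule order
    # Irregular forms take priority, so put them at the front.
    for form in exceptions.get(lower, []):
        if form not in forms:
            forms.insert(0, form)
    if not forms:
        forms.extend(oov_forms)  # fall back to unknown rule outputs
    if not forms:
        forms.append(string)  # finally, fall back to the original string
    return forms


tables = {
    "lemma_rules": {"noun": [["ies", "y"], ["s", ""]], "verb": [["ing", ""], ["ed", ""]]},
    "lemma_index": {"noun": ["city", "cat"], "verb": ["walk"]},
    "lemma_exc": {"noun": {"mice": ["mouse"]}},
}
print(rule_lemmatize("cities", "NOUN", tables))  # ['city']
print(rule_lemmatize("mice", "NOUN", tables))    # ['mouse']
print(rule_lemmatize("jumped", "VERB", tables))  # ['jump']
```

Keeping unknown rule outputs as a second tier means a word missing from the index still gets a plausible lemma instead of being returned unchanged.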
Lemmatizer.lookup (method, new in v2.0)
Look up a lemma in the lookup table, if available. If no lemma is found, the
original string is returned. Languages can provide a lookup table via the
Lookups object.
| Name | Type | Description |
|---|---|---|
| string | unicode | The string to look up. |
| orth | int | Optional hash of the string to look up. If not set, the string will be used and hashed. Defaults to None. |
| RETURNS | unicode | The lemma if the string was found, otherwise the original string. |
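The fallback behavior and the optional orth argument can be illustrated with a table keyed by string hashes. In this sketch, string_hash is a hypothetical stand-in built on hashlib.blake2b (spaCy uses its own hash function, so the values differ), and build_lookup_table and lookup are illustrative names, not spaCy's API. Passing a precomputed hash skips rehashing the string.

```python
import hashlib


def string_hash(string):
    """Hypothetical 64-bit string hash (not the hash spaCy uses)."""
    digest = hashlib.blake2b(string.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def build_lookup_table(mapping):
    """Key a surface-form -> lemma mapping by string hash."""
    return {string_hash(form): lemma for form, lemma in mapping.items()}


def lookup(table, string, orth=None):
    """Return the lemma for `string`, or `string` itself if it is not found.

    If `orth` is given, it is used as the key instead of hashing `string`.
    """
    key = orth if orth is not None else string_hash(string)
    return table.get(key, string)


table = build_lookup_table({"went": "go", "mice": "mouse"})
print(lookup(table, "went"))  # go
print(lookup(table, "dogs"))  # dogs (not in the table, returned unchanged)
```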
Lemmatizer.is_base_form (method)
Check whether we’re dealing with an uninflected paradigm, so we can avoid lemmatization entirely.
| Name | Type | Description |
|---|---|---|
| univ_pos | unicode / int | The token’s universal part-of-speech tag. |
| morphology | dict | The token’s morphological features. |
| RETURNS | bool | Whether the token’s part-of-speech tag and morphological features describe a base form. |
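A base-form check lets the lemmatizer skip rule application for tokens that are already uninflected. The heuristic below is an illustrative, English-oriented sketch, not spaCy's exact logic: it assumes univ_pos is a POS name string and that morphology uses Universal Dependencies feature values such as Number=Sing, VerbForm=Inf and Degree=Pos.

```python
def is_base_form(univ_pos, morphology=None):
    """Illustrative heuristic: does this POS + morphology describe a base form?

    Assumes UD-style feature names and values; not spaCy's exact rules.
    """
    morphology = morphology or {}
    pos = univ_pos.lower()
    if pos == "noun" and morphology.get("Number") == "Sing":
        return True  # singular noun, e.g. "cat"
    if pos == "verb" and morphology.get("VerbForm") == "Inf":
        return True  # infinitive verb, e.g. "walk"
    if pos == "adj" and morphology.get("Degree") == "Pos":
        return True  # positive (non-comparative) adjective, e.g. "big"
    return False


print(is_base_form("NOUN", {"Number": "Sing"}))  # True
print(is_base_form("NOUN", {"Number": "Plur"}))  # False
```

A caller could run this check first and return the lowercased string directly when it is True, avoiding any suffix-rule work.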
Attributes
| Name | Type | Description |
|---|---|---|
| lookups (new in v2.2) | Lookups | The lookups object containing the rules and data, if available. |