Lemmatizer

Class that assigns the base forms of words.

The Lemmatizer supports simple part-of-speech-sensitive suffix rules and lookup tables.

Lemmatizer.__init__ method

Initialize a Lemmatizer. Typically, this happens under the hood within spaCy when a Language subclass and its Vocab are initialized.

Name | Type | Description
lookups (v2.2) | Lookups | The lookups object containing the (optional) tables "lemma_rules", "lemma_index", "lemma_exc" and "lemma_lookup".
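
To make the four table names more concrete, here is a minimal sketch that models them as plain Python dictionaries. The nesting (rules, index and exceptions keyed by lowercase part-of-speech, the lookup table keyed by surface form) is an assumption made for illustration, and `available_tables` is a hypothetical helper, not part of spaCy.

```python
# Hypothetical stand-in for the four optional tables, modelled as plain dicts.
# The exact nesting is an assumption made for this illustration.
tables = {
    # Suffix rewrite rules per part-of-speech: [old_suffix, new_suffix].
    "lemma_rules": {"noun": [["s", ""], ["ies", "y"]]},
    # Known base forms per part-of-speech.
    "lemma_index": {"noun": ["duck", "pony"]},
    # Irregular forms per part-of-speech.
    "lemma_exc": {"noun": {"geese": ["goose"]}},
    # Flat fallback table mapping a surface form to a single lemma.
    "lemma_lookup": {"ducks": "duck", "geese": "goose"},
}


def available_tables(tables):
    """Return the names of the expected tables that are present and non-empty."""
    expected = ("lemma_rules", "lemma_index", "lemma_exc", "lemma_lookup")
    return [name for name in expected if tables.get(name)]


print(available_tables(tables))
```

Because every table is optional, a language that only ships a lookup table would report just "lemma_lookup" here.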

Lemmatizer.__call__ method

Lemmatize a string.

Name | Type | Description
string | unicode | The string to lemmatize, e.g. the token text.
univ_pos | unicode / int | The token’s universal part-of-speech tag.
morphology | dict / None | Morphological features following the Universal Dependencies scheme.
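
The following is a simplified sketch of how part-of-speech-sensitive suffix rules can produce lemmas. It is not spaCy's implementation: the function `lemmatize`, the `TABLES` dictionary and its layout are assumptions for illustration, and the sketch only handles string part-of-speech tags and ignores morphology.

```python
# Hypothetical tables; the layout is an assumption made for this sketch.
TABLES = {
    "lemma_rules": {"noun": [["s", ""], ["ies", "y"]]},
    "lemma_index": {"noun": ["duck", "pony"]},
    "lemma_exc": {"noun": {"geese": ["goose"]}},
}


def lemmatize(string, univ_pos, tables):
    """Return candidate lemmas for `string`, given a part-of-speech tag."""
    pos = univ_pos.lower()
    index = set(tables.get("lemma_index", {}).get(pos, []))
    exceptions = tables.get("lemma_exc", {}).get(pos, {})
    rules = tables.get("lemma_rules", {}).get(pos, [])
    string = string.lower()

    forms = []      # candidates confirmed by the index of known base forms
    oov_forms = []  # candidates produced by a rule but not found in the index
    for old, new in rules:
        if not string.endswith(old):
            continue
        form = string[: len(string) - len(old)] + new
        if not form:
            continue
        target = forms if form in index else oov_forms
        if form not in target:
            target.append(form)

    # Irregular forms from the exceptions table come first.
    result = list(exceptions.get(string, []))
    result += [form for form in forms if form not in result]
    # Fall back to unconfirmed candidates, then to the original string.
    return result or oov_forms or [string]


print(lemmatize("ducks", "NOUN", TABLES))
```

The sketch returns a list because several rules may match the same string; a caller that needs a single lemma would typically take the first entry.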

Lemmatizer.lookup method (v2.0)

Look up a lemma in the lookup table, if available. If no lemma is found, the original string is returned. Languages can provide a lookup table via the Lookups object.

Name | Type | Description
string | unicode | The string to look up.
orth | int | Optional hash of the string to look up. If not set, the string will be used and hashed. Defaults to None.
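
The fallback behaviour and the optional `orth` argument can be sketched with a dictionary keyed by string hashes. This is an illustration only: `hash_string` is a hypothetical stand-in built on `hashlib` (spaCy uses its own string hashing), and `lookup` here is a free function rather than the library method.

```python
import hashlib


def hash_string(string):
    """Hypothetical stable 64-bit hash, used here only as a stand-in."""
    digest = hashlib.sha256(string.encode("utf8")).digest()
    return int.from_bytes(digest[:8], "little")


def lookup(table, string, orth=None):
    """Return the lemma stored for `string`, or `string` itself if absent."""
    # Use the precomputed hash if given; otherwise hash the string.
    key = orth if orth is not None else hash_string(string)
    return table.get(key, string)


# Hypothetical lookup table keyed by the hash of each surface form.
table = {hash_string("ducks"): "duck"}

print(lookup(table, "ducks"))
print(lookup(table, "cat"))
```

Passing a precomputed hash avoids hashing the same string again when the caller already has it at hand.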

Lemmatizer.is_base_form method

Check whether we’re dealing with an uninflected paradigm, so we can avoid lemmatization entirely.

Name | Type | Description
univ_pos | unicode / int | The token’s universal part-of-speech tag.
morphology | dict | The token’s morphological features.
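
As a rough illustration of such a check, the sketch below treats a few feature combinations as already uninflected. The specific conditions (singular nouns, infinitive verbs, positive-degree adjectives) are an assumption loosely modelled on English and Universal Dependencies feature names; they are not taken from this page, and the sketch only accepts string part-of-speech tags.

```python
def is_base_form(univ_pos, morphology=None):
    """Return True if the features suggest the word is already a base form."""
    morphology = morphology or {}
    pos = univ_pos.lower()
    # Assumed heuristics, loosely modelled on English.
    if pos == "noun" and morphology.get("Number") == "Sing":
        return True
    if pos == "verb" and morphology.get("VerbForm") == "Inf":
        return True
    if pos == "adj" and morphology.get("Degree") == "Pos":
        return True
    return False


print(is_base_form("NOUN", {"Number": "Sing"}))
print(is_base_form("NOUN", {"Number": "Plur"}))
```

When a check like this returns True, the caller can return the lowercased string directly and skip the rule and table lookups.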

Attributes

Name | Type | Description
lookups (v2.2) | Lookups | The lookups object containing the rules and data, if available.