Containers

Lexeme class

An entry in the vocabulary.

A Lexeme has no string context – it’s a word type, as opposed to a word token. It therefore has no part-of-speech tag, dependency parse, or lemma (if lemmatization depends on the part-of-speech tag).
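Here is a minimal pure-Python sketch of the type/token distinction. `Entry` and `MiniVocab` are hypothetical stand-ins, not classes from the library, and whitespace splitting stands in for a real tokenizer. Every occurrence of a word in running text is a separate token, but all occurrences of the same string resolve to one shared, context-free vocabulary entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical context-free vocabulary entry: it stores only the string."""
    text: str


class MiniVocab:
    """Hypothetical vocabulary holding one shared Entry per distinct string."""

    def __init__(self):
        self._entries = {}

    def __getitem__(self, text):
        # Create the entry on first lookup, then always return the same object.
        if text not in self._entries:
            self._entries[text] = Entry(text)
        return self._entries[text]

    def __len__(self):
        return len(self._entries)


vocab = MiniVocab()
tokens = "the cat saw the dog".split()  # naive split standing in for a tokenizer
entries = [vocab[t] for t in tokens]
print(len(tokens), len(vocab))   # 5 4  (five tokens, four types)
print(entries[0] is entries[3])  # True: both "the" tokens share one entry
```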

Lexeme.__init__ method

Create a Lexeme object.

Name | Type | Description
vocab | Vocab | The parent vocabulary.
orth | int | The orth ID of the lexeme.
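The constructor takes an integer orth ID rather than a string, so the entry can recover its text through the parent vocabulary. The sketch below illustrates that round trip with hypothetical `ToyStringStore`, `ToyVocab` and `ToyLexeme` classes. It assigns sequential integer IDs for readability, which is an assumption of the sketch and not necessarily how the library assigns orth IDs.

```python
class ToyStringStore:
    """Hypothetical two-way map between strings and integer IDs."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, text):
        # Sequential IDs are an assumption of this sketch.
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def __getitem__(self, orth):
        return self._strings[orth]


class ToyVocab:
    """Hypothetical vocabulary that only owns a string store."""

    def __init__(self):
        self.strings = ToyStringStore()


class ToyLexeme:
    """Stores only (vocab, orth); the text is looked up on demand."""

    def __init__(self, vocab, orth):
        self.vocab = vocab
        self.orth = orth

    @property
    def text(self):
        return self.vocab.strings[self.orth]


vocab = ToyVocab()
orth = vocab.strings.add("apple")
lexeme = ToyLexeme(vocab, orth)
print(lexeme.orth, lexeme.text)  # 0 apple
```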

Lexeme.set_flag method

Change the value of a boolean flag.

Name | Type | Description
flag_id | int | The attribute ID of the flag to set.
value | bool | The new value of the flag.

Lexeme.check_flag method

Check the value of a boolean flag.

Name | Type | Description
flag_id | int | The attribute ID of the flag to query.
RETURNS | bool | Whether the flag is set.
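Since the flags attribute is an integer container of binary flags, set_flag and check_flag can be pictured as bit operations on that integer, with the flag ID selecting the bit. The sketch below assumes that layout; `FlagField` is a hypothetical helper, and unlike a fixed-width integer, Python's int places no upper bound on flag_id.

```python
class FlagField:
    """Hypothetical stand-in for a lexeme's integer `flags` container.

    Flag number `flag_id` is assumed to live in bit `flag_id` of `self.flags`.
    """

    def __init__(self, flags=0):
        self.flags = flags

    def set_flag(self, flag_id, value):
        mask = 1 << flag_id
        if value:
            self.flags |= mask   # switch the bit on
        else:
            self.flags &= ~mask  # switch the bit off, leaving other bits alone

    def check_flag(self, flag_id):
        return bool(self.flags & (1 << flag_id))


field = FlagField()
field.set_flag(3, True)
field.set_flag(5, True)
print(field.flags, field.check_flag(3), field.check_flag(4))  # 40 True False
field.set_flag(3, False)
print(field.flags)  # 32
```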

Lexeme.similarity method (needs model)

Compute a semantic similarity estimate. Defaults to cosine over vectors.

Name | Type | Description
other | - | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects.
RETURNS | float | A scalar similarity score. Higher is more similar.
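Cosine similarity, the default mentioned above, is the dot product of two vectors divided by the product of their L2 norms. Below is a minimal pure-Python sketch over plain lists. `cosine_similarity` is a hypothetical helper, and returning 0.0 when either vector is all zeros is an assumption of this sketch, not documented behavior.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption of this sketch: a zero vector has no direction
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0 (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```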

Lexeme.has_vector property (needs model)

A boolean value indicating whether a word vector is associated with the lexeme.

Name | Type | Description
RETURNS | bool | Whether a word vector is associated with the lexeme.

Lexeme.vector property (needs model)

A real-valued meaning representation.

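Name | Type | Description
RETURNS | numpy.ndarray[ndim=1, dtype='float32'] | A 1D numpy array representing the lexeme’s semantics.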

Lexeme.vector_norm property (needs model)

The L2 norm of the lexeme’s vector representation.

Name | Type | Description
RETURNS | float | The L2 norm of the vector representation.
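The L2 norm is the square root of the sum of the squared vector components. Here is a minimal sketch on a plain list of floats; `l2_norm` is a hypothetical helper name, not part of the library.

```python
import math


def l2_norm(vector):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vector))


print(l2_norm([3.0, 4.0]))  # 5.0
```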

Attributes

Name | Type | Description
vocab | Vocab | The lexeme’s vocabulary.
text | unicode | Verbatim text content.
orth | int | ID of the verbatim text content.
orth_ | unicode | Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes.
rank | int | Sequential ID of the lexeme’s lexical type, used to index into tables, e.g. for word vectors.
flags | int | Container of the lexeme’s binary flags.
norm | int | ID of the lexeme’s norm, i.e. a normalized form of the lexeme text.
norm_ | unicode | The lexeme’s norm, i.e. a normalized form of the lexeme text.
lower | int | ID of the lowercase form of the word.
lower_ | unicode | Lowercase form of the word.
shape | int | ID of the word’s shape string; see shape_.
shape_ | unicode | Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, “Xxxx” or “dd”.
prefix | int | ID of the length-N substring from the start of the word. Defaults to N=1.
prefix_ | unicode | Length-N substring from the start of the word. Defaults to N=1.
suffix | int | ID of the length-N substring from the end of the word. Defaults to N=3.
suffix_ | unicode | Length-N substring from the end of the word. Defaults to N=3.
is_alpha | bool | Does the lexeme consist of alphabetic characters? Equivalent to lexeme.text.isalpha().
is_ascii | bool | Does the lexeme consist of ASCII characters? Equivalent to not any(ord(c) >= 128 for c in lexeme.text).
is_digit | bool | Does the lexeme consist of digits? Equivalent to lexeme.text.isdigit().
is_lower | bool | Is the lexeme in lowercase? Equivalent to lexeme.text.islower().
is_upper | bool | Is the lexeme in uppercase? Equivalent to lexeme.text.isupper().
is_title | bool | Is the lexeme in titlecase? Equivalent to lexeme.text.istitle().
is_punct | bool | Is the lexeme punctuation?
is_left_punct | bool | Is the lexeme a left punctuation mark, e.g. “(”?
is_right_punct | bool | Is the lexeme a right punctuation mark, e.g. “)”?
is_space | bool | Does the lexeme consist of whitespace characters? Equivalent to lexeme.text.isspace().
is_bracket | bool | Is the lexeme a bracket?
is_quote | bool | Is the lexeme a quotation mark?
is_currency | bool | Is the lexeme a currency symbol? New in v2.0.8.
like_url | bool | Does the lexeme resemble a URL?
like_num | bool | Does the lexeme represent a number? e.g. “10.9”, “10”, “ten”, etc.
like_email | bool | Does the lexeme resemble an email address?
is_oov | bool | Is the lexeme out-of-vocabulary, i.e. does it have no word vector?
is_stop | bool | Is the lexeme part of a “stop list”?
lang | int | ID of the language of the parent vocabulary.
lang_ | unicode | Language of the parent vocabulary.
prob | float | Smoothed log probability estimate of the lexeme’s word type (context-independent entry in the vocabulary).
cluster | int | Brown cluster ID.
sentiment | float | A scalar value indicating the positivity or negativity of the lexeme.
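The shape rule described in the table can be sketched in a few lines of Python. This follows the table’s description only: `word_shape` is a hypothetical function, non-alphanumeric characters are assumed to pass through unchanged, and the library’s own implementation may handle further cases.

```python
def word_shape(text):
    """Map letters to x/X and digits to d, keep other characters,
    and keep at most 4 consecutive copies of the same shape character."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # assumption: punctuation and symbols are kept as-is
        if shape_char == last:
            run += 1
        else:
            last = shape_char
            run = 1
        if run <= 4:  # truncate runs of the same shape character after length 4
            shape.append(shape_char)
    return "".join(shape)


for word in ["Apple", "Wonderful", "1999", "C3PO", "U.S.A."]:
    print(word, word_shape(word))
# Apple Xxxxx
# Wonderful Xxxxx
# 1999 dddd
# C3PO XdXX
# U.S.A. X.X.X.
```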
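The default prefix and suffix lengths and the is_ascii check from the table reduce to simple string operations. The helper names below are hypothetical, and the defaults follow the N=1 and N=3 values given in the table.

```python
def prefix(text, n=1):
    """Length-n substring from the start of the word (table default: N=1)."""
    return text[:n]


def suffix(text, n=3):
    """Length-n substring from the end of the word (table default: N=3)."""
    # Guard n == 0: text[-0:] would return the whole string.
    return text[-n:] if n > 0 else ""


def is_ascii(text):
    """True when no character has a code point of 128 or above."""
    return not any(ord(c) >= 128 for c in text)


print(prefix("apple"), suffix("apple"))    # a ple
print(suffix("hi"))                        # hi (shorter words come back whole)
print(is_ascii("cafe"), is_ascii("café"))  # True False
```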