Lexeme class
A Lexeme has no string context: it’s a word type, as opposed to a word token.
It therefore has no part-of-speech tag, dependency parse, or lemma (if
lemmatization depends on the part-of-speech tag).
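The type/token distinction can be illustrated without spaCy. The following is a minimal sketch in plain Python; the whitespace split is only a stand-in for a real tokenizer, and the sample sentence is made up for illustration.

```python
from collections import Counter

# Assumption: whitespace splitting stands in for a real tokenizer.
text = "the cat saw the other cat"

# Word tokens: one entry per occurrence, each sitting in a specific context.
tokens = text.split()

# Word types: one entry per distinct string, independent of any context.
types = Counter(tokens)

print(len(tokens))   # 6 tokens
print(len(types))    # 4 types
print(types["the"])  # 2 tokens share the type "the"
```

Both occurrences of "the" map to the same type, which is why context-dependent annotations such as a part-of-speech tag cannot live on the type.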
Lexeme.__init__ method
Create a Lexeme object.
| Name | Type | Description |
|---|---|---|
| vocab | Vocab | The parent vocabulary. |
| orth | int | The orth id of the lexeme. |
| RETURNS | Lexeme | The newly constructed object. |
Lexeme.set_flag method
Change the value of a boolean flag.
| Name | Type | Description |
|---|---|---|
| flag_id | int | The attribute ID of the flag to set. |
| value | bool | The new value of the flag. |
Lexeme.check_flag method
Check the value of a boolean flag.
| Name | Type | Description |
|---|---|---|
| flag_id | int | The attribute ID of the flag to query. |
| RETURNS | bool | The value of the flag. |
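One common way to keep many boolean flags in a single integer is a bit field, where each flag ID selects one bit. The sketch below shows that general technique in plain Python. It is not spaCy's code: the free functions, the flag names and the flag IDs are hypothetical, and that the library stores its flags exactly this way is an assumption.

```python
def set_flag(flags: int, flag_id: int, value: bool) -> int:
    """Return a copy of `flags` with bit `flag_id` switched on or off."""
    mask = 1 << flag_id
    return flags | mask if value else flags & ~mask


def check_flag(flags: int, flag_id: int) -> bool:
    """Return True if bit `flag_id` is set in `flags`."""
    return bool(flags & (1 << flag_id))


# Hypothetical flag IDs, chosen only for this illustration.
IS_GREETING = 0
IS_SHOUTED = 3

flags = 0
flags = set_flag(flags, IS_SHOUTED, True)
print(check_flag(flags, IS_SHOUTED))   # True
print(check_flag(flags, IS_GREETING))  # False

flags = set_flag(flags, IS_SHOUTED, False)
print(flags)  # 0
```

Packing flags this way keeps the per-entry storage small and makes a flag lookup a single bitwise operation.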
Lexeme.similarity method (needs model)
Compute a semantic similarity estimate. Defaults to cosine over vectors.
| Name | Type | Description |
|---|---|---|
| other | - | The object to compare with. By default, accepts Doc, Span, Token and Lexeme objects. |
| RETURNS | float | A scalar similarity score. Higher is more similar. |
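Cosine similarity over two vectors is the dot product divided by the product of their L2 norms. The following is a minimal pure-Python sketch of that formula, not the library's implementation; the function name is hypothetical, and returning 0.0 for a zero vector is an assumption made here to avoid a division by zero.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Because the score depends only on direction, scaling either vector leaves it unchanged.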
Lexeme.has_vector property (needs model)
A boolean value indicating whether a word vector is associated with the lexeme.
| Name | Type | Description |
|---|---|---|
| RETURNS | bool | Whether the lexeme has vector data attached. |
Lexeme.vector property (needs model)
A real-valued meaning representation.
| Name | Type | Description |
|---|---|---|
| RETURNS | numpy.ndarray[ndim=1, dtype='float32'] | A 1D numpy array representing the lexeme’s semantics. |
Lexeme.vector_norm property (needs model)
The L2 norm of the lexeme’s vector representation.
| Name | Type | Description |
|---|---|---|
| RETURNS | float | The L2 norm of the vector representation. |
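For reference, the L2 norm of a vector is the square root of the sum of its squared components. The sketch below computes it in plain Python on made-up values; `l2_norm` and `normalize` are hypothetical helper names, not part of the API documented here.

```python
import math
from typing import List, Sequence


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = l2_norm(vector)
    return [x / norm for x in vector]


vector = [3.0, 4.0]
print(l2_norm(vector))    # 5.0
print(normalize(vector))  # [0.6, 0.8]
```

Having the norm available up front is convenient when many similarity scores are computed against the same vector, since the division by the norm does not need to be recomputed from scratch.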
Attributes
| Name | Type | Description |
|---|---|---|
| vocab | Vocab | The lexeme’s vocabulary. |
| text | unicode | Verbatim text content. |
| orth | int | ID of the verbatim text content. |
| orth_ | unicode | Verbatim text content (identical to Lexeme.text). Exists mostly for consistency with the other attributes. |
| rank | int | Sequential ID of the lexeme’s lexical type, used to index into tables, e.g. for word vectors. |
| flags | int | Container of the lexeme’s binary flags. |
| norm | int | The lexeme’s norm, i.e. a normalized form of the lexeme text. |
| norm_ | unicode | The lexeme’s norm, i.e. a normalized form of the lexeme text. |
| lower | int | Lowercase form of the word. |
| lower_ | unicode | Lowercase form of the word. |
| shape | int | Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, “Xxxx” or “dd”. |
| shape_ | unicode | Transform of the word’s string, to show orthographic features. Alphabetic characters are replaced by x or X, numeric characters are replaced by d, and sequences of the same character are truncated after length 4. For example, “Xxxx” or “dd”. |
| prefix | int | Length-N substring from the start of the word. Defaults to N=1. |
| prefix_ | unicode | Length-N substring from the start of the word. Defaults to N=1. |
| suffix | int | Length-N substring from the end of the word. Defaults to N=3. |
| suffix_ | unicode | Length-N substring from the end of the word. Defaults to N=3. |
| is_alpha | bool | Does the lexeme consist of alphabetic characters? Equivalent to lexeme.text.isalpha(). |
| is_ascii | bool | Does the lexeme consist of ASCII characters? Equivalent to all(ord(c) < 128 for c in lexeme.text). |
| is_digit | bool | Does the lexeme consist of digits? Equivalent to lexeme.text.isdigit(). |
| is_lower | bool | Is the lexeme in lowercase? Equivalent to lexeme.text.islower(). |
| is_upper | bool | Is the lexeme in uppercase? Equivalent to lexeme.text.isupper(). |
| is_title | bool | Is the lexeme in titlecase? Equivalent to lexeme.text.istitle(). |
| is_punct | bool | Is the lexeme punctuation? |
| is_left_punct | bool | Is the lexeme a left punctuation mark, e.g. “(”? |
| is_right_punct | bool | Is the lexeme a right punctuation mark, e.g. “)”? |
| is_space | bool | Does the lexeme consist of whitespace characters? Equivalent to lexeme.text.isspace(). |
| is_bracket | bool | Is the lexeme a bracket? |
| is_quote | bool | Is the lexeme a quotation mark? |
| is_currency (v2.0.8) | bool | Is the lexeme a currency symbol? |
| like_url | bool | Does the lexeme resemble a URL? |
| like_num | bool | Does the lexeme represent a number, e.g. “10.9”, “10”, “ten”, etc.? |
| like_email | bool | Does the lexeme resemble an email address? |
| is_oov | bool | Is the lexeme out-of-vocabulary (i.e. does it not have a word vector)? |
| is_stop | bool | Is the lexeme part of a “stop list”? |
| lang | int | Language of the parent vocabulary. |
| lang_ | unicode | Language of the parent vocabulary. |
| prob | float | Smoothed log probability estimate of the lexeme’s word type (context-independent entry in the vocabulary). |
| cluster | int | Brown cluster ID. |
| sentiment | float | A scalar value indicating the positivity or negativity of the lexeme. |
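Several of the string-based attributes above can be approximated in a few lines of plain Python. The sketch below follows the table's descriptions of the shape, prefix and suffix attributes, and checks ASCII-ness by testing that every code point is below 128. The function names are hypothetical, and the library's own implementation may differ in edge cases.

```python
def word_shape(text: str, max_run: int = 4) -> str:
    """Map letters to x/X and digits to d, keeping at most `max_run`
    consecutive copies of the same shape character."""
    shape = []
    last = ""
    run = 0
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as-is
        run = run + 1 if shape_char == last else 1
        last = shape_char
        if run <= max_run:
            shape.append(shape_char)
    return "".join(shape)


def prefix(text: str, n: int = 1) -> str:
    """Length-n substring from the start of the word."""
    return text[:n]


def suffix(text: str, n: int = 3) -> str:
    """Length-n substring from the end of the word."""
    return text[-n:]


def is_ascii(text: str) -> bool:
    """True if every character has a code point below 128."""
    return all(ord(c) < 128 for c in text)


print(word_shape("Apple"))             # Xxxxx
print(word_shape("12345678"))          # dddd
print(prefix("Apple"), suffix("Apple"))  # A ple
print(is_ascii("caf\u00e9"))           # False
```

Features like these depend only on the string itself, which is why they can be computed once per word type and shared by every token of that type.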