Usually you’ll load this once per process as nlp and pass the instance around
your application. An instance of the Language class is created when you call
spacy.load() and contains the shared vocabulary
and language data, optional model data loaded from a
model package or a path, and a
processing pipeline containing components like
the tagger or parser that are called on a document in order. You can also add
your own processing pipeline components that take a Doc object, modify it and
return it.
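As a minimal sketch of the description above, the following creates a Language object and checks that the documents it produces share its vocabulary. It assumes spaCy is installed and uses spacy.blank("en") in place of spacy.load(), so that no model package is needed; the example sentence is made up.

```python
import spacy
from spacy.language import Language

# Assumption: a blank English pipeline stands in for spacy.load("<model>"),
# so the sketch runs without an installed model package.
nlp = spacy.blank("en")

# The object returned is an instance of a Language subclass.
is_language = isinstance(nlp, Language)

# A blank pipeline has a tokenizer but no components such as a tagger or parser.
component_names = list(nlp.pipe_names)

# Docs created by the pipeline share the pipeline's vocabulary.
doc = nlp("This is a sentence.")
shares_vocab = doc.vocab is nlp.vocab
```

Loading the object once and passing nlp around avoids rebuilding the vocabulary and pipeline for every call.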
Apply the pipeline to some text. The text can span multiple sentences, and can
contain arbitrary whitespace. Alignment into the original string is preserved.
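The alignment guarantee can be checked directly. This is a small sketch that assumes a blank English pipeline; the input text is made up and deliberately contains repeated spaces and newlines.

```python
import spacy

nlp = spacy.blank("en")  # assumption: tokenizer only, no model package required

# Text with irregular whitespace spanning two sentences.
text = "Hello   world.\n\nThis is a second sentence."
doc = nlp(text)

# The Doc reproduces the original string exactly.
round_trips = doc.text == text

# token.idx is the character offset of each token in the original string.
offsets_match = all(
    text[token.idx : token.idx + len(token.text)] == token.text for token in doc
)
```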
A batch of Doc objects or unicode. If unicode, a Doc object will be created from the text.
| Name | Type | Description |
| --- | --- | --- |
| golds | iterable | A batch of GoldParse objects or dictionaries. Dictionaries will be used to create GoldParse objects. For the available keys and their usage, see `GoldParse.__init__`. |
| drop | float | The dropout rate. |
| sgd | callable | An optimizer. |
| losses | dict | Dictionary to update with the loss, keyed by pipeline component. |
| component_cfg (v2.1) | dict | Config parameters for specific pipeline components, keyed by component name. |
Tuples of Doc and GoldParse objects, such that the Doc objects contain the predictions and the GoldParse objects the correct annotations. Alternatively, (text, annotations) tuples of raw text and a dict (see simple training style).
| Name | Type | Description |
| --- | --- | --- |
| verbose | bool | Print debugging information. |
| batch_size | int | The batch size to use. |
| scorer | Scorer | Optional Scorer to use. If not passed in, a new one will be created. |
| component_cfg (v2.1) | dict | Config parameters for specific pipeline components, keyed by component name. |
Replace weights of models in the pipeline with those provided in the params
dictionary. Can be used as a context manager, in which case, models go back to
their original weights after the block.
Add a component to the processing pipeline. Valid components are callables that
take a Doc object, modify it and return it. Only one of before, after,
first or last can be set. Default behavior is last=True.
| Name | Type | Description |
| --- | --- | --- |
| component | callable | The pipeline component. |
| name | unicode | Name of pipeline component. Overwrites existing component.name attribute if available. If no name is set and the component exposes no name attribute, `component.__name__` is used. An error is raised if the name already exists in the pipeline. |
| before | unicode | Component name to insert component directly before. |
| after | unicode | Component name to insert component directly after. |
| first | bool | Insert component first / not first in the pipeline. |
Rename a component in the pipeline. Useful to create custom names for
pre-defined and pre-loaded components. To change the default name of a component
added to the pipeline, you can also use the name argument on
add_pipe.
Disable one or more pipeline components. If used as a context manager, the
pipeline will be restored to the initial state at the end of the block.
Otherwise, a DisabledPipes object is returned, which has a .restore() method
you can use to undo your changes.
| Name | Type | Description |
| --- | --- | --- |
| disabled (v2.2.2) | list | Names of pipeline components to disable. |
| `*disabled` | unicode | Names of pipeline components to disable. |
| RETURNS | DisabledPipes | The disabled pipes that can be restored by calling the object’s .restore() method. |
Loads state from a directory. Modifies the object in place and returns it. If
the saved Language object contains a model, the model will be loaded. Note
that this method is commonly used via subclasses like English or German
to make language-specific functionality like the
lexical attribute getters available to the
loaded object.
| Name | Type | Description |
| --- | --- | --- |
| path | unicode / Path | A path to a directory. Paths may be either strings or Path-like objects. |
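A round trip through a directory might look like the sketch below. It assumes a blank English pipeline, uses the companion to_disk method (not described in this section) to produce the directory, and writes to a temporary directory so that no real path is assumed.

```python
import tempfile

import spacy
from spacy.lang.en import English

nlp = spacy.blank("en")  # assumption: nothing but the tokenizer and vocab to save

with tempfile.TemporaryDirectory() as tmp_dir:
    # Save the pipeline state to a directory.
    nlp.to_disk(tmp_dir)

    # Load via the language subclass; from_disk modifies the object in place
    # and returns it.
    target = English()
    loaded = target.from_disk(tmp_dir)

doc = loaded("Hello world")
```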
Load state from a binary string. Note that this method is commonly used via
subclasses like English or German to make language-specific functionality
like the lexical attribute getters
available to the loaded object.
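The same round trip can be sketched with a binary string, assuming a blank English pipeline and the companion to_bytes method (not described in this section).

```python
import spacy
from spacy.lang.en import English

nlp = spacy.blank("en")  # assumption: blank pipeline, no model data

# Serialize the pipeline state to a binary string.
nlp_bytes = nlp.to_bytes()

# Load into a fresh English object; from_bytes returns the object it modified.
target = English()
returned = target.from_bytes(nlp_bytes)

doc = target("Hello world")
```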
During serialization, spaCy will export several data fields used to restore
different aspects of the object. If needed, you can exclude them from
serialization by passing in the string names via the exclude argument.
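As a hedged illustration, the sketch below excludes the vocabulary when serializing to bytes. It assumes spaCy v2.1 or later, where exclude takes a list of field names, and assumes "vocab" is one of the available field names.

```python
import spacy
from spacy.lang.en import English

nlp = spacy.blank("en")  # assumption: spaCy v2.1+ with the exclude argument

# Serialize with and without the vocabulary field.
full = nlp.to_bytes()
without_vocab = nlp.to_bytes(exclude=["vocab"])

# The same argument can be passed when loading to skip a field.
restored = English().from_bytes(full, exclude=["vocab"])
```

Excluding a field makes the serialized data smaller, at the cost of that part of the state not being restored.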