Replace CoreNLP with spaCy #3

Running the CoreNLP server is inconvenient for everyone: it is big, relatively slow, and a bit clunky to use.
The alternatives are spaCy and nltk.

First experiments show that `nltk`'s Named Entity Recognition is not very accurate and that its sentence splitter is worse than CoreNLP's.
The next candidate is `spaCy`, which shows good results in simple experiments. Before we implement it, we have to check the following:

- [x] Are the sentence splitter and tokenizer better than CoreNLP's?
- [ ] Do the licenses of spaCy and its models allow us to deploy them?
- [x] Is the NER better than CoreNLP's?
- [ ] Can we achieve higher throughput?
- [ ] Is it parallelizable? CoreNLP does not cope well with more than 2-4 concurrent requests.
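
The accuracy items and the open throughput/parallelism items could be measured with a small harness. This is a hypothetical sketch using only the Python standard library: `span_f1`, `measure_throughput`, and the toy `naive_sentences` splitter are illustrative names, not part of spaCy, nltk, or CoreNLP. Each analyzer under test would be wrapped as a callable that returns character spans; for spaCy, sentence spans could be built from `doc.sents` as `(sent.start_char, sent.end_char, "SENT")` and entities from `doc.ents` as `(ent.start_char, ent.end_char, ent.label_)`.

```python
"""Hypothetical comparison harness (not part of any library).

Accuracy: exact-match F1 over character spans (sentences or entities).
Throughput: documents per second with a bounded number of concurrent calls.
"""
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1 between gold and predicted spans."""
    if not gold and not predicted:
        return 1.0
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def measure_throughput(analyze: Callable[[str], object],
                       texts: List[str],
                       max_workers: int = 1) -> float:
    """Documents per second when at most `max_workers` calls run at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() drains the iterator so every call finishes before timing stops
        list(pool.map(analyze, texts))
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed if elapsed > 0 else float("inf")


def naive_sentences(text: str) -> Set[Span]:
    """Toy splitter (hypothetical stand-in): a sentence ends at '.' + space/end."""
    spans, start = set(), 0
    for i, ch in enumerate(text):
        if ch == "." and (i + 1 == len(text) or text[i + 1] == " "):
            spans.add((start, i + 1, "SENT"))
            start = i + 2
    return spans


if __name__ == "__main__":
    text = "Dr. Smith arrived. He sat down."
    gold = {(0, 18, "SENT"), (19, 31, "SENT")}
    # The toy splitter wrongly breaks after the abbreviation "Dr."
    print(span_f1(gold, naive_sentences(text)))
    for workers in (1, 2, 4):
        rate = measure_throughput(naive_sentences, [text] * 1000, workers)
        print(workers, "workers:", round(rate), "docs/s")
```

Capping `max_workers` at 2-4 models the CoreNLP limit on concurrent requests, and raising it shows whether throughput scales. Threads only help when the analyzer waits on I/O (such as an HTTP call to the CoreNLP server); an in-process spaCy pipeline is CPU-bound, so for spaCy the fairer comparison is `nlp.pipe(texts, n_process=...)`, which distributes work across processes. Entity label sets also differ between tools, so labels would need to be mapped to a common set before `span_f1` compares NER output.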
