I encountered discrepancies when annotating biomedical texts with two different scispaCy NER models: en_ner_bionlp13cg_md and en_ner_jnlpba_md.
For example:
en_ner_bionlp13cg_md annotated "serum exosomal miR-378" as ORGANISM_SUBSTANCE and the p-value string "p=0.000095" as CANCER.
en_ner_jnlpba_md annotated "73 NSCLC" incorrectly as PROTEIN and failed to annotate the microRNA "miR-378" entirely.
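The disagreement is easier to see when the two outputs are tabulated per entity. The following is a minimal sketch, not a full reproduction script: `annotate` and `compare_annotations` are hypothetical helper names, `annotate` assumes both models are installed, and the entity lists are copied from the examples above rather than regenerated.

```python
from collections import defaultdict


def annotate(model_name, text):
    """Run one installed scispaCy model and return (entity text, label) pairs."""
    import spacy  # imported lazily so the comparison helper works without spaCy

    nlp = spacy.load(model_name)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


def compare_annotations(results):
    """Map each entity text to the label each model assigned (None if missed).

    `results` is a dict of model name -> list of (entity text, label) pairs.
    """
    table = defaultdict(lambda: {name: None for name in results})
    for name, entities in results.items():
        for ent_text, label in entities:
            table[ent_text][name] = label
    return dict(table)


# Annotations copied from the examples above; with both models installed,
# these lists could instead come from annotate(model_name, text).
reported = {
    "en_ner_bionlp13cg_md": [
        ("serum exosomal miR-378", "ORGANISM_SUBSTANCE"),
        ("p=0.000095", "CANCER"),
    ],
    "en_ner_jnlpba_md": [("73 NSCLC", "PROTEIN")],
}

comparison = compare_annotations(reported)
for ent_text, labels in comparison.items():
    print(ent_text, labels)
```

Each row shows one entity string and the label from each model, with `None` where a model produced no entity with that exact text.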
Expected Behavior:
microRNA entities such as "miR-378" should ideally be recognized as RNA.
Clinical terms (e.g., "NSCLC") should not be annotated incorrectly as molecular entities such as proteins.
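One possible stopgap is rule-based post-processing outside the models. The sketch below uses only Python's standard `re` module and is not part of scispaCy. The miRNA pattern, the p-value pattern, the `clinical_terms` set, and the function names are all illustrative assumptions and would need tuning on real data.

```python
import re

# Assumed naming pattern: optional three-letter species prefix, "miR" or "let",
# a number, and optional suffixes (e.g. miR-378, let-7a). Not exhaustive.
MIRNA_RE = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|let)-?\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b",
    re.IGNORECASE,
)

# Statistical expressions such as "p=0.000095" or "P < 0.05".
PVALUE_RE = re.compile(r"^p\s*[=<>]\s*\d*\.?\d+(?:e-?\d+)?$", re.IGNORECASE)


def find_mirna_mentions(text):
    """Return (start, end, matched text, "RNA") for each miRNA-like mention."""
    return [(m.start(), m.end(), m.group(), "RNA") for m in MIRNA_RE.finditer(text)]


def filter_entities(entities, clinical_terms=frozenset({"NSCLC"})):
    """Drop p-value strings and PROTEIN entities that contain a clinical term.

    `entities` is a list of (entity text, label) pairs; `clinical_terms` is a
    hypothetical user-maintained set of abbreviations.
    """
    kept = []
    for ent_text, label in entities:
        if PVALUE_RE.match(ent_text.strip()):
            continue  # a statistic, not a biomedical entity
        if label == "PROTEIN" and clinical_terms & set(ent_text.split()):
            continue  # clinical abbreviation mislabeled as a protein
        kept.append((ent_text, label))
    return kept


mentions = find_mirna_mentions("serum exosomal miR-378")
cleaned = filter_entities(
    [
        ("serum exosomal miR-378", "ORGANISM_SUBSTANCE"),
        ("p=0.000095", "CANCER"),
        ("73 NSCLC", "PROTEIN"),
    ]
)
```

This only patches the symptoms listed above; it does not change what either model predicts.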
Any suggestions or insights from the maintainers regarding how to address this discrepancy would be greatly appreciated.
Environment:
scispacy 0.5.5
spacy 3.7.5
spacy-alignments 0.9.1
spacy-legacy 3.0.12
spacy-loggers 1.0.5
spacy-transformers 1.3.8
