Mismatch in Entity Annotation between SpaCy Models (en_ner_bionlp13cg_md and en_ner_jnlpba_md) #538

@shshossain

Description

I encountered discrepancies when annotating the same biomedical text with two different scispaCy NER models: en_ner_bionlp13cg_md and en_ner_jnlpba_md.

For example:

en_ner_bionlp13cg_md annotated "serum exosomal miR-378" as ORGANISM_SUBSTANCE and the p-value "p=0.000095" as CANCER.

en_ner_jnlpba_md incorrectly annotated "73 NSCLC" as PROTEIN and did not annotate the microRNA "miR-378" at all.
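
A minimal sketch of how the two outputs can be compared side by side. It assumes both model packages are installed in the environment; the helper names `extract_entities`, `label_disagreements`, and `compare_models` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple

# (start_char, end_char, text, label)
Entity = Tuple[int, int, str, str]


def extract_entities(model_name: str, text: str) -> List[Entity]:
    """Run one installed spaCy/scispaCy pipeline and return its entity spans."""
    # Imported lazily so the comparison helper below works without spaCy.
    import spacy

    nlp = spacy.load(model_name)
    doc = nlp(text)
    return [(ent.start_char, ent.end_char, ent.text, ent.label_) for ent in doc.ents]


def label_disagreements(a: List[Entity], b: List[Entity]) -> Dict[str, Tuple[str, str]]:
    """Map each entity text to (label_in_a, label_in_b) where the two differ.

    "O" means the model did not annotate that exact text at all.
    """
    labels_a = {text: label for _, _, text, label in a}
    labels_b = {text: label for _, _, text, label in b}
    diff = {}
    for text in sorted(set(labels_a) | set(labels_b)):
        pair = (labels_a.get(text, "O"), labels_b.get(text, "O"))
        if pair[0] != pair[1]:
            diff[text] = pair
    return diff


def compare_models(text: str) -> Dict[str, Tuple[str, str]]:
    """Compare the two models named in this issue on one piece of text."""
    bionlp = extract_entities("en_ner_bionlp13cg_md", text)
    jnlpba = extract_entities("en_ner_jnlpba_md", text)
    return label_disagreements(bionlp, jnlpba)
```

Calling `compare_models(abstract_text)` on the text being annotated returns a dictionary keyed by entity text, which makes it easy to list every span where the two models disagree or where only one of them produced an annotation. Matching is by exact span text, so overlapping spans with different boundaries show up as separate entries.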

Expected Behavior:

microRNA entities such as "miR-378" should ideally be recognized as RNA.

Clinical terms (e.g., "NSCLC") should not be annotated as molecular entities such as proteins.
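
A possible stopgap, sketched under the assumption that post-processing the model output is acceptable: a rule-based pass that drops p-value spans and relabels spans containing a microRNA name as RNA. The regular expressions and the function name `postprocess` are hypothetical and not part of scispaCy; the sketch does not change the models themselves and does not address the "NSCLC" case.

```python
import re
from typing import List, Tuple

# Assumed naming patterns; real microRNA nomenclature has more variants.
MIRNA_RE = re.compile(r"\b(?:hsa-)?(?:miR|let)-\d+[a-z]?(?:-[35]p)?\b", re.IGNORECASE)
PVALUE_RE = re.compile(r"^p\s*[=<>]\s*\d*\.?\d+(?:e-?\d+)?$", re.IGNORECASE)


def postprocess(entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Clean a list of (entity_text, label) pairs produced by an NER model."""
    cleaned = []
    for text, label in entities:
        if PVALUE_RE.match(text.strip()):
            # Statistical values are not biomedical entities; drop them.
            continue
        if MIRNA_RE.search(text):
            # The whole span is relabeled, not just the microRNA token.
            label = "RNA"
        cleaned.append((text, label))
    return cleaned
```

Applied to the annotations reported above, this would remove the "p=0.000095" span and relabel "serum exosomal miR-378" as RNA, while leaving "73 NSCLC" untouched.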

Any suggestions or insights from the maintainers regarding how to address this discrepancy would be greatly appreciated.

Environment:

scispacy 0.5.5
spacy 3.7.5
spacy-alignments 0.9.1
spacy-legacy 3.0.12
spacy-loggers 1.0.5
spacy-transformers 1.3.8
