Persian SemCor: A Bag of Word Sense Annotated Corpus for the Persian Language

Hossein Rouhizadeh, Mehrnoush Shamsfard, Mahdi Dehghan, Masoud Rouhizadeh


Abstract
Supervised approaches usually achieve the best performance on the Word Sense Disambiguation (WSD) problem. However, the unavailability of large sense-annotated corpora for many low-resource languages makes these approaches inapplicable to them in practice. In this paper, we mitigate this issue for the Persian language by proposing a fully automatic approach for obtaining Persian SemCor (PerSemCor), a Persian Bag-of-Words (BoW) sense-annotated corpus. We evaluate PerSemCor both intrinsically and extrinsically and show that it can be used effectively as a training set for Persian supervised WSD systems. To encourage future research on Persian Word Sense Disambiguation, we release PerSemCor at http://nlp.sbu.ac.ir.
Anthology ID:
2021.gwc-1.17
Volume:
Proceedings of the 11th Global Wordnet Conference
Month:
January
Year:
2021
Address:
University of South Africa (UNISA)
Editors:
Piek Vossen, Christiane Fellbaum
Venue:
GWC
SIG:
SIGLEX
Publisher:
Global Wordnet Association
Pages:
147–156
URL:
https://aclanthology.org/2021.gwc-1.17
Cite (ACL):
Hossein Rouhizadeh, Mehrnoush Shamsfard, Mahdi Dehghan, and Masoud Rouhizadeh. 2021. Persian SemCor: A Bag of Word Sense Annotated Corpus for the Persian Language. In Proceedings of the 11th Global Wordnet Conference, pages 147–156, University of South Africa (UNISA). Global Wordnet Association.
Cite (Informal):
Persian SemCor: A Bag of Word Sense Annotated Corpus for the Persian Language (Rouhizadeh et al., GWC 2021)
PDF:
https://aclanthology.org/2021.gwc-1.17.pdf
Data
Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison