Homonym normalisation by word sense clustering: a case in Japanese

Yo Sato, Kevin Heffernan


Abstract
This work presents a method of word sense clustering that differentiates homonyms and merges homophones, taking Japanese as an example, where orthographical variation causes problems for language processing. The method uses contextualised embeddings (BERT) to cluster tokens into distinct sense groups, and these groups are then used to normalise synonymous instances to a single representative form. We see the benefit of this normalisation in language modelling, as well as in transliteration.
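As a rough illustration of the idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each token already comes with a reading and a context vector. Here the vectors are hand-made two-dimensional toy values standing in for BERT embeddings. The greedy threshold clustering, the function names (`cluster_tokens`, `normalise`) and the rule of choosing the most frequent surface form as representative are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_tokens(tokens, threshold=0.8):
    """Greedy one-pass clustering (an assumed, simplified scheme).

    tokens: list of (surface, reading, vector) triples.
    A token joins the most similar existing cluster that shares its
    reading, if the similarity reaches the threshold; otherwise it
    starts a new cluster.
    """
    clusters = []
    for idx, (_surface, reading, vec) in enumerate(tokens):
        best, best_sim = None, threshold
        for cluster in clusters:
            if cluster["reading"] != reading:
                continue
            sim = cosine(cluster["centroid"], vec)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append(
                {"reading": reading, "centroid": list(vec), "members": [idx]}
            )
        else:
            n = len(best["members"])
            # Update the running mean of the cluster's vectors.
            best["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(best["centroid"], vec)
            ]
            best["members"].append(idx)
    return clusters


def normalise(tokens, clusters):
    """Map every token to the most frequent surface form in its cluster."""
    normalised = [None] * len(tokens)
    for cluster in clusters:
        counts = Counter(tokens[i][0] for i in cluster["members"])
        representative = counts.most_common(1)[0][0]
        for i in cluster["members"]:
            normalised[i] = representative
    return normalised


# Toy data: all tokens are read "あう". The first axis loosely stands for
# a "meet" context, the second for a "fit" context (hypothetical values).
tokens = [
    ("会う", "あう", [1.0, 0.1]),
    ("会う", "あう", [0.9, 0.2]),
    ("逢う", "あう", [0.95, 0.15]),
    ("合う", "あう", [0.1, 1.0]),
    ("合う", "あう", [0.2, 0.9]),
    ("あう", "あう", [0.9, 0.1]),    # hiragana spelling, "meet" context
    ("あう", "あう", [0.15, 0.95]),  # hiragana spelling, "fit" context
]

clusters = cluster_tokens(tokens)
normalised = normalise(tokens, clusters)
print(normalised)
```

In this toy run the two hiragana tokens, identical on the surface, land in different sense groups, while 逢う is merged with 会う. With real contextualised embeddings the threshold and the clustering algorithm would need tuning on data.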
Anthology ID:
2020.coling-main.295
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Donia Scott, Nuria Bel, Chengqing Zong
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
3324–3332
URL:
https://aclanthology.org/2020.coling-main.295
DOI:
10.18653/v1/2020.coling-main.295
Cite (ACL):
Yo Sato and Kevin Heffernan. 2020. Homonym normalisation by word sense clustering: a case in Japanese. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3324–3332, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Homonym normalisation by word sense clustering: a case in Japanese (Sato & Heffernan, COLING 2020)
PDF:
https://aclanthology.org/2020.coling-main.295.pdf