Demographic Word Embeddings for Racism Detection on Twitter

Mohammed Hasanuzzaman, Gaël Dias, Andy Way


Abstract
Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this creates unique opportunities for communication, it also presents important challenges. Online racism is one such challenge. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (age, gender, and location) information. Our methodology achieves reasonable classification performance on a gold-standard dataset (F1 = 76.3%) and significantly improves on the classification performance of demographic-agnostic models.
Anthology ID:
I17-1093
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Greg Kondrak, Taro Watanabe
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
926–936
URL:
https://aclanthology.org/I17-1093
Cite (ACL):
Mohammed Hasanuzzaman, Gaël Dias, and Andy Way. 2017. Demographic Word Embeddings for Racism Detection on Twitter. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 926–936, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Demographic Word Embeddings for Racism Detection on Twitter (Hasanuzzaman et al., IJCNLP 2017)
PDF:
https://aclanthology.org/I17-1093.pdf