Generalisation of cyberbullying detection

K Richard, L Marc-André - arXiv preprint arXiv:2009.01046, 2020 - arxiv.org
Cyberbullying is a problem in today's ubiquitous online communities. Filtering it out of online conversations has proven a challenge, and efforts to do so have led to the creation of many different datasets, all offered as resources to train classifiers. Through these datasets, we explore the variety of definitions of cyberbullying behaviors and the impact of these differences on the portability of a classifier from one community to another. By analyzing the similarities between datasets, we also gain insight into the generalization power of the classifiers trained on them. A study of ensemble models combining these classifiers helps us understand how they interact with each other.