Detecting Troll Tweets in a Bilingual Corpus

Lin Miao, Mark Last, Marina Litvak


Abstract
Over the past several years, a large number of troll accounts have emerged that attempt to manipulate public opinion on social network sites. They are often involved in spreading misinformation, fake news, and propaganda with the intent of distracting and sowing discord. This paper aims to detect troll tweets in both English and Russian, assuming that the tweets are generated by a “troll farm.” We reduce this task to the authorship verification problem of determining whether a single tweet is authored by a “troll farm” account or not. We evaluate a supervised classification approach with monolingual, cross-lingual, and bilingual training scenarios, using several machine learning algorithms, including deep learning. The best results are attained by bilingual learning, with an area under the ROC curve (AUC) of 0.875 and 0.828 for tweet classification on the English and Russian test sets, respectively. Notably, these results are obtained using only raw text features, which do not require manual feature engineering. We also introduce a resource of English and Russian troll tweets containing the original tweets and their translations from English to Russian and from Russian to English. It is available for academic purposes.
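For readers unfamiliar with the reported metric, the following minimal sketch shows one common way to compute the area under the ROC curve: as the fraction of (troll, non-troll) tweet pairs in which the troll tweet receives the higher classifier score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the authors' code or data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive example
    is scored higher than a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = troll tweet, 0 = non-troll tweet.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 means every troll tweet is ranked above every non-troll tweet, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of tweets, so it suits illustration rather than large test sets.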
Anthology ID:
2020.lrec-1.766
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
6247–6254
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.766/
Cite (ACL):
Lin Miao, Mark Last, and Marina Litvak. 2020. Detecting Troll Tweets in a Bilingual Corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6247–6254, Marseille, France. European Language Resources Association.
Cite (Informal):
Detecting Troll Tweets in a Bilingual Corpus (Miao et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.766.pdf