An Annotation Framework for Luxembourgish Sentiment Analysis

Joshgun Sirajzade, Daniela Gierschek, Christoph Schommer


Abstract
The aim of this paper is to present a framework developed for crowdsourcing sentiment annotation for the low-resource language Luxembourgish. Our tool is easily accessible through a web interface and facilitates sentence-level annotation of several annotators in parallel. In the heart of our framework is an XML database, which serves as central part linking several components. The corpus in the database consists of news articles and user comments. One of the components is LuNa, a tool for linguistic preprocessing of the data set. It tokenizes the text, splits it into sentences and assigns POS-tags to the tokens. After that, the preprocessed text is stored in XML format into the database. The Sentiment Annotation Tool, which is a browser-based tool, then enables the annotation of split sentences from the database. The Sentiment Engine, a separate module, is trained with this material in order to annotate the whole data set and analyze the sentiment of the comments over time and in relationship to the news articles. The gained knowledge can again be used to improve the sentiment classification on the one hand and on the other hand to understand the sentiment phenomenon from the linguistic point of view.
Anthology ID:
2020.sltu-1.24
Volume:
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Dorothee Beermann, Laurent Besacier, Sakriani Sakti, Claudia Soria
Venue:
SLTU
SIG:
Publisher:
European Language Resources association
Note:
Pages:
172–176
Language:
English
URL:
https://aclanthology.org/2020.sltu-1.24/
DOI:
Bibkey:
Cite (ACL):
Joshgun Sirajzade, Daniela Gierschek, and Christoph Schommer. 2020. An Annotation Framework for Luxembourgish Sentiment Analysis. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pages 172–176, Marseille, France. European Language Resources association.
Cite (Informal):
An Annotation Framework for Luxembourgish Sentiment Analysis (Sirajzade et al., SLTU 2020)
Copy Citation:
PDF:
https://aclanthology.org/2020.sltu-1.24.pdf