
EAI Endorsed Transactions on Scalable Information Systems, Research Article

Sequence Classification of Tweets with Transfer Learning via BERT in the Field of Disaster Management

Sumera Naaz1,†, Zain Ul Abedin1,† and Danish Raza Rizvi1,∗

1 Department of Computer Engineering, Jamia Millia Islamia, New Delhi, 110025, India
∗ Corresponding author. Email: Drizvi@jmi.ac.in
† These authors contributed equally.

Abstract

Twitter is extensively used as an information-sharing platform during emergencies such as natural disasters. People tweet useful information about disaster-related events such as evacuations, volunteer needs, requests for help and warnings. This data is often very useful for rescue teams, NGOs, the military and various other government and private organisations tasked with saving lives and coordinating volunteers, and it can also be used to analyse disaster behaviour. In this paper, we collect labelled tweets from CrisisLexT26 and CrisisNLP and classify them into seven labels on the basis of the information they provide. The data was heavily skewed, so to improve classifier accuracy we applied various techniques and created two datasets (imbalanced and balanced). We compare the performance of several BERT-based models on these two datasets. For sequence classification, the balanced dataset performs better than the imbalanced one. Classifier accuracy can be improved to a great extent by adopting good data preprocessing and data-splitting techniques.

Received on 21 June 2020; accepted on 18 March 2021; published on 23 March 2021


Keywords: BERT (Bidirectional Encoder Representation from Transformers), Tweet classification, Balanced Dataset,
Imbalanced Dataset, Disaster Management, Natural Language Processing.
Copyright © 2021 Sumera Naaz et al., licensed to EAI. This is an open access article distributed under the terms of the
Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unlimited use,
distribution and reproduction in any medium so long as the original work is properly cited.
doi:10.4108/eai.23-3-2021.169071

1. Introduction

The popularity of social media is increasing rapidly, and massive volumes of data are generated each day. This massive volume of data provides great opportunities and challenges for natural language processing [1]. Although social media data is abundantly available, making sense of it is difficult because of its high velocity, veracity and large volume [2]. This combination of availability and complexity makes it a rich area for research and exploration.

The majority of people choose Twitter when choosing a social media outlet for reliable scientific information and news [3]. Microblogging and social networking sites like Twitter play an important role in spreading information during disasters [4]. Recent research in this area has affirmed the potential use of such social media data for various disaster response tasks [2].

Whenever a disaster occurs, time is short because the safety of people is in question, so there is a need to act as quickly as possible [5]. Different types of information are shared in real time by victims, by people who wish to help those affected, and by people who need help themselves [6]. Twitter has helped greatly in spreading news of damage, donation needs and volunteering, often including videos and photos [4]. At the same time, it is difficult to identify relevant information about disasters [7], which makes it harder for disaster-affected communities and rescue teams to act quickly [4].

Recent studies have shown the relevance of social media messages and how they contain information that can be used effectively during disasters [4][8][9][10].

These social media messages can be processed by various NLP techniques such as automatic summarization, information classification, named-entity recognition and information extraction [6][11]. However, most of this data is brief, informal, noisy and contains typographical errors [12], which affects the accuracy of the model. Real-world data also has a severe problem of imbalanced classes: some classes have far fewer instances than others, which seriously affects the performance of the classifier.

We have collected Twitter data from various sources and applied several data preprocessing techniques to make it suitable for processing. From the collected tweets we built two datasets, one balanced and one imbalanced. The balanced dataset has an equal distribution of tweets for each label, while the imbalanced dataset has an unequal distribution. We then compared the performance on these datasets of four BERT-based models: default BERT, BERT+NL (BERT with non-linear layers), BERT+LSTM (BERT with an LSTM) and BERT+CNN2 (BERT with two convolutional layers).

Acknowledgement. The code in this paper is adapted from the Guoqin Ma paper [13].

1.1. Related work

1. Imran, Muhammad, et al. [4] developed a system that filters out messages that do not contribute to situational awareness. They then classified the remaining relevant messages into labels such as caution and advice; casualties and damage; and donation of money, goods or services.

2. Nair, Meera R., et al. classified Twitter messages using keyword analysis and carried out a comparative study of three machine learning algorithms: Random Forests, Decision Tree and Naive Bayes. The comparison of the three algorithms was done with the help of Weka, an analytical tool. The paper also focuses on identifying the most influential users during the Chennai flood [14].

3. Starbird et al. collected tweets posted during the Red River flood, which occurred in the Red River valley in central North America, using the keyword redriver. They then categorized these tweets into labels such as hopeful, humour, support and fear [15].

4. A case study of the 2011 Thai floods collected tweets using the keyword thaiflood. These tweets were then classified into five categories based on the information they provide: requests for assistance, announcements for support, situational awareness, requests for information, and others. The authors also identified the influential users related to the Thai flood by scrutinizing the sources of the tweets; most of the top users were from government or non-government organizations that were somehow related to the disaster [16].

5. Clarkson, Kyle, et al. focus on geolocation inference of Twitter users by taking reference from discrete sets of geographical phenomena (for example, a solar eclipse). They applied this model to Twitter data gathered during the solar eclipse of 2017 and decided, on the basis of several parameters, whether a particular feature could be used to determine whether a user was viewing the eclipse [1].

6. Mendhe, Chetan Harichandra, et al. developed a platform for big data analysis. The platform supports various data filters and contains a large collection of social media data. It offers a convenient way to collect and host large datasets, implements state-of-the-art preprocessing algorithms ranging from removal operations (e.g., of repeated tweets) to transformations (e.g., of abbreviations, acronyms and emoticons into fully formed words), and makes use of collective intelligence to annotate large collections of tweets. For annotation of large sets of tweets, they combined social media data collection with crowd-sourcing, using Amazon Mechanical Turk to label Twitter data [3].

7. Praznik, Logan, et al. focus on link prediction in hashtag graphs. They showed how different hashtags can be linked with each other and hence can belong to the same topic, which also helps in tracking the development of a topic over time and predicting its future course. They mapped Twitter data into hashtag graphs, where vertices correspond to hashtags and edges correspond to co-occurrences of hashtags within the same tweet. The weight of a vertex corresponds to the number of tweets the hashtag has occurred in, and edges can be weighted with the number of tweets in which both hashtags have co-occurred [17].

8. TextAttack is a Python library and framework for building and executing adversarial attacks against NLP models. It is highly useful for assessing attack strategies and the robustness of NLP models, and it also supports improving model performance. TextAttack is quite flexible, providing options for customizing the construction of attacks. The four components from which TextAttack builds attacks are: a goal function, a set of constraints, a transformation, and a search method. The attacks can be reused for data augmentation and adversarial training [18].

1.2. Novelty

The dataset we have collected has an imbalanced distribution of tweets among the different labels. This imbalance lowers classifier accuracy, so we applied various techniques for better performance and created two datasets (D1 and D2). The two main tasks of this paper are: 1) apply various data preprocessing techniques to improve the accuracy of the classifier, and 2) compare the performance of various models on the imbalanced and balanced datasets. We have developed several BERT-based models and compared their performance. We chose BERT because it has achieved state-of-the-art performance in many NLP tasks.


2. Approach

The flow chart for the approach is shown in Figure 1.

Figure 1. Workflow (M1, M2, M3 and M4 are the four different BERT-based models).

2.1. Text Preprocessing

Tweets are converted into lowercase. User mentions do not convey any information, so they are removed. Non-ASCII characters and URLs are also removed. As we use BERT in all models, an additional [CLS] token is inserted at the beginning of each tweet (Figure 2). We have not removed stopwords, in order to preserve fluency [13].

Figure 2. Flow chart of data preprocessing.
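As an illustration, a minimal Python sketch of this cleaning step is given below. The helper name and the exact regular expressions are our assumptions rather than the authors' code, and note that Hugging Face tokenizers normally insert the [CLS] token themselves, so prepending it manually simply mirrors the description above.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Clean a raw tweet roughly as described in Section 2.1 (illustrative sketch)."""
    text = text.lower()                                    # lowercase
    text = re.sub(r"@\w+", "", text)                       # remove user mentions
    text = re.sub(r"https?://\S+|www\.\S+", "", text)      # remove URLs
    text = text.encode("ascii", errors="ignore").decode()  # remove non-ASCII characters
    text = re.sub(r"\s+", " ", text).strip()               # collapse whitespace
    return "[CLS] " + text                                 # prepend BERT's classification token

print(preprocess_tweet("Evacuation info at https://t.co/xyz @NDRFHQ 🙏 Stay safe!"))
# -> "[CLS] evacuation info at stay safe!"
```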
2.2. Data Preparation

The combined dataset we have is imbalanced. A dataset is said to be imbalanced if at least one of the classes has significantly fewer annotated instances than the others, and the class imbalance problem is known to hinder the learning performance of classification algorithms [6]. We have therefore applied several techniques to improve classifier accuracy and compared the performance of all models on two datasets:

1. We split the imbalanced dataset (D1) into train, validation and test sets in such a way that tweets of each label are distributed in the same proportion across the splits (imbalanced dataset + equal distribution) [19].
2. We first converted the imbalanced dataset into a balanced dataset (D2) and then split it into train, validation and test sets with the same distribution of labels in each split (balanced dataset + equal distribution), as sketched below.
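A minimal sketch of the two splitting strategies, using pandas and scikit-learn, is shown below. The DataFrame and column names are assumptions, and the resampling strategy used to build D2 (undersampling large classes and oversampling small ones with replacement) is our guess; the paper does not state exactly how the 10,000 tweets per label were obtained.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# df is assumed to hold the compiled tweets with "text" and "label" columns.
def stratified_split(df: pd.DataFrame, seed: int = 42):
    """85/10/5 train/test/validation split with the same label mix in each part."""
    train, rest = train_test_split(df, test_size=0.15, stratify=df["label"], random_state=seed)
    test, val = train_test_split(rest, test_size=1/3, stratify=rest["label"], random_state=seed)
    return train, val, test   # 85% / 5% / 10%

def balance(df: pd.DataFrame, per_label: int = 10000, seed: int = 42):
    """Resample every label to the same size (assumed strategy, not stated in the paper)."""
    return (df.groupby("label", group_keys=False)
              .apply(lambda g: g.sample(n=per_label, replace=len(g) < per_label,
                                        random_state=seed)))

d1_train, d1_val, d1_test = stratified_split(df)            # imbalanced + equal distribution
d2_train, d2_val, d2_test = stratified_split(balance(df))   # balanced + equal distribution
```

The stratify argument of train_test_split keeps the label proportions identical in every split, which is what "equal distribution" refers to here.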

2.3. BERT-based models

BERT comes in two sizes, BERT base and BERT large. BERT base has 110 million parameters while BERT large has about 340 million parameters. The base model is used to compare the performance of different architectures, and the large model produces state-of-the-art results, as stated in the BERT paper [20]. In all our models we use the BERT base uncased model, which has 12 layers, a hidden size of 768 and 12 attention heads [20].

Default BERT. For sequence classification we use default BERT, whose last layer is a softmax layer (the softmax function is a squashing function). In this approach the pre-trained BERT model is carefully fine-tuned [13] (Figure 3a).

BERT with Non-Linear Layers. Three fully connected layers are stacked on top of the BERT model. The activation function in the first two layers is a leaky rectified linear unit (negative slope = 0.01), and the third layer performs softmax. In this approach too, the pre-trained BERT model is carefully fine-tuned [13] (Figure 3b).
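For illustration, a minimal PyTorch/Hugging Face sketch of a BERT+NL-style head is given below. The class name, the hidden size of the fully connected layers, and other details are our assumptions rather than the authors' implementation.

```python
import torch.nn as nn
from transformers import BertModel

class BertNL(nn.Module):
    """BERT base + three fully connected layers (illustrative re-implementation)."""
    def __init__(self, num_labels: int = 7, hidden: int = 768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.fc1 = nn.Linear(hidden, hidden)      # sizes of the FC layers are assumptions
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, num_labels)
        self.act = nn.LeakyReLU(negative_slope=0.01)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]         # [CLS] representation
        h = self.act(self.fc1(cls))
        h = self.act(self.fc2(h))
        return self.fc3(h)                        # logits; softmax is applied inside the loss
```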
BERT with Long Short-Term Memory. This is a feature-based approach. The model is built by stacking a bidirectional LSTM on top of the default BERT model; the final hidden state of BERT is the input to the bidirectional LSTM, which is followed by a fully connected layer that performs softmax [13] (Figure 3c).

BERT with Convolutional Neural Network. This is also a feature-based approach. The model is built by stacking a CNN on top of the default BERT model: two convolutional layers followed by a softmax layer. The first convolutional layer has 12 in-channels and 12 out-channels, and the second has 12 in-channels and 192 out-channels. The output of the second convolutional layer is fed to the softmax layer (Figure 3d).
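A comparable sketch of the feature-based BERT+LSTM model is shown below; freezing the BERT weights and the LSTM hidden size of 256 are assumptions on our part, not details taken from the paper.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertBiLSTM(nn.Module):
    """Feature-based BERT + bidirectional LSTM head (illustrative sketch)."""
    def __init__(self, num_labels: int = 7, hidden: int = 768, lstm_hidden: int = 256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        for p in self.bert.parameters():           # "feature-based": BERT weights kept frozen
            p.requires_grad = False
        self.lstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            feats = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(feats)             # final hidden states of both directions
        h = torch.cat([h_n[-2], h_n[-1]], dim=-1)  # concatenate forward and backward states
        return self.fc(h)                          # logits; softmax is applied inside the loss
```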
3. Experiment

3.1. Data

We have collected various small datasets from CrisisNLP and CrisisLexT26 and compiled them into a single large dataset. The dataset contains tweets posted during various types of disasters across various parts of the world. The tweets are distributed into seven labels, following the taxonomy given in the Guoqin Ma paper [13]: not related or not informative; other useful information; donations and volunteering; affected individuals; sympathy and emotional support; infrastructure and utilities damage; caution and advice.


Tweets are categorized on the basis of the type of information they carry. For example [6]:

not related or not informative: information and questions which are either not related to a disaster or are out of scope to categorize.
other useful information: information and questions related to a disaster.
donations and volunteering: information regarding the donation of food, clothes, medicines and other basic supplies, and people willing to volunteer to provide help.
affected individuals: information regarding injured or dead people and other victims of the disaster.
sympathy and emotional support: information regarding prayers and well wishes.
infrastructure and utilities damage: information related to damaged buildings, places, things and services.
caution and advice: information regarding warnings, tips and advice from concerned authorities and people.

Figure 3. BERT-based model diagrams, adapted from the Guoqin Ma paper [13]: (a) default BERT, (b) BERT+NL, (c) BERT+LSTM, (d) BERT+CNN2.

This dataset has a highly skewed distribution of labels. This imbalanced distribution lowers the accuracy of the classifier, so we applied various techniques to improve the performance of the models. We trained all the models on two datasets (D1 and D2); insights into these datasets are shown in Table 1 and Table 2 respectively.

Table 1. Insights of D1

Labels                                 Count
not related or not informative         25785
other useful information               18877
donations and volunteering             11315
affected individuals                   10587
sympathy and emotional support          6100
infrastructure and utilities damage     5468
caution and advice                      4301

Table 2. Insights of D2

Labels                                 Count
not related or not informative         10000
other useful information               10000
donations and volunteering             10000
affected individuals                   10000
sympathy and emotional support         10000
infrastructure and utilities damage    10000
caution and advice                     10000
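A minimal sketch of how such a compiled dataset and its per-label counts could be produced with pandas is shown below; the file layout and column names (data/*.csv, tweet_text, label) are assumptions and not the authors' actual file structure.

```python
import glob
import pandas as pd

# Hypothetical layout: one CSV per source collection, each with "tweet_text" and "label" columns.
frames = [pd.read_csv(path) for path in glob.glob("data/*.csv")]
df = pd.concat(frames, ignore_index=True).drop_duplicates(subset="tweet_text")

# Per-label counts, i.e. the kind of insight reported in Tables 1 and 2.
print(df["label"].value_counts())
```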


3.2. Evaluation Method

Multiple metrics are calculated so that the model is evaluated properly: accuracy, precision, recall, F1-score and the Matthews correlation coefficient. Accuracy, macro precision, macro recall, macro F1-score and the Matthews correlation coefficient are reported for every model, while precision, recall and F1-score are also determined for every class [13].
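These metrics can be computed with scikit-learn; the sketch below is illustrative and is not the authors' evaluation code.

```python
from sklearn.metrics import (accuracy_score, matthews_corrcoef,
                             precision_recall_fscore_support, classification_report)

def evaluate(y_true, y_pred):
    """Model-level metrics (accuracy, MCC, macro P/R/F1) plus per-class scores."""
    acc = accuracy_score(y_true, y_pred)
    mcc = matthews_corrcoef(y_true, y_pred)
    mp, mr, mf1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                     average="macro", zero_division=0)
    per_class = classification_report(y_true, y_pred, zero_division=0)  # per-label P/R/F1
    return {"acc": acc, "mcc": mcc, "macro_p": mp, "macro_r": mr, "macro_f1": mf1}, per_class
```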
3.3. Experimental Details

For both D1 and D2 the train, test and validation split percentages are the same: 85% train, 10% test and 5% validation. We shuffle the samples in the train set between epochs. The loss function is cross-entropy; since there are seven labels, we use its multiclass form. For a single observation $o$ the loss is

$-\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$

where $M$ is the number of classes, $y_{o,c}$ is a binary indicator (0 or 1) of whether class $c$ is the correct label for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ belongs to class $c$.

The maximum number of epochs is set to 100. Training uses the Adam optimizer with a batch size of 32. The learning rate for BERT parameters is 0.00002 and for non-BERT parameters is 0.001, and the learning rate decays by 50% when patience hits 5 [13].
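Putting these details together, a hedged PyTorch sketch of the training loop could look as follows. BertNL refers to the head sketched earlier, the ReduceLROnPlateau scheduler is our interpretation of the 50%-decay-on-patience rule, and train_loader / compute_validation_loss are hypothetical placeholders.

```python
import torch
import torch.nn as nn

model = BertNL(num_labels=7)            # any of the heads sketched above

# Separate learning rates for BERT and non-BERT parameters, as described in the text.
bert_params = [p for n, p in model.named_parameters() if n.startswith("bert.")]
head_params = [p for n, p in model.named_parameters() if not n.startswith("bert.")]
optimizer = torch.optim.Adam([
    {"params": bert_params, "lr": 2e-5},
    {"params": head_params, "lr": 1e-3},
])

criterion = nn.CrossEntropyLoss()       # multiclass cross-entropy over raw logits
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)   # 50% decay after 5 stalled epochs

for epoch in range(100):                              # max epoch = 100
    model.train()
    for input_ids, attention_mask, labels in train_loader:   # batch size 32, shuffled
        optimizer.zero_grad()
        loss = criterion(model(input_ids, attention_mask), labels)
        loss.backward()
        optimizer.step()
    scheduler.step(compute_validation_loss(model))    # hypothetical validation helper
```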

3.4. Results

Macro precision (D1): Default BERT > BERT+NL > BERT+CNN2 > BERT+LSTM
Macro precision (D2): BERT+LSTM > BERT+CNN2 > Default BERT > BERT+NL
Macro recall (D1): BERT+CNN2 > Default BERT > BERT+LSTM > BERT+NL
Macro recall (D2): BERT+CNN2 > BERT+LSTM > Default BERT > BERT+NL
Macro F1-score (D1): Default BERT > BERT+NL > BERT+CNN2 > BERT+LSTM
Macro F1-score (D2): BERT+LSTM > BERT+CNN2 > Default BERT > BERT+NL

The evaluation metrics for both datasets on all BERT-based models are shown in Table 3. For D1, default BERT performed best with an accuracy of 71%, whereas for D2, BERT+CNN2 and BERT+LSTM performed best with an accuracy of 72%. In general, the models perform better when trained and tested on the balanced dataset.

Table 3. Evaluation metrics for D1 and D2 on all BERT-based models. Acc = accuracy, MCC = Matthews correlation coefficient, MP = macro precision, MR = macro recall, M-F1 = macro F1.

Model           Acc.    MCC     MP      MR      M-F1
Dataset-1
Default BERT    0.71    0.65    67.14   74.14   69.57
BERT+NL         0.69    0.62    66.29   72.29   68.43
BERT+LSTM       0.69    0.63    65.57   73.43   67.43
BERT+CNN2       0.70    0.64    65.86   74.86   68.14
Dataset-2
Default BERT    0.71    0.66    71.29   71.00   70.86
BERT+NL         0.69    0.65    71.00   69.43   69.71
BERT+LSTM       0.72    0.67    72.14   72.14   71.86
BERT+CNN2       0.72    0.67    71.43   72.29   71.29

The heatmaps of the F1 scores for D1 and D2 are shown in Figure 5, and the confusion-matrix heatmaps for all BERT-based models on D1 and D2 are shown in Figure 6 and Figure 7 respectively. From the confusion matrices of the test data for all the models, misclassification across the labels can be observed. Some of the reasons for misclassification are ambiguity in the context of the tweet and the presence of special characters such as emoji.

Figure 5. F1 score for all the BERT-based models: (a) F1 score for D1, (b) F1 score for D2.
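Heatmaps similar to those in Figures 6 and 7 can be produced with scikit-learn and matplotlib; the sketch below assumes that y_true, y_pred and label_names (the seven-label taxonomy) are already available.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# y_true / y_pred are the test-set labels and model predictions for one model and dataset.
ConfusionMatrixDisplay.from_predictions(y_true, y_pred,
                                        display_labels=label_names,
                                        xticks_rotation=45, cmap="Blues")
plt.tight_layout()
plt.show()
```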


Figure 6. Confusion matrices on the test set of D1: (a) default BERT, (b) BERT+NL, (c) BERT+LSTM, (d) BERT+CNN2.


Figure 7. Confusion matrices on the test set of D2: (a) default BERT, (b) BERT+NL, (c) BERT+LSTM, (d) BERT+CNN2.


3.5. Conclusion

The information generated on social media can be utilised in the field of disaster management. Removing noise from the data can lead to better decisions. Data preprocessing is very significant in sequence classification, and the way the data is split into train, test and validation sets also affects the performance of the classifier. The balanced data proved to be better than the unbalanced data. The accuracy and other evaluation metrics should be up to the mark, because the decisions made in the field of disaster management impact lives.

References

[1] Clarkson, K., Srivastava, G., Meawad, F. and Dwivedi, A.D. (2019) Where's @Waldo?: Finding users on Twitter. In International Conference on Artificial Intelligence and Soft Computing (Springer): 338–349.
[2] Kaur, A. (2019) Analyzing twitter feeds to facilitate crises informatics and disaster response during mass emergencies.
[3] Mendhe, C.H., Henderson, N., Srivastava, G. and Mago, V. (2020) A scalable platform to collect, store, visualize, and analyze big data in real time. IEEE Transactions on Computational Social Systems.
[4] Imran, M., Elbassuoni, S., Castillo, C., Diaz, F. and Meier, P. (2013) Extracting information nuggets from disaster-related messages in social media. In ISCRAM.
[5] Olteanu, A., Vieweg, S. and Castillo, C. (2015) What to expect when the unexpected happens: Social media communications across crises. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing: 994–1009.
[6] Imran, M., Mitra, P. and Castillo, C. (2016) Twitter as a lifeline: Human-annotated twitter corpora for NLP of crisis-related messages. arXiv preprint arXiv:1605.05894.
[7] Stowe, K., Anderson, J., Palmer, M., Palen, L. and Anderson, K.M. (2018) Improving classification of twitter behavior during hurricane events. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media: 67–75.
[8] Walters, T.N. (2008) Ongoing crisis communication: Planning, managing, and responding, W.T. Coombs, Sage Publications (2007), 207 pp.
[9] Purohit, H., Castillo, C., Diaz, F., Sheth, A. and Meier, P. (2014) Emergency-relief coordination on social media: Automatically matching resource requests and offers. First Monday 19(1).
[10] Cameron, M.A., Power, R., Robinson, B. and Yin, J. (2012) Emergency situation awareness from twitter for crisis management. In Proceedings of the 21st International Conference on World Wide Web: 695–698.
[11] Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M.A., Maynard, D. and Aswani, N. (2013) TwitIE: An open-source information extraction pipeline for microblog text. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013: 83–90.
[12] Han, B., Cook, P. and Baldwin, T. (2013) Lexical normalization for social media text. ACM Transactions on Intelligent Systems and Technology (TIST) 4(1): 1–27.
[13] Ma, G. Tweets classification with BERT in the field of disaster management.
[14] Nair, M.R., Ramya, G. and Sivakumar, P.B. (2017) Usage and analysis of twitter during 2015 Chennai flood towards disaster management. Procedia Computer Science 115: 350–358.
[15] Starbird, K., Palen, L., Hughes, A.L. and Vieweg, S. (2010) Chatter on the red: what hazards threat reveals about the social life of microblogged information. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work: 241–250.
[16] Kongthon, A., Haruechaiyasak, C., Pailai, J. and Kongyoung, S. (2014) The role of social media during a natural disaster: a case study of the 2011 Thai flood. International Journal of Innovation and Technology Management 11(03): 1440012.
[17] Praznik, L., Srivastava, G., Mendhe, C. and Mago, V. (2019) Vertex-weighted measures for link prediction in hashtag graphs. In 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (IEEE): 1034–1041.
[18] Morris, J., Lifland, E., Yoo, J.Y., Grigsby, J., Jin, D. and Qi, Y. (2020) TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations: 119–126.
[19] Géron, A. (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems (O'Reilly Media).
[20] Devlin, J., Chang, M.W., Lee, K. and Toutanova, K. (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
