
Rumor has it: Identifying Misinformation in Microblogs

Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev, Qiaozhu Mei


University of Michigan
Ann Arbor, MI
{vahed,emirose,radev,qmei}@umich.edu

Abstract

A rumor is commonly defined as a statement whose truth value is unverifiable. Rumors may spread misinformation (false information) or disinformation (deliberately false information) on a network of people. Identifying rumors is crucial in online social media, where large amounts of information are easily spread across a large network by sources with unverified authority. In this paper, we address the problem of rumor detection in microblogs and explore the effectiveness of 3 categories of features for correctly identifying rumors: content-based, network-based, and microblog-specific memes. Moreover, we show how these features are also effective in identifying disinformers, users who endorse a rumor and further help it to spread. We perform our experiments on more than 10,000 manually annotated tweets collected from Twitter and show that our retrieval model achieves more than 0.95 in Mean Average Precision (MAP). Finally, we believe that our dataset is the first large-scale dataset on rumor detection. It can open new dimensions in analyzing online misinformation and other aspects of microblog conversations.

1 Introduction

A rumor is an unverified and instrumentally relevant statement of information spread among people (DiFonzo and Bordia, 2007). Social psychologists argue that rumors arise in contexts of ambiguity, when the meaning of a situation is not readily apparent, or of potential threat, when people feel an acute need for security. For instance, a rumor about 'office renovation in a company' is an example of an ambiguous context, and the rumor that 'underarm deodorants cause breast cancer' is an example of a context in which one's well-being is at risk (DiFonzo et al., 1994).

The rapid growth of online social media has made it possible for rumors to spread more quickly. Online social media enable unreliable sources to spread large amounts of unverified information among people (Herman and Chomsky, 2002). Therefore, it is crucial to design systems that automatically detect misinformation and disinformation (the former often seen as simply false information and the latter as deliberately false information).

Our definition of a rumor is based on social psychology, where a rumor is defined as a statement whose truth value is unverifiable or deliberately false. In-depth rumor analysis, such as determining the intent and impact behind the spread of a rumor, is a very challenging task and is not possible without first retrieving the complete set of social conversations (e.g., tweets) that are actually about the rumor. In our work, we take this first step and retrieve a complete set of tweets that discuss a specific rumor. In our approach, we address two basic problems. The first problem concerns retrieving online microblogs that are rumor-related. In the second problem, we try to identify tweets in which the rumor is endorsed (the posters show that they believe the rumor).

2 Related Work

We review related work in 3 main areas: analyzing rumors, mining microblogs, and sentiment analysis and subjectivity detection.

2.1 Rumor Identification and Analysis

Though understanding rumors has been the subject of research in psychology for some time (Allport and Lepkin, 1945; Allport and Postman, 1947; DiFonzo and Bordia, 2007), research has only recently begun to investigate how rumors are manifested and spread differently online.

Microblogging services like Twitter allow small pieces of information to spread quickly to large audiences, allowing rumors to be created and spread in new ways (Ratkiewicz et al., 2010).

Related research has used different methods to study the spread of memes and false information on the web. Leskovec et al. use the evolution of quotes reproduced online to identify memes and track their spread over time (Leskovec et al., 2009). Ratkiewicz et al. (2010) created the "Truthy" system, which identifies misleading political memes on Twitter using tweet features, including hashtags, links, and mentions. Other projects focus on highlighting disputed claims on the Internet using pattern matching techniques (Ennals et al., 2010). Though our project builds on previous work, it differs in its general focus on identifying rumors from a corpus of relevant phrases and in our attempts to further discriminate between phrases that confirm, refute, question, and simply talk about rumors of interest.

Mendoza et al. explore Twitter data to analyze the behavior of Twitter users during the emergency situation of the 2010 earthquake in Chile (Mendoza et al.). They analyze the re-tweet network topology and find that the patterns of propagation of rumors differ from those of news because rumors tend to be questioned more than news by the Twitter community.

2.2 Sentiment Analysis

The automated detection of rumors is similar to traditional NLP sentiment analysis tasks. Previous work has used machine learning techniques to identify positive and negative movie reviews (Pang et al., 2002). Hassan et al. use a supervised Markov model, part-of-speech, and dependency patterns to identify attitudinal polarities in threads posted to Usenet discussion groups (Hassan et al., 2010). Others have assigned sentiment scores to news stories and blog posts based on algorithmically generated lexicons of positive and negative words (Godbole et al., 2007). Pang and Lee provide a detailed overview of current techniques and practices in sentiment analysis and opinion mining (Pang and Lee, 2008; Pang and Lee, 2004).

Though rumor classification is closely related to opinion mining and sentiment analysis, it presents a different class of problem because we are concerned not just with the opinion of the person posting a tweet, but with whether the statements they post appear controversial. The automatic identification of rumors from a corpus is most closely related to the identification of memes in (Leskovec et al., 2009), but presents new challenges since we seek to highlight a certain type of recurring phrases. Our work presents one of the first attempts at automatic rumor analysis.

2.3 Mining Twitter Data

With its nearly constant stream of new posts and its public API, Twitter can be a useful source of data for exploring a number of problems related to natural language processing and information diffusion (Bifet and Frank, 2010). Pak and Paroubek demonstrated experimentally that despite frequent occurrences of irregular speech patterns in tweets, Twitter can provide a useful corpus for sentiment analysis (Pak and Paroubek, 2010). The diversity of Twitter users makes this corpus especially valuable. Ratkiewicz et al. also use Twitter to detect and track misleading political memes (Ratkiewicz et al., 2010).

Along with many advantages, using Twitter as a corpus does present unusual challenges. Because posts are limited to 140 characters, tweets often contain information in an unusually compressed form and, as a result, the grammar used may be unconventional. Instances of sarcasm and humor are also prevalent (Bifet and Frank, 2010). The procedures we used for the collection and analysis of tweets are similar to those described in previous work. However, our goal of developing computational methods to identify rumors being transmitted through tweets differentiates our project.

3 Problem Definition

Assume we have a set of tweets about the same topic, and that this topic has some controversial aspects. Our objective in this work is two-fold: (1) extract tweets that are about the controversial aspects of the story and spread misinformation (Rumor retrieval); (2) identify users who believe the misinformation versus users who refute or question the rumor (Belief classification).

Name        Rumor                                   Regular Expression Query                        Status         #tweets
obama       Is Barack Obama muslim?                 Obama & (muslim|islam)                          false          4975
airfrance   Air France mid-air crash photos?        (air.france|air france) & (photo|pic|pix)       false          505
cellphone   Cell phone numbers going public?        (cell|cellphone|cell phone)                     mostly false   215
michelle    Michelle Obama hired too many staff?    staff & (michelle obama|first lady|1st lady)    partly true    299
palin       Sarah Palin getting divorced?           palin & divorce                                 false          4423

Table 1: List of rumor examples and their corresponding queries used to collect data from Twitter.
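The following minimal Python sketch (ours, not part of the original system) illustrates how conjunctive queries of the kind listed in Table 1 can be matched against tweet text. The query fragments are paraphrased from Table 1, and the example tweets are drawn from examples discussed later in the paper.

```python
import re

# Conjunctive queries paraphrased from Table 1: every component pattern must match.
QUERIES = {
    "obama": [r"obama", r"muslim|islam"],
    "airfrance": [r"air.?france", r"photo|pic|pix"],
    "palin": [r"palin", r"divorce"],
}

def matching_rumors(tweet_text):
    """Names of the rumor queries whose components all occur in the tweet."""
    text = tweet_text.lower()
    return [name for name, parts in QUERIES.items()
            if all(re.search(p, text) for p in parts)]

# The first tweet is a false positive for the 'obama' query; the second matches 'palin'.
print(matching_rumors("Obama meets muslim leaders"))
print(matching_rumors("Sarah and Todd Palin to divorce, according to local Alaska paper."))
```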

The following two tweets are two instances of tweets written about President Obama and the Muslim world. The first tweet below is about President Obama and the Muslim world, while the second spreads the misinformation that President Obama is Muslim.

(non-rumor) "As Obama bows to Muslim leaders Americans are less safe not only at home but also overseas. Note: The terror alert in Europe..."

(rumor) "RT @johnnyA99 Ann Coulter Tells Larry King Why People Think Obama Is A Muslim http://bit.ly/9rs6pa #Hussein via @NewsBusters #tcot .."

The goal of the retrieval task is to discriminate between such tweets. In the second task, we use the tweets that are flagged as rumorous and identify users who endorse (believe) the rumor versus users who deny or question it. The following three tweets are about the same story. The first user is a believer; the second and third are not.

(confirm) "RT @moronwatch: Obama's a Muslim. Or if he's not, he sure looks like one #whyimvotingrepublican."

(deny) "Barack Obama is a Christian man who had a Christian wedding with 2 kids baptised in Jesus name. Tea Party clowns call that muslim #p2 #gop"

(doubtful) "President Barack Obama's Religion: Christian, Muslim, or Agnostic? - The News of Today (Google): Share With Friend... http://bit.ly/bk42ZQ"

The first task is substantially more challenging than a standard IR task because of the requirement of both high precision (every result should actually discuss the rumor) and high recall (the set should be complete). To achieve this, we submit a handcrafted regexp (extracted from About.com) to Twitter and retrieve a large primitive set of tweets that is supposed to have high recall. This set, however, contains many false positives, tweets that match the regexp but are not about the rumor (e.g., "Obama meets muslim leaders"). Moreover, a rumor is usually stated in various forms (e.g., "Barack HUSSEIN Obama" versus "Obama is muslim"). Our goal is then to design a learning framework that filters out all such false positives and retrieves the various instances of the same rumor.

Although our second task, belief classification, can be viewed as an opinion mining task, it is substantially different from opinion mining in nature. The difference from a standard opinion mining task is that here we are looking for attitudes about a subtle statement (e.g., "Palin is getting divorce") instead of the overall sentiment of the text or the opinion towards an explicit object or person (e.g., "Sarah Palin").

4 Data

As of September 2010, Twitter reports that its users publish nearly 95 million tweets per day (http://twitter.com/about). This makes Twitter an excellent case for analyzing misinformation in social media.

Our goal in this work was to collect and annotate a large dataset that includes all the tweets written about a rumor in a certain period of time. To collect such a complete and self-contained dataset about a rumor, we used the Twitter search API and retrieved all the tweets that matched a given regular expression. This API is the only API that returns results from the entire public Twitter stream and not a small randomly selected sample. To overcome the rate limit enforced by Twitter, we collected matching tweets once per hour and removed any duplicates.
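As a rough illustration of this collection loop, here is a minimal sketch; `search_matching_tweets` is a hypothetical stand-in for a Twitter search API request and is not part of the paper.

```python
import time

def collect_rumor_tweets(search_matching_tweets, hours=24):
    """Collect tweets matching a rumor query once per hour, dropping duplicates by tweet id.

    search_matching_tweets is a hypothetical callable standing in for a Twitter search
    API request; it is assumed to return (tweet_id, text, user) tuples.
    """
    seen_ids, collected = set(), []
    for _ in range(hours):
        for tweet_id, text, user in search_matching_tweets():
            if tweet_id not in seen_ids:      # de-duplicate across hourly batches
                seen_ids.add(tweet_id)
                collected.append((tweet_id, text, user))
        time.sleep(3600)                      # one query batch per hour to respect the rate limit
    return collected
```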

To use the search API, we carefully designed regular expression queries to be broad enough to match all the tweets that are about a rumor. Each query represents a popular rumor that is listed as "false" or only "partly true" on About.com's Urban Legends reference site (http://urbanlegends.about.com) between 2009 and 2010. Table 1 lists the rumor examples that we used to collect our dataset, along with their corresponding regular expression queries and the number of tweets collected.

4.1 Annotation

We asked two annotators to go over all the tweets in the dataset and mark each tweet with a "1" if it is about any of the rumors from Table 1, and with a "0" otherwise. This annotation scheme will be used in our first task to detect false positives, tweets that match the broad regular expressions and are retrieved but are not about the rumor. For instance, both of the following tweets match the regular expression for the palin example, but only the second one is rumorous.

(0) "McCain Divorces Palin over her 'untruths and out right lies' in the book written for her. McCain's team says Palin is a petty liar and phony"

(1) "Sarah and Todd Palin to divorce, according to local Alaska paper. http://ow.ly/iNxF"

We also asked the annotators to mark each previously annotated rumorous tweet with "11" if the tweet poster endorses the rumor and with "12" if the user refutes the rumor, questions its credibility, or is neutral.

(12) "Sarah Palin Divorce Rumor Debunked on Facebook http://ff.im/62Evd"

(11) "Todd and Sarah Palin to divorce http://bit.ly/15StNc"

Our annotation of more than 10,400 tweets shows that 35% of all the instances that matched the regular expressions are false positives, tweets that are not rumor-related but match the initial queries. Moreover, among tweets that are about particular rumors, nearly 43% show that the poster believes the rumor, demonstrating the importance of identifying misinformation and those who are misinformed. Table 2 shows the basic statistics extracted from the annotations for each story.

Rumor       non-rumor (0)   believe (11)   deny/doubtful/neutral (12)   total
obama       3,036           926            1,013                        4,975
airfrance   306             71             128                          505
cellphone   132             74             9                            215
michelle    83              191            25                           299
palin       86              1,709          2,628                        4,423
total       3,643           2,971          3,803                        10,417

Table 2: Number of instances in each class from the annotated data.

4.2 Inter-Judge Agreement

To calculate the annotation accuracy, we annotated 500 instances twice. These annotations were compared with each other, and the Kappa coefficient (κ) was calculated. The κ statistic is formulated as

\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}

where Pr(a) is the relative observed agreement among raters, and Pr(e) is the probability that the annotators agree by chance if each annotator is randomly assigning categories (Krippendorff, 1980; Carletta, 1996). Table 3 shows that the annotators reach a high agreement in both extracting rumors (κ = 0.95) and identifying believers (κ = 0.85).

task                     κ
rumor retrieval          0.954
belief classification    0.853

Table 3: Inter-judge agreement in the two annotation tasks in terms of the κ statistic.
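A minimal sketch of the κ computation used in Section 4.2, assuming two parallel lists of labels; this is our illustration, not the authors' annotation tooling.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(labels_a)
    pr_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n          # observed agreement
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # chance agreement if each annotator assigned labels at random with their own marginals
    pr_e = sum((counts_a[c] / n) * (counts_b[c] / n)
               for c in set(labels_a) | set(labels_b))
    return (pr_a - pr_e) / (1 - pr_e)

# Toy example with the annotation scheme above (1 = rumor-related, 0 = not).
print(cohens_kappa([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 0]))
```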

5 Approach

In this section, we describe a general framework which, given a tweet, predicts (1) whether it is a rumor-related statement and, if so, (2) whether the user believes the rumor or not. We describe 3 sets of features and explain why they are intuitive to use for the identification of rumors.

We process the tweets as they appear in the user timeline and do not perform any pre-processing. In particular, we think that capitalization might be an important property, so we do not lower-case the tweet texts either.

Our approach is based on building different Bayes classifiers as high-level features and then learning a linear function of these classifiers for retrieval in the first task and for classification in the second. Each Bayes classifier, which corresponds to a feature f_i, calculates the likelihood ratio for a given tweet t, as shown in Equation 1.

\frac{P(\theta_i^+ \mid t)}{P(\theta_i^- \mid t)} = \frac{P(\theta_i^+)}{P(\theta_i^-)} \cdot \frac{P(t \mid \theta_i^+)}{P(t \mid \theta_i^-)}    (1)

Here θi+ and θi− are two probabilistic models built based on feature f_i using a set of positive (+) and negative (−) training data. The likelihood ratio expresses how many times more likely the tweet t is under the positive model than under the negative model with respect to f_i.

For computational reasons, and to avoid dealing with very small numbers, we use the log of the likelihood ratio to build each classifier:

LL_i = \log\frac{P(\theta_i^+ \mid t)}{P(\theta_i^- \mid t)} = \log\frac{P(\theta_i^+)}{P(\theta_i^-)} + \log\frac{P(t \mid \theta_i^+)}{P(t \mid \theta_i^-)}    (2)

The first term, P(θi+)/P(θi−), can be easily calculated using the maximum likelihood estimates of the probabilities (i.e., the estimate of each probability is the corresponding relative frequency). The second term is calculated using the various features that we explain below.

5.1 Content-based Features

The first set of features is extracted from the text of the tweets. We propose 4 content-based features. We follow (Hassan et al., 2010) and represent the tweet with 2 different patterns:

• Lexical patterns: All the words and segments in the tweet are represented as they appear and are tokenized using the space character.

• Part-of-speech patterns: All words are replaced with their part-of-speech tags. To find the part of speech of a hashtag we treat it as a word (since hashtags can have semantic roles in the sentence) by omitting the hash sign, and we precede the tag with the label TAG/. We also introduce a new tag, URL, for URLs that appear in a tweet.

From each tweet we extract 4 (2 × 2) features, corresponding to unigrams and bigrams of each representation. Each feature is the log-likelihood ratio calculated using Equation 2. More formally, we represent each tweet t, of length n, lexically as (w_1 w_2 · · · w_n) and with part-of-speech tags as (p_1 p_2 · · · p_n). After building the positive and negative models (θ+, θ−) for each feature using the training data, we calculate the likelihood ratio as defined in Equation 2, where

\log\frac{P(t \mid \theta^+)}{P(t \mid \theta^-)} = \sum_{j=1}^{n} \log\frac{P(w_j \mid \theta^+)}{P(w_j \mid \theta^-)}    (3)

for unigram lexical features (TXT1) and

\log\frac{P(t \mid \theta^+)}{P(t \mid \theta^-)} = \sum_{j=1}^{n-1} \log\frac{P(w_j w_{j+1} \mid \theta^+)}{P(w_j w_{j+1} \mid \theta^-)}    (4)

for bigram-based lexical features (TXT2). Similarly, we define the unigram- and bigram-based part-of-speech features (POS1 and POS2) as the log-likelihood ratio with respect to the positive and negative part-of-speech models.
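A minimal sketch of the content-based log-likelihood features of Equations 3 and 4, assuming whitespace tokenization and add-one smoothing; the paper builds its language models with the CMU toolkit, so this is only an approximation of the idea.

```python
import math
from collections import Counter

def ngram_counts(tweets, n):
    """Count n-grams (tuples of tokens) over whitespace-tokenized tweets."""
    counts = Counter()
    for text in tweets:
        tokens = text.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts

def content_ll_ratio(tweet, pos_counts, neg_counts, n, vocab_size):
    """log P(t|theta+) - log P(t|theta-) with add-one smoothing (Equations 3 and 4)."""
    tokens = tweet.split()
    pos_total, neg_total = sum(pos_counts.values()), sum(neg_counts.values())
    score = 0.0
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        p_pos = (pos_counts[gram] + 1) / (pos_total + vocab_size)
        p_neg = (neg_counts[gram] + 1) / (neg_total + vocab_size)
        score += math.log(p_pos / p_neg)
    return score

# Hypothetical training tweets; in the paper the models come from annotated rumor data.
pos_tweets = ["obama is a muslim", "rt obama is a muslim"]
neg_tweets = ["obama meets muslim leaders today"]
pos1, neg1 = ngram_counts(pos_tweets, 1), ngram_counts(neg_tweets, 1)
vocab = len(set(pos1) | set(neg1))
print(content_ll_ratio("obama is a muslim", pos1, neg1, 1, vocab))  # TXT1; use n=2 for TXT2
```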

5.2 Network-based Features

The features that we have proposed so far are all based on the content of individual tweets. In the second set of features we focus on user behavior on Twitter. We observe 4 types of network-based properties and build 2 features that capture them.

Twitter enables users to re-tweet messages from other people. This interaction is usually easy to detect because re-tweeted messages generally start with the specific pattern 'RT @user'. We use this property to make inferences about the re-tweeted message.

Suppose a user u_i re-tweets a message t from the user u_j (u_i: "RT @u_j t"). Intuitively, t is more likely to be a rumor if (1) u_j has a history of posting or re-tweeting rumors, or (2) u_i has posted or re-tweeted rumors in the past.

Given a set of training instances, we build a positive (θ+) and a negative (θ−) user model. The first model is a probability distribution over all users that have posted a positive instance or have been re-tweeted in a positive instance. Similarly, the second model is a probability distribution over users that have posted (or been re-tweeted in) a negative instance. After building the models, for a given tweet we calculate two log-likelihood ratios as two network-based features.

The first feature is the log-likelihood ratio that u_i is under the positive user model (USR1), and the second feature is the log-likelihood ratio that the tweet is re-tweeted from a user (u_j) who is under the positive rather than the negative user model (USR2).

The distinction between the posting user and the re-tweeted user is important, since sometimes users modify the re-tweeted message in a way that changes its meaning and intent. In the following example, the original user is quoting President Obama. The second user is re-tweeting the first user, but has added more content to the tweet and made it sound rumorous.

original message (non-rumor) "Obama says he's doing 'Christ's work'."

re-tweeted (rumor) "Obama says he's doing 'Christ's work.' Oh my God, CHRIST IS A MUSLIM."

5.3 Twitter Specific Memes

Our final set of features is extracted from memes that are specific to Twitter: hashtags and URLs. Previous work has shown the usefulness of these memes (Ratkiewicz et al., 2010).

5.3.1 Hashtags

One emergent phenomenon in the Twitter ecosystem is the use of hashtags: words or phrases prefixed with a hash symbol (#). These hashtags are created by users and are widely used for a few days, then disappear when the topic is outdated (Huang et al., 2010).

In our approach, we investigate whether the hashtags used in rumor-related tweets are different from those in other tweets. Moreover, we examine whether people who believe and spread rumors use hashtags that are different from those seen in tweets that deny or question a rumor.

Given a set of training tweets of positive and negative examples, we build two statistical models (θ+, θ−), each showing the usage probability distribution of the various hashtags. For a given tweet t with a set of m hashtags (#h_1 · · · #h_m), we calculate the log-likelihood ratio using Equation 2, where

\log\frac{P(t \mid \theta^+)}{P(t \mid \theta^-)} = \sum_{j=1}^{m} \log\frac{P(\#h_j \mid \theta^+)}{P(\#h_j \mid \theta^-)}    (5)

5.3.2 URLs

Previous work has discussed the role of URLs in information diffusion on Twitter (Honeycutt and Herring, 2009). Twitter users share URLs in their tweets to refer to external sources or to overcome the length limit enforced by Twitter. Intuitively, if a tweet is a positive instance, then it is likely to be similar to the content of the URLs shared by other positive tweets. By the same reasoning, if a tweet is a negative instance, then it should be more similar to the web pages shared by other negative instances.

Given a set of training tweets, we fetch all the URLs in these tweets and build θ+ and θ− once for unigrams and once for bigrams. These models are built solely on the content of the URLs and ignore the tweet content. Similar to the previous features, we calculate the log-likelihood ratio of the content of each tweet with respect to θ+ and θ− for unigrams (URL1) and bigrams (URL2).

Table 4 summarizes the set of features used in our proposed framework, where each feature is a log-likelihood ratio calculated against positive (+) and negative (−) training models. To build these language models, we use the CMU Language Modeling toolkit (Clarkson and Rosenfeld, 1997).

          Feature   LL-ratio           model
Content   TXT1      content unigram    content unigram
          TXT2      content bigram     content unigram
          POS1      content pos        content pos unigram
          POS2      content pos        content pos bigram
Twitter   URL1      content unigram    target URL unigram
          URL2      content bigram     target URL bigram
          TAG       hashtag            hashtag
Network   USR1      tweeting user      all users in the data
          USR2      re-tweeted user    all users in the data

Table 4: List of features used in our optimization framework. Each feature is a log-likelihood ratio calculated against a positive (+) and a negative (−) training model.
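A minimal sketch of extracting the raw signals behind the network-based and Twitter-specific features (the re-tweeted user for USR2, hashtags for TAG, and URLs for URL1/URL2). The regular expressions are ours and only approximate Twitter's conventions.

```python
import re

RT_PATTERN = re.compile(r"\bRT\s+@(\w+)")     # 'RT @user ...' marks a re-tweet
HASHTAG_PATTERN = re.compile(r"#(\w+)")
URL_PATTERN = re.compile(r"https?://\S+")

def tweet_signals(text):
    """Return (re-tweeted users, hashtags, URLs) found in a tweet's text."""
    return (RT_PATTERN.findall(text),
            HASHTAG_PATTERN.findall(text),
            URL_PATTERN.findall(text))

# Example tweet from Section 3 of the paper.
rts, tags, urls = tweet_signals(
    "RT @johnnyA99 Ann Coulter Tells Larry King Why People Think Obama Is A Muslim "
    "http://bit.ly/9rs6pa #Hussein via @NewsBusters #tcot")
print(rts)   # ['johnnyA99']
print(tags)  # ['Hussein', 'tcot']
print(urls)  # ['http://bit.ly/9rs6pa']
```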

5.4 Optimization

We build an L1-regularized log-linear model (Andrew and Gao, 2007) on the various features discussed above to predict each tweet. Suppose a procedure generates a set of candidates for an input x. Also, suppose Φ : X × Y → R^D is a function that maps each (x, y) to a vector of feature values. Here, the feature vector is the vector of coefficients corresponding to the different network, content, and Twitter-based properties, and the parameter vector θ ∈ R^D (D ≤ 9 in our experiments) assigns a real-valued weight to each feature. The estimator chooses θ to minimize the sum of least squares and a regularization term R:

\hat{\theta} = \arg\min_{\theta} \Big\{ \tfrac{1}{2} \sum_i \| \langle \theta, x_i \rangle - y_i \|_2^2 + R(\theta) \Big\}    (6)

where the regularizer R(θ) is the weighted L1 norm of the parameters:

R(\theta) = \alpha \sum_j |\theta_j|    (7)

Here, α is a parameter that controls the amount of regularization (set to 0.1 in our experiments).

Gao et al. (2007) argue that optimizing an L1-regularized objective function is challenging since its gradient is discontinuous whenever some parameters equal zero. In this work, we use the orthant-wise limited-memory quasi-Newton algorithm (OWL-QN), a modification of L-BFGS that allows it to effectively handle the discontinuity of the gradient (Andrew and Gao, 2007). OWL-QN is based on the fact that, when restricted to a single orthant, the L1 regularizer is differentiable and is in fact a linear function of θ. Thus, as long as each coordinate of any two consecutive search points does not pass through zero, R(θ) does not contribute at all to the curvature of the function on the segment joining them. Therefore, we can use L-BFGS to approximate the Hessian of L(θ) alone and use it to build an approximation to the full regularized objective that is valid on a given orthant. This algorithm works quite well in practice and typically reaches convergence in even fewer iterations than standard L-BFGS (Gao et al., 2007).

6 Experiments

We design 2 sets of experiments to evaluate our approach. In the first experiment, we assess the effectiveness of the proposed method when employed in an Information Retrieval (IR) framework for rumor retrieval; in the second experiment, we employ the various features to detect users' beliefs in rumors.

6.1 Rumor Retrieval

In this experiment, we view the different stories as queries and build a relevance set for each query. Each relevance set is an annotation of the entire 10,417 tweets, where a tweet is marked as relevant if it matches the regular expression query and is marked as a rumor-related tweet by the annotators. For instance, according to Table 2 the cellphone dataset has only 83 relevant documents out of the entire 10,417 documents.

For each query we use 5-fold cross-validation and predict the relevance of tweets as a function of their features. We use these predictions to rank all the tweets with respect to the query. To evaluate the performance of our ranking model for a single query Q with the set of relevant documents {d_1, · · · , d_m}, we calculate Average Precision as

AP(Q) = \frac{1}{m} \sum_{k=1}^{m} \mathrm{Precision}(R_k)    (8)

where R_k is the set of ranked retrieval results from the top result down to the k-th relevant document, d_k (Manning et al., 2008).

6.1.1 Baselines

We compare our proposed ranking model with a number of other retrieval models. The first two simple baselines, which indicate a difficulty lower bound for the problem, are the Random and Uniform methods. In the Random baseline, documents are ranked based on a random number assigned to them. In the Uniform model, we use 5-fold cross-validation, and in each fold the label of the test documents is determined by the majority vote of the training set. The main baseline that we use in this work is the regular expression that was submitted to Twitter to collect the data (regexp). Using the same regular expression to mark the relevance of the documents yields a recall of 1.00 (since it retrieves all the relevant documents), but it also retrieves false positives, tweets that match the regular expression but are not rumor-related. We would like to investigate whether using training data will help us decrease the rate of false positives in retrieval.

Finally, using the Lemur Toolkit software (http://www.lemurproject.org/), we employ a KL-divergence retrieval model with Dirichlet smoothing (KL). In this model, documents are ranked according to the negation of the divergence between the query and document language models. More formally, given the query language model θ_Q and the document language model θ_D, the documents are ranked by −D(θ_Q || θ_D), where D is the KL-divergence between the two models:

D(\theta_Q \,\|\, \theta_D) = \sum_{w} p(w \mid \theta_Q) \log\frac{p(w \mid \theta_Q)}{p(w \mid \theta_D)}    (9)

To estimate p(w|θ_D), we use Bayesian smoothing with Dirichlet priors (Berger, 1985):

p_s(w \mid \theta_D) = \frac{C(w, D) + \mu \, p(w \mid \theta_S)}{\mu + \sum_{w} C(w, D)}    (10)

where µ is a parameter, C is the count function, and θ_S is the collection language model. Higher values of µ put more emphasis on the collection model. Here, we try two variants of the model, one using the default parameter value in Lemur (µ = 2000) and one in which µ is tuned on the data (µ = 10). Using the test data to tune the parameter value µ helps us find an upper-bound estimate of the effectiveness of this method.
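A minimal sketch of the Dirichlet-smoothed KL scoring of Equations 9 and 10 (ours; the experiments use the Lemur Toolkit implementation). The add-one guard on the collection model is a simplification to avoid zero probabilities.

```python
import math
from collections import Counter

def dirichlet_kl_score(query_tokens, doc_tokens, collection_counts, mu=2000):
    """Ranking score -D(theta_Q || theta_D) with Dirichlet smoothing (Equations 9 and 10)."""
    q_counts, q_len = Counter(query_tokens), len(query_tokens)
    d_counts, d_len = Counter(doc_tokens), len(doc_tokens)
    coll_total = sum(collection_counts.values())
    score = 0.0
    for w, qc in q_counts.items():
        p_q = qc / q_len                                     # maximum-likelihood query model
        # add-one guard on the collection model so unseen words keep p_d > 0 (a simplification)
        p_coll = (collection_counts[w] + 1) / (coll_total + len(collection_counts) + 1)
        p_d = (d_counts[w] + mu * p_coll) / (d_len + mu)     # Dirichlet-smoothed document model
        score -= p_q * math.log(p_q / p_d)
    return score

# Toy 'documents' (tokenized tweets); a higher score means a better match to the query.
docs = [["palin", "divorce", "rumor"], ["obama", "muslim", "rumor"], ["air", "france", "photos"]]
collection = Counter(w for d in docs for w in d)
ranked = sorted(range(len(docs)),
                key=lambda i: dirichlet_kl_score(["palin", "divorce"], docs[i], collection, mu=10),
                reverse=True)
print(ranked)
```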

Table 5 shows the Mean Average Precision (MAP) and Fβ=1 of each method in the rumor retrieval task. The table shows that a method that employs training data to re-rank documents with respect to rumors makes significant improvements over the baselines and outperforms other strong retrieval systems.

6.1.2 Feature Analysis

To investigate the effectiveness of individual features in retrieving rumors, we perform 5-fold cross-validation for each query, using a different feature set each time. Figure 1 shows the average precision and recall of our proposed optimization system when the content-based (TXT1+TXT2+POS1+POS2), network-based (USR1+USR2), and Twitter-specific (TAG+URL1+URL2) features are employed individually.

[Figure 1: Average precision and recall of the proposed method employing each set of features: content-based, network-based, and twitter specific.]

Figure 1 shows that the features calculated using the content language models are very effective in achieving high precision and recall. Twitter-specific features, especially hashtags, can result in high precision but lead to a low recall value because many tweets do not share hashtags or are not written based on the contents of external URLs.

Finally, we find that user history can be a good indicator of rumors. However, we believe that this feature could be more helpful with a complete user set and a more comprehensive history of their activities.

6.1.3 Domain Training Data

As our last experiment with rumor retrieval, we investigate how much new labeled data from an emergent rumor is required to effectively retrieve instances of that particular rumor. This experiment helps us understand how our proposed framework could be generalized to other stories.

To do this experiment, we use the obama story, which is a large dataset with a significant number of false positive instances. We extract 400 randomly selected tweets from this dataset and keep them for testing. We also build an initial training dataset from the other 4 rumors and label those tweets as not relevant. We then assess the performance of the retrieval model as we gradually add the rest of the obama tweets. Figure 2 shows both Average Precision and labeling accuracy versus the amount of labeled data used from the obama dataset. The plot shows that both measures exhibit fast growth and reach 80% when the number of labeled instances reaches 2000.
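Putting the retrieval pipeline of Sections 5.4 and 6.1 together, the following end-to-end sketch uses scikit-learn's Lasso (coordinate descent) as a stand-in for the OWL-QN optimizer of Equations 6 and 7, and computes the Average Precision of Equation 8 on toy feature vectors; it is ours, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def average_precision(relevance_ranked):
    """Average Precision over a ranked list of 0/1 relevance labels (Equation 8)."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevance_ranked, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / max(hits, 1)

# Toy data: rows are tweets, columns are the 9 log-likelihood-ratio features of Table 4.
rng = np.random.RandomState(0)
X_train, y_train = rng.randn(200, 9), rng.randint(0, 2, 200)
X_test, y_test = rng.randn(50, 9), rng.randint(0, 2, 50)

# L1-regularized least squares (Equations 6 and 7); alpha corresponds to the weight in Equation 7.
model = Lasso(alpha=0.1).fit(X_train, y_train)
scores = model.predict(X_test)
ranking = np.argsort(-scores)                 # rank test tweets by predicted relevance
print(average_precision(y_test[ranking].tolist()))
```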

Method MAP 95% C.I. Fβ=1 95% C.I.
Random 0.129 [-0.065, 0.323] 0.164 [-0.051, 0.379]
Uniform 0.129 [-0.066, 0.324] 0.198 [-0.080, 0.476]
regexp 0.587 [0.305, 0.869] 0.702 [0.479, 0.925]
KL (µ = 2000) 0.678 [0.458, 0.898] 0.538 [0.248, 0.828]
KL (µ = 10) 0.803 [0.641, 0.965] 0.681 [0.614, 0.748]
LL (all 9 features) 0.965 [0.936, 0.994] 0.897 [0.828, 0.966]

Table 5: Mean Average Precision (MAP) and Fβ=1 of each method in the rumor retrieval task. (C.I.: Confidence
Interval)

Method Accuracy Precision Recall Fβ=1 Win/Loss Ratio


random 0.501 0.441 0.513 0.474 1.004
uniform 0.439 0.439 1.000 0.610 0.781
TXT 0.934 0.925 0.924 0.924 14.087
POS 0.742 0.706 0.706 0.706 2.873
content (TXT+POS) 0.941 0.934 0.930 0.932 15.892
network (USR) 0.848 0.873 0.765 0.815 5.583
TAG 0.589 0.734 0.099 0.175 1.434
URL 0.664 0.630 0.570 0.598 1.978
twitter (TAG+URL) 0.683 0.658 0.579 0.616 2.155
all 0.935 0.944 0.906 0.925 14.395

Table 6: Accuracy, precision, recall, Fβ=1 , and win/loss ratio of belief classification using different features.

[Figure 2: Average Precision and Accuracy learning curve for the proposed method employing all 9 features.]

6.2 Belief Classification

In the previous experiments we showed that maximizing a linear function of log-likelihood ratios is an effective method for retrieving rumors. Here, we investigate whether this method, and in particular the proposed features, are useful in detecting users' beliefs in a rumor that they post about. Unlike retrieval, detecting whether a user endorses a rumor or refutes it may be possible using similar methods regardless of the rumor. Intuitively, linguistic features such as negation (e.g., "obama is not a muslim"), capitalization (e.g., "barack HUSSEIN obama ..."), user history (e.g., liberal tweeter vs. conservative tweeter), hashtags (e.g., #tcot vs. #tdot), and URLs (e.g., links to fake airfrance crash photos) should help to identify endorsements.

We perform this experiment by making a pool of all the tweets that were marked as "rumorous" in the annotation task. Table 2 shows that there are 6,774 such tweets, of which 2,971 show belief and 3,803 show that the user is doubtful, denies, or questions the rumor.

Using various feature settings, we perform 5-fold cross-validation on these 6,774 rumorous tweets. Table 6 shows the results of this experiment in terms of F-score, classification accuracy, and win/loss ratio, the ratio of correct to incorrect classifications.
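A minimal sketch of the metrics reported in Table 6, assuming binary gold and predicted labels (1 = believes the rumor, 0 = denies or questions it); the win/loss ratio is the count of correct classifications divided by the count of incorrect ones.

```python
def belief_metrics(gold, predicted):
    """Accuracy, precision, recall, F1, and win/loss ratio for binary belief labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    tn = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 0)
    accuracy = (tp + tn) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    win_loss = (tp + tn) / max(fp + fn, 1)      # correct versus incorrect classifications
    return accuracy, precision, recall, f1, win_loss

# Toy example with hypothetical gold and predicted belief labels.
print(belief_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```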

7 Conclusion

In this paper we tackle the largely unaddressed problem of identifying misinformation and disinformers in microblogs. Our contributions are two-fold: (1) we propose a general framework that employs statistical models and maximizes a linear function of log-likelihood ratios to retrieve rumorous tweets that match a more general query; (2) we show the effectiveness of the proposed features in capturing tweets that show user endorsement. This will help us identify disinformers, users who spread false information in online social media.

Our work has resulted in a manually annotated dataset of 10,000 tweets from 5 different controversial topics. To the knowledge of the authors this is the first large-scale publicly available rumor dataset, and it can open many new dimensions in studying the effects of misinformation and other aspects of information diffusion in online social media.

In this paper we effectively retrieve instances of rumors that have already been identified and evaluated by an external source such as About.com's Urban Legends reference. Identifying new emergent rumors directly from the Twitter data is a more challenging task. As future work, we aim to build a system that employs our findings in this paper and the emergent patterns in the re-tweet network topology to identify whether a new trending topic is a rumor or not.

8 Acknowledgments

The authors would like to thank Paul Resnick, Rahul Sami, and Brendan Nyhan for helpful discussions. This work is supported by the National Science Foundation grant "SoCS: Assessing Information Credibility Without Authoritative Sources" as IIS-0968489. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the supporters.

References

Floyd H. Allport and Milton Lepkin. 1945. Wartime rumors of waste and special privilege: why some people believe them. Journal of Abnormal and Social Psychology, 40(1):3-36.

Gordon Allport and Leo Postman. 1947. The Psychology of Rumor. Holt, Rinehart, and Winston, New York.

Galen Andrew and Jianfeng Gao. 2007. Scalable training of L1-regularized log-linear models. In ICML '07, pages 33-40.

James Berger. 1985. Statistical Decision Theory and Bayesian Analysis (2nd ed.). Springer-Verlag, New York.

Albert Bifet and Eibe Frank. 2010. Sentiment knowledge discovery in Twitter streaming data. In Bernhard Pfahringer, Geoff Holmes, and Achim Hoffmann, editors, Discovery Science, volume 6332 of Lecture Notes in Computer Science, pages 1-15. Springer Berlin/Heidelberg.

Jean Carletta. 1996. Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics, 22(2):249-254.

Philip Clarkson and Roni Rosenfeld. 1997. Statistical language modeling using the CMU-Cambridge toolkit. Proceedings ESCA Eurospeech, 47:45-148.

Nicholas DiFonzo and Prashant Bordia. 2007. Rumor, gossip, and urban legend. Diogenes, 54:19-35, February.

Nicholas DiFonzo, P. Prashant Bordia, and Ralph L. Rosnow. 1994. Reining in rumors. Organizational Dynamics, 23(1):47-62.

Rob Ennals, Dan Byler, John Mark Agosta, and Barbara Rosario. 2010. What is disputed on the web? In Proceedings of the 4th Workshop on Information Credibility, WICOW '10, pages 67-74.

Jianfeng Gao, Galen Andrew, Mark Johnson, and Kristina Toutanova. 2007. A comparative study of parameter estimation methods for statistical natural language processing. In ACL '07.

Namrata Godbole, Manjunath Srinivasaiah, and Steven Skiena. 2007. Large-scale sentiment analysis for news and blogs. In Proceedings of the International Conference on Weblogs and Social Media (ICWSM), Boulder, CO, USA.

Ahmed Hassan, Vahed Qazvinian, and Dragomir Radev. 2010. What's with the attitude? Identifying sentences with attitude in online discussions. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1245-1255, Cambridge, MA, October. Association for Computational Linguistics.

Edward S. Herman and Noam Chomsky. 2002. Manufacturing Consent: The Political Economy of the Mass Media. Pantheon.

Courtenay Honeycutt and Susan C. Herring. 2009. Beyond microblogging: Conversation and collaboration via Twitter. In Hawaii International Conference on System Sciences, pages 1-10.

Jeff Huang, Katherine M. Thornton, and Efthimis N. Efthimiadis. 2010. Conversational tagging in Twitter. In Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, HT '10, pages 173-178.

Klaus Krippendorff. 1980. Content Analysis: An Introduction to its Methodology. Sage Publications, Beverly Hills.

Jure Leskovec, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. In KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 497-506.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval. Cambridge University Press.

Marcelo Mendoza, Barbara Poblete, and Carlos Castillo. Twitter under crisis: Can we trust what we RT?

Alexander Pak and Patrick Paroubek. 2010. Twitter as a corpus for sentiment analysis and opinion mining. In Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, editors, Proceedings of the Seventh Conference on International Language Resources and Evaluation (LREC'10), Valletta, Malta, May. European Language Resources Association (ELRA).

Bo Pang and Lillian Lee. 2004. A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In ACL '04, Morristown, NJ, USA.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2:1-135.

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '02, pages 79-86.

Jacob Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, Alessandro Flammini, and Filippo Menczer. 2010. Detecting and tracking the spread of astroturf memes in microblog streams. CoRR, abs/1011.3768.

