Beyond News Contents: The Role of Social Context For Fake News Detection
…intentionally false information. Detecting fake news is an important task, which not only ensures that users receive authentic information but also helps maintain a trustworthy news ecosystem. The majority of existing detection algorithms focus on finding clues in news contents, which are generally not effective because fake news is often intentionally written to mislead users by mimicking true news. Therefore, we need to explore auxiliary information to improve detection. The social context during the news dissemination process on social media forms an inherent tri-relationship, the relationship among publishers, news pieces, and users, which has the potential to improve fake news detection. For example, partisan-biased publishers are more likely to publish fake news, and less credible users are more likely to share fake news. In this paper, we study the novel problem of exploiting social context for fake news detection. We propose a tri-relationship embedding framework, TriFN, which models publisher-news relations and user-news interactions simultaneously for fake news classification. We conduct experiments on two real-world datasets, which demonstrate that the proposed approach significantly outperforms other baseline methods for fake news detection.

KEYWORDS
Fake news detection; joint learning; social media mining

ACM Reference Format:
Kai Shu, Suhang Wang, and Huan Liu. 2019. Beyond News Contents: The Role of Social Context for Fake News Detection. In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), February 11–15, 2019, Melbourne, VIC, Australia. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3289600.3290994

1 INTRODUCTION
People nowadays tend to seek out and consume news from social media rather than traditional news organizations. For example, 62% of U.S. adults got news on social media in 2016, while in 2012 only 49% did1. At the same time, social media also enables the wide dissemination of fake news, i.e., news with intentionally false information, which is produced online for a variety of purposes, such as financial and political gain [2, 13].

Fake news can have detrimental effects on individuals and society. First, people may be misled by fake news and accept false beliefs [18, 21]. Second, fake news could change the way people respond to true news2. Third, the wide propagation of fake news could break the trustworthiness of the entire news ecosystem. Thus, it is important to detect fake news on social media. Fake news is intentionally written to mislead consumers, which makes it nontrivial to detect simply based on news content. To build an effective and practical fake news detection system, it is natural and necessary to explore auxiliary information from different perspectives.

1 http://www.journalism.org/2016/05/26/news-use-across-social-media-platforms-2016/
2 https://www.nytimes.com/2016/11/28/opinion/fake-news-and-the-internet-shell-game.html

The news ecosystem on social media provides abundant social context information, which involves three basic entities, i.e., publishers, news pieces, and social media users. Figure 1 gives an illustration of such an ecosystem. In Figure 1, p1, p2 and p3 are news publishers who publish news a1, . . . , a4, and u1, . . . , u6 are users who have engaged in sharing these news pieces. In addition, users tend to form social links with like-minded people with similar interests. As we will show, the tri-relationship, the relationship among publishers, news pieces, and users, contains additional information to help detect fake news.

First, sociological studies on journalism have theorized the correlation between the partisan bias of a publisher and the degree of veracity of its news content [8]. Partisan bias means the perceived bias of the publisher in the selection of how news is reported and covered [6]. For example, in Figure 1, p1 is a publisher with extreme left partisan bias and p2 is a publisher with extreme right partisan bias. To support their own partisan viewpoints, they are more likely to distort facts and report fake news pieces, such as a1 and a3; whereas a mainstream publisher such as p3, with the least partisan bias, has a lower chance of manipulating original news events and is more likely to write a true news piece a4. Thus, exploiting the partisan bias of publishers to bridge the publisher-news relationships can bring additional benefits to predicting fake news.

Second, mining user engagements with news pieces on social media also helps fake news detection. Previous approaches try to aggregate users' attributes to infer the degree of news veracity by assuming that either (i) all the users contribute equally to learning feature representations of news pieces [10]; or (ii) user features are
Figure 1: An illustration of the tri-relationship among publishers, news pieces, and users during the news dissemination process. For example, an edge (p → a) demonstrates that publisher p publishes news item a, an edge (a → u) represents that news item a is spread by user u, and an edge (u1 ↔ u2) indicates the social relation between users u1 and u2.

Figure 2: The tri-relationship embedding framework, which consists of five components: news contents embedding, user embedding, user-news interaction embedding, publisher-news relation embedding, and news classification.
grouped locally for specific news pieces while the global user-news interactions are ignored [4]. However, in practice, these assumptions may not hold. On social media, different users have different credibility levels. The credibility score, which means "the quality of being trustworthy" [1], is a strong indicator of whether a user is likely to share fake news or not. Less credible users, such as malicious accounts or normal users who are vulnerable to fake news, are more likely to spread fake news. For example, u2 and u4 are users with low credibility scores, and they tend to spread fake news more than other, highly credible users. In addition, users tend to form relationships with like-minded people [25]. For example, users u5 and u6 are friends on social media, so they tend to post news that confirms their own views, such as a4. Therefore, incorporating user credibility levels to capture the user-news interactions has the potential to improve fake news prediction.

Moreover, the publisher-news relationships and user-news interactions provide new and different perspectives on social context, and thus contain complementary information to advance fake news detection. In this paper, we investigate: (1) how to mathematically model the tri-relationship to extract feature representations of news pieces; and (2) how to take advantage of tri-relationship modeling for fake news detection. Our solutions to these two challenges result in a novel framework, TriFN, for the fake news detection problem. Our main contributions are summarized as follows:
• We provide a principled way to model the tri-relationship among publishers, news pieces, and users simultaneously;
• We propose a novel framework TriFN, which exploits both user-news interactions and publisher-news relations for learning news feature representations to predict fake news; and
• We conduct extensive experiments on two real-world datasets to assess the effectiveness of TriFN.

2 PROBLEM STATEMENT
Let A = {a1, a2, ..., an} be the set of n news pieces, and U = {u1, u2, ..., um} be the set of m users on social media posting these news pieces. We denote X ∈ R^{n×t} as the bag-of-words feature matrix of news pieces, where t is the vocabulary size. We use A ∈ {0, 1}^{m×m} to denote the user-user adjacency matrix, where A_{ij} = 1 indicates that users u_i and u_j are friends; otherwise A_{ij} = 0. We denote the user-news interaction matrix as W ∈ {0, 1}^{m×n}, where W_{ij} = 1 indicates that user u_i has shared the news piece a_j; otherwise W_{ij} = 0. It is worth mentioning that we focus on those user-news interactions in which users agree with the news. For example, we only consider users who share news pieces without comments, as these users share the same alignment of viewpoints as the news items [12]. We will introduce more details in Section 3.3. We also denote P = {p1, p2, ..., pl} as the set of l news publishers. In addition, we denote B ∈ R^{l×n} as the publisher-news publishing matrix, where B_{kj} = 1 means news publisher p_k publishes the news article a_j; otherwise B_{kj} = 0. We assume that the partisan bias labels of some publishers are given and available (see more details on how to collect partisan bias labels in Section 3.4). We define o ∈ {−1, 0, 1}^{l×1} as the partisan label vector, where -1, 0, 1 represent left-, neutral-, and right-partisan bias, respectively.

Similar to previous research [10, 32], we treat the fake news detection problem as a binary classification problem. In other words, each news piece can be true or fake, and we use y = {y1; y2; ...; yn} ∈ R^{n×1} to represent the labels, where y_j = 1 means news piece a_j is fake news and y_j = −1 means true news. With the notations given above, the problem is formally defined as:

Given news article feature matrix X, user adjacency matrix A, user social engagement matrix W, publisher-news publishing matrix B, publisher partisan label vector o, and the partially labeled news vector y_L, we aim to predict the remaining unlabeled news label vector y_U.
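To make the notation concrete, the following is a minimal, hypothetical setup of these inputs with random placeholder data; it is not part of the original paper, the variable names are illustrative, and only the shapes follow the definitions above.

```python
import numpy as np

# Illustrative setup of the inputs defined in Section 2 (assumed, not the authors' code).
# n news pieces, m users, l publishers, t vocabulary terms; first r news pieces are labeled.
rng = np.random.default_rng(0)
n, m, l, t, r = 100, 500, 20, 1000, 60

X = rng.random((n, t))                        # bag-of-words features (nonnegative)
A = (rng.random((m, m)) < 0.01).astype(int)   # user-user friendships
A = np.triu(A, 1)
A = A + A.T                                   # symmetric adjacency, no self-loops
W = (rng.random((m, n)) < 0.02).astype(int)   # user-news sharing interactions
B = np.zeros((l, n), dtype=int)               # publisher-news publishing matrix
B[rng.integers(0, l, size=n), np.arange(n)] = 1   # each news piece has one publisher
o = rng.choice([-1, 0, 1], size=l)            # publisher partisan labels (-1 / 0 / 1)
e = np.ones(l, dtype=int)                     # 1 if a publisher's partisan label is known
y_L = rng.choice([-1, 1], size=r)             # labeled news: 1 = fake, -1 = true
```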
3 A TRI-RELATIONSHIP EMBEDDING FRAMEWORK
In this section, we present the details of the proposed framework TriFN for modeling the tri-relationship for fake news detection. It consists of five major components (Figure 2): a news contents embedding component, a user embedding component, a user-news interaction embedding component, a publisher-news relation embedding component, and a semi-supervised classification component.
In general, the news contents embedding component describes the mapping of news from bag-of-words features to a latent feature space; the user embedding component describes the extraction of user latent features from user social relations; the user-news interaction embedding component learns the feature representations of news pieces guided by their partial labels and user credibilities; the publisher-news relation embedding component regularizes the feature representations of news pieces through publisher partisan bias labels; and the semi-supervised classification component learns a classification function to predict unlabeled news items.

3.1 News Contents Embedding
We can use news contents to find clues that differentiate fake news from true news. It has been shown that nonnegative matrix factorization (NMF) algorithms are practical and popular for learning document representations [20, 28, 38]. NMF projects the news-word matrix X into a joint latent semantic factor space with low dimensionality, such that the news-word relations are modeled as inner products in that space. Specifically, given the news-word matrix X ∈ R^{n×t}, NMF tries to find two nonnegative matrices D ∈ R_+^{n×d} and V ∈ R_+^{t×d}, where d is the dimension of the latent space, by solving the following optimization problem,

$$\min_{D, V \ge 0} \; \|X - DV^\top\|_F^2 + \lambda\left(\|D\|_F^2 + \|V\|_F^2\right) \quad (1)$$

where D and V are the nonnegative matrices giving low-dimensional representations of news pieces and words, respectively. Note that we denote D = [D_L; D_U], where D_L ∈ R^{r×d} is the news latent feature matrix for labeled news, while D_U ∈ R^{(n−r)×d} is the news latent feature matrix for unlabeled news. The term λ(||D||_F^2 + ||V||_F^2) is introduced to avoid over-fitting.
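As a rough illustration of Eq. (1) in isolation, the sketch below solves the regularized NMF with standard multiplicative updates. This is only for intuition: the paper optimizes all components jointly with the alternating scheme of Section 4, and the function name and hyperparameters here are assumptions.

```python
import numpy as np

def nmf_news_embedding(X, d=50, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Illustrative multiplicative-update solver for Eq. (1):
    min_{D,V>=0} ||X - D V^T||_F^2 + lam * (||D||_F^2 + ||V||_F^2)."""
    rng = np.random.default_rng(seed)
    n, t = X.shape
    D = rng.random((n, d))   # news latent features
    V = rng.random((t, d))   # word latent features
    for _ in range(n_iter):
        D *= (X @ V) / (D @ (V.T @ V) + lam * D + eps)
        V *= (X.T @ D) / (V @ (D.T @ D) + lam * V + eps)
    return D, V

# Usage: D, V = nmf_news_embedding(X, d=50); D[:r] plays the role of D_L, D[r:] of D_U.
```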
3.2 User Embedding
On social media, people tend to form relationships with like-minded people rather than with users who have opposing preferences and interests. Thus, connected users are more likely to share similar latent interests in news pieces. To obtain a standardized representation, we use nonnegative matrix factorization to learn the users' latent representations. Specifically, given the user-user adjacency matrix A ∈ {0, 1}^{m×m}, we learn a nonnegative matrix U ∈ R_+^{m×d} by solving the following optimization problem,

$$\min_{U, T \ge 0} \; \|Y \odot (A - UTU^\top)\|_F^2 + \lambda\left(\|U\|_F^2 + \|T\|_F^2\right) \quad (2)$$

where U is the user latent matrix, T ∈ R_+^{d×d} is the user-user correlation matrix, Y ∈ R^{m×m} controls the contribution of A, and ⊙ denotes the Hadamard (element-wise) product. Since only positive links are observed in A, following common strategies [19], we first set Y_{ij} = 1 if A_{ij} = 1, and then perform negative sampling to generate the same number of unobserved links; the weights of all remaining entries are set to 0. The term λ(||U||_F^2 + ||T||_F^2) is introduced to avoid over-fitting.
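The sketch below shows one common way to realize the weighting matrix Y in Eq. (2) with negative sampling in the spirit of [19]. Treating the sampled non-links as weight-1 entries is our reading of the strategy, so the details should be taken as an assumption rather than the authors' exact procedure.

```python
import numpy as np

def build_link_weights(A, seed=0):
    """Construct Y for Eq. (2): weight 1 on observed links and on an equal number of
    sampled unobserved (negative) links; weight 0 elsewhere (one reading of [19])."""
    rng = np.random.default_rng(seed)
    m = A.shape[0]
    Y = np.zeros((m, m))
    pos = np.argwhere(A == 1)
    Y[pos[:, 0], pos[:, 1]] = 1.0
    neg_needed = len(pos)                 # sample as many non-links as observed links
    while neg_needed > 0:
        i, j = rng.integers(0, m, size=2)
        if i != j and A[i, j] == 0 and Y[i, j] == 0:
            Y[i, j] = 1.0
            neg_needed -= 1
    return Y
```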
3.3 User-News Interaction Embedding
We model the user-news interactions by considering the relationships between user features and the labels of news items. We have shown (see Section 1) that users with low credibility are more likely to spread fake news, while users with high credibility are less likely to spread fake news. To measure user credibility scores, we adopt the practical approach in [1]. The basic idea in [1] is that less credible users are more likely to coordinate with each other and form big clusters, while more credible users are likely to form small clusters. Specifically, the credibility scores are measured through the following major steps: 1) detect and cluster coordinating users based on user similarities; and 2) weight each cluster based on the cluster size. Note that for our fake news detection task, we do not assume that credibility scores are directly provided; they are inferred from widely available data, such as user-generated contents. Using the method in [1], we can assign each user u_i a credibility score c_i ∈ [0, 1]. A larger c_i indicates that user u_i has higher credibility, while a lower c_i indicates lower credibility. We use c = {c1, c2, ..., cm} to denote the credibility score vector for all users.

First, high-credibility users are more likely to share true news pieces, so we ensure that the distance between the latent features of high-credibility users and those of true news is minimized,

$$\min_{U, D_L \ge 0} \sum_{i=1}^{m}\sum_{j=1}^{r} W_{ij}\, c_i \left(1 - \frac{1+y_{Lj}}{2}\right) \|U_i - D_{Lj}\|_2^2 \quad (3)$$

where the factor (1 − (1 + y_{Lj})/2) ensures that we only include true news pieces (i.e., y_{Lj} = −1), and c_i adjusts the contribution of user u_i to the loss function. For example, if c_i is large (high credibility) and W_{ij} = 1, we put a bigger weight on forcing the distance between features U_i and D_{Lj} to be small; if c_i is small (low credibility) and W_{ij} = 1, then we put a smaller weight on forcing that distance to be small.

Second, low-credibility users are more likely to share fake news pieces, and we aim to minimize the distance between the latent features of low-credibility users and those of fake news,

$$\min_{U, D_L \ge 0} \sum_{i=1}^{m}\sum_{j=1}^{r} W_{ij}\, (1 - c_i) \left(\frac{1+y_{Lj}}{2}\right) \|U_i - D_{Lj}\|_2^2 \quad (4)$$

where the term ((1 + y_{Lj})/2) ensures that we only include fake news pieces (i.e., y_{Lj} = 1), and (1 − c_i) adjusts the contribution of user u_i to the loss function. For example, if c_i is large (high credibility) and W_{ij} = 1, we put a smaller weight on forcing the distance between features U_i and D_{Lj} to be small; if c_i is small (low credibility) and W_{ij} = 1, then we put a bigger weight on forcing that distance to be small.

Finally, we combine Eqn. 3 and Eqn. 4 to consider the above two situations, and obtain the following objective function,

$$\min_{U, D_L \ge 0} \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{r} W_{ij}\, c_i \left(1 - \frac{1+y_{Lj}}{2}\right) \|U_i - D_{Lj}\|_2^2}_{\text{True news}} + \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{r} W_{ij}\, (1 - c_i) \left(\frac{1+y_{Lj}}{2}\right) \|U_i - D_{Lj}\|_2^2}_{\text{Fake news}} \quad (5)$$
For simplicity, Eqn. 5 can be rewritten as,

$$\min_{U, D_L \ge 0} \sum_{i=1}^{m}\sum_{j=1}^{r} G_{ij}\, \|U_i - D_{Lj}\|_2^2 \quad (6)$$

where G_{ij} = W_{ij} (c_i (1 − (1 + y_{Lj})/2) + (1 − c_i)((1 + y_{Lj})/2)). If we denote a new matrix H = [U; D_L] ∈ R^{(m+r)×d}, we can also rewrite Eqn. 6 in matrix form as follows,

$$\min_{U, D_L \ge 0} \sum_{i=1}^{m}\sum_{j=1}^{r} G_{ij}\, \|U_i - D_{Lj}\|_2^2 \;\Leftrightarrow\; \min_{H \ge 0} \sum_{i=1}^{m}\sum_{j=m+1}^{m+r} G_{i(j-m)}\, \|H_i - H_j\|_2^2 \;\Leftrightarrow\; \min_{H \ge 0} \sum_{i,j=1}^{m+r} F_{ij}\, \|H_i - H_j\|_2^2 \;\Leftrightarrow\; \min_{H \ge 0} \operatorname{tr}(H^\top L H) \quad (7)$$

where L = S − F is the Laplacian matrix and S is a diagonal matrix with diagonal elements S_{ii} = Σ_{j=1}^{m+r} F_{ij}. The matrix F ∈ R^{(m+r)×(m+r)} is computed as follows,

$$F_{ij} = \begin{cases} 0, & i, j \in [1, m] \text{ or } i, j \in [m+1, m+r] \\ G_{i(j-m)}, & i \in [1, m],\; j \in [m+1, m+r] \\ G_{(i-m)j}, & i \in [m+1, m+r],\; j \in [1, m] \end{cases} \quad (8)$$

3.4 Publisher-News Relation Embedding
Fake news is often written to convey opinions or claims that support the partisan bias of the news publisher. Thus, a good news representation should be good at predicting the partisan bias of its publisher. We obtain the list of publishers' partisan scores from a well-known media bias fact-checking website, MBFC3. The partisan bias labels are checked with a principled methodology that ensures the reliability and objectivity of the partisan annotations. The labels are categorized into five categories: "left", "left-center", "least-biased", "right-center" and "right". To further ensure the accuracy of the labels, we only consider those news publishers with the annotations ["left", "least-biased", "right"], and rewrite the corresponding labels as [-1, 0, 1]. Thus, we can construct a partisan label vector for news publishers as o. Note that we may not obtain the partisan labels of all publishers, so we introduce e ∈ {0, 1}^{l×1} to control the weight of o. If we have the partisan bias label of publisher p_k, then e_k = 1; otherwise, e_k = 0. The basic idea is to utilize the publisher partisan label vector o ∈ R^{l×1} and the publisher-news matrix B ∈ R^{l×n} to optimize the news feature representation learning. Specifically, we optimize the following objective function,

$$\min_{D \ge 0,\, q} \; \|e \odot (\bar{B} D q - o)\|_2^2 + \lambda \|q\|_2^2 \quad (9)$$

3.5 Proposed Framework - TriFN
We have introduced how we can learn news latent features by modeling different aspects of the tri-relationship. We further employ a semi-supervised linear classifier term as follows,

$$\min_{p} \; \|D_L p - y_L\|_2^2 + \lambda \|p\|_2^2 \quad (10)$$

where p ∈ R^{d×1} is the weighting vector that maps news latent features to fake news labels. With all the previous components, TriFN solves the following optimization problem,

$$\min_{D,U,V,T \ge 0,\; p,\, q} \; \|X - DV^\top\|_F^2 + \alpha \|Y \odot (A - UTU^\top)\|_F^2 + \beta\, \operatorname{tr}(H^\top L H) + \gamma \|e \odot (\bar{B} D q - o)\|_2^2 + \eta \|D_L p - y_L\|_2^2 + \lambda R \quad (11)$$

where R = (||D||_F^2 + ||V||_F^2 + ||U||_F^2 + ||T||_F^2 + ||p||_2^2 + ||q||_2^2) is introduced to avoid over-fitting. The first term models the news latent features from news contents; the second term extracts user latent features from their social relationships; the third term incorporates the user-news interactions; the fourth term models publisher-news relationships; and the fifth term adds a semi-supervised fake news classifier. Therefore, this framework provides a principled way to model the tri-relationship for fake news prediction.

4 AN OPTIMIZATION ALGORITHM
In this section, we present the detailed optimization process for the proposed framework TriFN. If we update the variables jointly, the objective function in Eq. 11 is not convex. Thus, we propose to use alternating least squares to update the variables separately. For simplicity, we use 𝓛 to denote the objective function in Eq. 11. Next, we introduce the updating rule for each variable in detail.

Update D. Let Ψ_D be the Lagrange multiplier for the constraint D ≥ 0; the Lagrange function related to D is,

$$\min_{D} \; \|X - DV^\top\|_F^2 + \beta\, \operatorname{tr}(H^\top L H) + \gamma \|e \odot (\bar{B} D q - o)\|_2^2 + \eta \|D_L p - y_L\|_2^2 + \lambda \|D\|_F^2 - \operatorname{tr}(\Psi_D D^\top) \quad (12)$$

where D = [D_L; D_U] and H = [U; D_L]. We rewrite L = [L_{11}, L_{12}; L_{21}, L_{22}], where L_{11} ∈ R^{m×m}, L_{12} ∈ R^{m×r}, L_{21} ∈ R^{r×m}, and L_{22} ∈ R^{r×r}; and X = [X_L; X_U]. The partial derivative of 𝓛 w.r.t. D is as follows,

$$\frac{1}{2}\frac{\partial \mathcal{L}}{\partial D} = (DV^\top - X)V + \lambda D + \gamma \bar{B}^\top E^\top (E \bar{B} D q - E o) q^\top + \left[\beta (L_{21} U + L_{22} D_L) + \eta (D_L p - y_L) p^\top;\; 0\right] - \Psi_D \quad (13)$$

where E = diag(e).
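To make the alternating scheme more tangible, here is a rough Python sketch that assembles G, F, and L from Eqns. (6)-(8) and then takes a single projected-gradient step on D following the gradient structure of Eq. (13). It is not the authors' algorithm (the paper derives update rules from the Lagrangian); E = diag(e), the step size, and all names are assumptions.

```python
import numpy as np

def build_laplacian(W, c, y_L):
    """Assemble G (Eq. 6), F (Eq. 8) and L = S - F (Eq. 7) from user-news
    interactions W (m x n), credibility scores c (m,) and labels y_L (r,)."""
    m, r = W.shape[0], len(y_L)
    true_w = 1 - (1 + y_L) / 2.0          # 1 for true news (y = -1), 0 for fake
    fake_w = (1 + y_L) / 2.0              # 1 for fake news (y = +1), 0 for true
    G = W[:, :r] * (np.outer(c, true_w) + np.outer(1 - c, fake_w))
    F = np.zeros((m + r, m + r))
    F[:m, m:] = G                          # user-to-labeled-news block
    F[m:, :m] = G.T                        # labeled-news-to-user block
    return G, F, np.diag(F.sum(axis=1)) - F

def projected_gradient_step_D(D, V, U, p, q, X, B_bar, e, o, y_L, L_lap,
                              lr=1e-3, beta=1.0, gamma=1.0, eta=1.0, lam=0.1):
    """One illustrative projected-gradient step on D; rows [:r] of D are D_L."""
    m, r = U.shape[0], len(y_L)
    E = np.diag(e.astype(float))
    L21, L22 = L_lap[m:, :m], L_lap[m:, m:]
    grad = (D @ V.T - X) @ V + lam * D                   # content term + regularizer
    resid = E @ B_bar @ D @ q - E @ o                    # masked publisher residual
    grad += gamma * B_bar.T @ E.T @ np.outer(resid, q)   # publisher-news term
    grad[:r] += beta * (L21 @ U + L22 @ D[:r])           # graph (user-news) term
    grad[:r] += eta * np.outer(D[:r] @ p - y_L, p)       # semi-supervised term
    return np.maximum(D - lr * grad, 0.0)                # step + project onto D >= 0

# Usage (with the matrices from Section 2 and credibility scores c):
# G, F, L_lap = build_laplacian(W, c, y_L)
# D = projected_gradient_step_D(D, V, U, p, q, X, B_bar, e, o, y_L, L_lap)
```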
Figure 4: The performance of early fake news detection on BuzzFeed and PolitiFact in terms of Accuracy and F1 (compared methods: RST, LIWC, Castillo, RST + Castillo, LIWC + Castillo, and TriFN; x-axis: hours after publication, from 12 to 96 hours and All).

Figure 5: Model parameter analysis for TriFN on BuzzFeed and PolitiFact in terms of F1: (a) η and γ on BuzzFeed; (b) η and γ on PolitiFact; (c) α and β on BuzzFeed; (d) α and β on PolitiFact.
5.6 Model Parameter Analysis
The proposed TriFN has four important parameters. The first two are α and β, which control the contributions from social relationships and user-news engagements; γ controls the contribution of publisher partisan bias, and η controls the contribution of the semi-supervised classifier. We first fix {α = 1e−4, β = 1e−5} and {α = 1e−5, β = 1e−4} for BuzzFeed and PolitiFact, respectively. Then we vary η in {1, 10, 20, 50, 100} and γ in {1, 10, 20, 30, 100}. The performance variations are depicted in Figure 5. We can see that i) when η increases from 0 (which eliminates the impact of the semi-supervised classification term) to 1, the performance increases dramatically on both datasets; these results support the importance of combining the semi-supervised classifier with feature learning; and ii) generally, increasing γ improves the performance within a certain region, γ ∈ [1, 50] and η ∈ [1, 50] for both datasets, which eases the process of parameter setting. Next, we fix {γ = 1, η = 1} and {γ = 10, η = 1} for BuzzFeed and PolitiFact, respectively. Then we vary α, β ∈ {0, 1e−5, 1e−4, 1e−3, 0.001, 0.01}. We can see that i) when α and β increase from 0 (which eliminates the social engagements) to 1e−5, the performance increases noticeably, which again supports the importance of social engagements; and ii) the performance tends to increase first and then decrease, and it is relatively stable in [1e−5, 1e−3].
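For readers who want to reproduce this kind of sensitivity analysis, a plain grid sweep like the sketch below would suffice; train_trifn and f1_on_test are hypothetical stand-ins for model fitting and evaluation, not functions from the paper.

```python
from itertools import product

def parameter_sweep(train_trifn, f1_on_test, data, alpha, beta, etas, gammas):
    """Hypothetical sweep over (eta, gamma) with alpha and beta held fixed,
    mirroring the analysis reported in Figure 5."""
    results = {}
    for eta, gamma in product(etas, gammas):
        model = train_trifn(data, alpha=alpha, beta=beta, eta=eta, gamma=gamma)
        results[(eta, gamma)] = f1_on_test(model, data)
    return results

# Example: parameter_sweep(train_trifn, f1_on_test, data, alpha=1e-4, beta=1e-5,
#                          etas=[1, 10, 20, 50, 100], gammas=[1, 10, 20, 30, 100])
```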
6 RELATED WORK
We briefly introduce the related work on fake news detection on social media. Fake news detection methods generally focus on using news contents and social contexts [32, 40].

News contents contain clues for differentiating fake from real news. For news content based approaches, the extracted features are linguistic-based and visual-based. Linguistic-based features capture specific writing styles and sensational headlines that commonly occur in fake news content [24], such as lexical and syntactic features. Visual-based features try to identify fake images [9] that are intentionally created, or to capture specific characteristics of images in fake news. News content based models include i) knowledge-based models, which use external sources to fact-check claims in news content [17, 37], and ii) style-based models, which capture manipulators in writing style, such as deception [7, 27] and non-objectivity [24]. For example, Potthast et al. [24] extracted various style features from news contents to predict fake news and media bias.

In addition to news content, the social context related to news pieces contains rich information to help detect fake news. For social context based approaches, the features mainly include user-based, post-based, and network-based features. User-based features are extracted from user profiles to measure their characteristics and credibilities [4, 14, 34, 39]. For example, Shu et al. [34] proposed to understand user profiles from various aspects to differentiate fake news. Yang et al. [39] proposed an unsupervised fake news detection algorithm that utilizes users' opinions on social media and estimates their credibilities. Post-based features represent users' social responses in terms of stance [10], topics [16], or credibility [4, 36]. Network-based features [29] are extracted by constructing specific networks, such as diffusion networks [14]. Social context models basically include stance-based and propagation-based models. Stance-based models utilize users' opinions towards the news to infer news veracity [10]. Propagation-based models assume that the credibility of news is highly related to the credibilities of relevant social media posts, to which several propagation methods can be applied [10]. Recently, deep learning models have been applied to learn the temporal and linguistic representations of news [11, 30, 35]. Shu et al. [33] proposed to generate synthetic data to augment the training data and improve the detection of clickbait. It is worth mentioning that we cannot directly compare against the propagation-based approaches, because we assume we only have user actions, e.g., posting the news or not; in this case, the propagation signals inferred from text are the same and thus become ineffective.
To the best of our knowledge, this paper is the first to classify fake news by learning effective news features through tri-relationship embedding among publishers, news contents, and social engagements.

7 CONCLUSION AND FUTURE WORK
Due to the inherent relationships among publishers, news pieces, and social engagements during the news dissemination process on social media, we propose a novel framework, TriFN, that models the tri-relationship for fake news detection. TriFN can extract effective features from news publishers and user engagements separately, as well as capture their interrelationships simultaneously. Experimental results on real-world fake news datasets demonstrate the effectiveness of the proposed framework and the importance of the tri-relationship for fake news prediction. It is worth mentioning that TriFN can achieve good detection performance in the early stage of news dissemination. There are several interesting future directions. First, it is worth exploring effective features and models for early fake news detection, as fake news usually evolves very fast on social media. Second, how to extract features that model fake news intention from a psychological perspective needs further investigation. Finally, how to identify low-quality or even malicious users spreading fake news is important for fake news intervention and mitigation.

8 ACKNOWLEDGMENTS
This material is based upon work supported by, or in part by, the NSF grant #1614576 and the ONR grant N00014-16-1-2257.

REFERENCES
[1] Mohammad Ali Abbasi and Huan Liu. 2013. Measuring User Credibility in Social Media. In SBP. Springer, 441–448.
[2] Hunt Allcott and Matthew Gentzkow. 2017. Social media and fake news in the 2016 election. Technical Report. National Bureau of Economic Research.
[3] Stephen Boyd and Lieven Vandenberghe. 2004. Convex optimization. Cambridge University Press.
[4] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web. ACM, 675–684.
[5] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785–794.
[6] Robert M Entman. 2007. Framing bias: Media in the distribution of power. Journal of Communication 57, 1 (2007), 163–173.
[7] Song Feng, Ritwik Banerjee, and Yejin Choi. 2012. Syntactic stylometry for deception detection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers-Volume 2. Association for Computational Linguistics, 171–175.
[8] Matthew Gentzkow, Jesse M Shapiro, and Daniel F Stone. 2014. Media bias in the marketplace: Theory. Technical Report. National Bureau of Economic Research.
[9] Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. 2013. Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 729–736.
[10] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. 2016. News Verification by Exploiting Conflicting Social Viewpoints in Microblogs. In AAAI. 2972–2978.
[11] Hamid Karimi, Proteek Roy, Sari Saba-Sadiya, and Jiliang Tang. Multi-Source Multi-Class Fake News Detection. In COLING'18.
[12] Antino Kim and Alan R Dennis. 2017. Says Who?: How News Presentation Format Influences Perceived Believability and the Engagement Level of Social Media Users. (2017).
[13] David O Klein and Joshua R Wueller. 2017. Fake News: A Legal Perspective. (2017).
[14] Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang. 2013. Prominent features of rumor propagation in online social media. In Data Mining (ICDM), 2013 IEEE 13th International Conference on. IEEE, 1103–1108.
[15] Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems. 556–562.
[16] Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. 2015. Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 1751–1754.
[17] Amr Magdy and Nayer Wanas. 2010. Web-based statistical fact checking of textual documents. In Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents. ACM, 103–110.
[18] Brendan Nyhan and Jason Reifler. 2010. When corrections fail: The persistence of political misperceptions. Political Behavior 32, 2 (2010), 303–330.
[19] Rong Pan and Martin Scholz. Mind the gaps: weighting the unknown in large-scale one-class collaborative filtering. In KDD'09.
[20] V Paul Pauca, Farial Shahnaz, Michael W Berry, and Robert J Plemmons. 2004. Text mining using non-negative matrix factorizations. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 452–456.
[21] Christopher Paul and Miriam Matthews. 2016. The Russian "Firehose of Falsehood" Propaganda Model. RAND Corporation (2016).
[22] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830.
[23] James W Pennebaker, Ryan L Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of LIWC2015. Technical Report.
[24] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2017. A Stylometric Inquiry into Hyperpartisan and Fake News. arXiv preprint arXiv:1702.05638 (2017).
[25] Walter Quattrociocchi, Antonio Scala, and Cass R Sunstein. 2016. Echo chambers on Facebook. (2016).
[26] Victoria L Rubin, N Conroy, and Yimin Chen. 2015. Towards news verification: Deception detection methods for news discourse. In Hawaii International Conference on System Sciences.
[27] Victoria L Rubin and Tatiana Lukoianova. 2015. Truth and deception at the rhetorical structure level. Journal of the Association for Information Science and Technology 66, 5 (2015), 905–917.
[28] Farial Shahnaz, Michael W Berry, V Paul Pauca, and Robert J Plemmons. 2006. Document clustering using nonnegative matrix factorization. Information Processing & Management 42, 2 (2006), 373–386.
[29] Kai Shu, H. Russell Bernard, and Huan Liu. 2018. Studying Fake News via Network Analysis: Detection and Mitigation. CoRR abs/1804.10233 (2018).
[30] Kai Shu, Deepak Mahudeswaran, and Huan Liu. 2018. FakeNewsTracker: a tool for fake news collection, detection, and visualization. Computational and Mathematical Organization Theory (2018), 1–12.
[31] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media. arXiv preprint arXiv:1809.01286 (2018).
[32] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake News Detection on Social Media: A Data Mining Perspective. KDD Exploration Newsletter (2017).
[33] Kai Shu, Suhang Wang, Thai Le, Dongwon Lee, and Huan Liu. Deep Headline Generation for Clickbait Detection. In ICDM'18.
[34] Kai Shu, Suhang Wang, and Huan Liu. 2018. Understanding User Profiles on Social Media for Fake News Detection. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE.
[35] Yaqing Wang, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu Su, and Jing Gao. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection. In KDD'18.
[36] Liang Wu and Huan Liu. Tracing fake-news footprints: Characterizing social media messages by how they propagate. In WSDM'18.
[37] You Wu, Pankaj K Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2014. Toward computational fact-checking. Proceedings of the VLDB Endowment 7, 7 (2014), 589–600.
[38] Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 267–273.
[39] Shuo Yang, Kai Shu, Suhang Wang, Renjie Gu, Fan Wu, and Huan Liu. Unsupervised Fake News Detection on Social Media: A Generative Approach. In AAAI'19.
[40] Xinyi Zhou, Reza Zafarani, Kai Shu, and Huan Liu. Fake News: Fundamental Theories, Detection Strategies and Challenges. In WSDM'19.