Beyond News Contents: The Role of Social Context for Fake News Detection
Kai Shu, Suhang Wang, and Huan Liu

ABSTRACT
[...] intentionally false information. Detecting fake news is an important task, which not only ensures that users receive authentic information but also helps maintain a trustworthy news ecosystem. The majority of existing detection algorithms focus on finding clues in news contents, which is generally not effective because fake news is often intentionally written to mislead users by mimicking true news. Therefore, we need to explore auxiliary information to improve detection. The social context of the news dissemination process on social media forms an inherent tri-relationship, the relationship among publishers, news pieces, and users, which has the potential to improve fake news detection. For example, partisan-biased publishers are more likely to publish fake news, and low-credibility users are more likely to share fake news. In this paper, we study the novel problem of exploiting social context for fake news detection. We propose a tri-relationship embedding framework, TriFN, which models publisher-news relations and user-news interactions simultaneously for fake news classification. We conduct experiments on two real-world datasets, which demonstrate that the proposed approach significantly outperforms other baseline methods for fake news detection.

KEYWORDS
Fake news detection; joint learning; social media mining

ACM Reference Format:
Kai Shu, Suhang Wang, and Huan Liu. 2019. Beyond News Contents: The Role of Social Context for Fake News Detection. In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM '19), February 11-15, 2019, Melbourne, VIC, Australia. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3289600.3290994

1    INTRODUCTION
People nowadays tend to seek out and consume news from social media rather than traditional news organizations. For example, 62% of U.S. adults got news on social media in 2016, while in 2012 only 49% did¹. However, large volumes of fake news, i.e., news with intentionally false information, are produced online for a variety of purposes, such as financial and political gain [2, 13].

¹ http://www.journalism.org/2016/05/26/news-use-across-social-media-platforms-2016/

Fake news can have detrimental effects on individuals and society. First, people may be misled by fake news and accept false beliefs [18, 21]. Second, fake news could change the way people respond to true news². Third, the wide propagation of fake news could break the trustworthiness of the entire news ecosystem. Thus, it is important to detect fake news on social media. Fake news is intentionally written to mislead consumers, which makes it nontrivial to detect based on news content alone. To build an effective and practical fake news detection system, it is natural and necessary to explore auxiliary information from different perspectives.

² https://www.nytimes.com/2016/11/28/opinion/fake-news-and-the-internet-shell-game.html

The news ecosystem on social media provides abundant social context information, which involves three basic entities: publishers, news pieces, and social media users. Figure 1 gives an illustration of such an ecosystem. In Figure 1, p1, p2, and p3 are news publishers who publish news a1, ..., a4, and u1, ..., u6 are users who have engaged in sharing these news pieces. In addition, users tend to form social links with like-minded people with similar interests. As we will show, the tri-relationship, i.e., the relationship among publishers, news pieces, and users, contains additional information that helps detect fake news.

Figure 1: An illustration of the tri-relationship among publishers, news pieces, and users during the news dissemination process. For example, an edge (p → a) indicates that publisher p publishes news item a, an edge (a → u) indicates that news item a is spread by user u, and an edge (u1 ↔ u2) indicates the social relation between users u1 and u2.

First, sociological studies on journalism have theorized the correlation between the partisan bias of a publisher and the veracity of its news content [8]. Partisan bias refers to the perceived bias of a publisher in the selection of how news is reported and covered [6]. For example, in Figure 1, p1 is a publisher with extreme left partisan bias and p2 is a publisher with extreme right partisan bias. To support their own partisan viewpoints, they are more likely to distort facts and report fake news pieces, such as a1 and a3, while a mainstream publisher p3 with the least partisan bias has a lower chance of manipulating original news events and is more likely to publish a true news piece a4. Thus, exploiting the partisan bias of publishers to bridge the publisher-news relationships can bring additional benefits for predicting fake news.

Second, mining user engagements with news pieces on social media also helps fake news detection. Previous approaches try to aggregate users' attributes to infer the degree of news veracity by assuming that either (i) all users contribute equally to learning feature representations of news pieces [10]; or (ii) user features are
grouped locally for specific news, while the global user-news interactions are ignored [4]. However, in practice, these assumptions may not hold. On social media, different users have different credibility levels. The credibility score, which means "the quality of being trustworthy" [1], is a strong indicator of whether a user is likely to share fake news. Less credible users, such as malicious accounts or normal users who are vulnerable to fake news, are more likely to spread fake news. For example, u2 and u4 are users with low credibility scores, and they tend to spread fake news more than other, highly credible users do. In addition, users tend to form relationships with like-minded people [25]. For example, users u5 and u6 are friends on social media, so they tend to post news pieces that confirm their own views, such as a4. Therefore, incorporating user credibility levels to capture the user-news interactions has the potential to improve fake news prediction.

Moreover, the publisher-news relationships and user-news interactions provide new and different perspectives on the social context, and thus contain complementary information to advance fake news detection. In this paper, we investigate: (1) how to mathematically model the tri-relationship to extract feature representations of news pieces; and (2) how to take advantage of tri-relationship modeling for fake news detection. Our solutions to these two challenges result in a novel framework, TriFN, for the fake news detection problem. Our main contributions are summarized as follows:

    • We provide a principled way to model the tri-relationship among publishers, news pieces, and users simultaneously;
    • We propose a novel framework TriFN, which exploits both user-news interactions and publisher-news relations for learning news feature representations to predict fake news; and
    • We conduct extensive experiments on two real-world datasets to assess the effectiveness of TriFN.

2    PROBLEM STATEMENT
Let A = {a1, a2, ..., an} be the set of n news pieces, and U = {u1, u2, ..., um} be the set of m users on social media posting these news pieces. We denote X ∈ R^{n×t} as the bag-of-words feature matrix of the news pieces, where t is the vocabulary size. We use A ∈ {0, 1}^{m×m} to denote the user-user adjacency matrix, where Aij = 1 indicates that users ui and uj are friends, and Aij = 0 otherwise. We denote the user-news interaction matrix as W ∈ {0, 1}^{m×n}, where Wij = 1 indicates that user ui has shared the news piece aj, and Wij = 0 otherwise. It is worth mentioning that we focus on those user-news interactions in which users agree with the news. For example, we only consider users who share news pieces without comments, as these users share the same alignment of viewpoints with the news items [12]. We will introduce more details in Section 3.3. We also denote P = {p1, p2, ..., pl} as the set of l news publishers. In addition, we denote B ∈ R^{l×n} as the publisher-news publishing matrix, where Bkj = 1 means that news publisher pk publishes the news article aj, and Bkj = 0 otherwise. We assume that the partisan bias labels of some publishers are given and available (see Section 3.4 for details on how partisan bias labels are collected). We define o ∈ {−1, 0, 1}^{l×1} as the partisan label vector, where −1, 0, and 1 represent left-, neutral-, and right-partisan bias, respectively.

Similar to previous research [10, 32], we treat fake news detection as a binary classification problem. In other words, each news piece can be true or fake, and we use y = [y1; y2; ...; yn] ∈ R^{n×1} to represent the labels, where yj = 1 means news piece aj is fake and yj = −1 means it is true. With the notations given above, the problem is formally defined as:

Given the news article feature matrix X, user adjacency matrix A, user social engagement matrix W, publisher-news publishing matrix B, publisher partisan label vector o, and the partially labeled news vector yL, we aim to predict the remaining unlabeled news label vector yU.
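To make the notation concrete, the following toy instantiation (our illustration; all sizes and values are hypothetical, not taken from the paper) sets up the inputs in Python:

```python
import numpy as np

# Toy instantiation of the inputs above; sizes are chosen only for illustration.
n, m, l, t, r = 4, 6, 3, 100, 2   # news, users, publishers, vocab size, labeled news

rng = np.random.default_rng(0)
X = rng.random((n, t))                          # bag-of-words features of news pieces
A = (rng.random((m, m)) < 0.3).astype(float)    # user-user friendship links
A = np.triu(A, 1) + np.triu(A, 1).T             # symmetric, zero diagonal
W = (rng.random((m, n)) < 0.4).astype(float)    # W[i, j] = 1 iff user i shared news j
B = np.zeros((l, n))
B[[0, 1, 2, 0], [0, 1, 2, 3]] = 1.0             # one publisher per news piece
o = np.array([-1.0, 0.0, 1.0])                  # partisan labels: left, neutral, right
e = np.array([1.0, 1.0, 0.0])                   # e[k] = 1 iff p_k's partisan label is known
y_L = np.array([1.0, -1.0])                     # labels of the first r news pieces (fake/true)
```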
3    A TRI-RELATIONSHIP EMBEDDING FRAMEWORK
In this section, we present the details of the proposed framework TriFN for modeling the tri-relationship for fake news detection. It consists of five major components (Figure 2): a news contents embedding component, a user embedding component, a user-news interaction embedding component, a publisher-news relation embedding component, and a semi-supervised classification component.

Figure 2: The tri-relationship embedding framework, which consists of five components: news contents embedding, user embedding, user-news interaction embedding, publisher-news relation embedding, and news classification.

In general, the news contents embedding component describes the mapping of news from bag-of-words features to a latent feature space; the user embedding component illustrates the extraction of user latent features from user social relations; the user-news interaction embedding component learns the feature representations of news pieces guided by their partial labels and user credibilities; the publisher-news relation embedding component regularizes the feature representations of news pieces through publisher partisan bias labels; and the semi-supervised classification component learns a classification function to predict unlabeled news items.
3.1    News Contents Embedding
We can use news contents to find clues to differentiate fake news from true news. Recently, it has been shown that nonnegative matrix factorization (NMF) algorithms are practical and popular for learning document representations [20, 28, 38]. NMF can project the news-word matrix X into a joint latent semantic factor space with low dimensionality, such that the news-word relations are modeled as inner products in that space. Specifically, given the news-word matrix X ∈ R^{n×t}, NMF methods try to find two nonnegative matrices D ∈ R_+^{n×d} and V ∈ R_+^{t×d}, where d is the dimension of the latent space, by solving the following optimization problem,

    min_{D,V≥0} ∥X − DVᵀ∥F² + λ(∥D∥F² + ∥V∥F²)    (1)

where D and V are nonnegative matrices giving low-dimensional representations of news pieces and words, respectively. Note that we denote D = [DL; DU], where DL ∈ R^{r×d} is the news latent feature matrix for labeled news, while DU ∈ R^{(n−r)×d} is the news latent feature matrix for unlabeled news. The term λ(∥D∥F² + ∥V∥F²) is introduced to avoid over-fitting.
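One standard way to solve Eq. (1) is with multiplicative update rules in the style of Lee and Seung [15]. The sketch below is our own illustration of such a solver for the regularized objective, not code released with the paper; the function and variable names are ours.

```python
import numpy as np

def news_content_embedding(X, d=10, lam=0.01, n_iter=200, eps=1e-9, seed=0):
    """Sketch of Eq. (1): min_{D,V>=0} ||X - D V^T||_F^2 + lam(||D||_F^2 + ||V||_F^2),
    solved with Lee-and-Seung-style multiplicative updates [15]."""
    rng = np.random.default_rng(seed)
    n, t = X.shape
    D, V = rng.random((n, d)), rng.random((t, d))
    for _ in range(n_iter):
        # Multiplicative updates keep D and V nonnegative by construction.
        D *= (X @ V) / (D @ (V.T @ V) + lam * D + eps)
        V *= (X.T @ D) / (V @ (D.T @ D) + lam * V + eps)
    return D, V

# The first r rows of D then play the role of DL, and the rest of DU.
```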
3.2    User Embedding
On social media, people tend to form relationships with like-minded people rather than with users who have opposing preferences and interests. Thus, connected users are more likely to share similar latent interests in news pieces. To obtain a standardized representation, we use nonnegative matrix factorization to learn the users' latent representations. Specifically, given the user-user adjacency matrix A ∈ {0, 1}^{m×m}, we learn a nonnegative matrix U ∈ R_+^{m×d} by solving the following optimization problem,

    min_{U,T≥0} ∥Y ⊙ (A − UTUᵀ)∥F² + λ(∥U∥F² + ∥T∥F²)    (2)

where U is the user latent matrix, T ∈ R_+^{d×d} is the user-user correlation matrix, Y ∈ R^{m×m} controls the contribution of A, and ⊙ denotes the Hadamard product. Since only positive links are observed in A, following common strategies [19], we first set Yij = 1 if Aij = 1, then perform negative sampling to select the same number of unobserved links, and set the weights of the remaining entries to 0. The term λ(∥U∥F² + ∥T∥F²) is to avoid over-fitting.

3.3    User-News Interaction Embedding
We model the user-news interactions by considering the relationships between user features and the labels of news items. We have shown (see Section 1) that users with low credibilities are more likely to spread fake news, while users with high credibilities are less likely to do so. To measure user credibility scores, we adopt the practical approach in [1]. The basic idea in [1] is that less credible users are more likely to coordinate with each other and form big clusters, while more credible users are likely to form small clusters. Specifically, the credibility scores are measured through the following major steps: 1) detect and cluster coordinating users based on user similarities; and 2) weight each cluster based on the cluster size. Note that for our fake news detection task, we do not assume that credibility scores are directly provided; they are inferred from widely available data, such as user-generated contents. Using the method in [1], we can assign each user ui a credibility score ci ∈ [0, 1], where a larger ci indicates that user ui has higher credibility and a smaller ci indicates lower credibility. We use c = {c1, c2, ..., cm} to denote the credibility score vector for all users.

First, high-credibility users are more likely to share true news pieces, so we ensure that the distance between the latent features of high-credibility users and those of true news is minimized,

    min_{U,DL≥0} Σ_{i=1}^{m} Σ_{j=1}^{r} Wij ci (1 − (1 + yLj)/2) ∥Ui − DLj∥₂²    (3)

where the term (1 − (1 + yLj)/2) ensures that we only include true news pieces (i.e., yLj = −1), and ci adjusts the contribution of user ui to the loss function. For example, if ci is large (high credibility) and Wij = 1, we put a bigger weight on forcing the distance between the features Ui and DLj to be small; if ci is small (low credibility) and Wij = 1, then we put a smaller weight on forcing that distance to be small.

Second, low-credibility users are more likely to share fake news pieces, so we aim to minimize the distance between the latent features of low-credibility users and those of fake news,

    min_{U,DL≥0} Σ_{i=1}^{m} Σ_{j=1}^{r} Wij (1 − ci) ((1 + yLj)/2) ∥Ui − DLj∥₂²    (4)

where the term ((1 + yLj)/2) ensures that we only include fake news pieces (i.e., yLj = 1), and (1 − ci) adjusts the contribution of user ui to the loss function. For example, if ci is large (high credibility) and Wij = 1, we put a smaller weight on forcing the distance between the features Ui and DLj to be small; if ci is small (low credibility) and Wij = 1, then we put a bigger weight on forcing that distance to be small.

Finally, we combine Eqn 3 and Eqn 4 to cover the above two situations and obtain the following objective function,

    min_{U,DL≥0} Σ_{i=1}^{m} Σ_{j=1}^{r} Wij ci (1 − (1 + yLj)/2) ∥Ui − DLj∥₂²   [true news]
               + Σ_{i=1}^{m} Σ_{j=1}^{r} Wij (1 − ci) ((1 + yLj)/2) ∥Ui − DLj∥₂²   [fake news]    (5)
For simplicity, Eqn 5 can be rewritten as,

    min_{U,DL≥0} Σ_{i=1}^{m} Σ_{j=1}^{r} Gij ∥Ui − DLj∥₂²    (6)

where Gij = Wij (ci (1 − (1 + yLj)/2) + (1 − ci)((1 + yLj)/2)). If we denote a new matrix H = [U; DL] ∈ R^{(m+r)×d}, we can also rewrite Eqn. 6 in matrix form as follows,

    min_{U,DL≥0} Σ_{i=1}^{m} Σ_{j=1}^{r} Gij ∥Ui − DLj∥₂²
    ⇔ min_{H≥0} Σ_{i=1}^{m} Σ_{j=m+1}^{m+r} G_{i(j−m)} ∥Hi − Hj∥₂²
    ⇔ min_{H≥0} Σ_{i,j=1}^{m+r} Fij ∥Hi − Hj∥₂²  ⇔  min_{H≥0} tr(Hᵀ L H)    (7)

where L = S − F is the Laplacian matrix, S is a diagonal matrix with diagonal elements Sii = Σ_{j=1}^{m+r} Fij, and F ∈ R^{(m+r)×(m+r)} is computed as follows,

    Fij = 0             if i, j ∈ [1, m] or i, j ∈ [m + 1, m + r]
    Fij = G_{i(j−m)}    if i ∈ [1, m], j ∈ [m + 1, m + r]    (8)
    Fij = G_{(i−m)j}    if i ∈ [m + 1, m + r], j ∈ [1, m]
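The construction in Eqs. (6)-(8) can be made concrete in a few lines of numpy. The following sketch is our reading of these equations (the names are ours), producing the weight matrix G, the block matrix F, and the Laplacian L used in tr(HᵀLH).

```python
import numpy as np

def interaction_laplacian(W, c, y_L):
    """Build G (Eq. 6), F (Eq. 8), and L = S - F (Eq. 7) from the user-news
    interactions W (m x n), credibility scores c (m,), and labels y_L (r,);
    the first r columns of W are assumed to be the labeled news pieces."""
    m, r = W.shape[0], len(y_L)
    fake = (1.0 + y_L) / 2.0                 # 1 for fake news (y = 1), 0 for true news
    # Credible users are pulled toward true news, low-credibility users toward fake news.
    G = W[:, :r] * (c[:, None] * (1.0 - fake) + (1.0 - c[:, None]) * fake)
    F = np.zeros((m + r, m + r))             # G sits on the off-diagonal blocks of F
    F[:m, m:] = G
    F[m:, :m] = G.T
    L = np.diag(F.sum(axis=1)) - F           # L = S - F, with S_ii = sum_j F_ij
    return G, F, L

# With H = np.vstack([U, D_L]), the regularizer is np.trace(H.T @ L @ H).
```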
3.4    Publisher-News Relation Embedding
Fake news is often written to convey opinions or claims that support the partisan bias of its publisher. Thus, a good news representation should be useful for predicting the partisan bias of the publisher. We obtain the list of publishers' partisan scores from MBFC³, a well-known media bias fact-checking website. The partisan bias labels are checked with a principled methodology that ensures the reliability and objectivity of the partisan annotations. The labels fall into five categories: "left", "left-center", "least-biased", "right-center", and "right". To further ensure the accuracy of the labels, we only consider news publishers annotated as "left", "least-biased", or "right", and rewrite the corresponding labels as −1, 0, and 1. Thus, we can construct a partisan label vector o for news publishers. Note that we may not obtain the partisan labels of all publishers, so we introduce e ∈ {0, 1}^{l×1} to control the weight of o: if we have the partisan bias label of publisher pk, then ek = 1; otherwise, ek = 0. The basic idea is to utilize the publisher partisan label vector o ∈ R^{l×1} and the publisher-news matrix B ∈ R^{l×n} to optimize the learning of news feature representations. Specifically, we optimize the following objective function,

    min_{D≥0,q} ∥e ⊙ (B̄Dq − o)∥₂² + λ∥q∥₂²    (9)

where B̄ denotes the row-normalized version of B (so that B̄D averages the latent features of the news pieces published by each publisher) and q ∈ R^{d×1} maps these averaged features to the partisan labels.

3.5    Proposed Framework - TriFN
We have introduced how to learn news latent features by modeling different aspects of the tri-relationship. We further employ a semi-supervised linear classifier term as follows,

    min_p ∥DLp − yL∥₂² + λ∥p∥₂²    (10)

where p ∈ R^{d×1} is the weight vector that maps news latent features to fake news labels. With all previous components, TriFN solves the following optimization problem,

    min_{D,U,V,T≥0, p,q}  ∥X − DVᵀ∥F² + α∥Y ⊙ (A − UTUᵀ)∥F²
                          + β tr(HᵀLH) + γ∥e ⊙ (B̄Dq − o)∥₂²
                          + η∥DLp − yL∥₂² + λR    (11)

where R = ∥D∥F² + ∥V∥F² + ∥U∥F² + ∥T∥F² + ∥p∥₂² + ∥q∥₂² is to avoid over-fitting. The first term models the news latent features from news contents; the second term extracts user latent features from their social relationships; the third term incorporates the user-news interactions; the fourth term models the publisher-news relationships; and the fifth term adds a semi-supervised fake news classifier. Therefore, this framework provides a principled way to model the tri-relationship for fake news prediction.

4    AN OPTIMIZATION ALGORITHM
In this section, we present the detailed optimization process for the proposed framework TriFN. If we update the variables jointly, the objective function in Eq. 11 is not convex. Thus, we propose to use alternating least squares to update the variables separately. For simplicity, we use L to denote the objective function in Eq. 11. Next, we introduce the updating rules for each variable in detail.

Update D. Let ΨD be the Lagrange multiplier for the constraint D ≥ 0; the Lagrange function related to D is,

    min_D ∥X − DVᵀ∥F² + β tr(HᵀLH) + γ∥e ⊙ (B̄Dq − o)∥₂²
          + η∥DLp − yL∥₂² + λ∥D∥F² − tr(ΨD Dᵀ)    (12)

where D = [DL; DU] and H = [U; DL]. We rewrite L = [L11, L12; L21, L22], where L11 ∈ R^{m×m}, L12 ∈ R^{m×r}, L21 ∈ R^{r×m}, and L22 ∈ R^{r×r}, and partition X = [XL; XU] accordingly. With E = diag(e), the partial derivative of L with respect to D is as follows,

    (1/2) ∂L/∂D = (DVᵀ − X)V + λD + γ B̄ᵀEᵀ(EB̄Dq − Eo)qᵀ
                  + [βL21U + βL22DL + η(DLp − yL)pᵀ; 0] − ΨD    (13)
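Equation (13) suggests a simple projected-gradient realization of the alternating scheme. The sketch below is our illustration of one update step for D (the Lagrange-multiplier term is absorbed by the nonnegativity projection); it is not the authors' exact solver, and the names and step size are ours.

```python
import numpy as np

def update_D(D, V, X, U, q, p, o, y_L, B_bar, e, L21, L22,
             beta, gamma, eta, lam, step=1e-3):
    """One projected-gradient step on D following Eq. (13).
    Shapes: D (n, d), V (t, d), X (n, t), U (m, d), q (d,), p (d,),
    o (l,), y_L (r,), B_bar (l, n), e (l,), L21 (r, m), L22 (r, r)."""
    r = len(y_L)
    E = np.diag(e)                                    # E = diag(e) masks unlabeled publishers
    grad = (D @ V.T - X) @ V + lam * D                # content reconstruction + regularizer
    resid = E @ (B_bar @ (D @ q) - o)                 # partisan residual on labeled publishers
    grad += gamma * np.outer(B_bar.T @ (E.T @ resid), q)
    # The graph and classifier terms touch only the labeled block D_L = D[:r].
    grad[:r] += beta * (L21 @ U + L22 @ D[:r]) + eta * np.outer(D[:r] @ p - y_L, p)
    return np.maximum(D - step * grad, 0.0)           # projection enforces D >= 0

# The updates for U, V, T, p, and q alternate analogously until convergence.
```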
Figure 4: The performance of early fake news detection on BuzzFeed and PolitiFact in terms of Accuracy and F1 (x-axis: hours since publication, from 12 to 96 and "All"; methods: RST, LIWC, Castillo, RST+Castillo, LIWC+Castillo, and TriFN).

Figure 5: Model parameter analysis for TriFN on BuzzFeed and PolitiFact in terms of F1. Panels: (a) η and γ on BuzzFeed; (b) η and γ on PolitiFact; (c) α and β on BuzzFeed; (d) α and β on PolitiFact.
5.6    Model Parameter Analysis
The proposed TriFN has four important parameters: α and β control the contributions of the social relationships and user-news engagements, γ controls the contribution of the publisher partisan bias, and η controls the contribution of the semi-supervised classifier. We first fix {α = 1e−4, β = 1e−5} and {α = 1e−5, β = 1e−4} for BuzzFeed and PolitiFact, respectively, and then vary η in {1, 10, 20, 50, 100} and γ in {1, 10, 20, 30, 100}. The performance variations are depicted in Figure 5. We can see that: i) when η increases from 0 (which eliminates the impact of the semi-supervised classification term) to 1, the performance increases dramatically on both datasets; these results support the importance of combining the semi-supervised classifier with feature learning; and ii) in general, increasing γ improves performance within a certain region, γ ∈ [1, 50] and η ∈ [1, 50] for both datasets, which eases parameter setting. Next, we fix {γ = 1, η = 1} and {γ = 10, η = 1} for BuzzFeed and PolitiFact, respectively, and vary α, β ∈ {0, 1e−5, 1e−4, 1e−3, 0.01}. We can see that: i) when α and β increase from 0 (which eliminates the social engagements) to 1e−5, the performance increases noticeably, which again supports the importance of social engagements; and ii) the performance tends to increase first and then decrease, and it is relatively stable in [1e−5, 1e−3].
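The sensitivity study above amounts to a simple grid sweep. A minimal sketch of such a sweep is shown below; train_and_eval is a hypothetical placeholder (not part of the paper's code) standing in for training TriFN with the given weights and returning F1.

```python
import itertools

def train_and_eval(alpha, beta, gamma, eta):
    """Hypothetical stub: train TriFN with the given weights and return F1."""
    return 0.0  # replace with a real training/evaluation run

# Mirrors the BuzzFeed setting of Section 5.6: fix alpha and beta, sweep eta and gamma.
alpha, beta = 1e-4, 1e-5
for eta, gamma in itertools.product([1, 10, 20, 50, 100], [1, 10, 20, 30, 100]):
    f1 = train_and_eval(alpha=alpha, beta=beta, gamma=gamma, eta=eta)
    print(f"eta={eta:>3}, gamma={gamma:>3}: F1={f1:.3f}")
```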
6    RELATED WORK
We briefly introduce related work on fake news detection on social media. Fake news detection methods generally focus on using news contents and social contexts [32, 40].

News contents contain clues to differentiate fake from real news. For news content based approaches, features are extracted as linguistic-based and visual-based. Linguistic-based features capture specific writing styles and sensational headlines that commonly occur in fake news content [24], such as lexical and syntactic features. Visual-based features try to identify fake images [9] that are intentionally created, or to capture specific characteristics of the images in fake news. News content based models include i) knowledge-based models, which use external sources to fact-check claims in news content [17, 37], and ii) style-based models, which capture manipulation in writing style, such as deception [7, 27] and non-objectivity [24]. For example, Potthast et al. [24] extracted various style features from news contents to predict fake news and media bias.

In addition to news content, the social context related to news pieces contains rich information to help detect fake news. For social context based approaches, the features mainly include user-based, post-based, and network-based features. User-based features are extracted from user profiles to measure user characteristics and credibilities [4, 14, 34, 39]. For example, Shu et al. [34] proposed to understand user profiles from various aspects to differentiate fake news. Yang et al. [39] proposed an unsupervised fake news detection algorithm that utilizes users' opinions on social media and estimates their credibilities. Post-based features represent users' social responses in terms of stance [10], topics [16], or credibility [4, 36]. Network-based features [29] are extracted by constructing specific networks, such as diffusion networks [14]. Social context models basically include stance-based and propagation-based models. Stance-based models utilize users' opinions towards the news to infer news veracity [10]. Propagation-based models assume that the credibility of a news piece is highly related to the credibilities of the relevant social media posts, to which several propagation methods can be applied [10]. Recently, deep learning models have been applied to learn temporal and linguistic representations of news [11, 30, 35]. Shu et al. [33] proposed to generate synthetic data to augment the training data and improve clickbait detection. It is worth mentioning that we cannot directly compare against the propagation-based approaches, because we assume we only observe user actions, e.g., whether a user posts the news or not. In this case, the propagation signals inferred from text are the same and thus become ineffective.
In this paper, we are, to the best of our knowledge, the first to classify fake news by learning effective news features through the tri-relationship embedding among publishers, news contents, and social engagements.
7    CONCLUSION AND FUTURE WORK
Due to the inherent relationships among publishers, news, and social engagements during the news dissemination process on social media, we propose a novel framework, TriFN, to model the tri-relationship for fake news detection. TriFN can extract effective features from news publishers and user engagements separately, as well as capture their interrelationships simultaneously. Experimental results on real-world fake news datasets demonstrate the effectiveness of the proposed framework and the importance of the tri-relationship for fake news prediction. It is worth mentioning that TriFN can achieve good detection performance in the early stage of news dissemination.

There are several interesting future directions. First, it is worth exploring effective features and models for early fake news detection, as fake news usually evolves very fast on social media. Second, how to extract features to model the intention behind fake news from a psychological perspective needs further investigation. Finally, how to identify low-quality or even malicious users spreading fake news is important for fake news intervention and mitigation.
8    ACKNOWLEDGMENTS
This material is based upon work supported by, or in part by, NSF grant #1614576 and ONR grant N00014-16-1-2257.
REFERENCES
[1] Mohammad Ali Abbasi and Huan Liu. 2013. Measuring User Credibility in Social Media. In SBP. Springer, 441–448.
[2] Hunt Allcott and Matthew Gentzkow. 2017. Social Media and Fake News in the 2016 Election. Technical Report. National Bureau of Economic Research.
[3] Stephen Boyd and Lieven Vandenberghe. 2004. Convex Optimization. Cambridge University Press.
[4] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on Twitter. In Proceedings of the 20th International Conference on World Wide Web. ACM, 675–684.
[5] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785–794.
[6] Robert M Entman. 2007. Framing bias: Media in the distribution of power. Journal of Communication 57, 1 (2007), 163–173.
[7] Song Feng, Ritwik Banerjee, and Yejin Choi. 2012. Syntactic stylometry for deception detection. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2. Association for Computational Linguistics, 171–175.
[8] Matthew Gentzkow, Jesse M Shapiro, and Daniel F Stone. 2014. Media Bias in the Marketplace: Theory. Technical Report. National Bureau of Economic Research.
[9] Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, and Anupam Joshi. 2013. Faking Sandy: Characterizing and identifying fake images on Twitter during Hurricane Sandy. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 729–736.
[10] Zhiwei Jin, Juan Cao, Yongdong Zhang, and Jiebo Luo. 2016. News Verification by Exploiting Conflicting Social Viewpoints in Microblogs. In AAAI. 2972–2978.
[11] Hamid Karimi, Proteek Roy, Sari Saba-Sadiya, and Jiliang Tang. 2018. Multi-Source Multi-Class Fake News Detection. In COLING'18.
[12] Antino Kim and Alan R Dennis. 2017. Says Who?: How News Presentation Format Influences Perceived Believability and the Engagement Level of Social Media Users. (2017).
[13] David O Klein and Joshua R Wueller. 2017. Fake News: A Legal Perspective. (2017).
[14] Sejeong Kwon, Meeyoung Cha, Kyomin Jung, Wei Chen, and Yajun Wang. 2013. Prominent features of rumor propagation in online social media. In 2013 IEEE 13th International Conference on Data Mining (ICDM). IEEE, 1103–1108.
[15] Daniel D Lee and H Sebastian Seung. 2001. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems. 556–562.
[16] Jing Ma, Wei Gao, Zhongyu Wei, Yueming Lu, and Kam-Fai Wong. 2015. Detect rumors using time series of social context information on microblogging websites. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management. ACM, 1751–1754.
[17] Amr Magdy and Nayer Wanas. 2010. Web-based statistical fact checking of textual documents. In Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents. ACM, 103–110.
[18] Brendan Nyhan and Jason Reifler. 2010. When corrections fail: The persistence of political misperceptions. Political Behavior 32, 2 (2010), 303–330.
[19] Rong Pan and Martin Scholz. 2009. Mind the gaps: Weighting the unknown in large-scale one-class collaborative filtering. In KDD'09.
[20] V Paul Pauca, Farial Shahnaz, Michael W Berry, and Robert J Plemmons. 2004. Text mining using non-negative matrix factorizations. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 452–456.
[21] Christopher Paul and Miriam Matthews. 2016. The Russian "Firehose of Falsehood" Propaganda Model. RAND Corporation (2016).
[22] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830.
[23] James W Pennebaker, Ryan L Boyd, Kayla Jordan, and Kate Blackburn. 2015. The Development and Psychometric Properties of LIWC2015. Technical Report.
[24] Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. 2017. A Stylometric Inquiry into Hyperpartisan and Fake News. arXiv preprint arXiv:1702.05638 (2017).
[25] Walter Quattrociocchi, Antonio Scala, and Cass R Sunstein. 2016. Echo Chambers on Facebook. (2016).
[26] Victoria L Rubin, N Conroy, and Yimin Chen. 2015. Towards news verification: Deception detection methods for news discourse. In Hawaii International Conference on System Sciences.
[27] Victoria L Rubin and Tatiana Lukoianova. 2015. Truth and deception at the rhetorical structure level. Journal of the Association for Information Science and Technology 66, 5 (2015), 905–917.
[28] Farial Shahnaz, Michael W Berry, V Paul Pauca, and Robert J Plemmons. 2006. Document clustering using nonnegative matrix factorization. Information Processing & Management 42, 2 (2006), 373–386.
[29] Kai Shu, H. Russell Bernard, and Huan Liu. 2018. Studying Fake News via Network Analysis: Detection and Mitigation. CoRR abs/1804.10233 (2018).
[30] Kai Shu, Deepak Mahudeswaran, and Huan Liu. 2018. FakeNewsTracker: A tool for fake news collection, detection, and visualization. Computational and Mathematical Organization Theory (2018), 1–12.
[31] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media. arXiv preprint arXiv:1809.01286 (2018).
[32] Kai Shu, Amy Sliva, Suhang Wang, Jiliang Tang, and Huan Liu. 2017. Fake News Detection on Social Media: A Data Mining Perspective. KDD Explorations Newsletter (2017).
[33] Kai Shu, Suhang Wang, Thai Le, Dongwon Lee, and Huan Liu. 2018. Deep Headline Generation for Clickbait Detection. In ICDM'18.
[34] Kai Shu, Suhang Wang, and Huan Liu. 2018. Understanding User Profiles on Social Media for Fake News Detection. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). IEEE.
[35] Yaqing Wang, Fenglong Ma, Zhiwei Jin, Ye Yuan, Guangxu Xun, Kishlay Jha, Lu Su, and Jing Gao. 2018. EANN: Event Adversarial Neural Networks for Multi-Modal Fake News Detection. In KDD'18.
[36] Liang Wu and Huan Liu. 2018. Tracing fake-news footprints: Characterizing social media messages by how they propagate. In WSDM'18.
[37] You Wu, Pankaj K Agarwal, Chengkai Li, Jun Yang, and Cong Yu. 2014. Toward computational fact-checking. Proceedings of the VLDB Endowment 7, 7 (2014), 589–600.
[38] Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 267–273.
[39] Shuo Yang, Kai Shu, Suhang Wang, Renjie Gu, Fan Wu, and Huan Liu. 2019. Unsupervised Fake News Detection on Social Media: A Generative Approach. In AAAI'19.
[40] Xinyi Zhou, Reza Zafarani, Kai Shu, and Huan Liu. 2019. Fake News: Fundamental Theories, Detection Strategies and Challenges. In WSDM'19.