
Fundamental Research 5 (2025) 332–346


Review

An overview of fake news detection: From a new perspective


Bo Hu, Zhendong Mao∗, Yongdong Zhang∗
School of Information Science and Technology, University of Science and Technology of China, Hefei 230022, China

Article history: Received 28 January 2023; Received in revised form 17 October 2023; Accepted 21 January 2024; Available online 22 February 2024

Keywords: Fake news detection; Social media; Intentional creation; Heteromorphic transmission; Controversial reception

Abstract: With the rapid development and popularization of Internet technology, the propagation and diffusion of information become much easier and faster. While making life more convenient, the Internet also promotes the wide spread of fake news, which will have a great negative impact on countries, societies, and individuals. Therefore, a lot of research efforts have been made to combat fake news. Fake news detection is typically a classification problem aiming at verifying the veracity of news contents, which may include texts, images and videos. This article provides a comprehensive survey of fake news detection. We first summarize three intrinsic characteristics of fake news by analyzing its entire diffusion process, namely intentional creation, heteromorphic transmission, and controversial reception. The first refers to why users publish fake news, the second denotes how fake news propagates and distributes, and the last means what viewpoints different users may hold for fake news. We then discuss existing fake news detection approaches according to these characteristics. Thus, this review will enable readers to better understand this field from a new perspective. We finally discuss the trends of technological advances in this field and also outline some potential directions for future research.

1. Introduction

With the development of Internet communication technology and the rise of social networks, it has become possible for ordinary people to publish news and make comments online. Although this brings great convenience, it also provides an environment conducive to the creation and spread of fake news. Fake news may have a negative impact on countries, societies, and individuals. As for countries, a large amount of fake news arose during the U.S. 2016 presidential election [1], which might have heavily influenced the election results. As for societies, fake news sometimes emerges with natural disasters and pandemics, such as the Japan earthquake in 2011 [2], Hurricane Sandy in 2012 [3], and the COVID-19 pandemic in 2019 [4], which may cause panic among the public. As for individuals, fake news which claimed that Obama was injured in a blast resulted in a dramatic collapse in the stock market [5], which might damage the property of individuals. In addition, a lot of health misinformation regarding COVID-19 and vaccination was disseminated during the pandemic, which can even harm the physical well-being of deceived individuals.

In the past few years, there have been many attempts to distinguish fake news from real news. Famous social networks like Twitter, Facebook and Weibo have developed anti-rumor centers, which allow users to report and dispel possible fake news. Such a mechanism, to some extent, reduces the negative impact of fake news, but it is inefficient as it relies on extensive manual review and expert knowledge. More importantly, it is unable to detect emerging fake news at an early stage, and thus fails to minimize the damage caused by fake news. To address the above issues, various automatic fake news detection algorithms have been developed, which can detect fake news as early as possible and help stop its viral spread. Early studies mainly focus on designing hand-crafted features, e.g., statistical features [3,6–11], topic features [12,13], lexical features [10,13–16], and syntactic features [17–19], and then training supervised [6,7,18,20–23] or unsupervised [24,25] classifiers to distinguish between fake and real news. Recent studies investigate the effectiveness of propagation patterns for fake news detection, and a variety of propagation tree or propagation graph based models [3,12,14,26] have been proposed and applied successively. More recently, with the development of deep learning [27], a growing number of studies [19,20,28–44] are moving towards exploiting deep neural networks to extract features or model propagation patterns.

There have been previous reviews of fake news detection techniques. For instance, Zubiaga et al. [45] provide an overview of existing research with the goal of developing a classification system of fake news, including four components: detection, tracking, stance classification and veracity classification. Zhou et al. [5] categorize current techniques into knowledge-based methods, style-based approaches, propagation-based algorithms and credibility-based networks. Zhang et al. [46] review the fake news detection mechanisms corresponding to three types of features of fake news: creator/user-based, news content-based and social context-based features.


∗ Corresponding authors.
E-mail addresses: zdmao@ustc.edu.cn (Z. Mao), zhyd73@ustc.edu.cn (Y. Zhang).

https://doi.org/10.1016/j.fmre.2024.01.017
2667-3258/© 2024 The Authors. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC
BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Fig. 1. Three categories of fake news detection approaches based on three characteristics: intentional creation, heteromorphic transmission, and contro-
versial reception. (a) Intentional feature-based approaches first extract features to describe intentions of news messages, and then use these features for classification.
(b) Propagation-based approaches first construct the propagation structures, and then study the structure patterns and information diffusion for veracity evaluation.
(c) Stance-based approaches exploit the stances of different users as clues to facilitate fake news detection.

Shu et al. [47] discuss fake news detection from a data mining perspective, including feature extraction and model construction. Varshney et al. [48] review prior works on the procedure of fake news detection, including social media data collection, preprocessing, feature analysis and detection models. Rohera et al. [49] classify previous works into different categories, such as supervised learning, semi-supervised learning and unsupervised learning. Schlicht et al. [50] propose a taxonomy with six dimensions (inputs, source, topics, types, tasks, and detection methods) to categorize and discuss recent research works on health misinformation detection. Chen et al. [51] provide a systematic review of health misinformation detection from the perspectives of misinformation characterization, detection mechanisms, and intervention efforts made by individuals, organizations and governments.

In this paper, we provide a thorough survey of fake news detection from a brand-new perspective: the intrinsic characteristics of fake news. Specifically, by analyzing the entire diffusion process of fake news, we summarize its three main characteristics, i.e., intentional creation, heteromorphic transmission, and controversial reception, detailed as follows.

(1) Intentional creation. Unlike real news that reports real events, fake news is usually created with intention, e.g., to mislead the public or manipulate opinions [47]. For example, an art painting from 2009 was posted to mislead the public, which indicated that Ronald McDonald was flooded in Hurricane Sandy [52].

(2) Heteromorphic transmission. Real news is usually spread by ordinary users, while fake news tends to be propagated by diverse types of users, which results in heteromorphic transmission patterns. For example, as pointed out by Ma et al. [26], fake news is typically posted by a low-impact user first and then widely spread by opinion leaders, while real news is usually initiated by an opinion leader and spread by normal users.

(3) Controversial reception. People are more likely to hold different views towards fake news than real news. Given a piece of fake news, some people believe it, while others may suspect or deny its veracity. For example, users can express their attitudes through "thumbs up" or "thumbs down" on Facebook [47].

Based on these characteristics, we categorize existing techniques into three groups: intentional feature-based, propagation-based, and stance-based approaches, as shown in Fig. 1.

- Intentional feature-based approaches first extract features to describe the intentions of news messages, and then use these features for classification.
- Propagation-based approaches first construct the propagation structures, and then study the structure patterns and information diffusion for veracity evaluation.
- Stance-based approaches exploit the stances of different users as clues to facilitate fake news detection.

Thus, this survey can enable readers to understand this field from a new perspective, and help reveal the trends of technological advances, which provides insights on how to design effective and explainable detection mechanisms, including:

- Characteristic Selection: Characteristics from all three categories of the diffusion process tend to be utilized together for fake news detection.
- Framework Design: A framework can be designed to capture the characteristics from all three categories for effective fake news detection.
- Result Explanation: Detection results can be explained in a more fine-grained manner, which reveals the key factors of fake news.

The rest of this survey is organized as follows. We first introduce the definition of fake news and briefly describe the fake news detection task in Section 2. Then we discuss the three intrinsic characteristics of fake news and the corresponding detection approaches in Sections 3, 4 and 5, respectively. After that, we present popular evaluation datasets in Section 6, and discussion and future directions in Section 7. Finally, we conclude in Section 8.

2. Definitions

Fake News: There is no widely accepted definition of fake news. In this survey, we follow the narrow definition of fake news used in [47,53]: fake news is intentionally created and verified false. The key distinctive features of fake news are authenticity and intentionality. It is easy to differentiate fake news from other related concepts by these two features. For example, when the authenticity is unverified and the intention is unknown, the concept denotes a rumor [54]; and when the authenticity is false but the intention is not bad, the concept denotes misinformation [5,47]. Note that fake news is a special case of disinformation [5]: fake news is usually limited to news articles, while disinformation includes all kinds of information. In this survey, our main focus is on the work of fake news detection. However, the literature indicates that the detection of rumors and misinformation exhibits characteristics analogous to fake news, and the detection methods can also be applied to fake news detection. Therefore, this paper encompasses representative methods in rumor and misinformation detection for a more comprehensive overview of fake news detection.


Table 1. Features extracted based on intentions of fake news.

Mislead the Public
- Special Symbol Features [3,6,7,9–11]: the number of URLs; the fraction of messages containing a URL; whether a message contains a personal pronoun in 1st, 2nd, or 3rd person; whether a message contains a question mark or exclamation mark; the number of "@" tags in a message.

Manipulate Opinions
- Sentiment Features [10,13–16,20,55]: the numbers of positive and negative emoticons used in a message; average sentiment score of a message; whether a message contains strong negative words; fraction of messages containing negative sentiment and positive sentiment.
- Style Features [56]: style similarity of news messages.

Attract User Attention
- Topic Features [6,13]: the fraction of hashtags (#); LDA-based topic distribution of a message.
- Visual Features [10,17,28–30,55,57–60]: whether a message contains images and videos; whether a user has a profile image; the time delay of an image; image distribution (clarity score, coherence score, etc.); whether an image matches the text; the topic of an image; the semantics of an image extracted by pretrained models.
- Clickbait Features [9,61]: similarity between the headline and top sentences; informality and readability of a message; whether a message contains internet slang or swear words; whether a message uses repeated characters (e.g., ooh, aah, etc.).

General Features
- Temporal Features [10,13–15]: the time difference between a repost and the original message; whether a message has periodic repost/comment spikes; the number of duplicated reposts/comments.
- User Features [6–8,10,11,13,14,62–70]: number of friends, followers, and the ratio of followers to friends; whether a user is a VIP or a verified user; register time, client program type, location, organization, gender; whether social profiles on different social media are linked with each other; whether the profile of a user contains a description, URL, and location; ratio of messages containing event verbs; ratio of messages containing strong negative words; the number of messages at posting time.
- Other Linguistic Features [16–20,37,39,55,57,62–67,71–75]: TF-IDF feature; part-of-speech tagging feature; bag-of-words; named entity recognition feature.

Fig. 2. Examples of fake news with three different types of creation intentions: (a) misleading the public, (b) manipulating opinions and (c) attracting user attention.

Fake News Detection: Given an event x ∈ E, where E is the event set, let M_x = {m_1, m_2, ..., m_N} be the set of event-related news messages. Each message m_i ∈ M_x contains textual descriptions and visual contents regarding event x. Let U = {u_1, u_2, ..., u_K} be the user set, where user information includes user profiles and social relationships. A news message m_i is originally posted by user u_0^i ∈ U at time t_0^i, and is reposted by user u_j^i at time t_j^i with textual content z_j^i about m_i. Thus, each repost is a triplet p_j^i = {u_j^i, t_j^i, z_j^i}, and the set of the original post and all reposts, P_i = {u_0^i, t_0^i, m_i} ∪ {p_j^i}, of message m_i forms a propagation structure. Besides, a user u_k^i may comment on message m_i at time t_k^i with comment text τ_k^i; let C_i = {c_k^i = {u_k^i, t_k^i, τ_k^i}} be the comment set of m_i. The goal of fake news detection is to learn a function f(·) that uses the above information to distinguish whether a message m_i or an event x is fake or not.
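To make this formulation concrete, the following minimal Python sketch encodes the objects above as plain data structures; all class and field names are illustrative assumptions, not notation from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Repost:
    user_id: str   # u_j^i: the reposting user
    time: float    # t_j^i: repost timestamp
    text: str      # z_j^i: textual content attached to the repost

@dataclass
class Comment:
    user_id: str   # u_k^i: the commenting user
    time: float    # t_k^i: comment timestamp
    text: str      # tau_k^i: comment text

@dataclass
class Message:
    event_id: str            # the event x this message m_i reports on
    source_user_id: str      # u_0^i: original poster
    post_time: float         # t_0^i: original posting time
    text: str                # textual description
    image_urls: List[str] = field(default_factory=list)    # visual contents
    reposts: List[Repost] = field(default_factory=list)    # P_i minus source post
    comments: List[Comment] = field(default_factory=list)  # C_i

# A detector is then any function f(.) mapping a message (or all messages
# of an event) to a veracity label, e.g. 1 = fake, 0 = real.
Detector = Callable[[Message], int]
```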
3. Intentional creation

The first discriminative characteristic is that fake news is intentionally created. Intentional feature-based methods are employed, which first extract distinctive features according to the intention, and then perform classification using the extracted features.

3.1. Feature extraction

As shown in Table 1, intentional features of fake news can be categorized into four types: misleading the public, manipulating opinions, attracting user attention, and other general features. Except for the general features, the other three types exhibit different features that reflect the distinct underlying intentions of their creation. In Fig. 2, we provide typical examples for each of them with corresponding feature words or symbols marked in red, and we discuss their differences as follows.

• Mislead the Public: the intention behind this type of fake news is to make people believe that the content of the news is true. Thus, its textual content is quite similar to that of real news, except for subtle differences such as employing special symbols, e.g., personal pronouns, URLs and "@" tags, in order to make the news seem more convincing, as shown in the example of Fig. 2a. Therefore such symbols are extracted as features of this type of fake news.

• Manipulate Opinions: the primary objective of this category of fake news is to manipulate individuals' opinions in order to gain their support for the viewpoints of the fake news. Different from the above category of "Mislead the Public", this type of fake news does not focus on making the news appear more authentic, but rather aims to convey a certain viewpoint, and meanwhile uses emotional words and particular writing styles to incite people to support that viewpoint. As shown in the example of Fig. 2b, it incites people to support "Stop 5G". Thus, sentiment features and style features are captured for the detection of this type of fake news.

• Attract User Attention: this type of fake news aims to increase traffic or click rate, or to create a buzz. Different from the above two categories, this type of fake news heavily relies on headlines or cover images, as shown in the example of Fig. 2c, and therefore hot topics, attractive images or clickbait are often employed. Accordingly, topic features, visual features and clickbait features are extracted for detection.


3.1.1. Mislead the public
This kind of fake news is usually used for commercial purposes, e.g., to make the veracity of a news message indistinguishable for the public so that users are more likely to buy some products. It often employs special symbols, as listed in Table 1, to make it seem more convincing, and these symbols can be extracted as features.

Special Symbol Features: These features focus on capturing special words or characters that are typically used to mislead the public. For instance, Castillo et al. [6] propose to consider the text length of news messages, and whether the message contains question or exclamation marks, because fake news generally has a similar length and uses special marks to mislead users. Gupta et al. and Castillo et al. [3,6] propose to count the number of words, and the use of first, second or third-person pronouns, as features. Liu et al. [7] propose to verify whether the message contains witness phrases, like "I see", "I hear" and so on, because a news message seems more credible if it contains such words. Besides, Gupta et al. [8] and Sahoo et al. [76] consider the external URLs in messages as supportive evidence, and Biyani et al. [9] extract some features from URLs, such as the frequencies of dashes, upper case letters, and commas. Sun et al. [10] propose to consider the influence of messages, and thus the numbers of "@" tags, comments and reposts are calculated. Yang et al. [11] study the location feature of the event described in a message, and show that if the location is a foreign place, the message has a higher probability of being fake than if it is a domestic place.

3.1.2. Manipulate opinions
Fake news is often used to manipulate people's opinions, especially for political purposes. As discussed above, this type of fake news often employs emotional words and particular writing styles, which are captured as sentiment features and style features, respectively.

Sentiment Features: Fake news tends to use emotional or inflammatory words to manipulate users into supporting its viewpoints. Sentiment features [9,10,13–16,18,55,60,77] are extracted to identify such inflammatory speech that contains emotional words or sentences. To extract such sentiment features, many sentiment analysis tools are exploited. For instance, [15,16] utilize a sentiment tool referred to as the Linguistic Inquiry and Word Count (LIWC) to count the number of words in psychologically meaningful categories. On this basis, a large number of sentiment-related statistical features are extracted. For Sina Weibo messages, Sun et al. [10] consider whether a message contains strong negative sentiment words and opinion words. Wu et al. [14] employ the number of positive or negative sentiment words within a message and compute the average sentiment score of the message. For Twitter messages, Ma et al. [13] identify positive or negative words using the MPQA sentiment lexicon and some manually collected frequent emoticons. Sheng et al. [78] design a pattern-based model that extracts features from negation and sentiment words of Weibo and Twitter messages for veracity verification.

Differently, some works find that fake news prefers to use emotionally extreme adverbs or adjectives. Hence, Named Entity Recognition [60] and Part-of-Speech (POS) [18,60] techniques can be employed. To be specific, Hassan et al. [18] exploit the Natural Language Toolkit (NLTK) tagger to extract POS features. They collect 43 POS tags in the corpus, and count the number of words belonging to these tags for each sentence.

Style Features: Fake news is generally written with distinctive styles in order to manipulate opinions. Style features are employed to model the writing styles of messages. In particular, political hyperpartisan news is more likely to be fake news since it tends to manipulate user opinions, and its writing style is different from mainstream news. Potthast et al. [56] analyze the writing style of hyperpartisan news, and reveal that the styles of left-wing and right-wing news are similar to each other, but different from mainstream news. Koppel et al. [79] propose an unmasking scheme to separate hyperpartisan news from mainstream real news, which can be used to detect fake news created with the intention to manipulate opinions. Zhu et al. [80] examine the differences in writing styles between real and fake news articles from eight distinct perspectives, including readability, logic, credibility, formality, interactivity, interestingness, sensationalism, and integrity.

3.1.3. Attract user attention
This kind of fake news is mainly used for commercial or entertainment purposes, e.g., to increase traffic or click rate, or to create a buzz, where topic features, visual features and clickbait features are extracted to differentiate fake news from real news.

Topic Features: Fake news tends to make use of sensational topics to attract users' interest [6], such as divorces or pregnancies of celebrities and flight accidents (e.g. "Flight MH370 lost contact"). Based on this observation, the topics a message relates to can be used as one type of feature for detecting fake news. For example, Ma et al. [13] adopt the topic distribution of a message as the feature, which is calculated with the Latent Dirichlet Allocation (LDA) model [81].
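As a concrete illustration of this feature, the sketch below computes an LDA-based topic distribution per message with scikit-learn; the toy corpus, number of topics and preprocessing are assumptions here, not the setup of Ma et al. [13].

```python
# Sketch: per-message LDA topic distributions as detection features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

messages = [
    "Flight MH370 lost contact over the ocean",
    "Celebrity couple announces divorce after ten years",
    "City council approves new budget for road repairs",
]

# Bag-of-words counts, the standard input to LDA.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(messages)

# Fit a small topic model; each row of `topic_dist` is one message's
# topic distribution, usable directly as a feature vector.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_dist = lda.fit_transform(counts)
print(topic_dist.shape)  # (3, 2): one topic distribution per message
```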
Visual Features: Fake news tends to associate images or videos with the message as visual descriptions, which are much more eye-catching than textual content [59]. Given that forged images/videos [82,83] (produced by manipulation operations such as splicing, copy-move or retouching, or even created by computers) are sometimes attached to fake news, it is intuitive to use forensics-based methods [84–86] to detect such multimedia data. However, these methods might be ineffective on the social web [87], since the forged data usually undergo multiple post-processing operations, such as recompression and filtering, during dissemination, which to some extent destroys the forensic traces. In this paper we focus on visual features designed particularly for fake news detection, which are categorized as visual statistical features and visual semantic features. The former focus on statistical distributions, the latter on the semantics of visual contents.

Visual statistical features can be extracted from associated images or videos to detect fake news. For example, Sun et al. [10] propose to detect outdated images as a clue of fake news. They start a query for the image using an image search engine to retrieve all the records from the internet, sort the search results chronologically, and take the oldest entry as the original publish time of this image. If the time span (i.e. the time difference between the posting time of the news and the original publish time of this image) is bigger than a predefined threshold, the image is considered outdated, and the corresponding message is more likely to be fake news.

Jin et al. [59] propose five visual statistical features to measure image distribution: visual clarity score, visual coherence score, visual similarity distribution histogram, visual diversity score, and visual clustering score. In this work, related news messages about the same event are grouped together for event-level fake news detection. They observe that a real event usually has images from different sources, and its image distribution tends to be general, while a fake event usually has limited sources of images and its image distribution tends to be distinct from the average. Based on this observation, statistical scores are designed to model the image distributions of events for fake news detection.

Visual semantic features aim to detect fake news by examining the coherence of visual content, textual content and event at the semantic level. In particular, fake news tends to attach pictures to increase its credibility; nevertheless, such pictures are usually irrelevant to the news event, as shown in [52]. To detect fake news with images, Sun et al. [10] first use the attached image as a query to retrieve similar pictures from the search engine, returning a set of websites that are ranked based on their credibility. The text messages are crawled from the top-ranked websites. Then, the Jaccard coefficient is calculated between the text of the news message and the text crawled above. If the value of the Jaccard coefficient is low, the news message is considered text-image unmatched fake news.

Additionally, the visual content is useful for clustering messages into groups, so that group-level fake news detection can be performed. Specifically, Jin et al. [17] propose to cluster messages with the same image or video into a group; the features of messages in the same group are aggregated for group-level fake news detection. Aside from this, some works [28–30,55,58,88] propose to employ deep neural networks to extract visual semantic features. For example, Jin et al. [28] propose a multimedia fusion network, including a visual sub-network with VGG-19 [89] as a backbone to extract a 512-dimensional visual representation. The extracted visual representation is then concatenated with the textual representation for fake news detection. Furthermore, Qi et al. [58] propose to incorporate features from the frequency and pixel domains of images to detect fake news.
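The following sketch illustrates the Jaccard-based text-image consistency test described above; the reverse image search and web crawling steps of Sun et al. [10] are abstracted into a hypothetical `crawled_text` input, and keyword extraction is simplified to word tokens.

```python
# Sketch of a text-image consistency check in the spirit of Sun et al. [10].
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def text_image_matched(message_text: str, crawled_text: str,
                       threshold: float = 0.2) -> bool:
    # Token sets stand in for the keyword extraction used in the paper.
    m = set(message_text.lower().split())
    c = set(crawled_text.lower().split())
    return jaccard(m, c) >= threshold  # low coefficient -> suspected mismatch

# Prints False: the image's provenance text barely overlaps the message,
# so the message is flagged as text-image unmatched.
print(text_image_matched(
    "ronald mcdonald statue flooded in hurricane sandy",
    "art painting of a flooded fast food restaurant from 2009"))
```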


Clickbait Features: Fake news tends to use sensational headlines to induce users to click on a particular web page [9], e.g. "Beers Americans No Longer Drink". This kind of news message is less formal and more readable than professionally written ones. To detect clickbait, Biyani et al. [9] extract statistical informality/readability features to differentiate clickbait, such as whether the message contains internet slang or swear words, whether it uses repeated characters (e.g., ooh, aah, etc.), and the similarity between the headline and top sentences. Besides, they further design indicative scores at the informality level and readability level, which are computed as follows:

- Coleman-Liau score (CLScore) [90]: measures reading difficulty empirically, computed as

  CLScore = 0.0588 · L − 0.296 · S − 15.8    (1)

  where L denotes the average number of letters per 100 words, and S denotes the average number of sentences per 100 words.

- RIX and LIX indices [91]: indicate readability, computed as

  RIX = LW / S and LIX = W / S + (100 · LW) / W    (2)

  where W is the word count, LW is the long word (i.e. over 6 characters) count, and S is the sentence count.

- Formality measure (fmeasure) [92]: measures formality by counting different part-of-speech tags in the article, such as nouns, verbs and adjectives.
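These scores translate directly into code; the sketch below implements Eqs. (1) and (2) as written, with the letter/word/sentence counting left as simple assumptions.

```python
# Direct implementation of the indicative scores in Eqs. (1)-(2).
def cl_score(letters: int, sentences: int, words: int) -> float:
    L = 100.0 * letters / words      # average letters per 100 words
    S = 100.0 * sentences / words    # average sentences per 100 words
    return 0.0588 * L - 0.296 * S - 15.8

def rix_lix(words: list[str], sentence_count: int) -> tuple[float, float]:
    W = len(words)
    LW = sum(1 for w in words if len(w) > 6)  # long words: over 6 characters
    rix = LW / sentence_count
    lix = W / sentence_count + 100.0 * LW / W
    return rix, lix

text = "Beers Americans no longer drink revealed in shocking countdown".split()
print(cl_score(letters=sum(map(len, text)), sentences=1, words=len(text)))
print(rix_lix(text, sentence_count=1))
```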
Besides the above indicative scores, the style of structuring clickbait headlines can also be used for detection. One such style is the forward-reference [93], where headlines include teasers or an obvious information gap between the headline and the article. For example, given the headline "This Is the Most Horrifying Cheating Story", users might wonder what "This" is, and hence click the web page. Biyani et al. [9] show that forward-references usually feature demonstrative pronouns, personal pronouns, adverbs and definite articles, which can be used for clickbait detection.

3.1.4. General features
In addition to the intention-specific features mentioned above, there are some general features applicable to all kinds of purposes. We categorize such general features into temporal features, user features and other linguistic features.

Temporal Features: After deliberately creating fake news, spammers tend to make it as popular as possible. Thus, the spread of fake news differs from that of real news. Kwon et al. [15] extract temporal features of the news spread process, and observe that fake news usually has multiple, periodic spikes in the number of reposts/comments, while real news typically has a single prominent spike. Similarly, Wu et al. [14] propose to compute the repost time feature, which is the average time difference between the original message and the reposts. Considering that spammers repeatedly repost and comment with similar content, Sun et al. [10] propose to compute the number of duplicated reposts/comments as a feature. They measure the similarity between two reposts/comments by computing the Jaccard coefficient of their keywords. Reposts/comments are considered duplicated as long as their similarity exceeds a predefined threshold.
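A minimal sketch of this duplicated-repost feature: reposts are reduced to keyword sets (here simply lowercase word tokens, an assumption) and pairs whose Jaccard coefficient exceeds a threshold are counted as duplicates.

```python
import re
from itertools import combinations

def keywords(text: str) -> set:
    # Simplified keyword extraction: lowercase alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def num_duplicated(reposts: list[str], threshold: float = 0.8) -> int:
    sets = [keywords(r) for r in reposts]
    return sum(1 for a, b in combinations(sets, 2)
               if jaccard(a, b) >= threshold)

reposts = ["So shocking, must share!", "so shocking must share", "is this real?"]
print(num_duplicated(reposts))  # -> 1 near-duplicate pair
```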
User Features: Users who intentionally distribute fake news may behave differently. In the literature, user features can be extracted from two perspectives: social reputation and personal information.

- Social reputation: Famous users are less likely to create fake news. Sometimes, spammers pretend to be famous, e.g. by using nicknames similar to experts, in order to make their posts seem more credible. To counter this, users' social engagements can be used to evaluate their social reputations. For a given user, Gupta et al. [8] propose to consider the number of friends, followers, and statuses, and whether the user is verified or trusted by the social media platform. These features indicate whether the user is popular and convincing. Sun et al. [10] propose that if a user has few followers but many followees, he is likely to be a spammer. Besides, they study the proportions of user-posted messages that contain strong negative words and event-related verbs (i.e. verbs usually used for event description rather than for daily life); a larger proportion gives a higher probability of this user being a spammer. The study conducted by [75] analyzed how users behave on Twitter when promoting false cancer treatments, and then suggested a user-focused model to identify those who are likely to spread health misinformation; this model extracts various user behavior features, such as attitudes, writing styles, and sentiments expressed in their posts on social media. Ghenai et al. [68], Zhao et al. [69] and Prasannakumaran et al. [70] study the features of user behavior and engagement in health misinformation dissemination, which can be exploited for health misinformation detection. For example, Ghenai et al. [68] propose a user-centric model that identifies users likely to spread health misinformation by extracting user features such as attitudes, writing styles, and sentiments expressed in their posts.

- Personal information: Spammers tend to hide their real information, i.e. their personal information is incomplete. Gupta et al. [8] consider that spammers might have registered recently and that their personal information is usually incomplete. Hence, they check the register time, personal descriptions, URLs, profile images and locations. They also check whether the profiles on different social media are linked with each other, since normal users usually link them for convenience, while spammers do not. Liu et al. [62] indicate that such personal information is the most significant factor for early fake news detection. The consistency of tweet location, profile location and event location is also indicative [7]. Yang et al. [11] find that the client program type, which includes PC-client and mobile-client programs, is particularly useful in detecting fake news on Sina Weibo: if a news message refers to an event that happened abroad and is published from a PC-client program, it is more likely to be fake news. Dou et al. [67] propose that users' historical posts on social media reflect their personalities, sentiments and stances, which can be used to detect fake news.

Other Linguistic Features: Linguistic features are widely used for fake news detection [16–19,32,55,62,74]. For example, Hassan et al. [17,18] propose to count TF-IDF, a numerical statistic that reflects the importance of each word in a sentence. This can help to analyze words that are frequently used in fake news but rarely in real news, such as "amazing", "poisonous" and "mortal". Chen et al. [19] also propose to extract TF-IDF. They first build a dictionary of the K most frequent vocabulary terms from a message set, and then compute the TF-IDF for these terms. Each message is encoded as a vector of TF-IDF values, where the value is 0 if a word never appears in the message. Volkova et al. [32] propose to use a pretrained model to extract GloVe embeddings of message texts. Verónica et al. [16] derive a set of rules based on context-free grammar (CFG) trees using the Stanford Parser, which comprises all the lexicalized production rules. These rules are combined with parent and grandparent nodes, and then encoded as TF-IDF features. Likewise, bag-of-words [72], part-of-speech tagging [18,71], and named entity recognition [73] are also employed to analyze keywords in messages for fake news detection.
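A small sketch of this TF-IDF encoding with scikit-learn, where `max_features` plays the role of the K most frequent vocabulary terms; the corpus and K are toy assumptions.

```python
# Sketch of the TF-IDF encoding described for Chen et al. [19]:
# a vocabulary of the K most frequent terms, with absent words as 0.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "amazing miracle cure doctors hate it",
    "city reports quarterly budget figures",
    "amazing poisonous chemical found in tap water",
]
vectorizer = TfidfVectorizer(max_features=1000)  # K = 1000
X = vectorizer.fit_transform(corpus)  # one sparse TF-IDF vector per message
print(X.shape)
```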


3.2. Detection methods

Based on the above extracted features, classification algorithms can be applied to detect fake news. Existing methods range from traditional machine learning approaches to recent neural network based approaches.

3.2.1. Traditional machine learning methods
After extracting features based on intentional creation, proper features and classifiers are selected for fake news detection. Feature selection methods aim to reduce the feature dimension and retain informative features; they include the GINI index, information gain and random forests. For example, [6,7,18] use the GINI index to investigate the importance of features in constructing a decision tree. Castillo et al. [6] find that sentiment features and the numbers of tweets, friends and re-tweets are prominent in fake news detection. Kwon et al. [15] use random forest and logistic models to find informative features. Specifically, 2-fold cross-validation is conducted repeatedly, and features are sequentially removed from the feature set in order to find the most important ones. Biyani et al. [9] exploit information gain to rank the features, and discard features with zero information gain. Shushkevich et al. [94] analyze the word frequencies of COVID-19 fake news, which can be used as features of such health misinformation. Given the selected features, many machine learning methods can be used to perform fake news classification, like Decision Trees, Gradient Boosted Decision Trees (GBDT) [21], Logistic Regression, the Max-Entropy classifier [20], and SVMs with different kernels.
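The select-then-classify pipeline can be sketched as follows on synthetic feature vectors, using mutual information as the information-gain criterion and gradient boosted trees as the classifier; the data and the zero-gain cutoff are illustrative assumptions.

```python
# Sketch: rank features by information gain, drop zero-gain features,
# then fit a gradient boosted classifier (cf. GBDT [21]).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 10))                   # 200 messages x 10 hand-crafted features
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)   # toy fake/real labels

gain = mutual_info_classif(X, y, random_state=0)
keep = gain > 0                             # discard zero-information-gain features
clf = GradientBoostingClassifier().fit(X[:, keep], y)
print(keep.sum(), clf.score(X[:, keep], y))
```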
3.2.2. Neural network based methods
Inspired by recent advances in deep learning, many deep neural network based approaches have been proposed. Based on their network structures, such approaches can be categorized into Recurrent Neural Network (RNN) based methods and Convolutional Neural Network (CNN) based methods.

RNN-based Methods: Many RNN-based methods [19,20,31,34–37,43,44,75,95] are proposed to capture the variation of fake news over time. Ma et al. [31] propose to identify fake news using an RNN, which learns long-distance dependencies of information. For a given event, all related news messages are split into groups based on time intervals, and the top-K TF-IDF values of the vocabulary terms in a group are calculated as the input of each RNN unit. The final output of the RNN is used for fake news classification. Rashkin et al. [20] propose a Long Short-Term Memory (LSTM) model that takes the sequence of words as input, and predicts the reliability of the news into different categories, i.e. trusted, satire, hoax, or propaganda. Ruchansky et al. [34] propose a hybrid deep model that combines the textual content, user response (i.e. reposts/comments) and source user for more accurate fake news detection. The hybrid model consists of three key modules: capture, score, and integrate. The capture module utilizes an LSTM to capture the textual and temporal patterns of user responses. The score module learns the user representations and assigns a score to each user. These two modules are further integrated in the third module to perform classification.

Various frameworks are proposed to classify fake news in a more fine-grained manner. Wen et al. [35] propose to utilize external cross-lingual and cross-platform features extracted by a gated recurrent unit (GRU), which captures the agreement between news and corresponding comments from different social media and languages, and then performs fake news classification. Some works focus on attending to particular distinct features for classification, e.g. emotional words and provocative sentences. To achieve this goal, the attention mechanism [19,36,37] is employed. For example, Chen et al. [19] apply the soft attention mechanism to an RNN, which can simultaneously focus on particular distinct features and capture contextual variations of the message over time. Instead of attending to key features, some works focus on attending to key sentences. De Sarkar et al. [37] propose a hierarchical attention model for satirical news detection. Specifically, this model first takes the word embeddings as input of an RNN to extract the embedding of a sentence, while all sentence embeddings are attentively merged to obtain an article-level embedding that is used for classification. The experimental results show that a few key sentences, especially the last sentence of an article, play more important roles in satirical news detection.
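As a minimal illustration of the RNN-based family, the sketch below classifies an event from a sequence of per-time-step feature vectors (e.g. the grouped TF-IDF inputs of Ma et al. [31]) with a GRU; all dimensions, and the absence of a training loop, are simplifying assumptions.

```python
# Minimal PyTorch sketch of an RNN-style detector: a GRU reads a sequence
# of per-time-step feature vectors and its last state is classified.
import torch
import torch.nn as nn

class RnnDetector(nn.Module):
    def __init__(self, feat_dim: int = 100, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # fake vs. real logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(x)            # h: (1, batch, hidden) final state
        return self.head(h.squeeze(0))

model = RnnDetector()
x = torch.randn(8, 12, 100)  # batch of 8 events, 12 time steps each
print(model(x).shape)        # torch.Size([8, 2])
```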
CNN-based Methods: Many CNN-based methods [38–41,62,96] are proposed to detect fake news. Yu et al. [38] split the news messages about an event into chronological groups, and a representation vector is learned for each group by paragraph vector methods [97]. The group of vectors forms a matrix as the input of a CNN, which automatically extracts local-global features and learns high-level interactions of latent features. Additionally, external sources can be mined to facilitate fake news detection. For instance, Karimi et al. [39] present a multi-source and multi-class detection model that incorporates information from different sources to increase the ability to discriminate degrees of fakeness, including True, Mostly-True, Half-True, and Barely-True. In this process, features of the message text from each source are extracted; then, an attention mechanism is used to fuse these features, and the result is used for multi-class classification.

Qian et al. [40] propose to utilize the historical user responses to previous articles as auxiliary information to perform early fake news detection. The overall framework consists of a User Response Generator (URG) and a Two-Level Convolutional Neural Network (TCNN). The URG learns a generative model of user responses to true and fake news based on their historical responses, and the TCNN learns features of the news at both the word level and the sentence level. The two modules are finally fused to perform classification. Wang et al. [98] consider the scenario where users can report to the social media platform whether a news message is fake or not, even though such reports may be noisy. They exploit these reports as weak annotations and use a CNN for feature extraction.

4. Heteromorphic transmission

The second discriminative characteristic is that fake news is heteromorphically transmitted, i.e., the propagation structure of fake news is different from that of real news. For example, the propagation tree of fake news is typically deeper and wider [5]. As for health misinformation, its propagation tends to form more local clusters than real information, resulting in unique propagation patterns that can be used for detection [99]. Such heteromorphic propagation structure is caused by (a) the intentional nature of users during fake news propagation, e.g., spammers and bots are paid to propagate fake news for commercial or political purposes, and (b) the varying nature of messages during fake news propagation, e.g., comments with incendiary language or related to hot topics. Yang et al. [100] propose a subgraph reasoning mechanism for fake news detection, which aims to identify the most significant subgraphs within the news propagation network, as they play a crucial role in verifying the authenticity of the news.

All these factors not only promote the propagation, but also result in different propagation characteristics of fake news. For example, Castillo et al. [6] propose to use the maximum or average depth of the propagation graph, the degree of the root, the maximum/average degree of the graph, etc. to detect fake news. Kwon et al. [15] propose to extract features from three types of networks: the friendship network, the largest connected component of the friendship network, and the diffusion network (i.e. the propagation tree for a given message). They show that if the fraction of information flow from low- to high-degree nodes is large, or the fraction of singletons in the diffusion network is large, the message is likely to be fake.
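Such structural features are straightforward to extract once the propagation tree is built; the sketch below computes a few of them with networkx on a toy repost tree (the edges and the feature choices are illustrative, not the exact feature set of [6,15]).

```python
# Sketch: depth and degree statistics of a repost tree as detection features.
import networkx as nx

tree = nx.DiGraph([("src", "u1"), ("src", "u2"), ("u1", "u3"), ("u3", "u4")])
root = "src"

depths = nx.shortest_path_length(tree, source=root)
features = {
    "max_depth": max(depths.values()),
    "avg_depth": sum(depths.values()) / (len(depths) - 1),  # mean over non-root nodes
    "root_degree": tree.out_degree(root),
    "max_degree": max(d for _, d in tree.out_degree()),
}
print(features)
```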
Therefore, the propagation structure can be used to detect fake news. According to the way in which the propagation structure is constructed, existing propagation-based methods can be categorized into three groups, as shown in Fig. 3.


Fig. 3. According to the way in which the propagation structure is constructed, existing propagation-based methods can be categorized into three groups:
(a) message-based propagation structure, (b) user-based propagation structure, and (c) hybrid propagation structure.

Table 2. Propagation-based methods for fake news detection.

Message-based
- News hierarchical propagation structure with event, sub-event and message layers → credit propagation mechanism for entity credibility evaluation [12].
- Message propagation patterns/graphs → kernel-based classification methods to detect fake news propagation patterns [14,26]; GCN-based method to learn propagation and dispersion representations of the graph for detection [74].

User-based
- Topology features of propagation structures, such as the fraction of isolated nodes and the fraction of news diffusion from low-degree to high-degree nodes → random forest based detection method [15].
- User propagation structure with user profiles along the propagation path, such as follower counts, verification status, etc. → RNN- and CNN-based methods to capture the global and local features of the user propagation path for detection [102,103]; GAT-based method to learn the representation of the user propagation structure for detection [66].

Hybrid
- Event-message-user relationships, with the assumption that credible entities link with each other with high weights → credit propagation mechanism for entity credibility evaluation [8].
- Publisher-news, news-user and user-user interactions, including news publishing, user reposting and user following, etc. → representation learning based method to learn publisher, user and news representations based on their interactions for prediction [104].
- Propagation graph structure with nodes such as publishers, news messages and users → CNN-based method using the representations of users' profiles and their reposted texts along the propagation path as the input of a CNN classifier for detection [62]; GNN-based methods to learn the node representations in the propagation graph for detection [63–65].

• Message-based Propagation Structure: The propagation structure is modeled as a network where the nodes correspond to messages, and the edges correspond to pairwise relations between messages. The node information is the content of the corresponding message, such as the text, image, topic, sentiment score, etc. The edge information can be different types of message relations, such as being sourced from, being part of, or belonging to the same event.

• User-based Propagation Structure: The propagation structure is modeled as a network where the nodes correspond to users, and the edges correspond to pairwise interactions between users. The node information consists of the attributes of the corresponding user, including whether the user is authenticated, the number of followers, the number of friends, etc. The edges, or user interactions, can be reposting, commenting, replying, following, unfollowing, liking, and so on.

• Hybrid Propagation Structure: The propagation structure is modeled as a network where the nodes correspond to messages or users, and the edges correspond to the relations or interactions between two nodes. Typically, the relation between a user and a message is reposting or commenting. In this kind of method, user-message, user-user and message-message relations are fully exploited. By integrating these various relations, the credibility of users and messages can be predicted in a mutually reinforcing manner. This kind of method has become popular in recent years.

Based on the above three propagation structures, we summarize existing works in three categories, as shown in Table 2, and discuss them respectively below.

4.1. Message-based propagation structure

Many traditional machine learning based methods have been proposed to model the message-based propagation structure. The most common strategy is kernel-based approaches, which capture the propagation patterns and differentiate fake news from real news. Wu et al. [14] propose a random walk graph kernel and a normal radial basis function (RBF) kernel to capture the high-order propagation patterns of a message. In this process, the random walk graph kernel is used to measure the similarity between different propagation trees, and the RBF is applied to measure the distance between the features of different messages. Similarly, Ma et al. [26] propose a propagation tree kernel to detect fake news by comparing the similarity between the propagation trees of different news.


Besides, some works model the propagation structure at a fine-grained level. Jin et al. [12] propose a hierarchical model that consists of three layers: event, sub-event and message layers. For a given news event, in a bottom-up manner, all related messages are first clustered into sub-events with the single-pass incremental clustering algorithm, and then all sub-events link to the news event, each of which reflects one point of view of this event. Each entity of the network is assigned a credibility value, which propagates through the network. An iterative graph-based optimization algorithm is proposed to calculate the final event credibility.

In recent years, a lot of research effort has been devoted to graph neural networks (GNN) [101], which can capture information diffusion in a graph and learn high-level representations of entities in networks for downstream applications/tasks. Thus, GNNs are also suitable for modeling the news propagation structure in social media. Bian et al. [74] propose a bi-directional GNN, which incorporates both the propagation and dispersion patterns of rumors. This model can extract features along both top-down and bottom-up paths. The top-down directed graph is leveraged to learn the propagation patterns of fake news, and the bottom-up directed graph is exploited to learn the dispersion patterns. Then, the learned representations are pooled and merged through multiple fully connected layers to make predictions.
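A minimal single-layer graph convolution over a propagation graph, sketching the idea behind such GNN models; the bi-directional design, pooling and training of Bian et al. [74] are omitted, and all dimensions and edges are illustrative.

```python
# One-layer graph convolution over a toy propagation graph.
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Normalized neighborhood averaging, then a linear map.
        adj = adj + torch.eye(adj.size(0))   # add self-loops
        deg = adj.sum(dim=1, keepdim=True)
        return torch.relu(self.lin(adj @ x / deg))

num_nodes, feat_dim = 5, 16
x = torch.randn(num_nodes, feat_dim)         # node features (posts/reposts)
adj = torch.zeros(num_nodes, num_nodes)
for src, dst in [(0, 1), (0, 2), (1, 3), (3, 4)]:  # repost edges
    adj[src, dst] = adj[dst, src] = 1.0

h = GraphConv(feat_dim, 8)(x, adj)
graph_repr = h.mean(dim=0)   # pooled graph representation for a classifier
print(graph_repr.shape)      # torch.Size([8])
```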
4.2. User-based propagation structure

A few works focus on modeling the propagation structure based on users and their interactions, which can be effective for the early detection of fake news. Lin et al. [102] model each user's characteristics as a vector of values, including the length of the user name, follower count, registration age, etc. For a given news message, its propagation path is represented as the sequence of users who repost it at different time instances. Thus, such a path can be modeled as a sequence of user vectors, which are then fed to both an RNN-based model and a CNN-based model to capture the global and local features of the propagation path. The learned features are then concatenated for classification. Similarly, Wu et al. [103] propose a TraceMiner model that also takes the user repost sequence as the propagation structure of a message. The embeddings of users are inferred from social network structures, and an LSTM-RNN is employed to model the propagation structures of fake news.

Ni et al. [66] propose an MVAN model that uses both the news semantic features and the user-based propagation structure features for fake news detection. The textual content of the news is encoded by a Bi-GRU network, while the user propagation structure is modeled by a graph attention network (GAT). Their experimental results show that the key users in the propagation structure of fake news usually registered recently, with no certification, a nearly empty profile and very few followers.

4.3. Hybrid propagation structure

The hybrid propagation structure based approaches jointly consider users, messages and their interactions for fake news detection. For example, Gupta et al. [8] study the user-message-event structure and propose BasicCA, a credibility propagation method using a PageRank-like approach. In this model, three types of relationships are considered: (1) User-Message relationship: credible users are more likely to provide credible messages. (2) Message-Event relationship: the average credibility of messages associated with credible events should be higher than that of messages associated with non-credible events. (3) Event-Event relationship: events that share a large number of common words and topics should obtain similar credibility. Based on this model, the credibility of users, messages and events can be jointly evaluated.
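This PageRank-like propagation can be sketched as an iterative update over a joint user-message-event graph; the adjacency, prior scores and damping factor below are illustrative assumptions, not the exact BasicCA formulation.

```python
# Sketch: iterative credibility propagation in the spirit of BasicCA [8].
import numpy as np

def propagate_credibility(adj: np.ndarray, init: np.ndarray,
                          alpha: float = 0.85, iters: int = 50) -> np.ndarray:
    """adj[i, j] = 1 links entity i (user/message/event) to entity j."""
    # Row-normalize so each entity averages its neighbors' credibility.
    row_sums = adj.sum(axis=1, keepdims=True)
    trans = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    cred = init.copy()
    for _ in range(iters):
        cred = alpha * trans @ cred + (1 - alpha) * init
    return cred

# 2 users, 2 messages, 1 event; links: u0-m0, u1-m1, m0-e0, m1-e0.
adj = np.zeros((5, 5))
for i, j in [(0, 2), (1, 3), (2, 4), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
init = np.array([0.9, 0.1, 0.5, 0.5, 0.5])  # prior credibility scores
print(propagate_credibility(adj, init).round(3))
```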
Based on social and psychological studies that reveal the confirmation bias effect and echo chamber effect in social media, Shu et al. [104] propose the TriFN framework to exploit the tri-relationship among news publishers, news, and users, which reflects their direct interactions, such as news publishing, user reposting and user following. With the observations that publisher partisan bias is correlated with news veracity, and that users tend to connect with like-minded peers and repost news that confirms their existing perceptions, the TriFN framework is designed to learn publisher, user and news representations based on such interactions, which can be used for fake news detection.

Liu et al. [62] propose to incorporate the repost content along with the user propagation path for the early detection of fake news. They propose a neural network based method, where a Text-CNN block and a user profile embedding block are used to extract the text and user features, respectively. The vector representations of users and repost texts along the propagation path form a matrix, which is then fed to a CNN classifier for fake news detection. In addition, a PU-Learning mechanism is exploited to address the unlabeled and imbalanced training data problem.

Huang et al. [63] propose to incorporate both the user interactions and the news propagation structure for fake news detection. On one hand, users who post and repost the same news message are considered to be fully connected in an undirected graph, and a GNN is used to encode user representations. On the other hand, the reposts along the news propagation path are encoded by a Recursive Neural Network (RvNN). Finally, the encoded results of the two networks are fused for classification. Similarly, Nguyen et al. [64] propose a Factual News Graph (FANG), which models social interactions such as user following, news posting, reposting, and source media hyperlinking as edges in the graph, and learns the representation of the target news message for fake news detection.

For a given news message, Lu et al. [65] exploit its message text, user propagation sequence, and users' profiles to verify its veracity. They propose a Graph-aware Co-Attention Network (GCAN) model, using both a GRU and a GNN to learn the representations of the message text and users, respectively. An attentive mechanism is jointly designed to provide reasonable explanations. The experimental results show that some evidential words (such as "breaking" and "strict") and user profile factors (such as account creation time and user description length) have higher attentive weights for fake news detection. Yu et al. [105] propose to construct a heterogeneous graph to capture the relationships between source posts, comments, and users, and employ an attention-based mechanism to aggregate multi-type information for news verification.

In terms of health misinformation detection, Min et al. [106] formulate misinformation detection as a graph classification task and model the message-message, user-user and message-user interactions on a social network as a heterogeneous graph. Cui et al. [107] first extract the context information, such as news publishers and engaged users, and the temporal information of user interactions, and then formulate such information as meta-paths in the propagation structure for misinformation detection. Paraschiv et al. [108] propose an approach that combines user-based, network-based, and content-based features of health misinformation with a unified meta-graph structure.

5. Controversial reception

The third discriminative characteristic of fake news is controversial reception. In social media, users have different viewpoints and comments on an event or a news message, such as supporting, denying or questioning. As discussed in [75], users tend to hold opposed stances rather than the same stance towards fake news, which is crucial for fake news detection. Therefore, recent works propose stance-based approaches that utilize the user viewpoints towards a news message to infer its veracity. User stance can be summarized as two types, explicit stance and implicit stance, as shown in Table 3.

• Explicit Stance Based Method utilizes the user stance as an explicit label for fake news detection. The stance label can come either from external annotations or from statistical data, like the numbers of "thumbs up" and "thumbs down".


Table 3
Stance based methods for fake news detection.

| Stance | Characteristics | Methods | Reference |
| --- | --- | --- | --- |
| Explicit stance | News propagation graph with stance labels | Pattern match based method to identify the graph pattern of fake news | [112] |
| Explicit stance | Users' comments and reposts | Multi-task GRU based scheme to detect users' stances from their comments/reposts, and predict news veracity in the same framework | [75] |
| Explicit stance | Users' opinions inferred from user behaviors, such as like, comment or repost | Bayesian network based method that treats the credibility of both news messages and users as random variables, evaluated by a Gibbs sampling based scheme | [24] |
| Implicit stance | News messages posted by users regarding the same event | Event credibility evaluation by clustering related messages into conflicting viewpoints | [115] |
| Implicit stance | Comments, reposts, and personal information of users in the news propagation graph | GNN based method that learns the representations of users' stances from their comments/reposts and personal information | [57,67] |
5.1. Explicit stance based method

Hanselowski et al. [33] propose to detect user stances as the first step towards fake news detection in FNC-1 [109]. Nguyen et al. [64] propose to use fine-tuned pre-trained models, such as BERT [110] or RoBERTa [111], to classify users' comments into four stance categories: neutral support (with neutral sentiment), negative support (with negative sentiment), deny, and report (i.e., repost with no comment). Similarly, Wang et al. [112] adopt sentiment analysis techniques to retrieve user attitudes towards a news message, with three types of labels: SUPPORT, DENY and QUESTION. The stances of users in the propagation path form a labeled graph, and by performing a graph-based pattern matching algorithm, distinctive patterns can be found for fake news detection. To tackle health misinformation, Hossain et al. [113] propose to detect the stances of tweets regarding specific known misconceptions with BERT and NLI (natural language inference) models, and the tweets that agree with the above misconceptions are identified as misinformation.
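The misconception-matching step can be prototyped with an off-the-shelf NLI model. The sketch below is a hedged illustration rather than the exact pipeline of [113]; the checkpoint name, claim and tweet are assumptions.

```python
from transformers import pipeline

# Any NLI-style model works with the zero-shot pipeline; roberta-large-mnli
# is used here purely for illustration.
nli = pipeline("zero-shot-classification", model="roberta-large-mnli")

misconception = "5G networks spread the coronavirus."
tweet = "A new tower went up nearby and now everyone is sick. 5G causes COVID!"

result = nli(
    tweet,
    candidate_labels=["agree with", "disagree with", "take no stance on"],
    hypothesis_template="This text seems to {} the claim: " + misconception,
)
# A tweet whose top-ranked stance is agreement echoes the known
# misconception and would be flagged as potential misinformation.
if result["labels"][0] == "agree with":
    print("potential misinformation:", tweet)
```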
Ma et al. [75] propose to treat stance classification and fake news detection jointly in a multi-task learning scheme, where the two tasks can be boosted in a mutually reinforced manner. For example, in the proposed architecture, a GRU layer can be shared by both the stance classification and the fake news detection task. Specifically, comments correlated to a given news message are organized in chronological order. Each comment is represented by a vector of TF-IDF values, which is then fed to the shared GRU in sequence; each hidden state h_t from the GRU is used to classify the stance of the corresponding message, while the final GRU output h_T is used for fake news detection.
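This shared-layer design can be sketched in a few lines of PyTorch. The layer sizes and head structure below are illustrative assumptions; [75] should be consulted for the exact architecture.

```python
import torch
import torch.nn as nn

class SharedGRUDetector(nn.Module):
    def __init__(self, tfidf_dim=5000, hidden=128, n_stances=4, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(tfidf_dim, hidden, batch_first=True)  # shared by both tasks
        self.stance_head = nn.Linear(hidden, n_stances)     # stance of each comment
        self.veracity_head = nn.Linear(hidden, n_classes)   # veracity of the news

    def forward(self, comments):         # comments: (batch, T, tfidf_dim), time-ordered
        states, _ = self.gru(comments)   # hidden states h_1 ... h_T
        stance_logits = self.stance_head(states)             # uses every h_t
        veracity_logits = self.veracity_head(states[:, -1])  # uses the final h_T
        return stance_logits, veracity_logits

# Training sums the two cross-entropy losses, so gradients from both tasks
# update the shared GRU and the tasks reinforce each other.
```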
Yang et al. [24] propose a generative unsupervised approach for fake news detection, where user stances can be inferred from their behaviors, such as like, comment, or repost. The credibility of the news is viewed as a latent random variable, and a Bayesian network is exploited to capture the conditional dependencies among news veracity, users' opinions and users' credibility. Finally, a collapsed Gibbs sampling based approach is adopted to evaluate the credibility of news messages and users at the same time, given the user opinions inferred from their behaviors.

Davoudi et al. [114] propose to construct the propagation tree and the stance network for early fake news detection, where the stance network is built by analyzing the sentiments of responses associated with a news article, and responses with similar sentiments are linked in the network. Finally, features are extracted from both the propagation tree and the stance network to detect the news veracity.

5.2. Implicit stance based method

Implicit stance based approaches focus on mining latent stance representations from users' posts/reposts/comments, which can facilitate measuring the credibility of news messages. Usually, posts/reposts/comments with the same stance form supportive relations and can mutually raise each other's credibility, while those with conflicting stances form opposed relations and will mutually weaken each other's credibility.

Jin et al. [115] construct a credibility network by exploiting the relations of viewpoints. Specifically, they propose to cluster the news messages posted by users regarding the same event into conflicting viewpoints using the k-means algorithm. Messages are linked to construct a credibility network, and the link type is either supporting or opposing based on their viewpoints. The credibility values of the messages are propagated through the graph iteratively: mutually supporting messages obtain similar credibility values, while mutually opposing messages obtain opposite or close-to-zero credibility values. The final credibility of the event is obtained by averaging the credibility values of all related messages.

Li et al. [116] extract users' sentiment polarity, degree of skepticism, and emoji attitudes towards a news message from their responses to facilitate fake news detection. Dou et al. [67] propose a user preference-aware mechanism to detect fake news, which implicitly mines users' personalities, sentiments, and stances from their historical posts as user preference features. Specifically, users who repost or comment on the same news form a propagation graph, and their historical posts in social media are crawled for user preference feature extraction. Then, a GNN model is used to encode the user preference features in the graph, and a readout function taking the mean pooling operation over all node embeddings is performed to obtain the entire graph embedding, which is concatenated with the news textual embedding for fake news classification.
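A compact PyTorch Geometric sketch of this readout-and-concatenate design is given below. The embedding dimensions, the choice of GCN layers and the two-layer depth are assumptions for illustration, not the exact model of [67].

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class UserPreferenceGNN(torch.nn.Module):
    def __init__(self, user_dim=768, news_dim=768, hidden=128, n_classes=2):
        super().__init__()
        self.conv1 = GCNConv(user_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.clf = torch.nn.Linear(hidden + news_dim, n_classes)

    def forward(self, x, edge_index, batch, news_emb):
        # x: per-user preference features mined from historical posts
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index)
        g = global_mean_pool(h, batch)  # mean-pooling readout over all node embeddings
        # Concatenate the graph embedding with the news textual embedding.
        return self.clf(torch.cat([g, news_emb], dim=-1))
```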


Similarly, Xie et al. [57] propose to extract stance representations from users' replies/comments towards a news message using BERT. All these representations are encoded by a graph-based stance reasoning network, which simulates the fact that users may aggregate others' comments, which they can browse in social media, to form their own opinions. Finally, the encoded stance representation is concatenated with the textual and visual representations of the news message for classification.

6. Datasets

With the surging interest in fake news detection, plenty of benchmark datasets have been proposed. Most of them are collected from real-world social media, like Twitter, Facebook and Sina Weibo. D'Ulizia et al. [117] provide a thorough review of evaluation datasets for fake news detection. In this paper, we give a brief introduction of popular ones that are frequently used in this field. Their statistics, i.e., the numbers of messages, events and fake news, are listed in Table 4.

Table 4
Statistics of the datasets.

| Dataset | Messages | Events | Fake news |
| --- | --- | --- | --- |
| Weibo1 [31] | 3,805,656 | 4,664 | 2,313 |
| Weibo2 [118] | 7,300 | - | 3,834 |
| Twitter15 [7] | 1,490 | 1,490 | 370 |
| Twitter16 [31] | 1,101,985 | 992 | 205 |
| LIAR [121] | 12,836 | - | - |
| FNC-1 [33] | 75,385 | 2,587 | - |
| MediaEval [52] | 15,000 | 11 | 9,000 |
| CCMR [35] | 15,629 | 17 | 9,404 |
| FakeNewsNet [123] | 23,196 | 23,196 | 5,755 |
| Fakeddit [124] | 1,063,106 | - | 628,501 |
| PHEME [125] | 5,802 | 5 | 1,972 |
| FakeHealth [126] | 2,296 | 16 | 763 |

• Weibo1 [31] and Weibo2 [118] are frequently used Chinese fake news detection datasets. Weibo1 is collected from the Sina community management center. Each Weibo is regarded as a news message associated with a binary label indicating whether the story is a rumor or not. This dataset contains 4664 events, where 2313 of them are fake. Such events are associated with 3,805,656 messages and 2,746,818 users, including the original messages as well as retweets and replied messages. This online social context information can help in constructing the news propagation structures and inferring users' stances towards the news. Weibo2 is a cross-domain dataset which can be found at the "Internet fake news detection during the epidemic" competition held by the CCF Task Force on Big Data [119]. It covers eight domains, including health, economy, technology, entertainment, society, military, politics, and education, and includes 7300 news articles, where 3834 of them are fake. For each news article, user comments are also included.

• Twitter15 [7] and Twitter16 [31] are the most frequently used datasets, which are collected based on Snopes [120], an online website providing a rumor debunking service. Twitter15 contains 1490 tweets and 276,663 users, where all the tweets can be divided into four categories, i.e., non-rumors, false rumors, true rumors, and unverified rumors, with counts of 374, 370, 372, and 374, respectively. Twitter16 contains 992 events, including 205 non-rumors, 205 false rumors, 207 true rumors and 201 unverified rumors. Such events are associated with 1,101,985 tweets and 491,229 users.

• LIAR [121] is an American politics related fake news dataset with 12,836 short statements made between 2007 and 2016. It is collected from the fact-checking website PolitiFact [122], including 4150 statements from the Democratic Party, 5687 statements from the Republican Party, 2185 statements from non-partisans, and 814 other statements. Each statement is assigned a truthfulness rating with 6 levels, i.e., pants-fire, false, barely true, half true, mostly true, and true. In the dataset, the proportions of the six truthfulness ratings are as follows: pants-fire 8.19%, false 19.60%, barely true 16.44%, half true 20.54%, mostly true 19.18%, and true 16.05%.

• FNC-1 [33] is a dataset from the online fake news detection challenge "Fake News Challenge Stage 1 (FNC-1): Stance Detection", whose organizers believe that understanding users' opinions and stances regarding a news message is the first step towards fake news detection. Thus, FNC-1 focuses on stance detection and includes 75,385 (headline, document) pairs, where each pair is annotated with one of four stance labels, i.e., agree, disagree, discuss, and unrelated, describing the stance of the document towards the headline.

• MediaEval [52] is a dataset including fake news with misused multimedia content on Twitter. The data were retrieved using Topsy [127], a search engine, together with Twitter APIs. The dataset includes news messages related to 11 events, such as Hurricane Sandy and the Boston Marathon bombing, and each message includes text content, attached images/videos and several social contexts. Specifically, the dataset is divided into a development set and a test set. The development set includes 5008 true news messages with 176 authentic images, posted by 4756 users, and 7032 fake news messages with 185 misused images, posted by 6769 users. The test set consists of 1217 true news messages with 17 authentic images, posted by 1139 users, and 2564 fake news messages with 33 misused images and 2 misused videos, posted by 2447 users.

• CCMR [35] is a cross-lingual cross-platform multimedia rumor verification dataset, which extends MediaEval [52] from two perspectives. On one hand, MediaEval includes only 11 events, which are extended to 17 events in CCMR. On the other hand, news messages in MediaEval are crawled from Twitter only, while CCMR also includes webpages collected from different search engines. Specifically, CCMR consists of three sub-datasets, i.e., CCMR Twitter, CCMR Google, and CCMR Baidu, which are all related to the 17 events. CCMR Twitter includes a total of 15,629 tweets, with 6225 of them classified as true and 9404 of them classified as fake. CCMR Google has 4625 Google webpages, out of which 3197 are true, 729 are fake and 699 are unverified. CCMR Baidu consists of 2506 Baidu webpages, with 1393 true news, 508 fake news, and 605 unverified news.

• FakeNewsNet [123] mainly collects data from two well-known fact-checking platforms: PolitiFact [122] and GossipCop [128]. In PolitiFact, news messages are fact-checked by journalists and domain experts, and FakeNewsNet gathers 432 fake messages and 624 real messages from PolitiFact. In GossipCop, each news message is given a rating score out of 10, where messages with scores below 5 are considered fake news, and 5323 fake messages and 16,817 real messages are collected for FakeNewsNet. Thus, FakeNewsNet contains 5755 fake news messages and 17,441 real news messages in total. In addition, FakeNewsNet includes 3 kinds of information, i.e., news content with labels, social context information, and spatiotemporal information. News content information contains the news messages with text and images, and labels indicating their veracity. Social context information includes user engagements such as posting, forwarding, replies, likes, etc. Spatiotemporal information provides the locations of users and news articles, and also the timestamps of news publication and users' responses.

• Fakeddit [124] is collected from Reddit [129], a well-known social media platform and online community where users share information and discuss topics of interest. Fakeddit contains more than 1 million message samples on 22 different topics, ranging from political news stories to simple everyday posts, where 628,501 of them are fake samples and 527,049 of them are true samples. The dataset is gathered from over 300,000 users, covering the period from March 19, 2008 to October 24, 2019. Each message includes its title, images, score, user comments, up-vote to down-vote ratio, etc.

• PHEME [125] is collected from Twitter, regarding five newsworthy events, including the Ferguson unrest, the Ottawa shooting, the Sydney siege, the Charlie Hebdo shooting and the Germanwings plane crash. It mainly samples tweets that triggered a large number of retweets, and collects 5802 such tweets in total. For each event, all of its tweets are organized in a timeline, where professional journalists and experts are assigned to review the timeline and indicate whether the tweets are rumors or not. Among all the tweets, 1972 are classified as rumors and 3830 as non-rumors.

• FakeHealth [126] is collected from HealthNewsReview.org [130], a fact-checking project for health related news. It is a fake health news dataset with 2296 news articles about 16 health topics, including cancer, surgery, nutrition, etc. It contains 4 types of information: news contents, news reviews, social engagements and user networks. News contents include text, images, URLs, etc. News reviews consist of ratings, tags, categories and other elements. Social engagements provide tweets, replies and retweets. User networks provide profiles, timelines, followees and followers of users. It collects news messages from 2 types of information sources, i.e., news media and institutes, which correspondingly form two subsets within FakeHealth, HealthStory and HealthRelease. HealthStory has 1218 real news messages and 472 fake news messages, while HealthRelease has 315 real news messages and 291 fake news messages.

7. Discussion and future directions

In this survey, we analyze the entire diffusion process of fake news, and summarize its three main characteristics, i.e., intentional creation, heteromorphic transmission, and controversial reception. In this section, we take a close look at fake news detection methods using the above characteristics in the past decade and reveal the trends of technological advances, which provides some insights for designing effective detection methods. We then discuss remaining challenges and other directions for future study, such as robust fake news detection, impacts of LLMs in fake news detection and early fake news detection.


Table 5
Characteristics and proposed methods for fake news detection. (Check-mark matrix recording which characteristics each method adopts. Rows are grouped by the three diffusion-process categories: intentional creation, with special symbol features (mislead public), sentiment and style features (manipulate opinions), topic, visual and clickbait features (attract user attention), and temporal, user and linguistic features (general features); heteromorphic transmission, with message-based, user-based and hybrid propagation structures; and controversial reception, with explicit and implicit stances. Columns are the methods in chronological order: [6] (2011), [10] (2013), [13] (2015), [14] (2015), [9] (2016), [56] (2017), [34] (2017), [75] (2018), [37] (2018), [39] (2018), [63] (2019), [55] (2019), [64] (2020), [65] (2020), [66] (2021), [67] (2021), [57] (2021) and [114] (2022).)
7.1. Trends of fake news detection methods

Table 5 shows the typical detection mechanisms using different characteristics in the past decade in chronological order, from which we can observe the following trends of technological advances.

Characteristic Selection: Characteristics from all three categories of the diffusion process tend to be utilized together for fake news detection. Before 2015, detection mechanisms mainly use hand-crafted features in the category of intentional creation, and the topology features of the propagation structures in the category of heteromorphic transmission. Afterwards, stance-based features are adopted and message/user propagation-based mechanisms are proposed. Very recently, Nguyen et al. [64] and Dou et al. [67] utilize characteristics from all three categories of the diffusion process, including news content, user and publisher profiles, propagation structures and feedback stances, and incorporate all of them together to improve the detection performance.

Framework Design: A framework can be designed to capture the characteristics from all three categories for effective fake news detection. As shown in Table 5, in the early 2010s, traditional machine learning based methods are proposed to capture hand-crafted features, such as [6,10]. Later on, with the development of deep neural networks, CNN and RNN based methods [34,37,39] are proposed to capture the textual, visual and temporal features, which mainly belong to the categories of intentional creation and heteromorphic transmission. Very recently, GNN based approaches [63,64,66,67] are proposed to capture the characteristics of propagation structures and users' viewpoint interactions, which belong to the categories of heteromorphic transmission and controversial reception. Furthermore, different types of neural networks can be flexibly combined in one framework to capture different types of characteristics, which has proved comprehensive and effective. For example, Dou et al. [67] propose a framework that incorporates BERT and GNN, where BERT is used to extract the features of news messages, user preferences and feedback, while the GNN is used to extract the representations of news propagation structures. Finally, all features are fused in this framework for fake news detection.

Result Explanation: Detection results can be explained in a more fine-grained manner, which reveals the key factors of fake news. In the early 2010s, GINI index and information gain based schemes are used to rank different hand-crafted features (e.g., [6,9]), which explains which features are more important. With the recent advances in explainable deep neural networks, detection mechanisms can reveal the key factors of fake news at a finer granularity. For example, the works in [65,66] show that certain evidence words, sentences and users in the propagation structures play more important roles in fake news detection.
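As a toy illustration of such feature ranking, the snippet below scores a few hand-crafted features by estimated information gain (mutual information) with the fake/real label, in the spirit of the early schemes mentioned above (e.g., [6,9]). The feature names and synthetic data are assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((500, 4))   # toy features: #followers, #URLs, sentiment, length
y = (X[:, 2] + 0.3 * rng.random(500) > 0.9).astype(int)  # label driven by sentiment

gains = mutual_info_classif(X, y, random_state=0)
for name, g in sorted(zip(["followers", "urls", "sentiment", "length"], gains),
                      key=lambda t: -t[1]):
    print(f"{name}: {g:.3f}")  # higher gain = more discriminative feature
```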
The above discussions provide insights for designing effective and explainable fake news detection mechanisms: (1) utilize characteristics from all three categories of the diffusion process; (2) capture these characteristics in one framework for effective detection; and (3) make use of explainable schemes.

7.2. Future directions

Although a lot of progress has been made in the past decade, there are still remaining challenges that require future research. In this paper, we discuss potential directions which we believe are important and urgent.

Robust Detection: Robustness in fake news detection encompasses the ability to accurately identify fake news and to maintain trustworthiness under interference or adversarial attacks. This research direction has gradually gained attention in recent years, and research efforts have been made mainly from two perspectives. On one hand, adversarial attack methods are proposed to undermine the effectiveness of fake news detectors. On the other hand, some works investigate which features and properties of current detection methods exhibit better resistance or robustness against attacks.

• Adversarial attack methods: The works in [131–136] focus on attacks from the perspective of fake news contents. For instance, the works in [131–133] examine the robustness of fake news detectors by conducting attacks that distort news content or inject adversarial words. Wang et al. [137] propose to simulate the adversarial behaviors of fraudsters and perturb propagation features so as to attack GNN-based misinformation detectors. These mechanisms show that attacks on detection methods can effectively reduce their performance, and demonstrate the vulnerabilities of current fake news detection methods. It is therefore urgent to develop robust fake news detection.

• Robust properties of fake news detectors: Mahabub et al. [138] apply an ensemble voting classifier based on various machine learning algorithms and prove its effectiveness. Horne et al. [134] show that handmade content-based features, e.g., writing style, are rather robust to changes in the news cycle. Ali et al. [132] evaluate the performance of fake news detectors under different neural networks and configurations. They find that RNNs are more robust, and that increasing the maximum length of input sequences can improve the detectors' robustness.


Health Misinformation Detection: Health misinformation is defined as human health related misinformation that is false or inaccurate based on current scientific consensus [139]. Compared to general false information, the identification of health misinformation requires specialized biomedical knowledge and can be challenging for the general public. In addition, health misinformation directly concerns human health and can harm the physical well-being of deceived individuals. With the outbreak of the COVID-19 pandemic, a significant amount of pandemic-related fake information has been spread, causing serious disruptions to epidemic prevention. Therefore, combating health misinformation has become increasingly important. In recent years, research efforts have been made to detect health misinformation mainly from two perspectives. On one hand, similar to general fake news detection, features extracted from the misinformation diffusion process are exploited for detection, and such methods can be integrated into our proposed classification of fake news detection. On the other hand, since health misinformation involves biomedical knowledge, fact-checking based on biomedical knowledge graphs is a promising research direction. The details are discussed as follows.

• Features extracted from the misinformation diffusion process: Such methods can be integrated into our proposed classification. For example, the work in [68] belongs to intentional creation; it proposes a user-centric model to identify those who are likely to spread health misinformation by extracting user features, such as attitudes, writing styles, and sentiments expressed in their posts on social media. Min et al. [106] formulate the detection as a heterogeneous graph classification task and model the message-message, user-user and message-user interactions on social networks with a divide-and-conquer strategy, which is categorized into heteromorphic transmission. In the category of controversial reception, Hossain et al. [113] detect the stances of post-response pairs with BERT and NLI (natural language inference) models to identify whether a claim contains misinformation.

• Fact-checking based on biomedical knowledge graphs: Biomedical knowledge graphs can be used as an effective aid for health misinformation detection, since they can identify unreasonable relations between entities in texts to improve detection performance and provide explanations [95,140–142]. For instance, Cui et al. [95] apply a knowledge-guided graph attention network to capture crucial entities of news articles, and automatically assign more weights to the relations that are important for differentiating misinformation from fact. Weinzierl et al. [142] design a Misinformation Knowledge Graph (MKG), where the misinformation detection task is formulated as a graph link prediction problem.

Impacts of LLMs in Fake News Detection: Recently, the rapid development of LLMs such as ChatGPT has had a significant impact on the field of fake news detection. On the positive side, LLMs can be used for fake news detection; on the negative side, LLMs also facilitate the generation of fake news, presenting new challenges for fake news detection. Specifically:

• On the positive side, LLMs can be used for fake news detection. LLMs such as ChatGPT are trained on a vast amount of factual data, and can help in detecting fake news by leveraging their comprehension of factual knowledge. Although, as shown in [143,144], the fact-checking accuracy of LLMs still lags behind human fact-checkers in renowned organizations like PolitiFact and Snopes, LLMs have demonstrated their potential in fact verification. Furthermore, with the ability to browse Internet information and access up-to-date knowledge, LLMs have the potential to conduct real-time fact-checking.

• On the negative side, LLMs firstly exhibit the issue of "hallucinations", and may "unintentionally" produce text that contradicts facts. Secondly, even with safety mechanisms to mitigate the potential risks of generating harmful content or misinformation, "intentionally" designed prompts can easily bypass such restrictions to generate misinformation. However, there is currently very little research on detecting fake news generated by LLMs, which remains an open issue, while there exists a considerable amount of research on a related task, i.e., machine-generated text detection, which can help in detecting news generated by LLMs. These methods can be roughly categorized into two types, including statistical feature based methods [145–147] and neural language model based methods [148–151]. The former mainly focuses on the differences in statistical features between machine-generated texts and human-written texts, while the latter extracts deep features from the text semantics to detect machine-generated texts.
Early Detection: To maximally reduce the negative impact of fake news, it is crucial to detect fake news as early as possible, before it is widely spread. However, early detection is challenging, since the available information is limited: basically only the news content, user profiles and the propagation information at the early stage can be used. Though the works in [19,62] have explored early detection, the performance still needs to be improved. To address this issue, similar to [67], besides user profiles, users' historical posts/reposts/comments can be mined to infer their personalities, sentiments and stances, which provides extended user information for early fake news detection. In addition, the study of users' historical behaviors can help infer whether a user is a spammer. If spammers can be marked beforehand, it will facilitate early fake news detection and prevent fake news propagation.

Unsupervised/Semi-supervised Approaches: Supervised machine learning based methods prevail in fake news detection, and usually rely on a large amount of hand-labeled training data. However, labeling such data consumes a lot of time and effort. Furthermore, the labeled data may soon become outdated given the dynamic nature of social media and news [98]. Thus, unsupervised or semi-supervised approaches should be explored to tackle this issue.

Similar to Jin et al. [115], clustering based unsupervised methods can be used to cluster news messages or users' feedback, and such information can be further used to estimate the veracity of news messages. Besides, a semi-supervised model with partially labeled data can also be exploited. Similar to Liu and Wu [62] and Wang et al. [98], partially labeled data can be used to generate useful annotations for unlabeled data in a reinforcement manner, which can enlarge the training set and improve the detection performance.
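As a minimal sketch of the clustering route, the snippet below groups an event's messages into two conflicting viewpoints with k-means over TF-IDF vectors, following the spirit of [115]. Using cluster sizes as a crude contestedness signal is a simplifying assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def viewpoint_clusters(messages):
    """Cluster an event's messages into two conflicting viewpoints."""
    X = TfidfVectorizer().fit_transform(messages)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    support = np.bincount(labels)
    # The relative sizes of the opposing clusters give a label-free signal
    # of how contested the event is, which can seed credibility scores.
    return labels, support / support.sum()

msgs = ["The bridge collapsed last night!", "Photos confirm the bridge is gone.",
        "Fake! The bridge is fine, I drove over it.", "Hoax, nothing happened."]
labels, shares = viewpoint_clusters(msgs)
print(labels, shares)
```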
Fact-Checking based Approaches: In this survey, we focus on detection methods that use the characteristics and features of news contents and related social contexts, while there is another branch of methods, referred to as fact-checking based methods, which check the veracity of news against ground truths. For example, many websites provide fact-checking services, such as FactCheck.org [152] and PolitiFact [122]. They all rely on expert manual fact-checking, which is not able to scale with the rapid increase of news spread in social media. To address this issue, automatic fact-checking algorithms are studied. Shi et al. [22] and Wu et al. [23] perform fact-checking based on knowledge graphs. For example, Shi et al. [22] view fact-checking as a link-prediction task in a knowledge graph extracted from Wikipedia and SemMedDB. They perform a DFS-like graph traversal algorithm to retrieve meta paths, and the top-k discriminative paths are extracted as features to train a logistic regression model. Ciampaglia et al. [153] propose a semantic proximity metric that performs fact-checking by finding the shortest path between concept nodes on knowledge graphs. Generally, if the news content falls within the range of the knowledge domain, this type of method can detect fake news with high accuracy. However, a lot of news regarding new events is published and spread through social media every day. This requires the fact database or knowledge graphs to be expanded and updated frequently, which also raises open issues in this field.
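To illustrate the path-finding idea, the sketch below scores a claimed (subject, object) pair by a degree-penalized shortest path on a toy knowledge graph, loosely following the semantic proximity metric of [153]. The graph, the weighting and the score transform are assumptions.

```python
import math
import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("Barack Obama", "United States"), ("United States", "Washington, D.C."),
    ("Barack Obama", "Honolulu"), ("Honolulu", "Hawaii"),
    ("Hawaii", "United States"),
])

def proximity(subj: str, obj: str) -> float:
    # Hops through high-degree (generic) nodes cost more, so paths through
    # specific entities are preferred.
    weight = lambda u, v, d: 1.0 + math.log(kg.degree(v))
    try:
        cost = nx.shortest_path_length(kg, subj, obj, weight=weight)
    except nx.NetworkXNoPath:
        return 0.0
    return 1.0 / (1.0 + cost)  # closer concepts -> claim judged more plausible

print(proximity("Barack Obama", "Hawaii"))  # short path: high plausibility
```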


8. Conclusion

In this paper, we provide a thorough survey of fake news detection techniques from a brand-new perspective, i.e., the intrinsic characteristics of the fake news diffusion process, including intentional creation, heteromorphic transmission and controversial reception. This review can not only guide researchers to better understand this field, but also helps reveal the trends of technological advances, which provides insights on how to design effective and explainable fake news detection mechanisms. We also discuss popular benchmark datasets and further suggest several future directions in fake news detection.

Declaration of competing interest

The authors declare that they have no conflicts of interest in this work.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China Science Fund for Creative Research Groups (62121002) and Excellent Young Scientists Fund (62222212).
[32] S. Volkova, K. Shaffer, J.Y. Jang, et al., Separating facts from fiction: Linguistic
[1] Z. Jin, J. Cao, H. Guo, et al., Detection and analysis of 2016 us presidential elec- models to classify suspicious and trusted news posts on twitter, in: Proceedings of
tion related rumors on twitter, in: International Conference on Social Computing, the 55th Annual Meeting of the Association for Computational Linguistics, 2017,
Behavioral-Cultural Modeling and Prediction and Behavior Representation in Mod- pp. 647–653.
eling and Simulation, Springer, 2017, pp. 14–24. [33] A. Hanselowski, A. PVS, B. Schiller et al., A retrospective analysis of the fake news
[2] M. Takayasu, K. Sato, Y. Sano, et al., Rumor diffusion and convergence during the challenge stance detection task, arXiv preprint arXiv:1806.05180(2018).
3.11 earthquake: A twitter case study, PLoS One 10 (4) (2015) e0121443. [34] N. Ruchansky, S. Seo, Y. Liu, CSI: A hybrid deep model for fake news detection,
[3] A. Gupta, H. Lamba, P. Kumaraguru, et al., Faking sandy: Characterizing and iden- in: Proceedings of the 2017 ACM on Conference on Information and Knowledge
tifying fake images on twitter during hurricane sandy, in: Proceedings of the 22nd Management, 2017, pp. 797–806.
International Conference on World Wide Web, 2013, pp. 729–736. [35] W. Wen, S. Su, Z. Yu, Cross-lingual cross-platform rumor verification pivoting on
[4] F. Alam, F. Dalvi, S. Shaar, et al., Fighting the COVID-19 infodemic in social media: multimedia content, in: Proceedings of the 2018 Conference on Empirical Methods
A holistic perspective and a call to arms, in: Proceedings of the International AAAI in Natural Language Processing, 2018, pp. 3487–3496.
Conference on Web and Social Media, vol. 15, 2021, pp. 913–922. [36] K. Shu, L. Cui, S. Wang, et al., dEFEND: Explainable fake news detection, in: Pro-
[5] X. Zhou, R. Zafarani, Fake news: A survey of research, detection methods, and ceedings of the 25th ACM International Conference on Knowledge Discovery and
opportunities, arXiv preprint arXiv:1812.00315 2 (2018). Data Mining, 2019.
[6] C. Castillo, M. Mendoza, B. Poblete, Information credibility on twitter, in: Proceed- [37] S. De Sarkar, F. Yang, A. Mukherjee, Attending sentences to detect satirical fake
ings of the 20th International Conference on World Wide Web, 2011, pp. 675–684. news, in: Proceedings of the 27th International Conference on Computational Lin-
[7] X. Liu, A. Nourbakhsh, Q. Li, et al., Real-time rumor debunking on twitter, in: Pro- guistics, 2018, pp. 3371–3380.
ceedings of the 24th ACM International Conference on Information and Knowledge [38] F. Yu, Q. Liu, S. Wu, et al., A convolutional approach for misinformation iden-
Management, 2015, pp. 1867–1870. tification, in: International Joint Conference on Artificial Intelligence, 2017,
[8] M. Gupta, P. Zhao, J. Han, Evaluating event credibility on twitter, in: Proceedings of pp. 3901–3907.
the 2012 SIAM International Conference on Data Mining, SIAM, 2012, pp. 153–164. [39] H. Karimi, P. Roy, S. Saba-Sadiya, et al., Multi-source multi-class fake news de-
[9] P. Biyani, K. Tsioutsiouliklis, J. Blackmer, “8 amazing secrets for getting more tection, in: Proceedings of the 27th International Conference on Computational
clicks”: Detecting clickbaits in news streams using article informality, in: Proceed- Linguistics, 2018, pp. 1546–1557.
ings of the AAAI Conference on Artificial Intelligence, 2016. [40] F. Qian, C. Gong, K. Sharma, et al., Neural user response generator: Fake news
[10] S. Sun, H. Liu, J. He, et al., Detecting event rumors on Sina Weibo automatically, detection with collective user intelligence, in: Proceedings of the 27th International
in: Asia-Pacific Web Conference, Springer, 2013, pp. 120–131. Joint Conference on Artificial Intelligence, vol. 18, 2018, pp. 3834–3840.
[11] F. Yang, Y. Liu, X. Yu, et al., Automatic detection of rumor on Sina Weibo, in: Pro- [41] K. Popat, S. Mukherjee, A. Yates, et al., Declare: Debunking fake news and false
ceedings of the ACM SIGKDD Workshop on Mining Data Semantics, 2012, pp. 1–7. claims using evidence-aware deep learning, in: Proceedings of the 2018 Con-
[12] Z. Jin, J. Cao, Y.-G. Jiang, et al., News credibility evaluation on microblog with a ference on Empirical Methods in Natural Language Processing, 2018, pp. 22–
hierarchical propagation model, in: 2014 IEEE 14th International Conference on 32.
Data Mining, IEEE, 2014, pp. 230–239. [42] Y. Tashtoush, B. Alrababah, O. Darwish, et al., A deep learning framework for
[13] J. Ma, W. Gao, Z. Wei, et al., Detect rumors using time series of social context infor- detection of COVID-19 fake news on social media platforms, Data 7 (5) (2022) 65.
mation on microblogging websites, in: Proceedings of the 24th ACM International [43] S. Kumari, H.K. Reddy, C.S. Kulkarni, et al., Debunking health fake news with
Conference on Information and Knowledge Management, 2015, pp. 1751–1754. domain specific pre-trained model, Global Trans. Proc. 2 (2) (2021) 267–272.
[14] K. Wu, S. Yang, K.Q. Zhu, False rumors detection on Sina Weibo by propagation [44] M.-Y. Chen, Y.-W. Lai, Using fuzzy clustering with deep learning models for de-
structures, in: 2015 IEEE 31st International Conference on Data Engineering, IEEE, tection of COVID-19 disinformation, Trans. Asian Low-Resour. Lang. Inf. Process.
2015, pp. 651–662. (2022).
[15] S. Kwon, M. Cha, K. Jung, et al., Prominent features of rumor propagation in online [45] A. Zubiaga, A. Aker, K. Bontcheva, et al., Detection and resolution of rumours in
social media, in: 2013 IEEE 13th International Conference on Data Mining, IEEE, social media: A survey, ACM Comput. Surv. 51 (2) (2018) 1–36.
2013, pp. 1103–1108. [46] X. Zhang, A.A. Ghorbani, An overview of online fake news: Characterization, de-
[16] V. Pérez-Rosas, B. Kleinberg, A. Lefevre, et al., Automatic detection of fake news, tection, and discussion, Inf. Process. Manage. 57 (2) (2020) 102025.
in: Proceedings of the 27th International Conference on Computational Linguistics, [47] K. Shu, A. Sliva, S. Wang, et al., Fake news detection on social media: A data mining
2018, pp. 3391–3401. perspective, Newsl. Spec. Interest Group Knowl. Discov. Data Min. 19 (1) (2017)
[17] Z. Jin, J. Cao, Y. Zhang, et al., MCG-ICT at MediaEval 2015: Verifying multimedia 22–36.
use with a two-level classification model, in: Working Notes Proceedings of the [48] D. Varshney, D.K. Vishwakarma, A review on rumour prediction and veracity as-
MediaEval 2015 Workshop, 2015. sessment in online social network, Expert Syst. Appl. 168 (2021) 114208.
[18] N. Hassan, C. Li, M. Tremayne, Detecting check-worthy factual claims in presiden- [49] D. Rohera, H. Shethna, K. Patel, et al., A taxonomy of fake news classification tech-
tial debates, in: Proceedings of the 24th ACM International Conference on Infor- niques: Survey and implementation aspects, IEEE Access 10 (2022) 30367–30394.
mation and Knowledge Management, 2015, pp. 1835–1838. [50] I.B. Schlicht, E. Fernandez, B. Chulvi, et al., Automatic detection of health mis-
[19] T. Chen, X. Li, H. Yin, et al., Call attention to rumors: Deep attention based re- information: A systematic review, J. Ambient Intell. Humanized Comput. (2023)
current neural networks for early rumor detection, in: Pacific-Asia Conference on 1–13.
Knowledge Discovery and Data Mining, Springer, 2018, pp. 40–52. [51] C. Chen, H. Wang, M. Shapiro et al., Combating health misinformation in social
[20] H. Rashkin, E. Choi, J.Y. Jang, et al., Truth of varying shades: Analyzing language media: Characterization, detection, intervention, and open issues, arXiv preprint
in fake news and political fact-checking, in: Proceedings of the 2017 Conference arXiv:2211.05289(2022).
on Empirical Methods in Natural Language Processing, 2017, pp. 2931–2937. [52] C. Boididou, K. Andreadou, S. Papadopoulos, et al., Verifying multimedia use at
[21] X. Wang, C. Yu, S. Baumgartner, et al., Relevant document discovery for fact-check- mediaeval 2015, in: Working Notes Proceedings of the MediaEval 2015 Workshop,
ing articles, in: Companion Proceedings of the The Web Conference 2018, 2018, vol. 3, 2015, p. 7.
pp. 525–533.


[53] A. Bondielli, F. Marcelloni, A survey on fake news and rumour detection techniques, Inf. Sci. 497 (2019) 38–55.
[54] Q. Li, Q. Zhang, L. Si, et al., Rumor detection on social media: Datasets, methods and opportunities, in: Proceedings of the 2nd Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda, 2019, pp. 66–75.
[55] L. Cui, S. Wang, D. Lee, SAME: Sentiment-aware multi-modal embedding for detecting fake news, in: Proceedings of the 2019 ACM International Conference on Advances in Social Networks Analysis and Mining, 2019, pp. 41–48.
[56] M. Potthast, J. Kiesel, K. Reinartz, et al., A stylometric inquiry into hyperpartisan and fake news, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018, pp. 231–240.
[57] J. Xie, S. Liu, R. Liu, et al., SERN: Stance extraction and reasoning network for fake news detection, in: International Conference on Acoustics, Speech and Signal Processing, IEEE, 2021, pp. 2520–2524.
[58] P. Qi, J. Cao, T. Yang, et al., Exploiting multi-domain visual information for fake news detection, in: 2019 IEEE International Conference on Data Mining, IEEE, 2019, pp. 518–527.
[59] Z. Jin, J. Cao, Y. Zhang, et al., Novel visual and statistical image features for microblogs news verification, IEEE Trans. Multimed. 19 (3) (2016) 598–608.
[60] C. Boididou, S.E. Middleton, Z. Jin, et al., Verifying information with multimedia content on Twitter, Multimed. Tools Appl. 77 (12) (2018) 15545–15571.
[61] Y. Chen, N.J. Conroy, V.L. Rubin, Misleading online content: Recognizing clickbait as "false news", in: Proceedings of the 2015 ACM Workshop on Multimodal Deception Detection, 2015, pp. 15–19.
[62] Y. Liu, Y.-F.B. Wu, FNED: A deep network for fake news early detection on social media, ACM Trans. Inf. Syst. 38 (3) (2020) 25.
[63] Q. Huang, C. Zhou, J. Wu, et al., Deep structure learning for rumor detection on Twitter, in: International Joint Conference on Neural Networks, IEEE, 2019, pp. 1–8.
[64] V.-H. Nguyen, K. Sugiyama, P. Nakov, et al., FANG: Leveraging social context for fake news detection using graph representation, in: CIKM '20: Proceedings of the 29th ACM International Conference on Information and Knowledge Management, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1165–1174.
[65] Y.-J. Lu, C.-T. Li, GCAN: Graph-aware co-attention networks for explainable fake news detection on social media, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 505–514.
[66] S. Ni, J. Li, H.-Y. Kao, MVAN: Multi-view attention networks for fake news detection on social media, IEEE Access 9 (2021) 106907–106917.
[67] Y. Dou, K. Shu, C. Xia, et al., User preference-aware fake news detection, in: Proceedings of the 44th International ACM Conference on Research and Development in Information Retrieval, 2021, pp. 2051–2055.
[68] A. Ghenai, Y. Mejova, Fake cures: User-centric modeling of health misinformation in social media, Proc. ACM Hum.-Comput. Interact. 2 (CSCW) (2018) 1–20.
[69] Y. Zhao, J. Da, J. Yan, Detecting health misinformation in online health communities: Incorporating behavioral features into machine learning based approaches, Inf. Process. Manage. 58 (1) (2021) 102390.
[70] P. Dhanasekaran, H. Srinivasan, S.S. Sree, et al., SOMPS-Net: Attention based social graph framework for early detection of fake health news, in: Australasian Conference on Data Mining, Springer, 2021, pp. 165–179.
[71] A. Hassan, V. Qazvinian, D. Radev, What's with the attitude? Identifying sentences with attitude in online discussions, in: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010, pp. 1245–1255.
[72] B. Ma, D. Lin, D. Cao, Content representation for microblog rumor detection, in: Advances in Computational Intelligence Systems, Springer, 2017, pp. 245–251.
[73] V.L. Rubin, N. Conroy, Y. Chen, et al., Fake news or truth? Using satirical cues to detect potentially misleading news, in: Proceedings of the 2nd Workshop on Computational Approaches to Deception Detection, 2016, pp. 7–17.
[74] T. Bian, X. Xiao, T. Xu, et al., Rumor detection on social media with bi-directional graph convolutional networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 549–556.
[75] J. Ma, W. Gao, K.-F. Wong, Detect rumor and stance jointly by neural multi-task learning, in: Companion Proceedings of The Web Conference 2018, 2018, pp. 585–593.
[76] S.R. Sahoo, B.B. Gupta, Multiple features based approach for automatic fake news detection on social networks using deep learning, Appl. Soft Comput. 100 (2021) 106983.
[77] X. Zhang, J. Cao, X. Li, et al., Mining dual emotion for fake news detection, in: Proceedings of the Web Conference 2021, 2021, pp. 3465–3476.
[78] Q. Sheng, X. Zhang, J. Cao, et al., Integrating pattern- and fact-based fake news detection via model preference learning, in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 1640–1650.
[79] M. Koppel, J. Schler, E. Bonchek-Dokow, Measuring differentiability: Unmasking pseudonymous authors, J. Mach. Learn. Res. 8 (6) (2007) 1261–1276.
[80] Y. Zhu, Q. Sheng, J. Cao, et al., Memory-guided multi-view multi-domain fake news detection, IEEE Trans. Knowl. Data Eng. (2022) 7178–7191.
[81] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (Jan) (2003) 993–1022.
[82] L. Zhang, H. Yang, T. Qiu, et al., AP-GAN: Improving attribute preservation in video face swapping, IEEE Trans. Circuits Syst. Video Technol. 32 (4) (2021) 2226–2237.
[83] F. Peng, L. Yin, M. Long, BDC-GAN: Bidirectional conversion between computer-generated and natural facial images for anti-forensics, IEEE Trans. Circuits Syst. Video Technol. 32 (10) (2022) 6657–6670.
[84] L. D'Amiano, D. Cozzolino, G. Poggi, et al., A patchmatch-based dense-field algorithm for video copy-move detection and localization, IEEE Trans. Circuits Syst. Video Technol. 29 (3) (2018) 669–682.
[85] S. Chen, S. Tan, B. Li, et al., Automatic detection of object-based forgery in advanced video, IEEE Trans. Circuits Syst. Video Technol. 26 (11) (2015) 2138–2151.
[86] J. Hu, X. Liao, W. Wang, et al., Detecting compressed deepfake videos in social networks using frame-temporality two-stream convolutional network, IEEE Trans. Circuits Syst. Video Technol. 32 (3) (2021) 1089–1102.
[87] M. Masood, M. Nawaz, K.M. Malik, et al., Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward, Appl. Intell. 53 (4) (2023) 3974–4026.
[88] P. Xu, X. Bao, An effective strategy for multi-modal fake news detection, Multimed. Tools Appl. (2022) 1–24.
[89] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: Proceedings of the 3rd International Conference on Learning Representations, 2015.
[90] M. Coleman, T.L. Liau, A computer readability formula designed for machine scoring, J. Appl. Psychol. 60 (2) (1975) 283–284.
[91] J. Anderson, Lix and Rix: Variations on a little-known readability index, Journal of Reading 26 (6) (1983) 490–496.
[92] F. Heylighen, J.-M. Dewaele, Formality of language: Definition, measurement and behavioral determinants, Interner Bericht, Center Leo Apostel, Vrije Universiteit Brüssel, 4, 1999.
[93] J.N. Blom, K.R. Hansen, Click bait: Forward-reference as lure in online news headlines, J. Pragmatics 76 (2015) 87–100.
[94] E. Shushkevich, J. Cardiff, Detecting fake news about COVID-19 on small datasets with machine learning algorithms, in: 2021 30th Conference of Open Innovations Association FRUCT, IEEE, 2021, pp. 253–258.
[95] L. Cui, H. Seo, M. Tabar, et al., DETERRENT: Knowledge guided graph attention network for detecting healthcare misinformation, in: KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020.
[96] R.K. Kaliyar, A. Goswami, P. Narang, et al., FNDNet: A deep convolutional neural network for fake news detection, Cognit. Syst. Res. 61 (2020) 32–44.
[97] Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: Proceedings of the 31st International Conference on Machine Learning, PMLR, 2014, pp. 1188–1196.
[98] Y. Wang, W. Yang, F. Ma, et al., Weak supervision for fake news detection via reinforcement learning, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, 2020, pp. 516–523.
[99] L. Safarnejad, Q. Xu, Y. Ge, et al., Contrasting misinformation and real-information dissemination network structures on social media during a health emergency, Am. J. Public Health 110 (S3) (2020) S340–S347.
[100] R. Yang, X. Wang, Y. Jin, et al., Reinforcement subgraph reasoning for fake news detection, in: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022, pp. 2253–2262.
[101] T.N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, in: Proceedings of the 5th International Conference on Learning Representations, 2017.
[102] Y. Liu, Y.-F.B. Wu, Early detection of fake news on social media through propagation path classification with recurrent and convolutional networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, 2018, pp. 354–361.
[103] L. Wu, H. Liu, Tracing fake-news footprints: Characterizing social media messages by how they propagate, in: Proceedings of the 11th ACM International Conference on Web Search and Data Mining, 2018, pp. 637–645.
[104] K. Shu, S. Wang, H. Liu, Exploiting tri-relationship for fake news detection, arXiv preprint arXiv:1712.07709 8 (2017).
[105] J. Yu, Q. Huang, X. Zhou, et al., IARNet: An information aggregating and reasoning network over heterogeneous graph for fake news detection, in: 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 2020, pp. 1–9.
[106] E. Min, Y. Rong, Y. Bian, et al., Divide-and-conquer: Post-user interaction network for fake news detection on social media, in: Proceedings of the ACM Web Conference 2022, 2022, pp. 1148–1158.
[107] J. Cui, K. Kim, S.H. Na, et al., Meta-path-based fake news detection leveraging multi-level social context information, in: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022, pp. 325–334.
[108] M. Paraschiv, N. Salamanos, C. Iordanou, et al., A unified graph-based approach to disinformation detection using contextual and semantic relations, in: Proceedings of the International AAAI Conference on Web and Social Media, vol. 16, 2022, pp. 747–758.
[109] http://www.fakenewschallenge.org/. Last accessed December 6, 2023.
[110] J. Devlin, M.-W. Chang, K. Lee, et al., BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics, 2018, pp. 4171–4186.
[111] Y. Liu, M. Ott, N. Goyal, et al., RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[112] S. Wang, T. Terano, Detecting rumor patterns in streaming social media, in: 2015 IEEE International Conference on Big Data, IEEE, 2015, pp. 2709–2715.
[113] T. Hossain, R.L. Logan IV, A. Ugarte, et al., COVIDLies: Detecting COVID-19 misinformation on social media, in: Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, 2020.
[114] M. Davoudi, M.R. Moosavi, M.H. Sadreddini, DSS: A hybrid deep model for fake news detection using propagation tree and stance network, Expert Syst. Appl. 198 (2022) 116635.
[115] Z. Jin, J. Cao, Y. Zhang, et al., News verification by exploiting conflicting social viewpoints in microblogs, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, 2016.


[116] K. Li, B. Guo, J. Liu, et al., Dynamic probabilistic graphical model for progressive fake news detection on social media platform, ACM Trans. Intell. Syst. Technol. (TIST) (2022) 86.
[117] A. D'Ulizia, M.C. Caschera, F. Ferri, et al., Fake news detection: A survey of evaluation datasets, PeerJ Comput. Sci. 7 (2021) e518.
[118] Y. Wang, L. Wang, Y. Yang, et al., SemSeq4FD: Integrating global semantic relationship and local sequential order to enhance text representation for fake news detection, Expert Syst. Appl. 166 (2021) 114090.
[119] https://www.datafountain.cn/competitions/422. Last accessed December 6, 2023.
[120] https://www.snopes.com/. Last accessed December 6, 2023.
[121] W.Y. Wang, "Liar, liar pants on fire": A new benchmark dataset for fake news detection, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017, pp. 422–426.
[122] http://www.politifact.com/. Last accessed December 6, 2023.
[123] K. Shu, D. Mahudeswaran, S. Wang, et al., FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media, Big Data 8 (3) (2020) 171–188.
[124] K. Nakamura, S. Levy, W.Y. Wang, r/Fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection, arXiv preprint arXiv:1911.03854 (2019).
[125] A. Zubiaga, M. Liakata, R. Procter, Exploiting context for rumour detection in social media, in: Social Informatics: 9th International Conference, SocInfo 2017, Oxford, UK, September 13–15, 2017, Proceedings, Part I, Springer, 2017, pp. 109–123.
[126] E. Dai, Y. Sun, S. Wang, Ginger cannot cure cancer: Battling fake health news with a comprehensive data repository, in: Proceedings of the International AAAI Conference on Web and Social Media, vol. 14, 2020, pp. 853–862.
[127] https://en.wikipedia.org/wiki/Topsy_Labs. Last accessed December 6, 2023.
[128] http://www.gossipcop.com/. Last accessed December 6, 2023.
[129] https://www.reddit.com/. Last accessed December 6, 2023.
[130] https://en.wikipedia.org/wiki/HealthNewsReview.org. Last accessed December 6, 2023.
[131] Z. Zhou, H. Guan, M.M. Bhat, et al., Fake news detection via NLP is vulnerable to adversarial attacks, arXiv preprint arXiv:1901.09657 (2019).
[132] H. Ali, M.S. Khan, A. AlGhadhban, et al., All your fake detector are belong to us: Evaluating adversarial robustness of fake-news detectors under black-box settings, IEEE Access 9 (2021) 81678–81692.
[133] C. Koenders, J. Filla, N. Schneider, et al., How vulnerable are automatic fake news detection methods to adversarial attacks?, arXiv preprint arXiv:2107.07970 (2021).
[134] B.D. Horne, J. Nørregaard, S. Adali, Robust fake news detection over time and attack, ACM Trans. Intell. Syst. Technol. (TIST) 11 (1) (2019) 1–23.
[135] T. Le, S. Wang, D. Lee, MALCOM: Generating malicious comments to attack neural fake news detection models, in: 2020 IEEE International Conference on Data Mining (ICDM), IEEE, 2020, pp. 282–291.
[136] B. He, M. Ahamad, S. Kumar, PETGEN: Personalized text generation attack on deep sequence embedding-based classification models, in: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, pp. 575–584.
[137] H. Wang, Y. Dou, C. Chen, et al., Attacking fake news detectors via manipulating news social engagement, arXiv preprint arXiv:2302.07363 (2023).
[138] A. Mahabub, A robust technique of fake news detection using ensemble voting classifier and comparison with other classifiers, SN Appl. Sci. 2 (4) (2020) 525.
[139] W.-Y. Sylvia Chou, A. Gaysynsky, J.N. Cappella, Where we go from here: Health misinformation on social media, Am. J. Public Health 110 (S3) (2020) S273–S275.
[140] L. Shang, Y. Zhang, Z. Yue, et al., A knowledge-driven domain adaptive approach to early misinformation detection in an emergent health domain on social media, in: 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2022, pp. 34–41.
[141] Z. Kou, L. Shang, Y. Zhang, et al., HC-COVID: A hierarchical crowdsource knowledge graph approach to explainable COVID-19 misinformation detection, Proc. ACM Hum.-Comput. Interact. 6 (GROUP) (2022) 1–25.
[142] M.A. Weinzierl, S.M. Harabagiu, Automatic detection of COVID-19 vaccine misinformation with graph link prediction, J. Biomed. Inf. 124 (2021) 103955.
[143] K.M. Caramancion, Harnessing the power of ChatGPT to decimate mis/disinformation: Using ChatGPT for fake news detection, in: 2023 IEEE World AI IoT Congress (AIIoT), IEEE, 2023, pp. 0042–0046.
[144] K.M. Caramancion, News verifiers showdown: A comparative performance evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in news fact-checking, arXiv preprint arXiv:2306.17176 (2023).
[145] S. Gehrmann, H. Strobelt, A.M. Rush, GLTR: Statistical detection and visualization of generated text, arXiv preprint arXiv:1906.04043 (2019).
[146] G. Rosalsky, E. Peaslee, This 22-year-old is trying to save us from ChatGPT before it changes writing forever, NPR, January 18, 2023.
[147] E. Mitchell, Y. Lee, A. Khazatsky, et al., DetectGPT: Zero-shot machine-generated text detection using probability curvature, arXiv preprint arXiv:2301.11305 (2023).
[148] R. Zellers, A. Holtzman, H. Rashkin, et al., Defending against neural fake news, Adv. Neural Inf. Process. Syst. 32 (2019) 9054–9065.
[149] A. Bakhtin, S. Gross, M. Ott, et al., Real or fake? Learning to discriminate machine from human generated text, arXiv preprint arXiv:1906.03351 (2019).
[150] X. Liu, Z. Zhang, Y. Wang, et al., COCO: Coherence-enhanced machine-generated text detection under data limitation with contrastive learning, arXiv preprint arXiv:2212.10341 (2022).
[151] W. Zhong, D. Tang, Z. Xu, et al., Neural deepfake detection with factual structure of text, arXiv preprint arXiv:2010.07475 (2020).
[152] https://www.factcheck.org/. Last accessed December 6, 2023.
[153] G.L. Ciampaglia, P. Shiralkar, L.M. Rocha, et al., Computational fact checking from knowledge networks, PLoS One 10 (6) (2015) e0128193.

Author profile

Bo Hu received the BSc degree in Computer Science from the University of Science and Technology of China, Hefei, China, in 2007, and the PhD degree in Electrical and Computer Engineering from the University of Alberta, Edmonton, AB, Canada, in 2013. Currently, he is an associate professor with the School of Information Science and Technology, University of Science and Technology of China. His research interests include computational social science, recommender systems, data mining and information retrieval.

Zhendong Mao received the PhD degree in computer application technology from the Institute of Computing Technology, Chinese Academy of Sciences, in 2014. He is currently a professor with the University of Science and Technology of China, Hefei, China. He was an assistant professor with the Institute of Information Engineering, Chinese Academy of Sciences, Beijing, from 2014 to 2018. His research interests include the fields of computer vision, natural language processing and cross-modal understanding.

Yongdong Zhang received the PhD degree in electronic engineering from Tianjin University, Tianjin, China, in 2002. He is currently a professor with the School of Information Science and Technology, University of Science and Technology of China. His current research interests are in the fields of multimedia content analysis and understanding, multimedia content security, video encoding, and streaming media technology. He has authored over 100 refereed journal and conference papers. He serves as an Editorial Board Member of the Multimedia Systems Journal and the IEEE Transactions on Multimedia.