Beyond Text: Multimodal Credibility Assessment Approaches for Online User-Generated Content
Abstract
1 Introduction
2 Background
2.1 Motivation
2.2 Key Challenges
3 Credibility Assessment of UGC
3.1 Review Methodology
4 Content-Level Credibility Assessment
4.1 Content Credibility Assessment Based on Unimodal UGC
4.2 Content Credibility Assessment Based on Multimodal UGC
Author | UM | MM | Modality | Approach | Model | Platform | Observation |
---|---|---|---|---|---|---|---|
Castillo et al. 2011 [28] | ✓ | \(\times\) | Metadata | ML | J48 decision tree | — | Credible news is propagated by users with a large message history; messages have a single or few origins and many re-posts. |
Qazvinian et al. 2011 [90] | ✓ | \(\times\) | Metadata | ML | MLE | — | Content-, network-, and microblog-specific features are effective. |
Feng et al. 2012 [89] | ✓ | \(\times\) | Text | ML | Parse trees with SVM | Multiple | Suspicious content identified by deep syntactic patterns. |
Zhang et al. 2012 [84] | ✓ | \(\times\) | Text | ML | SVM | Not specified | Feature selection methods are improved by CHI statistics and hypothesis testing. |
Yang et al. 2012 [82] | ✓ | \(\times\) | Metadata | ML | SVM | — | User device and location can be used to classify rumors. |
Briscoe et al. 2013 [26] | ✓ | \(\times\) | Network | ML | Social network graphs | Not specified | Corroboration and degree centrality are indicators of credibility. |
Briscoe et al. 2014 [88] | ✓ | \(\times\) | Text | ML | SVM, gradient boosting | Self-curated | Linguistic cues are present in social media deception. |
Gupta et al. 2014 [85] | ✓ | \(\times\) | Metadata | ML | SVM-Rank | — | Credibility evaluation models and features change with time. |
Liu et al. 2015 [35] | ✓ | \(\times\) | Metadata | ML | SVM | — | An ML model with selected features works much faster than manual verification. |
Ma et al. 2015 [86] | ✓ | \(\times\) | Metadata | ML | SVM | Twitter, Weibo | Time series analysis of features improves rumor detection. |
Zhao et al. 2015 [123] | ✓ | \(\times\) | Text | ML | Decision tree | — | Tweets that ask verification questions or correct controversial statements are early signals of rumors. |
Wu et al. 2015 [124] | ✓ | \(\times\) | Metadata | ML, graph | Graph kernel-based hybrid classifier | — | False rumors have a distinct re-post pattern. |
Mukherjee et al. 2015 [29] | ✓ | \(\times\) | Metadata | Graph | Continuous metadata random field | Websites | Credibility of articles depends on language, topic, perspective, and timeline. |
Kumar et al. 2016 [125] | ✓ | \(\times\) | Metadata | ML | Random forest | Wikipedia | Automated classifiers outperform humans by a large margin. |
Rubin et al. 2016 [87] | ✓ | \(\times\) | Text | ML | SVM | Websites | Deception is interlinked with satire, irony, and humor. |
Zeng et al. 2016 [36] | ✓ | \(\times\) | Metadata | ML | Regression | — | Crowd correction is an effective means of preventing misinformation. |
Wang et al. 2016 [126] | ✓ | \(\times\) | Metadata | ML | Expectation maximization | — | Jointly estimates the theme awareness and reliability of sources as well as the theme relevance. |
Jin et al. 2016 [91] | ✓ | \(\times\) | Network | Graph | Iterative deduction | Twitter, Weibo | Conflicting social viewpoints are effective in credibility propagation network for microblogs. |
Rosas et al. 2017 [127] | ✓ | \(\times\) | Text | ML | SVM | News Websites | Linguistic features can give insights into fake and real news. |
Wang et al. 2017 [128] | ✓ | \(\times\) | Network | ML | MLE | Facebook, Twitter, Weibo | Different communities have different user interests and rumor propagation characteristics. |
Ma et al. 2017 [129] | ✓ | \(\times\) | Metadata | ML | Tree kernel | — | Kernel-based approaches carry higher-dimensional information than feature-based methods. |
Tacchini et al. 2017 [130] | ✓ | \(\times\) | Metadata | Graph | Harmonic Boolean label crowdsourcing | — | Mapping the diffusion pattern of information can be a useful component of automatic hoax detection systems. |
Ruchansky et al. 2017 [94] | ✓ | \(\times\) | Metadata | DL, graph | CSI | Twitter, Weibo | Text, article responses, and users are three aspects to identify fake news. |
Yu et al. 2017 [93] | ✓ | \(\times\) | Text | DL | CNN | Twitter, Weibo | CNN can extract scattered key features and shape high-level interactions. |
Jin et al. 2017 [10] | \(\times\) | ✓ | Metadata, text, image | DL | LSTM, VGG 19 | Weibo, Twitter | Attention-based RNN mechanism can detect rumors. |
Wang et al. 2018 [131] | ✓ | \(\times\) | Text | ML | Decision tree | Websites | Claim-relevance discovery can help identify online misinformation. |
Potthast et al. 2018 [132] | ✓ | \(\times\) | Text | ML | Random forest | Websites | Style-based fake news detection may not work effectively. |
Tschiatschek et al. 2018 [133] | ✓ | \(\times\) | Network | Graph | Bayesian inference | — | A fake news detection approach needs to learn about users' flagging behavior. |
Kim et al. 2018 [134] | ✓ | \(\times\) | Metadata | Others | Bayesian inference | Twitter, Weibo | A crowd-powered procedure with scalable online algorithms can reduce the spread of fake news. |
Huh et al. 2018 [96] | ✓ | \(\times\) | Metadata | Others | ResNet | Columbia, Carvalho, RT | EXIF metadata can be used as a supervisory signal for training a model to determine whether an image is self-consistent. |
Wang et al. 2018 [135] | ✓ | \(\times\) | Text | DL | CNN | Weibo, Twitter | CNN-based multiscale feature attention can select effective features from text. |
Yang et al. 2019 [136] | ✓ | \(\times\) | Metadata | Graph | Bayesian network with Gibbs sampling | Websites | Users’ engagements on social media can be used to identify their opinions toward the authenticity of the news. |
Khan et al. 2019 [137] | ✓ | \(\times\) | Text | DL | LSTM, VGG 19 | Websites | Accuracy of the model is proportional to article length. |
Reis et al. 2019 [38] | ✓ | \(\times\) | Metadata | ML | Random forest, extreme gradient boosting | Buzzfeed | Language, source, temporal, and engagement features can be combined for better analysis of fake news. |
Shu et al. 2019 [98] | ✓ | \(\times\) | Text | DL | Sentence-comment co-attention network | Gossipcop, Politifact | Credibility of users and user engagement can be explored to enhance model performance. |
Shu et al. 2019 [138] | ✓ | \(\times\) | Metadata | ML, graph | Tri-relationship optimization | Websites | There is a relationship among publisher, news, and social media engagement. |
Khattar et al. 2019 [106] | \(\times\) | ✓ | Text, image | DL | Word2vec, VGG 19 | Twitter, Weibo | Attention mechanism helps to improve the model performance by considering similar parts of image and text. |
Singhal et al. 2019 [109] | \(\times\) | ✓ | Text, image | DL | BERT, VGG 19 | Twitter, Weibo | Multimodal methods can be employed for fake news detection. |
Yang et al. 2019 [139] | \(\times\) | ✓ | Text, image | DL | TI-CNN | Websites | CNN models can be trained much faster than LSTM and many other RNN models. |
Qi et al. 2019 [95] | ✓ | \(\times\) | Image | DL | MVNN | MediaEval, Weibo | The pixel domain and frequency domain information is important for detecting fake news. |
Nguyen et al. 2019 [140] | \(\times\) | ✓ | Text, network | DL, graph | Markov random field | Weibo, Twitter, PHEME | The correlations among news articles are effective cues for online news analysis. |
Palod et al. 2019 [116] | ✓ | \(\times\) | Text | DL | Word2vec, LSTM | FVC, VAVD | Simple features extracted from metadata are not helpful in identifying fake videos. |
Tanwar et al. 2020 [112] | \(\times\) | ✓ | Metadata, text | DL | Word2vec, CNN | MediaEval | More explicit features based on textual information or user profile data can be explored to improve accuracy. |
Giachanou et al. 2020 [108] | \(\times\) | ✓ | Text, image | DL | Word2vec, CNN | Politifact, Gossipcop, MediaEval | Combining textual, visual, and text-image similarity information is beneficial for the task of fake news detection. |
Shah et al. 2020 [105] | \(\times\) | ✓ | Text, image | ML | Cultural algorithm with radial kernel function | Twitter, Weibo | Cultural algorithm can extract optimum features from text and images. |
Zhou et al. 2020 [107] | \(\times\) | ✓ | Text, image | DL | SAFE | Politifact, Gossipcop | Multimodal features and the cross-modal relationship (similarity) are essential. |
Xia et al. 2020 [141] | \(\times\) | ✓ | Metadata, network | DL | Encoders | Twitter, Weibo | State-independent and time-evolving networks can assist in rumor detection. |
Silva et al. 2021 [110] | \(\times\) | ✓ | Metadata, network | DL, graph | Domain-agnostic news classification | Politifact, Gossipcop, CoAID | Modality of news records (propagation network and text) provides unique knowledge. |
Song et al. 2021 [8] | \(\times\) | ✓ | Text, image | DL | Cross-modal attention residual and multichannel CNN | Twitter, Weibo | Each modality's unique properties should be preserved while fusing relevant information across modalities. |
Wang et al. 2022 [142] | ✓ | \(\times\) | Text | DL, graph | Elementary discourse unit | Websites | Granularity between word and sentence with improved text representation can improve fake news detection. |
Li et al. 2022 [117] | \(\times\) | ✓ | Metadata, text | DL | CNN | Bilibili | DL can be more appropriate than ML for misleading video detection. |
Choi et al. 2022 [118] | \(\times\) | ✓ | Text, video | DL | Fake news video detection model | FVC, VAVD | Domain knowledge is effective in assessing fake news videos. |
Wang et al. 2023 [99] | ✓ | \(\times\) | Text | DL | First-order-logic-guided knowledge | Wikipedia | LLMs can be used to better understand context. |
Chen et al. 2023 [102] | ✓ | \(\times\) | Text | DL | LLMs, humans | LLM-generated data | Detecting LLM-generated misinformation is more challenging than human-written misinformation with similar semantics. |
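A recurring pattern in the multimodal approaches above is fusing per-modality signals into one credibility decision. As a minimal late-fusion sketch — with hypothetical function names, scores, and weights that are ours for illustration, not taken from any surveyed paper — each unimodal model emits a credibility score in [0, 1], and a weighted average is thresholded into a label:

```python
# Illustrative late fusion of per-modality credibility scores.
# Scores and weights are hypothetical; real systems would obtain the
# per-modality scores from trained text/image/metadata classifiers.

def fuse_scores(scores, weights):
    """Weighted average of per-modality credibility scores in [0, 1]."""
    if set(scores) != set(weights):
        raise ValueError("every modality needs a matching weight")
    total = sum(weights.values())
    return sum(scores[m] * weights[m] for m in scores) / total

def classify(scores, weights, threshold=0.5):
    """Label a post 'credible' when the fused score clears the threshold."""
    return "credible" if fuse_scores(scores, weights) >= threshold else "not credible"

# Example: the text model is confident, the image model is unsure,
# and the metadata model flags the post.
post = {"text": 0.9, "image": 0.5, "metadata": 0.2}
weights = {"text": 0.5, "image": 0.3, "metadata": 0.2}
print(round(fuse_scores(post, weights), 2))  # 0.64
print(classify(post, weights))               # credible
```

Early-fusion methods in the table (e.g., attention over joint text-image representations) instead combine features before classification; the tradeoff is that late fusion is simpler and modular, while early fusion can exploit cross-modal relationships such as text-image similarity.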
4.3 Discussion
5 User-Level Credibility Assessment
Social Media Platform | Definition of Fake/Suspicious Profiles |
---|---|
Twitter | Fake accounts are operated to engage in spam, interfere in civic processes, carry out financial scams, artificially inflate engagement, or abuse and harass others. |
Facebook | A fake profile is one in which someone pretends to be something or someone that doesn't exist. These profiles can represent fake or made-up people, celebrities, or organizations. |
LinkedIn | A profile may be fake if it appears empty or contains profanity, fake names, or impersonates public figures. |
5.1 User Credibility Assessment Based on Unimodal UGC
5.2 User Credibility Assessment Based on Multimodal UGC
Author | UM | MM | Modality | Approach | Model | Platform | Observation |
---|---|---|---|---|---|---|---|
Lee et al. 2011 [159] | ✓ | \(\times\) | Metadata | ML | Random forest | — | Misclassified users have a low standard deviation of the numerical IDs of their followings and followers. |
Zhu et al. 2012 [148] | ✓ | \(\times\) | Metadata | ML | Collective matrix factorization, SVM | Renren | Based on users’ social actions and social relations, spammers can be detected. |
Ahmed et al. 2013 [65] | ✓ | \(\times\) | Metadata | ML | J48 | Facebook, Twitter | Different profile-based features have varied impact on detection capabilities. |
Alowibdi et al. 2014 [149] | ✓ | \(\times\) | Metadata | ML | Bayesian classifier | Twitter | Combining multiple profile characteristics of each Twitter user improves deception detection. |
Cao et al. 2014 [160] | ✓ | \(\times\) | Metadata | Graph | User similarity graph | — | Accounts that act similarly at around the same time for a sustained period can be grouped. |
Ruan et al. 2015 [55] | ✓ | \(\times\) | Metadata | Statistical | Behavioral patterns | — | Impostors' social behaviors can hardly conform to the authentic user's behavioral profile. |
Xiao et al. 2015 [161] | ✓ | \(\times\) | Metadata | ML | Random forest | — | Basic distribution, pattern, and frequency features can be used to identify clusters of fake users. |
Gurajala et al. 2015 [80] | ✓ | \(\times\) | Metadata | ML | Pattern matching | — | An activity-based profile-pattern detection scheme provides a quick way to identify potential spammers. |
Tsikerdekis et al. 2016 [146] | ✓ | \(\times\) | Network | Others | Common contribution network | — | The proposed model can incur high computational overhead as the number of users grows. |
Cresci et al. 2016 [71] | ✓ | \(\times\) | Metadata | Pattern matching | DNA-inspired techniques | — | DNA-inspired techniques can be used to model user behavior. |
Singh et al. 2016 [162] | ✓ | \(\times\) | Metadata | ML | Random forest | — | Behavioral characteristics can be used to differentiate between spammers and genuine users. |
Zoubi et al. 2017 [163] | ✓ | \(\times\) | Metadata | ML | Naive Bayes and decision tree | — | Suspicious words and repeated words greatly influence detection accuracy. |
Kaur et al. 2018 [143] | ✓ | \(\times\) | Text | ML | K-NN | — | An AHP-TOPSIS method that ranks and weights the features of each user improves results. |
Walt et al. 2018 [150] | ✓ | \(\times\) | Metadata | ML | Random forest | — | Engineered features are not successful in detecting fake accounts generated by humans. |
Caruccio et al. 2018 [164] | ✓ | \(\times\) | Metadata | Statistical | Relaxed functional dependencies | — | Automatic procedures can hardly simulate all human behaviors. |
Agarwal et al. 2019 [78] | ✓ | \(\times\) | Text | ML | Random forest | — | Three emotion categories (fear, surprise, and trust) are found least in the posts of fake users. |
Khaled et al. 2018 [145] | ✓ | \(\times\) | Metadata | ML | SVM-NN | — | Spearman's rank correlation selects the best features and removes redundancy. |
Rathore et al. 2018 [75] | ✓ | \(\times\) | Metadata | ML | Bayesian network | — | Profile- and content-based features enhance spammer detection. |
Singh et al. 2019 [165] | ✓ | \(\times\) | Metadata | ML | SVM | — | Malicious users also tend to use spam bots to increase their follower counts. |
Akyon et al. 2019 [73] | ✓ | \(\times\) | Metadata | ML | SVM | — | Normalization and feature selection algorithms can be used to mitigate bias in the dataset. |
Zarei et al. 2019 [166] | ✓ | \(\times\) | Metadata | ML | k-Means | — | Text can be analyzed to understand what users publish and to find patterns in it. |
Yuan et al. 2019 [147] | ✓ | \(\times\) | Network | Graph | Graph inference | — | Account registration information can be used for early sybil detection. |
Wanda et al. 2020 [152] | \(\times\) | ✓ | Metadata, Network, Text | DL | CNN | — | Fake accounts should be detected quickly, before they interact with real users, by capturing informative features from posted content and profile metadata. |
Adikari et al. 2020 [53] | ✓ | \(\times\) | Metadata | ML | SVM with polynomial kernel | — | PCA-based feature selection followed by SVM modeling with a polynomial kernel can identify fake profiles when only a limited number of profile features are publicly available. |
Breuer et al. 2020 [167] | ✓ | \(\times\) | Network | Graph | SybilEdge | — | Focusing on the interactions of new fake users with other users is insightful. |
Fazil et al. 2022 [153] | \(\times\) | ✓ | Metadata, Text | DL | LSTM | — | Using the description text improved the cross-domain performance of the model. |
Wanda et al. 2022 [168] | ✓ | \(\times\) | Metadata | DL | CNN | — | Adding a Gaussian function to the non-linear classifier helps achieve better performance. |
Verma et al. 2022 [169] | \(\times\) | ✓ | Metadata, Text | DL | RoBERTa, BiLSTM, random forest | — | Using ML, DL, and pretrained models with a voting classifier on text and numeric metadata improves model performance. |
Goyal et al. 2023 [156] | \(\times\) | ✓ | Network, Text, Image | DL, Graph | CNN, LSTM, GCN | — | Incorporating multimodal data allows improved detection of bogus accounts. |
Breuer et al. 2023 [157] | ✓ | \(\times\) | Network | Graph | Preferential attachment k-class classifier | — | Analyzing the friend request behavior of new users can provide insightful information for early detection. |
Khan et al. 2024 [158] | \(\times\) | ✓ | Metadata, Network, Text | DL, Graph | Graph neural networks | — | A user's profile, shared content, and user–user interaction network provide important cues for identifying malicious users. |
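Most metadata-based detectors in the table rely on hand-engineered profile features such as follower/following ratios, posting rates, and profile completeness. The sketch below illustrates that pattern with hypothetical feature names and thresholds of our own choosing — the surveyed papers feed such features into trained classifiers (SVMs, random forests) rather than fixed rules:

```python
# Illustrative metadata features for fake-profile detection.
# Thresholds are made up for demonstration, not drawn from any paper.

def profile_features(profile):
    """Derive simple metadata features from a raw profile dict."""
    followers = profile.get("followers", 0)
    following = profile.get("following", 0)
    posts = profile.get("posts", 0)
    age_days = max(profile.get("account_age_days", 1), 1)
    return {
        "follower_ratio": followers / max(following, 1),
        "posts_per_day": posts / age_days,
        "has_bio": bool(profile.get("bio")),
        "has_avatar": bool(profile.get("avatar_url")),
    }

def suspicion_score(feat):
    """Count simple red flags; a higher count means more suspicious."""
    score = 0
    if feat["follower_ratio"] < 0.1:  # follows many, followed by few
        score += 1
    if feat["posts_per_day"] > 50:    # implausibly high posting rate
        score += 1
    if not feat["has_bio"]:
        score += 1
    if not feat["has_avatar"]:
        score += 1
    return score

bot_like = {"followers": 3, "following": 900, "posts": 8000,
            "account_age_days": 30, "bio": "", "avatar_url": ""}
print(suspicion_score(profile_features(bot_like)))  # 4
```

As several entries above observe (e.g., Walt et al. 2018), such engineered features struggle against human-operated fake accounts, which motivates the network- and multimodal-based methods later in the table.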
5.3 Discussion
6 Research Gaps
7 Multimodal Datasets
Dataset | Key Modalities | Size | Link for the Dataset |
---|---|---|---|
FakeNews-Kaggle | Text, image, video | 11,941 fake and 8,074 real news | https://www.kaggle.com/competitions/fake-news/data |
FakeNewsNet [172] | Text, image, spatial, temporal, user, network | 17,441 real and 5,755 fake news articles with user activity data | https://github.com/KaiDMML/FakeNewsNet |
Fakeddit [173] | Text, image, and other metadata of post | 628,501 fake and 527,049 real posts | https://github.com/entitize/fakeddit |
Breaking News dataset [174] | Text, image | 100,000 news articles | https://www.iri.upc.edu/people/aramisa/BreakingNews/ |
Tampered News dataset | Text, image | 72K news articles | https://data.uni-hannover.de/dataset/tamperednews |
News400 dataset [175] | Text, image | 400 news articles | https://github.com/TIBHannover/cross-modal_entity_consistency |
Twitter dataset [176] (MediaEval) | Text, image | Around 6,000 real and 9,000 fake posts | https://github.com/MKLab-ITI/image-verification-corpus |
Weibo dataset [10] | Text, image, social context | Around 50,000 tweets | https://github.com/yaqingwang/EANN-KDD18/tree/master/data/weibo |
NewsBag, NewsBag++ [177] | Text, image | 200K real and 15K fake news articles (NewsBag); 200K real and 389K fake news articles (NewsBag++) | Available on request |
Multimodal Entity Image Repurposing [178] | Text, image, spatial, temporal | 57,940 packages | https://github.com/Ekraam/MEIR/tree/master/dataset |
Youtube videos dataset [179] | Video and audio | 121 video clips (61 deceptive, 60 truthful) | Available on request |
WIT: Wikipedia-based image text dataset [180] | Text, image | 37.6 million image-text sets and 11.5 million unique images | https://github.com/google-research-datasets/wit |
VisualNews dataset [181] | Text, image, and other metadata | Over 1 million images and more than 600,000 articles | https://github.com/FuxiaoLiu/VisualNews-Repository (data available on request) |
NewsCLIPpings dataset [182] | Text, image, metadata | Over 986K unique images with captions | https://github.com/g-luo/news_clippings (data available on request) |
COSMOS dataset [183] | Text, image | Over 200K images with 450K corresponding text captions | https://shivangi-aneja.github.io/projects/cosmos/ (data available on request) |
COVID-VTS [184] | Text, video, speech | 10K video-claim pairs | https://drive.google.com/drive/folders/1xT4QaOZQlZtW9Ul36VCJ4arZQ94-Ok3V |
FakeSV [185] | Text, video, audio, metadata | Over 3.6K videos in Chinese with metadata | https://github.com/ICTMCG/FakeSV |
FVC [186] | Text, video, audio, metadata | 200 fake and 180 real videos | https://github.com/MKLab-ITI/fake-video-corpus |
YouTubeAudit-data [187] | Video, title (text), metadata | 56K videos, covers 5 popular misinformative topics | https://social-comp.github.io/YouTubeAudit-data/ |
ReCOVery dataset [188] | Text, image, metadata | Over 1.7K multimodal news articles; 93K users sharing 140K tweets | https://github.com/apurvamulay/ReCOVery |
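The datasets above differ widely in which modalities each record carries (text only, text + image, video + audio, with or without social context). A hypothetical normalized record schema — the class and field names are ours for illustration, not part of any listed dataset — can make cross-corpus experiments easier:

```python
# Illustrative unified schema for multimodal samples drawn from
# heterogeneous corpora; field names are hypothetical.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MultimodalSample:
    sample_id: str
    label: str                          # e.g., "fake" or "real"
    text: Optional[str] = None
    image_path: Optional[str] = None
    video_path: Optional[str] = None
    metadata: dict = field(default_factory=dict)

    def modalities(self):
        """List the modalities actually present in this sample."""
        present = []
        if self.text:
            present.append("text")
        if self.image_path:
            present.append("image")
        if self.video_path:
            present.append("video")
        if self.metadata:
            present.append("metadata")
        return present

sample = MultimodalSample("weibo-001", "fake",
                          text="Breaking: ...",
                          image_path="imgs/weibo-001.jpg")
print(sample.modalities())  # ['text', 'image']
```

A loader per corpus can map its native fields into this schema, so fusion models see a uniform interface and can gracefully skip modalities a given dataset lacks.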
8 Scope for Future Work
9 Conclusion
References
Published In
ACM Transactions on Intelligent Systems and Technology
Publisher
Association for Computing Machinery
New York, NY, United States