Pavan Final
                                    Project Report
                    Submitted In partial fulfilment for the award of the degree of
                                     Submitted by
                                    M.PAVAN KUMAR
                                  (REGD.NO.18L35F0009)
CERTIFICATE
This is to certify that the project report entitled “DETECTING FAKE NEWS USING MACHINE
LEARNING AND DEEP LEARNING ALGORITHM” is a bonafide record of project work carried
out under my supervision by M. PAVAN KUMAR (18L35F0009) during the academic year
2019-2020, in partial fulfilment of the requirements for the award of the degree of Master of
Computer Applications, Jawaharlal Nehru Technological University, Kakinada. The results embodied in
this project report have not been submitted to any other University or Institute for the award of any
Degree or Diploma.
                                         External Examiner
      Detecting fake news using machine learning and deep learning algorithms
DECLARATION
We hereby declare that the project report entitled “DETECTING FAKE NEWS USING MACHINE
LEARNING AND DEEP LEARNING ALGORITHM” has been carried out by us and has not been
submitted either in part or whole for the award of any degree, diploma or any other similar title to this or
any other university.
DATE:                                                                      (18L35F0009)
                                  ACKNOWLEDGEMENT
It gives us a great sense of pleasure to acknowledge the assistance and cooperation we have received from
several persons while undertaking this MCA Final Year Project. We owe a special debt of gratitude to
Mr. K. LEELA PRASAD, Asst. Prof., Department of Information Technology, for his constant support and
guidance throughout the course of our work. His guidance has been a constant source of inspiration for
us.
We also take the opportunity to acknowledge the contribution of the HOD, Dr. B. Prasad, Assoc. Prof.,
Department of Master of Computer Applications, for his full support and assistance during the development
of the project.
We want to thank Dr. B. Arundhati, Principal of VIIT, and the Management for providing all the
necessary facilities.
We also acknowledge the contribution of all faculty members of the department for their kind assistance
and cooperation during the project.
M.PAVAN KUMAR
                                                                                  (18L35F0009)
ABSTRACT
This project applies NLP (Natural Language Processing) techniques to detect 'fake news', that is,
misleading news stories that come from non-reputable sources. Building a model based only on a count
vectorizer (using word tallies) or a TF-IDF (Term Frequency-Inverse Document Frequency) matrix (word
tallies relative to how often the words are used in other articles in the dataset) can only get you so
far, because these models do not consider important qualities such as word ordering and context. It is
quite possible for two articles that are similar in their word counts to be completely different in
meaning. The data science community has responded by taking action against the problem. The proposed
work assembles a dataset of both fake and real news and employs Naive Bayes, logistic regression, and
random forest classifiers to create a model that classifies an article as fake or real based on its
words and phrases.
TABLE OF CONTENTS
1. INTRODUCTION
2. LITERATURE SURVEY
4. ALGORITHMS
5. METHODOLOGY
8. CONCLUSION
9. REFERENCES
                                 CHAPTER-1
                               INTRODUCTION
1.1 INTRODUCTION:
        The rise of fake news during the 2016 U.S. Presidential Election highlighted not only
the dangers of the effects of fake news but also the challenges presented when attempting to
separate fake news from real news. Fake news may be a relatively new term, but it is not
necessarily a new phenomenon. Fake news has technically been around at least since the
appearance and popularity of one-sided, partisan newspapers in the 19th century. However,
advances in technology and the spread of news through different types of media have increased
the spread of fake news today. As such, the effects of fake news have increased exponentially
in the recent past, and something must be done to prevent this from continuing in the future.
        I have identified the three most prevalent motivations for writing fake news and
chosen only one as the target for this project as a means to narrow the search in a meaningful
way. The first motivation, which dates back to the 19th-century one-sided party newspapers,
is to influence public opinion. The second, which requires more recent advances in technology,
is the use of fake headlines as clickbait to raise money. The third, which is equally prominent
yet arguably less dangerous, is satirical writing. While all three subsets of fake news, namely
(1) clickbait, (2) influential, and (3) satire, share the common thread of being fictitious,
their widespread effects are vastly different. As such, this paper will focus primarily on fake
news as defined by PolitiFact: “fabricated content that intentionally masquerades as news
coverage of actual events.” This definition excludes satire, which is intended to be humorous
and not deceptive to readers.
        Most satirical articles come from sources like The Onion, which specifically
distinguish themselves as satire. Satire can already be classified by machine learning techniques,
according to prior work. Therefore, our goal is to move beyond these achievements and use machine
learning to classify, at least as well as humans, more difficult discrepancies between real and fake
news.
The dangerous effects of fake news, as previously defined, are made clear by events such as
one in which a man attacked a pizzeria because of a widespread fake news article. This story, along
with published analysis, provides evidence that humans are not very good at detecting fake news,
possibly no better than chance. As such, the question remains whether or not machines can do
a better job. There are two ways in which machines could attempt to solve the fake news
problem better than humans.
    The first is that machines are better at detecting and keeping track of statistics than humans; for
  example, it is easier for a machine to detect that the majority of verbs used are “suggests” and
  “implies” rather than “states” and “proves.”
  Additionally, machines may be more efficient at surveying a knowledge base to find all relevant
  articles and answering based on those many different sources. Either of these methods could
  prove useful in detecting fake news, but we decided to focus on how a machine can solve the fake
  news problem using supervised learning that extracts features of the language and content only
  within the source in question, without utilizing any fact checker or knowledge base. For many
  fake news detection techniques, a fake article published by a trustworthy author through a
  trustworthy source would not be caught. This approach would combat those “false negative”
  classifications of fake news. In essence, the task would be equivalent to what a human faces when reading
  a hard copy of a newspaper article, without internet access or outside knowledge of the subject
  (versus reading something online, where one can simply look up relevant sources). The machine,
  like a human reading in a coffee shop, will have access only to the words in the article and must use
  strategies that do not rely on blacklists of authors and sources.
  The current project involves utilizing machine learning and natural language processing
  techniques to create a model that can expose documents that are, with high probability, fake news
  articles. Many of the current automated approaches to this problem are centred around a “blacklist”
  of authors and sources that are known producers of fake news. But what about when the author is
  unknown or when fake news is published through a generally reliable source? In these cases it is
  necessary to rely simply on the content of the news article to decide whether or not
  it is fake. By collecting examples of both real and fake news and training a model, it should be
  possible to classify fake news articles with a certain degree of accuracy.
  The goal of this project is to find the effectiveness and limitations of language-based techniques
  for detection of fake news through the use of machine learning algorithms including, but not
  limited to, convolutional neural networks and recurrent neural networks.
Department of MCA, VIIT
         In practice, many sentences convey affect through underlying meaning rather than affect
  adjectives. For example, the text “My husband just filed for divorce and he wants to take custody of my
  children away from me” certainly evokes strong emotions but uses no affect keywords, and therefore
  cannot be classified using a keyword spotting approach.
  Lexical affinity is slightly more sophisticated than keyword spotting: rather than simply detecting
  obvious affect words, it assigns arbitrary words a probabilistic ‘affinity’ for a particular emotion. For
  example, ‘accident’ might be assigned a 75% probability of indicating a negative affect, as in
  ‘car accident’ or ‘hurt by accident’. These probabilities are usually trained from linguistic corpora.
  Though it often outperforms pure keyword spotting, there are two main problems with the approach.
  First, lexical affinity, operating solely on the word level, can easily be tricked by sentences like “I
  avoided an accident” (negation) and “I met my girlfriend by accident” (other word senses). Second,
  lexical affinity probabilities are often biased toward text of a particular genre, dictated by the source
  of the linguistic corpora. This makes it difficult to develop a reusable, domain-independent model.
  Statistical methods, such as Bayesian inference and support vector machines, have been popular for
  affect classification of texts. By feeding a machine learning algorithm a large training corpus of
  affectively annotated texts, it is possible for the system not only to learn the affective valence of affect
  keywords (as in the keyword spotting approach), but also to take into account the valence of other
  arbitrary keywords (like lexical affinity), punctuation, and word co-occurrence frequencies. However,
  traditional statistical methods are generally semantically weak, meaning that, with the exception of
  obvious affect keywords, other lexical or co-occurrence elements in a statistical model have little
  predictive value individually. As a result, statistical text classifiers only work with acceptable accuracy
  when given a sufficiently large text input. So, while these methods may be able to affectively classify
  a user’s text at the page or paragraph level, they do not work well on smaller text units such as
  sentences or clauses.
         The rapid growth of opinion sharing on social media has led to an increased interest in
  sentiment analysis of social media texts. Sentiment analysis can provide invaluable insights ranging from
  product reviews to capturing trending topics to designing business models for targeted
  advertisements. Many organizations today rely heavily on sentiment analysis of social media texts to monitor
  the performance of their products and take user feedback into account while upgrading to newer
  versions.
          Social media texts are informal, with several linguistic differences. In multilingual societies
  like India, users generally combine the prominent language, like English, with their native languages.
  This process of switching texts between two or more languages is referred to as code-mixing.
  Millions of internet users in India communicate by mixing their regional languages with English, which
  generates an enormous amount of code-mixed social media text. One such popular combination is
  the mixing of Hindi and English, resulting in Hindi-English (Hi-En) code-mixed data. For example,
  “yeh gaana bohut super hai” (this song very super is), meaning “this is a superb song”, is a Hi-En
  code-mixed text.
          Apart from several existing challenges, such as the presence of multiple entities in the text and
  sarcasm detection, code-mixing brings with it many other unique challenges. The linguistic
  complexity of code-mixed content is compounded by the presence of spelling variations, transliteration, and
  non-adherence to formal grammar. Along with diverse sentence constructions, words in Hindi can
  have multiple variations when written in English, which leads to a large number of sparse and rare
  tokens. For instance, “pyaar” (love) can be written as “peyar”, “pyar”, “piyar”, “piyaar”, or
  “pyaarrrr”, etc.
          Code-mixing is a well-known problem in the field of NLP. Researchers have put in efforts for
  language identification, POS tagging, and Named Entity Recognition of code-mixed data. Over the
  past years, researchers have established deep neural network based state-of-the-art models for
  sentiment analysis of English data. For the problem of sentiment analysis of Hi-En code-mixed data,
  sub-word level representations in LSTM have shown promising results. However, since code-mixed
  data is noisy in nature and the available datasets are too small to tune deep learning models, we
  hypothesize that n-gram based traditional models should be able to assist deep learning based models
  in improving the overall accuracy of sentiment analysis of code-mixed data.
          In this project, we propose an ensemble model in which we combine the outputs of a character
  trigram based LSTM model and a word n-gram based MNB model to predict the sentiment of Hi-En
  code-mixed texts. While the LSTM model encodes deep sequential patterns in the text, MNB captures
  low-level word combinations of keywords to compensate for the grammatical inconsistencies.
Disadvantages:
In the proposed system, each news item first goes through a tokenization process. Then, unnecessary words are
removed and candidate feature words are generated. Each candidate feature word is checked against the
dictionary, and if an entry for it is available, its frequency is counted and added to the column in
the feature vector that corresponds to the numeric map of the word. Alongside counting frequency, the
length of the review is measured and added to the feature vector. Finally, the sentiment score, which is available
in the data set, is added to the feature vector. We have assigned negative sentiment a value of zero and
positive sentiment a positive value in the feature vector. The system is very fast and effective due to
semi-supervised and supervised learning. It follows a content-based approach that focuses on the review text. As
features we have used word frequency count, sentiment polarity, and length of review.
Advantages:
             The system is very fast and effective due to semi-supervised and supervised learning.
             It follows a content-based approach that focuses on the review text. As features we have
             used word frequency count, sentiment polarity, and length of review.
CHAPTER-2
LITERATURE SURVEY
          While there are some existing applications, such as BS Detector and PolitiFact, which to some
  extent help users identify misleading news, they require human intervention. The domain is also
  limited in the case of BS Detector, which does not tell the user the extent to which an article is
  fake.
          One line of work uses linguistic cue approaches and network analysis approaches to design a
  basic fake news detector that provides high accuracy in classification tasks. The authors
  propose a hybrid system that includes features such as multi-layer linguistic processing and the
  addition of network behavior. Another method detects online deceptive text by using a
  logistic regression classifier based on POS tags extracted from a corpus of deceptive and
  truthful texts; it achieves an accuracy of 72%, which could be further improved by performing
  cross-corpus analysis of classification models and reducing the size of the input feature vector.
2.1 Existing System
          One study of fake news detection on social media presents a data mining perspective, which
  includes fake news characterization based on psychology and social theories. This article discusses two
  major factors responsible for widespread acceptance of fake news by users, namely Naive Realism
  and Confirmation Bias. Further, it proposes a two-phase general data mining framework, which
  includes 1) Feature Extraction and 2) Model Construction, and discusses the datasets and evaluation
  metrics for fake news detection research. Another work proposes an SVM-based algorithm with five
  predictive features, i.e. Absurdity, Humour, Grammar, Negative Affect, and Punctuation, and uses
  satirical cues to detect misleading news. The paper translates theories of humor, irony, and satire
  into a predictive model for satire detection with 87% accuracy.
          A further paper proposes a new model for fake news detection that uses
  Stance Detection and the TF-IDF method to analyze data taken from various
  datasets of fake and legitimate news, and a Random Forest classifier to classify the output into
  four classes, namely True, Fake, Mostly True, and Mostly Fake. Random Forest has the
  advantage of handling binary features; moreover, it does not expect linear features.
2.2 Proposed System
         In [1], Shloka Gilda presented how NLP is relevant to detecting
  fake news. They used term frequency-inverse document frequency (TF-IDF) of
  bi-grams and probabilistic context-free grammar (PCFG) detection. They tested their
  dataset over multiple classification algorithms to find the best model.
          They find that TF-IDF of bi-grams fed into a Stochastic Gradient Descent model
  identifies non-credible sources with an accuracy of 77.2%.
         In [2], Mykhailo Granik proposed a simple technique for fake news detection using a
  naive Bayes classifier. They used BuzzFeed news for training and testing the naive
  Bayes classifier. The dataset was taken from Facebook news posts, and the classifier achieved an accuracy of up to
  74% on the test set.
         In [3], Cody Buntain developed a method for automating fake news detection on Twitter.
  They applied this method to Twitter content sourced from BuzzFeed’s fake news dataset.
  Furthermore, leveraging non-professional, crowdsourced workers instead of journalists presents a
  beneficial and much less costly way to classify true and fake stories on Twitter rapidly.
         In [4], Marco L. Della presented a paper that helps us understand how social networks
  and machine learning (ML) techniques may be used for fake news detection. They used a novel
  ML fake news detection method, implemented this approach inside a Facebook Messenger
  chatbot, and validated it with a real-world application, obtaining a fake news detection
  accuracy of 81.7%.
         In [5], Rishabh Kaushal applied three learning algorithms, namely Naive
  Bayes, Clustering, and Decision Trees, to a number of features at the tweet level and user
  level, such as Followers/Followees, URLs, SpamWords, Replies, and HashTags. Improvement in
  spam detection is measured on the basis of Overall Accuracy, Spammers Detection
  Accuracy, and Non-Spammers Detection Accuracy.
           In [6], Saranya Krishnan used an advanced framework to identify fake news content.
  Initially, they extracted content features and user features through the Twitter API.
           Then features such as statistical analysis of Twitter user accounts, reverse image
  searching, and verification of fake news sources are used by data mining algorithms for classification and
  analysis.
           [7] addresses detecting fake news through various machine learning models. The machine
  learning models implemented are a naïve Bayes classifier and a support vector machine. No specific
  accuracy was recorded, as only the models were discussed.
           [8] aims to detect whether given tweets are credible or not. The machine learning models
  implemented are a naïve Bayes classifier, decision trees, support vector machines, and neural
  networks. With both tweet and user features, the best F1 score is 0.94. Higher accuracy could have
  been attained by taking non-credible news into account.
           [9] presents a method for automating fake news detection on Twitter by learning to predict
  accuracy assessments in two credibility-focused Twitter datasets. The accuracy of the given models
  is 70.28%. The main limitation lies in the structural differences between CREDBANK and PHEME,
  which could affect model transfer.
           [10] Wang, Guan and Xie, Sihong and Liu, Bing and Philip, S Yu, Review graph based
  online store review spammer detection, 2011, used a Bayesian approach and laid out a
  clustering problem with opinion spam sensing.
           [11] Sun, Chengai and Du, Qiaolin and Tian, Gang, Exploiting product related review
  features for fake review detection, Mathematical Problems in Engineering (2016), is among the other
  literature pieces that have also taken a supervised learning approach, using the Support Vector
  Machine (SVM) as the classifier.
CHAPTER-3
Proposed Models
         3.1 N-gram Model: N-gram modeling is a popular feature identification and analysis
  approach used in language modeling and natural language processing. An n-gram is a
  contiguous sequence of n items, which could be words, bytes, syllables, or
  characters. The most used n-gram models in text categorization are word-based and
  character-based n-grams. In this work, we use word-based n-grams to represent the context of the document
  and generate features to classify the document. We develop a simple n-gram based classifier to
  differentiate between fake and honest news articles. The idea is to generate various sets of
  n-gram frequency profiles from the training data to represent fake and truthful news articles. We
  used several baseline n-gram features based on words and examined the effect of the n-gram
  length on the accuracy of different classification algorithms.
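As a rough illustration of the word-based n-gram frequency profiles described above, the following minimal sketch builds a profile from a handful of documents. The function names `word_ngrams` and `ngram_profile` and the sample sentences are hypothetical and chosen only for illustration; tokenization here is a plain whitespace split, which is an assumption rather than the pipeline used in this project.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the contiguous word n-grams of `text` as tuples (hypothetical helper)."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_profile(documents, n):
    """Count how often each word n-gram occurs across a list of documents."""
    profile = Counter()
    for doc in documents:
        profile.update(word_ngrams(doc, n))
    return profile


# Made-up example sentences, not taken from the project's dataset.
fake_docs = ["shocking secret the media hides", "the media hides the truth"]
fake_profile = ngram_profile(fake_docs, 2)
print(fake_profile.most_common(2))
```

Changing the value of `n` gives profiles for unigrams, bigrams, trigrams and so on, which is one way to study the effect of n-gram length on a classifier.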
         3.2 Data Pre-processing: Before representing the data using n-gram and vector-based
  models, the data need to be subjected to certain refinements such as stop-word removal, tokenization,
  lowercasing, sentence segmentation, and punctuation removal. This helps reduce the
  size of the actual data by removing irrelevant information. We created a
  generic processing function to remove punctuation and non-letter characters from each document;
  then we lowercased the letters in the document. In addition, an n-gram word-based tokenizer
  was created to slice the text based on the length n.
         Stop Word Removal: Stop words are insignificant words in a language that create noise
  when used as features in text classification.
  These are words commonly used in sentences to help connect thoughts or to assist in the
  sentence structure. Articles, prepositions, conjunctions, and some pronouns are considered stop
  words. We removed common words such as a, about, an, are, as, at, be, by, for, from, how, in, is,
  of, on, or, that, the, these, this, too, was, what, when, where, who, will, etc. Those words were
  removed from each document, and the processed documents were stored and passed on to the next
  step.
         Stemming: After tokenizing the data, the next step is to transform the tokens into a standard
  form. Stemming simply means reducing words to their root form, which decreases the number
  of word types or classes in the data. For example, the words “Running”, “Ran” and “Runner”
  would ideally be reduced to the word “run”, although a rule-based suffix stemmer handles regular
  forms such as “Running” more reliably than irregular ones such as “Ran”. We use stemming to make
  classification faster and more efficient.
  Furthermore, we use the Porter stemmer, which is one of the most commonly used stemming algorithms.
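The cleaning steps above can be sketched as follows. This is only an illustrative sketch, not the project's actual processing function: `preprocess`, `ngram_tokens` and `STOP_WORDS` are hypothetical names, the stop-word set is just the subset listed in the text, and stemming is left out so that the example needs nothing beyond the standard library (a Porter implementation, for instance the one shipped with NLTK, would be applied to each token after stop-word removal).

```python
import re

# Illustrative subset of stop words, copied from the list in the text above.
STOP_WORDS = {
    "a", "about", "an", "are", "as", "at", "be", "by", "for", "from", "how",
    "in", "is", "of", "on", "or", "that", "the", "these", "this", "too",
    "was", "what", "when", "where", "who", "will",
}


def preprocess(text):
    """Lowercase, replace non-letter characters with spaces, tokenize, drop stop words."""
    letters_only = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in letters_only.split() if tok not in STOP_WORDS]


def ngram_tokens(tokens, n):
    """Slice a token list into space-joined word n-grams of length n."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The senator claimed, in 2016, that this was FALSE!")
bigrams = ngram_tokens(tokens, 2)
print(tokens)
print(bigrams)
```

Removing stop words before building n-grams, as done here, means that bigrams can join words that were not adjacent in the original sentence; whether that is desirable depends on the experiment.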
         Term Frequency (TF): Term Frequency is an approach that uses the counts of words
  appearing in the documents to figure out the similarity between documents. Each document is
  represented by an equal-length vector that contains the word counts. Next, each vector is
  normalized so that its elements sum to one. Each word count is thereby converted
  into the probability of that word occurring in the document. In the simpler binary variant, a word that is in a certain
  document is represented as one, and a word that is not in the document is set to zero.
  Thus, each document is represented by groups of words.
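To make the normalization step concrete, here is a small sketch that turns tokenized documents into equal-length term-frequency vectors whose entries sum to one. The function name `term_frequency_vectors` and the two toy documents are assumptions made for illustration only.

```python
from collections import Counter


def term_frequency_vectors(tokenized_docs):
    """Return (vocabulary, vectors); each vector holds counts divided by document length."""
    vocabulary = sorted({tok for doc in tokenized_docs for tok in doc})
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc)
        # Guard against empty documents, which would otherwise divide by zero.
        vectors.append([counts[term] / total if total else 0.0 for term in vocabulary])
    return vocabulary, vectors


# Toy documents, not taken from the project's dataset.
docs = [["fake", "news", "fake"], ["real", "news"]]
vocab, tf = term_frequency_vectors(docs)
print(vocab)
print(tf)
```

Because every vector uses the same vocabulary order, the vectors can be compared position by position, which is what makes a similarity computation between documents possible.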
         3.4 Classification Process: It starts with preprocessing the data set, by removing unnec-
  essary characters and words from the data. N-gram features are extracted, and a features matrix is
  formed representing the documents involved. The last step in the classification process is to train
  the classifier. We investigated different classifiers to predict the class of the documents. We in-
  vestigated specifically six different machine learning algorithms, namely, Stochastic Gradient
  Descent (SGD), Support Vector Machines (SVM), Linear Support Vector Machines (LSVM), K-
  these classifiers from the Python Natural Language Toolkit (NLTK). We split the dataset into
  training and testing sets. For instance, in the experiments presented subsequently, we use 5-fold
  cross validation, so in each validation round around 80% of the dataset is used for training and 20% for
  testing.
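The 80%/20% figure follows directly from using five folds: each round holds out one fifth of the samples for testing and trains on the remaining four fifths. The sketch below shows the index bookkeeping only; `k_fold_indices` is a hypothetical helper, and it assumes the samples have already been shuffled, since it cuts the folds contiguously.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(100, k=5))
for train, test in folds:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so averaging the scores over the five rounds uses every article for testing once.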
     RAM: 4 GB (min)
   SOFTWARE REQUIREMENTS:
  Operating system: Windows 8
                                         CHAPTER-4
                                     ALGORITHMS
         4.3 Evaluation Metrics: To evaluate the performance of algorithms for the fake news
  detection problem, various evaluation metrics have been used. In this subsection, we review the
  most widely used metrics for fake news detection. Most existing approaches consider the fake
  news problem as a classification problem that predicts whether a news article is fake or not:
• True Positive (TP): when predicted fake news pieces are actually annotated as fake news;
• True Negative (TN): when predicted true news pieces are actually annotated as true news;
• False Negative (FN): when predicted true news pieces are actually annotated as fake news;
• False Positive (FP): when predicted fake news pieces are actually annotated as true news.
          Accuracy, precision, recall and F1, which are computed from these four counts, are
  commonly used in the machine learning community and enable us to evaluate the performance of a
  classifier from different perspectives. Specifically, accuracy measures the overall agreement
  between predicted and annotated labels, i.e. the fraction of all news pieces that are classified
  correctly. Precision measures the fraction of all detected fake news that is annotated as fake
  news, addressing the important problem of identifying which news is fake. However, because fake
  news datasets are often skewed, a high precision can be easily achieved by making fewer positive
  predictions. Thus, recall is used to measure the sensitivity, or the fraction of annotated fake
  news articles that are predicted to be fake news. F1 combines precision and recall (it is their
  harmonic mean) and so provides an overall measure of prediction performance for fake news
  detection. Note that for Precision, Recall, F1, and Accuracy, the higher the value, the better
  the performance.
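  The four metrics can be written down directly in terms of TP, TN, FP and FN. The following is a
  minimal sketch, assuming "positive" means "fake" as in the definitions above; the function name
  and the example counts are hypothetical and are not taken from the project's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from the four confusion counts.

    "Positive" means "fake", matching the TP/TN/FP/FN definitions in the text.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical skewed test set: 20 fake articles and 80 true articles.
scores = classification_metrics(tp=8, tn=78, fp=2, fn=12)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

  In this made-up example the classifier makes few positive predictions, so precision is high while
  recall is low, which is the situation on skewed data that the paragraph above warns about.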
CHAPTER-5
METHODOLOGY
5.1 Sentence-Level Baselines
         I ran the baselines described in earlier work, namely multi-class classification via
  logistic regression and support vector machines. The features used were n-grams and TF-IDF.
  N-grams are consecutive groups of words, up to size “n”. For example, bi-grams are pairs of words
  seen next to each other. Features for a sentence or phrase are created from n-grams by building a
  vector whose length is that of the new “vocabulary set”, i.e. it has a slot for each unique
  n-gram, which receives a 1 or 0 depending on whether or not that n-gram is present in the sentence
  or phrase in question. TF-IDF stands for term frequency-inverse document frequency. It is a
  statistical measure used to evaluate how important a word is to a document in a collection or
  corpus. As a feature, TF-IDF can be used for stop-word filtering, i.e. discounting the value of
  words like “and”, “the”, etc., whose counts likely have no effect on the classification of the
  text. An alternative approach is removing stop words (as defined in various packages, such as
  Python's NLTK). The results for this preliminary evaluation are found in Table 6.1.
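  To make the two feature types concrete, here is a small sketch using only the Python standard
  library. The function names and the two example sentences are hypothetical, and the TF-IDF
  weighting shown (relative term frequency times the log of inverse document frequency) is one
  common variant; library implementations such as scikit-learn apply additional smoothing and
  normalisation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def binary_ngram_features(sentences, max_n=2):
    """One 0/1 vector per sentence over the vocabulary of n-grams up to size max_n."""
    grams_per_sentence = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        grams = set()
        for n in range(1, max_n + 1):
            grams.update(ngrams(tokens, n))
        grams_per_sentence.append(grams)
    vocab = sorted(set().union(*grams_per_sentence))
    vectors = [[1 if g in grams else 0 for g in vocab]
               for grams in grams_per_sentence]
    return vocab, vectors


def tf_idf(sentences):
    """TF-IDF per word: (count / sentence length) * log(N / document frequency)."""
    docs = [Counter(s.lower().split()) for s in sentences]
    doc_freq = Counter(word for doc in docs for word in doc)
    n_docs = len(docs)
    return [
        {w: (c / sum(doc.values())) * math.log(n_docs / doc_freq[w])
         for w, c in doc.items()}
        for doc in docs
    ]


sentences = ["the senator wants a new tax", "the senator states the tax is new"]
vocab, vectors = binary_ngram_features(sentences)
weights = tf_idf(sentences)
print(len(vocab), vectors[0])
print(weights[0])
```

  A word such as “the”, which occurs in every sentence, receives a TF-IDF weight of zero under this
  variant, which is the stop-word discounting effect described above.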
         Additionally, we explored some of the characteristic n-grams that may influence Logistic
  Regression and other classifiers. In calculating the most frequent n-grams for “pants-fire”
  phrases and for “true” phrases, we found that the word “wants” appears more frequently in
  “pants-fire” (i.e. fake news) phrases and the word “states” appears more frequently in “true”
  (i.e. real news) phrases. Intuitively this makes sense, because it is easier to lie about what a
  politician wants than about what he or she has stated, since the former is more difficult to
  confirm. This observation motivates the experiments in Section 6.2, which aim to find a set of
  similarly intuitive patterns in the body texts of fake news and real news articles.
5.2 Document-Level
         Deep neural networks have shown promising results in NLP for other classification tasks.
  CNNs are well suited for picking up multiple patterns, but sentences do not provide enough data
  for this to be useful. Accordingly, a CNN baseline modelled on an earlier CNN for NLP did not show
  a large improvement in accuracy on this task using the Liar dataset. This is due to the lack of
  context provided in sentences. Not surprisingly, the same CNN performed much better on the full
  body text datasets we created.
         The aim of this project was to decide if and how machine learning could be useful in
  detecting patterns characteristic of real and fake news articles. In accordance with this purpose,
  we did not attempt to build deeper and better neural nets in order to improve performance, which
  was already much higher than expected. Instead, we took steps to analyze the most basic neural
  net. We wanted to understand which patterns it was learning that allowed it to classify fake and
  real news with such high accuracy.
        If a human were to take on the task of picking out phrases that indicate fake or real news,
  they might follow published guidelines for spotting fake news. Such guidelines often encourage
  readers to look for evidence supporting claims, because fake news claims are often not backed by
  evidence. Likewise, these guidelines encourage people to read the full story, looking for details
  that seem “far-fetched.” Figures 5.1 and 5.2 show examples of the phrases a human might pick up on
  to decide if an article is fake or real news. We were curious to see if a neural net might pick up
  on similar patterns.
Figure 5.1: Which trigrams might a human find indicative of real news?
Figure 5.2: Which trigrams might a human find indicative of fake news?
  The best way to do this was to simplify the network so that it had only one filter size. The
  network we started from was tuned to learn filter sizes 3, 4, and 5. With this intricacy, the
  model was able to learn overlapping segments. For example, the 4-gram “Donald Trumps presidential
  election” could be learned in addition to the trigrams “Donald Trumps presidential” and “Trumps
  presidential election”. To avoid this overlapping, we simplified the network to only look at
  filter size 3, i.e. trigrams. We found that this did not cause a significant drop in accuracy;
  there was less than one half percent decrease in accuracy from the model with filter sizes =
  [3,4,5] to the model with filter sizes = [3].
        We limited each article to 1000 words because fewer than ten percent of the articles
  exceeded this limit, and we found that when an article was longer than 1000 words it usually
  contained excess information at the end that was not relevant to the article itself. For example,
  lengthy ads were sometimes found at the end of articles, causing them to go over 1000 words. There
  were no noticeable drops in accuracy across trials when we restricted the document length to 1000
  words.
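  The overlap between filter sizes, and the number of trigram positions in a truncated article, can
  be illustrated with a few lines of Python. This is only a sketch of the windowing idea; the helper
  names are hypothetical and the 1200-word article is synthetic.

```python
def truncate(words, max_words=1000):
    """Keep only the first max_words words of an article."""
    return words[:max_words]


def windows(words, size):
    """All runs of `size` consecutive words, i.e. what a filter of that size sees."""
    return [tuple(words[i:i + size]) for i in range(len(words) - size + 1)]


# A filter of size 4 covers the same words as two overlapping trigram filters.
four_gram = ["donald", "trumps", "presidential", "election"]
print(windows(four_gram, 3))

# A synthetic 1200-word article, cut to 1000 words, yields 998 trigram positions.
article = ["w%d" % i for i in range(1200)]
kept = truncate(article)
print(len(kept), len(windows(kept, 3)))  # 1000 998
```

  The figure of 998 trigram positions per 1000-word text is the one that reappears in the
  convolutional layer discussed next.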
        In order to obtain the trigrams that were most important in the classification decision, we
  essentially had to back-propagate from the output layer to the raw data (i.e. the actual body text
  being classified), as seen in Figures 5.3, 5.4, 5.5 and 5.6. For any text being evaluated by the
  CNN, we can find the trigrams that were “most fake” and “most real” by looking at weight_i ×
  activation_i for each individual neuron i when that text was evaluated. I will explain the process
  for finding the most real trigrams; the same process can be used to find the most fake trigrams.
  The only difference is which of the two columns in each layer you choose to look at.
        The first step in this process is looking at the max-pool layer, where you will find a
  down-sampled version of the convolutional layer (see Figure 5.4). Each of the 128 values is
  selected as the max of 998 values in the previous layer. Due to the dropout probability, we expect
  that a different pattern will cause the highest activation for each of these neurons. As such, the
  max-pool layer represents the value of the trigram that was closest to this pattern and made the
  neuron's activation the highest. Each value in the max-pool layer is representative of neuron i
  and gives weight_i × activation_i for that text. Therefore, we can select the neurons with the
  highest (most positive) weight_i × activation_i to ultimately find the “most real” trigrams, or we
  can select the neurons with the lowest (most negative) weight_i × activation_i to ultimately find
  the “least real” trigrams. Each selected neuron corresponds to one dimension in the output of the
  convolutional layer with the ReLU function applied.
        Now, we have 998 values to look at. One of these values was chosen to be the max-pooled
  value, so we must look at all of them and find the match. Once we find the matching number, we
  have its index. Its index is representative of the trigram index in the original text. So if the
  index is 0, we look at the first trigram (words at indices 0, 1 and 2), and if the index is 1, we
  look at the second trigram (words at indices 1, 2 and 3).
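  The three steps can be sketched in plain Python as follows. This is an illustrative
  reconstruction, not the project's code: the function name, the tiny 2-filter network, the
  activation values, and the convention that output column 1 means “real” are all assumptions made
  for the example.

```python
def most_influential_trigrams(words, conv, out_weights, column,
                              top_k=1, most_positive=True):
    """Trace the strongest output contributions back to trigrams.

    conv[p][f]            ReLU activation of filter f at trigram position p
    out_weights[f][column] weight from pooled neuron f to the chosen output column
    """
    results = []
    for f in range(len(conv[0])):
        activations = [row[f] for row in conv]
        pooled = max(activations)                 # Step 1: max-pool value
        position = activations.index(pooled)      # Step 2: index of that value
        score = out_weights[f][column] * pooled   # weight_i * activation_i
        trigram = tuple(words[position:position + 3])  # Step 3: same index in text
        results.append((score, f, trigram))
    results.sort(key=lambda r: r[0], reverse=most_positive)
    return results[:top_k]


# Toy text with 3 trigram positions and a toy network with 2 filters.
words = ["officials", "said", "on", "tuesday", "that"]
conv = [[0.1, 0.0],
        [0.9, 0.2],
        [0.3, 0.7]]
out_weights = [[-0.5, 0.5],   # filter 0 -> (fake, real)
               [0.8, -0.8]]   # filter 1 -> (fake, real)
REAL = 1  # assumed column order

most_real = most_influential_trigrams(words, conv, out_weights, REAL)
least_real = most_influential_trigrams(words, conv, out_weights, REAL,
                                       most_positive=False)
print(most_real)
print(least_real)
```

  In the real network the same loop would run over 128 filters and 998 positions per text; choosing
  the other output column gives the “most fake” trigrams instead.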
  Figure 5.3: The output layer of the CNN where the higher value indicates the final classification
  of the text
Figure 5.4: Step 1: The max-pool values give weight_i × activation_i for each neuron i.
Figure 5.5: Step 2: Find the index of the max pooled value from Step 1 in the convolutional layer.
  Figure 5.6: Step 3: The index in the convolutional layer found in Step 2 represents which of the
  998 trigrams caused the max-pooled value from Step 1. Use that same index to find the
  corresponding trigram.
5.3 Topic Dependency
         As we suspected from the makeup of the datasets (see 6.7 for a general overview of both),
  there is a significant difference in the subjects being written about in fake news and real news,
  even in the same time range with the same current events going on. More specifically, the
  concentration of articles that involve “Hillary”, “Wikileaks”, and “republican” is higher in fake
  news than it is in real news. This is not to say that these words did not appear in real news, but
  they were not among the “most frequent” words there. Additionally, words like “football” and
  “love” appear very frequently in the real news dataset, but these are topics that would rarely, if
  ever, be written about in fake news. The “hot topics” of fake news present another issue in this
  task. We do not want a model that simply chooses a classification based on the probability that a
  fake or real news article would be written on that topic, just as we would never tell a person
  that every article written about Hillary is fake news or every article written about love is real
  news.
        We accounted for these differences in the dataset by separating our training and test sets
  on the presence or absence of certain words. We tried this for a number of topics that were
  present in both fake news and real news but had different proportions in the two categories. The
  words we chose were “Trump”, “election”, “war”, and “email.”
        To create a model that was not biased by the presence of one of these words, we extracted
  all body texts which did not contain that word and used this set as the training set. Then, we
  used the remaining body texts, which did contain the target word, as the test set. The accuracy of
  the model on the test set represents transfer learning in the sense that the model was trained on
  articles about topics other than the target word and had to use what it learned to classify texts
  about the target word. The accuracies were still quite high, as demonstrated in section 5. This
  shows that the model was learning patterns of language other than those specific words. This could
  mean that it learned similar words because of the word embeddings, or it could mean that it
  learned completely different words to “pay attention” to, or both.
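  The split described above can be sketched as follows. The function name and the four example
  articles are hypothetical, and matching the target as a case-insensitive whole word is an
  assumption; the report does not state how the presence of a word was tested.

```python
import re


def split_on_word(articles, target):
    """Split (body_text, label) pairs into train and test sets by a target word.

    Texts that do NOT contain the target word form the training set;
    texts that DO contain it form the test set.
    """
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    train, test = [], []
    for text, label in articles:
        (test if pattern.search(text) else train).append((text, label))
    return train, test


# Hypothetical miniature dataset.
articles = [
    ("Trump spoke at the rally", "FAKE"),
    ("The election results were certified", "REAL"),
    ("Local team wins the football final", "REAL"),
    ("Leaked email shakes the election", "FAKE"),
]
train, test = split_on_word(articles, "election")
print(len(train), len(test))
```

  Repeating the split for each of the chosen words gives one held-out topic per run, so the model is
  always evaluated on a topic it never saw named during training.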
5.4 Cleaning
         Pre-processing data is a normal first step before training and evaluating a neural
  network. Machine learning algorithms are only as good as the data you are feeding them. It is
  crucial that the data is formatted properly and that meaningful features are included, so that the
  data is consistent enough to give the best possible results. For computer vision machine learning
  algorithms, pre-processing the data involves many steps, including normalizing image inputs and
  dimensionality reduction. The goal of these steps is to take away some of the unimportant
  distinguishing features between different images. Features like the darkness or brightness of an
  image are examples of such unimportant distinctions.
          The task of pre-processing data is often an iterative task rather than a linear one. This
  was the case in this project, where we used a new and not yet standardized dataset. As we found
  certain unmeaningful features that the neural net was learning, we learned what more we needed to
  pre-process from the data.
          Two observations that led us to more pre-processing were the presence of run-on words and
  proper nouns in the most important trigrams for classification. An example of a run-on word that
  we saw frequently in the “most fake” trigram category was “NotMyPresident”, which came from a
  trending hashtag on Twitter. There were also decisive trigrams that were simply proper nouns, like
  “Donald J Trump.” Proper nouns could not possibly be helpful in a meaningful way to a machine
  learning algorithm trying to detect language patterns indicative of real or fake news. We want our
  algorithm to be agnostic to the subject material and to make a decision based on the types of
  words used to describe whatever the subject is. Another algorithm may aim to fact check statements
  in news articles. In that situation, it would be important to maintain the proper nouns/subjects,
  because changing the proper noun in the sentence “Donald J. Trump is our current president” to
  “Hillary Clinton is our current president” changes the classification from true fact to false
  fact. However, our purpose is not fact checking but rather language pattern checking, so removal
  of proper nouns should help point the machine learning algorithms in the right direction as far as
  finding meaningful features.
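  As an illustration of these two cleaning steps, the sketch below splits CamelCase run-on words
  and drops tokens found in a set of proper nouns. Both helper names are hypothetical, and the
  report does not say how proper nouns were identified; here the set is simply assumed to be
  supplied from elsewhere, for example by a part-of-speech tagger or a name list.

```python
import re


def split_run_on(token):
    """Split a run-on token such as a hashtag into its component words."""
    return re.findall(r"[A-Z][a-z]+|[a-z]+|[A-Z]+(?![a-z])|\d+", token)


def clean_tokens(tokens, proper_nouns):
    """Split run-on words, lower-case everything, and drop listed proper nouns.

    proper_nouns is an assumed, externally supplied set of lower-case names.
    """
    cleaned = []
    for token in tokens:
        for word in split_run_on(token):
            if word.lower() not in proper_nouns:
                cleaned.append(word.lower())
    return cleaned


tokens = ["#NotMyPresident", "Donald", "Trump", "spoke", "today"]
print(split_run_on("NotMyPresident"))
print(clean_tokens(tokens, proper_nouns={"donald", "trump"}))
```

  After this step a trigram can no longer consist solely of a person's name, so the classifier has
  to rely on the surrounding wording instead.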
         Another observation was that the two real news sources had some specific patterns that were
  easily learnable by the machine learning algorithms. This was more of an issue with the real news
  sources than with the fake news sources, because there were many more fake news sources than real
  news sources. More specifically, there were 244 fake news sources and only 128 neurons, so the
  algorithm could not simply attune one neuron to each fake news source's patterns. There were only
  two real news sources, however. Therefore, the algorithm was able to pick up easily on the
  presence or absence of their patterns and use that, without much help from other words or phrases,
  to classify the data.
        There were a few separate steps in removing patterns from the real news sources. The New
  York Times articles of a particularly common section often started off with “Good morning. (or
  evening) Here’s what you need to know:” This, along with other repeated sentences, was always in
  italics. To account for the lack of consistency in the exact sentences that were repeated, we had
  to scrape the data again from the URLs and remove anything that was originally in italics. Another
  repeated pattern in the New York Times articles was parenthetical questions with links to sign up
  for emails, for example “Want to get California Today by email? Sign up.” Another pattern was in
  The Guardian, where articles almost always ended with “Share on Facebook Share on Twitter Share
  via Email Share on LinkedIn Share on Pinterest Share on Google+ Share on WhatsApp Share on
  Messenger Reuse this content”, which is the result of links/buttons at the bottom of the webpage
  to share the article. When removing the non-English words, we were left with “on on on on on this
  content”, which was enough of a pattern to force the model to learn classification almost solely
  based on its presence.
        Note that this was a particularly strong pattern because it was consistent throughout the
  Guardian articles from all sections of the Guardian. Also, the majority of articles in our real
  news set are from the Guardian.
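  A sketch of how two of these source-specific patterns could be stripped from already scraped text
  is given below. It uses the footer and the sign-up example exactly as quoted above; the function
  name and the regular expression are assumptions, and the italics removal is not shown because it
  needs the original HTML.

```python
import re

# Share-button footer quoted in the text above.
GUARDIAN_FOOTER = (
    "Share on Facebook Share on Twitter Share via Email Share on LinkedIn "
    "Share on Pinterest Share on Google+ Share on WhatsApp "
    "Share on Messenger Reuse this content"
)

# Parenthetical newsletter prompts such as
# "(Want to get California Today by email? Sign up.)" (pattern assumed).
SIGN_UP = re.compile(r"\(Want to get [^()]*? by email\? Sign up\.?\)")


def strip_source_patterns(text):
    """Remove the share footer and sign-up prompts, then normalise whitespace."""
    text = text.strip()
    if text.endswith(GUARDIAN_FOOTER):
        text = text[: -len(GUARDIAN_FOOTER)]
    text = SIGN_UP.sub(" ", text)
    return " ".join(text.split())


guardian = "The vote passed on Monday. " + GUARDIAN_FOOTER
nyt = "Rain is expected. (Want to get California Today by email? Sign up.) More soon."
print(strip_source_patterns(guardian))
print(strip_source_patterns(nyt))
```

  Removing the footer before any word filtering avoids leaving behind residue such as “on on on on
  on this content”.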
        Although the accuracy was high in the classification task even after extensive
  pre-processing of the data, we wanted a way to evaluate more qualitatively how and what the neural
  net was learning in order to classify. Understanding and visualizing the way a CNN encodes
  information is an ongoing research question. It is a far more challenging problem when there is
  more than one convolutional layer, which is why we kept our neural net shallow. For CNNs with one
  convolutional layer, earlier work shows a way to visualize any single CNN neuron as a filter in
  the first layer, in terms of the image space. We were able to use a similar method to “visualize”
  the CNN neurons as filters in the first (and only) layer in terms of text space.
        Instead of finding the location in each image of the window that caused each neuron to fire
  the most, we find the location in the pre-processed text of the trigram (or length-3 sequence of
  words) that caused each neuron to fire the most. Just as the authors of the image-based method
  were able to identify patterns of colors/lines in images that caused firing, we were able to
  identify textual patterns that caused firing. Textual patterns are more difficult to visualize
  than image space patterns. While similar but non-identical RGB pixel values look similar, two
  words that are mathematically “similar” in their embedding but non-identical do not look similar.
  They do, however, have similar meanings.
        In order to get a general grasp of the meaning of the words/trigrams that each neuron was
  firing most highly for, we followed steps similar to those described in Section 6.2.1. However,
  instead of finding those neurons that had the highest/lowest weight × activation, we looked at
  each neuron and at which trigram in each body text resulted in the pooled value for that neuron.
  Then, we accumulated all of the trigrams for each neuron and summarized them by counting the
  instances of each word in the trigrams. Our algorithm reported the words with the highest counts,
  excluding stop words as defined by NLTK (i.e. words like “the”, “a”, “by”, “it”, which are not
  meaningful in this circumstance).
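  The accumulation and counting step might look like the sketch below. The function name and the
  example trigrams are hypothetical, and the small STOP_WORDS set is only a stand-in for the NLTK
  stop-word list so that the example runs without extra downloads.

```python
from collections import Counter, defaultdict

# Stand-in for the NLTK English stop-word list (assumed subset).
STOP_WORDS = {"the", "a", "by", "it", "of", "to", "and", "in"}


def summarize_neurons(pooled_trigrams, top_n=3):
    """Count words in the trigrams that produced each neuron's pooled value.

    pooled_trigrams: iterable of (neuron_index, trigram) pairs, one pair per
    neuron per body text.
    """
    counts = defaultdict(Counter)
    for neuron, trigram in pooled_trigrams:
        counts[neuron].update(
            word.lower() for word in trigram if word.lower() not in STOP_WORDS
        )
    return {neuron: c.most_common(top_n) for neuron, c in counts.items()}


# Hypothetical pooled trigrams for two neurons over two body texts.
pairs = [
    (0, ("according", "to", "officials")),
    (0, ("officials", "said", "the")),
    (1, ("you", "wont", "believe")),
    (1, ("wont", "believe", "it")),
]
summary = summarize_neurons(pairs)
print(summary)
```

  Reading off the top words per neuron gives a rough textual counterpart to visualizing a filter in
  image space.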
                                         CHAPTER-6
                                       SAMPLE CODE
6.0 CODE:
# Include libraries
import csv
import itertools
import re

import matplotlib.pyplot as plt
import nltk
import numpy as np
import pandas as pd
import sklearn
from sklearn import metrics, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

print(pd.__version__)

# Load the dataset and use the article-number column as the index
df = pd.read_csv("fake_or_real_news.csv")
print(df.shape)
df = df.set_index("Unnamed: 0")
print(df)

# Separate the labels from the features
y = df.label
df = df.drop("label", axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    df['text'], y, test_size=0.33, random_state=53)


def save_rows(path, lines):
    # Write one value per CSV row. Only writer.writerow([line]) survives in the
    # listing; the file names used in the four calls below are assumed.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for line in lines:
            writer.writerow([line])


save_rows('X_train.csv', X_train)
save_rows('X_test.csv', X_test)
save_rows('y_train.csv', y_train)
save_rows('y_test.csv', y_test)

# Bag-of-words counts
count_vectorizer = CountVectorizer(stop_words='english')
count_train = count_vectorizer.fit_transform(X_train)
count_test = count_vectorizer.transform(X_test)

# TF-IDF weights (constructor arguments assumed)
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
tfidf_test = tfidf_vectorizer.transform(X_test)

# N-gram features (vectorizer type and ngram_range assumed)
n_vect = CountVectorizer(stop_words='english', ngram_range=(1, 2))
n_train = n_vect.fit_transform(X_train)
n_test = n_vect.transform(X_test)


def plot_confusion_matrix(cm, classes, normalize=False,
                          title='Confusion matrix', cmap=plt.cm.Blues):
    # Draw the confusion matrix as a coloured grid with the counts written in
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print('Normalized confusion matrix')
    else:
        print('Confusion matrix, without normalization')
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]),
                                  range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()


def evaluate(clf, title, xtrain, ytrain, xtest, ytest, ac):
    # Fit one classifier, report accuracy and confusion matrix, record the score
    clf.fit(xtrain, ytrain)
    pred = clf.predict(xtest)
    score = metrics.accuracy_score(ytest, pred)
    print("accuracy: %0.3f" % score)
    cm = metrics.confusion_matrix(ytest, pred, labels=['FAKE', 'REAL'])
    plot_confusion_matrix(cm, classes=['FAKE', 'REAL'], title=title)
    print(cm)
    ac.append(score)


def NaiveBayes(xtrain, ytrain, xtest, ytest, ac):
    # MultinomialNB is assumed; the constructor is missing from the listing
    evaluate(MultinomialNB(), 'Confusion matrix Naive Bayes',
             xtrain, ytrain, xtest, ytest, ac)


def Logreg(xtrain, ytrain, xtest, ytest, ac):
    evaluate(LogisticRegression(C=9), 'Confusion matrix Logistic Regression',
             xtrain, ytrain, xtest, ytest, ac)


def RForest(xtrain, ytrain, xtest, ytest, ac):
    clf1 = RandomForestClassifier(max_depth=50, random_state=0, n_estimators=25)
    evaluate(clf1, 'Confusion matrix Random Forest',
             xtrain, ytrain, xtest, ytest, ac)


def SVM(xtrain, ytrain, xtest, ytest, ac):
    # svm.SVC with default settings is assumed; the constructor is missing
    clf3 = svm.SVC()
    evaluate(clf3, 'Confusion matrix SVM', xtrain, ytrain, xtest, ytest, ac)


def process(xtrain, ytrain, xtest, ytest, ac):
    # Run all four classifiers in the same order as the names in `al`
    NaiveBayes(xtrain, ytrain, xtest, ytest, ac)
    RForest(xtrain, ytrain, xtest, ytest, ac)
    SVM(xtrain, ytrain, xtest, ytest, ac)
    Logreg(xtrain, ytrain, xtest, ytest, ac)


explode = (0.1, 0, 0, 0, 0)  # kept from the listing; not used by the plots below
al = ["NaiveBayes", "Random Forest", "SVM", "LogisticRegression"]
cac = []
tac = []
nac = []


def save_and_plot(scores, stem):
    # Print and store the accuracies, then plot them (bar chart assumed)
    result2 = open(stem + '.csv', 'w')
    result2.write("Algorithm,Accuracy" + "\n")
    for i in range(0, len(scores)):
        print(al[i] + "," + str(scores[i]))
        result2.write(al[i] + "," + str(scores[i]) + "\n")
    result2.close()
    fig = plt.figure(0)
    df = pd.read_csv(stem + '.csv')
    acc = df["Accuracy"]
    alc = df["Algorithm"]
    plt.bar(alc, acc)
    plt.xlabel('Algorithm')
    plt.ylabel('Accuracy')
    fig.savefig(stem + '.png')
    plt.show()


# Count features
process(count_train, y_train, count_test, y_test, cac)
print(cac)
save_and_plot(cac, 'CountAccuracy')

# TF-IDF features
process(tfidf_train, y_train, tfidf_test, y_test, tac)
print(tac)
save_and_plot(tac, 'TfidfAccuracy')

# N-gram features
process(n_train, y_train, n_test, y_test, nac)
print(nac)
save_and_plot(nac, 'NgramAccuracy')
CHAPTER-7
  7.1 OUTPUT:
  >>>
1.0.1
Unnamed: 0
875 The Battle of New York: Why This Primary Matters ... REAL
accuracy: 0.892
[[879 129]
 [ 96 987]]
accuracy: 0.857
[[892 116]
 [184 899]]
accuracy: 0.521
Confusion matrix, without normalization
[[1006    2]
 [ 999   84]]
accuracy: 0.902
[[939  69]
 [135 948]]
NaiveBayes,0.8923959827833573
Random Forest,0.8565279770444764
SVM,0.5212816834050693
LogisticRegression,0.9024390243902439
accuracy: 0.903
[[891 117]
 [ 85 998]]
accuracy: 0.868
[[888 120]
 [157 926]]
accuracy: 0.937
[[ 959   49]
 [  82 1001]]
accuracy: 0.931
[[965  43]
 [101 982]]
NaiveBayes,0.9033955045432808
Random Forest,0.8675274988043998
SVM,0.937350549976088
LogisticRegression,0.9311334289813487
accuracy: 0.912
[[ 904  104]
 [  80 1003]]
accuracy: 0.864
[[910  98]
 [186 897]]
accuracy: 0.519
Confusion matrix, without normalization
[[1008    0]
 [1006   77]]
accuracy: 0.917
[[949  59]
 [114 969]]
NaiveBayes,0.9120038259206121
Random Forest,0.8641798182687709
SVM,0.5188904830224773
LogisticRegression,0.9172644667623147
DETERMINING OUTPUT:
  Confusion matrix: A confusion matrix is a table that is often used to describe the performance
  of a classification model (or “classifier”) on a set of test data for which the true values are
  known. It allows the visualization of the performance of an algorithm.
Classification Rate/Accuracy:
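  For reference, the classification rate can be written in terms of the confusion matrix entries
  as below. The first line is the standard definition; the second is a worked check using the first
  confusion matrix printed in Section 7.1 (entries 879, 129, 96 and 987), whose diagonal holds the
  correctly classified articles.

```latex
\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
\[
\text{Accuracy} = \frac{879 + 987}{879 + 129 + 96 + 987}
                = \frac{1866}{2091} \approx 0.892
\]
```

  This agrees with the value “accuracy: 0.892” reported alongside that matrix.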
Fig 7.2.1 Confusion matrix of Naïve Bayes
Fig 7.2.2 Confusion matrix of Random forest
Fig 7.2.3 Confusion matrix of SVM
Fig 7.2.4 Confusion matrix of logistic regression
Fig 7.4.1 Confusion matrix of Naïve Bayes
Fig 7.4.2 Confusion matrix of Random forest
Fig 7.4.3 Confusion matrix of SVM
Fig 7.4.4 Confusion matrix of logistic regression
Fig 7.5.1 Confusion matrix of Naïve Bayes
Fig 7.5.2 Confusion matrix of Random forest
Fig 7.5.3 Confusion matrix of SVM
Fig 7.5.4 Confusion matrix of logistic regression
                                       CHAPTER-8
                                      CONCLUSION
8. CONCLUSION:
         In the 21st century, the majority of tasks are done online. Newspapers, which were earlier
  preferred as hard copies, are now being replaced by applications like Facebook and Twitter and by
  news articles read online. The growing problem of fake news only makes things more complicated
  and tends to change or hamper the opinion and attitude of people towards the use of digital
  technology. When a person is deceived by fake news, two things can happen. First, people start
  believing that their perceptions about a particular topic are true as assumed. Second, even if a
  news article is available that contradicts a supposedly fake one, people believe the words that
  support their own thinking without taking the facts involved into account. Thus, companies such
  as Google and Facebook have taken steps to curb the phenomenon.
                                     CHAPTER-9
                                    REFERENCES
  REFERENCES:
      1. Conroy, Niall & Rubin, Victoria & Chen, Yimin. (2015). Automatic Deception Detection:
         Methods for Finding Fake News. USA.
      2. Ball, L. & Elworthy, J. J Market Anal (2014) 2: 187. https://doi.org/10.1057/jma.2014.15
      3. Lu TC., Yu T., Chen SH. (2018) Information Manipulation and Web Credibility. In:
         Bucciarelli E., Chen SH., Corchado J. (eds) Decision Economics: In the Tradition of
         Herbert A. Simon's Heritage. DCAI 2017. Advances in Intelligent Systems and Computing,
         vol 618. Springer, Cham.
      4. Rubin, Victoria & Conroy, Niall & Chen, Yimin & Cornwell, Sarah. (2016). Fake News or
         Truth? Using Satirical Cues to Detect Potentially Misleading News. 10.18653/v1/W160802.
      5. Wang, W.Y.: Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection.
      6. Rubin, Victoria L., et al.: Fake news or truth? Using satirical cues to detect potentially
         misleading news. In: Proceedings of NAACL-HLT (2016).
      7. Shivam B. Parikh and Pradeep K. Atrey, “Media-Rich Fake News Detection: A Survey”.
      8. Hunt Allcott and Matthew Gentzkow. Social media and fake news in the 2016 election.
         Technical report, National Bureau of Economic Research, 2017.