FAKE NEWS DETECTOR
Muhammad Hassan Ur Rehman, Muhammad Huzaifa, Sufyan A. Siddiqui, Taber Bin Zameer
Abstract--Fake news is propaganda or manipulated news that is spread across the internet with the objective of damaging a person, agency or organization. Most fake news detection systems rely on the linguistic features of the news. The problem has drawn interest from researchers around the globe, who work on deception detection mechanisms to address it. The goal is to introduce a mechanism that is automatic, robust, trustworthy, efficient and accurate. We take a dataset that contains both real and fake news, apply machine learning and deep learning techniques, and build a classifier that decides whether a news item is fake or real. We use three different models: Naive Bayes, Passive Aggressive Classifier and LSTM. The best performing model was LSTM, which gave the most accurate predictions.

                 I. INTRODUCTION

The term "fake news" is short and may seem vague, but in today’s world we are surrounded by thousands of lies, from which many conflicts and rumors arise. This is an area where we believe we must proceed very carefully, because identifying the truth is complicated. We have developed a project that can find out whether a news item is true or false.

Fake news is becoming a threat to society. It is generated to attract viewers and for advertising purposes. We have developed a fake news detector to protect people from lies, because lies spread hate around the globe. Although the problem of fake news is not new, detecting it is considered a difficult task, because people tend to believe appealing but misleading information and there is little control over the spread of deceptive content.

                 II. LITERATURE REVIEW

The author of [1] used a dataset from Kaggle to evaluate the accuracy of fake news detection. He tested various models, including Naïve Bayes, Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), K-nearest neighbors and Random Forest. After testing these models, the CNN model was found to be the most accurate, with 98.3% accuracy.

The authors of [2] incorporated sentiment analysis to further improve detection accuracy on existing datasets. A merged dataset was formed from 3 existing datasets in order to test the accuracy. It was found that tf-idf was the best text processing technique. Furthermore, the accuracy of the Naïve Bayes and Random Forest models was compared, and Random Forest was found to be more accurate.

The authors of [3] prepared a Fact Database containing articles on various topics. The input was checked against the Fact Database and the relevant article was picked. They then used Bidirectional Multi-Perspective Matching (BiMPM) to find whether the given proposition was in the context of the article. The result was based on this calculation.

The authors of [4] used a dataset from BuzzFeed News which contained Facebook posts from three major news sources. The posts were labelled by BuzzFeed News as true, false, mostly true, mostly false, or a mixture of true and false. A Naïve Bayes classifier was then trained and tested on the dataset. Using this approach, they obtained an accuracy of about 74%.

                 III. METHODOLOGY

a) Platform and Technologies:

    •    Jupyter Notebook
    •    Flask
    •    Docker
b) Dataset:
To detect fake or real news, we first need labelled real and fake data on which to build the model that will classify the news. We use a pre-built dataset of news labelled fake/real from Kaggle.

c) Preprocessing:
In this phase, we preprocess the data by removing extra spaces, special characters and punctuation, and by converting the text to lowercase, so that it is simple enough to be classified.

d) Model/Training:
A model is then selected for the classification task. Here we use two models, “Naive Bayes” and “Passive Aggressive Classifier”.

e) Naïve Bayes:
This is one of the simplest approaches to classification. It is probabilistic and assumes that all features are conditionally independent given the class label.

[Figure: Confusion Matrix using tf-idf vectorizer]
[Figure: Confusion Matrix using count vectorizer]

f) Passive Aggressive Classifier:
The Passive Aggressive algorithm is an online algorithm, ideal for classifying massive streams of data (e.g. Twitter). It is easy to implement and very fast. It works by taking an example, learning from it and then throwing it away.

[Figure: Confusion Matrix using tf-idf vectorizer]
[Figure: Confusion Matrix using count vectorizer]

g) Testing:
Finally comes the testing process, in which we test fake/real news. Here we take user input, and the model predicts whether the news is fake or real based on the words used. The process is also demonstrated in the diagram below.

[Figure: Data flow diagram]
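The Methodology steps (preprocessing, Naïve Bayes, the Passive Aggressive update, and testing on new input) can be sketched in plain Python. This is a minimal from-scratch illustration under stated assumptions, not the project's own code: the names `preprocess`, `NaiveBayes`, `PassiveAggressive` and `learn_one`, the four toy headlines, the raw word-count features, and the PA-I step with C = 1.0 are all hypothetical choices made for the example. The actual system is trained on the Kaggle dataset and compares tf-idf and count vectorizers.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase the text, turn punctuation into spaces, split on whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


class NaiveBayes:
    """Multinomial Naive Bayes on word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)  # number of documents per class
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        tokens = [t for t in preprocess(text) if t in self.vocab]
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # Independence assumption: log prior + sum of per-word log likelihoods.
            scores[label] = math.log(self.doc_counts[label] / n_docs) + sum(
                math.log((counts[t] + 1) / denom) for t in tokens
            )
        return max(scores, key=scores.get)


class PassiveAggressive:
    """Online binary classifier (PA-I update). Labels: +1 = fake, -1 = real."""

    def __init__(self, C=1.0):
        self.C = C          # cap on the step size (assumed value)
        self.weights = {}   # one weight per word

    def score(self, text):
        features = Counter(preprocess(text))
        return sum(self.weights.get(w, 0.0) * n for w, n in features.items())

    def learn_one(self, text, label):
        features = Counter(preprocess(text))
        loss = max(0.0, 1.0 - label * self.score(text))  # hinge loss
        if loss > 0.0 and features:
            # Aggressive: move just far enough to fix this example (capped by C).
            tau = min(self.C, loss / sum(n * n for n in features.values()))
            for w, n in features.items():
                self.weights[w] = self.weights.get(w, 0.0) + tau * label * n
        # Passive: zero loss means no change. The example itself is never stored.

    def predict(self, text):
        return 1 if self.score(text) >= 0 else -1


# Hypothetical toy headlines standing in for the labelled Kaggle data.
train = [
    ("SHOCKING miracle cure doctors hate!!!", "FAKE"),
    ("You won't believe this shocking secret trick", "FAKE"),
    ("Government publishes annual budget report", "REAL"),
    ("City council approves new budget for schools", "REAL"),
]
texts, labels = zip(*train)

nb = NaiveBayes().fit(texts, labels)
pa = PassiveAggressive()
for text, label in train:  # a single pass, one example at a time
    pa.learn_one(text, 1 if label == "FAKE" else -1)

query = "Shocking secret cure"  # stands in for the user's input
print(nb.predict(query), pa.predict(query))
```

Naïve Bayes needs the full word counts before it can predict. The Passive Aggressive model updates its weights after every single example and keeps nothing else, which is why it suits streams of data.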
                   IV. CASE STUDY

The interface and the interaction with the application are as straightforward as possible, without anything fancy going on. The user is simply provided with three textboxes.

a) URL:
This textbox is filled with the URL of the news article that you want to analyze. Users can provide any source of news; it is not limited to news sites.

b) Title:
As the news title plays a vital part in our training dataset, we also require the user to enter the title of the news article.

c) Content:
This is the most important part of the input. It provides the text on which we carry out our classification.

On pressing the Analyze button, the application shows the results.

                   V. CONCLUSION

Projects on fake news detection are rare, even though fake news is a bone of contention for all of us. Our project may not be 100 percent accurate, but it is a small contribution to the world from our side. We faced many problems while making this project, because it requires a complete dataset of real and fake news. Fake news detection remains largely unimplemented in practice: no complete system is available for use on the internet, only small-scale projects of this type. We think it would be a great contribution from our side if we succeed in making a complete practical version of it.

                   ACKNOWLEDGMENT

This project would not have been possible without the contribution of all the members, especially Taber and Hassan (students of NED University), with their assistance and instructions throughout the project.

We are also grateful to Ms. Zainab (lecturer at NED University) for motivating us to work hard on this project for research purposes and for helping us find resources.

                   REFERENCES

[1] R. K. Kaliyar, "Fake News Detection Using A Deep Neural Network," in 2018 4th International Conference on Computing Communication and Automation (ICCCA), 2018.

[2] B. Bhutani, N. Rastogi, P. Sehgal and A. Purwar, "Fake News Detection Using Sentiment Analysis".

[3] K.-h. Kim and C.-s. Jeong, "Fake News Detection System using Article Abstraction".

[4] M. Granik and V. Mesyura, "Fake News Detection Using Naive Bayes Classifier," in IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), 2017.

[5] J. Straub, Gurmeet, N. Snell and T. Traylor, "Classifying Fake News Articles Using Natural Language Processing to Identify In-Article Attribution as a Supervised Learning Estimator," in IEEE 13th International Conference on Semantic Computing (ICSC), 2019.

                   BIOGRAPHY

Muhammad Huzaifa Shuja (SE-093):
Muhammad Huzaifa Shuja is an undergraduate student in the Software Engineering Department of NED University of Engineering. His interests are in electronics, especially Arduino. In this project he made a significant contribution to the frontend of the application.

Sufyan Ahmed Siddiqui (SE-060):
Sufyan Ahmed Siddiqui is an undergraduate student in the Software Engineering Department of NED University of Engineering. His interests are in C++ programming and networking. In this project his most notable work was developing the frontend with Huzaifa and connecting it to the backend through Flask.

Taber Bin Zameer (SE-082):
Taber Bin Zameer is an undergraduate student in the Software Engineering Department of NED University of Engineering. His interests are in machine learning and data science. He worked on training and testing different models.

Muhammad Hassan Ur Rehman (SE-062):
Muhammad Hassan Ur Rehman is an undergraduate student in the Software Engineering Department of NED University of Engineering. He is a Java developer and also has experience working in the field of Artificial Intelligence. In this project he worked with Taber to train and test the models and select the best one for prediction.