
A Project Report

On

FAKE NEWS DETECTION

Submitted in partial fulfillment of the requirement for the award of the degree of

Bachelor of Computer Application


Session 2020 - 2023

Under the Guidance of: Mr. Kanchan Raju

Submitted By: CHHOTU KUMAR
Roll No. 210219016890

Department of Bachelor of Computer Application


ANNADA COLLEGE HAZARIBAGH
Certificate for Project
This is to certify that this is a bona fide record of the project work
done satisfactorily at "SAI CODING SOLUTION" by CHHOTU KUMAR, Roll
No. 210219016890, in partial fulfillment of the BCA Examination.
This report, or a similar report on the topic, has not been submitted
for any other examination and does not form part of any other course
undergone by the candidate.

Internal Signature                              External Signature

Certificate for Bonafide
This is to certify that this is a bona fide record of the project work
done satisfactorily at "SAI CODING SOLUTION" by CHHOTU KUMAR, Roll
No. 210219016890, in partial fulfillment of the BCA Examination.
This report, or a similar report on the topic, has not been submitted
for any other examination and does not form part of any other course
undergone by the candidate.

Date:
Place: RANCHI

MR. KANCHAN RAJU
GUIDE, SAI CODING SOLUTION
RANCHI, JHARKHAND

ACKNOWLEDGEMENT
This thesis work has been an intellectually invigorating experience for
me. I am sure that the knowledge and experience gathered during the
course of this work will stand me in good stead in the future.
With immense pleasure and due respect, I express my sincere
gratitude to (principal), In-charge, Annada College Hazaribagh, for all
his support and cooperation in successfully completing this thesis
work by providing excellent facilities.
I am highly grateful to the College Coordinator, Dr. Bakshi Sir, for his
ever-helping attitude and for encouraging us to excel in our studies;
besides, he has been a source of inspiration during my entire period of BCA.
I would also like to extend my sincere gratitude to all faculty
members and staff for helping me in my college during my BCA
course.
I would like to take this opportunity to extend my sincere gratitude and
thanks to my pioneer, Prof. Sanjeev Sir, firstly for coming up with such
an innovative thesis idea. He has not only made us work but guided
us to orient toward research. It has been a real pleasure working under
his guidance, and it is chiefly his encouragement and motivation that
have made this thesis a reality.
Last, but not least, I am heartily thankful to almighty God for
showering his blessings forever during my entire life, and also to my
family members for providing me with great support.

BCA Annada College Hazaribagh (CHHOTU KUMAR)

Declaration by the Candidate
I, CHHOTU KUMAR, Roll No. 210219016890, hereby declare that the
work which is being presented in the dissertation entitled "FAKE
NEWS DETECTION", in partial fulfillment of the requirement for the award
of the degree of "Bachelor of Computer Application", submitted to
Vinoba Bhave University, Hazaribagh, is an authentic record of my
work carried out under the guidance of Mr. Kanchan Raju.
      I have not submitted the matter embodied in this dissertation
for the award of any other degree.

Date: CHHOTU KUMAR


Place: Hazaribagh

TABLE OF CONTENTS

S. No   Particulars

1       Introduction

2       About the Project

3       System Analysis

4       Flow Chart

5       Abstract

6       Objective

7       Coding

8       Output

9       Future Scope

10      Conclusion

INTRODUCTION
These days, fake news creates problems ranging from satirical
articles to fabricated stories and planted government propaganda in some
outlets. Fake news and lack of trust in the media are growing problems
with huge ramifications for our society. Obviously, a purposely
misleading story is "fake news", but lately the discourse on social media
is changing its definition. Some now use the term to
dismiss facts that run counter to their preferred viewpoints.

The importance of disinformation within American political discourse
was the subject of weighty attention, particularly following the
American presidential election. The term 'fake news' became common
parlance for the issue, particularly to describe factually incorrect and
misleading articles published mostly for the purpose of making money
through page views. In this paper, we seek to produce a model
that can accurately predict the likelihood that a given article is fake
news. Facebook has been at the epicenter of much critique following
media attention. It has already implemented a feature to flag fake
news on the site when a user sees it, and has said publicly that it is
working on distinguishing these articles in an automated way.
Certainly, this is not an easy task. A given algorithm must be politically
unbiased, since fake news exists on both ends of the spectrum,
and must also give equal balance to legitimate news sources on either end
of the spectrum. In addition, the question of legitimacy is a difficult one.
However, in order to solve this problem, it is necessary to
understand what fake news is.

We will be training and testing the data; when we use supervised
learning, it means we are labeling the data. With the training and
testing data and their labels we can apply different machine learning
algorithms, but before computing predictions and accuracies the
data needs to be preprocessed: the null values, which are not
readable, must be removed from the dataset, and the text must
be converted into vectors by normalizing and tokenizing it
so that it can be understood by the machine. The next step is to
use this data to generate visual reports, which we do using
the Matplotlib library of Python and scikit-learn. These libraries help us
present the results as histograms, pie charts, or bar charts.
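As a toy illustration of the preprocessing steps described above, the sketch below (with invented example rows; the helper name normalize_and_tokenize is hypothetical) removes null entries and then normalizes and tokenizes the remaining text:

```python
# Hypothetical mini-dataset: (text, label) pairs, where label 1 = fake.
rows = [
    ("Breaking: markets SURGE today!", 1),
    (None, 0),                      # a null value that must be removed
    ("Scientists publish new study", 0),
]

def normalize_and_tokenize(text):
    # lower-case the text, replace punctuation with spaces, then split
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " "
                      for ch in text)
    return cleaned.split()

# Drop null rows, then tokenize what remains.
cleaned_rows = [(normalize_and_tokenize(text), label)
                for text, label in rows if text is not None]
print(cleaned_rows[0][0])  # ['breaking', 'markets', 'surge', 'today']
```

Real code would hand these cleaned token lists (or the cleaned strings) to a vectorizer, and could then plot label counts with Matplotlib.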

ABOUT THE PROJECT
With the advancement of technology, digital news is more widely
exposed to users globally, which contributes to the growth of
misinformation and disinformation spreading online. Fake news can be
found on popular platforms such as social media and the wider Internet.
There have been multiple solutions and efforts to detect fake news,
including tool-based approaches. However, fake news intends to convince
the reader to believe false information, which makes these articles
difficult to detect. Digital news is produced at a large and rapid rate,
running daily at every second, so it is challenging for machine
learning to effectively detect fake news.

The available literature describes many automatic techniques for
detecting fake news and deceptive posts. Fake news detection has
multidimensional aspects, ranging from chatbots used to spread
misinformation to clickbait used for rumor spreading. Much clickbait
is available on social media networks, including Facebook, where
the sharing and liking of posts in turn spreads falsified information.
A lot of work has been done to detect falsified information.

MEDIA RICH FAKE NEWS DETECTION: A SURVEY


In general, the goal is to profit through clickbait. Clickbait lures users
and entices curiosity with flashy headlines or designs that encourage
clicking links to increase advertisement revenue. This exposition
analyzes the prevalence of fake news in light of the advances in
communication made possible by the emergence of social networking
sites. The purpose of the work is to come up with a solution that users
can employ to detect and filter out sites containing false and misleading
information. We use simple and carefully selected features of the title
and post to accurately identify fake posts. The experimental results
show 99.4% accuracy using a logistic classifier.
WEAKLY SUPERVISED LEARNING FOR FAKE NEWS
DETECTION ON TWITTER

The problem of automatically detecting fake news in social media,
e.g. on Twitter, has recently drawn some attention. Although, from a
technical perspective, it can be regarded as a straightforward binary
classification problem, the major challenge is the collection of large
enough training corpora, since manually annotating tweets as fake or
non-fake news is an expensive and tedious endeavor. In this paper, we
discuss a weakly supervised approach which automatically collects
a large-scale but very noisy training dataset comprising hundreds of
thousands of tweets. During collection, we automatically label tweets
by their source, i.e. trustworthy or untrustworthy, and train a classifier
on this dataset. We then use that classifier for a different classification
target, i.e. the classification of fake and non-fake tweets. Although
the labels are not accurate according to the new classification target
(not all tweets by an untrustworthy source need to be fake news, and
vice versa), we show that despite this inaccurate dataset, it is
possible to detect fake news with an F1 score of up to 0.9.
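For context, the F1 score mentioned above is the harmonic mean of precision and recall. A small hand-rolled illustration, with made-up labels, might look like this:

```python
def f1_score(y_true, y_pred, positive=1):
    # true positives, false positives, false negatives for the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# 1 = fake, 0 = non-fake; labels invented for illustration
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 2))  # 0.67 (precision and recall both 2/3)
```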

FAKE NEWS DETECTION IN SOCIAL MEDIA

Fake news and hoaxes have been there since before the advent of the
Internet. The widely accepted definition of Internet fake news is:
fictitious articles deliberately fabricated to deceive readers”. Social
media and news outlets publish fake news to increase readership or as
part of psychological warfare. In general, the goal is profiting through
clickbait’s. Clickbait’s lure users and entice curiosity with flashy
headlines or designs to click links to increase advertisements
revenues. This exposition analyzes the prevalence of fake news in light
of the advances in communication made possible by the emergence of
social networking sites. The purpose of the work is to come up with a
solution that can be utilized by users to detect and filter out sites
containing false and misleading information. We use simple and
carefully selected features of the title
1 and post to accurately identify
1
fake posts. The experimental results show a 99.4% accuracy using
logistic classifier.

Automatic Online Fake News Detection Combining Content and Social Signals

The proliferation and rapid diffusion of fake news on the Internet
highlight the need for automatic hoax detection systems. In the context
of social networks, machine learning (ML) methods can be used for
this purpose. Fake news detection strategies are traditionally based
either on content analysis (i.e. analyzing the content of the news) or,
more recently, on social context models, such as mapping the news'
diffusion pattern. In this paper, we first propose a novel ML fake news
detection method which, by combining news content and social
context features, outperforms existing methods in the literature,
increasing their already high accuracy by up to 4.8%. Second, we
implement our method within a Facebook Messenger chatbot and
validate it with a real-world application, obtaining a fake news detection
accuracy of 81.7%.

In recent years, the reliability of information on the Internet has
emerged as a crucial issue of modern society. Social network sites
(SNSs) have revolutionized the way in which information is spread by
allowing users to freely share content. As a consequence, SNSs are
also increasingly used as vectors for the diffusion of misinformation
and hoaxes. The amount of disseminated information and the rapidity
of its diffusion make it practically impossible to assess reliability in a
timely manner, highlighting the need for automatic hoax detection
systems. As a contribution towards this objective, we show that
Facebook posts can be classified with high accuracy as hoaxes or non-
hoaxes on the basis of the users who "liked" them. We present two
classification techniques, one based on logistic regression, the other
on a novel adaptation of Boolean crowdsourcing algorithms. On a
dataset consisting of 15,500 Facebook posts and 909,236 users, we
obtain classification accuracies exceeding 99% even when the training
set contains less than 1% of the posts. We further show that our
techniques are robust: they work even when we restrict our attention to
the users who like both hoax and non-hoax posts. These results
suggest that mapping the diffusion pattern of information can be a
useful component of automatic hoax detection systems.
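The core idea (classifying a post from the set of users who liked it) can be sketched with a tiny fabricated like-matrix; none of this is the paper's actual data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows are posts, columns are users; entry 1 means that user liked the post.
likes = np.array([
    [1, 1, 0, 0],   # hoax posts, liked mostly by users 0 and 1
    [1, 1, 1, 0],
    [0, 0, 1, 1],   # non-hoax posts, liked mostly by users 2 and 3
    [0, 1, 1, 1],
])
labels = np.array([1, 1, 0, 0])  # 1 = hoax, 0 = non-hoax

# Logistic regression over user-like columns, as in one of the two techniques.
clf = LogisticRegression().fit(likes, labels)

# Score a new post purely from which users liked it.
new_post = np.array([[1, 1, 0, 1]])
print(clf.predict(new_post)[0])
```

At the paper's scale (hundreds of thousands of users), the like matrix would be sparse and far wider, but the fitting step is the same.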

THE SPREAD OF FAKE NEWS BY SOCIAL BOTS

The massive spread of fake news has been identified as a major global
risk and has been alleged to influence elections and threaten
democracies. Communication, cognitive, social, and computer
scientists are engaged in efforts to study the complex causes for the
viral diffusion of digital misinformation and to develop solutions, while
search and social media platforms are beginning to deploy
countermeasures. However, to date, these efforts have been mainly
informed by anecdotal evidence rather than systematic data. Here we
analyze 14 million messages spreading 400 thousand claims on Twitter
during and following the 2016 U.S.
presidential campaign and election. We find evidence that social bots
play a key role in the spread of fake news. Accounts that actively
spread misinformation are significantly more likely to be bots.
Automated accounts are particularly active in the early spreading
phases of viral claims, and tend to target influential users. Humans are
vulnerable to this manipulation, retweeting bots who post false news.
Successful sources of false and biased claims are heavily supported by
social bots. These results suggest that curbing social bots may be an
effective strategy for mitigating the spread of online misinformation.

MISLEADING ONLINE CONTENT

Tabloid journalism is often criticized for its propensity for exaggeration,
sensationalization, scare-mongering, and otherwise producing
misleading and low-quality news. As the news has moved online, a
new form of tabloidization has emerged: "clickbaiting". "Clickbait"
refers to "content whose main purpose is to attract attention and
encourage visitors to click on a link to a particular web page" and has
been implicated in the rapid spread of rumor and misinformation online.
This paper examines potential methods for the automatic detection of
clickbait as a form of deception. Methods for recognizing both textual
and non-textual clickbaiting cues are surveyed, leading to the
suggestion that a hybrid approach may yield the best results.
Big Data Analytics and Deep Learning are two high-focus areas of data
science. Big Data has become important as many organizations, both
public and private, have been collecting massive amounts of
domain-specific information, which can contain useful information about
problems such as national intelligence, cyber security, fraud detection,
marketing, and medical informatics. Companies such as Google and
Microsoft are analyzing large volumes of data for business analysis
and decisions, impacting existing and future technology. Deep
Learning algorithms extract high- level, complex abstractions as data
representations through a hierarchical learning process. Complex
abstractions are learnt at a given level based on relatively simpler
abstractions formulated in the preceding level in the hierarchy. A key
benefit of Deep Learning is the analysis and learning of massive
amounts of unsupervised data, making it a valuable tool for Big Data
Analytics where raw data is largely unlabeled and un-categorized. In
the present study, we explore how Deep Learning can be utilized for
addressing some important problems in Big Data Analytics, including
extracting complex patterns from massive volumes of data, semantic
indexing, data tagging, fast information retrieval, and simplifying
discriminative tasks. We also investigate some aspects of Deep
Learning research that need further exploration to incorporate specific
challenges introduced by Big Data Analytics, including streaming data,
high-dimensional data, scalability of models, and distributed
computing. We conclude by presenting insights into relevant future
works by posing some questions, including defining data sampling
criteria, domain adaptation modeling, defining criteria for obtaining
useful data abstractions, improving semantic indexing, semi-supervised
learning, and active learning.

SYSTEM ANALYSIS

Hardware Requirement

Processor: Intel(R) Core or higher
RAM: 4.00 GB or higher
Hard disk: 80 GB or more
Monitor: 15" CRT or LCD monitor
Keyboard: Normal or multimedia
Mouse: Compatible mouse
Speed: 1.40 GHz or faster
Operating System: 32/64-bit operating system, x86/x64-based processor

Software Requirement

Operating System: Windows 7/8/8.1/10/11
Python IDE: Anaconda, Jupyter Notebook, Spyder, IDLE
Programming Language: Python (with machine learning libraries)
Dataset: CSV

There exists a large body of research on the topic of machine learning
methods for deception detection; most of it has focused on
classifying online reviews and publicly available social media posts.
Particularly since the American presidential election in late 2016,
the question of determining 'fake news' has also been the subject of
particular attention within the literature. Conroy, Rubin, and Chen
outline several approaches that seem promising towards the aim of
correctly classifying misleading articles. They note that simple
content-related n-grams and shallow part-of-speech tagging have proven
insufficient for the classification task, often failing to account for
important context information. Rather, these methods have been
shown useful only in tandem with more complex methods of
analysis. Deep syntax analysis using Probabilistic Context-Free
Grammars has been shown to be particularly valuable in combination
with n-gram methods. Feng, Banerjee, and Choi are able to achieve
85%-91% accuracy in deception-related classification tasks using
online review corpora.

SOFTWARE ENVIRONMENT PYTHON
Python is a high-level, interpreted, interactive and object-oriented
scripting language. Python is designed to be highly readable. It uses
English keywords frequently whereas other languages use
punctuation, and it has fewer syntactical constructions than other
languages.
 Python is Interpreted − Python is processed at runtime by the
interpreter. You do not need to compile your program before executing
it. This is similar to PERL and PHP.

 Python is Interactive − You can actually sit at a Python prompt and


interact with the interpreter directly to write your programs.

 Python is Object-Oriented − Python supports Object-Oriented style


or technique of programming that encapsulates code within objects.

 Python is a Beginner's Language − Python is a great language for


the beginner-level programmers and supports the development of a
wide range of applications from simple text processing to WWW
browsers to games.

History of Python

Python was developed by Guido van Rossum in the late eighties and
early nineties at the National Research Institute for Mathematics and
Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-
3, C, C++, Algol-68, SmallTalk, and Unix shell and other scripting
languages.

Python is copyrighted. Like Perl, Python source code is freely
available, under the GPL-compatible Python Software Foundation License.

Python is now maintained by a core development team at the


institute, although Guido van Rossum still holds a vital role in directing
its progress.

PYTHON FEATURES

Python's features include −

 Easy-to-learn − Python has few keywords, simple structure, and a


clearly defined syntax. This allows the student to pick up the language
quickly.

 Easy-to-read − Python code is more clearly defined and visible to the


eyes.
 Easy-to-maintain − Python's source code is fairly easy-to-maintain.

 A broad standard library − Python's bulk of the library is very portable


and cross-platform compatible on UNIX, Windows, and Macintosh.

 Interactive Mode − Python has support for an interactive mode which


allows interactive testing and debugging of snippets of code.

 Portable − Python can run on a wide variety of hardware platforms
and has the same interface on all platforms.

 Extendable − You can add low-level modules to the Python


interpreter. These modules enable programmers to add to or customize
their tools to be more efficient.

 Databases − Python provides interfaces to all major commercial


databases.

 GUI Programming − Python supports GUI applications that can be


created and ported to many system calls, libraries and windows
systems, such as Windows MFC, Macintosh, and the X Window
system of Unix.

 Scalable − Python provides a better structure and support for large


programs than shell scripting.

Apart from the above-mentioned features, Python has a big list of good
features, few are listed below −

 It supports functional and structured programming methods as well as


OOP.

 It can be used as a scripting language or can be compiled to byte-code


for building large applications.

 It provides very high-level dynamic data types and supports dynamic


type checking.

 It supports automatic garbage collection.


 It can be easily integrated with C, C++, COM, ActiveX, CORBA, and
Java.

Python is available on a wide variety of platforms including Linux and


Mac OS X. Let's understand how to set up our Python environment.

Getting Python

The most up-to-date and current source code, binaries, documentation,


news, etc., is available on the official website of Python
https://www.python.org.

Windows Installation

Here are the steps to install Python on a Windows machine.

Open a Web browser and go to https://www.python.org/downloads/.

Follow the link for the Windows installer python-XYZ.msi file where XYZ is
the version you need to install.

To use this installer python-XYZ.msi, the Windows system must support


Microsoft Installer 2.0. Save the installer file to your local machine and
then run it to find out if your machine supports MSI.

Run the downloaded file. This brings up the Python install wizard,
which is really easy to use. Just accept the default settings, wait until the
install is finished, and you are done.

FLOW CHART

ABSTRACT

Recent political events have led to an increase in the popularity and
spread of fake news. As demonstrated by the widespread effects of the
large onset of fake news, humans are inconsistent, if not outright poor,
detectors of fake news. With this, efforts have been made to automate
the process of fake news detection. The most popular of such attempts
include "blacklists" of sources and authors that are unreliable. While
these tools are useful, in order to create a more complete end-to-end
solution, we need to account for more difficult cases where reliable
sources and authors release fake news. As such, the goal of this
project was to create a tool for detecting the language patterns that
characterize fake and real news through the use of machine learning
and natural language processing techniques. The results of this project
demonstrate that machine learning can be useful in this task.
We have built a model that catches many intuitive indications of real
and fake news, as well as an application that aids in the visualization of
the classification decision.

OBJECTIVE

The objective of this project is to examine the problems and possible
consequences related to the spread of fake news. We will work on
different fake news datasets, applying different machine learning
algorithms to train on the data and test it to find which news is
real and which is fake. Fake news is a problem that heavily affects
society and our perception not only of the media but of facts and
opinions themselves. By using artificial intelligence and machine
learning, the problem can be addressed, as we can mine patterns from
the data to maximize well-defined objectives. So, our focus is to find
which machine learning algorithm is best suited to which kind of text
dataset, and also which dataset is better for measuring accuracy,
since accuracy depends directly on the type and amount of data. The
more data there is, the better your chances of obtaining correct
accuracy, as you can test and train on more data to establish your results.

In this paper a model is built based on a count vectorizer or a
term-document matrix (i.e. word tallies relative to how often words are
used in other articles in your dataset). Since this problem is a kind of
text classification, implementing a Naive Bayes classifier is a good fit,
as this is standard for text-based processing. The actual goal lies in
developing the model: choosing the text transformation (count vectorizer
vs. TF-IDF vectorizer) and choosing which type of text to use (headlines
vs. full text). The next step is to extract the most optimal features for
the count vectorizer or TF-IDF vectorizer. This is done by using the n
most-used words and/or phrases, lower-casing or not, and mainly
removing the stop words, which are common words such as "the",
"when", and "there", and only using those words that appear at least a
given number of times in a given text dataset.
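The pipeline just described (count-vectorizing text with English stop words removed, then fitting a Multinomial Naive Bayes classifier) can be sketched as follows; the headlines and labels are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

headlines = [
    "you will not believe this miracle cure",
    "shocking secret the doctors hide",
    "government announces new budget plan",
    "study measures effect of exercise on sleep",
]
labels = [1, 1, 0, 0]  # 1 = fake, 0 = real

# stop_words="english" drops common words such as "the" and "this";
# min_df could be raised to keep only words seen a given number of times.
model = make_pipeline(
    CountVectorizer(stop_words="english", lowercase=True, min_df=1),
    MultinomialNB(),
)
model.fit(headlines, labels)
print(model.predict(["shocking miracle cure doctors hide"])[0])  # 1 (fake)
```

Swapping CountVectorizer for TfidfVectorizer, or headlines for full article text, changes only the first pipeline stage, which is what makes the comparison described above straightforward to run.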

CODING
# Fake News Detection
![image.png](attachment:image.png)
## Importing Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import re
import string
## Importing Dataset
df_fake = pd.read_csv("../input/fake-news-detection/Fake.csv")
df_true = pd.read_csv("../input/fake-news-detection/True.csv")
df_fake.head()
df_true.head(5)
## Inserting a column "class" as target feature
df_fake["class"] = 0
df_true["class"] = 1
df_fake.shape, df_true.shape
# Removing last 10 rows for manual testing
df_fake_manual_testing = df_fake.tail(10)
for i in range(23480, 23470, -1):
    df_fake.drop([i], axis=0, inplace=True)

df_true_manual_testing = df_true.tail(10)
for i in range(21416, 21406, -1):
    df_true.drop([i], axis=0, inplace=True)
df_fake.shape, df_true.shape
df_fake_manual_testing["class"] = 0
df_true_manual_testing["class"] = 1
df_fake_manual_testing.head(10)
df_true_manual_testing.head(10)
df_manual_testing = pd.concat([df_fake_manual_testing, df_true_manual_testing], axis=0)
df_manual_testing.to_csv("manual_testing.csv")
## Merging True and Fake Dataframes
df_merge = pd.concat([df_fake, df_true], axis =0 )
df_merge.head(10)
df_merge.columns
## Removing columns which are not required
df = df_merge.drop(["title", "subject","date"], axis = 1)
df.isnull().sum()
## Random Shuffling the dataframe
df = df.sample(frac = 1)
df.head()
df.reset_index(inplace = True)
df.drop(["index"], axis = 1, inplace = True)
df.columns
df.head()
## Creating a function to process the texts
def wordopt(text):
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text
df["text"] = df["text"].apply(wordopt)
## Defining dependent and independent variables
x = df["text"]
y = df["class"]
## Splitting Training and Testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
## Convert text to vectors
from sklearn.feature_extraction.text import TfidfVectorizer

vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)
## Logistic Regression
from sklearn.linear_model import LogisticRegression

LR = LogisticRegression()
LR.fit(xv_train,y_train)
pred_lr=LR.predict(xv_test)
LR.score(xv_test, y_test)
print(classification_report(y_test, pred_lr))
## Decision Tree Classification
from sklearn.tree import DecisionTreeClassifier

DT = DecisionTreeClassifier()
DT.fit(xv_train, y_train)
pred_dt = DT.predict(xv_test)
DT.score(xv_test, y_test)
print(classification_report(y_test, pred_dt))
## Gradient Boosting Classifier
from sklearn.ensemble import GradientBoostingClassifier

GBC = GradientBoostingClassifier(random_state=0)
GBC.fit(xv_train, y_train)
pred_gbc = GBC.predict(xv_test)
GBC.score(xv_test, y_test)
print(classification_report(y_test, pred_gbc))
## Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier

RFC = RandomForestClassifier(random_state=0)
RFC.fit(xv_train, y_train)
pred_rfc = RFC.predict(xv_test)
RFC.score(xv_test, y_test)
print(classification_report(y_test, pred_rfc))
## Model Testing
def output_label(n):
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Not A Fake News"

def manual_testing(news):
    testing_news = {"text": [news]}
    new_def_test = pd.DataFrame(testing_news)
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    new_xv_test = vectorization.transform(new_x_test)
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)

    print("\n\nLR Prediction: {} \nDT Prediction: {} \nGBC Prediction: {} \nRFC Prediction: {}".format(
        output_label(pred_LR[0]),
        output_label(pred_DT[0]),
        output_label(pred_GBC[0]),
        output_label(pred_RFC[0])))

news = str(input())
manual_testing(news)
news = str(input())
manual_testing(news)
news = str(input())
manual_testing(news)
OUTPUT

              precision    recall  f1-score   support

           0       0.99      1.00      0.99      5853
           1       1.00      0.99      0.99      5367

    accuracy                           0.99     11220
   macro avg       0.99      0.99      0.99     11220
weighted avg       0.99      0.99      0.99     11220

LR Prediction: Not A Fake News


DT Prediction: Not A Fake News
GBC Prediction: Not A Fake News
RFC Prediction: Not A Fake News

FUTURE SCOPE
Through the work done in this project, we have shown that machine
learning certainly does have the capacity to pick up on sometimes
subtle language patterns that may be difficult for humans to pick up on.
The next steps involved in this project come in three different aspects.

The first aspect that could be improved in this project is augmenting
and increasing the size of the dataset. We feel that more data would be
beneficial in ridding the model of any bias based on specific patterns in
the source. There is also a question as to whether or not the size of our
dataset is sufficient.

The second aspect in which this project could be expanded is by


comparing it to humans performing the same task. Comparing the
accuracies would be beneficial in deciding whether or not the dataset is
representative of how difficult the task of separating fake from real
news is.

If humans are more accurate than the model, it may mean that we
need to choose more deceptive fake news examples. Because we
acknowledge that this is only one tool in a toolbox that would really be
required for an end-to-end system for classifying fake news, we expect
that its accuracy will never reach perfect.

However, it may be beneficial as a stand-alone application if its


accuracy is already higher than human accuracy at the same task. In
addition to comparing the accuracy to human accuracy, it would also
be interesting to compare the phrases/trigrams that a human would
point out if asked what they based their classification decision on.

Then, we could quantify how similar these patterns are to those that
humans find indicative of fake and real news. Finally, as we have
mentioned throughout, this application is only one that would be
necessary in a larger toolbox that could function as a highly accurate
fake news classifier.

Other tools that would need to be built may include a fact detector and
a stance detector. In order to combine all of these “routines,” there
would need to be some type of model that combines all of the tools and
learns how to weight each of them in its final decision.

CONCLUSION
Many people consume news from social media instead of traditional
news media. However, social media has also been used to spread fake
news, which has negative impacts on individual people and society. In
this paper, an innovative model for fake news detection using machine
learning algorithms has been presented. This model takes news events
as an input and, based on Twitter reviews and classification algorithms,
it predicts the percentage likelihood of the news being fake or real.

 An algorithm's accuracy depends on the type and size of your dataset.
The more the data, the better the chances of getting correct accuracy.
 Machine learning depends on the variations and relations in the data.
 Understanding what is predictable is as important as trying to predict it.
 While making an algorithm choice, speed should be a consideration
factor.

The feasibility of the project is analyzed in this phase and a business
proposal is put forth with a very general plan for the project and some
cost estimates. During system analysis the feasibility study of the
proposed system is carried out. This is to ensure that the
proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system
is essential. This study is carried out to check the economic impact that
the system will have on the organization. The amount of funds that the
company can pour into the research and development of the system is
limited. The expenditures must be justified. Thus, the developed
system is well within the budget, and this was achieved because most
of the technologies used are freely available. Only the customized
products had to be purchased.
