Fake News Detection
Certificate for Bonafide
This is to certify that this is a bona fide record of the project work
done satisfactorily at “SAI CODING SOLUTION” by CHHOTU KUMAR, Roll
No. 210219016890, in partial fulfillment of the BCA Examination.
This report, or a similar report on the topic, has not been submitted
for any other examination and does not form part of any other course
undergone by the candidate.
ACKNOWLEDGEMENT
This thesis work has been an intellectually invigorating experience for
me. I am sure that the knowledge and experience gathered during the
course of this work will stand me in good stead in the future.
With immense pleasure and due respect, I express my sincere
gratitude to (principal), In charge, Annada College Hazaribagh, for all
his support and co-operation in successfully completing this thesis
work by providing excellent facilities.
I am highly grateful to the College Coordinator, Dr. Bakshi Sir, for an
ever-helping attitude, for encouraging us to excel in studies, and for
being a source of inspiration during my entire period of BCA.
I would also like to extend my sincere gratitude to all faculty
members and staff for helping me in my college during my BCA
course.
I would like to take this opportunity to extend my sincere gratitude and
thanks to my Pioneer Prof. Sanjeev Sir, firstly for coming up with such
an innovative thesis idea. He has not only made us work but also guided
us to orient toward research. It has been a real pleasure working under
his guidance, and it is chiefly his encouragement and motivation that
have made this thesis a reality.
Last but not least, I am heartily thankful to almighty God for
showering his blessings on me throughout my life, and also to my
family members for providing me great support.
Declaration by the Candidate
I, CHHOTU KUMAR, Roll No. 210219016890, hereby declare that the
work which is being presented in the dissertation entitled “FAKE
NEWS DETECTION”, in partial fulfillment of the requirements for the award
of the degree of “Bachelor of Computer Application”, submitted to
Vinoba Bhave University, Hazaribagh, is an authentic record of my
work carried out under the guidance of Mr. Kanchan Raju.
I have not submitted the matter embodied in this dissertation
for the award of any other degree.
TABLE OF CONTENTS
S. No   Particulars
1       Introduction
2       About the Project
3       System Analysis
4       Flow Chart
5       Abstract
6       Objective
7       Coding
8       Output
9       Future Scope
10      Conclusion
INTRODUCTION
These days, fake news is creating a range of problems, from sarcastic
articles to fabricated news and planned government propaganda in some
outlets. Fake news and lack of trust in the media are growing problems
with huge ramifications for our society. Obviously, a purposely
misleading story is “fake news”, but lately the discourse on social
media is changing its definition. Some people now use the term to
dismiss facts that run counter to their preferred viewpoints.
ABOUT THE PROJECT
With the advancement of technology, digital news is more widely
exposed to users globally, which contributes to the increased spread of
disinformation online. Fake news can be found on popular platforms
such as social media and across the Internet. There have been multiple
solutions and efforts to detect fake news, some of them supported by
tools. However, fake news is intended to convince the reader to believe
false information, which makes such articles difficult to recognize.
Digital news is produced in large volumes and at high speed, every
second of every day, so it is challenging for machine learning to
detect fake news effectively.
but very noisy training dataset comprising hundreds of thousands of
tweets. During collection, we automatically label tweets by their source,
i.e., trustworthy or untrustworthy source, and train a classifier on this
dataset. We then use that classifier for a different classification
target, i.e., the classification of fake and non-fake tweets. Although
the labels are not accurate according to the new classification target
(not all tweets by an untrustworthy source need to be fake news, and
vice versa), we show that despite this unclean, inaccurate dataset, it is
possible to detect fake news with an F1 score of up to 0.9.
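The labeling scheme described above can be illustrated with a minimal sketch. The source names, the sample tweets and the helper `label_by_source` are hypothetical; they only show how a tweet inherits its label from the trustworthiness of its source rather than from its own content, which is why the resulting labels are noisy.

```python
# Hypothetical source lists; a real collection would rely on curated lists.
TRUSTWORTHY_SOURCES = {"trusted_daily", "city_news_desk"}
UNTRUSTWORTHY_SOURCES = {"viral_rumours", "hoax_central"}


def label_by_source(tweet):
    """Return 1 (trustworthy source), 0 (untrustworthy source) or None (unknown)."""
    source = tweet["source"]
    if source in TRUSTWORTHY_SOURCES:
        return 1
    if source in UNTRUSTWORTHY_SOURCES:
        return 0
    return None  # source is on neither list, so the tweet stays unlabeled


tweets = [
    {"source": "trusted_daily", "text": "Council approves new bus routes"},
    {"source": "hoax_central", "text": "Scientists confirm the moon is hollow"},
    # A harmless tweet from an untrustworthy source still receives label 0.
    {"source": "hoax_central", "text": "Rain expected on Monday"},
    {"source": "unknown_blog", "text": "Opinion: my favourite recipes"},
]

# Keep only tweets whose source is known; the labels are noisy by construction.
labeled = [
    (t["text"], label_by_source(t))
    for t in tweets
    if label_by_source(t) is not None
]
for text, label in labeled:
    print(label, text)
```

The third tweet shows the kind of label noise the paragraph refers to: its content is not fake, yet it is labeled 0 because of where it was published.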
Fake news and hoaxes have existed since before the advent of the
Internet. The widely accepted definition of Internet fake news is:
“fictitious articles deliberately fabricated to deceive readers”. Social
media and news outlets publish fake news to increase readership or as
part of psychological warfare. In general, the goal is to profit through
clickbait. Clickbait lures users and arouses curiosity with flashy
headlines or designs so that they click links, which increases
advertising revenue. This exposition analyzes the prevalence of fake
news in light of the advances in communication made possible by the
emergence of social networking sites. The purpose of the work is to
come up with a solution that users can utilize to detect and filter out
sites containing false and misleading information. We use simple and
carefully selected features of the title and post to accurately identify
fake posts. The experimental results show 99.4% accuracy using a
logistic classifier.
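The text does not list which title features were used, so the following is only a sketch of what "simple features of the title" might look like. The function `title_features` and every feature in it (word count, exclamation marks, question marks, share of upper-case letters) are assumptions chosen for illustration, not the features behind the reported result.

```python
def title_features(title):
    """Compute a few simple surface features of a headline (illustrative choices)."""
    words = title.split()
    letters = [c for c in title if c.isalpha()]
    upper = [c for c in letters if c.isupper()]
    return {
        "n_words": len(words),
        "n_exclamations": title.count("!"),
        "n_questions": title.count("?"),
        # Share of alphabetic characters in upper case (0.0 for an empty title).
        "upper_ratio": len(upper) / len(letters) if letters else 0.0,
    }


features = title_features("SHOCKING: You Won't BELIEVE This!!")
print(features)
```

A feature dictionary of this kind could then be turned into a numeric vector and passed to a logistic classifier.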
The massive spread of fake news has been identified as a major global
risk and has been alleged to influence elections and threaten
democracies. Communication, cognitive, social, and computer
scientists are engaged in efforts to study the complex causes for the
viral diffusion of digital misinformation and to develop solutions, while
search and social media platforms are beginning to deploy
countermeasures. However, to date, these efforts have been mainly
informed by anecdotal evidence rather than systematic data. Here we
analyze 14 million messages spreading 400
thousand claims on Twitter during and following the 2016 U.S.
presidential campaign and election. We find evidence that social bots
play a key role in the spread of fake news. Accounts that actively
spread misinformation are significantly more likely to be bots.
Automated accounts are particularly active in the early spreading
phases of viral claims, and tend to target influential users. Humans are
vulnerable to this manipulation, retweeting bots who post false news.
Successful sources of false and biased claims are heavily supported by
social bots. These results suggest that curbing social bots may be an
effective strategy for mitigating the spread of online misinformation.
SYSTEM ANALYSIS
Hardware Requirement
Software Requirement
There exists a large body of research on machine learning methods for
deception detection, most of which has focused on classifying online
reviews and publicly available social media posts. Particularly since
late 2016, during the American presidential election, the question of
determining 'fake news' has also been the subject of particular
attention within the literature. Conroy, Rubin, and Chen outline
several approaches that seem promising toward the aim of correctly
classifying misleading articles. They note that simple content-related
n-grams and shallow part-of-speech tagging have proven insufficient for
the classification task, often failing to account for important context
information. Rather, these methods have been shown to be useful only
in tandem with more complex methods of analysis. Deep syntax analysis
using Probabilistic Context-Free Grammars has been shown to be
particularly valuable in combination with n-gram methods. Feng,
Banerjee, and Choi are able to achieve 85%-91% accuracy in
deception-related classification tasks using online review corpora.
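To make the notion of content-related n-grams concrete, the sketch below extracts word n-grams from a sentence using only the standard library. The helper `word_ngrams` and the sample sentence are illustrative assumptions; the cited studies may tokenize and count differently.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the list of word n-grams (as tuples) in a lowercased text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "the senator denied the report"
unigrams = Counter(word_ngrams(sentence, 1))
bigrams = Counter(word_ngrams(sentence, 2))

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Counts like these capture which words and word pairs occur, but not how they relate syntactically, which is one reason n-grams alone miss context.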
SOFTWARE ENVIRONMENT PYTHON
Python is a high-level, interpreted, interactive and object-oriented
scripting language. Python is designed to be highly readable. It
frequently uses English keywords where other languages use
punctuation, and it has fewer syntactical constructions than other
languages.
Python is Interpreted − Python is processed at runtime by the
interpreter. You do not need to compile your program before executing
it. This is similar to Perl and PHP.
History of Python
Python was developed by Guido van Rossum in the late eighties and
early nineties at the National Research Institute for Mathematics and
Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3,
C, C++, Algol-68, Smalltalk, and Unix shell and other scripting
languages.
PYTHON FEATURES
Apart from the above-mentioned features, Python has a long list of good
features, a few of which are listed below −
Getting Python
Windows Installation
Follow the link for the Windows installer python-XYZ.msi file, where XYZ is
the version you need to install.
Run the downloaded file. This brings up the Python install wizard,
which is really easy to use. Just accept the default settings, wait until the
install is finished, and you are done.
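One way to confirm that the installation succeeded is to run a short script with the new interpreter. This is only a suggested check and is not part of the installer's own steps.

```python
import sys

# Report the version of the interpreter that is running this script.
version = sys.version_info
print(f"Python {version.major}.{version.minor}.{version.micro}")
```

If the printed version matches the installer that was downloaded, the interpreter is installed and available.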
FLOW CHART
ABSTRACT
OBJECTIVE
CODING
# Fake News Detection

## Importing Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
import re
import string
## Importing Dataset
df_fake = pd.read_csv("../input/fake-news-detection/Fake.csv")
df_true = pd.read_csv("../input/fake-news-detection/True.csv")
df_fake.head()
df_true.head(5)
## Inserting a column "class" as target feature
df_fake["class"] = 0
df_true["class"] = 1
df_fake.shape, df_true.shape
# Holding out the last 10 rows of each dataframe for manual testing
# (slicing avoids hard-coding the row indices of this particular dataset)
df_fake_manual_testing = df_fake.tail(10).copy()
df_fake = df_fake.iloc[:-10]
df_true_manual_testing = df_true.tail(10).copy()
df_true = df_true.iloc[:-10]
df_fake.shape, df_true.shape
df_fake_manual_testing["class"] = 0
df_true_manual_testing["class"] = 1
df_fake_manual_testing.head(10)
df_true_manual_testing.head(10)
df_manual_testing = pd.concat([df_fake_manual_testing, df_true_manual_testing], axis=0)
df_manual_testing.to_csv("manual_testing.csv")
## Merging True and Fake Dataframes
df_merge = pd.concat([df_fake, df_true], axis =0 )
df_merge.head(10)
df_merge.columns
## Removing columns which are not required
df = df_merge.drop(["title", "subject","date"], axis = 1)
df.isnull().sum()
## Random Shuffling the dataframe
df = df.sample(frac = 1)
df.head()
df.reset_index(inplace = True)
df.drop(["index"], axis = 1, inplace = True)
df.columns
df.head()
## Creating a function to process the texts
def wordopt(text):
    # Strip bracketed text, URLs and HTML tags before removing punctuation;
    # once punctuation is gone, those patterns can no longer match.
    text = text.lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub(r'\W', ' ', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    return text
df["text"] = df["text"].apply(wordopt)
## Defining dependent and independent variables
x = df["text"]
y = df["class"]
## Splitting Training and Testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25)
## Convert text to vectors
from sklearn.feature_extraction.text import TfidfVectorizer
vectorization = TfidfVectorizer()
xv_train = vectorization.fit_transform(x_train)
xv_test = vectorization.transform(x_test)
## Logistic Regression
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(xv_train,y_train)
pred_lr=LR.predict(xv_test)
LR.score(xv_test, y_test)
print(classification_report(y_test, pred_lr))
## Decision Tree Classification
from sklearn.tree import DecisionTreeClassifier
DT = DecisionTreeClassifier()
DT.fit(xv_train, y_train)
pred_dt = DT.predict(xv_test)
DT.score(xv_test, y_test)
print(classification_report(y_test, pred_dt))
## Gradient Boosting Classifier
from sklearn.ensemble import GradientBoostingClassifier
GBC = GradientBoostingClassifier(random_state=0)
GBC.fit(xv_train, y_train)
pred_gbc = GBC.predict(xv_test)
GBC.score(xv_test, y_test)
print(classification_report(y_test, pred_gbc))
## Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
RFC = RandomForestClassifier(random_state=0)
RFC.fit(xv_train, y_train)
pred_rfc = RFC.predict(xv_test)
RFC.score(xv_test, y_test)
print(classification_report(y_test, pred_rfc))
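## Model Testing
def output_lable(n):
    # Map the numeric class to a readable label (0 = fake, 1 = true)
    if n == 0:
        return "Fake News"
    elif n == 1:
        return "Not A Fake News"
def manual_testing(news):
    # Clean and vectorize a single article the same way as the training data
    testing_news = {"text":[news]}
    new_def_test = pd.DataFrame(testing_news)
    new_def_test["text"] = new_def_test["text"].apply(wordopt)
    new_x_test = new_def_test["text"]
    new_xv_test = vectorization.transform(new_x_test)
    pred_LR = LR.predict(new_xv_test)
    pred_DT = DT.predict(new_xv_test)
    pred_GBC = GBC.predict(new_xv_test)
    pred_RFC = RFC.predict(new_xv_test)
    # Assumed completion: the listing ends here, so the four labels are returned
    return {"LR": output_lable(pred_LR[0]), "DT": output_lable(pred_DT[0]),
            "GBC": output_lable(pred_GBC[0]), "RFC": output_lable(pred_RFC[0])}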
              precision    recall  f1-score   support

    accuracy                           0.99     11220
   macro avg       0.99      0.99      0.99     11220
weighted avg       0.99      0.99      0.99     11220
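The columns of the report above can be reproduced by hand from the true and predicted labels. The sketch below does this for a small set of toy labels; the labels and the helper `precision_recall_f1` are illustrative assumptions and are not the project's actual predictions.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall and F1 for one class from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels (0 = fake, 1 = true); not the project's actual predictions.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

per_class = [precision_recall_f1(y_true, y_pred, c) for c in (0, 1)]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1 in per_class) / len(per_class)
print(accuracy, macro_f1)
```

The weighted average in the report differs from the macro average only in that each class is weighted by its support, that is, by the number of true samples of that class.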
FUTURE SCOPE
Through the work done in this project, we have shown that machine
learning certainly does have the capacity to pick up on sometimes
subtle language patterns that may be difficult for humans to detect.
The next steps for this project come in three different aspects.
If humans are more accurate than the model, it may mean that we
need to choose more deceptive fake news examples. Because we
acknowledge that this is only one tool in the toolbox that would be
required for an end-to-end system for classifying fake news, we expect
that its accuracy will never be perfect.
Then, we could quantify how similar these patterns are to those that
humans find indicative of fake and real news. Finally, as we have
mentioned throughout, this application is only one of the tools that
would be necessary in a larger toolbox that could function as a highly
accurate fake news classifier.
Other tools that would need to be built may include a fact detector and
a stance detector. In order to combine all of these “routines,” there
would need to be some type of model that combines all of the tools and
learns how to weight each of them in its final decision.
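As a rough sketch of such a combining model, the scores of the individual tools can be passed through a weighted logistic sum. The tool names, scores and weights below are hypothetical, and the weights are fixed by hand here, whereas in the system envisaged above they would be learned from labeled data.

```python
import math


def combine_scores(scores, weights, bias=0.0):
    """Combine per-tool scores into one probability with a weighted logistic sum."""
    z = bias + sum(weights[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical tool outputs in [-1, 1]; positive values point towards "fake".
scores = {"language_model": 0.8, "fact_detector": 0.5, "stance_detector": -0.2}
# Hypothetical weights; in practice these would be learned from labeled data.
weights = {"language_model": 1.0, "fact_detector": 2.0, "stance_detector": 1.5}

probability_fake = combine_scores(scores, weights)
print(round(probability_fake, 3))
```

A larger weight lets one tool dominate the final decision, so learning the weights amounts to learning how much each tool should be trusted.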
CONCLUSION
Many people consume news from social media instead of traditional
news media. However, social media has also been used to spread fake
news, which has negative impacts on individuals and society. In
this paper, an innovative model for fake news detection using machine
learning algorithms has been presented. The model takes news events
as input and, based on Twitter reviews and classification algorithms,
predicts the percentage of the news being fake or real.