Sentiment Analysis Using Feature Selection and Machine Learning Algorithms

The document summarizes the key steps in a proposed methodology for sentiment analysis using feature selection and machine learning algorithms. The methodology involves preprocessing text data to remove stop words and stem words, selecting important features using chi-square scoring, and classifying the sentiment using a Naive Bayes classifier. The goal is to automatically predict the sentiment class of reviews to maximize the performance of the model.


Sentiment Analysis using Feature Selection and Machine Learning Algorithms

Presented By

Shruti Pant
Under Guidance Of:

Ms. Kalpana Jain : Major Advisor

Dr. Naveen Choudhary : Co-Advisor
Dr. Naveen Jain : Advisor
Dr. Chitranjan Agarwal : DRI Nominee
Contents:
1) Introduction to Sentiment Analysis
2) Literature Survey
3) Research Gap and Motivation
4) Objective of thesis
5) Design Issue
6) Design Flow
7) Experimental Results
8) List of Publications
9) Scope of Future Enhancement
10) References
11) Changes
Introduction to Sentiment Analysis
What is Machine Learning?
Machine Learning
Study of algorithms that improve their performance at some task with experience
Optimize a performance criterion using example data or past experience.
Role of Statistics: Inference from a sample
Role of Computer Science: Efficient algorithms to
solve the optimization problem
represent and evaluate the model for inference
Types

Supervised Learning
Classification
Regression
Unsupervised Learning
Reinforcement Learning
What people think?
What others think has always been an important piece of information

Which car should I buy?
Which schools should I apply to?
Which Professor to work for?
Whom should I vote for?

So whom shall I ask?
Pre Web
Friends and relatives
Acquaintances
Consumer Reports

Post Web
"I don't know who.. but apparently it's a good phone. It has good battery life and"
Blogs (google blogs, livejournal)
E-commerce sites (amazon, ebay)
Review sites (CNET, PC Magazine)
Discussion forums (forums.craigslist.org, forums.macrumors.com)
Friends and Relatives (occasionally)
Basics Of Sentiments
Holder (source) of attitude
Target (aspect) of attitude
Type of attitude
From a set of types
Like, love, hate, value, desire, etc.
Or (more commonly) simple weighted polarity: positive or negative
Text containing the attitude
Sentence or entire document
Sentiment
A thought, view, or attitude, especially one based mainly on emotion instead of reason

Sentiment Analysis
aka opinion mining
Use of natural language processing (NLP) and computational techniques to automate the extraction or classification of sentiment from typically unstructured text
Identify the orientation of opinion in a piece of text

The movie was fabulous!
The movie was horrible!

Approaches: Classifier Based, Lexicon Based
A. Sentence Level Classification
Assumption: a sentence contains only one
opinion
Task 1: identify if sentence is opinionated
classes: objective and subjective
Task 2: determine polarity of sentence
classes: positive and negative

Quiz:
This is a beautiful bracelet..
Is this sentence subjective/objective?
Is it positive or negative ?
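
The two tasks can be sketched with a lexicon-based approach. This is a minimal illustration only: the tiny `LEXICON` and the function `classify_sentence` are hypothetical, and a sentence is treated as subjective simply when it contains at least one lexicon word.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration): word -> polarity weight.
LEXICON = {"beautiful": 1, "fabulous": 1, "good": 1, "horrible": -1, "bad": -1}

def classify_sentence(sentence):
    """Return (subjectivity, polarity) for one sentence.

    Task 1: the sentence is 'subjective' if it contains any opinion word.
    Task 2: polarity is the sign of the summed lexicon weights.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    weights = [LEXICON[t] for t in tokens if t in LEXICON]
    if not weights:
        return "objective", None
    score = sum(weights)
    if score > 0:
        return "subjective", "positive"
    if score < 0:
        return "subjective", "negative"
    return "subjective", "undecided"

print(classify_sentence("This is a beautiful bracelet.."))
```

Under these assumptions the quiz sentence comes out as subjective and positive, because "beautiful" is the only opinion word it contains.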
B. Document (post/review) Level Classification

Assumption:
each document focuses on a single object
contains opinion from a single opinion holder

Task: determine overall sentiment orientation in document
classes: positive and negative
C. Feature Level Classification

Goal: produce a feature-based opinion summary of multiple reviews

Task 1: Identify and extract object features that have been commented on by an opinion holder (e.g. picture, battery life).
Task 2: Determine polarity of opinions on features
classes: positive and negative
Task 3: Group feature synonyms
Need of Sentiment Analysis
Consumer information
Product reviews
Marketing
Consumer attitudes
Trends
Politics
Politicians want to know voters' views
Voters want to know politicians' stances and who else supports them
Social
Find like-minded individuals or communities
Literature Review
[1] presented an unsupervised document-level method that uses pointwise mutual information (PMI) to classify reviews as recommended or not recommended. This algorithm achieved 74% accuracy.

[2] extended the work using PMI and Latent Semantic Analysis (LSA) and achieved an accuracy of 82.2%.

[3] analysed a movie reviews dataset using several supervised machine learning algorithms (SVM, NB and MaxEnt) and different feature selection techniques. They applied various pre-processing techniques such as stemming or lemmatization, and used NB and MaxEnt with POS tagging to increase the performance more than SVM.

[4] proposed an approach using the IMDB movie database in which the text of a document is labelled as objective or subjective by finding a minimum s-t cut in a graph, achieving an accuracy of 85%.

[5] used machine learning techniques for multilingual (English, Dutch and French) studies. Feature selection along with negation unigrams and stemming is performed to obtain relevant features, and then Multinomial Naive Bayes, SVM and maximum entropy are compared on overall performance.

[6] performed sentiment analysis using various feature selection schemes such as tf-idf and term occurrence, and classified the dataset using SVM and Naive Bayes to compare their performance.
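
The PMI-based scoring mentioned for [1] can be illustrated with a short sketch. It estimates PMI from co-occurrence counts and takes the semantic orientation of a phrase as its PMI with a positive reference word minus its PMI with a negative one ("excellent" and "poor" in Turney's paper). All counts below are made-up numbers, and the function names are hypothetical.

```python
import math

def pmi(joint, count_x, count_y, total):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), with probabilities
    estimated as counts divided by the total number of observations."""
    return math.log2((joint / total) / ((count_x / total) * (count_y / total)))

def semantic_orientation(near_pos, near_neg, count_phrase,
                         count_pos, count_neg, total):
    """SO(phrase) = PMI(phrase, positive word) - PMI(phrase, negative word)."""
    return (pmi(near_pos, count_phrase, count_pos, total)
            - pmi(near_neg, count_phrase, count_neg, total))

# Made-up counts: the phrase co-occurs 8 times with the positive reference
# word and 2 times with the negative one, out of 1000 observations.
so = semantic_orientation(near_pos=8, near_neg=2, count_phrase=20,
                          count_pos=100, count_neg=100, total=1000)
print(round(so, 3))  # a positive value suggests a positive orientation
```

A review would then be labelled recommended when the average orientation of its phrases is positive. Zero counts would need smoothing before taking the logarithm, which this sketch leaves out.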
Research Gap and Motivation
Sentiment analysis is a hot topic of research.
Use of electronic media is increasing day by day.
Time is money, or even more valuable than money. Therefore, instead of spending time reading and figuring out the positivity or negativity of text, we can use automated techniques for sentiment analysis.
Sentiment analysis is used in opinion mining.
Example: analyzing a product based on its reviews and comments.
The key is to find the best classifier according to the dataset used and the space and time available.

Earlier work has used lexicon analysis to find the sentiment of a word.

Earlier works have used different feature selection techniques and classifiers to build an automatic model.
Objective
A heuristic based on the Naive Bayes classifier is designed to automatically predict the class of the incoming review in order to maximize the performance of the model.
Design Issue
Supervised / Unsupervised / Hybrid
Classification Algorithm
Feature Selection Techniques
Lexicon or Machine Based
Design Flow
Dataset (Phase 1)
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.

Polarity detection: Is an IMDB movie review positive or negative?
Data: Polarity Data 2.0: http://www.cs.cornell.edu/people/pabo/movie-review-data
IMDB data in the Pang and Lee database
Excerpt 1:
when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . [...] when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point cool . _october sky_ offers a much simpler image - that of a single white dot , traveling horizontally across the night sky . [. . . ]

Excerpt 2:
snake eyes is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing . it's not just because this is a brian depalma film , and since he's a great director and one who's films are always greeted with at least some fanfare . and it's not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents
Pre-Processing (Phase 2)
Pre-processing is done in our proposed methodology to remove the words that impede sentiment analysis by increasing the number of false positives or false negatives.
In our model, stop words are removed using tf-idf. Term Frequency-Inverse Document Frequency (tf-idf) is used to separate the important words in a document from the not-so-important ones. NLTK also comes with a built-in list of 128 stop words, which is also included in our model to select the irrelevant words. We did this by importing stopwords from the NLTK corpus.
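
A minimal sketch of this step, using only the standard library, might look as follows. It is not the code used in the thesis: `BASIC_STOP_WORDS` is a tiny illustrative stand-in for NLTK's built-in list, and the `idf_threshold` value and the function names are assumptions.

```python
import math
from collections import Counter

# Tiny illustrative stop list; a stand-in for NLTK's built-in English list.
BASIC_STOP_WORDS = {"the", "a", "is", "was", "and", "of"}

def inverse_document_frequency(docs):
    """idf(t) = log(N / df(t)), where df(t) counts the documents containing t."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}

def tf_idf(doc, idf):
    """tf-idf weight of every term in one tokenized document."""
    tf = Counter(doc)
    return {term: (count / len(doc)) * idf[term] for term, count in tf.items()}

def remove_stop_words(docs, idf_threshold=0.1):
    """Drop listed stop words plus terms whose idf is below the threshold,
    i.e. terms that occur in (almost) every document."""
    idf = inverse_document_frequency(docs)
    low_information = {t for t, value in idf.items() if value < idf_threshold}
    stop = BASIC_STOP_WORDS | low_information
    return [[t for t in doc if t not in stop] for doc in docs]

docs = [
    "the movie was fabulous".split(),
    "the movie was horrible".split(),
    "the plot of the movie was thin".split(),
]
cleaned = remove_stop_words(docs)
print(cleaned)
```

In this toy corpus "movie" occurs in every document, so its idf is zero and it is dropped together with the listed stop words, even though it is not a stop word in general English.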
Stemming algorithms attempt to automatically remove suffixes (and in some cases prefixes) in order to find the root word or stem of a given word. NLTK provides several stemmer interfaces. In our proposed method we used the Porter stemmer to find the root words.
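
To give a feel for suffix stripping, the sketch below implements only step 1a of the Porter algorithm (plural endings). It is deliberately incomplete: the full algorithm has several further steps, and in practice NLTK's `PorterStemmer` class would be used instead of the hypothetical `porter_step_1a` function shown here.

```python
def porter_step_1a(word):
    """Apply only step 1a of the Porter algorithm (plural suffixes).

    Rules are tried in order and the first matching suffix wins.
    """
    if word.endswith("sses"):
        return word[:-2]   # SSES -> SS, e.g. caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # IES -> I, e.g. ponies -> poni
    if word.endswith("ss"):
        return word        # SS -> SS, e.g. caress -> caress
    if word.endswith("s"):
        return word[:-1]   # S -> (nothing), e.g. cats -> cat
    return word

for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

As "poni" shows, a stem does not have to be a dictionary word. It only has to be the same for all inflected forms that should be counted as one feature.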
Feature Selection (Phase 3)
Feature selection is used to increase the effectiveness of the model. The important features are selected and fed to the classifier.
In our proposed methodology we used chi-square as a scoring function, with which we can find whether two terms are associated with each other (collocation: two words that are correlated, or more likely to occur together).
It also helps us understand whether a word is informative. If a word occurs mainly in positive reviews and rarely in negative reviews, it can mean that the word is important. So we find how common a word is in a particular class compared to the other classes.
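
The word-versus-class scoring can be sketched with the standard chi-square statistic for a 2x2 contingency table. The document counts below are made up for illustration, and `chi_square` is a hypothetical helper, not the thesis code.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table of document counts.

    a: positive reviews containing the word
    b: negative reviews containing the word
    c: positive reviews without the word
    d: negative reviews without the word
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # word occurs in every review or in none: no information
    return n * (a * d - b * c) ** 2 / denominator

# Made-up counts over 100 positive and 100 negative reviews.
scores = {
    "fabulous": chi_square(40, 5, 60, 95),   # mostly in positive reviews
    "movie": chi_square(90, 90, 10, 10),     # equally common in both classes
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

A word that is spread evenly over both classes scores zero, while a word concentrated in one class scores high. Keeping only the top-scoring words is one common way to turn these scores into a feature set.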
Classification (Phase 4)
In machine learning, naive Bayes classifiers are a family of simple, baseline probabilistic classifiers based on Bayes' theorem with strong (naive) independence assumptions.
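
As an illustration of the idea, here is a small multinomial naive Bayes classifier with add-one (Laplace) smoothing, written with the standard library only. It is a sketch under assumptions: the class name, the toy training reviews and the choice of the multinomial variant are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # log P(class) + sum of log P(word | class); words are treated
            # as independent given the class (the "naive" assumption).
            score = math.log(n / n_docs)
            for t in tokens:
                if t in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][t]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_docs = [["fabulous", "movie"], ["great", "acting"],
              ["horrible", "movie"], ["boring", "plot"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["fabulous", "acting"]), model.predict(["horrible", "plot"]))
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow on long reviews.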
Experimental Results
Accuracy
[Bar chart; values labelled on the bars, as extracted (bar assignment not recoverable): 93, 84.75, 84, 81.6, 75.25, 76.5]
Figure 4.2: Accuracy comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al
Precision
[Bar chart; values labelled on the bars, as extracted (bar assignment not recoverable): 93, 90.4, 90.15, 86.17, 83.6, 82.63]
Figure 4.3: Precision comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al
Recall
[Bar chart; values labelled on the bars, as extracted (bar assignment not recoverable): 94, 88, 81.6, 81, 59.5, 56.5]
Figure 4.3: Recall comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al
F-MEASURE
[Bar chart; values labelled on the bars, as extracted (bar assignment not recoverable): 93.49, 83.96, 83.5, 83.23, 69.53, 71.68]
Figure 4.4: F-measure comparison of chi square and information gain applied on our proposed methodology with G. Tripathi et al
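
For reference, the four measures reported above can be computed from the counts of a confusion matrix as sketched below. The counts in the example are made up, and the helper names are hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all reviews that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Share of reviews predicted positive that really are positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Share of truly positive reviews that were found."""
    return tp / (tp + fn)

def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Made-up confusion matrix: 80 true positives, 80 true negatives,
# 20 false positives and 20 false negatives.
p = precision(tp=80, fp=20)
r = recall(tp=80, fn=20)
print(accuracy(80, 80, 20, 20), p, r, f_measure(p, r))

# Assumption: the highest precision (93) and recall (94) labels in the
# charts belong to the same configuration.
print(f_measure(93, 94))
```

Under that assumption the harmonic mean of 93 and 94 is about 93.50, which is in line with the highest F-measure label of 93.49 in the chart.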
List of Publications
Pant, S., & Jain, K. (2017). Sentiment Analysis using Feature Selection and Classification Algorithms: A Survey. IJIERT, 4(3), 109-113.

Pant, S., & Jain, K. (2017). Sentiment Analysis using Feature Selection and Classification Algorithms. IJIERT, 4(5), 5-11.
Scope of Future Enhancement
We would like to extend this technique to other domains of opinion mining such as newspaper articles, product reviews and political discussion forums. We would like to apply in-depth concepts of NLP for improved prediction of the polarity of the document.
We are planning to build an automatic sentiment classifier for more than one language, starting with Hindi. As multilingual messages are nowadays posted on social websites, we would then be able to predict the sentiment for any language.
It is worth extending the research using hybrid techniques for sentiment analysis.
References
1. Turney, P. D. 2002, July. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417-424.
2. Turney, P. D., & Littman, M. L. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21: 315-346.
3. Pang, B., Lee, L., & Vaithyanathan, S. 2002, July. Thumbs up?: Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, pp. 79-86.
4. Pang, B., & Lee, L. 2004, July. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, pp. 271-278.
5. Boiy, E., & Moens, M. F. 2009. A machine learning approach to sentiment analysis in multilingual Web texts. Information Retrieval, 12: 526-558.
6. Tripathi, G., & Naganna, S. 2015. Feature selection and classification approach for sentiment analysis. Machine Learning and Applications: An International Journal, 2: 1-16.
Changes
1, 4, 5, 8, 9, 13
Thank You
