CNN based Capsule Model for Sentiment Analysis
M.E. Dissertation Presentation
Dated: 5 July 2018
Presented By:
Manish Bisht
ME-CSE
SID: 16205012
Supervisor(s):
Dr. Poonam Saini, Assistant Professor
Prof. Anoop Dobhal, Assistant Professor
Department of Computer Science and Engineering
Punjab Engineering College (Deemed to be University), Chandigarh
Outline
⊡ Introduction
⊡ Literature Review
□ Literature survey
□ Research gaps
⊡ Methodology and Problem statement
□ Motivation
□ Problem statement
□ Objectives
□ Methodology
⊡ Implementation
⊡ Simulation
□ Simulation details
□ Simulation Results
⊡ Conclusion and future scope
⊡ References
1. Introduction
⊡ Sentiment analysis is the process of computationally identifying and
categorizing opinions expressed in text, in order to determine whether the
attitude towards a particular topic, product, etc. is positive or negative.
⊡ Example sentences: "The movie was fabulous!", "The movie stars Mr. X",
"The movie was horrible!"
Types of Sentiment Analysis (SA)
⊡ Document level
□ The main task is to define the opinion of the whole document.
□ The opinion should be expressed about one topic.
⊡ Sentence level
□ A sentence is considered as a short document, which can be subjective or
objective.
□ A subjective (opinionated) sentence expresses sentiment.
⊡ Aspect level
□ Allows opinions towards aspects of entities to be extracted.
Approaches of SA
Lexicon based approach
⊡ Dictionary-based approach
□ Uses a lexicon that consists of terms, each with its respective sentiment
score.
⊡ Corpus-based approach
□ Identification of opinion words and their polarities in the domain corpus
using a given set of opinion words.
□ Building a new lexicon within the particular domain from another lexicon
using a domain corpus.
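The dictionary-based approach can be illustrated with a minimal sketch. The lexicon entries, their scores, and the function name `lexicon_sentiment` are hypothetical and chosen only for illustration; a real system would load an established sentiment lexicon instead.

```python
import re

# Hypothetical toy lexicon: term -> sentiment score (assumed values, not from the slides).
LEXICON = {"fabulous": 1.0, "good": 0.5, "horrible": -1.0, "bad": -0.5}

def lexicon_sentiment(text, lexicon=LEXICON):
    """Sum the lexicon scores of the terms in `text` and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

for sentence in ["The movie was fabulous!", "The movie stars Mr. X", "The movie was horrible!"]:
    print(sentence, "->", lexicon_sentiment(sentence))
```

Terms that are missing from the lexicon contribute nothing to the score, which is the main limitation of a purely dictionary-based method.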
Machine Learning approach
⊡ Unsupervised machine learning methods
□ Use unlabeled datasets in order to discover the structure and find
similar patterns in the input data.
⊡ Supervised machine learning methods
□ Assume the presence of labeled training data that are used for the
learning process.
□ Estimate the output from the input dataset.
2. Literature Review
Literature Survey
S.No. | Title | Source | Observations
1 | Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013) | arXiv preprint arXiv:1301.3781 | Improved vector representation of words; captures relations between words
2 | Glove: Global vectors for word representation (Pennington et al., 2014) | Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Qatar | Used efficient statistical methods to improve word representation; faster than word2vec
3 | Twitter sentiment classification using distant supervision (Go et al., 2009) | CS224N Project Report, Stanford | Emoticons are used to train the classifier; novel preprocessing steps were introduced
4 | Convolutional neural networks for sentence classification (Kim, 2014) | arXiv preprint arXiv:1408.5882 | Shows state-of-the-art performance on 4 NLP tasks using a deep neural network; initialization of the CNN with good word representations improves accuracy
5 | Twitter sentiment analysis with deep convolutional neural networks (Severyn et al., 2015) | Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Brazil | Model initialized with word2vec and then tuned with distant supervision; produces state-of-the-art results
6 | SemEval-2016 task 4: Sentiment analysis in Twitter (Nakov et al., 2016) | Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), USA | An ensemble of two CNNs is used to improve accuracy; predictions are combined using a random forest classifier
7 | SemEval-2017 task 4: Sentiment analysis in Twitter (Rosenthal et al., 2017) | Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Canada | An ensemble of several CNNs and LSTMs is used to improve accuracy
8 | Dynamic routing between capsules (Sabour et al., 2017) | Advances in Neural Information Processing Systems, USA | A dynamic routing algorithm is introduced to pass information from one level to the next
9 | Sentiment analysis by capsules (Wang et al., 2018) | Proceedings of the 2018 World Wide Web Conference on World Wide Web, USA | A hybrid approach of RNN and capsule network is proposed; achieves state-of-the-art performance on a movie review dataset
Capsule Network
Figure: Architecture of Capsule Network (Sabour et al., 2017)
Research Gaps
The pooling layer in a CNN keeps only the most dominant feature. For
example, a standard CNN considers “you are amazing” and “are you
amazing” as sentences with the same sentiment.
Existing research is limited to textual data.
A real-time opinion miner can be built.
Research Gap explained
Figure: Convolution + activation followed by pooling, applied to the word
sequences "You are Amazing" and "Are you Amazing".
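The loss of word order under pooling can be reproduced with a small sketch. The word vectors and filter weights below are hypothetical, and the filters are assumed to have width 1 (one word at a time) to keep the example short; wider filters retain some local order, but max pooling still discards where in the sentence a feature occurred.

```python
import numpy as np

# Hypothetical 2-dimensional word vectors, chosen only for illustration.
EMB = {
    "you": np.array([1.0, 0.0]),
    "are": np.array([0.0, 0.5]),
    "amazing": np.array([1.0, 1.0]),
}

def conv_relu_maxpool(words, filters):
    """Width-1 convolution + ReLU over each word, then max-over-time pooling."""
    x = np.stack([EMB[w.lower()] for w in words])   # (num_words, emb_dim)
    feature_maps = np.maximum(x @ filters.T, 0.0)   # (num_words, num_filters)
    return feature_maps.max(axis=0)                 # (num_filters,)

filters = np.array([[0.5, 0.5], [1.0, -1.0], [0.0, 1.0]])  # assumed filter weights
a = conv_relu_maxpool(["You", "are", "Amazing"], filters)
b = conv_relu_maxpool(["Are", "you", "Amazing"], filters)
print(a, b, np.array_equal(a, b))
```

Both word orders yield the same pooled vector, so any classifier placed on top of the pooling layer cannot tell the statement from the question.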
3. Methodology and Problem Statement
Motivation
⊡ Knowing sentiment is a very natural ability of a human being.
Can a machine be trained to do it?
⊡ SA aims at extracting sentiment-related knowledge, especially from the
huge amount of information on the internet.
⊡ It can generally be used to understand the opinions expressed in a set of
documents.
Problem Statement
To develop a hybrid text classification model using a CNN
and a capsule network that can improve the accuracy of the
sentiment classification task.
Objectives
⊡ To study feature selection methods for text classification and to
investigate machine learning algorithms that can be applied to the
classification problem.
⊡ To analyze the accuracy of existing algorithms with respect to
different datasets and to analyze their computational time and
resource requirements.
⊡ To evaluate the results of the applied techniques and propose an
improved model based on the results obtained.
Methodology
4. Implementation
Preprocessing
⊡ Remove URLs, special characters
⊡ Stemming and lemmatization
⊡ Tokenization
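The preprocessing steps listed above can be sketched with the standard library alone. The regular expressions, the function name `preprocess`, and the optional `stemmer` argument are assumptions made for this example; the slides do not give the exact rules. A stemming function such as NLTK's `PorterStemmer().stem` could be passed as `stemmer`.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # assumed URL pattern
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # everything except letters and whitespace

def preprocess(text, stemmer=None):
    """Lowercase, strip URLs and special characters, tokenize, optionally stem."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    tokens = text.split()                        # whitespace tokenization
    if stemmer is not None:
        tokens = [stemmer(tok) for tok in tokens]
    return tokens

print(preprocess("Loved it!!! See https://example.com #mustwatch"))
```

URLs are removed before the special characters so that fragments of a link do not survive as spurious tokens.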
Word Embeddings Model
⊡ Word embedding vectors initialized with GloVe are used.
⊡ The word embedding vectors are pre-trained on an unlabeled corpus of
840 billion tokens and have 300 dimensions.
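A minimal sketch of how pre-trained vectors in the GloVe text format (one token followed by its values per line) might be turned into an embedding matrix. The helper names `load_glove` and `build_embedding_matrix`, the reserved padding row, and the 3-dimensional toy data are assumptions for illustration; the real vectors have 300 dimensions.

```python
import io
import numpy as np

def load_glove(lines, dim):
    """Parse GloVe text format: one token followed by `dim` floats per line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def build_embedding_matrix(vocab, vectors, dim):
    """Row 0 is reserved for padding; words missing from GloVe stay at zero."""
    matrix = np.zeros((len(vocab) + 1, dim), dtype=np.float32)
    for idx, word in enumerate(vocab, start=1):
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix

# Toy stand-in for the real file, with dim=3 instead of 300.
toy_file = io.StringIO("movie 0.1 0.2 0.3\nfabulous 0.4 0.5 0.6\n")
vectors = load_glove(toy_file, dim=3)
matrix = build_embedding_matrix(["movie", "horrible", "fabulous"], vectors, dim=3)
print(matrix.shape)
```

The resulting matrix can be used as the initial weights of an embedding layer, with out-of-vocabulary words left at zero or initialized randomly.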
Proposed System
Model
Sentiment Capsule
Figure: Architecture of CNN based Capsule Model
Training Operation
Dynamic Routing by Agreement
Algorithm: The following steps are performed and repeated for r iterations.
1. Softmax is calculated over all the routing weights corresponding to a primary capsule i:
$c_i = \mathrm{softmax}(b_i)$
2. A weighted sum of the prediction vectors is calculated for each sentiment capsule j:
$s_j = \sum_i c_{ij} \, \hat{u}_{j|i}$
3. The weighted sum is squashed to get the output of the sentiment capsule:
$v_j = \mathrm{squash}(s_j)$
4. The routing weight is updated:
$b_{ij} = b_{ij} + \hat{u}_{j|i} \cdot v_j$
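The four routing steps can be sketched in NumPy as follows. This is an illustrative sketch, not the implementation used in the dissertation: the squash function follows the definition in Sabour et al. (2017), the routing weights are assumed to start at zero, and the capsule counts and dimensions in the usage lines are arbitrary.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash(s, eps=1e-9):
    """Squash as in Sabour et al. (2017): keeps direction, maps length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, r=3):
    """u_hat[i, j] is the prediction vector of primary capsule i for sentiment capsule j."""
    num_primary, num_sentiment, _ = u_hat.shape
    b = np.zeros((num_primary, num_sentiment))       # routing weights, assumed zero at start
    for _ in range(r):
        c = softmax(b, axis=1)                       # step 1: c_i = softmax(b_i)
        s = np.einsum("ij,ijd->jd", c, u_hat)        # step 2: s_j = sum_i c_ij * u_hat_j|i
        v = squash(s)                                # step 3: v_j = squash(s_j)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)    # step 4: agreement update
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(6, 2, 4))   # 6 primary capsules, 2 sentiment capsules, 4 dims (arbitrary)
v, c = dynamic_routing(u_hat, r=3)
print(np.linalg.norm(v, axis=-1))    # capsule lengths, each below 1
```

The length of each output vector `v[j]` can be read as the probability that sentiment j is present, which is why the squash function bounds it below 1.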
5. Simulation Details
Dataset
⊡ The movie reviews dataset from www.rottentomatoes.com is used.
⊡ There are 5331 positive and 5331 negative reviews.
⊡ 10% of the data is used as the testing set.
Python (3.6)
Libraries | Used For
Keras | Inbuilt CNN and RNN; support for GPU; inbuilt activation functions
Tensorflow | Efficient matrix multiplication
Dill | Serializing objects
Gensim | Inbuilt word2vec model; transforming GloVe to word2vec
NLTK | Preprocessing of data: tokenization, removing stop words, removing URLs and special characters
Multiprocessing | Parallel computing; computing thousands of operations in parallel on GPU
Pandas | Importing and analyzing data
Numpy | Computing and manipulating the dataset
Jupyter Notebook
⊡ A client-server platform that allows Python code to be written and
executed through an internet browser.
⊡ It can be accessed remotely through the web.
⊡ Virtual machine (VM) instances are taken from Google Cloud and accessed
through an external IP.
Google Cloud
Resource Name | Resource Configuration
No. of CPU | 8
RAM per CPU | 3.75 GB
Total RAM used | 30 GB
No. of GPU | 1
Total GPU RAM | 12 GB
Secondary Memory used | 30 GB
6. Simulation Results
Confusion matrix
n = 1064 | Predicted Positive | Predicted Negative
Actual Positive | TP = 437 | FN = 95
Actual Negative | FP = 85 | TN = 447
Performance Parameters
Parameter | Value in percent
Accuracy | 83
Misclassification Rate | 16.9
True Positive Rate | 82
False Positive Rate | 16
Specificity | 84
Precision | 84
Prevalence | 50
F1 score | 83
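The performance parameters follow directly from the four confusion-matrix counts (TP = 437, FN = 95, FP = 85, TN = 447). The sketch below recomputes them using the standard definitions; the function name `confusion_metrics` and the dictionary keys are chosen for this example.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # same as the true positive rate
    return {
        "accuracy": (tp + tn) / n,
        "misclassification_rate": (fp + fn) / n,
        "true_positive_rate": recall,
        "false_positive_rate": fp / (fp + tn),
        "specificity": tn / (fp + tn),
        "precision": precision,
        "prevalence": (tp + fn) / n,
        "f1_score": 2 * precision * recall / (precision + recall),
    }

metrics = confusion_metrics(tp=437, fn=95, fp=85, tn=447)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}")
```

Recomputing the values this way is a quick check that the reported percentages are consistent with the confusion matrix.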
Accuracy
Model | Movie Review (MR) accuracy (%)
RAE | 77.7
RNTN | 75.9
LSTM | 77.4
Bi-LSTM | 79.3
LR-LSTM | 81.5
LR-Bi-LSTM | 82.1
Tree-LSTM | 80.7
CNN | 81.5
NCSL | 82.9
CNN-Capsule | 83.05
RNN-Capsule | 83.8
CNN vs RNN capsule
⊡ CNNs tend to be much faster (~5 times faster) than RNNs (Sharan Narang.
2018. DeepBench. [ONLINE] Available at:
https://github.com/baidu-research/DeepBench. [Accessed 5 June 2018]).
⊡ The CNN capsule is 8 times faster than the RNN capsule.
CNN layer:
Filter Size | No. of Filters | Padding (h, w) | Stride (h, w) | Application | Total Time (ms) | Fwd TeraFLOPS | Processor
R = 5, S = 20 | 32 | 0, 0 | 2, 2 | Sentiment analysis | 1.03 | 7.75 | Tesla V100 FP32
RNN layer:
Hidden Units | Batch Size | TimeSteps | Recurrent Type | Application | Total Time (ms) | Fwd TeraFLOPS | Processor
1760 | 16 | 50 | Vanilla | Sentiment analysis | 8.21 | 1.19 | Tesla V100
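The "8 times faster" figure is the ratio of the two total times reported above (1.03 ms for the convolutional configuration, 8.21 ms for the vanilla RNN configuration). The variable names below are chosen for this example.

```python
cnn_time_ms = 1.03   # total time of the convolution configuration
rnn_time_ms = 8.21   # total time of the vanilla RNN configuration

speedup = rnn_time_ms / cnn_time_ms
print(f"CNN configuration is about {speedup:.1f}x faster")
```

The ratio compares two specific benchmark configurations, so it is an indication of the speed gap rather than a general guarantee.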
Conclusion
⊡ The work shows that this simple capsule model achieves good sentiment
classification accuracy without any carefully designed instance
representations or linguistic knowledge.
⊡ The proposed CNN based capsule model achieves nearly 83%
accuracy.
⊡ The objective of the training is to maximize the length of the activation
vector of the capsule while minimizing its reconstruction loss.
Future Work
⊡ The training data must be large enough to train the system properly, so
that the system can be initialized to produce accurate results.
⊡ The preprocessing steps can be altered or enhanced to improve the
accuracy of the model.
⊡ The parameters of the model can be changed to improve accuracy.
References
⊡ Bautin, M., Vijayarenu, L. and Skiena, S., 2008, April. International Sentiment
Analysis for News and Blogs. In ICWSM.
⊡ Beineke, P., Hastie, T., Manning, C. and Vaithyanathan, S., 2004. Exploring
sentiment summarization. In Proceedings of the AAAI spring symposium on
exploring attitude and affect in text: theories and applications (Vol. 39, pp. 1-4).
⊡ Bojanowski, P., Grave, E., Joulin, A. and Mikolov, T., 2016. Enriching word vectors
with subword information. arXiv preprint arXiv:1607.04606.
⊡ Bollen, J., Mao, H. and Zeng, X., 2011. Twitter mood predicts the stock market.
Journal of computational science, 2(1), pp.1-8.
⊡ Cliche, M., 2017. BB_twtr at SemEval-2017 Task 4: Twitter Sentiment Analysis with
CNNs and LSTMs. arXiv preprint arXiv:1704.06125.
⊡ Crossley, S.A., Kyle, K. and McNamara, D.S., 2017. Sentiment Analysis and Social
Cognition Engine (SEANCE): An automatic tool for sentiment, social cognition, and
social-order analysis. Behavior research methods, 49(3), pp.803-821.
⊡ Deriu, J., Gonzenbach, M., Uzdilli, F., Lucchi, A., Luca, V.D. and Jaggi, M., 2016.
Swisscheese at semeval-2016 task 4: Sentiment classification using an ensemble of
convolutional neural networks with distant supervision. In Proceedings of the 10th
International Workshop on Semantic Evaluation (No. EPFL-CONF-229234, pp.
1124-1128).
⊡ Go, A., Bhayani, R. and Huang, L., 2009. Twitter sentiment classification using
distant supervision. CS224N Project Report, Stanford, 1(12).
⊡ Goodfellow, I., Bengio, Y., Courville, A. and Bengio, Y., 2016. Deep learning (Vol. 1).
Cambridge: MIT press.
⊡ Horrigan, J.A., 2008. Online shopping. Pew Internet & American Life Project Report,
36, pp.1-24.
⊡ Kalchbrenner, N., Grefenstette, E. and Blunsom, P., 2014. A convolutional neural
network for modelling sentences. arXiv preprint arXiv:1404.2188.
⊡ Khan, F.H., Bashir, S. and Qamar, U., 2014. TOM: Twitter opinion mining framework
using hybrid classification scheme. Decision Support Systems, 57, pp.245-257.
⊡ Kim, P., 2006. The forrester wave: Brand monitoring, Q3 2006. Forrester Wave
(white paper).
⊡ Kim, Y., 2014. Convolutional neural networks for sentence classification. arXiv
preprint arXiv:1408.5882.
⊡ Kingma, D.P. and Ba, J., 2014. Adam: A method for stochastic optimization. arXiv
preprint arXiv:1412.6980.
⊡ Lei, T., Barzilay, R. and Jaakkola, T., 2015. Molding cnns for text: non-linear, non-
consecutive convolutions. arXiv preprint arXiv:1508.04112.
⊡ Liu, B., 2012. Sentiment analysis and opinion mining (synthesis lectures on human
language technologies). Morgan & Claypool Publishers, 5(1), pp.1-67.
⊡ Mikolov, T., 2012. Statistical language models based on neural networks.
Presentation at Google, Mountain View, 2nd April.
⊡ Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013a. Efficient estimation of word
representations in vector space. arXiv preprint arXiv:1301.3781.
⊡ Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S. and Dean, J., 2013b. Distributed
representations of words and phrases and their compositionality. In Advances in
neural information processing systems (pp. 3111-3119).
⊡ O'Connor, B., Balasubramanyan, R., Routledge, B.R. and Smith, N.A., 2010. From
tweets to polls: Linking text sentiment to public opinion time series. Icwsm, 11(122-
129), pp.1-2.
⊡ Pak, A. and Paroubek, P., 2010, May. Twitter as a corpus for sentiment analysis
and opinion mining. In LREC (Vol. 10, No. 2010).
⊡ Pang, B. and Lee, L., 2005, June. Seeing stars: Exploiting class relationships for
sentiment categorization with respect to rating scales. In Proceedings of the 43rd
annual meeting on association for computational linguistics (pp. 115-124).
Association for Computational Linguistics.
⊡ Pennington, J., Socher, R. and Manning, C., 2014. Glove: Global vectors for word
representation. In Proceedings of the 2014 conference on empirical methods in
natural language processing (EMNLP) (pp. 1532-1543).
⊡ Sabour, S., Frosst, N. and Hinton, G.E., 2017. Dynamic routing between capsules.
In Advances in Neural Information Processing Systems (pp. 3859-3869).
⊡ Saif, H., He, Y., Fernandez, M. and Alani, H., 2016. Contextual semantics for
sentiment analysis of Twitter. Information Processing & Management, 52(1), pp.5-19.
⊡ Severyn, A. and Moschitti, A., 2015, August. Twitter sentiment analysis with deep
convolutional neural networks. In Proceedings of the 38th International ACM SIGIR
Conference on Research and Development in Information Retrieval (pp. 959-962).
ACM.
⊡ Tai, K.S., Socher, R. and Manning, C.D., 2015. Improved semantic representations
from tree-structured long short-term memory networks. arXiv preprint
arXiv:1503.00075.
⊡ Thakkar, H. and Patel, D., 2015. Approaches for sentiment analysis on twitter: A
state-of-art study. arXiv preprint arXiv:1512.01043.
⊡ Tumasjan, A., Sprenger, T.O., Sandner, P.G. and Welpe, I.M., 2010. Predicting
elections with twitter: What 140 characters reveal about political sentiment. Icwsm,
10(1), pp.178-185.
⊡ Wang, Y., Sun, A., Han, J., Liu, Y. and Zhu, X., 2018, April.
Sentiment analysis by capsules. In Proceedings of the 2018 World Wide Web
Conference on World Wide Web (pp. 1165-1174). International World Wide Web
Conferences Steering Committee.
⊡ Wu, D.D., Zheng, L. and Olson, D.L., 2014. A decision support approach for online
stock forum sentiment analysis. IEEE transactions on systems, man, and
cybernetics: systems, 44(8), pp.1077-1087.
⊡ Zabin, J. and Jefferies, A., 2008. Social media monitoring and analysis: Generating
consumer insights from online conversation. Aberdeen Group Benchmark Report,
37(9).
⊡ Zhu, X., Kiritchenko, S. and Mohammad, S., 2014. Nrc-canada-2014: Recent
improvements in the sentiment analysis of tweets. In Proceedings of the 8th
international workshop on semantic evaluation (SemEval 2014) (pp. 443-447).