FTC 2016 - Future Technologies Conference 2016
6-7 December 2016 | San Francisco, United States
Fine-Grained Sentiment Analysis of Social Media with Emotion Sensing
Zhaoxia WANG, Chee Seng CHONG, Landy LAN, Yinping YANG, Seng Beng HO and Joo Chuan TONG
Social and Cognitive Computing Department
Institute of High Performance Computing (IHPC)
Agency for Science, Technology and Research (A*STAR)
Singapore
wangz@ihpc.a-star.edu.sg
Abstract—Social media is arguably the richest source of human-generated text input. Opinions, feedback and critiques provided by internet users reflect attitudes and sentiments towards certain topics, products, or services. The sheer volume of such information makes it effectively impossible for any group of persons to read through. Thus, social media sentiment analysis has become an important area of work for making sense of social media talk. However, most existing sentiment analysis techniques focus only on the aggregate level, classifying sentiments broadly into positive, neutral or negative, and lack the capability to perform fine-grained sentiment analysis. This paper describes a social media analytics engine that employs a social adaptive fuzzy similarity-based classification method to automatically classify text messages into sentiment categories (positive, negative, neutral and mixed), with the ability to identify their prevailing emotion categories (e.g., satisfaction, happiness, excitement, anger, sadness, and anxiety). It is also embedded within an end-to-end social media analysis system that has the capabilities to collect, filter, classify, and analyze social media text data and display a descriptive and predictive analytics dashboard for a given concept. The proposed method has been developed and is ready to be licensed to users.

Keywords—sentiment classification; sentiment analysis; opinion mining; social media; social adaptive fuzzy similarity; emotion

The work is supported by A*STAR Joint Council Office Development Programme “Social Technologies+ Programme”.
978-1-5090-4171-8/16/$31.00 ©2016 IEEE

I. INTRODUCTION

Social media, such as Twitter, Facebook and Chinese Weibo, is overwhelmingly the go-to platform for internet users to share their comments on and experiences with certain products, services or policies. It is a gold mine for those who appreciate the value of understanding public sentiment.

There are various compelling use cases of social media sentiment analysis: consumers referring to online reviews to help them make better purchase decisions; businesses eager to understand market preferences in order to improve their offerings; politicians aspiring to gauge public response to their policies or speeches. Not surprisingly, one of the hottest areas of research in social analytics is sentiment analysis.

Sentiment analysis aims to understand the sentiment polarity of data [1]. Many social media analysis tools are now available to perform such analysis, such as Stanford NLP's natural language processing tool [2], Facebook Insights on Facebook and TweetStats on Twitter. However, these existing analysis technologies focus on finding the aggregate-level sentiment, such that the sentiment polarity is typically one of two categories (“positive” and “negative”) or three categories (with the addition of “neutral”) [2]. If finer-grained sentiment analysis can be achieved, it will yield more specific and more actionable results with detailed negative emotion subcategories such as anger, sadness, and anxiety or positive emotion subcategories such as happiness and excitement [3].

In this paper, we describe a new method for fine-grained classification of social media sentiment. The actual sentiments as well as detailed emotions were identified in accordance with industry needs. The basis for this method is a series of patents filed in [4] [5] and [6].

The rest of this paper is organized as follows. Section II discusses the existing sensing technologies. Section III presents the proposed methodology of fine-grained sentiment analysis. Section IV examines the performance of the proposed method using real-world social media data. Lastly, in Section V, we conclude this study.

II. EXISTING TECHNOLOGIES

Sentiment analysis methods can be broadly categorized into two types: learning-based and lexicon-based [7] [8]. Learning-based methods use known properties derived from labelled training data to make predictions about unlabelled new data. For text data, they learn the relationship between the features of a text segment and its sentiment label. Some examples of learning-based methods are the Naïve Bayes (NB) classifier [9] [10], Maximum Entropy (MaxEnt) classifier [11], support vector machine (SVM) [12] [13] and Extreme Learning Machine (ELM) [14] [15].

To be effective, models using such learning-based methods typically require a sufficiently large labelled training dataset [15] [16] to achieve an acceptable classification accuracy [16] [17]. However, in most social media contexts, it is difficult to determine what size of labelled dataset qualifies as being sufficient because the diversity of the social discussion is not known a priori [3] [12]. In addition, the labelling task would be costly or even prohibitive [3] [7] [12], not to mention wasteful because the training results could not be readily applied to other datasets.

On the other hand, lexicon-based methods typically search a text for sentiment or emotion indicators specified in the existing lexicons used [7] [18] [19] [20]. The effects of the
indicators are then aggregated in order to derive the dominant polarity of the text. Compared to learning-based methods, lexicon-based methods are easier to apply across different datasets, and costly labelling tasks are not required (no training needed).

However, there are shortcomings in the current lexicon-based methods. It is hard to create a single lexicon-based dictionary that can be used and tested across different applications. Hence, the existing methods use cleaned samples that are manually created. However, such data is different from real-world social media data, and only real-world social media data can produce true insight for organizations. The other shortcoming, as mentioned earlier in Section I, is the lack of fine-grained sensing capability [3] that provides detailed emotion identification. These shortcomings of the lexicon-based methods are also the limitations of the current learning-based methods.

Research on emotion has a long history, and emotion research activities have increased significantly over the past two decades. One of the earlier efforts in emotion research was that of Shaver et al. [21], who grouped emotions into prototypes on the assumption that different parts of emotion knowledge tend to make up an organized whole [21]. In their experiment, they first selected a group of words and had them rated based on whether each word was an emotion. Using the typical prototyping approach, they managed to develop an abstract-to-concrete emotion hierarchy.

Psychologists Ortony and Turner argued against the view that basic emotions are psychologically primitive [22]. They proposed that all emotions are discrete and independent and are related to each other through a hierarchical structure.

Ekman’s emotion model is based on the argument that emotions have distinctive facial expressions [23]. In this model, the emotions are treated as discrete, measurable, and physiologically distinct. Each of the emotions is a family of related states, and this is consistent with Shaver’s model [21].

Plutchik enhanced Ekman's biologically driven perspective and developed the “wheel of emotions” [24]. He constructed a wheel-like diagram of emotions to visualize the basic emotions and grouped the primary emotions into positive versus negative pairs, e.g., joy versus sadness; anger versus fear; trust versus disgust; and surprise versus anticipation [24] [25].

On the other hand, Neviarouskaya et al. also took the typical lexicon approach, which leveraged and enhanced the above emotion models [26]. They had each emotion word annotated by expert annotators and compiled the words into an emotion dictionary [26].

The above efforts have contributed greatly to emotion research and identification; however, few research efforts have made use of them to integrate emotion analysis into sentiment analysis and enhance the capability of the sensing technologies.

In this paper, we leverage the above emotion research to develop fine-grained sentiment analysis technologies and implement a fine-grained emotion sensing method to address the limitations of the existing technologies. In addition, we applied the method to a real-world case which provides an answer to the key question of whether public sentiments are useful in support of effective social sensing and policy management. The resulting insights enable decision-makers to formulate strategies and fine-tune the quality of their products, services, and policies.

III. PROPOSED TECHNIQUE

Detecting the attitude or emotions of a user with respect to certain topics or certain domains is the aim of sentiment analysis [2] [4] [5] [6]. Various techniques are leveraged for the design of the proposed method [3]. These techniques include: the linguistic inquiry and word count (LIWC) method [27], the affective norms for English words (ANEW) approach for assigning normative emotional ratings to text [28], fuzzy logic [29] and emotion theories [21] [22] [23] [24].

The proposed method pays special design attention to the challenges of real-world datasets. It uses an innovative social adaptive fuzzy rule inference technique with linguistic processors designed to minimize semantic ambiguity. This is combined with multi-source lexicon integration and development to derive dominant valence (positive, negative, neutral, mixed) as well as prominent emotions (e.g., anger, sadness, anxiety, satisfaction, happiness, excitement).

A. Design Features and Components of the Proposed Method

The backbone of the proposed method is a social adaptive fuzzy inference algorithm that mimics human interpretations of the expression of attitudes and emotions in online social network contexts. There is also a built-in advanced linguistic processing unit that contains the following sub-modules: sentence decomposers, negation handlers, amplifier and diminisher handlers, etc. [4] [5] [6]. In addition, the proposed method is empowered by built-in linguistic lexicons from a variety of sources, including a dictionary of emotion words and phrases from Standard English, Internet/social media slang and local languages. It also includes emoticons. With more linguistics-enhanced fuzzy similarity rules to handle sentiment classification and without relying on any training data, it is thus able to achieve the same level of measurement accuracy with less human input than simple lexicon-based and learning-based methods.

The domain knowledge was obtained by using the domain lexicon knowledge extraction algorithm [30] to form domain lexicon dictionaries. In addition, to enhance domain adaptability, an expert user can further configure the domain knowledge through the specification of a seed lexicon. For example, the expert user can add to the lexicon the phrase “salary lower” (in the company review domain), and remove from the lexicon the word “smart” (as in “smart watch” in the smart phone domain). This can achieve a higher measurement accuracy than simple lexicon-based and learning-based methods.

B. Social Media Analysis System

To make the proposed method useful for real-world datasets, we implement it within an end-to-end social media analysis system. The system consists of six modules: social data collectors, noise filters, a sentiment & emotion
analysis engine module, a predictive analyzer, a results viewer and a database. Fig. 1 shows the system’s architecture.

Fig. 1. Social Media Analysis System [4] [5]

The “Data Collector” crawls raw data from various Internet sites, including forums, Twitter, and other blogs. The module is a collection of programs that, depending on whether a data source provides a programmatic interface for reading data (such as Twitter’s keyword-based REST API and its Streaming API, which reads data continuously), collect data and pass them to the “Noise Filter” module before processing.

The “Noise Filter/Smart Filter” removes noisy “meaningless data”, such as advertisements, useless content which does not include any comment information, and other content-specific noise. Raw data are pre-processed by the “Noise Filter” to determine whether they are relevant or irrelevant. The relevant data are passed to an optional sub-module, the “User-defined filter”, that allows the user to define rules to further trim out some data. This filtering ensures that data passed to the “sentiment analysis engine” module is relevant to the intended concept for further analysis.

The “Predictive Analyzer” performs predictive analysis of important outcomes such as sales volumes and reputation crises so that it can be used for the important business activities of forecasting, monitoring and action strategizing. It includes two key components: 1) the predictor/feature set and 2) the predictive algorithm pool. The output of sentiment and emotion analysis (e.g., positive, negative, neutral and mixed sentiments, and anger, sadness and anxiety emotions) serves as a new predictor/feature on top of existing predictors/features.

Consumer preference analysis, anomaly identification and time-series analysis for sales forecasting will be realized by leveraging the output of the sentiment and emotion analysis engine combined with the other results obtained through the predictive algorithm pool.

IV. A REAL-WORLD CASE STUDY THROUGH THE SOCIAL MEDIA ANALYSIS SYSTEM

While understanding the valence of sentiments helps to assess overall public reactions, the understanding of emotion further improves the assessment of the situation, particularly negative emotions requiring attention from decision-makers and crisis managers.

As shown in Fig. 2, the final outcome for any text will be the sentiment categories and fine-grained emotions [2] [23] [24] [25]. Fig. 2 (a) shows the sentiments and Fig. 2 (b) shows the fine-grained emotions the system outputs.

Fig. 2. Positive or negative sentiment can be further broken down into fine-grained emotions. (a) Sentiments; (b) Breakdown of negative sentiment into fine-grained emotions

For real-time testing of the proposed method, the interface of real-time data analysis is illustrated in Fig. 3. Tweets are used as a test case to illustrate real-time data collection, analysis and visualization. The data containing geographic information is displayed in the form of a map.

Fig. 3. Part of the interface of the social media analytics system

V. CONCLUSION

This research describes a social media analytics method that is able to perform fine-grained sentiment and emotion analysis. This research offers new ideas for designing a robust method that leverages adaptive learning capabilities, fuzzy logic, and social science concepts in handling fine-grained sensing classification (sentiments as well as emotions) in textual datasets. There are ample opportunities to apply the proposed method to other sectors such as the healthcare, corporate, leisure, public and private sectors to help them to understand their customers better, identify the relevant risks, and improve their products and services.
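As an illustration of the ideas in Section III-A (lexicon lookup, negation handling, amplifier and diminisher handling, and seed-lexicon configuration), a minimal Python sketch is given here. It is not the social adaptive fuzzy inference algorithm of [4] [5] [6]; it is a plain rule-based stand-in. The toy lexicon entries, the modifier weights, the two-token look-back window and the function names `configure` and `classify` are all assumptions made for illustration only.

```python
import re

# Toy lexicon (hypothetical entries): word or phrase -> (polarity, emotion).
LEXICON = {
    "happy": (1.0, "happiness"),
    "great": (1.0, "satisfaction"),
    "smart": (1.0, "satisfaction"),
    "angry": (-1.0, "anger"),
    "sad": (-1.0, "sadness"),
    "worried": (-1.0, "anxiety"),
}
NEGATIONS = {"not", "never", "no"}
AMPLIFIERS = {"very": 1.5, "extremely": 2.0}      # assumed weights
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.7}  # assumed weights


def configure(lexicon, add=None, remove=()):
    """Return a domain-specific copy of the lexicon (seed-lexicon style edits)."""
    domain = dict(lexicon)
    domain.update(add or {})
    for term in remove:
        domain.pop(term, None)
    return domain


def classify(text, lexicon):
    """Return (valence, dominant_emotion) for a short message."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = neg = 0.0
    emotions = {}
    i = 0
    while i < len(tokens):
        # Prefer two-word phrases such as "salary lower" over single words.
        phrase = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and phrase in lexicon:
            term, width = phrase, 2
        elif tokens[i] in lexicon:
            term, width = tokens[i], 1
        else:
            i += 1
            continue
        score, emotion = lexicon[term]
        # Look back up to two tokens for negations and intensity modifiers.
        for prev in tokens[max(0, i - 2):i]:
            if prev in NEGATIONS:
                score = -score
                emotion = None  # a negated emotion word is not counted as that emotion
            score *= AMPLIFIERS.get(prev, 1.0) * DIMINISHERS.get(prev, 1.0)
        if score > 0:
            pos += score
        else:
            neg -= score
        if emotion:
            emotions[emotion] = emotions.get(emotion, 0.0) + abs(score)
        i += width
    if pos and neg:
        valence = "mixed"
    elif pos:
        valence = "positive"
    elif neg:
        valence = "negative"
    else:
        valence = "neutral"
    dominant = max(emotions, key=emotions.get) if emotions else None
    return valence, dominant
```

In this sketch, `configure(LEXICON, add={"salary lower": (-1.0, "sadness")})` mimics adding a domain phrase for company reviews, and `configure(LEXICON, remove=["smart"])` mimics dropping “smart” for the smart phone domain; the emotion assigned to “salary lower” is likewise an assumption.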
ACKNOWLEDGMENT

The “SentiMo-Advanced Social Media Analytics” team provided great support for the discussion on issues related to emotion and the algorithm development and implementation. The authors would also like to thank Dr. Kenneth Kwok, Dr. Paul Yang and their team for the discussion on issues related to emotion and knowledge building.

The proposed method and system have been developed and the API version is ready to be licensed: https://www.etpl.sg/innovation-offerings/technologies-for-license/tech-offers/2087.

REFERENCES

[1] A. Trilla and F. Alías, “Sentence-based sentiment analysis for expressive text-to-speech,” Audio, Speech, Lang. Process. IEEE Trans., vol. 21, no. 2, pp. 223–233, 2013.
[2] R. Socher, A. Perelygin, and J. Wu, “Recursive deep models for semantic compositionality over a sentiment treebank,” Proc. Conf. Empir. methods Nat. Lang. Process., pp. 1631–1642, 2013.
[3] Z. Wang, J. C. Tong, and D. Chan, “Issues of social data analytics with a new method for sentiment analysis of social media data,” in 2014 IEEE 6th International Conference on Cloud Computing Technology and Science, 2014, pp. 899–904.
[4] Z. Wang, R. S. M. Goh, and Y. Yang, “A method and system for sentiment classification and emotion classification,” Patent Cooperation Treaty (PCT) Application, PCT/SG2015/050469, 2014.
[5] Z. Wang, R. S. M. Goh, and Y. Yang, “SentiMo-A Method and system for fine-grained classification of social media sentiment and emotion patterns,” Singapore Patent Application No. 10201407766R, 2014.
[6] Z. Wang and J. C. Tong, “ChiEFS-A method and system for Chinese hybrid multilingual emotion fine-grained sensing of text data,” Singapore Patent Application No. 10201601413Q, 2015.
[7] P. Gonçalves and M. Araújo, “Comparing and combining sentiment analysis methods,” Proc. first ACM Conf. Online Soc. networks. ACM., pp. 27–38, 2013.
[8] B. Yuan, Y. Liu, and H. Li, “Sentiment classification in Chinese microblogs: Lexicon-based and learning-based approaches,” Int. Proc. Econ. Dev. Res., vol. 68, pp. 1–6, 2013.
[9] J. Ortigosa-Hernández, J. D. Rodríguez, L. Alzate, M. Lucania, I. Inza, and J. A. Lozano, “Approaching sentiment analysis by using semi-supervised learning of multi-dimensional classifiers,” Neurocomputing, vol. 92, pp. 98–115, Sep. 2012.
[10] X. Glorot, A. Bordes, and Y. Bengio, “Domain adaptation for large-scale sentiment classification: A deep learning approach,” Proc. 28th Int. Conf. Mach. Learn., pp. 513–520, 2011.
[11] H. Ji, H. Deng, and J. Han, “Uncertainty reduction for knowledge discovery and information extraction on the World Wide Web,” Proc. IEEE, vol. 100, no. 9, pp. 2658–2674, Sep. 2012.
[12] B. Gokaraju, S. S. Durbha, R. L. King, S. Member, and N. H. Younan, “A machine learning based spatio-temporal data mining approach for detection of harmful algal blooms in the Gulf of Mexico,” IEEE J. Sel. Top. Appl. earth Obs. Remote Sens., vol. 4, no. 3, pp. 710–720, 2011.
[13] T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis,” Assoc. Comput. Linguist., vol. 35, no. 3, 2009.
[14] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: Theory and applications,” Neurocomputing, vol. 70, no. 1–3, pp. 489–501, Dec. 2006.
[15] Z. Wang and Y. Parth, “Extreme Learning Machine for Multi-class Sentiment Classification of Tweets,” Proc. ELM-2015, Springer Int. Publ. 2016, vol. 1, pp. 1–11, 2016.
[16] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques,” Proc. ACL-02 Conf. Empir. methods Nat. Lang. Process. Assoc. Comput. Linguist., vol. 10, pp. 79–86, 2002.
[17] E. Haddi, X. Liu, and Y. Shi, “The role of text pre-processing in sentiment analysis,” Procedia Comput. Sci., vol. 17, pp. 26–32, Jan. 2013.
[18] R. Feldman, “Techniques and applications for sentiment analysis,” Commun. ACM, vol. 56, no. 4, p. 82, Apr. 2013.
[19] I. Maks and P. Vossen, “A lexicon model for deep sentiment analysis and opinion mining applications,” Decis. Support Syst., vol. 53, no. 4, pp. 680–688, Nov. 2012.
[20] Y. Rao, J. Lei, L. Wenyin, Q. Li, and M. Chen, “Building emotional dictionary for sentiment analysis of online news,” World Wide Web, vol. 17, pp. 723–742, Jun. 2014.
[21] P. Shaver, J. Schwartz, D. Kirson, and C. O’Connor, “Emotion knowledge: further exploration of a prototype approach,” J. Pers. Soc. Psychol., vol. 52, no. 6, pp. 1061–1086, 1987.
[22] A. Ortony and T. J. Turner, “What’s Basic About Basic Emotions?,” Psychol. Rev., vol. 97, no. 3, pp. 315–331, 1990.
[23] P. Ekman, “An argument for basic emotions,” Cognition & Emotion, vol. 6, pp. 169–200, 1992.
[24] D. Chafale and A. Pimpalkar, “Review on Developing Corpora for Sentiment Analysis Using Plutchik’s Wheel of Emotions with Fuzzy Logic,” Int. J. Comput. Sci. Eng., vol. 2, no. 10, 2014.
[25] R. Plutchik, “The Nature of Emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice,” Am. Sci., vol. 89, no. 4, pp. 344–350, 2001.
[26] A. Neviarouskaya, H. Prendinger, and M. Ishizuka, “Textual affect sensing for sociable and expressive online communication,” Affect. Comput. Intell. Interact., pp. 218–229, 2007.
[27] Y. R. Tausczik and J. W. Pennebaker, “The psychological meaning of words: LIWC and computerized text analysis methods,” J. Lang. Soc. Psychol., vol. 29, no. 1, pp. 24–54, Dec. 2010.
[28] A. P. Soares, M. Comesaña, A. P. Pinheiro, A. Simões, and C. S. Frade, “The adaptation of the Affective Norms for English Words (ANEW) for European Portuguese,” Behav. Res. Methods, vol. 44, pp. 256–269, 2012.
[29] J. M. Mendel and D. Wu, “Challenges for perceptual computer applications and how they were overcome,” IEEE Comput. Intell. Mag., vol. 7, no. 3, pp. 36–47, 2012.
[30] Z. Wang, J. C. Tong, P. Ruan, and F. Li, “Lexicon knowledge extraction with sentiment polarity computation,” IEEE Int. Conf. Data Min. Ser. (ICDM), SENTIRE, Accept., 2016.