Emotion Recognition with Machine Learning Using EEG Signals

Omid Bazgir*
Department of Electrical and Computer Engineering
Texas Tech University
Lubbock, Texas, USA
Email: omid.bazgir@ttu.edu

Zeynab Mohammadi
Department of Electrical and Computer Engineering
University of Tabriz
Tabriz, Iran

Seyed Amir Hassan Habibi
Department of Neurology of Rasool Akram Hospital
Iran University of Medical Sciences
Tehran, Iran
Abstract—In this research, an emotion recognition system is developed based on the valence/arousal model using electroencephalography (EEG) signals. EEG signals are decomposed into the gamma, beta, alpha and theta frequency bands using the discrete wavelet transform (DWT), and spectral features are extracted from each frequency band. Principal component analysis (PCA) is applied to the extracted features as a transform that preserves the original dimensionality, to make the features mutually uncorrelated. Support vector machine (SVM), K-nearest neighbor (KNN) and artificial neural network (ANN) classifiers are used to classify emotional states. The cross-validated SVM with a radial basis function (RBF) kernel, using features extracted from 10 EEG channels, performs with 91.3% accuracy for arousal and 91.1% accuracy for valence, both in the beta frequency band. Our approach shows better performance compared to existing algorithms applied to the "DEAP" dataset.

Keywords: Emotion, Machine Learning, Valence-arousal, EEG, DWT, PCA, SVM, KNN.

I. INTRODUCTION

Emotional states are associated with a wide variety of human feelings, thoughts and behaviors; hence, they affect our ability to act rationally in cases such as decision-making, perception and human intelligence. Therefore, studies on emotion recognition using emotional signals enhance brain-computer interface (BCI) systems as an effective subject for clinical applications and human social interactions [1]. Physiological signals are being used to investigate emotional states while considering natural aspects of emotions, to elucidate therapeutics for psychological disorders such as autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD) and anxiety disorder [2]. In recent years, developing emotion recognition systems based on EEG signals has become a popular research topic among cognitive scientists.

To design an emotion recognition system using EEG signals, effective feature extraction and optimal classification are the main challenges. EEG signals are non-linear, non-stationary, buried in various sources of noise and random in nature [3]. Thus, extracting meaningful features from EEG signals plays a crucial role in the effective design of an emotion recognition system. The extracted features quantify the EEG signals and are then used as attributes of the classifiers.

A variety of features have been extracted from the time domain [4], frequency domain [5], and joint time-frequency domain [6] of EEG signals in intelligent emotion recognition systems. The wavelet transform is able to decompose signals into specific frequency bands with minimum time-resolution loss; to this end, choosing an appropriate mother wavelet is crucial. Murugappan [7] considered four different mother wavelets, namely 'db4', 'db8', 'sym8' and 'coif5', to extract statistical features, including standard deviation, power and entropy, from EEG signals. The KNN classifier was employed to classify five categories of emotions (disgust, happy, surprise, fear, and neutral); the highest accuracy rate was about 82.87% on 62 channels and 78.57% on 24 channels. Nasehi and Pourghassem [8] applied Gabor functions and the wavelet transform to extract spectral, spatial and temporal features from four EEG channels. An artificial neural network (ANN) classifier was used for classification of six kinds of emotions (happiness, surprise, anger, fear, disgust and sadness), with 64.78% accuracy. Ishino and Hagiwara extracted the mean and variance of the power spectral density (PSD) and of the wavelet coefficients of EEG data; a neural network was then trained on the principal components of the features to classify four types of emotion (joy, sorrow, relaxation and anger), with a 67.7% classification rate. Mohammadi et al. [9] extracted spectral features, including the energy and entropy of wavelet coefficients, from 10 EEG channels; the maximum classification accuracy using KNN was 84% for arousal and 86% for valence. Jie et al. [10] employed the Kolmogorov-Smirnov (K-S) test to select an appropriate channel for extracting sample entropy as a feature and used it as the input of an SVM; the maximum accuracy of Jie's method is 80.43% for arousal and 71.16% for valence. Ali et al. [11] combined wavelet energy, wavelet entropy, modified energy and statistical features of EEG signals, using three classifiers, SVM, KNN and quadratic discriminant analysis (QDA), to classify emotional states; the overall classification accuracy of Ali's method was 83.8%.

Different theories and principles have been proposed by experts in psychology and cognitive sciences for defining and discriminating emotional states, such as whether they are cognitive or non-cognitive. In this ongoing debate, one claim states that instead of classifying emotions into boundless common, basic types of emotions
such as happiness, sadness, anger, fear, joy and so forth, emotion can be classified based on the valence-arousal model, which is rooted in brain connectivity [12, 13]. The valence-arousal model is a bi-dimensional model which includes four emotional states: high arousal high valence (HAHV), high arousal low valence (HALV), low arousal high valence (LAHV) and low arousal low valence (LALV) [1]. Therefore, each of the common emotional states can be modeled and interpreted based on the valence-arousal model (figure 1).

Figure 1. Interpretation of different emotions based on valence-arousal model [12].

In this paper, an emotion recognition system is developed based on the valence-arousal model. We applied the wavelet transform to EEG signals, then extracted energy and entropy from four decomposed frequency bands. PCA was used to make the attributes uncorrelated. SVM, KNN and ANN are utilized to classify emotional states in the arousal/valence dimensions.

II. METHODOLOGY

A. Data Acquisition
In this study, the DEAP database, a database for emotion analysis using physiological signals, labeled based on the valence-arousal-dominance emotion model, is used [14]. The DEAP dataset includes 32 participants. To stimulate the auditory and visual cortex, 1-min long music videos were played for each participant. 40 music videos were shown to each participant, and seven different modalities were recorded; EEG is used in this study, and more information is provided in [14]. The 40 video clips were pre-determined so that they would span a sufficiently wide range of the valence/arousal space. Each participant was asked to grade each music video from 1 to 9 for valence, arousal, dominance and liking. Hence, if the grade was greater than 4.5, the arousal/valence label is high; if the grade was less than 4.5, the arousal/valence label is low [14]. All the signals were recorded with a 512 Hz sampling frequency.

B. Channel selection
According to Coan et al. [15], positive and negative emotions are associated with the left and right frontal brain regions, respectively. They have shown that brain activity decreases more in the frontal region of the brain as compared to other regions. Therefore, the following channels were selected for investigation in this study: F3-F4, F7-F8, FC1-FC2, FC5-FC6, and FP1-FP2.

C. Preprocessing
To reduce electronic amplifier, power line and external interference noise, the average mean reference (AMR) method was utilized: for each selected channel, the mean is calculated and subtracted from every single sample of that channel. To reduce the effect of individual differences, all values were normalized to [0, 1].

D. Feature extraction
Due to the effective multi-resolution capability of the DWT in the analysis of non-stationary signals, we followed our previous work [9] for feature extraction, in which the DWT was applied to the windowed EEG signals of the selected channels. The EEG signals are windowed to increase the possibility of quick detection of the emotional state; thus, 4- and 2-second temporal windows with 50% overlap were chosen. The EEG signals are decomposed into 5 different bands, theta (4-8 Hz), alpha (8-16 Hz), beta (16-32 Hz), gamma (32-64 Hz) and noise (>64 Hz), via the db4 mother wavelet function. Afterwards, the entropy and energy were extracted from each window of every frequency band.

Entropy is a measure of the amount of information within the signal. The entropy of the signal over a temporal window within a specific frequency band is computed as:

    ENT_j = − ∑_{k=1}^{N} D_j(k)² log(D_j(k)²)    (1)

By summing the squares of the wavelet coefficients over a temporal window, the energy of each frequency band is computed:

    ENG_j = ∑_{k=1}^{N} D_j(k)²,  k = 1, 2, …, N    (2)

where j is the wavelet decomposition level (frequency band), and k indexes the N wavelet coefficients within the j-th frequency band.

E. Principal component analysis
PCA is an eigenvector-based statistical technique, employing the singular value decomposition, that transforms a set of correlated features into mutually uncorrelated features [16], the principal components (PCs). If we define the extracted training features as X, then the PCA coefficients Z can be written as:

    Z = Xφ    (3)

where φ is the underlying basis vector of the training set. The same basis vector is then applied to the extracted features of the test set, X̂, to obtain the principal components of the test set:

    Ẑ = X̂φ    (4)
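The feature-extraction and PCA steps of equations (1)-(4) can be sketched in code. This is a minimal illustration assuming the PyWavelets and scikit-learn libraries; the synthetic one-channel signal, the windowing helper, and the mapping of db4 detail levels to frequency bands at the paper's 512 Hz sampling rate are illustrative assumptions, not taken verbatim from the paper.

```python
import numpy as np
import pywt
from sklearn.decomposition import PCA

FS = 512  # sampling rate stated in the paper (Hz)

def band_features(window):
    """Energy (eq. 2) and entropy (eq. 1) of db4 detail coefficients
    for the gamma, beta, alpha and theta bands."""
    # 6-level db4 decomposition; at 512 Hz the detail levels cover roughly
    # D3: 32-64 Hz (gamma), D4: 16-32 (beta), D5: 8-16 (alpha), D6: 4-8 (theta)
    coeffs = pywt.wavedec(window, "db4", level=6)
    # coeffs = [A6, D6, D5, D4, D3, D2, D1]
    feats = []
    for d in (coeffs[4], coeffs[3], coeffs[2], coeffs[1]):  # gamma..theta
        p = d ** 2
        energy = p.sum()                          # eq. (2)
        entropy = -(p * np.log(p + 1e-12)).sum()  # eq. (1); small eps avoids log(0)
        feats.extend([energy, entropy])
    return feats

def sliding_windows(signal, length, overlap=0.5):
    """Fixed-length windows with the paper's 50% overlap."""
    step = int(length * (1 - overlap))
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]

# toy example: one channel, 60 s of synthetic "EEG", 4-s windows, 50% overlap
rng = np.random.default_rng(0)
channel = rng.standard_normal(60 * FS)
X = np.array([band_features(w) for w in sliding_windows(channel, 4 * FS)])

# PCA as a pure rotation: all components are kept (no dimensionality reduction),
# the basis is fit on the training features only
pca = PCA(n_components=X.shape[1]).fit(X)
Z = pca.transform(X)        # training PCs, eq. (3)
Z_test = pca.transform(X)   # the same projection is applied to test features, eq. (4)
```

Because every component is retained, the transform only decorrelates the features; the columns of Z have a diagonal sample covariance.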
The PCA was applied to the stacked extracted features, without any dimensionality reduction, to generate the PCs. The PCs are used as the input vectors of the three classifiers addressed in the next section.

F. Classification
In this research, kernel SVM and KNN, as in [9], together with an ANN, are used for classification with eight-fold cross-validation. The goal of the SVM, as a parametric classifier, is to formulate a separating hyperplane by solving a quadratic optimization problem in the feature space [16]. Kernel SVM finds the optimal hyperplane in a higher-dimensional space, where the distance between the margins is maximum, which maximizes the generalization capability. The RBF kernel is a function which projects the input vectors into a Gaussian space, using equation (5). This generalization property makes kernel SVM less sensitive to overfitting [17].

    K_RBF(x, x′) = exp[−σ‖x − x′‖²]    (5)

KNN is a non-parametric, instance-based classifier, which classifies an object based on the majority vote of its neighbors. The votes are weighted by the distances of the K nearest neighbors to the object.

ANN is a semi-parametric classifier flexible for non-linear classification, which aggregates multilayer logistic regressions [18]. For a multilayer feedforward network, the nonlinear activation function of the output layer is a sigmoid, equation (6):

    φ(x) = 1 / (1 + e^(−x))    (6)

which predicts a probability as the output. In the hidden layers, the rectified linear unit (ReLU) activation function, equation (7), is employed. The ReLU provides sparsity and a reduced likelihood of vanishing gradients; therefore, network convergence is improved and the learning process, including backpropagation, is faster.

    φ(x) = max(0, x)    (7)

The implemented ANN incorporates three layers: two hidden layers with ReLU activation and an output layer with sigmoid activation.

The radial basis function (RBF) is implemented as the SVM kernel, with two different scale factors, σ = 2 and σ = 0.1. KNN is implemented with five different values of nearest neighbors (3 ≤ K ≤ 7), and the optimum classification accuracy is achieved with K = 5.

The SVM, ANN and KNN are implemented with eight-fold cross-validation to estimate the average accuracy of each classifier. The data are partitioned randomly into eight folds of equal size, each fold including the attributes of four participants. The learner was trained on seven folds and tested on the remaining fold; the process is repeated eight times, each time with a different fold selected for testing.

III. RESULTS

We trained the SVM, ANN and KNN classifiers with features extracted from each pair of channels (F3-F4, F7-F8, FC1-FC2, FC5-FC6, FP1-FP2) in all frequency bands, in 2- and 4-second temporal windows. The RBF kernel of the SVM is implemented with σ = 2. Table 1 shows the cross-validated accuracy of the classifiers with each pair of channels. The optimum accuracy is obtained with the F3-F4 pair of EEG channels, using SVM: 90.8% (sensitivity 88.18%, specificity 89.27%) for arousal and 90.6% (sensitivity 89.9%, specificity 87.7%) for valence. Therefore, to reduce the computational cost, the F3-F4 pair of channels can be utilized.

Table 1. Cross-validated accuracy of classifiers from each pair of EEG channels in temporal windows of (a) 4 s (b) 2 s

Cross-validated Accuracy (%)   F3-F4   F7-F8   FC1-FC2   FC5-FC6   FP1-FP2
(a)
Arousal-SVM                     90.8    87.9     87.2      85.1      88.1
Valence-SVM                     90.6    84.9     89.8      85.5      88.5
Arousal-KNN                     76.4    79       75.9      77.6      71.5
Valence-KNN                     79      80.4     77        75.5      73.6
Arousal-ANN                     82.1    83.2     71.1      77.7      80.4
Valence-ANN                     84.7    83.1     73.5      74.4      78.9
(b)
Arousal-SVM                     85.7    79.7     80.3      83.1      82.3
Valence-SVM                     84.8    81.2     82.2      82.8      83.2
Arousal-KNN                     68.5    70.2     65.3      66.6      64.5
Valence-KNN                     69      68       68.2      69.9      63.5
Arousal-ANN                     74.3    76.2     73.6      70.2      73.4
Valence-ANN                     71.2    79.2     69.4      68.5      74.6

Using all the features extracted from all the channels within each frequency band (gamma, beta, alpha, theta), the SVM, ANN and KNN are trained. The cross-validated accuracy of each classifier trained on the principal components of each band is shown in table 2. The SVM classifier trained with beta-band features yields the optimum accuracy: 91.3% (sensitivity 89.1%, specificity 88.4%) for arousal and 91.1% (sensitivity 87.3%, specificity 86.8%) for valence.

IV. DISCUSSION AND CONCLUSION

In this research, EEG signals from the DEAP dataset are windowed into 2- and 4-second windows with 50% overlap, then decomposed into 5 frequency bands: gamma, beta, alpha, theta and noise. Spectral features, entropy and energy, are extracted from each window within the pre-specified frequency bands. Afterwards, the PCA is applied, as a transform preserving the input dimensionality, to produce uncorrelated attributes. The ANN, KNN and SVM are trained with attributes from different frequency bands, and from different pairs of channels, with eight-fold cross-validation.
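The classification stage summarized above (RBF-kernel SVM with σ = 2, KNN with K = 5, a two-hidden-layer ReLU/sigmoid ANN, eight-fold cross-validation) can be sketched with scikit-learn. The random features and labels below are placeholders for the PCA-transformed attributes, and the hidden-layer sizes are an assumption, since the paper does not specify them.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import KFold, cross_val_score

# synthetic stand-in for the PCA-transformed features (hypothetical shape:
# 256 windows x 8 attributes) and binary high/low labels
rng = np.random.default_rng(1)
X = rng.standard_normal((256, 8))
y = (X[:, 0] + 0.5 * rng.standard_normal(256) > 0).astype(int)

classifiers = {
    # sklearn's `gamma` plays the role of sigma in eq. (5):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    "SVM": SVC(kernel="rbf", gamma=2.0),
    "KNN": KNeighborsClassifier(n_neighbors=5),  # K = 5, as in the paper
    # two ReLU hidden layers; for binary problems sklearn's output
    # activation is the logistic sigmoid, matching eq. (6)
    "ANN": MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                         max_iter=2000, random_state=0),
}

cv = KFold(n_splits=8, shuffle=True, random_state=0)  # eight-fold CV
scores = {name: cross_val_score(clf, X, y, cv=cv).mean()
          for name, clf in classifiers.items()}
```

Each entry of `scores` is the mean accuracy over the eight held-out folds, i.e. the quantity reported in tables 1 and 2.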
Table 2. Cross-validated accuracy of classifiers from different frequency bands in temporal windows of (a) 4 s (b) 2 s

Cross-validated Accuracy (%)   Gamma   Beta    Alpha   Theta
(a)
Arousal-SVM                     89.8    91.3    90.4    89.4
Valence-SVM                     89.7    91.1    90.9    89.1
Arousal-KNN                     75      75.1    72.6    73.8
Valence-KNN                     77      78.1    78      76.1
Arousal-ANN                     81.3    86.5    83.5    80.7
Valence-ANN                     84.8    88.3    87.2    82.6
(b)
Arousal-SVM                     81.4    89      88.68   86.6
Valence-SVM                     82.43   89.34   89.72   86.1
Arousal-KNN                     76      73      73.7    75.6
Valence-KNN                     74.3    73.2    73.8    72.4
Arousal-ANN                     83.2    78.4    82.7    81.9
Valence-ANN                     79.1    75.2    86.1    84.3

Comparing the cross-validated accuracy results shown in tables 1 and 2, it is evident that the SVM classifier outperforms KNN and ANN. In the previous work [9], the maximum classification accuracy was 86.75%, using KNN. In this study, applying PCA made the extracted features uncorrelated; in addition, choosing an appropriate kernel (RBF) greatly increased the cross-validated accuracy.

Comparing the accuracy rates of tables 1 and 2 also shows that using a 4-second window leads to higher accuracy; the features of the longer window are therefore more meaningful. It is also noticeable that discrimination of emotional states in the valence/arousal space is higher when features are grouped by frequency band than by channel pair: the minimum valence/arousal SVM accuracy in table 2(a) is 89.1%, whereas the maximum valence/arousal SVM accuracy in table 1(a) is 90.8%. Therefore, it is better to extract features from all the channels in each frequency band and then train a classifier.

Table 3 compares our results with those of other studies conducted on the DEAP dataset.

Table 3. Accuracy comparison of different studies on DEAP dataset

Reference   Year   Number of Channels   Classifier   Accuracy
[10]        2014   5                    SVM          80.4%
[11]        2016   --                   SVM          83.8%
[9]         2017   10                   KNN          86.7%
Our study   2018   10                   SVM          91.3%

The accuracy could be increased further with ensemble learning, combining the trained SVMs and the features extracted in different frequency bands; however, this makes the algorithm time-consuming and thus somewhat inappropriate for real-time applications, such as self-driving cars.

For future work, other transforms, such as independent component analysis (ICA) or linear discriminant analysis (LDA), could be applied to the extracted features. Other robust classifiers, such as random forests, deep neural networks, or recurrent neural networks, may also lead to higher cross-validated accuracy.

ACKNOWLEDGMENT

The authors are extremely grateful to Dr. Mahmood Amiri of the Kermanshah University of Medical Sciences, and to Javad Frounchi from the Department of Electrical and Computer Engineering of the University of Tabriz, for their support and guidance.

V. REFERENCES

[1]   Y. Zhang, S. Zhang, and X. Ji, "EEG-based classification of emotions using empirical mode decomposition and autoregressive model," Multimedia Tools and Applications, pp. 1-14, 2018.
[2]   L. Santamaria-Granados, M. Munoz-Organero, G. Ramirez-Gonzalez, E. Abdulhay, and N. Arunkumar, "Using Deep Convolutional Neural Network for Emotion Detection on a Physiological Signals Dataset (AMIGOS)," IEEE Access, 2018.
[3]   X. Li, D. Song, P. Zhang, G. Yu, Y. Hou, and B. Hu, "Emotion recognition from multi-channel EEG data through convolutional recurrent neural network," in 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2016, pp. 352-359: IEEE.
[4]   K. Takahashi, "Remarks on SVM-based emotion recognition from multi-modal bio-potential signals," in 13th IEEE International Workshop on Robot and Human Interactive Communication (ROMAN 2004), 2004, pp. 95-100: IEEE.
[5]   X.-W. Wang, D. Nie, and B.-L. Lu, "EEG-based emotion recognition using frequency domain features and support vector machines," in International Conference on Neural Information Processing, 2011, pp. 734-743: Springer.
[6]   R.-N. Duan, J.-Y. Zhu, and B.-L. Lu, "Differential entropy feature for EEG-based emotion classification," in 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), 2013, pp. 81-84: IEEE.
[7]   M. Murugappan, "Human emotion classification using wavelet transform and KNN," in 2011 International Conference on Pattern Analysis and Intelligent Robotics (ICPAIR), 2011, vol. 1, pp. 148-153: IEEE.
[8]   S. Nasehi and H. Pourghassem, "An optimal EEG-based emotion recognition algorithm using Gabor," WSEAS Transactions on Signal Processing, vol. 3, no. 8, pp. 87-99, 2012.
[9]   Z. Mohammadi, J. Frounchi, and M. Amiri, "Wavelet-based emotion recognition system using EEG signal," Neural Computing and Applications, vol. 28, no. 8, pp. 1985-1990, 2017.
[10]  X. Jie, R. Cao, and L. Li, "Emotion recognition based on the sample entropy of EEG," Bio-medical Materials and Engineering, vol. 24, no. 1, pp. 1185-1192, 2014.
[11]  M. Ali, A. H. Mosa, F. Al Machot, and K. Kyamakya, "EEG-based emotion recognition approach for e-healthcare applications," in 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN), 2016, pp. 946-950: IEEE.
[12]  R. Horlings, D. Datcu, and L. J. Rothkrantz, "Emotion recognition using brain activity," in Proceedings of the 9th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, 2008, p. 6: ACM.
[13]  T. Dalgleish and M. Power, Handbook of Cognition and Emotion. John Wiley & Sons, 2000.
[14]  S. Koelstra et al., "DEAP: A database for emotion analysis using physiological signals," IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18-31, 2012.
[15]  J. A. Coan, J. J. Allen, and E. Harmon-Jones, "Voluntary facial expression and hemispheric asymmetry over the frontal cortex," Psychophysiology, vol. 38, no. 6, pp. 912-925, 2001.
[16]   S. Theodoridis, A. Pikrakis, K. Koutroumbas, and D. Cavouras,
       Introduction to pattern recognition: a matlab approach.
       Academic Press, 2010.
[17]   O. Bazgir, S. A. H. Habibi, L. Palma, P. Pierleoni, and S. Nafees,
       "A Classification System for Assessment and Home Monitoring
       of Tremor in Patients with Parkinson's Disease," Journal of
       Medical Signals and Sensors, vol. 8, no. 2, 2018.
[18]   O. Bazgir, J. Frounchi, S. A. H. Habibi, L. Palma, and P.
       Pierleoni, "A neural network system for diagnosis and
       assessment of tremor in Parkinson disease patients," in
       Biomedical Engineering (ICBME), 2015 22nd Iranian
       Conference on, 2015, pp. 1-5: IEEE.