Linlin Gong, Mingyang Li, Tao Zhang, Wanzhong Chen
Biomedical Signal Processing and Control 84 (2023) 104835
Keywords: Electroencephalogram (EEG); Emotion recognition; Attention mechanism; Convolutional neural network (CNN); Transformer

Abstract: EEG-based emotion recognition has become an important task in affective computing and intelligent interaction. However, how to effectively combine the spatial, spectral, and temporal discriminative information of EEG signals to achieve better emotion recognition performance is still a challenge. In this paper, we propose a novel attention-based convolutional transformer neural network (ACTNN), which effectively integrates the crucial spatial, spectral, and temporal information of EEG signals and cascades a convolutional neural network and a transformer in a new way for the emotion recognition task. We first organize EEG signals into spatial–spectral–temporal representations. To enhance the distinguishability of features, spatial and spectral attention masks are learned for the representation of each time slice. Then, a convolutional module is used to extract local spatial and spectral features. Finally, we concatenate the features of all time slices and feed them into the transformer-based temporal encoding layer, which uses multi-head self-attention for global feature awareness. The average recognition accuracy of the proposed ACTNN on two public datasets, namely SEED and SEED-IV, is 98.47% and 91.90% respectively, outperforming the state-of-the-art methods. Besides, to explore the underlying reasoning process of the model and its neuroscientific relevance to emotion, we further visualize the spatial and spectral attention masks. The attention weight distribution shows that the activities of the prefrontal lobe and lateral temporal lobe of the brain, and the gamma band of EEG signals, might be more closely related to human emotion. The proposed ACTNN can be employed as a promising framework for EEG emotion recognition.
  ∗ Corresponding author.
    E-mail address: chenwz@jlu.edu.cn (W. Chen).
https://doi.org/10.1016/j.bspc.2023.104835
Received 28 November 2022; Received in revised form 17 February 2023; Accepted 5 March 2023
Available online 10 March 2023
domain features generally include power spectrum (PSD) features [23]. For time–frequency domain features, wavelet transforms are often used, including the discrete wavelet transform (DWT) [24,25], tunable Q wavelet transform (TQWT) [26–28], dual-tree complex wavelet transform (DT-CWT) [29,30], etc. In addition, some studies used features based on entropy measures. In the field of EEG emotion recognition, the differential entropy (DE) [31] feature is widely used and has proved to be robust. Other entropy features include sample entropy [32,33], energy entropy [34], approximate entropy [35,36], etc. For feature classification, the classifiers used generally include the support vector machine (SVM) [31,36], k-nearest neighbor (KNN) [19,21], random forest (RF) [34], decision tree (DT) [25,37], and so on, as well as ensemble models [24] of multiple classifiers.

Recently, with the continuous improvement and superior performance of deep learning algorithms, EEG emotion recognition methods based on deep learning frameworks have been effectively applied and have achieved better performance. Zheng et al. [38] designed a classification model based on the deep belief network (DBN) and discussed the key frequency bands and channels more suitable for emotion classification tasks. Maheshwari et al. [39] proposed a deep convolutional neural network (Deep CNN) EEG emotion classification method. Considering the spatial information of adjacent and symmetric channels of EEG, Cui et al. [40] proposed an end-to-end regional asymmetric convolutional neural network (RACNN), wherein the temporal, regional, and asymmetric feature extractors in the model are all composed of convolution structures. Xing et al. [41] used a stacked autoencoder (SAE) to decompose the EEG source signal, and then used a long short-term memory recurrent neural network (LSTM-RNN) framework for emotion classification.

In addition, some researchers have used hybrid deep models. For example, Iyer et al. [42] proposed a hybrid model based on CNN and LSTM, and an integrated model combining CNN, LSTM, and the hybrid models. Li et al. [43] presented a hybrid model based on CNN and RNN (CRNN), which constructs scalograms as the input of the model after continuous wavelet transform of the EEG signals. Zhang et al. [44] designed an end-to-end hybrid network based on CNN and LSTM (CNN-LSTM), which directly takes the original EEG signal as the input. All of these hybrid frameworks showed better classification results than a single model.

However, there are still some challenges and open problems in the area of EEG-based emotion recognition.

Firstly, as mentioned above, most studies extract features from the time domain, frequency domain, or time–frequency domain of EEG signals. In fact, EEG also carries spatial information across channels, because an emotional state involves large-scale network interactions along the entire neural axis [45]. To effectively use the spatial information of EEG signals, Song et al. [46] proposed a dynamical graph convolutional neural network (DGCNN) for EEG emotion recognition; in their method, each EEG channel is taken as a vertex of the graph, and the adjacency matrix is dynamically updated during training. Subsequently, more models based on graph neural networks were gradually proposed [47,48]. Another approach to exploiting spatial information has recently attracted attention, in which the EEG signal is processed as a two-dimensional matrix. Yang et al. [49] were the first to propose this integration method. Later, CRNN [43], HCNN [50], PCNN [51], etc., also used similar construction methods.

Secondly, CNNs have been extensively applied to the EEG emotion recognition task. However, there is a temporal context relation between frames; the convolution kernel in a CNN perceives locally and may break these relations. The Transformer [52] has strong global awareness due to its multi-head self-attention mechanism. Therefore, we aim to combine the local perception ability of the CNN with the global perception ability of the Transformer, and to design a novel EEG-based emotion recognition model with better performance.

In this paper, we propose a novel multi-channel EEG emotion recognition model (ACTNN), which cascades the CNN and transformer frameworks. First, we use a non-overlapping window with a length of T seconds to intercept EEG signals after removing noise and artifacts. Then, we divide each window into T 1-second segments. For each segment, we extract the DE features in the 𝛿, 𝜃, 𝛼, 𝛽, 𝛾 frequency bands, and then map the features in space according to the positions of the electrodes. Subsequently, to enhance the critical spatial and spectral information and suppress invalid information, we introduce a new parallel spatial and spectral attention mechanism. Next, we use the convolution module to extract the local spatial and spectral features of each time slice. In the temporal encoding part, we concatenate the features from the time slices and apply multi-head self-attention for global awareness through three temporal encoding layers. Finally, the classifier, composed of a fully-connected layer and a softmax layer, predicts the emotion labels.

We carry out a series of experiments with ACTNN. Firstly, statistical analysis of the DE features is carried out using one-way ANOVA. Secondly, the overall performance of ACTNN and a comparison of results under different attention conditions are reported. Thirdly, we compare the recognition performance when the input is the raw EEG signals or the DE features. Fourthly, the spatial and spectral attention masks are visualized to explore the model's potential reasoning process and interpretability. Finally, an ablation experiment is conducted to investigate the contribution of the key components of ACTNN to recognition performance.

The main contributions of this paper are as follows:

(1) We propose a novel attention-based convolutional transformer neural network, named ACTNN. It cascades a convolutional neural network and a transformer in an innovative way for EEG emotion recognition tasks, effectively combining the local awareness of the CNN with the global awareness of the transformer; the combination of the two forms a powerful model.

(2) We introduce a new attention mechanism to effectively enhance the spatial, spectral, and temporal distinguishability of EEG signals, and achieve satisfactory results. Moreover, we apply a more lightweight spatial and spectral attention layout, which overcomes the high computational complexity caused by common attention layouts, saves computation, and ensures better recognition accuracy.

(3) The average recognition accuracy of the proposed ACTNN model on the SEED and SEED-IV datasets is 98.47% and 91.90% respectively, outperforming the state-of-the-art methods. Besides, to explore the underlying reasoning process of the model and its neuroscientific relevance to emotion, we analyze the attention masks; the weight distribution shows that the activities of the prefrontal and lateral temporal lobes of the brain and the gamma band of EEG signals might be more related to human emotion.

The rest of this paper is arranged as follows: Section 2 introduces the two public databases and the proposed methods in detail. Section 3 reports and analyzes the experimental results. Section 4 discusses the noteworthy points and the aspects to be improved in our work according to the results. Section 5 presents the conclusion of our work.

2. Materials and methods

2.1. Dataset

We conduct extensive experiments on the SEED¹ [31,38] and SEED-IV [53] datasets to evaluate our model. The main details of the two datasets are summarized in Table 1.

The SEED dataset [31,38] is a public EEG emotion dataset, which is mainly oriented to discrete emotion models. The experimental flow of the SEED dataset is shown in Fig. 1, which is similar to that of the SEED-IV dataset. It includes 15 subjects (7 males and 8 females, age range: 23.27±2.37). Each subject did three experiments at intervals of about

¹ https://bcmi.sjtu.edu.cn/~seed/index.html
Table 1
Details of SEED and SEED-IV datasets.
 Item                     SEED       SEED-IV
 Subjects                 15         15
 Trials/Film clips        15         24
 Each clip duration       4-min      2-min
 Sessions/experiments     3          3
 EEG electrodes           62         62
 Sampling rate            200 Hz     200 Hz
 Emotion category         3 class    4 class
2.2. The proposed model

The framework of the proposed ACTNN is shown in Fig. 2. It mainly consists of the following parts: EEG signal acquisition, preprocessing and segmentation, feature extraction, spatial projection, spatial and spectral attention branches, the spatial–spectral convolution part, the temporal encoding part, and the classifier.

For the EEG signals induced by the emotional stimuli of each subject in the dataset, we first intercept non-overlapping T-second EEG signals and divide them into T time slices with a length of 1 s. Then, we extract DE features in five frequency bands (i.e., the 𝛿, 𝜃, 𝛼, 𝛽, 𝛾 rhythms) from each slice and map them to the spatial matrix. In the attention stage, we introduce a parallel spatial and spectral attention branch to adaptively allocate attention weights to the spatial and spectral dimensions. Next, we use the spatial–spectral convolution module to extract local features from each time slice. After concatenating the features of all time slices, the temporal encoding layer is used to further extract temporal features from a global perspective. Finally, a fully-connected layer and a softmax layer are used to predict the emotional state of the subjects. The following is a detailed introduction to the specific implementation of each part.
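To make the cascade described above concrete, the following is a minimal PyTorch sketch of the overall data flow: per-slice local feature extraction, three transformer-based temporal encoding layers with multi-head self-attention, and a fully-connected classifier. It is an illustrative sketch under stated assumptions, not the authors' implementation: the 9 × 9 spatial map, the 64-dimensional slice embedding, the number of attention heads, and the ELU activations are assumptions.

    import torch
    import torch.nn as nn

    class ACTNNSketch(nn.Module):
        # Minimal sketch of the ACTNN cascade: per-slice spatial-spectral feature
        # extraction followed by transformer-based temporal encoding and a classifier.
        def __init__(self, bands=5, t_slices=2, n_classes=3, d_model=64):
            super().__init__()
            # stands in for the attention branches + spatial-spectral convolution module
            self.slice_encoder = nn.Sequential(
                nn.Conv2d(bands, 16, kernel_size=3, padding=1), nn.ELU(),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ELU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, d_model),
            )
            # three temporal encoding layers with multi-head self-attention (global awareness)
            layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
            self.temporal_encoder = nn.TransformerEncoder(layer, num_layers=3)
            # classifier: fully-connected layer over the concatenated slice features
            self.classifier = nn.Linear(d_model * t_slices, n_classes)

        def forward(self, x):                        # x: (batch, T, bands, H, W)
            b, t = x.shape[:2]
            z = self.slice_encoder(x.flatten(0, 1))  # local features of each 1-s slice
            z = self.temporal_encoder(z.view(b, t, -1))
            return self.classifier(z.flatten(1))     # logits; softmax is folded into the loss

    # example: a batch of 8 samples, T = 2 slices, 5 bands, assumed 9 x 9 electrode map
    logits = ACTNNSketch()(torch.randn(8, 2, 5, 9, 9))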
2.3. Preprocessing and feature extraction

For the preprocessed EEG signals in the SEED and SEED-IV datasets, we used a non-overlapping window with a length of T seconds to intercept the EEG signals. Then, we divided each window into T 1-second segments. For each segment, we extracted the differential entropy (DE) features on the 𝛿 (1–4 Hz), 𝜃 (4–8 Hz), 𝛼 (8–13 Hz), 𝛽 (13–31 Hz), and 𝛾 (31–50 Hz) frequency bands, respectively. The DE feature is calculated as

DE(X) = -\int_{X} f(x) \log f(x) \, dx                                              (1)

where X denotes the EEG sequence and f(x) denotes its probability density function. Shi et al. [54] proved that, when band-pass filtering is carried out in 2 Hz steps from 2 Hz to 44 Hz, the EEG signals of each subband approximately follow a Gaussian distribution, namely X \sim N(\mu, \sigma^{2}). Therefore, formula (1) can be further written as

DE(X) = -\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \log\!\left[\frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\right] dx = \frac{1}{2}\log\!\left(2\pi e\sigma^{2}\right)       (2)

where \pi and e are constants and \sigma^{2} denotes the variance of the EEG time series.
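In practice, Eq. (2) reduces DE extraction to a variance computation per band. The sketch below band-pass filters a 1-second, 200 Hz segment into the five rhythms and applies the closed-form expression; the choice of a Butterworth filter and its order are assumptions, since the filter is not specified in this part of the text.

    import numpy as np
    from scipy.signal import butter, filtfilt

    BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
             "beta": (13, 31), "gamma": (31, 50)}

    def differential_entropy(segment, fs=200, order=4):
        # segment: (n_channels, n_samples) 1-s EEG slice.
        # Returns a (n_bands, n_channels) array of DE values (Eq. (2)).
        de = []
        for low, high in BANDS.values():
            b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
            filtered = filtfilt(b, a, segment, axis=-1)      # band-limited signal
            var = np.var(filtered, axis=-1)                   # Gaussian assumption
            de.append(0.5 * np.log(2 * np.pi * np.e * var))   # closed-form DE
        return np.stack(de)

    # example: one 1-s segment from a 62-channel recording sampled at 200 Hz
    features = differential_entropy(np.random.randn(62, 200))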
The three-dimensional structure E_i \in R^{B \times (H \times W)} contains important spatial and spectral information, where i denotes the i-th time slice, i = 1, \ldots, T. We introduce spatial and spectral attention branches to adaptively capture the brain regions and frequency bands that are more critical for the task. Inspired by the convolutional attention module scSE [55], which was initially used in the field of medical image segmentation, we design attention branches suitable for our task, as shown in the lower-left corner of Fig. 2.

2.5.1. Spatial attention branch

The spatial attention branch aims to capture the crucial brain regions and corresponding electrodes involved in emotional activities, using spectral squeeze and spatial excitation. In detail, the three-dimensional structure E_i of each time slice is written as E_i = [e_{1,1}, e_{1,2}, \ldots, e_{H,W}], where e_{m,n} \in R^{B \times (1 \times 1)}. The spectral squeeze is mainly realized by a 3D convolution that uses a kernel of size B \times 1 \times 1 with a single output channel, represented by

K_i = W_k \otimes E_i                                                               (3)

where W_k \in R^{1 \times B \times 1 \times 1} is the learned weight matrix and K_i \in R^{1 \times H \times W} is the spatial score tensor. Next, the sigmoid function (denoted by \sigma(\cdot)) is applied to normalize each element k_{m,n} (m = 1, \ldots, H; n = 1, \ldots, W) of K_i to the range [0, 1], yielding the spatial attention scores. Finally, using the spatial attention scores to recalibrate the original three-dimensional structure E_i, we get

E_{i,\mathrm{spatial}} = [\sigma(k_{1,1})e_{1,1}, \sigma(k_{1,2})e_{1,2}, \ldots, \sigma(k_{H,W})e_{H,W}]            (4)
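A minimal PyTorch sketch of the spectral squeeze and spatial excitation in Eqs. (3)–(4) is given below. The tensor layout (batch, 1, B, H, W), the use of Conv3d to realize the B × 1 × 1 kernel, and the 9 × 9 spatial map are assumptions about the implementation.

    import torch
    import torch.nn as nn

    class SpatialAttention(nn.Module):
        # Spectral squeeze + spatial excitation (Eqs. (3)-(4)).
        def __init__(self, bands=5):
            super().__init__()
            # W_k in Eq. (3): one 3D kernel of size B x 1 x 1, single output channel
            self.squeeze = nn.Conv3d(1, 1, kernel_size=(bands, 1, 1), bias=False)

        def forward(self, e):            # e: (batch, 1, B, H, W) slice representation
            k = self.squeeze(e)          # spatial score tensor K_i, shape (batch, 1, 1, H, W)
            scores = torch.sigmoid(k)    # normalize each k_{m,n} to [0, 1]
            return e * scores            # recalibrate E_i -> E_{i,spatial}, Eq. (4)

    # example: one batch of time slices with 5 bands on an (assumed) 9 x 9 electrode map
    e_i = torch.randn(8, 1, 5, 9, 9)
    out = SpatialAttention(bands=5)(e_i)   # same shape as e_i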
Fig. 2. The framework diagram of the attention-based convolutional transformer neural network (ACTNN) for EEG emotion recognition.
Table 2
Sample size in SEED and SEED-IV datasets.
 Dataset              session1/session2/session3

Y_i = f(\mathrm{Conv}(B_{i,2}, k_{c3})), \quad k_{c3} \in R^{1 \times 3 \times 3}                    (12)
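Eq. (12) describes one step of the spatial–spectral convolution module, in which a 1 × 3 × 3 kernel slides over the H × W spatial map within each band. A minimal sketch consistent with it follows; treating B_{i,2} as a single-channel intermediate feature map and f(·) as an ELU activation are assumptions.

    import torch
    import torch.nn as nn

    # one step of the spatial-spectral convolution module (cf. Eq. (12)):
    # a 1 x 3 x 3 kernel convolves the H x W spatial map within each band.
    conv = nn.Conv3d(in_channels=1, out_channels=1, kernel_size=(1, 3, 3), padding=(0, 1, 1))
    f = nn.ELU()                       # activation f(.) -- an assumption, not stated here

    b_i2 = torch.randn(8, 1, 5, 9, 9)  # assumed layout: (batch, channel, bands, H, W)
    y_i = f(conv(b_i2))                # Y_i = f(Conv(B_{i,2}, k_c3)), k_c3 in R^{1x3x3}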
Table 3
Hyper-parameter setting.
 Hyper-parameter       Value or type
 Optimizer             Adam
 Learning rate         1e−5
 Loss function         cross entropy
 Batch size            32
 Number of epochs      30 (SEED) / 50 (SEED-IV)
 Dropout               0.7 (SEED) / 0.6 (SEED-IV)
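The settings in Table 3 translate directly into a standard training loop. The sketch below follows those hyper-parameters for the SEED configuration; the placeholder model and the synthetic DE-feature tensors (T = 2 slices, 5 bands, an assumed 9 × 9 spatial map) stand in for ACTNN and the real dataset.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # placeholder model standing in for ACTNN; dropout 0.7 as in Table 3 (SEED)
    model = nn.Sequential(nn.Flatten(), nn.Dropout(0.7), nn.Linear(2 * 5 * 9 * 9, 3))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)   # Adam, learning rate 1e-5
    criterion = nn.CrossEntropyLoss()                           # cross-entropy loss
    loader = DataLoader(TensorDataset(torch.randn(256, 2, 5, 9, 9),
                                      torch.randint(0, 3, (256,))),
                        batch_size=32, shuffle=True)            # batch size 32

    for epoch in range(30):                                     # 30 epochs for SEED (50 for SEED-IV)
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()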
Table 4
The F statistic and p-value obtained by one-way ANOVA in SEED.
 Subject    F          p-value
 1          436.99     1.40E−169
 2          118.4      1.97E−50
 3          101.98     9.76E−44
 4          238.4      1.34E−97
 5          422.8      1.15E−164
 6          1769.72    0
 7          903.47     3.01E−315
 8          1386.73    0
 9          935.79     0
 10         227.4      2.13E−93
 11         960.23     0
 12         65.01      1.96E−28
 13         573.44     3.01E−215
 14         1060.74    0
 15         781.07     1.00E−279

Table 5
The F statistic and p-value obtained by one-way ANOVA in SEED-IV.
 Subject    F          p-value
 1          227.51     1.28E−134
 2          250.23     7.59E−147
 3          286.65     4.90E−166
 4          496.03     9.90E−268
 5          64.03      2.80E−40
 6          37.03      1.52E−23
 7          160.74     1.72E−97
 8          166.12     1.51E−100
 9          74.84      6.92E−47
 10         52.90      2.03E−33
 11         168.25     9.35E−102
 12         329.79     4.09E−185
 13         129.28     2.42E−79
 14         388.92     1.95E−217
 15         67.18      3.31E−42
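A minimal sketch of how per-subject F statistics and p-values such as those in Tables 4 and 5 can be produced with scipy is given below. It assumes the DE feature samples of one subject are grouped by emotion label before running the one-way ANOVA; the exact grouping used by the authors is not spelled out here.

    import numpy as np
    from scipy.stats import f_oneway

    def anova_by_emotion(features, labels):
        # features: (n_samples, n_features) DE features of one subject
        # labels:   (n_samples,) emotion label per sample
        # Returns the F statistic and p-value of a one-way ANOVA across emotions.
        groups = [features[labels == lab].ravel() for lab in np.unique(labels)]
        return f_oneway(*groups)

    # example with synthetic data: 3 emotion classes (as in SEED), 62 channels x 5 bands
    feats = np.random.randn(300, 310)
    labs = np.random.randint(0, 3, size=300)
    f_stat, p_value = anova_by_emotion(feats, labs)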
          Table 6
          The components of the different attention situations.
                          Component             Spatial attention      Spectral attention         Spatial–spectral convolution module       Temporal encoding layer
          Attention                                                                                                                         MHSA        FFN
          W/O any attention                     ×                      ×                          ✓                                         ×           ✓
          With only spatial attention           ✓                      ×                          ✓                                         ×           ✓
          With only spectral attention          ×                      ✓                          ✓                                         ×           ✓
          With spatial–spectral attention       ✓                      ✓                          ✓                                         ×           ✓
          With only temporal attention          ×                      ×                          ✓                                         ✓           ✓
          With all attention                    ✓                      ✓                          ✓                                         ✓           ✓
                  Table 7
                  The average accuracy and standard deviation (acc/std (%)) of the ACTNN model in different attention situations.
                   Attention                              SEED                                                    SEED-IV
                                                          session1          session2            session3          session1          session2           session3
                   w/o any attention                      91.57/7.11        92.24/4.15          91.86/6.72        66.98/6.66        63.04/9.25         67.69/7.86
                   With only spatial attention            94.43/4.90        95.92/2.80          95.51/4.72        74.28/7.80        70.55/8.26         73.94/8.37
                   With only spectral attention           94.59/4.73        95.99/2.83          95.50/4.73        74.20/8.32        70.62/8.62         74.07/8.15
                   With spatial–spectral attention        96.31/3.11        97.21/2.17          97.06/3.49        77.24/7.62        73.59/7.47         76.27/8.83
                   With only temporal attention           97.21/2.66        97.48/2.86          97.35/2.58        89.71/4.72        84.13/8.24         87.37/9.63
                   With all attention                     98.21/1.71        98.47/1.73          98.72/1.71        93.55/2.33        90.93/5.51         91.21/8.46
Fig. 7. The recognition accuracy of each subject under six attention situations in three sessions of SEED dataset. They are w/o any attention (dark blue square), with only spatial
attention (green circle), with only spectral attention (light blue triangle), with spatial–spectral attention (purple pentagon), with only temporal attention (orange diamond), and
with all attention (red star), respectively.
Table 6 listed the components of each attention situation. Table 7 gave the average accuracy and standard deviation obtained under different attention mechanisms. Figs. 7 and 8 showed the results of all sessions for all subjects in SEED and SEED-IV, respectively.

When no attention was used (dark blue square in Figs. 7 and 8), all subjects in the SEED dataset could still perform relatively well, but the subjects in the SEED-IV dataset were greatly affected. Compared with no attention, the results obtained by adding only spatial attention (green circle) or only spectral attention (light blue triangle) showed a similar increase, which may be due to the parallel structure of the spatial and spectral attention mechanisms. When the spectral and spatial attention mechanisms were combined (purple pentagon), the accuracy was further improved. Adding only temporal attention (orange diamond) brought the largest improvement compared with the previous situations. This may be because it captured context from the global scope of the time slices, which made the input more discriminative.

We can make a quantitative comparison from Table 7. Compared with no attention, the maximum improvement was achieved by adding temporal attention, which increased the accuracy by at least 5.24% and 19.68% in SEED and SEED-IV, respectively. Adding spatial attention increased the average accuracy by at least 2.86% and 6.25% respectively, adding spectral attention by at least 3.02% and 6.38% respectively, and adding spatial–spectral attention by at least 4.74% and 8.58% respectively.

To sum up, in our model, temporal attention achieved better performance than spatial or spectral attention. Due to the designed structure, spatial attention and spectral attention had similar improvements, and they played a better role when combined.

For the overall performance of ACTNN, Figs. 9 and 10 reported the accuracy of all subjects on the SEED and SEED-IV datasets. As can be seen from Fig. 9, the proposed ACTNN achieved a satisfactory classification result for all subjects in SEED. The average recognition accuracy of all subjects in the three sessions was 98.21%, 98.47%, and 98.72% respectively, and the corresponding standard deviation was 1.71%, 1.73%, and 1.71% respectively. This showed that ACTNN had good stability and superiority on the SEED dataset.

For most subjects in the SEED-IV dataset (see Fig. 10), the proposed ACTNN achieved good results: the average recognition accuracy of all subjects in the three sessions was 93.55%, 90.93%, and 91.21% respectively, and the standard deviation was 2.33%, 5.51%, and 8.46% respectively. There were a few exceptions, such as subject #9 in session 3 (75.4%) and subject #11 in session 2 (77.16%) and session 3 (67.31%), which may be due to the difference between the emotional label assigned to the EEG signals and the emotions actually induced in the subject.

In addition, to illustrate the ability of ACTNN to distinguish various emotional states, Fig. 11 showed the confusion matrices obtained by ACTNN on the SEED and SEED-IV datasets, respectively. As shown in Fig. 11(a), for the SEED dataset, ACTNN achieved the best classification for positive emotions, followed by neutral emotions. For the SEED-IV dataset (see Fig. 11(b)), sad was the most easily distinguished emotional state and fear seemed to be the least recognizable.

3.5. Comparative analysis between raw EEG signals and DE features

The results proved that when the extracted DE features were used as the input of ACTNN, good emotion recognition performance could be obtained.
Fig. 8. The recognition accuracy of each subject under six attention situations in three sessions of SEED-IV dataset. They are w/o any attention (dark blue square), with only
spatial attention (green circle), with only spectral attention (light blue triangle), with spatial–spectral attention (purple pentagon), with only temporal attention (orange diamond),
and with all attention (red star), respectively.
Fig. 10. The overall performance of the proposed ACTNN on SEED-IV dataset.
Fig. 11. The confusion matrices of the proposed ACTNN on the SEED and SEED-IV datasets.
Table 8
Comparison of recognition performance (average accuracy and standard deviation) with raw EEG signal and DE feature as input in SEED.
 Session     Raw EEG signals           DE features
             Acc (%)    Std (%)        Acc (%)    Std (%)
 1           94.70      5.77           98.21      1.71
 2           96.78      2.28           98.47      1.73
 3           95.86      2.81           98.72      1.71
 Average     95.78      3.62           98.47      1.72

Table 9
Comparison of recognition performance (average accuracy and standard deviation) with raw EEG signal and DE feature as input in SEED-IV.
 Session     Raw EEG signals           DE features
             Acc (%)    Std (%)        Acc (%)    Std (%)
 1           89.36      5.88           93.55      2.33
 2           87.65      4.45           90.93      5.51
 3           87.34      5.05           91.21      8.46
 Average     88.12      5.13           91.90      5.43

Fig. 12. Brain topographic map of the spatial attention mask adaptively assigned by ACTNN for subject #4 in the SEED dataset, where the first and second maps represent the attention weights assigned for the 1st and 2nd second of input data, respectively.
95.78%, which was 2.69% less than the DE features. Similarly, Table 9 showed that the average accuracy obtained by using the raw EEG signals in the SEED-IV dataset was 88.12%, which was 3.78% less than the DE features. Therefore, although extracting DE features seemed to increase the complexity compared with using the raw EEG signals, the final recognition results showed that the DE features obtained better performance within an acceptable range of complexity, as they contained more effective emotional information.

3.6. Analysis of spatial and spectral attention masks

To further understand the underlying reasoning process of our proposed method, we visualized the spatial and spectral attention masks in the model. These attention masks were a set of data-driven attention weights that were dynamically assigned to critical electrodes or frequency bands after training.

To describe the weight distribution of the spatial attention masks more intuitively, we captured the updated masks in the last iteration of the model and mapped them to the brain topographic map. Figs. 12 and 13 showed the spatial attention masks captured for the SEED and SEED-IV datasets, respectively. The redder the color, the higher the assigned weight. It can be seen that the attention weights of all emotions were mainly distributed in the prefrontal lobe and lateral temporal lobe, which indicated that these brain regions may be more closely related to emotional activation and information processing in the brain; this was consistent with the observations of neurobiological studies [56,57]. It should be noted that the spatial attention mask was obtained by compressing the frequency bands, that is, we used convolutional kernels with the size of 5 × 1 × 1 on the spectral dimension, so it contained the comprehensive information of the five frequency bands. As shown in Figs. 12 and 13, since we set T to 2, the first and second brain topographic maps of each emotion represented the attention masks captured for the 1st and 2nd second, respectively. It can be seen that the weight distribution changed within a small range over time. Due to limited space, we only listed the results of subject #4 in SEED and subject #3 in SEED-IV as examples. The spatial attention brain topographic maps of the other subjects were attached at the end of the paper (see supplemental material).

For the spectral attention masks, we computed the average spectral attention masks of all subjects after training, which represented the common importance of the different frequency bands and explained the contribution of each frequency band to emotion recognition. Then, we plotted the average weights of the spectral attention masks in Fig. 14. We can see that all the attention mask values were between 0 and 1. In both the SEED and SEED-IV datasets, the model allocated the maximum attention weight to the gamma band. Since the attention weights were data-driven, this indicated that the features of the gamma band may provide more valuable discriminative information for emotion recognition tasks, and that gamma-band EEG may be more related to human emotion, which is consistent with existing research [58]. Thus, the features of the gamma band were continuously enhanced after recalibration, improving the overall recognition performance.
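Visualizations of this kind can be produced with a few lines of matplotlib once the learned masks are read out of the trained model. The sketch below is only illustrative: the electrode coordinates and mask values are random placeholders standing in for the real 62-channel montage positions and the attention weights learned by ACTNN.

    import numpy as np
    import matplotlib.pyplot as plt

    # spatial mask: one weight per electrode, scattered at (placeholder) 2-D head positions
    weights = np.random.rand(62)                 # stand-in for a learned spatial mask
    xy = np.random.rand(62, 2) * 2 - 1           # placeholder electrode coordinates
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    sc = ax1.scatter(xy[:, 0], xy[:, 1], c=weights, cmap="Reds", s=80)
    ax1.set_title("Spatial attention mask (redder = higher weight)")
    fig.colorbar(sc, ax=ax1)

    # spectral mask: average attention weight per frequency band (cf. Fig. 14)
    bands = ["delta", "theta", "alpha", "beta", "gamma"]
    band_weights = np.random.rand(5)             # stand-in for the averaged spectral mask
    ax2.bar(bands, band_weights)
    ax2.set_title("Average spectral attention weight")
    plt.tight_layout()
    plt.show()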
3.7. Method comparison

To verify the effectiveness of our model, we compared the proposed model with the state-of-the-art methods; a brief introduction of each method is listed as follows.
                 Table 10
                 Performance comparison between the baseline methods and the proposed ACTNN on the SEED and SEED-IV datasets.
                  Methods                         Year             Evaluation methods           SEED                             SEED-IV
                                                                                                Acc (%)         Std (%)          Acc (%)            Std (%)
                  DBN [38]                        2015             Trial (9:6)                  86.08           8.34             –                  –
                  SVM [53]                        2018             Trial (16:8)                 –               –                70.58              17.01
                  DGCNN [46]                      2018             Trial (9:6)                  90.40           8.49             –                  –
                  BiHDM [59]                      2019             Trial (9:6)/(16:8)           93.12           6.06             74.35              14.09
                  GCB-net+BLS [48]                2019             Trial (9:6)                  94.24           6.70             –                  –
                  RGNN [47]                       2020             Trial (9:6)/(16:8)           94.24           5.95             79.37              10.54
                  4D-CRNN [60]                    2020             5-fold CV                    94.74           2.32             –                  –
                  SST-EmotionNet [61]             2020             Shuffle (6:4)                96.02           2.17             84.92              6.66
                  3D-CNN&PST [62]                 2021             Shuffle (9:6)                95.76           4.98             82.73              8.96
                  EeT [63]                        2021             5-fold CV                    96.28           4.39             83.27              8.37
                  JDAT [64]                       2021             10-fold CV                   97.30           1.74             –                  –
                  4D-aNN [65]                     2022             5-fold CV                    96.25           1.86             86.77              7.29
                  MDGCN-SRCNN [66]                2022             Trial (9:6)/(16:8)           95.08           6.12             85.52              11.58
                  HCRNN [67]                      2022             5 times 10-fold CV           95.33           1.39             –                  –
                  ACTNN(this paper)               2022             10-fold CV                   98.47           1.72             91.90              5.43
5. Conclusion
CNN-based and Transformer-based modules, and the results show that the temporal encoding module has a relatively larger contribution to the improvement of recognition performance.

The proposed ACTNN provides a new insight into human emotion decoding based on EEG signals, and can also be easily applied to other EEG classification tasks, such as sleep stage classification, motor imagination, etc. In future work, we will explore the performance of ACTNN in subject-independent and cross-session tasks to improve the generalization ability of the model.

CRediT authorship contribution statement

Linlin Gong: Conceptualization, Methodology, Software, Writing – original draft. Mingyang Li: Writing – review & editing, Methodology. Tao Zhang: Investigation, Validation. Wanzhong Chen: Supervision, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgments

We sincerely appreciate all the editors and reviewers for their insightful comments and constructive suggestions. This work was supported by the Natural Science Foundation of Jilin Province, China (Grant No. 20210101178JC), Scientific Research Project of Education Department of Jilin Province, China (Grant No. JJKH20221009KJ), Interdisciplinary Integration and Innovation Project of JLU, China (Grant No. JLUXKJC2021ZZ02), and National Natural Science Foundation of China (Grant No. 62203183).

Appendix A. Supplementary data

Supplementary material related to this article can be found online at https://doi.org/10.1016/j.bspc.2023.104835.

References

[1] R.W. Picard, Affective Computing, MIT Press, 1997.
[2] P.D. Bamidis, C. Papadelis, et al., Affective computing in the era of contemporary neurophysiology and health informatics, Interact. Comput. 16 (4) (2004) 715–721.
[3] R.W. Picard, E. Vyzas, J. Healey, Toward machine emotional intelligence: analysis of affective physiological state, IEEE Trans. Pattern Anal. Mach. Intell. 23 (10) (2001) 1175–1191.
[4] S. Ehrlich, C. Guan, G. Cheng, A closed-loop brain-computer music interface for continuous affective interaction, in: 2017 International Conference on Orange Technologies, ICOT, 2017, pp. 176–179, http://dx.doi.org/10.1109/ICOT.2017.8336116.
[5] J. Pan, Q. Xie, et al., Emotion-related consciousness detection in patients with disorders of consciousness through an EEG-based BCI system, Front. Hum. Neurosci. 12 (2018).
[6] E.V.C. Friedrich, A. Sivanathan, et al., An effective neurofeedback intervention to improve social interactions in children with autism spectrum disorder, J. Autism Dev. Disord. 45 (2015) 4084–4100.
[7] H. Dini, F. Ghassemi, M.S.E. Sendi, Investigation of brain functional networks in children suffering from attention deficit hyperactivity disorder, Brain Topogr. 33 (2020) 733–750.
[8] H. Chang, Y. Zong, et al., Depression assessment method: An EEG emotion recognition framework based on spatiotemporal neural network, Front. Psychiatry 12 (2022).
[9] H. Hu, Z. Zhu, et al., Analysis on biosignal characteristics to evaluate road rage of Younger drivers: A driving simulator study, in: 2018 IEEE Intelligent Vehicles Symposium, IV, 2018, pp. 156–161.
[10] J. Yedukondalu, L.D. Sharma, Cognitive load detection using circulant singular spectrum analysis and Binary Harris Hawks Optimization based feature selection, Biomed. Signal Process. Control 79 (2023) 104006.
[11] R.W. Picard, Affective computing: challenges, Int. J. Hum.-Comput. Stud. 59 (1–2) (2003) 55–64.
[12] H.D. Nguyen, S.H. Kim, et al., Facial expression recognition using a temporal ensemble of multi-level convolutional neural networks, IEEE Trans. Affect. Comput. 13 (1) (2022) 226–237.
[13] F. Noroozi, C.A. Corneanu, et al., Survey on emotional body gesture recognition, IEEE Trans. Affect. Comput. 12 (2) (2021) 505–523.
[14] W. Li, Z. Zhang, A. Song, Physiological-signal-based emotion recognition: An odyssey from methodology to philosophy, Measurement 172 (2021) 108747.
[15] C. Morawetz, S. Bode, et al., Effective amygdala-prefrontal connectivity predicts individual differences in successful emotion regulation, Soc. Cogn. Affect. Neurosci. 12 (4) (2017) 569–585.
[16] S. Berboth, C. Morawetz, Amygdala-prefrontal connectivity during emotion regulation: A meta-analysis of psychophysiological interactions, Neuropsychologia 153 (2021) 107767.
[17] J.T. Cacioppo, D.J. Klein, et al., The psychophysiology of emotion, in: The Handbook of Emotion, 2003.
[18] P.C. Petrantonakis, L.J. Hadjileontiadis, Emotion recognition from brain signals using hybrid adaptive filtering and higher order crossings analysis, IEEE Trans. Affect. Comput. 1 (2) (2010) 81–97.
[19] P.C. Petrantonakis, L.J. Hadjileontiadis, Emotion recognition from EEG using higher order crossings, IEEE Trans. Inf. Technol. Biomed. 14 (2) (2010) 186–197.
[20] H. Bo, C. Xu, et al., Emotion recognition based on representation dissimilarity matrix, in: 2022 IEEE International Conference on Multimedia and Expo Workshops, ICMEW, 2022, pp. 1–6.
[21] N. Jadhav, R. Manthalkar, Y. Joshi, Effect of meditation on emotional response: An EEG-based study, Biomed. Signal Process. Control 34 (2017) 101–113.
[22] R.M. Mehmood, B. Muhammad, et al., EEG-based affective state recognition from human brain signals by using hjorth-activity, Measurement 202 (2022) 111738.
[23] M. Alsolamy, A. Fattouh, Emotion estimation from EEG signals during listening to Quran using PSD features, in: 2016 7th International Conference on Computer Science and Information Technology, CSIT, 2016, pp. 1–5.
[24] K.S. Kamble, J. Sengupta, Ensemble machine learning-based affective computing for emotion recognition using dual-decomposed EEG signals, IEEE Sens. J. 22 (3) (2022) 2496–2507.
[25] P. Wagh Kalyani, K. Vasanth, Performance evaluation of multi-channel electroencephalogram signal (EEG) based time frequency analysis for human emotion recognition, Biomed. Signal Process. Control 78 (2022) 103966.
[26] S. Li, X. Lyu, et al., Identification of emotion using electroencephalogram by tunable Q-factor wavelet transform and binary gray wolf optimization, Front. Comput. Neurosci. 15 (2021).
[27] A. Subasi, T. Tuncer, et al., EEG-based emotion recognition using tunable q wavelet transform and rotation forest ensemble classifier, Biomed. Signal Process. Control 68 (2021) 102648.
[28] S.K. Khare, V. Bajaj, G.R. Sinha, Adaptive tunable q wavelet transform-based emotion identification, IEEE Trans. Instrum. Meas. 69 (12) (2020) 9609–9617.
[29] C. Wei, L. Chen, et al., EEG-based emotion recognition using simple recurrent units network and ensemble learning, Biomed. Signal Process. Control 58 (2020) 101756.
[30] D.S. Naser, G. Saha, Recognition of emotions induced by music videos using DT-CWPT, in: 2013 Indian Conference on Medical Informatics and Telemedicine, ICMIT, 2013, pp. 53–57.
[31] R. Duan, J. Zhu, B. Lu, Differential entropy feature for EEG-based emotion classification, in: 2013 6th International IEEE/EMBS Conference on Neural Engineering, NER, 2013, pp. 81–84.
[32] J. Xiang, C. Rui, L. Li, Emotion recognition based on the sample entropy of EEG, in: Proceedings of the 2nd International Conference on Biomedical Engineering and Biotechnology, 2014, pp. 1185–1192.
[33] Y. Shi, X. Zheng, T. Li, Unconscious emotion recognition based on multi-scale sample entropy, in: 2018 IEEE International Conference on Bioinformatics and Biomedicine, BIBM, 2018, pp. 1221–1226.
[34] E.S. Pane, A.D. Wibawa, M.H. Purnomo, Improving the accuracy of EEG emotion recognition by combining valence lateralization and ensemble learning with tuning parameters, Cogn. Process 20 (2019) 405–417.
[35] T. Chen, S. Ju, et al., Emotion recognition using empirical mode decomposition and approximation entropy, Comput. Electr. Eng. 72 (2018) 383–392.
[36] T. Chen, S. Ju, et al., EEG emotion recognition model based on the LIBSVM classifier, Measurement 164 (2020) 108047.
[37] W. Jiang, G. Liu, et al., Cross-subject emotion recognition with a decision tree classifier based on sequential backward selection, in: 2019 11th International Conference on Intelligent Human–Machine Systems and Cybernetics, IHMSC, 2019, pp. 309–313.
[38] W. Zheng, B. Lu, Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks, IEEE Trans. Auton. Ment. Dev. 7 (3) (2015) 162–175.
[39] D. Maheshwari, S.K. Ghosh, et al., Automated accurate emotion recognition system using rhythm-specific deep convolutional neural network technique with multi-channel EEG signals, Comput. Biol. Med. 134 (2021) 104428.
[40] H. Cui, A. Liu, et al., EEG-based emotion recognition using an end-to-end regional-asymmetric convolutional neural network, Knowl.-Based Syst. 205 (2020) 106243.
[41] X. Xing, Z. Li, et al., SAE+LSTM: A new framework for emotion recognition from multi-channel EEG, Front. Neurorobot. 13 (2019).
[42] A. Iyer, S.S. Das, et al., CNN and LSTM based ensemble learning for human emotion recognition using EEG recordings, Multimodal Interact. IoT Appl. (2022), http://dx.doi.org/10.1007/s11042-022-12310-7.
[43] X. Li, D. Song, et al., Emotion recognition from multi-channel EEG data through Convolutional Recurrent Neural Network, in: 2016 IEEE International Conference on Bioinformatics and Biomedicine, BIBM, 2016, pp. 352–359.
[44] Y. Zhang, J. Chen, et al., An investigation of deep learning models for EEG-based emotion recognition, Front. Neurosci. 14 (2020).
[45] L. Pessoa, A network model of the emotional brain, Trends Cogn. Sci. 21 (5) (2017) 357–371.
[46] T. Song, W. Zheng, et al., EEG emotion recognition using dynamical graph convolutional neural networks, IEEE Trans. Affect. Comput. 11 (3) (2020) 532–541.
[47] P. Zhong, D. Wang, C. Miao, EEG-based emotion recognition using regularized graph neural networks, IEEE Trans. Affect. Comput. 13 (3) (2022) 1290–1301.
[48] T. Zhang, X. Wang, et al., GCB-net: Graph convolutional broad network and its application in emotion recognition, IEEE Trans. Affect. Comput. 13 (1) (2022) 379–388.
[49] Y. Yang, Q. Wu, et al., Continuous convolutional neural network with 3D input for EEG-based emotion recognition, in: 2018 International Conference on Neural Information Processing, 2018, http://dx.doi.org/10.1007/978-3-030-04239-4_39.
[50] J. Li, Z. Zhang, et al., Hierarchical convolutional neural networks for EEG-based emotion recognition, Cogn. Comput. 10 (2018) 368–380.
[51] Y. Yang, Q. Wu, et al., Emotion recognition from multi-channel EEG through parallel convolutional recurrent neural network, in: 2018 International Joint Conference on Neural Networks, IJCNN, 2018.
[52] A. Vaswani, N. Shazeer, et al., Attention is all you need, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, 2017, pp. 6000–6010.
[53] W. Zheng, W. Liu, et al., EmotionMeter: A multimodal framework for recognizing human emotions, IEEE Trans. Cybern. 49 (3) (2019) 1110–1122.
[54] L. Shi, Y. Jiao, B. Lu, Differential entropy feature for EEG-based vigilance estimation, in: 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC, 2013, pp. 6627–6630, http://dx.doi.org/10.1109/EMBC.2013.6611075.
[55] A.G. Roy, N. Navab, C. Wachinger, Concurrent spatial and channel ‘squeeze & excitation’ in fully convolutional networks, in: 2018 International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 11070, 2018, http://dx.doi.org/10.1007/978-3-030-00928-1_48.
[56] S.J. Reznik, J.J.B. Allen, Frontal asymmetry as a mediator and moderator of emotion: An updated review, Psychophysiology 55 (1) (2018) e12965.
[57] D. Seo, C.A. Olman, Neural correlates of preparatory and regulatory control over positive and negative emotion, Soc. Cogn. Affect. Neurosci. 9 (4) (2014) 494–504.
[58] K. Yang, L. Tong, High gamma band EEG closely related to emotion: Evidence from functional network, Front. Hum. Neurosci. 14 (2020).
[59] Y. Li, L. Wang, et al., A novel bi-hemispheric discrepancy model for EEG emotion recognition, IEEE Trans. Cogn. Dev. Syst. 13 (2) (2021) 354–367.
[60] F. Shen, G. Dai, et al., EEG-based emotion recognition using 4D convolutional recurrent neural network, Cogn. Neurodyn. 14 (2020) 815–828.
[61] Z. Jia, Y. Lin, et al., SST-EmotionNet: Spatial-spectral-temporal based attention 3D dense network for EEG emotion recognition, in: Proceedings of the 28th ACM International Conference on Multimedia, MM ’20, Association for Computing Machinery, 2020, pp. 2909–2917, http://dx.doi.org/10.1145/3394171.3413724.
[62] J. Liu, Y. Zhao, et al., Positional-spectral-temporal attention in 3D convolutional neural networks for EEG emotion recognition, in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC, 2021, pp. 305–312.
[63] J. Liu, H. Wu, et al., Spatial–temporal transformers for EEG emotion recognition, 2021, preprint, http://dx.doi.org/10.48550/arXiv.2110.06553.
[64] Z. Wang, Z. Zhou, et al., JDAT: Joint-Dimension-Aware Transformer with Strong Flexibility for EEG Emotion Recognition, TechRxiv, 2021, preprint, http://dx.doi.org/10.36227/techrxiv.17056961.v1.
[65] G. Xiao, M. Shi, et al., 4D attention-based neural network for EEG emotion recognition, Cogn. Neurodyn. 16 (2022) 805–818.
[66] G. Bao, K. Yang, et al., Linking multi-layer dynamical GCN with style-based recalibration CNN for EEG-based emotion recognition, Front. Neurorobot. 16 (2022).
[67] M. Zhong, Q. Yang, et al., EEG emotion recognition based on TQWT-features and hybrid convolutional recurrent neural network, Biomed. Signal Process. Control 79 (2) (2023) 104211.