<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Human emotion recognition from EEG signals: model evaluation in DEAP and SEED datasets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohit Kumar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marta Molinas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Norwegian University of Science and Technology</institution>
          ,
          <addr-line>Trondheim</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The automatic distinction of human emotional states can provide the technological basis for applications in the healthcare, education, marketing, and manufacturing sectors that rely on human-machine interfaces. Emotion recognition from electroencephalography (EEG) signals is a challenging task entailing the development of classification models that should accurately distinguish among diverse human emotions. In this work, two publicly available EEG emotion datasets, SEED and DEAP, are used to develop automatic emotion detection models and to evaluate their performance for emotion recognition. Models are built for a two-dimensional emotion model (valence, or pleasantness, and arousal, or intensity) and for a positive/neutral/negative emotion model using multilayer perceptron (MLP) and convolutional neural network (CNN) classification algorithms. First, the preprocessed EEG signals of these datasets are decomposed into five rhythms, namely, delta, theta, alpha, beta, and gamma. The differential entropy (DE) is computed from the rhythms of the EEG signals and used as a feature for the classification algorithms. Epochs of 1-sec duration are considered for the computation of DE features. The CNN-based method achieves a better F1-score (93.7%) for the SEED dataset as compared to the MLP-based method. However, for the DEAP dataset, F1-scores of 94.5% and 94% are achieved for high vs. low arousal and high vs. low valence classes respectively, with no significant difference between the performance of the two methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Emotion detection</kwd>
        <kwd>EEG</kwd>
        <kwd>Brain rhythms</kwd>
        <kwd>Features</kwd>
        <kwd>Arousal-Valence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Emotion is considered an important factor in human life, as it affects working ability,
communication, personal and social life, mental state, and physical health. The human brain is
responsible for the generation and regulation of different emotions. Humans have a unique ability to
adjust their behavior while interacting with each other; they can do this because they
are able to decipher the emotional states of others. Human-machine interaction can likewise be
improved if machines are able to infer human emotional states, so automatic emotion
recognition can be very useful for improving human-machine interaction. Moreover, the study of
human emotion encompasses research in various fields such as cognitive science, computer
science, neuroscience, and psychology [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Electroencephalogram (EEG) based emotion
recognition has opened a vast space for exploration and innovation in these disciplines [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A
correlation between human emotions and EEG signals is observed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. EEG-based methods are
considered to be more reliable than facial expressions and gestures [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as they are less susceptible
to counterfeiting.
      </p>
      <p>
        Automatic detection approaches traditionally extract hand-crafted features from EEG signals
and implement a classification model for EEG emotion recognition [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Various methodologies
such as the Fourier transform, power spectral density, and wavelet transform are widely used to
analyse the EEG signal for emotion recognition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A support vector machine (SVM) based
methodology is used to classify joy, sadness, anger, and pleasure [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The method
proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is based on frequency band searching to find an optimal band for emotion
recognition and then applies the SVM method for emotion classification; the
gamma band is observed to be suitable for EEG-based emotion classification. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], differential entropy (DE)
is extracted as a feature and found to be effective in representing emotional states in EEG
signals. A transfer recursive feature elimination based approach proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is
found to be useful for selecting the optimal feature subset.
      </p>
      <p>
        Recently, various studies have focused on deep learning based methods for the classification
of EEG signals [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. A classification method based on a deep belief network (DBN) is introduced
for the classification of EEG signals related to different types of emotion [
        <xref ref-type="bibr" rid="ref1 ref11">11, 1</xref>
        ]. The
performance of the DBN is found to be better than that of SVM and K-nearest neighbour
classifiers. A regularized graph neural network is proposed to capture both local and global
relations among different EEG channels for emotion recognition [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. A method based on a
long short-term memory (LSTM) classifier is introduced to extract temporal information to
discriminate between different emotions using EEG signals [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ], models based on LSTM
and convolutional neural network (CNN) architectures are proposed which incorporate both temporal and
spatial information for emotion identification.
      </p>
      <p>
        In this work, our aim is to develop an automated method that can accurately detect human
emotions using EEG signals. The current state-of-the-art methods [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ] for detecting human
emotions are generally based on rather complex network architectures. Hence, in this work, we
focus on less complex classification models and have chosen two classification
approaches, one based on the multilayer perceptron (MLP) and another based on the CNN. The rationale
for choosing the CNN model is to utilise the spatial information provided by the location of
the different EEG channels on the scalp. For comparison purposes, and in order to understand
the value of spatial information, the simpler MLP architecture is chosen since its input is
one-dimensional and does not take the spatial information into account. We then compare the
performance of the CNN and MLP models on the SEED and DEAP datasets.
      </p>
      <p>The remainder of the paper is organised as follows: the two datasets used to evaluate the proposed
method are described in section 2. Section 3 presents the methodology, which includes the
feature extraction and classification methods. Results and conclusions are provided in section 4
and section 5, respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset description</title>
      <sec id="sec-2-1">
        <title>In the present work, the following two datasets are analysed:</title>
        <sec id="sec-2-1-1">
          <title>2.1. DEAP dataset</title>
          <p>
            The DEAP dataset contains the EEG signals of 32 subjects [
            <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
            ]. The recordings were performed
while the subjects were watching 40 one-minute music videos. The signals were collected
using 32 EEG channels at a sampling frequency of 512 Hz and further down-sampled to 128 Hz. The EEG signals
were passed through a bandpass filter of 4-45 Hz and EOG artifacts were removed. The
pre-processed EEG signals in each trial consist of 3 sec of baseline data and 60 sec of trial data.
During the recordings, the participants were asked to rate the levels of arousal,
valence, liking, and dominance for each video from 1 to 9 using the self-assessment manikin.
Further details of the experiment are available in [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].
          </p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.2. SEED dataset</title>
          <p>
            The SEED dataset [
            <xref ref-type="bibr" rid="ref1">1, 18</xref>
            ] contains the EEG signals of 15 subjects obtained from an emotion
elicitation paradigm. It contains the EEG signals of 3 sessions for each subject and 15 trials
per session. The recordings were performed while the subjects were watching Chinese film
clips of 4 minutes duration. The signals were collected using a 62-channel EEG system at
a sampling frequency of 1000 Hz and further down-sampled to 200 Hz. The EEG signals
were passed through a bandpass filter of 0-75 Hz [18]. During the recordings,
the participants were asked to rate each film clip with one of three types of emotion, namely positive,
neutral, and negative. The class labels -1, 0, and 1 are used to represent the negative, neutral, and
positive classes. Further details of the experiment are available in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>The overall methodology is summarised in Figure 1. The steps involved in the method are
described as follows:</p>
      <sec id="sec-3-1">
        <title>3.1. Decomposition and segmentation</title>
        <p>In the present work, a Butterworth filter of order 3 is used to decompose the EEG signals into
EEG rhythms. The EEG signals of the SEED dataset are decomposed into five rhythms,
namely, delta (1-4 Hz), theta (4-8 Hz), alpha (8-14 Hz), beta (14-31 Hz), and gamma (31-51 Hz).
The EEG signals of the DEAP dataset are decomposed into four rhythms, namely, theta (4-8 Hz),
alpha (8-14 Hz), beta (14-31 Hz), and gamma (31-45 Hz), as the DEAP dataset is preprocessed
with a 4-45 Hz bandpass filter. In [19], a 1-sec segment length was found to be most
suitable for emotion recognition. Hence, the EEG rhythms are segmented into epochs of 1-sec
duration. After segmentation, the total number of epochs per subject is 3394 and 2400 for the
SEED and DEAP datasets, respectively.</p>
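        <p>The decomposition and segmentation steps above can be sketched as follows. This is a minimal illustration in Python with SciPy, assuming a preprocessed DEAP-style trial (32 channels at 128 Hz); function and variable names are illustrative, not the authors' implementation.</p>

```python
# A minimal sketch of the decomposition/segmentation step, assuming a
# preprocessed DEAP-style trial (32 channels at 128 Hz). Band edges follow
# the text; names are illustrative, not the authors'.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # DEAP sampling rate after down-sampling
BANDS = {"theta": (4, 8), "alpha": (8, 14), "beta": (14, 31), "gamma": (31, 45)}

def decompose(eeg, fs=FS):
    """Filter each channel into the four rhythms (order-3 Butterworth bandpass)."""
    rhythms = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(3, [lo, hi], btype="bandpass", fs=fs)
        rhythms[name] = filtfilt(b, a, eeg, axis=-1)  # zero-phase filtering
    return rhythms

def segment(x, fs=FS, epoch_sec=1):
    """Split a (channels, samples) array into non-overlapping 1-sec epochs."""
    step = fs * epoch_sec
    n = (x.shape[-1] // step) * step
    return x[:, :n].reshape(x.shape[0], -1, step).transpose(1, 0, 2)

trial = np.random.default_rng(0).normal(size=(32, 60 * FS))  # one 60-sec trial
epochs = segment(decompose(trial)["alpha"])  # (60, 32, 128)
```

        <p>For SEED, the same sketch applies with 62 channels, a 200 Hz sampling rate, and the delta band added.</p>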
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Feature extraction</title>
        <p>
          In this work, we have used DE as a feature for the classification of different types of emotions.
DE is known to provide a measure of the complexity of a signal and has been applied with
success to non-stationary and non-linear signals such as EEG. DE can differentiate between low-
and high-frequency energy [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. It is defined as:
h(X) = −∫ f(x) log(f(x)) dx    (1)
where X is a random variable, which in this work represents the segmented EEG rhythms, and
f(x) is its probability density function. For a Gaussian random variable, DE can be estimated
as follows:
        </p>
        <p>h(X) = (1/2) log(2πeσ²)    (2)</p>
        <p>where e and σ² are Euler's number and the variance of the time series, respectively. In [20],
it is shown that the sub-band EEG signals can meet the Gaussian distribution criterion. Hence,
the DE features are computed from the EEG rhythms.</p>
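        <p>Equation (2) makes the DE of each epoch and channel a simple function of the sample variance. A minimal sketch, assuming approximately Gaussian band-limited epochs; names are illustrative.</p>

```python
# A minimal sketch of Eq. (2): for an approximately Gaussian band-limited
# epoch, DE reduces to 0.5*log(2*pi*e*variance), computed per channel.
import numpy as np

def differential_entropy(epoch):
    """DE of each channel of one (channels, samples) epoch."""
    var = np.var(epoch, axis=-1)
    return 0.5 * np.log(2 * np.pi * np.e * var)

rng = np.random.default_rng(0)
epoch = rng.normal(scale=2.0, size=(32, 128))  # every channel has sigma = 2
de = differential_entropy(epoch)               # analytic value is about 2.11
```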
        <p>For the SEED dataset, the DE feature is extracted from the five EEG rhythms. The obtained
feature matrix has dimensions (3394, 62, 5) for each subject and each session, where 3394
is the total number of epochs collected from the 15 trials, 62 is the number of
channels, and 5 corresponds to the five different EEG rhythms.</p>
        <p>
          For the DEAP dataset, the DE feature is extracted from the four EEG rhythms. The feature
matrix has dimensions (2400, 32, 4) for each subject, where 2400 is the total number of
epochs collected from all 40 trials, 32 is the number of channels, and 4
corresponds to the four subbands. The DE feature
is also extracted from the 3-sec baseline provided with the DEAP dataset. First, we
divided the 3-sec baseline into 3 segments of 1-sec duration. Then, we computed the DE of
all 3 segments. Finally, we consider the average of the three values
as the baseline DE features. The baseline DE feature matrix has dimensions (40, 4)
for each subject, where 40 is the number of trials and the 4 features correspond to the four
subbands. The baseline DE feature is subtracted from the main DE feature before it is applied
to the input of the classifiers, as suggested in [21]. However, in [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], the baseline is segmented
into six parts of 0.5-sec duration each, and the average baseline DE features are
computed over the six segments.
        </p>
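        <p>The baseline correction described above can be sketched as follows; one band is shown, the shapes follow the text, and the names are ours rather than the authors'.</p>

```python
# A hedged sketch of the DEAP baseline correction: the DE of the three 1-sec
# baseline segments is averaged, then subtracted from each trial epoch's DE.
import numpy as np

def de(x):
    """DE per channel over the last (samples) axis, via Eq. (2)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x, axis=-1))

rng = np.random.default_rng(1)
baseline = rng.normal(size=(3, 32, 128))       # three 1-sec baseline segments
baseline_de = de(baseline).mean(axis=0)        # (32,): averaged over segments
epoch_de = de(rng.normal(size=(60, 32, 128)))  # DE of the 60 trial epochs
corrected = epoch_de - baseline_de             # subtraction broadcasts over epochs
```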
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Classification methods</title>
        <sec id="sec-3-3-1">
          <title>In this work, we have applied the following two classification approaches:</title>
          <p>3.3.1. MLP based classification</p>
          <p>The MLP is a widely used tool in various classification problems and is also used for emotion
classification in [21]. It handles large amounts of input data well and can make quick predictions
after training. A two-layer neural network has an input layer, an output layer, and one hidden
layer between them [22]; an MLP, however, may have more than one
hidden layer. In this work, the employed MLP architecture consists of 4 hidden layers
with 256, 256, 128, and 64 units in hidden layers 1 to 4, respectively. The numbers of units and
layers are selected by trial and error so that the model performs well on both the SEED and
DEAP datasets. For the SEED dataset, the feature matrix is reshaped to size (3394, 310) for each
subject and each session before being fed to the MLP classifier; for the DEAP dataset, it is
reshaped to size (2400, 128).</p>
          <p>3.3.2. CNN based classification</p>
          <p>The CNN is a special kind of neural network which has shown tremendous success in practical
applications [23]. In [
            <xref ref-type="bibr" rid="ref15">21, 15</xref>
            ], it is utilized for emotion classification using EEG signals. For the
CNN classifier, the DE feature matrix is arranged into a 2D feature space according to the
locations of the electrodes, as suggested in [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. This mapping preserves the spatial structure
of the electrode layout. The input data size to the CNN classifier is (3394, 8, 9, 5)
for the SEED dataset and (2400, 8, 9, 4) for the DEAP dataset. The convolution layer (CLr) is an essential
part of a CNN architecture; it applies a set of filters to the input data that provides activations of
features from the input data. The output can be referred to as the feature map [23]. The
details of the employed CNN architecture are as follows: it has two convolution blocks, CB-1
and CB-2. CB-1 consists of 2 CLrs and one max-pooling layer; each CLr of CB-1 has 256 units
with a filter size of 5x5. CB-2 contains one CLr having 128 units with a filter size of 4x4 and one
max-pooling layer. The filter size of the max-pooling layer in CB-1 and CB-2 is 2x2 with a stride of
2. The output of CB-2 is flattened and connected to a dense layer having 64 units, which is
finally connected to the classification layer having a softmax activation function. The numbers
of units, convolution blocks, and CLrs are selected by trial and error so
that the model performs well on both the SEED and DEAP datasets.</p>
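          <p>As a concrete illustration, the MLP just described can be sketched with the Keras API the paper cites [24]. This is a hedged reconstruction from the text (hidden layers of 256, 256, 128, and 64 units, ReLU activations, softmax output), not the authors' code.</p>

```python
# Hedged Keras sketch of the MLP described above; our reconstruction from the
# text, not the authors' code. Input size 310 = 62 channels x 5 rhythms (SEED).
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(input_dim=310, n_classes=3):
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        layers.Dense(256, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # classification layer
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

          <p>For DEAP, input_dim would be 128 (32 channels x 4 rhythms) with 2 output classes.</p>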
          <p>
            Generally, a CLr is followed by a pooling layer [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], which is used to downsample the output of
the CLr. In the case of a small translation in the input, it helps to keep the representation
nearly invariant [23]. Studies on the DEAP dataset [
            <xref ref-type="bibr" rid="ref14">14, 21</xref>
            ] suggested that the pooling
layer is not necessary, as the feature matrix is smaller than the feature
matrices common in the computer vision field. However, in [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], one pooling layer is added after the last
CLr. Hence, in our proposed CNN architecture, not every CLr is followed by a pooling layer;
instead, each convolution block is followed by one. All the CLrs of CB-1 and CB-2 use
"same" padding to preserve the size of the feature matrix within each convolution block,
as the feature matrix is not large.
          </p>
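          <p>Putting the block structure together, a hedged Keras sketch of the CNN follows; it is our reconstruction from the description above, not the authors' code.</p>

```python
# Hedged Keras sketch of the CNN described above: CB-1 is two 5x5 CLrs with
# 256 filters plus a 2x2/stride-2 max-pooling layer; CB-2 is one 4x4 CLr with
# 128 filters plus a max-pooling layer; "same" padding inside each block; then
# a 64-unit dense layer and a softmax classification layer. Input is the 8x9
# electrode grid with one plane per rhythm (5 for SEED, 4 for DEAP).
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(input_shape=(8, 9, 5), n_classes=3):
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(256, 5, padding="same", activation="relu"),  # CB-1
        layers.Conv2D(256, 5, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(128, 4, padding="same", activation="relu"),  # CB-2
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```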
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>
        In this work, a 5-fold cross-validation approach is applied to test the performance of the models.
In this approach, the dataset is divided into five equal sets and iterated over five
times. In each iteration, four sets are used for training and the remaining set is used for
testing, with a different set used for testing each time. Finally, the average classification
accuracy over the five iterations is reported. The results are obtained for each subject separately.
The maximum number of iterations and the batch size are kept at 100 and 128 for both the MLP and CNN
classifiers, as suggested in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Both classification models are implemented using the
Keras library [24]. The Adam optimizer [25] is used with the default learning rate provided by
the Keras library. The ReLU activation function, which introduces nonlinearity into the neural
network [23], is used in each layer of both classifiers except for the final classification layer,
where the softmax activation function is used.
      </p>
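      <p>The 5-fold protocol above can be sketched with scikit-learn's KFold; the classifier here is a stand-in for the Keras models of Section 3.3, and the random data stands in for the per-subject DE features.</p>

```python
# Sketch of the 5-fold cross-validation loop; the classifier and data are
# illustrative stand-ins, not the paper's models or features.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression  # stand-in classifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 310))   # e.g. flattened SEED DE features
y = rng.integers(0, 3, size=200)  # three emotion classes

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = LogisticRegression(max_iter=200).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

mean_acc = float(np.mean(scores))  # the reported per-subject accuracy
```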
      <sec id="sec-4-1">
        <title>4.1. Results on SEED dataset</title>
        <p>The performance of the MLP model on the SEED dataset for all 15 subjects in terms of accuracy is
shown in Figure 2. The highest accuracy of 98.4% is observed for the first session of the 15th
subject and the lowest accuracy of 69.6% is obtained for the third session of the 4th subject. From
Figure 2, it can be noted that the accuracy is greater than 90% for subjects 1 (session
1), 5 (sessions 2 and 3), 6, 7 (session 2), 8 (sessions 2 and 3), 13 (session 1), and 15. The average
performance of the MLP is summarised in Figure 4. The average accuracy and F1-score over 15
subjects and three sessions are 86.8% and 86.5%, respectively. The performance of the CNN
model on the SEED dataset for all 15 subjects in terms of accuracy is shown in Figure 3. The highest
accuracy of 99.6% is observed for the third session of the 15th subject and the lowest accuracy
of 84.6% is obtained for the third session of the 4th subject. From Figure 3, it can be noted that
the accuracy is greater than 95% for subjects 5 (sessions 2 and 3), 6 (sessions 1 and 2),
8 (session 3), 10 (session 1), 12 (session 1), 13 (session 3), 14 (session 1), and 15. The average
accuracy and F1-score over 15 subjects and three sessions are 93.8% and 93.7%, respectively,
as shown in Figure 4. The accuracy is significantly improved with the CNN classifier as
compared to the MLP classifier (p-value &lt; 0.001). The improvement can be explained by the
fact that the CNN takes a 2D tensor as input and can therefore capture the spatial relations
between EEG channels, whereas the MLP takes a 1D vector as input and cannot.</p>
        <p>The confusion matrix for subject 1 (session 1) is depicted in Figure 5. It is the summation
of the five confusion matrices obtained over the 5 folds. From Figure 5, it can be observed that
the performance of the CNN model is well balanced over the three classes of emotion: for the
neutral, positive, and negative classes, it has correctly identified 1048, 1139, and 1012 samples,
respectively. However, the MLP model performs better for the positive class than for the negative and
neutral classes.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results on DEAP dataset</title>
        <p>For the DEAP dataset, the participants' ratings are available on a scale of 1 to 9. Hence, ratings
greater than 5 are considered as high valence (HV) and high arousal (HA), and the others are
considered as low valence (LV) and low arousal (LA). The classification performance of the
proposed models is evaluated for two different sets of classes, which are summarised as follows:</p>
        <p>4.2.1. High and low valence classes</p>
        <p>The accuracy of the MLP classifier on the DEAP dataset is shown in Figure 6 for the low and high
valence classes. Subjects 15 and 22 show the highest and lowest accuracy of 97.2% and
86%, respectively. The accuracy is higher than or equal to 95% for the following 10 subjects:
1, 6, 7, 10, 13, 15, 16, 18, 23, and 27. The results of the CNN classifier in terms of accuracy are shown
in Figure 7. Once again, the highest accuracy of 97.29% is observed for the 15th subject. The CNN
classifier shows its lowest accuracy of 86.4% for the 5th subject. However, for subject 22, for which the MLP
classifier showed its lowest accuracy, the CNN classifier shows 88.5% accuracy. It should
be noted that accuracy above 95% is observed for the same 10 subjects as mentioned for the
MLP classifier. Hence, there is no significant difference between the
MLP and CNN classifiers' performance on the DEAP dataset for the high and low valence classes.
The average accuracy and F1-score over the 32 subjects are depicted in Figure 11. The classification
results are not significantly different for the MLP and CNN classifiers (p-value &gt; 0.05). For subject 1,
the summation of the five confusion matrices obtained over the 5 folds is shown in Figure 8. From
Figure 8, it can be perceived that the two models perform well for both the high and low
valence cases. Also, from the confusion matrix, it can be inferred that the performance of the CNN
and MLP models is nearly identical. The reason for this is that SEED uses 62
channels while DEAP has only 32; the CNN does not benefit as much
from 32 channels as it does from 62 channels, which provide more information about possible
correlations among nearby channels.</p>
        <p>4.2.2. High and low arousal classes</p>
        <p>The accuracy of the MLP classifier on the DEAP dataset is shown in Figure 9 for the low and high
arousal classes. Subjects 13 and 22 show the highest and lowest accuracy of 97.33%
and 86.7%, respectively. The accuracy is higher than 95% for the following subjects:
1, 3, 7, 10, 12, 13, 15, 16, 18, 19, 20, 21, 23, 24, 25, and 29. The results of the CNN classifier in terms of accuracy
are shown in Figure 10. Once again, the highest accuracy of 97.75% is observed for the 13th
subject. The CNN classifier shows its lowest accuracy of 86.4% for the 5th subject. However,
for subject 22, for which the MLP classifier showed its lowest accuracy, the CNN classifier shows
89.4% accuracy. Subjects 1, 3, 7, 12, 13, 15, 16, 18, 19, 20, 21, 23, 25, and 27 show more than 95%
accuracy with the CNN classifier. There is no significant difference (p-value &gt; 0.05) between the
MLP and CNN classifiers' performance for the high and low arousal classes. The average results over
the 32 subjects are shown in Figure 11 and are also not significantly different for the MLP and CNN
classifiers. The same reason as given for the high and low valence classes explains the similar
results of the MLP and CNN classifiers for the high and low arousal classes: the extra strength of the CNN
is not effectively used when the number of channels is 32. In Figure 12, the summation of the
five confusion matrices obtained over the 5 folds for subject 1 is shown. It can be seen in
Figure 12 that the two models are equally good at detecting the high and low arousal classes.</p>
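        <p>The rating-to-label mapping used for the DEAP dataset amounts to a single threshold; a minimal sketch follows (the example ratings are made up).</p>

```python
# DEAP self-assessment ratings (1-9) are thresholded at 5: above 5 is the
# high class, everything else (including exactly 5) is the low class.
import numpy as np

ratings = np.array([7.1, 4.0, 5.0, 9.0, 2.3])  # e.g. valence ratings
labels = (ratings > 5).astype(int)             # 1 = high, 0 = low
# -> [1, 0, 0, 1, 0]
```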
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Comparison with other works</title>
        <p>
          The performance of the proposed MLP and CNN models is also compared with other state-of-the-art
methods. The comparison is summarised in Table 1. In [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], a parallel convolutional
recurrent neural network (PCRNN) is proposed. This method achieved average accuracies
of 90.80% and 91.03% for the high/low valence and high/low arousal classes respectively,
with a 10-fold cross-validation method on the DEAP dataset. The continuous convolutional neural
network (CCNN) method proposed in [21] yielded 89.45% and 90.24% accuracy for the high/low
valence and high/low arousal classes respectively, with a 10-fold cross-validation method
on the DEAP dataset. In [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], the 4D convolutional recurrent neural network (4D-CRNN) method
is proposed. It achieved 94.22% and 94.58% accuracy for the high/low valence and
high/low arousal classes respectively, with a 5-fold cross-validation method on the DEAP dataset.
The same method was also applied to the SEED dataset and achieved 94.74% accuracy. The
performance of our method is comparable to the above-mentioned state-of-the-art methods.
The architecture of the classification model in the above-mentioned methods [
          <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
          ] exhibits
higher complexity than the ones presented in this paper. However, our results are not directly
comparable to the results of the methods proposed in [
          <xref ref-type="bibr" rid="ref14">14, 21</xref>
          ], as they used a 10-fold
cross-validation method. The obtained results indicate that comparable accuracy can be achieved with
less complex methods and also highlight the importance of spatial features when using a CNN:
the spatial features acquired relevance only when the number of channels was 62 rather than 32.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work, two different classification approaches, based on MLP and CNN models,
are analysed and their performance is investigated on two publicly available EEG emotion
datasets, SEED and DEAP. The DE feature, computed from the different EEG rhythms, namely,
delta, theta, alpha, beta, and gamma, is used as the input to the classifiers. The
experimental results show that the CNN-based method outperforms the MLP-based method on
the SEED dataset, where the average accuracy of the CNN-based method is 93.8%.
However, for the DEAP dataset, no significant difference is observed in the performance of the MLP
and CNN-based approaches. The average accuracy for the DEAP dataset is found to be 94.33% and
93.53% for the high/low arousal and high/low valence classes, respectively. The obtained results
show that it is possible to achieve state-of-the-art performance with less complex network
models. In the future, our aim is to improve the obtained results; to achieve this, we plan to
implement a channel selection approach to select the channels which are most relevant for
emotion detection. We will also work on developing new features that can achieve better results.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This work is supported by the European Research Consortium for Informatics and Mathematics
(ERCIM) fellowship.</p>
      <p>[18] SEED Dataset, https://bcmi.sjtu.edu.cn/home/seed/seed.html. Accessed: 2022-08-30.
[19] X.-W. Wang, D. Nie, B.-L. Lu, Emotional state classification from EEG data using machine
learning approach, Neurocomputing 129 (2014) 94–106.
[20] L.-C. Shi, Y.-Y. Jiao, B.-L. Lu, Differential entropy feature for EEG-based vigilance estimation,
in: 35th Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC), 2013, pp. 6627–6630.
[21] Y. Yang, Q. Wu, Y. Fu, X. Chen, Continuous convolutional neural network with 3D input
for EEG-based emotion recognition, in: L. Cheng, A. C. S. Leung, S. Ozawa (Eds.), Neural
Information Processing, 2018, pp. 433–443.
[22] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[23] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[24] F. Chollet, et al., Keras, https://keras.io, 2015.
[25] D. Kingma, J. Ba, Adam: a method for stochastic optimization, arXiv preprint
arXiv:1412.6980 (2014).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks</article-title>
          ,
          <source>IEEE Transactions on Autonomous Mental Development</source>
          <volume>7</volume>
          (
          <year>2015</year>
          )
          <fpage>162</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Alarcão</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          ,
          <article-title>Emotions recognition using EEG signals: a survey</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>374</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sammler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grigutsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fritz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Koelsch</surname>
          </string-name>
          ,
          <article-title>Music and emotion: electrophysiological correlates of the processing of pleasant and unpleasant music</article-title>
          ,
          <source>Psychophysiology</source>
          <volume>44</volume>
          (
          <year>2007</year>
          )
          <fpage>293</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Exploring EEG features in cross-subject emotion recognition</article-title>
          ,
          <source>Frontiers in Neuroscience</source>
          <volume>12</volume>
          (
          <year>2018</year>
          ). doi: 10.3389/fnins.2018.00162.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>A bi-hemisphere domain adversarial neural network model for EEG emotion recognition</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>494</fpage>
          -
          <lpage>504</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.-P.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-K.</given-names>
            <surname>Jeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>EEG-based emotion recognition in music listening: a comparison of schemes for multiclass support vector machine</article-title>
          ,
          <source>in: IEEE International Conference on Acoustics, Speech and Signal Processing</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>489</fpage>
          -
          <lpage>492</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Emotion classification based on gamma-band EEG</article-title>
          ,
          <source>in: Annual International Conference of the IEEE Engineering in Medicine and Biology Society</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>1323</fpage>
          -
          <lpage>1326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.-N.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Differential entropy feature for EEG-based emotion classification</article-title>
          ,
          <source>in: 6th International IEEE/EMBS Conference on Neural Engineering (NER)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Cross-subject EEG feature selection for emotion recognition using transfer recursive feature elimination</article-title>
          ,
          <source>Frontiers in Neurorobotics</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Craik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Contreras-Vidal</surname>
          </string-name>
          ,
          <article-title>Deep learning for electroencephalogram (EEG) classification tasks: a review</article-title>
          ,
          <source>Journal of Neural Engineering</source>
          <volume>16</volume>
          (
          <year>2019</year>
          )
          <fpage>031001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>EEG-based emotion classification using deep belief networks</article-title>
          ,
          <source>in: 2014 IEEE International Conference on Multimedia and Expo (ICME)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Miao</surname>
          </string-name>
          ,
          <article-title>EEG-based emotion recognition using regularized graph neural networks</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>1290</fpage>
          -
          <lpage>1301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.-Y.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Emotion recognition under sleep deprivation using a multimodal residual LSTM network</article-title>
          ,
          <source>in: International Joint Conference on Neural Networks (IJCNN)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Emotion recognition from multi-Channel EEG through parallel convolutional recurrent neural network</article-title>
          ,
          <source>in: International Joint Conference on Neural Networks (IJCNN)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Shen</surname>
          </string-name>
          , et al.,
          <article-title>EEG-based emotion recognition using 4D convolutional recurrent neural network</article-title>
          ,
          <source>Cognitive Neurodynamics</source>
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>815</fpage>
          -
          <lpage>828</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Koelstra</surname>
          </string-name>
          , et al.,
          <article-title>DEAP: a database for emotion analysis using physiological signals</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
          <fpage>18</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <source>DEAP Dataset</source>
          , https://www.eecs.qmul.ac.uk/mmv/datasets/deap/readme.html. Accessed:
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>