Deep Learning Based Emotion Classification through EEG Spectrogram Images

Lorenzo Battisti (a), Alessio Ferrato (a), Carla Limongelli (a), Mauro Mezzini (b) and Giuseppe Sansonetti (a)

(a) Department of Engineering, Roma Tre University, Via della Vasca Navale 79, 00146 Rome, Italy
(b) Department of Education, Roma Tre University, Viale del Castro Pretorio 20, 00185 Rome, Italy

Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia
lor.battisti5@stud.uniroma3.it (L. Battisti); ale.ferrato@stud.uniroma3.it (A. Ferrato); limongel@dia.uniroma3.it (C. Limongelli); mauro.mezzini@uniroma3.it (M. Mezzini); gsansone@dia.uniroma3.it (G. Sansonetti)
ORCID: 0000-0003-4953-1390 (G. Sansonetti)
© 2023 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract
Emotion modeling for social robotics has great potential to improve the quality of life of the elderly and of individuals with disabilities by making communication, care, and interactions more effective. It can help individuals with communication difficulties express their emotions. It can also be used to monitor the emotional well-being of elderly persons living alone and alert caregivers or family members if there are signs of distress. More broadly, emotion modeling is necessary to design robots ever closer to human beings, able to interact with them naturally by understanding their behavior and reactions. Here, we propose a deep learning technique for emotion classification using electroencephalogram (EEG) signals. We aim to recognize valence, arousal, dominance, and likability. Our technique uses the spectrogram computed from each of the 32 electrodes placed on the scalp. We then employ a ResNet101 convolutional neural network to learn a model capable of predicting several emotions. We built and tested our model on the DEAP dataset.

Keywords
Emotion classification, Electroencephalogram, Deep Learning

1. Introduction and Background

Automatic emotion recognition is a vast and complex area of research. It has attracted the attention of scientists in many fields, including psychology, artificial intelligence, neuroscience, and robotics. The main goal of this research is to create systems capable of automatically recognizing and interpreting human emotions.

Emotions are a fundamental part of the human experience, from the pleasant joy of spending time with a loved one to the pain of facing a difficult time in life. Several models have been created to describe emotions, which can be divided into two large groups: categorical models, which represent the space of all emotions as a finite set, and dimensional models, which represent emotions through continuous values on multiple axes [1]. Concerning dimensional models, three main components are frequently used to define emotions and affective states: arousal, valence, and dominance [2]. Arousal refers to an individual’s level of enthusiasm or activity: high arousal levels are related to emotions of excitement, whereas lower ones are associated with relaxation. The positivity or negativity of an emotional experience is referred to as valence: for example, a negative value of valence is related to sadness, whilst a positive value is associated with feelings of happiness. Dominance refers to an individual’s sense of control or authority in a specific scenario: a high value of dominance is linked to feelings of control, whereas low levels are linked to feelings of helplessness.

Emotion recognition is crucial for the creation of social robots. The knowledge of these dimensions can be exploited to create a more natural and human-like interaction between robots and humans. By understanding and expressing different emotional states, robots can better understand and respond to the emotional needs of humans, and therefore improve the overall experience.

Several techniques for emotion recognition have been proposed in the scientific literature, including the analysis of physiological signals [3], natural language processing [4], and facial expressions [5]. However, there is no single solution for emotion recognition, and research in this field is still under development.
Our proposal focuses on the analysis of physiological signals, in particular the electroencephalogram (EEG) signal, for emotion recognition. This field has already been studied for a decade. In 2014, Wang et al. [6] proposed to extract EEG features (power spectrum, wavelet, and nonlinear analysis) from the observation of movie clips to assess the association between EEG data and emotional states. Using a Support Vector Machine classifier, the authors showed that representing the state space model in the form of linear dynamical systems removes the noise not correlated with emotions, making the classification of emotions more accurate. In 2018, Dabas et al. [7] proposed a 3D emotional model for classifying the emotions of users watching music videos based on the DEAP dataset [8]. In 2019, Donmez and Ozkurt [9] proposed to classify EEG signals by using a convolutional neural network; they classified three emotions by using brain signals and spectrogram images.

2. Methodology

In this paper, we propose a machine learning [10] technique, more precisely a deep learning [11] technique, for the realization of a predictive model of emotions using the EEG signal. We built and tested this model using one of the best-known online datasets, the DEAP dataset. This dataset contains the EEG signals of 32 individuals, collected while the subjects watched and listened to music videos taken from YouTube (https://www.youtube.com/). Each subject was invited to view 40 one-minute videos and then asked to express their emotions on the dimensional model shown in Figure 1. Additionally, a parameter called likability was used to quantify how much the participant liked the stimulus. For each dimension, the participant was asked to rate its intensity on a continuous scale between 1 and 9, where 1 stands for minimum intensity and 9 for maximum intensity.

Figure 1: A 3D representation of the emotion dimensional model.

The EEG signal consists of 32 channels, each corresponding to an electrode that measures the difference in electric potential in the scalp area where it is positioned. The proposed methodology involves the spectral analysis of the signal. This is achieved by applying the Discrete Fourier Transform to the signal, thus obtaining the power of the individual sinusoids that make up the signal. The spectrogram obtained for each EEG channel is a two-dimensional matrix where each cell (f, t) represents the intensity of the sinusoid at frequency f in the time segment t (for more details, please refer to [12]). Figure 2 shows an example of the spectral data of the first two EEG channels of one participant while watching one video.

Figure 2: The color plot of the spectral data of the first two EEG channels of one participant while watching one video.
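The per-channel spectrograms can be computed with standard short-time Fourier analysis. The following is a minimal sketch of this step, not the authors' actual code: it assumes the preprocessed DEAP signals (32 channels, 128 Hz sampling rate, 60-second trials) are available as a NumPy array, and the function name, window length, and log scaling are illustrative choices.

```python
import numpy as np
from scipy.signal import spectrogram

FS = 128  # sampling rate (Hz) of the preprocessed DEAP recordings (assumption)

def eeg_to_spectrograms(trial, fs=FS, nperseg=128, noverlap=64):
    """Turn one trial of shape (32, n_samples) into a (32, n_freqs, n_times) tensor.

    Each channel is converted independently: the short-time Fourier transform
    yields, for every cell (f, t), the power of the sinusoid at frequency f
    in time segment t, as described in the text.
    """
    channels = []
    for ch in trial:                      # iterate over the 32 electrodes
        f, t, sxx = spectrogram(ch, fs=fs, nperseg=nperseg, noverlap=noverlap)
        channels.append(np.log1p(sxx))    # log power for a tamer dynamic range (illustrative)
    return np.stack(channels, axis=0)     # (32, n_freqs, n_times)

# Example on one synthetic 60-second, 32-channel trial
trial = np.random.randn(32, 60 * FS)
spec = eeg_to_spectrograms(trial)
print(spec.shape)                         # (32, 65, 119) with the settings above
```

Stacking the 32 per-electrode spectrograms along the first axis gives the multi-channel input tensor later fed to the network.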
The continuous scale of each emotion e was transformed into a binary value b(e) ∈ {0, 1} so that b(e) = 0 if e < 5 and b(e) = 1 if e ≥ 5. Denoting below the viewing of a video by a subject as an experiment, we divided the total number of experiments (i.e., 40 × 32 = 1280) by reserving 32 experiments for the validation set, 32 for the test set, and the remaining ones for the training set. The experiments belonging to the validation and test sets were arbitrarily chosen, one for each participant and relating to different videos, so that the positive emotions (in which b(e) = 1) and the negative ones (in which b(e) = 0) approximately balance each other.
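As a concrete illustration, the label binarization and the per-participant hold-out described above could be sketched as follows. The threshold of 5 and the split sizes come from the text; the selection logic is only one plausible reading, and the additional balancing of positive and negative labels mentioned in the paper is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(rating, threshold=5.0):
    """b(e) = 1 if the self-reported rating e is >= 5, else 0."""
    return int(rating >= threshold)

# Index of the 1280 experiments as (participant, video) pairs.
experiments = [(p, v) for p in range(32) for v in range(40)]

# Reserve one experiment per participant for validation and one for the test
# set, using different videos for the two; the remaining 1216 experiments
# form the training set. (The paper also balances b(e)=0 and b(e)=1 across
# the held-out experiments, which this sketch does not enforce.)
val_set, test_set = [], []
for p in range(32):
    v_val, v_test = rng.choice(40, size=2, replace=False)
    val_set.append((p, v_val))
    test_set.append((p, v_test))
train_set = [e for e in experiments if e not in val_set and e not in test_set]

print(len(train_set), len(val_set), len(test_set))  # 1216 32 32
```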
We used the ResNet101 [13] convolutional neural network, suitably adapted to take as input a tensor with an arbitrary number of input channels. We empirically tested different hyperparameter configurations through a grid search, obtaining the following optimal values:

• Loss Function: Cross-Entropy
• Optimizer: Stochastic Gradient Descent (SGD)
• Momentum: 0.9
• Weight Decay: 0.0005
• Learning Rate: 3.0e-3
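A minimal PyTorch sketch of this setup is given below. The paper does not detail how the network was adapted; replacing the 3-channel stem convolution of torchvision's resnet101 so that it accepts the 32 spectrogram channels is one plausible reading, and the two-class head matches the binary labels b(e) for a single emotion dimension (the paper reports per-dimension accuracies). The hyperparameter values are those listed above; the input spatial size is illustrative.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

NUM_EEG_CHANNELS = 32   # one spectrogram per electrode
NUM_CLASSES = 2         # binary label b(e) for the chosen emotion dimension

def build_model(in_channels=NUM_EEG_CHANNELS, num_classes=NUM_CLASSES):
    """ResNet101 adapted to an arbitrary number of input channels (one possible way)."""
    model = resnet101(weights=None)
    # Replace the 3-channel RGB stem with an in_channels-channel convolution.
    model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                            padding=3, bias=False)
    # Replace the ImageNet head with a two-way classifier.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_model()

# Training configuration reported in the paper.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=3.0e-3,
                            momentum=0.9, weight_decay=5.0e-4)

# One illustrative training step on a synthetic batch of spectrogram tensors.
x = torch.randn(4, NUM_EEG_CHANNELS, 65, 119)   # (batch, channels, freq, time)
y = torch.randint(0, NUM_CLASSES, (4,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```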
The training set size was limited, so the network tended to overfit after about 200-300 epochs, reaching 100% accuracy on the training set. Therefore, we introduced a data augmentation process that performed a random horizontal offset of the input tensor by up to 20% of the horizontal dimension. The rationale behind this is that a horizontal shift corresponds to a time shift of the signal, and it is reasonable that this variability could occur between subjects and between experiments. Table 1 shows the preliminary experimental results.

Table 1
Accuracy after 400 epochs without and with augmentation

Emotion      W/O Augmentation   W/ Augmentation
Valence      43%                43%
Arousal      50%                60%
Dominance    50%                63%
Likability   60%                60%
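The random horizontal (time) shift used as augmentation could be implemented along the lines below. The paper only states that the input tensor is offset by up to 20% of the horizontal dimension; whether the shift wraps around or pads with zeros is not specified, so the circular shift here is an assumption.

```python
import torch

def random_time_shift(spec, max_fraction=0.2):
    """Randomly shift a spectrogram tensor along its time axis.

    spec: tensor of shape (channels, n_freqs, n_times).
    The shift is drawn uniformly up to max_fraction of the horizontal (time)
    dimension; it wraps around (circular shift), which is an assumption,
    since the paper only mentions a "random horizontal offset".
    """
    n_times = spec.shape[-1]
    max_shift = int(max_fraction * n_times)
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    return torch.roll(spec, shifts=shift, dims=-1)

# Example on a synthetic (32, 65, 119) spectrogram tensor
augmented = random_time_shift(torch.randn(32, 65, 119))
print(augmented.shape)  # torch.Size([32, 65, 119])
```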
3. Conclusions and Future Works

In the research literature, it has been largely shown that the knowledge of the user’s emotions can make a significant contribution to the creation of increasingly effective human-machine interaction systems. Several aspects can be analyzed to recognize emotions and, more generally, the user’s affective state. In this article, we have presented a deep learning approach to EEG signal analysis. Specifically, a ResNet101 convolutional neural network takes the EEG spectrogram as input and returns the values of arousal, valence, dominance, and likability.

Our idea is still evolving, so the possible future developments are manifold. These developments can be methodological or applicative. As regards the former, the data at our disposal are clearly too limited to fully exploit the potential of deep neural networks. We therefore need new data, and we are planning to collect it ourselves with the appropriate instrumentation. Another aspect concerns the deep neural network chosen: ResNet101 is only one of the many possibilities that deep learning research makes available today. A further development of our work concerns the data augmentation process, which has been shown to be able to improve the model accuracy. In the system described, the data augmentation concerned only the horizontal shift. Hence, we want to apply new geometric transformations and image processing techniques and verify whether they can further improve the accuracy of the results. As far as application developments are concerned, our idea is to combine physiological data with data related to facial expressions and eye tracking. Our ultimate goal is to improve human-machine interaction, both when the user is dealing with social robots and with recommender systems [14, 15, 16] or multimedia applications [17, 18, 19]. For instance, the information related to the emotions that the user feels when faced with a certain stimulus can be exploited to improve the algorithms for suggesting points of interest to visit (e.g., cultural heritage resources [20, 21, 22] such as museums [23, 24, 25] or restaurants [26, 27]) and itineraries to follow between them [28, 29]. Finally, a future development could concern the increase in the stimulus classes to which the user is subjected. In this article, we analyzed the EEG signal collected while the user watches music videos. It would be interesting to collect and subsequently analyze the EEG signal while the user listens to music [30], reads news articles [31], watches a movie [32], or looks at an image [33].

References

[1] A. F. Bulagang, N. G. Weng, J. Mountstephens, J. Teo, A review of recent approaches for emotion classification using electrocardiography and electrodermography signals, Informatics in Medicine Unlocked 20 (2020) 100363.
[2] J. Russell, A circumplex model of affect, Journal of Personality and Social Psychology 39 (1980) 1161–1178.
[3] F. Cavallo, F. Semeraro, L. Fiorini, G. Magyar, P. Sinčák, P. Dario, Emotion modelling for social robotics applications: a review, Journal of Bionic Engineering 15 (2018) 185–203.
[4] J. Deng, F. Ren, A survey of textual emotion recognition and its challenges, IEEE Transactions on Affective Computing (2021).
[5] N. Webb, A. Ruiz-Garcia, M. Elshaw, V. Palade, Emotion recognition from face images in an unconstrained environment for usage on social robots, in: 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–8.
[6] X.-W. Wang, D. Nie, B.-L. Lu, Emotional state classification from EEG data using machine learning approach, Neurocomputing 129 (2014) 94–106.
[7] H. Dabas, C. Sethi, C. Dua, M. Dalawat, D. Sethia, Emotion classification using EEG signals, in: Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, 2018, pp. 380–384.
[8] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, DEAP: A database for emotion analysis using physiological signals, IEEE Transactions on Affective Computing 3 (2012) 18–31.
[9] H. Donmez, N. Ozkurt, Emotion classification from EEG signals in convolutional neural networks, in: 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), 2019, pp. 1–6.
[10] L. Vaccaro, G. Sansonetti, A. Micarelli, An empirical review of automated machine learning, Computers 10 (2021).
[11] G. Sansonetti, F. Gasparetti, G. D’Aniello, A. Micarelli, Unreliable users detection in social media: Deep learning techniques for automatic detection, IEEE Access 8 (2020) 213154–213167.
[12] M. X. Cohen, Analyzing neural time series data: theory and practice, MIT Press, 2014.
[13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV, USA, 27–30 June 2016), IEEE Computer Society, Los Alamitos, CA, USA, 2016, pp. 770–778.
[14] F. Gasparetti, G. Sansonetti, A. Micarelli, Community detection in social recommender systems: a survey, Applied Intelligence 51 (2021) 3975–3995.
[15] G. Sansonetti, Point of interest recommendation based on social and linked open data, Personal and Ubiquitous Computing 23 (2019) 199–214.
[16] D. Feltoni Gurini, F. Gasparetti, A. Micarelli, G. Sansonetti, Temporal people-to-people recommendation on social networks with sentiment-based matrix factorization, Future Generation Computer Systems 78 (2018) 430–439.
[17] A. Micarelli, A. Neri, G. Sansonetti, A case-based approach to image recognition, in: Proceedings of the 5th European Workshop on Advances in Case-Based Reasoning, EWCBR ’00, Springer-Verlag, Berlin, Heidelberg, 2000, pp. 443–454.
[18] G. Sansonetti, F. Gasparetti, A. Micarelli, Using social media for personalizing the cultural heritage experience, in: Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 189–193.
[19] L. Xie, Z. Deng, S. Cox, Multimodal joint information processing in human machine interaction: Recent advances, Multimedia Tools and Applications 73 (2014) 267–271.
[20] A. De Angelis, F. Gasparetti, A. Micarelli, G. Sansonetti, A social cultural recommender based on linked open data, in: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, UMAP ’17, ACM, New York, NY, USA, 2017, pp. 329–332.
[21] G. Sansonetti, F. Gasparetti, A. Micarelli, Cross-domain recommendation for enhancing cultural heritage experience, in: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization, Association for Computing Machinery, New York, NY, USA, 2019, pp. 413–415.
[22] G. Sansonetti, F. Gasparetti, A. Micarelli, F. Cena, C. Gena, Enhancing cultural recommendations through social and linked open data, User Modeling and User-Adapted Interaction 29 (2019) 121–159.
[23] A. Ferrato, C. Limongelli, M. Mezzini, G. Sansonetti, Using deep learning for collecting data about museum visitor behavior, Applied Sciences 12 (2022).
[24] A. Ferrato, C. Limongelli, M. Mezzini, G. Sansonetti, The meta4rs proposal: Museum emotion and tracking analysis for recommender systems, in: Adjunct Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’22 Adjunct, Association for Computing Machinery, New York, NY, USA, 2022, pp. 406–409.
[25] M. Mezzini, C. Limongelli, G. Sansonetti, C. De Medio, Tracking museum visitors through convolutional object detectors, in: Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’20 Adjunct, Association for Computing Machinery, New York, NY, USA, 2020, pp. 352–355.
[26] C. Biancalana, F. Gasparetti, A. Micarelli, G. Sansonetti, An approach to social recommendation for context-aware mobile services, ACM Transactions on Intelligent Systems and Technology 4 (2013) 10:1–10:31.
[27] N. Sardella, C. Biancalana, A. Micarelli, G. Sansonetti, An approach to conversational recommendation of restaurants, in: C. Stephanidis (Ed.), HCI International 2019 - Posters, Springer International Publishing, Cham, 2019, pp. 123–130.
[28] D. D’Agostino, F. Gasparetti, A. Micarelli, G. Sansonetti, A social context-aware recommender of itineraries between relevant points of interest, in: HCI International 2016, volume 618, Springer International Publishing, Cham, 2016, pp. 354–359.
[29] A. Fogli, G. Sansonetti, Exploiting semantics for context-aware itinerary recommendation, Personal and Ubiquitous Computing 23 (2019) 215–231.
[30] M. Onori, A. Micarelli, G. Sansonetti, A comparative analysis of personality-based music recommender systems, in: CEUR Workshop Proceedings, volume 1680, CEUR-WS.org, Aachen, Germany, 2016, pp. 55–59.
[31] S. Caldarelli, D. F. Gurini, A. Micarelli, G. Sansonetti, A signal-based approach to news recommendation, in: CEUR Workshop Proceedings, volume 1618, CEUR-WS.org, Aachen, Germany, 2016, pp. 1–4.
[32] C. Biancalana, F. Gasparetti, A. Micarelli, A. Miola, G. Sansonetti, Context-aware movie recommendation based on signal processing and machine learning, in: Proceedings of the 2nd Challenge on Context-Aware Movie Recommendation, CAMRa ’11, ACM, New York, NY, USA, 2011, pp. 5–10.
[33] A. Mensen, W. Marshall, G. Tononi, EEG differentiation analysis and stimulus set meaningfulness, Frontiers in Psychology 8 (2017) 1748.