Deep Learning Based Emotion Classification through EEG Spectrogram Images

Lorenzo Battisti (a), Alessio Ferrato (a), Carla Limongelli (a), Mauro Mezzini (b) and Giuseppe Sansonetti (a)

(a) Department of Engineering, Roma Tre University, Via della Vasca Navale 79, 00146 Rome, Italy
(b) Department of Education, Roma Tre University, Viale del Castro Pretorio 20, 00185 Rome, Italy

Joint Proceedings of the ACM IUI Workshops 2023, March 2023, Sydney, Australia
lor.battisti5@stud.uniroma3.it (L. Battisti); ale.ferrato@stud.uniroma3.it (A. Ferrato); limongel@dia.uniroma3.it (C. Limongelli); mauro.mezzini@uniroma3.it (M. Mezzini); gsansone@dia.uniroma3.it (G. Sansonetti)
ORCID: 0000-0003-4953-1390 (G. Sansonetti)
© 2023 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Abstract
Emotion modeling for social robotics has great potential to improve the quality of life of the elderly and of individuals with disabilities by making communication, care, and interactions more effective. It can help individuals with communication difficulties express their emotions. It can also be used to monitor the emotional well-being of elderly persons living alone and alert caregivers or family members if there are signs of distress. More broadly, emotion modeling is necessary to design robots ever closer to human beings, able to interact with them naturally by understanding their behavior and reactions. Here, we propose a deep learning technique for emotion classification using electroencephalogram (EEG) signals. We aim to recognize valence, arousal, dominance, and likability. Our technique uses the spectrogram computed from each of the 32 electrodes placed on the scalp. We then employ a ResNet101 convolutional neural network to learn a model capable of predicting several emotions. We built and tested our model on the DEAP dataset.

Keywords
Emotion classification, Electroencephalogram, Deep Learning

1. Introduction and Background

Automatic emotion recognition is a vast and complex area of research. It has attracted the attention of scientists in many fields, including psychology, artificial intelligence, neuroscience, and robotics. The main goal of this research is to create systems capable of automatically recognizing and interpreting human emotions.

Emotions are a fundamental part of the human experience, from the pleasant joy of spending time with a loved one to the pain of facing a difficult time in life. Several models have been created to describe emotions, which can be divided into two large groups: categorical models, which represent the space of all emotions as a finite set, and dimensional models, which represent emotions through continuous values on multiple axes [1]. Concerning dimensional models, three main components are frequently used to define emotions and affective states: arousal, valence, and dominance [2]. Arousal refers to an individual’s level of enthusiasm or activity: high arousal levels are related to emotions of excitement, whereas lower ones are associated with relaxation. The positivity or negativity of an emotional experience is referred to as valence: for example, a negative value of valence is related to sadness, whilst a positive value is associated with feelings of happiness. Dominance refers to an individual’s sense of control or authority in a specific scenario: a high value of dominance is linked to feelings of control, whereas low levels are linked to feelings of helplessness.

Emotion recognition is crucial for the creation of social robots. The knowledge of these dimensions can be exploited to create a more natural and human-like interaction between robots and humans. By understanding and expressing different emotional states, robots can better understand and respond to the emotional needs of humans, and therefore improve the overall experience.

Several techniques for emotion recognition have been proposed in the scientific literature, including the analysis of physiological signals [3], natural language processing [4], and facial expressions [5]. However, there is no single solution for emotion recognition, and research in this field is still under development.
Our proposal focuses on the analysis of physiological signals, in particular the electroencephalogram (EEG) signal, for emotion recognition. This field has already been studied for a decade. In 2014, Wang et al. [6] proposed to extract EEG features (power spectrum, wavelet, and nonlinear analysis) from the observation of movie clips to assess the association between EEG data and emotional states. Using a Support Vector Machine classifier, the authors showed that representing the state space model in the form of linear dynamical systems removes the noise not correlated with emotions, making the classification of emotions more accurate. In 2018, Dabas et al. [7] proposed a 3D emotional model for classifying the emotions of users watching music videos based on the DEAP dataset [8]. In 2019, Donmez and Ozkurt [9] proposed to classify EEG signals by using a convolutional neural network; they classified three emotions by using brain signals and spectrogram images.

2. Methodology

In this paper, we propose a machine learning [10] technique, more precisely a deep learning [11] technique, for the realization of a predictive model of emotions using the EEG signal. We built and tested this model using one of the best-known online datasets, the DEAP dataset. This dataset contains the EEG signals of 32 individuals, collected while the subjects watched and listened to music videos taken from YouTube (https://www.youtube.com/). Each subject was invited to view 40 one-minute videos and then asked to express their emotions on the dimensional model shown in Figure 1. Additionally, a parameter called likability was used to quantify how much the participant liked the stimulus. For each dimension, the participant was asked to rate its intensity on a continuous scale between 1 and 9, where 1 stands for minimum intensity and 9 for maximum intensity.

Figure 1: A 3D representation of the emotion dimensional model.

The EEG signal consists of 32 channels, each corresponding to an electrode that measures the difference in electric potential in the scalp area where it is positioned. The proposed methodology involves the spectral analysis of the signal. This is achieved by applying the Discrete Fourier Transform to the signal, thus obtaining the power of the individual sinusoids that make up the signal. The spectrogram obtained for each EEG channel is a two-dimensional matrix where each cell (f, t) represents the intensity of the sinusoid at frequency f in the time segment t (for more details, please refer to [12]). Figure 2 shows an example of the spectral data of the first two EEG channels of one participant while watching one video.

Figure 2: The color plot of the spectral data of the first two EEG channels of one participant while watching one video.
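The per-channel spectrograms can be computed with standard short-time Fourier analysis. The following is a minimal sketch of this step, not the authors' actual code: it assumes the preprocessed DEAP signals (32 channels, 128 Hz sampling rate, 60-second trials) are available as a NumPy array, and the function name, window length, and log scaling are illustrative choices.

```python
import numpy as np
from scipy.signal import spectrogram

FS = 128  # sampling rate (Hz) of the preprocessed DEAP recordings (assumption)

def eeg_to_spectrograms(trial, fs=FS, nperseg=128, noverlap=64):
    """Turn one trial of shape (32, n_samples) into a (32, n_freqs, n_times) tensor.

    Each channel is converted independently: the short-time Fourier transform
    yields, for every cell (f, t), the power of the sinusoid at frequency f
    in time segment t, as described in the text.
    """
    channels = []
    for ch in trial:                      # iterate over the 32 electrodes
        f, t, sxx = spectrogram(ch, fs=fs, nperseg=nperseg, noverlap=noverlap)
        channels.append(np.log1p(sxx))    # log power for a tamer dynamic range (illustrative)
    return np.stack(channels, axis=0)     # (32, n_freqs, n_times)

# Example on one synthetic 60-second, 32-channel trial
trial = np.random.randn(32, 60 * FS)
spec = eeg_to_spectrograms(trial)
print(spec.shape)                         # (32, 65, 119) with the settings above
```

Stacking the 32 per-electrode spectrograms along the first axis gives the multi-channel input tensor later fed to the network.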
The continuous scale of each emotion e was transformed into a binary value b(e) ∈ {0, 1} so that b(e) = 0 if e < 5 and b(e) = 1 if e ≥ 5. Denoting below the viewing of a video by a subject as an experiment, we divided the total number of experiments (i.e., 40 × 32 = 1280) by reserving 32 experiments for the validation set, 32 for the test set, and the remaining ones for the training set. The experiments belonging to the validation and test sets were arbitrarily chosen, one for each participant and relating to different videos, so that the positive emotions (in which b(e) = 1) and the negative ones (in which b(e) = 0) approximately balance each other.
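As a concrete illustration, the label binarization and the per-participant hold-out described above could be sketched as follows. The threshold of 5 and the split sizes come from the text; the selection logic is only one plausible reading, and the additional balancing of positive and negative labels mentioned in the paper is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(rating, threshold=5.0):
    """b(e) = 1 if the self-reported rating e is >= 5, else 0."""
    return int(rating >= threshold)

# Index of the 1280 experiments as (participant, video) pairs.
experiments = [(p, v) for p in range(32) for v in range(40)]

# Reserve one experiment per participant for validation and one for the test
# set, using different videos for the two; the remaining 1216 experiments
# form the training set. (The paper also balances b(e)=0 and b(e)=1 across
# the held-out experiments, which this sketch does not enforce.)
val_set, test_set = [], []
for p in range(32):
    v_val, v_test = rng.choice(40, size=2, replace=False)
    val_set.append((p, v_val))
    test_set.append((p, v_test))
train_set = [e for e in experiments if e not in val_set and e not in test_set]

print(len(train_set), len(val_set), len(test_set))  # 1216 32 32
```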
We used the ResNet101 [13] convolutional neural network, suitably adapted to take as input a tensor with an arbitrary number of input channels. We empirically tested different hyperparameter configurations through a grid search, obtaining the following optimal values:

• Loss Function: Cross-Entropy
• Optimizer: Stochastic Gradient Descent (SGD)
• Momentum: 0.9
• Weight Decay: 0.0005
• Learning Rate: 3.0e-3
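A minimal PyTorch sketch of this setup is given below. The paper does not detail how the network was adapted; replacing the 3-channel stem convolution of torchvision's resnet101 so that it accepts the 32 spectrogram channels is one plausible reading, and the two-class head matches the binary labels b(e) for a single emotion dimension (the paper reports per-dimension accuracies). The hyperparameter values are those listed above; the input spatial size is illustrative.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

NUM_EEG_CHANNELS = 32   # one spectrogram per electrode
NUM_CLASSES = 2         # binary label b(e) for the chosen emotion dimension

def build_model(in_channels=NUM_EEG_CHANNELS, num_classes=NUM_CLASSES):
    """ResNet101 adapted to an arbitrary number of input channels (one possible way)."""
    model = resnet101(weights=None)
    # Replace the 3-channel RGB stem with an in_channels-channel convolution.
    model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                            padding=3, bias=False)
    # Replace the ImageNet head with a two-way classifier.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

model = build_model()

# Training configuration reported in the paper.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=3.0e-3,
                            momentum=0.9, weight_decay=5.0e-4)

# One illustrative training step on a synthetic batch of spectrogram tensors.
x = torch.randn(4, NUM_EEG_CHANNELS, 65, 119)   # (batch, channels, freq, time)
y = torch.randint(0, NUM_CLASSES, (4,))
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```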
The training set size was limited, so the network tended to overfit after about 200-300 epochs, reaching 100% accuracy on the training set. Therefore, we introduced a data augmentation process that performed a random horizontal offset of the input tensor by up to 20% of the horizontal dimension. The rationale behind this is that a horizontal shift corresponds to a time shift of the signal, and it is reasonable that this variability could occur between subjects and between experiments. Table 1 shows the preliminary experimental results.

Table 1
Accuracy after 400 epochs without and with augmentation

Emotion      W/O Augmentation   W/ Augmentation
Valence      43%                43%
Arousal      50%                60%
Dominance    50%                63%
Likability   60%                60%
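The random horizontal (time) shift used as augmentation could be implemented along the lines below. The paper only states that the input tensor is offset by up to 20% of the horizontal dimension; whether the shift wraps around or pads with zeros is not specified, so the circular shift here is an assumption.

```python
import torch

def random_time_shift(spec, max_fraction=0.2):
    """Randomly shift a spectrogram tensor along its time axis.

    spec: tensor of shape (channels, n_freqs, n_times).
    The shift is drawn uniformly up to max_fraction of the horizontal (time)
    dimension; it wraps around (circular shift), which is an assumption,
    since the paper only mentions a "random horizontal offset".
    """
    n_times = spec.shape[-1]
    max_shift = int(max_fraction * n_times)
    shift = int(torch.randint(-max_shift, max_shift + 1, (1,)))
    return torch.roll(spec, shifts=shift, dims=-1)

# Example on a synthetic (32, 65, 119) spectrogram tensor
augmented = random_time_shift(torch.randn(32, 65, 119))
print(augmented.shape)  # torch.Size([32, 65, 119])
```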
3. Conclusions and Future Works

In the research literature, it has been largely shown that the knowledge of the user’s emotions can make a significant contribution to the creation of increasingly effective human-machine interaction systems. Several aspects can be analyzed to recognize emotions and, more generally, the user’s affective state. In this article, we have presented a deep learning approach to EEG signal analysis. Specifically, a ResNet101 convolutional neural network takes the EEG spectrogram as input and returns the values of arousal, valence, dominance, and likability.

Our idea is still evolving, so the possible future developments are manifold. These developments can be methodological or applicative. As regards the former, the data at our disposal are clearly too limited to fully exploit the potential of deep neural networks. We therefore need new data, and we are planning to collect it ourselves with the appropriate instrumentation. Another aspect concerns the deep neural network chosen: ResNet101 is only one of the many possibilities that deep learning research makes available today. A further development of our work concerns the data augmentation process, which has been shown to be able to improve the model accuracy. In the system described, the data augmentation concerned only the horizontal shift. Hence, we want to apply new geometric transformations and image processing techniques and verify whether they can further improve the accuracy of the results. As far as application developments are concerned, our idea is to combine physiological data with data related to facial expressions and eye tracking. Our ultimate goal is to improve human-machine interaction, both when the user is dealing with social robots and with recommender systems [14, 15, 16] or multimedia applications [17, 18, 19]. For instance, the information related to the emotions that the user feels when faced with a certain stimulus can be exploited to improve the algorithms for suggesting points of interest to visit (e.g., cultural heritage resources [20, 21, 22] such as museums [23, 24, 25] or restaurants [26, 27]) and itineraries to follow between them [28, 29]. Finally, a future development could concern the increase in the stimulus classes to which the user is subjected. In this article, we analyzed the EEG signal collected while the user watches music videos. It would be interesting to collect and subsequently analyze the EEG signal while the user listens to music [30], reads news articles [31], watches a movie [32], or looks at an image [33].

References

[1] A. F. Bulagang, N. G. Weng, J. Mountstephens, J. Teo, A review of recent approaches for emotion classification using electrocardiography and electrodermography signals, Informatics in Medicine Unlocked 20 (2020) 100363.
[2] J. Russell, A circumplex model of affect, Journal of Personality and Social Psychology 39 (1980) 1161–1178.
[3] F. Cavallo, F. Semeraro, L. Fiorini, G. Magyar, P. Sinčák, P. Dario, Emotion modelling for social robotics applications: a review, Journal of Bionic Engineering 15 (2018) 185–203.
[4] J. Deng, F. Ren, A survey of textual emotion recognition and its challenges, IEEE Transactions on Affective Computing (2021).
[5] N. Webb, A. Ruiz-Garcia, M. Elshaw, V. Palade, Emotion recognition from face images in an unconstrained environment for usage on social robots, in: 2020 International Joint Conference on Neural Networks (IJCNN), 2020, pp. 1–8.
[6] X.-W. Wang, D. Nie, B.-L. Lu, Emotional state classification from EEG data using machine learning approach, Neurocomputing 129 (2014) 94–106.
[7] H. Dabas, C. Sethi, C. Dua, M. Dalawat, D. Sethia, Emotion classification using EEG signals, in: Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence, 2018, pp. 380–384.
[8] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, I. Patras, DEAP: A database for emotion analysis using physiological signals, IEEE Transactions on Affective Computing 3 (2012) 18–31.
[9] H. Donmez, N. Ozkurt, Emotion classification from EEG signals in convolutional neural networks, in: 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), 2019, pp. 1–6.
[10] L. Vaccaro, G. Sansonetti, A. Micarelli, An empirical review of automated machine learning, Computers 10 (2021).
[11] G. Sansonetti, F. Gasparetti, G. D’Aniello, A. Micarelli, Unreliable users detection in social media: Deep learning techniques for automatic detection, IEEE Access 8 (2020) 213154–213167.
[12] M. X. Cohen, Analyzing neural time series data: theory and practice, MIT Press, 2014.
[13] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV, USA, 27–30 June 2016), IEEE Computer Society, Los Alamitos, CA, USA, 2016, pp. 770–778.
[14] F. Gasparetti, G. Sansonetti, A. Micarelli, Community detection in social recommender systems: a survey, Applied Intelligence 51 (2021) 3975–3995.
[15] G. Sansonetti, Point of interest recommendation based on social and linked open data, Personal and Ubiquitous Computing 23 (2019) 199–214.
[16] D. Feltoni Gurini, F. Gasparetti, A. Micarelli, G. Sansonetti, Temporal people-to-people recommendation on social networks with sentiment-based matrix factorization, Future Generation Computer Systems 78 (2018) 430–439.
[17] A. Micarelli, A. Neri, G. Sansonetti, A case-based approach to image recognition, in: Proceedings of the 5th European Workshop on Advances in Case-Based Reasoning, EWCBR ’00, Springer-Verlag, Berlin, Heidelberg, 2000, pp. 443–454.
[18] G. Sansonetti, F. Gasparetti, A. Micarelli, Using social media for personalizing the cultural heritage experience, in: Adjunct Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 189–193.
[19] L. Xie, Z. Deng, S. Cox, Multimodal joint information processing in human machine interaction: Recent advances, Multimedia Tools and Applications 73 (2014) 267–271.
[20] A. De Angelis, F. Gasparetti, A. Micarelli, G. Sansonetti, A social cultural recommender based on linked open data, in: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, UMAP ’17, ACM, New York, NY, USA, 2017, pp. 329–332.
[21] G. Sansonetti, F. Gasparetti, A. Micarelli, Cross-domain recommendation for enhancing cultural heritage experience, in: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization, Association for Computing Machinery, New York, NY, USA, 2019, pp. 413–415.
[22] G. Sansonetti, F. Gasparetti, A. Micarelli, F. Cena, C. Gena, Enhancing cultural recommendations through social and linked open data, User Modeling and User-Adapted Interaction 29 (2019) 121–159.
[23] A. Ferrato, C. Limongelli, M. Mezzini, G. Sansonetti, Using deep learning for collecting data about museum visitor behavior, Applied Sciences 12 (2022).
[24] A. Ferrato, C. Limongelli, M. Mezzini, G. Sansonetti, The meta4rs proposal: Museum emotion and tracking analysis for recommender systems, in: Adjunct Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’22 Adjunct, Association for Computing Machinery, New York, NY, USA, 2022, pp. 406–409.
[25] M. Mezzini, C. Limongelli, G. Sansonetti, C. De Medio, Tracking museum visitors through convolutional object detectors, in: Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization, UMAP ’20 Adjunct, Association for Computing Machinery, New York, NY, USA, 2020, pp. 352–355.
[26] C. Biancalana, F. Gasparetti, A. Micarelli, G. Sansonetti, An approach to social recommendation for context-aware mobile services, ACM Transactions on Intelligent Systems and Technology 4 (2013) 10:1–10:31.
[27] N. Sardella, C. Biancalana, A. Micarelli, G. Sansonetti, An approach to conversational recommendation of restaurants, in: C. Stephanidis (Ed.), HCI International 2019 - Posters, Springer International Publishing, Cham, 2019, pp. 123–130.
[28] D. D’Agostino, F. Gasparetti, A. Micarelli, G. Sansonetti, A social context-aware recommender of itineraries between relevant points of interest, in: HCI International 2016, volume 618, Springer International Publishing, Cham, 2016, pp. 354–359.
[29] A. Fogli, G. Sansonetti, Exploiting semantics for context-aware itinerary recommendation, Personal and Ubiquitous Computing 23 (2019) 215–231.
[30] M. Onori, A. Micarelli, G. Sansonetti, A comparative analysis of personality-based music recommender systems, in: CEUR Workshop Proceedings, volume 1680, CEUR-WS.org, Aachen, Germany, 2016, pp. 55–59.
[31] S. Caldarelli, D. F. Gurini, A. Micarelli, G. Sansonetti, A signal-based approach to news recommendation, in: CEUR Workshop Proceedings, volume 1618, CEUR-WS.org, Aachen, Germany, 2016, pp. 1–4.
[32] C. Biancalana, F. Gasparetti, A. Micarelli, A. Miola, G. Sansonetti, Context-aware movie recommendation based on signal processing and machine learning, in: Proceedings of the 2nd Challenge on Context-Aware Movie Recommendation, CAMRa ’11, ACM, New York, NY, USA, 2011, pp. 5–10.
[33] A. Mensen, W. Marshall, G. Tononi, EEG differentiation analysis and stimulus set meaningfulness, Frontiers in Psychology 8 (2017) 1748.