<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classification through EEG Spectrogram Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorenzo Battisti</string-name>
          <email>lor.battisti5@stud.uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Ferrato</string-name>
          <email>ale.ferrato@stud.uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carla Limongelli</string-name>
          <email>limongel@dia.uniroma3.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Mezzini</string-name>
          <email>mauro.mezzini@uniroma3.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Sydney, Australia</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Education, Roma Tre University</institution>
          ,
          <addr-line>Viale del Castro Pretorio 20, 00185 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Engineering, Roma Tre University</institution>
          ,
          <addr-line>Via della Vasca Navale 79, 00146 Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>618</volume>
      <fpage>406</fpage>
      <lpage>409</lpage>
      <abstract>
        <p>Emotion modeling for social robotics has great potential to improve the quality of life of the elderly and of individuals with disabilities by making communication, care, and interactions more effective. It can help individuals with communication difficulties express their emotions. It can also be used to monitor the emotional well-being of elderly persons living alone and alert caregivers or family members if there are signs of distress. More broadly, emotion modeling is necessary to design robots ever closer to human beings, capable of naturally interacting with them by understanding their behavior and reactions. Here, we propose a deep learning technique for emotion classification using electroencephalogram (EEG) signals. We aim to recognize valence, arousal, dominance, and likability. Our technique uses the spectrogram from each of the 32 electrodes placed on the scalp. Then, we employ a ResNet101 convolutional neural network to learn a model capable of predicting several emotions. We built and tested our model on the DEAP dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Emotion classification</kwd>
        <kwd>Electroencephalogram</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Background</title>
      <p>
        Automatic emotion recognition is a vast and complex
area of research. It has attracted the attention of
scientists in many fields, including psychology, artificial
intelligence, neuroscience, and robotics. The main goal
of this research is to create systems capable of
automatically recognizing and interpreting human emotions.
Emotions permeate every aspect of human experience,
from the pleasant joy of spending time with a loved one
to the pain of facing a difficult time in life. Several
models of emotion have been proposed in the literature,
which can be divided into two large groups: categorical
models that represent the space of all emotions as a
finite set, and dimensional models that represent emotions
as points in a continuous multidimensional space. Concerning
dimensional models, three main components are
frequently used to define emotions and affective states:
arousal, valence, and dominance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Arousal refers to an
individual’s level of enthusiasm or activity. High arousal
levels are related to emotions of excitement, whereas
lower ones are associated with relaxation. The positivity
or negativity of an emotional experience is referred to as
valence, while dominance refers to the degree of control
an individual feels over an emotional state. In 2014, Wang
et al. [<xref ref-type="bibr" rid="ref6">6</xref>] collected EEG data from subjects watching movie
clips to assess the association between EEG data and
emotional states. Using a Support Vector Machine
classifier, the authors showed that representing the state-space
model in the form of linear dynamical systems removes
the noise not correlated with emotions. This makes the
classification of emotions more accurate. In 2018, Dabas
et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a 3D emotional model for classifying
the emotions of users watching music videos based on
the DEAP dataset [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In 2019, Donmez and Ozkurt [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
proposed to classify EEG signals by using a convolutional
neural network. They classified three emotions by using
brain signals and spectrogram images.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>
        In this paper, we propose a machine learning [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
technique, more precisely a deep learning [11] technique, for
the realization of a predictive model of emotions using
the EEG signal. We built and tested this model on
one of the best-known publicly available datasets, the DEAP dataset.
This dataset contains the EEG signals of 32 individuals
that were collected while the subjects watched and
listened to music videos taken from YouTube (https://www.youtube.com/). Each subject
was invited to view 40 one-minute videos and then asked
to express her emotions on the dimensional model shown
in Figure 1. Additionally, a parameter called likability
was used to quantify how much the participant liked the
stimulus. For each dimension, the participant was asked
to rate its intensity on a continuous scale between 1 and
9, where 1 stands for minimum intensity, and 9 for
maximum intensity. The EEG signal consists of 32 channels,
each corresponding to an electrode that measures the
difference in electric potential in the scalp area where
it is positioned. The proposed methodology is based
on the spectral analysis of the signal. This is achieved
by applying the Discrete Fourier Transform to the
signal, thus obtaining the power of the individual sinusoids
that make up the signal. The spectrogram obtained for
each EEG channel is a two-dimensional matrix where
each cell (f, t) represents the intensity of the sinusoid at
the frequency f in the time segment t (for more details,
please refer to [12]). Figure 2 shows an example of the
spectral data of the first two EEG channels of one
participant while watching one video. The continuous scale
of each emotion e was transformed into a binary value
b(e) ∈ {0, 1}, so that b(e) = 0 if e &lt; 5 and b(e) = 1 if e ≥ 5.
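As an illustration, the following is a minimal Python sketch of these two preprocessing steps. It assumes the 128 Hz sampling rate of the preprocessed DEAP recordings, and the STFT window parameters are hypothetical, since the paper does not report the exact settings (see [12] for the method actually used).
<preformat>
import numpy as np
from scipy import signal

FS = 128          # sampling rate of the preprocessed DEAP EEG (Hz)
N_CHANNELS = 32   # one spectrogram per electrode

def eeg_to_spectrograms(eeg, nperseg=128, noverlap=64):
    """eeg: array of shape (32, n_samples) for one experiment.
    Returns a (32, n_freqs, n_segments) tensor of log-power spectrograms."""
    specs = []
    for ch in range(N_CHANNELS):
        f, t, Sxx = signal.spectrogram(eeg[ch], fs=FS,
                                       nperseg=nperseg, noverlap=noverlap)
        specs.append(np.log1p(Sxx))  # log scale compresses the dynamic range
    return np.stack(specs)

def binarize(rating):
    """Map a self-assessment rating e in [1, 9] to b(e) in {0, 1}."""
    return 1 if rating >= 5 else 0
</preformat>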
In the following, we denote the viewing of a single video
by a subject as an experiment. We divided the total number
of experiments (i.e., 40 × 32 = 1280) by reserving 32
experiments for the validation set, 32 for the test set, and
the remaining ones for the training set. The experiments
belonging to the validation and test sets were arbitrarily
chosen, one for each participant and each relating to a
different video, so that the experiments with positive
emotions (in which b(e) = 1) and those with negative ones
(in which b(e) = 0) approximately balance each other.
We used the ResNet101 [13] convolutional neural network,
suitably adapted to take as input
a tensor with an arbitrary number of input channels.
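One plausible PyTorch realization of this adaptation is the following; the paper gives no implementation details, so the layer shapes here are illustrative (the stem is replaced to accept 32-channel spectrogram tensors, and the head to emit the two classes b(e) = 0 and b(e) = 1).
<preformat>
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet101(weights=None)
# Stem: accept 32 spectrogram channels instead of 3 RGB channels.
model.conv1 = nn.Conv2d(32, 64, kernel_size=7, stride=2,
                        padding=3, bias=False)
# Head: two output classes for the binarized emotion b(e).
model.fc = nn.Linear(model.fc.in_features, 2)

x = torch.randn(8, 32, 65, 115)  # (batch, channel, freq bins, time bins)
logits = model(x)                # shape: (8, 2)
</preformat>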
We empirically tested different hyperparameter
configurations through a grid search, obtaining the following
optimal values, wired together in the sketch after this list:
        <list list-type="bullet">
          <list-item><p>Loss function: Cross-Entropy</p></list-item>
          <list-item><p>Optimizer: Stochastic Gradient Descent (SGD)</p></list-item>
          <list-item><p>Momentum: 0.9</p></list-item>
          <list-item><p>Weight decay: 0.0005</p></list-item>
          <list-item><p>Learning rate: 3.0e-3</p></list-item>
        </list>
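A minimal PyTorch sketch of this configuration (model is the adapted network from the previous snippet; train_loader is an assumed DataLoader yielding batches of spectrogram tensors and binary labels):
<preformat>
import torch.nn as nn
import torch.optim as optim

# Optimal configuration found by the grid search.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=3.0e-3,
                      momentum=0.9, weight_decay=0.0005)

for epoch in range(300):
    for specs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(specs), labels)
        loss.backward()
        optimizer.step()
</preformat>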
The training set size was limited, so the network tended to
overfit after about 200-300 epochs, reaching 100% accuracy
on the training set. Therefore, we introduced a data
augmentation step based on a horizontal shift of the
spectrogram images.
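A minimal sketch of such an augmentation, assuming a random shift along the time axis implemented as a circular roll (the paper does not specify the shift bound or the boundary handling):
<preformat>
import numpy as np

def horizontal_shift(spec, max_shift=16, rng=np.random):
    """Randomly shift a (channels, freq, time) spectrogram along its
    time axis; max_shift is an assumed bound."""
    k = rng.randint(-max_shift, max_shift + 1)
    return np.roll(spec, k, axis=-1)
</preformat>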
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusions and Future Works</title>
      <p>
In the research literature, it has been largely shown that
the knowledge of the user’s emotions can make a
significant contribution to the creation of increasingly effective
human-machine interaction systems. Several aspects can
be analyzed to recognize emotions and, more generally,
the user’s affective state. In this article, we have
presented a deep learning approach to EEG signal analysis.
Specifically, a ResNet101 convolutional neural network
takes the EEG spectrogram as input and returns the
values of arousal, valence, dominance, and likability.
      </p>
      <p>Our idea is still evolving, so the possible future
developments are manifold. These developments can be
methodological or applicative. As regards the former, clearly the
data at our disposal are too limited to fully exploit the
potential of deep neural networks. We, therefore, need new
data, so we are planning to collect it ourselves with the
appropriate instrumentation. Another aspect concerns
the deep neural network chosen. The ResNet101 is one of
the many possibilities that deep learning research makes
available today. A further development of our work
concerns the data augmentation process, which has been
shown to be able to improve the model accuracy. In the
system described, the data augmentation concerned only
the horizontal shift. Hence, we want to apply new
geometric transformations and image processing techniques
and verify whether they can further improve the
accuracy of the results. As far as application developments are
concerned, our idea is to combine physiological data with
those related to facial expressions and eye tracking. Our
ultimate goal is to improve human-machine interaction,
both when the user is dealing with social robots and with
recommender systems [14, 15, 16] or multimedia
applications [17, 18, 19]. For instance, the information related to
the emotions that the user feels when faced with a certain</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Bulagang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. G.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mountstephens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Teo</surname>
          </string-name>
          ,
          <article-title>A review of recent approaches for emotion classification using electrocardiography and electrodermography signals</article-title>
          ,
          <source>Informatics in Medicine Unlocked</source>
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <fpage>100363</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <article-title>A circumplex model of affect</article-title>
          ,
          <source>Journal of Personality and Social Psychology</source>
          <volume>39</volume>
          (
          <year>1980</year>
          )
          <fpage>1161</fpage>
          -
          <lpage>1178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cavallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fiorini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Magyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sinčák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dario</surname>
          </string-name>
          ,
          <article-title>Emotion modelling for social robotics applications: a review</article-title>
          ,
          <source>Journal of Bionic Engineering</source>
          <volume>15</volume>
          (
          <year>2018</year>
          )
          <fpage>185</fpage>
          -
          <lpage>203</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <article-title>A survey of textual emotion recognition and its challenges</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Webb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ruiz-Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Elshaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Palade</surname>
          </string-name>
          ,
          <article-title>Emotion recognition from face images in an unconstrained environment for usage on social robots</article-title>
          , in: 2020
          <source>International Joint Conference on Neural Networks (IJCNN)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>X.-W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <article-title>Emotional state classification from EEG data using machine learning approach</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>129</volume>
          (
          <year>2014</year>
          )
          <fpage>94</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dabas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sethi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dalawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sethia</surname>
          </string-name>
          ,
          <article-title>Emotion classification using EEG signals</article-title>
          ,
          in:
          <source>Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>380</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Koelstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Muhl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soleymani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Yazdani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ebrahimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Pun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nijholt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Patras</surname>
          </string-name>
          ,
          <article-title>DEAP: A database for emotion analysis using physiological signals</article-title>
          ,
          <source>IEEE Transactions on Affective Computing</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
          <fpage>18</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Donmez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ozkurt</surname>
          </string-name>
          ,
          <article-title>Emotion classification from EEG signals in convolutional neural networks</article-title>
          ,
          in:
          <source>2019 Innovations in Intelligent Systems and Applications Conference (ASYU)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Vaccaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sansonetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Micarelli</surname>
          </string-name>
          ,
          <article-title>An empirical review of automated machine learning</article-title>
          ,
          <source>Computers</source>
          <volume>10</volume>
          (
          <year>2021</year>
          )
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>