Proceedings of the 4th Congress on Robotics and Neuroscience

Convolutional neural network for cognitive task prediction from EEG's auditory steady-state responses

Daniela Montilla-Trochez1, Rodrigo Salas1,4, Alejandro Bertin1, Inga Griskova-Bulanova2*, Paulo Lisboa3, Carolina Saavedra1*

1 Universidad de Valparaíso (Escuela de Ingeniería C. Biomédica, Chile); 2 Vilnius University (Department of Neurobiology and Biophysics, Lithuania); 3 Liverpool John Moores University (School of Applied Mathematics, United Kingdom); 4 Centro de Investigación y Desarrollo en Ingeniería en Salud, CINGS-UV, Universidad de Valparaíso, Chile

*For correspondence: daniela.montilla@postgrado.uv.cl (DM); carolina.saavedra@uv.cl (CS)

Abstract: The prediction of cognitive tasks from electroencephalography (EEG) signals has made it possible to discriminate the cognitive states of subjects and to carry out robust monitoring of cognition, which is associated with an individual's attention and behavioral performance and allows greater control in experiments. The objective of this work is to predict the task performed as a function of the auditory steady-state response (ASSR). Twenty-two subjects underwent three types of tasks (counting, reading and rest) accompanied by a constant stimulus. Images obtained from the Inter Trial Phase Coherence (ITPC) were used to train classification algorithms based on convolutional neural networks (CNN) in order to separate the tasks performed by the subjects. Performance evaluation of the classification algorithm shows very good separation between count, read and rest, with an AUROC of 0.95. This is significantly better than a feedforward neural network and a pre-trained convolutional deep neural network.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Introduction
Task detection from the electroencephalography (EEG) signal allows us to discriminate between specific cognitive states and thus monitor cognition. This is associated with the attention and performance of an individual's behavior Papakostas et al. (2017). Parameters such as functional connectivity can be used to distinguish between cognitive states based on the individual's brain activity Gaut et al. (2018), also considering that the late components of the evoked potentials are related to discrimination tasks that reveal complex cognitive processes Saavedra and Bougrain (2012); Saavedra et al. (2019). It is important to highlight that the discrimination of mental tasks has the purpose of monitoring the behavior of an individual, rather than establishing a correlation between EEG measurements and the final result of the tasks. In particular, there are patterns that might be able to detect cognitive states across different users Palaniappan and Raveendran (2001). Part of this behavior is associated with the individual's auditory quality, which can be monitored through evoked potentials that estimate hearing sensitivity. Specifically, the Auditory Steady State Responses (ASSRs) are used to measure the ability of local cortical networks to generate activity and thus differentiate individuals with normal hearing sensitivity from those with varying degrees of sensorineural hearing loss Korczak et al. (2012). It should be noted that ASSRs are obtained when an auditory stimulus that is presented periodically produces an electroencephalographic response. Although there are investigations that address the study of tasks and use artificial neural networks for the classification of Event-Related Potential (ERP) waveforms from the EEG Gupta et al. (1995), there are few that involve a constant
auditory stimulus. On the other hand, deep neural networks have been applied successfully in different fields, with performance that outperforms conventional machine learning techniques. In particular, convolutional neural networks are a special type of deep network that has proved very effective in classifying images, because they have the ability to extract relevant characteristics, which they then use with a non-linear classifier Goodfellow et al. (2016). Moreover, a wide variety of deep learning models have been proposed in order to classify images in complex contexts (see for example Mellado et al. (2019)). The objective of the present work is to predict the mental task performed by a subject from the EEG signal when it is under a steady-state response to an auditory stimulus.

Methods and Material

Acquisition of EEG Signals
We used a dataset from Voicikas et al. (2016) consisting of 28 healthy young male subjects. The auditory stimulus used was the Click trial, which consisted of 20 identical bursts of white noise with a duration of 1.5 ms. The subjects underwent three tasks: in the first, they had to count the number of presentations of the stimulus and report the count at the end, which was requested to guarantee attention; in the second, the subject had to ignore the stimulus presented and try to keep his/her mind blank; and in the third, he/she was asked to silently read an easily readable text presented on a computer screen. At the end of the experiment, the subjects were briefly questioned about the content of the material to control attention. The channels selected to extract the information of the response to the stimulus were F3, F1, Fz, F2, F4, FC3, FC1, FCz, FC2, FC4, C3, C1, Cz, C2, C4. The sampling frequency was set at 1024 Hz, while the number of trials per subject varies between 100 and 120.
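The channel-selection step above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' preprocessing code: the montage list beyond the 15 selected channels and the array shapes are assumptions.

```python
import numpy as np

# The 15 fronto-central channels listed in the text; the surrounding
# montage names are hypothetical placeholders.
ALL_CHANNELS = ["Fp1", "Fp2", "F3", "F1", "Fz", "F2", "F4", "FC3", "FC1",
                "FCz", "FC2", "FC4", "C3", "C1", "Cz", "C2", "C4", "O1", "O2"]
SELECTED = ["F3", "F1", "Fz", "F2", "F4", "FC3", "FC1", "FCz", "FC2",
            "FC4", "C3", "C1", "Cz", "C2", "C4"]
FS = 1024  # sampling frequency in Hz, as in the text

def select_channels(raw, names, keep):
    """Return the rows of a (channels x samples) array matching `keep`."""
    idx = [names.index(ch) for ch in keep]
    return raw[idx, :]

# Example with 10 s of synthetic data in place of a real recording.
raw = np.random.randn(len(ALL_CHANNELS), 10 * FS)
selected = select_channels(raw, ALL_CHANNELS, SELECTED)
print(selected.shape)  # (15, 10240)
```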
Coherent Averaging
The coherent averaging of N trials of EEG signals consists in obtaining, at each instant of time, the average of signal segments of equal size that are aligned to the onset of the applied stimulus. The objective of this technique is to reduce noise and random activations while highlighting the evoked potentials in response to the stimulus. In this work, coherent averages were computed over every 15 trials of EEG signals of the same task.

Inter Trial Phase Coherence
The Inter Trial Phase Coherence (ITPC) averages, over trials, the complex unit vectors obtained from the phase angle of each trial at a given time, represented using Euler's formula. The method was introduced by Tallon-Baudry et al. (1996). The ITPC is mathematically defined by the following equation:

\( ITPC_{tf} = \left| n^{-1} \sum_{r=1}^{n} e^{i k_{tfr}} \right| \)   (1)

where \(n\) represents the number of trials and \(e^{i k_{tfr}}\) is the complex polar representation of the phase angle \(k\) on trial \(r\) at time-frequency point \(tf\). The result of the ITPC is an image whose pixels take values in the range 0 to 1, where 0 indicates evenly distributed phase angles and 1 indicates completely identical phase angles, corresponding to complete coherence Delorme and Makeig (2004).

Figure 1. Images obtained with the ITPC method for each of the mental tasks; each image comes from a different subject: (a) Count, (b) Read, (c) Rest.

In this work, the ITPC images (see Figure 1) are re-scaled to be used in the classifiers.

Classifiers
In this work, three types of classifiers are evaluated, which are explained below:
1. Feedforward or Fully Connected Neural Network (FANN): a type of artificial neural network consisting of one layer of input neurons, one or more layers of hidden neurons, and one output layer. All neurons in one layer are connected to all neurons in the next layer.
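The coherent averaging and the ITPC of Eq. (1) described above can be sketched as follows. This is an illustrative NumPy reimplementation, not the authors' code, and it assumes the phase angles \(k_{tfr}\) have already been extracted (e.g., by a wavelet transform):

```python
import numpy as np

def coherent_average(trials, block=15):
    """Average consecutive blocks of `block` trials.
    `trials` is a (trials x times) array; leftover trials are dropped."""
    n = (trials.shape[0] // block) * block
    return trials[:n].reshape(-1, block, trials.shape[1]).mean(axis=1)

def itpc(phases):
    """Eq. (1): `phases` holds phase angles k_{tfr} with shape
    (trials x times x frequencies). Each angle becomes a unit vector
    e^{ik}; the modulus of their mean over trials lies in [0, 1]."""
    return np.abs(np.exp(1j * phases).mean(axis=0))
```

With identical phase angles on every trial the ITPC is exactly 1, and with angles spread uniformly around the circle it approaches 0, matching the interpretation given above.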
This type of network has no recurrence, no lateral connections, and no connections to layers beyond the immediately following one. The learning algorithm used is backpropagation. These networks have the property of being universal approximators (for more details on these networks, see Allende et al. (2001)). In this work, the network architecture used consists of 2 hidden layers, and the learning algorithm used is RMSprop.
2. Convolutional Neural Network (CNN): a type of artificial neural network that has been applied successfully in computer vision. Its name comes from the mathematical operation carried out in at least one of its layers: the convolution. A CNN is composed of at least 3 kinds of layers:
• Convolution layer: the convolution operation receives the image as input and applies a filter or kernel to it. This layer returns a feature map of the original image, whose dimensions decrease according to the kernel size.
• Pooling layer: the purpose of this layer is to reduce the spatial dimensions of the input volume for the next convolutional layer without affecting its depth. The reduction in size, with its accompanying loss of information, is favorable because the smaller network lowers the computational load of the following layers and can also reduce overfitting.
• Fully Connected Layer: used as the last layer in the CNN. The neurons of the filters are flattened and the information passes through non-linear activation functions. This layer is responsible for classifying the images; the number of output neurons is equal to the number of classes.
It should be noted that a CNN can consist of several convolution layers and several pooling layers.
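The convolution and pooling layers described above can be illustrated with a minimal NumPy sketch: a "valid" convolution (implemented, as in most deep-learning libraries, as a cross-correlation) and a non-overlapping 2×2 max pooling. This is illustrative only and is not the network used in this work:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in deep-learning
    libraries): the output shrinks by kernel size minus one per axis."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; trailing rows/columns that do not
    fill a complete window are dropped."""
    h2, w2 = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
print(conv2d(img, np.ones((3, 3))))  # 3x3 window sums: [[45, 54], [81, 90]]
print(max_pool(img))                 # 2x2 block maxima: [[5, 7], [13, 15]]
```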
In this work, the CNN architecture is composed of two stages. The first stage contains 4 convolution layers with (3×3) kernels, followed by average-pooling and max-pooling layers and 4 batch-normalization layers; this stage is responsible for feature extraction and dimensionality reduction. The second stage contains 2 fully-connected layers, responsible for classification. The learning algorithm used is RMSprop. An ad-hoc CNN model was developed for the available data, with an architecture that tries to preserve parsimony. Because ITPC images are incorporated in addition to the CNN model, the model is called Hybrid-CNN.
3. VGG16: the VGG16 model is a convolutional neural network with a specific architecture that has been applied in different contexts (see Simonyan and Zisserman (2014)). The architecture is composed of 16 layers, of which 13 are convolutional and 3 are fully-connected.

Framework of the proposed model
In this paper, a framework for the classification of tasks from ASSR signals is proposed. The scheme used is shown in Figure 2 and consists of the following stages:
1. Data Acquisition: the EEG signals were recorded using an ANT device of 50 mV/V with 64 WaveGuard EEG channels.
2. Channel selection and coherent trial averaging: from the EEG signals, the 15 channels closest to Cz, where the greatest response to the stimulus is observed, were selected (see section Acquisition of EEG Signals). A coherent averaging of 15 trials is performed for each channel.
3. ITPC: the ITPC method explained in the Inter Trial Phase Coherence section is applied to generate the spectral images. The resulting images have dimensions 21 × 717, and an image bank with 7155 samples was obtained.
4. Feature Extraction: this stage consists of a series of filters that are convolved with the input signals.
Afterwards, max-pooling operations are applied, which downsample the feature maps and, at the same time, act as detectors of relevant characteristics.
5. Classification: at this stage, fully connected layers of neurons with non-linear activation functions are used. From the activated filters obtained in the previous stage, a feature vector is generated and processed by the neural classifier. The result is the final classification into one of the following labels: Read, Count and Rest.

Figure 2. Flowchart of the proposed method

Results
This section presents the results of the different classifiers implemented to predict the tasks from the ASSR data; the best results for each model are reported. Figure 3 shows the confusion matrix of each of the classifiers.

Figure 3. Confusion matrices on the test data for the three types of models: (a) Feedforward Artificial Neural Network, (b) VGG16, (c) Hybrid-CNN.

Table 1 shows the performance metrics of the models evaluated on the test dataset. For all tasks, the best results were obtained by the Hybrid-CNN model, followed by VGG16, with the FANN obtaining the worst performance. For all tasks, the recall and precision values of the Hybrid-CNN classifier range between 0.841 and 0.867, while for VGG16 they range between 0.652 and 0.711. This indicates that the Hybrid-CNN model outperforms the other models in correctly recognizing the tasks.
Model        Metric       Count           Read            Rest
FANN         F1-score     0.125 ± 0.262   0.253 ± 0.283   0.311 ± 0.267
             Recall       0.163 ± 0.343   0.193 ± 0.229   0.246 ± 0.244
             Precision    0.163 ± 0.343   0.423 ± 0.480   0.539 ± 0.481
VGG16        F1-score     0.689 ± 0.042   0.679 ± 0.019   0.675 ± 0.041
             Recall       0.676 ± 0.090   0.672 ± 0.063   0.702 ± 0.052
             Precision    0.711 ± 0.039   0.694 ± 0.056   0.652 ± 0.044
Hybrid-CNN   F1-score     0.866 ± 0.020   0.846 ± 0.019   0.848 ± 0.017
             Recall       0.867 ± 0.032   0.852 ± 0.033   0.841 ± 0.034
             Precision    0.866 ± 0.032   0.841 ± 0.028   0.856 ± 0.025
Table 1. Classifier performance evaluation metrics per task

Model        Loss            Accuracy        MSE
FANN         6.286 ± 2.345   0.583 ± 0.076   0.408 ± 0.100
VGG16        0.781 ± 0.056   0.682 ± 0.024   0.147 ± 0.010
Hybrid-CNN   0.274 ± 0.030   0.903 ± 0.010   0.074 ± 0.007
Table 2. Loss, accuracy, and mean squared error (MSE) of each classifier

As can be seen in the ROC curves (Figure 4), the VGG16 and Hybrid-CNN models lie well above the non-discrimination line, whereas the FANN model lies very close to it. Comparing the performance of the three models, VGG16 performs well on the classification of the tasks but is not an optimal model; in contrast, the Hybrid-CNN algorithm shows the best performance on all tasks, obtaining the best areas under the curve.

Figure 4. ROC curve for each of the classifiers: (a) Feedforward, (b) VGG16, and (c) Hybrid-CNN. The Hybrid-CNN shows a better AUC than the other two methods for the cognitive classification task.

Conclusion
We have proposed the application of a convolutional neural network to analyze and classify EEG signals of auditory steady-state responses. To improve the performance of the convolutional neural network, coherent averaging of 15 trials was performed and images were then obtained by applying the ITPC method.
The results show that, with the proposed pipeline, the prediction of cognitive tasks can be made with an AUROC of 0.95, corresponding to a sensitivity (recall) score of 0.85. Future work is required to increase the number of subjects in the study, especially considering people with some alteration or with cognitive difficulties.

Acknowledgments
The authors acknowledge the support of the grant REDI170367 from CONICYT.

References
Allende H, Moraga C, Salas R. Artificial Neural Networks in Time Series Forecasting: A Comparative Analysis. Kybernetika. 2001; 38(6):685–707.
Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods. 2004; 134(1):9–21.
Gaut G, Li X, Turner B, Cunningham WA, Lu ZL, Steyvers M. Predicting Task and Subject Differences with Functional Connectivity and BOLD Variability. arXiv:1807.04745 [q-bio]. 2018. http://arxiv.org/abs/1807.04745.
Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press; 2016. http://www.deeplearningbook.org.
Gupta L, Molfese DL, Tammana R. An artificial neural-network approach to ERP classification. Brain and Cognition. 1995; 27(3):311–330.
Korczak P, Smart J, Delgado R, Strobel TM, Bradford C. Auditory Steady-State Responses; 2012. https://www.ingentaconnect.com/content/aaa/jaaa/2012/023/003/art03, doi: 10.3766/jaaa.23.3.3.
Mellado D, Saavedra C, Chabert S, Torres R, Salas R. Self-Improving Generative Artificial Neural Network for Pseudo-Rehearsal Incremental Class Learning. Preprints. 2019; 2019070121:1–17. doi: 10.20944/preprints201907.0121.v1.
Palaniappan R, Raveendran P. Cognitive task prediction using parametric spectral analysis of EEG signals. Malaysian Journal of Computer Science. 2001; 14(1):58–67.
Papakostas M, Tsiakas K, Giannakopoulos T, Makedon F.
Towards predicting task performance from EEG signals. In: 2017 IEEE International Conference on Big Data (Big Data); 2017. p. 4423–4425. doi: 10.1109/BigData.2017.8258478.
Saavedra C, Salas R, Bougrain L. Wavelet-based semblance methods to enhance single-trial ERP detection. To be published in Computational Intelligence and Neuroscience. 2019.
Saavedra C, Bougrain L. Processing stages of visual stimuli and event-related potentials. In: The NeuroComp/KEOpS'12 workshop; 2012.
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 2014.
Tallon-Baudry C, Bertrand O, Delpuech C, Pernier J. Stimulus specificity of phase-locked and non-phase-locked 40 Hz visual responses in human. Journal of Neuroscience. 1996; 16(13):4240–4249.
Voicikas A, Niciute I, Ruksenas O, Griskova-Bulanova I. Effect of attention on 40Hz auditory steady-state response depends on the stimulation type: Flutter amplitude modulated tones versus clicks. Neuroscience Letters. 2016; 629:215–220. doi: 10.1016/j.neulet.2016.07.019.