Extreme Learning Machines For Efficient Speech Emotion Estimation In Julia

Georgios Drakopoulos∗, Phivos Mylonas
Department of Informatics, Ionian University, Tsirigoti Sq. 7, Kerkyra 49100, Hellas

CIKM'22: 31st ACM International Conference on Information and Knowledge Management (companion volume), October 17–21, 2022, Atlanta, GA
∗ Corresponding author: c16drak@ionio.gr (G. Drakopoulos); fmylonas@ionio.gr (P. Mylonas)

Abstract

Speech is a mainstay of communication across literally all human activities. Besides facts and statements, speech carries substantial information regarding experiences, thoughts, and emotions, therefore adding significant context. Moreover, non-linguistic elements such as pauses add even more to the message. The field of speech emotion recognition (SER) has emerged precisely to develop algorithms and tools performing what humans learn to do from early on. One promising line of research comes from applying deep learning techniques trained on numerous audio attributes to discern between various emotions as dictated by a given model of fundamental human emotions. Extreme learning machines (ELMs) are neural network architectures achieving efficiency through simplicity and can potentially operate akin to a sparse coder. When trained on a plethora of audio attributes, such as cepstral coefficients, zero crossing rate, and autocorrelation, an ELM can classify emotions in speech based on the established emotion wheel model. The evaluation, done with the Toronto emotional speech set (TESS) on an ELM implemented in Julia, is quite encouraging.

Keywords

extreme learning machine, speech emotion recognition, emotion classification, Plutchik model, higher order patterns, spectrogram, cepstral coefficients, zero crossing rate, TESS dataset, Julia

1. Introduction

Language, whether oral or written, is among the major sources of human emotion and perhaps a mainstay of civilization itself. Almost since its formulation, the field of speech emotion recognition (SER) has been a demanding field systematically garnering intense interdisciplinary interest, since it aims to answer fundamental questions regarding human speech, which includes major elements such as intonation and pitch as well as latent and non-linguistic elements such as pauses and the length of sentences. Because of the complexity and volatility of human speech, SER relies heavily on machine learning (ML) and recently on deep learning (DL) techniques for performing its tasks.

Human emotion models such as the emotion wheel by Plutchik [1] and the universal emotion models [2] have been developed to explain not only which emotions are fundamental, with interpretations ranging from social conditioning to brain functionality and evolutionary goals, but also how they are composed, which may well entail non-linear operations. In any case, such models can serve well as training guides to ML models for speech emotion classification, as is the case here.

Among the various models proposed for the various SER tasks, extreme learning machines (ELMs) have shown considerable potential. The latter can be at least partially attributed to the ELM structure, which has only a single but very long hidden layer. In turn, this allows for straightforward and easy to interpret training schemes, all of which eventually stem from a synaptic weight regularization property. This is aligned with the intuition that a certain optimality condition should hold in order for the weights to be uniquely derived.

The primary research contribution of this conference paper is the development of an ELM implemented in Julia and operating like a sparse encoder for the emotion classification of sentences coming from the ubiquitous Toronto emotional speech set (TESS) collection, a benchmark for training ML and DL models for SER tasks.

The remainder of this work is structured as follows. The recent scientific literature regarding ELMs, SER, and graph mining is briefly reviewed in section 2. In section 3 the proposed methodology is described, whereas the results obtained using the TESS dataset are analysed in section 4. Possible future research directions are given in section 5. Bold capital letters denote matrices, bold small ones vectors, and small ones scalars. Acronyms are explained the first time they are encountered in the text. Finally, the notation of this work is summarized in table 1.

Table 1
Notation Summary

Symbol     Meaning                        First appearance
≜          Equality by definition         Eq. (1)
‖·‖        Vector or matrix norm          Eq. (6)
φ(·)       Activation function            Eq. (3)
tanh(·)    Hyperbolic tangent function    Eq. (3)
tr(·)      Matrix trace                   Eq. (7)
2. Related Work

Because of its interdisciplinary nature, SER has been at the attention focus of a number of fields [3]. To address the inherent complexity of the SER tasks, ML approaches such as ensemble learning [4], deep convolutional neural networks [5], domain invariant feature learning [6], two-dimensional convolutional neural networks [7], and multimodal deep learning [8] have been proposed in the scientific literature. Human emotion models such as the emotion wheel by Plutchik [1] or the universal emotion theory by Ekman [2] typically describe a fundamental set of emotions [9, 10, 11] along with composition rules and possible evolutionary explanations for them [12]. More recently, personality taxonomies go beyond single emotional reactions and treat personality as a whole, such as the Myers-Briggs type indicator (MBTI). A reasoning based framework for emotion classification is [13].

ELMs have been used in ML because of the simplicity of their architecture [14].
They have been used as part of ML pipelines for wavelet transforms [15], in conjunction with an autoencoder for predicting the concentration of emitted greenhouse gases from boilers [16], and in optimizing a Kalman filter for determining the aging factors of supercapacitors [17]. Further applications include estimating soil thermal conductivity [18] and an evolving kernel assisted ELM for medical diagnosis [19], whereas an extensive list of applications is given in [20].

Graph mining is a field relying heavily on ML [21] and graph signal processing techniques [22]. Regarding the use of ML, self organizing maps (SOMs) for recommending cultural content are presented in [23], exploiting natural language attributes for finding linked requirements between software artefacts is the focus of [24], decompressing a sequence of Twitter graphs compressed with the two-dimensional discrete cosine transform using a tensor stack network (TSN) is described in [25], combining graph mining with transformers is shown in [26], advanced graph clustering techniques for classifying variations of cancer genomes are developed in [27], message passing graph neural networks for fuzzy [28] and ordinary Twitter graphs [29] are described, a GPU-based system for efficient graph mining is shown in [30], partitioning the user base of a portal for cultural content recommendation is explained in [31], visualizing massive graphs for human feedback is described in [32], approximating directed graphs with undirected ones under optimality conditions is shown in [33], classification of noisy graphs is given in [34], sequential graph collaborative filtering is the topic of [35], mining hot spots in trajectories with graph based methodologies is developed in [36], and fMRI image classification with tensor distance metrics is the focus of [37].

3. Methodology

3.1. Emotion Model

Emotion models have been developed in order to explain how emotions work, their intensity and elicitation conditions, how they may be composed in the case of emotion levels, and possibly their evolutionary purpose. In this set of models the one proposed by Plutchik has been among the earliest and one of the most commonly used in engineering applications. Additionally, it has an easy to understand and intuitively friendly visual interpretation, which is shown in figure 1. Notice that this figure depicts a two dimensional projection of a cone.

Figure 1: Plutchik model (From Wikipedia).

According to this model each emotion corresponds to a location in a circle which is primarily a function of its valence as well as of its direction. The latter is related to the nature of the emotion under consideration, which also determines at least in part its polarity. Specifically, there are in total eight directions with three scales each. Moreover, there are some emotions which are combinations of others from two directions. Additionally, the set of emotions is categorized as basic, primary, secondary, and tertiary. Primary emotions are archetypes the remaining ones are patterned after or are derived from. They are characterized by especially high survival value.

Table 2
Primary Emotions In Plutchik's Model

Emotion        Polarity                Opposite
Neutral        Neutral                 Neutral
Surprise       Positive or negative    Anticipation
Anticipation   Positive or negative    Surprise
Joy            Positive                Sadness
Trust          Positive                Disgust
Anger          Negative                Fear
Sadness        Negative                Joy
Disgust        Negative                Trust
Fear           Negative                Anger

As stated earlier, each of the above emotions has an associated polarity. For most emotions this polarity is clear, although in some cases, such as surprise, it has to be determined by the context. The intensity of each emotion essentially determines its location on a given affective axis. The higher the intensity, the more emotional a person is at a given time.

The primary emotions in the model of Plutchik are shown in table 2. Their location on the intensity scale and their relationship to other emotions are shown in figure 1. Moreover, the primary emotions come in four bipolar pairs of opposite emotions, in the sense that they accomplish opposite objectives and their physical manifestations are considerably different. Bipolarity does not necessarily mean that in every pair there is one feeling with positive polarity, though this may be the case as in the pair of joy and sadness. Instead, both emotions in the anger and fear pair are perceived as negative, but they are diametric opposites in the context of fight or flight.
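Since the training formulation of section 3.2 relies on a ground truth matrix Y holding a one-hot encoding of these primary emotions, a small Julia sketch of one possible encoding follows. The emotion ordering, the symbol names, and the helper functions are illustrative assumptions rather than the exact encoding used in this work.

```julia
# Illustrative sketch only: one way to one-hot encode the primary emotions of
# Table 2 and to record their bipolar opposites. Names and ordering are ours.
const EMOTIONS = [:neutral, :surprise, :anticipation, :joy, :trust,
                  :anger, :sadness, :disgust, :fear]

# One-hot vector for a single emotion label.
onehot(e::Symbol) = Float64.(EMOTIONS .== e)

# Ground-truth matrix with one row per labelled data point.
onehot_matrix(labels) = reduce(vcat, (onehot(e)' for e in labels))

# Bipolar opposites as listed in Table 2.
const OPPOSITE = Dict(:surprise => :anticipation, :anticipation => :surprise,
                      :joy => :sadness,   :sadness => :joy,
                      :trust => :disgust, :disgust => :trust,
                      :anger => :fear,    :fear => :anger,
                      :neutral => :neutral)

Y = onehot_matrix([:joy, :fear, :neutral])   # 3×9 matrix, one row per sample
```

A matrix built this way plays the role of the target Y appearing in the training objective of section 3.2.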
3.2. Training

ELM training is simpler than that of other neural network architectures since an ELM has only one hidden layer with a large number 𝑝 of processing neurons. With proper training, each neuron can be specialized in a particular subset of the training set, which comprises 𝑛 data vectors. In this case the ELM output matrix H has the elementwise structure of equation (1). From its structure, the 𝑖-th row of H contains the output of each neuron for the 𝑖-th data point, 0 ≤ 𝑖 ≤ 𝑛 − 1, while the 𝑗-th column consists of the output of the 𝑗-th neuron, 0 ≤ 𝑗 ≤ 𝑝 − 1, across all the available data points, preserving the order in which they were given to the ELM.

$$\mathbf{H} \triangleq \begin{bmatrix}
h_0(\mathbf{x}_0) & h_1(\mathbf{x}_0) & \dots & h_{p-1}(\mathbf{x}_0) \\
h_0(\mathbf{x}_1) & h_1(\mathbf{x}_1) & \dots & h_{p-1}(\mathbf{x}_1) \\
\vdots & \vdots & \ddots & \vdots \\
h_0(\mathbf{x}_{n-1}) & h_1(\mathbf{x}_{n-1}) & \dots & h_{p-1}(\mathbf{x}_{n-1})
\end{bmatrix} \in \mathbb{R}^{n \times p} \quad (1)$$

The individual output of the 𝑗-th neuron can be computed from the nonlinear combination of equation (2). Therein 𝑞 is the number of input neurons, which is much smaller than the number of hidden neurons 𝑝, namely 𝑞 ≪ 𝑝, and equal to the dimensionality of each data point.

$$h_j(\mathbf{x}_i) \triangleq \varphi_j\!\left(\sum_{k=1}^{q} w_{j,k}\,\mathbf{x}_i[k]\right) = \varphi_j\!\left(\mathbf{w}_j^T \mathbf{x}_i\right) \quad (2)$$

The nonlinear activation function 𝜑ₖ(⋅) may take a number of forms such as the logistic function or polynomial kernels. In this case it is the hyperbolic tangent function of (3). It has the advantage of being differentiable and of being the Bayes estimator of a bipolar source under additive white Gaussian noise (AWGN).

$$\varphi_k(x; \beta_0) \triangleq \tanh(\beta_0 x) \quad (3)$$

The first derivative 𝜓 of 𝜑 can be expressed as a second order polynomial of the latter, as shown in (4). This expression is that of the Malthus population models.

$$\psi(x; \beta_0) \triangleq \frac{\partial \varphi(x; \beta_0)}{\partial x} = \beta_0\left(1 - \varphi^2(x; \beta_0)\right) \quad (4)$$

The column synaptic weight vector w𝑗 is formed by stacking the 𝑞 weights 𝑤ⱼ,ₖ connecting the 𝑗-th hidden neuron with the 𝑘-th input one. Moreover, this is also the 𝑗-th column of the synaptic weight matrix W. If the data points are stacked on top of each other, then the input matrix X is formed. Thus H of (1) can be rewritten as in (5), where the function of (3) is applied elementwise.

$$\mathbf{H} \triangleq \varphi\!\left(\mathbf{W}\mathbf{X}^T\right) \quad (5)$$
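The forward computation of equations (1)-(5) can be summarised compactly. The following Julia sketch follows the elementwise form h_j(x_i) = φ(w_jᵀx_i) of equation (2), with data points stored as rows of X and the weight vectors w_j as columns of W; up to this row/column convention it is the same quantity as φ(WXᵀ) in equation (5). Variable names, sizes, and the β₀ default are illustrative assumptions, not the paper's actual implementation.

```julia
# Minimal sketch of the ELM hidden layer response of eqs. (1)-(5).
φ(x, β0) = tanh(β0 * x)                      # activation of eq. (3)

# X: n×q matrix, one attribute vector per row.
# W: q×p synaptic weight matrix, the j-th column holding the weights of the
#    j-th hidden neuron, so that H[i, j] = φ(w_jᵀ x_i) as in eq. (2).
elm_hidden(X, W; β0 = 1.0) = φ.(X * W, β0)   # H ∈ ℝ^{n×p}

# Toy usage with illustrative sizes satisfying q ≪ p.
n, q, p = 4, 3, 12
X = randn(n, q)
W = 0.01 .* randn(q, p)
H = elm_hidden(X, W)                         # 4×12 matrix of neuron outputs
```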
In general ELMs, depending on their training formulation, can perform regularized least squares fitting in order to determine the optimal weights as in (6), where 𝜂0 is a hyperparameter. Therein the regularization term adds robustness to the algorithmic minimization process. To this end, the nonlinear least squares problem of (6) was formulated, where the Frobenius matrix norm is used since it is differentiable. Also Y is the ground truth matrix containing the one hot encoding of the eight primary emotions and W* is the solution.

$$\mathbf{W}^* \triangleq \operatorname*{argmin}_{\mathbf{W}}\,[J] = \operatorname*{argmin}_{\mathbf{W}}\left[\eta_0 \left\|\mathbf{W}\right\|_F^2 + \left\|\varphi\!\left(\mathbf{W}\mathbf{X}^T\right) - \mathbf{Y}\right\|_F^2\right] \quad (6)$$

Expanding (6) and taking into consideration the expansion of the Frobenius norm, the objective function 𝐽 to be minimized can be recast as in (7). Because of the form of the Frobenius norm and that of the nonlinear activation function, 𝐽 is not only differentiable but it also has a single global minimum. Additionally, the regularization term ensures that synaptic weight sparsity is also taken into consideration. Thus, minimizing 𝐽 translates into finding the weight set achieving a tradeoff between fitting the ELM response to the target response and using the least possible energy. The latter can be considered as the explanation closest to that dictated by Occam's razor.

$$J = \eta_0\,\mathrm{tr}\!\left(\mathbf{W}^T\mathbf{W}\right) + \mathrm{tr}\!\left(\left(\varphi\!\left(\mathbf{W}\mathbf{X}^T\right) - \mathbf{Y}\right)^T\left(\varphi\!\left(\mathbf{W}\mathbf{X}^T\right) - \mathbf{Y}\right)\right) \quad (7)$$

The minimization problem of (7) is a regularized nonlinear least squares problem. The hyperparameter 𝜂0 determines the relative weight of the synaptic weight matrix sparsity compared to how well the ELM response matches the target response. The problem of (7) can be solved by a plethora of methodologies, including iterative ones such as fixed point methods. However, they should take into consideration the nonlinear term introduced by the activation function. This can be accomplished by utilizing methods such as the Gauss-Newton method or a regularized version thereof. In this work the steepest descent iterative method was selected because of its simplicity and because of the single global minimum 𝐽 has, since the latter is essentially a sum of squares.

Furthermore, it can be argued that the proposed ELM, if properly trained, operates like a sparse coder with each activation neuron corresponding to a single emotion. This approach can clearly be extended to an arbitrary number of emotions, provided of course that the appropriate attributes are available. However, the ELM proposed here can in fact discover the emotional direction and not the valence itself.
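As a concrete illustration of the steepest descent option discussed above, the following Julia sketch minimizes the objective of equations (6)-(7) using the derivative of equation (4). The step size, iteration count, initialization, and the assumption that Y has the same shape as the hidden layer response are ours, so this is a sketch of the training scheme rather than the exact implementation used in this work.

```julia
# Sketch of regularized steepest descent for J(W) = η0‖W‖²_F + ‖φ(XW) − Y‖²_F,
# with H = φ.(X * W) as in the earlier forward-pass sketch (eq. (2) elementwise).
φ(x, β0) = tanh(β0 * x)          # activation, eq. (3)
ψ(y, β0) = β0 * (1 - y^2)        # derivative written in terms of φ, eq. (4)

# X: n×q attribute matrix, Y: n×p target matrix (same shape as H), W: q×p.
function train_elm(X, Y; β0 = 1.0, η0 = 0.1, μ = 1e-3, iters = 5_000)
    q, p = size(X, 2), size(Y, 2)
    W = 0.01 .* randn(q, p)                            # small random start
    for _ in 1:iters
        H = φ.(X * W, β0)                              # hidden layer response
        E = H .- Y                                     # fitting error
        ∇J = 2η0 .* W .+ 2 .* (X' * (E .* ψ.(H, β0)))  # gradient of eq. (7)
        W .-= μ .* ∇J                                  # steepest descent step
    end
    return W
end
```

In use, X would hold the attribute vectors of section 3.3 and Y the one-hot targets sketched earlier, e.g. W = train_elm(X, Y).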
3.3. Attributes

In this subsection the various features used to train the ELM described here, their primary properties, and their respective meaning are explained. Said attributes are also shown in table 3 along with a brief explanation.

The cepstral coefficients 𝑐[𝑘] express a modified short term power spectrum of a signal 𝑠[𝑘] consisting of speech samples. They are derived by an algorithmic process which involves the following steps:

• The sequence is pre-emphasized such that higher frequencies receive an energy boost.
• The spectrum is smoothed with a window, usually a Hamming window of odd length.
• The power spectrum is translated to the nonlinear Mel scale where resolution is not constant.
• The logarithm of said power spectrum, which is always real, is computed.
• The coefficients of the inverse Fourier transform are the cepstral coefficients.

The natural meaning of the cepstral coefficients is that they represent a power spectrum where each frequency band has a resolution roughly inversely proportional to its central frequency. This allows the details of a speech signal to be more discernible.

The spectrogram of a signal is a function of time and frequency and shows how its frequency content evolves in small time steps. Typically it can be obtained by the wavelet transform, by the short time Fourier transform (STFT), or by a bank of bandpass filters such as Gabor and shifted Chebyshev filters. In any case, the resulting heatmap has to be transformed into a long column vector, which incurs some information loss as the spatial structure is lost. This is attributed to the fact that the proposed ELM is trained with data points which are real vectors. An architecture natively handling matrices may be more adept in this scenario.

The 𝑘-th autocorrelation coefficient 𝑎[𝑘] of any real-valued stationary sequence 𝑠[𝑖] is defined as the expected value of the sequence multiplied by a version of itself shifted by 𝑘 positions. In practice these stochastic coefficients are often approximated by the sample mean of equation (8) under the assumption of ergodicity. Autocorrelation coefficients are a measure of the self-similarity of the sequence under consideration and play a central role in discovering higher order patterns through the Wiener filter. It should be noted that the higher 𝑘 is, the less reliable the estimation of 𝑎[𝑘] becomes, as fewer term pairs are available. Therefore, 𝑘 in most engineering applications is small compared to the total length 𝑛 of the speech sample sequence. As a direct consequence of the Cauchy-Schwarz inequality, the maximum autocorrelation coefficient is the first one, 𝑎[0].

$$a[k] \approx \frac{1}{n-k}\sum_{i=0}^{n-k-1} s[i]\,s[i+k], \qquad 0 \le k \le n-1 \quad (8)$$

Finally, the zero crossing rate (ZCR) is an important feature which assumes that the mean value of the speech signal has been subtracted from it during a preprocessing phase. The ZCR is closely tied with the primary mode of the Hilbert-Huang spectrum (HHS), which is built on fundamental signals inherent in the sequence. Thus, intuitively speaking, the HHS is very similar to the Fourier spectrum, but it is composed of basis signals progressively extracted from the original signal itself and hence having irregular shapes instead of weighted complex exponentials. In this context the ZCR plays a role analogous to that of the fundamental frequency in Fourier analysis.

The audio attributes used in this work are also shown in table 3 along with their interpretation.

Table 3
Audio Attributes

Attribute              Meaning
Cepstral coefficients  Short term windowed power spectrum
Spectrogram            Frequency content evolution over short time steps
Autocorrelation        Self-similarity patterns in the speech sequence
Zero crossing rate     Tied to primary mode of Hilbert-Huang spectrum
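Two of these attributes can be computed directly from the definitions above; the Julia sketch below implements the sample autocorrelation of equation (8) and a zero crossing rate on a mean-subtracted signal. The cepstral coefficients and the spectrogram are omitted since they additionally require an FFT library. Function names, the synthetic test tone, and the choice of maximum lag are illustrative assumptions.

```julia
# Sketch of two audio attributes of Table 3, following the definitions in the text.

# Sample autocorrelation of eq. (8): a[k] ≈ (1/(n−k)) Σ s[i] s[i+k], for k ≪ n.
function autocorr(s::AbstractVector{<:Real}, kmax::Integer)
    n = length(s)
    # Returned index 1 corresponds to a[0] of eq. (8) due to 1-based indexing.
    [sum(s[i] * s[i + k] for i in 1:(n - k)) / (n - k) for k in 0:kmax]
end

# Zero crossing rate: fraction of adjacent sample pairs changing sign after
# the mean has been removed, as assumed in the preprocessing described above.
function zcr(s::AbstractVector{<:Real})
    x = s .- sum(s) / length(s)
    count(x[i] * x[i + 1] < 0 for i in 1:(length(x) - 1)) / (length(x) - 1)
end

s = sin.(2π * 440 .* (0:1/16_000:0.1))   # synthetic 440 Hz tone as a stand-in
a = autocorr(s, 20)                       # 21 coefficients; a[1] is a[0] of eq. (8)
r = zcr(s)                                # ≈ 2 · 440 / 16000 for this tone
```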
4. Results

4.1. ELM Architecture

The architecture of the proposed ELM is shown in figure 2. Notice that all hidden neurons belong to the same ELM layer and that they are conceptually but not physically segmented, to show that they are an integer multiple of the neurons of the input layer. Hence the hidden layer can be thought of as comprising segments, although in practice all hidden neurons are simultaneously trained.

Figure 2: Proposed architecture (the TESS attributes feed the input layer, which is followed by hidden layer segments 1 through n).

The implementation language of choice was Julia. It is a rapidly emerging multiparadigm high level language aiming at computation-heavy tasks such as those frequently encountered in DL and ML scenarios, large database clustering, extensive and fine grained simulations, and graph signal processing.
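To make the segmentation of figure 2 concrete, the following Julia sketch defines a small container type whose hidden layer size is an integer multiple m of the q input attributes. The struct, its fields, the constructor defaults, and the example sizes are our illustrative assumptions rather than the paper's actual code.

```julia
# Sketch of the architecture of Figure 2: one hidden layer whose p = m·q
# neurons are conceptually grouped into m segments of q neurons each.
struct ELM
    W::Matrix{Float64}   # q×p synaptic weights, columns = hidden neurons
    β0::Float64          # activation slope of eq. (3)
end

# Build an ELM with m hidden segments for q-dimensional attribute vectors.
function ELM(q::Integer, m::Integer; β0 = 1.0)
    p = m * q
    return ELM(0.01 .* randn(q, p), β0)
end

# Hidden layer response for a batch of attribute vectors stored as rows of X.
respond(elm::ELM, X::AbstractMatrix) = tanh.(elm.β0 .* (X * elm.W))

elm = ELM(40, 8)                 # e.g. 40 attributes, 8 segments, 320 neurons
H = respond(elm, randn(5, 40))   # response for 5 hypothetical data points
```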
4.2. Emotion Recognition

The TESS dataset contains 200 target words spoken in the context of a carrier phrase by two actresses, a younger and an older one, aged 26 and 64 respectively. Each recording contains 2000 data points, which are sufficient for processing, and the recordings represent the neutral state plus six of the primary emotions according to Plutchik's model, namely those of anger, disgust, fear, happiness, pleasant surprise, and sadness. Therefore, from the emotions listed in table 2, anticipation and trust are absent. Consequently, from the four pairs of primary bipolar emotions only two are fully present in TESS.

In figure 3 is shown the heatmap resulting from the analysis of the ELM training. From it the following can be immediately inferred:

• The neutral emotional state is the only one which can be accurately discovered in the context of this work. This can be attributed to the fact that, compared to the other states, it has no valence. In turn this allows its isolation from the rest of the states in the attribute space with a margin sufficient for the ELM to discern it.
• On the contrary, anger is the most difficult to discover. A possible explanation is that its bipolar opposite emotion is also present in TESS and, thus, certain instances have been misattributed to it. Moreover, anger is also confused with surprise and sadness. The former is possibly due to valence, whereas the latter is because of polarity.
• Concerning the other bipolar pair of sadness and happiness, they are clearly distinguished from each other, but nevertheless there is a small probability they will be misclassified respectively as disgust and as pleasant surprise. This can be attributed to their valence as well as to the semantics of each emotion under consideration.
• The remaining emotions can also be distinguished relatively easily from the others in the dataset. Still, the negative emotions tend to be classified with a lower level of accuracy compared to the positive ones, with the single exception of sadness. This can be explained by their prevalence in TESS.

Figure 3: ELM heatmap.

In summary, the heatmap reveals a performance level which may be satisfactory for certain applications. Still, as negative emotions, with the sole exception of sadness, tend to be less accurately identified compared to the positive ones, there is room for improvement.

5. Conclusions

The focus of this conference paper is the development of an extreme learning machine (ELM) for speech emotion recognition (SER) based on the primary emotions identified in Plutchik's model. Based on a wide array of audio attributes, an ELM is trained to act like a sparse coder with the nine fundamental emotions one-hot encoded in an output vector. The proposed approach is flexible, as the training phase of an ELM is much simpler compared to that of other neural network architectures, especially the fundamental multilayer perceptron. The results obtained with waveforms taken from the established Toronto emotional speech set (TESS) are very encouraging in terms of accuracy.

Regarding future research directions, the proposed neural network architecture and the associated encoding can be tested with other publicly available speech datasets such as the Emo-Soundscape or SUSAS. Moreover, an ELM can be adapted to other human emotion models such as the big five or the universal emotion theory. Finally, attribute vectorization can be avoided with architectures capable of natively handling two-dimensional attributes such as the class of graph neural networks.

Acknowledgments

This conference paper is part of Project 451, a long term research initiative with a primary objective of developing novel, scalable, numerically stable, and interpretable higher order analytics.

References

[1] A. Semeraro, S. Vilella, G. Ruffo, PyPlutchik: Visualising and comparing emotion-annotated corpora, PLoS ONE 16 (2021).
[2] A. Talipu, A. Generosi, M. Mengoni, L. Giraldi, Evaluation of deep convolutional neural network architectures for emotion recognition in the wild, in: ISCT, IEEE, 2019, pp. 25–27.
[3] T. M. Wani, T. S. Gunawan, S. A. A. Qadri, M. Kartiwi, E. Ambikairajah, A comprehensive review of speech emotion recognition systems, IEEE Access 9 (2021) 47795–47814.
[4] W. Zehra, A. R. Javed, Z. Jalil, H. U. Khan, T. R. Gadekallu, Cross corpus multi-lingual speech emotion recognition using ensemble learning, Complex & Intelligent Systems 7 (2021) 1845–1854.
[5] S. Kwon, Optimal feature selection based speech emotion recognition using two-stream deep convolutional neural network, International Journal of Intelligent Systems 36 (2021) 5116–5135.
[6] C. Lu, Y. Zong, W. Zheng, Y. Li, C. Tang, B. W. Schuller, Domain invariant feature learning for speaker-independent speech emotion recognition, IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 (2022) 2217–2230.
[7] Z. Zhao, Q. Li, Z. Zhang, N. Cummins, H. Wang, J. Tao, B. W. Schuller, Combining a parallel 2D CNN with a self-attention dilated residual network for CTC-based discrete speech emotion recognition, Neural Networks 141 (2021) 52–60.
[8] S. Zhang, X. Tao, Y. Chuang, X. Zhao, Learning deep multimodal affective features for spontaneous speech emotion recognition, Speech Communication 127 (2021) 73–81.
[9] P. Sreeja, G. Mahalakshmi, Emotion models: A review, International Journal of Control Theory and Applications 10 (2017) 651–657.
[10] K. R. Scherer, et al., Psychological models of emotion, The Neuropsychology of Emotion 137 (2000) 137–162.
[11] S. Marsella, J. Gratch, P. Petta, et al., Computational models of emotion, A Blueprint for Affective Computing: A Sourcebook and Manual 11 (2010) 21–46.
[12] R. M. Nesse, Evolutionary explanations of emotions, Human Nature 1 (1990) 261–289.
[13] A. Lieto, G. L. Pozzato, S. Zoia, V. Patti, R. Damiano, A commonsense reasoning framework for explanatory emotion attribution, generation and re-classification, Knowledge-Based Systems 227 (2021).
[14] Q.-Y. Zhu, A. K. Qin, P. N. Suganthan, G.-B. Huang, Evolutionary extreme learning machine, Pattern Recognition 38 (2005) 1759–1763.
[15] S. Yahia, S. Said, M. Zaied, Wavelet extreme learning machine and deep learning for data classification, Neurocomputing 470 (2022) 280–289.
[16] Z. Tang, S. Wang, X. Chai, S. Cao, T. Ouyang, Y. Li, Auto-encoder-extreme learning machine model for boiler NOx emission concentration prediction, Energy 256 (2022).
[17] D. Li, S. Li, S. Zhang, J. Sun, L. Wang, K. Wang, Aging state prediction for supercapacitors based on heuristic Kalman filter optimization extreme learning machine, Energy 250 (2022).
[18] N. Kardani, A. Bardhan, P. Samui, M. Nazem, A. Zhou, D. J. Armaghani, A novel technique based on the improved firefly algorithm coupled with extreme learning machine (ELM-IFF) for predicting the thermal conductivity of soil, Engineering with Computers 38 (2022) 3321–3340.
[19] J. Xia, D. Yang, H. Zhou, Y. Chen, H. Zhang, T. Liu, A. A. Heidari, H. Chen, Z. Pan, Evolving kernel extreme learning machine for medical diagnosis via a disperse foraging sine cosine algorithm, Computers in Biology and Medicine 141 (2022).
[20] S. Ding, X. Xu, R. Nie, Extreme learning machine and its applications, NCAA 25 (2014) 549–556.
[21] M. A. Thafar, S. Albaradie, R. S. Olayan, H. Ashoor, M. Essack, V. B. Bajic, Computational drug-target interaction prediction based on graph embedding and graph mining, in: Proceedings of the 2020 10th International Conference on Bioscience, Biochemistry and Bioinformatics, 2020, pp. 14–21.
[22] K. Yamada, Y. Tanaka, Temporal multiresolution graph learning, IEEE Access 9 (2021) 143734–143745.
[23] G. Drakopoulos, I. Giannoukou, S. Sioutas, P. Mylonas, Self organizing maps for cultural content delivery, NCAA (2022). doi:10.1007/s00521-022-07376-1.
[24] M. Singh, Using natural language processing and graph mining to explore inter-related requirements in software artefacts, ACM SIGSOFT Software Engineering Notes 44 (2022) 37–42.
[25] G. Drakopoulos, E. Kafeza, P. Mylonas, L. Iliadis, Transform-based graph topology similarity metrics, NCAA 33 (2021) 16363–16375. doi:10.1007/s00521-021-06235-9.
[26] I. Tyagin, A. Kulshrestha, J. Sybrandt, K. Matta, M. Shtutman, I. Safro, Accelerating COVID-19 research with graph mining and transformer-based learning, in: Conference on Artificial Intelligence, volume 36, AAAI, 2022, pp. 12673–12679.
[27] G. Gomez-Sanchez, L. Delgado-Serrano, D. Carrera, D. Torrents, J. L. Berral, Author correction: Clustering and graph mining techniques for classification of complex structural variations in cancer genomes, Scientific Reports 12 (2022).
[28] G. Drakopoulos, E. Kafeza, P. Mylonas, S. Sioutas, A graph neural network for fuzzy Twitter graphs, in: G. Cong, M. Ramanath (Eds.), CIKM companion volume, volume 3052, CEUR-WS.org, 2021.
[29] G. Drakopoulos, I. Giannoukou, P. Mylonas, S. Sioutas, A graph neural network for assessing the affective coherence of Twitter graphs, in: IEEE Big Data, IEEE, 2020, pp. 3618–3627. doi:10.1109/BigData50022.2020.9378492.
[30] L. Hu, L. Zou, A GPU-based graph pattern mining system, in: CIKM, 2022, pp. 4867–4871.
[31] G. Drakopoulos, Y. Voutos, P. Mylonas, S. Sioutas, Motivating item annotations in cultural portals with UI/UX based on behavioral economics, in: IISA, IEEE, 2021. doi:10.1109/IISA52424.2021.9555569.
[32] S. A. Bhavsar, V. H. Patil, A. H. Patil, Graph partitioning and visualization in graph mining: A survey, Multimedia Tools and Applications (2022) 1–42.
[33] G. Drakopoulos, I. Giannoukou, P. Mylonas, S. Sioutas, On tensor distances for self organizing maps: Clustering cognitive tasks, in: DEXA, volume 12392 of Lecture Notes in Computer Science, Springer, 2020, pp. 195–210. doi:10.1007/978-3-030-59051-2_13.
[34] Z. Xu, B. Du, H. Tong, Graph sanitation with application to node classification, in: Web Conference, ACM, 2022, pp. 1136–1147.
[35] Z. Sun, B. Wu, Y. Wang, Y. Ye, Sequential graph collaborative filtering, Information Sciences 592 (2022) 244–260.
[36] S. Wang, X. Niu, P. Fournier-Viger, D. Zhou, F. Min, A graph based approach for mining significant places in trajectory data, Information Sciences 609 (2022) 172–194.
[37] G. Drakopoulos, E. Kafeza, P. Mylonas, S. Sioutas, Approximate high dimensional graph mining with matrix polar factorization: A Twitter application, in: IEEE Big Data, IEEE, 2021, pp. 4441–4449. doi:10.1109/BigData52589.2021.9671926.