Adaptive learning of evolving hyper basis function neural network

Yevgeniy Bodyanskiy^a, Anastasiia Deineko^b, Iryna Pliss^a and Oleksandr Zeleniy^c

a Kharkiv National University of Radio Electronics, Control Systems Research Laboratory, Nauky av., 14, Kharkiv, 61166, Ukraine
b Kharkiv National University of Radio Electronics, Artificial Intelligence Department, Nauky av., 14, Kharkiv, 61166, Ukraine
c Kharkiv National University of Radio Electronics, Department of Media Systems and Technologies, Nauky av., 14, Kharkiv, 61166, Ukraine

Abstract
This article proposes the architecture and a learning method of an evolving hyper basis function artificial neural network. The network under consideration tunes not only its synaptic weights but also automatically determines the number of neurons, the coordinates of the kernel activation function centers and the parameters of the receptive fields in an online mode, providing high-speed data processing.

Keywords
Artificial neural networks, adaptive learning, hyper basis function neural network, T. Kohonen's self-organizing map, V. Epanechnikov's kernel activation function

1. Introduction

To date, artificial neural networks (ANNs) are widely used to solve various Data Mining problems, first of all intelligent control, identification, pattern recognition, classification, clustering, forecasting and emulation under conditions of uncertainty and significant nonlinearity. If data have to be processed in a sequential online mode, the convergence rate of the learning process comes to the forefront, which significantly limits the class of ANNs suitable for work under these conditions. ANNs that use kernel activation functions (radial basis, bell-shaped, potential) are very effective from the point of view of learning speed. Radial basis function neural networks (RBFNs) are widely used because their output signal depends linearly on the synaptic weights. The main ideas behind these ANNs are connected with the method of potential functions [1], Parzen's estimates [2], kernel [3] and nonparametric [4] regressions. Universal approximation properties and the ability to process data sequentially in an online mode are their main benefits. However, the RBFN is exposed to the so-called "curse of dimensionality": when the dimensionality of the input space increases, the number of adjustable parameters (weights) grows exponentially. In practical tasks this problem can be overcome by using hyper basis function neural networks (HBFNs) [5], which have a number of advantages over the traditional RBFN.

2. Radial basis and hyper basis function neural networks

Figure 1 shows the standard architecture of a radial basis function network. Its hidden layer implements a nonlinear transformation of the input space $R^n$ into a hidden space $R^h$ of higher dimension ($h > n$),
while its output layer is formed by adaptive linear associators that produce the network response by performing a nonlinear transformation of the form

$$\hat{y}(k) = w_0 + \sum_{l=1}^{h} w_l \varphi_l\big(x(k)\big) = \sum_{l=0}^{h} w_l \varphi_l\big(x(k)\big) = w^T \tilde{\varphi}\big(x(k)\big)$$

where $x(k) = \big(x_1(k), x_2(k), \ldots, x_n(k)\big)^T$ is the input vector, $\varphi_l\big(x(k)\big) = \varphi_l\big(\|x(k) - c_l\|, \sigma_l\big)$ is a radial basis function that depends on the distance $\|x(k) - c_l\|$ between the input vector $x(k)$ and the center $c_l$ of the activation function and on the width parameter $\sigma_l$, $k$ is the current discrete time, $\tilde{\varphi}\big(x(k)\big) = \big(1, \varphi^T\big(x(k)\big)\big)^T$, and $\varphi\big(x(k)\big) = \big(\varphi_1\big(x(k)\big), \varphi_2\big(x(k)\big), \ldots, \varphi_h\big(x(k)\big)\big)^T$.

Figure 1: Standard radial basis function network

The standard Gaussian is most commonly used as the activation function in radial basis ANNs:

$$\varphi_l\big(x(k)\big) = \exp\left(-\frac{\|x(k) - c_l\|^2}{\sigma_l^2}\right), \quad l = 1, 2, \ldots, h, \qquad (1)$$

where the centers $c_l$ and the width parameters $\sigma_l$ are determined beforehand and are not tuned during the learning process. The learning process itself reduces to adjusting the vector of synaptic weights $w = (w_0, w_1, \ldots, w_h)^T$, for which different modifications of the least squares method or traditional gradient procedures are usually used.

The approximating properties of the network can be improved by using a multidimensional construction instead of the Gaussian (1):

$$\varphi_l\big(x(k)\big) = \exp\left(-\big(x(k) - c_l\big)^T \Sigma_l^{-1}\big(x(k) - c_l\big)\right) = \exp\left(-\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right) \qquad (2)$$

where $\Sigma_l$ is the covariance matrix that determines the shape, size and orientation of the receptive field of the $l$-th kernel activation function. This is the main difference between a hyper basis function network and the traditional RBFN. If $\Sigma_l = \sigma_l^2 I$ (here $I$ is the $(n \times n)$ identity matrix), the receptive field is a hypersphere with center $c_l$ and radius $\sigma_l$; if $\Sigma_l = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$, it is a hyperellipsoid whose axes coincide with the axes of the input space and have length $\sigma_i$ along the $i$-th axis; finally, if $\Sigma_l$ is an arbitrary positive definite matrix $\Sigma_l = Q_l^T \Lambda_l Q_l$, the diagonal matrix of eigenvalues $\Lambda_l$ determines the size of the receptive field and the orthogonal rotation matrix $Q_l$ its orientation.

Speaking about the learning of hyper basis function ANNs, it should be noted that not only the vector of synaptic weights $w$ but also the centers $c_l$ and the matrices $\Sigma_l$ can be adjusted. Introducing the transformation implemented by the neural network in the form

$$\hat{y}(k) = \sum_{l=0}^{h} w_l \varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right),$$

the learning criterion

$$E(k) = \frac{1}{2} e^2(k) = \frac{1}{2}\left(y(k) - \sum_{l=0}^{h} w_l \varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)\right)^2$$

(here $y(k)$ is the external reference signal) and the derivatives with respect to all the tuned parameters

$$\begin{cases} \dfrac{\partial E(k)}{\partial w_l} = -e(k)\,\varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right), \\[6pt] \nabla_{c_l} E(k) = 2 e(k)\, w_l\, \varphi_l'\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)\Sigma_l^{-1}\big(x(k) - c_l\big), \\[6pt] \dfrac{\partial E(k)}{\partial \Sigma_l^{-1}} = -e(k)\, w_l\, \varphi_l'\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)\big(x(k) - c_l\big)\big(x(k) - c_l\big)^T, \end{cases} \qquad (3)$$

the learning algorithm can be written in the form [6]:

$$\begin{cases} w_l(k+1) = w_l(k) + \eta_w(k+1)\,e(k+1)\,\varphi_l\left(\|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}(k)}\right), \\[4pt] c_l(k+1) = c_l(k) - \eta_c(k+1)\,e(k+1)\,w_l(k+1)\,\varphi_l'\left(\|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}(k)}\right)\Sigma_l^{-1}(k)\big(x(k+1) - c_l(k)\big), \\[4pt] \Sigma_l^{-1}(k+1) = \Sigma_l^{-1}(k) + \eta_\Sigma(k+1)\,e(k+1)\,w_l(k+1)\,\varphi_l'\left(\|x(k+1) - c_l(k+1)\|^2_{\Sigma_l^{-1}(k)}\right)\big(x(k+1) - c_l(k+1)\big)\big(x(k+1) - c_l(k+1)\big)^T, \end{cases} \qquad (4)$$

where $\eta_w(k+1)$, $\eta_c(k+1)$, $\eta_\Sigma(k+1)$ are the learning rate parameters for the corresponding variables. Using the Gaussians (1) and (2) as activation functions makes the learning procedure (4) rather cumbersome from the computational point of view, which naturally slows down the learning process.
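To make the receptive field parameterizations above concrete, the following small NumPy sketch evaluates the kernel (2) for the three cases of $\Sigma_l$: a hypersphere, an axis-aligned hyperellipsoid, and a general rotated ellipsoid. The function and variable names are illustrative only and are not taken from the paper.

```python
# A small NumPy sketch of the kernel (2) for the three receptive-field cases
# discussed above. All names here (hbf_activation, S_inv_*) are illustrative.
import numpy as np

def hbf_activation(x, c, sigma_inv):
    """Gaussian hyper basis kernel (2): exp(-(x - c)^T Sigma^{-1} (x - c))."""
    d = x - c
    return np.exp(-d @ sigma_inv @ d)

n = 3
x = np.array([0.4, -0.2, 0.1])
c = np.zeros(n)

# 1) Sigma_l = sigma^2 * I: hyperspherical receptive field of radius sigma
sigma = 0.5
S_inv_sphere = np.eye(n) / sigma**2

# 2) Sigma_l = diag(sigma_1^2, ..., sigma_n^2): axis-aligned hyperellipsoid
S_inv_diag = np.diag(1.0 / np.array([0.5, 0.3, 0.8]) ** 2)

# 3) Sigma_l = Q^T Lambda Q: rotated hyperellipsoid (Q orthogonal,
#    Lambda a diagonal matrix with positive eigenvalues)
Q, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(n, n)))
Lam = np.diag([0.5, 0.3, 0.8])
S_inv_general = np.linalg.inv(Q.T @ Lam @ Q)

for S_inv in (S_inv_sphere, S_inv_diag, S_inv_general):
    print(hbf_activation(x, c, S_inv))
```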
Since the Gaussian-based procedure (4) is computationally heavy, we propose to introduce a multidimensional modification of the V. Epanechnikov function [7] in the form

$$\varphi_l\left(\|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}}\right) = 1 - \|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}},$$

whose derivatives have the form

$$\begin{cases} \nabla_{c_l}\varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right) = -2\Sigma_l^{-1}\big(x(k) - c_l\big), \\[6pt] \dfrac{\partial \varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)}{\partial \Sigma_l^{-1}} = \big(x(k) - c_l\big)\big(x(k) - c_l\big)^T. \end{cases} \qquad (5)$$

Relations (5) allow us to rewrite the system (3) in the form

$$\begin{cases} \dfrac{\partial E(k)}{\partial w_l} = -e(k)\left(1 - \|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right), \\[6pt] \nabla_{c_l} E(k) = 2 e(k)\, w_l\, \Sigma_l^{-1}\big(x(k) - c_l\big), \\[6pt] \dfrac{\partial E(k)}{\partial \Sigma_l^{-1}} = -e(k)\, w_l\, \big(x(k) - c_l\big)\big(x(k) - c_l\big)^T, \end{cases}$$

and the learning algorithm in the form

$$\begin{cases} w_l(k+1) = w_l(k) + \eta_w(k+1)\,e(k+1)\left(1 - \|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}(k)}\right), \\[4pt] c_l(k+1) = c_l(k) - \eta_c(k+1)\,e(k+1)\,w_l(k+1)\,\Sigma_l^{-1}(k)\big(x(k+1) - c_l(k)\big), \\[4pt] \Sigma_l^{-1}(k+1) = \Sigma_l^{-1}(k) + \eta_\Sigma(k+1)\,e(k+1)\,w_l(k+1)\big(x(k+1) - c_l(k+1)\big)\big(x(k+1) - c_l(k+1)\big)^T. \end{cases} \qquad (6)$$

It is easy to see that, from the computational point of view, procedure (6) is simpler than algorithm (4).
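The sketch below illustrates one online step of the simplified procedure (6) with the Epanechnikov kernel; the class name, initialization and learning rate values are assumptions made for the example, not prescriptions of the paper.

```python
# A minimal sketch of one online step of procedure (6) with the Epanechnikov
# kernel. Names, initialization and learning rates are illustrative only.
import numpy as np

class EpanechnikovHBFN:
    def __init__(self, centers):
        self.c = np.asarray(centers, dtype=float)      # (h, n) centers c_l
        h, n = self.c.shape
        self.w = np.zeros(h)                           # synaptic weights w_l
        self.S_inv = np.stack([np.eye(n)] * h)         # matrices Sigma_l^{-1}

    def phi(self, x):
        d = x - self.c
        # Epanechnikov kernel: 1 - ||x - c_l||^2 in the Sigma_l^{-1} metric
        return 1.0 - np.einsum('hi,hij,hj->h', d, self.S_inv, d)

    def step(self, x, y, eta_w=0.05, eta_c=0.01, eta_S=0.001):
        x = np.asarray(x, dtype=float)
        phi = self.phi(x)
        e = y - self.w @ phi                           # error e(k + 1)
        self.w += eta_w * e * phi                      # first row of (6)
        d = x - self.c                                 # distances to old centers
        for l in range(len(self.w)):
            # second row of (6): shift the center
            self.c[l] -= eta_c * e * self.w[l] * (self.S_inv[l] @ d[l])
            # third row of (6): rank-one correction of Sigma_l^{-1},
            # using the already updated center c_l(k + 1)
            d_new = x - self.c[l]
            self.S_inv[l] += eta_S * e * self.w[l] * np.outer(d_new, d_new)
        return e
```

Note that a practical implementation would also have to keep the matrices $\Sigma_l^{-1}$ positive definite after the rank-one correction.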
3. Hybrid neural network and evolution of its architecture

The question of how to choose the number of neurons $h$ in the network and the initial locations of the centers $c_l$ is very important. The easiest way to overcome this problem is to use the Subtractive Clustering algorithm [8], which is effective enough for processing data in a batch mode, but it requires selecting a large number of free parameters, and if the task at hand involves non-stationary data the network has to be re-initialized from time to time. Dynamic Decay Adjustment (DDA) is another possible method for tuning neural networks with kernel activation functions [9]. It belongs to the constructive learning algorithms and works fast enough, but it becomes ineffective in the mode of online processing of non-stationary signals. The Resource Allocating Network [10] uses hybrid learning based on both optimization and memory-based learning (the principle of "neurons at data points") together with elements of competition. In its learning process, gradient procedures adjust both the synaptic weights and the center parameters of the neurons closest to the received observation; it can be noted that standard Epanechnikov functions are used there instead of the traditional Gaussians as activation functions. The disadvantage of the Resource Allocating Network is its high computational complexity.

In this regard, it is necessary to develop an evolving hyper basis function artificial neural network that tunes not only all of its parameters but also automatically determines the number of neurons in an online mode with a high speed of data processing. Figure 2 shows the architecture of a hybrid evolving artificial neural network that is based on a hyper basis neural network with a variable number of neurons and T. Kohonen's self-organizing map (SOM) [11], which controls the number of neurons and adjusts the locations of the centers in the self-learning mode.

Figure 2: Structural scheme of the hybrid evolving network

The functioning of this system is as follows. When the first observation $x(k)$ is fed to the input of the neural network, the first neuron is formed according to the principle of "neurons at data points", i.e. almost instantly. Subsequent incoming observations are first fed to the SOM, where they are compared with the already existing centroids; if no coincidence is found, a new center of a kernel function is formed and, accordingly, a new neuron of the HBFN.

According to the approach under consideration, the following method for controlling the number of activation function neurons is introduced (a compact code sketch of this procedure is given at the end of the section):

Stage $1_1$: encode all values of the input variables into the interval $-1 \le x_i \le 1$ and set the receptive field radius of the neighborhood function within $r \le 0.33$;

Stage $2_1$: when the observation $x(1)$ is fed, set $c_1(1) = x(1)$;

Stage $3_1$: when the observation $x(2)$ is fed:
- if $\|x(2) - c_1(1)\| \le r$, then $c_1(1)$ is corrected by the rule
$$c_1(2) = \frac{c_1(1) + x(2)}{2};$$
- if $r < \|x(2) - c_1(1)\| \le 2r$, then $c_1(1)$ is corrected according to the self-learning rule of the Kohonen self-organizing map by the "Winner Takes More" (WTM) principle [11]
$$c_1(2) = c_1(1) + \eta(2)\psi_1(2)\big(x(2) - c_1(1)\big)$$
with the neighborhood function
$$\psi_1(2) = \max\left\{0,\; 1 - \left(\frac{\|x(2) - c_1(1)\|}{2r}\right)^2\right\}$$
(an Epanechnikov function with a receptive field of radius $2r$);
- if $\|x(2) - c_1(1)\| > 2r$, a new kernel activation function is formed with center $c_2(2) = x(2)$.

This completes the first iteration of forming the activation functions of the hyper basis neural network. Suppose that by the $k$-th time instant $p \le h$ activation functions $\varphi_l\big(x(k)\big)$ with centers $c_l(k)$ have been formed and the observation $x(k+1)$ arrives for processing. Further forming of the activation functions is performed as follows:

Stage $1_{k+1}$: determine the winner neuron, i.e. the one for which the distance $\|x(k+1) - c_l(k)\|$ is minimal among all $l = 1, 2, \ldots, p$;

Stage $2_{k+1}$:
- if $\|x(k+1) - c_l(k)\| \le r$, then
$$c_l(k+1) = \frac{c_l(k) + x(k+1)}{2};$$
- if $r < \|x(k+1) - c_l(k)\| \le 2r$, then
$$c_l(k+1) = c_l(k) + \eta(k+1)\psi_l(k+1)\big(x(k+1) - c_l(k)\big),$$
$$\psi_l(k+1) = \max\left\{0,\; 1 - \left(\frac{\|x(k+1) - c_l(k)\|}{2r}\right)^2\right\};$$
- if $\|x(k+1) - c_l(k)\| > 2r$, a new kernel activation function is formed with center $c_{p+1}(k+1) = x(k+1)$;
- if during the formation of the activation functions the situation $\|x(k+1) - c_l(k)\| > 2r$ arises while $p = h$, it is necessary to increase the receptive field radius and return to Stage $2_{k+1}$ with the increased radius of the function $\psi_l(k+1)$.

As can be seen, this procedure is a hybrid of N. Kasabov's evolving algorithm [12] and T. Kohonen's self-organizing map. However, the proposed neural network is designed not only to solve clustering problems but also to control the number of neurons in the hyper basis neural network.
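The following compact sketch (with hypothetical function names) reproduces the logic of the stages above: the first observation becomes the first center, and each subsequent observation is either averaged with the winner, shifted by the WTM rule, or spawns a new center; the last branch is a simplified stand-in for returning to Stage $2_{k+1}$ with an increased receptive field radius.

```python
# A compact sketch of the SOM-controlled center allocation described above.
# Function names, h_max and eta are illustrative assumptions.
import numpy as np

def epanechnikov_neighborhood(dist, r):
    # psi = max{0, 1 - (dist / 2r)^2}: receptive field of radius 2r
    return max(0.0, 1.0 - (dist / (2.0 * r)) ** 2)

def allocate_centers(X, r=0.33, h_max=20, eta=0.3):
    X = np.asarray(X, dtype=float)          # inputs assumed coded into [-1, 1]
    centers = [X[0].copy()]                 # c_1 = x(1): "neurons at data points"
    for x in X[1:]:
        dists = [np.linalg.norm(x - c) for c in centers]
        win = int(np.argmin(dists))         # winner neuron
        d = dists[win]
        if d <= r:                          # inside the receptive field: average
            centers[win] = 0.5 * (centers[win] + x)
        elif d <= 2.0 * r:                  # WTM self-learning correction
            psi = epanechnikov_neighborhood(d, r)
            centers[win] += eta * psi * (x - centers[win])
        elif len(centers) < h_max:          # spawn a new kernel center
            centers.append(x.copy())
        else:                               # p = h: widen the field and retry
            r *= 1.5
            psi = epanechnikov_neighborhood(d, r)
            centers[win] += eta * psi * (x - centers[win])
    return np.array(centers), r
```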
4. Evolving cascaded hyper basis function neural network

If the learning sample is short, it is possible to overcome the problem of overfitting by dividing the initial task in some way into subtasks of lower dimensionality and grouping the obtained solutions to get the required result. From a computational point of view, the most convenient method in this case is the Group Method of Data Handling (GMDH) [14-17], which has demonstrated its efficiency in solving a number of practical tasks. In [17] a multilayer GMDH neural network is considered. It contains two-input N-adalines as nodes, and each node's output is a quadratic function of its input signal. The synaptic weights of each neuron are defined in a batch mode using the conventional least squares method. Several hidden layers may be needed to provide the necessary approximation quality, which is why an online learning procedure becomes impossible. In this connection it is expedient to introduce a simplified architecture that is based on simple nodes and can be tuned under conditions of a short learning sample. Let us introduce the architecture of the compartmental R-neuron shown in Figure 3.

Generally speaking, it is a simplified architecture of the conventional RBFN with two inputs $x_i$ and $x_j$, $i, j = 1, 2, \ldots, n$, where $n$ is the dimensionality of the initial input space. The compartmental R-neuron contains $p$ activation functions $\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big)$, $p + 1$ synaptic weights joined in the vector $w_l^{ij} = \big(w_{l0}^{ij}, w_{l1}^{ij}, \ldots, w_{lp}^{ij}\big)^T$, $p$ two-dimensional center vectors $c_h^{ij} = \big(c_h^i, c_h^j\big)^T$, $p$ $(2 \times 2)$-matrices of the activation functions' receptive fields $\Sigma_h^{ij}$, a two-dimensional input vector $x^{ij} = \big(x_i, x_j\big)^T$ and the output $\hat{y}_l$; $l = 1, 2, \ldots, p$; $k = 1, 2, \ldots, N$ is the number of observations in the processed sample or the index of the current discrete time.

Figure 3: The compartmental R-neuron

Multidimensional Epanechnikov kernels are used as the activation functions $\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big)$:

$$\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big) = 1 - \big\|x^{ij} - c_h^{ij}\big\|^2_{\big(\Sigma_h^{ij}\big)^{-1}} \qquad (7)$$

which have a bell-shaped form determined by the positive definite matrix of the receptive field $\Sigma_h^{ij}$. An advantage of the activation function (7) compared to conventional ones is the linearity of its derivatives with respect to all the parameters, which makes it possible to adjust not only the synaptic weights but also the centers and receptive fields (which is very important when the learning sample is very short). The compartmental R-neuron implements the transformation

$$\hat{y}_l = w_{l0}^{ij} + \sum_{h=1}^{p} w_{lh}^{ij}\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big) = w_{l0}^{ij} + \sum_{h=1}^{p} w_{lh}^{ij}\left(1 - \big\|x^{ij} - c_h^{ij}\big\|^2_{\big(\Sigma_h^{ij}\big)^{-1}}\right).$$

It should be noticed that in the two-dimensional case it is easy to locate the centers at the nodes of a regular lattice and to define the receptive fields as circles.

Introducing the $\big((p+1) \times 1\big)$-vector of activation functions $\varphi^{ij}(k) = \Big(1, \varphi_1\big(x^{ij}(k), c_1^{ij}, \Sigma_1^{ij}\big), \ldots, \varphi_p\big(x^{ij}(k), c_p^{ij}, \Sigma_p^{ij}\big)\Big)^T$ and the learning criterion

$$E_l^N = \sum_{k=1}^{N}\big(y(k) - \hat{y}_l(k)\big)^2 = \sum_{k=1}^{N} e_l^2(k) = \sum_{k=1}^{N}\Big(y(k) - \big(w_l^{ij}\big)^T \varphi^{ij}(k)\Big)^2, \qquad (8)$$

it is easy to obtain the required solution with the help of the traditional least squares method in the form

$$w_l^{ij} = \left(\sum_{k=1}^{N}\varphi^{ij}(k)\big(\varphi^{ij}(k)\big)^T\right)^{+}\sum_{k=1}^{N}\varphi^{ij}(k)\,y(k) \qquad (9)$$

where $(\cdot)^{+}$ denotes the Moore-Penrose pseudoinversion. If the data are fed sequentially in an online mode, the recurrent form of the least squares estimate (9) can be used:

$$\begin{cases} w_l^{ij}(k) = w_l^{ij}(k-1) + \dfrac{P^{ij}(k-1)\Big(y(k) - \big(w_l^{ij}(k-1)\big)^T\varphi^{ij}(k)\Big)}{1 + \big(\varphi^{ij}(k)\big)^T P^{ij}(k-1)\varphi^{ij}(k)}\,\varphi^{ij}(k), \\[10pt] P^{ij}(k) = P^{ij}(k-1) - \dfrac{P^{ij}(k-1)\varphi^{ij}(k)\big(\varphi^{ij}(k)\big)^T P^{ij}(k-1)}{1 + \big(\varphi^{ij}(k)\big)^T P^{ij}(k-1)\varphi^{ij}(k)}, \quad P^{ij}(0) = \gamma I, \quad \gamma \gg 0. \end{cases} \qquad (10)$$

The algorithms (9) and (10) are effective only when the required solution is stationary, i.e. when the optimal values of the synaptic weights do not vary in time, whereas most real-world practical tasks are characterized by the opposite situation. A high-performance adaptive learning algorithm with both tracking and filtering properties [18] can be used for the adaptive identification of non-stationary objects and for non-stationary time series prediction:

$$\begin{cases} w_l^{ij}(k) = w_l^{ij}(k-1) + p_w^{-1}(k)\Big(y(k) - \big(w_l^{ij}(k-1)\big)^T\varphi^{ij}(k)\Big)\varphi^{ij}(k) = w_l^{ij}(k-1) + p_w^{-1}(k)\,e_l(k)\,\varphi^{ij}(k), \\[6pt] p_w(k) = \beta p_w(k-1) + \big\|\varphi\big(x(k)\big)\big\|^2, \quad 0 \le \beta \le 1. \end{cases} \qquad (11)$$

To tune the centers and the covariance matrices, one can use the procedure

$$\begin{cases} w_l^{ij}(k) = w_l^{ij}(k-1) + \eta_w(k)\,e_l(k)\,\varphi^{ij}(k), \\[4pt] \eta_w^{-1}(k) = p_w(k) = \beta p_w(k-1) + \big\|\varphi\big(x(k)\big)\big\|^2, \\[4pt] c_h^{ij}(k) = c_h^{ij}(k-1) + \eta_c(k)\,e_l(k)\,w_{lh}^{ij}(k)\big(\Sigma_h^{ij}(k-1)\big)^{-1}\big(x^{ij}(k) - c_h^{ij}(k-1)\big) = c_h^{ij}(k-1) + \eta_c(k)\,e_l(k)\,g_h(k), \\[4pt] \eta_c^{-1}(k) = p_c(k) = \beta p_c(k-1) + \|g_h(k)\|^2, \\[4pt] \big(\Sigma_h^{ij}(k)\big)^{-1} = \big(\Sigma_h^{ij}(k-1)\big)^{-1} - \eta_\Sigma(k)\,e_l(k)\,w_{lh}^{ij}(k)\big(x^{ij}(k) - c_h^{ij}(k)\big)\big(x^{ij}(k) - c_h^{ij}(k)\big)^T = \big(\Sigma_h^{ij}(k-1)\big)^{-1} - \eta_\Sigma(k)\,e_l(k)\,G_h(k), \\[4pt] \eta_\Sigma^{-1}(k) = p_\Sigma(k) = \beta p_\Sigma(k-1) + \mathrm{Tr}\big(G_h(k)G_h^T(k)\big). \end{cases}$$
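As an illustration, a minimal sketch of the adaptive weight update (11) is given below; the centers and covariance matrices would be tuned by the remaining rows of the procedure above. The class and parameter names are illustrative and not taken from the paper.

```python
# A minimal sketch of the adaptive weight update (11): beta close to 1
# emphasizes filtering, a smaller beta tracks non-stationarity faster.
import numpy as np

class AdaptiveWeights:
    def __init__(self, p, beta=0.97, p0=1.0):
        self.w = np.zeros(p + 1)    # weight vector w_l^{ij} (bias included)
        self.beta = beta            # forgetting factor, 0 <= beta <= 1
        self.p_w = p0               # scalar gain denominator p_w(k)

    def step(self, phi, y):
        """phi is the (p + 1)-vector of activations with a leading 1."""
        phi = np.asarray(phi, dtype=float)
        self.p_w = self.beta * self.p_w + phi @ phi   # p_w(k) = beta p_w(k-1) + ||phi||^2
        e = y - self.w @ phi                          # error e_l(k)
        self.w = self.w + (e / self.p_w) * phi        # w(k) = w(k-1) + p_w^{-1}(k) e_l(k) phi(k)
        return e
```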
The architecture shown in Figure 4 combines the GMDH ideas with cascade neural networks.

Figure 4: An evolving cascade neural network

The first hidden layer of the network is formed similarly to the first hidden layer of the GMDH neural network [17] and contains a number of neurons equal to the number of combinations of $n$ taken 2 at a time, i.e. $C_n^2$. A selection block SB performs sorting by accuracy: for example, the most accurate signal among all output signals $\hat{y}_l^{[1]}$ in the sense of variance is $\hat{y}_1^{[1]*}$, then comes $\hat{y}_2^{[1]*}$, and the worst one is $\hat{y}_{C_n^2}^{[1]*}$. The SB outputs $\hat{y}_1^{[1]*}$ and $\hat{y}_2^{[1]*}$ are fed to the only neuron of the second cascade layer $CR\text{-}N^{[2]}$, which computes the signal $\hat{y}^{[2]}$ that is later joined in the third cascade with the selection block's output $\hat{y}_3^{[1]*}$. The process of adding cascades goes on until the required accuracy is obtained. The overall number of neurons in this network is defined by the value $2C_n^2 - 1$. The neural network can process information fed in real time, adjusting both its parameters and its architecture over time [19].

5. Experimental results

In the first series of experiments we used a synthetically generated data set describing a bivariate normal (Gaussian) distribution. In this data set the number of clusters was chosen arbitrarily, and all data were generated in the interval $[-1, 1]$. The result of processing the synthetic data set by the evolving hyper basis function neural network (EHBFNN) is shown in Figure 5.

Figure 5: Result of processing a synthetic data set by the EHBFNN

For the next series of experiments the data sets "Wine" and "Ionosphere" from the UCI Repository [13] were taken. The efficiency of the proposed evolving hyper basis function neural network (EHBFNN) was investigated and compared with the standard radial basis function network and the standard T. Kohonen self-organizing map (SOM). First, data preprocessing including outlier and gap analysis was performed, and the investigated data were coded into a hypercube. The principal component analysis method was used for the visualization of the results. Figure 6 demonstrates the evaluation of the centroids' coordinates (the data set "Wine" was taken as a visualization example). A comparison of the numerical results of the proposed EHBFNN, the standard RBFN and the SOM is given in Table 1. The clustering quality was measured by the Calinski-Harabasz index [19]; Table 1 reports its maximum, average and minimum values. In its general formulation this index has the form

$$CH(m) = \frac{1}{m-1}\,\mathrm{Tr}\,S_B^m \left(\frac{1}{N-m}\,\mathrm{Tr}\,S_W^m\right)^{-1}$$

where $S_B^m = \frac{1}{N}\sum_{j=1}^{m} N_j\big(w_j^m - \bar{w}^m\big)\big(w_j^m - \bar{w}^m\big)^T$, $j = 1, 2, \ldots, m$, is the between-cluster distance matrix for $m$ clusters; $\bar{w}^m = \frac{1}{N}\sum_{j=1}^{m} N_j w_j^m$ is the center of gravity of the data set; $N_j$ is the number of observations belonging to the $j$-th cluster; $S_W^m = \frac{1}{N}\sum_{j=1}^{m}\sum_{k=1}^{N} u_j(k)\big(x(k) - w_j^m\big)\big(x(k) - w_j^m\big)^T$ is the within-cluster distance matrix for $m$ clusters; and

$$u_j(k) = \begin{cases} 1, & \text{if } x(k) \text{ belongs to the } j\text{-th cluster}, \\ 0, & \text{otherwise}, \end{cases}$$

is the crisp membership function.
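For reference, the batch Calinski-Harabasz index in the trace form given above can be computed as in the following sketch (the $1/N$ factors in $S_B^m$ and $S_W^m$ cancel in the ratio); an equivalent off-the-shelf implementation is available as sklearn.metrics.calinski_harabasz_score.

```python
# A sketch of the batch Calinski-Harabasz index for crisp cluster labels.
import numpy as np

def calinski_harabasz(X, labels):
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N = X.shape[0]
    clusters = np.unique(labels)
    m = len(clusters)                       # assumes 1 < m < N
    w_bar = X.mean(axis=0)                  # gravity center of the data set
    tr_Sb = 0.0                             # trace of the between-cluster matrix
    tr_Sw = 0.0                             # trace of the within-cluster matrix
    for j in clusters:
        Xj = X[labels == j]
        Nj = Xj.shape[0]                    # number of observations in cluster j
        wj = Xj.mean(axis=0)                # cluster prototype w_j
        tr_Sb += Nj * np.sum((wj - w_bar) ** 2)
        tr_Sw += np.sum((Xj - wj) ** 2)
    return (tr_Sb / (m - 1)) / (tr_Sw / (N - m))
```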
When observations arrive for processing sequentially in an online mode, it is necessary to compute the Calinski-Harabasz index over a sliding window of size $s$ ($s = 1, 2, \ldots, N$), and the index can be rewritten in the form

$$CH(m, k) = \frac{\dfrac{1}{m-1}\displaystyle\sum_{j=1}^{m} N_j(\tau)\big\|w_j^m(\tau) - \bar{w}^m(\tau)\big\|^2}{\dfrac{1}{N-m}\displaystyle\sum_{j=1}^{m} u_j(\tau)\big\|x(\tau) - w_j^m(\tau)\big\|^2}$$

where

$$\bar{w}^m(\tau) = \frac{1}{s}\sum_{\tau=k-s+1}^{k} x(\tau).$$

Figure 6: Evaluation of the centroids' coordinates

Visualizations of the clustering results obtained by the proposed evolving hyper basis function neural network are presented in Figure 7 (Figure 7a: data set "Wine"; Figure 7b: data set "Ionosphere").

Figure 7: Visualizations of the clustering results obtained by the proposed evolving hyper basis function neural network: a) data set "Wine", b) data set "Ionosphere"

Table 1
Comparison of numerical results of the investigated ANNs

                          Calinski-Harabasz index
Investigated ANNs         avg        max        min
"Wine" data set
RBFN                      65.34      69.98      40.93
SOM                       56.20      70.67      47.62
EHBFNN                    69.52      70.94      41.90
"Ionosphere" data set
RBFN                      62.78      90.52      55.35
SOM                       70.56      86.16      64.82
EHBFNN                    74.70      122.64     62.78

6. Conclusions

The approach proposed in this paper combines the training of both the synaptic weights and the centers of the activation functions and is based on both supervised learning and self-learning. Its main advantage is that it can be used in an online mode, when the training set is fed to the system's input sequentially and its volume is not fixed beforehand. The results can be used for solving a wide class of Dynamic Data Mining and Data Stream Mining problems.

7. References

[1] M.A. Aizerman, E.M. Braverman, L.I. Rozonoer, Method of Potential Functions in the Theory of Machine Learning. Moscow: Nauka, 1970.
[2] E. Parzen, "On estimation of a probability density function and mode", Annals of Mathematical Statistics 33 (1962) 1065-1076.
[3] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice Hall, 1999.
[4] T.V. Varyadchenko, V.Ya. Katkovnik, "Nonparametric method of inversion of regression functions", in: Stochastic Control Systems, Novosibirsk: Nauka, 1979, pp. 4-14.
[5] Y. Zhon, T. Mu, Zh.-H. Pang, Ch. Zheng, "A survey on hyper basis function neural network", Systems Science & Control Engineering 7(1) (2019) 495-507.
[6] Ye. Bodyanskiy, O. Tyshchenko, A. Deineko, "An evolving radial basis neural network with adaptive learning of its parameters and architecture", Automatic Control and Computer Sciences 49(5) (2015) 255-260.
[7] V.A. Epanechnikov, "Nonparametric estimation of a multivariate probability density", Theory of Probability and its Applications 14(2) (1968) 156-161.
[8] S. Chiu, "Fuzzy model identification based on cluster estimation", Journal of Intelligent & Fuzzy Systems 2(3) (1994).
[9] N. Karimi, S. Kazem, D. Ahmadian, H. Adibi, L.V. Ballestra, "On a generalized Gaussian radial basis function: Analysis and applications", Engineering Analysis with Boundary Elements 112 (2020) 46-57.
[10] M. Dehghan, V. Mohammadi, "The numerical solution of Fokker-Planck equation with radial basis functions (RBFs) based on the meshless technique of Kansa's approach and Galerkin method", Engineering Analysis with Boundary Elements 47 (2014) 38-63.
[11] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
[12] N. Kasabov, Evolving Connectionist Systems. London: Springer-Verlag, 2003.
[13] A. Frank, A. Asuncion, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science, 2013.
[14] A.G. Ivakhnenko, Self-Learning Systems for Recognition and Automatic Control. Kyiv: Technika, 1969.
[15] A.G. Ivakhnenko, Long-Term Prediction and Complex Systems Control. Kyiv: Technika, 1975.
[16] A.G. Ivakhnenko, H.R. Madala, Inductive Learning Algorithms for Complex Systems Modeling. London-Tokyo: CRC Press, 1994.
[17] Z. Zhao, Y. Lou, Y. Chen, H. Lin, R. Li, G. Yu, "Prediction of interfacial interactions related with membrane fouling in a membrane bioreactor based on radial basis function artificial neural network (ANN)", Bioresource Technology 282 (2019) 262-268.
[18] Ye.V. Bodyanskiy, O.K. Tyshchenko, A.O. Deineko, Evolving Neuro-Fuzzy Systems with Kernel Activation Functions. Saarbruecken, Germany: LAP Lambert Academic Publishing, 2015.
[19] R. Xu, D.C. Wunsch, Clustering. IEEE Press Series on Computational Intelligence. Hoboken, NJ: John Wiley & Sons, 2009.