Adaptive learning of evolving hyper basis function neural network

Yevgeniy Bodyanskiy^a, Anastasiia Deineko^b, Iryna Pliss^a and Oleksandr Zeleniy^c

a Kharkiv National University of Radio Electronics, Control Systems Research Laboratory, Nauky av., 14, Kharkiv, 61166, Ukraine
b Kharkiv National University of Radio Electronics, Artificial Intelligence Department, Nauky av., 14, Kharkiv, 61166, Ukraine
c Kharkiv National University of Radio Electronics, Department of Media Systems and Technologies, Nauky av., 14, Kharkiv, 61166, Ukraine

Abstract
This article proposes the architecture and a learning method of an evolving hyper basis function artificial neural network. The network under consideration tunes not only its synaptic weights but also automatically determines the number of neurons, the coordinates of the kernel activation function centers and the parameters of the receptive fields in an online mode, providing high-speed data processing.

Keywords
Artificial neural networks, adaptive learning, hyper basis function neural network, T. Kohonen's self-organizing map, V. Epanechnikov's kernel activation function

1. Introduction

To date, artificial neural networks (ANNs) are widely used to solve various Data Mining problems, first of all intelligent control, identification, pattern recognition, classification, clustering, forecasting and emulation under conditions of uncertainty and significant nonlinearity. If data have to be processed in a sequential online mode, the convergence rate of the learning process comes to the forefront, which significantly limits the class of ANNs suitable for work under these conditions. ANNs that use kernel activation functions (radial basis, bell-shaped, potential) are very effective from the point of view of learning speed. Radial basis function neural networks (RBFNs) are widely used because their output signal depends linearly on the synaptic weights. The main ideas behind these ANNs are connected with the method of potential functions [1], Parzen's estimates [2], kernel [3] and nonparametric [4] regressions. Universal approximation properties and the ability to process data sequentially in an online mode are their main benefits. However, the RBFN is exposed to the so-called "curse of dimensionality": when the dimensionality of the input space increases, the number of adjustable parameters (weights) grows exponentially. In practical tasks this problem can be overcome by using hyper basis function neural networks (HBFNs) [5], which have a number of advantages over the traditional RBFN.

2. Radial basis and hyper basis function neural networks

Figure 1 shows the standard architecture of a radial basis function network. Its hidden layer implements a nonlinear transformation of the input space $R^n$ into a hidden space $R^h$ of higher dimension ($h > n$),
while its output layer is formed by adaptive linear associators that produce the network response by performing a nonlinear transformation of the form

$$\hat{y}(k) = w_0 + \sum_{l=1}^{h} w_l \varphi_l\big(x(k)\big) = \sum_{l=0}^{h} w_l \varphi_l\big(x(k)\big) = w^T \tilde{\varphi}\big(x(k)\big)$$

where $x(k) = \big(x_1(k), x_2(k), \ldots, x_n(k)\big)^T$ is the input vector, $\varphi_l\big(x(k)\big) = \varphi_l\big(\|x(k) - c_l\|, \sigma_l\big)$ is a radial basis function that depends on the distance $\|x(k) - c_l\|$ between the input vector $x(k)$ and the center $c_l$ of the activation function and on the width parameter $\sigma_l$, $k$ is the current discrete time, $\tilde{\varphi}\big(x(k)\big) = \big(1, \varphi^T\big(x(k)\big)\big)^T$, and $\varphi\big(x(k)\big) = \big(\varphi_1\big(x(k)\big), \varphi_2\big(x(k)\big), \ldots, \varphi_h\big(x(k)\big)\big)^T$.

Figure 1: Standard radial basis function network

The standard Gaussian is most commonly used as the activation function in radial basis ANNs:

$$\varphi_l\big(x(k)\big) = \exp\left(-\frac{\|x(k) - c_l\|^2}{\sigma_l^2}\right), \quad l = 1, 2, \ldots, h, \qquad (1)$$

where the centers $c_l$ and the width parameters $\sigma_l$ are determined beforehand and are not tuned during the learning process. The learning process itself reduces to adjusting the vector of synaptic weights $w = (w_0, w_1, \ldots, w_h)^T$, for which different modifications of the least squares method or traditional gradient procedures are usually used.

The approximating properties of the network can be improved by using a multidimensional construction instead of the Gaussian (1):

$$\varphi_l\big(x(k)\big) = \exp\left(-\big(x(k) - c_l\big)^T \Sigma_l^{-1}\big(x(k) - c_l\big)\right) = \exp\left(-\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right) \qquad (2)$$

where $\Sigma_l$ is the covariance matrix that determines the shape, size and orientation of the receptive field of the $l$-th kernel activation function. This is the main difference between a hyper basis function network and the traditional RBFN. If $\Sigma_l = \sigma_l^2 I$ (here $I$ is the $(n \times n)$ identity matrix), the receptive field is a hypersphere with center $c_l$ and radius $\sigma_l$; if $\Sigma_l = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$, it is a hyperellipsoid whose axes coincide with the axes of the input space and have length $\sigma_i$ along the $i$-th axis; finally, if $\Sigma_l$ is an arbitrary positive definite matrix $\Sigma_l = Q_l^T \Lambda_l Q_l$, the diagonal matrix of eigenvalues $\Lambda_l$ determines the size of the receptive field and the orthogonal rotation matrix $Q_l$ its orientation.

Speaking about the learning of hyper basis function ANNs, it should be noted that not only the vector of synaptic weights $w$ but also the centers $c_l$ and the matrices $\Sigma_l$ can be adjusted. Introducing the transformation implemented by the neural network in the form

$$\hat{y}(k) = \sum_{l=0}^{h} w_l \varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right),$$

the learning criterion

$$E(k) = \frac{1}{2} e^2(k) = \frac{1}{2}\left(y(k) - \sum_{l=0}^{h} w_l \varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)\right)^2$$

(here $y(k)$ is the external reference signal) and the derivatives with respect to all the tuned parameters

$$\begin{cases} \dfrac{\partial E(k)}{\partial w_l} = -e(k)\,\varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right), \\[6pt] \nabla_{c_l} E(k) = 2 e(k)\, w_l\, \varphi_l'\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)\Sigma_l^{-1}\big(x(k) - c_l\big), \\[6pt] \dfrac{\partial E(k)}{\partial \Sigma_l^{-1}} = -e(k)\, w_l\, \varphi_l'\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)\big(x(k) - c_l\big)\big(x(k) - c_l\big)^T, \end{cases} \qquad (3)$$

the learning algorithm can be written in the form [6]:

$$\begin{cases} w_l(k+1) = w_l(k) + \eta_w(k+1)\,e(k+1)\,\varphi_l\left(\|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}(k)}\right), \\[4pt] c_l(k+1) = c_l(k) - \eta_c(k+1)\,e(k+1)\,w_l(k+1)\,\varphi_l'\left(\|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}(k)}\right)\Sigma_l^{-1}(k)\big(x(k+1) - c_l(k)\big), \\[4pt] \Sigma_l^{-1}(k+1) = \Sigma_l^{-1}(k) + \eta_\Sigma(k+1)\,e(k+1)\,w_l(k+1)\,\varphi_l'\left(\|x(k+1) - c_l(k+1)\|^2_{\Sigma_l^{-1}(k)}\right)\big(x(k+1) - c_l(k+1)\big)\big(x(k+1) - c_l(k+1)\big)^T, \end{cases} \qquad (4)$$

where $\eta_w(k+1)$, $\eta_c(k+1)$, $\eta_\Sigma(k+1)$ are the learning rate parameters for the corresponding variables. Using the Gaussians (1) and (2) as activation functions makes the learning procedure (4) rather cumbersome from the computational point of view, which naturally slows down the learning process.
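To make the receptive field parameterizations above concrete, the following small NumPy sketch evaluates the kernel (2) for the three cases of $\Sigma_l$: a hypersphere, an axis-aligned hyperellipsoid, and a general rotated ellipsoid. The function and variable names are illustrative only and are not taken from the paper.

```python
# A small NumPy sketch of the kernel (2) for the three receptive-field cases
# discussed above. All names here (hbf_activation, S_inv_*) are illustrative.
import numpy as np

def hbf_activation(x, c, sigma_inv):
    """Gaussian hyper basis kernel (2): exp(-(x - c)^T Sigma^{-1} (x - c))."""
    d = x - c
    return np.exp(-d @ sigma_inv @ d)

n = 3
x = np.array([0.4, -0.2, 0.1])
c = np.zeros(n)

# 1) Sigma_l = sigma^2 * I: hyperspherical receptive field of radius sigma
sigma = 0.5
S_inv_sphere = np.eye(n) / sigma**2

# 2) Sigma_l = diag(sigma_1^2, ..., sigma_n^2): axis-aligned hyperellipsoid
S_inv_diag = np.diag(1.0 / np.array([0.5, 0.3, 0.8]) ** 2)

# 3) Sigma_l = Q^T Lambda Q: rotated hyperellipsoid (Q orthogonal,
#    Lambda a diagonal matrix with positive eigenvalues)
Q, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(n, n)))
Lam = np.diag([0.5, 0.3, 0.8])
S_inv_general = np.linalg.inv(Q.T @ Lam @ Q)

for S_inv in (S_inv_sphere, S_inv_diag, S_inv_general):
    print(hbf_activation(x, c, S_inv))
```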
Since the Gaussian-based procedure (4) is computationally heavy, we propose to introduce a multidimensional modification of the V. Epanechnikov function [7] in the form

$$\varphi_l\left(\|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}}\right) = 1 - \|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}},$$

whose derivatives have the form

$$\begin{cases} \nabla_{c_l}\varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right) = -2\Sigma_l^{-1}\big(x(k) - c_l\big), \\[6pt] \dfrac{\partial \varphi_l\left(\|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right)}{\partial \Sigma_l^{-1}} = \big(x(k) - c_l\big)\big(x(k) - c_l\big)^T. \end{cases} \qquad (5)$$

Relations (5) allow us to rewrite the system (3) in the form

$$\begin{cases} \dfrac{\partial E(k)}{\partial w_l} = -e(k)\left(1 - \|x(k) - c_l\|^2_{\Sigma_l^{-1}}\right), \\[6pt] \nabla_{c_l} E(k) = 2 e(k)\, w_l\, \Sigma_l^{-1}\big(x(k) - c_l\big), \\[6pt] \dfrac{\partial E(k)}{\partial \Sigma_l^{-1}} = -e(k)\, w_l\, \big(x(k) - c_l\big)\big(x(k) - c_l\big)^T, \end{cases}$$

and the learning algorithm in the form

$$\begin{cases} w_l(k+1) = w_l(k) + \eta_w(k+1)\,e(k+1)\left(1 - \|x(k+1) - c_l(k)\|^2_{\Sigma_l^{-1}(k)}\right), \\[4pt] c_l(k+1) = c_l(k) - \eta_c(k+1)\,e(k+1)\,w_l(k+1)\,\Sigma_l^{-1}(k)\big(x(k+1) - c_l(k)\big), \\[4pt] \Sigma_l^{-1}(k+1) = \Sigma_l^{-1}(k) + \eta_\Sigma(k+1)\,e(k+1)\,w_l(k+1)\big(x(k+1) - c_l(k+1)\big)\big(x(k+1) - c_l(k+1)\big)^T. \end{cases} \qquad (6)$$

It is easy to see that, from the computational point of view, procedure (6) is simpler than algorithm (4).
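The sketch below illustrates one online step of the simplified procedure (6) with the Epanechnikov kernel; the class name, initialization and learning rate values are assumptions made for the example, not prescriptions of the paper.

```python
# A minimal sketch of one online step of procedure (6) with the Epanechnikov
# kernel. Names, initialization and learning rates are illustrative only.
import numpy as np

class EpanechnikovHBFN:
    def __init__(self, centers):
        self.c = np.asarray(centers, dtype=float)      # (h, n) centers c_l
        h, n = self.c.shape
        self.w = np.zeros(h)                           # synaptic weights w_l
        self.S_inv = np.stack([np.eye(n)] * h)         # matrices Sigma_l^{-1}

    def phi(self, x):
        d = x - self.c
        # Epanechnikov kernel: 1 - ||x - c_l||^2 in the Sigma_l^{-1} metric
        return 1.0 - np.einsum('hi,hij,hj->h', d, self.S_inv, d)

    def step(self, x, y, eta_w=0.05, eta_c=0.01, eta_S=0.001):
        x = np.asarray(x, dtype=float)
        phi = self.phi(x)
        e = y - self.w @ phi                           # error e(k + 1)
        self.w += eta_w * e * phi                      # first row of (6)
        d = x - self.c                                 # distances to old centers
        for l in range(len(self.w)):
            # second row of (6): shift the center
            self.c[l] -= eta_c * e * self.w[l] * (self.S_inv[l] @ d[l])
            # third row of (6): rank-one correction of Sigma_l^{-1},
            # using the already updated center c_l(k + 1)
            d_new = x - self.c[l]
            self.S_inv[l] += eta_S * e * self.w[l] * np.outer(d_new, d_new)
        return e
```

Note that a practical implementation would also have to keep the matrices $\Sigma_l^{-1}$ positive definite after the rank-one correction.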
3. Hybrid neural network and evolution of its architecture

The question of how to choose the number of neurons $h$ in the network and the initial locations of the centers $c_l$ is very important. The easiest way to overcome this problem is to use the Subtractive Clustering algorithm [8], which is effective enough for processing data in a batch mode, but it requires selecting a large number of free parameters, and if the task at hand involves non-stationary data the network has to be re-initialized from time to time. Dynamic Decay Adjustment (DDA) is another possible method for tuning neural networks with kernel activation functions [9]. It belongs to the constructive learning algorithms and works fast enough, but it becomes ineffective in the mode of online processing of non-stationary signals. The Resource Allocating Network [10] uses hybrid learning based on both optimization and memory-based learning (the principle of "neurons at data points") together with elements of competition. In its learning process, gradient procedures adjust both the synaptic weights and the center parameters of the neurons closest to the received observation; it can be noted that standard Epanechnikov functions are used there instead of the traditional Gaussians as activation functions. The disadvantage of the Resource Allocating Network is its high computational complexity.

In this regard, it is necessary to develop an evolving hyper basis function artificial neural network that tunes not only all of its parameters but also automatically determines the number of neurons in an online mode with a high speed of data processing. Figure 2 shows the architecture of a hybrid evolving artificial neural network that is based on a hyper basis neural network with a variable number of neurons and T. Kohonen's self-organizing map (SOM) [11], which controls the number of neurons and adjusts the locations of the centers in the self-learning mode.

Figure 2: Structural scheme of the hybrid evolving network

The functioning of this system is as follows. When the first observation $x(k)$ is fed to the input of the neural network, the first neuron is formed according to the principle of "neurons at data points", i.e. almost instantly. Subsequent incoming observations are first fed to the SOM, where they are compared with the already existing centroids; if no coincidence is found, a new center of a kernel function is formed and, accordingly, a new neuron of the HBFN.

According to the approach under consideration, the following method for controlling the number of activation function neurons is introduced (a compact code sketch of this procedure is given at the end of the section):

Stage $1_1$: encode all values of the input variables into the interval $-1 \le x_i \le 1$ and set the receptive field radius of the neighborhood function within $r \le 0.33$;

Stage $2_1$: when the observation $x(1)$ is fed, set $c_1(1) = x(1)$;

Stage $3_1$: when the observation $x(2)$ is fed:
- if $\|x(2) - c_1(1)\| \le r$, then $c_1(1)$ is corrected by the rule
$$c_1(2) = \frac{c_1(1) + x(2)}{2};$$
- if $r < \|x(2) - c_1(1)\| \le 2r$, then $c_1(1)$ is corrected according to the self-learning rule of the Kohonen self-organizing map by the "Winner Takes More" (WTM) principle [11]
$$c_1(2) = c_1(1) + \eta(2)\psi_1(2)\big(x(2) - c_1(1)\big)$$
with the neighborhood function
$$\psi_1(2) = \max\left\{0,\; 1 - \left(\frac{\|x(2) - c_1(1)\|}{2r}\right)^2\right\}$$
(an Epanechnikov function with a receptive field of radius $2r$);
- if $\|x(2) - c_1(1)\| > 2r$, a new kernel activation function is formed with center $c_2(2) = x(2)$.

This completes the first iteration of forming the activation functions of the hyper basis neural network. Suppose that by the $k$-th time instant $p \le h$ activation functions $\varphi_l\big(x(k)\big)$ with centers $c_l(k)$ have been formed and the observation $x(k+1)$ arrives for processing. Further forming of the activation functions is performed as follows:

Stage $1_{k+1}$: determine the winner neuron, i.e. the one for which the distance $\|x(k+1) - c_l(k)\|$ is minimal among all $l = 1, 2, \ldots, p$;

Stage $2_{k+1}$:
- if $\|x(k+1) - c_l(k)\| \le r$, then
$$c_l(k+1) = \frac{c_l(k) + x(k+1)}{2};$$
- if $r < \|x(k+1) - c_l(k)\| \le 2r$, then
$$c_l(k+1) = c_l(k) + \eta(k+1)\psi_l(k+1)\big(x(k+1) - c_l(k)\big),$$
$$\psi_l(k+1) = \max\left\{0,\; 1 - \left(\frac{\|x(k+1) - c_l(k)\|}{2r}\right)^2\right\};$$
- if $\|x(k+1) - c_l(k)\| > 2r$, a new kernel activation function is formed with center $c_{p+1}(k+1) = x(k+1)$;
- if during the formation of the activation functions the situation $\|x(k+1) - c_l(k)\| > 2r$ arises while $p = h$, it is necessary to increase the receptive field radius and return to Stage $2_{k+1}$ with the increased radius of the function $\psi_l(k+1)$.

As can be seen, this procedure is a hybrid of N. Kasabov's evolving algorithm [12] and T. Kohonen's self-organizing map. However, the proposed neural network is designed not only to solve clustering problems but also to control the number of neurons in the hyper basis neural network.
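The following compact sketch (with hypothetical function names) reproduces the logic of the stages above: the first observation becomes the first center, and each subsequent observation is either averaged with the winner, shifted by the WTM rule, or spawns a new center; the last branch is a simplified stand-in for returning to Stage $2_{k+1}$ with an increased receptive field radius.

```python
# A compact sketch of the SOM-controlled center allocation described above.
# Function names, h_max and eta are illustrative assumptions.
import numpy as np

def epanechnikov_neighborhood(dist, r):
    # psi = max{0, 1 - (dist / 2r)^2}: receptive field of radius 2r
    return max(0.0, 1.0 - (dist / (2.0 * r)) ** 2)

def allocate_centers(X, r=0.33, h_max=20, eta=0.3):
    X = np.asarray(X, dtype=float)          # inputs assumed coded into [-1, 1]
    centers = [X[0].copy()]                 # c_1 = x(1): "neurons at data points"
    for x in X[1:]:
        dists = [np.linalg.norm(x - c) for c in centers]
        win = int(np.argmin(dists))         # winner neuron
        d = dists[win]
        if d <= r:                          # inside the receptive field: average
            centers[win] = 0.5 * (centers[win] + x)
        elif d <= 2.0 * r:                  # WTM self-learning correction
            psi = epanechnikov_neighborhood(d, r)
            centers[win] += eta * psi * (x - centers[win])
        elif len(centers) < h_max:          # spawn a new kernel center
            centers.append(x.copy())
        else:                               # p = h: widen the field and retry
            r *= 1.5
            psi = epanechnikov_neighborhood(d, r)
            centers[win] += eta * psi * (x - centers[win])
    return np.array(centers), r
```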
4. Evolving cascaded hyper basis function neural network

If the learning sample is short, it is possible to overcome the problem of overfitting by dividing the initial task in some way into subtasks of lower dimensionality and grouping the obtained solutions to get the required result. From a computational point of view, the most convenient method in this case is the Group Method of Data Handling (GMDH) [14-17], which has demonstrated its efficiency in solving a number of practical tasks. In [17] a multilayer GMDH neural network is considered. It contains two-input N-adalines as nodes, and each node's output is a quadratic function of its input signal. The synaptic weights of each neuron are defined in a batch mode using the conventional least squares method. Several hidden layers may be needed to provide the necessary approximation quality, which is why an online learning procedure becomes impossible. In this connection it is expedient to introduce a simplified architecture that is based on simple nodes and can be tuned under conditions of a short learning sample. Let us introduce the architecture of the compartmental R-neuron shown in Figure 3.

Generally speaking, it is a simplified architecture of the conventional RBFN with two inputs $x_i$ and $x_j$, $i, j = 1, 2, \ldots, n$, where $n$ is the dimensionality of the initial input space. The compartmental R-neuron contains $p$ activation functions $\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big)$, $p + 1$ synaptic weights joined in the vector $w_l^{ij} = \big(w_{l0}^{ij}, w_{l1}^{ij}, \ldots, w_{lp}^{ij}\big)^T$, $p$ two-dimensional center vectors $c_h^{ij} = \big(c_h^i, c_h^j\big)^T$, $p$ $(2 \times 2)$-matrices of the activation functions' receptive fields $\Sigma_h^{ij}$, a two-dimensional input vector $x^{ij} = \big(x_i, x_j\big)^T$ and the output $\hat{y}_l$; $l = 1, 2, \ldots, p$; $k = 1, 2, \ldots, N$ is the number of observations in the processed sample or the index of the current discrete time.

Figure 3: The compartmental R-neuron

Multidimensional Epanechnikov kernels are used as the activation functions $\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big)$:

$$\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big) = 1 - \big\|x^{ij} - c_h^{ij}\big\|^2_{\big(\Sigma_h^{ij}\big)^{-1}} \qquad (7)$$

which have a bell-shaped form determined by the positive definite matrix of the receptive field $\Sigma_h^{ij}$. An advantage of the activation function (7) compared to conventional ones is the linearity of its derivatives with respect to all the parameters, which makes it possible to adjust not only the synaptic weights but also the centers and receptive fields (which is very important when the learning sample is very short). The compartmental R-neuron implements the transformation

$$\hat{y}_l = w_{l0}^{ij} + \sum_{h=1}^{p} w_{lh}^{ij}\varphi_h\big(x^{ij}, c_h^{ij}, \Sigma_h^{ij}\big) = w_{l0}^{ij} + \sum_{h=1}^{p} w_{lh}^{ij}\left(1 - \big\|x^{ij} - c_h^{ij}\big\|^2_{\big(\Sigma_h^{ij}\big)^{-1}}\right).$$

It should be noticed that in the two-dimensional case it is easy to locate the centers at the nodes of a regular lattice and to define the receptive fields as circles.

Introducing the $\big((p+1) \times 1\big)$-vector of activation functions $\varphi^{ij}(k) = \Big(1, \varphi_1\big(x^{ij}(k), c_1^{ij}, \Sigma_1^{ij}\big), \ldots, \varphi_p\big(x^{ij}(k), c_p^{ij}, \Sigma_p^{ij}\big)\Big)^T$ and the learning criterion

$$E_l^N = \sum_{k=1}^{N}\big(y(k) - \hat{y}_l(k)\big)^2 = \sum_{k=1}^{N} e_l^2(k) = \sum_{k=1}^{N}\Big(y(k) - \big(w_l^{ij}\big)^T \varphi^{ij}(k)\Big)^2, \qquad (8)$$

it is easy to obtain the required solution with the help of the traditional least squares method in the form

$$w_l^{ij} = \left(\sum_{k=1}^{N}\varphi^{ij}(k)\big(\varphi^{ij}(k)\big)^T\right)^{+}\sum_{k=1}^{N}\varphi^{ij}(k)\,y(k) \qquad (9)$$

where $(\cdot)^{+}$ denotes the Moore-Penrose pseudoinversion. If the data are fed sequentially in an online mode, the recurrent form of the least squares estimate (9) can be used:

$$\begin{cases} w_l^{ij}(k) = w_l^{ij}(k-1) + \dfrac{P^{ij}(k-1)\Big(y(k) - \big(w_l^{ij}(k-1)\big)^T\varphi^{ij}(k)\Big)}{1 + \big(\varphi^{ij}(k)\big)^T P^{ij}(k-1)\varphi^{ij}(k)}\,\varphi^{ij}(k), \\[10pt] P^{ij}(k) = P^{ij}(k-1) - \dfrac{P^{ij}(k-1)\varphi^{ij}(k)\big(\varphi^{ij}(k)\big)^T P^{ij}(k-1)}{1 + \big(\varphi^{ij}(k)\big)^T P^{ij}(k-1)\varphi^{ij}(k)}, \quad P^{ij}(0) = \gamma I, \quad \gamma \gg 0. \end{cases} \qquad (10)$$

The algorithms (9) and (10) are effective only when the required solution is stationary, i.e. when the optimal values of the synaptic weights do not vary in time, whereas most real-world practical tasks are characterized by the opposite situation. A high-performance adaptive learning algorithm with both tracking and filtering properties [18] can be used for the adaptive identification of non-stationary objects and for non-stationary time series prediction:

$$\begin{cases} w_l^{ij}(k) = w_l^{ij}(k-1) + p_w^{-1}(k)\Big(y(k) - \big(w_l^{ij}(k-1)\big)^T\varphi^{ij}(k)\Big)\varphi^{ij}(k) = w_l^{ij}(k-1) + p_w^{-1}(k)\,e_l(k)\,\varphi^{ij}(k), \\[6pt] p_w(k) = \beta p_w(k-1) + \big\|\varphi\big(x(k)\big)\big\|^2, \quad 0 \le \beta \le 1. \end{cases} \qquad (11)$$

To tune the centers and the covariance matrices, one can use the procedure

$$\begin{cases} w_l^{ij}(k) = w_l^{ij}(k-1) + \eta_w(k)\,e_l(k)\,\varphi^{ij}(k), \\[4pt] \eta_w^{-1}(k) = p_w(k) = \beta p_w(k-1) + \big\|\varphi\big(x(k)\big)\big\|^2, \\[4pt] c_h^{ij}(k) = c_h^{ij}(k-1) + \eta_c(k)\,e_l(k)\,w_{lh}^{ij}(k)\big(\Sigma_h^{ij}(k-1)\big)^{-1}\big(x^{ij}(k) - c_h^{ij}(k-1)\big) = c_h^{ij}(k-1) + \eta_c(k)\,e_l(k)\,g_h(k), \\[4pt] \eta_c^{-1}(k) = p_c(k) = \beta p_c(k-1) + \|g_h(k)\|^2, \\[4pt] \big(\Sigma_h^{ij}(k)\big)^{-1} = \big(\Sigma_h^{ij}(k-1)\big)^{-1} - \eta_\Sigma(k)\,e_l(k)\,w_{lh}^{ij}(k)\big(x^{ij}(k) - c_h^{ij}(k)\big)\big(x^{ij}(k) - c_h^{ij}(k)\big)^T = \big(\Sigma_h^{ij}(k-1)\big)^{-1} - \eta_\Sigma(k)\,e_l(k)\,G_h(k), \\[4pt] \eta_\Sigma^{-1}(k) = p_\Sigma(k) = \beta p_\Sigma(k-1) + \mathrm{Tr}\big(G_h(k)G_h^T(k)\big). \end{cases}$$
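As an illustration, a minimal sketch of the adaptive weight update (11) is given below; the centers and covariance matrices would be tuned by the remaining rows of the procedure above. The class and parameter names are illustrative and not taken from the paper.

```python
# A minimal sketch of the adaptive weight update (11): beta close to 1
# emphasizes filtering, a smaller beta tracks non-stationarity faster.
import numpy as np

class AdaptiveWeights:
    def __init__(self, p, beta=0.97, p0=1.0):
        self.w = np.zeros(p + 1)    # weight vector w_l^{ij} (bias included)
        self.beta = beta            # forgetting factor, 0 <= beta <= 1
        self.p_w = p0               # scalar gain denominator p_w(k)

    def step(self, phi, y):
        """phi is the (p + 1)-vector of activations with a leading 1."""
        phi = np.asarray(phi, dtype=float)
        self.p_w = self.beta * self.p_w + phi @ phi   # p_w(k) = beta p_w(k-1) + ||phi||^2
        e = y - self.w @ phi                          # error e_l(k)
        self.w = self.w + (e / self.p_w) * phi        # w(k) = w(k-1) + p_w^{-1}(k) e_l(k) phi(k)
        return e
```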
The architecture shown in Figure 4 combines the GMDH ideas with cascade neural networks.

Figure 4: An evolving cascade neural network

The first hidden layer of the network is formed similarly to the first hidden layer of the GMDH neural network [17] and contains a number of neurons equal to the number of combinations of $n$ taken 2 at a time, i.e. $C_n^2$. A selection block SB performs sorting by accuracy: for example, the most accurate signal among all output signals $\hat{y}_l^{[1]}$ in the sense of variance is $\hat{y}_1^{[1]*}$, then comes $\hat{y}_2^{[1]*}$, and the worst one is $\hat{y}_{C_n^2}^{[1]*}$. The SB outputs $\hat{y}_1^{[1]*}$ and $\hat{y}_2^{[1]*}$ are fed to the only neuron of the second cascade layer $CR\text{-}N^{[2]}$, which computes the signal $\hat{y}^{[2]}$ that is later joined in the third cascade with the selection block's output $\hat{y}_3^{[1]*}$. The process of adding cascades goes on until the required accuracy is obtained. The overall number of neurons in this network is defined by the value $2C_n^2 - 1$. The neural network can process information fed in real time, adjusting both its parameters and its architecture over time [19].

5. Experimental results

In the first series of experiments we used a synthetically generated data set describing a bivariate normal (Gaussian) distribution. In this data set the number of clusters was chosen arbitrarily, and all data were generated in the interval $[-1, 1]$. The result of processing the synthetic data set by the evolving hyper basis function neural network (EHBFNN) is shown in Figure 5.

Figure 5: Result of processing a synthetic data set by the EHBFNN

For the next series of experiments the data sets "Wine" and "Ionosphere" from the UCI Repository [13] were taken. The efficiency of the proposed evolving hyper basis function neural network (EHBFNN) was investigated and compared with the standard radial basis function network and the standard T. Kohonen self-organizing map (SOM). First, data preprocessing including outlier and gap analysis was performed, and the investigated data were coded into a hypercube. The principal component analysis method was used for the visualization of the results. Figure 6 demonstrates the evaluation of the centroids' coordinates (the data set "Wine" was taken as a visualization example). A comparison of the numerical results of the proposed EHBFNN, the standard RBFN and the SOM is given in Table 1. The clustering quality was measured by the Calinski-Harabasz index [19]; Table 1 reports its maximum, average and minimum values. In its general formulation this index has the form

$$CH(m) = \frac{1}{m-1}\,\mathrm{Tr}\,S_B^m \left(\frac{1}{N-m}\,\mathrm{Tr}\,S_W^m\right)^{-1}$$

where $S_B^m = \frac{1}{N}\sum_{j=1}^{m} N_j\big(w_j^m - \bar{w}^m\big)\big(w_j^m - \bar{w}^m\big)^T$, $j = 1, 2, \ldots, m$, is the between-cluster distance matrix for $m$ clusters; $\bar{w}^m = \frac{1}{N}\sum_{j=1}^{m} N_j w_j^m$ is the center of gravity of the data set; $N_j$ is the number of observations belonging to the $j$-th cluster; $S_W^m = \frac{1}{N}\sum_{j=1}^{m}\sum_{k=1}^{N} u_j(k)\big(x(k) - w_j^m\big)\big(x(k) - w_j^m\big)^T$ is the within-cluster distance matrix for $m$ clusters; and

$$u_j(k) = \begin{cases} 1, & \text{if } x(k) \text{ belongs to the } j\text{-th cluster}, \\ 0, & \text{otherwise}, \end{cases}$$

is the crisp membership function.
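For reference, the batch Calinski-Harabasz index in the trace form given above can be computed as in the following sketch (the $1/N$ factors in $S_B^m$ and $S_W^m$ cancel in the ratio); an equivalent off-the-shelf implementation is available as sklearn.metrics.calinski_harabasz_score.

```python
# A sketch of the batch Calinski-Harabasz index for crisp cluster labels.
import numpy as np

def calinski_harabasz(X, labels):
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N = X.shape[0]
    clusters = np.unique(labels)
    m = len(clusters)                       # assumes 1 < m < N
    w_bar = X.mean(axis=0)                  # gravity center of the data set
    tr_Sb = 0.0                             # trace of the between-cluster matrix
    tr_Sw = 0.0                             # trace of the within-cluster matrix
    for j in clusters:
        Xj = X[labels == j]
        Nj = Xj.shape[0]                    # number of observations in cluster j
        wj = Xj.mean(axis=0)                # cluster prototype w_j
        tr_Sb += Nj * np.sum((wj - w_bar) ** 2)
        tr_Sw += np.sum((Xj - wj) ** 2)
    return (tr_Sb / (m - 1)) / (tr_Sw / (N - m))
```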
When observations arrive for processing sequentially in an online mode, it is necessary to compute the Calinski-Harabasz index over a sliding window of size $s$ ($s = 1, 2, \ldots, N$), and the index can be rewritten in the form

$$CH(m, k) = \frac{\dfrac{1}{m-1}\displaystyle\sum_{j=1}^{m} N_j(\tau)\big\|w_j^m(\tau) - \bar{w}^m(\tau)\big\|^2}{\dfrac{1}{N-m}\displaystyle\sum_{j=1}^{m} u_j(\tau)\big\|x(\tau) - w_j^m(\tau)\big\|^2}$$

where

$$\bar{w}^m(\tau) = \frac{1}{s}\sum_{\tau=k-s+1}^{k} x(\tau).$$

Figure 6: Evaluation of the centroids' coordinates

Visualizations of the clustering results obtained by the proposed evolving hyper basis function neural network are presented in Figure 7 (Figure 7a: data set "Wine"; Figure 7b: data set "Ionosphere").

Figure 7: Visualizations of the clustering results obtained by the proposed evolving hyper basis function neural network: a) data set "Wine", b) data set "Ionosphere"

Table 1
Comparison of numerical results of the investigated ANNs

                          Calinski-Harabasz index
Investigated ANNs         avg        max        min
"Wine" data set
RBFN                      65.34      69.98      40.93
SOM                       56.20      70.67      47.62
EHBFNN                    69.52      70.94      41.90
"Ionosphere" data set
RBFN                      62.78      90.52      55.35
SOM                       70.56      86.16      64.82
EHBFNN                    74.70      122.64     62.78

6. Conclusions

The approach proposed in this paper combines the training of both the synaptic weights and the centers of the activation functions and is based on both supervised learning and self-learning. Its main advantage is that it can be used in an online mode, when the training set is fed to the system's input sequentially and its volume is not fixed beforehand. The results can be used for solving a wide class of Dynamic Data Mining and Data Stream Mining problems.

7. References

[1] M.A. Aizerman, E.M. Braverman, L.I. Rozonoer, Method of Potential Functions in the Theory of Machine Learning. Moscow: Nauka, 1970.
[2] E. Parzen, "On estimation of a probability density function and mode", Annals of Mathematical Statistics 33 (1962) 1065-1076.
[3] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice Hall, 1999.
[4] T.V. Varyadchenko, V.Ya. Katkovnik, "Nonparametric method of inversion of regression functions", in: Stochastic Control Systems, Novosibirsk: Nauka, 1979, pp. 4-14.
[5] Y. Zhon, T. Mu, Zh.-H. Pang, Ch. Zheng, "A survey on hyper basis function neural network", Systems Science & Control Engineering 7(1) (2019) 495-507.
[6] Ye. Bodyanskiy, O. Tyshchenko, A. Deineko, "An evolving radial basis neural network with adaptive learning of its parameters and architecture", Automatic Control and Computer Sciences 49(5) (2015) 255-260.
[7] V.A. Epanechnikov, "Nonparametric estimation of a multivariate probability density", Theory of Probability and its Applications 14(2) (1968) 156-161.
[8] S. Chiu, "Fuzzy model identification based on cluster estimation", Journal of Intelligent & Fuzzy Systems 2(3) (1994).
[9] N. Karimi, S. Kazem, D. Ahmadian, H. Adibi, L.V. Ballestra, "On a generalized Gaussian radial basis function: Analysis and applications", Engineering Analysis with Boundary Elements 112 (2020) 46-57.
[10] M. Dehghan, V. Mohammadi, "The numerical solution of Fokker-Planck equation with radial basis functions (RBFs) based on the meshless technique of Kansa's approach and Galerkin method", Engineering Analysis with Boundary Elements 47 (2014) 38-63.
[11] T. Kohonen, Self-Organizing Maps. Berlin: Springer-Verlag, 1995.
[12] N. Kasabov, Evolving Connectionist Systems. London: Springer-Verlag, 2003.
[13] A. Frank, A. Asuncion, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science, 2013.
[14] A.G. Ivakhnenko, Self-Learning Systems for Recognition and Automatic Control. Kyiv: Technika, 1969.
[15] A.G. Ivakhnenko, Long-Term Prediction and Complex Systems Control. Kyiv: Technika, 1975.
[16] A.G. Ivakhnenko, H.R. Madala, Inductive Learning Algorithms for Complex Systems Modeling. London-Tokyo: CRC Press, 1994.
[17] Z. Zhao, Y. Lou, Y. Chen, H. Lin, R. Li, G. Yu, "Prediction of interfacial interactions related with membrane fouling in a membrane bioreactor based on radial basis function artificial neural network (ANN)", Bioresource Technology 282 (2019) 262-268.
[18] Ye.V. Bodyanskiy, O.K. Tyshchenko, A.O. Deineko, Evolving Neuro-Fuzzy Systems with Kernel Activation Functions. Saarbruecken, Germany: LAP Lambert Academic Publishing, 2015.
[19] R. Xu, D.C. Wunsch, Clustering. IEEE Press Series on Computational Intelligence. Hoboken, NJ: John Wiley & Sons, 2009.