<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>Information-Communication Technologies &amp; Embedded Systems</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Adaptive Least-Squares Support Vector Machine and Its Online Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yevgeniy Bodyanskiy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasiia Deineko</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Brodetskyi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danylo Kosmin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics, Artificial Intelligence Department</institution>
          ,
          <addr-line>Nauky av., 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kharkiv National University of Radio Electronics, Department of Informatics</institution>
          ,
          <addr-line>Nauky av., 14, Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>12</volume>
      <issue>2020</issue>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
<p>In this paper, an adaptive learning method for the least-squares support vector machine (LS-SVM) is proposed. The essential feature of this method is that the empirical risk minimization criterion is realized on a sliding window of fixed dimension, which essentially simplifies the numerical implementation of the procedure and allows processing of information generated by non-stationary nonlinear objects. For solving a wide class of tasks like information processing and system and object identification, first of all significantly nonlinear ones under structural and parametric uncertainty, artificial neural networks are widely used because of their universal approximation properties and ability to learn. One of the effective neural networks for solving this task is the least-squares support vector machine, which, however, cannot be used for processing growing data sets, when data are fed to the system in online mode. In the paper, an adaptive approach for learning the LS-SVM in online mode using a "sliding window", which permits solving a wide class of tasks within the common problem of Data Stream Mining, is proposed. The support vector machine is a neural hybrid system that combines learning based on optimization with memory (so-called lazy learning) and realizes the method of empirical risk optimization. The key concept in the synthesis of this network is the support vectors, which form a small subset of the most informative data vectors allocated in the learning process. Support vector machines are really efficient neural networks in conditions of small datasets and provide high-quality approximation.</p>
      </abstract>
      <kwd-group>
<kwd>Neural networks</kwd>
        <kwd>kernel function</kwd>
        <kwd>least-squares support vector machine</kwd>
        <kwd>empirical risk criterion</kwd>
        <kwd>sliding window</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-1a">
      <title>2. Adaptive learning method for the LS-SVM neural network</title>
      <p>The main disadvantage of the conventional SVM [1, 2] is the numerical cumbersomeness of the synaptic weight determination procedure, which reduces to a nonlinear programming problem with inequality constraints whose number is determined by the size of the learning sample. From this point of view, support vector machines based on the least squares method are more effective; however, one way or another, both neural networks process data only in batch mode.</p>
      <p>The transformation realized by the support vector machine can be written in the form
$$\hat{y}(x) = w^T \varphi(x) + w_0, \qquad (1)$$
where $w = (w_1, \ldots, w_j, \ldots, w_h)^T$ and $\varphi(x) = (\varphi_1(x), \ldots, \varphi_h(x))^T$. Its learning (in the case of LS-SVM) [3] is reduced to the simultaneous setting of the activation function centers at the points of the training dataset $x(k)$, $k = 1, 2, \ldots, N$, like in the GRNN [4, 5], and optimization of the quadratic criterion
$$E = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{N} e^2(k)$$
in the presence of a system of $h = N$ linear constraint-equations
$$y(k) - w^T \varphi(x(k)) - w_0 - e(k) = 0, \quad k = 1, 2, \ldots, N,$$
where $\gamma &gt; 0$ is a regularization parameter.</p>
      <p>In the batch mode, the LS-SVM tuning is connected with finding the saddle point of the Lagrange function
$$L(w, w_0, e(k), \lambda(k)) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{N} e^2(k) + \sum_{k=1}^{N} \lambda(k) \left( y(k) - w^T \varphi(x(k)) - w_0 - e(k) \right), \qquad (2)$$
so that, in addition to the synaptic weights $w$ and $w_0$, $N$ indefinite Lagrange multipliers $\lambda(k)$ must be found.</p>
      <p>The system of Kuhn-Tucker equations for the Lagrangian (2) can be written in the form
$$\begin{cases} \nabla_w L = w - \sum_{k=1}^{N} \lambda(k) \varphi(x(k)) = \vec{0}, \\ \dfrac{\partial L}{\partial w_0} = -\sum_{k=1}^{N} \lambda(k) = 0, \\ \dfrac{\partial L}{\partial e(k)} = \gamma e(k) - \lambda(k) = 0, \\ \dfrac{\partial L}{\partial \lambda(k)} = y(k) - w^T \varphi(x(k)) - w_0 - e(k) = 0 \end{cases} \qquad (3)$$
(here $\vec{0}$ is an $(h \times 1)$ vector formed by zeros), or
$$\begin{cases} w = \sum_{k=1}^{N} \lambda(k) \varphi(x(k)), \\ \sum_{k=1}^{N} \lambda(k) = 0, \\ \lambda(k) = \gamma e(k), \\ y(k) - w^T \varphi(x(k)) - w_0 - e(k) = 0. \end{cases}$$</p>
      <p>From the first equation of the system (3) it follows that the synaptic weights depend entirely on the values of the indefinite Lagrange multipliers, in connection with which the LS-SVM training is reduced to their determination, and the system (3) can be rewritten in a compact form:
$$\begin{pmatrix} 0 &amp; I_N^T \\ I_N &amp; \Omega(N) + \gamma^{-1} I_{N,N} \end{pmatrix} \begin{pmatrix} w_0 \\ \Lambda(N) \end{pmatrix} = \begin{pmatrix} 0 \\ Y(N) \end{pmatrix}, \qquad (4)$$
where $\Lambda(N) = (\lambda(1), \ldots, \lambda(k), \ldots, \lambda(N))^T$, $I_N$ is an $(N \times 1)$ vector formed by unities, $Y(N) = (y(1), \ldots, y(k), \ldots, y(N))^T$, $\Omega(N) = \{\Omega_{kj} = \varphi^T(x(k)) \varphi(x(j)) = K(x(k), x(j)),\ k = 1, 2, \ldots, N,\ j = 1, 2, \ldots, N\}$ is an $(N \times N)$ matrix, and $K(x(k), x(j))$ is some kernel function that satisfies the conditions of Mercer’s theorem, often the Gaussian [6]
$$K(x(k), x(j)) = \exp\left( \frac{-\| x(k) - x(j) \|^2}{2\sigma^2} \right). \qquad (5)$$
In this case, the transformation (1) implemented by the support vector machine can be rewritten in the form
$$\hat{y}(x) = \sum_{k=1}^{N} \lambda(k) K(x, x(k)) + w_0,$$
and its parameters $\lambda(k)$, $w_0$ can be found directly from (4) as
$$\begin{pmatrix} w_0 \\ \Lambda(N) \end{pmatrix} = \begin{pmatrix} 0 &amp; I_N^T \\ I_N &amp; \Omega(N) + \gamma^{-1} I_{N,N} \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ Y(N) \end{pmatrix} = P(N) \begin{pmatrix} 0 \\ Y(N) \end{pmatrix}. \qquad (6)$$</p>
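      <p>To make the batch procedure concrete, a minimal Python/NumPy sketch of the solution (4)-(6) with the Gaussian kernel (5) is given below; the function and variable names are illustrative, not the exact implementation used in the experiments:</p>
      <preformat>
import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    # Omega_kj = K(x(k), x(j)) = exp(-||x(k) - x(j)||^2 / (2 sigma^2)), eq. (5)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # Assemble and solve the linear system (4) for w0 and Lambda(N), eq. (6)
    N = X.shape[0]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0                          # I_N^T
    A[1:, 0] = 1.0                          # I_N
    A[1:, 1:] = gaussian_kernel_matrix(X, sigma) + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], y))        # (0, Y(N))^T
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                  # w0, Lambda(N)

def lssvm_predict(X_train, lam, w0, X_new, sigma=1.0):
    # y_hat(x) = sum_k lambda(k) K(x, x(k)) + w0
    d2 = (np.sum(X_new**2, axis=1)[:, None]
          + np.sum(X_train**2, axis=1)[None, :]
          - 2.0 * (X_new @ X_train.T))
    return np.exp(-d2 / (2.0 * sigma**2)) @ lam + w0
      </preformat>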
      <p>It is clear that the LS-SVM adaptive learning may be organized based on a numerically simple procedure for inverting the matrix in the right part of the system (6). Rewriting (6) for the $(k + 1)$-th time moment as
$$\begin{pmatrix} w_0 \\ \Lambda(k + 1) \end{pmatrix} = \begin{pmatrix} 0 &amp; I_{k+1}^T \\ I_{k+1} &amp; \Omega(k + 1) + \gamma^{-1} I_{k+1,k+1} \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ Y(k + 1) \end{pmatrix} = \begin{pmatrix} P^{-1}(k) &amp; \vec{K}(x(k), x(k + 1)) \\ \vec{K}^T(x(k), x(k + 1)) &amp; 1 + \gamma^{-1} \end{pmatrix}^{-1} \begin{pmatrix} \vec{Y}(k) \\ y(k + 1) \end{pmatrix},$$
where $\vec{K}(x(k), x(k + 1)) = (1, K(x(1), x(k + 1)), \ldots, K(x(k), x(k + 1)))^T$ and $\vec{Y}(k) = (0, Y^T(k))^T$, and applying the Frobenius formula for the inversion of block matrices [7], we obtain a simple expression for calculating the matrix $P(k + 1)$. Denoting for brevity $\vec{K} = \vec{K}(x(k), x(k + 1))$,
$$P(k + 1) = \begin{pmatrix} P(k) + \dfrac{\left( P(k) \vec{K} \right) \left( \vec{K}^T P(k) \right)}{1 + \gamma^{-1} - \vec{K}^T P(k) \vec{K}} &amp; \dfrac{-P(k) \vec{K}}{1 + \gamma^{-1} - \vec{K}^T P(k) \vec{K}} \\ \dfrac{-\vec{K}^T P(k)}{1 + \gamma^{-1} - \vec{K}^T P(k) \vec{K}} &amp; \dfrac{1}{1 + \gamma^{-1} - \vec{K}^T P(k) \vec{K}} \end{pmatrix}. \qquad (7)$$
It is clear that with large volumes of the training set, the inversion of the $(N \times N)$ matrices is much easier to do using formula (7).</p>
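      <p>Under the same illustrative naming, the recursive update (7) can be sketched as follows, where P is the current matrix $P(k)$ and y_aug is the augmented right-hand side $(0, Y^T(k))^T$; this is a sketch under stated assumptions, not the authors' code:</p>
      <preformat>
import numpy as np

def lssvm_add_observation(P, X_train, x_new, y_new, y_aug, gamma=10.0, sigma=1.0):
    # One step of recursive learning: border P(k) with the new observation
    # and invert by the Frobenius (block-inversion) formula (7).
    d2 = np.sum((X_train - x_new)**2, axis=1)
    k_vec = np.concatenate(([1.0], np.exp(-d2 / (2.0 * sigma**2))))  # vec K(x(k), x(k+1))
    Pk = P @ k_vec                        # P is symmetric, so K^T P = (P K)^T
    s = 1.0 + 1.0 / gamma - k_vec @ Pk    # the common denominator in (7)
    n = P.shape[0]
    P_new = np.empty((n + 1, n + 1))
    P_new[:n, :n] = P + np.outer(Pk, Pk) / s
    P_new[:n, n] = -Pk / s
    P_new[n, :n] = -Pk / s
    P_new[n, n] = 1.0 / s
    y_aug_new = np.concatenate((y_aug, [y_new]))   # (0, Y^T(k), y(k+1))^T
    params = P_new @ y_aug_new                     # (w0, Lambda(k+1))^T as in (6)
    return P_new, y_aug_new, params
      </preformat>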
    </sec>
    <sec id="sec-1b">
      <title>3. The LS-SVM neural network learning on the “sliding window”</title>
      <p>It should be kept in mind that the training data set usually grows in time, and so does the number of neurons in the neural network, which will sooner or later lead to the "curse of dimensionality". That is why, in cases where the object that generates the data is non-stationary (medical data sets, time series of electricity consumption forecasting, exchange rates), it is better to organize the information processing on the “sliding window” [8, 9] that includes the $s$ last observations, which, in turn, means that the neural network will be formed by $s$ nodes. For processing non-stationary data, exponential weighting of old information is used more commonly; in our situation, however, it can lead to a significant increase in the number of kernel functions in the hidden layer of the proposed LS-SVM and make the system very bulky.</p>
      <p>In this case, introducing into consideration the $(s \times s)$ kernel function matrix
$$\Omega(k, s) = \left\{ \Omega_{ij}(k, s) = \varphi^T(x(i)) \varphi(x(j)) = K(x(i), x(j), s),\ i = k - s + 1, k - s + 2, \ldots, k;\ j = k - s + 1, k - s + 2, \ldots, k \right\},$$
the transformation implemented by a neural network with a fixed number of neurons can be written in the form
$$\hat{y}(x, k, s) = \sum_{i = k - s + 1}^{k} \lambda(i, s) K(x, x(i), s) + w_0(k, s). \qquad (8)$$</p>
      <p>Thereby, the parameters $\lambda(i, s)$, $w_0(k, s)$ can be found by solving a matrix equation of type (6), where $\Lambda(k, s) = (\lambda(k - s + 1, s), \ldots, \lambda(i, s), \ldots, \lambda(k, s))^T$ and $Y(k, s) = (y(k - s + 1), y(k - s + 2), \ldots, y(k))^T$.</p>
      <p>When the $(k + 1)$-th observation is fed to processing, it should be included into the $\Omega(k + 1, s)$ matrix, and at the same time the observation that was fed to processing at the $(k - s + 1)$-th moment of time should be deleted from this matrix. Herewith
$$\begin{pmatrix} w_0(k + 1, s) \\ \Lambda(k + 1, s) \end{pmatrix} = \begin{pmatrix} 0 &amp; I_s^T \\ I_s &amp; \Omega(k + 1, s) + \gamma^{-1} I_{s,s} \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ Y(k + 1, s) \end{pmatrix}, \qquad (9)$$
where $\Lambda(k + 1, s) = (\lambda(k - s + 2, s), \ldots, \lambda(i, s), \ldots, \lambda(k + 1, s))^T$ and $Y(k + 1, s) = (y(k - s + 2), y(k - s + 3), \ldots, y(k + 1))^T$. It is easy to see that the relations (8), (9) are essentially an online adaptive algorithm for learning the neural network of fixed architecture.</p>
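      <p>A minimal sketch of this sliding-window scheme follows; it reuses the hypothetical lssvm_fit helper from section 2 and simply re-solves the $(s \times s)$ system of type (6) at every step:</p>
      <preformat>
import numpy as np
from collections import deque

def sliding_lssvm_step(window_X, window_y, x_new, y_new, s, gamma=10.0, sigma=1.0):
    # Include the (k+1)-th observation and delete the (k-s+1)-th one, then
    # re-solve the (s x s) system on the current window, relations (8), (9).
    window_X.append(x_new)
    window_y.append(y_new)
    if len(window_X) > s:
        window_X.popleft()
        window_y.popleft()
    X = np.array(window_X)
    y = np.array(window_y)
    return lssvm_fit(X, y, gamma=gamma, sigma=sigma)   # w0(k+1, s), Lambda(k+1, s)

# usage on a data stream:
# window_X, window_y = deque(), deque()
# for x_t, y_t in stream:
#     w0, lam = sliding_lssvm_step(window_X, window_y, x_t, y_t, s=50)
      </preformat>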
    </sec>
    <sec id="sec-2">
      <title>4. Experimental results</title>
      <p>In the experimental modeling, the tuning of the LS-SVM model [10] was investigated on several data sets [10, 11, 12, 13], together with the influence of the choice of the model parameters on the results of data processing and classification. For the first example, the artificially generated “Double Donut” data set was taken; it consists of two separated circles, one inside the other. The number of observations, the noise level, and the scale factor between the inner and outer circles were specified when generating the data for the neural system. Figure 1 shows the initial classification of the “Double Donut” data set.</p>
      <p>In the SVM model described earlier, there are two parameters that affect learning. It is also necessary to choose the kernel function for learning the LS-SVM model: a polynomial, radial basis (Gaussian), or linear function can be used for tuning the SVM neural network.</p>
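      <p>Experiments of this kind can be reproduced with scikit-learn [12]; the following hedged sketch generates a similar two-circles data set and compares the three kernels (the data set and parameter values here are illustrative, not the exact experimental configuration):</p>
      <preformat>
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# An artificial two-circles data set similar to "Double Donut":
# n_samples, noise and the inner/outer scale factor are the generator inputs.
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale", probability=True)
    clf.fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
      </preformat>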
      <p>For the comparison of classification results, linear, radial basis, and polynomial activation functions were used. In Figure 2, the results of classification by the SVM model with the linear activation function are demonstrated.</p>
      <p>Next, the radial basis kernel (5) was used as the activation function, in which the standard width parameter, adjusted manually, was taken; let us note that in the learning process the criterion (10) was optimized.</p>
      <p>Modifying the output layer of the SVM neural network, the probability of occurrence of each of the classes was obtained. Based on the probabilities of belonging to a class, a 3D graph that shows the probability of a given point belonging to one of the classes can be built. Consider the influence of the parameter $\sigma$ from formula (5) on the classification results; the regularization parameter
$$\gamma = \frac{1}{2\sigma^2} \qquad (11)$$
should be selected so that the classifier is neither too general nor overfitted. Increasing this parameter leads to increasing overfitting of the SVM, and the model becomes less general. If this parameter is equal to 0.15, the classification results are shown in Figures 3a and 3b.</p>
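      <p>The probability surfaces mentioned above can be obtained from such a model with the predict_proba method on a regular grid; a short illustrative fragment (assuming the fitted rbf model clf from the previous sketch, with invented grid bounds):</p>
      <preformat>
import numpy as np

# Class-membership probabilities on a regular grid give the 3D probability
# surface of the 2nd class (clf is the fitted rbf model from the sketch above).
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 100), np.linspace(-1.5, 1.5, 100))
grid = np.c_[xx.ravel(), yy.ravel()]
proba_2nd = clf.predict_proba(grid)[:, 1].reshape(xx.shape)
      </preformat>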
(b) The classification results of the</p>
      <p>SVM with radial basis kernel
(a) The surface of probability of the</p>
      <p>2nd class occurrence</p>
      <p>Thus, it is easy to see that the radial basis kernel was chosen correctly, and, as can be seen in Figure 3, the separating hypersurface is constructed correctly. It should also be noted that all observations outside the outer circle will belong to the first class, and the others to the second. It is interesting to see what happens when the parameter increases: Figure 4 shows the result of tuning the SVM model with the parameter equal to 10. The influence of the penalty parameter is illustrated in Figure 5; here, as an example, the SVM model with the radial basis kernel was used.</p>
      <p>Based on the results obtained in the first experiment, we can conclude that for this sample the penalty parameter does not have much effect. This may be due to the fact that, at these parameter values, the dividing hyperplane is found quickly and the error during training is very small.</p>
      <p>The second part of the experimental modeling was performed on the “Double helix” data set. This data set was generated by mathematical equations and cannot be linearly separated. It represents two helixes, one inside the other. Figure 6 shows what this data set looks like.</p>
      <p>A comparative analysis of classification by the support vector machine with the linear activation function and by the support vector machine with the kernel activation function was carried out. The results of the experiment are presented in Figures 7-10.</p>
      <p>As can be seen in Figure 7, the linear classifier divides the observation space into two halves, minimizing, as far as possible, the classification error. Since the linear classifier does not give the desired result, it is necessary to use the radial basis kernel.</p>
      <p>For the SVM model with the kernel activation function, the probability surface of one of the classes was built. The probability surface of belonging to the first class, with the penalty function and kernel parameters equal to one, is presented in Figure 8.</p>
      <p>As can be seen, the surface does not describe this data set well, as the model parameters are selected incorrectly for good classification. This is due to the fact that small values of the penalty function and radial basis function parameters create, for complex models, an insufficiently trained but overly general model. However, with increased parameters, the model becomes overfitted. The classification results are presented in Figure 9.</p>
      <p>As can be seen in Figure 9, the classification is not done well enough to correctly classify even those observations that were used in the model learning, because the model is too general and cannot create a valid dividing hyperplane, as the model parameters do not allow creating a more complex model.</p>
      <p>To solve this problem, it is necessary to increase the penalty function parameter or the parameter of the radial basis function. Graphs of the probability surfaces of the first class with increasing penalty and radial basis function parameters are presented in Figures 10a and 10b.</p>
      <p>Based on the results presented in Figures 10a and 10b, it can be said that changing both of the parameters greatly influences the classification results. At very large values of the parameters, the model loses the ability to generalize and can classify only those observations that were in the training sample, as presented in Figure 11.</p>
      <p>Figure 10: Graphs of probability surfaces: (a) increasing of the penalty function parameter; (b) increasing of the radial basis function parameter.</p>
      <p>In the next series of experiments, the «Breast Cancer Wisconsin Diagnostic» data set was used. This data set consists of observations that describe breast cancer cases; it contains features that were computed from the digitized results of fine-needle aspirates of breast masses. Also, for each of the observations, the classification attribute with the correct mark is presented: "M" for malignant, "B" for benign. Because this data set is high-dimensional, principal component analysis was used to compress the initial data for visualization. The compression results based on the principal component analysis are shown in Figure 12.</p>
      <p>As can be seen in this figure, the observations are not linearly separable. Next, 3D compression was made for building the separating hyperplane. Figure 13 demonstrates the results of the 3D compression.</p>
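      <p>A sketch of this compression step with scikit-learn (assumed, illustrative code, not the authors' exact pipeline):</p>
      <preformat>
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()                       # target: 0 = malignant, 1 = benign
X_std = StandardScaler().fit_transform(data.data)
X_2d = PCA(n_components=2).fit_transform(X_std)   # 2D compression (Figure 12)
X_3d = PCA(n_components=3).fit_transform(X_std)   # 3D compression (Figure 13)
      </preformat>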
      <p>The classification results for the «Breast Cancer Wisconsin Diagnostic» data set obtained by the LS-SVM with the linear kernel are presented in Table 1.</p>
      <p>As can be seen from these results, the accuracy of the model is 85%; among malignant neoplasms, about 60% were correctly classified, and among benign, about 99.5%. Based on these data, we can say that the classification of malignant neoplasms by the LS-SVM model does not give good enough results, as only slightly more than half of the observations were correctly classified, while the classification of benign neoplasms gives a relatively good result.</p>
      <p>The developed model is not good enough to use, because an accuracy of 85% does not provide a sufficiently adequate result to inform the patient or to make a diagnosis. The error matrix of the LS-SVM with the linear kernel is shown in Figure 14.</p>
      <p>Figure 14: The error matrix of the LS-SVM with the linear kernel: (a) quantitative estimate; (b) percentage estimate.</p>
      <p>Thus, after analyzing these results, it is clear that using the linear kernel for this model is not a good enough solution. To improve the classification results, let us change the activation function and use the radial basis kernel. The results of the classification by the LS-SVM model with the radial basis kernel are presented in Table 2.</p>
      <p>As can be seen from these results, the model with the radial basis kernel gives 100% correct classification, which is practically impossible with real-world data. This fact may indicate that the model is overfitted. In this situation, it is better to divide the initial data set into training and testing sets (the size of the testing set is 20% of the total sample). The results of the classification by the LS-SVM model with the radial basis kernel after retraining are presented in Table 3.</p>
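      <p>The held-out evaluation can be sketched as follows (parameter values are illustrative; X_2d and data are reused from the previous fragment):</p>
      <preformat>
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Hold out 20% of the sample for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_2d, data.target, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", gamma="scale", C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print(confusion_matrix(y_test, clf.predict(X_test)))
      </preformat>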
      <p>The accuracy on the testing sample is more than 95%, which is a good indicator, and almost 97% of malignant neoplasms were found correctly. The error matrices are presented in Figure 16.</p>
      <p>The experimental research confirms the effectiveness of the proposed approach for solving the task of Big Data Mining in the situation when the data are sequentially fed to processing in online mode.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Conclusions</title>
      <p>The adaptive learning method for the least squares support vector machine (LS-SVM) neural network with fixed architecture was proposed. The distinctive feature of this method is that the empirical risk minimization criterion is realized on a sliding window of fixed dimension, which simplifies the numerical implementation of the procedure and allows processing information generated by non-stationary objects.</p>
      <p>The main benefit of the investigated approach is that it can be used in situations when observations are fed for processing in online mode from nonlinear and non-stationary objects in the presence of outliers in the input data. Also, the proposed system does not suffer from the curse of dimensionality inherent in SVM and LS-SVM, because the number of radial basis functions in the hidden layer is limited by the size of the sliding window.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Chervonenkis</surname>
          </string-name>
          ,
          <article-title>Pattern Recognition Theory (The Nature of Statistical Learning Theory)</article-title>
          , Nauka, Moscow,
          <year>1974</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y. V.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Deineko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Eze</surname>
          </string-name>
          ,
          <article-title>Kernel fuzzy kohonen's clustering neural network and it's recursive learning</article-title>
          ,
          <source>Automatic Control and Computer Sciences</source>
          <volume>52</volume>
          (
          <year>2018</year>
          )
          <fpage>166</fpage>
          -
          <lpage>174</lpage>
          . doi:10.3103/S0146411618030045.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. A. K.</given-names>
            <surname>Suykens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. V.</given-names>
            <surname>Gestel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Brabanter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Moor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vandewalle</surname>
          </string-name>
          , Least Squares Support Vector Machines, World Scientific, Singapore,
          <year>2002</year>
          . doi:10.1142/5089.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. F.</given-names>
            <surname>Specht</surname>
          </string-name>
          ,
          <article-title>A general regression neural network</article-title>
          ,
          <source>IEEE Transactions on Neural Networks</source>
          <volume>2</volume>
          (
          <year>1991</year>
          )
          <fpage>568</fpage>
          -
          <lpage>576</lpage>
          . doi:10.1109/72.97934.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y. V.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Deineko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. V.</given-names>
            <surname>Kutsenko</surname>
          </string-name>
          ,
          <article-title>On-line kernel clustering based on the general regression neural network and t. kohonen's self-organizing map</article-title>
          ,
          <source>Automatic Control and Computer Sciences</source>
          <volume>51</volume>
          (
          <year>2017</year>
          )
          <fpage>55</fpage>
          -
          <lpage>62</lpage>
          . doi:10.3103/S0146411617010023.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Izonin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gregus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tkachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tkachenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kryvinska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vitynskyi</surname>
          </string-name>
          ,
          <article-title>Committee of sgtm neural-like structures with rbf kernel for insurance cost prediction task</article-title>
          ,
          <source>in: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON</source>
          <year>2019</year>
          ), IEEE,
          <year>2019</year>
          , pp.
          <fpage>1037</fpage>
          -
          <lpage>1040</lpage>
          . doi:10.1109/UKRCON.2019.8879905.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F. R.</given-names>
            <surname>Gantmacher</surname>
          </string-name>
          ,
          <source>The Theory of Matrices</source>
          , Chelsea Publishing Company, New York,
          <year>1959</year>
          . doi:10.1126/science.131.3408.1216-a.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O.</given-names>
            <surname>Herman-Safar</surname>
          </string-name>
          ,
          <source>Time based cross validation</source>
          ,
          <year>2020</year>
          . URL: https://towardsdatascience.com/time-based-cross-validation-d259b13d42b8.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Peterson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Davie</surname>
          </string-name>
          ,
          <source>Computer Networks: A Systems Approach</source>
          , Morgan Kaufmann,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Haykin</surname>
          </string-name>
          ,
          <source>Neural Networks: A Comprehensive Foundation</source>
          , Prentice Hall, Inc., New Jersey,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <source>Jupyter Notebook documentation</source>
          ,
          <year>2020</year>
          . URL: http://jupyter.org/documentation.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <source>Scikit-learn documentation</source>
          ,
          <year>2020</year>
          . URL: http://scikit-learn.org/stable/documentation.html.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <source>NumPy library documentation</source>
          ,
          <year>2020</year>
          . URL: http://www.numpy.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>