The Quality Indicators of Decision Tree and Forest Based Models

Sergey Subbotin [0000-0001-5814-8268]

National University "Zaporizhzhia Polytechnic", Zhukovsky str., 64, Zaporizhzhia, 69063, Ukraine
subbotin@zntu.edu.ua

Abstract. The problem of creating a quality model for models based on decision trees and forests is considered. A set of indicators characterizing the properties of decision trees and forests is proposed. It allows quantitative evaluation of such properties as diversity, equivalence, retraining, confidence in decision-making, hierarchy, equifinality, generalization, nonlinearity, robustness, homogeneity, sensitivity to the input signals, plasticity, variability, adaptability, symmetry, asymmetry, emergence (integrity), interpretability (logical transparency), learnability, and autonomy, both for an individually considered single tree model and for an ensemble of tree models (a forest).

Keywords: decision tree, forest, quality, property, model selection

1 Introduction

Automation of decision-making in applied tasks, as a rule, requires the construction of a decision-making model. To solve the problem of constructing a decision-making model from precedents, a wide class of computational intelligence methods has been proposed, including neural networks [1-5], neuro-fuzzy networks [6-9], decision and regression trees [10-17], forests of decision trees [18-22], etc.

Usually, the quality of such models is characterized by the error function [1, 2]. As a result, the model with the smallest error is selected from several alternative models. Note that for each class of methods, even for the same training sample of observations, it is possible to obtain a wide range of different models with acceptable accuracy. At the same time, achieving the maximum accuracy (the smallest error) does not guarantee a high level of customer-relevant properties of the model.
Earlier, in [23-26], the author proposed a set of indicators applicable to models based on neural and neuro-fuzzy networks. However, most of these indicators are not applicable to models based on decision trees and forests because their paradigm differs from the network model paradigm. Therefore, it is necessary to develop a quality model for decision trees and forests whose indicators are comparable with the quality indicators of models based on neural and neuro-fuzzy networks proposed earlier in [23-25].

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The properties of models can be affected not only by the structural parameters, but also by the properties of the training sample [27-31]. Therefore, information about the properties of the sample must be taken into account when determining the quantitative indicators characterizing the properties of models.

The aim of this work was to create a quality model for models based on decision trees and forests as a set of quality indicators.

2 Formal problem statement

Let us have a training sample of observations in the form ⟨x, y⟩, where x = {x^s}, x^s = {x_j^s}, y = {y^s}, s = 1, 2, ..., S, j = 1, 2, ..., N; x^s is the set of input (descriptive) feature values of the s-th sample instance, x_j^s is the value of the j-th input feature of the s-th sample instance, y^s is the output feature value for the s-th instance in a sample, S is the number of instances in the training sample, and N is the number of input (descriptive) features characterizing the instances of the training sample.

Then the problem of building a model of the dependence y = f(w, x) on a sample of observations on the basis of a decision tree tree consists in identifying the model structure f and the values of its parameters w that provide an acceptable value of the given quality functional of the model F(x, y, f, w) [17].
The problem of model building based on a forest of decision trees, in turn, can be represented as the problem of obtaining a set of models forest = {tree_t}, tree_t = f_t(w_t, x), that provide an acceptable value of a given quality functional F(x, y, {f_t}, {w_t}), where t is the number of a tree in the forest, t = 1, 2, ..., T, T is the number of trees in the forest, f_t is the model structure of the t-th tree, and w_t is the set of model parameter values of the t-th tree [20].

It is obvious that the construction of a quality functional for models based on decision trees and forests requires the determination of a set of indicators {I_i} that quantitatively describe the properties of the models.

3 Primary model characteristics

Along with the sample parameters described above, we will use the following notation for the characteristics of the samples: ⟨x_test, y_test⟩ is a test sample; S_test is the number of instances in the test sample; N, N' are, respectively, the numbers of features in the original set and in the reduced set of features; f^max, f^min are, respectively, the maximum and minimum boundary values of the model output; y^max, y^min are, respectively, the maximum and minimum boundary values of the output feature.
The basic properties characterizing the model structure are defined as follows: M is the number of levels in a tree; N_n is the number of nodes in the model; N_μ is the number of nodes in the μ-th layer of a tree; N_n^max is the maximum possible number of nodes in a model; f_i is the i-th node function; ε^min is the smallest possible change of a real number, taking into account the bit grid of the computer; ω_o(j) is the complexity of the j-th node, which can be defined similarly to [25] in units of elementary operations of addition and multiplication; η_aut*(i) is the characteristic of autonomy of the formation of the i-th element of the model structure (η_aut*(i) = 0 if the inclusion (or exclusion) of the i-th node in the model is determined only by a human; η_aut*(i) = 1 if the inclusion (or exclusion) of the i-th node in the model is automatically determined by the training method; η_aut*(i) = 0.5 if the inclusion (or exclusion) of the i-th node in the model can be determined either by a human or by the training method); η_np(i) is the characteristic of plasticity of the i-th node of the tree, which is equal to the number of possible states of node i (for leaf nodes containing singletons (not containing functions), η_np(i) = 1; for nodes with functions, η_np(i) should be taken equal to the number of different functions that the node may contain; for the remaining nodes of the tree, η_np(i) is taken as the number of branches of the node); w_ij is the connection between the i-th and j-th nodes (w_ij = 0 if the i-th and j-th nodes are not connected, and w_ij = 1 if the i-th and j-th nodes are connected).
Let us introduce the notation for the description of the model parameters: w^max, w^min are, respectively, the maximum and minimum possible values of the model parameters; N_w is the number of adjustable parameters of the model; N_w^max is the maximum possible number of adjustable parameters of the model nodes; w_{i,j}^max, w_{i,j}^min are, respectively, the maximum and minimum possible values of the j-th parameter in the i-th node; Δw is the smallest possible change in weights taking into account the size of the computer bit grid; w_j is the j-th model parameter; N_w(i) is the number of parameters of the i-th model node; Δw_{i,j} is the minimum possible change of the j-th parameter of the i-th node taking into account the bit grid size of the computer; η_sp(i) is the characteristic of plasticity of the parameters of the i-th node (η_sp(i) = 0 if the node has no adjustable parameters; otherwise it is set as:

\eta_{sp}(i) = \sum_{j=1}^{N_w(i)} \mathrm{round}\left(\frac{w_{i,j}^{\max} - w_{i,j}^{\min}}{\Delta w_{i,j}}\right) ).

We define the notation for describing the functioning of a model as: E_tr, E_test are, respectively, the model errors on the training and test samples; E(w) is the model error for a set of weights w.

The following notation will be used for the parameters of the model training method: N_met is the number of training method parameters; N_met^aut is the number of training method parameters whose values are determined automatically, without human intervention; η_aut(w_i) is the characteristic of autonomy of setting the parameter values of the i-th node (η_aut(w_i) = 0 if the parameter values are set only by a human; η_aut(w_i) = 1 if the values of the node parameters are determined automatically by the training method; η_aut(w_i) = 0.5 if the values of the node parameters can be determined by both the human and the method).
For tree models in a forest we denote: w^{f,max}, w^{f,min} are, respectively, the maximum and minimum possible values of the parameters of a forest model; N_w^f is the number of parameters of a forest model without taking into account the number of parameters of its trees; T^max is the maximum possible number of trees in the forest.

If it is necessary to distinguish the use of indicators for a decision tree, we will use notations of the form I^t or I^tree (here I is an indicator, t is a tree number, and tree is a tree symbol), and for the forest we will use the notation I^forest (here forest is a forest symbol).

4 Model Quality Indicators

Diversity is defined by the number of different states of a system. In accordance with W. R. Ashby's law of requisite variety, to create a system able to solve a problem of certain known diversity (complexity), it is necessary either to provide the system with an even greater diversity (knowledge of solving methods) than the diversity of the problem being addressed, or to ensure the ability of the system to create this diversity within itself (it would possess a methodology, and could develop or propose new methods for solving the problem) [32].

The absolute indicator of the limiting diversity of the synthesized model based on a decision tree tree, by analogy with [24], is defined as (1):

I_{div}(tree) = \left(\mathrm{round}\,\frac{w^{\max} - w^{\min}}{\Delta w}\right)^{N_w} \prod_{i=1}^{N_n} \eta_{np}(i).   (1)

The absolute indicator of the limiting diversity of the synthesized model based on a forest of decision trees forest is defined as (2):

I_{div}(forest) = \left(\mathrm{round}\,\frac{w^{f,\max} - w^{f,\min}}{\Delta w}\right)^{N_w^f} \prod_{t=1}^{T^{\max}} I_{div}(tree_t).   (2)

The greater the value of the limiting diversity, the wider the range of models that can be obtained on its basis.
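As a minimal sketch of how (1) can be evaluated (the function name, argument names, and the toy parameter values are illustrative assumptions, not part of the paper):

```python
def limiting_diversity(w_max, w_min, dw, n_params, node_plasticities):
    """Absolute limiting-diversity indicator of a tree model, eq. (1):
    the number of distinguishable parameter values, round((w_max - w_min)/dw),
    raised to the number of adjustable parameters N_w, times the product of
    the node plasticities eta_np(i)."""
    diversity = round((w_max - w_min) / dw) ** n_params
    for eta in node_plasticities:
        diversity *= eta
    return diversity

# A tree with 3 adjustable parameters quantized to 10 levels and node
# plasticities 2, 2, 1 (two binary splitting nodes and one leaf):
print(limiting_diversity(1.0, 0.0, 0.1, 3, [2, 2, 1]))  # 4000
```

The first factor counts distinguishable parameter settings, the second the combinatorial states of the structure; their product bounds how many different models the paradigm can express.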
By analogy with [24], we define the diversity indicators of a tree model tree and a forest model forest:

– in relation to the training sample, as (3) and (4):

I_{div}(tree, ⟨x, y⟩) = I_{div}(tree) / I_{div}(⟨x, y⟩);   (3)

I_{div}(forest, ⟨x, y⟩) = I_{div}(forest) / I_{div}(⟨x, y⟩);   (4)

– in relation to the population universe, as (5) and (6):

I_{div}(tree, X, Y) = I_{div}(tree) / I_{div}(X, Y);   (5)

I_{div}(forest, X, Y) = I_{div}(forest) / I_{div}(X, Y).   (6)

The greater the value of I_{div}(tree, ⟨x, y⟩) for a single tree, and of I_{div}(forest, ⟨x, y⟩) for a forest, the greater the model's potential for approximating the relationship represented by the sample. The smaller the value of the corresponding indicator for a model at an acceptable level of error E, the better the approximation of the sample. The greater the value of I_{div}(tree, X, Y) for a single tree, and of I_{div}(forest, X, Y) for a forest, the better the model will be able to solve the given problem. However, if the relevant indicator is greater than one, or close to one, then the model is too excessive for solving the problem.

The equivalence of models is determined as follows: two models are equivalent if they have the same sets of answers (they respond identically to the same input stimuli) [33]. The equivalence coefficient of trained models based on decision trees t_1 and t_2 for the sample is defined as (7):

I_{eq}(t_1, t_2) = \exp\left(-\frac{1}{S} \sum_{s=1}^{S} \left(f_{t_1}(x^s) - f_{t_2}(x^s)\right)^2\right).   (7)

The equivalence coefficient of trained forest models can be determined similarly, replacing the calculated tree outputs with the corresponding forest outputs. The values of the equivalence indicator lie in the range from zero to one: the more similar the responses of the models to the same input influences, the greater the value of the equivalence coefficient.
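Eq. (7) is straightforward to compute once the two models are available as prediction functions; the following sketch (the names and toy threshold models are illustrative assumptions) treats each model as a callable:

```python
import math

def equivalence(model_a, model_b, samples):
    """Equivalence coefficient of two trained models, eq. (7):
    exp(-(1/S) * sum over the sample of squared output differences)."""
    sq_diff = sum((model_a(x) - model_b(x)) ** 2 for x in samples)
    return math.exp(-sq_diff / len(samples))

# Two threshold classifiers that agree on every instance are fully equivalent:
f1 = lambda x: 1.0 if x[0] > 0.5 else 0.0
f2 = lambda x: 1.0 if x[0] > 0.4 else 0.0
xs = [(0.2,), (0.7,), (0.9,)]
print(equivalence(f1, f2, xs))  # 1.0 (they agree on these instances)
```

Note that the coefficient depends on the sample: f1 and f2 split at different thresholds, yet are judged equivalent here because no instance falls between the two thresholds.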
Retraining of a model on the training sample x relative to the test sample may be defined (in the given form, with substitutions) [24]:

– for classification problems, as (8):

\kappa(x, x_{test}) = \left| \frac{1}{S} \sum_{s=1}^{S} \left[ f(x^s) = y^s \right] - \frac{1}{S_{test}} \sum_{s=1}^{S_{test}} \left[ f(x_{test}^s) = y_{test}^s \right] \right|;   (8)

– for evaluation problems, as (9):

\kappa(x, x_{test}) = \left| \frac{1}{S} \sum_{s=1}^{S} \left[ \left| f(x^s) - y^s \right| \le \varepsilon \right] - \frac{1}{S_{test}} \sum_{s=1}^{S_{test}} \left[ \left| f(x_{test}^s) - y_{test}^s \right| \le \varepsilon \right] \right|,   (9)

where ε is an error threshold.

Since the error threshold for an instance cannot always be set in practice, as well as for greater universality and uniformity in solving various problems, we define the indicator as (10):

\kappa(x, x_{test}) = \left| \exp\left(-\frac{1}{S} \sum_{s=1}^{S} \left(\frac{f(x^s) - y^s}{\max_{s=1,2,...,S}(y^s) - \min_{s=1,2,...,S}(y^s)}\right)^2\right) - \exp\left(-\frac{1}{S_{test}} \sum_{s=1}^{S_{test}} \left(\frac{f(x_{test}^s) - y_{test}^s}{\max_{s=1,2,...,S_{test}}(y_{test}^s) - \min_{s=1,2,...,S_{test}}(y_{test}^s)}\right)^2\right) \right|.   (10)

These indicators can be used not only for a model based on a single tree, but also for a forest of decision trees, using as f the output value determined by the ensemble of forest trees. The higher the value of the retraining indicator, the worse the approximating properties of the model for data not used in training.

Confidence in decision-making is a subjective assessment by the model of the decision made [34]. For the value at the model output when the instance x^s is fed to its inputs, we determine the confidence indicator of the model in the made decision as (11):

I_{cert}(x^s) = \exp\left(-\left(f(x^s) - y^s\right)^2\right) \exp\left(-\sum_{j=1}^{N} \left(C_j^{u(x^s)} - x_j^s\right)^2\right),   (11)

where C_j^q is the value of the coordinate on the j-th feature of the q-th cluster center corresponding to the node of the tree into which the recognized instance x^s falls. For instances that are not included in the training set, instead of y^s it is possible to substitute the value of the output feature associated with the center of the corresponding cluster.
It is possible to estimate the coordinates of the cluster centers for leaf nodes on the basis of the training sample:

– for the decision tree constructed on the basis of the training sample, determine the membership of the training sample instances in the leaf nodes;

– for every q-th leaf node u_q, form a cluster C^q = {C_j^q} of the instances that fell into this node, q = 1, 2, ..., Q, where Q is the number of leaf nodes (clusters);

– for each j-th feature, take as the coordinate of the cluster center the arithmetic average of the coordinates of the instances of the corresponding cluster on the corresponding feature (12):

C_j^q = \frac{1}{S_q} \sum_{s=1}^{S} \left\{ x_j^s \mid x^s \in u_q \right\},  j = 1, 2, ..., N;  q = 1, 2, ..., Q,   (12)

where S_q is the number of instances of the training sample that fell into the q-th node (cluster);

– determine the function u(x^s) that maps the recognized instance x^s to the number of the tree node into which it fell.

The average confidence of the decision tree for a sample x is defined as (13):

I_{cert}(x) = \frac{1}{S} \sum_{s=1}^{S} I_{cert}(x^s).   (13)

The indicators of subjective confidence of a model based on a decision tree take values in the range from zero to one: the higher the value, the closer the properties of a recognized instance are to the formed cluster center templates, and the more confident the model is in the made decision.

The confidence of the forest of decision trees for the instance x^s is defined as (14):

I_{cert}^{forest}(x^s) = \oplus_k I_{cert}^{forest}(x^s, k),  k = 1, 2, ..., K,   (14)

where

I_{cert}^{forest}(x^s, k) = \otimes_t \left\{ I_{cert}^t \mid f_t(x^s) = k \right\},  t = 1, 2, ..., T,   (15)

f_t is the estimated value of the model output of the t-th tree, I_{cert}^{forest} is the indicator of confidence of forest models, and ⊕_k, ⊗_t are symbols of operators defining the confidence of the forest in the decision for the k-th class and over the t-th trees, respectively. As such operators it is possible to use the minimum, maximum, or arithmetic mean of the set of arguments.
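The steps above can be sketched as follows; the leaf-assignment function `leaf_of`, the stump model, and all names are illustrative assumptions standing in for a real trained tree:

```python
import math
from collections import defaultdict

def leaf_cluster_centers(leaf_of, xs):
    """Cluster centers C_q, eq. (12): the per-feature arithmetic mean of the
    training instances that fall into each leaf node q."""
    clusters = defaultdict(list)
    for x in xs:
        clusters[leaf_of(x)].append(x)
    return {q: tuple(sum(col) / len(rows) for col in zip(*rows))
            for q, rows in clusters.items()}

def confidence(f, leaf_of, centers, x, y):
    """Confidence of the model in its decision for instance x, eq. (11)."""
    center = centers[leaf_of(x)]
    return (math.exp(-(f(x) - y) ** 2) *
            math.exp(-sum((c - v) ** 2 for c, v in zip(center, x))))

# A one-feature stump: leaf 0 for x < 0.5, leaf 1 otherwise.
leaf_of = lambda x: 0 if x[0] < 0.5 else 1
f = lambda x: 0.0 if x[0] < 0.5 else 1.0
centers = leaf_cluster_centers(leaf_of, [(0.0,), (0.2,), (0.8,), (1.0,)])
print(centers)                                       # {0: (0.1,), 1: (0.9,)}
print(confidence(f, leaf_of, centers, (0.9,), 1.0))  # 1.0 (at the center)
```

An instance lying exactly at the center of its cluster, predicted without error, receives the maximum confidence of one; confidence decays as the instance moves away from the template.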
The indicator of averaged confidence of the forest for a sample x is defined as (16):

I_{cert}^{forest}(x) = \frac{1}{S} \sum_{s=1}^{S} I_{cert}^{forest}(x^s).   (16)

The indicators of subjective confidence of the forest take values in the range from zero to one: the higher the indicator value, the more confident the tree ensemble is in the made decision.

The hierarchical organization of the structure, the integrity and divisibility of elements, allows models of complex objects to be built from simpler ones. The operation of a hierarchical structure requires that the information element at each hierarchical level behave as a whole, but when moving from level to level it must be fragmented: when moving from an upper hierarchical level to a lower one, this fragmentation corresponds to the allocation of its constituent elements, and when moving from a lower level to an upper one, it corresponds to the inclusion of a certain part of this element in a more complex object [33].

The hierarchy of a model based on a decision tree is defined by analogy with [25] as (17):

I_h = \frac{2 \sum_{\mu=1}^{M} \mu N_\mu}{M (M + 1) N_n},  M \ge 1,  N_n \ge 1.   (17)

The greater the value of I_h, the greater the number of hierarchical levels in the model with respect to the maximum possible number of levels for a given number of nodes N_n.

Let us estimate the maximum possible number of levels. Since the maximum number of levels in the tree is reached at the minimum number of outcomes from nodes, at each level there should be at least one node with two outcomes, and the rest of the nodes should be leaves. Moreover, the greatest number of levels is achieved for a tree in which only one node at each level (except for the last) has two outcomes, and the rest are leaves, i.e. the tree contains only two nodes at each level. Thus, for the deepest tree, the number of nodes at the highest level is 1 (the root), at the lowest level it is 2 (leaves), and at each of the remaining levels it is 2, i.e. 1 + 2(M − 1) = N_n. From here we get: M = 0.5(N_n − 1) + 1.
Therefore, for the decision tree we get (18):

I_h = \frac{2 \sum_{\mu=1}^{M} \mu N_\mu}{0.25 N_n^3 + N_n^2 + 0.75 N_n},  M \ge 1,  N_n \ge 1.   (18)

The hierarchy of a model based on a forest of decision trees is defined as (19):

I_h^{forest} = \max_{t=1,2,...,T} \left\{ I_h^t \right\}.   (19)

The elasticity of a function y(x) with respect to the variable x_j in [35] is defined as (20):

El_{x_j}(y) = \lim_{\Delta x_j \to 0} \frac{\Delta y / y}{\Delta x_j / x_j} = \lim_{\Delta x_j \to 0} \frac{\Delta y}{\Delta x_j} \cdot \frac{x_j}{y},   (20)

where \Delta y = y(x_j + \Delta x_j) - y(x_j), x_j > 0, y > 0.

The relative elasticity indicator with respect to the variable x_j of the approximating function y = f(x) realized by the model trained on a training sample is defined similarly to [24] as (21):

\widetilde{El}_{x_j}(y) = \frac{1}{2S} \sum_{s=1}^{S} \left( \frac{\tilde{x}_j^s \left( f(\tilde{x}_j^s + \Delta \tilde{x}_j) - f(\tilde{x}_j^s) \right)}{\Delta \tilde{x}_j \, f(\tilde{x}^s)} \right)^2,   (21)

where f(\tilde{x}_j^s) is the value calculated at the output of the model when the normalized feature values of the s-th instance are applied to its inputs, and f(\tilde{x}_j^s + \Delta \tilde{x}_j) is the value of the model output when the normalized feature values of the s-th instance are applied to its inputs with the normalized value of the j-th feature corrected by \Delta \tilde{x}_j at the j-th input.

The larger the value of the elasticity indicator, the more elastic the model. This indicator applies both to a single tree model and to a forest model.

Equifinality is a regularity of the functioning and development of a system that characterizes its ultimate capabilities [36]. The relative equifinality of a model based on a decision tree tree is defined similarly to [24] as (22):

I_{eqf}(tree, ⟨x, y⟩) = \frac{N_w}{N_w^{\max}} \cdot \frac{N_n}{N_n^{\max}} \exp\left( -\frac{1}{S} \sum_{s=1}^{S} \left( f(w, x^s) - y^s \right)^2 \right).   (22)

Relative equifinality receives its largest value (bounded above by one) for those models which have reached the maximum possible size and number of parameters during synthesis, as well as the smallest error (bounded below by zero) in the learning process.
For the forest of decision trees, we determine the relative equifinality indicator as (23):

I_{eqf}(forest, ⟨x, y⟩) = \frac{1}{T^{\max}} \sum_{t=1}^{T} I_{eqf}(tree_t, ⟨x, y⟩).   (23)

Generalization is the model's ability to integrate partial data to determine patterns and extrapolate results, that is, after training on the training set, to give answers for test sample instances that are similar to the training sample but not included in it [37, 38].

The generalization indicator of the decision tree for the training and test samples is determined by analogy with [37, 38] as (24):

I_G = \frac{1}{S_{test}} \sum_{p=1}^{S_{test}} \exp\left( -\left( \frac{f(x_{test}^p) - y_{test}^p}{y^s - y_{test}^p} \right)^2 \right) \left( 1 - \exp\left( -\sum_{j=1}^{N} \left( x_j^s - x_{j,test}^p \right)^2 \right) \right),   (24)

\left( y^s - y_{test}^p \right)^2 \ne 0,  s = \arg\min_{t=1,2,...,S} \sum_{j=1}^{N} \left( x_j^t - x_{j,test}^p \right)^2,  j = 1, 2, ..., N,  p = 1, 2, ..., S_{test}.

The generalization indicator for the forest of decision trees is determined in a similar way. The generalization indicator takes values in the range from zero to one, and it is the greater, the smaller the error of the model at instance recognition and the greater the difference between the recognized instance and the nearest (by features) instance of the training sample.

The generalization indicator of the trained model is defined as (25):

I_{gen} = \frac{N S}{N_w N_n} \exp\left( -\left( E_{tr} - E_{test} \right)^2 \right).   (25)

If the generalization indicator is significantly greater than one, the model shows a great ability to generalize; if it is much smaller than one, the model does not show generalizing properties.

The generalization indicator for a forest of decision trees is defined as (26):

I_{gen}^{forest} = \frac{N S}{\sum_{t=1}^{T} N_w^t N_n^t} \exp\left( -\left( E_{tr} - E_{test} \right)^2 \right).   (26)

The errors here are defined for the ensemble of trees.

Nonlinearity is a dependency that cannot be explained by a linear combination of the input variables [39, 40].
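Eq. (25) reduces to one line of arithmetic; the following sketch (all names and the toy model sizes are illustrative assumptions) shows how the sample-to-model size ratio and the train/test error gap interact:

```python
import math

def generalization(n_features, n_samples, n_params, n_nodes, e_tr, e_test):
    """Generalization indicator of a trained tree, eq. (25):
    I_gen = (N * S) / (N_w * N_n) * exp(-(E_tr - E_test)^2)."""
    return ((n_features * n_samples) / (n_params * n_nodes)
            * math.exp(-(e_tr - e_test) ** 2))

# A compact tree (4 parameters, 7 nodes) trained on 100 instances with
# 5 features and equal training and test errors:
print(generalization(5, 100, 4, 7, 0.10, 0.10))  # 500/28 ≈ 17.86 (>> 1)
```

A small model describing a large sample with matching training and test errors scores far above one; a bloated model, or a large train/test gap, pushes the indicator down.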
The nonlinearity indicator for classification problems is defined similarly to [39] as (27):

I_{nl} = \frac{2}{S(S-1)} \sum_{s=1}^{S} \sum_{p=s+1}^{S} \frac{\left[ f\left( 0.5 (x^s + x^p) \right) \ne f(x^p) \right]}{\sum_{j=1}^{N} \left( x_j^s - x_j^p \right)^2},  \sum_{j=1}^{N} \left( x_j^s - x_j^p \right)^2 \ne 0.   (27)

The nonlinearity indicator for estimation problems is defined as (28):

I_{nl} = \frac{2}{S(S-1)} \sum_{s=1}^{S} \sum_{p=s+1}^{S} \frac{\left| f\left( 0.5 (x^s + x^p) \right) - 0.5 \left( f(x^s) + f(x^p) \right) \right|}{\left( f^{\max} - f^{\min} \right) \sum_{j=1}^{N} \left( x_j^s - x_j^p \right)^2},  \sum_{j=1}^{N} \left( x_j^s - x_j^p \right)^2 \ne 0.   (28)

The nonlinearity indicator of the classifier takes values in the range [0, 1]: the greater its value, the more nonlinear the model. A disadvantage of this indicator is its applicability only to models with a single output, and exclusively to classification problems. The nonlinearity indicator for estimation problems is applicable also to classification problems. In a similar way, the nonlinearity indicator for the forest of decision trees can be determined.

The indicator of compliance of the nonlinearities of the sample and the model is defined as (29):

\tilde{I}_{nl} = \frac{I_{nl}(⟨x, y⟩)}{I_{nl}},   (29)

where I_{nl}(⟨x, y⟩) is the nonlinearity indicator of the sample, determined according to [41, 42].

If the indicator \tilde{I}_{nl} is equal to one, we can conclude that the model corresponds to the sample in complexity. If \tilde{I}_{nl} is less than one, then the smaller its value, the greater the effect of retraining will be, and it indicates a possible redundancy of the model. If the value of \tilde{I}_{nl} exceeds one, the model is not sufficient for a good approximation (it requires additional training or a change of the model structure).

Robustness is the property of a model to reliably solve a problem when receiving incomplete and/or damaged data. In addition, the results must be consistent even if some part of the model is damaged [43, 44].
The robustness of a model based on a decision tree in relation to the input signals is defined as (30):

I_{Rb}^x = \min_{j=1,2,...,N} \left\{ \frac{\min\{\delta_1, \delta_2\} - \min_{s=1,2,...,S} \{x_j^s\}}{\max_{s=1,2,...,S} \{x_j^s\} - \min_{s=1,2,...,S} \{x_j^s\}} \right\},   (30)

where \delta_{1,2} are the smallest perturbed values of the j-th input at which the model error significantly increases:

\delta_{1,2} = \min \left\{ \delta x_j \;\middle|\; \frac{1}{S} \sum_{s=1}^{S} \left( f(x^{s*}) - y^s \right)^2 > E(w),\; x_b^{s*} = x_b^s,\; b \ne j,\; b = 1, 2, ..., N,\; x_j^{s*} = x_j^s \pm \delta x_j \right\},

\delta x_j \in \left\{ \min_{s=1,2,...,S} \{x_j^s\} + \varepsilon_x \left( \max_{s=1,2,...,S} \{x_j^s\} - \min_{s=1,2,...,S} \{x_j^s\} \right), ..., \max_{s=1,2,...,S} \{x_j^s\} \right\},

and \varepsilon_x \in (0, 1) is a constant that regulates the accuracy of determining the robustness indicator on the model inputs. The indicator is the normalized smallest change in the input signal that results in a significant increase of the model error.

The robustness of a model based on a decision tree with respect to the weights (parameters) is defined as (31):

I_{Rb}^w = \min_{j=1,2,...,N_w} \left\{ \frac{\min\{\delta_1, \delta_2\} - w^{\min}}{w^{\max} - w^{\min}} \right\},   (31)

where

\delta_1 = \min \left\{ \delta w_j \;\middle|\; \frac{1}{S} \sum_{s=1}^{S} \left( f(x^s \mid w_j + \delta w_j) - y^s \right)^2 > E(w) \right\},

\delta_2 = \min \left\{ \delta w_j \;\middle|\; \frac{1}{S} \sum_{s=1}^{S} \left( f(x^s \mid w_j - \delta w_j) - y^s \right)^2 > E(w) \right\},

\delta w_j \in \left\{ w^{\min} + \varepsilon_w \left( w^{\max} - w^{\min} \right), ..., w^{\max} - \varepsilon_w \left( w^{\max} - w^{\min} \right) \right\},

and \varepsilon_w \in (0, 1) is a constant regulating the accuracy of determination of the robustness indicator on the model parameters. The indicator is the normalized least change in the weight values leading to a significant increase in the model error.

The integral robustness indicator for a decision tree can be defined as (32):

I_{Rb} = I_{Rb}^x I_{Rb}^w.   (32)

The indicator I_{Rb} takes values in the range [0, 1]. The closer its value to zero, the lower the robustness of the model and the more sensitive the model is to changes in the input signals or parameter values. The closer the value of I_{Rb} to one, the greater the robustness of the model and the less sensitive it is to changes in the input signals or parameter values.
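A simplified grid-search sketch of the input-robustness idea behind (30) (all names are illustrative assumptions; it normalizes the smallest error-raising perturbation by the feature range and uses the unperturbed error as the threshold, rather than reproducing (30) exactly):

```python
def input_robustness(f, xs, ys, eps=0.1):
    """Sketch of the input-robustness indicator I_Rb^x of eq. (30): for each
    feature j, grow a perturbation delta on a grid of step eps*(range of j)
    until the mean squared error exceeds the unperturbed error, then take the
    smallest such delta normalized by the feature range."""
    def mse(j, delta, sign):
        return sum((f(tuple(v + sign * delta if b == j else v
                            for b, v in enumerate(x))) - y) ** 2
                   for x, y in zip(xs, ys)) / len(xs)
    base = sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    best = 1.0
    for j in range(len(xs[0])):
        span = max(x[j] for x in xs) - min(x[j] for x in xs)
        step = eps * span
        for sign in (+1, -1):
            delta = step
            while delta < span and mse(j, delta, sign) <= base:
                delta += step
            best = min(best, delta / span)
    return best

# A stump splitting at 0.5 tolerates shifts of up to half the feature range:
f = lambda x: 1.0 if x[0] >= 0.5 else 0.0
print(input_robustness(f, [(0.0,), (1.0,)], [0.0, 1.0]))  # ≈ 0.5
```

The coarser the grid (larger eps), the cheaper but less precise the estimate; this mirrors the role of the accuracy constant in (30).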
In this way, robustness indicators for the forest of decision trees can be determined.

The homogeneity of elements lies in the fact that models are built from many simple, unified, standard elements that perform elementary actions and are interconnected by various connections [45]. The homogeneity of the functions of the nodes of a decision tree is defined by analogy with [25] as (33):

I_{hn} = \frac{2 \sum_{i=1}^{N_n} \sum_{j=i+1}^{N_n} \left[ f_i = f_j \right]}{N_n (N_n - 1)}.   (33)

The homogeneity indicator varies from zero to one: the greater its value, the more uniform the corresponding elements of the model.

The homogeneity of the node functions of the forest trees is defined as (34):

I_{hn}^{forest} = \frac{\sum_{t=1}^{T} I_{hn}^t N_n^t \left( N_n^t - 1 \right)}{\sum_{t=1}^{T} N_n^t \left( N_n^t - 1 \right)}.   (34)

The sensitivity to the input signals is characterized by calculating the partial derivatives of the model error function [46-48]. However, this approach is computationally hard. The averaged normalized indicator of the sensitivity of the output of the decision tree to a change in the input signal is defined as (35):

I_{tol} = \frac{1}{S N \left( y^{\max} - y^{\min} \right)} \sum_{s=1}^{S} \sum_{j=1}^{N} \max\{\delta_1, \delta_2\},   (35)

where

\delta_1 = \left| f(x^{s*}) - y^s \right|,  x_b^{s*} = x_b^s,\; b \ne j,\; b = 1, 2, ..., N,\; x_j^{s*} = x_j^s + \varepsilon^{\min};

\delta_2 = \left| f(x^{s*}) - y^s \right|,  x_b^{s*} = x_b^s,\; b \ne j,\; b = 1, 2, ..., N,\; x_j^{s*} = x_j^s - \varepsilon^{\min}.

The value of the sensitivity indicator lies in the range [0, 1]. The higher its value, the more strongly the model reacts to changes in the input signal and the greater its categorization capabilities. However, too high a sensitivity may indicate a weak resistance of the model to noise and interference in the input signal.

The averaged indicator of the sensitivity of the forest to changes in the input signal is defined as (36):

I_{tol}^{forest} = \frac{1}{T} \sum_{t=1}^{T} I_{tol}(tree_t).   (36)
T t 1 (36) Plasticity determines the complexity of the model’s behavior, which is considered as a result of the interaction of many elements, each of which limits the action of oth- ers and is limited by others on the way to the formation of global observable behavior [49, 50]. As an analogue of neural plasticity, where neuron nodes are considered as plastic elements for neural network models, with respect to decision trees, we will consider the plasticity of nodes. As an analogue of synaptic plasticity (modification of the strength of the synaptic connection between nodes, implemented by the scales in neural network models) as applied to the decision tree, we will consider the plasticity of tunable parameters of tree nodes. The relative indicator of plasticity of the nodes of the model by analogy with [25] is defined as (37): Nn   (i) np I np  i 1max max . (37) N n np The indicator Inp will take values in the range from zero to one: the greater its value, the higher the level of plasticity of the model nodes. The relative plasticity indicator of the adjustable model parameters, by analogy with [25], is defined as (38): Nn   (i) sp I sp  i 1 . (38)  w  wmin  N n2 round max    w    The coefficient Isp will take values in the range from zero to one: the bigger its value, the higher the level of plasticity of the model parameters. The relative indicator of plasticity of a model is defined as (39) [25]: I pl  I np I sp . (39) The relative indicator of plasticity will take values in the range from zero to one: the greater its value, the higher the level of plasticity of the model and, therefore, it has better adaptive abilities. For the forest of decision trees, we define the plasticity indicators as (40)–(42): 1 T t I npforest   I np , T t 1 (40) 1 T t I spforest   I sp , T t 1 (41) 1 T t I plforest   I pl . 
T t 1 (42) Variability is an ability to obtain several different models for approximating de- pendencies from the same data sample using the same method [51-53]. As applied to decision trees, the variability of models is determined by the choice of a feature for the root node and the order in which features are added for checks at other nodes, the method of determining threshold values in nodes, etc. The absolute indicator of the variability of the model we define similarly to [25] as (43): M N I v   v (n(, i))v ( w(, i )), (43) 1 i 1 where v (n(, i)) is a variability of the verification of the i-th node of  -th layer of the model: v (n(, i)) = 1, if a non-random feature hit into the node; v (n(, i)) =N, if a feature for checking in the node selected as random from all original feature set v (n(, i)) = N* , if a feature for checking in the node is randomly selected from the set of N* not yet considered features (44):  1 N  N (*,i )   1  i  1 , (44)  1 j 1  where v ( w(, i)) is a variability of determining the values of the parameters of i-th node of  -th model layer: v ( w(, i)) is equal to the number of parameters in the node that can be configured non-deterministically, if all parameters in the node de- pend on previous nodes, then v ( w(, i)) = 1. The more the Iv value the more different models can be obtained based on the cor- , responding paradigm. The absolute indicator of the variability of the forest of decision trees is defined as (45): T I vforest   I vt . (45) t 1 Noise resistance is the property of the model to provide the correct response to an input signal containing noise [54]. 
The indicator of the resistance of the trained model to additive noise in the input signal at the j-th input is defined similarly to [24] as (46):

I_{tol}^j = \exp\left( -\frac{1}{2S} \sum_{s=1}^{S} \left( \delta_1 + \delta_2 \right) \right),   (46)

where

\delta_1 = \left( f(x^s) - f(\varsigma_{(j+)} x^s) \right)^2,  \delta_2 = \left( f(x^s) - f(\varsigma_{(j-)} x^s) \right)^2,

\varsigma_{(j\pm)} x_g^s = \begin{cases} x_g^s, & g \ne j,\; g = 1, 2, ..., N; \\ x_g^s \pm \theta \left( \max_{p=1,2,...,S} x_g^p - \min_{p=1,2,...,S} x_g^p \right), & g = j, \end{cases}

and \theta is a given noise level, 0 < \theta < 1. In order to automate the process of setting \theta, it is proposed to use the formula (47):

\theta = \min_{j=1,2,...,N} \left\{ \min_{s, p:\, y^s \ne y^p} \frac{\left| x_j^s - x_j^p \right|}{\max_{p=1,2,...,S} x_j^p - \min_{p=1,2,...,S} x_j^p} \right\}.   (47)

The greater the value of the indicator of model resistance to noise in the input signal at the j-th input, the less important this input is for decision-making. This indicator is applicable both to a model based on a single decision tree and to a forest-based model.

Adaptability is the property of structures to dynamically and independently change their behavior in response to an input stimulus [55]. In relation to a model based on decision trees, adaptability is determined, first of all, by plasticity, which provides the resources for adaptation: the greater the plasticity, the more adaptive the properties of the model. Plasticity is a necessary but insufficient prerequisite for adaptability. Along with plasticity, the adaptive properties of the model are influenced by the sensitivity of the model, which determines the strength of the model's reaction to a minimal change in the values of its parameters.

The adaptability indicator of a model is defined as (48):

I_{adapt} = I_{pl} I_{tol}.   (48)

The larger the value of the adaptability indicator, the greater the model's ability to adapt to a given task.
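The noise-resistance indicator of (46) can be sketched as follows (the function name and the constant toy model are illustrative assumptions; the model is again treated as a callable):

```python
import math

def noise_resistance(f, xs, j, theta):
    """Resistance of a trained model to additive noise on input j, eq. (46):
    penalize the squared change of the output when feature j is shifted by
    +/- theta times its observed range."""
    lo = min(x[j] for x in xs)
    hi = max(x[j] for x in xs)
    shift = theta * (hi - lo)
    total = 0.0
    for x in xs:
        up = tuple(v + shift if g == j else v for g, v in enumerate(x))
        down = tuple(v - shift if g == j else v for g, v in enumerate(x))
        total += (f(x) - f(up)) ** 2 + (f(x) - f(down)) ** 2
    return math.exp(-total / (2 * len(xs)))

# A model that ignores its input is maximally noise-resistant on it:
constant = lambda x: 0.5
print(noise_resistance(constant, [(0.0,), (1.0,)], 0, 0.2))  # 1.0
```

This matches the interpretation in the text: an input whose perturbation never changes the output scores one, i.e. it contributes little to the decision.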
The adaptability indicator can be determined in the same way for a forest of decision trees.

Symmetry reflects the proportionality in the arrangement of the parts of a whole in space, the complete correspondence (in location and size) of one half of the whole to the other half [56]. For decision trees, both symmetry and asymmetry indicators can be determined. The symmetry indicator of the decision tree structure is defined as (49):

I_{sym}^{n} = \frac{2}{N_n^2 - N_n} \sum_{i=1}^{N_n} \sum_{j=i+1}^{N_n} \mathbb{1}(f_i = f_j). (49)

The greater the I_{sym}^{n} value, the greater the symmetry of the decision tree structure. The asymmetry indicator of the model structure based on the decision tree is defined as (50):

I_{asym}^{n} = 1 - I_{sym}^{n}. (50)

The greater the I_{asym}^{n} value, the greater the asymmetry of the model structure. The symmetry indicator of the decision tree nodes is defined as (51):

I_{sym}^{w} = \frac{1}{N_n^2} \sum_{i=1}^{N_n} \sum_{j=1}^{N_n} \mathbb{1}(w_{ij} = w_{ji}). (51)

The higher the I_{sym}^{w} value, the greater the symmetry of the model connections. The asymmetry indicator of model connections based on the decision tree is defined as (52):

I_{asym}^{w} = 1 - I_{sym}^{w}. (52)

The larger the I_{asym}^{w} value, the greater the asymmetry of the model connections. The general symmetry indicator of a model based on a decision tree is defined as (53):

I_{sym} = I_{sym}^{n} I_{sym}^{w}. (53)

The higher the I_{sym} value, the greater the symmetry of the decision tree. The general asymmetry indicator of a model based on a decision tree is defined as (54):

I_{asym} = 1 - I_{sym}. (54)

The larger the I_{asym} value, the greater the asymmetry of the model. For a forest of decision trees, the symmetry and asymmetry indicators can be determined as the average values of the corresponding indicators of the individual trees in the forest.

Emergence (integrity) is a regularity that manifests itself in a system as the appearance of new properties that are absent in its elements.
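The symmetry indicators (49)-(54) can be sketched as pairwise-agreement counts. The encodings below (a list of per-node checked features for f_i, a square "connection" matrix for w_ij) are assumptions for illustration; the paper leaves the concrete representation open.

```python
def node_symmetry(features):
    """(49) sketch: share of unordered node pairs checking the same feature."""
    n = len(features)
    if n < 2:
        return 1.0
    pairs = n * (n - 1) // 2  # N_n(N_n - 1)/2 unordered pairs
    same = sum(1 for i in range(n) for j in range(i + 1, n)
               if features[i] == features[j])
    return same / pairs

def weight_symmetry(w):
    """(51) sketch: share of index pairs with w[i][j] == w[j][i]."""
    n = len(w)
    same = sum(1 for i in range(n) for j in range(n) if w[i][j] == w[j][i])
    return same / (n * n)

features = ["x1", "x2", "x1", "x2"]
w = [[0, 1], [1, 0]]
i_sym_n = node_symmetry(features)  # 2 of 6 pairs match
i_sym_w = weight_symmetry(w)       # fully symmetric matrix -> 1.0
i_sym = i_sym_n * i_sym_w          # overall symmetry (53)
i_asym = 1.0 - i_sym               # overall asymmetry (54)
```

The complementary asymmetry values (50), (52), (54) follow directly as one minus the corresponding symmetry value.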
The integrity property is associated with the purpose for which the system is created. Let C_o be the intrinsic complexity, i.e. the total complexity (content) of the system elements without their interconnections (in the case of pragmatic information, the total complexity of the elements that affect the achievement of the goal), and C_v be the mutual complexity characterizing the degree of interconnection of elements in the system (i.e. the complexity of its schema or structure). In accordance with [24, 57, 58], the degree of system integrity is defined as (55):

I_\Sigma = C_v / C_o. (55)

The emergence of a model based on a decision tree is defined similarly to [24] as (56):

I_\Sigma = \frac{\sum_{i=1}^{N_n} \sum_{j=1}^{N_n} C_v(i, j)}{\sum_{j=1}^{N_n} C_o(j)}. (56)

The larger the I_\Sigma value, the more holistic the model. For a forest-based model, emergence is defined as (57):

I_\Sigma^{forest} = \frac{T^2}{\sum_{t=1}^{T} I_\Sigma^{t}}. (57)

Interpretability (logical transparency) is the property of a model to be understandable for human perception and analysis [59, 60]. Obviously, a model is more interpretable if it is hierarchical and the average number of node connections does not exceed 5-7 (this bound is caused by the peculiarities of human perception). Since each node in a decision tree has only one input, it is mainly the number of outcomes of a node that needs to be considered. Heuristically, we define interpretability through the hierarchy indicator and the number of nodes [25] as (58):

I_{interp.} = \frac{I_h}{\sum_{i=1}^{N_n} \sum_{j=1}^{N_n} |\{w_{ij} \mid j \neq i\}|}. (58)

The level of model interpretability increases with the I_{interp.} value. For a forest of decision trees, we define interpretability as (59):

I_{interp.}^{forest} = \frac{1 + \max_{t=1,2,\dots,T} \{I_h^t\}}{2 N_n^2}. (59)

Learnability is the property of a model to improve its performance (to learn or adapt) by using examples to tune it to a particular problem [61].
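A minimal sketch of the emergence indicators (55)-(57). How the intrinsic complexities C_o(j) and mutual complexities C_v(i, j) are measured for a concrete tree is left open by the text, so the inputs below are illustrative numbers; the forest variant is read here, as an assumption, as T^2 over the sum of per-tree indicators.

```python
def emergence(c_v_pairs, c_o_nodes):
    """(56) sketch: total mutual complexity over total intrinsic complexity."""
    return sum(c_v_pairs) / sum(c_o_nodes)

def forest_emergence(tree_indicators):
    """(57) sketch, assuming the T^2 / sum reading of the forest indicator."""
    t = len(tree_indicators)
    return t * t / sum(tree_indicators)

# Toy values: three pairwise mutual complexities over two node complexities.
print(emergence([1.0, 2.0, 1.0], [2.0, 2.0]))  # 1.0
print(forest_emergence([1.0, 1.0]))            # 2.0
```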
The learnability indicator of the decision tree model is defined similarly to [25] as (60):

I_{lr} = \frac{I_{pl} L(tree)}{N S L}, (60)

where L is the Lipschitz constant of the training sample [62] (61):

L = \max_{\substack{s, p = 1, 2, \dots, S \\ s \neq p}} \frac{|y^s - y^p|}{\|x^s - x^p\|}; (61)

L(tree) is the Lipschitz constant (complexity) of the model [63, 64], which, as applied to a binary decision tree, is estimated as (62):

L(tree) = 10^{K} N'^{\,N'/K} \left[\binom{N'}{0} + \binom{N'}{2}\right]. (62)

The greater the value of the learnability indicator, the bigger the model's potential for solving the problem of approximating the dependence y = f(x) given in tabular form. For a forest of decision trees, we define the learnability indicator as (63):

I_{lr}^{forest} = \frac{\sum_{t=1}^{T} I_{pl}^{t} L(tree_t)}{N S L}. (63)

Autonomy is an agent's ability to act without direct human intervention, controlling its own actions and internal state. Autonomy also implies the possibility of learning based on experience [65].

Since trained computational intelligence models, as a rule, do not require human intervention during decision making, they all possess the property of autonomous functioning. However, during training the level of autonomy of different models and different training methods may vary considerably. Therefore, we further consider the characteristics of model autonomy only in relation to its learning process. Since the ability to learn is determined by plasticity, the autonomy of learning (self-adaptivity) is characterized by an indicator that depends on the plasticity characteristics of the model. On the other hand, the dependence of model learning on the human may be characterized by the human's influence (portion) on the formation of the structure and parameters of the model. Combining these considerations, we obtain the autonomy indicator of a model training method (64):

I_{aut} = \frac{1}{2} \left( \frac{N_{met}^{aut}}{N_{met}} + \frac{1}{N_n} \sum_{i=1}^{N_n} \delta_{aut}(w_{i^*}) \right). (64)
As I_{aut} increases, the level of model autonomy in the training process increases. For a forest of decision trees, the indicator I_{aut} can be defined as the smallest of the indicators of the forest trees (65):

I_{aut}^{forest} = \min_{t = 1, 2, \dots, T} \{I_{aut}^{t}\}. (65)

5 Integral indicators of model quality

Information quality criteria form a family of integral indicators depending on the model error E, the training sample volume S, and the number of adjusted model parameters N_w. They include the Hannan-Quinn Criterion [66], the Bayesian Information Criterion [67], Akaike's Information Criterion (AIC) [68], the Corrected AIC [69], and the Unbiased AIC [69]. A number of criteria, in addition to the error, the sample size, and the number of adjustable parameters, also take into account the maximum possible number of adjustable parameters N_w^{max}. They include the Minimum Description Length [70], the Shortest Data Description [71], the Consistent AIC [69], and the Mallows Criterion [72].

When constructing and comparing models, the sample size is usually assumed to be identical. Therefore, it is advisable to exclude the sample size from the comparison criteria. At the same time, different synthesized models may not use all of the sample features. Therefore, the number of features used in a model, N', should be treated as an important property of models under comparison. On the basis of these considerations, we can define the integral information criterion of a model as (66):

IIC = \left(1 - \frac{N'}{N N_w^{max}}\right) e^{-E}. (66)

The IIC criterion takes values in the range from zero to one: the smaller its value, the worse the model, and the bigger its value, the better the model. Here, for different models, the error E, the maximum number of adjustable parameters N_w^{max}, and the number of used features N' may differ. At the same time, the number of features N in the original set does not differ and is used in formula (66) to normalize N'.
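A minimal sketch of the integral information criterion (66): with equal error, a model using fewer of the N available features scores higher, and the e^{-E} factor keeps the score in (0, 1). Argument names are illustrative assumptions.

```python
import math

def iic(error, n_used, n_total, n_w_max):
    """Integral information criterion (66): feature-usage penalty times e^-E."""
    return (1.0 - n_used / (n_total * n_w_max)) * math.exp(-error)

# Two candidate models on the same N = 10 feature set with equal error E:
a = iic(error=0.1, n_used=3, n_total=10, n_w_max=5)
b = iic(error=0.1, n_used=8, n_total=10, n_w_max=5)
print(a > b)          # True: the more compact model wins
print(0.0 < a < 1.0)  # True: the criterion stays in (0, 1)
```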
This indicator is applicable both for comparing models based on single decision trees and for ensembles (forests) of decision trees. In the case of ensembles, N remains unchanged, while the error E, the number of selected features N', and the number of adjustable parameters N_w^{max} are determined for the entire ensemble of trees.

The effectiveness (quality) of problem solving by a model is determined by the accuracy (error) of problem solving on the training and test data, by the simplicity, logical transparency, and speed of the resulting model, and by the cost of model building (hardware requirements, iterations, and time spent by the training method). The generalized indicator of model effectiveness based on the indicators proposed above is determined by analogy with [25] as (67):

I_{ef} = I_{gen} I_{lr} \exp(-E). (67)

The indicator I_{ef} can be used both for comparing models and methods of their synthesis, and for optimizing the model building process. The I_{ef} indicator can be determined for a forest of decision trees in the same way; in this case, the error E and the indicators I_{gen}, I_{lr} are determined for the entire ensemble of trees.

An alternative generalized indicator of model efficiency may be defined by analogy with [25] as (68):

I_{ef}' = \frac{2}{\pi} \operatorname{arcsec}\left( \frac{N}{N'} \cdot \frac{N_w^{max}}{N_w} \cdot \frac{N_n^{max}}{N_n} \cdot \frac{N S}{N_w} \right) \exp(-E), \quad N' \geq 1, \; N_w^{max} \geq 1, \; N_n^{max} \geq 1. (68)

Multiplying out the fractions, we obtain (69):

I_{ef}' = \frac{2}{\pi} \operatorname{arcsec}\left( \frac{N^2 S N_w^{max} N_n^{max}}{N' N_w^2 N_n} \right) \exp(-E), \quad N' \geq 1, \; N_w^{max} \geq 1, \; N_n^{max} \geq 1. (69)

The alternative generalized efficiency indicator I_{ef}' may be used both for comparing models and methods of their synthesis, and for optimizing the model building process. The I_{ef}' indicator can also be used for models based on forests of decision trees.
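A minimal sketch of the alternative efficiency indicator (69), computing arcsec(x) as acos(1/x). The 2/π normalization (keeping the arcsec factor in [0, 1)) and the argument names are assumptions for illustration; the constraints in the text keep the arcsec argument at least 1, so acos is well defined.

```python
import math

def efficiency(error, n, n_used, s, n_w, n_w_max, n_n, n_n_max):
    """Alternative efficiency indicator (69) sketch: arcsec(x) = acos(1/x)."""
    arg = (n * n * s * n_w_max * n_n_max) / (n_used * n_w * n_w * n_n)
    return (2.0 / math.pi) * math.acos(1.0 / arg) * math.exp(-error)

# A compact model (few used features and parameters relative to the
# maxima) with low error scores close to 1:
score = efficiency(error=0.05, n=10, n_used=4, s=100,
                   n_w=6, n_w_max=20, n_n=7, n_n_max=15)
print(0.0 < score < 1.0)  # True
```

Because arcsec grows with its argument, shrinking N', N_w, or N_n (a more parsimonious model) pushes the score up, while the exp(-E) factor trades that against accuracy.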
In the case of ensembles, N and S remain unchanged, while the error E, the number of selected features N', and the number of adjustable parameters N_w^{max} are determined for the entire ensemble of trees.

6 Results and Discussion

The set of indicators proposed above is extensive, so for its practical application it is advisable to analyze the proposed indicators. Fig. 1 presents a classification of the proposed indicators characterizing the properties of models based on decision trees and forests.

Horizontally in Fig. 1, the indicators are divided into groups according to the level of data generalization: sample (indicators characterize the properties of the sample and are model independent), tree model (indicators are defined for a single decision tree model), and forest (indicators are defined for the set of decision tree models of a forest). The further to the right an indicator is located in Fig. 1, the higher the level of data generalization by the model required to determine it.

Vertically in Fig. 1, the indicators are divided into groups according to their computational complexity relative to the primary characteristics: basic properties (easily identifiable characteristics of data and models), primary indicators (determined on the basis of the basic properties), secondary indicators (determined on the basis of the primary indicators), and integrative indicators (determined on the basis of the indicators of the previous levels). The higher the level of an indicator, the more difficult it is to compute with respect to the primary properties of the data and models.

7 Conclusion

The problem of creating a quality model for models based on decision trees and forests is solved. The set of indicators characterizing the properties of decision trees and forests is proposed.
It allows quantitative evaluation of such model properties as diversity, equivalence, retraining, confidence in decision making, hierarchy, equifinality, generalization, nonlinearity, robustness, homogeneity, sensitivity to the input signals, plasticity, variability, adaptability, symmetry, asymmetry, emergence (integrity), interpretability (logical transparency), learnability, and autonomy, both for an individually considered (single) tree model and for an ensemble of tree models (forest).

The prospects for further study are to obtain estimates of the computational (time) and spatial (memory) complexity of calculating the proposed indicators, to conduct an experimental study of the proposed set of indicators for assessing the properties of models in solving practical problems of diagnosis and automatic classification on features, and to identify the relationships between different indicators of the properties of models based on decision trees and forests.
Fig. 1. Analysis of decision tree and forest indicators

References

1. Haykin, S.: Neural Networks and Learning Machines. Pearson, London (2008).
2. Bishop, C.: Neural networks for pattern recognition. Oxford University Press, New York (1995).
3. Subbotin, S.: The special deep neural network for stationary signal spectra classification. In: Proceedings of 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET 2018), Slavske, pp. 123-128. IEEE, Los Alamitos (2018). doi: 10.1109/tcset.2018.8336170
4. Subbotin, S.: Neural network modeling of medications impact on the pressure of a patient with arterial hypertension. In: Proceedings of the International Conference on Information and Digital Technologies (IDT 2016), Zilina, pp. 249-260. IEEE, Los Alamitos (2016). doi: 10.1109/dt.2016.7557182
5. Subbotin, S.A.: The neural network model synthesis based on the fractal analysis. Optical Memory and Neural Networks (Information Optics), 26(4): 257-273 (2017). doi: 10.3103/s1060992x17040099
6. Kumar, K. V.: Neural networks and fuzzy logic. S. K. Kataria & Sons, New Delhi (2009).
7. Rutkowska, D.: Neuro-Fuzzy Architectures and Hybrid Learning. Studies in Fuzziness and Soft Computing, Springer, Berlin (2002).
8. Oliinyk, A.O., Zayko, T.A., Subbotin, S.O.: Synthesis of Neuro-Fuzzy Networks on the Basis of Association Rules. Cybernetics and Systems Analysis, 50(3): 348-357 (2014).
doi: 10.1007/s10559-014-9623-7
9. Subbotin, S.: The neuro-fuzzy network synthesis and simplification on precedents in problems of diagnosis and pattern recognition. Optical Memory and Neural Networks (Information Optics), 22(2): 97-103 (2013). doi: 10.3103/s1060992x13020082
10. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees. Chapman and Hall, Wadsworth, New York (1984).
11. Rabcan, J., Rusnak, P., Subbotin, S.: Classification by fuzzy decision trees inducted based on Cumulative Mutual Information. In: Proceedings of 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET 2018), pp. 208-212. IEEE, Los Alamitos (2018). doi: 10.1109/tcset.2018.8336188
12. Geurts, P., Irrthum, A., Wehenkel, L.: Supervised learning with decision tree-based methods in computational and systems biology. Molecular Biosystems, 5(12): 1593–1605 (2009).
13. Rabcan, J., Levashenko, V., Zaitseva, E., Kvassay, M., Subbotin, S.: Application of Fuzzy Decision Tree for Signal Classification. IEEE Transactions on Industrial Informatics, 15(10): 5425-5434 (2019). doi: 10.1109/tii.2019.2904845
14. Kamiński, B., Jakubczyk, M., Szufel, P.: A framework for sensitivity analysis of decision trees. Central European Journal of Operations Research, 26(1): 135–159 (2017). doi: 10.1007/s10100-017-0479-6
15. Rabcan, J., Levashenko, V., Zaitseva, E., Kvassay, M., Subbotin, S.: Non-destructive diagnostic of aircraft engine blades by Fuzzy Decision Tree. Engineering Structures, 197 (2019). doi: 10.1016/j.engstruct.2019.109396
16. Quinlan, J. R.: Induction of decision trees. Machine Learning, 1(1): 81-106 (1986).
17. Subbotin, S., Kirsanova, E.: The regression tree model building based on a cluster-regression approximation for data-driven medicine. CEUR Workshop Proceedings, 2255: 155-169 (2018).
18. Breiman, L.: Random Forests. Machine Learning, 45(1): 5–32 (2001). doi: 10.1023/A:1010933404324
19.
Denisko, D., Hoffman, M.: Classification and interaction in random forests. Proceedings of the National Academy of Sciences of the United States of America, 115(8): 1690–1692 (2018). doi: 10.1073/pnas.1800256115
20. Subbotin, S.: A random forest model building using a priori information for diagnosis. CEUR Workshop Proceedings, 2353: 962-973 (2019).
21. Boulesteix, A.-L., Janitza, S., Kruppa, J., König, I. R.: Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(6): 493–507 (2012).
22. Lin, Y., Jeon, Y.: Random forests and adaptive nearest neighbors. Journal of the American Statistical Association, 101(474): 578–590 (2006). doi: 10.1198/016214505000001230
23. Subbotin, S.: Methods of data sample metrics evaluation based on fractal dimension for computational intelligence model building. In: Proceedings of 4th International Scientific-Practical Conference Problems of Infocommunications Science and Technology (PICS and T 2017), pp. 1-6 (2017). doi: 10.1109/infocommst.2017.8246136
24. Subbotin, S. A.: Models of criterions of comparison of neural networks and neuro-fuzzy networks in the problems of diagnosis and pattern classification. Scientific Reports of Donetsk National Technical University. Serie "Informatics, Cybernetics and Computers", 12(165): 148-151 (2010).
25. Subbotin, S. A.: Analysis of properties and criterions of comparison of neural network models for solving diagnostics and pattern recognition problems. Data Registration, Saving and Processing, 11(3): 42–52 (2009).
26. Subbotin, S. A.: Methods and criterions of comparison of models and algorithms of artificial neural network synthesis. Radio Electronics, Computer Science, Control, 2: 109–114 (2003).
27.
Subbotin, S.A., Oliinyk, A.A.: The dimensionality reduction methods based on computational intelligence in problems of object classification and diagnosis. Advances in Intelligent Systems and Computing, 543: 11-19 (2017). doi: 10.1007/978-3-319-48923-0_2
28. Subbotin, S.: Quasi-relief method of informative features selection for classification. In: Proceedings of IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), pp. 318-321. IEEE, Los Alamitos (2018). doi: 10.1109/stc-csit.2018.8526627
29. Subbotin, S.A.: Methods of sampling based on exhaustive and evolutionary search. Automatic Control and Computer Sciences, 47(3): 113-121 (2013). doi: 10.3103/s0146411613030073
30. Subbotin, S., Oliinyk, A.: The sample and instance selection for data dimensionality reduction. Advances in Intelligent Systems and Computing, 543: 97-103 (2017). doi: 10.1007/978-3-319-48923-0_13
31. Subbotin, S.: The instance and feature selection for neural network based diagnosis of chronic obstructive bronchitis. Studies in Computational Intelligence, 606: 215-228 (2015). doi: 10.1007/978-3-319-19147-8_13
32. Ashby, W. R.: An Introduction to Cybernetics. Martino Fine Books, Eastford (2015).
33. Dopico, J. R., de la Calle, J. D., Sierra, A. P.: Encyclopedia of artificial intelligence. Information Science Reference, New York (2009).
34. Luger, G. F.: Artificial Intelligence. Structures and Strategies for Complex Problem Solving. Pearson Education, London (2011).
35. Nievergelt, Y.: The Concept of Elasticity in Economics. SIAM Review, 25(2): 261–265 (1983). doi: 10.1137/1025049
36. Beven, K.J., Freer, J.: Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems. Journal of Hydrology, 249: 11–29 (2001).
37. Pedrycz, W.: Fuzzy modelling: paradigms and practice. Springer, Berlin (1996).
38. Hoekstra, A.: Generalisation in feed forward neural classifiers.
Technische Universiteit Delft, Delft (1998).
39. Hoekstra, A., Duin, R.: On the nonlinearity of pattern classifiers. In: Proceedings of 13th International Conference on Pattern Recognition, Vienna, 25-29 August 1996, vol. 4, pp. 271–275. IEEE, Los Alamitos (1996).
40. Grossberg, S.: Nonlinear neural networks: principles, mechanisms, and architectures. Neural Networks, 1(1): 17-61 (1988).
41. Subbotin, S. A.: The training set quality measures for neural network learning. Optical Memory and Neural Networks (Information Optics), 19(2): 126–139 (2010). doi: 10.3103/s1060992x10020037
42. Subbotin, S.A.: The sample properties evaluation for pattern recognition and intelligent diagnosis. In: 10th International Conference on Digital Technologies (DT 2014), Zilina, pp. 321-332. IEEE, Los Alamitos (2014). doi: 10.1109/dt.2014.6868734
43. Weng, T.-W., Zhang, H., Chen, P.-Y., Yi, J., Su, D., Gao, Y., Hsieh, C.-J., Daniel, L.: Evaluating the Robustness of Neural Networks: An Extreme Value Theory Approach. https://arxiv.org/pdf/1801.10578
44. Carlini, N., Wagner, D.: Towards Evaluating the Robustness of Neural Networks. In: 2017 IEEE Symposium on Security and Privacy (SP), vol. 1, pp. 39-57. IEEE, Los Alamitos (2017). doi: 10.1109/SP.2017.49
45. Krus, D.J., Blackman, H.S.: Test reliability and homogeneity from perspective of the ordinal test theory. Applied Measurement in Education, 1: 79–88 (1988).
46. Alippi, C., Piuri, V., Sami, M.: Sensitivity to errors in artificial neural networks: a behavioral approach. IEEE Transactions on Circuits and Systems – I: Fundamental Theory and Applications, 42(6): 358–361 (1995).
47. Hashem, S.: Sensitivity analysis for feedforward artificial neural networks with differentiable activation functions. In: Proceedings of International Joint Conference on Neural Networks, Baltimore, 7-11 June 1992, vol. I, pp. 419–424. IEEE, Los Alamitos (1992).
48. Tao, C.-W., Nguyen, H.T., Yao, J.T., Kreinovich, V.: Sensitivity analysis of neural control.
In: Proceedings of Fourth International Conference on Intelligent Technologies, Chiang Mai, 17-19 December 2003, pp. 478–482. Chiang Mai University, Chiang Mai (2003).
49. Gerrow, K.: Synaptic stability and plasticity in a floating world. Current Opinion in Neurobiology, 20(5): 631–639 (2010). doi: 10.1016/j.conb.2010.06.010
50. Meyer, D., Bonhoeffer, T., Scheuss, V.: Balance and Stability of Synaptic Structures during Synaptic Plasticity. Neuron, 82(2): 430–443 (2014).
51. Kick, D.R., Schulz, D.J.: Variability in neural networks. eLife, 7: e34153 (2018). doi: 10.7554/eLife.34153
52. Norris, B.J., Wenning, A., Wright, T.M., Calabrese, R.L.: Constancy and variability in the output of a central pattern generator. Journal of Neuroscience, 31: 4663–4674 (2011). doi: 10.1523/JNEUROSCI.5072-10.2011
53. Masquelier, T.: Neural variability, or lack thereof. Frontiers in Computational Neuroscience, 7: 7 (2013). doi: 10.3389/fncom.2013.00007
54. Rusiecki, A., Kordos, M., Kamiński, T., Greń, K.: Training Neural Networks on Noisy Data. In: International Conference on Artificial Intelligence and Soft Computing (ICAISC 2014), pp. 131-142. Springer, Cham (2014).
55. Martín, J.A., de Lope, J., Maravall, D.: Adaptation, Anticipation and Rationality in Natural and Artificial Systems: Computational Paradigms Mimicking Nature. Natural Computing, 8(4): 757-775 (2009).
56. Mainzer, K.: Symmetry And Complexity: The Spirit and Beauty of Nonlinear Science. World Scientific, Singapore (2005).
57. Bunge, M. A.: Emergence and Convergence: Qualitative Novelty and the Unity of Knowledge. University of Toronto Press, Toronto (2003).
58. Wan, P.: Emergence à la Systems Theory: Epistemological Totalausschluss or Ontological Novelty? Philosophy of the Social Sciences, 41(2): 178–210 (2011). doi: 10.1177/0048393109350751
59. Japaridze, G., De Jongh, D.: The logic of provability. In: Buss, S. (ed.) Handbook of Proof Theory, pp. 476-546. North-Holland, Amsterdam (1998).
60.
Molnar, Ch.: Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/
61. Valiant, L.: A theory of the learnable. Communications of the ACM, 27(11): 1134–1142 (1984). doi: 10.1145/1968.1972
62. Iuliano, E.: A Comparative Evaluation of Surrogate Models for Transonic Wing Shape Optimization. In: Andrés-Pérez, E., González, L.M., Periaux, J., Gauger, N.R., Quagliarella, D., Giannakoglou, K.C. (eds.) Evolutionary and Deterministic Methods for Design Optimization and Control With Applications to Industrial and Societal Problems, pp. 161-180. Springer, Cham (2019).
63. Scaman, K., Virmaux, A.: Lipschitz regularity of deep neural networks: analysis and efficient estimation. In: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. https://papers.nips.cc/paper/7640-lipschitz-regularity-of-deep-neural-networks-analysis-and-efficient-estimation.pdf
64. Lipschitz constant. Encyclopedia of Mathematics. http://www.encyclopediaofmath.org/index.php?title=Lipschitz_constant&oldid=30687
65. Russell, S., Norvig, P.: Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River (2009).
66. Hannan, E. J., Quinn, B. G.: The determination of the order of an autoregression. Journal of the Royal Statistical Society, Ser. B, 41(2): 190–195 (1979).
67. Schwarz, G. E.: Estimating the dimension of a model. Annals of Statistics, 6(2): 461–464 (1978).
68. Akaike, H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6): 716–723 (1974).
69. Gheissari, N., Bab-Hadiashar, A.: Model selection criteria in computer vision: are they different? In: Proceedings of Digital Image Computing: Techniques and Applications, Sydney, 10-12 December 2003, pp. 185-194. CSIRO, Collingwood (2003).
70. Grünwald, P., Myung, J., Pitt, M.: Advances in Minimum Description Length: Theory and Applications. MIT Press, Cambridge (2005).
71.
Rissanen, J.: Modeling by shortest data description. Automatica, 14 (5): 465-471 (1978). 72. Mallows, C. L.: Some Comments on CP. Technometrics, 15 (4): 661–675 (1973). doi:10.2307/1267380