<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Informatica</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="hindawi-id">5803407</article-id>
      <article-id pub-id-type="doi">10.1109/tii.2019.2904845</article-id>
      <title-group>
        <article-title>The problem of convergence of classifiers construction procedure in the schemes of logical and algorithmic classification trees</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Igor Povkhan</string-name>
          <email>igor.povkhan@uzhnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oksana Mulesa</string-name>
          <email>oksana.mulesa@uzhnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olena Melnyk</string-name>
          <email>olena.melnyk@uzhnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuriy Bilak</string-name>
          <email>yuriy.bilak@uzhnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Polishchuk</string-name>
          <email>volodymyr.polishchuk@uzhnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Uzhhorod National University</institution>,
          <addr-line>Zankoveckoy str., 89B, Uzhhorod, 88000</addr-line>,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>31</volume>
      <issue>2007</issue>
      <fpage>249</fpage>
      <lpage>268</lpage>
      <abstract>
        <p>The paper considers the problem of convergence of the classifier-synthesis procedure in the schemes of logical and algorithmic classification trees. It suggests an upper estimate of the complexity of an algorithm-tree scheme in the problem of approximating a real data array by a set of generalized features, with a fixed stopping criterion for the branching procedure at the stage of classification tree construction. This approach provides the required accuracy of the model, evaluates its complexity, reduces the number of branches, and achieves the required performance. The paper presents an evaluation of the convergence of the procedure of constructing recognition schemes as logical and algorithmic classification trees under conditions of weak and strong separation of the classes of the primary initial sample.</p>
      </abstract>
      <kwd-group>
        <kwd>logical tree</kwd>
        <kwd>algorithmic tree</kwd>
        <kwd>classifier</kwd>
        <kwd>pattern recognition</kwd>
        <kwd>feature</kwd>
        <kwd>initial sample</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction </title>
      <p>Problems of pattern recognition are diverse and now widespread in both the economic and social spheres of human activity, which makes it necessary to build and investigate mathematical models of the corresponding systems [1-5]. A universal approach to their solution is still missing, although many general theories and approaches help solve different types (classes) of problems. However, their practical application is marked by high sensitivity to the specific parameters of the problem itself or to the subject area of application [6]. Various theoretical results derive from special cases and subproblems, and the bottleneck of successful real recognition systems is that they require a considerable amount of computation and rely on powerful hardware. Classification trees (decision trees) remove a significant part of the above shortcomings: they make it possible to work effectively with problems of arbitrary data scale, where the information is given in natural form [7-10].</p>
      <p>Today there are various approaches to constructing classification systems in the form of logical and algorithmic classification trees (LCT/ACT) [11-13]. The interest in recognition methods based on LCT is caused by several valuable properties. On the one hand, the complexity of the class of recognition functions (RF) in the form of LCT models, under certain conditions, does not exceed the complexity of the class of linear recognition functions (the simplest of the known classes). On the other hand, an RF in the form of a classification tree makes it possible to distinguish during classification both causal links (and take them into account unambiguously in the future) and factors of chance or uncertainty. It also captures both functional and stochastic links between the properties and behavior of the entire system, as well as the classification of interactions between environmental objects, different animal species, and people (except for objects whose information is transmitted genetically, i.e., hereditarily, or otherwise). All of this takes place according to the so-called logical decision tree [14].</p>
      <p>In most forecasting and classification problems that use unstructured data (such as sets of discrete patterns or text arrays), an artificial neural network (of a suitable type) outperforms all other types of algorithms and decision tree frameworks. Otherwise (in the case of structured high-volume discrete data arrays), the methods and algorithms of the decision tree concept are vastly preferred [15-17]. In practice, LCT construction methods and algorithms usually yield structurally complex logical trees at the output (in terms of the number of vertices and branches they belong to the class of irregular trees), which are unevenly filled with data and have different numbers of branches. Such complex tree-like structures are difficult to perceive in external analysis due to a massive number of nodes (vertices) and a massive number of stepwise partitions of the primary initial sample (IS) containing a minimum number of objects (possibly even single objects in the worst case). This paper therefore reviews a fundamental question of classification tree methods (classification models): the convergence of the procedure of constructing a classification tree, i.e., of the LCT/ACT structures.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Formal problem statement </title>
      <p>
        Let the IS have the following form:
(x_1, f_R(x_1)), ..., (x_m, f_R(x_m)),
        (
        <xref ref-type="bibr" rid="ref1">1</xref>
        )
      </p>
      <p>Consider x_i ∈ G, (i = 1, ..., m), where m is the quantity of objects in the IS and f_R(x_i) is a finite-valued function that designates the partition R of the set G into classes (patterns) H_0, ..., H_{k-1}. The relation f_R(x_i) = l, (l = 0, ..., k-1) means x_i ∈ H_l; here x_i = (x_i^1, ..., x_i^n), where x_i^j is the value of the j-th feature of object x_i, (j = 1, ..., n), and n is the quantity of features in the IS.</p>
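<p>As an illustration, the initial sample (1) can be represented in code as a plain list of (object, label) pairs. The following is a minimal sketch; the concrete feature values and class labels are hypothetical.</p>

```python
# Hypothetical initial sample (IS) of form (1): pairs (x_i, f_R(x_i)),
# where each x_i is an n-tuple of feature values and f_R(x_i) is the
# index of the class (pattern) H_l that x_i belongs to.
IS = [
    ((0, 1, 1), 0),  # x_1, f_R(x_1) = 0  -> pattern H_0
    ((1, 0, 1), 1),  # x_2, f_R(x_2) = 1  -> pattern H_1
    ((1, 1, 0), 1),  # x_3, f_R(x_3) = 1  -> pattern H_1
]

m = len(IS)         # quantity of objects in the IS
n = len(IS[0][0])   # quantity of features per object

def f_R(x):
    """Finite-valued function designating the partition into patterns."""
    for obj, label in IS:
        if obj == x:
            return label
    raise KeyError(x)
```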
      <p>
        As a result, the IS is a collection (or sequence) of sets, where each set combines the indicated feature values with a value of the function [18]. The set of feature values is a particular image, and the value of the function assigns it to a particular pattern. The problem is to build an LCT/ACT structure L based on the array of primary IS of type (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) and to determine the values of its structural parameters p, that is, F(L(p, x_i), f_R(x_i)) → opt.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Literature review </title>
      <p>All basic approaches in recognition theory have their advantages and disadvantages and together form a single toolkit for solving practical problems of artificial intelligence theory; this holds especially for the thoroughly elaborated classical algebraic approach developed by Yu. I. Zhuravlyov [19]. This direction in the development of recognition theory is connected with constructing models of classification algorithms and selecting a recognition algorithm of optimized quality. The present research focuses on the concept of decision trees (LCT structures). In particular, a classification scheme produced by an arbitrary classification tree method or algorithm has a tree-like logical structure [1,3,20]. Moreover, the structure of the logical tree consists of vertices (features), grouped in tiers and built (selected) at a certain step (stage) of classification tree model construction [21]. The main peculiarity of tree-like recognition systems is that the importance of individual features (groups or sets of features) is determined relative to the function that defines the division of objects into classes [22].</p>
      <p>In [21], a generation scheme for the classification tree structure is suggested based on step-by-step selection of elementary attributes; its disadvantage is the high dependence of model complexity on the effectiveness of the final minimization (tree-pruning) procedure. In [23-25], a modular scheme of classifier construction in the form of classification tree structures has been offered, which circumvents the limitations of traditional decision tree methods. The research [24] proposes an efficient scheme for generating generalized features based on constructing sets of geometric objects; its disadvantage is the restriction on the initial training structure of the sample and the lack of versatility in practical application. In addition, the challenges of estimating the structural complexity of LCT models at the minimization stage are evaluated in [20].</p>
      <p>Thus, the research [22] shows that the resulting classification rule has a tree-like logical structure built by an arbitrary method or by a branched feature-selection algorithm; in this case, the question of selecting a qualitative branching criterion comes first. The logical tree consists of vertices grouped in tiers and obtained at particular steps of recognition tree construction [25], which makes effective structural minimization of the constructed classification tree model a challenge. A significant problem that arises from article [23] is the synthesis of recognition trees represented by an actual tree of algorithms. An essential area of research on LCT structures remains the generation of decision trees for the case of uninformative features [14], as well as the question, central to classification tree theory, of whether it is possible to construct all variants of logical trees that correspond to the primary IS and to select the classification tree of minimum depth or structural complexity (number of tiers) [26-30].</p>
    </sec>
    <sec id="sec-4">
      <title>4. Convergence of synthesis of logical and algorithmic classification tree structure</title>
      <p>
        Assume that at every step of building a logical tree (some LCT model), only one elementary feature is picked from the fixed set of features (α_1, ..., α_n). Then at the n-th step of the classification tree construction procedure, the LCT scheme is represented by a predicate p_n (a generalized feature built from the set of elementary features) [23, 30] that is the most effective approximation of the initial IS of general form (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) (the same applies to the ACT structure case).
      </p>
      <p>In particular, p_n represents some tree-like scheme (classification tree) consisting of n vertices. The structure of the predicate p_n includes only n elementary features (attributes of a discrete object of the IS) from the initial set.</p>
      <p>
        The sequence of predicates p_1, ..., p_j (generalized features) coincides with the primary IS of the kind (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) if, starting from some Q, the following condition is fulfilled:
      </p>
      <p>f_Q(x_i) = f_R(x_i), (i = 1, ..., m), (m &gt; 0).</p>
      <p>Let α_n denote the elementary feature selected (fixed) at the n-th step in the scheme of LCT model construction. The feature α_n corresponds to some fixed path r_1, r_2, ..., which ends at the given attribute (a vertex of the classification tree, i.e., of the LCT model). For example, Fig. 1 shows an LCT in which the vertex α_2 (feature) corresponds to the path {0}, and the vertex α_5 corresponds to the path {0,1}.</p>
      <p>
        Let T_n denote the path that corresponds to the elementary feature α_n, and let D_n denote the set of those pairs (x_i, f_R(x_i)) of the primary IS of general form (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) whose objects w_i belong to the path T_n. For example, for the LCT structure (Fig. 1), let n = 4; then the path T_n looks like {1,0}.
      </p>
      <p>In this case some object w_i belongs to the path {1,0} if the values of the elementary features along this path satisfy α_1(w_i) = 1 and the feature at the next vertex of the path takes the value 0 on w_i. We say further that the elementary feature α_n weakly separates the set D_n if D_n contains such pairs (x_i, f_R(x_i)) and (x_j, f_R(x_j)) that α_n(x_i) = 0 and α_n(x_j) = 1 (that is, α_n(x_i) ≠ α_n(x_j)).</p>
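<p>The weak-separation condition is easy to state in code. The sketch below treats an elementary feature as a 0/1-valued function on objects; the example features and data are hypothetical.</p>

```python
def weakly_separates(feature, D):
    """The elementary feature weakly separates the set D of pairs
    (x_i, f_R(x_i)) if D contains two objects on which the feature
    takes different values (0 on one object, 1 on another)."""
    values = {feature(x) for x, _ in D}
    return values == {0, 1}

# Toy set D_n and two candidate features (hypothetical data).
D = [((0, 1), 0), ((1, 0), 1), ((1, 1), 1)]
attr0 = lambda x: x[0]   # value of the first attribute
const1 = lambda x: 1     # a constant feature, never separating
```

Here `attr0` takes both values 0 and 1 on `D` and therefore weakly separates it, while the constant feature does not.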
      <sec id="sec-4-1">
        <title>Figure 1: LCT structure with elementary features as vertices. </title>
        <p>By the final power of a classification tree scheme (an LCT/ACT model) we mean the number of all final vertices (leaves) of the scheme. For example, for the LCT in Fig. 1 the final power is 6.</p>
        <p>Obviously, the final power of a classification tree scheme is also equal to the number of all final paths in it. By induction on n it is easy to prove that the final power of each of the above schemes p_n (predicates) is equal to n + 1. Indeed, the final power of p_1, which includes only one feature or algorithm (the LCT/ACT cases), obviously equals 2.</p>
        <p>Let the final power of the scheme p_n equal n + 1, and let us calculate the final power of p_{n+1}. Clearly, this scheme is obtained from the scheme p_n by successively adding a new vertex (feature, algorithm) with the number n + 1 at some final vertex. When this feature (algorithm) is added to the scheme p_n, one end vertex disappears and two new end vertices appear. Therefore, we can conclude that the number of all end vertices of the scheme p_{n+1} equals n + 2.</p>
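<p>The induction above can be checked mechanically: starting from a single leaf, every added internal vertex removes one leaf and creates two, so the scheme p_n always has n + 1 final vertices. A small sketch:</p>

```python
# Final power of p_n: each added feature/algorithm replaces one end
# vertex with an internal vertex and two new end vertices, so the
# number of leaves grows by exactly one per step.
def final_power(n):
    leaves = 1            # before any vertex is added
    for _ in range(n):
        leaves += 2 - 1   # two leaves appear, one disappears
    return leaves
```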
        <p>
          Assume that at each n-th step of the classification tree construction procedure (of the LCT model) the set D_n is weakly separated by some feature α_n, and consider the scheme p_n. As shown above, it has n + 1 final paths. Since D_n is weakly separated at each step, every such path contains at least one pair of the primary IS of general form (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ). It is also obvious that different end paths in p_n have no common pairs from the sample (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ). Therefore, the scheme (predicate) p_n divides the IS (based on the basic branching criterion introduced by the current classification tree method) into n + 1 non-empty parts (subsets) that do not intersect. Since the primary IS contains m training pairs in total, the scheme p_{m-1} (or a predicate with a smaller index) completely separates the primary IS, that is, p_{m-1} fully recognizes the sample.
        </p>
        <p>Thereby, if at each n-th step the selected elementary feature α_n weakly separates the set D_n, the LCT construction process converges to the primary IS and ends in no more than m − 1 steps, where m is the number of all training pairs of the primary IS.</p>
        <p>Let us notice that the condition of weak class separation for the primary IS is relatively weak and therefore provides only slow convergence of the classification tree construction procedure. Thus, it is essential to consider the convergence of the process under stronger conditions. We will assume that the IS contains information about two classes (patterns) H_0 and H_1, and that the IS itself is deterministic. Let n_j be the number of training pairs (x_i, f_R(x_i)) within the primary IS that satisfy the relation f_R(x_i) = j, (j = 0, 1); for simplification and certainty, we put n_0 ≥ n_1.</p>
        <p>Having fixed f_R(x) ≡ 0, we obtain a generalized feature (scheme) f_0 that approximates (in whole or in part) the primary IS. Obviously, in this case (that is, in the situation where no elementary feature α_n has yet been chosen), the generalized feature (scheme) f_0 is the best approximation of the primary IS. We further call the value n_1 the unconditional number of errors in the primary IS.</p>
        <p>Let some elementary feature α_1 be selected (arbitrarily) at the first step of building the classification tree; this feature breaks the initial sample into two parts (subsets) H_0 and H_1, where H_j is the set of all pairs (x_i, f_R(x_i)) of the primary IS for which the relation α_1(x_i) = j, (j = 0, 1) is satisfied.</p>
        <p>Let n_{mj} be the number of all pairs (x_i, f_R(x_i)) from the sample H_j, (j = 0, 1), for which the relation f_R(x_i) = m, (m = 0, 1) is fulfilled. The feature α_1 can be considered a generalized feature f_1 (scheme) built at the first step of the LCT construction process.</p>
        <p>Let us introduce the value β = max(n_{00}, n_{10}) + max(n_{01}, n_{11}), which is the number of correct answers (classifications) realized by the generalized feature f_1; accordingly, the value n_0 is the number of correct answers (classifications) realized by the generalized feature f_0.</p>
        <p>
          By the number of correct answers we mean the number of those training pairs (x_i, f_R(x_i)) in the initial training sample of type (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) for which the equality f_R(x_i) = f_1(x_i) is fulfilled.
        </p>
        <p>Since n_{00} + n_{01} = n_0 and n_{10} + n_{11} = n_1, we will have the following:
β = max(n_{00}, n_{10}) + max(n_{01}, n_{11}) ≥ n_0. (2)</p>
        <p>The number of errors of f_1 is then m − β = n_0 + n_1 − β ≤ n_1. (3)</p>
        <p>
          Therefore, when the feature α_1 is chosen, the number of correct answers at least does not decrease, and the number of errors given by the generalized algorithm f_1 equals m − β. Note that (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) follows from (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ). Let us introduce the value γ_1 = n_1 / (m − β) and call it the quality of the elementary feature α_1 relative to the primary IS; similarly we determine the quality γ_n of a feature α_n relative to the primary IS (n = 1, 2, 3, ...).
        </p>
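<p>Inequalities (2) and (3) can be verified on toy counts. In the sketch below, counts[m][j] stands for n_mj, the number of pairs with true class m sent to subset H_j; all numbers are hypothetical.</p>

```python
def correct_answers(counts):
    """beta = max(n00, n10) + max(n01, n11): the generalized feature f_1
    labels each subset H_j with its majority class, so this sum is the
    number of correctly classified training pairs."""
    n00, n01 = counts[0]
    n10, n11 = counts[1]
    return max(n00, n10) + max(n01, n11)

counts = [(5, 1),   # class-0 pairs: 5 fall into H_0, 1 into H_1
          (1, 3)]   # class-1 pairs: 1 falls into H_0, 3 into H_1
beta = correct_answers(counts)           # 5 + 3 = 8
n0, n1 = sum(counts[0]), sum(counts[1])  # 6 and 4
m = n0 + n1                              # 10 training pairs in total
quality = n1 / (m - beta)                # gamma_1 = n_1 / (m - beta)
```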
        <p>
          By the power of some constructed generalized feature (GF), or of a set of GF (for a fixed step of the ACT scheme), we mean the number of training pairs (x_i, f_R(x_i)) of the primary IS of form (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) that are approximated (correctly classified) by this generalized feature (or sequence of generalized features).
        </p>
        <p>It is important for ACT schemes that, at the step-by-step division of the IS into two samples H_0 and H_1 (and so on), part of the sample is completely covered by the current classification algorithm (a generalized feature or a set of them); that is, we have a case of strong separation of the classes of the IS array. Therefore, we can assume that the complexity of the final ACT scheme (the total number of steps to build the tree) largely depends on the procedure of initial evaluation and selection of the set of independent classification algorithms a_i, their initial parameters, and the parameters of the set of GF f_i that they generate at each step of the ACT scheme.</p>
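<p>The step-by-step coverage idea can be sketched as a greedy loop. The interfaces below (algorithms modeled as functions that return the pairs their generated GF covers) are assumptions for illustration, not the authors' implementation.</p>

```python
def build_act(sample, algorithms, max_steps):
    """Greedy sketch of ACT construction: at each step, the algorithm
    whose generalized feature (GF) covers the most remaining training
    pairs is fixed, the covered pairs are removed, and the process
    repeats on the uncovered remainder of the sample."""
    scheme, remaining = [], list(sample)
    for _ in range(max_steps):
        if not remaining:
            break
        best = max(algorithms, key=lambda a: len(a(remaining)))
        covered = best(remaining)
        if not covered:        # no algorithm covers anything: stop
            break
        scheme.append((best, covered))
        remaining = [p for p in remaining if p not in covered]
    return scheme, remaining

# Toy sample and two hypothetical "algorithms".
sample = [((i,), i % 2) for i in range(6)]
covers_class0 = lambda rem: [p for p in rem if p[1] == 0]
covers_class1 = lambda rem: [p for p in rem if p[1] == 1]
```

On this toy data the loop terminates in two steps, one GF per class, illustrating how strong separation shortens the scheme.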
        <p>
          For the ACT scheme it is then essential to consider the general complexity of the procedure of constructing a classification tree under the condition of weak separation of the primary IS classes, when a single GF with a power of one unit is generated per vertex of the tree. Under strong separation conditions, when the problem and practical feasibility set no limit on the number of GF and their power, it is possible to build them freely.
        </p>
        <p>At the first stage, we consider the case of weak class division with restrictions on the GF sets built by the ACT scheme. Note that the procedure of constructing an algorithmic tree has certain peculiarities in terms of the step-by-step approximation of the primary IS by a sequence of GF. Let every step of the construction of some ACT model select for work one fixed classification algorithm from a set of selected algorithms (a_1, a_2, ..., a_n). Moreover, the classification tree can be built by one algorithm and the sequence of GF that it generates.</p>
        <p>
          Therefore, after n steps of the classification tree construction procedure, the ACT structure represents some scheme s_n (a generalized second-order feature built from a set of GF synthesized by the classification algorithms), which is the most effective approximation of the primary IS of general form (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) by a set of independent classification algorithms and their GF. In particular, s_n represents some tree-like scheme (ACT structure) consisting of n vertices; that is, the design of the scheme s_n includes only n classification algorithms (GF) from the initial set, provided that at each step of the tree construction procedure no more than one generalized feature of the minimum power of one unit is generated.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments and results </title>
      <p>At the next stage, for the LCT structure, let us make an assumption: the quality γ_n of the elementary features α_n relative to the array of the primary IS is not less than some number y, where y &gt; 1.</p>
      <p>Let us analyze the complexity of the classification tree construction procedure under this condition (y &gt; 1); to do this, let us estimate the number of steps in which this process (procedure) achieves full recognition of the initial training sample array.</p>
      <p>To be certain, let us consider the following scheme of classification tree construction (Fig. 2).</p>
      <p>Let n_1 be the unconditional number of errors within the primary IS. The elementary feature α_1^1 separates the IS into two samples, H_0 and H_1. Let h_0 and h_1 be, accordingly, the unconditional numbers of errors in the samples H_0 and H_1. The feature α_1^2 separates the set H_0 into two sets, H_00 and H_01; let h_00 and h_01 be the unconditional numbers of errors in the samples H_00 and H_01. Similarly, we define the sets H_10, H_11 and the quantities h_10 and h_11 for the elementary feature α_2^2.</p>
      <sec id="sec-5-1">
        <title>Figure 2: Scheme of division into subsets in the structure of the classification tree. </title>
        <p>
          From the initial condition (y &gt; 1) it follows that n_1 ≥ y(h_0 + h_1). (
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) From (
          <xref ref-type="bibr" rid="ref4">4</xref>
          ) we receive h_0 + h_1 ≤ y^(−1) · n_1 and, applying the same condition to the features of the second tier, h_00 + h_01 ≤ y^(−1) · h_0 and h_10 + h_11 ≤ y^(−1) · h_1. Let us make the following assumptions in this regard: h_0 ≥ 1, h_1 ≥ 1, h_00 ≥ 1, h_01 ≥ 1, h_10 ≥ 1 and h_11 ≥ 1. From here we will have 4 ≤ y^(−2) · n_1. Similarly, for the set of features α_1^i, α_2^i, ... located on the i-th tier of the logical tree, we will have:
2^i ≤ y^(−i) · n_1, or (2y)^i ≤ n_1. (
          <xref ref-type="bibr" rid="ref7">7</xref>
          )
        </p>
        <p>Hence we can conclude that the process of constructing the classification tree will continue until its structure has m tiers (levels), where m has the following form:
m = R(log_{2y}(n_1)). (
          <xref ref-type="bibr" rid="ref8">8</xref>
          )</p>
        <p>Here R(x) means rounding the number x up to the nearest integer that exceeds x; for example, R(1.2) = 2, R(3.7) = 4, R(4.1) = 5.</p>
        <p>
          Hence the classification tree with m full tiers (that is, the case when the i-th tier contains 2^i vertices) has 2^(m+1) − 1 vertices; thus, recognition of the primary IS under the condition (y &gt; 1) by means of a full LCT takes place in no more than 2^(m+1) − 1 steps, where m is calculated using expression (
          <xref ref-type="bibr" rid="ref8">8</xref>
          ).
        </p>
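<p>Bound (7) and the tier count (8) can be evaluated numerically. The sketch below assumes the reading (2y)^i ≤ n_1 of bound (7) and implements R(x) as rounding up; the input values are hypothetical.</p>

```python
import math

def tier_bound(n1, y):
    """Number of tiers m from (8): the smallest integer m for which
    (2*y)**m reaches n1, i.e. m = R(log_{2y}(n1)) with R rounding up."""
    return math.ceil(math.log(n1, 2 * y))

def full_tree_vertices(m):
    """A full binary tree with m tiers has 2**(m + 1) - 1 vertices."""
    return 2 ** (m + 1) - 1
```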
        <p>
          For the case of the ACT structure, we can conclude that the sequence of constructed schemes s_1, s_2, ..., s_j (generalized features of the second order) converges to the primary IS of form (
          <xref ref-type="bibr" rid="ref1">1</xref>
          ) in no more than M steps (where M is the total power of the primary IS), even if at each step only one GF is generated and the power of each is no more than one unit.
        </p>
        <p>Let a_n denote the classification algorithm selected (fixed) at the n-th step of the procedure of building the ACT model (to generate the appropriate GF). Clearly, this algorithm a_n corresponds to some scheme s_n, which consists of the algorithms a_1, a_2, ..., a_{n-1} and ends with this attribute (a vertex of the classification tree, i.e., of the ACT model). For example, Fig. 3 shows an ACT model in which a fixed scheme s_2 (a vertex of the classification tree under construction) corresponds to the sequence of steps (schemes) {s_1}, and the scheme s_M corresponds to the sequential path {s_1, s_2, ..., s_{M-1}}.</p>
        <p>Therefore, for the ACT model the following conclusion can be made: the scheme s_n (in the structure of the classification tree) divides the IS into n non-empty parts (subsets) that do not intersect, and since the primary IS contains M training pairs in total, the scheme s_M completely divides (approximates) the primary IS, that is, fully recognizes the sample under the condition that it generates one GF with the power of one unit at each step. So, if at every n-th step of the ACT construction scheme the generated GF (selected by the classification algorithm a_n) weakly separates the set of the primary IS, the process of classification tree construction converges to the primary IS and finishes in no more than M steps, where M is the quantity of all training pairs of the primary IS. At the following stage of research it is important to consider the case of strong separation of classes in the primary IS, when no restrictions are set on the algorithms a_i regarding GF generation (the power of a constructed GF is limited only by the practical possibilities of the classification algorithm a_i itself and the structural parameters of the IS). Let P(f_j) denote the total power (approximation ability) of the corresponding GF f_j, (1 ≤ j ≤ s), where s is the quantity of GF in the constructed ACT scheme.</p>
        <p>
          Further, at some step r, (1 ≤ r ≤ M), of the ACT scheme, a sequence of generalized features f_1, ..., f_r is constructed with their corresponding values P(f_i) = z_i, where (1 ≤ z_i ≤ M), (1 ≤ i ≤ r), M is the general power of the IS, and among them there are the values z_max and z_min, which are, respectively, the maximum and the minimum (relative to the current step of the ACT scheme). Then the ACT scheme (model) will be constructed in t steps, where the value t is determined by the ratio (
          <xref ref-type="bibr" rid="ref9">9</xref>
          ):
t = 2 · P(IS) / (z_max + z_min) ≤ 2M / (z_max + z_min). (
          <xref ref-type="bibr" rid="ref9">9</xref>
          )
        </p>
        <p>
          We notice that in the case when, by the conditions of the practical problem, power restrictions are imposed on the GF synthesized by the ACT scheme under construction (the power must not exceed an appropriate value P), the classification tree scheme (ACT model) will be built in t steps, where the value t is determined by the ratio (
          <xref ref-type="bibr" rid="ref10">10</xref>
          ):
t ≤ M / P. (
          <xref ref-type="bibr" rid="ref10">10</xref>
          )
        </p>
        <p>Under the strict restriction of the ACT scheme to one generated GF per step (where, by the condition, P(f_j) = 1, (1 ≤ j ≤ t)), that is, in the case of weak class separation of the current problem, the classification tree scheme (ACT model) will be built in t steps, where t = M.</p>
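<p>The three regimes can be compared numerically. The sketch below implements ratios (9) and (10) as stated above; the sample sizes and GF powers are hypothetical.</p>

```python
import math

def steps_strong(M, z_min, z_max):
    """Ratio (9): with GF powers between z_min and z_max, roughly
    t = 2*M / (z_max + z_min) steps (the average covering power)."""
    return math.ceil(2 * M / (z_max + z_min))

def steps_power_capped(M, P):
    """Ratio (10): with GF power capped at P, t = M / P steps."""
    return math.ceil(M / P)

def steps_weak(M):
    """Weak separation, one pair covered per GF: t = M steps."""
    return M
```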
        <p>At the next stage, for simplification, let us take the classification problems for which sets of LCT/ACT structures have been built in the studies [19-24,30]. The initial parameters of these practical problems are presented in Table 1. In the IS, the information is represented as a separation into classes; at the examination stage, the constructed classification system should effectively recognize objects of unknown classification with respect to these classes. Note that at the initial stage the training and test samples were automatically checked for correctness (searching for and deleting identical objects with different class assignments, i.e., errors of the first and second type).</p>
        <p>Table 2 presents the evaluation of the constructed classification tree structures (LCT/ACT) for the problems from Table 1. To assess the quality of the constructed classifiers (classification schemes) we used the integrated feature of classification tree quality Q_Main from the research [30].</p>
      </sec>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Initial parameters of the practical classification problems.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Problem</th>
              <th>Relation of objects of different classes in the IS (H1/H2/.../Hl)</th>
              <th>Total number of classes l by data splitting of the IS</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>The problem of geological data classification (Z1)</td><td>756/494</td><td>2</td></tr>
            <tr><td>The problem of chemical analysis of the quality of hydrocarbon fuels (Z2)</td><td>823/648/1412/918/583/764</td><td>6</td></tr>
            <tr><td>The problem of classification of flood situations in the Tisza river basin of the Transcarpathian region (Z3)</td><td>76/108/5934</td><td>3</td></tr>
            <tr><td>The problem of classification of flood situations in the Uzh river basin of the Transcarpathian region (observation post №1) (Z4)</td><td>73/102/4107</td><td>3</td></tr>
            <tr><td>The problem of classification of flood situations in the Uzh river basin of the Transcarpathian region (observation post №2) (Z5)</td><td>68/97/3974</td><td>3</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-9">
        <title>Convergence evaluation of the constructed LCT/ACT structures</title>
        <p>Convergence of the LCT/ACT structure synthesis procedure is estimated on the basis of quantitative features: the total number of iterations S_Main and the number of tiers L_Kol in the classification tree structure.</p>
        <p>Integrated feature of the classification tree quality QMain displays the basic parameters
(characteristics) of classification trees and can be used as an optimality criterion in the evaluation
procedure of an arbitrary tree-like recognition scheme [20]. Let us note that the main idea of
classification tree methods based on autonomous algorithms in their structure lies in a step-by-step
approximation by the selected set of algorithms the data set of primary IS [22].
 
Table 2 </p>
        <p>Comparative table of structure classification schemes LCT/ACT</p>
        <p>№ of  Method of  Integral feature of  Convergence  Number of 
problem  synthesis of  model quality  of the  tiers in 
classification  QMain   classification  structure 
tree structure   tree  LCT/ACT 
structure  LKol   
(number of 
iterations) 
Z1 
Z1 
Z2 
Z2 
Z3 
Z3 
Z4 
Z5 </p>
      </sec>
      <sec id="sec-5-10">
        <title>Synthesis methods compared in Table 2</title>
        <p>[Method labels recovered from Table 2: the method of complete LCT based on the selection of
elementary features; the LCT model with a one-time assessment of the importance of the features; the
limited LCT construction method; the algorithmic tree method (type I); the algorithmic tree method
(type II); the method of extensive selection of features (step-by-step assessment); the algorithm
tree (type I); the algorithm tree (type II).]</p>
      </sec>
      <sec id="sec-5-16">
        <title>Comparison of the constructed LCT/ACT structures</title>
        <p>The obtained classification tree structures (ACT/LCT models) are, on the one hand,
characterized by high versatility in terms of practical problems and a relatively compact structure of
the model itself; on the other hand, compared to the neural network concept they require significant
hardware costs for storing generalized features and for the initial assessment of the quality of the
fixed classification algorithms according to the IS data [31-35]. Therefore, in comparison with the
ACT concept, the LCT method has a high speed of classification, relatively insignificant hardware
costs for storage and operation of the tree structure itself, and high quality of discrete object
classification.</p>
        <p>Let us note that all ACT/LCT models and structures were built in the software application
"Orion III". Based on the classification tree method and the principle of modularity, Uzhhorod
National University developed this application to generate autonomous recognition systems. The
algorithmic library of the system contains 15 recognition algorithms, including algorithmic
implementations of both the LCT and ACT structures [23].</p>
        <p>From Table 2 we can see that the methods of algorithm trees (of the two types) have shown a
high rate of convergence when constructing the classification tree structure on the represented IS
compared to the LCT schemes. Note that the first type of the ACT structure shows a good result in
terms of the structural complexity (number of tiers, vertices, generalized features) of the
constructed classification model in comparison with the logical classification trees and the
algorithm tree of the second type. In general, we can conclude that ACT structures converge faster
than LCT models and, due to this, gain an advantage in the structural complexity of the constructed
classification tree and the informational capacity of the generalized feature sets.</p>
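The convergence features discussed above (the total iteration count and the number of tiers) can be illustrated with a minimal sketch: a toy greedy builder of a binary classification tree over one-dimensional objects that tracks both quantities while recursively splitting the training set. This is an illustrative simulation under assumed names (`build_tree`, with counters standing in for SMain and LKol), not the authors' LCT/ACT algorithms.

```python
def build_tree(data, depth=0, counters=None):
    """Toy greedy construction of a binary classification tree over 1-D
    objects (x, label), tracking the total number of construction steps
    (analogue of SMain) and the number of tiers (analogue of LKol).
    Assumes the classes are separable by thresholds on x."""
    if counters is None:
        counters = {"iterations": 0, "tiers": 0}
    counters["iterations"] += 1
    counters["tiers"] = max(counters["tiers"], depth + 1)
    labels = {y for _, y in data}
    if len(labels) == 1:                      # homogeneous node becomes a leaf
        return {"leaf": labels.pop()}, counters

    def errors(part):
        # majority-class misclassification count of one side of a split
        counts = {}
        for _, y in part:
            counts[y] = counts.get(y, 0) + 1
        return len(part) - max(counts.values())

    xs = sorted({x for x, _ in data})
    best = None
    # candidate thresholds halfway between consecutive distinct values
    for t in (0.5 * (a + b) for a, b in zip(xs, xs[1:])):
        left = [p for p in data if p[0] <= t]
        right = [p for p in data if p[0] > t]
        score = errors(left) + errors(right)
        if best is None or score < best[0]:
            best = (score, t, left, right)
    _, t, left, right = best
    left_sub, _ = build_tree(left, depth + 1, counters)
    right_sub, _ = build_tree(right, depth + 1, counters)
    return {"threshold": t, "left": left_sub, "right": right_sub}, counters
```

On a separable four-object sample the builder needs one split: three construction steps in total (one inner vertex plus two leaves) and two tiers.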
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion </title>
      <p>Therefore, taking into consideration all of the above, we can state the following
conclusions:</p>
      <p>Under the condition of weak class separation in the LCT case, if on every n-th step the
selected elementary feature only weakly separates the set (subset) of objects of the primary IS,
then the process of classification tree construction converges relative to the primary IS and
terminates in no more than m − 1 steps, where m is the quantity of all training pairs of the
primary IS.</p>
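This termination bound can be checked with a minimal sketch (an illustrative simulation under an assumed name `construction_steps`, not the authors' procedure): when every construction step separates at least one object from the current subset, a set of m training pairs is exhausted after at most m − 1 steps.

```python
def construction_steps(m):
    """Simulate the worst case of weak class separation: each step of
    tree construction peels exactly one object off the current subset,
    so construction halts once a single object remains."""
    remaining = m
    steps = 0
    while remaining > 1:      # a subset of one object needs no further split
        remaining -= 1        # the weakest useful split separates one object
        steps += 1
    return steps

# For any m the simulated procedure terminates in exactly m - 1 steps,
# matching the upper bound stated for the LCT scheme.
```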
      <p>Under the condition of strong class separation of the sets of objects of the primary IS, a
classification tree (of LCT structure) that has m full tiers (levels), i.e. the case when 2^(i−1)
vertices are located on the i-th tier, has 2^m − 1 vertices; thus array recognition of the primary
IS, under the condition y ≥ 1, by means of the full LCT takes place in no more than m steps, where
m is calculated by means of the expression m = R(log2(n+1) / (1 + log2 y)).</p>
      <p>The general number of all final vertices of the logical structure (leaves of the recognition
tree) of the constructed classification scheme unambiguously determines the final power of the
classification tree scheme (LCT/ACT models).</p>
      <p>
        The power of some GF (of the set of constructed GFs) for a fixed step of the ACT method scheme
is defined as the general quantity of training pairs (xi, fR(xi)) of the primary IS (a subset of the
primary IS) of form (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) that are approximated (correctly classified) by the given generalized feature
(sequence of generalized features).
      </p>
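A minimal sketch of this definition of GF power (hypothetical names: `gf` stands for a generalized feature taken as a predicate, `training_pairs` for the pairs (xi, fR(xi)) of the IS):

```python
def gf_power(gf, training_pairs):
    """Power of a generalized feature: the number of training pairs
    (x_i, f_R(x_i)) of the primary IS that gf classifies correctly."""
    return sum(1 for x, label in training_pairs if gf(x) == label)

# Example: a hypothetical threshold rule over 1-D objects that
# correctly classifies all four training pairs, so its power is 4.
pairs = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]
rule = lambda x: int(x > 0.5)
power = gf_power(rule, pairs)
```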
      <p>In the case of weak class separation of the primary IS for the ACT scheme, the process of
classification tree construction converges relative to the IS data set and terminates in no more
than M steps, where M is the quantity of all training pairs of the primary IS.</p>
      <p>
        In the case of strong class separation of the primary IS for the ACT scheme, when the power of
the constructed GF (or set of GFs) is limited only by the practical capability of the classification
algorithm ai itself and the initial parameters of the IS, the ACT scheme (model) will be constructed
in t steps, where the value t is defined by relation (
        <xref ref-type="bibr" rid="ref9">9</xref>
        ).
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments </title>
      <p>The paper was carried out within the framework of the research "Modeling and forecasting of
emergency situations in the Carpathian region and countries of Central and Eastern Europe", the state
registration number of the work is 0106V00285, the category of work is fundamental research
(ID2201020), 01 – Fundamental research on the most important problems of natural, social and
humanitarian sciences.</p>
    </sec>
    <sec id="sec-8">
      <title>8. References </title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <source>The Elements of Statistical Learning</source>
          , Berlin, Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.R.</given-names>
            <surname>Quinlan</surname>
          </string-name>
          ,
          <article-title>Induction of Decision Trees</article-title>
          ,
          <source>Machine Learning</source>
          <volume>1</volume>
          (
          <year>1986</year>
          )
          <fpage>81</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.H.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Olshen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Stone</surname>
          </string-name>
          ,
          <source>Classification and Regression Trees</source>
          , Boca Raton, Chapman and Hall/CRC,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lupei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mitsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Repariuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharkan</surname>
          </string-name>
          ,
          <article-title>Identification of authorship of Ukrainian-language texts of journalistic style using neural networks</article-title>
          ,
          <source>Eastern-European Journal of Enterprise Technologies</source>
          <volume>1</volume>
          (
          <issue>2</issue>
          (
          <issue>103</issue>
          )) (
          <year>2020</year>
          )
          <fpage>30</fpage>
          -
          <lpage>36</lpage>
          . doi: 10.15587/1729-4061.2020.195041.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jafari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>Intelligent Control for Unmanned Aerial Systems with System Uncertainties and Disturbances Using Artificial Neural Network</article-title>
          ,
          <source>Drones</source>
          <volume>3</volume>
          (
          <year>2018</year>
          )
          <fpage>24</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Miyakawa</surname>
          </string-name>
          ,
          <article-title>Criteria for selecting a variable in the construction of efficient decision trees</article-title>
          ,
          <source>IEEE Transactions on Computers</source>
          <volume>38</volume>
          (
          <issue>1</issue>
          ) (
          <year>1989</year>
          )
          <fpage>130</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Koskimaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Juutilainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Laurinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Roning</surname>
          </string-name>
          ,
          <article-title>Two-level clustering approach to training data instance selection: a case study for the steel industry</article-title>
          ,
          <source>in: Proceedings of the International Joint Conference on Neural Networks, IJCNN 2008, part of the IEEE World Congress on Computational Intelligence</source>
          , IEEE, Los Alamitos. (
          <year>2008</year>
          )
          <fpage>3044</fpage>
          -
          <lpage>3049</lpage>
          . doi: 10.1109/ijcnn.2008.4634228.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.F.</given-names>
            <surname>Jaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Anshari</surname>
          </string-name>
          ,
          <article-title>Facebook as marketing tools for organizations: Knowledge management analysis, in: Dynamic perspectives on globalization and sustainable business in Asia</article-title>
          , IGI Global, Hershey. (
          <year>2019</year>
          )
          <fpage>92</fpage>
          -
          <lpage>105</lpage>
          . doi: 10.4018/978-1-5225-7095-0.ch007.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Sirichotedumrong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Maekawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kinoshita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kiya</surname>
          </string-name>
          .
          <article-title>Privacy-preserving deep neural networks with pixel-based image encryption considering data augmentation in the encrypted domain</article-title>
          ,
          <source>in: Proceedings of the 26th IEEE International Conference on Image Processing, ICIP</source>
          <year>2019</year>
          , IEEE, Los Alamitos. (
          <year>2019</year>
          )
          <fpage>674</fpage>
          -
          <lpage>678</lpage>
          . doi: 10.48550/arXiv.1905.01827.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.L.</given-names>
            <surname>De Mántaras</surname>
          </string-name>
          ,
          <article-title>A distance-based attribute selection measure for decision tree induction</article-title>
          ,
          <source>Machine Learning</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          )
          (
          <year>1991</year>
          )
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <article-title>Generation and Interpretation of Temporal Decision Rules</article-title>
          ,
          <source>International Journal of Computer Information Systems and Industrial Management Applications</source>
          <volume>3</volume>
          (
          <year>2011</year>
          )
          <fpage>314</fpage>
          -
          <lpage>323</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kamiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakubczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szufel</surname>
          </string-name>
          ,
          <article-title>A framework for sensitivity analysis of decision trees</article-title>
          ,
          <source>Central European Journal of Operations Research</source>
          <volume>26</volume>
          (
          <issue>1</issue>
          ) (
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Runger</surname>
          </string-name>
          , E. Tuv,
          <article-title>Bias of importance measures for multi-valued attributes and solutions</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Artificial Neural Networks</source>
          , volume
          <volume>2</volume>
          <source>of ICANN 2011</source>
          , Springer-Verlag, Berlin. (
          <year>2011</year>
          )
          <fpage>293</fpage>
          -
          <lpage>300</lpage>
          . doi: 10.1007/978-3-642-21738-8_38.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.A.</given-names>
            <surname>Subbotin</surname>
          </string-name>
          ,
          <article-title>Construction of decision trees for the case of low-information features</article-title>
          ,
          <source>Radio Electronics, Computer Science, Control</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>121</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jafari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          .
          <article-title>Intelligent Control for Unmanned Aerial Systems with System Uncertainties and Disturbances Using Artificial Neural Network</article-title>
          .
          <source>Drones</source>
          .
          <volume>3</volume>
          (
          <year>2018</year>
          )
          <fpage>24</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Painsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rosset</surname>
          </string-name>
          ,
          <article-title>Cross-validated variable selection in tree-based methods improves predictive performance</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>39</volume>
          (
          <issue>11</issue>
          ) (
          <year>2017</year>
          )
          <fpage>2142</fpage>
          -
          <lpage>2153</lpage>
          . doi: 10.1109/tpami.2016.2636831.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Imamovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Babovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bijedic</surname>
          </string-name>
          ,
          <article-title>Prediction of mortality in patients with cardiovascular disease using data mining methods</article-title>
          ,
          <source>in: Proceedings of the 19th International Symposium INFOTEHJAHORINA, INFOTEH</source>
          <year>2020</year>
          , IEEE, Los Alamitos. (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.B.</given-names>
            <surname>Kotsiantis</surname>
          </string-name>
          ,
          <article-title>Supervised Machine Learning: A Review of Classification Techniques</article-title>
          ,
          <source>Informatica</source>
          <volume>31</volume>
          (
          <year>2007</year>
          )
          <fpage>249</fpage>
          -
          <lpage>268</lpage>
          .
      </ref>
    </ref-list>
  </back>
</article>