                         The modified algorithm tree method in the geological
                         data classification problem
                         Igor Povkhan 1 , Oksana Mulesa 1 , Olena Melnyk 1 , Vasyl Morokhovych 1
                         1 Uzhhorod National University, Zankoveckoy str., 89B, Uzhhorod, 88000, Ukraine



                                         Abstract
                                         This study presents the development of an advanced algorithmic tree synthesis method predicated on
                                         a set configuration of initial data for the task of geological data recognition. The devised classification
                                         tree algorithm of the second type demonstrates precise classification of the complete training dataset,
adhering to the established classification schema. It offers high interpretability and a straightforward
structure, and incorporates autonomous classification and recognition algorithms as vertices
of the graph scheme. The refined construction methodology for the tree algorithm
                                         facilitates handling substantial volumes of discrete data across diverse categories, ensuring remarkable
                                         accuracy of the classification schema. Moreover, it judiciously utilises hardware resources during the
                                         creation of the definitive classification schema and supports the development of models with specified
                                         accuracy levels. The paper advocates a novel synthesis approach for recognition algorithms, drawing on
                                         a repository of extant algorithms and theoretical recognition methods. Employing the proposed second
                                         type tree algorithm, a suite of models has been constructed that adeptly classifies extensive arrays of
                                         geological data. The constructed models of classification trees have verified the absence of errors in both
                                         training and testing datasets, substantiating the efficacy of the second type tree method algorithm.

                                         Keywords
Algorithmic tree, classifier, pattern recognition, feature, initial sample.


                         1. Introduction
                             Classification and image recognition represent critical problem domains within the sphere of
                         artificial intelligence, notable for their extensive diversity, varying degrees of structural
                         complexity, and significant applicability across numerous sectors of human economic and social
                         endeavours. In disciplines such as geology, where the challenges of classification are tackled
                         through sophisticated information systems, the importance and intensity of research in this area
                         are well-documented [1-10]. These classification challenges demand the development and
                         decomposition of mathematical models tailored to the specific systems under study. Presently,
                         the field of artificial intelligence lacks a universally applicable approach capable of addressing the
                         full spectrum of these complex problems. However, several broadly applicable theories and
                         methodologies have emerged, with neural networks being particularly prominent due to their
                         versatility in addressing a wide array of classification challenges [11-14]. In practical scenarios,
                         specifically configured artificial neural networks often outperform traditional algorithms and
                         established decision tree models, such as gradient boosting methods, especially in tasks involving
unstructured data, discrete image sets, or textual content. Conversely, when dealing with
structured datasets comprising large volumes of discrete data with diverse feature spaces,
decision tree-based methods and algorithms exhibit distinct advantages [15].
                         Generally, classification tree methodologies facilitate effective data processing across various
                         magnitudes, presenting the input information in its inherent form. Numerous contemporary
                         strategies and concepts are focused on developing recognition systems (RS) and classifications
                         using logical/algorithmic classification tree models (LCT/ACT structures). The growing interest
                         in tree-like graph-schematic representations of classifiers is driven by their numerous


                         CMIS-2024: Seventh International Workshop on Computer Modeling and Intelligent Systems, May 3, 2024,
                         Zaporizhzhia, Ukraine
                            igor.povkhan@uzhnu.edu.ua (I. Povkhan); oksana.mulesa@uzhnu.edu.ua (O. Mulesa);
                         olena.melnyk@uzhnu.edu.ua (O. Melnyk); morv77@ukr.net (V. Morokhovych)
                            0000-0002-1681-3466 (I. Povkhan); 0000-0002-6117-5846 (O. Mulesa); 0000-0001-7340-8451 (O. Melnyk);
                         0000-0002-4939-6566 (V. Morokhovych)
© 2024 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

advantageous properties [16]. One promising area of application for the classification tree model,
specifically within the realm of algorithmic trees, is the classification of geological information
[22].


2. Formal problem statement
Let $H_1, H_2, \dots, H_k$ be the system of classes (images) defined on the set $G$ consisting of objects $x_i$, $(i = 1, \dots, m)$. The division of the set $G$ into the corresponding classes is specified by the following training sample (TS):

$$((x_1, f_R(x_1)), (x_2, f_R(x_2)), \dots, (x_m, f_R(x_m))). \qquad (1)$$

Note that here $x_h \in G$, $f_R(x) \in \{0, 1, \dots, k-1\}$, $(h = 1, 2, \dots, m)$, $k$ is the number of TS classes, $m$ is the total number of TS objects, and $f_R(x)$ is some finite-valued function that determines the division of the set $G$ into the corresponding images. The relation $f_R(x_h) = l$, $(l = 0, 1, \dots, k-1)$, means that $x_h \in H_l$. Note that each TS of the form (1) can be associated (with the help of some algorithm or representation method) with a fully defined LCT, which matches the objects $x_h$ of TS (1), $(h = 1, \dots, m)$, with the values of the function $f_R(x_h)$ that specifies the partition $R$ on the set $G$. Therefore, the task is to build a classification tree structure (LCT/ACT) that is optimal, $f_R(x_j) \to \mathrm{opt}$, with respect to the initial TS data.
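To make the notation concrete, the following minimal Python sketch models a TS of the form (1). It is an illustrative assumption of this rewrite, not part of the authors' software; the names `TrainingPair` and `make_training_sample` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical representation of the training sample (1):
# a sequence of pairs (x_h, f_R(x_h)) with labels in {0, ..., k-1}.

@dataclass
class TrainingPair:
    x: Sequence[float]   # feature description of the object x_h
    label: int           # f_R(x_h), the class (image) index

def make_training_sample(objects: List[Sequence[float]],
                         f_R: Callable[[Sequence[float]], int]) -> List["TrainingPair"]:
    """Build the TS of form (1) from the objects of G and the partition function f_R."""
    return [TrainingPair(x, f_R(x)) for x in objects]
```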

3. Literature review
   The current study delves into the theory of fixed-type decision trees, focusing on algorithm
trees and the classification of discrete objects [14, 23, 25]. Notably, research [20] underscores
that the classification rules and decision schemes, derived from any branching feature selection
method or algorithm, manifest a tree-like logical structure. A typical decision tree classifier
comprises an organized sequence of nodes, features, and attributes structured into layers or
levels, each established during a specific phase of the classification tree synthesis [15].
A significant challenge identified in [18] is the effective construction of recognition tree
structures, which can take the form of tree-like structures or algorithm graphs (ACT structures).
Consequently, decision tree methodologies facilitate the creation of innovative classifiers based
on a modular principle, utilizing well-known recognition algorithms [19-21]. The study [14]
explores fundamental issues related to the generation of decision tree structures, particularly
when features are low-informative, including their sets and combinations. Within the sphere of
intelligent data analysis, the invariant capacity of LCT/ACT structures to execute one-
dimensional branching allows for the analysis of the influence, importance, and quality of
individual variables. This capability is essential for managing different types of variables as
predicate sets. The persistent challenge with decision tree methods and structures is evaluating
the quality and efficiency of the branches (generalized features) that serve as autonomous
classification algorithms [15]. Logical decision tree classification methods are prevalently
employed in intelligent data analysis, aiming to synthesize operational models that predict the
value of a target variable based on an initial set of data formatted as a structured training sample
[19]. From an applied perspective, numerous methods and algorithms grounded in the decision
tree concept are utilized for classification tasks; however, C4.5/C5.0 and CART have emerged as
particularly popular. The C4.5/C5.0 methods employ a theoretical-informational criterion for
node or vertex selection, whereas the CART algorithm relies on the Gini index, which assesses the
relative distances between class distributions within the metric of the training sample [20, 21].
The set of methods and algorithms for branching feature selection (ACT structures) is based on
optimally approximating the initial training set using a ranked series of classification algorithms
[22]. A key issue within LCT/ACT methods, as discussed in [23], involves choosing an effective
branching criterionβ€”that is, selecting nodes, attributes, and features of discrete objects for LCT
schemes and algorithms for ACT. These foundational issues are thoroughly examined in another
paper [24], which addresses the qualitative evaluation and informativeness of individual discrete
features, their sets, and fixed combinations, ultimately supporting the efficient implementation
of a branching mechanism within the logical/algorithmic tree structure. Concerns regarding the
convergence of the classification tree construction process, including the selection of stopping
criteria for the synthesis of logical and algorithmic trees, remain significant [25]. The concept of
classification trees accommodates the use of not only individual attributes and object features
but also their combinations and sets as features, attributes, and nodes of the recognition tree
structure. By adopting independent individual recognition algorithms (evaluated using training
data) instead of object attributes as branches, a novel ACT structure is realized [21-24]. This
research specifically targets the exploration of fixed-type ACT structures within the practical
domain.


4. The general second type tree method algorithm
    Let the initial TS of the general form (1) be given as a sequence of training pairs of known classification (of power $m$), together with some system (set) of independent and autonomous recognition (classification) algorithms $\alpha_1(x), \alpha_2(x), \dots, \alpha_n(x)$ for the initial TS. Next, we introduce the following sets, which represent the breakdown of the TS data by the corresponding classification algorithms $\alpha_i$:

$$G_{a_1,\dots,a_i} = \{x \in G \mid \alpha_i(x) = 1\}, \quad (i = 1, \dots, n). \qquad (2)$$

    Note that, to simplify the explanations, each autonomous classification algorithm $\alpha_i(x)$ generates output values only within the binary set $\{0,1\}$: in particular, $\alpha_i(x) = 1$ in the case of successful classification of the object $x$, and $\alpha_i(x) = 0$ in the opposite case.
    Note that the system of sets $G_{a_1,\dots,a_i}$ represents a complete step-by-step division of the set $G$ (as the number $i$ of involved classification algorithms grows), implemented by the independent algorithms $\alpha_1, \alpha_2, \dots, \alpha_n$. Depending on the initial selection of the set of classification algorithms $\alpha_1, \alpha_2, \dots, \alpha_n$, some of the sets $G_{a_1,\dots,a_i}$ may be empty (in case one or more algorithms are not suitable for approximating the current TS) [21].
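As an illustration of how the sets (2) can be computed, here is a minimal Python sketch; each autonomous algorithm is modelled as a hypothetical binary predicate, an assumption of this sketch rather than the paper's implementation.

```python
from typing import Callable, List, Sequence

# An autonomous classification algorithm alpha_i(x) with outputs in {0, 1}.
Classifier = Callable[[Sequence[float]], int]

def stepwise_partition(G: List[Sequence[float]],
                       algorithms: List[Classifier]) -> List[List[Sequence[float]]]:
    """Return the system of sets G_{a_1,...,a_i}, i = 1, ..., n, per formula (2).
    Some of the returned sets may be empty if an algorithm accepts no objects."""
    return [[x for x in G if alpha(x) == 1] for alpha in algorithms]
```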
    At the next stage, we denote by $S_{a_1,\dots,a_i}$ the number of those training pairs $(x_s, f_R(x_s))$, $(1 \le s \le m)$, in the initial TS which satisfy the basic membership condition $x_s \in G_{a_1,\dots,a_i}$.
    Accordingly, by $S^j_{a_1,\dots,a_i}$, $(j = 0, 1, \dots, k-1)$, we denote the number of those pairs $(x_s, f_R(x_s))$, $(s = 1, 2, \dots, m)$, in the TS which satisfy the conditions $x_s \in G_{a_1,\dots,a_i}$ and $f_R(x_s) = j$.
    Taking the above into account, and by analogy with the methods of selection of sets of elementary features, the following values can be introduced, which serve as the branching criterion in the structure of the ACT:

$$\delta_{a_1,\dots,a_i} = \frac{S_{a_1,\dots,a_i}}{m}, \quad \psi^j_{a_1,\dots,a_i} = \frac{S^j_{a_1,\dots,a_i}}{S_{a_1,\dots,a_i}}, \quad \rho_{a_1,\dots,a_i} = \max_j \psi^j_{a_1,\dots,a_i}. \qquad (3)$$

    Note that if $x_s \notin G_{a_1,\dots,a_i}$ for all $s = 1, \dots, m$, then it is clear that $\delta_{a_1,\dots,a_i} = 0$ and $\psi^j_{a_1,\dots,a_i} = 0$ for $j = 0, 1, \dots, k-1$.
    In particular, the quantity $\delta_{a_1,\dots,a_i}$ characterizes the frequency of occurrence of members of the sequence $x_1, x_2, \dots, x_m$ (discrete objects) in the set $G_{a_1,\dots,a_i}$, while the quantity $\psi^j_{a_1,\dots,a_i}$ characterizes the frequency with which an object $x$ belongs to the image (class) $H_j$, provided that $x \in G_{a_1,\dots,a_i}$. Note that this condition is equivalent to the existence, in the sequence of algorithms $a_1, \dots, a_i$, of an algorithm $a_y$ such that $a_y(x) = 1$. The value $\rho_{a_1,\dots,a_i}$ then characterizes the information efficiency of recognizing the belonging of an object $x$ to one of the classes $H_0, H_1, \dots, H_{k-1}$, provided that $x \in G_{a_1,\dots,a_i}$.
    At the next stage, a fundamental question again arises regarding the belonging of the object $x$ to the classes $H_0, H_1, \dots, H_{k-1}$ (the question of forming a classification rule). It is clear that the object $x$ should be assigned to the class $H_j$ for which the following simple relation is fulfilled:

$$\rho_{a_1,\dots,a_i} = \psi^j_{a_1,\dots,a_i}. \qquad (4)$$

    Note that here $0 \le j \le k-1$; relation (4) represents a classification rule, and it is clear that the greater the value of $\rho_{a_1,\dots,a_i}$, the higher the effectiveness of the rule.
    Since the only information representing the partitioning into the images $H_0, H_1, \dots, H_{k-1}$ is the initial TS, the class $H_j$ is understood as the set of all training pairs $(x_s, f_R(x_s))$ of the TS that satisfy the membership relation $f_R(x_s) = j$.
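A minimal Python sketch of the branching criterion (3) and the classification rule (4) follows, reusing the hypothetical `TrainingPair` type from the sketch in Section 2; `members` is assumed to hold the TS pairs whose objects fall into $G_{a_1,\dots,a_i}$.

```python
from typing import List, Tuple

def branching_criterion(members: List["TrainingPair"],
                        m: int, k: int) -> Tuple[float, List[float], float, int]:
    """Compute delta, psi^j, rho of formula (3) and the class given by rule (4)."""
    S = len(members)                                  # S_{a_1,...,a_i}
    S_j = [sum(1 for p in members if p.label == j)    # S^j_{a_1,...,a_i}
           for j in range(k)]
    delta = S / m                                     # occurrence frequency
    psi = [(s / S if S > 0 else 0.0) for s in S_j]    # per-class frequencies
    rho = max(psi)                                    # max_j psi^j
    best_class = psi.index(rho)                       # rule (4): j with psi^j = rho
    return delta, psi, rho, best_class
```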
    It is clear that the algorithmic tree is not the only possible construction (structure) of a classification algorithm that can be organized in the form of a tree-like recognition model (several types of such structures can be proposed). Next, we propose one scheme for organizing a set of classification and recognition algorithms $(\alpha_1, \alpha_2, \dots, \alpha_m)$ in the form of an ACT model, which we will call an algorithmic classification tree of the second type.
    Note that a set of autonomous classification and recognition algorithms $(\alpha_1, \alpha_2, \dots, \alpha_m)$ can act as a set of primary features (attributes) for an arbitrary discrete object $x_i$ of some initial TS of the general form (1). Moreover, for a fixed discrete object $x_i$ of the initial TS, this ACT scheme requires information about the generalized feature (GF) built by the current classification algorithm, as well as information about whether this discrete object can be recognized at all (failure, incorrect classification, impossibility of the GF to describe this object, etc.).
    Therefore, let each training pair $(x_i, f_R(x_i))$ of the TS correspond to a training pair of the following form:

$$(x_i(\varphi(\alpha_1), \varphi(\alpha_2), \dots, \varphi(\alpha_m)), f_R(x_i)), \quad \text{where } \varphi(\alpha_j) \in \{0,1\}. \qquad (5)$$

    Here $\varphi(\alpha_j) = 1$ if this discrete object is approximated by some GF $f_l$ built by the algorithm $\alpha_j$ of the set $(\alpha_1, \alpha_2, \dots, \alpha_m)$ at the corresponding stage of ACT generation. Similarly, $\varphi(\alpha_j) = 0$ if for the given discrete object the algorithm $\alpha_j$ did not build a suitable GF (one that would ensure its approximation and classification); this situation also includes failures and classification errors (errors of the first and second kind).
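The encoding (5) can be sketched as follows; `approximates` is a hypothetical stand-in for the GF check described above, since the paper does not fix its implementation.

```python
from typing import List, Sequence

def approximates(alpha: "Classifier", x: Sequence[float], label: int) -> bool:
    """Hypothetical GF check: here success simply means alpha accepts x; a full
    implementation would also verify that the GF assigns x to its true class `label`."""
    return alpha(x) == 1

def encode_object(x: Sequence[float], label: int,
                  algorithms: List["Classifier"]) -> List[int]:
    """Map a training pair (x, f_R(x)) to the vector (phi(alpha_1), ..., phi(alpha_m))."""
    return [1 if approximates(alpha, x, label) else 0 for alpha in algorithms]
```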
    By an algorithmic tree of the second type we will understand a tree-like construction, whose general view is presented in Fig. 1, at the vertices of which there are appropriate labels (the classification and recognition algorithms $\alpha_j$, as well as the sets of GFs that they generate at a specific step of the ACT construction procedure). Note that the logical tree of this construction belongs to the class of regular logical trees of full complexity (this logical tree is equivalent to a logical function of four arguments, each taking values from the set $\{0,1\}$).
    The following basic ACT scheme for synthesizing a tree of algorithms of the second type, based on a branched selection of generalized features, allows us to build ACT structures of arbitrary complexity and efficiency (Fig. 1).
    Stage of initial selection and evaluation of independent classification algorithms. At the initial stage, it is necessary to select and evaluate the basic (fixed) set of classification and recognition algorithms $(\alpha_1, \alpha_2, \dots, \alpha_m)$ from the initial algorithm library. Note that this procedure is performed based on a selected (fixed) performance criterion, followed by ranking, carried out either interactively or at random. The performance criterion may vary depending on the type of ACT structure being built and cannot be changed during the classification tree synthesis process. The set of autonomous algorithms $(\alpha_1, \alpha_2, \dots, \alpha_m)$, as well as their total number, is selected depending on the applied aspects of the problem, and can even be selected on the basis of an exhaustive search of the algorithm library (of course, with significant expenditure of hardware resources and processor time). At the initial stage of synthesis of the second type ACT model, the final structural complexity of the algorithm tree can be controlled by selecting (ranking) the set of classification algorithms and their total number.
    Stage of synthesis of the algorithm tree structure and generalized features. At the next stage, the central task is to build a complete regular classification tree (a fixed LCT structure), where the corresponding tiers of the structure contain the classification algorithms $(\alpha_1, \alpha_2, \dots, \alpha_m)$ fixed at the first stage.
    A special feature of the algorithm tree of the second type is that in the constructed classification tree structure (LCT structure) each vertex has two transitions to the next level, denoted by a value from the binary set $\{0,1\}$. This is why the structure of the algorithm tree is represented using a regular LCT construct. Accordingly, all attributes (labels) of the same type (classification algorithms and the generalized features they generate) are located at each level of this structure. In such a regular classification tree structure, the nodes are the independent algorithms (classifiers) $(\alpha_1, \alpha_2, \dots, \alpha_m)$. Sets of generalized features (GFs) $f_j$ are also generated during the synthesis step of the algorithm tree structure. Therefore, we can conclude that the algorithm tree generates a tree of generalized features.
    The idea of the second stage of synthesis of the algorithm tree structure (the ACT type II model) is the procedure for synthesizing a set of generalized features $f_j$ (vertices of the generalized feature tree) based on the pre-selected set of independent classification and recognition algorithms $\alpha_i$. Note that the total number of GFs $f_j$ generated by the corresponding classification algorithm depends on the initial parameters of the ACT model and the synthesis parameters, the specifics of the application problem, and the resource constraints of the classification tree synthesis system.




    Figure 1: The general block diagram of the second type tree method

    At the end of the second stage, after the formation of the set of synthesized generalized features $f_j$ for the given application problem is completed, they are placed in the corresponding nodes and tiers of the tree of algorithms of the second type (the structure of the tree of generalized features is constructed).
    Stage of checking the constructed ACT structure. At the final stage of synthesizing the second type algorithm tree, the constructed ACT model must be verified. For each element (object) of the test sample, the corresponding values of $\varphi(\alpha_j)$ are calculated. These values are calculated on the basis of the set of previously constructed generalized features, for each node of the corresponding tree level. The constructed generalized features define the corresponding route (bounded classifier) in the structure of the tree of algorithms of the second type. In such a GF structure, each node of the algorithm tree, in the event of a successful approximation of an object of unknown classification, increments the counter of the class belonging to it, and leaves it unchanged in the event of a classification error or failure. This procedure allows a final assessment of the effectiveness of the constructed tree of algorithms of the second type.
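The verification walk can be illustrated with a short Python sketch; `Node`, `gf_accepts`, and the voting scheme are assumptions of this sketch that follow the description above, not the Orion implementation.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence

def gf_accepts(gf: Callable[[Sequence[float]], int], x: Sequence[float]) -> bool:
    """Hypothetical predicate: does the generalized feature gf approximate x?"""
    return gf(x) == 1

class Node:
    """A vertex of the second type ACT: a GF, the class it votes for,
    and the two {0, 1} transitions to the next tier."""
    def __init__(self, gf, cls: int, children: Optional[Dict[int, "Node"]] = None):
        self.gf = gf
        self.cls = cls
        self.children = children if children is not None else {}

def classify(root: Optional[Node], x: Sequence[float]) -> int:
    votes: Counter = Counter()
    node = root
    while node is not None:
        bit = 1 if gf_accepts(node.gf, x) else 0   # the phi value at this vertex
        if bit == 1:
            votes[node.cls] += 1                   # successful approximation: vote
        node = node.children.get(bit)              # follow the {0, 1} transition
    return votes.most_common(1)[0][0] if votes else -1   # -1: classification failure
```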

5. Experiments and results
    The experimental validation of the proposed second type algorithm tree construction scheme
underscores its capability to tune the complexity and accuracy of the resulting classification tree
model. The model comprises various autonomous classification algorithms which, during the
modeling process, evolve into a hierarchical structure of generalized features. The selection of an
optimal model from the array of constructed Algorithmic Classification Trees (ACTs) for a specific
task hinges on evaluating multiple parameters and the effectiveness of the model, which is
typically assessed through techniques such as cross-validation against the training set (TS) data.
An essential stage in this process involves identifying the most critical parameters of the model,
such as the feature space size, the number of vertices, transitions, and algorithms. This step is
crucial for estimating the ACT's error relative to the input data set, facilitating comparison, and
aiding in the selection of a specific ACT model from the pre-defined ensemble.
    Quality criteria of the constructed algorithm trees are paramount and depend on several
factors including model error, the robustness of the initial TS data set, the size of the testing
sample, and the dimensional characteristics of the problem (e.g., the number of model
parameters). At the optimization stage of the constructed ACT model, priority is given to
minimizing errors across the training and test datasets for each class defined by the initial
conditions of the current applied problem.
    A significant ongoing challenge is reducing the complexity and structural pruning of the ACT
model. This reduction pertains to the overall count of functions (classifiers) and algorithms
within the ACT framework, the total number of vertices (generalized features), and the number
of transitions within the structure, as well as optimizing total memory usage and processing time
of the information system. Consequently, the defining measure of the quality and efficiency of a
constructed model, whether ACT or Logical Classification Trees (LCT), is determined by an overall
integral quality indicator:
                                                           πΈπ‘Ÿπ΄π‘™π‘™
                                              πΉπ‘Ÿπ΄π‘™π‘™        βˆ’
                                 π‘„π‘€π‘Žπ‘–π‘› = 𝑉              β‹… 𝑒 𝑀𝐴𝑙𝑙 .                               (6)
                                             𝐴𝑙𝑙 β‹…βˆ‘π‘– 𝑝𝑖
    Note that in formula (6) the set of parameters $p_i$ represents the most important characteristics of the evaluated classification tree:
    1) $Er_{All}$ – the total number of errors of the ACT model on the data arrays of the initial test and training samples;
    2) $M_{All}$ – the total capacity (volume) of the data arrays of the training and test samples;
    3) $Fr_{All}$ – the number of vertices of the obtained ACT model with resulting values $f_R$ (recognition functions, i.e. leaves of the classification tree);
    4) $V_{All}$ – the total number of vertices of all types in the structure of the ACT model;
    5) $O_{Uz}$ – the total number of generalized features used in the classification tree model;
    6) $P_{All}$ – the total number of transitions between vertices in the structure of the constructed classification tree model;
    7) $N_{Alg}$ – the total number of distinct autonomous classification algorithms $\alpha_i$ used in the classification tree model.
    Note that this integral indicator of ACT model quality takes values from zero to one: the smaller it is, the worse the quality of the constructed classification tree, and the larger it is, the better the resulting model.
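For clarity, formula (6) can be computed as in the following sketch; the argument values in the usage example are arbitrary toy numbers, not taken from the experiments below.

```python
import math
from typing import Sequence

def q_main(fr_all: int, v_all: int, p: Sequence[float],
           er_all: int, m_all: int) -> float:
    """Integral quality indicator (6):
    Q_Main = Fr_All / (V_All * sum(p_i)) * exp(-Er_All / M_All)."""
    return fr_all / (v_all * sum(p)) * math.exp(-er_all / m_all)

# Toy example (arbitrary numbers): 20 leaf vertices, 64 vertices in total,
# structural parameters p_i, 8 errors on 1609 training + test objects.
print(q_main(fr_all=20, v_all=64, p=[64, 41, 120, 5], er_all=8, m_all=1609))
```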
    The Orion software complex was developed at the Uzhhorod National University based on
classification tree methods to generate autonomous recognition systems. The algorithmic library
of the system includes 18 recognition algorithms, among which tree schemes of algorithms of
three types are implemented.
    The primary task on which the effectiveness of the algorithm tree methods was tested was the recognition of geological data, namely the task of separating oil-bearing and water-bearing strata. The initial parameters of this applied problem of geological data classification are presented in Table 1.
    The TS presents information about objects of two classes. At the examination stage, the constructed classification system should effectively recognize objects of unknown classification relative to these two classes. Before starting work, the training sample was automatically checked for correctness (detection and removal of errors of the first kind). The system implements a retraining and error correction scheme in the classification tree (the REC algorithm).
    The training sample of the presented problem consisted of 1342 objects, of which 761 were oil-bearing objects. The effectiveness of the constructed ACT model was evaluated on a test sample of 267 objects. The training and test data were obtained from geological exploration in the territory of the Transcarpathian region in the period from 2001 to 2019. A fragment of the main results of these experiments, the constructed LCT/ACT models of various types, is presented in Table 2.

Table 1
Initial parameters of the classification problem

Description of classes H_i | Dimension of the feature space N | Power of the primary data array IS, M | Total number of classes by data splitting IS, l | Ratio of objects of different classes, H_i/M
Oil-bearing layers (H_1) | (12/10) | 1342 | 2 | 761 / 1342
Aquifers (H_2) | (12/10) | 1342 | 2 | 581 / 1342

   Table 3 presents information on the classification models' generation time, the total number of vertices, and the numbers of elementary and generalized features, measured on the basic hardware configuration (Intel i7-12700H). All constructed classification tree schemes (LCT/ACT structures) provided the level of accuracy, speed, and working memory consumption required by the task conditions.

Table 2
Comparison table of built ACT/LCT models for classification of geological data

Classification tree model No. | Method of synthesis of the classification tree structure | Integral indicator of model quality Q_Main | Overall indicator of structural complexity of the classification tree model S_Main | Number of errors and failures of the LCT/ACT model on the data set Er_All
No. 1 | Full LCT method based on the selection of elementary features (extensive feature selection) | 0.004786 | 121 | 7
No. 2 | LCT method with a one-time assessment of feature importance | 0.002271 | 144 | 12
No. 3 | Limited method of LCT construction | 0.003193 | 97 | 16
No. 4 | Algorithmic tree method (type I) | 0.005287 | 52 | 10
No. 5 | Algorithmic tree method (type II) | 0.003033 | 64 | 8
No. 6 | Limited method of building ACT | 0.002654 | 55 | 14
No. 7 | Algorithm tree based on hyperspheres | 0.007221 | 31 | 6
No. 8 | Tree of algorithms based on hyperparallelepipeds | 0.004418 | 54 | 19
No. 9 | Tree of algorithms based on hyperellipses | 0.006476 | 30 | 8
No. 10 | Algorithm tree based on hypercubes | 0.006251 | 37 | 11

Table 3
General structural parameters of the constructed models of LCT/ACT

Parameter | No. 1 | No. 2 | No. 3 | No. 4 | No. 5 | No. 6 | No. 7 | No. 8 | No. 9 | No. 10
Total time of classification tree synthesis T_All (s) | 34 | 21 | 18 | 65 | 82 | 55 | 47 | 56 | 50 | 98
Number of tiers of the LCT/ACT structure R_All | 12 | 10 | 9 | 26 | 30 | 23 | 21 | 24 | 22 | 34
Total number of attributes / vertices of the LCT/ACT structure V_All | 102 | 91 | 86 | 234 | 244 | 212 | 198 | 223 | 207 | 219
Total number of elementary / generalized features in the classification tree structure O_el/O_Uz | 56 (el.) | 72 (el.) | 40 (el.) | 17 (g.) | 41 (g.) | 30 (g.) | 18 (g.) | 47 (g.) | 21 (g.) | 35 (g.)

   Therefore, the algorithmic tree classification method proposed in this paper (the second type ACT method) was compared with the full LCT method and the limited method of elementary feature selection, and showed a generally acceptable result.


6. Conclusion
   The developed models of classification trees (ACT/LCT structures) have successfully met the requirements for quality and speed in geological data classification schemes while maintaining a compact structure (parameter $S_{Main}$). The sets of independent classification algorithms selected for generating GF groups also demonstrated their effectiveness within the scope of this applied problem. Notably, the ACT models employing basic geometric classifiers proved the most effective upon critical evaluation. Furthermore, the composite ACT structures produced a relatively low number of classification errors on both the training and testing datasets. The full model of the second type ACT based on geometric classifiers showed promising results ($Q_{Main} = 0.003033$), largely due to the inclusion of a universal hypersphere algorithm in the scheme. In contrast, the structure of the first type ACT exhibited superior quality ($Q_{Main} = 0.005287$) compared to second type algorithm trees. This superiority is attributed to the more complex construction of the model ($S_{Main} = 52$), which, consequently, required longer generation times. However, it is essential to consider the limitations of the selected geometric classifiers, which may not always provide an effective approximation of the TS data. A notable drawback of the presented ACT models, identified during this task, is the relatively high time consumption at the synthesis stage of the classification tree models, especially when compared to the LCT structures. The time required to construct the first type ACT models, which include a step-by-step assessment of feature informativeness, was nearly 34% greater than that of the LCT.
   The scientific novelty lies in the fact that for the first time a modified method for constructing
algorithm trees based on evaluating and ranking a set of autonomous recognition algorithms for
generating a classification tree structure (ACT model) has been proposed.
   The practical implications of these findings are significant. The proposed method for
constructing ACT models (of the second type) enables the creation of economical and efficient
classification models with specified accuracy. This method has been integrated into the algorithm
library of the "ORION" system, addressing various applied classification challenges and
demonstrating a high degree of versatility across a range of applications. The efficacy of the
classification tree models and the associated software have been confirmed through practical
applications. Looking ahead, future research could focus on the further development of ACT
methods, including the introduction of new types and schemes of classification trees.
Additionally, optimizing the software implementations of the proposed ACT method and its
practical validation on a variety of precise classification and recognition tasks could provide
valuable insights and enhancements to the field.

Acknowledgements
   The presented work was carried out within the framework of projects: Expanding
Opportunities of High Technologies for Higher Education Institutions (Skills2Scale) - HEI
Initiative; Innovative teaching methods to support partnership relations - "InovEduc" - grant
project No. CBC01008 of the Norwegian State Fund with the solidarity budget of the Slovak
Republic within the framework of the SK08 cross-border cooperation program; Research work
"Modeling and prediction of emergency situations in the Carpathian region and the countries of
Central Eastern Europe", state registration number of the work - 0106V00285, work category -
fundamental research, 01 Fundamental research on the most important problems of natural,
social and humanitarian sciences.


References
[1] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer, Berlin,
     2008.
[2] J.R. Quinlan, Induction of Decision Trees, Machine Learning 1 (1986) 81-106.
[3] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and regression trees,
     Chapman and Hall/CRC, Boca Raton, 1984.
[4] A. Kintonova, M. Mussaif, G. Gabdreshov, Improvement of iris recognition technology for
     biometric identification of a person, Eastern-European Journal of Enterprise Technologies
     6(2(120)) (2023) 60-69. doi: 10.15587/1729-4061.2022.269948.
[5] Y.V. Bodyanskiy, A.Y. Shafronenko, I.P. Pliss, Credibilistic fuzzy clustering based on
     evolutionary method of crazy cats, System Research and Information Technologies 3 (2021)
     110-119. doi: 10.20535/SRIT.2308-8893.2021.3.09.
[6] M. Miyakawa, Criteria for selecting a variable in the construction of efficient decision trees,
     IEEE Transactions on Computers 38(1) (1989) 130-141.
[7] H. Koskimaki, I. Juutilainen, P. Laurinen, J. Roning, Two-level clustering approach to training
     data instance selection: a case study for the steel industry, in: Proceedings of the
     International Joint Conference on Neural Networks, IJCNN 2008, IEEE, Los Alamitos, 2008,
     pp. 3044-3049. doi: 10.1109/ijcnn.2008.4634228.
[8] S.F. Jaman, M. Anshari, Facebook as marketing tools for organizations: Knowledge
     management analysis, in: S.F. Jaman, M. Anshari, Dynamic perspectives on globalization and
     sustainable business in Asia, IGI Global, Hershey, 2019, pp. 92-105. doi: 10.4018/978-1-
     5225-7095-0.ch007.
[9] V.E. Strilets, S.I. Shmatkov, M.L. Ugryumov et al., Methods of machine learning in the
     problems of system analysis and decision making, Karazin Kharkiv National University,
     Kharkiv, 2020.
[10] R.L. De Mántaras, A distance-based attribute selection measure for decision tree induction,
     Machine Learning 6(1) (1991) 81-92.
[11] K. Karimi, H. Hamilton, Generation and Interpretation of Temporal Decision Rules,
     International Journal of Computer Information Systems and Industrial Management
     Applications 3 (2011) 314-323.
[12] B. Kamiński, M. Jakubczyk, P. Szufel, A framework for sensitivity analysis of decision trees,
     Central European Journal of Operations Research 26(1) (2017) 135-159.
[13] H. Deng, G. Runger, E. Tuv, Bias of importance measures for multi-valued attributes and
     solutions, in: Proceedings of the 21st International Conference on Artificial Neural Networks,
     volume 2 of ICANN 2011, Springer-Verlag, Berlin, 2011, pp. 293-300. doi: 10.1007/978-3-
     642-21738-8_38.
[14] S.A. Subbotin, Construction of decision trees for the case of low-information features, Radio
     Electronics, Computer Science, Control 1 (2019) 121-130. doi: 10.15588/1607-3274-2019-
     1-12.
[15] A. Shyshatskyi, Complex Methods of Processing Different Data in Intellectual Systems for
     Decision Support System, International Journal of Advanced Trends in Computer Science and
     Engineering 9(4) (2020) 5583-5590. doi: 10.30534/ijatcse/2020/206942020.
[16] A. Painsky, S. Rosset, Cross-validated variable selection in tree-based methods improves
     predictive performance, IEEE Transactions on Pattern Analysis and Machine Intelligence
     39(11) (2017) 2142-2153. doi:10.1109/tpami.2016.2636831.
[17] D. Imamovic, E. Babovic, N. Bijedic, Prediction of mortality in patients with cardiovascular
     disease using data mining methods, in: Proceedings of the 19th International Symposium
     INFOTEH-JAHORINA, INFOTEH 2020, IEEE, Los Alamitos, 2020, pp. 1-4.
     doi:10.1109/INFOTEH48170.2020.9066297.
[18] S.B. Kotsiantis, Supervised Machine Learning: A Review of Classification Techniques,
     Informatica, 31 (2007) 249-268.
[19] Y.I. Zhuravlev, V.V. Nikiforov, Recognition algorithms based on the calculation of estimates,
     Cybernetics 3 (1971) 1-11.
[20] I. Povkhan, O. Mulesa, O. Melnyk, Y. Bilak, V. Polishchuk, The Problem of Convergence of
     Classifiers Construction Procedure in the Schemes of Logical and Algorithmic Classification
     Trees, in: Proceedings of the Second International Workshop on Computer Modeling and
     Intelligent Systems, CMIS-2022, Volume 3137 of CEUR Workshop Proceedings, CEUR-WS,
     Zaporizhzhia, Ukraine, 2022, pp. 1-13.
[21] I. Povkhan, A constrained method of constructing the logic classification trees on the basis of
     elementary attribute selection, in: Proceedings of the Second International Workshop on
     Computer Modeling and Intelligent Systems, CMIS-2020, CEUR Workshop Proceedings,
     Volume 2608, 2020, pp. 843-857.
[22] I. Povkhan, M. Lupei, The algorithmic classification trees, in: Proceedings of the IEEE Third
     International Conference on Data Stream Mining & Processing, DSMP 2020, IEEE, Los
     Alamitos, 2020, pp. 37-44.
[23] I. Povkhan, M. Lupei, M. Kliap, V. Laver, The issue of efficient generation of generalized
     features in algorithmic classification tree methods, in: Proceedings of the International
     Conference on Data Stream Mining and Processing, DSMP 2020, IEEE, Los Alamitos, 2020,
     pp. 98-113.
[24] I. Povkhan, Classification models of flood-related events based on algorithmic trees, Eastern-
     European Journal of Enterprise Technologies, 6(4) (2020) 58-68. doi: 10.15587/1729-
     4061.2020.219525.
[25] J. Rabcan, V. Levashenko, E. Zaitseva, M. Kvassay, S. Subbotin, Application of Fuzzy Decision
     Tree for Signal Classification, IEEE Transactions on Industrial Informatics, 15(10) (2019)
     5425-5434. doi: 10.1109/tii.2019.2904845.