<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Predict the Survival of Kidney Transplants One Month after Transplantation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivan Izonin</string-name>
          <email>ivanizonin@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yaroslav Tolstyak</string-name>
          <email>tolstyakyaroslav@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rashkevych</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentyna Chopyak</string-name>
          <email>chopyakv@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veronika Kachmar</string-name>
          <email>veronika.kachmar.knm.2020@lpnu.ua</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Tkachenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariia</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Danylo Halytsky Lviv National Medical University</institution>
          ,
          <addr-line>Pekarska str., 69, Lviv, 79010</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lviv Polytechnic National University</institution>
          ,
          <addr-line>S. Bandera str., 12, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lviv Regional Clinical Hospital</institution>
          ,
          <addr-line>Chernihivska str., 7, Lviv, 79010</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The paper deals with the task of predicting the survival of kidney transplants one month after transplantation using artificial intelligence tools. The use of machine learning to solve it can provide additional information to the doctor for timely and correct diagnosis and the possibility of adjusting therapy schemes. The authors used a real-world dataset obtained from the Lviv Regional Clinical Hospital, Ukraine. Taking into account the imbalance of the dataset, the effectiveness of using different approaches to balancing was investigated in this paper. It has been established that the class weights approach provides the best results for solving the stated problem. The authors developed a new hybrid two-ML-based classifier based on the sequential use of two machine learning algorithms. The first of them, the Naive Bayes classifier, produces a set of probabilities of belonging to each of the defined classes of the task; these probabilities then replace all the initial attributes. The second machine learning method, the Random Forest algorithm, uses the new dataset of significantly reduced dimensionality to predict the result. The optimal parameters of both algorithms, which are the basis of the method, were selected. The efficiency of the method was compared with several known approaches. The highest accuracy of the hybrid two-ML-based classifier, its high generalization properties, and the satisfactory duration of the training procedure were determined experimentally. All this ensures the possibility of its use when solving the real problems of transplantation medicine.</p>
        <p>Keywords: small data approach, kidney transplant survival, classification, machine learning, class weights.</p>
        <p>IDDM'2023: 6th International Conference on Informatics &amp; Data-Driven Medicine, November 17-19, 2023, Bratislava, Slovakia.</p>
      </abstract>
      <kwd-group>
        <kwd>hybrid approach</kwd>
        <kwd>Random forest</kwd>
        <kwd>Naive Bayes</kwd>
        <kwd>medical diagnosis</kwd>
        <kwd>non-linear extensions</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The modern development of medicine is accompanied by the appearance of a large number of
complex diagnosis and prediction tasks, which require the use of artificial intelligence for their effective
solution [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Today, a wide range of existing artificial intelligence tools provides the possibility of
solving a large number of similar tasks.
      </p>
      <p>If we talk about the tasks of analyzing tabular datasets in medicine, a large number of factors
affect the effectiveness of the selected machine learning methods. First, there is the
multi-parameter nature of tabular datasets, which include attributes from clinical, laboratory, and other
types of research. This can reduce the generalization properties of a particular machine learning method.
In addition, such data contain a large number of complex nonlinear interconnections between the many
attributes of each data vector, which are very difficult for the clinician to detect.</p>
      <p>2023 Copyright for this paper by its authors. CEUR Workshop Proceedings (ceur-ws.org).</p>
      <p>
        Medical data can be noisy or contain errors for various reasons, such as improper collection or
registration [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. They contain many omissions and outliers. If we are talking about classification tasks,
then in the vast majority such data is unbalanced.
      </p>
      <p>All these factors significantly affect the effectiveness of intellectual data analysis and, as a result,
can lead to incorrect conclusions and diagnoses. In addition, if the medical dataset intended for analysis
is critically small, then the impact of all of the above increases many times.</p>
      <p>This paper aims to develop a hybrid two-ML-based classifier to improve the accuracy of medical
diagnostics in the conditions of processing an unbalanced multi-parameter short dataset. The applied
task solved in this paper consists of predicting the survival of kidney transplants one month after
transplantation.</p>
      <p>The significant scientific and practical results of this paper can be summarized as follows:
• We investigated the influence of the data balancing methods from different classes on the
accuracy of ML-based classifiers when solving the task of predicting the survival of kidney
transplants one month after transplantation in the conditions of processing an unbalanced short
dataset;
• We proposed a procedure for data pre-processing by the Naive Bayes classifier to obtain a set
of probabilities belonging to each data class and a procedure for replacing all independent initial
attributes with the found set of probabilities to increase the accuracy and speed of the next classifier;
• We developed a hybrid two-ML-based classifier based on the consistent use of the Naive Bayes
classifier and Random Forest algorithm to increase the classification accuracy of prediction of the
kidney transplant survival one month after transplantation;
• We selected the optimal parameters of the developed method and established the highest
accuracy of its operation according to various indicators by comparing it with several existing
methods.</p>
    </sec>
    <sec id="sec-2">
      <title>2. State of the art</title>
      <p>
        The task of predicting the risk of kidney graft failure after transplantation is not new. The application
of artificial intelligence tools to its solution provides the possibility of taking into account a large
number of hidden dependencies in a multi-parameter dataset, which provides the possibility of
increasing the accuracy of solving this task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        In particular, in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the authors investigated this task using one of the most widely used methods in
statistical survival and time-to-event analysis, the Cox Proportional-Hazards Model (CPHM). It
provides the ability to assess the impact of various factors on the risk of an event on a time scale. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
the CPHM was also used, however, for the assessment of the primary immunosuppressive therapy's
influence on kidney transplant survival. Experimental studies of both tasks were conducted on a short
set of data. Despite the sufficient accuracy of the obtained results, the CPHM assumes that the influence
of risk factors should remain constant over time. In some cases, this assumption may be violated, and
then the model may produce inaccurate or incorrect results.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] the authors solved the kidney allograft failure prediction task using an artificial intelligence
tool. The peculiarity of this paper is that the authors used a dataset, parts of which were obtained from
18 academic transplant centers around the world. Thus, the data sample consisted of more than 13
thousand vectors. As the chosen approach, the authors investigated the effectiveness of Bayesian joint
models. The proposed approach’s efficiency was evaluated using only the AUC curve. The obtained
accuracy isn’t at such a level that this approach can be used in practice.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] the authors tried to predict the risk of kidney graft failure one year and five years after
transplantation. The peculiarity of this paper is that they used a large dataset (more than 50 thousand
observations). The classification was done using several well-known machine learning methods. The
authors established the highest classification accuracy of 83% using SVM. Other methods showed
significantly lower accuracy. The authors also developed a smart system for analyzing the patient's
activity, following the principles described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [11]. However, the low classification
accuracy using the existing large dataset (relative to other studies) is not satisfactory.
      </p>
      <p>In [12] the above task was also solved. The authors investigated the effectiveness of the application
of the stacking strategy to increase the accuracy of solving the stated task. In this case, they worked
with only 513 observations. In connection with the imbalance of the dataset, the paper uses several
balancing methods. Performance was evaluated using various indicators. The authors established the
highest prediction accuracy of 87% using the C5.0 algorithm. The use of artificial neural networks in
this case demonstrated significantly lower classification accuracy.</p>
      <p>In [13] a stacking strategy was also considered to increase the classification accuracy of prediction
of the risk of kidney graft failure one month after transplantation. The data set contained about 200
observations. The authors performed data pre-processing procedures (missing data recovery, feature
selection), applied classic machine learning methods, and stacked the best of them in one method. The
maximum obtained accuracy reached 91% (f1-score). However, considering that the dataset is not
balanced, the accuracy of the stacking proposed by the authors could be improved using data balancing
methods at the algorithmic level [14], and the speed of operation due to the parallelization [15].</p>
      <p>In general, the above-mentioned approaches are quite complex, require qualification from the user,
and do not provide high classification accuracy. In this paper, we tried to correct the above
shortcomings.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Materials and methods</title>
      <p>This section describes the dataset to predict the survival of kidney transplants one month after
transplantation. The problem of analyzing an unbalanced dataset and the main classes of methods for
its solution are considered. The analysis of two machine learning methods, which are the basis of the
approach proposed in this paper, is presented. A description of the composition of the proposed
approach and of the algorithms for its training and application is given.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Dataset description</title>
      <p>In this study, the outpatient medical histories of 146 patients who received HLA-compatible renal
allografts between 1992 and 2020 were retrospectively analyzed. All 146 patients were transplanted for
the first time: 55 (37.6%) women and 91 (62.3%) men with an average age of 32.8 ± 8.4 years (range
= 18–60 years) at the time of transplantation. All patients were receiving outpatient treatment in the
nephrology and dialysis department of the Lviv Regional Clinical Hospital one month after
kidney transplantation [13].</p>
      <p>The study used a group of patients who had transplantation for the first time, were alive one month
after kidney transplantation, and received supportive immunosuppressive therapy. Kidney transplantation
was performed in Ukraine (Kyiv, Zaporizhzhia, Kharkiv, Lviv, Odesa) in 79.8% of patients, and in
other countries (Italy, Belarus, Pakistan, Poland, Turkey, China) in 20.2% of patients. All patients
from Ukraine received a kidney from a family donor, while 14.5% of patients, all from abroad, received
a kidney from a deceased donor. The dataset consists of 40 features and one target attribute [13].</p>
      <p>Clinical examinations of patients (collection of complaints, medical and life anamnesis, objective
examination of patients) were carried out in the department of nephrology and dialysis of the Lviv
Regional Clinical Hospital one month after kidney transplantation. Laboratory research of general
clinical indicators (general analysis of blood and urine and biochemical analysis of blood) was carried
out in the central laboratory of the Lviv Regional Clinical Hospital according to the instructions for the
use of laboratory research kits. In this retrospective analysis, a history of renal dysfunction and other
data were obtained from personal history [13].</p>
      <p>Viral infections were determined by the immunoenzymatic method and standard methods of
polymerase chain reaction in private immunological laboratories.</p>
    </sec>
    <sec id="sec-5">
      <title>3.2. A balancing approach for small data</title>
      <p>
        In the field of medical diagnostics, tasks often arise in which one class of a dataset (for example, patients
with a rare disease) is significantly smaller than another (healthy patients) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. As a result, we get
unbalanced data samples that should be analyzed by machine learning methods [16]. A similar situation
can lead to a decrease in the effectiveness of the selected classifier since machine learning methods
usually tend to learn a larger class and underestimate a smaller class [14], [17]. This leads to false
conclusions about the work of the chosen method, which can have serious consequences for the life
and health of people.
      </p>
      <p>The two main classes of methods for dealing with the imbalance of a stated set of medical data are
data-level balancing and algorithmic-level balancing. The first class of methods includes the intuitive
techniques of oversampling and undersampling. The undersampling technique removes redundant
observations from more numerous classes to bring them in line with the minority classes. However, reducing the
data sample can lead to the loss of important information and reduce the accuracy of the classifier [18].
The over-sampling technique consists of adding additional observations to less represented classes. This
can be done by duplicating existing samples or creating new, synthetic data, for example using SMOTE.
In the case of analysis of short datasets, it is appropriate to solve the balancing problem using the latter
approach.</p>
      <p>The second class of methods is based on the balancing at the algorithmic level. In particular, methods
based on changing the decision threshold can help balance decisions between classes. Typically, the
model makes decisions based on the probability of class membership, and changing the threshold can
affect positive and negative predictions. However, this requires an expert. Another technique is a
method based on the use of class weights. It uses a higher weight when calculating the loss function
specifically for smaller classes. This allows the machine learning method to pay more attention to
smaller classes.</p>
      <p>Given the need to analyze extremely short datasets, in this paper, we will investigate two balancing
methods from different classes: SMOTE and class weights. Details of their work and practical
implementation can be found in [18] and [19].</p>
    </sec>
    <sec id="sec-6">
      <title>3.3. Naive Bayes classifier</title>
      <p>The Naive Bayes classifier is based on Bayes' theorem, which originated in the 18th century; as a
machine learning method, however, it was formulated much later, when computer technologies became
more accessible [21].</p>
      <p>The Naive Bayesian classifier is a machine learning method used to solve classification tasks. It is
based on the assumption that all characteristics (or variables) of the sample are independent of each
other, although this is rarely true.</p>
      <p>The principle of operation of the naive Bayesian classifier includes the following steps [21]:
1. Collection of a dataset in which objects have labels or classes, as well as a set of features that
describe each object.
2. Calculation of posterior probabilities for each possible class (probability that this object belongs
to a specific class) using Bayes' theorem.
3. Classification procedure. The object is assigned to the class for which the posterior probability
is the highest.</p>
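      <p>As an illustration only, the three steps above can be sketched with scikit-learn's GaussianNB on a hypothetical toy dataset (the paper's own data is not reproduced here):</p>
      <preformat>
```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Step 1: a toy labelled dataset -- 6 objects, 2 features, 2 classes.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
              [3.0, 0.5], [3.2, 0.4], [2.9, 0.6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)

# Step 2: posterior probabilities for each class via Bayes' theorem.
proba = clf.predict_proba([[1.1, 1.9]])

# Step 3: the object is assigned to the class with the highest posterior.
label = clf.predict([[1.1, 1.9]])
```
      </preformat>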
      <p>The Naive Bayes classifier has its advantages and disadvantages, especially when applied to the
analysis of short datasets.</p>
      <p>Among the advantages, it should be noted the high speed of its operation. All calculations are based
on calculating probabilities for individual features, so it scales well to large datasets. Moreover, the
naive assumption of feature independence, although simplistic, can improve classification performance
when the training dataset is small.</p>
      <p>Disadvantages of this machine learning method include the naive assumption of independence: This
assumption rarely holds, which can lead to poor accuracy. In addition, if a priori assumptions about the
class probabilities are incorrect or unreliable, this can significantly affect the classification results. In
addition, to achieve high classification accuracy, it is necessary to have a large amount of training data,
which can be a limitation in some domain areas.</p>
    </sec>
    <sec id="sec-7">
      <title>3.4. Random Forest algorithm</title>
      <p>Random Forest is a powerful machine learning method that uses an ensemble of trees to solve
classification and regression tasks [18]. The main idea of the method is that instead of using one big
tree, several trees are created with random subsamples of data and random subsamples of features, and
the results of their outputs are combined to provide more stable and accurate results. The main steps of
implementing this algorithm are as follows [18]:</p>
      <p>1. Construction of random subsamples of data with repetition (bootstrap sampling). This means that
each tree will be built on its dataset, which may include duplicate objects.</p>
      <p>2. Construction of each branch in the tree due to the selection of a subset of random features
(Random Feature Selection). This helps to make the trees less correlated and diverse.
3. Building each tree randomly by choosing the best branch from a subset of features at each step.
4. Combining the results of the classification of each tree by voting and forming the result based on
the largest number of votes.</p>
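      <p>As a sketch, these four steps correspond to the following scikit-learn configuration (illustrative synthetic data with the same shape as the studied dataset, 40 features):</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data with the same shape as the studied dataset.
X, y = make_classification(n_samples=146, n_features=40, random_state=0)

# bootstrap=True: each tree is built on a random subsample with repetition (step 1);
# max_features="sqrt": a random subset of features is tried at every split (steps 2-3).
rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            max_features="sqrt", random_state=0)
rf.fit(X, y)

# Step 4: the forest combines the votes of its 100 trees.
pred = rf.predict(X[:5])
```
      </preformat>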
      <p>The Random Forest algorithm has many advantages. In particular, due to bootstrap sampling and
random feature selection, Random Forest is resistant to overfitting. In addition, it often shows high
accuracy on a variety of tasks, particularly on small data with a large number of features. In addition,
Random Forest can effectively work with large volumes of data without significant loss of performance.</p>
      <p>However, this method also has disadvantages. Compared to the single trees that form it, a Random
Forest can be difficult to interpret due to its complexity. In addition, building multiple trees can be
time-consuming, especially on large datasets. However, these shortcomings do not reduce the relevance of
using this machine learning method to solve the task.</p>
    </sec>
    <sec id="sec-8">
      <title>3.5. Combined two-ML-based classifier</title>
      <p>This paper proposes a combined approach to classification. It is based on the consistent use of the
Naive Bayes classifier and Random Forest algorithm.</p>
      <p>The Naive Bayes classifier is used for pre-processing the data to modify the features that will be
submitted to the Random Forest algorithm. Naive Bayes classifier provides the possibility of forming
the result in two ways: as a label belonging to a certain class of the task, and a set of probabilities of
observation belonging to each of the classes of the task. In this paper, we use the last option. We use
the obtained set of probabilities for each observation of the studied dataset to form a new dataset for the
classifier based on the Random Forest algorithm. That is, we replace all the initial attributes with the
output signals of the Naive Bayes classifier.</p>
      <p>The Random Forest algorithm uses a new dataset. The results of classification using this machine
learning algorithm are the final desired result of the work of the hybrid two-ML-based classifier
proposed in this paper. The flow-chart of the proposed approach is shown in Fig. 1. The final steps of
the training procedure of the hybrid two-ML-based classifier are as follows:
5. Formation of a new dataset by replacing the initial features with the set of probabilities obtained
by the Naive Bayes classifier for each observation from the dataset.
6. Training the Random Forest algorithm.</p>
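      <p>Steps 5 and 6 might be sketched as follows (a minimal sketch assuming scikit-learn estimators; the data here is synthetic):</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X_train, y_train = make_classification(n_samples=116, n_features=40, random_state=1)

# First stage: fit the Naive Bayes classifier on the (already normalized) data.
nb = GaussianNB().fit(X_train, y_train)

# Step 5: replace the 40 initial attributes with the per-class probabilities
# produced by the Naive Bayes classifier for every observation.
X_train_probs = nb.predict_proba(X_train)  # shape: (n_samples, n_classes)

# Step 6: train the Random Forest on the reduced-dimensionality dataset.
rf = RandomForestClassifier(n_estimators=100, random_state=1)
rf.fit(X_train_probs, y_train)
```
      </preformat>
      <p>With two task classes, the Random Forest thus operates on only two input features instead of forty.</p>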
      <p>The algorithmic implementation of the application procedure of the hybrid two-ML-based classifier
involves the following steps:
1. Normalization of the input vector, which should be assigned to one of the classes defined by
the task.
2. Application of the normalized vector on the pre-trained Naive Bayes classifier to obtain its
output signals.
3. Application of the set of probabilities obtained in the previous step to the previously trained
Random Forest algorithm.</p>
      <p>4. Obtaining the desired value.</p>
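      <p>Putting the four application steps together (a sketch under the same scikit-learn assumption; MaxAbsScaler stands in for the normalization step used in the paper's modeling):</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MaxAbsScaler

# Synthetic stand-in for the studied dataset; the two-stage model is
# trained as described in the training procedure above.
X, y = make_classification(n_samples=146, n_features=40, random_state=2)
scaler = MaxAbsScaler().fit(X)
nb = GaussianNB().fit(scaler.transform(X), y)
rf = RandomForestClassifier(random_state=2).fit(
    nb.predict_proba(scaler.transform(X)), y)

def classify(x):
    x_norm = scaler.transform(x.reshape(1, -1))  # step 1: normalization
    probs = nb.predict_proba(x_norm)             # step 2: Naive Bayes outputs
    return rf.predict(probs)[0]                  # steps 3-4: Random Forest decision

label = classify(X[0])
```
      </preformat>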
    </sec>
    <sec id="sec-9">
      <title>4. Modeling and results</title>
      <p>This section describes the modeling procedure. The optimal operating parameters of the developed
method were selected and the obtained results were presented.</p>
      <p>Modeling took place using a real-world dataset obtained from various hospitals in Ukraine. The
dataset contains 146 observations and is unbalanced. It was randomly split into training and test subsets
in a ratio of 80% to 20%. Data was normalized using MaxAbsScaler.</p>
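      <p>This experimental setup might be reproduced as follows (synthetic data of the same shape; the split and scaler follow the paper's description):</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MaxAbsScaler

# Synthetic stand-in: 146 observations, 40 features.
X, y = make_classification(n_samples=146, n_features=40, random_state=0)

# Random 80% / 20% split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# MaxAbsScaler rescales each feature by its maximum absolute value;
# it is fitted on the training part only, to avoid information leakage.
scaler = MaxAbsScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```
      </preformat>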
    </sec>
    <sec id="sec-10">
      <title>4.1. Results on data balancing</title>
      <p>First, we investigated the effectiveness of applying two balancing methods from different classes.
We used data-level balancing by applying the SMOTE algorithm. In this case, we increased the number
of representatives of the smaller class from 37 to 109 using the above approach. Another method, class
weights, provides balancing at the algorithm level by penalizing errors in a smaller class. It uses a set
of weights that can be formed as a ratio of the larger class to the smaller one and uses them in the training
process of the selected machine learning algorithm. Due to this, the smaller class is given more attention, which
increases the accuracy of its processing.</p>
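      <p>The weight-formation rule described above (the ratio of the larger class to the smaller one) might be sketched as:</p>
      <preformat>
```python
from collections import Counter

# Class sizes of the studied dataset: 109 versus 37 observations.
y = [0] * 109 + [1] * 37
counts = Counter(y)
majority = max(counts.values())

# Each class is weighted by how much smaller it is than the majority class,
# so errors on the minority class are penalized more heavily.
class_weight = {c: majority / n for c, n in counts.items()}
# class 0 gets weight 1.0; class 1 gets weight 109/37 (approximately 2.95)
```
      </preformat>
      <p>Many scikit-learn estimators accept such a dictionary through their class_weight parameter (or can compute an equivalent one with class_weight='balanced').</p>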
      <p>Fig. 2 summarizes the results of the work of both classifiers (Naive Bayes and Random Forest) on the
original, unbalanced data and with both balancing methods, reporting the train and test F1-scores for
each configuration.</p>
      <p>The following conclusions can be drawn using Fig. 2:
• Naive Bayes classifier using the original, unbalanced dataset provides fairly high classification
accuracy, taking into account the large number of input attributes and the limited volume of the
studied dataset;
• Application of the SMOTE algorithm significantly reduced the classification accuracy based
on the F1-score indicator;
• The use of the class weights approach for dataset balancing provides the same level of accuracy
as the classification by the Naive Bayes algorithm on the original data;
• Random Forest classifier using the original, unbalanced dataset shows significantly lower
classification accuracy compared to Naive Bayes classifier;
• Applying the SMOTE algorithm to balance the dataset for the Random Forest classifier
increased the classification accuracy based on the F1-score, but the overfitting problem is
observed;
• The use of the class weights approach for dataset balancing provides the same level of accuracy
in the test mode as the use of the SMOTE method, but an overfitting problem is not observed in this
case. Considering everything described above, further research will use the class weights method
for balancing the studied dataset.</p>
    </sec>
    <sec id="sec-11">
      <title>4.2. Results</title>
      <p>In the paper, the optimal parameters of each of the machine learning methods, which are the basis
of the proposed hybrid two-ML-based classifier, were selected. We used the grid search
method from the scikit-learn Python library to achieve this goal.</p>
      <p>The optimal operating parameters of the Naive Bayes classifier are as follows: 'var_smoothing':
1e-09.</p>
      <p>Accordingly, the optimal parameters of the Random Forest classifier are as follows: 'bootstrap':
True, 'max_depth': 25, 'max_features': 'sqrt', 'min_samples_leaf': 4, 'min_samples_split': 2,
'n_estimators': 100.</p>
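      <p>A reduced version of such a search, assuming scikit-learn's GridSearchCV and a small illustrative grid around the reported values, might look like:</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=146, n_features=40, random_state=0)

# A small illustrative grid around the values reported above.
param_grid = {
    "max_depth": [10, 25],
    "min_samples_leaf": [1, 4],
    "n_estimators": [100],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, scoring="f1", cv=3)
search.fit(X, y)
best = search.best_params_  # the combination with the best cross-validated F1-score
```
      </preformat>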
      <p>The results of the developed method when using the optimal parameters of its operation based on
various performance indicators are presented in Table 1.</p>
      <sec id="sec-11-1">
        <title>As can be seen from Table 1, the developed method:</title>
        <p>• Ensures high classification accuracy based on all studied performance indicators (in particular,
F1-score in test mode equals 93%);
• Provides high generalization properties (the difference between the metrics of both training and
test modes is small);
• Does not provoke overfitting, which is typical when analyzing small data.</p>
      </sec>
    </sec>
    <sec id="sec-12">
      <title>5. Comparison and discussion</title>
      <p>To evaluate the results of the work of the hybrid two-ML-based classifier, both machine learning
algorithms that form the method and several existing methods were used, in particular:
• Probabilistic Neural Network (PNN) with the following optimal parameters: the Canberra distance, smooth
factor = 6.51617;
• SVR with RBF kernel with the following optimal parameters: 'C': 1, 'coef0': 1.0, 'degree': 2, 'gamma': 1,
'kernel': 'sigmoid', 'max_iter': 10000.
The results of such a comparison based on different performance indicators are summarized in Fig.
3.</p>
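      <p>All three indicators used in the comparison are available in scikit-learn; a toy illustration with hypothetical labels:</p>
      <preformat>
```python
from sklearn.metrics import cohen_kappa_score, f1_score, matthews_corrcoef

# Hypothetical true and predicted labels (TP=5, TN=3, FP=1, FN=1).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

f1 = f1_score(y_true, y_pred)              # 10/12, approximately 0.833
kappa = cohen_kappa_score(y_true, y_pred)  # 7/12, approximately 0.583
mcc = matthews_corrcoef(y_true, y_pred)    # 7/12, approximately 0.583
```
      </preformat>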
      <p>Fig. 3 reports three performance indicators for each method: F1-score, Cohen's Kappa, and
Matthews's correlation coefficient; on the latter, the hybrid two-ML-based classifier reached 0.84.</p>
      <sec id="sec-12-2">
        <title>Discussion of the comparison</title>
        <sec id="sec-12-2-1">
          <title>The following conclusions can be drawn from the obtained results (Fig. 3):</title>
          <p>• The nonlinear method, SVR with RBF kernel, even though it is quite often used during the
analysis of short datasets, demonstrates the lowest accuracy of work using all three indicators;
• Both machine learning algorithms, which are the basis of the hybrid two-ML-based classifier,
demonstrate somewhat higher accuracy of work compared to SVR;
• Acceptable results, taking into account the complexity of the task, are demonstrated by the PNN.
In particular, the accuracy of this neural network reaches 84% (F1-score). However, its operation
time is quite significant, taking into account the dual annealing optimization used to select its
optimal operating parameters;
• The developed method demonstrates the highest accuracy using all performance indicators
when solving the task of predicting the survival of kidney transplants one month after
transplantation. In addition, it shows high generalization properties (Table 1) and a satisfactory
duration of the training procedure. All this ensures the possibility of its use when solving the
real-world tasks of transplantation medicine.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>6. Conclusions</title>
      <p>The paper considers the problem of the prediction of the survival of kidney transplants one month
after transplantation using machine learning tools. This approach will provide the doctor with additional
information for timely and correct diagnosis and will allow for adjusting the treatment regimen.</p>
      <p>The authors used a real-world dataset obtained from the Lviv Regional Clinical Hospital. Taking
into account the imbalance of the dataset, the effectiveness of the application of various approaches to
balancing was investigated in the paper and the best one of them was determined for solving the stated
problem.</p>
      <p>The paper presents the new hybrid two-ML-based classifier, which is based on the sequential use of
two machine learning algorithms. In the first step of the method, a Naive Bayes classifier is used, the
results of which, in the form of a set of probabilities belonging to each of the data classes, replace all
initial attributes with the found set of probabilities. In the second step of the method, the Random Forest
algorithm uses a new dataset of significantly reduced dimensionality to predict the result.</p>
      <p>
        Comparison with several known approaches has established the high accuracy of the method
presented in this paper, which provides high generalization properties and does not provoke
overfitting. All this ensures the possibility of its use when solving real problems of transplantation
medicine. This approach can also be used in other areas [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [22]–[24] as an Interpretable AI solution based on the use of the SGTM neural-like
structure as the second element of the proposed cascade structure.
      </p>
    </sec>
    <sec id="sec-14">
      <title>7. Acknowledgments</title>
      <p>This research is supported by the EURIZON Fellowship Program “Remote Research Grants for
Ukrainian Researchers”, grant No. 138.</p>
    </sec>
    <sec id="sec-15">
      <title>8. References</title>
      <p>[11] O. Basystiuk, et al., ‘Machine Learning Methods and Tools for Facial Recognition Based on
Multimodal Approach’, Proc. Mod. Mach. Learn. Technol. Data Sci. Workshop MoMLeTDS 2023
Lviv Ukr. June 3 2023, vol. 3426, pp. 161–170.
[12] L. Shahmoradi, et al., ‘Predicting the survival of kidney transplantation: design and evaluation of
a smartphone-based application’, BMC Nephrol., vol. 23, no. 1, p. 219, Dec. 2022, doi:
10.1186/s12882-022-02841-4.
[13] Y. Tolstyak et al., ‘The Ensembles of Machine Learning Methods for Survival Predicting after
Kidney Transplantation’, Appl. Sci., vol. 11, no. 21, p. 10380, Nov. 2021, doi:
10.3390/app112110380.
[14] V. Kotsovsky, et al., ‘On the Size of Weights for Bithreshold Neurons and Networks’, in 2021
IEEE 16th International Conference on Computer Sciences and Information Technologies (CSIT),
LVIV, Ukraine: IEEE, Sep. 2021, pp. 13–16. doi: 10.1109/CSIT52700.2021.9648833.
[15] L. Mochurad and R. Panto, ‘A Parallel Algorithm for the Detection of Eye Disease’, in Advances
in Intelligent Systems, Computer Science and Digital Economics IV, vol. 158, Z. Hu, Y. Wang, and
M. He, Eds., in Lecture Notes on Data Engineering and Communications Technologies, vol. 158,
Cham: Springer Nature Switzerland, 2023, pp. 111–125. doi: 10.1007/978-3-031-24475-9_10.
[16] I. Krak, et al., ‘Data Classification Based on the Features Reduction and Piecewise Linear
Separation’, in Intelligent Computing and Optimization, vol. 1072, P. Vasant, I. Zelinka, and
G.W. Weber, Eds., in Advances in Intelligent Systems and Computing, vol. 1072, Cham: Springer
International Publishing, 2020, pp. 282–289. doi: 10.1007/978-3-030-33585-4_28.
[17] V. Kotsovsky and A. Batyuk, ‘Feed-forward Neural Network Classifiers with Bithreshold-like
Activations’, in 2022 IEEE 17th International Conference on Computer Sciences and Information
Technologies (CSIT), Lviv, Ukraine: IEEE, Nov. 2022, pp. 9–12. doi:
10.1109/CSIT56902.2022.10000739.
[18] A. Trostianchyn, et al., ‘Sm-Co alloys coercivity prediction using stacking heterogeneous
ensemble model’, Acta Metall. Slovaca, vol. 27, no. 4, pp. 195–202, Dec. 2021, doi:
10.36547/ams.27.4.1173.
[19] ‘sklearn.utils.class_weight.compute_class_weight’, scikit-learn. Accessed: Aug. 28, 2023.
[Online]. Available:
https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html
[20] ‘SMOTE — Version 0.11.0’. Accessed: Aug. 28, 2023. [Online]. Available:
https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html
[21] G. I. Webb, et al., ‘Naïve Bayes’, in Encyclopedia of Machine Learning, C. Sammut and
G. I. Webb, Eds., Boston, MA: Springer US, 2011, pp. 713–714. doi: 10.1007/978-0-387-30164-8_576.</p>
      <p>
[22] I. Oleksiv, et al., ‘Quality of Student Support at IT Educational Programmes: Case of Lviv
Polytechnic National University’, in 2021 11th International Conference on Advanced Computer
Information Technologies (ACIT), Deggendorf, Germany: IEEE, Sep. 2021, pp. 270–275. doi:
10.1109/ACIT52158.2021.9548648.
[23] W. Auzinger, et al., ‘A Continuous Model for States in CSMA/CA-Based Wireless Local Networks
Derived from State Transition Diagrams’, in Proceedings of International Conference on Data
Science and Applications, vol. 287, M. Saraswat, S. Roy, C. Chowdhury, and A. H. Gandomi, Eds.,
in Lecture Notes in Networks and Systems, vol. 287, Singapore: Springer Singapore, 2022, pp.
571–579. doi: 10.1007/978-981-16-5348-3_45.
[24] J. Wasilczuk, et al., ‘Entrepreneurial competencies and intentions among students of technical
universities’, Probl. Perspect. Manag., vol. 19, no. 3, pp. 10–21, Jul. 2021, doi:
10.21511/ppm.19(3).2021.02.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I.</given-names>
            <surname>Izonin</surname>
          </string-name>
          , et al.,
          <article-title>'Addressing Medical Diagnostics Issues: Essential Aspects of the PNN-based Approach'</article-title>
          ,
          <source>CEUR-WS Proc. 3rd Int. Conf. Inform. Data-Driven Med. Växjö Swed. Novemb. 19 - 21</source>
          <year>2020</year>
          , vol.
          <volume>2753</volume>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>218</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bodyanskiy</surname>
          </string-name>
          , et al., '
          <article-title>Modified generalized neo-fuzzy system with combined online fast learning in medical diagnostic task for situations of information deficit'</article-title>
          , Math. Biosci. Eng., vol.
          <volume>19</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>8003</fpage>
          -
          <lpage>8018</lpage>
          ,
          <year>2022</year>
          , doi: 10.3934/mbe.2022374.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Babichev</surname>
          </string-name>
          , et al.,
          <source>'Information Technology of Gene Expression Profiles Processing for Purpose of Gene Regulatory Networks Reconstruction', in 2018 IEEE Second International Conference on Data Stream Mining Processing (DSMP)</source>
          , Aug.
          <year>2018</year>
          , pp.
          <fpage>336</fpage>
          -
          <lpage>341</lpage>
          . doi: 10.1109/DSMP.2018.8478452.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Medykovskyy</surname>
          </string-name>
          , et al.,
          <article-title>'Development of a regional energy efficiency control system on the basis of intelligent components'</article-title>
          ,
          <source>in 2016 XIth International Scientific and Technical Conference Computer Sciences and Information Technologies (CSIT)</source>
          , Lviv, Ukraine: IEEE, Sep.
          <year>2016</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>20</lpage>
          . doi: 10.1109/STC-CSIT.2016.7589858.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chumachenko</surname>
          </string-name>
          , et al.,
          <source>'Predictive Model of Lyme Disease Epidemic Process Using Machine Learning Approach', Appl. Sci.</source>
          , vol.
          <volume>12</volume>
          , no.
          <issue>9</issue>
          , p.
          <fpage>4282</fpage>
          ,
          Apr.
          <year>2022</year>
          , doi: 10.3390/app12094282.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tolstyak</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Havryliuk</surname>
          </string-name>
          , '
          <article-title>An Assessment of the Transplant's Survival Level for Recipients after Kidney Transplantations using Cox Proportional-Hazards Model'</article-title>
          ,
          <source>CEUR-WS.org</source>
          , vol.
          <volume>3302</volume>
          , pp.
          <fpage>260</fpage>
          -
          <lpage>265</lpage>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tolstyak</surname>
          </string-name>
          , et al.,
          <article-title>'An investigation of the primary immunosuppressive therapy's influence on kidney transplant survival at one month after transplantation'</article-title>
          , Transpl. Immunol., vol.
          <volume>78</volume>
          , p.
          <fpage>101832</fpage>
          ,
          Jun.
          <year>2023</year>
          , doi: 10.1016/j.trim.2023.101832.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Raynaud</surname>
          </string-name>
          et al., '
          <article-title>Dynamic prediction of renal survival among deeply phenotyped kidney transplant recipients using artificial intelligence: an observational, international</article-title>
          , multicohort study',
          <source>Lancet Digit. Health</source>
          , vol.
          <volume>3</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>e795</fpage>
          -
          <lpage>e805</lpage>
          ,
          Dec.
          <year>2021</year>
          , doi: 10.1016/S2589-7500(21)00209-0.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. A. A.</given-names>
            <surname>Naqvi</surname>
          </string-name>
          , et al.,
          <source>'Predicting Kidney Graft Survival Using Machine Learning Methods: Prediction Model Development and Feature Significance Analysis Study', J. Med. Internet Res.</source>
          , vol.
          <volume>23</volume>
          , no.
          <issue>8</issue>
          , p.
          <fpage>e26843</fpage>
          ,
          Aug.
          <year>2021</year>
          , doi: 10.2196/26843.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Altameem</surname>
          </string-name>
          , et al.,
          <article-title>'Patient's data privacy protection in medical healthcare transmission services using back propagation learning'</article-title>
          ,
          <source>Comput. Electr. Eng.</source>
          , vol.
          <volume>102</volume>
          , p.
          <fpage>108087</fpage>
          ,
          Sep.
          <year>2022</year>
          , doi: 10.1016/j.compeleceng.2022.108087.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>