Investigation of PNN Optimization Methods to Improve
                         Classification Performance in Transplantation Medicine
                         Myroslav Havryliuka, Nazarii Hovdysha, Yaroslav Tolstyakb,c, Valentyna Chopyakb , Natalya
                         Kustraa

                         a
                           Lviv Polytechnic National University, S. Bandera str., 12, Lviv, 79013, Ukraine
                         b
                           Danylo Halytsky Lviv National Medical University, Pekarska str., 69, Lviv, 79010, Ukraine
                         c
                           Lviv Regional Clinical Hospital, Chernihivska str., 7, Lviv, 79010, Ukraine

                                          Abstract
                                          The problem of predicting the success of organ transplantation is critical in the field of
                                          medicine. The use of a probabilistic neural network is of considerable interest in this context.
                                          In this study, the authors compared the speed of work of four popular methods for optimizing
                                          the parameter of a probabilistic neural network in the case of analyzing a short medical dataset
                                          collected by Lviv Regional Clinical Hospital. All three algorithms have demonstrated
                                          efficiency, reaching the optimum performance point. The use of optimizers provided a
                                          significant saving of time and computing resources compared to grid search.

                                          Keywords 1
                                          Probabilistic neural network, Optimization, Small data, Classification

                         1. Introduction

                            The problem of predicting the success of organ transplantation is critical in the field of medicine.
                         Currently, there are no models capable of accurately describing the patient's condition after
                         transplantation. Therefore, the use of methods of intelligent data analysis has gained wide popularity.
                            However, insufficient data is often an obstacle to building adequate machine learning models.
                         Classical models of artificial intelligence do not demonstrate sufficient efficiency in the case of
                         processing small medical datasets. This is due to a number of reasons, the main of which is the problem
                         of overfitting.
                            The use of a probabilistic neural network in such cases can improve performance compared to
                         traditional models. However, the selection of the optimal network parameter by brute force method
                         requires a lot of time and computing resources. That is why the use of optimization methods is
                         appropriate for this task.

                         2. State-of-the-arts
                             New approaches to working with small datasets appear every year. However, this area of research
                         still needs development.
                             The issue of using a probabilistic neural network for classification problems was analyzed in [1].
                         The authors found that the number of studies involving the application of probabilistic neural networks
                         had increased over the previous five years. Research concerns various fields of medicine, such as
                         nephrology, cardiology, oncology, pulmonology, endocrinology, neurosurgery, etc. Often use a
                         combination of probabilistic neural network with other machine learning methods, such as SVM in [2],
                         [3] and [4], Naive Bayes in [5], [6] and [7], K-means in [8], [9] and [10].

                         IDDM’2023: 6th International Conference on Informatics & Data-Driven Medicine, November, 17–19, 2023, Bratislava, Slovakia
                         EMAIL: myroslav.a.havryliuk@lpnu.ua (A. 1)
                         ORCID: 0000-0001-5259-7564
                                       ©️ 2023 Copyright for this paper by its authors.
                                       Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                       CEUR Workshop Proceedings (CEUR-WS.org)


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
     In the above-mentioned works, the selection of the optimal parameters of the neural network was
carried out using a grid search. Thus, optimizing the parameters of a probabilistic neural network is
relevant.
    The purpose of this study is to compare the performance of three popular methods for optimizing
the parameter of a probabilistic neural network in the case of analyzing a short medical dataset.

2.1.    Probabilistic Neural Network

    A probabilistic neural network is often used to solve a wide range of tasks, including classification
[11]. The training procedure of this neural network is quite simple. The model also has certain
disadvantages, the main one of which is the increase in dimensionality of the structure with the increase
of the sample [12]. Accordingly, the use of a probabilistic neural network can require the allocation of
a large amount of resources.
    The work of this neural network in the case of binary classification can be described as follows:
    1. Let there be k vectors of class 1 and m vectors of class 2 in the sample. We denote the j-th
                                       1                          2
    component of the i-th vector as X i , j for class 1 and as X i , j for class 2. The task of the model is to
   classify the input vector X . Therefore, it is necessary to determine the probability that the vector
    X belongs to class 1.
   2. Canberra distances between the input vector and all sample vectors are calculated:
                                    n      X i1, j − X j                  n      X i2, j − X j
                           R =
                             1
                                                           , R = 2
                                                                          X                             (1)
                                           X i1, j + X j                                    + Xj
                             i                                  i                     2
                                    j =1                                  j =1       i, j

   3.   Gaussian distances are calculated based on the obtained values:
                                                                ( Ri1,2 )2
                                            D  1,2
                                                     = exp(−                     )                       (2)
                                               i
                                                                     2
   4.    The probability that the input vector belongs to class 1 is calculated by the formula:
                                                            k

                                                           D        1
                                                                     i
                                                                                                         (3)
                                                     P1 = i =1
                                                              k
   5.   Similarly for class 2:
                                                           k

                                                           D        i
                                                                      2
                                                                                                         (4)
                                                     P2 = i =1
                                                              k
   6.   So probabilistic neural network predicts a class of the input vector using the following rule:
                                          0, if max  Pc  = P1 
                                                                
                                 y pred =                        , c = 1, 2                            (5)
                                          1, if max  Pc  = P2 
                                                                


2.2.    Optimization problem formulation

   For models that are built using unbalanced datasets, the F1 score is an appropriate measure of
performance [13].
   Let yi, i ∈ 1..N denote belonging to a certain class in the test sample of size N, then yipred, i ∈ 1..N is
the value predicted by the model.
   Precision of the model will be equal to:
                                                                 ( y                         * yi )
                                                                    N
                                                                                   pred
                                                                                  i
                                   Precision =                      i =1
                                                                            N                                                (6)
                                                                           y
                                                                            i =1
                                                                                     i
                                                                                      pred


   Accordingly, recall is equal to:

                                                             ( y                    * yi )
                                                                N
                                                                             pred
                                                                            i
                                         Recall = i =1                      N                                                (7)
                                                                           y
                                                                           i =1
                                                                                     i


   According to the definition of the F1-score metric, it can be expressed as follows:

                                                  ( y                     * yi )             ( y              * yi )
                                                   N                                           N
                                                                 pred                                    pred
                                                                i                                       i
                                                  i =1                                         i =1
                                                          N                              *             N

                                                         y          i
                                                                      pred
                                                                                                       y       i
                   f 1_ score = 2 * N                    i =1                                          i =1
                                                                                                                             (8)
                                                ( y                       * yi )              ( y                 * yi )
                                                                                                N
                                                               pred                                      pred
                                                              i                                         i
                                                  i =1
                                                         N                               +     i =1
                                                                                                        N

                                                         y
                                                         i =1
                                                                    i
                                                                     pred
                                                                                                       y
                                                                                                       i =1
                                                                                                                i


   Thus, the problem of maximizing F1-score can be presented in the following form:

                           ( yipred * yi )                     ( y                      * yi )
                           N                                        N
                                                                                 pred
                                                                                i
                           i =1                                  i =1
                                   N                      *                     N

                                  y      i
                                           pred
                                                                            y           i
                     2* N         i =1                                       i =1
                                                                                                       → max                 (9)
                          ( y                 * yi )            ( y                        * yi )
                                                                    N
                                        pred                                     pred
                                       i                                        i
                          i =1
                                  N                       + i =1                N

                                  y
                                  i =1
                                          i
                                           pred
                                                                             y
                                                                              i =1
                                                                                          i

   with the restrictions 0.001>sigma>10.

2.3.    Methods for solving the optimization task

   We applied three popular optimization algorithms:
   •   Bayesian optimization
   •   Differential evolution
   •   Dual annealing

   These methods do not require the calculation of derivatives and can perform optimization in case
the objective function is a “black box” [14].
   Bayesian optimization uses Gaussian process to model the black-box objective function [15].
    We defined the upper confidence bounds function as acquisition function to balance exploitation
and exploration. Also we used the following optimization parameters:
    •    number of initial points – 5;
    •    number of iterations – 10.
    Differential evolution is a stochastic method. It applies the key concepts of genetic algorithms [16].
The first step of the algorithm is to create a generation of candidates that are the objective function
arguments. At each iteration, a new generation is created by mixing with other candidates.
    We applied a “best1bin” strategy for creating trial candidates. According to it:
    •    the difference between two randomly chosen candidates is used to provide a mutation of the
    best member of the population;
    •    a binomial distribution is applied for recombination.
    We defined following key algorithm parameters:
    •    population size – 10;
    •    mutation – [0.5;1);
    •    recombination – 0.7;
    •    maximum number of generations – 10.
    Dual annealing is also a stochastic approach. It combines the generalization of Fast Simulated
Annealing and Classical Simulated Annealing coupled to a strategy for carrying out a local search on
accepted locations [17]. This approach describes an advanced method to improve the solution that was
found by the generalized annealing process. A distorted Cauchy-Lorentz visiting distribution is used in
this optimization algorithm.
    We used the following optimization parameters:
    •    parameter for visiting distribution – 2.62;
    •    parameter for acceptance distribution – -5.0;
    •    maximum number of global search iterations – 10.
    For all algorithms, the optimization was performed on the interval σ ∈ [0.00001;10].

3. Modeling and results


3.1.    Dataset descriptions

    The imbalanced dataset collected by Lviv Regional Clinical Hospital (Department Hospital
Nephrology and Dialysis) was used in this study. It contains data on 164 patients who received HLA-
matched renal allografts between 1992 and 2020 by 42 attributes (such as age, sex, glucose level, etc.).
Among them, 64 (42.1%) were women and 88 (57.9%) were men. The age of the patients at the time
of transplantation was 32.6 ± 8.7 (in the range of 18–60) years. 152 patients were transplanted for the
first time, 12 (5 women and 7 men) were transplanted again.

3.2.    Results

   Three optimization algorithms described above were used to optimize the parameter. The
implementation of optimizers from the scipy.optimize and bayesian optimization libraries of the Python
programming language was used. The optimization results are shown in Table 1.
Table 1
Optimization results
                                                                     Number of
                                                                                                Time,
   Optimizer       Accuracy      Precision      Recall    F1 score   evaluations       σ
                                                                                                 sec
                                                                     of F1-score
  Differential
                       0.896       0.727          1        0,842          62         7.232      0.377
   evolution
   Bayesian            0.896       0.727          1        0,842          15         5.472      1.624
      Dual
                       0.896       0.727          1        0,842          29         6.678      0.189
  annealing

4. Comparison and discussion

   As can be seen, all three optimizers have reached the point from the intervals where the value of F1-
score is maximal. The precision value indicates a quite large proportion of false-positive results, while
the recall is 100%.
   All algorithms showed quite good optimization speed. The shortest execution time was
demonstrated by dual annealing. A visualization of the optimization duration can be seen at Fig. 1.


Figure 1: Optimization execution time

   In terms of the number of evaluations of the objective function, Bayesian optimization is the most
effective (a visualization can be seen at Fig. 2). However, other steps of this algorithm also cause
computational costs, which is reflected in the duration of execution.
Figure 2: Number of evaluations of F1 score

    The selection of the optimal value of the parameter was also carried out using a grid search on the
interval σ ∈ [0.00001;10] with a step Δ=0.001. The execution time was 41.852 seconds. The number
of objective function calculations was 10000. As a result of the experiment, it was found that there are
two intervals on which F1 score reaches a maximum of 0.842 (Fig. 3).


Figure 3: Grid search optimization results

  So the use of each of the optimizers provides a significant reduction in execution time and
computational costs, compared to the grid search.
5. Conclusions

   The problem of predicting the success of organ transplantation is critical in the field of medicine.
The use of a probabilistic neural network is of considerable interest in this context. In this study, the
authors compared the performance of three popular methods for optimizing the parameters of a
probabilistic neural network in the case of analyzing a short set of medical data collected by Lviv
Regional Clinical Hospital. All three algorithms have demonstrated efficiency, reaching the optimum
performance point. The use of optimizers provided a significant saving of time and computing resources
compared to a grid search.
   Further research may concern the optimization of model parameters, where the probabilistic neural
network is used in combination with other machine learning methods.


6. Acknowledgments

  This research is supported by the EURIZON Fellowship Program: “Remote Research Grants for
Ukrainian Researchers”, grand № 138.

7. References
[1] Izonin I, et al. Addressing Medical Diagnostics Issues: Essential Aspects of the PNN-based
     Approach. In: Proceedings of the 3rd International Conference on Informatics & Data-Driven
     Medicine. Växjö, Sweden, November 19 - 21, 2020
[2] Izonin I, et al. PNN-SVM Approach of Ti-Based Powder’s Properties Evaluation for Biomedical
     Implants Production. Comput Mater Contin. 2022;71(3):5933–47.
[3] Mochurad L, et al. Classification of X-Ray Images of the Chest Using Convolutional Neural
     Networks. In: Proceedings of the 4th International Conference on Informatics & Data-Driven
     Medicine. Valencia, Spain, November 19 - 21, 2021. 269-282.
[4] Tolstyak Y, Havryliuk M. An Assessment of the Transplant's Survival Level for Recipients after
     Kidney Transplantations using Cox Proportional-Hazards Model. In: Proceedings of the 5th
     International Conference on Informatics & Data-Driven Medicine. Lyon, France, November 18 -
     20, CEUR-WS.org, 2022. pp. 260-265.
[5] Tolstyak Y, et al. The Ensembles of Machine Learning Methods for Survival Predicting after
     Kidney Transplantation. Appl Sci. 2021 Jan;11(21):10380.
[6] Liaskovska S, et al. Investigation of Anomalous Situations in the Machine-Building Industry
     Using Phase Trajectories Method. In: Hu Z, Petoukhov S, Yanovsky F, He M, editors. Advances
     in Computer Science for Engineering and Manufacturing. Cham: Springer International
     Publishing; 2022. p. 49–59. (Lecture Notes in Networks and Systems).
[7] Basystiuk O, Melnykova N. Multimodal Approaches for Natural Language Processing in Medical
     Data. In: Proceedings of the 5th International Conference on Informatics & Data-Driven Medicine.
     Lyon, France, November 18–20, CEUR-WS.org, pp. 246–252 (2022).
[8] Tolstyak Y, et al. An investigation of the primary immunosuppressive therapy’s influence on
     kidney transplant survival at one month after transplantation. Transpl Immunol. 2023 Jun
     1;78:101832.
[9] Kotsovsky V, et al. On the Size of Weights for Bithreshold Neurons and Networks. In: 2021 IEEE
     16th International Conference on Computer Sciences and Information Technologies (CSIT). 2021.
     p. 13–6.
[10] Mochurad LI. Canny Edge Detection Analysis Based on Parallel Algorithm, Constructed
     Complexity Scale and CUDA. Comput Inform. 2022 Nov 9;41(4):957–80.
[11] Kotsovsky V, Batyuk A. Feed-forward Neural Network Classifiers with Bithreshold-like
     Activations. In: 2022 IEEE 17th International Conference on Computer Sciences and Information
     Technologies (CSIT). 2022. p. 9–12.
[12] Oleksiv I, et al. Quality of Student Support at IT Educational Programmes: Case of Lviv
     Polytechnic National University. In: 2021 11th International Conference on Advanced Computer
     Information Technologies (ACIT). 2021. p. 270–5.
[13] Martyn Y, et al. Optimization of Technological’s Processes Industry 4.0 Parameters for Details
     Manufacturing via Stamping: Rules of Queuing Systems. Procedia Comput Sci. 2021 Jan
     1;191:290–5.
[14] Ganguli C, et al. Adaptive Artificial Bee Colony Algorithm for Nature-Inspired Cyber Defense.
     Systems. 2023 Jan;11(1):27.
[15] Ljaskovska S, et al. Optimization of Parameters of Technological Processes Means of the FlexSim
     Simulation Simulation Program. In: 2020 IEEE Third International Conference on Data Stream
     Mining & Processing (DSMP). 2020. p. 391–7.
[16] Mochurad L. Optimization of Regression Analysis by Conducting Parallel Calculations. In:
     COLINS-2021: 5th International Conference on Computational Linguistics and Intelligent
     Systems, April 22–23, 2021, Kharkiv, Ukraine, 982-996 p.
[17] Basystiuk O, et al. Machine Learning Methods and Tools for Facial Recognition Based on
     Multimodal Approach. In: Proceedings of the Modern Machine Learning Technologies and Data
     Science Workshop (MoMLeT&DS 2023). Lviv, Ukraine, June 3, 2023. CEUR-WS.org, pp. 161-
     170 (2023).