Addressing Medical Diagnostics Issues: Essential Aspects of the PNN-based Approach
Ivan Izonin, Roman Tkachenko, Liliia Ryvak, Khrystyna Zub, Mariia Rashkevych, and Olena Pavliuk
Lviv Polytechnic National University, S. Bandera str., 12, Lviv, 79013, Ukraine
Abstract
The use of artificial intelligence tools is one of the directions in which medical diagnostics is developing across its various fields. This paper studies the current state of development of the Probabilistic Neural Network (PNN) and its modifications in medical diagnostics. The authors conducted a systematic literature review covering the past five years using the Scopus database. The review revealed a significant increase in interest in this ANN type, confirmed by the steady growth in the number of Scopus-indexed publications on the topic. The PNN topology is presented and its operating procedure is described. Two algorithms for generating the output signal of neural networks of this type are described. PNN operation is modeled with both algorithms on a small sample of real medical data, solving the seminal quality prediction task. A significant increase in PNN accuracy is demonstrated when the algorithm that describes a complete system of events is used. A comparison with existing classifiers (based on machine learning algorithms and artificial neural networks) in terms of Accuracy, Precision, Recall and F-measure demonstrates the high accuracy of PNN. Prospects for further research on this ANN type are described, in particular the construction of hybrid computational intelligence systems based on it. Such an approach can significantly increase the accuracy of these systems while keeping their running time satisfactory.
Keywords
Medical diagnostics, ANN, PNN, classification, complete system of events, seminal quality prediction, small dataset
1. Introduction
Among the main problems of modern health care is a clear tendency toward an increased risk of various diseases arising for various reasons. This is due to modern lifestyles, bad habits, the environmental situation in a particular region, constant stress, etc. [1]. This situation is typical of different areas of medicine [2], [3] and covers various stages, from determining the likelihood of a disease and the need for express diagnosis to predicting the consequences of various operations or types of treatment. Timely and correct diagnosis can reduce potential harm to health, and sometimes the threat to human life, in particular by establishing the optimal method of treatment. Given the individual characteristics of each person's body, the large number of test results, the complex and far from obvious relationships between them [4], a number of external factors, etc., it is sometimes difficult for young practitioners to make a timely and correct diagnosis. Experienced doctors sometimes need an outside view too, in particular when the patient's condition must be assessed by specialists from various fields of medicine (for example, in preparation for surgery: a cardiologist, an anesthesiologist, a surgeon, etc.).
IDDM’2020: 3rd International Conference on Informatics & Data-Driven Medicine, November 19–21, 2020, Växjö, Sweden
EMAIL: ivanizonin@gmail.com (A. 1); roman.tkachenko@gmail.com (A. 2); liliia.ryvak@gmail.com (A. 3); khrystyna.zub@gmail.com (A. 4); maria.i.rashkevych@lpnu.ua (A. 5); olena.m.pavliuk@lpnu.ua (A. 6)
ORCID: 0000-0002-9761-0096 (A. 1); 0000-0002-9802-6799 (A. 2); 0000-0002-8579-8829 (A. 3); 0000-0001-6476-7305 (A. 4); 0000-0001-5490-1750 (A. 5); 0000-0003-4561-3874 (A. 6)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)
The practice of medical consultations, in which doctors meet and discuss the condition of each patient, avoids a number of these problems. However, it is not always possible. The use of artificial intelligence tools looks like a promising direction for the development of medical diagnostics in various fields of medicine [5]. The rapid development of machine learning tools and artificial neural networks, of IoT-based devices, and of the computing power of modern computers makes it possible to solve complex medical problems with knowledge- and resource-intensive methods [6]. Systems built on this basis [7] can serve as an additional, fairly reliable source [8] of information for the practitioner, in particular during the diagnosis or treatment of a patient. When short datasets are processed, artificial neural networks, in particular those that require no training, look promising from our point of view. Therefore, the aim of this paper is to investigate the effectiveness of PNN in medical diagnostics. In particular, we consider different algorithms for generating the output signal of a neural network of this type.
2. Related works
This section presents the results of a review and analysis of the current state of development of the Probabilistic Neural Network (PNN) and of the ways it is applied in practice to solve diagnostic problems in various fields of medicine. The search for scientific sources on this topic followed the PRISMA scheme, with the Scopus database chosen as the search tool. Fig. 1 shows the query we used to search for scientific papers published in 2016–2020.
Figure 1: Query for searching the scientific papers in the Scopus database
A number of articles meeting the specified criteria were selected with this search query.
The results of the initial analysis of the 29 papers selected for the last five years, together with the 143 citations of these works, are shown in Fig. 2.
Figure 2: Data from the Scopus database as of 29.10.2020: a) the number of papers on the research topic per year, 2016–2020 (linear trend y = 1.2x + 2.2); b) the number of citations for these papers.
As can be seen from Fig. 2a, the number of published scientific papers on this topic has increased every year, which indicates the relevance and scientific value of such research. In addition, the number of citations of these works (Fig. 2b) has grown rapidly from year to year, which suggests wide interest in such research and its practical significance. The results of the analysis of the literature sources found in the Scopus database are summarized in Table 1.
Table 1
The results of the systematic literature review according to the PRISMA scheme
№ | Task | Computational intelligence tools | Area of medicine | Reference
1 | Prediction of acute clinical deterioration after coronary artery bypass grafting | Recurrent PNN & hidden Markov model | cardiology | [9]
2 | Predicting the risk of cancer | Naive Bayesian classifier & PNN | oncology | [10]
3 | Detection of the level of fatigue | Wavelet transformation & PNN | cardiology | [11]
4 | Determination of renal calculus on the ultrasound image | Recurrent PNN | nephrology | [12]
5 | Early diagnosis of tuberculosis | Invariant moments & PNN | pulmonology | [13]
6 | Early diagnosis of pneumothorax, pneumoconiosis and emphysema | Texture segmentation & PNN | oncology | [14]
7 | Early diagnosis of lung cancer | Texture segmentation & PNN | pulmonology | [15]
8 | Diagnosis of diabetes | Correlation features & PNN | endocrinology | [16]
9 | Diagnosis of heart disease | Sobol's method & PNN | cardiology | [17]
10 | Early diagnosis of lung cancer | Sobol's method & PNN | pulmonology | [17]
11 | Classification of arrhythmia based on eight states of heartbeat | Wavelet transformation & PNN | cardiology | [18]
12 | Diagnosis of ovarian cancer | k-means & PNN | oncology | [19]
13 | Diagnosis of ovarian cancer | SVM & PNN | oncology | [19]
14 | ECG classification | Wavelet transformation & PNN | cardiology | [20]
15 | Diagnosis of dementia | Vector quantization network & PNN | neurosurgery | [21]
16 | Predicting the survival of patients with cervical cancer | PNN | oncology | [22]
As the results of the analysis show, the Probabilistic Neural Network and its modifications are widely used to solve many diagnostic tasks in various fields of medicine. Combined methods built on this ANN type are gaining practical value. In this paper, we investigate two algorithms for implementing it, which affect the accuracy both of the probabilistic ANN itself and of combined diagnostic methods based on it.
3. Materials and methods
This section describes the dataset used for the practical implementation of the studied methods. It also describes the procedure for preparing and using neural networks of this type, as well as two algorithms for their implementation that significantly affect PNN accuracy.
3.1. Dataset
A dataset from [23] was used to study the operation of the PNN algorithms. The task was to determine seminal quality based on data from 100 patients. The independent variables are: the season in which the analysis was performed; age at the time of analysis; childhood diseases; accident or serious trauma; surgical intervention; high fevers in the last year; frequency of alcohol consumption; smoking habit; and the number of hours spent sitting per day. The output variable is the diagnosis: normal or altered. Detailed information on the data collection procedure, the main characteristics of the dataset and the attributes used for the simulation is given in [24].
3.2. Different PNN algorithms
Probabilistic neural network (Fig. 3), developed by Donald F.
Specht, is widely used to solve classification and pattern recognition tasks thanks to its quick and simple learning procedure. However, this artificial neural network topology also has several disadvantages, including large time delays in application mode, low accuracy, and a structure whose size depends on the size of the training sample. That is why, depending on the chosen algorithm, its implementation, etc., a neural network of this type can be quite time- and resource-intensive.
Figure 3: PNN topology for two classes (input layer, pattern layer, summation layer, output layer)
Let us consider how PNN functions when solving a classification task with two classes:
1. Suppose a sample of $k+m$ vectors is given. Let $k$ vectors $X_{i,j}$ of the sample, where $i = 1, \dots, k$ is the vector index and $j = 1, \dots, n$ is the component index, represent the first class of objects to be classified; all other $m$ vectors, which do not belong to class 1, belong to class 2. The input of the neural network receives the vector $X_j$ to be classified, i.e., it is necessary to determine the probability that it belongs to class 1.
2. We calculate the Euclidean distances from the input vector to all vectors of classes 1 and 2 of the training sample:
$$R_i^{1} = \sqrt{\sum_{j=1}^{n}\left(X_{i,j}^{1} - X_j\right)^{2}}, \qquad R_i^{2} = \sqrt{\sum_{j=1}^{n}\left(X_{i,j}^{2} - X_j\right)^{2}} \tag{1}$$
where $i = 1, \dots, k$ for class 1 and $i = 1, \dots, m$ for class 2.
3. We pass from them to Gaussian distances:
$$D_i^{1,2} = \exp\left(-\frac{\left(R_i^{1,2}\right)^{2}}{\sigma^{2}}\right) \tag{2}$$
where $\sigma$ is the smoothing factor.
4. According to the classical version of the ANN, the probability that the input vector belongs to class 1 is calculated by the formula:
$$P(1) = \frac{1}{k}\sum_{i=1}^{k} D_i^{1} \tag{3}$$
However, the ANN according to this variant (algorithm 1) does not describe a complete system of events: if $P(2)$ is computed analogously, by averaging $D_i^{2}$ over the $m$ vectors of class 2, the sum $P(1) + P(2)$ is in general not equal to 1.
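Steps 1–4 can be traced in a few lines of NumPy. This is a minimal sketch, assuming the class-1 and class-2 vectors are stored row-wise in arrays `X1` (k × n) and `X2` (m × n); the function names and the toy vectors are hypothetical and serve only to illustrate eqs. (1)–(3), including why the two class outputs of algorithm 1 need not add up to 1.

```python
import numpy as np


def pattern_layer(X_class, x, sigma):
    """Gaussian activations of one class's stored vectors for the input x (eqs. (1)-(2))."""
    R = np.sqrt(np.sum((X_class - x) ** 2, axis=1))  # eq. (1): one Euclidean distance per stored vector
    return np.exp(-(R ** 2) / sigma ** 2)            # eq. (2): Gaussian distances D_i


def pnn_algorithm1(X1, X2, x, sigma):
    """Classical output (eq. (3)): each class score is the mean activation of that class."""
    p1 = pattern_layer(X1, x, sigma).mean()  # (1/k) * sum of D_i^1
    p2 = pattern_layer(X2, x, sigma).mean()  # (1/m) * sum of D_i^2, computed the same way
    return p1, p2


# Made-up 2-D toy vectors, used only to show the arithmetic (not the fertility data)
X1 = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]])  # k = 3 vectors of class 1
X2 = np.array([[1.0, 1.0], [0.9, 1.1]])              # m = 2 vectors of class 2
x = np.array([0.15, 0.15])                            # input vector to classify

p1, p2 = pnn_algorithm1(X1, X2, x, sigma=0.5)
# The input lies close to class 1, yet P(1) + P(2) comes out at about 0.93, not 1
print(f"P(1) = {p1:.4f}, P(2) = {p2:.4f}, P(1) + P(2) = {p1 + p2:.4f}")
```

Because each class output is a mean over that class alone, nothing ties the two outputs together, which is the gap the second algorithm addresses.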
Let us consider another variant of calculating the output signal of an ANN of this type:
$$P(1) = \frac{\sum_{i=1}^{k} D_i^{1}}{\sum_{i=1}^{k} D_i^{1} + \sum_{i=1}^{m} D_i^{2}} \tag{4}$$
In this case (algorithm 2), $P(1)$ never exceeds 1, because $P(1) + P(2) = 1$. Let us compare the accuracy of both algorithms experimentally.
4. Modeling, results and comparison
Both PNN algorithms were investigated experimentally using training and test samples, which were formed by randomly dividing the main dataset in a 70% to 30% ratio, respectively. The authors developed a software implementation [25], [26] of the studied algorithms in Python. The compared computational intelligence methods were run using their existing implementations in the scikit-learn library [27]. The modeling was performed on a computer with the following parameters: 64-bit Windows, Intel i7, 8 GB RAM, 500 GB HDD.
Fig. 2 shows, as two curves (one per algorithm), how the accuracy indicators of both investigated PNN algorithms change as the smoothing factor varies over $\sigma \in [0.01, 3]$ with step $\Delta\sigma = 0.01$. Note that the experiment was performed over $\sigma \in [0.01, 10]$, $\Delta\sigma = 0.01$. However, since none of the indicators increased for $\sigma > 3$, the graph shows only the values for $\sigma \le 3$.
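The following sketch adds the normalized output of eq. (4) and mirrors the shape of the experiment just described: a random 70%/30% split, then a sweep of σ over [0.01, 3] with step 0.01, keeping the σ with the highest test Accuracy. It is a NumPy sketch under stated assumptions, not the authors' Python implementation [25], [26]: labels are coded 1 and 2, a vector is assigned to class 1 when P(1) ≥ P(2) (the paper does not state its decision rule), the function names are hypothetical, and synthetic Gaussian clusters stand in for the fertility data.

```python
import numpy as np


def class_sums(X_train, y_train, X_test, sigma):
    """Per-class sums of the Gaussian activations of eqs. (1)-(2) for every test vector.

    All activations of one test vector are divided by the same constant (its largest
    activation), so ratios between the class-1 and class-2 sums are unchanged while
    underflow to 0/0 is avoided for very small sigma.
    """
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)  # squared distances
    logits = -d2 / sigma ** 2
    D = np.exp(logits - logits.max(axis=1, keepdims=True))  # largest activation becomes 1
    return D[:, y_train == 1].sum(axis=1), D[:, y_train == 2].sum(axis=1)


def pnn_predict(X_train, y_train, X_test, sigma, algorithm):
    """Assumed decision rule: class 1 if P(1) >= P(2), otherwise class 2."""
    s1, s2 = class_sums(X_train, y_train, X_test, sigma)
    if algorithm == 1:  # eq. (3): class means; P(1) + P(2) generally differs from 1
        p1 = s1 / np.sum(y_train == 1)
        p2 = s2 / np.sum(y_train == 2)
    else:               # eq. (4): normalized output, P(2) = 1 - P(1)
        p1 = s1 / (s1 + s2)
        p2 = 1.0 - p1
    return np.where(p1 >= p2, 1, 2)


def sweep_sigma(X_train, y_train, X_test, y_test, algorithm, sigmas):
    """Test Accuracy for every sigma; returns (best sigma, best Accuracy, all accuracies)."""
    acc = np.array([np.mean(pnn_predict(X_train, y_train, X_test, s, algorithm) == y_test)
                    for s in sigmas])
    best = int(np.argmax(acc))  # first sigma reaching the highest Accuracy
    return sigmas[best], acc[best], acc


# Synthetic stand-in data: 100 vectors with 9 features, labels coded 1 and 2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.3, 0.1, (60, 9)), rng.normal(0.6, 0.1, (40, 9))])
y = np.array([1] * 60 + [2] * 40)

# Random 70% / 30% split into training and test samples
idx = rng.permutation(len(y))
n_train = round(0.7 * len(y))
tr, te = idx[:n_train], idx[n_train:]

sigmas = np.arange(0.01, 3.0 + 1e-9, 0.01)  # sigma in [0.01, 3], step 0.01
for alg in (1, 2):
    best_sigma, best_acc, _ = sweep_sigma(X[tr], y[tr], X[te], y[te], alg, sigmas)
    print(f"algorithm {alg}: best sigma = {best_sigma:.2f}, test Accuracy = {best_acc:.3f}")
```

Rescaling each test vector's activations by its largest one keeps eq. (4) defined (and leaves the comparison in eq. (3) unchanged) even at σ = 0.01, where the raw Gaussians of eq. (2) can underflow to zero. Precision, Recall and F-measure can be computed from the same predictions in the same loop.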
Figure 2: Accuracy indicators of the classifiers based on the first and second PNN algorithms as the smoothing factor varies over $\sigma \in [0.01, 3]$, $\Delta\sigma = 0.01$ (each panel plots the indicator against σ for both algorithms): a) Accuracy; b) F-measure; c) Precision; d) Recall.
As all four graphs in Fig. 2 show, the second PNN algorithm, which describes a complete system of events, provides the highest values of every indicator except Precision. Since the classifier based on this algorithm achieves higher overall Accuracy for every value of σ (Fig. 2a), it should be used when PNN is applied to diagnostic problems in medicine.
Table 2 summarizes the optimal values of the accuracy indicators and the corresponding smoothing factor for both PNN algorithms. These values were selected over $\sigma \in [0.01, 10]$, $\Delta\sigma = 0.01$, by the highest Accuracy.
Table 2
Accuracy indicators for different PNN algorithms
Algorithm | σ | Accuracy | Precision | Recall | F-measure
Algorithm 1 | 0.04 | 0.83 | 0.862 | 0.96 | 0.91
Algorithm 2 | 0.64 | 0.87 | 0.867 | 1 | 0.93
The results of the considered algorithms were compared with known classifiers based on machine learning algorithms and artificial neural networks. The accuracy of each method in the training and test modes is summarized in Table 3.
As can be seen from Table 3, Logistic Regression shows the largest error. The decision-tree-based algorithms and the multilayer perceptron show almost identical results, which are quite acceptable for medicine. The only exception is ExtraTreesClassifier, which reaches the same test accuracy as PNN with the first algorithm. The most accurate results are shown by SVM and by PNN based on the second algorithm.
Table 3
Comparison with other classifiers
№ | Method | Train accuracy | Test accuracy
1 | LogisticRegression | 0.75 | 0.6
2 | DecisionTreeClassifier | 1.0 | 0.8
3 | RandomForestClassifier | 1.0 | 0.8
4 | MLPClassifier | 1.0 | 0.8
5 | ExtraTreesClassifier | 1.0 | 0.83
6 | PNN (1st algorithm) | - | 0.83
7 | SVC | 0.88 | 0.87
8 | PNN (2nd algorithm) | - | 0.87
"-" denotes that the algorithm does not require a training procedure.
However, the latter classifier has the following advantages:
• no training procedure;
• only one parameter to configure;
• the ability to present the result as probabilities of belonging to each class that describe the complete system of events.
In medicine, where the accuracy of a method can significantly affect human health and life, the last advantage is exceptionally valuable. It gives the doctor additional information that, combined with their own experience, helps to make an accurate diagnosis or choose a treatment. In general, integrated information systems for medical diagnostics built on PNN can provide the necessary support for young professionals [28]–[30]. The use of machine learning tools [31] or of the latest developments in artificial neural networks [32] makes it possible to reduce the time and material resources [33] spent on diagnostics in various fields of medicine.
5. Conclusion
The paper reviews and analyzes the current state of development of the Probabilistic Neural Network in medical diagnostics over the last five years using the Scopus database, and presents the results of this analysis.
The procedure for using this ANN type to solve the classification task is described, and the topology of this computational intelligence tool is given. PNN was simulated on the seminal quality prediction task using a small real-world dataset. A number of experiments were carried out to determine the accuracy of different algorithms for generating the PNN output signal. It was established experimentally that algorithm 2, which describes the complete system of events, provides higher accuracy across the range of smoothing factor values. A comparison with existing classifiers based on machine learning algorithms experimentally confirmed the high accuracy of PNN. Among the prospects for further use of these results is the use of PNN outputs computed with the second algorithm to extend the set of independent features of each vector in a medical dataset from a particular field of medicine. Further processing of such extended datasets by machine learning methods is expected to increase the accuracy of diagnosing various diseases. The theoretical justification for this approach follows from Cover's theorem.
6. Funding
The National Research Foundation of Ukraine funds this study from the state budget of Ukraine within the project "Decision support system for modeling the spread of viral infections" (№ 2020.01/0025).
7. References
[1] S. Zubchenko, G. Potemkina, A. Havrylyuk, M. Lomikovska, and O. Sharikadze, ‘Analysis of the Level of Cytokines with Antiviral Activity in Patients with Allergopathology in Active and Latent Phases of Chronic Persistent Epstein-Barr Infection’, Georgian Med News, no. 289, pp. 158–162, Apr. 2019.
[2] O. Berezsky et al., ‘Fuzzy System For Breast Disease Diagnosing Based On Image Analysis’, CEUR-WS.org, vol. 2488, pp. 69–83, 2019.
[3] O. Berezsky, G. Melnyk, T. Datsko, and S.
Verbovy, ‘An intelligent system for cytological and histological image analysis’, in The Experience of Designing and Application of CAD Systems in Microelectronics, Lviv-Polyana, Ukraine, Feb. 2015, pp. 28–31, doi: 10.1109/CADSM.2015.7230787.
[4] O. Ryabukha and I. Dronyuk, ‘Applying Regression Analysis to Study the Interdependence of Thyroid, Adrenal Glands, Liver, and Body Weight in Hypothyroidism and Hyperthyroidism’, CEUR-WS.org, vol. 2488, pp. 155–164, 2019.
[5] T. Shmelova and O. Sechko, ‘Application Artificial Intelligence for Real-Time Monitoring, Diagnostics, and Correction Human State’, CEUR-WS.org, vol. 2488, pp. 185–194, 2019.
[6] R. Kaminsky, L. Mochurad, N. Shakhovska, and N. Melnykova, ‘Calculation of the Exact Value of the Fractal Dimension in the Time Series for the Box-Counting Method’, in 2019 9th International Conference on Advanced Computer Information Technologies (ACIT), Jun. 2019, pp. 248–251, doi: 10.1109/ACITT.2019.8780028.
[7] O. V. Bisikalo, V. V. Kovtun, and V. V. Sholota, ‘The Information System for Critical Use Access Process Dependability Modeling’, in 2019 9th International Conference on Advanced Computer Information Technologies (ACIT), Jun. 2019, pp. 5–8, doi: 10.1109/ACITT.2019.8780013.
[8] T. O. Hovorushchenko, ‘Methodology of Evaluating the Sufficiency of Information for Software Quality Assessment According to ISO 25010’, J. inf. organ. sci. (Online), vol. 42, no. 1, pp. 63–85, Jun. 2018, doi: 10.31341/jios.42.1.4.
[9] T. Tsuji et al., ‘Recurrent probabilistic neural network-based short-term prediction for acute hypotension and ventricular fibrillation’, Sci Rep, vol. 10, no. 1, p. 11970, Dec. 2020.
[10] C. Yang, J. Yang, Y. Liu, and X. Geng, ‘Cancer Risk Analysis Based on Improved Probabilistic Neural Network’, Front Comput Neurosci, vol. 14, Jul. 2020, doi: 10.3389/fncom.2020.00058.
[11] M. K.
Wali, ‘Probabilistic Neural Network Based Fatigue Level Classification Using Electrocardiogram High Frequency Band and Average Heart Beat’, Nano BioMed ENG, vol. 12, no. 2, pp. 132–138, Apr. 2020, doi: 10.5101/nbe.v12i2.p132-138.
[12] ‘An Efficient Optimized Probabilistic Neural Network Based Kidney Stone Detection and Segmentation over Ultrasound Images’, IJRTE, vol. 8, no. 3, pp. 7465–7473, Sep. 2019.
[13] U. Andayani, R. F. Rahmat, N. S. Pasi, B. Siregar, M. F. Syahputra, and M. A. Muchtar, ‘Identification of The Tuberculosis (TB) Disease Based on XRay Images Using Probabilistic Neural Network (PNN)’, J. Phys.: Conf. Ser., vol. 1235, p. 012056, Jun. 2019.
[14] A. Zotin, Y. Hamad, K. Simonov, and M. Kurako, ‘Lung boundary detection for chest X-ray images classification based on GLCM and probabilistic neural networks’, Procedia Computer Science, vol. 159, pp. 1439–1448, Jan. 2019, doi: 10.1016/j.procs.2019.09.314.
[15] S. C. S R and H. Rajaguru, ‘Lung Cancer Detection using Probabilistic Neural Network with modified Crow-Search Algorithm’, Asian Pac J Cancer Prev, vol. 20, no. 7, pp. 2159–2166, 2019, doi: 10.31557/APJCP.2019.20.7.2159.
[16] K. Kalaiselvi and P. Sujarani, ‘Correlation Feature Selection (CFS) and Probabilistic Neural Network (PNN) for Diabetes Disease Prediction’, IJET, vol. 7, no. 3.27, p. 325, Aug. 2018.
[17] P. A. Kowalski and M. Kusy, ‘Determining the significance of features with the use of Sobol method in probabilistic neural network classification tasks’, in 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), Sep. 2017, pp. 39–48.
[18] J. A. Gutiérrez-Gnecchi et al., ‘DSP-based arrhythmia classification using wavelet transform and probabilistic neural network’, Biomedical Signal Processing and Control, vol. 32, pp. 44–56, Feb. 2017, doi: 10.1016/j.bspc.2016.10.005.
[19] M. Kusy and J.
Kluska, ‘Assessment of prediction ability for reduced probabilistic neural network in data classification problems’, Soft Comput, vol. 21, no. 1, pp. 199–212, Jan. 2017, doi: 10.1007/s00500-016-2382-9.
[20] S. Saraswat, G. Srivastava, and S. Nand Shukla, ‘Malignant Ventricular Ectopy Classification using Wavelet Transformation and Probabilistic Neural Network Classifier’, Indian Journal of Science and Technology, vol. 9, no. 40, Oct. 2016, doi: 10.17485/ijst/2016/v9i40/95486.
[21] J. Der Lee, S. Ting Yang, Y. Yau Wai, J. Jie Wang, W. Chuin Hsu, and J. Chih Chien, ‘Probability-based prediction model using multivariate and LVQ-PNN for diagnosing dementia’, Neuropsychiatry, vol. 06, no. 06, 2016, doi: 10.4172/Neuropsychiatry.1000164.
[22] B. Obrzut, M. Kusy, A. Semczuk, M. Obrzut, and J. Kluska, ‘Prediction of 10-year Overall Survival in Patients with Operable Cervical Cancer using a Probabilistic Neural Network’, J Cancer, vol. 10, no. 18, pp. 4189–4195, Jul. 2019, doi: 10.7150/jca.33945.
[23] ‘UCI Machine Learning Repository: Fertility Data Set’. http://archive.ics.uci.edu/ml/datasets/Fertility#Bardan%20R (accessed Nov. 16, 2020).
[24] D. Gil, J. L. Girela, J. De Juan, M. J. Gomez-Torres, and M. Johnsson, ‘Predicting seminal quality with artificial intelligence methods’, Expert Syst. Appl., vol. 39, no. 16, pp. 12564–12573, Nov. 2012, doi: 10.1016/j.eswa.2012.05.028.
[25] A. Sambir, V. Yakovyna, and M. Seniv, ‘Recruiting software architecture using user generated data’, in 2017 XIIIth International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), Apr. 2017, pp. 161–163, doi: 10.1109/MEMSTECH.2017.7937557.
[26] Y. Bobalo, M. Seniv, V. Yakovyna, and I. Symets, ‘Method of Reliability Block Diagram Visualization and Automated Construction of Technical System Operability Condition’, in Advances in Intelligent Systems and Computing III, Cham, 2019, pp. 599–610, doi: 10.1007/978-3-030-01069-0_43.
[27] ‘1.
Supervised learning — scikit-learn 0.23.2 documentation’. https://scikit-learn.org/stable/supervised_learning.html#supervised-learning (accessed Nov. 16, 2020).
[28] N. Boyko and N. Shakhovska, ‘Prospects for Using Cloud Data Warehouses in Information Systems’, in 2018 IEEE 13th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), Sep. 2018, vol. 2, pp. 136–139, doi: 10.1109/STC-CSIT.2018.8526745.
[29] N. Pasyeka, H. Mykhailyshyn, and M. Pasyeka, ‘Development Algorithmic Model for optimization of Distributed Fault-Tolerant Web-Systems’, in 2018 International Scientific-Practical Conference Problems of Infocommunications. Science and Technology (PIC S&T), Oct. 2018, pp. 663–669, doi: 10.1109/INFOCOMMST.2018.8632160.
[30] N. Boyko, O. Pylypiv, Y. Peleshchak, Y. Kryvenchuk, and J. Campos, ‘Automated Document Analysis for Quick Personal Health Record Creation’, CEUR-WS.org, vol. 2488, pp. 208–221, 2019.
[31] N. Boyko, M. Kuba, L. Mochurad, and S. Montenegro, ‘Fractal Distribution of Medical Data in Neural Network’, CEUR-WS.org, vol. 2488, pp. 307–318, 2019.
[32] S. Leoshchenko, A. Oliinyk, S. Subbotin, N. Gorobii, and T. Zaiko, ‘Synthesis of Artificial Neural Networks Using a Modified Genetic Algorithm’, CEUR-WS.org, vol. 2255, pp. 1–13, 2018.
[33] N. Chukhrai and O. Grytsai, ‘Diagnosing the efficiency of cost management of innovative processes at machine-building enterprises’, Actual Problems of Economics, vol. 146, no. 8, pp. 75–80, 2013.