Comparison of AI Systems in Fetal Health Classification

Krzysztof Brzoza, Kamila Gendasz and Michał Dzumla
Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44-100 Gliwice, Poland
krzybrz165@student.polsl.pl (K. Brzoza); kamigen389@student.polsl.pl (K. Gendasz); michdzu822@student.polsl.pl (M. Dzumla)

SYSTEM 2021 @ Scholar's Yearly Symposium of Technology, Engineering and Mathematics, July 27–29, 2021, Catania, IT
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
Fetal health is a very important concern today. The ability to predict health status from preliminary examinations could help save lives, especially if the prediction is made by an artificial intelligence system, which would streamline the whole process. The task is complicated by the multitude of algorithms, varying in accuracy and execution time, that can be used for such predictions. This project compares three popular artificial intelligence systems, the k-nearest neighbours algorithm, the naive Bayes classifier, and soft sets, to show which of them proves to be the best. The final results are shown in comparison graphs. The article also presents the mathematical background of these algorithms and examples of their application to this task.

Keywords
soft sets, knn, naive Bayes

1. Introduction

The reduction of child mortality is one of the key indicators of human progress. It is expected that by 2030 countries will end preventable deaths of newborns and children under 5 years of age, with all countries aiming to reduce under-5 mortality. Parallel to the notion of child mortality is, of course, maternal mortality during and following pregnancy and childbirth. The vast majority of these deaths occur in low-resource settings, and most could be prevented.

Cardiotocography (CTG) is a simple and cost-accessible option to assess fetal health, allowing healthcare professionals to take action in order to prevent child and maternal mortality. The equipment itself works by sending ultrasound pulses and reading the response, thus shedding light on the fetal heart rate (FHR), fetal movements, uterine contractions, and more.

Artificial intelligence is very important in all aspects of modern life [1, 2, 3]. In medical areas it is used for the prognosis of diseases, for classification, and even for clustering, as can be seen in the research in this area. In [4, 5], neural networks were used to predict the spread of the COVID-19 virus. Similar research analysed mortality [6]. In [7, 8], deep learning solutions were analysed and discussed in terms of their application to medical image analysis tasks. Image analysis was also used in a skin evaluation mechanism [9]. In medical systems, not only neural networks are used but also fuzzy logic and computational intelligence. In [10], a fuzzy sets approach was used in the evaluation of healthcare problems. All the mentioned solutions use different approaches, but the incoming data are analysed mainly in two ways: taking the raw data, or only the extracted features. Such solutions were analysed and discussed in terms of data analysis in [11, 12, 13, 14, 15].

This work compares artificial intelligence systems in terms of their performance and accuracy in predicting fetal health. For this purpose, three artificial intelligence systems were used:

• the k-nearest neighbours classification algorithm [16],
• the naive Bayes classifier [17],
• soft sets [18],

to check which of them obtains the highest accuracy.

2. The mathematical part of the selected tools

• The k-nearest neighbours algorithm, to maximize the probability of finding the most suitable solution, searches for it among the k most similar solutions and then chooses the most popular one by voting. The easiest way to determine similarity is to use a distance: the smaller the distance, the greater the similarity. In this case, the Minkowski metric [19] was used:

\[ M_m(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^m \Big)^{1/m} \tag{1} \]

where
− x_i is the value from the sample vector,
− y_i is the value of the currently selected vector from the training set,
− i is the index of the currently analysed column from the database.

Description of the KNN algorithm:

1. Perform data normalization.
2. Compute the distance between the test vector and all vectors of the training set.
3. Sort the distances.
4. Select the labels of the k nearest vectors with respect to the test vector.
5. Assign the most frequent label as the label of the test vector.
6. In the event of a tie in the voting, make the selection by drawing lots.

To improve the performance of the algorithm, data normalization was applied. This causes all dimensions over which the distance is calculated to have equal significance; otherwise, a situation could arise in which a single dimension dominates the others. Thanks to normalization, the values of each variable belong to the interval [0, 1]:

\[ x_j(i) = \frac{x_j(i) - \min(x_j)}{\max(x_j) - \min(x_j)} \tag{2} \]

where
− i is the index of the vector,
− j is the index of the variable,
− max(x_j) is the maximum value of variable j,
− min(x_j) is the minimum value of variable j.

2.1. A calculation example for our base

Table 1
Sample value

id     baseline value   ···   hist._tend
1      120.0            ···   1.0

Table 2
Used data

id     base. val.   ···   his._tend   fet._health
1      120.0        ···   1.0         2.0
2      132.0        ···   0.0         1.0
···    ···          ···   ···         ···
2126   140.0        ···   1.0         2.0
2127   142.0        ···   0.0         1.0

Assuming that our sample vector is the one shown in Tab. 1, and having the training set of Tab. 2, the Minkowski metric was used to calculate the distance between the sample vector and the analysed vectors of the training set:

\[ M_2(x, y) = \Big( \sum_{i=1}^{21} |x_i - y_i|^2 \Big)^{1/2} \tag{3} \]

where x_i is the value from the column of the sample vector and y_i is the value of the column of the currently analysed vector from the training set. In the project, the value m = 2 was assumed. After analysing the entire training set, the distances between the sample vector and the training set vectors are sorted. Next, a neighbourhood of the k nearest neighbours is selected on the basis of the previously calculated distances, and the most frequent label among them is chosen, as shown in the sketch below.
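The whole procedure condenses into a few lines of code. The following is a minimal Python sketch of steps 1-6, covering Eq. (1) and Eq. (2); it is not the authors' original implementation, and the names normalize, minkowski, knn_classify, train_x and train_y are illustrative only (NumPy is assumed to be available).

```python
import numpy as np

def normalize(train, test):
    """Min-max normalization, Eq. (2): scales every feature of the training
    and test data to [0, 1] using the training-set minima and maxima."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (train - lo) / span, (test - lo) / span

def minkowski(x, y, m=2):
    """Minkowski distance, Eq. (1); m = 2 gives the Euclidean case of Eq. (3)."""
    return np.sum(np.abs(x - y) ** m) ** (1.0 / m)

def knn_classify(train_x, train_y, sample, k=3, m=2, rng=None):
    """Steps 2-6: distances to all training vectors, sort, take the labels
    of the k nearest, vote, and break ties by drawing lots."""
    rng = rng or np.random.default_rng()
    dists = np.array([minkowski(sample, row, m) for row in train_x])
    nearest = train_y[np.argsort(dists)[:k]]      # labels of the k nearest
    labels, counts = np.unique(nearest, return_counts=True)
    winners = labels[counts == counts.max()]      # all labels tied for first
    return winners[0] if len(winners) == 1 else rng.choice(winners)
```

In this sketch, train_x would hold the 21 normalized feature columns of Tab. 2, train_y the fet._health labels, and sample the vector of Tab. 1.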
• The naive Bayes classifier, based on Bayes' theorem

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{4} \]

is particularly suitable for problems with very many input dimensions. Despite the simplicity of the method, it often performs better than other, much more complex classifiers. The classifier learns by analysing a set of training data for which the correct classes are given. The model includes P(X|Y), the probability of the observations for the different class labels, and P(Y), the prior probability, which is the probability calculated before the random experiment is performed; this is the classical probability, calculated in the same way as the overall probability. Bayes' rule is used to determine P(Y|X), the conditional probability of a class for a given observation, and the label for which this probability is highest is chosen. The model is called naive because it makes the very strong simplifying assumption that, for a fixed class label, all features are independent of each other.

2.1.1. A calculation example for this base

Classify the object X = (133.0, ..., 0.0) against Tab. 2, where the decision means 1.0 - Normal, 2.0 - Suspect, 3.0 - Pathological. After grouping the training rows by their decision value (1.0/2.0/3.0), it is necessary to determine which of the following probabilities is the highest:

• P(C = C_1 | X), e.g. P(decision = 1.0 | base. val. = 133.0, ..., hist._tend = 0.0),
• P(C = C_2 | X), e.g. P(decision = 2.0 | base. val. = 133.0, ..., hist._tend = 0.0),
• P(C = C_3 | X), e.g. P(decision = 3.0 | base. val. = 133.0, ..., hist._tend = 0.0).

By Bayes' theorem, this amounts to calculating P(C = C_1) · P(X | C = C_1), P(C = C_2) · P(X | C = C_2) and P(C = C_3) · P(X | C = C_3) and comparing the results:

1. P(C = C_j) = 1/3,
2. P(a_i = [value] | C = C_j) = [probability], where i is the number of the current column, [value] is the value of that column, [probability] is the calculated probability value, and j ∈ {1, 2, 3}.

Then the calculated values are substituted into the formula

\[ P(C = C_j) \cdot \prod_{i=1}^{21} P(a_i = x_i \mid C = C_j) = \mathrm{result}_j \tag{5} \]

and it is checked which result (probability value) is the greatest, which determines the class to which the object X should be assigned (see the sketch below).
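Equation (5) can be evaluated directly once the per-class likelihoods P(a_i = x_i | C = C_j) are estimated. The paper does not state how these were modelled, so the sketch below assumes Gaussian per-feature likelihoods and works in log space for numerical stability; the class name GaussianNaiveBayes is illustrative. The worked example above uses a uniform prior of 1/3, which could be substituted for the empirical priors estimated here.

```python
import numpy as np

class GaussianNaiveBayes:
    """Evaluates Eq. (5): picks the class j that maximizes
    P(C = C_j) * prod_i P(a_i = x_i | C = C_j)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior, self.mean, self.std = {}, {}, {}
        for c in self.classes:
            rows = X[y == c]
            self.prior[c] = len(rows) / len(X)     # or 1/3 as in the example
            self.mean[c] = rows.mean(axis=0)
            self.std[c] = rows.std(axis=0) + 1e-9  # avoid zero variance
        return self

    def predict(self, x):
        # log of Eq. (5) per class; the largest score wins
        def score(c):
            log_lik = -0.5 * np.sum(
                np.log(2 * np.pi * self.std[c] ** 2)
                + ((x - self.mean[c]) / self.std[c]) ** 2)
            return np.log(self.prior[c]) + log_lik
        return max(self.classes, key=score)
```

Working with logarithms replaces the long product of Eq. (5) with a sum, avoiding the numerical underflow that 21 small probabilities multiplied together would cause.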
• Soft sets are defined by a set of properties which, however, are not completely precise (e.g. instead of saying that it is 15 °C outside, one can say that it is warm). The general theory of soft sets is very similar to that of fuzzy sets, but much simplified. Since there are no restrictions on the approximate description of an object, it is very easy to apply this method in practice.

Mathematical assumptions: U is the class of all elements considered, E is the parameter class, P(U) is the set of all subsets of U, and A ⊆ E is the set of parameters considered. The pair (F, A) is then called a soft set, where F is the mapping

\[ F : A \to P(U) \tag{6} \]

Calculation example:

− U: the fetal health classes,
− E: a set of parameters that determine the health of the fetus,
− A = {baseline value, ..., hist._tend.}.

The class consists of three health states:

U = {Normal, Suspect, Pathological}
E = {e_1, ..., e_21}

What follows is the definition of the subsets specifying which health states satisfy the given assumptions e_i, where i ∈ 1 : 21:

F(e_1) = {Normal}
F(e_2) = {Normal, Suspect}
...
F(e_20) = {Pathological}
F(e_21) = {Normal}

Based on this, a soft set table can be created (see Tab. 3).

Table 3
Used data

U    baseline value   ···   his._tend.
h1   1                ···   0
h2   0                ···   1
h3   1                ···   1

Due to the simplicity of this system, subsets of a soft set can be determined very easily: (F, A) is a subset of (G, B) if A ⊆ B and, for every ε ∈ A, F(ε) and G(ε) are the same. Two soft sets are equal when this condition is satisfied and, additionally, (G, B) is also a subset of (F, A). A sketch of how such a table can be used for classification is given below.
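The decision rule the paper applies to Tab. 3 is not spelled out, so what follows is only a plausible sketch: each CTG vector is binarized per parameter (Sec. 5 mentions quartiles, so quartile thresholds are assumed here), and the health state whose row of the soft set table agrees with the sample on the most parameters is returned. The function names and the matching score are assumptions, not the authors' method.

```python
import numpy as np

U = ["Normal", "Suspect", "Pathological"]   # the universe of Tab. 3

def binarize(sample, q1, q3):
    """Turn a raw 21-feature CTG vector into 0/1 parameter values:
    a parameter 'holds' when the value falls outside [Q1, Q3]
    (quartile thresholds are an assumption, see the lead-in)."""
    return ((sample < q1) | (sample > q3)).astype(int)

def soft_set_classify(table, sample_bits):
    """table is the 3 x 21 0/1 matrix of Tab. 3 (rows h1..h3, columns
    e_1..e_21, entry 1 iff the state belongs to F(e_i)). The state
    agreeing with the sample on the most parameters is chosen."""
    matches = (table == sample_bits).sum(axis=1)
    return U[int(np.argmax(matches))]
```

Here q1 and q3 would be the per-feature quartiles computed from the training data, and table a 0/1 array built from the F(e_i) subsets above.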
3. Description of the database

The project used a ready-made database, which can be downloaded at [20]. The database consists of one table with 22 columns: 21 columns contain numerical data describing the CTG examination, and the last column defines the fetal status.

Description of the columns:

• 'baseline value' - FHR baseline (beats per minute).
• 'accelerations' - number of accelerations per second.
• 'fetal_movement' - number of fetal movements per second.
• 'uterine_contractions' - number of uterine contractions per second.
• 'light_decelerations' - number of light decelerations per second.
• 'severe_decelerations' - number of severe decelerations per second.
• 'prolongued_decelerations' - number of prolonged decelerations per second.
• 'abnormal_short_term_variability' - percentage of time with abnormal short-term variability.
• 'mean_value_of_short_term_variability' - mean value of short-term variability.
• 'percentage_of_time_with_abnormal_long_term_variability' - percentage of time with abnormal long-term variability.
• 'mean_value_of_long_term_variability' - mean value of long-term variability.
• 'histogram_width' - width of the FHR histogram.
• 'histogram_min' - minimum (low frequency) of the FHR histogram.
• 'histogram_max' - maximum (high frequency) of the FHR histogram.
• 'histogram_number_of_peaks' - number of histogram peaks.
• 'histogram_number_of_zeroes' - number of histogram zeros.
• 'histogram_mode' - histogram mode.
• 'histogram_mean' - histogram mean.
• 'histogram_median' - histogram median.
• 'histogram_variance' - histogram variance.
• 'histogram_tendency' - histogram tendency.

4. Tests

Ten tests were performed for each system and the average values were extracted (a sketch of such an evaluation harness follows the results below), obtaining the following results:

• k-nearest neighbours algorithm:
  − for k = 1: 90.61%
  − for k = 2: 90.24%
  − for k = 3: 90.39%
  − for k = 4: 90.39%
• Naive Bayes classifier: 82.43%
• Soft sets: 82.77%
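For reference, the following sketch shows how such averaged accuracy and timing figures can be produced. The split ratio and the classify callable interface (train features, train labels, test features, returning an array of predictions) are assumptions; the paper only states that 10 runs were averaged and that an 80:20 split was tried for the naive Bayes classifier.

```python
import time
import numpy as np

def evaluate(classify, X, y, runs=10, test_frac=0.2, seed=0):
    """Average accuracy and per-run wall-clock time over random splits."""
    rng = np.random.default_rng(seed)
    accs, secs = [], []
    for _ in range(runs):
        order = rng.permutation(len(X))          # fresh random split per run
        cut = int(len(X) * (1 - test_frac))
        tr, te = order[:cut], order[cut:]
        start = time.perf_counter()
        preds = classify(X[tr], y[tr], X[te])
        secs.append(time.perf_counter() - start)
        accs.append(float(np.mean(preds == y[te])))
    return np.mean(accs), np.mean(secs)
```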
5. Experiments

For each system, a certain number of tests were performed and the average values were extracted to better specify the final result.

For the k-nearest neighbours algorithm using the Minkowski metric, 10 tests were performed for each k, where k was taken to be 1, 2, 3, and 4. For all k, the achieved average precision values were around 90%, which is considered a very good result, so it can be concluded that this algorithm performs very well on this database. Unfortunately, a big disadvantage of this algorithm is its execution time on a large database: in each iteration, for each validation sample, the distances to the nearest neighbours are recalculated and sorted anew, hence the execution time for one iteration over the database was about 173 seconds.

For the naive Bayes classifier, the final results are not as satisfactory as for the k-nearest neighbours algorithm. The average precision obtained over 10 iterations was about 82%. This is not a bad result, but compared with the previous algorithm a significant difference can be seen. It was also checked whether the algorithm would improve its final result when splitting the data 80:20 into training and validation sets; unfortunately, the precision did not improve. The advantage of this algorithm is the speed of its execution, oscillating around 0.3 seconds.

For classification using soft sets, the average obtained accuracy oscillated around 82%. The execution time of the algorithm for one iteration was about 4 seconds. In this method, quartiles were used due to the strongly overlapping characteristics of the analysed CTG examinations.

Graphs comparing the accuracy and execution time of the algorithms used in the project are shown in Fig. 1 and Fig. 2.

Figure 1: Time analysis.
Figure 2: Obtained accuracy level.

6. Conclusions

After the accuracy comparison, the k-nearest neighbours algorithm proved to be the best. However, its execution time was incomparably longer than in the other cases. So, looking at the overall optimality of the performance, the naive Bayes algorithm, with a slightly worse result but a much faster execution time, can be considered the best algorithm for our task. However, the most important factor is the accuracy of the algorithm, because the life of the fetus depends on the correctness of the diagnosis.

References

[1] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana, M. Woźniak, A novel neural networks-based texture image processing algorithm for orange defects classification, International Journal of Computer Science and Applications 13 (2016) 45–60.
[2] G. Capizzi, G. Lo Sciuto, C. Napoli, R. Shikler, M. Wozniak, Optimizing the organic solar cell manufacturing process by means of AFM measurements and neural networks, Energies 11 (2018). doi:10.3390/en11051221.
[3] G. Lo Sciuto, G. Capizzi, S. Coco, R. Shikler, Geometric shape optimization of organic solar cells for efficiency enhancement by neural networks, Lecture Notes in Mechanical Engineering (2017) 789–796. doi:10.1007/978-3-319-45781-9_79.
[4] M. Wieczorek, J. Siłka, D. Połap, M. Woźniak, R. Damaševičius, Real-time neural network based predictor for cov19 virus spread, PLOS ONE 15 (2020) e0243189.
[5] S. I. Illari, S. Russo, R. Avanzato, C. Napoli, A cloud-oriented architecture for the remote assessment and follow-up of hospitalized patients, in: Symposium for Young Scientists in Technology, Engineering and Mathematics, volume 2694, 2020.
[6] Z. Zhang, W. Yao, Y. Wang, C. Long, X. Fu, Wuhan and Hubei COVID-19 mortality analysis reveals the critical role of timely supply of medical resources, Journal of Infection 81 (2020) 147–178.
[7] D. Karimi, H. Dou, S. K. Warfield, A. Gholipour, Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis, Medical Image Analysis 65 (2020) 101759.
[8] S. Stolte, R. Fang, A survey on medical image analysis in diabetic retinopathy, Medical Image Analysis 64 (2020) 101742.
[9] D. Połap, Analysis of skin marks through the use of intelligent things, IEEE Access 7 (2019) 149355–149363.
[10] A. Mardani, R. E. Hooker, S. Ozkul, S. Yifan, M. Nilashi, H. Z. Sabzi, G. C. Fei, Application of decision making and fuzzy sets theory to evaluate the healthcare and medical problems: a review of three decades of research with recent developments, Expert Systems with Applications 137 (2019) 202–231.
[11] C. Napoli, G. Pappalardo, E. Tramontana, An agent-driven semantical identifier using radial basis neural networks and reinforcement learning, in: CEUR Workshop Proceedings, volume 1260, 2014.
[12] H. Das, B. Naik, H. Behera, Medical disease analysis using neuro-fuzzy with feature extraction model for classification, Informatics in Medicine Unlocked 18 (2020) 100288.
[13] G. Capizzi, S. Coco, G. L. Sciuto, C. Napoli, A new iterative FIR filter design approach using a Gaussian approximation, IEEE Signal Processing Letters 25 (2018) 1615–1619.
[14] D. Połap, M. Woźniak, Meta-heuristic as manager in federated learning approaches for image processing purposes, Applied Soft Computing 113 (2021) 107872.
[15] S. Russo, S. I. Illari, R. Avanzato, C. Napoli, Reducing the psychological burden of isolated oncological patients by means of decision trees, in: CEUR Workshop Proceedings, volume 2768, 2020.
[16] G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer, KNN model-based approach in classification, in: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", Springer, 2003, pp. 986–996.
[17] K. P. Murphy, et al., Naive Bayes classifiers, University of British Columbia 18 (2006) 1–8.
[18] H. Aktaş, N. Çağman, Soft sets and soft groups, Information Sciences 177 (2007) 2726–2735.
[19] R. C. De Amorim, B. Mirkin, Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering, Pattern Recognition 45 (2012) 1061–1075.
[20] D. Ayres-de-Campos, J. Bernardes, A. Garrido, J. Marques-de-Sa, L. Pereira-Leite, SisPorto 2.0: a program for automated analysis of cardiotocograms, Journal of Maternal-Fetal Medicine 9 (2000) 311–318.