Comparison of AI Systems in Fetal Health Classification

Krzysztof Brzoza, Kamila Gendasz and Michał Dzumla
Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44-100 Gliwice, Poland
krzybrz165@student.polsl.pl (K. Brzoza); kamigen389@student.polsl.pl (K. Gendasz); michdzu822@student.polsl.pl (M. Dzumla)

SYSTEM 2021 @ Scholar's Yearly Symposium of Technology, Engineering and Mathematics, July 27–29, 2021, Catania, IT
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

Abstract
Fetal health is a very important concern today. The ability to predict health status from preliminary examinations could help save lives, especially if the prediction is made by an artificial intelligence system, which would streamline the whole process. The task is complicated by the multitude of algorithms, varying in accuracy and execution time, that can be used for such predictions. This project compares three popular artificial intelligence systems, the k-nearest neighbours algorithm, the naive Bayes classifier, and soft sets, to show which of them proves to be the best. The final results are shown in comparison graphs. The article also presents the mathematical background of these algorithms and examples of their application to this task.

Keywords
soft sets, knn, naive Bayes

1. Introduction

The reduction of child mortality is one of the key indicators of human progress. It is expected that by 2030 countries will end preventable deaths of newborns and children under 5 years of age, with all countries aiming to reduce under-5 mortality. Parallel to the notion of child mortality is, of course, maternal mortality during and following pregnancy and childbirth. The vast majority of these deaths occur in low-resource settings, and most could be prevented.

Cardiotocography (CTG) is a simple and cost-accessible option to assess fetal health, allowing healthcare professionals to take action in order to prevent child and maternal mortality. The equipment itself works by sending ultrasound pulses and reading the response, thus shedding light on the fetal heart rate (FHR), fetal movements, uterine contractions, and more.

Artificial intelligence is very important in all aspects of modern life [1, 2, 3]. In medical areas it is used for the prognosis of diseases, for classification, and even for clustering, as can be seen in the research in this area. In [4, 5], neural networks were used to predict the spread of the COVID-19 virus. Similar research analysed mortality [6]. In [7, 8], deep learning solutions were analysed and discussed in terms of their application to medical image analysis tasks. Image analysis was also used in a skin evaluation mechanism [9]. In medical systems, not only neural networks are used but also fuzzy logic and computational intelligence. In [10], a fuzzy sets approach was used in the evaluation of healthcare problems. All the mentioned solutions use different approaches, but the incoming data are analysed mainly in two ways: taking the raw data, or only the extracted features. Such solutions were analysed and discussed in terms of data analysis in [11, 12, 13, 14, 15].

This work compares artificial intelligence systems in terms of their performance and accuracy in predicting fetal health. For this purpose, three artificial intelligence systems were used:

• the k-nearest neighbours classification algorithm [16],
• the naive Bayes classifier [17],
• soft sets [18],

to check which of them obtains the highest accuracy.

2. The mathematical part of the selected tools

• The k-nearest neighbours algorithm, to maximize the probability of finding the most suitable solution, searches for it among the k most similar solutions and then chooses the most popular one by voting. The easiest way to determine similarity is to use a distance: the smaller the distance, the greater the similarity. In this case, the Minkowski metric [19] was used:

\[ M_m(x, y) = \Big( \sum_{i=1}^{n} |x_i - y_i|^m \Big)^{1/m} \tag{1} \]

where
− x_i is the value from the sample vector,
− y_i is the value of the currently selected vector from the training set,
− i is the index of the currently analysed column from the database.

Description of the KNN algorithm:

1. Perform data normalization.
2. Compute the distance between the test vector and all vectors of the training set.
3. Sort the distances.
4. Select the labels of the k nearest vectors with respect to the test vector.
5. Assign the most frequent label as the label of the test vector.
6. In the event of a tie in the voting, make the selection by drawing lots.

To improve the performance of the algorithm, data normalization was applied. This causes all dimensions over which the distance is calculated to have equal significance; otherwise, a situation could arise in which a single dimension dominates the others. Thanks to normalization, the values of each variable belong to the interval [0, 1]:

\[ x_j(i) = \frac{x_j(i) - \min(x_j)}{\max(x_j) - \min(x_j)} \tag{2} \]

where
− i is the index of the vector,
− j is the index of the variable,
− max(x_j) is the maximum value of variable j,
− min(x_j) is the minimum value of variable j.

2.1. A calculation example for our base

Table 1
Sample value

id     baseline value   ···   hist._tend
1      120.0            ···   1.0

Table 2
Used data

id     base. val.   ···   his._tend   fet._health
1      120.0        ···   1.0         2.0
2      132.0        ···   0.0         1.0
···    ···          ···   ···         ···
2126   140.0        ···   1.0         2.0
2127   142.0        ···   0.0         1.0

Assuming that our sample vector is the one shown in Tab. 1, and having the training set of Tab. 2, the Minkowski metric was used to calculate the distance between the sample vector and the analysed vectors of the training set:

\[ M_2(x, y) = \Big( \sum_{i=1}^{21} |x_i - y_i|^2 \Big)^{1/2} \tag{3} \]

where x_i is the value from the column of the sample vector and y_i is the value of the column of the currently analysed vector from the training set. In the project, the value m = 2 was assumed. After analysing the entire training set, the distances between the sample vector and the training set vectors are sorted. Next, a neighbourhood of the k nearest neighbours is selected on the basis of the previously calculated distances, and the most frequent label among them is chosen, as shown in the sketch below.
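The whole procedure condenses into a few lines of code. The following is a minimal Python sketch of steps 1-6, covering Eq. (1) and Eq. (2); it is not the authors' original implementation, and the names normalize, minkowski, knn_classify, train_x and train_y are illustrative only (NumPy is assumed to be available).

```python
import numpy as np

def normalize(train, test):
    """Min-max normalization, Eq. (2): scales every feature of the training
    and test data to [0, 1] using the training-set minima and maxima."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (train - lo) / span, (test - lo) / span

def minkowski(x, y, m=2):
    """Minkowski distance, Eq. (1); m = 2 gives the Euclidean case of Eq. (3)."""
    return np.sum(np.abs(x - y) ** m) ** (1.0 / m)

def knn_classify(train_x, train_y, sample, k=3, m=2, rng=None):
    """Steps 2-6: distances to all training vectors, sort, take the labels
    of the k nearest, vote, and break ties by drawing lots."""
    rng = rng or np.random.default_rng()
    dists = np.array([minkowski(sample, row, m) for row in train_x])
    nearest = train_y[np.argsort(dists)[:k]]      # labels of the k nearest
    labels, counts = np.unique(nearest, return_counts=True)
    winners = labels[counts == counts.max()]      # all labels tied for first
    return winners[0] if len(winners) == 1 else rng.choice(winners)
```

In this sketch, train_x would hold the 21 normalized feature columns of Tab. 2, train_y the fet._health labels, and sample the vector of Tab. 1.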
• The naive Bayes classifier, based on Bayes' theorem

\[ P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \tag{4} \]

is particularly suitable for problems with very many input dimensions. Despite the simplicity of the method, it often performs better than other, much more complex classifiers. The classifier learns by analysing a set of training data for which the correct classes are given. The model includes P(X|Y), the probability of the observations for the different class labels, and P(Y), the prior probability, which is the probability calculated before the random experiment is performed; this is the classical probability, calculated in the same way as the overall probability. Bayes' rule is used to determine P(Y|X), the conditional probability of a class for a given observation, and the label for which this probability is highest is chosen. The model is called naive because it makes the very strong simplifying assumption that, for a fixed class label, all features are independent of each other.

2.1.1. A calculation example for this base

Classify the object X = (133.0, ..., 0.0) against Tab. 2, where the decision means 1.0 - Normal, 2.0 - Suspect, 3.0 - Pathological. After grouping the training rows by their decision value (1.0/2.0/3.0), it is necessary to determine which of the following probabilities is the highest:

• P(C = C_1 | X), e.g. P(decision = 1.0 | base. val. = 133.0, ..., hist._tend = 0.0),
• P(C = C_2 | X), e.g. P(decision = 2.0 | base. val. = 133.0, ..., hist._tend = 0.0),
• P(C = C_3 | X), e.g. P(decision = 3.0 | base. val. = 133.0, ..., hist._tend = 0.0).

By Bayes' theorem, this amounts to calculating P(C = C_1) · P(X | C = C_1), P(C = C_2) · P(X | C = C_2) and P(C = C_3) · P(X | C = C_3) and comparing the results:

1. P(C = C_j) = 1/3,
2. P(a_i = [value] | C = C_j) = [probability], where i is the number of the current column, [value] is the value of that column, [probability] is the calculated probability value, and j ∈ {1, 2, 3}.

Then the calculated values are substituted into the formula

\[ P(C = C_j) \cdot \prod_{i=1}^{21} P(a_i = x_i \mid C = C_j) = \mathrm{result}_j \tag{5} \]

and it is checked which result (probability value) is the greatest, which determines the class to which the object X should be assigned (see the sketch below).
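Equation (5) can be evaluated directly once the per-class likelihoods P(a_i = x_i | C = C_j) are estimated. The paper does not state how these were modelled, so the sketch below assumes Gaussian per-feature likelihoods and works in log space for numerical stability; the class name GaussianNaiveBayes is illustrative. The worked example above uses a uniform prior of 1/3, which could be substituted for the empirical priors estimated here.

```python
import numpy as np

class GaussianNaiveBayes:
    """Evaluates Eq. (5): picks the class j that maximizes
    P(C = C_j) * prod_i P(a_i = x_i | C = C_j)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior, self.mean, self.std = {}, {}, {}
        for c in self.classes:
            rows = X[y == c]
            self.prior[c] = len(rows) / len(X)     # or 1/3 as in the example
            self.mean[c] = rows.mean(axis=0)
            self.std[c] = rows.std(axis=0) + 1e-9  # avoid zero variance
        return self

    def predict(self, x):
        # log of Eq. (5) per class; the largest score wins
        def score(c):
            log_lik = -0.5 * np.sum(
                np.log(2 * np.pi * self.std[c] ** 2)
                + ((x - self.mean[c]) / self.std[c]) ** 2)
            return np.log(self.prior[c]) + log_lik
        return max(self.classes, key=score)
```

Working with logarithms replaces the long product of Eq. (5) with a sum, avoiding the numerical underflow that 21 small probabilities multiplied together would cause.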
• Soft sets are defined by a set of properties which, however, are not completely precise (e.g. instead of saying that it is 15 °C outside, one can say that it is warm). The general theory of soft sets is very similar to that of fuzzy sets, but much simplified. Since there are no restrictions on the approximate description of an object, it is very easy to apply this method in practice.

Mathematical assumptions: U is the class of all elements considered, E is the parameter class, P(U) is the set of all subsets of U, and A ⊆ E is the set of parameters considered. The pair (F, A) is then called a soft set, where F is the mapping

\[ F : A \to P(U) \tag{6} \]

Calculation example:

− U: the fetal health classes,
− E: a set of parameters that determine the health of the fetus,
− A = {baseline value, ..., hist._tend.}.

The class consists of three health states:

U = {Normal, Suspect, Pathological}
E = {e_1, ..., e_21}

What follows is the definition of the subsets specifying which health states satisfy the given assumptions e_i, where i ∈ 1 : 21:

F(e_1) = {Normal}
F(e_2) = {Normal, Suspect}
...
F(e_20) = {Pathological}
F(e_21) = {Normal}

Based on this, a soft set table can be created (see Tab. 3).

Table 3
Used data

U    baseline value   ···   his._tend.
h1   1                ···   0
h2   0                ···   1
h3   1                ···   1

Due to the simplicity of this system, subsets of a soft set can be determined very easily: (F, A) is a subset of (G, B) if A ⊆ B and, for every ε ∈ A, F(ε) and G(ε) are the same. Two soft sets are equal when this condition is satisfied and, additionally, (G, B) is also a subset of (F, A). A sketch of how such a table can be used for classification is given below.
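The decision rule the paper applies to Tab. 3 is not spelled out, so what follows is only a plausible sketch: each CTG vector is binarized per parameter (Sec. 5 mentions quartiles, so quartile thresholds are assumed here), and the health state whose row of the soft set table agrees with the sample on the most parameters is returned. The function names and the matching score are assumptions, not the authors' method.

```python
import numpy as np

U = ["Normal", "Suspect", "Pathological"]   # the universe of Tab. 3

def binarize(sample, q1, q3):
    """Turn a raw 21-feature CTG vector into 0/1 parameter values:
    a parameter 'holds' when the value falls outside [Q1, Q3]
    (quartile thresholds are an assumption, see the lead-in)."""
    return ((sample < q1) | (sample > q3)).astype(int)

def soft_set_classify(table, sample_bits):
    """table is the 3 x 21 0/1 matrix of Tab. 3 (rows h1..h3, columns
    e_1..e_21, entry 1 iff the state belongs to F(e_i)). The state
    agreeing with the sample on the most parameters is chosen."""
    matches = (table == sample_bits).sum(axis=1)
    return U[int(np.argmax(matches))]
```

Here q1 and q3 would be the per-feature quartiles computed from the training data, and table a 0/1 array built from the F(e_i) subsets above.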
3. Description of the database

The project used a ready-made database, which can be downloaded at [20]. The database consists of one table with 22 columns: 21 columns contain numerical data describing the CTG examination, and the last column defines the fetal status.

Description of the columns:

• 'baseline value' - FHR baseline (beats per minute).
• 'accelerations' - number of accelerations per second.
• 'fetal_movement' - number of fetal movements per second.
• 'uterine_contractions' - number of uterine contractions per second.
• 'light_decelerations' - number of light decelerations per second.
• 'severe_decelerations' - number of severe decelerations per second.
• 'prolongued_decelerations' - number of prolonged decelerations per second.
• 'abnormal_short_term_variability' - percentage of time with abnormal short-term variability.
• 'mean_value_of_short_term_variability' - mean value of short-term variability.
• 'percentage_of_time_with_abnormal_long_term_variability' - percentage of time with abnormal long-term variability.
• 'mean_value_of_long_term_variability' - mean value of long-term variability.
• 'histogram_width' - width of the FHR histogram.
• 'histogram_min' - minimum (low frequency) of the FHR histogram.
• 'histogram_max' - maximum (high frequency) of the FHR histogram.
• 'histogram_number_of_peaks' - number of histogram peaks.
• 'histogram_number_of_zeroes' - number of histogram zeros.
• 'histogram_mode' - histogram mode.
• 'histogram_mean' - histogram mean.
• 'histogram_median' - histogram median.
• 'histogram_variance' - histogram variance.
• 'histogram_tendency' - histogram tendency.

4. Tests

Ten tests were performed for each system and the average values were extracted (a sketch of such an evaluation harness follows the results below), obtaining the following results:

• k-nearest neighbours algorithm:
  − for k = 1: 90.61%
  − for k = 2: 90.24%
  − for k = 3: 90.39%
  − for k = 4: 90.39%
• Naive Bayes classifier: 82.43%
• Soft sets: 82.77%
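For reference, the following sketch shows how such averaged accuracy and timing figures can be produced. The split ratio and the classify callable interface (train features, train labels, test features, returning an array of predictions) are assumptions; the paper only states that 10 runs were averaged and that an 80:20 split was tried for the naive Bayes classifier.

```python
import time
import numpy as np

def evaluate(classify, X, y, runs=10, test_frac=0.2, seed=0):
    """Average accuracy and per-run wall-clock time over random splits."""
    rng = np.random.default_rng(seed)
    accs, secs = [], []
    for _ in range(runs):
        order = rng.permutation(len(X))          # fresh random split per run
        cut = int(len(X) * (1 - test_frac))
        tr, te = order[:cut], order[cut:]
        start = time.perf_counter()
        preds = classify(X[tr], y[tr], X[te])
        secs.append(time.perf_counter() - start)
        accs.append(float(np.mean(preds == y[te])))
    return np.mean(accs), np.mean(secs)
```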
5. Experiments

For each system, a certain number of tests were performed and the average values were extracted to better specify the final result.

For the k-nearest neighbours algorithm using the Minkowski metric, 10 tests were performed for each k, where k was taken to be 1, 2, 3, and 4. For all k, the achieved average precision values were around 90%, which is considered a very good result, so it can be concluded that this algorithm performs very well on this database. Unfortunately, a big disadvantage of this algorithm is its execution time on a large database: in each iteration, for each validation sample, the distances to the nearest neighbours are recalculated and sorted anew, hence the execution time for one iteration over the database was about 173 seconds.

For the naive Bayes classifier, the final results are not as satisfactory as for the k-nearest neighbours algorithm. The average precision obtained over 10 iterations was about 82%. This is not a bad result, but compared with the previous algorithm a significant difference can be seen. It was also checked whether the algorithm would improve its final result when splitting the data 80:20 into training and validation sets; unfortunately, the precision did not improve. The advantage of this algorithm is the speed of its execution, oscillating around 0.3 seconds.

For classification using soft sets, the average obtained accuracy oscillated around 82%. The execution time of the algorithm for one iteration was about 4 seconds. In this method, quartiles were used due to the strongly overlapping characteristics of the analysed CTG examinations.

Graphs comparing the accuracy and execution time of the algorithms used in the project are shown in Fig. 1 and Fig. 2.

Figure 1: Time analysis.
Figure 2: Obtained accuracy level.

6. Conclusions

After the accuracy comparison, the k-nearest neighbours algorithm proved to be the best. However, its execution time was incomparably longer than in the other cases. So, looking at the overall optimality of the performance, the naive Bayes algorithm, with a slightly worse result but a much faster execution time, can be considered the best algorithm for our task. However, the most important factor is the accuracy of the algorithm, because the life of the fetus depends on the correctness of the diagnosis.

References

[1] G. Capizzi, G. Lo Sciuto, C. Napoli, E. Tramontana, M. Woźniak, A novel neural networks-based texture image processing algorithm for orange defects classification, International Journal of Computer Science and Applications 13 (2016) 45–60.
[2] G. Capizzi, G. Lo Sciuto, C. Napoli, R. Shikler, M. Wozniak, Optimizing the organic solar cell manufacturing process by means of AFM measurements and neural networks, Energies 11 (2018). doi:10.3390/en11051221.
[3] G. Lo Sciuto, G. Capizzi, S. Coco, R. Shikler, Geometric shape optimization of organic solar cells for efficiency enhancement by neural networks, Lecture Notes in Mechanical Engineering (2017) 789–796. doi:10.1007/978-3-319-45781-9_79.
[4] M. Wieczorek, J. Siłka, D. Połap, M. Woźniak, R. Damaševičius, Real-time neural network based predictor for cov19 virus spread, PLOS ONE 15 (2020) e0243189.
[5] S. I. Illari, S. Russo, R. Avanzato, C. Napoli, A cloud-oriented architecture for the remote assessment and follow-up of hospitalized patients, in: Symposium for Young Scientists in Technology, Engineering and Mathematics, volume 2694, 2020.
[6] Z. Zhang, W. Yao, Y. Wang, C. Long, X. Fu, Wuhan and Hubei COVID-19 mortality analysis reveals the critical role of timely supply of medical resources, Journal of Infection 81 (2020) 147–178.
[7] D. Karimi, H. Dou, S. K. Warfield, A. Gholipour, Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis, Medical Image Analysis 65 (2020) 101759.
[8] S. Stolte, R. Fang, A survey on medical image analysis in diabetic retinopathy, Medical Image Analysis 64 (2020) 101742.
[9] D. Połap, Analysis of skin marks through the use of intelligent things, IEEE Access 7 (2019) 149355–149363.
[10] A. Mardani, R. E. Hooker, S. Ozkul, S. Yifan, M. Nilashi, H. Z. Sabzi, G. C. Fei, Application of decision making and fuzzy sets theory to evaluate the healthcare and medical problems: a review of three decades of research with recent developments, Expert Systems with Applications 137 (2019) 202–231.
[11] C. Napoli, G. Pappalardo, E. Tramontana, An agent-driven semantical identifier using radial basis neural networks and reinforcement learning, in: CEUR Workshop Proceedings, volume 1260, 2014.
[12] H. Das, B. Naik, H. Behera, Medical disease analysis using neuro-fuzzy with feature extraction model for classification, Informatics in Medicine Unlocked 18 (2020) 100288.
[13] G. Capizzi, S. Coco, G. L. Sciuto, C. Napoli, A new iterative FIR filter design approach using a Gaussian approximation, IEEE Signal Processing Letters 25 (2018) 1615–1619.
[14] D. Połap, M. Woźniak, Meta-heuristic as manager in federated learning approaches for image processing purposes, Applied Soft Computing 113 (2021) 107872.
[15] S. Russo, S. I. Illari, R. Avanzato, C. Napoli, Reducing the psychological burden of isolated oncological patients by means of decision trees, in: CEUR Workshop Proceedings, volume 2768, 2020.
[16] G. Guo, H. Wang, D. Bell, Y. Bi, K. Greer, KNN model-based approach in classification, in: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", Springer, 2003, pp. 986–996.
[17] K. P. Murphy, et al., Naive Bayes classifiers, University of British Columbia 18 (2006) 1–8.
[18] H. Aktaş, N. Çağman, Soft sets and soft groups, Information Sciences 177 (2007) 2726–2735.
[19] R. C. De Amorim, B. Mirkin, Minkowski metric, feature weighting and anomalous cluster initializing in k-means clustering, Pattern Recognition 45 (2012) 1061–1075.
[20] D. Ayres-de-Campos, J. Bernardes, A. Garrido, J. Marques-de-Sa, L. Pereira-Leite, SisPorto 2.0: a program for automated analysis of cardiotocograms, Journal of Maternal-Fetal Medicine 9 (2000) 311–318.