<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Heart disease prediction system utilizing KNN and Naive Bayes classifiers with added priorities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Mierzwa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Płocidem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Applied Mathematics, Silesian University of Technology</institution>
          ,
          <addr-line>Kaszubska 23, 44-100 Gliwice</addr-line>
          ,
          <country country="PL">POLAND</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Cardiovascular diseases are among the most common causes of death for both men and women across the globe. Diagnosing a heart illness can lead to a change in lifestyle, thus lowering the risk of the illness being fatal. The main goal of this paper was to create a prediction system based on the K-Nearest Neighbours (KNN) and Naive Bayes classifiers, with an automatically calculated weight added for each parameter. With the created prediction system we managed to achieve a final average accuracy of around 81% with no dominant classifier, which only reinforces the importance of comparing the results.</p>
      </abstract>
      <kwd-group>
        <kwd>Naive Bayes</kwd>
        <kwd>K-Nearest Neighbours</kwd>
        <kwd>Heart illness prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The heart is one of the most important organs in the human body. When healthy, it is the size of a
clenched fist, located behind and slightly to the left of the breastbone, between the lungs. On
average the heart beats roughly 100 000 times a day, moving around 7600 litres of blood and
supplying the remaining organs of our body with the oxygen and nutrients needed for their proper
functioning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. When that one singular muscle starts to function abnormally, it can
quickly cause a lot of harm to a wide range of other organs, which in turn can lead to lifelong
consequences or even death.
      </p>
      <p>
        For years, cardiovascular diseases have been named among the leading causes of death. The British Heart
Foundation (BHF) reported that as of January 2024 around 620 million people across the world
live with heart disease, which means roughly 1 in 13 people, and if the current trend continues
that number will only rise due to an ageing and growing population and changing lifestyles. It
is estimated that in 2021 around 20.5 million deaths were caused by heart or circulatory
diseases [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This number is approximately 30% of deaths globally. Undeniably, that number could
be lowered with early diagnosis and appropriate countermeasures.
      </p>
      <p>
        A key to raising awareness, and thus earlier diagnosis, could be a prediction model that
helps detect the potential risk of developing a cardiovascular disease [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Data analysis has
been used in a large number of fields, including medicine itself, where comparing a patient's
test results with data on already diagnosed illnesses makes the process easier and faster
compared to more traditional methods. The idea of automating that process and building
prediction models has been implemented many times and can be found in many courses on
machine learning, yet on a large scale it remains unused, most likely due to the negative effects
such a solution could have.
      </p>
      <p>
        Outside of the risks connected to security or to public opinion on broadly defined artificial
intelligence, and thereby machine learning [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ], there are risks purely connected to the way
those systems work [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ], as they are not one hundred percent accurate and the vast majority of
them can be easily misled by a large amount of ambiguous data. In this study we created a
prediction model with automatically calculated weights of each parameter using the K-Nearest
Neighbours (KNN) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and Naive Bayes classifiers [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for both normalized and non-normalized data.
The main idea behind that program was to create a model which not only uses the classifiers best
performing in a given task but also automatically omits unimportant data, at the same time
prioritizing the more relevant parameters.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Purpose of using multiple classifiers</title>
        <p>Most classifiers have a unique way of classifying data and, because of that, different strengths
and weaknesses. For example, the Naive Bayes algorithm has really good accuracy for sets with
independent data, but when the predictors are dependent the accuracy is considerably lower. Using
multiple different algorithms makes it possible to create a more universal prediction model, where the input
data does not need to be analysed as thoroughly beforehand, thus saving time prior to diagnosis; a minimal
sketch of this idea is shown below.</p>
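        <p>The following is a hedged, minimal sketch of that comparison idea. It uses the scikit-learn
library in place of the custom classifiers described later in this paper, and the names X and y are
assumed placeholders for the feature matrix and the target vector of a dataset.</p>
        <p>
# A sketch: train two different classifiers and keep whichever one
# predicts the held-out data more accurately. The scikit-learn models
# stand in for the custom KNN and Naive Bayes described in this paper.
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def best_of_two(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    candidates = {
        "KNN": KNeighborsClassifier(n_neighbors=50, metric="manhattan"),
        "NaiveBayes": GaussianNB(),
    }
    scores = {}
    for name, clf in candidates.items():
        clf.fit(X_train, y_train)               # train on the training split
        scores[name] = accuracy_score(y_test, clf.predict(X_test))
    best = max(scores, key=scores.get)          # classifier with top accuracy
    return best, scores
        </p>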
      </sec>
      <sec id="sec-2-2">
        <title>2.2. K-Nearest Neighbours</title>
        <p>
          K-Nearest Neighbors is a fairly old algorithm: it was developed in 1951 and is mainly used
for classification and regression. The principle behind this algorithm is not complicated and is
relatively easy to understand. The primary goal of the algorithm is to assign the data point that
is being tested to the most suitable class among the already known data points present in the
dataset [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
Firstly, the distance from the tested data point to all other data points in the dataset has to be
calculated. This is done using pre-designed formulas called "metrics". It is worth highlighting
that the distance has to be calculated in a multi-dimensional space, where each dimension is a
feature of the data point. There are many metrics to choose from, but none of them is
universally the best. The usefulness of a metric depends on the dataset in which distances
are calculated, so it is a good idea to choose one after performing multiple tests with
different metrics. The best metric is usually the one that allows the algorithm to achieve the
        </p>
        <p>The next step, after the distances have been calculated, is to select from all of the data points
in the dataset the "K" points that are closest to the tested data point. The variable "K", which
also appears in the name of the algorithm, is an integer that indicates how many "neighbors" of
the tested data point should be taken into account. As was the case with the metrics, there isn't
one number that gives the best results regardless of the database used. The number that
gives the best accuracy of the algorithm on a specific database should be the one chosen.
The last step, once the "K" nearest neighbors have been selected, is to decide which class the tested data
point should be assigned to. The classes are simply all of the possible outcomes of the target
feature. For example, in this project data points can be divided into two main classes: patients
that are healthy and patients that have a heart disease. Then, the occurrences of every class among the
"K" nearest data points are counted and the tested data point is assigned to the class that
appears most frequently.</p>
        <p>
          It is important to note that the K-Nearest Neighbors algorithm also has some
disadvantages. One of them is called "the curse of dimensionality" [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. This problem can be
observed when the dataset contains a significant number of features. In such a scenario
the metrics used to calculate distances may return results that do not reflect the real situation
precisely, which can lead to lowered accuracy of the algorithm. The severity of this
problem can be mitigated by taking proper precautions, for example by excluding certain
features from the dataset or by assigning priorities to them. This process is described in a
more detailed manner in the subsequent sections of this article.
        </p>
        <p>Algorithm 1: K-Nearest Neighbors algorithm with "Manhattan" metric</p>
        <sec id="sec-2-2-1">
          <title>Data:</title>
          <p>k - number of neighbors
X_train - set of training samples
Y_train - set of results for training samples
X_test - set of test samples
p - list of priorities (weights) of the features, as calculated by Algorithm 4
argsort(list) - function returning indices of list elements sorted in ascending order
first(list, k) - function returning the first "k" elements of a list
most_frequent(list) - function returning the element that appears the most frequently in the list
Result:
predictions - list of predicted outcomes for all of the tested data points from the X_test dataset
1 predictions = []
2 for x in X_test do
3   distances = []
4   for x_train in X_train do
5     d = 0
6     for (i = 0; i &lt; length(x); i++) do
7       d += |x[i] − x_train[i]| · p[i]
8     distances.append(d)
9   indices = argsort(distances)
10   k_indices = first(indices, k)
11   k_labels = []
12   for i in k_indices do
13     k_labels.append(Y_train[i])
14   prediction = most_frequent(k_labels)
15   predictions.append(prediction)
16 return predictions</p>
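          <p>A minimal Python sketch of Algorithm 1, under the assumption (taken from the pseudocode
above and from Algorithm 4) that each feature distance is weighted by its priority p[i]; this is an
illustration, not the code published with this paper.</p>
          <p>
# Sketch of Algorithm 1: weighted KNN with the "Manhattan" metric.
# X_train and X_test are lists of feature lists, Y_train holds class
# labels, and p holds one priority (weight) per feature.
from collections import Counter

def knn_predict(X_train, Y_train, X_test, k, p):
    predictions = []
    for x in X_test:
        # weighted Manhattan distance to every training point
        distances = [sum(abs(xi - ti) * wi for xi, ti, wi in zip(x, t, p))
                     for t in X_train]
        # indices of the k smallest distances (argsort + first)
        k_indices = sorted(range(len(distances)), key=distances.__getitem__)[:k]
        # majority vote among the k nearest neighbours (most_frequent)
        k_labels = [Y_train[i] for i in k_indices]
        predictions.append(Counter(k_labels).most_common(1)[0][0])
    return predictions
          </p>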
          <p>Algorithm 2: Helper function to calculate the Gaussian distribution
Data:
x - a tested data point
n - number of features
mean - a list of means of features across a certain class
var - a list of variances of features across a certain class
Result:
results - a list of Gaussian probability densities, one for every feature
1 results = []
2 for i in range(n) do
3   p = (1 / √(2π · var[i])) · exp(−(x[i] − mean[i])² / (2 · var[i]))
4   results.append(p)
5 return results</p>
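          <p>A direct Python rendering of this helper could look as follows (a sketch under the same
assumptions as the pseudocode):</p>
          <p>
# Sketch of Algorithm 2: Gaussian probability density of every feature
# of data point x, given per-class feature means and variances.
import math

def gauss(x, mean, var):
    results = []
    for i in range(len(x)):
        coeff = 1.0 / math.sqrt(2.0 * math.pi * var[i])
        exponent = -((x[i] - mean[i]) ** 2) / (2.0 * var[i])
        results.append(coeff * math.exp(exponent))
    return results
          </p>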
        </sec>
        <sec id="sec-2-2-2">
          <title>5 return results</title>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Naive Bayes</title>
        <p>
          The Naive Bayes algorithm belongs to the family of probabilistic classifiers. This algorithm
treats all features of a tested data point as independent of each other, even if in reality there is
some degree of correlation between them. Unlike K-Nearest Neighbors, this classifier does not
use multiple different metrics, and it does not utilize any variables whose values have to be set
before using the algorithm. The Naive Bayes classifier works by performing a statistical analysis
of the training data (e.g. finding the mean of all the values that belong to a certain feature)
and by making calculations on the data points' features. This whole process can be split into a
couple of steps [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
In the first place, the target classes have to be extracted from the dataset and every data point from the
training set has to be assigned to its class. Then, a statistical analysis has to be performed for
every class. By the end of this process, three values should be calculated for each class: the
proportion of the class elements to the number of elements in the entire dataset,
which is often described as the "prior probability"; the mean of all values that belong to a certain
feature of the data points in this class; and the variance of those values.
        </p>
        <p>The next step after that is to calculate the "posterior probability" for every class. It can be done by
applying the Gaussian distribution formula to every feature of a tested data point. The
formula is as follows:
$P(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$
$x_i$ - value of a certain feature of a tested data point
$\mu$ - the mean of the feature values across a certain class
$\sigma^2$ - the variance of the feature values across a certain class</p>
        <p>The logarithms of the results received after applying the above formula to every feature of a tested
data point should be summed up and added to the logarithm of the "prior probability", as in Algorithm 3
below. The result is the "posterior probability" and has to be calculated for every class.</p>
        <p>The last step is to select the class with the highest "posterior probability" value. This class is the
predicted class for the tested data point.</p>
        <p>Algorithm 3: Naive Bayes classifier</p>
        <p>Data: X_train - set of training samples, Y_train - set of results for training samples, X_test - set
of test samples, Y_test - set of results for test samples
gauss(x, mean, var) - helper function defined earlier, used to calculate the Gaussian distribution
unique(list) - function returning the unique values from a list
get_matching(list, condition) - function returning the elements of a list that match a condition
len(list) - counts the elements in a list
get_mean(matrix) - calculates the mean of the values in every column of a matrix, returns a list of those values
get_var(matrix) - calculates the variance of the values in every column of a matrix, returns a list of those values
log(list) - applies the natural logarithm to every element of a list
sum(list) - sums all the elements of a list
max(dict) - returns the key from a dictionary with the highest value
Result: predictions - list of predicted outcomes for all of the tested data points from the X_test
dataset
1 predictions = []</p>
        <p>// {} defines a dictionary with key:value pairs
2 means = {}
3 vars = {}
4 prior = {}
5 classes = unique(Y_train)
6 for c in classes do
7   x_c = get_matching(X_train, Y_train == c)
8   prior[c] = len(x_c) / len(X_train)
9   means[c] = get_mean(x_c)
10   vars[c] = get_var(x_c)
11 for x in X_test do
12   posterior = {}
13   for c in classes do
14     p = sum(log(gauss(x, means[c], vars[c])))
15     p += log(prior[c])
16     posterior[c] = p
17   prediction = max(posterior)
18   predictions.append(prediction)
19 return predictions</p>
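        <p>A minimal Python sketch of Algorithm 3, assuming plain lists of numeric features and reusing
the gauss helper sketched after Algorithm 2; the variable names mirror the pseudocode rather than any
published implementation.</p>
        <p>
# Sketch of Algorithm 3: Gaussian Naive Bayes using log-probabilities.
import math

def naive_bayes_predict(X_train, Y_train, X_test):
    classes = sorted(set(Y_train))
    prior, means, variances = {}, {}, {}
    for c in classes:
        x_c = [x for x, y in zip(X_train, Y_train) if y == c]
        prior[c] = len(x_c) / len(X_train)
        cols = list(zip(*x_c))  # one tuple of values per feature
        means[c] = [sum(col) / len(col) for col in cols]
        variances[c] = [sum((v - m) ** 2 for v in col) / len(col)
                        for col, m in zip(cols, means[c])]
    predictions = []
    for x in X_test:
        posterior = {}
        for c in classes:
            p = sum(math.log(g) for g in gauss(x, means[c], variances[c]))
            p += math.log(prior[c])
            posterior[c] = p
        # class with the highest posterior log-probability
        predictions.append(max(posterior, key=posterior.get))
    return predictions
        </p>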
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Calculating weight</title>
        <p>Depending on the data contained within the used database, different features might have a different
influence on the result. In most cases, to achieve the highest possible accuracy of a classifier,
the database is analysed prior to making predictions; in this project we decided to automate
that process and also include prioritisation of the most influential data, further increasing the
accuracy of each classifier. In the following pseudo-code, weights are referred to as priorities.</p>
        <p>Algorithm 4: Calculating priorities</p>
        <sec id="sec-2-4-1">
          <title>Data:</title>
          <p>X_train, Y_train - set of training samples and corresponding results
X_test, Y_test - set of test samples and corresponding results
n - number of features
Classifier.Predict() - function returning predictions made by the classifier for which the priorities are being
calculated
Classifier.Fit() - function used to specify the training sets for the classifier
Calculate_accuracy() - function returning the accuracy of a prediction
Result:
p - table with calculated priorities
1 Classifier.Fit(X_train, Y_train)
2 for index in range(0, n) do
3   p[index] = 1/n
4 for index in range(0, n) do
5   y_pred_first = Classifier.Predict(X_test, p)
6   y_acc_first = Calculate_accuracy(Y_test, y_pred_first)
7   p[index] = 0
8   y_pred_second = Classifier.Predict(X_test, p)
9   y_acc_second = Calculate_accuracy(Y_test, y_pred_second)
10   if y_acc_first &gt; y_acc_second then
11     p[index] = 1/n
12     y_pred_first = Classifier.Predict(X_test, p)
13     y_acc_first = Calculate_accuracy(Y_test, y_pred_first)
14     p[index] += 1/n
15     y_pred_second = Classifier.Predict(X_test, p)
16     y_acc_second = Calculate_accuracy(Y_test, y_pred_second)
17     while y_acc_first &lt; y_acc_second do
18       y_pred_first = Classifier.Predict(X_test, p)
19       y_acc_first = Calculate_accuracy(Y_test, y_pred_first)
20       p[index] += 1/n
21       y_pred_second = Classifier.Predict(X_test, p)
22       y_acc_second = Calculate_accuracy(Y_test, y_pred_second)
23 return p</p>
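          <p>A hedged Python sketch of Algorithm 4 follows; the classifier object and its fit/predict
interface are assumptions mirroring the pseudocode (predict takes the priority list p), and
calculate_accuracy is simply the fraction of matching labels.</p>
          <p>
# Sketch of Algorithm 4: automatic per-feature priority (weight) search.
# `classifier` is assumed to expose fit(X, y) and predict(X, p), where p
# is the list of feature priorities, as in the pseudocode above.

def calculate_accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)

def calculate_priorities(classifier, X_train, Y_train, X_test, Y_test, n):
    classifier.fit(X_train, Y_train)
    p = [1.0 / n] * n
    for index in range(n):
        acc_with = calculate_accuracy(Y_test, classifier.predict(X_test, p))
        p[index] = 0.0                  # try dropping the feature entirely
        acc_without = calculate_accuracy(Y_test, classifier.predict(X_test, p))
        if acc_with &gt; acc_without:
            # the feature helps: restore it, then grow its weight while
            # accuracy keeps improving
            p[index] = 1.0 / n
            prev = calculate_accuracy(Y_test, classifier.predict(X_test, p))
            p[index] += 1.0 / n
            curr = calculate_accuracy(Y_test, classifier.predict(X_test, p))
            while prev &lt; curr:
                prev = curr
                p[index] += 1.0 / n
                curr = calculate_accuracy(Y_test, classifier.predict(X_test, p))
    return p
          </p>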
          <p>In this study, the public Heart Disease Dataset available on Kaggle was used. The dataset consists
of medical data of 1047 patients, split into 12 features each. Below are presented the table with
the features in the dataset and a plot representing the correlation between the features.</p>
          <table-wrap>
            <table>
              <thead>
                <tr><th>Description</th><th>Type of value</th></tr>
              </thead>
              <tbody>
                <tr><td>Age of a patient</td><td>Integer</td></tr>
                <tr><td>Gender of a patient</td><td>Binary</td></tr>
                <tr><td>Type of chest pain</td><td>Integer in range 1 - 4</td></tr>
                <tr><td>Resting blood pressure in mm Hg</td><td>Integer</td></tr>
                <tr><td>Serum cholesterol in mg/dl</td><td>Integer</td></tr>
                <tr><td>Fasting blood sugar above 120 mg/dl</td><td>Boolean</td></tr>
                <tr><td>Resting electrocardiographic results</td><td>Integer in range 0 - 2</td></tr>
                <tr><td>Max heart rate achieved</td><td>Integer</td></tr>
                <tr><td>Occurrence of exercise induced angina</td><td>Boolean</td></tr>
                <tr><td>ST depression induced by exercise relative to rest</td><td>Float</td></tr>
                <tr><td>Type of slope of the peak exercise ST segment</td><td>Integer in range 0 - 3</td></tr>
                <tr><td>Diagnosis of a heart illness</td><td>Boolean</td></tr>
              </tbody>
            </table>
          </table-wrap>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>3.1. Used database</title>
        <p>From the correlation plot we can conclude that the features in the dataset are largely independent,
which improves the accuracy of the Naive Bayes classifier; a sketch of how such a correlation matrix can be
computed is shown below.</p>
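        <p>As an illustrative, hypothetical snippet (the file name heart.csv is an assumed placeholder for
the Kaggle dataset file), the correlation matrix underlying such a plot can be computed with pandas:</p>
        <p>
# Sketch: compute the pairwise feature correlation matrix.
import pandas as pd

df = pd.read_csv("heart.csv")   # placeholder path to the dataset
corr = df.corr()                # Pearson correlations between features
print(corr.round(2))            # values near 0 suggest independent features
        </p>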
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Metric choice in KNN</title>
        <p>Metric choice is an important decision to make because it can greatly influence the final
accuracy of the classifier. In this project, three popular metrics were considered: the Euclidean
metric, the Manhattan metric and the Minkowski metric. Every metric uses a different
formula to calculate the distance from the tested data point to a training data point.
1. The Euclidean metric: $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
2. The Manhattan metric: $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$
3. The Minkowski metric: $d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^m \right)^{1/m}$
$x$ - tested data point
$y$ - training data point
$d(x, y)$ - distance from the tested data point to the training data point
$x_i$ - feature of the tested data point with index equal to $i$
$y_i$ - feature of the training data point with index equal to $i$
$m$ - integer used in the Minkowski metric</p>
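        <p>For illustration, the three metrics can be written in Python as follows (a sketch, not code
published with this paper):</p>
        <p>
# Sketch: the three distance metrics considered for the KNN classifier.
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def minkowski(x, y, m):
    # m = 2 reproduces the Euclidean metric, m = 1 the Manhattan metric
    return sum(abs(xi - yi) ** m for xi, yi in zip(x, y)) ** (1.0 / m)
        </p>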
        <p>The process of choosing the best metric for this project is described in the next subsection.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Selecting the most efficient metric and the best number of neighbours in KNN</title>
        <p>To determine the most efficient metric and the best number of neighbours, an automation script was
used. The script automatically tweaks the two main parameters of the KNN algorithm: the number of
neighbours "K" and the metric. Moreover, it checks the classifier accuracy multiple times on different
test and training datasets to eliminate edge cases and obtain the average accuracy. The script starts
checking accuracy at K = 5 for all three metrics and then gradually increases the number of neighbours
with a step equal to 5, up to K = 200. For every number of neighbours and every metric the prediction is
repeated n = 50 times and the mean value is calculated. The results returned by this script have been
visualized in a plot; a sketch of such a script is shown below.</p>
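        <p>A hedged sketch of such an automation script, assuming the metric functions sketched in the
previous subsection and plain Python lists for the data; the splitting routine and helper names are
assumptions for illustration:</p>
        <p>
# Sketch: average KNN accuracy over 50 random train/test splits for
# every combination of metric and K (K = 5, 10, ..., 200).
import random
from collections import Counter

def knn_with_metric(X_tr, Y_tr, X_te, k, metric):
    preds = []
    for x in X_te:
        nearest = sorted(range(len(X_tr)), key=lambda j: metric(x, X_tr[j]))[:k]
        preds.append(Counter(Y_tr[j] for j in nearest).most_common(1)[0][0])
    return preds

def tune(X, Y, metrics, n_runs=50):
    results = {}
    for name, metric in metrics.items():
        for k in range(5, 205, 5):
            accs = []
            for _ in range(n_runs):
                # random 80/20 train/test split
                idx = list(range(len(X)))
                random.shuffle(idx)
                cut = int(len(idx) * 0.2)
                test, train = idx[:cut], idx[cut:]
                X_tr = [X[i] for i in train]
                Y_tr = [Y[i] for i in train]
                X_te = [X[i] for i in test]
                Y_te = [Y[i] for i in test]
                preds = knn_with_metric(X_tr, Y_tr, X_te, k, metric)
                accs.append(sum(a == b for a, b in zip(Y_te, preds)) / len(Y_te))
            results[(name, k)] = sum(accs) / len(accs)
    return results

# usage: tune(X, Y, {"euclidean": euclidean, "manhattan": manhattan})
        </p>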
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Analysis of results</title>
        <p>Based on the performed experiment, a couple of things can be concluded. The first thing that is
clearly visible in the plot is the fact that the Manhattan metric is on average 5 percent more
accurate than the other two metrics, which both have very similar accuracies. The second
thing that can be observed in the plot is a general trend according to which the
more neighbours the classifier has, the more accurate it is. It is also worth noting that the plot
flattens at about K = 50, and further increments of the number of neighbours do not provide
any significant increase in accuracy. As a result of this experiment, the Manhattan metric and a
number of neighbours equal to 50 were assumed to be the best parameters for the KNN classifier used on
the database specific to this project.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The final model used both the KNN and Naive Bayes classifiers on both normalized and non-normalized
data. After conducting several tests we achieved an average accuracy of around 81%, and neither
classifier was used significantly more than the other. In the future more classifiers might be added,
or the model itself might be adapted for the prediction of different diseases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Harding</surname>
          </string-name>
          ,
          <source>The Exquisite Machine: The New Science of the Heart</source>
          , MIT Press,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <collab>British Heart Foundation</collab>
          ,
          <source>Global heart &amp; circulatory diseases factsheet</source>
          ,
          <year>2024</year>
          . URL: https://www.bhf.org.uk/-/media/files/for-professionals/research/heart-statistics/bhf-cvd-statistics-global-factsheet.pdf?rev=e61c05db17e9439a8c2e4720f6ca0a19&amp;hash=6350DE1B2A19D939431D876311077C7B (accessed: 18.05.2024
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Nurmohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Kraaijenhof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Nicholls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Koenig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Catapano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S.</given-names>
            <surname>Stroes</surname>
          </string-name>
          ,
          <article-title>Proteomics and lipidomics in atherosclerotic cardiovascular disease risk prediction</article-title>
          ,
          <source>European Heart Journal</source>
          <volume>44</volume>
          (
          <year>2023</year>
          )
          <fpage>1594</fpage>
          -
          <lpage>1607</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sultana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sengupta</surname>
          </string-name>
          , D. De,
          <article-title>Xai-reduct: accuracy preservation despite dimensionality reduction for heart disease classification using explainable ai</article-title>
          ,
          <source>The Journal of Supercomputing</source>
          <volume>79</volume>
          (
          <year>2023</year>
          )
          <fpage>18167</fpage>
          -
          <lpage>18197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wieczorek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Siłka</surname>
          </string-name>
          ,
          <article-title>Bilstm deep neural network model for imbalanced medical data of iot systems</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>141</volume>
          (
          <year>2023</year>
          )
          <fpage>489</fpage>
          -
          <lpage>499</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Połap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <article-title>Bacteria shape classification by the use of region covariance and convolutional neural network</article-title>
          , in: 2019
          <source>International Joint Conference on Neural Networks (IJCNN)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Magrupov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kuchkarova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaibnazarov</surname>
          </string-name>
          ,
          <article-title>Methodology for building a computerized advisory expert system for the diagnosis of epilepsy in children</article-title>
          ,
          <source>Biomedical Engineering</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jaszcz</surname>
          </string-name>
          ,
          <article-title>The impact of entropy weighting technique on mcdm-based rankings on patients using ambiguous medical data</article-title>
          ,
          <source>in: International Conference on Information and Software Technologies</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>329</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Połap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Nowicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          , E. Tramontana,
          <article-title>Novel approach toward medical signals classifier</article-title>
          , in: 2015
          <source>International Joint Conference on Neural Networks (IJCNN)</source>
          , IEEE,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>K.</given-names>
            <surname>Prokop</surname>
          </string-name>
          ,
          <article-title>Grey wolf optimizer combined with k-nn algorithm for clustering problem</article-title>
          ,
          <source>in: IVUS 2022: 27th International Conference on Information Technology</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Gohari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kazemnejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohammadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Eskandari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saberi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Esmaieli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheidaei</surname>
          </string-name>
          ,
          <article-title>A bayesian latent class extension of naive bayesian classifier and its application to the classification of gastric cancer patients</article-title>
          ,
          <source>BMC Medical Research Methodology</source>
          <volume>23</volume>
          (
          <year>2023</year>
          )
          <fpage>190</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <collab>Wikipedia, the free encyclopedia</collab>
          ,
          <source>K-nearest neighbors algorithm</source>
          ,
          <year>2024</year>
          . URL: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm (accessed: 18.05.2024
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <collab>IBM</collab>
          ,
          <article-title>What is the k-nearest neighbors (knn) algorithm?</article-title>
          . URL: https://www.ibm.com/topics/knn (accessed: 18.05.<year>2024</year>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <collab>Wikipedia, the free encyclopedia</collab>
          ,
          <source>Naive Bayes classifier</source>
          ,
          <year>2024</year>
          . URL: https://en.wikipedia.org/wiki/Naive_Bayes_classifier (accessed: 18.05.2024
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>