<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Comparison of Classifiers for Predicting Heart Attack in Patients</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oliwia Cimała</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Bocheńska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Applied Mathematics, Silesian University of Technology</institution>
          ,
          <addr-line>Kaszubska 23, 44100 Gliwice</addr-line>
          ,
          <country country="PL">POLAND</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IVUS2024: Information Society and University Studies 2024</institution>
        </aff>
      </contrib-group>
      <pub-date>
<year>2024</year>
      </pub-date>
      <abstract>
<p>Heart attack prediction plays a pivotal role in patient health. When responding quickly to a health issue, there are two options: running many tests on the patient to find out what is wrong, or comparing information about the patient with that of other patients to classify the case and narrow the search to the right field. This study presents a comprehensive comparison of three classification algorithms - the Soft Set Classifier, Naive Bayes, and K-Nearest Neighbors (KNN) - for predicting heart attack in patients. Through experimentation with different variations of these algorithms, including custom implementations, the project evaluates their effectiveness in recognizing a high or low chance of heart attack. Methodologically, the project explores the nuances of each algorithm, discussing their underlying principles and implementation details. Experimental results reveal insights into the performance of each algorithm, providing valuable considerations for practical applications. Additionally, the project discusses the significance of the precision, recall, F1-score, and accuracy metrics in assessing algorithm performance. Overall, this study contributes to advancing heart attack prediction technology, offering valuable insights into algorithmic approaches.</p>
      </abstract>
      <kwd-group>
<kwd>Soft Set Classifier</kwd>
        <kwd>Naive Bayes</kwd>
        <kwd>K-Nearest Neighbors</kwd>
        <kwd>Heart Attack Prediction</kwd>
        <kwd>Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The heart is vital to the body’s function, acting as a powerful pump that circulates blood,
oxygen, and essential nutrients throughout the body. This cardiovascular system ensures that
all bodily tissues receive the resources they need to operate effectively. Consequently, any
issues with the heart can disrupt the normal functioning of other organs and systems, leading
to widespread health problems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Heart disease is responsible for about one-third of all
human deaths in the world [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], making accurate and timely diagnosis critical for effective
treatment. Traditional diagnostic methods often rely on various tests and clinical evaluations,
which can be time-consuming and costly. With the advancement of machine learning, there is
an increasing interest in developing automated systems for predicting heart disease using
patient data [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ].
      </p>
      <p>
        Existing solutions leverage different algorithms to achieve this goal, including logistic regression,
decision tree, random forest, voting and neural networks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, our study focuses on
comparing three distinct classifiers: the Soft Set Classifier [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Naive Bayes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and K-Nearest
Neighbors (KNN) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Each of these algorithms offers unique advantages and challenges, which we
explore in the context of heart disease prediction.
      </p>
      <p>To get a closer look into the applied classifiers, the following paragraphs will briefly describe
them to illustrate the differences between these calculation methods.</p>
      <p>The Soft Set classifier is a flexible and general mathematical tool used for handling
uncertainty in data. It does not rely on predefined probabilities or distances, making it particularly
useful in situations where traditional probabilistic or distance-based models like Naive Bayes or
K-Nearest Neighbors (KNN) may not perform well. The classifier iteratively adjusts the
membership values based on the training data, thus enabling it to handle imprecise and vague
information effectively. The model’s adaptability to various forms of uncertainty makes it a
valuable tool in fields where data ambiguity is prevalent.</p>
      <p>The Naive Bayes classifier is a probabilistic machine learning model based on Bayes’ theorem,
which calculates the probability of a certain class given a set of features. It assumes that the
features are conditionally independent, hence "naive."</p>
      <p>K-Nearest Neighbors (KNN) is a non-parametric supervised learning algorithm used for
classification and regression tasks. In KNN, the class of a new data point is determined by the
majority class among its k nearest neighbors in the feature space. It is simple to implement and
understand but can be computationally expensive for large datasets, as it requires storing all
training data and computing distances for each prediction.</p>
      <p>All three algorithms have varying time consumption, with K-Nearest Neighbors (KNN) being
the most computationally expensive due to its need to calculate distances for each prediction. When
implementing the algorithms we followed the same structure for each class: the class contains two
functions, fit and predict, and, where needed, other functions such as a distance function or the score of a given
sample. Now, let us briefly explain each of the applied algorithms and the
underlying thought process behind their selection. The first classifier is the Soft Set classifier, which
was implemented independently. Next, the Naive Bayes classifier comes from the library, changed slightly to be
built like the rest (it also has fit and predict functions in a Bayes class). The third classifier is a
K-Nearest Neighbors algorithm, in this instance written by us. It was created following
open-access models with the aim of achieving as high an accuracy as possible. After performing the
calculations, each algorithm displays a confusion matrix and a table with the results of its effectiveness in
predicting a low or high probability of heart attack.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>This section details the methodologies used for each classifier, including their mathematical
foundations and implementation specifics.</p>
      <sec id="sec-2-1">
        <title>2.1. Soft Set Classifier</title>
        <p>The Soft Set Classifier, from a mathematical perspective, assigns to each element of the set X a
value from the interval [-1, 1], representing the degree of membership of that element to the
set X. A membership value of 1 indicates assignment to the negative class, while a membership
value of -1 indicates assignment to the positive class.</p>
        <p>Algorithm 1: Soft Set Classifier</p>
        <p>Input: Training set X_train, training labels y_train, number of iterations n_iters, regularization
parameter reg_param</p>
        <p>Output: Fitted weight vector Y
1 Initialize weight vector Y to zeros of length equal to the number of features;
2 for iteration in range n_iters do
3   for each sample x_i, y_i in X_train, y_train do
4     if y_i * classify(x_i) ≤ 1 then
5       Update Y by Y ← Y + y_i * x_i - 2 * reg_param * Y;
6 Return fitted weight vector Y</p>
        <p>Algorithm 2: Soft Set Prediction</p>
        <p>Input: Test set X_test, fitted weight vector Y</p>
        <p>Output: Predicted labels y_pred
1 for each sample x_i in X_test do
2   Compute classification score classification ← classify(x_i);
3   Assign label y_pred,i ← sign(classification);
4 return predicted labels y_pred</p>
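        <p>The update rule in Algorithm 1 behaves like a regularized margin-based linear classifier. The following is a minimal Python sketch under that reading, not the authors' exact code; the names SoftSetClassifier, n_iters, and reg_param are our assumptions, and the fitted weight vector Y is called w here.</p>

```python
import numpy as np

class SoftSetClassifier:
    """Margin-based linear classifier following Algorithms 1 and 2.

    Labels are assumed to be -1 or +1, matching the [-1, 1]
    membership interval described above. This is an illustrative
    sketch, not the paper's original implementation.
    """

    def __init__(self, n_iters=100, reg_param=0.01):
        self.n_iters = n_iters
        self.reg_param = reg_param
        self.w = None  # the weight vector Y from Algorithm 1

    def _classify(self, x):
        # Raw membership score of a single sample.
        return np.dot(self.w, x)

    def fit(self, X_train, y_train):
        # Line 1: initialize the weight vector to zeros (one weight per feature).
        self.w = np.zeros(X_train.shape[1])
        # Lines 2-5: update weights whenever a sample violates the margin.
        for _ in range(self.n_iters):
            for x_i, y_i in zip(X_train, y_train):
                if y_i * self._classify(x_i) <= 1:
                    self.w = self.w + y_i * x_i - 2 * self.reg_param * self.w
        return self

    def predict(self, X_test):
        # Algorithm 2: the predicted label is the sign of the score
        # (note np.sign returns 0 for a score of exactly 0).
        return np.array([np.sign(self._classify(x_i)) for x_i in X_test])
```

A design note: the `- 2 * reg_param * Y` term shrinks the weights on every margin violation, which is the regularization mentioned in the algorithm's input list.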
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Naive Bayes Classifier</title>
        <p>The Naive Bayes classifier is based on Bayes’ theorem and assumes that the features are
conditionally independent given the class label. The implementation follows Bayes’ rule,
P(c|x) = P(x|c)P(c) / P(x), where P(c|x) is the posterior probability of class c given feature vector x.</p>
        <p>Algorithm 3: Naive Bayes</p>
        <p>Input: Training set X_train, training labels y_train, test set X_test</p>
        <p>Output: Predicted labels y_pred
1 Step 1: Initialize the Gaussian Naive Bayes model;
2 Step 2: Fit the model with the training data X_train and y_train;
3 Step 3: Predict the labels for X_test using the trained model;</p>
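        <p>As described in the introduction, the library's Gaussian Naive Bayes is wrapped in a class exposing the same fit and predict interface as the other classifiers. A minimal sketch of such a wrapper; the exact class layout is our assumption, though the text does mention a Bayes class with fit and predict functions.</p>

```python
from sklearn.naive_bayes import GaussianNB

class Bayes:
    """Thin wrapper giving the library model the same
    fit/predict interface as the other two classifiers."""

    def __init__(self):
        # Step 1: initialize the Gaussian Naive Bayes model.
        self.model = GaussianNB()

    def fit(self, X_train, y_train):
        # Step 2: fit the model with the training data.
        self.model.fit(X_train, y_train)
        return self

    def predict(self, X_test):
        # Step 3: predict labels for the test set.
        return self.model.predict(X_test)
```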
      </sec>
      <sec id="sec-2-3">
        <title>2.3. K-Nearest Neighbors (KNN) Classifier</title>
        <p>The KNN classifier classifies a sample based on the majority label among its k nearest neighbors in the
training set. The distance metric used is typically the Euclidean distance:
d(x, y) = sqrt(Σ_j (x_j - y_j)²).</p>
        <p>Algorithm 4: KNN Algorithm</p>
        <p>Input: Training set X_train, training labels y_train, test set X_test, number of neighbors k</p>
        <p>Output: Predicted labels y_pred
1 for each sample x in X_test do
2   Compute distances between x and all samples in X_train;
3   Identify the k nearest neighbors;
4   Assign the label based on the majority vote of the neighbors;</p>
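        <p>Algorithm 4 can be sketched as follows. This is an illustrative implementation with Euclidean distance and majority voting, written to the fit/predict structure the paper describes, not necessarily the authors' code.</p>

```python
import numpy as np
from collections import Counter

class KNN:
    """K-Nearest Neighbors classifier following Algorithm 4."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X_train, y_train):
        # KNN is a lazy learner: fitting just stores the training data.
        self.X_train = np.asarray(X_train, dtype=float)
        self.y_train = np.asarray(y_train)
        return self

    def _distance(self, a, b):
        # Euclidean distance between two feature vectors.
        return np.sqrt(np.sum((a - b) ** 2))

    def predict(self, X_test):
        preds = []
        for x in np.asarray(X_test, dtype=float):
            # Line 2: distances from x to every training sample.
            dists = [self._distance(x, x_t) for x_t in self.X_train]
            # Line 3: indices of the k nearest neighbors.
            nearest = np.argsort(dists)[: self.k]
            # Line 4: majority vote among their labels.
            labels = self.y_train[nearest]
            preds.append(Counter(labels).most_common(1)[0][0])
        return np.array(preds)
```

This sketch also makes the cost argument from the introduction concrete: every call to predict recomputes a distance to each stored training sample.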
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiments</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset Description</title>
        <p>The dataset includes records of patients along with their medical attributes and the presence or
absence of heart disease. The dataset contains 13 columns with different attributes: age, sex,
number of major vessels, chest pain type, resting blood pressure, cholesterol, maximum heart
rate achieved, fasting blood sugar, resting electrocardiographic results, exercise, slope, thal rate, and
the last column that we compare against (the target variable).</p>
        <p>
          All records were first normalized and then subjected to further tests. The normalization function
operated on the basic min-max algorithm [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
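        <p>Min-max normalization rescales each attribute x to (x - min) / (max - min), mapping every column into the [0, 1] interval. A minimal sketch of such a function; the guard against constant columns is our addition.</p>

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each column of X to the [0, 1] interval
    using the basic min-max formula."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Avoid division by zero when a column is constant (max == min).
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span
```

Normalization matters most for KNN here, since Euclidean distances would otherwise be dominated by large-scale attributes such as cholesterol.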
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Data Splitting and Testing</title>
        <p>To evaluate the performance of our classifiers, we split the dataset into a training set and a test
set. This is a crucial step to ensure that the model can generalize well to unseen data. We used
the ‘train_test_split‘ function from the ‘sklearn.model_selection‘ library for this purpose.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.35, random_state=42)
This function performs the following tasks:
• Input parameters:
– X: the feature matrix containing the input data for all samples.
– y: the target vector containing the labels for all samples.
– test_size=0.35: specifies the proportion of the dataset to include in the test split.
(Here, 35% of the data is allocated for testing, and the remaining 65% is used for
training.)
– random_state=42: this parameter ensures reproducibility of the results. By setting a
specific random state, we ensure that the same split is generated every time the
code is run.
• Output values:
– X_train: the feature matrix for the training set.
– X_test: the feature matrix for the test set.
– y_train: the target vector for the training set.</p>
        <p>– y_test: the target vector for the test set.</p>
        <p>By splitting the data into training and testing sets, we can train the model on one subset of
the data and evaluate its performance on another, independent subset. This approach helps in
assessing how well the model can generalize to new, unseen data and is an essential part of
model validation in machine learning.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Results Analysis</title>
        <p>To compare the different performance parameters of the used algorithms, we utilized the metrics
module from the ’sklearn’ library. The dataset containing numerical values in 13 different types of
attributes (medical data of the patient) with a total length of 303 records was divided into
training and testing sets in a 65:35 ratio. For each algorithm, we compared parameters such as:
• precision - it is a measure that determines the ratio of correctly predicted class elements to
all those marked as the given class
• recall - a measure that informs us how many elements from given class were correctly
recognized
• f1-score - it is the harmonic mean between precision and recall
• support - a measure of the occurrences of each class in dataset
• accuracy - it is the ratio of correctly classified samples to all cases in the test set
Meaning of labels:
• TP - true positive - cases that were correctly classified as positive by the classifier
• TN - true negative - cases that were correctly classified as negative by the classifier
• FP - false positive - an error where the test result incorrectly indicates the presence of a
condition when it is not present
• FN - false negative - an error where the test result incorrectly indicates the absence of a
condition when it is actually present</p>
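        <p>From the TP, TN, FP, and FN counts above, the listed metrics follow as precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = 2·precision·recall/(precision+recall), and accuracy = (TP+TN)/(TP+TN+FP+FN). The ’sklearn’ metrics module computes these directly; the following hand-rolled sketch (the function name is ours) just makes the definitions explicit.</p>

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and accuracy
    for the positive class from raw TP/TN/FP/FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```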
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Results</title>
        <p>As we can see in the results above, the confusion matrix outputs are 0 and 1 (Fig. 1), where 0 is a low
chance of heart attack and 1 is a higher chance of heart attack. In the classification report,
which comes from the ’sklearn’ library, the 0 value is changed to -1 (Tab. 1, 2, 3).</p>
        <p>Analyzing the results shown in the matrix and tables above, we can observe that all three algorithms
have lower precision in classifying the low chance of heart attack.
As observed, the Soft Set algorithm struggles the most, with the lowest accuracy of 70% (see Tab. 3).
With only a 1% advantage in accuracy, K-Nearest Neighbors performs better than the Naive
Bayes algorithm, whose accuracy is 83% (see Tab. 2).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This study presented a comparative analysis of three different classifiers for heart disease
prediction. The Soft Set Classifier, while effective in handling uncertainty, showed moderate
accuracy of 70%. The Naive Bayes classifier demonstrated high accuracy of 83%, making it a
strong candidate for medical diagnostics. The K-Nearest Neighbors classifier also performed well,
with an accuracy of 84%. These results provide valuable insights into the strengths and
limitations of each classifier, guiding future research and application in medical diagnostics. In
all this pondering we need to remember that the Naive Bayes classifier was not written by us.
We can only speculate what results an independently written Naive Bayes algorithm would give,
and what results library versions of the K-Nearest Neighbors and Soft Set classifiers would bring.</p>
      <p>Improvements that we can make in the future are to write the Naive Bayes algorithm ourselves and check its
accuracy, and to rework the Soft Set algorithm so it reaches higher accuracy. In addition, to
boost accuracy we can compare all three algorithms with their library counterparts and
eliminate the weak points because of which the accuracy is not as high as needed.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Arghandabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shams</surname>
          </string-name>
          ,
          <article-title>A comparative study of machine learning algorithms for the prediction of heart disease</article-title>
          ,
          <source>International Journal for Research in Applied Science and Engineering Technology</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>677</fpage>
          -
          <lpage>683</lpage>
          . doi:10.22214/ijraset.2020.32591.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Uyar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ilhan</surname>
          </string-name>
          ,
          <article-title>Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>120</volume>
          (
          <year>2017</year>
          )
          <fpage>588</fpage>
          -
          <lpage>593</lpage>
          . doi:10.1016/j.procs.2017.11.283.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>I.</given-names>
            <surname>Rojek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kotlarz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kozielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagodziński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Królikowski</surname>
          </string-name>
          ,
          <article-title>Development of ai-based prediction of heart attack risk as an element of preventive medicine</article-title>
          ,
          <source>Electronics</source>
          <volume>13</volume>
          (
          <year>2024</year>
          ). doi:10.3390/electronics13020272.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. J. A.</given-names>
            <surname>Laxamana</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. M. M. Vale</surname>
          </string-name>
          ,
          <article-title>Heart attack prediction using machine learning algorithms</article-title>
          ,
          <source>Journal of Electrical Systems</source>
          <volume>20</volume>
          (
          <year>2024</year>
          )
          <fpage>1428</fpage>
          -
          <lpage>1436</lpage>
          . doi:10.52783/jes.2474, license CC BY-ND 4.0.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <string-name>
            <surname>P. Chaurasia,</surname>
          </string-name>
          <article-title>A machine learning approach for heart attack prediction</article-title>
          ,
          <source>International Journal of Engineering and Advanced Technology</source>
          <volume>10</volume>
          (
          <year>2021</year>
          )
          <fpage>124</fpage>
          -
          <lpage>134</lpage>
          . doi:10.35940/ijeat.F3043.0810621.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Oliullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Whaiduzzaman</surname>
          </string-name>
          ,
          <source>Analyzing the Effectiveness of Several Machine Learning Methods for Heart Attack Prediction</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>236</lpage>
          . doi:10.1007/978-981-19-9483-8_19.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Majeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Shareef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Darwesh</surname>
          </string-name>
          ,
          <article-title>Three classes of soft functions via soft-open sets and soft-closed sets</article-title>
          ,
          <source>Wasit Journal of Pure Sciences</source>
          <volume>3</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . doi:10.31185/wjps.288.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Langley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Iba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thompson</surname>
          </string-name>
          , et al.,
          <source>An analysis of bayesian classifiers 90</source>
          (
          <year>1992</year>
          )
          <fpage>223</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Prokop</surname>
          </string-name>
          ,
          <article-title>Grey wolf optimizer combined with k-nn algorithm for clustering problem</article-title>
          ,
          <source>in: IVUS 2022: 27th International Conference on Information Technology</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shantal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Othman</surname>
          </string-name>
          ,
          <article-title>A novel approach for data feature weighting using correlation coefficients and min-max normalization</article-title>
          ,
          <source>Symmetry</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <article-title>2185</article-title>
          . doi:10.3390/sym15122185.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>