=Paper= {{Paper |id=Vol-2823/Paper18 |storemode=property |title=Prediction of Heart Disease Mortality Rate Using Data Mining |pdfUrl=https://ceur-ws.org/Vol-2823/Paper18.pdf |volume=Vol-2823 |authors=Prasenjit Das, Shaily Jain, Chetan Sharma, Shankar Shambhu, Sakshi }} ==Prediction of Heart Disease Mortality Rate Using Data Mining== https://ceur-ws.org/Vol-2823/Paper18.pdf
Prediction of Heart Disease Mortality Rate Using Data Mining
Prasenjit Das (a), Shaily Jain (b), Chetan Sharma (c), Shankar Shambhu (a), Sakshi (d)

a
  Chitkara University School of Computer Applications, Chitkara University, Himachal Pradesh, India
b
  Chitkara University Institute of Engineering and Technology, Chitkara University, Himachal Pradesh, India
c
  Chitkara University Himachal Pradesh, India
d
  Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India


Abstract

Heart disease is the most acute disease, with the highest mortality rate in the world. Only early prediction
and timely treatment can reduce its impact. This paper aims to predict death from heart disease as accurately
as possible using data mining algorithms. We apply five algorithms in WEKA (Naive Bayes, LibLINEAR, Naive
Bayes tree, classification via regression, and Bayesian network) to a dataset from the UCI repository. All
five algorithms predict with good accuracy. We compare them on accuracy, F-measure, recall, and precision.
The Bayes network outperforms the others with a maximum accuracy of 79.26%, obtained with 10-fold
cross-validation. In that setting its F-measure, precision, and recall are also the highest of the five
algorithms.

Keywords: Classification, Prediction, Algorithms, Heart Disease, Data Mining, WEKA

ACI’21: Workshop on Advances in Computational Intelligence at ISIC 2021, February 25-27, 2021, Delhi, India
EMAIL: prasenjit.das@chitkarauniversity.edu.in (P. Das); shaily.jain@chitkarauniversity.edu.in (S. Jain);
chetan.sharma@chitkarauniversity.edu.in (C. Sharma); shankar.shambhu@chitkarauniversity.edu.in (S. Shambhu);
sakshi@chitkara.edu.in (Sakshi)
ORCID: 0000-0002-7988-2418 (P. Das); 0000-0001-6078-3607 (S. Jain); 0000-0001-5401-8503 (C. Sharma);
0000-0002-2348-1041 (S. Shambhu); 0000-0002-8757-4001 (Sakshi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1.      Introduction

Data mining is a branch of computer science that is widely used in many fields. Data mining means digging
knowledge or useful information out of a vast amount of data. Through data mining, we can explore small to
large datasets to uncover useful information that was previously hidden or unknown, and detect relationships
between parameters that could not be found with conventional statistical methods. In the health care
industry, data mining techniques allow us to diagnose and predict the occurrence of disease and the
probability of death. Early prediction and diagnosis of the disease can further decrease the death rate.
Cardiovascular disease is the most commonly occurring disease and causes the largest number of deaths around
the globe [1]. According to WHO, more than 19 million people died from cardiovascular diseases in 2018, and
around 4 million of these deaths were of non-senior citizens.
The health care industry holds large amounts of data that can be mined to uncover hidden information about
diseases and to support effective decision making in advance [2]. Motivated by the increasing mortality rate
of cardiovascular diseases, many researchers have started extracting useful information with various data
mining techniques [3]. Hence, if we can design a prediction system for diseases such as heart disease using
machine learning or deep learning methods, medical professionals can foresee symptoms or problems related to
the heart from the available patient data and the attributes that contribute to the occurrence of heart
disease.
One major challenge in assisting doctors in diagnosing the world’s most deadly disease is the need for
utmost accuracy [4]. Hence, most of the
research is aiming to improve diagnosis accuracy.
This paper uses different classification algorithms and compares them on accuracy in predicting the death
rate, F-measure, precision, and recall. Section 2 covers related work in this domain, and Section 3 discusses
the proposed methodology. Section 4 presents our experimental setup, our results, and a discussion of them.
Finally, Section 5 concludes the paper.

2.      Background and Motivation

Intensive research has been going on for the past few decades to predict heart disease using data mining
techniques. Various data mining algorithms, such as Naive Bayes, Decision Tree, Neural Network, Support
Vector Machine (SVM), Logistic Regression, k-Nearest Neighbour, Artificial Neural Network, Random Forest,
and J48, have already been used by researchers, reaching different levels of accuracy on multiple datasets
around the globe [5].
Guidi et al. [6] designed a clinical decision support system (CDSS) for heart failure analysis. Their paper
compares the performance of machine learning classifiers such as artificial neural network (ANN), support
vector machine (SVM), a CART system with fuzzy rules, and random forests; the CART model and random forest
performed best, achieving an accuracy of 87.6%. In [7], the authors proposed a logistic regression
classifier after feature selection, based upon a decision support system for the classification of heart
disease, and achieved an accuracy of 77%. The authors of [8] used two approaches, multilayer perceptron
(MLP) and support vector machine, to classify heart disease and reached an accuracy of 80.41%. In [9], the
authors proposed and evaluated a hybrid classification system for heart disease that combines fuzzy and
artificial neural network techniques, and achieved an accuracy of 87.4%. Palaniappan et al. [10] applied
Naive Bayes, ANN, and Decision Tree algorithms to diagnose the existence of heart disease. According to
their results, ANN is the best predictive model with an accuracy of 88.12%, compared to Naive Bayes with an
accuracy of 86.12% and Decision Tree with only 80.4%. The authors of [11] proposed a three-phase model for
heart disease diagnosis and achieved an accuracy of 88.89%.
Accuracy is the most important factor in prediction, but it is not the only one. Some researchers have also
considered other parameters such as precision, recall, F-measure, and R² values in heart disease prediction.
In [12], the authors used a dimensionality reduction technique to process raw data with 74 features and then
divided the features into three groups. They achieved the highest accuracy of 99.4% for CH, with 100%
precision and 97.1% recall, using CHI-PCA with an RF classifier. Shamsollahi et al. [13] combined predictive
and descriptive approaches for predicting coronary artery disease. They selected the k-means method for
clustering (descriptive) and various classification methods (predictive), including CHAID, Quest, C5.0,
C&RT decision tree, and ANN. They compared the results on precision, accuracy, specificity, sensitivity, and
error rate. As per the results, C&RT is the best method for the entire dataset, with an error rate of only
0.074. In [14], the authors applied decision tree classification using the J48, random forest, and logistic
model tree algorithms to data from the UCI repository. Their results indicate that J48 is the best
classifier for heart disease prediction because it achieves the highest accuracy and the smallest total time
to build; the effect of pruning is also clearly visible. J48 achieved an accuracy of only 56.76% with a time
to build of 0.04 seconds, while logistic model trees reached an accuracy of only 55.77% with a total time to
build of 0.39 seconds.
The authors of [15] implemented five classification algorithms (Naïve Bayes, Decision Tree, discriminant,
Random Forest, and Support Vector Machine) on big datasets and compared their performance in terms of
accuracy, precision, specificity, recall, and F-measure. Among the five classifiers, the decision tree ranks
first with an accuracy of 99.0%, and random forest ranks second with an accuracy of 93.4%.

3.      Proposed Methodology

The process flow of the experiment is shown in Figure 1. The following subsections describe the proposed
methodology.




                                    Figure 1: Methodology used

3.1     Dataset

We used the Heart Failure Prediction dataset, a UCI repository dataset obtained from Kaggle [16]. The
dataset has 13 attributes in total and 299 records, covering 194 male and 105 female patients. However, we
used only 12 attributes for this experiment, as shown in Table 1. We did not use the Time attribute (the
follow-up time with the patient), as we consider it less relevant to our study.

                                  Table 1: Dataset Information
 Attribute ID   Attribute Used             Attribute Information
 A1             Age                        Age of the patient; values range from 40 to 95 years
 A2             Sex                        Gender of the patient in binary form (1 = male, 0 = female)
 A3             Anemia                     Reduction in hemoglobin (1 = yes, 0 = no)
 A4             Creatinine Phosphokinase   Level of the creatinine phosphokinase (CPK) enzyme in the blood,
                                           measured in micrograms per liter
 A5             Diabetes                   Fasting blood sugar of the patient; 1 (true) if greater than
                                           120 mg/dl, otherwise 0 (false)
 A6             Ejection Fraction          Percentage of blood leaving the heart at each contraction;
                                           ranges from 14 to 80
 A7             High Blood Pressure        Whether the patient has high blood pressure, BP > 120/80
                                           (1 = yes, 0 = no)
 A8             Platelets                  Platelet count in the blood, in kiloplatelets/ml
 A9             Serum Creatinine           Creatinine level in the blood, in mg/dl
 A10            Serum Sodium               Sodium level in the blood, in milliequivalents per liter
 A11            Smoking                    Whether the patient smokes (1 = yes, 0 = no)
 A12            Time                       Follow-up time with the patient
 A13            DEATH_EVENT                Occurrence of death due to heart disease (1 = yes, 0 = no)
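
The attribute selection described above can be sketched in a few lines of Python. This is only an
illustration, not the procedure used in the study (which was carried out in WEKA): the column names are
assumed to follow the Kaggle CSV header, and the two sample rows are made up so that the snippet runs
without the real file.

```python
import csv
import io

# Hypothetical two-row sample laid out like Table 1; the values are made up
# and the column names are an assumption about the Kaggle CSV header.
SAMPLE_CSV = """age,sex,anaemia,creatinine_phosphokinase,diabetes,ejection_fraction,high_blood_pressure,platelets,serum_creatinine,serum_sodium,smoking,time,DEATH_EVENT
60,1,0,250,0,38,1,262000,1.1,137,0,90,0
72,0,1,580,1,25,0,210000,1.8,131,1,12,1
"""


def load_without(csv_text, excluded):
    """Parse CSV text and drop the excluded columns from every record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{k: v for k, v in row.items() if k not in excluded} for row in reader]


# Drop the follow-up time, keeping the 12 attributes used in the experiment.
records = load_without(SAMPLE_CSV, excluded={"time"})
features = [name for name in records[0] if name != "DEATH_EVENT"]
print(len(records[0]), "attributes kept;", len(features), "predictors plus the class")
```

With the Time column removed, 12 attributes remain: 11 predictors and the DEATH_EVENT class.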




3.2      Data Preprocessing

Real-life data contains redundant values and a lot of noise. The data needs to be cleaned, and the missing
values need to be filled, before the data is used to generate a model. Pre-processing takes care of these
issues so that the prediction can be made accurately. Once the data is cleaned, i.e., the noise is removed
and the missing values are filled, it needs to be transformed. Many supervised learning algorithms work on
nominal or cardinal data, so data transformation is applied to the dataset obtained from UCI in the present
work. Reduction of the dataset is applied to convert the complex dataset into a more straightforward form,
improving the model's accuracy.

3.3      Tool Used

WEKA 3.8.4, a machine learning tool written in Java and developed at the University of Waikato, is used to
conduct this study. WEKA provides different classifiers whose performance can be examined. It supports data
mining tasks such as pre-processing, classification, regression, and many more. WEKA accepts the .csv and
.arff file formats, and the chosen dataset is already available in one of these formats.

3.4      Classification Algorithms

After an intensive literature review, we selected five classification algorithms: regression, naive Bayes
tree, naive Bayes classification, Bayes network, and LibLINEAR.

Regression [17][18]: Regression is a supervised learning technique used to predict the class of the dataset
when the target values are known [19]. The current study uses regression to generate a model from parameters
such as age and gender, and to predict the unknown class. The technique works as follows.
The parameters used to make the prediction are continuous variables (θ1, θ2, ..., θn). Based on these
parameters, the model tries to find the best fit to predict the target variable Y and to improve the
accuracy. Using a function F of the predictors (x1, x2, ..., xn) and an error term e, the value of the
target variable Y is calculated as

   $Y = F(x, \theta) + e$                             (1)

The target variable Y depends on the predictor variables, which are independent of each other. The model is
generated from the relation between the predictors and the target class during the training process. The
model thus built is then fed with unseen data for which the target value is predicted. The proportion of
correctly predicted classes constitutes the accuracy and establishes the effectiveness of the model.

Naive Bayes Tree: It is a hybrid approach in which the model is generated using both the naïve Bayes and the
decision tree approach. The naïve Bayes classification assumes that the features are independent of each
other, and the decision tree assumes that the features are dependent on each
other. So the hybrid approach takes advantage of both approaches. The decision tree is built by considering
only one feature at a time, and the output is fed to the node. Based on the outcome of each node, other
features are selected. In the hybrid approach, the split is done in the same manner, by considering only one
feature at every node, but with naive Bayes classifiers at the leaves. Because data splitting is a vital
task for feature-based classification in large datasets, we implemented the naive Bayes tree classification.

Naive Bayes Classification [20]–[22]: This classification technique is based on the Bayes theorem and works
on the assumption that the existence of one feature is independent of the other features. The advantage of
the Naive Bayes classification is that it requires only a small amount of data to train the model.
Bayes theorem provides a way of calculating the posterior probability (the conditional probability under a
condition assumed to be true) P(c|x) from P(c), P(x), and P(x|c). The posterior probability is calculated
as:

   $P(c \mid x) = \frac{P(x \mid c) \, P(c)}{P(x)}$              (2)

Where:
P(c|x) is the conditional probability of class c given that x has already occurred.
P(c) is the known (prior) probability of the class.
P(x|c) is the conditional probability of x given that c has occurred.
P(x) is the known (prior) probability of the predictor x.

Bayes Network: The naïve Bayes algorithm assumes the independence of features. This hypothesis hampers the
performance of the NB classifier. To improve the performance of the classifier, the Bayes network algorithm
is applied. The network is an acyclic graph that represents the joint probability distribution of the random
variables/features. Each node/vertex of the graph represents a feature, and each edge represents the
correlation between two features. This, in a way, reduces the effect of the hypothesis that the features are
independent of each other. The independence of the features is then evaluated to reduce the number of
parameters needed to calculate the probability distribution and compute the posterior probabilities. The
network encodes a joint probability distribution over a set of random variables, say U. Mathematically, a
Bayesian network is an ordered pair B = (G, Y). The first component, G, is the acyclic graph. In this graph,
the vertices represent the random variables X1, X2, ..., Xn, and the edges represent the relationships
between these variables. The second component, Y, is the set of parameters that quantify the network. It
contains a parameter $Y_{x_i \mid \Pi_{x_i}} = P_B(x_i \mid \Pi_{x_i})$ for each possible value $x_i$ of
$X_i$ and $\Pi_{x_i}$ of $\Pi_{X_i}$, where $\Pi_{X_i}$ denotes the set of parents of $X_i$ in G. A Bayesian
network B defines a unique joint probability distribution over U:

   $P_B(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P_B(X_i \mid \Pi_{X_i})$              (3)

LibLINEAR: LibLINEAR is an open-source library for linear classification. It supports two linear
classifiers: logistic regression and the linear support vector machine. Given a set of instance-label pairs
$(x_i, y_i)$, $i = 1, \ldots, l$, $x_i \in R^n$, $y_i \in \{-1, +1\}$, both methods solve the following
unconstrained optimization problem with different loss functions $\xi(w; x_i, y_i)$:

   $\min_{w} \; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi(w; x_i, y_i)$              (4)

where C is a penalty parameter and C > 0.

3.5      Evaluation Metrics

We have considered four parameters for our paper. In the present work, the prediction class indicates
whether a person with certain attributes has died because of heart disease or not, so the class C in the
confusion matrix (Figure 2) denotes the instances belonging to that class.
TP is the number of people who died because of heart disease and for whom the model predicted the same.
Similarly, TN is the number of people who did not die of a heart ailment and for whom the model predicted
the same. A False Positive (FP) is a Type I error: the model predicted that the person died of the ailment,
but the patient actually did not. A False Negative (FN) is a Type II error: the model predicted that the
person did not die of the ailment, but he/she did.

The accuracy of the model is calculated through the formula given below:

   $\text{Accuracy} = \frac{TP + TN}{\text{Total number of instances}}$              (5)
Recall is the proportion of actual positive instances that are correctly predicted as positive. The formula
is as follows:

   $\text{Recall} = \frac{TP}{TP + FN}$              (6)

Precision is the proportion of predicted positive instances that are actually positive. The formula for
precision is as follows:

   $\text{Precision} = \frac{TP}{TP + FP}$              (7)

Comparing two models becomes difficult when the precision is low and the recall is high, or vice versa; in
such cases the two parameters on their own are not of much use for comparing the models. The F-score is used
instead. It is the harmonic mean of the two values and therefore measures recall and precision at the same
time. The harmonic mean is used instead of the arithmetic mean because the arithmetic mean can be inflated
by a single extreme value, whereas the harmonic mean stays low unless both values are high.

   $\text{F-score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$

 Actual class\Predicted class       C                          Not in C
 C                                  True Positives (TP)        False Negatives (FN)
 Not in C                           False Positives (FP)       True Negatives (TN)
                                         Figure2: Confusion Matrix
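
As a worked illustration of the four measures defined from the confusion matrix, the following Python sketch
computes them from the four cell counts. The counts are hypothetical: they were chosen only so that 237 of
299 instances are classified correctly, which corresponds to an accuracy of 79.26%; how those instances
split into TP and TN is an assumption. The function name is ours, and the sketch assumes all denominators
are non-zero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, recall, precision and F-score from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)          # share of actual positives that were found
    precision = tp / (tp + fp)       # share of predicted positives that are correct
    f_score = 2 * recall * precision / (recall + precision)  # harmonic mean
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
    }


# Hypothetical counts for a 299-record dataset (not taken from the paper).
metrics = classification_metrics(tp=60, fn=36, fp=26, tn=177)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.2f}%")
```

These are the values for the positive class only. A tool that reports class-weighted averages will show a
recall equal to the accuracy, which may be why the two columns agree in the result tables.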



3.6      k-Fold Cross-Validation
Dividing the dataset into k parts of equal size, of which k-1 parts are used for training and the remaining
part is used for evaluation, is termed k-fold cross-validation [23]. The procedure is repeated k times so
that each part serves once as the evaluation set. For instance, with 10-fold cross-validation, 90 percent of
the data is used for training the classifier in each round, and the remaining 10 percent is used for
evaluation.
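
The splitting step can be sketched as follows. This is a minimal illustration in Python, not the WEKA
implementation: the function name is hypothetical, the folds are contiguous index ranges, and no shuffling
or stratification is applied, which a real evaluation tool would normally add.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    # The first n % k folds get one extra instance when n is not divisible by k.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# 299 records and k = 10, as in one of the settings used in this paper.
folds = list(k_fold_indices(n=299, k=10))
for round_no, (train, test) in enumerate(folds, start=1):
    print(f"round {round_no}: {len(train)} training, {len(test)} evaluation instances")
```

With 299 records and k = 10, nine folds hold 30 instances and one holds 29, so each round trains on roughly
90 percent of the data.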

4.      Results and Discussion

The five chosen classification algorithms were implemented on the heart disease dataset of the UCI
repository. The experimental results were obtained with WEKA 3.8.4. We used k = 5, 10, and 20 for
cross-validation and evaluated the four parameters described above for the five classification algorithms.
Table 2 presents the accuracy, F-measure, precision, and recall obtained with 5-fold cross-validation.
Similarly, Table 3 and Table 4 show the results with 10-fold and 20-fold cross-validation. Table 5 presents
the results when 66% of the data is used for training the system and the remaining 34% for evaluating the
results.
From the results, the Bayesian network performs best overall. It has the highest accuracy and F-measure in
every setting, although its accuracy is tied with classification via regression for k = 20 and with
LibLINEAR for the percentage split, where LibLINEAR also reaches a slightly higher precision (74.8% against
74.3%). The Bayesian network uses an acyclic graph in which each node represents a feature and each edge
represents its relation with other features. In the present work, features such as age, gender, blood
pressure, and diabetes contribute towards heart disease [24], which explains why this classifier is more
accurate than the others. This supports our hypothesis that, when features such as age and gender are
modelled in the form of a graph (where they depend on each other), the heart-related ailment depends on
these factors. So we can use this technique for the prediction of heart disease [25].
                      Table 2: Performance Comparison of classifiers (k=5)
 Algorithms                      Accuracy (%) F-Measure (%)       Precision (%)         Recall (%)
 LibLINEAR                          74.24         73.1                73.1                74.2
 Naïve Bayes                        77.92         77.6                77.4                77.9
 NB Tree                            73.91         73.1                72.9                73.9
 Bayes Net                          78.59         78.7                78.8                78.6
 Classification via Regression      71.90         70.1                70.2                71.9

                       Table 3: Performance Comparison of classifiers (k=10)
 Algorithms                      Accuracy (%) F-Measure (%)       Precision (%)         Recall (%)
 LibLINEAR                          76.58         75.6                75.7                76.6
 Naïve Bayes                        77.92         77.8                77.7                77.9
 NB Tree                            77.59         77.4                77.3                77.6
 Bayes Net                          79.26         79.5                79.8                79.3
 Classification via Regression      75.25         73.4                74.2                75.3

                      Table 4: Performance Comparison of classifiers (k=20)
 Algorithms                     Accuracy (%) F-Measure (%)       Precision (%)          Recall (%)
 LibLINEAR                          72.90         71.9                71.7                72.9
 Naïve Bayes                        75.25         75.2                75.1                75.3
 NB Tree                            74.91         74.7                74.5                74.9
 Bayes Net                          75.91         76.2                76.7                75.9
 Classification via Regression      75.91         74.3                 75                 75.9

              Table 5: Performance Comparison of classifiers (percentage split= 66%)
 Algorithms                    Accuracy (%) F-Measure (%)         Precision (%)     Recall (%)
 LibLINEAR                        74.50            73                 74.8            74.5
 Naïve Bayes                      72.54           71.8                 72             72.5
 NB Tree                          72.54           71.8                 72             72.5
 Bayes Net                        74.50           74.4                74.3            74.5
 Classification via Regression    73.52           73.7                 74             73.5
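
The comparison across Tables 2 to 5 can be checked mechanically. The Python sketch below copies the values
from the four tables and lists, for every setting and measure, which classifier or classifiers reach the
top value. The data structure and function name are ours and serve only as an illustration.

```python
# (accuracy, F-measure, precision, recall) in percent, copied from Tables 2 to 5.
RESULTS = {
    "k=5": {
        "LibLINEAR": (74.24, 73.1, 73.1, 74.2),
        "Naive Bayes": (77.92, 77.6, 77.4, 77.9),
        "NB Tree": (73.91, 73.1, 72.9, 73.9),
        "Bayes Net": (78.59, 78.7, 78.8, 78.6),
        "Classification via Regression": (71.90, 70.1, 70.2, 71.9),
    },
    "k=10": {
        "LibLINEAR": (76.58, 75.6, 75.7, 76.6),
        "Naive Bayes": (77.92, 77.8, 77.7, 77.9),
        "NB Tree": (77.59, 77.4, 77.3, 77.6),
        "Bayes Net": (79.26, 79.5, 79.8, 79.3),
        "Classification via Regression": (75.25, 73.4, 74.2, 75.3),
    },
    "k=20": {
        "LibLINEAR": (72.90, 71.9, 71.7, 72.9),
        "Naive Bayes": (75.25, 75.2, 75.1, 75.3),
        "NB Tree": (74.91, 74.7, 74.5, 74.9),
        "Bayes Net": (75.91, 76.2, 76.7, 75.9),
        "Classification via Regression": (75.91, 74.3, 75.0, 75.9),
    },
    "split=66%": {
        "LibLINEAR": (74.50, 73.0, 74.8, 74.5),
        "Naive Bayes": (72.54, 71.8, 72.0, 72.5),
        "NB Tree": (72.54, 71.8, 72.0, 72.5),
        "Bayes Net": (74.50, 74.4, 74.3, 74.5),
        "Classification via Regression": (73.52, 73.7, 74.0, 73.5),
    },
}
MEASURES = ("accuracy", "f_measure", "precision", "recall")


def best_per_measure(table):
    """Map each measure to the sorted list of classifiers sharing its top value."""
    best = {}
    for i, measure in enumerate(MEASURES):
        top = max(scores[i] for scores in table.values())
        best[measure] = sorted(name for name, scores in table.items() if scores[i] == top)
    return best


for setting, table in RESULTS.items():
    print(setting, best_per_measure(table))
```

Listing ties explicitly makes it visible where two classifiers share the top accuracy, which a single
"best classifier" summary would hide.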

5.      Conclusion and Future Scope

In this paper, five data mining classifiers (LibLINEAR, Naive Bayes, Naive Bayes tree, Bayes network, and
classification via regression) have been implemented on heart disease data taken from the UCI repository.
The goal of this experiment is to measure how accurately the death of heart disease patients can be
predicted. We achieved the highest accuracy of 79.26% with the Bayesian network classifier, followed by
naive Bayes. The reason behind the strong performance of the Bayesian network is its use of graphs, which
better reflect the relationships between dependent variables such as those in our dataset, like smoking
habit, diabetes, and high BP. Hence, we get better accuracy, and the results indicate that these factors
contribute to heart disease occurrence.
In the future, these results could be used to design an effective prediction system that helps medical
practitioners diagnose and treat heart disease. We could also apply these data mining techniques to other
diseases such as diabetes.

References
[1]      S. Gupta, D. Kumar, and A. Sharma, “Performance analysis of various data mining classification techniques on healthcare data,” Int. J. Comput. Sci. Inf. Technol., vol. 3, no. 4, pp. 155–169, 2011.
[2]      J. Soni, U. Ansari, D. Sharma, and S. Soni, “Predictive data mining for medical diagnosis: An overview of heart disease prediction,” Int. J. Comput. Appl., vol. 17, no. 8, pp. 43–48, 2011.
[3]      C. S. Dangare and S. S. Apte, “Improved study of heart disease prediction system using data mining classification techniques,” Int. J. Comput. Appl., vol. 47, no. 10, pp. 44–48, 2012.
[4]      S. Sa, “Intelligent heart disease prediction system using data mining techniques,” Int. J. Healthc. Biomed. Res., vol. 1, pp. 94–101, 2013.
[5]      S. Nazir, S. Shahzad, S. Mahfooz, and M. Nazir, “Fuzzy logic based decision support system for component security evaluation,” Int. Arab J. Inf. Technol., vol. 15, no. 2, pp. 224–231, 2018.
[6]      G. Guidi, M. C. Pettenati, P. Melillo, and E. Iadanza, “A machine learning system to improve heart failure patient assistance,” IEEE J. Biomed. Heal. informatics, vol. 18, no. 6, pp. 1750–1756, 2014.
[7]      R. Detrano et al., “International application of a new probability algorithm for the diagnosis of coronary artery disease,” Am. J. Cardiol., vol. 64, no. 5, pp. 304–310, 1989.
[8]      M. Gudadhe, K. Wankhade, and S. Dongre, “Decision support system for heart disease based on support vector machine and artificial neural network,” in 2010 International Conference on Computer and Communication Technology (ICCCT), 2010, pp. 741–745.
[9]      H. Kahramanli and N. Allahverdi, “Design of a hybrid system for the diabetes and heart diseases,” Expert Syst. Appl., vol. 35, no. 1–2, pp. 82–89, 2008.
[10]     S. Palaniappan and R. Awang, “Intelligent heart disease prediction system using data mining techniques,” in 2008 IEEE/ACS international conference on computer systems and applications, 2008, pp. 108–115.
[11]     E. O. Olaniyi, O. K. Oyedotun, and K. Adnan, “Heart diseases diagnosis using neural networks arbitration,” Int. J. Intell. Syst. Appl., vol. 7, no. 12, p. 72, 2015.
[12]     A. K. Garate-Escamilla, A. H. E. L. Hassani, and E. Andres, “Classification models for heart disease prediction using feature selection and PCA,” Informatics Med. Unlocked, p. 100330, 2020.
[13]     M. Shamsollahi, A. Badiee, and M. Ghazanfari, “Using combined descriptive and predictive methods of data mining for coronary artery disease prediction: a case study approach,” J. AI Data Min., vol. 7, no. 1, pp. 47–58, 2019.
[14]     J. Patel, D. TejalUpadhyay, and S. Patel, “Heart disease prediction using machine learning and data mining technique,” Hear. Dis., vol. 7, no. 1, pp. 129–137, 2015.
[15]     I. A. Zriqat, A. M. Altamimi, and M. Azzeh, “A comparative study for predicting heart diseases using data mining classification methods,” arXiv Prepr. arXiv1704.02799, 2017.
[16]     G. J. Davide Chicco, “Heart Failure Prediction,” 2015. https://www.kaggle.com/andrewmvd/heart-failure-clinical-data (accessed Nov. 10, 2020).
[17]     F. E. Harrell, “Ordinal logistic regression,” in Regression modeling strategies, Springer, 2015, pp. 311–325.
[18]     V. Vapnik, The nature of statistical learning theory. Springer science & business media, 2013.
[19]     K. Larsen, J. H. Petersen, E. Budtz-Jørgensen, and L. Endahl, “Interpreting parameters in the logistic regression model with random effects,” Biometrics, vol. 56, no. 3, pp. 909–914, 2000.
[20]     L. Li, Y. Wu, and M. Ye, “Experimental comparisons of multi-class classifiers,” Informatica, vol. 39, no. 1, 2015.
[21]     P. Ahmad, S. Qamar, and S. Q. A. Rizvi, “Techniques of data mining in healthcare: a review,” Int. J. Comput. Appl., vol. 120, no. 15, 2015.
[22]     S. S. Nikam, “A comparative study of classification techniques in data mining algorithms,” Orient. J. Comput. Sci. Technol., vol. 8, no. 1, pp. 13–19, 2015.
[23]     V. Madaan and A. Goyal, “Predicting Ayurveda-Based Constituent Balancing in Human Body Using Machine Learning Methods,” IEEE Access, vol. 8, pp. 65060–65070, 2020, doi: 10.1109/ACCESS.2020.2985717.
[24]     V. Madaan and A. Goyal, “Analysis and Synthesis of a Human Prakriti Identification System Based on Soft Computing Techniques,” Recent Patents on Computer Science, vol. 12, no. 1, pp. 1–10, 2019, doi: 10.2174/2213275912666190207144831.
[25]     P. Agrawal, V. Madaan, and V. Kumar, “Fuzzy Rule Based Medical Expert System to Identify the Disorders of Eyes, ENT and Liver,” International Journal of Advanced Intelligence Paradigm (IJAIP), vol. 7, no. 3-4, pp. 352–367, Inderscience Publications, 2015.