<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Analysis of Machine Learning Algorithms for Classification and Prediction of Heart Disease</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Nataliya</forename><surname>Boyko</surname></persName>
							<email>nataliya.i.boyko@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>Profesorska Street 1</addrLine>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Iryna</forename><surname>Dosiak</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>Profesorska Street 1</addrLine>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Analysis of Machine Learning Algorithms for Classification and Prediction of Heart Disease</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B52CA246F5797631B7BCB96FD6D4B772</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T01:38+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Model</term>
					<term>classification</term>
					<term>machine learning</term>
					<term>algorithm</term>
					<term>Bayes classifier</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The study aims to improve the effectiveness of health care in various ways. The paper considers ML algorithms that allow health professionals to allocate resources optimally and physicians to choose the best treatment options for patients. This approach will reduce the burden on doctors and increase and accelerate patients' access to health care, save resources and reduce costs. The paper presents the results of research that will allow the use of smaller data sets to develop transparent models. The report uses a naive Bayes classifier to predict heart disease. The advantage of this approach is that the sample size requirements are reduced from exponential to linear, which is very important. There is an overview of the classification model, its advantages and disadvantages. Materials and methods are also analyzed.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Machine Learning (ML) algorithms allow healthcare professionals to allocate resources optimally and physicians to choose the best treatment options for patients. This approach reduces the burden on doctors, increases and accelerates patients' access to health care, saves resources, and reduces costs. However, despite the achievements of ML research in medicine, its role is currently limited. Creating and testing a model may require large amounts of high-quality data. Besides, diagnostic models must be built individually for each disease. It is a lengthy process. In addition, the psychological aspect of trusting black box algorithms can also be difficult to perceive. However, continuing ML research may allow using smaller data sets and developing more transparent models <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b12">13]</ref>.</p><p>The nature of heart disease is complex. In addition, the diagnosis of heart disease in most cases depends on a complex combination of clinical and pathological data. The relationship between the real cause of the disorder and the effects of spontaneous symptoms in patients can often be hidden and not obvious <ref type="bibr" target="#b5">[6]</ref>.</p><p>That is why the analysis of medical data in health care is considered an important but complex task that must be performed accurately and effectively. In addition, the study of medical data is necessary to avoid medical error.</p><p>The basis of medical diagnosis is the problem of classification. The diagnosis comes down to the problem of displaying data to one of N different results.</p><p>The study aims to apply and implement the original Naive Bayes model with two existing models: the Gaussian model and the Multinomial model. 
This study will focus on a comparative analysis of the differences, capabilities, and effectiveness of the classifier with these models.</p><p>The purpose of classifying heart disease is to diagnose a disease in a patient based on specific diagnostic measurements included in the data set. In addition, the work involves searching for significant features and patterns among the various factors influencing the diagnosis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Review of literature sources</head><p>For a detailed study of these tasks, you need to read and analyze the experience of scientists in this field. Since the problem is relevant, numerous studies have been conducted that have focused on diagnosing heart disease in combination with or without another condition.</p><p>• G. Parthiban, A. Rajesh, S.K. Srivatsa predicted the chances of people with diabetes having heart disease and highlighted the results in their article "Diagnosis of Heart Disease for Diabetic Patients using Naive Bayes Method," published in the International Journal of Computer Applications <ref type="bibr" target="#b0">[1 ]</ref>. The accuracy was 74%. • Mrs. Mr. Subbalakshmi, Mr. K. Ramesh M. Tech, Mr. M. Chinna Rao M.Tech developed a system that extracts hidden knowledge from a historical heart disease database using a Naive Bayes classification <ref type="bibr" target="#b1">[2]</ref>. The article "Decision Support in Heart Disease Prediction System using Naive Bayes" was published in the Indian Journal of Computer Science and Engineering». • Jyoti Soni, Ujma Ansari, Dipesh Sharma, Sunita Soni conducted a study and compared KNN and the Naive Bayes classifier to predict heart disease <ref type="bibr" target="#b2">[3]</ref>. However, the accuracy of the results reached 45.6% for KNN and 52.33% in the case of the Naive Bayes classifier. Their article "Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction" was published in the International Journal of Computer Applications. In the end, they added the need to improve the proposed study. • Vincy Cherian and Bindu M.S developed a heart disease prediction system using a Naive Bayes classifier and a Laplace smoothing technique <ref type="bibr" target="#b3">[4]</ref>. They reported this in their article "Improved Study of Heart Disease Prediction System using Data Mining Classification Techniques." 
They achieved high accuracy; however, the system is limited in the number of attributes (symptoms). Unfortunately, searches for such studies among Ukrainian sources did not yield any results. Thus, existing studies only demonstrate the effectiveness of predicting heart disease using ML methods. This study aims to find features and patterns among the different factors that affect the diagnosis using a Naive Bayes classifier.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods overview</head><p>Classification solves the following problem: let there be a set of objects divided into classes on one or more grounds. Moreover, a finite set of objects is given, for which it is known to which classes they belong. Such a set is considered to be a training sample. It is unknown to which class the other objects belong. We need to build an algorithm that can classify any object of the source set -specify the number or name of the class to which it belongs <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b10">11]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">A mathematical formulation of the classification problem</head><p>Let X be a set of object descriptions, and Y be class numbers or names. There is an unknown target relationship -mapping , the values of which are known only on the elements of the finished training sample . We need to build an algorithm a , that can classify an arbitrary object <ref type="bibr" target="#b11">[12]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Bayes classifier</head><p>Bayes classifier -provides a classification with a degree of confidence rather than simply issuing the most plausible class. Bayes' theorem is used to determine the degree of certainty.</p><p>Bayes' theorem describes the probability of an event, given the circumstances that may affect the event. Thus, you can more accurately calculate the probability, considering both already known information and data from new observations <ref type="bibr" target="#b13">[14]</ref>.</p><p>A Naive Bayes classifier is an assumption about the independence of traits. In other words, the NCB assumes that any attribute in the class is not related to the presence of any other feature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Method overview</head><p>As mentioned, the Bayes classifier is based on the Bayes theorem, which describes the probability of an event, given the circumstances that may affect the event <ref type="bibr" target="#b13">[14]</ref>.</p><p>Suppose there is a symptom S. In addition, there are classes (diseases) C, which should include the symptom. It is necessary to find a class (disease) C in which the probability for this line would be maximum. The mathematical notation is given in Formula 1.</p><p>(1)</p><p>It is hard to calculate P(C|O). However, you can use Bayes' theorem and go to (Formula 2):</p><formula xml:id="formula_0">, (<label>2</label></formula><formula xml:id="formula_1">)</formula><p>where P(С) -an a priori probability, the probability of meeting a class among all the data; P(O|C) -conditional probability, the probability of symptoms in each class; P(O) -total probability, probability of symptoms. Usually, it makes no sense to work with one symptom. It is much more effective to detect the disease on several grounds. Thus Formula 2 will take the form (Formula 3):</p><p>(3)</p><p>Since you need to find the function's maximum, the denominator can be ignored (this is a constant). It is also necessary to include a "naive" assumption that the symptoms of S depend only on class C and do not depend on each other. Then the numerator will take the form (Formula 4):</p><p>(4) So, the final formula will look like (Formula 5):</p><p>(5</p><formula xml:id="formula_2">)</formula><p>So it all comes down to calculating the probability P(C) and P(S|C). Calculating these parameters is called classifier training.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Multinomial Naive Bayes</head><p>Multinomial Naive Bayes implements a Naive Bayes algorithm for multinomial distributed data and is one of two classic variants of Naive Bayes <ref type="bibr" target="#b7">[8]</ref>.</p><p>This algorithm puts forward a second assumption of independence -the assumption of positional independence. Conditional probabilities of symptom onset are equally independent of its position in the data sample <ref type="bibr" target="#b8">[9]</ref>.</p><p>The data is usually presented as a vector. The basic idea is that each unique feature (symptom) that occurs is assigned a unique integer. Therefore the data can be represented as a sequence of numbers.</p><p>The distribution of the number of vectors is parameterized by vectors for each class, where n -number of features (symptom), and the probability of the appearance in the sample of features belonging to class C.</p><p>The parameter is estimated by the smoothed version of the maximum probability. The relative frequency calculation (Formula 6):</p><formula xml:id="formula_3">, (<label>6</label></formula><formula xml:id="formula_4">)</formula><p>where -the number of times the 𝑖 character appears in a class C sample in the training set.;</p><p>-the total number of all features (symptoms) for class C; A -Laplace smoothing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Review and analysis of data</head><p>The data set about heart disease "heart.csv" is used for research <ref type="bibr" target="#b5">[6]</ref>. It was taken from Kaggle. This database contains 76 attributes, but all published experiments involve using a subset of 14 of them, as the rest of the information is the identification of individuals. The total number is 303 rows and 14 columns, of which 165 have heart disease <ref type="bibr" target="#b6">[7]</ref>.</p><p>Attribute information: 1. age;    As can be seen from this section, most values are usually categorized. All columns have no spaces, contain 303 rows of data.</p><p>An analysis of atypical emissions should also be conducted. To do this, use a standardized Z-Score score, which shows how many standard deviations is the scatter of the value relative to the observed average value. If the Z-Score value is greater than or less than 3 or -3, respectively, this data point will be defined as non-standard (Fig. <ref type="figure">4</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 4: Z-Score for atypical data</head><p>Fig. <ref type="figure">5</ref> shows that this data set contains two emissions. Let's try to visualize them. For this purpose, it is necessary to construct the box diagram to visualize atypical data (Fig. <ref type="figure">5</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 5: Visualization of atypical data in a dataset</head><p>Because only two sets of data that differed from the others were identified, so they were removed from the sample. This will help achieve better results in predicting heart disease.</p><p>The next step is to review the number of existing or absent diseases. To do this, determine the average number of different values for prediction by columns (Fig. <ref type="figure" target="#fig_4">6</ref>). Target variable: whether the patient has heart disease or not (value 0 -yes; value 1 -no). Fig. <ref type="figure" target="#fig_4">6</ref> shows that the distribution is balanced.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Search for the correlation of heart disease with different parameters</head><p>To find the links of heart disease with different parameters, we need to build a correlation matrix (Fig. <ref type="figure">7</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure7: Correlation matrix</head><p>Fig. <ref type="figure">7</ref> shows certain relationships between the features. It is first necessary to determine the difference between the correlation coefficients in men and women. The results are shown in Fig. <ref type="figure">8</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 8: Difference of correlation coefficients for different sexes</head><p>Figure <ref type="figure">8</ref> shows that all coefficients, except for the target variable, differ between men and women. The most noticeable difference for trestbps. This is the resting blood pressure in millimeters of mercury.</p><p>Most people have normal blood pressure in certain groups (these can be healthy adults, adults taking medication, the elderly). It also appears that very high blood pressure may indicate heart disease <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>.</p><p>Observations follow from the obtained results: 1. Age is negatively correlated with heart disease. Because older people are more likely to have heart disease, they are more likely to have a health check-up, even if they have mild or no symptoms. Young people go for a health check only when they have apparent symptoms. That is why they are more often diagnosed with heart disease. 2. Cholesterol and fasting blood glucose levels have little correlation with heart disease. 3. Chest pain (cp), maximal pulse (thalach), a tilt of the ST segment in the ECG are positively correlated with heart disease. 4. Exercise angina (exang), ST depression caused by exercise (oldpeak), the number of major vessels (0-3) stained with fluoroscopy (ca) are negatively correlated with heart disease. Moreover, in all these ratios, the correlation is lower for men than for women.</p><p>5. Trestbps (resting blood pressure) and fbs (fasting blood sugar) are negatively correlated. Moreover, the correlation is lower for women compared to men. For these observations, the accuracy of the conclusions should be checked, taking into account the distribution of data between men and women (Fig. <ref type="figure">9</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 9: Data distribution between women and men</head><p>Fig. <ref type="figure">9</ref> shows that women account for about half of the observations than men. You can also see that gender is a risk factor. Also, to verify the above statements, you should visualize the presence or absence of the disease depending on the age range (Fig. <ref type="figure" target="#fig_5">10</ref>).   In Fig. <ref type="figure" target="#fig_7">12</ref>, the x-axis represents the resting blood pressure in millimeters of mercury. The y-axis represents the density estimate. Yellow indicates the absence of the disease, red -the presence. The relationship between blood pressure and the female sex is on the left, on the right -between blood pressure and male.</p><p>In Fig. <ref type="figure" target="#fig_8">13</ref>. presents the presence or absence of the disease, taking into account only one featureone attribute. From Fig. <ref type="figure" target="#fig_8">13</ref>, the following observations follow: 6. The number of major vessels stained with fluoroscopy refers to the number of narrow vessels seen, so the higher the value of this feature, the greater the likelihood of heart disease. 7. A very invasive process for patients obtains the results of blood flow observed through the radioactive dye. But in themselves, they are excellent evidence of heart disease or not. 8. The slope of the ST segment can help determine if you have heart disease or not if it is flat or growing. 9. Angina is a good indicator of heart disease. However, we can also see that knowing what angina is and what it is not an easy task can be confused with other pains or atypical angina. 10. When someone has heart disease, the first symptom is usually stable angina (angina during exercise). When angina occurs even at rest, the condition worsens (typically narrowing the coronary arteries). 
That is why so few patients show abnormal heart rhythms at rest, and seeing this anomaly is highly indicative of the presence of heart disease.</p><p>11. On the other hand, a value of 0, the probable presence of hypertrophy, does not in itself indicate the presence of heart disease. 12. The blood sugar feature, by itself, does not give confidence in the presence or absence of heart disease. However, we will not abandon this feature, as it can be helpful together with other variables. 13. Chest pain also does not give an unambiguous answer. It is challenging to tell whether a patient has heart disease based only on these symptoms. To verify the accuracy of the conclusions, PCA should be used, which helps extract a small set of variables from an existing large set of variables. These extracted variables are called principal components.</p><p>Because the data set is small and does not have many features, only two components are used to see how much variance they cover.</p><p>The study can explain approximately 90% of the variance in the data set using only two components. Fig. <ref type="figure" target="#fig_9">14</ref> presents each of these decomposed components. Component 1: Fig. <ref type="figure" target="#fig_9">14</ref> shows that the weight is considerable and positive for chol and slightly positive for sex and cp. This means that patients with a high value of this component will have a meager chance of being diagnosed with heart disease. At the same time, people with more elevated serum cholesterol are more likely to be diagnosed with heart disease.</p><p>Component 2: Fig. <ref type="figure" target="#fig_9">14</ref> shows that the weight is considerable and negative for thalach (maximum heart rate reached) and slightly negative for cp (type of chest pain), chol (serum cholesterol), and slope (slope of the peak exercise ST segment).</p><p>Thus, high values of thalach, cp, slope, and chol mostly do not indicate heart disease.
People with high values of these components are much less likely to have heart disease. In contrast, age and high resting blood pressure (trestbps) may be among the first features of heart disease; in Fig. <ref type="figure" target="#fig_9">14</ref>, their weights are positive.</p></div>
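The two-component variance check described above can be sketched with scikit-learn's PCA. The feature matrix below is a synthetic stand-in for the heart data (five columns sharing one underlying factor), so the exact explained-variance figure is illustrative, not the paper's 90%.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the feature matrix: five columns that share one
# underlying factor, so two components should cover most of the variance.
rng = np.random.default_rng(1)
base = rng.normal(size=(100, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(100, 1)) for _ in range(5)])

# Keep only two principal components and measure the variance they explain,
# as done in the paper before inspecting the component weights (Fig. 14).
pca = PCA(n_components=2)
pca.fit(X)
explained = float(pca.explained_variance_ratio_.sum())
```

The component weights the paper discusses correspond to `pca.components_`, one row per extracted component.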
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Application of the Naive Bayes classifier</head><p>The next step is to divide the data into training and test in 80% to 20%. You should also normalize the data with OneHotEncoder and MinMaxScaler <ref type="bibr" target="#b9">[10]</ref>.</p><p>OneHotEncoder -a strategy in which each value of the category is converted into a new column, and it is assigned a value of 1 or 0 (notation for true/false). Fig. <ref type="figure" target="#fig_10">15</ref> shows an example of the strategy. For each value in the object, MinMaxScaler subtracts the minimum value and then divides it by range. The range is the difference between the initial maximum and the initial minimum. MinMaxScaler retains the shape of the original distribution.</p><p>After normalization, the classification should be performed. To implement the classification, you need to use GuassianNB from the sklearn library with different types of states when sharing data.</p><p>The score function from the sklearn library is used to evaluate the results, which returns the average accuracy of the given test data and labels. The results obtained are presented in Fig. <ref type="figure" target="#fig_4">16</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 16: Average classification scores using all attributes</head><p>Fig. <ref type="figure" target="#fig_4">16</ref> x-axis indicates the number of random states, y-axis -the average score for this method. Fig. <ref type="figure" target="#fig_4">16</ref> shows that the estimate ranges from 0.5 to 1.</p><p>Thus, the average estimate of the Naive Bayes classifier for random states from 0 to 200 is 0.844262295081968.</p><p>To illustrate the performance of the algorithm should build a matrix of inconsistencies (confusion matrix).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 17: Matrix of discrepancies</head><p>Figure <ref type="figure" target="#fig_1">17</ref> shows four different results: true positive, false positive, true negative, and false negative.</p><p>From the correlation matrix, you can determine the accuracy or positive predictive value (precision), the probability of detection (recall), and the completeness of the definition (f1_score).</p><p>• TP -true-positive decision;</p><p>• TN -true-negative decision;</p><p>• FP -false-positive decision;</p><p>• FN -false-negative decision. The next step is to use the metrics for this method. The results are shown in Fig. <ref type="figure" target="#fig_1">18</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 18: Measures of accuracy</head><p>We need to reduce the number of attributes to 10. To do this, we need to remove the parameters that have the most negligible impact on heart disease and apply the Naive Bayes classifier again. The results of the experiment are shown in Fig. <ref type="figure" target="#fig_11">19</ref>.  Thus, the average estimate of the Naive Bayes classifier for random states from 0 to 200 is 0.765245901639342.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Application of the Multinomial Naive Bayes classifier</head><p>To implement the classification, you should use MultinomialNB from the sklearn library with different states when sharing data.</p><p>The score function from the sklearn library is used to evaluate the results.</p><p>The obtained results are presented in Fig. <ref type="figure" target="#fig_2">21</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 21: Average classification scores using all attributes</head><p>In Fig. <ref type="figure" target="#fig_2">21</ref> x-axis indicates the number of random states, y-axis -the average score for this method. The figure shows that the score ranges from 0.7 to 1.</p><p>Thus, the average score of the Multinomial Naive Bayes classifier for random states from 0 to 200 is 0.850129016334426.</p><p>To illustrate the algorithm's performance, you need to build a matrix of inconsistencies (confusion matrix) (Fig. <ref type="figure" target="#fig_2">22</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 22: Matrix of discrepancies</head><p>It is also necessary to determine the accuracy of Fig. <ref type="figure" target="#fig_13">23</ref>.   Therefore, the average estimate of the Multinomial Naive Bayes classifier for random states from 0 to 200 is 0.83010.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion of experimental results</head><p>To study the accuracy of the two classification models, we use a set of data on heart disease. Table <ref type="table" target="#tab_0">1</ref> summarizes the characteristics of the data set used in the experiments. Table <ref type="table" target="#tab_1">.</ref> 1 mentions the features that have been tested to training algorithms. The data set was initially studied using 14 features. Subsequently, ten features were selected and rejected those that had the least impact on heart disease. And finally, seven features. The training of the three classification models are given in Table <ref type="table" target="#tab_1">.</ref> 2. We chose an 80:20 ratio because the Naive Bayes classifier could not benefit from retraining the data.   From Fig. <ref type="figure" target="#fig_17">26</ref>, it is noticeable that both methods show worse results when the number of features decreases. This is because we first rejected the symptoms, which had little effect on heart disease. Therefore, the accuracy is similar.</p><p>In addition, we can notice that the Multinomial classifier shows much better results when reducing the number of features. This advantage is because this method makes the second assumption of positional independence. Conditional probabilities of symptom onset are equally independent of its position in the data sample.</p><p>Let's take into account the nature of the chosen problem in the study, namely the values of "0" and "1" in the answers of the classifiers. We can conclude that the correct classification of first-class objects is, in our case, more critical. After all, it is better to do all the tests once again for a healthy person than not to recognize the disease in a sick person.</p><p>That is why it is worth emphasizing the recall score when comparing models with each other and choosing the best one. 
It estimates the proportion of correctly classified first-class objects. In addition, it is necessary to report the positive predictive value (precision) and the f1_score (Table <ref type="table" target="#tab_5">4</ref>). Analyzing the results, we can conclude that with the Multinomial Bayes classifier, the number of sick patients in whom the disease is detected is greater. Using this classification method, more people will receive a correct diagnosis and, therefore, will have a chance for treatment and recovery.</p><p>We also compare the running time of the two classification methods, namely the training time (Table <ref type="table">5</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 5 Division of data into test and training</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>Time, s 14 features 10 7 features GaussianNB 0,01301 0,00902 0,00603 MultinomialNB 0,01196 0,00881 0.00399 Table <ref type="table">5</ref> shows the execution time of the classification of different Naive Bayes models. On the same data set, MultinomialNB performs training faster, which again emphasizes its advantage for the selected data set.</p><p>It is also noticeable that as the number of features decreases, the time decreases (Fig. <ref type="figure" target="#fig_19">28</ref>). Analyzing Fig. <ref type="figure" target="#fig_19">28</ref>, we can conclude that the Multinomial Bayes classifier is more accurate and faster for the selected data set.</p><p>So, the choice of using the Naive Bayes method depends on the data. The Multinomial Naive Bayes is appropriate if the data consists of calculations, and observations can only take non-negative integers. It is better to use the Gaussian NB for decimal features. GNB accepts features that correspond to the normal distribution.</p><p>For the selected data set, which contains features for diagnosing heart disease, the Multinomial Naive Bayes showed better results. Using this method, we can achieve greater accuracy and reduce the time to perform training.</p><p>Analyzing the study results, it is worth emphasizing the importance of choosing the correct method of the naive classifier. It helps achieve better classification results, which is critical in the medical field.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>The paper considered the relevance of using data mining methods for diagnosing a patient's disease from a set of indicators such as symptoms, test results, and other measurements.</p><p>We used the Heart data set for the study, which we cleaned of outliers and Null values and then normalized. We also searched for and analyzed significant features and patterns among the different factors influencing heart disease.</p><p>In addition, we used two algorithms in this work, which objectively showed the classification results on the selected data set.</p><p>The parameters varied in the analysis were feature selection and removal. We first tested each classifier with all the features and then gradually reduced the set to determine which algorithm classifies best with fewer features.</p><p>The simulation results show that the Multinomial Naive Bayes classifier achieves better accuracy than the Gaussian variant on the same data set with the same parameters. In addition, it reduces training time, which matters because the volume of medical data grows rapidly every year.</p><p>In future work, two aspects are worth considering. First, we can compare more algorithms to achieve better results and potentially introduce an improved Naive Bayes variant. Second, we can evaluate the effectiveness of these methods to justify their use in the health care system.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>2. sex: (1 = male; 0 = female); 3. cp: chest pain type (4 values); 4. trestbps: resting blood pressure (in mm Hg on admission to the hospital); 5. chol: serum cholesterol in mg/dl; 6. fbs: fasting blood sugar (1 = &gt;120 mg/dl; 0 = &lt;120 mg/dl); 7. restecg: resting electrocardiographic results (values 0, 1, 2); 8. thalach: the maximum heart rate achieved; 9. exang: exercise-induced angina (1 = yes; 0 = no); 10. oldpeak: ST depression induced by exercise relative to rest; 11. slope: the slope of the peak exercise ST segment; 12. ca: the number of major vessels (0-3) stained by fluoroscopy; 13. thal: thalassemia (1 = normal; 2 = fixed defect; 3 = reversible defect); 14. target: (1 = heart disease; 0 = no heart disease). Fig. 1, Fig. 2, and Fig. 3 show the dataset.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Image of the first five rows of data</figDesc><graphic coords="4,140.40,422.40,328.20,79.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Attributes overview. From Fig. 2, we can see that categorical data types are absent: all attributes are numeric, of type int and float.</figDesc><graphic coords="4,225.00,541.00,159.00,114.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The main data characteristics. As can be seen from this section, most values are essentially categorical. No column has missing values, and each contains 303 rows of data. An analysis of atypical outliers should also be conducted. To do this, we use the standardized Z-score, which shows how many standard deviations a value lies from the observed mean. If the Z-score of a data point is greater than 3 or less than -3, the point is flagged as an outlier (Fig. 4).</figDesc><graphic coords="5,89.80,72.00,429.60,103.20" type="bitmap" /></figure>
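The Z-score rule described above can be sketched as follows; the data here are hypothetical blood-pressure-like values with one planted outlier, not the real Heart columns:

```python
import numpy as np

def zscore_filter(values, threshold=3.0):
    """Keep points whose Z-score magnitude stays below the threshold."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return values[np.abs(z) < threshold]

# Hypothetical sample: 100 values around 130 mm Hg plus one gross outlier.
rng = np.random.default_rng(1)
data = np.append(rng.normal(130.0, 10.0, size=100), 500.0)

clean = zscore_filter(data)  # the 500.0 point exceeds |Z| = 3 and is dropped
```

Note that with very small samples the maximum attainable |Z| is bounded, so the 3-standard-deviation cutoff only works reliably once the sample is reasonably large, as it is here (303 rows).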
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: An image of the mean values for the column that determines the presence or absence of the disease. Target variable: whether the patient has heart disease (value 0 = yes; value 1 = no). Fig. 6 shows that the distribution is balanced.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 10 :</head><label>10</label><figDesc>Figure 10: The amount of data corresponding to each age. In Fig. 10, the x-axis indicates the age of patients and the y-axis the number of patients of a given age. The graph shows that the youngest patient is 22 and the oldest is 77; the most common patient age is 58. There are few patients under 40 or over 70, so the age distribution is grouped as shown in Fig. 11.</figDesc><graphic coords="7,135.40,324.00,338.40,115.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 11 :</head><label>11</label><figDesc>Figure 11: The presence or absence of the disease in different age categories. Fig. 11 shows the age distribution in the ranges 0-40, 40-50, 50-60, 60-70, and 70-100 years. Green shows the presence of the disease, red the absence. The age ranges are arranged along the x-axis and the number of patients along the y-axis. In Fig. 12, the x-axis represents the age of the patients and the y-axis the density estimate. Yellow indicates the absence of the disease, red the presence. The relationship between age and the female sex is on the left, between age and the male sex on the right.</figDesc><graphic coords="7,202.00,528.80,205.20,137.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 12 :</head><label>12</label><figDesc>Figure 12: Relationship between blood pressure and sex. In Fig. 12, the x-axis represents the resting blood pressure in millimeters of mercury and the y-axis the density estimate. Yellow indicates the absence of the disease, red the presence. The relationship between blood pressure and the female sex is on the left, between blood pressure and the male sex on the right. Fig. 13 presents the presence or absence of the disease, taking into account only one feature (one attribute) at a time.</figDesc><graphic coords="8,151.80,72.00,305.40,145.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 13 :</head><label>13</label><figDesc>Figure 13: Relationship between the presence of the disease and other attributes. In Fig. 13, the x-axis shows the features: gender, chest pain, blood sugar, electrocardiogram results, angina, the ST-segment slope during the most difficult part of the exercise, the number of major vessels stained by fluoroscopy, and thalassemia. The y-axis shows the number of patients. Yellow indicates the presence of the disease, red the absence. From Fig. 13, the following observations follow: 6. The number of major vessels stained by fluoroscopy refers to the number of narrowed vessels seen, so the higher the value of this feature, the greater the likelihood of heart disease. 7. The blood-flow results observed through the radioactive dye are obtained by a procedure that is very invasive for patients, but in themselves they are excellent evidence of the presence or absence of heart disease. 8. The slope of the ST segment, whether flat or rising, can help determine whether heart disease is present. 9. Angina is a good indicator of heart disease. However, recognizing angina is not an easy task: it can be confused with other pains or with atypical angina. 10. When someone has heart disease, the first symptom is usually stable angina (angina during exercise). When angina occurs even at rest, the condition worsens (typically a narrowing of the coronary arteries). That is why so few patients show an abnormal heart rate at rest, and seeing this anomaly is very indicative of the presence of heart disease.</figDesc><graphic coords="8,152.40,332.40,304.20,206.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_9"><head>Figure 14 :</head><label>14</label><figDesc>Figure 14: Analysis of the principal components. Component 1: Fig. 14 shows that the weight is considerable and positive for the chol feature, and slightly positive for sex and cp. This means that patients with a high value of this component will have a meager chance of being diagnosed with heart disease. At the same time, people with higher serum cholesterol are more likely to be diagnosed with heart disease. Component 2: Fig. 14 shows that the weight is considerable and negative for thalach (maximum heart rate reached) and slightly negative for cp (type of chest pain), chol (serum cholesterol), and slope (slope of the peak exercise ST segment). Thus, high values of thalach, cp, slope, and chol mainly do not indicate heart disease: people with high levels of these components are much less likely to have it. In contrast, age and high resting blood pressure (trestbps) may be among the first features of heart disease; in Fig. 14, their weights are positive.</figDesc><graphic coords="9,158.60,274.40,291.80,211.80" type="bitmap" /></figure>
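Reading component weights as in Fig. 14 can be sketched with scikit-learn's PCA; the feature matrix below is random stand-in data (only the column names follow the dataset), so the dominant features it reports are illustrative, not the paper's findings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a few Heart columns (names from the paper).
feature_names = ["age", "sex", "cp", "trestbps", "chol", "thalach", "slope"]
rng = np.random.default_rng(2)
X = rng.normal(size=(303, len(feature_names)))

# Standardize first so no feature dominates a component through scale alone.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_std)

# Each row of components_ holds one component's feature weights (loadings);
# large-magnitude weights mark the features that drive that component.
for i, comp in enumerate(pca.components_, start=1):
    top = feature_names[int(np.argmax(np.abs(comp)))]
    print(f"Component {i}: largest-magnitude weight on '{top}'")
```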
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head>Figure 15 :</head><label>15</label><figDesc>Figure 15: Example of the OneHotEncoder operation. Fig. 15 shows an example of the OneHotEncoder operation. The pain column, which contained three classes (medium, strong, and weak), was divided into three new columns: severe pain, moderate pain, and mild pain. All columns contain only two values: 1 if the information is confirmed, 0 if not. For each value in a feature, MinMaxScaler subtracts the minimum value and then divides by the range, i.e. the difference between the original maximum and the original minimum. MinMaxScaler retains the shape of the original distribution. After normalization, the classification is performed. To implement it, we use GaussianNB from the sklearn library with different random states when splitting the data. The score function from the sklearn library is used to evaluate the results; it returns the mean accuracy on the given test data and labels. The results obtained are presented in Fig. 16.</figDesc><graphic coords="10,135.20,72.00,338.60,73.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 19 :</head><label>19</label><figDesc>Figure 19: Average classification scores using 10 attributes. In Fig. 19, the x-axis indicates the random state number and the y-axis the average score for this method. The figure shows that the score ranges from 0.4 to 1. Thus, the average score of the Naive Bayes classifier for random states from 0 to 200 is 0.830327868852459. Next, we reduce the number of attributes to 7. The results of the experiment are shown in Fig. 20.</figDesc><graphic coords="11,157.20,296.00,294.60,99.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head>Figure 20 :</head><label>20</label><figDesc>Figure 20: Average classification scores using 7 attributes. In Fig. 20, the x-axis indicates the random state number and the y-axis the average score for this method. The figure shows that the score ranges from 0.5 to 1. Thus, the average score of the Naive Bayes classifier for random states from 0 to 200 is 0.765245901639342.</figDesc><graphic coords="11,166.40,510.20,276.20,109.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_13"><head>Figure 23 :</head><label>23</label><figDesc>Figure 23: Measures of accuracy. Fig. 23 shows the positive predictive value (precision), the probability of detection (recall), and the completeness of determination (f1_score). The next step is to reduce the number of attributes to 10 by removing the parameters that have the most negligible impact on heart disease and applying the Multinomial Naive Bayes classifier again. The results of the experiment are shown in Fig. 24.</figDesc><graphic coords="12,156.60,615.40,296.00,112.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_14"><head>Figure 24 :</head><label>24</label><figDesc>Figure 24: Average classification scores using 10 attributes. In Fig. 24, the x-axis indicates the random state number and the y-axis the average score for this method. The figure shows that the score ranges from 0.7 to 1.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_15"><head>Figure 25 :</head><label>25</label><figDesc>Figure 25: Average classification scores using 7 attributes. In Fig. 25, the x-axis indicates the random state number and the y-axis the average score for this method. The figure shows that the score ranges from 0.7 to 1. Therefore, the average score of the Multinomial Naive Bayes classifier for random states from 0 to 200 is 0.83010.</figDesc><graphic coords="13,145.60,147.80,317.80,111.60" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_16"><head></head><label></label><figDesc>Table <ref type="bibr" target="#b2">3</ref> shows the classification results: the accuracy of the different Naive Bayes models. In the study, Multinomial Naive Bayes achieved the highest average accuracy, 0.85 (85%). This shows that the multinomial classifier surpassed the Gaussian model. Fig. 26 shows a comparison of the accuracy of the two methods.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_17"><head>Figure 26 :</head><label>26</label><figDesc>Figure 26: Comparison of the accuracy of the two Naive Bayes classifier methods. In Fig. 26, orange depicts the Naive Bayes classifier with a Gaussian distribution, and the Multinomial Naive Bayes classifier is shown in yellow. The x-axis indicates the number of features used for the experiments; the y-axis shows the achieved accuracy. From Fig. 26, it is noticeable that both methods show worse results as the number of features decreases. This is because we first removed the features that had little effect on heart disease, which is why the accuracies are similar. In addition, we can notice that the Multinomial classifier holds up much better as the number of features is reduced. This advantage arises because the method makes the additional assumption of positional independence: the conditional probability of a feature occurring is independent of its position in the data sample. Let us also take into account the nature of the chosen problem, namely the values "0" and "1" in the classifiers' answers. We can conclude that the correct classification of first-class objects is, in our case, more critical: it is better to run all the tests once more for a healthy person than to fail to recognize the disease in a sick one. That is why it is worth emphasizing the recall score when comparing the models and choosing the best one, since it estimates the proportion of correctly classified first-class objects. In addition, the positive predictive value (precision) and the completeness of the definition (f1_score) should be considered (Table 4).</figDesc><graphic coords="14,194.20,221.80,220.80,160.40" type="bitmap" /></figure>
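The recall-first comparison described above can be sketched with scikit-learn's metric functions; the label vectors below are hypothetical predictions, not the paper's outputs:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 1 = heart disease present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# Recall is the share of truly sick patients the model flags; in this
# setting a false negative (a missed patient) is the costly error, so
# recall is the score to emphasize when comparing the two classifiers.
recall = recall_score(y_true, y_pred)       # 5 of 6 sick patients found
precision = precision_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```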
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_18"><head>Figure 27 :</head><label>27</label><figDesc>Figure 27: Comparison of the scores of the two Naive Bayes classifier methods. In Fig. 27, orange depicts the Naive Bayes classifier with a Gaussian distribution, and the Multinomial classifier is shown in yellow. The x-axis indicates the selected scores used for the experiments; the y-axis shows the achieved value. Analyzing the figure, we can conclude that with the Multinomial Bayes classifier, the number of sick patients in whom the disease is detected is larger. Using this classification method, more people will receive a correct diagnosis and, therefore, will have a chance of treatment and recovery. We also compare the running time of the two classification methods, namely the training time (Table 5).</figDesc><graphic coords="15,194.20,72.00,220.80,160.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_19"><head>Figure 28 :</head><label>28</label><figDesc>Figure 28: Comparison of learning time of two methods of the Naive Bayes classifier In Fig. 28, orange depicts a Naive Bayes classifier with a Gaussian distribution. The Multinomial classifier is shown in yellow. The x-axis indicates the number of features that were used for the experiments. The y-axis indicates the training time.</figDesc><graphic coords="15,194.20,548.80,220.80,160.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="6,147.60,153.60,314.00,200.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc></figDesc><table><row><cell>Data set characteristics</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Dataset</cell><cell>Examples</cell><cell>Train data</cell><cell>Class</cell><cell>No. of features</cell></row><row><cell>Heart</cell><cell>303</cell><cell>240</cell><cell>2</cell><cell>14</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Table 1 shows the Heart data set, which contains 303 examples, of which 240 are used for training. The data set includes two classes and 14 features.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Dividing the data into training and test sets</figDesc><table><row><cell>Method</cell><cell>Train</cell><cell>Dataset (Heart)</cell><cell>Test</cell></row><row><cell>GaussianNB</cell><cell>80%</cell><cell></cell><cell>20%</cell></row><row><cell>MultinomialNB</cell><cell>80%</cell><cell></cell><cell>20%</cell></row><row><cell cols="4">Table 2 shows how the data were split for training and testing. Both algorithms obtained identically split data.</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Comparison of accuracy of two classifier models</figDesc><table><row><cell>Method</cell><cell>14 features</cell><cell>10 features</cell><cell>7 features</cell></row><row><cell>GaussianNB</cell><cell>0.84426229</cell><cell>0.83032786</cell><cell>0.765245901</cell></row><row><cell>MultinomialNB</cell><cell>0.85012901</cell><cell>0.849039016</cell><cell>0.830100260</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4</head><label>4</label><figDesc>Comparison of evaluations of two classifier models</figDesc><table><row><cell>Method</cell><cell>Recall</cell><cell>Precision</cell><cell>F1_score</cell></row><row><cell>GaussianNB</cell><cell>0.868788</cell><cell>0.828571</cell><cell>0.852941</cell></row><row><cell>MultinomialNB</cell><cell>0.878788</cell><cell>0.852941</cell><cell>0.865672</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Diagnosis of Heart Disease for Diabetic Patients using Naive Bayes Method</title>
		<author>
			<persName><forename type="first">G</forename><surname>Parthiban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rajesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Srivatsa</surname></persName>
		</author>
		<idno type="DOI">10.5120/2933-3887</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">3</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Decision Support in Heart Disease Prediction System using Naive Bayes</title>
		<author>
			<persName><forename type="first">G</forename><surname>Subbalakshmi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Indian Journal of Computer Science and Engineering</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="170" to="176" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Predictive Data Mining for Medical Diagnosis: An Overview of Heart Disease Prediction</title>
		<author>
			<persName><forename type="first">J</forename><surname>Soni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Ansari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soni</surname></persName>
		</author>
		<idno type="DOI">10.5120/2237-2860</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="43" to="48" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Prediction Analysis of Cardiac Disease using Classification</title>
		<author>
			<persName><surname>Ch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Vincy</surname></persName>
		</author>
		<author>
			<persName><surname>Bindu</surname></persName>
		</author>
		<idno type="DOI">10.22214/ijraset.2019.6295</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Advanced Technologies of Big Data Research in Distributed Information Systems</title>
		<author>
			<persName><forename type="first">N</forename><surname>Kunanets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vasiuta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Boiko</surname></persName>
		</author>
		<idno type="DOI">10.1109/STC-CSIT.2019.8929756</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th International conference &quot;Computer sciences and Information technologies</title>
				<meeting>the 14th International conference &quot;Computer sciences and Information technologies<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">September 17-20 (2019</date>
			<biblScope unit="page" from="71" to="76" />
		</imprint>
	</monogr>
	<note>CSIT 2019)</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://www.kaggle.com/zhaoyingzhu/heartcsv" />
		<title level="m">Heart Database</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<ptr target="https://manufacturaclinica.com/blog/sertsevo-sudinni-zahvoryuvannya" />
		<title level="m">Cardiovascular Diseases</title>
				<imprint/>
		<respStmt>
			<orgName>Clinic Manufactory</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Periodontal disease as a risk factor for heart disease</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">J</forename><surname>Loesche</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Compendium</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page">978</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Weighted naive bayes classifier: A predictive model for breast cancer detection</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kharya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Soni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">133</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="32" to="37" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Performance comparison between Naïve Bayes, decision tree and k-nearest neighbor in searching alternative design in an energy simulation tool</title>
		<author>
			<persName><forename type="first">A</forename><surname>Ashari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Iman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Min</forename><surname>Tjoa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Extraction of action rules for chronic kidney disease using Naive Bayes classifier</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">D</forename><surname>Uma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE International Conference on Computational Intelligence and Computing Research</title>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Lipids, risk factors and ischaemic heart disease</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Castelli</surname></persName>
		</author>
		<idno type="DOI">10.1016/0021-9150(96)05851-0</idno>
	</analytic>
	<monogr>
		<title level="j">Atherosclerosis</title>
		<imprint>
			<date type="published" when="1996">1996</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Clustering of Metabolic Factors and Coronary Heart Disease</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">F</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">B</forename><surname>Kannel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Silbershatz</surname></persName>
		</author>
		<idno type="DOI">10.1001/archinte.159.10.1104</idno>
	</analytic>
	<monogr>
		<title level="j">Archives of Internal Medicine</title>
		<imprint>
			<biblScope unit="volume">159</biblScope>
			<biblScope unit="issue">10</biblScope>
			<biblScope unit="page">1104</biblScope>
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<ptr target="https://www.youtube.com/watch?v=O2L2Uv9pdDA" />
		<title level="m">StatQuest with Josh Starmer: Naive Bayes</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Information System of Catering Selection by Using Clustering Analysis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Boyko</surname></persName>
		</author>
		<author>
			<persName><surname>Kh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Shakhovska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mochurad</surname></persName>
		</author>
		<author>
			<persName><surname>Campos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Workshop on Digital Content &amp; Smart Multimedia (DCSMart 2019)</title>
				<meeting>the 1st International Workshop on Digital Content &amp; Smart Multimedia (DCSMart 2019)<address><addrLine>Lviv, Ukraine</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">December 23-25. 2019</date>
			<biblScope unit="page" from="94" to="106" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
