<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Rajni Bala, Dharmender Kumar,
Classification Using ANN: A Review,
International Journal of Computational
Intelligence Research</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">0973-1873</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Prediction and Detection of Diabetes using Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Olta Llaha</string-name>
          <email>olta.petritaj@fshn.edu.al</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amarildo Rista</string-name>
          <email>amarildorista@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>South East European University</institution>
          ,
          <addr-line>Arhiepiskop Angelarij, Skopje 1000</addr-line>
          <country country="MK">North Macedonia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>16</volume>
      <issue>7</issue>
      <fpage>7</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>Data mining and machine learning have become a vital part of the detection and prevention of many diseases, one of which is diabetes. The purpose of this paper is to evaluate data mining methods that can be used to analyze collected diabetes data, and to compare their performance. We identified the most appropriate data mining methods for these data by comparing them theoretically and practically. The attributes of the dataset include Age, Body Mass Index, Insulin and Glucose. The methods were applied to these data to determine their effectiveness in analyzing and preventing diabetes. The evaluation showed that the best-performing method is the Decision Tree: it gave better results than the other methods on the number of correctly classified instances, accuracy, precision, recall and F-measure. We conclude that data mining methods and machine learning contribute to predicting the possibility of occurrence of diabetes.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Prediction</kwd>
        <kwd>Diabetes Disease</kwd>
        <kwd>Data Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Diabetes is a disease that increasingly
affects the whole world, even the most developed
countries. As a globally problematic disease,
diabetes requires maximum commitment from medical
staff, patients, families and society. It is a
disease with high social, health and economic
costs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Diabetes is a chronic disease
characterized by an increase in blood glucose
(blood sugar) levels because the body cannot produce
insulin, produces too little of it, or
the insulin is not able to act on the cells of the
organism [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Physicians still do not know
exactly why this happens, and they
have called the cause syndrome X. Historically,
diabetes has been treated by fighting
the symptoms and not the cause. According to
the World Health Organization, diabetes
affects about 5% of the world's population and
the number of patients is constantly increasing
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In developed countries, the
largest number of diabetics is found among people
over 65 years of age, whereas in developing
countries, of which our country is one, the
largest number of diabetics is found in the 45-64
age group; in recent years, however, type 2
diabetes has also become more common in
the 30-40 age group [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The availability of
historical data naturally leads to the application
of data mining techniques for pattern discovery.
The goal is to find rules that help to understand
diabetes and make it easier to diagnose
sooner. Prevention of diabetes is of great
interest in the field of medicine. The use of data
mining accelerates data analysis, and analysts
can examine existing data to identify patterns
and trends of diabetes.
      </p>
      <p>This paper is structured as follows. Section 2
describes the relationship between
data mining, machine learning and medicine.
The methodology and the dataset
are described in Section 3. Sections 4 and 5
give a theoretical description of the
methods and algorithms that are applied
in practice to our data. Section 6 presents the
results of applying the algorithms and
explains the algorithm with the best
results. Section 7 discusses the conclusions and future
work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Using Data Mining and Machine Learning in Medicine</title>
      <p>
        Medicine is the science and practice of
establishing the diagnosis, prognosis,
treatment, and prevention of disease. Medicine
encompasses a variety of health care practices
evolved to maintain and restore health by the
prevention and treatment of illness [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. It is
one of the most important areas in which applying
data mining techniques can produce significant
results [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        With data mining techniques, doctors will be
able to predict illnesses effectively and they
will be better equipped to manage potential
high-risk candidates [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The high volume of
disease data and the complexity of the
relationships within them have made
medicine an appropriate field for applying data
mining techniques. Data mining can be used to
examine large datasets involving more
variables than a single analyst or
doctor, or even an analytical team, can handle. Like any
other problem-solving method, data
mining begins with a problem definition. Identifying
the data mining problem
determines the data mining
process and the modeling technique. Machine
learning is a subfield of data science that deals
with algorithms able to learn from data and
make accurate predictions. Data mining gives
health organizations the opportunity to learn
about disease trends, among other things. By using data mining
methods and machine learning algorithms, we can
improve the analysis of diabetes and help to
reduce and prevent it.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Data and Methodology</title>
      <p>
        We compare data mining methods theoretically and practically
to discover the most
appropriate method for our data. The methods
were compared by applying machine learning
algorithms to concrete data in the WEKA
(Waikato Environment for Knowledge
Analysis) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] environment. The implemented
algorithms are Simple Logistic, Multilayer
Perceptron, Logistic, Naive Bayes, Bayes Net,
SMO and C4.5. Figure 1 shows the
stages of this study, in which diabetes is predicted
using data mining methods and the machine
learning algorithms that implement them.
      </p>
      <p>In the data gathering step we collected data from
the statistics sector of the Public Health
Institute. The collected data are stored in a
database for further processing. The dataset is
made up of 270 records (instances).</p>
      <p>The variables (attributes) of this dataset are:</p>
      <list list-type="order">
        <list-item><p>Age: the risk of diabetes increases with age, especially over 45 years.</p></list-item>
        <list-item><p>Body Mass Index (BMI): an indicator of weight status (underweight, normal, overweight) based on height and weight, given by weight (kg) / (height (m))<sup>2</sup>. Ideal BMI values are 18.5-24.9; values of 25-29.9 indicate that the person is overweight, 30-39.9 indicates obesity and 40+ significant obesity.</p></list-item>
        <list-item><p>Insulin: two-hour serum insulin. Values higher than 150 μU/ml mean that a person needs insulin therapy and is therefore pre-diabetic or diabetic.</p></list-item>
        <list-item><p>Glucose: glucose tolerance test value (glucose in mg/dL two hours after 75 g of glucose). A person is said not to suffer from diabetes if the tolerance test value at two hours is less than or equal to 110 mg/dL (Norman 1).</p></list-item>
        <list-item><p>Skin Thickness: triceps skinfold thickness (mm). The indicative value for women is 23 mm; values higher than normal indicate that the person is overweight.</p></list-item>
        <list-item><p>Blood Pressure: diastolic blood pressure (mm Hg). Normal values are 60-80 mm Hg; 80-89 indicates pre-hypertension and 90+ hypertension.</p></list-item>
        <list-item><p>Number of pregnancies: a woman can be diagnosed with gestational diabetes during pregnancy. Hormones produced during pregnancy can make cells more resistant to insulin. Women older than 25 have a higher risk. Moreover, if a woman has diabetes during one pregnancy, there is an increased risk at the next pregnancy (Diabetes-Bing Health).</p></list-item>
        <list-item><p>Outcome: negative when the person is not diagnosed with diabetes and positive when the person is diagnosed with diabetes.</p></list-item>
      </list>
      <p>The experiments were conducted with a female population over 19 years of age. The diabetes dataset is in CSV format.</p>
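      <p>The Body Mass Index attribute described above is a simple formula with fixed thresholds, so it can be illustrated with a minimal sketch. The function names and the example person below are hypothetical and are not part of the study; the category boundaries are the ones quoted in the attribute description.</p>
      <preformat><![CDATA[
```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the categories quoted in the attribute description."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    if value < 40:
        return "obesity"
    return "significant obesity"


# Hypothetical person (not taken from the dataset): 70 kg, 1.75 m.
example = bmi(70.0, 1.75)
print(round(example, 1), bmi_category(example))
```
]]></preformat>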
    </sec>
    <sec id="sec-5">
      <title>4. Classification</title>
      <p>
        Classification is a data mining technique
that categorizes data in order to assist in more
accurate predictions and analysis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. It is one
of the data mining methods that aims to analyze
very large datasets. It is used to derive patterns
that accurately define the important data classes
within the data set. Classification techniques
predict the target class for each
data instance [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Classification algorithms
attempt to detect relationships between
attributes that make it possible to predict
the outcome. They analyze the input and produce
a prediction. The classification task of data
mining is widely used in the healthcare
industry [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>4.1. Naïve Bayes</title>
      <p>
        Bayesian classification is a
supervised learning method as well as a
statistical classification method. The Naive
Bayes classifier is based on
Bayes' theorem and is used especially when
the dimensionality of the inputs is high [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Bayesian classification provides practical
learning algorithms in which prior knowledge and
observed data can be combined, and it calculates
explicit probabilities for hypotheses. Bayes' theorem
offers a way to calculate the probability of a
hypothesis based on our prior knowledge. It
works based on conditional probability [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. It
can be represented as:
        <disp-formula><tex-math><![CDATA[P(M \mid N) = \frac{P(N \mid M)\, P(M)}{P(N)}]]></tex-math></disp-formula>
      </p>
      <p>Here M and N are two events and, P(M|N) is
the conditional probability of M given N. P(M)
is the probability of M. P(N) is the probability
of N. P (N|M) is the conditional probability of
N given M.</p>
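      <p>As a worked illustration of Bayes' theorem with the events M and N defined above, the following sketch computes P(M|N) from P(N|M), P(M) and P(N). The probabilities are invented for illustration only and do not come from the study's dataset; P(N) is obtained with the law of total probability.</p>
      <preformat><![CDATA[
```python
def posterior(p_n_given_m: float, p_m: float, p_n: float) -> float:
    """Bayes' theorem: P(M|N) = P(N|M) * P(M) / P(N)."""
    if p_n <= 0:
        raise ValueError("P(N) must be positive")
    return p_n_given_m * p_m / p_n


# Illustrative numbers only (assumptions, not results from the paper):
# M = "patient is diabetic", N = "glucose test value is high".
p_m = 0.30               # prior P(M)
p_n_given_m = 0.80       # P(N|M)
p_n_given_not_m = 0.20   # P(N|not M)

# Law of total probability: P(N) = P(N|M)P(M) + P(N|not M)P(not M)
p_n = p_n_given_m * p_m + p_n_given_not_m * (1 - p_m)

print(round(posterior(p_n_given_m, p_m, p_n), 3))
```
]]></preformat>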
      <p>
        Naive Bayes is a strong
predictor. The technique can be useful for very
large data sets [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The Naive
Bayesian classifier is fast and incremental, and
it can deal with discrete and continuous
attributes. It performs well and
can explain its decisions.
      </p>
    </sec>
    <sec id="sec-7">
      <title>4.2. Support Vector Machine</title>
      <p>
        The SVM classifier is a supervised learning
algorithm based on statistical learning theory
introduced by Vapnik (1995) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The
main idea behind this method is to determine a
hyperplane that optimally separates two classes
using a training dataset. SVMs are a set of related
supervised learning methods used in medical
diagnosis for classification and regression [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
An SVM model
represents the examples as points in
space, mapped so that the examples of
the different categories are divided by a
clear gap that is as wide as possible [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. SVM
supports both classification and regression
and can handle multiple continuous
and categorical variables. The efficiency of
SVM-based classification does not depend directly
on the dimension of the classified
entities. The algorithm achieves high
discriminative power by using special nonlinear
functions called kernels to transform the input
space into a higher-dimensional space [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The
choice of kernel function and the
best parameter values for a particular kernel are
critical for a given amount of data [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. WEKA's SMO implementation of SVM also
normalizes all attributes by default.
      </p>
    </sec>
    <sec id="sec-8">
      <title>4.3. Decision Tree</title>
      <p>A decision tree model has a tree structure
that describes the process of classifying
instances based on features [17]. It splits the
data into subsets based on the
values of one or more fields. This process is
repeated recursively for each subset until
all instances in a node belong to a single class. The
result is a tree-shaped
structure that describes a series of decisions
made at each step [17]. Decision trees are easy
to interpret and understand. They provide a white-box
structure for each dataset and can
be combined with other data mining
techniques [18]. Typical decision tree
algorithms are ID3, C4.5 and CART.
In this study, we used the C4.5 algorithm. The
C4.5 splitting criterion is the gain ratio, the ratio
between information gain and split information. At each
node it selects the attribute
that most effectively separates
the data into subsets enriched in one
class.</p>
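      <p>The ratio between information gain and split information that C4.5 uses can be sketched for a single categorical attribute as follows. This is a minimal illustration under stated assumptions, not WEKA's J48 code: the toy attribute values and labels are hypothetical, and handling of continuous attributes and missing values is left out.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute values themselves.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): a binned glucose attribute and the outcome.
glucose = ["high", "high", "low", "low", "high", "low"]
outcome = ["pos", "pos", "neg", "neg", "pos", "neg"]
print(gain_ratio(glucose, outcome))
```
]]></preformat>
      <p>In this toy example the attribute separates the two classes perfectly, so information gain and split information are equal. An attribute that leaves the class mix unchanged in every subset would score zero.</p>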
      <p>The tree is generated using the
normalized information gain [19]. The C4.5
inductive algorithm generates rules from a
single tree. It can also transform multiple decision
trees into a set of classification rules.
Such features of this algorithm can be used to
scale general rules, instruction time, size, and
number of rules. The algorithm suits medical
records because it copes with missing values.
Furthermore, it handles continuous
data, which are common in medical data.</p>
      <p>Random Forest is a classification method
that combines hundreds or thousands of
decision trees, training each one on
a slightly different set of the observations and
splitting nodes in each tree considering a
limited number of the features [20]. The final
predictions of the Random Forest method are
made by averaging the predictions of the
individual trees. It is fast and easy to implement,
produces highly accurate predictions, and
can handle a very large number of input
variables without over-fitting [21].
</p>
    </sec>
    <sec id="sec-9">
      <title>4.4. Artificial Neural Network</title>
      <p>Neural networks are an area of Artificial
Intelligence (AI) inspired by the human brain [22].
By applying neural network techniques, a program
can learn from examples and create an
internal set of rules for classifying different
inputs. All processing in a neural network is
performed by a group of neurons or units
[22]. Each neuron is a separate communication
device, and its operation is relatively simple:
a unit receives data from other units, calculates
an output value as a function of the inputs
it receives, and sends that value to
other units. In artificial neural
networks, neurons are organized in layers
that process information using dynamic state
responses to external inputs [17]. The artificial
neural network is an example of supervised
learning [23]. Artificial neural networks (ANNs) are
capable of predicting new observations from existing
observations. The neural network method is used for
classification, clustering, feature mining,
prediction and pattern recognition.</p>
      <p>One of the
most used neural networks is the Multilayer
Perceptron (MLP), whose neurons apply a
nonlinear activation function to calculate their
outputs [24]. The activation function is a
sigmoid function, f(x) = 1 / (1 + exp(-x)), in
the hidden layer and a linear function,
<inline-formula><tex-math><![CDATA[f_j(x) = \sum_{i=1}^{p} w_{ij} x_i]]></tex-math></inline-formula>,
where the x<sub>i</sub> are predictor variables
and the w<sub>ij</sub> are input weights, in the output layer.
The functional form of the MLP can be written
as:
<disp-formula><tex-math><![CDATA[y_j = f\left( \sum_{i=1}^{N} w_{ji} x_i + b_j \right)]]></tex-math></disp-formula>
where x<sub>i</sub> is the i-th nodal value in the previous
layer, y<sub>j</sub> is the j-th nodal value in the present
layer, b<sub>j</sub> is the bias of the j-th node in the present
layer, w<sub>ji</sub> is a weight connecting x<sub>i</sub> and y<sub>j</sub>, N is
the number of nodes in the previous layer, and
f is the activation function in the present layer
[24].</p>
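      <p>The functional form of the MLP can be traced by hand with a very small network. The sketch below assumes one sigmoid hidden layer followed by a linear output layer, as described for the MLP; the weights, biases and inputs are hypothetical values chosen only to make the arithmetic easy to follow, and no training is performed.</p>
      <preformat><![CDATA[
```python
from math import exp


def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + exp(-x))


def layer(inputs, weights, biases, activation):
    """Compute y_j = f(sum_i w_ji * x_i + b_j) for every node j of a layer."""
    return [
        activation(sum(w * x for w, x in zip(w_row, inputs)) + b)
        for w_row, b in zip(weights, biases)
    ]


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One sigmoid hidden layer followed by a linear output layer."""
    hidden = layer(x, hidden_w, hidden_b, sigmoid)
    return layer(hidden, out_w, out_b, lambda z: z)


# Hypothetical weights (assumptions for illustration, not fitted values).
x = [1.0, 2.0]
hidden_w = [[0.0, 0.0], [1.0, -0.5]]  # two hidden nodes, two inputs each
hidden_b = [0.0, 0.0]
out_w = [[2.0, 4.0]]                  # one output node
out_b = [1.0]

# Both hidden nodes receive a weighted sum of 0, so each outputs sigmoid(0) = 0.5.
print(mlp_forward(x, hidden_w, hidden_b, out_w, out_b))
```
]]></preformat>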
    </sec>
    <sec id="sec-12">
      <title>5. Association Rules and Regression</title>
      <sec id="sec-12-1">
        <p>
          Association rule mining is one of the
most
important canonical tasks in data mining and
probably one of the most studied techniques for
pattern discovery. Association rules are if/then
statements that help to uncover relationships
between seemingly unrelated data in a relational
database or other information repository [
          <xref ref-type="bibr" rid="ref17">25</xref>
          ].
Association rules identify the arguments found
together in a given event or record: "the
presence of one set of arguments implies the
presence of another set". Rules of the following
type are identified in this way: "if argument A is part of an
event, then with a certain probability argument B
is also part of the event" [
          <xref ref-type="bibr" rid="ref18">26</xref>
          ]. Association rules also
have great impact in the health care industry, where they are used to
discover the relationships between diseases,
the state of human health and the symptoms of
disease [
          <xref ref-type="bibr" rid="ref19">27</xref>
          ]. They can be used to detect and study
the etiological pathways in populations, as
they suggest interconnections between the various risk
factors responsible for a disease and are easily
interpretable
[
          <xref ref-type="bibr" rid="ref18">26</xref>
          ].
        </p>
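        <p>Rules of the form "if A is part of an event, then with a certain probability B is also part of the event" are commonly scored by two measures: support, the fraction of records that contain all items of the rule, and confidence, the estimated probability of B given A. A minimal sketch follows; the patient records and item names are hypothetical and serve only as an illustration.</p>
        <preformat><![CDATA[
```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical patient records, each a set of observed findings.
records = [
    {"obese", "high_glucose", "diabetes"},
    {"obese", "high_glucose"},
    {"high_glucose", "diabetes"},
    {"obese"},
]

# Rule: if high_glucose then diabetes.
print(support(records, {"high_glucose", "diabetes"}))
print(confidence(records, {"high_glucose"}, {"diabetes"}))
```
]]></preformat>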
        <p>The objective of association rule mining is to discover interesting
association or correlation relationships among a
large set of data items. Support and confidence
are the best-known measures for evaluating
association rules.</p>
        <p>
          While classification predicts categorical,
discrete labels, regression models continuous
function values. Regression is therefore used mainly to
predict missing numeric data values rather than
discrete class labels. Regression analysis is a
statistical technique for examining the
connection between the dependent variable and
one or more independent variables, which aims to predict the
dependent variable from the independent
variable or variables [
          <xref ref-type="bibr" rid="ref20">28</xref>
          ]. Regression also
involves identifying the distribution of trends
based on available data. For this purpose
regression trees can be used, as well as decision
trees whose nodes have numerical values instead
of categorical values.
        </p>
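        <p>
          Logistic regression is used to estimate the probability of
occurrence of a specific event. The dependent variable is expressed
through the odds, which is
another way of expressing probability. The
model can be regarded as a
generalized linear model that uses the logit as its link function and
whose error follows a binomial
distribution [
          <xref ref-type="bibr" rid="ref20">28</xref>
          ].
        </p>
        <p>The model is written as:
<disp-formula><tex-math><![CDATA[\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}, \qquad i = 1, \ldots, n,]]></tex-math></disp-formula>
where p<sub>i</sub> = Pr(Y<sub>i</sub> = 1). That is,
<disp-formula><tex-math><![CDATA[p_i = \Pr(Y_i = 1 \mid X) = \frac{e^{\alpha + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}}}{1 + e^{\alpha + \beta_1 x_{1,i} + \cdots + \beta_k x_{k,i}}}]]></tex-math></disp-formula>
where p<sub>i</sub> is the probability that an example
belongs to a particular category,
e is the base of the natural logarithm (approximately 2.72),
α is the constant of the equation, and
β<sub>1</sub>, …, β<sub>k</sub> are the coefficients of the predictor variables.</p>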
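        <p>The relationship between the log-odds and the predicted probability in logistic regression can be checked numerically. In the sketch below the constant and the coefficients are hypothetical values, not parameters fitted to the study's data; the function names are likewise chosen only for illustration.</p>
        <preformat><![CDATA[
```python
from math import exp, log


def logistic_probability(alpha, betas, xs):
    """P(Y=1 | X) = e^z / (1 + e^z), with z = alpha + sum_k beta_k * x_k."""
    z = alpha + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + exp(-z))  # algebraically equal to e^z / (1 + e^z)


def logit(p):
    """Log-odds log(p / (1 - p)), the inverse of the logistic function."""
    return log(p / (1.0 - p))


# Hypothetical constant and coefficients for two predictors.
alpha, betas = -1.0, [0.5, 0.25]

# z = -1 + 0.5 * 1 + 0.25 * 2 = 0, so the odds are 1 and the probability is 0.5.
p = logistic_probability(alpha, betas, [1.0, 2.0])
print(p)
```
]]></preformat>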
      </sec>
    </sec>
    <sec id="sec-13">
      <title>6. Experimental Results</title>
      <p>
        To conduct this study we used the WEKA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
software, chosen for its approach and our familiarity
with its use. WEKA is an open source tool for
data mining that allows users to apply
preprocessing algorithms, but it does not provide
assistance in choosing which one to apply.
However, since different algorithms have different
requirements regarding the data, some preprocessing is
applied by default inside some of the algorithms.
Data preprocessing includes cleaning, instance
selection, normalization,
transformation, feature extraction, selection,
etc. Data preprocessing affects the way in
which outcomes of the final data processing can
be interpreted. The WEKA software package has
different programs for different techniques and
algorithms.
      </p>
      <sec id="sec-13-1">
        <p>Experiments were run using
cross-validation with WEKA's default option of 10 folds. Cross-validation
gives a more reliable estimate of model performance
than a single train/test split. We divided our
dataset into 10 samples. Each sample in turn
was retained as validation
data while the remaining 9 samples acted as
training data. The process was repeated 10 times,
once per sample, which is why it is called 10-fold
cross-validation.</p>
        <p>The advantage gained by this
step is that it cuts down the bias
associated with random sampling
methods.</p>
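        <p>The 10-fold procedure can be sketched as an index split. The following is a plain, unstratified illustration and not WEKA's own implementation (WEKA may additionally stratify the folds by class); the function name and the random seed are assumptions, and 270 is the number of instances reported for the dataset.</p>
        <preformat><![CDATA[
```python
import random


def k_fold_indices(n_instances, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 270 instances, as in the dataset used in this study.
splits = list(k_fold_indices(270, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```
]]></preformat>
        <p>Each instance appears in exactly one validation fold and in the training data of the other nine rounds, so every record is used for both training and validation.</p>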
        <p>Different classification algorithms were applied
to our dataset, and the results
differed slightly because each algorithm works on
different criteria. The results were
evaluated on the basis of correctly classified
instances, accuracy, precision, recall and
F-measure. The performance indicators are given in
Table 2 and Table 3.</p>
        <p>The C4.5 algorithm is clear and its results are
easy to interpret. It selects the
attribute that most effectively
separates the data into subsets
enriched in one class. The model is constructed
by tuning the parameter values, and
the algorithm classifies the diabetes data
with higher accuracy than the other algorithms.
This is shown in Table 3,
which compares the accuracy of the models after
the algorithms were run.</p>
        <p>The accuracy of a classifier refers to its ability
to predict the class label correctly,
and the accuracy of a predictor refers to how
well it can guess the value of the
predicted attribute for new data. The F-measure is
a measure of a test's accuracy that considers both
the precision and the recall of the test:
precision is the number of
correct positive results divided by the number
of all positive results returned by the classifier,
and recall is the number of correct positive
results divided by the number of all relevant
samples (all samples that should have been
identified as positive).</p>
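        <p>
          <disp-formula><tex-math><![CDATA[\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[\text{Precision} = \frac{TP}{TP + FP}]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[\text{Recall} = \frac{TP}{TP + FN}]]></tex-math></disp-formula>
          <disp-formula><tex-math><![CDATA[\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}]]></tex-math></disp-formula>
        </p>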
        <p>Here, a true positive (TP) is a correct positive
prediction; a false positive (FP) is an incorrect positive
prediction; a true negative (TN) is a correct negative
prediction; and a false negative (FN) is an incorrect negative
prediction.</p>
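        <p>The four performance measures can be computed directly from the confusion-matrix counts TP, FP, TN and FN. The sketch below uses hypothetical counts for illustration only; they are not the results reported in this paper, and the function name is an assumption.</p>
        <preformat><![CDATA[
```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts (assumptions, not the paper's results).
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(m)
```
]]></preformat>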
        <p>We converted our data to CSV format. The
C4.5 algorithm for building decision trees is
implemented in WEKA as a classifier called
J48, whose full name is
weka.classifiers.trees.J48. The output of
this algorithm, including the visualization of the
decision tree, is presented in Figure 4 and
Figure 5.</p>
        <p>This algorithm
classified the diabetes data based on the dataset
attributes with the highest precision, recall and F-measure
values among the algorithms compared. This is
shown in Figure 3. Figure 5 shows the
decision tree
generated by the C4.5
algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>7. Conclusion</title>
      <p>The purpose of this article was to create a
decision-making structure for diagnosing
diabetes. This structure was realized by
studying classification data mining methods
such as Naive Bayes, Decision Tree, Support
Vector Machine (SVM) and Logistic Regression,
and by evaluating them to find the best
performing method on the dataset. The results
of the experiments conducted in this research
show that these methods are
applicable to diabetes prediction.
The decision tree
classified the diabetes data with an
accuracy of 79%. This method shows
promising results for diabetes
prediction, as its accuracy was high in the
experiments performed. Furthermore, the
decision tree seems more viable because,
in contrast to other algorithms, it expresses
its rules explicitly. These rules can be
expressed in human language so that anyone
can understand them. Decision trees are easy to
interpret and understand. The use of machine
learning in diabetes analysis is important
because data mining methods and machine
learning can be used in the decision-making
process. In a future extension of this study,
models will be created for predicting
diabetes that will help health centers, hospitals
and similar institutions to create policies or make decisions
aimed at preventing diabetes. We will also examine how the behavior
of the algorithms changes when more data are
added. We plan to repeat the
study not only on women but on
all persons regardless of gender. We also intend
to integrate this study into a
Diabetes Decision Support System (DDSS) that
we will create.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>8. References</title>
      <ref id="ref1">
        <mixed-citation>
          [1] http://www.ishp.gov.al/wpcontent/uploads/2015/kalendar/Dita%20boterore%20e%20diabetit.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] https://www.familjadheshendeti.com/semundja-e-sheqerit-diabeti-te-femrat/S.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>Bo</given-names> <surname>He</surname></string-name>,
          <string-name><given-names>Kuang-i</given-names> <surname>Shu</surname></string-name> and
          <string-name><given-names>Heng</given-names> <surname>Zhang</surname></string-name>,
          <article-title>Machine Learning and Data Mining in Diabetes Diagnosis and Treatment</article-title>,
          <source>IOP Conference Series: Materials Science and Engineering</source>,
          Volume <volume>490</volume>, Issue <issue>4</issue>
          (<year>2019</year>) 042049, doi:
          <pub-id pub-id-type="doi">10.1088/1757-899X/490/4/042049</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] https://en.wikipedia.org/wiki/Medicine</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Ionita</surname>, <given-names>Irina</given-names></string-name>
          &amp;
          <string-name><surname>Ioniță</surname>, <given-names>Liviu</given-names></string-name>.
          (<year>2016</year>).
          <article-title>Applying Data Mining Techniques in Healthcare</article-title>.
          <source>Studies in Informatics and Control</source>.
          <volume>25</volume>.
          <fpage>385</fpage>-<lpage>394</lpage>.
          <pub-id pub-id-type="doi">10.24846/v25i3y201612</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><surname>Bisandu</surname>, <given-names>Desmond</given-names></string-name>
          &amp;
          <string-name><surname>Datiri</surname>, <given-names>Dorcas</given-names></string-name>
          &amp;
          <string-name><surname>Onokpasa</surname>, <given-names>Eva</given-names></string-name>
          &amp;
          <string-name><surname>Thomas</surname>, <given-names>Godwin</given-names></string-name>
          &amp;
          <string-name><surname>Haruna</surname>, <given-names>Musa</given-names></string-name>
          &amp;
          <string-name><surname>Aliyu</surname>, <given-names>Aminu</given-names></string-name>.
          (<year>2019</year>).
          <article-title>Diabetes Prediction using Data mining Techniques</article-title>.
          <source>International Journal of Innovation Science</source>.
          <volume>4</volume>.
          <fpage>103</fpage>-<lpage>111</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
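          [7]
          <string-name><surname>Frank</surname>, <given-names>Eibe</given-names></string-name>
          &amp;
          <string-name><surname>Hall</surname>, <given-names>Mark</given-names></string-name>
          &amp;
          <string-name><surname>Holmes</surname>, <given-names>Geoffrey</given-names></string-name>
          &amp;
          <string-name><surname>Kirkby</surname>, <given-names>Richard</given-names></string-name>
          &amp;
          <string-name><surname>Pfahringer</surname>, <given-names>Bernhard</given-names></string-name>
          &amp;
          <string-name><surname>Witten</surname>, <given-names>Ian</given-names></string-name>
          &amp;
          <string-name><surname>Trigg</surname>, <given-names>Len</given-names></string-name>.
          (<year>2010</year>).
          <article-title>Weka-A Machine Learning Workbench for Data Mining</article-title>.
          <pub-id pub-id-type="doi">10.1007/978-0-387-09823-4_66</pub-id>.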
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>Pang-Ning</given-names> <surname>Tan</surname></string-name>;
          <string-name><given-names>Michael</given-names> <surname>Steinbach</surname></string-name>;
          <string-name><given-names>Anuj</given-names> <surname>Karpatne</surname></string-name>;
          <string-name><given-names>Vipin</given-names> <surname>Kumar</surname></string-name>,
          <source>Introduction to Data Mining</source>,
          <edition>2nd ed.</edition>,
          Publisher: <publisher-name>Pearson</publisher-name>,
          <year>2019</year>,
          Print ISBN: 9780133128901, 0133128903; eText ISBN: 9780134080284, 013408028.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Pandey</surname>
            ,
            <given-names>Dr. Subhash.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Data Mining Techniques for Medical Data: A Review</article-title>
          .
          <pub-id pub-id-type="doi">10.1109/SCOPES.2016.7955586</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Sisodia</surname>
            ,
            <given-names>Deepti</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Sisodia</surname>
            ,
            <given-names>Dilip</given-names>
          </string-name>
          .
          (
          <year>2018</year>
          ).
          <article-title>Prediction of Diabetes using Classification Algorithms</article-title>
          .
          <source>Procedia Computer Science</source>
          .
          <volume>132</volume>
          .
          <fpage>1578</fpage>
          -
          <lpage>1585</lpage>
          .
          <pub-id pub-id-type="doi">10.1016/j.procs.2018.05.122</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Tharak</surname>
            <given-names>Roopesh</given-names>
          </string-name>
          , Asadi Srinivasulu and K. S. Kannan,
          <article-title>Prediction of Diabetes Disease Using Data Mining and Deep Learning Techniques</article-title>
          ,
          <source>EasyChair Preprint</source>
          , № 1608, October 9,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Priyadarshini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lakshmi</surname>
          </string-name>
          ,
          <article-title>Predictive Analysis of Diabetes Using Bayesian Network and Naive Bayes Techniques</article-title>
          ,
          <source>International Conference on Advancements in Computing Technologies - ICACT</source>
          <year>2018</year>
          , Volume:
          <volume>4</volume>
          , Issue:
          <issue>2</issue>
          , ISSN: 2454-4248.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Giveki</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salimi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahmanyar</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Khademian</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Automatic Detection of Diabetes Diagnosis using Feature Weighted Support Vector Machines based on Mutual Information and Modified Cuckoo Search</article-title>
          . ArXiv, abs/1201.2173.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Jegan</surname>
            ,
            <given-names>Chitra.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Classification Of Diabetes Disease Using Support Vector Machine</article-title>
          .
          <source>International Journal of Engineering Research and Applications</source>
          .
          <volume>3</volume>
          .
          <fpage>1797</fpage>
          -
          <lpage>1801</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Sahana</given-names>
            <surname>Shetty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kaveri B.</given-names>
            <surname>Kari</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jayantkumar A.</given-names>
            <surname>Rathod</surname>
          </string-name>
          ,
          <article-title>Detection of Diabetic Retinopathy Using Support Vector Machine (SVM)</article-title>
          ,
          <source>International Journal of Emerging Technology in Computer Science &amp; Electronics (IJETCSE)</source>
          , ISSN: 0976-1353, Volume
          <volume>23</volume>
          , Issue
          <issue>6</issue>
          , October
          <year>2016</year>
          (Special Issue).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Wei</surname>
            <given-names>Yu</given-names>
          </string-name>
          , Tiebin Liu, Rodolfo Valdez, Marta Gwinn, Muin J Khoury,
          <article-title>Application Methods in Prediction of Diabetes in Iran</article-title>
          .
          <source>Healthcare Informatics Research</source>
          .
          <volume>19</volume>
          .
          <fpage>177</fpage>
          -
          <lpage>85</lpage>
          .
          <pub-id pub-id-type="doi">10.4258/hir.2013.19.3.177</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Kumbhare</surname>
            ,
            <given-names>Trupti A.</given-names>
          </string-name>
          and
          <string-name>
            <given-names>Santosh V.</given-names>
            <surname>Chobe</surname>
          </string-name>
          .
          <article-title>An Overview of Association Rule Mining Algorithms</article-title>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Umang</given-names>
            <surname>Soni</surname>
          </string-name>
          , Sushma Behara, Karthik Unni Krishnan, Ramniwas Kumar,
          <article-title>Application of Association Rule Mining in Risk Analysis for Diabetes Mellitus</article-title>
          ,
          <source>International Journal of Advanced Research in Computer and Communication Engineering</source>
          , Vol.
          <volume>5</volume>
          , Issue
          <issue>4</issue>
          , April
          <year>2016</year>
          , ISSN (Online) 2278-1021, ISSN (Print) 2319-5940.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>Sheenal</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>Hardik</given-names>
          </string-name>
          .
          (
          <year>2016</year>
          ).
          <article-title>Survey of Data Mining Techniques used in Healthcare Domain</article-title>
          .
          <source>International Journal of Information Sciences and Techniques</source>
          .
          <volume>6</volume>
          .
          <fpage>53</fpage>
          -
          <lpage>60</lpage>
          .
          <pub-id pub-id-type="doi">10.5121/ijist.2016.6206</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Parastoo</given-names>
            <surname>RAHIMLOO</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ahmad</given-names>
            <surname>JAFARIAN</surname>
          </string-name>
          ,
          <article-title>Prediction of Diabetes by Using Artificial Neural Network, Logistic Regression Statistical Model and Combination of Them</article-title>
          ,
          <source>Bulletin de la Société Royale des Sciences de Liège</source>
          , Vol.
          <volume>85</volume>
          ,
          <year>2016</year>
          , p.
          <fpage>1148</fpage>
          -
          <lpage>1164</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>