Machine Learning Methods in Medicine Diagnostics Problem

Strilets Viktoriia1[0000-0002-2475-1496], Bakumenko Nina1[0000-0003-3496-7167], Donets Volodymyr1[0000-0002-5963-9998], Chernysh Serhii2[0000-0002-1750-5158], Ugryumov Mykhaylo1[0000-0003-0902-2735], Goncharova Tamara3[0000-0003-3210-3867]

1 V. N. Karazin Kharkiv National University, Kharkiv, Ukraine
2 National Aerospace University “Kharkiv Aviation Institute”, Kharkiv, Ukraine
3 National University of Civil Defence of Ukraine, Kharkiv, Ukraine
striletsvictoria@gmail.com, n.bakumenko@karazin.ua, vovan.s.marsa@gmail.com, 91sergey@gmail.com, ugryumov.mykhaylo52@gmail.com, super-gusenichka@ukr.net

Abstract. Improving medical services has always been a vital problem. To address it, we must continuously raise the competence of doctors on the one hand and, on the other, develop new methods and approaches that support decision making in the diagnostics (classification) of patient health conditions and in the patient's further treatment. In this paper, machine learning methods for classifying patient health conditions are considered: Naive Bayes Classifier, Linear Classifier, Support Vector Machine, K-Nearest Neighbor Classifier, Logistic Regression, Decision Tree Classifier, Random Forest Classifier, AdaBoost Classifier, and Artificial Neural Network. From the variety of artificial neural network architectures, a radial basis function network was chosen to solve the classification tasks. The problem of classifying patient health conditions was considered for two sets of laboratory research results: on liver diseases and on urological diseases. Confusion matrices and ROC curves were used to estimate the classification quality achieved by the above-mentioned methods.

Keywords: medicine diagnostics, machine learning, artificial neural network, ROC curve, confusion matrix.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Classification Methods of the Complex Dynamic System State

The origin and evolution of errors in complex systems is a complex dynamic process. Experts cannot always predict their origin exactly or define the type of failure; however, bringing the system back to normal mode is of paramount importance. Control and prediction of the state of complex dynamic systems help specialists take effective measures, so great attention is paid to the classification of complex system states. Today there are many publications describing methods for solving the problem of complex system state classification. We consider the most frequently used ones below.

Naïve Bayes Classifier is a simple probabilistic classifier that relies on applying the Bayes theorem with the 'naïve' assumption of mutual feature independence. It corresponds to the simplest models of the Bayesian network. Development of the Naïve Bayes method started back in the 1960s, and it is still a popular method of text categorization (e.g. scientific text, fiction, spam, etc.) [1]. The method is also applied to automated medical diagnostics [2].

Advantages of the method:
─ on small data sets it can achieve better results than other classifiers because of its low tendency to overfitting;
─ linear scalability in the number of possible features; a light update on new training data is also possible;
─ the method can handle missing data during both training and prediction;
─ although the assumption of feature independence is often false, the Bayes classifier estimates each feature's contribution to the class independently, which makes it possible to avoid the curse of dimensionality [3].

Disadvantages of the method:
─ Naïve Bayes explicitly assumes that all features are mutually independent, which is almost never true in reality;
─ if a categorical variable takes a value that was not observed in the training set, the model will assign it zero probability and will be unable to make a prediction;
─ the quality of its work is sensitive to how representative the class distribution in the training sample is.
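As a brief illustration of how such a classifier is typically applied to tabular patient-style data, consider the following minimal sketch using scikit-learn's GaussianNB. The synthetic data set, its size, and all parameters are invented for the example and are not taken from the experiments described later in this paper.

```python
# Minimal Naive Bayes sketch on synthetic "patient" data (scikit-learn);
# the data here is randomly generated, not the paper's clinical records.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# 200 synthetic patients, 10 numeric state attributes, 2 classes (healthy/sick)
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = GaussianNB()                 # assumes conditionally independent features
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # per-class posterior probabilities
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Note that GaussianNB additionally assumes normally distributed numeric features; other Naïve Bayes variants exist for categorical or count data.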
Linear classifier is a machine learning method that makes the class decision based on a linear combination of feature values, usually represented as a vector. Division into classes in multidimensional space is made by a separating line or hyperplane. Such classifiers work well for practical problems, e.g. document classification. Moreover, the method can achieve results close to those of non-linear classifiers while requiring less time for training and use [4].

Advantages of the method:
─ the linear classifier is often used when classification speed matters, since it is the fastest classifier, especially if the input vector is very large;
─ the method is fast to implement and has low requirements for memory and the processor.

Disadvantage of the method: its linear character makes it impossible to define the class exactly when the classes cannot be clearly separated, since real data distributions are usually mixed and demand non-linear separation.

Support Vector Machine (SVM) is a supervised learning model usually used for classification and regression analysis. The method was proposed by V. Vapnik and A. Chervonenkis in 1963. Given a set of training examples, each previously attributed to one of two categories, the SVM learning algorithm builds a model that can assign a new example to a specific category. The SVM model is a representation of examples as points in space, mapped so that the examples of separate categories are divided by the widest possible margin. New examples are then mapped into the same space and assigned to a category based on which side of the margin they fall [5].

SVM can be used for solving various real tasks:
─ for categorization of text and hypertext, since it diminishes the need for labeled training data [6];
─ for image classification [7];
─ for handwriting recognition [6];
─ in biology and other sciences; it was used for protein classification and gave 90% classification correctness [6].

Advantages of the method:
─ the overfitting problem is not as important as with other methods;
─ SVM does not depend heavily on computer memory;
─ SVM works rather effectively in cases when the task dimension exceeds the number of examples.

Disadvantages of the method [8]:
─ the method is characterized by high computational complexity; compared to simpler methods (k-NN, Decision Tree, Naïve Bayes Classifier), it requires more time for training;
─ the major problem is the choice of the most appropriate kernel function, as different kernel functions give different results for each data set (see the sketch below);
─ SVM performs poorly in the presence of noise (when the target classes have no distinct partition boundary).
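The kernel-choice issue noted in the list above can be seen in a small experiment. The following sketch trains an SVM with a linear and an RBF kernel on synthetic data; the parameters (C, kernel set, data shape) are illustrative only and do not reproduce any experiment from this paper.

```python
# Sketch: linear vs. RBF-kernel SVM on synthetic data, illustrating that the
# kernel choice changes the result (all parameters are illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

for kernel in ("linear", "rbf"):
    # SVMs are distance-based, so features are standardized first
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    model.fit(X_tr, y_tr)
    print(kernel, "test accuracy:", model.score(X_te, y_te))
```

The standardization step reflects the distance-based nature of SVM; without it, features with large numeric ranges would dominate the margin computation.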
Relevance Vector Machine (RVM) is a machine learning method that uses Bayesian inference to obtain parsimonious solutions for regression and probabilistic classification [9].

Advantages of the method:
─ RVM has the same functional form as SVM, but it provides probabilistic classification;
─ the Bayesian basis of RVM makes it possible to avoid the free parameters of SVM, which generally require post-optimization based on cross-validation.

The main disadvantage of the method is that it employs a learning procedure resembling expectation maximization, so it may converge to a local extremum, whereas the standard sequential-minimal-optimization algorithms used in SVM find the global extremum.

K-Nearest Neighbor Classifier (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k nearest training examples in the feature space. In k-NN classification, the output is a class label: the object is classified by the majority vote of its nearest neighbors, i.e. it is assigned to the class that prevails among its k nearest neighbors (k is a positive integer, typically small). If k equals 1, the object is simply assigned to the class of its single nearest neighbor [10].

Neighbors are taken from the set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This set can be considered the training set for the algorithm, though no explicit training step is needed [10]. A specific feature of the k-NN method is that it is sensitive to the local structure of the data [10].

Advantages of the method:
─ absence of a training step: the method stores the training data set and works only at prediction time, which makes the k-NN algorithm much faster to set up than methods that require training, such as SVM, linear regression, etc.;
─ new data are easy to add, because the k-NN algorithm requires no preparation; this will not affect the method's accuracy;
─ k-NN is very simple to implement; it needs only two parameters: the number of neighbors k and the distance function (e.g. Euclidean, Manhattan, etc.).

Disadvantages of the method:
─ it works poorly with large data sets: the cost of computing the distance between a new point and every existing point is enormous, which degrades the algorithm's efficiency;
─ it requires feature scaling (standardization or normalization) before being applied to any data set; otherwise k-NN may generate wrong predictions (see the sketch below);
─ it is sensitive to noisy data and missing values; omitted values have to be imputed and outliers removed manually.
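The scaling requirement from the list above is easy to demonstrate. In the following sketch, one feature is deliberately rescaled by a large factor so that it dominates the Euclidean distance; the data are synthetic and k is chosen ad hoc, purely for illustration.

```python
# Sketch: k-NN with and without feature scaling; the scaling step addresses
# the sensitivity to feature ranges noted above (synthetic data).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=6, random_state=2)
X[:, 0] *= 1000  # exaggerate one feature's scale to show the effect
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

raw = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)
print("unscaled:", raw.score(X_te, y_te), "scaled:", scaled.score(X_te, y_te))
```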
Logistic regression is a statistical model that, in its basic form, uses the logistic function to model a binary dependent variable. Logistic regression measures the relationship between a categorical dependent variable and one or several independent variables by estimating probabilities with the logistic function [11]. Logistic regression can be considered a special case of the generalized linear model and is thus similar to linear regression. However, the logistic regression model rests on different assumptions about the relationship between the dependent and independent variables. The key differences between these models can be seen in two peculiarities of logistic regression. First, the conditional distribution (y|x) is a Bernoulli distribution, not a Gaussian one, because the dependent variable is binary. Second, the predicted values are probabilities and are thus limited to (0,1) through the logistic distribution function, because logistic regression predicts the probability of a concrete outcome rather than the outcome itself.

Advantages of the method [12]:
─ logistic regression works well when the data set is linearly separable (as is also the case for k-NN and linear regression);
─ logistic regression has a low tendency to overfit, though overfitting can appear in high-dimensional data sets; regularization methods are generally used to solve this problem (see the sketch below);
─ logistic regression can not only predict the final class but also show the relationship between the input data and the resulting class;
─ the logistic regression method is simple to implement and interpret, and is efficient to train.

Disadvantages of the method [12]:
─ the principal limitation is the assumption of linearity between the dependent variable and the independent variables;
─ if the number of observations is less than the number of variables, logistic regression should not be used, because this can lead to overfitting;
─ logistic regression can only be used to predict discrete functions; therefore, the dependent variable of logistic regression is limited to a discrete set of values.
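The regularization remedy mentioned in the advantages list can be sketched as follows. In scikit-learn, the parameter C is the inverse regularization strength (smaller C means stronger L2 regularization); the values and the synthetic data below are illustrative only.

```python
# Sketch: L2-regularized logistic regression; C controls regularization
# strength (smaller C = stronger regularization). Values are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=30,
                           n_informative=5, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(penalty="l2", C=C, max_iter=1000).fit(X_tr, y_tr)
    # the fitted coefficients expose the link between inputs and the class
    print(f"C={C}: test accuracy={clf.score(X_te, y_te):.3f}")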
Decision Tree Classifier [13] is a machine learning method that uses a decision tree model for classification. A tree model where the target variable takes a discrete set of values is also called a classification tree. In these tree structures, the leaves represent class labels and the branches represent the combinations of features leading to those class labels. Decision trees where the target variable takes continuous values (real numbers, as a rule) are called regression trees.

Advantages of the method:
─ a decision tree is simple to understand and easy to represent graphically [13];
─ it is capable of processing both numerical and categorical data;
─ it requires little data preparation; other methods often demand data normalization, and dummy variables are not necessary here because trees can work with qualitative predictors;
─ the models can be validated with statistical tests, which makes it possible to assess model reliability;
─ it is a non-statistical approach that makes no assumptions about the training data or prediction residuals, e.g. no assumptions about distribution, independence, or constant variance;
─ it works well with big data sets;
─ it mirrors human decision making more closely than other approaches [13], which may be useful when modelling human decisions and behavior.

Disadvantages of the method:
─ trees can be very unstable: a small change in the training data can lead to a change in the tree and, consequently, in the final predictions [13];
─ the problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality, even for simple concepts [14]; practical decision tree learning algorithms are therefore based on heuristics, such as the greedy algorithm, where locally optimal decisions are taken at every node, and such algorithms cannot guarantee a globally optimal decision tree. To mitigate the local optimality of the greedy approach, e.g. the dual information distance tree was suggested [15];
─ for data including categorical variables with different numbers of levels, the information gain in a decision tree is biased in favor of attributes with more levels; however, the problem of biased choice is resolved, e.g., by the conditional inference approach.

Random Forest Classifier is an ensemble method for classification and regression that works by constructing many decision trees during training and outputting the class that is the mode of the classes (classification) or the average prediction (regression) of the individual trees [16, 17]. The first algorithm of random decision forests was created by Tin Kam Ho [16] on the basis of the random subspace method [17], which, as formulated by Ho, is a means of implementing the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg [18].

Advantages of the method:
─ random forest is based on the bagging (bootstrap aggregating) algorithm and uses the ensemble learning technique;
─ random forest works well with both categorical and continuous variables;
─ random forest can automatically process missing values;
─ it does not demand feature scaling (standardization or normalization), since the method is rule-based rather than distance-based;
─ the random forest algorithm is stable.

Disadvantages of the method:
─ complexity: a random forest creates many trees (unlike the single tree in the case of a decision tree) and combines their results; by default it creates 100 trees in the sklearn Python library, which demands much more computing power and resources;
─ longer training time: random forests require much more time to train than decision trees, since they generate many trees (instead of a single tree) and take the decision by a majority of votes.

AdaBoost Classifier is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire. It can be used with many other classification algorithms to improve performance. The output of the other classification algorithms (weak classifiers) is combined into a weighted sum, which is the final output of the boosted classifier. AdaBoost is adaptive: subsequent weak classifiers are adjusted in favor of the cases misclassified earlier. AdaBoost can be sensitive to noisy data and outliers; in some tasks, however, it is less susceptible to overfitting than other learning algorithms. Each classification algorithm usually suits some types of tasks better than others and, as a rule, has a great number of parameters and configurations that must be tuned before optimal performance on a data set is achieved. AdaBoost with decision trees as the weak classifiers is often called the best out-of-the-box classifier [19]. When used with decision trees, the information collected at every stage of the AdaBoost algorithm about the relative 'hardness' of each training sample is fed into the tree-building algorithm, so that later trees tend to concentrate on the examples that are harder to classify.

Advantages of the method:
─ weak classifiers are easy to cascade;
─ various classification algorithms can be used as weak classifiers;
─ AdaBoost has high accuracy.

Disadvantages of the method:
─ the number of AdaBoost iterations, which also determines the number of weak classifiers, has to be chosen, e.g. by cross-validation;
─ data imbalance results in lower classification accuracy;
─ training takes a longer time.
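To make the single-tree-versus-ensemble contrast concrete, the following sketch compares a decision tree with the two ensemble methods just described on synthetic data. Hyperparameters are near-default and the accuracy numbers it prints are illustrative, not the paper's results.

```python
# Sketch: one decision tree vs. the two ensembles discussed above
# (near-default hyperparameters, synthetic data; results are illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15,
                           n_informative=8, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

models = {
    "decision tree": DecisionTreeClassifier(random_state=4),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=4),
    "AdaBoost":      AdaBoostClassifier(n_estimators=100, random_state=4),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy={model.score(X_te, y_te):.3f}")
```

By default, scikit-learn's AdaBoostClassifier boosts depth-1 decision trees (stumps), which matches the weak-classifier setting described above.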
Artificial Neural Network (ANN) is a computing system inspired by biological neural networks. Such systems 'learn' to solve tasks by considering examples and, as a rule, are not programmed to perform concrete tasks.

ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. The primary aim of the ANN approach was to solve problems the way a human brain does. Over time, however, ANN applications shifted to a variety of tasks, including computer vision, speech recognition, machine translation, social network filtering, board and video games, medical diagnosis, and even domains traditionally considered human activities (e.g. painting) [20].

Advantages of the method [21]:
─ information is stored across the whole network, so the loss of some information fragments in one place does not impede the network's functioning;
─ ability to work with incomplete knowledge;
─ fault tolerance: damage to one or several ANN cells does not impede producing output;
─ ability to learn: artificial neural networks learn from events and take decisions based on them;
─ possibility of parallel processing.

Disadvantages of the method [21]:
─ determining the proper network structure: there are no concrete rules for defining the structure of an artificial neural network; the network structure is chosen based on practical experience or by trial and error;
─ ANN can work only with numerical data; the data must be converted into numerical values before being fed into the ANN.

To solve the problem of classifying the patient's state, the Radial Basis Function Network was chosen. In this network, multiple logistic regression is used [22], which allows classification into more than two classes. As the learning algorithm, a stochastic approximation algorithm with deep learning elements based on the ravine conjugate gradient method was used. This Radial Basis Function Network was independently implemented by the authors in the "ROD&IDS ®" computer decision support system, designed to solve problems of diagnosing, classifying, and optimizing systems and processes.
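For orientation, here is a minimal sketch of the general radial-basis-function-network idea: a hidden layer of Gaussian basis functions followed by a logistic output layer for more than two classes. This is not the authors' ROD&IDS implementation; in particular, their stochastic-approximation learning algorithm with deep learning elements is replaced here by ordinary logistic-regression fitting, and the centers, gamma, and data are chosen ad hoc.

```python
# Minimal sketch of an RBF-network classifier: Gaussian RBF hidden layer +
# multinomial logistic regression output. NOT the authors' ROD&IDS system;
# their stochastic-approximation training is replaced by standard fitting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def rbf_features(X, centers, gamma):
    # Gaussian activations exp(-gamma * ||x - c||^2) for every center c
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

X, y = make_classification(n_samples=400, n_features=10, n_classes=3,
                           n_informative=6, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=5)

# pick RBF centers by clustering the training data (one common heuristic)
centers = KMeans(n_clusters=20, n_init=10,
                 random_state=5).fit(X_tr).cluster_centers_
gamma = 0.1  # width of the basis functions, chosen ad hoc

out = LogisticRegression(max_iter=1000)  # logistic output layer, >2 classes
out.fit(rbf_features(X_tr, centers, gamma), y_tr)
print("test accuracy:", out.score(rbf_features(X_te, centers, gamma), y_te))
```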
2 Problem Statement

Let the multidimensional state matrix \(X = \|x_{i,j}\|,\ i = 1..I,\ j = 1..J\) be known, where \(I\) is the number of examined patients and \(J\) is the number of measured state characteristics (variables). The majority of the examined methods require normalizing the input data; centering and normalizing are done according to the formula

\[ x^{0}_{i,j} = \frac{x_{i,j} - \bar{X}_j}{\sigma_j}, \]

where \(\bar{X}_j\) is the average of the j-th state attribute and \(\sigma_j\) is its standard deviation.

The task of building a classification model of the patient state is the following: a vector function is given by a set of training pairs \(\langle X^{(0)}, d \rangle_p\), \(p = 1..P\), with input vectors of dimension \(H_0\) and output dimension \(H_{K+1}\). It is necessary to build the mathematical vector function \(Y^{(K+1)}(X^{(0)})\) approximating the input data.

We now formulate the classification problem. Let \(\vec{X}\) be the vector of variables describing the state of a patient and \(M\) the number of scenarios (possible state classes). According to the values of the vector \(\vec{X}\), the current state is related to one of the sets \(R_m\), \(m = 0..M-1\). It is necessary to find the scenario \(m^{*}\) for which the distribution density of the conditional appearance probability is maximal:

\[ \exists\, m^{*} \in C_m\{\rho(\vec{X}_m \mid R_m)\},\ m = 0..M-1:\ \rho(\vec{X}_m \mid R_m) \to \max, \]

where \(C_m\{\rho(\vec{X}_m \mid R_m)\}\) is the set of indices m of the distribution densities of the conditional appearance probability in the m-th scenario.

Let us consider the medical-biological system. Each stage of medical treatment is characterized by the final state of patients and a set of parameters describing it. We take the hypothesis that the state of a patient is fully defined by this set of parameters. Therefore, the task of checking the health state reduces to the task of classifying the patient's state variables. Below we examine the application of the above-mentioned methods to this task and estimate and compare the classification quality they achieve.

3 Methods of Estimating Classification Quality

A confusion matrix is used in machine learning to visualize performance and assess the quality of an algorithm's work on classification problems (usually in supervised learning) [23]. Each row of the confusion matrix corresponds to instances of a predicted class and each column to instances of an actual class (or vice versa). The matrix gets its name from the fact that it makes it easy to see whether the resulting classes are confused, i.e. whether one class is systematically labeled as another.

Consider building the confusion matrix for a binary classification problem. Let the classification outcomes be designated positive (p) and negative (n). A binary classifier then has four possible results. If the classification result is p and the actual value is p, the result is called a true positive (TP). If the classification result is p and the actual value is n, the result is called a false positive (FP). Similarly, the result is called a true negative (TN) if both the classification result and the actual value are n, and a false negative (FN) if the classification result is n but the actual value is p. Suppose we carried out an experiment with P positive and N negative cases. The classification results can be summarized in the confusion matrix shown in Fig. 1.

Fig. 1. Possible outcomes defined by the confusion matrix.

ROC curves are also used to estimate classification quality. An ROC curve is a diagram that helps estimate binary classification quality: it plots the share of objects correctly classified as positive out of all actual positives (the sensitivity of the classification algorithm) against the share of objects mistakenly classified as positive out of all actual negatives [24]. A quantitative interpretation of the ROC curve is given by the AUC indicator: the area bounded by the ROC curve and the false-positive-rate axis. The higher the AUC indicator, the better the classifier works. A value below 0.5 indicates that the classifier acts in reverse: it labels positives as negatives and negatives as positives [23]. There are also many extensions of ROC curves for estimating classification with more than 2 classes, as well as diagram-based variants that help reveal the drawbacks of a given classification model. Fig. 2 shows two diagrams characterizing the work of two classification algorithms; such a diagram clearly shows which class was recognized better, which is useful for adjusting the classification model.

Fig. 2. ROC curves comparing the work of two algorithms.
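Both quality measures are straightforward to compute in practice. The following sketch builds the TP/FP/TN/FN counts and the ROC AUC for a binary classifier with scikit-learn; the data and the classifier are placeholders, and any probabilistic classifier would do.

```python
# Sketch: confusion matrix and ROC curve for a binary classifier
# (synthetic data; the classifier here is just a placeholder).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# confusion matrix: rows = actual class, columns = predicted class
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")

# ROC: true-positive rate vs. false-positive rate over all score thresholds
fpr, tpr, _ = roc_curve(y_te, clf.predict_proba(X_te)[:, 1])
print("AUC:", auc(fpr, tpr))
```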
4 Methods Comparison for Medical-Biological System State Classification

Let us examine a patient within a period of medical treatment. To make the diagnosis more accurate, we formulate the problem of patient state classification: to define the current state of the patient (healthy or sick) according to the laboratory research records and the primary health examination. The problem was solved for two data sets, on liver disease and on urological disease, provided by the Department of Infectious, Pediatric and Oncological Urology, Kharkiv Medical Academy of Postgraduate Education.

The urological disease sample contained information on 40 patients. These data were divided into a training set (30 patients) and a test set (10 patients). The record for one patient consisted of 47 estimated characteristics with values of three types: real, Boolean, and enumerated numbers.

The liver disease sample consisted of information on 590 patients. The training sample comprised 420 patients, the test sample 170 patients. The record for one patient consisted of 10 estimated characteristics with values of the same three types: real, Boolean, and enumerated numbers.

To solve the classification problems, we used the Naïve Bayes Classifier, K-Nearest Neighbor Classifier, Logistic Regression, Random Forest Classifier, AdaBoost Classifier, and Radial Basis Function Network. Training and testing made it possible to build confusion matrices and ROC curves for the classification quality analysis. As an example, Figs. 3 and 4 present the confusion matrices and ROC curves for the Naïve Bayes Classifier results.

Fig. 3. ROC curve and confusion matrix of the Naïve Bayes Classifier for patients with urological diseases.

For the first data set, on urological diseases, Logistic Regression, AdaBoost Classifier, and Radial Basis Function Network gave 100% classification accuracy with ROC AUC = 1; the Random Forest Classifier gave 96.6% classification accuracy and ROC AUC = 1; the Naïve Bayes Classifier, 73.3% classification accuracy and ROC AUC = 0.97; the K-Nearest Neighbor Classifier, 80% classification accuracy with ROC AUC = 0.82.

Fig. 4. ROC curve and confusion matrix of the Naïve Bayes Classifier for patients with liver diseases.

The second data set, on liver diseases, differs dramatically from the first in dimensions: far fewer attributes (10 instead of 47), but far more records (590 instead of 40). For classification, we used the same methods. The obtained results are: Naïve Bayes Classifier, 60% classification accuracy and ROC AUC = 0.73; K-Nearest Neighbor Classifier, 81.7% classification accuracy and ROC AUC = 0.898; Logistic Regression, 80.98% classification accuracy and ROC AUC = 0.787; Random Forest Classifier, 98.86% classification accuracy and ROC AUC = 0.99; AdaBoost Classifier, 85.7% classification accuracy and ROC AUC = 0.94; Radial Basis Function Network, 80.56% classification accuracy and ROC AUC = 0.801.

Thus, the best classification of the patients' state was given by the Random Forest Classifier, which showed high accuracy and a high ROC AUC indicator for both data sets.
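The shape of the comparison protocol used in this section — one train/test split, several classifiers, accuracy and ROC AUC per model — can be sketched as follows. The clinical data sets are not public, so synthetic data of the same shape as the liver sample (590 records, 10 attributes, 170-record test set) stands in for them; the printed numbers therefore will not match the results above.

```python
# Sketch of the comparison protocol of this section on stand-in data:
# several classifiers, one split, accuracy and ROC AUC for each.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# stand-in for the liver data shape: 590 patients, 10 attributes
X, y = make_classification(n_samples=590, n_features=10,
                           n_informative=6, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=170, random_state=7)

models = {
    "Naive Bayes":   GaussianNB(),
    "k-NN":          make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Logistic Reg.": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=7),
    "AdaBoost":      AdaBoostClassifier(random_state=7),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    auc_ = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: accuracy={acc:.3f}, ROC AUC={auc_:.3f}")
```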
5 Conclusions

Diagnosing the states of complex dynamic systems, e.g. a medical-biological system (a patient), leads to problems of system state classification. This work studied methods for solving such classification tasks: Naïve Bayes Classifier, K-Nearest Neighbor Classifier, Logistic Regression, Random Forest Classifier, AdaBoost Classifier, and an Artificial Neural Network (Radial Basis Function Network architecture). Confusion matrices and ROC curves were used to estimate classification quality.

The Radial Basis Function Network used here differs from the classical one in that it uses multivariate logistic regression and a recurrent learning algorithm with deep learning elements. Applying this network makes decision making independent of the data type and of expert opinion.

As an example, we considered two data sets characterizing the state of patients with liver and urological diseases. As a result, all the methods gave a classification accuracy of 80% or more, except for the Naïve Bayes Classifier. The Radial Basis Function Network was among the methods that showed the best classification quality, with 100% accuracy for urological diseases, and the Random Forest Classifier showed the best classification quality for liver diseases, with 98.86% accuracy.

Further, we plan to test the methods that showed the best classification on other data sets of different dimensions. The authors are also working on a modification of the Radial Basis Function Network method to improve its accuracy for various input data.

References

1. Maron, M.E.: Automatic Indexing: An Experimental Inquiry. Journal of the ACM 8(3), 404–417 (1961).
2. Rish, I.: An empirical study of the naive Bayes classifier. IJCAI Workshop on Empirical Methods in AI (2001).
3. Niculescu-Mizil, A., Caruana, R.: Predicting good probabilities with supervised learning. ICML (2005).
4. Yuan, G.-X., Ho, C.-H., Lin, C.-J.: Recent Advances of Large-Scale Linear Classification. Proceedings of the IEEE 100(9) (2012).
5. Cortes, C., Vapnik, V.N.: Support-vector networks. Machine Learning 20(3), 273–297 (1995).
6. Pradhan, S.S., et al.: Shallow semantic parsing using support vector machines. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004 (2004).
7. Barghout, L.: Spatial-Taxon Information Granules as Used in Iterative Fuzzy-Decision-Making for Image Segmentation. In: Granular Computing and Decision-Making, pp. 285–318. Springer International Publishing (2015).
8. Divya, T.: A survey on Data Mining approaches for Healthcare. International Journal of Bio-Science and Bio-Technology 5(5), 241–266 (2013).
9. Tipping, M.E.: Sparse Bayesian Learning and the Relevance Vector Machine. Journal of Machine Learning Research 1, 211–244 (2001).
10. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician 46(3), 175–185 (1992).
11. Rodríguez, G.: Lecture Notes on Generalized Linear Models. Chapter 3, p. 45 (2007).
12. Kumar, N.: Advantages and Disadvantages of Logistic Regression in Machine Learning. The Professionals Point (2019).
13. Gareth, J., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning, p. 315. Springer, New York (2015).
14. Hyafil, L., Rivest, R.L.: Constructing Optimal Binary Decision Trees is NP-complete. Information Processing Letters 5(1), 15–17 (1976).
15. Ben-Gal, I., Dana, A., Shkolnik, N.: Efficient Construction of Decision Trees by the Dual Information Distance Method. Quality Technology & Quantitative Management 11(1), 133–147 (2014).
16. Ho, T.K.: Random Decision Forests. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, pp. 278–282 (1995).
17. Ho, T.K.: The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998).
18. Kleinberg, E.: Stochastic Discrimination. Annals of Mathematics and Artificial Intelligence 1(1–4), 207–239 (1990).
19. Kégl, B.: The return of AdaBoost.MH: multi-class Hamming trees (2013).
20. Gatys, L.A., Ecker, A.S., Bethge, M.: A Neural Algorithm of Artistic Style (2015).
21. Schmidhuber, J.: Deep Learning in Neural Networks: An Overview. Neural Networks 61, 85–117 (2015).
22. Strilets, V., Bakumenko, N., Chernysh, S., et al.: Application of the c-means fuzzy clustering method for the patient's state recognition problems in the medicine monitoring system. In: Integrated Computer Technologies in Mechanical Engineering, Advances in Intelligent Systems and Computing, pp. 173–185 (2020).
23. Stehman, S.V.: Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment 62(1), 77–89 (1997).
24. Powers, D.M.W.: Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation. Journal of Machine Learning Technologies 2(1), 37–63 (2011).