Development of Heart Disease Diagnosis Concept Using
Machine Learning
Bohdan Koval a, Iulia Khlevna a and Svitlana Shabatska b
a Taras Shevchenko National University of Kyiv, Volodymyrska str., 60, Kyiv, 01033, Ukraine
b Bogomolets National Medical University, T. Shevchenko boulevard, 13, Kyiv, 01601, Ukraine

                 Abstract
                 The work is devoted to the development of a classification model using machine learning for
                 applied use in healthcare. It outlines the prospects of modern information processing
                 methods in medicine. The research was conducted on statistics from patients' medical
                 records in order to diagnose heart disease. It proposes a conceptual approach to applying
                 machine learning to this problem and develops the tools and the approach for determining
                 the relationship between the attributes of medical records and the resulting diagnosis.
                 Existing expert systems for the detection and diagnosis of heart disease are reviewed and
                 analyzed. The work includes an exploratory analysis of a medical data set that can serve as
                 a basis for operations and calculations on data that require transparency and
                 predictability. It constructs example machine learning models for heart disease detection
                 with classification accuracy in the range of 80-89%. It provides analytics and explanations
                 of the resulting models, showing how a diagnosis is computed and which factors influence it
                 and to what extent; the models are therefore not black boxes and can be examined by their
                 users (patients and doctors). The work outlines the prospects and potential directions for
                 developing the proposed concept of data preparation, analysis, and research.

                 Keywords
                 machine learning, classification, anomaly detection, data visualization, data mining

1. Introduction
   A large number of modern practical problems belong to the class of constraint satisfaction problems
[1]. The objective functions in such problems are, as a rule, non-differentiable and/or multi-extremal.
Classical methods of continuous optimization, and in many cases of discrete optimization, therefore
cannot be applied [2]. Combinatorial optimization methods, evolutionary algorithms and similar
techniques are used instead. The functional dependencies may be specified in tabular or algorithmic
form, and in this case evolutionary algorithms are most often preferred.
   These algorithms do not impose strict requirements on the objective function, but they do not
guarantee that a global optimum will be found, although under certain conditions they converge in
probability. The solutions obtained are considered suboptimal.
   Historically, the first methods of evolutionary optimization were genetic algorithms and
evolutionary strategies [3, 4]. These methods made it possible to look at optimization problems
differently and expanded the range of subjects covered by optimization technologies. They are based on
the ideas of natural evolution. In particular, genetic algorithms traditionally use the principle that
the best parents tend to have better children.
    In genetic algorithms, two parent solutions take part in generating each offspring solution. In
evolutionary strategies, offspring solutions are generated around a single parent solution. This
is the main difference between genetic algorithms and evolutionary strategies in the generation of
offspring solutions.

Information Technology and Implementation (IT&I-2021), December 01–03, 2021, Kyiv, Ukraine
E-MAIL: bohkoval@gmail.com (A.1); yuliya.khlevna@gmail.com (A.2); sveta.shabatska@gmail.com (A.3)
ORCID: 0000-0002-3757-0221 (A.1); 0000-0002-1807-8450 (A.2); 0000-0002-9885-5546 (A.3)
           ©️ 2022 Copyright for this paper by its authors.
           Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
           CEUR Workshop Proceedings (CEUR-WS.org)

    Our hypothesis is that involving more parent solutions in the generation of offspring solutions
will improve the accuracy of the solutions and speed up the convergence of the optimum search. This
improvement would come from a deeper exploration of promising candidate solutions and from fewer
algorithm steps spent in unpromising directions.
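
    The three ways of generating offspring discussed above can be contrasted in a minimal sketch. The
function names, the Gaussian mutation and the gene-wise averaging used for the multi-parent case are
illustrative assumptions, not a method taken from this paper.

```python
import random

def two_parent_crossover(parent_a, parent_b, rng):
    """Genetic-algorithm style: each gene is copied from one of two parents."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

def single_parent_mutation(parent, sigma, rng):
    """Evolution-strategy style: Gaussian perturbation around one parent."""
    return [gene + rng.gauss(0.0, sigma) for gene in parent]

def multi_parent_recombination(parents):
    """Hypothetical multi-parent variant: gene-wise mean of all parents."""
    count = len(parents)
    return [sum(genes) / count for genes in zip(*parents)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
parents = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]

child_ga = two_parent_crossover(parents[0], parents[1], rng)
child_es = single_parent_mutation(parents[0], sigma=0.1, rng=rng)
child_mp = multi_parent_recombination(parents)
```

    Averaging is only one possible multi-parent operator; it is used here because it makes the role of
every parent easy to see.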

2. Theoretical base and features of heart disease detection and analysis using
   machine learning
2.1.    Definition of the subject area of research
    Of all the applications of machine learning, diagnosing a serious disease with complex
mathematical models is one of the most difficult [5]. Such models are mostly black boxes, because it is
hard to explain which real factors stand behind a particular parameter or coefficient. If the output of
a model is a specific course of treatment (possibly with side effects), surgery, or no treatment,
people want to know why that particular result was obtained.
    Electrocardiography, echocardiography, myocardial images using nuclear medicine, bicycle
ergometry and carotid artery ultrasound are used to diagnose cardiovascular diseases. One of the biggest
difficulties in working with medical indicators is the data: their volume, structure and presentation [6].
    Since the early 1980s, so-called Medical Information Systems (MIS) have spread rapidly. These are
workflow automation systems for treatment and prevention facilities that combine a medical decision
support system, electronic medical records, digital research data, and financial and administrative
information. The amount of data in such systems grows rapidly every year. In addition to their
extremely large volume, the records have other features that hinder analysis: gaps in the data,
inaccuracies and subjective judgments. Data with these characteristics are now commonly referred to as
Big Data [7].
    Without innovative high-tech data analysis tools the probability of error is very high, because it
is easy to miss an important factor while attention is paid to another, less critical one. Specially
trained algorithms can find connections and dependencies that previously went unnoticed. An important
aspect that distinguishes medical data from most other data is that the quality, objectivity,
timeliness and accuracy of the results are critical, and every result must be substantiated.
Difficulties are caused not only by the data themselves but also by the process of collecting them. The
data come from different providers, so they are often cluttered and unstructured. Dirty data can
quickly nullify complex analytics, especially when the information comes from different sources that
record data in a variety of formats.

2.2. Analysis of methods for heart disease detection and diagnosis
    The healthcare information technology industry is changing daily. Organizations that have not
looked at the market for some time may therefore be unaware of the latest innovative offerings from
ambitious new "players" in the industry. As a result of this rapid development, a new culture of
scientific advances in machine learning is emerging, enabling the healthcare sector to take advantage
of a number of revolutionary tools that use natural language processing, pattern recognition and deep
learning to provide better services. Based on the solutions they offer, data mining technologies on
the market can be divided into 4 main categories.
    1. Analysis of pathology images. Improving the analysis of pathology images through machine
learning is of particular interest to healthcare organizations. Machine learning can complement the
skills of radiologists by detecting significant changes in image scans more quickly, which can
potentially lead to earlier and more accurate diagnosis of diseases. A number of technology vendors
have already begun to invest heavily in image analysis projects.


    IBM Watson is deploying a clinical imaging service to help identify aortic stenosis, while Microsoft
has focused on biomarker phenotyping research to complement its cancer research [8].
    Scientific institutions are also at the forefront of modern pattern recognition. At Indiana
University and Purdue University, Indiana, researchers are applying machine learning algorithms to
pathology images to predict the recurrence rate of acute myelogenous leukemia [9].
    At Stanford University, machine learning tools have shown better results than experts in
distinguishing between two types of lung cancer. The computer also outperformed its human competitors
in predicting patient life expectancy [10].
    2. Processing of natural language and text data. Unstructured data is everywhere in healthcare,
whether it was created on purpose or accidentally.
    Images of lab reports in PDF format, voice recordings of consumer interactions and free-text e-mail
input pose serious problems for traditional analysis tools. Machine learning offers a new way to derive
value from these data sources. Using natural language processing (NLP), machine learning algorithms can
convert images of text into editable documents, extract semantic meaning from these documents, or
process user searches in plain text to increase the accuracy of the results [11].
    3. Decision support and forecasting. The ability to extract value from large amounts of free text is
also crucial for clinical decision support and prediction, another area where machine learning leads
the way. Prompt detection and elimination of risks can significantly improve outcomes for patients with
any number of serious diseases, both clinical and behavioral.
    4. Disease diagnosis using machine learning methods. Medicine has always been and remains one of
the most important fields, and mankind puts a lot of effort into its development. This is primarily due
to the high cost of error, because human life is at stake. 90% of the success of disease treatment
depends on timely and correct diagnosis. For a long time this responsibility fell on the shoulders of
doctors. But even the most experienced professionals are, above all, people. First, a person cannot
know everything; second, cognitive errors are possible due to fatigue, stress or inattention. A
separate set of problems is caused by medical data: their volume, quality and presentation format.
More than 80% of cognitive errors are caused simply by misinterpretation of data. This is where the
latest technologies, which use artificial intelligence and machine learning as a diagnostic tool, come
to the aid.
    The basis of medical diagnosis is the task of classification. Diagnosis can be reduced to the
problem of mapping the data to one of N different outcomes. In some cases N = 2: the task is only to
determine whether the patient has a certain disease (1) or not (0) on the basis of the data (e.g. an
X-ray or a cardiogram). As a rule, preference is given to algorithms whose output includes not only the
class to which the object belongs but also the probability that the object belongs to each class
[12, 13]. The analysis shows that it is appropriate to form a concept of heart disease diagnostics
based on machine learning methods, in particular on algorithms that draw on the knowledge and
analytical abilities of one or more medical experts to form a system capable of drawing logical
conclusions from this knowledge and thereby supporting the diagnosis of heart disease.
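
    For the binary case (N = 2), the difference between returning only a class and returning class
probabilities can be shown with a minimal logistic model. The weights, the bias, the two inputs and
the 0.5 threshold below are hand-picked assumptions for illustration, not values fitted to any data.

```python
import math

def class_probabilities(features, weights, bias):
    """Return (P(no disease), P(disease)) from a logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    p_disease = 1.0 / (1.0 + math.exp(-score))
    return 1.0 - p_disease, p_disease

# Hypothetical weights for two normalized inputs (not fitted to real data)
weights = [1.5, -2.0]
bias = 0.0

p0, p1 = class_probabilities([0.8, 0.1], weights, bias)
label = 1 if p1 >= 0.5 else 0   # the class alone hides how confident the model is
```

    A doctor who sees a probability next to the label can tell a borderline case from a confident
one, which a bare 0/1 answer does not allow.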

2.3. Indicators of heart disease detection and analysis in the expert system
   A medical information system (an expert system in medicine) in the narrow sense is a set of
hardware and mathematical software designed to collect and analyze medical and biological information
and to present the results in a user-friendly form. Depending on the type of tasks to be solved, MIS
can be divided into three groups: information and reference, information and logic, and management.
The development of any expert system in medicine should rest on the principles of system construction,
which are studied and formulated by systems engineering. The use of modern medical knowledge is of
paramount importance in the development of MIS. When developing a specific diagnostic or prognostic
system, a number of issues have to be addressed in sequence:
   1) choice of purpose and determination of the main purpose of the system;
   2) selection of the system structural scheme;

    3) compiling a list of nosological forms and collecting statistically reliable information on the
symptoms of conditions;
    4) construction of a decision rule for evaluating medical information and issuing conclusions on
diagnosis and prognosis, i.e. development of the algorithmic basis of the
system.
    Data and knowledge processing is reduced to three main stages. At the first stage, the elements of
information are placed in certain structures: databases and knowledge bases. At the second stage, the
database and the knowledge base are reorganized: their structure, the order in which information is
placed and the nature of the relationships between the elements of information change. At the third
stage, the database is put into operation: searching for the necessary information, decision-making,
and editing of the databases and knowledge bases [14]. There are different algorithms for medical
diagnosis. The simplest one, which often does not require a computer, is the so-called "tabular
algorithm", based on special diagnostic tables. Other, potentially more effective, types of medical
diagnostic algorithms are:
    1) probabilistic method, which consists in calculating the probabilities of diseases according to the
Bayesian formula;
    2) Wald's method of sequential statistical analysis;
    3) method of searching for clinical precedent;
    4) phase interval method;
    5) logical basis method;
    6) algorithms based on neural networks.
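
    As an illustration of the first, probabilistic method, the Bayes formula for a single diagnostic
sign can be written out directly. The prior, sensitivity and specificity used here are made-up numbers
chosen only to keep the arithmetic easy to follow.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive sign) computed with the Bayes formula."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)

# Hypothetical inputs: 10% prevalence, 90% sensitivity, 80% specificity
p = posterior(prior=0.10, sensitivity=0.90, specificity=0.80)
```

    With these inputs the posterior is 0.09 / (0.09 + 0.18) = 1/3, so even a fairly sensitive sign
leaves the diagnosis uncertain when the disease is rare.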
    A fundamental feature of any diagnosis, even one made with powerful technical means, is that 100%
reliability is almost impossible to achieve, for reasons of both an objective and a subjective nature.
Objective causes include the individual characteristics of the organism, an atypical course of the
disease, errors in measuring the signs and overlapping regions of the feature space. Subjective causes
are the doctor's mistakes, which are due to a number of factors [15].
    Expert systems are not only a technology but also a philosophy of using medical knowledge. The
most useful feature of an expert system is that it applies the high-quality experience of medical
professionals to solve the problem.

2. Development of heart disease detection and analysis models using machine
learning
2.1. Indicators formation for detection and analysis of heart disease in the
expert system
    A set of data on the medical indicators of patients examined for heart disease is used for the
analysis. The data set [16] consists of visits to a medical institution that agreed to participate in
the collection of the relevant data. The set therefore includes data both from patients with confirmed
heart problems and from those found to be healthy. The database includes basic patient indicators
(age, sex, etc.) and specialized medical ones (cardiogram results, etc.). Each record also contains an
indicator (flag) that shows whether heart problems were finally diagnosed in the patient, and the
severity of the disease on a scale from 0 to 4, where 0 means no disease and 4 means severe heart
disease.
    Using such data sets with machine learning algorithms can help to create models for diagnosing
and analyzing the occurrence and course of heart disease in patients. In the future this may
contribute not only to more effective treatment but also to the prevention of these diseases, which
can be expected to improve the quality of life and life expectancy. An information system can also be
built as a service around the created models, which medical institutions, individuals and others can
join.
    The data set contains a number of variables, and the target variable is the presence or absence
of heart disease. Patients' names and social insurance numbers were removed from the database and
replaced with fictitious values. The data set is a raw set of binary data that will be used for
diagnosing heart disease in patients. The initial data set (rawBinaryData) contains 77 values
(properties) for each record (entity). An entity is a diagnosis of a patient, i.e. for each visit and
subsequent diagnosis of the patient a separate record is created, which includes the patient data and
the diagnostic results. Thus, each record is characterized by the patient's name and 76 attributes.
   In the final data set (tidy dataset), which will be used for further work, it is necessary to leave only
the attributes that may be relevant to our model and normalize / prepare them.
   In the final data set we will expect attributes in the normalized form (number # indicates the ordinal
number of the attribute in the initial data set):
        1. #3 (age)
        2. #4 (sex)
        3. #9 (cp)
        4. #10 (trestbps)
        5. #12 (chol)
        6. #16 (fbs)
        7. #19 (restecg)
        8. #32 (thalach)
        9. #38 (exang)
        10. #40 (oldpeak)
        11. #41 (slope)
        12. #44 (ca)
        13. #51 (thal)
        14. #58 (num) (attribute to diagnose)
   The algorithm of initial data processing is:
   1. Transform the binary data into a text representation by decoding it with the 'unicode_escape'
encoding to avoid conflicts.
   2. Replace \n with ‘’ so that all the data are on the same nesting level (so that there are no
gaps in the records).
   3. At this stage the records are placed in one line and separated by the word 'name'. To place
each record on its own line, replace 'name' with \n ('name' is a constant that carries no
information).
   4. Create a DataFrame from the raw data: initialize the constructor with the resulting CSV-like
text representation of the data to obtain a normalized data set of the initial data.
   5. Change the values of the categorical variables to make them easier to interpret.
   6. Check the data types. Some of them are wrong (for example, the categorical values changed in
the previous step) and have to be corrected.
   7. Create dummy variables for the categorical variables. For example, instead of a variable with
the values "men" and "women" we will have a variable "man" with the values 0 or 1 (1 is a man, 0 a
woman).
   Normalizing the data set in this way, we obtain the final data set, tidyDataset, saved in the file
tidy_dataset.csv. The data set is ready for further work: building models on it and using it in
machine learning.
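
   The seven steps above can be sketched on a toy payload. The raw bytes, the five column names and
the recoding of `sex` are illustrative assumptions, and a space rather than an empty string replaces
the newline here so that neighbouring values in the toy payload do not fuse.

```python
import pandas as pd

# Hypothetical raw payload: each record spans two lines and ends with "name"
raw = b"63 1 1\n145 0 name\n67 0 4\n160 1 name\n"

text = raw.decode("unicode_escape")   # step 1: bytes -> text
text = text.replace("\n", " ")        # step 2: remove line breaks inside records
text = text.replace("name", "\n")     # step 3: one record per line

# Step 4: build a DataFrame from the CSV-like text (column subset is assumed)
columns = ["age", "sex", "cp", "trestbps", "num"]
rows = [line.split() for line in text.split("\n") if line.strip()]
df = pd.DataFrame(rows, columns=columns)

df = df.astype(int)                                   # step 6: fix the data types
df["sex"] = df["sex"].map({0: "female", 1: "male"})   # step 5: readable categories

# Step 7: dummy variables; drop_first keeps a single 0/1 column per binary variable
tidy = pd.get_dummies(df, columns=["sex"], drop_first=True)
```

   In this sketch the types are fixed before the categories are renamed, which is simply the more
convenient order for a frame that starts out as strings.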

2.2. Modeling of heart disease detection and analysis using machine learning
    In total, the system implements 6 stages of diagnosing heart disease in a patient. The information
system helps to diagnose, analyze and predict users' heart disease. All 6 stages of system operation
are described below:
    1. Signal reception. The system receives a request to process patient data through the API,
which can be connected to smart watches, medical systems and mobile devices. The connection is
encrypted to protect the client's private data.
    2. Data processing. The system examines the obtained data, normalizes them and uses data mining
algorithms to extract the useful part. This helps the system to operate on what is really important
and to ignore the so-called white noise.
    3. Building of mathematical models. Based on the prepared data, machine learning algorithms are
used to create and train models for the detection and prevention of heart disease.
    4. Results formation. The system tests the obtained models, selects the most accurate ones and,
based on them, forms the final diagnosis of the patient's heart disease. Comprehensive diagnostic
analytics are produced.
    5. Notification of results to the client. The client uses the web interface on a mobile device
to receive the diagnostic results. The patient can also name other people who should receive the
result, e.g. their doctor.
    6. Providing a report and recommendations for further action. A comprehensive report on the
diagnostics, together with recommendations, is shown on the screen of the client's device.
    The software product is implemented in the Python programming language and its ecosystem.
Together with Python we use the following libraries:
     ● pandas - data processing and analysis; the library provides data structures and operations for
manipulating tables and time series and is optimized for working with large amounts of data;
     ● numpy - adds support for large multidimensional arrays and matrices, along with a vast
collection of high-level mathematical functions that operate on them;
     ● matplotlib - offers an object-oriented interface for creating various visualizations;
     ● scikit-learn - provides algorithms for classification, regression analysis and clustering,
including support vector machines, K-nearest neighbors, random forests and others;
     ● xgboost - a framework for gradient boosting algorithms [13].
    We start by exploring the data set and identifying patterns in it. Taking into account empirical
data on risk groups, we investigate the distribution by age and heart rate. As Fig. 1 shows, several
age categories are at risk. Interestingly, the frequency does not increase steadily with age: the most
problematic ages are 41-45 years and 51-54 years.
    Fig. 2 shows that most people of these ages also have an increased heart rate compared to their
peers. During the study of the data it was found effective to create dummy variables from the
categorical ones, so we convert the variables `cp`, `thal` and `slope`:
      dummyCp = pd.get_dummies(df['cp'], prefix = "cp")
      dummyThal = pd.get_dummies(df['thal'], prefix = "thal")
      dummySlope = pd.get_dummies(df['slope'], prefix = "slope")
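
    The snippets in this section use `x_train`, `x_test`, `y_train` and `y_test` without showing how
they are produced. One plausible preparation is sketched below on a toy frame; the made-up values, the
`target` column name, the min-max scaling and the split ratio are all assumptions rather than the
authors' exact procedure.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the tidy data set (values are made up)
df = pd.DataFrame({
    "age": [63, 67, 41, 56, 57, 44, 52, 54],
    "cp": [1, 4, 2, 2, 4, 3, 3, 1],
    "target": [0, 1, 0, 0, 1, 0, 1, 1],
})

# Replace the categorical column with its dummy columns
dummy_cp = pd.get_dummies(df["cp"], prefix="cp")
df = pd.concat([df.drop(columns=["cp"]), dummy_cp], axis=1)

y = df["target"].values
x_data = df.drop(columns=["target"]).astype(float)
x = (x_data - x_data.min()) / (x_data.max() - x_data.min())   # min-max scaling to [0, 1]

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)

# The paper's snippets call fit(x_train.T, ...), which suggests the arrays are
# stored as features x samples; transposing here reproduces that layout.
x_train, x_test = x_train.T, x_test.T
```

    With this layout, `x_train.T` is again samples x features, which is the shape scikit-learn
estimators expect.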

   [Bar chart "Heart Disease Frequency for Ages": x-axis Age, y-axis Frequency]
Figure 1: Distribution of heart disease diagnoses by age (0 - no problems detected, 1 - problems
detected)
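   Let's start with logistic regression. The model could be implemented from scratch, but for
convenience we use the scikit-learn package and import logistic regression from it:
   from sklearn.linear_model import LogisticRegression
   accuracies = {}                          # accuracy of each model, in percent
   lr = LogisticRegression()
   lr.fit(x_train.T, y_train.T)             # train on the training split
   acc = lr.score(x_test.T, y_test.T)*100   # accuracy on the test split
   accuracies['Logistic Regression'] = acc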
   The learning process of the model and the obtained result are shown in Fig. 3.
   The logistic regression model reaches 84.39% accuracy. Next, we initialize and train the K-nearest
neighbors model:
   from sklearn.neighbors import KNeighborsClassifier
   scoreList = []
   for i in range(1, 20):                           # try k = 1..19
       knn2 = KNeighborsClassifier(n_neighbors = i)
       knn2.fit(x_train.T, y_train.T)
       scoreList.append(knn2.score(x_test.T, y_test.T))


   Figure 2: Distribution of heart disease diagnosis by maximum heart rate


   Figure 3: The logistic regression result
    The resulting accuracy for each number of neighbors is shown in Fig. 4. The KNN model showed
88.52% accuracy. Next, we initialize and train the support vector machine model:
   from sklearn.svm import SVC
   svm = SVC(random_state = 1)
   svm.fit(x_train.T, y_train.T)
   The SVM model reached an accuracy of 85.75%. Next, the naive Bayes method:


   from sklearn.naive_bayes import GaussianNB
   nb = GaussianNB()
   nb.fit(x_train.T, y_train.T)
   The model showed an accuracy of 83.19%. Decision trees:
   from sklearn.tree import DecisionTreeClassifier
   dtc = DecisionTreeClassifier()
   dtc.fit(x_train.T, y_train.T)
   The accuracy is 80.33%. We will also use the built-in RandomForest classifier and analyze its result
as well:
   from sklearn.ensemble import RandomForestClassifier
   rf = RandomForestClassifier(n_estimators = 1000, random_state = 1)
   rf.fit(x_train.T, y_train.T)


   Figure 4: Timeline of changes in the accuracy of the method of K-nearest neighbors.

   The accuracy is 88.18%. We initialize all the mentioned models and get the predicted values (Fig. 5):
   y_head_lr = lr.predict(x_test.T)
   knn3 = KNeighborsClassifier(n_neighbors = 3)
   knn3.fit(x_train.T, y_train.T)
   y_head_knn = knn3.predict(x_test.T)
   y_head_svm = svm.predict(x_test.T)
   y_head_nb = nb.predict(x_test.T)
   y_head_dtc = dtc.predict(x_test.T)
   y_head_rf = rf.predict(x_test.T)
   For a clearer presentation, we compute and display the confusion matrix of each model.
   from sklearn.metrics import confusion_matrix
   cm_lr = confusion_matrix(y_test,y_head_lr)
   cm_knn = confusion_matrix(y_test,y_head_knn)
   cm_svm = confusion_matrix(y_test,y_head_svm)
   cm_nb = confusion_matrix(y_test,y_head_nb)
   cm_dtc = confusion_matrix(y_test,y_head_dtc)
   cm_rf = confusion_matrix(y_test,y_head_rf)
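
   For a binary diagnosis the confusion matrix also yields sensitivity and specificity, which matter
in medicine because a missed disease and a false alarm have different costs. The sketch below counts
the four cells by hand on made-up labels; the returned order (TN, FP, FN, TP) follows the row-major
layout [[TN, FP], [FN, TP]] that scikit-learn uses for labels 0 and 1.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

# Made-up labels for eight patients (1 = disease, 0 = no disease)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # share of sick patients that were detected
specificity = tn / (tn + fp)   # share of healthy patients that were cleared
```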
   The results are shown in Fig. 6. The best result was demonstrated by the model based on the
K-nearest neighbors method. The Random Forest classifier, a more complex ensemble method built on
decision trees, showed a similar result. The K-nearest neighbors method can therefore be recommended
as the first choice for use in the expert system.

              [Bar chart: Accuracy, % by model: Logistic Regression, KNN, SVM, Naïve Bayes, Decision Tree, Random Forest]
              Figure 5: Comparison of estimates of the received models
              [Panels "Confusion Matrixes": Logistic Regression, K-Nearest Neighbors, Support Vector Machine, Naïve Bayes, Decision Tree Classifier, Random Forest]
              Figure 6: Confusion matrices of the received models.
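  At this point, however, the model is a black box for us: we get a result but have no clear idea of
how it was reached. We therefore visualize the resulting decision tree:
              from sklearn.tree import export_graphviz
              # estimator is a fitted decision tree; feature_names and y_train_str hold
              # the feature and class labels (all three are defined elsewhere)
              export_graphviz(estimator, out_file='tree.dot',
                              feature_names = feature_names,
                              class_names = y_train_str,
                              rounded = True, proportion = True,
                              label='root',
                              precision = 2, filled = True)
              from subprocess import call
              # render the .dot file as a PNG (requires the Graphviz dot tool)
              call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])
              from IPython.display import Image
              Image(filename = 'tree.png')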


   Another important tool for understanding the model is to establish how the attributes influence
the final result. Below we compute how strongly each attribute affects the classification.
  import eli5
  from eli5.sklearn import PermutationImportance
  # model is a fitted classifier; X_test is a DataFrame with named columns
  perm = PermutationImportance(model, random_state=1).fit(X_test, y_test)
  eli5.show_weights(perm, feature_names = X_test.columns.tolist())
    A fragment of the resulting decision tree is shown in Fig. 7. The factors that influence the
result most are shown in Fig. 8. We also visualize not only the relative influence of the factors but
also the absolute values that they take for a given result. Fig. 9 shows clearly how the value of an
attribute affects the result and how values with the same result are grouped. For example,
num_major_vessels directly affects the result, and the attribute values are grouped so that the
positive values of the result are on the left and the negative values are on the right.
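
    The idea behind permutation importance can be written out in a few lines: shuffle one attribute
at a time and measure how much the accuracy drops. The sketch below is a simplified stand-in and not
the eli5 implementation; the toy data and the `toy_model` function are hypothetical.

```python
import numpy as np

def permutation_importance_scores(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    scores = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])
            scores[col] += baseline - np.mean(predict(X_perm) == y)
    return scores / n_repeats

# Toy data: the label depends only on the first of three features
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

def toy_model(features):
    """Stand-in for a fitted classifier (hypothetical)."""
    return (features[:, 0] > 0).astype(int)

scores = permutation_importance_scores(toy_model, X, y)
```

    Shuffling the first column destroys the link between input and label, so its score is large,
while the two unused columns score exactly zero.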


   Figure 7: A fragment of the built decision tree


   Figure 8: Impact of attributes on the received result


3. Conclusion and perspectives
    The scope of machine learning applications is constantly expanding. Comprehensive informatization
leads to the accumulation of huge amounts of data in science, industry, business, transport and
health care. The resulting tasks of forecasting, management and decision-making are often reduced to
learning from precedents. Previously, when no such data existed, these tasks were either not posed at
all or were solved by completely different methods. In this work, machine learning methods for solving
practical problems were investigated and implemented. A concept for the detection and analysis of
heart disease has been formed, which is proposed to be implemented in the form of an expert system.
    The work considered 6 different models for the classification of heart disease. Each model
demonstrated high performance and can be used as an auxiliary tool by cardiologists and in the
cardiology departments of medical institutions to diagnose heart disease. The best results were shown
by the K-nearest neighbors classifier, with an accuracy of 88.52%, and by the Random Forest
classifier, an ensemble of decision trees, with 88.18%.
    Other models also showed good results in the range of 80-88%. None of the models reaches 100%
accuracy, so an expert system based on them cannot be recommended as the main diagnostic tool.
However, each of them can be a very useful addition to the usual laboratory diagnosis of diseases in
the patient. They can be especially useful in combination, which can theoretically improve the
results. An ensemble (combination) of these models and their integration with each other can be the
further development of the system implemented in this work.
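
    One common way to combine such models is soft voting, where the class probabilities of the
individual classifiers are averaged. The sketch below uses scikit-learn's `VotingClassifier` on
synthetic data; the choice of three member models, their parameters and the data are assumptions for
illustration and say nothing about the accuracy achievable on the heart disease data set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the prepared data set
X, y = make_classification(n_samples=300, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
    ],
    voting="soft",   # average the predicted class probabilities
)
ensemble.fit(X_train, y_train)

accuracy = ensemble.score(X_test, y_test) * 100   # test accuracy in percent
proba = ensemble.predict_proba(X_test)            # averaged class probabilities
```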
    An important task was not only to build a machine learning model but also to explain how it
works, what affects the result and how the result is calculated. An algorithm for examining the
obtained result was therefore presented, including the analysis and visualization of the impact of the
attributes, which made it possible to investigate the result in more detail and with more transparency.


   Figure 9: Distribution of attribute values depending on the result

4.    References
[1] Institute for Health Metrics and Evaluation. Global Burden of Disease (GBD 2019). 9 Dec. 2020.
     URL: http://www.healthdata.org/gbd/2019.
[2] Krucevych, T. Henderni. Osoblyvosti proyavu faktoriv ryzyku sercevo-sudynnyx zaxvoryuvan u
     cholovikiv i zhinok zriloho viku. Sportyvnyj visnyk Prydniprov'ya. 2019. № 3. S. 110-117. (In
     Ukrainian).
[3] Daniel M. Study Suggests Medical Errors Now Third Leading Cause of Death in the U.S. [Internet
     resource]. Mode of access:
     https://www.hopkinsmedicine.org/news/media/releases/study_suggests_medical_errors_now_third_leading_cause_of_death_in_the_us
[4] Yevropejs"ke rehional"ne byuro VOOZ. Doslidzhennya STEPS: poshyrenist" faktoriv ryzyku
     neinfekcijnyx zaxvoryuvan" v Ukrayini u 2019 roci. Kopenhahen, 2020. License: CC BY-NC-SA
     3.0 IGO. (In Ukrainian).
[5] Koktysh L. The state of the art in health data analytics [Internet resource]. Mode of access:
     https://www.scnsoft.com/blog/health-data-analytics-overview
[6] A Hybrid Intelligent System Framework for the Prediction of Heart Disease Using Machine
     Learning Algorithms. 2018. DOI: 10.1155/2018/3860146. URL: https://doi.org/10.1155/2018/3860146.
[7] Rashmee U. Shah and John S. Rumsfeld, Big Data in Cardiology. European Heart Journal, vol. 38,
     no. 24, pp. 1865-1867, June 2017. URL: https://doi.org/10.1093/eurheartj/ehx284.
[8] IBM Watson Health [Internet resource]. URL:
     https://www.ibm.com/watson/health/oncology-and-genomics/oncology/
[9] Miotto R, Wang F, Wang S, Jiang X, Dudley JT. Deep learning for healthcare: review,
     opportunities and challenges. Brief Bioinform. 2018;19(6):1236-1246. doi:10.1093/bib/bbx044
[10] Fan J, Xing L, Ma M, Hu W, Yang Y. Verification of the machine delivery parameters of treatment
     plan via deep learning. Physics in Medicine and Biology, Early Access, 2020.
[11] Top 6 use cases of data science in healthcare, Jan. 2019. URL:
     https://bigdata-madesimple.com/top-6-use-cases-of-data-science-in-healthcare/.
[12] Cheryl Alexander and Lidong Wang, Big Data Analytics in Heart Attack Prediction. Journal of
     Nursing & Care, 06, 2017. DOI: 10.4172/2167-1168.1000393.
[13] Khlevna I., Koval B. Fraud detection technology in payment systems. // IT&I 2020 – Information
     Technology and Interactions. Proceedings of the 7th International Conference "Information
     Technology and Interactions" (IT&I-2020). Workshops Proceedings. Kyiv, Ukraine, December
     02-03, 2020. CEUR Workshop Proceedings. P. 85–95.
[14] Kalra D., Beale T., Heard S., Lloyd D. An EHR architecture for Archetyped Health Information
     Systems, 2003. URL: http://www.openehr.org.
[15] Bresnick J. How to Choose the Right Healthcare Big Data Analytics Tools [Internet
     resource]. URL:
     https://healthitanalytics.com/features/how-to-choose-the-right-healthcarebig-data-analytics-tools
[16] Heart diseases dataset [Internet resource]. URL:
     https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/.
[17] Teslia I., Yehorchenkov O., Khlevna I., Khlevnyi A. Development concept and method of
     formation of specific project management methodologies. 2018. Eastern European Journal of
     Advanced Technologies, 5/3(95), 6–16.

